Benchmarking a Beowulf Cluster

Hi guys. I am trying to test my university's cluster using the Intel Linpack benchmarking software. Let me say from the get-go that I am a Linux novice and have only recently learnt some Linux commands to have a play around, so please keep the language simple :)
I have run a Linpack test, as can be seen from the terminal copy-and-paste below. However, I believe the test I have run only measures the average GFLOPS for the node I am currently on. What if I want to test the strength of a cluster of 3 nodes all working together? Would anyone happen to know what I would need to type at the command line to run a test on 3 nodes that I have logged into (all clustered by the university already)? The nodes are all Xeon 64s with 4 GB of RAM each. Would really appreciate any help.

[******@beowulf linpack]$ ./xlinpack_xeon64
 Input data or print help ? Type [data]/help :
Number of equations to solve (problem size): 20000
Leading dimension of array: 20000
Number of trials to run: 4
Data alignment value (in Kbytes): 4
Current date/time: Wed Apr 29 13:50:57 2009
 CPU frequency:    3.400 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:
 Number of tests                             : 1
Number of equations to solve (problem size) : 20000
Leading dimension of array                  : 20000
Number of trials to run                     : 4
Data alignment value (in Kbytes)            : 4
 Maximum memory requested that can be used = 3200404096, at the size = 20000
============= Timing linear equation system solver =================
 Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
Error: info returned = 1
20000  20000  4      237.211    22.4869  4.455000e-10 3.943651e-02
Error: info returned = 1
20000  20000  4      236.686    22.5368  4.455000e-10 3.943651e-02
Error: info returned = 1
20000  20000  4      235.285    22.6710  4.455000e-10 3.943651e-02
Error: info returned = 1
20000  20000  4      237.404    22.4686  4.455000e-10 3.943651e-02
 Performance Summary (GFlops)
 Size   LDA    Align.  Average  Maximal
20000  20000  4       22.5408  22.6710
 End of tests

First, a few things need to be in place:

  1. Install the ATLAS and ScaLAPACK libraries, and make sure they are present on each node.
  2. Install one of the MPI packages (OpenMPI, LAM/MPI, MPICH, etc.); the run-times need to be on every node, while the compiler libraries and tools only need to be on one node.
  3. Recompile Linpack for MPI and ATLAS. I believe Linpack uses a configure script in which you tell it to use MPI, or something like that.
  4. For these benchmarks, disable Linux's swap; this ensures Linpack doesn't start swapping and killing performance. Do this with "sysctl vm.swappiness=0" before the run and "sysctl vm.swappiness=1" afterwards (a sketch follows below). If a node runs out of memory, the problem size is too large and the process fails.
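A minimal sketch of step 4, assuming you have root ssh access and that machine1/machine2/machine3 stand in for your real node names:

# Turn swapping effectively off on every node before the run:
for h in machine1 machine2 machine3; do
    ssh root@$h sysctl vm.swappiness=0
done

# ... run the benchmark ...

# Restore swapping afterwards:
for h in machine1 machine2 machine3; do
    ssh root@$h sysctl vm.swappiness=1
done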

Next, start out with a simple test to make sure your HPL + MPI setup is working. You'll need a dummy config file like this:

Our cluster benchmark
My university lab
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
400          Ns
1            # of NBs
50           NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
3            Qs
-1           threshold
1            # of panel fact
0            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0            RFACTs (0=left, 1=Crout, 2=Right)
1            # of Bcasts
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of Lookahead depths
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
60           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
0            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

It should run to completion and give you some reasonable output (the last number in each result line is GFlops). With these Xeons, your theoretical peak is 3 (nodes) * 4 (CPUs) * 4 (cores) * 3.4 (GHz) * N (floating-point operations per cycle) = 163.2 * N GFlops. (See the Intel spec sheet for your processor to determine N.)
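A typical launch for that dummy run, assuming an OpenMPI-style mpirun, the xhpl binary in the current directory, and a "hosts" machinefile with one line per core (the 4 x 3 grid above needs 12 processes), would be:

mpirun -np 12 -machinefile hosts ./xhpl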

Once you have that working, you're ready to tune the HPL suite: run a series of tests, each with a different configuration. A single configuration file does this: the Linpack program permutes all possible combinations of the parameters within the file and runs one test for each permutation. A guide to this format can be found here, but here's what I suggest you start with:

Our cluster benchmark
My university lab
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
7            # of problems sizes (N)
100 200 400 800 1600 3200 6400   Ns
5            # of NBs
50 100 150 200 250               NBs
1            PMAP process mapping (0=Row-,1=Column-major)
4            # of process grids (P x Q)
12 4 3 1     Ps
1 3 4 12     Qs
-1           threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
4            # of recursive stopping criterium
1 2 4 8      NBMINs (>= 1)
3            # of panels in recursion
2 3 4        NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
6            # of Bcasts
0 1 2 3 4 5  BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
3            # of Lookahead depths
0 1 2        DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
60           SWAP=2 threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Now, this will take a long time. Make sure you pipe the output through " | tee hpout.dat" so that you can watch it on screen and it also gets saved to disk.
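For example (same assumptions as the launch sketch above):

mpirun -np 12 -machinefile hosts ./xhpl | tee hpout.dat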

After this, look for the top 8 or 16 results, and refine the config file to use only the parameters that produced these results.
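One way to pull those out of the captured output, as a sketch: it assumes HPL's result lines start with "WR" and that GFlops is the 7th column, which is how stock HPL prints them.

# Sort the result lines by the GFlops column, best first, and keep the top 16:
grep '^WR' hpout.dat | sort -g -r -k 7 | head -16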

NOW you can start performance tuning the cluster. Most critically, you will want to (a) tune the TCP/IP kernel parameters, (b) disable all non-essential Linux processes on all nodes, and (c) tune the switch parameters for the cluster ports -- i.e., disable auto-negotiation and maybe tune the messaging queues (some switches use different types of service and have small queues for each one; you want one large queue for all TOS).
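For (a), a sketch of a starting point; these are standard Linux sysctl keys rather than anything specific to this cluster, so treat the values as illustrative:

# Enlarge socket buffers so MPI's TCP traffic isn't throttled (run as root on every node):
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"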

Thanks for the response, Otheus.

Everything seems to be working except the tuning of the HPL.dat file. I keep getting process errors such as:

HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 8 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <

That is from trying to run it on 8 cores across 2 nodes. I have also tried the HPL.dat you provided, and I get a similar error, except it says "Need at least 12 processes".

Do you know what causes these errors? I have a hosts file in the same directory with the names of the two nodes on which I wish to run the tests.

At the command line I am typing:

mpirun -np 8 -machinefile hosts xhpl_em64t

where the hosts file has the names:
machine1
machine2

Each machine is a 3 GHz QX6850 Core 2 Extreme (quad core) with 4 GB RAM.

The .dat file being used for the two nodes is:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
29184         Ns
1            # of NBs
128           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
4            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Hoping someone could please help.

Thanks.

Several points:

  • The number of processor-cores you run this thing on must match the product of P and Q. So if P is 2 and Q is 4, you will need 8 cores; no more, no less.
  • If you provide MPICH with -np 8 and you specify a machine file, it expects at least that number of host entries in the machine file. If a host has multiple processor-cores (as yours do), you enter the hostname once for each core. So if machine1 has 8 processor cores, your machinefile should include 8 lines of "machine1".

Otheus, thank you so much for your responses. I can't wait to test that out when I get to university on Monday. I think my mistake is that I have not put down a name in the machines file for each core.

Hopefully this will work. Sorry for the double post; I will post back to let you know how it goes.

Hi Otheus, I have tried what you stated, and I am still getting the error:

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

I am using 2 nodes in total, including the head node. Each node is a quad-core system, so my machine file has this in it:
machine1
machine1
machine1
machine1
machine2
machine2
machine2
machine2

The command line I am typing in is:

mpirun -np 8 -machinefile hosts xhpl_em64t

P*Q = 8 in my HPL.dat file, where P = 2 and Q = 4.

Yet I am still getting that error. Would you happen to know what else could be wrong?

Thanks.

You are getting a different error.

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

Unfortunately, I don't have access to the source code to say what the error is. The HPL.dat file I posted was a suggested sample based on my reading of internet documentation. That documentation could be out of date, or for a newer version, or perhaps just wrong. Check the README and/or sample HPL.dat files that came with your HPL and go through them line by line vis-à-vis your own.

Hi Otheus, same error as the other day; I just accidentally didn't copy the part of the error about there needing to be a certain number of processes for the test. The thing is, when I change the P and Q values to 1 and 1 respectively within the HPL.dat file, it works and performs the tests. The moment I make them anything other than 1 and 1, such as 2 and 4 to run across 8 processes on 2 nodes, it gives me the original error message.

Been stumped for weeks now =(

Edit: The full error message being received is still the same as the other one:

"HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 8 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <"

Did you make sure you compiled it against your MPI installation, with mpicc?

Hi Otheus, thanks for your reply.

By "compiled with your MPI installation", do you mean simply using OpenMPI in the execution line, such as:

mpirun -np 8 -machinefile machines xhpl_em64t

That is the line I have been using to make use of OpenMPI.

I apologise if this is not what you meant. I am still very new to all this, as well as Linux (a total newbie, I might add), but decided to take on this challenging task at uni for a research topic and to learn about Linux.

Hoping you can shed some light on what you meant by "compiled with your MPI installation", or how I go about doing this.

Thanks Otheus.

How did you compile and install this program?

Oh, the software was already installed by the university. I am using the university cluster to run these tests, with OpenMPI and the Linpack benchmark already installed, and I have been told by the administrator that they are all working properly, and that I just need to learn how to use Linpack and tune the .dat file.

It's just that once the values of P and Q are anything other than 1 and 1 in the .dat file, the errors begin.

Uh-hunh. I wouldn't completely trust that if I were you. Let's take it step-by-step.

  1. Use "type" or "which" or "whence" to find the full path of the Linpack executable:
    type -a hpl
  2. Verify this has been compiled dynamically and not statically:
    file <hpl path from step 1>

    You should see something like "Dynamically linked i386 object". As long as you don't see "statically linked binary" proceed to the next step. Otherwise, talk to your system admin and ask him/her very specifically how he/she compiled it.
  3. Next, run
    ldd <hpl path from step 1>

    You should see something like "libopen-rte.so" in the output. If you do not, ask your sysadmin to point you to the correct hpl, one that is compiled "against" (i.e., with) the OpenMPI run-times.
  4. The libopen-rte.so should point to the full path of a file. If it does not, again, go to your system administrator and ask him/her for the full LD_LIBRARY_PATH that you should have to run against this hpl program.
  5. The path should be available to you by default on all machines in the cluster. If not, add the LD_LIBRARY_PATH setting into the .bashrc file and include your .bashrc from your .bash_profile, as in the sketch after this list (if you're using csh, god help you; if ksh, just change the names to .kshrc and .profile). Now log into the other machine and run the ldd command as above; you should see the line pointing to the full path of the MPI rte library.
  6. Make sure this all works by running:
    mpirun -np 8 printenv LD_LIBRARY_PATH

    You should get 8 instances of the correct LD_LIBRARY_PATH.
  7. Now go back and try getting this to run for exactly 2 processes. (P=2, Q=1, -np 2)
  8. Now modify the machine file so it has two lines in it, one for each hostname, and run it again.
  9. If we're at this point, try again with 8; if it fails, there are some other things to look at and try.
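For step 5, a minimal sketch of the two files, assuming bash and that your OpenMPI lives under /usr/local (adjust the path to your site):

# In ~/.bashrc on every node:
export LD_LIBRARY_PATH=/usr/local/openmpi-1.2.6/lib:$LD_LIBRARY_PATH

# In ~/.bash_profile, so login shells pick up .bashrc:
if [ -f ~/.bashrc ]; then . ~/.bashrc; fi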

Hi Otheus. After typing in which -a hpl, this is what comes up:

/usr/bin/which: no hpl in (/usr/local/openmpi-1.2.6/bin:/usr/local/lam-7.1.4/bin:/usr/local/openmpi-1.2.6/bin:/usr/local/lam-7.1.4/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/NX/bin:/home/jleungsh/bin:/usr/NX/bin)

Which of those directories is the correct one?

I have skipped down to step 6 in the meantime, and I get this:

[jleungsh@hydrus14 em64t]$ mpirun -np 8 printenv LD_LIBRARY_PATH
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:

Thanks

Sorry, try:

which xhpl_em64t

OK Otheus, I have done everything you said, and still the same error.

This is the output from steps 1-2:

in_intel/em64t/xhpl_em64t
/opt/intel/mkl/10.0.1.014/benchmarks/mp_linpack/bin_intel/em64t/xhpl_em64t: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped.

Output from step 3:

libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003d2c400000)
librt.so.1 => /lib64/librt.so.1 (0x0000003d31000000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003d2c000000)
libm.so.6 => /lib64/libm.so.6 (0x0000003d2bc00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003d30000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003d2b800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003d2a600000)

Output from step 6:

/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:
/usr/local/openmpi-1.2.6/lib:

I then opened the HPL.dat, and made P = 2, Q = 1.

I then opened my machines file, and included the name of the node I am logged into, plus another free node:
hydrus1
hydrus2

And then I ran this command from the terminal:

mpirun -np 2 -machinefile machines xhpl_em64t

And I was shown the following error:

HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 2 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 2 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

I am really desperate to get this to work; I have been trying for weeks.

Thanks for your help once again, Otheus; I really appreciate the effort you are going to.

As I suspected, your administrator did not give you the MPI version; rather, he compiled it for the threading model. (The ldd output from step 3 shows libpthread.so but no libopen-rte.so, so the binary was never linked against the MPI run-time.) Show your administrator the output from step 3 and kindly ask him/her to recompile it for you against OpenMPI.

After that, everything should work fine.
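Once you have the recompiled binary, you can check the linkage yourself before running anything (the same ldd test as step 3; the grep is just a convenience):

# This should now list MPI run-time libraries such as libopen-rte.so:
ldd $(which xhpl_em64t) | grep -iE 'mpi|open-rte'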

I was able to successfully run xhpl using your HPL.dat file on 6 processors across 2 nodes.

The command used was:

mpirun -mca btl tcp,self -np 6 -hostfile machines ./xhpl

(Note the "-mca btl tcp,self" is to avoid warnings and errors w.r.t. Infiniband, which we don't have here. If this is your problem, it's still your admin's fault!)

The machines file was as follows:

sandbox
sandbox
sandbox2
sandbox2
sandbox2
sandbox2

The HPL.dat file was identical to the one you posted here, with the exception of Q = 3 (since I have one 4-CPU host and one 2-CPU host). Also, N (the problem size) was changed to 340 for speed reasons, but prior tests indicated it worked well up to at least 2500 on my hosts. Large values of N may cause the system to run out of memory, but that would produce a different error.

To compile the HPL program, I used the command "make arch=TIS" (my firm is called Tiscover) and the following Makefile (Make.TIS). For your admin's sake, I have marked the important changes with comments:

SHELL        = /bin/sh
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
ARCH         = TIS
TOPdir       = $(HOME)/downloads/hpl-2.0
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
HPLlib       = $(LIBdir)/libhpl.a
# MPinc/MPlib are left empty because the mpicc/mpif77 wrappers below
# already supply the MPI headers and libraries:
MPinc        =
MPlib        =
LAdir        =
LAinc        =
# Point LAlib at your BLAS (or ATLAS) library:
LAlib        = /usr/lib/libblas.a
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS     =
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
# Use the MPI compiler wrapper so the binary is linked against the MPI run-time:
CC           = /usr/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
# Link with the MPI Fortran wrapper:
LINKER       = /usr/bin/mpif77
LINKFLAGS    = $(CCFLAGS)
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
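With this Makefile, "make arch=TIS" puts the binary into $(TOPdir)/bin/TIS, so the run itself looks like this (paths follow from the Make.TIS above; the machines file is the one shown earlier):

cd $HOME/downloads/hpl-2.0/bin/TIS
mpirun -mca btl tcp,self -np 6 -hostfile machines ./xhpl | tee hpout.dat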