FAQ

From oldwiki.scinet.utoronto.ca
Revision as of 18:15, 5 May 2009 by Dgruner (talk | contribs)

MPI development and interactive testing

I am in the process of playing around with the MPI calls in my code to get it to work. I run a lot of tests, and each of them takes only a couple of seconds. Sometimes (like now, as I'm sending this email) all the machines are full and I'm put in the queue. Since I just need a couple of SECONDS, is there any way I can test on the login nodes? I can't do it using the llsubmit command, and if I use mpiexec then I need a host file. Can I use a host file to run my 2-second test jobs on the login nodes? If yes, can you send me an example host file, please?

Answer

You can run small MPI jobs on the tcs-f11n06 node, which is meant for development use. Please don't run them on the main login node tcs-f11n05. Now, as for the hostfile, it simply looks like:

tcs-f11n06
tcs-f11n06
tcs-f11n06
tcs-f11n06

for a 4-task run. When you invoke "poe" or "mpirun", you can point to this file with a runtime argument. You can also specify it in the environment variable MP_HOSTFILE, so, if your file is /scratch/USER/hostfile, you would do

export MP_HOSTFILE=/scratch/USER/hostfile

in your shell. After that you can run your program, either with the poe command (see "man poe" for details) or simply by executing it directly. By default, the number of MPI processes equals the number of entries in your hostfile.
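
As a minimal sketch of the steps above, the hostfile can be generated rather than typed by hand. The NTASKS and HOSTFILE variable names and the ./hostfile default are illustrative assumptions, not part of the TCS setup; only the node name tcs-f11n06 and the MP_HOSTFILE variable come from the answer itself.

```shell
# Sketch: build a hostfile with one line per MPI task, all on the
# development node. NTASKS and HOSTFILE are hypothetical names chosen
# for this example; override them in your environment as needed.
NTASKS=${NTASKS:-4}
HOSTFILE=${HOSTFILE:-./hostfile}

: > "$HOSTFILE"                       # create or truncate the file
i=0
while [ "$i" -lt "$NTASKS" ]; do
    echo "tcs-f11n06" >> "$HOSTFILE"  # one entry per task
    i=$((i + 1))
done

# Point the MPI runtime at the file, as described above.
export MP_HOSTFILE="$HOSTFILE"
echo "wrote $(wc -l < "$HOSTFILE") entries to $MP_HOSTFILE"
```

Paste this into your shell (or source it) so that the export persists. Starting your program afterwards, for example with "poe ./my_mpi_test" where my_mpi_test is a placeholder name, should then launch one task per line in the file.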


How can I monitor my jobs?

By the way, how can I monitor the load? Not with llq?

Answer

You can get more information with the command

/xcat/tools/tcs-scripts/LL/jobState.sh

which I alias as:

alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'

If you run "llq1 -n" you will see a listing of jobs together with a lot of information, including the load.
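
In bash, aliases are not expanded in non-interactive shells by default, so if you want the same shortcut inside a script, a shell function is one option. The sketch below is a variant of the alias above, not something the site provides: the JOBSTATE variable name and the error message are made up for illustration, and only the script path comes from the answer.

```shell
# Sketch: a function wrapper around jobState.sh that also works in scripts.
# JOBSTATE is a hypothetical variable name used only in this example.
JOBSTATE=/xcat/tools/tcs-scripts/LL/jobState.sh

llq1() {
    if [ -x "$JOBSTATE" ]; then
        "$JOBSTATE" "$@"              # pass all arguments through, e.g. -n
    else
        echo "jobState.sh not found at $JOBSTATE" >&2
        return 1
    fi
}
```

With this defined, "llq1 -n" behaves like the aliased command on a node where the script exists, and reports an error instead of failing silently elsewhere.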


Next question, please

We'll answer it asap!