Difference between revisions of "FAQ"
Revision as of 18:15, 5 May 2009
MPI development and interactive testing
I am in the process of playing around with the MPI calls in my code to get it to work. I do a lot of tests, and each of them takes only a couple of seconds. Sometimes (like now, as I'm sending this email), all the machines are full and I'm put in the queue. Since I just need a couple of SECONDS, is there any way I can test it on the login nodes? I can't do it using the llsubmit command, and if I use mpiexec then I need a host file. Can I use a host file to run my 2-second test jobs on the login nodes? If yes, can you send me an example host file please?
Answer
You can run small MPI jobs on the tcs-f11n06 node, which is meant for development use. Please don't run them on the main login node tcs-f11n05. Now, as for the hostfile, it simply looks like:
tcs-f11n06
tcs-f11n06
tcs-f11n06
tcs-f11n06
for a 4-task run. When you invoke "poe" or "mpirun", you can pass a runtime argument pointing to this file. You can also specify it in the environment variable MP_HOSTFILE, so if your file is /scratch/USER/hostfile, you would do
export MP_HOSTFILE=/scratch/USER/hostfile
in your shell. After that you can simply run your program. You can run it with the poe command (do a "man poe" for details), or even by just directly running it. The number of MPI processes will by default be the number of entries in your hostfile.
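The steps above can be sketched as a small shell snippet. This is only an illustration: the helper name make_hostfile, the HOSTFILE_PATH variable, and the program name ./my_mpi_test are hypothetical, and the default path ./hostfile is used here just so the snippet runs anywhere (on the TCS you would point it at something like /scratch/USER/hostfile).

```shell
# Hypothetical helper: write one line per MPI task, all naming the same host.
make_hostfile() {
    host=$1      # node to run on, e.g. the development node
    ntasks=$2    # number of MPI tasks (= number of hostfile entries)
    outfile=$3   # where to write the hostfile
    : > "$outfile"                      # create or truncate the file
    i=0
    while [ "$i" -lt "$ntasks" ]; do
        printf '%s\n' "$host" >> "$outfile"
        i=$((i + 1))
    done
}

# Assumption: HOSTFILE_PATH is our own variable, not something poe reads.
HOSTFILE_PATH=${HOSTFILE_PATH:-./hostfile}

# Four entries for the development node gives a 4-task run.
make_hostfile tcs-f11n06 4 "$HOSTFILE_PATH"

# MP_HOSTFILE is the variable the runtime looks at.
export MP_HOSTFILE="$HOSTFILE_PATH"

# After this, start the program as described above, for example:
#   ./my_mpi_test        (hypothetical program name)
```

Changing the second argument of make_hostfile changes the number of entries, and with it the default number of MPI processes.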
How can I monitor my jobs?
By the way, how can I monitor the load, if not with llq?
Answer
You can get more information with the command
/xcat/tools/tcs-scripts/LL/jobState.sh
which I alias as:
alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'
If you run "llq1 -n" you will see a listing of jobs together with a lot of information, including the load.
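If you want the same shortcut inside a shell script, keep in mind that bash does not expand aliases in non-interactive shells by default. A shell function is a common substitute; the following is a minimal sketch that reuses the llq1 name and simply forwards its arguments to the script:

```shell
# Function equivalent of the llq1 alias; also usable from scripts.
llq1() {
    /xcat/tools/tcs-scripts/LL/jobState.sh "$@"
}

# Usage on the cluster, as described above:
#   llq1 -n
```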
Next question, please
We'll answer it ASAP!