Difference between revisions of "User:Ljdursi"

From oldwiki.scinet.utoronto.ca
 
Jonathan Dursi
 
HPC Computational Science Specialist
 
ljdursi@scinet.utoronto.ca
 
http://www.cita.utoronto.ca/~ljdursi/
 
 
 
__TOC__
 
 
 
==The Basics==
 
===Who do I contact for support?===
 
 
Who do I contact if I have problems or questions about how to use the SciNet systems?
 
 
'''Answer:'''
 
 
E-mail [mailto:support@scinet.utoronto.ca <support@scinet.utoronto.ca>] 
 
 
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!
 
 
===What does ''code scaling'' mean?===
 
 
'''Answer:'''
 
 
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]]
 
 
===What do you mean by ''throughput''?===
 
 
'''Answer:'''
 
 
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].
 
 
Here is a simple example:
 
 
Suppose you need to do 10 computations.  Say each of these runs for
 
1 day on 8 cores, but they take "only" 18 hours on 16 cores.  What is the
 
fastest way to get all 10 computations done - as 8-core jobs or as
 
16-core jobs?  Let us assume you have 2 nodes at your disposal.
 
The answer, after some simple arithmetic, is that running your 10
 
jobs as 8-core jobs will take 5 days, whereas if you ran them
 
as 16-core jobs it would take 7.5 days.  Draw your own conclusions...
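The arithmetic can be checked with a quick shell calculation; this is just a sketch using the job counts and runtimes assumed above (10 jobs, 2 nodes of 8 cores each):

<source lang="bash">
#!/bin/bash
# Throughput check for the example above: 10 jobs, 16 cores total.
jobs=10
# 8-core jobs: two fit at once, each takes 24 hours
concurrent=2
rounds=$(( (jobs + concurrent - 1) / concurrent ))   # 5 rounds
echo "8-core plan:  $(( rounds * 24 )) hours"        # 120 hours = 5 days
# 16-core jobs: one at a time, each takes 18 hours
echo "16-core plan: $(( jobs * 18 )) hours"          # 180 hours = 7.5 days
</source>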
 
 
===I changed my .bashrc/.bash_profile and now nothing works===
 
 
The default startup scripts provided by SciNet are as follows.  Certain things, like sourcing <tt>/etc/profile</tt>
and <tt>/etc/bashrc</tt>, are ''required'' for various SciNet routines to work!
 
 
'''.bash_profile'''
 
<source lang="bash">
 
if [ -f /etc/profile ]; then
 
      . /etc/profile
 
fi
 
 
# commands which work for both GPC and TCS can go here
 
 
alias passwd='echo "Please use the SciNet portal to change  password: https://portal.scinet.utoronto.ca/change_password"'
 
 
HOST=$(uname)
 
 
if [ "${HOST}" == "AIX" ]
 
then
 
        # do things for the TCS machine
 
        alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'
 
        alias llstat='/xcat/tools/tcs-scripts/LL/jobSummary.sh'
 
 
        if [ "${TERM}" = "xterm-color" ]; then
 
                export TERM=xterm
 
        fi
 
 
        # user environment for login shells goes here
 
        # replace colon with your own commands
 
        :
 
else
 
        # do things for the GPC machine
 
        # user environment for login shells goes here
 
        # replace colon with your own commands
 
        :
 
 
fi
 
 
PS1="\h-\$ "
 
 
 
if [ -f ~/.bashrc ]; then
 
      . ~/.bashrc
 
fi
 
</source>
 
 
'''.bashrc'''
 
<source lang="bash">
 
if [ -f /etc/bashrc ]; then
 
      . /etc/bashrc
 
fi
 
 
# commands which work for both GPC and TCS can go here
 
 
export BASH_ENV=~/.bashrc
 
 
HOST=$(uname)
 
 
if [ "${HOST}" == "AIX" ]; then
 
        # do things for the TCS machine
 
        # user environment for all shells goes here
 
        # replace colon with your own commands
 
        :
 
else
 
        # do things for the GPC machine
 
 
        module load intel openmpi
 
 
        # user environment for all shells goes here
 
        # replace colon with your own commands
 
        :
 
fi
 
</source>
 
 
===How can I run Matlab / IDL / my favourite commercial software on SciNet?===
 
 
'''Answer:'''
 
 
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.  The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).
 
 
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.
 
 
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.  It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.
 
 
 
==Compiling your Code==
 
 
===Autoparallelization does not work!===
 
 
I compiled my code with the <tt>-qsmp=omp,auto</tt> option, and then I specified that it should be run with 64 threads with

<pre>
export OMP_NUM_THREADS=64
</pre>

However, when I check the load using <tt>llq1 -n</tt>, it shows a load on the node of 1.37.  Why?
 
 
'''Answer:'''
 
 
Autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the <tt>-qreport</tt> option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.
 
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!  For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca <support@scinet.utoronto.ca>]  to set up an appointment with one
 
of our technical analysts.
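For instance, a compile-and-run cycle on the TCS might look like the following sketch; the source file name, output name, and optimization level are placeholders, and <tt>xlf90_r</tt> stands in for whichever IBM XL thread-safe compiler you use:

<source lang="bash">
# Compile with autoparallelization and request a report of what was
# (and was not) parallelized; the listing lands in mycode.lst
xlf90_r -qsmp=omp,auto -qreport -O3 -o mycode mycode.f90

# scan the listing for the compiler's parallelization decisions
grep -i "parallel" mycode.lst

# run with the desired number of threads
export OMP_NUM_THREADS=64
./mycode
</source>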
 
 
 
==Testing your Code==
 
 
=== Can I run something for a short time on the development nodes? ===
 
 
I am in the process of playing around with the MPI calls in my code to get it to work. I do a lot of tests, and each of them takes only a couple of seconds.  Can I do this on the development nodes?
 
 
'''Answer:'''
 
 
Yes, as long as it's very brief (a few minutes).  Other people use the development nodes
for their work, and you don't want to bog the nodes down for them; testing a real
code can chew up a lot more resources than compiling, etc.    The procedure differs
depending on what machine you're using.
 
 
==== TCS ====
 
 
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for
 
development use.  But even for this test run on one node, you'll need a host file --
 
a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02)
 
that the job will run on.  Create a file called `hostfile' containing the following:
 
 
tcs-f11n06
 
tcs-f11n06
 
tcs-f11n06
 
tcs-f11n06
 
 
for a 4-task run.  When you invoke "poe" or "mpirun", there are runtime
 
arguments that you specify pointing to this file.  You can also specify it
 
in an environment variable MP_HOSTFILE, so, if your file is in your
 
/scratch/USER/hostfile, then you would do
 
 
<pre>
 
export MP_HOSTFILE=/scratch/USER/hostfile
 
</pre>
 
 
in your shell.  You will also need to create a <tt>.rhosts</tt> file in your
 
home directory, again listing <tt>tcs-f11n06</tt>, so that <tt>poe</tt>
 
can start jobs.  After that you can simply run your program.  You can use
 
mpiexec:
 
 
<pre>
 
mpiexec -n 4 my_test_program
 
</pre>
 
 
adding <tt> -hostfile /path/to/my/hostfile</tt> if you did not set the environment
 
variable above.  Alternatively, you can run it with the poe command (do a "man poe" for details), or even by
 
just directly running it.  In this case the number of MPI processes will by default
 
be the number of entries in your hostfile.
 
 
==== GPC ====
 
 
On the GPC one can run short test jobs on the GPC [[GPC_Quickstart#Compile.2FDevel_Nodes | development nodes ]] <tt>gpc01</tt>..<tt>gpc04</tt>;
if they are single-node jobs (which they should be) they don't need a hostfile.  Even better, though, is to request an [[ Moab#Interactive | interactive ]] job in the regular batch queue, or to use the short high-availability [[ Moab#debug | debug ]] queue that is reserved for this purpose.
 
 
=== How do I run a longer (but still shorter than an hour) test job quickly ? ===
 
 
'''Answer'''
 
 
On the GPC there is a high turnover short queue called [[ Moab#debug | debug ]] that is designed for
 
this purpose.  You can use it by adding
 
<pre>
 
#PBS -q debug
 
</pre>
 
to your submission script.
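A complete minimal submission script for a short test in the debug queue might look like this sketch; the job name, walltime, and executable are placeholders:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for a short test job in the debug queue
#PBS -l nodes=1:ppn=8,walltime=0:30:00
#PBS -q debug
#PBS -N short_test

# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
mpirun -np 8 ./my_test_program
</source>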
 
 
==Running your jobs==
 
 
===OpenMP on the TCS===
 
 
How do I run an OpenMP job on the TCS?
 
 
'''Answer:'''
 
 
Please look at the [[TCS_Quickstart#Submission_Script_for_an_OpenMP_Job | TCS Quickstart ]] page.
 
 
 
===How do I run serial jobs on GPC?===
 
 
'''Answer''':
 
 
It should be said first that SciNet is a parallel computing resource,
and our priority will always be parallel jobs.  Having said that, if
you can make efficient use of the resources using serial jobs and get
good science done, that's good too, and we're happy to help you.
 
 
The GPC nodes each have 8 processing cores, and making efficient use of these
 
nodes means using all eight cores.  As a result, we'd like to have the
 
users take up whole nodes (e.g., run jobs in multiples of 8) at a time.  The most
 
straightforward way to do this is to bunch the jobs in groups of 8 that
 
will take roughly the same amount of time, and create a job that looks a
 
bit like this
 
 
<source lang="bash">
 
#!/bin/bash
 
# MOAB/Torque submission script for multiple serial jobs on
 
# SciNet GPC
 
#
 
#PBS -l nodes=1:ppn=8,walltime=1:00:00
 
#PBS -N serialx8
 
 
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
 
cd $PBS_O_WORKDIR
 
 
# EXECUTION COMMAND; ampersand off 8 jobs and wait
 
(cd jobdir1; ./dojob1) &
 
(cd jobdir2; ./dojob2) &
 
(cd jobdir3; ./dojob3) &
 
(cd jobdir4; ./dojob4) &
 
(cd jobdir5; ./dojob5) &
 
(cd jobdir6; ./dojob6) &
 
(cd jobdir7; ./dojob7) &
 
(cd jobdir8; ./dojob8) &
 
wait
 
</source>
 
 
There are three important things to take note of here.  First, the <tt>wait</tt>
 
command at the end is crucial; without it the job will terminate
 
immediately, killing the 8 programs you just started.
 
 
Second is that it is important to group the programs by how long they
 
will take.  If (say) <tt>dojob8</tt> takes 2 hours and the rest only take 1,
 
then for one hour 7 of the 8 cores on the GPC node are wasted; they are
 
sitting idle but are unavailable for other users, and the utilization of
 
this node over the whole run is only 56%.  This is the sort of thing
 
we'll notice, and users who don't make efficient use of the machine will
 
have their ability to use SciNet resources reduced.
 
 
Third is that it is necessary to have a good idea of how much memory the
 
jobs will require.  The GPC compute nodes have about 14.5GB in total available
 
to user jobs running on the 8 cores (a bit less, say 13GB, on the devel nodes <tt>gpc01..04</tt>). 
 
So the jobs also have to be  bunched in ways that will fit into 14.5GB.  If that's not possible --
 
each individual job requires significantly in excess of ~1.8GB -- then
 
it's possible in principle to just run fewer jobs so that they do fit;
 
but then, again there is an under-utilization problem.  In that case,
 
the jobs are likely candidates for parallelization, and you can contact
 
us at [mailto:support@scinet.utoronto.ca <support@scinet.utoronto.ca>] and arrange a meeting with one of the
 
technical analysts to help you do just that.
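If the eight serial jobs are uniform enough, the launch lines in the script above can be collapsed into a loop; this sketch assumes directories <tt>jobdir1</tt> through <tt>jobdir8</tt>, each containing an executable named <tt>dojob</tt>:

<source lang="bash">
# launch one serial job per core in the background, then wait for all eight
for i in $(seq 1 8); do
    (cd jobdir$i && ./dojob) &
done
wait
</source>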
 
 
===Why can't I request only a single cpu for my job on GPC?===
 
 
'''Answer''':
 
 
On GPC, resources are allocated by the node - that is, in chunks of 8 processor cores.  If you want to run jobs that each require only one processor, you need to bundle them into groups of 8, so as not to waste the other 7 cores for up to 48 hours. See [[FAQ#How_do_I_run_serial_jobs_on_GPC.3F|How do I run serial jobs on GPC?]]
 
 
===How do I run serial jobs on TCS?===
 
 
'''Answer''': You don't.
 
 
 
===But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?===
 
 
'''Answer''':
 
 
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high-energy physics project. That they are reported as single-CPU jobs is an artifact of the Moab scheduler. They are in fact being automatically bundled into groups of 8, but have to run individually to be compatible with the project's international grid-based systems.
 
 
===How can I automatically resubmit a job?===
 
 
Commonly you may have a job that you know will take longer to run than the
queue permits.  As long as your program has checkpoint or
 
restart capability, you can have one job automatically submit the next. In
 
the following example it is assumed that the program finishes before
 
the 48 hour limit and then resubmits itself by logging into one
 
of the development nodes.
 
 
<source lang="bash">
 
#!/bin/bash
 
# MOAB/Torque example submission script for auto resubmission
 
# SciNet GPC
 
#
 
#PBS -l nodes=1:ppn=8,walltime=48:00:00
 
#PBS -N my_job
 
 
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
 
cd $PBS_O_WORKDIR
 
 
# YOUR CODE HERE
 
./run_my_code
 
 
# RESUBMIT 10 TIMES HERE
 
num=${NUM:-0}   # NUM is passed in via qsub -v; default to 0 on the first run
 
if [ $num -lt 10 ]; then
 
      num=$(($num+1))
 
      ssh gpc01 "cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num";
 
fi
 
</source>
 
 
Submit the first job in the chain with an initial value for NUM:

<pre>
qsub script_name.sh -v NUM=0
</pre>
 
 
You can alternatively use [[ Moab#Job_Dependencies | Job dependencies ]] through the queuing system which will not start one job until another job has completed.
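A sketch of the dependency approach, using Torque's <tt>-W depend</tt> syntax; the script names are placeholders:

<source lang="bash">
# capture the job id of the first submission, then make the second job
# wait until the first has exited successfully
first=$(qsub part1.sh)
qsub -W depend=afterok:$first part2.sh
</source>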
 
 
===How can I pass in arguments to my submission script?===
 
 
If you wish to make your scripts more generic you can use qsub's ability
 
to pass in environment variables to pass in arguments to your script.
 
The following example shows a case where an input and an output
 
file are passed in on the qsub line. Multiple variables can be
 
passed in, comma-delimited, with qsub's "-v" option.
 
 
<source lang="bash">
 
#!/bin/bash
 
# MOAB/Torque example of passing in arguments
 
# SciNet GPC
 
#
 
#PBS -l nodes=1:ppn=8,walltime=48:00:00
 
#PBS -N my_job
 
 
# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
 
cd $PBS_O_WORKDIR
 
 
# YOUR CODE HERE
 
./run_my_code -f $INFILE -o $OUTFILE
 
</source>
 
 
<pre>
 
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt
 
</pre>
 
 
=== How can I run a job longer than 48 hours? ===
 
 
'''Answer:'''
 
 
Currently the maximum time you can run a single job on either the GPC or TCS is 48 hours. If your job will take longer, you will have to submit it in multiple parts, restarting from a checkpoint each time.  You can
 
do this manually or using [[FAQ#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]
 
 
 
===How do we manage job priorities within our research group?===
 
 
'''Answer:'''
 
 
Obviously, managing shared resources within a large group - whether it
 
is conference funding or CPU time - takes some doing. 
 
 
It's important to note that the fairshare periods are intentionally kept
 
quite short - just two weeks long.  (The exact numbers are subject to
change as the year goes on and we better understand usage patterns, but
 
they're unlikely to change radically).  So, for example, let us say that in your resource
 
allocation you have about 10% of the machine.  Then for someone to use
 
up the whole two week amount of time in 2 days, they'd have to use 70%
 
of the machine in those two days - which is unlikely to happen by
 
accident.  If that does happen, 
 
those using the same allocation as the person who used 70% of the
 
machine over the two days will suffer by having much lower priority for
 
their jobs, but only for the next 12 days - and even then, if there are
 
idle cpus they'll still be able to compute.
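The arithmetic behind that 70% figure is simple; here is a sketch using the allocation share and fairshare window assumed above:

<source lang="bash">
# a 10% share of the machine over a 14-day fairshare window amounts to
# 0.10 * 14 = 1.4 machine-days of computing; burning it all in 2 days
# requires 1.4 / 2 = 70% of the machine during those 2 days
share=10    # percent of the machine allocated
window=14   # fairshare window in days
burst=2     # days over which the whole allotment is spent
echo "$(( share * window / burst ))% of the machine"   # 70% of the machine
</source>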
 
 
There will be online tools for seeing how the allocation is being used,
 
and those people who are in charge in your group will be able to use
 
that information to manage the users, telling them to dial it down or
 
up.  We know that managing a large research group is hard, and we want
 
to make sure we provide you the information you need to do your job
 
effectively.
 
 
One way for users within a group to manage their priorities within the group
 
is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is
 
described in more detail on the [[Moab | Scheduling System]] page.
 
 
 
=== How do I charge jobs to my NRAC/LRAC allocation? ===
 
 
'''Answer:'''
 
 
Please see the [[Moab#Accounting|accounting section of Moab page]].
 
 
 
==Monitoring jobs in the queue==
 
 
 
===Why hasn't my job started?===
 
 
'''Answer:'''
 
 
Use the moab command
 
 
<pre>
 
checkjob -v jobid
 
</pre>
 
 
and the last couple of lines should explain why a job hasn't started. 
 
 
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information.
 
 
===How do I figure out when my job will run?===
 
 
'''Answer:'''
 
 
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]
 
 
 
===How can I monitor my running jobs on TCS?===
 
 
How can I monitor the load of TCS jobs?
 
 
'''Answer:'''
 
 
You can get more information with the command

<pre>
/xcat/tools/tcs-scripts/LL/jobState.sh
</pre>

which I alias as:

<pre>
alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'
</pre>

If you run <tt>llq1 -n</tt> you will see a listing of jobs together with a lot of information, including the load.
 
 
 
 
===Another transport will be used instead===
 
 
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.  Is this a problem?
 
<pre>
 
--------------------------------------------------------------------------
 
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module
 
was unable to find any relevant network interfaces:
 
 
Module: OpenFabrics (openib)
 
  Host: gpc-f101n005
 
 
Another transport will be used instead, although this may result in
 
lower performance.
 
--------------------------------------------------------------------------
 
</pre>
 
 
'''Answer:'''
 
 
Everything's fine.  The two MPI libraries SciNet provides work for both the InfiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available.  In this case, you ran on normal gigabit GPC nodes with no InfiniBand; but the MPI libraries have no way of knowing this, and try the InfiniBand first anyway.  This is just a harmless `failover' message: the library tried to use InfiniBand, which doesn't exist on this node, then fell back on Gigabit Ethernet (`another transport').
 
 
==Errors in running jobs==
 
 
===On GPC, `Job cannot be executed'===
 
 
I get error messages like this trying to run on GPC:
 
 
<pre>
 
PBS Job Id: 30414.gpc-sched
 
Job Name:  namd
 
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0
 
Aborted by PBS Server
 
Job cannot be executed
 
See Administrator for help
 
 
 
 
PBS Job Id: 30414.gpc-sched
 
Job Name:  namd
 
Exec host:  gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0
 
An error has occurred processing your job, see below.
 
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched
 
 
Unable to copy file 30414.gpc-sched.OU to cmadill@gpc-f101n084.scinet.local:/scratch/cmadill/projects/sim-performance-test/runtime/l/namd/8/namd.o30414
 
*** error from copy
 
30414.gpc-sched.OU: No such file or directory
 
*** end error output
 
</pre>
 
 
Try doing the following:
 
<pre>
 
mkdir /scratch/${USER}/.pbs_spool
 
ln -s /scratch/${USER}/.pbs_spool ~/.pbs_spool
 
</pre>
 
 
This is how all new accounts are set up on SciNet.
 
 
<tt>/home</tt> on GPC is mounted as a read-only file system for compute jobs. 
PBS by default tries to spool its output files to <tt>${HOME}/.pbs_spool</tt>,
which fails as it tries to write to a read-only file 
system.  New accounts at SciNet get around this by having <tt>${HOME}/.pbs_spool</tt> 
point to somewhere appropriate on <tt>/scratch</tt>, but if you've deleted that link
or directory, or have an old account, you will see errors like the above.
 
 
 
 
=== My GPC job died, telling me `Copy Stageout Files Failed' ===
 
 
'''Answer:'''
 
 
When a job runs on GPC, the script's standard output and error are redirected to <tt>~/.pbs_spool/${PBS_JOBID}.gpc-sched.OU</tt> and <tt>~/.pbs_spool/${PBS_JOBID}.gpc-sched.ER</tt> respectively. <tt>~/.pbs_spool</tt> is a symlink to <tt>/scratch/${USER}/.pbs_spool</tt>, as the compute nodes can't write to the <tt>/home</tt> directory.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default <tt>${PBS_JOBNAME}.o${PBS_JOBID}</tt> and <tt>${PBS_JOBNAME}.e${PBS_JOBID}</tt>.  (You can set those filenames to something clearer with the -e and -o options in your PBS script.)
 
 
When you get errors like this:
 
<pre>
 
An error has occurred processing your job, see below.
 
request to copy stageout files failed on node
 
</pre>
 
it means that the copying-back process has failed in some way.  There could be a few reasons for this: it could have been a random filesystem error, or your job may have failed spectacularly enough to short-circuit the normal job-termination process, so that those files never got copied.
 
 
Generally if you look in <tt>~/.pbs_spool</tt> for the files corresponding to  that job ID, you will find them, and that may offer some clues as to what happened.
 
 
==Data on SciNet disks==
 
 
===How do I find out my disk usage?===
 
 
'''Answer:'''
 
 
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:
 
 
The
 
 
<pre>
 
mmlsquota
 
</pre>
 
 
command lists your current usage on the filesystems that enforce quotas (currently <tt>/home</tt>).
 
 
In order to get the usage on the /scratch filesystem, run the command
 
 
<pre>
 
/scinet/gpc/bin/scratchUsage
 
</pre>
 
 
In addition, the <tt>diskUsage</tt> command, available on the login nodes and the GPC devel nodes, reports how much disk space is being used by yourself and your group on the home, scratch, and project file systems, and how much remains available.  This information is updated hourly.  More information about these filesystems is available at the [[Storage_Quickstart | Storage Quick-start]].
 
 
 
===How do I transfer data to/from SciNet?===
 
 
'''Answer:'''
 
 
All incoming connections to SciNet go through relatively low-speed connections to the <tt>login.scinet</tt> gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Transfer | Data Transfer]].
 
 
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===
 
 
'''Answer:'''
 
 
Please see [[Storage_Quickstart#Scratch_Disk_Purging_Policy | Storage At SciNet]]
 
 
==Keep 'em Coming!==
 
 
===Next question, please===
 
 
Send your question to [mailto:support@scinet.utoronto.ca <support@scinet.utoronto.ca>];  we'll answer it asap!
 

Revision as of 09:21, 9 March 2010
