FAQ
The Basics
Who do I contact for support?
Who do I contact if I have problems or questions about how to use the SciNet systems?
Answer:
E-mail <support@scinet.utoronto.ca>
In your email, please include the following information:
- your username on SciNet
- the cluster that your question pertains to (GPC or TCS; SciNet is not a cluster!)
- any relevant error messages
- the commands you typed before the errors occurred
- the path to your code (if applicable)
- the location of the job scripts (if applicable)
- the directory from which it was submitted (if applicable)
- a description of what it is supposed to do (if applicable)
- if your problem is about connecting to SciNet, the type of computer you are connecting from.
Note that your password should never, never, never be sent to us, even if your question is about your account.
Try to avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team!
What does code scaling mean?
Answer:
Please see A Performance Primer
What do you mean by throughput?
Answer:
Please see A Performance Primer.
Here is a simple example:
Suppose you need to do 10 computations. Say each of these runs for 1 day on 8 cores, but they take "only" 18 hours on 16 cores. What is the fastest way to get all 10 computations done - as 8-core jobs or as 16-core jobs? Let us assume you have 2 nodes (16 cores in total) at your disposal. The answer, after some simple arithmetic, is that running your 10 jobs as 8-core jobs will take 5 days (two jobs at a time, 1 day each), whereas running them as 16-core jobs would take 7.5 days (one job at a time, 18 hours each). Draw your own conclusions...
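The arithmetic can be sketched in a few lines of bash. The function name wall_hours and the variable names are only illustrative, and the sketch assumes 2 nodes of 8 cores each:

```bash
#!/bin/bash
# Throughput comparison for the example above.
# Assumption: 2 nodes with 8 cores each, i.e. 16 cores in total.
total_cores=16
num_jobs=10

# wall_hours CORES_PER_JOB HOURS_PER_JOB
# Prints the total wall-clock hours needed to finish all jobs.
wall_hours() {
    local cores_per_job=$1 hours_per_job=$2
    local concurrent=$(( total_cores / cores_per_job ))
    # Round up: a partially filled last batch still takes a full run.
    local batches=$(( (num_jobs + concurrent - 1) / concurrent ))
    echo $(( batches * hours_per_job ))
}

hours_8core=$(wall_hours 8 24)    # 2 jobs at a time, 24 h each
hours_16core=$(wall_hours 16 18)  # 1 job at a time, 18 h each

echo "8-core jobs:  ${hours_8core} hours"
echo "16-core jobs: ${hours_16core} hours"
```

This gives 120 hours (5 days) for the 8-core jobs and 180 hours (7.5 days) for the 16-core jobs.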
I changed my .bashrc/.bash_profile and now nothing works
The default startup scripts provided by SciNet, and guidelines for them, can be found here. Certain things, like sourcing /etc/profile and /etc/bashrc, are required for various SciNet routines to work!
If the situation is so bad that you cannot even log in, please send email to <support@scinet.utoronto.ca>.
How can I run Matlab / IDL / my favourite commercial software on SciNet?
Answer:
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package. The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries (Intel's on GPC, and IBM's on TCS).
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at support@scinet and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention. It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.
Do you have a recommended ssh program that will allow scinet access from Windows machines?
Answer:
Sure. Two in particular are
- PuTTY - this is a terminal for Windows that connects via ssh
- Cygwin - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop. Make sure you include the openssh and X window system packages in the installation for full functionality.
Compiling your Code
How can I get g77 to work?
The Fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77. This has been a conscious decision. g77 (and the associated library libg2c) were completely replaced five years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over four years. If we installed g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.
If your code for some reason specifically requires five-plus-year-old libraries, availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues.
A note on porting to gfortran or ifort:
While gfortran and ifort are largely compatible with g77, one important difference is that by default, gfortran does not preserve local variables between function calls, while g77 does. Preserved local variables are, for instance, often used in implementations of quasi-random number generators. Proper Fortran requires such variables to be declared with SAVE, but not all old code does this. Luckily, you can change gfortran's default behavior with the flag -fno-automatic. For ifort, the corresponding flag is -noautomatic.
Where is libg2c.so?
libg2c.so is part of the g77 compiler, for which we dropped support. See "How can I get g77 to work?" above for our reasons.
Autoparallelization does not work!
I compiled my code with the -qsmp=omp,auto option, and then I specified that it should be run with 64 threads - with
export OMP_NUM_THREADS=64
However, when I check the load using llq1 -n, it shows a load on the node of 1.37. Why?
Answer:
Using the autoparallelization will only get you so far. In fact, it usually does not do too much. What is helpful is to run the compiler with the -qreport option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this. Then you can go back to your code and carefully try to address each of the issues brought up by the compiler. We emphasize that this is just a rough first guide, and that the compilers are still not magical! For more sophisticated approaches to parallelizing your code, email us at <support@scinet.utoronto.ca> to set up an appointment with one of our technical analysts.
How do I link against the Intel Math Kernel Library?
If you need to link in the MKL libraries, you are well advised to use the Intel(R) Math Kernel Library Link Line Advisor: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/ for help in devising the list of libraries to link with your code.
"relocation truncated to fit: R_X86_64_PC32": Huh?
What does this mean, and why can't I compile this code?
Answer:
Welcome to the joys of the x86 architecture! You're probably having trouble building arrays larger than 2GB, individually or together. Generally, you have to try to use the medium or large x86 `memory model'. For the intel compilers, this is specified with the compile options
-mcmodel=medium -shared-intel
"feupdateenv is not implemented and will always fail"
How do I get rid of this and what does it mean?
Answer:
First note that, as ominous as it sounds, this is really just a warning, and has to do with the Intel math library. You can often ignore it, or take the safe road and get rid of it by linking with the Intel math functions library:
-limf
See also #How do I link against the Intel Math Kernel Library?
Testing your Code
Can I run something for a short time on the development nodes?
I am in the process of playing around with the mpi calls in my code to get it to work. I do a lot of tests and each of them takes a couple of seconds only. Can I do this on the development nodes?
Answer:
Yes, as long as it's very brief (a few minutes). People use the development nodes for their work, and you don't want to bog the nodes down for them; testing a real code can chew up a lot more resources than compiling, etc. The procedures differ depending on what machine you're using.
TCS
On the TCS you can run small MPI jobs on the tcs02 node, which is meant for development use. But even for this test run on one node, you'll need a host file -- a list of hosts (in this case, all tcs-f11n06, which is the `real' name of tcs02) that the job will run on. Create a file called `hostfile' containing the following:
tcs-f11n06
tcs-f11n06
tcs-f11n06
tcs-f11n06
for a 4-task run. When you invoke "poe" or "mpirun", there are runtime arguments that you specify pointing to this file. You can also specify it in an environment variable MP_HOSTFILE, so, if your file is in your /scratch/USER/hostfile, then you would do
export MP_HOSTFILE=/scratch/USER/hostfile
in your shell. You will also need to create a .rhosts file in your home directory, again listing tcs-f11n06, so that poe can start jobs. After that you can simply run your program. You can use mpiexec:
mpiexec -n 4 my_test_program
adding -hostfile /path/to/my/hostfile if you did not set the environment variable above. Alternatively, you can run it with the poe command (do a "man poe" for details), or even by just directly running it. In this case the number of MPI processes will by default be the number of entries in your hostfile.
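For runs with more tasks, the host file can be generated rather than typed. The following is a minimal sketch assuming bash and one host name per line; the helper make_hostfile and the variable HOSTFILE_PATH are hypothetical names, not part of poe:

```bash
#!/bin/bash
# Write a host file with one line per MPI task.
# make_hostfile FILE HOST NTASKS   (hypothetical helper)
make_hostfile() {
    local file=$1 host=$2 ntasks=$3
    : > "$file"                      # create or truncate the file
    for (( i = 0; i < ntasks; i++ )); do
        echo "$host" >> "$file"
    done
}

# Hypothetical location; the text above uses /scratch/USER/hostfile.
hostfile=${HOSTFILE_PATH:-$PWD/hostfile}
make_hostfile "$hostfile" tcs-f11n06 4
export MP_HOSTFILE=$hostfile
```

Changing the last argument of make_hostfile changes the number of MPI tasks that a plain poe run would start.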
GPC
On the GPC one can run short test jobs on the GPC development nodes gpc01..gpc04; if they are single-node jobs (which they should be) they don't need a hostfile. Even better, though, is to request an interactive job and run the tests either in the regular batch queue or using a short high availability debug queue that is reserved for this purpose.
How do I run a longer (but still shorter than an hour) test job quickly?
Answer:
On the GPC there is a high turnover short queue called debug that is designed for this purpose. You can use it by adding
#PBS -q debug
to your submission script.
Running your jobs
OpenMP on the TCS
How do I run an OpenMP job on the TCS?
Answer:
Please look at the TCS Quickstart page.
Can I use hybrid codes consisting of MPI and OpenMP on the GPC?
Answer:
Yes. Please look at the GPC Quickstart page.
How do I run serial jobs on GPC?
Answer:
So it should be said first that SciNet is a parallel computing resource, and our priority will always be parallel jobs. Having said that, if you can make efficient use of the resources using serial jobs and get good science done, that's good too, and we're happy to help you.
The GPC nodes each have 8 processing cores, and making efficient use of these nodes means using all eight cores. As a result, we'd like to have the users take up whole nodes (eg, run multiples of 8 jobs) at a time.
The best strategy depends on the nature of your job. Several approaches are presented on the serial run wiki page.
Why can't I request only a single cpu for my job on GPC?
Answer:
On GPC, computers are allocated by the node - that is, in chunks of 8 processors. If you want to run a job that requires only one processor, you need to bundle the jobs into groups of 8, so as to not be wasting the other 7 for 48 hours. See the serial run wiki page.
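One simple way to bundle is to start 8 tasks in the background and wait for all of them. This is only a sketch of that idea, not necessarily the approach recommended on the serial run wiki page; run_task and OUTDIR are hypothetical stand-ins for your own program and output location:

```bash
#!/bin/bash
# Run 8 independent serial tasks at once on one 8-core node.
# run_task is a hypothetical stand-in for your own serial program.
outdir=${OUTDIR:-$PWD}

run_task() {
    local id=$1
    echo "task ${id} done" > "${outdir}/task_${id}.out"
}

for id in 1 2 3 4 5 6 7 8; do
    run_task "$id" &     # start each task in the background
done
wait                     # block until all 8 tasks have finished
```

This works well when the 8 tasks take about the same time; otherwise cores sit idle while the slowest task finishes.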
How do I run serial jobs on TCS?
Answer: You don't.
But in the queue I found a user who is running jobs on GPC, each of which is using only one processor, so why can't I?
Answer:
The pradat* and atlaspt* jobs, amongst others, are jobs of the ATLAS high energy physics project. That they are reported as single cpu jobs is an artifact of the moab scheduler. They are in fact being automatically bundled into 8-job bundles but have to run individually to be compatible with their international grid-based systems.
How do I use the ramdisk on GPC?
To use the ramdisk, create files in /dev/shm/.. and read from / write to them just as one would in (eg) /scratch/USER/. Only the amount of RAM needed to store the files will be taken up by the temporary file system; thus if you have 8 serial jobs each requiring 1 GB of RAM, and 1GB is taken up by various OS services, you would still have approximately 7GB available to use as ramdisk on a 16GB node. However, if you were to write 8 GB of data to the RAM disk, this would exceed available memory and your job would likely crash.
It is very important to delete your files from ram disk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than they might expect, and this might kill their jobs.
More details on how to set up your script to use the ramdisk can be found on the Ramdisk wiki page.
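A minimal sketch of the clean-up advice, assuming bash: the job works in a private directory under /dev/shm and removes it when the script exits. The directory naming, the fallback to /tmp and the file scratch.dat are illustrative assumptions:

```bash
#!/bin/bash
# Sketch: stage work in the ramdisk and always clean up on exit.
# Assumption: fall back to /tmp on machines without a writable /dev/shm.
ramroot=/dev/shm
[ -d "$ramroot" ] && [ -w "$ramroot" ] || ramroot=${TMPDIR:-/tmp}

workdir=$(mktemp -d "${ramroot}/${USER:-job}.XXXXXX")

cleanup() {
    rm -rf "$workdir"    # free the RAM for the next user of the node
}
trap cleanup EXIT

# Hypothetical workload: write intermediate data to the ramdisk ...
echo "intermediate data" > "${workdir}/scratch.dat"
# ... and copy anything worth keeping back to disk before the job ends,
# e.g. cp "${workdir}/result.dat" /scratch/USER/
```

The EXIT trap runs when the script ends normally or exits with an error, but not if the job is killed outright, so it is still worth checking the node for leftovers.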
How can I automatically resubmit a job?
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the 48 hour limit and then resubmits itself by logging into one of the development nodes.
<source lang="bash">
#!/bin/bash
# MOAB/Torque example submission script for auto resubmission
# SciNet GPC
#PBS -l nodes=1:ppn=8,walltime=48:00:00
#PBS -N my_job

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# YOUR CODE HERE
./run_my_code

# RESUBMIT 10 TIMES HERE
# NUM is passed in with "qsub -v NUM=..."; treat it as 0 if it is unset
num=${NUM:-0}
if [ $num -lt 10 ]; then
    num=$(($num+1))
    ssh gpc01 "cd $PBS_O_WORKDIR; qsub ./script_name.sh -v NUM=$num";
fi
</source>
qsub script_name.sh -v
Alternatively, you can use job dependencies in the queuing system, which will not start one job until another job has completed.
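As a rough sketch of the dependency approach, assuming Torque's qsub and its -W depend=afterok:JOBID option: each job in the chain is held until the previous one has finished successfully. The function submit_chain is a hypothetical helper, and afterany may suit you better if a job is expected to be stopped at the walltime limit:

```bash
#!/bin/bash
# submit_chain SCRIPT N   (hypothetical helper)
# Submit SCRIPT N times; each job waits for the previous one to finish
# successfully (Torque "afterok" dependency) before it may start.
submit_chain() {
    local script=$1 n=$2 jobid prev=""
    for (( i = 1; i <= n; i++ )); do
        if [ -z "$prev" ]; then
            jobid=$(qsub "$script") || return 1
        else
            jobid=$(qsub -W depend=afterok:"$prev" "$script") || return 1
        fi
        echo "submitted ${jobid}"
        prev=$jobid
    done
}
```

For example, submit_chain script_name.sh 5 would queue five jobs that run one after another.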
How can I pass in arguments to my submission script?
If you wish to make your scripts more generic, you can use qsub's ability to pass in environment variables to pass arguments to your script. The following example shows a case where an input and an output file are passed in on the qsub line. Multiple variables can be passed in as a comma-delimited list using the qsub "-v" option.
<source lang="bash">
#!/bin/bash
# MOAB/Torque example of passing in arguments
# SciNet GPC
#PBS -l nodes=1:ppn=8,walltime=48:00:00
#PBS -N my_job

# DIRECTORY TO RUN - $PBS_O_WORKDIR is directory job was submitted from
cd $PBS_O_WORKDIR

# YOUR CODE HERE
./run_my_code -f $INFILE -o $OUTFILE
</source>
qsub script_name.sh -v INFILE=input.txt,OUTFILE=outfile.txt
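If a variable is forgotten on the qsub line, the script above would silently run with an empty argument. A small guard can catch that early; this is a sketch, with check_args as a hypothetical helper and the default output name as an assumption:

```bash
#!/bin/bash
# check_args: verify the variables expected from "qsub -v" before running.
check_args() {
    if [ -z "${INFILE:-}" ]; then
        echo "INFILE is not set; submit with qsub -v INFILE=..." >&2
        return 1
    fi
    # Assumption: fall back to a default output name when OUTFILE is omitted.
    OUTFILE=${OUTFILE:-outfile.txt}
}
```

In a submission script one would call check_args || exit 1 before the ./run_my_code line.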
How can I run a job longer than 48 hours?
Answer:
Currently the maximum time you can run a single job on either the GPC or TCS is 48 hours. If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time. You can do this manually or using automatic resubmission.
How do priorities work/why did that job jump ahead of mine in the queue?
Answer:
The queueing system used on SciNet machines is a Priority Queue. Jobs enter the queue at the back of the queue, and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can `cut in line'.
The main factor which determines priority is whether or not the user (or their PI) has an LRAC or NRAC allocation. These are competitively allocated grants of computer time; there is a call for proposals towards the end of every calendar year. Users with an allocation have high priorities in an attempt to make sure that they can use the amount of computer time the committees granted them. Their priority decreases as they approach their allotted usage over the current window of time; by the time that they have exhausted that allotted usage, their priority is the same as users with no allocation (unallocated, or `default' users). Unallocated users have a fixed, low, priority.
This priority system is called `fairshare'; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee. The fairshare window is a rolling window of two weeks; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 14 days.
A particular allocation might have some fraction of, for the purposes of discussion, the ethernet side of GPC - say 5% of the machine (if the PI had been allocated 10 million CPU hours on GPC-ethernet). The allocations have labels (called `Resource Allocation Proposal Identifiers', or RAPIs); they look something like
abc-123-ab
where abc-123 is the PI's CCRI, and the suffix specifies which of the allocations granted to the PI is to be used. These can be specified on a job-by-job basis. On GPC, one adds the line
#PBS -A RAPI
to your script; on TCS, one uses
# @ account_no = RAPI
If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the same portal where one changes one's password,
https://portal.scinet.utoronto.ca/
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 14 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 5%) of the machine over that window (here, 14 days). The fairshare priority decreases as the remaining allocation shrinks; if there is no allocation left (eg, jobs running under that allocation have already used 379,038 CPU hours in the past 14 days), the priority is the same as that of a user with no granted allocation. (This last part has been the topic of some debate; as the machine gets more utilized, it will probably be the case that we allow RAC users who have greatly overused their quota to have their priorities drop below that of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system; this would have no undue effect on our allocated users as they still would be able to use the amount of resources they had been allocated by the committees.) Note that all jobs charging the same allocation get the same fairshare priority.
There are other factors that go into calculating priority, but fairshare is the most significant. Other factors include
- amount of time waiting in queue (measured in units of the requested runtime). A job that requests 1 hour in the queue and has been waiting 2 days will get a bump in its priority larger than a job that requests 2 days and has been waiting the same time.
- user adjustment of priorities (see below).
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.
How do we manage job priorities within our research group?
Answer:
Obviously, managing shared resources within a large group - whether it is conference funding or CPU time - takes some doing.
It's important to note that the fairshare periods are intentionally kept quite short - just two weeks long. (These exact numbers are subject to change as the year goes on and we better understand use patterns, but they're unlikely to change radically.) So, for example, let us say that in your resource allocation you have about 10% of the machine. Then for someone to use up the whole two week amount of time in 2 days, they'd have to use 70% of the machine in those two days - which is unlikely to happen by accident. If that does happen, those using the same allocation as the person who used 70% of the machine over the two days will suffer by having much lower priority for their jobs, but only for the next 12 days - and even then, if there are idle cpus they'll still be able to compute.
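The 70% figure follows from spreading 14 days of a 10% share over 2 days. A small sketch of that arithmetic, with illustrative variable names:

```bash
#!/bin/bash
# Fraction of the machine needed to consume a whole fairshare window's
# allocation in a shorter burst. Values are the ones used in the text.
share_pct=10      # allocated share of the machine, in percent
window_days=14    # length of the fairshare window
burst_days=2      # period in which the whole window's share is consumed

burst_pct=$(( share_pct * window_days / burst_days ))
echo "Required usage: ${burst_pct}% of the machine for ${burst_days} days"
```

With these values burst_pct comes out as 70.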
There will be online tools for seeing how the allocation is being used, and those people who are in charge in your group will be able to use that information to manage the users, telling them to dial it down or up. We know that managing a large research group is hard, and we want to make sure we provide you the information you need to do your job effectively.
One way for users within a group to manage their priorities within the group is with user-adjusted priorities; this is described in more detail on the Scheduling System page.
How do I charge jobs to my NRAC/LRAC allocation?
Answer:
Please see the accounting section of the Moab page.
How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project?
Answer:
This information is available on the SciNet portal, https://portal.scinet.utoronto.ca. See also SciNet Usage Reports.
Monitoring jobs in the queue
Why hasn't my job started?
Answer:
Use the moab command
checkjob -v jobid
and the last couple of lines should explain why a job hasn't started.
Please see Job Scheduling System (Moab) for more detailed information
How do I figure out when my job will run?
Answer:
Please see Job Scheduling System (Moab)
My job won't start, and checkjob says "Batch:PolicyViolation"
Answer:
The job you're submitting breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job. The most common problems are:
- Job is too long: Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users. If your computation requires longer than that, as many do, you will have to checkpoint your job and restart it after each 48-hour queue window. You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to automatically resubmit jobs.
- Single-node infiniband job: only one quarter of the nodes on GPC are networked with the faster Infiniband network fabric, and so these scarcer nodes are limited to those jobs which can make effective use of them. If your job requires only a single node, then since there is no inter-node communication, it cannot possibly matter whether the node has infiniband or not; so the job will not be allowed to run on these nodes. Remove the request for the ib feature in the -l line of your job submission script, and your job will run on the first available ethernet node.
- Incorrect number of processors per node: Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8). For serial jobs users must bundle or batch them together in groups of eight.
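A quick check before submitting can catch the last of these problems. The following sketch only looks for ppn=8 on a #PBS -l line; check_ppn is a hypothetical helper, not a SciNet or Moab tool:

```bash
#!/bin/bash
# check_ppn SCRIPT   (hypothetical helper)
# Succeeds only if SCRIPT has a "#PBS -l" line that requests ppn=8.
check_ppn() {
    local script=$1
    grep -Eq '^#PBS[[:space:]]+-l[[:space:]].*ppn=8([^0-9]|$)' "$script"
}
```

For example, check_ppn script_name.sh || echo "script does not request whole nodes" could be run on a login node before qsub.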
How can I monitor my running jobs on TCS?
How can I monitor the load of TCS jobs?
Answer:
You can get more information with the command
/xcat/tools/tcs-scripts/LL/jobState.sh
which I alias as:
alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'
If you run "llq1 -n" you will see a listing of jobs together with a lot of information, including the load.
Errors in running jobs
On GPC, `Job cannot be executed'
I get error messages like this trying to run on GPC:
PBS Job Id: 30414.gpc-sched
Job Name: namd
Exec host: gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0
Aborted by PBS Server
Job cannot be executed
See Administrator for help
PBS Job Id: 30414.gpc-sched
Job Name: namd
Exec host: gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0
An error has occurred processing your job, see below.
request to copy stageout files failed on node 'gpc-f120n011/7+gpc-f120n011/6+gpc-f120n011/5+gpc-f120n011/4+gpc-f120n011/3+gpc-f120n011/2+gpc-f120n011/1+gpc-f120n011/0' for job 30414.gpc-sched
Unable to copy file 30414.gpc-sched.OU to cmadill@gpc-f101n084.scinet.local:/scratch/cmadill/projects/sim-performance-test/runtime/l/namd/8/namd.o30414
*** error from copy
30414.gpc-sched.OU: No such file or directory
*** end error output
Try doing the following:
mkdir /scratch/${USER}/.pbs_spool
ln -s /scratch/${USER}/.pbs_spool ~/.pbs_spool
This is how all new accounts are set up on SciNet.
/home on GPC for compute jobs is mounted as a read-only file system. PBS by default tries to spool its output files to ${HOME}/.pbs_spool which fails as it tries to write to a read-only file system. New accounts at SciNet get around this by having ${HOME}/.pbs_spool point to somewhere appropriate on /scratch, but if you've deleted that link or directory, or had an old account, you will see errors like the above.
My GPC job died, telling me `Copy Stageout Files Failed'
Answer:
When a job runs on GPC, the script's standard output and error are redirected to /home/$USER/.pbs_spool/$PBS_JOBID.gpc-sched.OU and /home/$USER/.pbs_spool/$PBS_JOBID.gpc-sched.ER respectively.
/home/$USER/.pbs_spool is a symbolic link to /scratch/$USER/.pbs_spool, as the compute nodes can't write to the /home directory. At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default $PBS_JOBNAME.o$PBS_JOBID and $PBS_JOBNAME.e$PBS_JOBID. (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)
When you get errors like this:
An error has occurred processing your job, see below. request to copy stageout files failed on node
it means that the copying back process has failed in some way. There could be a few reasons for this. The first thing to make sure of is that your .bashrc does not produce any output, as the output-stageout is performed by bash and further output can cause this to fail. But it also could have just been a random filesystem error, or it could be that your job failed spectacularly enough to short-circuit the normal job-termination process and those files just never got copied.
Generally if you look in /home/$USER/.pbs_spool for the files corresponding to that job ID, you will find them, and that may offer some clues as to what happened. This directory is actually a symbolic link to /scratch/$USER/.pbs_spool. So do not remove either /home/$USER/.pbs_spool or /scratch/$USER/.pbs_spool!
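One common way to keep a .bashrc silent for batch jobs while still printing messages at an interactive login is to test whether the shell is interactive. This is a generic bash sketch rather than SciNet's default startup script, and greet_if_interactive is a hypothetical name:

```bash
# In ~/.bashrc: only produce output when the shell is interactive.
# The special parameter $- contains "i" for interactive shells.
greet_if_interactive() {
    case $- in
        *i*) echo "Welcome back, ${USER}" ;;   # interactive login: safe to print
        *)   : ;;                              # batch job or scp: stay silent
    esac
}
greet_if_interactive
```

Anything that prints (echo, fortune, module listings and so on) can be placed inside the interactive branch.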
Another transport will be used instead
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK. Is this a problem?
--------------------------------------------------------------------------
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: gpc-f101n005
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Answer:
Everything's fine. The two MPI libraries SciNet provides work for both the InfiniBand and the Gigabit Ethernet interconnects, and will always try to use the fastest interconnect available. In this case, you ran on normal gigabit GPC nodes with no InfiniBand; but the MPI libraries have no way of knowing this, and try the InfiniBand first anyway. This is just a harmless `failover' message; it tried to use the InfiniBand, which doesn't exist on this node, then fell back on using Gigabit Ethernet (`another transport').
Data on SciNet disks
How do I find out my disk usage?
Answer:
The standard unix/linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems. There are utilities that very quickly report your disk usage:
The
mmlsquota
command lists the user's current usage on the filesystems that run quotas (currently /home).
In order to get the usage on the /scratch filesystem, run the command
/scinet/gpc/bin/scratchUsage
In addition, the diskUsage command, available on the login nodes and the GPC devel nodes, reports how much disk space is being used by yourself and your group on the home, scratch, and project file systems, and how much remains available. This information is updated hourly. More information about these filesystems is available at the Storage Quick-start.
How do I transfer data to/from SciNet?
Answer:
All incoming connections to SciNet go through relatively low-speed connections to the login.scinet gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data. Better tools are described in our page on Data Transfer.
My group works with data files of size 1-2 GB. Is this too large to transfer by scp to login.scinet.utoronto.ca?
Answer:
Generally, occasional transfers of less than 10GB of data are perfectly acceptable to do through the login nodes. See Data Transfer.
How can I check if I have files in /scratch that are scheduled for automatic deletion?
Answer:
Please see Storage At SciNet
Keep 'em Coming!
Next question, please
Send your question to <support@scinet.utoronto.ca>; we'll answer it asap!