Scheduler
The queueing system used at SciNet is based around Cluster Resources' Moab Workload Manager. Moab is used on both the GPC and the TCS; however, Torque is used as the backend resource manager on the GPC, while IBM's LoadLeveler is used on the TCS.
This page outlines some of the most common Moab commands; full documentation is available from Moab here.
Queues
GPC
batch
The batch queue is the default queue on the GPC, allowing the user access to all of the resources for jobs of up to 48 hours. If a specific queue is not specified (with the -q flag), a job is submitted to the batch queue.
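For example, a submission script could be sent to the batch queue with a command like the following, where myjob.sh is a placeholder for your own script:
qsub -l nodes=1:ppn=8,walltime=24:00:00 myjob.sh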
interactive
An interactive queue has been set up primarily for code developers to quickly test and evaluate their codes and configurations without having to wait in the batch queue. There are currently 10 nodes reserved for the interactive queue. It has quite restrictive limits to promote high turnover and availability: a user can use a maximum of 2 nodes (16 cores) for a maximum of 2 hours, and can only have one job in the interactive queue at a time.
qsub -l nodes=1:ppn=8,walltime=1:00:00 -q interactive
TCS
The TCS currently has only one queue (or class) in use, called "verylong"; its only limitation is that jobs must be under 48 hours.
#@ class = verylong
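As a rough sketch of where that directive fits, a minimal LoadLeveler script might look like the following; the job name, executable, and wall clock limit are placeholder values, and the TCS examples have the full set of directives required there:
#!/bin/sh
#@ job_name = myjob
#@ class = verylong
#@ wall_clock_limit = 47:00:00
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ queue
./myprogram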
Job Info
To see all jobs queued on a system, use
showq
Three sections are shown: running, idle, and blocked. Idle jobs are commonly referred to as queued jobs, as they meet all the requirements but are waiting for available resources. Blocked jobs are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable resources. For example, if you are allowed to submit 10 jobs and you submit 20, the first 10 jobs will be submitted properly and either run right away or be queued; the other 10 jobs will be blocked, and will not be submitted to the queue until one of the first 10 finishes.
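If the full listing is more than you need, showq can also be filtered; for instance, using standard Moab options:
showq -r          # running jobs only
showq -i          # idle (queued) jobs only
showq -b          # blocked jobs only
showq -u $USER    # only jobs belonging to $USER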
If showq is returning output slowly, you can query cached info using
showq --noblock
Available Resources
To show how many total nodes are currently free, use the showbf (show backfill) command
showbf -A
To show how many InfiniBand nodes are free, use
showbf -f ib
For example:
$ showbf -A
Partition     Tasks  Nodes      Duration   StartOffset       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24
This shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit a job over that time, only 30 will be available. In this case this is due to a large reservation made by SciNet staff, but from a user's point of view, showbf tells you very simply what is available and for how long.
In this case, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, in order to ensure that the job backfills the reserved nodes.
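For example, assuming a placeholder script myjob.sh, the following submission would fit inside the 7:36:23 window reported above:
qsub -l nodes=2:ppn=8,walltime=7:30:00 myjob.sh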
Job Submission
Interactive
On the GPC, an interactive queue session can be requested using the following:
qsub -l nodes=2:ppn=8,walltime=1:00:00 -I
Non-interactive (Batch)
For a non-interactive job submission, you require a submission script formatted for the appropriate resource manager. Examples are provided for the GPC and TCS.
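As an illustrative sketch only (the linked examples are authoritative), a minimal Torque script for the GPC might look like the following, where the job name and executable are placeholders:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N myjob
cd $PBS_O_WORKDIR
./myprogram
It would then be submitted with qsub myjob.sh.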
Job Status
To check the detailed status of a job, use
checkjob jobid
Cancel a Job
To cancel a queued or running job, use
canceljob jobid
User Stats
To show current usage statistics for a user ($USER), use
showstats -u $USER
Reservations
To display reservations, use
showres
Standard users can only see their own reservations, not those of other users or the system reservations. To determine what is available, a user can use "showbf", which shows what resources are available and for how long, taking into account running jobs and all reservations. Refer to the Available Resources section of this page for more details.