Scheduler

The queueing system used at SciNet is based around Cluster Resources' Moab Workload Manager. Moab is used on both the GPC and the TCS; however, Torque is used as the backend resource manager on the GPC, while IBM's LoadLeveler is used on the TCS.

This page outlines some of the most common Moab commands, with full documentation available from Moab.

Queues

GPC

batch

The batch queue is the default queue on the GPC, allowing the user access to all of the resources for jobs of up to 48 hours. If a specific queue is not requested with the -q flag, the job is submitted to the batch queue.
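For example, a job asking for 2 nodes (16 cores) for 24 hours could be submitted to the default batch queue as follows; the script name myjob.sh is only a placeholder:

qsub -l nodes=2:ppn=8,walltime=24:00:00 myjob.sh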

debug

A debug queue has been set up primarily for code developers to quickly test and evaluate their codes and configurations without having to wait in the batch queue. There are currently 10 nodes reserved for the debug queue. It has quite restrictive limits to promote high turnover and availability; thus a user can only use a maximum of 2 nodes (16 cores) for 2 hours, and can only have one job in the debug queue at a time.

qsub -l nodes=1:ppn=8,walltime=1:00:00 -q debug -I
largemem

The largemem queue is used for accessing one of two 16-core Intel Xeon (non-Nehalem) nodes with 128 GB of memory.

qsub -l nodes=1:ppn=16,walltime=1:00:00 -q largemem -I

TCS

The TCS currently has only one queue, or class, in use, called "verylong"; its only limitation is that jobs must be under 48 hours.

#@ class           = verylong
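As a rough sketch of how this class line fits into a LoadLeveler script, the surrounding directives might look like the following; the job name and time limit are placeholders, and other directives (resource requests, executable, and so on) are omitted:

#@ job_name         = example
#@ class            = verylong
#@ wall_clock_limit = 40:00:00
#@ queue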

Job Info

To see all jobs queued on a system use

showq

Three sections are shown: running, idle, and blocked. Idle jobs are commonly referred to as queued jobs, as they meet all the requirements but are waiting for available resources. Blocked jobs are caused either by improper resource requests or, more commonly, by exceeding a user's or group's allowable resources. For example, if you are allowed to submit 10 jobs and you submit 20, the first 10 jobs will be submitted properly and either run right away or be queued; the other 10 jobs will be blocked and won't be submitted to the queue until one of the first 10 finishes.
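To list only your own jobs, you can pass a username to showq with the -u flag, for example

showq -u $USER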

If showq is returning output slowly, you can query cached info using

showq --noblock

Available Resources

To show how many nodes in total are currently free, use the showbf ("show backfill") command

showbf -A 

To show how many InfiniBand nodes are free, use

showbf -f ib

For example, checking availability for a batch job:

$ showbf -c batch
Partition     Tasks  Nodes      Duration   StartOffset       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL           14728   1839       7:36:23      00:00:00  00:23:37_09/24
ALL             256     30      INFINITY      00:00:00  00:23:37_09/24

shows that for jobs under 7:36:23 you can use 1839 nodes, but if you submit a job longer than that, only 30 will be available. In this case the difference is due to a large reservation made by SciNet staff, but from a user's point of view, showbf tells you very simply what is available and for how long. Here, a user may wish to set #PBS -l walltime=7:30:00 in their script, or add -l walltime=7:30:00 to their qsub command, in order to ensure that their jobs backfill the reserved nodes.
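For example (the node count and script name are placeholders), a job submitted as

qsub -l nodes=8:ppn=8,walltime=7:30:00 myjob.sh

fits under the 7:36:23 window reported above, and so is eligible to run on the 1839 free nodes instead of waiting behind the reservation.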

Another way to check is to use

showstart procs@duration

to check when a job will run. Say, for example, I wanted to know when a 1200-processor job that needs 7 hours would run:

$showstart 1200@7:00:00
job 1200@7:00:00 requires 1200 procs for 7:00:00

Estimated Rsv based start in                00:00:00 on Fri Sept 24 23:37:11
Estimated Rsv based completion in           07:00:00 on Sat Sept 25 06:37:11

This shows that the job would start right away, as there are enough resources; however, if I asked for 10 hours

$showstart 1200@10:00:00
job 1200@10:00:00 requires 1200 procs for 10:00:00

Estimated Rsv based start in                03:45:00 on Sat Sept 25 02:22:11
Estimated Rsv based completion in           13:45:00 on Sat Sept 25 12:22:11

the job wouldn't start for 3 hours and 45 minutes.

Job Submission

Interactive

On the GPC an interactive queue session can be requested using the following

qsub -l nodes=2:ppn=8,walltime=1:00:00 -I

Non-interactive (Batch)

For a non-interactive job submission you require a submission script formatted for the appropriate resource manager. Examples are provided for the GPC and TCS.
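As a rough sketch, a GPC (Torque) submission script could look like the following, where the job name, resource request, and executable (my_program) are placeholders:

#!/bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=8,walltime=12:00:00
cd $PBS_O_WORKDIR
./my_program

It would then be submitted with qsub followed by the script name.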

Job Status

checkjob jobid

Cancel a Job

canceljob jobid

User Stats

Show the current usage stats for a user ($USER) with

showstats -u $USER

Reservations

showres

Standard users can only see their own reservations, not those of other users or system reservations. To determine what is available, a user can use "showbf", which shows what resources are available and for how long, taking into account running jobs and all the reservations. Refer to the Available Resources section of this page for more details.

Job Dependencies

Sometimes you may want one job not to start until another job finishes, but you would like to submit them both at the same time. This can be done using job dependencies on both the GPC and the TCS; however, the commands are different because the underlying resource managers are different.

GPC

Use the -W flag with the following syntax in your submission script to have a job not start until the job with the given jobid or jobName (set with -N jobName) has finished:

-W depend=after:{jobid | jobName}
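For example, here is a sketch of submitting two jobs from the command line so that the second starts only once the first has finished; the script names are placeholders, and the afterany dependency type waits for the first job to end regardless of its exit status:

# submit the first job and capture the jobid that qsub prints
JOBID=$(qsub first_job.sh)
# submit the second job so that it only starts after the first finishes
qsub -W depend=afterany:$JOBID second_job.sh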

More detailed syntax and examples can be found in the Torque qsub documentation.

TCS

LoadLeveler does job dependencies using what it calls steps. See the TCS Quickstart guide for an example.
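As a rough sketch only (step names and executables are placeholders, and other required directives are omitted), a LoadLeveler script with two steps, where the second step runs only if the first exits cleanly, might contain:

#@ step_name  = step1
#@ executable = ./first_program
#@ queue
#@ step_name  = step2
#@ dependency = (step1 == 0)
#@ executable = ./second_program
#@ queue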

Adjusting Job Priority

The ability to adjust job priorities downwards can also be of use to adjust relative priorities of jobs between users who are running jobs under the same allocation (e.g., a default, LRAC, or NRAC allocation of the same PI). Priorities are determined by how much of that allocation's time has currently been used, and all users using that account will have identical priorities. This mechanism allows users to voluntarily reduce their priority to allow other users of the same allocation to run ahead of them.

In principle, by adjusting a job's priority downwards, you could reduce your job's priority to the point that someone else's job from another allocation entirely could go ahead of yours. In practice, however, this is extremely unlikely. Users with LRAC or NRAC allocations have priorities that are extremely large positive numbers that depend on their allocation and how much of it they have already used during the past fairshare window (2 weeks); it is very unlikely that two groups would have priorities that are within 10, 100, or 1000 of each other.

Note that at the moment we do not allow priorities to go negative; they are integers that can go no lower than 1. (This may change in the future.) That means that users of accounts that have already used their full allocation during the current fairshare period (e.g., over the past two weeks), and so whose priority would normally be negative but is capped at 1, cannot lower their priority any further. Similarly, users with a "default" allocation have priority 1, and cannot lower their priorities any further.

GPC

Moab allows users to adjust their jobs' priority moderately downwards, with the -p flag; that is, on a qsub line

qsub ... -p -10

or in a script

...
#PBS -p -10
...

The number used (-10 in the examples above) can be any negative number down to -1024.

The ability to adjust job priorities downwards can be useful when you are running a number of jobs and want some to enter the queue at higher priorities than others. Note that if you absolutely require some jobs to start before others, you could use job dependencies instead.

For a job that is currently queued, one can adjust its priority with the following, where jobid is the id of the queued job:

mjobctl -p -10 jobid

TCS

TCS users can adjust their priorities by putting the following line in their scripts

#@ user_priority = 50 

where the number can be between 0 (which is 50 below the default priority) and 50 (the default priority).