User Serial


Running serial jobs on SciNet

You should not submit purely serial jobs to the queue (on either GPC or TCS), since that would waste the computational power of 7 (or 31, on the TCS) cpus while the job is running. While we encourage you to try to parallelize your code, it is sometimes beneficial to run several serial codes at the same time. Note that because the TCS is a machine specialized for parallel computing, you should only use the GPC for concurrent serial runs.

Serial jobs of similar duration

In the FAQ it is explained how to run bunches of eight jobs at the same time, so that all cpus are kept busy; a minimal sketch follows the list below. This is a good solution if:

  • the jobs have the same or similar duration
  • all eight jobs fit in memory simultaneously
  • there are no memory contention issues
  • there are not too many jobs: if you have many jobs to bunch, some bunches will eventually be unbalanced, leaving cpus idle while the longest job in the bunch finishes.
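For concreteness, here is a minimal sketch of the bunching approach described in the FAQ, assuming eight runs of similar duration in separate directories; the myrun executable and the directory, input, and output names are placeholders to adapt to your own setup:

<source lang="bash">
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N serialx8

cd $PBS_O_WORKDIR

# start eight serial runs at once, one per core, each in its own directory
(cd run01; ./myrun < input > output) &
(cd run02; ./myrun < input > output) &
(cd run03; ./myrun < input > output) &
(cd run04; ./myrun < input > output) &
(cd run05; ./myrun < input > output) &
(cd run06; ./myrun < input > output) &
(cd run07; ./myrun < input > output) &
(cd run08; ./myrun < input > output) &
# wait for all eight background runs to finish before the job exits
wait
</source>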


Serial jobs of varying duration

If you have a lot (50+) of relatively short serial jobs to do, and you know that eight jobs fit in memory without memory issues, the following strategy in your submission script maximizes the cpu utilization:

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for multiple, dynamically-run
# serial jobs on SciNet GPC
#
#PBS -l nodes=1:ppn=8,walltime=1:00:00
#PBS -N serialdynamic

# DIRECTORY TO RUN - $PBS_O_WORKDIR is the directory the job was submitted from
cd $PBS_O_WORKDIR

# COMMANDS ARE ASSUMED TO BE SCRIPTS CALLED job*, WHICH CALL THE MAIN EXECUTABLE:
psname='myrun'

# EXECUTE COMMANDS
# Launch each job script in the background, but first wait until fewer
# than eight copies of the main executable are running. Note that the
# output of ps includes a header line, so eight running jobs give a
# count of nine.
for serialjob in job*
do
   sleep 5
   njobs=`ps -C $psname | wc -l`
   while [ $njobs -gt 8 ]
   do
       sleep 5
       njobs=`ps -C $psname | wc -l`
   done
   ./$serialjob &
done
wait
</source>
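For illustration, one of the job* scripts might look like the following minimal sketch; the run01 directory and the input and output file names are placeholders, and the executable name must match psname above:

<source lang="bash">
#!/bin/bash
# hypothetical script "job01": run the main executable on its own input
# in its own directory; adapt the paths and names to your setup
cd run01
../myrun < input > output
</source>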

Notes:

  • This is the simplest case of dynamically run serial jobs.
  • Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to performance. In that case, running off the ramdisk may be an option.
  • When using the ramdisk, make sure you copy your results from the ramdisk back to the scratch space after the runs, or when the job is killed because its time has run out; one way to do this is sketched below.
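A shell trap is one way to make the copy-back happen even when the scheduler kills the job, since Moab/Torque sends a termination signal before killing the job outright. This is only a sketch; the ramdisk and scratch paths below are assumptions to adapt to your own setup:

<source lang="bash">
# sketch: copy results from the ramdisk back to scratch; the paths
# below are placeholders for your own setup
RAMDISK=/dev/shm/$USER
RESULTS=$SCRATCH/results

function cleanup_ramdisk {
    echo "copying results from ramdisk back to scratch"
    mkdir -p $RESULTS
    cp -r $RAMDISK/* $RESULTS/
}

# if the scheduler terminates the job (e.g. walltime exceeded),
# copy the results back before the job is killed
trap cleanup_ramdisk TERM

# ... run your serial jobs out of $RAMDISK here ...

# normal end of the job: copy the results back as well
cleanup_ramdisk
</source>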

--Rzon 02:22, 2 April 2010 (UTC)

Version for more than 8 cores at once (still serial)

If you have hundreds of serial jobs that you want to run concurrently and the nodes are available, then the script above, while useful, would require tens of separate submissions. Instead, you can request more than one gigE node and use the following script to distribute your processes amongst the cores.

<source lang="bash">
#!/bin/bash
# MOAB/Torque submission script for multiple, dynamically-run
# serial jobs on SciNet GPC
#
#PBS -l nodes=100:ppn=8,walltime=1:00:00
#PBS -N serialdynamicMulti

# DIRECTORY TO RUN - $PBS_O_WORKDIR is the directory the job was submitted from
cd $PBS_O_WORKDIR

# FUNCTIONS

# build the list of distinct nodes assigned to this job (the nodefile
# lists each node once per core, so uniq collapses the repeats)
function init_dist2nodes {
  D2N_AVAIL=($(cat $PBS_NODEFILE | uniq))
  D2N_NUM=$(cat $PBS_NODEFILE | uniq | wc -l)
  D2N_COUNTER=0
}

# select the next node to run on, filling each node with 8 processes
# before moving on; any overflow is sent to the last node
function dist2nodes {
  D2N_SELECTED=$(echo ${D2N_COUNTER}/8 | bc)
  if ((${D2N_SELECTED} > ${D2N_NUM} - 1)); then
    let "D2N_SELECTED=${D2N_NUM}-1"
  fi
  D2N_NODE=${D2N_AVAIL[${D2N_SELECTED}]}
  let "D2N_COUNTER=D2N_COUNTER+1"
}

# INITIALIZATION
init_dist2nodes
mydir=$(pwd)

# MAIN CODE
# 800 processes = 100 nodes x 8 cores per node
for ((i=1; i<=800; i++)); do
  # call dist2nodes to store the name of the next node to run on
  # in the variable D2N_NODE
  dist2nodes
  # here is where you put the command that you will run many times;
  # it could be another script that takes an argument, or simply an executable
  ssh $D2N_NODE "cd ${mydir}; ./my_command.sh $i" &
done
wait
</source>

Notes:

  • You can run more or fewer than 8 processes per node by modifying the number 8 in the dist2nodes function; a variant sketch is given below.
  • The notes in the previous section apply here as well.
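As a variant sketch, the per-node process count can be pulled out into a variable; PROCS_PER_NODE is a name introduced here for illustration, not part of the original script:

<source lang="bash">
# hypothetical variant: the number of processes per node as a variable
PROCS_PER_NODE=4

function dist2nodes {
  D2N_SELECTED=$(echo ${D2N_COUNTER}/${PROCS_PER_NODE} | bc)
  if ((${D2N_SELECTED} > ${D2N_NUM} - 1)); then
    let "D2N_SELECTED=${D2N_NUM}-1"
  fi
  D2N_NODE=${D2N_AVAIL[${D2N_SELECTED}]}
  let "D2N_COUNTER=D2N_COUNTER+1"
}
</source>

Remember to adjust the loop count in the main code accordingly (e.g. 100 nodes x 4 processes per node = 400 iterations), since any extra processes beyond the last node's share all land on the last node.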

--cneale 12 May 2010 (UTC)