GPC Quickstart
| General Purpose Cluster (GPC) | |
|---|---|
| Installed | June 2009 |
| Operating System | Linux |
| Interconnect | 1/4 on Infiniband, rest on GigE |
| RAM/Node | 16 GB |
| Cores/Node | 8 |
| Login/Devel Node | gpc-login01 (142.150.188.51) |
| Vendor Compilers | icc (C), icpc (C++), ifort (Fortran) |
| Queue Submission | Moab/Torque |
The General Purpose Cluster is an extremely large cluster (ranked 16th in the world at its inception, and fastest in Canada) and is where most simulations are to be done at SciNet. It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the first in the world to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240 2.5GHz cores, with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking 4x-DDR Infiniband, while the rest of the nodes are connected with gigabit ethernet.
Login
Currently you need to log into the TCS first (142.150.188.41) and then log into the GPC devel nodes listed below. This will change in the near future.
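For example, with USER replaced by your SciNet user name (gpc-f101n001 is one of the devel nodes described in the next section):
ssh USER@142.150.188.41
ssh gpc-f101n001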
Compile/Devel Nodes
From a login node you can ssh to gpc-f101n001 and gpc-f101n002. These have the same hardware configuration as most of the compute nodes -- 8 Nehalem processing cores with 16GB RAM and gigabit ethernet. You can compile and test your codes on these nodes. To interactively test on more than 8 processors, or to test your code over an Infiniband connection, you can submit an interactive job request.
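For example, a one-hour interactive session on a single ethernet-connected node could be requested with something like the command below (the walltime is just a placeholder, and an os=... image specification like the ones in the submission scripts later on this page may also be required):
qsub -l nodes=1:ppn=8,walltime=1:00:00 -I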
Your home directory is /home/USER; you have 10 GB there that is backed up. This directory cannot be written to by the compute nodes! To run jobs, therefore, you'll use the /scratch/USER directory. There is a large amount of disk space there, but it is not backed up. It thus makes sense to keep your code in /home, compile it there, and then run it from the /scratch directory.
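As a concrete sketch of that workflow (the directory names are placeholders, and the code is assumed to build with make):
# compile in your backed-up home directory
cd /home/USER/mycode
make
# set up a run directory on scratch, which the compute nodes can write to
mkdir -p /scratch/USER/run1
cp /home/USER/mycode/a.out /scratch/USER/run1/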
Environment Variables
A modules system is used to handle environment variables associated with different compilers, MPI versions, libraries etc. To see all the options available type
module avail
To load a module
module load intel
To unload a module
module unload intel
To unload all modules
module purge
These commands should go in your .bashrc file and/or in your submission scripts to make sure you are using the correct packages.
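For example, a typical pair of lines to put in your .bashrc or submission script might be (the particular modules shown are just an example):
# start from a clean slate, then load the compiler and MPI modules this job needs
module purge
module load intel openmpi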
Compilers
The Intel compilers are icc/icpc/ifort for C/C++/Fortran. For MPI jobs, the scripts mpicc/mpiCC/mpif90 are wrappers around these compilers which ensure that the MPI header files and libraries are correctly included and linked to.
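For example, a serial and an MPI build of a hypothetical source file diffuse.c might look like the following (the -O2 flag is just a typical optimization choice):
# serial build with the Intel C compiler
icc -O2 -o diffuse diffuse.c
# MPI build; the wrapper adds the MPI headers and libraries
mpicc -O2 -o diffuse_mpi diffuse_mpi.c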
Submitting A Job
The SciNet machines are shared systems, and jobs that are to run on them are submitted to a queue; the scheduler then orders the jobs so as to make the best use of the machine, and has them launched when resources become available. The intervention of the scheduler can mean that the jobs aren't quite run in a first-in, first-out order.
The maximum wallclock time for a job in the queue is 48 hours; computations that will take longer than this must be broken into 48-hour chunks and run as several jobs. The usual way to do this is with checkpoints, writing out the complete state of the computation every so often in such a way that a job can be restarted from this state information and continue on from where it left off. Generating checkpoints is a good idea anyway, as in the unlikely event of a hardware failure during your run, it allows you to restart without having lost much work.
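As a rough sketch of how such a chain of 48-hour jobs can be set up, the script below runs one chunk and then resubmits itself. Everything specific here is an assumption for illustration: the -restart option and the "done" marker file are hypothetical conventions your own application would have to provide, and the script is assumed to be saved as job-chunk.sh in the run directory.
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=48:00:00,os=centos53computeA
#PBS -N chunked-run
module load intel openmpi
cd /scratch/USER/SOMEDIRECTORY
# run one chunk; the application is assumed to write its own checkpoint files
# and to resume from them via a (hypothetical) -restart option
mpirun -np 8 -hostfile $PBS_NODEFILE ./a.out -restart
# if the computation is not yet finished (signalled here by a hypothetical
# "done" marker file), submit the next chunk
if [ ! -f done ]; then
    qsub job-chunk.sh
fi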
If your job will run in less than 48 hours, specify that in your script -- your job will start sooner. (It's easier for the scheduler to fit in a short job than a long one.) On the downside, the job will be killed automatically by the queue manager at the end of the specified wallclock time, so if you guess wrong you might lose some work. The standard procedure is therefore to estimate how long your job will take and add 10% or so.
You interact with the queueing system through the scheduler, Moab. (On the back end, the resource manager on the GPC is Torque, but you won't be directly interacting with it.) To see all the jobs in the queue use
showq
To submit your own job, you must write a script which describes the job and how it is to be run (a sample script follows) and submit it to the queue, using the command
qsub SCRIPT-FILE-NAME
where you will replace SCRIPT-FILE-NAME with the file containing the submission script. This will return a job ID, for example 31415, which is used to identify the job. Information about a queued job can be found using
checkjob JOB-ID
and jobs can be canceled with the command
canceljob JOB-ID
Again, these commands have many options, which can be read about on their man pages.
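For instance, to list only your own jobs (assuming the standard Moab option set), you can use
showq -u USER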
Much more information on the queueing system is available on our queue page.
Submission Script
A sample submission script for an MPI job using ethernet is shown below, with the #PBS directives at the top and the rest being what will be executed on the compute node.
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (ethernet)
#
#PBS -l nodes=2:ppn=8,walltime=1:00:00,os=centos53computeA
#PBS -N test

# ENVIRONMENT VARIABLES
module load intel openmpi

# DIRECTORY TO RUN
cd /scratch/USER/SOMEDIRECTORY

# EXECUTION COMMAND
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out
MPI over Infiniband
About 1/4 of the GPC (864 nodes, or 6912 cores) is connected with a high-bandwidth, low-latency fabric called Infiniband. Infiniband is best suited for highly coupled, highly scalable parallel jobs. Due to the limited number of these nodes, they are to be used only for jobs that are known to scale well and that will benefit from this type of interconnect. OpenMPI can also be used for MPI communications over the Infiniband interconnect.
You will need to load one of the following module sets to set up the appropriate environment variables, depending on whether you want to compile with the Intel or the GNU compilers.
INTEL
module load openmpi intel
GCC
module load openmpi gcc
OpenMPI provides the compiler wrappers mpicc/mpicxx/mpif90/mpif77.
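For example, a Fortran MPI code in a hypothetical file mycode.f90 would be built with
mpif90 -O2 -o mycode mycode.f90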
Currently you can compile and link your MPI code on the development nodes gpc-f101n001 and gpc-f101n002; however, you will not be able to test it there interactively, as these nodes are not connected with Infiniband. Alternatively, you can compile, link, and test an MPI code in an interactive queue session using the os image "centos53develibA", as follows.
qsub -l nodes=2:ib:ppn=8,walltime=12:00:00,os=centos53develibA -I
Once you have compiled your MPI code and would like to test it, use the following command with $PROCS being the number of processors to run on and a.out being your code.
mpirun -np $PROCS -hostfile $PBS_NODEFILE ./a.out
MPI Submission Script
To run your MPI-Infiniband job in a non-interactive (batch) queue, you can use a submission script like the following, remembering to load the appropriate modules.
#!/bin/bash
# MOAB/Torque submission script for SciNet GPC (Infiniband)
#
#PBS -l nodes=2:ib:ppn=8,walltime=1:00:00,os=centos53computeA
#PBS -N testib

# INTEL & OPENMPI ENVIRONMENT VARIABLES
module load intel openmpi

# DIRECTORY TO RUN
cd /scratch/USER/SOMEDIRECTORY

# EXECUTION COMMAND
mpirun -np 16 -hostfile $PBS_NODEFILE ./a.out