GPU Devel Nodes

GPU Development Cluster
Installed	April 2011
Operating System	Linux CentOS 5.3
Interconnect	InfiniBand
Ram/Node	48 GB
Cores/Node	8 with 2xGPUs
Login/Devel Node	arc01 (from login.scinet)
Vendor Compilers	nvcc (gcc,icc)
Queue Submission	Torque

The GPU cluster consists of 8 x86_64 nodes, each with two quad-core Intel Xeon X5550 2.67GHz CPUs and 48GB of RAM. Each node has two NVIDIA Tesla M2070 GPUs with CUDA Capability 2.0 (Fermi), each with 448 CUDA cores @ 1.15GHz and 6 GB of RAM. The nodes are all connected by DDR InfiniBand for MPI communications and by Gigabit Ethernet for disk I/O to the SciNet GPFS filesystems. In total the cluster contains 64 x86_64 cores with 384 GB of system RAM and 16 GPUs with 96 GB of GPU RAM.
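The totals quoted above follow directly from the per-node figures. The short shell sketch below simply redoes that arithmetic; the variable names are illustrative and not part of any SciNet tooling.

```shell
#!/bin/bash
# Per-node figures taken from the hardware description above.
nodes=8
cores_per_node=8       # two quad-core CPUs
ram_per_node_gb=48
gpus_per_node=2
gpu_ram_gb=6           # per GPU

total_cores=$((nodes * cores_per_node))
total_ram_gb=$((nodes * ram_per_node_gb))
total_gpus=$((nodes * gpus_per_node))
total_gpu_ram_gb=$((total_gpus * gpu_ram_gb))

echo "cores=$total_cores ram=${total_ram_gb}GB gpus=$total_gpus gpu_ram=${total_gpu_ram_gb}GB"
```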

Nodes

Login

First log in via ssh with your SciNet account at login.scinet.utoronto.ca; from there you can proceed to arc01, which is the GPU development node.

Access to these machines is currently controlled. Please email support@scinet.utoronto.ca for access.

Devel

As mentioned, arc01 is the head/development node for interactive use. This node is for compiling, short tests, and submitting batch jobs to the compute nodes. It is a shared resource, so treat it accordingly and use the queue and compute nodes for long or large computations.

Compute

To access the other 7 compute nodes with GPUs you need to use the queue, similar to the standard GPC compute nodes. Currently the nodes are scheduled by complete node (8 cores and 2 GPUs), with a maximum walltime of 48 hours.

For an interactive job use

qsub -l nodes=1:ppn=8:gpus=2,walltime=48:00:00 -I

or for a batch job use

qsub script.sh

where script.sh is <source lang="bash">
#!/bin/bash
# Torque submission script for SciNet ARC
#PBS -l nodes=2:ppn=8:gpus=2,walltime=1:00:00
#PBS -N GPUtest

cd $PBS_O_WORKDIR

# EXECUTION COMMAND; -np = nodes*ppn
mpirun -np 16 ./a.out
</source>
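The script comment notes that -np should equal nodes*ppn. Rather than hard-coding 16, the count could be derived inside the job. The following is a minimal sketch, assuming Torque sets $PBS_NODEFILE to a file with one line per allocated core (its usual behaviour inside a job); the helper name count_ranks is hypothetical.

```shell
#!/bin/bash
# count_ranks: print the number of MPI ranks to start for this job.
# Assumption: $PBS_NODEFILE lists one hostname per allocated core.
# Outside a job (variable unset or file unreadable) it falls back to 1.
count_ranks() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # wc -l may pad its output; arithmetic expansion strips the padding
        echo $(( $(wc -l < "$PBS_NODEFILE") ))
    else
        echo 1
    fi
}

NP=$(count_ranks)
# Only print the command here; a real job script would run it instead.
echo "mpirun -np $NP ./a.out"
```

With nodes=2:ppn=8 the node file would be expected to hold 16 lines, which matches the hard-coded value in the script above.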

Software

The same software installed on the GPC is available on ARC using the same modules framework. See here for full details.

Programming Frameworks

Currently there are two programming frameworks to use, NVIDIA's CUDA framework or OpenCL.

CUDA

The current installed CUDA Toolkits are 3.0, 3.1, 3.2 (default) and 4.0RC2. To use 3.2 just add the following module

module load cuda/3.2

Note that to use the full 6GB of memory per GPU, CUDA 3.2 or newer must be used.
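After loading the module, it can be worth confirming that nvcc is actually on your PATH before building. A small sketch, in which the helper name have_tool is hypothetical:

```shell
#!/bin/bash
# have_tool: succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool nvcc; then
    # Report which toolkit version the loaded module provides.
    nvcc --version
else
    echo "nvcc not found; run 'module load cuda/3.2' first"
fi
```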

The CUDA driver is installed locally; the CUDA Toolkits, however, are installed in:

/project/scinet/arc/cuda-$VERSION/

The environment variable $SCINET_CUDA_INSTALL is set when a cuda module is loaded and points to the install location. This is useful when setting up makefiles; if you use the NVIDIA_SDK build environment, modify the NVIDIA_SDK/C/common/common.mk file accordingly:

CUDA_INSTALL_PATH = $(SCINET_CUDA_INSTALL)
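Because the makefile setting depends on the environment variable, a build fails in confusing ways if no cuda module has been loaded. A build wrapper could check for this first. This is a sketch only; the function name require_cuda_install is hypothetical.

```shell
#!/bin/bash
# require_cuda_install: print the CUDA install directory, or fail with a
# message if SCINET_CUDA_INSTALL is unset or does not point to a directory.
require_cuda_install() {
    if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
        echo "SCINET_CUDA_INSTALL is not set; load a cuda module first (e.g. module load cuda/3.2)" >&2
        return 1
    fi
    if [ ! -d "$SCINET_CUDA_INSTALL" ]; then
        echo "SCINET_CUDA_INSTALL=$SCINET_CUDA_INSTALL is not a directory" >&2
        return 1
    fi
    echo "$SCINET_CUDA_INSTALL"
}
```

A wrapper script could then run `CUDA_DIR=$(require_cuda_install) && make CUDA_INSTALL_PATH="$CUDA_DIR"`. Variables given on the make command line take precedence over ordinary assignments inside a makefile.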

NVIDIA SDK

The latest CUDA SDK can be copied into your home directory from:

/scinet/arc/src/gpucomputingsdk_4.0.13_linux.run

NOTE: Not all of the CUDA and OpenCL examples will compile as many require OpenGL graphic libraries not installed on the nodes.

OpenCL

As of 3.0, OpenCL is included in the CUDA Toolkit so loading the CUDA module is all that is required.

Compilers

nvcc - NVIDIA CUDA compiler (uses gcc-4.4 by default)
gcc - GNU compiler (nvcc needs to have either the gcc-4.4 or gcc-4.6 module loaded to work correctly)
icc - Intel compiler

Note that you'll have to let the cuda compiler know about the capabilities of the Fermi graphics card by supplying the flag

-arch=sm_20
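The sm_XY names accepted by -arch follow the compute capability X.Y of the card, so the capability 2.0 GPUs described at the top of this page correspond to sm_20. The sketch below makes that mapping explicit; the helper arch_flag and the file name hello.cu are hypothetical.

```shell
#!/bin/bash
# arch_flag: turn a compute capability such as "2.0" into an nvcc -arch flag.
arch_flag() {
    local cap="$1"
    case "$cap" in
        [0-9].[0-9])
            # "2.0" -> major "2", minor "0" -> "-arch=sm_20"
            echo "-arch=sm_${cap%.*}${cap#*.}"
            ;;
        *)
            echo "unrecognised compute capability: $cap" >&2
            return 1
            ;;
    esac
}

# The command is only printed here, not executed.
echo "nvcc $(arch_flag 2.0) -o hello hello.cu"
```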

Debuggers

cuda-gdb - Nvidia text based gdb variant, part of the cuda module.
ddt - Allinea's graphical DDT debugger, in the ddt module (DDT does not support cuda 4.0 yet).

Note that to debug both host and cuda device code, you have to give the

-g -G

pair of flags to nvcc.

MPI

The GPC MPI packages can be used on this system. See the GPC section on MPI for more details.

Driver Version

The current NVIDIA driver version installed is 270.40.

Documentation

CUDA
- google "CUDA"

OpenCL
- see above

Further Info

User Codes

Please discuss and post any relevant information, problems, or best practices you have encountered when using or developing for CUDA and/or OpenCL.