GPU Devel Nodes

Log in with ssh to login.scinet.utoronto.ca (login.scinet), and from there connect to the development node arc01.
arc09, with a single Intel Xeon Phi and an NVIDIA Tesla K20, is also available for testing these newer technologies.
For full details see  here . 

Compute
To access the other 7 compute nodes with GPUs, you need to use the queue, as on the standard GPC compute nodes.
Currently the nodes are scheduled by complete node (8 cores and 2 GPUs), with a maximum walltime of 48 hours.
For an interactive job use

qsub -l nodes=1:ppn=8:gpus=2,walltime=48:00:00 -q arc -I

or for a batch job use

qsub script.sh 

where script.sh is
<source lang="bash">
#!/bin/bash
# Torque submission script for SciNet ARC
#
#PBS -l nodes=2:ppn=8:gpus=2,walltime=1:00:00
#PBS -N GPUtest
#PBS -q arc
cd "$PBS_O_WORKDIR"

# EXECUTION COMMAND; -np = nodes*ppn
mpirun -np 16 ./a.out
</source>
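The comment in the script above notes that -np equals nodes*ppn. Rather than hard-coding 16, the count can be derived inside the job from the node file that Torque provides in $PBS_NODEFILE, which lists one line per allocated core. A minimal sketch, where the helper name count_ranks is hypothetical and not part of the SciNet setup:

```bash
#!/bin/bash
# Hypothetical helper (an illustration, not part of the SciNet setup):
# count the MPI ranks available to a job by counting the non-empty lines
# of a Torque node file. Torque writes one line per allocated core to the
# file named in $PBS_NODEFILE.
count_ranks() {
    local nodefile="$1"
    # grep -c prints the number of matching lines; '.' matches any non-empty line.
    grep -c . "$nodefile"
}

# Inside a running job, use the count in place of the hard-coded value.
# Outside a job $PBS_NODEFILE is unset, so nothing is launched.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    mpirun -np "$(count_ranks "$PBS_NODEFILE")" ./a.out
fi
```

With nodes=2:ppn=8 the node file should contain 16 lines, which matches the value used in the script above.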
To check running jobs on the GPU nodes only, use

showq -w class=arc

Software
The same software installed on the GPC is available on ARC using the same modules framework. 
See  here for full details.

Programming Frameworks
Currently there are four programming frameworks to use: NVIDIA's CUDA framework, PGI's CUDA Fortran, PGI's implementation of OpenACC, or OpenCL.

NVIDIA toolkit
CUDA
The currently installed CUDA Toolkits are 3.2, 4.0, 4.1 (default), and 4.2.  To use 4.0, for example, load the following module

module load cuda/4.0

Note that to use the full 6GB of memory per GPU, CUDA 3.2 or newer must be used.
The CUDA driver is installed locally; the CUDA Toolkits, however, are installed in

/scinet/arc/cuda-$VERSION/

The environment variable $SCINET_CUDA_INSTALL is set when a cuda module is loaded and points to the
install location.  This is useful when setting up makefiles. If you use the NVIDIA_SDK
build environment, modify the NVIDIA_SDK/C/common/common.mk file accordingly:

CUDA_INSTALL_PATH = $SCINET_CUDA_INSTALL 
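Because $SCINET_CUDA_INSTALL only exists after a cuda module has been loaded, a build script may want to check for it before invoking make. A minimal sketch of such a check; the function name cuda_install_path is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print the CUDA install location, or fail with a hint
# when no cuda module is loaded (SCINET_CUDA_INSTALL is then unset or empty).
cuda_install_path() {
    if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
        echo "SCINET_CUDA_INSTALL is not set; load a cuda module first (e.g. module load cuda/4.0)" >&2
        return 1
    fi
    echo "$SCINET_CUDA_INSTALL"
}
```

One possible use is to pass the value on the make command line, for example make CUDA_INSTALL_PATH="$(cuda_install_path)", since variables given on the command line take precedence over assignments inside a makefile.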

The NVIDIA CUDA compiler is called nvcc. It uses gcc/4.4.6 by default for CUDA < 4.1, while cuda/4.2 uses gcc/4.6.1.
You'll have to let the CUDA compiler know about the capabilities of the Fermi graphics card by supplying the flag

-arch=sm_13
 or 
-arch=sm_20
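
The two values correspond to GPU compute capabilities: sm_13 targets compute capability 1.3 and sm_20 targets 2.0, the capability of Fermi cards. As a small sketch, a build script could map a capability string to the matching flag. The function name arch_flag is made up for illustration, and only the two values mentioned here are handled:

```bash
#!/bin/bash
# Hypothetical helper: translate a compute capability into the nvcc -arch flag.
# Only the two capabilities discussed on this page are covered.
arch_flag() {
    case "$1" in
        1.3) echo "-arch=sm_13" ;;
        2.0) echo "-arch=sm_20" ;;
        *)   echo "unsupported compute capability: $1" >&2
             return 1 ;;
    esac
}
```

A compile line could then read nvcc $(arch_flag 2.0) -o a.out code.cu, where code.cu stands in for your own source file.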
NVIDIA Toolkit
For cuda versions 4.0, 4.1, and 4.2, the CUDA SDK can be copied into your home directory from: 

/scinet/arc/src/gpucomputingsdk_4.0.13_linux.run
/scinet/arc/src/gpucomputingsdk_4.1.28_linux.run
/scinet/arc/src/gpucomputingsdk_4.2.9_linux.run

respectively.
However, for cuda 5.0, the sdk code samples can be copied from the directory

$SCINET_CUDA_INSTALL/samples/

NOTE: Not all of the CUDA and OpenCL examples will compile as many require OpenGL graphic libraries not installed on the nodes.
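
The version-to-location mapping above can be captured in a small lookup so that a setup script copies the right SDK for the cuda module in use. This is only a sketch: the function name sdk_source is hypothetical, and the paths are the ones listed above.

```bash
#!/bin/bash
# Hypothetical helper: print where the SDK or code samples for a given
# cuda version can be copied from, using the locations listed on this page.
sdk_source() {
    case "$1" in
        4.0) echo "/scinet/arc/src/gpucomputingsdk_4.0.13_linux.run" ;;
        4.1) echo "/scinet/arc/src/gpucomputingsdk_4.1.28_linux.run" ;;
        4.2) echo "/scinet/arc/src/gpucomputingsdk_4.2.9_linux.run" ;;
        5.0)
            # The 5.0 samples live under the module's install location,
            # so a cuda module must be loaded first.
            if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
                echo "SCINET_CUDA_INSTALL is not set; load the cuda module first" >&2
                return 1
            fi
            echo "$SCINET_CUDA_INSTALL/samples/" ;;
        *)
            echo "no SDK location known for cuda $1" >&2
            return 1 ;;
    esac
}
```

For example, cp "$(sdk_source 4.2)" "$HOME/" would copy the 4.2 installer into your home directory.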

OpenCL
As of 3.0, OpenCL is included in the CUDA Toolkit so loading the CUDA module is all that is required.

PGI compilers
As of July 2012, the PGI suite of compilers is installed on the ARC.  These can be accessed by 

$  module load gcc/4.6.1 pgi/12.6
If you use the older pgi/12.5, gcc/4.4.6 is required instead; it is used, for instance, in the CUDA parts of the PGI compilers. These compilers use their own cuda installation, so you do not need to load an additional cuda module. By default, they use a cuda 4.1 installation, but you can request cuda 4.2 using the -Mcuda=4.2 option.
The compilation commands are pgcc, pgcpp and pgfortran for C, C++ and Fortran, respectively. As usual, we advise compiling with optimization using the flags

-O4 -fastsse

The compilers will then optimize for the specific machine that you are compiling on.
The PGI compilers support OpenMP as well through the compile and link flags

-mp
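
Putting the compiler names and flags above together, a full compile line can be assembled from the source file name. The sketch below only prints the command rather than running it. The function names and the extension-to-compiler mapping (.c, .cpp, .f90, .cuf) are assumptions made for illustration:

```bash
#!/bin/bash
# Hypothetical helper: choose the PGI compiler from the file extension.
# The extension mapping is an assumption, not a SciNet convention.
pgi_compiler() {
    case "$1" in
        *.c)         echo "pgcc" ;;
        *.cpp)       echo "pgcpp" ;;
        *.f90|*.cuf) echo "pgfortran" ;;
        *)           echo "unknown source type: $1" >&2
                     return 1 ;;
    esac
}

# Hypothetical helper: print a compile command with the recommended
# optimization flags; pass "openmp" as second argument to add -mp.
pgi_command() {
    local src="$1" mode="${2:-}" compiler flags
    compiler=$(pgi_compiler "$src") || return 1
    flags="-O4 -fastsse"
    if [ "$mode" = "openmp" ]; then
        flags="$flags -mp"
    fi
    # ${src%.*} strips the extension to form the executable name.
    echo "$compiler $flags -o ${src%.*} $src"
}
```

Calling pgi_command heat.f90 openmp, with heat.f90 as a placeholder file name, prints the line to run after loading the pgi module.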

CUDA Fortran
The PGI Fortran compiler (pgfortran, also pgf77 and pgf90) understands CUDA extensions to Fortran. 
The compiler automatically recognizes these extensions in source files with the file extension .cuf. Otherwise, you have to specify 

-Mcuda=4.1

OpenACC
OpenACC is a compiler-directive approach to GPGPU programming. The PGI compilers (c, c++ and fortran) have a partial implementation of this open specification. To switch this on, use the options

-acc -ta=nvidia -Mcuda=4.1
More documentation
Manuals are on the  Tutorials and Manuals page.

Other compilers
gcc,g++,gfortran - GNU compiler (nvcc needs either the gcc-4.4 or gcc-4.6 module loaded to work correctly)
icc,icpc,ifort - Intel compiler
Debuggers
ddt - Allinea's graphical DDT debugger, in the ddt module. The most recent version, ddt 4.0, supports cuda 4.0, 4.1, 4.2 and 5.0.
cuda-gdb - Nvidia text based gdb variant, part of the cuda module.

Note that to debug both host and cuda device code, you have to pass the -g -G pair of flags to nvcc.
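
One way to keep debug and regular builds apart is to select the nvcc flags by build type. A minimal sketch, assuming a Fermi target (-arch=sm_20); the function name nvcc_flags and the build type names are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print nvcc flags for a build type.
# "debug" adds -g (host debug info) and -G (device debug info).
nvcc_flags() {
    case "${1:-release}" in
        debug)   echo "-g -G -arch=sm_20" ;;
        release) echo "-arch=sm_20" ;;
        *)       echo "unknown build type: $1" >&2
                 return 1 ;;
    esac
}
```

A debug build could then be produced with nvcc $(nvcc_flags debug) -o a.out code.cu and inspected with cuda-gdb ./a.out, where code.cu is a placeholder for your own source file.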

MPI
The GPC MPI packages can be used on this system. See the GPC section on MPI  for more details.
While these mpi packages should work with the PGI compilers as well, this has not been tested and standard wrappers like mpif90 may not work.

Alternatively, for mpi compilations with the PGI compilers, you can load the mpich1 mpi implementation with 
module load mpich1/pgi
after which you can use the option 
-Mmpi
 or the wrapper scripts mpicc, mpiCC and mpif90, as well as mpirun.  

Driver Version
The current NVIDIA driver version installed is 295.41.

Documentation
CUDA
google "CUDA"
OpenCL
see above
Further Info
User Codes
Please discuss and put any relevant information/problems/best practices you have encountered when using/developing for CUDA and/or OpenCL

GPU Development Cluster

Installed	April 2011
Operating System	Linux Centos 6.0
Number of Nodes	8
Interconnect	DDR Infiniband
RAM/Node	48 GB
Cores/Node	8 with 2xGPUs
Login/Devel Node	arc01 (from `login.scinet`)
Vendor Compilers	nvcc,pgcc,icc,gcc
Queue Submission	Torque

