GPU Devel Nodes
GPU Development Cluster | |
---|---|
Installed | April 2011 |
Operating System | Linux |
Interconnect | InfiniBand |
RAM/Node | 48 GB |
Cores/Node | 8 |
Login/Devel Node | arc01 (from login.scinet) |
Vendor Compilers | gcc, nvcc |
Each node has two quad-core Intel Xeon X5550 2.67 GHz CPUs and 48 GB of RAM, along with two NVIDIA M2070 (Fermi) GPUs, each with 6 GB of RAM.
Login
First log in via ssh with your SciNet account at login.scinet.utoronto.ca. From there you can proceed to arc01, which is the GPU development node.
Access to these machines is currently controlled. Please email support@scinet.utoronto.ca for access.
Compile/Devel Node
Software
The same software installed on the GPC is available on ARC using the same modules framework. See here for full details.
Driver Version
The current NVIDIA driver version installed is 270.40.
Programming Frameworks
Two programming frameworks are currently available: NVIDIA's CUDA framework and OpenCL.
CUDA
The CUDA Toolkits currently installed are 3.0, 3.1, 3.2 (default) and 4.0. To use 3.2, load the following module:
module load cuda/3.2
Note that to use the full 6 GB of memory per GPU, at least CUDA 3.2 must be used.
The CUDA driver is installed locally on each node, whereas the CUDA Toolkits are installed in:
/project/scinet/arc/cuda-$VERSION/
The variable $SCINET_CUDA_INSTALL is set when a cuda module is loaded and points to the install location. This is useful when setting up your makefile. If you use the NVIDIA_SDK makefiles, modify the NVIDIA_SDK/C/common/common.mk file accordingly:
CUDA_INSTALL_PATH ?= $SCINET_CUDA_INSTALL
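Because the makefile setting above depends on a cuda module having been loaded, it can help to check the variable before building. The following is a minimal sketch, not part of the SciNet setup: the function name cuda_install_path is hypothetical, and it only prints the make command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper (not provided by SciNet): print the toolkit location
# exported by the cuda module, or fail with a hint if no module is loaded.
cuda_install_path() {
    if [ -z "${SCINET_CUDA_INSTALL:-}" ]; then
        echo "SCINET_CUDA_INSTALL is not set; load a cuda module first, e.g. 'module load cuda/3.2'" >&2
        return 1
    fi
    echo "$SCINET_CUDA_INSTALL"
}

# Show how the location could be handed to make. A variable given on the
# make command line takes precedence over the '?=' default in common.mk.
if path=$(cuda_install_path); then
    echo "make CUDA_INSTALL_PATH=$path"
fi
```

With cuda/3.2 loaded, this should print a make command pointing at the 3.2 install directory. Without a module loaded, it prints only the hint.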
OpenCL
As of 3.0, OpenCL is included in the CUDA Toolkit, so loading the CUDA module is all that is required.
Compilers
- nvcc -- NVIDIA CUDA compiler
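As an illustration of a typical nvcc invocation, the sketch below compiles a single CUDA source file for the Fermi GPUs in these nodes. The file name saxpy.cu and the flag choices are assumptions for the example. The -arch=sm_20 flag targets compute capability 2.0, which is what the M2070 provides.

```shell
#!/bin/bash
# Assumed flags: optimize and target compute capability 2.0 (Fermi, e.g. M2070).
NVCC_FLAGS="-O2 -arch=sm_20"

# saxpy.cu is a hypothetical source file; substitute your own.
SRC=saxpy.cu

if ! command -v nvcc >/dev/null 2>&1; then
    # nvcc is only on the PATH after a cuda module has been loaded.
    echo "nvcc not found; run 'module load cuda/3.2' first" >&2
elif [ ! -f "$SRC" ]; then
    echo "$SRC not found in $(pwd)" >&2
else
    # Word splitting of NVCC_FLAGS is intentional here.
    nvcc $NVCC_FLAGS -o "${SRC%.cu}" "$SRC"
fi
```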
MPI
The GPC MPI packages can be used on this system. See the GPC section on MPI for more details.
Compute Nodes
To access the other 7 compute nodes with GPUs, you need to use the queue, as with the standard GPC compute nodes. Currently the nodes are scheduled by node, with a limit of 2 nodes per job and a maximum walltime of 48 hours. For example, to request an interactive session on one node:
qsub -l nodes=1:ppn=8:gpu=2,walltime=48:00:00 -I
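For non-interactive work, the same resource request can go in a batch script. The sketch below writes such a script to a file. It assumes batch jobs accept the same resource string as the interactive example; the names gpu_job.sh, gpu_test and my_gpu_program are hypothetical, and the one-hour walltime is only an example within the 48-hour limit.

```shell
#!/bin/bash
# Write a batch job script (file, job and program names are hypothetical).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=8:gpu=2,walltime=01:00:00
#PBS -N gpu_test

# Jobs do not start in the submission directory, so change to it first.
cd "$PBS_O_WORKDIR"

# Load the same CUDA Toolkit version the program was built with.
module load cuda/3.2
./my_gpu_program
EOF

# Then submit it from arc01 with:
#   qsub gpu_job.sh
```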
Documentation
- CUDA
- google "CUDA"
- OpenCL
- see above
Further Info
User Codes
Please post any relevant information, problems, or best practices you have encountered when using or developing for CUDA and/or OpenCL.