GPU Devel Nodes

{| style="border-spacing: 8px; width:100%"
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
|}

{{Infobox Computer
|image=[[Image:Tesla_S2070_3qtr.gif|center|300px|thumb]]
|name=GPU Development Cluster
|installed=April 2011
|operatingsystem=Linux RHEL 5.4
|loginnode=arc01 (from <tt>login.scinet</tt>)
|numberofnodes=8
|rampernode=48 GB
|corespernode=8 with 2xGPUs
|interconnect=Infiniband
|vendorcompilers=nvcc (gcc,icc)
|queuetype=Torque
}}
 
  
<span style="color:#772222">The ARC GPUs have been decommissioned. The head node, arc01, is still up; however, for GPU computations users are encouraged to move to the [[Gravity]] cluster. For visualization, new [[Visualization Nodes]] are being set up.</span>

The GPU cluster consists of 8 x86_64 nodes, each with two quad-core Intel Xeon X5550 2.67GHz CPUs and 48GB of RAM per node. Each node has two NVIDIA Tesla M2070 GPUs with CUDA Capability 2.0 (Fermi), each with 448 CUDA cores @ 1.15GHz and 6 GB of RAM. The nodes are all connected with DDR Infiniband for MPI communications and Gigabit Ethernet for disk I/O to the SciNet GPFS filesystems. In total this cluster contains 64 x86_64 cores with 384 GB of system RAM and 16 GPUs with 96 GB of GPU RAM.
 
 
 
== Nodes ==
 
 
 
=== Login ===
 
First log in via ssh with your SciNet account at <tt>login.scinet.utoronto.ca</tt>, and from there you can proceed to '''<tt>arc01</tt>''', which is the GPU development node.
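
For example, the two hops might look like this (<tt>USERNAME</tt> is a placeholder for your own SciNet user name):

<pre>
ssh USERNAME@login.scinet.utoronto.ca
ssh arc01
</pre>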
 
 
 
Access to these machines is currently controlled. Please email support@scinet.utoronto.ca for access.
 
 
 
=== Devel ===
 
 
 
As mentioned, '''<tt>arc01</tt>''' is the head/development node for interactive use. This node is for compiling, short testing, and submitting batch jobs to the compute nodes. It is a shared resource, so treat it accordingly and use the queue and compute nodes for long or large computations.
 
 
 
=== Compute ===
 
 
 
To access the other 7 compute nodes with GPUs you need to use the queue, similar to the standard GPC compute nodes. Currently the nodes are scheduled by complete node (8 cores and 2 GPUs), with a maximum walltime of 48 hours.
 
 
 
For an interactive job use
 
<pre>
qsub -l nodes=1:ppn=8:gpus=2,walltime=48:00:00 -I
</pre>
 
 
 
or for a batch job use
 
<pre>
qsub script.sh
</pre>

where <tt>script.sh</tt> is
 
<source lang="bash">
#!/bin/bash
# Torque submission script for SciNet ARC
#
# request 2 full nodes (8 cores and 2 GPUs each) for 1 hour
#PBS -l nodes=2:ppn=8:gpus=2,walltime=1:00:00
#PBS -N GPUtest

cd $PBS_O_WORKDIR

# EXECUTION COMMAND; -np = nodes*ppn
mpirun -np 16 ./a.out
</source>
 
 
 
== Software ==
 
 
 
The same software installed on the GPC is available on ARC using the same modules framework.
 
See [[GPC_Quickstart#Modules_and_Environment_Variables | here]] for full details.
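
For example, the usual module commands work the same way here as on the GPC (the <tt>cuda/3.2</tt> module is described below):

<pre>
module avail            # list available modules
module load cuda/3.2    # load the default CUDA toolkit
module list             # show currently loaded modules
</pre>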
 
 
 
==Programming Frameworks==
 
 
 
Currently there are two programming frameworks available: NVIDIA's CUDA framework and OpenCL.
 
 
 
=== CUDA ===
 
 
 
The currently installed CUDA Toolkits are 3.0, 3.1, 3.2 (default) and 4.0RC2. To use 3.2, just load the following module
 
 
 
<pre>
module load cuda/3.2
</pre>
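
After loading the module you can check which toolkit is on your path with:

<pre>
nvcc --version
</pre>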
 
 
 
Note that to use the full 6 GB of memory per GPU, CUDA 3.2 or newer must be used.
 
 
 
The CUDA driver is installed locally; however, the CUDA Toolkits are installed in
 
 
 
<pre>
/project/scinet/arc/cuda-$VERSION/
</pre>
 
 
 
The environment variable <tt>$SCINET_CUDA_INSTALL</tt> is set when a cuda module is loaded and points to the install location. This is useful when setting up makefiles; if you use the NVIDIA_SDK build environment, modify the <tt>NVIDIA_SDK/C/common/common.mk</tt> file accordingly.
 
 
 
<pre>
CUDA_INSTALL_PATH = $SCINET_CUDA_INSTALL
</pre>
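
As a quick check, the variable can also be inspected and used directly from the shell; for example (the <tt>lib64</tt> subdirectory is an assumption about the 64-bit toolkit layout):

<pre>
echo $SCINET_CUDA_INSTALL              # install location of the loaded toolkit
ls $SCINET_CUDA_INSTALL/include        # CUDA headers
ls $SCINET_CUDA_INSTALL/lib64          # 64-bit CUDA runtime libraries
</pre>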
 
 
 
=== OpenCL ===
 
 
As of version 3.0, OpenCL is included in the CUDA Toolkit, so loading the CUDA module is all that is required.
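
As a rough sketch (the source file name is illustrative), an OpenCL host program can be compiled against the <tt>CL/cl.h</tt> headers shipped with the toolkit and linked against the <tt>libOpenCL</tt> library that comes with the locally installed driver:

<pre>
gcc ocl_test.c -I$SCINET_CUDA_INSTALL/include -lOpenCL -o ocl_test
</pre>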
 
 
 
===Compilers===
 
 
 
* '''nvcc''' -- Nvidia compiler
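
As a minimal illustration (the file name is hypothetical), <tt>nvcc</tt> compiles the device code itself and passes the host code to an underlying host compiler such as gcc:

<pre>
nvcc -arch=sm_20 hello.cu -o hello    # sm_20 targets the Fermi (CC 2.0) M2070s
</pre>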
 
 
 
===MPI===
 
 
 
The GPC MPI packages can be used on this system. See the GPC section on [[ GPC_Quickstart#MPI |MPI ]] for more details.
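
A common pattern for mixed MPI + CUDA codes (the file names here are illustrative, not a supplied example, and the <tt>lib64</tt> path assumes the 64-bit toolkit layout) is to compile the CUDA parts with <tt>nvcc</tt> and link with the MPI compiler wrapper:

<pre>
nvcc -arch=sm_20 -c kernel.cu -o kernel.o
mpicxx main.cpp kernel.o -L$SCINET_CUDA_INSTALL/lib64 -lcudart -o gpu_mpi_test
</pre>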
 
 
 
=== Driver Version ===
 
 
 
The currently installed NVIDIA driver version is 270.40.
 
 
 
== Documentation ==
 
* CUDA
 
** google "CUDA"
 
 
 
* OpenCL
 
** see above
 
 
 
== Further Info ==
 
 
 
 
 
 
 
== User Codes ==
 
 
 
Please discuss and put any relevant information/problems/best practices you have encountered when using/developing for CUDA and/or OpenCL here.
 
