Gravity

{{Infobox Computer
|image=[[Image:Ibm idataplex dx360 m4.jpg|center|300px|thumb]]
|name=Gravity
|installed=December 2012
|operatingsystem=Linux CentOS 6.2
|loginnode= gravity01 (from <tt>login.scinet</tt>)
|nnodes=49
|interconnect=QDR Infiniband
|rampernode=32 Gb
|corespernode=12 with 2xM2090GPUs
|vendorcompilers=nvcc, pgcc, icc, gcc
|queuetype=Torque
}}
  
The Gravity cluster consists of 49 x86_64 nodes, each with two hex-core Intel Xeon (Sandy Bridge) E5-2620 2.0GHz CPUs and 32GB of RAM per node.  Each node has two NVIDIA Tesla M2090 GPUs with CUDA Capability 2.0 (Fermi), each with 512 CUDA cores and 6 GB of RAM.  The nodes are interconnected with 3:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet GPFS filesystems.  In total this cluster contains 588 x86_64 cores, 1,568 GB of system RAM, and 98 GPUs with 588 GB of GPU RAM.
  
 
Note that SciNet has a mailing lists for people interested in GPGPU computing. To receive information on courses, workshop and other GPGPU related events, sign up at https://support.scinet.utoronto.ca/mailman/listinfo/scinet-gpgpu.
 
Note that SciNet has a mailing lists for people interested in GPGPU computing. To receive information on courses, workshop and other GPGPU related events, sign up at https://support.scinet.utoronto.ca/mailman/listinfo/scinet-gpgpu.
== Nodes ==

=== Login ===

First login via ssh with your SciNet account at <tt>login.scinet.utoronto.ca</tt>, and from there you can proceed to '''<tt>gravity01</tt>''', which is the GPU development node.

Access to these machines is currently controlled. Please email support@scinet.utoronto.ca for access.
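For example, assuming your SciNet username is <tt>USER</tt> (a placeholder), a typical login sequence looks like:

<pre>
ssh -X USER@login.scinet.utoronto.ca   # SciNet gateway
ssh -X gravity01                       # Gravity development node
</pre>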
  
=== Devel ===

As mentioned, '''<tt>gravity01</tt>''' is the head/development node for interactive use.  This node is for compiling, short testing, and submitting batch jobs to the compute nodes.  It is a shared resource, so treat it accordingly and use the queue and the compute nodes for long or large computations.
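A quick way to see the state of the two GPUs on the development node is the NVIDIA driver utility (a minimal sketch; it assumes <tt>nvidia-smi</tt> is on the default path, as is normally the case where the driver is installed locally):

<pre>
nvidia-smi    # lists the M2090s, their memory usage and any running processes
</pre>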
=== Compute ===

To access the other 48 compute nodes with GPUs you need to use the queue, similar to the standard GPC compute nodes.
Currently the nodes are scheduled by complete node (12 cores and 2 GPUs) with a maximum walltime of 48 hours.
  
 
For an interactive job use

<pre>
qsub -l nodes=1:ppn=12:gpus=2,walltime=48:00:00 -q gravity -I
</pre>
  
or for a batch job use

<pre>
qsub script.sh
</pre>

where script.sh is

<source lang="bash">
#!/bin/bash
# Torque submission script for Gravity
#
#PBS -l nodes=2:ppn=12:gpus=2,walltime=1:00:00
#PBS -N GPUtest
#PBS -q gravity

cd $PBS_O_WORKDIR

# EXECUTION COMMAND; -np = nodes*ppn
mpirun -np 24 ./a.out
</source>
  
 
To check running jobs on the GPU nodes only, use

<pre>
showq -w class=gravity
</pre>
  
 
== Software ==

The same software installed on the GPC is available on Gravity using the same modules framework.
See [[GPC_Quickstart#Modules_and_Environment_Variables | here]] for full details.
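For instance, a minimal sketch of setting up an environment on gravity01 with the modules framework (the <tt>gcc/4.6.1</tt> and <tt>cuda/4.2</tt> versions are the ones referred to elsewhere on this page):

<pre>
module avail                     # list all available software
module load gcc/4.6.1 cuda/4.2   # load a compiler and a CUDA toolkit
module list                      # show currently loaded modules
</pre>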
  
 
==Programming Frameworks==

Currently there are four programming frameworks to use: NVIDIA's CUDA framework, PGI's CUDA Fortran, PGI's implementation of OpenACC, and OpenCL.
See the SciNet [[GPU_Devel_Nodes#Programming_Frameworks | ARC]] page for more details on the GPU-specific software environment.
 
 
=== NVIDIA toolkit ===
 
 
 
==== CUDA ====
 
 
 
The currently installed CUDA Toolkits are 3.2, 4.0, 4.1 (default), and 4.2.  To use 4.0, just add the following module:
 
 
 
<pre>
 
module load cuda/4.0
 
</pre>
 
 
 
Note that to use the full 6GB of memory per GPU, CUDA 3.2 or newer must be used.
 
 
 
The CUDA driver is installed locally; however, the CUDA Toolkits are installed in
 
 
 
<pre>
 
/scinet/arc/cuda-$VERSION/
 
</pre>
 
 
 
The environment variable $SCINET_CUDA_INSTALL is set when a cuda module is loaded and points to the install location.  This is useful when setting up makefiles; if you use the NVIDIA_SDK build environment, modify the NVIDIA_SDK/C/common/common.mk file accordingly:
 
 
 
<pre>
 
CUDA_INSTALL_PATH = $SCINET_CUDA_INSTALL
 
</pre>
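Outside of the SDK makefiles, the same variable can be used directly on a compile line. A minimal sketch (assuming the usual 64-bit <tt>lib64</tt> layout of the CUDA toolkit and a hypothetical <tt>mycode.c</tt> that calls the CUDA runtime API):

<pre>
module load gcc/4.6.1 cuda/4.2
gcc -O2 -I$SCINET_CUDA_INSTALL/include -o mycode mycode.c -L$SCINET_CUDA_INSTALL/lib64 -lcudart
</pre>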
 
 
 
The NVIDIA CUDA compiler is called <tt>nvcc</tt> (it uses gcc/4.4.6 by default for CUDA < 4.1, while cuda/4.2 uses gcc/4.6.1).
 
 
 
You'll have to let the CUDA compiler know about the capabilities of the Fermi graphics cards by supplying the flag
<pre>-arch=sm_13</pre> or <pre>-arch=sm_20</pre>
(the M2090s have compute capability 2.0, so <tt>-arch=sm_20</tt> enables all of their Fermi features).
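For example, a minimal sketch of compiling a hypothetical <tt>kernel.cu</tt> for the M2090s:

<pre>
module load gcc/4.6.1 cuda/4.2
nvcc -arch=sm_20 -O2 -o kernel kernel.cu
</pre>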
 
 
 
==== CUDA SDK ====
 
 
 
The latest CUDA SDK can be copied into your home directory from:
 
 
 
<pre>
 
/scinet/arc/src/gpucomputingsdk_4.0.13_linux.run
 
</pre>
 
 
 
NOTE: Not all of the CUDA and OpenCL examples will compile, as many require OpenGL graphics libraries that are not installed on the nodes.
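A minimal sketch of copying the SDK installer into your home directory and unpacking it (the self-extracting installer prompts for an install location and the path to the CUDA toolkit, so the steps below are indicative only):

<pre>
cp /scinet/arc/src/gpucomputingsdk_4.0.13_linux.run $HOME
cd $HOME
sh gpucomputingsdk_4.0.13_linux.run
</pre>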
 
 
 
==== OpenCL ====
 
 
As of CUDA 3.0, OpenCL is included in the CUDA Toolkit, so loading the cuda module is all that is required.
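As a minimal sketch, a hypothetical <tt>vecadd.c</tt> using the OpenCL API could be built roughly as follows (the headers ship with the toolkit; the exact location of <tt>libOpenCL</tt> depends on the driver installation, so the link step is an assumption):

<pre>
module load gcc/4.6.1 cuda/4.2
gcc -I$SCINET_CUDA_INSTALL/include -o vecadd vecadd.c -lOpenCL
</pre>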
 
 
 
===PGI compilers===
 
 
 
As of July 2012, the PGI suite of compilers is installed on the ARC.  These can be accessed by
 
<pre>$  module load gcc/4.6.1 pgi/12.6</pre>
 
(if you use the older pgi/12.5, gcc/4.4.6 is a requirement, and is used, for instance, in the CUDA parts of the PGI compilers). These compilers use their own cuda installation, so you do not need to load an additional cuda module. By default, they use a cuda 4.1 installation, but you can request cuda 4.2 as well using the <tt>-Mcuda=4.2</tt> option.
 
 
 
The compilation commands are pgcc, pgcpp and pgfortran for C, C++ and Fortran, respectively. As usual, we advise compiling with optimization using the flags
 
<pre>
 
-O4 -fastsse
 
</pre>
 
The compilers will then optimize for the specific machine that you are compiling on.
 
 
 
The PGI compilers support OpenMP as well, through the compile and link flag
 
<pre>
 
-mp
 
</pre>
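Putting these together, a minimal sketch of compiling a hypothetical C code and a hypothetical Fortran code with optimization and OpenMP enabled:

<pre>
module load gcc/4.6.1 pgi/12.6
pgcc      -O4 -fastsse -mp -o mycode   mycode.c
pgfortran -O4 -fastsse -mp -o mycode_f mycode.f90
</pre>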
 
 
 
==== CUDA Fortran ====
 
 
 
The PGI Fortran compiler (<tt>pgfortran</tt>, also <tt>pgf77</tt> and <tt>pgf90</tt>) understands CUDA extensions to Fortran.
The compiler automatically recognizes these extensions for source files with the file extension <tt>.cuf</tt>.  Otherwise, you have to specify
 
<pre>
 
-Mcuda=4.1
 
</pre>
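For example, a minimal sketch with a hypothetical <tt>saxpy.cuf</tt> (recognized automatically) and a hypothetical <tt>mycode.f90</tt> that also contains CUDA Fortran code:

<pre>
module load gcc/4.6.1 pgi/12.6
pgfortran -o saxpy saxpy.cuf                # .cuf files are compiled as CUDA Fortran automatically
pgfortran -Mcuda=4.1 -o mycode mycode.f90   # other extensions need the explicit flag
</pre>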
 
 
 
==== OpenACC ====
 
 
 
OpenACC is a compiler-directive approach to GPGPU programming. The PGI compilers (C, C++ and Fortran) have a partial implementation of this open specification. To switch this on, use the options
 
<pre>-acc -ta=nvidia -Mcuda=4.1</pre>
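For instance, a minimal sketch of compiling a hypothetical <tt>laplace.c</tt> containing OpenACC directives; the additional <tt>-Minfo=accel</tt> flag makes the compiler report which loops it offloaded:

<pre>
module load gcc/4.6.1 pgi/12.6
pgcc -acc -ta=nvidia -Mcuda=4.1 -Minfo=accel -o laplace laplace.c
</pre>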
 
 
 
====More documentation====
 
 
 
Manuals are on the [[Knowledge_Base:_Tutorials_and_Manuals| Tutorials and Manuals]] page.
 
 
 
===Other compilers===
 
 
 
* '''gcc,g++,gfortran''' - GNU compilers (nvcc needs to have either the gcc/4.4.6 or the gcc/4.6.1 module loaded to work correctly)
 
* '''icc,icpc,ifort''' - Intel compiler
 
 
 
===Debuggers===
 
 
 
* '''cuda-gdb''' - NVIDIA's text-based gdb variant, part of the cuda module.
 
* '''ddt''' - Allinea's graphical DDT debugger, in the <tt>ddt</tt> module. The most recent version, ddt 3.2.1, has support for cuda 4.2.
 
 
 
Note that to debug both host and CUDA device code, you have to give the <pre>-g -G</pre> pair of flags to nvcc.
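A minimal sketch of debugging a hypothetical <tt>mycode.cu</tt> with either tool:

<pre>
module load gcc/4.6.1 cuda/4.2
nvcc -g -G -arch=sm_20 -o mycode mycode.cu   # -g -G: host and device debug symbols
cuda-gdb ./mycode

module load ddt
ddt ./mycode                                 # graphical; requires X forwarding (ssh -X)
</pre>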
 
 
 
===MPI===
 
 
 
The GPC MPI packages can be used on this system. See the GPC section on [[ GPC_Quickstart#MPI |MPI ]] for more details.
 
 
 
While these MPI packages should work with the PGI compilers as well, this has not been tested and standard wrappers like mpif90 may not work.
 
 
 
Alternatively, for MPI compilations with the PGI compilers, you can load the mpich1 MPI implementation with <pre>module load mpich1/pgi</pre> after which you can use the option <pre>-Mmpi</pre> or the wrapper scripts <tt>mpicc, mpiCC and mpif90</tt>, as well as <tt>mpirun</tt>.
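A minimal sketch using that mpich1/pgi stack for a hypothetical MPI code:

<pre>
module load gcc/4.6.1 pgi/12.6 mpich1/pgi
mpicc -O4 -fastsse -o mympi mympi.c
# then, inside a job script on the compute nodes:
mpirun -np 24 ./mympi
</pre>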
 
 
 
=== Driver Version ===
 
 
 
The current NVIDIA driver version installed is 295.41.
 
 
 
== Documentation ==
 
* CUDA
 
** google "CUDA"
 
 
 
* OpenCL
 
** see above
 
 
 
== Further Info ==
 
 
 
 
 
 
 
== User Codes ==
 
 
 
Please discuss and put any relevant information/problems/best practices you have encountered when using/developing for CUDA and/or OpenCL here.
 
