P8

{| style="border-spacing: 8px; width:100%"
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
|}
 
{{Infobox Computer
|image=[[Image:P8_s822.jpg|center|300px|thumb]]
|name=P8
|installed=June 2016
|operatingsystem= Linux RHEL 7.2 le / Ubuntu 16.04 le
|loginnode= p8t0[1-2] / p8t0[3-4]
|nnodes= 2x Power8 with 2x NVIDIA K80, 2x Power8 with 4x NVIDIA P100
|rampernode=512 GB
|corespernode= 2x 8-core (16 physical, 128 SMT)
|interconnect=Infiniband EDR
|vendorcompilers=xlc/xlf, nvcc
}}
  
 
== Specifications ==

The P8 Test System consists of 4 IBM Power 822LC servers, each with 2x 8-core 3.25 GHz Power8 CPUs and 512 GB of RAM. Like the Power7, the Power8 uses Simultaneous Multithreading (SMT), but extends the design to 8 threads per core, allowing the 16 physical cores to support up to 128 threads. Two of the nodes have two NVIDIA Tesla K80 GPUs with CUDA Capability 3.7 (Kepler), each consisting of 2x GK210 GPUs with 12 GB of RAM, connected via PCI-E; the other two nodes have 4x NVIDIA Tesla P100 GPUs, each with 16 GB of RAM and CUDA Capability 6.0 (Pascal), connected via NVLink.
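
Because each physical core presents eight hardware threads, it is worth checking how many logical CPUs a node exposes before choosing thread counts. A minimal sketch, assuming the standard <tt>powerpc-utils</tt> tools are installed and using a placeholder OpenMP binary (<tt>./my_omp_app</tt>):
<pre>
# show the current SMT mode (e.g. SMT=8)
ppc64_cpu --smt
# number of logical CPUs Linux sees (128 on these nodes with SMT-8)
nproc
# run an OpenMP code on all hardware threads
OMP_NUM_THREADS=128 ./my_omp_app
</pre>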

== Compile/Devel/Test ==

First login via ssh with your SciNet account at '''<tt>login.scinet.utoronto.ca</tt>''', and from there you can ssh to <tt>p8t01</tt> or <tt>p8t02</tt> for the K80 GPUs, or to <tt>p8t03</tt> or <tt>p8t04</tt> for the Pascal GPUs.
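
For example (with <tt>USER</tt> standing in for your own SciNet username):
<pre>
ssh USER@login.scinet.utoronto.ca
# then, from the login node, e.g. to reach a P100 node:
ssh p8t03
</pre>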
  
== Software ==

==== GNU Compilers ====

To load the newer Advance Toolchain version, use:

For '''<tt>p8t0[1-2]</tt>'''
<pre>
module load gcc/5.3.1
</pre>
  
For '''<tt>p8t0[3-4]</tt>'''
<pre>
module load gcc/6.2.1
</pre>
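
Once one of these gcc modules is loaded, compilation proceeds as on any other Linux system. A minimal sketch of building an OpenMP code tuned for Power8 (the source and binary names are placeholders):
<pre>
# e.g. on p8t03/p8t04
module load gcc/6.2.1
gcc -O3 -mcpu=power8 -fopenmp -o my_omp_app my_omp_app.c
</pre>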
  
 
==== IBM Compilers ====

To load the native IBM xlc/xlc++/xlf compilers:

For '''<tt>p8t0[1-2]</tt>'''
<pre>
module load xlc/13.1.4
module load xlf/13.1.4
</pre>

For '''<tt>p8t0[3-4]</tt>'''
<pre>
module load xlc/13.1.5_b2
module load xlf/13.1.5_b2
</pre>
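
With the XL modules loaded, the thread-safe invocations <tt>xlc_r</tt> and <tt>xlf_r</tt> can be used; a minimal sketch for an OpenMP code (source and binary names are placeholders):
<pre>
# e.g. on p8t03/p8t04
module load xlc/13.1.5_b2 xlf/13.1.5_b2
xlc_r -O3 -qsmp=omp -o my_omp_app my_omp_app.c
xlf_r -O3 -qsmp=omp -o my_omp_appf my_omp_app.f90
</pre>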
  
==== Driver Version ====

The current NVIDIA driver version is 361.93.

==== CUDA ====

The currently installed CUDA Toolkit is 8.0.
  
To load it, use:

<pre>
module load cuda/8.0
</pre>

The CUDA driver is installed locally; however, the CUDA Toolkit is installed in:

<pre>
/usr/local/cuda-8.0
</pre>
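
After loading the cuda module, <tt>nvcc</tt> can target either GPU generation; a minimal sketch (source and binary names are placeholders), using compute capability 3.7 for the K80 nodes and 6.0 for the P100 nodes:
<pre>
module load cuda/8.0
# K80 nodes (p8t01, p8t02)
nvcc -O3 -arch=sm_37 -o saxpy saxpy.cu
# P100 nodes (p8t03, p8t04)
nvcc -O3 -arch=sm_60 -o saxpy saxpy.cu
</pre>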
  
==== OpenMPI ====

Currently OpenMPI has been set up on the four nodes, connected over EDR Infiniband.

For '''<tt>p8t0[1-2]</tt>'''
<pre>
$ module load openmpi/1.10.3-gcc-5.3.1
$ module load openmpi/1.10.3-XL-13_15.1.4
</pre>

For '''<tt>p8t0[3-4]</tt>'''
<pre>
$ module load openmpi/1.10.3-gcc-6.2.1
$ module load openmpi/1.10.3-XL-13_15.1.5
</pre>
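
With one of the OpenMPI modules loaded, the usual wrapper compilers (<tt>mpicc</tt>, <tt>mpicxx</tt>, <tt>mpif90</tt>) and <tt>mpirun</tt> become available; a minimal sketch of compiling and running on a single node (file names are placeholders):
<pre>
module load openmpi/1.10.3-gcc-6.2.1
mpicc -O3 -o my_mpi_app my_mpi_app.c
mpirun -np 16 ./my_mpi_app
</pre>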

==== PE ====

IBM's Parallel Environment (PE) is available for use with the XL compilers; to use it, load the following module:

<pre>
$ module load pe/xl.perf
</pre>

and then launch jobs with, for example:

<pre>
mpiexec -n 4 ./a.out
</pre>

Documentation is available [http://publib.boulder.ibm.com/epubs/pdf/c2372832.pdf here].
