Difference between revisions of "P8"
Revision as of 10:06, 24 October 2016
P8 | |
---|---|
Installed | June 2016 |
Operating System | Linux RHEL 7.2 le / Ubuntu 16.04 le |
Number of Nodes | 2x Power8 with 2x NVIDIA K80, 2x Power 8 with 4x NVIDIA P100 |
Interconnect | Infiniband EDR |
Ram/Node | 512 GB |
Cores/Node | 2 x 8core (16 physical, 128 SMT) |
Login/Devel Node | p8t0[1-2] / p8t0[3-4] |
Vendor Compilers | xlc/xlf, nvcc |
Specifications
The P8 Test System consists of 4 IBM Power 822LC servers, each with 2x8core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 16 physical cores to support up to 128 threads. Two nodes have two NVIDIA Tesla K80 GPUs with CUDA Capability 3.7 (Kepler), each consisting of 2xGK210 GPUs with 12 GB of RAM apiece, connected using PCI-E. The other two nodes have 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.
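The thread count quoted above follows directly from the socket, core, and SMT figures. A minimal sketch of the arithmetic in shell (the variable names are illustrative only, not part of any system tool):

```shell
# Figures taken from the specification text above.
sockets=2            # Power8 CPUs per node
cores_per_socket=8   # physical cores per CPU
smt=8                # hardware threads per core (SMT8)

physical=$((sockets * cores_per_socket))
threads=$((physical * smt))

echo "physical cores per node: $physical"
echo "hardware threads per node: $threads"
```

On a node with all eight SMT threads enabled, a tool such as `nproc` would be expected to report the same thread count, though that depends on how SMT is configured on the node.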
Compile/Devel/Test
First log in via ssh with your SciNet account at login.scinet.utoronto.ca, and from there you can proceed to p8t0[1-2] for the K80 GPUs and p8t0[3-4] for the Pascal GPUs.
Software
GNU Compilers
To load the newer Advance Toolchain version, use:
For p8t0[1-2]
module load gcc/5.3.1
For p8t0[3-4]
module load gcc/6.2.1
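Because the gcc module version differs between the two node groups, a login script could select it from the node name. The following is a minimal sketch, assuming the short hostname matches the node names listed above; `gcc_module_for_host` is a hypothetical helper, not something provided on the system.

```shell
# Hypothetical helper: map a P8 node name to the gcc module listed above.
gcc_module_for_host() {
  case "$1" in
    p8t01|p8t02) echo "gcc/5.3.1" ;;   # K80 nodes
    p8t03|p8t04) echo "gcc/6.2.1" ;;   # P100 nodes
    *) echo "unknown P8 node: $1" >&2; return 1 ;;
  esac
}

# Example: print the module name for one node of each group.
gcc_module_for_host p8t01
gcc_module_for_host p8t04
```

On a node, this could be combined with the module command, for example `module load "$(gcc_module_for_host "$(hostname -s)")"`.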
IBM Compilers
To load the native IBM xlc/xlc++ and xlf compilers:
For p8t0[1-2]
module load xlc/13.1.4
module load xlf/13.1.4
For p8t0[3-4]
module load xlc/13.1.5
module load xlf/13.1.5
Driver Version
The current NVIDIA driver version is 361.93.
CUDA
The currently installed CUDA Toolkit is version 8.0:
module load cuda/8.0
The CUDA driver is installed locally; the CUDA Toolkit is installed in:
/usr/local/cuda-8.0
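The two node groups have GPUs with different CUDA capabilities (3.7 for the K80, 6.0 for the P100), so code built with nvcc generally needs a matching architecture flag. The sketch below converts a capability string into the `-arch=sm_XY` form accepted by nvcc; `nvcc_arch_flag` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: turn a CUDA capability such as "3.7" into an
# nvcc architecture flag such as "-arch=sm_37".
nvcc_arch_flag() {
  digits=$(printf '%s' "$1" | tr -d '.')   # drop the dot: 3.7 -> 37
  printf '%s\n' "-arch=sm_${digits}"
}

nvcc_arch_flag 3.7   # K80 nodes
nvcc_arch_flag 6.0   # P100 nodes
```

After `module load cuda/8.0`, a build on a P100 node might then look like `nvcc $(nvcc_arch_flag 6.0) -o saxpy saxpy.cu`, where `saxpy.cu` is a placeholder source file.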
OpenMPI
OpenMPI is currently set up on all four nodes, which are connected over QDR Infiniband.
For p8t0[1-2]
$ module load openmpi/1.10.3-gcc-5.3.1
$ module load openmpi/1.10.3-XL-13_15.1.4
For p8t0[3-4]
$ module load openmpi/1.10.3-gcc-6.2.1
$ module load openmpi/1.10.3-XL-13_15.1.5
PE
IBM's Parallel Environment (PE) is available for use with the XL compilers using the following:
$ module load pe/xl.perf
mpiexec -n 4 ./a.out
documentation is here