SOSCIP GPU

__NOTOC__

{| style="border-spacing: 8px; width:100%"
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
|}
 
{{Infobox Computer
|image=[[Image:S882lc.png|center|300px|thumb]]
|vendorcompilers=xlc/xlf, nvcc
}}

== New Documentation Site ==
Please visit the new documentation site: [https://docs.scinet.utoronto.ca/index.php/SOSCIP_GPU https://docs.scinet.utoronto.ca/index.php/SOSCIP_GPU] for updated information.
  
 
== SOSCIP ==
 
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform (SOSCIP) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [1].

== Support Email ==
 
Please use [mailto:soscip-support@scinet.utoronto.ca <soscip-support@scinet.utoronto.ca>] for SOSCIP GPU specific inquiries.
 
  
 
== Specifications ==

The SOSCIP GPU Cluster, installed in September 2017, consists of 14 IBM Power 8 nodes, each with 2 x 10-core POWER8 CPUs (20 physical cores, 160 SMT threads), 512 GB of RAM, and 4 NVIDIA P100 GPUs. The nodes are connected with EDR Infiniband, run Ubuntu 16.04 le, and the login/devel node is sgc01.
 
=== Packing single-GPU jobs within one SLURM job submission ===
Jobs are scheduled by node (4 GPUs per node) on the SOSCIP GPU cluster. If your code cannot utilize all 4 GPUs, the GNU Parallel tool can be used to pack 4 or more single-GPU jobs into one SLURM job. Below is an example of packing 4 single-GPU Python codes into one job. (When using GNU Parallel for a publication, please cite it as per '''''parallel --citation'''''.)
 
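A minimal sketch of such a submission script follows; the SBATCH resource values, the <tt>gnu-parallel</tt> module name, and the <tt>code1.py</tt>–<tt>code4.py</tt> script names are placeholders to adapt to your own workflow:

<pre>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=20        # one full node: 20 physical cores
#SBATCH --gres=gpu:4       # all 4 GPUs on the node
#SBATCH --time=1:00:00

cd $SLURM_SUBMIT_DIR

# Module name is an assumption; check 'module avail' for the exact version.
module load gnu-parallel

# Run the 4 single-GPU Python scripts concurrently, pinning each job slot
# ({%} = 1..4) to its own GPU via CUDA_VISIBLE_DEVICES.
parallel -j 4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python code{}.py > code{}.out' ::: 1 2 3 4
</pre>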
 
=== GNU Compilers ===

The system default compiler is GCC 5.4.0. More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided by the IBM Advance Toolchain, with enhancements for the POWER8 CPU. To load a newer Advance Toolchain version use:

Advance Toolchain V10.0
<pre>
module load gcc/6.4.1
</pre>

Advance Toolchain V11.0
<pre>
module load gcc/7.3.1
</pre>

More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]
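For example, to compile with the newer toolchain and POWER8-specific tuning (the <tt>hello.c</tt> source file is only a placeholder):

<pre>
module load gcc/7.3.1
gcc -O3 -mcpu=power8 -mtune=power8 hello.c -o hello
</pre>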
  
 
=== IBM XL Compilers ===
 
The IBM XL C/C++ (<tt>xlc</tt>) and Fortran (<tt>xlf</tt>) compilers are also available on the cluster.
 
=== NVIDIA GPU Driver ===
 
The current NVIDIA driver version is 396.26.
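The installed driver version and the GPUs visible on a node can be checked with:

<pre>
nvidia-smi
</pre>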
  
 
=== CUDA ===
 
The currently installed CUDA Toolkits are versions 8.0, 9.0, 9.1 and 9.2.

<pre>
module load cuda/8.0
</pre>

or

<pre>
module load cuda/9.0
</pre>

or

<pre>
module load cuda/9.1
</pre>

or

<pre>
module load cuda/9.2
</pre>
  
The CUDA Toolkits are installed in:

<pre>
/usr/local/cuda-8.0
/usr/local/cuda-9.0
/usr/local/cuda-9.1
/usr/local/cuda-9.2
</pre>

Note that the <tt>/usr/local/cuda</tt> directory is linked to the <tt>/usr/local/cuda-9.2</tt> directory.
  
 
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]
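As a quick check of the CUDA setup, a kernel can be compiled for the P100's compute capability 6.0 (the <tt>saxpy.cu</tt> file name is a placeholder):

<pre>
module load cuda/9.2
nvcc -O3 -arch=sm_60 saxpy.cu -o saxpy
</pre>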
 
=== TensorFlow ===

* Activate the Python 2.7 virtual environment:
<pre>
source tensorflow-1.8-py2/bin/activate
</pre>
* Install TensorFlow into the virtual environment (a custom NumPy built with the OpenBLAS library can be installed):
 
<pre>
pip install --upgrade --force-reinstall /scinet/sgc/Libraries/numpy/numpy-1.14.3-cp27-cp27mu-linux_ppc64le.whl
</pre>
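To verify that this NumPy build is linked against OpenBLAS, its build configuration can be printed from inside the activated virtual environment:

<pre>
python -c "import numpy; numpy.show_config()"
</pre>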
 
* Activate the Python 3.5 virtual environment:
<pre>
source tensorflow-1.8-py3/bin/activate
</pre>
* Install TensorFlow into the virtual environment (a custom NumPy built with the OpenBLAS library can be installed):
 
<pre>
pip3 install --upgrade --force-reinstall /scinet/sgc/Libraries/numpy/numpy-1.14.3-cp35-cp35m-linux_ppc64le.whl
</pre>
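Once the environment is set up, a quick way to confirm that TensorFlow can see the GPUs (shown here for the Python 3 environment) is:

<pre>
source tensorflow-1.8-py3/bin/activate
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
</pre>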
  
 
=== Training Material ===

# GPU Cluster Introduction: [[Media:GPU_Training_01.pdf|SOSCIP GPU Platform]]
