SOSCIP GPU
__NOTOC__

{| style="border-spacing: 8px; width:100%"
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
|}
{{Infobox Computer
|image=[[Image:S882lc.png|center|300px|thumb]]
|vendorcompilers=xlc/xlf, nvcc
}}
+ | |||
+ | == New Documentation Site == | ||
+ | Please visit the new documentation site: [https://docs.scinet.utoronto.ca/index.php/SOSCIP_GPU https://docs.scinet.utoronto.ca/index.php/SOSCIP_GPU] for updated information. | ||
+ | |||
+ | <!-- | ||
== Specifications ==
</pre>
===Packing single-GPU jobs within one SLURM job submission===
Jobs are scheduled by node (4 GPUs) on the SOSCIP GPU cluster. If a user's code cannot utilize all 4 GPUs, the GNU Parallel tool can be used to pack 4 or more single-GPU jobs into one SLURM job. Below is an example of submitting 4 single-GPU Python codes within one job. (When using GNU Parallel for a publication, please cite it as indicated by '''''parallel --citation'''''.)
<pre>
#!/bin/bash
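The packing idea above can also be sketched in plain bash. This is a hypothetical illustration, not the cluster's exact script: <tt>launch_task</tt> stands in for a real single-GPU command (such as <tt>python code1.py</tt>), and each task is pinned to one device through <tt>CUDA_VISIBLE_DEVICES</tt>.

```shell
#!/bin/bash
# Hypothetical sketch: pack 4 single-GPU tasks into one node allocation.
# launch_task stands in for a real single-GPU command (e.g. python code1.py);
# it records which GPU the task was pinned to.
results=()
launch_task () {
    results+=("task $1 -> GPU $CUDA_VISIBLE_DEVICES")
}
for gpu in 0 1 2 3; do
    # In a real job, append '&' (or use GNU Parallel) so the 4 tasks run
    # concurrently, then 'wait' for them; here they run sequentially so the
    # sketch is easy to follow.
    CUDA_VISIBLE_DEVICES=$gpu launch_task $((gpu + 1))
done
printf '%s\n' "${results[@]}"
```

With GNU Parallel, the same pinning is done by mapping the job-slot number to a device index, as in the example above.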
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the <tt>module avail</tt> command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.
=== GNU Compilers ===
The system default compiler is GCC 5.4.0. More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain with enhancements for the POWER8 CPU. To load a newer Advance Toolchain version, use:

Advance Toolchain V10.0
<pre>
module load gcc/6.4.1
</pre>
Advance Toolchain V11.0
<pre>
module load gcc/7.3.1
</pre>
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]
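After loading one of the modules above, it is worth confirming which compiler actually ends up first on the <tt>PATH</tt>. A minimal sketch (runs with any GCC; on the cluster, the Advance Toolchain version should be reported instead of the default 5.4.0):

```shell
# Hypothetical check: report the gcc version currently active on the PATH.
# Falls back to "none" if no gcc is available at all.
gcc_version=$(gcc -dumpversion 2>/dev/null || echo none)
echo "active gcc: ${gcc_version}"
```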
=== IBM XL Compilers ===
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]
=== NVIDIA GPU Driver ===
The current NVIDIA driver version is 396.26.
=== CUDA ===
The currently installed CUDA Toolkit versions are 8.0, 9.0, 9.1, and 9.2.
<pre>
module load cuda/8.0
or
module load cuda/9.0
or
module load cuda/9.1
or
module load cuda/9.2
</pre>
/usr/local/cuda-9.0
/usr/local/cuda-9.1
/usr/local/cuda-9.2
</pre>
Note that the <tt>/usr/local/cuda</tt> directory is linked to the <tt>/usr/local/cuda-9.2</tt> directory.
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]
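The Tesla P100 GPUs in these nodes have CUDA compute capability 6.0, so code built with <tt>nvcc</tt> typically targets <tt>sm_60</tt>. A small sketch of deriving the architecture flag (the file name <tt>vecadd.cu</tt> is hypothetical):

```shell
# Hypothetical sketch: derive the nvcc architecture flag for the P100.
# The Tesla P100 (Pascal) has compute capability 6.0, i.e. -arch=sm_60.
compute_capability="6.0"
arch_flag="sm_${compute_capability/./}"   # drop the dot: 6.0 -> sm_60
echo "nvcc -arch=${arch_flag} vecadd.cu -o vecadd"
```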
=== OpenMPI ===
Currently, OpenMPI is set up on the 14 nodes connected over EDR InfiniBand.
source tensorflow-1.8-py2/bin/activate
</pre>
* Install TensorFlow into the virtual environment (a custom NumPy built with the OpenBLAS library can be installed):
<pre>
pip install --upgrade --force-reinstall /scinet/sgc/Libraries/numpy/numpy-1.14.3-cp27-cp27mu-linux_ppc64le.whl
source tensorflow-1.8-py3/bin/activate
</pre>
* Install TensorFlow into the virtual environment (a custom NumPy built with the OpenBLAS library can be installed):
<pre>
pip3 install --upgrade --force-reinstall /scinet/sgc/Libraries/numpy/numpy-1.14.3-cp35-cp35m-linux_ppc64le.whl
# GPU Cluster Introduction: [[Media:GPU_Training_01.pdf|SOSCIP GPU Platform]]
-->
Latest revision as of 15:17, 5 October 2018
{| class="wikitable"
|+ SOSCIP GPU
|-
! Installed
| September 2017
|-
! Operating System
| Ubuntu 16.04 le
|-
! Number of Nodes
| 14x Power 8 with 4x NVIDIA P100
|-
! Interconnect
| Infiniband EDR
|-
! Ram/Node
| 512 GB
|-
! Cores/Node
| 2 x 10core (20 physical, 160 SMT)
|-
! Login/Devel Node
| sgc01
|-
! Vendor Compilers
| xlc/xlf, nvcc
|}
== SOSCIP ==
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform (SOSCIP) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [1].
== Support Email ==
Please use [mailto:soscip-support@scinet.utoronto.ca <soscip-support@scinet.utoronto.ca>] for SOSCIP GPU specific inquiries.