<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Wagnerse</id>
	<title>oldwiki.scinet.utoronto.ca - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://oldwiki.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Wagnerse"/>
	<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php/Special:Contributions/Wagnerse"/>
	<updated>2026-05-24T21:40:53Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9179</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9179"/>
		<updated>2018-03-14T19:21:29Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Job Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with two 10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Access and Login ==&lt;br /&gt;
&lt;br /&gt;
In order to obtain access to the system, you must request access to the SOSCIP GPU Platform. Instructions will have been sent to your sponsoring faculty member via E-mail at the beginning of your SOSCIP project.&lt;br /&gt;
&lt;br /&gt;
Access to the SOSCIP GPU Platform is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''', using ssh; from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;''' via ssh. Your user name and password are the same as for other SciNet systems.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as its job scheduler, and jobs are scheduled by node, i.e. 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
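&lt;br /&gt;
Because jobs are scheduled by whole nodes, it can help to work out how many nodes a given GPU count implies before writing the &amp;lt;tt&amp;gt;--nodes&amp;lt;/tt&amp;gt; line. The following is only a minimal sketch: the function name &amp;lt;tt&amp;gt;nodes_for_gpus&amp;lt;/tt&amp;gt; and both variables are hypothetical, and the figures (4 GPUs per node, at most 8 nodes per job) are simply the ones quoted above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: how many whole nodes are needed for a given GPU count.
# Figures taken from this page: 4 GPUs per node, at most 8 nodes per job.
gpus_per_node=4
max_nodes=8

nodes_for_gpus() {
  local gpus=$1
  # Round up to whole nodes, since jobs are scheduled by node.
  local nodes=$(( (gpus + gpus_per_node - 1) / gpus_per_node ))
  if [[ ${nodes} -gt ${max_nodes} ]]
  then
    echo "too many GPUs requested"
    return 1
  fi
  echo "${nodes}"
}

nodes_for_gpus 6    # prints 2
```
&lt;br /&gt;
The printed value is what would go into &amp;lt;tt&amp;gt;#SBATCH --nodes=...&amp;lt;/tt&amp;gt;, while &amp;lt;tt&amp;gt;--gres=gpu:4&amp;lt;/tt&amp;gt; stays as it is, since it is a per-node request.&lt;br /&gt;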
&lt;br /&gt;
More information about the &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; command is found [https://slurm.schedmd.com/sbatch.html here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To see only your own jobs, run &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue -u &amp;lt;userid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once your job is running, SLURM creates a file, by default named &amp;lt;tt&amp;gt;slurm-&amp;lt;jobid&amp;gt;.out&amp;lt;/tt&amp;gt;, in the directory from which you issued the &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; command. This file contains the console output from your job. You can monitor the output of your job using the &amp;lt;tt&amp;gt;tail -f &amp;lt;file&amp;gt;&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will not let you submit it. You can, however, run jobs of up to 24 hours by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
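&lt;br /&gt;
The choice between the default queue and the 'long' queue can be sketched as a small shell function. This is an illustration only: the name &amp;lt;tt&amp;gt;partition_option&amp;lt;/tt&amp;gt; is hypothetical, the walltime is assumed to be given in whole hours, and the 12 and 24 hour limits are the ones stated above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: pick the sbatch partition option for a walltime in hours.
# Limits taken from this page: 12 hours by default, 24 hours with "-p long".
partition_option() {
  local hours=$1
  if [[ ${hours} -le 12 ]]
  then
    echo ""            # default queue, no extra option needed
  elif [[ ${hours} -le 24 ]]
  then
    echo "-p long"
  else
    echo "walltime exceeds the 24 hour limit"
    return 1
  fi
}

partition_option 18    # prints: -p long
```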
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After executing this command, you may have to wait in the queue until a system is available.&lt;br /&gt;
&lt;br /&gt;
More information about the &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt; command is [https://slurm.schedmd.com/salloc.html here].&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
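&lt;br /&gt;
The script above recovers the new job ID by taking the fourth word of the confirmation line printed by &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt;. That step can be tried on its own as sketched below; the function name &amp;lt;tt&amp;gt;parse_jobid&amp;lt;/tt&amp;gt; is hypothetical and the sample line is made up, assuming the usual &amp;quot;Submitted batch job NNN&amp;quot; wording.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation line.
# Assumes the usual "Submitted batch job NNN" wording.
parse_jobid() {
  echo "$1" | awk '{print $4}'
}

# The sample line below is made up for illustration.
parse_jobid "Submitted batch job 12345"    # prints 12345
```
&lt;br /&gt;
When the same awk command is placed inside a double-quoted string passed to ssh, the field reference has to be written as &amp;lt;tt&amp;gt;\$4&amp;lt;/tt&amp;gt;; otherwise the local shell expands it before awk sees it.&lt;br /&gt;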
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain, with enhancements for the POWER8 CPU. To load one of the newer Advanced Toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP 4.5 GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER8 CPU.&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, whereas the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI is currently set up on the 14 nodes, which are connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
The following subsections give instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;br /&gt;
&lt;br /&gt;
== LINKS ==&lt;br /&gt;
&lt;br /&gt;
[https://www.olcf.ornl.gov/kb_articles/summitdev-quickstart/#System_Overview  Summit Dev System at ORNL]&lt;br /&gt;
&lt;br /&gt;
== DOCUMENTATION ==&lt;br /&gt;
&lt;br /&gt;
# GPU Cluster Introduction: [[Media:GPU_Training_01.pdf‎|SOSCIP GPU Platform]]&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9168</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9168"/>
		<updated>2018-02-23T15:07:03Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* DOCUMENTATION */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with two 10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Access and Login ==&lt;br /&gt;
&lt;br /&gt;
In order to obtain access to the system, you must request access to the SOSCIP GPU Platform. Instructions will have been sent to your sponsoring faculty member via E-mail at the beginning of your SOSCIP project.&lt;br /&gt;
&lt;br /&gt;
Access to the SOSCIP GPU Platform is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''', using ssh; from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;''' via ssh. Your user name and password are the same as for other SciNet systems.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as its job scheduler, and jobs are scheduled by node, i.e. 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will not let you submit it. You can, however, run jobs of up to 24 hours by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain, with enhancements for the POWER8 CPU. To load one of the newer Advanced Toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP 4.5 GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER8 CPU.&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, whereas the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI is currently set up on the 14 nodes, which are connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
The following subsections give instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;br /&gt;
&lt;br /&gt;
== LINKS ==&lt;br /&gt;
&lt;br /&gt;
[https://www.olcf.ornl.gov/kb_articles/summitdev-quickstart/#System_Overview  Summit Dev System at ORNL]&lt;br /&gt;
&lt;br /&gt;
== DOCUMENTATION ==&lt;br /&gt;
&lt;br /&gt;
# GPU Cluster Introduction: [[Media:GPU_Training_01.pdf‎|SOSCIP GPU Platform]]&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=File:GPU_Training_01.pdf&amp;diff=9167</id>
		<title>File:GPU Training 01.pdf</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=File:GPU_Training_01.pdf&amp;diff=9167"/>
		<updated>2018-02-23T15:06:38Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9115</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9115"/>
		<updated>2018-01-18T14:23:30Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Compile/Devel/Test */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with two 10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Access and Login ==&lt;br /&gt;
&lt;br /&gt;
In order to obtain access to the system, you must request access to the SOSCIP GPU Platform. Instructions will have been sent to your sponsoring faculty member via E-mail at the beginning of your SOSCIP project.&lt;br /&gt;
&lt;br /&gt;
Access to the SOSCIP GPU Platform is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''', using ssh; from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;''' via ssh. Your user name and password are the same as for other SciNet systems.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as its job scheduler, and jobs are scheduled by node, i.e. 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
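&lt;br /&gt;
For example, to list only your own jobs and then cancel one of them (the job ID &amp;lt;tt&amp;gt;12345&amp;lt;/tt&amp;gt; is a made-up value; use the ID reported by &amp;lt;tt&amp;gt;squeue&amp;lt;/tt&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -u $USER   # show only the jobs belonging to the current user&lt;br /&gt;
$ scancel 12345     # cancel the job with this (hypothetical) ID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;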
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will not let you submit it. There is, however, a way to run jobs up to 24 hours long: specify &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
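&lt;br /&gt;
A minimal sketch of a job script for the 'long' queue, based on the example script above; &amp;lt;tt&amp;gt;./my_program&amp;lt;/tt&amp;gt; is a placeholder for your own executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=20&lt;br /&gt;
#SBATCH --time=24:00:00  # H:M:S, up to 24 hours in the 'long' queue&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
#SBATCH -p long          # submit to the 'long' queue&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
./my_program             # placeholder executable&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;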
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
You may have a job that you know will take longer to run than the queue permits. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of re-submissions are used to ensure sequential operation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  # \$4 is escaped so that awk on the remote side, not the local shell, expands it&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain, with enhancements for the POWER8 CPU. To load one of the newer Advance Toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP 4.5 GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER8 CPU.&lt;br /&gt;
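&lt;br /&gt;
As a rough sketch of how a program with OpenMP 4.5 target regions might be compiled for GPU offloading: the source file name &amp;lt;tt&amp;gt;offload_test.c&amp;lt;/tt&amp;gt; is a hypothetical placeholder, and pairing the XL 13.1.5 module with &amp;lt;tt&amp;gt;cuda/8.0&amp;lt;/tt&amp;gt; is an assumption; consult the IBM XL documentation for the options supported by the installed release.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load cuda/8.0     # assumption: CUDA version paired with this XL release&lt;br /&gt;
# -qsmp=omp enables OpenMP; -qoffload enables offloading of target regions to the GPU&lt;br /&gt;
xlc -qsmp=omp -qoffload -o offload_test offload_test.c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;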
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66.&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. To load one of them, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]&lt;br /&gt;
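&lt;br /&gt;
As a minimal sketch, a CUDA source file (here the hypothetical &amp;lt;tt&amp;gt;saxpy.cu&amp;lt;/tt&amp;gt;) could be compiled for the P100 GPUs as follows; &amp;lt;tt&amp;gt;sm_60&amp;lt;/tt&amp;gt; corresponds to the CUDA Capability 6.0 listed in the specifications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load cuda/9.0&lt;br /&gt;
$ nvcc --version                        # confirm which toolkit is active&lt;br /&gt;
$ nvcc -arch=sm_60 -o saxpy saxpy.cu    # generate code for compute capability 6.0 (Pascal)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;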
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
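OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Builds are available for the system GCC and for the IBM XL compilers; load one of the following:&lt;br /&gt;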
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
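&lt;br /&gt;
A minimal sketch of a two-node MPI job script using the GCC build of OpenMPI. The executable name &amp;lt;tt&amp;gt;./my_mpi_program&amp;lt;/tt&amp;gt; is a placeholder, and launching with &amp;lt;tt&amp;gt;mpirun&amp;lt;/tt&amp;gt; rather than &amp;lt;tt&amp;gt;srun&amp;lt;/tt&amp;gt; is an assumption about how this installation is configured.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks=40      # 20 MPI tasks per node&lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
&lt;br /&gt;
# SLURM_NTASKS is set by SLURM from the --ntasks request above&lt;br /&gt;
mpirun -np $SLURM_NTASKS ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;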
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
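&lt;br /&gt;
A minimal sketch of these steps, assuming the Anaconda &amp;lt;tt&amp;gt;bin&amp;lt;/tt&amp;gt; directory comes first in your &amp;lt;tt&amp;gt;PATH&amp;lt;/tt&amp;gt;. The exact PowerAI TensorFlow module name is not given here, so look it up with &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt;; the choice of &amp;lt;tt&amp;gt;cuda/8.0&amp;lt;/tt&amp;gt; is likewise an assumption.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ pip install keras       # installs Keras into the active Anaconda environment&lt;br /&gt;
$ module avail            # find the name of the PowerAI TensorFlow module&lt;br /&gt;
$ module load cuda/8.0    # assumption: CUDA version matching the PowerAI release&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;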
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;br /&gt;
&lt;br /&gt;
== Links ==&lt;br /&gt;
&lt;br /&gt;
[https://www.olcf.ornl.gov/kb_articles/summitdev-quickstart/#System_Overview  Summit Dev System at ORNL]&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9107</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9107"/>
		<updated>2017-12-20T14:32:31Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* CUDA */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at theUniversity of Toronto's SciNet HPC facility. The SOSCIP  multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25 GHz POWER8 CPUs and 512 GB of RAM. Like the POWER7, the POWER8 uses Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16 GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system. See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as its job scheduler. Jobs are scheduled by node, i.e., in units of 20 cores and 4 GPUs. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. A job can use up to 8 nodes, and the maximum walltime per job is 12 hours (except in the 'long' queue, see below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will not let you submit it. There is, however, a way to run jobs up to 24 hours long: specify &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
You may have a job that you know will take longer to run than the queue permits. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of re-submissions are used to ensure sequential operation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  # \$4 is escaped so that awk on the remote side, not the local shell, expands it&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain, with enhancements for the POWER8 CPU. To load one of the newer Advance Toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP 4.5 GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER8 CPU.&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66.&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. To load one of them, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
Documentation and API reference information for the CUDA Toolkit can be found here: [http://docs.nvidia.com/cuda/index.html http://docs.nvidia.com/cuda/index.html]&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
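OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Builds are available for the system GCC and for the IBM XL compilers; load one of the following:&lt;br /&gt;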
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9092</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9092"/>
		<updated>2017-12-06T15:45:03Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* IBM XL Compilers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at theUniversity of Toronto's SciNet HPC facility. The SOSCIP  multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25 GHz POWER8 CPUs and 512 GB of RAM. Like the POWER7, the POWER8 uses Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16 GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system. See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as its job scheduler. Jobs are scheduled by node, i.e., in units of 20 cores and 4 GPUs. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. A job can use up to 8 nodes, and the maximum walltime per job is 12 hours (except in the 'long' queue, see below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will not let you submit it. There is, however, a way to run jobs up to 24 hours long: specify &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
You may have a job that you know will take longer to run than the queue permits. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of re-submissions are used to ensure sequential operation.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  # \$4 is escaped so that awk on the remote side, not the local shell, expands it&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain, with enhancements for the POWER8 CPU. To load one of the newer Advance Toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advance Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP 4.5 GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER8 CPU.&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66.&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. To load one of them, use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
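OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Builds are available for the system GCC and for the IBM XL compilers; load one of the following:&lt;br /&gt;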
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9091</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9091"/>
		<updated>2017-12-06T15:39:35Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* IBM PowerAI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz POWER8 CPUs and 512GB of RAM. Like POWER7, POWER8 uses Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
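&lt;br /&gt;
The per-node totals quoted above follow from simple arithmetic, which can be reproduced in the shell. This is only an illustrative sketch; the variable names are made up for this example.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Illustrative arithmetic only; all variable names are hypothetical.
sockets=2            # POWER8 CPUs per node
cores_per_socket=10  # physical cores per CPU
smt=8                # hardware threads per core (SMT8)
gpus=4               # P100 GPUs per node
gpu_mem_gb=16        # memory per GPU in GB

physical_cores=$(( sockets * cores_per_socket ))
hardware_threads=$(( physical_cores * smt ))
total_gpu_mem_gb=$(( gpus * gpu_mem_gb ))

echo "physical cores per node:   ${physical_cores}"
echo "hardware threads per node: ${hardware_threads}"
echo "GPU memory per node (GB):  ${total_gpu_mem_gb}"
```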
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours of walltime, the sbatch command will reject the submission.  There is, however, a way to run jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
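&lt;br /&gt;
The rule above can be sketched as a small shell helper that maps a requested walltime to the matching sbatch option. The function name partition_option is hypothetical and exists only for this example; the 12 hour and 24 hour limits are the ones stated in the text.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: choose the sbatch partition option from a walltime in whole hours.
# partition_option is a hypothetical helper; the limits come from the text above.
partition_option() {
    local hours="$1"
    if [ "$hours" -le 12 ]; then
        echo ""            # default queue, no extra option needed
    elif [ "$hours" -le 24 ]; then
        echo "-p long"     # longer than 12 hours, up to 24 hours
    else
        return 1           # longer than 24 hours cannot be submitted
    fi
}

# Show the option that would be passed to sbatch for two example walltimes.
for h in 8 18; do
    echo "walltime ${h} h: sbatch option '$(partition_option "$h")'"
done
```
&lt;br /&gt;
Jobs that need more than 24 hours in total have to be split into several submissions.&lt;br /&gt;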
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
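  # awk runs locally, outside the double quotes, so that $4 is not expanded by this shell&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh&amp;quot; | awk '{print $4}')&lt;br /&gt;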
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
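&lt;br /&gt;
The resubmission step depends on extracting the new job ID from the line that sbatch prints on success, which has the form &amp;quot;Submitted batch job 12345&amp;quot;. The sketch below shows that extraction in isolation, using a stand-in string instead of a real sbatch call so that it runs without a scheduler; parse_jobid is a hypothetical helper name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: extract the job ID from sbatch's success message.
# parse_jobid is a hypothetical helper name used only in this example.
parse_jobid() {
    # The job ID is the fourth whitespace-separated field of the message.
    awk '{print $4}'
}

# Stand-in for real sbatch output, so the example runs without a scheduler.
sample_output="Submitted batch job 12345"
next_jobid=$(echo "$sample_output" | parse_jobid)
echo "submitted ${next_jobid}"
```
&lt;br /&gt;
Note that the awk program has to stay in single quotes that are not nested inside a double-quoted string; otherwise the calling shell expands $4 before awk sees it, and the whole message is returned instead of the job ID.&lt;br /&gt;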
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load a newer Advanced Toolchain version, use one of the following:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI has been set up on the 14 nodes connected over EDR InfiniBand. Load the build that matches your compiler (GCC or XL):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9090</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9090"/>
		<updated>2017-12-06T15:38:59Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Keras */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz POWER8 CPUs and 512GB of RAM. Like POWER7, POWER8 uses Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours of walltime, the sbatch command will reject the submission.  There is, however, a way to run jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
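  # awk runs locally, outside the double quotes, so that $4 is not expanded by this shell&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh&amp;quot; | awk '{print $4}')&lt;br /&gt;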
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, TensorFlow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load a newer Advanced Toolchain version, use one of the following:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkits are installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI has been set up on the 14 nodes connected over EDR InfiniBand. Load the build that matches your compiler (GCC or XL):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras ([https://keras.io/ https://keras.io/]) is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not have the gcc modules loaded when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system. The build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9089</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9089"/>
		<updated>2017-12-06T15:36:08Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Keras */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz POWER8 CPUs and 512GB of RAM. Like POWER7, POWER8 uses Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), and a job can use up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours of walltime, the sbatch command will reject the submission.  There is, however, a way to run jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh&amp;quot; | awk '{print $4}')&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
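&lt;br /&gt;
Two idioms in the script above can be tried on their own, without a scheduler. The sketch below is only an illustration: the sample sbatch reply and the job ID 12345 are made up, and it assumes sbatch answers in its usual form, where the job ID is the fourth word.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch of two idioms used in the script above, run without a scheduler.
# The sample sbatch reply and the job ID 12345 are made up for illustration.

# 1. Default assignment: job_number becomes 1 only when it is unset or empty.
unset job_number
: ${job_number:=1}
echo "first run: job_number=${job_number}"

job_number=2          # as if passed in via --export=job_number=2
: ${job_number:=1}    # leaves the existing value untouched
echo "resubmitted run: job_number=${job_number}"

# 2. sbatch reports "Submitted batch job ID"; the ID is the fourth field.
#    The awk program is in single quotes so the shell leaves $4 for awk.
sample_reply="Submitted batch job 12345"
next_jobid=$(echo "${sample_reply}" | awk '{print $4}')
echo "submitted ${next_jobid}"
```
&lt;br /&gt;
Because the counter is only set when it is undefined, the first submission needs no extra options, while every resubmission carries the incremented value forward.&lt;br /&gt;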
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, Tensorflow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load one of the newer toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. Load one of them with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, whereas the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI is currently set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module and the cuda module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not load the gcc modules when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system; the build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9088</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9088"/>
		<updated>2017-12-06T15:32:50Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below) with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
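&lt;br /&gt;
Since jobs are scheduled by whole node, the totals for a multi-node request follow directly from the per-node figures. The sketch below is an illustration only: the function name node_request is hypothetical, and it assumes one task per physical core, as in the single-node script above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: derive whole-node totals for a multi-node request.
# Per-node figures (20 cores, 4 GPUs) and the 8-node cap come from the text above;
# the function name node_request is hypothetical.
CORES_PER_NODE=20
GPUS_PER_NODE=4
MAX_NODES=8

node_request() {
  local nodes=$1
  if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$MAX_NODES" ]; then
    echo "nodes must be between 1 and $MAX_NODES" > /dev/stderr
    return 1
  fi
  # --gres is counted per node, so it stays at 4 regardless of the node count.
  echo "--nodes=$nodes --ntasks=$(( nodes * CORES_PER_NODE )) --gres=gpu:$GPUS_PER_NODE"
}

# Example: options for a two-node job.
node_request 2
```
&lt;br /&gt;
The printed options can be copied into the #SBATCH lines of a job script or passed to sbatch on the command line.&lt;br /&gt;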
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will reject it. Jobs of up to 24 hours can, however, be submitted by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if the 'long' queue turns out to have a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh&amp;quot; | awk '{print $4}')&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, Tensorflow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load one of the newer toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. Load one of them with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, whereas the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI is currently set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Other Software ==&lt;br /&gt;
&lt;br /&gt;
Other software packages can be installed onto the SOSCIP GPU Platform. It is best to try installing new software in your own home directory, which will give you control of the software (e.g. exact version, configuration, installing sub-packages, etc.).&lt;br /&gt;
&lt;br /&gt;
In the following subsections are instructions for installing several common software packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda (Python) ===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation.&lt;br /&gt;
&lt;br /&gt;
Anaconda can be downloaded from here: [https://www.anaconda.com/download/#linux https://www.anaconda.com/download/#linux]&lt;br /&gt;
&lt;br /&gt;
NOTE: Be sure to download the '''Power8''' installer.&lt;br /&gt;
&lt;br /&gt;
TIP: If you plan to use TensorFlow within Anaconda, download the Python 2.7 version of Anaconda.&lt;br /&gt;
&lt;br /&gt;
=== Keras ===&lt;br /&gt;
&lt;br /&gt;
Keras is a popular high-level deep learning software development framework. It runs on top of other deep-learning frameworks such as TensorFlow.&lt;br /&gt;
&lt;br /&gt;
The easiest way to install Keras is to install Anaconda first, then install Keras using the pip command.&lt;br /&gt;
&lt;br /&gt;
Keras uses TensorFlow underneath to run neural network models. Before running code using Keras, be sure to load the PowerAI TensorFlow module.&lt;br /&gt;
&lt;br /&gt;
=== PyTorch ===&lt;br /&gt;
&lt;br /&gt;
PyTorch is the Python implementation of the Torch framework for deep learning. &lt;br /&gt;
&lt;br /&gt;
It is suggested that you use PyTorch within Anaconda.&lt;br /&gt;
&lt;br /&gt;
There is currently no build of PyTorch for POWER8-based systems. You will need to compile it from source.&lt;br /&gt;
&lt;br /&gt;
Obtain the source code from here: [http://pytorch.org/ http://pytorch.org/]&lt;br /&gt;
&lt;br /&gt;
Before building PyTorch, make sure to load cuda by running &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTE: Do not load the gcc modules when building PyTorch. Use the default version of gcc (currently v5.4.0) included with the operating system; the build will fail with later versions of gcc.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9087</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9087"/>
		<updated>2017-12-06T15:04:51Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* IBM PowerAI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below) with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will reject it. Jobs of up to 24 hours can, however, be submitted by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if the 'long' queue turns out to have a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh&amp;quot; | awk '{print $4}')&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as '''Caffe, Tensorflow, and Torch'''. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load one of the newer toolchain versions, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit versions are 8.0 and 9.0. Load one of them with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, whereas the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
OpenMPI is currently set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9086</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9086"/>
		<updated>2017-12-06T15:04:29Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Software Installed */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM ] as a job scheduler and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below) with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job takes more than 12 hours, the sbatch command will not let you submit your job.  There is, however, a way to have jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
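&lt;br /&gt;
The value of &amp;lt;tt&amp;gt;job_number_max&amp;lt;/tt&amp;gt; in the script above is fixed at 3. One way to choose it, assuming you know roughly how many hours the whole run needs, is to divide that estimate by the per-job walltime limit and round up. The helper &amp;lt;tt&amp;gt;jobs_needed&amp;lt;/tt&amp;gt; below is a hypothetical name used only for this sketch; it is not a SLURM command.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical helper: number of chained jobs needed to cover a run.
# $1 = estimated total run time in whole hours (assumption: known in advance)
# $2 = walltime limit per job in whole hours (12 by default, 24 in the 'long' queue)
jobs_needed() {
  local total_hours=$1
  local limit_hours=$2
  # Integer ceiling division: round up so the last partial chunk gets its own job.
  echo $(( (total_hours + limit_hours - 1) / limit_hours ))
}

# Example: a run estimated at 30 hours under the 12 hour limit.
job_number_max=$(jobs_needed 30 12)
echo "job_number_max=${job_number_max}"
```

Each job in the chain still has to finish within its own time limit, so this estimate only helps if the program writes checkpoints regularly.&lt;br /&gt;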
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, Tensorflow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load a newer Advanced Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, however the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
Currently OpenMPI has been set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9085</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9085"/>
		<updated>2017-12-06T15:01:42Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Software */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads.  Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job takes more than 12 hours, the sbatch command will not let you submit your job.  There is, however, a way to have jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software Installed ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load a newer Advanced Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, however the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
Currently OpenMPI has been set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, Tensorflow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9072</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9072"/>
		<updated>2017-11-23T16:15:40Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* CUDA */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads.  Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job takes more than 12 hours, the sbatch command will not let you submit your job.  There is, however, a way to have jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advanced Toolchain with enhancements for the POWER8 CPU. To load a newer Advanced Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V10.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Advanced Toolchain V11.0&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/7.2.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advanced Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkits are versions 8.0 and 9.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally, however the CUDA Toolkit is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
/usr/local/cuda-9.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the &amp;lt;tt&amp;gt;/usr/local/cuda&amp;lt;/tt&amp;gt; directory is linked to the &amp;lt;tt&amp;gt;/usr/local/cuda-9.0&amp;lt;/tt&amp;gt; directory.&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
Currently OpenMPI has been set up on the 14 nodes connected over EDR Infiniband.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, Tensorflow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9068</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9068"/>
		<updated>2017-11-21T21:38:37Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* Driver Version */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10core 3.25GHz Power8 CPUs and 512GB RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads.  Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as a job scheduler, and jobs are scheduled by node, i.e., 20 cores and 4 GPUs each. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. The maximum walltime per job is 12 hours (except in the 'long' queue, see below), with up to 8 nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job takes more than 12 hours, the sbatch command will not let you submit your job.  There is, however, a way to have jobs up to 24 hours long, by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script).  The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue. As long as your program contains checkpoint or restart capability, you can have one job automatically submit the next. In the following example it is assumed that the program finishes before the time limit requested and then resubmits itself by logging into the development nodes.   Job dependencies and a maximum number of job re-submissions are used to ensure sequential operation.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain with enhancements for the POWER8 CPU. To load the newer Advance Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== NVIDIA GPU Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit is version 8.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkit, however, is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
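As a rough sketch of how the toolkit might be used on this system: the P100 GPUs have CUDA Capability 6.0, which corresponds to the &amp;lt;tt&amp;gt;sm_60&amp;lt;/tt&amp;gt; target of &amp;lt;tt&amp;gt;nvcc&amp;lt;/tt&amp;gt;. The file names below are placeholders, not files provided on the system:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
# saxpy.cu is a hypothetical source file; sm_60 targets the Pascal P100&lt;br /&gt;
nvcc -arch=sm_60 -o saxpy saxpy.cu&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;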
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
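OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Judging by the module names, one build is for GCC and one for the IBM XL compilers; load the one that matches your compiler:&lt;br /&gt;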
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, TensorFlow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9067</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9067"/>
		<updated>2017-11-21T21:38:09Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* IBM Compilers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as its job scheduler. Jobs are scheduled by node, i.e., in units of 20 cores and 4 GPUs. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. A job can use up to 8 nodes, and the maximum walltime per job is 12 hours (except in the 'long' queue, see below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will reject it. You can, however, run jobs of up to 24 hours by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
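For illustration, a job script header requesting the full 24 hours in the 'long' queue might look like the following. It simply combines the directives from the example job script above with &amp;lt;tt&amp;gt;-p long&amp;lt;/tt&amp;gt;; the resource values are only an example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=20&lt;br /&gt;
#SBATCH -p long          # use the 'long' queue&lt;br /&gt;
#SBATCH --time=24:00:00  # H:M:S, up to 24 hours in this queue&lt;br /&gt;
#SBATCH --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;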
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
You may have a job that you know will take longer to run than the queue permits. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of re-submissions are used to ensure that the jobs run sequentially and that the chain terminates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
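  # the $ in $4 is escaped so that it reaches awk on the remote side instead of being expanded by the local shell&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;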
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain with enhancements for the POWER8 CPU. To load the newer Advance Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM XL Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Information about the IBM XL Compilers can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL C/C++]&lt;br /&gt;
&lt;br /&gt;
[https://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.5/com.ibm.compilers.linux.doc/welcome.html IBM XL Fortran]&lt;br /&gt;
&lt;br /&gt;
==== Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit is version 8.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkit, however, is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
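OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Judging by the module names, one build is for GCC and one for the IBM XL compilers; load the one that matches your compiler:&lt;br /&gt;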
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, TensorFlow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9066</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9066"/>
		<updated>2017-11-21T21:35:12Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* GNU Compilers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as its job scheduler. Jobs are scheduled by node, i.e., in units of 20 cores and 4 GPUs. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. A job can use up to 8 nodes, and the maximum walltime per job is 12 hours (except in the 'long' queue, see below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will reject it. You can, however, run jobs of up to 24 hours by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Automatic Re-submission and Job Dependencies ===&lt;br /&gt;
&lt;br /&gt;
You may have a job that you know will take longer to run than the queue permits. As long as your program has checkpoint or restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the requested time limit and then resubmits itself by logging into the development node. Job dependencies and a maximum number of re-submissions are used to ensure that the jobs run sequentially and that the chain terminates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash &lt;br /&gt;
&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
: ${job_number:=&amp;quot;1&amp;quot;}           # set job_number to 1 if it is undefined&lt;br /&gt;
job_number_max=3&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;hi from ${SLURM_JOB_ID}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#RUN JOB HERE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# SUBMIT NEXT JOB&lt;br /&gt;
if [[ ${job_number} -lt ${job_number_max} ]]&lt;br /&gt;
then&lt;br /&gt;
  (( job_number++ ))&lt;br /&gt;
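  # the $ in $4 is escaped so that it reaches awk on the remote side instead of being expanded by the local shell&lt;br /&gt;
  next_jobid=$(ssh sgc01-ib0 &amp;quot;cd $SLURM_SUBMIT_DIR; /opt/slurm/bin/sbatch --export=job_number=${job_number} -d afterok:${SLURM_JOB_ID} thisscript.sh | awk '{print \$4}'&amp;quot;)&lt;br /&gt;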
  echo &amp;quot;submitted ${next_jobid}&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
 &lt;br /&gt;
sleep 15&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;${SLURM_JOB_ID} done&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
More recent versions of the GNU Compiler Collection (C/C++/Fortran) are provided in the IBM Advance Toolchain with enhancements for the POWER8 CPU. To load the newer Advance Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More information about the IBM Advance Toolchain can be found here: [https://developer.ibm.com/linuxonpower/advance-toolchain/ https://developer.ibm.com/linuxonpower/advance-toolchain/]&lt;br /&gt;
&lt;br /&gt;
==== IBM Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit is version 8.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkit, however, is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
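OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Judging by the module names, one build is for GCC and one for the IBM XL compilers; load the one that matches your compiler:&lt;br /&gt;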
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, TensorFlow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
	<entry>
		<id>https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9037</id>
		<title>SOSCIP GPU</title>
		<link rel="alternate" type="text/html" href="https://oldwiki.scinet.utoronto.ca/index.php?title=SOSCIP_GPU&amp;diff=9037"/>
		<updated>2017-10-10T18:40:56Z</updated>

		<summary type="html">&lt;p&gt;Wagnerse: /* IBM PowerAI */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:S882lc.png|center|300px|thumb]]&lt;br /&gt;
|name=SOSCIP GPU &lt;br /&gt;
|installed=September 2017&lt;br /&gt;
|operatingsystem= Ubuntu 16.04 le &lt;br /&gt;
|loginnode= sgc01 &lt;br /&gt;
|nnodes= 14x Power 8 with  4x NVIDIA P100&lt;br /&gt;
|rampernode=512 GB&lt;br /&gt;
|corespernode= 2 x 10core (20 physical, 160 SMT)&lt;br /&gt;
|interconnect=Infiniband EDR &lt;br /&gt;
|vendorcompilers=xlc/xlf, nvcc&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== SOSCIP ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster is a Southern Ontario Smart Computing Innovation Platform ([http://soscip.org/ SOSCIP]) resource located at the University of Toronto's SciNet HPC facility. The SOSCIP multi-university/industry consortium is funded by the Ontario Government and the Federal Economic Development Agency for Southern Ontario [http://www.research.utoronto.ca/about/our-research-partners/soscip/].&lt;br /&gt;
&lt;br /&gt;
== Support Email ==&lt;br /&gt;
&lt;br /&gt;
Please use [mailto:soscip-support@scinet.utoronto.ca &amp;lt;soscip-support@scinet.utoronto.ca&amp;gt;] for SOSCIP GPU specific inquiries.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU Cluster consists of 14 IBM Power 822LC &amp;quot;Minsky&amp;quot; servers, each with 2x10-core 3.25GHz Power8 CPUs and 512GB of RAM. Similar to Power 7, the Power 8 utilizes Simultaneous MultiThreading (SMT), but extends the design to 8 threads per core, allowing the 20 physical cores to support up to 160 threads. Each node has 4x NVIDIA Tesla P100 GPUs, each with 16GB of RAM and CUDA Capability 6.0 (Pascal), connected using NVLink.&lt;br /&gt;
&lt;br /&gt;
== Compile/Devel/Test ==&lt;br /&gt;
&lt;br /&gt;
Access is provided through the BGQ login node, '''&amp;lt;tt&amp;gt; bgqdev.scinet.utoronto.ca &amp;lt;/tt&amp;gt;''' using ssh, and from there you can proceed to the GPU development node '''&amp;lt;tt&amp;gt;sgc01-ib0&amp;lt;/tt&amp;gt;'''.&lt;br /&gt;
&lt;br /&gt;
== Filesystem ==&lt;br /&gt;
&lt;br /&gt;
The filesystem is shared with the BGQ system.  See [https://wiki.scinet.utoronto.ca/wiki/index.php/BGQ#Filesystem here ] for details.&lt;br /&gt;
&lt;br /&gt;
== Job Submission ==&lt;br /&gt;
&lt;br /&gt;
The SOSCIP GPU cluster uses [https://slurm.schedmd.com/ SLURM] as its job scheduler. Jobs are scheduled by node, i.e., in units of 20 cores and 4 GPUs. Jobs are submitted from the development node '''&amp;lt;tt&amp;gt;sgc01&amp;lt;/tt&amp;gt;'''. A job can use up to 8 nodes, and the maximum walltime per job is 12 hours (except in the 'long' queue, see below).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch myjob.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where myjob.script is &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1 &lt;br /&gt;
#SBATCH --ntasks=20  # MPI tasks (needed for srun) &lt;br /&gt;
#SBATCH --time=00:10:00  # H:M:S&lt;br /&gt;
#SBATCH --gres=gpu:4     # Ask for 4 GPUs per node&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
hostname&lt;br /&gt;
nvidia-smi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can query job information using&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To cancel a job use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
scancel $JOBID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Longer jobs ===&lt;br /&gt;
&lt;br /&gt;
If your job requests more than 12 hours, the sbatch command will reject it. You can, however, run jobs of up to 24 hours by specifying &amp;quot;-p long&amp;quot; as an option (i.e., add &amp;lt;tt&amp;gt;#SBATCH -p long&amp;lt;/tt&amp;gt; to your job script). The priority of such jobs may be throttled in the future if we see that the 'long' queue is having a negative effect on turnover time in the queue.&lt;br /&gt;
&lt;br /&gt;
=== Interactive ===&lt;br /&gt;
&lt;br /&gt;
For an interactive session use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --gres=gpu:4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software ==&lt;br /&gt;
&lt;br /&gt;
==== GNU Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the newer Advance Toolchain version, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load gcc/6.3.1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== IBM Compilers ====&lt;br /&gt;
&lt;br /&gt;
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, use:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load xlc/13.1.5&lt;br /&gt;
module load xlf/15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Driver Version ====&lt;br /&gt;
&lt;br /&gt;
The current NVIDIA driver version is 384.66&lt;br /&gt;
&lt;br /&gt;
==== CUDA ====&lt;br /&gt;
&lt;br /&gt;
The currently installed CUDA Toolkit is version 8.0:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load cuda/8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The CUDA driver is installed locally; the CUDA Toolkit, however, is installed in:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr/local/cuda-8.0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
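OpenMPI is currently set up on the 14 nodes, which are connected over EDR InfiniBand. Judging by the module names, one build is for GCC and one for the IBM XL compilers; load the one that matches your compiler:&lt;br /&gt;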
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load openmpi/2.1.1-gcc-5.4.0&lt;br /&gt;
$ module load openmpi/2.1.1-XL-13_15.1.5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== IBM PowerAI ===&lt;br /&gt;
&lt;br /&gt;
The PowerAI platform contains popular open machine learning frameworks such as Caffe, TensorFlow, and Torch. Run the &amp;lt;tt&amp;gt;module avail&amp;lt;/tt&amp;gt; command for a complete listing. More information is available at this link: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/. Release 4.0 is currently installed.&lt;/div&gt;</summary>
		<author><name>Wagnerse</name></author>
	</entry>
</feed>