Difference between revisions of "Phi"

From oldwiki.scinet.utoronto.ca
 
(45 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
{| style="border-spacing: 8px; width:100%"
 +
| valign="top" style="cellpadding:1em; padding:1em; border:2px solid; background-color:#f6f674; border-radius:5px"|
 +
'''WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to [https://docs.scinet.utoronto.ca https://docs.scinet.utoronto.ca]'''
 +
|}
 +
 
{{Infobox Computer
 
{{Infobox Computer
 
|image=[[Image:Xeon_phi.jpg|center|250px|thumb]][[Image:NVIDIA-Tesla-K20X.jpg|center|250px|thumb]]
 
|image=[[Image:Xeon_phi.jpg|center|250px|thumb]][[Image:NVIDIA-Tesla-K20X.jpg|center|250px|thumb]]
Line 4: Line 9:
 
|installed=April 2013
 
|installed=April 2013
 
|operatingsystem= Linux Centos 6.4
 
|operatingsystem= Linux Centos 6.4
|loginnode= arc09 (from <tt>arc01</tt>)
+
|loginnode= gravity01
 
|nnodes=1
 
|nnodes=1
 
|rampernode=32 GB
 
|rampernode=32 GB
Line 13: Line 18:
 
}}
 
}}
  
This is a single test/devel node, part of the [[Accelerator Research Cluster]], for investigating new accelerator technologies. It consists of a singele x86_64 nodes with one 8-core Intel Sandybridge Xeon   
+
This is a single test node, for investigating new accelerator technologies. It consists of a single x86_64 node with one 8-core Intel Sandybridge Xeon   
E5-2650 2.0GHz CPU with 32GB of RAM per node. It has a single NVIDIA Tesla K20 GPU with CUDA Capability 3.0 (Kepler) with 2496 CUDA Cores and 5 GB of RAM as well as a single Intel Xeon Phi 3120A with 57  
+
E5-2650 2.0GHz CPU with 32GB of RAM. It has a single NVIDIA Tesla K20 GPU with CUDA Capability 3.0 (Kepler) with 2496 CUDA Cores and 5 GB of RAM as well as a single Intel Xeon Phi 3120A with 57  
 
1.1 GHz cores and 6GB of RAM. The node is interconnected to the rest of the clusters with DDR Infiniband and mounts the regular SciNet GPFS filesystems.   
 
1.1 GHz cores and 6GB of RAM. The node is interconnected to the rest of the clusters with DDR Infiniband and mounts the regular SciNet GPFS filesystems.   
  
 
=== Login ===
 
=== Login ===
  
First login via ssh with your scinet account at <tt>login.scinet.utoronto.ca</tt>, and from there you can proceed to '''<tt>arc01</tt>''' which
+
First login via ssh with your scinet account at '''<tt>login.scinet.utoronto.ca</tt>''', and from there you can proceed to '''<tt>gravity01</tt>'''.
is the GPU development node and then to '''<tt>arc09</tt>'''.
 
  
Access to this machines is no enabled be default so please email support@scinet.utoronto.ca for access.
+
=== Queue ===
  
=== Devel/Compute ===
+
As this is a single node users are expected to use it in a "friendly" manner as this system is not setup for production
 +
usage, and primarily for investigating new technologies run times are limited to under 4 hours.
 +
To access the node you need to use the queue, similar to the standard ARC and GPC compute nodes,
 +
however with a maximum walltime of 4 hours.
  
As this is a single node there is no queue and users are expected to use it in a "friendly" manner.  This system is not setup for production
+
For an interactive job use
usage, and primarily for investigating new technologies so please keep your run times short. 
+
<pre>
 +
qsub -l nodes=1:ppn=8,walltime=1:00:00 -q arcX -I
 +
</pre>
  
 
== Software ==
 
== Software ==
  
The same software installed on the GPC is available on ARC using the same modules framework.  
+
The same software installed on the GPC is available on '''<tt>arcX</tt>''' using the modules framework.  
See [[GPC_Quickstart#Modules_and_Environment_Variables | here]] for full details.
+
See '''[[GPC_Quickstart#Modules_and_Environment_Variables | here]]''' for full details.
  
== NVIDIA K20 ==
+
== NVIDIA Tesla K20 ==
  
See [[ GPU_Devel_Nodes | ARC ]] wiki page for details of the available CUDA and OpenCL compilers and modules. To
+
See the '''[[ Gravity | Gravity ]]''' wiki page for full details of the available CUDA and OpenCL compilers and modules. To
use all the K20 features a minimum of CUDA 5.0 is required.
+
use all the K20 (Kepler) features a minimum of CUDA 5.0 is required. Cuda/6.5 is recommended for the K20.
  
 +
=== CUDA ===
 
<pre>
 
<pre>
module load cuda/5.0
+
module load gcc/4.8.1 cuda/6.5
 
</pre>
 
</pre>
 +
Here, gcc is loaded because it is a prerequisite of the cuda module.
 +
 +
You will have to let the cuda compiler know about the capabilities of the Kepler graphics card by supplying the flag
 +
<tt>-arch=sm_30</tt> or <tt>-arch=sm_35</tt>.
  
 
=== Driver Version ===
 
=== Driver Version ===
  
The current NVIDIA driver version for the K20 is 310.44
+
The current NVIDIA driver version for the K20 is 340.32
  
 
== Xeon Phi ==
 
== Xeon Phi ==
  
 
=== Compilers ===
 
=== Compilers ===
The Xeon Phi uses the standard intel compilers, however requires at least version 13.0
+
The Xeon Phi uses the standard intel compilers, however requires at least version 13.1
 +
 
 +
<pre>
 +
module load intel/14.0.0
 +
</pre>
 +
 
 +
=== MPI ===
 +
 
 +
IntelMPI also has Xeon Phi support
  
 
<pre>
 
<pre>
module load intel/13.1.1
+
module load intelmpi/4.1.1.036
 
</pre>
 
</pre>
  
 +
'''NOTE''': Be sure to use '''mpiifort''' for compiling native MIC Fortran code as the '''mpif77,mpif90''' scripts ignore the -mmic flags and will produce host only code.
  
 
=== Tools ===
 
=== Tools ===
Line 66: Line 89:
 
</pre>
 
</pre>
  
 +
=== OpenCL ===
 +
 +
OpenCL version 1.2 is available for the Xeon Phi on '''<tt>arcX</tt>'''
 +
 +
<pre>
 +
/opt/intel/opencl
 +
</pre>
  
 
=== Direct Access ===
 
=== Direct Access ===
  
The Xeon Phi can be accessed directly by  
+
The Xeon Phi can be accessed directly from the host node by  
  
 
<pre>
 
<pre>
Line 77: Line 107:
 
=== Shared Filesystem ===
 
=== Shared Filesystem ===
  
The host node arc09 mounts the standard SciNet filesystems, i.e. $HOME and $SCRATCH, however to share
+
The host node '''arc09''' mounts the standard SciNet filesystems, i.e. $HOME and $SCRATCH, however to share
files between the host and Xeon Phi use /localscratch/$HOME
+
files between the host and Xeon Phi use /localscratch/$HOME which shows up as $HOME on "mic0".
 +
 
 +
=== Useful Links ===
 +
 
 +
[http://software.intel.com/en-us/articles/building-a-native-application-for-intel-xeon-phi-coprocessors Building Native for MIC ]
 +
 
 +
[http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#mic TACC Stampede MIC Info ]

Latest revision as of 19:29, 31 August 2018

WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to https://docs.scinet.utoronto.ca

Intel Xeon Phi / NVIDIA Tesla K20
[Images: Xeon phi.jpg, NVIDIA-Tesla-K20X.jpg]
Installed: April 2013
Operating System: Linux CentOS 6.4
Number of Nodes: 1
Interconnect: DDR Infiniband
RAM/Node: 32 GB
Cores/Node: 8 with Xeon Phi & K20
Login/Devel Node: gravity01
Vendor Compilers: nvcc, pgcc, icc, gcc
Queue Submission: none

This is a single test node for investigating new accelerator technologies. It consists of a single x86_64 node with one 8-core Intel Sandy Bridge Xeon E5-2650 2.0 GHz CPU and 32 GB of RAM. It has a single NVIDIA Tesla K20 GPU (Kepler, CUDA compute capability 3.5) with 2496 CUDA cores and 5 GB of RAM, as well as a single Intel Xeon Phi 3120A with 57 1.1 GHz cores and 6 GB of RAM. The node is connected to the rest of the clusters with DDR InfiniBand and mounts the regular SciNet GPFS filesystems.

Login

First log in via ssh with your SciNet account at login.scinet.utoronto.ca; from there you can proceed to gravity01.

Queue

As this is a single node, users are expected to use it in a "friendly" manner. The system is not set up for production usage and is intended primarily for investigating new technologies, so run times are limited. To access the node you need to use the queue, as with the standard ARC and GPC compute nodes, but with a maximum walltime of 4 hours.

For an interactive job use

qsub -l nodes=1:ppn=8,walltime=1:00:00 -q arcX -I
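The walltime limit can be checked before submitting. The following is a minimal sketch, assuming walltimes are written as H:MM:SS; the helper names walltime_seconds and interactive_cmd are hypothetical, and the script only prints the qsub line rather than submitting anything.

```shell
# Sketch: build the interactive qsub line only if the walltime fits the limit.
# walltime_seconds and interactive_cmd are hypothetical helper names.

MAX_SECONDS=$((4 * 3600))   # maximum walltime of 4 hours, per the text above

walltime_seconds() {
    # Split H:MM:SS with parameter expansion.
    h=${1%%:*}; rest=${1#*:}
    m=${rest%%:*}; s=${rest#*:}
    # Drop one leading zero so that "08" is not parsed as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

interactive_cmd() {
    # $1: requested walltime as H:MM:SS.
    # Prints the qsub command, or fails if the walltime is over the limit.
    secs=$(walltime_seconds "$1")
    if [ "$secs" -gt "$MAX_SECONDS" ]; then
        echo "walltime $1 exceeds the 4 hour maximum" >&2
        return 1
    fi
    echo "qsub -l nodes=1:ppn=8,walltime=$1 -q arcX -I"
}

# Prints the command for a one-hour interactive session.
interactive_cmd 1:00:00
```

Printing the command instead of running it keeps the check usable on machines without qsub; paste the printed line into a shell on the login node to start the session.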

Software

The same software installed on the GPC is available on arcX using the modules framework. See the "Modules and Environment Variables" section of the GPC Quickstart page for full details.

NVIDIA Tesla K20

See the Gravity wiki page for full details of the available CUDA and OpenCL compilers and modules. To use all the K20 (Kepler) features, a minimum of CUDA 5.0 is required. The cuda/6.5 module is recommended for the K20.

CUDA

module load gcc/4.8.1 cuda/6.5

Here, gcc is loaded because it is a prerequisite of the cuda module.

You will have to let the cuda compiler know about the capabilities of the Kepler graphics card by supplying the flag -arch=sm_30 or -arch=sm_35.
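As a rough illustration of how the architecture flag fits into a compile line, the sketch below composes an nvcc command. The source file saxpy.cu and the helper name nvcc_cmd are hypothetical, and the script only prints the commands; run them by hand inside a job on the node.

```shell
# Sketch: compose the nvcc command line for the K20.
# nvcc_cmd and saxpy.cu are hypothetical names used for illustration.

nvcc_cmd() {
    # $1: CUDA source file, $2: architecture (defaults to sm_35)
    src=$1
    arch=${2:-sm_35}
    case "$arch" in
        sm_30|sm_35) ;;   # the two flags named in the text above
        *) echo "architecture not covered on this page: $arch" >&2; return 1 ;;
    esac
    # The output binary takes the source name without its .cu suffix.
    echo "nvcc -arch=$arch -o ${src%.cu} $src"
}

# Print the module line from this page, then the compile line.
echo "module load gcc/4.8.1 cuda/6.5"
nvcc_cmd saxpy.cu
```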

Driver Version

The current NVIDIA driver version for the K20 is 340.32

Xeon Phi

Compilers

The Xeon Phi uses the standard Intel compilers, but requires at least version 13.1.

module load intel/14.0.0 

MPI

Intel MPI also supports the Xeon Phi.

module load intelmpi/4.1.1.036

NOTE: Be sure to use mpiifort for compiling native MIC Fortran code, as the mpif77 and mpif90 scripts ignore the -mmic flag and will produce host-only code.
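One way to avoid picking the wrong wrapper is to choose it from the source file extension. The sketch below assumes the Intel MPI wrappers mpiicc, mpiicpc and mpiifort are on the path after loading the modules above; the helper name mic_mpi_compile_cmd, the file hello.f90 and the .mic output suffix are hypothetical. It prints the compile line rather than running it.

```shell
# Sketch: pick the Intel MPI wrapper for a native (-mmic) build.
# mic_mpi_compile_cmd and the .mic output suffix are hypothetical choices.

mic_mpi_compile_cmd() {
    # $1: source file; the wrapper is chosen from the file extension.
    src=$1
    case "$src" in
        *.f|*.f90|*.F90)  wrapper=mpiifort ;;  # not mpif77/mpif90, see NOTE above
        *.c)              wrapper=mpiicc ;;
        *.cpp|*.cxx|*.cc) wrapper=mpiicpc ;;
        *) echo "unknown source type: $src" >&2; return 1 ;;
    esac
    echo "$wrapper -mmic -o ${src%.*}.mic $src"
}

# Prints the compile line for a Fortran source file.
mic_mpi_compile_cmd hello.f90
```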

Tools

The Intel Cluster Tools, such as VTune Amplifier and Inspector, are available for the Xeon Phi by loading the following module.

module load inteltools

OpenCL

OpenCL version 1.2 is available for the Xeon Phi on arcX, installed in

/opt/intel/opencl

Direct Access

The Xeon Phi can be accessed directly from the host node by

ssh mic0

Shared Filesystem

The host node arc09 mounts the standard SciNet filesystems, i.e. $HOME and $SCRATCH. To share files between the host and the Xeon Phi, however, use /localscratch/$HOME, which shows up as $HOME on "mic0".
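Staging a native binary through the shared directory might look like the following sketch. The binary name hello.mic and the helper names are hypothetical, and the script only prints the commands to run on the host. Because $HOME is an absolute path, the printed directory contains a double slash, which is harmless.

```shell
# Sketch: print the steps to copy a native binary to the Xeon Phi and run it.
# mic_share_dir, stage_and_run_cmds and hello.mic are hypothetical names.

mic_share_dir() {
    # Host-side directory that shows up as $HOME on mic0, per the text above.
    echo "/localscratch/$HOME"
}

stage_and_run_cmds() {
    # $1: name of a binary built with -mmic
    echo "cp $1 $(mic_share_dir)/"
    # ssh starts in $HOME on mic0, which is the shared directory.
    echo "ssh mic0 ./$1"
}

stage_and_run_cmds hello.mic
```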

Useful Links

Building Native for MIC

TACC Stampede MIC Info