Essentials

WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to https://docs.scinet.utoronto.ca

SciNet Systems

SciNet is a consortium for High-Performance Computing consisting of researchers at the University of Toronto and its associated hospitals. It is one of seven such consortia in Canada, which together form Compute/Calcul Canada.

SciNet has two main clusters, the GPC and the TCS, as well as two smaller systems, the P7 and the ARC:

General Purpose Cluster (GPC)

The General Purpose Cluster is an extremely large cluster (ranked 16th in the world at its inception in June 2009, and the fastest in Canada) and is where most simulations at SciNet are run. It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the first in the world to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240 2.5GHz cores (8 cores per node), with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking DDR InfiniBand, while the rest of the nodes are connected with 5:1 blocked QDR InfiniBand. The compute nodes are accessed through a queuing system that allows jobs with a maximum wall time of 48 hours.

Tightly Coupled System (TCS)

The TCS is a cluster of "fat" (high-memory, many-core) POWER6 nodes on a very fast InfiniBand connection, which was installed in January of 2009. It has a relatively small number of cores (~3000), arranged in nodes of 32 cores with 128 GB of RAM on each node. It is dedicated to jobs that require such a large-memory, low-latency configuration. Jobs need to use multiples of 32 cores (a node), and are submitted to the LoadLeveler queuing system, which allows jobs with a maximum wall time of 48 hours.
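Because TCS jobs must use whole nodes, a core count that is not a multiple of 32 has to be rounded up. The following is a minimal sketch of that arithmetic; `tcs_nodes_for` is a hypothetical helper written for illustration, not a SciNet or LoadLeveler tool.

```shell
#!/bin/sh
# Cores per TCS node, as stated in the text above.
CORES_PER_NODE=32

# Print the number of whole nodes needed for a requested core count.
# Usage: tcs_nodes_for CORES   (hypothetical helper, for illustration only)
tcs_nodes_for() {
    cores=$1
    case $cores in
        ''|*[!0-9]*)
            echo "error: core count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$cores" -le 0 ]; then
        echo "error: core count must be a positive integer" >&2
        return 1
    fi
    # Integer ceiling division: round up to the next whole node.
    echo $(( (cores + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

tcs_nodes_for 64    # exactly two nodes
tcs_nodes_for 70    # rounds up to three nodes (96 cores)
```

A job that only scales to 70 processes would therefore still occupy three full nodes.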

Power 7 Linux Cluster (P7)

The P7 is a cluster of 5 (soon to be at least 8) "fat" (high-memory, many-core) POWER7 nodes on a very fast InfiniBand connection, which was installed in May of 2011. It has a total of 160 cores, arranged in nodes of 32 cores with 128 GB of RAM on each node. It is dedicated to jobs that require such a large-memory, low-latency configuration. Jobs need to use multiples of 32 cores (whole nodes), and are submitted to the LoadLeveler queuing system, which allows jobs with a maximum wall time of 48 hours.

Accelerator Research Cluster (ARC)

The ARC is a technology evaluation cluster with a combination of 14 IBM PowerXCell 8i "Cell" nodes and 8 Intel x86_64 "Nehalem" nodes containing a total of 16 NVIDIA M2070 GPUs. The Cell nodes (IBM QS22 blades) each have two 3.2GHz IBM PowerXCell 8i CPUs, where each CPU has 1 Power Processing Unit (PPU) and 8 Synergistic Processing Units (SPU), and 32GB of RAM per node. The Intel nodes each have two 2.53GHz 4-core Xeon X5550 CPUs with 48GB of RAM, and each node contains two NVIDIA M2070 (Fermi) GPUs, each with 6GB of RAM.

Please note that this cluster is not a production cluster and is only accessible to selected users.

Access to the SciNet systems

Access to the SciNet systems is via SSH only. To use the GPC or TCS, first ssh to the data centre through login.scinet.utoronto.ca:

 ssh -l USER login.scinet.utoronto.ca

From here you can view your directories, see the queue on the GPC using showq, and log into one of four GPC development nodes, gpc01..gpc04, or either of the TCS development nodes, tcs01 or tcs02.
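The two-step login described above can also be chained into a single command by asking ssh on the login node to start a second ssh to a development node. The sketch below only prints that command rather than running it; `devnode_ssh_cmd` is a hypothetical helper, and USER is the same placeholder used above.

```shell
#!/bin/sh
LOGIN_HOST=login.scinet.utoronto.ca

# Print (do not run) the ssh command that hops through the login node
# to a development node. Hypothetical helper, for illustration only.
# Usage: devnode_ssh_cmd USER NODE
devnode_ssh_cmd() {
    user=$1
    node=$2
    case $node in
        gpc0[1-4]|tcs0[12]) ;;   # development nodes named in the text
        *)
            echo "error: unknown development node: $node" >&2
            return 1 ;;
    esac
    # -t allocates a terminal so the inner ssh gets an interactive shell.
    echo "ssh -t -l $user $LOGIN_HOST ssh $node"
}

devnode_ssh_cmd USER gpc02
```

Running the printed command counts as a connection to the login node like any other, so the firewall limits still apply.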

However, because the login nodes are used by everyone who needs to use the SciNet systems, be considerate: do not run scripts or programs on these systems that take more than a few minutes or use more than a few MB of memory.

Users can transfer small files (at most about 10GB) into or out of the data centre via the login nodes, using scp or rsync over ssh. Large data transfers, however, should be done via the datamover1 node. This node can initiate both incoming and outgoing transfers, and since it is on a 10 Gbps link to the University of Toronto, it is the fastest, and recommended, way to transfer data. Note that datamover1 is not accessible from the outside, so you must still log in to login.scinet.utoronto.ca and then ssh to the data mover node. See the data transfer section of the Data Management page for more details.
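As a rough illustration of the rule above, the following sketch picks a transfer node from the size of the data. `transfer_node_for` is a hypothetical helper, the 10GB cut-off is only approximate, and the paths and remote.example.org in the comments are placeholders.

```shell
#!/bin/sh
# Approximate size limit (in GB) for transfers through the login nodes,
# taken from the text above.
LOGIN_LIMIT_GB=10

# Print which node to use for a transfer of the given size in whole GB.
# Hypothetical helper; the real cut-off is "about 10GB", not a hard limit.
transfer_node_for() {
    size_gb=$1
    case $size_gb in
        ''|*[!0-9]*)
            echo "error: size must be a whole number of GB" >&2
            return 1 ;;
    esac
    if [ "$size_gb" -le "$LOGIN_LIMIT_GB" ]; then
        echo "login.scinet.utoronto.ca"
    else
        echo "datamover1"
    fi
}

transfer_node_for 2     # small: goes through the login node
transfer_node_for 500   # large: should be started on datamover1

# Small transfer, run from your own machine (paths are placeholders):
#   rsync -av -e ssh results/ USER@login.scinet.utoronto.ca:results/
# Large transfer, run on datamover1 after logging in through the login node
# (remote.example.org is a placeholder for your own server):
#   rsync -av -e ssh USER@remote.example.org:bigdata/ bigdata/
```

Since datamover1 cannot be reached from outside, large transfers in either direction have to be started from datamover1 itself.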

Please talk to us at <support@scinet.utoronto.ca> if you need to do very large file transfers.

Note that the login machines are not the same architecture as either the GPC or TCS nodes; you should not compile programs on the login machines that you expect to use on the GPC or TCS clusters.

Note also that access to the TCS is not enabled by default. We ask that people justify the need for this highly specialized machine. Contact us explaining the nature of your work if you want access to the TCS. In particular, applications should scale well to 64 processes/threads to run on this system.

SciNet Firewall

Important note about logging in: The SciNet firewall monitors for too many attempted connections, and will shut down all access (including previously working connections) from your IP address if more than four connection attempts (successful or not) are made within the space of a few minutes. In that case, you will be locked out of the system for an hour. Be patient in attempting new logins!
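Scripts that open many connections in a loop are an easy way to trip this limit. The sketch below keeps a local count of recent attempts and refuses a fifth one inside the window. It is an illustration only: `may_connect` and the log file path are hypothetical, and the 5-minute window is an assumption, since the text only says "a few minutes".

```shell
#!/bin/sh
MAX_ATTEMPTS=4          # from the text: more than four attempts triggers the block
WINDOW_SECONDS=300      # ASSUMPTION: "a few minutes" taken as 5 minutes
ATTEMPT_LOG=${ATTEMPT_LOG:-$HOME/.scinet_ssh_attempts}   # hypothetical path

# Return 0 and record the attempt if another connection stays within
# MAX_ATTEMPTS per WINDOW_SECONDS; return 1 otherwise.
may_connect() {
    now=$(date +%s)
    cutoff=$((now - WINDOW_SECONDS))
    touch "$ATTEMPT_LOG"
    # Keep only timestamps that are still inside the window.
    awk -v c="$cutoff" '$1 > c' "$ATTEMPT_LOG" > "$ATTEMPT_LOG.tmp"
    mv "$ATTEMPT_LOG.tmp" "$ATTEMPT_LOG"
    recent=$(awk 'END { print NR }' "$ATTEMPT_LOG")
    if [ "$recent" -ge "$MAX_ATTEMPTS" ]; then
        echo "holding back: $recent attempts in the last $WINDOW_SECONDS s" >&2
        return 1
    fi
    echo "$now" >> "$ATTEMPT_LOG"
    return 0
}

# Usage: only connect when the local counter allows it.
#   may_connect && ssh -l USER login.scinet.utoronto.ca
```

This only counts attempts made through the wrapper on one machine; other sessions from the same IP address still count toward the firewall's limit.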

Default Limits on the SciNet systems

The default allocation on the SciNet GPC cluster allows a research group to use a maximum of 32 nodes at a time, for up to 48 hours per job, with no more than 32 individual jobs at a time. Groups that have also applied to use the more specialized TCS resource may have up to 8 jobs in the queue, running on a total of 2 nodes at a time, again with a 48 hour wallclock limit per job. Users who need more resources than this must apply for them through the account allocation/LRAC/NRAC process.
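To make the GPC defaults concrete, here is a small sketch that checks a single job request against them before submission. `gpc_request_ok` is a hypothetical helper, not part of the queuing system, and it cannot see the group's other jobs, which also count toward the 32-node and 32-job totals.

```shell
#!/bin/sh
# Default GPC limits from the text above.
GPC_MAX_NODES=32
GPC_MAX_HOURS=48

# Return 0 if a single job request fits the default GPC limits.
# Usage: gpc_request_ok NODES HOURS   (hypothetical helper)
gpc_request_ok() {
    nodes=$1
    hours=$2
    if [ "$nodes" -gt "$GPC_MAX_NODES" ]; then
        echo "too many nodes: $nodes > $GPC_MAX_NODES" >&2
        return 1
    fi
    if [ "$hours" -gt "$GPC_MAX_HOURS" ]; then
        echo "wall time too long: $hours h > $GPC_MAX_HOURS h" >&2
        return 1
    fi
    return 0
}

gpc_request_ok 16 24 && echo "fits default allocation"
gpc_request_ok 64 24 || echo "needs an allocation request"
```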

Usage Policy

All users agreed to various conditions when they requested an account; e.g. accounts must not be shared, and computing resources are to be used efficiently and only for research. Please see the SciNet Usage Policy for the full details.

Contact information

Any questions and problem reports should be addressed to <support@scinet.utoronto.ca>. Please provide as much relevant information as possible in your email.

9 July 2010, 13:25 (UTC)