Essentials

From oldwiki.scinet.utoronto.ca
Revision as of 13:11, 13 May 2011 by Rzon

WARNING: SciNet is in the process of replacing this wiki with a new documentation site. For current information, please go to https://docs.scinet.utoronto.ca

SciNet Systems

SciNet is a consortium for High-Performance Computing consisting of researchers at the University of Toronto and its associated hospitals. It is one of seven such consortia in Canada, which together form Compute/Calcul Canada.

SciNet has two main clusters, the GPC and the TCS, along with two smaller specialized systems, the P7 and the ARC:

General Purpose Cluster (GPC)

The General Purpose Cluster is an extremely large cluster (ranked 16th in the world at its inception in June 2009, and the fastest in Canada) and is where most simulations at SciNet are run. It is an IBM iDataPlex cluster based on Intel's Nehalem architecture (one of the first in the world to make use of the new chips). The GPC consists of 3,780 nodes with a total of 30,240 2.5GHz cores (8 cores per node), with 16GB RAM per node (2GB per core). Approximately one quarter of the cluster is interconnected with non-blocking DDR InfiniBand, while the rest of the nodes are connected with 5:1 blocking QDR InfiniBand. The compute nodes are accessed through a queuing system that allows jobs with a maximum wall time of 48 hours.

Tightly Coupled System (TCS)

The TCS is a cluster of "fat" (high-memory, many-core) POWER6 nodes on a very fast InfiniBand interconnect, installed in January 2009. It has a relatively small number of cores (~3000), arranged in nodes of 32 cores with 128 GB of RAM on each node. It is dedicated to jobs that require such a large-memory, low-latency configuration. Jobs must use multiples of 32 cores (whole nodes) and are submitted to the LoadLeveler queuing system, which allows jobs with a maximum wall time of 48 hours.

Power 7 Linux Cluster (P7)

The P7 is a cluster of 5 (soon to be at least 8) "fat" (high-memory, many-core) POWER7 nodes on a very fast InfiniBand interconnect, installed in May 2011. It has a total of 160 cores, arranged in nodes of 32 cores with 128 GB of RAM on each node. It is dedicated to jobs that require such a large-memory, low-latency configuration. Jobs must use one or more whole nodes (multiples of 32 cores) and are submitted to the LoadLeveler queuing system, which allows jobs with a maximum wall time of 48 hours.

Accelerator Research Cluster (ARC)

The ARC is a technology evaluation cluster combining 14 IBM PowerXCell 8i "Cell" nodes with 8 Intel x86_64 "Nehalem" nodes containing a total of 16 NVIDIA M2070 GPUs. The QS22 nodes each have two 3.2GHz IBM PowerXCell 8i CPUs, where each CPU has 1 Power Processing Unit (PPU) and 8 Synergistic Processing Units (SPU), and 32GB of RAM per node. The Intel nodes each have two 2.53GHz 4-core Xeon X5550 CPUs and 48GB of RAM, and each contains two NVIDIA M2070 (Fermi) GPUs with 6GB of RAM apiece.

Please note that this cluster is not a production cluster and is only accessible to selected users.

Accounts

Eligibility

SciNet facilities are designated for Canadian researchers or those collaborating on Canadian research projects. In general, any academic researcher from a Canadian research institution with significant high performance computing requirements to support their research may apply for an account on SciNet. Faculty members and senior researchers at a Canadian university or research facility are eligible to create a "Default Project", more aptly called a "Program", and get a SciNet account. This person becomes the "lead researcher" for the program. A person can only have one program. In the parlance of the RACs, a "project" refers to a special allocation requested for a specific research project, and a PI may have many projects. A "Resource Allocation Project" (RAP), other than the default, is for a specific research project. Graduate students and postdoctoral fellows (PDFs) are eligible to have accounts belonging to a Project, but are not eligible to create Projects on their own. Their supervisor must first create the project, even if the supervisor has no intention of using their own SciNet account.

Application Process

Note: SciNet Account creation is suspended during the transition from the General Purpose Cluster to the new Niagara supercomputer. You can still go through the process below, but your SciNet account creation request will not be handled until April 2018, when Niagara will be taken into general production.

Any qualified researcher at a Canadian university is eligible for a free account on the SciNet systems. To apply for a SciNet account, one must first register with the Compute Canada Database. In general, the faculty member applies first, after which group members can request accounts. Note that accounts have default limits, but users with especially large resource requirements may apply for more resources in the annual national Resource Allocation Competition (RAC, see below).

Here is a synopsis of the account application process:

First, the faculty member (PI) follows these steps:

  1. Go to the Compute Canada Database (CCDB) at https://ccdb.computecanada.ca
  2. Click on "Register"; agree to the Acceptable Usage Policy; then fill out and submit a Compute Canada account application.
  3. After a few days, the PI receives a Compute Canada account confirmation email with a Compute Canada Role Identifier (CCRI).
  4. The PI can now log into the CCDB, go to the "Apply for a Consortium Account" item under the "My Account" menu, and click on the "Apply" button next to SciNet.
  5. This request is sent to SciNet. After a few days, the PI will be notified by e-mail when the account has been set up.

Next, the PI's group members and collaborators can go through the same process:

  1. Go to the Compute Canada Database (CCDB) at https://ccdb.computecanada.ca
  2. Click on "Register"; agree to the Acceptable Usage Policy; then fill out and submit a Compute Canada account application. On the application form, they should indicate that they are a "Sponsored User" and enter the PI's Compute Canada Role Identifier (CCRI), which they should request from the PI.
  3. The PI will receive a request to approve the account application.
  4. After a few days, the user receives a Compute Canada account confirmation email.
  5. The user can now log into the CCDB, go to the "Apply for a Consortium Account" item under the "My Account" menu, and click on the "Apply" button next to SciNet.
  6. This request is sent to SciNet. After a few days, the user will be notified by e-mail when the account has been set up.

Rapid Access Service

All SciNet users have access to a "Rapid Access Service" (RAS, which was previously known as a "default allocation") which gives them the ability to use a certain maximum number of nodes at any time. If that limit has been exceeded, the user's jobs will remain in the queue until they are able to run.

Currently, the default allocation limits are equivalent to continuous usage of 32 nodes (256 cores) on the GPC and 2 nodes on the TCS (for groups with access to that machine) in any given 48-hour period. Note that this default is a maximum limit rather than a guaranteed limit. If the job queue is full then it is possible that jobs are unable to start even if a user's default limits have not been reached.
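As a rough worked example of what this limit amounts to, the sketch below derives the cores per GPC node from the totals quoted earlier on this page (30,240 cores over 3,780 nodes) and multiplies out one 48-hour period. The function and constant names are illustrative only and are not part of any SciNet tool; the result is an upper bound, since the limit is a maximum and not a guarantee.

```python
# Figures quoted on this page; all names here are illustrative only.
GPC_NODES = 3780
GPC_CORES = 30240
RAS_NODES = 32        # default group limit on the GPC
PERIOD_HOURS = 48     # period over which the limit is expressed


def cores_per_node(total_cores: int, total_nodes: int) -> int:
    """Return cores per node, insisting that the totals divide evenly."""
    cores, remainder = divmod(total_cores, total_nodes)
    if remainder:
        raise ValueError("core count is not a multiple of node count")
    return cores


def ras_core_hours(nodes: int, cores_each: int, hours: int) -> int:
    """Upper bound on core-hours from continuous use of `nodes` nodes."""
    return nodes * cores_each * hours


per_node = cores_per_node(GPC_CORES, GPC_NODES)
print(per_node)                                            # 8
print(RAS_NODES * per_node)                                # 256
print(ras_core_hours(RAS_NODES, per_node, PERIOD_HOURS))   # 12288
```

In other words, a group that kept its full default allocation busy would consume at most 12,288 GPC core-hours per 48-hour period.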

It is also important to understand that this RAS limit applies to entire research groups rather than to individuals. Thus, if one member of a group is using the full default allocation, jobs submitted by other users in the same group will wait in the queue until they are able to run. It is technically more correct to say that the default limit applies to all members of a given "Default Resource Allocation Project" (DRAP). Log in to the CCDB to check which DRAP you belong to and to see either:

  1. who belongs to your group (if you are a PI/faculty member) or
  2. whose group you belong to (if you are a PDF, RA or student).

Competition for Resources for Research Groups (RRG)

Resource allocations beyond the default allocation are determined by an application and adjudication process.

Researchers requiring access to more resources than are available to a default allocation must submit a proposal to Compute Canada's Resource Allocation Committee. The Call for Proposals for Resources for Research Groups (RRG) is posted on the Compute Canada site each fall, with awarded allocations running 1 Jan to 31 Dec of every year.

All SciNet PIs will receive email notification when RRG applications are due. Proposals, with details about the scientific and technical aspects, are to be submitted via the Compute Canada website: https://ccdb.computecanada.ca.

Account Administration

Portal site

There is a portal site for administrative tasks such as changing your password, (un)subscribing to mailing lists, and viewing your usage reports.

Usage policy

All users will be asked to agree to the SciNet usage policy when they get an account.

Usage reports

SciNet usage reports are now available on the CCDB website: https://ccdb.computecanada.ca/. Log in and then click on "View Group Usage" under the "My Account" menu. Please let us know if you have comments or questions about these reports.

PIs and all people that they sponsor will be able to view the reports for only their group.

There is a summary report which gives a high-level overview for those groups with RAC allocations, as well as more detailed cumulative reports which break down usage by allocation, system and user. These are all updated nightly.

Access to the SciNet systems

Access to the SciNet systems is via SSH only. To use the GPC or TCS, first ssh to the data centre through login.scinet.utoronto.ca:

 ssh -l USER login.scinet.utoronto.ca

From here you can view your directories, see the queue on the GPC using showq, and log into one of four GPC development nodes, gpc01..gpc04, or either of the TCS development nodes, tcs01 or tcs02.

However, because the login nodes are used by everyone who needs to use the SciNet systems, be considerate; do not run scripts or programs that will take more than a few minutes or a few MB of memory on these systems.
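The two-step login described above (login node first, then a development node) can be sketched as a small helper that only builds the command line and does not run it. Chaining the second hop with `ssh -t ... ssh NODE` is an assumption of this sketch, not a documented SciNet procedure, and `devnode_command` is a hypothetical name; the host and node names are the ones listed above.

```python
import shlex
from typing import List

LOGIN_HOST = "login.scinet.utoronto.ca"
# Development nodes named on this page: gpc01..gpc04, tcs01, tcs02.
DEV_NODES = {"gpc01", "gpc02", "gpc03", "gpc04", "tcs01", "tcs02"}


def devnode_command(user: str, node: str) -> List[str]:
    """Build an ssh argv that reaches `node` by way of the login host.

    Nothing is executed here; the caller decides whether to run it.
    """
    if node not in DEV_NODES:
        raise ValueError(f"unknown development node: {node}")
    # -t requests a terminal so the inner ssh can prompt interactively.
    return ["ssh", "-t", "-l", user, LOGIN_HOST, "ssh", node]


print(shlex.join(devnode_command("USER", "gpc02")))
# ssh -t -l USER login.scinet.utoronto.ca ssh gpc02
```

Validating the node name up front avoids spending a connection attempt on a mistyped host.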

Users can transfer small files (at most about 10GB) into or out of the data centre via the login nodes, using scp or rsync over ssh. Large data transfers, however, should be done via the datamover1 node. This node can initiate both incoming and outgoing transfers, and since it is on a 10 Gbps link to the University of Toronto, it is the fastest, and recommended, way to transfer data. Note that datamover1 is not accessible from the outside, so you must still log in to login.scinet.utoronto.ca and then ssh to the data mover node. See the data transfer section of the Data Management page for more details.
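The rule of thumb above can be written down as a small decision helper. This is only a sketch: `transfer_host` is a hypothetical name, and reading "about 10GB" as exactly 10 decimal gigabytes is an assumption, since the page gives only an approximate figure.

```python
LOGIN_HOST = "login.scinet.utoronto.ca"
DATAMOVER = "datamover1"   # reachable only from inside, via the login node

# "About 10GB" on this page; the exact cut-off used here is an assumption.
LOGIN_LIMIT_BYTES = 10 * 10**9


def transfer_host(size_bytes: int) -> str:
    """Suggest which node should handle a transfer of the given size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes <= LOGIN_LIMIT_BYTES:
        return LOGIN_HOST
    return DATAMOVER


print(transfer_host(2 * 10**9))    # login.scinet.utoronto.ca
print(transfer_host(50 * 10**9))   # datamover1
```

When the helper suggests datamover1, the transfer has to be started from that node after logging in through login.scinet.utoronto.ca, since it does not accept connections from outside.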

Please talk to us at <support@scinet.utoronto.ca> if you need to do very large file transfers.

Note that the login machines are not the same architecture as either the GPC or TCS nodes; you should not compile programs on the login machines that you expect to use on the GPC or TCS clusters.

Note also that access to the TCS is not enabled by default. We ask that people justify the need for this highly specialized machine. Contact us explaining the nature of your work if you want access to the TCS. In particular, applications should scale well to 64 processes/threads to run on this system.

SciNet Firewall

Important note about logging in: The SciNet firewall monitors for too many attempted connections, and will shut down all access (including previously working connections) from your IP address if more than four connection attempts (successful or not) are made within the space of a few minutes. In that case, you will be locked out of the system for an hour. Be patient in attempting new logins!
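Scripts that open many ssh or scp connections in a loop are the easiest way to trip this rule. One way to guard against that on the client side is a sliding-window throttle, sketched below. The cap of four attempts comes from the text above; the 300-second window is an assumption, because the page says only "a few minutes", and `ConnectionThrottle` is a hypothetical helper, not a SciNet tool.

```python
import time
from collections import deque


class ConnectionThrottle:
    """Client-side guard that keeps connection attempts under a cap."""

    def __init__(self, max_attempts=4, window_seconds=300.0,
                 clock=time.monotonic):
        self.max_attempts = max_attempts      # four attempts, per the text
        self.window_seconds = window_seconds  # assumed; "a few minutes"
        self._clock = clock
        self._attempts = deque()              # timestamps, oldest first

    def wait_needed(self) -> float:
        """Seconds to wait before another attempt fits in the window."""
        now = self._clock()
        # Forget attempts that have aged out of the window.
        while self._attempts and now - self._attempts[0] >= self.window_seconds:
            self._attempts.popleft()
        if len(self._attempts) < self.max_attempts:
            return 0.0
        return self.window_seconds - (now - self._attempts[0])

    def record(self) -> None:
        """Note that a connection attempt is being made right now."""
        self._attempts.append(self._clock())


throttle = ConnectionThrottle()
delay = throttle.wait_needed()
if delay == 0.0:
    throttle.record()   # then make the actual ssh/scp call
```

Successful and failed attempts both count toward the firewall's limit, so `record()` should be called for every attempt regardless of outcome.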

Default Limits on the SciNet systems

The default allocation on the SciNet GPC cluster allows a research group to use a maximum of 32 nodes at a time, for up to 48 hours per job, with no more than 32 individual jobs at a time. Groups that have also applied to use the more specialized TCS resource may have up to 8 jobs in the queue, running on a total of 2 nodes at a time, again with a 48-hour wall-clock limit per job. Users who need more than this amount of resources must apply for it through the account allocation/LRAC/NRAC process.
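To make these limits concrete, the sketch below encodes them as data and checks a prospective job against a group's current usage. It is a planning aid only, under the assumption that the three numbers per system are applied independently; the real enforcement is done by the queuing system, and `GroupLimits` and `violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GroupLimits:
    max_nodes: int            # nodes in use by the whole group at once
    max_jobs: int             # jobs per group at a time
    max_walltime_hours: int   # per-job wall-clock limit


# Default limits as stated on this page.
GPC_DEFAULT = GroupLimits(max_nodes=32, max_jobs=32, max_walltime_hours=48)
TCS_DEFAULT = GroupLimits(max_nodes=2, max_jobs=8, max_walltime_hours=48)


def violations(limits: GroupLimits, nodes_in_use: int, jobs_queued: int,
               new_nodes: int, new_walltime_hours: float) -> List[str]:
    """List the default limits a new job would exceed (empty if none)."""
    problems = []
    if new_walltime_hours > limits.max_walltime_hours:
        problems.append("wall time exceeds the per-job limit")
    if nodes_in_use + new_nodes > limits.max_nodes:
        problems.append("group node limit exceeded; job would wait in queue")
    if jobs_queued + 1 > limits.max_jobs:
        problems.append("group job-count limit exceeded")
    return problems


print(violations(GPC_DEFAULT, nodes_in_use=30, jobs_queued=5,
                 new_nodes=2, new_walltime_hours=48))   # []
print(len(violations(GPC_DEFAULT, nodes_in_use=30, jobs_queued=5,
                     new_nodes=4, new_walltime_hours=72)))   # 2
```

Because the limits apply to the whole group, `nodes_in_use` and `jobs_queued` should count every member's jobs, not just the submitting user's.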

Usage Policy

All users agreed to various conditions when they requested an account; for example, accounts must not be shared, and computing resources are to be used efficiently and only for research. Please see the SciNet Usage Policy for the full details.

Contact information

Any questions and problem reports should be addressed to <support@scinet.utoronto.ca>. Please provide as much relevant information as possible in your email.

Acknowledging SciNet

In Publications

When submitting a publication based on results from SciNet computations, we ask that you use the following line in the acknowledgements:

Computations were performed on the [systemname] supercomputer at the SciNet HPC Consortium. SciNet is funded by: the Canada Foundation for Innovation under the auspices of Compute Canada; the Government of Ontario; Ontario Research Fund - Research Excellence; and the University of Toronto.

(replace [systemname] with the appropriate system, e.g. gpc, tcs, arc gpu, p7).

Also please cite the SciNet datacentre paper:

Chris Loken et al. 2010, J. Phys.: Conf. Ser. 256, 012026, doi:10.1088/1742-6596/256/1/012026

This helps us automatically track publications that use SciNet. Such publications are both useful evidence of scientific merit for future resource allocations for you, the user, and help us demonstrate the importance of computational resources such as SciNet to our funding partners. Please also feel free to email details of any such publications, along with PDF preprints, to support@scinet.

In Talks

Please feel free to use the logos below, and images of GPC, TCS, and the data centre, in any talks you give:

SciNet Images: Logos and Data Centre

SciNet Logo (700x239)

Older logos:

SciNet Logo (1280x386)
SciNet Logo with Transparent Background (1364×412)
SciNet Logo (2728x823)
SciNet Data Centre (1280×854)
Compute Canada Logo (544x96)

Photoshop format

9 July 2010, 13:25 (UTC)