Using the TCS


About

The Tightly-coupled Capability System (TCS) is a cluster of IBM Power6 nodes intended for jobs that scale well to at least 32 processes and that require high bandwidth and large memory. It was installed at SciNet in late 2008 and is operating in "friendly-user" mode during winter 2009.

Node Names

  • node names encode location: tcs-f02n01 is node #1 in frame (rack) #2
  • the entire list of 104 nodes can be seen with llstatus
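
The naming scheme above can be unpacked mechanically. The following is a minimal sketch in POSIX shell; parse_node is a hypothetical helper (not a SciNet-provided tool), and it assumes every name has the two-digit form tcs-fFFnNN seen in the examples on this page.

```shell
# Split a TCS node name of the assumed form tcs-fFFnNN into its
# frame (rack) number and its node number within that frame.
parse_node() {
    name=$1
    case $name in
        tcs-f[0-9][0-9]n[0-9][0-9]) ;;   # accept only the assumed two-digit form
        *) echo "unrecognized node name: $name" >&2; return 1 ;;
    esac
    rest=${name#tcs-f}    # drop the prefix, e.g. "02n01"
    frame=${rest%n*}      # text before the "n", e.g. "02"
    node=${rest#*n}       # text after the "n", e.g. "01"
    # strip a single leading zero so the numbers print as plain integers
    echo "frame=${frame#0} node=${node#0}"
}

parse_node tcs-f02n01   # prints: frame=2 node=1
```
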

Node Specs

There are 102 compute nodes each with:

  • 32 Power6 cores (4.7GHz each)
    • each core is 2-way multi-threaded using SMT (simultaneous multithreading)
  • 128GB of RAM (except for tcs-f11n03 and tcs-f11n04, which have 256GB each)
  • 4 InfiniBand (IB) interfaces used for data and message-passing traffic
  • 2 GigE interfaces used for management and GPFS token traffic
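
As a rough worked example of what these per-node figures add up to, the sketch below totals cores, hardware threads, and memory across the compute nodes. It assumes the two 256GB nodes are counted among the 102 compute nodes, as the list above suggests; the variable names are illustrative only.

```shell
nodes=102             # compute nodes
cores_per_node=32     # Power6 cores per node
threads_per_core=2    # 2-way SMT
big_nodes=2           # tcs-f11n03 and tcs-f11n04 (assumed to be among the 102)

total_cores=$((nodes * cores_per_node))
total_threads=$((total_cores * threads_per_core))
# every node has 128GB except the two large-memory nodes, which have 256GB
total_ram_gb=$(( (nodes - big_nodes) * 128 + big_nodes * 256 ))

echo "cores=$total_cores threads=$total_threads ram_gb=$total_ram_gb"
# prints: cores=3264 threads=6528 ram_gb=13312
```
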

User Documentation

User Access

  • log in to 142.150.188.41 (this is node tcs-f11n05) to start using the TCS

Login Nodes

  • there are two interactive login nodes: tcs-f11n05 and tcs-f11n06
  • use the login nodes to submit and monitor jobs, edit files, compile code, etc.
  • small, interactive, short test jobs may be run ONLY on tcs-f11n06

Submitting Jobs

LoadLeveler Batch Files

Directories available to batch system

  • LoadLeveler jobs run from /scratch
  • LoadLeveler jobs can NOT access /home
  • users must copy any required executables, input files, etc. to their /scratch/ space before submitting a job
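
Because jobs cannot see /home, the copy step is easy to forget. One way to make it routine is a small staging helper run on a login node before submission. This is only a sketch: stage_run is a hypothetical function, and the directory layout under /scratch shown in the usage comment is an assumption, not a documented SciNet convention.

```shell
# Copy the listed files into a run directory, creating it if needed.
# Stops at the first missing file so an incomplete run is not submitted.
stage_run() {
    run_dir=$1
    shift
    mkdir -p "$run_dir" || return 1
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
        cp -p "$f" "$run_dir/" || return 1   # -p keeps permissions and timestamps
    done
    echo "staged $# file(s) into $run_dir"
}

# Example (all paths are placeholders):
# stage_run /scratch/$USER/run01 $HOME/bin/mycode $HOME/inputs/params.in
```

After staging, the job would be submitted from the run directory with LoadLeveler's llsubmit and a job command file of your own.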

Monitoring Jobs

Filesystems

  • 10GB quota in your home directory; it is backed up to disk
  • your /home/ directory is NOT mounted on the compute nodes
  • LoadLeveler jobs will run from /scratch/; users must copy any required executables, input files, etc. to their /scratch/ space before submitting a job
  • files in /scratch are NEVER backed up but should remain there for now (barring hardware/software problems)
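
To keep an eye on the home quota, something like the following sketch may help. home_usage_pct is a hypothetical helper; it estimates usage with du, which may not match exactly how the filesystem accounts for quota, and it assumes the 10GB limit means 10 x 1024 x 1024 KB.

```shell
# Print the percentage of the quota used by a directory tree.
# $1 = directory to measure, $2 = quota in GB (defaults to 10, as stated above).
home_usage_pct() {
    dir=$1
    quota_gb=${2:-10}
    quota_kb=$((quota_gb * 1024 * 1024))            # assumed binary units
    used_kb=$(du -sk "$dir" | awk '{print $1}')     # total size in KB
    echo $((used_kb * 100 / quota_kb))              # integer percent, rounded down
}

# Example: home_usage_pct "$HOME"
```
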