Using the TCS
Revision as of 11:51, 23 April 2009 by Cloken
About
The Tightly-coupled Capability System (TCS) is a cluster of IBM Power6 nodes intended for jobs that scale well to at least 32 processes and that require high bandwidth and large memory. It was installed at SciNet in late 2008 and is operating in "friendly-user" mode during winter 2009.
Node Names
- node names encode the location: tcs-f02n01 is node #1 in frame/rack #2
- the entire list of 104 nodes can be seen with llstatus
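The naming convention above can be sketched in a few lines of Python. This is only an illustration: the pattern is inferred from the example names on this page (tcs-f02n01, tcs-f11n05), and the function name parse_node_name is hypothetical, not part of any SciNet tool.

```python
import re

# Pattern inferred from names such as "tcs-f02n01": the digits after "f"
# give the frame/rack, the digits after "n" give the node within it.
# Assumption: every TCS node name follows this form.
NODE_NAME = re.compile(r"^tcs-f(\d+)n(\d+)$")


def parse_node_name(name):
    """Return (frame, node) as integers, or raise ValueError."""
    match = NODE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a TCS-style node name: {name!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ("tcs-f02n01", "tcs-f11n05", "tcs-f11n06"):
        frame, node = parse_node_name(name)
        print(f"{name}: frame {frame}, node {node}")
```

A helper like this could be used to group hostnames by frame, for example when reading through a long node listing.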
Node Specs
There are 102 compute nodes, each with:
- 32 Power6 cores (4.7GHz each)
- each core is 2-way multi-threaded using SMT (simultaneous multithreading)
- 128GB of RAM (except for tcs-f11n03 and tcs-f11n04, which have 256GB each)
- 4 InfiniBand (IB) interfaces used for data and message-passing traffic
- 2 GigE interfaces used for management and GPFS token traffic
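As a rough worked example, the aggregate capacity implied by the figures above can be computed as follows. The constants are taken from this page; the function name cluster_totals is hypothetical, and the totals are simple arithmetic, not measured or officially published numbers.

```python
# Figures taken from the node specs listed above.
COMPUTE_NODES = 102
CORES_PER_NODE = 32
SMT_THREADS_PER_CORE = 2     # each core is 2-way multi-threaded
STANDARD_RAM_GB = 128
LARGE_RAM_GB = 256
LARGE_MEMORY_NODES = 2       # tcs-f11n03 and tcs-f11n04


def cluster_totals():
    """Return aggregate cores, hardware threads and RAM (GB) of the compute nodes."""
    cores = COMPUTE_NODES * CORES_PER_NODE
    threads = cores * SMT_THREADS_PER_CORE
    standard_nodes = COMPUTE_NODES - LARGE_MEMORY_NODES
    ram_gb = standard_nodes * STANDARD_RAM_GB + LARGE_MEMORY_NODES * LARGE_RAM_GB
    return {"cores": cores, "threads": threads, "ram_gb": ram_gb}


if __name__ == "__main__":
    for key, value in cluster_totals().items():
        print(f"{key}: {value}")
```

With 32 cores per node, a job at the stated minimum scale of 32 processes fills one node when running one process per core.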
User Documentation
User Access
- log in to 142.150.188.41 (this is node tcs-f11n05) to start using the TCS
Login Nodes
- there are two interactive login nodes: tcs-f11n05 and tcs-f11n06
- use the login nodes to submit and monitor jobs, edit files, compile code, etc.
- small, interactive, short test jobs may be run ONLY on tcs-f11n06
Submitting Jobs
LoadLeveler Batch Files
Directories available to batch system
- LoadLeveler jobs run from /scratch
- LoadLeveler jobs can NOT access /home
- users must copy any required executables, input files, etc. to their /scratch/ space before submitting a job
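Because batch jobs cannot see /home, the staging step has to happen before submission. A minimal sketch of that step in Python is shown here; the function name stage_to_scratch and the directory layout are assumptions for illustration, and the demo uses a temporary directory as a stand-in for a real /scratch location.

```python
import shutil
import tempfile
from pathlib import Path


def stage_to_scratch(files, scratch_dir):
    """Copy each file into scratch_dir (created if needed); return the new paths."""
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in map(Path, files):
        # Fail early: a missing input is easier to fix before the job is queued.
        if not source.is_file():
            raise FileNotFoundError(f"missing input: {source}")
        # copy2 also preserves permissions, so executables stay executable.
        staged.append(Path(shutil.copy2(source, scratch_dir / source.name)))
    return staged


if __name__ == "__main__":
    # Demo with temporary directories standing in for /home and /scratch.
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp) / "home"
        fake_home.mkdir()
        input_file = fake_home / "input.dat"
        input_file.write_text("example input\n")
        run_dir = Path(tmp) / "scratch" / "run01"  # hypothetical run directory
        for path in stage_to_scratch([input_file], run_dir):
            print("staged", path)
```

In real use the second argument would be a directory under your own /scratch space, and the job would then be submitted from there.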
Monitoring Jobs
Filesystems
- 10GB quota in your home directory; it is backed up to disk
- your /home/ directory is NOT mounted on the compute nodes
- LoadLeveler jobs will run from /scratch/. Users must copy any required executables, input files, etc. to their /scratch/ space before submitting a job
- files in /scratch are NEVER backed up but should remain there for now (barring hardware/software problems)
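To get a rough idea of how close a home directory is to the 10GB quota, file sizes can be summed with a short script. This is a sketch only: the function name directory_usage is hypothetical, 10GB is assumed to mean 10 x 1024^3 bytes, and the filesystem's own accounting may differ from a simple sum of file sizes.

```python
import os
import tempfile

# Assumption: the 10GB quota is counted in binary gigabytes.
HOME_QUOTA_BYTES = 10 * 1024**3


def directory_usage(root):
    """Return the total size in bytes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # do not count the targets of symbolic links
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


if __name__ == "__main__":
    # Demo on a temporary directory; point it at your home directory in real use.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sample.bin"), "wb") as handle:
            handle.write(b"\0" * 4096)
        used = directory_usage(tmp)
        print(f"{used} bytes used, {used / HOME_QUOTA_BYTES:.6%} of quota")
```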