Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Previous revision:

System Status: DOWN

------------------------------
Thu Apr 19 16:54:42 EDT 2012: Shutdown Status

All work associated with the machine room expansion project has been completed, as well as the cooling tower maintenance and the hardware changes related to the GPC networking upgrade. The systems are currently being tested and we expect to allow users back later this evening.

------------------------------
Wed Apr 18 9:05:00 EST 2012

Apr 18-19: Full SciNet shutdown. All logins and jobs will be killed at 9 AM on 18 April. Expect systems to come back online in the evening of the following day (19 April).

------------------------------
Thu Feb  9 11:50:57 EST 2012: Temporary change for MPI ethernet jobs:

Due to some changes we are making to the GPC GigE nodes, if you run multi-node ethernet MPI jobs (multi-node IB jobs are fine), you will need to explicitly request the ethernet interface in your mpirun command:

For OpenMPI   ->  mpirun --mca btl self,sm,tcp
For IntelMPI  ->  mpirun -env I_MPI_FABRICS shm:tcp

There is no need to do this if you run on IB, or if you run single-node MPI jobs on the ethernet (GigE) nodes. Please check GPC_MPI_Versions for more details.
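As an illustration of the temporary ethernet setting above, a minimal GPC-style batch script for a multi-node OpenMPI job might have looked like the sketch below; the resource request, module names and program name are placeholders for illustration only, not taken from this page:

  #!/bin/bash
  # Hypothetical example: resource request, modules and program name are placeholders.
  #PBS -l nodes=2:ppn=8,walltime=1:00:00
  #PBS -N eth-mpi-example
  cd $PBS_O_WORKDIR
  module load intel openmpi
  # Restrict OpenMPI to the self, shared-memory and ethernet (tcp) transports, as described above:
  mpirun --mca btl self,sm,tcp ./my_mpi_program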
 
  
 
(Previous messages)
 

Revision as of 19:52, 19 April 2012

System Status: UP


Thu 19 Apr 2012 19:43:46 EDT: System Status

The GPC interconnect has now been upgraded so that there is low-latency, high-bandwidth InfiniBand (IB) networking throughout the cluster. This is expected to bring several significant benefits for users, including better I/O performance for all jobs, better performance for multi-node ethernet jobs (they can now make use of IB), and, for IB users, improved queue throughput (there are now 4x as many IB nodes) as well as the ability to run larger IB jobs.

Though we have been testing the new system since last night, a change of this magnitude is likely to result in some teething problems, so please bear with us over the next few days. Please report any issues that are not explained or resolved by this page or our temporary IB upgrade page <https://support.scinet.utoronto.ca/wiki/index.php/Infiniband_Upgrade> to support@scinet.utoronto.ca.

NOTE that our online documentation is NOT completely up-to-date after this recent change. For the time being, you should first check this current page and <https://support.scinet.utoronto.ca/wiki/index.php/Infiniband_Upgrade> for anything related to networks and queueing.

NOTE: The temporary mpirun settings that were previously recommended are no longer needed, as all MPI traffic now goes over InfiniBand.
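For example, a job that previously forced the ethernet transport can now drop those flags and use the defaults (the program name below is a placeholder):

  # No longer needed (temporary ethernet workaround):
  #   mpirun --mca btl self,sm,tcp ./my_mpi_program          (OpenMPI)
  #   mpirun -env I_MPI_FABRICS shm:tcp ./my_mpi_program     (IntelMPI)
  # Now sufficient, with MPI traffic carried over InfiniBand by default:
  mpirun ./my_mpi_program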


(Previous messages)