Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

== System Status: <span style="color:#00dd00">'''UP'''</span> ==

Scratch file system got unmounted around 3:30 am. Jobs may have crashed.

Filesystems are back now. Please resubmit your jobs.

Mon Jan 30 9:12:00 EST 2012

-------------------------

Filesystems are back. Please resubmit your jobs.

Thu Jan 19 12:31:34 EST 2012

-------------------------
 
 
System still apparently unstable, with consequent loss of /scratch. Jobs may have died. /scratch is being mounted again.
 
 
 
Thu Jan 19 12:10:15 EST 2012
 
  
 
--------------------------
 
 
 
 
Chiller failure, all systems automatically shut down.  We'll keep you informed in this space.
 
 
Thu Jan 19 07:54:17 EST 2012
 
 
  
 

Revision as of 10:12, 30 January 2012



Temporary System Change:

Due to some changes we are making to the GPC GigE nodes, if you run multinode ethernet MPI jobs (IB multinode jobs are fine), you will need to explicitly request the ethernet interface in your mpirun command:

For OpenMPI -> mpirun --mca btl self,sm,tcp

For Intel MPI -> mpirun -env I_MPI_FABRICS shm:tcp

There is no need to do this if you run on IB, or if you run single-node MPI jobs on the ethernet (GigE) nodes. Please check GPC_MPI_Versions for more details.

Thu Jan 19 11:12:55 EST 2012
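For illustration, a multinode ethernet job might invoke mpirun as sketched below. This is only a sketch under stated assumptions: the Torque/Moab resource request, the script layout, and the application name (my_application) are placeholders, not part of the alert — only the two mpirun flag sets come from the message above.

```shell
#!/bin/bash
# Hypothetical GPC batch script for a 2-node ethernet (GigE) MPI job.
# The #PBS resource request and ./my_application are placeholders.
#PBS -l nodes=2:ppn=8,walltime=1:00:00
#PBS -N eth-mpi-job

cd $PBS_O_WORKDIR

# OpenMPI: restrict byte-transfer layers to self, shared memory, and TCP
mpirun --mca btl self,sm,tcp ./my_application

# Intel MPI equivalent (use instead of the line above if running Intel MPI):
# mpirun -env I_MPI_FABRICS shm:tcp ./my_application
```

On IB nodes, or for single-node jobs on the GigE nodes, the plain `mpirun ./my_application` form remains sufficient.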

(Previous messages)