Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
 
for later in August to this Thursday as well (hence the uncertainty in  
 
 
the length of the required downtime).
 
 
Mon Jul 29 10:40:00  All systems back up.
 
 
Mon Jul 29 10:09:00 TCS is back up.  BGQ still down.
 
 
Mon Jul 29  8:37:00  Power glitch overnight took systems down.  GPC is already up, and other systems are being brought up.
 
 
Wed Jul 24 15:00:00  All BGQ racks back in production
 
 
Thu Jul 18 10:00:00  Bgqdev and one of the two bgq racks are up again
 
 
Wed Jul 17 17:00:00  Bgqdev and bgq systems are down.
 
 
Wed Jul 17 15:58:00  We're reenabling the rack, please resubmit crashed jobs.
 
 
Wed Jul 17 15:24:12  One of the two racks of the BlueGene/Q production system has gone down.
 
 
Mon Jul 15 09:45:49: Gravity01 (head node in gravity cluster) is down until further notice. Jobs may still be submitted from devel nodes or arc01.
 
  
 
([[Previous_messages:|Previous messages]])
 

Revision as of 19:29, 30 July 2013

System Status

GPC: up, TCS: up, ARC: up, P7: up, BGQ: up, HPSS: up

Tue Jul 30, 19:24:00: Downtime announcement

All systems will be shut down at 8 AM on Thurs, 1 Aug for emergency repair of a component in the cooling system. Systems are expected to be back online in the afternoon. Check here for progress updates.

Apologies for the short notice, but we only learned of the problem this afternoon. We're now attempting to reschedule other maintenance planned for later in August to this Thursday as well (hence the uncertainty in the length of the required downtime).

(Previous messages)