Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca

Revision as of 13:43, 10 June 2016

System Status

GPC: up | TCS: up | Sandy: up | File System: up
Gravity: down | P7: up | Viz: up | BGQ: up | HPSS: up


Fri Jun 10 13:41:43 EDT 2016: HPSS is scheduled for a software upgrade on June 15 (next Wednesday).

NOTE: BGQ cannot currently be accessed directly from the outside. Please ssh to the login nodes (login.scinet.utoronto.ca), and once there run "ssh bgqdev".

Sat May 21 18:09:06 EDT 2016: BGQ was overheating due to a stuck valve, and some jobs were killed. We have reset the valve, and it appears to be working. TCS is up.

Sat May 21 16:08:57 EDT 2016: P7 and BGQ are up. TCS still has some issues.

Sat 21 May 2016 13:12:14 EDT: GPC and viz nodes are available. Some issues are delaying other systems.

Sat 21 May 2016 10:07:14 EDT: Starting to bring up storage and other equipment slowly to be sure there are no outstanding issues. It will be at least noon before any systems are available. We will update the timeline as we progress.

Fri 20 May 2016 19:04:36 EDT: The problem was traced to a faulty valve controlling makeup water to the cooling tower. The valve has been fixed and the water removed, but systems will remain down overnight to make sure the machine room sub-floor has dried properly. The next update will be at about 10 AM tomorrow (Saturday) morning. We hope to start bringing systems up at that time.

May 20, 9:50 AM: All systems are being brought down to investigate a water leak in the data centre. Keep checking here for updates. As the investigation is ongoing, it is not yet possible to give an estimate of when systems may be up again.