Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca

Revision as of 18:22, 21 May 2016

System Status

GPC: up | TCS: up | Sandy: up | File System: up
Gravity: down | P7: up | Viz: up | BGQ: up | HPSS: up

Sat May 21 18:09:06 EDT 2016: BGQ was overheating because of a stuck valve, and some jobs were killed. We have reset the valve and it appears to be working. TCS is up.

Sat May 21 16:08:57 EDT 2016: P7 and BGQ are up. TCS still has some issues. NOTE: BGQ cannot currently be accessed directly from outside. Please log in to the login nodes (login.scinet.utoronto.ca) and, once there, run "ssh bgqdev".

Sat 21 May 2016 13:12:14 EDT: GPC and viz nodes are available. Some issues are delaying the other systems.

Sat 21 May 2016 10:07:14 EDT: We are starting to bring up storage and other equipment slowly to make sure there are no outstanding issues. It will be at least noon before any systems are available. We will update the timeline as we progress.

Fri 20 May 2016 19:04:36 EDT: The problem was traced to a faulty valve controlling makeup water to the cooling tower. The valve has been fixed and the water removed, but systems will remain down overnight to make sure the machine-room sub-floor has dried properly. The next update will be at about 10 AM tomorrow (Saturday). We hope to start bringing systems up at that time.

May 20, 9:50 AM: All systems are being brought down to investigate a water leak in the data centre. Keep checking here for updates. Because the investigation is ongoing, it is not yet possible to estimate when systems may be up again.