Oldwiki.scinet.utoronto.ca:System Alerts

From oldwiki.scinet.utoronto.ca
Jump to navigation Jump to search

System Status

upGPC upTCS upSandy upFile System
upGravity upP7 upViz upBGQ upHPSS


NOTE: BGQ cannot currently be accessed directly from the outside. Please ssh to the login nodes (login.scinet.utoronto.ca), and once there do "ssh bgqdev".

Sat May 21 18:09:06 EDT 2016: BGQ was overheating due to stuck valve, and some jobs were killed. We have reset the valve and it's working apparently. TCS is up.

Sat May 21 16:08:57 EDT 2016: P7 and BGQ are up. TCS still has some issues.

Sat 21 May 2016 13:12:14 EDT: GPC and viz nodes available. Some issues delaying other systems

Sat 21 May 2016 10:07:14 EDT: Starting to bring up storage and other eqpt slowly to be sure there are no outstanding issues. Will be at least noon before any systems are available. Will update timeline as we progress.

Fri 20 May 2016 19:04:36 EDT: Problem was traced to a faulty valve controlling makeup water to the cooling-tower. Valve has been fixed and water removed but systems will remain down overnight in order to make sure the machine room sub-floor has dried properly. Next update will be about 10AM tomorrow (Saturday) morning. We hope to start bringing systems up at that time.

May 20, 9:50 AM: All systems are being brought down to investigate a water leak in the data centre. Keep checking here for updates. As investigation is ungoing, it is not yet possible to give an estimate when systems may be up again.