Oldwiki.scinet.utoronto.ca:System Alerts

System Status

GPC: up, TCS: up, Sandy: up, Gravity: up, BGQ: up, File System: up
P7: up, P8: up, KNL: up, Viz: up, HPSS: up

Fri Aug 18 13:20:56 EDT 2017 We need to perform an emergency shutdown of all compute systems to fix a cooling issue that has arisen. They should be back up this afternoon. We will try to keep the login nodes and storage online; however, all the compute nodes will need to be shut down.

Sat Aug 5 16:48:44 EDT 2017 The switch is fixed; scinet0[2-4] and the datamovers are back online.

Sat Aug 5 00:56:31 EDT 2017 Most of GPC will be accessible soon. We lost a switch, so scinet0[2-4] and the datamovers will be down until it's fixed. You can log in to scinet01 using its IP address: "ssh 142.150.188.51".

Fri Aug 4 17:45:10 EDT 2017 The chiller went down again, causing a full shutdown of all systems. We don't expect them back tonight, as the storm continues and power outages continue with it.

Fri Aug 4 17:11:08 EDT 2017 A power glitch took down all the compute nodes, including GPC, TCS, and BGQ. The filesystems are up, except for reserved1 and scratchtcs. Systems are being restored.

Tue Jun 13 11:24:15 EDT 2017 HPSS is back in service.

Sat Jun 10 21:38:37 EDT 2017 The robot arm on the HPSS library developed problems. HPSS is out of service until further notice.

Tue May 23 13:40:56 EDT 2017 HPSS is back in service.

Mon May 22 07:37:27 EDT 2017 Overnight the robot arm on the library developed problems. We may not be able to have support come in and fix it until Tuesday. In the meantime, HPSS is out of service.

Thu May 18 13:00:00 EDT 2017 The file system issues seem to be resolved. Please check your jobs and resubmit them if they had issues.

Thu May 18 12:00:00 EDT 2017 We are experiencing file system issues, and some jobs may have died. We are investigating.