Oldwiki.scinet.utoronto.ca:System Alerts


System Status

GPC: up | TCS: down | Sandy: down | ARC: down | File System: up
Gravity: down | P7: up | BGQ: down | HPSS: up

Fri Aug 1 20:19:44 EDT 2014: The 3-second power outage took down all the GPC, TCS and BGQ compute nodes, so all running jobs were killed. Queued and new jobs started 3s later on the GPC. The TCS and BGQ are back online as well. Please email support@scinet.utoronto.ca if you still notice issues.

Fri Aug 1 17:23:05 EDT 2014: Around 5pm, a power outage lasting a few seconds took down an as-yet-unknown number of nodes. GPC, Sandy, TCS, Gravity and ARC are certainly affected, but to what extent is not yet clear. Updates will be posted here.

Fri Aug 1 17:46:04 EDT 2014: GPC, Sandy, ARC, Gravity, TCS and BGQ were all affected. P7, HPSS and the file system are okay. We are rebooting the nodes.


Note: As a precaution, emails from the Moab/Torque scheduler have been disabled since Jan 24th 2014 because of a potential security vulnerability.

Last updated: Tue Jul 15 7:51:44 EDT 2014 (Previous messages)