Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"
Revision as of 06:27, 23 November 2013
System Status
GPC | TCS | Sandy | ARC |
Gravity | P7 | BGQ | HPSS |
Sat Nov 23: 0448: Power glitch at site at 0342. Access to TCS has been lost. Many jobs running on GPC were killed when nodes rebooted. We will investigate more closely later this morning.
Fri Nov 22:
One of our IB fabric managers failed last night. As a result, many nodes, including the GPFS managers, could not communicate properly, and GPFS was unmounted on many nodes. If your jobs crashed, please resubmit them.
Last updated: Fri Nov 22 14:49