Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"
Revision as of 06:05, 23 November 2013
System Status
Sat Nov 23: 0448: Power glitch at the site at 0342. Access to TCS has been lost. At least one IB switch appears to be misbehaving as a result, which has killed jobs running on one rack of the GPC. We will investigate further later this morning.
Fri Nov 22:
One of our IB fabric managers died last night. As a result, many nodes, including the GPFS managers, could not communicate properly, and GPFS was unmounted on many nodes. If your jobs crashed, please resubmit them.
Last updated: Fri Nov 22 14:49