Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 9: Line 9:
 
{|  
 
{|  
 
|[[File:up.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
 
|[[File:up.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
|[[File:down.png|down|link=TCS Quickstart]][[TCS Quickstart|TCS]]
+
|[[File:up.png|up|link=TCS Quickstart]][[TCS Quickstart|TCS]]
 
|[[File:up.png|up|link=Sandy]][[Sandy]]
 
|[[File:up.png|up|link=Sandy]][[Sandy]]
 
|[[File:up.png|up|link=GPU Devel Nodes]][[GPU Devel Nodes|ARC]]
 
|[[File:up.png|up|link=GPU Devel Nodes]][[GPU Devel Nodes|ARC]]
Line 18: Line 18:
 
|[[File:up.png|up|link=HPSS]][[HPSS]]
 
|[[File:up.png|up|link=HPSS]][[HPSS]]
 
|}
 
|}
 +
 +
Sat Nov 23: 1016:  Systems have been restored. A 20-second power event knocked out the entire TCS and caused most or all of the GPC to reboot, so most jobs running at 0342 this morning were lost.
  
 
Sat Nov 23: 0448:  Power glitch at site at 0342. Access to TCS has been lost. Many jobs running on GPC were killed when nodes rebooted. Will investigate more closely later this morning.
 
Sat Nov 23: 0448:  Power glitch at site at 0342. Access to TCS has been lost. Many jobs running on GPC were killed when nodes rebooted. Will investigate more closely later this morning.

Revision as of 11:17, 23 November 2013

System Status

GPC: up, TCS: up, Sandy: up, ARC: up
Gravity: up, P7: up, BGQ: up, HPSS: up

Sat Nov 23: 1016: Systems have been restored. A 20-second power event knocked out the entire TCS and caused most or all of the GPC to reboot, so most jobs running at 0342 this morning were lost.

Sat Nov 23: 0448: Power glitch at site at 0342. Access to TCS has been lost. Many jobs running on GPC were killed when nodes rebooted. Will investigate more closely later this morning.

Fri Nov 22:

One of our IB fabric managers died last night. As a result, many nodes including the GPFS managers could not communicate properly and many nodes had their GPFS unmounted. If you had crashed jobs, please resubmit.

Last updated: Fri Nov 22 14:49

(Previous messages)