Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 19:
   -->
  {|
- |[[File:up.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
+ |[[File:down.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
- |[[File:up.png|up|link=TCS Quickstart]][[TCS Quickstart|TCS]]
+ |[[File:down.png|up|link=TCS Quickstart]][[TCS Quickstart|TCS]]
- |[[File:up.png|up|link=Sandy]][[Sandy]]
+ |[[File:down.png|up|link=Sandy]][[Sandy]]
- |[[File:up.png|up]]File System
+ |[[File:down.png|up]]File System
  |-
- |[[File:up.png|up|link=Gravity]][[Gravity]]
+ |[[File:down.png|up|link=Gravity]][[Gravity]]
- |[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
+ |[[File:down.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
- |[[File:up.png|up|link=Visualization Nodes]][[Visualization Nodes|Viz]]
+ |[[File:down.png|up|link=Visualization Nodes]][[Visualization Nodes|Viz]]
- |[[File:up.png|up|link=BGQ]][[BGQ]]
+ |[[File:down.png|up|link=BGQ]][[BGQ]]
- |[[File:up.png|up|link=HPSS]][[HPSS]]
+ |[[File:down.png|up|link=HPSS]][[HPSS]]
  |}
+ <b>May 20, 9:50 AM:</b> All systems are down following a controlled shutdown due to a water leak in the data center. Personnel are already working on it.

  <b>Apr 22, 6:50 PM:</b> A hardware issue caused the /scratch file system to go offline, and most jobs were killed. It is back online. We apologize for the trouble.

Revision as of 10:00, 20 May 2016

System Status

GPC (down)  TCS (down)  Sandy (down)  File System (down)
Gravity (down)  P7 (down)  Viz (down)  BGQ (down)  HPSS (down)

May 20, 9:50 AM: All systems are down following a controlled shutdown due to a water leak in the data center. Personnel are already working on it.

Apr 22, 6:50 PM: A hardware issue caused the /scratch file system to go offline, and most jobs were killed. It is back online. We apologize for the trouble.

Apr 21, 11:23 PM: Most systems are up. All jobs on TCS were canceled; please resubmit your jobs and report any issues.

Apr 21, 4:30 PM: Systems are beginning to come up. We expect most to be functional by 8 PM.

Apr 21, 1:10 PM: Systems are down due to a power failure. Personnel are working on it.