Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 7: Line 7:
 
     up.png    for 100% up
 
     up.png    for 100% up
 
  -->
 
  -->
[[File:up.png|up|link=GPC Quickstart]]GPC
+
[[File:up50.png|scratch file system down|link=GPC Quickstart]]GPC
[[File:up.png|up|link=TCS Quickstart]]TCS
+
[[File:up50.png|scratch file system down|link=TCS Quickstart]]TCS
[[File:up.png|up|link=GPU Devel Nodes]]ARC
+
[[File:up50.png|scratch file system down|link=GPU Devel Nodes]]ARC
[[File:up.png|up|link=P7 Linux Cluster]]P7
+
[[File:up50.png|scratch file system down|link=P7 Linux Cluster]]P7
 
[[File:up.png|up|link=BGQ]]BGQ
 
[[File:up.png|up|link=BGQ]]BGQ
[[File:up.png|down|link=HPSS]]HPSS
+
[[File:up50.png|scratch file system down|link=HPSS]]HPSS
  
Thu Aug  8  13:22     - most systems are back up
+
Fri Aug 9 15:25 - File system problems. Scratch is unmounted. Jobs are likely dying. We are working on it.
 +
 
 +
Thu Aug  8  13:22 - most systems are back up
  
 
Thu Aug  8 11:18:45 - problems with storage hardware.  Trying to resolve with vendor
 
Thu Aug  8 11:18:45 - problems with storage hardware.  Trying to resolve with vendor

Revision as of 15:27, 9 August 2013

System Status

GPC: scratch file system down | TCS: scratch file system down | ARC: scratch file system down | P7: scratch file system down | BGQ: up | HPSS: scratch file system down

Fri Aug 9 15:25 - File system problems. Scratch is unmounted. Jobs are likely dying. We are working on it.

Thu Aug 8 13:22 - most systems are back up

Thu Aug 8 11:18:45 - problems with storage hardware. Trying to resolve with vendor.

Thu Aug 8 08:14:01 - Cooling has been restored. Starting to recover systems.

Large voltage drop at site knocked out cooling system at 05:58 today. Staff en route to site.


Last update: Thu 8 Aug 2013 06:12:28 EDT


(Previous messages)