Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 16: Line 16:
 
|[[File:up.png|up|link=Gravity]][[Gravity]]
 
|[[File:up.png|up|link=Gravity]][[Gravity]]
 
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
 
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
|[[File:up75.png|up|link=BGQ]][[BGQ]]
+
|[[File:up.png|up|link=BGQ]][[BGQ]]
 
|[[File:up.png|up|link=HPSS]][[HPSS]]
 
|[[File:up.png|up|link=HPSS]][[HPSS]]
 
|
 
|
 
|}
 
|}
 
Wed Jul 02 13:20:00 EDT: Hardware issue on BlueGene/Q: cannot run full-system jobs (2048 nodes)
 
  
 
Mon Jun 30 15:19:39 EDT: All systems down. Some kind of power issue (again).
 
Mon Jun 30 15:19:39 EDT: All systems down. Some kind of power issue (again).

Revision as of 09:03, 4 July 2014

System Status

GPC: up | TCS: up | Sandy: up | ARC: up | File System: up
Gravity: up | P7: up | BGQ: up | HPSS: up

Mon Jun 30 15:19:39 EDT: All systems down. Some kind of power issue (again).

Sun Jun 29 19:57:29: Compute systems started coming online at about 7:30 PM.

Sun Jun 29 18:20:41: File systems restarted after some issues. Likely at least 8 PM before compute systems are available.

Sun Jun 29 16:39:35 EDT 2014: A large voltage spike tripped our main circuit breaker. We have power, though it is out at sites within 2 km because of a lightning strike. Cooling system is being restored.

Sun Jun 29 15:47:11 EDT 2014: Staff en route to site. Should have an update on the cause within an hour.

Sun Jun 29 15:40:31 EDT 2014: Power lost at about 3:20 PM today. All systems down. Investigating.


Note: As a precaution, emails from the Moab/Torque scheduler have been disabled since Jan 24th 2014 because of a potential security vulnerability.

Last updated: Fri May 23 12:01:44 EDT 2014 (Previous messages)