Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 16:
 |[[File:up.png| up |link=Gravity]][[Gravity]]
 |[[File:up.png| up |link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
-|[[File:down.png| down |link=BGQ]][[BGQ]]
+|[[File:up.png| down |link=BGQ]][[BGQ]]
 |[[File:up.png| up |link=HPSS]][[HPSS]]
 |
 |}

-Tue Jan 20 14:27:06 EST 2015: At noon on Tuesday January 20th, 2015, both 2-rack BlueGene/Q systems, bgq and bgqdev, will be taken down in order to be merged into one 4-rack system (i.e. 65536 cores).  We expect that the BGQ will be up again some time on Thursday January 22nd, 2015.
+Thu Jan 22 13:27:44 EST 2015: BGQ now available as a single 4-rack system. bgqdev-fen1 is the single login/devel/submission node.

 Sat 17 Jan 2015 21:50:40 EST: Cooling has been restored. Systems being restarted. Likely available within an hour or so.  Root cause was a frozen pipe in cooling tower (very strange; has never happened before and today is relatively warm compared to past two weeks).

Revision as of 14:28, 22 January 2015

System Status

GPC: up | TCS: up | Sandy: up | ARC: up | File System: up
Gravity: up | P7: up | BGQ: up | HPSS: up

Thu Jan 22 13:27:44 EST 2015: BGQ now available as a single 4-rack system. bgqdev-fen1 is the single login/devel/submission node.

Sat 17 Jan 2015 21:50:40 EST: Cooling has been restored. Systems being restarted. Likely available within an hour or so. Root cause was a frozen pipe in cooling tower (very strange; has never happened before and today is relatively warm compared to past two weeks).

Sat 17 Jan 2015 19:34:00 EST: JCI on site as well. Diagnosing issue.

Sat 17 Jan 2015 17:33:47 EST: Unusual cooling problem. Systems down. Staff en route to site.


--


(Previous messages)