Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 34: Line 34:
  
  
<b>Mon Nov 7 11:59:00 EST 2016</b> File system has been restored. Jobs are being scheduler again. Please resubmit jobs if they crashed or had issues last night or this morning.
+
<b>Mon Nov 7 11:59:00 EST 2016</b> File system has been restored. Jobs are being scheduled again. Please resubmit jobs if they crashed or had issues last night or this morning.
  
<b>Mon Nov 7 10:40:00 EST 2016</b> Due to this issue, many jobs will have either crashed, or not have had a change to write their output; please check any jobs you had running overnight. The scratch file system is expected to be back up soon.
+
<b>Mon Nov 7 10:40:00 EST 2016</b> Due to this issue, many jobs will have either crashed, or have not had a change to write their output; please check any jobs you had running overnight. The scratch file system is expected to be back up soon.
  
 
<b>Mon Nov 7 9:40:00 EST 2016</b> Scratch file system filled up overnight. We are investigating how to mitigate this. In the meantime, the job scheduler has been stopped, so no new jobs will start (but will remain in the queue).
 

Revision as of 13:50, 7 November 2016

System Status

GPC: up | TCS: up | Sandy: up | Gravity: up | BGQ: up | File System: up
P7: up | P8: down | KNL: down | Viz: up | HPSS: up


Mon Nov 7 11:59:00 EST 2016 File system has been restored. Jobs are being scheduled again. Please resubmit jobs if they crashed or had issues last night or this morning.

Mon Nov 7 10:40:00 EST 2016 Due to this issue, many jobs will either have crashed or not have had a chance to write their output; please check any jobs you had running overnight. The scratch file system is expected to be back up soon.

Mon Nov 7 9:40:00 EST 2016 Scratch file system filled up overnight. We are investigating how to mitigate this. In the meantime, the job scheduler has been stopped, so no new jobs will start (queued jobs will remain in the queue).

Mon Nov 7 8:00:00 EST 2016 Apparent file system issues.

Fri Oct 28 23:00:00 EDT 2016 The login nodes and devel nodes of the GPC, P7 and BGQ, as well as the datamover nodes, will be rebooted between 2 am and 6 am on Sat Oct 29. Running and queued jobs will not be affected, but interactive sessions will be closed.

Mon Sep 26 10:33:47 EDT 2016 HPSS schedule is back to normal operations.

Sun Sep 25 12:37:12 EDT 2016 Problems resolved. Systems have started coming online. Check the status "lights" above.

Sun Sep 25 10:16:37 EDT 2016 Power outage tripped main breaker and other circuits. Power has been restored to site, but there may be an issue with cooling system power that needs to be resolved before any compute systems can be restarted.

Sun Sep 25 09:28:15 EDT 2016 Staff en route to site. Will give ETA for recovery after assessing the situation.

Sun Sep 25 08:46 EDT 2016 Power outage at datacentre.