Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
Line 19:
  -->
 {|
-|[[File:up25.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
+|[[File:up.png|up|link=GPC Quickstart]][[GPC Quickstart|GPC]]
-|[[File:up25.png|up|link=TCS Quickstart]][[TCS Quickstart|TCS]]
+|[[File:up.png|up|link=TCS Quickstart]][[TCS Quickstart|TCS]]
-|[[File:up25.png|up|link=Sandy]][[Sandy]]
+|[[File:up.png|up|link=Sandy]][[Sandy]]
-|[[File:up25.png|up|link=Gravity]][[Gravity]]
+|[[File:up.png|up|link=Gravity]][[Gravity]]
-|[[File:up25.png|up|link=BGQ]][[BGQ]]
+|[[File:up.png|up|link=BGQ]][[BGQ]]
-|[[File:up75.png|file systems unmounted|]]File System
+|[[File:up.png|file systems unmounted|]]File System
 |-
-|[[File:up25.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
+|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
-|[[File:up25.png|up|link=P8]][[P8]]
+|[[File:up.png|up|link=P8]][[P8]]
-|[[File:up25.png|up|link=Knights Landing]][[Knights Landing|KNL]]
+|[[File:up.png|up|link=Knights Landing]][[Knights Landing|KNL]]
-|[[File:up25.png|up|link=Visualization Nodes]][[Visualization Nodes|Viz]]
+|[[File:up.png|up|link=Visualization Nodes]][[Visualization Nodes|Viz]]
 |[[File:up.png|up|link=HPSS]][[HPSS]]
 |}
+
+<b> Wed Mar 15 18:00:00 EST 2017</b> Systems are back online and fully operational.
 
 <b> Wed Mar 15 16:31:39 EST 2017</b> Power glitch at data center. Compute nodes went down, bringing them up.

Revision as of 18:14, 15 March 2017

System Status

GPC: up | TCS: up | Sandy: up | Gravity: up | BGQ: up | File System: up
P7: up | P8: up | KNL: up | Viz: up | HPSS: up

Wed Mar 15 18:00:00 EST 2017 Systems are back online and fully operational.

Wed Mar 15 16:31:39 EST 2017 Power glitch at data center. Compute nodes went down, bringing them up.

Sun Mar 5 14:34:11 EST 2017 Globus access to HPSS has been re-enabled.

Thu Mar 2 9:29:14 EST 2017 GPC jobs are running again.

Thu Mar 2 01:54:57 EST 2017 The scratch file system went down earlier and most GPC jobs were killed. New GPC jobs are on hold until the disk check finishes in the morning.

Tue Feb 28 2017 16:00:00 EST The transfer of users' files from the old scratch system to the new scratch system has been completed. The new scratch folders are logically in the same place as before, i.e. /scratch/G/GROUP/USER. Your $SCRATCH environment variable will point to this location when you log in. The project folders have also been moved in the same way. Compute jobs have been released and are starting to run. Let us know if you have any concerns. Thank you for your patience.

Tue Feb 28 2017 10:02:45 EST It could take a few more hours for the scratch migration to finish. We still have a dozen or so users to go. Please check this page from time to time for updates.

Mon Feb 27 2017 10:00:00 EST The old scratch was 99% full. Given the current incident of scratch getting unmounted everywhere, we had little choice but to initiate the transition to the new scratch file system now, rather than following the roll-out approach that we had planned earlier.

We estimate the transition to the new scratch will take roughly one day, but since we want all users' data on the old scratch system to be available in the new scratch (at the same logical location), the exact duration of the transition depends on the amount of new data to be transferred over.

In the meantime, no jobs will start running on the GPC, Sandy, Gravity or P7.

In addition, $SCRATCH will not be accessible to users during the transition, but you can still log in to the login and devel nodes. $HOME is not affected.

The current scratch system issue and the scratch transition no longer affect the BGQ or TCS (although running jobs on TCS may have stopped this morning), because BGQ and TCS have their own separate scratch file systems. They also do not affect groups whose scratch space is on /scratch2.

Mon Feb 27 2017 7:20:00 EST Scratch file system is down. We are investigating.

Wed Feb 22 2017 16:17:00 EST Globus access to HPSS is currently not operational. We hope to have a resolution for this soon.