Oldwiki.scinet.utoronto.ca:System Alerts

Revision as of 22:49, 27 February 2017

System Status

GPC: up, TCS: up, Sandy: up, Gravity: up, BGQ: up, File System: up
P7: up, P8: up, KNL: up, Viz: up, HPSS: up

Mon 27 Feb 2017 10:00:00 EST The old scratch file system was 99% full. Given the current incident of scratch becoming unmounted everywhere, we had little choice but to begin the transition to the new scratch file system now, rather than the gradual roll-out we had planned earlier.

We estimate the transition to the new scratch will take roughly one day, but since we want all users' data on the old scratch system to be available in the new scratch (at the same logical location), the exact duration of the transition depends on the amount of new data to be transferred over.

In the meantime, no jobs will start running on the GPC, Sandy, Gravity, or P7.

In addition, $SCRATCH will not be accessible to users during the transition, but you can log in to the login and devel nodes. $HOME is not affected.

The current scratch issue and the scratch transition no longer affect the BGQ or TCS (although running jobs on the TCS may have stopped this morning), because the BGQ and TCS have their own separate scratch file systems. They also do not affect groups whose scratch space is on /scratch2.
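If you are not sure whether your group's scratch space is on /scratch2, a quick check is sketched below (this is a general suggestion, assuming the usual $SCRATCH environment variable is set in your login shell; exact group directory layouts may vary):

<pre>
# On a login or devel node, print where your scratch directory points.
# If the path begins with /scratch2, your group is not affected by this transition.
echo $SCRATCH

# Optionally, confirm which file system it is mounted from. Note that this
# may hang or report an error while the old scratch is unmounted.
df -h $SCRATCH
</pre>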

Mon 27 Feb 2017 7:20:00 EST Scratch file system is down. We are investigating.

Wed 22 Feb 2017 16:17:00 EST Globus access to HPSS is currently not operational. We hope to have a resolution for this soon.