Oldwiki.scinet.utoronto.ca:System Alerts


System Status

GPC: up | TCS: up | Sandy: up | Gravity: up | BGQ: up | File System: up
P7: up | P8: up | KNL: up | Viz: up | HPSS: up

Mon 27 Feb 2017 10:00:00 EST The old scratch file system was 99% full. Given the current incident, in which scratch became unmounted everywhere, we have decided to begin the transition to the new scratch file system now, rather than following the gradual roll-out we had planned earlier.

We estimate the transition to the new scratch will take roughly one day. However, because we want all users' data on the old scratch system to be available on the new scratch (at the same logical location), the exact length of the transition depends on the amount of new data that still has to be transferred over.

In the meantime, no jobs will be accepted for the GPC, Sandy, or Gravity.

In addition, $SCRATCH will not be accessible to users during the transition, but you can still log in to the login and devel nodes. $HOME is not affected.

The current scratch system issues and the scratch transition do not affect the BGQ and TCS any longer (although jobs running on the TCS may have stopped this morning), because the BGQ and TCS have their own separate scratch file systems. They also do not affect groups whose scratch space is on /scratch2.

Mon 27 Feb 2017 7:20:00 EST The scratch file system is down. We are investigating.

Wed 22 Feb 2017 16:17:00 EST Globus access to HPSS is currently not operational. We hope to have a resolution for this soon.

Wed Feb 22 12:02:09 EST 2017 The HPSS library is back in service.

Tue Feb 21 19:01:08 EST 2017 The robot arm in the HPSS library is stuck, and a support call with the vendor has been opened. All jobs have been suspended until that is fixed, hopefully tomorrow (Wednesday).

Older Messages

Mon Feb 6 16:01:47 EST 2017 A full shutdown of the systems was required in order to bring the interconnect and the file systems back up in an orderly fashion. Please watch this space for updates.

Mon Feb 6 14:24:51 EST 2017 We're experiencing internal networking problems, which have caused filesystems to become inaccessible on some nodes. Work is underway to fix this.