Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
 
|}
 
<b>Wed 22 Feb 2017 16:17:00 EST</b> Globus access to HPSS is currently not operational.  We hope to have a resolution for this soon.
  
 
<b>Wed Feb 22 12:02:09 EST 2017</b> The HPSS library is back in service.
 
  
 
<b>Tue Feb 21 19:01:08 EST 2017</b> The robot arm in the HPSS library is stuck, and a support call with the vendor has been opened. All jobs have been suspended until that is fixed, hopefully tomorrow (Wednesday).
 
<b>Older Messages</b>
  
 
<b>Mon Feb  6 16:01:47 EST 2017</b>  A full shutdown of the systems was required in order to bring the interconnect and filesystems back up in an orderly way.  Please watch this space for updates.
 
 
<b>Mon Feb  6 14:24:51 EST 2017</b>  We're experiencing internal networking problems, which have caused filesystems to become inaccessible on some nodes.  Work is underway to fix this.
 
  
<b>Sat Jan 28 12:54:12 EST 2017</b> GPC and TCS are up, as are the filesystems.  We have reverted to the old /scratch and /project disks on the GPC until we can ascertain what was wrong with the new appliance.  In the meantime, please submit your jobs as usual.  Also, please help us by cleaning up unnecessary files on /scratch.  For TCS users: the changes we implemented, which require you to use /scratchtcs, are still in effect.
 
 
<b>Sat Jan 28 12:19:16 EST 2017</b>  We are bringing the systems back.  Expect to be ready in a couple of hours.
 
 
<b>Sat 28 Jan 2017 8:41 EST</b> BGQ is not affected and the system is up.
 
 
<b>Sat 28 Jan 2017 8:15 EST</b> Further issues have been found on the file system; user access to the systems has been closed until we can solve them.
 
 
<b>Fri 27 Jan 2017 15:11:58 EST</b> The cluster network issue is resolved, and filesystem access has finally been restored now that the root cause of the network issue has been determined.
 
 
<b> Fri Jan 27 11:20:32 EST 2017 </b> While we're restoring things, the file systems will generally not be available, to facilitate our work. Sorry for the inconvenience.
 
 
<b> Fri Jan 27 10:02:32 EST 2017 </b> The IB network fabric had a failure earlier today that affected the file systems. The IB fabric is back to normal, and we're working on restoring the file systems at the moment.
 
 
<b> Fri Jan 27 7:34:00 EST 2017 </b> Issues with the new scratch file system; we're investigating.
 
 
<b> Thu Jan 26 21:24:14 EST 2017 </b> Maintenance has finished, and the systems are back online and available, with the exception of the TCS, which does not yet accept jobs (but its devel nodes are accessible).
 
 
<b>Jan 25, 2017, 18:48</b> BGQ is online.
 
  
  
 
<!-- [https://support.scinet.utoronto.ca/wiki/index.php/Previous_messages:]  -->
 

Revision as of 17:24, 22 February 2017

System Status

GPC: up, TCS: up, Sandy: up, Gravity: up, BGQ: up, File System: file systems unmounted
P7: up, P8: up, KNL: up, Viz: up, HPSS: up

Wed 22 Feb 2017 16:17:00 EST Globus access to HPSS is currently not operational. We hope to have a resolution for this soon.

Wed Feb 22 12:02:09 EST 2017 The HPSS library is back in service.

Tue Feb 21 19:01:08 EST 2017 The robot arm in the HPSS library is stuck, and a support call with the vendor has been opened. All jobs have been suspended until that is fixed, hopefully tomorrow (Wednesday).

Older Messages

Mon Feb 6 16:01:47 EST 2017 A full shutdown of the systems was required in order to bring the interconnect and filesystems back up in an orderly way. Please watch this space for updates.

Mon Feb 6 14:24:51 EST 2017 We're experiencing internal networking problems, which have caused filesystems to become inaccessible on some nodes. Work is underway to fix this.