Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
 
Line 1: Line 1:
== System Status: <span style="color:#33AA33">'''UP''' ==
+
== System Status==
 +
<!--
 +
  Notes for updating the system status:
  
We hope to have stabilized the system.  Please let us know if there are any problems.
+
  -  When removing system status entries, please archive them to:
  
Thu Dec  8 16:43:47 EST 2011
+
    http://wiki.scinethpc.ca/wiki/index.php/Previous_messages:
  
--------------------
+
    (yes, the trailing colon is part of the url)
  
We continue to experience random outages of the system.  Network problems are the latest suspect.  All/most GPC jobs died at around 2:40pm today.
+
  -  The 'status circles' can be one of the following files:  
  
 +
    down.png  for down
 +
    up25.png  for 25% up
 +
    up50.png  for 50% up
 +
    up75.png  for 75% up
 +
    up.png    for 100% up
  
--------------------
+
 +
{|
 +
|[[File:up.png|up|link=https://docs.scinet.utoronto.ca/index.php/Main_Page]][https://docs.scinet.utoronto.ca Niagara]
 +
|-
 +
|[[File:up.png|up|link=BGQ]][[BGQ]]
 +
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
 +
|[[File:up.png|up|link=P8]][[P8]]
 +
|-
 +
|[[File:up.png|up|link=SOSCIP_GPU]][[SOSCIP_GPU|SGC]]
 +
|[[File:up.png|up|link=Knights Landing]][[Knights Landing|KNL]]
 +
|[[File:down.png|up|link=HPSS]][https://docs.scinet.utoronto.ca/index.php/HPSS HPSS]
 +
|-
 +
|[[File:up.png|up|]]File System
 +
|[[File:up.png|up|]]External Network
 +
|
 +
|}
  
 +
-->
  
We are still encountering problems resulting from the transition to CentOS 6. While we had tested this operating system on a subset of nodes, there are problems when running at large scale, i.e. with almost 4,000 nodes in the GPC cluster.
+
System status can now be found at [https://docs.scinet.utoronto.ca docs.scinet.utoronto.ca]
  
Please bear with us as we try to fix things.  Symptoms include slow (or disappearing) filesystems, sluggish nodes, and general network problems.  We'll inform users when we've solved this.  In the meantime, please check this space regularly for updates.
 
  
'''Some of the known issues (and workarounds) are listed [[Transition to CentOS 6|here]].'''
+
<b> Mon 23 Apr 2018 </b> GPC-compute is decommissioned; GPC-storage remains available until <font color=red><b>30 May 2018</b></font>
  
Thanks for your patience and understanding!
+
<b> Thu 18 Apr 2018 </b>  The Niagara system will undergo an upgrade to its InfiniBand network between 9am and 12pm. This should be transparent to users, but there is a chance of network interruption.
  
The SciNet Team.
+
<b> Fri 13 Apr 2018 </b> The HPSS system will be down for a few hours on <b>Mon, Apr 16, at 9AM</b> for hardware upgrades, in preparation for the eventual move to the Niagara side.
  
--------------------
+
<b> Tue 10 Apr 2018 </b> Niagara is open to users.
  
The GPC was transitioned to CentOS 6 on Monday, December 5, 2011. While this should not have affected running jobs, the scratch and home file systems were unexpectedly unmounted on Monday afternoon, killing most jobs. Please resubmit.
+
<b> Wed 4 Apr 2018 </b> We are very close to the production launch of Niagara, the new system installed at SciNet.
 +
While the RAC allocation year officially starts today, April 4, 2018, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.
  
Let us know if you encounter unexpected behavior due to the transition.
+
All active GPC users will have their accounts, $HOME, and $PROJECT, transferred to the new
 +
Niagara system.  Those of you who are new to SciNet, but got RAC allocations on Niagara,
 +
will have your accounts created and ready for you to log in.
  
Last updated: Thu Dec  8 15:06:31 EST 2011
+
We are planning an extended [https://support.scinet.utoronto.ca/education/go.php/370/index.php Intro to SciNet/Niagara session], available in person at our office, and webcast on Vidyo and possibly other means, on Wednesday April 11 at noon EST.
  
 
+
<!-- [https://support.scinet.utoronto.ca/wiki/index.php/Previous_messages:] -->
 
 
 
 
([[Previous_messages:|Previous messages]])
 

Latest revision as of 14:23, 7 May 2018

System Status

System status can now be found at docs.scinet.utoronto.ca


Mon 23 Apr 2018 GPC-compute is decommissioned; GPC-storage remains available until 30 May 2018

Thu 18 Apr 2018 The Niagara system will undergo an upgrade to its InfiniBand network between 9am and 12pm. This should be transparent to users, but there is a chance of network interruption.

Fri 13 Apr 2018 The HPSS system will be down for a few hours on Mon, Apr 16, at 9AM for hardware upgrades, in preparation for the eventual move to the Niagara side.

Tue 10 Apr 2018 Niagara is open to users.

Wed 4 Apr 2018 We are very close to the production launch of Niagara, the new system installed at SciNet. While the RAC allocation year officially starts today, April 4, 2018, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.

All active GPC users will have their accounts, $HOME, and $PROJECT transferred to the new Niagara system. Those of you who are new to SciNet, but got RAC allocations on Niagara, will have your accounts created and ready for you to log in.

We are planning an extended Intro to SciNet/Niagara session, available in person at our office, and webcast on Vidyo and possibly other means, on Wednesday April 11 at noon EST.