Oldwiki.scinet.utoronto.ca:System Alerts

== System Status ==
<!--
  Notes for updating the system status:

  -  When removing system status entries, please archive them to:

     http://wiki.scinethpc.ca/wiki/index.php/Previous_messages:

     (yes, the trailing colon is part of the url)

  -  The 'status circles' can be one of the following files:

     down.png  for down
     up25.png  for 25% up
     up50.png  for 50% up
     up75.png  for 75% up
     up.png    for 100% up
-->
 
<!--
{|
|[[File:up.png|up|link=https://docs.scinet.utoronto.ca/index.php/Main_Page]][https://docs.scinet.utoronto.ca Niagara]
|-
|[[File:up.png|up|link=BGQ]][[BGQ]]
|[[File:up.png|up|link=P7 Linux Cluster]][[P7 Linux Cluster|P7]]
|[[File:up.png|up|link=P8]][[P8]]
|-
|[[File:up.png|up|link=SOSCIP_GPU]][[SOSCIP_GPU|SGC]]
|[[File:up.png|up|link=Knights Landing]][[Knights Landing|KNL]]
|[[File:down.png|up|link=HPSS]][https://docs.scinet.utoronto.ca/index.php/HPSS HPSS]
|-
|[[File:up.png|up|]]File System
|[[File:up.png|up|]]External Network
|
|}
-->
  
System status can now be found at [https://docs.scinet.utoronto.ca docs.scinet.utoronto.ca]
  
<b>Mon 23 Apr 2018</b> GPC-compute is decommissioned; GPC-storage remains available until <font color=red><b>30 May 2018</b></font>.
  
<b>Thu 18 Apr 2018</b> The Niagara system will undergo an upgrade to its InfiniBand network between 9am and 12pm. The upgrade should be transparent to users, but there is a chance of a network interruption.
  
<b>Fri 13 Apr 2018</b> The HPSS system will be down for a few hours on <b>Mon, Apr 16, 9AM</b> for hardware upgrades, in preparation for the eventual move to the Niagara side.
  
<b>Tue 10 Apr 2018</b> Niagara is open to users.
  
<b>Wed 4 Apr 2018</b> We are very close to the production launch of Niagara, the new system installed at SciNet. While the RAC allocation year officially starts today, 4 April 2018, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.
  
All active GPC users will have their accounts, $HOME, and $PROJECT transferred to the new Niagara system. Those of you who are new to SciNet but got RAC allocations on Niagara will have your accounts created and ready for you to log in.
  
We are planning an extended [https://support.scinet.utoronto.ca/education/go.php/370/index.php Intro to SciNet/Niagara session], available in person at our office and webcast on Vidyo (and possibly other means), on Wednesday, April 11 at noon EST.

Wed Aug 14 20:00:00 - '''The login node and GPC development node are back in service now. We have disabled the read-only mount for /scratch, since it was causing issues with the ongoing recovery. Please check the wiki for further updates.'''

Wed Aug 14 19:36:41 - There are currently filesystem issues with the GPC login node and the general SciNet login node. We are working to fix the issue.

Wed Aug 14 00:30:46 - The regular monthly purge of /scratch will be delayed because of the problems with the filesystem. It will tentatively take place on 22 Aug (or later). The new date will be announced here.

Tue Aug 13 20:23:27 - GPC and TCS are available. See the notes below about the /scratch2, /scratch, and /project filesystems.

Tue Aug 13 19:15:28 - For the time being, /scratch and /project will be available only from the login and devel nodes, and only in read-only mode (you cannot write to them). This way users can retrieve the files they really need while we minimize stress on the filesystem as we complete LUN verifies and filesystem checks. These filesystems should return to normal later this week (likely Wednesday or Thursday, though it may take longer than expected). We know that some files may have corrupted data and will post more details later about how to identify them. The total amount of corrupted data is small and appears to be limited to files that were open for writing when the problems started (about 14:45 on Friday, 9 Aug). GPC users will still need to use /scratch2 for running jobs, while TCS users will need to make use of /reserved1.

Tue Aug 13 17:24:18 - There is good news about /scratch and /project: they appear to be at least 99% intact. However, there are still more LUN verifies that need to be run, as well as disk fscks. It is not yet clear whether we will be able to make these disks available tonight or at some point tomorrow. Systems should come online again within a couple of hours, though perhaps only with the new /scratch2 for now.

Tue Aug 13 17:13:58 - Datacentre upgrades are finished. The snubber network, upgraded trigger board, UPS for the controller, and the Quickstart feature should make the chiller more resilient to power events and improve the time it takes to restart. Hot circuit breakers were also replaced.

Tue Aug 13 09:00:00 - Systems are down for datacentre improvement work.

Sun Aug 11 21:55:06 - TCS can be used by those groups which have /reserved1 space. Use /reserved1 to run jobs as you would have with the old /scratch (which we are still trying to recover).

Sun Aug 11 21:49:03 - GPC is available for use. There is no /scratch or /project filesystem, as we are still trying to recover them. You can use /scratch2 to run jobs in exactly the same way as the old /scratch (note that the environment variable is $SCRATCH2). New policies for /scratch2 are being set, but for now each user is limited to 10 TB and 1 million files. /home is unscathed.
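Purely as an illustration of the $SCRATCH2 switch (a sketch assuming a Python job script; the SCRATCH2 variable is named in the alert above, while the "myrun" and "results.txt" names are hypothetical), a script that previously wrote under $SCRATCH would simply read the other environment variable:

<pre>
import os
from pathlib import Path

# During the recovery, job output goes under $SCRATCH2 instead of $SCRATCH.
scratch = Path(os.environ["SCRATCH2"])   # was: os.environ["SCRATCH"]

# "myrun" and "results.txt" are made-up example names.
outdir = scratch / "myrun"
outdir.mkdir(parents=True, exist_ok=True)
(outdir / "results.txt").write_text("job output goes to /scratch2 for now\n")
</pre>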
  
([[Previous_messages:|Previous messages]])
<!-- [https://support.scinet.utoronto.ca/wiki/index.php/Previous_messages:] -->
