Difference between revisions of "Oldwiki.scinet.utoronto.ca:System Alerts"

From oldwiki.scinet.utoronto.ca
 
(176 intermediate revisions by 8 users not shown)
 
<b> Mon Mar 20 20:50:00 EDT 2017</b> File system has recovered; issues remain with the scheduler.
 
 
<b> Mon Mar 20 14:56:05 EDT 2017</b> Problems with the IB fabric or with the scratch3 &amp; project3 file systems. We are investigating.
 
 
 
<b> Tue Mar 15 18:00:00 EST 2017</b> Systems are back online and fully operational.
 
 
 
<b> Tue Mar 15 16:31:39 EST 2017</b> Power glitch at the data center. Compute nodes went down; we are bringing them back up.
 
 
 
<b> Sun Mar  5 14:34:11 EST 2017</b> Globus access to HPSS has been re-enabled.
 
 
 
<b> Thu Mar  2 9:29:14 EST 2017 </b> GPC jobs are back running. 
 
 
 
<b>Thu Mar  2 01:54:57 EST 2017</b> The scratch file system went down earlier and most GPC jobs were killed. New GPC jobs are on hold until the disk check finishes in the morning.
 
 
 
<b>Tue Feb 28 2017 16:00:00 EST</b> The transfer of users' files from the old scratch system to the new scratch system has been completed. The new scratch folders are logically in the same place as before, i.e. /scratch/G/GROUP/USER. Your $SCRATCH environment variable will point to this location when you log in. The project folders have been moved in the same way. Compute jobs have been released and are starting to run. Let us know if you have any concerns. Thank you for your patience.
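As a quick sanity check after a migration like this, a few shell commands on a login node can confirm where $SCRATCH points and whether it is usable. This is only a sketch: /scratch/G/GROUP/USER is the logical layout quoted above, with GROUP and USER standing in for your own group and user names.

```shell
# Inspect the scratch location the environment advertises after login.
# Expected form per the announcement: /scratch/G/GROUP/USER
scratch_dir="${SCRATCH:-}"
if [ -z "$scratch_dir" ]; then
    # Not logged in to a cluster login node, or the variable is not exported yet
    status="SCRATCH is not set"
elif [ -d "$scratch_dir" ] && [ -w "$scratch_dir" ]; then
    # Directory exists and is writable: safe to submit jobs that use it
    status="scratch OK: $scratch_dir"
else
    # Variable is set but the file system is not mounted or not writable
    status="scratch not reachable: $scratch_dir"
fi
echo "$status"
```

If the check reports the directory as unreachable during a maintenance window like the one above, waiting for the all-clear message before resubmitting jobs is the safe choice.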
 
  
<b>Tue Feb 28 2017 10:02:45 EST</b> It could take a few more hours for the scratch migration to finish. We still have a dozen or so users to go. Please check this page from time to time for updates.
 
  
<b>Mon Feb 27 2017 10:00:00 EST</b> The old scratch was 99% full. Given the current incident of scratch being unmounted everywhere, we had little choice but to initiate the transition to the new scratch file system now, instead of the gradual roll-out we had planned earlier.
  
We estimate the transition to the new scratch will take roughly one day, but since we want all users' data on the old scratch system to be available in the new scratch (at the same logical location), the exact duration of the transition depends on the amount of new data to be transferred over.
  
In the meantime, no jobs will start running on the GPC, Sandy, Gravity or P7.
  
In addition, $SCRATCH will not be accessible to users during the transition, but you can log in to the login and devel nodes. $HOME is not affected.
  
The current scratch system issue and the scratch transition don't affect the BGQ or TCS anymore (although running jobs on TCS may have stopped this morning), because BGQ and TCS have their own separate scratch file systems. It also does not affect groups whose scratch space is on /scratch2.
  
<b>Mon Feb 27 2017 7:20:00 EST</b> Scratch file system is down. We are investigating.
  
<b>Wed Feb 22 2017 16:17:00 EST</b> Globus access to HPSS is currently not operational. We hope to have a resolution for this soon.
  

Latest revision as of 14:23, 7 May 2018

System Status

System status can now be found at docs.scinet.utoronto.ca


Mon 23 Apr 2018 GPC-compute is decommissioned; GPC-storage remains available until 30 May 2018

Thu 18 Apr 2018 The Niagara system will undergo an upgrade to its InfiniBand network between 9 am and 12 pm. This should be transparent to users, but there is a chance of network interruption.

Fri 13 Apr 2018 The HPSS system will be down for a few hours on Mon, Apr 16, 9 AM, for hardware upgrades, in preparation for its eventual move to the Niagara side.

Tue 10 Apr 2018 Niagara is open to users.

Wed 4 Apr 2018 We are very close to the production launch of Niagara, the new system installed at SciNet. While the RAC allocation year officially starts today, April 4, 2018, the Niagara system is still undergoing some final tuning and software updates, so the plan is to officially open it to users next week.

All active GPC users will have their accounts, $HOME, and $PROJECT transferred to the new Niagara system. Those of you who are new to SciNet but got RAC allocations on Niagara will have your accounts created and ready for you to log in.

We are planning an extended Intro to SciNet/Niagara session, available in person at our office, and webcast on Vidyo and possibly other means, on Wednesday April 11 at noon EST.