Oldwiki.scinet.utoronto.ca:System Alerts


System Status

GPC: up, TCS: up, Sandy: up, ARC: up, File System: up
Gravity: up, P7: up, BGQ: up, HPSS: up

Sat 17 Jan 2015 17:33:47 EST: Unusual cooling problem. Systems down. Staff en route to site.

Thu Jan 15 11:22:00 EST: Cooling tower fan belt service is finished. The chiller is being serviced as scheduled while the chilled water plant runs in free-cooling mode. We are not expecting any interruption for users. Systems are being brought up now.

Wed Jan 14 17:02:18 EST: Emergency shutdown of all compute nodes at 8:30 AM tomorrow (Thurs, 15 Jan). After starting to bring up systems this afternoon, we learned that an emergency replacement of the cooling tower fan belt is required tomorrow morning. Compute systems that are currently up will need to be shut down at 8:30 AM tomorrow. We will attempt to keep login nodes and storage up during tomorrow's downtime, which is expected to last 1-4 hrs.

Wed Jan 14 14:34:18 EST: Expect some systems (login nodes, GPC and BGQ) to be available by approximately 3:00-3:30 PM.

Wed Jan 14 13:09:03 EST: Free-cooling is being restored and should allow compute systems to come online this afternoon. Chiller maintenance will continue throughout the day and possibly into tomorrow. Check back for updates.


On January 14 and 15, scheduled maintenance on the data centre's cooling system will require all systems to be shut down for at least the first part of the maintenance. All SciNet systems will be shut down at 7 AM on Wednesday January 14, 2015 and all login sessions and jobs will be killed at that time.

At the earliest, the systems will be available again later on Wednesday afternoon, but it is possible that the downtime will extend into Thursday January 15, 2015. Check here on the SciNet wiki (wiki.scinethpc.ca) for updates on Wednesday and Thursday.


(Previous messages)