Oldwiki.scinet.utoronto.ca:System Alerts

Revision as of 16:30, 22 February 2013 by Rzon (talk | contribs) (→‎System Status)

System Status

GPC: up | TCS: partially up (75%) | ARC: up | P7: up | BGQ: up | HPSS: up

Fri Feb 22, 2013, 15:30

All GPC compute nodes are back in production. The BGQ and BGQdev clusters are back up too.

Fri Feb 22, 2013, 7:30 am

The BGQ devel system shut down at 7:30 this morning because it detected a coolant issue. We hope to have it, and the production system, back up later this afternoon.

Wed Feb 20 04:12:26 EST 2013:

Some compute nodes will be turned off Thursday (21 Feb) morning in order to reduce the cooling load in the datacentre. We'll be running on free-cooling only so that the bearings in the chiller can be replaced; that work is expected to be completed by the end of Friday. At this point we're planning to shut down 30 TCS nodes and the production BGQ (the devel system will keep running) on Thursday morning, and 20% of the GPC on Friday morning. This will be done through reservations in the queueing system so that no jobs will be killed.

Plans may change depending on outside air temperatures and progress of the work. Any changes will be posted here.

Last updated Fri Feb 22 15:28:26 EST 2013
(Previous messages)