Oldwiki.scinet.utoronto.ca:System Alerts

Revision as of 16:22, 22 February 2013 by Rzon (talk | contribs) (→‎System Status)

System Status

GPC: up | TCS: up (75%) | ARC: up | P7: up | BGQ: down | HPSS: up

Fri Feb 22, 2013:

The BGQ devel system shut down at 7:30 this morning because it detected a coolant issue. We hope to have it, and the production system, back up later this afternoon.

All GPC compute nodes are back in production.

Wed Feb 20 04:12:26 EST 2013:

Some compute nodes will be turned off Thursday (21 Feb) morning in order to reduce the cooling load in the datacentre. We'll be running on free-cooling only so that the bearings in the chiller can be replaced; that work is expected to be completed by the end of Friday. At this point we're planning to shut down 30 TCS nodes and the production BGQ (the devel system will keep running) on Thursday morning, and 20% of the GPC on Friday morning. This will be done through reservations in the queueing system so that no jobs will be killed.

Plans may change depending on outside air temperatures and progress of the work. Any changes will be posted here.

Last updated Wed Feb 20 04:12:26 EST 2013