Oldwiki.scinet.utoronto.ca:System Alerts


System Status: PARTIALLY UP

Oct 19 13:05:00 Half of the GPC is being brought up again. TCS, P7, ARC, BGQ, and HPSS are not in operation yet as the chiller control system still needs repairing.

Oct 19 11:02:48 Staff and technicians on site have concluded that a chiller control board needs to be replaced. We believe we can bring up the chiller manually now and get a portion of the GPC running by 1 PM. The repair work will require a brief chiller shutdown (but no GPC shutdown) later in the day, so TCS will stay off for now to minimize heat load.

Oct 18 23:19:04 Still seeing significant voltage fluctuations in facility power. Will keep systems off rather than risk another failure overnight. Sorry for the inconvenience. Expect to be back up by noon tomorrow (possibly earlier).

Oct 18 22:35:13 Power quality issues brought down the chiller, which required a shutdown of the clusters. Power and chiller are coming back up, and we hope to have the clusters up by morning.

Oct 18 21:01:00 The datacentre is down due to a power failure. We are investigating the problem.

(Previous messages)