Oldwiki.scinet.utoronto.ca:System Alerts


System Status: UP and DOWN

The systems have been mostly down for the better part of today (Wed Dec 7), due to the intermittent but persistent issues we pointed out yesterday. We are working around the clock to get things running again.



We are still encountering problems resulting from the transition to CentOS 6. Although we had tested this operating system on a subset of nodes, problems appear when running at large scale, i.e. across the almost 4,000 nodes in the GPC cluster.

Please bear with us as we try to fix things. Symptoms include slow (or disappearing) file systems, sluggish nodes, and general network problems. We will inform users when these issues are resolved. In the meantime, please check this space regularly for updates.

Some of the known issues (and workarounds) are listed here.

Thanks for your patience and understanding!

The SciNet Team.


The GPC was transitioned to CentOS 6 on Monday, December 5, 2011. This should not have affected running jobs, but the scratch and home file systems were unexpectedly unmounted on Monday afternoon, killing most jobs. Please resubmit any jobs that were killed.

Please let us know if you encounter any unexpected behavior due to the transition.

Last updated: Wed Dec 7 23:35:38 EST 2011



(Previous messages)