Excitement tomorrow morning

Our telecom section has informed us that they need to reboot the switches in our machine room for IOS upgrades. This has taken some effort to arrange. We have clustered services in there, and when the switches reboot, the heartbeat signals will go away. Then the clusters will do what clusters are supposed to do when the heartbeat goes away: freak out.

One of us will be here tomorrow morning to issue that fateful command, "cluster down", and babysit the whole process. Once everything is turned off, he'll give the OK to the Telecom dude to perform the upgrade and reboot. Once the switches are back up, up will come the servers, and the babysitting of the cluster resurrection will commence.

If that behaves like the few times I've had to bring large numbers of cluster services up at the same time (such as when the NDPS problem took out the entire student half of the cluster), then something WILL go wrong. Some volume mount or something will cause one node to lock up hard, and that resource won't come up for 5 minutes or so. It'll be rocky, but we'll get it up.