It Finally Happened

I got a call last night at about 1:30. Student printing was reported down in one of the all-night computer labs. I know from experience that if one is down, the rest likely are as well.

And they were.

The cluster node running NDPSM had abended, and recovery didn't go well. As usual. It's the same problem we've had for a while: the NDPS queues themselves are on a different volume than the one associated with the printing service. When a cluster failure occurs, seven times out of ten the two services end up on different nodes, and student printing stays down until we can manually get the two services back together.

Seeing as how I'd really rather not have more 1:30am calls, I took steps to move the queues onto the print service's volume. It's actually fairly simple to do:

SERVER: ndpsm /dbvolume=.fully.qualified.nds.name.of.volume

And it'll magically migrate the queues to the specified volume! Whee! One of the reasons we hadn't done this earlier is a bad design decision made back when the cluster went in: the print volumes that hold queues were each created with 500 MB of space. In the past that's been just fine. But it didn't take into account either NDPS driver storage or 14-page PDFs made of nothing but high-res scanned pages. Those generate 150 MB print files, which crash printers and hang the job on the server, so the poor student tries to print to another server (same song, second verse, another 150 MB down the tubes), then goes to a different computer lab.... Now that we have more slack space in the SAN, we're expanding those volumes to a nice, cozy 10 GB.
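A rough back-of-envelope sketch of why 500 MB ran out so fast. It assumes every stuck job is about 150 MB, treats 1 GB as 1024 MB, and ignores NDPS driver storage and filesystem overhead, so the real numbers would be somewhat lower.

```python
# Back-of-envelope: how many stuck ~150 MB print jobs fit on a queue
# volume before it fills? Assumptions: 1 GB = 1024 MB, and driver
# storage and filesystem overhead are ignored.

JOB_MB = 150  # approximate size of one scanned-PDF print job


def jobs_until_full(volume_mb: int, job_mb: int = JOB_MB) -> int:
    """Return the number of whole jobs that fit before the volume is full."""
    return volume_mb // job_mb


old_volume_mb = 500
new_volume_mb = 10 * 1024

print("500 MB volume:", jobs_until_full(old_volume_mb), "jobs")
print("10 GB volume:", jobs_until_full(new_volume_mb), "jobs")
```

Under those assumptions, three hung jobs are enough to fill an old 500 MB volume, while a 10 GB volume has room for roughly 68.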

The faculty side was expanded this morning. The student side will follow tomorrow, once the SAN disks finish re-striping the data.