I learned something today. Excessive DOS accesses in NetWare can cause a cluster poison-pill! I'm guessing this is a manifestation of the Realtime 'bug'. If so, this is an aspect of my favorite rant against NetWare.

Real Mode Operations
NetWare is one of the few operating systems out there that still insists on putting the CPU into Real Mode for operations. There are a few things that really kick it off:
  • Extended SCSI operations, such as those that command an auto-loader robot arm or, on some badly designed tape drives, the eject/insert mechanism. In this case, the server drops into Real Mode, gives the SCSI command, and waits in Real Mode until the command finishes. For things like 'Go To Slot 5' this can take whole seconds to complete. For things like "Eject tape in drive & put in slot 5" it can take up to five MINUTES.
  • DOS-partition access. You can get around this one by loading DOSFAT.NSS, but almost no one does that. Access to the DOS partition is done in Real Mode because DOS-FAT can't understand multiple simultaneous accesses, so NetWare has to make sure that accesses to it are done sequentially.
The problem with Real Mode access is that when the server goes into Real Mode, all other I/O operations on the server STOP UNTIL THE OPERATION IS COMPLETED. That's everything. Disk I/O (how I ran into it the first time). Network I/O. You name it. All halted until the *#$!~ tape can finish rewinding and go back to slot 2. Highly annoying.

Yes, I know there is a TID on this, but we apparently haven't used it here yet.

The problem is that on a cluster, Real Mode can make you miss your heartbeat. And that makes you go split-brain. Not Good. I managed to crash two of the three servers this way! At essentially the same time! Good thing that third server was around, I tell yah.
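
To put rough numbers on why a Real Mode stall is so dangerous on a cluster, here is a back-of-the-envelope sketch in Python. The heartbeat interval and tolerance below are hypothetical values I picked for illustration, not settings read off our cluster, and the function names are made up. It is a toy model of the arithmetic, not how the cluster software actually decides anything.

```python
# Toy model: a node stuck in Real Mode sends no heartbeats for the whole stall.
# HEARTBEAT_INTERVAL and TOLERANCE are hypothetical illustration values,
# not actual NetWare cluster settings.
HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (assumed)
TOLERANCE = 8.0            # seconds of silence before peers give up on a node (assumed)


def missed_heartbeats(stall_seconds, interval=HEARTBEAT_INTERVAL):
    """Number of heartbeats that should have gone out during the stall."""
    return int(stall_seconds // interval)


def node_is_evicted(stall_seconds, tolerance=TOLERANCE):
    """True if the stall outlasts what the other nodes will tolerate."""
    return stall_seconds > tolerance


# Stall lengths taken from the examples above: "whole seconds" and "five MINUTES".
for label, stall in [("Go To Slot 5", 3.0), ("Eject tape & put in slot 5", 300.0)]:
    print(label, "missed:", missed_heartbeats(stall), "evicted:", node_is_evicted(stall))
```

Under these assumed numbers a few-second robot move slips by, but anything in the minutes range blows straight past the tolerance, and the rest of the cluster decides the node is gone.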