Network outage

Something caused our router core to start dropping packets like it was going out of style, and that had side effects. It first showed up as a DNS outage, but once the reports started coming in and we went poking, traceroutes were coming back "host unreachable" from inside our router core.

I'm just happy this happened during break. If it had happened during session there would have been screaming and Very Concerned deans-n-things asking for updates every few minutes. I still haven't heard the exact cause, but I know it was some strange traffic coming from multiple segments. Once those segments were cut off, the packet drops went away.

Of note to NetWare folk is how Timesync handled the fault. We have a Reference server and three Primaries supplying time to everything. The Reference server gets its time by way of NTP from Titan, the designated time-host on the Solaris side of the house. Because of the router problems, Titan went out-of-sync since it couldn't contact any of its sources. Our Reference server kept taking that time anyway, but reported as 'out of sync'. Somewhere along the line, Titan demoted itself to a worse stratum (probably stratum 16, which is NTP's way of saying "unsynchronized"), and our Reference server marked the time from there as insane and just plain quit. Once THAT happened, the three remaining Primary servers negotiated between themselves and picked a time.
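For the curious, here is a rough sketch of the sort of sanity check an NTP client can make on a reply. It only looks at the first two bytes of the NTP header (leap indicator, version, mode, stratum). The function names and the exact rule are my own illustration; I don't know how Timesync actually makes the call internally.

```python
import struct

# Values from the NTP packet header layout.
LEAP_ALARM = 3               # leap indicator 3: clock not synchronized
UNSYNCHRONIZED_STRATUM = 16  # stratum 16: unsynchronized


def parse_ntp_header(packet: bytes):
    """Return (leap, version, mode, stratum) from the first two header bytes."""
    if len(packet) < 2:
        raise ValueError("packet too short to hold an NTP header")
    first, stratum = struct.unpack("!BB", packet[:2])
    leap = first >> 6             # top 2 bits
    version = (first >> 3) & 0x7  # next 3 bits
    mode = first & 0x7            # low 3 bits
    return leap, version, mode, stratum


def source_is_sane(packet: bytes) -> bool:
    """Hypothetical rule: reject an alarmed or unsynchronized source."""
    leap, _version, _mode, stratum = parse_ntp_header(packet)
    if leap == LEAP_ALARM:
        return False
    # Stratum 0 is "unspecified" on the wire; 16 and above is unusable.
    return 1 <= stratum < UNSYNCHRONIZED_STRATUM


# A healthy stratum-2 server reply versus one that has lost its sources.
healthy = bytes([0b00_100_100, 2])
lost = bytes([0b11_100_100, 16])
print(source_is_sane(healthy), source_is_sane(lost))
```

A real client looks at a lot more than this (root distance, dispersion, reachability), but the stratum and leap bits are the part that matches what Titan was advertising.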

Unfortunately, that took 15-20 minutes. The three main NDS servers went out-of-sync pretty quickly, so for a while there we weren't accepting any NDS changes. Again, during session there would be screaming. Happily, once the Primaries had agreed on a time, things fell back into Sync again and NDS deltas started flowing.
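To give a feel for why agreeing on a time takes several polling rounds rather than happening instantly, here is a toy model. It assumes each server nudges its clock halfway toward the group average on every poll and that "in sync" means every clock is within some radius of the others. The gain, the radius, and the starting offsets are made-up numbers for illustration, not our real values and not a description of Timesync's actual algorithm.

```python
def poll_round(clocks, gain=0.5):
    """One polling round: each clock moves `gain` of the way to the average."""
    average = sum(clocks) / len(clocks)
    return [c + gain * (average - c) for c in clocks]


def rounds_until_synced(clocks, radius, max_rounds=100):
    """Count polling rounds until the spread between clocks is within radius."""
    rounds = 0
    while max(clocks) - min(clocks) > radius and rounds < max_rounds:
        clocks = poll_round(clocks)
        rounds += 1
    return rounds, clocks


# Three primaries with hypothetical offsets in seconds, and a 2 second radius.
rounds, final = rounds_until_synced([0.0, 8.0, -2.0], radius=2.0)
print(rounds, final)
```

In this model the spread halves every round, so the wait is set by how far apart the servers drifted and how often they poll each other.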

Other servers were impacted too. Something went screwy with our main MS-SQL server, which caused things like the Western Channels, the portal, and other such things to stop working. E-mail flipped between available and unavailable depending on traffic in the core, but Outlook is robust that way so most folk didn't really notice; not being able to surf the web was much more concerning than not getting e-mail on time.