I'm sure I'm not the only one salivating at the idea of a dual 64-bit-processor workstation. So what if it's a Mac? It's a dual 64-bit-processor workstation! Off the shelf!
Ahem.
Now that that's out of the way: it wouldn't be a good idea as a work desktop, since several of my Novell apps still live solidly in Win32-land. Also, connecting to our own Novell environment from a Mac is dodgy at best, especially the bits I need to get at in my role as local deity. So no, no G5 for me at work.
Home though... except for that ^!$%^! budget, I'd be tempted.
Auditing has come a l-o-n-g way since the old Auditcon days. A long way. They now support dumping the event log to a wide selection of repositories, including flat file, JDBC->Oracle, JDBC->MS SQL, MySQL, and, for some reason, SMTP. Lots of repositories.
Can we use this? Probably. Will we use it? Probably not. Logs like that generally get used when a lawsuit hits or when we need to track down precisely what happened. And thumbing through potentially terabytes of data for just those events is not the best use of limited resources.
All this came up during a vendor troubleshooting session. This particular package behaves badly when its files are on the NetWare cluster, and the vendor doesn't know why. It's clear we're dealing with MS-trained techs out of their depth in a NetWare world. They've asked us for our "novell logs" for the directory in question, plus access logs for the specific time period. Um... we don't have those. We could get them, but give us a month to get the system set up correctly.
Juggling high explosives went well this weekend. The HP techs got the new SAN enclosures in, updated the firmware on everything, and upgraded the management appliance into a non-broken state. No data loss! Though between the extreme care the techs took and having to rebuild the management appliance, we didn't have everything back up until 40 minutes after our announced outage window closed. But it all got done! Yay!
Sadly, due to the time overrun, we were unable to check out the Exchange bad-cluster thing.
We found out where the extra space usage on the Exchange servers was coming from. Turns out the EXCHSRVR directories for the two halves of the Exchange cluster are effectively identical.
We also found that something like 35 GB was going missing: it wasn't turning up in the directory listings, but was showing as used at the disk level. I hammered on that thing but couldn't dig it up. Then I reran chkdsk, actually paid attention to the drive stats, and noticed that BAD CLUSTERS was up around 32 GB.
Whaaaaa? This is a SAN. That sort of thing shouldn't happen. We have something like three layers of error correction between the physical platter and the OS. Clearly the bad stuff wasn't in regular data, or we'da heard about it by now. Because of how the striping works, bad clusters on that scale should have reduced the whole SAN to a smouldering glow. But the only sign of anything like this is on this one Exchange server. My theory is that these clusters got marked bad through some OS-level mistake rather than actual bad media.
This will get closer inspection when we bring things up after the SAN update this Sunday. Wozers.
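The hunt above boils down to "the volume says this much is used, the directory listings add up to less." Here's a minimal Python sketch of that comparison: walk a tree summing file sizes and subtract the total from what the volume reports as used. The function names are made up for illustration. It only means much when pointed at the root of a volume, since shutil.disk_usage reports the whole volume. Logical file sizes also ignore cluster slack, filesystem metadata, and anything the running account can't read, so a modest gap is normal (and compressed or sparse files can push it the other way). Space that chkdsk has marked bad is one more thing that lands in that gap, which is what it looked like here.

```python
import os
import shutil


def tree_size(root: str) -> int:
    """Sum of logical file sizes under root, roughly what directory listings add up to."""
    total = 0
    # os.walk skips unreadable directories silently and does not follow
    # directory symlinks, so those bytes simply go uncounted.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # Vanished mid-walk or access denied: not counted either.
                continue
    return total


def unaccounted_bytes(root: str) -> int:
    """Bytes the volume reports as used that the directory walk can't account for."""
    used = shutil.disk_usage(root).used
    return used - tree_size(root)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    gap = unaccounted_bytes(root)
    print(f"{gap / 2**30:.1f} GiB used but not visible under {root}")
```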
After reading the past few issues, I've come to the realization that I'm not getting much out of it any more. It isn't that it doesn't appeal to me; it's largely that they're hawking widgets I have no hope of affording or of getting past the privacy hawks. Even the case studies aren't all that interesting, since they cover deploying some spiffy new technology we'll never get and how they overcame the obstacles along the way. Working in higher ed does have its differences, and one of those is a more hostile IT environment.
In a sense, we have a more immediate need to crank things down, yet ironically we can't do just that. Our servers require very stringent patch schedules, since the gap between a patch's release and an exploit's release is now measurable in hours. Plus, we've had at least one compromise that may be attributable to a zero-day exploit (i.e. one against an undisclosed vulnerability for which no patch exists yet).
In a sense, our defenses are better than those at a private corp, since we don't have the safety blanket of a firewall to tuck us in at night and let us sleep well. Put up a vulnerable version of phpBB anywhere on our network and it'll get hacked within a day or two. Because of this, our 'soft interior' is a bit crunchier than your average corp's. On the plus side, we haven't had an enterprise-wide worm nail us since I got here.
I managed to get a custom Intermapper probe worked up! It checks the available cache buffers on a NetWare server and raises a warning or an alarm when they drop below configurable thresholds. Nifty! It hasn't seen much production use yet since it's new, but it does seem to work as advertised.
<header>
type = "custom-snmp"
package = "edu.wwu.ts.netware.cb"
probe_name = "snmp.custom.netware.cachememory"
human_name = "Netware cache-memory monitoring"
version = 0.3
address_type = "IP,AT"
port_number = "161"
</header>
<parameters>
"CBCacheWarn" = "128000"
"CBCacheAlert" = "64000"
</parameters>
<snmp-device-variables>
memCacheMemoryFree, 1.3.6.1.4.1.23.2.79.1.3.0, INTEGER, "Available cache buffers"
</snmp-device-variables>
<snmp-device-thresholds>
alarm: ${memCacheMemoryFree} < ${CBCacheAlert} "Cache Buffers critically short"
warning: ${memCacheMemoryFree} < ${CBCacheWarn} "Cache Buffers getting short"
</snmp-device-thresholds>
<snmp-device-display>
\B5\NetWare Memory Thresholds\0P\
  \4\Memory in Cache Buffers:\0\ ${memCacheMemoryFree} 4kb buffers
</snmp-device-display>
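The two thresholds in the probe interact in a way that's easy to get backwards, so here's a Python sketch of the comparison as I read the probe. It's an illustration of the logic, not how Intermapper evaluates thresholds internally, and the function names are hypothetical. In this sketch the alarm test runs first, so a reading under CBCacheAlert reports as an alarm rather than as both. The 4 KB buffer size comes from the probe's own display line.

```python
# Mirrors the probe's <parameters> defaults; values are counts of cache buffers.
CB_CACHE_WARN = 128000
CB_CACHE_ALERT = 64000
BUFFER_KIB = 4  # the probe's display line labels the value as 4kb buffers


def classify(free_buffers: int, warn: int = CB_CACHE_WARN, alert: int = CB_CACHE_ALERT) -> str:
    """Return 'alarm', 'warning', or 'ok', checking the more severe threshold first."""
    if free_buffers < alert:
        return "alarm"
    if free_buffers < warn:
        return "warning"
    return "ok"


def buffers_to_mib(free_buffers: int, buffer_kib: int = BUFFER_KIB) -> float:
    """Convert a buffer count to MiB so the thresholds are easier to reason about."""
    return free_buffers * buffer_kib / 1024


if __name__ == "__main__":
    for name, value in (("warn", CB_CACHE_WARN), ("alert", CB_CACHE_ALERT)):
        print(f"{name}: {value} buffers = {buffers_to_mib(value):.0f} MiB")
```

With the default parameters and 4 KB buffers, the warning fires below 500 MiB of free cache and the alarm below 250 MiB.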
The VMware license from BrainShare showed up, so I installed it. One thing I noticed after poking around is that, unlike Microsoft's Virtual PC, VMware has 'tools' for NetWare. This is really freakin' cool. The big problem with NetWare as a guest is that it NOOPs the CPU when idle instead of HALTing it, which means the virtualization software's CPU usage sits at 100% even while the guest is doing nothing. That ain't good. The 'tools' force it to HALT instead, which is really nifty. Without them, if I had two NetWare servers running at the same time, the NOOP thing would make both behave very erratically. I should be able to virtualize clusters with this stuff!
More poking later.
It has been a busy couple of days out of the office, so I haven't been here all that much. Thus, fewer updates.
In other news, Novell has shipped us field-test code for NDPSM. The new module has some improvements, but we've found that it is capable of blowing the stack. Unfortunately, we haven't isolated where this happens. When it fails, zeros get scribbled hither and yon, handily overwriting the area where the code was when things went south.
The abend.log format has changed! It now includes register dumps and a stack trace. Hopefully this means fewer core dumps will need taking.
The output from the 'cluster resources' command is now sorted alphabetically. Previously it was sorted by date of addition. A cosmetic change, but a good one for readability.