April 2004 Archives

Yesterday we had our internet bandwidth increased. Almost immediately, we breached the previous max. I'm not surprised by this. What I am surprised by is that we haven't bumped our head on the new max yet! I had fully expected the powers that be to notice the faster response times and start taking advantage of them. If we had ResNet on our feed, we'd be maxed no problem. But they have a different feed.

Also, a co-worker is suddenly out of the office due to family issues. This was not well timed, given the recent blackboard-test crash that seems to have happened. I've been having to deal with the retrieval requests from the application administrators in his absence. It has been going well so far, thanks to his efforts of yesterday.

After a bit of trickery with scripting and an apache log, I was able to generate a hammerhead scenario set. The generated scenarios were based on real requests made to the server, so I could get a wide basis for generating scripts that would NOT return 404. Happy me. I managed to stress the server enough that the server itself was taking about a second to return requests, and probably couldn't cache it all. The server itself barely broke over 10% CPU during the test, so I consider it a valid verification.
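The log-mining step could look roughly like the sketch below. It is not the script I actually used. It assumes the access log is in Common Log Format and only keeps GET requests that were answered with a 200, so replaying them should not produce 404s. The names `LOG_LINE`, `successful_get_paths`, and `sample_log` are made up for illustration, and turning the resulting paths into a hammerhead scenario file is left out.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def successful_get_paths(lines):
    """Count request paths that were fetched with GET and answered 200."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        if match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts


# Hypothetical log lines, invented for this example.
sample_log = [
    '10.0.0.1 - - [20/Apr/2004:10:00:00 -0700] "GET /~user/index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [20/Apr/2004:10:00:01 -0700] "GET /missing.html HTTP/1.1" 404 312',
    '10.0.0.3 - - [20/Apr/2004:10:00:02 -0700] "GET /~user/index.html HTTP/1.0" 200 1043',
    '10.0.0.4 - - [20/Apr/2004:10:00:03 -0700] "POST /form.cgi HTTP/1.1" 200 88',
    'this line is not a log entry',
    '10.0.0.5 - - [20/Apr/2004:10:00:04 -0700] "GET /images/logo.gif HTTP/1.1" 200 2048',
]

paths = successful_get_paths(sample_log)
for path, hits in paths.most_common():
    print(hits, path)
```

Counting hits per path keeps the option of weighting the generated scenarios toward the pages people really ask for.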

Two days ago I viewed the Novell Patch Management webinar offered by Novell. It was a very high level view of their new Zen Patch Management product, offered through a partnership with Patchlink. The webinar was, as I said, very high level. So high, in fact, that they had to spend 30 minutes describing the problem; something that sysadmins in the line of combat don't need. I got only a few points out of the webinar, but they are somewhat useful.
  • The agents and server software are free if you have a Zen maintenance contract
  • A subscription is required, licensed per-device, to get packaged patches
  • When asked directly if it was possible to create your own patch packages, the presenter dodged the question
  • The list price is $18/device, which only small shops will pay, as anyone with any volume in Novell software has SOME discount off of list thanks to their contract
  • The repository server has to be a Windows server, but agents exist for Netware and Windows
  • Linux patching is offered through Red Carpet, which came with the Ximian purchase
All of this tells me that this product is only somewhat useful. We don't have the funds, even with our sizable education discount, to even think of covering all of our managed desktops. If we deploy, it'll be our servers only. I'm glad they're offering a product like this, but it still is out of reach for us.

I managed to get mod_status and mod_info put into place on our MyWeb servers. It was interesting to set up, and I'm glad I got that in there. Mod_status provides a basic view of how busy the server really is at that point in time. And so far, they aren't very busy. Fun stuff.
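For checking "how busy" from a script rather than a browser, mod_status also has a machine-readable view at `/server-status?auto`, which returns plain `Key: value` lines. Below is a minimal sketch of reading that text. It assumes the Apache 2.x field names `BusyWorkers` and `IdleWorkers` (Apache 1.3 calls them `BusyServers` and `IdleServers`). The function names and the sample text are invented for the example, and fetching the page from the server is left out.

```python
def parse_status_auto(text):
    """Turn the 'Key: value' lines of a ?auto status page into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines that have no 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def busy_fraction(fields):
    """Share of workers currently handling a request (0.0 to 1.0)."""
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    total = busy + idle
    return busy / total if total else 0.0


# Hypothetical status text, shaped like mod_status ?auto output.
sample_status = """Total Accesses: 1200
Total kBytes: 5300
BusyWorkers: 3
IdleWorkers: 22
Scoreboard: __W__K_W______________...
"""

status = parse_status_auto(sample_status)
print("busy fraction:", busy_fraction(status))
```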

The Scalar 100 was racked today. I also learned that the other administrator was planning on having each of the four backup servers drive one of the drives in this system. I wish he had told me that, since that particular setup would have ended up costing us around $12,000 more in hardware and software costs. He was, as they say, not pleased. Now we get to figure out how to run a single centralized backup server that is responsible for backing up everything Windows/NetWare.

There are some real I/O problems with this. For one, with four drives the SCSI card in question has the potential to take up a good chunk of the PCI bus just on that; a 64-bit SCSI card would be a good idea here. Since all of the backup duties will be remote, we get to cram that stream over the ethernet. The server really would benefit from a gigabit NIC: with the potential of four separate streams coming down a 100 megabit NIC we WILL get contention, and the higher speed is needed. And since we don't really have a server beefy enough to drive it by itself, we're going to have to live with bottlenecking on the PCI bus.
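The back-of-the-envelope numbers behind that worry can be sketched as below. These are theoretical peaks only: decimal megabytes, no protocol or bus overhead, and an even split between the four streams. The function names are made up for the example, and the real throughput of our tape drives is not modeled since I don't have those numbers here.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a link speed in megabits/s to megabytes/s."""
    return mbit_per_s / 8


def per_stream_share(link_mbit_per_s, streams):
    """Even share of a link, in MB/s, when several streams compete for it."""
    return mbit_to_mbyte(link_mbit_per_s) / streams


def pci_peak_mbyte(bus_bits, clock_mhz):
    """Theoretical peak of a PCI bus in MB/s: bytes per transfer times clock."""
    return bus_bits / 8 * clock_mhz


fast_ethernet_share = per_stream_share(100, 4)   # four streams on 100 megabit
gigabit_share = per_stream_share(1000, 4)        # four streams on gigabit
pci_32_33 = pci_peak_mbyte(32, 33)               # plain 32-bit/33 MHz PCI
pci_64_66 = pci_peak_mbyte(64, 66)               # 64-bit/66 MHz PCI

print("MB/s per stream on 100 megabit:", fast_ethernet_share)
print("MB/s per stream on gigabit:", gigabit_share)
print("MB/s peak, 32-bit/33 MHz PCI:", pci_32_33)
print("MB/s peak, 64-bit/66 MHz PCI:", pci_64_66)
```

Note that a gigabit NIC alone can ask for 125 MB/s, which is nearly the whole theoretical peak of a 32-bit/33 MHz PCI bus before the SCSI card gets a turn.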

Oh, this will be fun.

The new apache modules seem to be working. The unexplained crash yesterday might have nothing to do with them. We'll wait until Friday to determine if all is indeed well.

A user had some bad seafood, and I get to clean it up. Eeeew.

We got code from Novell this morning! I've thrown it in. In an interesting development, the copyright line of the NLM information changed. And the files are slightly smaller. And the version reverted to 1.00.00.

More worrying about spam stuff. The Sophos product looks pretty nifty, but price could be a real factor there. Cyphertrust also looks pretty nifty, but the apparent lack of an end-user definable white-list could scuttle that one. Postini came in way over budget. NAI is just as far out of budget, with a third of the functionality of the rest of 'em.

We just threw a sniffer onto the network in the hopes of catching the traffic that causes the apache dump.

An exploit is in the wild for one of the bugs reported in MS04-011, according to the ISC Handler. The only corroboration I've found is from this source. It exploits the SSL bug fixed by the patch.

A very exciting morning! The April patches from Microsoft were released yesterday, and there were some real doozies in the bunch. Multiple 'remote exploit' bugs were patched, and we kicked off our own patchfest. And things didn't like it. It wasn't solar flares, there weren't any. I checked.

The April security updates are nasty with a capital N. 04-011 contains several Critical rated flaws, including a pair discovered by eEye. This means that detailed documentation of what's wrong will likely be released later today.

The Spam wars continue. We received news that Stanford likes the Sophos PureMessage product. It runs on several Unix variants, and marks up headers. Not sure if that'd fit in our environment.

Another Apache abend this morning. As it looked identical to the dumps I've already submitted, I didn't submit this one.

Microsoft removed the SLEEP command from WinXP, but they hid it in the Resource Kits. It's a useful tool for building an MRTG setup that can refresh the config build without my having to restart the MRTG process manually.

Another day, no abends.

In other, more exciting news, the server that permits us to get voicemail from within Outlook was discovered to be hacked. Our vendor notified us of this, since this is their box on our network. It has been an exciting morning figuring out what to do with it. The hackage is not impacting service as far as we've been able to tell, but it was warezed out. We're looking at port-blocking until our vendor can fix it.

Novell got back to me late yesterday acknowledging that they received the core dumps and are analyzing them. We didn't get an abend this morning, which is a nice thing.

As for MRTG, I'm trying to figure out the best way to create a process to continually run MRTG, but still allow it to reread config-file updates. This is tricky.
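One way around the problem, sketched below, is to not keep MRTG resident at all: run it as a one-shot command on a timer, so every pass reads whatever config file is on disk at that moment. This is only an illustration of the idea, not what I have in place. The function name `poll_loop` is made up, `mrtg.cfg` is a placeholder path, and the `run` and `sleep` parameters exist only so the loop can be exercised without MRTG installed.

```python
import subprocess
import time


def poll_loop(command, interval_seconds, cycles=None,
              run=subprocess.call, sleep=time.sleep):
    """Run `command` once per interval and return how many runs failed.

    Each run is a fresh process, so it picks up the current config file.
    With cycles=None the loop never ends; pass a number to stop after
    that many runs.
    """
    failures = 0
    done = 0
    while cycles is None or done < cycles:
        if run(command) != 0:  # non-zero exit code means the run failed
            failures += 1
        done += 1
        if cycles is None or done < cycles:
            sleep(interval_seconds)  # wait before the next polling pass
    return failures


# Example (not executed here): poll every 5 minutes, forever.
# "mrtg.cfg" is a hypothetical config path.
# poll_loop(["mrtg", "mrtg.cfg"], 300)
```

The trade-off is the cost of starting a new process every pass, in exchange for never having to signal or restart a long-running one after a config rebuild.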

We've had two abends in Apache, yesterday and today, both while EIP was in MODRDIRS. Both dumps have been sent to Novell. If I don't hear from someone by morning tomorrow, I'm getting back in the queue to get a person.

In other news, I've been playing with MRTG again. Lovely tool. I used it a lot at the old job. Zen for Servers is supposed to do a lot of what this does, only with better reporting. But we're not going to get that for a long time, so.... I just whipped up a couple of pages. In under 5 hours of total effort, I put in a system that'll automatically monitor NIC loads for each of the cluster resources and track which server is serving them. Plus I figured out the template feature to do CPU-load tracking on the cluster nodes. Coolness.

Had the myfiles and myweb servers added to the Big Brother tracking system. Now when they go unresponsive we'll hopefully get mail before the calls roll in.

The web-servers survived the weekend on the new Novell-supplied code. I think we have a fix!

We received revised code from Novell, and it works to a point. The server doesn't abend when I throw the strange traffic at it. But it does abend when I throw legitimate traffic. Not so useful. They had me take another core-dump, and I await news.

All the reproducing information has been passed off to the Novell engineer to throw at the developer. We'll see how this goes.

Update on the apache problem.

The problem appears to be centered on mod_rdirs.nlm, the module that permits serving of web-pages from non-local volumes. This is both good and bad. Good, in that it is a Novell fault, not an Apache fault. Bad, in that it'll need the developer to fix this one.