March 2004 Archives

Continuing the investigation, I've created a process that is guaranteed to abend my Apache servers. I've also sent a couple of core dumps on up to Novell. I haven't heard from the web-services division person yet (whom I may have worked with before, IIRC), but that'll change. We attempted to replicate the problem on an Apache server running on OS X, but it handled the load fine. There are a bunch of other variables to test before that comparison means much, but it is something.

Yep, it's a somthin' alright. I was able to reproduce the abend. I've opened up an incident with Novell to see how this turns out.

Disturbing news: it looks like the Apache abend we had this morning was directly related to the strange web-log entries I reported a couple of days ago. We got this in the abend log:

*********************************************************

Server FACSRV2 halted Monday, March 29, 2004 7:00:05.556 am
Abend 1 on P00: Server-5.60.04: Page Fault Processor Exception (Error code 00000000)

Registers:
CS = 0008 DS = 0010 ES = 0010 FS = 0010 GS = 0010 SS = 0010
EAX = CCBFC308 EBX = B102B102 ECX = 8C992A10 EDX = 00000090
ESI = CCC16A88 EDI = CCA37C00 EBP = 952B26A8 ESP = 90444FD4
EIP = B102B102 FLAGS = 00010286
Address (0xB102B102) exceeds valid memory limit
EIP in UNKNOWN memory area
Access Location: 0xB102B102

The violation occurred while processing the following instruction:


Running process: Apache 33 Process
Thread Owned by NLM: APACHE.NLM
Stack pointer: 90444718
OS Stack limit: 90431220
Scheduling priority: 67371008
Wait state: 3030070 Yielded CPU
Stack: --B102B102 ?
--B102B102 ?
--B102B102 ?
--B102B102 ?
--B102B102 ?
--B102B102 ?
--B102B102 ?


The "B102B102" value in EIP and on the stack is exactly the byte pattern of the strange SEARCH request recorded in my Apache logs. The difference here is that the server that received the request was running NetStorage. I don't know the exact mechanism by which EIP got where it was, or how the stack got overwritten, but that isn't good. Not good at all. I'm trying to get hold of Novell to notify them of this.

The big problem this morning was twofold.

1) Of the three admins, two were out at doctor's appointments, and one was on vacation. No one was here until 10.
2) One of our database servers lost the rights to its name in WINS, causing several other services to fail. This has since been fixed. If that mysterious whatever happens again, it'll get thwacked good.

In other news, I've been freeloading BrainShare from the perspective of the Support Forum sysop webcam. Oh yeah.

The Apache thing yesterday appears to be a new WebDAV scan that's going around. No biggie, since we're not running WebDAV on that server.

With both admins gone due to vacation and injury, I'm by myself today. We've gone round and round on this Chronicle thing, and hope we have a fix. The current theory is that Exchange stops delivering the mailing once it receives enough mail for non-existent addresses. We think we've managed to expunge the bad addresses from the list, so we'll see if things show up tonight.

Weird Apache log entry. Hard to show, but:
160.36.148.150 - - [24/Mar/2004:04:57:31 -0800] "SEARCH /±±±±±±±±±±±[...]±±" 414 350
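The hex characters are 02 and B1, padded at the end by 91. (0xB1 is "±" in Latin-1, which is why the log shows it that way; 0x02 is an unprintable control character.)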
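A quick way to see why that same value turns up in EIP and all over the stack: x86 is little-endian, so a run of alternating 02/B1 bytes, read back as 32-bit words starting on an 02, comes out as 0xB102B102. The sketch below is a minimal illustration assuming the request body is just that alternating pattern; the repeat count and the amount of 91 padding are made up, since the log elides the real length. It says nothing about how the overwrite itself happened.

```python
import struct

# Assumed reconstruction of the SEARCH URI body: alternating 0x02/0xB1
# bytes, then 0x91 padding. Both lengths here are made up; the real
# request was long enough to earn a 414 (Request-URI Too Long).
payload = bytes([0x02, 0xB1]) * 64 + bytes([0x91]) * 4


def dwords_le(data: bytes) -> list:
    """Read data as consecutive little-endian 32-bit words (x86 byte order)."""
    usable = len(data) - len(data) % 4
    return [struct.unpack_from("<I", data, off)[0] for off in range(0, usable, 4)]


words = dwords_le(payload)
print(hex(words[0]))                    # 0xb102b102
print(hex(words[-1]))                   # 0x91919191 (the padding)

# Shifted by one byte, the same data reads as 0x02B102B1 instead, so
# alignment decides which of the two values lands in a register.
print(hex(dwords_le(payload[1:])[0]))   # 0x2b102b1

# 0xB1 decodes to "\u00b1" (the plus-minus sign) in Latin-1.
decoded = bytes([0xB1]).decode("latin-1")
```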

Had another Exchange event. It was the same thing that hit us two weeks ago yesterday, where the Exchange server in question wouldn't or couldn't talk to other SMTP servers. Something DNS-related, though nothing I could throw at it short of a reboot fixed it. The reboot did it. I wish I knew why it fell into this state, and why it did so on days when I'm the only one around.

The Chronicle of Higher Education sends a bulletin out every day to paying customers. It has come to our attention that somehow this mail is not making it through Exchange. This has perplexed many people, since folk both in and out of the anti-spam trial have reported not getting it. A call is in to Microsoft to help figure this one out.

We've had a talk on the spam problem. It seems we have a Gartner contract, so we used one of our inquiries to get some information. And it was informative. Through that I learned that Sophos is the number four AV vendor, behind NAI, Symantec, and Trend Micro. I also learned that the Sophos scanning engine is in a lot of the AV/spam appliances out there.

Excitement last night. I was doing a bit of research on the USA Patriot Act at home when I noticed that the ISC InfoCon level was Yellow, which I've never seen before. So I checked a few things at work, and in the process noticed that three of the six cluster nodes were down. Completely unrelated to the ISC stuff, but I just happened to notice. Otherwise the other admin would have found out about it this morning at about 7:30.

Turns out one of the fibre-channel switches turned itself off at 9:21am Saturday. I had to come in and turn it back on and ensure everything came up OK (it did).

We managed to identify why some students can get to the CLASS volume from a Mac and some can't. It seems the users who can't don't have a simple password set. As with many educational institutions, we have a fair number of Macs around, but very little Mac depth on the network administration side. As far as we're concerned, Mac networking is almost like Banyan VINES.

The thing that threw us was that we don't set simple passwords as a rule. It turns out that a command-line option when loading AFPTCP will set it for us if we so choose. What it does is capture the supplied password, check it against NDS, and, if it matches, set the simple password to the NDS password. One of our desktop-support servers has it set up this way, but the cluster does not. We need to reload AFP on the cluster to get this working, so it'll be either after hours or during break (next week).
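Reduced to its logic, the sync behaviour described above looks something like the sketch below. It is a hypothetical model only: `User`, `afp_login`, and the dictionary standing in for NDS are illustrative names, not anything AFPTCP exposes, and the real option's name and edge cases aren't covered.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    """Hypothetical stand-in for an NDS user object."""
    nds_password: str
    simple_password: Optional[str] = None


def afp_login(directory: Dict[str, User], name: str, supplied: str) -> bool:
    """Model of the described sync: verify against NDS, then copy to the simple password."""
    user = directory.get(name)
    if user is None or supplied != user.nds_password:
        return False                     # NDS check failed: nothing is captured
    user.simple_password = supplied      # match: simple password := NDS password
    return True


directory = {"student1": User(nds_password="s3cret")}
print(afp_login(directory, "student1", "wrong"))    # False
print(directory["student1"].simple_password)        # None
print(afp_login(directory, "student1", "s3cret"))   # True
print(directory["student1"].simple_password)        # s3cret
```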

We've moved our first users over to the new Blackberry server. The one with the real license, not the evaluation license we had been running under. So far, no crashes.

The students roll on. The students have expanded their use of the officially unannounced myweb service over on myweb.students.wwu.edu. I'm not complaining, I'm just remarking, that's all.

On the 15th, most access was on-campus until about 1pm. A couple of folk have pointed FrontPage at this NetWare/Apache combo with the expected zippo results. So far, the large majority of access is for HTML-ized PowerPoint and personal web pages. A couple of people are using it for web storage. There is a 50 MB .MOV file that is getting attention from off-site.

On the 16th, most access is off-campus. It was a very busy day, largest log so far. More students joined in the fun. The period between 3-7pm was by far the heaviest period.

On the 17th, access is mixed, and lighter than the first two days. Finals week is running to its conclusion, so there may simply be that many fewer students around to take advantage of it.

I'm pleased to find almost no MP3 or WMA files being shared!
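Tallies like these come from walking the Apache access logs. Here is a minimal Python sketch of that kind of tally, assuming Apache's common log format. `CAMPUS_NETS` is a placeholder (an RFC 5737 documentation range), not our actual campus address space, and `summarize` and the sample lines are invented for illustration.

```python
import ipaddress
import re
from collections import Counter

# Apache common log format: host ident user [time] "request" status bytes
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} '
    r'[+-]\d{4}\] "(?P<request>[^"]*)" \d{3} \S+'
)

# Placeholder range; substitute the real campus networks.
CAMPUS_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def summarize(lines):
    """Count hits per hour, on- vs off-campus, and MP3/WMA requests."""
    by_hour, where, media = Counter(), Counter(), 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                     # skip lines that aren't common log format
        by_hour[int(m["hour"])] += 1
        try:
            addr = ipaddress.ip_address(m["host"])
            on_campus = any(addr in net for net in CAMPUS_NETS)
        except ValueError:
            on_campus = False            # a hostname, not an IP; counted as off-campus here
        where["on-campus" if on_campus else "off-campus"] += 1
        parts = m["request"].split()
        path = parts[1] if len(parts) > 1 else ""
        if path.lower().endswith((".mp3", ".wma")):
            media += 1
    return by_hour, where, media


sample = [
    '192.0.2.10 - - [15/Mar/2004:10:02:11 -0800] "GET /~someone/slides.htm HTTP/1.1" 200 5120',
    '198.51.100.7 - - [16/Mar/2004:15:30:00 -0800] "GET /~someone/clip.mov HTTP/1.1" 200 52428800',
    '198.51.100.8 - - [16/Mar/2004:16:45:09 -0800] "GET /~other/song.mp3 HTTP/1.1" 200 4096',
]
by_hour, where, media = summarize(sample)
print(sorted(by_hour.items()))   # [(10, 1), (15, 1), (16, 1)]
print(dict(where))               # {'on-campus': 1, 'off-campus': 2}
print(media)                     # 1
```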

Spent the morning staging up the new Blackberry server. This is an actual server unit, with a whole gig of RAM. With luck, it'll stay up longer than the current workstation-class unit with a half gig in it.

Quiet day. This is finals week, and we make a point of not doing anything weird. Or even normally disruptive.

Our Microsoft TAM is up today, from Dallas as it turns out. He'll be meeting with a bunch of people over the day, but will be taking us out to lunch later. That should turn out to be interesting. We have some real questions relating to Exchange 2003 and clustering that we need to have answers to.

The vacationer and trainee are back from Arizona, both had a good time. The training wasn't as good as it could have been, but spring training was everything it was billed to be. No problems anywhere.

No issues this morning, just a lot of catch-up to do.

Tonight's backup tapes have been changed around. And I found out that an abend on one of the servers will prevent backup processing until we get reboots. Not like I haven't done thousands of those over the years. Aie. This is why I don't run backup software (except for agents) on servers that have anything resembling uptime requirements.

Had the Exchange talk this morning. It seems the correct upgrade path is:
1) Windows 2003 in the root domain
2) Windows 2003 in the child domain
3) Exchange 2003

And the rumor mill has it that we're having a hard time extorting Microsoft donations out of the alumni, so we may end up having to pay for some licenses. Like the Exchange licenses. This has some of the upper-management people running scared, because, well, DO YOU KNOW HOW MUCH THAT WILL COST?? and such.

If our free MS goes away, we're going to have to look to other free sources of software. Like:
* GroupWise. Free with our Novell licenses
* OpenOffice. Should be able to open all of our existing stuff, but not as feature-rich macro-wise
* MySQL, since MS-SQL is hella expensive
* Apache-related web-development, on NetWare or Linux

Flying Solo, Day Two: Part Two

A pretty quiet day, until the end. The P-Counter meeting went better than expected. Then I found that there is a problem with our FrontPage server. But not the drop-dead kind of problem. Still figuring it out, actually. One of the out admins is well steeped in the black magic that is FrontPage administration, and it really is pretty bad. I don't want to touch it unless it is pretty urgent.

Flying Solo, Day Two: Part One

No morning crisis today. The talk with MS was postponed to tomorrow morning.

Flying Solo, Day One: Part Four

We just got told that a request we made of our Microsoft TAM will be fulfilled this week! Unfortunately, the guy with most of the questions is one of the ones that is out. So I get to handle them tomorrow morning. IF I get the list of questions to ask. Erm.

Plus an off-the-wall request to add two managers to some distribution lists. While the GUI won't let you, the permissions involved are quite simple to replicate. No problem there, just some data entry work.

Flying Solo, Day One: Part Three

Blackboard stopped responding to logins. After asking around to figure out which server that probably involved, I noticed that Tomcat on that server was not running (and it should be). So I restarted it, and logins started working again. That's about the only thing I could do with this server, though. Blackboard is a big fear, since it is a core component of this place and I know very little about how it works and who to call when things go south.

Flying Solo, Day One: Part Two

The Blackberry server has decided to be weird. Two users, the admin in Phoenix and the U president, have mail queueing up on the server and NOT delivering to the handheld. The admin noticed and called me. And this problem has survived a reboot. This COULD get me on the phone with the U-prez if it keeps up. We shall see.

Update: What happened was that the server had lost contact with the handhelds, and forcing the handhelds to talk to the server cleared things right up.

Flying Solo, Day One: Part One

The other two admins are out of town and I'm running things solo for the first time. In the hour I've been here, we've had a DMCA-related traceback (some fool was sharing LOST IN TRANSLATION via BitTorrent off of our VPN server and got noticed), two high-priority e-mail account creations, and hand-holding another admin through the final bits of a problem we started fixing yesterday. My absent office-mate's phone has rung several times but has yet to roll over to mine; this could be both good and bad.

Exchange went weird. One of the two storage servers decided it couldn't use DNS for some reason. I could resolve everything from the command line just peachy, and ipconfig /flushdns didn't do its usual magic. The SMTP queue (I didn't know it used SMTP as a transport at this level) to the OTHER Exchange store had 1600 messages stacked up in the two hours it had been unresponsive. Unfortunately, a reboot was needed to kick it into gear. We gave it one, and mail started flowing again.

Silly thing.

And there was much rejoicing! The MS patches this month are pretty minor on the server side! Only one really counts:

http://www.microsoft.com/technet/security/bulletin/MS04-008.mspx

And we do have at least one Media server affected by that one. But I can patch it after hours rather than in the maintenance window. So I don't have to stay up until 1am tonight and patch 32 servers like I was fearing! Woo!

Two servers are now getting updates the way they should! Yay, me! Unfortunately, I'll be the only one here to handle the MS patch coming out tomorrow. I hope hope hope it's one of the very few no-reboot patches. Ah well.

It seems the NetShield for NetWare infrastructure round these parts needs some attention. Good thing I know the product well.

Some items of note involving patching of MS products in the future:
  • MBSA 2.0 will be out Q2 of 04
  • Software Update Services 2.0 is also supposed to be out Q2 of 04
  • MSI 3.0 will include the capability of doing 'delta patches', where the files being replaced aren't actually replaced but modified in place. This will reduce the size of patches by quite a bit. MSI 3.0 will also make patch removal more robust.
  • The new MBSA and SUS will include some very nice new features
    • Ability to better target patches at different classes of users. Currently with SUS if you want to distribute different patches to different classes of machines you need an SUS server for each grouping. Not so with SUS 2.0
    • A more robust scanning engine to reduce false positives and improve detection of patched systems
    • Much better reporting out of SUS
  • Work on reducing the number of reboots is ongoing. Win2003 SP1 will be the best place for this, as hot-patching is more feasible on that platform
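The 'delta patch' idea is easy to picture with a toy example: record only the byte runs that changed, then write those runs into the existing file instead of shipping the whole new file. This is a generic illustration under simplifying assumptions (equal-length files, no compression, hypothetical `make_delta`/`apply_delta` helpers); it is not the MSI 3.0 patch format, whose internals aren't described here.

```python
from typing import List, Tuple

Delta = List[Tuple[int, bytes]]   # (offset, replacement bytes)


def make_delta(old: bytes, new: bytes) -> Delta:
    """Record only the byte runs that differ between two equal-length files."""
    if len(old) != len(new):
        raise ValueError("this toy sketch only handles equal-length files")
    delta, start = [], None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i                            # a changed run begins
        elif a == b and start is not None:
            delta.append((start, new[start:i]))  # the run ends just before i
            start = None
    if start is not None:
        delta.append((start, new[start:]))       # run extends to end of file
    return delta


def apply_delta(target: bytearray, delta: Delta) -> None:
    """Patch the existing file contents in place instead of replacing them."""
    for offset, chunk in delta:
        target[offset:offset + len(chunk)] = chunk


old = bytes(1000)                 # stand-in for the installed file
new = bytearray(old)
new[10:14] = b"ABCD"              # the "fixed" version differs in 4 bytes
new = bytes(new)

delta = make_delta(old, new)
print(delta)                      # [(10, b'ABCD')]

installed = bytearray(old)
apply_delta(installed, delta)
print(bytes(installed) == new)    # True
```

The patch payload here is 4 bytes plus an offset instead of 1000 bytes, which is the whole appeal; real delta formats also have to cope with insertions that shift everything after them.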

We just had some fun deleting some restored-from-tape data. It seems a couple of PowerPoint files didn't land correctly. ConsoleOne couldn't delete the files. The files in question didn't even show up at the command line or in Explorer. They did show up in the NetWare Portal, though, and you could delete the named files there. Unfortunately, the portal also showed a couple of unnamed 1.8 MB files in the directory that could not be deleted.

So along comes one of my favorite utilities, TOOLBOX.NLM. In normal mode it couldn't show the files. However, when loaded with the /NL flag (no login), it DID show something. The thing with the /NL flag is that it is forced to use the 8.3 (DOS) namespace and can't use the long-name namespace. Through the 8.3 namespace it showed some screwy-named files where there hadn't been any before. And since it could see the files, it could delete them! And so we did, and the directories went away as planned. I love TOOLBOX.
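The reason an 8.3-only view can see things the long-name view can't is that each file carries a separate short alias in the DOS namespace. As a rough illustration, here is a sketch of the familiar `NAME~1.EXT` style of alias generation. It approximates the common DOS/Windows convention and is an assumption as far as NetWare's own DOS-namespace mangling goes; `short_name` is a hypothetical helper.

```python
import re
from typing import Set

_INVALID = re.compile(r"[^A-Z0-9_\-]")


def _clean(part: str) -> str:
    """Upper-case and drop characters this sketch treats as invalid in 8.3 names."""
    return _INVALID.sub("", part.upper())


def short_name(long_name: str, existing: Set[str]) -> str:
    """Approximate a DOS-style 8.3 alias (hypothetical helper, simplified rules)."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:
        stem, ext = long_name, ""
    stem_c, ext_c = _clean(stem), _clean(ext)
    # A name that already fits 8.3 keeps its upper-cased self.
    if (stem_c == stem.upper() and ext_c == ext.upper()
            and 0 < len(stem_c) <= 8 and len(ext_c) <= 3):
        alias = stem_c + ("." + ext_c if ext_c else "")
        if alias not in existing:
            existing.add(alias)
            return alias
    # Otherwise: first six characters, a ~N tail, and a 3-character extension.
    suffix = "." + ext_c[:3] if ext_c else ""
    for n in range(1, 10):
        alias = f"{stem_c[:6]}~{n}{suffix}"
        if alias not in existing:
            existing.add(alias)
            return alias
    raise RuntimeError("this simplified sketch only tries ~1 through ~9")


seen: Set[str] = set()
for name in ["Quarterly Report.ppt", "Quarterly Review.ppt", "NOTES.TXT"]:
    print(name, "->", short_name(name, seen))
# Quarterly Report.ppt -> QUARTE~1.PPT
# Quarterly Review.ppt -> QUARTE~2.PPT
# NOTES.TXT -> NOTES.TXT
```

If the long-name entry for a file gets damaged while its short alias survives, a tool confined to the short namespace can still address (and delete) it by that alias.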

A friend of mine has until very recently worked in the storage industry writing drivers for Fibre Channel and iSCSI devices. This has granted those geeks among us who have access to this august personage a certain insight into the realities of what goes on at the storage layer. And in his words:

In general, I've found that working in the storage industry has greatly increased the value I see in a good backup strategy. I don't really trust any storage technology anymore.

And in more specific terms, he distrusts iSCSI as immature. When comparing Fibre Channel (FC) and iSCSI, he has waxed eloquent. For one, iSCSI runs over Ethernet and TCP/IP, so problems long since solved at the network-driver level, such as out-of-order arrival and high latency, are essentially unknown territory for operating systems at the storage-driver level (Win2003 is new enough that it MIGHT be an exception). For another, the storage driver's dependency on the TCP/IP stack is a new condition that can lead to deadlocks. In short, the industry is immature.

In his opinion, a correctly configured iSCSI environment would include:
  • Hardware based iSCSI cards that present to the OS as a storage adapter, not a NIC
  • A dedicated Ethernet segment for iSCSI
  • Guaranteed low latency on the iSCSI path
  • Guaranteed bandwidth on the iSCSI path
Quite a list. Especially when you consider that one of iSCSI's key marketing points is that it isn't distance-limited the way FC is. The other key marketing point, price, is true; however, when correctly designed, iSCSI isn't nearly as cheap as it looks at first blush. The first item on that list is key, as such a card presents to the OS as a device designed for storage, with its own TCP stack to handle the networking issues.

Fibre Channel has several things going for it. First, it has been around for a number of years and has been well battle-tested. Second, it uses a networking infrastructure designed from the ground up for storage networking and its demands. Third, it is still far more reliable than iSCSI, even if it is more expensive on a per-port basis.

And I agree with him on these. But as anyone who has tried to get something expensive approved knows, the siren call of 'cheaper... Cheaper...' can deafen budgetary ears to the potential downsides.

SuSE installed into my VPC session. This is nifty. Now if only I weren't running out of disk space quite so badly.

Quote-file update.

Yay! I finally found a NIC driver that SuSE likes in VPC! The Tulip driver, as it turns out. This will make learning SuSE go much quicker.

Well, lookit that. I found a way to edit the IP-address an NDPS printer uses for LPR communication. You can get to it from the Portal! Go to Health Manager -> NDPS Manager -> Printer object -> Configuration Options. From there you can alter both the LPR address and the LPR-queue it prints to. You don't have to delete/recreate!

Over the weekend NAI had to release two whole DAT files, both for worms that spread by way of .zip files. This has potentially very bad consequences for the future of attachments in e-mail. Already, certain file types are banned at the border in order to defend against "zero day" worms (worms that start spreading like wildfire before signatures are updated). At OldJob, we managed to dodge many such worms just by blocking certain attachment extensions. If worms start spreading in archive formats like .ZIP and .CAB, those border blockers will not catch them.

The archive introduces a layer between the blockers and the actual content. Unfortunately, most e-mail clients now handle ZIP well enough that a double-click is all it takes to open one and get at the buggy insides. Virus scanners can also scan into such archives, but attachment blockers generally don't. There exist some open-source utilities that can block files deep within zip files, but that generally doesn't help Exchange/Lotus Notes/GroupWise environments very much.

The debate has been raging for some time on the topic of executable content in e-mail, and the desirability of e-mail as a file-transfer protocol. Until Outlook 2003 introduced the "convert all to plain-text" option for viewing new e-mail, just plain HTML in the message portion could be used to do Bad Stuff. Part of the problem is defining just what 'executable content' is.

E-mail is a fairly poor choice as a file-transfer protocol. Even today, binary data sent in e-mail has to be converted to 7-bit ASCII before being transmitted over the internet by SMTP. Extensions exist to permit 8-bit transfer (and thus savings), but enough weird e-mail systems exist out there that sticking with 7-bit is the safe choice. Therefore, you pay a size penalty for sending binaries (like Word documents) as attachments. This is the main reason mail administrators put attachment-size limits into place, because that size problem becomes very obvious when mailing 652 MB CD-ROM images.
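The size penalty is easy to put a number on. Base64, the usual MIME encoding for binary attachments, turns every 3 bytes into 4 ASCII characters, and MIME caps encoded lines at 76 characters, each followed by CRLF. The sketch below computes that, assuming base64 rather than some other 7-bit encoding, and cross-checks itself against Python's `base64.encodebytes`.

```python
import base64


def mime_base64_size(n_bytes: int, line_len: int = 76) -> int:
    """Size of n_bytes of binary data after MIME base64 encoding with CRLF line ends."""
    encoded = 4 * ((n_bytes + 2) // 3)              # every 3 raw bytes become 4 ASCII chars
    lines = (encoded + line_len - 1) // line_len    # at most 76 chars per line
    return encoded + 2 * lines                      # plus a CRLF after each line


# Cross-check against the standard library, which uses 76-char lines ending in LF.
for n in (1, 57, 58, 1000):
    expected = len(base64.encodebytes(bytes(n)).replace(b"\n", b"\r\n"))
    assert mime_base64_size(n) == expected

cd_image = 652 * 2**20                              # a 652 MiB CD-ROM image
print(mime_base64_size(cd_image))                   # 935550548
print(round(mime_base64_size(cd_image) / cd_image, 3))  # 1.368
```

So a 652 MiB image becomes roughly 892 MiB on the wire, about 37% larger, before any per-recipient copies are counted.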

Add to that the fact that different e-mail systems handle the single-attachment/multiple-recipient problem differently (think of that retirement party notice you got last week, with the Word document containing the bouncing balloons image and cheery music, sent to the entire department), and you have another issue. Some send a separate copy of the file to each person. Others save the file in one spot and send pointers to that file to the recipients. Both methods have their good and bad points.
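The two storage strategies can be compared with a toy model. In the sketch below, a hypothetical `MailStore` either keeps a separate copy of the attachment per recipient or keeps one copy keyed by its SHA-256 hash and gives each mailbox a pointer to it. The attachment size and recipient count are made up for illustration, and `bytes_used` counts logical storage only; real mail stores are far more involved.

```python
import hashlib
from typing import Dict, List


class MailStore:
    """Toy store: one attachment, many recipients, two storage strategies."""

    def __init__(self, single_instance: bool) -> None:
        self.single_instance = single_instance
        self.blobs: Dict[str, bytes] = {}          # storage key -> attachment bytes
        self.mailboxes: Dict[str, List[str]] = {}  # recipient -> storage keys
        self._copies = 0

    def deliver(self, attachment: bytes, recipients: List[str]) -> None:
        digest = hashlib.sha256(attachment).hexdigest()
        for rcpt in recipients:
            if self.single_instance:
                key = digest                       # everyone points at one copy
            else:
                self._copies += 1
                key = f"{digest}-{self._copies}"   # a private copy per mailbox
            self.blobs[key] = attachment
            self.mailboxes.setdefault(rcpt, []).append(key)

    def bytes_used(self) -> int:
        """Logical storage: the size of every stored blob, summed."""
        return sum(len(blob) for blob in self.blobs.values())


notice = bytes(2 * 2**20)                          # a 2 MiB retirement-party .doc
department = [f"user{i}" for i in range(150)]

usage = {}
for single in (False, True):
    store = MailStore(single_instance=single)
    store.deliver(notice, department)
    usage[single] = store.bytes_used()

print(usage[False] // 2**20, "MiB with per-recipient copies")   # 300 MiB ...
print(usage[True] // 2**20, "MiB with a single instance")       # 2 MiB ...
```

The flip side of the pointer approach is that every mailbox now depends on that one blob: deleting a message has to account for the other references, and damage to that single copy hits everyone at once.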

And the source of the problem: attachments are very easy for the end-user to figure out, so they use them a lot. Weaning them folk off of attachments and onto another system will take effort. Lots of effort. If it is even possible.