June 2004 Archives

Quote file

Quote file update:

"Try as I might, I can't get beer out of it" -- Coworker on the ADIC Scalar 100

VirtualPC and clusters

It seems that MS Virtual PC doesn't permit the sharing of virtual disks. This is kind of needed if one intends to simulate a cluster. Which I can't. I hear that the latest version of VMWare is the same way. The old VMWare still permits sharing of virtual disks.

I wonder if it'd work if I had a pair of virtual disks that pointed to the same physical drive? But since I don't have a spare physical drive to dedicate to VPC use, I can't find out. Darnit.

eDir 8.7.3 is in the tree!

Now if that other server would come up from service-packing, we'd all be good. eDir 8.7.3.1 is in the tree now. The root CA has been updated. The schema has propagated.

But a server over in the College of Arts & Sciences was down for the whole thing (oops) and has yet to come up. Something about a DXEVENT error on boot-up causing it to crash hard. Aie. Not our server, but concerning nonetheless.

ML530 problem

I have it on one count that it does work. So long as you have the latest BIOS and RomPAQ (or is that ROMPaq?). That was done prior to the NW6 upgrades last year. We'll just have to see, won't we?

That's a problem

According to Compaq, the ML530 G1 server platform is NOT supported for NetWare 6.5. The ML530 G2 IS supported, but we don't have any of those. Since our replica servers, and really most of our NetWare servers at the moment, are running on ML530 G1s, this presents a real problem for the NetWare 6.5 upgrade project.

Now we get to find out:
  • Why is this server platform not supported?
  • Drivers for NW6.5 are listed on the support page; what does that mean?
  • If we do install NW6.5 on these servers, what does that mean for our support?
Fortunately, the eDir 8.7.3 upgrade is not impacted by this.

Investigating ZenWorks

We have some of the components for ZenWorks for Desktops in the NDS tree. They were put in shortly after I got here, but only after a protracted battle in the LAN Managers group delayed things. It seems word got out that this particular package contained a "remote desktop" component that would allow helpdesk techs to remote-control workstations.

Remote control the workstations? As in, control them? Without the user giving permission for that? Spyware! Spyware!

It got ugly. The battle was just winding down when I started here in early December, so I didn't get to see the nastiness. It did prompt The Powers That Be to obtain an "affirmation of end-user privacy rights" that we had to sign. The form was just a rehash of the usual System Administrator code of ethics:
"Just because I can read every file on my servers, doesn't mean I have a right to look into files that aren't mine without permission. I will not look into files that I do not own, or do not need for my normal execution of duties."
Which also means that I can't go trolling through the Student directories looking for illegal MP3s to borrow. Or look into the ASCII file Payroll prepares for the big pay-day EFT, which contains details of how much I'm going to get paid this week and would let me know a week early. Or into the calendar of a very busy person who is running for public office to see if they have campaign events in their state-funded calendar.

You know, that sort of thing.

They wanted it written down and signed so that if someone DOES get caught doing something like that, they'll have a firm document to nail 'em to the tree with. Which they could do anyway.

But I'm getting off topic a bit. Ahem.

The word has come down from on high that we are to look into certain aspects of ZenWorks considered untouchable not six months ago. The driving force behind this is a string of incidents in which a small number of workstations were compromised by something nasty and then effectively took down our router core. They want to be able to tell centrally which of the machines we manage are unpatched. And the best way to do that is ZenWorks Inventory. Where Best means, "we've already paid for it." Recently we had eight (8) machines participating in a DDoS attack (not Akamai) that also brought disruption to our network, just to give you an idea as to how few machines are needed to turn us off. The 80-90% effective methods we'd been using before aren't good enough; we need more better good!

It'll have to be a slow expansion, so we'll have to figure out how to rig the system so only small groups of users can be done at a time. This will take some finesse considering how our tree is designed. TPTB will be obtaining permission from each area to do this, and they feel that they have a very good case to convince the other areas with.

The first step, though, is Workstation Import. The very first step to all of this, and happily, fairly easy.

Bits and stuff

The hardware that is backing our upgrade/migration to Exchange 2003 has been delayed. A shipping error somewhere, and Compaq can't tell us when they'll have the hardware available to ship. "Backordered." This has the unfortunate side-effect of pushing our upgrade of this product to the period between the end of second summer term and the beginning of fall quarter.

The admin who has put together the Exchange migration process will be on vacation for most of July. It's one of those "use it or lose it" things for vacation balance. After my predecessor left and before I arrived there were only two admins to answer phones, and that made for some interesting scheduling of vacations. Both of the admins have balances that'll likely get dumped come their anniversary dates. Both of 'em have been taking lots of vacation now that I'm here.

Anyway, Exchange is effectively halted until August. Which runs it right into our planned upgrade from NetWare 6.0 to NetWare 6.5. Since that particular migration involves taking down the cluster for non-trivial amounts of time, it will HAVE to happen during the dead period between terms. Have to have to. We only get three weeks a year to do these kinds of upgrades and if the technology isn't there... we wait another year. We'd like to try NetWare 7, but it won't be out in time. We're also upgrading Banner during this period, if I've read the notices correctly.

Because these two projects will end up overlapping, the admin who planned the Exchange upgrade will not be able to also plan the NetWare upgrade. That task has fallen to me, and I'm very happy to have it. This is exactly the sort of thing I was doing at OldJob and haven't been able to do at this job. I look forward to this project as a way to prove that the CNE after my name is well earned, not just a paper one.

As I look into this project I see that my planning skills are beginning to atrophy. Not good. I need this in order to stay sharp.

Working the mod-edir problem

I've attempted to open an incident with Novell in relation to the bad library calls that make mod_edir not work. That's where I found out that Apache2 is not supported on NW6, and thus this call would not 'count'. If I could reproduce the problem against NW6.5 and Apache2, that's another story completely.

Problem is, I don't have a NW6.5 server to play with. The web server has to be in the same tree as the storage, and since the cluster is what we need to test against, the production tree is where the web server needs to go. And the production tree is not yet ready for eDir 8.7.3. With luck, we'll be there by next week. But I'm not holding my breath. I'm probably going to have to close the incident until I can get a NW6.5 server to test against.

MyWeb and Apache2

It seems that the libraries under mod_edir aren't quite there yet when it comes to clusters. I've been speaking with one of the developers of mod_edir and he had a clear opinion on what the problem was. Since mod_edir is on Novell Forge as an open-source thing, I downloaded the code. It is rather short code for what it does, which meant I could actually follow it. The bit I'm having trouble with is this function call:
    err = NXCreatePathContext(0, spath_root, 0, (void *)rdirs_server_identity, &pathCtx);

    if (err) {
        ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, 0, r,
                      "could not create path context. error: %d", err);
        return HTTP_NOT_FOUND;
    }
When it runs into the problem it gives me this error-code in the debug-log:

could not create path context. error: -1

Okie dokie, so the error is returned by NXCreatePathContext. The Novell NDK documentation here does NOT list what an error-code of "-1" is. The developer has run into this more than I have, and he said:

George Raetzke:
Running mod_edir on a cluster is somewhat screwed.
This isn't mod_edir fault, but a defect in libc. (it uses winsock, and if
winsock give a "hold on a sec" response, rather then an immediate success,
it gives up) This "hold on a sec" is actually caused by an overlapping
i/o. I have discovered this can be reduced by (no you probably aren't
going to like it), adding a replica of the partition that contains the
server objects (not sure if the server object that make a difference is
the web server's, or the remote server object, as my test setup those were
in the same partition). I have a defect entered in on this, but no fix
yet. And yes I spent a lot of time on this one.
So, in short, we can't do that yet. I certainly hope the underlying libraries get fixed before August. For it is August when we'll have the downtime window available to upgrade the cluster to NW6.5. There are other ways to pull off MyWeb, but none of them are what you'd call... clean. mod_userdir is the only other module out there capable of doing http://server/~useracct/ but it is limited in that it can only serve from the local server. That can be worked around, but it'll be a customized hack.

Honeymoon over for Apache2

It just doesn't wanna scale. It handles the Faculty side just fine, but only lasts a few hours on the student side before going catatonic. It looks to me like the mod_edir module isn't quite mature yet. I am using the latest 1.0.6 mod_edir as well. I'm working with one of the developers of this bad boy (open-source is cool that way), so we'll see what we can get out. So far the libc update is in, and the snapshot build of apache is crashing-on-load. Not the best start.

I've moved myweb on students back to the apache1.3 setup. Easy to do, fortunately.

Google Mail

It seems that Blogger-users like me can help test out GMail. I am, and I have some invites. If you want to give it a whirl, drop a comment to this post. I'll send the invite along and nuke the comment to minimize spam-harvester-bot exposure.

Update: Invites sent out. The comments have been whacked to avoid spam-botiness. Enjoy, you two! One more invite left.

Apache2 kink

Well! The student servers have enough data behind them that I can REALLY stress-test the web implementation. And things aren't behaving correctly. Twice now I've run the stress-test-of-doom at it (details below) and the server has gone weird. In one case it started serving up traffic really slowly, then started throwing errors about non-existent .htaccess files before going catatonic. The second time, response time sucked for a while but eventually returned to normal.

So. What is causing that? I'm not entirely sure, but I suspect something in the dynamic thread handling. One of the nicer features of Apache2 is that it is able to handle resources dynamically, so I don't have to allocate 150 threads up front. I am now able to set low and high water-marks and it'll allocate (and de-allocate) threads as needed. I'm not so sure they have the bugs worked out, but I need to do some testing.

Stress-test-of-doom
A while back, I took a couple of days of logfiles from the student servers and parsed 'em. The stress-test tool is called hammerhead, off of the Phlak security ISO-linux distro. Some scripting later, and I have a bucket (980!) of scenario files of requests that were at one time legitimate. This gave me a good base to try to stress servers with. Hammerhead allows the configuration of how many simultaneous users to simulate and how fast they hit their next site. This allowed me to get downright slashdotty. A couple (fewer than 5) of the scenario files are no longer good, which just goes to show the static nature of these kinds of web pages.
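For the curious, the log-parsing half of that scripting looks roughly like the sketch below. This is not the script I actually ran; it assumes the access logs are in Common Log Format, keeps only GET requests that returned 200, and the function names (`extract_get_paths`, `chunk_paths`) are made up for illustration. It deliberately stops short of writing hammerhead scenario files, since that file format isn't reproduced here.

```python
import re

# Matches the quoted request and the status code in a Common Log Format line,
# e.g. "GET /~user/index.html HTTP/1.1" 200
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def extract_get_paths(log_lines):
    """Return the paths of GET requests that were answered with a 200."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like CLF entries
        if match.group("method") == "GET" and match.group("status") == "200":
            paths.append(match.group("path"))
    return paths


def chunk_paths(paths, per_scenario):
    """Split the path list into consecutive groups, one group per scenario."""
    return [paths[i:i + per_scenario] for i in range(0, len(paths), per_scenario)]


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [01/Jun/2004:10:00:00 -0700] "GET /~student1/ HTTP/1.1" 200 512',
        '10.0.0.2 - - [01/Jun/2004:10:00:01 -0700] "GET /~student2/x.html HTTP/1.1" 404 210',
    ]
    for group in chunk_paths(extract_get_paths(sample), per_scenario=25):
        print(group)
```

Each group of paths would then be written out in whatever format the stress tool expects.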

Apache2 spreads

The Student side of myfiles has been converted over to apache2. I also did some config-file clean-up to make it more readable. Not that this'll affect end performance, but if I get hit by a truck and our resident apache guru gets the task of figuring out what I did, he won't curse as badly.

Finals Week

Today is the last Friday of finals week. The few students still printing are doing so frantically, but overall things are winding down for the main academic year. Summer term starts in a week or two, but attendance there is typically much lower than during the rest of the year.

The internet bandwidth chart shows this quite well. The load today is down easily 30% from yesterday. There are some strange outbound peaks that have yet to be explained, but overall usage is very much down. Next week I should have my insanely fast download speeds again.

In other news, the test of Apache2 for the myweb service has gone well enough that I've deployed it in the cluster for the facstaff side. I'll do the student-side next week, once things have calmed down here. Over There is more of interest anyway, since the student side gets perhaps ten times the traffic facstaff does. We're still not talking huge traffic here, as the myweb stuff is still largely a word-of-mouth service. Sad, since it was ready in the early parts of Winter quarter.

LDAP import/export

As with many NDS admins, I've done the obvious thing. Set up a test tree, and attempt to import my current DS into the test tree by way of LDIF. Short answer is that it doesn't work the way you'd hope it would. Slightly longer answer is that it'll work if you spend enough time at it.

An example of how to add a group membership:

dn: cn=LAB,ou=groups,o=corp
changetype: modify
add: member
member: cn=User-xyz,o=corp

dn: cn=User-xyz,o=corp
changetype: modify
add: securityEquals
securityEquals: cn=LAB,ou=groups,o=corp

dn: cn=LAB,ou=groups,o=corp
changetype: modify
add: equivalentToMe
equivalentToMe: cn=User-xyz,o=corp

dn: cn=User-xyz,o=corp
changetype: modify
add: groupMembership
groupMembership: cn=LAB,ou=groups,o=corp

Four steps:
1: Add the user to the group
2: Add the security-equals to the user
3: Add the security-equivalent-to-me to the group
4: Add the Membership to the user
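Doing this by hand for more than a handful of users gets old fast, so here is a minimal sketch of generating the four records from a script. The function name `group_membership_ldif` is my own invention, and it assumes the DNs are plain ASCII with no characters that would need base64 encoding in LDIF. The attribute names and record order are exactly the ones in the example above.

```python
def group_membership_ldif(user_dn: str, group_dn: str) -> str:
    """Return the four LDIF modify records that tie a user to a group."""
    # (entry to modify, attribute to add, value), in the same order as the
    # four steps listed above.
    steps = [
        (group_dn, "member", user_dn),
        (user_dn, "securityEquals", group_dn),
        (group_dn, "equivalentToMe", user_dn),
        (user_dn, "groupMembership", group_dn),
    ]
    records = []
    for target, attribute, value in steps:
        records.append(
            f"dn: {target}\n"
            "changetype: modify\n"
            f"add: {attribute}\n"
            f"{attribute}: {value}\n"
        )
    # LDIF records are separated by a blank line.
    return "\n".join(records)


if __name__ == "__main__":
    print(group_membership_ldif("cn=User-xyz,o=corp", "cn=LAB,ou=groups,o=corp"))
```

Run as-is, it prints the same four records shown in the example; loop it over a list of users and you have an import file.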

It can be condensed down, and for an example of a full object creation look at this Novell support forum posting. This is the sort of thing that DirXML is supposed to automate for you, once you have rules set up. Fun stuff, if that's your thing.

Backup speeds

They managed to find the correct combination of driver and voodoo and things are now talking. He was chortling over a 700 MB/min backup speed earlier, which I fully understand. When you've been looking at backups that barely peak over 200 megs/minute, seeing one spew forth at 700 megs a minute is cause for giggling with glee. And seeing a Verify go at 1600 megs/minute is cause for laughing out loud with joy.

Depending on how things go, we may end up picking up the fibre-channel interface for the Scalar. It'll allow our backups to stream that much faster, plus allow for future expansion in the backup-heads. The server that is driving it right now is a wee bit under-powered when it comes to PCI bus, so we'll start bottlenecking well before we hit the theoretical limits on dual GigE ports.
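To put those numbers next to the network limit, here is a back-of-the-envelope conversion. It is only a sketch: it assumes "megs" means megabytes, uses the raw Gigabit Ethernet line rate of 1000 Mbit/s per port with no protocol overhead, and the helper names are made up for this post.

```python
def mb_per_min_to_mb_per_s(mb_per_min: float) -> float:
    """Convert a backup rate from MB/minute to MB/second."""
    return mb_per_min / 60.0


def gige_line_rate_mb_per_min(ports: int) -> float:
    """Raw line rate of `ports` GigE links in MB/minute (no overhead)."""
    mbit_per_s_per_port = 1000.0              # 1 Gbit/s, decimal units
    mb_per_s = ports * mbit_per_s_per_port / 8.0
    return mb_per_s * 60.0


if __name__ == "__main__":
    ceiling = gige_line_rate_mb_per_min(2)
    for label, rate in [("old backup", 200), ("new backup", 700), ("verify", 1600)]:
        print(
            f"{label}: {rate} MB/min = {mb_per_min_to_mb_per_s(rate):.1f} MB/s, "
            f"{100.0 * rate / ceiling:.1f}% of dual GigE line rate"
        )
```

Under those assumptions the dual ports top out around 15,000 MB/min, so the wire itself has a lot of headroom left at 700 MB/min.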

Outsourcing

A coworker is on the phone with Veritas Support right now to try and get our Scalar 100 talking to a Windows server. It hasn't been going well, so we needed the big guns. According to him, Veritas Support is in India. They're currently playing the hotfix/driver tango; we'll see how it turns out.

Exchange

Today Microsoft is coming up to visit with us. Our Premier contract had some extra hours on it, so we're going to burn them by getting MS-approval for our migration plan for Exchange 2000 -> Exchange 2003. This does, of course, also require a migration to Windows 2003 in some places, so we're dealing with that as well. The preliminary plan looks good, but we just need to know if there are any gotchas we've overlooked.

One thing that did arouse the ire of one of the other administrators is that there is a schema conflict! It seems Exchange2000 and AD2003 both want to use the same attribute slightly differently. There are some KB articles on dealing with it, but it was rather annoying to discover.

UPDATE: Our plan has been given the stamp of approval. We're going to kick off the Win2003 upgrades next Tuesday during our regular downtime window.

Macintosh & the cluster

Our local Mac-guru kludged up a very fragile script to allow our Mac-labs to map cluster drives. Almost anything will break it, and it broke this weekend. One of the nodes decided that it doesn't like talking AFP. This node happens to contain two thirds of the student user-directories. As it is FINALS WEEK this is causing... concern.

The question has come up again about how NW6.5 will work with our Mac stuff. I hope to help our guru try things out later today.

MyWeb

Okie dokie. The 'facstaff' myweb is temporarily being served out of Apache2. Let's see how things react. I'm guessing pretty good, but you never know.

UPDATE:
Oooo, nice new feature. Apache2 doesn't lock EXCLUSIVE READ the access_log file. This way I can view the file (or tail it) while running. With 1.3 this was hidden until such time as the log rotated and the file came out from under its exclusive lock. This actually makes some trouble-shooting easier.

Apache 2

NW6.5 provides me with my first real look at Apache2. I understand that there is quite a debate over which is better, good old Apache1.3 or the new stuff of Apache2. Some new stuff in here. One thing that I particularly like is the fact that Apache2 can effectively run in address spaces, something that 1.3 couldn't do stably.

My current project is to figure out how to port the myweb services (what this page is provided by) from 1.3 over to 2.0 in preparation for our eventual migration to NW6.5 later this summer (I'd like NW7, but our migration window is 8/15-9/10, no extensions). It has gone fairly well. Yesterday I determined that it is probably doable to run both the 1.3 services such as NetStorage and have MyWeb served out of 2.0 in a separate address space. Seeing if that is stable is today's job.

Yay! NetWare 6.5!

I finally managed to get NW6.5 to load in my VirtualPC session. I just had to select the "load unsupported drivers" option during the NW install. And lo, did my NIC get recognized. And there was much happiness.

Now if only the thing didn't run like molasses in a Minnesota January. But it IS running!

bash for netware?

http://forge.novell.com/modules/xfmod/project/?bash

Didn't know that. From an authority:

Last thing I heard is that Novell is porting more open source stuff to
NetWare and that for example they have ported bash to NetWare allowing
you to get a Unix like shell on NetWare kernels.