May 2004 Archives

History Lesson

The DS-Admins meeting was not held, but the few members who arrived exchanged stories of the past, specifically the collective experience of centralization of IT resources. Most organizations experienced the fun of having separate IT areas grow up (Accounting is a good bet; for Universities the Computer Science dept is another key driver) and eventually learn to talk to each other.

It was a fun chat. I learned some of the history of how the organization works around here, which is very useful to know. Historical context provides the basis for the decisions of today, and knowing it can help you avoid pitfalls.

Busy morning

I got a call early this morning from our head of faculty desktop support. He was trying to give a demo of WebDav through NetStorage and it wasn't working for him. Would I check into it? So I do.

Things don't look right. I don't use WebDav at all so I had to look up how to get Windows to do it. And what I found wasn't working. Grr. Some snooping finds that it was trying to get WebDav stuff at the /oneNet sublevel and not /oneNet/NetStorage like it was supposed to, and /oneNet wasn't giving anything. Bad.

Thus begins much snooping. In the process I learn that the proxy user set up by the NetStorage install has moved and this causes problems. Looking up how to fix THAT took quite the digging, I tell you. Found it, though. Silly Novell.

Then I learn the REAL way to get WebDav drives mapped (through My Network Places -> Create New Place) and lo, does it work everywhere. AAAAAAA! Did I waste 4 hours, or did I fix something somewhere? Hard to say, hard to say.

Then this afternoon I created a cluster resource for Zen workstation import. Two of 'em, in the end. Once I get the bugs worked out I'll have a round-robin DNS entry set up for a pair of import services to load balance. It'll be fun.
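For the curious, the rotation a round-robin DNS entry gives you can be sketched in a few lines of Python. This is only an illustration of the idea, not how the DNS server does it; the two addresses are made up, and round_robin is a name I invented for the example.

```python
from itertools import cycle, islice

# Hypothetical addresses for the two import services (not the real ones).
IMPORT_SERVICES = ["10.0.0.11", "10.0.0.12"]

def round_robin(addresses, requests):
    """Return the address each successive request would be handed."""
    return list(islice(cycle(addresses), requests))

# Five workstations importing in a row alternate between the two services.
assignments = round_robin(IMPORT_SERVICES, 5)
print(assignments)
```

With two services, each one ends up with roughly half of the imports, which is all the load balancing this needs.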

New Apache

Upgraded the Apache on the cluster from 1.3.29 to 1.3.31.

It Finally Happened

I got a call last night about 1:30. Student printing was reported to be down in one of the all-night computer labs. I know from experience that if one is down the rest likely are as well.

And they were.

The cluster node running NDPSM had abended, and recovery didn't go well. As usual. Same problem we've had for a while, in that the NDPS queues themselves are on a different volume than the one associated with the printing service. And when a cluster failover occurs, seven times out of ten the two services end up on different nodes and student printing fails until we can manually get the two services together.

Seeing as how I'd really rather not have more 1:30am calls, I took steps to move the volume. It is actually fairly simple to do....

SERVER: ndpsm /

And it'll magically migrate the queues to the specified volume! Whee! One of the reasons we hadn't done this earlier is a bad design decision back when the cluster went in. The print volumes that hold queues were each created with 500mb of space. In the past that's been just fine. But that didn't take into account either NDPS driver-storage or 14 page PDFs with nothing but high-res scanned in pages for data (which generates 150mb print files, which crashes printers and hangs the job on the server, causing the poor student to try to print to another server, same song second verse another 150 meg down the tubes, goes to a different computer lab....). Now that we have more slack space in the SAN, we're expanding those volumes to a nice, cozy 10gb.
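The arithmetic behind that is worth a quick look. A rough sketch, assuming 1 GB = 1024 MB and ignoring whatever the NDPS drivers and normal jobs take up (jobs_that_fit is just a name for this example):

```python
JOB_MB = 150               # one runaway print file, per the above
OLD_VOLUME_MB = 500        # original print volume size
NEW_VOLUME_MB = 10 * 1024  # 10 GB, assuming 1 GB = 1024 MB

def jobs_that_fit(volume_mb, job_mb=JOB_MB):
    """How many hung jobs of job_mb fit on an otherwise empty volume."""
    return volume_mb // job_mb

print(jobs_that_fit(OLD_VOLUME_MB))  # 3
print(jobs_that_fit(NEW_VOLUME_MB))  # 68
```

So on the old volumes, one student retrying a few times was enough to fill a queue volume all by themselves.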

Faculty side has been expanded this morning. Students will be tomorrow, once the SAN-disks finish re-striping the data.

UPS again

Yep, it's a NIC alright. The trick will be hacking into it to get the IP set up. Unfortunately, to get into it requires:
  • A serial connection
  • A specially pinned out cable
  • A machine close enough to the UPS to be connected by that cable
  • Non-production machine, at that
So it is thwarting me again. This is a perfect job for a laptop, but we don't have any of those floating around the office; they're all on long-term loan to people. My second choice, throwing an old Visual terminal on the cable, didn't work due to my unfamiliarity with that particular version of terminal. I'm going to end up having to haul my PC in there and do it while sitting on the floor.

Whine whine whine, moan moan moan.

UPS & auto shutdown

It seems that the Liebert UPS in our datacenter has a network card in it. This is a surprise to me. I was hunting for the fabled RS-232 connection to which I get to connect a UPS cable, and found this device. There it was, with the following info on it:
  • Text that read PWA, SNMP/MODEM
  • A DB-9 male connector
  • An RJ-23 connector
  • An RJ-45 connector
  • A sticker with a MAC address on it
To me that looks like a network card. If this is indeed a network card, then we may not need a server to act as the interface between the UPS and the server shutdown agents. Nice, as we don't have anything to really use in that sort of mission critical, but almost never really used, capacity.

The trick will be figuring out exactly what this card is, and how to use it. The Liebert site is useless, as this model is not referenced anywhere. Our model of UPS isn't up there anymore, though Google can find links to its manuals. This particular card doesn't look like the newer SNMP-management card. The manual says that one does exist for this model of UPS, but we are to reference the separate manual for configuration details.


Network problem resolved

It seems there was some mysterious multicast traffic on the core router network. We don't know what it was, but it was chatty. In the end putting in ACLs to limit the multicast traffic to what we know we need seems to have stemmed the flood and permitted normal traffic.
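I don't have the actual ACLs, but the logic of "only the multicast we know we need" is easy to sketch in Python. The allow-list below is an assumption for illustration (224.0.0.0/24 is the local network control block); the real list belongs to the telecom crew, and permitted is a name I made up.

```python
import ipaddress

# Hypothetical allow-list of multicast ranges; not the real ACL contents.
ALLOWED_GROUPS = [ipaddress.ip_network("224.0.0.0/24")]

def permitted(dst):
    """Pass unicast untouched; pass multicast only if it is on the allow-list."""
    addr = ipaddress.ip_address(dst)
    if not addr.is_multicast:
        return True
    return any(addr in net for net in ALLOWED_GROUPS)

print(permitted("192.168.1.10"))  # ordinary unicast traffic
print(permitted("224.0.0.5"))     # multicast inside the allowed range
print(permitted("239.1.2.3"))     # multicast outside it
```

Anything multicast that isn't on the list gets dropped, which is how you stem a flood from a source you haven't identified yet.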

Personally, I'm a bit puzzled by our internet bandwidth chart. Our outbound traffic on it since service was 'restored' at 4:30pm yesterday isn't normal. If it keeps up tonight I'll mention it to the one telecom person who is here.

And that isn't very much. Poor guy has had something like six hours of sleep since things broke out Tuesday morning. Nothing like 36 hour troubleshoots to keep you awake. Things got so bad that we had to recall a technician who was on vacation as he had much needed skills. We thank him for it.

More obit whacking

It seems a dead server over in CAS was the cause of my most recent obituary troubles. Yesterday I noticed a quartet of unprocessed obituaries that had hung around for toooo loooong. Thwapping the system didn't seem to fix it.

Until I looked really closely at the External Reference check log. It seems that this particular move was waiting to notify certain servers that held backlinks of the move. One of the servers was incommunicado. Due to the network excitement of the past two days I didn't learn until this afternoon that the server in question has died the bad death and would I please remove it from the tree.

So I do. I check the obits, and we're down to six. A few gentle taps, and they're gone on all the replica servers. Ahhhhhh.

Network problems

There was something screwy relating to multicast in the router core. According to our internet bandwidth chart we're back to normal. From an overheard discussion, I gather that the traffic was originating from a pair of routers and the telecom crew was going to take Steps to isolate the buildings connected to the two routers. Things may have gone offline already, which would account for our speed-up.

I hope to learn exactly what went wrong.

Networking problems

The core router network has an Issue relating to CPU loads. This is causing packet loss. Internet speeds are low, so blogging by way of external utilities such as Blogger is difficult.

NW6SP4 is a good service pack

Novell just posted a bunch of beta service-packs, including the beta SP5 for Netware 6. To me, that seems pretty quick. To the paranoid, this could seem to indicate a problem with the latest service packs if they're going to release one this quickly. But, let's take a look at the public patches posted since SP4 released:

Go to here for the source of this list.
  • Packetscan - NetWare packet capture tool: A utility, not a patch. Improves on a previous utility that allows packet captures at the netware console. Really nice, even if it doesn't include the capability to filter yet.
  • PKI Diag Utility 2: Another utility, not a patch. Improves on the previous PKIDIAG utility.
  • iPrint Client 3.05 with Silent Install option: An improved installer for iPrint that includes the ability to install silently. While not a patch for something broken, it is a patch for a most-requested-feature.
  • Printer Agent Conversion Utility v2.1: Another utility, not a patch. Improves the ability of administrators to handle multiple NDPS printers and a migration to iPrint.
  • TCP update for NetWare 6: The first actual PATCH so far.
  • Remote Debugging Tools: Another utility, mostly to assist Novell tech-support and developers.
  • BorderManager ICSA Compliance Kit v5.0b: While a patch, it's a patch for a product that doesn't ship with Netware. In fact, using this patch on a non-BorderManager server can greatly reduce the functionality of your Netware.
So of the public updates released since SP4, only ONE actual patch is in the list. I know from watching things that the TCP stack was updated on all netware platforms at that time.

So let's go over to the Beta side of the updates. That list can be found here. The magic date is Feb 19, 2004, which is when SP4 hit public. But it was in beta for a while, so I'm going to assign a code-freeze date of Jan 1, 2004. With that in mind, I count 6 actual patches, two of which are backup-system related. All in all, a very minor number.

To my reading, this tells me that SP4 is a very good patch. It didn't break much, and SP5 shouldn't be the kind of service-pack-fix that NW41SP8 was for NW41SP7.

I downloaded the Exchange connector for Evolution today. It wasn't up Friday, but it is this morning. It took some fiddling, but I was able to get connected.

Screen Capture

I wasn't able to send mail, but I'm guessing that's just a config problem I haven't solved quite yet. But it did pull down my entire calendar, and was able to access the global address list. Even if it was a lot slow on that count. So far, this looks pretty neeto.

In other news, I tripped across this link:

In short, Sometime Soon, Nterprise Linux Services will be rereleased to include NCP as a transport option.

In the second half of 2004, Novell will again revise Nterprise Linux Services by adding support for Novell's NetWare Core Protocol (NCP). NetWare file servers use NCP to process workstation requests and handle file and directory access.

The other task customers will be able to perform with this release is to bring up a Linux server and mount a newer Novell Storage Services (NSS) or NCP volume on it, so existing file volumes can run on Linux. NSS was introduced with NetWare 5 in 1998.

"Rather than having to migrate all your data across the wire, you could simply move the volumes from one server to another," Hutchinson says.

Barring the usual risk of losing your trustees, of course. But you could shift things around like that! In a SAN environment, this is pretty neat. It also makes clustering that much more doable. Our 6 node cluster could make use of this, should we decide it was needed.

If we don't get to our NW6.5 upgrades this summer, we'll be going to NW7 next summer. It all depends on scheduling this year.

Blogger has redone their stuff quite significantly. This has required that I readjust my posting template. The colors aren't quite there yet, alas, but I do get comments now. Woo!

The ongoing search for an anti-spam vendor has run into an all too familiar road block. We have two vendors we're seriously considering: Sophos and Cyphertrust. We had been looking at Postini, but their pricing was so far out of acceptable range they didn't make the second round of looking. They may have company.

The Sophos presentation yesterday to the higher up decision makers went rather well (except for the presentation PC wanting to go into standby mode after 20 minutes of no mouse movement, but I digress). The product is very well featured, the spam handling nicely complex, and the end-user experience looks to be fairly simple. Then came the numbers. I'm not quoting numbers, but the price had two big things wrong with it:
  1. The up-front cost was not much bigger than the yearly cost
  2. The yearly cost was twice what we pay already for Exchange AV and desktop AV for faculty/staff
Cyphertrust's demo was this morning, and that presentation wasn't nearly as well put together. I had been in on all the earlier wrangling involving technical details, and I personally liked the Cyphertrust implementation of anti-spam over the Sophos one. Unfortunately, the use of technology (Webex) was not well handled by the presenters. One downside is that a couple of very key make-or-break features we need aren't in the product quite yet, but will be when they rev in June or July. The pricing on this product was much more to our liking, but still out of the projected budget.

Right now the decisions are with the higher-ups to determine if we get to go through the dickering effort to crank down the costs of some of these systems, or just keep what we have and see what things look like next year. Really, for the prices involved we can have a half FTE do nothing but spam administration and that opens the door for some of these open source solutions we've been eying.

I also learned something during this process. As a publicly funded University subject to the financial whims of our State legislature for both tuition increases and direct subsidies, big up-front costs are far easier to get approved than yearly costs. In other words, the subscription model that software vendors are really fond of these days plays silly buggers with our ability to actually buy what we need. It comes down to the fact that big up-front costs can be dealt with by taking surpluses from other areas that haven't spent their quota, whereas subscription stuff requires us to get an indefinite budget increase approved.

Today I learned that Novell's Open Enterprise Server (a.k.a. Netware 7) will have NSS volumes for both the Netware kernel and the Linux kernel. This is cool, since NSS allows expansion of the volume and I believe ext3 doesn't.

We have a weird thing going on. We use a particularly nice tool called "NwSysMon" to track the 'stoplight' status of our netware servers. One of our servers has been flapping in this utility. It'll show as 'down' for 30 seconds and come back green, but the server itself never actually went down.

I managed to catch the event on a network capture, and I got some info, but not lots. It seems that the server responds to the query just fine, but my workstation, for reasons unknown, resets the connection (RST) instead of closing it with a FIN. No idea why that's happening, but it is.
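For anyone reading a capture like this by hand, the difference comes down to one bit in the TCP flags byte. A minimal Python sketch using the standard flag bit values; the helper names are mine and have nothing to do with NwSysMon:

```python
# Standard TCP flag bits (low six bits of the flags byte).
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]

def flag_names(flags_byte):
    """Decode a TCP flags byte into the names of the bits that are set."""
    return [name for bit, name in TCP_FLAGS if flags_byte & bit]

def how_closed(flags_byte):
    """Classify a segment as a reset, a clean close, or neither."""
    names = flag_names(flags_byte)
    if "RST" in names:
        return "reset"
    if "FIN" in names:
        return "clean close"
    return "still open"

print(flag_names(0x14), how_closed(0x14))  # RST+ACK, what my workstation sends
print(flag_names(0x11), how_closed(0x11))  # FIN+ACK, what I expected to see
```

A monitoring tool that sees its conversation reset rather than closed has a decent excuse to paint the light red for a polling cycle.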


More obit whacking. Turned out that there was an INHIBIT_MOVE clogging up the works. The following procedure got rid of the damned thing.

On each replica-containing server
SET DSTRACE=!D [disables DS-sync, but DS stays up]
SET DSTRACE=+obit [turns on obit tracking, used later]
DSREPAIR -OT -A [go into dsrepair]
Advanced -> Repair Databases -> Accept defaults
Wait until the Master is done

On the Master replica server:
SET DSTRACE=!E [enables DS-sync again, and lets it do what needs doing]

At one minute intervals, SET DSTRACE=!E on the rest of the servers. Watch the obits flow.
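The staggered re-enable at the end is the only fiddly part to keep track of. As a sketch, here is the order of operations written out in Python. The server names are invented, reenable_schedule is just a name for the example, and I'm assuming the first non-master server gets its command one interval after the master.

```python
ENABLE_SYNC = "SET DSTRACE=!E"  # the console command from the procedure above

def reenable_schedule(master, others, interval_s=60):
    """Return (seconds_from_start, server, command) tuples, master first."""
    steps = [(0, master, ENABLE_SYNC)]
    for i, server in enumerate(others, start=1):
        steps.append((i * interval_s, server, ENABLE_SYNC))
    return steps

# Hypothetical replica ring: one master and two other replica servers.
for offset, server, command in reenable_schedule("MASTER1", ["REPLICA2", "REPLICA3"]):
    print(f"t+{offset:>3}s  {server}: {command}")
```

It only prints a checklist; the commands still get typed at each server console by hand.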

Not so much certificates, as obituaries. Again. Time to go on an Obit hunt.

The certificates are thwarting me. One of our backup servers is having issues with its SSL certificates. They aren't used for much but remote admin, but I'd like to get them working correctly. It is not working today. The obits from the old cert are taking forever to clear, and that complicates making new ones.

The Boss is back from a week's vacation. It has been fun watching him catch up on his e-mail. This was fun with my boss at OldJob, as she'd send e-mail requesting status updates to things she hadn't gotten to yet. This boss isn't so bad, but it is still interesting.

Novell has released a white paper detailing the issues facing backup technologies. It is a very nice read, I highly recommend it.

You can find it here

The Sasser worm has been felt here, though not to the extremes that a friend of mine's University has. It could have been a lot worse. I'm not sure how our desktop people handle patches, but I suspect that using Windows Update and automatic downloading has a lot to do with it. I'm very sure we do not make use of Software Update Services, as we don't have a server that does that.

A friend of mine recently posted a review of a book she just read regarding the US bombing of Japan during WWII. This particular book focused on the nukes we dropped on them, and the generally ignored aftermath of that particular action. She was particularly moved by this book as it really highlighted what the after effects of a nuke are on the survivors. This made what had been an intellectual thing into something she could really identify with. This book did NOT go into depth on whether or not dropping the bomb as we did was justified in the long run.

This book probably would not affect me nearly as much as it did her. I spent the cold-war years living in the largest city of my state, and within 5 miles of a major international airport and Air National Guard base. Two guaranteed targets should the evil communists ever launch their nukes. For the nukes aimed at Downtown, we lived in the "dead in a month" radius. However, we were in the "Dead in a few days" radius for the nukes aimed at the airport. This particular Sword of Damocles hung over our heads until the Soviet empire fell.

I was very much aware of the kinds of things that level of radiation can do to a body. So much so, that the half plan our family had was to go out on the front lawn and watch the light show instead of cowering in the basement. A kind of pointless futility that we were screwed no matter which way the bombs fell.

Now that I work at a larger higher educational institution sited on a coast, my targeting priority has diminished somewhat. What lands here probably won't be as big as what would land on the old homestead, but we'd get almost no notice. Should IT happen while I was at home, the intervening earth would protect us from the radiation flash, but fallout and fire would do quite a bit of damage. Call it, dead of leukemia in 10 years.

And people wonder why we cheered so much when the cold war ended.