September 2004 Archives

Quick updates & vacation

Student printing has been bouncing around for a bit. We've had some abends relating to the LPR polling process in NDPSM. I'd had enough, so I called it in to Novell. Turns out there is a patch for situations involving PCounter and NDPS pools, the latter being a new thing in our environment. I have high hopes this'll fix our problems. Printing volumes are back to normal, so we'll hear right quick if things go pear-shaped.

Myweb is holding up on both the faculty and student sides. The student side has been running on Apache2 since early yesterday with only a single outage, which was due to a rebooting server stealing the MyWeb.Students.wwu.edu IP address for a bit until I put it back where it needed to be.

And I'm heading out on vacation (probably should update my voice-mail). Back 10/5. Vanishingly unlikely that there will be any posts between then and now.

MyWeb & Students

I checked Google this morning to see what it could dig up for pages on myweb.students.wwu.edu. And found stuff!

http://www.google.com/search?hl=en&lr=&ie=UTF-8&q=+site%3Amyweb.students.wwu.edu+&btnG=Search

About 25 pages of stuff, in fact. Most of them involve teaching, so I'm guessing a class was held once that had these put up. Others probably could be found over on www.ac.wwu.edu, but I'm not going to dig that far.

MyWeb yourweb

Now running (on the Faculty side anyway, soon to be joined by the Student side) on Apache 2.0.51, mod_edir 1.0.8-dev! Student side should be running the new rev by Wednesday if all goes well *cracks knuckles*. That will be nice, since Apache2 is much nicer to play around in on Netware. Really really.

Found out that Facsrv3 still had SERVERID in its autoexec. Icky. Fixed that.

Move-in has been quiet for us, but I understand ResTek has been slammed. Stiff upper lip and all that.

Myweb side-effects

Astute readers may have noticed an outage this morning. Apache 2.0.51 came out, and I had attempted to upgrade. Unfortunately, the server needed a reboot in order to allow mod_edir to connect to remote servers. Odd, that.

In other news, I got a hold of Novell yesterday on the incident I've had open with mod_edir (and the underlying LibC stuff that mod_edir illuminates as broken). It seems the referral thing I spoke of yesterday has caused certain people to go "hmm" in the thoughtful way. We'll see what comes of this.

Whoa... cracked the myweb problem

I think I managed to crack this puppy! I managed to get some answers in one of the developer forums. Thanks to mod_edir being an open-source project, I was able to track down which function call was giving me the grief it was. NXCreatePathContext for the curious.

I won't go into how I got to this point, but it seems that cluster nodes with SERVERID set don't work well. When I comment out SERVERID from the autoexec.ncf file it seems to work like the Faculty side does.

WHY this works is a touch of a mystery, but I can see where things are getting confused. With SERVERID not set, part of the connect-to-remote-server process involves resolving the cluster-node name:
   Result Flags: 0x00000040

Entry ID: 0x000084c8
Referral Records: 3
NDS Referral Record #1
Number of Addresses in Referral - 2
(TCP Protocol)
Port: 524
Address Referral: 140.160.5.144 (140.160.5.144)
(UDP Protocol)
Port: 524
Address Referral: 140.160.5.144 (140.160.5.144)
NDS Referral Record #2
Number of Addresses in Referral - 2
(TCP Protocol)
Port: 524
Address Referral: 140.160.247.29 (140.160.247.29)
(UDP Protocol)
Port: 524
Address Referral: 140.160.247.29 (140.160.247.29)
NDS Referral Record #3
Number of Addresses in Referral - 2
(TCP Protocol)
Port: 524
Address Referral: 140.160.247.27 (140.160.247.27)
(UDP Protocol)
Port: 524
Address Referral: 140.160.247.27 (140.160.247.27)
And if you compare it to the result you get WITH SERVERID set:
    Result Flags: 0x00000040

Entry ID: 0x000084c8
Referral Records: 3
NDS Referral Record #1
Number of Addresses in Referral - 3
(IPX Protocol)
Network: 0x00370001 (00370001)
Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
Socket: 0x0451
(TCP Protocol)
Port: 524
Address Referral: 140.160.5.144 (140.160.5.144)
(UDP Protocol)
Port: 524
Address Referral: 140.160.5.144 (140.160.5.144)
NDS Referral Record #2
Number of Addresses in Referral - 3
(IPX Protocol)
Network: 0x00008012 (00008012)
Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
Socket: 0x0451
(TCP Protocol)
Port: 524
Address Referral: 140.160.247.29 (140.160.247.29)
(UDP Protocol)
Port: 524
Address Referral: 140.160.247.29 (140.160.247.29)
NDS Referral Record #3
Number of Addresses in Referral - 3
(IPX Protocol)
Network: 0x00008013 (00008013)
Node: 00:00:00:00:00:01 (00:00:00_00:00:01)
Socket: 0x0451
(TCP Protocol)
Port: 524
Address Referral: 140.160.247.27 (140.160.247.27)
(UDP Protocol)
Port: 524
Address Referral: 140.160.247.27 (140.160.247.27)
Note the fact that the IPX address is first. I'm thinking that this confuses the querying server (the web-server in this case) somehow. The next action after the resolve is the make-or-break point. When it works, the querying server follows the referral and translates the entryID it got above into a fully-qualified name of the cluster-node. When it fails, the querying server queries the cluster-node (on its cluster IP for that particular resource) for the information and gets told -601 not-found.

Adding IPX to the querying server didn't help, nor did putting it also on the cluster node (through SCMD, since IPX isn't enabled on the cluster subnet). But forcing it to go PURE IP, it seems to work now.
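To make the "IPX comes first" hunch concrete, here is a minimal Python sketch using referral record #1 from the two traces above. It is only an illustration of the hypothesis, not a description of how the NetWare client actually picks an address; the `Address` type, both function names, and the way the IPX network/node/socket are joined into one string are made up for the example.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    protocol: str  # "IPX", "TCP" or "UDP", as labelled in the trace
    value: str     # illustrative string form of the address


# Referral record #1, transcribed from the two traces above.
WITHOUT_SERVERID = [
    Address("TCP", "140.160.5.144:524"),
    Address("UDP", "140.160.5.144:524"),
]
WITH_SERVERID = [
    Address("IPX", "00370001:000000000001:0451"),
    Address("TCP", "140.160.5.144:524"),
    Address("UDP", "140.160.5.144:524"),
]


def first_listed(addresses: list[Address]) -> Optional[Address]:
    """Naive choice: take whatever the referral lists first."""
    return addresses[0] if addresses else None


def first_ip(addresses: list[Address]) -> Optional[Address]:
    """Pure-IP choice: skip anything that is not TCP or UDP."""
    for addr in addresses:
        if addr.protocol in ("TCP", "UDP"):
            return addr
    return None


print(first_listed(WITHOUT_SERVERID).protocol)  # TCP
print(first_listed(WITH_SERVERID).protocol)     # IPX
print(first_ip(WITH_SERVERID).protocol)         # TCP
```

If something in the stack behaves like `first_listed`, a server with SERVERID set hands it an address it cannot use on an IP-only subnet, which would fit the symptoms. That is still a guess.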

Unfortunately, getting rid of SERVERID will break our backup agent. Fortunately(?) our network is slow enough that the improvement from using the agent is very slight. If/when we get gigE in, then we may have to revisit this issue.

September Info Security ad-counts

I got my "information security" mag today. As with my August post, here are the ad counts.


                             FUD  We-are-nifty  Hard-made-easy  Regulatory
Security Service Management    1             -               5           -
Widgets                        8             3               6           1
Training                       -             1               4           -
In-house ads                   2             -               2           -
total ads                     33




The big movers this month are just plain more ads, plus an increase of widgets. This is a subjective definition of 'widget' at work here that may be responsible for the change in allocation. A widget, at least this month, is a software or hardware package that is purchased to do whatever. Security Service Management is hiring another firm to manage your security for you (which may include widgets as part of the deal). Some widgets require a subscription for updated definition files for whatever. I do not consider these "security service management" since the management is automated.

Other than that, the spreads look similar to what they did last month. FUD still moves widgets more than the ease-of-use factor. This would change if the patch-management widgets would stop hyping panic over patching and start hyping the ease-of-use angle, something that might be more effective in luring harried admins.
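For anyone who wants to check the arithmetic, here is a small Python tally. It assumes the counts above line up with the columns in the order listed (FUD, We-are-nifty, Hard-made-easy, Regulatory) and that a blank cell means zero ads; the dictionary layout and the `column_total` helper are just for illustration.

```python
# Ad counts per product category and pitch style; blank cells are omitted.
ADS = {
    "Security Service Management": {"FUD": 1, "Hard-made-easy": 5},
    "Widgets": {"FUD": 8, "We-are-nifty": 3, "Hard-made-easy": 6, "Regulatory": 1},
    "Training": {"We-are-nifty": 1, "Hard-made-easy": 4},
    "In-house ads": {"FUD": 2, "Hard-made-easy": 2},
}
COLUMNS = ["FUD", "We-are-nifty", "Hard-made-easy", "Regulatory"]


def column_total(column):
    """Sum one pitch style across every product category."""
    return sum(row.get(column, 0) for row in ADS.values())


grand_total = sum(sum(row.values()) for row in ADS.values())
print("total ads:", grand_total)  # total ads: 33

for column in COLUMNS:
    print(column, column_total(column))

# Within widgets alone, FUD edges out ease-of-use.
print(ADS["Widgets"]["FUD"] > ADS["Widgets"]["Hard-made-easy"])  # True
```

Read this way the cells add up to the stated total of 33, which is some comfort that the columns are lined up right.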

I may have to include a new category next month, "other" in the product department. There are a few ads that don't fit the three buckets, notably a head-hunter firm. We'll see next week.

Exchange2003 & IMAP

I managed to get IMAP running for the Exch2003 folks today! Not bad for a primarily Novell guy who hasn't had any Exchange training. I even managed to do it with SSL and everything.

Well, drat

Upgraded the student half of the cluster to eDir 8.7.3.2 and it didn't change how the myweb thing works. I had hopes. The problem is that the server with the home directory on it can't answer the simple question "who are you" at a key phase of setting up the connection to it.

Still no word from Novell.

More with the nice

I just found the online staff directory that is available outside of our subnet. Can you say "spam harvester"? I knew you could.

Well, that's nice

It seems that the deadline we thought we had for Exchange isn't it. Concerns have been raised that the new environment hadn't been adequately tested yet, especially on the spam front. So we're pushing the deadline a bit. We're OK with this, since we're ready to roll out now and some users have already been moved. We can wait a week.

School starts next week, with move-in starting Friday. Which means that I have three days to do Stuff to the student servers before I have to start scheduling migrations in the middle of the night.

Also, it seems that NW6SP4 DID introduce the concept of NDPS printer-pools to NetWare 6! The functionality I noticed in iManager was verified by our computer lab guy last week. I'm way happy this worked out this way! We can get rid of the old queues we were using for printer pooling before, and jobs submitted to pooled printers will NOT be lost during a failover. Nifty.

Happy news on the Exchange front

GroupShield is in, SpamKiller is running, public folders replicated. We're good for live testing!

And the backup agents are in! It is very, very nice to see a backup roll in at 650 megs/minute in real time. Woo! Now once we get the gig ports in, we can really fly!
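A quick back-of-the-envelope conversion shows why the gig ports matter. This sketch assumes "megs" means megabytes of 10^6 bytes and compares against a 100 Mbit/s link, which is an assumption for illustration rather than a measured figure; the function name is made up.

```python
def mb_per_min_to_mbit_per_s(mb_per_min: float) -> float:
    """Convert megabytes/minute to megabits/second (1 MB = 8 Mbit)."""
    return mb_per_min * 8 / 60


rate = mb_per_min_to_mbit_per_s(650)
print(f"{rate:.1f} Mbit/s")                      # 86.7 Mbit/s
print(f"{rate / 100:.0%} of a 100 Mbit/s link")  # 87% of a 100 Mbit/s link
```

Under those assumptions the backup stream alone is close to filling a 100 Mbit/s pipe, so there is little headroom left until faster ports arrive.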

Code update

A development version of mod_edir 1.0.8 was put out here. Quick testing shows that the bug I discovered has been fixed in this release! I still have a problem over on the student side with the libc issue, and there is still no progress on that. But at least this particular server will stop showing 404's on the hour.
Ahem.

Parallel processing: Coming to a desktop near you

My first reaction to that article title was admittedly, "what are you talking about? We've had it for years!"

Ever since the Pentium moved to a pipelined instruction processor, it has been parallel in some form. The latest chips coming out of Intel are all increasingly parallel in nature. The Pentium Pro introduced multi-CPU systems to the WinTel world, and workstations with multiple processors have been available since a few minutes after the PPro launch. We have some PPro servers that we are just now happily retiring. It's been a while.
The company [Intel] will focus on parallel processing with future products, Otellini said. This will include multicore processors, virtualization technology and a continuation of Intel's hyperthreading technology.
That makes more sense, but it hardly is "coming soon". Perhaps it's my inner curmudgeon coming out, but the modern CPU is w-a-y more parallel than our old 486's were.
The move to dual-core processors will proceed much faster for notebook and server processors, Otellini said. More than 75% of Intel's 2006 shipments in those categories will be dual-core chips, with just under half of all desktop chips in that time frame containing two cores, he said.
THIS is much more meaningful. What this says is that ye olde $1000 PC will have two execution cores installed as of late 2006. Dual core chips take a big step beyond the Hyperthreading (yet more parallelism) that already exists in the P4 line. A dual core chip is simply two CPUs on the same piece of silicon. AMD recently announced a dual core chip with the same socket pinout as an existing chip, and all it took was a BIOS tweak (not revision) to make it work.

So where can modern PCs take advantage of this stuff? Well, they already are trying. Hyperthreading presents logical CPUs to the operating system, which itself is already MP-aware. Processes get allocated to a CPU for execution, and the more execution queues you have the faster things theoretically run. Hyperthreading as it exists in the P4 is two execution queues presenting to a single execution core. When CPU 1:0 is paused waiting on a fetch from disk-cache, CPU 1:1 can run as far as it can until it too hits a WAIT. For things like pure computation in memory, such as that found in a big spreadsheet, Hyperthreading doesn't give you much at all. For things like lots of disk I/O, Hyperthreading also doesn't give you much at all (which is a prime reason hyperthreading is largely useless on fileservers) due to the fundamental non-parallelism of the I/O channel. Where Hyperthreading really helps is when multiple separate processes with unique access patterns do Stuff to the system; there it improves CPU efficiency.
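The argument above can be played out in a toy Python model. This is emphatically not a model of a real P4: it assumes a workload is just a string of one-cycle steps, "C" for compute and "W" for waiting on I/O, that the core retires at most one "C" per cycle, and that the I/O channel services at most one "W" per cycle. Both function names are invented for the sketch.

```python
def cycles_one_queue(tasks):
    """One execution queue: tasks run back to back, stalls and all."""
    return sum(len(task) for task in tasks)


def cycles_two_queues(task_a, task_b):
    """Two queues sharing one core ("C") and one I/O channel ("W").

    Each cycle the core retires at most one C step and the I/O channel
    services at most one W step; a queue whose next step needs a
    resource already claimed this cycle simply stalls.
    """
    tasks = (task_a, task_b)
    pos = [0, 0]
    cycles = 0
    turn = 0  # which queue gets first pick this cycle
    while pos[0] < len(task_a) or pos[1] < len(task_b):
        cycles += 1
        claimed = set()
        for q in (turn, 1 - turn):
            if pos[q] < len(tasks[q]):
                step = tasks[q][pos[q]]
                if step not in claimed:
                    claimed.add(step)
                    pos[q] += 1
        turn = 1 - turn  # alternate priority so neither queue starves
    return cycles


workloads = {
    "pure compute": ("CCCC", "CCCC"),
    "pure I/O": ("WWWW", "WWWW"),
    "mixed patterns": ("CCWW", "WWCC"),
}
for name, (a, b) in workloads.items():
    print(name, cycles_one_queue([a, b]), cycles_two_queues(a, b))
# pure compute 8 8
# pure I/O 8 8
# mixed patterns 8 4
```

In this toy, two queues buy nothing when both tasks fight over the same resource, and only pay off when their access patterns differ enough to interleave.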

Dual core with hyperthreading (and pipelining, and predictive fetch, and multiple layers of caching) allows very efficient use of the execution cores for very heterogeneous accesses. The jury is still out on how well Windows (and it'll be Longhorn before all this new stuff comes out of Intel) can handle the new architecture and what kind of performance boosts it'll get.

This isn't "parallel processing coming soon," it is much more, "improvements to parallel processing coming to desktops." Yeesh.

Spam statistics

One of the most frequently quoted stats regarding the percentage of mail that is spam comes from MessageLabs. You too can find these stats!

http://www.messagelabs.com/emailthreats/default.asp

Instability, update

Yep, things are misbehaving. This particular blog gets a lot of search-engine traffic so regular readers (hi!) don't run into 404 much. Other users of this particular server aren't so lucky. I'm not getting much traction with the bug report, so perhaps I'll have to get more vocal about it. We'll see.

In other news, student side is still completely borked. NO idea why that stuff is failing. Same exact error as before the LibC update, with the same exact resolution (put a replica on it). Without any idea as to how the timing issue manifests there in the depths of libc I can't even theorize if any hardware tweaking might break the odds in my favor. Ah well.

Instability, option 1

It seems mod_edir is having possible issues relating to the caching of identities in the back end. When you go, say, here it takes "riedesg" and caches the location of where the user-directory can be found. Right now it seems to be throwing a 404 the very first time the resource is accessed and is OK (200) afterwards. At least until the entry ages long enough to be purged from the internal cache, at which point the cycle starts over again.

I think.

Since mod_edir is an open-source project I'll be downloading the latest source and trying to trace down where in the code the error is occurring. If a miracle strikes, I might be able to identify where it could be fixed!

UPDATE:
On parsing the debug output of this here module, I've found the flow goes something like this:

[1st connection attempt]
- Username is checked
- It comes back as good
- Check cache for existing entry
- No entry, so query LDAP for userdir
- Gets it, adds to cache
- Craps out on "invalid home directory"

[2nd connection attempt]
- Username is checked
- It comes back as good
- Check cache for existing entry
- Finds entry
- Checks cache for entry for the volume the directory is on
- No entry, so it creates a connection to the volume, adds info to cache
- Connects to resource
- Returns data to requester

[3+ connection attempt, up to the cache-expire time]
- Username is checked
- It comes back as good
- Check cache for existing entry
- Finds entry
- Checks cache for entry for the volume the directory is on
- Finds entry
- Connects to resource
- Returns data to requester

Once the cache-expire time rolls over, timed from the first access not last access, the process starts all over again. Also, this is the request for the VOLUME not the USER in this case. So for heavily used resources (of which I have none so far) one user per volume per cache-expire period will get an unexpected 404.
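The cycle above can be replayed with a toy Python model. This is a sketch of the observed behaviour only, not mod_edir's actual code: the class and function names, the 60-second expiry, and the volume and path are all made up, and `directory` is a plain dict standing in for the LDAP lookup. It follows a single user on a single volume.

```python
from dataclasses import dataclass, field


@dataclass
class InsertTimedCache:
    """Cache whose entries expire a fixed time after insertion.

    Reads do not refresh the timestamp, so expiry is timed from the
    first access, not the last one.
    """
    ttl: float
    _items: dict = field(default_factory=dict)

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if now - inserted_at >= self.ttl:
            del self._items[key]  # aged out; the next lookup is a miss
            return None
        return value

    def put(self, key, value, now):
        self._items[key] = (value, now)


def handle_request(user, directory, user_cache, volume_cache, now):
    """Replay the traced flow and return the HTTP status."""
    if user not in directory:  # "Username is checked"
        return 404
    home = user_cache.get(user, now)
    if home is None:
        # Miss: look up the userdir and cache it, then hit the
        # observed "invalid home directory" failure.
        user_cache.put(user, directory[user], now)
        return 404
    volume, _path = home
    if volume_cache.get(volume, now) is None:
        # First use of this volume: connect and cache the connection.
        volume_cache.put(volume, f"conn:{volume}", now)
    return 200


directory = {"riedesg": ("VOL1", "/home/riedesg")}  # hypothetical entry
users = InsertTimedCache(ttl=60)
volumes = InsertTimedCache(ttl=60)
statuses = [
    handle_request("riedesg", directory, users, volumes, now=t)
    for t in (0, 1, 2, 59, 60, 61)
]
print(statuses)  # [404, 200, 200, 200, 404, 200]
```

Note that the hit at 59 seconds does nothing to keep the entry alive, so the 404 comes back at 60 seconds no matter how busy the resource is.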

Webserver instability

The web-server that serves this blog is being worked on. You may get 404 errors once in a while as I figure out why the heck things are working the way they are. More info later once I figure it out.

Updated SMS data

For those of you out there playing with TSAFS and such, you will be pleased to know that new TSAFS modules have been posted to developer.novell.com. I think there may be a new TSATEST in there somewhere, but I haven't dug it out yet.

Another thing to note:

TSATEST's "/C" parameter...
"Specifies a value to be used in the scanType field of the job structure when creating a job. This option should only be used after referring to the SMS NDK documentation for appropriate values."
Has some revised data in it! There is a revised readme that gives some examples, but not the actual data itself. That can be found here. If you are like me, the thing you wanted to know was how to simulate an incremental backup using archive bits. That's simple!

TSATEST /S=Yoda /V=DataVol: /C=2 /U=.God.yourorg /P=passwordintheclear
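For anyone fuzzy on what an archive-bit incremental means, here is a minimal Python sketch of the general idea. It says nothing about what TSATEST or the TSA does internally; the `FileEntry` type, the function name, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    archive: bool = True  # set whenever the file is created or modified


def incremental_backup(files):
    """Return the names of files needing backup and clear their archive bit."""
    picked = [f for f in files if f.archive]
    for f in picked:
        f.archive = False  # mark as backed up
    return [f.name for f in picked]


files = [FileEntry("report.doc"), FileEntry("grades.xls")]
print(incremental_backup(files))  # ['report.doc', 'grades.xls']

files[1].archive = True           # grades.xls gets modified
print(incremental_backup(files))  # ['grades.xls']
print(incremental_backup(files))  # []
```

Only files touched since the last pass get picked up, which is why an incremental run moves so much less data than a full one.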

The new TSAFS includes the ability to use the /nocachingmode and /cachememorythreshold options, which the TSA15 update doesn't allow. In our case it won't help lots since we're binding up on network I/O not file-server stuff. But still, nice to know it is there.