February 2005 Archives

More stats!

For the week of 2/21/05-2/27/05, 76.01% of traffic to myweb.students.wwu.edu was from Internet Explorer. 10.44% of the traffic identified as Mozilla (includes Firefox). MSNBOT was the third highest browser, at 4.87% of traffic.

On the myweb.facstaff.wwu.edu side, it is very different. IE traffic was 37.79% of traffic, and BigBrother (our Web monitor) was 34.28% of traffic. Mozilla came in at 9.72%. There was a LOT of RSS-reader traffic down in the sub 1.0% range. Hi!

Linux hacking from way back

One of my proudest hacks came when I upgraded my old college computer (that 486 I talked about earlier) with a 1.2GB drive. This was back in the day when anything over 520MB wasn't visible to the computer without trickery. When I attempted to install Linux to it, I ran smack into that trickery and couldn't get it to work.

You see, trickery in this era depended entirely on the controller being used to get that high. In my case I picked up the drive and a VLB-based IDE controller to drive it quickly. Therein lies the rub. The DOS driver for this board wasn't the most configurable thing on the planet, and I had to poke at it a lot to get the partitioning I wanted.

Sadly, when it came to the Linux install, I was getting different return data when I ran FDISK from DOS, fdisk from Linux, and the card's partition utility. I'd partition how I wanted with Linux's fdisk, reboot, and DOS wouldn't show any signs that I'd done anything. Somewhere, somehow, the partition table wasn't getting updated.

So out I trotted a handy DOS toolkit known as "PC Tools" (now defunct, done in by Norton) that had some truly wonderful disk-hacking tools. One of 'em was a handy utility called 'disk edit' that let you hex-edit bits of the hard drive: things like FAT tables, boot sectors, and most importantly, partition tables. With this tool I was able to discover that the various FDISK utilities were indeed changing the data in the partition table, but for reasons unknown those changes weren't taking effect. My theory was that it involved the trickery needed to make a 1.2GB drive visible. Many hours later, I noticed a data-set two sectors over from the partition table that looked exactly like a partition table: in the wrong spot, but holding the data the machine was actually using.

AHAH!

I hand-edited those values to match the ones I pulled from the 'real' partition table and rebooted. Suddenly, it was respecting those boundaries. More tweaking to make sure I had things done the way they needed to be, and I was happy.

Kernel-level support for the trickery arrived somewhere in the 1.3.x kernels, and is still in there. I can't remember the name of the card anymore; that machine died a number of years ago, but I suspect it was the CMD640 line. Probably wrong, but that seems closest.

Update: It was a DTC-2278 card, not CMD-640.
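For the curious, the structure I was hand-editing is the classic PC partition table: four 16-byte entries starting at offset 0x1BE of the boot sector. Here's a sketch of decoding them (in modern Python, obviously not what I had on hand in those days):

```python
import struct

def partition_entries(sector):
    """Decode the four 16-byte partition entries at offset 0x1BE of an
    MBR-style boot sector. Returns (boot_flag, type, lba_start, sectors)
    for each slot; the CHS fields are skipped for simplicity."""
    assert len(sector) == 512
    entries = []
    for i in range(4):
        raw = sector[0x1BE + 16 * i : 0x1BE + 16 * (i + 1)]
        # boot flag, 3 CHS-start bytes, type, 3 CHS-end bytes, LBA start, count
        boot, _, _, _, ptype, _, _, _, lba_start, sectors = struct.unpack(
            "<B3BB3BII", raw)
        entries.append((boot, ptype, lba_start, sectors))
    return entries
```

The "identical data-set two sectors over" was just another copy of this same 64-byte table, which is why it was so recognizable in a hex editor.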

Today's new things

The big new thing today was enabling SMP in the kernel of the linux box I've been playing with. It didn't ship with an SMP kernel, so I got to compile my own. This is the proud part. But first a little history.

The very first kernel I attempted to compile was 1.2.21. I wanted something newer than the 1.2.17 that shipped with the Slackware disk we had (Volkerding visited our college and left a disk behind). On my 486/33DX, that compile took around 40-50 minutes. It was monolithic, as modules hadn't been created yet. But I got it done. I repeated the trick for 1.2.24, 1.2.26, 1.2.28, and 1.2.29. Then I tried some 1.3.x kernels, but the machine got superseded before I could attempt any of the 2.x series of kernels.

That, as you can guess, was a while ago. I'm quite happy that I managed to compile and install the kernel and all modules without causing problems on reboot, on the first try. This has to be the first time that's happened. I'm so happy!

But... with all the modules and the kernel itself, the compile took a shade under an hour. And that is WITH "make -j3" to let make run multiple jobs in parallel (after I had the first SMP kernel in, and was tweaking it, of course). That visibly helped.

And I really didn't need to do that. This particular application is very, very low-key; I could probably happily run it on a PPro 200. I just wanted to see if I could get the kernel optimized a touch. I turned on SMP, and set the memory config to allow the entire 1GB of RAM to be used. Not that I'm short of memory. Just one of those things to keep a skill-set from getting rustier.
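For reference, the two knobs in question look something like this in a 2.6-era x86 kernel .config (option names drift between kernel versions, so treat this as a sketch rather than gospel):

```
# Processor type and features
CONFIG_SMP=y

# High Memory Support -- x86 boxes with much more than ~896MB of RAM
# need HIGHMEM enabled to see all of it.
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
```

With 1GB of RAM, leaving HIGHMEM off means the kernel simply never maps the top chunk of memory.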

Brainshare scheduling

Brainshare has started scheduling for us non-alumni. There are only about four slots where I don't have a session, and I had to make some hard choices about conflicts. This is going to be a good year, and I feel I still won't get everywhere I want to go. If last time is anything to judge by, some of these sessions will not be what I'm looking for, so I'll have a chance to take a stroll through the labs and vendor stuff.

Monday and Wednesday I don't have a spot for lunch. Monday, the two sessions blocking lunch are the Identity Manager overview, which I really need, and the Novell Client for Linux. The client for Linux is exciting, but the IM stuff isn't immediately applicable even if I still need to know it. Wednesday it's "Must know tips and tricks for Nsure Identity Manager Deployments", which is right up my alley, and "Deploying Universal Passwords", which is something we'll be doing here in the not too distant future. Sad.

On the other hand, I have three Laura Chappell sessions. If anyone from OldJob is going to Brainshare this year (unlikely, but it could happen) these sessions are ones they'll almost certainly hit. And since the one person likely to go from OldJob is an alum, she'd have scheduled already.

Printing update

We're halfway to a stable printing platform right now. Yesterday we heard from Novell. We've gone this entire quarter without the CPU-hog abends that had been plaguing us all through Fall quarter, thanks to the new NDPSM and NDPSGW modules we received. We've had some crashes, but so far they've been attributable to other causes. That's good news.

The bad news is that the other problem we've been having, printers mysteriously refusing to print, still has no known root cause. This affects only the printers in printer-pools, which sucks, since by definition those are the printers with the highest volumes. Apparently a developer looked at the problem as described to him and decided that the entire print-pooling section of NDPS/iPrint needed a ground-up rewrite. So it'll be a month at least before we get revised code to fix our problem.

In the meantime, I search for a root cause. If I can find it, hopefully we'll be able to help spur the developer down the correct path.

For reference (and search-bots) this is the error we're getting:
 2-16-2005  11:04:55 pm:    NDPSGW-3.1-0
     DEBUG WARNING(prhh245-2): ** FATAL ERROR treated as INST FATAL in
     ProcessDocData -> kill inst
That sort of thing. Once that triggers, the "requested" column in the NDPS Manager portal screen for that specific printer-agent will read a negative number. At that point, that pool will stop servicing print-jobs. This is bad.

Hula for Linux

Novell put out a press-release the other day about a new product or software project they're kicking off. Something called Hula. The plan is to take the existing NetMail product, and make it bigger. It isn't going to be a true competitor to GroupWise since they currently occupy different markets, and are attractive to different groups. Hula will be open source, where GroupWise will remain a closed-source Novell product.

The main goal of the Hula project is, for the moment, twofold:
  • Provide a kick-butt webmail interface (think GMail)
  • Provide a kick-butt calendar that is easy to use, easy to share, easy to publish
What it probably will never do is task-lists and workflows. For that, you can use GroupWise. One thing that it can do that GroupWise can't (and for that matter, Exchange can't do as well) is scale into the six figures of users. That alone will make it attractive to regional ISPs.

The calendar thing is one I'm looking forward to. If it is possible for a user to have multiple calendars associated with themselves, that'd be double-plus spiffy. Creating a standards-based calendaring system that can scale and is easily shared will be a really nice app.

But.

Anything that is standards-based, easy to use, easy to share, and lacking central control will end up with jerks in the system. Like SMTP, NNTP, IRC, or BitTorrent. Calendar spam can't be far behind. Expect to have to reject 'appointments' to talk about hot penny-stock investment opportunities in the not too distant future.

Hmmm...

sftp to the student side of the cluster appears to be working now. Maybe.

Maybe.

ftp.students.wwu.edu

Web stats

We've been asked to ship the access-logs for myweb into a centralized log-dump repository. Since we can run reports from there, I've been looking at access-patterns for these servers. On the student side, in terms of hit-count, about 58% of all hits come from our two service-monitoring packages. In terms of data transferred, we're still nada compared to www.wwu.edu. This service isn't as used as I had guessed.

FacStaff is a desolate wasteland, of course. The biggest request (after the monitoring pages, of course) is http://myweb.facstaff.wwu.edu/~riedesg/sysadmin1138/atom.xml which is the atom file for this very blog, and how almost all of you read me. The number two spot has one-fifth the traffic of the atom.xml file and is http://myweb.facstaff.wwu.edu/~walkers/egeo350/index.html, which I'm happy to see. Number three is the front page of this blog. I knew I dominated the access stats, but it is nice to see it proven.

Liebert MultiLink on Netware

Did you know that you can run the Liebert Viewer on Netware's X-Windows screen? I'm glad I can, since I'm having a hard time getting the JVM to run correctly on my admin workstation. Here is the NCF needed:

java -cp sys:system\ml\lib\em.jar -XX:+ForceTimeHighResolution com.liebert.dpg.app.LxExecutor mainViewer SYS:System\ML

Obviously, XWin has to be running for this to work. But it should cause the MultiLink console to load. This is handy, as I've said before, because it allows local at-the-console administration of the event settings and the like.
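To avoid typing that by hand, the whole thing can live in an NCF (a sketch; ml.ncf is a name I made up, and it assumes startx is what brings up the GUI on your server):

```
# ml.ncf -- bring up X (if it isn't already) and load the MultiLink viewer
startx
java -cp sys:system\ml\lib\em.jar -XX:+ForceTimeHighResolution com.liebert.dpg.app.LxExecutor mainViewer SYS:System\ML
```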

Liebert UPS

It turns out that Liebert follows the same licensing model as the rest of the industry. In order to get remote shutdown with their MultiLink product working correctly, without hammering the SNMP card, you need a license. To activate the features that permit centralized management of the events thrown by the UPS, you need a second license. That second license also turns on the advanced notification methods like e-mail.

Only they don't spell it out that clearly in their documentation. I have the remote shutdown thing working, but the e-mail piece of it will have to wait until we get the right license file.

DHCP in the cluster

It is looking almost certain that we'll be moving our DHCP service off of the elderly Win2K DHCP box and into the Novell cluster. The idea is that if the DHCP server were to suddenly up and die, we'd still have the ability to serve addresses. Supposedly.

And now for the testing.

More fun with stats

It just keeps on coming! Today's big project was to take my disk utilization data and make even more pretty pictures. Since I have a quarter and a half worth of data, I can start to draw conclusions from the data. So can my boss.
  1. The rate of free-space burn is increasing on the student side, but not on the Fac/Staff side.
  2. Fac/Staff burn space faster on the Shared volumes than their User volumes.
  3. Student burn rate increases as the quarter progresses.
  4. Student burn rate is fundamentally faster in Winter quarter than it was in Fall.
  5. Old-user deletes are not liberating enough space to keep up.
And most importantly, a pie chart and line graph were all it took to convince Mr. Checkbook that we had a looming problem, and get him to approve getting more disk.

Note: We have a SAN, so "Just throw disk at it" actually works. Without the SAN we'd be looking at a Reformat+Restore.
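The burn-rate math behind those conclusions is simple enough to sketch; the numbers here are made up for illustration, not our actual figures:

```python
# Sketch: projecting when a volume fills up from weekly free-space samples.
weekly_free_gb = [120, 112, 103, 92, 80]    # free space, one sample per week

# Average burn per week over the sampled period.
burns = [a - b for a, b in zip(weekly_free_gb, weekly_free_gb[1:])]
avg_burn = sum(burns) / len(burns)          # GB consumed per week

# Naive linear projection of weeks until the volume is full.
weeks_left = weekly_free_gb[-1] / avg_burn

print(round(avg_burn, 1), round(weeks_left, 1))   # 10.0 8.0
```

A line graph of exactly this kind of projection is what did the convincing.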

Pointless printer stats!

Just because I was in the mood, I ran some Pcounter reports.

Total number of pages printed since 1/4/05 (start of quarter): 690,288
Busiest Hour: 11:00-11:59am, with 86,054 pages printed during it
Busiest Printer: HH154-1, with 41,919 pages printed by it
or: Why I'm %*)(! tired of memory fragmentation

By: Me

Novell's new Open Enterprise Server has caused waves in the marketplace since it includes a choice of kernels to run Netware services from. Specifically, the traditional Netware kernel (arguably called Netware 7), and a Linux kernel (a.k.a. SuSE Enterprise Server 9 or 10). Novell has made a big deal about users not being able to tell the difference between the two kernels. From the admin point of view, with regards to web-tools, it's quite clear. But from the end-user's perspective, they're the same. This is a big deal.

This gives Mega Corps a Linux kernel backed by the support history Novell has. Novell has been supporting their OSes for coming on twenty years now, and a lot of Fortune BigNumber companies have support contracts with Novell already. The argument here has been covered on Slashdot a number of times, so I won't get into it.

So why am I, arguably a Netware bigot, looking forward to the choice? Because I am very, very tired of memory allocator issues. Right there in the readme for the OES beta is a little bit of text that sums it up completely:
The Linux kernel provides more robust memory management and TCP/IP services.
Netware has had TCP/IP available for its kernel since the NW3.11 days (1991 for those that keep track). Here we are, fourteen years later and Netware is still suffering stupid problems with regards to TCP/IP services. WinSock2 services are mostly OK. Routing is also mostly OK. I'd be happy with "OK" and not just "mostly OK".

The Memory Allocator issues really came to the fore when Netware 6.5 released. A pair of technologies ganged up on the staid Netware kernel to thump it into faultiness. Novell's eDirectory 8.7.3 included a much changed caching mechanism that greatly increased both how much data is stashed into memory, and changed how often it gets swapped around from disk. NSS 3.whatever that came with NW6.5 also included greatly changed disk-caching mechanisms for file serving, which also made disk-caching much more aggressive than it once was. All of this creates increased memory load/unload calls, and runs smack into Netware's intrinsic lack of ability to handle misbehaving applications.

Unless otherwise defined, all Netware programs run in Kernel space. Due to the ability of end-users to run stuff on the machine, OSes like Linux and Windows have had to create segregation between user-space and kernel-space. Netware 5.1 introduced the ability to run things in protected memory (i.e. not in Kernel space), but the number of programs that could run like that was very limited. Again, for comparison, Windows NT introduced to the Windows world the idea of each program running in its own protected memory space. Two revs of Netware later, and the number of applications that can run protected is greater; arguably the most recognizable is Apache2.

Within each memory space you can run into the specter of memory fragmentation. This is the same thing as disk fragmentation, only applied to memory. If a program can't malloc() a chunk of free (contiguous) memory of the size it needs, the call returns an error. I may still have 251 megs of free memory, but that doesn't help me if that memory is scattered across thousands of 2KB fragments. Therefore, because Netware doesn't force separation of process environments, bad programming can cause system-wide stability issues.
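To make the failure mode concrete, here's a toy first-fit allocator (a Python sketch, nothing to do with Netware's actual allocator) where plenty of memory is free in total but no single hole is big enough:

```python
class Arena:
    """Toy first-fit allocator over a fixed-size arena."""
    def __init__(self, size):
        self.size = size
        self.holes = [(0, size)]           # free list of (offset, length) holes

    def malloc(self, n):
        for i, (off, length) in enumerate(self.holes):
            if length >= n:                # first hole big enough wins
                if length == n:
                    del self.holes[i]
                else:
                    self.holes[i] = (off + n, length - n)
                return off
        return None                        # no *contiguous* hole fits

    def free(self, off, n):
        self.holes.append((off, n))        # no coalescing: fragments pile up
        self.holes.sort()

    def total_free(self):
        return sum(length for _, length in self.holes)

arena = Arena(1024)
blocks = [arena.malloc(16) for _ in range(64)]   # fill with 16-byte blocks
for off in blocks[::2]:                          # free every other block
    arena.free(off, 16)

print(arena.total_free())    # 512 -- half the arena is free...
print(arena.malloc(64))      # None -- ...but no 64-byte hole exists
```

Half the arena is free, yet a 64-byte request fails: that's the "251 megs free but malloc() errors out" situation in miniature.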

This is why I'm looking forward to a Linux kernel option. I'm not looking forward to playing with version 1.5 of certain services, but the idea is very attractive. First and foremost, Apache runs a lot quicker on Linux than it does on Netware, due to the vastly superior POSIX support and years of platform-specific development. Novell is providing both NSS-on-Linux (a new filesystem for Linux, for those of you who don't know what NSS is) and NCP-served-by-Linux (different from the ncpfs modules that allow the mounting of NCP volumes), which will provide the 'native' fileserver for Linux. Whether this fileserver will be better or worse with regards to disk-caching performance, I couldn't say. But process separation should provide a better firewall between runaway processes and system kabooms.

One of the extra-spiffy things with OES is that the clustering system that comes with it allows both Netware and Linux kernels in the same cluster. That's really spiffy. That way I can have Linux serving my web stuff (NetStorage, web-pages-from-user-dir, iprint) and Netware serving files. Yay!

OES Beta trials

I downloaded the Open Enterprise Server beta to see what it looks like. We may end up going with this in the not too distant future, and I also need to get a better handle on admin-work on a Linux-based server.

What surprised me is what OES is. Perhaps this is the beta form of it, but I was not expecting what I have here. The Netware kernel version is NW6.5SP3 with OES services. The Linux version is SLES9 with OES services. Since I know all about NW6.5, I didn't try out the Netware kernel version, and installed the Linux version. So far it has been going... interestingly. This will change how we handle file-access.

Right now I see a use for OES-Linux. In our cluster, we have two real batches of services: File & Print, and Applications. File & Print is the meat 'n potatoes of Netware, and that stuff is perfectly fine right where it is. It's the other stuff, MyFiles and MyWeb to be exact, that would benefit from being run on a Linux kernel instead. Linux has much better memory management, a more mature TCP/IP stack, and a much improved POSIX layer, which will make apache2 a lot happier.

MyFiles would be the easy move, since that's just NetStorage. No biggie there. MyWeb will be the challenge, since I'm pretty sure mod_edir hasn't been ported to SuSE yet. Perhaps we'll be able to get away with 'useracct' instead, but I'm not in a position to test that out.

rc.d init.d

I learned something about Slackware today.

There are two styles of startup-files on Linux that I've run into. Slackware uses one, and RedHat and others use the other. This is "well DUH!" territory for people who play around in Linux-land a lot, but I haven't.

The Slackware Method
/etc/rc.d/ is the directory the startup files are kept in. Files are named "rc.something": rc.1 for runlevel 1 and on up, with other files (rc.inet1, rc.serial, etc.) called in as needed.

Everyone Else (or at least RedHat and SuSE)
/etc/init.d/ is the directory the startup scripts are kept in (I think).
/etc/init.d/rc1.d/ holds the links that kick off runlevel 1, and so on.

Today I learned that Slackware will kick off files in "/etc/rc.d/rc6.d/" if they are present. This is handy when installing programs that assume the other method.
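To see the two layouts side by side without touching the real /etc, here's a sketch in a scratch directory ('myservice' is a made-up name):

```shell
# Compare the two startup layouts in a scratch directory instead of /etc.
root=$(mktemp -d)

# The Slackware (BSD-style) way: flat rc.* files under /etc/rc.d/
mkdir -p "$root/etc/rc.d"
touch "$root/etc/rc.d/rc.myservice"

# The RedHat/SuSE (SysV-style) way: an init.d script plus per-runlevel
# directories full of S##/K## symlinks pointing back at it.
mkdir -p "$root/etc/init.d" "$root/etc/rc3.d"
touch "$root/etc/init.d/myservice"
ln -s ../init.d/myservice "$root/etc/rc3.d/S99myservice"

ls "$root/etc/rc3.d"    # shows the start symlink: S99myservice
```

Exactly where the rcN.d directories live varies by distro; the S##/K## symlink convention is the part that carries over.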

Why Slackware?

Because that's what I ran into back in 1995 when I encountered Linux for the first time. I haven't taken the time to learn anything else. Gentoo scares me. SuSE makes me hide under the desk and peer out suspiciously. But I'm sure if I devote the time to it, I'll come out from under the desk and learn how to admin the darned thing. But not yet.

Update:
Slackware uses BSD-style startup; the other method is called System V. Now you know.