March 2005 Archives

Hope on the NDPS front

We just received a revised NDPSM module from Novell. This is the latest hope in the fight against printers that suddenly stop printing. Novell had to rewrite how printer pooling was done, since apparently the first incarnation of it was sub-optimal in a number of ways.

The NDPSM.NLM that we had before, dated mid-February, came in at 356KB and had this info:
Version 3.01.14 Wednesday, February 16, 2005
Copyright (c) 1989-2005 Novell Inc. All rights reserved.
NDPS Manager

The NDPSM.NLM that we just received comes in at 1050KB, and has this info:
Version 3.01.14 Wednesday, March 30, 2005
Copyright (c) 1989-2005 Novell Inc. All rights reserved.
NDPS Manager (Debug)

The extra size is probably partly due to the debug symbols included in it. But it is very encouraging. I spoke with the developer working on this while I was at Brainshare last week, and he gave me a good idea of why the old code was not the best at handling printer pooling.

We already knew that the problem happened because the scheduler was attempting to print two jobs at the same time. I had proof of this on the wire, and the developer suspected it from the code. When two jobs arrive that close together, a timing condition kicks in that involves PCOUNTER in some way and causes the scheduler to forget it has a job on deck and just sit there forever.

The NDPS pooling code was previously written in a fairly unmodular way. If I understand it right, a 'pool' bit was set somewhere, and all throughout the scheduler code you got branches like this one:
if isPool then {
[do pool stuff]
} else {
[do non-pool stuff]
}
Which sucks from a code-path point of view. The revised code introduces a layer of abstraction into the mix. The scheduler doesn't behave much differently when it is running a pool; the servicing agents know they're in a pool and avoid conflicts while they contend for jobs to service. Or rather, the scheduler offers up jobs for servicing by attached printer agents, and the agents line up to service them. This should scale pretty well.
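To make the difference concrete, here is a minimal Python sketch of the offer-and-contend model as I understand it. It is not Novell's code; `run_pool`, the agent names, and the use of a thread-safe queue are illustrative assumptions. The point is that the scheduler never checks a pool flag: it offers each job exactly once, and the queue's internal locking guarantees that only one agent claims any given job.

```python
import queue
import threading

def run_pool(jobs, agent_names):
    """Scheduler offers each job once; pooled agents contend to service them."""
    offered = queue.Queue()
    serviced = []                  # (agent, job) pairs, for inspection
    log_lock = threading.Lock()

    def agent(name):
        while True:
            job = offered.get()    # Queue.get is thread-safe: one agent per job
            if job is None:        # sentinel: no more work for this agent
                break
            with log_lock:
                serviced.append((name, job))

    threads = [threading.Thread(target=agent, args=(n,)) for n in agent_names]
    for t in threads:
        t.start()
    for job in jobs:
        offered.put(job)           # the scheduler doesn't care how many agents exist
    for _ in threads:
        offered.put(None)          # one sentinel per agent, queued after every real job
    for t in threads:
        t.join()
    return serviced

for agent_name, job in run_pool(["job1", "job2", "job3"], ["PA-1", "PA-2"]):
    print(agent_name, "printed", job)
```

Adding a printer to the pool here just means starting one more agent; nothing in the scheduler loop changes, which is the whole appeal of the rewrite.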

One of the trivial improvements included in this new NDPSM is the ability to clear operator-holds from jobs while in the NRM NDPS Manager portal. Previously, only user-holds could be cleared. I mentioned this to the developer last week, and I'm happy to see it in code today. That's spiffy.

Excitement this morning

I came in this morning to find things in a bit of a tizzy. Turns out we had some cluster excitement during the night. The exact timeline is unclear, but I think we had a switch reset in there, and a cluster-leave event on the server throwing the cache-allocator errors.

One of the three cluster nodes was rebooted and only ran parts of the autoexec.ncf file. Certain bits that are required to even load certain cluster services (volumes, FTP, and anything webby) didn't load. Odd. Very odd. It's running things now, but we'll have to keep an eye on it. Sadly, the relevant bits of the boot process scrolled past the furthest extents of the Logger and console screens. Netware does not have a dmesg command.

First day back

This is the first day back at work from Brainshare. It has been a solid day. The job for the day was to reverse the brain transplant I did before heading out and put my laptop back into a state where I can use it at home. This was accomplished, though not without a bit of sweat: I was getting CRC errors on some parts, but I managed to get Ghost to behave in the end.

Otherwise it has been a lot of knowledge transfer to other people. I picked up quite a bit at Brainshare this year. I'm very encouraged by the fact that my boss said, "It seems you got your money's worth." This implies that there is a good chance I may get to go next year! Woo! If true.

Friday keynote

The Friday keynote is always the most technical of the keynotes, with product demonstrations being the rule. This one was no different. One Novell product was demoed as well as several OSS products.

The Novell product was their Virtualization Services. This is an interesting product that does server virtualization, but in a way we haven't seen on commodity x86 hardware. They claim to have gotten around some of the resource hits that virtualization causes, making running things this way actually attractive; we'll see about that when we get our hands on it.

The impressive thing about this offering is that it can offer clusters in a way we haven't seen before. We've seen virtual servers in clusters before, but they're just an impersonation of a server by an existing server. This new service takes it to the next step by creating a full-blown server (SuSE and NetWare-kernel were demoed).

Planned failovers happen by saving the state of the machine (think Hibernation mode) and transferring it and the storage associated with that Virtual server to another physical platform and starting it there. This has the added benefit of preserving the state of the TCP stack, so clients don't have to reconnect (presuming their watchdog settings are long enough to pass over the service interruption). This is very nice.

On the Open Source Software front, Novell demoed Mono and a lot of what it can do. One thing I hadn't been aware of is that you can take .NET-compiled binaries and execute them with Mono on both Linux and PPC. Whoa. Didn't know that. The new Mono will shortly have full support for Windows Forms, which was what was demoed today. Expected release in Q2 of '05.

They also did a demo of how Hula works with non-Hula clients. When sending calendar invites to other people, it sends a message with 'accept' and 'decline' links in the mail. The recipient then clicks whichever, and Hula marks that person as accepted/declined as appropriate. Nifty stuff, that. They also demoed a Mac client pulling down the ICS for a user and importing it into the local calendar.

Brainshare network

The Brainshare wireless network has been nearly unusable much of this week. Today we learned why. Apparently Laura Chappell was in the Wednesday keynote and did some monitoring. There were 1900 wireless devices contending for the few APs in there. And in her Bring Your Own Laptop class, there were 375 devices contending for a single AP.

With loads like that, it looks to me like we're running right up to the limit of 802.11b (the flavor in use around here) by having so many geeks with wireless around. I do wonder if there is a way to engineer around loads like that. It doesn't help that a lot of the wireless cards around here are in 'soft-AP' mode, acting as repeaters for the hardware APs and thus scrambling the channels even more.
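Some rough arithmetic shows why. 802.11b is nominally 11 Mbps, but usable throughput is commonly cited at around 5 Mbps once protocol overhead is paid; that 5 Mbps figure is an assumption, not something measured on the show floor. Split evenly among the 375 devices in that one classroom:

```python
EFFECTIVE_MBPS = 5.0   # assumption: typical usable 802.11b throughput after overhead

def per_device_kbps(devices, effective_mbps=EFFECTIVE_MBPS):
    """Even share of one AP's usable throughput, in kbit/s."""
    return effective_mbps * 1000 / devices

print(f"{per_device_kbps(375):.1f} kbit/s per device")   # 13.3 kbit/s per device
```

That is roughly a quarter of a 56k modem per laptop, and that's before counting collisions and the soft-AP repeaters.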

The number of wired connections is under 20, and they seem to be in near-constant use.

Day 3

The keynote this morning was special in that the Governor of Utah came by to deliver a 30-minute speech. It was a lot about how great Utah is, and how Novell is helping to drive the tech industry here. Unusual for a politician to show up.

Other than that, I skipped my first session, since wow am I unmotivated: Tips and Tricks for Identity Manager Deployments. The chances of us deploying it in any meaningful way are minuscule. I only signed up to learn more about the product, and as yesterday's session proved, that is not what I'm getting.

Next up is Deploying Universal Passwords, which we're in the middle of, then iManager configuration which I need. Then getting silly at the conference party. Woo.

Day 2

It has been a full day already. I got up at 6:15 body time and stumbled bleary-eyed into breakfast. Ate. Then spent a good session learning about NSS on Linux and what it can and can't do.

NSS on Linux
It does a lot. The core NSS module is a direct port from NetWare. Unfortunately, the Linux kernel doesn't handle the extra bits that NSS provides (*cough* POSIX *cough*), so it took some creative engineering to allow the extra non-POSIX bits to be used. NSS allows you to create rich rights assignments, something that is lacking in standard POSIX.

In fact, the code for NSS has been released as open source, and will either be included in the kernel source or be available elsewhere. One thing to note, though: since there is a significant non-POSIX layer in there, just loading nss.o into the kernel won't give you the full NSS experience.

Performance-wise, NSS-on-Linux is slower than NSS-on-NetWare, which only stands to reason. NetWare has spent coming on 20 years being fine-tuned as a file server, whereas Linux is an application server with general-purpose OS aspirations. The presenter noted the NSS team's surprise when the performance lab reported that NSS was only 12% slower than ReiserFS; they were expecting much worse than that, since NSS is a port and Reiser was built from scratch for Linux. They were also rather clear that NSS is great for file serving and such, but not the best choice for things like database serving or data warehousing.

Deploying OES on HP Blades
It was all about how to install OES-Linux onto blades using PXE, DHCP, and code hacking. Since our problem is how to get OES-NW installed onto blades, this did not apply to us. So I skipped it to take a nap.

NSS Tuning
Fascinating seminar. And happy days, it WASN'T Linux-centric! Yay! In this class we learned that the test lab was able to get a system to do 10GB/minute throughput on TSATEST, and got this performance by using software RAID0 on top of two hardware RAID5 arrays. Yep, software RAID0. Whoa....
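For scale, 10GB/minute works out to roughly 170MB/s. The sketch below does that conversion and also shows how a software RAID0 layer deals consecutive stripes out to its two members, which in this setup are the two hardware RAID5 arrays. The 64KB stripe size and the function names are assumptions for illustration, not the lab's actual configuration.

```python
STRIPE_BYTES = 64 * 1024   # assumption: software RAID0 stripe size

def gb_per_min_to_mb_per_s(gb_per_min):
    """Convert GB/minute to MB/s (binary units: 1 GB = 1024 MB)."""
    return gb_per_min * 1024 / 60

def raid0_map(offset, members=2, stripe=STRIPE_BYTES):
    """Map a logical byte offset to (member array, offset within that member)."""
    stripe_no, within = divmod(offset, stripe)
    member = stripe_no % members                      # stripes alternate between arrays
    member_offset = (stripe_no // members) * stripe + within
    return member, member_offset

print(round(gb_per_min_to_mb_per_s(10), 1))           # 170.7
print(raid0_map(0), raid0_map(64 * 1024), raid0_map(128 * 1024 + 10))
# (0, 0) (1, 0) (0, 65546)
```

Since each RAID5 array only sees every other stripe, a big sequential stream keeps both hardware controllers busy at once, which is presumably where the extra throughput comes from.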

My inner performance geek was very happy about this. I got some good ideas to test things out back home. Though I suspect our bottlenecks are more in memory/network land than storage I/O land.

Two more sessions to go, and then the Sponsor Party. Mardi Gras theme, so I'll most assuredly have a couple of fistfuls of beads that I'll never want again by the time the day is done. *Sigh*

TUT 102: Novell client for Linux

Of the two sessions I've been to, this one was by far the most fun. Novell has released Beta1 of the Novell Client for Linux. This client was designed to replicate Client32 in a Novell Linux Desktop environment. This is a great boon to us network administrators, since it lets us run login scripts on those stations.

The product as it sits right now does lack some features, but what it has is downright good. It'll run existing login scripts unchanged, which is an immense benefit. It isn't 100% compatible yet, but that's why it's a beta.

It'll support console/SSH logins! By using PAM, it'll allow such users to log in text-only and still get all the benefits. It'll mount their mapped drives and give them a link to each one in their HOME directory. How that works is pretty nifty.

map u:=servername/volume:%USERNAME%/

Standard script entry. What that'll give is a symlink in their HOME directory to that location.

/mnt/novell/%USERNAME%/servername/volume/%USERNAME%/

This does permit using more descriptive names, but the Windows client can't use those commands. That's nifty all by itself.
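Here is a rough Python sketch of that translation, assuming the symlink target always follows the pattern shown above (`/mnt/novell/<user>/<server>/<volume>/<path>/`). The regular expression and `map_to_symlink` are hypothetical; the real client surely handles many more MAP forms than this one.

```python
import re

# Matches only the simple form: map <drive>:=<server>/<volume>:<path>
MAP_RE = re.compile(
    r"^map\s+(?P<drive>[a-z]):=(?P<server>[^/]+)/(?P<volume>[^:]+):(?P<path>.*)$",
    re.IGNORECASE,
)

def map_to_symlink(line, username):
    """Return (drive letter, symlink target) for a simple MAP statement."""
    m = MAP_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported MAP statement: {line!r}")
    path = m["path"].replace("%USERNAME%", username).strip("/")
    parts = ["/mnt/novell", username, m["server"], m["volume"]]
    if path:
        parts.append(path)
    return m["drive"].lower(), "/".join(parts) + "/"

print(map_to_symlink("map u:=servername/volume:%USERNAME%/", "jdoe"))
# ('u', '/mnt/novell/jdoe/servername/volume/jdoe/')
```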

There is a kernel module called novfs that permits access to Novell-hosted servers. It works a lot like NFS in that it is a virtual filesystem that permits remote resources to be mounted locally. It replaces the ncpfs that's been around for a decade with something that, IMHO, actually works the way I need it to. The same kernel module gives the local machine access to all of that special Novell-only stuff like trustee editing, salvage & purge access, and volume stats.

They also support both GNOME and KDE desktops, though only the desktops hosted on Novell Linux Desktop are tested so far. It'll probably work on other distributions, given enough fiddling. It is distributed in RPM format, among others, so there is a chance to try it out on other things like, say, Gentoo.

It also has plugins for Konqueror and Nautilus. Also nifty.

It isn't 100% complete yet. We can't use it quite yet since it doesn't do contextless login. It'll get there, possibly in the next point-rev, but not in this release. Sad. But at least they're building in the bits that'll permit it to happen. This IS a v1.0 build, after all.

They expect to have a release candidate build this summer some time.

Netware kernel is dead?

After watching the opening keynote of Brainshare 2005, it becomes even more clear that the NetWare kernel is on its way out the door. Support for the NetWare kernel will probably be the same as support for IPX... Novell will do it until the end of time, simply because it needs to. But that doesn't stop it from pushing Linux very hard. All of the product announcements Messman made were Linux oriented in some form. Not a one was Netware-kernel oriented. The speaker that followed Messman was the director of Open Source Development, which just underscored the Linux focus.

Is this the writing on the powerpoint slides?

Novell will continue to develop the current web-based applications that run on NetWare (iManager, NetStorage, VirtualOffice, etc.) in the Open Enterprise line. However, partners will develop more and more for Linux, since that is a much simpler environment to develop for. One thing that was very telling to me was the following statement made by Messman:

All of Novell's Windows based backups have been moved to Linux.


Backup is one of those things that really is mission critical. I've already ranted about running backups on NetWare (which is different from backing up NetWare), so I'll spare you that rant. This tells me that Novell believes Linux is strong enough to do mainline backup. It also tells me that Novell didn't think NetWare was strong enough for that. I was right! Pardon this moment of vindication.

So it looks like the NetWare kernel will continue, but it will not be Novell's preferred platform to develop for. The non-Novell products that support the NetWare kernel will dry up over the next five years in favor of Linux as a platform.

Brainshare Sunday night

Sunday night is Brainshare's welcome party. Novell puts on a pretty good bash (though nothing to match Wednesday) to welcome people to the conference. This year the party theme was 'nightclub'. They had a beefy bouncer at the front who was 'looking' at people's 'ID', lasers, glow sticks, and fog. The band was a cover band that was doing a pretty good job. Other attractions included a Laser Tag arena (inflatable), archaic arcade games (Asteroids, PacMan, Donkey Kong, etc), and as open a bar as Utah's liquor laws permit. Plus, food. This is the Sunday Dinner.

The food had a nice variety. The theme, such as there was one, seemed to be Asian influenced. They had the standard cold-cut tray for those of us who aren't even close to adventurous when it comes to food. They do get points for offering sushi at the buffet. They also had a smoked salmon that was pretty good, though I begin to suspect I may have a Thing against chemical smoke. Gosh, I hope not.

Overall the crowd is looking a bit younger than it was when I was here in 2001. Since then Novell has purchased SuSE and delved deeply into the land of Linux, which has dropped the average age of Brainshare attendees a year or two. This is good. Also, I suspect that if you lined up all the people with long hair on the stage, there would be a 50/50 split between the men and the women; this is the double whammy of an 80/20 men/women ratio and a much higher per-capita incidence of long hair in Linux geekdom.

Not a lot of content yet, though the vendor hall was open for business. One thing that caught my eye is that SyncSort is a Platinum sponsor and Veritas isn't even here officially. I know one fairly high-up Veritas guy who is here, but I suspect he isn't in a 'sell our stuff' mode so much as a 'keep up with what's happening' mode. Unknown if this is a side-effect of the Symantec merger.

I did find one odd thing this afternoon:

eth0 IEEE 802.11-DS ESSID:"FBI-Trace-BS4U" Nickname:"mark"

Strangely, when my wireless card found this particular access point, I could still get to the network. I also had a private 10.70.0.0/24 address instead of a Brainshare network address. Laura Chappell playing with our heads? Some funny guy? Or really the FBI? Who knows. All I know is that I was tunneling my HTTP through SSH at the time, and am probably safe for a few days.

Another change since last time is that HP/Compaq is not the exclusive hardware vendor. I've seen Dell, IBM, and what look like whiteboxes here. Interesting.

Arrived in SLC

I'm here. I attended the NCCI party at the Shilo like I intended, met a bunch of people I've only seen online, and cleared up some confusion at the same time (long story). Played pool, drank drinks (SLC allows spirit sales these days, though last call was 11:30), and chatted about weather and clusters. Mmmmmmm.

Registration opens tomorrow at Noon, though I plan on drifting in around 1:30ish to avoid the mad rush that'll happen at noon. Tip from a fellow partier.

The flight was mostly uneventful. Landing in SLC was the shakiest bit. There was a bit of a cross wind at the runway, so we hit the ground with a noticeable lateral vector. Since I flew out of Bellingham, I got to have my bag searched right there behind the ticket counter. Did you know that they're upgrading to an enclosed baggage claim area?

Brainshare, on my way

My flight leaves in about two hours. I arrive SLC at around 6pm, if everything goes as planned. Then tonight, the NCCI party at the Shilo. It'll be fun.
Found here

And here

Turns out you can redirect the GUI to an HTTP client without authenticating! Neat trick, that. Also one I wish I knew about before, because it would make certain software installs a bit easier. Since we don't leave the GUI loaded as a rule, this doesn't apply to us that much. But still, good to know it is there.

Blogging Brainshare

I intend to blog Brainshare, but that's dependent on getting my laptop into a state where that can be done. It's a middle-aged Inspiron 8000, and my wireless cards are giving me grief. The DWL-G630 I had locks up in the laptop from overheating, so I had to replace it. The replacement, a Netgear WG511v2, also doesn't have good Linux support.

The WG511v2 is my best bet. Right now I've got ndiswrapper running, and it loads the driver most of the time. Not always. And I get kernel panics three out of five times. Very kludgy. I get better results when the WAP doesn't use any sort of encryption, which the Brainshare WAPs won't. We'll see how it goes.

Mac and SP3

Looks like AFPTCP in SP3 isn't 100% nifty:

01 03 00 01 00 00 00 00 00 00 01 B2 00 00 00 00 ................
00 18 00 2F 00 72 00 B2 00 BF 06 43 52 4F 4E 55 .../.r.....CRONU
53 00 00 9B 00 AB 00 00 16 4E 6F 76 65 6C 6C 20 S........Novell
4E 65 74 57 61 72 65 20 35 2E 37 30 2E 30 33 06 NetWare 5.70.03.
0E 41 46 50 56 65 72 73 69 6F 6E 20 31 2E 31 0E .AFPVersion 1.1.
41 46 50 56 65 72 73 69 6F 6E 20 32 2E 30 0E 41 AFPVersion 2.0.A
46 50 56 65 72 73 69 6F 6E 20 32 2E 31 06 41 46 FPVersion 2.1.AF
50 32 2E 32 06 41 46 50 58 30 33 06 41 46 50 33 P2.2.AFPX03.AFP3
2E 31 02 10 52 61 6E 64 6E 75 6D 20 45 78 63 68 .1..Randnum Exch
61 6E 67 65 16 32 2D 57 61 79 20 52 61 6E 64 6E ange.2-Way Randn
75 6D 20 65 78 63 68 61 6E 67 65 31 34 30 2E 31 um exchange140.1
36 30 2E 32 34 37 2E 32 37 00 00 01 06 01 8C A0 60.247.27.......

Note the lack of a Diffie-Hellman exchange, which means no passwords of 9 or more characters. Happily, it'll take the first 8 characters of your 156-character pass-sentence, so long as you have a Universal Password.
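The dump is an AFP server-status (FPGetSrvrInfo) reply wrapped in a DSI header, and the UAM list near the end is what tells the story. Below is a Python sketch that decodes it, based on my reading of Apple's AFP documentation: skip the 16-byte DSI header, then follow the offsets at the start of the payload to the AFP-version and UAM lists, each of which is a count byte followed by length-prefixed strings. The function names are my own. Apple's Diffie-Hellman UAM would appear as `DHCAST128`, and it isn't there.

```python
# Hex bytes from the capture above (offset and ASCII columns dropped).
DUMP = """
01 03 00 01 00 00 00 00 00 00 01 B2 00 00 00 00
00 18 00 2F 00 72 00 B2 00 BF 06 43 52 4F 4E 55
53 00 00 9B 00 AB 00 00 16 4E 6F 76 65 6C 6C 20
4E 65 74 57 61 72 65 20 35 2E 37 30 2E 30 33 06
0E 41 46 50 56 65 72 73 69 6F 6E 20 31 2E 31 0E
41 46 50 56 65 72 73 69 6F 6E 20 32 2E 30 0E 41
46 50 56 65 72 73 69 6F 6E 20 32 2E 31 06 41 46
50 32 2E 32 06 41 46 50 58 30 33 06 41 46 50 33
2E 31 02 10 52 61 6E 64 6E 75 6D 20 45 78 63 68
61 6E 67 65 16 32 2D 57 61 79 20 52 61 6E 64 6E
75 6D 20 65 78 63 68 61 6E 67 65 31 34 30 2E 31
36 30 2E 32 34 37 2E 32 37 00 00 01 06 01 8C A0
"""

def pascal_string(buf, pos):
    """Return (string, next position) for a length-prefixed string at pos."""
    n = buf[pos]
    return buf[pos + 1:pos + 1 + n].decode("ascii"), pos + 1 + n

def string_list(buf, pos):
    """Read a count byte followed by that many length-prefixed strings."""
    items = []
    count, pos = buf[pos], pos + 1
    for _ in range(count):
        s, pos = pascal_string(buf, pos)
        items.append(s)
    return items

def parse_server_info(raw):
    payload = raw[16:]                      # skip the 16-byte DSI header

    def offset_at(i):                       # 2-byte big-endian offset field
        return int.from_bytes(payload[i:i + 2], "big")

    return {
        "server": pascal_string(payload, 10)[0],   # follows the offsets and flags
        "machine": pascal_string(payload, offset_at(0))[0],
        "afp_versions": string_list(payload, offset_at(2)),
        "uams": string_list(payload, offset_at(4)),
    }

info = parse_server_info(bytes.fromhex("".join(DUMP.split())))
print(info["uams"])                  # ['Randnum Exchange', '2-Way Randnum exchange']
print("DHCAST128" in info["uams"])   # False
```

As I understand it, both Randnum UAMs use the password as an 8-byte DES key, which is where the eight-character limit comes from.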

Logical extension of MyWeb

Over a year ago, when the MyWeb service was given to me to make happen, I made a prediction.

At some point, they're going to ask me to provide pretty much direct file-system access by way of Apache.

It took over a year, and lots of instability probably contributed to it taking this long, but I just got the request to provide, "MyWeb, but for departmental shared areas. A sort of DeptWeb." That's right boys and girls, they want to provide departmental web-pages from Netware. While I'm cool with that in theory, the execution doth suck. And this is why.

You see, in order to provide that, I need to use mod_edir again. While I'm fully versed in it, mod_edir (and more specifically the LibC calls that allow it to work at all) isn't stable enough for production usage yet. NW65SP3 might fix that, but I'm still a few weeks away from knowing that for certain. If SP3 does fix it, then yes, we can do it. If it doesn't, then DeptWeb will have the same uptime as MyWeb. Which, if you hadn't noticed, is best described as, "most of the time, we think."

No, MyWeb is not a service capable of 'two nines' uptime. Heck, Intermapper reports our uptime for student MyWeb at 95.7%. Hardly a reliable service. DeptWeb will be just as wibbly, and I'm pretty sure that won't sit well with the grand high muckity mucks who send out an all-campus mailing for some prestige event, and the URL to the PDF that explains it all is not working.
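To put 95.7% in perspective, here is the arithmetic, assuming a 30-day month:

```python
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month, 720 hours

def downtime_hours(uptime_pct, period_hours=HOURS_PER_MONTH):
    """Hours of downtime implied by an uptime percentage over a period."""
    return (100.0 - uptime_pct) / 100.0 * period_hours

print(f"MyWeb at 95.7%:  {downtime_hours(95.7):.1f} h/month")   # 31.0 h/month
print(f"Two nines (99%): {downtime_hours(99.0):.1f} h/month")   # 7.2 h/month
```

So student MyWeb is down for roughly four times as long as even a two-nines service would be.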

SSH into the cluster

SP3 has the LibC update that claims to fix our problem. Sadly, the update needs to be on both ends of the conversation (SSHD-server <-> User-directory server) for the change to be effective. So have patience. We hope to get the cluster updated over break.

NW65SP3 progress

The service-pack so far looks like a good one. We have it on two nodes in the cluster now, and no problems have cropped up. Cautious optimism. Normally we don't apply service-packs this soon after release, but we had a needed fix in the LibC portion of it that mandated we apply as soon as it came out.

sysadmin humor

"fsck this!"

Whrump--BRAAAAAA--RUmRumRumRUmRumRUM-BRAAAAAARRRrrrrrrrrrrrrRRRAAAAAAAA *clunk*
RumRUmRUMRUmRUmRumRum---gasp

"HAH! I'll give you 'bad sectors' you piece of badly organized iron."

NW65SP3 installed

On my first server. And what do you know, they changed the style-sheet for the Netware Remote Manager (thingy on port 8009).



Note the text.

"Novell Open Enterprise Server, NetWare 6.5 Server Version 5.70.03..."

More indication that OES-Netware is just NW65SP3 with bits added.

Passwords in NDS

The NDS password is a very secure password. So secure that Novell had to resort to trickery in order to support other password schemes. Here's why.

When NetWare 4 released, NDS came with it. With NDS also came a new login scheme. Novell actually paid RSA to license their encryption technology, rather than use unencumbered encryption methods. This is where the problem lay.

When you log in to a Novell network from a Novell Client, you are asked for a password. You give it. The Client then requests from the server the RSA certificate for passwords in that particular tree, and encrypts your password with that key. The ciphered value is then compared with the cipher-value stored in your user object. Since the system does not use a public/private key system, just a one-way cipher, there is no simple way to turn the cipher-value back into its cleartext value.
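The general idea of verifying a password by comparing one-way values looks something like the Python sketch below. To be clear, this is not NDS's actual algorithm: PBKDF2-SHA256 from Python's standard library stands in for whatever RSA-based scheme NDS really uses, and `enroll`/`verify` are hypothetical names. What it shows is the property that matters here: the stored value can confirm a guess, but it can't be turned back into the password.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # arbitrary work factor for the sketch

def enroll(password, salt=None):
    """Store only a salted one-way digest of the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(attempt, salt, stored_digest):
    """Recompute the digest for the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = enroll("Correct-Horse-1")
print(verify("Correct-Horse-1", salt, stored))   # True
print(verify("correct-horse-1", salt, stored))   # False
```

Nothing in `(salt, stored)` lets the server hand a Native File Access module the cleartext it needs, which is the interoperability problem in a nutshell.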

Universal Passwords are Novell's way of fixing that. Instead, UP uses 3DES and stores the 3DES keys in multiple ways. Since 3DES is reversible encryption, it is possible to translate the Universal Password into the various password styles required by the various Native File Access protocols.

So in essence, Novell went TOO secure back in the early '90s. Nothing is uncrackable, but NDS passwords are pretty tough. Too tough for interoperability, as it eventually turned out.

Product Releases

The latest Consolidated Support Pack is out, which means that NW65SP3 and NW51SP8 are out. There is no SP for NW6.

Also, OES released today. Good timing, since it'll allow Brainshare in two weeks to be all about OES and how it released on time. As I suspected, the NetWare-kernel part of OES is just NW65SP3 with bits added. Hah!

Universal Password notes

Passwords changed by the following methods will change the Universal Password and get propagated to the eDir password and Simple password:

  • Change-password from 4.9+ client change-password dialogs
  • Change-password from 3.40 client, if NICI 2.6.1 and NMAS Client 2.2 are also installed (not default)
  • Change-password from AFP client, if AFPTCP.NLM is loaded on a NW6.5 server
  • Change-password API call made from a 4.9+ client machine, to an 8.7.3 directory-server hosting a replica of the user's object
  • Change-password from C1, if on a 4.9+ client machine
  • Change-password from C1, in a special tab, if on a machine under v4.9
  • Change-password from NWADMN32, if on 4.9+ client machine (untested)
  • Change-password from iManager, if iManager is on a NW6.5 machine with eDir 8.7.3
  • Change-password from LDAP, if NLDAP.NLM is loaded on a NW6.5 server
Gotchas:
  • Universal Passwords are enabled by container, or by tree. Subcontainers are also included when selecting by-container, but new containers will NOT have UP enabled by default.
  • When Universal Passwords are turned on, the Simple password is written to the Universal Password, not the eDir password. This is because the eDir password is so secure, it can't be retrieved in cleartext, and Simple passwords can.
  • If Universal Passwords are turned on without the sync option set, the UP will be populated at password-change time. Simple/eDir/UP will still be synced, but the initial population of Universal Passwords will not happen when UPs are turned on.
  • The AFP password is still 8 characters (though Mac OS X 10.3 fixed this?). Like Unix, if the Universal Password is longer than 8 characters, the usable password is just the first eight characters.
  • Password policies are created in iManager, which include things such as complexity requirements.
  • Passwords are now case-sensitive. Surprise!
  • Password-policy objects are kept in the security container (.passwordpolicy.security.wwu)
  • The password-change rules are displayed to the end user, if the end user is changing their password from the client (where client is v4.9SP2 or greater)
Advanced Password Restrictions include the following possibilities:
  • Require unique passwords
    • Number of passwords to store
    • Number of days to store a password
  • Password Lifetime
    • Number of days before password can be changed
    • Number of days before password expires
    • Number of grace logins
  • Minimum/Maximum characters in password (1-512)
  • Minimum number of unique characters in the password
  • Maximum number of times a specific character can be used
  • Maximum number of times a specific character can be repeated sequentially
  • Minimum/Maximum number of upper-case characters
  • Minimum/Maximum number of lower-case characters
  • Allow/disallow numeric characters in password
    • Disallow numeric as first/last character
    • Minimum/Maximum number of numerals in password
  • Allow/disallow special characters in password
    • Disallow special characters as first/last character
    • Minimum/Maximum number of special characters in password
  • Password exclusion list, hand edited. NOT INTENDED FOR DICTIONARIES, says so in the documentation.
    • "Instead of a long exclusion list to protect against 'dictionary attacks' on passwords, we recommend that you use the Advanced Password Rules to require numbers to be included in the password."
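To make a few of these rules concrete, here's a Python sketch of a checker covering a subset of them. The `Policy` field names and defaults are my own inventions, not iManager's attribute names, and the real enforcement happens in eDirectory/NMAS, not in anything like this.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    # Hypothetical names/defaults for a subset of the options listed above.
    min_len: int = 8
    max_len: int = 512
    min_unique: int = 5
    max_sequential_repeat: int = 2
    min_upper: int = 1
    min_digits: int = 1
    no_digit_first_last: bool = True
    exclusions: set = field(default_factory=set)

def violations(pw, p):
    """Return the names of the rules pw breaks (an empty list means acceptable)."""
    problems = []
    if not p.min_len <= len(pw) <= p.max_len:
        problems.append("length")
    if len(set(pw)) < p.min_unique:
        problems.append("unique characters")
    run = 1
    for prev, cur in zip(pw, pw[1:]):
        run = run + 1 if cur == prev else 1
        if run > p.max_sequential_repeat:
            problems.append("sequential repeats")
            break
    if sum(c.isupper() for c in pw) < p.min_upper:
        problems.append("upper-case count")
    if sum(c.isdigit() for c in pw) < p.min_digits:
        problems.append("numeral count")
    if p.no_digit_first_last and pw and (pw[0].isdigit() or pw[-1].isdigit()):
        problems.append("numeral as first/last character")
    if pw in p.exclusions:   # exact match; passwords are case-sensitive now
        problems.append("exclusion list")
    return problems

policy = Policy(exclusions={"Password1x"})
print(violations("Wwu2005pass", policy))   # []
print(violations("aaa1", policy))
```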

More monitoring!

The Telecom people are going to let us server folk use their Intermapper server. This is nifty, since we'll be able to build 'maps' that show what's up and down, and also configure e-mail alerts based on what it sees. We've been using BigBrother for a while, and that is clunky. Intermapper is more fully featured, and the notifications can be CONFIGURED!

BigBrother would page/email/whatnot every 5 minutes until the error condition cleared. So if, say, Myfiles went down at 11:45pm, by the time I got up in the morning my pager would have far too many messages on it. That sort of thing. Intermapper allows you to configure how many times it'll bug you. Cool!
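The difference is easy to quantify. Assuming the outage starts at 11:45pm, a page goes out at the start and every 5 minutes after, and I look at the pager at 7:00am (435 minutes later), the sketch below counts the pages. `max_notifications` is a hypothetical stand-in for Intermapper's configurable limit, not its actual setting name.

```python
def pages_sent(outage_minutes, interval=5, max_notifications=None):
    """Pages sent during an outage, paging at minute 0 and every `interval` minutes."""
    total = outage_minutes // interval + 1
    if max_notifications is None:
        return total
    return min(total, max_notifications)

print(pages_sent(435))                        # 88: BigBrother-style
print(pages_sent(435, max_notifications=3))   # 3: capped
```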

Also, they have a way of creating custom probes. This has me salivating. Their SNMP probes are very customizable, so I should be able to do Neat Things with that. Also, they have a TCP probe format that I might be able to jigger to good use. If I get any of this stuff working, I'll probably post the probe-files here. A good number of contributed probes are available here.

Network glitches

You may have noticed a few glitches accessing core services in the last week. Well, we've figured out where they are occurring, but not why. They're backup-related, and they're also CatalystOS-related. I'm not the telecom geek, so I'm not 100% on the details of the switching/routing infrastructure. But here is a diagram of our traffic problem.

ServerA is being backed up by BACKUP. The traffic follows this path:

Switch 1 -100-> PIX -100-> Switch 2 -Gig-> Switch 3 -Gig-> Router 1 -Gig-> Switch 2 -Gig-> Switch 4

Annotated...

Switch 1 -> PIX: Getting out of the firewalled network.
PIX -> Switch 2: Because that's where the PIX is plugged in.
Switch 2 -> Switch 3: Switch 2 sees that the traffic is from VLAN 1, so it sends it to the switch that handles VLAN 1 traffic.
Switch 3 -> Router 1: Switch 3 sees that the traffic is destined for VLAN 2, so it shoves it at a router.
Router 1 -> Switch 2: The router sees that the traffic is destined for VLAN 2 and knows that Switch 2 handles that stuff, so it sends it on to Switch 2.
Switch 2 -> Switch 4: Switch 2 knows that Switch 4 is hosting the port the traffic needs to get to, so it shoves it down the ISL.

How the traffic SHOULD route is like this...

Switch 1 -> PIX -> Switch 2 -> Switch 4

The problem is with Switch 2. The traffic is not enough to saturate it, even with it handling the data stream twice, and the CPU is not bombing. Somehow, during this backup procedure it stops processing all traffic except the backup traffic. We're trying to figure out why. This is strongly correlated with the CatOS upgrade we did last week. CatOS upgrades are major mojo, so we're not about to backrev just for this if there are workarounds we can use instead.
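Counting hops makes the detour obvious. The short Python sketch below just encodes the two paths from the diagrams above, using the same device names.

```python
from collections import Counter

ACTUAL = ["Switch 1", "PIX", "Switch 2", "Switch 3", "Router 1", "Switch 2", "Switch 4"]
INTENDED = ["Switch 1", "PIX", "Switch 2", "Switch 4"]

def hops(path):
    """Number of links traversed along the path."""
    return len(path) - 1

print(hops(ACTUAL), hops(INTENDED))   # 6 3
print(Counter(ACTUAL)["Switch 2"])    # 2: Switch 2 carries the stream twice
```

Twice the hops, and the one switch that's misbehaving sits on the path twice.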

Funny movie

John Cleese in a flash movie denigrating tape backup. How can you go wrong?

Full disclosure: It's an infomercial for a to-disk backup product. Still funny.

Listen real close to the Manager when he shows up.

Minor update

Pretty quiet the past few days.

We had to take both clusters down for about 15 minutes while telecom upgraded the IOS on the switches servicing the clusters. That went with nary a hitch. The first part of this was done a few weeks ago, so this should be the last time we have to do this for them for a while.

We're preparing to get the cluster and backup servers onto a gig-E blade. That'll improve things! For a while all the student services were housed on one cluster node, and that was averaging 15-20% utilization on 100-meg Ethernet. We have slack, but it's getting into the area where performance degradation starts creeping in.
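Rough numbers on the gig-E move: 15-20% of 100-meg Ethernet is 15-20Mbps of sustained traffic, which would be only 1.5-2% of a gigabit link. A quick check, assuming the load stays the same after the move:

```python
def utilization_pct(mbps, link_mbps):
    """Percentage of a link consumed by a given traffic rate."""
    return 100.0 * mbps / link_mbps

for mbps in (15, 20):   # observed 15-20% of a 100Mbps link, expressed in Mbps
    print(f"{mbps} Mbps = {utilization_pct(mbps, 100):.0f}% of 100Mb, "
          f"{utilization_pct(mbps, 1000):.1f}% of gig-E")
```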

Musings on the printing problem

Novell is rewriting the print-pooling code, so this is sort of moot. But I just got a second good network capture of what The Problem looks like at the wire level. At 1:22:05pm today, PRHH245-1 suffered the problem. And I got a sniff. Woohoo!

Normal traffic (as far as I can figure):
(S = Print server, P = Printer)
[handshake]
  1. S -> P: LPR Print-queue name
  2. P -> S: OK
  3. S -> P: Control-File name will be...
  4. P -> S: OK
  5. S -> P: Control-file transmission (includes header info like the filename, job owner, print-server)
  6. P -> S: OK
  7. S -> P: Data-File name will be...
  8. P -> S: OK
  9. S -> P: Data-file transmission
  10. P -> S: OK
  11. [session teardown]
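For reference, here's a Python sketch of the same exchange from the print server's side, following my reading of RFC 1179 (the LPD protocol), in which each "OK" is a single zero byte. `lpr_messages`, `send_lpr_job`, the control-file contents, and the `cfA`/`dfA` naming are illustrative assumptions, not a transcript of what NDPS actually sends.

```python
ACK = b"\x00"   # LPD's one-byte "OK", also sent after each file's contents

def lpr_messages(queue, host, user, job_num, data, job_name="job"):
    """Build what the print server sends at steps 1, 3, 5, 7 and 9."""
    cf_name = f"cfA{job_num:03d}{host}"
    df_name = f"dfA{job_num:03d}{host}"
    control = f"H{host}\nP{user}\nJ{job_name}\nl{df_name}\n".encode("ascii")
    return [
        b"\x02" + queue.encode("ascii") + b"\n",                  # 1: queue name
        b"\x02" + f"{len(control)} {cf_name}\n".encode("ascii"),  # 3: control-file name/size
        control + ACK,                                            # 5: control file
        b"\x03" + f"{len(data)} {df_name}\n".encode("ascii"),     # 7: data-file name/size
        data + ACK,                                               # 9: data file
    ]

def send_lpr_job(conn, messages):
    """Send each message and wait for the printer's OK (steps 2, 4, 6, 8, 10)."""
    for step, msg in zip((1, 3, 5, 7, 9), messages):
        conn.sendall(msg)
        reply = conn.recv(1)
        if reply != ACK:   # b"" means the other side tore the session down
            raise RuntimeError(f"no OK after step {step}: {reply!r}")
```

Against a real printer, `conn` would be a socket opened to TCP port 515 (for example with `socket.create_connection`), and closing it afterwards is step 11.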
The problem shows up at step 9: instead of the data-file transmission, you get a session teardown. What is also telling to me is that there are two print jobs attempting to print essentially simultaneously; at the wire level, they start about 150ms apart. As I page through the capture, I don't see that sort of occurrence happening anywhere else.
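Spotting that pattern in a long capture is easy to automate. Below is a Python sketch that flags consecutive job starts closer together than a threshold; the input format (job label, start time in ms) is hypothetical and would have to be exported from the capture tool by hand.

```python
def near_simultaneous(starts, window_ms=150):
    """Return pairs of jobs whose start times are within window_ms of each other."""
    ordered = sorted(starts, key=lambda s: s[1])
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:]) if b[1] - a[1] <= window_ms]

print(near_simultaneous([("job A", 0), ("job B", 150), ("job C", 5000)]))
# [('job A', 'job B')]
```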

So, looks like software is tripping up, rather than something strange happening on the wire. At least, that's what my two datapoints tell me so far.

RSS bots trolling this blog

In very rough order by number of hits:

Feed Burner
MagpieRSS
SharpReader
Bloglines
Everyfeed spider
PubSub.com RSS reader
Technoratibot
BlogPulse
Java
Blogsnowbot
NewsFire
UniversalFeedParser
Blogshares Spider
BlogzIce
Blogwise CacheBuilder
Multimap Geotag Blog Parser
The World as a Blog

Lotta blog-parsers out there.

Central vs. Distributed

I've been pondering the different organizational styles of OldJob and here. Both have undergone centralization of the IT function, but with different results. And also, different methods.

At OldJob, centralization was done by fiat with the details worked out later. The IT director spent a lot of time managing the different areas. There were still two major silos of IT that connected with central IT only in theory, and rarely in practice; after seven years that wall was only just showing signs of weakness.

Here at WWU, that was done by diplomacy and bribery. The first step was to set up a central e-mail system, and offer that to everyone; this gave the smaller departments an option to get out of the e-mail biz that many took. Then came a central printing system that offered the same thing. The central file-server farm was an attempt to do that as well. The SAN has provided another incentive, in that it provides fast, high capacity, backed-up storage, that is sparking another round of migrations.

There are still two big departments that are resisting the migration to central services, and they're big enough that they can stonewall as long as they want to. They may move in the future, but it is far from a sure thing.