Recently in netware Category

On a Wednesday in August in 1996, the WWU NDS tree was born. There were other trees, but this is the one that everyone else merged into. The one tree to rule them all. That was NetWare 4. It brought the directory, and it was glorious (when it worked right).

And now, most of 14 years later, it is done. The last replica servers were powered off today after a two year effort to disentangle WWU from NetWare.

I have some blog-header text to change.
One of the more annoying problems with password and account-lockout policies in Active Directory has been that they apply to every account universally. I you want to force your users to change passwords every 90 days, with account lockout after a certain number of bad login attempts, then the same policies apply to your 'Administrator' user. Account lock-out was a really great way to DoS yourself in really critical ways.

In a way, that's what account-lockout is all about. It's to keep bad people from coming in, but its also a way for bad people from preventing legitimate people from using their own accounts. You need to take the good with the bad.

Since we were a NetWare shot for y-e-a-r-s we're very used to Intruder Lockout (ILO), and losing it during the move to Windows was seen as a loss of a key security feature. We had accounts that had to be exempted from lockout, which was dead easy in eDirectory but very difficult in AD.

Happily, Server 2008 introduces a way to do this. It's called "Fine-Grained Password Policy", and is NOT group-policy based. This was somewhat surprising. Getting this requires raising the domain and forest functional levels to the 2008 level. What it allows is setting password policy based on group memberships, with conflict resolution handled by a priority setting on the policy itself. Interestingly, the actual policies are created through ASDI Edit, so they're not beginner-friendly.

For instance, we can set a 'lock out after 6 tries in 30 minutes' setting to the Domain Users group at a Priority of 30, and a second 'never lock out ever' policy to the Domain Admins group at a Priority of 20. That way 'Administrator' will have the never-lock policy apply to it, but Joe User will have the lock-after-6-in-30 policy apply. This works best if the password policy specifies that Domain Admins need to have very complex and long passwords, which makes a brute-force cracking attempt take unreasonably long amounts of time.

We put this in place a few weeks ago, and it is working as we expected. SO GLAD to have this.

TCP problems

| 3 Comments | No TrackBacks
My testing for a cheap NAS solution has progressed to the option that costs the most money, Windows 2008 running KernSafe's iStorage. As it happens, it works really well when the iSCSI initiator is Windows but Linux clients don't really want to talk to it. Windows: 30-50 MB/s. Linux: 3-5 MB/s. Biiiig difference there.

Looking at packets I'm noticing a similar pattern on the wire to one I'd seen before. Back when I was troubleshooting exactly why NetWare backups to DataProtector were horrible I came across this problem. It seems that TCP Windowing is fundamentally broken between Server 2008 and NetWare which leads to really bad throughputs, which in turn is very bad for half TB backups. The receiving server seemed to feel the need to ACK after every two packets, which in turn really slowed things down. And that's what the Linux clients are doing for iSCSI to Server 2008.

It has to be something affecting basic TCP services but not complex protocols. Using smbclient to upload a 4GB DVD iso runs at 50MB/s but the iSCSI throughput on the same client is a piddly 3-5MB/s. I'm sure some kind of tuning on either side might be able to jar things loose, heaven knows Linux 2.6.31 is a heck of a lot more current on TCP settings than NetWare 6.5 SP8 is. I just haven't found it yet.

Conversely, Server 2008 talking to a Linux iSCSI client works at line speed pretty much. I'm testing this for completeness's sake. We need something that can serve up to 30TB via both iSCSI and SMB. My findings aren't fully complete yet, but in general:
  • OpenFiler: GREAT iSCSI host, completely blows for SMB in our environment.
  • OpenSolaris: Great iSCSI host, just can't convince the kernel-mode CIFS to join our domain. Also, worst-of-breed random I/O performance.
  • OpenFiler + Windows: OpenFiler for iSCSI, Windows (mounting an iSCSI share) for SMB. Should work GREAT. Current best-best for the future.
  • OpenSolaris + Windows: As previous option, but I/O problems make it less attractive.
  • Windows + KernSafe: GREAT SMB performance, solid iSCSI for Windows hosts. Linux hosts will take lots of tuning (perhaps, or it could be intractable).

The passing

| No Comments | No TrackBacks
At 16:40 this afternoon I issued the final 'cluster down' command on the WUF cluster. This 6 node NetWare cluster was born August 26, 2003 as NetWare 6.0. It replaced a trio of large file servers (Huey, Dewey, and Louie) that had been providing file-serving to campus, and allowed this critical function to be provided in a highly available way.

As of 16:40 April 12th, 2010, WWU Information Technology Services was no longer in the Novell File Serving business. Other entities on campus still provide this service. ITS continues to provide identity management and replica hosting to eDirectory.

The remains of WUF will be cleaned up over the next couple of days.
I'm going over some of my older posts and am reposting some of the good stuff that's still relevant. I've been at this a while, so there is a good week's worth of good essays hiding in the archives.
Shortly after the release of Novell OES SP1, the version of Open Enterprise Server based on SuSE Linux 9, I ran a benchmark series to determine just how it would hold up in our environment. The results were pretty clear: not that good. I re-ran some of the tests with later versions and it got a lot better. SP2 improved things significantly, and has gotten even better with OES2 (based on SLES10).

The long and short of it is that the 32-bit Linux kernel has some design constraints that simply prevented Novell from designing a NetWare-equivalent system when it came to NCP performance. The 64-bit kernel that came with OES2 helped a lot. Also, more intelligent assumptions about usage.

Our big problem was concurrency. Our cluster nodes regularly ran between 2000-6000 concurrent connections. Anyway, for details about what I found, read the series:

Benchmark Results Summary

It has pictures. Oooo!

That TCP Windowing fault

| 2 Comments | No TrackBacks
Here is the smoking gun, let me show you it (new window).

That's an entire TCP segment. Packet 339 there is the end of the TCP window as far as the NetWare side is concerned. Packet 340 is a delayed ACK, which is a normal TCP timeout. Then follows a somewhat confusing series of packets and the big delay in packet 345.

That pattern, the 200ms delay, and 5 packets later a delay measurable in full seconds, is common throughout the capture. They seem to happen on boundaries between TCP windows. Not all windows, but some windows. Looking through the captures, it seems to happen when the window has an odd number of packets in it. The Windows server is ACKing after every two packets, which is expected. It's when it has to throw a Delayed ACK into the mix, such as the odd packet at the end of a 27 packet window, is when we get our unstable state.

The same thing happened on a different server (NW65SP8) before I turned off "Receive Window Auto Tuning" on the Server 2008 server. After I turned that off, the SP8 server stopped doing that and started streaming at expectedly high data-rates. The rates still aren't as good as they were when doing the same backup to the Server 2003 server, but at least it's a lot closer. 28 hours for this one backup versus 21, instead of over 5 days before I made the change.

The packets you see are for an NW65 SP5 server after the update to the Windows server. Clearly there are some TCP/IP updates in the later NetWare service-packs that help it talk to Server 2008's TCP/IP stack.

Sniffing packets

| 2 Comments | No TrackBacks
When I first started this sysadmin gig 'round about 1997, Windows based packet sniffers were still in their infancy. In fact, the word 'sniffer' was (and probably still is) a trademarked term for the software and hardware package for, er, sniffing packets. Sniffer. So when I needed to figure out a problem on the network, I went to the Network Guys who plugged their Sniffer into any available port on the 10baseT hub I needed analysis on and went to work. They told me what was wrong. Like a JetDirect card transmitting packets whenever it sensed a packet on the wire, thus bringing the network to is knees. Things like that.

Time passed and Sniffer was bought by Network Associates. Who then added a zero to the price because that package really did have a lock on the market. The next rev then more than doubled the already inflated price. So when it came time to renew/upgrade, our Sniffer couldn't handle Fast Ethernet, the price was eye watering. So. On came the free sniffers.

At first I was using Ether Boy, a now long lost packet sniffer. But eventually I found Ethereal (now WireShark), and I went to work. By the time I left my old job in 2003 I already had a rep for knowing WTF I was looking at, and the network guys didn't bat an eyelash when I asked for a span port. This ability was very handy when diagnosing slow Novell logins.

Fast forward to now. Right now I'm trying to figure out why the heck a certain NetWare server is so slow talking to the Data Protector media agent. It isn't obviously a TSA problem, but I've had problems with DP and NW talking to each other on the TCP level so that's where I'm looking now. Unfortunately for me, the desktop-grade GigE nic I have on the span isn't, shall we say, resourced enough to sniff a full GigE stream without at least a few buffer overruns. So I'm not getting ALL of the packets.

When I asked for the span port, the telecom guy said he greatly respected my ability to dig in to TCP issues. And said it in the voice of, "I think you're better at that kind of troubleshooting than we are." Which is a bit disconcerting to hear from your telecom router-gods. But there it is. What it means is that I can't very well ask for help interpreting these traces.

So far I've been able to determine that there is something hinky going on with network delays. There are some 200ms delays in there, which hints strongly at a failed protocol negotiation somewhere. But there are some rather longer delays, and it could be due to window size negotiation problems. Server 2008, the media-agent server, has a much newer TCP/IP stack than NetWare so it is entirely possible that they just don't work well together. I don't understand that quite well enough to manually deconstruct what's going on, so that's what I'm googling on right now.

And why Saturday? Because of course the volume that's doing this is our single largest and it is on the weekend where it is in the failed state where I can pry the hood off and look. Who knows, I may resort to posting packets and crowd sourcing the problem.

Update 12/23/09: Found it.

Old hardware

| No Comments | No TrackBacks
Watching traffic on the opensuse-factory mailing list has brought home one of the maxims of Linuxdom that has been true for over a decade: People run Linux on some really old crap. And really, it makes sense. How much hardware do you really need for a router/firewall between your home network and the internet? Shoving packets is not a high-test application if you only have two interfaces. Death and fundamental hardware speed-limits are what kills these beasts off.

This is one feature that Linux shares with NetWare. Because NetWare gets run on some really old crap too, since it just works, and you don't need a lot of hardware for a file-server for only 500 people. Once you get over a 1000 or very large data-sets the problem gets more interesting, but for general office-style documents... you don't need much. This is/was one of the attractions for NetWare, you need not much hardware and it runs for years.

On the factory mailing list people have been lamenting recent changes in the kernel and entire environment that has been somewhat deleterious for really old crap boxes. The debate goes back and forth, but at the end of the day the fact remains that a lot of people throw Linux on hardware they'd otherwise dispose of for being too old. And until recently, it has just worked.

However, the diaspora of hardware over the last 15 years has caught up to Linux. Supporting everything sold in the last 15 years requires a HELL of a lot of drivers. And not only that, but really old drivers need to be revised to keep up with changes in the kernel, and that requires active maintainers with that ancient hardware around for testing. These requirements mean that more and more of these really old, or moderately old but niche, drivers are drifting into abandonware-land. Linux as an ecosystem just can't keep up anymore. The Linux community decries Windows for its obsession with 'backwards compatibility' and how that stifles innovation. And yet they have a 12 year old PII box under the desk happily pushing packets.

NetWare didn't have this problem, even though it's been around longer. The driver interfaces in the NetWare kernel changed a very few times over the last 20 years (such as the DRV to HAM conversion during the NetWare 4.x era, and the introduction of SMP later on) which allowed really old drivers to continue working without revision for a really long time. This is how a 1998 vintage server could be running in 2007, and running well.

However, Linux is not NetWare. NetWare is a special purpose operating system, no matter what Novell tried in the late 90's to make it a general purpose one (NetWare + Apache + MySQL + PHP = a LAMP server that is far more vulnerable to runaway thread based DoS). Linux is a general purpose operating system. This key difference between the two means that Linux got exposed to a lot more weird hardware than NetWare ever did. SCSI attached scanners made no sense on NetWare, but they did on Linux 10 years ago. Putting any kind of high-test graphics card into a NetWare server is a complete waste, but on Linux it'll give you those awesome wibbly-windows.

There comes a time when an open source project has to cut away the old stuff. Figuring this out is hard, especially when the really old crap is running under desks or in closets entirely forgotten. It is for this reason that Smolt was born. To create a database of hardware that is running Linux, as a way to figure out driver development priorities. Both in creating new, missing drivers, and keeping up old but still frequently used drivers.

If you're running a Pentium 2-233 machine as your network's NTP server, you need to let the Linux community know about it so your platform maintains supportability. It is no longer good enough to assume that if it worked in Linux once, it'll always work in Linux.

Account lockout policies

| No Comments | No TrackBacks
This is another area where how Novell and Microsoft handle a feature differ significantly.

Since NDS was first released back at the dawn of the commercial internet (a.k.a. 1993) Novell's account lockout policies (known as Intruder Lockout) were set-able based on where the user's account existed in the tree. This was done per Organizational-Unit or Organization. In this way, users in .finance.users.tree could have a different policy than .facilities.users.tree. This was the case in 1993, and it is still the case in 2009.

Microsoft only got a hierarchical tree with Active Directory in 2000, and they didn't get around to making account lockout policies granular. For the most part, there is a single lockout policy for the entire domain with no exceptions. 'Administrator' is subjected to the same lockout as 'Joe User'. With Server 2008 Microsoft finally got some kind of granular policy capability in the form of "Fine Grained Password and Lockout Policies."

This is where our problem starts. You see, with the Novell system we'd set our account lockout policies to lock after 6 bad passwords in 30 minutes for most users. We kept our utility accounts in a spot where they weren't allowed to lock, but gave them really complex passwords to compensate (as they were all used programatically in some form, this was easy to do). That way the account used by our single-signon process couldn't get locked out and crash the SSO system. This worked well for us.

Then the decision was made to move to a true blue solution and we started to migrate policies to the AD side where possible. We set the lockout policy for everyone. And we started getting certain key utility accounts locked out on a regular basis. We then revised the GPOs driving the lockout policy, removing them from the Default Domain Policy, creating a new "ILO polcy" that we applied individually to each user container. This solved the lockout problem!

Since all three of us went to class for this 7-9 years ago, we'd forgotten that AD lockout policies are monolithic and only work when specified in Default Domain Policy. They do NOT work per-user the way they are in eDirectory. By doing it the way we did, no lockout policies were being applied anywhere. Googling on this gave me the page for the new Server 2008-era granular policies. Unfortunately for us, it requires the domain to be brought to the 2008 Functional Level, which we can't do quite yet.

What's interesting is a certain Microsoft document that suggested settings of 50 bad logins every 30 minutes as a way to avoid DoSing your needed accounts. That's way more that 6 every 30.

Getting the forest functional level raised just got more priority.

Windows 7 releases!

| 2 Comments | No TrackBacks
Or rather, its retail availability is today. We're on a Microsoft agreement, so we've had it since late August. And boy do I know that. I've been having a trickle of calls and emails ever since the beta released about various ways Win7 isn't working in my environment and whether I have any thoughts about that. Well, I do. As a matter of fact, Technical Services and ATUS both have thoughts on that:

Don't use it yet. We're not ready. Things will break. Don't call us when it does.

But as with any brand new technology there is demand. Couple that with the loose 'corporate controls' inherent in a public Higher Ed institution and we have it coming in anyway. And I get calls when people can't get to stuff.

The main generator of calls is our replacement of the Novell Login Script. I've spoken about how we feel about our login script in the past. Back on July 9, 2004 I had a long article about that. The environment has changed, but it still largely stands. Microsoft doesn't have a built in login script the same way NetWare/OES has had since the 80's, but there are hooks we can leverage. One of my co-workers has built a cunning .VBS file that we're using for our login script, and does the kinds of things we need out of a login script:
  • Run a series of small applications we need to run, which drive the password change notification process among other things.
  • Maps drives based on group membership.
  • Maps home directories.
  • Allows shelling out to other scripts, which allows less privileged people to manage scripts for their own users.
A fair amount of engineering did go into that script, but it works. Mostly. And that's the problem. It works good enough that at least one department on campus decided to put Vista in their one computer lab and rely on this script to get drive mappings. So I got calls shortly after quarter-start to the effect of, "your script don't work, how can this be fixed." To which my reply was (summarized), "You're on Vista and we told y'all not to do that. This isn't working because of XYZ, you'll have to live with it." And they have, for which I am greatful.

Which brings me to XYZ and Win7.

The main incompatibility has to do with the NetWare CIFS stack. Which I describe here. The NetWare CIFS stack doesn't speak NTLMv2, only LM and NTLM. In this instance, it makes it similar to much older Samba versions. This conflicts with Vista and Windows 7, which both default their LAN Manager Authentication Level to "NTLMv2 Responses Only." Which means that out of the box both Vista and Win7 will require changes to talk to our NetWare servers at all. This is fine, so long as they're domained we've set a Group Policy to change that level down to something the NetWare servers speak.

That's not all of it, though. Windows 7 introduced some changes into the SMB/CIFS stack that make talking to NetWare a bit less of a sure thing even with the LAN Man Auth level set right. Perhaps this is SMB2 negotiations getting in the way. I don't know. But for whatever reason, the NetWare CIFS stack and Win7 don't get along as well as the Vista's SMB/CIFS stack did.

The main effect of this is that the user's home-directory will fail to mount a lot more often on Win7 than on Vista. Also, other static drive mappings will fail more often. It is reasons like these that we are not recommending removing the Novell Client and relying on our still in testing Windows Login Script.

That said, I can understand why people are relying on the crufty script rather than the just-works Novell Login Script. Due to how our environment works, The Vista/Win7 Novell Client is dog slow. Annoyingly slow. So annoyingly slow that not getting some drives when you log in is preferable to dealing with it.

This will all change once we move the main file-serving cluster to Windows 2008. At that point, the Windows script should Just Work (tm). At that point, getting rid of the Novell Client will allow a more functional environment. We are not at that point yet.

Other Blogs

My Other Stuff

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

July 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.