January 2014 Archives

Not a bad observation

A friend of mine recently posted some job stuff and he had a good observation:

I investigate businesses that pay employees under the table. I ensure that unemployment insurance is paid by the employers, protecting the employees and ensuring they get unemployment insurance if they get laid off (if they get paid under the table they don't get unemployment).

I have been picking up a lot of businesses who are avoiding taxes (surprisingly, or maybe not, software companies are a big issue, along with housecleaners and dog groomers/sitters/walkers).

Emphasis mine.

You know, that's an interesting point, and it doesn't surprise me much. He does his work in the Seattle area, which is one of the major tech hubs. And one thing tech startups are known for is distributed offices. Take a 10-person company with people in 6 different states and no one who has run a company like that before, and you have prime conditions for dropping the ball on unemployment reporting and payment.

So you fired the slacker living in Waukegan, Illinois. Did you report their earnings in Illinois, where they live; Wisconsin, where the shared office they 'worked' out of was; or Washington State, where HQ is?

aaahhh.... lemme get back to you on that.

As he tells me, that can be a very expensive mistake to make depending on how long the misunderstanding was in place. Your payroll vendor may or may not know WTF they're doing with a startup-style distributed office, so don't rely solely on them. Work location and residential location are different things. You can work in Vancouver, WA but live in Portland, OR; you pay Oregon income taxes, but will earn Washington unemployment if you get laid off.

It all began with a bit of Twitter snark:


[Image: SmallLAMPStack.png]

Utilities follow a progression. They begin as a small shell script that does exactly what I need it to do in this one instance. Then someone else wants to use it, so I open source it. 10 years of feature-creep pass, and then you can't use my admin suite without a database server, a web front end, and just maybe a worker-node or two. Sometimes bash just isn't enough, you know? It happens.

Anyway...

Back when Microsoft was just pushing out the 2007 iteration of all of their enterprise software, they added PowerShell support to most things. This was loudly hailed by some of us, as it finally gave us easy scriptability into what had always been a black box with funny screws on it to prevent user tampering. One of the design principles they baked in was that they didn't bother building UI elements for things you'd only do a few times, or once a year.

That was a nice time to be a script-friendly Microsoft administrator, since most of the tools would give you their PowerShell equivalents on one of the Wizard pages, so you could learn by practical example a lot more easily than you could otherwise. It was a really nice way to pick up the 'how to do a complex thing in PowerShell' bits. Of course, you still had to learn variable passing, control loops, and other basic programming stuff, but you could see right there what the one-liner for that next -> next -> next -> finish wizard was.

[Image: SmallLAMPStack-2.png]

One thing a GUI gives you is a much shallower on-ramp to functionality. You don't have to spend an hour or two feeling your way around a new syntax in order to do one simple thing; you just visually assemble your bits, hit next, then finish, then done. You usually have the advantage of a documented UI explaining what each bit means, a list of fields you have to fill out, and syntax checking on those fields, all of which tells you a lot about what kinds of data a task requires. If it spits out a blob of scripting at the end, even better.

An IDE, tab-completion, and other such syntactic magic help scripters build what they need, but it all relies upon on-the-fly programmatic interpretation of syntax in a script-builder. It's the CLI version of a GUI, so it doesn't have the stigma of 'graphical' ("if it can't be done through bash, I won't use it," said the Linux admin).

Neat GUIs and scriptability do not need to be diametrically opposed things; ideally a system should have both: a GUI to aid discoverability and teach a bit of scripting, and scripting for site-specific custom workflows. The two interface paradigms come from different places, but as Microsoft has shown, you can definitely make one tool support the other. More things should follow their example.

Worried about the IPv4 to IPv6 migration?

NetWare users had a similar migration when Novell finally got off of IPX and moved to native TCP/IP with the release of NetWare 5.0 in or around 1998. We've done it before. Like the IPv6 transition, it was reasons other than "because it's a good idea" that pushed for the retirement of IPX from the core network. Getting rid of old networking protocols is hard and involves a lot of legacy, so they stick around for a long, long time.

As it happens, IPv6 is spookily familiar to old IPX hands, but better in pretty much every way. It's what Novell had in mind back in the 80's, but done right.

  • Dynamic network addressing that doesn't require DHCP (a sketch of that trick follows this list).
  • A mechanism for whole-network announcements (SAP in IPX, various multicast methods for IPv6)
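
To make the first of those concrete, here is a rough Python sketch of the modified EUI-64 trick that SLAAC-style IPv6 autoconfiguration uses to build a link-local address straight from a MAC address, with no DHCP server involved. This is an illustration of the mechanism, not production code, and the MAC address is made up.

```python
# Illustration: derive an IPv6 link-local address from a MAC address the way
# SLAAC's modified EUI-64 scheme does. No DHCP server is involved.

def eui64_link_local(mac: str) -> str:
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02                                    # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]       # wedge FF:FE into the middle
    groups = [f"{(eui64[i] << 8) | eui64[i + 1]:x}" for i in range(0, 8, 2)]
    return "fe80::" + ":".join(groups)                   # link-local prefix fe80::/64

print(eui64_link_local("00:1b:21:3c:4d:5e"))  # fe80::21b:21ff:fe3c:4d5e
```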

Anyway, you have a network protocol you need to eventually retire, but pretty much everything uses it. What do you do? Like the stages of grief, there is a progression at work here:

  1. Ignore it. We're using the old system just fine, it's going to work for the foreseeable future, no reason to migrate.
  2. On by default, but disabled manually. The installer asks for the new stuff, but we just turn it off as soon as the system is up. We're not migrating yet.
  3. The WAN link doesn't support the old stuff. Um, crap. Tunnel the old stuff over the new stuff for that link and otherwise... continue to not migrate.
  4. Clients go on-by-default, but disabled manually. New clients are supporting the new stuff, but we disable it manually when we push out new clients. We're not migrating.
  5. Clients have trouble with protocol negotiation. Thanks to the tunnel, there is new stuff out there and clients are finding it but can't talk to it, which is creating network delays and causing support tickets. Find ways to disable protocol discovery, and push that out to clients.
  6. Internal support says all the manual changes are harshing their workflow, and can we please migrate since everything supports it now anyway. Okay, maybe we can go dual stack now.
  7. Network team asks if they can turn off the old stuff since everything is also using the new stuff. Say no, and revise deploy guides to start disabling the old stuff on clients but keep it on servers just in case.
  8. Network team asks again since the networking vendor has issued a bulletin on this stuff. Audit servers to see if there is any oldstuff usage. Find that the only usage is between the servers themselves and some really old, extremely broken stuff. Replace the broken stuff, turn off old stuff stack on servers.
  9. Migration complete.

At WWU we finished our IPX-to-IP migration by following this road, and it took us something like 7 years to do it.

Ask yourself where you are in your IPv6 implementation. At WWU when I left we'd gotten to step 5 (but didn't have a step 3).

I've done this before, and so have most old NetWare hands. Appeals to best practices and address-space exhaustion won't work as well as you'd hope; feeling the pain of the protocol transition will, just like we're seeing right now. Migration will happen after operational pain is felt, because people are lazy. We're going to have RFC1918 IPv4 islands hiding behind corporate firewalls for years and years to come, with full migration only happening after devices stop supporting IPv4 at all.

The IPX transition was a private-network only transition since it was never transited over the public Internet. The IPv6 transition is Internet wide, but there are local mitigations that will allow local v4 islands to function for a long, long time. I know this, since I've done it before.

This is a bit of a rehash of a post I did back in 2005, but Novell did it right when it came to handling user credentials way back in the late 80's and early 90's. The original documents have pretty much fallen off the web, but Novell chose to use a one-way RSA method (or possibly a two-way RSA method where they elected not to retain the decryption key, which is much the same thing) to encipher the passwords. The certificate used in this method was generated by the tree itself at creation time, so it was unique per tree.

The authentication process looked something like this (from memory, since the primary documentation is offline); a toy sketch in Python follows the list:

  1. Client connects to a server, says, I want to log in a user, here is a temporary key.
  2. Server replies using the temporary key, "Sure. Here is my public key and a salt."
  3. Client says, "I want to log in bobjoe42.employees.corporate"
  4. Server replies, "Here is the public key for bobjoe42.employees.corporate"
  5. Client crypts the password with bobjoe42's certificate.
  6. Client crypts the cryptotext+salt with the server's public key.
  7. Client says, "Here is the login for bobjoe42.employees.corporate"
  8. Server decrypts the login request to get at the cryptotext+salt for bobjoe42.employees.corporate.
  9. Server removes the salt.
  10. Server compares the submitted cryptotext to the cryptotext on bobjoe42.employees.corporate's object. It matches.
  11. Server says, "You're good."
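
Here's a toy Python model of that flow. Hashing and HMAC stand in for the tree's one-way RSA enciphering and for the server-key wrapping, so this shows the shape of the exchange rather than the actual cryptography; the names and the "tree key" are invented. The point is that the server only ever compares ciphertext against stored ciphertext, and the per-login salt keeps a captured login blob from being replayed.

```python
# Toy model of the exchange above. hashlib/hmac stand in for the one-way RSA
# enciphering and the server-key wrapping; names and keys are invented.
import hashlib
import hmac
import os

TREE_KEY = b"per-tree-secret"  # stand-in for the tree-unique certificate

def encipher(password: str) -> bytes:
    # One-way "enciphering" of the (monocased!) password, as stored on the user object.
    return hashlib.sha256(TREE_KEY + password.lower().encode()).digest()

# Stored on bobjoe42.employees.corporate's object at password-set time:
stored_cryptotext = encipher("CorrectHorseBatteryStaple")

# --- one login transaction ---
salt = os.urandom(16)                               # step 2: server hands out a salt
cryptotext = encipher("CorrectHorseBatteryStaple")  # step 5: client enciphers the password
login_blob = hmac.new(salt, cryptotext, "sha256").digest()  # steps 6-7: bind it to this transaction

# Steps 8-10: the server recomputes the same binding from its stored cryptotext.
expected = hmac.new(salt, stored_cryptotext, "sha256").digest()
print("You're good." if hmac.compare_digest(login_blob, expected) else "Denied.")
```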

Unfortunately, the passwords were monocased before being enciphered.

Fortunately, they allowed really long passwords, unlike many systems of the era (ahem, 1993-vintage UNIX crypt).

That said, this system does a lot of password-handling things right:

  1. Passwords are never passed in the clear over the network, only the enciphered values are transferred.
  2. Passwords are never stored in the clear.
  3. Passwords are never stored in a reversible way.
  4. Reversible keys are never transferred in the clear.
  5. The password submission process prevents replay attacks through the use of a random salt with each login transaction.
  6. The passwords themselves were stored encrypted with tree-unique crypto certificates, so the ciphertext of a password in one tree would look different from the ciphertext of the same password in a different tree.

You can get a similar system using modern web technologies (a rough sketch follows the list):

  1. Client connects to server over SSL. A secure channel is created.
  2. Client retrieves javascript or whatever from the server describing how to encode login credentials.
  3. Client says, "I'm about to log in, give me a salt."
  4. Server returns a salt to the client.
  5. Client computes a salted hash from the user-supplied password.
  6. Client submits, "I'm logging in bobjoe42@zmail.us with this hashtext."
  7. Server compares the hashtext to the password database, finds a match.
  8. Server replies, "You're good, use this token."
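
A minimal sketch of that flow, assuming the server stores a per-user hash and hands out a fresh salt for each login; the function names and the SHA-256/HMAC choices are mine, not any particular framework's:

```python
# Rough sketch of the salted-hash login flow above. The raw password never
# crosses the wire, even inside the SSL channel; only hashtext does.
import hashlib
import hmac
import os

users = {}  # pretend password database: username -> stored hash

def set_password(user: str, password: str) -> None:
    users[user] = hashlib.sha256(password.encode()).digest()

def new_login_salt() -> bytes:
    return os.urandom(16)  # step 4: server returns a salt to the client

def client_hashtext(password: str, salt: bytes) -> bytes:
    # Step 5: client side (the javascript) hashes the password, then binds it to this login's salt.
    return hmac.new(salt, hashlib.sha256(password.encode()).digest(), "sha256").digest()

def server_check(user: str, salt: bytes, hashtext: bytes) -> bool:
    # Step 7: server recomputes from the stored hash and compares.
    expected = hmac.new(salt, users[user], "sha256").digest()
    return hmac.compare_digest(expected, hashtext)

set_password("bobjoe42@zmail.us", "a nice long passphrase")
salt = new_login_salt()
blob = client_hashtext("a nice long passphrase", salt)
print(server_check("bobjoe42@zmail.us", salt, blob))  # True
```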

However, a lot of systems don't even bother going that complex, relying instead on the SSL channel to provide transaction security and allowing unhashed passwords to be passed over that crypted channel. That's "good enough" for a lot of things, and clearly Novell was rather paranoid back in the day.

As it happened, that method ended up being so secure that they had to change their authentication system when it came time to handle systems that wanted to authenticate using non-NCP methods like, oh, CIFS or AppleTalk. Those other protocols don't have mechanisms for the sort of handshake that NCP allows, so something else had to be created, and thus the Universal Password was born. But that's kind of beyond the scope of this article.

Yep, they did it right back then. A sniffer on the network (a lot easier in the days of hubbed networks) was much less likely to yield tasty numnums. SMB wasn't so lucky.

Novell introduced NDS with NetWare 4.0 in 1993, and it is still being shipped 21 years later as part of Open Enterprise Server.

For those of you who've never run into it, NDS (Novell Directory Services, currently marketed as eDirectory) is a distributed LDAP database that also provides non-LDAP interfaces for interacting with the object store. It can scale up to very silly object counts and, thanks to Novell's long experience with distributed database management, does so with a minimum of object corruption. It just works (albeit on a proprietary system).

It didn't start off as an LDAP datastore, though. No, it began life in 1993 as the authentication database behind NetWare and had a few very revolutionary features versus what was available on the market at the time:

  • It allowed multiple servers to use the same authentication database, so you didn't have to have an account on each server if users needed to access more than one of them. This was the biggest selling point, and seems pretty basic right now.
    • NIS/NIS+ already did this and predates NDS, but it was a UNIX-only system, not useful for non-UNIX offices.
  • It ran the database on multiple nodes, which made it a replicated database.
  • It partitioned the database to provide improved database locality, which made it a sharded database.
  • It allowed write operations on more than one replica per shard, which made it a distributed database.
  • It had eventual convergence built into it.
  • It had robust authentication features, which I'll get into in a later post.

NDS was a replicated, distributed, sharded database with eventual convergence, and it was written in 1993. MongoDB can do three of those (replicated, sharded, eventually consistent, though it can distribute reads if needed); Cassandra does all four. This is a solvable problem, but it's a rather complex one, as Novell found out.
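
To show what "eventual convergence" means in practice, here's a cartoon of one common strategy: per-attribute last-writer-wins based on timestamps. This illustrates the general technique, not Novell's actual algorithm; the attributes and timestamps are invented.

```python
# Cartoon of per-attribute last-writer-wins convergence between two replicas.
# Illustration of the general technique only; not Novell's actual algorithm.

def merge(replica_a: dict, replica_b: dict) -> dict:
    """Each attribute is stored as (timestamp, value); the highest timestamp wins."""
    merged = dict(replica_a)
    for attr, (ts, value) in replica_b.items():
        if attr not in merged or ts > merged[attr][0]:
            merged[attr] = (ts, value)
    return merged

# Two replicas of the same user object diverge while a slow WAN link is down...
replica_a = {"telephoneNumber": (1001, "555-0100"), "title": (900, "Analyst")}
replica_b = {"telephoneNumber": (950, "555-0199"), "title": (1100, "Senior Analyst")}

# ...and converge to the same answer no matter which side syncs first.
print(merge(replica_a, replica_b) == merge(replica_b, replica_a))  # True
```

The merge itself is the easy part; the hard part, as the rest of this post describes, is everything around it: nodes vanishing mid-sync and reappearing with stale data, slow links, and partial replicas.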

Consider the state of networking in 1993.

  • 10Mbps Ethernet was high-speed, and was probably hubbed even in the "datacenter".
  • Any enterprise of any size had very slow WAN links connecting small offices to the central site, so you had high-latency links.
  • 16Mbps token-ring was still in frequent enough use NetWare had to support it.
    • Since TR was faster than Ethernet, it was frequently deployed in the datacenter, which necessitated TR to Ethernet bridges.
    • TR was often the edge network as well.
  • The tech industry hadn't yet converged on a single Ethernet Layer 2 framing protocol, so anything talking to Ethernet had to be able to handle up to 4 different framing standards (to the best of my limited knowledge, Cisco gear still can be configured to use any of the three losers of that contest, even though none of them has been in common usage for a long time).
  • TCP/IP was not the only data standard; NetWare used its very own IPX protocol, which is not an IP protocol (more on that in a later post).

Can you imagine trying to run something like Cassandra on 10Mbps links with some nodes on the other side of links with pings approaching 1000ms? It can certainly be done, but it sure as heck magnifies any problems in the convergence protocol.

Novell learned that too. Early versions of NDS were prone to corruption, very prone. Real-world networking conditions were so unlike the conditions the developers had engineered for that it was only after NDS hit production that they truly appreciated the full array of situations it had to support. From memory, it was only after NDS version 6, released on or about NetWare 4.11 Service Pack 3, that it really became stable. That took Novell over 4 years to get right.

Corruption bugs continued in NDS even into the modern era, since that's a very hard problem to stomp. The edge cases surrounding a node disappearing and then reappearing with old/new/changed data, and how convergence happens afterward, get very nuanced, very quickly. The open-source distributed database projects are dealing with that right now.

For all that it was a strong backing database for very large authentication and identity databases, NDS/eDirectory was never designed to be highly transactional. It's an LDAP database, and you use it where you'd use an LDAP database.
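
From a consumer's point of view it looks like any other LDAP directory. A hypothetical query against an eDirectory server using the third-party Python ldap3 library might look like this; the host, DNs, and password are made up.

```python
# Hypothetical example: querying eDirectory like any other LDAP server,
# using the third-party ldap3 library. Host, DNs, and password are made up.
from ldap3 import ALL, Connection, Server

server = Server("edir.example.corp", use_ssl=True, get_info=ALL)
conn = Connection(server, "cn=admin,o=corporate", "not-a-real-password", auto_bind=True)

# Look up a user object in the same o=corporate container used in the examples above.
conn.search("o=corporate", "(cn=bobjoe42)", attributes=["cn", "mail"])
for entry in conn.entries:
    print(entry.entry_dn, entry.cn, entry.mail)
```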

NetWare Retrospective

As I've recently been through a change of jobs, I've had a lot of chances to look back on my career. That career is long enough to have included Novell NetWare quite prominently, though I no longer point that out on my resume unless I feel a specific employer would be impressed by it. Novell was doing a lot of familiar things 20-odd years ago, and this blog series will be a retrospective on some old-yet-new problems that were solved in the 90's but that we're still fighting today.

10 year blog-anniversary

10 years ago today, I had my first post.

This was done as part of the first big project I was given when I started working for WWU: figure out how to serve web pages from home directories. Which I did, and this blog was a way to make sure it actually worked. It did. Back then I used Blogger and their FTP-publish option to maintain this thing; I've since moved on to my own domain and actual blog software.

10 years later I'm also starting a brand new job, and am all of 3 days into it so far. By now I'm just beginning to get a handle on the complexity of the problem I'm facing.

I'm not posting as often as I used to. In part that's because I've been working for places that have intellectual property they need to protect, and talking about what I'm working on is frequently a violation of that; and in part because there are other outlets for the shorter stuff. Twitter, for instance, and even ServerFault.

I'm still here, and still going. Some pointless stats after the cut.