Recently in edir Category

This is a bit of a rehash of a post I did back in 2005, but Novell did it right when it came to handling user credentials way back in the late 80's and early 90's. The original documents have pretty much fallen off the web, but Novell chose to use a one-way RSA method (or possibly a two-way RSA method but elected to not retain the decryption key, which is much the same thing) to encipher the passwords. The certificate used in this method was generated by the tree itself at creation time, so was unique per tree.

The authentication process looked something like this (from memory, see also: primary documentation is offline)

  1. Client connects to a server, says, I want to log in a user, here is a temporary key.
  2. Server replies using the temporary key, "Sure. Here is my public key and a salt."
  3. Client says, "I want to log in bobjoe42.employees.corporate"
  4. Server replies, "Here is the public key for bobjoe42.employees.corporate"
  5. Client crypts the password with bobjoe42's certificate.
  6. Client crypts the cryptotext+salt with the server's signing key.
  7. Client says, "Here is the login for bobjoe42.emploees.corporate"
  8. Server decrypts login request to get at the cryptotext+salt of bobjoe42.emploees.corporate.
  9. Removes salt.
  10. Server compares the submitted cryptotext to the cryptotext on bobjoe42.employees.corporate's object. It matches.
  11. Server says, "You're good."

Unfortunately, the passwords were monocased before crypting computation.

Fortunately, they allowed really long passwords unlike many systems (ahem 1993 version of UNIX-crypt).

That said, this system does a lot of password-handling things right:

  1. Passwords are never passed in the clear over the network, only the enciphered values are transferred.
  2. Passwords are never stored in the clear.
  3. Passwords are never stored in a reversable way.
  4. Reversible keys are never transferred in the clear.
  5. The password submission process prevents replay attacks through the use of a random salt with each login transaction.
  6. The passwords themselves were stored encrypted with tree-unique crypto certificates, so the ciphertext of a password in one tree would look different than the same password in a different tree.

You can get a similar system using modern web technologies:

  1. Client connects to server over SSL. A secure channel is created.
  2. Client retrieves javascript or whatever from the server describing how to encode login credentials.
  3. Client says, "I'm about to log in, give me a salt."
  4. Server returns a salt to the client.
  5. Client computes a salted hash from the user-supplied password.
  6. Client submits, "I'm logging in bobjoe42@zmail.us with this hashtext."
  7. Server compares the hashtext to the password database, finds a match.
  8. Server replies, "You're good, use this token."

However, a lot of systems don't even bother going that complex, relying instead on the SSL channel to provide transaction security and allowing unhashed passwords to be passed over that crypted channel. That's "good enough" for a lot of things, and clearly Novell with rather paranoid back in the day.

As it happened, that method ended up being so secure they had to change their authentication system when it came time to handle systems that wanted to authenticate using non-NCP methods like, oh, CIFS, or Appletalk. Those other protocols don't have mechanisms to handle the sort of handshake that NCP allows so something else had to be created, and thus the Universal Password was born. But that's kind of beyond the scope of this article.

Yep, they did it right back then. A network sniffer on the network (a lot easier in the days of hubbed networks) was much less likely to yield tasty numnums. SMB wasn't so lucky.

Novell introduced NDS with NetWare 4.0 in 1993, and is still being shipped 21years later as part of Open Enterprise Server.

For those of you who've never run into it, NDS (Novell Directory Services, currently marketed as eDirectory) is currently a distributed LDAP database that also provides non-LDAP interfaces for interacting with the object store. It can scale up to very silly object counts and due to Novell's long experience with distributed database management does so with a minimum of object corruption. It just works (albeit on a proprietary system).

It didn't start off as an LDAP datastore, though. No, it began life in 1993 as the authentication database behind NetWare and had a few very revolutionary features versus what was available on the market at the time:

  • It allowed multiple servers to use the same authentication database, so you didn't have to have an account on each server if users needed to access more than one of them. This was the biggest selling point, and seems pretty basic right now.
    • NIS/NIS+ already did this and predates NDS, but was a UNIX-only system not useful for non-UNIX offices.
  • It ran the database on multiple nodes, which made it a replicated database.
  • It partitioned the database to provide improved database locality, which made it a sharded database.
  • It allowed write operations on more than one replica per shard, which made it a distributed database.
  • It had eventual convergence built into it.
  • It had robust authentication features, which I'll get into in a later post.

NDS was a replicated, distributed, sharded database with eventual convergence that was written in 1993. MongoDB can do three of those (replicated, sharded, eventual consistency, but can distribute reads if needed), Cassandra does all four. This is a solvable problem but it's a rather complex one as Novell found out.

Consider the state of networking in 1993.

  • 10Mbps Ethernet was high-speed, and was probably hubbed even in the "datacenter".
  • Any enterprise of any size had very slow WAN links connecting small offices to central, so you had high latency links.
  • 16Mbps token-ring was still in frequent enough use NetWare had to support it.
    • Since TR was faster than Ethernet, it was frequently deployed in the datacenter, which necessitated TR to Ethernet bridges.
    • TR was often the edge network as well.
  • The tech industry hadn't yet converged on a single Ethernet Layer 2 framing protocol, so anything talking to Ethernet had to be able to handle up to 4 different framing standards (to the best of my limited knowledge, Cisco gear stillcan be configured to use any of the three losers of that contest, even though none of them has been in common usage for a long time).
  • TCP/IP was not the only data standard, NetWare used its very own IPX protocol which is not an IP protocol (more on that in a later post).

Can you imagine trying to run something like Cassandra on 10Mbps links with some nodes on the other side of links with pings approaching 1000ms? It can certainly be done, but it sure as heck magnifies any problems in the convergence protocol.

Novel learned that too. Early versions of NDS were prone to corruption, very prone. Real world networking conditions were so very unlike the assumed conditions the developers engineered in that it was only after NDS hit production that they truly appreciated the full array of situations it had to support. From memory, it was only after NDS version 6 released on or about NetWare 4.11 service-pack 3 that it really became stable. That took Novell over 4 years to get right.

Corruption bugs continued in NDS even into the modern era since that's a very hard problem to stomp. The edge cases surrounding a node disappearing, and reappearing with old/new/changed data and how convergence happens gets very nuanced, very quickly. The open-source distributed database projects are dealing with that right now.

For all that it was a strong backing database for very large authentication and identity databases, NDS/eDirectory was never designed to be highly transactional. It's an LDAP database, and you use it where you'd use an LDAP database.

Migrating off of NetWare

| 5 Comments
It has been around a year since we did the heavy lifting of migrating off of NetWare and retiring our eDirectory tree. By this point last year we had our procedures in place, we just needed to pull the trigger and start moving data around. I was asked to provide some hints about it, but the mail bounced with a 550-mailbox-not-found error *ahem*.

Because it's such a narrowly focused topic, and the WWU people who read me lived through it and therefore already know this stuff, I'm putting the meat of the post under the fold.

You're welcome.
There is a certain question that has shown up in pretty much every class about how to set up an X500-compliant directory service (thats things like Active Directory, NDS, and eDirectory). It goes like this:
You have been hired as a consultant to set up $FakeCorpName's new $Directory. They have major offices in five places. New York, Los Angeles, London, Sydney, and Tokyo. They have five $OldTech. What is the directory layout you recommend?
I originally ran into this particular question when I was getting my Certified Novell Administrator certification back in 1996. In that case $Directory was NDS and $OldTech was actually other NDS trees. In 2000/2001 when I was getting my Active Directory training, $Directory was AD and $OldTech was NT4 domains. The names of the countries did not vary much between the two. NYC and LA are always there, as are London and Tokyo. Sometimes Paris is there instead of Sydney. Once in a great while you'll see Hong Kong instead of Tokyo. In a fit of continental inclusiveness, I think I saw "Johannesburg" in there once (in an Exchange class IIRC). I ran into this question again recently in relation to AD.

This is a good academic question, but you will never, ever get it that easy in real life. This question is good for considering how geographically diverse corporate structure impacts your network layout and the knock-on effects that can have on your directory structure. However, the network is only a small part of the overall decision making process when it comes to problems like these.

The major part? Politics.

It is now 2010. Multi-national companies have figured out this 'office networking' thingy and have a pre-existing infrastructure. They have some kind of directory tree, somewhere, even if it only exists in their ERP system (which they all have now). They have office IT people who have been doing that work for 15+ years. A company that size has probably eaten bought out competitors, which introduces strange networking designs to their network. Figuring out how to glue together 5 geographically separate WinNT4.0 domains in 2010 is not useful. The problem is not technical, it's business.

1996
In 1996, WAN links were expensive and slow. NDS was the only directory of note on the market (NIS+ was a unix directory, therefore completely ignored in the normal business windows-only workplace). Access across WAN links was generally discouraged unless specifically needed. Because of this, your WAN links gave you the no-brainer divisions in your NDS tree where Replicas needed to be declared. All the replication traffic would stay within that site and only external reference resolution would cross the expensive WAN. Resources the entire company needed access to might go in a specific, smaller, replica that gets put on multiple sites.

This in turn meant that the top levels of your NDS tree had a kind of default structure. Many early NDS diagrams had a structure like this:

An early NDS diagram
Each of the top-level "C" containers was a replica. The US example was given to show how internal organization could happen. Snazzy! However, this flew in the face of real-work experience. Companies merge. Bits get sold off. By 2000 Novell was publishing diagrams similar to this one:

A later NDS diagram
This one was designed to show how company mergers work. Gone are the early "C" containers, in their place are "O". Merging companies? Just merge that NDS tree into a new O, and tada! Then you can re-arrange your OUs and replicas at your leisure.

This was a sign that Novell, the early pioneer in directories like this, had their theory run smack into reality with bad results. The original tree style with the top level C containers didn't handle mergers and acquisitions well. Gone was the network purity of the early 1996 diagrams, now the diagrams showed some signs of political influence.

2000
In 2000, Microsoft released Windows 2000 and Active Directory. The business world had been on the Internet for some time, and the .com boom was in full swing. WAN links were still expensive and slow, but not nearly as slow as they used to be. The network problem Microsoft was faced with was merging multiple NT4 domains into a single Active Directory structure.

In 2000, AD inter-DC replication was a lot noisier than eDirectory was doing at the time, so replication traffic was a major concern. This is why AD introduced the concept of Sites and inter-Site replication scheduling. Even so, the diagrams you saw then were reminiscent of the 1996 NDS diagrams:
An early AD diagram
As you can see, separate domains for NYC and LA are gone, which is recognition that in-country WAN links may be fast enough for replication, but transcontinental links were still slow. Microsoft handled the mergers-and-acquisitions problem with inter-domain trusts (which, thanks to politics, tend to be hard to get rid of once in place).

AD replication improved with both Server 2003 and Server 2008. The Microsoft ecosystem got used to M&A activity the same way Novell did a decade earlier and changes were made to best practices. Also, network speeds improved a lot.

2010
In 2010 WAN links are still slow relative to LAN links, but they're now fast enough that directory replication traffic is not a significant load for all but the slowest of such links. Even trans-continental WAN links are fat enough that directory replication traffic doesn't eat too much valuable resources.
An AD tree in the modern era
Note how simple this is.There is an empty root to act as nothing but the root of an entire tree. Northwinds is the major company and it recently bought DigitalRiver, but hasn't fully digested it yet. Note the lack of geographic separation in this chart. WAN speeds have improved (and AD replication has improved) enough that replicating even large domains over the WAN is no longer a major no-no.
  


And yet... you'll rarely see trees like that. That's because, as I said, network considerations are not the major driver behind organization these days, it's politics.

Take the original question at the top of this post. Consider it 5 one-domain AD trees, and each country/city is its own business unit that's large enough to have their own full IT stack (people dedicated to server, desktop, web support, and developers supporting it all), and has also been that way for a number of years. This is what you'll run into in real life. This is what will monkey-wrench the network purity of the above charts.

The biggest influence towards whether or not a one-domain solution can be reached will be the political power behind the centralizing push, and how uncowed they get when Very Important People start throwing their weight around. If the CEO is the one pushing this and brooks no argument, then, well, it's more likely to happen. If the COO is the one pushing it, but caves to pressure in order to not expend political capital with regards to unrelated projects, you may end up with a much more fragmented picture.

There will be at least one, and perhaps as many as five, business units that will insist, adamantly, that they absolutely have to keep doing things the way they've always been doing it, and they can't have other admins stomping around their walled garden in jack-boots. Whether or not they get their way is a business decision, not a technical one. Caving into these demands will give you an AD structure that includes multiple domains, or worse, multiple forests.
Fragmented AD environment
In my experience, the biggest bone of contention will be who gets to be in the Domain and Enterprise Admins groups. Those groups are the God Groups for AD, and everyone has to trust them. Demonstrating that only a few tasks require Domain Admin rights and that nearly all day-to-day administration can be done through effectively delegated rights will go a long way towards alleviating this pressure, but that may not be enough to convince business managers weighing in on the process.

The reason for this resistance is that this kind of structural change will require changes to operational procedures. You may think IT types are used to change, but you'd be wrong. Change can be resented just as fiercely in the ranks of IT-middle-managers as it is in rank-n-file clerks. Change for change's sake is doubly resented.

Overcoming this kind of political obstructionism is damned hard. It takes real people skills and political backing. This is not the kind of thing you can really teach in an MCSE/MCITP class track. Political backing has to already be in place before the project even gets off the ground.

I haven't been in an MCSE/MCITP class, so I don't know what Microsoft is teaching these days. I ran into this question in what looks like a University environment, which is a bit less up-to-date than getting it direct from Microsoft would be.  Perhaps MS is teaching this with the political caveats attached. I don't know. But they should be doing so.
On a Wednesday in August in 1996, the WWU NDS tree was born. There were other trees, but this is the one that everyone else merged into. The one tree to rule them all. That was NetWare 4. It brought the directory, and it was glorious (when it worked right).

And now, most of 14 years later, it is done. The last replica servers were powered off today after a two year effort to disentangle WWU from NetWare.

I have some blog-header text to change.

Password policies in AD

| No Comments
One of the more annoying problems with password and account-lockout policies in Active Directory has been that they apply to every account universally. I you want to force your users to change passwords every 90 days, with account lockout after a certain number of bad login attempts, then the same policies apply to your 'Administrator' user. Account lock-out was a really great way to DoS yourself in really critical ways.

In a way, that's what account-lockout is all about. It's to keep bad people from coming in, but its also a way for bad people from preventing legitimate people from using their own accounts. You need to take the good with the bad.

Since we were a NetWare shot for y-e-a-r-s we're very used to Intruder Lockout (ILO), and losing it during the move to Windows was seen as a loss of a key security feature. We had accounts that had to be exempted from lockout, which was dead easy in eDirectory but very difficult in AD.

Happily, Server 2008 introduces a way to do this. It's called "Fine-Grained Password Policy", and is NOT group-policy based. This was somewhat surprising. Getting this requires raising the domain and forest functional levels to the 2008 level. What it allows is setting password policy based on group memberships, with conflict resolution handled by a priority setting on the policy itself. Interestingly, the actual policies are created through ASDI Edit, so they're not beginner-friendly.

For instance, we can set a 'lock out after 6 tries in 30 minutes' setting to the Domain Users group at a Priority of 30, and a second 'never lock out ever' policy to the Domain Admins group at a Priority of 20. That way 'Administrator' will have the never-lock policy apply to it, but Joe User will have the lock-after-6-in-30 policy apply. This works best if the password policy specifies that Domain Admins need to have very complex and long passwords, which makes a brute-force cracking attempt take unreasonably long amounts of time.

We put this in place a few weeks ago, and it is working as we expected. SO GLAD to have this.

Centralized IT

| 2 Comments
I've had quite a bit of experience with the process of centralizing IT. At my last job I was at ground zero as I was on the committee that was charged with rationalizing an IT job family structure that was grounded in the early 1980's (key clue, the phrase, "electronic data processing" was slathered across many job titles, a phrase not at all in vogue in the 1990's). This particular consolidation event was driven from a directive from on high, above the CIO. So, as it were, it happened in spite of the grumbling.

WWU has gone through some of its own consolidations, but there are natural barriers to complete consolidation in the Higher Ed market. I'll get to those in a bit. The one thing acting as a serious barrier to consolidation in any organization are departments that are large enough to support their own multi-person IT departments. Departments with one or two people effectively doing the full IT stack (stand-alone sysadmins who also do desktop support, database maintenance, to-the-desk network wiring, and maybe a bit of app-dev along the side) are most vulnerable to being consolidated into the central Borg.

Some departments are all too happy to join the central IT infrastructure, as they see it as a way to shed costs onto another business unit. Others are happy because their own IT people are so overworked, the idea of getting them help is seen as a cost-free mercy; or put another way agreeing to consolidation is seen as a cost-free way to increase IT investment. Still others are happy to join because they want some nifty new technology and their stick-in-the-mud IT people keep saying, "no," and view the central Borg as a way to get that thing.

The big reason departments don't want their IT people consolidated away from them is personalized service. These are people who know the business intimately, something those central-office folk don't. The cost of maintaining an independent IT infrastructure is seen as a perfectly valid business investment in operational efficiency. Any centralization initiative will have to deal with this concern.

The other big reason shows up less often, but is very hard to overcome without marching orders delivered from On High: distrust of central IT in specific. If the business unit that contains central IT is seen to be less competent as compared to the local IT people, that business unit will not consent to centralization. If the people in central IT are collectively viewed as a bunch of idiots, or run by idiots, the only way that unit is centralizing is if a metaphorical gun is held to their heads.

My last job handled the all of the above and eventually came to an agreement. First and foremost, it was a fiat from On High that IT centralization would happen. All IT job titles started being paid out of the same budget. We then spent the next four years hammering out the management structure, which meant that for a long time a whole bunch of people had their salary paid by people with 0% influence on their work direction.

Many departments gleefully joined the central infrastructure, driven in large part by their own IT people. They'd been overworked, you see, and the idea of gaining access to a much wider talent pool, and a significantly deeper one as well, was hard to not take advantage of. These were the departments with 1-3 IT people. In almost every case the local IT people stayed in their areas as the local IT contact, which maintained the local knowledge they'd developed over the years.

There was one small department that was a holdout until the very end. An attempt to merge some 5 years earlier had gone horribly wrong, and institutional memory remembered that very clearly. It wasn't until that department got a new director that an agreement was reached. The one IT guy up there stayed up there after the merger and stopped doing server and desktop support in favor of department-specific app-dev work, what he was hired to do in the first place as it happened.

Then the arm-wrestling over the bigger departments took place. For the most part they kept near complete control over their own IT staffs, but their top level IT managers were regularly hauled back to the home IT office for 'management team meetings'. This ended up being a good move, since it reduced the barriers for communication at the very top level, and ultimately lead to some better efficiencies overall; especially in the helpdesk area as staff started to move between stacks after a while. Also, the departments that had been deeply skeptical of this whole centralized IT thing started working with other IT managers and getting their concerns heard, which reduced some of the inherent distrust.

With Higher Ed, there is an additional factor or two that my previous job didn't face. First of all, the historic independence of specific Colleges. Second, Universities are generally a lot less command-and-control than their .com or even .gov brethren. This means that centralization relies far more on direct diplomacy between IT business units than it does on direct commands from on high. Distrust in this environment is much more hard to overcome as coercion is not a readily available option.

Back in the day, WWU had 7 separate NDS trees. 7. That's a lot. Obviously, there wasn't much in the way of cross-departmental access of data. Over the course of around 5 years we consolidated down to a single 'WWU' NDS tree. Some departments happily stopped spending IT time on account maintenance tasks and let central IT do it all. Some departments gave up their servers all together. Time passed and still more areas decided they really didn't need to bother keeping local replicas, and let central IT handle that problem.

In the end, handling IT in Higher Ed means dealing with a more heterogeneous environment than is otherwise cost-effective. I've mentioned before how network management on Higher Ed networks resembles ISPs more than it does corporate networks, and that unfortunately applies to things like server and storage purchases. Now that we're in the process of migrating off of NetWare and onto Windows, it means we're now in the process of wrangling over rules governing Active Directory management.

We wrangled NDS control back in the 90's and early 00's, and now it's Microsoft's turn. As with the last round of NDS wrangling, some departments have gleefully turned over control (GPOs and file-server management specifically) of their department over to us in ITS. Others, specifically one with a large local IT presence, is really holding out for complete control of their area. They're clearly angling to just use us as an authentication provider and they'll do the rest, something that... well... negotiations are ongoing.

My crystal ball says we have somewhere between 5 to 10 years before the next wave of 'directory' upgrade forces another consolidation. That consolidation just might involve consolidating with a State agency of some kind. Perhaps the State will force us to use a directory rooted in the wa.gov DNS domain (wwu.univ.wa.gov perhaps), and our Auth servers will be based in Olympia rather than on our local network. Don't know. What is true, is that we'll be going through this again, probably within the next decade.
I'm going over some of my older posts and am reposting some of the good stuff that's still relevant. I've been at this a while, so there is a good week's worth of good essays hiding in the archives.
This one got some inbound links and attracted several readers. The question was asked on ServerFault but I had to echo by reply on my blog since it was too juicy. I've been working with directories since 1996 when I first ran into Novell NDS, so the concepts behind LDAP are engraved on my bones it seems like. So explaining it to someone else is an effort in... restraint of information. They don't need to know every little detail, just enough to get the concepts.

I still get wordy.

Explaining LDAP.

Account lockout policies

| No Comments
This is another area where how Novell and Microsoft handle a feature differ significantly.

Since NDS was first released back at the dawn of the commercial internet (a.k.a. 1993) Novell's account lockout policies (known as Intruder Lockout) were set-able based on where the user's account existed in the tree. This was done per Organizational-Unit or Organization. In this way, users in .finance.users.tree could have a different policy than .facilities.users.tree. This was the case in 1993, and it is still the case in 2009.

Microsoft only got a hierarchical tree with Active Directory in 2000, and they didn't get around to making account lockout policies granular. For the most part, there is a single lockout policy for the entire domain with no exceptions. 'Administrator' is subjected to the same lockout as 'Joe User'. With Server 2008 Microsoft finally got some kind of granular policy capability in the form of "Fine Grained Password and Lockout Policies."

This is where our problem starts. You see, with the Novell system we'd set our account lockout policies to lock after 6 bad passwords in 30 minutes for most users. We kept our utility accounts in a spot where they weren't allowed to lock, but gave them really complex passwords to compensate (as they were all used programatically in some form, this was easy to do). That way the account used by our single-signon process couldn't get locked out and crash the SSO system. This worked well for us.

Then the decision was made to move to a true blue solution and we started to migrate policies to the AD side where possible. We set the lockout policy for everyone. And we started getting certain key utility accounts locked out on a regular basis. We then revised the GPOs driving the lockout policy, removing them from the Default Domain Policy, creating a new "ILO polcy" that we applied individually to each user container. This solved the lockout problem!

Since all three of us went to class for this 7-9 years ago, we'd forgotten that AD lockout policies are monolithic and only work when specified in Default Domain Policy. They do NOT work per-user the way they are in eDirectory. By doing it the way we did, no lockout policies were being applied anywhere. Googling on this gave me the page for the new Server 2008-era granular policies. Unfortunately for us, it requires the domain to be brought to the 2008 Functional Level, which we can't do quite yet.

What's interesting is a certain Microsoft document that suggested settings of 50 bad logins every 30 minutes as a way to avoid DoSing your needed accounts. That's way more that 6 every 30.

Getting the forest functional level raised just got more priority.

I have a degree in this stuff

| 1 Comment
I have a CompSci degree. This qualified me for two things:
  • A career in academics
  • A career in programming
You'll note that Systems Administration is not on that list. My degree has helped my career by getting me past the "4 year degree in a related field" requirement of jobs like mine. An MIS degree would be more appropriate, but there were very few of those back when I graduated. It has indirectly helped me in troubleshooting, as I have a much better foundation about how the internals work than your average computer mechanic.

Anyway. Every so often I stumble across something that causes me to go Ooo! ooo! over the sheer computer science of it. Yesterday I stumbled across Barrelfish, and this paper. If I weren't sick today I'd have finished it, but even as far as I've gotten into it I can see the implications of what they're trying to do.

The core concept behind the Barrelfish operating system is to assume that each computing core does not share memory and has access to some kind of message passing architecture. This has the side effect of having each computing core running its own kernel, which is why they're calling Barrelfish a 'multikernel operating system'. In essence, they're treating the insides of your computer like the distributed network that it is, and using already existing distributed computing methods to improve it. The type of multi-core we're doing now, SMP, ccNUMA, uses shared memory techniques rather than message passing, and it seems that this doesn't scale as far as message passing does once core counts go higher.

They go into a lot more detail in the paper about why this is. A big one is hetergenaity of CPU architectures out there in the marketplace, and they're not just talking just AMD vs Intel vs CUDA, this is also Core vs Core2 vs Nehalem. This heterogenaity in the marketplace makes it very hard for a traditional Operating System to be optimized for a specific platform.

A multikernel OS would use a discrete kernel for each microarcitecture. These kernels would communicate with each other using OS-standardized message passing protocols. On top of these microkernels would be created the abstraction called an Operating System upon which applications would run. Due to the modularity at the base of it, it would take much less effort to provide an optimized microkernel for a new microarcitecture.

The use of message passing is very interesting to me. Back in college, parallel computing was my main focus. I ended up not pursuing that area of study in large part because I was a strictly C student in math, parallel computing was a largely academic endeavor when I graduated, and you needed to be at least a B student in math to hack it in grad school. It still fired my imagination, and there was squee when the Pentium Pro was released and you could do 2 CPU multiprocessing.

In my Databases class, we were tasked with creating a database-like thingy in code and to write a paper on it. It was up to us what we did with it. Having just finished my Parallel Computing class, I decided to investigate distributed databases. So I exercised the PVM extensions we had on our compilers thanks to that class. I then used the six Unix machines I had access to at the time to create a 6-node distributed database. I used statically defined tables and queries since I didn't have time to build a table parser or query processor and needed to get it working so I could do some tests on how optimization of table positioning impacted performance.

Looking back on it 14 years later (eek) I can see some serious faults about my implementation. But then, I've spent the last... 12 years working with a distributed database in the form of Novell's NDS and later eDirectory. At the time I was doing this project, Novell was actively developing the first version of NDS. They had some problems with their implementation too.

My results were decidedly inconclusive. There was a noise factor in my data that I was not able to isolate and managed to drown out what differences there were between my optimized and non-optimized runs (in hindsight I needed larger tables by an order of magnitude or more). My analysis paper was largely an admission of failure. So when I got an A on the project I was confused enough I went to the professor and asked how this was possible. His response?
"Once I realized you got it working at all, that's when you earned the A. At that point the paper didn't matter."
Dude. PVM is a message passing architecture, like most distributed systems. So yes, distributed systems are my thing. And they're talking about doing this on the motherboard! How cool is that?

Both Linux and Windows are adopting more message-passing architectures in their internal structures, as they scale better on highly parallel systems. In Linux this involved reducing the use of the Big Kernel Lock in anything possible, as invoking the BKL forces the kernel into single-threaded mode and that's not a good thing with, say, 16 cores. Windows 7 involves similar improvements. As more and more cores sneak into everyday computers, this becomes more of a problem.

An operating system working without the assumption of shared memory is a very different critter. Operating state has to be replicated to each core to facilitate correct functioning, you can't rely on a common memory address to handle this. It seems that the form of this state is key to performance, and is very sensitive to microarchitecture changes. What was good on a P4, may suck a lot on a Phenom II. The use of a per-core kernel allows the optimal structure to be used on each core, with changes replicated rather than shared which improves performance. More importantly, it'll still be performant 5 years after release assuming regular per-core kernel updates.

You'd also be able to use the 1.75GB of GDDR3 on your GeForce 295 as part of the operating system if you really wanted to! And some might.

I'd burble further, but I'm sick so not thinking straight. Definitely food for thought!

Other Blogs

My Other Stuff

Monthly Archives