October 2010 Archives

Clear error messages

By SysAdmin1138 on October 31, 2010 8:48 AM

It isn't often this happens, but every so often I run into a really clear error message. All too often it's stuff like "0x80042000" that I then have to google to figure out. So imagine my delight when I ran into this baby the other day:

Hard to be clearer than "dlu=0DEAD:DEADh".

Fixing it turned out to be remarkably easy: pop the battery card off the controller and reinsert it. That seems to have cleared whatever caused this, and off I went. This server had been powered off for close to 3 months, and I'm guessing the RAID battery had simply fully discharged.

Spam checker failure

By SysAdmin1138 on October 31, 2010 8:22 AM

Something went weird with the comment anti-spam system I use, and it purged all of the non-spam comments published since 10/20. That includes all of the very nice comments on the last post, Guaranteed Delivery of Email, which was all about spam. Rar! It was one of my most-commented-on posts, too.

Happily for me, I saw the strange state it was in before the comments got lost, so if something similar happens again I should be able to file a bug-report. Very sorry about that.

Guaranteed delivery of email

By SysAdmin1138 on October 26, 2010 10:31 AM

Simply put, can't be done.

Even in the pre-spam Internet, SMTP was designed around the fact that email delivery is fundamentally best-effort. Back then it was more about handling network outages, but the fact remains that by design email is best-effort, not guaranteed.
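To make the "best-effort" part concrete, here is a minimal Python sketch of how a store-and-forward mailer might treat a message it can't hand off yet. The 30-minute retry interval and four-day give-up time are assumptions loosely based on the guidance in RFC 5321 (section 4.5.4.1); QueuedMessage and next_action are hypothetical names, not any real MTA's code. Note that "delivered" here only means the next server accepted the message. Nothing in SMTP says anything about what happens after that.

```python
from dataclasses import dataclass

# Assumed values in the spirit of RFC 5321 section 4.5.4.1, which suggests
# retrying at least 30 minutes apart and not giving up for at least 4-5 days.
RETRY_INTERVAL_MIN = 30
GIVE_UP_AFTER_MIN = 4 * 24 * 60  # four days, in minutes


@dataclass
class QueuedMessage:
    """A message sitting in a hypothetical outbound queue."""
    msg_id: str
    first_attempt_min: int  # minutes since an arbitrary starting point
    delivered: bool = False


def next_action(msg: QueuedMessage, now_min: int, attempt_ok: bool) -> str:
    """Decide what the mailer does after one delivery attempt."""
    if attempt_ok:
        # Accepted by the next hop. That is all "delivered" means in SMTP.
        msg.delivered = True
        return "delivered"
    if now_min - msg.first_attempt_min >= GIVE_UP_AFTER_MIN:
        # Out of patience: generate a bounce back to the sender.
        return "bounce"
    # Temporary failure: keep it queued and try again later.
    return f"retry at {now_min + RETRY_INTERVAL_MIN}"


alert = QueuedMessage("alert-001", first_attempt_min=0)
print(next_action(alert, now_min=0, attempt_ok=False))  # retry at 30
```

Everything the sender controls ends at the "delivered" branch; junk filtering happens entirely after it.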

That doesn't stop people from demanding it, though.

More recently, in the wake of the Virginia Tech shootings, a lot (I'd even go so far as to say most) of Higher Ed has taken a really close look at emergency alerting systems, and WWU is no exception. We have an outside entity that handles this, and WWU upper administration has asked that such emails NOT end up in the Junk Mail folder (we also have an SMS alerting system alongside this). This is harder than they think, which makes my life less pleasant every time such a notice does end up in a Junk Mail folder.

With spam making up anywhere from 92-98% of all incoming email, email is fundamentally lossy these days. We LIKE it that way. The hard part is picking the good stuff out of the sea of bad stuff. And unfortunately, there is no one way to guarantee email WILL be seen by the recipient.

The most recent major junking event happened because our outside mailing entity changed which servers they send their mail from, which meant they were no longer getting the benefit of the IP whitelist. Either they didn't notify us, or the people who communicate with them didn't realize that was important and therefore didn't pass the change notice on to me. The fact that the message in question was a simple web-page copy with a lot of hyper-linked images just made it extra-spammy.
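Here is a sketch of the kind of IP-whitelist check that silently stopped matching, assuming the list is kept as CIDR networks and checked with Python's standard ipaddress module. The addresses are documentation ranges (RFC 5737) standing in for the vendor's real ones, and ALLOWLIST and is_allowlisted are hypothetical names.

```python
import ipaddress

# Hypothetical allowlist for the outside mailing vendor. These are RFC 5737
# documentation ranges, not anyone's real mail servers.
ALLOWLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),  # covers .16 through .31
]


def is_allowlisted(sender_ip: str) -> bool:
    """True if the connecting IP falls inside any allowlisted network."""
    addr = ipaddress.ip_address(sender_ip)
    # Membership tests between IPv4 and IPv6 objects just return False.
    return any(addr in net for net in ALLOWLIST)
```

If the vendor quietly moves its mailing to, say, 203.0.113.25, is_allowlisted("203.0.113.25") returns False, and the message is back to being judged on content alone, hyper-linked images and all.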

The WWU marketing department's "WWU News" messages (emails with lots of links, including mentions of WWU and links to WWU events) have been ending up in Junk Mail about 80% of the time, even though the service they send through is ALSO whitelisted.

The one thing that makes my life all too interesting when attempting to guarantee email delivery? Outlook's junk-mail filters. We can't do a thing with them, and Microsoft purposely makes them hard to predict. Nearly all of the junkings I troubleshoot turn out to be Outlook independently deciding a message was crap and binning it. I can guarantee delivery right up to the point where Outlook analyzes new messages for spam-factors, but once it gets there all bets are off.

Unfortunately.

I can't guarantee email delivery. I never could, but it's harder now.

The marketplace of ideas

By SysAdmin1138 on October 21, 2010 11:45 AM

Since I started paying attention to the tech industry in the 1990s, certain patterns have emerged:

1. Hot new topic arrives.
2. Lots of startup activity around hot topic.
3. The big boys start getting into hot topic (and tend to be bad at it).
4. The big boys start buying startups (in order to become good at it).
5. One or more of the startups becomes a big boy themselves.
6. The big companies prospect fresh startups for good ideas and buy the companies for their tech.

In the end, what starts as a flurry of creativity in independent directions ends up in a MUCH more consolidated market once said market matures. There are still smaller startups around, but all of the big boys have worthwhile products, and they're the ones with the marketing muscle and application-suites to bundle them in.

The same kind of marketplace seems to apply to networking ideas, though the timescale is longer. Back when spam started REALLY annoying A LOT of people, the topic was what to do about spam. The fundamental problem is that the SMTP protocol and the associated "how to send mail on the internet" RFCs don't have any kind of trust model built into them; they're purely a transport protocol. Garbage goes in, and just as much garbage comes out the other side. It's efficient that way. To those trying to keep a clean mailbox, this is a problem.

Many, many, many ideas were floated for how to patch SMTP, or otherwise introduce some kind of trust or costing model that would provide a meaningful disincentive to sending 1.5 kajillion emails. Most of these are still around in some form (even in the above list, when an industry gets to stage 6 there are still weeny stage-2 startups poking around with just enough of a customer base to remain viable), though not as widely deployed as they once were.

Stamps for e-mail! Make the US Post Office provide a $0.05 per email "stamp" to certify it's from a real person! 1.5 kajillion emails become really expensive then! Win!
Central post-office certified mailers. Each mailer in a country has to be certified by their Postal Authority. When sending mail, pass your authority token so the receiving mailer can check it out. Mass mailers on Deutsche Telekom dial-up won't be certified and can't send 1.5 kajillion messages. Win!
Mandate secure signatures. Email not signed by PGP/S-MIME? Toss it. Win!
Email Captcha. Sending an email to someone new? You have to reply to an automatic message (or click a web-link) before it'll consider you 'real' and pass your message. Sorry. Anyone sending 1.5 kajillion messages will in no way be able to reply to 1.5 kajillion of those. Win!
Keyword searching. Some topics aren't valid in work email. Toss 'em. Win!
Bayesian filtering. Dynamically figure out what's spam and what's not based on how a Real Live Person ranks incoming mail. Once it knows how you roll, all spam will go away. Win!
Address book registry! All people have to be certified as real, along with where they send mail from. Some skeevy Slovenian impersonating the CFO won't be able to do that. Win!
Sender Policy Framework! Entities list the IP addresses that send their mail. Anything not coming from there can be tossed. Win!
IP Reputation! The current king. Keep track of what IPs send bad mail. Then blacklist them.
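Of the ideas above, Bayesian filtering is the one whose mechanics fit in a few lines. Below is a toy multinomial naive Bayes classifier in Python, assuming whitespace tokenization and add-one smoothing. It is a textbook illustration, not how any particular mail product (Outlook included) implements its filter, and the class name is hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Deliberately naive: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        # The "Real Live Person" step: each message the user marks updates the counts.
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_log_odds(self, text: str) -> float:
        """log P(spam|text) - log P(ham|text); positive means 'looks like spam'."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class never produces log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            label_total = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves room for unseen words.
                score += math.log((count + 1) / (label_total + len(vocab) + 1))
            scores[label] = score
        return scores["spam"] - scores["ham"]

    def is_spam(self, text: str) -> bool:
        return self.spam_log_odds(text) > 0
```

Train it with a handful of messages a person has marked as spam or ham, then call is_spam() on new mail. The catch shows up quickly: the verdict is only as good as the training, so "once it knows how you roll" takes a while.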

What I and others have noticed is that how people access the Internet is undergoing a similar evolution. Until the wireless broadband revolution, Internet access was pretty much done from general-purpose computing devices known as 'desktop' and 'laptop' with occasional forays into special-purpose devices like 'webTV'. Now that the wires have gone away, things have shrunk again and you've got smartphones and tablets in the mix.

Unlike how the initial Internet developed, the companies pushing out the new phones and tablets have realized that providing central control of that experience can avoid some of the pitfalls of a fully open infrastructure. This is a new interaction model for people and uptake has been strong. To me this suggests that we're entering the 'consolidation' phase of Internet access, and this is where the fight gets real.

We already have entire countries attempting to regulate who can go where on the Internet, and now we have large companies doing the same thing. Unlike with countries, people are currently opting into central control by these companies. Things Just Work, which is very attractive. To those who look at the innovation the fully open Internet provided, this sort of curated access is very threatening.

That said, innovation won't stop; it'll just get subsumed into the big companies. Once stage 5 is hit, the creative froth starts dying down. Or rather, what generated the froth is still there, it just doesn't engender new companies at the same rate; the existing companies prospect for new ideas and glom onto them a lot faster than they did in stage 2.

For a GREAT visual of this process, NASA has an excellent video shot on the ISS.

Much like the consolidation list I started this post with, you'll still get a steady stream of startup ideas, but there will be some downward pressure from the central authorities. Much hay has been made over what Apple will and won't let into the App Store, which is a great case in point.

Will curated-computing replace the open-access computing we've had until now? I think it'll replace a large part of the market, but not all of it. Things Just Work, and that's a powerful driver. People don't mind closed access so long as anything they want to do is available in the box with them. This is why Apple keeps promoting the number of apps in the App Store, and why Android, Blackberry, and HP Palm do the same thing for their application storefronts.

Once access narrows down into a small group of companies, centralized control becomes a LOT easier. The Internet has been transformative because it's so decentralized it's tricky to lock down. Places like China and Iran do so with limited effect. Having a few smaller targets to aim laws at (Blackberry & India for example) makes exerting territorial control markedly easier. It is my opinion that this will provide a counter-incentive for the upkeep of these curated technologies.

Unfortunately, we'll have to wait and see how the market settles out.

Professional SysAdmin organizations

By SysAdmin1138 on October 19, 2010 2:21 PM

Whenever you talk about system-administration professional organizations, a very short list comes to mind, if anything comes to mind at all: I've met admins who've never heard of any of these.

USENIX. Arguably the grand-daddy of them all.
SAGE. The SysAdmin special interest group of Usenix.
LOPSA. League of Professional System Administrators.

What's interesting to note is that each of these has a foundation in Usenix. LOPSA got its start in an effort to bring SAGE outside of Usenix. Usenix and LOPSA have joined to put on the LISA (Large Installation System Administration) conferences.

I'd deeply love to go to LISA 2010, but no one is willing to give me the $3K it'll take to do it all. WWU certainly isn't, since it's out-of-state travel and therefore banned this year, and I certainly can't afford it out of private funds. And looking at the session list, there is almost nothing that's directly related to what I do for a living. I still want to go because I want to know about those other things, and would like to expand my career in that direction.

I recently joined LOPSA since their goals are admirable and I'd like to be a part of that. However, the entire organization shows just the kind of anti-Windows bias you'd expect from an organization founded by a bunch of UNIX admins & engineers in the very pre-Windows 1970s. They say big tent, and I'll take them at their word. But the bias still shows.

Taking a look at the LISA technical-sessions I can find absolutely no sessions that directly discuss Windows installations. A lot of it is networking, which is a key factor in a large installation, but even the one session discussing large scale encrypted laptop backup and deduplication (neat!) is talking about OS X laptops. There are a couple that are tangentially related to Windows, specifically managing VMs and SLAs, but nothing direct.

The Training sessions are better, as there are some things in there that really are generic to the System Administration job, such as time management, better troubleshooting methodologies, and effective documentation. There is a lot of Storage Administration in there as well, which tends to cross boundaries. Also a few Linux-specific sessions, though again nothing Windows specific.

Given the look of LISA's program guide, if I were a pure Windows admin I probably wouldn't look twice at it. I'd see a bunch of Linux/Unix admins, and we all know the kind of sneer they get when Windows comes up in conversation. Barely relevant, and not worth the effort. As it is, I do have Linux experience and a fair amount of Storage admin as well, so the conference is relevant to me. I even have sufficient Linuxy T-shirts for protective coloration! And my laptop is running openSUSE, so I wouldn't feel self-conscious checking email in the middle of a crowded hall.

Or maybe it's because Windows admins haven't HAD a professional organization planted in the center of their sphere for decades the way that Unix-land has, and therefore don't consider it important. I don't know. But the SAGE and LOPSA outreach efforts aren't terribly effective either way. Maybe I can change that.

A well-constructed directory-services tree

By SysAdmin1138 on October 15, 2010 12:45 PM

There is a certain question that has shown up in pretty much every class about how to set up an X.500-compliant directory service (that's things like Active Directory, NDS, and eDirectory). It goes like this:

You have been hired as a consultant to set up $FakeCorpName's new $Directory. They have major offices in five places. New York, Los Angeles, London, Sydney, and Tokyo. They have five $OldTech. What is the directory layout you recommend?

I originally ran into this particular question when I was getting my Certified Novell Administrator certification back in 1996. In that case $Directory was NDS and $OldTech was actually other NDS trees. In 2000/2001, when I was getting my Active Directory training, $Directory was AD and $OldTech was NT4 domains. The city names did not vary much between the two. NYC and LA are always there, as are London and Tokyo. Sometimes Paris is there instead of Sydney. Once in a great while you'll see Hong Kong instead of Tokyo. In a fit of continental inclusiveness, I think I saw "Johannesburg" in there once (in an Exchange class IIRC). I ran into this question again recently in relation to AD.

This is a good academic question, but you will never, ever get it that easy in real life. This question is good for considering how geographically diverse corporate structure impacts your network layout and the knock-on effects that can have on your directory structure. However, the network is only a small part of the overall decision making process when it comes to problems like these.

The major part? Politics.

It is now 2010. Multi-national companies have figured out this 'office networking' thingy and have a pre-existing infrastructure. They have some kind of directory tree, somewhere, even if it only exists in their ERP system (which they all have now). They have office IT people who have been doing that work for 15+ years. A company that size has probably ~~eaten~~ bought out competitors, which brings strange designs into their network. Figuring out how to glue together 5 geographically separate WinNT4.0 domains in 2010 is not useful. The problem is not technical; it's business.

1996
In 1996, WAN links were expensive and slow. NDS was the only directory of note on the market (NIS+ was a Unix directory, and therefore completely ignored in the typical Windows-only business workplace). Access across WAN links was generally discouraged unless specifically needed. Because of this, your WAN links gave you the no-brainer divisions in your NDS tree where Replicas needed to be declared. All the replication traffic would stay within that site, and only external reference resolution would cross the expensive WAN. Resources the entire company needed access to might go in a specific, smaller replica that gets put on multiple sites.
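A rough model of why WAN links made such natural dividing lines: each division of the tree (a partition, in NDS terms) has replicas only at certain sites, and a lookup stays local only if the requesting site holds a replica. This Python sketch is purely illustrative. The tree names, sites, and the REPLICAS, partition_for, and resolve names are all made up, and real NDS name resolution involves a good deal more (external references, tree walking).

```python
# Hypothetical 1996-style tree: one partition per country/WAN region, plus a
# small company-wide partition replicated to every site.
REPLICAS = {
    "C=US": {"NYC", "LA"},
    "C=UK": {"London"},
    "C=JP": {"Tokyo"},
    "OU=Shared.O=Acme.C=US": {"NYC", "LA", "London", "Tokyo"},
}


def partition_for(dn: str) -> str:
    """Return the most specific partition root containing this dotted NDS-style name."""
    matches = [root for root in REPLICAS if dn == root or dn.endswith("." + root)]
    if not matches:
        raise ValueError(f"no partition holds {dn}")
    # Every match is a suffix of the name, so the longest one is the deepest partition.
    return max(matches, key=len)


def resolve(dn: str, site: str) -> str:
    """Say whether a lookup from `site` stays local or has to cross the WAN."""
    part = partition_for(dn)
    return "local" if site in REPLICAS[part] else "across WAN"
```

A Tokyo user looking up CN=jsmith.OU=Sales.O=Acme.C=US goes across the WAN, while anything under OU=Shared resolves locally everywhere, which is exactly the point of that small, widely replicated partition.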

This in turn meant that the top levels of your NDS tree had a kind of default structure. Many early NDS diagrams had a structure like this:

An early NDS diagram

Each of the top-level "C" containers was a replica. The US example was given to show how internal organization could happen. Snazzy! However, this flew in the face of real-work experience. Companies merge. Bits get sold off. By 2000 Novell was publishing diagrams similar to this one:

A later NDS diagram

This one was designed to show how company mergers work. Gone are the early "C" containers; in their place are "O" containers. Merging companies? Just merge that NDS tree into a new O, and tada! Then you can re-arrange your OUs and replicas at your leisure.

This was a sign that Novell, the early pioneer in directories like this, had their theory run smack into reality with bad results. The original tree style with the top-level C containers didn't handle mergers and acquisitions well. Gone was the network purity of the early 1996 diagrams; now the diagrams showed some signs of political influence.

2000
In 2000, Microsoft released Windows 2000 and Active Directory. The business world had been on the Internet for some time, and the .com boom was in full swing. WAN links were still expensive and slow, but not nearly as slow as they used to be. The network problem Microsoft was faced with was merging multiple NT4 domains into a single Active Directory structure.

In 2000, AD inter-DC replication was a lot noisier than eDirectory's was at the time, so replication traffic was a major concern. This is why AD introduced the concept of Sites and inter-Site replication scheduling. Even so, the diagrams you saw then were reminiscent of the 1996 NDS diagrams:
An early AD diagram

As you can see, separate domains for NYC and LA are gone, which is recognition that in-country WAN links may be fast enough for replication, but transcontinental links were still slow. Microsoft handled the mergers-and-acquisitions problem with inter-domain trusts (which, thanks to politics, tend to be hard to get rid of once in place).

AD replication improved with both Server 2003 and Server 2008. The Microsoft ecosystem got used to M&A activity the same way Novell did a decade earlier and changes were made to best practices. Also, network speeds improved a lot.

2010
In 2010, WAN links are still slow relative to LAN links, but they're now fast enough that directory replication traffic is not a significant load on all but the slowest of such links. Even trans-continental WAN links are fat enough that directory replication traffic doesn't consume much valuable bandwidth.
An AD tree in the modern era

Note how simple this is. There is an empty root to act as nothing but the root of an entire tree. Northwinds is the major company and it recently bought DigitalRiver, but hasn't fully digested it yet. Note the lack of geographic separation in this chart. WAN speeds have improved (and AD replication has improved) enough that replicating even large domains over the WAN is no longer a major no-no.
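Some back-of-the-envelope arithmetic for the "no longer a major no-no" claim. The daily change volume below is a hypothetical number, so measure your own directory before trusting any of this. The function just converts megabytes to megabits and divides by the share of the link replication is allowed to use (50% here, also an assumption).

```python
def transfer_seconds(megabytes: float, link_mbps: float, utilization: float = 0.5) -> float:
    """Seconds to move `megabytes` using only `utilization` (0-1) of a link's capacity."""
    megabits = megabytes * 8
    return megabits / (link_mbps * utilization)


# Hypothetical daily replication volume; substitute a measured figure.
DAILY_CHANGE_MB = 200

for name, mbps in [("T1", 1.544), ("DS3", 44.736), ("100 Mbps WAN", 100.0)]:
    minutes = transfer_seconds(DAILY_CHANGE_MB, mbps) / 60
    print(f"{name}: about {minutes:.1f} minutes of link time per day")
```

Under those assumptions a T1 spends a little over half an hour a day on replication, and a DS3 about a minute, which is the gap between the 1996/2000 advice and the 2010 picture.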

And yet... you'll rarely see trees like that. That's because, as I said, network considerations are not the major driver behind organization these days, it's politics.

Take the original question at the top of this post. Consider it 5 one-domain AD trees, and each country/city is its own business unit that's large enough to have their own full IT stack (people dedicated to server, desktop, web support, and developers supporting it all), and has also been that way for a number of years. This is what you'll run into in real life. This is what will monkey-wrench the network purity of the above charts.

The biggest influence towards whether or not a one-domain solution can be reached will be the political power behind the centralizing push, and how firmly its backers hold out when Very Important People start throwing their weight around. If the CEO is the one pushing this and brooks no argument, then, well, it's more likely to happen. If the COO is the one pushing it, but caves to pressure in order to not expend political capital with regards to unrelated projects, you may end up with a much more fragmented picture.

There will be at least one, and perhaps as many as five, business units that will insist, adamantly, that they absolutely have to keep doing things the way they've always been doing it, and they can't have other admins stomping around their walled garden in jack-boots. Whether or not they get their way is a business decision, not a technical one. Caving in to these demands will give you an AD structure that includes multiple domains, or worse, multiple forests.
Fragmented AD environment

In my experience, the biggest bone of contention will be who gets to be in the Domain and Enterprise Admins groups. Those groups are the God Groups for AD, and everyone has to trust them. Demonstrating that only a few tasks require Domain Admin rights and that nearly all day-to-day administration can be done through effectively delegated rights will go a long way towards alleviating this pressure, but that may not be enough to convince business managers weighing in on the process.
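A toy model of what "effectively delegated rights" means: rights are granted to a group for a specific task, scoped to one OU, so a site's helpdesk can reset its own users' passwords without being anywhere near Domain Admins. In AD itself this is done with permissions set on the OU (for example through the Delegation of Control Wizard); the Python below only models the idea, and the group names, task names, and DNs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    group: str  # who receives the right
    task: str   # what they may do (illustrative label, not an AD right name)
    scope: str  # DN of the OU the right is limited to


# Hypothetical delegations: each site's IT group manages only its own OU.
DELEGATIONS = [
    Delegation("NYC-Helpdesk", "reset-password", "OU=NYC,DC=corp,DC=example"),
    Delegation("NYC-Desktop", "join-computer", "OU=NYC,DC=corp,DC=example"),
    Delegation("London-Helpdesk", "reset-password", "OU=London,DC=corp,DC=example"),
]


def in_scope(object_dn: str, scope_dn: str) -> bool:
    # Naive case-insensitive suffix test; real DN comparison also handles escaping.
    obj, scope = object_dn.lower(), scope_dn.lower()
    return obj == scope or obj.endswith("," + scope)


def allowed(groups: set[str], task: str, object_dn: str) -> bool:
    """True if any of the caller's groups holds `task` over the OU containing the object."""
    return any(
        d.group in groups and d.task == task and in_scope(object_dn, d.scope)
        for d in DELEGATIONS
    )
```

Nothing in that table needs Domain Admins, which is the argument to bring to the business units worried about jack-boots in their walled garden.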

The reason for this resistance is that this kind of structural change will require changes to operational procedures. You may think IT types are used to change, but you'd be wrong. Change can be resented just as fiercely in the ranks of IT-middle-managers as it is in rank-n-file clerks. Change for change's sake is doubly resented.

Overcoming this kind of political obstructionism is damned hard. It takes real people skills and political backing. This is not the kind of thing you can really teach in an MCSE/MCITP class track. Political backing has to already be in place before the project even gets off the ground.

I haven't been in an MCSE/MCITP class, so I don't know what Microsoft is teaching these days. I ran into this question in what looks like a University environment, which is a bit less up-to-date than getting it direct from Microsoft would be. Perhaps MS is teaching this with the political caveats attached. I don't know. But they should be doing so.

File-system obsessions

By SysAdmin1138 on October 12, 2010 9:01 AM

If there is one thing that separates Windows sysadmins from Linux sysadmins, it is worry about file-systems. In Windows, there is only one file-system, NTFS, and it updates whenever Microsoft releases a new Server OS. The main concerns are going to be about correct block-size selection for the expected workload and, in recent versions, how and whether to use ShadowCopy.
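Block-size selection mostly comes down to slack space versus overhead: every file is rounded up to whole clusters, so a share full of small files wastes far more space at large cluster sizes. A minimal Python sketch of that arithmetic, assuming every file occupies whole clusters (real NTFS can store very small files inside the MFT record itself, which this ignores); slack_bytes is a hypothetical helper.

```python
def slack_bytes(file_sizes: list[int], cluster_size: int) -> int:
    """Allocated-but-unused bytes when each file is rounded up to whole clusters.

    Simplification: ignores NTFS storing tiny files directly in the MFT record.
    """
    wasted = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)  # ceiling division without floats
        wasted += clusters * cluster_size - size
    return wasted


# Hypothetical workload: 10,000 files of 3 KB each.
small_files = [3 * 1024] * 10_000
for cluster in (4096, 65536):
    print(cluster, slack_bytes(small_files, cluster))
```

For large files the slack is at most one cluster per file and hardly matters, which is why big clusters are usually reserved for volumes holding a few large files.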

The Linux side is a lot more complex. Because of this complexity, sysadmins pick favorites out of the pack through a combination of empirical research and just plain gut-level "I like it". The days when ext2 and ext3 were your only choices are long, long gone. Now the decision of how to format your storage for specific data loads has to take into account the various file-system features, strengths, weaknesses, and quirks, and then how to optimize the mkfs settings for a best fit. A file-system that will contain a few directories with tens to hundreds of thousands of files in them is probably not the same file-system you'd pick to handle hundreds of TB worth of 8-20GB files.
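One concrete example of mkfs tuning on the ext side is the bytes-per-inode ratio, which fixes how many files the file-system can ever hold. The ratios below are the values I'd expect in a stock /etc/mke2fs.conf (selectable with mke2fs -T, or overridden with -i); check your distribution's copy, since they can differ. The helper functions are hypothetical, and the count is approximate because mke2fs rounds per block group.

```python
# inode_ratio values from a stock /etc/mke2fs.conf (bytes of capacity per inode).
# Distributions can change these, so check your own copy.
INODE_RATIOS = {
    "default": 16384,
    "news": 4096,           # mke2fs -T news: lots of small files
    "largefile": 1048576,   # mke2fs -T largefile
    "largefile4": 4194304,  # mke2fs -T largefile4: a few huge files
}


def inode_count(fs_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count; mke2fs actually rounds this per block group."""
    return fs_bytes // bytes_per_inode


def fits(fs_bytes: int, n_files: int, bytes_per_inode: int) -> bool:
    """Would the file-system have enough inodes for n_files (ignoring directories)?"""
    return inode_count(fs_bytes, bytes_per_inode) >= n_files


TIB = 2**40
for usage, ratio in INODE_RATIOS.items():
    print(usage, inode_count(TIB, ratio), fits(TIB, 1_000_000, ratio))
```

A 1 TiB volume at the largefile4 ratio gets only about 262,144 inodes (2^40 / 2^22): plenty for a pile of 8-20GB files, nowhere near enough for a million small ones.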

Whenever a new file-system hits the Linux kernel, there is a lot of debate over its correct usage and whether or not it'll replace older file-systems. btrfs and ext4 have gotten the most debate recently, with ext4 finally in 'stable' while btrfs remains 'experimental'. ZFS continues to be something everyone wants but can't have: the Solaris admins get to be smug, the BSDians ignore all the fuss and just use it, and the btrfs devs ask for patience while they finish making a file-system that does everything XFS can do.

What this means is that you sometimes see Linux admins succumbing to New Shiny! syndrome and installing an experimental file-system on a production system. I saw this frequently with ext4 while it was still staging up. When used with LVM, the new file-systems have a lot of very interesting features that aren't possible with ye olde ext3+LVM.

Break previous directory-size limits
Use extents instead of block-allocation, which reduces fragmentation and speeds up fscks
Tracking of free block groups makes fscks go even faster
Checksumming journal writes for consistency
Checksumming entire file-system structures for improved consistency checking
Better timestamp granularity, for when milliseconds are too large for your application
In-filesystem snapshots, above and beyond the snapshots LVM allows
Improved allocated file tracking makes handling large directories much more efficient
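The extents item is easy to illustrate. An ext3-style block map records one pointer per block, while an extent records a run of contiguous blocks as (start, length). This Python sketch collapses a list of physical block numbers into extents, assuming ext4's limit of 32,768 blocks per initialized extent; to_extents is a hypothetical helper, not kernel code.

```python
def to_extents(blocks: list[int], max_len: int = 32768) -> list[tuple[int, int]]:
    """Collapse a file's physical block numbers, in file order, into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for b in blocks:
        if extents:
            start, length = extents[-1]
            # Extend the current extent if this block is physically next and there's room.
            if b == start + length and length < max_len:
                extents[-1] = (start, length + 1)
                continue
        # Otherwise the file jumps elsewhere on disk (or the extent is full): start a new one.
        extents.append((b, 1))
    return extents
```

A 1,000-block file laid out contiguously needs 1,000 block-map entries but a single extent, which is a large part of why extents help with both fragmentation and fsck times.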

What's not to love? It's for this reason that you get questions like this one on ServerFault wondering if Windows has a file-system that can do log-checksumming, where the asker has a New Shiny! feature they really like and clearly hasn't realized that there is only one file-system on Windows and it's what Microsoft gives you. Apple does exactly the same thing with their Xserve gear, and so did Novell until they ported NetWare's key features over to Linux (NSS is very good, but some workloads are best suited to something else).

This can be a real challenge for Windows admins attempting to get into the Linux space, since "filesystem choice" is not something they've had to worry about since FAT stopped being a viable option. The same goes for Linux admins getting into Windows administration: the lack of choice is seen as yet another sign that Windows is fundamentally inferior to Linux. The differing mindsets are something I see in the office a few times a year.

Printing from ancient history

By SysAdmin1138 on October 6, 2010 10:23 AM

Over the summer we tested a lot of things in the computer labs, including Windows 7 and Office 2010. We did learn one thing along the way.

Word 2010 files printing to an HP LaserJet 9050 with the Microsoft-supplied PCL5 driver yields some straaange page-counts on the printer.

Most people wouldn't care about this, since most people only look at the printer's page-counter once a quarter if that often. We care about it since we use that number for auditing. Students only get so many pages a quarter, and PCounter allows the use of the printer's page-counter as a double-check to driver-counted pages. It'll ding the user the amount of quota for paper that came out of the printer.
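The auditing rule amounts to trusting the printer's counter over the driver's count. A hedged Python sketch of that rule: the function name is hypothetical, it is not PCounter's actual code, and the fallback for a counter that went backwards (reset or wrap) is my own assumption.

```python
def pages_to_charge(driver_pages: int, counter_before: int, counter_after: int) -> int:
    """Charge for the paper that actually came out, per the printer's own counter."""
    counter_delta = counter_after - counter_before
    if counter_delta < 0:
        # Assumption: the counter was reset or wrapped, so fall back to the driver's count.
        return driver_pages
    # Normal case: whatever the printer says it printed is what the user pays for,
    # even if the driver reported far fewer pages.
    return counter_delta
```

This is also why a misbehaving driver hurts so much: if the printer's counter jumps by hundreds of pages for a short document, that jump is exactly what lands on the student's quota.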

We were getting cases where a file with 7 pages would cause the printer to report it had printed 593 pages. Or in one case last week, a 19 page document dinged the user 2239 pages; more paper than the printer can hold. In every case it was a DOCX file that did it, on a printer using the PCL5 driver.

The fix is to use the PCL6 driver, which we're doing now. Historically we've avoided the PCL6 driver for reasons that our lab managers haven't made clear to me. They just don't work right, apparently. There are 'some issues'. Their labs, so I've rolled with it. But we're on PCL6 since that works with Office 2010.

Then I did some digging on PCL5 vs PCL6 and came across the Wikipedia page with the timeline. PCL5c/e, the version we had been using, was introduced in 1992. Wowzers. PCL6 was introduced in 1995. I guess 15 years is enough time to get a printer description language right, eh?
