February 2011 Archives

Email is not IM

Anyone who has worked with email knows this, but that doesn't stop mail users from calling us to ask why a message is late. But hey, what's one more example? The headers below describe the route a specific message followed.

Received: from [140.160.12.34] by web120819.mail.ne1.yahoo.com via HTTP; Sun, 20 Feb 2011 17:48:33 PST
Received: (qmail 81576 invoked by uid 60001); 21 Feb 2011 01:48:33 -0000
Received: from [127.0.0.1] by omp1039.mail.ne1.yahoo.com with NNFMP; 21 Feb 2011 01:48:33 -0000
Received: from [98.138.88.239] by tm4.bullet.mail.ne1.yahoo.com with NNFMP; 21 Feb 2011 01:48:33 -0000
Received: from [98.138.90.51] by nm24.bullet.mail.ne1.yahoo.com with NNFMP; 21 Feb 2011 01:48:34 -0000
Received: from nm24.bullet.mail.ne1.yahoo.com ([98.138.90.87]) by BAY0-PAMC1-F6.Bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675);Sun, 20 Feb 2011 19:37:38 -0800
Received: from mail pickup service by SNT0-XMR-002.phx.gbl with Microsoft SMTPSVC; Sun, 20 Feb 2011 19:37:48 -0800
Received: from SNT0-XMR-002.phx.gbl (10.13.104.140) by BL2PRD0103HT012.prod.exchangelabs.com (10.6.4.137) with Microsoft SMTP Server id 14.0.650.68; Mon, 21 Feb 2011 03:37:54 +0000

Normally these headers are read from the bottom up, but this listing is ordered top down. Translated, this means:

  1. A message was sent from a computer on campus through Yahoo's webmail (web120819.mail.ne1.yahoo.com) at 17:48:33 PST.
  2. It was shuffled through a series of Yahoo mailers (98.138.88.239, 98.138.90.51, 98.138.90.87) within a second of submission.
  3. Yahoo's nm24 (98.138.90.87) handed it to a Microsoft mailer (BAY0-PAMC1-F6.Bay0.hotmail.com), which didn't accept it until 19:37:38 PST.
  4. It was picked up by another Microsoft mailer (10.13.104.140) at 19:37:48 PST.
  5. And forwarded on to an ExchangeLabs server (10.6.4.137) at 19:37:54 PST, where it came to rest.
Note the nearly two-hour delay from start to finish. And yet the mail shows a timestamp of 17:48 in the mailbox in question, even though it arrived much later than that. That's because the Date: field was set by the sending mailer and not modified throughout its travels. In this case the exact source of the delay is uncertain: it could be a queuing delay on the Yahoo side when talking to Microsoft, or a queuing delay on the Microsoft receiving side. Can't tell from here.
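For the curious, here's a rough sketch of how you could compute that transit time yourself by pulling the dates out of the Received: headers. It assumes the raw message has been saved to a file (message.eml is a made-up name) and that every hop stamped a parseable date:

from email import message_from_string
from email.utils import parsedate_to_datetime

# Read the raw message off disk (hypothetical file name).
with open("message.eml") as fh:
    msg = message_from_string(fh.read())

# Each Received: header carries its timestamp after the final semicolon.
stamps = [parsedate_to_datetime(h.rsplit(";", 1)[-1].strip())
          for h in msg.get_all("Received")]

# In the raw message the newest header is on top, so transit time is
# the first stamp minus the last one.
print("Transit time:", stamps[0] - stamps[-1])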

This does show that mail consumers tend to assume two key things:

  1. Mail transit is very fast (where 'fast' can mean anything from under a minute up to five minutes, depending on the user).
  2. The Date: on a message is when it was received.
The fact that mail can legitimately take hours to get where it's going is not good enough for these users. That it arrives in under five minutes nearly all of the time is really gravy, but it sets the service expectation for the entire email service. So we get grief when the expectation runs up against reality.

Is network now faster than disk?

Way back in college, when I was earning my Computer Science degree, the latencies of computer storage were taught like so:

  1. On-CPU registers
  2. CPU L1/L2 cache (this was before L3 existed)
  3. Main Memory
  4. Disk
  5. Network
This question came up today, so I thought I'd explore it.

The answer is complicated. The advent of Storage Area Networking was made possible because a mass of shared disk is faster, even over a network, than a few local disks. Nearly all of our I/O operations here at WWU are over a fibre-channel fabric, which is disk-over-the-network no matter how you dice it. With iSCSI and FC over Ethernet this domain is getting even busier.

That said, there are some constraints. "Network" in this case is still subject to distance limitations. A storage array 40km from the processing node will still see higher storage latency than the same kind of over-the-network I/O done 100m away. Our accesses are fast enough these days that the difference in speed-of-light round-trip time between 40km and 100m is measurable.
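As a back-of-the-envelope sketch of why that distance matters (assuming light in fiber covers roughly 200km per millisecond, or about 5 microseconds per kilometer, one way):

# Rough round-trip propagation delay in fiber, ignoring switch and array latency.
US_PER_KM_ONE_WAY = 5.0
for distance_km in (0.1, 40.0):
    rtt_us = 2 * distance_km * US_PER_KM_ONE_WAY
    print("%5.1f km: ~%.0f microseconds round trip" % (distance_km, rtt_us))

One microsecond versus 400 microseconds is noise next to a multi-millisecond disk seek, but it is quite visible against the sub-millisecond service times a well-cached array can deliver.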

A very key difference here is that the 'network' component is handled by the operating system, not by application code. For SAN, an application requests certain portions of a file; the OS translates that into block requests, which are then translated into storage-bus requests. The application doesn't know that the request was served over a network.

For application development, the above tiers of storage are generally well represented:

  1. Registers: unless the programming is in assembly, most programmers just trust the compiler and OS to handle these correctly.
  2. L1/2/3 cache: as above, although well-tuned code can maximize the benefit this storage tier provides.
  3. Main memory: this is handled directly in code. One might argue that, at a low level, memory handling constitutes the majority of what code does.
  4. Disk: this is represented by file-access or sometimes file-as-memory API calls, which are separate from the calls used for main memory.
  5. Network: yet another completely separate call structure, which means using it requires explicit programming.
Storage Area Networking is parked in step 4 up there. Network can include things like making NFS connections and then using file-level calls to access data, or actual Layer 7 stuff like passing SQL over the network.
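To make the distinction concrete, here's a minimal sketch (the paths, host, and port are made up) of how differently tiers 4 and 5 look to application code, even when the 'disk' behind the tier-4 call is really a SAN LUN at the far end of a fabric:

import socket

# Tier 4: file-level access. The application neither knows nor cares whether
# the blocks behind this path sit on a local SATA drive or on a LUN across a
# fibre-channel fabric; the OS turns the read into block requests.
with open(r"D:\data\report.csv", "rb") as fh:  # hypothetical path
    local_looking_data = fh.read()

# Tier 5: explicit network access. The application has to name the remote
# host, open the connection, and speak a protocol itself.
with socket.create_connection(("fileserver.example.edu", 9000)) as sock:  # hypothetical endpoint
    sock.sendall(b"GET /data/report.csv\n")
    networked_data = sock.recv(65536)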

For massively scaled out applications, the network has even crept into step 3 thanks to things like memcached and single-system-image frameworks.
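A hedged sketch of what that looks like in practice, using one of the several Python memcached client libraries (pymemcache here; the host name is made up). The calls read like main-memory operations, but each one is a network round trip:

from pymemcache.client.base import Client

cache = Client(("memcached.example.edu", 11211))  # hypothetical memcached node
cache.set("session:1234", b"serialized session state", expire=300)
state = cache.get("session:1234")  # looks like a dictionary lookup, is actually tier 5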

Network is now competitive with disk, though so far the best use-cases let the OS handle the network part instead of the application doing it.

Held to an unreasonable standard

Yesterday morning we took the first user-visible step in our Exchange 2010 migration. We put the Exchange 2010 Client Access servers into place as our OWA servers. Since we have no one in an Exchange 2010 mailbox, all they'll do is proxy incoming connections to the old 2007 CA servers. Nothing should change in the end-user experience.

However, something else has cropped up.

We actually got the first few reports last week when Entourage stopped working reliably. After the 2010 swap, some mobile devices had their ActiveSync stop working.

We took some fire today because we didn't catch the fact that certain ActiveSync setups don't work with Exchange 2010. The Android and iOS devices we tried all worked, so we presumed it wasn't a big problem. Apparently that was wrong.

It seems that the DroidX and certain iOS versions below 4.2 don't sync correctly with Exchange 2010 ActiveSync. It just so happens that a Dean was using an iOS device running a 3.something version and was out of luck, and Deans are high enough up the food chain to make their pain felt. We felt it.

The Mobile space is highly, highly fragmented. We simply do not have the resources to test and validate each OS rev on each hardware device out there and provide a 'How To' page for everything. Being expected to provide this is unreasonable. Unfortunate, but unreasonable.

We do warn people that using ActiveSync is a 'best effort' solution. However, today's 'best effort' is tomorrow's broken critical service.

The solution right now is to upgrade your iOS or not use DroidX. The iOS upgrade is free. DroidX users are out of luck until Motorola fixes the problem. There is nothing we can do about that.

Data Protector deduplication

When last I looked at HP Data Protector deduplication, I was not impressed. The client-side requirements were a resource-hungry joke, and were seriously compromised by Microsoft failover clusters.

I found a use-case for it last week. We have a server in a remote office across a slow WAN link. Backing that up has always been a trial, but I had the right resources to try and use dedupe and at least get the data off-site.

Sometime between then and now (v6.11) HP changed things.

The 'enhincrdb' directory I railed against is now empty. Having just finished a full and an incremental backup, I see that the amount of data transferred over the wire is exactly the same for both, but the amount of data stashed away is markedly different. Apparently the processing to figure out what needs to be in the backup has been moved from the client to the backup server, which makes this useless over slow links.

It means that enhanced incremental backups will take just as long as fulls do, and we don't have time for that on our larger servers. We're going to stick with an old-fashioned Full/Incremental rotation.

It's an improvement in that it doesn't require such horrible client-side resources. However, this implementation still has enough quirks that I just do not see the use-case.

The heartbreak of software licensing

For the most part, I don't deal with software licensing. I like it this way. WWU has one person for whom dealing with software licensing is a core part of her job. She does a great job at it! And, like a tax accountant, the fact that this is what she does day in and day out means that I rarely end up applying head to appropriate walls over licensing.

Licensing is a hard, hard problem, especially when you throw virtualization and discounts into the mix. The software industry as a whole is still figuring out how to license stuff in a virtual environment, and we work in an industry that typically gets discounts above and beyond what others get. Add in a healthy dash of smaller software vendors with simpler licensing regimes, and you have enough for a full-time job.

Our particular place in the software ecosystem is defined by these characteristics:

  • Higher Education
  • State Government (we are a public university that receives state support)
  • ~4,200 staff
  • ~14,000 full-time-equivalent students
  • ~23,000 users (+/- 1500 depending on our exact percentage of part-time students and where we are in the year)
  • ~1,800 computer-lab seats
  • ~3,000 computers
Any given software licensing regime will take one or more of the above into account. Most will factor 'Higher Ed' into their licensing, which we appreciate since it makes things cheaper. Being State means we can leverage master contracts negotiated by the State, but frequently the higher-ed discount makes independent contracts cheaper than going through Olympia.

Then comes the rest of it.

For stuff that everyone gets, like anti-virus software, the per-seat charge is where we pay the closest attention. The last time the AV contract went out for bid, the entity that won did so in large part because they applied their per-seat charge against the staff count (4,200), whereas the others applied it to the FTE-student + staff number (18,200); even with a higher per-seat charge, that still came in markedly cheaper.
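As a purely hypothetical illustration (these are not the real bid numbers), the arithmetic that drives that outcome looks like this:

# Made-up per-seat prices, showing why a higher rate against a smaller count can win.
winner = 8.00 * 4200    # priced against staff only
others = 3.00 * 18200   # priced against staff + FTE students
print(winner, others)   # 33600.0 vs 54600.0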

For stuff that goes on every lab seat, we have a mix of software that has to be licensed for every seat (1,800) and software under a concurrency arrangement, which requires a license server. We like the concurrency arrangement, but it means we need a license server.

Which brings me to license servers. We do have a FlexLM license-server, but not everything uses FlexLM. Also, FlexLM version support may end up forcing us to have more than one licensing server (which incurs OS licensing costs). It's working, but a trial.



That's end-user software licensing, but IT software can be worse. We solved our Microsoft problem by caving and getting a Select agreement. While our client-access licenses are covered, we still need to purchase server licenses. Rolling out Exchange 2010 required picking up OS and Exchange server licenses, but we didn't have to worry about umpteen thousand CALs for our end-users. Simple, in its way. However, knowing, or knowing how to find out, exactly which products are covered by our Select agreement is part of why our licensing person does this full time.

I have yet to meet a system administrator of any experience who hasn't had to shake their fist at à la carte pricing for IT software. We ended up changing our backup vendor over a gross misunderstanding of the differences in licensing models between our old vendor and the prospective new one. The new one wasn't any cheaper; the biggest benefit was that we had to buy new licenses a lot less often than we did with the old vendor.

IT software pricing competence is the biggest value-add that Value Added Resellers give, at least for us (see also: Higher Ed discounts). We don't do enough volume with IT software for our in-house licensing person to gain any real experience with it, so we have to trust outsiders. We trust them to know WTF they're talking about when it comes to reselling IT software. Which means that when they fail us, we get very angry. The backup software debacle ended up costing us about 120% more than we expected to pay by the time we were truly, fully licensed for what we wanted to do, and it took nearly two years to get it all ironed out. Since then, however, they've caught things we never even knew existed.

Like tax accountants handling very complex returns, the definitive answer for exactly how much licensing is needed varies with who you ask. You'd think the rules would be nicely deterministic, but that presupposes complete knowledge of the whole system and a uniform understanding of definitions. In the past, when trying to get an idea of what licensing was required for a specific thing we were considering, we've gotten different answers from different people at the vendor itself as well as from a couple of VARs. Four people, four answers, two of whom actually worked for the company we were trying to buy from, and a fairly large price spread (though the higher-ed discount percentage was uniform across all of them).

Who do you trust? Whichever one sounds reasonable, and hope. If you get in trouble with the vendor, at least you have someone complicit in your unintended perfidy. In the case of the backup software, we learned the hard way that how the software enforces licenses was different from how the VAR understood licensing to work (imperfect understanding of definitions). They did help make things right, but the sheer magnitude of the misunderstanding still made it very expensive.



For entities smaller than us, licensing is just as much of a headache, especially for end-user software. They may not have a Select agreement, so they may be purchasing off of the Microsoft/Adobe/Apple rolling cart. Virtual desktops, which we've avoided so far, make that hard to pin down since the definition of 'machine' becomes variable. I'm not in that space, but I hear things. It's hard.

It's hard everywhere.

I would not be at all surprised if someone could build a business advising companies on software licensing issues.

A Shadow Copy trick

It turns out that you can access Previous Versions on a remote server from CMD. I thought CMD didn't have hooks for that, but it does. The syntax is weird:

U:\@GMT-2011.02.09-15.00.22\Dirname\
You have to know the exact time the snapshot was taken, but it's there for access. You can DIR that, copy out of it, even CD into it.

The same applies to UNCs: put the @GMT token in front of the directory you're looking for.
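If you want to script against it, here's a quick sketch of building that token and listing a snapshot over a UNC path (the server, share, and snapshot time are made up; the token has to match an existing snapshot exactly, and this only works from a Windows box that can reach the share):

import os
from datetime import datetime

# Hypothetical snapshot time; Previous Versions will show you the real ones.
snap = datetime(2011, 2, 9, 15, 0, 22)
token = snap.strftime("@GMT-%Y.%m.%d-%H.%M.%S")
path = r"\\fileserver\share" + "\\" + token + r"\Dirname"

print(os.listdir(path))  # the scripted equivalent of DIR against the snapshot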

This does NOT apply to local paths, alas.

This is the way it was

We're not alone in this, but we have a room called "The Library". In theory, this is where we put the books we share with each other, as well as the boxes of software we're using (remember when software came in a box with a manual?). It's a community resource, so we don't need six copies of the O'Reilly Perl book. That kind of thing.

However, that's the theory...

Rogue file-servers

Being the person who manages our centralized file-server, I also have to deal with storage requests. The requests get directed a layer or two higher than me, but I'm the one who has to make it so, or add new capacity when the time comes. People never have enough storage, and when they ask for more, sticker shock often means they decide they can't have it.

It's a bad situation. End-users have a hard time realizing that the $0.07/GB hard drive they can get from NewEgg has no bearing on what storage costs us. My cheap-ass storage tier is about $1.50/GB, and that's not including backup infrastructure costs. So when we present a bill that's much higher than they're expecting, the temptation to buy one of those 3.5TB 7.2K RPM SATA drives from NewEgg and slap it into a PC-turned-fileserver is high.
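For a sense of the gap those two numbers create (backup infrastructure still excluded), here's the math for a hypothetical 1TB request:

# Hypothetical 1TB (1,000GB) storage request, priced both ways.
request_gb = 1000
print("NewEgg drive:    $%.2f" % (request_gb * 0.07))  # ~$70
print("Central storage: $%.2f" % (request_gb * 1.50))  # ~$1,500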

Fortunately(?), due to the decentralized nature of the University environment, what usually happens is that users go to their college IT department and ask for storage there. For individual colleges that have their own IT people, this works. I know of major storage concentrations that I have absolutely nothing to do with in the Libraries and the College of Science and Technology, and a smaller but still significant amount in Huxley. CST may have as much storage under management as I do, but I can't tell from here.

Which is to say, we generally don't have to worry about the rogue file-server problem. That problem is what happens when you have a central storage system that can't meet demand and no recourse for end-users to fix it some other way.

And I'd hate to be the sysadmin who has to come down like a ton of bricks on whoever sets one up. I'd do it, and I wouldn't like it, because I also hate failing to meet my users' needs that flagrantly, but I'd still do it. Having users do that kind of end-run leads to pain everywhere in time.