Recently in virtualization Category

As I look around the industry with an eye towards further employment, I've noticed a difference of philosophy between startups and the more established players. One easy way to see this difference is on their job postings.

  • If it says RHEL and VMware on it, they believe in support contracts.
  • If it says CentOS and OpenStack on it, they believe in community support.

For the same reason that tech startups almost never use Windows if they can get away with it, they steer clear of other technologies that come with license costs or mandatory support contracts. Why pay the extra support cost when you can get the same service by hiring extremely smart people and using products with a large peer-support community? Startups run lean, and all that extra cost is... cost.

And yet some companies find that they prefer to run with that extra cost. Some, like StackExchange, don't mind the extra licensing costs of their platform (Windows) because they're experts in it and can make it do exactly what they want it to do with a minimum of friction, which means the Minimum Viable Product gets kicked out the door sooner. A quicker MVP means quicker profitability, and that can pay for the added base-cost right there.

Other companies treat support contracts like insurance: something you carry just in case, as a hedge against disaster. Once you grow to a certain size, business continuity insurance investments start making a lot more sense. Running for the brass ring of market dominance without a net makes sense, but once you've grabbed it, keeping it takes investment. Backup vendors love to quote statistics on the percentage of businesses that fail after a major data-loss incident (it's a high percentage), and once you have a business worth protecting it's good to start protecting it.

This is part of why I'm finding that the long-established companies tend to use technologies that come with support. Once you've dominated your sector, keeping that dominance means having the people who wrote your technology on call 24/7, under contract.

"We may not have to call RedHat very often, but when we do they know it'll be a weird one."


So what happens when startups turn into market dominators? All that no-support Open Source stuff is still there...

They start investing in business continuity; the form just differs from company to company.

  • Some may make the leap from CentOS to RHEL.
  • Some may contract for 3rd party support for their OSS technologies (such as with 10gen for MongoDB).
  • Some may implement more robust backup solutions.
  • Some may extend their existing high-availability systems to handle large-scale local failures (like datacenter or availability-zone outages).
  • Some may acquire actual Business Continuity Insurance.

Investors may drive some of that BC investment, or may actively discourage it. I don't know; I haven't been in those board meetings, and I can argue it both ways.

Which one do I prefer?

Honestly, I can work for either style. Lean OSS means a steep learning curve and a strong incentive to become a deep-dive troubleshooter of the platform, which I like to be. Insured means someone has my back if I can't figure it out myself, and I'll learn from watching them solve the problem. I'm easy that way.

The evil genius of OSv


One of the talks here at LISA13 was about a new cloud-optimized operating system called OSv. This is a new thing, and I hadn't heard of it before. Why do we need yet another OS? And one that doesn't even run a Linux kernel? I was frowning through the talk until I got to this slide:

[Slide image: NotNetware.jpg]

That's the point when I said:

Holy shit! They've built a 64-bit NetWare!

  • Cooperative multi-tasking? Check!
  • A shared memory space? Check!
  • Everything runs in Ring 0? Check!

There were a few other things that made the parallel even more clear to me, but this is a stunning display of evil genius. Even though Novell tried for ten years to promote NetWare as a perfectly legitimate general purpose server for application serving, it never really took off. There were several reasons for this (not exhaustive):

  • It was a pain to develop for. The NLM model never got anything approaching widespread adoption, so you had to get everything just right.
  • The shared memory space meant that the OS allowed you to stomp all over other processes running on the system, something that other OSs (Windows, Linux) don't allow.
  • If something did manage to wiggle out of the app and into the kernel, it had free rein (though in practice all it did was abend the server; writing exploits is subject to the first bullet-point problem).
  • It didn't have any concept of forking, just threads, which changed the multi-processing paradigm from what it was on most other platforms and made porting software to it a pain.
  • There were no significant user-space utilities (grep/sed/awk/bash), though they did get some of that well after they'd lost the battle.

All of these made NetWare a challenging platform to develop for, and challenging platforms don't get developed for. Novell tried to further encourage people to develop for it by getting the Java JVM ported to NetWare so people could run Java apps on it. Few did, though it was quite possible; search for "netstorage" on this blog to get one such application that saw a lot of use.

Have I mentioned that OSv's first release ships with a JVM on it?


The Evil Genius part is that they're not wrong: things really do run faster when you write a kernel like that and run things in the same memory space as the kernel. I got pretty nice scaling with Apache when I was running it on NetWare.

The Evil Genius part is that they're designing this system to be a single-app system, not a general purpose system like NetWare was supposed to be. It runs a JVM, and that's it. The JVM can only stomp on itself and the kernel, and apps can stomp on each other within the limits of the JVM.

The Evil Genius part is that if it does fall over, it's designed to be flushed and a fresh copy spun up in its place. Disposable servers! NetWare servers of old were bastion hosts that Shall Never Go Down. OSv? Not the same thing at all.

The Evil Genius part is that they're doing this in an era where a system like this can actually succeed.

The Evil Genius part is that everyone looks at what they're doing and goes, "...uh HUH. Riiiight. Like that's a good idea." And, like evil geniuses of the past, they will go unrecognized and slink off to some dark corner somewhere to cackle and dream of world domination that will never happen.

Anyone taking DevOps to heart should read Normal Accidents. The book is about failure modes of nuclear power plants: as highly automated and extensively instrumented as they are, they still manage to fail in spite of everything we do. The lessons carry well into the highly automated environments we try to build in our distributed systems.

There are a couple of key learnings to take from this book and theory:

  • Root cause can be something seemingly completely unrelated to the actual problem.
  • Contributing causes can sneak in and turn what would be a well-handled event into something that gets you bad press.
  • Monitoring instrumentation failures can be sneaky contributing causes.
  • Single-failure events are easily handled, and may be invisible.
  • Multiple-failure events are much harder to handle.
  • Multiple-failure events can take months to show up if the individual failures happened over the course of months and were invisible.

The book had a failure mode much like this one:

After analysis, it was known that the flow direction of a specific coolant pipe was a critical item. If backflow occurred, hot fluid could enter areas not designed for handling it. As a result, a system was put in place to monitor flow direction, and automation put in place to close a valve on the pipe if backflow was detected.

After analyzing the entire system after a major event, it was discovered that the flow-sensor had correctly identified backflow, and had activated the valve close automation. However, it was also discovered that the valve had frozen open due to corrosion several months prior to the event. Additionally, the actuator had broken when the solenoid moved to close the valve. As a result, the valve was reported closed, and showed as such on the Operator panel, when in fact it was open.

  • The valve had been subjected to manual examination 9 months before the event, and was due to be checked again in 3 more months. However, it had failed between checks.
  • The actuator system was checked monthly and had passed every check. The actuator breakage happened during one of these monthly checks.
  • The sensor on the actuator was monitoring power draw for the actuator. If the valve was frozen, the actuator should notice an above-normal current draw. However, as the actuator arm was disconnected from the valve it experienced a below-normal current draw and did not detect this as an alarm condition.
  • The breaking of the actuator arm was noted in the maintenance report during the monthly check as a "brief flicker of the lamp" and put down as a 'blip'. The arm failed before the current meter triggered its event. As the system passed later tests, the event was disregarded.
  • The backflow sensor actually installed was not directional. It alarmed on zero-flow, not negative-flow.

Remediations:

  • Instrument the valve itself for open/close state.
  • Introduce new logic so that if the backflow sensor continues to detect backflow after the valve is commanded closed, raise alarms (a sketch of this cross-check follows the list).
  • Replace the backflow sensor with a directional one as originally called for.
  • Add a new flow sensor behind the valve.
  • Change the alerting on the actuator sensor to alarm on below-normal current draw as well.
  • Increase the frequency of visual inspection of the physical plant.
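That alarm-on-persistent-backflow remediation boils down to a cross-check: if the valve has been commanded closed but the backflow sensor still reports flow, some part of the chain is lying and a human needs to know. Here's a minimal sketch of that logic in Python; the sensor and alarm interfaces are hypothetical stand-ins, not anything from the book.

    import time

    # Grace period for the valve to physically close after the command.
    BACKFLOW_GRACE_SECONDS = 30

    def check_valve_cross_consistency(valve_commanded_closed_at, read_backflow, raise_alarm):
        """Alarm if backflow persists after the valve was commanded closed."""
        if valve_commanded_closed_at is None:
            return  # no close command outstanding, nothing to cross-check
        elapsed = time.time() - valve_commanded_closed_at
        if elapsed > BACKFLOW_GRACE_SECONDS and read_backflow():
            raise_alarm("Backflow still detected %.0fs after valve close command; "
                        "valve or actuator may have failed." % elapsed)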

That valve being open caused Fun Times To Be Had. If that valve system had been operating correctly, the fault that caused the backflow would have been isolated as the system designers intended and the overall damage contained. However, this contributing cause, one that happened months before the triggering event, turned a minor problem into a major one.

So, why did that reactor release radioactive materials into the environment? Well, it's complicated...

And yet, after reading the post-mortem report you look at what actually failed and think, 'and these are the jokers running our nuclear power plants? We're lucky we're not all glowing in the dark!'

We get the same kind of fault-trees in massively automated distributed systems. Take this entirely fictional, but oh-so-plausible failure cascade:

ExampleCorp was notified by their datacenter provider of the need for emergency power maintenance in their primary datacenter. ExampleCorp (EC) operated a backup datacenter and had implemented a hot failover method, tested twice a year, for moving production to the backup facility. EC elected to perform a hot failover to the backup facility prior to the power work in their primary facility.

Shortly after the failover completed the backup facility crashed hard. Automation attempted to fail back to the primary facility, but technicians at the primary facility had already begun, but not yet completed, safe-shutdown procedures. As a result, the fail-back was interrupted part way through, and production stopped cold.

Service recovery happened at the primary site after power maintenance completed. However, the cold-start script was out of date by over a year, so restoration was hampered by differences that came up during the startup process.

Analysis after the fact isolated several causes of the extensive downtime:

  • In the time since the last hot-failover test, EC had deployed a new three-node management cluster for their network switch configuration and software management system, one three-node cluster for each site.
  • The EC-built DNS synchronization script used to keep the backup and primary sites in sync was transaction-oriented. A network fault five weeks before the event meant the transactions related to the DNS update for the cluster deployment were dropped, and this went unnoticed.
  • The old three-node clusters were kept online "just in case".
  • The difference in cluster software versions between the two sites was displayed in EC's monitoring panel, but was not alarmed on, and was disregarded as a 'glitch' by Operations. Interviews show that Ops staff are aware that the monitoring system will sometimes hold onto stale data if it isn't part of an alarm.
  • At the time of the cluster migration Operations was testing a new switch firmware image. The image on the old cluster was determined to have a critical loading bug, which required attention from the switch vendor.
  • Two weeks prior to the event EC performed an update of switch firmware using new code that passed validation. The new firmware was replicated to all cluster members in both sites using automation based on the IP addresses of the cluster members. The old cluster members were not updated.
  • The automation driving the switch firmware update relied on the non-synchronized DNS entries, and reported no problems applying updates. The primary site got the known-good firmware, the backup site got the known-bad firmware.
  • The hot-failover network load triggered the fault in the backup site's switch firmware, causing switches to reboot every 5 minutes.
  • Recovery logic in the application attempted to work around the massive network faults and ended up duplicating some database transactions, and losing others. Some corrupted data was transferred to the primary site before it was fully shut down.
  • Lack of technical personnel physically at the backup site hampered recovery from the backup site and extended the outage.
  • Out-of-date documentation hampered efforts to restart services from a cold stop.
  • The inconsistent state of the databases further delayed recovery.

That is a terrible-horrible-no-good-very-bad-day, yes indeed. However, it shows what I'm talking about here. Several small errors crept in to make what was supposed to be a perfectly handleable fault something that caused many hours of downtime. This fault would have been discovered during the next routine test, but that hadn't happened yet.

Just like the nuke-plant failure, reading this list makes you go "what kind of cowboy outfit allows this kind of thing to happen?"

Or maybe, if it has happened to you, "Oh crimeny, I've so been there. Here's hoping I retire before it happens again."

It happens to us all. Netflix reduces this through the Chaos Monkey, using it to visibly trigger these small failures before they can cascade into big ones. And yet even they fall over when a really big failure happens naturally.

What can you do?

  • Accept that the multiple-failure combinatorics problem is infinite and you won't be able to capture every fail case.
  • Build your system to be as disaster resilient as possible.
  • Test your remediations, and do so regularly.
  • Validate your instrumentation is returning good results, and do so regularly.
  • Cross-check where possible.
  • Investigate glitches, and keep doing it after it gets tediously boring.
  • Cause small failures and force your system to respond to them (a quick sketch of this follows the list).
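Expanding on that last point, here is a tiny Chaos Monkey-flavored sketch of deliberately causing small failures. The list_instances and terminate_instance callables stand in for whatever your platform actually provides (cloud API, hypervisor CLI); they are assumptions for illustration, not a real API.

    import random

    def kill_one_random_instance(list_instances, terminate_instance, dry_run=True):
        """Pick one expendable running instance at random and terminate it,
        preferably during business hours so people see the failure happen."""
        candidates = [i for i in list_instances() if i.get("safe_to_kill")]
        if not candidates:
            return None
        victim = random.choice(candidates)
        if dry_run:
            print("Would terminate %s" % victim["id"])
        else:
            terminate_instance(victim["id"])
        return victim["id"]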

These are all known best-practices, and yet people are lazy, or can't get sufficient management buy-in to do it (a 'minimum viable product' is likely excessively vulnerable to this kind of thing). We do what we can, snark at those who visibly can't, and hope our turn doesn't come up.

Perhaps you've seen this error:

Version mismatch with VMCI driver: expecting 11, got 10.

I get this every time I upgrade a kernel, and this is how I fix it.

The cloud will happen

Like many olde tyme sysadmins, I look at 'cloud' and shake my head. It's just virtualization the way we've always been doing it, but with yet another abstraction layer on top to automate deploying certain kinds of instances really fast.

However... it's still new to a lot of entities. The concept of an outsourced virtualization plant is very new. For entities that use compliance audits for certain kinds of vendors it is most definitely causing something of a quandary. How much data-assurance do you mandate for such suppliers? What kind of 3rd party audits do you mandate they pass? Lots of questions.

Over on 3 Geeks and a Law Blog, they recently covered this dynamic in a post titled The Inevitable Cloud as it relates to the legal field. In many ways, the legal field has information-handling requirements similar to the health-care field's, though we don't have HIPAA. We handle highly sensitive information, and who had access to what, when, and what they did with it can be extremely relevant details (it's called spoliation). Because of this, certain firms are very reluctant to go for cloud solutions.

Some of their concerns:

  • Who at the outsourcer has access to the data?
  • What controls exist to document what such people did with the data?
  • What guarantees are in place to ensure that any modification is both detectable and auditable?

For an entity like Amazon AWS (a.k.a. Faceless Megacorp) the first question may not be answerable without lots of NDAs being signed. The answer to the second may not even be given by Amazon unless the contract is really big. The answer to the third? How about this nice third-party audit report we have...

The pet disaster for such compliance officers is a user with elevated access deciding to get curious and exploiting a maintenance-only access method to directly access data files or network streams. An entity that can address such fears to the customer's satisfaction can win some big contracts.

However, the costs of such systems are rather high; and as the 3 Geeks point out, not all revenue is profit-making. Firms that insist on end-to-end transport-mode IPSec and universally encrypted local storage all with end-user-only key storage are going to find fewer and fewer entities willing to play ball. A compromise will be made.




However, at the other end of the spectrum you have the 3-person law offices of the world, and there are a lot more of them out there. These are offices that don't have enough people to bother with a Compliance Officer. They may very well be using Dropbox to share files with each other (though possibly TrueCrypted), and are practically guaranteed to be using outsourced email of some kind. These are the firms that are going into the cloud first, pretty much by default. The rest of the market will follow along, though at a remove of some years.

Exciting times.

Question: When an (IaaS) cloud provider charges per hour for a machine, what's it an hour of? Do I get charged when it's doing nothing? If so, why is that fair?

All the IaaS cloud providers I've run into (which isn't all of them by any stretch) charge by the running hour. If that micro-mini instance is doing nothing but emailing the contents of a single file once a day, it'll still get charged for 24 hours of activity if left on. The same goes for a gargantuGPU instance doing the same work; it'll just cost more to do nothing.

Why is that fair?

Because of resources.

The host machine running all of these virtual machines has many resources. CPU, memory, disk, network, the usual suspects. These resources have to be shared between all the virtual machines. Let's take a look at each and see how easy that is.

CPU

To share CPU between VMs the host has to be able to share execution between them. Much like we do... well, practically everywhere now. We've been doing multiprocess operating systems for a while now. Sharing CPU cycles is dead easy. If a process needs a lot it gets what's available. If it needs none, it gets none. A thousand processes all doing nothing causes... nothing to happen! It's perhaps the easiest thing to share. But, we'll see.

Memory

We've been sharing RAM between processes, with good isolation even, for some time now. Even Apple has joined that game to great effect. Unlike CPU, processes sit on RAM the entire time they're running. It may be swapped out by the OS, but it's still accounted for.

Disk

Disk? Disk is easy. It's just files. Each file gets so much, and more if needed up until you run out. At which point you run into problems. Each VM uses disk to store its files, as you'd expect.

Network

To share network, a host machine has to proxy network connections from a VM. Which... it kinda already does for normal OS processes, like, say, Apache, or MySQL. If a process doesn't need any network resources, none gets used. If it needs some, it uses up to what's available. A thousand processes all doing nothing uses no network resources. Same for VMs, really. It's right up there with CPU for ease of sharing.

Now ask yourself. Of these four major resources, which of them are always consumed when a VM (or if you rather, a process) is running?

If you said "memory and disk" you've been paying attention.

If you said "all but network, and maybe even that too", you've been auditing this answer for technical accuracy and probably noticed a few (gross) simplifications so far. Please bear with me!

Now of the two always-consumed resources, memory and disk, which is going to be the more constrained one?

If you look at it from the old memory-hierarchy chart based on "how long does the CPU have to wait if it needs to get data from a specific location", you can begin to see a glimmer of the answer here. This is usually measured in CPU cycles spent waiting for data. The faster the tier, the more expensive the storage. A 2.5GHz CPU will have 2.5 billion cycles in a second. Remember that number.

A 7.2K RPM hard drive, the type you can get in 1TB sizes for cheap, has a retrieval latency of 8.9 milliseconds. Which means that best-case the 2.5GHz CPU will wait 22,250,000 cycles before it gets the data it needs. That's... really slow, actually.

The RAM in that 2.5GHz server can be fetched in 10 nanoseconds. Which means that best-case the 2.5GHz CPU will wait only... 25 cycles.
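Here's that arithmetic as a quick sanity check, using the same numbers from above (2.5GHz clock, 8.9ms disk latency, 10ns RAM latency):

    # Cycles the CPU spends waiting, from the figures above.
    CLOCK_HZ = 2.5e9          # 2.5 GHz

    disk_latency_s = 8.9e-3   # 8.9 ms for a 7.2K RPM drive
    ram_latency_s = 10e-9     # 10 ns for main memory

    print(disk_latency_s * CLOCK_HZ)  # ~22,250,000 cycles stalled on disk
    print(ram_latency_s * CLOCK_HZ)   # ~25 cycles stalled on RAM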

Biiiiiiig differences there! RAM is vastly faster. Which means it's also vastly more expensive[1]. Which in turn means that RAM is going to be the more constrained resource.



So we have determined that of the four resource types RAM is the most expensive, always-on resource. Because of that, RAM amount is the biggest driver of cost for cloud-computing providers. It's not CPU. This is why that 64MB RAM VM is so much cheaper per-hour than something with 1.6GB in it, even if they get the same CPU resources.

Because RAM amount used is the cost-center, and a 1.6GB VM is using that 1.6GB of RAM all the time, the cloud providers charge by hour of run-time. And this is fair. Now you know.



[1]: How much more expensive? A 1TB disk can be had for $90. 1 TB of RAM requires a special machine (higher end servers), and will run you a bit under $12,000 at today's prices.

Change-automation vs. LazyCoder

The lazyCoder is someone who sees a need to write code, but doesn't because it's too much work. This describes a lot of sysadmins, as it happens. It also describes software engineers looking at an unfamiliar language. Part of the lazy_coder is definitely a disinclination to write something in a language they're not that familiar with; part of it is a disinclination to work.

It has been said in DevOps circles (though I can't hunt up the reference):
A good sysadmin can probably earn a living as a software engineer, though they choose not to.
A sentiment close to my heart, as that definitely applies to me. I have that CompSci degree (before software engineering degrees were common, CSci was the degree-of-choice for the enterprising dot-com boom programmer) that says I know how to code. And yet, when I hit the workplace I tacked as close to systems administration as I could. And I did. And like many sysadmins of my age cohort or older, I managed to avoid writing code for a very large part of my career.

I could do it as needed, as proven by a few rather complex scripts I cobbled together over that time. But I didn't go into full-time code-writing because of the side effects on my quality of life. In my regular day-to-day life, problems came and went generally on the same day or within a couple of days of introduction. When I was heads-down in front of an IDE, the problem took weeks to smash, and I was angry most of the time. I didn't like being cranky that long, so I avoided long coding projects.

Problems are supposed to be resolved quickly, damnit.

Sysadmins also tend to be rather short of attention-span because there is always something new going on. Variety. It's what keeps some of us going. But being heads down in front of a wall of text? The only thing that changes is what aggravating bit of code is aggravating me right now[1]. Not variety.

So you take someone with that particular background and throw them into a modern-age scaled system. Such a system has a few characteristics:

  • It's likely cloud-based[2], so hardware engineering is no longer on the table.
  • It's likely cloud-based[2], so deploying new machines can be done from a GUI, or an API. And probably won't involve actual OS install tasks, just OS config tasks.
  • There are likely to be a lot of the same kind of machine about.

And they have a problem. This problem becomes glaringly obvious when they're told to apply one specific change to triple-digits of virtual machines. Even the laziest of LAZY_CODER will think to themselves:

Guh, there has got to be a better way than just doing it all by hand. There's only one of me.
If they're a Windows admin and the class of machines are all in AD as they should be, they'll cheer and reach for a Group Policy Object. All done!

But if whatever needs changing isn't really doable via GPO, or requires a reboot to apply? Then... PowerShell starts looming[3].

If they're a *nix admin, the problem will definitely involve rolling some custom scripting.

Or maybe, instead, a configuration management engine like Puppet, CFEngine, Chef or the like. Maybe the environment already has something like that, but the admin hasn't gone there since it's new to them and they didn't have time to learn the domain-specific language used by the management engine. Well, with triple digits of machines to update, learning that DSL is starting to look like a good idea.

Code-writing is getting hard to avoid, even for sysadmin hold-outs. Especially now that Microsoft is starting to Strongly Encourage systems engineers to use automation tools to manage their infrastructures.

This changing environment is forcing the lazy coder to overcome the migration threshold needed to actually bother learning a new programming language (or better learning one they already kinda-sorta know). Sysadmins who really don't like to write code will move elsewhere, to jobs where hardware and OS install/config are still a large part of the job.

One of the key things that changes once the idea of a programmable environment starts really setting in is the workflow of applying a fix. For smaller infrastructures that do have some automation, I frequently see this cascade:

  1. Apply the fix.
  2. Automate the fix.

Figure out what you need to do, apply it to a few production systems to make sure it works, then put it into the automation once you're sure of the steps. Or worse, apply the fix everywhere by hand, and automate it so that new systems have it. However, for a fully programmable environment, this is backwards. It really should be:

  1. Automate the fix
  2. Apply the fix.

Because you'll get a much more consistent application of the fix this way. The older way will leave a few systems with slight differences of application; maybe config-files are ordered differently, or maybe the case used in a config file is different from the others. Small differences, but they can really add up. This transition is a very good thing to have happen.
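As a concrete illustration of "automate first, then apply": instead of hand-editing a config file on a few boxes and scripting it later, you write the change once, idempotently, and let the automation push it everywhere. Here's a generic Python sketch of that idea; the file path and setting are invented, and a real shop would express this in their config-management engine's DSL rather than raw Python.

    # Idempotent "ensure this line is in the config file" fix, written once
    # and then applied everywhere by the automation layer.
    CONFIG_PATH = "/etc/example/app.conf"      # hypothetical path
    REQUIRED_LINE = "max_connections = 512"    # hypothetical setting

    def ensure_config_line(path=CONFIG_PATH, line=REQUIRED_LINE):
        """Add the line if it's missing; do nothing if it's already there."""
        try:
            with open(path) as f:
                lines = [l.rstrip("\n") for l in f]
        except FileNotFoundError:
            lines = []
        if line in lines:
            return False   # already compliant, nothing to do
        lines.append(line)
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")
        return True        # change applied

    if __name__ == "__main__":
        print("changed" if ensure_config_line() else "already compliant")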

The nice thing about Lazy Coders is that once they've learned the new thing they've been avoiding, they tend to stop being lazy about it. Once that DSL for Puppet has been learned, the idea of amending an existing module to fix a problem becomes something you just do. They've passed the migration threshold, and are now in a new state.

This workflow-transition is beginning to happen in my workplace, and it cheers me.



[1]: As Obi-Wan said, it all depends on your point of view. To an actual Software Engineer, this is not the same problem coming back to thwart me; it's all different problems. Variety! It's what keeps them going.
[2]: Or, if you're like that, a heavily virtualized environment that may or may not belong to the company you're working for. So there just might be some hardware engineering going on, but not as much as there used to be. Sixteen big boxes with a half TB of RAM each is a much easier-to-maintain physical fleet than the old infrastructure with 80 physical boxes of mostly different spec.
[3]: Though if they're a certain kind of Windows admin who has had to reach for programming in the past, they'll reach instead for VBScript; Powershell being too new, they haven't bothered to learn it yet.

AMD has released their server version of the Bulldozer CPU family they launched over a month ago; it's called Interlagos.

Bulldozer/Interlagos is AMD's attempt to grab more of the market from Intel. Currently, it's competing in the value sector but not on performance. The days when AMD CPUs were the virtualization kings have been gone for a couple years now. AMD would like that crown back, thank you, and they're driving to go there.

That said, comparing performance between equivalently clocked AMD and Intel CPUs is hard. They're optimized for different tasks, which means that the smart Systems Engineer looking for the next CPU to base their environment on should pay attention. Workload matters! Those AMD CPUs may be damned cheap compared to Intel, but if you're doing the wrong things with them you'd be better off buying previous-gen Intel chips.

The most controversial thing AMD has done is to make two cores share a Floating Point Unit. They've also done quite a bit of optimization in their Arithmetic Logic Unit, where integer math is handled. The reasoning behind this is that most server usage these days is integer-heavy, highly parallelizable workloads; most database and simple web-serving workloads are entirely integer and parallel-friendly, and that's a large part of the webapp stack right there. The likes of Google Plus, StackExchange, and Reddit do far more integer work than floating-point, so something like Interlagos should be a good fit.

And the early benchmarks show that AMD does indeed have an edge on integer-heavy workloads over equivalent generation Intel parts. Intel still has an edge on compute-performance-per-watt, but AMD holds the edge on compute-performance-per-GHz. Pick which is more important to you.

Specialist workloads like render farms are edge cases, if big consumers, so engineering to handle those workloads is not worth the time. By staking out the middle of the market, AMD can drive innovation in the marketplace by forcing Intel to get creative in the middle. It's good for everyone.



Yes, but what about me, you cry.
Or something. News of the new vSphere 5 pricing guide has leaked out. Kind of like the Netflix announcement, it has raised a lot of ire on the part of their customers. As would be expected when your preferred vendor announces you'll be paying a lot more.

The key problem has to do with how they're changing the licensing model for vSphere. We knew they'd change it, we just didn't know if they were going to put DRS and HA into a new Enterprise Plus Ultra tier, or do something else. They did something else.

With vSphere 4 the licensing tiers were based on the processor socket, number of cores, and desired features. If you had over 6 cores on that processor, you needed Enterprise Plus to use them all. If you had 6 or fewer, you could go with one of the three cheaper options.

With vSphere 5 the licensing tiers are now based on a combination of processor socket and RAM (as well as features). A 2-core socket counts as much as a 12-core socket in this scheme (yay). Unfortunately, if that dual-socket 12-core server has 256GB of RAM in it, you'll be paying for 6 Enterprise Plus licenses and not the 2 you were paying under vSphere 4. Also? The prices for Enterprise Plus haven't changed, so you just tripled your licensing costs.
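For the curious, here's the arithmetic behind that tripling. It assumes the 48GB vRAM entitlement per Enterprise Plus license from the original vSphere 5 announcement (a figure that was later revised upward), and that all 256GB is allocated to powered-on VMs.

    import math

    # vSphere 4: one Enterprise Plus license per socket.
    # vSphere 5 (as announced): licenses are still per socket, but each one
    # carries a 48GB vRAM entitlement and you need enough entitlement to
    # cover the vRAM of your powered-on VMs.
    SOCKETS = 2
    RAM_GB = 256
    VRAM_PER_LICENSE_GB = 48

    v4_licenses = SOCKETS
    v5_licenses = max(SOCKETS, math.ceil(RAM_GB / VRAM_PER_LICENSE_GB))

    print(v4_licenses)  # 2
    print(v5_licenses)  # 6 -> triple the licensing cost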

vSphere 4's licensing model encouraged cramming as much RAM into a single server as possible. 12-core CPUs and buckets and buckets of RAM. And this happened, since cheaper is always good, and most VM environments are more RAM constrained than CPU constrained. With pricing per socket and not per core, you could maintain efficient RAM-to-Core ratios with licensing efficiency to boot.

vSphere 5's licensing model encourages servers with far fewer cores and a lot less RAM. Keeping a good RAM-to-Core ratio will involve a lot more physical hosts if you wish to maintain licensing efficiency. And you simply won't be able to reach the heights of efficiency you could with vSphere 4.

This is going to be expensive. We'll see if the industry as a whole moves to something else (I'm sure Citrix is salivating at the thought of upgraders moving to XenServer instead of vSphere), or lumps it and just starts resenting the hell out of VMware the way they already resent (but still use) Oracle.

Is network now faster than disk?

Way back in college, when I was earning my Computer Science degree, the latencies of computer storage were taught like so:

  1. On CPU register
  2. CPU L1/L2 cache (this was before L3 existed)
  3. Main Memory
  4. Disk
  5. Network
This question came up today, so I thought I'd explore it.

The answer is complicated. The advent of Storage Area Networking was made possible because a mass of shared disk is faster, even over a network, than a few local disks. Nearly all of our I/O operations here at WWU are over a fibre-channel fabric, which is disk-over-the-network no matter how you dice it. With iSCSI and FC over Ethernet this domain is getting even busier.

That said, there are some constraints. "Network" in this case is still subject to distance limitations. A storage array 40km from the processing node will still see higher storage latency than the same type of over-the-network I/O 100m away. Our accesses are fast enough these days that the speed-of-light round-trip time for 40km is measurable versus 100m.
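Here's a rough worked example of why the 40km case is measurable, assuming light in fiber travels at roughly 200,000 km/s (about two-thirds of c). This is the propagation delay alone, before any switch or array latency:

    # Round-trip propagation delay, ignoring everything but the glass.
    SPEED_IN_FIBER_KM_PER_S = 200000.0   # ~2/3 the speed of light in vacuum

    def round_trip_us(distance_km):
        return (2 * distance_km / SPEED_IN_FIBER_KM_PER_S) * 1e6  # microseconds

    print(round_trip_us(40))    # ~400 microseconds for the 40km array
    print(round_trip_us(0.1))   # ~1 microsecond for the 100m run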

A very key difference here is that the 'network' component is handled by the operating system and not application code. For SAN an application requests certain portions of a file, the OS translates that into block requests, which are then translated into storage bus requests; the application doesn't know that the request was served over a network.

For application development the above tiers of storage are generally well represented.

  1. Registers: unless the programming is in assembly, most programmers just trust the compiler and OS to handle these right.
  2. L1/2/3 cache: as above, although well-tuned code can maximize the benefit this storage tier can provide.
  3. Main memory: this is directly handled through code. One might argue that at a low level, memory handling constitutes a majority of what code does.
  4. Disk: this is represented by file-access or sometimes file-as-memory API calls, which tend to be discrete calls separate from main-memory access.
  5. Network: this is yet another completely separate call structure, which means using it requires explicit programming.
Storage Area Networking is parked in step 4 up there. Network can include things like making NFS connections and then using file-level calls to access data, or actual Layer 7 stuff like passing SQL over the network.

For massively scaled out applications, the network has even crept into step 3 thanks to things like memcached and single-system-image frameworks.
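To illustrate that "network creeping into step 3" point, here's what a memcached lookup looks like from application code, using the pymemcache client library as one example (the server address is a placeholder). It reads like a slightly slower dictionary; every call is really a network round trip, but nothing in the calling code says so.

    from pymemcache.client.base import Client

    # get/set against memcached looks like in-memory key/value access, but
    # each call goes over the network to the cache server.
    cache = Client(("127.0.0.1", 11211))

    cache.set("user:42:name", b"example-user", expire=300)
    value = cache.get("user:42:name")   # b"example-user", or None on a miss
    print(value)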

Network is now competitive with disk, though so far the best use-cases let the OS handle the network part instead of the application doing it.
