Charging by the hour, a story of clouds

Question: When a (IaaS) cloud provider charges per hour for a machine, what's it an hour of? Do I get charged when it's doing nothing? If so, why is that fair?

All the IaaS cloud providers I've run into (which isn't all of them by any stretch) charge by the running hour. If that micro-mini instance is doing nothing but emailing the contents of a single file once a day, it'll still get charged for 24 hours of activity if left on. The same goes for a gargantuGPU instance doing the same work, it'll just cost more to do nothing.

Why is that fair?

Because of resources.

The host machine running all of these virtual machines has many resources. CPU, memory, disk, network, the usual suspects. These resources have to be shared between all the virtual machines. Lets take a look at each and see how easy that is.

CPU

To share CPU between VMs the host has to be able to share execution between them. Much like we do... well, practically everywhere now. We've been doing multiprocess operating systems for a while now. Sharing CPU cycles is dead easy. If a process needs a lot it gets what's available. If it needs none, it gets none. A thousand processes all doing nothing causes... nothing to happen! It's perhaps the easiest thing to share. But, we'll see.

Memory

We've been sharing RAM between processes, with good isolation even, for some time now. Even Apple has joined that game to great effect. Unlike CPU, processes sit on RAM the entire time they're running. It may be swapped out by the OS, but it's still accounted for.

Disk

Disk? Disk is easy. It's just files. Each file gets so much, and more if needed up until you run out. At which point you run into problems. Each VM uses disk to store its files, as you'd expect.

Network

To share network, a host machine has to proxy network connections from a VM. Which... it kinda already does for normal OS processes, like, say, Apache, or MySQL. If a process doesn't need any network resources, none gets used. If it needs some, it uses up to what's available. A thousand processes all doing nothing uses no network resources. Same for VMs really. Its right up there with CPU for ease of sharing.

Now ask yourself. Of these four major resources, which of them are always consumed when a VM (or if you rather, a process) is running?

If you said "memory and disk" you've been paying attention.

If you said "all but network, and maybe even that too", you've been auditing this answer for technical accuracy and probably noticed a few (gross) simplifications so far. Please bear with me!

Now of the two always-consumed resources, memory and disk, which is going to be the more constrained one?

If you look at it from the old memory hierarchy chart based on "how long does the CPU have to wait if it needs to get data from a specific location", you can begin to see a glimmer of the answer here. This is usually measured in CPU cycles spent waiting for data. The lower down the chart you get (faster) the more expensive the storage. A 2.5GHz CPU will have 2.5 billion cycles in a second. Remember that number.

A 7.2K RPM hard-drive, the type you can get in 1TB sizes for cheap, has a retrieval latency of 8.9 miliseconds. Which means that best-case the 2.5GHz CPU will wait  22,250,000 cycles before it gets the data it needs. Thats... really slow, actually.

The RAM in that 2.5GHz server can be fetched in 10 nanoseconds. Which means that best-case the 2.5GHz CPU will wait only... 25 cycles.

Biiiiiiig differences there! RAM is vastly faster. Which means its also vastly more expensive[1]. Which in turn means that RAM is going to be the more constrained resource.



So we have determined that of the four resource types RAM is the most expensive, always-on resource. Because of that, RAM amount is the biggest driver of cost for cloud-computing providers. It's not CPU. This is why that 64MB RAM VM is so much cheaper per-hour than something with 1.6GB in it, even if they get the same CPU resources.

Because RAM amount used is the cost-center, and a 1.6GB VM is using that 1.6GB of RAM all the time, the cloud providers charge by hour of run-time. And this is fair. Now you know.



[1]: How much more expensive? A 1TB disk can be had for $90. 1 TB of RAM requires a special machine (higher end servers), and will run you a bit under $12,000 at today's prices.