Question: When an (IaaS) cloud provider
charges per hour for a machine, what's it an hour of? Do I get
charged when it's doing nothing? If so, why is that fair?
All the IaaS cloud providers I've run into (which isn't all of them
by any stretch) charge by the running hour. If that micro-mini
instance is doing nothing but emailing the contents of a single file
once a day, it'll still get charged for 24 hours of activity if left
on. The same goes for a gargantuGPU instance doing the same work;
it'll just cost more to do nothing.
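To make the running-hour model concrete, here is a minimal sketch. The instance names and hourly rates are made up for illustration and are not any real provider's prices; the point is only that utilization never enters the calculation.

```python
# Hypothetical hourly rates in dollars; NOT any real provider's prices.
HOURLY_RATE = {"micro": 0.02, "gargantu_gpu": 2.10}


def running_hour_charge(instance_type, hours_running, cpu_utilization):
    """Charge for an instance billed by the running hour.

    cpu_utilization is accepted only to show that it is ignored:
    the bill depends on how long the instance was on, nothing else.
    """
    return HOURLY_RATE[instance_type] * hours_running


# Same instance, left on for a day: once nearly idle, once very busy.
idle_day = running_hour_charge("micro", 24, cpu_utilization=0.01)
busy_day = running_hour_charge("micro", 24, cpu_utilization=0.95)

# The bigger instance doing the same nothing just costs more.
idle_gpu_day = running_hour_charge("gargantu_gpu", 24, cpu_utilization=0.01)

print(idle_day == busy_day)      # idle and busy days are billed identically
print(idle_gpu_day > idle_day)   # a larger idle instance still costs more
```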
Why is that fair?
Because of resources.
The host machine running all of these virtual machines has many
resources. CPU, memory, disk, network, the usual suspects. These
resources have to be shared between all the virtual machines. Let's
take a look at each and see how easy that is.
CPU
To share CPU between VMs the host has to be able to share
execution between them. Much like we do... well, practically
everywhere now. We've been doing multiprocess operating systems for
a while now. Sharing CPU cycles is dead easy. If a process needs a
lot it gets what's available. If it needs none, it gets none. A
thousand processes all doing nothing causes... nothing to happen!
It's perhaps the easiest thing to share. But, we'll see.
Memory
We've been sharing RAM between processes, with good isolation
even, for some time now. Even Apple has joined that game to great
effect. Unlike CPU, RAM is occupied by a process the entire time
it's running. That memory may be swapped out by the OS, but it's
still accounted for.
Disk
Disk? Disk is easy. It's just files. Each file gets so much, and
more if needed up until you run out. At which point you run into
problems. Each VM uses disk to store its files, as you'd expect.
Network
To share network, a host machine has to proxy network connections
from a VM. Which... it kinda already does for normal OS processes,
like, say, Apache, or MySQL. If a process doesn't need any network
resources, none gets used. If it needs some, it uses up to what's
available. A thousand processes all doing nothing uses no network
resources. Same for VMs really. It's right up there with CPU for ease
of sharing.
Now ask yourself. Of these four major resources, which of them are
always consumed when a VM (or if you'd rather, a process) is running?
If you said "memory and disk" you've been paying attention.
If you said "all but network, and maybe even that too", you've been
auditing this answer for technical accuracy and probably noticed a
few (gross) simplifications so far. Please bear with me!
Now of the two always-consumed resources, memory and disk, which is
going to be the more constrained one?
If you look at it from the old memory hierarchy chart based on "how
long does the CPU have to wait if it needs to get data from a
specific location", you can begin to see a glimmer of the answer
here. This is usually measured in CPU cycles spent waiting for data.
The lower down the chart you get (faster) the more expensive the
storage. A 2.5GHz CPU will have 2.5 billion cycles in a second.
Remember that number.
A 7.2K RPM hard-drive, the type you can get in 1TB sizes for cheap,
has a retrieval latency of 8.9 milliseconds. Which means that
best-case the 2.5GHz CPU will wait 22,250,000 cycles before it gets
the data it needs. That's... really slow, actually.
The RAM in that 2.5GHz server can be fetched in 10 nanoseconds.
Which means that best-case the 2.5GHz CPU will wait only... 25
cycles. Biiiiiiig differences there! RAM is vastly faster. Which
means it's also vastly more expensive [1]. Which in turn means that
RAM is going to be the more constrained resource.
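If you want to check the wait-time arithmetic above yourself, it's just latency multiplied by clock rate. A quick sketch using only the figures already quoted (2.5GHz, 8.9 milliseconds, 10 nanoseconds); the function and constant names are mine:

```python
CPU_HZ = 2.5e9            # 2.5GHz CPU: 2.5 billion cycles per second
DISK_LATENCY_S = 8.9e-3   # 7.2K RPM disk: 8.9 milliseconds
RAM_LATENCY_S = 10e-9     # RAM: 10 nanoseconds


def cycles_waiting(latency_s, clock_hz=CPU_HZ):
    """Best-case CPU cycles spent waiting for one fetch."""
    # round() absorbs floating-point noise in the product.
    return round(latency_s * clock_hz)


disk_cycles = cycles_waiting(DISK_LATENCY_S)
ram_cycles = cycles_waiting(RAM_LATENCY_S)

print(disk_cycles)                # 22250000
print(ram_cycles)                 # 25
print(disk_cycles // ram_cycles)  # 890000
```

So by these numbers a disk fetch costs the CPU roughly 890,000 times as many cycles as a RAM fetch.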
So we have determined that of the four resource types RAM is the
most expensive, always-on resource. Because of that, RAM amount is
the biggest driver of cost for cloud-computing providers. It's not
CPU. This is why that 64MB RAM VM is so much cheaper per-hour than
something with 1.6GB in it, even if they get the same CPU resources.
Because RAM amount used is the cost-center, and a 1.6GB VM is using
that 1.6GB of RAM all the time, the cloud providers charge by hour
of run-time. And this is fair. Now you know.
[1]: How much more expensive? A 1TB disk can be had for $90. 1TB of
RAM requires a special machine (higher end servers), and will run
you a bit under $12,000 at today's prices.
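Put as a ratio, using the two prices in this footnote (and treating $12,000 as an upper bound, since the RAM comes in "a bit under" that):

```python
DISK_COST_PER_TB = 90       # dollars, from the footnote
RAM_COST_PER_TB = 12_000    # dollars, upper bound ("a bit under $12,000")

ratio = RAM_COST_PER_TB / DISK_COST_PER_TB
print(f"RAM costs up to about {ratio:.0f}x as much per TB as disk")
```

That works out to roughly 133 times the price per terabyte.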