Facing unreasonable requests

Over on ServerFault, a question became rather hot lately. The key part:

We received an interesting "requirement" from a client today.

They want 100% uptime with off site fail over on a web application. From our web apps viewpoint, this isn't an issue. It was designed to be able to scale out across multiple database servers etc.

However, from a networking issue I just can't seem to figure out how to make it work.

100% uptime? Anyone who has been in this business for a while knows that 100% either doesn't exist, or exists only over small timescales. Our eDiscovery hosting platform had 100% uptime... for September. For the quarter? No, we had a well-announced major outage in August while we relocated some servers to a new rack and performed network-backbone upgrades.
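
To put the "small timescales" point in numbers, here is a back-of-the-envelope sketch of the downtime budget each uptime level allows in a 30-day month (the percentages are illustrative, not from any particular contract):

```python
# Downtime allowed per 30-day month at common SLA levels.
# The percentages below are illustrative, not from any actual contract.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

def downtime_budget(sla_percent: float) -> float:
    """Minutes of allowed downtime per 30-day month at a given uptime SLA."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)

for sla in (99.0, 99.9, 99.99, 100.0):
    print(f"{sla:6.2f}% uptime -> {downtime_budget(sla):7.2f} minutes/month")
```

Even "four nines" buys you about four minutes of outage a month; 100% buys you exactly none.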

As it happens, we faced a similar requirement from a potential client a while back. They demanded a full refund of that month's fees if the service was unavailable to their people at any time someone tried to use it. Though I wasn't directly involved in these negotiations, I saw that as a nice opening offer rather than an ultimatum. Unless you've got boilerplate service-contract language they're trying to amend, I believe 'initial position' is the best way to frame these sorts of "requirements".

This particular client had scoped their 100% uptime requirement (a single month), had provided a service metric (if they ever notice it is down), as well as a penalty (we don't get paid). Clearly they had thought this out. I wanted to know what they thought of planned, announced service outages, and eventually the response came back: "same as unplanned". OK, that's an initial position.

At that point we could:

  • Dicker about the scale of the penalty (pro-rate for any hour/day/week an outage is noticed?)
  • Dicker about planned outages that they can clearly work around.
  • Dicker about using a third party to assess downtime, rather than being purely defined by the client.
  • Dicker about downtime attributable to forces beyond our control (such as something screwy happening route-wise in the Internet core, as has happened a few times, or their own firewall going on the fritz).
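
To show why the first bullet is worth the dickering, here is a hypothetical sketch of a pro-rated penalty; the function name and the hourly granularity are my assumptions for illustration, not terms from any contract:

```python
import math

# Pro-rate the penalty by outage duration instead of refunding the whole
# month. The hourly granularity here is an assumption for illustration,
# not a real contract term.

def prorated_refund(monthly_fee: float, outage_minutes: float,
                    granularity_minutes: int = 60) -> float:
    """Refund one granularity-sized slice of the monthly fee per
    (partial) outage window, capped at the full fee."""
    minutes_per_month = 30 * 24 * 60
    windows = math.ceil(outage_minutes / granularity_minutes)
    per_window = monthly_fee * granularity_minutes / minutes_per_month
    return min(windows * per_window, monthly_fee)

# A 90-minute outage, pro-rated hourly, costs two hours' worth of fees
# rather than the entire month's:
print(round(prorated_refund(10_000.00, 90), 2))
```

Against the client's opening position, where any noticed outage forfeits the whole month, even coarse hourly pro-rating changes the exposure dramatically.
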

Uptime requirements from paying customers are nothing new, but this kind of thing can also show up in internal SLA negotiations at large organizations. As State budgets continue to shrink, IT charge-back schemes are becoming more and more common in areas that previously didn't have them, so these demands can arise from internal customers just as often.

The best defense is to have downtime concerns addressed in your boilerplate service contract, much like Amazon does. If an entity wants special treatment, they can work to have a special contract written up, but that's a lot of pushing boulders uphill, so only the specialest of snowflakes goes to the trouble. But if you don't have boilerplate, be prepared to dicker over their initial position.