I believe most SLAs (Service Level Agreements) are meaningless.
In the world of Software as a Service and cloud computing, SLAs have become a very popular topic, but the reality is very different from the theory.
In theory, every service provider promises 99.999% availability, which means less than 6 minutes of downtime per year.
In reality, even the best services (Amazon, Google, Rackspace) have had outages lasting 8 hours, which puts them at 99.9% availability for that year, at best.
(Table: high availability downtime at 99.999% and other levels, from Wikipedia)
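For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic behind such a downtime table. It assumes a 365-day year, and the function names are mine, not from any library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_pct(downtime_hours: float) -> float:
    """Yearly availability after the given number of downtime hours."""
    return 100 * (1 - downtime_hours * 60 / MINUTES_PER_YEAR)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")

print(f"8 hours down -> {availability_pct(8):.3f}%")
```

Five nines leaves a little over 5 minutes per year, while a single 8-hour outage already drags the whole year down to roughly 99.909%.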
Moreover, the economics just don’t make any sense. SLAs cannot replace insurance.
Imagine the following scenario.
E-commerce site “MyCatsAndSnakes.Com” hosts its consumer site at “BestAvailabilityHosting”, which uses networking equipment from “VeryExpensiveMonopoly, INC.”
If MyCatsAndSnakes is unavailable, the site owner “Rich Bastardy” loses $100,000 per hour of downtime.
Rich pays BAHosting $20,000 per month and they promise him 99.999% availability.
BAHosting bought two core routers in high availability mode, connected to three different ISPs. Each router costs $50,000 and Platinum support is another 30% per year. So the total cost is $130,000 for the first year.
One horrible day, the core routers hit a software bug and traffic to MyCatsAndSnakes is dead.
Since both routers run the same software, the high availability setup does not help to resolve the issue, and VeryExpensiveMonopoly’s top developers have to debug the problem on site. After 8 hours of brave efforts, cats and snakes are being sold online again.
Try to guess the answers to the following questions:
- How much money did Rich lose? (Hint: $100,000*8)
- How much money would Rich get from BestAvailabilityHosting? (Hint: (8/(24*30))*$20,000 ≈ $222)
- How much money would BAHosting get back from VeryExpensiveMonopoly? (Hint: $0)
The networking vendor, VeryExpensiveMonopoly, does not give any compensation for equipment failure. This is true for all hardware and software vendors.
They don’t even have an SLA for resolution time. The best you can get with platinum support is a guaranteed “response time”, which is not a great help.
As a result, the hosting provider cannot get a back-to-back guarantee or insurance for networking failures.
The hosting provider limits its liability to the amount of money it receives from Rich ($20,000 per month), which makes sense.
Moreover, the service provider would only compensate pro rata, so the sum becomes even more negligible.
But that does not help Rich at all, as his losses are far bigger. He lost $800,000 of cats and snakes deliveries to young teenagers across Ohio.
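Here is a quick sketch of that arithmetic, using only the figures from the scenario. The 30-day month comes from the hint above; the function and variable names are just mine.

```python
HOURS_PER_MONTH = 24 * 30   # 30-day month, as in the hint above

loss_per_hour = 100_000     # Rich's loss per hour of downtime, in dollars
monthly_fee = 20_000        # what Rich pays BAHosting per month
outage_hours = 8


def pro_rata_credit(fee, hours_down, hours_in_period=HOURS_PER_MONTH):
    """Refund only the share of the fee that covers the downtime."""
    return fee * hours_down / hours_in_period


loss = loss_per_hour * outage_hours
credit = pro_rata_credit(monthly_fee, outage_hours)

print(f"Loss:            ${loss:,.0f}")
print(f"Pro rata credit: ${credit:,.2f} ({credit / loss:.3%} of the loss)")
print(f"Liability cap:   ${monthly_fee:,.0f} ({monthly_fee / loss:.1%} of the loss)")
```

The pro rata credit comes to about $222. Even if BAHosting refunded the entire month, the $20,000 cap would cover only 2.5% of what Rich lost.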
The real answer, IMO, is “Insurance”. If Rich really wants to mitigate his risk, he can buy insurance for such cases.
The insurance company should be able to assess the risk and apply the right statistical cost model. Asking a service provider to do it is useless.
SLAs might be a good way to set mutual expectations, but they are certainly not a replacement for a good insurance policy or a DRP (disaster recovery plan).
Here is an interesting review of CRM and Salesforce.com’s (lack of?) SLA. And here are the SLAs for Amazon EC2 and Rackspace.
Amazon: “If the Annual Uptime Percentage for a customer drops below 99.95% for the Service Year, that customer is eligible to receive a Service Credit equal to 10% of their bill”
GoGrid promises a 10,000% credit, but “No credit will exceed one hundred percent (100%) of Customer’s fees for the Service feature in question in the Customer’s then-current billing month”
Rackspace promises 100% availability, but “Rackspace Guaranty: We will credit your account 5% of the monthly fee for each 30 minutes of network downtime, up to 100% of your monthly fee for the affected server.”
Again, I don’t think one can blame these service providers, but the gap between the promise and the perception seems major.
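To put rough numbers on that gap, here is a sketch that applies the quoted credit terms to Rich’s outage. This is my own simplification, not the providers’ actual calculation: I assume the whole $20,000 monthly fee counts as the relevant bill, that only full 30-minute blocks earn credit, and that no other conditions or exclusions apply.

```python
def rackspace_style_credit(monthly_fee, downtime_minutes):
    """5% of the monthly fee per full 30 minutes down, capped at 100%.

    Counting only full 30-minute blocks is an assumption of this sketch.
    """
    blocks = downtime_minutes // 30
    credit_pct = min(100, 5 * blocks)
    return monthly_fee * credit_pct / 100


def amazon_style_credit(bill, annual_uptime_pct):
    """10% of the bill if annual uptime falls below 99.95%, else nothing."""
    return bill * 10 / 100 if annual_uptime_pct < 99.95 else 0.0


monthly_fee = 20_000        # Rich's monthly fee from the scenario
loss = 100_000 * 8          # Rich's loss for the 8-hour outage

r_credit = rackspace_style_credit(monthly_fee, 8 * 60)
a_credit = amazon_style_credit(monthly_fee, 99.909)  # ~8 hours down in a year

print(f"Rackspace-style credit: ${r_credit:,.0f} ({r_credit / loss:.2%} of loss)")
print(f"Amazon-style credit:    ${a_credit:,.0f} ({a_credit / loss:.2%} of loss)")
```

Under these assumptions the most generous formula returns $16,000 against an $800,000 loss, and no formula can ever return more than the fee itself.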
There are three real answers for customers who want an SLA from a service provider:
1) It would be better than on-premises
2) How much are you willing to pay for extra availability?
3) We have a great insurance agent