It seems like every time a major cloud provider runs into an outage (like yesterday), the topic of cloud SLAs rises to the surface. But in many cases the benefit of an SLA may be limited to serving as a paper handkerchief to dry the user's tears of frustration while he waits for his provider to come back online.
At the risk of simultaneously offending two groups (people who believe in wedding vows XOR people with trust in SLAs), let's compare the two a bit further. In both cases (you would hope) participants start with the best intentions. And although few will consult the agreed paperwork before making daily operational decisions, you would expect them to follow policies in line with the original intentions (for example refrain from ...fill in the blanks ... and/or from buying old, refurbished hardware). But once we have a breach, hitting the partner over the head with a copy of the agreement only gives some emotional reprieve.
While discussing SLAs with a colleague we – only partly in jest – concluded that most SLAs are not worth the paper they are printed on, and consequently, on-line SLAs – which are generally not printed – are worth even less. But – you may ask – is that not what penalty clauses on SLA breaches are for? Strangely enough, not many people use penalty clauses in their wedding vows. Vows are about trust, and having financial or other compensation clauses seems to contradict this trust. Of course Ronald Reagan popularized the idea of “Trust but Verify”, but that was in the context of the Cold War, and if your partnering situation is comparable to that, no amount of SLAs or vows will help you.
Not to mention that verifying anything in the cloud is still severely hampered by lack of provider transparency. In many cases success is more about monitoring input and behavior than about monitoring output. Just like a good sales manager should not routinely ask his reps “How much have you sold today?”, but rather “How many new prospects did you call, how many demos did you agree and how many proposals did you send?” Likewise, SLAs (and vows) are best monitored by checking policies and behavior upfront. That is also the only way to do some verification of how realistic the given SLAs are. We have all seen the calculation that five components with 95% availability each, all of which must be up, give an overall availability of just over 77% (95%^5, or (1 - 5%)^5). Having a spare for each of these five increases availability to about 98.8% ((1 - (5%)^2)^5), which by the way is still about four and a half days of unplanned, unexpected and undesired downtime per year.
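For readers who want to redo that back-of-the-envelope availability calculation with their own numbers, here is a minimal Python sketch. It assumes that component failures are independent, that a spare takes over instantly, and that a year has 365 days. Real systems will rarely satisfy these assumptions, and the function names are made up for this illustration.

```python
def serial_availability(component_availability: float, components: int) -> float:
    """Availability of a chain in which every component must be up.

    Assumes failures are independent of each other.
    """
    return component_availability ** components


def with_spare(component_availability: float) -> float:
    """Availability of one component backed by one identical spare.

    The pair is down only when both are down at the same time
    (assumes independent failures and instant failover).
    """
    return 1 - (1 - component_availability) ** 2


def downtime_days_per_year(availability: float, days_per_year: float = 365.0) -> float:
    """Expected downtime in days per year implied by an availability figure."""
    return (1 - availability) * days_per_year


# The example from the text: five components at 95% each.
no_spares = serial_availability(0.95, 5)
with_spares = serial_availability(with_spare(0.95), 5)

for label, value in [("no spares", no_spares), ("with spares", with_spares)]:
    print(f"{label}: {value:.1%} available, "
          f"{downtime_days_per_year(value):.1f} days down per year")
```

Changing 0.95 or the number of components shows how quickly a promised availability figure erodes once several dependent parts sit behind it.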
If a provider does not offer the transparency needed to verify the numbers (the input), you may conclude that all you can ask for are penalties. But such penalties are known to invite calculative behavior, with partners evaluating – in a rational but not very partner-like way: “What is more expensive? Paying the penalty or addressing the issue?” Which quickly leads to the aforementioned Cold War feelings.
PS: As I am writing this blog, the Dutch papers are filled with in-depth analysis – no, not of a cloud outage – of the latest soccer celebrity divorce. Somehow this seems to be more newsworthy than the EU cup predicament of the Dutch soccer team, which has managed to miss all its SLAs (to the great frustration of its customers/fans). And just like with today’s cloud outages, the debate about what went wrong (and who is to blame) happens mainly through public interviews, blogs and articles. It seems that when something goes really wrong, transparency quickly becomes the norm again.
Disclaimer: blogs are not research, they are not peer reviewed (except maybe through heated comments below) and yes, I agree some (but not all) SLAs and/or vows may make (some) sense.