It’s Complicated
By now, the entire world has heard about Amazon’s much-publicized outage last Friday. An outage this severe is surprising given their track record. What really shook me was how the situation was explained (or *not* explained).
The short answer, as revealed to ZDNet: “These are complicated systems.”
Perhaps they’re taking their cues on explaining outages from the way Facebook lets its users explain odd relationships: “It’s Complicated” (though I think Facebook took out that feature a while back).
It doesn’t take a firsthand tour of Amazon’s data centers to gauge how truly expansive their infrastructure is. Global facilities, hundreds of thousands of servers, and tons of complex apps, all serving the largest e-commerce platform in the world. It’s precisely this expertise, scale, and complexity that is often cited as the reason they are leading the charge around cloud computing.
The various articles describing the outage all wondered whether any of the Amazon Web Services were affected by the outage. I say “wondered” mostly because there was really no way of telling! The AWS site itself was also unavailable!
It appeared that Amazon Web Services such as the S3 storage and EC2 computing services continued to function at least for some customers, though the Amazon Web Services page at Amazon.com wasn’t working.
This left all of the users and businesses (including some Hyperic customers) who rely on Amazon’s expertise and track record to essentially hope that the services they’re consuming are healthy and available. The press covering the outage had to resort to calling actual end users and asking them (awkwardly, I’m sure), “Hey, is your app working right?” Luckily, based on the accounts from SmugMug and Mashery, it appears that AWS was not affected by the outage.
There’s no denying that Amazon is great at operations and that they truly do have incredible infrastructure. That said, it’s yet another wake-up call to those who think that simply offloading their computing to the cloud (whether Amazon’s or anyone else’s) implicitly insulates them from outages. Consumers of the cloud need to be smarter about this, and the providers of the cloud need to be much more transparent about their service levels.
I imagine an AWS customer calling the support techs at Amazon would likely get a response with the same level of detail: “It’s complicated.”
