It’s both strange and humbling to suddenly understand just how completely we’ve come to rely on cloud computing to run many of the routine aspects of our lives. Yesterday, Amazon Web Services, which is the major cloud host for web-based businesses large and small, experienced an outage and brought down a sizable chunk of the Web as a result.
From Airbnb to Foursquare to Reddit to Coursera to Flipboard, big chunks of the Web were down for much of the day while Amazon scrambled to get its servers back online. Think about that: everything from room rentals to news sites to education providers was out of commission for most of the day. No transactions. No eyeballs. No revenue. Nothing.
Last spring, another Amazon outage shut down Pinterest, Instagram and Netflix, among many others, and Amazon and its hosted customers all declared important lessons had been learned. Indeed, steps would be taken, we were assured, to avoid another outage of that scale.
And yet here we are again. Many, many companies were affected and so were millions of their customers. Heck, a chain of sandwich shops, one of which is down the street from my office in San Francisco, relies on flat-screen monitors to display the menus and iPads to enable quick orders and payment. It was all down because the company runs both its online ecommerce and the iPad-based point-of-sale transactions in its physical locations from the Amazon cloud. (Luckily, its traditional cash registers still worked.)
While the services that were out don’t represent particularly vital businesses – no lives were lost because they were down (so far!) – the outage does illustrate the degree to which our everyday lives are increasingly tethered to cloud-based services. Cloud computing has revolutionized business creation and services delivery, lowering costs and increasing efficiency while making services widely available. But vulnerabilities abound.
One way to reduce the chances of a company being brought to its knees by a single outage at a place like Amazon is to spread the risk. For instance, IT folks or sysadmins should spread risk within Amazon itself. That means running servers behind what Amazon calls Elastic Load Balancers in several availability zones (AZs) within one region, and in more than one region across geographies. It’s also clear that to truly mitigate risk, companies should consider multiple cloud hosts to provide the sort of redundancy that would inoculate against any one host imploding. For instance, if you primarily host on Amazon, consider also hosting on Rackspace.
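To make the multi-host idea concrete, here is a minimal Python sketch of the client side of such a setup: try the primary host first and fall back to a second provider if its health check fails. Everything specific in it is an assumption for illustration. The `example.com` and `example.net` URLs, the `/health` path and the function names are hypothetical, and a real deployment would more likely handle failover with DNS or a load balancer than in application code.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical health-check URLs: one on the primary cloud host,
# one on a second, independent provider.
ENDPOINTS = [
    "https://primary.example.com/health",
    "https://secondary.example.net/health",
]


def http_health_check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error or bad URL:
        # treat all of them as "this host is down".
        return False


def pick_endpoint(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first endpoint that passes its health check, else None."""
    for url in endpoints:
        if is_healthy(url):
            return url
    return None
```

Calling `pick_endpoint(ENDPOINTS)` returns the primary URL while it is healthy, the secondary URL when only the primary is down, and `None` when both hosts are unreachable. The order of the list is the order of preference.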
Look, bad things happen. Occasionally, the electrical grid for an entire region goes down and we experience massive, days-long blackouts. Bridges collapse. And cloud hosts go down. But where cloud infrastructure is concerned, there are steps that can be taken to reduce vulnerabilities and ensure that customers can continue to access essential cloud-based services, even when someone like Amazon is having a bad day.