
10 Insights into Recent Outages from the Experts and How to be Prepared




In the cloud, is relying on a single provider for your entire infrastructure inviting problems?

The recent high-profile outages at AWS, Twitter, and Azure have brought renewed attention to the preventive measures IT folks need to take to handle cloud service disruptions of any scale. What do these outages reveal about the need to abstract the management process from the infrastructure, and to see the interdependencies and failures that plague complex IT systems?

Here are 10 perspectives from experts at ComputerWeekly, Network World, ZDNet, InformationWeek, and more, sharing their insights into the impact of recent outages and how to be better prepared.

1. There can often be a disconnect between decision-makers and people with actual, implementable knowledge of the realities of IT.

If this latest crisis has proven anything, it's that IT is not something that should ever be compromised. For NatWest, and other companies in similar circumstances, to prevent another instance of this altogether avoidable breakdown, they must pledge to re-examine and thoroughly address the weaknesses of their IT solutions.

By Rawiya Kameir, ITProPortal 
What have we learned from NatWest's banking blunder?

2. Architect for failover.

If you need to be available all the time, you need to architect for failover, not just for scaling. As Kavis said: "What we need to understand is that many companies in the Virginia area who built their own datacenters were down too and some still are. Power outages happen. Data centers fail both in the cloud and on-premise. Everything fails eventually. The secret to uptime is how you design for these failures."
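To make "architect for failover" a little more concrete, here is a minimal Python sketch of client-side failover: the caller tries a priority-ordered list of endpoints (say, a primary deployment and a standby in a different data center) and moves on when one fails. The endpoint names and the `call_with_failover` and `fake_call` helpers are hypothetical; a real design would also need health checks, timeouts, and replicated data behind each endpoint.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every endpoint in the failover list has failed."""


def call_with_failover(endpoints: Sequence[str], call: Callable[[str], T]) -> T:
    """Try each endpoint in priority order and return the first successful result.

    The endpoints are purely illustrative, e.g. a primary deployment and a
    standby deployment in another data center.
    """
    errors: list[tuple[str, Exception]] = []
    for endpoint in endpoints:
        try:
            return call(endpoint)
        except Exception as exc:  # in practice, catch only transport/timeout errors
            errors.append((endpoint, exc))
    # Every endpoint failed: surface all the collected errors together.
    raise AllEndpointsFailed(errors)


def fake_call(endpoint: str) -> str:
    """Hypothetical stand-in for a real network call; the primary is 'down'."""
    if endpoint == "primary.example.internal":
        raise ConnectionError("primary data center is down")
    return f"served by {endpoint}"


print(call_with_failover(["primary.example.internal", "standby.example.internal"], fake_call))
# served by standby.example.internal
```

The ordering of the list is the design decision here: it encodes which site is preferred, while the loop guarantees that losing one site degrades service instead of stopping it.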

By Abel Avram, InfoQ
Avoiding Downtime When Cloud Services Fail

4. Changing hosts is not enough; you need a redundant cloud.

Changing hosts because the host has gone down doesn't actually resolve the problem. Instead, companies seeking to leverage the cloud should make use of its capability to create geographically redundant links.

By Thor Olavsrud, Senior Writer at CIO 
Do Customers Share Blame in Amazon Outages?

5. Organizations really need to utilize more than one provider to avoid a single point of failure.

The problem is that business processes, applications and computing infrastructure are too intertwined and dependent on each other. If the infrastructure isn't configured just right or is unavailable, the business process stops. The industry has made great strides in abstracting the physical computing infrastructure from the applications it supports. Amazon and VMware have created tremendous value and built businesses by abstracting (or insulating) applications and users from hardware diversity and failures. 

By Randy Clark, CMO of UC4 Software 
Dealing with outages -- are we ready?

6. Many businesses still do not have measures to insulate themselves from big service provider outages.

There will always be risks and outages, and in a way, Reeves says that's a good thing: it keeps companies and end users on their toes. "I really believe that outages can propel the cloud further rather than hinder it," he says. "If we learn from these mistakes, both customers and providers, and make our systems safer and more secure, that can be a good thing for the industry as a whole."

By Brandon Butler, Staff Writer at Network World 
Amazon outage one year later: Are we safer?

7. A slew of previously unseen bugs extended the downtime.

It was a variety of unforeseen bugs appearing in Amazon's software that caused the outage to last so long: for example, one datacentre failed to switch over to its backup generators, and eventually the energy stored in its uninterruptible power supply (UPS) was depleted, shutting down hardware in the region.

By Jack Clark, Enterprise Infrastructure Reporter, ZDNet UK at CBS Interactive
Amazon Web Services: The hidden bugs that made AWS' outage worse

8. Ensure correct ELB configurations.

One of the advantages of using Elastic Load Balancers (ELBs) is that they can automatically reroute traffic based on availability and need. But Newvem found that up to 20% of heavy users aren't properly configuring their ELBs. One of the most common misconfigurations is routing ELB traffic only within a single availability zone (AZ). AWS has multiple availability zones within its regions, which are meant to be isolated from one another. By not configuring the ELB to route traffic to a separate AZ, users aren't protected if their AZ is impacted.
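The misconfiguration Newvem describes can be caught with a simple audit: collect the availability zones that each load balancer's registered instances actually live in, and flag any balancer that covers fewer than two. The sketch below assumes a hypothetical inventory format (load balancer name mapped to instance ID and AZ pairs) that you would assemble from your own tooling; it is not the shape of an AWS API response, and all names and IDs are made up.

```python
def single_az_load_balancers(registrations: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return the names of load balancers whose instances span fewer than two AZs.

    `registrations` maps a load balancer name to (instance_id, availability_zone)
    pairs. This inventory format is a hypothetical example, not an AWS API shape.
    A balancer with no instances at all is flagged as well.
    """
    flagged = []
    for lb_name, instances in registrations.items():
        zones = {az for _, az in instances}  # distinct AZs behind this balancer
        if len(zones) < 2:
            flagged.append(lb_name)
    return sorted(flagged)


# Hypothetical inventory: "api-lb" has both instances in the same AZ.
inventory = {
    "web-lb": [("i-01", "us-east-1a"), ("i-02", "us-east-1b")],
    "api-lb": [("i-03", "us-east-1a"), ("i-04", "us-east-1a")],
}
print(single_az_load_balancers(inventory))  # ['api-lb']
```

Running a check like this on a schedule turns a one-time configuration review into an ongoing guard, since instances get replaced and rebalanced long after the load balancer was first set up.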

By Brandon Butler, Staff Writer at Network World
Four tips to prepare for the next Amazon outage

9. Don't take chances with data availability.

Every time there is an outage at one of the major cloud providers, it raises new concerns about the cloud and cloud storage. If you're planning to move some or all of your data to the cloud, how can you avoid losing access to that data when an outage occurs? 

By George Crump, Chief Steward, Storage Switzerland 
One Way To Avoid Cloud Outages

10. Beware the "fog of virtualization."

Beware the "fog of virtualization." The phrase comes from Craig Labovitz of DeepField Networks. What it means is that as we move more stuff into the cloud, it's going to become harder and harder to understand where supply chain dependencies and potential weak points lie. I'm betting that some of the people affected by last week's event didn't even realize they were vulnerable to an Amazon outage. That dependency can be disguised by layers of application and/or infrastructure abstraction. 
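One way to lift the "fog of virtualization" is to keep an explicit map of what each of your components depends on, third-party services included, and search it for indirect exposure. The following sketch runs a breadth-first search over a hypothetical dependency map to find the shortest chain linking an application to a given provider or region; the `exposure_path` helper and all component names are illustrative only.

```python
from collections import deque
from typing import Optional


def exposure_path(graph: dict[str, list[str]], root: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest dependency chain from root to target.

    Returns the chain as a list of component names, or None if root does not
    depend on target, directly or indirectly.
    """
    parent: dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to root to rebuild the chain.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for dep in graph.get(node, []):
            if dep not in parent:  # first visit records the shortest route
                parent[dep] = node
                queue.append(dep)
    return None


# Hypothetical dependency map: the app never calls AWS directly.
graph = {
    "checkout-app": ["payments-saas", "own-datacenter-db"],
    "payments-saas": ["fraud-api"],
    "fraud-api": ["aws-us-east-1"],
}
print(exposure_path(graph, "checkout-app", "aws-us-east-1"))
# ['checkout-app', 'payments-saas', 'fraud-api', 'aws-us-east-1']
```

Here the checkout application has no direct link to AWS, yet the search surfaces its exposure through a SaaS vendor's fraud API: exactly the kind of dependency that layers of abstraction can disguise. The map is only as good as what you put into it, so the hard part in practice is asking vendors where their own services run.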

By Mari Silbey, Technology Writer and Consultant - Broadband, Wireless, Digital Media
3 Takeaways from Amazon's latest cloud outage

Your Turn
Are you prepared for the next outage?

About the Authors
Syed Raza and Martin Perlin