
Configuration Change Knocks out Gmail



 

Google's web-based email service, Gmail, crashed on Monday afternoon with widespread outage reports coming from around the world.

DataCenter Knowledge reported, "On at least three occasions, Gmail downtime has been traced back to software updates in which bugs triggered unexpected consequences."

HuffPost Tech posted "PANIC: Gmail is down, according to perplexed Twitterers and HuffPostTech's own experience trying to access unresponsive accounts."

GigaOM emphasized that "people on Twitter are reporting Chrome crashes left and right, and some of those with more experience on the systems administration and development side are blaming Chrome's crashes on whatever is plaguing Gmail."

Twitter users were quick to comment on the situation, some of them with humor-filled updates.

What Caused Such a Widespread Outage?

As data center infrastructure becomes more complex and the pace of change increases, keeping track of every change becomes harder.

According to Google, the outage traced back to its Sync Server, which relies on a separate component to enforce quotas on per-datatype sync traffic. That quota service "experienced traffic problems today due to a faulty load balancing configuration change."

Troubles like this clearly highlight the pressure that configuration management is under. Minor changes slip into IT systems all the time, both authorized and unauthorized. As this incident shows, even an infrastructure the size of Google's can be undermined by a seemingly minor change. Any small misconfiguration, or the omission of a single configuration parameter, can spiral into an incident that causes an outage, with a damaged reputation, angry customers, legal liabilities, and financial losses as the result.
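To make the point about a single changed or omitted parameter concrete, here is a minimal sketch of comparing two configuration snapshots before a change is rolled out. It is only an illustration: the `diff_config` function and the parameter names (`lb.algorithm`, `lb.max_connections`, `quota.enabled`) are hypothetical and do not describe Google's systems or any particular product.

```python
from typing import Any, Dict


def diff_config(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two flat configuration snapshots and report what differs."""
    # Parameters present before the change but missing afterwards (omissions).
    removed = {k: before[k] for k in before.keys() - after.keys()}
    # Parameters introduced by the change.
    added = {k: after[k] for k in after.keys() - before.keys()}
    # Parameters present in both snapshots whose values differ.
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of a load balancer configuration.
before = {"lb.algorithm": "round_robin", "lb.max_connections": 1000, "quota.enabled": True}
after = {"lb.algorithm": "least_conn", "lb.max_connections": 1000}

report = diff_config(before, after)
for kind, entries in report.items():
    for name, value in sorted(entries.items()):
        print(f"{kind}: {name} -> {value}")
```

In this made-up example the report flags one changed parameter and one omitted parameter, which is the kind of small difference that is easy to miss in a manual review.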

Google's Detailed Explanation

Google has not yet officially explained the cause of the outage that affected Gmail and the Chrome browser. However, ZDNet summed up the situation by saying "Google changed something, it didn't work, and it caused the crashes. No hackers were involved, and the outage and crashes certainly were not a result of a denial-of-service attack."

Today's IT Operations

For IT Operations, change is a fact of life. Change takes place at every level of the application and infrastructure stack and impacts nearly every part of the business. When virtual environments coexist with physical and cloud entities, performance issues are much harder to troubleshoot, and IT admins may spend hours trying to isolate a problem.

IT Operations Analytics

Today, IT operations teams face challenges that existing approaches can no longer handle. Dealing with the complexity and dynamics of today's IT environments calls for more analytical power. Learn how Evolven's new IT Operations Analytics approach delivers the intelligence that IT operations organizations are craving, allowing them to turn piles of configuration data into actionable information.

Your Turn
Are YOU ready to validate configuration changes?

About the Author
Martin Perlin