A Configuration Change Takes Down Facebook for Almost 3 Hours
Facebook is calling it the site's worst outage in four years. The 500-million-strong social networking platform went down for nearly 3 hours last week. "Facebook can't afford to have any downtime," said Debra Aho Williamson, an analyst with EMarketer Inc. in Seattle, in the SF Gate report. "Downtime means time they don't get advertisers in front of people."
Explaining the source of the problem, the director of engineering said:
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
Just a single change impacted the availability of the entire Facebook platform.
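To make the feedback loop in that explanation concrete, here is a minimal Python sketch. It is not Facebook's actual system: the `Database`, `Cache`, `is_valid`, and `simulate` names, the "value must be positive" rule, and the request counts are all assumptions made up for illustration. It only shows why a bad value in the persistent copy turns a one-time repair into a database query on every request.

```python
from dataclasses import dataclass


def is_valid(value: int) -> bool:
    # Hypothetical validity rule: the setting must be a positive number.
    return value > 0


@dataclass
class Database:
    """Stands in for the database cluster holding the persistent copy."""
    persistent_value: int
    queries: int = 0

    def read(self) -> int:
        self.queries += 1  # count the load placed on the database
        return self.persistent_value


@dataclass
class Cache:
    """Stands in for the copy of the value that clients normally see."""
    value: int


def handle_request(cache: Cache, db: Database) -> None:
    # A client that sees an invalid value tries to fix it,
    # and the fix involves a query to the database.
    if not is_valid(cache.value):
        cache.value = db.read()


def simulate(persistent_value: int, cached_value: int, requests: int) -> int:
    """Return how many database queries the given number of requests causes."""
    db = Database(persistent_value)
    cache = Cache(cached_value)
    for _ in range(requests):
        handle_request(cache, db)
    return db.queries


if __name__ == "__main__":
    # Persistent copy is fine: the first request repairs the cache once.
    print(simulate(persistent_value=30, cached_value=-1, requests=100_000))
    # Persistent copy is itself invalid: the repair never sticks,
    # so every single request falls through to the database.
    print(simulate(persistent_value=-1, cached_value=-1, requests=100_000))
```

In this toy model a valid persistent copy costs one query in total, while an invalid one costs one query per request, which is the pattern that overwhelms a database cluster at scale.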
Beyond the technical reasons for the outage, what is more striking is how much of an impact 2.5 hours of downtime made around the world. The issue quickly ranked high in Twitter's trending topics (outranking talk of Facebook's founder Mark Zuckerberg and the new movie about his meteoric rise).
The outage had a huge impact in the Googlesphere as well.
15 of the top 16 Google search trends in the US included Facebook not being accessible to users.
It is understandable when one or two terms end up as "Hot searches"; however, having almost 16 out of 20 shows how much Facebook's downtime affected people's lives. The same trends were reported in several other countries where Facebook went down during 'off' hours.
Forbes. The Wall Street Journal. TechCrunch. The Los Angeles Times.
Any minute misconfiguration or omission of a single configuration parameter can quickly lead to a high-impact incident: reputation damage, dissatisfied customers, and financial losses. When your service experiences downtime, your customers and the world will know about it, faster and on a wider scale than ever before.
A Google News search returned over 17,000 stories and related items. According to data pulled from Google Trends, the two-word phrase "Facebook down" was searched for approximately 7 times more often than it is on average.
A few more headlines that we collected:
- Facebook Downtime Means Real-Life Repercussions for Blogosphere (Forbes)
- Facebook Gives A Post-Mortem On Worst Downtime In Four Years (TechCrunch)
- Facebook experiencing outage: Company says site is 'slow or unavailable' (LA Times)
- Facebook Downtime: IT Outage Or Under Attack? (Forbes)
- What Caused Facebook's Worst Outage in Four Years (Wall Street Journal)
NEXT STEPS
Find out how to keep your IT environment running smoothly, and keep downtime to an absolute minimum.
- Learn how Evolven can identify the small configuration changes that can have a major impact on IT environment stability and ultimately the business.