10 Devastating Outages and Failures of Major Brands in 2011
Think you're alone? As this list shows, even the biggest names in the business suffer downtime and outages, and ongoing failures and performance problems cost companies both lost revenue and damaged reputations. During an outage the business effectively shuts its doors, puts out the 'We are Closed' sign, and is left wondering 'Will our customers come back?'
Ongoing monitoring, combined with analysis of what has changed in the system, can significantly reduce outages, downtime and performance problems, and help you avoid becoming the next big failure story.
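Several of the outages below (Amazon's and Intuit's, for example) were triggered by network configuration changes. As a minimal sketch of what "analyzing what has changed" can look like in practice, the Python function below compares two configuration snapshots, assumed here to be flat key/value dictionaries, and reports every added, removed or changed setting. The function name `diff_config`, the snapshot format and the sample keys are all hypothetical and are not taken from any particular monitoring product.

```python
from typing import Any, Dict, List, Optional, Tuple

# One entry per difference: (kind, key, old_value, new_value)
Change = Tuple[str, str, Optional[Any], Optional[Any]]


def diff_config(before: Dict[str, Any], after: Dict[str, Any]) -> List[Change]:
    """Compare two flat config snapshots (hypothetical format) and list the differences."""
    changes: List[Change] = []
    # Walk the union of keys in sorted order so reports are reproducible.
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            changes.append(("removed", key, before[key], None))
        elif key not in before:
            changes.append(("added", key, None, after[key]))
        elif before[key] != after[key]:
            changes.append(("changed", key, before[key], after[key]))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a maintenance window.
    before = {"route.primary": "10.0.0.1", "mtu": 1500}
    after = {"route.primary": "10.0.1.1", "mtu": 1500, "maintenance_mode": True}
    for kind, key, old, new in diff_config(before, after):
        print(f"{kind:8} {key}: {old!r} -> {new!r}")
```

Running the script flags the added `maintenance_mode` key and the changed `route.primary` value while ignoring the unchanged `mtu`. A real setup would capture snapshots automatically and alert on unexpected differences; that wiring is outside the scope of this sketch.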
Heard of more big name outages? Had a nightmare downtime story yourself? Please share below.
1. Bank of America Online Banking Down Across U.S.
Duration: 6 days
Impact: Affected 29 million online customers
What Happened: For six consecutive days, the site suffered a series of slowdowns and outages, which the bank attributed to a combination of technical issues and higher-than-anticipated website traffic. The problems were reportedly linked to a "multi-year project" to upgrade its online banking platform.
Link: News Coverage
2. Amazon EC2 Goes Dark In Morning Cloud Outage
Duration: 4 days
Impact: In the affected Availability Zone, the percentage of "stuck" single-AZ database instances fell steadily as EBS recovery proceeded: 41.0% after 24 hours, 23.5% after 36 hours and 14.6% after 48 hours. The rest recovered over the weekend.
What Happened: The trigger for this event was a network configuration change. It left a subset of the Amazon Elastic Block Store ("EBS") volumes in a single Availability Zone in the US East Region unable to service read and write operations, making them "stuck" volumes. Instances that tried to read from or write to these volumes also became "stuck". As with any complicated operational issue, this one had several root causes interacting with one another, which gave Amazon many opportunities to protect the service against similar events in the future. Amazon says the changes it has made protect against a repeat of this event.
Link: News Coverage
3. Intuit Service Outages Leave Frustrated Customers
Duration: 2 days (some users up to 5 days)
Impact: Thousands affected
What Happened: A change to Intuit's network configuration, made during scheduled maintenance, blocked customer access to several Intuit services, including TurboTax Online, QuickBooks Online, Quicken and QuickBase.
Link: News Coverage
4. Google Suffers First Gmail Outage of 2011
Duration: 2 days
Impact: 120,000 users affected
What Happened: After analyzing the issue, Google Engineering determined that the root cause was a bug inadvertently introduced in a Gmail storage software update. The bug caused the affected users' messages and account settings to become temporarily unavailable from the datacenters.
Link: News Coverage
5. BlackBerry Outages Spread Throughout the World
Duration: 24 hours (longer for some users)
Impact: Service unavailable worldwide, affecting millions of users
What Happened: BlackBerry's manufacturer, Research In Motion (RIM), blamed the continuing service outages on a "core switch failure within RIM's infrastructure."
Link: News Coverage
6. Yahoo Mail Suffers Outage
Duration: More than 24 hours
Impact: Users around the globe affected
What Happened: Yahoo acknowledged problems with its services, saying that "Some Yahoo services are currently inaccessible to some users in certain locations."
Link: News Coverage
7. Microsoft Windows Live Hotmail E-Mails, Inboxes Disappear
Duration: 24-72 hours
Impact: Wiped out the contents of many users' Windows Live Hotmail inboxes and moved some messages to a deleted-mail folder.
What Happened: Microsoft did not detail what caused the wipeout and said that not all messages were restored or received right away.
Link: News Coverage
8. Verizon Suffers Series of Data Outages
Duration: 24 hours or more
Impact: Users in a large number of states, from California to Maryland, reported being unable to get LTE service.
What Happened: Verizon attributed the problems to "growing pains". Its new ultra-fast 4G network suffered one outage in April and then a string of three in December, shutting off access for smartphone and wireless hotspot customers across the country.
All three December outages were caused by problems in Verizon's service delivery core, called the IP Multimedia Subsystem (IMS), which replaced the old signaling architectures used in 2G and 3G networks.
Link: News Coverage
9. Netflix Streaming Service Hit by Outage
Duration: 4 to 8 hours (depending on whom you ask)
Impact: 20,000,000 users affected
What Happened: Netflix said little about this outage, stating only that the problem was "a rare technical issue".
Link: News Coverage
10. Microsoft Sorry For E-Mail-Killing BPOS Cloud Outages
Duration: Delays of six and nine hours
Impact: Hit Microsoft's Business Productivity Online Services (BPOS) cloud computing suite, causing massive delays in BPOS users' e-mail.
What Happened: The BPOS-S Exchange service experienced an issue with one of the hub components due to malformed email traffic on the service. Exchange has the built-in capability to handle such traffic, but encountered an obscure case where that capability did not work correctly. The result was a growing backlog of email.
Link: News Coverage