The Q2’19 shutdown roundup: The latest and biggest IT outages
Even though IT outages have become a common evil, just thinking about a major one makes IT professionals break out in a cold sweat. Outages can cause serious damage on many levels, from operations to reputation. To make matters worse, even the largest organizations in the world suffer major outages, despite dedicating large budgets to preventing them.
Even a few minutes of downtime can lead to a severe crisis affecting an organization’s operations, bottom line, business reputation, and level of customer trust.
To raise industry awareness of the risks associated with outages, we’ve decided to publish a periodic roundup of major IT outages from the past few months that attracted the worst type of PR imaginable.
We believe that knowing your enemy is the first step towards winning the battle.
Let’s get started, shall we?
April 2019 - Major outage hits five airlines!
What happened? An issue with a non-federal program called AeroData, which is responsible for aircraft weight and balance planning, affected several companies’ mainline and regional operations.
The road to recovery: Around 5 hours.
Count the losses: While the total amount was not disclosed, United Airlines alone stated that around 150 of its flights were affected.
Based on our analysis of multiple industry benchmarks, we estimate that this outage caused the loss of tens of millions of dollars.
April 2019 - Jetstar flights in Australia experienced significant delays
What happened? An IT outage took down Jetstar’s passenger check-in systems.
The road to recovery: About an hour.
Count the losses: Multiple flights experienced delays, massive queues formed at check-in, and no check-ins could occur throughout the duration of the outage.
April 2019 - Facebook outage is linked to ad buyers
What happened? This outage was well covered in the press, and we dedicated an article to it: A server configuration issue shut down Facebook, Messenger, WhatsApp, Instagram, and Oculus.
The road to recovery: About 5 hours.
Count the losses: This outage had many implications. Facebook reportedly earns over $90 million a day in ad revenue, so the cost of Facebook’s Ad Manager being down for 5 hours was massive. Moreover, ad buyers continued to report long-term impacts of the outage two weeks after it was resolved.
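To put a rough number on this, here is a minimal back-of-the-envelope sketch in Python. It assumes, purely for illustration, that ad revenue accrues evenly across a 24-hour day, which real traffic does not do. The function name `estimated_revenue_at_risk` is our own, and the $90 million and 5-hour figures are the ones quoted above.

```python
def estimated_revenue_at_risk(daily_revenue_usd: float, outage_hours: float) -> float:
    """Pro-rate a daily revenue figure over the length of an outage.

    Assumption (ours, for illustration only): revenue accrues evenly
    across all 24 hours of the day.
    """
    if daily_revenue_usd < 0 or outage_hours < 0:
        raise ValueError("daily revenue and outage length must be non-negative")
    return daily_revenue_usd * outage_hours / 24.0


# Figures reported for the Facebook outage: over $90 million a day, about 5 hours down.
facebook_estimate = estimated_revenue_at_risk(90_000_000, 5)
print(f"Estimated ad revenue at risk: ${facebook_estimate:,.0f}")
```

Under those assumptions, the estimate comes to roughly $18.75 million, before counting any of the lingering effects that ad buyers reported afterwards.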
May 2019 - Optus outage affected Netflix, Xbox online, & more
What happened? The outage denied customers access to many websites, including Netflix.
The road to recovery: At least 4 hours.
Count the losses: The damage to the company’s reputation was massive, as many frustrated customers expressed their anger on social media and demanded refunds.
May 2019 - Azure outage in Australia
What happened? A name server delegation issue disrupted DNS resolution and, with it, network connectivity.
The road to recovery: A couple of hours.
Count the losses: Even though Azure services couldn’t be reached, they were still running, and this helped minimize the damage. Still, the global incident impacted a whole range of Microsoft cloud services, causing connection problems for core services like Azure, multiple services under the Microsoft 365 umbrella, Dynamics, and DevOps.
May 2019 - Salesforce down for 15 hours!
The road to recovery: Around 15 hours!
Count the losses: Critical business and PR damage, as countless marketing and sales teams across Europe and the US were unable to function.
As the Salesforce case shows, an outage at any company that offers online services to its customers can have significant financial consequences, such as having to issue credits to customers or pay penalties. Salesforce shares were also down by as much as 3% on Friday, so loss of market value is yet another financial consequence of an outage for a company like Salesforce.
Salesforce itself admitted in an announcement that this outage could lead to a significant earnings loss.
June 2019 - eBay stopped working
What happened? The entire eBay website and search function crashed.
The road to recovery: About an hour.
Count the losses: The incident heavily affected both eBay itself and its sellers’ businesses. Bidding on products was disabled for the entire duration of the outage.
June 2019 - The cloud brings no guarantee of zero issues
What happened? A network congestion issue affected several services, including Google Cloud, G Suite, and YouTube.
The road to recovery: About 4 hours.
Count the losses: The outage affected companies that rely on Google’s services, such as Shopify, Snap, Discord, and many others. YouTube suffered a ~10% view drop (think about the ad impressions lost), and Google Cloud Storage suffered a 30% decrease in traffic.
According to TechCrunch: “The outage hit everything from the ability to control the temperature in people’s homes and apartments through Google’s Nest to shopping on any service powered by Shopify, Snapchat and Discord’s social networks.”
Mashable provided a quote by one of the many e-shops that were affected: "We probably lost out several thousands of dollars in sales the five or so hours it was down," Stith told Mashable via email. "Not only that, we are having a summer sale right now so we have more orders than we have ever had in the history of the business at once that are not yet fulfilled. Around 500 orders currently not filled."
To summarize, such an outage will likely have long-term implications that are hard to evaluate at this point. If one of the largest tech giants in the world, Google, can suffer a cloud outage, then every other enterprise has the potential to be just as vulnerable.
June 2019 - The secondary waves following the Google outage
What happened? Google’s cloud infrastructure, on which the company depends, went down.
The road to recovery: Approximately 4-5 hours.
Count the losses: Countless lost sales were angrily reported, as mentioned above (in the section on the Google Cloud outage).
June 2019 - Google, again, hits the news
What happened? Google Calendar’s desktop web application experienced issues. Although this happened in the same month as the major Google outage reported above, it’s a completely different incident. Ironically, the issue started immediately after the following tweet was posted:
The road to recovery: Around 3 hours.
Count the losses: It’s hard to put a price tag on canceled meetings across the US, Europe, and parts of South America, but we imagine the damage to be extremely severe.
June 2019 - Nationwide outage in Target registers
What happened? Target’s registers went offline.
The road to recovery: 2 hours, but the company experienced additional problems shortly thereafter. Keep reading.
Count the losses: Target stores had to temporarily close due to this outage. You can only imagine the impact on the bottom line.
June 2019 - Here we go again…
What happened? Once again, Target’s registers shut down.
The road to recovery: About an hour.
Count the losses: Stores were unable to accept credit card payments. On Twitter, #Targetgeddon and #TargetApocalypse trended as people visualized the outage. Photos and videos revealed long lines, annoyed people, and a general sense of chaos in Target stores nationwide.
Many people couldn’t take it and just took off.
June 2019 - A significant number of users can’t access their storage
What happened? The company’s website, desktop application, and API experienced issues.
The road to recovery: At least 2 hours.
Count the losses: The damage caused by the outage is unknown, but a small PR crisis erupted on social media.
June 2019 - Instagram is down
What happened? Users could not refresh their feeds, upload photos or videos, or access their accounts.
The road to recovery: Over an hour.
Count the losses: As with any popular service, the PR damage was felt almost immediately. During the outage, the hashtag #instagramdown was trending on Twitter. Also, since Instagram monetizes through ads, you can imagine the number of lost ad impressions (which, in our era of real-time programmatic bidding, probably went elsewhere).
June 2019 - Netflix experienced issues
What happened? More than half of the complaints reported an issue with the Netflix website, while around 30% reported connectivity problems.
The road to recovery: A few hours.
Count the losses: It’s hard to estimate the cost of such an outage, but, as is often the case with popular B2C services, the crisis quickly spread to all social media.
June 2019 - Slack is having a global performance issue
What happened? A worldwide outage caused by performance degradation impacted users across the globe, with multiple services reported as being down.
The road to recovery: A few good (or bad) hours.
Count the losses: In addition to costs associated with poor communication at companies across the globe, Slack shares were down about 1% following the outage.
June 2019 - A multi-day outage
What happened? Customers were unable to return or pick up online orders at Sears and Kmart stores. In some areas, deliveries were also affected, and stores had issues processing payments.
The road to recovery: 3 days. Yes, DAYS.
Count the losses: Imagine the cost of full days without orders or payments at these retail giants.
This list covered some of the outages that reached the press in April, May, and June 2019. Even the most comprehensive list cannot cover every recent major outage, due to the simple fact that such incidents continue to occur.
While avoiding IT outages altogether is close to impossible, the troubleshooting process can be improved, saving organizations a great deal of money.
Evolven helps enterprises troubleshoot and prevent such performance incidents as those reported above. Evolven Change Analytics tracks and analyzes all actual changes carried out in the enterprise cloud environment, allowing the amount of troubleshooting time to be significantly reduced and the number of incidents to be cut.