Sep 18

Downtime, Outages and Failures - Understanding Their True Costs

Labels: Data Center, Downtime


   
 

When it comes to mission-critical applications and data center performance, companies invest heavily to see results; however, the investment doesn't always deliver the hoped-for outcome.

Confronting System Downtime

Despite advances in infrastructure robustness, many IT organizations still face database, hardware, and software downtime, ranging from brief interruptions to outages that shut the business down for days.

Downtime Expected
Yet, the world of IT failure is strange. Despite mounting statistics that touch nearly every major enterprise software vendor and customer, from ERP to CRM and more, just bringing up the topic of outages still terrifies those in the industry. Against this backdrop, IT failures have become an accepted, virtually expected, aspect of enterprise life.

Downtime Revisited
So, while IT professionals confront downtime and try to get on top of it, the business organization suffers the pain of downtime. About a year ago, we looked at the many ways that IT downtime can hurt businesses (Cost and Scope of Unplanned Outages), from lost revenue to reputation damage to lost productivity. Now we want to revisit the issue, and see how organizations of any size should address and assess threats to their IT operations, including systems, applications, and data, and look at solid numbers around the potential costs that downtime and outages pose to the business.

System Outages: Measuring Big Brand Failures

Where does one start in measuring the cost of the big-brand outages (10 Devastating Outages and Failures Of Major Brands In 2011) that recently hit Bank of America Online Banking, Amazon Web Services, Intuit, and BlackBerry?

Downtime costs vary significantly even within a single industry, because downtime affects each business differently. Business size is the most obvious factor, but it is not the only one. Setting a measure means establishing the nature and implications of the failure.

A failure of a critical application can lead to a few types of losses:

  • Loss of the application service – the impact of downtime varies with the application and the business.
  • Loss of data – the potential loss of data due to a system outage can have significant legal and financial impact.
    (The Impact of Network and or Server Downtime)

Everyone would agree that today's data centers should never go down: applications should be available around the clock, and internal and external end users worldwide need to be able to rely on the data center for critical data and applications at any time. That does not mean, however, that nothing inside the data center ever stops.

System Outage Nightmare Example: Virgin Blue's Reservation Desk

Customers of Virgin Blue were deeply upset when they couldn't board their scheduled flights during an outage that lasted 11 days. The outage generated a lot of negative press, as well as costing the company millions in profit.

In September 2010, Virgin Blue's check-in and online booking systems went down. A hardware failure on September 26 caused an outage of the airline's internet booking, reservations, check-in, and boarding systems. The disruption severely interrupted Virgin Blue's business for 11 days, affecting around 50,000 passengers and 400 flights, before operations were restored to normal on October 6. (Virgin Blue IT outage hit profit by up to $20M)

The Results: Virgin Blue's reservations management company, Navitaire, ended up compensating Virgin Blue for up to $20 million. (Navitaire booking glitch earns Virgin $20M in Compo)

Misconfigurations Have Major Impact on Performance

The IT Process Institute's Visible Ops Handbook reports that "80% of unplanned outages are due to ill-planned changes made by administrators ("operations staff") or developers." (Visible Ops). Getting to the root of the matter, Enterprise Management Associates reports that 60% of availability and performance errors are the result of misconfigurations: the small changes made to the environment and to system configuration parameters all the time.

A recent Gartner study projected that "Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues." (Ronni J. Colville and George Spafford, Configuration Management for Virtual and Cloud Infrastructures)

Manual configuration errors can cost companies up to $72,000 per hour in Web application downtime. Application maintenance costs are increasing at a rate of 20% annually, and 35% of those polled said at least one-quarter of their downtime was caused by configuration errors. (How much will you spend on application downtime this year?)


Production and Application Downtimes Cost Made Clear

Unplanned outages are IT's responsibility to resolve, but at the end of the day they are essentially business issues. Part of a thorough evaluation process is calculating how much money you will lose for each hour (or minute, or another time increment of your choice) of downtime. For enterprises with revenue models that depend solely on the data center's ability to deliver IT and networking services to customers, such as telecommunications service providers and e-commerce companies, downtime can be particularly costly, with the highest cost of a single event topping $1 million (more than $11,000 per minute) (Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact of Infrastructure Vulnerability).



A USA Today survey of 200 data center managers found that over 80% of these managers reported that their downtime costs exceeded $50,000 per hour. For over 25%, downtime cost exceeded $500,000 per hour. (Let's Get an Availability Benchmark).

According to a high-availability survey by the Information Technology and Intelligence Corp., companies recognize that they can't achieve zero downtime, yet one in 10 said they need better than 99.999% availability. (Trends in high availability and fault tolerance)
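To put a 99.999% target in perspective, the sketch below converts an availability percentage into the maximum downtime it allows per year. It is a minimal Python example that assumes a service meant to run 24 hours a day in a 365-day year; `MINUTES_PER_YEAR` and `downtime_budget_minutes` are illustrative names, not anything taken from the cited survey.

```python
# Assumption: round-the-clock service in a non-leap year (365 * 24 * 60 = 525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, that still meets an availability target.

    `availability` is a fraction, e.g. 0.99999 for "five nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # The unavailable fraction of the year, expressed in minutes.
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} -> {downtime_budget_minutes(target):.2f} minutes/year")
```

At 99.999%, the budget works out to roughly 5.26 minutes of downtime a year, which shows how little room for error that target leaves.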



To get a firm understanding of the implications of production and release downtime, let's look at how the consequences of downtime manifest themselves.

Downtime Cost Per Year

According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week. Consider an average Fortune 500 company with at least 10,000 employees paid an average of $56 per hour, including benefits ($40 per hour in salary + $16 per hour in benefits). The labor portion of downtime costs for an organization this size would be $896,000 weekly (10,000 employees x $56 x 1.6 hours), which translates into more than $46 million per year. (Assessing The Financial Impact Of Downtime).
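The Fortune 500 labor estimate is straightforward multiplication, sketched here in Python. The function name is hypothetical, and like the original calculation it assumes that every employee is completely idle for the full 1.6 hours each week and that a year has 52 weeks.

```python
def weekly_labor_cost_of_downtime(employees: int,
                                  loaded_hourly_rate: float,
                                  downtime_hours_per_week: float) -> float:
    """Wages plus benefits paid to employees idled by downtime in one week.

    Simplifying assumption: every employee is fully idle for the whole outage.
    """
    return employees * loaded_hourly_rate * downtime_hours_per_week


salary, benefits = 40.0, 16.0  # dollars per hour, from the Dun & Bradstreet example
weekly = weekly_labor_cost_of_downtime(10_000, salary + benefits, 1.6)
yearly = weekly * 52  # assumes 52 weeks of operation per year

print(f"weekly: ${weekly:,.0f}  yearly: ${yearly:,.0f}")
```

Swapping in your own headcount, loaded rate, and measured weekly downtime gives a first-order labor figure; it overstates the cost if some staff can keep working during an outage.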

 

Downtime Cost Per Hour

On average, the businesses surveyed said they suffered 14 hours of IT downtime per year. Half of those said IT outages damage their reputation and 18% described the impact on their reputation as 'very damaging.' Headlines about IT failures certainly don't help. (IT Downtime Costs $26.5 Billion In Lost Revenue )



Average downtime costs vary considerably across industries, from approximately $90,000 per hour in the media sector to about $6.48 million per hour for large online brokerages. (How Much Does Downtime Really Cost?). According to a survey of IT managers, companies are becoming more aware of the direct financial costs of computer downtime: one in five businesses loses £10,000 an hour through systems downtime. (Companies count the cost of IT failure)

Collectively, IT downtime costs businesses more than 127 million person-hours of employee productivity per year, an average of 545 person-hours per company. (IT Downtime Carries a High Pricetag)

Thirty-five percent of survey respondents believe that one hour of downtime for their most business-critical applications would cost their company $25,000 or less, potentially underestimating the adverse impact that IT downtime can have on their entire business. Just 10 percent of respondents report a financial impact of $150,000 or more per hour, which is closer to most industry cost estimates, such as the Aberdeen Group's estimate of $110,000 an hour for the average company. (Working in the dark: financial impact of IT downtime)

A conservative estimate from Gartner pegs the hourly cost of downtime for computer networks at $42,000, so a company that suffers worse-than-average downtime of 175 hours a year can lose more than $7 million per year. But outages affect each company differently, so it's important to know how to calculate the precise financial impact. (How to quantify downtime).
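The Gartner-based figure is a flat hourly rate multiplied by annual downtime hours. A minimal sketch follows; `annual_downtime_cost` is an illustrative name, and valuing every hour of downtime equally is a simplifying assumption, since real outages affect each company differently.

```python
def annual_downtime_cost(hourly_cost: float, downtime_hours_per_year: float) -> float:
    """Annual cost when every downtime hour is valued at the same flat hourly rate."""
    return hourly_cost * downtime_hours_per_year


# $42,000 per hour (Gartner network estimate) x 175 hours of downtime a year.
print(f"${annual_downtime_cost(42_000, 175):,.0f}")
```

With these inputs the result is $7,350,000, consistent with the "more than $7 million per year" cited above.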

Downtime Cost Per Minute

The average cost of data center downtime across industries was approximately $5,600 per minute. (Unplanned IT Outages Cost More than $5,000 per Minute).

According to a study by the Ponemon Institute, the minimum, median, mean, and maximum cost per minute of unplanned outages were computed based on input from 41 data centers. In the chart below, the most expensive unplanned outage cost over $11,000 per minute. On average, an unplanned outage is likely to cost more than $5,000 per minute. (Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact of Infrastructure Vulnerability)
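Because the studies quoted in this post mix per-minute and per-hour figures, it can help to convert them to a single unit before comparing. These two helpers are hypothetical conveniences, not part of any cited study.

```python
def per_minute_to_per_hour(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost into the equivalent per-hour cost."""
    return cost_per_minute * 60


def per_hour_to_per_minute(cost_per_hour: float) -> float:
    """Convert a per-hour downtime cost into the equivalent per-minute cost."""
    return cost_per_hour / 60


print(per_minute_to_per_hour(5_600))   # cross-industry average per minute
print(per_minute_to_per_hour(11_000))  # most expensive outage in the Ponemon data
print(per_hour_to_per_minute(42_000))  # Gartner's hourly network estimate
```

For example, the $5,600-per-minute cross-industry average corresponds to $336,000 per hour, and the $11,000-per-minute maximum to $660,000 per hour.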

 

Downtime Per Year

Gartner has calculated that the average large corporation experiences 87 hours of network downtime a year. That figure is the sum of many outages, each lasting anywhere from a few minutes to several hours, but added up it becomes a staggering figure for an organization. (Average large corporation experiences 87 hours of network downtime a year).
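Annual downtime hours can also be expressed as an availability percentage. This sketch assumes a system expected to run around the clock, i.e. 8,760 hours in a non-leap year; `implied_availability` is an illustrative name.

```python
# Assumption: round-the-clock service in a non-leap year.
HOURS_PER_YEAR = 365 * 24  # 8,760


def implied_availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year the service was up, given total downtime hours."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


print(f"{implied_availability(87):.3%}")
```

87 hours of downtime corresponds to about 99.007% availability, far below the 99.999% that one in 10 surveyed companies said they need.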

Average Outage Period

When an outage occurs, it's a race against time to handle it before it spirals out of control. According to the IT Process Institute, resolution time per outage is around 200 minutes. That is a long time when you consider what is happening to the customer experience and the company's reputation while the outage lasts. In a separate study, the average reported incident length was 90 minutes, resulting in an average cost per incident of approximately $505,500. (Unplanned IT Outages Cost More than $5,000 per Minute)
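Dividing the average incident cost by the average incident length gives an implied per-minute rate, which is a quick sanity check against the per-minute averages quoted earlier. The function names below are illustrative.

```python
def implied_cost_per_minute(cost_per_incident: float, incident_minutes: float) -> float:
    """Average cost per minute implied by an incident's total cost and duration."""
    if incident_minutes <= 0:
        raise ValueError("incident_minutes must be positive")
    return cost_per_incident / incident_minutes


def incident_cost(cost_per_minute: float, incident_minutes: float) -> float:
    """Estimated total cost of an incident at a flat per-minute rate."""
    return cost_per_minute * incident_minutes


rate = implied_cost_per_minute(505_500, 90)
print(round(rate, 2))  # 5616.67
```

$505,500 over 90 minutes works out to about $5,617 per minute, in line with the roughly $5,600-per-minute cross-industry average cited for the same research.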

Unplanned Downtime Impact on Revenue

On average, businesses lose between $84,000 and $108,000 (US) for every hour of IT system downtime, according to estimates from studies and surveys performed by IT industry analyst firms. In addition, financial services, telecommunications, manufacturing and energy lead the list of industries with a high rate of revenue loss during IT downtime. (Assessing the Financial Impact of Downtime)

For a total data center outage, which had an average recovery time of 134 minutes, average costs were approximately $680,000. (Unplanned IT Outages Cost More than $5,000 per Minute)

If an outage disrupts a supply chain with high expectations of responsiveness (e.g., medical services or overnight delivery), the business may be exposed to damages. Often, damages stem from the inability to deliver (e.g., lost delivery fees due to late arrival, or lawsuits over collateral damage). These highly publicized situations can impact shareholder value. (How Much Does Downtime Really Cost?)


(Source: Unplanned IT Downtime Can Cost $5K Per Minute)

Downtime Impact on Reputation and Loyalty

What is your reputation worth? This may be difficult to assess, considering the long-term effect of a damaged reputation and its impact on revenue and profitability.

Downtime costs in this regard include lost business with customers (both short term and long term), employee time diverted from other tasks to get the IT systems running again, employee overtime expenses (if applicable), the value of any lost data, emergency maintenance fees (particularly if the outage occurs during off hours) and additional repair costs that may go on even after service has been restored. Needless to say, you must estimate many of these costs, as they will vary depending on when the downtime event occurs, which systems are at fault and what measures are required to get the facility operating again. But even a rough guess in this area will be extremely helpful when you're deciding your required level of application availability.
(The Price of Data Center Availability)
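One way to turn these categories into a rough figure is to estimate each one separately and add them up. The sketch below does exactly that; the class, its field names, and every dollar amount in the example are hypothetical placeholders for your own estimates, not data from the cited article.

```python
from dataclasses import dataclass, fields


@dataclass
class DowntimeCostEstimate:
    """Rough per-event estimates, in dollars, for the cost categories listed above.

    All values are hypothetical inputs you supply; unknown categories default to 0.
    """
    lost_business: float = 0.0          # short- and long-term lost sales
    recovery_staff_time: float = 0.0    # employee time diverted to restoring systems
    overtime: float = 0.0               # overtime pay, if applicable
    lost_data_value: float = 0.0        # value of data that could not be recovered
    emergency_maintenance: float = 0.0  # e.g. off-hours emergency maintenance fees
    post_restore_repairs: float = 0.0   # repair work that continues after service returns

    def total(self) -> float:
        # Sum every category field declared on the dataclass.
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical example event; replace with your own estimates.
estimate = DowntimeCostEstimate(
    lost_business=120_000,
    recovery_staff_time=8_000,
    overtime=3_500,
    emergency_maintenance=5_000,
)
print(f"Estimated event cost: ${estimate.total():,.0f}")
```

Keeping each category as a separate field makes it easy to see which guesses dominate the total and to revisit them when deciding on a required level of availability.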

Even so, there are tangible elements that reflect the costs of reputation impairment, such as stock downturns, marketing staff hours, and the media dollars required to restore and polish an organization's profile.

To more accurately assess total lost sales, the impact percentage must be increased to reflect the lifetime value of customers who permanently defect to a competitor. If a large percentage of customers typically become very loyal after a satisfactory buying experience, the impact factor may significantly exceed 100 percent, possibly by a high multiple. (Assessing the Financial Impact of Downtime)

Intangible costs vary among organizations. Downtime can result in lost opportunity, shaken customer loyalty, damaged reputation, and lowered employee morale. In practice, this means asking: what is the cost of losing one client? What is the cost of replacing some of your best employees? Though they are hard to state in dollars, these soft costs are real. When Amazon Web Services went down for several days, for example, the outage produced a tremendous amount of press and speculation. (Cost-Unconscious: Denying the True Cost of Network Downtime)

The fallout from the Amazon cloud outage added to fears surrounding cloud security and downtime. As Amazon scrambled to get its cloud services back online, many customers questioned the reliability of the cloud, Amazon's communication around the outage, and whether they would be compensated for the downtime under their SLA. (Cloud Outages: Cloud Services Downtime And The Lasting Impact) As for the SLA, despite the almost four-day outage, Amazon's EC2 SLA was not breached (Seven lessons to learn from Amazon's outage).

Impact to Employee Productivity

Downtime also hurts employee productivity, a cost that can be measured in terms of the salaries, wages, and benefits of workers left idle by system downtime. After a downtime event, remedial actions are often required to correct the damage. For example, IT operations staff might work overtime, at overtime rates, or temporary staff may be contracted to recover lost data and enter accumulated paper transactions. And if customer satisfaction was damaged, a costly special marketing program may be necessary to win back customers (How Much Does Downtime Really Cost?).

Cost of Downtime: Calculating it Yourself

How much do you lose from unexpected downtime of your servers & business applications?

The simplest way to calculate potential revenue losses during an outage is with the equation:

LOST REVENUE = (GR/TH) x I x H
GR = gross yearly revenue
TH = total yearly business hours
I = percentage impact
H = number of hours of outage

Service costs are rarely zero.
(How much do you lose from unexpected downtime of your servers & business applications?)
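The lost-revenue equation translates directly into code. In this minimal sketch, `lost_revenue` and the example company's figures are hypothetical; the impact I is expressed as a fraction (0.5 for 50%).

```python
def lost_revenue(gross_yearly_revenue: float,
                 total_yearly_business_hours: float,
                 impact: float,
                 outage_hours: float) -> float:
    """LOST REVENUE = (GR/TH) x I x H, with the impact I given as a fraction."""
    if total_yearly_business_hours <= 0:
        raise ValueError("total_yearly_business_hours must be positive")
    revenue_per_hour = gross_yearly_revenue / total_yearly_business_hours  # GR/TH
    return revenue_per_hour * impact * outage_hours


# Hypothetical company: $52M a year, open 40 hours/week x 52 weeks = 2,080 business hours,
# hit by a 4-hour outage that halts half of its revenue-generating activity.
print(lost_revenue(52_000_000, 2_080, 0.5, 4))  # 50000.0
```

The impact fraction is deliberately not capped at 1.0: as noted in the discussion of customer lifetime value, I can exceed 100% once customers who defect permanently are taken into account. Like the original equation, this estimate covers lost revenue only, not service or recovery costs.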

Want to reduce outage and downtime risk?

Find out how you can eliminate outages, improve performance, and reduce production risks with Evolven's IT Operations Analytics solution.  

Take our tour or sign up for a free trial and see for yourself how Evolven eliminates outages. Really.

 


References
10 Devastating Outages and Failures Of Major Brands In 2011
Assessing The Financial Impact Of Downtime
Average large corporation experiences 87 hours of network downtime a year
Cloud Outages: Cloud Services Downtime And The Lasting Impact
Companies count the cost of IT failure
Configuration Management for Virtual and Cloud Infrastructures
Cost and Scope of Unplanned Outages
Cost-Unconscious: Denying the True Cost of Network Downtime
How Much Does Downtime Really Cost?
How much do you lose from unexpected downtime of your servers & business applications?
How much will you spend on application downtime this year?
IT Downtime Carries a High Pricetag
IT Downtime Costs $26.5 Billion In Lost Revenue
Let's Get an Availability Benchmark
Navitaire booking glitch earns Virgin $20M in Compo
Seven lessons to learn from Amazon's outage
The Impact of Network and or Server Downtime
The Price of Data Center Availability
Trends in high availability and fault tolerance
Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact of Infrastructure Vulnerability
Unplanned IT Outages Cost More than $5,000 per Minute
Virgin Blue IT outage hit profit by up to $20M
Visible Ops
Working in the dark: financial impact of IT downtime



Written by Martin Perlin.
