Major Outage at National Weather Service
Outages are inconvenient, annoying, costly and disruptive. But what about life threatening, in the face of massive natural disasters? That is just what the National Weather Service (NWS) faced on May 22, when a widespread data outage hit just as severe thunderstorms, some spawning damaging tornadoes, erupted across the country from Denver, Colorado, to Albany, New York. The NWS warning system was unable to issue critical warnings about weather conditions. Radar data stopped flowing to mobile apps and NWS websites for at least half an hour, and at least some NWS offices lost the ability to disseminate severe weather warnings through automated means, turning instead to social media.
The Washington Post shared that the "National Weather Service failed to warn part of D.C. area of dangerous thunderstorm."
The Weather Channel reported that the "National Weather Service outage delays critical tornado warning."
Mashable stated that the "National Weather Service Warning System hobbled by outage at worst possible time."
Making matters worse, the outage caught the attention of Sen. Barbara A. Mikulski, who chairs both the full Senate Appropriations Committee and its subcommittee on Commerce-Justice-Science, which funds the National Weather Service. Her home state of Maryland was not notified of a severe thunderstorm warning because of the outage. She subsequently declared: "Clearly, upgrades to the Weather Service IT systems can't happen soon enough. We owe it to our communities to have a weather service that is ready for duty providing reliable warnings now."
So What Caused the Outage?
As reported in a memo provided to Mashable by an NWS spokesman, the data disruption was triggered by an upgrade to the weather data dissemination system itself, specifically a modification to a network firewall. The firewall was supposed to continue to allow data to pass through it, but "within minutes, engineers noticed that data were not traversing the firewall."
In IT operations, change happens, enabling continuous improvement of services. In complex dynamic ecosystems, such as the IT infrastructure behind the National Weather Service's warning system, change happens a lot. On any given day infrastructure can be upgraded, patches installed, and automated processes activated that alter files, system environments and their configurations. Sometimes these activities are performed correctly and ... sometimes they're not. When they're not, the cause may only be identified when a devastating failure occurs.
A key component of the entire IT operations process is change and configuration management. Every change in the infrastructure, whether the deployment of new hardware or applications, the hiring of new personnel, or something else, affects IT control and tests the effectiveness of change management processes. When an organization can manage change on a continuous basis, it gains the visibility necessary to ensure that its infrastructures are secure, compliant and effective.
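One simple way to gain that kind of visibility is to fingerprint configuration files before and after a change window and compare the results. The sketch below is only an illustration under assumptions: the function names `snapshot` and `diff_snapshots` are hypothetical, and nothing here reflects tooling the NWS actually uses.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each file path to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def diff_snapshots(before, after):
    """Compare two snapshots and return (added, removed, changed) path lists."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    # A path is "changed" when it exists in both snapshots with different digests.
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, changed
```

Taking a snapshot before an upgrade and another afterwards, then reviewing the `changed` list, shows which files a change actually touched rather than which files it was expected to touch.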
NWS outage: http://t.co/asnMIkCBlN At AccuWx our policy is to not make software/hardware changes on Fridays/Holidays or severe weather days.— Jesse Ferrell (@Accu_Jesse) May 23, 2014
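A change-freeze policy like the one described in the tweet can be expressed as a small scheduling check. This is a minimal sketch, not anyone's actual tooling: the function name `change_allowed` and the `holidays` and `severe_weather_days` inputs are hypothetical, and in practice those calendars would come from an organization's own sources.

```python
from __future__ import annotations

from datetime import date


def change_allowed(day: date, holidays: set[date], severe_weather_days: set[date]) -> bool:
    """Return True if a software/hardware change may be scheduled on `day`."""
    if day.weekday() == 4:  # date.weekday(): Monday is 0, so 4 is Friday
        return False
    if day in holidays:
        return False
    if day in severe_weather_days:
        return False
    return True


# Hypothetical calendars used only for illustration.
holidays = {date(2014, 5, 26)}
severe_weather_days = {date(2014, 5, 22)}

print(change_allowed(date(2014, 5, 21), holidays, severe_weather_days))
print(change_allowed(date(2014, 5, 22), holidays, severe_weather_days))
```

Under these assumed calendars, a gate of this kind would reject a change scheduled for Thursday, May 22, because that date is flagged as a severe weather day.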
How Severe was the Outage?
The outage lasted from 3:49 to 4:25 p.m. ET on Thursday and occurred as severe thunderstorms erupted from the Denver metro area eastward to the Mid-Atlantic. A tornado warning was issued at one point for downtown Denver, and one strong tornado touched down near Albany, New York. Another, weaker twister struck in Delaware around the time of the outage. The outage affected 41 watches, warnings and advisories: 30 products (including a tornado warning) were not delivered through the primary dissemination system but were disseminated by other systems, and 11 products were delayed approximately 5 minutes.
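The figures above can be checked with a few lines of arithmetic. The variable names are illustrative, and the year is taken from the date on the tweet quoted earlier.

```python
from datetime import datetime

# Outage window as reported: 3:49 to 4:25 p.m. ET on Thursday, May 22.
start = datetime(2014, 5, 22, 15, 49)
end = datetime(2014, 5, 22, 16, 25)
outage_minutes = int((end - start).total_seconds() // 60)

# Products affected: rerouted through other systems, or delayed.
rerouted = 30
delayed = 11
affected = rerouted + delayed

print(outage_minutes)  # 36
print(affected)        # 41
```

The 36-minute window is consistent with radar data being unavailable for at least half an hour, and the two product counts add up to the 41 affected watches, warnings and advisories.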
IT Operations Analytics
The NWS is not alone: today's IT operations organizations face new levels of challenges that can no longer be handled with existing approaches. Dealing with the complexity and dynamics of today's IT environments calls for some more serious brain power. IT Operations Analytics delivers the intelligence IT operations organizations are craving, allowing them to turn piles of IT operations data into actionable information.