What is IT Operations Analytics (ITOA)?
For IT Operations, change is a fact of life. Change takes place at every level of the application and infrastructure stack and impacts nearly every part of the business. IT ops teams already face cost pressures and growing complexity, yet IT is expected to help the business weave together multiple kinds of technology to solve time-pressured business challenges. This pushes the IT landscape toward greater complexity: teams must support a wider range of technologies and platforms and keep up with accelerated release schedules. Against this backdrop, stability and performance issues are much harder to troubleshoot, and IT specialists may spend hours trying to isolate a problem.
IT Operations Analytics is a new discipline that started in Application Performance Monitoring (APM), where it focused on application availability and performance, and has since been applied to more areas. Analytics specific to application performance data has come to be considered essential for IT solutions, and the past year has seen growing interest in cross-domain ITOA platforms.
When an incident occurs, IT ops needs to identify the causal factors as fast as possible. Today, operations looks at the behavior of the system and tries to infer the reason behind what is happening. It is like a doctor treating symptoms rather than going after the actual trigger of the illness or pain. So the challenge has been to get to the source, establish what happened, and base decision making on that.
Existing IT management tools have followed this approach even when taking analytics to the predictive level: they look at past performance, project (essentially guess) what performance will be, and then take decisive steps on that basis.
Start with the Source
Rather than looking at prior performance and reverse engineering the performance information that indicates an incident, IT needs to start with the source. Consider how software QA deals with problematic software. QA wouldn't investigate bugs by reverse engineering application executables that produce wrong outputs. When QA finds a problem, the developers need to thoroughly investigate the code and zero in on the problematic commands causing the failure.
Getting to the heart of the matter and dealing with an issue at its source will be both faster and more effective.
So a critical area in which to apply ITOA is Change and Configuration Management (CCM), since operations run on changes. Where IT operations today guesses at what changed, it should know for sure.
The dynamic pace of change impacts the IT organization, pushing it to keep track of thousands of dynamic IT configuration parameters. Without clear, up-to-date configuration information, planned changes can fail.
Unauthorized changes may be detected, but not before performance has eroded, and problem resolution then takes too long. One of the key factors holding IT ops back is the lack of visibility into the complex and dynamic configuration of IT environments and into the actual changes happening. This is a gap in IT visibility: key IT process issues remain open and key operational questions cannot be answered.
What is Analytics?
While monitoring gives a lot of information, that information needs to be sliced and diced to form a coherent report. Today's dynamic environments produce too much data to keep track of. The stock market is similar: for any valuable stock, there is too much dynamic information to reliably say how that stock stands to perform. The financial community gets insight from stock analysts who provide ratings for stocks based on their digestion of comprehensive dynamic data. In the same way, analytics can provide insights that help IT Operations manage more effectively, translating raw metric data into understandable information.
Different levels of Analytics
Analytics can be viewed at different levels.
- Descriptive analytics: This tells you what happened and what caused the problem. Once this is clear, you can find the solution. This is essentially root-cause analysis, where attention is put into identifying the factors that resulted in the harmful outcomes of an event.
- Predictive analytics: This takes analytics another step by making inferences about the future based on collected data. Predictive analytics tools don't need a complete, ideal data infrastructure to provide meaningful and immediately applicable results. For example, predictive analytics can recognize the kind of data available, suggest an appropriate predictive model, and indicate the likelihood of possible outcomes.
- Optimizing analytics: This approach looks at past performance data and, through analysis, provides insight into how to improve operations. This entails identifying the specific element or action that needs to be changed to prevent a recurrence in the future.
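To make the predictive level concrete, here is a minimal sketch of one simple technique: fitting a straight-line trend to past samples of a metric and extrapolating it forward. The function names, the disk-usage figures, and the choice of a linear model are all illustrative assumptions; real ITOA tools may use very different models.

```python
from statistics import mean


def fit_linear_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(values)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(values, steps_ahead):
    """Extrapolate the fitted line steps_ahead samples past the last one."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical daily disk usage (percent) for one server.
disk_usage = [50.0, 52.0, 54.0, 56.0, 58.0]
predicted = forecast(disk_usage, steps_ahead=3)
print(f"Projected usage in 3 days: {predicted:.1f}%")
```

A projection like this is exactly the "looking at past performance and projecting" that the article describes, so it says what may happen but not why.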
Chronic Change and Configuration Management Challenge
In today's complex IT landscape, IT ops has to stay on top of multiple events and processes in order to maintain stability and keep the systems performing as needed, so that the business can reliably focus on what matters most. This challenge is further intensified by agile software development, which races to meet the ongoing and changing demands of today's business services. IT ops now has to manage complex processes in dynamic environments and keep on top of a hectic pace of deploying releases, patches and hotfixes.
Yet without descriptive analytics, IT ops struggles to know how to approach a problem. It has no way to connect the impact of a change to its reason: the change itself.
Operations has looked to APM solutions to monitor these environments. However, while this provides a great deal of information describing the current and historical availability and performance of the systems, the IT team is still at a loss when it comes to drawing useful operational inferences from this data.
Traditional CCM tools also offer no visibility into what actually changed. Change requests remain at a high level, and no one knows what was actually altered as a result of the requests. Without visibility into and management of configuration, the risk introduced by changes grows, and there is no way to track down problems.
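As a rough illustration of connecting an incident to the changes that preceded it, the sketch below filters a log of recorded changes down to those made within a time window before the incident. The `ChangeRecord` type, its fields, and the sample data are hypothetical; the sketch assumes that actual changes are being captured somewhere in the first place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    """One recorded change (hypothetical structure for illustration)."""
    when: datetime
    target: str
    detail: str


def changes_before(incident_time, changes, window):
    """Return changes made within `window` before the incident, newest first."""
    start = incident_time - window
    candidates = [c for c in changes if start <= c.when <= incident_time]
    return sorted(candidates, key=lambda c: c.when, reverse=True)


# Hypothetical incident and change log.
incident = datetime(2024, 1, 10, 14, 0)
log = [
    ChangeRecord(datetime(2024, 1, 10, 13, 30), "web-01", "connection pool size changed"),
    ChangeRecord(datetime(2024, 1, 10, 9, 0), "db-01", "patch applied"),
    ChangeRecord(datetime(2024, 1, 8, 12, 0), "web-02", "hotfix deployed"),
]
suspects = changes_before(incident, log, timedelta(hours=6))
for change in suspects:
    print(change.when, change.target, change.detail)
```

Closeness in time only narrows the list of candidates; it does not prove that a given change caused the incident.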
IT Operations Analytics Needs
Between applications, environments, and underlying infrastructure, mistakes and unauthorized changes can still happen (and do, as we have seen with a number of high-profile organizations), demanding that IT ops spend time managing detailed configuration values. Yet configuration management doesn't have to be the painful experience that we've painted. Based on comprehensive collection of configuration parameters, IT operations analytics can translate abundant detailed configuration data and frequent changes into critical decision-support information. This allows IT managers to see the actual state of the environment and get actionable insights that address practical day-to-day operations questions (for example, when an incident occurs, can you quickly know "what changed"?).
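The "what changed" question can be sketched as a comparison of two configuration snapshots. The example below assumes, purely for illustration, that each snapshot is a flat dictionary of parameter names to values; the parameter names and values are made up.

```python
def diff_config(before, after):
    """Compare two configuration snapshots (flat dicts of parameter -> value)."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of one server, taken before and after an incident.
snapshot_monday = {"max_connections": 200, "tls_version": "1.2", "cache_size_mb": 512}
snapshot_tuesday = {"max_connections": 50, "tls_version": "1.2", "log_level": "debug"}

report = diff_config(snapshot_monday, snapshot_tuesday)
for kind, entries in report.items():
    print(kind, entries)
```

Real environments have nested and far larger configurations, so a production tool would also need collection, normalization, and filtering of irrelevant differences.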
At this point, while IT Operations Analytics is still at a nascent stage, it is a worthwhile area to explore rather than falling back into the old habits of reverse engineering and backward guessing.