Oops I did IT again: 7 APM mistakes by top IT Ops pros that hurt the bottom line
APM tools are here to stay
Oh, APM tools. You can’t easily live with them (especially without enough expertise), and you can’t run a large enterprise (or any enterprise in today’s world) without them.
Application Performance Management tools are a must-have for monitoring and supporting the many critical applications of today’s enterprises. So essential are they, in fact, that the global APM-solution market is expected to reach $8.7 billion by 2023.
And if that’s not enough, explore this infographic:
Over the past few years, APM tools have advanced greatly, but so have the IT environments surrounding them: complicated new procedures and challenges keep piling onto the already nerve-wracking routines of IT professionals. Every day, more apps are added, the amount of data requiring management grows, and IT Ops teams struggle with more challenges than ever.
Still, while some IT operations teams have embraced APM tools to such an extent that the tools have become an industry hit, most organizations continuously fail to extract their full potential, and they do so by repeating a few common mistakes.
APM tools are the primary building blocks of a new standard for IT operations, but simply having them isn’t enough.
Based on advice provided by our experts and follow-up interviews with many of our clients (some of which are huge global enterprises), we’ve compiled the following list, which we hope will help IT Ops teams avoid the mistakes described below.
Mistake #1: What’s the plan?
A lack of strategy is problematic in any field, and it’s a common phenomenon in the use of APM tools. When implementing tools whose main purposes are monitoring and optimizing performance, it’s important to first establish what should be monitored and acted upon, and how.
Simply reacting to problems and alerts fails to take advantage of the huge potential that your chosen APM has to aid your organization in becoming what it could be.
Enterprises that set clear goals, focal points, and architectures to serve as the basis upon which to implement their chosen APM tools can clearly see the difference these make in their tools’ ROI.
Mistake #2: The IT band-aid.
APM tools are great but can only do so much. In order to truly solve a large organization’s IT performance problems, its IT Ops teams must focus on the roots of these problems instead of just their symptoms.
APM tools alert teams when symptoms appear, just like a fever alerts you that you’re sick. But what’s the root cause of these problems? You can’t use APM alone for that. Well, not efficiently, at least.
When a serious problem such as a major outage occurs, the typical reaction resembles that of a team with no strategy in place: firefighting that merely reacts to the problem rather than discovering its root cause and addressing it permanently.
Image: APM indications on performance parameters (i.e., the symptoms, not the root cause)
Instead of settling for short-term troubleshooting, IT operations teams must add dedicated tools to the blend: ones designed to identify the root causes of what they experience, trace the issues from which the symptoms stem, and show how to improve performance over the long run. After all, the risks and damage each IT crisis poses are too high to ignore.
The most advanced enterprises are aware of additional tools that can address the root causes of problems, and they implement them alongside their APM tools. By doing so, they achieve a better standard of performance than other enterprises’ IT Ops teams do. However, too many companies fail to understand this point: APM tools need the support of additional tools in order to fulfill their promise, help their teams quickly identify the root causes of issues, and shorten the time required for troubleshooting.
Change analytics tools, for example, track each and every granular change that has occurred in your IT environment, identify the ones that appear ‘risky’, assign each a risk score, and decide whether an alert is needed. They point out the changes that most likely caused incidents and use AI to prevent “bad” changes from happening in the first place.
The combination of APM and such a tool can completely change the picture for IT Ops, but not many know and understand the impact this tool can make, and so they keep making the same mistakes.
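To make the change analytics workflow described above more concrete, here is a minimal Python sketch of the "score each change, then decide whether to alert" step. It is an illustration under stated assumptions, not how any particular product works: the `Change` fields, the weights in `RISK_WEIGHTS`, the penalty for unapproved changes, and `ALERT_THRESHOLD` are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical weights: none of these values come from the article or a real product.
RISK_WEIGHTS = {
    "environment": {"production": 40, "staging": 15, "development": 5},
    "kind": {"config": 25, "code": 20, "schema": 35},
}
UNAPPROVED_PENALTY = 20  # assumed extra risk for changes made without approval
ALERT_THRESHOLD = 60     # assumed cut-off at or above which an alert is raised


@dataclass(frozen=True)
class Change:
    """One granular change recorded in the IT environment."""
    target: str
    environment: str
    kind: str
    approved: bool


def risk_score(change: Change) -> int:
    """Sum the weights that apply to a change; unknown values contribute 0."""
    score = RISK_WEIGHTS["environment"].get(change.environment, 0)
    score += RISK_WEIGHTS["kind"].get(change.kind, 0)
    if not change.approved:
        score += UNAPPROVED_PENALTY
    return score


def needs_alert(change: Change) -> bool:
    """Alert only when the score reaches the threshold."""
    return risk_score(change) >= ALERT_THRESHOLD


changes = [
    Change("payments-db", "production", "schema", approved=False),
    Change("web-frontend", "development", "code", approved=True),
]
risky = [c.target for c in changes if needs_alert(c)]
print(risky)
```

A real tool would presumably derive its scores from historical incident data rather than a fixed table; the fixed weights here only show where such a score would plug into the alerting decision.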
Mistake #3: The terrible two (or many)
Yes, IT Ops teams should add tools to aid them, but they need to be able to coordinate the operations of these tools with one another right from the planning stage (strategy, remember?).
For instance, using two or more different APM tools (one for development and another for production) might mean that the team can’t really know what to expect when moving from the testing phase to the live product. The metrics and calculations in each environment are based on entirely different systems.
In most cases, serious issues arise that were not taken into consideration when testing the product, and the team is forced to deal with the resulting instability while operating a separate, often relatively new tool.
The same kind of situation can occur with other, non-APM tools. A performance comparison between environments isn’t valid unless it relies on the same tool or unless the tools have been synchronized in some way.
Mistake #4: The self-absorbed IT.
When building and maintaining an application, sometimes it’s easy to forget that it is not meant for our own personal entertainment and that actual end-users will have to deal with the application’s overall user experience. Top-notch monitoring solutions must not be allowed to complicate or harm the application’s performance for these users.
Therefore, members of the IT Ops team must take into consideration the usage environment of the application’s average user (especially when they are working on a B2C product).
Unfortunately, I come across teams that refuse to own and manage an application’s entire delivery chain. The result is a product that works perfectly for those who build and maintain it, but that its end-users (internal or external) don’t like using.
In order to avoid this mistake, IT Ops teams must make sure that they consider an application’s users, and not overlook performance issues that are not “red-light alerts” but that would nonetheless affect its end-users’ experience.
Mistake #5: Staying one step behind.
Acting may be reacting, but an IT team can’t act by simply reacting. A team that is constantly responding to problems, trying to solve them while doing damage control, is getting something very basic and important wrong.
A proactive approach (yes, I’m talking about prevention rather than reaction) is a must in today’s IT world and, as I’ve mentioned previously, you simply cannot afford to continue suffering the damage a reactive approach causes.
Easier said than done? Right. But don’t managers and executives exist to generate progress? Troubleshooting faster matters, of course, but identifying issues before they occur and preventing them from happening is no less important. I’m not referring to the symptoms, but to the real root causes of problems, which come down to changes that were made in the IT environment (possibly in the applications themselves or in their surroundings). Not tracking actual changes is a major mistake that keeps IT Ops teams reactive and behind the curve. Automatically analyzing all granular changes (yes, even the tiniest ones) can complement APM and add a proactive layer to the mix. APM tools can greatly improve your response, but they can’t get you out of a reactive loop and lead you to a proactive strategy all on their own.
Mistake #6: Sample size.
Incomplete data mean incomplete monitoring, and though sampling data has quite a few advantages, an enterprise that only samples data instead of assembling a big-data foundation is bound to discover unforeseen problems at exactly the wrong moment.
It’s extremely important to be able to see the full picture on each transaction.
Some APM solutions sample data and promise increased scalability in return, but, with the APM options available in today’s market, I’m not sure there’s really a reason to make this tradeoff.
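The risk of sampling can be shown with a small Python sketch. It uses synthetic latencies and a deliberately simple "keep every tenth transaction" rule; the numbers, the position of the slow transaction, and `SAMPLE_EVERY` are assumptions made up for the illustration, and real APM samplers are usually more sophisticated than this.

```python
# Synthetic latencies in milliseconds: 1,000 transactions, one of which is a slow outlier.
latencies_ms = [20] * 1000
latencies_ms[437] = 9000  # hypothetical slow transaction

SAMPLE_EVERY = 10  # assumed sampling rate: keep one transaction in ten

# Keep transactions 0, 10, 20, ... and discard the rest.
sampled = latencies_ms[::SAMPLE_EVERY]

full_max = max(latencies_ms)
sampled_max = max(sampled)

print(f"transactions kept: {len(sampled)} of {len(latencies_ms)}")
print(f"slowest transaction, full data: {full_max} ms")
print(f"slowest transaction, sampled data: {sampled_max} ms")
```

In this contrived case the one transaction that matters falls between sample points, so the sampled view reports nothing unusual while the full data set shows a nine-second request.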
Mistake #7: Neglecting intelligence
The rise of AI brings new and exciting possibilities to performance management, so why not take advantage of them? AIOps is revolutionizing our field in the best of ways, and advanced teams have been adding these capabilities to those of their chosen APM tools (not to mention that some current APM tools can add these capabilities themselves). This is part of the proactive approach teams should embrace, and AI-based solutions can enable them to optimize performance with increased efficiency and accuracy.
For instance, our AI system helps us identify changes that are ‘bad’ within the context of a specific environment, based on past data and common patterns. Moreover, it will ‘learn’ more as time passes, becoming more or less strict about issuing alerts regarding future changes, based on the insights it has gained.
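The idea of alerting that becomes more or less strict over time can be sketched in a few lines of Python. This is not the system described above, whose internals the article does not specify; it is a plain feedback rule rather than machine learning, and the class name `AdaptiveAlerter`, the starting threshold, the step size, and the bounds are all hypothetical.

```python
class AdaptiveAlerter:
    """Alerts when a change's risk score reaches a threshold that shifts with feedback."""

    def __init__(self, threshold: float = 50.0, step: float = 5.0,
                 low: float = 10.0, high: float = 90.0) -> None:
        self.threshold = threshold  # current cut-off for raising an alert
        self.step = step            # how far one piece of feedback moves the cut-off
        self.low = low              # the threshold never drops below this
        self.high = high            # the threshold never rises above this

    def should_alert(self, risk_score: float) -> bool:
        return risk_score >= self.threshold

    def record_outcome(self, risk_score: float, caused_incident: bool) -> None:
        """Adjust strictness after learning whether a change caused an incident."""
        alerted = self.should_alert(risk_score)
        if caused_incident and not alerted:
            # Missed a bad change: become stricter (alert at lower scores).
            self.threshold = max(self.low, self.threshold - self.step)
        elif alerted and not caused_incident:
            # False alarm: become more lenient (alert only at higher scores).
            self.threshold = min(self.high, self.threshold + self.step)


alerter = AdaptiveAlerter()
before = alerter.should_alert(45)                  # below the initial threshold
alerter.record_outcome(45, caused_incident=True)   # that change turned out to be bad
after = alerter.should_alert(45)                   # the same score now triggers an alert
print(before, after, alerter.threshold)
```

The sketch shows only the direction of the adjustment: missed incidents lower the bar for alerting, and false alarms raise it.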
In this article, we’ve chosen to single out a set of recurring problems and identify the root cause of each in order to offer real solutions. An in-depth, investigative approach is what we strive for, and that’s what’s needed in today’s market. As our work becomes even more complex and challenging over time, more IT teams will adopt this state of mind and the advanced technology solutions that come with it.