Anticipate the Impact of Risky Changes and Prevent Harm
“Principles of Risk Management:
1) Don’t do anything wrong today
2) Don’t do anything wrong tomorrow”
“Anticipation, anticipation is makin’ me late, is keepin’ me waitin’” - Carly Simon
Never doing anything wrong is great advice, although decidedly impractical and most likely impossible for us mere mortals. As businesses move faster in their efforts to innovate, driving IT to increase the cadence of new releases, something will invariably go wrong. Something will change, and that change could turn out to be the root cause of a service disruption. But why wait until there is an impact? It is far better to anticipate the impact and act to prevent it. You can mitigate the impact of these changes by embracing and extending the notion of observability to include the dimension of change and configuration awareness.
Infrastructure and Operations (I&O) leaders can reconcile the speed at which the business must run with the reliability and control IT needs to keep production running smoothly, even when things go wrong. They can do this by anticipating change risks. No (sorry, Carly), anticipation will not make you late. It will make you early! Anticipating the impact of risky changes enables I&O to handle unexpected, and sometimes unauthorized, events and still move as fast as the business demands. It also lets them detect changes that, although expected, are risky because their creators did not foresee their impact.
AI for Change Awareness
To move fast and anticipate the impact of changes, we need to go beyond labor-intensive, reactive processes and instead leverage the power of artificial intelligence (AI). AI-driven automation and analysis can detect, assess, validate, and reconcile actual and planned changes in both manual and automated deployments before they cause a problem.
Evolven’s patented Change Reconciliation technology addresses this. It detects the configuration items (CIs) in an IT environment, and their configurations, that have “actually” changed, and uses sophisticated correlation to compare them with planned change requests and deployment events. Evolven's concept of risk and impact differs from the traditional change risk/impact concept introduced by ITIL: its risk/impact analysis focuses on individual, granular changes, looking for signs that they are "out-of-norm" and may therefore reduce stability.
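To make the reconciliation idea concrete, here is a minimal Python sketch; it is not Evolven's implementation. It assumes a hypothetical data model in which each detected change records the CI, the changed parameter, and a timestamp, and each planned change request lists the CIs in its scope and an approved implementation window. A detected change is attributed to a request only when both the CI and the time match; anything left unmatched is a candidate unauthorized change. The correlation described above is far more sophisticated than this exact-match rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DetectedChange:
    """A change actually observed on a CI (hypothetical model)."""
    ci: str            # configuration item, e.g. a host or deployment name
    parameter: str     # which setting changed
    timestamp: datetime


@dataclass(frozen=True)
class PlannedChange:
    """A change request: the CIs it covers and its approved window (hypothetical model)."""
    request_id: str
    cis: frozenset[str]
    start: datetime
    end: datetime


def reconcile(
    detected: list[DetectedChange], planned: list[PlannedChange]
) -> dict[DetectedChange, str | None]:
    """Map each detected change to the first matching request ID, or None.

    A match requires the CI to be in the request's scope and the change to
    fall inside the request's implementation window (inclusive).
    """
    result: dict[DetectedChange, str | None] = {}
    for change in detected:
        match = None
        for req in planned:
            if change.ci in req.cis and req.start <= change.timestamp <= req.end:
                match = req.request_id
                break
        result[change] = match  # None marks a potentially unauthorized change
    return result
```

In practice the `None` entries are the interesting output: they are the detected changes that no planned or pre-authorized activity explains.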
Evolven uses AI/ML to identify the scope of a planned change, its implementation timeframe, and the most likely change implementers; for example, it uses natural language processing to understand the scope of changes from a change request. The detected changes are then compared with planned changes, producing an authorization score that indicates whether a particular detected change can be attributed to planned or pre-authorized change activity. The authorization score draws on game theory, calculating a tradeoff between the cost of erroneously marking an unauthorized change as authorized and the cost of the reverse error.
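One standard way to express such a cost tradeoff is the expected-cost (Bayes) decision rule from decision theory. It is shown here as a stand-in for illustration, not as Evolven's game-theoretic scoring. The sketch assumes a hypothetical probability `p_authorized` that a detected change belongs to planned activity, plus two hypothetical costs: the cost of marking an unauthorized change as authorized and the cost of flagging an authorized change as unauthorized. Labelling the change authorized is the cheaper choice whenever (1 - p) times the first cost is smaller than p times the second, which puts the decision threshold at cost_false_authorized / (cost_false_authorized + cost_false_unauthorized).

```python
def authorization_threshold(cost_false_authorized: float, cost_false_unauthorized: float) -> float:
    """Probability above which labelling a change 'authorized' has the lower expected cost."""
    return cost_false_authorized / (cost_false_authorized + cost_false_unauthorized)


def classify(p_authorized: float, cost_false_authorized: float, cost_false_unauthorized: float) -> str:
    """Pick the label with the lower expected cost (hypothetical costs and probability)."""
    # If we say "authorized", we are wrong with probability (1 - p).
    cost_if_labelled_authorized = (1 - p_authorized) * cost_false_authorized
    # If we say "unauthorized", we are wrong with probability p.
    cost_if_labelled_unauthorized = p_authorized * cost_false_unauthorized
    if cost_if_labelled_authorized < cost_if_labelled_unauthorized:
        return "authorized"
    return "unauthorized"
```

For example, if letting an unauthorized change through is judged nine times as costly as a false alarm, the threshold is 0.9, so a change that matches planned activity with probability 0.85 is still flagged as unauthorized.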
Evolven’s Analytics Engine then provides proactive risk analysis, evaluating all changes and differences to estimate how likely they are to cause issues in the future. The resulting probability is shown to the user as a color-coded risk level.
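Turning an estimated probability into a color-coded risk level can be as simple as bucketing it. The cut-offs, colors, and labels below are hypothetical placeholders chosen for illustration, not the values Evolven uses.

```python
# Hypothetical bands, checked from highest cut-off to lowest.
RISK_BANDS = [
    (0.7, "red", "high"),
    (0.4, "orange", "medium"),
    (0.0, "green", "low"),
]


def risk_level(probability: float) -> tuple[str, str]:
    """Return (color, label) for a probability that a change will cause an issue."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    for cutoff, color, label in RISK_BANDS:
        if probability >= cutoff:
            return color, label
    raise AssertionError("unreachable: the 0.0 band catches every valid probability")
```

Keeping the bands in one table makes it easy to tune the cut-offs without touching the lookup logic.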
This approach to analysis moves you and your teams out of the endless build-break-fix-repair loop (rinse and repeat) and toward proactive management, with prescriptive steps to prevent problems and avoid business impact. Using AI/ML, Evolven’s “Observability Platform for Change” solution can anticipate that a specific change will cause a customer-impacting problem, giving IT the time and opportunity to act to prevent the impact instead of reacting to minimize the damage.
In a Kubernetes (K8s) deployment, a major incident was opened for a service crash of the GetExistingLoans application. (See Figure 1.)
Figure 1: GetExistingLoans Topology Utilizing Kubernetes
When you drill into the details of the high-risk change, Evolven shows that the memory parameter in “endpoints-get-existing-loans.json:limits” was decreased to 128. This is a non-compliant, policy-violating change, because policy states that memory must be larger than 1024MB. (See Figure 2.)
Figure 2: High-Risk Change Detail
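The policy check in this example can be sketched in a few lines of Python. The file layout is an assumption: the sketch treats the configuration as JSON with a top-level "limits" object whose "memory" value is a plain number in MB, whereas real Kubernetes manifests usually express memory as a quantity string such as "128Mi". The previous value of 2048 mentioned below is also hypothetical; the example does not say what the memory limit was before the change.

```python
import json

# Policy from the example above: memory must be larger than 1024MB.
MIN_MEMORY_MB = 1024


def memory_limit(doc_text: str) -> int:
    """Read limits.memory from the config text (assumed layout, value in MB)."""
    return json.loads(doc_text)["limits"]["memory"]


def check_change(before_text: str, after_text: str) -> list[str]:
    """Report a memory-limit change and any violation of the minimum-memory policy."""
    before, after = memory_limit(before_text), memory_limit(after_text)
    findings = []
    if after != before:
        findings.append(f"limits.memory changed from {before} to {after}")
    if not after > MIN_MEMORY_MB:
        findings.append(
            f"policy violation: limits.memory={after} must be larger than {MIN_MEMORY_MB}MB"
        )
    return findings
```

Called with a hypothetical previous value of 2048 and the new value of 128, `check_change` returns two findings: the change itself and the policy violation.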
Evolven’s AI/ML-based risk analysis is transparent: it shows how this performance-impacting change to the “memory” parameter was determined to be high risk because, besides being non-compliant, it also had a severe impact on workload behavior. (See Figure 3.)
Figure 3: Risk Analysis
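The reasoning in the risk analysis above, that the change was high risk because it was both non-compliant and severely affected workload behavior, can be approximated with a simple rule. This is an illustrative sketch built on assumptions, not Evolven's model: it measures behavioral impact as a z-score of a post-change workload metric (for example, a hypothetical container-restarts-per-hour series) against a pre-change baseline, and uses a hypothetical cut-off of three standard deviations for "severe".

```python
from statistics import mean, stdev


def deviation_score(baseline: list[float], observed: float) -> float:
    """Number of sample standard deviations the post-change value sits from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)  # stdev needs at least two baseline points
    if sigma == 0:
        # A perfectly flat baseline: any departure at all counts as unbounded deviation.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def assess(non_compliant: bool, baseline: list[float], observed: float, severe_z: float = 3.0) -> str:
    """Combine a compliance flag with workload deviation into a coarse risk rating."""
    severe = deviation_score(baseline, observed) >= severe_z
    if non_compliant and severe:
        return "high"
    if non_compliant or severe:
        return "medium"
    return "low"
```

Requiring both signals for a "high" rating keeps a harmless policy deviation, or a noisy metric on its own, from being escalated to the top risk level.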
Evolven’s solution extends standard observability beyond the telemetry contained in logs, metrics, and traces. Adding change and configuration awareness gives I&O leaders the tools to anticipate which changes risk leading to problems, and the prescriptive advice to prevent harm from them. It also gives businesses the insight to determine the root cause of problems that result from risky changes. In the use case above, the change to the memory parameter in Kubernetes was the root cause of the service crash. Evolven’s technology helps your business move from being reactive to actively making risk-informed decisions, preventing changes from impacting the customer experience. Use Evolven to anticipate problems; manage stability, compliance, and security risks; and deliver business value to your customers.