Unauthorized Change: The Devil You Don’t Know
Why do IT governance and configuration management processes fail to catch these unauthorized changes?
Everything’s locked down to get ready to stream the major media charity event. Environments are monitored. Releases, updates and patches are well-governed with an approval chain. Development and IT Ops teams are fully trained on configuration management, change control, and infrastructure release processes.
A network administrator makes a small change to a setting on one load balancer, without requesting authorization, confident that his change has nothing to do with the live stream.
The web broadcast starts, people and donations start streaming in, and … it crashes.
It takes half an hour to fail over to a backup channel, and the show is down the whole time, so most viewers leave. It takes another day to figure out what actually happened.
The best-laid plans of our configuration management processes still go awry far too often. If we can tightly define every element of an IT environment, from software to system-level configuration, and specify which components can be changed, when, and by whom, why can't we stop these unauthorized changes from happening?
To keep up with compliance requirements, enterprises invested heavily in IT governance processes and tools, from ITIL practices to today’s highly automated approaches to configuration management with more rigorous environment reproduction and testing.
Just like any fabled deal with the devil, the devil is in the details.
How could unauthorized changes still happen in today’s compliance-driven IT environment?
There is a Faustian bargain at work here. IT leaders may believe their spending and effort have bought them complete knowledge, visibility and process control over their world, but once the terms are fulfilled, they eventually realize they missed an important detail in the contract.
They are left high and dry.
I’m not saying there’s some horned Mephistopheles involved in this plot. Aside from malware authors and malicious hackers who actually enjoy making unwelcome changes, most IT leaders and professionals are good actors who, in general, try to follow change protocols.
People still make unauthorized changes because they think “oh, this little change won’t affect anything” or “I’m in a hurry and willing to make an exception to policy just this once.” And administrators will still inevitably fail to notice many of these unauthorized changes.
Governments and industry groups have stepped in with a host of regulatory standards that mandate safer data handling practices, which naturally require better change control and configuration management tools and practices throughout enterprise IT.
From GDPR for protecting user privacy in Europe, to banking-specific integration standards like PSD2, to healthcare regulations such as HIPAA in the US, compliance with standards is driving much of the IT investment and board-level attention at leading global firms.
The Office of the Comptroller of the Currency (OCC), a bureau of the US Treasury, imposes requirements for change management, log reviews and risk auditing procedures, comprehensive system monitoring, and documentation and auditing of change control processes and authorizations, just for starters.
If you think that sounds like a lot of rote review work, requiring IT teams to dig through logs, with little business value to show for it, you’d be right.
Why can’t monitoring or configuration tools stop these kinds of changes?
In hellishly complex enterprise environments with millions of moving parts, no amount of manual checking can satisfy the required controls and standards with any certainty, so we lean on automated tools to do this work for us.
There are very impressive tools on the market for managing user access and authorization, orchestrating deployments and updates, instrumenting code and releases, and monitoring system-wide performance, tied to team workflow and alerting processes.
These tools are all incredibly useful parts of a modern ITSM platform, but they often fail to catch unauthorized changes. Because such changes were never planned in the first place, we may keep using our tools to look for what went wrong in the wrong places and at the wrong time.
In the real-world nightmare of a very large bank, a developer investigating a performance issue turned on detailed logging for a few application servers in production and forgot to turn it off. Since the change was part of an emergency investigation, it was never reported or authorized.
Soon after, customers started complaining that they couldn't perform actions in their banking apps and were getting timeout messages. Support spent hours combing through code changes to figure out what had happened. The testing team tried to reproduce the slowdown in a new performance environment. Finally, the Ops team simply added more app servers to restore performance. The failure even made the news.
Perhaps the whole idea of hunting for ‘root cause’ is the root of the problem, because it focuses on what happened, rather than what changed.
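A minimal sketch of the "what changed" question, in Python: compare a known-good configuration snapshot against the current one and list every difference. The snapshot format (a mapping from host and setting name to value), the host names, and the `log_level` setting are hypothetical, chosen to echo the bank story above; this illustrates the idea only and is not any product's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: (host, setting) -> value.
# Real tools would collect config files, registry keys, packages, etc.
Snapshot = Dict[Tuple[str, str], str]


@dataclass(frozen=True)
class Change:
    host: str
    setting: str
    kind: str            # "added", "removed", or "modified"
    old: Optional[str]   # None when the setting was added
    new: Optional[str]   # None when the setting was removed


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[Change]:
    """Report every setting that differs between two snapshots."""
    changes: List[Change] = []
    # Union of keys from both snapshots, sorted for stable output.
    for key in sorted(before.keys() | after.keys()):
        host, setting = key
        if key not in before:
            changes.append(Change(host, setting, "added", None, after[key]))
        elif key not in after:
            changes.append(Change(host, setting, "removed", before[key], None))
        elif before[key] != after[key]:
            changes.append(Change(host, setting, "modified", before[key], after[key]))
    return changes


if __name__ == "__main__":
    # Illustrative, made-up data: verbose logging switched on for two
    # app servers and never switched back off.
    baseline = {
        ("app-01", "log_level"): "INFO",
        ("app-02", "log_level"): "INFO",
        ("app-03", "log_level"): "INFO",
    }
    current = {
        ("app-01", "log_level"): "DEBUG",
        ("app-02", "log_level"): "DEBUG",
        ("app-03", "log_level"): "INFO",
    }
    for c in diff_snapshots(baseline, current):
        print(f"{c.host}: {c.setting} {c.kind} {c.old!r} -> {c.new!r}")
```

Run against the sample data, the sketch reports the two app servers whose logging level moved from INFO to DEBUG, a difference that a hunt through code changes might never surface.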
Fighting our way out of unauthorized change hell
Unauthorized changes can come at us from any direction: from inside our own teams, from service partners doing their own jobs, and directly from the software vendors and open-source components running in our environments.
They can also come from malicious hackers and the malware they produce, because a hack is, in essence, an unauthorized change.
One large manufacturing firm was using a Change Analytics system from Evolven to monitor its systems for version drift and unauthorized changes. The system detected that a registry key had been added to a few Windows servers without any corresponding change record. A sysadmin convinced the change manager that the activity was simply part of a pre-approved patch.
Little did the manager know that this unauthorized change was a sign that the Stuxnet worm had started to infect the manufacturer's network. Because the manager failed to act on the alert, the worm spread across the entire network, and it took the company several weeks to get rid of it.
If we’re going to survive unauthorized changes, we’ll have to do better than just receiving insights about them. We will need to take action on those insights!
The Intellyx Take
Unauthorized change seems to be an inevitable reality for most organizations, with so many employees, vendors and automated processes touching an extended IT and cloud estate.
Fighting these pernicious problems requires both discipline and technology. Evolven provides a way for teams to monitor exactly what changes are carried out across the IT environment, then reconcile the actual changes against what was planned and approved in the enterprise’s CM or ITSM systems of record.
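As a rough illustration of that reconciliation step, here is a minimal Python sketch that checks each detected change against a list of approved change records and flags any change that no record covers. The data model (hosts, changed items, and an approved time window per record), the change ID, and the registry path are hypothetical assumptions made for this example; it does not reflect Evolven's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List


@dataclass(frozen=True)
class DetectedChange:
    host: str
    item: str              # hypothetical: a registry key, config file, or setting
    observed_at: datetime


@dataclass(frozen=True)
class ApprovedChange:
    change_id: str         # hypothetical change-ticket identifier
    hosts: FrozenSet[str]
    items: FrozenSet[str]
    window_start: datetime
    window_end: datetime

    def covers(self, change: DetectedChange) -> bool:
        """True if this approved record authorizes the detected change."""
        return (
            change.host in self.hosts
            and change.item in self.items
            and self.window_start <= change.observed_at <= self.window_end
        )


def find_unauthorized(detected: List[DetectedChange],
                      approved: List[ApprovedChange]) -> List[DetectedChange]:
    """Return detected changes that no approved change record covers."""
    return [c for c in detected if not any(a.covers(c) for a in approved)]


if __name__ == "__main__":
    # Made-up records for illustration only.
    approved = [
        ApprovedChange(
            change_id="CHG-0001",
            hosts=frozenset({"win-01", "win-02"}),
            items=frozenset({r"HKLM\SOFTWARE\ExampleVendor\PatchLevel"}),
            window_start=datetime(2019, 5, 1, 22, 0),
            window_end=datetime(2019, 5, 2, 2, 0),
        )
    ]
    detected = [
        # Matches CHG-0001: right host, right item, inside the window.
        DetectedChange("win-01", r"HKLM\SOFTWARE\ExampleVendor\PatchLevel",
                       datetime(2019, 5, 1, 23, 15)),
        # No record covers this host or key, so it is flagged.
        DetectedChange("win-03", r"HKLM\SOFTWARE\ExampleUnknownKey",
                       datetime(2019, 5, 3, 9, 40)),
    ]
    for c in find_unauthorized(detected, approved):
        print(f"UNAUTHORIZED: {c.host} {c.item} at {c.observed_at:%Y-%m-%d %H:%M}")
```

The design choice worth noting is that a record must match on host, item, and time window together. In a scheme like this, a registry change on a server the record does not list, or one made outside the approved window, is still flagged even if someone explains it away as part of a pre-approved patch.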
If we can get out of the habit of looking backward for root causes, and start paying closer attention to what’s changing, perhaps we can put the devil of unauthorized change behind us for good.
©2019, Intellyx, LLC. Intellyx retains full editorial control over this content. At the time of writing, Evolven is an Intellyx customer. Microsoft is a former Intellyx customer. None of the other companies or persons mentioned are Intellyx customers. Image credit: freeimages.com.