
Gremlins in IT Operations

We've all had it happen, and it has probably happened to you recently. After a long day in front of a bright computer screen, they pounce: the gremlins of IT operations.

While most authoritative sources define 'gremlins' as imaginary creatures, you can probably attest otherwise. Gremlins are exactly what you'd imagine them to be: troublesome, annoying, and cloaked in that unmistakable shade of green. They tend to creep up on you when you least need them. Even worse, they can lay the sort of traps in your environment that disrupt your IT operations.

They are in the news.

Just a Little Change

Unauthorized changes are uncontrolled business risks. Slight changes may seem fairly innocuous, but consider a change that forces a server to generate dynamic content on a page accessed thousands of times per day: that extra load alone could bring the server to its knees.

Take a faulty configuration change to the routers on a company's DNS network. It can cause requests for the company's Web sites to go unanswered, requiring hours of investigation to pinpoint the issue. The misconfigured files would then need to be replaced to return traffic to and from the affected Web sites to normal.

Complexity

Unknown and even imperceptible changes can seriously harm IT systems and processes. IT operations is responsible for a complex structure of systems, which must all work together to deliver quality information and services. An integrated stack of systems, including applications, databases, middleware, directory services, operating systems, and more, must work in smooth coordination to deliver a set of functions or processes. The unique behavior and state of each system in a stack is affected by many elements, such as file systems and their attributes, configuration settings, users, and permissions.

So what happens when a 'gremlin' sneaks in, changes an IT element, and the change fails catastrophically? Changes like this put most IT operations personnel in a reactive role of continuously repairing systems, often called "IT firefighting." IT ops staff hop from one major system issue, or "fire," to another, putting out technical emergencies one after the other. IT drops planned work to remedy the results of the changes. The resulting service disruption causes an incident that takes hours to repair and involves IT staff from every functional role: application developers, QA staff, database administrators, network and system administrators, and security. The result is lost IT staff productivity.

Thousands of Changes

IT generally processes thousands of changes, migrations, and occasional patches during off hours so that business can be conducted without fear of disruption. Whether the work is done internally or outsourced, the second or third shift, the end of a quarter, or the period after the high sales season can all serve as logical maintenance windows. Whatever the approach, the dynamics and velocity of today's marketplace require IT to be both controlled and responsive.

Yet while you are sleeping, when most changes, migrations, and patches are being processed, a little green gremlin can slip in and introduce a single unauthorized or unintended change. That change can quickly, and frighteningly easily, bring down a company's critical services, such as e-mail, voice mail, the network, the payroll processing system, or even the ability to process orders or ship products.

Just a Splash

After all, in the movie Gremlins, it took just a small splash of water to bring out the gremlins and set the calamities multiplying.

Your Turn
What are you doing about gremlins in your IT operations?

About the Author
Martin Perlin