Getting Human Error out of IT
Recently Jonathan Crane wrote in Wired Magazine's Innovation Insights about 'Eradicate Human Error Without Limiting IT'. He noted that, "According to Symantec's annual State of the Data Center report, businesses averaged 16 data center outages in the past year. Of those, 25 percent were caused by human error, costing companies approximately $1.7 million in just one year."
Crane explains how "IT complexity is now doubling every two years. With multiple environments operating simultaneously, IT engineers must execute frequent updates and new deployments at an alarming pace." He warns that "IT complexity is only going to expand, so expecting engineers to keep up not only with their job requirements, but also with the pace of the industry is futile. Therefore, the solution must come in technology that can make IT smarter, rather than simpler."
Below we explore how human error and unauthorized changes affect performance and have an impact on the business.
IT Ops Today
The pace of change in today's IT world is truly astonishing. Amid cost pressures and mounting complexity, IT is expected to help the business side weave together multiple kinds of technology to solve time-pressured business challenges. This pushes the IT landscape toward ever greater complexity: a wider range of technologies and platforms to support, and accelerated release schedules. For IT Operations, change is a fact of life. Change takes place at every level of the application and infrastructure stack and impacts nearly every part of the business. When virtual environments coexist with physical and cloud entities, it is much harder to troubleshoot performance issues, and IT admins may spend hours trying to isolate a problem. Dynamic workload management mechanisms alone make this effort considerably harder. Even with the management capabilities that virtualization and cloud vendors offer, monitoring and analytics remain weak spots.
Downtime Impact on Reputation and Loyalty
Operations teams face the added challenge of ensuring accurate, error-free application releases with appropriate configurations during promotion and deployment, taking into account configurations that inherently differ between pre-production and production. Even where automated deployment solutions are available, they still don't ensure that environments are properly configured.
Detecting unauthorized changes
Without clear, up-to-date configuration information, planned changes can fail. Unauthorized changes are eventually detected, but too late; problem resolution takes too long to complete, and there is little visibility into whether or not compliance requirements are being met.
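The core of unauthorized-change detection is comparing the live state of an environment against an approved baseline. A minimal sketch of that comparison, assuming configurations can be represented as key-value dictionaries (the parameter names below are illustrative, not from any specific tool):

```python
# Compare a live configuration against an approved baseline snapshot
# and group the drift into added, removed, and modified keys.
# All keys and values here are hypothetical examples.

def diff_config(baseline: dict, live: dict) -> dict:
    """Return configuration drift grouped as added, removed, and modified."""
    added = {k: live[k] for k in live.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - live.keys()}
    modified = {
        k: (baseline[k], live[k])
        for k in baseline.keys() & live.keys()
        if baseline[k] != live[k]
    }
    return {"added": added, "removed": removed, "modified": modified}

# Example: the live system drifted from the approved baseline.
baseline = {"max_connections": 100, "ssl": "on", "log_level": "warn"}
live = {"max_connections": 500, "ssl": "on", "debug_mode": "true"}

changes = diff_config(baseline, live)
```

Run on a schedule (or triggered by change events), this kind of diff surfaces unauthorized changes shortly after they happen rather than during a post-mortem.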
The possibility of failure has expanded
Today, with the introduction of IT initiatives like agile development, virtualization, and cloud technologies, as well as ongoing release management, the possibility of failure has expanded. IT teams are under increased pressure to make application and software deployment faster and more efficient while maintaining a more complex mix of systems and environments. Despite the best efforts of IT organizations to guard against failure with numerous safeguards, a major source of today's failures can be attributed to both this complexity and simple human error.
Hard to identify subtle changes
Security issues can manifest themselves as changes to an environment's bill of materials or configuration. These could be innocent updates of registry keys or major changes to security parameters, whether programmed or the result of human error. By detecting in real time the changes that can impact environment safety, IT is better prepared to provide the necessary protection and a more immediate response to potential threats.
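Since not every change matters equally, a detection pipeline typically tags each change by severity so security-relevant drift is surfaced first. A hypothetical sketch, where the set of security-sensitive parameter names is an assumption for illustration:

```python
# Tag detected configuration changes by severity. The parameter names
# in SECURITY_PARAMS are assumed examples, not from any real product.

SECURITY_PARAMS = {"ssl", "firewall_enabled", "admin_password_hash"}

def classify_change(key: str) -> str:
    """Flag changes to security parameters as critical; the rest as informational."""
    return "critical" if key in SECURITY_PARAMS else "informational"

# Example: three changes were detected; only two are security-relevant.
detected = ["ssl", "log_level", "firewall_enabled"]
severities = {k: classify_change(k) for k in detected}
```

Even a simple classification like this lets an alerting rule page someone for a disabled firewall while merely logging an innocent log-level tweak.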
Automation complements the human element
Automating the discovery of critical changes benefits operations by quickly providing actionable information on which to base key decisions. Automation steps in where the human element cannot keep up, eliminating user oversights. A manual process would limit you to hunting down release problems after the release into production, creating slowdowns and affecting users. Automated change management allows for early detection, so IT teams can take a proactive approach to minimizing and ultimately eliminating problems.
Limited visibility adds to risk from human error
Limited visibility into the underlying configuration makes it difficult to identify the root cause of faults, leading to slower application deployment times, impaired service levels, and more risk from human error.
Changes happening throughout the day add to risk
A survey from the recent VMworld event found that a majority of administrators make changes to their VMs several times a day. This points to significant risk from that ever-present element: human error. These risks can't be addressed efficiently by adding more personnel, since staffing budgets won't grow in sync with the number of VMs provisioned. Rather, they require change and configuration management that is specialized for virtualization and automatically evaluates the virtual environment, especially for the small, subtle changes introduced by common user-error misconfigurations.
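Automatic evaluation of a virtual environment often amounts to checking each VM's configuration against a set of policy rules. A minimal sketch, where the policy rules and VM attribute names are illustrative assumptions rather than any vendor's actual schema:

```python
# Check VM configurations against simple policy rules to catch common
# user-error misconfigurations. Rules and attribute names are assumed
# examples for illustration.

POLICY = [
    ("memory_mb", lambda v: v is not None and v >= 1024,
     "VM has less than 1 GB of memory"),
    ("snapshot_count", lambda v: v is not None and v <= 3,
     "too many stale snapshots"),
    ("tools_installed", lambda v: v is True,
     "guest tools missing"),
]

def check_vm(vm: dict) -> list:
    """Return the policy violations found in one VM's configuration."""
    return [msg for attr, ok, msg in POLICY if not ok(vm.get(attr))]

# Example: a VM with too little memory and too many snapshots.
vm = {"memory_mb": 512, "snapshot_count": 5, "tools_installed": True}
violations = check_vm(vm)
```

Run across every VM after each provisioning or change event, a check like this catches the "simple, minute" misconfigurations the survey highlights before they surface as outages.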