IT Survival Tips for Reducing Uncertainty and Preparing for the Worst
When uncertainty enters your infrastructure operations, particularly your application infrastructure, there is real cause for concern. A few years ago, a single application's availability and functionality depended only on the platform and network it ran on.
The last thing you want to do is comb through configuration files and event logs searching for the misconfiguration that slowed performance. Such a problem can escalate into an IT incident that takes down mission-critical applications. Whereas companies once considered even a 24-hour outage a problem but not a disaster, today's 'real-time' availability standards make even one hour of downtime intolerable.
Today, IT initiatives such as agile development, virtualization and cloud technologies, together with ongoing release management, have expanded the possibilities for failure. IT teams are under increasing pressure to deploy applications and software faster and more efficiently while maintaining a more complex mix of systems and environments.
Despite IT organizations' best efforts to guard against failure with numerous safeguards, a major share of today's failures can be attributed to this complexity and to simple human error.
Greater Complexity
When it comes to environment configuration, the devil is indeed in the details.
IT environments are complex. A typical environment includes thousands of configuration parameters (IBM WebSphere Application Server alone has more than 16,000), and misconfiguring or overlooking a single setting can cause an incident with major impact on the IT environment and the business service. Now, with the growth of virtual servers, IT organizations can maintain tens or hundreds of stand-alone servers across multiple environments, usually from a single console, each running its own array of mission-critical applications. This complexity has made the datacenter more productive, but it comes at a cost.
An application must be tested in several environments and then deployed correctly, and today's complexity adds risk to that deployment. For example, configuration settings appropriate only to the test environment can be carried over into production during deployment. Such errors are easily overlooked and may be discovered only once performance begins to degrade.
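One lightweight safeguard against this kind of test-to-production carry-over is an automated comparison of the two configurations before deployment. The Python sketch below assumes each environment's configuration has already been exported as flat key/value pairs. The function `find_leaked_settings`, the parameter names and the split into "environment-specific" and "test-only" keys are hypothetical illustrations, not the behavior or API of WebSphere or any other product.

```python
"""Flag test-environment settings that appear to have leaked into production.

Hypothetical sketch: all parameter names and values are illustrative only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    key: str
    reason: str


def find_leaked_settings(
    test_cfg: dict[str, str],
    prod_cfg: dict[str, str],
    env_specific_keys: set[str],
    test_only_keys: set[str],
) -> list[Finding]:
    """Return production settings that look like test-environment leftovers.

    env_specific_keys: settings expected to differ between test and
        production; an identical value suggests it was copied over.
    test_only_keys: settings that should never appear in production at all.
    """
    findings: list[Finding] = []
    # An environment-specific setting with the same value in both configs
    # was probably carried over from test during deployment.
    for key in sorted(env_specific_keys):
        if key in prod_cfg and prod_cfg[key] == test_cfg.get(key):
            findings.append(Finding(key, "same value as test environment"))
    # Any test-only setting present in production is flagged outright.
    for key in sorted(test_only_keys.intersection(prod_cfg)):
        findings.append(Finding(key, "test-only setting present in production"))
    return findings


if __name__ == "__main__":
    test = {
        "db.url": "jdbc:db2://test-db:50000/APP",
        "log.level": "DEBUG",
        "pool.max_connections": "5",
        "debug.mock_payments": "true",
    }
    prod = {
        "db.url": "jdbc:db2://prod-db:50000/APP",
        "log.level": "DEBUG",           # copied from test by mistake
        "pool.max_connections": "5",    # copied from test by mistake
        "debug.mock_payments": "true",  # should never reach production
    }
    for finding in find_leaked_settings(
        test,
        prod,
        env_specific_keys={"db.url", "log.level", "pool.max_connections"},
        test_only_keys={"debug.mock_payments"},
    ):
        print(f"{finding.key}: {finding.reason}")
```

In practice the lists of environment-specific and test-only keys would come from your own deployment standards; a check like this catches only the leaks it is told to look for, so it complements rather than replaces testing and review.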
For large enterprises, maintaining high system availability means integrating a varied mix of technologies, which increases complexity and, with it, risk.