Forget Tool Improvements! Today’s Systems Management Needs a New Generation of Tools
Through the years, data centers, along with their designs and requirements, have changed. But one thing has stayed the same: they are the focal point of IT operations. One of the most pervasive shifts in IT has been the move from manual to automated and from reactive to proactive management. The emphasis once placed on deploying and setting up servers has given way to the need to understand and manage servers that are, in many cases, provisioned automatically, along with the applications running on them.
Challenge of Working in the Cloud
While the cloud eliminated numerous challenges in environment management, particularly in the area of deployment, it also created new ones, for example in change monitoring and configuration management. Many assume that monitoring applications in a cloud is only slightly different from monitoring traditional internal enterprise applications. That assumption is far from the truth.
New Conditions in the Cloud
With many more events taking place dynamically, running systems in the cloud creates new conditions that monitoring and automation tools should address:
Server virtualization and cloud computing have played a huge role in the recent evolution of the data center. Administrators can no longer see their computing resources in the traditional sense, because virtualization obscures the relationship between hardware and workloads. Instead of servers, software, applications and storage being dedicated to certain tasks, all of that is abstracted away from users, and even from the IT manager. Tools monitoring such environments need to be able to connect the virtualized perspective with the actual configuration of all the layers involved.
- Cloud Elasticity
A foundation of cloud computing is that resource management needs to be elastic. Automated scale-up and scale-down of computing resources is one of the most powerful capabilities of the cloud. Virtual machines can be spun up on one server or another, and even on different brands of hypervisors. This is done either for management reasons (to balance loads, or because a different administrator needs to run the VM) or for failover reasons. Monitoring tools should support elasticity by automatically adding new server instances to their monitoring scope and removing dropped ones from it.
- Automatic Provisioning/Deprovisioning
The elasticity of the cloud relies on automatic provisioning and deprovisioning. Such automation uses various methods to deploy and set up a required configuration (virtual images, scripts, templates, desired configuration policies, etc.). The cloud automatically adds and removes machines, dynamically reconfiguring the environment. Complicated scripts set up many virtual machines automatically and simultaneously. What validates the scripts? What happens when errors creep into a script? Given the dynamic nature of the cloud, a lot will happen automatically, possibly based on flawed instructions. IT operations staff begin to perceive managed environments through the perspective of their automation assets. Monitoring tools therefore need to link their results to these assets, and to be sensitive to changes in the assets and to the impact of those changes on the monitoring architecture and framework.
The growing number of systems moved to the cloud, and the high rate of change that follows such a move, demand that IT operations spend more and more time automating the distribution of changes. However, successful administration of any business system requires checks and balances, in which the results of automation are constantly analyzed and matched against expected results. Monitoring tools should be able to ensure, in an automated fashion, that the cloud management loop is closed.
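As a rough illustration of closing the loop, the sketch below compares the configuration an automation run was supposed to produce with what is actually observed on an instance. The names find_drift, Drift and ABSENT, and the configuration keys in the usage example, are hypothetical and chosen only for this example; a real tool would collect the observed values from the instance itself.

```python
from dataclasses import dataclass

# Marker used when a key exists on only one side of the comparison.
ABSENT = "<absent>"


@dataclass(frozen=True)
class Drift:
    key: str
    expected: object
    actual: object


def find_drift(expected: dict, actual: dict) -> list:
    """Return every setting whose observed value differs from the
    value the automation was expected to produce."""
    drifts = []
    for key in sorted(expected.keys() | actual.keys()):
        exp = expected.get(key, ABSENT)
        act = actual.get(key, ABSENT)
        if exp != act:
            drifts.append(Drift(key, exp, act))
    return drifts


if __name__ == "__main__":
    # Hypothetical values: what the setup script declares vs. what was found.
    declared = {"max_connections": 1024, "tls_enabled": True}
    observed = {"max_connections": 512}
    for drift in find_drift(declared, observed):
        print(f"{drift.key}: expected {drift.expected}, found {drift.actual}")
```

An empty result means the automation did what it was expected to do; anything else is a candidate for review, whether the fault lies in the script or in the environment.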
Facing Changing Environments and High Pace of Change
Currently, cloud resource management systems (for example, VMware's DRS or Amazon's Auto Scaling) focus on the control of the infrastructure. These systems allocate resources in real time according to a predefined set of rules, without any understanding of the business services that use the allocated resources. Monitoring and management technologies, by contrast, should be able to view the cloud environment from the business service perspective in order to deliver meaningful results. These technologies should work in sync with the resource management platforms, maintaining a real-time picture of the monitored environments as resource redistribution changes them.
New Gen Tools Need to Take on Unique Cloud Qualities
The essential characteristics of cloud platforms make it much more difficult to stay aware of what is happening in a cloud environment. The new generation of tools needs to face and overcome many aspects of the dynamic nature of the cloud:
- Identify new instances
Cloud resource management may create a high volume of events. On one hand, these events should be identified and reported; on the other, they need to be used to adjust the monitoring framework automatically. If a new server instance is spun up, monitoring and management of that instance should start as soon as it becomes active.
- Support scale-downs and decommissioning
The tools need to handle instance scale-downs and decommissioning, recognizing them as intended events rather than system failures.
- Connect to instances
The monitoring tools need to recognize the content of an instance, which evolves over time as changes are introduced into base images or setup scripts. The tools should then adjust their monitoring scope automatically to cover the evolving instance content.
- Identify the type of instance
Instances serving the same function can be spun up using different infrastructure or base images. The monitoring tools should be able to correlate their observations with the type of instance. Because the amount of information the tools generate can be significant, the ability to aggregate data by instance type can be essential to keeping monitoring manageable.
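The first two qualities above can be sketched as a small event handler that keeps the monitoring scope in step with the cloud. This is a minimal sketch under stated assumptions: the event fields ("kind", "instance_id", "instance_type", "planned") and the MonitoringScope class are hypothetical, not the format of any particular cloud platform.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringScope:
    # instance_id -> instance type (for example the base image or role)
    monitored: dict = field(default_factory=dict)
    alerts: list = field(default_factory=list)

    def handle(self, event: dict) -> None:
        """Adjust the scope for one lifecycle event (hypothetical format)."""
        kind = event["kind"]
        instance_id = event["instance_id"]
        if kind == "launched":
            # Start monitoring as soon as the instance becomes active.
            self.monitored[instance_id] = event.get("instance_type", "unknown")
        elif kind == "terminated":
            self.monitored.pop(instance_id, None)
            # A planned scale-down is an intended event, not a failure.
            if not event.get("planned", False):
                self.alerts.append(f"unexpected loss of {instance_id}")


if __name__ == "__main__":
    scope = MonitoringScope()
    scope.handle({"kind": "launched", "instance_id": "i-1", "instance_type": "web"})
    scope.handle({"kind": "terminated", "instance_id": "i-1", "planned": True})
    print(scope.monitored, scope.alerts)
```

The key design choice is that termination alone never raises an alert; only a termination the resource manager did not announce as planned is treated as a possible failure.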
Overwhelming Amount of Activity
The dynamic and automated nature of the cloud means data centers have to stay on top of more events than traditional tools can handle.
When you simply monitor the key system elements in a traditional static console for each individual physical node, you can see any event and then connect to it. In the new dynamic platforms, however, there are too many events: they inundate the console with information and drown operators in data, both critical and non-essential. The use of virtualization creates a dynamic capacity pool of resources that needs to be monitored, and ultimately managed, in a completely different way.
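One way to keep such a flood readable is to roll events up by instance type and severity instead of listing them per node. The sketch below assumes hypothetical event records with "instance_id" and "severity" fields and a separate mapping from instance to type; neither comes from a specific product.

```python
from collections import Counter


def summarize(events, instance_types):
    """Count events per (instance type, severity) pair.

    events: iterable of dicts with "instance_id" and "severity" keys
    instance_types: dict mapping instance_id -> type name
    """
    counts = Counter()
    for event in events:
        itype = instance_types.get(event["instance_id"], "unknown")
        counts[(itype, event["severity"])] += 1
    return counts


if __name__ == "__main__":
    types = {"i-1": "web", "i-2": "web", "i-3": "db"}
    events = [
        {"instance_id": "i-1", "severity": "info"},
        {"instance_id": "i-2", "severity": "info"},
        {"instance_id": "i-3", "severity": "critical"},
    ]
    for (itype, severity), n in sorted(summarize(events, types).items()):
        print(f"{itype:8} {severity:9} {n}")
```

An operator then sees a handful of rows per instance type rather than one row per event, and can drill into a type only when its critical count is non-zero.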
The New Generation of Tools
Facing this new reality and its challenges, the new generation of tools needs to be able to aggregate dynamic information coming from multiple cloud vendors and to translate this data into actionable metrics. These tools should be seamlessly integrated with dynamic resource management and automated deployment. They should recognize and support both the traditional software stack and the underlying virtual and cloud infrastructure. The growth of such tools will be a critical factor in the expansion of enterprises into the cloud, because enterprises making the transition will need the same level of safety and control that their internal data centers have provided.