Cloud Management Tools Shouldn't Add Burden to Admin
The need for agility and efficiency is driving the adoption of cloud computing. Compared with the static infrastructure and high administration costs of physical data centers, the cloud offers higher availability, a faster and more flexible platform, and a means of reducing both capital and operational expenses.
Cloud management tools play an essential role in the automated delivery of high-quality IT services. You would not want administration of these tools to turn into another legacy-style administration challenge that hinders the efficiency of the cloud, wasting the benefits of the transition on implementing and administering cloud tools. These tools should hide the complexity of controlling and administering the cloud environment, providing simple setup, minimal administration overhead, high stability, and a straightforward means of delivering information to users.
Administration of Tools is Different In the Cloud
Traditional management tools implemented in a static physical or virtual data center have fixed, rigid setups. In cloud environments, management tools should support the dynamics of these environments' architecture and configuration, which makes administration different because it focuses on other aspects. For example, in the old world you would install a monitoring agent on a physical server once, and it would keep running there until you uninstalled it. In the cloud you expect a tool to deal with agent distribution on its own, as new server instances are spun up and de-provisioned frequently. You can then focus on the methods and parameters of tool deployment automation rather than on the act of installation itself.
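The agent example can be pictured as a reconciliation step that the tool repeats on its own. The following is only a minimal sketch under stated assumptions: the function name `reconcile_agents` and the instance identifiers are hypothetical, and no real cloud provider API is called.

```python
def reconcile_agents(running_instances, monitored_instances):
    """Compare the instances that exist with the instances that have an agent.

    Both arguments are iterables of instance identifiers (hypothetical values).
    Returns two sorted lists: instances that still need an agent, and agent
    records that refer to instances which no longer exist.
    """
    running = set(running_instances)
    monitored = set(monitored_instances)
    to_install = sorted(running - monitored)   # new instances without an agent
    to_retire = sorted(monitored - running)    # agents left over from dropped instances
    return to_install, to_retire
```

A tool that runs a step like this on every inventory refresh never needs an administrator to install or uninstall an agent by hand; the administrator only decides how often it runs and what an agent installation consists of.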
When you transition an enterprise application to the cloud, you expect the cloud platform to take care of storage and computing resource allocation, failover, disaster recovery, and so on. Similarly, with cloud management tools you expect basic administration tasks to be taken care of by the tool itself, possibly leveraging the same cloud infrastructure the tool manages. For example, if a tool logs some information, the log could be kept on virtual storage that is automatically allocated, monitored and backed up by the cloud.
Another example of an administration requirement specific to the cloud is control of information access. In a static data center it is easy to define the boundaries of data center areas and the scope of collected information a user is allowed to access. It is simple because the architecture changes relatively rarely, so the scope can be adjusted manually without significant overhead. In a dynamic cloud environment, however, the topology can change so frequently that manual access control is impossible. The access scope should be managed automatically, based on dynamic rules or policies.
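One way to picture rule-based access scope is to tie visibility to instance tags instead of to a hand-maintained list of servers. This is a sketch under assumptions, not a description of any particular tool: `Instance`, `AccessRule`, `visible_instances`, the role names and the tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    tags: dict = field(default_factory=dict)  # e.g. {"app": "payments"}


@dataclass
class AccessRule:
    role: str
    required_tags: dict  # every listed tag must match for the rule to apply


def visible_instances(role, rules, instances):
    """Return the ids of the instances that the given role may see."""
    allowed = []
    for inst in instances:
        for rule in rules:
            matches = all(inst.tags.get(k) == v for k, v in rule.required_tags.items())
            if rule.role == role and matches:
                allowed.append(inst.instance_id)
                break  # one matching rule is enough
    return allowed
```

Because the rule refers to tags rather than to specific servers, an instance that is spun up with the right tags falls into the right access scope without anyone editing a permission list.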
The challenge of tool administration in the cloud is exacerbated when IT staff must use a variety of specialized cloud management tools, each designed to address a specific operation in cloud management and each with its own setup, configuration, and administrative responsibilities.
Critical Behavior of Cloud
A cloud-based server is managed via a set of deployment assets, rather than through direct contact as with physical servers in a traditional data center. For example, say you distribute a change in the network configuration and then monitor the performance of the system. In the cloud scenario, you don't look only at the performance of the new system configuration. Rather, you analyze system performance relative to the changes in the deployment assets that created the new configuration.
Control of the server is distanced further by the fact that changes flow through scripts. Cloud assets are controlled through a set of standard scripts that follow policies for security, backup, and management of sensitive data. These scripts are detailed instructions. The downside is that, since manual changes still happen, scripts can fall out of sync with the desired environment configuration. Also, even when changes are deployed through the automation platform, not all servers are updated: some organizations prefer to keep existing instances for as long as they are stable. As a result, drift can appear across the environment. You end up working double time to keep these scripts accurate and in sync with your target environments, or you risk the supposedly helpful scripts turning into vicious creatures that perform obsolete tasks.
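Drift of this kind can be detected by comparing what the scripts say a server should look like with what the server actually reports. The sketch below assumes both sides are available as flat key/value dictionaries; the function name `find_drift`, the `MISSING` marker and the sample keys are illustrative assumptions.

```python
MISSING = "<missing>"  # marker for a setting present on only one side


def find_drift(desired, actual):
    """Return {setting: (desired_value, actual_value)} for every mismatch.

    `desired` is the configuration the deployment scripts describe;
    `actual` is the configuration collected from a running instance.
    """
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift
```

Running the comparison per instance shows both kinds of drift described above: settings changed by hand on a server, and long-lived instances that never received a scripted update.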
So when a problem inevitably occurs, the IT ops team needs to carry out a reverse correlation to identify the issues causing it. Cloud management includes the task of provisioning, managing, and monitoring applications in cloud infrastructures that do not require end-user knowledge of the physical location or of the system that delivers the services.
System management tools for the cloud need to take a new approach, seeing activity in real time and identifying the affected components. For example, when server CPU activity suddenly jumps by 60%, you need to know what type of server it is. If it is an application server, you should be able to identify which one and what its configuration is. Even more importantly, you need to know what the configuration should be in order to investigate the reasons for the CPU jump.
The Elite IT Team and the NoOps Trend
At the most basic level, cloud technologies can reduce the number of administrators needed to manage an environment, shrinking the IT staff, and altering the approach to operations. Today there is a new trend based on the idea of NoOps. No, this isn't about outright eliminating the IT operations organization, but rather making the operations team into a smaller, more efficient group. The skillset of this group moves towards automation, and cloud management tools should take this into account.
Simple Tool Setup
An overly complicated management tool setup adds to the burden on IT operations. The team has to fine-tune the tool setup and then take on ongoing, resource-draining management responsibilities.
Quickly Deal with Incidents
Complex tools can fail just like enterprise systems do. To deal efficiently with its own failures, a tool needs:
Self-monitoring
The tool should check the availability and correct functioning of its components. It is essential that the tool both reports an issue clearly to an administrator and marks any data that was collected incorrectly, as well as any gaps in data collection.
Explicit notifications mechanism
The tool should indicate issues visually in its UI, deliver alerts to the administrators, and maintain logs for further investigation.
Failover capabilities
Those tools that are critical for cloud control and automation need to provide built-in failover capabilities. There should not be a single point of failure that can bring the tool down.
Self-healing
Many issues can actually be resolved by the tool itself. For example, it can restart its agents when they fail to collect data, recycle database space to ensure that the latest data is preserved, and retry a data transfer if the connection between server and agent is temporarily lost. Only when an incident cannot be rectified automatically should the tool escalate the issue by alerting the IT team.
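The self-healing behavior described above, where the tool tries to recover first and alerts people only when recovery fails, can be sketched as a small retry wrapper. The name `run_with_self_healing`, its parameters, and the use of `print` as the default alert channel are assumptions made for illustration.

```python
import time


def run_with_self_healing(action, attempts=3, delay_seconds=0.0, escalate=print):
    """Run `action`, retrying on failure; alert only if every attempt fails.

    `action` is any callable, for example a data transfer or an agent restart.
    `escalate` receives a message when automatic recovery has been exhausted.
    Returns the result of `action`, or None after escalation.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # a real tool would catch narrower error types
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)  # wait a little longer each time
    escalate(f"automatic recovery failed after {attempts} attempts: {last_error}")
    return None
```

Temporary problems such as a briefly lost connection are absorbed by the retries, so the IT team hears about an incident only when the tool could not rectify it on its own.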
Web-based Management
IT operations specialists managing cloud environments are mobile. This means that cloud tools should provide Web and mobile access, and this should extend to the tool administration functionality as well. SaaS-based tools appear to be the most suitable for cloud management: SaaS allows flexible accessibility while avoiding the overhead of managing the tools' infrastructure.
Security
Security is still one of the most common concerns when organizations consider a transition to the cloud. Cloud management tools frequently collect information that could be quite sensitive in terms of security impact, e.g. environment configuration. So the same strict security management methods applied to cloud environments should be employed in cloud tools:
Encryption
Management and monitoring information collected by the tools should be encrypted when it exposes sensitive cloud parameters that could compromise the security of the cloud environment. For example, say a tool connects to a database server to interrogate its performance parameters. The credentials used for such a connection should obviously be encrypted.
Protocols
Some tools transfer information between their components, for example between the agents monitoring the environment and the central repository the tool uses to consolidate collected information. Standard security protocols, e.g. HTTPS, should be used for such data transfer.
Data Masking
Data masking removes all identifiable and distinguishing characteristics from data in order to render it anonymous yet still usable. It reduces the risk of exposing sensitive information and preserves the privacy of records by changing the data so that the actual values cannot be determined or reverse-engineered.
Selective access – permissions for specific data
The scope of the information exposed to the tool's users should be carefully controlled.
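As an illustration of the data masking item above, one common approach is to replace sensitive values with a keyed hash, so that records stay consistent and comparable while the original values are not shown. This is a sketch under assumptions rather than a prescribed method: the function names, the `masked-` prefix, the field names and the sample key are hypothetical, and a real deployment would load the key from a secrets store.

```python
import hashlib
import hmac


def mask_value(value, secret_key):
    """Replace a value with a keyed SHA-256 digest (same input, same token)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked-" + digest[:16]


def mask_record(record, sensitive_fields, secret_key):
    """Return a copy of `record` with the listed fields masked."""
    return {
        key: mask_value(str(value), secret_key) if key in sensitive_fields else value
        for key, value in record.items()
    }
```

Because the same input always yields the same token, masked records can still be grouped and correlated by host or user, which keeps the data operable for monitoring purposes.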
Low Day-to-Day Admin
Lean operations cannot survive the burden of investing IT staff time and attention in keeping track of changes created by the cloud's elasticity, such as adding new server instances to the tool and removing dropped ones.
IT needs management tools that can drill down on-the-fly through activity to quickly identify performance and availability issues, automatically detect changes and adjust themselves to keep supporting managed environments.
This means that the tools should be able to:
- Operate on their own
- Monitor the environment and independently deal with changes
- Add agents, monitoring points and parameters as necessary
- Auto-archive data
- Add/free up data space as needed
- Automatically restart/reset
- Perform self-healing activities
Cloud management tools need to introduce zero overhead so as not to hinder the efficiency of the cloud.