Rollout and Rollback – a Viable Strategy?
Rollout and rollback – is this a viable approach in large environments?
The idea that virtual server images can be reinstalled, rolled back, or re-imaged at any time makes cloud computing an exciting option and suggests that it can dramatically reduce the overhead on the IT department. With the ability to roll back, you can recover rapidly from the deployment of an invalid, incorrect, or corrupt change that compromises environment performance or availability and puts the organization at risk. Yet while deployment automation makes cloud infrastructure flexible and efficient, managing configuration is still a major challenge. In large environments, we've found that managing configuration by rolling out and rolling back entire server images is less effective and often problematic.
Maintaining Environment Consistency
With the increased pace of changes streaming into the business system environment, environment management based solely on virtual images presents a risk that is simply too high to tolerate. Because of the dynamic nature of the private cloud, the many interdependencies, and the limited visibility into the actual environment configuration, rolling back a system by resetting virtual images requires the entire setup to be updated and then synchronized, which makes upgrading and rolling back an entire business service a challenge. We have observed numerous organizations using private cloud to rapidly scale particular business systems up and down while running those systems continuously.
Greater Risk from Retirement of Cloud Servers for Upgrade
Let's say that a certain business system is performing at the expected service levels, but you need to deploy a change to the software infrastructure. Retiring existing, working virtual servers and rolling out replacement servers based on stored images that include the changes introduces unnecessary operational risk. We are seeing leading companies operating in the cloud change their practices to upgrade existing servers without retiring them.
This landscape is going to become more intense. Thomas Bittman, VP and distinguished analyst at Gartner Research, anticipates: "We'll see about a 10X increase in private cloud deployments in 2012. Enterprises will find where private cloud makes sense, and where it's completely over-hyped. We'll see successes – and there will also be a number of failures (we've seen some already)." (Top Five Private Cloud Computing Trends 2012)
Servers Not Retired
In most cases, servers are not retired unless there is a need to scale the environment down, and such long-lived servers raise stability problems. Servers are provisioned from a base virtual image and then enhanced and upgraded using automated deployment on top of that base image. Then, exactly as in a traditional static data center, it is critical to maintain ongoing assurance of environment consistency, as well as control over environment changes. You need to validate the results of automated deployment (see Automated Application Deployment Is Not Enough! 3 Reasons Why You Absolutely Still Need To Validate Your Releases) and ensure that any manual changes are detected and rolled into the deployment automation platform.
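As a rough illustration of validating a deployment result and detecting manual changes, the sketch below compares the configuration a deployment was expected to produce with what is actually found on a server. The function name, the flat key/value format, and the sample settings are all hypothetical; a real environment would collect this data from the servers themselves.

```python
from typing import Any

# Marker for a key that exists on only one side (hypothetical convention).
ABSENT = "<absent>"


def diff_config(expected: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (expected_value, actual_value)} for every key that differs."""
    drift = {}
    for key in sorted(set(expected) | set(actual)):
        exp = expected.get(key, ABSENT)
        act = actual.get(key, ABSENT)
        if exp != act:
            drift[key] = (exp, act)
    return drift


# Hypothetical sample data: what the deployment should have produced...
expected = {"app.version": "2.4.1", "jvm.heap": "4g", "db.pool.size": "50"}
# ...and what was actually found on the server after a manual change.
actual = {"app.version": "2.4.1", "jvm.heap": "2g", "debug.enabled": "true"}

drift = diff_config(expected, actual)
for key, (exp, act) in drift.items():
    print(f"{key}: expected {exp}, found {act}")
```

Any entry in the result is either a failed deployment step or an unrecorded manual change, and both are candidates for rolling back into the deployment automation.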
The Rollback Challenge
Frequent changes, fixes, and improvements are the nature of systems implemented on cloud platforms. That works until a problem is found. So what do you do?
You can fix the problem manually, which requires an investigation to figure out the cause and means putting the system on hold. This takes time, and during that time the system is dormant, harming the business.
The alternative is to roll back to the state prior to the faulty change. Previously, the idea of redeploying all of your servers just for a one-line configuration change was unthinkable. The cloud has changed that with its fast and easy provisioning.
However, problems are often discovered only a good while after releasing to production. The more time that passes, the harder it is to roll back. Users get accustomed to new features, so you can't just take them away. Business and customer data accumulates in updated schemas, making it difficult to roll the database back to support an application rollback. And sometimes the deployed change was supposed to address a critical issue, so rolling back just brings the issue back.
The other option is to roll back manually, selecting specific areas to roll back. Not so fast! There are numerous gotchas in this scenario. Since instances can differ in many ways (say, they contain different application data), you would need to configure those differences manually. Clearly, it's impractical to manage cloud instances manually, the way on-premise servers used to be managed. The complexity of the cloud rules out this option as too labor- and time-intensive.
SaaS-like Approach to Changes
In SaaS, when a new version is deployed, typically only a percentage of the users get the new version. As the stability of the new version is verified, the user base is gradually expanded. So can changes to private cloud infrastructure be deployed similarly?
This option is not ideal either, as it adds more complexity and room for error to cloud management. You would have to manage at least two versions of the application and underlying infrastructure in production, and design mechanisms for merging data coming from the two versions.
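For reference, the percentage-based exposure described above is often implemented by hashing a stable user identifier into a bucket. The following is a minimal sketch under that assumption; the function name and the user IDs are hypothetical and not taken from any particular SaaS platform.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the ID so that bucket assignment is stable across calls and servers.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Hypothetical user base: widen exposure step by step as stability is verified.
users = [f"user-{i}" for i in range(1000)]
for percent in (5, 25, 100):
    exposed = sum(in_rollout(u, percent) for u in users)
    print(f"{percent}% rollout reaches {exposed} of {len(users)} users")
```

Because a user's bucket never changes, anyone who received the new version at a lower percentage keeps it as the percentage grows. The cost noted above remains: both versions run in production side by side until the rollout completes.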
Different Rollout Processes and Their Problems
Changes and updates to a private cloud can take place through several different deployment models.
- Image Deployment
Golden images have been an attractive approach, since starting a copy of an existing image takes minimal effort and time. Downside: This method simply combines machines 'as is'. End users have limited options; the image catalog will likely contain only the commonly used images, not the less common combinations of components that specific users might require.
- Template-based Deployment
By following the template, a fully configured cloud instance can be deployed to any of a number of cloud environments. RightScale and Kaavo, for example, provide template-based cloud deployment. Downside: Templates must be built and configured, and they may miss ad hoc changes that took place.
- Script-based Deployment
This means using a script to build the image, which ensures the rollout has the latest version of the necessary components. Downside: An operator needs to configure these scripts to ensure that the rollout will be error-free, just as in software development. And just as bugs can creep in during coding, a rollout script can still contain bugs, even after you have used it to deploy.
There is no single way to assemble and deploy systems efficiently in a private cloud environment. Any of these approaches can be mixed and combined to roll out changes in the cloud, and the problems of each can carry over into a combined approach as well.
Staying on Top of the Configuration
Configuration management tools have traditionally been used by groups running big infrastructures with large numbers of systems to manage. Yet the dynamism of the cloud brings more problems, even for teams using only a couple of server instances to run their systems. This means configuration management and change management tools need to dynamically stay on top of the different states of private cloud servers, to know what changed and what the impact is.
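One simple way to keep track of the different states across a set of instances is to fingerprint each server's configuration and flag the servers that differ from the majority. The sketch below assumes each configuration is available as a plain dictionary; the server names and settings are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(config: dict) -> str:
    """Hash a configuration in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_fingerprint(fleet: dict[str, dict]) -> dict[str, list[str]]:
    """Map each distinct configuration fingerprint to the servers that have it."""
    groups = defaultdict(list)
    for name, config in fleet.items():
        groups[fingerprint(config)].append(name)
    return dict(groups)


def outliers(fleet: dict[str, dict]) -> list[str]:
    """Return servers whose configuration differs from the most common one."""
    groups = group_by_fingerprint(fleet)
    if not groups:
        return []
    majority = max(groups, key=lambda fp: len(groups[fp]))
    return sorted(name for fp, members in groups.items()
                  if fp != majority for name in members)


# Hypothetical fleet: web-3 received an ad hoc change.
fleet = {
    "web-1": {"app.version": "2.4.1", "jvm.heap": "4g"},
    "web-2": {"jvm.heap": "4g", "app.version": "2.4.1"},
    "web-3": {"app.version": "2.4.1", "jvm.heap": "2g"},
}
print("Servers out of line with the majority:", outliers(fleet))
```

A fingerprint only says that something differs, not what or why. Assessing the impact still requires a key-by-key comparison of the flagged servers.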
Most of the existing cloud vendors are missing this critical configuration management element, and are focusing almost solely on deployment automation. James Staten, Vice President and Principal Analyst Serving Infrastructure & Operations Professionals at Forrester explains, "IT pros have most of the basic ingredients to cook up their own cloud-like infrastructure — but there's no recipe, and many ingredients just don't combine well. Complicating the story are the traditional infrastructure silos around servers, networks, and storage that must work together in a new, truly integrated way. Vendors like Cisco, Dell, EMC, HP, and IBM know you need packaged solutions that just work, but until recently they left too much of the burden on their customers." (Are Converged Infrastructures Good For IT?)
Solution for Managing Change in the Private Cloud
From this, we can conclude that managing the actual environment configuration and the changes that happen in the environment is critical for operating successfully in private clouds. To fully realize the private cloud in the enterprise, organizations need a solution that can identify changes in near real-time, at a comprehensive and detailed level, to support these deployment approaches. One thing is very clear: if your IT organization is not willing to make this investment for whatever part of its data center is transitioned to a private cloud, then it will not have a cloud that exhibits agile provisioning, elasticity, and lower costs per application.