How to Resolve the Agility-Stability Paradox for Enterprise Software
Building software applications within large organizations has always required tradeoffs. The venerable ‘iron triangle’ of software development famously traded off scope, cost, and time for any project. Select any two at the expense of the third, lest quality of the overall effort suffer.
Today, however, many additional factors impact this conventional wisdom.
The cloud changes how organizations account for cost, but also brings scalability that requires a rethink of what it means to create enterprise software.
Agile methodologies help organizations reinvent the tradeoff between scope and time, as Agile relegates the waterfall notion of scope to the dustbin. Instead, software teams work hand-in-hand with stakeholders to define an ongoing cadence of new and updated functionality (at least in theory).
Next, add DevOps to this mix, leveraging dramatically improved automation and greater collaboration across the software lifecycle to drive new paradigms for development and deployment of software: continuous integration and continuous deployment, respectively.
DevOps is unquestionably difficult to get right, but for organizations that do, the time necessary to roll out new and updated software can drop dramatically – from weeks or months to days or even hours.
If such improvements sound too good to be true, well, you’re right to be skeptical. Can enterprises really leverage CI/CD to achieve such blisteringly fast deployment times? And most importantly, what are the remaining tradeoffs?
In Search of Stability
More than any other nonfunctional characteristic, enterprises have long required that their software be stable. Stability, in fact, has different aspects: the software must behave as expected day in and day out, and the operational environment must also provide consistent performance over time.
The fundamental architecture of the cloud, in contrast, centers on resilience rather than high stability. Instead of guaranteeing so many nines of uptime, the story goes, the cloud builds in automated recovery from failure, as well as various failover approaches to reduce the impact such failures would have on businesses and their customers.
The automated recovery from failure necessary for cloud resilience, however, addresses individual incidents, but not the underlying problems. Dealing with such problems in dynamic complex environments is especially challenging, as reproducing intermittent incidents in a pre-production environment to investigate related problems is extraordinarily difficult.
Furthermore, due to the dynamic and opaque nature of the cloud, capturing the detailed state of the cloud environment at the point of incident can be difficult or entirely impossible.
Automation counterintuitively exacerbates this problem. After all, automation can go wrong just as readily as the software interactions it automates. It’s essential, therefore, both to test automations before deploying them and to leverage intelligent visibility into their behavior, in order to understand what the automations accomplished and, in particular, where they might have gone wrong.
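As a rough illustration of what visibility into an automation's behavior might look like, the sketch below wraps an automation step so that the environment state before and after the run is recorded, along with any failure. Everything here is hypothetical: the names `AutomationRecord` and `run_with_audit`, and the simplification of the environment to a flat dictionary of settings, are assumptions made for the example rather than anything a particular tool provides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

# Assumption: the environment is modeled as a flat mapping of setting -> value.
State = Dict[str, str]


@dataclass
class AutomationRecord:
    """What one automation run did, captured for later inspection."""
    name: str
    started_at: datetime
    before: State
    after: State
    error: Optional[str] = None

    def changed_keys(self) -> List[str]:
        """Settings whose value differs between the before and after states."""
        keys = set(self.before) | set(self.after)
        return sorted(k for k in keys if self.before.get(k) != self.after.get(k))


def run_with_audit(
    name: str,
    step: Callable[[State], None],
    env: State,
    log: List[AutomationRecord],
) -> AutomationRecord:
    """Run `step` against `env`, recording state on both sides of the run.

    A failing step is recorded rather than re-raised, so that a partially
    applied change still appears in the log.
    """
    before = dict(env)  # copy, so later mutation does not alter the record
    started = datetime.now(timezone.utc)
    error = None
    try:
        step(env)
    except Exception as exc:  # deliberately broad: the goal is to capture it
        error = repr(exc)
    record = AutomationRecord(name, started, before, dict(env), error)
    log.append(record)
    return record
```

The point of recording the after state even when the step fails is that a half-finished automation is exactly the case where it is hardest to reconstruct what happened after the fact.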
Today’s clouds are remarkably resilient to be sure, but no one would say that they are as stable as traditional on-premises high availability systems. After all, there are always tradeoffs. What, then, does stability mean in such a complex, dynamic environment?
Attention Shifts to Visibility and Management
Given the burgeoning complexity in this modern hybrid IT environment, automation of many of the operations team’s tasks is absolutely essential. Automation is, in fact, one of the important enablers of DevOps.
Automation, however, will never be enough, because enterprises will always push the limits of possibility. It’s no coincidence, therefore, that observability is a key cloud-native architectural principle. In essence, all of the components of a modern IT environment should have observability built in – often via APIs or newer observability technologies.
Regardless of the mix of technologies, the fundamental principle paraphrases Peter Drucker: you can’t manage what you can’t see. The more agile and dynamic the operational environment becomes, the more important visibility is for the overall stability and resilience of the applications and workloads running in that environment.
As the speed of change and the complexity of the environment grow, so too do the types of problems. Errors are more likely to occur in groups than in isolation. In such situations, simple rollbacks and rebuilds of defective environments are unlikely to sort out the resulting compound errors.
Instead, it’s essential to establish a trail of changes and their associated errors in order to identify the underlying problems behind those errors. Furthermore, ops personnel require visibility into the actual environment state in order to effectively manage change in today’s dynamic, automated environments.
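One way to picture a change trail with associated errors is as two timestamped event streams that are joined by time. The following is a minimal sketch under stated assumptions: the `Change` and `ErrorEvent` types, the function names, and the 30-minute default look-back window are all invented for illustration, and a real environment would draw these events from its own change and monitoring sources.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    at: datetime
    component: str
    summary: str


@dataclass(frozen=True)
class ErrorEvent:
    at: datetime
    component: str
    message: str


def changes_preceding(
    error: ErrorEvent,
    trail: List[Change],
    window: timedelta = timedelta(minutes=30),  # assumed look-back window
) -> List[Change]:
    """Changes made within `window` before the error, most recent first."""
    hits = [c for c in trail if timedelta(0) <= error.at - c.at <= window]
    return sorted(hits, key=lambda c: c.at, reverse=True)


def rank_suspects(
    errors: List[ErrorEvent],
    trail: List[Change],
    window: timedelta = timedelta(minutes=30),
) -> List[Tuple[Change, int]]:
    """Rank changes by how many errors occurred shortly after each one."""
    counts: Dict[Change, int] = {}
    for error in errors:
        for change in changes_preceding(error, trail, window):
            counts[change] = counts.get(change, 0) + 1
    # Most-implicated change first; ties broken by earliest change time.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].at))
```

The sketch intentionally does not require the error and the change to involve the same component, since compound errors may surface far from the change that triggered them. Proximity in time is only a hint about where to look, not proof of cause.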
In fact, improving visibility becomes a critical enabler of everybody’s role. Developers require visibility to perform their tasks in a dynamic, fast-moving team environment. Security personnel leverage increased visibility to identify, prevent, and mitigate threats effectively.
And most importantly, operations personnel require absolute clarity across the entire hybrid IT environment not only to keep the lights on, but to maintain the agility and speed the organization requires to meet the ongoing needs of its customers.
The Intellyx Take
If change in the operational environment were infrequent, then visibility wouldn’t be so critically important. But in the modern IT environment, change is ongoing, constant, and accelerating.
In this world, causes and effects are more difficult to link, while at the same time, the business requires greater speed and agility.
More than any other reason, it is the combination of the speed and variety of change that requires increasing levels of visibility into the workings of the hybrid environment. It’s no longer adequate to wait until something goes wrong and then ask ‘what changed’ in order to uncover the cause of the problem. There are simply too many things changing all the time.
Only by rethinking operational visibility do organizations have any hope of resolving the tradeoffs of the iron triangle. Scope is always increasing. We must move ever faster. And the realities of managing costs are never-ending.
We require greater visibility into all changes across the IT landscape to resolve these tradeoffs without compromise.
Copyright © Intellyx LLC. Evolven is an Intellyx client. Intellyx retains final editorial control of this article.