Roger Sessions recently published a white paper on IT complexity and its role in IT project failure: “The IT Complexity Crisis: Danger and Opportunity”. It’s certainly possible to quarrel with bits and pieces of his analysis, and thereby tweak his numbers, but the overall thrust remains undeniable: IT failures are costing the world incredible amounts of real money. Sessions even sums it up under the dire-sounding phrase, “the coming meltdown of IT,” and says, “the out-of-control proliferation of IT failure is a future reality from which no country—or enterprise—is immune.” And he presents “compelling evidence that the proliferation of IT failures is caused by increasing IT complexity.” He points out that the dollar cost of IT failure in the US alone is close to the cost of the recent US financial meltdown, and cites indications that the rate of failure is also increasing by 15% per year.
Roger’s paper is excellent and thought-provoking, and I recommend it highly. And I do agree with his view that complexity is the chief culprit in IT failure. That said, I think his argument focuses a little too strongly on one cause of complexity (unnecessary overcomplexity of architecture), to the neglect of other important factors.
To be sure, some obvious contributors to IT failure (poor project management, and lack of communication within teams and from business to IT implementers) aren’t dismissed by Sessions, but he sees their contribution to the crisis as relatively small. I don’t, and I’ve used this blog to write about those factors quite a bit.
Most of all, though, I differ with Roger’s focus on streamlining architecture as being the key to reducing system complexity. One could say, in fact, that Roger’s solution is primarily a technical one, where the bugaboos I see are primarily cultural and sociological. I see not one, but at least three distinct complexity-related burdens, increasingly endemic, and increasingly bringing down IT:
- Overly complex design/architecture
- Taking on too much functionality
- Poor implementation (technical debt in-the-large and in-the-small)
Roger has admirably dealt with the issue of overly complex design/architecture, at least in terms of a viable approach for simplifying up-front architecture, so I’ll focus here on the other two.
Taking on too much functionality
I recently rented an economy car (the least expensive option) on a trip with my son. (Remember, my self-appellation is “Cheap Technology Officer.”) He was stunned and dismayed that the car didn’t have automatic door locks; he didn’t realize that they even made cars without them anymore. Similarly, my 11-year-old daughter has grown up in a TiVo-ized world where live TV can always be paused. When she encounters a TV without a DVR (and thus no pause capability), she regards it as hopelessly primitive. Indeed, as unacceptable.
Similarly, the general standard for functionality and UI design has been raised by extremely functional PC software. I now expect to be able to double-click on a number in any onscreen report, and thus “drill down” into the transactional details that make up that number. When I can’t, I feel cheated. Equally, I expect everything on an interface to drag and drop; I get frustrated if it doesn’t.
So as an industry, we’ve raised the bar of acceptability, considerably, in software and technology systems over the last couple of decades. What that means in practical terms, though, is that across the board, our eyes have gotten bigger than our stomachs. We want more, up front, than it often makes sense to build at the start. And our demands are not negotiable, or so it seems. The first few cell phones I had didn’t even have a ring silencer on them; I used to silence the phone by adroitly disconnecting the battery when a call came through at an inopportune time. Today, most people wouldn’t even consider buying a phone that lacks much more elaborate features, such as a camera, that I would have considered space-age back in the 80s.
So our increase in expectations, alone, has added considerably to the functionality of systems we tend to build. There’s more functionality in and of itself, and usually more interface points to other complex systems. In fact, integration testing—where you connect new code into a working environment where it has to interface correctly with other systems—has become a frequent and major sticking point in launching information technology projects. In essence, we’ve fallen into the “nuts” dilemma, both in large and small ways. We want so much, and attempt so much, that we increase our risk of failure considerably.
Technical debt (in the large and in the small)
Any software developer will tell you that their first stab at implementing a given piece of functionality is often (if not usually) much more complex than what turns out to be needed. Only after exploring the problem domain, with experiments and backtracking and restarts, do developers usually realize that their code can be pared down, simplified, combined with other modules, etc. This is usually called “refactoring”, and its importance is a relatively recent insight in the software development discipline.
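To make the idea concrete, here’s a refactoring in miniature — a hedged sketch, with invented function names, not anyone’s production code. Two near-duplicate report functions (the typical “first stab”) get collapsed into one parameterized function, with the external results unchanged:

```python
# Before: two near-duplicate functions, a common first-stab outcome.
def total_invoiced(orders):
    total = 0
    for order in orders:
        if order["status"] == "invoiced":
            total += order["amount"]
    return total

def total_shipped(orders):
    total = 0
    for order in orders:
        if order["status"] == "shipped":
            total += order["amount"]
    return total

# After refactoring: one parameterized function. Same answers,
# but less code to change (and to get wrong) in every future release.
def total_by_status(orders, status):
    return sum(o["amount"] for o in orders if o["status"] == status)

orders = [
    {"status": "invoiced", "amount": 100},
    {"status": "shipped", "amount": 250},
    {"status": "invoiced", "amount": 50},
]

# The refactored version produces identical results to the originals.
assert total_by_status(orders, "invoiced") == total_invoiced(orders) == 150
assert total_by_status(orders, "shipped") == total_shipped(orders) == 250
```

Notice that nothing a user sees has changed — which is exactly why, under schedule pressure, this step is so easy to skip.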
A key insight about refactoring, though, is that it means improving the code without changing its overall results. There’s often no immediately obvious payback to this undertaking: it’s a roof project, in essence. To the extent that refactoring isn’t done (no time, no inclination, no recognition of a simpler approach), the end product is left with vestiges of unnecessary complexity. The greater the time crunch, and the greater the aspiration for the functional depth and breadth of the software to begin with, the more likely it is that these vestiges linger. And one ancillary aspect of the raised bar in expected minimum functionality is that it causes the time crunch to get ever greater. It’s no longer about delivering just a solution that will work, it’s about making sure that the solution includes (metaphorically speaking) a 5-megapixel camera too.
Couple this unnecessary, accidental complexity with what often amounts to a rush job on design for the sake of meeting schedule (creating “technical debt in-the-large”). An example I’ve used here before: choosing a different core DBMS to implement a given function, simply out of expedience, and failing to take the time to modify previous functionality to use that new DBMS. Supporting two DBMSes within the same product represents significant technical debt: every subsequent system change and addition will entail “paying interest” on that debt, which not only increases schedule and manpower costs, but increases the risk of failure as well. And that’s just one example; the technical debt cascades, feature upon feature, release upon release. Technical debt, until paid down, can be equated to an invisibly rising substrate of complexity, and it contributes massively to an increasingly wobbly, risky system. And lately, I see more and more organizations “pyramiding” their technical debt, never taking the time and cost to pay it down, with disastrous results. As Hemingway said about going broke, it happens slowly, then all at once.
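A sketch of what that “interest” looks like in code, under stated assumptions — the adapter classes and method names here are invented purely for illustration of the two-DBMS scenario. Once the product carries two data stores, every new report or feature must be written, tested, and maintained against both, and must merge their results:

```python
# Hypothetical adapters for the two DBMSes the product ended up carrying.
# Every feature below must work against BOTH -- that recurring duplicated
# effort is the "interest" paid on the debt, release after release.

class LegacyDbAdapter:
    def find_orders_by_customer(self, customer_id):
        # imagine SQL against the original DBMS here
        return [("ORD-1", 100)]

class NewDbAdapter:
    def find_orders_by_customer(self, customer_id):
        # imagine a different query dialect against the second DBMS here
        return [("ORD-2", 250)]

def customer_order_total(customer_id, adapters):
    # Each new feature has to query both stores and merge the results --
    # twice the code paths, twice the testing, twice the chance of a bug.
    total = 0
    for adapter in adapters:
        for _order_id, amount in adapter.find_orders_by_customer(customer_id):
            total += amount
    return total

adapters = [LegacyDbAdapter(), NewDbAdapter()]
assert customer_order_total("C-42", adapters) == 350
```

Paying down the debt would mean migrating the legacy store and deleting one adapter — a cost that shows up on this quarter’s schedule, which is precisely why it keeps getting deferred.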
Roger’s analysis, to its large credit, outlined an important aspect of complexity and posed a solution (an approach he calls SIP, or Simple Iterative Partitions). The aspects that I’m presenting (taking on too much functionality, and the pyramiding of technical debt) are, as I’ve said, cultural and sociological within companies. The answer is not nearly as simple or as neat as a specific technical solution (although I am certain that I will get comments on this post, perhaps rightly, from devotees of Agile).
To my mind, it’s engaged, savvy, forceful leadership that alone can address these issues, slow down the demand train, stop the madness. If anything, I think that there is an increasing lack of leadership in IT circles that can suitably recognize and address these factors, as well as educate their peers. And that’s what needs fixing most of all.