Early last week, we looked at one of the many reasons Healthcare.gov failed the way it did: The use of the "waterfall" software development methodology, completing one stage before moving on to another, rather than the "Agile" method, where software is built in iterative bursts, with testing throughout. It seems to have touched a nerve among computer folks, many of whom wrote in with observations on how Agile actually works, and whether it was even the right way to go for a site like Healthcare.gov. Herewith, a few of the most interesting (those without names asked for anonymity to protect their jobs).
From an IT consultant for governments and corporations:
I think the biggest piece of the story that is still missing is that the software was scheduled for delivery on Oct 1. It could have been delivered far sooner, with far fewer functions, if the project delivery dates and requirements were designed differently. I believe this would have greatly increased the likelihood of success, and is a big reason why state healthcare websites are working, while the federal one is not.
For example, in some states relying on healthcare.gov, prices for policies may have been known (much) earlier than Oct 1, so the team could have released a site, well ahead of time, that let users price shop without being able to buy. They could have released software that let users create accounts and browse some of the information about them from federated databases prior to the Oct 1 release. They could have allowed users to contact support if the information was inaccurate. There are many more examples of small feature sets which could have been released earlier, with the aim of releasing the final product on Oct 1. In contrast, the states running their own markets/sites that I've looked at had websites with lower initial objectives and staggered release schedules, providing increasing levels of access and functionality over time.
Whether a project releases software via Agile or Waterfall methods, if you try to have a very big first release, you are almost certainly going to fail. The reason is that you orient the team towards one big, remote and hard to understand objective, rather than a series of small, attainable and comprehensible objectives. In the former case, team accountability and evidence of success are very hard to measure.
Regarding the debate between waterfall and agile, it's totally possible to manage towards small, simple, staggered objectives using either waterfall or agile methods (though I prefer agile). In the case of Healthcare.gov, all the requirements were specified up front (in the procurement), which makes agile less practical.
This speaks to the second point, about government procurement, which is a major frustration of mine. By specifying all the requirements up front, government engages in the fantasy that it is possible to receive a precise dollar bid from vendors for the entire project. But vendors *know* that the requirements of *all* software projects will change over time. So they know that they will be able to "change order the project" over time, adding new charges as new requirements are received or old ones are changed. Many vendors will underbid projects to win procurements, knowing they will make back the difference on change orders later. To be concrete, a government agency will issue an RFP that signals a price of (say) $17-19M. The clever vendor will bid $15M and win, knowing that the government can spend up to $19M and will almost certainly make changes to the project as it goes along, allowing the vendor to increase its total bill to that amount before all is said and done. It's hard to overemphasize how common this practice is among savvy technology contractors who bid on gov't contracts (local, state and federal).
Which begs the question, why do government agencies issue RFPs with fixed requirements in the first place? In my experience they do this in order to appear to understand the project at the start, which sells well up the executive food chain. It's hard politically to issue an "agile rfp" that says "we have these approximate product goals, we have this specific amount of money to spend, and we would like bidders to present evidence that they will be the most effective partner in iterating software with us towards accomplishing those goals." It's also hard to create RFP evaluation criteria that are not open to cronyism if a bid is written in this way. And of course, government procurement staff would almost always disallow such RFPs for internal policy reasons, even though such procurements would be perfectly legal (as another correspondent noted). So we see gov't agencies write fanciful, waterfall RFPs, and, again and again, have late, over-budget and failed technology projects on their hands, while vendors pocket the same amount of cash as they would under a fixed bid, agile development process.
It's not that Agile doesn't work in gov't for any technical reason, it's that it's very hard to change the existing systems and culture I described above to make it work. It's totally possible to do it, but it requires a lot of time, attention and expertise on the part of the issuing agency (I've seen it work under rare, enlightened leadership). This kind of technical and managerial competence is in short supply inside government at the present time. The Presidential Innovation Fellows program is the most promising move towards bringing this expertise inside government that I've seen to date.
And if you want to look at how hard it is for even the PIFs to move the needle, take a look at the RFP-EZ program that was rolled out recently. The new EZ process is in fact far from easy - and none of the critical requirements that drive managers towards waterfall and fixed-requirement procurement were changed in any way. All the EZ reform did was make it slightly easier for a vendor to submit a bid online. And just accomplishing these slight changes was a high feat of bureaucratic negotiations, to give the hardworking PIFs their due. There's a long way to go here, though I am encouraged that the government is trying.
I think after the Obamacare technology website rollout debacle, this will finally get senior executives' attention in government. I think this is the highest profile technology failure we've ever seen in this country, and the first one that might actually impact the overall policy objectives of a government program (as opposed to just changing the cost profile and timeline, which is usually what happens when technology projects fail).
From the Chief Technology Officer of a management consulting and IT firm focusing on defense and intelligence:
I was prepared to be enraged by another article that “doesn’t get it” but was pleasantly surprised by your article. My note here is not for attribution, but I am open to talking with you.
A couple of comments:
- It was a traditional contract vehicle that brought in the design firms under Aquilent. The contract structures exist to allow that to happen but there needs to be strong government leadership and desire to do so. The government contract community is CHOOSING “Low Cost Technically Acceptable” as a framework. They don’t HAVE to. The procurement model supports many varieties of things but does not give the government leaders support for choosing innovation over low cost. The real story is about supporting Government leaders to do the right thing.
- The front end website was NOT developed Open Source as you state. It utilized Open Source code and published the code back. But it did not use the open source community to code and debug, so I think you got that wrong as far as I see. FYI, they have pulled their code back — it is NOT open to the public at this time. Maybe a story there.
- Some acknowledgement of the back end complexity should be made. There are 50+ exchanges interacting with massive archaic back end systems. That is not a good place for Agile. And I don’t think how back end systems function with Personal Data should be made public, do you? I think CGI probably takes some blame but they are dealing with enormous challenges. The story you cited references this, but yours does not.
- Agile requires a business owner to be embedded who can make real time decisions. Health Care system complexity does not lend itself to that being the case. The challenge is the complexity of the problem, and a lot of that comes from CONGRESS!! No one has read ACA b/c it’s so complex.
From Agile coach Cliff Berg:
Well first of all, we need to distinguish program management from agile project management. Agile is a methodology pertaining to software projects. It does not speak to how one manages a large program - a collection of projects. Agile values and principles can be said to inform program management, but there is no consensus on how one manages an agile program. So if one says that a large software program was "done agile", one is mis-speaking because there is no such thing.
That said, there are many points of view in the agile community on how one should manage programs. This is often referred to as "scaling agile" (from the project level to the program level or organization level). This is an immature area however, and as I said, there is no consensus.
One also needs to distinguish between programs that are owned by the government and ones that are effectively contracted out. There is no sharp line between these two ends of the spectrum. What is typical is that an agency will contract out each step in the process, but insist that the government own the resulting product and often that it be built and deployed using government processes and systems. For example, an agency might have several different contractors build pieces, but each step of the process is overseen by a government manager: one might have a development manager (gov), a test manager (gov), a release management manager (gov), a data center ops manager (gov), etc., and each of these has a counterpart in the respective vendor - the vendors are often different for each step of the process. In this situation, the gov is able to ensure that agency processes are used - processes for governance and risk management. On the other end of the spectrum, the gov might contract with a systems integrator to develop, deploy, and operate an entire system with the gov playing almost no role: that is very rare for software.
Thus, the typical case is that the gov plays a very active role. I don't have insight into how healthcare.gov was managed, so I can't speak to where on the spectrum it was/is, but it sounds like it was somewhere in the middle. That would imply that the gov bears a lot of responsibility for what happened, but I don't know for sure.
The vast, vast majority of government software projects operate in the manner that I described in which the gov basically embeds several contractors in the gov processes of development and deployment. Each contractor has project managers, but there are gov counterparts to those.
In that situation, it is often hard to say if a project is "done agile" because a project does not exist in a vacuum. Unlike a small team in a startup, a contractor agile team in a government agency has to abide by all of the governance rules that the agency has. The project also usually has to rely on various support functions for security, testing, enterprise architecture, deployment, data center operation, etc. These functions might be staffed by contractors, but they are usually managed by government managers. And each function usually uses different vendors for their contractors.
This makes it really hard to do a project "agile". The challenge is getting all of the support functions to act in an agile manner - doing things "just in time" rather than doing big plans and schedules and designs. One also has to get the gov business-side stakeholders to properly support the agile process, by allowing requirements to evolve instead of contracting out a big up front requirements definition effort.
And the big elephant in the room is contracting/procurement. Traditional fixed price, fixed feature deliverable-focused task orders kill agile projects. Agile projects can only be successful if contracts are more flexible.
The process of converting an organization (like a gov agency) to properly support agile projects is referred to as "agile transformation". Agile transformation is really hard, and it is what I do. It is management consulting with an agile flavor, and it involves working with the CIO of the agency and the various managers to change how they do things. It also involves changing how the agency plans and manages its portfolio of IT projects and how it does contracting. Agile transformation utilizes "agile coaches" to evangelize the ideas and work directly with software teams, but there are usually a small number of senior "transformation coaches" (like me) who work with management to plan and oversee the process.
Thus, the situation of agile in government is not simple. If one says that there are successful agile projects in gov agencies, then one is implying that either (1) the project was given a "pass" and allowed to bypass the agency's support functions and governance steps, or (2) the support functions were made to work properly (in an agile manner) with the project. The latter is the ideal, and if that is the case, then there is a much larger story: it means that the agency was transformed and made to be an agile agency for its IT work - and that is a big deal.
Agile is pretty new in gov circles. There are lots of success stories, but they are usually special cases - the #1 from above - and that is not interesting because it is not scalable. What is more interesting is which organizations have been able to change how they work so that they can support agile projects in a repeatable manner for all their IT work.
From a quality assurance manager at a large e-commerce website:
It's an interesting question, though -- because what you are really asking is would the public be ok with the ongoing discussion of requirements and changes, and should that be done out in the open? I'm pretty sure you will find a litany of stories now about how the developers knew well in advance that this was going to turn out the way it did. The problem is the organizations involved are not set up to hear and respond to that kind of internal alarm in an effective manner, partially because their marching orders come from outside, by government folks that are also not geared to that kind of responsive change.
From Larry Lewicki, retired technologist at Texas Instruments and National Semiconductor:
I really liked your article from Oct 21. As a retired engineering manager (hardware - communications integrated circuits) -- I believe that there's a lot of truth to these observations. However, I'm still left with a big question:
Can an 'agile' environment work within the legislative process?
My belief is that the rigid development of comprehensive front end specifications is indicative of a "legalistic" environment -- where there's significant lack of trust between the players who negotiate the law. (You agree to develop this -- no more - no less -- in this amount of time.)
It seems to me that the product paradigm based on hardware that a user owns -- and will use for a while (subsequently replacing) is inherently consistent with the 'legalistic' environment. The user "knows" what they own.
On the other hand, web based software -- that the user occasionally accesses - has a different paradigm. It can continually be upgraded without negatively impacting the user. Something like Google Maps or Google Docs -- is continually being upgraded -- I don't know what has happened in the last 24 hours since I started planning my vacation. I'm sure Amazon is modifying their web software as I'm typing this email.
Agile environments -- mean the specification is constantly evolving as well -- this very behavior seems counter to the way the US government writes legislation -- which in turn drove healthcare.gov.
From Scott Simenas, retired software engineer with Raytheon:
Thank you for responding to my comments about your article "The way government does tech is outdated and risky." The problems affecting the Obamacare Website are not related to the contractor software processes used to develop the major components:
- Enterprise Identity Management (EIDM) QSSI lead contractor: Manages the user accounts and provides secure access.
- Federally Facilitated Marketplace (FFM) CGI lead contractor: User interface to the Obamacare Website.
- Data Services Hub (DSH) CMS lead contractor: Connectivity hub to the federal agency databases for the IRS, Social Security Administration, and Department of Homeland Security and connectivity hub to the more than 170 insurance carriers in the 36 states the FFM operates in.
The problems are related to:
- Inadequate testing of the DSH connectivity to the federal agency databases and insurance carriers' databases. All the possible pathways to the legacy databases could not be tested until the system was fully integrated.
- Inadequate real-time performance of the existing federal agency databases and insurance carriers' databases. These databases were probably not designed to handle the large number of concurrent users expected for the Obamacare Website.
According to CGI Federal Senior VP Cheryl Campbell's written testimony to be presented to the House Energy and Commerce Committee on Thursday, October 24th, all three contractor components were tested and validated well before going live on October 1st. Ms. Campbell states that the government agency Centers for Medicare & Medicaid Services (CMS) was responsible for overall system integration of the three contractor components and interfacing to the federal agency databases and insurance carriers' databases. According to blogs and articles I have read, the integrated Obamacare Website was not tested until the day before going live.
In my opinion any problems in the EIDM and FFM should be quickly and easily fixed. The harder problems to fix are in the DSH and the legacy databases. The Obamacare Website uses a loosely coupled database connectivity approach called federated database systems. The problem with this approach is that the performance and reliability of the overall system are only as good as the slowest and least reliable database in the federation, and it may not be feasible to tune the individual databases for the required system performance. It is also extremely difficult and expensive to keep the data model in the DSH consistent with all the disparate/redundant/inconsistent data models of the databases in the federation. If the federated database approach cannot be made to work, then the Obamacare Website may need to build a common database with a unified consistent/un-ambiguous data model that extracts and uploads data periodically from the legacy databases into the integrated common database. This may be the only way to provide the performance and reliability required by the Obamacare Website.
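The federated trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three member names, their latencies, and their availabilities are invented numbers, not measurements of any real system, and the model assumes the hub queries every member in parallel and that failures are independent. Under those assumptions the slowest member sets the response time, and the combined availability is the product of the members' availabilities, so it can be no better than the least reliable member and is typically somewhat worse.

```python
from math import prod

# Hypothetical members of a federation: (latency in ms, availability).
# These figures are made up for illustration only.
members = {
    "agency_a": (400, 0.99),
    "agency_b": (900, 0.98),
    "agency_c": (2500, 0.95),
}


def federated_latency_ms(latencies):
    """A parallel fan-out query must wait for the slowest member."""
    return max(latencies)


def federated_availability(availabilities):
    """Every member must answer, so (assuming independent failures)
    the individual availabilities multiply."""
    return prod(availabilities)


latency = federated_latency_ms([lat for lat, _ in members.values()])
availability = federated_availability([av for _, av in members.values()])

print(f"Federated query latency: {latency} ms")
print(f"Federated availability: {availability:.4f}")
```

With these made-up inputs the federation answers no faster than its slowest member (2500 ms) and is available about 92% of the time, lower than any single member. A common database loaded periodically from the legacy systems takes the members out of the live request path, at the cost of serving data that is only as fresh as the last load.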
From an employee of U.S. Citizenship and Immigration Services:
Interesting article. However, the fact that USCIS uses AGILE is a big stretch if you are trying to make the point that AGILE is the answer. I think USCIS might be better used as an example of how much worse it could be.
USCIS has been in the process of trying to allow various benefits (Visas, green cards, citizenship, etc.) to be handled online for over 5 years. There is an entire department devoted to this effort of over 50 full time employees. This department is independent of the IT department. The number that gets tossed out is that USCIS has spent over $500 million on this project.
In May 2012, USCIS was able to launch this system to handle a subset of the I-539 forms. This form is used by some nonimmigrants to request extensions of stay or changes from one nonimmigrant category to another nonimmigrant category. Of the ~6 million applications USCIS handles each year, this form accounts for ~150,000. Of those, the specific type of I-539 that the ELIS system handles accounted for ~4500 applications. There has been no addition of any other form types since May 2012. They also take longer to process than the paper forms.
This is really just a customer management system that would allow customers to establish an account and then file different applications. This type of system has been used by insurance companies and banks for probably over 20 years. Admittedly there are some unique security and document transfer problems, although these have pretty much been solved by other government agencies such as the Department of State (passports) and the Patent and Trademark Office (patents). You can probably go to your personal bank and be able to see all your account information and apply online for a loan. The banks and insurance companies have most of the secure information that the USCIS keeps (dob, ssn, address, phone #).
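To put those volumes in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the approximate figures quoted above and assumes, for the sake of comparison, that the ~4500 figure is an annual count like the other two; the variable names are illustrative.

```python
# Approximate figures quoted in the letter (treated here as annual counts;
# that is an assumption for the ~4500 ELIS figure).
total_applications = 6_000_000
i539_applications = 150_000
elis_i539_applications = 4_500

# Share of all applications that are I-539 forms.
i539_share_of_total = i539_applications / total_applications

# Share of I-539 forms that the online system handles.
elis_share_of_i539 = elis_i539_applications / i539_applications

# Share of all applications that the online system handles.
elis_share_of_total = elis_i539_applications / total_applications

print(f"I-539 share of all applications: {i539_share_of_total:.1%}")
print(f"ELIS share of I-539 forms: {elis_share_of_i539:.1%}")
print(f"ELIS share of all applications: {elis_share_of_total:.3%}")
```

On these numbers the online system covers roughly 3% of one form type, which is itself about 2.5% of the agency's yearly volume, so well under a tenth of one percent of all applications.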
So at 3 1/2 years and whatever amount of money HHS has spent, they seem to be far ahead of the USCIS. At least they have a product out there.
From Sean McBeth:
I was pleased to see that your main thesis was that typical government project management mandates the Waterfall methodology. However, I believe you have made the same error that Winston W. Royce made when first describing the Waterfall model as it pertains to software. He, too, maligned its suitability, and he, too, made nice graphics to go along with it. And non-technical management types skimmed his paper and saw the pretty pictures and thought "that looks like a good idea".
In your article, the image representing the Waterfall model looks simple and pleasing. It's colorful and easy to follow. The Agile software methodology image is, in contrast, confusing, opaque, and cluttered. But despite what $500/hr corporate training consultants from IBM want to tell us, Agile is simple to grasp.
I personally tend to think of Waterfall methodology as using a map drawn from memory to precisely plan on how many steps to take and in which precise orientation to place one's feet, then leaving the map at home, to make a journey across a desert. Agile, on the other hand, is taking the map with you, and a pencil to make corrections as you go.
Thanks again for your article. We need more people fighting the good fight against Waterfall anti-methodology.