Montgomery, Metro outages suggest more lapses coming
Agencies' sophisticated equipment often requires long-term replacement plan

By Ashley Halsey III
Washington Post Staff Writer
Monday, November 9, 2009

Two relics from an earlier century broke down last week, causing fresh frustration for commuters who already rank among the nation's most frustrated, along with delays that any good economist could translate into millions of dollars in lost work hours.

Montgomery County called in outside help when a computer that controls the synchronization of 750 traffic signals failed, and the response came from the company that bought out the company that built the computer 30 years ago.

When workers pulled apart a 37-year-old power unit in Metro's downtown headquarters, the inside was thoroughly charred after a circuit breaker blew and the unit overheated. The backup that was put into service was just as old.

The coincidence of twin mini-disasters for commuters last week might foreshadow scores of problems as cash-strapped governments stagger into the 21st century burdened by creaking 20th-century technology. Unlike businesses, which have had to keep pace with technological advances to stay competitive, governments and public agencies facing budget woes can more readily postpone spending to replace old but still functional equipment.

"This is a wake-up to all municipalities across the nation and the area and underscores the dangers -- the ticking time bombs -- buried in our aging traffic engineering infrastructure," said John B. Townsend II, spokesman for AAA Mid-Atlantic. "We are playing catch-up, because lawmakers have been unwilling to fund upgrades."

In both cases last week, however, the problem with ancient infrastructure had already been identified. The aging systems simply failed before the efforts to avert a breakdown could be completed.

The tragedy and travails of Metro this year, including a June crash that killed nine people, have underscored that a network that debuted in 1976 is badly frayed.

Replacing the fried power unit plus two other 37-year-old companion pieces and related equipment will cost Metro $14 million, an expense the transit agency can ill afford in the face of a projected shortfall of at least $22 million this fiscal year and an even bigger one, about $144 million, next year.

The budget picture in Montgomery is just as gloomy. County Executive Isiah Leggett (D) said he recognized the need for new traffic computers when he took office more than two years ago, and he launched a six-year, $35 million program to replace them.

On Thursday, when the balky computer system burped and began working again, Leggett said he would expedite the effort to replace it and won't allow a lack of money to hold up the work.

But Leggett's mandate to expedite the project runs headlong into the reality that such work can't happen overnight. The transition had been planned over six years because, to sum up the list of reasons provided by a county spokeswoman, that's simply how long it takes to complete the complex task.

Jeff Schiller, who manages the computer network at the Massachusetts Institute of Technology, sympathized with the immediate problem and with the long-term dilemma faced by public agencies.

"Building data centers is a very tricky and very expensive proposition if you want to be very sure it's never, ever going to fail," he said.

The very desire to avoid a network failure like Metro's or Montgomery's makes the systems all the more complex and increases the challenge of finding the problem when they fail, he said.

The county and the transit system used the same jargon in describing the breakdown -- "a single point of failure" -- which means that a chink in the armor of redundant systems brought the whole thing down.

"Redundancy adds complexity, and complexity results in complex failures," Schiller said. "The result of that is failures that are really hard to figure out."
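The "single point of failure" idea can be made concrete with a little arithmetic. The sketch below is only an illustration: the availability figures are hypothetical, not taken from Metro or Montgomery County, and it assumes the parts fail independently of one another. It shows that two redundant units are, together, far more dependable than one, yet the whole system can be no more dependable than any shared part that both units rely on.

```python
def series_availability(*parts: float) -> float:
    """Availability when every part must work; any one failure stops the system."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


def parallel_availability(*parts: float) -> float:
    """Availability when one working part is enough (redundant copies)."""
    all_fail = 1.0
    for availability in parts:
        all_fail *= 1.0 - availability
    return 1.0 - all_fail


# Hypothetical figures chosen for illustration only.
UNIT = 0.99     # each of two redundant units is up 99% of the time
SHARED = 0.995  # one shared component that both units depend on

redundant_pair = parallel_availability(UNIT, UNIT)
whole_system = series_availability(redundant_pair, SHARED)

print(f"redundant pair alone: {redundant_pair:.4%}")
print(f"pair plus shared part: {whole_system:.4%}")
```

Under these assumed numbers the redundant pair on its own is up about 99.99 percent of the time, but the full system falls below the 99.5 percent of the shared component, which is the chink in the armor.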

That's why, he speculated, it took more than 40 hours to find the fix when the Montgomery computer and its modem stopped sharing information with the county's more than 750 stoplights.

The larger question of how and when to invest in decaying public infrastructure is particularly challenging when it comes time to replace computer systems, he said. A pothole, for example, is a more obvious public menace than an aging but unseen hard drive.

"The argument of saying, 'Let's replace this thing which is working fine, and we want to replace it with something that will do exactly the same thing,' that's a hard sell," Schiller said. "The answer always will be, 'It's working fine, and if we don't fill the pothole or do the paving, people will be upset.' "

Warning that a hard-drive meltdown could be disastrous, he said, is "like saying there's going to be an earthquake. People would rather not believe it, and they figure they'll deal with it if it ever really happens."

If government operated like a business, he said, it would calculate the likely lifetime of a computer system and begin banking money against the day when it needed replacement.

"Public entities can't do that because if they try to save and have a pot of money lying around, somebody's going to say, 'Hey, let's spend that,' " Schiller said.
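The business-style approach Schiller describes, estimating a system's lifetime and banking money toward its replacement, amounts to a simple reserve-fund calculation. The sketch below is an illustration under stated assumptions, not a description of how any agency actually budgets: the function name `annual_set_aside` is hypothetical, deposits are assumed to be equal and made at the end of each year, and the example borrows the article's $14 million replacement cost while treating 37 years as the lifetime purely for the sake of the arithmetic.

```python
def annual_set_aside(replacement_cost: float,
                     lifetime_years: int,
                     annual_rate: float = 0.0) -> float:
    """Level year-end deposit that grows to replacement_cost after lifetime_years."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    if annual_rate == 0:
        # No interest earned: spread the cost evenly over the lifetime.
        return replacement_cost / lifetime_years
    # Standard sinking-fund formula: cost * r / ((1 + r)^n - 1).
    growth = (1 + annual_rate) ** lifetime_years
    return replacement_cost * annual_rate / (growth - 1)


# Illustration only: $14 million saved over an assumed 37-year life, no interest.
yearly = annual_set_aside(14_000_000, 37)
print(f"${yearly:,.0f} per year")  # roughly $378,000 a year
```

Any interest earned on the reserve would lower the yearly deposit, which can be explored by passing a nonzero `annual_rate`.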

© 2009 The Washington Post Company