It was exactly 2:25 p.m. on Jan. 15 when, out of the corner of his eye, Jim Nelson spotted an alarming sea of red spreading across the screens of 75 video monitors in the control center of AT&T's vast long-distance network.
The screens normally are filled with bland charts and maps of the United States. For Nelson, the manager of the Bedminster, N.J., center, the red warning signals were an unmistakable sign of crisis.
"We have the big one," an assistant exclaimed.
The nation's largest telephone network had virtually collapsed, frustrating millions of Americans who were blocked from making long-distance calls for nine hours and sending a team of more than 100 phone company technicians on a frantic search for the cause.
They found it in the software that controls the system's computers and electronic switches -- a small, undetected error in the web of written instructions that tell the equipment what to do. An unexpectedly heavy flow of calls had overwhelmed a weak point in the system, and American Telephone & Telegraph Co. computers, lacking instructions on how to deal with the unforeseen overload, simply shut down.
The calamity that struck AT&T that Monday afternoon is just the kind that many experts have increasingly come to fear as software reaches deeper into everyday life. In a generation's time, software has emerged as the ubiquitous control system of an automated society, a $125 billion-a-year industry that is an essential underpinning of America's economic and political standing in the world.
Software controls banking and airline reservations networks and is critical to U.S. defense systems. It decides when to buy and sell huge blocks of stock. It is buried inside videocassette recorders and the dashboards and fuel systems of automobiles. It picks lottery winners and flushes toilets in the new Boeing 747-400. It helps physicians select and administer treatments. And, by crunching billions of instructions each second, it can simulate nature to help researchers unravel man's genetic makeup or predict hurricanes.
A miracle of human ingenuity, software instructions translate the tasks requested by humans into electronic commands that computers can follow. Software converted this reporter's keystrokes into letters on a computer screen; other software converted those letters into type for this newspaper page. The computers involved in those operations are lifeless combinations of silicon chips and electronic parts that only software can activate.
Most software routinely performs as expected, but as society demands more and more from software and the computers it controls, errors and failures like AT&T's could easily become more common. According to scores of computer scientists and other specialists interviewed for this series of articles, the nation's ability to produce software on time and with high reliability is in jeopardy.
Software problems already affect many sectors of society. One of the most important is the Pentagon, whose increasingly high-tech weapons systems depend -- with uneven success -- on some of the world's most elaborate computer programs. Another is human health, which can be threatened by faulty software. And at giant corporations, huge investments can be undermined by delayed and over-budget software projects, which are now routine.
The Bank of New York once had to borrow $24 billion overnight from the Federal Reserve, incurring $5 million in instant interest costs, because a software glitch left it without enough funds to balance its account with the Fed. Inadequate software used to process student loans may cost a group of international lenders up to $650 million. Wells Fargo Bank in California vastly overstated the income of 22,000 employees in reports to the Internal Revenue Service because a programming error moved the decimal point two places to the right.
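Moving a decimal point two places to the right multiplies every figure by 100. The bank's actual error is not described here. The following is only a hypothetical sketch of one common way such a slip can happen: a program stores money as a whole number of cents but prints that number as if it were dollars. The function names and the $35,000 salary are invented for illustration.

```python
# Hypothetical illustration only; not the bank's actual code.
# Amounts are stored as whole numbers of cents (assumed non-negative).

def format_dollars(cents: int) -> str:
    """Correct formatting: 3,500,000 cents is $35,000.00."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars:,}.{remainder:02d}"


def format_dollars_buggy(cents: int) -> str:
    """Hypothetical bug: the cent count is printed as a dollar count,
    which moves the decimal point two places to the right."""
    return f"${cents:,}.00"


salary_cents = 3_500_000  # an invented $35,000 salary
print(format_dollars(salary_cents))        # $35,000.00
print(format_dollars_buggy(salary_cents))  # $3,500,000.00
```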
Last year, a mysterious defect paralyzed the American Airlines reservations system for nine hours. Though the carrier located the general problem area, it still isn't certain precisely why the software ran amok.
Perhaps one-quarter of all software projects are so troubled that they are simply canceled in midstream, according to Software Productivity Research, a Cambridge, Mass., consulting firm. The state of Washington, for example, last year pulled the plug on a seven-year, federally backed $20 million automation effort designed to give social service caseworkers more time to spend with their clients. One complaint: The program kept caseworkers waiting 20 minutes for computerized files.
Programmer Supply Declining
While such problems are multiplying, the supply of new programmers and software designers is declining. Following a sharp rise in the late 1970s and early 1980s, interest in computing jobs has plummeted among college freshmen, the fastest collapse ever recorded for a career preference in the 23 years that the University of California at Los Angeles has conducted such surveys. The reason most commonly given: Computer jobs are no longer considered glamorous.
But demand for software is expanding relentlessly, driven by society's insatiable appetite for new uses and the ability of computers to perform calculations at ever greater speeds. Each year, computers have been providing 25 percent more power per dollar, while the productivity of people who produce software has been rising at less than half that rate.
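To see why the two trends collide, compound them. This sketch assumes both rates hold steady year after year, which is a simplification. It uses 12.5 percent, half of 25 percent, as a ceiling for programmer productivity, since that rate is described as less than half the hardware gain. The ten-year horizon is an arbitrary choice.

```python
def compound(rate: float, years: int) -> float:
    """Growth factor after `years` of constant annual growth at `rate`."""
    return (1 + rate) ** years


YEARS = 10                                # arbitrary horizon for illustration
HARDWARE_RATE = 0.25                      # 25 percent more power per dollar each year
PRODUCTIVITY_CEILING = HARDWARE_RATE / 2  # "less than half that rate": an upper bound

hardware = compound(HARDWARE_RATE, YEARS)
programmers = compound(PRODUCTIVITY_CEILING, YEARS)

print(f"Computing power per dollar: {hardware:.2f}x")
print(f"Programmer productivity, at most: {programmers:.2f}x")
print(f"Gap, at least: {hardware / programmers:.2f}x")
```

Under these assumptions, computing power per dollar grows about 9.3-fold in a decade while productivity grows at most about 3.2-fold. The gap of nearly threefold, (10/9) raised to the 10th power, keeps widening every year the trends continue.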
"The amount and quality of software we need is increasing constantly, and our ability to produce it is essentially stagnant. Those two things are on a collision course," warned William Wulf, former head of the National Science Foundation's office of computer and information science. It is "absolutely a problem of much larger dimension than most people realize," said Wulf. The consequence, he fears, will be a slowing of technological progress and in turn a decline in the country's economic competitiveness.
"Software can well become the limiting factor in what we can do in building systems in the future," said Norman Augustine, chairman of Martin Marietta Corp. in Bethesda, Md. The bottleneck could affect "space systems, telephone systems, automobile systems or any other complex technological device," he said.
Other experts warn that, as computers increasingly take over decisions formerly made by human beings, software producers and the public may be placing too much confidence in a technology that defies perfection.
"I'm worried that people are putting too much reliance on computers without enough understanding of the potential risk that they may be adding," said Nancy Leveson, a professor specializing in software reliability at the University of California at Irvine.
Problems with software have claimed a handful of lives, and the potential for software-triggered breakdowns to affect public health and safety "will be much worse in the future than it has been in the past," said John Guttag, an industry consultant and Massachusetts Institute of Technology computer science professor.
Man's 'Most Complex Artifact'
Large software systems, in the words of John Shore, a Washington author and software engineer, are "by far the most complex artifact" built by man. It is impossible for designers to predict how complex software will function in every circumstance, and when failures do occur they may never be fully comprehended even by those who crafted the code.
"The programs we construct are effectively too large for humans to understand," said Wulf. "Yet every characteristic of them depends upon the human's ability to understand them, to cope with them."
The challenges confronting the software industry center on the tension between the rigid, precise demands of electronic technology and the spontaneous creativity of programmers and software designers -- with their capacity for human error.
Hidden and intangible when in operation, software takes form as the excruciatingly detailed instructions known as computer "code." Generally written by professional programmers, the code is gibberish to the uninitiated. But it actually is a logical structure of step-by-step commands and decisions, bearing some resemblance to English in its use of letters, numbers and symbols. A line of code is akin to a sentence of instruction.
The instructions are either stored electronically in computer chips, like those inside video games, calculators and automobile emission-control systems, or recorded on magnetic disks and tapes linked to computers.
In recent years, software programs have swelled from something easily handled by a lone "hacker" -- as computer enthusiasts are known -- to systems too large to be grasped by a single mind.
One popular software program for personal computers known as "dBase," designed to manage large amounts of data, was written a decade ago by two programmers and required fewer than 50,000 lines of code. It took a team of more than 100 people three years to write a new, more sophisticated version of the program, and even then the 400,000 lines of code they delivered -- six months late -- were so laden with defects that publisher Ashton-Tate Corp. of Torrance, Calif., provided buyers a "bug list" of flaws during the 21 months it spent making further corrections.
Still, a program like dBase is a small job compared with the software produced by the aerospace and defense industries. Those projects often run more than 1 million lines -- roughly equivalent to the listings in the Manhattan phone book.
As a product of human minds -- with their wide variances in skill and judgment -- software is not a task easily reduced to tools, mass production or standard parts. Nor is there enough effort to transfer know-how from project to project, causing wasteful duplication.
"The problem is that software has the highest manual labor content of almost any manufactured item in the second half of the 20th century. It's like building pyramids or handcrafting Rolls-Royces," said Capers Jones, chairman of Software Productivity Research.
"We're still building software in many ways the same way we were 30 or 40 years ago," said Max D. Hopper, senior vice president for information systems at American Airlines in Dallas.
Indeed, software development frequently is treated more like an art than a science, with design and testing often dictated more by personal choice than by regimen. Software developers, a fragmented community of independent-minded souls, lack the widely accepted safety standards and engineering discipline applied to the manufacture of mechanical and electrical equipment. Programmers need no license, no particular academic degree and no other official credential to build a software structure, though their creation may be as critical as any bridge or skyscraper.
Coping With Ambiguity
Software problems begin long before the first line of code is written. In trying to take on tasks or decisions formerly handled by people, or new challenges never before conceived, software must translate all the ambiguity of human thought into rigid commands that a computer can follow.
This means that even before writing code, software developers must try to imagine all the different circumstances to which the computer or electronic equipment ultimately may need to respond, a virtually impossible task. And often the people writing the software have little understanding of the industry that is going to be using it.
"Imagine building a skyscraper and then realizing you forgot to leave space for a water system," said William Scherlis, software technology program manager at the Pentagon's Defense Advanced Research Projects Agency. "That's what happens in software all the time."
There is also the endless temptation to keep tinkering with and adjusting what has largely been completed, a practice that can cause other parts of a program to unravel.
Poor management, as much as anything, is to blame for poor software, experts say. Top corporate managers, many lacking an understanding of software, often don't know how to plan for something they can't see or touch. With little in the way of a standard blueprint to help visualize the outcome of a software effort, many companies and government agencies fail to gauge the challenge or create a structure necessary to see large projects through to completion.
Those in charge of software projects routinely miscalculate the magnitude of their project, a mistake Allstate Insurance Co. officials, for instance, readily acknowledge making.
Allstate, based in Northbrook, Ill., hoped a new computer program would cut as much as 75 percent of the time it takes to devise new life-insurance policies. In 1988, just before the system was supposed to be completed, the company realized the project was badly off track, and it started over. Now it predicts that the work will not be completed until 1992 -- and at three times the original cost.
"I don't believe we had recognized the level of planning that was needed," said project chief Ben Currier. "I don't think we had the proper management procedures in place."
American Airlines paid a steep price when it tried to add international fares to existing software before managers had the right information at hand. Too late, they discovered the fare-calculation formulas were incorrect and insufficient, causing development time and costs to double and leaving agents unable to function as planned. "We totally screwed up," senior vice president Hopper conceded.
Many companies that develop software for their own payroll, inventory tracking and other essentials of business estimate that they are so backed up in their software development that if they stopped getting new assignments today, programmers would spend the next three years completing their backlog of requests.
Changes Can Cause Trouble
A major software failure like the one in January at AT&T can be traced to any combination of human error, design flaws and project mismanagement.
The problem with AT&T's software turned out to be a mistake made in just one line of a 2 million-line program used to route calls. Software is structured much like a road map, with many of the lines directing the software where to go next. The flawed line, or software "bug," in the AT&T program sent the call-processing mechanism to an incorrect place in the code, where the next instruction it encountered made no sense, thus disabling the equipment.
As is often the case, the fatal bug had been injected into the system when AT&T altered the software a month earlier to fix an unrelated flaw. The ability to alter software with relative ease causes many of its problems, since small changes can cause larger disruptions elsewhere.
The glitch surfaced only when telephone traffic was so heavy that two calls happened to arrive at a troubled switch within one-hundredth of a second of each other. Despite months of testing, AT&T had failed to prepare for this exact sequence and pace of events.
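The failure described above combines two elements: a single bad line that sends the program to the wrong place, and a path that is taken only under rare timing. A toy program can illustrate both. Everything in this sketch is hypothetical and vastly simplified, and it is not AT&T's code. The step names are invented, the "program" is a small table in which each step names the step to run next, and the 0.01-second window stands in for the one-hundredth of a second cited above.

```python
# Hypothetical toy model: each step returns the name of the step to run next,
# much like the "road map" lines that tell software where to go.

WINDOW = 0.01  # seconds; a second call inside this window takes the rare path


def run(program, gap):
    """Follow the program from 'start' until 'done'.

    `gap` is the time in seconds between two incoming calls. Returns the list
    of steps visited, or raises RuntimeError if control reaches a step that
    does not exist, the toy equivalent of the equipment shutting down.
    """
    step, visited = "start", []
    while step != "done":
        if step not in program:
            raise RuntimeError(f"no valid instruction at {step!r}; switch halts")
        visited.append(step)
        step = program[step](gap)
    return visited


CORRECT = {
    "start": lambda gap: "hold_second_call" if gap < WINDOW else "route_call",
    "hold_second_call": lambda gap: "route_call",
    "route_call": lambda gap: "done",
}

# One changed line: the rarely used path now points to the wrong place.
FLAWED = dict(CORRECT, hold_second_call=lambda gap: "reset_buffer")

print(run(FLAWED, gap=0.5))   # ordinary traffic: ['start', 'route_call']
try:
    run(FLAWED, gap=0.005)    # two calls 5 milliseconds apart
except RuntimeError as err:
    print(err)                # no valid instruction at 'reset_buffer'; switch halts
```

Testing with ordinary, well-spaced traffic never reaches the altered line, so the flawed table behaves exactly like the correct one. Only the tight timing exposes the error.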
The AT&T breakdown underscored the trade-off between achieving greater performance and taking greater risk. Software has evolved as the technological backbone of modern society because, in most cases, it is much quicker and more reliable than humans. Many of today's conveniences are possible only because software has taken over where humans or machines left off, carrying on tasks with amazing speed and without wearing out or tiring. And like humans, software has a seemingly endless capacity to adapt to change.
But ultimately, software's performance depends on humans -- on people's ability to turn imprecise human preferences into a master plan that can operate flawlessly, without the benefit of common sense to guide it through unexpected situations. These days, as the people who write software race on an accelerating treadmill to keep up with demand, concerns are rising that they are being pushed too far, too fast.
As the Bell Laboratories vice president who presided over the AT&T software-repair mission, Karl Martersteck knows that dilemma well. "With complexity," he said, "you increase the number of things that can go wrong."
NEXT: The challenge of "debugging"