Success Relaxed NASA's Vigilance
Assumption of Reliability May Have Blurred Symptoms of Failure
Another in an Occasional Series.
Three years before their first powered flight in 1903, Orville and Wilbur Wright experimented in gliders on North Carolina's Outer Banks. To allay parental fears, Wilbur Wright wrote his father: "The man who wishes to keep at the problem long enough to really learn anything positively must not take dangerous risks. Carelessness and overconfidence are usually more dangerous than deliberately accepted risks."
Wright thus became the aerospace industry's first safety analyst. He understood that flying was risky, but felt the dangers could be controlled and accepted if they were not ignored. His views can easily be used as a yardstick to measure today's efforts by the National Aeronautics and Space Administration to make space flight routine.
It is obvious, four months after the space shuttle Challenger exploded and killed seven astronauts, that NASA forgot the basic rules. Those rules depend on constant vigilance by humans, which can be easily relaxed by success, the pressures of institutional politics and cost.
The vigilance must not end when a shiny new aircraft or spacecraft is rolled out the hangar door, because engineers will never anticipate all the things that can go wrong with their machines.
The technology trap comes after the product has worked reliably. The assumption sometimes is that it will always work reliably -- even if the machine is displaying symptoms that all is not well.
NASA flies machines that are more dangerous to their occupants than any modern airplane. The faulty solid rocket booster joint that is receiving the blame for the Jan. 28 shuttle explosion is but one of 748 known elements that can go wrong during a shuttle flight and kill the crew.
NASA decided during its design phase to take risks that the much-maligned Federal Aviation Administration would never dream of permitting an airline or aircraft manufacturer to take. Therefore, NASA implicitly accepted the responsibility to be unusually alert after operations began.
The FAA requires that passenger aircraft have more than one engine. If an engine fails even in the most critical moments of flight -- takeoff and landing -- the remaining engine or engines must provide adequate power. If just one of NASA's rocket engines fails, there is no assurance of safe flight, although some failures would be survivable.
The shuttle's orbiter has only one hydraulic system to power the elevons, the flight control panels at the rear of the wings. That system is tested and regarded as highly reliable, but if the system failed it would be difficult if not impossible to land the orbiter safely. The FAA requires commercial airliners to have redundant hydraulic systems; if one fails, another is there to assure that the pilot can control the plane.
'Single-Event Success'
Shuttle experts agree it was inevitable that an accident would eventually destroy a shuttle. What surprised them was the cause this time: a faulty seal on one of the solid rocket boosters. "We lost the shuttle for a stupid reason, because the simplest part of the system failed," one specialist said.
NASA's published risk assessment process seeks to prevent stupid accidents. So what went wrong? John Brizendine, chairman of NASA's Aerospace Safety Advisory Panel and the retired president of Douglas Aircraft Co., put it this way: "I have a thing about NASA's single-event success syndrome . . . . It pervades the agency. 'We've done it, so it's got to be good' . . . . NASA was no less safety conscious from the standpoint of not wanting anything to happen, but it doesn't take long for human nature to dull its senses."
Failure after success "may be the frailty of the human element," said L.C. Raborn, director of Delta rocket programs for McDonnell Douglas Astronautics Co. Interviewed before a Delta was destroyed shortly after liftoff on May 3, he noted that there had never been a Delta failure caused by a malfunction of a rocket part being used for the first time. Rather, he said, problems had come only from equipment that had previously been flown successfully.
Others suggest that the problem is deteriorating quality control. Brian Stockwell, an insurance broker for satellites and other commercial space ventures, said, "There is a general feeling among the underwriting industry that a little sloppiness has crept in" to the manufacturing of satellites and rockets.
Sen. Albert Gore Jr. (D-Tenn.) recently released figures showing that since 1971 NASA has cut by 71 percent the number of people responsible for monitoring the quality and reliability of its equipment.
It is easy to forget that NASA's record included 24 successful shuttle flights and 43 consecutive successful Delta launches, all glorified by a public relations effort that disarmed would-be critics and may have contributed to the dulling of senses at NASA.
But failure is also part of NASA's history. Even the wondrous Apollo program, which put men on the moon and brought them back safely, had three disasters, two of them largely forgotten.
The best known was the 1967 fire in the command capsule of Apollo 1 that killed three astronauts in training. The unmanned Apollo 6 spacecraft took off in April 1968 for space unknown because a pipe had been installed improperly. There was an explosion on Apollo 13 in April 1970 that cost the mission, although the astronauts were able to use their lunar lander as a life raft to return safely to Earth.
NASA assesses risk for the shuttle roughly the way the FAA decides if an airplane is fit to carry passengers. But unlike commercial aviation, where the FAA writes the rules for the manufacturers and expects them to be followed, NASA writes its own rules and appeals problems to itself.
A Capitol Hill expert on risk analysis, who asked not to be identified, said, "The FAA's regulatory system says that you make it as safe as you can and then see" whether a new plane meets the requirements. "NASA's assumption is that, 'The flight's going to go and let's do it as safely as we can.' "
The decision to go ahead despite uncertainty comes down to "a question of judgment, reached by pooling the talents of a number of people whose opinions you respect, and by agonizing," said John L. McLucas, retired executive vice president of the Communications Satellite Corp. and a former FAA chief.
One Chance in a Billion
FAA regulations require that if a plane would not survive the failure of a single part, there must be a back-up part or the manufacturer must prove that there is only one chance in a billion that the part will fail.
NASA regulations require that if a space machine would not survive the failure of a single part, there must either be a method to assure that it will not fail or a committee of NASA experts must determine that the risk is acceptable and waive the requirement.
NASA does not use mathematical probabilities such as one chance in a billion to determine the safety of its space machines, despite much mythology to the effect that the Apollo program was designed around the concept that there would be 99.9 percent reliability. That would be one chance of a failure in 1,000.
"Even with 99.9 percent reliability, we could expect 5,600 defects in Apollo 8's 5.6 million parts," former NASA safety director Jerome F. Lederer said, and clearly the Apollo did better than that.
The exemptions that NASA permits to its safety rule are there for reasons of logic as well as cost and weight. They are:
Primary structure, meaning among other things wings and the tail fin, which cannot sensibly be duplicated but which must not fail if flight is to continue.
The tiles glued to the exterior of the orbiter to protect it from the heat of reentry into the Earth's atmosphere.
Pressure vessels such as fuel tanks, on the grounds that if one fails, a backup system would do no good.
Premature firing of explosive bolts, which separate stages of the shuttle as it climbs, or range-safety destructive devices such as those that blew up the solid rocket boosters to stop their erratic flights after Challenger exploded and the Delta after it began to come apart. If such devices fire before schedule, a duplicate would do nothing to prevent it.
Waivers and most exempted items become entries on the shuttle's "Critical Items List." Items whose failure would result in the loss of the shuttle and its crew are called "Criticality 1" by NASA. There are also "Criticality 2" items, whose failure would result in the loss of a mission but not the shuttle or crew.
The 748 Criticality 1 items on the shuttle include 335 on the orbiter, 133 for the external tank, 114 for the solid rocket boosters and 94 for the main engines.
If a Criticality 1 item has a backup system, making it fail-safe, it is called "Criticality 1R." The R stands for "redundant."
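The three labels described above amount to a small decision rule, sketched here for illustration. The `Consequence` enum and the `criticality` function are hypothetical names, not NASA's actual procedure, and the sketch covers only the categories named in this report.

```python
from enum import Enum


class Consequence(Enum):
    """Worst outcome if the item fails (hypothetical labels)."""
    LOSS_OF_SHUTTLE_AND_CREW = "loss of shuttle and crew"
    LOSS_OF_MISSION_ONLY = "loss of mission, not shuttle or crew"


def criticality(consequence: Consequence, has_backup: bool = False) -> str:
    """Return the Critical Items List label for an item.

    Only Criticality 1, 1R and 2 are modeled, because those are the
    only categories described in the text. The backup flag is ignored
    for Criticality 2 items, since no redundant variant of that
    category is mentioned.
    """
    if consequence is Consequence.LOSS_OF_SHUTTLE_AND_CREW:
        return "Criticality 1R" if has_backup else "Criticality 1"
    return "Criticality 2"


print(criticality(Consequence.LOSS_OF_SHUTTLE_AND_CREW, has_backup=True))
print(criticality(Consequence.LOSS_OF_SHUTTLE_AND_CREW))
print(criticality(Consequence.LOSS_OF_MISSION_ONLY))
```

Under this rule, an item moves from "Criticality 1R" to "Criticality 1" the moment its backup can no longer be counted on.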
The critical items lists that have been released by NASA are 3 1/2 inches thick, page after page of engineering analyses of the effect if something goes wrong and why it probably won't, thus making the risk acceptable.
The lists are revised for each shuttle mission, depending partly on what dangerous things are in the payload and what changes have resulted from experience gained from earlier flights.
When the space shuttle was under construction, the reliability of the main engines was a major concern of designers. The individual engines are the most powerful ever built and push the edge of rocket technology.
NASA required that three of the engines be tied together, as they would be on the shuttle's orbiter, and fired 40 times without failure. There were several expensive explosions before the engineers got it right.
Extensive Booster Tests
"My real worry about that vehicle was and is and will continue to be the main engines," said John F. Yardley, president of McDonnell Douglas Astronautics and the chief of NASA's manned space flight program from 1974 through 1981, when the shuttle was being developed.
The solid rocket boosters also were tested extensively, although solid rockets cannot be tested the way liquid-fueled rockets can because solids cannot be turned off and on. Once ignited, they go. Many tests are done with small, scaled-down versions because the real thing is so enormous.
When the shuttle Challenger exploded, the accident sequence started after two washer-like devices called O-rings failed to seal a joint between segments of the right-hand solid rocket booster, permitting hot gases to shoot out the side. There was a considerable debate the night before the launch about whether subfreezing weather could adversely affect the performance of the O-rings, as is widely suspected.
"Did we do all the tests we should have on the boosters ?" Yardley asked rhetorically. "I don't know, but I thought we did. But we never did fire lower than 50 degress."
While cold weather clearly contributed to the disaster Jan. 28, it can be argued that temperature was a secondary problem in an O-ring system that had given plenty of warning it was fragile. The warnings either were not heard or were ignored -- the kind of human decision that may seem perfectly reasonable at the time but is incomprehensible in hindsight.
Just as the FAA requires airplane manufacturers to track their creations to spot problems and make corrections, NASA requires its contractors to do the same. Again, the highly complex and inherently dangerous nature of the shuttle makes after-flight monitoring critical.
With a large fleet of aircraft, such as 1,800 Boeing 727s worldwide, an enormous statistical record can predict performance and answer whether a problem is one of a kind or a trend. Over the years, Boeing has recommended 2,200 changes to the 727. Ninety-nine of the changes have been considered of such importance to safety that the FAA has required that they be made.
The shuttles write history every time out with their exotic combination of volatile fuels and edge-of-technology engines expected to perform in hostile environments.
O-ring problems first became evident on the second shuttle flight in November 1981, or 23 flights before Challenger, when significant erosion of an O-ring was discovered in postflight examination. That was the first warning of trouble; meanwhile, another problem was developing.
NASA originally called the O-rings "Criticality 1R" items, because it was believed that a second, redundant O-ring would keep the shuttle flying safely if the first failed.
However, experience showed that the second O-ring would not always seal properly because of rotation in the joint under the pressures of launch. Thus, the O-rings had to have a waiver if shuttle flights were to continue. On March 28, 1983, NASA headquarters signed the waiver.
By that time there had been five shuttle flights and O-ring problems on only the second, so the human alarm system was muted. The justification for the waiver included tests that showed that "nine static firings and the flights have resulted in over 270 joints tested with no evidence of leakage," according to NASA documents.
But after the waiver was signed, there were six more partial failures of either primary or secondary O-rings on five launches before Challenger's last flight. NASA and its contractors were studying the joints and the O-ring system to see if they could improve it, but they did not stop the flights.
An engineer with ties to NASA noted that, while the design of the O-rings can be properly criticized, that is the lesser issue. "My concern is that NASA did not get on it and fix it" after problems were discovered.
Just as manufacturers do to their airplanes, NASA has made many changes to the shuttle since its first flight in 1981, although the precise number could not be learned. There have been improvements in the tiles that protect the shuttle from the heat of reentry, a continuing program to upgrade the main engines and a major effort to improve the braking after the shuttle lands.
Worst U.S. Airline Crash
The requirement for continuous vigilance sometimes escapes the airlines and the FAA just as it did NASA.
A 1970 safety analysis by McDonnell Douglas for its DC10 anticipated the unplanned retraction on takeoff of control surfaces on the forward edge of one wing but not the other, thus unbalancing the airplane's flyability.
If a second problem such as an engine failure were combined with a control surface retraction on one wing only, "it would be critical only under the most adverse flight or takeoff conditions. The probability of both failures occurring is less than" one in a billion, the analysis said. The FAA approved the design.
The nation's worst airline crash occurred May 25, 1979, at Chicago's O'Hare International Airport. Just as the nose wheel of a departing American Airlines DC10 was leaving the ground, the engine under the left wing fell off and the front-of-wing control surfaces retracted on the left wing only. The plane climbed slightly, rolled to the left, and crashed, killing 273 people.
The engine fell off because of maintenance damage, something not anticipated in failure and probability analyses, but something that had happened to another DC10 at another airline. There the maintenance-induced damage was discovered and repaired. A modest report -- by no means alarming -- was sent by McDonnell Douglas to other airlines and the FAA, to little effect.
DC10s have since been modified to assure that their wing control surfaces will not retract without command, even if an engine falls off. The engine bracket has been modified to make it less susceptible to maintenance damage.
Few have a more hard-boiled view of the risks of anything than those in the insurance business. Broker Stockwell is president of Corroon & Black Inspace, which helps clients who want to launch satellites decide what insurance they should buy and then attempts to place it. A typical premium is 15 to 20 percent of the cost of the satellite, or $15 million to $20 million for a typical $100 million communications satellite.
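The premium arithmetic above can be reproduced with a short sketch. The function name `premium_range` is hypothetical, and the 15 and 20 percent defaults are simply the typical rates quoted here, not a pricing model used by any underwriter.

```python
def premium_range(satellite_cost: int, low_pct: int = 15, high_pct: int = 20) -> tuple[int, int]:
    """Return (low, high) premium in dollars for a given satellite cost.

    Integer percentages are used so the result is exact whenever the
    cost is a whole multiple of $100.
    """
    return satellite_cost * low_pct // 100, satellite_cost * high_pct // 100


# A typical $100 million communications satellite.
low, high = premium_range(100_000_000)
print(f"${low:,} to ${high:,}")  # $15,000,000 to $20,000,000
```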
From Stockwell's viewpoint, a space mission is a failure if the satellite does not both reach proper orbit and work when it gets there. In the decade ended in 1985, according to Stockwell's records, there were 182 launches of insured satellites on conventional rockets worldwide and 26 failed, which does not necessarily mean the rocket failed. Twenty-seven insured satellites were launched by the shuttle, and three of them failed, although not because of shuttle failure itself. Two of those were recovered on subsequent shuttle flights.
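Stockwell's counts can be turned into failure rates with a few lines of arithmetic. The sketch below uses his definition of failure (the satellite did not reach proper orbit and work), so the rates describe missions, not launch vehicles; the function name `failure_rate` is hypothetical.

```python
def failure_rate(failures: int, launches: int) -> float:
    """Fraction of insured satellite launches counted as failures."""
    if launches <= 0:
        raise ValueError("launches must be positive")
    return failures / launches


# Counts quoted from Stockwell's records for the decade ended in 1985.
conventional = failure_rate(26, 182)
shuttle = failure_rate(3, 27)

print(f"Conventional rockets: {conventional:.1%}")  # 14.3%
print(f"Shuttle: {shuttle:.1%}")                    # 11.1%
```

With only 27 shuttle launches in the sample, a single additional failure would move the shuttle figure by nearly four percentage points, so the two rates should be compared with caution.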
Assessing the risk of space missions is difficult for two reasons, Stockwell said. "There is no unified body or system for guaranteeing the quality of space products" and "the space industry is not mature enough" to have established a meaningful track record.
Before the Challenger explosion, "We said that the chances of a mission being lost due to shuttle failure were 6 percent," Stockwell said. He does not expect much change when the shuttle returns to service.
Staff researcher James Schwartz contributed to this report.