Now a study from the Federal Communications Commission offers the most in-depth explanation of the outage and why it occurred. In a 40-page report, the FCC found that an entirely preventable software error was responsible for causing 911 service to drop. The incident affected 81 call dispatch centers, rendering emergency services inoperable in all of Washington and parts of North Carolina, South Carolina, Pennsylvania, California, Minnesota and Florida.
"It could have been prevented. But it was not," the FCC's report reads. "The causes of this outage highlight vulnerabilities of networks as they transition from the long-familiar methods of reaching 911 to [Internet Protocol]-supported technologies."
At the center of the disruption was a system maintained by a third-party contractor, a Colorado-based company called Intrado. Intrado owns and operates a routing service, taking in 911 calls and directing them to the most appropriate public safety answering point, or PSAP, in industry parlance. Ordinarily, Intrado's automated system assigns a unique identifying code to each incoming call before passing it on — a method of keeping track of phone calls as they move through the system.
But on April 9, the software responsible for assigning the codes maxed out at a pre-set limit; the counter literally stopped counting at 40 million calls. As a result, the routing system stopped accepting new calls, leading to a bottleneck and a series of cascading failures elsewhere in the 911 infrastructure.
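The failure mode described above can be illustrated with a minimal sketch. It is not Intrado's actual software; the class name, function name, and the tiny limit used here are hypothetical, chosen only to show how a counter that stops at a pre-set cap causes every later call to be turned away.

```python
class CallIdAssigner:
    """Hypothetical model: hands out sequential call IDs up to a fixed cap."""

    def __init__(self, limit):
        self.limit = limit      # pre-set ceiling (the report cites 40 million)
        self.next_id = 0        # number of IDs issued so far

    def assign(self):
        # Once the cap is reached, no further IDs are issued.
        if self.next_id >= self.limit:
            return None
        self.next_id += 1
        return self.next_id


def route_calls(assigner, incoming):
    """Route each call that receives an ID; count the ones that do not."""
    routed, dropped = [], 0
    for call in incoming:
        call_id = assigner.assign()
        if call_id is None:
            dropped += 1        # call never reaches a PSAP
        else:
            routed.append((call_id, call))
    return routed, dropped


# Usage: a cap of 3 stands in for the real limit.
assigner = CallIdAssigner(limit=3)
routed, dropped = route_calls(assigner, ["a", "b", "c", "d", "e"])
```

In this sketch the first three calls are routed and the remaining two are silently dropped, which mirrors the bottleneck the FCC describes.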
At first, Intrado thought that the complaints arising from various PSAPs around the country were just isolated, unconnected events, even though alarm bells were going off an hour into the breakdown. Nobody noticed the warnings until it was too late; the server taking note of the alerts categorized them as "low level" incidents, and they were never flagged for a human, according to the FCC report.
"It appears that Intrado was not able to fully understand the significance and breadth of the problem until around 2 a.m. PDT," the FCC said in its report. Any backup 911 call centers that could have helped with the bottleneck never did, because they themselves were suffering from the same problems everyone else was, according to the FCC. In Washington state alone, 4,500 calls to 911 failed to go through during an eight-hour period, said Dave Danner, the chairman of the state utility commission.
What Danner described as a "single coding error" could become more commonplace as 911 services begin relying more heavily on automated, Internet-powered infrastructure. Public safety officials recorded no major outages affecting an entire state or multiple states over the past three years. In 2014 alone, there have been four. They include an outage in Hawaii and one in Vermont.
The day after April's massive outage, officials increased the counter cap and began checking it weekly to make sure that no more blockages occur. Intrado also created a new alarm for when the number of successful calls drops below a certain percentage.
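An alarm of the kind described above could look roughly like the following sketch. The function name and the 95 percent threshold are assumptions made for illustration; the report does not specify the percentage Intrado chose or how the check is implemented.

```python
def success_rate_alarm(attempted, completed, threshold=0.95):
    """Return True when the share of completed calls falls below the threshold.

    The threshold value is hypothetical; the source only says
    "a certain percentage".
    """
    if attempted == 0:
        # No traffic in the window, so there is nothing to compare.
        return False
    return completed / attempted < threshold
```

A check like this looks at outcomes (calls that actually went through) rather than at internal warnings, so it would still fire if lower-level alerts were miscategorized.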
On Friday, Intrado said it was still evaluating the report and its recommendations.
"We respect the commission's interest in this important issue,” said company spokesman Ray Wendell.
But the FCC is also mulling further steps, such as developing a set of guidelines to help deal with future outages.
"Some failures will inevitably occur," said Adm. David Simpson, the FCC's chief of public safety and homeland security. Unlike previous outages, which were often the result of weather, earthquake, fires or a mechanical calamity, the breakdowns of the future are going to be harder to detect. They'll happen invisibly as computer fail or during software glitches. But that's no excuse. "Calls for help must go through," said Simpson, "period."
Correction: An earlier version of this post misspelled the name of the Washington state utility commission. It is Dave Danner, not Dave Dannier.