“Even though there were specific conditions that triggered this outage, we should have anticipated it,” Rockwell said in the blog post. “We provide mission critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority.”
The outage underscored the importance of little-known Internet infrastructure companies like Fastly to the normal functioning of the Web, and how even isolated disruptions can bring huge parts of online life to a halt. The pandemic-era shifts that sent more people online for their groceries, work, school and health care have heightened the potential for broad shutdowns to do real-world harm.
Fastly runs a “content delivery network,” or a system of servers that its customers use to reduce the time it takes for their websites to load by storing larger files like images and videos closer to their end users. The 10-year-old company has a nearly $5 billion valuation.
On its website, Fastly says it has helped big customers like Wired, Shazam and Shopify handle major spikes in traffic and cut down on load times.
Content delivery networks are particularly difficult to replicate because their business model requires having physical data centers spread across several countries. Fastly itself has more than 50. Larger cloud companies like Google or Amazon, which host much of the Internet, have fewer but larger data centers. Even Amazon has used Fastly to make its pages load faster.
Other news outlets affected by the Tuesday morning outage were CNN, the Guardian, Bloomberg News, the Financial Times and the Verge. High-traffic platforms such as Reddit, Pinterest and Twitch were also affected. The British government’s website was taken down, limiting access to public services, including the portal for booking a coronavirus test.
Kentik, a company that helps businesses track Internet traffic, said Fastly suffered a global outage beginning at 5:49 a.m. Eastern time, which caused a 75 percent drop in traffic from its servers.
Fastly said it is conducting a postmortem of its processes and practices during the outage, and that it will determine why quality assurance and testing didn’t detect the bug when it was introduced during a software update on May 12.
“This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them,” Rockwell said in the blog post.