In early 2013, The Washington Post found itself in a particularly vexing situation: The newsroom employed two different content management systems to publish two different websites. Each site happened to be called washingtonpost.com, and the exhausted designers and site engineers had to make sure readers couldn’t see a difference.
Every time a change was made to the templates published by one CMS, it was imperative to ensure they were identical to those in the other CMS, and it was exactly the kind of work that developers hate most: busywork stemming from a violation of the sacred principle that you should only have to make the same change once.
It was a problem long in the making. Since 1995, when The Post first launched its website, the company had run a web CMS that published online-only content alongside print content, which it received from the print-only CMS. Following Clay Christensen’s principles of disruptive innovation, the web newsroom was established separately from the main print newsroom. For more than a decade, the ink-stained wretches downtown threw their stories over the wall, and over the river, to the web editors in Arlington, Virginia.
Finally, by the end of the 2000s, The Post merged its newsrooms, brought the digital folks downtown, and replaced its separate print and web CMSs with a single solution that supported both print and digital publishing. But, as is so often the case, the all-in-one solution wasn’t equally good at everything. It behaved like a legacy print CMS, where digital was an afterthought.
And the paper’s young turks wanted to blog. They didn’t want to have to run every new post through an industrial-strength print-influenced content workflow. They wanted to be able to publish fast: click-publish, then send an email to an editor buddy for a backread. So newsroom editors put out that fire by installing WordPress to run a couple of their most popular blogs. In the process, they learned something counterintuitive: Having an all-in-one system is less important than having a solid digital system that can integrate with print.
But now The Post had two websites: one published by WordPress, one published by the all-in-one CMS, and a frazzled site development team frantically trying to ensure that there were no visible seams.
This was a problem that needed an engineering solution. Finally, in 2013, Post engineers cut the Gordian knot by building PageBuilder, a page-rendering system that could take the responsibility for building webpages out of the hands of the content management system. Over the next year, PageBuilder was rolled out across the site.
By disaggregating content management from content publishing, The Post had taken the first step toward building its own in-house publishing system, but that wasn’t immediately clear at the time. The engineers were mainly preoccupied with putting out immediate fires.
Because PageBuilder had to integrate into existing legacy systems, it needed to accommodate content coming from anywhere, and that content needed to be able to go anywhere. Content could come in from wires, or a print or digital CMS; it might be published on desktop and mobile webpages, or a native app, or the print paper, or Facebook. Anywhere.
Around the same time, and for similar reasons, Post engineers were developing a new approach to how content was structured. Many content management systems surround words with markup in a format like XML. That may make sense for a desktop webpage, but it likely wouldn’t make sense for print, or any of a number of other places in which content might appear. So they decided to separate the content itself from the rules that the system would use to determine how to render it. Each publishing platform could have its own rules, which could be defined and applied separately.
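To make the idea of separating content from rendering rules concrete, here is a minimal sketch in Python. It is an illustration only, not The Post's actual data model or code: the story structure, the element types and the names WEB_RULES, PLAIN_RULES and render are all hypothetical, invented for this example.

```python
from html import escape

# Hypothetical content model: plain data, with no presentation markup mixed in.
story = {
    "headline": "Council approves budget",
    "elements": [
        {"type": "text", "content": "The council voted 5-2 on Tuesday."},
        {"type": "quote", "content": "It was a long night.", "attribution": "A. Member"},
    ],
}


# One set of rules per publishing platform (names are assumptions).
def web_headline(el):
    return f"<h1>{escape(el['content'])}</h1>"


def web_text(el):
    return f"<p>{escape(el['content'])}</p>"


def web_quote(el):
    body = escape(el["content"])
    who = escape(el["attribution"])
    return f"<blockquote>{body} <cite>{who}</cite></blockquote>"


def plain_headline(el):
    return el["content"].upper()


def plain_text(el):
    return el["content"]


def plain_quote(el):
    return f'"{el["content"]}" ({el["attribution"]})'


WEB_RULES = {"headline": web_headline, "text": web_text, "quote": web_quote}
PLAIN_RULES = {"headline": plain_headline, "text": plain_text, "quote": plain_quote}


def render(story, rules):
    """Apply one platform's rules to the same underlying content."""
    lines = [rules["headline"]({"content": story["headline"]})]
    for el in story["elements"]:
        lines.append(rules[el["type"]](el))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(story, WEB_RULES))
    print(render(story, PLAIN_RULES))
```

In this sketch, supporting a new destination means writing a new set of rules; the stored story never changes.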
As all of this was being developed, Shailesh Prakash, chief information officer at The Post since 2011, had a simple decision principle: If something is core to The Post’s business, it needs to be built in-house. He firmly believed that it was necessary to maintain code control, so that developers had the flexibility to innovate rapidly. It is very hard to innovate on top of a black box.
Nothing could be closer to the core of The Post’s business than digital publishing, and no off-the-shelf solution would suffice. A full platform was needed, and these tools became an integrated suite, one component at a time.
The metered paywall came in 2013. It was followed by a video system (code-named Goldfish), a scheduling and workflow management tool (WebSked), a fully redesigned mobile app (Rainbow), quiz and poll tools (Story Tools), an analytics dashboard (Loxodo), a headline testing tool (Bandito), an A/B testing tool (Darwin), and others.
Eventually, the platform got a new name: Arc Publishing.
In 2014, The Post began providing early versions of the software for free to universities, and the first commercial client came a year later, with more commercial clients coming in shortly thereafter. As the newspaper’s newsroom focused on an unprecedented presidential election, the engineering team worked to build the first enterprise software sales business in the company’s history. Over the last year, The Post launched Arc with clients in Buenos Aires and Toronto, and the paper’s staff hopes for much more growth in the year ahead.
Whatever the next 400 years may hold for journalists, Arc is here to stay. Much has happened in the nearly four years since Arc’s beginnings as a workaround to newsroom frustration. Arc has gone on to power websites around the world, with more to come, and Post engineers daily encounter a multiplicity of challenges more complex than anything the newsroom could have imagined in 2013.
Once again, the playwright Webster foresaw their plight in “The Duchess of Malfi”: “Miserable age, where only the reward of doing well is the doing of it!”