Amazon Web Services can be a great platform for startups when they’re small, but costs can quickly outpace revenue growth, especially if you’re offering a free consumer service. At AWS’s re:Invent user conference last week, engineers from Pinterest, Flipboard and Yelp shared their impressive and sometimes ingenious techniques for keeping costs under control and their bottom lines healthy.
Pinterest Operations Engineer Ryan Park had the stage to himself for a session on Wednesday, while Flipboard Chief Architect Greg Scallon and Yelp Engineering Manager Jim Blomo teamed up with Kleiner Perkins Caufield & Byers Partner Ray Bradford to form a trifecta of wisdom on Thursday.
Know — and measure — your costs
Flipboard’s Scallon had a paradoxical lesson for the audience when it comes to managing cloud-based infrastructure: Embrace the cloud, but be afraid of the cloud. Yes, it’s flexible and affordable if done right, but all it takes is poor planning or a handful of servers left running ad infinitum, and the costs can begin to grow out of control. That’s why Flipboard assigns members of its engineering team the title of “chief miser,” which means they’re the ones who decide whether applications are using the right resources and using them wisely.
Thanks to a variety of practices, including its miserly ways, Scallon said Flipboard is now running about 900 instances at any given time. That’s down from a peak of about 1,500.
One way to help ensure this sort of lean operation is to understand your business inputs and outputs, Kleiner Perkins’s Bradford explained. He suggests companies ask, for example, what it costs them to serve a free user on their platform, and how that cost changes with scale or affects the experience they can offer premium users. Pick metrics that really matter, he said (e.g., infrastructure cost per user per month), and then consider how long your current architecture can sustain that cost before it’s time to retool.
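As a rough illustration of the kind of metric Bradford describes, here is a minimal Python sketch that computes infrastructure cost per user per month at two points in a company's growth. The function name and every dollar and user figure are hypothetical, chosen only to show the arithmetic; none of them come from the companies in this story.

```python
def cost_per_user_per_month(monthly_infra_cost: float, monthly_users: int) -> float:
    """Return infrastructure dollars spent per user in one month."""
    if monthly_users <= 0:
        raise ValueError("monthly_users must be positive")
    return monthly_infra_cost / monthly_users


# Hypothetical snapshots: (monthly infrastructure bill in dollars, monthly users).
snapshots = [
    (20_000.0, 100_000),  # early stage
    (90_000.0, 600_000),  # after growth
]

unit_costs = [cost_per_user_per_month(cost, users) for cost, users in snapshots]

for (cost, users), unit in zip(snapshots, unit_costs):
    print(f"{users:>8,} users  ${cost:>10,.2f}/month  ${unit:.3f} per user")
```

Tracking this number over time shows whether the architecture is getting cheaper or more expensive per user as the service scales, which is the signal for when it might be time to retool.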
The secret weapon: Source your instances wisely
Pinterest, Yelp and Flipboard all swear by AWS’s pre-paid Reserved Instances to save money over the long haul. In fact, Flipboard’s Scallon said, the e-reading startup sees cost savings of about 80 percent over three years by using heavy-duty Reserved Instances instead of on-demand instances for its base workloads, and the break-even point might be only eight or nine months. Pinterest’s Park cited savings of about 70 percent over three years from using them.
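The reserved-versus-on-demand trade-off can be sketched with simple arithmetic. The Python below assumes a simplified pricing model (a one-time upfront fee plus a discounted hourly rate, with the instance running around the clock) and uses made-up rates. It is not AWS's actual price list, and it is not meant to reproduce the exact figures Scallon or Park cited.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def on_demand_cost(hourly_rate: float, months: int) -> float:
    """Total cost of an always-on on-demand instance."""
    return hourly_rate * HOURS_PER_MONTH * months


def reserved_cost(upfront_fee: float, hourly_rate: float, months: int) -> float:
    """Total cost of an always-on reserved instance: upfront fee plus usage."""
    return upfront_fee + hourly_rate * HOURS_PER_MONTH * months


def break_even_months(upfront_fee: float, od_rate: float, reserved_rate: float) -> float:
    """Months of continuous use before the upfront fee pays for itself."""
    monthly_saving = (od_rate - reserved_rate) * HOURS_PER_MONTH
    if monthly_saving <= 0:
        raise ValueError("reserved rate must be lower than the on-demand rate")
    return upfront_fee / monthly_saving


# Hypothetical prices, for illustration only.
OD_RATE = 0.50        # dollars per hour, on-demand
RES_RATE = 0.05       # dollars per hour, reserved
UPFRONT = 1_000.0     # one-time reservation fee in dollars
TERM_MONTHS = 36      # three-year term

od_total = on_demand_cost(OD_RATE, TERM_MONTHS)
res_total = reserved_cost(UPFRONT, RES_RATE, TERM_MONTHS)
savings = 1 - res_total / od_total
breakeven = break_even_months(UPFRONT, OD_RATE, RES_RATE)

print(f"On-demand over term: ${od_total:,.2f}")
print(f"Reserved over term:  ${res_total:,.2f}")
print(f"Savings: {savings:.0%}, break-even after {breakeven:.1f} months")
```

The model only holds for workloads that run continuously, which is why the speakers reserve capacity for base workloads rather than for spiky demand.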
Yelp’s Blomo said his company is a heavy Elastic MapReduce (EMR) user, peaking at more than 350 EMR instances when many developers run their Hadoop jobs simultaneously or when it’s doing nightly analysis of its log files. To keep costs in check, Yelp uses Reserved Instances whenever possible to save on hourly bills and has implemented a job-flow pooling system to keep Hadoop jobs running continuously as resources become available. This helps avoid the situation where a job completes in 61 minutes, for example, thus triggering the charge for a full second hour of resources even though it used only a minute’s worth of that hour.
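To see why the 61-minute job is costly, here is a small Python sketch of hourly billing with round-up. It compares running several jobs on separate clusters against running them back to back on one shared cluster. This is only an assumed simplification of what pooling buys you, not a description of Yelp's actual system, and the job runtimes are invented.

```python
def billed_hours(runtime_minutes: int) -> int:
    """Hours charged when any partial hour is billed as a full hour."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    return (runtime_minutes + 59) // 60  # integer ceiling of minutes / 60


# Hypothetical job runtimes in minutes.
jobs = [61, 45, 20]

# Each job on its own cluster: every job rounds up separately.
separate = sum(billed_hours(m) for m in jobs)

# Jobs run back to back on one pooled cluster: only the total rounds up.
pooled = billed_hours(sum(jobs))

print(f"61-minute job is billed for {billed_hours(61)} hours")
print(f"Separate clusters: {separate} billed hours")
print(f"Pooled cluster:    {pooled} billed hours")
```

With these numbers the three jobs cost four billed hours when run separately and three when pooled, because the idle remainder of one job's final hour is used by the next job instead of being paid for and wasted.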