Nathan Peck, AWS Senior Developer Advocate and friend to all, runs the surprisingly handy changelogs.md service that gathers changes to a wide variety of packages.
He’s been remarkably transparent with changes he’s made to make running it more cost-efficient—and since he’s running this in a personal account, this is real money rather than AWS Credits (motto: “Like Bitcoin but less obnoxious”). The entire system is open-sourced and, after a bit of bargaining, he let me into his account to take a look at the bill.
In this post, we tear down what he’s running, how it could be optimized, and what it takes to get there. This is a microcosm of what I do; my actual clients have bills many times this size but they’re understandably reticent about letting me dissect their bills in public.
Before I begin, I’d just like to highlight that Nathan is an AWS employee and a conscientious worker. He’s good at building cost-effective architectures, and he knows his stuff.
This is a good thing. But there are always savings lurking around the AWS bill. Let’s dive in!
To start, Nathan somewhat apologetically mentioned that everything for this service lives in us-east-2. “I really should have this live in a separate account…” he says, like absolutely every AWS customer I’ve ever heard.
This is a pattern; the difference between how things are and how things should ideally be is the space in which we all work. Everyone struggles with proper account-level separation, tagging of resources, and the like. You can either yell at people—or get to work anyway.
Let’s restrict the ever-improving Cost Explorer view down to us-east-2 only, and break it down by service.
That’s much more helpful. We can see that, last month, he spent a total of $163 on the service.
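For anyone who prefers the API to the console, the same view can be sketched as a Cost Explorer `GetCostAndUsage` request. This is a minimal sketch rather than anything Nathan or I actually ran: the helper name and the dates are placeholders, and the commented-out boto3 call at the bottom assumes credentials with Cost Explorer access.

```python
from datetime import date


def region_by_service_query(region: str, start: date, end: date) -> dict:
    """Build GetCostAndUsage parameters: one region, grouped by service.

    Hypothetical helper for illustration. `end` is exclusive, as the
    Cost Explorer API expects.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Dimensions": {"Key": "REGION", "Values": [region]}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


# Placeholder dates; substitute the billing month you care about.
params = region_by_service_query("us-east-2", date(2020, 1, 1), date(2020, 2, 1))

# With boto3 installed and credentials configured, this would be roughly:
#   import boto3
#   response = boto3.client("ce").get_cost_and_usage(**params)
#   for group in response["ResultsByTime"][0]["Groups"]:
#       print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```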
DynamoDB is a fun place to optimize. But before I got into this, Nathan made some changes to DynamoDB: he switched his tables from on-demand capacity to provisioned. This was the right move, but I take no credit for it.
There are also almost certainly some optimizations to DynamoDB’s table structure that would knock this down further. I say that with no particular insight and as someone whose favorite database steadfastly remains Route 53. But I’m pretty confident in Alex DeBrie and his collaborators being some kind of database sorcerers.
The next category down the list is “EC2-Other,” so named because “Miscellaneous” isn’t enterprisey enough of a phrase.
This covers a lot: EBS volumes, snapshots, unattached EIPs, the AWS salesperson’s boat payment, and more. In Nathan’s case, there’s a very clear outlier here:
It’s our old friend, managed NAT gateway. This is a handy service—it means you don’t have to run your own NAT instances to let resources in private subnets speak to things outside of that subnet.
Unfortunately, its pricing model is a cruel joke. At 4.5¢ per gigabyte as a data processing charge, you’re paying twice as much to put data through it to S3 (assuming you haven’t set up a VPC endpoint for S3!) as it costs to store that data in S3 for the month. In this case, the data transfer is piddly, but the per-hour charge for the service is a disturbingly high proportion of the bill.
My first approach would be to migrate anything in the private subnets into public subnets, which would reduce this $35.76 a month fee down to $0. There are frequently architectural reasons this can’t be done, so my next approach would be to shove a t3.nano instance in there instead and run a NAT instance.
It’s a bit more overhead. But it drops that charge down to $3.43 a month with the same capability. At this scale it makes sense—but I’ve seen this charge grow to represent a third of a multi-million dollar bill in various client engagements. Again, the service is great. But it’s rarely worth the price.
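To see why the hourly charge dominates at low traffic, here is a back-of-the-envelope sketch. The 4.5¢ per gigabyte comes from above; the 4.5¢ per hour gateway rate, the 730-hour month, and the traffic volumes are my assumptions for illustration, so check the pricing page before trusting any of it.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def nat_gateway_monthly(gb_processed: float,
                        hourly_rate: float = 0.045,  # assumed list price, USD/hour
                        per_gb_rate: float = 0.045   # data processing, USD/GB (from the text)
                        ) -> tuple:
    """Return (fixed hourly cost, data processing cost) for one gateway."""
    return hourly_rate * HOURS_PER_MONTH, per_gb_rate * gb_processed


for gb in (10, 100, 10_000):  # made-up traffic volumes
    fixed, processing = nat_gateway_monthly(gb)
    share = fixed / (fixed + processing)
    print(f"{gb:>6} GB: ${fixed + processing:8.2f}/month, {share:.0%} of it hourly")
```

At a few gigabytes a month nearly the whole charge is the hourly fee, which is the part a public subnet or a tiny NAT instance sidesteps. At terabytes a month the per-gigabyte charge takes over, which is how this line item balloons on larger bills.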
Nothing else is of meaningful spend in the EC2-Other category, so we continue bravely onward!
The Elastic Container Service is of course misnamed in everything related to billing as “EC2 Container Service,” primarily to mess with folks preparing for a certification exam.
Fargate recently saw a 40% price cut, which is reflected in these numbers. Then, Savings Plans came out, which offer committed-use discounts for EC2, Lambda, and—you guessed it—Fargate.
Lastly, Fargate now also supports a Spot model. Presupposing that your workload is okay with containers going away with a two-minute warning, this knocks a bunch of money off of even the discounted rate. Unfortunately, it’s very hard to model out what those savings will be in advance thanks to the variable pricing of Spot.
If Nathan cares more about accurately predicting his costs than he does about saving dollars and cents on his bill, Spot isn’t recommended. This is a nuanced point that often sneaks past engineers in their discussions with Finance!
The cloud, cost-effectiveness, and cost predictability
Something to bear in mind: The “cloudier” a system becomes, the more cost-effective it becomes.
This makes sense—but there’s a corollary to keep in mind. The cloudier (and thus more cost-efficient) that system becomes, the harder it is to predict that cost other than as a function of work passing through the system.
End result: Nathan can save $8.26 a month if he buys a Savings Plan—or “more than that, but how much more varies” should he instead choose to go with Spot.
Next up is my favorite “message queue,” S3.
This is one of those graphs where, at first glance, you’re positive you didn’t set up Cost Explorer properly. But you actually did.
Nathan’s paying 23¢ a month to store 10GB of data in S3, and then $12.97 in request charges to access that data.
Remember, the numbers are paltry here. But this is exactly the kind of pattern that has a way of growing massively at scale, so identifying it early is important.
We can see that something’s calling PUT, DELETE, GET, or LIST over 2.5 million times a month, which is likely…suboptimal. Fortunately, we have access to the code! It’s rewriting every changelog it discovers every time every changelog gets crawled, regardless of whether or not it’s changed.
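A quick sanity check supports the rewrite theory. Assuming the S3 Standard list prices I remember for this region ($0.005 per 1,000 PUT or LIST requests and $0.0004 per 1,000 GETs; treat both as assumptions), 2.5 million requests cost very different amounts depending on the mix:

```python
# Assumed S3 Standard request prices in USD per 1,000 requests.
PUT_LIST_PER_1000 = 0.005
GET_PER_1000 = 0.0004


def request_cost(requests: int, price_per_1000: float) -> float:
    """Monthly charge for `requests` calls at a flat per-1,000 price."""
    return requests / 1000 * price_per_1000


requests = 2_500_000  # "over 2.5 million" from the bill
all_writes = request_cost(requests, PUT_LIST_PER_1000)
all_reads = request_cost(requests, GET_PER_1000)
print(f"all PUT/LIST: ${all_writes:.2f}, all GET: ${all_reads:.2f}")
```

Under those assumptions only a write-heavy mix gets anywhere near the $12.97 on the bill, which fits what the code is doing.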
It’d shave a boatload off of the bill if, when written to S3, each changelog had an md5 hash saved to DynamoDB. Checking DynamoDB for the object’s hash first saves a relatively expensive S3 call whenever nothing has changed, and that DynamoDB query costs far less.
It’s a small optimization, but a straightforward one that has the potential to cut that charge at least in half. The exact figure depends on the throughput DynamoDB would need, so testing would be required to get a more solid answer.
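Here is a minimal sketch of that check. Plain dictionaries stand in for the DynamoDB table and the S3 bucket so it runs anywhere, and every name in it is hypothetical; it is not the changelogs.md code.

```python
import hashlib


class ChangelogStore:
    """Skips the object-store write when the content hash is unchanged."""

    def __init__(self) -> None:
        self.hashes: dict = {}   # stand-in for a DynamoDB table: key -> md5 hex
        self.objects: dict = {}  # stand-in for the S3 bucket: key -> bytes
        self.puts = 0            # how many (billable) writes we actually made

    def save(self, key: str, body: bytes) -> bool:
        """Write `body` under `key` only if it changed. Returns True on write."""
        digest = hashlib.md5(body).hexdigest()
        if self.hashes.get(key) == digest:   # the cheap lookup
            return False                     # unchanged: no PUT needed
        self.objects[key] = body             # the expensive call
        self.hashes[key] = digest
        self.puts += 1
        return True


store = ChangelogStore()
store.save("some/package/CHANGELOG.md", b"## 1.0.0\n- first release\n")
store.save("some/package/CHANGELOG.md", b"## 1.0.0\n- first release\n")  # re-crawl, no change
store.save("some/package/CHANGELOG.md", b"## 1.1.0\n- new feature\n")
print(store.puts)
```

In the real thing the lookup would be a DynamoDB read and the write an S3 PUT, so the saving depends on how often a crawl finds an unchanged changelog.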
We’re running out of large-dollar items here. We see that there’s a single ElastiCache Redis node. If Nathan’s happy with it as it is, buy a Reserved Instance for it and cut that $12.65 monthly charge down to $5.95.
Cutting 46% off the bill
The rest is a variety of “loose change”-style line items that don’t hold much of interest.
You could move the single 40¢/month secret in AWS Secrets Manager to the Systems Manager Parameter Store (which is free). And you could tweak the Lambda functions to output fewer logs to knock a few cents off of the CloudWatch bill. But even at the small-dollar value of this bill, we’re well into areas that account for less than 1% of total spend.
At this point, the right move is to implement what we’ve already found and then take a step back to see where things are. Is there still a billing issue to chase down?
At a glance, we’ve taken his current ~$115/month bill (after his DynamoDB changes) down by 46% with somewhat minimal effort. Some of these changes require code but others just require “a few mouse clicks in the console.”
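For the curious, here is roughly how the pieces add up, using the figures quoted above. Two assumptions are mine: the S3 request charge is cut exactly in half, and no Spot savings are counted.

```python
BASELINE = 115.00  # approximate monthly bill after the DynamoDB changes

savings = {
    "NAT gateway to t3.nano NAT instance": 35.76 - 3.43,
    "Fargate Savings Plan": 8.26,
    "S3 requests (assumed halved)": 12.97 / 2,
    "ElastiCache Reserved Instance": 12.65 - 5.95,
}

total = sum(savings.values())
for item, amount in savings.items():
    print(f"{item:<40} ${amount:6.2f}")
print(f"{'Total':<40} ${total:6.2f}  ({total / BASELINE:.1%} of ${BASELINE:.0f})")
```

That comes out at a little under 47%, which is in line with the 46% above given that the ~$115 baseline is itself rounded.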
Two takeaways are left here.
First, I want to highlight how far Cost Explorer has come. It’s gone from an “at least you tried” offering to a very capable tool in a surprisingly short period of time that roughly corresponds to me making fun of AWS on Twitter. It’s no longer byzantine and deliberately unhelpful; that task is left to QuickSight.
Secondly, Nathan’s time is certainly worth more than refactoring this application for two weeks to save potentially dozens of dollars. He’s not going to cost-optimize this application to a point where it’s meaningful to his budget—so he likely shouldn’t spend too much time on it!
However, if this is a prototype for something he’s preparing to launch at massive scale at some point down the road, every penny he saves now has the potential to grow to thousands of dollars on that future application—which is why these kinds of exercises are incredibly important for the majority of organizations to do on a regular basis.
Overall, Nathan’s done a great job already. Much respect. And thanks to him for letting me play around with the bill.
It turns out that even if you’re somebody as talented as Nathan who’s actually employed by AWS and knows the ropes better than most, you might still be able to cut a big chunk out of your bill each month. Here’s the easiest way how. And if you’ve enjoyed this type of analysis and want to come work with us on similar types of projects, we’re hiring!