I’ve often wondered what it takes to be named an AWS Hero. After all, few people go around describing themselves as “heroes.” “I’m a hero” inherently sounds self-aggrandizing and suggests that the speaker has a massively over-inflated ego, similar to “I’m an entrepreneur” or “Hello, my name is Elon Musk.”
In my book, there are three things that could make you a hero: protecting innocents at risk to yourself, developing superpowers, or defeating the unreasonable data processing costs of the Managed NAT Gateway.
I was recently reminded of what AWS Heroism really is by Ben Whaley, whose Twitter bio simply says “AWS Community Hero,” when he reached out about a project he was working on to address NAT woes.
What makes Managed NAT Gateway suck
As a refresher on the painful problem that is AWS Managed NAT Gateway: it all comes down to billing. Each Managed NAT Gateway costs about $32.40 a month in hourly charges, plus a 4.5¢ per gigabyte fee for “data processing.” The former is annoying to independent learners, since the network address translation service has no free tier. The latter is actively painful to companies moving data at large scale.
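To make the two charges concrete, here is a small Python sketch of one month’s bill for a single Managed NAT Gateway. It assumes a 720-hour month and 4.5¢ rates for both the hourly charge and the per-GB fee (which is where the $32.40 figure comes from); real rates vary by region, and the function name is purely illustrative.

```python
HOURLY_RATE_USD = 0.045   # assumed hourly charge (4.5 cents; 720 h -> $32.40)
PER_GB_RATE_USD = 0.045   # assumed "data processing" charge per GB
HOURS_PER_MONTH = 720     # assumed 30-day month


def managed_nat_monthly_cost(gb_processed: float) -> float:
    """Estimate one month of charges for a single Managed NAT Gateway."""
    hourly = HOURLY_RATE_USD * HOURS_PER_MONTH      # paid even if idle
    processing = PER_GB_RATE_USD * gb_processed     # scales with traffic
    return round(hourly + processing, 2)


if __name__ == "__main__":
    for gb in (0, 10_000, 1_000_000):
        print(f"{gb:>9,} GB -> ${managed_nat_monthly_cost(gb):,.2f}")
```

The hourly term is fixed, so once traffic reaches even a few terabytes a month, the data processing term dominates the bill.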
If you’re trying to avoid getting gouged, there have historically been only two alternatives to Managed NAT Gateways:
1. Don’t use private subnets. This is unthinkable to many organizations.
2. Manually manage your own NAT instances. This is a somewhat flimsy single point of failure.
NAT instances are brittle: rebooting one for a security update cuts off outbound internet access for that entire private subnet until it comes back. There have been attempts to make them more durable with Elastic IP addresses and some automation that updates the route table when a NAT instance fails, but that approach still feels fragile and subject to significant disruption if an instance gets “stuck” somehow.
That’s where Ben’s third option comes in: alterNAT.
alterNAT: An alternative NAT Gateway implementation
Like so many good ideas, the idea behind alterNAT is forehead-smackingly obvious when viewed through the clarifying lens of hindsight.
It’s not the $32-a-month Managed NAT Gateway hourly charge that drives people to distraction; it’s the data processing fee, which in many cases can cost millions of dollars a month. Rather than treating the choice between NAT instances and the Managed NAT Gateway as binary, Ben saw a way to use both.
alterNAT runs a NAT instance with an Elastic IP address to handle traffic, and it also stands up a Managed NAT Gateway. It then configures a Lambda function to continuously and automatically validate the health of the NAT instance. If the instance fails the health check, the Lambda updates the VPC’s route table to send traffic through the Managed NAT Gateway while the NAT instance is replaced and the EIP is reassociated with the new instance.
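The failover step can be sketched in Python. This is not alterNAT’s actual code. It assumes the private route table’s default route (0.0.0.0/0) points at the NAT instance while it’s healthy, takes the EC2 client as a parameter (in a real Lambda that would be `boto3.client("ec2")`, whose `replace_route` accepts `RouteTableId`, `DestinationCidrBlock`, and `NatGatewayId`), and uses a hypothetical `check_connectivity` callable to stand in for whatever health probe you choose. `RecordingEC2` is a stand-in client for trying the logic offline.

```python
from typing import Any, Callable, Dict, List

DEFAULT_ROUTE = "0.0.0.0/0"


def handle_health_check(
    ec2: Any,                                # e.g. boto3.client("ec2"); needs replace_route()
    route_table_id: str,
    nat_gateway_id: str,
    check_connectivity: Callable[[], bool],  # hypothetical probe through the NAT instance
) -> str:
    """Fail the route table over to the Managed NAT Gateway if the NAT instance looks unhealthy.

    Returns "healthy" if nothing changed, "failed_over" after repointing the route.
    Moving the route back to a replacement NAT instance is assumed to happen elsewhere.
    """
    if check_connectivity():
        return "healthy"
    # Repoint the default route at the Managed NAT Gateway.
    ec2.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=DEFAULT_ROUTE,
        NatGatewayId=nat_gateway_id,
    )
    return "failed_over"


class RecordingEC2:
    """Hypothetical stand-in client that records replace_route calls instead of calling AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, str]] = []

    def replace_route(self, **kwargs: str) -> None:
        self.calls.append(kwargs)


if __name__ == "__main__":
    fake = RecordingEC2()
    print(handle_health_check(fake, "rtb-example", "nat-example", lambda: False))
    print(fake.calls)
```

Passing the client and the probe in as arguments keeps the decision logic testable without an AWS account; a Lambda handler would just construct the real client and call this function on a schedule.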
In other words, rather than having a failure mode of “TCP now terminates on the floor,” alterNAT has a failure mode of “accept the overpriced data processing fees for the NAT Gateway for a few minutes, then go back to its more cost-efficient NAT instance once the environment stabilizes.” For large companies, it’s generally preferable to spend a bit of money on a few minutes of data processing than to spend a whole lot of money on it — or to fail to serve traffic entirely.
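To put a number on “a few minutes” of data processing, this sketch converts a sustained throughput and a failover window into gigabytes and then into processing fees. The 4.5¢/GB rate comes from above; decimal units (1 GB = 8 gigabits) are assumed, and the 2 Gbps, 10-minute example is a made-up input.

```python
PER_GB_RATE_USD = 0.045  # assumed Managed NAT Gateway data processing rate


def failover_window_cost(throughput_gbps: float, minutes: float) -> float:
    """Processing fees for traffic sent through the Managed NAT Gateway during a failover."""
    gigabytes = throughput_gbps * minutes * 60 / 8  # gigabits/s * seconds / 8 bits per byte
    return round(gigabytes * PER_GB_RATE_USD, 2)


if __name__ == "__main__":
    # Hypothetical: 2 Gbps sustained for a 10-minute failover vs. a full 30-day month.
    print(failover_window_cost(2, 10))
    print(failover_window_cost(2, 30 * 24 * 60))
```

At the same throughput, a 10-minute window costs a few dollars, while routing that traffic through the gateway all month costs tens of thousands.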
The economics of alterNAT
From where I sit, alterNAT starts making sense somewhere around the point where you send 10 terabytes a month through your Managed NAT Gateways. That costs you $450 in data processing fees that alterNAT can remove almost entirely. Depending on the NAT instance size you select, you’ll pay anywhere from $15 a month (not recommended!) up to all of the money (absolutely not recommended!). Most sensible options are network-optimized instances with at least 32 vCPUs (otherwise, current-generation instances are limited to 5 Gbps).
The math is simple: compare the cost of the instance you’ll use against the per-GB processing fees on your bill. Remember that the Managed NAT Gateway data processing fee is purely additive; it replaces no egress or inter-AZ data transfer fees. Bottom line: It’s no exaggeration to say that this has the potential to save individual customers millions of dollars a month.
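Here is that comparison as a Python sketch. It assumes a 720-hour month and the 4.5¢/GB processing fee; the instance hourly price is an input you’d look up for your own instance type and region (the $0.50/hour below is a placeholder, not a real quote). Because alterNAT still keeps a Managed NAT Gateway around, that gateway’s hourly charge is paid either way and is left out, as are egress and inter-AZ fees, which don’t change.

```python
PER_GB_RATE_USD = 0.045   # assumed Managed NAT Gateway data processing rate
HOURS_PER_MONTH = 720     # assumed 30-day month


def monthly_savings(gb_per_month: float, instance_hourly_usd: float, instances: int = 1) -> float:
    """Processing fees avoided minus the cost of running the NAT instance(s)."""
    avoided = gb_per_month * PER_GB_RATE_USD
    instance_cost = instance_hourly_usd * HOURS_PER_MONTH * instances
    return round(avoided - instance_cost, 2)


def break_even_gb(instance_hourly_usd: float, instances: int = 1) -> float:
    """Monthly GB at which the processing fees avoided equal the instance cost."""
    return round(instance_hourly_usd * HOURS_PER_MONTH * instances / PER_GB_RATE_USD, 1)


if __name__ == "__main__":
    price = 0.50  # placeholder hourly price; substitute your instance's real rate
    print(f"break-even: {break_even_gb(price):,.0f} GB/month")
    for gb in (10_000, 100_000, 1_000_000):
        print(f"{gb:>9,} GB/month -> savings ${monthly_savings(gb, price):,.2f}")
```

With that placeholder price, break-even lands at 8,000 GB a month; doubling the instance’s hourly price doubles the break-even volume, and every gigabyte beyond it is a straight 4.5¢ saved.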
In (small) defense of the Managed NAT Gateway
All this isn’t a slam against the Managed NAT Gateway as a product. It solves a real problem, and it does some super nifty things. As for its billing, folks I trust assure me that there’s a definite compute cost to having the Managed NAT Gateway running and processing data, so the egregious-feeling charges aren’t entirely about rent-seeking. (That doesn’t make it any better when your AWS bill feels like getting sucker-punched in the gut by She-Hulk.)
When AWS Heroes save us from AWS…
There’s no doubt in my mind that alterNAT makes Ben Whaley a true AWS Hero by rescuing us from absurd Managed NAT Gateway costs.
Of course, alterNAT will periodically drop established connections when it fails over between the NAT instance and the Managed NAT Gateway, a disruption you wouldn’t see with the Managed NAT Gateway alone, though well-behaved clients should retry. Out of the box, alterNAT also doesn’t handle translation between IPv4 and IPv6 the way the Managed NAT Gateway does.
That said, given the painful bill attached to the Managed NAT Gateway, it’s hard to escape the conclusion that AWS built (and priced) the wrong service for many of its customers. alterNAT is an open source version of what AWS absolutely should have built instead.