Episode Summary
Join me as I continue my series on cloud fundamentals with a look at data transfer pricing that includes my theory on why it costs half-price to move data between US-East-1 and US-East-2 compared to everywhere else, how you basically have to conduct experiments to see how much data transfers cost, how adding a VPN to the mix makes data transfer pricing even more fun, the most expensive AWS region in the world for data transfers, where data transfer pricing shows up on your AWS bill, why data transfer pricing is the white space between AWS services, and more.
Episode Show Notes & Transcript
About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.
Transcript
Corey: Welcome to the AWS Morning Brief, specifically our 12-part mini series, Networking In The Cloud, sponsored by ThousandEyes. ThousandEyes recently released their state of the cloud benchmark performance report. They raced five clouds together and gave a comparative view of the networking strengths, weaknesses, and approaches of those various providers. Take a look at what it means for you. There's actionable advice hidden within, as well as incredibly useful comparative data, so you can start comparing apples to oranges instead of apples to baseballs. Check them out and get your copy today at snark.cloud/realclouds. That's snark.cloud/realclouds because Oracle cloud was not invited to participate.
Now, one thing that they did not bother to talk about in that report, is how much all of that data transfer across different providers costs. Today I'd like to talk about that, which is a bit of a lie because I'm not here to talk about it at all, I'm here to rant like a freaking lunatic for which I make no apologies whatsoever.
This episode is about data transfer pricing in AWS. Because honestly I need to rant about something and this topic is entirely too near and dear to my heart, given that I spend most of my time fixing AWS bills for interesting and various sophisticated clients.
Let's begin with a simple question. The answer to which is guaranteed to piss you off like almost nothing else. What does it cost to move a gigabyte of data in AWS? Think about that for a second. The correct answer, of course, is that nobody freaking knows. There is no way to get a deterministic answer to that question without asking a giant boatload of other questions.
Let me give you some examples, and before I do, I would like to call out that every number I'm about to mention applies only to us-east-1, because of course different regions in different places have varying costs, that every single one of these numbers is different in other places sometimes, but not always. Why? Because things are awful. I told you I was going to rant. I'm not apologizing for it at this point.
Let's begin simply and talk about what it takes to just shove a gigabyte of data into AWS. Now in most cases that's free. Inbound bandwidth is always free to AWS usually, until it passes through a load balancer or does something else but we'll get there. What does it cost to move data between two AWS regions? Great. The answer to that is, two cents per gigabyte in the primary regions, except there's one use case which gets slightly less. And that is moving between us-east-1 and us-east-2. One is in Virginia, two is in Ohio. That is half price at one cent per gigabyte. My working theory behind that is that it's because even data wants to get the hell out of Ohio.
Let's take it a step further. Let's say you were in an individual region. What does it cost to move data from one AZ to another? The documentation was exquisitely unclear, and I had to do some experiments with spinning up a few instances in otherwise empty AWS accounts, and using dd and netcat to hurl data across various links to find out the answer and then wait till it showed up on my bill. The answer is it also costs 2 cents per gigabyte, the same cost as region to region. It's one cent per gigabyte out of an AZ and one cent per gigabyte into an AZ. And that's right, it means you get charged twice. If you move 10 gigabytes, you are charged for 20 gigabytes on that particular metric.
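To make the double charge concrete, here is a minimal Python sketch of the arithmetic using only the us-east-1 rates quoted above. The function and constant names are made up for illustration, and the rates are the ones from this episode, so check current pricing before relying on them.

```python
# Rates quoted in the episode for us-east-1, in US dollars per gigabyte.
INTER_REGION_RATE = 0.02            # most primary region pairs
EAST_1_TO_EAST_2_RATE = 0.01        # the Virginia <-> Ohio exception
CROSS_AZ_RATE_PER_DIRECTION = 0.01  # billed once leaving an AZ, once entering one


def cross_az_cost(gigabytes: float) -> float:
    """Cost of moving data between two AZs in the same region."""
    billed_gigabytes = gigabytes * 2  # the same bytes are metered out and in
    return billed_gigabytes * CROSS_AZ_RATE_PER_DIRECTION


def cross_region_cost(gigabytes: float, rate: float = INTER_REGION_RATE) -> float:
    """Cost of moving data between two regions at a flat per-gigabyte rate."""
    return gigabytes * rate


if __name__ == "__main__":
    gb = 10
    print(f"cross-AZ, same region:  ${cross_az_cost(gb):.2f}")
    print(f"typical cross-region:   ${cross_region_cost(gb):.2f}")
    print(f"us-east-1 <-> us-east-2: ${cross_region_cost(gb, EAST_1_TO_EAST_2_RATE):.2f}")
```

Under these numbers, the Virginia to Ohio transfer comes out at half the cost of moving the same 10 gigabytes between two AZs in one region.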
This also has the fun ancillary side effect of meaning that moving data between Virginia and Ohio is cheaper to do that cross region transfer than it is to move that same data within an existing region. Oh wait, it gets dumber than that. What do load balancer data transfer fees look like? The correct answer is who the hell knows? On the old classic load balancers, it was 0.8 cents per gigabyte in or out to the internet and there was also an instance fee, but that's not what we're talking about today. Traffic from any existing load balancer today to something inside of an AZ is free unless it crosses an availability zone and then we're back into cross AZ data transfer territory and anything going from an availability zone to a load balancer costs one cent per gigabyte.
Now the newer load balancer generations, the ALBs and the NLBs, what does that cost? Nobody freaking knows because data throughput is just one of several dimensions that go into a load balancer capacity unit, which means that what your data transfer price is going to look like is going to vary wildly because in this particular case, it's not data transfer itself. There's still that as it leaves, but you also have to pay for this as an additional through the load balancer fee, but it's blended into an LCU, so it's not at all obvious at times that that is in fact what you're being billed for.
In another episode of this mini series, we talked about global accelerator. Now there's a site to site VPN option, which they had for a while, but at re:Invent last year they announced an accelerated VPN option that leverages a lot of global accelerator technology to let that site to site VPN take advantage significantly of the global accelerator. Now what does that cost? I could not freaking tell you. There are, I am not exaggerating, five distinct billing line items, if you run an accelerated site to site VPN and of course, all of them cost you money. I am not exaggerating. That is the actual state of the world. It is incredibly annoying. It is so annoying that I'm going to have to take a break before I blow a blood vessel to tell you more about ThousandEyes instead.
So other than the cloud report, what is ThousandEyes? They effectively act as the global observer that watches the entire internet from a whole bunch of different listening posts around that internet and keeps track in near real time of what's going on, what's being slow, what providers are having issues and giving information directly to your folks on your side to be able to understand, adapt and mitigate those outages and slow downs. It helps immediately get to the point of is this a networking problem globally or is it our last crappy code deploy that broke things? If this sounds like something that might be useful for you or your team, I encourage you to check them out at thousandeyes.com. They're a fantastic company with a fantastic product and best of all their billing makes sense.
We're back to ranting again. That's right. My problem with the AWS data transfer pricing is not just that it's shitty and complex, but also that it's expensive. Pricing largely has not changed since AWS launched and you're effectively seeing 1998 bandwidth prices as a direct result of this. In data center land, the way this works is you pay for a link between two places and however much traffic you put over it, you're charged at the 95th percentile, so you can have bursts and spikes that exceed that limit, but you're paying effectively a flat rate for whatever your throughput looks like over the course of the billing period. It's not the most straightforward thing in the world, but it's a lot less expensive than what you wind up paying for the same thing in the cloud.
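For anyone who hasn't lived in data center land, a rough Python sketch of 95th percentile billing follows. It assumes the common arrangement where the provider samples throughput at regular intervals, sorts the samples, and discards the top 5 percent. The nearest-rank method, the sample values, and the `dollars_per_mbps` price are all illustrative assumptions, not any particular carrier's contract terms.

```python
def percentile_95(samples_mbps):
    """Return the 95th percentile of throughput samples (nearest-rank method)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def monthly_bill(samples_mbps, dollars_per_mbps):
    """Bill the billing period at a flat price times the 95th percentile rate."""
    return percentile_95(samples_mbps) * dollars_per_mbps


if __name__ == "__main__":
    # Hypothetical month: mostly 100 Mbps with a handful of 900 Mbps spikes.
    quiet_with_spikes = [100] * 95 + [900] * 5
    print(percentile_95(quiet_with_spikes))   # spikes fall in the discarded top 5%
    print(monthly_bill(quiet_with_spikes, dollars_per_mbps=1.0))
```

The point of the model is that short bursts above the billed rate cost nothing extra, whereas per-gigabyte cloud pricing meters every byte of the spike.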
Somehow AWS has managed to successfully convince an entire generation of companies that bandwidth is a rare, precious, expensive commodity. Unless of course it's bandwidth directly into AWS from the internet, in which case it is of course free, and you can have as much of that as you want. Data checks in, it doesn't check out. This in turn leads to a lot of weird patterns. For example, if you have a mobile app that winds up reporting data to something that lives in an AWS region, rather than having that replicated on your dime, you could theoretically have that mobile app report it to two different regions. It doubles your user's bandwidth on potentially a mobile plan, but it saves you money. How crappy of a dynamic is that?
Now there are other services that wind up leveraging aspects of data transfer pricing in obnoxious ways and there are ways around this too. PrivateLink, for example, a link between two different VPCs, in some cases in different accounts, saves you money on the data egress charge so you don't have to go across the internet to do it. Great, sure, that's right. It drops it down to one cent per gigabyte in each direction, but you're still paying, at scale, a significant amount of money for what is in effect AWS just moving data around its internal network.
Direct Connect, the service that links an AWS VPC to your on premises data center also saves you money on that for data out that traverses from AWS to your data center, but in reverse, it costs you more than using the internet because again, ingress is free. You could theoretically have data if you're doing a large copy from your data center into AWS, use the public internet and put it directly into S3 or something like that. Why? Because this entire story is a carnival of bullshit. It's awful. Nobody likes it.
Let's talk about CloudFront, AWS's CDN product. It's kind of spendy as far as CDNs go, not horribly so, but what's fun about this is first off, what it costs for data in or out of CloudFront varies depending upon what region that data access is coming from and you don't have fine-grained control over any of that. You don't know where your customers are going to come from and in some cases you can pay three times more if a customer accesses your same application with the same traffic pattern from different parts of the world than if they do it from down the street from your office. And what's even more obnoxious about this is they still have an obnoxious competitive advantage over other CDN offerings because unlike anyone else, they can privilege their own environment and say that traffic from the origin to the CloudFront distribution is free. No one else can do that. So if you have a lot of data that you're pushing through CloudFront that isn't easily cached, that is an advantage that CloudFront has that makes it very expensive to even consider using something else.
This ties into a bigger challenge. Notice as well that AWS has a large pile of, let's call them substandard offerings in some respects, that are managed service equivalents of things you could build and run yourself: RDS for a bunch of database offerings, Amazon Elasticsearch, Neptune, DocumentDB, and a whole host of other managed services that have presences in multiple availability zones, offer free replication of that data when you use the managed service. So if you can tolerate the obnoxious sharp edges of those managed services and it works well enough for your use case, they have a tremendous leg up over running your own implementation of the open source products that those things ape, or having a third party vendor that manages that service for you, because one way or another, unless you're using the AWS version of it, either the vendor who's managing it for you or you, have to pay that data transfer charge between AZs. In some cases, both of you do. Only Amazon gets to ride data transfer between AZs for free.
This one isn't particularly a data transfer charge directly, although it looks a lot like one, I speak of the managed NAT Gateway, there's a data processing fee in addition to the instance fee, of four and a half cents per gigabyte. That doesn't sound like a lot until you realize that if you have a web facing application that you put in a private subnet that needs to talk to the outside world, every gigabyte you put through costs four and a half cents, that becomes a massive expense. If you run your own NAT Instances, sure you have more overhead, the hourly charge is about the same as you'll spend for the managed NAT Gateway, but the data processing fee completely vanishes. Remember as well that that data processing fee is in addition to any data transfer fees you would be paying. So you go from four and a half cents per gig, to nothing.
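A quick Python sketch shows how fast that processing fee adds up. It uses the four and a half cents per gigabyte figure quoted above. The 50 terabyte monthly volume, the decimal terabyte conversion, and the function name are illustrative assumptions.

```python
NAT_PROCESSING_RATE = 0.045  # dollars per gigabyte, as quoted for us-east-1
GIGABYTES_PER_TERABYTE = 1000  # assumption: decimal units


def nat_processing_cost(gigabytes: float) -> float:
    """Data processing fee for traffic pushed through a managed NAT Gateway.

    This is on top of the hourly charge and any data transfer fees.
    """
    return gigabytes * NAT_PROCESSING_RATE


if __name__ == "__main__":
    monthly_gb = 50 * GIGABYTES_PER_TERABYTE  # hypothetical workload
    print(f"managed NAT Gateway processing: ${nat_processing_cost(monthly_gb):,.2f}")
    print("self-run NAT instance processing: $0.00")
```

Because the hourly charges are described as roughly equal, the processing fee is, to a first approximation, the whole difference between the two options.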
Similarly and very tightly related, if you forget to put a free S3 Gateway endpoint in a private subnet that has a managed NAT Gateway in it, every gigabyte of data that you transfer into S3, in turn winds up having to incur that four and a half cent per gigabyte charge. That is the same it costs to store that gigabyte in us-east-1 for just shy of two months. That's enormous. You can make it free, but it's laid there for you as a trap for the unwary.
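The "just shy of two months" comparison can be checked with a couple of lines of Python. The storage price below is an assumption: it is the commonly listed S3 Standard rate of 2.3 cents per gigabyte-month in us-east-1, which the episode does not state explicitly.

```python
NAT_PROCESSING_RATE = 0.045    # dollars per gigabyte, from the episode
S3_STORAGE_RATE = 0.023        # assumed dollars per gigabyte-month, S3 Standard


def months_of_storage_per_gigabyte_processed() -> float:
    """How many months of S3 storage one gigabyte's NAT processing fee would buy."""
    return NAT_PROCESSING_RATE / S3_STORAGE_RATE


if __name__ == "__main__":
    months = months_of_storage_per_gigabyte_processed()
    print(f"{months:.2f} months of storage per gigabyte sent through the NAT Gateway")
```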
Lastly, if you're thinking that all regions are equivalent, they're not. You can see four to five times the expense, if you go to the region in Sao Paulo. That is not entirely AWS's fault, it's largely due, to my understanding, to telecom monopolies in that part of the world that make bandwidth incredibly expensive for virtually everyone. That's not something that I can fault them with, but it's still irritating.
What I can fault them with is that absolutely all of this, everything that I have spoken about today shows up on your bill in a whole mess of places that are incredibly difficult to unpack. Sometimes you're charged for the same data multiple times in different places and heaven forbid you try to figure out what's caused the change, which application workload do you have in that account that is suddenly responsible for a whole lot of AZ crosstalk? The only real answer without efficient tagging and the foresight to have those tags in place before this moment, is looking at VPC flow logs, which you've enabled, right? But those are annoying and confusing and difficult to parse as well.
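As a starting point for that flow log spelunking, here is a hedged Python sketch that totals bytes for flows crossing AZ boundaries. It assumes the default 14-field, space-separated flow log record format and a hand-maintained subnet-to-AZ map. The CIDR ranges, AZ names, sample records, and function names are all hypothetical.

```python
import ipaddress
from collections import Counter

# Hypothetical subnet-to-AZ map; in a real account this comes from your VPC's subnets.
SUBNET_AZ = {
    ipaddress.ip_network("10.0.1.0/24"): "us-east-1a",
    ipaddress.ip_network("10.0.2.0/24"): "us-east-1b",
}


def az_for(address):
    """Return the AZ whose subnet contains the address, or None if unknown."""
    ip = ipaddress.ip_address(address)
    for network, az in SUBNET_AZ.items():
        if ip in network:
            return az
    return None


def cross_az_bytes(lines):
    """Sum bytes per (source, destination) pair for flows that cross AZs."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Default format: version account-id interface-id srcaddr dstaddr srcport
        # dstport protocol packets bytes start end action log-status
        if len(fields) != 14 or fields[13] != "OK":
            continue  # skips headers plus NODATA and SKIPDATA records
        src, dst, byte_count = fields[3], fields[4], int(fields[9])
        src_az, dst_az = az_for(src), az_for(dst)
        if src_az and dst_az and src_az != dst_az:
            totals[(src, dst)] += byte_count
    return totals


SAMPLE = [
    "2 123456789012 eni-0a1 10.0.1.5 10.0.2.9 44321 5432 6 10 8000 1600000000 1600000060 ACCEPT OK",
    "2 123456789012 eni-0a1 10.0.1.5 10.0.1.7 44322 6379 6 5 2000 1600000000 1600000060 ACCEPT OK",
    "2 123456789012 eni-0a1 - - - - - - - 1600000000 1600000060 - NODATA",
]

if __name__ == "__main__":
    for (src, dst), total in cross_az_bytes(SAMPLE).most_common():
        print(f"{src} -> {dst}: {total} bytes across AZs")
```

One caveat: flow logs are recorded per network interface, so the same flow may show up in the logs of both endpoints. Filter to one interface or deduplicate before treating the totals as billable bytes.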
Look, I'm not saying that this is intentional on AWS's part, I'm saying this is just the opposite. I'm saying when it comes to data transfer pricing, it is very clear that no one is effectively minding the store. The folks who build out the networks and the folks who handle pricing and the folks who work on service teams that leverage both of those things, apparently none of those people are allowed to talk to one another. Data transfer is the white space between AWS services. The next time your AWS account manager asks how they can help you with anything, please be sure to yell at them about the data transfer pricing portion of your bill. It's the only way that this apparently will one day get fixed for everyone.
I'm cloud economist Corey Quinn. This is the Networking In The Cloud mini series. If you've enjoyed this podcast, please leave a five star review and a comment telling me what you liked. If you didn't like this podcast, please leave a five star review anyway, and a comment telling me what my problem is.
Announcer: This has been a HumblePod production. Stay humble.