By far the most notable release of re:Invent this year was gp3 EBS volumes. They cost a flat 20% less than gp2 volumes with no drawbacks, and the conversion can be done in place.
Today, I’d like to explore a little bit more about what makes these so awesome.
Let’s start by calling out what EBS volumes are: They’re fundamentally disk volumes that are attached to instances. For instance families without instance storage, they’re where the root volumes live and are the only “disks” available.
It goes a little beyond that though. Disk failures aren’t a thing with these volumes, you can convert them to other types on the fly, and you can snapshot them as frequently as your heart desires. They’re disk volumes except they’re magic. From a macro view, they’re a significant driver of actual customer spend on AWS resources—on par with data transfer.
gp2 volumes
For some time, gp2 was the default disk volume. SSD-backed, they perform well in workloads where random IO is prevalent (as opposed to long, streaming reads)… they’re just solid all-around as a default disk that’s highly performant.
One of their benefits, however, is also a drawback: their burst capacity.
In short, this means that they can overperform their baseline performance offering for a while. But eventually, the burst credits run out and performance craters.
If an operator wasn’t aware of this behavior, it was one hell of a mystery. “I must have had a bad volume provisioned, I’ll provision a new one and hey! The problem went away! Back to bed I go!”—and the cycle repeated a half-hour later.
It wasn’t particularly discoverable other than through hard-won experience. It also had the painful pattern of “the behavior I see once I stand the volume up isn’t the behavior it will exhibit under a sustained load,” and that made load testing and modeling out performance of these volumes a challenge.
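To make the burst behavior concrete, here is a minimal sketch of the gp2 credit model. It assumes the figures from AWS's gp2 documentation as I understand them (3 IOPS per GiB, a 100 IOPS floor, a 16,000 IOPS ceiling, a 3,000 IOPS burst rate, and a bucket of 5.4 million I/O credits). The function names are mine, and the model assumes the volume starts with a full bucket and is driven at the full burst rate.

```python
import math

# Assumed gp2 characteristics, taken from AWS documentation as I understand it.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_BURST_IOPS = 3_000
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def gp2_burst_seconds(size_gib: int) -> float:
    """Seconds a full credit bucket lasts when driven at the burst rate.

    Returns math.inf when the baseline already meets the burst rate,
    because such a volume never depends on credits.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return math.inf
    # Credits drain at the gap between the burst rate and the baseline refill.
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)
```

Under these assumptions a 100 GiB volume has a 300 IOPS baseline and sustains the full burst for 2,000 seconds, or about 33 minutes, which lines up with the half-hour cycle described above.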
Making matters worse, there were only two ways to overcome these performance constraints: upgrade to io1 (and later, io2) volumes, where you could explicitly set IOPS per volume, or make your gp2 volume larger, because performance scaled with volume size.
That said, io1 and io2 are usually not required. And, in fact, expanding your gp2 volume would result in comparable performance at a third of the price.
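The trade-off between the two options can be sketched as a rough cost comparison. The per-month prices below are assumptions (approximately the us-east-1 list prices as I recall them) and should be replaced with current figures for your region. The function names are illustrative only.

```python
import math

# Assumed monthly list prices in USD; check the EBS pricing page for your region.
GP2_PER_GIB = 0.10
IO1_PER_GIB = 0.125
IO1_PER_IOPS = 0.065

GP2_IOPS_PER_GIB = 3
GP2_MAX_IOPS = 16_000


def gp2_size_for_iops(target_iops: int, data_gib: int) -> int:
    """Smallest gp2 size (GiB) that holds the data and sustains target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 cannot sustain more than 16,000 IOPS")
    return max(data_gib, math.ceil(target_iops / GP2_IOPS_PER_GIB))


def gp2_monthly_cost(size_gib: int) -> float:
    """gp2 is billed on size alone."""
    return size_gib * GP2_PER_GIB


def io1_monthly_cost(size_gib: int, iops: int) -> float:
    """io1 is billed on size plus every provisioned IOPS."""
    return size_gib * IO1_PER_GIB + iops * IO1_PER_IOPS


size = gp2_size_for_iops(target_iops=3_000, data_gib=500)
print(f"gp2: {size} GiB, ${gp2_monthly_cost(size):.2f}/month")
print(f"io1: 500 GiB, ${io1_monthly_cost(500, 3_000):.2f}/month")
```

With these assumed prices, sustaining 3,000 IOPS for 500 GiB of data comes to about $100 a month on an oversized 1,000 GiB gp2 volume versus about $257.50 on io1. The exact ratio shifts with the size and IOPS you plug in.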
On top of all of this is the challenge of per-instance limits. There was and is an EBS burst allocation for many instance types that varies based upon the exact instance type in question.
When that gets exhausted, your EBS throughput will fall no matter what volume type you have configured, which has contributed to widespread confusion about what EBS volumes you really need.
Enter gp3 volumes
Of course, gp3 volumes change all of this.
First, they offer a predictable baseline of 3,000 IOPS and 125 MB/s regardless of volume size. Should you need more than this, you can allocate additional IOPS or throughput to a volume at reasonable cost, up to a ceiling of 16,000 IOPS and 1,000 MB/s.
You can then adjust all three of those dimensions (i.e., size, IOPS, and throughput) independently, meaning you’re not forced into overprovisioning on volume size to achieve certain performance targets.
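As a sketch of what “three independent dimensions” looks like in practice, the snippet below models a gp3 volume using the baseline and ceiling figures quoted above. The class name and the prices are assumptions for illustration. The prices are approximate us-east-1 list prices as I recall them, with IOPS and throughput billed only above the included baseline. AWS also enforces some ratio limits between size, IOPS, and throughput that this sketch does not model.

```python
from dataclasses import dataclass

# Baseline and ceiling figures quoted in this post.
GP3_BASE_IOPS = 3_000
GP3_MAX_IOPS = 16_000
GP3_BASE_THROUGHPUT = 125   # MB/s
GP3_MAX_THROUGHPUT = 1_000  # MB/s

# Assumed monthly list prices in USD; verify against current pricing.
GP3_PER_GIB = 0.08
GP3_PER_EXTRA_IOPS = 0.005
GP3_PER_EXTRA_MBPS = 0.04


@dataclass(frozen=True)
class Gp3Volume:
    """Hypothetical model of a gp3 volume's three tunable dimensions."""
    size_gib: int
    iops: int = GP3_BASE_IOPS
    throughput: int = GP3_BASE_THROUGHPUT

    def __post_init__(self) -> None:
        # Reject settings outside the baseline-to-ceiling range.
        if not GP3_BASE_IOPS <= self.iops <= GP3_MAX_IOPS:
            raise ValueError(f"iops must be {GP3_BASE_IOPS}-{GP3_MAX_IOPS}")
        if not GP3_BASE_THROUGHPUT <= self.throughput <= GP3_MAX_THROUGHPUT:
            raise ValueError(
                f"throughput must be {GP3_BASE_THROUGHPUT}-{GP3_MAX_THROUGHPUT}"
            )

    def monthly_cost(self) -> float:
        """Storage cost plus anything provisioned above the free baseline."""
        extra_iops = self.iops - GP3_BASE_IOPS
        extra_mbps = self.throughput - GP3_BASE_THROUGHPUT
        return (
            self.size_gib * GP3_PER_GIB
            + extra_iops * GP3_PER_EXTRA_IOPS
            + extra_mbps * GP3_PER_EXTRA_MBPS
        )
```

Under these assumed prices, `Gp3Volume(100).monthly_cost()` comes to about $8, and bumping the same 100 GiB volume to 6,000 IOPS and 250 MB/s brings it to about $28 without touching its size.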
Objectively, you’re unlikely to hit the limits of a gp3 volume for most normal workloads. Very specific tasks such as high-performance databases will be the exception here. But my default position on exceeding gp3 limits is to double check your architecture and make sure you’re not doing something that’s best served by alternate means.
What you should do next
Now, everything I’ve said above is all well and good. But what does this mean for your environment today?
Simply put, if you convert all of your existing gp2 volumes to gp3 (which can be done in place with a single API call per volume), you’ll knock 20% off of your EBS bill immediately with virtually no scenarios in which you’re degrading your volume performance. It’s exceedingly rare to see a single granular change like this that comes with virtually no downsides, so I encourage everyone to take advantage of it immediately.
There’s no copying data between volumes and no instance restart required. Worst case, converting back from gp3 to gp2 (which makes so little sense in any scenario I’ve modeled out that I’d love to hear what your use case is) is again a single API call.
The single caveat to be aware of: if you have a larger gp2 volume for throughput or IOPS purposes, you need to make sure that you provision the gp3 volume with at least as many IOPS and as much throughput as the gp2 volume was giving you.
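Here is one way the caveat above might be handled in code: derive gp3 settings that meet or exceed what a gp2 volume of a given size could deliver, then issue the in-place change. This is a sketch under assumptions, not a definitive migration tool. The gp2 figures (3 IOPS per GiB, and throughput ceilings of 128 MB/s up to 170 GiB and 250 MB/s above that) are my reading of AWS's documentation, and the helper names are mine. The `modify_volume` call and its `VolumeId`, `VolumeType`, `Iops`, and `Throughput` parameters are from the boto3 EC2 client.

```python
def gp3_settings_matching_gp2(size_gib: int) -> dict:
    """gp3 parameters that meet or exceed a gp2 volume's ceilings.

    The gp2 figures used here are assumptions based on AWS documentation.
    """
    gp2_iops = max(100, min(3 * size_gib, 16_000))
    gp2_throughput = 128 if size_gib <= 170 else 250  # MB/s ceiling
    return {
        "VolumeType": "gp3",
        # Never go below the gp3 baseline that comes included.
        "Iops": max(3_000, gp2_iops),
        "Throughput": max(125, gp2_throughput),
    }


def convert_to_gp3(ec2_client, volume_id: str, size_gib: int) -> dict:
    """Convert one volume in place; ec2_client is a boto3 EC2 client."""
    settings = gp3_settings_matching_gp2(size_gib)
    return ec2_client.modify_volume(VolumeId=volume_id, **settings)
```

With a real client this would be invoked as `convert_to_gp3(boto3.client("ec2"), volume_id, size_gib)`. Matching the gp2 ceilings is the conservative choice. If your monitoring shows a volume never gets near them, the gp3 baseline alone may be enough and will cost less.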
A cloud microcosm
In many ways, this is a microcosm of the best that cloud has to offer. Any on-premises scenario remotely close to this would require migrating VMs, reprovisioning volumes, carefully calculating storage requirements for the change, and months upon months of staging the change for sizable environments.
Instead, it’s “make a single API call and your entire environment becomes better within seconds or minutes.” From where I stand, the only way this change could be better would be if gp2 volumes were automatically converted to gp3 under the hood, which I wouldn’t bet against AWS doing down the road.
Fundamentally, it’s rare that we see such a clear win for both cost and performance without significant trade-offs. When we do, I strongly encourage taking advantage of it.
Well done, EBS team. This is basically magic.