Today, I want to talk about cloud capacity.
Despite the “Last Week in AWS” name, I’m not particularly partisan when it comes to cloud providers. AWS was early to market and was left to grow its lead unchallenged for almost five years. As a result, I’ve got a far broader basis of experience with AWS than I do with any of its competitors.
That said, I don’t work there. And while I’ll certainly take their sponsor money, the same could be said of virtually any company. Oracle, please call me.
It’s through that lens that I’d like to discuss the recent boom in demand for cloud services. I don’t have a particular axe to grind here. GCP, AWS, and Azure all have glowing bright spots and hideous warts; I try to be even-handed in how much praise or grief I give each of them.
Struggling at scale
Azure has been suffering from capacity shortfalls on and off since Q4 of 2019. Now, services are unavailable for free-tier or credit-based customers, and they’ve stated that they’re prioritizing capacity for different customer profiles.
A few points to unpack there.
First, if you want to build a hyperscale public cloud, you’ve gotta be able to invest a staggering amount of money in a physical footprint, then wait years for that plan to come to fruition. There’s a reason why all of the major cloud players and most of the minor ones have “side businesses” such as e-commerce, advertising, and software sales to fund the incredible amount of expansion needed to become a global player.
There are no hyperscale “pure cloud” companies today. This means that you can’t “fix” capacity shortfalls like this in anything approaching a short timeframe by calling up SuperMicro and ordering basically every computer they still have in stock.
You’re instead going to have to plan out datacenter expansions, arrange for new facilities, construction, power, connectivity—all of which takes time.
Second, it’s pretty clear that, given their noises about this getting resolved within weeks or months, Microsoft doesn’t view “regions” the same way their competitors do. They’ve made the strategic decision to go for “number of global regions” rather than “robustness of said regions.”
Their “region-pairing” strategy indicates that you can think of each Azure region as an AWS or GCP Availability Zone. That’s great right up until it isn’t, and you wind up with a bunch of small regions rather than fewer, more robust ones, and a sudden influx of demand causes those regions to run out of headroom.
Note that AWS pre-announces some regions years in advance; it takes WORK to build these things out in a robust way that can handle capacity surges. The lack of stories around AWS capacity shortfalls indicates that while they lose the “number of regions” battle, they win the “but the regions we have don’t fall over when you try to use them” war.
‘Sorry, Azure is full. Maybe try AWS instead?’
Microsoft’s recent announcement that they would be prioritizing services for healthcare customers over others is a statement that says a lot more than face value would suggest.
It’s a tacit admission that they’re facing capacity challenges—something that any given cloud provider would be loath to admit.
That said, their messaging around it is excellent: Who could possibly push back against “granting capacity to more important customers” without feeling like they’re being unreasonable?
Then again, what else could they conceivably say? “Sorry, Azure’s full, maybe try AWS!”? It’s their only path forward.
The problem with stories like this is that they’re not a win for anyone. Companies aren’t going to back away from Azure and go to AWS; instead, they’re going to consider it “Enterprise Computing Groundhog Day,” figure that the groundhog saw its shadow, and now we’re facing six more years of “on-premises, no cloud.”
Azure’s loss is not a win for AWS and GCP
Capacity shortfalls like this damage the overall perception of the public cloud. AWS and GCP don’t benefit from this kind of story. Every player in the space loses instead.
Security issues have the same dynamic. “Company fails to secure S3 bucket properly, leaks customer data” isn’t a win for AWS’s competitors. It’s a talking point for the folks who go on and on endlessly about how the “public cloud is insecure.”
I don’t care what public cloud provider a given company chooses; I have my preferences, of course, but they’re shaped by my own use case.
At the end of the day, you need to pick the provider that works best for you.
I’m just disappointed when issues like this crop up and make me look like a bit of a fool for advocating for cloud computing in general.
This feels like “cloud growing pains” now that we’re seeing the first real, unexpected, global capacity test for cloud services. Even though these services seem old and battle-tested to those of us who make our livings off of cloud computing, it’s important to remember that cloud is still a relatively new technology in the grand scheme of things.
Even well-run companies with massive resources like Microsoft are still trying to iron the kinks out; I wish them well with it.