AWS continues to position itself as a leader in AI, touting its Inferentia and Trainium chips, both of which my spell checker quite sensibly flags as cause for alarm.
The concern isn’t unfounded. Amazon’s annoying chatbot tells us that:
AWS Trainium is not a GPU. It is a purpose-built AI chip designed by AWS specifically for training deep learning and generative AI models. Trainium chips include specialized scalar, vector, and tensor engines optimized for deep learning algorithms, providing higher performance and efficiency compared to GPUs.
This is all well and good—but for the folks training models (you probably should not be doing this), the tooling they already rely on works atop GPUs and, more realistically, NVIDIA’s software stack. Rethinking how those jobs work is going to be a heavy lift until it’s as simple as passing a flag to the ML software that developers are already using.
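To make that concrete, here’s roughly what “just pass a flag” looks like today in PyTorch land. This is a minimal sketch, not anything AWS ships: the CUDA path is the well-worn default, the Trainium path assumes the torch-neuronx / PyTorch-XLA stack is installed, and the helper function is mine.

```python
import torch

# A minimal sketch (not AWS's code): selecting training hardware in PyTorch.
# The CUDA path is what everyone already uses; the Trainium path assumes the
# torch-neuronx / PyTorch-XLA stack is installed and configured.
def pick_device(use_trainium: bool = False) -> torch.device:
    if use_trainium:
        import torch_xla.core.xla_model as xm  # part of the Neuron/XLA stack
        return xm.xla_device()  # routes ops through the XLA compiler to Trainium
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Everything downstream of device selection stays the same.
model = torch.nn.Linear(128, 10).to(pick_device())
```

Until switching that boolean is genuinely all it takes (compiler quirks and all), the NVIDIA default wins by inertia.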
I have the strong sense that a lot of AWS’s AI chip strategy is driven by the fear of NVIDIA being able to dictate terms to them. Amazon is very much not used to being the negotiating party with less power, and it seems like it’s freaking them out. I don’t believe that taking an approach of “we’ll build our own GPUs but don’t call them that” is going to solve that problem for them any time soon; NVIDIA’s moat is only getting stronger, given that they’re increasingly not just a hardware manufacturer, but an integrated software company as well.
That said, 2025 feels like the right time to look at AWS’s AI strategy. How well is AWS executing on its three-layer approach to AI?
The layers are quite imaginatively named:
- Bottom layer, which provides infrastructure for foundation model (FM) training.
- Middle layer, which provides tools for building and training FMs and large language models (LLMs).
- Top layer, which includes apps that use FMs and LLMs, like our pal Q.
Let’s review how AWS is doing at each of these.
Bottom layer: AI infrastructure
The annoying part here is that, with its focus on AI chips, AWS is skipping past the genuinely meaningful AI infrastructure it has been building for years: its compute and storage options.
Whatever the instance type, AWS’s EC2 management ecosystem means you’ll be hard-pressed to find a better place to run scaled-out workloads that you then—and this is key!—turn off when you’re done using them. AWS’s storage offerings are likewise awesome: effectively infinite capacity (AWS can add drives faster than you can fill them), increasingly low latency, and exactly the sort of thing folks tell me is key for ML training runs.
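To put a finer point on the turn-it-off part, here’s a minimal boto3 sketch of winding down a training fleet when the run ends. The `job` tag and its value are hypothetical stand-ins for however you label your own workloads:

```python
import boto3

# A minimal sketch: find the running instances for a (hypothetical) tagged
# training job and stop them so the meter stops running.
ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:job", "Values": ["training-run-42"]},  # hypothetical tag
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)  # stop paying for compute
```

That this is a ten-line afterthought rather than a procurement cycle is the actual AI infrastructure story.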
Assessment: Yippee-ki-yay!
Middle layer: Foundation models
Until re:Invent, this would have been a punchline, but I’m begrudgingly impressed with the eleventh-hour entry of the Amazon Nova models. While I was prepared to dunk on them as rewarmed Titan models wearing a brand that hadn’t yet been beaten six ways to Sunday, they’re in fact comparable to a number of other best-of-breed models at significant cost savings. This is going to be worth watching in 2025, and I’d argue it changes how AWS’s GenAI marketspeak is going to land, provided the company can construct a cohesive narrative around when and where you should use these FMs.
Access to models via the Amazon Bedrock offering slots in here. I find it’s increasingly undifferentiated, as every provider has a roughly equivalent way to query hosted models of your choice. If you’re in AWS’s ecosystem, you’ll use this and it’s… fine, I guess?
There are some benefits, as my colleagues at The Duckbill Group point out. With Bedrock, models run in isolated environments, much like most of AWS’s higher-level services, and they provide all of the same security features; querying Anthropic directly in basically the same way comes with vastly different guarantees for data sovereignty and protection. Bedrock also offers a single approach to accessing any of its models (though AWS hides the Converse API endpoint that you actually want to use if you’re doing such a thing). For example, Cloud Economist Eric Pullen wrote a single Lambda function that could send meeting summarization queries to each of the models without much effort, and he only had to deal with the standard IAM setup to do so. Try that across other providers and you’re juggling a different authentication scheme for each.
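Here’s a minimal sketch of that pattern (my reconstruction, not Eric’s actual function): one Converse call, many models, and the only auth story is standard IAM. The model IDs are illustrative; substitute whatever your account actually has access to.

```python
import boto3

# A minimal sketch: one function, many Bedrock models, one IAM auth story.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def summarize(model_id: str, transcript: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": f"Summarize this meeting:\n{transcript}"}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]

# Illustrative model IDs; swap in whatever you've enabled in your account.
for model_id in [
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "amazon.nova-pro-v1:0",
]:
    print(model_id, summarize(model_id, "…transcript goes here…"))
```

Swapping models is a string change rather than a new SDK, a new API key, and a new billing relationship; that part Bedrock genuinely gets right.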
Personally, I’ve been using Anthropic’s and OpenAI’s respective APIs directly for a while now, and I don’t feel that I’m missing anything by not inserting Bedrock into the mix. Customer reports continue to highlight sharp edges around this service that strongly indicate it was hurled out the door before it was ready; as one example, using those Nova models in Bedrock requires an arcane change to the API calls that Anthropic’s models don’t need. Ideally this will be consistent… eventually.
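For comparison, here’s what going to Anthropic directly looks like: one SDK, one API key, no Bedrock in the middle (the model ID is illustrative).

```python
import anthropic

# A minimal sketch of the direct route. The client reads ANTHROPIC_API_KEY
# from the environment; the model ID below is illustrative.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this meeting: …"}],
)
print(message.content[0].text)
```

It’s hard to argue that the Bedrock version buys you much here unless you specifically need its isolation and IAM story.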
I will say that Bedrock left Amazon better positioned than I’d have thought to embrace DeepSeek while the rest of the industry was freaking out about it—so I misread that one in my criticisms last year.
Assessment: All right, all right, all right.
Top layer: GenAI services and tools
Sadly, some of the stuff that predates the current AI craze constitutes some of AWS’s strongest offerings. Amazon Textract and Amazon Transcribe are both years old and awesome at what they do (OCR and audio transcription, respectively). I’d go so far as to say they’re bright lights among AWS services. Conversely, Amazon Q Developer (the part that integrates into my IDE, since “Amazon Q” has become a catchall brand that dilutes all meaning) seems more concerned that it might say or do something unfortunate; it’s quick to refuse, and it feels like it’s coming from a very defensive position. Copilot, Cursor, and Windsurf don’t suffer from this particular malady. And I frankly don’t know what the hell Amazon Q Business is supposed to be, beyond “pay us per seat please.”
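As a reminder of how pleasantly boring the old guard is, here’s a minimal Textract sketch: one call, text comes out. The filename is a placeholder.

```python
import boto3

# A minimal sketch: synchronous OCR on a single image with Textract.
textract = boto3.client("textract")

with open("document.png", "rb") as f:  # placeholder filename
    result = textract.detect_document_text(Document={"Bytes": f.read()})

# Print each detected line of text.
for block in result["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])
```

No prompt engineering, no refusals, no brand confusion. It just works.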
Assessment: This is a mixed bag.
AWS’s missed opportunities to talk about AI
Perhaps most maddening of all is that the areas in which AWS’s machine learning strengths truly shine aren’t really discussed in the same setting as the rest of its AI positioning. I’m speaking specifically about AWS Compute Optimizer. This thing rightsizes workloads better than anything I’ve seen available or built myself. From AWS’s vantage point of seeing All The Workloads, it can very accurately assess what a given application is going to need in terms of performance, and scale resources appropriately. The recommendations were laughable when Compute Optimizer first launched, but nobody’s laughing now. I would not want to be selling a rightsizing tool when this thing’s available for free.
Compute Optimizer is an example of AWS at its best: it has taken advantage of its vast and deep experience running workloads, applied that knowledge for customers in a self-service way that’s remarkably easy to use, and given the service doing all of this a bad name (despite the name, Compute Optimizer also optimizes disk volumes). AWS went a step further and met customers where we are by letting us bias the recommendations when we feel we know better than AWS what we need. (Generally speaking, we do not—but it makes us feel better to have these knobs and dials available.) And, of course, their Savings Plan Analyzer blows the doors off of anything you’d been able to build in-house. That’s not an interesting enough AI use case for them to talk about, unfortunately—but it’s amazing.
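If you want to see it for yourself, here’s a minimal sketch of pulling those recommendations with boto3, assuming Compute Optimizer is already enabled on the account. The CPU architecture preference is one of the bias knobs mentioned above; the value here is illustrative.

```python
import boto3

# A minimal sketch: fetch EC2 rightsizing recommendations, biased toward
# Graviton (AWS_ARM64) instance types. Assumes Compute Optimizer is enabled.
co = boto3.client("compute-optimizer")

recs = co.get_ec2_instance_recommendations(
    recommendationPreferences={"cpuVendorArchitectures": ["AWS_ARM64"]},
)

for rec in recs["instanceRecommendations"]:
    current = rec["currentInstanceType"]
    top = rec["recommendationOptions"][0]["instanceType"]  # first-ranked option
    print(f"{rec['instanceArn']}: {current} -> {top} ({rec['finding']})")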
What makes AWS’s year in AI hard to forecast
Historically, AWS has been very reluctant to oversell its own capabilities. I’ve lost count of the times I’ve been using an AWS offering, discovered it does something amazing, and asked very loudly why they aren’t telling this story more effectively. (VPC Lattice is a good example of this. It’s an end-run around some data transfer fees, dramatically simplifies AWS networking, and most people reading this have never heard of it before.)
Marketing for this era of GenAI has been very different. AWS is instead telling stories about AI capabilities that aren’t fully baked yet, meaning it’s very hard to distinguish between hype and reality.
What’s apparent is that AWS isn’t sitting still. As cloud becomes increasingly undifferentiated, it’s going to be instructive to watch how AWS positions its AI strategy in 2025.