For those who are just stumbling upon this blog for the first time with no context, I'm a very loud critic of Amazon Web Services whenever I feel they deserve it (services that are poorly documented, status pages that don't update, catering options that don't appeal to my appetite that day, etc.). Today, I was summoned by what felt like half of the operational community to chime in on Amazon Breaking the Internet.
The only trouble with that line of thinking is that Amazon didn't do anything wrong here. The root cause was something known as BGP hijacking, a technique that has also taken out Google, YouTube, Mastercard, Facebook, and other staples of the internet. This time, some sophisticated attacker took over roughly 1,300 AWS-owned IP addresses and rerouted their traffic in order to gain access to "MyEtherWallet," which I presume to be some form of Easter candy savings account.
BGP (the Border Gateway Protocol) is arcane to most people, but the quick explanation is that it underpins the entire internet: it's how networks tell one another which IP addresses they can reach. Picture "DNS, but for routes." If that description enrages you, great-- it's not for you. As with most protocols from the 20th century, it was built for a simpler time.
A time when users naively believed they could trust one another.
A time when the Internet was largely a social experiment.
A time when you couldn't make six figures in two hours because users would naively click through SSL errors and then trade cryptocurrencies. (I'm a big believer in blameless postmortems, but if you click through a VERY explicit SSL warning and then conduct financial transactions, I'm sorry-- you're at least partially responsible for whatever happens next.)
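If the "DNS, but for routes" picture and the misplaced trust are still fuzzy, here's a rough sketch of one common way a hijack works. It assumes the attacker announces a more specific prefix than the legitimate owner does, which routers prefer when picking a route; I'm not claiming that's exactly what happened here. The prefixes and AS numbers are documentation-only placeholders, and `best_route` is a made-up helper, not anything a real router exposes.

```python
import ipaddress

def best_route(routes, address):
    """Pick the route whose prefix matches the address most specifically.

    `routes` is a list of (prefix, origin_as) pairs. This mimics only the
    longest-prefix-match step, not the full BGP decision process.
    """
    addr = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), origin)
        for prefix, origin in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, origin = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), origin

# Hypothetical legitimate announcement (documentation prefix and ASN).
legit = [("203.0.113.0/24", "AS64496")]

# Hypothetical attacker announces a more specific slice of the same space.
hijacked = legit + [("203.0.113.0/25", "AS64511")]

print(best_route(legit, "203.0.113.10"))
print(best_route(hijacked, "203.0.113.10"))
```

In the second lookup the /25 wins, so traffic for that address heads to the attacker's network even though the legitimate /24 is still being announced. Nobody had to break into anything; they only had to be believed.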
And as to reports that "nobody noticed for two hours," that's sheer lunacy. I've had the privilege of working with some incredibly intelligent network engineers over the course of my career-- all of whom pale in comparison to the people working at AWS in the bowels of engineering. Those folks don't miss a trick. "Someone else announces one of your routes" is one of those very, very, very obvious things to see if you're looking for it from the right vantage point. I'd believe that nobody at AWS noticed it for two minutes, maybe three at most.
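To give a feel for why "someone else announces one of your routes" is so easy to spot, here's a minimal sketch of the check. It assumes you already have a feed of announcements observed from some outside vantage point; the `expected` table, the `observed` list, and the `find_suspect_announcements` helper are all hypothetical, and I have no idea what AWS's actual tooling looks like.

```python
import ipaddress

def find_suspect_announcements(expected, observed):
    """Return observed (prefix, origin_as) pairs that overlap a prefix we
    own but claim a different origin AS than the one we expect."""
    suspects = []
    for prefix, origin in observed:
        seen = ipaddress.ip_network(prefix)
        for own_prefix, own_origin in expected.items():
            own = ipaddress.ip_network(own_prefix)
            # Only compare like with like (IPv4 vs IPv4, IPv6 vs IPv6).
            if seen.version != own.version:
                continue
            if seen.overlaps(own) and origin != own_origin:
                suspects.append((prefix, origin))
                break
    return suspects

# Hypothetical: the prefix we own and the AS that should originate it.
expected = {"203.0.113.0/24": "AS64496"}

# Hypothetical feed of announcements seen from outside our network.
observed = [
    ("203.0.113.0/24", "AS64496"),   # ours, as expected
    ("203.0.113.0/25", "AS64511"),   # a slice of ours from someone else
    ("198.51.100.0/24", "AS64500"),  # unrelated address space
]

suspects = find_suspect_announcements(expected, observed)
print(suspects)
```

The comparison itself is trivial. The hard part is the "right vantage point" bit: you need to see what the rest of the internet is being told, not just what your own routers believe.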
The trouble is, in a situation like this, all possible remedies take time: the "what the hell are you doing?!" phone calls to the upstream ISP, the publication of new routes and the wait for reconvergence, and of course the explanation of all of this to the corporate communications people, who quite possibly have never heard of BGP before today.
The internet is fundamentally broken in this way; bad actors can cause disruption and woe for huge numbers of people. It's a giant problem, to be sure-- but it's one for which Amazon isn't (for once!) responsible. Blaming them for this issue is at worst dishonest, and at best intellectually lazy.