Episode Summary
Join me as I continue the Whiteboard Confessional series by talking about how I log into all of the various AWS accounts I use for work, why using IAM passwords and username pairs is patently ridiculous, how AWS Single Sign-On is supposed to be great but just makes me angry, everything there is to know about aws-vault and why I needed a better solution, a complicated workaround I created for password management that was ridiculously overbuilt but works, and more.
Episode Show Notes & Transcript
About Corey Quinn
Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.
Transcript
Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”. Because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.
This episode is sponsored in part by ParkMyCloud, fellow worshipers at the altar of turn that [bleep] off. ParkMyCloud makes it easy for you to ensure you're using public cloud like the utility it's meant to be. Just like water and electricity, you pay for most cloud resources when they're turned on, whether or not you're using them. Just like water and electricity, keep them away from the other computers. Use ParkMyCloud to automatically identify and eliminate wasted cloud spend from idle, oversized, and unnecessary resources. It's easy to use and start reducing your cloud bills. Get started for free at parkmycloud.com/screaming.
In today's episode of the Whiteboard Confessional on the AWS Morning Brief, I want to talk to you about how I log into AWS accounts. Now, obviously, I've got a fair few of them here at The Duckbill Group, ranging from accounts that I use to test out new services, to the accounts that run my Last Week in AWS newsletter production things, to my legacy account because of course I have a legacy account for a four-year-old company. This is the Cloud we're talking about. And, as of this writing, they add up to 17 accounts in our AWS organization.
Beyond that, there's a lot more we have to worry about. We assume restricted roles into client AWS accounts to conduct our cost analyses. Getting those set up has been a bit of a challenge historically. We have a way of doing it now that we've open-sourced in our company GitHub repo. Someday, someone will presumably discover this, and then I'll get to tell that story. Now, to add to all of this complex nonsense, let's not forget that back when I used to travel to other places, before the dark times we're currently living in, I used to do all of my work when I was on the road from an iPad Pro.
So what was the way to intelligently manage logging into all of these different accounts and keep them straight? Now, using IAM passwords and username pairs is patently ridiculous. By the time you take in whatever accounts I'm currently working on, we've got, eh, 40 AWS accounts to care about, which would completely take over my password manager if I go down that path. It further wouldn't solve for the problem that, most of the time, I interact with these accounts only via API. Now, that's not entirely true because, as we've mentioned, the highest level of configuration management enlightenment is, of course, to use the console, and then lie about it.
Today, I want to talk about how I chained together several ridiculous things to achieve an outcome that works for basically all of these problems. There are almost certainly better ways to do this than what I do. I keep hearing rumors that AWS Single Sign-On can do all this stuff in a better way, but every time I attempt to use it, I get confused and angry and storm off to do something else. So here's what I do. First, I start with my baseline AWS account that has an actual IAM user with a permanent set of credentials in it. That's my starting point. Now, I store those credentials on my Mac in Keychain, and on my EC2 instance running Linux, it lives within the pass utility, which uses GPG-based encryption to store a string securely.
Now, before I get angry letters (because oh, dear Lord, do I get them), let me just say that this is a requirement that instance roles with their ephemeral credentials won't suit, so using an instance role for that EC2 instance won't work. Specifically, there's no way today to apply MFA to instance roles, and some of the roles I need to assume do have MFA as a requirement, so that's a complete non-starter. In these different environments, that effective root pair of credentials is managed by a tool that came out of 99designs called aws-vault. Don't confuse this with HashiCorp’s Vault, which is something else entirely. This started off as a favorite of mine, but given the periodic breaking changes that the aws-vault maintainers have introduced with different versions, it has become something far less treasured. They'll release a bunch of enhancements that up the version, which is great, but they haven't gotten around to fixing the documentation, so I have to stumble my way through it, and I'm angry every time I spin up something new, and then I give up and roll back to a version that works.
There are now other tools I'm looking at as an alternative to this, mostly because this behavior has really torqued me off. Now aws-vault, as well as many other tools in the ecosystem, can read your local configuration file in your .aws directory. It uses this for things like chaining roles together, so you can assume a role in an account that then is allowed to assume a role in a different account, and so on and so forth. It can tell you which credential set to use, which MFA device is going to be used to log into accounts, what region that account is going to be primarily based in, et cetera. It's surprisingly handy, except for when it breaks with aws-vault releases in [unintelligible] what it's expecting to see in that file. I digress again. Sorry, just thinking about this stuff makes me mad, so I'm going to cool down for a second.
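To make the role-chaining idea concrete, here is a minimal sketch of what such a configuration file can look like and how a tool might walk the chain. The keys `source_profile`, `role_arn`, `mfa_serial`, and `region` are standard AWS config settings; every profile name, account ID, and role name below is a made-up placeholder, and the `role_chain` helper is purely illustrative rather than anything aws-vault itself exposes.

```python
import configparser

# Hypothetical profiles: account IDs, role names, and the MFA ARN are placeholders.
EXAMPLE_CONFIG = """
[profile base]
region = us-west-2
mfa_serial = arn:aws:iam::111111111111:mfa/example-user

[profile hub-admin]
source_profile = base
role_arn = arn:aws:iam::222222222222:role/ExampleAdmin
mfa_serial = arn:aws:iam::111111111111:mfa/example-user

[profile client-readonly]
source_profile = hub-admin
role_arn = arn:aws:iam::333333333333:role/ExampleReadOnly
region = us-east-1
"""


def role_chain(config_text: str, profile: str) -> list[str]:
    """Walk source_profile links from `profile` back to the profile with real credentials."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    chain = [profile]
    seen = {profile}
    while True:
        section = f"profile {chain[-1]}"
        source = parser.get(section, "source_profile", fallback=None)
        if source is None:
            return chain  # no further link: this profile is the starting point
        if source in seen:
            raise ValueError(f"source_profile loop at {source!r}")
        seen.add(source)
        chain.append(source)


print(role_chain(EXAMPLE_CONFIG, "client-readonly"))
# → ['client-readonly', 'hub-admin', 'base']
```

The sketch only handles `[profile name]` sections; the special `[default]` section that the real file format also allows is left out to keep it short.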
Corey: This episode is sponsored in part by ChaosSearch. Now their name isn’t in all caps, so they’re definitely worth talking to. What is ChaosSearch? A scalable log analysis service that lets you add new workloads in minutes, not days or weeks. Click. Boom. Done. ChaosSearch is for you if you’re trying to get a handle on processing multiple terabytes, or more, of log and event data per day, at a disruptive price. One more thing, for those of you that have been down this path of disappointment before, ChaosSearch is a fully managed solution that isn’t playing marketing games when they say “fully managed.” The data lives within your S3 buckets, and that’s really all you have to care about. No managing of servers, but also no data movement. Check them out at chaossearch.io and tell them Corey sent you. Watch for the wince when you say my name. That’s chaossearch.io.
Now I can interact with aws-vault in two ways. One spawns a shell that has all of the usual AWS environment variables you would expect it to have, with temporary credentials and session tokens. Great. Suddenly, every other tool on the planet does not need to be taught how to work with an assumed role. It just runs locally, it sees those environment variables, and they all do the right thing.
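As a rough illustration of why this works, the sketch below checks the standard variables that the AWS CLI and SDKs read from the environment (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). The `describe_credentials` helper is hypothetical; it simply reports what kind of credentials a process started inside such a shell would see.

```python
import os

# Standard environment variables read by the AWS CLI and SDKs.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
SESSION_TOKEN = "AWS_SESSION_TOKEN"


def describe_credentials(env=None) -> str:
    """Report which kind of AWS credentials are visible in the given environment."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        return "no credentials in environment (missing: " + ", ".join(missing) + ")"
    if env.get(SESSION_TOKEN):
        # A session token accompanies temporary credentials, such as an assumed role.
        return "temporary credentials (session token present)"
    return "long-lived credentials (no session token)"


if __name__ == "__main__":
    print(describe_credentials())
```

Run from inside a shell that holds temporary credentials, this would be expected to report the session-token case; run from a plain shell with nothing exported, it reports the missing variables.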
Now, that's not quite as exciting to talk about. I’ve built something monstrous, and that's what I'm here to talk about today. Which brings us to the second way that I interact with aws-vault, which is to have it spit out a URL that logs you into the AWS console. Now on a desktop, it automatically opens your browser and logs you in. Of course, this doesn't work super well on an iPad that's remoting into an EC2 instance, or from an EC2 instance at all for that matter. It turns out that with super small text on a high-resolution display, the URL that I use to log in that this thing spits out is three lines long since signed URLs in AWS land are apparently some of that experience they claim that there's no compression algorithm for.
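For a sense of where a URL like that comes from, here is a sketch of the console sign-in flow AWS documents for federated users: temporary credentials are exchanged at the federation endpoint for a sign-in token, and the token is then embedded in a login URL. The episode does not say this is exactly what aws-vault does internally, so treat that as an assumption. The helper names, the `issuer` value, and the stand-in token are hypothetical, and the sketch only builds the URLs without making any network calls.

```python
import json
from urllib.parse import urlencode

FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"


def signin_token_url(access_key_id: str, secret_access_key: str, session_token: str) -> str:
    """Build the request URL that exchanges temporary credentials for a sign-in token."""
    session = json.dumps({
        "sessionId": access_key_id,
        "sessionKey": secret_access_key,
        "sessionToken": session_token,
    })
    return FEDERATION_ENDPOINT + "?" + urlencode({"Action": "getSigninToken", "Session": session})


def console_login_url(signin_token: str,
                      destination: str = "https://console.aws.amazon.com/",
                      issuer: str = "example.invalid") -> str:
    """Build the console login URL that carries the sign-in token."""
    return FEDERATION_ENDPOINT + "?" + urlencode({
        "Action": "login",
        "Issuer": issuer,            # hypothetical placeholder
        "Destination": destination,
        "SigninToken": signin_token,
    })


if __name__ == "__main__":
    fake_token = "x" * 800  # stand-in only; real tokens are opaque strings returned by AWS
    url = console_login_url(fake_token)
    print(len(url), "characters")
```

Because the token rides along in the query string, the final link is only as short as the token itself, which is why it sprawls across several lines on a small display.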
So this is where I took something that's already monstrous and made it worse. I built a shortcut that spits out a login link to generate that long signed link [unintelligible] above. Cool. Then I pipe that result to a script that I wrote. That script generates a UUID or Universally Unique Identifier. This is a 128-bit number. The odds of generating two that are the same are astronomical. Specifically, if you generate a billion of these a second, you'll have one collision every 85 years. Next, that UUID becomes the name of an S3 object. That object is set to redirect to the stupidly long URL that aws-vault has spit out. So I have a short link I can now click, but it turns into that long link through a redirect.
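A minimal sketch of that step, under stated assumptions: it uses a random (version 4) UUID as the object key and the S3 `WebsiteRedirectLocation` setting to point the object at the long URL. The bucket name and helper names are hypothetical, the actual script from the episode is not shown anywhere, and `boto3` is a third-party dependency that is only imported if you really publish. As far as I know, S3 only honors this redirect setting when the bucket is served through its website endpoint. The last helper applies the birthday approximation to the collision figure, assuming the 122 random bits of a version 4 UUID.

```python
import math
import uuid

BUCKET = "example-redirect-bucket"  # hypothetical bucket name


def redirect_object_params(long_url: str, bucket: str = BUCKET) -> dict:
    """Build put_object arguments for an empty object that redirects to long_url."""
    if not long_url.startswith("https://"):
        raise ValueError("refusing to publish a redirect to a non-HTTPS URL")
    return {
        "Bucket": bucket,
        "Key": str(uuid.uuid4()),              # random 128-bit identifier as the short name
        "Body": b"",                           # the object needs no content
        "WebsiteRedirectLocation": long_url,   # S3 setting that turns the object into a redirect
    }


def publish_redirect(long_url: str, s3_client=None) -> str:
    """Upload the redirect object and return its key (the short path to click)."""
    if s3_client is None:
        import boto3  # third-party; imported lazily so the helpers above work without it
        s3_client = boto3.client("s3")
    params = redirect_object_params(long_url)
    s3_client.put_object(**params)
    return params["Key"]


def collision_probability(count: float, random_bits: int = 122) -> float:
    """Birthday approximation: chance of at least one duplicate among `count` random IDs."""
    return -math.expm1(-(count * count) / (2.0 * 2.0 ** random_bits))


if __name__ == "__main__":
    ids_in_85_years = 1e9 * 85 * 365.25 * 86400  # a billion per second for 85 years
    print(f"{collision_probability(ids_in_85_years):.2f}")
```

Under that approximation, a billion identifiers per second for 85 years works out to roughly an even chance of a single collision, which is the spirit of the figure quoted above.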
But wait; I'm not done. That URL is potentially dangerous. If anyone else sees it, they can log in as me. I've gotten around this in a few ways. The S3 bucket that serves the redirect is fronted by a CloudFront endpoint. SSL is required: at no point is the URL going to be communicated in cleartext. Further, I wrote a Lambda@Edge function that's attached to that CloudFront distribution. When it receives the request, it not only returns the redirect, it also deletes the S3 object that's referenced. So while this does sometimes mean that the link is cached in my local browser, I've disabled caching in CloudFront. So now when I do race tests between two computers, the first one will resolve the link and log me into the console; the second computer, hit at roughly the same time, does not. Lastly, should the Lambda function ever fail, there is a periodic reaping job that removes all the redirects from that bucket, taking care to bypass the index file, which exists solely to prevent people from seeing the redirect objects that are currently there.
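The one-time-link behavior and the reaping job can be sketched roughly as follows. This is not the function from the episode, which is not shown; it is one plausible shape, assuming the redirect target is stored as the object's `WebsiteRedirectLocation` and that the handler runs on a CloudFront request trigger. The bucket name, the index file name, and the helper names are hypothetical. Lambda@Edge does not support environment variables, so the sketch hard-codes its settings.

```python
BUCKET = "example-redirect-bucket"  # hypothetical; hard-coded because Lambda@Edge has no env vars
INDEX_KEY = "index.html"            # hypothetical name for the index file

NOT_FOUND = {"status": "404", "statusDescription": "Not Found"}


def consume_redirect(s3_client, key: str, bucket: str = BUCKET) -> dict:
    """Return a one-time redirect response for `key`, deleting the object as it is used."""
    if not key or key == INDEX_KEY:
        return NOT_FOUND
    try:
        head = s3_client.head_object(Bucket=bucket, Key=key)
    except Exception:
        # Broad on purpose for the sketch: a missing object surfaces as a client error.
        return NOT_FOUND
    target = head.get("WebsiteRedirectLocation")
    if not target:
        return NOT_FOUND
    s3_client.delete_object(Bucket=bucket, Key=key)  # the link is now spent
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {
            "location": [{"key": "Location", "value": target}],
            "cache-control": [{"key": "Cache-Control", "value": "no-store"}],
        },
    }


def handler(event, context):
    """Lambda@Edge entry point: map the request path to an S3 key and consume it."""
    import boto3  # provided by the Lambda Python runtime
    request = event["Records"][0]["cf"]["request"]
    return consume_redirect(boto3.client("s3"), request["uri"].lstrip("/"))


def reap(s3_client, bucket: str = BUCKET, keep=(INDEX_KEY,)) -> list:
    """Periodic cleanup: delete every object except the ones listed in `keep`."""
    deleted = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["Key"] not in keep:
                s3_client.delete_object(Bucket=bucket, Key=obj["Key"])
                deleted.append(obj["Key"])
    return deleted
```

One caveat on the sketch itself: the lookup and the delete are two separate calls, so it does not by itself guarantee that two requests arriving at the same instant can never both receive the redirect.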
A few security friends of mine took a look at this and all came to the same conclusion: this is, A) ridiculous, B) overbuilt and, C) it works. That beautiful trifecta of a combination made it a perfect topic to discuss in this week's episode of the AWS Morning Brief: Whiteboard Confessional.
I am Cloud Economist Corey Quinn. This is the AWS Morning Brief. And if you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. Whereas if you've hated it and found it appalling, leave a five-star review on Apple Podcasts anyway, and tell me what I should be using instead.
Thank you for joining us on Whiteboard Confessional. If you have terrifying ideas, please reach out to me on Twitter at @quinnypig and let me know what I should talk about next time.
Announcer: This has been a HumblePod production. Stay humble.