Episode Summary
Join Jesse, Amy, and Tim as they set the record straight on what concoctions can actually be called chili and which cannot, how you’re out of luck if you’re trying to predict the cost of your architecture proactively, why you should turn on Cost Explorer Cost Anomaly Detection, how no one is required to run their applications in AWS, how it could be cheaper to host your apps on bare metal in certain scenarios, cost categorization and how to measure usage costs vs. base costs, why you need to leave things on as you onboard new architectures and applications, why you shouldn’t maintain something that’s not the core of what you do, and more.
Episode Show Notes & Transcript
Transcript
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Jesse: Hello, and welcome to AWS Morning Brief: Fridays From the Field. I’m Jesse DeRose.
Amy: I’m Amy Negrette.
Tim: And I’m Tim Banks.
Jesse: This is the podcast within a podcast where we talk about all the ways we’ve seen AWS used and abused in the wild, with a healthy dose of complaining about AWS for good measure. Today is a very special episode for two reasons. First, we’re going to be talking about all the things that you want to talk about. That’s right, it’s time for another Q&A session. Get hyped.
Amy: And second, as is Duckbill’s customary hazing ritual, we’re putting new Duckbill Group Cloud Economist Tim Banks through the wringer to answer some of your pressing questions about cloud costs and AWS. And he has pretty much the best hobbies.
Tim: [laugh].
Jesse: Absolutely.
Tim: You know, I choke people for fun.
Jesse: [laugh]. I don’t even know where to begin with that. I—you know—
Amy: It’s the best LinkedIn bio, that’s [laugh] where you begin with that.
Tim: Yeah, I will change it right after this, I promise. But no, I think it’s funny, we were talking about Jiu-Jitsu as a hobby, but my other hobby is I like to cook a lot, and I’m an avid, avid chili purist. And we were in a meeting earlier and Amy mentioned something about a bowl of sweet chili. And, dear listeners, let me tell you, I was aghast.
Amy: It’s more of a sweet stewed meat than it is, like, some kind of, like, meat candy. It is not a meat candy. Filipinos make very sweet stews because we cannot handle chili, and honestly, we shouldn’t be able to handle anything that’s caramelized or has sugar in it, but we try to anyway. [laugh].
Tim: But this sounds interesting, but I don’t know that I would categorize it as chili, especially if it has beans in it.
Jesse: It has beans. We put beans in everything.
Tim: Oh, then it can’t be chili.
Jesse: Are you a purist that your chili cannot have beans in it?
Tim: Well, no. Chili doesn’t have beans in it.
Amy: Filipino food has beans in it. Our desserts have beans in it. [laugh].
Jesse: We are going to pivot, we’re going to hard pivot this episode to just talk about the basis of what a chili recipe consists of. Sorry, listeners, no cost discussions today.
Tim: Well, I mean, it’s a short list: a chili contains meat and it contains heat.
Jesse: [laugh].
Tim: That’s it. No tomatoes, no beans, no corn, or spaghetti, or whatever people put in it.
Amy: Okay, obviously the solution is that we do some kind of cook-off where Tim and Pete cook for everybody, and we pull in Pete as a special quote-unquote, outside consultant, and I just eat a lot of food, and I’m cool with that. [laugh].
Jesse: I agree to this.
Tim: Pete is afraid of me, so I’m pretty sure he’s going to pick my chili.
Jesse: [laugh].
Amy: I could see him doing that. But also, I just like eating food.
Tim: No, no, it’s great. We should definitely do a chili cook-off. But yeah, I am willing to entertain any questions about, you know, chili, and I’m willing to defend my stance with facts and the truth. So…
Amy: If you have some meat—or [sheet 00:03:19]—related questions, please get into our DMs on Twitter.
Jesse: [laugh]. All right. Well, thank you to everyone who submitted their listener questions. We’ve picked a few that we would like to talk about here today. I will kick us off with the first question.
This first question says, “Long-time listener first-time caller. As a solo developer, I’m really interested in using some of AWS’s services. Recently, I came across AWS’s Copilot, and it looks like a potentially great solution for deployment of a basic architecture for a SaaS-type product that I’m developing. I’m concerned that messing around with Copilot might lead to an accidental large bill that I can’t afford as a solo dev. So, I was wondering, do you have a particular [bizing 00:04:04] availability approach when dealing with a new AWS service, ideally, specific steps or places to start with tracking billing? And then specifically for Copilot, how could I set it up so it can trip off billing alarms if my setup goes over a certain threshold? Is there a way to keep track of cost from the beginning?”
Tim: AWS has some basic billing alerts in there. They are always going to be kind of reactive.
Jesse: Yes.
Amy: They can detect some trends, but as a solo developer, what you’re going to get is a notification that the previous day’s spending was pretty high. And then you’ll be able to trend it out over time that way. As far as asking if there’s a proactive way to predict what the cost of your particular architecture is going to be, the easy answer is going to be no. At least not one that isn’t going to be cost-prohibitive for a solo developer to purchase.
Jesse: Yeah, I definitely recommend setting up those reactive billing alerts. They’re not going to solve all of your use cases here, but they’re definitely better than nothing. And the one that I’m definitely thinking of, and that I would recommend turning on, is the Cost Explorer Cost Anomaly Detector, because that actually looks at your spend based on a specific service, a specific AWS cost category, or a specific user-defined cost allocation tag. And it’ll tell you if there is a spike in spend. Now, if your spend is just continuing to grow steadily, Cost Anomaly Detector isn’t going to give you all the information you want.
It’s only going to look for those anomalous spikes where all of a sudden, you turned something on that you meant to turn off, and left it on. But it’s still something that’s going to start giving you some feedback and information over time that may help you keep an eye on your billing usage and your spend.
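For listeners who want to set this up outside the console, the anomaly monitor and alert subscription Jesse describes can also be created through the Cost Explorer API. Here’s a minimal sketch of the request payloads; the monitor name, email address, and dollar threshold are placeholders, and the actual boto3 calls are commented out because they require AWS credentials:

```python
# Sketch of the payloads for AWS Cost Anomaly Detection via the Cost
# Explorer API. The boto3 calls are commented out so this runs offline;
# the names, email address, and dollar threshold are placeholders.

def build_anomaly_monitor(name="per-service-monitor"):
    # A DIMENSIONAL monitor on SERVICE watches each AWS service's spend
    # separately, which matches the per-service spikes discussed above.
    return {
        "MonitorName": name,
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }


def build_anomaly_subscription(monitor_arn, email, threshold=100.0):
    # A daily email digest for anomalies whose total cost impact
    # exceeds `threshold` US dollars.
    return {
        "SubscriptionName": "daily-anomaly-digest",
        "MonitorArnList": [monitor_arn],
        "Subscribers": [{"Type": "EMAIL", "Address": email}],
        "Threshold": threshold,
        "Frequency": "DAILY",
    }


# With credentials configured, the payloads would be sent like this:
# import boto3
# ce = boto3.client("ce")
# arn = ce.create_anomaly_monitor(
#     AnomalyMonitor=build_anomaly_monitor())["MonitorArn"]
# ce.create_anomaly_subscription(
#     AnomalySubscription=build_anomaly_subscription(arn, "you@example.com"))
```

A flat dollar threshold like this is enough to catch the “left something on” spikes discussed above; the API also supports richer threshold expressions if you need more nuance.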
Amy: Another thing we highly recommend is to have a thorough tagging strategy, especially if you’re using a service to deploy resources. Because you want to make sure that all of your resources, you know what they do and you know who they get charged to. And Copilot does allow you to do resource tagging within it, and then from there should be able to convert them to cost allocation tags so you can see them in your console.
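One side note on Amy’s point: tags on resources don’t show up in billing data until they’re activated as cost allocation tags, either in the Billing console or through the Cost Explorer API. A sketch of that activation payload follows; the `copilot-application` key is, to our understanding, one of the tags Copilot applies by default, but treat the exact key as an assumption to verify in your own account:

```python
# Sketch: activating user-defined tags as cost allocation tags via the
# Cost Explorer UpdateCostAllocationTagsStatus API. The tag key is an
# assumption; the boto3 call is commented out since it needs credentials.

def build_tag_activation(tag_keys):
    # Each entry flips one tag key from Inactive to Active in billing data.
    return [{"TagKey": key, "Status": "Active"} for key in tag_keys]


# import boto3
# ce = boto3.client("ce")
# ce.update_cost_allocation_tags_status(
#     CostAllocationTagsStatus=build_tag_activation(["copilot-application"]))
```

Note that a tag has to have appeared on at least one resource before it can be activated, and it can take a day or so before activated tags start showing up in Cost Explorer.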
Jesse: Awesome. Well, our next question is from Rob. Rob asks, “How do I stay HIPAA compliant, but keep my costs down? Do I really need VPC Flow Logs on? Could we talk in general about the security options in AWS and their cost impact? My security team wants everything on but it would cost us ten times our actual AWS bill.”
Rob, we have actually seen this from a number of clients. It is a tough conversation to have because the person in charge of the bill wants to make sure that spend is down, but security may need certain security measures in place, product may need certain measures in place for service level agreements or service level objectives, and there’s absolutely a need to find that balance between cost optimization and all of these compliance needs.
Tim: Yeah, I think it’s also really important to thoroughly understand what the compliance requirements are. Fairly certain for HIPAA that you may not have to have VPC Flow Logs specifically enabled. The language is something like, ‘logging of visitors to the site’ or something like that. So, you need to be very clear and concise about what you actually need, and remember, for compliance, typically it’s just a box check. It’s not going to be a how much or what percent; it’s going to be, “Do you have this or do you not?”
And so if the HIPAA compliance changes where you absolutely have to have VPC Flow Logging turned on, then there’s not going to be a way around that in order to maintain your compliance. But if the language is not specifically requiring that, then you don’t have to, and that’s going to become something you have to square with your security team. There are ways to do those kinds of logging on other things depending on what your application stack looks like, but that’s definitely a conversation you’re going to want to have, either with your security team, with your product architects, or maybe even outside or third-party consultant.
Jesse: Another thing to think about here is, how much is each of these features in AWS costing you? How much are these security regulations, the SLA architecture choices, how much are each of those things costing you in AWS? Because that is ultimately part of the conversation, too. You can go back to security, or product, or whoever and say, “I understand that this is a business requirement. This is how much it’s costing the business.”
And that doesn’t mean that they have to change it, but that is now additional information that everybody has to collaboratively decide, “Okay, is it worthwhile for us to have this restriction, have this compliance component at this cost?” And again, as Tim was mentioning, if it is something that needs to be set up for compliance purposes, for audit purposes, then there’s not really a lot you can do. It’s kind of a, I don’t want to say sunk cost, but it is a cost that you need to understand that is required for that feature. But if it’s not something that is required for audit purposes, if it’s not something that just needs to be, like, a checkbox, maybe there’s an opportunity here if the cost is so high that you can change the feature in a way that brings the cost down a little bit but still gives security, or product, or whoever else the reassurances that they need.
Tim: I think the other very important thing to remember is that you are not required to run your application in AWS.
Jesse: Yeah.
Tim: You can run it on-premises, you can run it at a different cloud provider. If it’s going to be cost-prohibitive to run at AWS and you can’t get the cost down to a manageable level through the normal cost reduction methods of EDPs or your pricing agreement, remember you can always put that on bare metal somewhere, and then you will be able to have the logging for free. Now, mind you, you’re going to have to spend money elsewhere to get that done, so you’re going to have to look and see what the overall cost is going to be. It may, in fact, be much less expensive to host that on metal, or at a different provider, than it would be at AWS.
Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.
Jesse: Our next question is from Trevor Shaffer. He says, “Loving these Fridays From the Field episodes and the costing”—thank you—“I’m in that world right now, so all of this hits home for me. One topic not covered with the cost categorization, which I’m tasked with, is how to separate base costs versus usage costs. Case in point, we’re driving towards cost metrics based on users, and prices go up as users go up. All of that makes sense, but there’s always that base load required to serve quote-unquote, ‘no users.’
“The ALB instance hours versus the LCU hours, minimum number of EC2 instances for high availability, things like that. Currently, you can’t tag those differently, so I think I’m just doomed here and my hopes will be dashed. For us, our base costs are about 25% of our bill. Looking for tricks on how to do this one well. You can get close with a lot of scripting and time, teasing out each item manually.” Trevor, you can, and I also think that is definitely going to be a pain point if you start scripting some of these things. That sounds like a lot of effort that may give you some useful information, but I don’t know if it’s going to give you all of the information that you want.
Tim: Well, it’s also a lot of effort, and it’s also room for error. It won’t take but a simple error in anything that you write for these costs to be calculated incorrectly. So, that’s something to consider as well: is it worth the overall cost of engineering time, and maintenance, and everything like that, to write these scripts? These are decisions that engineering groups have to make all the time. That said, I do think that this is, for me, one of the larger problems that you see with AWS billing: it is difficult to differentiate something that should be reasonably easy to differentiate.
If I get my cell phone bill, I know exactly how much it’s going to cost me to have the line, and then I can see exactly how much it’s going to cost me for the minutes. The base cost is very easily separated from the usage cost. It’s not always that way with AWS, and I do think that’s something that they could fix.
Jesse: Yeah, one thing that I’ve been thinking of is, I don’t want to just recommend turning things on and measuring, but I’m thinking about this from the same perspective that you would think about getting a baseline for any kind of monitoring service: as you turn on a metric or as you start introducing a new metric before you start building alerts for that metric, you need to let that metric run for a certain amount of time to see what the baseline number, usage amount, whatever, looks like before you can start setting alerts. I’m thinking about that same thing here. I know that’s a tougher thing to do when this is actually cost involved when it’s actually costing you money to leave something on and just watch what usage looks like over time, but that is something that will give you the closest idea of what base costs look like. And one of the things to think about, again, is if the base costs are unwieldy for you or not worthwhile for you in terms of the way the architecture is built, is there either a different way that you can build the architecture that is maybe more ephemeral that will make it cost less when there are no users active? Is there a different cloud provider that you can deploy these resources to that is going to ultimately cost you less when you have no users active?
Tim: I think too, though, that when you have these discussions with engineering teams and they’re looking at what their priorities are going to be and what the engineering cost is going to be, oftentimes, they’re going to want metrics on how much is this costing us—how much would it cost otherwise? What is our base cost, what’s our usage cost?—so that you can make a case and justify it with numbers. So, you may think that it is better to run this somewhere else or to re-architect your infrastructure around this, but you’re going to have to have some data to back it up. And if this is what you need to gather that data, then yeah, it is definitely a pain point.
Amy: I agree. I think this is one of those cases where—and I am also loath to just leave things on for the sake of it—but especially as you onboard new architectures and new applications, this should be done at that stage when you start standing things up and finalizing that architecture. Once you know the kind of architecture you want and you’re pushing things to production, find out what that baseline is, have it be part of that process, and have it be a cost of that process.

And finally, “As someone new to AWS and wanting to become a software DevOps insert-buzzword-here engineer”—I’m a buzzword engineer—“We’ve been creating projects in Amplify, Elastic Beanstalk, and other services. I keep the good ones alive and have done a pretty good job of killing things off when I don’t need it. What are your thoughts on free managed services in general when it comes to cost transparencies with less than five months left on my free year? Is it a bad idea to use them as someone who is just job hunting? I’m willing to spend a little per month, but don’t want to be here with a giant bill.”
So, chances are if you’re learning a new technology or a new service, and you’ve been pretty diligent about turning your services off, your bill is not going to rise that much higher unless you run into a pitfall where you get a big bill as a surprise. That said, there have been a lot of instances popping up, on Twitter especially, of people getting very large bills. If you’re not using your services and you’re not actively learning on them, I would just turn them off so you don’t forget later. We’ve also talked about this in our build versus buy discussion: the good thing about a managed service is that if you don’t need it anymore and you’re not learning or using it, you can just turn it off. And if you have less than half a year left on your first free year, there are plenty of services that have a free tier or a really cheap tier at the start, so if you want to go back and learn on them later, you still could.
Tim: I think too, Amy, it’s also important to reflect, at least for this person, that if they’re in an environment where they’re trying to learn something, and maintaining infrastructure is not the core of what they’re trying to learn, then I wouldn’t do it. The reason that these managed services exist is to allow engineering teams to be more focused on the things that they want to do as far as development, versus the things they have to do around infrastructure management. If you don’t have an operations team or an infrastructure team, then maintaining the infrastructure on your own can sometimes become unwieldy to the point that you’re not really even learning the thing you wanted to learn; now you’re learning how to manage Elasticsearch.
Amy: Yeah.
Jesse: Absolutely. I think that’s one of the most critical things to think about here. These managed services give you the opportunity to use all these services without managing the infrastructure overhead. And to me, there may be a little bit extra costs involved for that, but to me that cost is worth the freedom to not worry about managing the infrastructure, to be able to just spin up a cluster of something and play with it. And then when you’re done, obviously, make sure you turn it off, but you don’t have to worry about the infrastructure unless you’re specifically going to be looking for work where you do need to manage that infrastructure, and that’s a separate question entirely.
Amy: Yeah. I’m not an infrastructure engineer, so anytime I’m not using infrastructure, and I’m not using a service, I just—I make sure everything’s turned off. Deleting stacks is very cathartic for me, just letting everything—just watching it all float away into the sunset does a lot for me, just knowing that it’s not one more thing I’m going to have to watch over because it’s not a thing I like doing or want to do. So yeah, if that’s not what you want to do, then don’t leave them on and just clean up after yourself, I suppose. [laugh].
Tim: I’ll even say that even if you’re an infrastructure engineer, which is my background, that you can test your automation of building and all this, you know, building a cluster, deploying things like that, and then tear it down and get rid of it. You don’t have to leave it up forever. If you’re load testing an application, that’s a whole different thing, but that’s probably not what you’re doing if you’re concerned about the free tier costs. So yeah, if you’re learning Terraform, you can absolutely deploy a cluster or something and just tear it back out as soon as you’re done. If you’re learning how to manage whatever it is, build it, test it, make sure it runs, and then tear it back down.
Jesse: All righty, folks, that’s going to do it for us this week. If you’ve got questions you would like us to answer, please go to lastweekinaws.com/QA, fill out the form and we’d be happy to answer those on a future episode of the show. If you’ve enjoyed this podcast, please go to lastweekinaws.com/review and give it a five-star review on your podcast platform of choice, whereas if you hated this podcast, please go to lastweekinaws.com/review, give it a five-star rating on your podcast platform of choice and tell us whether you prefer sweet chili or spicy chili.
Announcer: This has been a HumblePod production. Stay humble.