Screaming in the Cloud Archives

Episode 34: Slack and the Safety Dance of Chaos Engineering

Screaming in the Cloud

10.30.2018

33 Minutes

In the early days, angry nerd corners on the Internet viewed Slack and some of its predecessors as, “Oh, it’s just IRC. Now, you pay someone for it.” Many fell into the trap of wondering what value such systems offered. The big differentiator? Slack is built as a collaborative business tool. Today, we’re talking to Holly Allen, who helped make government software better while serving as the director of engineering at 18F. Now, she’s a senior engineering manager at Slack, a collaborative chat program where you can do most of your work through a rich platform of integrations. Holly enjoys taking a weird set of skills that make a computer do things and convincing people who know how to make computers do things to do things.

Some of the highlights of the show include:
- Safety engineering brings chaos and resilience engineering, incident management, and post-mortem processes together for resiliency and reliability
- Slack strives to move really fast while being in complete control
- Slack is primarily on AWS, but is working on a multi-Cloud strategy because if AWS is down, Slack still needs to work
- Slack has a close relationship with AWS and is a collaborative company; it has immediate access to AWS staff anytime there’s a problem
- Slack uses Terraform and Chef, and is working to determine whether running its production workflows in Kubernetes would be worthwhile
- Disasterpiece Theater: Pose a real scenario that might happen and surmise what will happen; the goal is not to cause production issues, but to teach Slack employees
- Slack hires collaborative, empathetic people to create a collaborative environment where everyone works together toward a goal
- Slack was firmly in a centralized operations model, but is transforming toward development teams taking on more responsibility and service ownership
- Slack doesn’t encourage remote work because it’s not in a position to make that investment; day-to-day work happens in hallways and between desks
- Slack sees itself as an enterprise software company; an enterprise software company must have enterprise software reliability, stability, and processes
- Slack has thousands of servers, so events and disruptions happen more often; the system needs to respond, react, and repair itself without human intervention

Links: Holly Allen on Twitter, 18F, Slack, Freenode, IRC, HipChat, AWS, Kubernetes, Terraform, Chef, QCon, Datadog

Episode 33: The Worst Manager I Ever Had Spoke Only In Metaphor

10.23.2018

30 Minutes

If you’ve been doing DevOps for the past 10-20 years, you know things have really changed in the industry. There are no longer large pools of help desk support. People aren’t climbing around the data center, learning how to punch down cables and rack servers, and gradually working their way up. Now, entry-level DevOps jobs require about five years of experience. So, that’s where internships play a major role. But how can an internship program be set up for success? Where is the next generation of SREs and DevOps professionals coming from? Where do we find them? Today, we’re talking to Fatema Boxwala, who has been an intern at Rackspace, Yelp, and Facebook. She’s a computer science student at the University of Waterloo in Canada, where she’s involved with the Women in Computer Science Committee and the Computer Science Club. Occasionally, she teaches people about Python, Git, and systems administration.

Some of the highlights of the show include:
- Mentors made Fatema’s intern experience positive; they made site reliability and operations something she wanted to do
- Academic paths don’t tend to focus on fields such as SRE, and interns tend to come exclusively from specific schools
- Fatema’s school requires five internships to graduate and receive a degree; upper-year students are already very qualified professional software engineers
- Companies don’t have time to train and want to find someone with an exact skill set; instead of hiring someone, they spend months with an unfilled position
- Continuity Problem: You can’t train someone to be a systems administrator if you aren’t willing to give them certain privileges because of their inexperience
- Use a low-stakes environment to train, where mistakes can be made; most systems aren’t on a critical path, so don’t keep people from contributing
- If you have never broken production, either you’re lying or you’ve been in an environment that didn’t trust you to touch things that mattered
- An internship should mimic the kind of work that everyone else is doing; give interns responsibilities where their work has an impact
- Bad mentors lead to bad internships, often because the person in charge of your success doesn’t have the necessary skills; a mentor needs to be a good communicator and set expectations
- As the intern, ask about possible outcomes of the internship early on; mentors should be clear about expectations, feedback, and offers

Links: Fatema Boxwala, Fatema Boxwala on Twitter, Jackie Luo on Twitter, Julia Evans Zines on Twitter, SREcon MEA, Digital Ocean

Episode 32: Lambda School: A New Approach to “Hire Ed”

10.16.2018

26 Minutes

Are you interested in computer science? How would you like to go to school for free and learn what you need in just a few months? Then, check out Lambda School! Today, we’re talking to Ben Nelson, co-founder and CTO of Lambda School, a 30-week online immersive computer science academy. Lambda School has more than 500 students and takes a share of future earnings instead of having students take on traditional debt. So, it's free until students get a job.

Some of the highlights of the show include:
- Bootcamps were created to address engineering shortages and quickly move people into technical careers
- Lambda is not explicitly a bootcamp; its 30-week program gives students more instruction and more time to develop a portfolio
- Lambda also makes time to cover computer science fundamentals; it teaches C, Python, Django, and relational databases, not just JavaScript
- Employers appreciate the school’s in-depth and advanced approach, which results in repeat hires
- Lambda avoids the typical reputation of traditional for-profit educational institutions by being mission-driven and knowing its investors want ROI
- Lambda aligns its incentives with those of students; an income share agreement means the school doesn’t make money unless students are successful
- Lambda’s 7-month program is less of a risk for someone later in their career; some people don't have the capital to support their family while going to school for 4 years
- Lambda incentivizes healthy financial habits; after two years of repayment, students can put that money into retirement, savings, and investments
- 5 Tracks Now Offered by Lambda: iOS development, UX, full-stack web development, data science, and Android development
- Mastery-Based Progression System: When you're learning something sequentially, where knowledge builds, you don't move on until you’ve mastered it
- Lambda’s acceptance rate is around 5% and is based on who can keep up
- Lambda works with different partner companies to help them find qualified graduates, people they want to hire

Links: Lambda School, Ben Nelson on Twitter, Y Combinator, Wealthfront, Datadog

Episode 31: Hey Sam, wake up. It’s 3am, and time to solve a murder mystery!

10.09.2018

39 Minutes

Have you ever been on call as an IT person or otherwise? Been woken up at 3 a.m. to solve a problem? Did you have to go through log files or look at a dashboard to figure out what was going on? Did you think there has to be a better way to troubleshoot and solve problems? Today, we’re talking to Sam Bashton, who previously ran a premier consulting partner with Amazon Web Services (AWS). Recently, he started runbook.cloud, a tool built on top of serverless technology that helps people find and troubleshoot problems within their AWS environment.

Some of the highlights of the show include:
- Runbook.cloud applies machine learning (ML) to metrics to pinpoint issues and present users with a pre-written set of solutions
- Runbook.cloud looks at all potential problems that can be detected, in the context of how the infrastructure is being used, without being annoying and useless
- ML is used to do trend analysis and understand how a specific customer is using a service, such as a specific auto scaling group or Lambda functions
- Runbook.cloud takes aggregate data into account when raising alerts; if there’s a problem in a specific region with a specific service, the tool is careful to caveat it
- Various monitoring solutions are on the market; runbook.cloud is designed for a mass-market environment; it takes the metrics that AWS provides for free and makes it so you don’t need to worry about them
- Will runbook.cloud compete with or sell out to AWS? Amazon wants to build the underlying infrastructure and have other people use its APIs to build interfaces for users
- Runbook.cloud is sold through AWS Marketplace; it’s a subscription service where you pay by the hour and the charges are added to your AWS bill
- Amazon vs. Other Cloud Providers: Detecting problems across multiple Clouds takes a lot of work; it doesn’t make sense to branch out to other Clouds
- Runbook.cloud was built on top of serverless technology for business and financial reasons; it aligns outlay with costs because you pay for exactly what you use
- Analysis paralysis is real; it comes down to reducing the emotional toil of making decisions to as few decision points as possible
- Save money on Lambda: Instead of using several Lambda functions concurrently, put everything into a single function using Go
- AWS responds to customers to discover how they use its services; it comes down to what customers need

Links: Sam Bashton on Twitter, runbook.cloud, How We Massively Reduced Our AWS Lambda Bill with Go, AWS, AWS Lambda, Microsoft Clippy, Honeycomb, AWS X-Ray, Kubernetes, Simon Wardley, Go, Secrets Manager, DynamoDB, EFS, Digital Ocean

Episode 30: How to Compete with Amazon

10.02.2018

42 Minutes

Trying to figure out if Amazon Web Services (AWS) is right for you? Use the “quadrant of doom” to determine your answer. When designing a Cloud architecture, there are factors to consider. Any system you design exists for one reason: to support a business. Think about services and their features to make sure they’re right for your implementation. Today, we’re talking to Ernesto Marquez, owner and project director at Concurrency Labs. He helps startups launch and grow their applications on AWS. Ernesto especially enjoys building serverless architectures, automating everything, and helping customers cut their AWS costs.

Some of the highlights of the show include:
- Amazon’s level of discipline, process, and willingness to recognize issues and fix them changed the way Ernesto sees how a system should be operated
- Specialize in a specific AWS service, such as S3 or EC2, because there are principles that need to be applied when designing an architecture
- Sales and Delivery Cycle: Ernesto has a conversation with a client to discuss their different needs
- Vendor Lock-in: Customers are concerned about moving an application to a Cloud provider and how difficult it will be to move code and design variables elsewhere
- For every service you include in your architecture, evaluate the service within the context of a particular business case
- Identify failure scenarios, what can go wrong, and, if something goes wrong, how it’s going to be remediated
- CloudWatch detects events that are going to happen, and you can trigger responses to those events
- Partnering with Amazon: Companies are pushing a multi-Cloud narrative; partnering gains you visibility and credibility, but it’s not essential to be successful
- Can you compete against Amazon? It depends on which area you choose
- To compete against Amazon, expand product selection to grow, focus on user experience, and improve performance
- MiserBot: Don’t freak out about your bill, because Ernesto created a Slack chatbot to monitor your AWS costs

Links: Concurrency Labs, Ernesto Marquez on Twitter, How to Know if an AWS is Right for You, MiserBot, AWS, RDS, Lambda, Digital Ocean

Episode 29: Future of Serverless: A Toy that will Evolve and Offer Flexibility

09.25.2018

32 Minutes

Are you a blogger? Engineer? Web guru? What do you do? If you ask Yan Cui that question, be prepared for several different answers. Today, we’re talking to Yan, who is a principal engineer at DAZN. He also writes blog posts and develops courses. His insightful, engaging, and understandable content resonates with various audiences. And, he’s an AWS serverless hero!

Some of the highlights of the show include:
- Some people get tripped up because they don’t bring the microservice practices they learned into the new world of serverless, and they face many challenges as a result
- Educate others and share your knowledge; Yan does, as an AWS hero
- Chaos Engineering Meets Serverless: Figuring out what types of failures to practice for depends on what services you are using
- An environment predicated on specific behaviors may mean enumerating bad things that could happen, instead of building a resilient system that works as planned
- API Gateway: Confusing for users because it can do so many different things; the right thing to do in a particular context is not always clear
- Right now, serverless feels like a toy, but it is good enough to run production workflows; in the future, serverless will continue to evolve and offer more flexibility
- Serverless is used to build applications; DevOps/IoT teams and enterprises are adopting serverless because it makes solutions more cost-effective

Links: Yan Cui on Twitter, DAZN, Production-Ready Serverless, Theburningmonk.com, Applying Principles of Chaos Engineering to Serverless, AWS Heroes, re:Invent, Lambda, Amazon S3 Service Disruption, API Gateway, Ben Kehoe, Digital Ocean

Episode 28: Serverless as a Consulting Cash Register (now accepting Bitcoin!)

09.18.2018

32 Minutes

Is your company thinking about adopting serverless and running with it? Is there a profitable opportunity hidden in it? Ready to go on that journey? Today, we’re talking to Rowan Udell, who works for Versent, an Amazon Web Services (AWS) consulting partner in Australia. Versent focuses on specific practices, including helping customers with rapid migrations to the Cloud and going serverless.

Some of the highlights of the show include:
- Australia is seeing an increase in developers using serverless tools and services, and in serverless being used for operational purposes
- Serverless seems to be either a brilliant fit or not quite ready for prime time
- Misconceptions include keeping functions warm and setting up scheduled invocations
- Simon Wardley talked about how the flow of capital can be traced through an organization that has converted to serverless
- The concept of paying thousands of dollars up front for a server is going away
- Spend whatever you want, but be able to explain where the money is going (dev vs. prod); companies will re-evaluate how things get done
- Serverless is described as either an evolution or a revolution; it is transformative to a point
- Many shops wind up in a position where, when something breaks, they don’t have the experience to fix it; gain practical experience through sharing
- Seek developer feedback and perform testing, but know where and when to stop
- With serverless, you have little control over the environment; focus on the automated parts you do control
- Serverless Movement: People have opinions and want you to know them
- Understand the continuum of options for running your application in the Cloud, learn the pros and cons, and pick the right tool
- Reconciliation between serverless and containers will need to play out; changes will come at some point
- Blockchain + serverless + machine learning + Kubernetes + service mesh = raise an entire seed round

Links: Rowan Udell’s Blog, Rowan Udell on Twitter, Versent on Twitter, Lambda, Simon Wardley, Open Guide to AWS Slack Channel, Kubernetes, Aurora, Digital Ocean

Episode 27: What it Took for Google to Make Changes: Outages and Mean Tweets

09.11.2018

29 Minutes

Google Cloud Platform (GCP) turned off a customer that it thought was doing something out of bounds. This led to Internet outrage, and GCP tried to explain itself and prevent the problem from happening again. Today, we’re talking to Daniel Compton, an independent software consultant who focuses on Clojure and large-scale systems. He’s currently building Deps, a private Maven repository service. We pick Daniel’s brain about the GCP issue as a third-party observer, especially because he wrote a post called “Google Cloud Platform - The Good, Bad, and Ugly (It’s Mostly Good).”

Some of the highlights of the show include:
- Recommendations: Use enterprise billing, which costs thousands of dollars; add a phone number and an extra credit card to your Google account; get a support contract
- Google describing what happened and how it plans to prevent it in the future seemed reasonable, but why did it take this for Google to make changes?
- GCP has inherited cultural issues that don’t work in the enterprise market; GCP is painfully learning that it needs to change some things
- Google tends to focus on writing services aimed purely at developers; it struggles to put itself in the shoes of corporate enterprise IT shops
- GCP has a few key design decisions that set it apart from AWS; it focuses on global resources rather than regional resources
- When picking a provider, is there a clear winner, AWS or GCP? Consider the company’s values, internal capabilities, resources needed, and workload
- GCP’s tendency to end services that people are still using, vs. AWS never ending a service, tends to push people in one direction
- GCP has built a smaller set of services that are easy to get started with, while AWS has an overwhelming number of services
- Different Philosophies: Not every developer writes software as if they work at Google; AWS meets customers where they are, fixes issues, and drops prices
- GCP understands where it needs to catch up and continues to iterate and release features

Links: Daniel Compton, Daniel Compton on Twitter, “Google Cloud Platform - The Good, Bad, and Ugly (It’s Mostly Good)”, Deps, The REPL, Postmortem for GCP Load Balancer Outage, AWS Athena, Digital Ocean

Episode 26: I’m not a data scientist, but I work for an AI/ML startup building on Serverless Containers

09.04.2018

25 Minutes

Do you deal with a lot of data? Do you need to analyze and interpret data? Veritone’s platform is designed to ingest audio, video, and other data through batch processes, process the media, and attach output, such as transcripts or facial recognition data. Today, we’re talking to Christopher Stobie, a DevOps professional with more than seven years of experience building and managing applications. Currently, he is the director of site reliability engineering at Veritone in Costa Mesa, Calif. Veritone positions itself as a provider of artificial intelligence (AI) tools designed to help other companies analyze and organize unstructured data. Previously, Christopher was a technical account manager (TAM) at Amazon Web Services (AWS), lead DevOps engineer at Clear Capital, lead DevOps engineer at ESI, and Cloud consultant at Credera, and he served in Patriot/THAAD Missile Fire Control in the U.S. Army. Besides staying busy with DevOps and missiles, he enjoys playing racquetball in short shorts and drinking good (not great) wine.

Some of the highlights of the show include:
- Various problems can be solved with AI; companies are spending time and money on AI
- AI can automate tasks that require too much intelligence to handle with simple software
- Machine learning (ML) models are applicable for many purposes; real people with real problems who are not academics can use ML
- Fargate offers instant-on Docker containers as a service; it handles infrastructure scaling, but involves management expense
- Instant-on works with numerous containers, but there will probably come a time when it no longer delivers reasonable fleet performance on demand
- The decision to use Kafka was based on the workload: stream-based ingestion
- Veritone writes code that tries to avoid provider lock-in; it wants to make each integration as decoupled as possible
- People spend too much time and energy being agnostic to their technology and giving up its benefits
- If you dream about seeing your name up in lights, Christopher describes the process of writing a post for AWS
- Pain Points: The newness of Fargate and unfamiliarity with it; limit issues; inability to handle large containers

Links: Veritone, Christopher Stobie on LinkedIn, Building Real Time AI with AWS Fargate, SageMaker, Fargate, Docker, Kafka, Digital Ocean

Episode 25: Kubernetes is Named After the Greek God of Spending Money on Cloud Services

08.28.2018

29 Minutes

Google builds platforms for developers and strives to make them happy. There's a team at Google that wakes up every day to make sure developers have great outcomes with its services and products. The team listens to developers and brings their feedback back into Google. It also spends a lot of time around the world talking to and connecting with developer communities and showing what's being worked on. It doesn't do the team any good to build developer products that developers don’t love. Today, we’re talking to Adam Seligman, vice president of developer relations at Google, where he is responsible for the global developer community across product areas. He is the ears and voice for customers.

Some of the highlights of the show include:
- Google tackles everything in an open source way: shipping, feedback, iteration, and building communities
- Storytelling, the Tale of Kubernetes: In a short period of time, it has gone from an open source project that Google spearheaded to something sweeping the industry
- The rise of containerization inside the Linux kernel is an opportunity for Google to share container management technology and philosophy with the world
- Google Next: Knative, a journey toward lighter-weight serverless-based applications; and GKE On-Prem, for customers and teams running Kubernetes on premises
- Innovation: When logging into the GCP console, you can terminate all billable resources assigned to a project and access a tab for building by hand
- GCP's console development strategy includes hard work on documentation, making things easy to use, and building thoughtfulness into how services are grouped
- Google is about design goals, tradeoffs, and metrics; it’s about hyperscale and a global footprint of requirements, as well as supporting every developer
- Misconception 1: Google builds hyperscale, read-centric, user-partitioned apps and doesn't build globally consistent, data-driven apps
- Misconception 2: Software engineers at the top Internet companies write amazing code instantly
- 12-Factor App: Opinions on how to architect apps; developers should have choices, but some of the cognitive and operational load should be taken away
- Businesses are running core workloads on Google, which had to put atomic clocks in data centers and build private fiber networking to make it all work
- There is a perception that Google focuses on new things rather than supporting what's been released; the industry is on a treadmill, chasing shiny things and creating noise
- The industry needs to be welcoming and inclusive; there is demand for software, apps, and innovation, but the number of developers remains limited because not everyone is included
- Human vs. Technology: More investment and easier onboarding with technology, and an obligation to build local communities
- Goal: Take database complexity and start removing it for lots of use cases, and simplify how users deal with replication, sharding, and consistency issues
- DevFest: Google has about 800 Google Developer Groups that do a lot to build local communities and write code together

Links: Adam Seligman on Twitter, 12-Factor App, I Want to Build a World Spanning Search Engine on Top of GCP, DevFest, Kubernetes, Docker, Heroku, Google Next, Google Reader

Episode 24: Serverless Observability via the bill is terrible

08.21.2018

40 Minutes

What is serverless? What do people want it to be? Serverless is when you write your software, deploy it to a Cloud vendor that will scale and run it, and receive a pay-for-use bill. It’s not necessarily a function as a service, but a concept. Today, we’re talking to Nitzan Shapira, co-founder and CEO of Epsagon, which brings observability to serverless Cloud applications by using distributed tracing and artificial intelligence (AI) technologies. He is a software engineer with experience in software development, cyber security, reverse engineering, and machine learning.

Some of the highlights of the show include:
- The modern renaissance of “functions as a service” compared to its past history; it is as abstracted as it can be, which means almost no constraints
- If you write your own software, ship it, and deploy it, it counts as serverless
- Some treat serverless as an event-driven architecture where code swings into action
- When being strategic about making it more efficient, plan and develop an application with specific and complicated functioning
- Epsagon is a global observer of what the industry is doing and how it is implementing serverless as it evolves
- Trends and use cases include focusing on serverless first instead of the Cloud
- Economic Argument: It's less expensive than running things all the time and offers the ability to trace capital flow, but be cautious about unpredictable cost
- Use the bill to determine how much performance and flow time has been spent
- Companies seem to be trying to support every vendor’s serverless offering; when it comes to serverless, AWS Lambda appears to be used most often
- It's not easy to move from one provider to another; on-premises serverless misses the point
- People starting with AWS Lambda need familiarity with other services, which can be a reasonable but difficult barrier that’s worth the effort
- Managing serverless applications may have to be done through a third party
- A systemic view of how applications work focuses on the overall health of a system, not individual functions
- Epsagon is headquartered in Israel, along with other emerging serverless startups; Israeli culture fuels innovation

Links: Epsagon, Email Nitzan Shapira, Nitzan Shapira on Twitter, Heroku, Google App Engine, AWS Elastic Beanstalk, Lambda, Amazon CloudWatch, AWS X-Ray, Simon Wardley, Charity Majors, Start-Up Nation, Digital Ocean

Episode 23: Most Likely to be Misunderstood: The Myth of Cloud Agnosticism

08.09.2018

36 Minutes

It is easy to pick apart the general premise that Cloud agnosticism is a myth. What about reasonable use cases? Well, generally, when you have a workload that you want to put on multiple Cloud providers, it is a bad idea. It’s difficult to build and maintain. Providers change, some more than others, and working with them becomes more complex. Yet, Cloud providers rarely disappoint you enough to make you hurry to another provider. Today, we’re talking to Jay Gordon, Cloud developer advocate for MongoDB, about databases, distribution of databases, and multi-Cloud strategies. MongoDB is a good option for people who want to build applications faster without doing a lot of infrastructure work.

Some of the highlights of the show include:
- It is easier to consider distributed data reliable and available than to consider it unreliable and unavailable
- People spend time buying an option that doesn’t work, at the cost of feature velocity
- If a Cloud provider goes down, is it the end of the world? The Cloud offers greater flexibility, but no matter what, there should be a secondary option when a critical path comes to a breaking point
- A hand-off from one provider to another is more likely to cause an outage than a multi-region failure within a single provider
- Exclusion of Cloud Agnostic Tooling: The more we create tools that do the same thing regardless of provider, the more agnosticism there will be among implementers
- It's workload-dependent; data gravity dictates choices, and bandwidth isn’t free
- Certain services are only available on one Cloud due to licensing, but tools can help with migration
- Major service providers handle persistent parts of the architecture, and other companies offer database services and tools for those providers
- Cost may or may not be a factor in why businesses stay with one Cloud instead of going multi-Cloud
- How much RPO and RTO play into a multi-Cloud decision
- Selecting a database or data store when building; consider security and encryption

Links: Jay Gordon on Twitter, MongoDB, The Myth of Cloud Agnosticism, Heresy in the Church of Docker, Kubernetes, Amazon Secrets Manager, JSON, Digital Ocean

Insightful conversations. Less snark.