The Magic of Tailscale with Avery Pennarun

Episode Summary

Tailscale CEO Avery Pennarun joins Corey today with some exciting news: the company has just raised $100 million in a Series B, a significant accomplishment. Given Tailscale’s ease of use and the general wizardry that makes it work, this is excellent news for all of us! Corey has been using Tailscale for a while, and it has been transformative for how he uses these kinds of tools. He can’t stop raving about how useful it is, but it is hard to explain to folks. Avery clears up any confusion and explains what Tailscale is and how it works. He discusses how Tailscale connects your devices, the visibility it provides within your network, and how your whole team can use it. Avery also goes into detail on Tailscale’s offerings, breaks down some of the technical aspects of how it works, and more!

Episode Show Notes & Transcript

About Avery
wvdial, bup, sshuttle, netselect, popularity-contest, redo, gfblip, GFiber, and now @Tailscale doing WireGuard mesh. Top search result for "epic treatise."


Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.


Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I’m going to just guess that it’s awful because it’s always awful. No one loves their deployment process. What if launching new features didn’t require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren’t what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.


Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it’s spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I’ve been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They’re exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It’s the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn’t limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I’ve ever spoken to. Let’s also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It’s an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you’re hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That’s R-E-V-E-L-O dot I-O slash screaming.


Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Generally, at the start of these shows, I mention something about money. When I have a promoted guest, which means that they are sponsoring this episode, I talk about that. This is not that moment. There’s no money changing hands here.


And in fact, I’m about to talk about a product that I am a huge fan of, but I’m, also as of this recording, not paying for. So, one might think I’m the product, but no. Let’s actually start by talking about money. My guest today is Avery Pennarun, the CEO of Tailscale, and as of today, being the day that this goes out, you folks have just raised $100 million in a Series B. First, thank you for joining me, followed immediately by congratulations.


Avery: It’s great to be here, and thank you. It’s an exciting announcement that I hope we don’t end up spending too much time talking about because money is a lot more boring than technology. But yeah, we are very happy, both to be here and to be making the announcement.


Corey: Yeah. CRV and Insight Partners are the lead investors on the round. And it’s great to see because I’ve been using Tailscale for a while now. And it is a transformative experience for the way that I think about these things. A while back, I wrote a Lambda layer that lets Lambda functions take advantage of it, but in fairness, I did write it, so anyone looking at that should—“Haha, that’s why you’re not a developer full-time. You’re bad at it.” Yes, I am.


But I can’t stop raving about how useful Tailscale is, with the counterpoint that it’s also very difficult to explain to people who are not—at least in my experience—broken in a very particular way, as I am. What is Tailscale? And what does it do?


Avery: Right. Well, I mean, first of all, one of the things I really like about Tailscale and what we built is that, you know, even if you’re not a super great developer—like you just described yourself—you can get excited about it, you can use it for things, you can build on top of it, and contribute back without having to understand every single little detail of what it does, right? Tailscale is something that a lot of people get excited about without having to know how it works; they just know what it gives them, right? The answer to what Tailscale is, is sort of… it can be hard to explain to people who don’t know about the kinds of problems that it solves, but the super short answer is it connects all of your devices and virtual machines and containers to each other, wherever they are, without going through an intermediary, right? So, it minimizes latency and it maximizes throughput, and it minimizes pain. And it sounds like that should be hard, but you can get it all done in, like, five minutes.


Corey: I have been using it for a while now. Originally, I was using it and federating through it, I believe, via Google. I rebuilt and tore down the entire network in about five minutes, and instead started federating through GitHub. Nowadays, you apparently changed your position on that identity and you use third-party SSO sources, as well as retaining user information and login stuff yourselves, which is just, it’s almost starved for choice, on some level. But I am such a fan of the product that I hope you’ll forgive me if I talk for about a minute or so on how I use it and my experience of it.


Avery: Go for it.


Corey: So, I wind up firing up Tailscale, and I have a network that from any of my devices, I can talk to any other. I have a couple of EC2 machines hanging out in AWS, I have a Raspberry Pi that I use as a DNS server sitting in the other room, I have my iPad, I have my iPhone, I have my laptop, I have my desktop, I have a VM sitting over in Google Cloud, I have a different VM sitting over in Oracle Cloud. And all of these things can talk to each other directly over a secured network. I can override DNS and talk to these things just by the machine name, I can talk to them via the address that winds up being passed out to them through this. It is transformative. It works on IPv4 and IPv6; if I’m on a network without IPv6 access and I use Tailscale, suddenly I can use it.


I can egress from almost any other node on this network. And adding a new device to this is effectively opening a link in a browser on either that device or a different one, clicking approve once I log in, and it’s done. That is my experience of it, so far. Is that directionally correct as far as how you think about the product? Because again, I use DNS TXT records as a database for God’s sake. I am probably not the world’s foremost technical authority on the proper use of things.


Avery: Right. Yeah. I mean, that’s a good description of what it does. I think it actually—it’s weird, right? It’s hard to get across in words just how simple it is, right?


That one-minute description used a bunch of technical-sounding terminology that probably the listeners to your podcast will understand. But, like, the average tech person doesn’t need to know any of those things in order to use Tailscale, right? You download it from the app store on your phone and your laptop. And you install Tailscale on both from the App Store. You log into your Google account or your GitHub account, and that’s it. Those two devices are tied together in time and space; they can see each other. You can access a web server that you’re running on your laptop from your phone without doing anything else, right?


And then you can start a VM in AWS and you load Tailscale in there, and now that’s part of your network. And so, there’s—you don’t need to know what IPv4 and IPv6 even are. You don’t need to know what DNS even is. It just, you know, the magic sort of comes together. We do a ton of stuff behind the scenes to make that magic work. But it’s this—one thing that one customer said to us one time is, like, “It makes the internet work the way you thought the internet worked until you learned how the internet worked.” If that makes sense.


Corey: Right. It basically works on duct tape and toothpicks all spit together, and it’s amazing that it works at all. I mean, this is going to sound relatively banal, but the way that I’ve used Tailscale the most is on my phone or on my iPad or on my Mac. I will connect to the Tailscale network by default, and when that is done, it passes out my Pi-Hole’s IP address as the custom DNS server for the entire network. So, I don’t see a whole bunch of ads, not just in browser, but in apps and the rest.


And every once in a while when something is broken because an ad server is apparently critical to something, great, I turn off the VPN on that device, use the natural stuff. My experience of the internet gets worse as a result and the thing starts working again, then I turn it back on. It is more or less the thing that I use as a very strange-looking ad blocker, in some respects, that I can toggle on and off with the click of a button. But it’s magic, it is effectively magic. From the device side, it’s open up an app and toggle a switch, or it is grab from the menu bar on a Mac, there’s an application that runs and just click the connect button or the disconnect button.


There is no MFA every time you connect. There is no type in a username and password. There is no lengthy handshake. I hit connect and it is connected by the time I have moved the mouse back from the menu bar to the application I was working in. Whenever I show this to someone who uses a corporate VPN, they don’t believe me.


Avery: Right. Yeah, exactly. It’s hard to believe. It's like, “Hey, did anything actually happen here?” Because we removed you know, for example, it doesn’t by default catch all your traffic, it only catches the traffic to your private network, so it’s safe to leave it on all the time because it’s not interfering with what you’re doing.


What you’re describing is using Pi-Hole, which is a Raspberry Pi-based DNS server that is an ad blocker. Most people using Pi-Hole have one at home, so when they’re at home they get ads blocked, but when they leave home they don’t get their ads blocked. If you add Tailscale to that, you can use your Pi-Hole even when you’re not at home, and it sort of makes it that much more useful. I think an important difference from, say, other services that you can use as an ad blocker or a privacy VPN is that we never see your traffic, right? Tailscale creates a private network between you and all your personal devices, and that private network is private even from us, right? We help you connect the devices to each other, but when your traffic goes to Pi-Hole, it’s your Pi-Hole. It’s not our ad blocker. It’s your ad blocker, right, so we never see what traffic you’re going to, we never see what DNS names you’re looking up because it was just never made available to us, right?
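As a rough illustration of the point that only traffic bound for the private network is captured by default, here is a minimal Python sketch of that routing decision. It assumes device addresses come from the 100.64.0.0/10 carrier-grade NAT range, which the conversation itself does not mention, and `goes_through_tailnet` is a hypothetical helper name, not part of any Tailscale tool.

```python
import ipaddress

# Assumption: private-network device addresses are drawn from the
# 100.64.0.0/10 carrier-grade NAT range. The transcript does not state this.
TAILNET_RANGE = ipaddress.ip_network("100.64.0.0/10")


def goes_through_tailnet(destination: str) -> bool:
    """Return True if traffic to `destination` would use the private network.

    Anything outside the private range keeps using the normal internet path,
    which is why leaving the client switched on does not disturb other traffic.
    """
    return ipaddress.ip_address(destination) in TAILNET_RANGE


if __name__ == "__main__":
    for addr in ("100.101.102.103", "8.8.8.8"):
        route = "private network" if goes_through_tailnet(addr) else "normal internet path"
        print(f"{addr} -> {route}")
```

The sketch only models the default behavior described in the conversation; routing all traffic through another node is a separate, opt-in choice.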


Corey: Right. But did you do—the level of visibility you have into my network is fascinating in a variety of different ways, and one of those ways is how limited it is. You know what devices I have, the last time they connected, the version of Tailscale they’re running, an IP address on it, and you also wind up seeing what services are advertised and available on those networks if I decide to enable that. Which is great for things like development; I’m going to be doing development in a local dev sense on an EC2 instance somewhere. And well, I don’t want to set up a tunnel with SSH to wind up having to proxy traffic over there just so I can wind up hitting some high port that I bound to, and I certainly don’t want to expose that to the general internet; that is a worst practice for all these things.


And Tailscale magically makes this go away. I haven’t done this in much depth yet with a variety of my team members, but when you start working on this with teams who are doing development work, someone can have something running on their laptop and just seamlessly share it with their colleagues. It’s transformative, especially in an area where very often that colleague is not sitting in the same room getting the greasy fingerprints on your laptop screen.


Avery: Yep. Yeah, exactly. So, you mentioned the services list which you have to specifically opt into, and the reason we did that is that, you know, the list of devices and hostnames and IP addresses, we have to collect because that’s how the service works, right? You send us the information about your devices, and then we send the public keys for those devices to the other devices. We can’t get out of collecting that, whereas the services list is purely an interesting add-on feature, and we decided that we didn’t want to collect that by default because it would make people nervous about their privacy.


So, if you want that feature, you click it on; if you don’t want it, don’t turn it on, you can still share services with people inside your network; they just need to know that those services exist. You send them the URL or whatever and it’ll work, but it doesn’t show up as a list of things that we can see in that case. But yeah, sharing stuff between your coworkers is definitely… is a major use case for Tailscale and dev and infrastructure teams in particular. Like, you can—designers, for example, run a test version of the website on their laptop, and then they say, “Hey, visit this URL on my laptop.” And you don’t have to be in the same office, you can both be sitting in different cafes in different cities. Tailscale will make it so that the connection between those two computers still works, even if they’re both behind firewalls, even if they’re both behind different NATs, and so on.
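The exchange Avery describes, where each device reports itself and receives the other devices' public keys, can be pictured with a toy in-memory registry. This is only a sketch under stated assumptions: `ToyCoordinator`, `register`, and `peers_for` are hypothetical names invented here, and nothing in it reflects Tailscale's actual API or wire protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for a coordination service (not Tailscale's API)."""

    # Maps device name -> public key. Private keys never reach the coordinator.
    public_keys: dict[str, str] = field(default_factory=dict)

    def register(self, device: str, public_key: str) -> None:
        """Record the public key a device reports for itself."""
        self.public_keys[device] = public_key

    def peers_for(self, device: str) -> dict[str, str]:
        """Return the public keys of every other registered device."""
        return {name: key for name, key in self.public_keys.items() if name != device}


if __name__ == "__main__":
    coordinator = ToyCoordinator()
    coordinator.register("laptop", "pubkey-laptop")
    coordinator.register("phone", "pubkey-phone")
    # The laptop learns the phone's public key and can then talk to it directly.
    print(coordinator.peers_for("laptop"))
```

The design point worth noticing is what the registry does not hold: no traffic, no private keys, and no services list unless someone opts in.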


Corey: One of the things that astounded me the most: I am reluctant to completely trust things that are new that touch the network. Early on in my career, I made network engineering mistake 101, which is making a change to the firewall in your data center without having another way in. Then comes the drive across town, or calling remote hands to get them to let you back in once you’ve locked yourself out. You folks are building these things at a pretty consistent clip; there are a lot of updates and releases across all of the platforms. And invariably, I find myself on some device a version behind or so, just because of the pace of innovation. “Oh, great. We’re updating the VPN client. Cool. So, I’m going to expect this thing to drop and I’m going to have to go in and jigger it to get it working again.”


That has never happened. I have finally given in to, I guess, the iron test of this, and I have closed SSH from the internet to most of these nodes. In fact, some of them sit—the Pi-Hole sitting at home, if you’re not on my home network, there is no outside way in without breaking in. It is absolutely one of those things that disappears into the background in a way that I was extraordinarily surprised to find.


Avery: Right. Well, that is something—I mean, I’m old and grumpy, I guess, is sort of the beginning part of all this, right? I’ve seen all this annoying stuff that happens with software. And, you know, and many of us, in fact, at Tailscale are old and grumpy, and we just didn’t want to repeat those same things. So, first of all, network stuff to an even stronger degree than virtually any other kind of product, if your network stops working, everything stops working, right, so it’s number one priority that Tailscale has to not mess up your network.


Because if it does, you instantly lose faith. There’s kind of like—Tailscale gives you this magical feeling when you first install it, but that feeling of magic goes away very quickly the first time it screws something up and you can’t connect when you really need to. So, we put a huge amount of work into making sure that you can connect when you really need to. We have a lot of automated tests. One of our policies that I think is almost unheard of is that we intend to never deprecate support for older versions of the Tailscale client.


And to this day, we’re about three years into Tailscale, we’ve never deprecated an old client that anybody is using. So eventually, people—though in fact hard to believe, but eventually, people do stop using some old versions, so those ones don’t work anymore, necessarily. But any version of Tailscale that is in use today is going to keep working as long as anybody is using it. We have a very, very, very strong backwards compatibility policy. Because the worst thing that I can imagine is having some Raspberry Pi sitting out in the void somewhere that I haven’t looked at for two years, that whoops, Tailscale broke it, and now I can’t connect to it, and now I have to go drive down there and fix it, right? It would be just insultingly terrible for that to happen.


And we just make sure that doesn’t happen. Another thing that people get excited about is, like, on a Debian system or whatever, if you’ve got the Debian package installed, you can do an apt-get upgrade. Tailscale upgrades and even your SSH session doesn’t drop. Every now and then people [comment and was like 00:14:13] —


Corey: That was the weirdest part. I was expecting it to go away or hang for a long period of time. And sure, I guess it might drop a packet or so, I’ve never bothered to look because it is so seamless.


Avery: Right. Yeah, exactly. It’s just, like, “Wait. Did anything even happen?” It’s like, “Yes”—


Corey: Right—


Avery: —“Something happened. We upgraded it out from underneath you.”


Corey: —my next thing is [crosstalk 00:14:28]—yeah, I grep Tailscale on the process table. Like, okay, is this just a stale thing that’s existing [unintelligible 00:14:34] to bounce it? No, it has just been started. It was so seamless under the hood that it was amazing. There is something that is—a lot of things have been very deeply right on this.


Something else that I think is worth pointing out is that if any company had the brainpower there to roll their own crypto, it would be you folks, but you don’t. You’re riding on top of WireGuard, an open-source project that does full-mesh VPNs with terrible user interfaces.


Avery: Yep. So, you know, I guess disclosure. Back in 1997 when I started my first startup, I was not smart enough to not roll my own crypto. And therefore the VPN I wrote at the time definitely had giant security holes. It was also not that popular, so nobody found them. But I, you know, eventually I found [crosstalk 00:15:21]—


Corey: “Except a bank, which I really shouldn’t disclose.” Kidding, I’m kidding. But yeah.


Avery: [laugh]. No, no, no. The bank never used that software. [laugh]. But yeah. Nowadays, I’ve been through a lot, and I… I would not describe myself as a security expert. Although people often describe me as a security expert. I don’t know what that means. But I am enough of an expert to know that I should not be rolling my own crypto. And the people who invented WireGuard, it’s one of the—I feel like I’m overstating things, but I’m not—it’s one of the biggest leaps forward in cryptography, in probably the history of computing. Now, it builds on a series of things that are part of the same leap forward, right? It’s built on the protocol that Signal uses called the Noise Protocol, right? Signal and Noise are built on the Ed25519 curve, made by—or popularized by—Dan Bernstein, who’s a major cryptographer in this area. Sometimes popular, sometimes—


Corey: Oh, djb.


Avery: —not popular. Yeah, exactly.


Corey: He also, near and dear to my heart, wrote djbdns, which was a well-known, widely deployed DNS server, by which I of course mean database. Please, continue.


Avery: Yep. [laugh]. I’ve been a huge fan of basically everything djb has ever made in the history of—


Corey: Oh, you’re a qmail person. I am on the postfix side of [unintelligible 00:16:37].


Avery: Yep. Well, my first startup back in 1997, we made Linux-based server appliances for small businesses. And we used qmail, we used djbdns, we used a couple of other djb products. And you know, for the history of that product—you know, leaving aside my VPN that was a security hole—the djb stuff never had a single problem. That company was eventually acquired by IBM.


One of the first things IBM did is, like, “Whoa, djb has a super-weird software license. We can’t be doing this. Let’s replace it with software that has a decent license.” So, they dropped out djbdns and started using BIND. Within a week, there was a security hole in BIND that affected all of these appliances that they now controlled, right?


So, djb is a very big-brained, super genius in security, whatever you might think of his personality. And his work was sort of the basis for this revolution in cryptography that WireGuard has brought to the networking world. And it’s hard to overstate. Just, like, the number of lines of code: there’s something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case.


And that made it something really powerful to build on top of. Like, it’s super hard for somebody like me to screw up the security of a WireGuard deployment, where it’s very easy to screw up the security of an IPsec deployment.


Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free that’s snark.cloud/oci-free.


Corey: I just want to call something out as well, that when I say that you folks definitely have the intellectual firepower to roll your own crypto should you choose to do so, but you chose not to, if anything, I’m understating it. To be clear, one of the blog posts you had somewhat recently out was how you are maintaining what is effectively your own fork of the Go programming language. Which is one of those things when someone hears that it’s like, “I’m sorry, can you say that again? Because I am almost certain I misunderstood something.” What is the high-level version of that?


Avery: Well, there’s, I think, two important points there. One of them is that yes, we did fork the Go programming language; it’s supposed to be a temporary fork because it allows us to do some experiments with the Go back-end. And the primary reason we were able to do that is because we employ a couple of people who used to be on the core Go team. And that was not because we went out looking for people who used to be on the core Go team, that’s just how it worked out. But because we do, it’s easier for them to fork Go than it would be for the average person, and in many ways, it’s easier for them to get their job done by just continuing to work on the codebase they’ve already worked on.


But the second point is actually, as compilers go, the Go compiler is probably the very easiest one I’ve ever seen to be able to fork and edit. Like it’s super-clear code, you’re just editing Go code, which is already pretty easy. But they really put a ton of work into making it readable and understandable. So, like, average people actually can fork the Go compiler and not be completely bamboozled by how difficult everything is, right? Compared to, like, GCC where just building the thing is something that takes you weeks to learn how to do, right, Go is just, like, you run this script and build your compiler [unintelligible 00:19:35]—


Corey: Yeah. Let me clear this quarter on my schedule so I can go ahead and do that. Yeah, no, thank you.


Avery: Yeah. I’ve built copies of GCC and it’s absolutely nightmarish, right? And built people’s forks of GCC for special embedded processors and stuff. And this is, like, a f—this is a career that you can specialize in, building GCC, right? There are people that do this, right? And the Go compiler, it’s really—


Corey: Well, it’s 40 years of load-bearing technical debt.


Avery: Yeah. Yeah. But the Go compiler. It’s very nice; it’s just a program that’s written in Go, that compiles under Go, and then you end up with one binary, right? And as long as you have that binary, everything just works, right? And so, it’s actually surprisingly easy to fork Go. I don’t want to—you know, I wouldn’t put that on the same level of difficulty as, like, not screwing up cryptography, if you’re trying to do it yourself. [crosstalk 00:20:16]


Corey: [crosstalk 00:20:16] their own crypto algorithm that they themselves can’t defeat. Yeah, it turns out that basically, breaking crypto is a team sport. Who knew?


Avery: Yeah. Exactly. Generally, with security, you have this problem a lot, right? It’s a lot harder to build a system that nobody can break into, than it is to break into a random system, right? Because you know, the job of securing something against everybody is much harder than the job of finding something you can break into.


Corey: So, I did have a question about something you said earlier, where one of the use cases—one of the design goals—is not to have a breaking change to a point where an old device cannot still connect to the private network. But you do have a key expiry for devices where a device needs to relog in, and it can be anywhere between 3 and 180 days as I look at it. I don’t know if some of the more enterprise-y options have longer options that they can set, but what happen—how do you not have to drive out to the back of beyond to re-authenticate that Raspberry Pi every six months?


Avery: Ah. So, this is something, it’s at the policy layer, and we have not finished refining this to perfection, I would say, right now. What we do have though, if your key does expire, there’s a button in the admin panel to say, like, boost this device for a little bit longer. Sort of unexpire it for another 30 minutes—I don’t remember what the—how much time it is—then you can SSH into the device and do a proper key refresh on it without actually having to drive out there. Now, we did, for one version, accidentally break the key reactivation feature so that if the client noticed its key was expired, it actually disconnected from the Tailscale network altogether and then didn’t receive the message to, like, “Hey, could you please increase the length of your key?” That was fixable by power cycling it, which you could often get somebody to do without driving all the way out there. But we fixed that, so now that—


Corey: “Have you tried turning it off and back on again,” is still a surprisingly effective way of troubleshooting something.


Avery: Yeah, exactly. So, that wasn’t—I mean, it was kind of annoying for some people. But yeah, the reason that, by default, every key always expires is that unlimited time credentials are one of the worst security holes that people don’t really acknowledge. Because technically, it’ll never be the, like—you know, it’ll never show up as the highest severity security hole that you have an unlimited time credential sitting in your home directory, but it is something that—well, I can tell a story. There is a company that I heard about that had you know—SSH keys are typically unlimited time credentials; the easiest way to do it is you run ssh-keygen, it puts something in your home directory, you copy the public key to all the devices you want to be able to log into, and then you never think about it again.


So, this is a company that, of course, every developer in their company had done this; they had a production network with a bunch of SSH keys in it. Some not very ethical employee worked there, had keys in their production systems, and eventually got fired. Now, of course, this company had good processes in place, they went through all the devices and took out this person’s public key from all the devices. What they didn’t know is that during lunch one day, this person had gone around to all their coworkers' workstations that hadn’t been locked, downloaded the private keys for those people on his—


Corey: Oh no.


Avery: —computer before he got fired. And so, shortly after he got fired, their entire production network got wiped out. Now, they didn’t have enough forensics at the time to know how it all got wiped out, so they spent some time putting it all back in place, this time with forensics. About a month later—they rebuilt everything from scratch, all new public keys and everything. You couldn’t possibly have any backdoors in this system, right?


And then a month later, it all got wiped out again. This time, the forensics revealed that, like, it looked like one of the existing employees, coming in from a different country, had gotten into their private production network and wiped everything out. How did that happen? It was because this person had, years earlier, downloaded all their public—or private keys when he wandered around through the office. You can fix this problem instantly, by just expiring your keys and forcing a rotation periodically, right?


SSH doesn’t make that very easy. You can, with SSH, set up SSH certificate authentication, which is a huge ordeal to get configured, but once it’s working, it solves this particular problem, right? Tailscale [crosstalk 00:24:19]—


Corey: On Mac and iOS, there is a slight improvement to this that I’m a big fan of because I agree with you. I am lousy at rotating my keys, but there’s an open-source project called Secretive that I use on the Mac that stores the private key in the Secure Enclave, which the Mac will not let out of it. And I have to use Touch ID to authenticate every time I want to connect to something. Which can get annoying from time to time, but there is no way for someone to copy that off. Historically, I would—


Avery: That’s true.


Corey: Have a passphrase that was also tied to the key so if someone grabbed it off the disk, it still theoretically would not be usable. And that was—but again, that is an absolute vector that needs to be addressed and thought about. Key rotation is huge.


Avery: And you have to go through this effort to sort it all out, right? So Tailscale, we just have this policy: We don’t do unlimited length credentials; we do key rotation for everything, and we just sort of set different time limits for this rotation depending on how picky you want to be about it. But any key expiry is much, much better than no key expiry. Even if you set it to a six-month key expiry, at least it’s only a six-month window in which somebody could theoretically reuse your keys. And we can also rotate keys behind the scenes and so on.


So, in the SSH case, the way people use Tailscale, you stop opening the SSH port to the world. You only SSH when you’re connected over Tailscale. The fact that your Tailscale keys rotate and expire over time is what protects your SSH session. So, you could keep using static SSH keys that never expire—don’t try to figure out all this other complicated stuff, right—and you’re still protected from these private SSH, like, unlimited length keys. Now, that said, for servers, Tailscale does have a button where you can say, like, “Please stop expiring the key.” This is a server, nobody’s ever going to get physical access to the machine.


The only thing we could do with the private key for this machine is allow other people to SSH into it, which is not very dangerous, right? It’s pretty much, like, somebody stealing your SSH authorized keys file; like, it doesn’t really matter. And for that case, you turn off the expiry altogether. But expiring keys is intended for use by, like, devices that employees are actually holding in their hands where if it expires, it’s no big deal, you push the login button and it refreshes.
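The expiry policy Avery describes can be pictured with a minimal sketch. This is not Tailscale's implementation: `NodeKey`, `is_usable`, and the 180-day default are hypothetical names and values chosen only to illustrate the idea that a copied device key stops working after a bounded window, while a server can opt out of expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default, roughly the six-month window mentioned in the conversation.
DEFAULT_LIFETIME = timedelta(days=180)


@dataclass
class NodeKey:
    """Illustrative record for one device key (not a real Tailscale type)."""
    owner: str
    issued_at: datetime
    # None means expiry has been switched off, as one might do for a server.
    lifetime: Optional[timedelta] = DEFAULT_LIFETIME

    def expires_at(self) -> Optional[datetime]:
        """Return the moment the key stops being accepted, or None if it never expires."""
        if self.lifetime is None:
            return None
        return self.issued_at + self.lifetime

    def is_usable(self, now: datetime) -> bool:
        """A key is usable only while it has not yet reached its expiry time."""
        expiry = self.expires_at()
        return expiry is None or now < expiry


issued = datetime(2022, 1, 1, tzinfo=timezone.utc)
laptop = NodeKey("employee-laptop", issued)
server = NodeKey("build-server", issued, lifetime=None)
one_year_later = issued + timedelta(days=365)

print(laptop.is_usable(one_year_later))  # False: a key copied a year ago is useless
print(server.is_usable(one_year_later))  # True: expiry was explicitly disabled
```

The point of the sketch is the asymmetry: a stolen laptop key has a limited useful life, whereas disabling expiry is a deliberate, per-machine choice.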


Corey: There’s something that is very nice about dealing with something that is just so sensible. I mean, we’ve all—at least in the olden days of running sysadmin stuff, we had this problem: we would generate—or purchase back in those days—SSL certificates and, great, they expire in a year or so. At the end of the year, people forget, and then it would expire and you’d run around fixing this. And the default knee-jerk response was, “That was awful. Let’s get the next one for five years so we don’t have to think about it for that long.”


And it’s always a wildcard and so it gets put all over the place, and you wind up with these problems. One of the things that Let’s Encrypt has done super well is forcing a rotation every 90 days so you know where it is. It’s just often enough you want to automate it. And ACM, the AWS Certificate Manager that they use, takes a slightly different approach. It doesn’t give you the private key; it embeds it in other places so they can handle the rotation themselves.


And they start screaming in your email if they can’t verify that it’s time for renewal long before it hits. It’s different approaches to the problem, but yeah, five years out, how should I know all the places the certificate has wound up in that intervening time? Most of the people who did it aren’t there anymore. And one day, surprise, a website breaks, either because its SSL cert isn’t working, or one of the back-end services it depends on suddenly doesn’t have that working. It’s become a mess, so having a forced modernity to these things is important.


Avery: Right. It’s forced modernity, and it’s just basically, it’s all behind the scenes. Like, you don’t even think about the fact that Tailscale gave you a key because that is not relevant to your day-to-day life, right? You logged in, something happened, all these devices ended up on your network. What actually happened is that public and private keys—you know, a private key was generated, the public keys were distributed properly, things are getting rotated, but you don’t have to care about all that stuff.


So, it’s fun that Tailscale is what we call secure by default, right? People love to use it because it’s easier, it makes their life easier, but security teams like it because actually, it changes the default security posture from, like, “Ugh, I’m going to have to tell everybody to please stop doing these five things because it always creates security holes,” to like, “Whoa, the thing that they’re going to do most naturally is actually going to be safe.” Right? I really like that about it. You’re not thinking about certificates, but their certificates are getting rotated exactly as they should be.


Corey: There’s just something so nice about computers doing the heavy lifting for us. It’s one of the weird things about Tailscale is it falls into a very strange spot where there is effectively zero maintenance burden on me, but I still use it to toggle it on or off in scenarios often enough to remember that it’s there and that I’m using it. It is the perfect sweet spot of being somewhat close to top of mind, but never in a sense that is, “Oh, I got to deal with this freaking thing again.” It never feels that way. Logging into it, it has long-lived sessions at the browser, so it isn’t one of those, ah, you have to go back to GitHub and re-authenticate and do all these other dog-and-pony show things. It just works. It is damn near a consumer-level of ease-of-use, start to finish. The hard part, of course, is how on earth you explain this to someone [laugh] without a background in this space.


Avery: Yeah, exactly. It’s something we ask ourselves sometimes is, like, well, you know, Tailscale is great for developers right now. It is easy enough to use, even for consumers, but, like, how would you explain it to consumers and find a good use case for consumers? And it’s something that I think we are going to do eventually, but it hasn’t been, up until now, a super high priority for us just because developers are this sort of like the core audience that we haven’t even finished building a great product that does everything that they want, yet. There is one little feature in Tailscale that’s the beginning of something that's consumer-friendly; it’s called Taildrop.


I don’t know if you’ve seen this one. You can turn it on, and basically, it acts like AirDrop in Apple products, except you don’t need to care about physical proximity and it works with every kind of device, not just Apple devices, right? So, you can add it as—it shows up in the share pane on your macOS or Windows or iOS device. You can use it from Linux. You just use it to send files of any type, and it sends them point to point, not through a cloud provider, so that we never see a copy of the file. It only goes between your devices over your encrypted network. So, that’s something that consumers kind of like.


Corey: Feels like Tailprint for Bonjour could wind up being another aspect of this as well. And I’m still hoping for something almost Ansible-like where run the following command, whether it’s pre-approved or not, on a following subset of things. In my case, for example, it’s, I would love it if it would just automatically, when I press the button, update Tailscale across all of the nodes that support it, namely the Linux boxes. I don’t think you can trigger an App Store update from within a sandboxed app on iOS, but I’ve been—


Avery: Right.


Corey: Surprised before. Yeah. But it’s nice to be able to do some things.


Avery: Yeah. This is one of those—yeah, we get that request a lot for, like, can you push a button to auto-update Tailscale? It makes me really sad that we get this request because the need for this is a sign that all of the OS vendors have completely botched software updates, right? Like, the OS should be the thing updating your software on a good schedule based on a set of rules, and it shouldn’t be the job of every single application to provide their own software update. It’s actually a massive, embarrassing security hole that software can even update itself, right?


Because if it can update itself, then you know, imagine someone breaks into the production services of a company that is offering a particular program. They put malware into a version of the software, they put it into the software update server, and then they trigger everything in the network to push the software update to those devices. Now, you’ve got malware installed on all your devices, right? It’s very strange that people asked for this as a feature. [laugh].


Tailscale currently does not have that feature; it doesn’t push software updates on its own. But it’s such a popular feature that I think we’re going to have to implement it because everybody wants this because Windows, for example, is simply just never going to automatically update your software for you. We have to have these weird super-admin rights on your machine so that we can push software updates because nobody else will. I feel really weird about that. You know, the security world should be protesting this more.


But instead, they’re like, asking, can you please put this feature in because I’ve got a checklist in my compliance thing that says, “Is all your software up-to-date?” I don’t have a checklist item that says, “Does any of my software have super-admin rights that they shouldn’t have?” Right? It’s sort of, I guess, the next level of supply-chain management is the big word. Nobody—there is no supply chain management for software.


Corey: There isn’t, for better or worse. I wish there were, but there simply is not. Ugh. Next year, maybe. We hope.


Avery: Yep. So, you have to trust your vendors, fundamentally, which I guess will always be true. That’s true for Tailscale as well, right? Whether or not we include the software update pushing. If you’re installing a VPN product provided by a vendor, you have to trust that we’re going to put the right stuff into the software.


And the best—the only thing I can really do is just be honest about these issues and say, “Well, look, we try our best. We definitely try not to implement features that are going to turn into security holes for you.” And I think we do a lot better than most vendors do in that area. But it’s very hard to be perfect because nobody knows how to do software supply chain well.


Corey: Ugh. I hear you. That’s the nice thing, too. Honestly, the big reason I know I need to update these things, and the reason I want to do it, is actually you. Because whenever I log in and look at my devices in the Tailscale thing, there’s a little icon next to the one where there’s an update available.


And you have fixed a lot of the niceties on this, like, ah, there’s an update available for the iOS version. It’s, “Really? Because it’s not available in the Apple Store yet,” as I sit there spamming the thing. That stopped happening. There’s a lot of just very nice quality-of-life improvements that are easy to miss.


Avery: Yep, yeah, that’s kind of weird. We actually went a little overboard on the update available notifications for a while because there’s always this trade-off, right? Like I said, we have a policy of never breaking old versions, so when people see the update available notification, they kind of panic. It’s, like, “Oh no, I better install the update before Tailscale cuts me off.” And, like, well, we’re not actually ever going to cut you off, so you shouldn’t have to worry about that stuff.


But on the other hand, you’re not going to get the latest features and bug fixes unless you’re running the latest version, so when people email us saying, “Hey, I’m using Tailscale from six months ago, and I have this problem,” the first thing our support team does is say, “Well, can you please try the latest one, and does the problem go away?” Because it’s kind of inefficient debugging six-month-old software. So, one way we were trying to, like, minimize that cost is, like, hey, we could just tell people there’s a new version available and then maybe they’ll update it themselves. But that resulted in people panicking. Like, oh, no, I need to install the software really, really soon because I can’t afford to break my network.


Corey: Right.


Avery: And because our system is based on WireGuard and this is—you know, I’ll probably jinx it by saying this but, like, we’ve never had an actual security hole that we’ve had to issue a Tailscale update to resolve, right? People see the update available thing and, like, “Oh, no, I bet there’s a whole bunch of vulnerabilities that they fixed.” It’s like, “Well, no.” WireGuard has also never had a vulnerability, right? [laugh]. It’s… yeah, it’s, you know, sooner or later there probably will be one, and when there is one, we’ll probably have to make the, you know, update notification in red or something instead of just the little icon on the admin panel. But yeah, it’s—


Corey: [laugh].


Avery: —we try [crosstalk 00:35:23]—


Corey: Nice job on jinxing it, by the way, I appreciate that.


Avery: Yeah, I know. I mean, I try to try my best. [laugh]. But I’ve actually been surprised. It’s very much like my experience with all the djb stuff we used in the past.


Like, when we were using qmail and djbdns for years, there was never once a security hole, right? It’s very interesting that it is possible to design software that never once has a security hole. And nobody does that, right? I mean, I would say I’m not as smart as djb; our software is probably, you know, not going to be as one hundred percent perfect as that, but we try really, really hard to aim for that as a goal.


Corey: Yeah. I really want to thank you for taking the time to speak with me about everything Tailscale is up to. And again, congratulations on your Series B. If people want to learn more, where should they go?


Avery: I guess, tailscale.com is the place. We also have @tailscale on Twitter. My own personal Twitter is @apenwarr, which you probably won’t be able to spell unless you Google for me or something—


Corey: But it’s in the [show notes 00:36:19], which makes this even easier.


Avery: It is? Ah, there you go. So yeah, there’s lots of information. But the number one thing I tell people is, like, look, it is a lot easier to get started than you think it is. Even after you’ve heard it 100 times, nobody ever believes how easy it is to get started. Just go to the App Store, download the app, log into your account, and you’re already done, right? Try that and you don’t even have to read anything.


Corey: I would tear you apart for that statement if it weren’t—if it were slightly less true than it is, but it is transformative. Give it a try. It’s a strong endorsement from me. Thank you so much for your time. I appreciate it.


Avery: Thank you, too. Great talking to you, and talk next time.


Corey: Indeed. Avery Pennarun, CEO of Tailscale. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this show, please leave a five-star review on your podcast platform of choice, and smash the like and subscribe buttons, whereas if you’ve hated it, same thing—five-star review, smash the buttons—and also leave an angry bitter comment about how you are smart enough to roll your own crypto, so you don’t understand why other people wouldn’t do it.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.


Announcer: This has been a HumblePod production. Stay humble.



Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it’s spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I’ve been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They’re exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It’s the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn’t limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I’ve ever spoken to. Let’s also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It’s an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you’re hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That’s R-E-V-E-L-O dot I-O slash screaming.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Generally, at the start of these shows, I mention something about money. When I have a promoted guest, which means that they are sponsoring this episode, I talk about that. This is not that moment. There’s no money changing hands here.

And in fact, I’m about to talk about a product that I am a huge fan of, but I’m also, as of this recording, not paying for. So, one might think I’m the product, but no. Let’s actually start by talking about money. My guest today is Avery Pennarun, the CEO of Tailscale, and as of today, being the day that this goes out, you folks have just raised $100 million in a Series B. First, thank you for joining me, followed immediately by congratulations.

Avery: It’s great to be here, and thank you. It’s an exciting announcement that I hope we don’t end up spending too much time talking about because money is a lot more boring than technology. But yeah, we are very happy, both to be here and to be making the announcement.

Corey: Yeah. CRV and Insight Partners are the lead investors on the round. And it’s great to see because I’ve been using Tailscale for a while now. And it is a transformative experience for the way that I think about these things. A while back, I wrote a Lambda layer that lets Lambda functions take advantage of it, but in fairness, I did write it, so anyone looking at that should—“Haha, that’s why you’re not a developer full-time. You’re bad at it.” Yes, I am.

But I can’t stop raving about how useful Tailscale is, with the counterpoint that it’s also very difficult to explain to people who are not—at least in my experience—broken in a very particular way, as I am. What is Tailscale? And what does it do?

Avery: Right. Well, I mean, first of all, one of the things I really like about Tailscale and what we built is that, you know, even if you’re not a super great developer—like you just described yourself—you can get excited about it, you can use it for things, you can build on top of it, and contribute back without having to understand every single little detail of what it does, right? Tailscale is something that a lot of people get excited about without having to know how it works; they just know what it gives them, right? The answer to what Tailscale is, is sort of… it can be hard to explain to people who don’t know about the kinds of problems that it solves, but the super short answer is it connects all of your devices and virtual machines and containers to each other, wherever they are, without going through an intermediary, right? So, it minimizes latency and it maximizes throughput, and it minimizes pain. And it sounds like that should be hard, but you can get it all done in, like, five minutes.

Corey: I have been using it for a while now. Originally, I was using it and federating through it, I believe, via Google. I rebuilt and tore down the entire network in about five minutes, and instead started federating through GitHub. Nowadays, you apparently changed your position on that identity and you use third-party SSO sources, as well as retaining user information and login stuff yourselves, which is just, it’s almost starved for choice, on some level. But I am such a fan of the product that I hope you’ll forgive me if I talk for about a minute or so on how I use it and my experience of it.

Avery: Go for it.

Corey: So, I wind up firing up Tailscale, and I have a network that from any of my devices, I can talk to any other. I have a couple of EC2 machines hanging out in AWS, I have a Raspberry Pi that I use as a DNS server sitting in the other room, I have my iPad, I have my iPhone, I have my laptop, I have my desktop, I have a VM sitting over in Google Cloud, I have a different VM sitting over in Oracle Cloud. And all of these things can talk to each other directly over a secured network. I can override DNS and talk to these things just by the machine name, I can talk to them via the address that winds up being passed out to them through this. It is transformative. It works on IPv4 and IPv6; if I’m on a network without IPv6 access, using Tailscale, suddenly I have it.

I can emerge from almost any other node on this network. And adding a new device to this is effectively opening a link in a browser on either that device or a different one, clicking approve once I log in, and it’s done. That is my experience of it, so far. Is that directionally correct as far as how you think about the product? Because again, I use DNS TXT records as a database for God’s sake. I am probably not the world’s foremost technical authority on the proper use of things.

Avery: Right. Yeah. I mean, that’s a good description of what it does. I think it actually—it’s weird, right? It’s hard to get across in words just how simple it is, right?

That one-minute description used a bunch of technical-sounding terminology that probably the listeners to your podcast will understand. But, like, the average tech person doesn’t need to know any of those things in order to use Tailscale, right? You download it from the App Store on your phone and your laptop. And you install Tailscale on both from the App Store. You log into your Google account or your GitHub account, and that’s it. Those two devices are tied together in time and space; they can see each other. You can access a web server that you’re running on your laptop from your phone without doing anything else, right?

And then you can start a VM in AWS and you load Tailscale in there, and now that’s part of your network. And so, there’s—you don’t need to know what IPv4 and IPv6 even are. You don’t need to know what DNS even is. It just, you know, the magic sort of comes together. We do a ton of stuff behind the scenes to make that magic work. But it’s this—one thing that one customer said to us one time is, like, “It makes the internet work the way you thought the internet worked until you learned how the internet worked.” If that makes sense.

Corey: Right. It basically works on duct tape and toothpicks all spit together, and it’s amazing that it works at all. I mean, this is going to sound relatively banal, but the way that I’ve used Tailscale the most is on my phone or on my iPad or on my Mac. I will connect to the Tailscale network by default, and when that is done, it passes out my Pi-Hole’s IP address as the custom DNS server for the entire network. So, I don’t see a whole bunch of ads, not just in browser, but in apps and the rest.

And every once in a while when something is broken because an ad server is apparently critical to something, great, I turn off the VPN on that device, use the natural stuff. My experience of the internet gets worse as a result and the thing starts working again, then I turn it back on. It is more or less the thing that I use as a very strange-looking ad blocker, in some respects, that I can toggle on and off with the click of a button. But it’s magic, it is effectively magic. From the device side, it’s open up an app and toggle a switch, or it is grab from the menu bar on a Mac, there’s an application that runs and just click the connect button or the disconnect button.

There is no MFA every time you connect. There is no type in a username and password. There is no lengthy handshake. I hit connect and it is connected by the time I have moved the mouse back from the menu bar to the application I was working in. Whenever I show this to someone who uses a corporate VPN, they don’t believe me.

Avery: Right. Yeah, exactly. It’s hard to believe. It's like, “Hey, did anything actually happen here?” Because we removed you know, for example, it doesn’t by default catch all your traffic, it only catches the traffic to your private network, so it’s safe to leave it on all the time because it’s not interfering with what you’re doing.

What you’re describing is using Pi-Hole, which is a Raspberry Pi-based DNS server that is an ad blocker. Most people using Pi-Hole have one at home, so when they’re at home they get ads blocked, but when they leave home they don’t get their ads blocked. If you add Tailscale to that, you can use your Pi-Hole even when you’re not at home, and it sort of makes it that much more useful. I think an important difference from, say, other services that you can use as an adblocker or a privacy VPN is that we never see your traffic, right? Tailscale creates a private network between you and all your personal devices, and that private network is private even from us, right? We help you connect the devices to each other, but when your traffic goes to Pi-Hole, it’s your Pi-Hole. It’s not our adblocker. It’s your adblocker, right, so we never see what traffic you’re going to, we never see what DNS names you’re looking up because it was just never made available to us, right?
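For readers unfamiliar with how a DNS-based ad blocker works, here is a minimal sketch of the idea. It is not Pi-hole's code: the blocklist entries, the `is_blocked` and `resolve` functions, and the `fake_upstream` resolver are all hypothetical, and answering with `0.0.0.0` is just one common way to sink a blocked name.

```python
from typing import Callable, FrozenSet

# Hypothetical blocklist; a real blocker loads large published lists.
BLOCKLIST: FrozenSet[str] = frozenset({"ads.example.com", "tracker.example.net"})

# Unroutable address handed back for blocked names, so the ad never loads.
SINKHOLE_ADDRESS = "0.0.0.0"


def is_blocked(hostname: str, blocklist: FrozenSet[str] = BLOCKLIST) -> bool:
    """Return True if the hostname, or any parent domain of it, is on the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "cdn.ads.example.com", then "ads.example.com", then "example.com", and so on.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(hostname: str, upstream: Callable[[str], str]) -> str:
    """Answer blocked names locally; forward everything else to the upstream resolver."""
    if is_blocked(hostname):
        return SINKHOLE_ADDRESS
    return upstream(hostname)


def fake_upstream(hostname: str) -> str:
    """Stand-in for a real recursive resolver; always returns a documentation address."""
    return "192.0.2.10"


print(resolve("ads.example.com", fake_upstream))  # 0.0.0.0
print(resolve("example.org", fake_upstream))      # 192.0.2.10
```

Because the filtering happens at name resolution, any device that uses this resolver gets the same blocking, which is why pointing remote devices at the home resolver over a private network extends the benefit beyond the house.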

Corey: Right. But did you do—the level of visibility you have into my network is fascinating in a variety of different ways, but it is also equally fascinating—one of those ways—is how limited it is. You know what devices I have, the last time they’ve connected, the version of Tailscale they’re running, an IP address on it, and you also wind up seeing what services are advertised and available on those networks if I decide to enable that. Which is great for things like development; I’m going to be doing development in a local dev sense on an EC2 instance somewhere. And well, I don’t want to set up a tunnel with SSH to wind up having to proxy traffic over there just so I can wind up hitting some high port that I bound to, and I certainly don’t want to expose that to the general internet; that is a worst practice for all these things.

And Tailscale magically makes this go away. I haven’t done this in much depth yet with a variety of my team members, but when you start working on this with teams who are doing development work, someone can have something running on their laptop and just seamlessly share it with their colleagues. It’s transformative, especially in an era where very often that colleague is not sitting in the same room getting their greasy fingerprints on your laptop screen.

Avery: Yep. Yeah, exactly. So, you mentioned the services list which you have to specifically opt into, and the reason we did that is that, you know, the list of devices and hostnames and IP addresses, we have to collect because that’s how the service works, right? You send us the information about your devices, and then we send the public keys for those devices to the other devices. We can’t get out of collecting that, whereas the services list is purely an interesting add-on feature, and we decided that we didn’t want to collect that by default because it would make people nervous about their privacy.

So, if you want that feature, you click it on; if you don’t want it, don’t turn it on, you can still share services with people inside your network; they just need to know that those services exist. You send them the URL or whatever and it’ll work, but it doesn’t show up as a list of things that we can see in that case. But yeah, sharing stuff between your coworkers is definitely… is a major use case for Tailscale and dev and infrastructure teams in particular. Like, you can—designers, for example, run a test version of the website on their laptop, and then they say, “Hey, visit this URL on my laptop.” And you don’t have to be in the same office, you can both be sitting in different cafes in different cities. Tailscale will make it so that the connection between those two computers still works, even if they’re both behind firewalls, even if they’re both behind different NATs, and so on.
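The "visit this URL on my laptop" workflow needs nothing more exotic than an ordinary local web server. The sketch below is an illustration only: `PreviewHandler` and `start_preview_site` are hypothetical names, and the demo binds to `127.0.0.1` so it runs anywhere. On a real private network one would presumably bind to, and share, the laptop's address on that network instead.

```python
import http.server
import threading
import urllib.request


class PreviewHandler(http.server.BaseHTTPRequestHandler):
    """Serves a single placeholder page standing in for a test build of a website."""

    def do_GET(self):
        body = b"<h1>Test version of the website</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_preview_site(bind_address="127.0.0.1", port=0):
    """Start the preview server in a background thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer((bind_address, port), PreviewHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_preview_site()
    host, port = server.server_address
    # A colleague would open this URL in a browser; here we fetch it ourselves.
    with urllib.request.urlopen(f"http://{host}:{port}/") as response:
        print(response.status)
    server.shutdown()
    server.server_close()
```

Nothing in the server knows about firewalls or NAT; the claim in the conversation is that the network layer makes the laptop reachable, so the application can stay this simple.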

Corey: One of the things that astounded me the most: I am reluctant to completely trust things that are new that touch the network. Early on in my career, I made network engineering mistake 101, which is making a change to the firewall in your data center without having another way in. And then it’s the drive across town, or calling remote hands to get them to let you back in when you’ve locked things out. Because you folks are building these things at a pretty consistent clip; there are a lot of updates and releases across all of the platforms. And invariably, I find myself on some devices a version behind or so, just because of the pace of innovation. “Oh, great. We’re updating the VPN client. Cool. So, I’m going to expect this thing to drop and I’m going to have to go in and jigger it to get it working again.”

That has never happened. I have finally given in to, I guess, the iron test of this, and I have closed SSH from the internet to most of these nodes. In fact, some of them sit—the Pi-Hole sitting at home, if you’re not on my home network, there is no outside way in without breaking in. It is absolutely one of those things that disappears into the background in a way that I was extraordinarily surprised to find.

Avery: Right. Well, that is something—I mean, I’m old and grumpy, I guess, is sort of the beginning part of all this, right? I’ve seen all this annoying stuff that happens with software. And, you know, and many of us, in fact, at Tailscale are old and grumpy, and we just didn’t want to repeat those same things. So, first of all, network stuff to an even stronger degree than virtually any other kind of product, if your network stops working, everything stops working, right, so it’s number one priority that Tailscale has to not mess up your network.

Because if it does, you instantly lose faith. There’s kind of like—Tailscale gives you this magical feeling when you first install it, but that feeling of magic goes away very quickly the first time it screws something up and you can’t connect when you really need to. So, we put a huge amount of work into making sure that you can connect when you really need to. We have a lot of automated tests. One of our policies that I think is almost unheard of is that we intend to never deprecate support for older versions of the Tailscale client.

And to this day, we’re about three years into Tailscale, we’ve never deprecated an old client that anybody is using. So eventually, people—though in fact hard to believe, but eventually, people do stop using some old versions, so those ones don’t work anymore, necessarily. But any version of Tailscale that is in use today is going to keep working as long as anybody is using it. We have a very, very, very strong backwards compatibility policy. Because the worst thing that I can imagine is having some Raspberry Pi sitting out in the void somewhere that I haven’t looked at for two years, that whoops, Tailscale broke it, and now I can’t connect to it, and now I have to go drive down there and fix it, right? It would be just insultingly terrible for that to happen.

And we just make sure that doesn’t happen. Another thing that people get excited about is, like, on a Debian system or whatever, if you’ve got the Debian package installed, you can do an apt-get upgrade. Tailscale upgrades and even your SSH session doesn’t drop. Every now and then people [comment and was like 00:14:13] —

Corey: That was the weirdest part. I was expecting it to go away or hang for a long period of time. And sure, I guess it might drop a packet or so, I’ve never bothered to look because it is so seamless.

Avery: Right. Yeah, exactly. It’s just, like, “Wait. Did anything even happen?” It’s like, “Yes”—

Corey: Right—

Avery: —“Something happened. We upgraded it out from underneath you.”

Corey: —my next thing is [crosstalk 00:14:28]—yeah, I grep Tailscale on the process table. Like, okay, is this just a stale thing that’s existing [unintelligible 00:14:34] to bounce it? No, it has just been started. It was so seamless under the hood that it was amazing. There is something that is—a lot of things have been very deeply right on this.

Something else that I think is worth pointing out is that if any company had the brainpower there to roll their own crypto, it would be you folks, but you don’t. You’re riding on top of WireGuard, an open-source project that does full-mesh VPNs with terrible user interfaces.

Avery: Yep. So, you know, I guess disclosure. Back in 1997 when I started my first startup, I was not smart enough to not roll my own crypto. And therefore the VPN I wrote at the time definitely had giant security holes. It was also not that popular, so nobody found them. But I, you know eventually I found [crosstalk 00:15:21]—

Corey: “Except a bank, which I really shouldn’t disclose.” Kidding, I’m kidding. But yeah.

Avery: [laugh]. No, no, no. The bank never used that software. [laugh]. But yeah. Nowadays, I’ve been through a lot, and I… I would not describe myself as a security expert. Although people often describe me as a security expert. I don’t know what that means. But I am enough of an expert to know that I should not be rolling my own crypto. And the people who invented WireGuard, it’s one of the—I feel like I’m overstating things, but I’m not—it’s one of the biggest leaps forward in cryptography, in probably the history of computing. Now, it builds on a series of things that are part of the same leap forward, right? It’s built on the protocol that Signal uses called the Noise Protocol, right? Signal and Noise are built on the Ed25519 curve, made by—or popularized by—Dan Bernstein, who’s a major cryptographer in this area. Sometimes popular, sometimes—

Corey: Oh, djb.

Avery: —not popular. Yeah, exactly.

Corey: He also, near and dear to my heart, wrote djbdns, which was a well-known, widely deployed DNS server, by which I of course mean database. Please, continue.

Avery: Yep. [laugh]. I’ve been a huge fan of basically everything djb has ever made in the history of—

Corey: Oh, you’re a qmail person. I am on the postfix side of [unintelligible 00:16:37].

Avery: Yep. Well, my first startup back in 1997, we made Linux-based server appliances for small businesses. And we used qmail, we used djbdns, we used a couple of other djb products. And you know, for the history of that product—you know, leaving aside my VPN that was a security hole—the djb stuff never had a single problem. That company was eventually acquired by IBM.

One of the first things IBM did is, like, “Whoa, djb has a super-weird software license. We can’t be doing this. Let’s replace it with software that has a decent license.” So, they dropped out djbdns and started using BIND. Within a week, there was a security hole in BIND that affected all of these appliances that they now controlled, right?

So, djb is a very big-brained super genius in security, whatever you might think of his personality. And his work sort of was the basis for this revolution in cryptography that WireGuard has brought to the networking world. And it’s hard to overstate. Just, like, the number of lines of code: there’s something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case.

And that made it something really powerful to build on top of. Like, it’s super hard for somebody like me to screw up the security of a WireGuard deployment, where it’s very easy to screw up the security of an IPsec deployment.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle’s Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it’s actually free. There’s no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That’s snark.cloud/oci-free.

Corey: I just want to call something out as well, that when I say that you folks definitely have the intellectual firepower to roll your own crypto should you choose to do so, but you chose not to, if anything, I’m understating it. To be clear, one of the blog posts you had somewhat recently out was how you are maintaining what is effectively your own fork of the Go programming language. Which is one of those things when someone hears that it’s like, “I’m sorry, can you say that again? Because I am almost certain I misunderstood something.” What is the high-level version of that?

Avery: Well, there’s, I think, two important points there. One of them is that yes, we did fork the Go programming language; it’s supposed to be a temporary fork because it allows us to do some experiments with the Go back-end. And the primary reason we were able to do that is because we employ a couple of people who used to be on the core Go team. And that was not because we went out looking for people who used to be on the core Go team, that’s just how it worked out. But because we do, it’s easier for them to fork Go than it would be for the average person, and in many ways, it’s easier for them to get their job done by just continuing to work on the codebase they’ve already worked on.

But the second point is actually, as compilers go, the Go compiler is probably the very easiest one I’ve ever seen to be able to fork and edit. Like it’s super-clear code, you’re just editing Go code, which is already pretty easy. But they really put a ton of work into making it readable and understandable. So, like, average people actually can fork the Go compiler and not be completely bamboozled by how difficult everything is, right? Compared to, like, GCC where just building the thing is something that takes you weeks to learn how to do, right, Go is just, like, you run this script and build your compiler [unintelligible 00:19:35]—

Corey: Yeah. Let me clear this quarter on my schedule so I can go ahead and do that. Yeah, no, thank you.

Avery: Yeah. I’ve built copies of GCC and it’s absolutely nightmarish, right? And built people’s forks of GCC for special embedded processors and stuff. And this is, like, a f—this is a career that you can specialize in, building GCC, right? There are people that do this, right? And the Go compiler, it’s really—

Corey: Well, it’s 40 years of load-bearing technical debt.

Avery: Yeah. Yeah. But the Go compiler. It’s very nice; it’s just a program that’s written in Go, that compiles under Go, and then you end up with one binary, right? And as long as you have that binary, everything just works, right? And so, it’s actually surprisingly easy to fork Go. I don’t want to—you know, I wouldn’t put that on the same level of difficulty as, like, not screwing up cryptography, if you’re trying to do it yourself. [crosstalk 00:20:16]

Corey: [crosstalk 00:20:16] their own crypto algorithm that they themselves can’t defeat. Yeah, it turns out that basically, breaking crypto is a team sport. Who knew?

Avery: Yeah. Exactly. Generally, with security, you have this problem a lot, right? It’s a lot harder to build a system that nobody can break into, than it is to break into a random system, right? Because you know, the job of securing something against everybody is much harder than the job of finding something you can break into.

Corey: So, I did have a question about something you said earlier, where one of the use cases—one of the design goals—is not to have a breaking change to a point where an old device cannot still connect to the private network. But you do have a key expiry for devices where a device needs to relog in, and it can be anywhere between 3 and 180 days as I look at it. I don’t know if some of the more enterprise-y options have longer options that they can set, but what happen—how do you not have to drive out to the back of beyond to re-authenticate that Raspberry Pi every six months?

Avery: Ah. So, this is something, it’s at the policy layer, and we have not finished refining this to perfection, I would say, right now. What we do have though, if your key does expire, there’s a button in the admin panel to say, like, boost this device for a little bit longer. Sort of unexpire it for another 30 minutes—I don’t remember what the—how much time it is—then you can SSH into the device and do a proper key refresh on it without actually having to drive out there. Now, we did, for one version, accidentally break the key reactivation feature so that if the client noticed its key had expired, it actually disconnected from the Tailscale network altogether and then didn’t receive the message to, like, “Hey, could you please increase the length of your key?” That was fixable by power cycling it, which you could often get somebody to do without driving all the way out there. But we fixed that, so now that—

Corey: “Have you tried turning it off and back on again,” is still a surprisingly effective way of troubleshooting something.

Avery: Yeah, exactly. So, that wasn’t—I mean, it was kind of annoying for some people. But yeah, the reason we use, by default, every key always expires is because unlimited time credentials are one of the worst security holes that people don’t really acknowledge. Because technically, it’ll never be the, like—you know, it’ll never show up as the highest severity security hole that you have an unlimited time credential sitting in your home directory, but it is something that—well, I can tell a story. There is a company that I heard about that had you know—SSH keys are typically unlimited time credentials; the easiest way to do it is you run ssh-keygen, it puts something in your home directory, you copy the public key to all the devices you want to be able to log into, and then you never think about it again.

So, this is a company that, of course, every developer in their company had done this; they had a production network with a bunch of SSH keys in it. Some not very ethical employee worked there, had keys in their production systems, and eventually got fired. Now, of course, this company had good processes in place, they went through all the devices and took out this person’s public key from all the devices. What they didn’t know is that during lunch one day, this person had gone around to all their coworkers' workstations that hadn’t been locked, downloaded the private keys for those people on his—

Corey: Oh no.

Avery: —computer before he got fired. And so, shortly after he got fired, their entire production network got wiped out. Now, they didn’t have enough forensics at the time to know how it all got wiped out, so they spent some time putting it all back in place, this time with forensics. About a month later—they rebuilt everything from scratch, all new public keys and everything. You couldn’t possibly have any backdoors in this system, right?

And then a month later, it all got wiped out again. This time, the forensics revealed that, like, it was one of the existing employees, coming from a different country, that had gotten into their private production network and wiped everything out. How did that happen? It was because this person had, years earlier, downloaded all their public—or private keys when he wandered around through the office. You can fix this problem instantly, by just expiring your keys and forcing a rotation periodically, right?

SSH doesn’t make that very easy. You can, with SSH, set up SSH certificate authentication, which is a huge ordeal to get configured, but once it’s working, it solves this particular problem, right? Tailscale [crosstalk 00:24:19]—

Corey: On Mac and iOS, there is a slight improvement to this that I’m a big fan of because I agree with you. I am lousy at rotating my keys, but there’s an open-source project called Secretive that I use on the Mac that stores the private key in the Secure Enclave, which the Mac will not let out of it. And I have to use Touch ID to authenticate every time I want to connect to something. Which can get annoying from time to time, but there is no way for someone to copy that off. Historically, I would—

Avery: That’s true.

Corey: Have a passphrase that was also tied to the key so if someone grabbed it off the disk, it still theoretically would not be usable. And that was—but again, that is an absolute vector that needs to be addressed and thought about. Key rotation is huge.

Avery: And you have to go through this effort to sort it all out, right? So Tailscale, we just have this policy: We don’t do unlimited length credentials; we do key rotation for everything, and we just sort of set different time limits for this rotation depending on how picky you want to be about it. But any key expiry is much, much better than no key expiry. Even if you set it to a six-month key expiry, at least it’s only a six-month window in which somebody could theoretically reuse your keys. And we can also rotate keys behind the scenes and so on.

So, in the SSH case, the way people use Tailscale, you stop opening the SSH port to the world. You only SSH when you’re connected over Tailscale. The fact that your Tailscale keys rotate and expire over time is what protects your SSH session. So, you could keep using static SSH keys that never expire—don’t try to figure out all this other complicated stuff, right—and you’re still protected from these private SSH, like, unlimited length keys. Now, that said, for servers, Tailscale does have a button where you can say, like, “Please stop expiring the key.” This is a server, nobody’s ever going to get physical access to the machine.

The only thing we could do with the private key for this machine is allow other people to SSH into it, which is not very dangerous, right? It’s pretty much, like, somebody stealing your SSH authorized keys file; like, it doesn’t really matter. And for that case, you turn off the expiry altogether. But expiring keys is intended for use by, like, devices that employees are actually holding in their hands where if it expires, it’s no big deal, you push the login button and it refreshes.
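
The expiry argument above can be made concrete with a minimal Python sketch. This is an illustration only, not Tailscale's implementation: the `DeviceKey` class, its field names, and the `exposure_window` helper are hypothetical, and the 180-day default is simply the upper figure mentioned in the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DeviceKey:
    """Hypothetical record for a device key (not a real Tailscale type)."""
    issued_at: datetime
    # None models the "please stop expiring the key" option described for servers.
    lifetime: Optional[timedelta] = timedelta(days=180)

    def expires_at(self) -> Optional[datetime]:
        """Return the expiry time, or None if the key never expires."""
        if self.lifetime is None:
            return None
        return self.issued_at + self.lifetime

    def is_valid(self, now: datetime) -> bool:
        """A key is valid until its expiry time, or forever if it has none."""
        expiry = self.expires_at()
        return expiry is None or now < expiry


def exposure_window(key: DeviceKey, stolen_at: datetime) -> Optional[timedelta]:
    """How long a copy stolen at `stolen_at` stays usable; None means indefinitely."""
    expiry = key.expires_at()
    if expiry is None:
        return None
    return max(expiry - stolen_at, timedelta(0))


issued = datetime(2022, 1, 1, tzinfo=timezone.utc)
laptop = DeviceKey(issued)                # expires after 180 days
server = DeviceKey(issued, lifetime=None)  # expiry disabled
stolen = issued + timedelta(days=30)

print(exposure_window(laptop, stolen))  # 150 days, 0:00:00
print(exposure_window(server, stolen))  # None
```

The point of the sketch is the asymmetry: with any finite lifetime the stolen copy has a bounded useful life, while a key with expiry disabled stays usable until someone notices and revokes it.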

Corey: There’s something that is very nice about dealing with something that is just so sensible. I mean, we’ve all—at least in the olden days of running sysadmin stuff, we had this problem: we would generate—or purchase back in those days—SSL certificates and, great, they expire in a year or so. At the end of the year, people forget, and then it would expire and you’d run around fixing this. And the default knee-jerk response was, “That was awful. Let’s get the next one for five years so we don’t have to think about it for that long.”

And it’s always a wildcard and so it gets put all over the place, and you wind up with these problems. One of the things that Let’s Encrypt has done super well is forcing a rotation every 90 days so you know where it is. It’s just often enough that you want to automate it. And ACM, the AWS Certificate Manager, takes a slightly different approach. It doesn’t give you the private key; it embeds it in other places so they can handle the rotation themselves.

And they start screaming in your email if they can’t verify that it’s time for renewal long before it hits. It’s different approaches to the problem, but yeah, five years out, how should I know all the places the certificate has wound up in that intervening time? Most of the people who did it aren’t there anymore. And one day, surprise, a website breaks, either because its SSL cert isn’t working, or one of the back-end services it depends on suddenly doesn’t have that working. It’s become a mess, so having a forced modernity to these things is important.
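
The "rotate often enough that you automate it" idea can be sketched as a small renewal check in Python. This is a sketch under stated assumptions: the 30-day threshold and the function names are hypothetical and not taken from Let's Encrypt or ACM. The only library call relied on is `ssl.cert_time_to_seconds`, which parses the `notAfter` string format returned by `SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timezone

# Assumed threshold for this sketch; not a value from the episode.
RENEW_BEFORE_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate's notAfter time.

    `not_after` uses the format found in ssl.getpeercert() output,
    for example "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware,
    otherwise .timestamp() would interpret it as local time.
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)
    return (expiry_ts - now.timestamp()) / 86400


def should_renew(not_after: str, now: datetime,
                 renew_before_days: int = RENEW_BEFORE_DAYS) -> bool:
    """True once the certificate is inside the renewal window (or already expired)."""
    return days_until_expiry(not_after, now) <= renew_before_days


not_after = "Jan  5 09:34:43 2018 GMT"
print(should_renew(not_after, datetime(2017, 10, 1, tzinfo=timezone.utc)))  # False
print(should_renew(not_after, datetime(2018, 1, 1, tzinfo=timezone.utc)))   # True
```

A check like this would run on a schedule; with a five-year certificate nobody keeps such a job alive, which is the failure mode described above.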

Avery: Right. It’s forced modernity, and it’s just basically, it’s all behind the scenes. Like, you don’t even think about the fact that Tailscale gave you a key because that is not relevant to your day-to-day life, right? You logged in, something happened, all these devices ended up on your network. What actually happened is that public and private keys—you know, a private key was generated, the public keys were distributed properly, things are getting rotated, but you don’t have to care about all that stuff.

So, it’s fun that Tailscale is what we call secure by default, right? People love to use it because it’s easier, it makes their life easier, but security teams like it because actually, it changes the default security posture from, like, “Ugh, I’m going to have to tell everybody to please stop doing these five things because it always creates security holes,” to like, “Whoa, the thing that they’re going to do most naturally is actually going to be safe.” Right? I really like that about it. You’re not thinking about certificates, but their certificates are getting rotated exactly as they should be.

Corey: There’s just something so nice about computers doing the heavy lifting for us. It’s one of the weird things about Tailscale is it falls into a very strange spot where there is effectively zero maintenance burden on me, but I still use it to toggle it on or off in scenarios often enough to remember that it’s there and that I’m using it. It is the perfect sweet spot of being somewhat close to top of mind, but never in a sense that is, “Oh, I got to deal with this freaking thing again.” It never feels that way. Logging into it, it has long-lived sessions at the browser, so it isn’t one of those, ah, you have to go back to GitHub and re-authenticate and do all these other dog-and-pony show things. It just works. It is damn near a consumer-level of ease-of-use, start to finish. The hard part, of course, is how on earth you explain this to someone [laugh] without a background in this space.

Avery: Yeah, exactly. It’s something we ask ourselves sometimes is, like, well, you know, Tailscale is great for developers right now. It is easy enough to use, even for consumers, but, like, how would you explain it to consumers and find a good use case for consumers? And it’s something that I think we are going to do eventually, but it hasn’t been, up until now, a super high priority for us just because developers are this sort of like the core audience that we haven’t even finished building a great product that does everything that they want, yet. There is one little feature in Tailscale that’s the beginning of something that's consumer-friendly; it’s called Taildrop.

I don’t know if you’ve seen this one. You can turn it on, and basically, it acts like AirDrop in Apple products, except you don’t need to care about physical proximity and it works with every kind of device, not just Apple devices, right? So, you can add it as—it shows up in the share pane on your macOS or Windows or iOS device. You can use it from Linux, you just use it to send files of any type, and it sends them point to point, not through a cloud provider, so that we never see a copy of the file. It only goes between your devices over your encrypted network. So, that’s something that consumers kind of like.

Corey: Feels like Tailprint for Bonjour could wind up being another aspect of this as well. And I’m still hoping for something almost Ansible-like where run the following command, whether it’s pre-approved or not, on a following subset of things. In my case, for example, it’s, I would love it if it would just automatically, when I press the button, update Tailscale across all of the nodes that support it, namely the Linux boxes. I don’t think you can trigger an App Store update from within a sandboxed app on iOS, but I’ve been—

Avery: Right.

Corey: Surprised before. Yeah. But it’s nice to be able to do some things.

Avery: Yeah. This is one of those—yeah, we get that request a lot for, like, can you push a button to auto-update Tailscale? It makes me really sad that we get this request because the need for this is a sign that all of the OS vendors have completely botched software updates, right? Like, the OS should be the thing, updating your software on a good schedule based on a set of rules, and it shouldn’t be the job of every single application to provide their own software update. It’s actually a massive, embarrassing, security hole that software can even update itself, right?

Because if it can update itself, then you know, imagine someone breaks into the production services of a company that is offering a particular program. They put malware into a version of the software, they put it into the software update server, and then they trigger everything in the network to push the software update to those devices. Now, you’ve got malware installed on all your devices, right? It’s very strange that people asked for this as a feature. [laugh].

Tailscale currently does not have that feature; it doesn’t push software updates on its own. But it’s such a popular feature that I think we’re going to have to implement it because everybody wants this because Windows, for example, is simply just never going to automatically update your software for you. We have to have these weird super-admin rights on your machine so that we can push software updates because nobody else will. I feel really weird about that. You know, the security world should be protesting this more.

But instead, they’re like, asking, can you please put this feature in because I’ve got a checklist in my compliance thing that says, “Is all your software up-to-date?” I don’t have a checklist item that says, “Does any of my software have super-admin rights that they shouldn’t have?” Right? It’s sort of, I guess, the next level of supply-chain management is the big word. Nobody—there is no supply chain management for software.
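
The attack described above depends on an updater trusting whatever the update server hands it. One common, partial mitigation is to compare the downloaded package against a digest obtained through a separate channel. The Python sketch below is a general illustration, not something Tailscale is stated to do; the function names and the sample payloads are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def update_matches_pinned_digest(update_bytes: bytes, pinned_hex: str) -> bool:
    """Check a downloaded update against a digest obtained out of band.

    hmac.compare_digest does a constant-time comparison of the two hex strings.
    """
    return hmac.compare_digest(sha256_hex(update_bytes), pinned_hex.lower())


# Hypothetical payloads standing in for real installer files.
genuine = b"installer-bytes-for-release"
tampered = b"installer-bytes-for-release-plus-malware"
pinned = sha256_hex(genuine)  # in practice, published somewhere other than the update server

print(update_matches_pinned_digest(genuine, pinned))   # True
print(update_matches_pinned_digest(tampered, pinned))  # False
```

This only helps if the digest comes from a channel the attacker does not also control, and it does nothing about malware introduced before the release was built, which is why the underlying trust-your-vendor problem remains.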

Corey: There isn’t, for better or worse. I wish there were, but there simply is not. Ugh. Next year, maybe. We hope.

Avery: Yep. So, you have to trust your vendors, fundamentally, which I guess will always be true. That’s true for Tailscale as well, right? Whether or not we include the software update pushing. If you’re installing a VPN product provided by a vendor, you have to trust that we’re going to put the right stuff into the software.

And the best—the only thing I can really do is just be honest about these issues and say, “Well, look, we try our best. We definitely try not to implement features that are going to turn into security holes for you.” And I think we do a lot better than most vendors do in that area. But it’s very hard to be perfect because nobody knows how to do software supply chain well.

Corey: Ugh. I hear you. That’s the nice thing, too. Honestly, the big reason I know I need to update these things, and the reason I want to do it, is actually you. Because whenever I log in and look at my devices in the Tailscale thing, there’s a little icon next to the one that has an update available.

And you have fixed a lot of the niceties on this, like, ah, there’s an update available for the iOS version. It’s, “Really? Because it’s not available in the Apple Store yet,” as I sit there spamming the thing. That stopped happening. There’s a lot of just very nice quality-of-life improvements that are easy to miss.

Avery: Yep, yeah, that’s kind of weird. We actually went a little overboard on the update available notifications for a while because there’s always this trade-off, right? Like I said, we have a policy of never breaking old versions, so when people see the update available notification, they kind of panic. It’s, like, “Oh no, I better install the update, before Tailscale cuts me off.” And, like, well, we’re not actually ever going to cut you off, so you shouldn’t have to worry about that stuff.

But on the other hand, you’re not going to get the latest features and bug fixes unless you’re running the latest version, so when people email us saying, “Hey, I’m using Tailscale from six months ago, and I have this problem,” the first thing our support team does is say, “Well, can you please try the latest one, and does the problem go away?” Because it’s kind of inefficient debugging six-month-old software. So, one way we were trying to, like, minimize that cost is, like, hey, we could just tell people there’s a new version available and then maybe they’ll update it themselves. But that resulted in people panicking. Like, oh, no, I need to install the software really, really soon because I can’t afford to break my network.

Corey: Right.

Avery: And because our system is based on WireGuard and this is—you know, I’ll probably jinx it by saying this but, like, we’ve never had an actual security hole that we’ve had to issue a Tailscale update to resolve, right? People see the update available thing and, like, “Oh, no, I bet there’s a whole bunch of vulnerabilities that they fixed.” It’s like, “Well, no.” WireGuard has also never had a vulnerability, right? [laugh]. It’s… yeah, it’s, you know, sooner or later there probably will be one, and when there is one, we’ll probably have to make the, you know, update notification in red or something instead of just the little icon on the admin panel. But yeah, it’s—

Corey: [laugh].

Avery: —we try [crosstalk 00:35:23]—

Corey: Nice job on jinxing it, by the way, I appreciate that.

Avery: Yeah I know. I mean, I try to try my best. [laugh]. But I’ve actually been surprised. It’s very much like my experience with all the djb stuff we used in the past.

Like, when we were using qmail and djbdns for years, there was never once a security hole, right? It’s very interesting that it is possible to design software that never once has a security hole. And nobody does that, right? I mean, I would say I’m not as smart as djb; our software is probably, you know, not going to be as one hundred percent perfect as that, but we try really, really hard to aim for that as a goal.

Corey: Yeah. I really want to thank you for taking the time to speak with me about everything Tailscale is up to. And again, congratulations on your Series B. If people want to learn more, where should they go?

Avery: I guess, tailscale.com is the place. We also have @tailscale on Twitter. My own personal Twitter is @apenwarr, which you probably won’t be able to spell unless you Google for me or something—

Corey: But it’s in the [show notes 00:36:19], which makes this even easier.

Avery: It is? Ah, there you go. So yeah, there’s lots of information. But the number one thing I tell people is, like, look, it is a lot easier to get started than you think it is. Even after you’ve heard it 100 times, nobody ever believes how easy it is to get started. Just go to the App Store, download the app, log into your account, and you’re already done, right? Try that and you don’t even have to read anything.

Corey: I would tear you apart for that statement if it weren’t—if it were slightly less true than it is, but it is transformative. Give it a try. It’s a strong endorsement from me. Thank you so much for your time. I appreciate it.

Avery: Thank you, too. Great talking to you, and talk next time.

Corey: Indeed. Avery Pennarun, CEO of Tailscale. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you’ve enjoyed this show, please leave a five-star review on your podcast platform of choice, and smash the like and subscribe buttons, whereas if you’ve hated it, same thing—five-star review, smash the buttons—and also leave an angry bitter comment about how you are smart enough to roll your own crypto, so you don’t understand why other people wouldn’t do it.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
