“The cloud” may be a nebulous thing, a concept rather than a piece of hardware. The term broadly refers to data and services that are accessed via the internet and hosted on shared hardware in a third-party data center. Companies that offer these services are called “cloud providers.”
“Cloud infrastructure” is the means by which you build your own systems and applications in the cloud. It encompasses all of the underlying tools and services that your applications and workloads run on top of. At its most basic, cloud infrastructure mimics the familiar components of conventional IT systems: servers (referred to as “compute resources” in cloud parlance), storage, databases, and networking components. But as cloud providers have matured, they’ve also developed tools and concepts specific to the cloud.
To understand the appeal of cloud infrastructure, let’s turn back the clock to 2006. Back in those primitive years, organizations bought their own computing and networking hardware. Even relatively small companies owned rack-mounted servers, network storage drives, routers, load balancers, firewalls — all of the physical components necessary to build a functioning IT system. We call that “on-premises” or “on-prem” infrastructure.
In the enlightened 2020s, we may wonder why small companies back then would want to own a bunch of expensive, soon-to-be outdated computing and networking hardware. They usually didn’t. But at the time, there weren’t many alternatives. If you wanted email, file sharing, databases and all the benefits of modern software systems, you needed physical infrastructure to power it. That pretty much meant buying and operating the components yourself.
Buying equipment took time and money; setup and operation of the equipment required dedicated staff or contractors or both; and security and maintenance of the equipment necessitated many other expenses (like industrial air conditioning, power, locks, fire suppression, alarms, etc.). This meant that companies were investing significant amounts of capital and effort to build internal capabilities for something that had little to do with their core business models.
Virtualization and the birth of the cloud
It used to be that “infrastructure” meant “on-prem” infrastructure. But in the aughts, virtualization technology matured rapidly, paving the way for the modern commercial cloud.
A typical computer hosts a single operating system, and the available CPU (central processing unit), RAM (random-access memory), storage, and networking capacity all map directly to physical properties of the hardware. If you want more disk space for your on-prem infrastructure, you need to install more or bigger hard disks. If you want more CPU power, you need to upgrade the physical processors. Virtualization changes that paradigm, allowing multiple operating systems to run on a single hardware host and to share its physical resources. In this scenario, resources are divided among guest operating systems. One guest operating system may be assigned, say, 20% of the hardware’s available CPU, while another guest receives 80%. This allocation can be changed at any time by updating a setting in the virtualization software. In some cases, the total combined resources assigned to the guests can even exceed the actual physical resources of the hardware, a practice known as overcommitment.
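To make that resource-sharing idea concrete, here is a minimal, purely illustrative Python sketch of a host whose guests have been promised more than it physically has. The class names, guest names, and numbers are hypothetical and don’t correspond to any real hypervisor’s API; the point is only to show how allocations can add up past the physical limits.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Guest:
    name: str
    vcpus: int   # virtual CPUs promised to this guest
    ram_gb: int  # RAM promised to this guest

@dataclass
class Host:
    cpus: int          # physical CPU cores
    ram_gb: int        # physical RAM
    guests: List[Guest]

    def allocated(self):
        """Total resources promised to guests, which may exceed what physically exists."""
        return (sum(g.vcpus for g in self.guests),
                sum(g.ram_gb for g in self.guests))

host = Host(cpus=16, ram_gb=64, guests=[
    Guest("web-1", vcpus=4, ram_gb=16),
    Guest("web-2", vcpus=4, ram_gb=16),
    Guest("batch", vcpus=12, ram_gb=48),  # this one pushes the host past its physical limits
])

vcpus, ram = host.allocated()
print(f"Guests are promised {vcpus} vCPUs of {host.cpus} cores and {ram} GB of {host.ram_gb} GB RAM")
if vcpus > host.cpus or ram > host.ram_gb:
    print("The host is overcommitted; this works only because guests rarely peak at the same time.")
```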
In 2000, Amazon began to break its monolithic applications into decentralized services. To help Amazon engineers build services more quickly and cheaply, the company used virtualization to create flexible and efficient computing environments running on shared commodity hardware. Amazon soon realized that other companies might be willing to pay for this kind of virtualized infrastructure, and in 2006 it began offering “cloud” computing as a service under the name Amazon Web Services (AWS).
Moving infrastructure to the cloud
In the early days of the public cloud, providers focused on three core services: compute (i.e., virtual servers), databases, and storage. But before long, customers wanted to move more of their IT infrastructure to the cloud or to have entirely cloud-native systems. Networking, queues, caches, logging, monitoring, DNS — a wide variety of components go into building an IT system, and cloud providers rushed to offer them all as services. Most anything that is possible on-prem can now be created with cloud-native services.
But cloud providers have gone beyond just mimicking on-prem concepts with virtualization — they now offer services that abstract some common functionality even further, blurring the boundaries between infrastructure, software and services. Secrets Manager, for example, is AWS’s secrets-management service. You can use it to store and retrieve sensitive information, like credentials and access tokens, but you never have to think about the hardware or the software that it runs on. It’s a pure service: You just expect it to do whatever it promises. But you may still think of it as a component of your cloud infrastructure. More and more cloud infrastructure components are being built this way: not just as hardware or software in someone else’s data center, but as completely self-contained and abstract services. And as these services mature and become more provider-specific, the definition of “cloud infrastructure” grows increasingly nuanced.
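For a sense of how little infrastructure you actually touch, here is a minimal sketch of retrieving a secret with boto3, AWS’s Python SDK. It assumes boto3 is installed, AWS credentials are configured, and a secret with the hypothetical name prod/db/credentials already exists; everything beneath this call is the provider’s concern.

```python
import boto3

# Create a Secrets Manager client; there are no servers for you to provision or patch.
client = boto3.client("secretsmanager", region_name="us-east-1")

# "prod/db/credentials" is a hypothetical secret name used here for illustration.
response = client.get_secret_value(SecretId="prod/db/credentials")

# The secret's value comes back as a string (often JSON) for your application to use.
db_credentials = response["SecretString"]
print("Retrieved a secret of length", len(db_credentials))
```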
Who uses cloud infrastructure and why?
There are a lot of good reasons to build systems using cloud infrastructure:
- Zero upfront cost. Since you’re not buying hardware, there is no capital investment. This, on its own, is a major advantage of cloud infrastructure. Upfront cost had been a major barrier to entry for a lot of companies, but now anyone with a credit card can instantly begin using enterprise-grade infrastructure.
- Pay-as-you-go billing. Because cloud components are virtualized, they can be created and destroyed in seconds. For the most part, cloud infrastructure is billed by time, storage, and throughput. This fine-grained billing doesn’t guarantee affordability, but it makes the cost of running a given service much more flexible. When something isn’t being used, it can often be scaled down, shut off, or destroyed in an automated way, as sketched in the example after this list.
- Agility and flexibility. Beyond the basic flexibility of scaling, cloud infrastructure also helps facilitate the quick adoption of new technologies. As newer and better compute resources and services become available, you can test and integrate them without having to make long-term investments. Many services even offer a free tier or trial period, so initial usage costs nothing.
- Resilience. All major cloud providers offer services in multiple regions and zones, making it possible to build systems that can tolerate failures in any single zone, or even multiple zones. And all providers offer backup and restoration solutions that are relatively simple; see the cross-region copy example after this list. This resilience is much more difficult and expensive to build in an on-prem system. Take this from someone who regularly had to take tape backups home in case the building burned down overnight!
- Abstract services. As I mentioned, cloud providers offer a growing number of managed services, allowing you to run, for example, a database without needing to be an expert in the minutiae of administering databases. More and more, things that used to require very specific technical knowledge are now provided as services, allowing organizations to focus on the functionality of their systems rather than the nuts and bolts of the infrastructure.
- Reduced operational burden. Cloud infrastructure involves a shared responsibility model: The cloud provider is responsible for the operation, maintenance and security of the cloud, and you are responsible for what you do in the cloud. If the data center floods, that’s the cloud provider’s responsibility. If someone downloads a bunch of confidential data from your unsecured storage bucket, that’s your responsibility. This is a major point: Many compliance frameworks have specific requirements for the physical security of your hardware, and with a cloud provider you don’t need to worry about any of that.
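To make the automated scale-down from the pay-as-you-go item concrete, here is a minimal boto3 sketch that stops any running EC2 instances tagged as development machines, the kind of job you might schedule for evenings and weekends. The environment tag key and dev value are assumptions; substitute whatever tagging convention your team uses.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find running instances tagged environment=dev (a hypothetical tagging convention).
response = ec2.describe_instances(
    Filters=[
        {"Name": "tag:environment", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

instance_ids = [
    instance["InstanceId"]
    for reservation in response["Reservations"]
    for instance in reservation["Instances"]
]

# Stop them; billing for the compute time stops with them.
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print("Stopped:", instance_ids)
else:
    print("Nothing to stop.")
```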
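And to illustrate the resilience item, here is a hedged sketch that copies an EBS snapshot from one AWS region to another with boto3, so a backup survives even a regional outage. The snapshot ID is a placeholder; the sketch assumes that snapshot already exists in us-east-1.

```python
import boto3

# Work from the destination region; the copy is pulled from the source region.
ec2_west = boto3.client("ec2", region_name="us-west-2")

# "snap-0123456789abcdef0" is a placeholder for an existing snapshot in us-east-1.
copy = ec2_west.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Description="Cross-region copy of nightly backup",
)

print("Started copy:", copy["SnapshotId"])
```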
These benefits apply to all customers, but they strongly favor startups and small businesses. Organizations that don’t have a lot of money, expertise or time can build systems relatively quickly and cheaply in the cloud. This has been a factor in the explosion of startups over the past 15 years. And again, this is not limited to technology companies: All companies require IT infrastructure of some kind. Combined with purely web-based offerings like Google Workspace (formerly G Suite), cloud infrastructure providers have made it much cheaper and simpler for small businesses to build out IT solutions.
But enterprise-scale businesses also benefit from cloud infrastructure, and after years of hesitation, many are now migrating to full or hybrid cloud models. This has been a slow process; enterprises move slowly in general, and they have been especially anxious about regulatory requirements, security, performance, cost of switching, vendor lock-in and many other practical concerns. Keep in mind, too, that a large organization may have sunk a lot of time and money into building out traditional on-prem IT capabilities, and multiple internal groups remain committed to protecting those investments. Traditional IT divisions have faced, in some ways, an existential crisis. But it’s hard to deny the benefits of cloud infrastructure, and large-scale organizations will continue migrating to the cloud and building out hybrid solutions that combine on-prem infrastructure and cloud components.
Why not use cloud infrastructure?
While cloud infrastructure looks good on paper, there are a few reasons that it may not be a good fit:
- Cost of switching. If you have an established system, it can be painful and time-consuming to move to a new one. This is true no matter where you’re moving to or from, and for larger systems this can be a major hurdle.
- Pricing does not fit your use case. In some cases, it may be cheaper to host systems or run workloads yourself. AWS, for example, charges nontrivial fees for data transfer, and depending on what you’re trying to achieve, you may pay less running your workloads in-house.
- Lack of control. The shared responsibility model reduces your day-to-day operational burden, but it also means that if underlying cloud services go down, there’s little that you can do about it. Take it from someone who survived the Great S3 Outage of 2017! There are ways that you can architect for fault tolerance, but they uniformly require extra expense, and striking the right balance can be hard.
- Security. I hesitate to mention this, because I think that security is an issue no matter where your infrastructure lives. But using cloud infrastructure does expose you to new, cloud-specific vulnerabilities that require expertise to mitigate.
- Lock-in. I don’t think this issue is unique to the cloud; I personally have not worked with any system of substance that was easy to migrate to a comparable provider. But as cloud providers offer more proprietary services, customers will become increasingly wedded to the conventions of the provider.
Cloud infrastructure providers
There are a number of cloud infrastructure providers, and the top three players are all outgrowths of larger tech conglomerates.
AWS is the oldest, and by far the largest. Exact numbers are hard to come by, but AWS currently holds about a third of the cloud computing market and offers more than 200 discrete products and services. So many services, in fact, that AWS struggles to come up with decent names for them all. AWS also has the largest geographic reach of the major providers, and it seems the most focused on migrating enterprise data and workloads to the cloud. It offers plenty of tools and incentives to lure you into the cloud.
Microsoft Azure is the second-largest provider, with around a fifth of the market. Azure focuses on integration with Microsoft products and services, which is attractive to organizations that already have a heavy investment in tools like Office 365 and Teams. These tend to be enterprise organizations that value predictability and consistency across their tech stacks. But it’s also possible to run Linux and other open-source tools on the Azure platform. Azure has seen steady growth in recent years, though direct comparisons to AWS are difficult since the two report growth differently. In any case, Azure is a serious player and will likely continue to gain market share.
A distant third in the space is Google Cloud Platform, with around one-tenth of the market. Google Cloud Platform’s strategy, beyond interoperability with other Google products, is to leverage Google’s reputation for big-data expertise and to emphasize open-source solutions and Kubernetes infrastructure.
There are many more companies in this market, but they’re all small by comparison. Any new contender would have a long way to go to catch up in terms of service offerings, global reach, service and support, and brand recognition. That’s a big task. It’s possible we’ll see a new cloud infrastructure giant at some point, but the top three seem positioned to dominate the market for the near future.