Years ago, I founded a company that built iPhone apps. And those apps needed web services.
Back in 2009, that meant you ordered rack-mounted servers at Dell, carried them into your local data center for colocation, and managed them by hand or through automation tools like Ansible. I vividly remember spending nights restoring crashed servers and scouring web shops for the right kind of replacement power unit.
Then I was introduced to AWS.
I was blown away by its pervasive infrastructure-as-code mindset. Suddenly, spinning up a server was an API call. Restoring a backup was an API call. Deploying a new subnet was … well, you get the point. My software developer brain was overjoyed.
I’m still a big fan of infrastructure-as-code and Amazon’s API mandate. However, my relationship with AWS APIs has soured a bit. Years of intensive interaction have brought to the surface some details I wasn’t aware of when I started out. And one of those details has become supremely annoying: inconsistent APIs.
How inconsistency hurts developers — and the bottom line
Why bother ranting about inconsistent APIs? Why not accept them the way they are, read the docs, write the tests, and implement the APIs as they come?
It comes down to the developer experience. As a developer, I want to focus on the business problem I’m working to solve. That means everything else needs to operate without taking up conscious thought. I want my OS and IDE to be predictable, so they don’t get in my way. I want my keyboard shortcuts to always work, so I can blindly and efficiently write my code. And I want my APIs to be consistent, so I can build muscle memory to seamlessly integrate them.
If I need to look up the docs for every single API, or even just feel like I might have to look them up, it detracts from my actual work: solving business problems.
Inconsistent APIs cost more time to implement, and they increase the mental load on developers. But they might also reduce the quality of the applications using them. Worst case scenario, this can lead to a bug in production, with serious time and effort required to fix it. And that cost adds up.
Calling every single API reveals inconsistency in every layer
For a recent project, I collected a daily snapshot of every resource in every AWS account in my AWS Organization. Existing services like Config, Cost and Usage Reports, and Resource Groups don’t have full coverage (this frustrating fact is worthy of a blog post of its own). To collect my resources, I built a state machine that looped over all APIs and called every `List` and `Describe` API. I thought I’d get a consistently structured body of responses. Boy, was I wrong.
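The collection loop can be sketched roughly as follows. This is a hypothetical illustration, not my actual state machine: the hard-coded operation lists below are made-up samples (in practice you could pull the real operation names from botocore's service model).

```python
# Hypothetical sketch: pick out each service's read-only List*/Describe*
# operations before calling them. The operation lists are made-up samples.
SAMPLE_OPERATIONS = {
    "dynamodb": ["ListTables", "DescribeTable", "CreateTable", "DeleteTable"],
    "rds": ["DescribeDBInstances", "DescribeDBClusters", "ModifyDBInstance"],
}

def read_only_operations(operations):
    """Keep only the List*/Describe* calls, which should have no side effects."""
    return [op for op in operations if op.startswith(("List", "Describe"))]

inventory_calls = {
    service: read_only_operations(ops)
    for service, ops in SAMPLE_OPERATIONS.items()
}
```

The naive assumption baked into this loop is that every `List` and `Describe` response looks roughly the same. As the rest of this post shows, it doesn't.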
What I found was that there’s no consistency between different AWS services, between related AWS services, and even within a single service’s API. Let me give you an example of each.
Inconsistency between AWS services
There are many ways the various service APIs differ. It starts with the meaning of `List` and `Describe` calls. Logically, `ListSomething` would return a list of all resources of that type, and `Describe` would describe a single resource in detail. But there’s no such consistency.
IAM, for example, only has `List` operations: `ListRoles`, `ListUsers`, `ListPolicies`, and so on. These contain lists of resources, including their details.
Then there’s DynamoDB. It also has `List` operations, like `ListTables`, but this call only returns a list of plain table names. Nothing more. If you want to retrieve the details of those tables, you need to call `DescribeTable` for each of the results.
Next up: RDS. This service has no `List` operations at all! Instead, there are only `Describe` calls, such as `DescribeDBInstances`, `DescribeDBClusters`, and `DescribeDBSnapshots`. These calls return a list of resources, including all of their details. So functionally, they behave like the `List` operations in IAM.
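To make the three patterns concrete, here is a small sketch using trimmed sample payloads instead of live AWS calls. The field names (`Roles`, `TableNames`, `DBInstances`) match the real responses; the values are made up.

```python
# Trimmed sample responses illustrating the three shapes (values are made up).
iam_list_roles = {"Roles": [{"RoleName": "admin", "Arn": "arn:aws:iam::123456789012:role/admin"}]}
dynamodb_list_tables = {"TableNames": ["orders", "users"]}  # bare names only
rds_describe_instances = {"DBInstances": [{"DBInstanceIdentifier": "db1", "Engine": "postgres"}]}

def resource_names(response):
    """Extract resource names despite the three different layouts."""
    if "TableNames" in response:          # DynamoDB style: list of strings
        return response["TableNames"]
    for key in ("Roles", "DBInstances"):  # IAM / RDS style: list of objects
        if key in response:
            return [r.get("RoleName") or r.get("DBInstanceIdentifier")
                    for r in response[key]]
    raise KeyError("unrecognized response shape")
```

Three services, three layouts, and a dispatch function just to answer the question "what resources exist?"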
Inconsistency between related AWS services
Kinesis is a great example of inconsistent APIs within the same family of services. By looking at the APIs, you can tell the same people designed them or were at least inspired by one another. This is a trap that can lead you to falsely believe the APIs are consistent.
For example, Kinesis Data Streams (KDS) has `ListStreams`, Kinesis Data Firehose has `ListDeliveryStreams`, and Kinesis Data Analytics has `ListApplications`. These calls all follow the DynamoDB API design, where the resources’ names are returned in the `List` operation and you need to retrieve their details with a `Describe` call. Fine.
These `Describe` calls all return an ARN. Their respective fields are `StreamARN`, `DeliveryStreamARN`, and `ApplicationARN`. OK.
They also all return a name. Their fields are `StreamName`, `DeliveryStreamName`, and `ApplicationName`. Still good.
Then we get to the creation date. For KDS, this field is `StreamCreationTimestamp`, but for both Firehose and Analytics, it’s `CreateTimestamp`. They all return a Unix timestamp. Why does KDS have a different key?
When we include Kinesis Video Streams (KVS), it gets even worse. This service also has the `StreamARN` and `StreamName` fields, but you can get them through `ListStreams` and `DescribeStream`. And for KVS, the creation timestamp is stored under yet another name: `CreationTime`.
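In practice this means any code that treats the Kinesis family uniformly needs a per-service key map. A minimal sketch, using the key names listed above on made-up sample payloads:

```python
# Key names are those listed in the text; sample payloads are made up.
CREATION_KEYS = {
    "kinesis": "StreamCreationTimestamp",   # Kinesis Data Streams
    "firehose": "CreateTimestamp",          # Kinesis Data Firehose
    "kinesisanalytics": "CreateTimestamp",  # Kinesis Data Analytics
    "kinesisvideo": "CreationTime",         # Kinesis Video Streams
}

def creation_time(service, description):
    """Return the creation timestamp regardless of which key the service uses."""
    return description[CREATION_KEYS[service]]
```

Four services in the same family, three different names for the same Unix timestamp.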
Inconsistency within an AWS service
Inconsistency within a service is the worst type of inconsistency because, ostensibly, a single service is maintained by a single team.
It’s a real problem for developers. You can make a mental note that when talking to a different API, you might need to use a different syntax. But when various calls within the same service are inconsistent, you can never really rely on muscle memory, ever.
One of the more painful examples of this can be found working with REST APIs in API Gateway. This is a complex service, so it can be forgiven for having a complex service API. It has dozens of resources, including API Keys, Methods, Resources, Usage Plans, and Documentation Parts. Each of these resource types can be retrieved in singular form with a request like `GetAuthorizer` and in plural form with a request like `GetAuthorizers`. (Side note: That’s a whole new variation on the `Describe` and `List` standard.)
All of the list calls, including `GetApiKeys`, `GetAuthorizers`, `GetBasePathMappings`, and `GetRestApis`, return a JSON dictionary that has an `items` field containing the list of resources. It looks like this:

```json
{
  "items": [ <list of resources> ]
}
```
All of them do. Dozens of calls. They all return `items`. Except for `GetStages`, which returns the `item` key. What?

```json
{
  "item": [ <list of resources> ]
}
```
This `items` versus `item` example can lead to bigger issues. If you’ve seen 20 examples of an `items` response, you’d be forgiven for assuming the 21st also returns `items`. An automated check or test deployment might surface this bug, but it might also miss it. Fixing it after it’s live can cost significant time as well as your reputation.
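The defensive workaround is to accept both spellings. A minimal sketch, assuming only the two key names described above:

```python
# Sample responses are made up; the key names ("items"/"item") are the
# two variants described in the text.
def extract_items(response):
    """Return the resource list whether the call spelled the key
    "items" (most API Gateway calls) or "item" (GetStages)."""
    for key in ("items", "item"):
        if key in response:
            return response[key]
    return []
```

It's a two-line fix, but you only write it after the inconsistency has bitten you once.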
An AWS API is a promise — even if it’s a bad one
When AWS releases an API into general availability, it makes a promise. It says: You can use this API like this, and you will continue to be able to use this API exactly like this until the end of time.
And that’s a good thing! It builds trust with developers. It allows systems using those APIs to work into perpetuity as well. No maintenance, no forced migrations, no deprecations.
However, this also means that when AWS releases a bad or inconsistent API, it has no way of changing it. The API is here to stay, for better or worse.
Outside of AWS, we use API versioning to deal with bad or inconsistent APIs while keeping the API promise.
API versioning has been around for decades, and it’s even supported out of the box by Amazon API Gateway. You can use path-based versioning (`https://api.mydomain.com/v1/something`), query string versioning (`https://api.mydomain.com/something?v=1`), or HTTP headers. AWS APIs use none of these, yet we’ve seen API versioning in a few services.
For example, there’s KinesisAnalytics and KinesisAnalyticsV2. The documentation for the first states:
> This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only supports SQL applications. Version 2 of the API supports SQL and Java applications.
Apparently, the new Java features couldn’t be integrated into the API without breaking backwards compatibility. We can see why in the `StartApplication` call, which takes the following input in V1:

```python
response = client.start_application(
    ApplicationName='string',
    InputConfigurations=[
        {
            'Id': 'string',
            'InputStartingPositionConfiguration': {
                'InputStartingPosition': 'NOW'|'TRIM_HORIZON'|'LAST_STOPPED_POINT'
            }
        },
    ]
)
```
In V2, the call looks like this:

```python
response = client.start_application(
    ApplicationName='string',
    RunConfiguration={
        'FlinkRunConfiguration': {
            'AllowNonRestoredState': True|False
        },
        'SqlRunConfigurations': [
            {
                'InputId': 'string',
                'InputStartingPositionConfiguration': {
                    'InputStartingPosition': 'NOW'|'TRIM_HORIZON'|'LAST_STOPPED_POINT'
                }
            },
        ],
        'ApplicationRestoreConfiguration': {
            'ApplicationRestoreType': 'SKIP_RESTORE_FROM_SNAPSHOT'|'RESTORE_FROM_LATEST_SNAPSHOT'|'RESTORE_FROM_CUSTOM_SNAPSHOT',
            'SnapshotName': 'string'
        }
    }
)
```
Fair enough. Backwards compatibility was not possible, so AWS released a new version. Applications using the old API could continue to function as they did, and the promise was kept.
This raises the question: Are there more V2 APIs? Turns out, there are:
- ApiGatewayV2
- CloudHSMV2
- GreengrassV2
- WAFV2
- ElasticLoadBalancingv2 (note the inconsistent lowercase “v”)
But these aren’t necessarily new APIs on existing services. Rather, they’re actual new services with their own API.
- ApiGatewayV2 is the API for HTTP API Gateways
- CloudHSMV2 is a different service from CloudHSM
- ElasticLoadBalancingv2 covers Application Load Balancers and Network Load Balancers, while ElasticLoadBalancing is the API for Classic Load Balancers
It’s quite amazing on how many levels the AWS APIs can be inconsistent.
The top 4 worst service APIs I’ve worked with
Inconsistency is always a pain, but some service APIs are worse offenders than others. I’ve rounded up examples of the most inconsistent APIs I’ve encountered, starting with our favorite content delivery network.
Amazon CloudFront
The worst service API I’ve encountered so far is CloudFront. This API is inconsistent compared to other services, is inconsistent within its own domain, and returns responses in such a broken way that it should be considered a bug.
First, the method naming and response format. CloudFront has `Get`, `List`, and `Describe` calls. The `List` operations all return a dictionary with an `Items` key, which is a list:

```json
{
  "CachePolicyList": {
    "NextMarker": "string",
    "MaxItems": 123,
    "Quantity": 123,
    "Items": [
      {
        "Type": "managed"|"custom",
        "CachePolicy": { ... }
      }
    ]
  }
}
```
This is a divergence from every other API, all of which have the resources on the top level. For example, here’s S3 `ListBuckets`:

```json
{
  "Buckets": [
    {
      "Name": "string",
      "CreationDate": "string"
    },
  ],
  "Owner": {
    "DisplayName": "string",
    "ID": "string"
  }
}
```
Then there are the APIs within CloudFront itself. The service API offers different ways to get a list of distributions:
- `ListDistributions`
- `ListDistributionsByCachePolicyId`
- `ListDistributionsByKeyGroup`
- `ListDistributionsByOriginRequestPolicyId`
- `ListDistributionsByRealtimeLogConfig`
- `ListDistributionsByWebACLId`
From these APIs, `ListDistributions`, `ListDistributionsByRealtimeLogConfig`, and `ListDistributionsByWebACLId` return the object => list => object structure, as seen above. But `ListDistributionsByCachePolicyId`, `ListDistributionsByKeyGroup`, and `ListDistributionsByOriginRequestPolicyId` only return distribution identifiers in string format (object => list => string), like below:

```json
{
  "DistributionIdList": {
    "Marker": "string",
    "NextMarker": "string",
    "MaxItems": 123,
    "IsTruncated": True|False,
    "Quantity": 123,
    "Items": [
      "string",
    ]
  }
}
```
As if this isn’t enough, all of these calls return empty output when no resources are found. Not an empty list, not a dictionary stating `"Quantity": 0`, no, nothing, nada. Zero bytes. `Content-Length: 0`.
This is bad. This will break any parser because an empty string is not valid JSON. It also means that all of these calls have two different types of output: a dictionary or an empty string. As a developer, it means you have to inspect the result length before passing it on to the parser. You can’t even write a simple try-catch block because then you wouldn’t be able to differentiate between a failing API call and zero results. This is not only bad design, but once again, it’s inconsistent with all the other services that at least offer reliable output.
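The inspect-before-parsing dance looks something like this. A minimal sketch of the workaround, assuming the response body arrives as a string:

```python
import json

def parse_cloudfront_body(body):
    """Treat a zero-byte CloudFront response body as "no resources"
    instead of feeding it to the JSON parser, where it would raise
    a decode error."""
    if not body:          # Content-Length: 0 -> no resources found
        return {}
    return json.loads(body)  # anything else must be valid JSON
```

Note that the length check can't be replaced by a try-except around `json.loads`: that would also swallow genuinely malformed responses, which is exactly the failure-versus-zero-results ambiguity described above.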
The kicker is that because this bad API is a promise, it’s guaranteed to stay bad forever.
Amazon WorkSpaces
The response keys for all `Describe` calls in the WorkSpaces API are in CamelCase, like this:

```json
{
  "Images": [
    {
      "Created": number,
      "Description": "string",
      "ErrorCode": "string",
      "ErrorMessage": "string",
      "ImageId": "string",
      "Name": "string",
      "OperatingSystem": {
        "Type": "string"
      },
      "OwnerAccountId": "string",
      "RequiredTenancy": "string",
      "State": "string"
    }
  ],
  "NextToken": "string"
}
```
Except `DescribeIpGroups`, which uses lowerCamelCase. Additionally, it uses `Result` as the result key, while all other calls have a field like `Images`, `Workspaces`, or `Bundles`.

```json
{
  "NextToken": "string",
  "Result": [
    {
      "groupDesc": "string",
      "groupId": "string",
      "groupName": "string",
      "userRules": [
        {
          "ipRule": "string",
          "ruleDesc": "string"
        }
      ]
    }
  ]
}
```
The `DescribeWorkspaceBundles` call has a `CreationTime` field, for `DescribeWorkspaceImages` it’s `Created`, and the `DescribeWorkspaces` call returns no launch or creation time at all.
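Papering over the `DescribeIpGroups` oddity means rewriting its keys into the CamelCase convention the other calls use. A hedged sketch of one way to do that, tested only against the made-up sample below:

```python
def upper_camel(key):
    """groupId -> GroupId; keys that are already CamelCase pass through."""
    return key[0].upper() + key[1:] if key else key

def normalize(obj):
    """Recursively rewrite dict keys so lowerCamelCase output matches
    the CamelCase convention of the other WorkSpaces calls."""
    if isinstance(obj, dict):
        return {upper_camel(k): normalize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize(v) for v in obj]
    return obj
```

It's glue code that exists for no reason other than one call disagreeing with its siblings.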
Amazon Redshift
The following Redshift calls have these strings as the top level key in the response dictionary:
- `DescribeClusterParameterGroups`: `ParameterGroups`
- `DescribeClusters`: `Clusters`
- `DescribeClusterSnapshots`: `Snapshots`
So `DescribeClusterSecurityGroups` should return `SecurityGroups`, right? No. It’s `ClusterSecurityGroups`.
Amazon Cognito
Cognito has always been “different,” and its API is no exception. Where every other API provides sensible defaults for optional parameters like `MaxResults`, Cognito forces you to supply `MaxResults` for some calls: `ListUserPools` and `ListResourceServers`. Cognito wouldn’t be Cognito if those values were consistent; the range for `MaxResults` is 1-60 for `ListUserPools` and 1-50 for `ListResourceServers`.
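So instead of one shared page size, you end up clamping per operation. A small sketch using the two ranges quoted above:

```python
# The (min, max) ranges are the MaxResults limits quoted in the text.
MAX_RESULTS_RANGE = {
    "ListUserPools": (1, 60),
    "ListResourceServers": (1, 50),
}

def page_size(operation, preferred=100):
    """Clamp a preferred page size into the range Cognito accepts
    for this specific operation."""
    low, high = MAX_RESULTS_RANGE[operation]
    return max(low, min(preferred, high))
```

A required parameter with per-call limits: two inconsistencies stacked on top of each other.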
The best AWS service API
But it’s not all doom and gloom. There are a few APIs that are consistent and have very thoughtful designs. I specifically want to call out Amazon FSx, which has the following JSON response keys:
- `DescribeBackups`: `Backups`
- `DescribeDataRepositoryTasks`: `DataRepositoryTasks`
- `DescribeFileSystems`: `FileSystems`
Very consistent and predictable!
The details of these calls all contain a `CreationTime` and `ResourceARN` field, as well as an identifier, which is found respectively under `BackupId`, `TaskId`, and `FileSystemId`. Also consistent and predictable. Thank you, FSx team!
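This kind of consistency is what makes generic code possible: for FSx, the response key is simply the operation name minus its `Describe` prefix. A sketch on made-up sample data:

```python
def response_key(operation):
    """For FSx, the response key is the operation name without "Describe"."""
    return operation[len("Describe"):]  # DescribeBackups -> Backups

# Made-up sample data standing in for live FSx responses.
sample = {
    "DescribeBackups": {"Backups": [{"BackupId": "b-1", "CreationTime": 1}]},
    "DescribeFileSystems": {"FileSystems": [{"FileSystemId": "fs-1", "CreationTime": 2}]},
}
resources = {op: body[response_key(op)] for op, body in sample.items()}
```

No per-operation special cases, no key maps. If every service worked like this, most of the glue code in this post wouldn't need to exist.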
API inconsistencies are fixable, if AWS chooses to fix them. It probably won’t.
Building services and innovating at Amazon’s scale is hard. Really hard. AWS has impressively solved this problem with its two-pizza team approach, which allows service teams to autonomously design and implement their solutions. But while these isolated development teams can go fast, it’s obviously a trade-off with consistency. We see that in the AWS console, in CloudFormation definitions and coverage, and in the APIs.
I don’t believe this problem is unsolvable.
AWS could have a central API review board, it could have consistent guidelines, it could have API versioning, it could design its APIs in the open, it could ask for partner feedback under NDA, it could first deliver APIs internally and have another team build a consistent interface on top of that … the list goes on. Solutions galore, but I’m not holding my breath.
Despite the problems that inconsistent APIs create for AWS customers, the burden will continue to be on us developers for the foreseeable future.