
Docker 29 Broke Our AWS ECR Deploys: The Missing Permission Nobody Told You About

By Jeff Wray

This morning, we did two deployments 15 minutes apart. The first one passed. The second one failed with a cryptic 403 Forbidden error during docker push to Amazon ECR. Nothing in our code had changed. Here's what happened and how we fixed it.

UPDATE - February 13, 2026

An AWS engineer reached out with the actual root cause: the 403 Forbidden occurs when the caller lacks the ecr:BatchGetImage permission. Docker 29's containerd image store makes HEAD requests that require this permission, which older Docker versions didn't need.

To debug ECR permission failures, check AWS CloudTrail for the specific permission denial: the failure codes there will tell you exactly which permission is missing.
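A minimal sketch of that CloudTrail check with the AWS CLI. It assumes the repository lives in us-east-1 (override with AWS_REGION), that GNU date is available for the default start time, and that a denied ECR call carries an error code containing the text AccessDenied; the exact code may differ, which is why it matches a substring. The function name ecr_access_denials and the AWS_CLI override variable are hypothetical, added only so the command can be stubbed in tests. CloudTrail can take several minutes to show new events.

```shell
# Sketch: list recent ECR API calls that CloudTrail recorded as access denied.
# Assumptions (not from the original post): default region us-east-1, GNU date,
# and an error code that contains the substring "AccessDenied".
# AWS_CLI is a hypothetical override so the aws command can be stubbed in tests.

ecr_access_denials() {
  # Optional first argument: ISO 8601 start time (defaults to two hours ago).
  start_time=${1:-$(date -u -d '2 hours ago' +%Y-%m-%dT%H:%M:%SZ)}
  # JMESPath filter runs client-side over the raw CloudTrailEvent JSON string
  # and prints time, API call, and caller for each matching event.
  "${AWS_CLI:-aws}" cloudtrail lookup-events \
    --region "${AWS_REGION:-us-east-1}" \
    --lookup-attributes AttributeKey=EventSource,AttributeValue=ecr.amazonaws.com \
    --start-time "$start_time" \
    --query "Events[?contains(CloudTrailEvent, 'AccessDenied')].[EventTime, EventName, Username]" \
    --output text
}
```

Run it right after a failed push. If the diagnosis in this post applies to you, you would expect a row whose event name is BatchGetImage.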

The fix: Add ecr:BatchGetImage to your IAM policy. If you can't modify IAM permissions quickly (or want a workaround that doesn't require IAM changes), the crane-based solution below still works by bypassing the problematic manifest check entirely.
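As a sketch of what that IAM change could look like, the function below prints an inline policy for pushing to a single repository. Only ecr:BatchGetImage comes from this post; the other push actions are a typical set assumed for illustration, not copied from Vapor's documented policy. The function name ecr_push_policy and the repository name my-app are hypothetical.

```shell
# Sketch: print an IAM policy that lets a CI role push to one ECR repository,
# including ecr:BatchGetImage for Docker 29's manifest HEAD request.
# The action list besides BatchGetImage is an assumed typical push set.

ecr_push_policy() {
  # $1 = repository ARN, e.g. arn:aws:ecr:us-east-1:123456789012:repository/my-app
  repo_arn=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EcrAuthToken",
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Sid": "EcrPush",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
        "ecr:BatchGetImage"
      ],
      "Resource": "$repo_arn"
    }
  ]
}
EOF
}
```

One way to apply it is aws iam put-role-policy --role-name GitHubActionsRole --policy-name ecr-push --policy-document "$(ecr_push_policy arn:aws:ecr:us-east-1:123456789012:repository/my-app)". If your role already has a managed policy that grants push access, adding the single ecr:BatchGetImage action there is the smaller change.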

Why this post exists: CI/CD breaks between deploys with a cryptic 403, and you start pulling every thread. Docker version changed, manifest format changed, and apparently permissions too. A lot of moving pieces landing at once with no clear signal pointing to IAM.

GitHub announced the runner update, but it was buried in release notes — no banner in Actions, no heads-up where developers actually work. Then you've got this whole chain to debug: GitHub Actions → Docker → Laravel Vapor → ECR → IAM, where each layer can mask where the actual problem lives. Modern infrastructure has multiple choke points and failure modes that look identical from the outside.

For context: Laravel Vapor is a serverless deployment platform that sits on top of AWS Lambda and deploys Docker images to ECR. As of this writing, even Vapor's recommended IAM policy doesn't include the newly required permission yet.

Even opening an AWS support case didn't yield quick results. The goal was to get unblocked and move on, and the crane workaround accomplished that. Now we know the cleaner fix: add the missing IAM permission.

The Symptom

Our GitHub Actions workflow builds a Docker image and pushes it to Amazon ECR. Between those two deploys, the push step started failing:

unexpected status from HEAD request to
https://ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/v2/REPO/manifests/IMAGE_TAG:
403 Forbidden

Every layer pushed successfully. The failure only happened at the final manifest step — a HEAD request to check if the manifest already existed returned 403 instead of the expected 404 (not found) or 200 (exists).
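The three outcomes of that HEAD request can be written down as a small lookup, which is handy for annotating CI logs. A minimal sketch; the function name explain_manifest_status is hypothetical, and the 403 message encodes this post's diagnosis rather than an official ECR error mapping (other misconfigurations can also produce a 403).

```shell
# Sketch: map the status of the manifest HEAD request to a likely cause.
# The 403 hint reflects this post's diagnosis (missing ecr:BatchGetImage).

explain_manifest_status() {
  case "$1" in
    200) echo "200: manifest already exists for this tag" ;;
    404) echo "404: manifest not found, expected for a new tag" ;;
    403) echo "403: forbidden, caller likely lacks ecr:BatchGetImage on this repository" ;;
    *)   echo "$1: unexpected status, inspect the registry response" ;;
  esac
}
```

For the failing log line above, explain_manifest_status 403 points straight at the permission rather than at the manifest format.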

The Root Cause

GitHub Actions recently updated its ubuntu-latest runner image, upgrading Docker from 28.0.4 to 29.1.5 (actions/runner-images#13474).

Docker 29 made a significant change: containerd is now the default image store. This means Docker 29 stores and pushes images using OCI (Open Container Initiative) manifest format instead of the legacy Docker v2 Schema 2 format. This change is real, but it is not why the push fails.
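To see which side of that change a runner is on, you can look for the containerd snapshotter in docker info output and name the media type of a manifest. A sketch under stated assumptions: it relies on docker info mentioning io.containerd.snapshotter when the containerd image store is enabled, and the DOCKER override variable plus both function names are hypothetical. The mediaType string could come from, for example, crane manifest IMAGE or docker buildx imagetools inspect --raw IMAGE.

```shell
# Sketch: detect the containerd image store and name a manifest media type.
# DOCKER is a hypothetical override so the docker CLI can be stubbed in tests.

uses_containerd_store() {
  # With the containerd image store, docker info lists
  # "driver-type: io.containerd.snapshotter.v1" under the storage driver.
  "${DOCKER:-docker}" info 2>/dev/null | grep -q 'io.containerd.snapshotter'
}

manifest_format() {
  # $1 = the "mediaType" value from a raw manifest
  case "$1" in
    application/vnd.oci.image.index.v1+json) echo "OCI image index" ;;
    application/vnd.oci.image.manifest.v1+json) echo "OCI image manifest" ;;
    application/vnd.docker.distribution.manifest.list.v2+json) echo "Docker v2 manifest list" ;;
    application/vnd.docker.distribution.manifest.v2+json) echo "Docker v2 Schema 2 manifest" ;;
    *) echo "unknown media type: $1" ;;
  esac
}
```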

The problem? Docker 29's containerd makes a HEAD request to check if the manifest exists before pushing. This request requires ecr:BatchGetImage permission — a permission that older Docker versions didn't need for the push operation. Without it, ECR returns 403 Forbidden.
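You can repeat that HEAD request by hand to confirm the diagnosis outside of Docker. This is a sketch, assuming ECR accepts HTTP Basic auth with the user AWS and the token from aws ecr get-login-password (the same credentials docker login uses). The CURL and AWS_CLI override variables and the function name are hypothetical. Note that --user puts the token on curl's command line, so avoid running this on shared machines.

```shell
# Sketch: send the manifest HEAD request manually and print only the HTTP status.
# CURL and AWS_CLI are hypothetical overrides used for stubbing in tests.

manifest_head_status() {
  # $1 = registry host, $2 = repository, $3 = tag, $4 = region (default us-east-1)
  registry=$1 repo=$2 tag=$3 region=${4:-us-east-1}
  token=$("${AWS_CLI:-aws}" ecr get-login-password --region "$region") || return 1
  # --head sends a HEAD request; --write-out prints just the status code.
  "${CURL:-curl}" --silent --output /dev/null --head \
    --write-out '%{http_code}' \
    --user "AWS:$token" \
    --header 'Accept: application/vnd.oci.image.index.v1+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json' \
    "https://$registry/v2/$repo/manifests/$tag"
}
```

If the explanation above holds, running manifest_head_status 123456789012.dkr.ecr.us-east-1.amazonaws.com my-app latest (my-app being a placeholder) under a role without ecr:BatchGetImage should print 403, and 200 or 404 once the permission is added.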

The simplest fix: Add ecr:BatchGetImage to your IAM role/policy. See the update above for details.
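After changing the policy, the IAM policy simulator can tell you whether the role is allowed to call ecr:BatchGetImage before you spend a deploy on it. A sketch; the function names and the AWS_CLI override are hypothetical, and the simulator result is a strong hint rather than proof (for example, the repository policy is only considered if you pass it with --resource-policy).

```shell
# Sketch: check with the IAM policy simulator that a role may call
# ecr:BatchGetImage on a repository. AWS_CLI is a hypothetical test override.

check_batch_get_image() {
  # $1 = role ARN, $2 = repository ARN; prints allowed, implicitDeny or explicitDeny
  "${AWS_CLI:-aws}" iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names ecr:BatchGetImage \
    --resource-arns "$2" \
    --query 'EvaluationResults[0].EvalDecision' \
    --output text
}

require_batch_get_image() {
  decision=$(check_batch_get_image "$1" "$2") || return 1
  if [ "$decision" = "allowed" ]; then
    echo "ecr:BatchGetImage allowed"
  else
    echo "ecr:BatchGetImage not allowed ($decision)" >&2
    return 1
  fi
}
```

Calling require_batch_get_image with the role ARN from your workflow and the repository ARN as an early CI step makes a missing permission fail fast with a readable message instead of a 403 at the end of the push.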

This is tracked in moby/moby#51532 and discussed in actions/runner-images#13474.

What We Tried (Before We Knew It Was Permissions)

We initially assumed this was a manifest format issue. These attempts all failed because we were solving the wrong problem. Documenting them here so you don't waste time on the same dead ends.

Attempt 1: Pin Docker Client to v27

We downloaded the Docker 27 static binary and placed it ahead of Docker 29 in $PATH:

- name: Install Docker 27 client
  run: |
    curl -sL https://download.docker.com/linux/static/stable/x86_64/docker-27.5.1.tgz | tar xz
    cp docker/docker /usr/local/bin/docker

Result: Failed. The Docker client was v27, but the daemon was still v29 with containerd. The daemon controls the push, not the client. You can't fix a server-side behavior by swapping the client binary.
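A quick way to see the client/daemon split on a runner is to print both versions. A minimal sketch; docker version --format with .Client.Version and .Server.Version is standard, while the function names and the DOCKER override variable are hypothetical.

```shell
# Sketch: compare the Docker client and daemon versions and warn when the
# daemon is Docker 29 or newer. DOCKER is a hypothetical test override.

docker_versions() {
  # Prints "CLIENT SERVER", e.g. "27.5.1 29.1.5"
  "${DOCKER:-docker}" version --format '{{.Client.Version}} {{.Server.Version}}'
}

check_daemon() {
  versions=$(docker_versions) || return 1
  client=${versions% *}
  server=${versions#* }
  echo "client $client, daemon $server"
  major=${server%%.*}
  if [ "$major" -ge 29 ]; then
    echo "daemon is Docker $major: push behavior follows the daemon, not the client" >&2
  fi
}
```

With the Docker 27 client pinned on a Docker 29 runner, this would print client 27.5.1, daemon 29.1.5 plus the warning, which is exactly the mismatch that made the attempt fail.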

Attempt 2: --platform linux/amd64 Flag

Docker 29 introduced a --platform flag for docker push that forces a single-platform push instead of a multi-platform manifest index:

- name: Wrap docker push
  run: |
    REAL_DOCKER=$(which docker)
    cat > /usr/local/bin/docker <<WRAPPER
    #!/bin/sh
    if [ "\$1" = "push" ]; then
      shift
      exec $REAL_DOCKER push --platform linux/amd64 "\$@"
    fi
    exec $REAL_DOCKER "\$@"
    WRAPPER
    chmod +x /usr/local/bin/docker

Result: Failed. This changes which manifest gets pushed, but doesn't affect the HEAD request that requires ecr:BatchGetImage. Still 403.

Attempt 3: Build-time Flags

We added several environment variables and build options:

env:
  BUILDX_NO_DEFAULT_ATTESTATIONS: 1
  DOCKER_DEFAULT_PLATFORM: linux/amd64

# On the build command:
--provenance=false

Result: These are useful for keeping the image clean (no attestation layers), but they don't change the manifest format used during push. Still 403.

The Workaround: Bypass Docker Push with Crane

Our fix at the time was to stop using docker push entirely. Instead, we used crane, a lightweight Go tool for interacting with container registries.

The approach:

  1. docker save exports the image as a standard Docker v2 tarball
  2. crane push sends that tarball to the registry using Docker v2 Schema 2 manifest format

Here's the workflow step:

- name: Wrap Docker push for ECR OCI compatibility
  run: |
    # Docker 29 (containerd image store) push gets 403 from ECR when the role lacks ecr:BatchGetImage
    # Bypass docker push entirely: save image to tarball, push with crane
    CRANE_VERSION=v0.20.3
    curl -sL "https://github.com/google/go-containerregistry/releases/download/${CRANE_VERSION}/go-containerregistry_Linux_x86_64.tar.gz" \
      | tar xz -C /usr/local/bin crane
    echo "crane $(crane version)"

    REAL_DOCKER=$(which docker)
    cat > /usr/local/bin/docker <<WRAPPER
    #!/bin/sh
    if [ "\$1" = "push" ]; then
      shift
      # Parse image reference (skip flags such as --platform, -q, -a)
      IMAGE=""
      for arg in "\$@"; do
        case "\$arg" in
          -*) ;;
          *) IMAGE="\$arg" ;;
        esac
      done
      echo "=== crane push workaround for Docker 29 / ECR OCI ==="
      echo "Image: \$IMAGE"
      TMPTAR=\$(mktemp /tmp/image-XXXXXX.tar)
      $REAL_DOCKER save "\$IMAGE" -o "\$TMPTAR" && crane push "\$TMPTAR" "\$IMAGE"
      RC=\$?
      rm -f "\$TMPTAR"
      exit \$RC
    else
      exec $REAL_DOCKER "\$@"
    fi
    WRAPPER
    chmod +x /usr/local/bin/docker
    echo "Docker wrapper with crane push installed."
  working-directory: .
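The wrapper's loop keeps the last argument that does not look like a flag as the image. That works for the usual docker push IMAGE call, but if a value-taking flag such as --platform comes after the image, its value (linux/amd64) would be picked instead. A slightly stricter parser, as a sketch; the function name push_image_ref is hypothetical, and treating only --platform as value-taking is an assumption to extend for your Docker version.

```shell
# Sketch: pick the image reference out of "docker push" arguments, skipping
# the value of flags that take one. Only --platform is handled (assumption).

push_image_ref() {
  image=""
  skip_next=0
  for arg in "$@"; do
    if [ "$skip_next" -eq 1 ]; then
      skip_next=0          # this argument is the value of the previous flag
      continue
    fi
    case "$arg" in
      --platform) skip_next=1 ;;  # value follows as the next argument
      -*) ;;                      # other flags: --platform=VALUE, -q, -a, ...
      *) image=$arg ;;
    esac
  done
  printf '%s\n' "$image"
}
```

Inside the wrapper, IMAGE could then be set from this function's output instead of the inline loop; the rest of the wrapper stays the same.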

Why This Works

  • docker save produces a tarball containing the classic Docker archive manifest (manifest.json), regardless of the storage backend. Even with containerd, that is the file crane reads to reconstruct the image.
  • crane push reads that tarball and pushes it to the registry using Docker v2 Schema 2 media types — the format ECR has supported for years.
  • The wrapper transparently intercepts docker push commands while passing all other Docker commands (build, login, tag, etc.) through to the real Docker binary.
  • crane reads Docker's credential store (~/.docker/config.json), so ECR authentication works automatically after docker login.

Why a Wrapper Script?

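If you control the push command directly, you could just call crane push yourself. But many deployment tools (Laravel Vapor, Terraform, Pulumi, etc.) push images for you. The wrapper lets you fix the push behavior without modifying the deployment tool. It only catches tools that invoke the docker binary found on $PATH, though; a tool that talks to the Docker Engine API directly bypasses it.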
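Because the wrapper works purely through $PATH ordering, it is worth checking that docker actually resolves to it before the deployment tool runs. A minimal sketch; the function name wrapper_active is hypothetical, the default path /usr/local/bin/docker is where the wrapper step writes its script, and the marker text is taken from the wrapper's own echo line.

```shell
# Sketch: confirm that "docker" on PATH resolves to the crane wrapper.
# $1 = expected wrapper path (default /usr/local/bin/docker).

wrapper_active() {
  expected=${1:-/usr/local/bin/docker}
  resolved=$(command -v docker) || return 1
  # Must be the expected path and contain the wrapper's echo marker.
  [ "$resolved" = "$expected" ] && grep -q 'crane push workaround' "$resolved"
}
```

Running wrapper_active || exit 1 as the last line of the wrapper step turns a silently bypassed wrapper into an explicit CI failure.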

GitHub Actions ECR Authentication

This workaround assumes you're already authenticating to ECR in your GitHub Actions workflow. The standard setup uses the aws-actions/amazon-ecr-login action, which populates Docker's credential store:

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
    aws-region: us-east-1

- name: Login to Amazon ECR
  uses: aws-actions/amazon-ecr-login@v2

The ECR login action writes credentials to ~/.docker/config.json, which crane reads automatically. No additional authentication setup is needed — the wrapper just intercepts docker push and uses the same credentials that were already configured.
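When debugging outside GitHub Actions, the same login can be done by hand with the AWS CLI. This is a sketch of roughly what the login action achieves, not its actual implementation; the function names and the AWS_CLI and DOCKER override variables are hypothetical.

```shell
# Sketch: log Docker in to ECR with the AWS CLI, outside GitHub Actions.
# AWS_CLI and DOCKER are hypothetical overrides for stubbing in tests.

ecr_registry_host() {
  # $1 = AWS account ID, $2 = region
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_login() {
  # $1 = AWS account ID, $2 = region (default us-east-1)
  region=${2:-us-east-1}
  # The token goes through stdin, so it never appears on docker's command line.
  "${AWS_CLI:-aws}" ecr get-login-password --region "$region" \
    | "${DOCKER:-docker}" login --username AWS --password-stdin "$(ecr_registry_host "$1" "$region")"
}
```

After ecr_login 123456789012, docker push and crane push both pick up the stored credentials, just as they do after the login action.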

Key Takeaways

1. Docker 29 requires new IAM permissions for ECR

The switch to containerd changes how Docker checks manifests before pushing: the HEAD request now requires the ecr:BatchGetImage permission, which older versions didn't need. If you push to ECR from GitHub Actions, you may be affected without warning when your runner image updates.

2. The 403 error is a permissions issue

Docker 29's containerd requires the ecr:BatchGetImage permission for the manifest HEAD request. Check CloudTrail for the specific denial. Add this permission to your IAM policy, or use the crane workaround above.

3. Client-side fixes don't work

The Docker daemon controls the push, not the client binary. Swapping the CLI doesn't change the daemon's behavior.

4. --platform doesn't change the manifest format

It selects a single platform from a multi-platform index, but the manifest is still OCI format. That is true but irrelevant here: the issue is permissions, not manifest format.

5. crane is a reliable escape hatch

When Docker's push behavior doesn't work with your registry, crane gives you explicit control over what gets pushed and in what format.

Timeline

Time     | Event
9:00 AM  | First deploy succeeds (Docker 28.0.4 on runner)
9:15 AM  | Second deploy fails with 403 Forbidden (runner image updated to Docker 29.1.5 between builds)
+1 hour  | Traced the break to the Docker 29 containerd change (the missing ecr:BatchGetImage permission was identified later); Docker 27 client pin and --platform flag attempted, both failed
+2 hours | crane push workaround deployed and confirmed working

When Can You Remove the Workaround?

Immediately — if you add ecr:BatchGetImage to your IAM policy.

The crane workaround was developed before we knew the root cause. Now that we know it's a permissions issue, the fix is straightforward: update your IAM role to include the permission Docker 29's containerd needs.

If you're using the crane workaround and want to remove it:

  1. Add ecr:BatchGetImage to your ECR push IAM policy
  2. Remove the crane wrapper script from your CI/CD
  3. Test a push with native docker push
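On GitHub-hosted runners each job starts on a fresh VM, so deleting the workflow step is enough for step 2. On self-hosted runners the wrapper file can outlive the job; a guarded cleanup, as a sketch (the function name is hypothetical and the marker text comes from the wrapper's echo line), only deletes the file if it really is the crane wrapper, so a real docker binary is never removed.

```shell
# Sketch: remove the crane wrapper only if the file contains its marker text.
# $1 = wrapper path (default /usr/local/bin/docker).

remove_crane_wrapper() {
  wrapper=${1:-/usr/local/bin/docker}
  if [ -f "$wrapper" ] && grep -q 'crane push workaround' "$wrapper"; then
    rm -f "$wrapper"
    echo "removed $wrapper"
  else
    echo "no crane wrapper at $wrapper"
  fi
}
```

After the cleanup, command -v docker should point back at the runner's own binary before you run the native docker push test from step 3.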
