Our Docker image was 8.2GB.
Every deploy took 18 minutes just to transfer the image.
AWS charged us $47 per deploy for data transfer.
We deployed 6 times a day.
That's $282/day. $8,460/month. Just to move a Docker image.
Here's how I got it to 127MB in one afternoon.
The Dockerfile That Cost Us $8,460/Month
This was our Dockerfile in March 2024:
```dockerfile
FROM node:18
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN apt-get install -y git
RUN apt-get install -y python3
RUN apt-get install -y build-essential
COPY package.json .
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
```

Looks innocent, right?
Image size: 8.2GB
Layers: 47
Deploy time: 18 minutes
AWS transfer cost: $47/deploy
Every single RUN command created a new layer.
Every layer got cached. Every layer got shipped.
We were shipping apt-get update cache, npm install cache, build artifacts, source files, node_modules for development AND production.
Everything.
The Wake-Up Call
Friday afternoon. 4:47 PM.
Critical bug in production. Users couldn't checkout.
Me: "I'll push a fix in 5 minutes."
Git push. CI/CD triggered.
4:52 PM — Build started
4:58 PM — Build complete
4:59 PM — Pushing image to ECR…
5:01 PM — 12% uploaded
5:04 PM — 28% uploaded
5:08 PM — 51% uploaded
CEO on Slack: "How long until this is fixed?"
Me: "Image is still uploading. Maybe 10 more minutes."
CEO: "It takes 10 minutes to deploy a bug fix?"
Me: "The image is 8.2GB."
CEO: "What the hell is in that image?"
Good question.
What Was Actually In That 8.2GB
I pulled the image locally and analyzed it.
```shell
docker history our-app:latest --no-trunc --human
```

Layer breakdown:
- Node.js base image: 900MB
- apt-get packages: 1.2GB (including Python, build tools we never used)
- npm install (all dependencies): 2.1GB
- Source files: 180MB
- Build artifacts: 340MB
- Cached apt-get lists: 890MB
- Old node_modules from previous builds: 1.8GB
- Random stuff we forgot about: 0.8GB
We were shipping:
- Development dependencies in production
- Build tools we only needed during build
- Source TypeScript files alongside compiled JavaScript
- Three different versions of node_modules (Docker layer caching gone wrong)
- Python and build-essential (never used in production)
Production runtime actually needed:
- Node.js runtime
- Compiled JavaScript
- Production dependencies
That's it.
The First Attempt: Multi-Stage Build
I rewrote the Dockerfile with multi-stage builds.
```dockerfile
# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# Full install: the build step needs devDependencies (e.g. the compiler)
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before copying node_modules to the runtime stage
RUN npm prune --omit=dev

# Stage 2: Runtime
FROM node:18-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

Result:
Image size: 1.2GB
Deploy time: 4 minutes
Better. Not good enough.
The Second Attempt: Alpine Linux
Switched to Alpine base image.
```dockerfile
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install: the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Keep only production dependencies for the runtime stage
RUN npm prune --omit=dev

# Stage 2: Runtime
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

Result:
Image size: 420MB
Deploy time: 90 seconds
Getting closer.
The Final Version: 127MB
The breakthrough: node_modules was still huge (340MB).
Most of it? Unused dependencies.
I audited every package:
```shell
npm ls --all > deps.txt
```

Found:
- lodash: Used 3 functions. Entire library: 24MB.
- moment.js: Used for date formatting. 67MB. (Native Intl does this now)
- aws-sdk: Imported the entire SDK. Only needed the S3 client. 89MB wasted.
- 47 other packages: Pulled in as transitive dependencies. Never used.
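Swapping moment out for the built-in Intl API is usually mechanical. A minimal sketch (the exact format string from our code isn't shown here; `formatDate` is an illustrative name), assuming a typical `moment(d).format('MMM D, YYYY')` call site:

```javascript
// Hypothetical replacement for moment(d).format('MMM D, YYYY'), using the
// built-in Intl API. Locale and time zone are pinned so the output is
// deterministic regardless of where the server runs.
const formatDate = (date) =>
  new Intl.DateTimeFormat('en-US', {
    year: 'numeric',
    month: 'short',
    day: 'numeric',
    timeZone: 'UTC',
  }).format(date);

console.log(formatDate(new Date(Date.UTC(2024, 2, 15)))); // "Mar 15, 2024"
```

Zero dependencies, and Node has shipped with full ICU data since v13, so locale output is consistent across environments.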
I replaced:
- lodash → wrote 3 utility functions (12 lines)
- moment → native Date and Intl
- aws-sdk → @aws-sdk/client-s3 (only S3)
- Removed unused dependencies
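The article doesn't name the three lodash functions, so this is only a guess at what a ~12-line replacement looks like; `pick`, `chunk`, and `uniqBy` are assumed call sites, not the actual ones:

```javascript
// Hand-rolled replacements for three assumed lodash call sites.

// pick: copy only the listed keys from an object
const pick = (obj, keys) =>
  Object.fromEntries(keys.filter((k) => k in obj).map((k) => [k, obj[k]]));

// chunk: split an array into fixed-size slices
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, i * size + size));

// uniqBy: keep the first element for each distinct key
const uniqBy = (arr, fn) => {
  const seen = new Set();
  return arr.filter((x) => {
    const key = fn(x);
    return seen.has(key) ? false : (seen.add(key), true);
  });
};
```

Twelve-odd lines you own and can read, instead of 24MB you ship on every deploy.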
New package.json: 12 dependencies instead of 87.
Final Dockerfile:
```dockerfile
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install (lifecycle scripts skipped); the build needs devDependencies
RUN npm ci --ignore-scripts
COPY . .
RUN npm run build
# Keep only production dependencies for the runtime stage
RUN npm prune --omit=dev

# Production stage
FROM node:18-alpine
RUN apk add --no-cache tini
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER node
EXPOSE 3000
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
```

Final result:
Image size: 127MB
Deploy time: 40 seconds
AWS transfer cost: $0.80/deploy
The Numbers
Before:
- Image size: 8.2GB
- Deploy time: 18 minutes
- Cost per deploy: $47
- Daily deploys: 6
- Monthly cost: $8,460
After:
- Image size: 127MB
- Deploy time: 40 seconds
- Cost per deploy: $0.80
- Daily deploys: 6
- Monthly cost: $144
Savings: $8,316/month
Same application. Same functionality. 64x smaller. 27x faster deploys.
What Actually Made the Difference
1. Multi-stage builds
Don't ship build tools to production.
Build stage: Install everything. Compile everything. Runtime stage: Copy only what's needed to run.
2. Alpine base image
node:18 = 900MB
node:18-alpine = 110MB
Same Node.js. Way smaller base.
3. Dependency audit
Most projects have 50–200 npm packages.
You probably use 10–20.
The rest? Transitive dependencies you never imported.
Run `npm ls --all`. Be horrified. Start removing.
4. Layer optimization
Every `RUN` command = new layer.
Bad:

```dockerfile
RUN apt-get update
RUN apt-get install curl
RUN apt-get install wget
```

Good:

```dockerfile
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    && rm -rf /var/lib/apt/lists/*
```

One layer. Cleaned up in the same command.
5. .dockerignore
We were copying everything.
```dockerfile
COPY . .
```

This included:
- node_modules (reinstalled anyway)
- .git (900MB of history)
- tests
- documentation
- .env files
- Random files
.dockerignore:

```
node_modules
.git
*.md
.env*
tests
coverage
.DS_Store
```

Saved 1.2GB right there.
The Mistakes I Made
Mistake 1: Chasing the wrong metric
First attempt optimized for "fewer Dockerfile lines."
Result: Unreadable. Still 3GB.
What mattered: Final image size. Not Dockerfile elegance.
Mistake 2: Not profiling first
I guessed what was big.
Should have run `docker history` first.
Wasted 2 hours optimizing the wrong things.
Mistake 3: Keeping "just in case" packages
"We might need Python later." "Build-essential could be useful." "Let's keep wget."
We didn't need any of it.
Add it when you need it. Not before.
What Happened After
Week 1: Deployed 47 times. Previous record: 12 times/week.
Why? Because deploys were fast now. Developers weren't afraid to deploy.
Month 1: Found 3 bugs we'd been living with for months.
Why? Fast deploys = faster iteration = faster debugging.
Month 3: Junior dev asked, "Why is our image so small?"
I showed him the old Dockerfile.
Him: "8.2GB?! How did this ever work?"
Me: "It didn't. That's why I fixed it."
For Your Dockerfile
Here's the checklist I use now:
1. Use Alpine base images
- `node:18-alpine`, not `node:18`
- `python:3.11-alpine`, not `python:3.11`
- Size difference: 700–900MB
2. Multi-stage builds
- Build stage: Install everything
- Runtime stage: Copy only runtime needs
- Don't ship compilers to production
3. Audit dependencies
- Run `npm ls --all` or `pip list`
- Remove unused packages
- Replace heavy packages with lightweight alternatives
4. Combine RUN commands
- One `RUN` command for related operations
- Clean up in the same layer
- `&& rm -rf /var/lib/apt/lists/*` after apt-get
5. .dockerignore
- Exclude: node_modules, .git, tests, docs
- Only copy what's needed to build
6. Order matters
- Copy package files first
- Install dependencies
- Copy source code last
- (Layers cache better this way)
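Item 6 in a minimal sketch (my illustration, not a Dockerfile from the article): the layers that change least come first, so a source edit only invalidates the layers below it.

```dockerfile
# Changes rarely: these two layers stay cached between builds
COPY package*.json ./
RUN npm ci
# Changes on every commit: keep it last so only these layers rebuild
COPY . .
RUN npm run build
```

With this ordering, `npm ci` re-runs only when package files change, not on every source edit.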
7. Don't install what you don't need
- "Just in case" packages cost GB
- Add them when needed, not before
📬 What I'm Building
I'm building ProdRescue AI — turns messy incident logs into clear postmortem reports in minutes.
Because I was tired of spending 8 hours writing reports for incidents that took 20 minutes to fix.
👉 Join the waitlist (2-min form)
Want to see real incident analysis? Check out this Black Friday payment system meltdown — actual production logs, AI-generated report:
📊 Black Friday SRE Case Study — Free case study: $360K revenue recovery breakdown with ultra-complex multi-region logs
Docker Production Resources
If you're wrestling with Docker in production, these helped me:
📚 Free Guides (Start Here):
🐳 Docker in Production Pack — Complete cheatsheet & troubleshooting guide. Covers image optimization, layer caching, multi-stage builds, and the exact techniques I used to shrink our image from 8.2GB to 127MB.
🔧 Kubernetes in Production Pack — Deployment, scaling & troubleshooting. Because after you fix your Docker image, you'll deploy it to K8s and discover a whole new set of problems.
📚 Paid Resources (Production Reality):
🚀 Backend Performance Rescue Kit — Find and fix the 20 bottlenecks killing your app. Includes container performance profiling, image size optimization strategies, and AWS cost reduction techniques.
🎯 Production Engineering Toolkit — Real production failures and how to prevent them. Features 7 Docker-related incidents including the "8GB image that cost $8K/month" case study.
Everything I've learned from production: devrimozcay.gumroad.com
Weekly: Real Production Engineering Stories
I write about Docker disasters, AWS cost explosions, and the messy reality of production systems every week.
Not the clean conference talk version. The 3 AM debugging version.
— That CEO question haunted me: "What the hell is in that image?" I didn't know. That's the problem. Most developers don't know what's in their Docker images. Run docker history on your image right now. You'll be surprised.
— After we fixed this, I checked our other services. Found 4 more images over 5GB. All the same mistakes. Fixed all of them in one day. Total savings: $23K/month.
— The junior dev who asked "Why is our image so small?" now maintains our Dockerfile standards. His first PR: a 200-line document on Docker best practices. It's pinned in our engineering channel. Sometimes asking "why" is the most valuable thing you can do.