Docker for Node.js: The Production-Ready Setup No One Talks About
Multi-stage builds, non-root users, health checks, secrets management, and image-size optimization. The Docker patterns I use for every production Node.js deployment.
Most Node.js Dockerfiles in production are bad. Not "slightly suboptimal" bad. I mean running as root, shipping 600MB images with devDependencies baked in, no health checks, and secrets hardcoded in environment variables that anyone with docker inspect can read.
I know because I wrote those Dockerfiles. For years. They worked, so I never questioned them. Then one day a security audit flagged our container running as PID 1 root with write access to the entire filesystem, and I realized that "works" and "production-ready" are very different bars.
This is the Docker setup I now use for every Node.js project. It's not theoretical. It runs the services behind this site and several others I maintain. Every pattern here exists because I either got burned by the alternative or watched someone else get burned.
Why Your Current Dockerfile is Probably Wrong#
Let me guess what your Dockerfile looks like:
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "server.js"]
This is the "hello world" of Dockerfiles. It works. It also has at least five problems that will hurt you in production.
Running as Root#
By default, the node Docker image runs as root. That means your application process has root privileges inside the container. If someone exploits a vulnerability in your app — a path traversal bug, an SSRF, a dependency with a backdoor — they have root access to the container filesystem, can modify binaries, install packages, and potentially escalate further depending on your container runtime configuration.
"But containers are isolated!" Partially. Container escapes are real. CVE-2024-21626, CVE-2019-5736 — these are real-world container breakouts. Running as non-root is a defense-in-depth measure. It costs nothing and it closes an entire class of attacks.
Installing devDependencies in Production#
npm install without flags installs everything. Your test runners, linters, build tools, type checkers — all sitting in your production image. This bloats your image by hundreds of megabytes and increases your attack surface. Every additional package is another potential vulnerability that Trivy or Snyk will flag.
COPY Everything#
COPY . . copies your entire project directory into the image. That includes .git (which can be enormous), .env files (which contain secrets), node_modules (which you're about to reinstall anyway), test files, documentation, CI configs — everything.
No Health Checks#
Without a HEALTHCHECK instruction, Docker has no idea whether your application is actually serving traffic. The process could be running but deadlocked, out of memory, or stuck in an infinite loop. Docker will report the container as "running" because the process hasn't exited. Your load balancer keeps sending traffic to a zombie container.
No Layer Caching Strategy#
Copying everything before installing dependencies means that changing a single line of source code invalidates the npm install cache. Every build reinstalls all dependencies from scratch. On a project with heavy dependencies, that's 2-3 minutes of wasted time per build.
Let's fix all of this.
Multi-Stage Builds: The Single Biggest Win#
Multi-stage builds are the most impactful change you can make to a Node.js Dockerfile. The concept is simple: use one stage to build your application, then copy only the artifacts you need into a clean, minimal final image.
Here's the difference in practice:
# Single stage: ~600MB
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]
# Multi-stage: ~150MB
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine AS runner
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
CMD ["node", "dist/server.js"]
The builder stage has everything: full Node.js, npm, build tools, source code, devDependencies. The runner stage has only what's needed at runtime. The builder stage is discarded entirely — it doesn't end up in the final image.
Real Size Comparisons#
I measured these on an actual Express.js API project with about 40 dependencies:
| Approach | Image Size |
|---|---|
| node:20 + npm install | 1.1 GB |
| node:20-slim + npm install | 420 MB |
| node:20-alpine + npm ci | 280 MB |
| Multi-stage + alpine + production deps only | 150 MB |
| Multi-stage + alpine + pruned deps | 95 MB |
That's a 10x reduction from the naive approach. Smaller images mean faster pulls, faster deployments, and less attack surface.
Why Alpine?#
Alpine Linux uses musl libc instead of glibc, and it doesn't include a package manager cache, documentation, or most utilities you'd find in a standard Linux distribution. The base node:20-alpine image is about 50MB compared to 350MB for node:20-slim and over 1GB for the full node:20.
The tradeoff is that some npm packages with native bindings (like bcrypt, sharp, canvas) need to be compiled against musl. In most cases this just works — npm will download the correct prebuilt binary. If you hit issues, you can install build dependencies in the builder stage:
FROM node:20-alpine AS builder
RUN apk add --no-cache python3 make g++
# ... rest of build
These build tools only exist in the builder stage. They're not in your final image.
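If you're unsure whether a given environment is glibc or musl (useful when debugging a native-module failure), Node can report it. A small sketch; `detectLibc` is a helper name of my own, it relies on `process.report` (Node 14+, Linux), where `glibcVersionRuntime` is only present on glibc systems:

```javascript
// Rough check for which libc the current Node binary is running against.
// Linux only: on glibc systems the diagnostic report includes
// glibcVersionRuntime; on musl (Alpine) it does not.
function detectLibc() {
  const report = process.report.getReport();
  return report.header.glibcVersionRuntime ? "glibc" : "musl";
}

console.log(`Running on ${detectLibc()}`);
```

Running this inside your Alpine image should print `musl`, which tells you whether npm needs to fetch (or compile) musl-specific binaries.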
The Complete Production Dockerfile#
Here's the Dockerfile I use as a starting point for every Node.js project. Every line is intentional.
# ============================================
# Stage 1: Install dependencies
# ============================================
FROM node:20-alpine AS deps
# Security: create a working directory before anything else
WORKDIR /app
# Install dependencies based on lockfile
# Copy ONLY package files first — this is critical for layer caching
COPY package.json package-lock.json ./
# ci is better than install: it's faster, stricter, and reproducible
# --omit=dev excludes devDependencies from this stage
RUN npm ci --omit=dev
# ============================================
# Stage 2: Build the application
# ============================================
FROM node:20-alpine AS builder
WORKDIR /app
# Copy package files and install ALL dependencies (including dev)
COPY package.json package-lock.json ./
RUN npm ci
# NOW copy source code — changes here don't invalidate the npm ci cache
COPY . .
# Build the application (TypeScript compile, Next.js build, etc.)
RUN npm run build
# ============================================
# Stage 3: Production runner
# ============================================
FROM node:20-alpine AS runner
# Add labels for image metadata
LABEL maintainer="your-email@example.com"
LABEL org.opencontainers.image.source="https://github.com/yourorg/yourrepo"
# Security: install dumb-init for proper PID 1 signal handling
RUN apk add --no-cache dumb-init
# Security: set NODE_ENV before anything else
ENV NODE_ENV=production
# Security: use non-root user
# The node image already includes a 'node' user (uid 1000)
USER node
# Create app directory owned by node user
WORKDIR /app
# Copy production dependencies from deps stage
COPY --from=deps --chown=node:node /app/node_modules ./node_modules
# Copy package.json from the deps stage (node reads it at runtime for metadata and module type)
COPY --from=deps --chown=node:node /app/package.json ./
# Copy built application from builder stage
COPY --from=builder --chown=node:node /app/dist ./dist
# Expose the port (documentation only — doesn't publish it)
EXPOSE 3000
# Health check: curl isn't available in alpine, use node
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
# Use dumb-init as PID 1 to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Start the application
CMD ["node", "dist/server.js"]
Let me explain the parts that aren't obvious.
Why Three Stages Instead of Two?#
The deps stage installs only production dependencies. The builder stage installs everything (including devDependencies) and builds the app. The runner stage copies production deps from deps and built code from builder.
Why not install production deps in the builder stage? Because the builder stage has devDependencies mixed in. You'd have to run npm prune --production after the build, which is slower and less reliable than having a clean production dependency install.
Why dumb-init?#
When you run node server.js in a container, Node.js becomes PID 1. PID 1 has special behavior in Linux: it doesn't receive default signal handlers. If you send SIGTERM to the container (which is what docker stop does), Node.js as PID 1 won't handle it by default. Docker waits 10 seconds, then sends SIGKILL, which immediately terminates the process without any cleanup — no graceful shutdown, no closing database connections, no finishing in-flight requests.
dumb-init acts as PID 1 and properly forwards signals to your application. Your Node.js process receives SIGTERM as expected and can shut down gracefully:
// server.js
const server = app.listen(3000);
process.on('SIGTERM', () => {
console.log('SIGTERM received, shutting down gracefully');
server.close(() => {
console.log('HTTP server closed');
// Close database connections, flush logs, etc.
process.exit(0);
});
});
An alternative is the --init flag in docker run, but baking it into the image means it works regardless of how the container is started.
The .dockerignore File#
This is just as important as the Dockerfile itself. Without it, COPY . . sends everything to the Docker daemon:
# .dockerignore
node_modules
npm-debug.log*
.git
.gitignore
.env
.env.*
!.env.example
Dockerfile
docker-compose*.yml
.dockerignore
README.md
LICENSE
.github
.vscode
.idea
coverage
.nyc_output
*.test.ts
*.test.js
*.spec.ts
*.spec.js
__tests__
test
tests
docs
.husky
.eslintrc*
.prettierrc*
tsconfig.json
jest.config.*
vitest.config.*
Every file in .dockerignore is a file that won't be sent to the build context, won't end up in your image, and won't invalidate your layer cache when changed.
Layer Caching: Stop Waiting 3 Minutes Per Build#
Docker builds images in layers. Each instruction creates a layer. If a layer hasn't changed, Docker uses the cached version. But here's the critical detail: if a layer changes, all subsequent layers are invalidated.
This is why the order of instructions matters enormously.
The Wrong Order#
COPY . .
RUN npm ci
Every time you change any file — a single line in a single source file — Docker sees that the COPY . . layer changed. It invalidates that layer and everything after it, including npm ci. You reinstall all dependencies on every code change.
The Right Order#
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
Now npm ci only runs when package.json or package-lock.json changes. If you only changed source code, Docker reuses the cached npm ci layer. On a project with 500+ dependencies, this saves 60-120 seconds per build.
Cache Mount for npm#
Docker BuildKit supports cache mounts that persist the npm cache between builds:
RUN --mount=type=cache,target=/root/.npm \
npm ci --omit=dev
This keeps the npm download cache across builds. If a dependency was already downloaded in a previous build, npm uses the cached version instead of downloading it again. This is especially useful in CI where you're building frequently.
To use BuildKit, set the environment variable:
DOCKER_BUILDKIT=1 docker build -t myapp .
Or add to your Docker daemon configuration:
{
"features": {
"buildkit": true
}
}
Using ARG for Cache Busting#
Sometimes you need to force a layer to rebuild. For example, if you're pulling a latest tag from a registry and want to ensure you get the newest version:
ARG CACHE_BUST=1
RUN npm ci
Build with a unique value to bust the cache:
docker build --build-arg CACHE_BUST=$(date +%s) -t myapp .
Use this sparingly. The whole point of caching is speed — only bust the cache when you have a reason.
Secrets Management: Stop Putting Secrets in Your Dockerfile#
This is one of the most common and dangerous mistakes. I see it constantly:
# NEVER DO THIS
ENV DATABASE_URL=postgres://user:password@db:5432/myapp
ENV API_KEY=sk-live-abc123def456
Environment variables set with ENV in a Dockerfile are baked into the image. Anyone who pulls the image can see them with docker inspect or docker history. They're also visible in every layer after they're set. Even if you unset them later, they exist in the layer history.
The Three Levels of Secrets#
1. Build-time secrets (Docker BuildKit)
If you need secrets during the build (like a private npm registry token), use BuildKit's --secret flag:
# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Mount the secret at build time — it's never stored in the image
RUN --mount=type=secret,id=npmrc,target=/app/.npmrc \
npm ci
COPY . .
RUN npm run build
Build with:
docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .
The .npmrc file is available during the RUN command but is never committed to any image layer. It doesn't appear in docker history or docker inspect.
2. Runtime secrets via environment variables
For secrets your application needs at runtime, pass them when starting the container:
docker run -d \
-e DATABASE_URL="postgres://user:pass@db:5432/myapp" \
-e API_KEY="sk-live-abc123" \
myapp
Or with an env file:
docker run -d --env-file .env.production myapp
These are visible via docker inspect on the running container, but they're not baked into the image. Anyone who pulls the image doesn't get the secrets.
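Since runtime secrets arrive through the environment, it's worth failing fast at startup when one is missing, rather than crashing later on the first real request. A minimal sketch; the variable names and the validateEnv helper are examples of mine, not from any library:

```javascript
// Fail fast at container startup if required runtime config is missing.
// REQUIRED_VARS and validateEnv are example names, not a standard API.
const REQUIRED_VARS = ["DATABASE_URL", "API_KEY"];

function validateEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Crashing here makes the container exit immediately,
    // which is far easier to diagnose than a mid-request failure.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Return only the declared variables as a plain config object
  return Object.fromEntries(REQUIRED_VARS.map((name) => [name, env[name]]));
}
```

Call it once at the top of server.js, before the app starts listening.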
3. Docker secrets (Swarm / Kubernetes)
For proper secret management in orchestrated environments:
# docker-compose.yml (Swarm mode)
version: "3.8"
services:
api:
image: myapp:latest
secrets:
- db_password
- api_key
secrets:
db_password:
external: true
api_key:
external: true
Docker mounts secrets as files at /run/secrets/<secret_name>. Your application reads them from the filesystem:
import { readFileSync } from "fs";
function getSecret(name) {
try {
return readFileSync(`/run/secrets/${name}`, "utf8").trim();
} catch {
// Fall back to environment variable for local development
return process.env[name.toUpperCase()];
}
}
const dbPassword = getSecret("db_password");
This is the most secure approach because secrets never appear in environment variables, process listings, or container inspection output.
.env Files and Docker#
Never include .env files in your Docker image. Your .dockerignore should exclude them (which is why we listed .env and .env.* earlier). For local development with docker-compose, mount them at runtime:
services:
api:
env_file:
- .env.local
Health Checks: Let Docker Know Your App is Actually Working#
A health check tells Docker whether your application is functioning correctly. Without one, Docker only knows if the process is running — not if it's actually able to handle requests.
The HEALTHCHECK Instruction#
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
Let me break down the parameters:
- --interval=30s: Check every 30 seconds
- --timeout=10s: If the check takes longer than 10 seconds, consider it failed
- --start-period=40s: Give the app 40 seconds to start before counting failures
- --retries=3: Mark unhealthy after 3 consecutive failures
Why Not Use curl?#
Alpine doesn't include curl by default. You could install it (apk add --no-cache curl), but that adds another binary to your minimal image. Using Node.js directly means zero additional dependencies.
For even lighter health checks, you can use a dedicated script:
// healthcheck.js
const http = require("http");
const options = {
hostname: "localhost",
port: 3000,
path: "/health",
timeout: 5000,
};
const req = http.request(options, (res) => {
process.exit(res.statusCode === 200 ? 0 : 1);
});
req.on("error", () => process.exit(1));
req.on("timeout", () => {
req.destroy();
process.exit(1);
});
req.end();
COPY --chown=node:node healthcheck.js ./
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD ["node", "healthcheck.js"]
The Health Endpoint#
Your application needs a /health endpoint for the check to hit. Don't just return 200 — actually verify your app is healthy:
app.get("/health", async (req, res) => {
const checks = {
uptime: process.uptime(),
timestamp: Date.now(),
status: "ok",
};
try {
// Check database connection
await db.query("SELECT 1");
checks.database = "connected";
} catch (err) {
checks.database = "disconnected";
checks.status = "degraded";
}
try {
// Check Redis connection
await redis.ping();
checks.redis = "connected";
} catch (err) {
checks.redis = "disconnected";
checks.status = "degraded";
}
const statusCode = checks.status === "ok" ? 200 : 503;
res.status(statusCode).json(checks);
});
A "degraded" status with a 503 tells the orchestrator to stop routing traffic to this instance while it recovers, but doesn't necessarily trigger a restart.
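One subtlety: if the database hangs instead of erroring, `await db.query("SELECT 1")` can stall the /health handler past the HEALTHCHECK timeout. A small guard helps; `withTimeout` is a helper name of my choosing, not an Express or Docker API:

```javascript
// Race a dependency check against a deadline so a hung connection
// fails the health check cleanly instead of stalling it.
function withTimeout(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} check timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

In the handler you would write `await withTimeout(db.query("SELECT 1"), 2000, "database")` and treat a rejection the same as a "disconnected" result.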
Why Health Checks Matter for Orchestrators#
Docker Swarm, Kubernetes, and even plain docker-compose with restart: always use health checks to make decisions:
- Load balancers stop sending traffic to unhealthy containers
- Rolling updates wait for the new container to be healthy before stopping the old one
- Orchestrators can restart containers that become unhealthy
- Deployment pipelines can verify a deployment succeeded
Without health checks, a rolling deployment might kill the old container before the new one is ready, causing downtime.
docker-compose for Development#
Your development environment should be as close to production as possible, but with the convenience of hot reload, debuggers, and instant feedback. Here's the docker-compose setup I use for development:
# docker-compose.dev.yml
services:
app:
build:
context: .
dockerfile: Dockerfile.dev
args:
NODE_VERSION: "20"
ports:
- "3000:3000"
- "9229:9229" # Node.js debugger
volumes:
# Mount source code for hot reload
- .:/app
# Anonymous volume to preserve node_modules from the image
# This prevents the host's node_modules from overriding the container's
- /app/node_modules
environment:
- NODE_ENV=development
- DATABASE_URL=postgres://postgres:devpassword@db:5432/myapp_dev
- REDIS_URL=redis://redis:6379
env_file:
- .env.local
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
command: npm run dev
db:
image: postgres:16-alpine
ports:
- "5432:5432"
environment:
POSTGRES_DB: myapp_dev
POSTGRES_USER: postgres
POSTGRES_PASSWORD: devpassword
volumes:
# Named volume for persistent data across container restarts
- pgdata:/var/lib/postgresql/data
# Initialization scripts
- ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redisdata:/data
command: redis-server --appendonly yes
# Optional: database admin UI
adminer:
image: adminer:latest
ports:
- "8080:8080"
depends_on:
- db
volumes:
pgdata:
redisdata:
Key Development Patterns#
Volume mounts for hot reload: The .:/app volume mount maps your local source code into the container. When you save a file, the change is immediately visible inside the container. Combined with a dev server that watches for changes (like nodemon or tsx --watch), you get instant feedback.
The node_modules trick: The anonymous volume - /app/node_modules ensures the container uses its own node_modules (installed during the image build) rather than your host's node_modules. This is critical because native modules compiled on macOS won't work inside a Linux container.
Service dependencies: depends_on with condition: service_healthy ensures the database is actually ready before your app tries to connect. Without the health check condition, depends_on only waits for the container to start — not for the service inside it to be ready.
Named volumes: pgdata and redisdata persist across container restarts. Without named volumes, you'd lose your database every time you run docker-compose down.
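Even with condition: service_healthy, the database can restart while the app is already running, so I also retry the initial connection inside the app. A sketch, assuming connect is whatever async function opens your client (pg, ioredis, and so on):

```javascript
// Retry an async connect function with a fixed delay between attempts.
// Works with any client whose connect call throws until the service is up.
async function connectWithRetry(connect, { retries = 5, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === retries) throw err; // out of attempts, propagate
      console.warn(`Connect attempt ${attempt} failed, retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Usage would look like `const db = await connectWithRetry(() => pool.connect())`, keeping startup resilient to a slow or restarting database.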
The Development Dockerfile#
Your development Dockerfile is simpler than production:
# Dockerfile.dev
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine
WORKDIR /app
# Install all dependencies (including devDependencies)
COPY package*.json ./
RUN npm ci
# Source code is mounted via volume, not copied
# But we still need it for the initial build
COPY . .
EXPOSE 3000 9229
CMD ["npm", "run", "dev"]
No multi-stage build, no production optimization. The goal is fast iteration, not small images.
Production Docker Compose#
Production docker-compose is a different beast. Here's what I use:
# docker-compose.prod.yml
services:
app:
image: ghcr.io/yourorg/myapp:${TAG:-latest}
restart: unless-stopped
ports:
- "3000:3000"
environment:
- NODE_ENV=production
env_file:
- .env.production
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
reservations:
cpus: "0.25"
memory: 128M
replicas: 2
healthcheck:
test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1))"]
interval: 30s
timeout: 10s
start_period: 40s
retries: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
networks:
- internal
- web
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
db:
image: postgres:16-alpine
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: myapp
POSTGRES_USER: ${DB_USER}
POSTGRES_PASSWORD: ${DB_PASSWORD}
deploy:
resources:
limits:
cpus: "1.0"
memory: 1G
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
interval: 10s
timeout: 5s
retries: 5
networks:
- internal
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
redis:
image: redis:7-alpine
restart: unless-stopped
command: >
redis-server
--appendonly yes
--maxmemory 256mb
--maxmemory-policy allkeys-lru
volumes:
- redisdata:/data
deploy:
resources:
limits:
cpus: "0.5"
memory: 512M
networks:
- internal
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "3"
volumes:
pgdata:
driver: local
redisdata:
driver: local
networks:
internal:
driver: bridge
web:
external: true
What's Different From Development#
Restart policy: unless-stopped restarts the container automatically if it crashes, unless you explicitly stopped it. This handles the "3 AM crash" scenario. The alternative always would also restart containers you intentionally stopped, which is usually not what you want.
Resource limits: Without limits, a memory leak in your Node.js app will consume all available RAM on the host, potentially killing other containers or the host itself. Set limits based on your application's actual usage plus some headroom:
# Monitor actual usage to set appropriate limits
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Logging configuration: Without max-size and max-file, Docker logs grow unbounded. I've seen production servers run out of disk space because of Docker logs. json-file with rotation is the simplest solution. For centralized logging, swap to the fluentd or gelf driver:
logging:
driver: "fluentd"
options:
fluentd-address: "localhost:24224"
tag: "myapp.{{.Name}}"
Network isolation: The internal network is only accessible to services in this compose stack. The database and Redis are not exposed to the host or other containers. Only the app service is connected to the web network, which your reverse proxy (Nginx, Traefik) uses to route traffic.
No port mapping for databases: Notice that db and redis don't have ports in the production config. They're only accessible via the internal Docker network. In development, we expose them so we can use local tools (pgAdmin, Redis Insight). In production, there's no reason for them to be accessible from outside the Docker network.
Next.js Specific: The Standalone Output#
Next.js has a built-in Docker optimization that many people don't know about: standalone output mode. It traces your application's imports and copies only the files needed to run — no node_modules required (dependencies are bundled).
Enable it in next.config.ts:
// next.config.ts
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
output: "standalone",
};
export default nextConfig;
This changes the build output dramatically. Instead of needing the entire node_modules directory, Next.js produces a self-contained server.js in .next/standalone/ that includes only the dependencies it actually uses.
The Next.js Production Dockerfile#
This is the Dockerfile I use for Next.js projects, based on the official Vercel example but with security hardening:
# ============================================
# Stage 1: Install dependencies
# ============================================
FROM node:20-alpine AS deps
RUN apk add --no-cache libc6-compat
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# ============================================
# Stage 2: Build the application
# ============================================
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
# Disable Next.js telemetry during build
ENV NEXT_TELEMETRY_DISABLED=1
RUN npm run build
# ============================================
# Stage 3: Production runner
# ============================================
FROM node:20-alpine AS runner
WORKDIR /app
RUN apk add --no-cache dumb-init
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
# Non-root user
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
# Copy public assets
COPY --from=builder /app/public ./public
# Set up the standalone output directory
# Automatically leverages output traces to reduce image size
# https://nextjs.org/docs/advanced-features/output-file-tracing
RUN mkdir .next
RUN chown nextjs:nodejs .next
# Copy the standalone server and static files
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/api/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "server.js"]
Size Comparison for Next.js#
| Approach | Image Size |
|---|---|
| node:20 + full node_modules + .next | 1.4 GB |
| node:20-alpine + full node_modules + .next | 600 MB |
| node:20-alpine + standalone output | 120 MB |
The standalone output is transformative. A 1.4 GB image becomes 120 MB. Deploys that took 90 seconds to pull now take 10 seconds.
Static File Handling#
Next.js standalone mode doesn't include the public folder or the static assets from .next/static. You need to copy them explicitly (which we do in the Dockerfile above). In production, you typically want a CDN in front of these:
// next.config.ts
const nextConfig: NextConfig = {
output: "standalone",
assetPrefix: process.env.CDN_URL || undefined,
};
If you're not using a CDN, Next.js serves static files directly. The standalone server handles this fine — you just need to make sure the files are in the right place (which our Dockerfile ensures).
Sharp for Image Optimization#
Next.js uses sharp for image optimization. In the Alpine-based production image, you need to make sure the correct binary is available:
# In the runner stage, before switching to non-root user
RUN apk add --no-cache --virtual .sharp-deps vips-devOr better, install it as a production dependency and let npm handle the platform-specific binary:
npm install sharp
The node:20-alpine image works with sharp's prebuilt linux-x64-musl binary. No special configuration needed in most cases.
Image Scanning and Security#
Building a small image with a non-root user is a good start, but it's not enough for serious production workloads. Here's how to go further.
Trivy: Scan Your Images#
Trivy is a comprehensive vulnerability scanner for container images. Run it in your CI pipeline:
# Install trivy
brew install aquasecurity/trivy/trivy # macOS
# or
apt-get install trivy # Debian/Ubuntu
# Scan your image
trivy image myapp:latest
Sample output:
myapp:latest (alpine 3.19.1)
=============================
Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
Node.js (node_modules/package-lock.json)
=========================================
Total: 2 (UNKNOWN: 0, LOW: 0, MEDIUM: 1, HIGH: 1, CRITICAL: 0)
┌──────────────┬────────────────┬──────────┬────────┬───────────────┐
│ Library │ Vulnerability │ Severity │ Status │ Fixed Version │
├──────────────┼────────────────┼──────────┼────────┼───────────────┤
│ semver │ CVE-2022-25883 │ HIGH │ fixed │ 7.5.4 │
│ word-wrap │ CVE-2023-26115 │ MEDIUM │ fixed │ 1.2.4 │
└──────────────┴────────────────┴──────────┴────────┴───────────────┘
Integrate it in CI to fail builds on critical vulnerabilities:
# .github/workflows/docker.yml
- name: Build image
run: docker build -t myapp:${{ github.sha }} .
- name: Scan image
uses: aquasecurity/trivy-action@master
with:
image-ref: myapp:${{ github.sha }}
exit-code: 1
severity: CRITICAL,HIGH
ignore-unfixed: true
Read-Only Filesystem#
You can run containers with a read-only root filesystem. This prevents an attacker from modifying binaries, installing tools, or writing malicious scripts:
docker run --read-only \
--tmpfs /tmp \
--tmpfs /app/.next/cache \
myapp:latest
The --tmpfs mounts provide writable temporary directories where your application legitimately needs to write (temp files, caches). Everything else is read-only.
In docker-compose:
services:
app:
image: myapp:latest
read_only: true
tmpfs:
- /tmp
- /app/.next/cache
Drop All Capabilities#
Linux capabilities are fine-grained permissions that replace the all-or-nothing root model. By default, Docker containers get a subset of capabilities. You can drop all of them:
docker run --cap-drop=ALL myapp:latest
If your application needs to bind to a port below 1024, you'd need NET_BIND_SERVICE. But since we're using port 3000 with a non-root user, we don't need any capabilities:
services:
app:
image: myapp:latest
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
no-new-privileges prevents the process from gaining additional privileges through setuid/setgid binaries. This is a defense-in-depth measure that costs nothing.
Pin Your Base Image Digest#
Instead of using node:20-alpine (which is a moving target), pin to a specific digest:
FROM node:20-alpine@sha256:abcdef123456...
Get the digest with:
docker inspect --format='{{index .RepoDigests 0}}' node:20-alpine
This makes your base image fully deterministic between builds. The tradeoff is that you don't automatically get security patches to the base image. Use Dependabot or Renovate to automate digest updates:
# .github/dependabot.yml
version: 2
updates:
- package-ecosystem: docker
directory: "/"
schedule:
interval: weekly
CI/CD Integration: Putting It All Together#
Here's a complete GitHub Actions workflow that builds, scans, and pushes a Docker image:
# .github/workflows/docker.yml
name: Build and Push Docker Image

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      security-events: write
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,format=long
            type=ref,event=branch
            type=semver,pattern={{version}}

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Upload Trivy results
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif

BuildKit Cache in CI#
The cache-from: type=gha and cache-to: type=gha,mode=max lines use GitHub Actions cache as a Docker layer cache. This means your CI builds benefit from layer caching across runs. First build takes 5 minutes; subsequent builds with only code changes take 30 seconds.
Common Pitfalls and How to Avoid Them#
The node_modules Inside the Image vs Host Conflict#
If you volume-mount your project directory into a container (-v .:/app), the host's node_modules overrides the container's. Native modules compiled on macOS won't work in Linux. Always use the anonymous volume trick:
volumes:
  - .:/app
  - /app/node_modules # preserves the container's node_modules

Note that anonymous volumes persist across restarts. After changing dependencies, recreate them with docker compose up --renew-anon-volumes.

SIGTERM Handling in TypeScript Projects#
If you're running TypeScript with tsx or ts-node in development, signal handling works normally. But in production, if you're using the compiled JavaScript with node, make sure your compiled output preserves the signal handlers. Some build tools optimize away "unused" code.
Memory Limits and Node.js#
Node.js doesn't automatically respect Docker memory limits. With a 512MB container limit, Node.js still sizes its default heap from the machine's total memory, not the cgroup limit, so the process can blow past the cap and get OOM-killed before V8 ever starts garbage-collecting aggressively. Set the max old space size explicitly:
CMD ["node", "--max-old-space-size=384", "dist/server.js"]

Leave about 25% headroom between the Node.js heap limit and the container memory limit for non-heap memory (buffers, native code, etc.).
Or set it via NODE_OPTIONS so you don't have to modify the CMD:

ENV NODE_OPTIONS="--max-old-space-size=384"

Timezone Issues#
Alpine uses UTC by default. If your application depends on a specific timezone:
RUN apk add --no-cache tzdata
ENV TZ=America/New_York

But better: write timezone-agnostic code. Store everything in UTC. Convert to local time only at the presentation layer.
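For the presentation-layer conversion, the built-in Intl API works regardless of the container's TZ setting; a small sketch:

```typescript
// Store timestamps in UTC and convert only when rendering.
const createdAt = new Date("2024-01-15T17:30:00Z"); // persisted as UTC

const formatted = new Intl.DateTimeFormat("en-US", {
  timeZone: "America/New_York",
  dateStyle: "medium",
  timeStyle: "short",
}).format(createdAt);

console.log(formatted); // 17:30 UTC renders as 12:30 PM Eastern
```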
Build Arguments vs Environment Variables#
ARG is available only during build. It doesn't persist in the final image (unless you copy it to ENV). ENV persists in the image and is available at runtime. Neither is safe for secrets: ARG values used in build steps can be recovered with docker history.
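If you genuinely need a secret during the build itself (a private npm token, say), a BuildKit secret mount keeps it out of every image layer. A sketch; npm_token is a hypothetical secret id:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this RUN step and never stored in a layer.
# Build with: docker build --secret id=npm_token,env=NPM_TOKEN .
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci
```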
# Build-time configuration
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine
# Runtime configuration
ENV PORT=3000
# WRONG: This makes the secret visible in the image
ARG API_KEY
ENV API_KEY=${API_KEY}
# RIGHT: Pass secrets at runtime
# docker run -e API_KEY=secret myapp

Monitoring in Production#
Your Docker setup isn't complete without observability. Here's a minimal but effective monitoring stack:
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - internal

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    networks:
      - internal

networks:
  internal:

volumes:
  prometheus_data:
  grafana_data:

Expose metrics from your Node.js app using prom-client:
import express from "express";
import { collectDefaultMetrics, Registry, Histogram } from "prom-client";

const app = express();

const register = new Registry();
collectDefaultMetrics({ register });

const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register],
});

// Middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    end({ method: req.method, route: req.route?.path || req.path, status_code: res.statusCode });
  });
  next();
});

// Metrics endpoint
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

The Checklist#
Before you ship a containerized Node.js app to production, verify:
- Non-root user — Container runs as a non-root user
- Multi-stage build — devDependencies and build tools are not in the final image
- Alpine base — Using a minimal base image
- .dockerignore — .git, .env, node_modules, and tests excluded
- Layer caching — package.json copied before source code
- Health check — HEALTHCHECK instruction in the Dockerfile
- Signal handling — dumb-init or --init for proper SIGTERM handling
- No secrets in image — No ENV with sensitive values in the Dockerfile
- Resource limits — Memory and CPU limits set in compose/orchestrator
- Log rotation — Logging driver configured with a max size
- Image scanning — Trivy or equivalent in the CI pipeline
- Pinned versions — Base image and dependency versions pinned
- Memory limits — --max-old-space-size set for the Node.js heap
Most of these are one-time setup. Do it once, template it, and every new project starts with a production-ready container from day one.
Docker isn't complicated. But the gap between a "working" Dockerfile and a production-ready one is wider than most people think. The patterns in this guide close that gap. Use them, adapt them, and stop deploying root containers with 1GB images and no health checks. Your future self — the one getting paged at 3 AM — will thank you.