GitHub Actions CI/CD: Zero-Downtime Deployments That Actually Work

Every project I have worked on eventually hits the same inflection point: deploying by hand becomes too painful. You forget to run the tests. You build locally but forget to bump the version. You SSH into production and discover that whoever deployed last left a stale .env file behind.

GitHub Actions solved this for me two years ago. Not perfectly on day one: the first workflow I wrote was a 200-line YAML nightmare that timed out half the time and cached nothing. But iteration after iteration, I arrived at something that deploys this site reliably, with zero downtime, in under four minutes.

This is that workflow, explained section by section. Not the docs version. The version that survives contact with production.

Understanding the Building Blocks

Before getting into the full pipeline, you need a clear mental model of how GitHub Actions works. If you have used Jenkins or CircleCI, forget most of what you know. The concepts map loosely, but the execution model is different enough to trip you up.

Triggers: When Your Workflow Runs

yaml

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 6 * * 1" # Every Monday at 6 AM UTC
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        default: "staging"
        type: choice
        options:
          - staging
          - production

Four triggers, each serving a different purpose:

push to main is your production deploy trigger. Code merged? Ship it.
pull_request runs your CI checks on every PR. This is where lint, type checks, and tests live.
schedule is cron for your repo. I use it for weekly dependency audit scans and stale cache cleanup.
workflow_dispatch gives you a manual "Deploy" button in the GitHub UI, with input parameters. It is invaluable when you need to deploy to staging without a code change, for example after updating an environment variable or when the base Docker image needs a re-pull.

One thing that bites people: pull_request runs against the merge commit, not the PR branch HEAD. In other words, CI tests what the code will look like after the merge. That is actually what you want, but people are surprised when a green branch turns red after a rebase.

Jobs, Steps, and Runners

yaml

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: "pnpm"
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint

Jobs run in parallel by default. Each job gets a fresh VM (the "runner"). ubuntu-latest gives you a reasonably beefy machine: 4 vCPUs and 16 GB RAM for public repos in 2026. It is free for public repos; private repos on the Free plan get 2,000 included minutes per month.

Steps run sequentially within a job. Each uses: step pulls in a reusable action, typically from the Marketplace. Each run: step executes a shell command.

The --frozen-lockfile flag is crucial. Without it, pnpm install can update your lockfile in CI, which means you are no longer testing the same dependencies the developer committed. I have seen phantom test failures that vanish locally because the lockfile on the developer's machine is already correct. Recent pnpm versions turn this behavior on automatically when they detect a CI environment, but passing the flag explicitly keeps the intent obvious.

Environment Variables vs Secrets

yaml

env:
  NODE_ENV: production
  NEXT_TELEMETRY_DISABLED: 1
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy
        env:
          SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
          DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
        run: |
          echo "$SSH_PRIVATE_KEY" > key.pem
          chmod 600 key.pem
          ssh -i key.pem deploy@$DEPLOY_HOST "cd /var/www/app && ./deploy.sh"

Environment variables set with env: at the workflow level are plain text and visible in logs. Use them for non-sensitive config: NODE_ENV, telemetry flags, feature toggles.

Secrets (${{ secrets.X }}) are encrypted at rest, masked in logs, and available only to workflows in the same repo. You set them under Settings > Secrets and variables > Actions.

The environment: production line is significant. GitHub Environments let you scope secrets to specific deployment targets. Your staging SSH key and production SSH key can both be named SSH_PRIVATE_KEY but hold different values, depending on which environment the job targets. It also unlocks required reviewers: you can gate production deploys behind manual approval.

The Full CI Pipeline

Here is how I structure the CI half of the pipeline. The goal: catch every category of error in the fastest possible time.

yaml

name: CI
 
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
 
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
 
jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: "pnpm"
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
 
  typecheck:
    name: Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: "pnpm"
      - run: pnpm install --frozen-lockfile
      - run: pnpm tsc --noEmit
 
  test:
    name: Unit Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: "pnpm"
      - run: pnpm install --frozen-lockfile
      - run: pnpm test -- --coverage
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: coverage-report
          path: coverage/
          retention-days: 7
 
  build:
    name: Build
    needs: [lint, typecheck, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: "pnpm"
      - run: pnpm install --frozen-lockfile
      - run: pnpm build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: .next/
          retention-days: 1

Why This Structure

Lint, typecheck, and test run in parallel. They have no dependency on each other. A type error does not stop lint from running, and a failed test does not need to wait for the type checker. In a typical run, all three finish in 30-60 seconds while running simultaneously.

Build waits for all three. The needs: [lint, typecheck, test] line means the build job starts only when lint, typecheck, and test have all passed. There is no point building a project that has lint errors or type failures.

concurrency with cancel-in-progress: true is a huge time saver. If you push two commits in quick succession, the first CI run is cancelled. Without it, stale runs consume your minutes budget and clutter the checks UI.

Coverage upload with if: always() means you get the coverage report even when tests fail. That is useful for debugging: you can see which tests failed and what they covered.

Fail-Fast vs. Let Them All Run

By default, if one job in a matrix fails, GitHub cancels the remaining legs of that matrix. This applies only to matrix jobs: independent jobs such as lint and test keep running when one of them fails, and it is the needs: gate that stops build from starting. For plain CI that is the behavior I want. If lint fails, fix lint first.

But for test matrices (say, testing on Node 20 and Node 22), you may want to see all the failures at once:

yaml

test:
  strategy:
    fail-fast: false
    matrix:
      node-version: [20, 22]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: pnpm/action-setup@v4
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
        cache: "pnpm"
    - run: pnpm install --frozen-lockfile
    - run: pnpm test

fail-fast: false lets both matrix legs run to completion. If Node 22 fails but Node 20 passes, you see that immediately instead of having to re-run.

Caching for Speed

The single biggest improvement to CI speed is caching. A cold pnpm install on a medium project takes 30-45 seconds. With a warm cache, 3-5 seconds. Multiply that across four parallel jobs and you save about two minutes of runner time on every run.

pnpm Store Cache

yaml

- uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: "pnpm"

This one-liner caches the pnpm store (~/.local/share/pnpm/store). On a cache hit, pnpm install --frozen-lockfile hard-links from the store instead of downloading. That alone cuts install time by about 80% on repeat runs.

If you need more control, for example keying the cache on the OS as well, use actions/cache directly:

yaml

- uses: actions/cache@v4
  with:
    path: |
      ~/.local/share/pnpm/store
      node_modules
    key: pnpm-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-${{ runner.os }}-

The restore-keys fallback is important. If pnpm-lock.yaml changes (a new dependency), the exact key will not match, but the prefix match still restores most of the cached packages. Only the diff is downloaded.
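
The exact-key and prefix-fallback lookup described above can be sketched in plain bash. This is an illustration only, not how actions/cache is implemented: the function names cache_key and restore_lookup are made up here, and the real service prefers the most recently created cache among prefix matches, which this sketch does not model.

```shell
#!/usr/bin/env bash
# Illustrative only: mimics the exact-key / prefix-fallback lookup.

# Build a key of the form pnpm-<os>-<sha256 of the lockfile>.
cache_key() {
  local os=$1 lockfile=$2
  local digest
  digest=$(sha256sum "$lockfile" | cut -d' ' -f1)
  printf 'pnpm-%s-%s\n' "$os" "$digest"
}

# Print the stored key that matches exactly; otherwise the first stored
# key that starts with the fallback prefix; otherwise print nothing.
restore_lookup() {
  local key=$1 prefix=$2
  shift 2
  local stored
  for stored in "$@"; do
    if [ "$stored" = "$key" ]; then
      printf '%s\n' "$stored"
      return 0
    fi
  done
  for stored in "$@"; do
    case $stored in
      "$prefix"*) printf '%s\n' "$stored"; return 0 ;;
    esac
  done
  return 1
}
```

For example, restore_lookup pnpm-Linux-new pnpm-Linux- pnpm-macOS-a pnpm-Linux-old falls back to pnpm-Linux-old: the lockfile hash changed, but an older Linux cache is still a useful starting point.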

Next.js Build Cache

Next.js keeps its own build cache in .next/cache. Caching it between runs gives you incremental builds: only changed pages and components are recompiled.

yaml

- uses: actions/cache@v4
  with:
    path: .next/cache
    key: nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-${{ hashFiles('src/**/*.ts', 'src/**/*.tsx') }}
    restore-keys: |
      nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-
      nextjs-${{ runner.os }}-

This three-level key strategy means:

Exact match: same dependencies and same source files. Full cache hit, and the build is near-instant.
Partial match (dependencies): same dependencies but changed source. The build recompiles only the changed files.
Partial match (OS only): dependencies changed. The build reuses whatever it can.

Real numbers from my project: a cold build takes ~55 seconds, a cached build ~15 seconds. That is a 73% reduction.
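
The percentage is just (cold - cached) / cold. A small helper makes it easy to check your own numbers; the function name reduction_pct is made up for this sketch, and it rounds to the nearest whole percent.

```shell
#!/usr/bin/env bash
# Percentage reduction from a "before" duration to an "after" duration,
# rounded to the nearest whole percent.
reduction_pct() {
  local before=$1 after=$2
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.0f\n", (b - a) / b * 100 }'
}

# Cold build 55s, cached build 15s: (55 - 15) / 55 = 0.727...
reduction_pct 55 15
```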

Docker Layer Caching

Docker builds are where caching really pays off. A full Next.js Docker build (installing OS deps, copying source, running pnpm install, running next build) takes 3-4 minutes cold. With layer caching, 30-60 seconds.

yaml

- uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ghcr.io/${{ github.repository }}:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max

type=gha uses GitHub Actions' built-in cache backend. mode=max caches all layers, not just the final ones. This is critical for multi-stage builds, where rebuilding intermediate layers (such as pnpm install) is the most expensive part.

Turborepo Remote Cache

If you are in a monorepo with Turborepo, remote caching is transformative. The first build uploads task outputs to the cache. Later builds download them instead of recomputing.

yaml

- run: pnpm turbo build --remote-only
  env:
    TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
    TURBO_TEAM: ${{ vars.TURBO_TEAM }}

I have seen monorepo CI times drop from 8 minutes to 90 seconds with Turbo remote cache. The catch: it needs a Vercel account or a self-hosted Turbo server. For single-app repos it is overkill.

Docker Build and Push

If you are deploying to a VPS (or any server), Docker gives you reproducible builds. The image that runs in CI is the same image that runs in production. No more "it works on my machine", because the machine is the image.

Multi-Stage Dockerfile

Before the workflow, here is the Dockerfile I use for Next.js:

dockerfile

# Stage 1: Dependencies
FROM node:22-alpine AS deps
RUN corepack enable && corepack prepare pnpm@latest --activate
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile --prod=false
 
# Stage 2: Build
FROM node:22-alpine AS builder
RUN corepack enable && corepack prepare pnpm@latest --activate
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENV NEXT_TELEMETRY_DISABLED=1
RUN pnpm build
 
# Stage 3: Production
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
 
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
 
COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
 
USER nextjs
EXPOSE 3000
ENV PORT=3000
CMD ["node", "server.js"]

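Three stages, clear separation. The final image is ~150MB instead of the ~1.2GB you would get by copying everything. Only production artifacts reach the runner stage. Stage 3 copies .next/standalone, so this Dockerfile assumes output: "standalone" is set in your Next.js config.

Build-and-Push Workflow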

yaml

name: Build and Push Docker Image
 
on:
  push:
    branches: [main]
 
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
 
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
 
    steps:
      - name: Checkout
        uses: actions/checkout@v4
 
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
 
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
 
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
 
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=raw,value=latest,enable={{is_default_branch}}
 
      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

Let's unpack the important decisions.

GitHub Container Registry (ghcr.io)

I use ghcr.io instead of Docker Hub for three reasons:

Authentication is free. GITHUB_TOKEN is automatically available in every workflow, so there are no Docker Hub credentials to store.
Proximity. Images are pulled from the same infrastructure that CI runs on. Pulls in CI are fast.
Visibility. Images are linked to your repo in the GitHub UI. They show up in the Packages tab.

Multi-Platform Builds

yaml

platforms: linux/amd64,linux/arm64

This line adds maybe 90 seconds, but it is worth it. ARM64 images run natively on:

Apple Silicon Macs (M1/M2/M3/M4), for local development with Docker Desktop
AWS Graviton instances (20-40% cheaper than x86 equivalents)
Oracle Cloud's free ARM tier

Without it, your developers on M-series Macs are running x86 images under emulation. It works, but it is noticeably slower and occasionally surfaces architecture-specific bugs.

QEMU provides the emulation layer that lets an x86 runner execute the ARM64 build steps. Buildx orchestrates the multi-arch build and pushes a manifest list so that Docker automatically pulls the right architecture.

Tagging Strategy

yaml

tags: |
  type=sha,prefix=
  type=ref,event=branch
  type=raw,value=latest,enable={{is_default_branch}}

Every image gets three tags:

abc1234 (commit SHA): Immutable. You can always deploy an exact commit.
main (branch name): Mutable. Points to the latest build from that branch.
latest: Mutable. Set only on the default branch. This is what your server pulls.
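
To make the three rules concrete, here is a rough bash sketch of the tag list they produce. It is not the metadata-action implementation: image_tags is a hypothetical helper, the 7-character short SHA is an assumption about the action's default format, and branch-name sanitizing (for example, slashes in branch names) is left out.

```shell
#!/usr/bin/env bash
# Print the tags an image would receive under the three rules above.
# Arguments: image name, full commit SHA, branch, default branch.
image_tags() {
  local image=$1 sha=$2 branch=$3 default_branch=$4
  printf '%s:%s\n' "$image" "${sha:0:7}"   # immutable short-SHA tag
  printf '%s:%s\n' "$image" "$branch"      # mutable branch tag
  if [ "$branch" = "$default_branch" ]; then
    printf '%s:latest\n' "$image"          # only on the default branch
  fi
}
```

Calling image_tags ghcr.io/akousa/akousa-net abc1234def5678 main main prints three tags, while the same call with a feature branch prints only the SHA and branch tags.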

Never deploy latest to production without recording the SHA somewhere. When something breaks, you need to know which latest. I store the deployed SHA in a file on the server, and the health endpoint reads it.
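
A minimal sketch of that bookkeeping, assuming the SHA lives in a .deployed-sha file inside the app directory. Both function names are hypothetical, and health_json only stands in for whatever the real health endpoint returns.

```shell
#!/usr/bin/env bash
# Record the deployed commit. Write to a temp file and rename it, so a
# reader never sees a half-written file.
record_deployed_sha() {
  local app_dir=$1 sha=$2
  printf '%s\n' "$sha" > "$app_dir/.deployed-sha.tmp"
  mv "$app_dir/.deployed-sha.tmp" "$app_dir/.deployed-sha"
}

# Emit a small JSON health payload that includes the recorded SHA,
# or "unknown" if nothing has been recorded yet.
health_json() {
  local app_dir=$1 sha="unknown"
  if [ -r "$app_dir/.deployed-sha" ]; then
    sha=$(cat "$app_dir/.deployed-sha")
  fi
  printf '{"status":"ok","sha":"%s"}\n' "$sha"
}
```

With this in place, answering "which latest is running?" is a single request to the health endpoint instead of an SSH session.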

SSH Deployment to a VPS

This is where everything comes together. CI has passed, the Docker image is built and pushed, and now the server needs to be told to pull the new image and restart.

SSH Action

yaml

deploy:
  name: Deploy to Production
  needs: [build-and-push]
  runs-on: ubuntu-latest
  environment: production
 
  steps:
    - name: Deploy via SSH
      uses: appleboy/ssh-action@v1
      with:
        host: ${{ secrets.DEPLOY_HOST }}
        username: ${{ secrets.DEPLOY_USER }}
        key: ${{ secrets.SSH_PRIVATE_KEY }}
        port: ${{ secrets.SSH_PORT }}
        script_stop: true
        script: |
          set -euo pipefail
 
          APP_DIR="/var/www/akousa.net"
          IMAGE="ghcr.io/${{ github.repository }}:latest"
          DEPLOY_SHA="${{ github.sha }}"
 
          echo "=== Deploying $DEPLOY_SHA ==="
 
          # Pull the latest image
          docker pull "$IMAGE"
 
          # Stop and remove old container
          docker stop akousa-app || true
          docker rm akousa-app || true
 
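          # Start new container.
          # --network host is dropped because it makes -p a no-op, and the
          # runtime config comes from an env file because $DATABASE_URL is
          # not set in the remote shell (set -u would abort on it).
          docker run -d \
            --name akousa-app \
            --restart unless-stopped \
            -e NODE_ENV=production \
            --env-file "$APP_DIR/.env.production" \
            -p 3000:3000 \
            "$IMAGE"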
 
          # Wait for health check
          echo "Waiting for health check..."
          for i in $(seq 1 30); do
            if curl -sf http://localhost:3000/api/health > /dev/null 2>&1; then
              echo "Health check passed on attempt $i"
              break
            fi
            if [ "$i" -eq 30 ]; then
              echo "Health check failed after 30 attempts"
              exit 1
            fi
            sleep 2
          done
 
          # Record deployed SHA
          echo "$DEPLOY_SHA" > "$APP_DIR/.deployed-sha"
 
          # Prune old images
          docker image prune -af --filter "until=168h"
 
          echo "=== Deploy complete ==="
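
The health-check loop in that script is a generic "retry until it works" pattern, and it can be pulled out into a reusable function. The sketch below is one way to do it; wait_until is a name chosen here, not something the workflow above defines.

```shell
#!/usr/bin/env bash
# Run a command up to max_attempts times, sleeping delay seconds between
# tries. Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for attempt in $(seq 1 "$max_attempts"); do
    if "$@"; then
      echo "Succeeded on attempt $attempt"
      return 0
    fi
    sleep "$delay"
  done
  echo "Gave up after $max_attempts attempts" >&2
  return 1
}
```

The loop in the deploy script would then read wait_until 30 2 curl -sf http://localhost:3000/api/health, and the same helper works for image pulls or database readiness checks.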

Deploy Script Alternative

For anything beyond a simple pull-and-restart, I move the logic into a script on the server instead of inlining it in the workflow:

bash

#!/bin/bash
# /var/www/akousa.net/deploy.sh
set -euo pipefail
 
APP_DIR="/var/www/akousa.net"
LOG_FILE="$APP_DIR/deploy.log"
IMAGE="ghcr.io/akousa/akousa-net:latest"
 
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
 
log "Starting deployment..."
 
# Login to GHCR
echo "$GHCR_TOKEN" | docker login ghcr.io -u akousa --password-stdin
 
# Pull with retry
for attempt in 1 2 3; do
  if docker pull "$IMAGE"; then
    log "Image pulled successfully on attempt $attempt"
    break
  fi
  if [ "$attempt" -eq 3 ]; then
    log "ERROR: Failed to pull image after 3 attempts"
    exit 1
  fi
  log "Pull attempt $attempt failed, retrying in 5s..."
  sleep 5
done
 
# Health check function
health_check() {
  local port=$1
  local max_attempts=30
  for i in $(seq 1 $max_attempts); do
    if curl -sf "http://localhost:$port/api/health" > /dev/null 2>&1; then
      return 0
    fi
    sleep 2
  done
  return 1
}
 
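# Pick the idle port: whichever of 3000/3001 Nginx is not pointing at right now.
# (Hard-coding 3001 would collide with the running container on the second deploy.)
OLD_PORT=$(grep -oP 'server 127\.0\.0\.1:\K\d+' /etc/nginx/conf.d/upstream.conf)
if [ "$OLD_PORT" = "3000" ]; then NEW_PORT=3001; else NEW_PORT=3000; fi

# Start new container on the idle port
docker run -d \
  --name akousa-app-new \
  --env-file "$APP_DIR/.env.production" \
  -p "$NEW_PORT:3000" \
  "$IMAGE"

# Verify new container is healthy
if ! health_check "$NEW_PORT"; then
  log "ERROR: New container failed health check. Rolling back."
  docker stop akousa-app-new || true
  docker rm akousa-app-new || true
  exit 1
fi

log "New container healthy. Switching traffic..."

# Switch Nginx upstream. The config test runs on its own line so that
# set -e aborts the deploy if it fails, before the old container is stopped.
sudo sed -i "s/server 127.0.0.1:$OLD_PORT/server 127.0.0.1:$NEW_PORT/" /etc/nginx/conf.d/upstream.conf
sudo nginx -t
sudo nginx -s reload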
 
# Stop old container
docker stop akousa-app || true
docker rm akousa-app || true
 
# Rename new container
docker rename akousa-app-new akousa-app
 
log "Deployment complete."

The workflow then shrinks to a single SSH command:

yaml

script: |
  cd /var/www/akousa.net && ./deploy.sh

This is better because: (1) the deploy logic lives in a script on the server that you can keep under version control, (2) you can run it manually over SSH for debugging, and (3) you do not have to escape bash inside YAML inside YAML.

Zero-Downtime Strategies

"Zero downtime" sounds like marketing speak, but it has a precise meaning: no request gets a connection refused or a 502 during deployment. Here are three real approaches, from simplest to most robust.

Strategy 1: PM2 Cluster Mode Reload

If you are running Node.js directly (not in Docker), PM2's cluster mode gives you the easiest zero-downtime path.

bash

# ecosystem.config.js already has:
#   instances: 2
#   exec_mode: "cluster"
 
pm2 reload akousa --update-env

pm2 reload (not restart) performs a rolling restart. It spins up new workers, waits for them to be ready, then kills the old workers one by one. At no point are there zero workers serving traffic.

The --update-env flag reloads environment variables from the ecosystem config. Without it, your old env persists even after a deploy that changed .env.

In your workflow:

yaml

- name: Deploy and reload PM2
  uses: appleboy/ssh-action@v1
  with:
    host: ${{ secrets.DEPLOY_HOST }}
    username: ${{ secrets.DEPLOY_USER }}
    key: ${{ secrets.SSH_PRIVATE_KEY }}
    script: |
      cd /var/www/akousa.net
      git pull origin main
      pnpm install --frozen-lockfile
      pnpm build
      pm2 reload ecosystem.config.js --update-env

This is what I use for this site. It is simple, reliable, and the downtime is literally zero: I tested it by running a 100 req/s load generator during deploys. Not a single 5xx.

Strategy 2: Blue/Green with Nginx Upstream

For Docker deployments, blue/green gives you clean separation between the old and new versions.

The concept: run the old container ("blue") on port 3000 and the new container ("green") on port 3001. Nginx points at blue. Start green, verify it is healthy, switch Nginx to green, then stop blue. On the next deploy the two ports swap roles.
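
The part of this that is easy to get wrong is deciding which port is idle on each deploy. A small sketch of that decision, assuming the upstream file contains a single line of the form server 127.0.0.1:PORT; and that only ports 3000 and 3001 are in play. The function names are chosen for this example.

```shell
#!/usr/bin/env bash
# Read the port Nginx currently proxies to from an upstream config file.
current_port() {
  sed -n 's/.*server 127\.0\.0\.1:\([0-9][0-9]*\);.*/\1/p' "$1" | head -n 1
}

# Given the live port, return the other one of the 3000/3001 pair.
idle_port() {
  if [ "$1" = "3000" ]; then
    echo 3001
  else
    echo 3000
  fi
}
```

A deploy would call idle_port "$(current_port /etc/nginx/conf.d/upstream.conf)" to find where green should start, so the same script works on every run regardless of which color is live.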

Nginx upstream config:

nginx

# /etc/nginx/conf.d/upstream.conf
upstream app_backend {
    server 127.0.0.1:3000;
}

nginx

# /etc/nginx/sites-available/akousa.net
server {
    listen 443 ssl http2;
    server_name akousa.net;
 
    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }
}

Switch script:

bash

#!/bin/bash
set -euo pipefail
 
CURRENT_PORT=$(grep -oP 'server 127\.0\.0\.1:\K\d+' /etc/nginx/conf.d/upstream.conf)
 
if [ "$CURRENT_PORT" = "3000" ]; then
  NEW_PORT=3001
  OLD_PORT=3000
else
  NEW_PORT=3000
  OLD_PORT=3001
fi
 
echo "Current: $OLD_PORT -> New: $NEW_PORT"
 
# Start new container on the alternate port
docker run -d \
  --name "akousa-app-$NEW_PORT" \
  --env-file /var/www/akousa.net/.env.production \
  -p "$NEW_PORT:3000" \
  "ghcr.io/akousa/akousa-net:latest"
 
# Wait for health
for i in $(seq 1 30); do
  if curl -sf "http://localhost:$NEW_PORT/api/health" > /dev/null; then
    echo "New container healthy on port $NEW_PORT"
    break
  fi
  [ "$i" -eq 30 ] && { echo "Health check failed"; docker stop "akousa-app-$NEW_PORT"; docker rm "akousa-app-$NEW_PORT"; exit 1; }
  sleep 2
done
 
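# Switch Nginx. The config test runs on its own line so that set -e
# aborts here if it fails, before the old container is stopped.
sudo sed -i "s/server 127.0.0.1:$OLD_PORT/server 127.0.0.1:$NEW_PORT/" /etc/nginx/conf.d/upstream.conf
sudo nginx -t
sudo nginx -s reload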
 
# Stop old container
sleep 5  # Let in-flight requests complete
docker stop "akousa-app-$OLD_PORT" || true
docker rm "akousa-app-$OLD_PORT" || true
 
echo "Switched from :$OLD_PORT to :$NEW_PORT"

The 5-second sleep after the Nginx reload is not laziness. It is grace time. Nginx's reload is graceful (existing connections stay open), but some long-polling connections or streaming responses need time to complete.

Strategy 3: Docker Compose with Health Checks

For a more structured approach, Docker Compose can manage the blue/green swap:

yaml

# docker-compose.yml
services:
  app:
    image: ghcr.io/akousa/akousa-net:latest
    restart: unless-stopped
    env_file: .env.production
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]  # needs curl inside the image; node:22-alpine does not ship it by default
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 0
        order: stop-first
    ports:
      - "3000:3000"

order: start-first is the key line. It means "start the new container before stopping the old one." Combined with parallelism: 1, you get a rolling update: one container at a time, always maintaining capacity.

Deploy:

bash

docker compose pull
docker compose up -d --remove-orphans

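The healthcheck is watched, and a new container receives no traffic until it passes. If the healthcheck fails, failure_action: rollback automatically reverts to the previous version. One caveat: update_config and rollback_config are Swarm features. A plain docker compose up ignores them and simply recreates the containers, so to get the rolling behavior described here the host has to run in Swarm mode with the file deployed as a stack. With that in place, this is the closest you get to Kubernetes-style rolling deployments on a single VPS.

Secrets Management

Secrets management is one of those things that is easy to get "mostly right" and catastrophically wrong in the remaining edge cases.

GitHub Secrets: Basics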

yaml

# Set via GitHub UI: Settings > Secrets and variables > Actions
 
steps:
  - name: Use a secret
    env:
      DB_URL: ${{ secrets.DATABASE_URL }}
    run: |
      # The value is masked in logs
      echo "Connecting to database..."
      # This would print "Connecting to ***" in the logs
      echo "Connecting to $DB_URL"

GitHub automatically redacts secret values from log output. If your secret is p@ssw0rd123 and a step prints that string, the logs show ***. This works well, with one caveat: if your secret is short (say, a 4-digit PIN), GitHub may not mask it because it could match innocent strings. Keep secrets reasonably complex.

Environment-Scoped Secrets
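
Conceptually, masking is literal string replacement over the log stream. The sketch below imitates that in bash to show why short secrets are a problem; it is not how the GitHub runner is implemented, and mask_secret is a name invented for this example.

```shell
#!/usr/bin/env bash
# Replace every literal occurrence of a secret in stdin with ***.
# The secret is quoted inside the expansion so characters such as * or ?
# are matched literally rather than as glob patterns.
mask_secret() {
  local secret=$1 line
  while IFS= read -r line; do
    if [ -n "$secret" ]; then
      line=${line//"$secret"/***}
    fi
    printf '%s\n' "$line"
  done
}
```

Piping "Connecting to p@ssw0rd123" through mask_secret 'p@ssw0rd123' yields "Connecting to ***". Try the same with a secret of 1234 and every unrelated 1234 in the log, such as a port or a build number, gets starred out too.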

yaml

jobs:
  deploy-staging:
    environment: staging
    steps:
      - run: echo "Deploying to ${{ secrets.DEPLOY_HOST }}"
      # DEPLOY_HOST = staging.akousa.net
 
  deploy-production:
    environment: production
    steps:
      - run: echo "Deploying to ${{ secrets.DEPLOY_HOST }}"
      # DEPLOY_HOST = akousa.net

Same secret name, different values per environment. The environment field on the job determines which set of secrets is injected.

Production environments should have required reviewers enabled. That means a push to main triggers the workflow, CI runs automatically, but the deploy job pauses and waits for someone to click "Approve" in the GitHub UI. For a solo project this can feel like overhead. For anything with users, it is a lifesaver the first time you accidentally merge something broken.

OIDC: No More Static Credentials

Static credentials stored in GitHub Secrets (AWS access keys, GCP service account JSON files) are a liability. They do not expire, they cannot be scoped to a specific workflow run, and they have to be rotated manually when they leak.

OIDC (OpenID Connect) solves this. GitHub Actions acts as an identity provider, and your cloud provider trusts it to issue short-lived credentials on the fly:

yaml

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Required for OIDC
      contents: read
 
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: eu-central-1
 
      - name: Push to ECR
        run: |
          aws ecr get-login-password --region eu-central-1 | \
            docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-central-1.amazonaws.com

No access key. No secret key. The configure-aws-credentials action uses GitHub's OIDC token to request temporary credentials from AWS STS. The role's trust policy scopes them to a specific repo, branch, or environment, and they expire on their own when the session ends (about an hour by default), so nothing long-lived is stored anywhere.

On the AWS side, setup requires an IAM OIDC identity provider and a role trust policy:

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:akousa/akousa-net:ref:refs/heads/main"
        }
      }
    }
  ]
}

The sub condition is crucial. Without it, any repository on GitHub could assume the role, because every workflow gets its token from the same OIDC provider. With it, only the main branch of your specific repo can.
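
To see what the sub check buys you, here is a rough bash approximation of the match. It treats * and ? as wildcards the way StringLike does, but it is only an analogy: sub_matches is a made-up helper, and bash patterns also give meaning to brackets, which IAM does not.

```shell
#!/usr/bin/env bash
# Return 0 if the token subject matches the policy pattern, 1 otherwise.
sub_matches() {
  local pattern=$1 subject=$2
  # $pattern is left unquoted on purpose so that * and ? act as wildcards.
  case $subject in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

policy='repo:akousa/akousa-net:ref:refs/heads/main'
sub_matches "$policy" 'repo:akousa/akousa-net:ref:refs/heads/main' && echo "main branch: allowed"
sub_matches "$policy" 'repo:someone/else:ref:refs/heads/main' || echo "other repo: denied"
```

A looser pattern such as repo:akousa/akousa-net:* would also admit pull request and feature-branch runs from the same repo, which is why the exact branch ref is the safer default for a production role.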

GCP has an equivalent setup with Workload Identity Federation. Azure has federated credentials. If your cloud supports OIDC, use it. In 2026 there is no reason to store static cloud credentials.

Deployment SSH Keys

For VPS deployments over SSH, generate a dedicated key pair:

bash

ssh-keygen -t ed25519 -C "github-actions-deploy" -f deploy_key -N ""

Add the public key to the server's ~/.ssh/authorized_keys with restrictions:

restrict,command="/var/www/akousa.net/deploy.sh" ssh-ed25519 AAAA... github-actions-deploy

The restrict prefix disables port forwarding, agent forwarding, PTY allocation, and X11 forwarding. The command= prefix means this key can only execute the deploy script. Even if the private key is compromised, the attacker can run the deploy script and nothing else.
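
If one key ever needs to do slightly more than a single thing, the forced command can point at a small wrapper instead. When a forced command is in effect, sshd puts whatever the client asked for into SSH_ORIGINAL_COMMAND, and the wrapper can allow-list it. The sketch below is hypothetical: the wrapper, its two verbs, and the status action are assumptions, not part of the setup above.

```shell
#!/usr/bin/env bash
# Hypothetical forced-command wrapper. The real file would end with a
# call to `dispatch`; it is left out here so the functions can be read
# and tested on their own.

run_deploy()  { /var/www/akousa.net/deploy.sh; }
show_status() { cat /var/www/akousa.net/.deployed-sha; }

# Allow only known verbs. Anything else the client sends is rejected.
dispatch() {
  local requested=${SSH_ORIGINAL_COMMAND:-deploy}
  case $requested in
    deploy) run_deploy ;;
    status) show_status ;;
    *)
      echo "rejected command: $requested" >&2
      return 1
      ;;
  esac
}
```

The blast radius stays small: a stolen key can trigger a deploy or read the deployed SHA, but a request such as ssh deploy@host 'cat /etc/passwd' never reaches a shell.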

Add the private key to GitHub Secrets as SSH_PRIVATE_KEY. This is the one static credential I accept: SSH keys with forced commands have a very limited blast radius.

PR Workflows: Preview Deployments

Every PR deserves a preview environment. It catches visual bugs that unit tests miss, lets designers review without checking out code, and makes QA's life dramatically easier.

Deploying a Preview on PR Open

yaml

name: Preview Deploy
 
on:
  pull_request:
    types: [opened, synchronize, reopened]
 
jobs:
  preview:
    runs-on: ubuntu-latest
    environment:
      name: preview-${{ github.event.number }}
      url: ${{ steps.deploy.outputs.url }}
 
    steps:
      - uses: actions/checkout@v4
 
      - name: Build preview image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:pr-${{ github.event.number }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
 
      - name: Deploy preview
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.PREVIEW_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            PR_NUM=${{ github.event.number }}
            PORT=$((4000 + PR_NUM))
            IMAGE="ghcr.io/${{ github.repository }}:pr-${PR_NUM}"

            docker pull "$IMAGE"
            docker stop "preview-${PR_NUM}" || true
            docker rm "preview-${PR_NUM}" || true

            docker run -d \
              --name "preview-${PR_NUM}" \
              --restart unless-stopped \
              -e NODE_ENV=preview \
              -p "${PORT}:3000" \
              "$IMAGE"

      # GITHUB_OUTPUT exists only on the runner, not on the remote host,
      # so the URL output is written by a runner-side step instead.
      - name: Set preview URL
        id: deploy
        run: echo "url=https://pr-${{ github.event.number }}.preview.akousa.net" >> "$GITHUB_OUTPUT"
 
      - name: Comment PR with preview URL
        uses: actions/github-script@v7
        with:
          script: |
            const url = `https://pr-${{ github.event.number }}.preview.akousa.net`;
            const body = `### Preview Deployment
 
            | Status | URL |
            |--------|-----|
            | :white_check_mark: Deployed | [${url}](${url}) |
 
            _Last updated: ${new Date().toISOString()}_
            _Commit: \`${{ github.sha }}\`_`;
 
            // Find existing comment
            const comments = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
 
            const botComment = comments.data.find(c =>
              c.user.type === 'Bot' && c.body.includes('Preview Deployment')
            );
 
            if (botComment) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body,
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body,
              });
            }

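The port calculation (4000 + PR_NUM) is a pragmatic hack. PR #42 gets port 4042. PR numbers are unique, so two previews never claim the same port; the scheme only breaks if 4000 + PR_NUM lands on a port another service already uses, or once PR numbers push the result past 65535. An Nginx wildcard config routes pr-*.preview.akousa.net to the right port.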
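
Because the mapping from PR number to port and hostname is pure arithmetic, it can also be computed on the runner and exposed as step outputs, which is the one place where $GITHUB_OUTPUT is available. A minimal sketch, assuming a hypothetical step id of preview; the output names port and url are illustrative, not part of the original workflow:

```yaml
# Steps fragment: runs on the GitHub-hosted runner, not on the preview server
- name: Compute preview port and URL
  id: preview   # hypothetical step id
  run: |
    PR_NUM=${{ github.event.number }}
    # Same arithmetic as the deploy script: PR #42 -> port 4042
    echo "port=$((4000 + PR_NUM))" >> "$GITHUB_OUTPUT"
    echo "url=https://pr-${PR_NUM}.preview.akousa.net" >> "$GITHUB_OUTPUT"

- name: Show computed values
  run: echo "Preview on port ${{ steps.preview.outputs.port }} at ${{ steps.preview.outputs.url }}"
```

Later steps in the same job can then read steps.preview.outputs.url instead of rebuilding the string in several places.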

Cleanup on PR Close#

Preview environments that never get cleaned up eat disk and memory. Add a cleanup job:

yaml

name: Cleanup Preview
 
on:
  pull_request:
    types: [closed]
 
jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Remove preview container
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.PREVIEW_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            PR_NUM=${{ github.event.number }}
            docker stop "preview-${PR_NUM}" || true
            docker rm "preview-${PR_NUM}" || true
            docker rmi "ghcr.io/${{ github.repository }}:pr-${PR_NUM}" || true
            echo "Preview for PR #${PR_NUM} cleaned up."
 
      - name: Deactivate environment
        uses: actions/github-script@v7
        with:
          script: |
            const deployments = await github.rest.repos.listDeployments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              environment: `preview-${{ github.event.number }}`,
            });
 
            for (const deployment of deployments.data) {
              await github.rest.repos.createDeploymentStatus({
                owner: context.repo.owner,
                repo: context.repo.repo,
                deployment_id: deployment.id,
                state: 'inactive',
              });
            }

Required Status Checks#

In the repository settings (Settings > Branches > Branch protection rules), require these checks before merging:

lint: no lint errors
typecheck: no type errors
test: all tests pass
build: the project builds successfully
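
One detail worth checking when you select these in the settings page: as far as I know, the name GitHub lists for each check is the job's name: value when one is set, and the job id otherwise. The sketch below is a stripped-down, hypothetical workflow (placeholder steps, not the real pipeline) that only illustrates that naming:

```yaml
name: CI
on:
  pull_request:
    branches: [main]

jobs:
  lint:
    name: Lint        # expected to appear as "Lint" in the required-checks list
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "lint steps go here"   # placeholder

  typecheck:          # no name set: expected to appear under the job id "typecheck"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "type check steps go here"   # placeholder
```

If a required check never reports, compare the name in the branch protection rule with the name shown on a recent PR before assuming CI is broken.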

Without required checks, someone will eventually merge a PR with failing checks. Not maliciously: they'll see "2 of 4 checks passed" and assume the other two are still running. Lock it down.

Also enable "Require branches to be up to date before merging". This forces CI to re-run after the branch is brought up to date with the latest main. It catches the case where two PRs each pass CI on their own but break when combined.

Notifications#

A deployment nobody knows about is a deployment nobody trusts. Notifications close the feedback loop.

Slack Webhook#

yaml

- name: Notify Slack
  if: always()
  uses: slackapi/slack-github-action@v2
  with:
    webhook: ${{ secrets.SLACK_DEPLOY_WEBHOOK }}
    webhook-type: incoming-webhook
    payload: |
      {
        "blocks": [
          {
            "type": "header",
            "text": {
              "type": "plain_text",
              "text": "${{ job.status == 'success' && 'Deploy Successful' || 'Deploy Failed' }}"
            }
          },
          {
            "type": "section",
            "fields": [
              {
                "type": "mrkdwn",
                "text": "*Repository:*\n${{ github.repository }}"
              },
              {
                "type": "mrkdwn",
                "text": "*Branch:*\n${{ github.ref_name }}"
              },
              {
                "type": "mrkdwn",
                "text": "*Commit:*\n<${{ github.server_url }}/${{ github.repository }}/commit/${{ github.sha }}|${{ github.sha }}>"
              },
              {
                "type": "mrkdwn",
                "text": "*Triggered by:*\n${{ github.actor }}"
              }
            ]
          },
          {
            "type": "actions",
            "elements": [
              {
                "type": "button",
                "text": {
                  "type": "plain_text",
                  "text": "View Run"
                },
                "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
              }
            ]
          }
        ]
      }

if: always() is critical. Without it, the notification step is skipped when the deploy fails, which is exactly when you need it most.

GitHub Deployments API#

For richer deployment tracking, use the GitHub Deployments API. It gives you deployment history in the repo UI and enables status badges:

yaml

- name: Create GitHub Deployment
  id: deployment
  uses: actions/github-script@v7
  with:
    script: |
      const deployment = await github.rest.repos.createDeployment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        ref: context.sha,
        environment: 'production',
        auto_merge: false,
        required_contexts: [],
        description: `Deploying ${context.sha.substring(0, 7)} to production`,
      });
      return deployment.data.id;
 
- name: Deploy
  run: |
    # ... actual deployment steps ...
 
- name: Update deployment status
  if: always()
  uses: actions/github-script@v7
  with:
    script: |
      const deploymentId = ${{ steps.deployment.outputs.result }};
      await github.rest.repos.createDeploymentStatus({
        owner: context.repo.owner,
        repo: context.repo.repo,
        deployment_id: deploymentId,
        state: '${{ job.status }}' === 'success' ? 'success' : 'failure',
        environment_url: 'https://akousa.net',
        log_url: `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`,
        description: '${{ job.status }}' === 'success'
          ? 'Deployment succeeded'
          : 'Deployment failed',
      });

The Environments tab in GitHub now shows a complete deployment history: who deployed what, when, and whether it succeeded.
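
If the repository's default GITHUB_TOKEN permissions are read-only, the createDeployment and createDeploymentStatus calls are rejected. A minimal sketch of the job-level grant they need; the job name and the placeholder step are assumptions for illustration:

```yaml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      deployments: write   # required by createDeployment and createDeploymentStatus
    steps:
      - uses: actions/checkout@v4
      - run: echo "deployment steps go here"   # placeholder
```

Setting permissions at the job level keeps the rest of the workflow on the narrower default.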

Failure-Only Email#

For critical deployments, I also trigger an email on failure. Not through GitHub Actions' built-in email notifications (too noisy), but through a targeted webhook:

yaml

- name: Alert on failure
  if: failure()
  run: |
    curl -X POST "${{ secrets.ALERT_WEBHOOK_URL }}" \
      -H "Content-Type: application/json" \
      -d '{
        "subject": "DEPLOY FAILED: ${{ github.repository }}",
        "body": "Commit: ${{ github.sha }}\nActor: ${{ github.actor }}\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
      }'

This is my last line of defense. Slack is great, but it's also noisy, and people mute channels. A "DEPLOY FAILED" email with a link to the run does get attention.
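
A variation on the same step, as a sketch rather than what runs on this site: pass the values in through env: and let jq (preinstalled on GitHub-hosted Ubuntu runners) build the JSON, so nothing is interpolated directly into the shell script. The subject/body payload shape is the one used above; the environment variable names are assumptions:

```yaml
- name: Alert on failure
  if: failure()
  env:
    ALERT_WEBHOOK_URL: ${{ secrets.ALERT_WEBHOOK_URL }}
    REPO: ${{ github.repository }}
    SHA: ${{ github.sha }}
    ACTOR: ${{ github.actor }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  run: |
    # jq handles the JSON escaping, so unusual characters in a value cannot break the payload
    jq -n \
      --arg subject "DEPLOY FAILED: ${REPO}" \
      --arg body "$(printf 'Commit: %s\nActor: %s\nRun: %s' "$SHA" "$ACTOR" "$RUN_URL")" \
      '{subject: $subject, body: $body}' \
    | curl -sf -X POST "$ALERT_WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        --data-binary @-
```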

Complete Workflow File#

Here is everything wired into a single, production-ready workflow. It is very close to what actually deploys this site.

yaml

name: CI/CD Pipeline
 
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      skip_tests:
        description: "Skip tests (emergency deploy)"
        required: false
        type: boolean
        default: false
 
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
 
env:
  NODE_VERSION: "22"
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
 
jobs:
  # ============================================================
  # CI: Lint, type check, and test in parallel
  # ============================================================
 
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
 
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"
 
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
 
      - name: Run ESLint
        run: pnpm lint
 
  typecheck:
    name: Type Check
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
 
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"
 
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
 
      - name: Run TypeScript compiler
        run: pnpm tsc --noEmit
 
  test:
    name: Unit Tests
    if: ${{ !inputs.skip_tests }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
 
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"
 
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
 
      - name: Run tests with coverage
        run: pnpm test -- --coverage
 
      - name: Upload coverage report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/
          retention-days: 7
 
  # ============================================================
  # Build: Only after CI passes
  # ============================================================
 
  build:
    name: Build Application
    needs: [lint, typecheck, test]
    if: always() && !cancelled() && needs.lint.result == 'success' && needs.typecheck.result == 'success' && (needs.test.result == 'success' || needs.test.result == 'skipped')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
 
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"
 
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
 
      - name: Cache Next.js build
        uses: actions/cache@v4
        with:
          path: .next/cache
          key: nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-${{ hashFiles('src/**/*.ts', 'src/**/*.tsx') }}
          restore-keys: |
            nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-
            nextjs-${{ runner.os }}-
 
      - name: Build Next.js application
        run: pnpm build
 
  # ============================================================
  # Docker: Build and push image (main branch only)
  # ============================================================
 
  docker:
    name: Build Docker Image
    needs: [build]
    if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image_tag: ${{ steps.meta.outputs.tags }}
 
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Set up QEMU for multi-platform builds
        uses: docker/setup-qemu-action@v3
 
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
 
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
 
      - name: Extract image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=raw,value=latest,enable={{is_default_branch}}
 
      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
 
  # ============================================================
  # Deploy: SSH into VPS and update
  # ============================================================
 
  deploy:
    name: Deploy to Production
    needs: [docker]
    if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://akousa.net
 
    steps:
      - name: Create GitHub Deployment
        id: deployment
        uses: actions/github-script@v7
        with:
          script: |
            const deployment = await github.rest.repos.createDeployment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              ref: context.sha,
              environment: 'production',
              auto_merge: false,
              required_contexts: [],
              description: `Deploy ${context.sha.substring(0, 7)}`,
            });
            return deployment.data.id;
 
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          port: ${{ secrets.SSH_PORT }}
          script_stop: true
          command_timeout: 5m
          script: |
            set -euo pipefail
 
            APP_DIR="/var/www/akousa.net"
            IMAGE="ghcr.io/${{ github.repository }}:latest"
            SHA="${{ github.sha }}"
 
            echo "=== Deploy $SHA started at $(date) ==="
 
            # Pull new image
            docker pull "$IMAGE"
 
            # Blue/green: find the port Nginx currently serves and use the other one.
            # (Renaming a container does not change its published port, so a fixed
            # "always start on 3001" scheme collides with itself on the next deploy.)
            UPSTREAM_CONF="/etc/nginx/conf.d/upstream.conf"
            if grep -q 'server 127.0.0.1:3000' "$UPSTREAM_CONF"; then
              OLD_PORT=3000; NEW_PORT=3001
            else
              OLD_PORT=3001; NEW_PORT=3000
            fi

            # Remove any leftover container from a previously failed deploy
            docker rm -f "akousa-app-${NEW_PORT}" 2>/dev/null || true

            # Run new container on the idle port
            docker run -d \
              --name "akousa-app-${NEW_PORT}" \
              --env-file "$APP_DIR/.env.production" \
              -p "${NEW_PORT}:3000" \
              "$IMAGE"

            # Health check
            echo "Running health check..."
            for i in $(seq 1 30); do
              if curl -sf "http://localhost:${NEW_PORT}/api/health" > /dev/null 2>&1; then
                echo "Health check passed (attempt $i)"
                break
              fi
              if [ "$i" -eq 30 ]; then
                echo "ERROR: Health check failed"
                docker logs "akousa-app-${NEW_PORT}" --tail 50
                docker rm -f "akousa-app-${NEW_PORT}"
                exit 1
              fi
              sleep 2
            done

            # Switch traffic: the config on disk now always matches the live container
            sudo sed -i "s/server 127.0.0.1:${OLD_PORT}/server 127.0.0.1:${NEW_PORT}/" "$UPSTREAM_CONF"
            sudo nginx -t && sudo nginx -s reload

            # Grace period for in-flight requests
            sleep 5

            # Stop old container
            docker stop "akousa-app-${OLD_PORT}" || true
            docker rm "akousa-app-${OLD_PORT}" || true
 
            # Record deployment
            echo "$SHA" > "$APP_DIR/.deployed-sha"
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $SHA" >> "$APP_DIR/deploy.log"
 
            # Cleanup old images (older than 7 days)
            docker image prune -af --filter "until=168h"
 
            echo "=== Deploy complete at $(date) ==="
 
      - name: Update deployment status
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const deploymentId = ${{ steps.deployment.outputs.result }};
            await github.rest.repos.createDeploymentStatus({
              owner: context.repo.owner,
              repo: context.repo.repo,
              deployment_id: deploymentId,
              state: '${{ job.status }}' === 'success' ? 'success' : 'failure',
              environment_url: 'https://akousa.net',
              log_url: `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`,
            });
 
      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_DEPLOY_WEBHOOK }}
          webhook-type: incoming-webhook
          payload: |
            {
              "blocks": [
                {
                  "type": "header",
                  "text": {
                    "type": "plain_text",
                    "text": "${{ job.status == 'success' && 'Deploy Successful' || 'Deploy Failed' }}"
                  }
                },
                {
                  "type": "section",
                  "fields": [
                    {
                      "type": "mrkdwn",
                      "text": "*Commit:*\n<${{ github.server_url }}/${{ github.repository }}/commit/${{ github.sha }}|`${{ github.sha }}`>"
                    },
                    {
                      "type": "mrkdwn",
                      "text": "*Actor:*\n${{ github.actor }}"
                    }
                  ]
                },
                {
                  "type": "actions",
                  "elements": [
                    {
                      "type": "button",
                      "text": { "type": "plain_text", "text": "View Run" },
                      "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
                    }
                  ]
                }
              ]
            }
 
      - name: Alert on failure
        if: failure()
        run: |
          curl -sf -X POST "${{ secrets.ALERT_WEBHOOK_URL }}" \
            -H "Content-Type: application/json" \
            -d '{
              "subject": "DEPLOY FAILED: ${{ github.repository }}",
              "body": "Commit: ${{ github.sha }}\nRun: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }' || true

Walking Through the Flow#

When I push to main:

Lint, Type Check, and Test kick off simultaneously. Three runners, three parallel jobs. If any of them fails, the pipeline stops.
Build runs only when all three pass. It validates that the application compiles and produces working output.
Docker builds the production image and pushes it to ghcr.io. Multi-platform, layer-cached.
Deploy SSHes into the VPS, pulls the new image, starts a new container, health-checks it, switches Nginx over, and cleans up.
Notifications fire regardless of the outcome. Slack gets a message. GitHub Deployments are updated. On failure, an alert email goes out.

When I open a PR:

Lint, Type Check, and Test run. Same quality gates.
Build runs to verify that the project compiles.
Docker and Deploy are skipped (their if conditions gate them to the main branch).

When I need an emergency deploy (skipping tests):

Click "Run workflow" in the Actions tab.
Select skip_tests: true.
Lint and typecheck still run (you can't skip those; I don't trust myself that much).
Tests are skipped, the build runs, the Docker image is built, and the deploy fires.

This has been my workflow for two years. It has survived server migrations, Node.js major version upgrades, pnpm replacing npm, and the addition of 15 tools to this site. Total end-to-end time from push to production: 3 minutes 40 seconds on average. The slowest step is the multi-platform Docker build at ~90 seconds. Everything else is cached and near-instant.

Lessons from Two Years of Iteration#

I'll close with the mistakes I made so that you don't have to make them.

Pin action versions. uses: actions/checkout@v4 is fine, but for production consider uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 (the full commit SHA). A compromised action can exfiltrate your secrets. The tj-actions/changed-files incident in 2025 proved this is not theoretical.
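
SHA pins are only practical if something keeps them current. One common approach, sketched here as an assumption rather than a description of this site's setup, is to let Dependabot propose updates for the github-actions ecosystem and to keep the human-readable tag as a trailing comment next to each pin:

```yaml
# In a workflow, a pinned step keeps the tag as a comment for readability:
#   - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"          # Dependabot looks in .github/workflows from the repo root
    schedule:
      interval: "weekly"
```

Each update then arrives as a reviewable PR instead of a silent tag move.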

Don't cache everything. I once cached node_modules directly (rather than just the pnpm store) and spent two hours debugging a phantom build failure caused by stale native bindings. Cache the package manager's store, not the installed modules.
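
To make the distinction concrete, here is a minimal sketch of caching only the pnpm store with actions/cache. The cache: "pnpm" option on actions/setup-node used throughout this post already does the equivalent; the explicit form just shows what is cached. The step id pnpm-store and the key prefix are illustrative names, and pnpm/action-setup is assumed to pick up the pnpm version from the packageManager field in package.json:

```yaml
# Steps fragment: cache the content-addressable pnpm store, never node_modules
- uses: pnpm/action-setup@v4

- name: Get pnpm store directory
  id: pnpm-store   # hypothetical step id
  shell: bash
  run: echo "path=$(pnpm store path --silent)" >> "$GITHUB_OUTPUT"

- name: Cache pnpm store
  uses: actions/cache@v4
  with:
    path: ${{ steps.pnpm-store.outputs.path }}
    key: pnpm-store-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-store-${{ runner.os }}-

# node_modules is rebuilt from the store on every run, so native bindings stay fresh
- name: Install dependencies
  run: pnpm install --frozen-lockfile
```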

Set timeouts. Every job should have timeout-minutes. The default is 360 minutes (6 hours). If a deploy hangs because the SSH connection dropped, you want to find out within minutes, not six hours later after it has burned through your monthly minutes.

yaml

jobs:
  deploy:
    timeout-minutes: 15
    runs-on: ubuntu-latest

Use concurrency wisely. For PRs, cancel-in-progress: true is always right: nobody cares about the CI result of a commit that has already been force-pushed over. For production deploys, set it to false. You don't want a fast-follow commit cancelling a deploy mid-rollout.
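
The two policies can live in one workflow by setting concurrency per job instead of per workflow. A minimal sketch with hypothetical group names and placeholder steps:

```yaml
name: CI and deploy
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    concurrency:
      group: ci-${{ github.ref }}      # one group per branch or PR ref
      cancel-in-progress: true         # a newer push supersedes the older run
    steps:
      - uses: actions/checkout@v4
      - run: echo "tests go here"      # placeholder

  deploy:
    needs: [test]
    if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    concurrency:
      group: production-deploy         # a single queue for all production deploys
      cancel-in-progress: false        # never interrupt a rollout that has started
    steps:
      - run: echo "deploy goes here"   # placeholder
```

With cancel-in-progress: false, a running deploy finishes and a later one waits; GitHub keeps only the most recent pending run in a group, so older queued deploys are dropped rather than stacked up.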

Test your workflow file. Use act (https://github.com/nektos/act) to run workflows locally. It won't catch everything (repository secrets aren't available unless you pass them in, and the runner environment differs), but it catches YAML syntax errors and obvious logic bugs before you push.
