The VPS Setup That Actually Works: Node.js, PM2, Nginx, and Zero-Downtime Deploys
The exact VPS deployment setup I use in production — Ubuntu hardening, PM2 cluster mode, Nginx reverse proxy, SSL, and a deploy script that hasn't failed me yet. No theory, just what works.
This blog runs on a $10/month VPS. Not Vercel, not AWS, not a Kubernetes cluster managed by a team of six. A single Ubuntu box with Nginx, PM2, and a bash script that deploys in under 30 seconds.
I've tried the other paths. I've used Vercel (great until you need cron jobs, persistent WebSockets, or just control). I've used AWS (great if you enjoy spending half your day in IAM policies). I always end up back on a VPS.
But here's the problem: every "deploy to VPS" tutorial on the internet stops at the happy path. They show you how to install Node.js and run node server.js and call it production. Then your server gets SSH brute-forced, your process dies at 3 AM because nobody set up a process manager, and your SSL cert expired three months ago.
This is the guide I wish I had. Everything here is battle-tested — this exact setup serves the page you're reading right now.
Start With Security, Not Code#
Before you even think about Node.js, lock down the box. Fresh VPS instances are targets. Automated bots start hitting your SSH port within minutes of provisioning.
Create a Non-Root User#
adduser deploy
usermod -aG sudo deploy
Set Up SSH Key Authentication#
On your local machine:
ssh-copy-id deploy@your-server-ip
Then disable password authentication entirely:
sudo nano /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin no
sudo systemctl restart sshd
If you skip this, you'll see thousands of failed login attempts in your auth logs within days. That's not paranoia — it's Tuesday on the public internet.
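Before that restart, it's worth confirming the two directives actually took effect — `sudo sshd -t` validates the syntax, and a quick grep confirms the settings. A minimal sketch (the helper name is mine; the path is the standard one):

```shell
# Hypothetical helper: confirm password auth and root login are both off
ssh_hardened() {
  grep -Eq '^[[:space:]]*PasswordAuthentication[[:space:]]+no' "$1" &&
  grep -Eq '^[[:space:]]*PermitRootLogin[[:space:]]+no' "$1"
}

# Usage on the server (sshd -t catches syntax errors before they lock you out):
#   sudo sshd -t && ssh_hardened /etc/ssh/sshd_config && echo "locked down"
```

Keep a second SSH session open while you do this — if the restart goes wrong, the live session is your way back in.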
Firewall With UFW#
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full'
sudo ufw enable
That's it. Four rules. Only SSH and web traffic get through.
Fail2Ban#
sudo apt install fail2ban -y
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
Edit /etc/fail2ban/jail.local:
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
Three failed SSH attempts and you're banned for an hour. I've watched Fail2Ban block hundreds of IPs in a single day. It works.
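If you're curious who's actually knocking, the auth log tells you. A throwaway helper I'd sketch like this (the function name is mine; the log path is Debian/Ubuntu's):

```shell
# Count failed SSH password attempts per source IP, most aggressive first
count_failed() {
  grep 'Failed password' "$1" \
    | awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' \
    | sort | uniq -c | sort -rn
}

# Usage: count_failed /var/log/auth.log | head
```

Run it before and after enabling Fail2Ban and watch the new entries dry up.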
Unattended Security Updates#
sudo apt install unattended-upgrades -y
sudo dpkg-reconfigure -plow unattended-upgrades
Your server will now auto-install security patches. One less thing to forget.
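To confirm it stuck, check the file the reconfigure step writes. A sketch (the helper name is mine; the path is Ubuntu's standard):

```shell
# Hypothetical check: is unattended-upgrades actually switched on?
auto_updates_on() {
  grep -q 'Unattended-Upgrade "1"' "$1"
}

# Usage: auto_updates_on /etc/apt/apt.conf.d/20auto-upgrades && echo "enabled"
```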
Node.js: Use NVM, Not apt#
I see this in every tutorial: sudo apt install nodejs. Don't do it.
Ubuntu's package repos ship ancient Node.js versions. Even the NodeSource repo lags behind. And when you need to switch between Node 20 and Node 22 for different projects, you're stuck.
NVM solves this:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
source ~/.bashrc
nvm install --lts
nvm alias default lts/*
Now verify:
node -v # v22.x.x or whatever LTS is current
npm -v
The non-obvious tip: when you install global packages with NVM (like PM2), they're tied to that Node version. If you switch versions with nvm use, your globals disappear. Set your default and stick with it on the server:
nvm alias default 22
This has bitten me exactly once. Once was enough.
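One more nvm habit worth having: pin the version per project with an .nvmrc file (a standard nvm feature), so `nvm use` in the project directory always resolves to the same major:

```shell
# In the project root: record the Node major version the app expects
echo "22" > .nvmrc

# On the server or your laptop:
#   nvm use        # reads .nvmrc and switches (nvm install first if it's missing)
```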
PM2: The Process Manager That Earns Its Keep#
PM2 is the difference between "deployed" and "production-ready." It handles process management, clustering, log rotation, auto-restart on crashes, and startup scripts. For free.
Install and Set Up#
npm install -g pm2
The Ecosystem Config#
Don't start apps with CLI flags. Use an ecosystem.config.js file. It's version-controlled, reproducible, and self-documenting.
// ecosystem.config.js
module.exports = {
apps: [
{
name: "akousa",
script: "node_modules/.bin/next",
args: "start -p 3002",
cwd: "/var/www/akousa.net",
instances: 2,
exec_mode: "cluster",
max_memory_restart: "500M",
env: {
NODE_ENV: "production",
PORT: 3002,
},
// Graceful shutdown
kill_timeout: 5000,
listen_timeout: 10000,
wait_ready: false,
// Logging
log_date_format: "YYYY-MM-DD HH:mm:ss Z",
error_file: "/var/log/pm2/akousa-error.log",
out_file: "/var/log/pm2/akousa-out.log",
merge_logs: true,
// Auto-restart on failure
autorestart: true,
max_restarts: 10,
min_uptime: "10s",
// Don't watch in production
watch: false,
},
],
};
Let me explain the choices that matter:
instances: 2 instead of "max": On a small VPS with 1-2 cores, "max" sounds smart but it'll spawn processes that fight for resources during builds. Two instances gives you zero-downtime reloads while leaving headroom. On a 4+ core machine, sure, use "max".
exec_mode: "cluster": This is what enables zero-downtime reloads. Without cluster mode, pm2 reload is just a fancy restart. With cluster mode, PM2 restarts instances one at a time — your app never goes fully offline.
max_memory_restart: "500M": Your Next.js app has a memory leak? PM2 will restart it before the kernel's OOM killer takes down processes for you. This has saved me from 2 AM alerts more than once.
kill_timeout: 5000: Gives your app 5 seconds to finish in-flight requests before PM2 force-kills it. The default (1600ms) is too aggressive for apps with database connections.
watch: false: I've seen people leave watch: true in production. PM2 then restarts the app every time a log file changes. Your app enters a restart loop. Don't.
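Once the file is in place, hand it to PM2. Because a syntax error in ecosystem.config.js fails in annoying ways, I'd sanity-check that it parses first — a sketch (the one-liner and helper name are mine; the path matches the config above):

```shell
# Hypothetical helper: confirm the ecosystem file is valid JS and defines at least one app
check_ecosystem() {
  node -e 'const c = require(process.argv[1]); if (!c.apps || !c.apps.length) process.exit(1); console.log(c.apps[0].name)' "$1"
}

# On the server:
#   check_ecosystem /var/www/akousa.net/ecosystem.config.js   # should print "akousa"
#   pm2 start /var/www/akousa.net/ecosystem.config.js
```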
Startup Script#
Make PM2 survive reboots:
pm2 startup systemd
# Copy and run the command it outputs
pm2 save
This generates a systemd service. After a server reboot, your app comes back automatically. Test it — reboot your server and verify. Don't assume.
Log Rotation#
Logs will eat your disk eventually. Install the rotation module:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 50M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
50MB max per file, keep 7 rotated files, compress the old ones. Without this, I've seen /var/log fill a 25GB disk in three weeks on a moderately trafficked app.
Nginx: The Reverse Proxy That Does More Than You Think#
"Why not just expose Node.js directly on port 80?"
Because Nginx handles things Node.js shouldn't waste cycles on: SSL termination, static file serving, gzip compression, request buffering, connection limits, and graceful handling of slow clients. It's written in C and purpose-built for this.
Install#
sudo apt install nginx -y
The Config#
# /etc/nginx/sites-available/akousa.net
upstream node_app {
server 127.0.0.1:3002;
keepalive 64;
}
server {
listen 80;
listen [::]:80;
server_name akousa.net www.akousa.net;
# Redirect all HTTP to HTTPS
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name akousa.net www.akousa.net;
# SSL (managed by Certbot — these lines get added automatically)
ssl_certificate /etc/letsencrypt/live/akousa.net/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/akousa.net/privkey.pem;
include /etc/letsencrypt/options-ssl-nginx.conf;
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
# Gzip compression
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_min_length 256;
gzip_types
text/plain
text/css
text/javascript
application/javascript
application/json
application/xml
image/svg+xml
application/wasm;
# Proxy settings
location / {
proxy_pass http://node_app;
proxy_http_version 1.1;
# Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (if you ever need it)
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts — generous but not infinite
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffering — let Nginx handle slow clients
proxy_buffering on;
proxy_buffer_size 16k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
}
# Next.js static assets — let Nginx serve them directly
location /_next/static/ {
alias /var/www/akousa.net/.next/static/;
expires 365d;
access_log off;
add_header Cache-Control "public, immutable";
}
# Public static files
location /static/ {
alias /var/www/akousa.net/public/static/;
expires 30d;
access_log off;
}
# Block access to dot files
location ~ /\. {
deny all;
access_log off;
log_not_found off;
}
}
Enable it:
sudo ln -s /etc/nginx/sites-available/akousa.net /etc/nginx/sites-enabled/
sudo rm /etc/nginx/sites-enabled/default
sudo nginx -t
sudo systemctl reload nginx
Always run nginx -t before reloading. I once pushed a broken config and took the site down because I skipped the syntax check. Those eight characters would have saved me thirty minutes of panicked debugging.
Things most tutorials miss in this config:
upstream block with keepalive 64: Nginx reuses connections to your Node.js backend instead of opening a new TCP connection for every request. This matters under load.
proxy_buffering on: Nginx reads the entire response from Node.js into memory, then sends it to the client at whatever speed the client can handle. Without this, a slow client on a 3G connection ties up your Node.js worker.
Serving _next/static/ directly: These are hashed, immutable assets. Let Nginx serve them from disk with a 365-day cache header. Your Node.js processes shouldn't be wasting time on this.
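After deploying, I'd spot-check that the static route is really answered with the long-lived cache header. A small sketch (the helper name is mine; the asset URL is whatever hashed file your build produced):

```shell
# Pull just the Cache-Control line out of response headers piped in on stdin
cache_header() {
  grep -i '^cache-control:' | head -1
}

# Usage:
#   curl -sI https://akousa.net/_next/static/css/whatever.css | cache_header
#   # expect something like: cache-control: public, immutable
```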
SSL in Five Minutes#
Let's Encrypt solved SSL. If you're still paying for certificates in 2026, stop.
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d akousa.net -d www.akousa.net
Certbot will ask for your email, accept the ToS, and automatically modify your Nginx config to include the SSL directives. That's it.
Verify Auto-Renewal#
Certbot installs a systemd timer that checks twice a day and renews certificates within 30 days of expiration:
sudo systemctl list-timers | grep certbot
Test that renewal works:
sudo certbot renew --dry-run
If the dry run passes, you'll never think about SSL again. If it fails, it's usually because port 80 is blocked (check your UFW rules) or Nginx isn't running.
One thing that caught me: if you set up Nginx before running Certbot, make sure your server block is listening on port 80 without the HTTPS redirect first. Certbot needs to reach port 80 for the HTTP-01 challenge. After Certbot runs successfully, then add the redirect.
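In practice that means starting from a bare HTTP block like this (a sketch of the pre-Certbot state; Certbot then rewrites it toward the full config shown earlier):

```nginx
# /etc/nginx/sites-available/akousa.net -- before running Certbot
server {
    listen 80;
    listen [::]:80;
    server_name akousa.net www.akousa.net;

    location / {
        proxy_pass http://127.0.0.1:3002;
        proxy_set_header Host $host;
    }
}
```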
The Deploy Script#
This is the script that runs every time I push to production. No CI/CD platform, no GitHub Actions. Just SSH and bash.
#!/bin/bash
# deploy.sh — zero-ish downtime deployment
set -euo pipefail
APP_DIR="/var/www/akousa.net"
APP_NAME="akousa"
LOG_FILE="/var/log/deploy.log"
HEALTH_URL="http://localhost:3002"
MAX_RETRIES=10
RETRY_INTERVAL=3
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log "=== Deploy started ==="
cd "$APP_DIR"
# Pull latest code
log "Pulling latest changes..."
git pull origin main 2>&1 | tee -a "$LOG_FILE"
# Install dependencies
log "Installing dependencies..."
npm install --legacy-peer-deps 2>&1 | tee -a "$LOG_FILE"
# Build
log "Building application..."
rm -rf .next
if ! npm run build 2>&1 | tee -a "$LOG_FILE"; then
log "ERROR: Build failed. Aborting deploy."
exit 1
fi
# Reload PM2 (zero-downtime in cluster mode)
log "Reloading PM2..."
pm2 reload "$APP_NAME" 2>&1 | tee -a "$LOG_FILE"
pm2 save 2>&1 | tee -a "$LOG_FILE"
# Health check with retries
log "Running health check..."
for i in $(seq 1 $MAX_RETRIES); do
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' "$HEALTH_URL" 2>/dev/null || echo "000")
if [ "$HTTP_CODE" = "200" ]; then
log "Health check passed (HTTP $HTTP_CODE)"
log "=== Deploy completed successfully ==="
exit 0
fi
log "Health check attempt $i/$MAX_RETRIES (HTTP $HTTP_CODE). Retrying in ${RETRY_INTERVAL}s..."
sleep $RETRY_INTERVAL
done
log "ERROR: Health check failed after $MAX_RETRIES attempts"
log "Rolling back to previous PM2 state..."
pm2 restart "$APP_NAME" 2>&1 | tee -a "$LOG_FILE"
exit 1
Make it executable:
chmod +x deploy.sh
Deploy from your local machine (as the deploy user — remember, root login is disabled):
ssh deploy@your-server-ip "bash /var/www/akousa.net/deploy.sh"
Key decisions in this script:
set -euo pipefail: The script exits immediately on any error. Without this, a failed npm install silently continues into the build step, and you get a cryptic error that takes 20 minutes to debug.
rm -rf .next before building: Next.js has a build cache that occasionally produces stale output. I got bit by this once — a page showed old content despite the source code being updated. Nuking the build directory adds maybe 15 seconds to the build but guarantees fresh output.
pm2 reload instead of pm2 restart: This is the zero-downtime part. In cluster mode, reload performs a rolling restart — it brings up new instances with the updated code, waits for them to be ready, then gracefully shuts down old ones. At no point are zero instances running.
Health check with retries: Next.js takes a few seconds to warm up after restart. The script waits up to 30 seconds (10 retries × 3 seconds), checking if the app responds with HTTP 200. If it doesn't, something is wrong and you need to know immediately — not find out from a user.
Rollback on failure: If the health check fails after all retries, the script restarts PM2 (which loads the last saved state). It's not a perfect rollback, but it's better than leaving the server in a broken state.
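A commit-pinned rollback closes that gap. A sketch of the idea (the helper names are mine, not from the original script):

```shell
#!/bin/bash
set -euo pipefail

# Remember which commit is live before pulling new code
record_deployed_commit() {
  git -C "$1" rev-parse HEAD
}

# Hard-reset the working tree back to a known-good commit
rollback_to() {
  git -C "$1" reset --hard "$2" >/dev/null
}

# In deploy.sh you would call:
#   PREV=$(record_deployed_commit "$APP_DIR")
#   ...pull / build / reload / health check...
#   on failure: rollback_to "$APP_DIR" "$PREV" && npm run build && pm2 reload "$APP_NAME"
```

The rebuild after the reset is the slow part; pinning the commit is what makes the rollback deterministic.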
When Things Break at 2 AM#
Here's what I've actually debugged on this exact setup:
"The site is down"#
First commands to run:
pm2 status
pm2 logs akousa --lines 50
sudo systemctl status nginx
sudo tail -50 /var/log/nginx/error.log
Nine times out of ten, pm2 logs tells you immediately what happened. A missing environment variable, a failed database connection, or an unhandled promise rejection.
"Memory keeps growing"#
pm2 monit
This gives you a live dashboard of CPU and memory per process. If memory climbs steadily without leveling off, you have a leak. The max_memory_restart setting in your ecosystem config is your safety net — PM2 will restart the process before it takes down the server.
For deeper investigation:
pm2 describe akousa
This shows uptime, restart count, and memory snapshots. If you see 47 restarts in the last 24 hours, that's your hint.
"SSL certificate expired"#
sudo certbot certificates
Lists all certificates with their expiration dates. If auto-renewal failed:
sudo certbot renew --force-renewal
sudo systemctl reload nginx
"Disk space is full"#
df -h
du -sh /var/log/*
pm2 flush
pm2 flush clears all PM2 log files immediately. If you didn't set up log rotation (I told you), this is where you feel the pain.
The Command I Run Every Morning#
ssh deploy@akousa.net "pm2 status && df -h / && uptime"
Three things in one line: are my processes running, is my disk okay, is the server overloaded. Takes two seconds. Catches problems before users do.
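When I eventually want that check unattended, the disk part is the easiest to script and cron. A sketch (the thresholds and function names are arbitrary):

```shell
# Percentage of the given filesystem in use (df -P keeps the output parseable)
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Shout only when it matters
check_disk() {
  local pct
  pct=$(disk_used_pct /)
  if [ "$pct" -ge 90 ]; then
    echo "ALERT: root filesystem at ${pct}%"
  else
    echo "OK: root filesystem at ${pct}%"
  fi
}
```

Drop `check_disk` into a cron job that mails or pings you on the ALERT branch and the morning ritual takes care of itself.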
What Most Guides Won't Tell You#
Your build step is your biggest vulnerability. On a 1GB RAM VPS, npm run build for a Next.js app can consume 800MB+ of memory. If PM2 is running your app in two instances during the build, you'll OOM. Solutions: use a swap file (at least 2GB), or stop the app during builds and accept a few seconds of downtime. I use swap.
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
--legacy-peer-deps in your install command is a code smell, not a solution. I use it because some packages in my dependency tree haven't updated their peer dependency ranges. Every few months I try removing it. Someday it'll work. Until then, I ship.
Test your deploy script from scratch. Clone your repo on a fresh server and run every step manually. The number of "works on my machine" issues that hide in deploy scripts is embarrassing. I found three issues in mine when I did this — missing global packages, wrong file permissions, and a path that only existed because of a previous manual setup.
Put your server's IP in your SSH config. Stop typing IP addresses:
# ~/.ssh/config
Host akousa
HostName 69.62.66.94
User deploy
IdentityFile ~/.ssh/id_ed25519
Now ssh akousa is all you need. Small things compound.
The Full Checklist#
Before you call it done:
- Non-root user with sudo access
- SSH key auth only, password auth disabled
- UFW enabled with only necessary ports open
- Fail2Ban protecting SSH
- Unattended security upgrades enabled
- Node.js installed via NVM
- PM2 running your app in cluster mode
- PM2 startup script configured (survives reboot)
- PM2 log rotation installed
- Nginx reverse proxy with proper headers
- SSL via Let's Encrypt with auto-renewal
- Deploy script with health checks
- Swap file configured (for build headroom)
- Tested: reboot the server and verify everything comes back
That last item is the one people skip. Don't be that person. Reboot the server, wait 60 seconds, and check if your app is live. If it isn't, your startup scripts are misconfigured and you'll find out at the worst possible time.
Is This "Enterprise-Grade"?#
No. And that's the point.
This setup serves this blog reliably for under $10/month. It's deployed in 30 seconds with a single command. I understand every piece of it. When something breaks, I know exactly where to look.
Could I use Docker? Sure. Could I use Kubernetes? Technically. Could I set up a full CI/CD pipeline with staging environments and canary deployments? Absolutely.
But I've learned that the best infrastructure is the one you actually understand, can debug at 2 AM, and doesn't cost more than the project earns. For a personal site, a SaaS MVP, or a small startup — this is that setup.
Ship first. Scale when you need to. And always, always, test your deploy script on a fresh server.