The VPS Setup That Actually Works: Node.js, PM2, Nginx, and Zero-Downtime Deploys
The exact VPS deployment setup I use in production — Ubuntu hardening, PM2 cluster mode, Nginx reverse proxy, SSL, and a deploy script that hasn't failed me yet. No theory, just what works.
This blog runs on a $10/month VPS. Not Vercel, not AWS, not a Kubernetes cluster managed by a team of six. A single Ubuntu box with Nginx, PM2, and a bash script that deploys in under 30 seconds.
I've tried the other paths. I've used Vercel (great until you need cron jobs, persistent WebSockets, or just control). I've used AWS (great if you enjoy spending half your day in IAM policies). I always end up back on a VPS.
But here's the problem: every "deploy to VPS" tutorial on the internet stops at the happy path. They show you how to install Node.js and run node server.js and call it production. Then your server gets SSH brute-forced, your process dies at 3 AM because nobody set up a process manager, and your SSL cert expired three months ago.
This is the guide I wish I had. Everything here is battle-tested — this exact setup serves the page you're reading right now.
Start With Security, Not Code#
Before you even think about Node.js, lock down the box. Fresh VPS instances are targets. Automated bots start hitting your SSH port within minutes of provisioning.
Create a Non-Root User#
adduser deploy
usermod -aG sudo deploy
Set Up SSH Key Authentication#
On your local machine:
ssh-copy-id deploy@your-server-ip
Then disable password authentication entirely:
sudo nano /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin no
sudo systemctl restart sshd
If you skip this, you'll see thousands of failed login attempts in your auth logs within days. That's not paranoia — it's Tuesday on the public internet.
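Before that restart, it's worth confirming the two directives actually took effect — `sudo sshd -t` validates the syntax, and a quick grep confirms the settings. A minimal sketch (the helper name is mine; the path is the standard one):

```shell
# Hypothetical helper: confirm password auth and root login are both off
ssh_hardened() {
  grep -Eq '^[[:space:]]*PasswordAuthentication[[:space:]]+no' "$1" &&
  grep -Eq '^[[:space:]]*PermitRootLogin[[:space:]]+no' "$1"
}

# Usage on the server (sshd -t catches syntax errors before they lock you out):
#   sudo sshd -t && ssh_hardened /etc/ssh/sshd_config && echo "locked down"
```

Keep a second SSH session open while you do this — if the restart goes wrong, the live session is your way back in.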
Firewall With UFW#
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full'
sudo ufw enable
That's it. Four rules. Only SSH and web traffic get through.
Fail2Ban#
sudo apt install fail2ban -y
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
Edit /etc/fail2ban/jail.local:
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
Three failed SSH attempts and you're banned for an hour. I've watched Fail2Ban block hundreds of IPs in a single day. It works.
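If you're curious who's actually knocking, the auth log tells you. A throwaway helper I'd sketch like this (the function name is mine; the log path is Debian/Ubuntu's):

```shell
# Count failed SSH password attempts per source IP, most aggressive first
count_failed() {
  grep 'Failed password' "$1" \
    | awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' \
    | sort | uniq -c | sort -rn
}

# Usage: count_failed /var/log/auth.log | head
```

Run it before and after enabling Fail2Ban and watch the new entries dry up.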
Unattended Security Updates#
sudo apt install unattended-upgrades -y
sudo dpkg-reconfigure -plow unattended-upgrades
Your server will now auto-install security patches. One less thing to forget.
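To confirm it stuck, check the file the reconfigure step writes. A sketch (the helper name is mine; the path is Ubuntu's standard):

```shell
# Hypothetical check: is unattended-upgrades actually switched on?
auto_updates_on() {
  grep -q 'Unattended-Upgrade "1"' "$1"
}

# Usage: auto_updates_on /etc/apt/apt.conf.d/20auto-upgrades && echo "enabled"
```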
Node.js: Use NVM, Not apt#
I see this in every tutorial: sudo apt install nodejs. Don't do it.
Ubuntu's package repos ship ancient Node.js versions. Even the NodeSource repo lags behind. And when you need to switch between Node 20 and Node 22 for different projects, you're stuck.
NVM solves this:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
source ~/.bashrc
nvm install --lts
nvm alias default lts/*
Now verify:
node -v # v22.x.x or whatever LTS is current
npm -v
The non-obvious tip: when you install global packages with NVM (like PM2), they're tied to that Node version. If you switch versions with nvm use, your globals disappear. Set your default and stick with it on the server:
nvm alias default 22
This has bitten me exactly once. Once was enough.
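One more nvm habit worth having: pin the version per project with an .nvmrc file (a standard nvm feature), so `nvm use` in the project directory always resolves to the same major:

```shell
# In the project root: record the Node major version the app expects
echo "22" > .nvmrc

# On the server or your laptop:
#   nvm use        # reads .nvmrc and switches (nvm install first if it's missing)
```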
PM2: The Process Manager That Earns Its Keep#
PM2 is the difference between "deployed" and "production-ready." It handles process management, clustering, log rotation, auto-restart on crashes, and startup scripts. For free.
Install and Set Up#
npm install -g pm2
The Ecosystem Config#
Don't start apps with CLI flags. Use an ecosystem.config.js file. It's version-controlled, reproducible, and self-documenting.
// ecosystem.config.js
module.exports = {
apps: [
{
name: "akousa",
script: "node_modules/.bin/next",
args: "start -p 3002",
cwd: "/var/www/akousa.net",
instances: 2,
exec_mode: "cluster",
max_memory_restart: "500M",
env: {
NODE_ENV: "production",
PORT: 3002,
},
// Graceful shutdown
kill_timeout: 5000,
listen_timeout: 10000,
wait_ready: false,
// Logging
log_date_format: "YYYY-MM-DD HH:mm:ss Z",
error_file: "/var/log/pm2/akousa-error.log",
out_file: "/var/log/pm2/akousa-out.log",
merge_logs: true,
// Auto-restart on failure
autorestart: true,
max_restarts: 10,
min_uptime: "10s",
// Don't watch in production
watch: false,
},
],
};
Let me explain the choices that matter:
instances: 2 instead of "max": On a small VPS with 1-2 cores, "max" sounds smart but it'll spawn processes that fight for resources during builds. Two instances gives you zero-downtime reloads while leaving headroom. On a 4+ core machine, sure, use "max".
exec_mode: "cluster": This is what enables zero-downtime reloads. Without cluster mode, pm2 reload is just a fancy restart. With cluster mode, PM2 restarts instances one at a time — your app never goes fully offline.
max_memory_restart: "500M": Your Next.js app has a memory leak? PM2 will restart it before the kernel's OOM killer takes down processes for you. This has saved me from 2 AM alerts more than once.
kill_timeout: 5000: Gives your app 5 seconds to finish in-flight requests before PM2 force-kills it. The default (1600ms) is too aggressive for apps with database connections.
watch: false: I've seen people leave watch: true in production. PM2 then restarts the app every time a log file changes. Your app enters a restart loop. Don't.
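Once the file is in place, hand it to PM2. Because a syntax error in ecosystem.config.js fails in annoying ways, I'd sanity-check that it parses first — a sketch (the one-liner and helper name are mine; the path matches the config above):

```shell
# Hypothetical helper: confirm the ecosystem file is valid JS and defines at least one app
check_ecosystem() {
  node -e 'const c = require(process.argv[1]); if (!c.apps || !c.apps.length) process.exit(1); console.log(c.apps[0].name)' "$1"
}

# On the server:
#   check_ecosystem /var/www/akousa.net/ecosystem.config.js   # should print "akousa"
#   pm2 start /var/www/akousa.net/ecosystem.config.js
```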
Startup Script#
Make PM2 survive reboots:
pm2 startup systemd
# Copy and run the command it outputs
pm2 save
This generates a systemd service. After a server reboot, your app comes back automatically. Test it — reboot your server and verify. Don't assume.
Log Rotation#
Logs will eat your disk eventually. Install the rotation module:
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 50M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
50MB max per file, keep 7 rotated files, compress the old ones. Without this, I've seen /var/log fill a 25GB disk in three weeks on a moderately trafficked app.
Nginx: The Reverse Proxy That Does More Than You Think#
"Why not just expose Node.js directly on port 80?"
Because Nginx handles things Node.js shouldn't waste cycles on: SSL termination, static file serving, gzip compression, request buffering, connection limits, and graceful handling of slow clients. It's written in C and purpose-built for this.
Install#
sudo apt install nginx -y
The Config#
# /etc/nginx/sites-available/akousa.net
upstream node_app {
server 127.0.0.1:3002;
keepalive 64;
}
server {
listen 80;
listen [::]:80;
server_name akousa.net www.akousa.net;
# Redirect all HTTP to HTTPS
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
listen [::]:443 ssl http2;
server_name akousa.net www.akousa.net;
# SSL (managed by Certbot — these lines get added automatically)
ssl_certificate /etc/letsencrypt/live/akousa.net/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/akousa.net/privkey.pem;
include /etc/letsencrypt/options-ssl-nginx.conf;
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
# Gzip compression
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_min_length 256;
gzip_types
text/plain
text/css
text/javascript
application/javascript
application/json
application/xml
image/svg+xml
application/wasm;
# Proxy settings
location / {
proxy_pass http://node_app;
proxy_http_version 1.1;
# Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support (if you ever need it)
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts — generous but not infinite
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffering — let Nginx handle slow clients
proxy_buffering on;
proxy_buffer_size 16k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
}
# Next.js static assets — let Nginx serve them directly
location /_next/static/ {
alias /var/www/akousa.net/.next/static/;
expires 365d;
access_log off;
add_header Cache-Control "public, immutable";
}
# Public static files
location /static/ {
alias /var/www/akousa.net/public/static/;
expires 30d;
access_log off;
}
# Block access to dot files
location ~ /\. {
deny all;
access_log off;
log_not_found off;
}
}
Enable it:
sudo ln -s /etc/nginx/sites-available/akousa.net /etc/nginx/sites-enabled/
sudo rm /etc/nginx/sites-enabled/default
sudo nginx -t
sudo systemctl reload nginx
Always run nginx -t before reloading. I once pushed a broken config and took the site down because I skipped the syntax check. Those eight characters would have saved me thirty minutes of panicked debugging.
Things most tutorials miss in this config:
upstream block with keepalive 64: Nginx reuses connections to your Node.js backend instead of opening a new TCP connection for every request. This matters under load.
proxy_buffering on: Nginx reads the entire response from Node.js into memory, then sends it to the client at whatever speed the client can handle. Without this, a slow client on a 3G connection ties up your Node.js worker.
Serving _next/static/ directly: These are hashed, immutable assets. Let Nginx serve them from disk with a 365-day cache header. Your Node.js processes shouldn't be wasting time on this.
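After deploying, I'd spot-check that the static route is really answered with the long-lived cache header. A small sketch (the helper name is mine; the asset URL is whatever hashed file your build produced):

```shell
# Pull just the Cache-Control line out of response headers piped in on stdin
cache_header() {
  grep -i '^cache-control:' | head -1
}

# Usage:
#   curl -sI https://akousa.net/_next/static/css/whatever.css | cache_header
#   # expect something like: cache-control: public, immutable
```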
SSL in Five Minutes#
Let's Encrypt solved SSL. If you're still paying for certificates in 2026, stop.
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d akousa.net -d www.akousa.net
Certbot will ask for your email, accept the ToS, and automatically modify your Nginx config to include the SSL directives. That's it.
Verify Auto-Renewal#
Certbot installs a systemd timer that checks twice a day and renews certificates within 30 days of expiration:
sudo systemctl list-timers | grep certbot
Test that renewal works:
sudo certbot renew --dry-run
If the dry run passes, you'll never think about SSL again. If it fails, it's usually because port 80 is blocked (check your UFW rules) or Nginx isn't running.
One thing that caught me: if you set up Nginx before running Certbot, make sure your server block is listening on port 80 without the HTTPS redirect first. Certbot needs to reach port 80 for the HTTP-01 challenge. After Certbot runs successfully, then add the redirect.
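In practice that means starting from a bare HTTP block like this (a sketch of the pre-Certbot state; Certbot then rewrites it toward the full config shown earlier):

```nginx
# /etc/nginx/sites-available/akousa.net -- before running Certbot
server {
    listen 80;
    listen [::]:80;
    server_name akousa.net www.akousa.net;

    location / {
        proxy_pass http://127.0.0.1:3002;
        proxy_set_header Host $host;
    }
}
```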
The Deploy Script#
This is the script that runs every time I push to production. No CI/CD platform, no GitHub Actions. Just SSH and bash.
#!/bin/bash
# deploy.sh — zero-ish downtime deployment
set -euo pipefail
APP_DIR="/var/www/akousa.net"
APP_NAME="akousa"
LOG_FILE="/var/log/deploy.log"
HEALTH_URL="http://localhost:3002"
MAX_RETRIES=10
RETRY_INTERVAL=3
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log "=== Deploy started ==="
cd "$APP_DIR"
# Pull latest code
log "Pulling latest changes..."
git pull origin main 2>&1 | tee -a "$LOG_FILE"
# Install dependencies
log "Installing dependencies..."
npm install --legacy-peer-deps 2>&1 | tee -a "$LOG_FILE"
# Build
log "Building application..."
rm -rf .next
if ! npm run build 2>&1 | tee -a "$LOG_FILE"; then
log "ERROR: Build failed. Aborting deploy."
exit 1
fi
# Reload PM2 (zero-downtime in cluster mode)
log "Reloading PM2..."
pm2 reload "$APP_NAME" 2>&1 | tee -a "$LOG_FILE"
pm2 save 2>&1 | tee -a "$LOG_FILE"
# Health check with retries
log "Running health check..."
for i in $(seq 1 $MAX_RETRIES); do
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' "$HEALTH_URL" 2>/dev/null || echo "000")
if [ "$HTTP_CODE" = "200" ]; then
log "Health check passed (HTTP $HTTP_CODE)"
log "=== Deploy completed successfully ==="
exit 0
fi
log "Health check attempt $i/$MAX_RETRIES (HTTP $HTTP_CODE). Retrying in ${RETRY_INTERVAL}s..."
sleep $RETRY_INTERVAL
done
log "ERROR: Health check failed after $MAX_RETRIES attempts"
log "Rolling back to previous PM2 state..."
pm2 restart "$APP_NAME" 2>&1 | tee -a "$LOG_FILE"
exit 1
Make it executable:
chmod +x deploy.sh
Deploy from your local machine (as the deploy user — remember, root login is disabled):
ssh deploy@your-server-ip "bash /var/www/akousa.net/deploy.sh"
Key decisions in this script:
set -euo pipefail: The script exits immediately on any error. Without this, a failed npm install silently continues into the build step, and you get a cryptic error that takes 20 minutes to debug.
rm -rf .next before building: Next.js has a build cache that occasionally produces stale output. I got bit by this once — a page showed old content despite the source code being updated. Nuking the build directory adds maybe 15 seconds to the build but guarantees fresh output.
pm2 reload instead of pm2 restart: This is the zero-downtime part. In cluster mode, reload performs a rolling restart — it brings up new instances with the updated code, waits for them to be ready, then gracefully shuts down old ones. At no point are zero instances running.
Health check with retries: Next.js takes a few seconds to warm up after restart. The script waits up to 30 seconds (10 retries × 3 seconds), checking if the app responds with HTTP 200. If it doesn't, something is wrong and you need to know immediately — not find out from a user.
Rollback on failure: If the health check fails after all retries, the script restarts PM2 (which loads the last saved state). It's not a perfect rollback, but it's better than leaving the server in a broken state.
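A commit-pinned rollback closes that gap. A sketch of the idea (the helper names are mine, not from the original script):

```shell
#!/bin/bash
set -euo pipefail

# Remember which commit is live before pulling new code
record_deployed_commit() {
  git -C "$1" rev-parse HEAD
}

# Hard-reset the working tree back to a known-good commit
rollback_to() {
  git -C "$1" reset --hard "$2" >/dev/null
}

# In deploy.sh you would call:
#   PREV=$(record_deployed_commit "$APP_DIR")
#   ...pull / build / reload / health check...
#   on failure: rollback_to "$APP_DIR" "$PREV" && npm run build && pm2 reload "$APP_NAME"
```

The rebuild after the reset is the slow part; pinning the commit is what makes the rollback deterministic.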
When Things Break at 2 AM#
Here's what I've actually debugged on this exact setup:
"The site is down"#
First commands to run:
pm2 status
pm2 logs akousa --lines 50
sudo systemctl status nginx
sudo tail -50 /var/log/nginx/error.log
Nine times out of ten, pm2 logs tells you immediately what happened. A missing environment variable, a failed database connection, or an unhandled promise rejection.
"Memory keeps growing"#
pm2 monit
This gives you a live dashboard of CPU and memory per process. If memory climbs steadily without leveling off, you have a leak. The max_memory_restart setting in your ecosystem config is your safety net — PM2 will restart the process before it takes down the server.
For deeper investigation:
pm2 describe akousa
This shows uptime, restart count, and memory snapshots. If you see 47 restarts in the last 24 hours, that's your hint.
"SSL certificate expired"#
sudo certbot certificates
Lists all certificates with their expiration dates. If auto-renewal failed:
sudo certbot renew --force-renewal
sudo systemctl reload nginx
"Disk space is full"#
df -h
du -sh /var/log/*
pm2 flush
pm2 flush clears all PM2 log files immediately. If you didn't set up log rotation (I told you), this is where you feel the pain.
The Command I Run Every Morning#
ssh deploy@akousa.net "pm2 status && df -h / && uptime"
Three things in one line: are my processes running, is my disk okay, is the server overloaded. Takes two seconds. Catches problems before users do.
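When I eventually want that check unattended, the disk part is the easiest to script and cron. A sketch (the thresholds and function names are arbitrary):

```shell
# Percentage of the given filesystem in use (df -P keeps the output parseable)
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Shout only when it matters
check_disk() {
  local pct
  pct=$(disk_used_pct /)
  if [ "$pct" -ge 90 ]; then
    echo "ALERT: root filesystem at ${pct}%"
  else
    echo "OK: root filesystem at ${pct}%"
  fi
}
```

Drop `check_disk` into a cron job that mails or pings you on the ALERT branch and the morning ritual takes care of itself.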
What Most Guides Won't Tell You#
Your build step is your biggest vulnerability. On a 1GB RAM VPS, npm run build for a Next.js app can consume 800MB+ of memory. If PM2 is running your app in two instances during the build, you'll OOM. Solutions: use a swap file (at least 2GB), or stop the app during builds and accept a few seconds of downtime. I use swap.
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
--legacy-peer-deps in your install command is a code smell, not a solution. I use it because some packages in my dependency tree haven't updated their peer dependency ranges. Every few months I try removing it. Someday it'll work. Until then, I ship.
Test your deploy script from scratch. Clone your repo on a fresh server and run every step manually. The number of "works on my machine" issues that hide in deploy scripts is embarrassing. I found three issues in mine when I did this — missing global packages, wrong file permissions, and a path that only existed because of a previous manual setup.
Put your server's IP in your SSH config. Stop typing IP addresses:
# ~/.ssh/config
Host akousa
HostName 69.62.66.94
User deploy
IdentityFile ~/.ssh/id_ed25519
Now ssh akousa is all you need. Small things compound.
The Full Checklist#
Before you call it done:
- Non-root user with sudo access
- SSH key auth only, password auth disabled
- UFW enabled with only necessary ports open
- Fail2Ban protecting SSH
- Unattended security upgrades enabled
- Node.js installed via NVM
- PM2 running your app in cluster mode
- PM2 startup script configured (survives reboot)
- PM2 log rotation installed
- Nginx reverse proxy with proper headers
- SSL via Let's Encrypt with auto-renewal
- Deploy script with health checks
- Swap file configured (for build headroom)
- Tested: reboot the server and verify everything comes back
That last item is the one people skip. Don't be that person. Reboot the server, wait 60 seconds, and check if your app is live. If it isn't, your startup scripts are misconfigured and you'll find out at the worst possible time.
Is This "Enterprise-Grade"?#
No. And that's the point.
This setup serves this blog reliably for under $10/month. It's deployed in 30 seconds with a single command. I understand every piece of it. When something breaks, I know exactly where to look.
Could I use Docker? Sure. Could I use Kubernetes? Technically. Could I set up a full CI/CD pipeline with staging environments and canary deployments? Absolutely.
But I've learned that the best infrastructure is the one you actually understand, can debug at 2 AM, and doesn't cost more than the project earns. For a personal site, a SaaS MVP, or a small startup — this is that setup.
Ship first. Scale when you need to. And always, always, test your deploy script on a fresh server.