Agent Skills: Replit Load & Scale

ID: jeremylongshore/claude-code-plugins-plus-skills/replit-load-scale

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/replit-pack/skills/replit-load-scale

Skill Files

Browse the full folder contents for replit-load-scale.


plugins/saas-packs/replit-pack/skills/replit-load-scale/SKILL.md

Skill Metadata

Name
replit-load-scale
Description
Load testing, scaling strategies, and capacity planning for Replit deployments.

Replit Load & Scale

Overview

Load testing, scaling strategies, and capacity planning for Replit deployments. Covers Autoscale behavior tuning, Reserved VM right-sizing, cold start optimization, database connection scaling, and capacity benchmarking.

Prerequisites

  • Replit app deployed (Autoscale or Reserved VM)
  • Load testing tool: k6, autocannon, or curl
  • Health endpoint implemented

Replit Scaling Model

| Deployment Type | Scaling Behavior | Cold Start | Best For |
|-----------------|------------------|------------|----------|
| Autoscale | 0 to N instances based on traffic | Yes (5-30s) | Variable traffic |
| Reserved VM | Fixed resources, always-on | No | Consistent traffic |
| Static | CDN-backed, infinite scale | No | Frontend assets |

Instructions

Step 1: Baseline Benchmark

# Quick benchmark with autocannon (built into Node.js ecosystem)
npx autocannon -c 10 -d 30 https://your-app.replit.app/health
# -c 10: 10 concurrent connections
# -d 30: 30 seconds duration

# Output shows:
# - Requests/sec
# - Latency (p50, p95, p99)
# - Throughput (bytes/sec)
# - Error count
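The p50/p95/p99 figures autocannon prints are latency percentiles. If you post-process your own measurements, the math behind them is a short nearest-rank computation; the helper names below are illustrative, not part of autocannon's API:

```javascript
// percentile.js — compute the latency percentiles autocannon reports
// (p50/p95/p99) from raw latency samples, using the nearest-rank method.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: index of the p-th percentile in the sorted list
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

function summarize(latenciesMs) {
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
  };
}
```

Note that p99 needs at least ~100 samples to be meaningful; a 30-second run at 10 connections gives plenty.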

Step 2: Load Test with k6

// load-test.js — comprehensive Replit load test
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const coldStartTrend = new Trend('cold_start_duration');

export const options = {
  stages: [
    { duration: '1m', target: 5 },    // Warm up
    { duration: '3m', target: 20 },   // Normal load
    { duration: '2m', target: 50 },   // Peak load
    { duration: '1m', target: 0 },    // Cool down
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],  // 95% of requests under 2s
    errors: ['rate<0.05'],              // Error rate under 5%
  },
};

const BASE_URL = __ENV.DEPLOY_URL || 'https://your-app.replit.app';

export default function () {
  // Health check
  const healthRes = http.get(`${BASE_URL}/health`);
  check(healthRes, {
    'health returns 200': (r) => r.status === 200,
    'health under 1s': (r) => r.timings.duration < 1000,
  });
  errorRate.add(healthRes.status !== 200);

  // Detect cold start
  if (healthRes.timings.duration > 5000) {
    coldStartTrend.add(healthRes.timings.duration);
  }

  // API endpoint
  const apiRes = http.get(`${BASE_URL}/api/status`);
  check(apiRes, {
    'api returns 200': (r) => r.status === 200,
  });

  sleep(1);
}

# Run k6 load test
k6 run --env DEPLOY_URL=https://your-app.replit.app load-test.js

# With JSON output
k6 run --out json=results.json load-test.js

Step 3: Cold Start Optimization (Autoscale)

Autoscale cold starts happen when:
- First request after period of no traffic
- Replit needs to start a new container instance
- Typical: 5-30 seconds depending on app size

Reduction strategies:
1. Minimize startup imports (lazy-load heavy modules)
2. Use smaller Nix dependency set
3. Pre-connect database in background (don't block startup)
4. Keep package count low
5. Use compiled JavaScript (not tsx at runtime)

Before (slow cold start):
  run = "npx tsx src/index.ts"  → compiles TS at startup

After (fast cold start):
  build = "npm run build"  → compiles during deploy
  run = "node dist/index.js"  → runs pre-compiled JS

# .replit — optimized for fast cold start
[deployment]
build = ["sh", "-c", "npm ci --production && npm run build"]
run = ["sh", "-c", "node dist/index.js"]
deploymentTarget = "autoscale"
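Strategy 1 (lazy-load heavy modules) can be sketched as a small memoizer: startup stays fast because the heavy import runs on first use, not during the cold start. The module name in the usage comment is hypothetical:

```javascript
// lazy.js — memoize an async loader so a heavy module is imported once,
// on the first request that needs it, instead of at cold start.
function lazy(loader) {
  let promise = null;
  return () => {
    // Cache the promise itself so concurrent first calls share one load
    if (promise === null) promise = loader();
    return promise;
  };
}

// Usage (hypothetical module): nothing is imported until first use.
// const getPdfLib = lazy(() => import('heavy-pdf-lib'));
// app.post('/export', async (req, res) => { const pdf = await getPdfLib(); ... });
```

Caching the promise (rather than the resolved value) also avoids a double-load race when two requests arrive during the first import.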

Step 4: Reserved VM Sizing

Choose VM size based on load test results:

If peak CPU < 30% → downsize (save money)
If peak CPU > 70% → upsize (prevent throttling)
If peak memory > 80% → upsize (prevent OOM)

Machine sizes:
  0.25 vCPU / 512 MB  → Simple APIs, < 50 req/s
  0.5 vCPU / 1 GB     → Standard apps, < 200 req/s
  1 vCPU / 2 GB       → Moderate traffic, < 500 req/s
  2 vCPU / 4 GB       → High traffic, < 1000 req/s
  4 vCPU / 8-16 GB    → Compute-heavy, > 1000 req/s

To change:
  Deployment Settings > Machine Size > Select new tier
  Redeployment required to apply
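The sizing rules above reduce to a small decision function; memory is checked first since exceeding it causes OOM kills rather than mere throttling:

```javascript
// Sizing heuristic from the thresholds above: decide whether to resize a
// Reserved VM based on peak readings from a load test (percent, 0-100).
function sizingRecommendation(peakCpuPct, peakMemPct) {
  if (peakMemPct > 80) return 'upsize';   // prevent OOM kills
  if (peakCpuPct > 70) return 'upsize';   // prevent CPU throttling
  if (peakCpuPct < 30) return 'downsize'; // paying for idle headroom
  return 'keep';
}
```

Feed it the peaks from the Step 2 load test, not averages; averages hide the bursts that trigger throttling.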

Step 5: Database Connection Scaling

// Tune PostgreSQL pool for Replit container limits
import { Pool } from 'pg';

// Small container (0.25 vCPU / 512 MB)
const smallPool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 3,                    // Few connections
  idleTimeoutMillis: 10000,  // Release quickly
});

// Medium container (1 vCPU / 2 GB)
const mediumPool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 10,                   // More headroom
  idleTimeoutMillis: 30000,
});

// Large container (4 vCPU / 8 GB)
const largePool = new Pool({
  connectionString: process.env.DATABASE_URL,
  ssl: { rejectUnauthorized: false },
  max: 20,
  idleTimeoutMillis: 60000,
});

// Dynamic pool sizing — rough heuristic: process RSS approximates which
// container tier you're on but is not the actual limit; prefer hardcoding
// max per tier if you know your machine size
function createOptimalPool(): Pool {
  const memMB = Math.round(process.memoryUsage().rss / 1024 / 1024);
  const maxConns = memMB < 256 ? 3 : memMB < 1024 ? 10 : 20;

  return new Pool({
    connectionString: process.env.DATABASE_URL,
    ssl: { rejectUnauthorized: false },
    max: maxConns,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
  });
}
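Strategy 3 from Step 3 (pre-connect in the background) pairs naturally with these pools: start serving immediately and establish the first connection off the critical path. A sketch, written against any pool exposing `query()` so it is decoupled from pg:

```javascript
// Background warm-up: fire the first query without blocking startup, so
// the first real request doesn't pay the connection-handshake cost.
function warmUpPool(pool, log = console) {
  // Fire-and-forget from the caller's perspective — never await this
  // during startup; resolves true/false so tests can observe the result.
  return pool
    .query('SELECT 1')
    .then(() => { log.info('db pool warmed'); return true; })
    .catch((err) => { log.warn(`db warm-up failed: ${err.message}`); return false; });
}

// Usage sketch: const pool = createOptimalPool(); warmUpPool(pool);
// server.listen(port); // listen() is not gated on the DB being up
```

Swallowing the warm-up error is deliberate: a transient DB hiccup at cold start should degrade to a slower first query, not crash the instance.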

Step 6: Capacity Planning Template

## Capacity Assessment

### Current State
- Deployment type: [Autoscale / Reserved VM]
- Machine size: [vCPU / RAM]
- Peak RPS: [from load test]
- P95 latency: [from load test]
- Cold start time: [Autoscale only]

### Load Test Results
| Metric | Idle | Normal (20 VU) | Peak (50 VU) |
|--------|------|----------------|--------------|
| RPS | 0 | X | Y |
| P50 latency | - | Xms | Yms |
| P95 latency | - | Xms | Yms |
| Error rate | - | X% | Y% |
| Memory | XMB | XMB | XMB |

### Recommendations
1. [Scale action based on results]
2. [Database pool adjustment]
3. [Cold start mitigation]
4. [Cost optimization]

### Scaling Triggers
- CPU > 70% sustained: upgrade VM
- Memory > 80%: upgrade VM or fix leak
- P95 > 2s: add caching or optimize queries
- Error rate > 1%: investigate root cause

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Cold start > 15s | Heavy startup | Pre-compile, lazy imports |
| Connection pool exhausted | Too many concurrent requests | Increase pool.max or add queueing |
| OOM during load test | Memory leak under load | Profile with /debug/memory |
| Inconsistent results | Autoscale scaling up | Warm up before measuring |

Next Steps

For reliability patterns, see replit-reliability-patterns.