Sentry Load & Scale
Configure Sentry for applications processing 1M+ requests/day without sacrificing error visibility, burning through quota, or adding measurable SDK overhead. Covers adaptive sampling, connection pooling, multi-region tagging, quota management, SDK benchmarking, batch submission, load testing, and self-hosted deployment considerations.
Prerequisites
- Application handling sustained high traffic (>10K requests/min or >1M events/day)
- Sentry organization with quota and billing access (Settings > Subscription)
- @sentry/node v8+ installed (npm ls @sentry/node)
- Performance baseline established (p50/p95/p99 latency without Sentry)
- Event volume estimates calculated per category (errors, transactions, replays, attachments)
Instructions
Step 1 — Implement Adaptive Sampling
Static tracesSampleRate wastes quota at scale because it treats a health check the same as a checkout. Replace it with a traffic-aware tracesSampler that adjusts rates based on endpoint criticality and current load.
Traffic-aware tracesSampler:
import * as Sentry from '@sentry/node';
// Track request volume per endpoint for adaptive rate adjustment
const endpointVolume = new Map<string, { count: number; resetAt: number }>();
const WINDOW_MS = 60_000;
function getAdaptiveRate(name: string, baseRate: number): number {
const now = Date.now();
let entry = endpointVolume.get(name);
if (!entry || now > entry.resetAt) {
entry = { count: 0, resetAt: now + WINDOW_MS };
endpointVolume.set(name, entry);
}
entry.count++;
// Scale down sampling as volume increases within window
// 0-100 req/min: full base rate
// 100-1000: halve it
// 1000+: quarter it
if (entry.count > 1000) return baseRate * 0.25;
if (entry.count > 100) return baseRate * 0.5;
return baseRate;
}
Sentry.init({
dsn: process.env.SENTRY_DSN,
tracesSampler: (samplingContext) => {
const { name, parentSampled } = samplingContext;
// Always respect parent decision for distributed tracing consistency
if (parentSampled !== undefined) return parentSampled ? 1.0 : 0;
// Tier 0: Never sample — high-frequency, zero diagnostic value
if (name?.match(/\/(health|ready|alive|ping|metrics|favicon)/)) return 0;
if (name?.match(/\.(css|js|png|jpg|svg|woff2?|ico)$/)) return 0;
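// Tier 1: Business-critical. Payment and checkout are low volume, so always sample them
if (name?.includes('/payment') || name?.includes('/checkout')) return 1.0;
// Login is also critical but higher volume: start at 50% and scale down under load
if (name?.includes('/auth/login')) return getAdaptiveRate('auth', 0.5);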
// Tier 2: Moderate sampling — API mutations (higher signal)
if (name?.startsWith('POST /api/')) return getAdaptiveRate(name, 0.05);
if (name?.startsWith('PUT /api/')) return getAdaptiveRate(name, 0.05);
if (name?.startsWith('DELETE /api/')) return getAdaptiveRate(name, 0.05);
// Tier 3: Light sampling — API reads
if (name?.startsWith('GET /api/')) return getAdaptiveRate(name, 0.02);
// Tier 4: Background jobs — sample sparingly
if (name?.startsWith('job:') || name?.startsWith('queue:')) {
return getAdaptiveRate(name, 0.01);
}
// Tier 5: Everything else — minimal baseline
return getAdaptiveRate(name || 'default', 0.005);
},
});
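The window logic in getAdaptiveRate is easier to verify when the clock is injected. The following is a dependency-free sketch with the same thresholds; the factory name createAdaptiveRate and its now parameter are illustrative and not part of the Sentry SDK.

```typescript
// Hypothetical test-friendly variant of getAdaptiveRate: same thresholds,
// but the clock is injected so a window reset can be simulated instantly.
type Clock = () => number;

function createAdaptiveRate(now: Clock, windowMs: number = 60_000) {
  const volume = new Map<string, { count: number; resetAt: number }>();

  return function adaptiveRate(name: string, baseRate: number): number {
    const t = now();
    let entry = volume.get(name);
    // Start a fresh window on first sight or once the old window has expired
    if (!entry || t > entry.resetAt) {
      entry = { count: 0, resetAt: t + windowMs };
      volume.set(name, entry);
    }
    entry.count++;

    if (entry.count > 1000) return baseRate * 0.25;
    if (entry.count > 100) return baseRate * 0.5;
    return baseRate;
  };
}

// Simulate one busy window followed by the first request of the next window
let fakeNow = 0;
const rateFor = createAdaptiveRate(() => fakeNow);

const observed: number[] = [];
for (let i = 1; i <= 1001; i++) {
  const rate = rateFor('GET /api/products', 0.02);
  if (i === 100 || i === 101 || i === 1001) observed.push(rate);
}
fakeNow = 60_001; // past resetAt, so the counter starts over
observed.push(rateFor('GET /api/products', 0.02));

console.log(observed); // [ 0.02, 0.01, 0.005, 0.02 ]
```

Request 100 still gets the base rate, request 101 is halved, request 1001 is quartered, and the rate returns to the base value once the window rolls over.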
Adaptive error deduplication with beforeSend:
// Thin out repeated errors within a 60-second window while always sending the first occurrence
// (a key that repeats 100 times in one window is reduced by more than 90%).
// Sentry.init takes a single beforeSend, so merge this with any other beforeSend logic you use.
const errorCounts = new Map<string, number>();
const ERROR_WINDOW_MS = 60_000;
// unref() stops this timer from keeping the process alive during shutdown
setInterval(() => errorCounts.clear(), ERROR_WINDOW_MS).unref();
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  beforeSend(event, hint) {
    const error = hint?.originalException;
    const key = error instanceof Error
      ? `${error.name}:${error.message?.substring(0, 100)}`
      : `unknown:${String(event.message || '').substring(0, 100)}`;
    const count = (errorCounts.get(key) || 0) + 1;
    errorCounts.set(key, count);
    // First occurrence: always send with full context
    if (count === 1) return event;
    // 2-10: send every 5th (capture ramp-up pattern)
    if (count <= 10) return count % 5 === 0 ? event : null;
    // 11-100: send every 25th (confirm still happening)
    if (count <= 100) return count % 25 === 0 ? event : null;
    // 101+: send every 100th (volume indicator only)
    return count % 100 === 0 ? event : null;
  },
});
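To see what the thinning rule does to volume, it helps to pull the decision out of beforeSend into a pure function and replay a burst through it. This is a sketch only; shouldSend and sentOutOf are illustrative names, and the thresholds are copied from the beforeSend example above.

```typescript
// Pure version of the thinning rule: given the running count for one error key
// within the current window, decide whether the event would be sent.
function shouldSend(count: number): boolean {
  if (count === 1) return true;              // first occurrence
  if (count <= 10) return count % 5 === 0;   // 5th and 10th
  if (count <= 100) return count % 25 === 0; // 25th, 50th, 75th, 100th
  return count % 100 === 0;                  // every 100th after that
}

// Count how many events survive a burst of `total` identical errors
function sentOutOf(total: number): number {
  let sent = 0;
  for (let count = 1; count <= total; count++) {
    if (shouldSend(count)) sent++;
  }
  return sent;
}

for (const total of [10, 100, 1000]) {
  const sent = sentOutOf(total);
  const dropped = (1 - sent / total) * 100;
  console.log(`${total} occurrences -> ${sent} sent (${dropped.toFixed(1)}% dropped)`);
}
// 10 occurrences -> 3 sent (70.0% dropped)
// 100 occurrences -> 7 sent (93.0% dropped)
// 1000 occurrences -> 16 sent (98.4% dropped)
```

With these thresholds the 90%+ reduction applies to errors that repeat heavily within a window; a short burst of 10 is thinned by 70%.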
Step 2 — Optimize SDK for Minimal Overhead
At high throughput, every byte and every millisecond of SDK processing matters. This configuration reduces memory footprint, payload size, and CPU time.
Lean SDK initialization:
import * as Sentry from '@sentry/node';
import os from 'node:os';
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV || 'production',
release: `${process.env.SERVICE_NAME}@${process.env.VERSION || 'unknown'}`,
// --- Memory reduction ---
maxBreadcrumbs: 15, // Down from 100 default; saves ~85KB/scope
maxValueLength: 200, // Truncate long string values
// --- Disable high-overhead integrations ---
integrations: (defaults) => defaults.filter(i =>
!['Console', 'ContextLines'].includes(i.name)
),
// --- No profiling at high scale (use dedicated APM if needed) ---
profilesSampleRate: 0,
// --- Transport tuning for high-throughput ---
transportOptions: {
bufferSize: 100, // Default 64; absorbs traffic spikes
},
// --- Context size limiter ---
beforeSend(event) {
// Truncate oversized contexts to prevent payload bloat
if (event.contexts) {
for (const [key, ctx] of Object.entries(event.contexts)) {
const str = JSON.stringify(ctx);
if (str.length > 2000) {
event.contexts[key] = { _truncated: true, originalSize: str.length };
}
}
}
// Strip headers that add bulk without diagnostic value
if (event.request?.headers) {
const keep = ['content-type', 'accept', 'user-agent', 'x-request-id'];
event.request.headers = Object.fromEntries(
Object.entries(event.request.headers)
.filter(([k]) => keep.includes(k.toLowerCase()))
);
}
return event;
},
// --- Multi-region tags for infrastructure visibility ---
serverName: process.env.HOSTNAME || process.env.POD_NAME || os.hostname(),
initialScope: {
tags: {
region: process.env.AWS_REGION || process.env.GCP_REGION || 'unknown',
cluster: process.env.K8S_CLUSTER || 'default',
pod: process.env.POD_NAME || 'unknown',
service: process.env.SERVICE_NAME || 'unknown',
},
},
});
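The overview lists connection pooling, which the initialization above does not configure. As a hedged sketch: the Node transport in @sentry/node accepts a keepAlive flag inside transportOptions so that sockets to the ingest endpoint are reused. The helper buildTransportOptions and the SENTRY_BUFFER_SIZE and SENTRY_KEEPALIVE variables are assumptions made for illustration; confirm the option against the SDK version you run.

```typescript
// Hypothetical helper: derives transport settings from environment-style input.
// The returned object is intended to be passed as `transportOptions` to Sentry.init.
interface TransportTuning {
  bufferSize: number; // max envelopes queued while requests are in flight
  keepAlive: boolean; // reuse connections to the ingest endpoint
}

function buildTransportOptions(env: Record<string, string | undefined>): TransportTuning {
  const parsed = Number.parseInt(env['SENTRY_BUFFER_SIZE'] ?? '', 10);
  return {
    // Fall back to the buffer size used in the lean init when unset or invalid
    bufferSize: Number.isFinite(parsed) && parsed > 0 ? parsed : 100,
    // Enabled unless explicitly switched off
    keepAlive: env['SENTRY_KEEPALIVE'] !== 'false',
  };
}

const tuning = buildTransportOptions({ SENTRY_BUFFER_SIZE: '200' });
console.log(tuning); // { bufferSize: 200, keepAlive: true }
```

In the init call this would replace the literal object, for example transportOptions: buildTransportOptions(process.env).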
Graceful shutdown ensuring event delivery:
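import * as Sentry from '@sentry/node';

// Assumes `server` is your application's http.Server (e.g. the value returned by app.listen())
let shuttingDown = false;
async function shutdown(signal: string) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, flushing Sentry events`);
  // Stop accepting new connections; in-flight requests have until the flush below completes
  server.close();
  // Flush all pending events (2s timeout prevents hanging deploys)
  const flushed = await Sentry.close(2000);
  if (!flushed) {
    console.warn('Sentry flush timed out; some events may be lost');
  }
  process.exit(0);
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));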
Step 3 — Manage Quotas, Test Under Load, and Plan for Scale
Quota management and reserved volume pricing:
Application: 10M requests/day, 0.1% error rate, @sentry/node v8
Error events (with adaptive beforeSend):
Raw errors: 10M x 0.001 = 10,000/day
After dedup: ~1,000/day (90% reduction) = 30K/month
Transaction events (with tiered tracesSampler):
Health/static: 0% of 4M = 0
Payment (T1): 100% of 5K = 5,000/day
POST API (T2): 5% of 500K = 25,000/day
GET API (T3): 2% of 5M = 100,000/day
Other (T5): 0.5% of 500K = 2,500/day
Total: ~132K/day = 4M/month
Sentry Team plan ($26/mo base; confirm against current pricing):
Errors: 30K included in base plan
Transactions: 100K included, overage 3.9M x $0.000025 = ~$97/mo
Estimated total: ~$123/month for 10M requests/day
Reserved volume (if predictable traffic):
5M txns/mo reserved = $80/mo (vs $97 on-demand)
Saves ~$17/mo, locks in price for 12 months
→ Total: ~$106/month
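The estimate above can be reproduced with a few lines of arithmetic, which makes it easy to plug in your own traffic mix. This sketch uses only the figures from the worked example (tier volumes, sample rates, $26 base, 100K included transactions, $0.000025 overage); the constant and variable names are illustrative.

```typescript
interface Tier {
  label: string;
  requestsPerDay: number;
  sampleRate: number;
}

// Traffic mix and base sample rates from the worked example
const tiers: Tier[] = [
  { label: 'Health/static', requestsPerDay: 4_000_000, sampleRate: 0 },
  { label: 'Payment (T1)', requestsPerDay: 5_000, sampleRate: 1.0 },
  { label: 'POST API (T2)', requestsPerDay: 500_000, sampleRate: 0.05 },
  { label: 'GET API (T3)', requestsPerDay: 5_000_000, sampleRate: 0.02 },
  { label: 'Other (T5)', requestsPerDay: 500_000, sampleRate: 0.005 },
];

const DAYS_PER_MONTH = 30;
const BASE_PLAN_USD = 26;
const INCLUDED_TXNS = 100_000;
const OVERAGE_USD_PER_TXN = 0.000025;

const txnsPerDay = tiers.reduce((sum, t) => sum + t.requestsPerDay * t.sampleRate, 0);
const txnsPerMonth = txnsPerDay * DAYS_PER_MONTH;
const overageUsd = Math.max(0, txnsPerMonth - INCLUDED_TXNS) * OVERAGE_USD_PER_TXN;
const totalUsd = BASE_PLAN_USD + overageUsd;

console.log(`Transactions/day:   ${Math.round(txnsPerDay)}`);   // 132500
console.log(`Transactions/month: ${Math.round(txnsPerMonth)}`); // 3975000
console.log(`Overage:            ~$${Math.round(overageUsd)}`); // ~$97
console.log(`Estimated total:    ~$${Math.round(totalUsd)}`);   // ~$123
```

The calculation uses the base sample rates only. Because getAdaptiveRate lowers rates as per-endpoint volume climbs, real transaction counts should land at or below this figure.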
SDK overhead benchmarks:
import * as Sentry from '@sentry/node';
import { performance } from 'node:perf_hooks';

// Measure SDK initialization cost
const initStart = performance.now();
Sentry.init({ /* ... */ });
const initMs = performance.now() - initStart;
console.log(`Sentry.init: ${initMs.toFixed(1)}ms`);
// Expected: 5-15ms (Node.js), acceptable <50ms

// Measure per-request overhead with Sentry vs without.
// `handleRequest` stands in for your own request handler.
async function benchmarkOverhead(iterations: number = 1000) {
  const request = { path: '/api/test', method: 'GET' };

  // Warm-up pass so JIT compilation does not inflate the baseline
  for (let i = 0; i < 100; i++) await handleRequest(request);

  // Baseline: request without Sentry instrumentation
  const baseStart = performance.now();
  for (let i = 0; i < iterations; i++) {
    await handleRequest(request);
  }
  const baseMs = (performance.now() - baseStart) / iterations;

  // Instrumented: request with Sentry span
  const sentryStart = performance.now();
  for (let i = 0; i < iterations; i++) {
    await Sentry.startSpan(
      { name: 'GET /api/test', op: 'http.server' },
      () => handleRequest(request)
    );
  }
  const sentryMs = (performance.now() - sentryStart) / iterations;

  const overheadMs = sentryMs - baseMs;
  console.log(`Baseline: ${baseMs.toFixed(3)}ms/req`);
  console.log(`With Sentry: ${sentryMs.toFixed(3)}ms/req`);
  console.log(`Overhead: ${overheadMs.toFixed(3)}ms (${((overheadMs / baseMs) * 100).toFixed(1)}%)`);
  // Healthy: <0.5ms overhead per request, <2% CPU impact
}
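The benchmark above reports a mean, while the prerequisites ask for a p50/p95/p99 baseline. If you record one duration per request instead of a single total, a nearest-rank percentile is enough to compare tails with and without Sentry. A minimal sketch follows; percentile and summarize are illustrative names and the sample data is synthetic.

```typescript
// Nearest-rank percentile over a list of latency samples (milliseconds)
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic first, then divide, to avoid floating-point drift in the rank
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)]!;
}

function summarize(samples: readonly number[]) {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Synthetic samples: 1ms, 2ms, ... 100ms
const latencies = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(summarize(latencies)); // { p50: 50, p95: 95, p99: 99 }
```

Running summarize on the baseline and instrumented samples separately gives a per-percentile overhead rather than one averaged number.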
Load testing Sentry integration with k6:
// k6-sentry-load-test.js
// Run: k6 run k6-sentry-load-test.js
// (VUs and duration come from the stages below; --vus/--duration flags would override them)
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
const errorRate = new Rate('sentry_errors_captured');
const latencyOverhead = new Trend('sentry_latency_overhead_ms');
export const options = {
stages: [
{ duration: '1m', target: 50 }, // Ramp up
{ duration: '3m', target: 200 }, // Sustained load
{ duration: '1m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // p95 under 500ms with Sentry
sentry_latency_overhead_ms: ['p(95)<5'], // Sentry adds <5ms at p95
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
// Normal traffic: API reads (high volume, low sample rate)
const readRes = http.get(`${BASE_URL}/api/products`);
check(readRes, { 'GET 200': (r) => r.status === 200 });
// Track overhead via server timing header (if exposed)
const sentryMs = readRes.headers['Server-Timing']?.match(/sentry;dur=(\d+\.?\d*)/);
if (sentryMs) latencyOverhead.add(parseFloat(sentryMs[1]));
// Occasional writes (lower volume, higher sample rate)
if (Math.random() < 0.1) {
const writeRes = http.post(`${BASE_URL}/api/orders`, JSON.stringify({
items: [{ sku: 'TEST-001', qty: 1 }],
}), { headers: { 'Content-Type': 'application/json' } });
check(writeRes, { 'POST 201': (r) => r.status === 201 });
}
// Trigger an error path under load. This Rate only records the HTTP response;
// confirm in Sentry that the corresponding events actually arrived.
if (Math.random() < 0.01) {
const errRes = http.get(`${BASE_URL}/api/nonexistent-route`);
errorRate.add(errRes.status === 404);
}
sleep(0.1);
}
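The k6 script reads a Server-Timing entry named sentry, which the application has to emit itself. How the duration is measured is left to the application; the sketch below only shows a header format that the script's regular expression can parse. Both helper names are hypothetical.

```typescript
// Hypothetical helper: formats one Server-Timing entry, e.g. "sentry;dur=1.50"
function formatServerTiming(metric: string, durationMs: number): string {
  return `${metric};dur=${durationMs.toFixed(2)}`;
}

// Same pattern the k6 script uses to read the value back
function parseSentryTiming(headerValue: string): number | null {
  const match = headerValue.match(/sentry;dur=(\d+\.?\d*)/);
  return match ? parseFloat(match[1]!) : null;
}

const headerValue = formatServerTiming('sentry', 1.5);
console.log(headerValue);                    // sentry;dur=1.50
console.log(parseSentryTiming(headerValue)); // 1.5
console.log(parseSentryTiming('db;dur=12')); // null
```

On the server, the formatted string would be set as the Server-Timing response header before the response is sent.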
Background worker batch patterns:
import * as Sentry from '@sentry/node';

// `Job` and `executeJob` are placeholders for your queue library's job type and handler.
// For queue workers processing millions of jobs/day
async function processJobBatch(jobs: Job[]) {
  // Group jobs for batch-level tracing instead of per-job spans
  return Sentry.startSpan(
    {
      name: `batch.${jobs[0]?.type || 'unknown'}`,
      op: 'queue.batch',
      attributes: { 'batch.size': jobs.length },
    },
    async () => {
      const results = { success: 0, failed: 0 };
      for (const job of jobs) {
        await Sentry.withScope(async (scope) => {
          scope.setTag('job.type', job.type);
          scope.setTag('job.queue', job.queue);
          scope.setContext('job', {
            id: job.id,
            attempts: job.attempts,
          });
          // Catch inside withScope so the tags and context above attach to the error event
          try {
            await executeJob(job);
            results.success++;
          } catch (error) {
            results.failed++;
            Sentry.captureException(error, {
              tags: { 'job.id': job.id },
              level: job.attempts >= 3 ? 'error' : 'warning',
            });
          }
        });
      }
      // Skip the ratio for an empty batch to avoid recording NaN
      if (jobs.length > 0) {
        Sentry.setMeasurement('batch.success_rate',
          results.success / jobs.length, 'ratio');
      }
      return results;
    }
  );
}

// Periodic flush for long-running workers (don't rely on process exit)
setInterval(async () => {
  await Sentry.flush(2000);
}, 30_000);
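Batch-level tracing assumes jobs arrive in bounded groups. If the queue delivers very large pulls, a small chunking helper keeps each batch span at a predictable size. This is a sketch; chunk and the batch size of 100 are illustrative choices, not something the queue or Sentry prescribes.

```typescript
// Split a list into consecutive groups of at most `size` items
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// 250 synthetic job ids split into batches of 100
const jobIds = Array.from({ length: 250 }, (_, i) => `job-${i}`);
const batches = chunk(jobIds, 100);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]
```

Each resulting batch can then be passed to processJobBatch in turn, so one span covers at most 100 jobs.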
Self-hosted Sentry for enterprise (>100M events/month):
Key tuning for self-hosted (docker-compose.override.yml on top of getsentry/self-hosted):
- Relay: RELAY_PROCESSING_MAX_RATE: 50000, RELAY_UPSTREAM_MAX_CONNECTIONS: 200
- Kafka: KAFKA_NUM_PARTITIONS: 32 (match to consumer count)
- Snuba: 4+ consumer replicas for Clickhouse ingestion parallelism
- Clickhouse: 16G+ RAM, dedicated SSD volumes
Self-hosted vs SaaS break-even:
SaaS at 100M events/month: ~$2,500/mo (Business plan + overage)
Self-hosted (3x r6g.2xlarge): ~$1,200/mo infra + $800/mo ops (0.25 FTE)
Break-even: ~50M events/month
→ Use SaaS up to 50M events; evaluate self-hosted above that
Output
- Adaptive error deduplication reducing duplicate error volume by 90%+ while preserving first-occurrence fidelity
- Traffic-aware tracesSampler with tiered rates (Tier 0 through Tier 5) adjusting dynamically based on endpoint volume
- SDK memory and CPU footprint minimized (15 breadcrumbs, truncated contexts, filtered headers)
- Connection pooling via persistent HTTPS agent for efficient event submission
- Multi-region infrastructure tags for filtering by region/cluster/pod in Sentry dashboard
- Cost model with reserved volume pricing showing $106/month for 10M requests/day
- k6 load test script validating Sentry overhead stays under 5ms at p95
- Batch job processing pattern with scope isolation and periodic flush
- Self-hosted vs SaaS break-even analysis for enterprise decision-making
Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Events silently dropped | SDK buffer full during traffic spike | Increase transportOptions.bufferSize to 200+, verify network to Sentry ingest |
| 429 rate limit from Sentry | Quota exhausted or spike protection triggered | Enable spike protection in Settings > Subscription, reduce sample rates |
| Memory growing linearly over time | Breadcrumb or scope accumulation | Reduce maxBreadcrumbs, verify withScope is used (not configureScope) |
| Lost events on deploy/restart | No Sentry.close() in shutdown handler | Add SIGTERM/SIGINT handlers calling Sentry.close(2000) |
| Distributed traces broken at scale | Mixed sampling decisions across services | Always check parentSampled first in tracesSampler |
| Clickhouse OOM on self-hosted | Insufficient memory for event volume | Allocate 16G+ RAM, increase Snuba consumer replicas |
| k6 shows >5ms Sentry overhead | Too many integrations or large payloads | Disable Console/ContextLines integrations, reduce maxValueLength |
| Quota burn from replay/attachments | Replays not rate-limited separately | Set replaysSessionSampleRate: 0.01 and replaysOnErrorSampleRate: 0.1 |
Examples
Minimal high-scale init (copy-paste ready):
import * as Sentry from '@sentry/node';
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
release: `${process.env.SERVICE_NAME}@${process.env.VERSION}`,
maxBreadcrumbs: 15,
maxValueLength: 200,
profilesSampleRate: 0,
tracesSampler: ({ name, parentSampled }) => {
if (parentSampled !== undefined) return parentSampled ? 1.0 : 0;
if (name?.match(/\/(health|ping|metrics)/)) return 0;
if (name?.includes('/payment')) return 1.0;
if (name?.startsWith('POST /api/')) return 0.05;
return 0.005;
},
});
Verify sampling is working as expected:
// Add to non-production environments temporarily
Sentry.init({
// ... config ...
tracesSampler: (ctx) => {
const rate = calculateRate(ctx); // your logic
if (process.env.DEBUG_SENTRY === 'true') {
console.log(`[sentry] ${ctx.name} → rate=${rate}`);
}
return rate;
},
});
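calculateRate above stands in for your own logic. One way to keep it checkable is to write it as a pure function of the sampling context and assert the expected rate for representative names. The sketch below mirrors the rules from the minimal init example; the transaction names in the table are made up.

```typescript
// Pure sampler mirroring the minimal high-scale init rules
function calculateRate(ctx: { name?: string; parentSampled?: boolean }): number {
  const { name, parentSampled } = ctx;
  if (parentSampled !== undefined) return parentSampled ? 1.0 : 0;
  if (name && /\/(health|ping|metrics)/.test(name)) return 0;
  if (name?.includes('/payment')) return 1.0;
  if (name?.startsWith('POST /api/')) return 0.05;
  return 0.005;
}

// Representative transaction names and the rate each should receive
const expectations: Array<[{ name?: string; parentSampled?: boolean }, number]> = [
  [{ name: 'GET /health' }, 0],
  [{ name: 'POST /api/payment/charge' }, 1.0],
  [{ name: 'POST /api/orders' }, 0.05],
  [{ name: 'GET /api/products' }, 0.005],
  [{ name: 'GET /health', parentSampled: true }, 1.0], // parent decision wins
  [{ name: 'POST /api/payment/charge', parentSampled: false }, 0],
];

const mismatches = expectations.filter(([ctx, rate]) => calculateRate(ctx) !== rate);
console.log(mismatches.length === 0 ? 'all rates match' : mismatches);
```

The last two rows show that an upstream sampling decision overrides the path-based tiers, including for health checks and payments.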
Resources
- Quota Management — spike protection, rate limits, reserved volume
- Sampling Configuration — tracesSampler API reference
- Transport Configuration — custom transport, buffer size
- Self-Hosted Sentry — installation and scaling guide
- Pricing Calculator — estimate costs by event volume
- SDK Performance Overhead — benchmarks and best practices
Next Steps
- Run the k6 load test against staging to establish your baseline Sentry overhead
- Set up Sentry Spike Protection (Settings > Subscription > Spike Protection) before going to production
- Configure server-side sampling rules in Sentry Dynamic Sampling (Project Settings > Performance) to complement client-side tracesSampler
- Create a Sentry dashboard with widgets for: events/hour by category, quota usage %, p95 SDK overhead
- Review the sentry-cost-tuning skill for detailed quota optimization strategies