# Bright Data Performance Tuning
## Overview
Optimize Bright Data scraping performance through connection pooling, response caching, concurrent request tuning, and smart product selection. Web Unlocker latency is typically 5-30s due to CAPTCHA solving; Scraping Browser sessions are 10-60s.
## Prerequisites
- Bright Data zone configured
- Understanding of async patterns
- Redis or file cache available (optional)
## Latency Benchmarks
| Product | P50 | P95 | P99 | Notes |
|---------|-----|-----|-----|-------|
| Web Unlocker (simple) | 3s | 8s | 15s | No CAPTCHA |
| Web Unlocker (CAPTCHA) | 10s | 25s | 45s | With CAPTCHA solving |
| Scraping Browser | 8s | 20s | 40s | Full browser render |
| SERP API (sync) | 2s | 5s | 10s | Search results |
| Residential Proxy | 1s | 3s | 8s | Raw proxy, no unblocking |
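These latencies bound your throughput. By Little's Law, steady-state throughput is roughly concurrency divided by latency — a quick planning sketch (the helper and figures below are illustrative; real throughput also depends on plan limits and per-site rate caps):

```typescript
// Little's Law: throughput ≈ concurrency / latency (steady state)
function estimateThroughput(concurrency: number, p50Seconds: number): number {
  return concurrency / p50Seconds; // requests per second
}

// 10 concurrent Web Unlocker requests at a 3s P50:
const rps = estimateThroughput(10, 3);
console.log(`~${rps.toFixed(1)} req/s, ~${Math.round(rps * 3600)} req/hour`);
```

The same math works in reverse: to hit a target request rate, multiply it by the P50 latency to get the concurrency you need.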
## Instructions
### Step 1: Choose the Right Product

```typescript
// Product selection matrix — pick the fastest product that meets the target's needs
function selectProduct(target: { js: boolean; captcha: boolean; structured: boolean }) {
  if (target.structured) return 'serp_api';                // Pre-parsed JSON
  if (!target.js && !target.captcha) return 'residential'; // Fastest
  if (target.js) return 'scraping_browser';                // Browser rendering
  return 'web_unlocker';                                   // Best default
}
```
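As a sanity check, here is how the matrix routes some representative target profiles (the function is reproduced so the snippet runs standalone; the inputs are illustrative):

```typescript
// selectProduct as defined in Step 1
function selectProduct(target: { js: boolean; captcha: boolean; structured: boolean }) {
  if (target.structured) return 'serp_api';
  if (!target.js && !target.captcha) return 'residential';
  if (target.js) return 'scraping_browser';
  return 'web_unlocker';
}

console.log(selectProduct({ js: false, captcha: false, structured: true }));  // serp_api
console.log(selectProduct({ js: false, captcha: false, structured: false })); // residential
console.log(selectProduct({ js: true,  captcha: true,  structured: false })); // scraping_browser
console.log(selectProduct({ js: false, captcha: true,  structured: false })); // web_unlocker
```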
### Step 2: Connection Pooling with Keep-Alive

```typescript
import { Agent } from 'https';
import axios from 'axios';

const proxyUser = process.env.BRIGHTDATA_PROXY_USER!;
const proxyPass = process.env.BRIGHTDATA_PROXY_PASS!;

// Reuse TCP connections to brd.superproxy.io
const httpsAgent = new Agent({
  keepAlive: true,
  maxSockets: 25,     // Match your concurrency limit
  maxFreeSockets: 5,
  timeout: 120_000,
  // Web Unlocker re-signs TLS; either install Bright Data's CA certificate
  // or disable verification for proxied requests only
  rejectUnauthorized: false,
});

const client = axios.create({
  proxy: {
    host: 'brd.superproxy.io',
    port: 33335,
    auth: { username: proxyUser, password: proxyPass },
  },
  httpsAgent,
  timeout: 60_000,
});
```
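Pooling pairs well with sticky sessions: keeping the same exit IP across requests lets sites see a consistent client. Bright Data encodes the customer ID, zone, and optional session in the proxy username — a small builder sketch (the IDs below are placeholders, and the exact flag set is worth confirming against current Bright Data docs):

```typescript
// Build a Bright Data proxy username; a session suffix pins the exit IP
function buildProxyUsername(customerId: string, zone: string, sessionId?: string): string {
  const base = `brd-customer-${customerId}-zone-${zone}`;
  return sessionId ? `${base}-session-${sessionId}` : base;
}

console.log(buildProxyUsername('hl_12345', 'unblocker'));
// brd-customer-hl_12345-zone-unblocker
console.log(buildProxyUsername('hl_12345', 'unblocker', 'pool_a1'));
// brd-customer-hl_12345-zone-unblocker-session-pool_a1
```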
### Step 3: Response Caching Layer

```typescript
// src/brightdata/cache.ts — avoid re-scraping identical URLs
import { createHash } from 'crypto';
import { LRUCache } from 'lru-cache';

const memoryCache = new LRUCache<string, string>({
  max: 500,                                     // Max cached pages
  maxSize: 100_000_000,                         // 100MB total
  sizeCalculation: (v) => Buffer.byteLength(v),
  ttl: 3_600_000,                               // 1 hour default
});

export async function cachedScrape(
  url: string,
  scraper: (url: string) => Promise<string>,
  ttlMs?: number
): Promise<string> {
  const key = createHash('sha256').update(url).digest('hex');
  const cached = memoryCache.get(key);
  if (cached) {
    console.log(`Cache HIT: ${url}`);
    return cached;
  }
  const html = await scraper(url);
  memoryCache.set(key, html, { ttl: ttlMs }); // per-entry TTL override
  console.log(`Cache MISS: ${url} (${Buffer.byteLength(html)} bytes)`);
  return html;
}
```
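Equivalent URLs that differ only in query ordering or tracking parameters hash to different cache keys and trigger needless re-scrapes. A normalization sketch — the stripped parameter list (`utm_*`, `gclid`, `fbclid`) is an assumption about what your target sites treat as non-significant; adjust it per site:

```typescript
// Canonicalize a URL before hashing it into a cache key
function normalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = ''; // fragments never reach the server
  const params = [...url.searchParams.entries()]
    .filter(([k]) => !k.startsWith('utm_') && k !== 'gclid' && k !== 'fbclid')
    .sort(([a], [b]) => a.localeCompare(b)); // order-insensitive keys
  url.search = new URLSearchParams(params).toString();
  return url.toString();
}

console.log(normalizeUrl('https://example.com/p?b=2&a=1&utm_source=x#top'));
// https://example.com/p?a=1&b=2
```

Hash the normalized URL in `cachedScrape` instead of the raw one.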
### Step 4: Concurrent Scraping with Backpressure

```typescript
import PQueue from 'p-queue';

// Tune concurrency based on your plan and target site
const scrapeQueue = new PQueue({
  concurrency: 10, // Concurrent proxy connections
  interval: 1000,  // Per-second window
  intervalCap: 15, // Max new requests per second
});

async function scrapeMany(urls: string[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  // allSettled keeps one failed URL from aborting the batch;
  // failures are simply absent from the results map
  await Promise.allSettled(
    urls.map(url =>
      scrapeQueue.add(async () => {
        const html = await cachedScrape(url, (u) => client.get(u).then(r => r.data));
        results.set(url, html);
      })
    )
  );
  console.log(`Scraped ${results.size}/${urls.length} successfully`);
  return results;
}
```
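Transient proxy errors (timeouts, 502s) often succeed on a second attempt, so it is worth wrapping each queued task in a retry with exponential backoff. A minimal sketch — the attempt count and delays are illustrative defaults:

```typescript
// Retry an async operation with exponential backoff plus jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i + Math.random() * 250; // jitter
        await new Promise(res => setTimeout(res, delay));
      }
    }
  }
  throw lastError;
}
```

In `scrapeMany` above, wrap the scrape call: `withRetry(() => cachedScrape(url, ...))`.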
### Step 5: Use the Async API for Bulk Jobs

For 100+ URLs, use the Web Scraper API instead of individual proxy requests:

```typescript
// Bulk collection — one API call, Bright Data handles parallelism
const DATASET_ID = process.env.BRIGHTDATA_DATASET_ID!;

async function bulkScrape(urls: string[]) {
  const response = await fetch(
    `https://api.brightdata.com/datasets/v3/trigger?dataset_id=${DATASET_ID}&format=json`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.BRIGHTDATA_API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(urls.map(url => ({ url }))),
    }
  );
  return response.json(); // Returns a snapshot_id for status polling
}
```

One trigger call can cover 1,000 URLs that would otherwise be 1,000 individual proxy requests.
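The trigger call returns a `snapshot_id`; results are downloaded once the snapshot is ready. A polling-loop sketch with the status check injected as a function so the loop itself is testable — in practice `checkStatus` would call Bright Data's snapshot progress endpoint (consult the current API docs for the exact path and status values, which are assumptions here):

```typescript
// Poll an injected status function until the snapshot is ready
async function waitForSnapshot(
  checkStatus: () => Promise<'running' | 'ready' | 'failed'>,
  intervalMs = 5000,
  maxAttempts = 60
): Promise<void> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await checkStatus();
    if (status === 'ready') return;
    if (status === 'failed') throw new Error('Snapshot failed');
    await new Promise(res => setTimeout(res, intervalMs));
  }
  throw new Error('Snapshot polling timed out');
}
```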
### Step 6: Performance Monitoring

```typescript
class ScrapeMetrics {
  private timings: number[] = [];
  private errors = 0;
  private cacheHits = 0;

  record(durationMs: number) { this.timings.push(durationMs); }
  recordError() { this.errors++; }
  recordCacheHit() { this.cacheHits++; }

  report() {
    const sorted = [...this.timings].sort((a, b) => a - b);
    return {
      count: sorted.length,
      errors: this.errors,
      cacheHits: this.cacheHits,
      p50: sorted[Math.floor(sorted.length * 0.5)] || 0,
      p95: sorted[Math.floor(sorted.length * 0.95)] || 0,
      p99: sorted[Math.floor(sorted.length * 0.99)] || 0,
    };
  }
}
```
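To feed the metrics without touching scraper internals, any async call can be wrapped in a timing helper that reports its duration to a callback (such as `ScrapeMetrics.record` above). A small sketch:

```typescript
// Time an async operation and report the duration via callback;
// the duration is recorded on both success and failure paths
async function timed<T>(
  fn: () => Promise<T>,
  onDuration: (ms: number) => void,
  onError?: () => void
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } catch (err) {
    onError?.();
    throw err;
  } finally {
    onDuration(Date.now() - start);
  }
}
```

Usage: `timed(() => cachedScrape(url, scraper), ms => metrics.record(ms), () => metrics.recordError())`.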
## Output
- Right product selection per use case
- Connection pooling reducing TCP overhead
- Response cache avoiding duplicate scrapes
- Concurrent scraping with backpressure control
- Bulk API for large-scale jobs
## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Slow scrapes | CAPTCHA solving overhead | Expected for Web Unlocker; use cache |
| Connection exhausted | Too many concurrent requests | Reduce p-queue concurrency |
| Memory pressure | Large cached pages | Set maxSize on the LRU cache |
| Timeout storms | All requests hitting a slow site | Add a circuit breaker |
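For the timeout-storm case, a circuit breaker stops new requests after a run of consecutive failures and waits out a cooldown instead of piling more timeouts onto a struggling target. A minimal sketch — the threshold and cooldown values are illustrative:

```typescript
// Consecutive-failure circuit breaker with a half-open probe after cooldown
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 5, cooldownMs = 30_000) {
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long to stay open
  }

  canRequest(now = Date.now()): boolean {
    if (this.failures < this.threshold) return true;
    return now - this.openedAt >= this.cooldownMs; // half-open: allow a probe
  }

  recordSuccess(): void {
    this.failures = 0; // close the circuit
  }

  recordFailure(now = Date.now()): void {
    this.failures++;
    if (this.failures === this.threshold) this.openedAt = now;
  }
}
```

Check `canRequest()` before enqueueing each scrape, and call `recordSuccess`/`recordFailure` from the scraper's result handler.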
## Next Steps

For cost optimization, see `brightdata-cost-tuning`.