Agent Skills: Exa Load & Scale

Uncategorized
ID: jeremylongshore/claude-code-plugins-plus-skills/exa-load-scale

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/exa-pack/skills/exa-load-scale

plugins/saas-packs/exa-pack/skills/exa-load-scale/SKILL.md

Skill Metadata

Name
exa-load-scale
Description
Implement Exa load testing, capacity planning, and scaling strategies.

Exa Load & Scale

Overview

Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.

Prerequisites

  • k6 load testing tool installed
  • Test environment Exa API key (separate from production)
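  • Redis for result caching (the examples below use an in-process LRU cache; a shared store such as Redis is needed once several instances must share one cache)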

Capacity Reference

| Search Type | Typical Latency | Max Throughput (10 QPS) |
|-------------|-----------------|-------------------------|
| instant | < 150ms | 10 req/s (600/min) |
| fast | < 425ms | 10 req/s (600/min) |
| auto | 300-1500ms | 10 req/s (600/min) |
| neural | 500-2000ms | 10 req/s (600/min) |
| deep | 2-5s | 10 req/s (600/min) |
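
One way to use the latency column is to choose a search type from a latency budget. The sketch below is a hypothetical helper (`pickSearchType` is not part of exa-js). It takes the upper bounds from the table and assumes that slower search types return more thorough results, which the table itself does not state, so verify that trade-off for your own workload.

```typescript
type SearchType = "instant" | "fast" | "auto" | "neural" | "deep";

// Typical upper latency bounds in ms, copied from the Capacity Reference table.
const typicalUpperBoundMs: Array<[SearchType, number]> = [
  ["instant", 150],
  ["fast", 425],
  ["auto", 1500],
  ["neural", 2000],
  ["deep", 5000],
];

// Hypothetical helper: returns the slowest search type whose typical upper
// bound still fits the budget, or null when even "instant" does not fit.
function pickSearchType(budgetMs: number): SearchType | null {
  let choice: SearchType | null = null;
  for (const [searchType, upperMs] of typicalUpperBoundMs) {
    if (upperMs <= budgetMs) choice = searchType;
  }
  return choice;
}

console.log(pickSearchType(500));  // "fast"
console.log(pickSearchType(3000)); // "neural"
console.log(pickSearchType(100));  // null
```

These bounds are typical values, not guarantees, so leave headroom between the budget and your real timeout.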

With caching (50% hit rate): only half of incoming requests reach the API, so the same 10 QPS limit serves roughly 20 req/s of application traffic.
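
The arithmetic behind that figure: if a fraction `h` of requests is served from cache, only `1 - h` of them count against the rate limit, so the application-level ceiling is `limit / (1 - h)`. A minimal sketch (the function name is illustrative):

```typescript
// Application-level request rate that a given API rate limit can sustain
// when a fraction of requests is answered from cache.
function effectiveThroughput(apiLimitQps: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate >= 1) {
    throw new RangeError("cacheHitRate must be in [0, 1)");
  }
  return apiLimitQps / (1 - cacheHitRate);
}

console.log(effectiveThroughput(10, 0.5)); // 20
console.log(effectiveThroughput(10, 0));   // 10
```

The gain grows quickly as the hit rate rises, which is why a fixed query pool and key normalization matter more than raw concurrency.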

Instructions

Step 1: k6 Load Test Against Your Wrapper

// exa-load-test.js
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "1m", target: 5 },    // Ramp up to 5 VUs
    { duration: "3m", target: 5 },    // Steady state
    { duration: "1m", target: 10 },   // Push toward rate limit
    { duration: "2m", target: 10 },   // Stress test
    { duration: "1m", target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<3000"],  // 3s P95 for neural search
    http_req_failed: ["rate<0.05"],     // < 5% error rate
  },
};

const queries = [
  "best practices for building RAG systems",
  "transformer architecture improvements 2025",
  "TypeScript 5.5 new features",
  "vector database comparison guide",
  "AI safety alignment research",
];

export default function () {
  const query = queries[Math.floor(Math.random() * queries.length)];

  const response = http.post(
    `${__ENV.APP_URL}/api/search`,
    JSON.stringify({ query, numResults: 3 }),
    {
      headers: { "Content-Type": "application/json" },
      timeout: "10s",
    }
  );

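  check(response, {
    "status 200": (r) => r.status === 200,
    // Guard the parse: a non-JSON body (e.g. an error page) fails the check instead of throwing
    "has results": (r) => {
      try {
        const body = JSON.parse(r.body);
        return Array.isArray(body.results) && body.results.length > 0;
      } catch (e) {
        return false;
      }
    },
    "latency < 3s": (r) => r.timings.duration < 3000,
  });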

  sleep(0.5 + Math.random()); // 0.5-1.5s between requests
}

# Run load test
k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
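
Before reading the results, it helps to estimate how much load the script actually offers. Each virtual user (VU) sends one request, waits for the response, then sleeps 0.5-1.5s (1.0s on average), so the offered rate is roughly `VUs / (sleep + latency)`. The sketch below uses hypothetical helper names and an assumed average latency of 1.0s; substitute your measured value.

```typescript
// Approximate requests per second generated by a closed-loop k6 script
// in which every VU does: request -> wait for response -> sleep.
function estimateOfferedQps(vus: number, avgSleepSec: number, avgLatencySec: number): number {
  return vus / (avgSleepSec + avgLatencySec);
}

// Inverse: how many VUs are needed to offer a target request rate.
function vusForTargetQps(targetQps: number, avgSleepSec: number, avgLatencySec: number): number {
  return Math.ceil(targetQps * (avgSleepSec + avgLatencySec));
}

// Assumed 1.0s average latency, 1.0s average sleep (from the script above).
console.log(estimateOfferedQps(10, 1.0, 1.0)); // 5
console.log(vusForTargetQps(10, 1.0, 1.0));    // 20
```

Under these assumptions the 10-VU stage offers about 5 req/s to the wrapper, so reaching the 10 QPS limit would take more VUs or a shorter sleep. Cache hits in the wrapper lower the rate that reaches Exa further.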

Step 2: Throughput Maximizer with Request Queue

import Exa from "exa-js";
import PQueue from "p-queue";

const exa = new Exa(process.env.EXA_API_KEY);

// Stay under 10 QPS rate limit
const searchQueue = new PQueue({
  concurrency: 8,        // max concurrent requests
  interval: 1000,        // per second
  intervalCap: 10,       // Exa's QPS limit
});

async function highThroughputSearch(queries: string[]) {
  const results = [];

  for (const query of queries) {
    const promise = searchQueue.add(async () => {
      const result = await exa.searchAndContents(query, {
        type: "auto",
        numResults: 3,
        text: { maxCharacters: 500 },
      });
      return { query, results: result.results };
    });
    results.push(promise);
  }

  return Promise.all(results);
}

// Process 100 queries respecting rate limits
const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`);
console.time("batch");
const results = await highThroughputSearch(queries);
console.timeEnd("batch");
// Expected: ~10-12 seconds (100 queries / 10 QPS)
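
The expected batch time depends on latency as well as on the rate limit. The sketch below is a rough model, not p-queue's exact scheduling: it assumes uniform latency and takes the larger of two bounds, one from the interval cap and one from the concurrency setting. `estimateBatchMs` and `QueueSettings` are illustrative names.

```typescript
interface QueueSettings {
  concurrency: number; // max requests in flight
  intervalMs: number;  // length of the rate-limit window
  intervalCap: number; // max requests started per window
}

function estimateBatchMs(totalRequests: number, queue: QueueSettings, avgLatencyMs: number): number {
  if (totalRequests <= 0) return 0;
  // Rate-limit bound: the last request starts in window floor((n - 1) / cap).
  const lastStartMs = Math.floor((totalRequests - 1) / queue.intervalCap) * queue.intervalMs;
  const rateLimitBoundMs = lastStartMs + avgLatencyMs;
  // Concurrency bound: requests complete in waves of `concurrency`.
  const concurrencyBoundMs = Math.ceil(totalRequests / queue.concurrency) * avgLatencyMs;
  return Math.max(rateLimitBoundMs, concurrencyBoundMs);
}

// Settings mirror the queue in Step 2; latencies are assumed values.
const step2Queue: QueueSettings = { concurrency: 8, intervalMs: 1000, intervalCap: 10 };
console.log(estimateBatchMs(100, step2Queue, 500));  // 9500  (rate limit dominates)
console.log(estimateBatchMs(100, step2Queue, 1000)); // 13000 (concurrency dominates)
```

In this model, once average latency approaches 1s the `concurrency: 8` setting becomes the bottleneck rather than the 10 QPS cap, so the batch can take longer than `100 / 10` seconds.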

Step 3: Caching for Scale

import { LRUCache } from "lru-cache";

// Reuses `exa` and `searchQueue` from Step 2.
// Repeat queries are served from cache without an API call until the entry expires
const cache = new LRUCache<string, any>({
  max: 10000,
  ttl: 3600 * 1000, // 1-hour TTL
});

async function scalableSearch(query: string, opts: any) {
  const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`;
  const cached = cache.get(key);
  if (cached) return cached;

  const result = await searchQueue.add(() =>
    exa.searchAndContents(query, opts)
  );
  cache.set(key, result);
  return result;
}

// With a 50% cache hit rate:
// 100 queries (half of them repeats) → 50 API calls → ~5 seconds instead of ~10
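
The cache key in Step 3 covers only `type` and `numResults`, so two calls that differ in another option (for example `text.maxCharacters`) would share an entry. One possible fix is to derive the key from all options with a key-order-independent serialization. This is a sketch; `stableStringify` and `cacheKey` are hypothetical helpers, not part of lru-cache or exa-js.

```typescript
// Serialize a value as JSON with object keys sorted, so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]))
      .join(",");
    return "{" + body + "}";
  }
  return JSON.stringify(value);
}

// Normalize the query the same way Step 3 does, then append every option.
function cacheKey(query: string, opts: Record<string, unknown>): string {
  return query.toLowerCase().trim() + "|" + stableStringify(opts);
}

console.log(cacheKey(" RAG ", { type: "auto", numResults: 3 }));
// rag|{"numResults":3,"type":"auto"}
```

The trade-off is a lower hit rate when callers vary options unnecessarily, so it pairs well with a small set of standard option presets.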

Step 4: Capacity Planning Calculator

interface CapacityEstimate {
  dailySearches: number;
  peakQPS: number;
  cacheHitRate: number;
  effectiveQPS: number;
  withinLimits: boolean;
  recommendation: string;
}

function estimateCapacity(
  dailySearches: number,
  peakMultiplier = 3,
  expectedCacheHitRate = 0.5
): CapacityEstimate {
  const avgQPS = dailySearches / (24 * 3600);
  const peakQPS = avgQPS * peakMultiplier;
  const effectiveQPS = peakQPS * (1 - expectedCacheHitRate);
  const withinLimits = effectiveQPS <= 10; // Default Exa limit

  let recommendation = "Within default limits";
  if (effectiveQPS > 10 && effectiveQPS <= 50) {
    recommendation = "Contact hello@exa.ai for Enterprise rate limits";
  } else if (effectiveQPS > 50) {
    recommendation = "Requires Enterprise plan + aggressive caching + request queue";
  }

  return { dailySearches, peakQPS, cacheHitRate: expectedCacheHitRate, effectiveQPS, withinLimits, recommendation };
}

// Example: 50,000 searches/day
const estimate = estimateCapacity(50000);
console.log(estimate);
// { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
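
The same formula can be inverted to ask how many daily searches fit under a given limit. A minimal sketch, reusing the calculator's defaults (peak multiplier 3, 50% cache hit rate); `maxDailySearches` is an illustrative name:

```typescript
const SECONDS_PER_DAY = 24 * 3600;

// Solve (daily / 86400) * peakMultiplier * (1 - hitRate) <= limitQps for `daily`.
function maxDailySearches(
  limitQps = 10,
  peakMultiplier = 3,
  expectedCacheHitRate = 0.5
): number {
  if (expectedCacheHitRate < 0 || expectedCacheHitRate >= 1) {
    throw new RangeError("expectedCacheHitRate must be in [0, 1)");
  }
  return Math.floor(
    (limitQps * SECONDS_PER_DAY) / (peakMultiplier * (1 - expectedCacheHitRate))
  );
}

console.log(maxDailySearches());          // 576000
console.log(maxDailySearches(10, 3, 0));  // 288000 (no caching)
```

These ceilings hold only if traffic matches the assumed peak multiplier; a burstier profile lowers them proportionally.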

Benchmark Results Template

## Exa Performance Benchmark
**Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z

| Metric | Value |
|--------|-------|
| Total Requests | N |
| Success Rate | X% |
| Cache Hit Rate | X% |
| P50 Latency | Xms |
| P95 Latency | Xms |
| Peak QPS (actual API calls) | X |
| 429 Rate Limit Errors | N |
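
To fill in the template, the per-request data has to be aggregated somewhere. The sketch below assumes you record one sample per request in your wrapper; the `RequestSample` shape and the `summarize` helper are hypothetical, and percentiles use the nearest-rank method.

```typescript
interface RequestSample {
  statusCode: number; // HTTP status returned to the caller
  durationMs: number; // end-to-end latency
  cacheHit: boolean;  // true if served without an API call
}

// Nearest-rank percentile over a list of latencies.
function percentile(valuesMs: number[], p: number): number {
  const sorted = valuesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function summarize(samples: RequestSample[]) {
  if (samples.length === 0) throw new Error("no samples to summarize");
  const total = samples.length;
  const count = (pred: (s: RequestSample) => boolean) => samples.filter(pred).length;
  const durations = samples.map((s) => s.durationMs);
  return {
    totalRequests: total,
    successRatePct: (count((s) => s.statusCode === 200) * 100) / total,
    cacheHitRatePct: (count((s) => s.cacheHit) * 100) / total,
    p50Ms: percentile(durations, 50),
    p95Ms: percentile(durations, 95),
    rateLimitErrors: count((s) => s.statusCode === 429),
  };
}
```

k6's own summary already reports latency percentiles for the wrapper endpoint; a helper like this is mainly useful for the cache hit rate and 429 count, which k6 cannot see.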

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| 429 errors in load test | Exceeding 10 QPS | Reduce concurrency, add cache |
| Inconsistent latency | Different search types | Standardize on one type per test |
| Timeout errors | Deep search under load | Use fast or auto for load tests |
| Cache miss rate high | Unique queries per request | Use a fixed query pool |
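
For 429 errors that still slip through, a retry with exponential backoff is a common complement to reducing concurrency. The sketch below is generic and makes one loud assumption: that a rate-limit failure surfaces as a thrown object with a numeric `status` of 429. Check how your exa-js version or HTTP client actually reports it and adjust `isRateLimitError` accordingly.

```typescript
interface RetryOptions {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// ASSUMPTION: rate-limit errors carry `status === 429`. Not confirmed for exa-js.
function isRateLimitError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: unknown }).status === 429;
}

async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = { maxRetries: 3, baseDelayMs: 500 }
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-429 errors or once the retry budget is spent.
      if (!isRateLimitError(err) || attempt >= opts.maxRetries) throw err;
      // Exponential backoff with jitter (50-100% of the nominal delay).
      const delayMs = opts.baseDelayMs * Math.pow(2, attempt) * (0.5 + Math.random() / 2);
      await pause(delayMs);
      attempt++;
    }
  }
}
```

Wrapping the function passed to the queue in Step 2 with `withRateLimitRetry` keeps retries inside the rate-limited path, so they count against the same interval cap.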

Next Steps

For reliability patterns, see exa-reliability-patterns.