Exa Core Workflow B — Similarity, Contents & Answer Skill

Exa Core Workflow B — Similarity, Contents & Answer

Overview

Secondary Exa workflow covering three endpoints beyond search: findSimilar (discover pages semantically related to a URL), getContents (retrieve text/highlights for known URLs), and answer (get AI-generated answers with web citations). These complement the primary search workflow in exa-core-workflow-a.

Prerequisites

exa-js installed and EXA_API_KEY configured
Familiarity with exa-core-workflow-a search patterns

Instructions

Step 1: Find Similar Pages

import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// findSimilar takes a URL (not a query string) and returns
// pages with semantically similar content
const similar = await exa.findSimilar(
  "https://openai.com/research/gpt-4",
  {
    numResults: 10,
    excludeSourceDomain: true, // exclude openai.com from results
    startPublishedDate: "2024-01-01T00:00:00.000Z",
    excludeDomains: ["reddit.com", "twitter.com"],
  }
);

for (const r of similar.results) {
  console.log(`${r.title} — ${r.url}`);
}

Step 2: Find Similar with Contents

// findSimilarAndContents combines similarity search + content extraction
const results = await exa.findSimilarAndContents(
  "https://huggingface.co/blog/llama3",
  {
    numResults: 5,
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500, query: "open source LLM" },
    excludeSourceDomain: true,
  }
);

for (const r of results.results) {
  console.log(`## ${r.title}`);
  console.log(`URL: ${r.url}`);
  console.log(`Highlights: ${r.highlights?.join(" | ")}`);
  console.log(`Text preview: ${r.text?.substring(0, 300)}...\n`);
}

Step 3: Get Contents for Known URLs

// getContents retrieves page content for a list of URLs you already have
// Useful when you have URLs from a previous search or external source
const contents = await exa.getContents(
  [
    "https://arxiv.org/abs/2401.00001",
    "https://arxiv.org/abs/2401.00002",
    "https://blog.example.com/article",
  ],
  {
    text: { maxCharacters: 3000 },
    highlights: { maxCharacters: 500 },
    summary: { query: "key findings and methodology" },
    livecrawl: "preferred",     // try fresh, fall back to cache
    livecrawlTimeout: 15000,    // 15s timeout
    // Subpage crawling: retrieve linked pages from each URL
    subpages: 3,                // crawl up to 3 subpages per URL
    subpageTarget: "documentation",  // find subpages matching this term
  }
);

for (const r of contents.results) {
  console.log(`${r.title}: ${r.text?.length || 0} chars`);
  if (r.summary) console.log(`Summary: ${r.summary}`);
}

Step 4: AI-Powered Answer with Citations

// answer() searches the web and returns an AI-generated answer with sources
const answer = await exa.answer(
  "What are the key differences between RAG and fine-tuning for LLMs?",
  {
    text: true,
    // The answer response includes citations linking to source results
  }
);

console.log("Answer:", answer.answer);
console.log("\nSources:");
for (const r of answer.results) {
  console.log(`  - ${r.title}: ${r.url}`);
}

Step 5: Streaming Answer

// streamAnswer returns chunks as they're generated
for await (const chunk of exa.streamAnswer(
  "What is the current state of quantum computing in 2025?"
)) {
  if (chunk.content) {
    process.stdout.write(chunk.content);
  }
  if (chunk.citations) {
    console.log("\n\nCitations:", JSON.stringify(chunk.citations, null, 2));
  }
}

Output

Similar pages discovered from a seed URL
Page content (text, highlights, summary) for known URLs
AI-generated answers with web source citations
Streaming answer chunks for real-time display

Error Handling

| Error | HTTP Code | Cause | Solution | |-------|-----------|-------|----------| | INVALID_URLS | 400 | Malformed URLs in getContents | Validate URLs have protocol | | CRAWL_NOT_FOUND | 404 | Content unavailable at URL | Verify URL is accessible | | CRAWL_TIMEOUT | 504 | Live crawl exceeded timeout | Increase livecrawlTimeout | | SOURCE_NOT_AVAILABLE | 403 | Paywalled or blocked content | Try without livecrawl: "always" | | UNABLE_TO_GENERATE_RESPONSE | 501 | Insufficient data for answer | Rephrase query or add context | | Empty similar.results | 200 | Seed URL not indexed | Try a more popular seed URL |

Examples

Competitive Intelligence Pipeline

async function findCompetitors(companyUrl: string) {
  // Find companies similar to a given company
  const similar = await exa.findSimilarAndContents(companyUrl, {
    numResults: 10,
    excludeSourceDomain: true,
    text: { maxCharacters: 500 },
    category: "company",
  });

  return similar.results.map(r => ({
    name: r.title,
    url: r.url,
    description: r.text?.substring(0, 200),
  }));
}

Batch URL Content Retrieval

async function enrichUrls(urls: string[]) {
  // Process URLs in batches to stay within rate limits
  const batchSize = 10;
  const allContents = [];

  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    const contents = await exa.getContents(batch, {
      text: { maxCharacters: 1500 },
      summary: { query: "main topic and key points" },
    });
    allContents.push(...contents.results);
  }

  return allContents;
}

Resources

Next Steps

For common errors, see exa-common-errors. For SDK patterns, see exa-sdk-patterns.

Agent Skills: Exa Core Workflow B — Similarity, Contents & Answer

Install this agent skill to your local

Skill Files