Purpose
Extracts textual content, summaries, or structured data from PDF documents using Claude's multimodal vision capabilities.
When to Use
- When needing to pull raw text from a PDF for processing or analysis.
- When summarizing multi-page reports or documents.
- When extracting specific fields from forms using a JSON schema.
Process
- Ingestion: Process all provided PDF documents.
- Task Determination: Analyze the prompt for extraction, summarization, Q&A, or data extraction.
- Execution: Maintain reading order and extract content according to task requirements.
- Format: Return a valid JSON object entry for each document.
Agent Call
Called by: doc-processor-agent (or any other agent)
Input: $PROMPT (what to do) and one or more PDF files.
Output
Returns a JSON object containing the processed output for each document, or an error object.