Agent Skills: OCR Document Processor

Extract text from images and scanned PDFs using OCR. Supports 100+ languages, table detection, structured output (markdown/JSON), and batch processing.

Uncategorized
ID: dkyazzentwatwa/chatgpt-skills/ocr-document-processor

Install this agent skill locally:

pnpm dlx add-skill https://github.com/dkyazzentwatwa/chatgpt-skills/tree/HEAD/ocr-document-processor

ocr-document-processor/SKILL.md

Skill Metadata

Name: ocr-document-processor
Description: Extract text and structure from scans, images, and scanned PDFs. Use for OCR, searchable PDFs, table extraction, receipt parsing, and business card parsing.

OCR Document Processor

Handle OCR-heavy inputs where text must be recovered from images or scanned pages.

Use This For

  • OCR on images and scanned PDFs
  • Searchable PDF export
  • Structured extraction to text, markdown, JSON, or HTML
  • Table extraction from scanned material
  • Receipt parsing and business card parsing

Workflow

  1. Decide whether plain OCR, structured extraction, or document-specific parsing is needed.
  2. Preprocess noisy inputs before extraction when skew, blur, or shadows are present.
  3. Use scripts/ocr_processor.py for core OCR tasks.
  4. Use the focused helpers when the input is specialized:
    • scripts/business_card_scanner.py
    • scripts/receipt_scanner.py
  5. Return confidence caveats when the source is low quality, rotated, handwritten, or multilingual.
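
The routing in steps 3 and 4 and the caveats in step 5 can be sketched as follows. This is a minimal illustration, not the skill's actual code: the script paths are the ones listed above, but the task labels, the quality flags, and the function names `pick_script` and `caveats_for` are hypothetical, and the scripts' command-line interfaces are not shown because they are not documented here.

```python
from typing import Iterable, List

# Script paths come from the workflow above; the task labels are hypothetical.
FOCUSED_HELPERS = {
    "business_card": "scripts/business_card_scanner.py",
    "receipt": "scripts/receipt_scanner.py",
}
CORE_SCRIPT = "scripts/ocr_processor.py"


def pick_script(task: str) -> str:
    """Return the focused helper for specialized input, else the core OCR script."""
    return FOCUSED_HELPERS.get(task, CORE_SCRIPT)


# Hypothetical quality flags, one per condition named in step 5.
CAVEATS = {
    "low_quality": "Source is low quality; text may be misread.",
    "rotated": "Source is rotated; reading order may be wrong.",
    "handwritten": "Source is handwritten; recognition is less reliable.",
    "multilingual": "Source mixes languages; check the selected language(s).",
}


def caveats_for(flags: Iterable[str]) -> List[str]:
    """Return the caveat messages for the given flags, in a stable order."""
    wanted = set(flags)
    return [message for key, message in CAVEATS.items() if key in wanted]


if __name__ == "__main__":
    print(pick_script("receipt"))
    for line in caveats_for(["rotated", "handwritten"]):
        print(line)
```

Keeping the caveats in a table rather than in scattered conditionals makes it easy to attach them to any output format (text, markdown, JSON, or HTML).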

Guardrails

  • Prefer explicit language selection when accuracy matters.
  • Do not claim fields are exact when OCR confidence is weak.
  • By default, route non-scanned (born-digital) PDFs to document-converter-suite rather than running OCR on them.
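
Two of these guardrails lend themselves to a short sketch. It assumes the per-page text layer has already been extracted with some PDF library and that OCR confidence is reported on a 0 to 1 scale. The function names `needs_ocr` and `report_field` and both thresholds are hypothetical and would need tuning against real documents.

```python
from typing import List


def needs_ocr(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is scanned, based on its existing text layer.

    page_texts holds the text layer of each page (hypothetical input).
    A PDF with almost no embedded text is treated as scanned.
    """
    if not page_texts:
        return True  # nothing extractable: assume a scan
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


def report_field(name: str, value: str, confidence: float,
                 threshold: float = 0.8) -> str:
    """Format a field, marking it unverified when OCR confidence is weak."""
    if confidence >= threshold:
        return f"{name}: {value}"
    return f"{name}: {value} (unverified, OCR confidence {confidence:.2f})"


if __name__ == "__main__":
    pages = ["Invoice 2024-001\nTotal due: 12.50", "Terms and conditions apply."]
    target = "OCR" if needs_ocr(pages) else "document-converter-suite"
    print(f"Route to: {target}")
    print(report_field("total", "12.50", 0.41))
```

A character-count heuristic is deliberately crude: a scanned PDF that already carries a poor OCR text layer would pass as digital, so an explicit override is worth keeping.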