Agent Skills: OCR Document Processor

Extract text and structure from scans, images, and scanned PDFs. Use for OCR, searchable PDFs, table extraction, receipt parsing, and business card parsing.

Category: Uncategorized
ID: dkyazzentwatwa/chatgpt-skills/ocr-document-processor

Author

dkyazzentwatwa

https://github.com/dkyazzentwatwa

Repository

dkyazzentwatwa/chatgpt-skills

Install this agent skill locally:

pnpm dlx add-skill https://github.com/dkyazzentwatwa/chatgpt-skills/tree/HEAD/ocr-document-processor

Skill Files

ocr-document-processor/SKILL.md

Skill Metadata

Name: ocr-document-processor
Description: Extract text and structure from scans, images, and scanned PDFs. Use for OCR, searchable PDFs, table extraction, receipt parsing, and business card parsing.

OCR Document Processor

Handle OCR-heavy inputs where text must be recovered from images or scanned pages.

Use This For

- OCR on images and scanned PDFs
- Searchable PDF export
- Structured extraction to text, markdown, JSON, or HTML
- Table extraction from scanned material
- Receipt parsing and business card parsing

Workflow

1. Decide whether plain OCR, structured extraction, or document-specific parsing is needed.
2. Preprocess noisy inputs before extraction when skew, blur, or shadows are present.
3. Use scripts/ocr_processor.py for core OCR tasks.
4. Use the focused helpers when the input is specialized:
   - scripts/business_card_scanner.py
   - scripts/receipt_scanner.py
5. Return confidence caveats when the source is low quality, rotated, handwritten, or multilingual.
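
The routing described in the workflow above can be sketched as a small dispatcher. This is a minimal illustration, not the skill's actual code: `plan_ocr`, `OcrPlan`, and the input-kind and issue labels are hypothetical names, and only the three script paths come from the skill itself. The command-line arguments of those scripts are not documented here, so the sketch stops at choosing a script.

```python
from dataclasses import dataclass, field

# Script paths as listed in the workflow; every other name here is hypothetical.
SCRIPTS = {
    "business_card": "scripts/business_card_scanner.py",
    "receipt": "scripts/receipt_scanner.py",
}
CORE_SCRIPT = "scripts/ocr_processor.py"

# Input problems that call for preprocessing before extraction.
PREPROCESS_ISSUES = {"skew", "blur", "shadows"}
# Source conditions that call for a confidence caveat in the result.
CAVEAT_ISSUES = {"low_quality", "rotated", "handwritten", "multilingual"}


@dataclass
class OcrPlan:
    script: str
    preprocess: bool
    caveats: list[str] = field(default_factory=list)


def plan_ocr(input_kind: str, issues: set[str]) -> OcrPlan:
    """Pick a script and record whether preprocessing and caveats apply."""
    # Specialized inputs go to their helper; everything else uses the core script.
    script = SCRIPTS.get(input_kind, CORE_SCRIPT)
    preprocess = bool(issues & PREPROCESS_ISSUES)
    caveats = sorted(issues & CAVEAT_ISSUES)
    return OcrPlan(script=script, preprocess=preprocess, caveats=caveats)
```

For example, `plan_ocr("receipt", {"skew", "handwritten"})` selects scripts/receipt_scanner.py, turns preprocessing on, and carries a single `handwritten` caveat.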

Guardrails

- Prefer explicit language selection when accuracy matters.
- Do not claim fields are exact when OCR confidence is weak.
- By default, route non-scanned (born-digital) PDFs to document-converter-suite instead of running OCR.
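
As a rough sketch of how these guardrails might be applied to extracted fields: the `Field` type, the `label_fields` and `choose_route` functions, the 0.80 cutoff, and the `has_text_layer` flag are all assumptions made for illustration. The skill does not specify a confidence scale, a threshold, or how a digital PDF is detected; a text layer is used here only as a stand-in signal.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the skill does not define one. Tune per document type.
LOW_CONFIDENCE = 0.80


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def label_fields(fields: list[Field], threshold: float = LOW_CONFIDENCE) -> list[dict]:
    """Flag low-confidence fields as uncertain rather than presenting them as exact."""
    return [
        {
            "name": f.name,
            "value": f.value,
            "uncertain": f.confidence < threshold,
        }
        for f in fields
    ]


def choose_route(is_pdf: bool, has_text_layer: bool) -> str:
    """Send PDFs that already carry a text layer away from OCR by default."""
    if is_pdf and has_text_layer:
        return "document-converter-suite"
    return "ocr-document-processor"
```

A caller could then surface every field marked `uncertain` with a caveat instead of reporting it as a confirmed value.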