MANDATORY RULES: VIOLATION IS FORBIDDEN

Respond in the language set in .agents/oma-config.yaml, if one is configured.
NEVER skip steps. Execute from Step 1 in order.
Default output location: same directory as input PDF.

Vendor note: This workflow runs inline (no subagent spawning). It invokes opendataloader-pdf through uvx, so the converter needs no prior installation; only uv itself must be available.

Step 1: Validate Input

Identify the PDF file path from the user's request
Confirm the file exists (ls -lh "{path}")
Determine output location:
- User specified a path → use it
- Not specified → same directory as the input PDF
Set the output filename: {input_basename}.md (the input filename without its .pdf extension, plus .md)

If user provided no PDF path, ask:

Which PDF file should I convert? Please provide the file path.
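The Step 1 checks can be sketched as a few shell functions. This is a minimal sketch under assumptions: validate_pdf, resolve_output_dir and output_path are hypothetical helpers (not part of the skill or of opendataloader-pdf), and {input_basename} is taken to mean the filename with its final extension removed.

```shell
# Hypothetical helpers for Step 1; the function names are illustrative only.

# validate_pdf PATH: list the file if it is readable, otherwise explain why not.
validate_pdf() {
  local pdf="$1"
  if [ -z "$pdf" ]; then
    echo "Which PDF file should I convert? Please provide the file path." >&2
    return 2
  fi
  if [ ! -f "$pdf" ]; then
    echo "File not found: $pdf" >&2
    return 1
  fi
  if [ ! -r "$pdf" ]; then
    echo "Permission denied: $pdf" >&2
    return 1
  fi
  ls -lh "$pdf"
}

# resolve_output_dir INPUT [USER_DIR]: a user-specified directory wins;
# otherwise use the directory that contains the input PDF.
resolve_output_dir() {
  if [ -n "${2:-}" ]; then
    printf '%s\n' "$2"
  else
    dirname "$1"
  fi
}

# output_path INPUT OUTPUT_DIR: build {output_dir}/{input_basename}.md,
# stripping the final extension (e.g. .pdf) from the input's basename.
output_path() {
  local base
  base="$(basename "$1")"
  base="${base%.*}"
  printf '%s/%s.md\n' "${2%/}" "$base"
}
```

For example, output_path docs/report.pdf ./out/ prints ./out/report.md, and with no user directory, resolve_output_dir docs/report.pdf falls back to docs.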

Step 2: Convert

Run the conversion:

uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}"

Variant: Tagged PDF (structured documents)

uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}" --use-struct-tree

Variant: With images embedded

uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}" --image-output embedded

Variant: Multiple formats

uvx opendataloader-pdf "{input_path}" --format markdown,json --output-dir "{output_dir}"
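Because every variant only appends flags to the same base command, the step can be wrapped in one function. A sketch under assumptions: convert_pdf and its FORMAT and DRY_RUN variables are hypothetical names; the only opendataloader-pdf flags used are the ones shown above.

```shell
# convert_pdf INPUT OUTPUT_DIR [EXTRA_FLAG...]
# Hypothetical wrapper around the Step 2 command. Extra arguments are appended
# unchanged, so each variant is one extra flag (or flag pair). FORMAT overrides
# the default "markdown" (e.g. FORMAT=markdown,json). With DRY_RUN=1 the
# command is printed instead of executed.
convert_pdf() {
  local input="$1" out_dir="$2"
  shift 2
  set -- uvx opendataloader-pdf "$input" --format "${FORMAT:-markdown}" \
    --output-dir "$out_dir" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 convert_pdf report.pdf ./out --use-struct-tree prints the Tagged PDF variant without running it.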

Step 3: Lint & Format

Run mdformat to normalize the converted Markdown:

uvx mdformat "{output_path}"

This auto-fixes:

Inconsistent heading style
Missing blank lines around blocks
Trailing whitespace
Inconsistent unordered list markers

To check without modifying the file (a dry run that exits non-zero if the file would be reformatted):

uvx mdformat --check "{output_path}"
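Formatting and checking can be chained so the step fails loudly if mdformat cannot settle the file. A minimal sketch: format_markdown and the MDFORMAT override are hypothetical, and it assumes mdformat --check exits non-zero when it would still change the file.

```shell
# format_markdown PATH: format the file in place, then confirm it is stable.
# MDFORMAT is a hypothetical override for the formatter command; it is
# word-split on purpose so that the default "uvx mdformat" works.
format_markdown() {
  local file="$1" fmt="${MDFORMAT:-uvx mdformat}"
  $fmt "$file" || return 1
  # --check leaves the file alone and exits non-zero if it would reformat it.
  $fmt --check "$file"
}
```

The override exists only so the sequence can be exercised without uv installed, for example by pointing MDFORMAT at a stub function.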

Step 4: Verify Output

Check that the output file was created
Read the first ~50 lines to verify structure:
- Headings present (#, ##, etc.)
- Tables rendered (pipe syntax |)
- No garbled text or encoding issues
If quality is poor:
- Try --use-struct-tree if not already used
- Suggest hybrid mode for scanned/complex PDFs
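A rough, scriptable version of the Step 4 checks follows. It is a heuristic sketch, not a Markdown parser: inspect_markdown is a hypothetical name, headings are lines starting with one to six # characters and a space, table rows are lines that start and end with |, and "garbled" is approximated by the Unicode replacement character U+FFFD, which decoders insert for bytes they could not interpret.

```shell
# inspect_markdown PATH [LINES]: rough structure counts for the first LINES
# lines (default 50). The patterns are simple heuristics, not a parser.
inspect_markdown() {
  local file="$1" n="${2:-50}" bad headings rows garbled
  bad="$(printf '\357\277\275')"   # U+FFFD REPLACEMENT CHARACTER in UTF-8
  # grep -c prints 0 but exits 1 on no match; "|| true" keeps the 0.
  headings=$(head -n "$n" "$file" | grep -cE '^#{1,6} ' || true)
  rows=$(head -n "$n" "$file" | grep -cE '^\|.*\|[[:space:]]*$' || true)
  garbled=$(head -n "$n" "$file" | grep -cF "$bad" || true)
  echo "headings=$headings table_rows=$rows garbled_lines=$garbled"
}
```

A result with zero headings and zero table rows on a document that visibly has both is a cue to retry with --use-struct-tree.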

Step 5: Report

Tell the user:

Output file path
Quick quality assessment (headings, tables, images detected)
Any issues or recommendations

Output example:

PDF converted successfully.

Output: /path/to/document.md
- 15 pages processed
- 3 tables, 12 headings detected
- No issues found
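The report can be assembled from the output file itself. A sketch under the same heuristics as Step 4: report_conversion is hypothetical, it counts one table per delimiter row (such as |---|---|), treats U+FFFD as the only detectable issue, and omits the page count because that cannot be recovered from the Markdown alone.

```shell
# report_conversion MD_PATH: print a summary in the Step 5 style.
report_conversion() {
  local md="$1" bad headings tables garbled
  bad="$(printf '\357\277\275')"   # U+FFFD REPLACEMENT CHARACTER in UTF-8
  headings=$(grep -cE '^#{1,6} ' "$md" || true)
  # One delimiter row such as |---|:---:| per table.
  tables=$(grep -cE '^\|([[:space:]]*:?-+:?[[:space:]]*\|)+[[:space:]]*$' "$md" || true)
  garbled=$(grep -cF "$bad" "$md" || true)
  printf 'PDF converted successfully.\n\nOutput: %s\n' "$md"
  printf '%s\n' "- $tables tables, $headings headings detected"
  if [ "$garbled" -eq 0 ]; then
    printf '%s\n' "- No issues found"
  else
    printf '%s\n' "- $garbled lines contain U+FFFD (possible encoding issue)"
  fi
}
```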

OCR Mode (Scanned PDFs)

If standard conversion produces empty or garbled output:

Inform the user the PDF appears to be scanned
Guide them to start the hybrid server. It is a console script provided by the [hybrid] extra (there is no separate package under that name on PyPI, hence --from), and the first run downloads a large OCR stack:
```
uvx --from "opendataloader-pdf[hybrid]" opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "en"
```

Re-run the conversion with the hybrid backend:

uvx opendataloader-pdf --hybrid docling-fast "{input_path}" --format markdown --output-dir "{output_dir}"

For Korean documents, start the server with --ocr-lang "ko,en".
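The trigger condition and the two commands in this section can be scripted. A sketch only: needs_ocr, start_hybrid_server and convert_hybrid are hypothetical names, and treating a missing, blank, or U+FFFD-containing file as "empty or garbled" is an assumed heuristic. The server and conversion commands are the ones given above, with --ocr-lang as a parameter.

```shell
# needs_ocr MD_PATH: assumed heuristic for "empty or garbled output";
# succeeds when OCR mode looks necessary.
needs_ocr() {
  local md="$1" bad
  bad="$(printf '\357\277\275')"                  # U+FFFD in UTF-8
  [ -s "$md" ] || return 0                        # missing or zero bytes
  grep -q '[^[:space:]]' "$md" || return 0        # whitespace only
  if grep -qF "$bad" "$md"; then return 0; fi     # replacement characters
  return 1
}

# start_hybrid_server [OCR_LANG]: default language "en".
# DRY_RUN=1 prints the command instead of running it.
start_hybrid_server() {
  local lang="${1:-en}"
  set -- uvx --from "opendataloader-pdf[hybrid]" opendataloader-pdf-hybrid \
    --port 5002 --force-ocr --ocr-lang "$lang"
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# convert_hybrid INPUT OUTPUT_DIR: re-run the conversion via the hybrid backend.
convert_hybrid() {
  local input="$1" out_dir="$2"
  set -- uvx opendataloader-pdf --hybrid docling-fast "$input" \
    --format markdown --output-dir "$out_dir"
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}
```

For a Korean scan, start_hybrid_server ko,en would be run in a separate terminal (the server presumably keeps running in the foreground), followed by convert_hybrid once it is up.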

Error Recovery

| Error | Recovery |
|-------|----------|
| uvx not found | Guide: curl -LsSf https://astral.sh/uv/install.sh \| sh |
| File not found | Ask user to verify the path |
| Permission denied | Check file permissions |
| Empty output | Likely scanned PDF → suggest OCR mode |
| Timeout on large PDF | Suggest processing specific page ranges |
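The first row can be checked before any conversion runs. A minimal sketch: require_tool is a hypothetical helper that relies only on the POSIX command -v lookup.

```shell
# require_tool NAME HINT: fail with a recovery hint when NAME is not on PATH.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    printf '%s not found. %s\n' "$1" "$2" >&2
    return 1
  fi
}
```

For example: require_tool uvx "Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh" || exit 1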

Quick Reference

| Command | Effect |
|---------|--------|
| /pdf document.pdf | Convert to .md in same directory |
| /pdf document.pdf --output-dir ./out/ | Convert to specified directory |
| /pdf document.pdf --format json | Output as JSON instead |
| /pdf document.pdf --use-struct-tree | Use Tagged PDF structure |
| /pdf document.pdf --image-output embedded | Include images as base64 |
| /pdf *.pdf | Batch convert all PDFs |
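One way to implement the batch row is a plain loop. A sketch under assumptions: batch_convert is a hypothetical name, it converts one file per invocation so a single failure does not abort the batch, and it writes each result next to its PDF unless an output directory is given, matching the Step 1 default.

```shell
# batch_convert DIR [OUTPUT_DIR]: convert every *.pdf in DIR, one at a time.
# Returns non-zero if any file failed. DRY_RUN=1 prints the commands instead.
batch_convert() {
  local dir="$1" out="${2:-}" f target failed=0
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if [ -n "$out" ]; then target="$out"; else target="$(dirname "$f")"; fi
    set -- uvx opendataloader-pdf "$f" --format markdown --output-dir "$target"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$*"
    elif ! "$@"; then
      echo "Failed: $f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```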
