Agent Skills: PDF to Markdown

Extracts text and tables from PDFs and merges or splits documents. Use when you need to convert PDFs while preserving some structure.

Uncategorized
ID: cardoso-neto/personal-ai-infra/pdf-to-markdown

Install this agent skill locally:

pnpm dlx add-skill https://github.com/cardoso-neto/personal-ai-infra/tree/HEAD/skills/pdf-to-markdown

skills/pdf-to-markdown/SKILL.md

Skill Metadata

Name
pdf-to-markdown
Description
Extracts text and tables from PDFs and merges or splits documents. Use when you need to convert PDFs while preserving structure.

PDF to Markdown

marker-pdf

CLI

# pip install marker-pdf  # python==3.12
marker_single input.pdf --output_dir ./marker-output

marker_single input.pdf --output_dir ./out --page_range "0,5-10"  # specific pages (0-indexed)
marker_single input.pdf --output_dir ./out --force_ocr  # for scanned PDFs
OUTPUT_IMAGE_FORMAT=PNG marker_single input.pdf --output_dir ./out  # change image format to PNG
marker_single input.pdf --output_dir ./out --use_llm \
  --llm_service marker.services.openai.OpenAIService \
  --openai_api_key "$OPENAI_API_KEY" \
  --openai_model gpt-5.2
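
When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of marker itself: `build_marker_command` and `run_marker` are hypothetical helper names, and only the `--output_dir`, `--page_range`, and `--force_ocr` flags shown above are used.

```python
import subprocess
from typing import List, Optional


def build_marker_command(
    pdf: str,
    output_dir: str,
    page_range: Optional[str] = None,
    force_ocr: bool = False,
) -> List[str]:
    """Assemble a marker_single argv list (hypothetical helper)."""
    cmd = ["marker_single", pdf, "--output_dir", output_dir]
    if page_range is not None:
        # Passed through unchanged, e.g. "0,5-10".
        cmd += ["--page_range", page_range]
    if force_ocr:
        # Intended for scanned PDFs.
        cmd.append("--force_ocr")
    return cmd


def run_marker(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

For example, `run_marker(build_marker_command("input.pdf", "./out", page_range="0,5-10"))` would invoke the same command as the page-range line above, assuming `marker_single` is on the PATH. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.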

Output:

  • marker-output/<filename>/<filename>.md (under the directory passed to --output_dir)
  • Extracted images, saved as JPEGs by default:
    • marker-output/<filename>/_page_<N>_Figure_<M>.jpeg

Run marker_single --help for all available options.
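
To pick up the results programmatically, the layout listed above can be walked with `pathlib`. This is a rough sketch that assumes the `<output_dir>/<filename>/` layout and the `_page_` image prefix described here; `find_marker_outputs` is a hypothetical helper, and the accepted image suffixes are an assumption covering the JPEG default and the PNG override.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed suffixes: the JPEG default plus the PNG override.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def find_marker_outputs(output_dir: str, pdf_name: str) -> Tuple[Path, List[Path]]:
    """Return the expected Markdown path and any extracted images."""
    stem = Path(pdf_name).stem  # "report.pdf" -> "report"
    doc_dir = Path(output_dir) / stem
    markdown = doc_dir / f"{stem}.md"
    images = sorted(
        p for p in doc_dir.glob("_page_*") if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return markdown, images
```

The function only computes paths and lists files. It does not check that the Markdown file exists, so callers may want to test `markdown.exists()` before reading it.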

Python API
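
A minimal sketch of calling marker from Python, assuming the converter interface described in the marker README (`PdfConverter`, `create_model_dict`, `text_from_rendered`). Verify these names against the installed version, since the API may differ between releases. The wrapper name `pdf_to_markdown` is hypothetical.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert one PDF to a Markdown string (hypothetical wrapper)."""
    # Imported lazily so that importing this module does not load
    # marker and its models until a conversion is requested.
    # These import paths are assumed from the marker README.
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict
    from marker.output import text_from_rendered

    converter = PdfConverter(artifact_dict=create_model_dict())
    rendered = converter(pdf_path)
    # text_from_rendered also returns extracted images; they are
    # ignored here for brevity.
    text, _, _images = text_from_rendered(rendered)
    return text
```

Typical use would be `Path("input.md").write_text(pdf_to_markdown("input.pdf"), encoding="utf-8")`. The first call is likely to be slow because the models have to be loaded.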

downsides

  • Discards most styling (colors, fonts, anything that would need HTML).