Agent Skills: Pandoc Document Converter


ID: haiyuan-ai/agent-skills/pandoc-converter (uncategorized)

Install this agent skill locally:

pnpm dlx add-skill https://github.com/haiyuan-ai/agent-skills/tree/HEAD/pandoc-converter


pandoc-converter/SKILL.md


Pandoc Document Converter

Convert between Markdown, Word (.docx), HTML, and PDF with proper CJK support out of the box.

Supported Conversions

| From     | To           |
|----------|--------------|
| Markdown | PDF          |
| Markdown | Word (.docx) |
| Markdown | HTML         |
| Word     | Markdown     |
| Word     | PDF          |
| HTML     | Markdown     |
| HTML     | Word (.docx) |
| HTML     | PDF          |

PDF as input is not supported (pandoc limitation).
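The table above can be encoded as a small guard that runs before any command is built. This is only a sketch: `is_listed_conversion` and the format names `markdown`, `docx`, `html`, and `pdf` are hypothetical labels chosen for illustration, and the function reflects the pairs listed in this document rather than everything pandoc itself can do.

```shell
# is_listed_conversion <from> <to>
# Succeeds (exit 0) when the pair appears in the Supported Conversions table,
# fails (exit 1) otherwise. Format names are illustrative, not pandoc's own.
is_listed_conversion() {
  case "$1:$2" in
    markdown:pdf|markdown:docx|markdown:html) return 0 ;;
    docx:markdown|docx:pdf)                   return 0 ;;
    html:markdown|html:docx|html:pdf)         return 0 ;;
    *)                                        return 1 ;;   # includes any pdf:* pair
  esac
}
```

A caller could stop early with `is_listed_conversion "$from" "$to" || echo "conversion not listed" >&2`.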


Scripts

| Script | Purpose | When to use |
|--------|---------|-------------|
| convert-to-pdf.sh | Optimized PDF with CJK monospace font, 11pt, 1.5cm margins | All PDF conversions (recommended) |
| fix-ascii-art.py | Pad ASCII box lines to equal width | Before Word conversion if ASCII diagrams exist |


Step-by-step Workflow

1. Identify the conversion

From the user's request, determine:

  • Source file(s): path and format
  • Target format: pdf, docx, md, or html
  • Options: template, styling, batch mode

Verify the source file exists before proceeding.
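One possible way to carry out this step in a script is sketched below. The helper names `detect_format` and `require_source` are hypothetical, and guessing the format from the file extension is an assumption of this sketch; the skill itself does not say how the format should be detected.

```shell
# detect_format <path>: print a format name guessed from the file extension.
# The extension is lowercased first so that Report.DOCX is handled like report.docx.
detect_format() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    md|markdown) echo markdown ;;
    docx)        echo docx ;;
    html|htm)    echo html ;;
    pdf)         echo pdf ;;
    *)           echo unknown ;;
  esac
}

# require_source <path>: fail with a message unless the path is a readable file.
require_source() {
  if [ ! -f "$1" ] || [ ! -r "$1" ]; then
    echo "source file not found or unreadable: $1" >&2
    return 1
  fi
}
```

Running `require_source "$input"` before anything else keeps a mistyped path from turning into a confusing pandoc error later.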

2. Build the pandoc command

Start with the base: pandoc <input> -o <output>

Then layer on options based on the target format.

PDF Output

Recommended: use the conversion script (includes CJK monospace font, 11pt, optimized margins):

bash ~/.agents/skills/pandoc-converter/scripts/convert-to-pdf.sh input.md

Manual setup:

pandoc input.md -o output.pdf \
  --pdf-engine=xelatex \
  -V CJKmainfont="PingFang SC" \
  -V monofont="Sarasa Fixed SC" \
  -V geometry:margin=2cm

Common variables:

-V fontsize=11pt             # 11pt recommended for technical docs
-V linestretch=1.5
-V papersize=a4
-V toc=true

📚 Font details: references/fonts.md

Word Output

Recommended workflow:

# 1. Fix ASCII art alignment (if needed)
python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md --check

# 2. Fix if issues found
python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md

# 3. Convert with reference.docx
#    (use $HOME here: the shell does not expand ~ after "=" in an option)
pandoc input.md -o output.docx \
  --reference-doc="$HOME/.agents/skills/pandoc-converter/references/reference.docx"

Built-in reference.docx includes:

  • CJK font: 思源黑体 CN (Source Han Sans CN)
  • English font: Times New Roman
  • Code font: Sarasa Fixed SC (CJK-aware monospace)
  • Table styles: Header shading, vertical center alignment

Markdown Output

pandoc input.docx -o output.md --extract-media=./media --wrap=none

HTML Output

# Standalone HTML
pandoc input.md -o output.html --standalone

# With CSS
pandoc input.md -o output.html --standalone --css=style.css

# Self-contained (embed images)
pandoc input.md -o output.html --standalone --embed-resources

HTML Input

# HTML to Markdown
pandoc input.html -o output.md --wrap=none

# HTML to PDF
pandoc input.html -o output.pdf --pdf-engine=xelatex -V CJKmainfont="PingFang SC"
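The "base command plus format-specific options" approach from this step can be sketched as a single function. Everything named here is an assumption for illustration: `run_pandoc` is a hypothetical wrapper, `PANDOC_BIN` is a hypothetical override that allows a dry run, and the target format is inferred from the output extension. The options themselves are the ones listed above.

```shell
# run_pandoc <input> <output>: start from the base command and layer on the
# options suggested above for the target format.
# PANDOC_BIN is a hypothetical override; PANDOC_BIN=echo prints the arguments
# instead of running pandoc.
run_pandoc() {
  local input output
  input=$1
  output=$2
  set -- "$input" -o "$output"                      # base: pandoc <input> -o <output>
  case "$output" in
    *.pdf)
      set -- "$@" --pdf-engine=xelatex -V "CJKmainfont=PingFang SC" -V geometry:margin=2cm ;;
    *.docx)
      # $HOME rather than ~, which the shell would not expand after "="
      set -- "$@" "--reference-doc=$HOME/.agents/skills/pandoc-converter/references/reference.docx" ;;
    *.html)
      set -- "$@" --standalone ;;
    *.md)
      set -- "$@" --wrap=none
      # --extract-media is added only when the source is a Word file
      case "$input" in
        *.docx) set -- "$@" --extract-media=./media ;;
      esac
      ;;
  esac
  "${PANDOC_BIN:-pandoc}" "$@"
}
```

For example, `PANDOC_BIN=echo run_pandoc input.md output.pdf` shows the assembled arguments without converting anything, which is a cheap way to review a command before it runs.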

3. Handle images and resources

  • For Markdown with local images: use --resource-path when the images are not relative to the directory pandoc runs in
  • For Word to Markdown: always use --extract-media
  • For PDF with images: xelatex handles most formats
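As an illustration of the first point: pandoc looks for images relative to the working directory by default, so a document converted from another directory may lose its images. A minimal sketch, assuming the images sit next to the Markdown file, is to add the input's directory to the search path. `resource_path_for` is a hypothetical helper name.

```shell
# resource_path_for <input>: print a --resource-path value that searches the
# working directory first and then the directory containing the input file.
# Entries are separated by ":" (the Unix convention for pandoc's search path).
resource_path_for() {
  printf '.:%s\n' "$(dirname "$1")"
}
```

It could be used as `pandoc docs/guide.md -o docs/guide.pdf --pdf-engine=xelatex --resource-path="$(resource_path_for docs/guide.md)"`.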

4. Run and verify

Execute the command. Common issues:

| Issue | Solution |
|-------|----------|
| xelatex not found | brew install --cask mactex |
| Font not found | fc-list :lang=zh to list available fonts |
| Missing LaTeX package | tlmgr install <package> |
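Some of these issues can be caught before running the conversion. The following is a rough pre-flight sketch; `missing_tools` is a hypothetical helper, and which tools to check is up to the caller.

```shell
# missing_tools <name>...: print the name of every command that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For a PDF conversion, `missing_tools pandoc xelatex fc-list` would list whichever of the three still needs installing.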

5. Batch conversion

Use a for-loop with the same options as single-file conversion:

for f in *.md; do pandoc "$f" -o "${f%.md}.pdf" --pdf-engine=xelatex -V CJKmainfont="PingFang SC"; done
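A slightly more defensive variant of the loop is sketched below. It skips files whose PDF already exists (in line with the Safety section) and reports failures per file instead of stopping silently. `batch_to_pdf` and the `PANDOC_BIN` override are hypothetical names introduced only for this example.

```shell
# batch_to_pdf <dir>: convert every .md file in <dir> to PDF with the same
# options as the single-file command, skipping outputs that already exist.
# PANDOC_BIN is a hypothetical override, useful for dry runs.
batch_to_pdf() {
  local dir f out
  dir=$1
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue            # the glob matched nothing
    out="${f%.md}.pdf"
    if [ -e "$out" ]; then
      echo "skip: $out already exists"
      continue
    fi
    if "${PANDOC_BIN:-pandoc}" "$f" -o "$out" --pdf-engine=xelatex -V "CJKmainfont=PingFang SC"; then
      echo "ok: $out"
    else
      echo "failed: $f" >&2
    fi
  done
}
```

Calling `batch_to_pdf .` covers the same files as the one-liner above.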

Advanced Features

The following features are documented in separate reference files:

| Feature | Description | Reference |
|---------|-------------|-----------|
| Font Configuration | CJK fonts, fallback, code fonts | references/fonts.md |
| Syntax Highlighting | Code themes, language support | references/syntax-highlighting.md |
| Math | LaTeX equations, MathJax, KaTeX | references/math.md |
| PDF Features | Metadata, frontmatter, watermarks | references/pdf-features.md |
| Advanced | Citations, multi-file, GFM, Lua filters | references/advanced.md |


Common Pitfalls

  • Garbled Chinese text in PDF: Always use --pdf-engine=xelatex with a CJK font
  • Word styles look wrong: Use --reference-doc for custom styling
  • Images missing in Markdown output: Add --extract-media
  • PDF margins too tight: Add -V geometry:margin=2cm
  • HTML lacks styles: Use --standalone
  • HTML images not showing: Use --embed-resources to inline images
  • Citations not rendering: Ensure --citeproc is included
  • Math not rendering in HTML: Add --mathjax or --katex
  • ASCII art misaligned in Word/PDF:
    • Run python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md
    • Use convert-to-pdf.sh, which enforces a monospace font
  • Code block background shows trailing spaces: the built-in reference.docx avoids this by applying no shading to the Source Code style

Output Naming

Unless the user specifies an output path, place the output in the same directory as the input, with the same base name and the new extension.
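The naming rule can be expressed with plain parameter expansion. This is a minimal sketch; `default_output` is a hypothetical helper name, and it only replaces the last extension of the file name.

```shell
# default_output <input> <ext>: same directory and base name, new extension.
default_output() {
  case "${1##*/}" in                              # look at the file name only
    *.*) printf '%s.%s\n' "${1%.*}" "$2" ;;       # replace the last extension
    *)   printf '%s.%s\n' "$1" "$2" ;;            # no extension: append one
  esac
}
```

For instance, `default_output docs/report.md pdf` yields `docs/report.pdf`, so the result lands next to the source file.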


Safety

  • Check if target file exists before overwriting
  • Always quote paths in shell commands
  • Only read source files and write output; never modify originals
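The overwrite check could look like the sketch below. `safe_target` is a hypothetical helper, and the `OVERWRITE=1` opt-in is an assumption made for this example; the skill only states that the target should be checked before overwriting.

```shell
# safe_target <path>: succeed only if <path> does not exist yet, or if the
# caller explicitly opted in to overwriting with OVERWRITE=1 (hypothetical knob).
safe_target() {
  if [ -e "$1" ] && [ "${OVERWRITE:-0}" != "1" ]; then
    echo "refusing to overwrite existing file: $1" >&2
    return 1
  fi
  return 0
}
```

A conversion can then be guarded with `safe_target "$output" && pandoc "$input" -o "$output"`, with both paths quoted as recommended above.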