OCRmyPDF — Optimization Guide
Overview
OCRmyPDF provides extensive optimization options to reduce file size, create PDF/A archival documents, and configure output quality.
For core OCR functionality, see the ocrmypdf skill. For image processing (deskew, rotate, clean), see ocrmypdf-image. For batch/Docker/scripting, see ocrmypdf-batch.
Compression Levels
# Level 0 — no optimization (fastest)
ocrmypdf --optimize 0 input.pdf output.pdf
# Level 1 — lossless (default)
ocrmypdf --optimize 1 input.pdf output.pdf
# Level 2 — lossy (aggressive)
ocrmypdf --optimize 2 input.pdf output.pdf
# Level 3 — lossless, aggressive JPEG recompression
ocrmypdf --optimize 3 input.pdf output.pdf
PDF/A Output
PDF/A is an archival format with embedded fonts and colorspaces:
# PDF/A-1b (basic, default)
ocrmypdf --output-type pdfa input.pdf output.pdf
# PDF/A-2b (includes transparency)
ocrmypdf --output-type pdfa2b input.pdf output.pdf
# PDF/A-2u (Unicode)
ocrmypdf --output-type pdfa2u input.pdf output.pdf
# Standard PDF (no archival)
ocrmypdf --output-type pdf input.pdf output.pdf
JBIG2 Encoding
JBIG2 provides excellent compression for monochrome (1-bit) images:
# Enable JBIG2 (requires jbig2enc)
ocrmypdf --jbig2-lossy input.pdf output.pdf # Lossy
ocrmypdf --jbib2-lossless input.pdf output.pdf # Lossless (v17+)
Requirements:
# Debian/Ubuntu
apt install jbig2enc
# macOS
brew install jbig2enc
PNG Optimization
Optimize embedded PNG images:
# Use pngquant for lossy compression
ocrmypdf --png-lossy input.pdf output.pdf
# Lossless PNG optimization
ocrmypdf --png-lossless input.pdf output.pdf
Ghostscript Options
Fine-tune PDF processing with Ghostscript:
# Set PDF minor version
ocrmypdf --pdf-renderer hatch input.pdf output.pdf
# Use pdfimages for better image extraction
ocrmypdf --pdf-renderer img2pdf input.pdf output.pdf
Sidecar Text
Generate text file alongside PDF without modifying PDF:
# Generate sidecar only
ocrmypdf --output-type none --sidecar text.txt input.pdf output.pdf
# Typical sidecar workflow
ocrmypdf --sidecar text.txt --force-ocr input.pdf output.pdf
Combined Recipes
Maximum compression
ocrmypdf --optimize 3 --jbig2-lossy --png-lossy input.pdf small.pdf
Archival PDF/A with compression
ocrmypdf --output-type pdfa --optimize 2 input.pdf archival.pdf
Lossless output
ocrmypdf --output-type pdf --optimize 1 --png-lossless input.pdf lossless.pdf
Quick Reference
| Task | Command |
|------|---------|
| No optimization | --optimize 0 |
| Lossless default | --optimize 1 |
| Aggressive lossy | --optimize 2 |
| Max quality | --optimize 3 |
| PDF/A-1b (default) | --output-type pdfa |
| PDF/A-2b | --output-type pdfa2b |
| JBIG2 lossy | --jbig2-lossy |
| PNG lossy | --png-lossy |
| Sidecar text | --sidecar text.txt |
Troubleshooting
- Large file size: Try
--optimize 2or--png-lossy. - PDF/A validation fails: Use
--output-type pdfa2bfor better compatibility. - Font issues: PDF/A-2u ensures full Unicode support.