Document Converter Suite
Run best-effort extraction and rebuild workflows across common document formats. Preserve clean structure, not pixel-perfect layout.
Use This For
- Converting between `pdf`, `docx`, `pptx`, `xlsx`, `txt`, `csv`, `md`, and `html`
- Pulling tables or spreadsheet-style grids into editable outputs
- Running utility PDF operations such as merge, split, rotate, watermark, or page extraction
- Filling simple document or form-style templates
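To make "editable outputs" and "simple templates" concrete, here is a minimal standard-library sketch. It does not reproduce what `scripts/table_extractor.py` or `scripts/form_filler.py` do. It assumes a table has already been extracted into a list of rows (first row as header), and it uses Python's `string.Template` `$placeholder` syntax as an assumed template format. The helper names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from string import Template


def grid_to_csv(rows: list[list[str]]) -> str:
    """Serialize a grid of cells to CSV text; cells containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def grid_to_markdown(rows: list[list[str]]) -> str:
    """Render a grid as a Markdown table, treating the first row as the header."""
    header, *body = rows
    width = len(header)

    def line(cells: list[str]) -> str:
        # Pad short rows and trim long ones so every line has `width` cells;
        # escape pipes so cell text cannot break the table.
        padded = (list(cells) + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|") for c in padded) + " |"

    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator] + [line(r) for r in body])


def fill_template(text: str, values: dict[str, str]) -> str:
    """Fill $placeholders; raises KeyError if any placeholder has no value."""
    return Template(text).substitute(values)
```

Padding ragged rows instead of rejecting them matches the best-effort stance: a grid pulled from a PDF often has uneven rows, and an editable table with blank cells is easier to fix than no table at all.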
Workflow
- Confirm the source format, target format, and whether editability or fidelity matters more.
- Use `scripts/convert.py` for single documents and `scripts/batch_convert.py` for folders.
- Use the bundled utility scripts when the user needs a focused PDF, table, or form task:
  - `scripts/pdf_toolkit.py`
  - `scripts/table_extractor.py`
  - `scripts/form_filler.py`
- Say explicitly when the output is best-effort and likely to lose layout, images, text that only OCR could recover, or advanced formatting.
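The routing decision in the steps above can be sketched as follows: pick the single-file or batch script, and decide up front whether to warn that the result is best-effort. This is a hypothetical helper. The real command-line arguments of `scripts/convert.py` and `scripts/batch_convert.py` are not documented here, so the `<source> <target_format>` argument order is an assumption, and so is the set of "layout-heavy" source formats that trigger the warning.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Formats listed under "Use This For".
SUPPORTED = {"pdf", "docx", "pptx", "xlsx", "txt", "csv", "md", "html"}

# ASSUMPTION: sources whose layout is most likely to be lost on conversion.
LAYOUT_HEAVY = {"pdf", "docx", "pptx", "xlsx", "html"}


def plan_conversion(source: str, target_format: str) -> tuple[list[str], bool]:
    """Return (command argv, should_warn_best_effort) for a conversion request.

    ASSUMPTION: both scripts accept "<source> <target_format>" positionally;
    check the real scripts before running the returned command.
    """
    target = target_format.lower().lstrip(".")
    if target not in SUPPORTED:
        raise ValueError(f"unsupported target format: {target!r}")

    path = Path(source)
    if path.is_dir():
        # Folders go through the batch script; their mix of formats is unknown.
        script, src = "scripts/batch_convert.py", None
    else:
        src = path.suffix.lower().lstrip(".")
        if src not in SUPPORTED:
            raise ValueError(f"unsupported source format: {src!r}")
        script = "scripts/convert.py"

    # Warn unless the source is known to carry little layout worth preserving.
    best_effort = src is None or src in LAYOUT_HEAVY
    return [sys.executable, script, str(path), target], best_effort
```

For example, `plan_conversion("report.docx", "md")` selects `scripts/convert.py` and flags the result as best-effort, while a `.txt` source gets no warning. Folders always get the warning because their contents are not inspected.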
Guardrails
- Do not promise visual fidelity.
- Treat scanned PDFs as OCR problems, not conversion problems.
- Raise safety caps gradually on large sheets or documents instead of processing everything blindly.
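Two of these guardrails can be made mechanical. The sketch below assumes per-page text has already been pulled out of a PDF by some extraction tool, and that `run(cap)` is a caller-supplied (hypothetical) function that processes at most `cap` rows and returns whether the output passed a sanity check. The thresholds and default caps are illustrative, not tuned values from the bundled scripts.

```python
from __future__ import annotations

from typing import Callable


def looks_scanned(page_texts: list[str], min_chars: int = 20,
                  threshold: float = 0.8) -> bool:
    """Heuristic: if most pages yield almost no text, treat the PDF as an OCR job.

    min_chars and threshold are illustrative defaults, not tuned values.
    """
    if not page_texts:
        return False
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars)
    return sparse / len(page_texts) >= threshold


def run_with_growing_cap(total_rows: int, run: Callable[[int], bool],
                         start: int = 1_000, hard_max: int = 100_000,
                         factor: int = 4) -> tuple[int, bool]:
    """Call run(cap) with increasing caps instead of processing everything at once.

    Returns (last cap used, whether the whole input was covered).
    """
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor >= 2 so the cap grows")
    cap = min(start, hard_max)
    while True:
        if not run(cap):
            return cap, False  # a capped run already looks wrong; stop here
        if cap >= total_rows:
            return cap, True   # the cap now covers the whole input
        if cap >= hard_max:
            return cap, False  # ceiling reached; raising further needs a decision
        cap = min(cap * factor, hard_max)
```

Multiplicative growth reaches the ceiling in a few runs while still checking small samples first. Stopping at `hard_max` instead of continuing makes truncation explicit, so it can be reported to the user rather than hidden.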
References
- `references/conversion_matrix.md` for supported paths
- `references/limitations.md` for failure modes and tradeoffs