Document Importer
Import external documents into editable markdown.
When To Use
- User provides a DOCX, PPTX, XLSX, PDF, or HTML file to convert into project documentation
- User wants to extract content from a document for rewriting or remediation
- User has a slide deck or spreadsheet to turn into markdown documentation
When NOT To Use
- Academic paper analysis: use
tome:papers - Web article knowledge intake: use
memory-palace:knowledge-intake - Content already in markdown: use
scribe:doc-generatorremediation mode directly
Import Workflow
Step 1: Identify Source
Determine the source document:
- Local file path: verify it exists with Read tool
- URL: verify accessibility
- User description: confirm format and location
Step 2: Convert to Markdown
Apply the leyline:document-conversion protocol:
- Construct URI from source (file path or URL)
- Try the markitdown MCP tool for best quality
- If unavailable, use native tool fallbacks
- If format unsupported, inform user
Step 3: Structural Cleanup
After conversion, normalize the markdown:
- Ensure ATX headings (
# style, not setext underlines) - Wrap prose lines at 80 characters per
leyline:markdown-formatting - Fix broken tables (align columns, add headers)
- Remove conversion artifacts (page numbers, headers/footers, watermarks, repeated logos)
- Preserve all substantive content
Step 4: Sanitize External Content
Apply the leyline:content-sanitization checklist:
- Size check (truncate sections over 2000 words)
- Strip system/instruction tags
- Wrap in external content boundary markers
Step 5: Write Draft
Write the converted markdown to the target location.
Default: same directory as source, with .md extension.
Ask the user for target path if ambiguous.
Step 6: Hand Off to Doc-Generator (Optional)
If the user wants polishing or rewriting:
- Invoke
Skill(scribe:doc-generator)in Remediation mode on the imported file - The doc-generator handles slop detection, style application, and quality gates
Offer this step; do not assume the user wants remediation.
Output Quality
The imported markdown should:
- Have a top-level
# Titlefrom the document title - Preserve the original heading hierarchy
- Convert tables to markdown tables
- Convert images to
references (note: image files may need separate handling) - Convert lists faithfully
- Mark unclear or garbled sections with
<!-- REVIEW: conversion artifact -->
Exit Criteria
- Source document identified and accessible
- Conversion attempted via document-conversion protocol
- Structural cleanup applied
- Sanitization checklist passed
- Draft written to target path
- User informed of any conversion limitations