HWP Skill - HWP / HWPX / HWPML to Markdown Conversion
Scheduling
Goal
Convert Korean HWP-family documents into readable Markdown or structured JSON while preserving document structure for LLM context, RAG, government-document review, or enterprise document processing.
Intent signature
- User asks to convert, parse, read, extract, or transform `.hwp`, `.hwpx`, or `.hwpml` files.
- User mentions Korean word processor files, Hangul documents, government forms, or "한글 파일" ("Hangul file").
- User needs headings, tables, nested tables, lists, images, footnotes, or hyperlinks extracted from HWP-family files.
When to use
- Converting Korean HWP documents (`.hwp`, `.hwpx`, `.hwpml`) to Markdown
- Preparing Korean government/enterprise documents for LLM context or RAG
- Extracting structured content (tables, headings, lists, images) from HWP files
- User says "convert this HWP", "parse hwpx", "HWP to markdown", or "한글 파일" ("Hangul file")
When NOT to use
- PDF files -> use `oma-pdf` (OCR + Tagged PDF specialization)
- XLSX / DOCX files -> out of scope; run `bunx kordoc` directly if needed (note: `oma-docs` is the documentation-drift skill, not a converter)
- Generating or editing HWP documents -> out of scope
- Already-text files -> use the Read tool directly
Expected inputs
- `input_path`: `.hwp`, `.hwpx`, or `.hwpml` file path
- `output_path` or `output_dir`: optional explicit output target
- `format`: optional output format, default `markdown`
- `page_range`: optional page or section range
- `kordoc_version`: optional pinned kordoc version
Expected outputs
- Markdown output next to the input file or in the requested directory
- Optional JSON output when requested
- Post-processed Markdown with tables flattened to GFM and Private Use Area (PUA) glyphs stripped by default
- A short report with output path, detected source format, and conversion issues
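The default placement described above ("next to the input file or in the requested directory") fits in a small path helper. This is a minimal sketch that assumes POSIX-style paths and assumes the default filename is the input's base name with `.md` (or `.json`) swapped in. `resolveOutputPath` is a hypothetical name, and kordoc's own default naming may differ.

```typescript
// Hypothetical helper: decide where the converted file should be written.
// Assumption: default filename = input base name + ".md" (or ".json").
type OutputFormat = "markdown" | "json";

function resolveOutputPath(
  inputPath: string,
  format: OutputFormat = "markdown",
  outputDir?: string,
): string {
  // Split on the last "/" so relative and absolute POSIX paths both work.
  const slash = inputPath.lastIndexOf("/");
  const dir = slash >= 0 ? inputPath.slice(0, slash) : ".";
  const file = slash >= 0 ? inputPath.slice(slash + 1) : inputPath;

  // Drop the original extension (.hwp / .hwpx / .hwpml) if present.
  const dot = file.lastIndexOf(".");
  const base = dot > 0 ? file.slice(0, dot) : file;

  const ext = format === "json" ? ".json" : ".md";
  // An explicit output directory wins; otherwise write next to the input.
  const targetDir = outputDir !== undefined ? outputDir.replace(/\/+$/, "") : dir;
  return `${targetDir}/${base}${ext}`;
}

console.log(resolveOutputPath("docs/report.hwpx")); // docs/report.md
console.log(resolveOutputPath("report.hwp", "json", "out/")); // out/report.json
```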
Dependencies
- `bun` and `bunx`
- `bunx kordoc@latest` or a configured pinned kordoc version
- `resources/flatten-tables.ts` for Markdown cleanup
- Local filesystem access to input and output paths
Control-flow features
- Branches by file extension, output target, format, page range, encryption/DRM state, and post-processing requirements
- Calls external CLI tools through `bunx` and `bun run`
- Reads local HWP-family files and writes local Markdown or JSON output
- Routes non-HWP inputs to other skills instead of stretching this skill's scope
Structural Flow
Entry
- Confirm the input path exists.
- Confirm the extension is `.hwp`, `.hwpx`, or `.hwpml`.
- Resolve the output path or directory and the default filename.
- Check that `bun` is available.
Scenes
- PREPARE: Validate path, extension, size, output target, and requested format.
- ACQUIRE: Detect source format and runtime availability.
- ACT: Run `kordoc` with an explicit output target and the requested options.
- VERIFY: Post-process the Markdown and inspect its structure for headings, tables, lists, images, and footnotes.
- FINALIZE: Report output path, source format, and any conversion limitations.
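The VERIFY scene can be approximated by counting the structural elements that survived conversion, so the report can flag an empty file or suspiciously missing tables. This is a rough sketch. `summarizeMarkdown` and its line-based regexes are hypothetical heuristics and are not part of kordoc or `flatten-tables.ts`.

```typescript
// Hypothetical post-conversion check: rough counts of Markdown structure.
// Line-based heuristics only; lines inside fenced code blocks are not excluded.
interface StructureSummary {
  empty: boolean;
  headings: number;
  tableLines: number; // GFM table lines, including |---| separator rows
  htmlTables: number; // <table> elements that were not flattened
  listItems: number;
  images: number;
  footnotes: number; // footnote definitions such as "[^1]: ..."
}

function summarizeMarkdown(md: string): StructureSummary {
  const lines = md.split(/\r?\n/);
  // Non-global regexes, so .test() carries no lastIndex state between lines.
  const count = (re: RegExp) => lines.filter((line) => re.test(line)).length;
  // Global regexes over the whole document.
  const matches = (re: RegExp) => (md.match(re) || []).length;
  return {
    empty: md.trim().length === 0,
    headings: count(/^#{1,6}\s/),
    tableLines: count(/^\s*\|/),
    htmlTables: matches(/<table\b/gi),
    listItems: count(/^\s*(?:[-*+]|\d+[.)])\s+/),
    images: matches(/!\[[^\]]*\]\([^)]*\)/g),
    footnotes: count(/^\[\^[^\]]+\]:/),
  };
}
```

An `empty: true` result corresponds to the "Empty Markdown output" failure row (possible scanned-image content). A non-zero `htmlTables` after post-processing suggests that the flattening step was skipped or could not convert a table.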
Transitions
- If the input is `.pdf`, stop and route to `oma-pdf`.
- If the input is `.xlsx` or `.docx`, explain that these formats are outside this skill's scope.
- If `bun` is unavailable, stop and ask the user to install Bun.
- If Markdown is produced, run `resources/flatten-tables.ts` unless the caller explicitly needs HTML tables or PUA glyphs preserved.
- If the output is empty or garbled, consult `resources/troubleshooting.md`.
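The extension branches of the transitions above can be written as a pure routing function. This is a minimal sketch: `routeInput` and the `Route` shape are hypothetical. An extension can be wrong, so a real run still needs the magic-byte check listed under Failure and recovery.

```typescript
// Hypothetical routing helper mirroring the extension-based transitions.
type Route =
  | { kind: "convert"; ext: string }
  | { kind: "redirect"; skill: "oma-pdf" }
  | { kind: "out-of-scope"; reason: string }
  | { kind: "unsupported"; reason: string };

function routeInput(path: string): Route {
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  // A dot only counts as an extension if it is in the final path segment.
  const ext = dot > slash ? path.slice(dot).toLowerCase() : "";

  switch (ext) {
    case ".hwp":
    case ".hwpx":
    case ".hwpml":
      return { kind: "convert", ext };
    case ".pdf":
      return { kind: "redirect", skill: "oma-pdf" };
    case ".xlsx":
    case ".docx":
      return { kind: "out-of-scope", reason: `${ext} is outside this skill's scope` };
    default:
      return { kind: "unsupported", reason: `unrecognized extension "${ext}"` };
  }
}
```

Lower-casing the extension means files such as `보고서.HWP`, saved by older Windows tooling, still route to conversion.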
Failure and recovery
| Failure | Recovery |
|---------|----------|
| bun or bunx unavailable | Ask user to install Bun |
| Unsupported or mismatched format | Check extension and magic bytes, then route or stop |
| Encrypted or DRM-locked document | Report limitation and request an accessible copy when needed |
| Empty Markdown output | Treat as possible scanned-image content and recommend OCR outside this skill |
| Complex merged tables | Accept flattened Markdown or HTML fallback as best effort |
| Stale kordoc cache | Use bunx kordoc@latest or configured pinned version |
| Cannot find module "turndown" from flatten-tables.ts | Run bun install in .agents/skills/oma-hwp/resources/ (its node_modules is gitignored and absent on fresh clones) |
Exit
- Success: output file exists and structure is readable after post-processing.
- Partial success: output exists with explicitly reported table, glyph, encryption, or fidelity limitations.
- Failure: no reliable output is produced and the blocking cause is reported.
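Once conversion has run, the three exit states can be decided mechanically and turned into the short report listed under Expected outputs. This is a sketch under stated assumptions: `ConversionResult`, `classifyOutcome`, and `formatReport` are hypothetical. Treating a zero-byte output file as a failure rather than a partial success is a judgment call based on "no reliable output".

```typescript
// Hypothetical classifier for the success / partial / failure exit states.
type Outcome = "success" | "partial" | "failure";

interface ConversionResult {
  outputExists: boolean;
  outputBytes: number;
  // Reported limitations, e.g. "merged cells flattened", "encrypted section skipped".
  limitations: string[];
  blockingCause?: string;
}

function classifyOutcome(r: ConversionResult): Outcome {
  // No reliable output: a blocking cause, or a missing or empty file.
  if (r.blockingCause !== undefined || !r.outputExists || r.outputBytes === 0) {
    return "failure";
  }
  // Output exists, but any fidelity limitation must be reported explicitly.
  return r.limitations.length > 0 ? "partial" : "success";
}

function formatReport(outputPath: string, sourceFormat: string, r: ConversionResult): string {
  const lines = [
    `Result: ${classifyOutcome(r)}`,
    `Output: ${r.outputExists ? outputPath : "(none)"}`,
    `Source format: ${sourceFormat}`,
  ];
  if (r.blockingCause !== undefined) lines.push(`Blocking cause: ${r.blockingCause}`);
  for (const note of r.limitations) lines.push(`- ${note}`);
  return lines.join("\n");
}
```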
Logical Operations
Actions
| Action | SSL primitive | Evidence |
|--------|---------------|----------|
| Validate file path and extension | VALIDATE | Input preflight in execution protocol |
| Check runtime availability | VALIDATE | bun --version |
| Select output target and format | SELECT | Output behavior and config |
| Run converter | CALL_TOOL | bunx kordoc@latest |
| Write output artifact | WRITE | Markdown or JSON output |
| Flatten tables and strip PUA glyphs | CALL_TOOL | resources/flatten-tables.ts |
| Inspect extraction quality | VALIDATE | Verification step |
| Report result | NOTIFY | Final user-facing summary |
Tools and instruments
- `kordoc`: primary HWP-family conversion CLI
- `flatten-tables.ts`: post-processing for GFM tables and Hancom PUA cleanup
- `bun` / `bunx`: runtime and CLI executor
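The PUA cleanup performed by `flatten-tables.ts` amounts to deleting code points in the Unicode Private Use Areas. The sketch below is illustrative and is not the script's actual code. It uses explicit UTF-16 ranges (BMP U+E000..U+F8FF, plus surrogate pairs for planes 15 and 16) instead of `\p{Co}`, so it works without the regex `u` flag.

```typescript
// Illustrative PUA stripping (not the code in flatten-tables.ts).
// BMP PUA is U+E000..U+F8FF. Planes 15-16 (U+F0000..U+10FFFF) appear in UTF-16
// as a high surrogate D B80..DBFF followed by a low surrogate; that range also
// catches the four plane-15/16 noncharacters, which is harmless here.
const PUA = /[\uE000-\uF8FF]|[\uDB80-\uDBFF][\uDC00-\uDFFF]/g;

function stripPua(text: string): string {
  return text.replace(PUA, "");
}

function countPua(text: string): number {
  return (text.match(PUA) || []).length;
}
```

Counting before stripping lets the final report state how many glyphs were dropped, which matches the "glyph" limitations mentioned in the partial-success exit state.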
Canonical command path
```bash
bunx kordoc@latest "{input_path}" -o "{output_path}"
# fresh clone: run `bun install` in .agents/skills/oma-hwp/resources/ first (node_modules is gitignored)
bun ".agents/skills/oma-hwp/resources/flatten-tables.ts" "{output_path}"
```

For batch conversion, use an explicit output directory:

```bash
bunx kordoc@latest "{input_pattern}" -d "{output_dir}"
```
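When these commands are driven from code rather than a shell, building argv arrays avoids quoting problems with Korean or space-containing filenames. This sketch uses only the `-o` and `-d` flags shown above. The helper names are hypothetical, and the arrays could be passed to a spawn API such as `Bun.spawnSync`.

```typescript
// Hypothetical argv builders for the canonical commands above.
const FLATTEN_SCRIPT = ".agents/skills/oma-hwp/resources/flatten-tables.ts";

// Always "@latest" or an explicit pin; a bare "kordoc" lets a stale bunx cache win.
function kordocSpec(version?: string): string {
  const pinned = version === undefined ? "" : version.trim();
  return `kordoc@${pinned !== "" ? pinned : "latest"}`;
}

interface SingleFileOptions {
  version?: string; // pinned kordoc version; "latest" when omitted
  flatten?: boolean; // false for JSON output or when HTML tables / PUA must survive
}

function singleFileCommands(input: string, output: string, opts: SingleFileOptions = {}): string[][] {
  const cmds: string[][] = [["bunx", kordocSpec(opts.version), input, "-o", output]];
  if (opts.flatten !== false) cmds.push(["bun", FLATTEN_SCRIPT, output]);
  return cmds;
}

function batchCommand(inputPattern: string, outputDir: string, version?: string): string[] {
  // The pattern is passed through unexpanded, as in the quoted shell form.
  return ["bunx", kordocSpec(version), inputPattern, "-d", outputDir];
}
```

Skipping the shell has one caveat: nothing expands `{input_pattern}`. The canonical batch command also quotes the pattern, which suggests kordoc expands it itself. That is an assumption, so confirm it in `resources/execution-protocol.md`.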
Resource scope
| Scope | Resource target |
|-------|-----------------|
| LOCAL_FS | Input HWP-family files and generated outputs |
| PROCESS | bunx kordoc and bun run subprocesses |
| MEMORY | Format decisions, validation notes, and final report |
Preconditions
- Input file exists and is readable.
- Output location is writable or can be created.
- `bun` is installed.
- `kordoc` can parse the document or fail with a reportable error.
Effects and side effects
- Creates Markdown or JSON output files.
- May flatten merged-cell tables, trading cell fidelity for Markdown compatibility.
- Strips Private Use Area characters by default because they render as blanks without Hancom fonts.
- Does not intentionally modify the source HWP-family document.
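Flattening merged cells, the fidelity trade-off noted above, comes down to expanding `rowspan`/`colspan` into a rectangular grid before emitting a GFM table. This is a minimal sketch under several assumptions: the `Cell` input shape is hypothetical, the first row is treated as the header, and spanned positions are left blank (repeating the value into each spanned slot is the other common choice). It is not necessarily what `flatten-tables.ts` does.

```typescript
// Sketch: expand rowspan/colspan into a rectangular grid, then render GFM.
interface Cell {
  text: string;
  rowspan?: number;
  colspan?: number;
}

function expandSpans(rows: Cell[][]): string[][] {
  const grid: (string | undefined)[][] = [];
  rows.forEach((row, r) => {
    grid[r] = grid[r] || [];
    let c = 0;
    for (const cell of row) {
      // Skip slots already claimed by a rowspan from an earlier row.
      while (grid[r][c] !== undefined) c++;
      const rs = Math.max(1, cell.rowspan || 1);
      const cs = Math.max(1, cell.colspan || 1);
      for (let dr = 0; dr < rs; dr++) {
        const target = (grid[r + dr] = grid[r + dr] || []);
        for (let dc = 0; dc < cs; dc++) {
          // The top-left slot keeps the text; the rest of the span is blank.
          target[c + dc] = dr === 0 && dc === 0 ? cell.text : "";
        }
      }
      c += cs;
    }
  });
  const width = Math.max(0, ...grid.map((row) => row.length));
  // Pad ragged rows and turn holes into empty strings.
  return grid.map((row) => {
    const out: string[] = [];
    for (let i = 0; i < width; i++) out.push(row[i] === undefined ? "" : (row[i] as string));
    return out;
  });
}

function toGfm(grid: string[][]): string {
  if (grid.length === 0) return "";
  // Escape pipes and collapse newlines so each cell stays on one table line.
  const esc = (s: string) => s.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
  const line = (cells: string[]) => `| ${cells.map(esc).join(" | ")} |`;
  const sep = `| ${grid[0].map(() => "---").join(" | ")} |`;
  return [line(grid[0]), sep, ...grid.slice(1).map(line)].join("\n");
}
```

For a header where "Region" spans two rows and "2023" spans two columns, the expanded grid places "Region" at the top-left, leaves the slot below it blank, and shifts the second-row cells right so they line up under "2023".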
Guardrails
- Always pass `@latest` or an explicit pinned version to avoid a stale `bunx` cache.
- Always pass an explicit output target when the user expects a file.
- Do not wrap kordoc's built-in ZIP, XML, SSRF, and XSS defenses in additional custom security layers.
- Report missing tables, garbled text, empty output, encrypted segments, and best-effort DRM extraction.
- Keep full CLI details in `resources/execution-protocol.md` and troubleshooting branches in `resources/troubleshooting.md`.
Supported Formats
| Format | Extension | Notes |
|--------|-----------|-------|
| HWP 5.x binary | .hwp | Full support (incl. DRM-locked via kordoc's rhwp-algorithm port) |
| HWPX | .hwpx | Full support incl. nested tables, merged cells |
| HWPML | .hwpml, .hwp (XML variant) | Auto-detected by signature |
kordoc also parses PDF / XLSX / DOCX. Those are intentionally outside this skill's scope; see "When NOT to use".
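Signature-based detection, mentioned for the HWPML row above, works from the first bytes of the file. The OLE2 compound-file signature (`D0 CF 11 E0 A1 B1 1A E1`) and the ZIP local-file-header signature (`50 4B 03 04`) are standard. Mapping OLE2 to HWP 5.x, ZIP to HWPX, and an `<HWPML` root tag to HWPML are assumptions made for this sketch. A ZIP signature alone cannot distinguish HWPX from DOCX or XLSX, and the text check only handles ASCII-compatible encodings such as UTF-8.

```typescript
// Sketch of signature-based detection from the first bytes of a file.
type SourceFormat = "hwp5" | "hwpx" | "hwpml" | "unknown";

const OLE2 = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // compound file
const ZIP = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"

function hasPrefix(bytes: Uint8Array, sig: number[]): boolean {
  if (bytes.length < sig.length) return false;
  for (let i = 0; i < sig.length; i++) if (bytes[i] !== sig[i]) return false;
  return true;
}

function detectFormat(head: Uint8Array): SourceFormat {
  if (hasPrefix(head, OLE2)) return "hwp5";
  // ZIP container: HWPX only if the extension agrees (DOCX/XLSX are ZIP too).
  if (hasPrefix(head, ZIP)) return "hwpx";
  // Read the first bytes as single-byte characters; enough to spot an ASCII tag.
  let text = "";
  for (let i = 0; i < Math.min(head.length, 512); i++) text += String.fromCharCode(head[i]);
  // Drop a UTF-8 BOM (EF BB BF, seen here as three Latin-1 chars) and whitespace.
  text = text.replace(/^\u00ef\u00bb\u00bf/, "").replace(/^\s+/, "");
  if (text.charAt(0) === "<" && /<HWPML\b/.test(text)) return "hwpml";
  return "unknown";
}

// Mismatch check from the failure table: does the extension agree with the bytes?
function extensionMatches(ext: string, fmt: SourceFormat): boolean {
  switch (ext.toLowerCase()) {
    case ".hwp":
      return fmt === "hwp5" || fmt === "hwpml"; // .hwp may hold the XML variant
    case ".hwpx":
      return fmt === "hwpx";
    case ".hwpml":
      return fmt === "hwpml";
    default:
      return false;
  }
}
```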
References
- Execution protocol: `resources/execution-protocol.md`
- Troubleshooting: `resources/troubleshooting.md`
- Configuration: `config/hwp-config.yaml`
- Upstream: https://github.com/chrisryugj/kordoc
- Related: `../oma-pdf/SKILL.md` (use for `.pdf` inputs)