DOCX Processing Skill | Agent Skills

DOCX Processing

Overview

This skill covers working with Microsoft Word documents (.docx files): creating, editing, analyzing, and converting them.

Reading/Analyzing Documents

Text Extraction

Use pandoc for simple text extraction:

pandoc document.docx -t plain -o output.txt
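If pandoc is not installed, a rough fallback is to read the text runs directly from word/document.xml. The sketch below uses only the Python standard library. The function name docx_plain_text is hypothetical, and the sketch assumes body text is all you need. Headers, footers, footnotes and comments live in other parts and are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as w:p, w:r and w:t
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(source) -> str:
    """Return the body text of a .docx, one line per w:p paragraph.

    `source` may be a path or a binary file-like object. Deleted text
    (w:delText) is skipped, so tracked deletions do not appear. Paragraphs
    nested inside text boxes are not handled and may be repeated.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        parts = []
        for run in para.iter(W + "r"):
            for child in run:  # direct run content only, not tab-stop definitions
                if child.tag == W + "t":
                    parts.append(child.text or "")
                elif child.tag == W + "tab":
                    parts.append("\t")
                elif child.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


# Usage: a tiny in-memory archive that contains only word/document.xml
sample_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    '<w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r><w:r><w:t>World</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>A</w:t><w:tab/><w:t>B</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", sample_xml)
text = docx_plain_text(buf)
print(text)
```

The sample deliberately splits "Hello World" across two runs, which is common in real documents.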

Raw XML Access

A .docx file is a ZIP archive of XML parts. Unpacking it gives direct access to comments, formatting, and metadata:

unzip document.docx -d document_unpacked/
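You can also read these parts without unpacking to disk. The minimal sketch below assumes the usual part names: docProps/core.xml for core metadata and word/comments.xml for comment bodies. inspect_docx and the sample values are illustrative only. The comment anchors themselves stay in word/document.xml, so this lists the comment text but not the passage each comment refers to.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
W = "{%s}" % NS["w"]


def inspect_docx(source):
    """Return (part names, core properties, comments) of a .docx archive."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        props = {}
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for key in ("dc:title", "dc:creator", "cp:lastModifiedBy"):
                el = core.find(key, NS)
                if el is not None:
                    props[key] = el.text
        comments = []
        if "word/comments.xml" in names:
            root = ET.fromstring(zf.read("word/comments.xml"))
            for c in root.findall("w:comment", NS):
                body = "".join(t.text or "" for t in c.iter(W + "t"))
                comments.append((c.get(W + "author"), body))
    return names, props, comments


# Usage with hypothetical sample parts built in memory
core_xml = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:title>Quarterly Report</dc:title><dc:creator>Ana</dc:creator>"
    "</cp:coreProperties>"
)
comments_xml = (
    f'<w:comments xmlns:w="{NS["w"]}">'
    '<w:comment w:id="0" w:author="Ben">'
    "<w:p><w:r><w:t>Check this figure</w:t></w:r></w:p></w:comment>"
    "</w:comments>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)
    zf.writestr("word/comments.xml", comments_xml)
names, props, comments = inspect_docx(buf)
```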

Creating New Documents

Use JavaScript/TypeScript with the docx library:

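import * as fs from 'fs';
import { Document, Paragraph, TextRun, Packer } from 'docx';

const doc = new Document({
  sections: [{
    properties: {},
    children: [
      new Paragraph({
        children: [
          new TextRun("Hello World"),
        ],
      }),
    ],
  }],
});

// Export: Packer.toBuffer resolves to a Node.js Buffer, which is then written to disk.
// Top-level await requires an ES module (e.g. a .mjs file).
const buffer = await Packer.toBuffer(doc);
fs.writeFileSync("output.docx", buffer);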
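To see what the docx library packages up, the sketch below builds a bare .docx by hand with Python's zipfile module. It illustrates the package structure and does not replace the library. It assumes that only three parts are needed: [Content_Types].xml, _rels/.rels and word/document.xml. It omits the styles, settings and font tables that real generators add, and minimal_docx is a hypothetical helper name.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="word/document.xml" Type='
    '"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>'
    "</Relationships>"
)


def minimal_docx(paragraphs):
    """Return the bytes of a .docx holding one unformatted paragraph per string."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(p)}</w:t></w:r></w:p>'
        for p in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)  # declares part content types
        zf.writestr("_rels/.rels", ROOT_RELS)  # points the package at its main part
        zf.writestr("word/document.xml", document)
    return buf.getvalue()


data = minimal_docx(["Hello World"])
# Usage: write it out with open("hello.docx", "wb").write(data)
with zipfile.ZipFile(io.BytesIO(data)) as check:
    parts = check.namelist()
    document_xml = check.read("word/document.xml").decode("utf-8")
```

Word and LibreOffice should accept a package this small, but anything beyond plain paragraphs (styles, numbering, images) needs further parts and relationships.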

Editing Existing Documents

Workflow

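Unpack the DOCX file (it is a ZIP archive)
Modify the XML content directly (the main text lives in word/document.xml)
Repack the directory as a .docx, zipping from inside it so that [Content_Types].xml sits at the archive root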
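You can also run the unpack, modify, repack loop in memory with Python's zipfile module. In this sketch, rewrite_part and the edit callback are hypothetical names, and the demo archive holds placeholder XML rather than a complete document. A plain string replacement like the one in the demo only works when the target text sits inside a single w:t element. Word often splits a phrase across several runs, so inspect the XML first.

```python
import io
import zipfile


def rewrite_part(src, dst, part, edit):
    """Copy every entry of archive `src` to `dst`, passing `part` through `edit`.

    `edit` is a caller-supplied str -> str function (hypothetical hook).
    Entry order is preserved, so [Content_Types].xml keeps its position.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == part:
                data = edit(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info.filename, data)


# Demo with placeholder parts (not a complete, valid document)
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:t>old text</w:t>")

dst = io.BytesIO()
rewrite_part(src, dst, "word/document.xml", lambda xml: xml.replace("old text", "new text"))

with zipfile.ZipFile(dst) as zf:
    names = zf.namelist()
    new_xml = zf.read("word/document.xml").decode("utf-8")
```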

Python Approach

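from docx import Document  # python-docx package (pip install python-docx)

doc = Document('input.docx')
for para in doc.paragraphs:
    if 'old text' not in para.text:
        continue
    runs = [r for r in para.runs if 'old text' in r.text]
    for run in runs:  # run-level replace keeps bold, italics, etc.
        run.text = run.text.replace('old text', 'new text')
    if not runs:  # match spans several runs: this collapses them into one run
        para.text = para.text.replace('old text', 'new text')
doc.save('output.docx')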
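In python-docx, assigning para.text replaces the paragraph's content with a single run, which drops character formatting such as bold or italics. One run-aware alternative writes the replacement into the run where the match starts and trims the matched characters from the runs that follow. The sketch below works on plain lists of run texts, so it runs without python-docx. The function name, the choice to let the new text inherit the first matched run's formatting, and replacing only the first occurrence are all assumptions.

```python
def replace_across_runs(run_texts, old, new):
    """Replace the first `old` in the joined run texts, keeping the run count.

    The replacement goes into the run where the match begins; matched
    characters are removed from later runs, so every run keeps its formatting.
    """
    full = "".join(run_texts)
    start = full.find(old)
    if not old or start == -1:
        return list(run_texts)
    end = start + len(old)
    result, pos = [], 0
    for text in run_texts:
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            result.append(text)  # run does not overlap the match
            continue
        before = text[: start - run_start] if start > run_start else ""
        after = text[end - run_start :] if end < run_end else ""
        if run_start <= start < run_end:
            result.append(before + new + after)  # match starts in this run
        else:
            result.append(after)  # later run: drop the matched prefix
    return result


runs = ["Hello ", "Wo", "rld!"]
updated = replace_across_runs(runs, "World", "Earth")

# With python-docx (assumed installed), map the result back onto the runs:
#   texts = replace_across_runs([r.text for r in para.runs], "old text", "new text")
#   for run, t in zip(para.runs, texts):
#       run.text = t
```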

Redlining Workflow (Tracked Changes)

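Convert the document to Markdown first to review its content (pandoc can do this; its --track-changes=all option keeps existing tracked changes visible)
Group the required changes into logical batches of 3-10 changes each
Unpack the document
Implement each change with precise XML edits
Mark only the text that actually changes, not the surrounding unchanged words
Verify every change comprehensively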
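In WordprocessingML, a tracked deletion wraps runs in w:del and stores their text in w:delText. A tracked insertion wraps ordinary runs in w:ins. Both elements carry a w:id and a w:author, and usually a w:date. The sketch below turns one run into that markup. It trims the prefix and suffix that the old and new text share, so only the characters that actually change get marked. redline_run, its default author and date, and the first_id parameter are placeholders. In a real document the ids should not clash with existing ones, and rpr should be the original run's &lt;w:rPr&gt; markup so that formatting carries over.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _run(text, rpr, deleted=False):
    """Build one w:r; deleted text must live in w:delText rather than w:t."""
    tag = "w:delText" if deleted else "w:t"
    return f'<w:r>{rpr}<{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def redline_run(text, old, new, rpr="", author="Reviewer",
                date="2024-01-01T00:00:00Z", first_id=1):
    """Return tracked-change runs that replace `old` with `new` inside `text`."""
    idx = text.find(old)
    if idx == -1:
        raise ValueError("old text not found in run")
    # Shrink the marked span: drop the prefix and suffix that old and new share
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    before = text[:idx] + old[:p]
    after = old[len(old) - s:] + text[idx + len(old):]
    old_mid, new_mid = old[p:len(old) - s], new[p:len(new) - s]

    attrs = f'w:author={quoteattr(author)} w:date="{date}"'
    parts, next_id = [], first_id
    if before:
        parts.append(_run(before, rpr))
    if old_mid:
        parts.append(f'<w:del w:id="{next_id}" {attrs}>{_run(old_mid, rpr, True)}</w:del>')
        next_id += 1
    if new_mid:
        parts.append(f'<w:ins w:id="{next_id}" {attrs}>{_run(new_mid, rpr)}</w:ins>')
    if after:
        parts.append(_run(after, rpr))
    return "".join(parts)


# Usage: the fragment would replace the original <w:r> in word/document.xml
fragment = redline_run("The quick brown fox", "quick brown", "slow brown")
# Parse it inside a w:p wrapper to confirm the fragment is well-formed XML
para = ET.fromstring(f'<w:p xmlns:w="{W_NS}">{fragment}</w:p>')
```

Here only "quick" is marked as deleted and only "slow" as inserted, because " brown" is common to both versions.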

Document Conversion

DOCX to PDF

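libreoffice --headless --convert-to pdf document.docx

This writes document.pdf to the current directory; add --outdir DIR to write it elsewhere. On some systems the executable is named soffice.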

PDF to Images

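pdftoppm -jpeg -r 150 document.pdf output

The -r 150 option renders at 150 DPI. The last argument, output, is a filename prefix: pdftoppm writes one JPEG per page, named with that prefix plus the page number.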
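You can chain both conversion steps from a script. This sketch makes two assumptions. First, the executables are on PATH as libreoffice and pdftoppm (some LibreOffice installs provide only soffice). Second, LibreOffice names the PDF after the input file inside the --outdir directory. conversion_commands and convert_to_images are hypothetical helpers. Building the command lists separately lets you inspect them before anything runs.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(docx, outdir=".", dpi=150, prefix="page"):
    """Build (but do not run) the DOCX->PDF and PDF->JPEG command lines."""
    pdf = Path(outdir) / (Path(docx).stem + ".pdf")  # LibreOffice reuses the input stem
    return [
        ["libreoffice", "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(docx)],
        ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf), str(Path(outdir) / prefix)],
    ]


def convert_to_images(docx, outdir=".", dpi=150):
    """Run both steps, checking up front that each tool is installed."""
    commands = conversion_commands(docx, outdir, dpi)
    missing = [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]
    if missing:
        raise FileNotFoundError(f"not on PATH: {', '.join(missing)}")
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


cmds = conversion_commands("report.docx", outdir="out")
```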

Key Principles

Read any referenced documentation files in full, without line-range limits
Keep edits minimal and precise when working with tracked changes
Preserve original formatting when possible
