Agent Skills: PPTX Slide Inspector

Category: Uncategorized. ID: edwinhu/workflows/pptx-render

Install this agent skill locally:

pnpm dlx add-skill https://github.com/edwinhu/workflows/tree/HEAD/skills/pptx-render

skills/pptx-render/SKILL.md

Skill Metadata

Name
pptx-render

Announce: "I'm using pptx-render to extract PPTX slide content."

PPTX Slide Inspector

Extracts content from PPTX slides using python-pptx. Primary use case: understanding what a PPTX slide contains (shapes, text, positions, images) for comparison against Typst slides, especially diagrams and visual items (VIS-* in content inventories).

Prerequisites

| Tool | Source |
|------|--------|
| python-pptx | pixi project dependency |

Step 1: Identify the PPTX File and Slide Number

If the user references a content inventory item (e.g., VIS-3, DQ-7), look up its PPTX slide number:

# -w matches whole words only, so VIS-3 does not also match VIS-30
grep -w -e "VIS-3" -e "the-item-id" inventory/content-inventory-XX.md
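The same lookup can be sketched in Python when the result needs to feed a later step. This is a minimal sketch, not part of the skill: `find_item_lines` is a hypothetical helper, and the sample rows are made up for illustration, since the real inventory format is not shown here. It only assumes that the item ID appears somewhere on the relevant line.

```python
import re


def find_item_lines(inventory_text, item_id):
    """Return (line_number, line) pairs that mention item_id as a whole token."""
    # Reject matches that continue with a word character or a hyphen,
    # so "VIS-3" does not also match "VIS-30" or "VIS-3-alt".
    pattern = re.compile(r"(?<![\w-])" + re.escape(item_id) + r"(?![\w-])")
    return [
        (number, line)
        for number, line in enumerate(inventory_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical inventory rows, invented for this example only.
sample = "VIS-3 | slide 12 | diagram\nVIS-30 | slide 41 | chart\n"
for number, line in find_item_lines(sample, "VIS-3"):
    print(number, line)
```

Reading the slide number out of the matched line is left to the caller, because that depends on how the inventory file is laid out.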

Step 2: Extract Slide Shapes

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
import json

SLIDE_NUM = 1  # 1-based slide number; set to the slide you want
EMU_PER_INCH = 914400  # python-pptx reports lengths in EMU

prs = Presentation('path/to/slides.pptx')
slide = prs.slides[SLIDE_NUM - 1]  # prs.slides is 0-indexed

for shape in slide.shapes:
    # Position and size, converted from EMU to inches
    info = {
        'name': shape.name,
        'left_in': round(shape.left / EMU_PER_INCH, 2),
        'top_in': round(shape.top / EMU_PER_INCH, 2),
        'width_in': round(shape.width / EMU_PER_INCH, 2),
        'height_in': round(shape.height / EMU_PER_INCH, 2),
    }
    if shape.has_text_frame:
        info['text'] = shape.text_frame.text
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        info['is_image'] = True
    if shape.has_table:
        info['is_table'] = True
        info['rows'] = len(shape.table.rows)
        info['cols'] = len(shape.table.columns)
    print(json.dumps(info))  # one JSON object per shape
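The 914400 divisor in the script above is the number of EMU (English Metric Units) per inch. The conversion can be pulled into a small helper, sketched below. `emu_to_inches` and `bounding_box_inches` are hypothetical names, and treating a missing position as `None` is an assumption about shapes that carry no explicit offset or size.

```python
EMU_PER_INCH = 914400  # same constant as the divisor used in Step 2


def emu_to_inches(emu, ndigits=2):
    """Convert an EMU length to inches; pass None through unchanged."""
    if emu is None:  # assumption: a shape may lack an explicit position or size
        return None
    return round(emu / EMU_PER_INCH, ndigits)


def bounding_box_inches(left, top, width, height):
    """Build the position/size part of the per-shape dict from EMU values."""
    return {
        "left_in": emu_to_inches(left),
        "top_in": emu_to_inches(top),
        "width_in": emu_to_inches(width),
        "height_in": emu_to_inches(height),
    }


print(bounding_box_inches(914400, 457200, 9144000, None))
```

With python-pptx this would be called as `bounding_box_inches(shape.left, shape.top, shape.width, shape.height)`.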

Step 3: Extract Images (if needed)

To save embedded images from a slide:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

SLIDE_NUM = 1  # 1-based slide number; set to the slide you want
EMU_PER_INCH = 914400  # python-pptx reports lengths in EMU

prs = Presentation('path/to/slides.pptx')
slide = prs.slides[SLIDE_NUM - 1]  # prs.slides is 0-indexed

for i, shape in enumerate(slide.shapes):  # i is the shape's index on the slide
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        image = shape.image
        ext = image.ext  # canonical extension, e.g. 'png' or 'jpg'
        with open(f'/tmp/pptx-slide-{SLIDE_NUM}-img-{i}.{ext}', 'wb') as f:
            f.write(image.blob)
        print(f'Saved image {i}: {image.content_type} ({shape.width/EMU_PER_INCH:.1f}x{shape.height/EMU_PER_INCH:.1f} in)')

Step 4: Interpret the Layout

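Shape positions are reported in inches, measured from the slide's top-left corner (python-pptx stores lengths in EMU; 914,400 EMU = 1 inch):

  • left_in / top_in: position of the shape's top-left corner
  • Common slide sizes are 10" × 7.5" (4:3), 10" × 5.63" (16:9), and 13.33" × 7.5" (PowerPoint's default widescreen); check prs.slide_width and prs.slide_height to confirm
  • Shapes with is_image: true and generic names ("Picture 5") are usually clipart
  • Group shapes may contain sub-shapes (connectors, arrows); inspect .shapes on groups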
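Inspecting `.shapes` on groups can be done with a small recursive walk. The sketch below is duck-typed: it assumes that only group shapes expose a `.shapes` collection, and `walk_shapes` is a hypothetical helper name. The stand-in objects exist only so the example runs without a PPTX file.

```python
from types import SimpleNamespace


def walk_shapes(shapes, depth=0):
    """Yield (depth, shape) for every shape, descending into groups."""
    for shape in shapes:
        yield depth, shape
        # Assumption: only group shapes carry a .shapes collection of children
        children = getattr(shape, "shapes", None)
        if children is not None:
            yield from walk_shapes(children, depth + 1)


# Stand-in objects that mimic a slide with one group containing two children.
arrow = SimpleNamespace(name="Arrow 1")
box = SimpleNamespace(name="Box 1")
group = SimpleNamespace(name="Group 1", shapes=[arrow, box])
title = SimpleNamespace(name="Title 1")

for depth, shape in walk_shapes([title, group]):
    print("  " * depth + shape.name)
```

With python-pptx the call would be `walk_shapes(slide.shapes)`. Child positions may be relative to the group rather than to the slide, so verify them before comparing against slide-level coordinates.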

Classifying Slide Content

| Shape Pattern | Likely Content |
|--------------|----------------|
| Multiple text boxes + arrows/lines at specific positions | Substantive diagram: reproduce in Typst |
| Single large Picture shape filling the slide | Clipart/stock photo: skip or replace |
| Table shape | Data table: reproduce as Typst #table |
| Text boxes only, no connectors | Text slide: no diagram needed |
| Group shapes with AutoShapes inside | Flow diagram: extract sub-shapes |
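The table can be turned into a rough first-pass heuristic, sketched below. Everything here is an assumption made for illustration: the `kind` key (one of `text`, `picture`, `table`, `group`, `connector`) is not produced by the Step 2 script, the order in which the rules are checked is arbitrary, and the 0.8 area threshold for "filling the slide" is a guess. A manual look at the slide remains the final check.

```python
def classify_slide(shapes, slide_w_in, slide_h_in, fill_threshold=0.8):
    """Map a list of per-shape dicts to one of the table's content labels."""
    kinds = [s["kind"] for s in shapes]  # hypothetical 'kind' key
    pictures = [s for s in shapes if s["kind"] == "picture"]

    if "group" in kinds:
        return "flow diagram: extract sub-shapes"
    if kinds.count("text") >= 2 and kinds.count("connector") >= 1:
        return "substantive diagram: reproduce in Typst"
    if "table" in kinds:
        return "data table: reproduce as Typst #table"
    if len(pictures) == 1:
        # Compare the picture's area with the slide's area
        area = pictures[0]["width_in"] * pictures[0]["height_in"]
        if area >= fill_threshold * slide_w_in * slide_h_in:
            return "clipart/stock photo: skip or replace"
    if kinds and all(kind == "text" for kind in kinds):
        return "text slide: no diagram needed"
    return "unclassified: inspect manually"


photo = [{"kind": "picture", "width_in": 10.0, "height_in": 7.5}]
print(classify_slide(photo, 10.0, 7.5))
```

The slide dimensions are passed in explicitly so the coverage check works for any slide size rather than assuming one.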

Quick Reference

# One-liner to dump all shapes from slide N
uv run python3 -c "
from pptx import Presentation; import json
prs = Presentation('PPTX_PATH')
for s in prs.slides[N-1].shapes:
    d = {'name': s.name, 'text': s.text_frame.text if s.has_text_frame else None,
         'pos': f'{s.left/914400:.1f},{s.top/914400:.1f}',
         'size': f'{s.width/914400:.1f}x{s.height/914400:.1f}'}
    print(json.dumps(d))
"

Rendering slides to PDF/PNG

For actual rasterization (not content extraction), use the shared x2t wrapper — ONLYOFFICE x2t is stateless and parallel-safe, unlike soffice:

# pptx -> PDF (all slides, then split with pdftoppm if per-slide PNGs needed)
python3 ${CLAUDE_SKILL_DIR}/../../scripts/x2t_convert.py deck.pptx deck.pdf
# pptx -> PNG (first slide only)
python3 ${CLAUDE_SKILL_DIR}/../../scripts/x2t_convert.py deck.pptx slide1.png

Do NOT call soffice --headless directly — it silently fails on macOS (returns 0, no output) due to profile lock issues. The wrapper prefers x2t and only falls back to soffice where x2t is absent.
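Splitting the converted PDF into per-slide PNGs with pdftoppm can be scripted as in the sketch below. It assumes pdftoppm (from poppler) is on PATH; the helper names and the 150 dpi default are illustrative choices, not part of the x2t wrapper.

```python
import subprocess


def pdftoppm_command(pdf_path, out_prefix, dpi=150):
    """Build the pdftoppm argument list for one PNG per PDF page."""
    # -png selects PNG output, -r sets the resolution in dpi
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def split_pdf_to_pngs(pdf_path, out_prefix, dpi=150):
    """Run pdftoppm; raises CalledProcessError on a non-zero exit."""
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, dpi), check=True)


print(pdftoppm_command("deck.pdf", "slide"))
```

pdftoppm appends the page number to the prefix, so each slide ends up in its own file. Building the argument list separately from running it keeps the command easy to inspect or log before execution.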
