Slide Builder
Transform essay-to-speech output into complete, presentation-ready slide decks with multiple output format support (HTML, Remotion video, PowerPoint).
When to Use
Invoke when the user:
- Has essay-to-speech output and wants slides
- Says "create slides from this talk track"
- Needs to "build a presentation" from spoken content
- Wants to convert a talk track to video format
- Uses the /slide-builder command
- Asks for "presentation slides" from transformed essay content
Prerequisites
Input required: Output from the essay-to-speech skill containing:
- ### Original sections (verbatim essay text)
- ### Talk Track sections with semantic tags
- ### Images sections with ratings (USE/ADAPT/RECREATE/SKIP)
- ### Slide Ideas suggestions
Core Process
1. Parse Essay-to-Speech Output
Extract structured data from each section:
Section → {
  title: string,
  original: string,
  talkTrack: TaggedContent[],
  images: ImageAssessment[],
  slideIdeas: string[]
}
Semantic tags to identify:
- [HOOK] - Opening attention-grabber → Title/hook slide
- [KEY_POINT] - Core argument → Statement slide
- [EVIDENCE] - Data/proof → Data visualization slide
- [STORY] - Narrative → Story/quote slide
- [TRANSITION] - Bridge → Section divider or no slide
- [CALLBACK] - Reference → Recap element
- [LANDING] - Conclusion → Summary slide
- [CTA] - Call to action → Action slide
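The tag identification step could look roughly like the sketch below. It assumes tags appear inline as bracketed uppercase markers, as in the list above; the function name `split_tagged_content` and the regular expression are illustrative and not part of the skill itself.

```python
import re

# Tag names are taken from the list above; the pattern itself is an assumption.
TAGS = ("HOOK", "KEY_POINT", "EVIDENCE", "STORY",
        "TRANSITION", "CALLBACK", "LANDING", "CTA")
TAG_PATTERN = re.compile(r"\[(" + "|".join(TAGS) + r")\]")


def split_tagged_content(talk_track: str) -> list[tuple[str, str]]:
    """Split a talk track into (tag, text) pairs in order of appearance.

    Text that precedes the first tag is ignored in this sketch.
    """
    matches = list(TAG_PATTERN.finditer(talk_track))
    pairs = []
    for i, match in enumerate(matches):
        # Each tag owns the text up to the next tag (or the end of the input).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(talk_track)
        text = " ".join(talk_track[match.end():end].split())
        pairs.append((match.group(1), text))
    return pairs
```

Calling `split_tagged_content("[KEY_POINT] A claim. [EVIDENCE] A number.")` would yield one pair per tag, which is the shape the planning step needs.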
2. Plan Slide Deck
Map semantic tags to slides:
| Tag | Slide Type | Typical Visual |
|-----|------------|----------------|
| [HOOK] | Title/Opening | Bold statement, striking image |
| [KEY_POINT] | Statement | Single phrase, minimal graphic |
| [EVIDENCE] | Data | Chart, statistic callout, comparison |
| [STORY] | Story | Photo, quote attribution, timeline |
| [TRANSITION] | Divider (optional) | Section title, progress indicator |
| [CALLBACK] | Recap | Reference to earlier slide |
| [LANDING] | Summary | Key takeaways, visual recap |
| [CTA] | Action | Contact info, next steps, QR code |
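One way to apply the mapping in the table is a plain lookup, sketched below. The dictionary values mirror the Slide Type column; `plan_slides` and the `include_dividers` flag are hypothetical names chosen for illustration, and the input is assumed to be a list of (tag, text) pairs.

```python
# Slide types copied from the table above; everything else is illustrative.
SLIDE_TYPE_BY_TAG = {
    "HOOK": "Title/Opening",
    "KEY_POINT": "Statement",
    "EVIDENCE": "Data",
    "STORY": "Story",
    "TRANSITION": "Divider",
    "CALLBACK": "Recap",
    "LANDING": "Summary",
    "CTA": "Action",
}


def plan_slides(tagged, include_dividers=False):
    """Turn (tag, text) pairs into a simple slide plan.

    Unknown tags are skipped; TRANSITION slides are optional, as in the table.
    """
    plan = []
    for tag, text in tagged:
        slide_type = SLIDE_TYPE_BY_TAG.get(tag)
        if slide_type is None:
            continue  # not one of the known semantic tags
        if tag == "TRANSITION" and not include_dividers:
            continue  # dividers are optional
        plan.append({"tag": tag, "type": slide_type, "text": text})
    return plan
```

Keeping dividers behind a flag reflects the "optional" note in the table; a tighter deck simply leaves the flag off.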
Slide count heuristic:
- 1-2 slides per [KEY_POINT]
- 1 slide per [EVIDENCE] block
- Section dividers are optional (skip for tight decks)
- Target: 1 slide per 45-60 seconds of speaking
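As a rough illustration of the heuristic, the sketch below derives a slide-count range two ways: from the talk length (1 slide per 45-60 seconds) and from tag counts (1-2 per [KEY_POINT], 1 per [EVIDENCE]). Both function names are hypothetical, and other tag types are deliberately left out because the heuristic does not give counts for them.

```python
import math


def slide_range_from_minutes(target_minutes: float) -> tuple[int, int]:
    """Return (min, max) slides for a talk, at 1 slide per 45-60 seconds."""
    seconds = target_minutes * 60
    low = max(1, math.ceil(seconds / 60))    # slowest pace: one slide per 60 s
    high = max(low, math.floor(seconds / 45))  # fastest pace: one slide per 45 s
    return low, high


def slide_range_from_tags(key_points: int, evidence_blocks: int) -> tuple[int, int]:
    """Return (min, max) slides implied by the per-tag counts."""
    return key_points + evidence_blocks, 2 * key_points + evidence_blocks
```

For a 15-minute talk, `slide_range_from_minutes(15)` gives a range of 15 to 20 slides; comparing that with the tag-based range is a quick check on whether the deck is too dense or too sparse.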
3. Handle Images
Process image assessments from essay-to-speech:
| Rating | Action |
|--------|--------|
| USE | Include directly in slide |
| ADAPT | Note modifications needed (enlarge labels, crop, simplify) |
| RECREATE | Generate Nano Banana prompt for new visual |
| SKIP | Do not include |
For RECREATE images: Generate a Nano Banana prompt following these guidelines:
- 16:9 aspect ratio for slides
- Clear, simple compositions
- Large readable text/labels
- Brand colors if specified
Example RECREATE prompt:
"Clean horizontal bar chart comparing 5 items, minimal style,
white background, teal (#557373) bars, large bold labels,
no gridlines, presentation-ready, 16:9 aspect ratio"
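A prompt like the one above could be assembled from the guidelines with a small helper. This is only a string-building sketch: `build_recreate_prompt` is a hypothetical name, the phrase wording is an assumption, and nothing here calls Nano Banana.

```python
from typing import Optional


def build_recreate_prompt(subject: str, brand_color: Optional[str] = None) -> str:
    """Compose a RECREATE image prompt that follows the slide guidelines."""
    parts = [
        subject,
        "clear, simple composition",
        "large readable text and labels",
    ]
    if brand_color:
        # Brand colors are only included when specified.
        parts.append(f"brand color {brand_color}")
    parts.append("presentation-ready")
    parts.append("16:9 aspect ratio")
    return ", ".join(parts)
```

Passing the subject from the image assessment and the primary brand color from the frontmatter keeps every generated prompt consistent with the same four guidelines.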
4. Generate Output
Output Format: Talk Track v5
The primary output format for presentations.
Structure
---
# Talk Track v5 frontmatter
version: 5
title: "Presentation Title"
subtitle: "Optional Subtitle"
author: "Presenter Name"
date: "2025-01-15"
target_minutes: 15
audio_voice: "af_heart"
brand:
  primary: "#557373"
  background: "#F2EFEA"
  text: "#0D0D0D"
sections:
  - id: opening
    name: "Opening"
    color: "#557373"
  - id: problem
    name: "The Problem"
    color: "#6B8E6B"
  - id: solution
    name: "The Solution"
    color: "#C4785A"
  - id: closing
    name: "Closing"
    color: "#557373"
---
## Slides
| # | Slug | Title | Image | Section |
|---|------|-------|-------|---------|
| 1 | hook | The Question | hook.png | opening |
| 2 | problem-1 | What's Broken | problem-chart.png | problem |
| 3 | evidence | The Data | evidence.png | problem |
| 4 | solution | A New Approach | solution.png | solution |
| 5 | action | Your Next Step | cta.png | closing |
---
## [hook] The Question

<!-- AUDIO -->
[HOOK] Let me ask you something that might change how you think about this entire problem...
What if everything you believed was based on outdated assumptions?
<!-- /AUDIO -->
**Speaker Notes:**
- Pause after the question
- Make eye contact with audience
- Let the tension build
---
## [problem-1] What's Broken

<!-- AUDIO -->
[KEY_POINT] The current approach fails in three critical ways.
[EVIDENCE] First, efficiency drops by 40% when teams scale past 10 people. Second, communication overhead grows exponentially. Third, institutional knowledge gets siloed.
<!-- /AUDIO -->
**Speaker Notes:**
- Point to chart as you mention each stat
- Emphasize "exponentially"
---
Format Rules
- YAML Frontmatter - Metadata, timing, voice, sections
- Slide Index Table - Quick reference for all slides
- Individual Slides - Each with:
  - H2 header: ## [slug] Title
  - Image reference (if applicable)
  - <!-- AUDIO --> block with talk track
  - **Speaker Notes:** for presenter context
Audio Block Format
Content between <!-- AUDIO --> and <!-- /AUDIO --> is:
- Read aloud by TTS engines
- Preserves semantic tags for timing hints
- Excludes speaker notes and visual descriptions
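To illustrate how the audio blocks might be pulled out of a Talk Track v5 file, here is a minimal sketch keyed by slide slug. It assumes slide headers follow the `## [slug] Title` form and that audio markers are written exactly as shown; `extract_audio_by_slug` is a hypothetical helper, not part of the skill.

```python
import re

# Matches slide headers such as "## [hook] The Question"; "## Slides" is ignored.
SLIDE_HEADER = re.compile(r"^## \[([^\]]+)\] (.+)$", re.MULTILINE)
AUDIO_BLOCK = re.compile(r"<!-- AUDIO -->(.*?)<!-- /AUDIO -->", re.DOTALL)


def extract_audio_by_slug(markdown: str) -> dict[str, str]:
    """Map each slide slug to the text inside its AUDIO block(s)."""
    headers = list(SLIDE_HEADER.finditer(markdown))
    audio = {}
    for i, header in enumerate(headers):
        # A slide's body runs until the next slide header or the end of file.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(markdown)
        body = markdown[header.end():end]
        blocks = AUDIO_BLOCK.findall(body)
        audio[header.group(1)] = "\n".join(block.strip() for block in blocks)
    return audio
```

Because only text between the markers is captured, speaker notes and image references never reach the TTS step.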
Timing Calculation
Estimate duration based on word count:
- Speaking rate: 130-150 words/minute
- Add 2-3 seconds per slide transition
- Add pause time for [PAUSE] markers
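The estimate can be sketched as a small function. The defaults use the midpoints of the ranges above (140 words/minute, 2.5 seconds per transition); the 1-second value per [PAUSE] marker and the choice not to count bracketed tags as spoken words are assumptions, since neither is specified here.

```python
import re


def estimate_slide_seconds(audio_text: str,
                           words_per_minute: float = 140,
                           transition_seconds: float = 2.5,
                           pause_seconds: float = 1.0) -> float:
    """Estimate how long one slide's audio block takes, in seconds."""
    pauses = audio_text.count("[PAUSE]")
    # Assumption: bracketed tags like [HOOK] or [PAUSE] are not read aloud.
    spoken = re.sub(r"\[[A-Z_]+\]", " ", audio_text)
    words = len(spoken.split())
    speaking = words / words_per_minute * 60
    return speaking + transition_seconds + pauses * pause_seconds
```

Summing this over every slide and comparing the total against `target_minutes` from the frontmatter gives an early signal that a talk track needs trimming.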
Voice Options
Development: Kokoro TTS (via claude-speak)
- Local, free, fast iteration
- Invoke: /claude-speak skill with audio block text
- Voice: af_heart (default) or specify in frontmatter
- Output: Local audio file per slide or full presentation
Production: ElevenLabs v3
- Word-level timestamps for precise video sync
- Higher quality for final distribution
- Requires ElevenLabs API key
- Output: Audio + JSON timing data
See references/voice-options.md for full configuration.
Alternative Output Formats
HTML Slide Engine
Static HTML presentation with:
- Keyboard navigation (arrows, space)
- Speaker notes toggle (N key)
- Timer display
- Print to PDF support
See references/html-engine.md for template.
Remotion Video
Export to React-based video for:
- YouTube/social publishing
- Embedded animations
- Precise audio sync with timestamps
See references/remotion-video.md for project setup.
Workflow
Standard Flow
essay-to-speech output
          ↓
   [slide-builder]
          ↓
 Talk Track v5 (.md)
          ↓
    ┌─────┼─────┐
    ↓     ↓     ↓
  HTML  Video  Audio
Quick Start
- Input: Provide essay-to-speech output
- Review plan: Claude proposes slide structure
- Confirm or adjust: Modify slide count, sections, visuals
- Generate: Claude outputs Talk Track v5 markdown
- Images: Generate RECREATE images via Nano Banana
- Audio: Generate voice via claude-speak or ElevenLabs
- Render: Export to HTML, video, or both
Best Practices
Slide Design Principles
- One idea per slide - Split dense content
- 6 words or fewer on screen - The rest is spoken
- High contrast - Readable from back row
- Consistent visual language - Same fonts, colors, style
- Images > bullet points - Visual storytelling wins
Talk Track Integration
- Audio is king - Slides support speech; they do not replace it
- Match pacing - Visual changes align with spoken transitions
- Build reveals - Don't show everything at once
- Breathing room - Not every sentence needs a slide change
Image Guidance
For RECREATE images, always specify:
- Aspect ratio (16:9 for slides)
- Style (clean, minimal, professional)
- Key data to visualize
- What to AVOID (clutter, small text, decorative elements)
What This Skill Does NOT Do
- Edit or create original essay content (that's essay-to-speech)
- Design custom graphics (use Nano Banana for that)
- Record actual audio (use claude-speak or ElevenLabs)
- Render final video (use Remotion or video editor)
- Create PowerPoint/Keynote files directly (exports markdown)
Integration
Upstream:
- essay-to-speech - Provides structured input
Downstream:
- nano-banana - Generates RECREATE images
- claude-speak - Generates audio narration
- veo3-prompter - Creates video segments (if needed)
References
- references/talk-track-v5.md - Complete format specification
- references/html-engine.md - Static HTML slide player
- references/remotion-video.md - React video export setup
- references/voice-options.md - TTS configuration and comparison
- references/image-handling.md - Full image processing workflow
- references/examples.md - Complete input→output examples