Agent Skills: AI Talking Head

Specialized skill for AI talking head and lip-sync video generation. Use when you need presenter videos, UGC-style content, or lip-synced avatars. Triggers on: talking head, presenter video, lip sync, UGC video. Outputs professional talking head videos.

Uncategorized

ID: GroundMountCompany/groundmounts-app/ai-talking-head

Install this agent skill locally:

pnpm dlx add-skill https://github.com/GroundMountCompany/groundmounts-app/tree/HEAD/Vibe-Creative-Pack-Claude-Code-v/ai-talking-head

Vibe-Creative-Pack-Claude-Code-v/ai-talking-head/SKILL.md

Skill Metadata

Name
ai-talking-head

AI Talking Head

Generate talking head videos, presenter content, and lip-synced videos.

Use this skill when: You need a person (real or AI) talking to camera.

Route here from: ai-creative-workflow, ai-creative-strategist, or direct requests.


Why This Skill Exists

The problem: Talking head videos are among the most persuasive content formats, but:

  1. Recording yourself is time-consuming and requires confidence
  2. Professional presenters are expensive ($500-5000+ per video)
  3. UGC creators charge $100-500 per post and may not match your brand
  4. Iterating on scripts means re-filming everything
  5. Scaling personalized video is nearly impossible manually

The solution: AI talking heads that:

  • Generate professional presenter videos in minutes
  • Let you iterate on scripts without re-recording
  • Create unlimited variants for A/B testing
  • Maintain consistent brand presenter identity
  • Scale personalized outreach cost-effectively

The game-changer: Combining avatar generation + lip-sync lets you:

  • Create a consistent "brand spokesperson"
  • Update any script without re-filming
  • Test multiple presenter styles quickly
  • Produce video content at 10x the speed

Presenter Style Exploration (Before Generation)

Critical insight from ai-creative-strategist: Don't generate with one style and hope it works. Explore genuinely DIFFERENT presenter styles first.

The Style Exploration Process

STEP 1: GENERATE 4-5 DIFFERENT PRESENTER STYLES

This is NOT: Same person with different clothes

This IS: Fundamentally different presenter archetypes that each tell a different story

[YOUR BRAND] - Style Exploration

Generate presenter concepts for these 5 directions:

1. CORPORATE AUTHORITY
   - Demographic: 35-50, professional appearance
   - Setting: Modern office, corporate environment
   - Wardrobe: Business professional, suit/blazer
   - Energy: Confident, measured, authoritative
   - Vibe: "Trust the expert"

2. RELATABLE FRIEND
   - Demographic: 25-40, approachable look
   - Setting: Home office, kitchen, casual space
   - Wardrobe: Smart casual, comfortable
   - Energy: Warm, conversational, genuine
   - Vibe: "Let me share what worked for me"

3. ENERGETIC CREATOR
   - Demographic: 22-35, creator aesthetic
   - Setting: Ring light setup, content studio
   - Wardrobe: Trendy casual, branded
   - Energy: High, dynamic, enthusiastic
   - Vibe: "You HAVE to try this"

4. EXPERT EDUCATOR
   - Demographic: 30-55, credible appearance
   - Setting: Study, library, professional backdrop
   - Wardrobe: Smart casual, glasses optional
   - Energy: Calm, explanatory, helpful
   - Vibe: "Let me explain how this works"

5. LIFESTYLE ASPIRATIONAL
   - Demographic: 28-45, aspirational look
   - Setting: Beautiful home, travel location, luxury
   - Wardrobe: Elevated casual, tasteful
   - Energy: Relaxed confidence, success aura
   - Vibe: "This is what my life looks like"

STEP 2: IDENTIFY WINNER

After generating style exploration:

REVIEW each presenter style:

Which presenter:
- Best matches brand voice?
- Would audience trust most?
- Fits the content type?
- Has right energy level?
- Would work across multiple videos?

WINNER: [Selected style]
BECAUSE: [Why this style wins for this brand/use case]

STEP 3: EXTRACT PRESENTER PRINCIPLES

Once winner identified:

WINNING STYLE EXTRACTION

Demographics:
- Age range: [X-X]
- Gender: [if specific]
- Ethnicity: [if specific]
- Overall look: [descriptors]

Environment:
- Primary setting: [where they present from]
- Background elements: [what's visible]
- Lighting style: [natural/studio/mixed]

Wardrobe:
- Style: [formal/casual/etc.]
- Colors: [palette]
- Accessories: [if any]

Delivery:
- Energy level: [1-10]
- Speaking pace: [slow/medium/fast]
- Hand gestures: [minimal/moderate/expressive]
- Eye contact: [direct to camera always]

Audio:
- Voice tone: [warm/authoritative/energetic]
- Pacing: [conversational/punchy/measured]
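One way to keep the extracted principles reusable is to store them as a small structured record. The sketch below is only an illustration: the `PresenterSpec` class, its field names, and the `to_prompt_fragment` helper are hypothetical and simply mirror the template above; they are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresenterSpec:
    """Hypothetical record mirroring the extraction template fields."""

    age_range: str
    overall_look: str
    primary_setting: str
    lighting_style: str   # natural / studio / mixed
    wardrobe_style: str
    energy_level: int     # 1-10, as in the template
    speaking_pace: str    # slow / medium / fast
    hand_gestures: str    # minimal / moderate / expressive
    voice_tone: str       # warm / authoritative / energetic

    def __post_init__(self) -> None:
        # Reject values outside the template's 1-10 energy scale.
        if not 1 <= self.energy_level <= 10:
            raise ValueError("energy_level must be between 1 and 10")

    def to_prompt_fragment(self) -> str:
        """Join the visual fields into a reusable prompt fragment."""
        return ", ".join([
            f"{self.overall_look} presenter aged {self.age_range}",
            f"wearing {self.wardrobe_style}",
            f"in {self.primary_setting}",
            f"{self.lighting_style} lighting",
            f"{self.hand_gestures} hand gestures",
            "direct eye contact with camera",
        ])


# Example values are illustrative, loosely based on the Relatable Friend archetype.
relatable_friend = PresenterSpec(
    age_range="25-40",
    overall_look="approachable",
    primary_setting="a bright home office",
    lighting_style="natural",
    wardrobe_style="smart casual",
    energy_level=6,
    speaking_pace="medium",
    hand_gestures="moderate",
    voice_tone="warm",
)
print(relatable_friend.to_prompt_fragment())
```

Reusing the same fragment at the start of every prompt is one simple way to keep the presenter description identical from video to video.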

STEP 4: APPLY ACROSS CONTENT

Apply the extracted principles to every future video:

  • Keep the presenter consistent across all videos
  • Same presenter = brand recognition
  • Vary the script, not the presenter

Presenter Archetype Deep Dives

Corporate Authority

When to use: B2B, financial services, healthcare, enterprise SaaS, professional services

Visual Formula:

[Man/Woman] in [30s-50s], [silver/dark hair], wearing [tailored blazer/suit],
in [modern glass office/conference room with city view], [warm professional lighting],
[confident composed expression], [seated at desk OR standing with slight lean],
[direct eye contact with camera], [subtle hand gestures], corporate executive style

Setting Options:

  • Corner office with city view
  • Modern conference room
  • Executive desk with minimal decor
  • Standing at presentation screen
  • Seated in designer chair

Wardrobe Options:

  • Tailored navy blazer over white shirt
  • Grey suit, no tie (modern)
  • Classic suit with subtle tie
  • Blazer over turtleneck (thought leader)
  • Professional dress (solid colors)

Energy Markers:

  • Measured pace
  • Deliberate movements
  • Confident pauses
  • Minimal but purposeful gestures
  • Assured vocal tone

Relatable Friend (UGC Style)

When to use: DTC brands, consumer products, wellness, beauty, lifestyle

Visual Formula:

[Friendly man/woman] in [25-40s], wearing [casual but put-together outfit],
in [bright modern apartment/kitchen/home office], [natural window light],
[genuine warm smile], [relaxed comfortable posture], [talking to camera like
a friend], [natural hand movements], authentic UGC creator style

Setting Options:

  • Bright kitchen counter
  • Cozy living room couch
  • Home office with plants
  • Bedroom getting-ready setup
  • Outdoor patio/balcony

Wardrobe Options:

  • Cozy sweater/cardigan
  • Simple t-shirt
  • Casual button-down
  • Loungewear (if brand appropriate)
  • Athleisure

Energy Markers:

  • Conversational rhythm
  • Natural pauses ("honestly?", "okay so...")
  • Expressive facial reactions
  • Genuine enthusiasm without over-selling
  • Relatable body language

UGC Script Patterns:

DISCOVERY: "Okay so I found this [product] and I'm obsessed..."
REVIEW: "So I've been using [product] for [time] and here's my honest take..."
COMPARISON: "I used to use [old product] but then I tried [new product]..."
TRANSFORMATION: "Before [product] I was [problem]. Now? [result]."

Energetic Creator

When to use: Gen-Z products, entertainment, gaming, trendy DTC, social apps

Visual Formula:

[Young energetic creator] in [22-35], [colorful trendy outfit], in [content
studio with ring light/neon lights], [bright dynamic lighting], [animated
expressions], [lots of movement and gestures], [high energy delivery],
[fast-paced enthusiastic style], YouTube/TikTok creator aesthetic

Setting Options:

  • Ring light setup visible
  • LED/neon accent lighting
  • Streaming/gaming setup
  • Colorful backdrop
  • Outdoor action setting

Wardrobe Options:

  • Graphic tees
  • Bold colors
  • Branded merch
  • Trendy streetwear
  • Statement accessories

Energy Markers:

  • Fast-paced delivery
  • Big expressions
  • Lots of hand movement
  • Pattern interrupts
  • Enthusiasm at 10

Creator Script Patterns:

HOOK: "STOP scrolling. This is important."
REVEAL: "I literally just discovered [thing] and I'm freaking out."
CHALLENGE: "I bet you can't guess what [product] does."
REACTION: "[reaction to trying product]... WAIT what?!"

Expert Educator

When to use: Online courses, professional services, B2B explainers, tutorials

Visual Formula:

[Knowledgeable expert] in [30s-55], [smart casual or academic style],
in [home study/office with books/whiteboard], [balanced lighting],
[thoughtful composed expression], [explaining with purposeful gestures],
[patient instructive tone], educator/thought leader style

Setting Options:

  • Study with bookshelves
  • Office with credentials visible
  • Whiteboard/screen behind
  • Standing at presentation
  • Desk with relevant props

Wardrobe Options:

  • Button-down shirt
  • Blazer over casual shirt
  • Sweater over collared shirt
  • Glasses (authority signal)
  • Minimal accessories

Energy Markers:

  • Patient pace
  • Teaching rhythm
  • Logical structure
  • Illustrative gestures
  • "Here's what matters" moments

Lifestyle Aspirational

When to use: Luxury brands, high-ticket services, aspirational DTC, travel, real estate

Visual Formula:

[Elegant successful person] in [30s-50s], [elevated casual attire],
in [beautiful interior/scenic location], [golden hour OR designer lighting],
[relaxed confident demeanor], [speaking with quiet confidence], [minimal
but graceful movement], aspirational lifestyle aesthetic

Setting Options:

  • Designer living room
  • Travel location (balcony view)
  • Luxury car interior
  • High-end restaurant/hotel
  • Yacht/beach/resort

Wardrobe Options:

  • Designer casual
  • Linen/natural fabrics
  • Neutral luxury palette
  • Subtle jewelry/watch
  • Effortlessly elegant

Energy Markers:

  • Relaxed confidence
  • No rushing
  • "I have time" energy
  • Subtle smile
  • Quiet success vibes

Video Model Roster (Quality Winners)

Generate presenter videos with ALL THREE models and present the outputs for selection:

| Model | Owner | Speed | Strengths |
|-------|-------|-------|-----------|
| Sora 2 | openai | ~80s | Excellent general quality, good faces |
| Veo 3.1 | google | ~130s | Native audio generation, natural movement |
| Kling v2.5 Turbo Pro | kwaivgi | ~155s | Best for people/motion, most realistic |

Strategy: Run same prompt through all 3 models → User picks best output.

Model Selection Guide

FOR MAXIMUM REALISM (people quality):
    → Kling v2.5 Turbo Pro (best faces, most natural movement)

FOR SPEED + QUALITY BALANCE:
    → Sora 2 (fastest, still good quality)

FOR BUILT-IN AUDIO:
    → Veo 3.1 (generates audio with video)

FOR UGC AUTHENTICITY:
    → Kling v2.5 (handles casual movements well)

FOR CORPORATE/FORMAL:
    → Sora 2 or Kling v2.5 (cleaner, more controlled)
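The guide above can be expressed as a small lookup table. This is a minimal sketch: the priority keys (`"realism"`, `"speed"`, and so on) and the `recommend_models` function are hypothetical names introduced for illustration, and the model identifiers assume the same names used in this skill's request examples.

```python
# Hypothetical priority keys mapped to the models the guide recommends.
MODEL_BY_PRIORITY = {
    "realism": ["kling-v2.5-turbo-pro"],
    "speed": ["sora-2"],
    "built_in_audio": ["veo-3.1"],
    "ugc": ["kling-v2.5-turbo-pro"],
    "corporate": ["sora-2", "kling-v2.5-turbo-pro"],
}


def recommend_models(priority: str) -> list[str]:
    """Return the recommended model names for a given priority."""
    if priority not in MODEL_BY_PRIORITY:
        known = ", ".join(sorted(MODEL_BY_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the shared table.
    return list(MODEL_BY_PRIORITY[priority])


print(recommend_models("corporate"))
```

A lookup like this is only useful when a single model must be chosen up front; the default strategy in this skill remains running all three and letting the user pick.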

Lip-Sync Model

For adding speech to existing videos:

| Model | Use | Cost | Speed | Quality |
|-------|-----|------|-------|---------|
| Kling Lip-Sync | Add voiceover to any video | ~$0.20 | ~1min | Excellent |

When to use Lip-Sync:

  • You have a great presenter video but need different script
  • Client wants to change messaging after video generation
  • Creating personalized versions of same base video
  • Adding voiceover to product demo videos
  • Dubbing content for different languages

Use Cases Deep Dive

1. Lip-Sync Overlay

Best for: Adding voiceover to existing video, dubbing, personalization

Input Requirements:

  • Video with visible face (front-facing works best)
  • Audio file (MP3, WAV) OR text script

Workflow:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "Prefer": "wait",
  "input": {
    "video": "https://... (source video URL)",
    "audio": "https://... (audio file URL)"
  }
}

Or with text (uses built-in TTS):

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "https://... (source video URL)",
    "text": "Script text to speak"
  }
}
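The two request shapes above differ only in whether `audio` or `text` is supplied. A minimal sketch of a helper that builds either one is shown below; `build_lip_sync_request` is a hypothetical name, and the rule that exactly one of the two inputs must be given is an assumption drawn from the "audio file OR text script" requirement, not a documented constraint of the model.

```python
from typing import Optional


def build_lip_sync_request(
    video_url: str,
    audio_url: Optional[str] = None,
    text: Optional[str] = None,
) -> dict:
    """Build a lip-sync request with either an audio URL or a text script."""
    # Assumption: exactly one of audio_url / text should be provided.
    if (audio_url is None) == (text is None):
        raise ValueError("provide exactly one of audio_url or text")

    request_input = {"video": video_url}
    if audio_url is not None:
        request_input["audio"] = audio_url
    else:
        request_input["text"] = text

    return {
        "model_owner": "kwaivgi",
        "model_name": "kling-lip-sync",
        "input": request_input,
    }


# Placeholder URL for illustration only.
print(build_lip_sync_request("https://example.com/source.mp4", text="Script text to speak"))
```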

Quality Tips:

  • Source video should have face visible 70%+ of time
  • Forward-facing shots work better than profiles
  • Avoid videos with heavy face movement/turning
  • Audio should be clear without background noise
  • Script pacing should match natural speech

2. AI Presenter Generation

Best for: Creating presenter content from scratch, brand spokesperson

Multi-Model Workflow:

// Sora 2
{
  "model_owner": "openai",
  "model_name": "sora-2",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

// Veo 3.1 (with native audio)
{
  "model_owner": "google",
  "model_name": "veo-3.1",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }
}

// Kling v2.5
{
  "model_owner": "kwaivgi",
  "model_name": "kling-v2.5-turbo-pro",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}
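Because the same prompt goes to all three models, the three request bodies can be produced from one function. The sketch below assumes the field names shown in the examples above; `build_generation_requests` is a hypothetical helper, and how the requests are actually submitted is left out because it depends on the tooling in use.

```python
import json


def build_generation_requests(prompt: str, aspect_ratio: str = "16:9", duration: int = 5) -> list[dict]:
    """Return one request body per model, all sharing the same prompt."""
    return [
        {
            "model_owner": "openai",
            "model_name": "sora-2",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration},
        },
        {
            # The Veo example above requests native audio and sets no duration.
            "model_owner": "google",
            "model_name": "veo-3.1",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "generate_audio": True},
        },
        {
            "model_owner": "kwaivgi",
            "model_name": "kling-v2.5-turbo-pro",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration},
        },
    ]


requests_to_send = build_generation_requests("[presenter prompt]")
print(json.dumps(requests_to_send, indent=2))
```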

Then add lip-sync if specific script needed:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "[generated video URL]",
    "text": "[script text]"
  }
}

3. UGC-Style Content

Best for: Authentic testimonials, product reviews, social proof

The UGC Formula:

[Relatable person] + [Casual setting] + [Natural lighting] +
[Authentic delivery] + [Genuine reaction] = Believable UGC

Prompt Template:

Friendly [demographic] sitting in [casual setting], natural window light,
holding/showing [product], genuine excited expression, talking directly to
camera like filming a selfie video, authentic UGC testimonial style, casual
comfortable body language, 5 seconds

UGC Authenticity Markers:

  • Slightly imperfect framing
  • Natural lighting (not studio)
  • Casual wardrobe
  • Real reactions, not posed
  • Personal space as backdrop
  • Eye contact with camera

4. Personal Brand Series

Best for: Thought leaders, course creators, coaches, consultants

Consistency Formula:

ESTABLISH ONCE, USE FOREVER:
- Same presenter appearance
- Same setting/background
- Same wardrobe style
- Same energy level
- Same lighting setup

Only change: Script and specific content

Series Prompt Template:

[Consistent presenter description - use same each time], [same setting],
[same lighting], [same wardrobe style], [same energy], discussing [new topic],
[consistent delivery style], 5 seconds

Script Mastery

Duration Calculation

| Word Count | Duration | Use Case |
|------------|----------|----------|
| ~12 words | ~5 seconds | Social hook |
| ~25 words | ~10 seconds | Instagram Reel |
| ~38 words | ~15 seconds | TikTok optimal |
| ~50 words | ~20 seconds | Short testimonial |
| ~75 words | ~30 seconds | Product explainer |
| ~150 words | ~60 seconds | Full testimonial |

Rule: ~150 words per minute at natural conversational pace
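The rule can be applied in both directions: estimating how long a script will run, or how many words fit in a target duration. The sketch below takes the ~150 words per minute figure literally; the function names are hypothetical, and real delivery speed varies by presenter style, so treat the results as rough estimates.

```python
WORDS_PER_MINUTE = 150  # conversational pace, from the rule above


def estimate_duration_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration of a script from its word count."""
    word_count = len(script.split())
    return word_count * 60 / wpm


def max_words_for(seconds: int, wpm: int = WORDS_PER_MINUTE) -> int:
    """Return the largest whole word count that fits in the given duration."""
    return seconds * wpm // 60


script = "Okay so I found this product and honestly I am obsessed with it"
print(estimate_duration_seconds(script))  # 13 words
print(max_words_for(30))
```

At 150 words per minute, a 30-second slot holds 75 words and a 60-second slot holds 150.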

Script Structures

HOOK-VALUE-CTA (15-30 seconds):

Hook (0-3 sec): [Attention-grabber - question, statement, or pattern interrupt]
Value (3-20 sec): [Main message, benefit, or story]
CTA (20-30 sec): [Clear next step]

PROBLEM-AGITATE-SOLVE (30-60 seconds):

Problem (0-10 sec): [Name the pain point]
Agitate (10-30 sec): [Make them feel it]
Solve (30-60 sec): [Present the solution + CTA]

BEFORE-AFTER (15-30 seconds):

Before (0-10 sec): [Life before product/solution]
After (10-25 sec): [Transformation/result]
CTA (25-30 sec): [How to get same result]
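The segment timings above translate into word budgets per segment. The following sketch assumes the ~150 words per minute pace and uses the upper-bound timings of each structure (for example the 30-second version of HOOK-VALUE-CTA); the `STRUCTURES` table and `word_budgets` function are hypothetical names.

```python
# (segment name, start second, end second) for each structure above.
STRUCTURES = {
    "hook-value-cta": [("Hook", 0, 3), ("Value", 3, 20), ("CTA", 20, 30)],
    "problem-agitate-solve": [("Problem", 0, 10), ("Agitate", 10, 30), ("Solve", 30, 60)],
    "before-after": [("Before", 0, 10), ("After", 10, 25), ("CTA", 25, 30)],
}


def word_budgets(structure: str, wpm: int = 150) -> dict[str, int]:
    """Return the whole number of words that fit in each segment."""
    return {
        name: (end - start) * wpm // 60  # round down to stay within the time slot
        for name, start, end in STRUCTURES[structure]
    }


print(word_budgets("hook-value-cta"))  # {'Hook': 7, 'Value': 42, 'CTA': 25}
```

A 3-second hook leaves room for about 7 words, which is a useful constraint when drafting the opening line.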

Tone Templates

Professional/Corporate:

"[Name] here with [Company]. Today I want to share how [product/insight]
can help you [achieve outcome]. Here's what you need to know..."

Casual/UGC:

"Okay so I've been using [product] for [time] and honestly? I'm obsessed.
Here's why [specific benefit]. If you [problem], you need this."

Expert/Educational:

"One thing I see people get wrong about [topic] is [misconception].
Here's what actually works: [insight]. Let me show you..."

Energetic/Sales:

"Stop what you're doing. [Product] just changed everything. I'm serious -
[result] in [timeframe]. You HAVE to try this."

Aspirational:

"[Casual opening]. I wanted to share something that's completely transformed
[area of life]. [Product] gave me [result]. Here's how it works..."

Platform-Specific Optimization

TikTok/Reels (9:16)

Specs:

  • Aspect Ratio: 9:16 (vertical)
  • Duration: 15-30 seconds optimal
  • Safe Zone: Keep face and text within the center 60% of the frame

Style Adjustments:

→ Higher energy delivery
→ Faster pacing
→ Hook in first 1-2 seconds
→ Pattern interrupts
→ Jump cuts acceptable
→ Casual/authentic feel

Prompt Modifier:

...[base prompt], filmed vertically like TikTok/Reels content,
energetic creator style, direct eye contact with camera

YouTube (16:9)

Specs:

  • Aspect Ratio: 16:9 (landscape)
  • Duration: 30-120 seconds
  • Safe Zone: Standard letterbox

Style Adjustments:

→ More measured pacing
→ Can be longer form
→ More professional setups accepted
→ Room for B-roll integration
→ Intro/outro structure

Prompt Modifier:

...[base prompt], widescreen YouTube style, professional yet engaging,
room for graphics/lower thirds

LinkedIn (1:1 or 16:9)

Specs:

  • Aspect Ratio: 1:1 (square) or 16:9
  • Duration: 30-60 seconds optimal
  • Tone: Professional but personal

Style Adjustments:

→ Professional appearance
→ Business-appropriate setting
→ Thought leadership tone
→ Value-first messaging
→ Credibility signals

Prompt Modifier:

...[base prompt], professional LinkedIn style, credible expert appearance,
business casual in modern office environment

Instagram Stories (9:16)

Specs:

  • Aspect Ratio: 9:16
  • Duration: 15 seconds max per segment
  • Ephemeral feel

Style Adjustments:

→ Casual, in-the-moment feel
→ Can be "rougher" quality
→ Direct audience address
→ Personal/behind-scenes vibe
→ Clear single message per story

Ads (Various)

Facebook/Instagram Ads:

  • 1:1, 4:5, or 9:16
  • 15-30 second optimal
  • Hook in 0-3 seconds
  • Clear CTA

YouTube Ads:

  • 16:9
  • 15-30 second (skippable) or 6 second (bumper)
  • Brand visible throughout
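The platform specs above can be collected into one table and used to sanity-check a plan before generating. This is a sketch under stated assumptions: the platform keys and the `check_plan` function are hypothetical, the "optimal" duration ranges are treated as hard bounds purely for illustration, and YouTube Ads are omitted because they use two discrete lengths rather than a range.

```python
# Aspect ratios and duration ranges (seconds) taken from the specs above.
PLATFORM_SPECS = {
    "tiktok_reels": {"aspect_ratios": ["9:16"], "min_s": 15, "max_s": 30},
    "youtube": {"aspect_ratios": ["16:9"], "min_s": 30, "max_s": 120},
    "linkedin": {"aspect_ratios": ["1:1", "16:9"], "min_s": 30, "max_s": 60},
    "instagram_stories": {"aspect_ratios": ["9:16"], "min_s": 0, "max_s": 15},
    "facebook_instagram_ads": {"aspect_ratios": ["1:1", "4:5", "9:16"], "min_s": 15, "max_s": 30},
}


def check_plan(platform: str, aspect_ratio: str, duration_s: int) -> list[str]:
    """Return a list of mismatches between a plan and the platform spec."""
    spec = PLATFORM_SPECS[platform]
    problems = []
    if aspect_ratio not in spec["aspect_ratios"]:
        allowed = ", ".join(spec["aspect_ratios"])
        problems.append(f"aspect ratio {aspect_ratio} not listed for {platform} (expected: {allowed})")
    if not spec["min_s"] <= duration_s <= spec["max_s"]:
        problems.append(
            f"duration {duration_s}s outside {spec['min_s']}-{spec['max_s']}s for {platform}"
        )
    return problems


print(check_plan("tiktok_reels", "9:16", 20))  # no problems
print(check_plan("linkedin", "9:16", 45))      # aspect ratio mismatch
```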

Audio & Voice Considerations

When Using Veo 3.1 Native Audio

Strengths:

  • Generates synchronized audio with video
  • Natural ambient sounds
  • Speech that matches lip movement
  • Good for establishing scenes

Limitations:

  • Less control over specific script
  • Audio quality varies
  • May need post-processing

When Adding Lip-Sync

Best Practices:

  • Use high-quality audio recording
  • Match energy level to video presenter
  • Pace script to natural speaking rhythm
  • Allow for breath pauses
  • Keep sentences short (easier sync)

Voice-Over Tips

If recording your own VO for lip-sync:

□ Record in quiet environment
□ Use consistent distance from mic
□ Match energy to presenter style
□ Natural pauses between sentences
□ Clear enunciation
□ Export as MP3 or WAV

If using TTS (text input):

□ Use punctuation for natural pauses
□ Write phonetically for tricky words
□ Keep sentences conversational length
□ Test different phrasings
□ Consider adding "..." for pauses

Execution Workflow

Step 1: Clarify Requirements

Before generating:

□ What's the use case? (UGC, corporate, educational, etc.)
□ What platform? (TikTok, YouTube, LinkedIn, ads)
□ What aspect ratio? (9:16, 16:9, 1:1)
□ What duration? (and word count)
□ What presenter style? (see archetypes)
□ What's the script/message?
□ Need lip-sync to specific audio?

Step 2: Style Selection

If not predefined:

□ Generate style exploration with 4-5 different presenter styles
□ Present options to user
□ Extract principles from winner
□ Document for consistency

Step 3: Construct Prompt

Use this formula:

[PRESENTER DESCRIPTION] + [SETTING] + [LIGHTING] +
[EXPRESSION/ENERGY] + [ACTION] + [STYLE MODIFIER] + [DURATION]
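The formula is a straightforward concatenation, so it can be captured in a small helper that refuses to build a prompt when a component is missing. `build_presenter_prompt` is a hypothetical function written for illustration; the comma-separated output format follows the prompt templates used elsewhere in this skill.

```python
def build_presenter_prompt(
    presenter: str,
    setting: str,
    lighting: str,
    expression_energy: str,
    action: str,
    style_modifier: str,
    duration_s: int = 5,
) -> str:
    """Assemble a presenter prompt from the seven formula components."""
    components = {
        "presenter": presenter,
        "setting": setting,
        "lighting": lighting,
        "expression_energy": expression_energy,
        "action": action,
        "style_modifier": style_modifier,
    }
    # Vague or missing components tend to produce generic results, so fail early.
    missing = [name for name, value in components.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing prompt components: {', '.join(missing)}")

    parts = [value.strip() for value in components.values()]
    parts.append(f"{duration_s} seconds")
    return ", ".join(parts)


print(build_presenter_prompt(
    presenter="Friendly woman in her 30s wearing a cozy sweater",
    setting="in a bright kitchen",
    lighting="natural window light",
    expression_energy="genuine warm smile",
    action="talking directly to camera like filming a selfie video",
    style_modifier="authentic UGC testimonial style",
))
```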

Step 4: Multi-Model Generation

Run same prompt through:
1. Sora 2 (~80s)
2. Veo 3.1 (~130s)
3. Kling v2.5 (~155s)

Present all three to user for selection.

Step 5: Add Lip-Sync (If Needed)

If specific script delivery required:

1. User approves video from Step 4
2. Run through Kling Lip-Sync
3. Input: selected video + audio/text
4. Output: synced talking head

Step 6: Deliver & Iterate

## Talking Head Video Options

**Style:** [Archetype used]
**Platform:** [Target platform]
**Duration:** [X seconds]

### Option 1: Sora 2
[video URL]
Notes: [quality assessment]

### Option 2: Veo 3.1 (with audio)
[video URL]
Notes: [quality assessment]

### Option 3: Kling v2.5
[video URL]
Notes: [quality assessment]

**Select preferred video for lip-sync or final delivery.**
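If the delivery message is produced programmatically, the template above can be filled from a list of model outputs. The sketch below is illustrative only: `render_options` is a hypothetical helper, and the URLs and notes in the example are placeholders rather than real outputs.

```python
def render_options(style: str, platform: str, duration_s: int, outputs: list[tuple[str, str, str]]) -> str:
    """Render the delivery template from (label, video URL, notes) tuples."""
    lines = [
        "## Talking Head Video Options",
        "",
        f"**Style:** {style}",
        f"**Platform:** {platform}",
        f"**Duration:** {duration_s} seconds",
        "",
    ]
    for number, (label, url, notes) in enumerate(outputs, start=1):
        lines += [f"### Option {number}: {label}", url, f"Notes: {notes}", ""]
    lines.append("**Select preferred video for lip-sync or final delivery.**")
    return "\n".join(lines)


# Placeholder URLs and notes for illustration only.
print(render_options(
    style="Relatable Friend",
    platform="TikTok",
    duration_s=5,
    outputs=[
        ("Sora 2", "https://example.com/sora.mp4", "[quality assessment]"),
        ("Veo 3.1 (with audio)", "https://example.com/veo.mp4", "[quality assessment]"),
        ("Kling v2.5", "https://example.com/kling.mp4", "[quality assessment]"),
    ],
))
```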

Quality Checklist

Technical Quality

  • [ ] Face clearly visible throughout
  • [ ] No uncanny valley artifacts
  • [ ] Consistent appearance (no morphing)
  • [ ] Smooth natural movement
  • [ ] Appropriate resolution for platform

Presenter Quality

  • [ ] Matches intended archetype
  • [ ] Expression appropriate for message
  • [ ] Energy level fits content type
  • [ ] Wardrobe matches brand/context
  • [ ] Setting supports message

Lip-Sync Quality (if applicable)

  • [ ] Mouth movement matches audio
  • [ ] Natural speech rhythm
  • [ ] No obvious desync
  • [ ] Head movement doesn't break sync
  • [ ] Audio quality clear

Content Quality

  • [ ] Script delivered clearly
  • [ ] Pacing appropriate for platform
  • [ ] Hook captures attention
  • [ ] Message comes through
  • [ ] CTA clear (if applicable)

Common Issues & Solutions

| Issue | Cause | Solution |
|-------|-------|----------|
| Uncanny valley feel | Model limitations | Use Kling v2.5 for most realistic faces |
| Face morphing mid-video | Long duration | Keep videos shorter (5-10 sec), extend with cuts |
| Lip-sync drift | Audio/video mismatch | Use shorter scripts, clear enunciation |
| Wrong energy level | Prompt too vague | Be explicit about energy: "calm" vs "enthusiastic" |
| Generic stock presenter | No specific direction | Add detailed demographic and style descriptors |
| Setting doesn't match | Prompt conflict | Prioritize setting description, remove conflicts |
| Awkward hand movement | Unspecified gestures | Add gesture direction or specify "minimal movement" |
| Bad lighting | Missing lighting prompt | Always include lighting: "warm natural light" |
| Doesn't look like brand | No style consistency | Create and use presenter spec document |
| Audio quality poor | TTS limitations | Use recorded audio instead of text input |


Output Format

Style Exploration Output

## Presenter Style Exploration

**Brand/Project:** [Name]
**Use Case:** [What videos will be used for]

### Style 1: Corporate Authority
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

### Style 2: Relatable Friend
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

[...continue for all 5 styles...]

**Recommendation:** Style [X] best fits because [reasons]
**Feedback needed:** Which direction resonates?

Generated Video Output

## Talking Head Video Generated

**Style:** [Archetype]
**Platform:** [Target]
**Duration:** [X seconds]

### Model Outputs:

**Sora 2:** [URL]
**Veo 3.1:** [URL] (includes audio)
**Kling v2.5:** [URL]

**Prompt Used:**
> [full prompt for reference]

**Next Steps:**
- [ ] Select preferred video
- [ ] Add lip-sync to specific script (if needed)
- [ ] Request variation
- [ ] Approve for use

Lip-Sync Output

## Lip-Sync Video Delivered

**Source Video:** [URL]
**Script:** "[excerpt...]"
**Duration:** [X seconds]

**Final Video:** [URL]

**Quality Check:**
- ✓ Sync accuracy
- ✓ Natural rhythm
- ✓ Audio clarity
- ✓ Expression match

**Options:**
- [ ] Approve and use
- [ ] Adjust script and resync
- [ ] Try different source video

Pipeline Integration

TALKING HEAD PIPELINE

┌─────────────────────────────────────────┐
│  Request arrives (direct or routed)     │
│  → Clarify: platform, duration, style   │
│  → Determine: generation vs lip-sync    │
└─────────────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌──────────────────┐   ┌──────────────────┐
│  Style Undefined │   │  Style Defined   │
│  → Run style     │   │  → Skip to       │
│    exploration   │   │    generation    │
└──────────────────┘   └──────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  ai-talking-head (THIS SKILL)           │
│  → Multi-model generation               │
│  → Present options                      │
│  → Add lip-sync if needed               │
│  → Quality check                        │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  Delivery                               │
│  → Platform-optimized output            │
│  → Ready for ads/social/content         │
└─────────────────────────────────────────┘

Handoff Protocols

Receiving from ai-creative-workflow

Receive:
  use_case: "talking head" | "UGC" | "presenter" | "lip-sync"
  platform: "[target platform]"
  aspect_ratio: "[ratio]"
  duration: "[seconds]"
  style: "[archetype or custom]"
  script: "[text]"
  audio_url: "[if lip-sync with audio]"
  video_url: "[if lip-sync to existing]"

Returning to Workflow

Return:
  status: "complete" | "needs_selection" | "needs_iteration"
  deliverables:
    - video_url: "[URL]"
      model: "[which model]"
      has_audio: true | false
      duration: "[seconds]"
  feedback_needed: "[any questions]"

Receiving Video from ai-product-video

Receive for lip-sync:
  video_url: "[product video URL]"
  aspect_ratio: "[ratio]"
  script: "[voiceover text]"
  audio_url: "[optional, if pre-recorded]"
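The handoff fields above are enough to decide between generating a new presenter video and lip-syncing an existing one. The routing rule in this sketch is an assumption, not something the skill specifies: it treats the presence of `video_url` as the signal for lip-sync. The `choose_route` function and the route names are hypothetical.

```python
def choose_route(handoff: dict) -> str:
    """Pick 'lip_sync' when an existing video is supplied, else 'generate'."""
    if handoff.get("video_url"):
        # Assumption: lip-sync needs either pre-recorded audio or a script for TTS.
        if not (handoff.get("audio_url") or handoff.get("script")):
            raise ValueError("lip-sync handoff needs audio_url or script")
        return "lip_sync"
    return "generate"


from_product_video = {
    "video_url": "https://example.com/product.mp4",  # placeholder URL
    "aspect_ratio": "9:16",
    "script": "Here is how it works.",
}
from_workflow = {
    "use_case": "UGC",
    "platform": "TikTok",
    "aspect_ratio": "9:16",
    "duration": "5",
    "style": "Relatable Friend",
    "script": "Okay so I found this and I'm obsessed.",
}
print(choose_route(from_product_video))  # lip_sync
print(choose_route(from_workflow))       # generate
```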

Tips from Experience

What Works

  1. Consistency beats variety — Same presenter across videos builds recognition
  2. Kling v2.5 for faces — Most realistic human generation
  3. Shorter is safer — 5-10 second clips avoid quality degradation
  4. Explicit energy levels — "calm and measured" vs "enthusiastic and dynamic"
  5. Multi-model approach — Always generate with 2-3 models, let user pick
  6. Lip-sync extends value — One good video can become many scripts

What Doesn't Work

  1. Vague presenter description — "A person talking" = generic results
  2. Long continuous takes — Quality degrades after 10-15 seconds
  3. Ignoring setting — Presenter without context looks artificial
  4. Skipping style exploration — First idea rarely best for brand
  5. Mismatched energy — Corporate script + UGC style = awkward
  6. Complex movements — Walking + talking + gesturing = artifacts

The 80/20

80% of talking head success comes from:

  1. Clear presenter archetype selection
  2. Matching energy to platform
  3. Short, punchy scripts
  4. Using Kling v2.5 for realism

Get these four right, and you'll get good results.


Quick Reference

| Task | Model | Process |
|------|-------|---------|
| Generate presenter video | All 3 models | Multi-model, user picks |
| Add speech to existing video | Kling Lip-Sync | Direct, ~1min |
| Presenter + specific script | Generate → Lip-Sync | Two-step |
| Video with built-in audio | Veo 3.1 | Single generation |
| Most realistic face | Kling v2.5 | Single or multi-model |
| Fastest generation | Sora 2 | Single generation |
| UGC style | Kling v2.5 | Handles casual movement best |