AI Talking Head

Generate talking head videos, presenter content, and lip-synced videos.

Use this skill when: You need a person (real or AI) talking to camera.

Route here from: ai-creative-workflow, ai-creative-strategist, or direct requests.

Why This Skill Exists

The problem: Talking head videos are among the most persuasive content formats, but:

Recording yourself is time-consuming and requires confidence
Professional presenters are expensive ($500-5000+ per video)
UGC creators charge $100-500 per post and may not match your brand
Iterating on scripts means re-filming everything
Scaling personalized video is nearly impossible manually

The solution: AI talking heads that:

Generate professional presenter videos in minutes
Let you iterate on scripts without re-recording
Create unlimited variants for A/B testing
Maintain consistent brand presenter identity
Scale personalized outreach cost-effectively

The game-changer: Combining avatar generation + lip-sync lets you:

Create a consistent "brand spokesperson"
Update any script without re-filming
Test multiple presenter styles quickly
Produce video content at 10x the speed

Presenter Style Exploration (Before Generation)

Critical insight from ai-creative-strategist: Don't generate with one style and hope it works. Explore genuinely DIFFERENT presenter styles first.

The Style Exploration Process

STEP 1: GENERATE 4-5 DIFFERENT PRESENTER STYLES

This is NOT: Same person with different clothes

This IS: Fundamentally different presenter archetypes that each tell a different story

[YOUR BRAND] - Style Exploration

Generate presenter concepts for these 5 directions:

1. CORPORATE AUTHORITY
   - Demographic: 35-50, professional appearance
   - Setting: Modern office, corporate environment
   - Wardrobe: Business professional, suit/blazer
   - Energy: Confident, measured, authoritative
   - Vibe: "Trust the expert"

2. RELATABLE FRIEND
   - Demographic: 25-40, approachable look
   - Setting: Home office, kitchen, casual space
   - Wardrobe: Smart casual, comfortable
   - Energy: Warm, conversational, genuine
   - Vibe: "Let me share what worked for me"

3. ENERGETIC CREATOR
   - Demographic: 22-35, creator aesthetic
   - Setting: Ring light setup, content studio
   - Wardrobe: Trendy casual, branded
   - Energy: High, dynamic, enthusiastic
   - Vibe: "You HAVE to try this"

4. EXPERT EDUCATOR
   - Demographic: 30-55, credible appearance
   - Setting: Study, library, professional backdrop
   - Wardrobe: Smart casual, glasses optional
   - Energy: Calm, explanatory, helpful
   - Vibe: "Let me explain how this works"

5. LIFESTYLE ASPIRATIONAL
   - Demographic: 28-45, aspirational look
   - Setting: Beautiful home, travel location, luxury
   - Wardrobe: Elevated casual, tasteful
   - Energy: Relaxed confidence, success aura
   - Vibe: "This is what my life looks like"

STEP 2: IDENTIFY WINNER

After generating style exploration:

REVIEW each presenter style:

Which presenter:
- Best matches brand voice?
- Would audience trust most?
- Fits the content type?
- Has right energy level?
- Would work across multiple videos?

WINNER: [Selected style]
BECAUSE: [Why this style wins for this brand/use case]

STEP 3: EXTRACT PRESENTER PRINCIPLES

Once the winner is identified:

WINNING STYLE EXTRACTION

Demographics:
- Age range: [X-X]
- Gender: [if specific]
- Ethnicity: [if specific]
- Overall look: [descriptors]

Environment:
- Primary setting: [where they present from]
- Background elements: [what's visible]
- Lighting style: [natural/studio/mixed]

Wardrobe:
- Style: [formal/casual/etc.]
- Colors: [palette]
- Accessories: [if any]

Delivery:
- Energy level: [1-10]
- Speaking pace: [slow/medium/fast]
- Hand gestures: [minimal/moderate/expressive]
- Eye contact: [direct to camera always]

Audio:
- Voice tone: [warm/authoritative/energetic]
- Pacing: [conversational/punchy/measured]

STEP 4: APPLY ACROSS CONTENT

Use extracted principles for:

All future videos maintain consistency
Same presenter = brand recognition
Variations in script, not in presenter
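One way to keep the extracted principles fixed across videos is to store them once and let only the topic vary. The sketch below is illustrative: the `PresenterSpec` class, its field names, and the sample values are assumptions, not part of this skill's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresenterSpec:
    """Winning-style extraction, captured once and reused (hypothetical helper)."""
    look: str      # Demographics: age range and overall look
    setting: str   # Environment: primary setting and background
    lighting: str  # Environment: lighting style
    wardrobe: str  # Wardrobe: style, colors, accessories
    delivery: str  # Delivery: energy, pace, gestures

    def prompt_for(self, topic: str) -> str:
        """Return a prompt in which only the topic changes between videos."""
        fixed = [self.look, self.setting, self.lighting, self.wardrobe, self.delivery]
        return ", ".join(fixed + [f"discussing {topic}", "direct eye contact with camera"])


# Sample values are made up for illustration.
spec = PresenterSpec(
    look="woman in her 30s, approachable look",
    setting="home office with plants",
    lighting="natural window light",
    wardrobe="smart casual sweater",
    delivery="warm conversational energy, moderate hand gestures",
)
print(spec.prompt_for("morning routines"))
```

Because the dataclass is frozen, the presenter description cannot drift between calls; a new script only changes the `topic` argument.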

Presenter Archetype Deep Dives

Corporate Authority

When to use: B2B, financial services, healthcare, enterprise SaaS, professional services

Visual Formula:

[Man/Woman] in [30s-50s], [silver/dark hair], wearing [tailored blazer/suit],
in [modern glass office/conference room with city view], [warm professional lighting],
[confident composed expression], [seated at desk OR standing with slight lean],
[direct eye contact with camera], [subtle hand gestures], corporate executive style

Setting Options:

Corner office with city view
Modern conference room
Executive desk with minimal decor
Standing at presentation screen
Seated in designer chair

Wardrobe Options:

Tailored navy blazer over white shirt
Grey suit, no tie (modern)
Classic suit with subtle tie
Blazer over turtleneck (thought leader)
Professional dress (solid colors)

Energy Markers:

Measured pace
Deliberate movements
Confident pauses
Minimal but purposeful gestures
Assured vocal tone

Relatable Friend (UGC Style)

When to use: DTC brands, consumer products, wellness, beauty, lifestyle

Visual Formula:

[Friendly man/woman] in [25-40s], wearing [casual but put-together outfit],
in [bright modern apartment/kitchen/home office], [natural window light],
[genuine warm smile], [relaxed comfortable posture], [talking to camera like
a friend], [natural hand movements], authentic UGC creator style

Setting Options:

Bright kitchen counter
Cozy living room couch
Home office with plants
Bedroom getting-ready setup
Outdoor patio/balcony

Wardrobe Options:

Cozy sweater/cardigan
Simple t-shirt
Casual button-down
Loungewear (if brand appropriate)
Athleisure

Energy Markers:

Conversational rhythm
Natural pauses ("honestly?", "okay so...")
Expressive facial reactions
Genuine enthusiasm without over-selling
Relatable body language

UGC Script Patterns:

DISCOVERY: "Okay so I found this [product] and I'm obsessed..."
REVIEW: "So I've been using [product] for [time] and here's my honest take..."
COMPARISON: "I used to use [old product] but then I tried [new product]..."
TRANSFORMATION: "Before [product] I was [problem]. Now? [result]."

Energetic Creator

When to use: Gen-Z products, entertainment, gaming, trendy DTC, social apps

Visual Formula:

[Young energetic creator] in [22-35], [colorful trendy outfit], in [content
studio with ring light/neon lights], [bright dynamic lighting], [animated
expressions], [lots of movement and gestures], [high energy delivery],
[fast-paced enthusiastic style], YouTube/TikTok creator aesthetic

Setting Options:

Ring light setup visible
LED/neon accent lighting
Streaming/gaming setup
Colorful backdrop
Outdoor action setting

Wardrobe Options:

Graphic tees
Bold colors
Branded merch
Trendy streetwear
Statement accessories

Energy Markers:

Fast-paced delivery
Big expressions
Lots of hand movement
Pattern interrupts
Enthusiasm at 10

Creator Script Patterns:

HOOK: "STOP scrolling. This is important."
REVEAL: "I literally just discovered [thing] and I'm freaking out."
CHALLENGE: "I bet you can't guess what [product] does."
REACTION: "[reaction to trying product]... WAIT what?!"

Expert Educator

When to use: Online courses, professional services, B2B explainers, tutorials

Visual Formula:

[Knowledgeable expert] in [30s-55], [smart casual or academic style],
in [home study/office with books/whiteboard], [balanced lighting],
[thoughtful composed expression], [explaining with purposeful gestures],
[patient instructive tone], educator/thought leader style

Setting Options:

Study with bookshelves
Office with credentials visible
Whiteboard/screen behind
Standing at presentation
Desk with relevant props

Wardrobe Options:

Button-down shirt
Blazer over casual shirt
Sweater over collared shirt
Glasses (authority signal)
Minimal accessories

Energy Markers:

Patient pace
Teaching rhythm
Logical structure
Illustrative gestures
"Here's what matters" moments

Lifestyle Aspirational

When to use: Luxury brands, high-ticket services, aspirational DTC, travel, real estate

Visual Formula:

[Elegant successful person] in [30s-50s], [elevated casual attire],
in [beautiful interior/scenic location], [golden hour OR designer lighting],
[relaxed confident demeanor], [speaking with quiet confidence], [minimal
but graceful movement], aspirational lifestyle aesthetic

Setting Options:

Designer living room
Travel location (balcony view)
Luxury car interior
High-end restaurant/hotel
Yacht/beach/resort

Wardrobe Options:

Designer casual
Linen/natural fabrics
Neutral luxury palette
Subtle jewelry/watch
Effortlessly elegant

Energy Markers:

Relaxed confidence
No rushing
"I have time" energy
Subtle smile
Quiet success vibes

Video Model Roster (Quality Winners)

Generate presenter videos with ALL THREE models, present outputs for selection:

| Model | Owner | Speed | Strengths |
|-------|-------|-------|-----------|
| Sora 2 | openai | ~80s | Excellent general quality, good faces |
| Veo 3.1 | google | ~130s | Native audio generation, natural movement |
| Kling v2.5 Turbo Pro | kwaivgi | ~155s | Best for people/motion, most realistic |

Strategy: Run same prompt through all 3 models → User picks best output.

Model Selection Guide

FOR MAXIMUM REALISM (people quality):
    → Kling v2.5 Turbo Pro (best faces, most natural movement)

FOR SPEED + QUALITY BALANCE:
    → Sora 2 (fastest, still good quality)

FOR BUILT-IN AUDIO:
    → Veo 3.1 (generates audio with video)

FOR UGC AUTHENTICITY:
    → Kling v2.5 (handles casual movements well)

FOR CORPORATE/FORMAL:
    → Sora 2 or Kling v2.5 (cleaner, more controlled)
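The selection guide above can be read as a lookup from priority to model. A minimal sketch follows, assuming short priority keys (`realism`, `speed`, and so on) that are invented here for illustration; the model names are the ones used in this document's request examples.

```python
# Priority keys are hypothetical labels for the guide's headings.
MODELS_FOR_PRIORITY: dict[str, tuple[str, ...]] = {
    "realism": ("kling-v2.5-turbo-pro",),
    "speed": ("sora-2",),
    "audio": ("veo-3.1",),
    "ugc": ("kling-v2.5-turbo-pro",),
    "corporate": ("sora-2", "kling-v2.5-turbo-pro"),  # the guide lists both
}


def pick_models(priority: str) -> tuple[str, ...]:
    """Return the model(s) the guide suggests for a given priority."""
    try:
        return MODELS_FOR_PRIORITY[priority.lower()]
    except KeyError:
        known = ", ".join(sorted(MODELS_FOR_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {known}") from None


print(pick_models("audio"))
print(pick_models("corporate"))
```

This only narrows the choice when a single model is wanted; the default strategy of running all three and letting the user pick still applies.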

Lip-Sync Model

For adding speech to existing videos:

| Model | Use | Cost | Speed | Quality |
|-------|-----|------|-------|---------|
| Kling Lip-Sync | Add voiceover to any video | ~$0.20 | ~1min | Excellent |

When to use Lip-Sync:

You have a great presenter video but need a different script
Client wants to change messaging after video generation
Creating personalized versions of same base video
Adding voiceover to product demo videos
Dubbing content for different languages

Use Cases Deep Dive

1. Lip-Sync Overlay

Best for: Adding voiceover to existing video, dubbing, personalization

Input Requirements:

Video with visible face (front-facing works best)
Audio file (MP3, WAV) OR text script

Workflow:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "Prefer": "wait",
  "input": {
    "video": "https://... (source video URL)",
    "audio": "https://... (audio file URL)"
  }
}

Or with text (uses built-in TTS):

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "https://... (source video URL)",
    "text": "Script text to speak"
  }
}
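The two request shapes differ only in whether speech comes from an audio file or from text. The helper below is a sketch that builds either one; the function name is hypothetical, and the rule that exactly one of `audio_url` or `text` must be supplied is an assumption drawn from the "audio file OR text script" input requirement above.

```python
from typing import Optional


def lip_sync_request(
    video_url: str,
    audio_url: Optional[str] = None,
    text: Optional[str] = None,
) -> dict:
    """Build a lip-sync request body in the shape shown in this section."""
    # Assumption: the model takes one speech source, never both or neither.
    if (audio_url is None) == (text is None):
        raise ValueError("provide exactly one of audio_url or text")
    speech = {"audio": audio_url} if audio_url is not None else {"text": text}
    return {
        "model_owner": "kwaivgi",
        "model_name": "kling-lip-sync",
        "Prefer": "wait",
        "input": {"video": video_url, **speech},
    }


request = lip_sync_request("SOURCE_VIDEO_URL", text="Script text to speak")
print(request["input"])
```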

Quality Tips:

Source video should have face visible 70%+ of time
Forward-facing shots work better than profiles
Avoid videos with heavy face movement/turning
Audio should be clear without background noise
Script pacing should match natural speech

2. AI Presenter Generation

Best for: Creating presenter content from scratch, brand spokesperson

Multi-Model Workflow:

// Sora 2
{
  "model_owner": "openai",
  "model_name": "sora-2",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

// Veo 3.1 (with native audio)
{
  "model_owner": "google",
  "model_name": "veo-3.1",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }
}

// Kling v2.5
{
  "model_owner": "kwaivgi",
  "model_name": "kling-v2.5-turbo-pro",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

Then add lip-sync if a specific script is needed:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "[generated video URL]",
    "text": "[script text]"
  }
}
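Since the three generation requests share the same prompt and aspect ratio, they can be produced from one function so the inputs never drift apart. This is a sketch; `presenter_requests` is a hypothetical helper, and the per-model fields simply mirror the examples above (Veo 3.1 gets `generate_audio`, the other two get `duration`).

```python
def presenter_requests(prompt: str, aspect_ratio: str = "16:9", duration: int = 5) -> list[dict]:
    """Return one request body per model, all sharing the same prompt."""
    base = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    return [
        {"model_owner": "openai", "model_name": "sora-2",
         "input": {**base, "duration": duration}},
        {"model_owner": "google", "model_name": "veo-3.1",
         "input": {**base, "generate_audio": True}},
        {"model_owner": "kwaivgi", "model_name": "kling-v2.5-turbo-pro",
         "input": {**base, "duration": duration}},
    ]


for body in presenter_requests("[presenter prompt]", aspect_ratio="9:16"):
    print(body["model_name"], body["input"])
```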

3. UGC-Style Content

Best for: Authentic testimonials, product reviews, social proof

The UGC Formula:

[Relatable person] + [Casual setting] + [Natural lighting] +
[Authentic delivery] + [Genuine reaction] = Believable UGC

Prompt Template:

Friendly [demographic] sitting in [casual setting], natural window light,
holding/showing [product], genuine excited expression, talking directly to
camera like filming a selfie video, authentic UGC testimonial style, casual
comfortable body language, 5 seconds

UGC Authenticity Markers:

Slightly imperfect framing
Natural lighting (not studio)
Casual wardrobe
Real reactions, not posed
Personal space as backdrop
Eye contact with camera

4. Personal Brand Series

Best for: Thought leaders, course creators, coaches, consultants

Consistency Formula:

ESTABLISH ONCE, USE FOREVER:
- Same presenter appearance
- Same setting/background
- Same wardrobe style
- Same energy level
- Same lighting setup

Only change: Script and specific content

Series Prompt Template:

[Consistent presenter description - use same each time], [same setting],
[same lighting], [same wardrobe style], [same energy], discussing [new topic],
[consistent delivery style], 5 seconds

Script Mastery

Duration Calculation

| Word Count | Duration | Use Case |
|------------|----------|----------|
| 15 words | ~5 seconds | Social hook |
| 30 words | ~10 seconds | Instagram Reel |
| 45 words | ~15 seconds | TikTok optimal |
| 60 words | ~20 seconds | Short testimonial |
| 90 words | ~30 seconds | Product explainer |
| 150 words | ~60 seconds | Full testimonial |

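Rule: ~150 words per minute (2.5 words per second) at a natural conversational pace. Note that the rows up to 30 seconds work out to ~3 words per second (~180 words per minute), a brisker pace; at 150 words per minute, 15 words runs closer to 6 seconds.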
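The words-per-minute rule is simple arithmetic, so it can be checked before sending a script to a model. A minimal sketch, with function names chosen here for illustration and 150 words per minute as the default pace:

```python
def estimate_seconds(script: str, words_per_minute: int = 150) -> float:
    """Estimate spoken duration from a whitespace-separated word count."""
    words = len(script.split())
    return round(words * 60 / words_per_minute, 1)


def word_budget(seconds: int, words_per_minute: int = 150) -> int:
    """Largest whole number of words that fits in the given duration."""
    return seconds * words_per_minute // 60


hook = "Stop what you're doing. This changed how I plan my whole week."
print(estimate_seconds(hook))  # 12 words at 150 wpm
print(word_budget(30))         # words available for a 30-second script
```

Passing `words_per_minute=180` models the faster delivery typical of short social hooks.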

Script Structures

HOOK-VALUE-CTA (15-30 seconds):

Hook (0-3 sec): [Attention-grabber - question, statement, or pattern interrupt]
Value (3-20 sec): [Main message, benefit, or story]
CTA (20-30 sec): [Clear next step]

PROBLEM-AGITATE-SOLVE (30-60 seconds):

Problem (0-10 sec): [Name the pain point]
Agitate (10-30 sec): [Make them feel it]
Solve (30-60 sec): [Present the solution + CTA]

BEFORE-AFTER (15-30 seconds):

Before (0-10 sec): [Life before product/solution]
After (10-25 sec): [Transformation/result]
CTA (25-30 sec): [How to get same result]
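Each structure's timestamps can be turned into a per-segment word budget using the same 150 words-per-minute rule. The sketch below encodes the three structures as data; the dictionary keys and function name are assumptions made for this example.

```python
# (segment name, start second, end second), copied from the structures above.
STRUCTURES = {
    "hook-value-cta": [("Hook", 0, 3), ("Value", 3, 20), ("CTA", 20, 30)],
    "problem-agitate-solve": [("Problem", 0, 10), ("Agitate", 10, 30), ("Solve", 30, 60)],
    "before-after": [("Before", 0, 10), ("After", 10, 25), ("CTA", 25, 30)],
}


def segment_word_budgets(structure: str, words_per_minute: int = 150) -> dict[str, int]:
    """Whole-word budget for each segment, rounded down."""
    return {
        name: (end - start) * words_per_minute // 60
        for name, start, end in STRUCTURES[structure]
    }


print(segment_word_budgets("hook-value-cta"))
print(segment_word_budgets("problem-agitate-solve"))
```

At the default pace, a 3-second hook leaves room for about 7 words, which is a useful constraint when drafting the opening line.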

Tone Templates

Professional/Corporate:

"[Name] here with [Company]. Today I want to share how [product/insight]
can help you [achieve outcome]. Here's what you need to know..."

Casual/UGC:

"Okay so I've been using [product] for [time] and honestly? I'm obsessed.
Here's why [specific benefit]. If you [problem], you need this."

Expert/Educational:

"One thing I see people get wrong about [topic] is [misconception].
Here's what actually works: [insight]. Let me show you..."

Energetic/Sales:

"Stop what you're doing. [Product] just changed everything. I'm serious -
[result] in [timeframe]. You HAVE to try this."

Aspirational:

"[Casual opening]. I wanted to share something that's completely transformed
[area of life]. [Product] gave me [result]. Here's how it works..."

Platform-Specific Optimization

TikTok/Reels (9:16)

Specs:

Aspect Ratio: 9:16 (vertical)
Duration: 15-30 seconds optimal
Safe Zone: Keep face and text within the center 60% of the frame

Style Adjustments:

→ Higher energy delivery
→ Faster pacing
→ Hook in first 1-2 seconds
→ Pattern interrupts
→ Jump cuts acceptable
→ Casual/authentic feel

Prompt Modifier:

...[base prompt], filmed vertically like TikTok/Reels content,
energetic creator style, direct eye contact with camera

YouTube (16:9)

Specs:

Aspect Ratio: 16:9 (landscape)
Duration: 30-120 seconds
Safe Zone: Standard letterbox

Style Adjustments:

→ More measured pacing
→ Can be longer form
→ More professional setups accepted
→ Room for B-roll integration
→ Intro/outro structure

Prompt Modifier:

...[base prompt], widescreen YouTube style, professional yet engaging,
room for graphics/lower thirds

LinkedIn (1:1 or 16:9)

Specs:

Aspect Ratio: 1:1 (square) or 16:9
Duration: 30-60 seconds optimal
Tone: Professional but personal

Style Adjustments:

→ Professional appearance
→ Business-appropriate setting
→ Thought leadership tone
→ Value-first messaging
→ Credibility signals

Prompt Modifier:

...[base prompt], professional LinkedIn style, credible expert appearance,
business casual in modern office environment

Instagram Stories (9:16)

Specs:

Aspect Ratio: 9:16
Duration: 15 seconds max per segment
Ephemeral feel

Style Adjustments:

→ Casual, in-the-moment feel
→ Can be "rougher" quality
→ Direct audience address
→ Personal/behind-scenes vibe
→ Clear single message per story

Ads (Various)

Facebook/Instagram Ads:

1:1, 4:5, or 9:16
15-30 seconds optimal
Hook in 0-3 seconds
Clear CTA

YouTube Ads:

16:9
15-30 seconds (skippable) or 6 seconds (bumper)
Brand visible throughout
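The platform specs above lend themselves to a small lookup that flags a mismatch before any generation is run. This is a sketch under stated assumptions: the platform keys and `check_fit` are hypothetical, the ranges are the "optimal" durations listed here rather than hard platform limits, Instagram Stories uses 0 as a lower bound because none is given, and YouTube Ads is omitted because its two formats do not fit a single range.

```python
PLATFORM_SPECS = {
    "tiktok_reels": {"aspect_ratios": ("9:16",), "seconds": (15, 30)},
    "youtube": {"aspect_ratios": ("16:9",), "seconds": (30, 120)},
    "linkedin": {"aspect_ratios": ("1:1", "16:9"), "seconds": (30, 60)},
    "instagram_stories": {"aspect_ratios": ("9:16",), "seconds": (0, 15)},
    "facebook_instagram_ads": {"aspect_ratios": ("1:1", "4:5", "9:16"), "seconds": (15, 30)},
}


def check_fit(platform: str, aspect_ratio: str, seconds: float) -> list[str]:
    """Return a list of mismatches against the spec; empty means it fits."""
    spec = PLATFORM_SPECS[platform]
    problems = []
    if aspect_ratio not in spec["aspect_ratios"]:
        problems.append(f"aspect ratio {aspect_ratio} not listed for {platform}")
    low, high = spec["seconds"]
    if not low <= seconds <= high:
        problems.append(f"{seconds}s is outside the {low}-{high}s range for {platform}")
    return problems


print(check_fit("tiktok_reels", "9:16", 20))
print(check_fit("youtube", "9:16", 10))
```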

Audio & Voice Considerations

When Using Veo 3.1 Native Audio

Strengths:

Generates synchronized audio with video
Natural ambient sounds
Speech that matches lip movement
Good for establishing scenes

Limitations:

Less control over specific script
Audio quality varies
May need post-processing

When Adding Lip-Sync

Best Practices:

Use high-quality audio recording
Match energy level to video presenter
Pace script to natural speaking rhythm
Allow for breath pauses
Keep sentences short (easier sync)

Voice-Over Tips

If recording your own VO for lip-sync:

□ Record in quiet environment
□ Use consistent distance from mic
□ Match energy to presenter style
□ Natural pauses between sentences
□ Clear enunciation
□ Export as MP3 or WAV

If using TTS (text input):

□ Use punctuation for natural pauses
□ Write phonetically for tricky words
□ Keep sentences conversational length
□ Test different phrasings
□ Consider adding "..." for pauses

Execution Workflow

Step 1: Clarify Requirements

Before generating:

□ What's the use case? (UGC, corporate, educational, etc.)
□ What platform? (TikTok, YouTube, LinkedIn, ads)
□ What aspect ratio? (9:16, 16:9, 1:1)
□ What duration? (and word count)
□ What presenter style? (see archetypes)
□ What's the script/message?
□ Need lip-sync to specific audio?

Step 2: Style Selection

If not predefined:

□ Generate style exploration with 4-5 different presenter styles
□ Present options to user
□ Extract principles from winner
□ Document for consistency

Step 3: Construct Prompt

Use this formula:

[PRESENTER DESCRIPTION] + [SETTING] + [LIGHTING] +
[EXPRESSION/ENERGY] + [ACTION] + [STYLE MODIFIER] + [DURATION]
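The formula can be enforced with a small builder that refuses to produce a prompt when a component is missing, which guards against the vague-prompt issues listed later in this document. A minimal sketch; the component key names and `build_prompt` are assumptions made for illustration.

```python
# Keys are hypothetical names for the seven formula components, in order.
FORMULA_ORDER = (
    "presenter", "setting", "lighting", "expression_energy",
    "action", "style_modifier", "duration",
)


def build_prompt(parts: dict[str, str]) -> str:
    """Join the components in formula order; fail loudly on missing ones."""
    missing = [name for name in FORMULA_ORDER if not parts.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt components: {', '.join(missing)}")
    return ", ".join(parts[name].strip() for name in FORMULA_ORDER)


prompt = build_prompt({
    "presenter": "Friendly woman in her 30s wearing a cozy sweater",
    "setting": "bright kitchen counter",
    "lighting": "natural window light",
    "expression_energy": "genuine warm smile, calm and measured",
    "action": "talking directly to camera",
    "style_modifier": "authentic UGC creator style",
    "duration": "5 seconds",
})
print(prompt)
```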

Step 4: Multi-Model Generation

Run same prompt through:
1. Sora 2 (~80s)
2. Veo 3.1 (~130s)
3. Kling v2.5 (~155s)

Present all three to user for selection.
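Run one after another, the three models would take roughly 80 + 130 + 155 = 365 seconds; run concurrently, the wait is closer to the slowest model (~155 seconds). The sketch below shows the concurrent pattern with Python's standard thread pool. The `run_model` callable is a placeholder for whatever client actually submits the request, and the stub used in the demo does not call any real service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MODELS = ("sora-2", "veo-3.1", "kling-v2.5-turbo-pro")


def generate_all(prompt: str, run_model: Callable[[str, str], str]) -> dict[str, str]:
    """Run the same prompt through every model concurrently.

    run_model(model_name, prompt) is supplied by the caller and should
    return a reference to the generated video.
    """
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {model: pool.submit(run_model, model, prompt) for model in MODELS}
        # result() blocks until that model finishes and re-raises its errors.
        return {model: future.result() for model, future in futures.items()}


def stub_run_model(model: str, prompt: str) -> str:
    """Stand-in for a real client call; returns a descriptive placeholder."""
    return f"<{model} output for: {prompt}>"


for model, output in generate_all("[presenter prompt]", stub_run_model).items():
    print(model, output)
```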

Step 5: Add Lip-Sync (If Needed)

If specific script delivery required:

1. User approves video from Step 4
2. Run through Kling Lip-Sync
3. Input: selected video + audio/text
4. Output: synced talking head

Step 6: Deliver & Iterate

## Talking Head Video Options

**Style:** [Archetype used]
**Platform:** [Target platform]
**Duration:** [X seconds]

### Option 1: Sora 2
[video URL]
Notes: [quality assessment]

### Option 2: Veo 3.1 (with audio)
[video URL]
Notes: [quality assessment]

### Option 3: Kling v2.5
[video URL]
Notes: [quality assessment]

**Select preferred video for lip-sync or final delivery.**

Quality Checklist

Technical Quality

[ ] Face clearly visible throughout
[ ] No uncanny valley artifacts
[ ] Consistent appearance (no morphing)
[ ] Smooth natural movement
[ ] Appropriate resolution for platform

Presenter Quality

[ ] Matches intended archetype
[ ] Expression appropriate for message
[ ] Energy level fits content type
[ ] Wardrobe matches brand/context
[ ] Setting supports message

Lip-Sync Quality (if applicable)

[ ] Mouth movement matches audio
[ ] Natural speech rhythm
[ ] No obvious desync
[ ] Head movement doesn't break sync
[ ] Audio quality clear

Content Quality

[ ] Script delivered clearly
[ ] Pacing appropriate for platform
[ ] Hook captures attention
[ ] Message comes through
[ ] CTA clear (if applicable)

Common Issues & Solutions

| Issue | Cause | Solution |
|-------|-------|----------|
| Uncanny valley feel | Model limitations | Use Kling v2.5 for most realistic faces |
| Face morphing mid-video | Long duration | Keep videos shorter (5-10 sec), extend with cuts |
| Lip-sync drift | Audio/video mismatch | Use shorter scripts, clear enunciation |
| Wrong energy level | Prompt too vague | Be explicit about energy: "calm" vs "enthusiastic" |
| Generic stock presenter | No specific direction | Add detailed demographic and style descriptors |
| Setting doesn't match | Prompt conflict | Prioritize setting description, remove conflicts |
| Awkward hand movement | Unspecified gestures | Add gesture direction or specify "minimal movement" |
| Bad lighting | Missing lighting prompt | Always include lighting: "warm natural light" |
| Doesn't look like brand | No style consistency | Create and use presenter spec document |
| Audio quality poor | TTS limitations | Use recorded audio instead of text input |

Output Format

Style Exploration Output

## Presenter Style Exploration

**Brand/Project:** [Name]
**Use Case:** [What videos will be used for]

### Style 1: Corporate Authority
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

### Style 2: Relatable Friend
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

[...continue for all 5 styles...]

**Recommendation:** Style [X] best fits because [reasons]
**Feedback needed:** Which direction resonates?

Generated Video Output

## Talking Head Video Generated

**Style:** [Archetype]
**Platform:** [Target]
**Duration:** [X seconds]

### Model Outputs:

**Sora 2:** [URL]
**Veo 3.1:** [URL] (includes audio)
**Kling v2.5:** [URL]

**Prompt Used:**
> [full prompt for reference]

**Next Steps:**
- [ ] Select preferred video
- [ ] Add lip-sync to specific script (if needed)
- [ ] Request variation
- [ ] Approve for use

Lip-Sync Output

## Lip-Sync Video Delivered

**Source Video:** [URL]
**Script:** "[excerpt...]"
**Duration:** [X seconds]

**Final Video:** [URL]

**Quality Check:**
- ✓ Sync accuracy
- ✓ Natural rhythm
- ✓ Audio clarity
- ✓ Expression match

**Options:**
- [ ] Approve and use
- [ ] Adjust script and resync
- [ ] Try different source video

Pipeline Integration

TALKING HEAD PIPELINE

┌─────────────────────────────────────────┐
│  Request arrives (direct or routed)     │
│  → Clarify: platform, duration, style   │
│  → Determine: generation vs lip-sync    │
└─────────────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌──────────────────┐   ┌──────────────────┐
│  Style Undefined │   │  Style Defined   │
│  → Run style     │   │  → Skip to       │
│    exploration   │   │    generation    │
└──────────────────┘   └──────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  ai-talking-head (THIS SKILL)           │
│  → Multi-model generation               │
│  → Present options                      │
│  → Add lip-sync if needed               │
│  → Quality check                        │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  Delivery                               │
│  → Platform-optimized output            │
│  → Ready for ads/social/content         │
└─────────────────────────────────────────┘

Handoff Protocols

Receiving from ai-creative-workflow

Receive:
  use_case: "talking head" | "UGC" | "presenter" | "lip-sync"
  platform: "[target platform]"
  aspect_ratio: "[ratio]"
  duration: "[seconds]"
  style: "[archetype or custom]"
  script: "[text]"
  audio_url: "[if lip-sync with audio]"
  video_url: "[if lip-sync to existing]"
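The received fields determine whether the request needs generation, lip-sync, or both. The sketch below models the handoff as a dataclass and derives the steps. The class, the function, and the decision rules are assumptions for illustration: it treats a supplied `video_url` as "lip-sync only" and a supplied script or audio as "lip-sync required".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HandoffRequest:
    """Fields received from ai-creative-workflow (types are assumed)."""
    use_case: str
    platform: str
    aspect_ratio: str
    duration: str
    style: str
    script: Optional[str] = None
    audio_url: Optional[str] = None
    video_url: Optional[str] = None


def plan_steps(req: HandoffRequest) -> list[str]:
    """Decide which pipeline steps the request needs (assumed rules)."""
    needs_speech = bool(req.script or req.audio_url)
    if req.video_url:
        if not needs_speech:
            raise ValueError("lip-sync to an existing video needs script or audio_url")
        return ["lip-sync"]
    steps = ["generate"]
    if needs_speech:
        steps.append("lip-sync")
    return steps


req = HandoffRequest("presenter", "LinkedIn", "16:9", "30", "Corporate Authority",
                     script="[text]")
print(plan_steps(req))
```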

Returning to Workflow

Return:
  status: "complete" | "needs_selection" | "needs_iteration"
  deliverables:
    - video_url: "[URL]"
      model: "[which model]"
      has_audio: true | false
      duration: "[seconds]"
  feedback_needed: "[any questions]"

Receiving Video from ai-product-video

Receive for lip-sync:
  video_url: "[product video URL]"
  aspect_ratio: "[ratio]"
  script: "[voiceover text]"
  audio_url: "[optional, if pre-recorded]"

Tips from Experience

What Works

Consistency beats variety — Same presenter across videos builds recognition
Kling v2.5 for faces — Most realistic human generation
Shorter is safer — 5-10 second clips avoid quality degradation
Explicit energy levels — "calm and measured" vs "enthusiastic and dynamic"
Multi-model approach — Always generate with 2-3 models, let user pick
Lip-sync extends value — One good video can become many scripts

What Doesn't Work

Vague presenter description — "A person talking" = generic results
Long continuous takes — Quality degrades after 10-15 seconds
Ignoring setting — Presenter without context looks artificial
Skipping style exploration — First idea rarely best for brand
Mismatched energy — Corporate script + UGC style = awkward
Complex movements — Walking + talking + gesturing = artifacts
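Because quality tends to degrade in long takes, a longer script can be cut into clip-sized pieces at sentence boundaries and joined with cuts afterwards. A minimal sketch, assuming the 150 words-per-minute rule and a 10-second ceiling per clip; the function name and the simple punctuation-based sentence splitting are illustrative choices.

```python
import re


def split_into_clips(script: str, max_seconds: int = 10, words_per_minute: int = 150) -> list[str]:
    """Group whole sentences into clips that fit the per-clip word budget.

    A single sentence longer than the budget is kept whole as its own clip.
    """
    max_words = max_seconds * words_per_minute // 60
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    clips: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if not sentence:
            continue
        candidate = current + [sentence]
        if current and len(" ".join(candidate).split()) > max_words:
            clips.append(" ".join(current))
            current = [sentence]
        else:
            current = candidate
    if current:
        clips.append(" ".join(current))
    return clips


script = (
    "Okay so I've been using this planner for a month and honestly? I'm obsessed. "
    "It keeps every task in one place. If you keep losing track of your week, "
    "you need this."
)
for clip in split_into_clips(script, max_seconds=5):
    print(repr(clip))
```

Each clip can then be generated or lip-synced separately with the same presenter description, which keeps individual takes short.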

The 80/20

80% of talking head success comes from:

Clear presenter archetype selection
Matching energy to platform
Short, punchy scripts
Using Kling v2.5 for realism

Get these four right, and you'll get good results.

Quick Reference

| Task | Model | Process |
|------|-------|---------|
| Generate presenter video | All 3 models | Multi-model, user picks |
| Add speech to existing video | Kling Lip-Sync | Direct, ~1min |
| Presenter + specific script | Generate → Lip-Sync | Two-step |
| Video with built-in audio | Veo 3.1 | Single generation |
| Most realistic face | Kling v2.5 | Single or multi-model |
| Fastest generation | Sora 2 | Single generation |
| UGC style | Kling v2.5 | Handles casual movement best |
