Agent Skills: YouTube Publish (Scripted Flow)

End-to-end YouTube publishing workflow using ordered scripts: prepare/concat video, upload draft, transcribe with Parakeet, generate copy+thumbnails with Gemini, update YouTube metadata, then schedule socials (Postiz) 15 minutes after publish.

UncategorizedID: devexpert-io/devexpert-skills/youtube-publish

Install this agent skill to your local

pnpm dlx add-skill https://github.com/antoniolg/agent-kit/tree/HEAD/skills/youtube-publish

Skill Files

Browse the full folder contents for youtube-publish.

Download Skill

Loading file tree…

skills/youtube-publish/SKILL.md

Skill Metadata

Name
youtube-publish
Description
"End-to-end YouTube publishing workflow using ordered scripts: prepare/concat video, upload draft, transcribe with Parakeet, generate copy with the calling model, optionally prepare English dubbing assets, render thumbnails, update YouTube metadata, then schedule socials (PostFlow) 15 minutes after publish."

YouTube Publish (Scripted Flow)

Use scripts in order. Stop for validation after copy + thumbnail generation. If the user did not already specify them, ask up front for:

  • the exact publish day/time
  • whether they want English dubbing for YouTube
  • whether they also want the English X variant

Rule: the English X variant depends on the English YouTube dubbing pack. It is valid to do English dubbing for YouTube without doing the English X variant, but not the other way around.

Behavior rules for the agent

  • Tone & Authority: Titles and copy must focus on engineering, architecture, and solving developer friction. Avoid reaction-style hype and mass-content phrasing.
  • Title Blacklist (strict): Forbidden in titles: RIP, Increíble, Brutal, Locura, Definitivo, ¿El fin de...?, and crown/fire emojis.
  • Technical Anchor (strict): Every title must include at least one engineering keyword: Orquestación, Despliegue, Infraestructura, Clean Architecture, Refactorización, Pipeline, Capa de Abstracción.
  • Title Derivation: Do not ask for a title hint; derive it from the video stem and the technical density of the SRT.
  • Scheduling: If the user provides a publish time, resolve to exact YYYY-MM-DD HH:MM using system time and pass --publish-at + --timezone. Always determine and pass --timezone.
  • Missing decisions: If publish date/time, English YouTube dubbing, or English X dubbing were not explicitly provided, ask for them before starting the workflow.
  • Content Generation Engine: Titles, thumbnail ideas, description, chapters, and LinkedIn copy must be written by the same model executing this skill.
  • English Variant: When the user wants a multilingual YouTube version, always generate an English pack for the same video: translated transcript, English title, English description, and dubbed English audio.
  • English Scope: The English pack is for YouTube multi-language audio/localization only. Do not create English social posts unless the user explicitly asks for them.
  • English X Dependency: If the user wants the English X variant, that implies the English YouTube dubbing pack must also be produced first.
  • Thumbnail Generation: Generate 3 thumbnails using the presenter photo set. Default presenter is antonio (assets/antonio-1.png, antonio-2.png, antonio-3.png). If the user indicates the video is from Nino, switch presenter to nino (assets/nino-1.png, nino-2.png, nino-3.png). Keep only two non-negotiables: (1) massive bold white text (max 3-4 words), (2) cinematic dark look with cyan/magenta accents. Everything else should adapt to the video's narrative with maximum creative freedom.
  • Reference Photos (strict): For each generated thumbnail, always pass the 3 presenter images together as references. They are identity anchors (not fixed poses); the model is free to choose the best posture/composition.
  • Thumbnail Engine: Reuse nano-banana-pro/scripts/generate_image.py as the single image generation engine. Keep default model behavior (Flash). Only override model explicitly when requested.
  • Thumbnail Creativity Rule: Deliver 1 safer option + 2 exploratory options. Avoid producing near-duplicates. If an unconventional concept communicates better for that specific video, prioritize it.
  • English Thumbnail Rule: If the English YouTube dubbing pack is requested, once the user chooses the final thumbnail you must create an English-edited version of that same thumbnail by editing the selected image so its main headline text is in English instead of Spanish. Keep the composition, styling, and identity intact; only adapt the main headline text.
  • Workflow: Upload a private draft before generating copy so the video URL can be used in social text.
  • Newsletter: Disabled in this flow. Do not generate or schedule newsletter here.
  • X Strategy: Do not schedule/publish to X via PostFlow in this flow. X is handled as native video upload outside this step.
  • X Variants: The default X asset is the Spanish native video with the selected Spanish thumbnail embedded as the first 500ms. If the user requested the English X variant too, also build an English native X video using the English-dubbed video plus the English-edited thumbnail.
  • Links: In social posts, the comment must not be just the link; it must include a brief descriptive text inviting to watch (e.g., "Watch the full technical analysis here: https://...").
  • Comment Sequence: For final publish/update, always use this order: set video to unlisted, insert promo comment (Domina la IA...), then set final status (private with publishAt if scheduled, otherwise private).
  • Schedule Decision Required: Never publish without an explicit decision in Programación (final): either a date YYYY-MM-DD HH:MM or private.
  • Timing: Schedule social posts 15 minutes after the YouTube publish time.
  • Audio Consistency: prepare_video.py runs audio normalization in auto mode by default. It analyzes LUFS/true peak/LRA and only re-encodes audio when out of target.

Content Styles

LinkedIn Post Style

  • Length/Format: 600–900 characters, 3–6 short paragraphs, 1–2 emojis.
  • Strategy (Signal vs Noise): Start with a principle or real engineering problem, not with "new video" framing.
  • Identity: Keep the tone of technical authority. Fewer creator-marketing phrases, more architecture conclusions and tradeoffs.
  • Scope: 1 central idea focused on technical authority. No digressions.
  • Closing: Final line “Link en el primer comentario.” followed by a short question or CTA.
  • Restrictions: No hashtags.

Scripted flow (order)

  1. Prepare video

    • Command:
      python scripts/prepare_video.py --videos /path/v1.mp4 [/path/v2.mp4 ...]
      
    • Audio behavior (default): --audio-normalization auto targets -14 LUFS, -1 dBTP, and max LRA 9; it skips normalization when already in range.
    • Optional:
      • --audio-normalization always to force normalization.
      • --audio-normalization off to skip analysis and normalization.
    • Output JSON with workdir, video, slug.
  2. Upload draft (private)

    • Command:
      python scripts/upload_draft.py --video <video> --output-video-id <workdir>/video_id.txt --client-secret <path>
      
    • Write video_id.txt and create video_url.txt.
  3. Transcribe + clean

    • Command:
      python scripts/transcribe_parakeet.py --video <video> --out-dir <workdir>
      
    • Outputs:
      • transcript.es.cleaned.srt
      • transcript.es.dub.srt (same transcript resegmented into more natural dubbing units)
    • After transcription, copy the SRT to the vault transcripts folder:
      cp <workdir>/transcript.es.cleaned.srt ~/Documents/aipal/transcripts/<YYYY-MM-DD>-<slug>.srt
      
    • Use today's date and the video slug from step 1 as the filename.
  4. Prepare English dubbing assets (when multilingual output is requested)

    • Read <workdir>/transcript.es.dub.srt when present (fallback: transcript.es.cleaned.srt) and create:
      • <workdir>/transcript.en.srt translated to natural English while preserving timestamps.
      • <workdir>/title.en.txt with 1 English YouTube title for the dubbed track.
      • <workdir>/description.en.txt with 1 English YouTube description for the dubbed track.
    • Generate dubbed English audio using the youtube-dubber project.
    • Default dubbing path:
      • scripts/dub_voxtral.py
      • model voxtral-mini-tts-latest
      • English reference clip from the presenter's own voice when available
    • Fallback path:
      • Chatterbox / Qwen only if Voxtral is unavailable or clearly worse for a specific run
    • Save at least:
      • <workdir>/dubbed_audio.en.wav
      • <workdir>/dubbed_video.en.mp4 if the dubbing pipeline also muxes the video
      • <workdir>/title.en.txt
      • <workdir>/description.en.txt
    • Keep the English title/description technically faithful to the Spanish source, not marketing-localized beyond what is needed for natural English.
    • The goal is to run Voxtral through the same timed dubbing pipeline as the other models, not a manual narration-only shortcut.
  5. Generate copy with the calling model

    • Read <workdir>/transcript.es.cleaned.srt directly and generate:
      • 3 Technical Authority Titles.
      • 3 Thumbnail ideas (Artifact-based).
      • Description (remove any self-link to current video).
      • Chapters (MM:SS).
      • LinkedIn post (per rules).
    • Save the result into <workdir>/content.md.
    • Also save thumbnail concepts into <workdir>/ideas.json with this shape:
      {
        "titles": ["...", "...", "..."],
        "thumbnails": [
          {"text": "...", "artifact": "...", "concept": "..."},
          {"text": "...", "artifact": "...", "concept": "..."},
          {"text": "...", "artifact": "...", "concept": "..."}
        ]
      }
      
    • Make sure content.md contains at least these sections so downstream validation stays compatible:
      # Pack YouTube — <slug>
      
      ## Enlace del vídeo
      <video_url>
      
      ## Títulos
      - ...
      - ...
      - ...
      
      ## Ideas de thumbnails
      1. Texto: ...
         Artifact: ...
         Concept: ...
      
      ## Descripción
      ...
      
      ## Capítulos
      00:00 ...
      
      ## LinkedIn
      ...
      
      ## Título (final)
      
      ## Descripción (final)
      
      ## Capítulos (final)
      
      ## Post LinkedIn (final)
      
      ## Thumbnail (final)
      
      ## Programación (final)
      (YYYY-MM-DD HH:MM o "private")
      
      ## Title (EN)
      
      ## Description (EN)
      
    • Title quality gate: reject title candidates that break blacklist or technical-anchor rules.
  6. Generate 3 thumbnails

    • Use presenter photos according to context: by default antonio, and nino only when explicitly requested for a Nino video. Create 3 images into <workdir>/thumb-1.png, thumb-2.png, thumb-3.png.
    • The same model executing the skill must derive the 3 image prompts from the transcript and ideas.json, then save each prompt into <workdir>/thumb-1.prompt.txt, thumb-2.prompt.txt, thumb-3.prompt.txt.
    • Keep the two anchors fixed (massive white text + cinematic cyan/magenta look), but allow concept/composition/artifact/background to vary freely by story.
    • Target mix: 1 safe option + 2 exploratory options.
    • Example with multiple inputs:
      uv run /path/to/nano-banana-pro/scripts/generate_image.py --prompt "Antonio working..." --filename "thumb-1.png" --input-image assets/antonio-1.png assets/antonio-2.png assets/antonio-3.png
      
    • If using helper scripts:
      • Batch render from an existing ideas.json: python scripts/generate_missing_thumbs.py --presenter antonio --out-dir <videos-root>
      • Nino video: python scripts/generate_missing_thumbs.py --presenter nino --out-dir <videos-root>
      • Optional override: --image-model <model> only if explicitly needed; otherwise keep the default model.
  7. Stop to ask for validation of:

    • Title (choose one of the 3 generated).
    • Thumbnail (choose one of the 3 generated).
    • Description (edit if needed).
    • Chapters (edit if needed).
    • LinkedIn post (edit if needed).
    • English title (edit if needed).
    • English description (edit if needed).
    • After the user confirms the final thumbnail and the English YouTube dubbing pack is enabled, create the English-edited thumbnail before any final X-video build.
  8. Update YouTube

    • Command:
      python scripts/update_youtube.py --video-id <id> --title "..." --description-file <desc.txt> --thumbnail <thumb.png> --publish-at "YYYY-MM-DD HH:MM" --timezone <IANA> --client-secret <path>
      
  9. Build native X video variant (after thumbnail choice)

    • Command:
      python scripts/build_x_native_video.py --video <video.mp4> --thumbnail <thumb.png> --output <workdir>/video-x.mp4 --intro-ms 500
      
    • Result: a version ready for X where the first 500ms shows the selected thumbnail as a static cover frame.
    • Always build the Spanish X variant from the original Spanish video + final Spanish thumbnail.
    • If the user requested the English X variant too, also build:
      python scripts/build_x_native_video.py --video <workdir>/dubbed_video.en.mp4 --thumbnail <english-thumb.png> --output <workdir>/video-x.en.mp4 --intro-ms 500
      
    • The English X variant must use the English-edited thumbnail, not the Spanish one.
  10. Schedule socials (PostFlow, excluding X)

  • Command:
    python scripts/schedule_socials.py --text-file <linkedin.txt> --scheduled-date <ISO8601+offset> --comment-url <video_url> --image <thumb.png>
    
  • This script publishes to configured socials except X.
  • Note: schedule_socials.py percent-encodes underscores in the --comment-url (e.g. _ -> %5F) to avoid LinkedIn URL formatting issues.
  1. Final Reminder
  • Explicitly remind the user to go to YouTube Studio to:
    • Enable monetization (not supported via API).
    • Add End Screens (not supported via API).
    • If multilingual output was requested, upload the English dubbed audio track and apply the English title/description in the YouTube Studio multi-language UI.
    • If the English X variant was requested, remember that there should now be two native X assets ready: the Spanish video-x.mp4 and the English video-x.en.mp4.