Tapestry: Unified Content Extraction + Action Planning Skill

Tapestry: Unified Content Extraction + Action Planning

Master skill that orchestrates the entire Tapestry workflow:

Detect content type from URL
Extract content using appropriate method
Create a Ship-Learn-Next action plan automatically

Prerequisites

This skill requires UV for dependency management. Run from the tapestry-skills project root.

Workflow Overview

URL → Validate → Detect Type → Extract Content → Create Plan → Save Files

Output: Two files saved:

Content file: [Title].txt
Plan file: Ship-Learn-Next Plan - [Quest].md

Security Requirements

CRITICAL: Before processing ANY URL, validate it first using the tapestry security utilities.

All security utilities are available via UV from the project root.

URL Validation (Required)

URL="$1"

# Run security validation (checks protocol, blocks SSRF, etc.)
uv run tapestry-validate-url "$URL" || exit 1

Filename Sanitization (Required)

# Use tapestry sanitization utility for all titles
SAFE_TITLE=$(uv run tapestry-sanitize-filename "$TITLE")

Step 1: Detect Content Type

detect_content_type() {
    local URL="$1"

    # YouTube patterns
    if [[ "$URL" =~ youtube\.com/watch || "$URL" =~ youtu\.be/ || "$URL" =~ youtube\.com/shorts ]]; then
        echo "youtube"
        return
    fi

    # PDF by extension
    if [[ "$URL" =~ \.pdf($|\?) ]]; then
        echo "pdf"
        return
    fi

    # PDF by Content-Type header
    if curl -sI --max-time 10 "$URL" | grep -iq "Content-Type:.*application/pdf"; then
        echo "pdf"
        return
    fi

    # Default to article
    echo "article"
}

CONTENT_TYPE=$(detect_content_type "$URL")
echo "Detected: $CONTENT_TYPE"

Step 2: Extract Content

YouTube Extraction

Use the youtube-transcript skill workflow:

# yt-dlp is available through UV
VIDEO_TITLE=$(uv run yt-dlp --print "%(title)s" "$URL" 2>/dev/null)
SAFE_TITLE=$(uv run tapestry-sanitize-filename "$VIDEO_TITLE")

# Create temp file
TEMP_DIR=$(mktemp -d)
trap "rm -rf '$TEMP_DIR'" EXIT

# Download transcript (try manual first, then auto-generated)
if ! uv run yt-dlp --write-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL" 2>/dev/null; then
    uv run yt-dlp --write-auto-sub --skip-download --sub-langs en -o "$TEMP_DIR/transcript" "$URL"
fi

# Find and convert VTT to clean text
VTT_FILE=$(find "$TEMP_DIR" -name "*.vtt" | head -n 1)
uv run tapestry-vtt-to-text "$VTT_FILE" --output "${SAFE_TITLE}.txt"

CONTENT_FILE="${SAFE_TITLE}.txt"

Article Extraction

Use the article-extractor skill workflow:

# Check for extraction tools
if command -v reader &> /dev/null; then
    TOOL="reader"
else
    TOOL="trafilatura"
fi

TEMP_FILE=$(mktemp)
trap "rm -f '$TEMP_FILE'" EXIT

case $TOOL in
    reader)
        reader "$URL" > "$TEMP_FILE"
        TITLE=$(head -n 1 "$TEMP_FILE" | sed 's/^# //')
        ;;
    trafilatura)
        uv run trafilatura --URL "$URL" --output-format txt --no-comments > "$TEMP_FILE"
        TITLE=$(uv run trafilatura --URL "$URL" --json 2>/dev/null | \
            python3 -c "import json,sys; print(json.load(sys.stdin).get('title','Article'))" 2>/dev/null || echo "Article")
        ;;
esac

# Fallback if extraction failed
if [ ! -s "$TEMP_FILE" ]; then
    uv run tapestry-extract-html "$URL" --output "$TEMP_FILE"
    TITLE=$(head -n 1 "$TEMP_FILE" | sed 's/^# //')
fi

SAFE_TITLE=$(uv run tapestry-sanitize-filename "$TITLE")
CONTENT_FILE="${SAFE_TITLE}.txt"
mv "$TEMP_FILE" "$CONTENT_FILE"
trap - EXIT

PDF Extraction

# Sanitize filename from URL
URL_BASENAME=$(basename "$URL" | cut -d'?' -f1)
SAFE_PDF=$(uv run tapestry-sanitize-filename "$URL_BASENAME")

# Ensure .pdf extension
[[ "$SAFE_PDF" != *.pdf ]] && SAFE_PDF="${SAFE_PDF}.pdf"

# Download with security checks
uv run tapestry-safe-download "$URL" "$SAFE_PDF" --max-size 104857600

# Verify it's actually a PDF
if ! head -c 4 "$SAFE_PDF" | grep -q '%PDF'; then
    echo "Error: Downloaded file is not a valid PDF"
    rm -f "$SAFE_PDF"
    exit 1
fi

# Extract text if pdftotext available
if command -v pdftotext &> /dev/null; then
    CONTENT_FILE="${SAFE_PDF%.pdf}.txt"
    pdftotext "$SAFE_PDF" "$CONTENT_FILE"
    echo "Extracted text to: $CONTENT_FILE"
else
    echo "Note: pdftotext not found. Install with: brew install poppler"
    CONTENT_FILE="$SAFE_PDF"
fi

Step 3: Create Action Plan

After extracting content, invoke the ship-learn-next skill logic:

Read the extracted content file
Extract 3-5 core actionable lessons
Define a specific 4-8 week quest
Design Rep 1 (shippable this week)
Outline Reps 2-5 (progressive iterations)
Save as: Ship-Learn-Next Plan - [Quest Title].md

Key points:

Focus on actionable lessons, not summaries
Rep 1 must be completable in 1-7 days
Each rep produces real artifacts
Emphasize doing over studying

Step 4: Present Results

Tapestry Workflow Complete!

Content Extracted:
  Type: [youtube/article/pdf]
  Title: [Title]
  Saved to: [filename.txt]
  Words: [X]

Action Plan Created:
  Quest: [Quest title]
  Saved to: Ship-Learn-Next Plan - [Title].md

Rep 1 (This Week): [Rep 1 goal]

When will you ship Rep 1?

Error Handling

| Issue | Action | |-------|--------| | UV not installed | Install with curl -LsSf https://astral.sh/uv/install.sh \| sh | | Invalid URL | Reject with clear message | | No subtitles (YouTube) | Offer Whisper transcription (with consent) | | Paywall/login required | Inform user, cannot extract | | Download failed | Check URL, retry, inform user | | Empty extraction | Verify before planning, don't create empty plan |

Dependencies

All dependencies are managed via UV and pyproject.toml:

yt-dlp: YouTube downloads (pinned version)
trafilatura: Article extraction (pinned version)
openai-whisper (optional): For videos without subtitles

System tools (install separately if needed):

pdftotext: PDF text extraction (brew install poppler)
reader: Mozilla Readability (npm install -g reader-cli)

Security Reference

For detailed security guidelines, see: ../shared/references/security-guidelines.md

Key requirements:

Validate all URLs before processing
Sanitize all filenames
Use temp files with cleanup traps
Set download size limits
Quote all variables in shell commands

Agent Skills: Tapestry: Unified Content Extraction + Action Planning

Install this agent skill to your local

Skill Files