Style Analyzer
Overview
Extract a structured, quantitative writing style profile from text samples. The profile captures vocabulary patterns, sentence structure, tone markers, and formatting preferences as measurable metrics that can constrain downstream content generation.
Core principle: Style is measurable. Every writer produces a statistical fingerprint across vocabulary, syntax, and formatting dimensions.
When to Use
- Before generating content that should match a specific author's voice
- When building a reusable style profile for brand voice consistency
- When comparing writing styles across multiple authors or documents
- When auditing content for style drift from an established voice
Workflow
Step 1: Ingest Text Samples
Accept text from one or more sources:
# Analyze a single file
node .claude/tools/cli/style-profiler.cjs /path/to/sample.txt
# Analyze a directory of text files
node .claude/tools/cli/style-profiler.cjs /path/to/samples/
# Output is written to .claude/context/data/user-style-profile.json
Alternatively, for smaller samples, read the files with the Read tool and analyze them inline.
Minimum input requirements:
- At least 3 text samples for reliable metrics
- Each sample should be 200+ words
- Samples should represent the author's typical output (not edge cases)
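A minimal check for these requirements can be sketched as follows (the function names are illustrative and not part of the style-profiler CLI):

```javascript
// Sketch: validate samples before profiling. Names are illustrative.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

function validateSamples(samples, minSamples = 3, minWords = 200) {
  if (samples.length < minSamples) {
    return { ok: false, reason: `need at least ${minSamples} samples, got ${samples.length}` };
  }
  const short = samples.filter((s) => countWords(s) < minWords);
  if (short.length > 0) {
    return { ok: false, reason: `${short.length} sample(s) under ${minWords} words` };
  }
  return { ok: true };
}
```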
Step 2: Extract Vocabulary Patterns
Analyze word usage across all samples:
- Tokenize all text into words (lowercase, strip punctuation)
- Remove stop words (the, a, is, are, etc.)
- Compute frequency distribution of remaining words
- Extract top 50 vocabulary (most frequently used content words)
- Calculate the type-token ratio (unique words / total words) as a vocabulary richness score
- Identify signature phrases (2-3 word combinations that appear 3+ times)
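The vocabulary pass above can be sketched in Node.js roughly like this (the stop-word list is a tiny stand-in for a real one, and the function names are illustrative):

```javascript
// Tiny stand-in stop-word list; a real analyzer would use a full one.
const STOP_WORDS = new Set(["the", "a", "an", "is", "are", "of", "to", "and", "in"]);

function tokenize(text) {
  // Lowercase and keep only word characters (apostrophes preserved for contractions).
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

function vocabularyProfile(texts, topN = 50) {
  const tokens = texts.flatMap(tokenize);
  const content = tokens.filter((w) => !STOP_WORDS.has(w));
  const freq = new Map();
  for (const w of content) freq.set(w, (freq.get(w) || 0) + 1);
  const topWords = [...freq.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([w]) => w);
  // Type-token ratio is computed over ALL tokens, before stop-word removal.
  const typeTokenRatio = tokens.length ? new Set(tokens).size / tokens.length : 0;
  return { topWords, typeTokenRatio };
}

function signaturePhrases(texts, minCount = 3) {
  // Count 2-word combinations; extendable to trigrams the same way.
  const counts = new Map();
  for (const text of texts) {
    const words = tokenize(text);
    for (let i = 0; i < words.length - 1; i++) {
      const bigram = `${words[i]} ${words[i + 1]}`;
      counts.set(bigram, (counts.get(bigram) || 0) + 1);
    }
  }
  return [...counts.entries()].filter(([, c]) => c >= minCount).map(([p]) => p);
}
```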
Step 3: Analyze Sentence Structure
Measure syntactic patterns:
- Average sentence length (words per sentence)
- Sentence length variance (standard deviation)
- Short sentence ratio (sentences under 8 words / total sentences)
- Long sentence ratio (sentences over 25 words / total sentences)
- Question frequency (questions / total sentences)
- Clause complexity (average commas per sentence as a proxy)
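These metrics can be sketched as below, assuming a deliberately naive sentence splitter (a real analyzer would need to handle abbreviations and decimals):

```javascript
// Naive splitter: treats every run of . ! ? as a sentence boundary.
function splitSentences(text) {
  return text.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean);
}

function sentenceStructure(text) {
  const sentences = splitSentences(text);
  const lengths = sentences.map((s) => s.split(/\s+/).length);
  const n = lengths.length;
  const avgLength = lengths.reduce((a, b) => a + b, 0) / n;
  const variance = lengths.reduce((a, l) => a + (l - avgLength) ** 2, 0) / n;
  const questions = (text.match(/\?/g) || []).length;
  const commas = (text.match(/,/g) || []).length;
  return {
    avgLength,
    lengthStdDev: Math.sqrt(variance),
    shortSentenceRatio: lengths.filter((l) => l < 8).length / n,
    longSentenceRatio: lengths.filter((l) => l > 25).length / n,
    questionFrequency: questions / n,
    avgCommasPerSentence: commas / n,
  };
}
```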
Step 4: Identify Tone Markers
Score the writing on five dimensions (each 1.0 to 5.0):
| Dimension  | 1.0 (Low)   | 5.0 (High)    | How Measured                                    |
| ---------- | ----------- | ------------- | ----------------------------------------------- |
| Formality  | Very formal | Very casual   | Contraction frequency, slang usage, punctuation |
| Directness | Hedged      | Blunt         | Hedge word frequency (maybe, perhaps, somewhat) |
| Emotion    | Neutral     | Expressive    | Exclamation marks, emotional adjectives         |
| Humor      | Serious     | Playful       | Parenthetical asides, informal interjections    |
| Authority  | Tentative   | Authoritative | Imperative sentences, certainty language        |
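As an example of scoring one tone dimension, the formality score can be estimated from contraction frequency; the linear mapping and the 5% saturation threshold below are illustrative assumptions, not calibrated values:

```javascript
// Naive contraction detector: matches common contraction endings.
// Note: /'s$/ also matches possessives, so this overcounts slightly.
function contractionRate(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const contractions = words.filter((w) => /n't$|'re$|'ll$|'ve$|'s$|'d$|'m$/.test(w));
  return words.length ? contractions.length / words.length : 0;
}

function formalityScore(text) {
  // Map a 0%..5%+ contraction rate linearly onto 1.0 (very formal)
  // through 5.0 (very casual), clamping at the top of the scale.
  const rate = contractionRate(text);
  return Math.min(5, 1 + (rate / 0.05) * 4);
}
```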
Step 5: Detect Formatting Preferences
Analyze structural patterns:
- Average paragraph length (sentences per paragraph)
- Heading depth (H1 only, H1-H2, H1-H3, etc.)
- List frequency (bulleted/numbered lists per 1000 words)
- Code block frequency (code blocks per 1000 words, if technical)
- Bold/italic usage (emphasis markers per 1000 words)
- Punctuation signature (em-dash frequency, semicolon frequency, ellipsis frequency)
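These counts can be sketched with simple regex passes over markdown text (the patterns below are approximations and will miss edge cases such as indented fences or nested emphasis):

```javascript
function per1000(count, totalWords) {
  return totalWords ? (count / totalWords) * 1000 : 0;
}

function formattingProfile(markdown) {
  const totalWords = (markdown.match(/\S+/g) || []).length;
  // Bulleted (-, *, +) or numbered list items at the start of a line.
  const listItems = (markdown.match(/^\s*(?:[-*+]|\d+\.)\s/gm) || []).length;
  // Each fenced code block contributes an opening and a closing fence.
  const fences = (markdown.match(/^```/gm) || []).length;
  const codeBlocks = fences / 2;
  // Bold (**...**) or italic (*...*) spans.
  const emphasis = (markdown.match(/\*\*[^*]+\*\*|\*[^*\n]+\*/g) || []).length;
  return {
    listFrequencyPer1000: per1000(listItems, totalWords),
    codeBlockFrequencyPer1000: per1000(codeBlocks, totalWords),
    emphasisFrequencyPer1000: per1000(emphasis, totalWords),
  };
}
```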
Step 6: Build Style Profile JSON
Assemble all metrics into a structured profile:
{
"version": "1.0.0",
"createdAt": "2026-03-21T00:00:00Z",
"sampleCount": 5,
"totalWords": 12450,
"vocabulary": {
"topWords": ["specific", "pattern", "implementation", "..."],
"typeTokenRatio": 0.42,
"signaturePhrases": ["in practice", "the key insight"]
},
"sentenceStructure": {
"avgLength": 18.3,
"lengthVariance": 7.2,
"shortSentenceRatio": 0.15,
"longSentenceRatio": 0.22,
"questionFrequency": 0.08,
"avgCommasPerSentence": 1.4
},
"tone": {
"formality": 2.8,
"directness": 4.1,
"emotion": 2.0,
"humor": 1.5,
"authority": 3.8
},
"formatting": {
"avgParagraphLength": 3.2,
"headingDepth": 3,
"listFrequencyPer1000": 2.1,
"codeBlockFrequencyPer1000": 0.8,
"emphasisFrequencyPer1000": 4.5,
"punctuation": {
"emDashFrequency": 0.03,
"semicolonFrequency": 0.01,
"ellipsisFrequency": 0.005,
"exclamationFrequency": 0.02
}
}
}
Save to .claude/context/data/user-style-profile.json.
Iron Laws
- ALWAYS require a minimum of 3 text samples before computing a style profile -- fewer samples produce unreliable metrics that do not generalize.
- NEVER include stop words in the top-50 vocabulary list -- stop words are universal and carry no style signal.
- ALWAYS save the profile to .claude/context/data/user-style-profile.json -- downstream skills depend on this exact path.
- NEVER treat a single metric in isolation as the style fingerprint -- style is the combination of all dimensions; individual metrics can coincide across very different authors.
- ALWAYS include the sampleCount and totalWords fields in the profile -- consumers need to assess the statistical reliability of the metrics.
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
| ------------------------------------------ | ------------------------------------------------------ | -------------------------------------------------------------- |
| Analyzing a single short sample | Insufficient data; metrics reflect one mood, not style | Require 3+ samples of 200+ words each |
| Including stop words in vocabulary metrics | Universal words add noise; no discriminative power | Filter all stop words before frequency analysis |
| Outputting profile to a non-standard path | Downstream skills cannot find the profile | Always write to .claude/context/data/user-style-profile.json |
| Treating tone as a single number | Tone is multidimensional; a single score loses nuance | Score all five dimensions independently |
| Skipping formatting analysis | Style includes structure, not just words | Always analyze paragraph length, heading depth, list usage |
Assigned Agents
This skill is used by:
- voice-replicator-agent -- Primary consumer for style-constrained content generation
Memory Protocol (MANDATORY)
Before starting:
node .claude/lib/memory/memory-search.cjs "style analysis text profiling"
Read .claude/context/memory/learnings.md
After completing:
- New style pattern discovered -> .claude/context/memory/learnings.md
- Issue with analysis -> .claude/context/memory/issues.md
- Decision about metrics -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.