Glossary Generator
Generate a comprehensive glossary of terms from a learning graph's concept list with ISO 11179-compliant definitions.
TOKEN EFFICIENCY WARNING
This skill generates large files (2,000+ lines). Token cost matters far more than wall-clock time for teacher users on limited budgets.
Default approach: ONE serial Task agent that writes all definitions directly to a temp file. This is the most token-efficient method because:
- System prompt / tool description overhead is paid only once (~12K tokens)
- No coordination or assembly overhead
- Proven to complete a 300-term glossary in under 50K tokens
| Approach | Agent overhead | Definition generation | Assembly | Total |
|----------|---------------|----------------------|----------|-------|
| 1 serial agent writing to file | ~12K (once) | ~35K | ~700 (script) | ~48K |
| 4 parallel agents + script | ~48K (4x) | ~35K | ~700 (script) | ~84K |
| 4 parallel agents + manual Edit | ~48K (4x) | ~35K | ~200K (!!!) | ~283K |
Always use the serial approach unless the user explicitly asks for speed.
The assembly step (sorting and writing the final file) MUST always use a Python
script — NEVER manually emit glossary content through Edit/Write tool calls.
See logs/glossary-generation-very-inefficient.md for the full post-mortem.
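The totals in the comparison table follow from simple arithmetic: per-agent overhead times the number of agents, plus definition generation, plus assembly. A minimal sketch using the approximate figures quoted in this section; the function name and its parameters are illustrative and not part of the skill:

```python
def estimate_total_tokens(agents: int, assembly_tokens: int,
                          overhead_per_agent: int = 12_000,
                          generation_tokens: int = 35_000) -> int:
    """Rough total token cost for one glossary run (all inputs are estimates)."""
    return agents * overhead_per_agent + generation_tokens + assembly_tokens


serial = estimate_total_tokens(agents=1, assembly_tokens=700)
parallel_script = estimate_total_tokens(agents=4, assembly_tokens=700)
parallel_manual = estimate_total_tokens(agents=4, assembly_tokens=200_000)

print(serial, parallel_script, parallel_manual)  # 47700 83700 283000
```

The manual-assembly row is dominated by the assembly term, which is why the assembly step is delegated to a script.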
Purpose
This skill automates glossary creation for intelligent textbooks by converting concept labels from a learning graph into properly formatted glossary definitions. Each definition follows ISO 11179 metadata registry standards: precise, concise, distinct, non-circular, and free of business rules. The skill ensures consistency across terminology, validates cross-references, and produces alphabetically ordered entries with relevant examples.
After the short definition, you may add a discussion of why the term matters in the textbook and an example of how the term is used.
When to Use This Skill
Use this skill after the Learning Graph skill has completed and the concept list has been finalized. All markdown content in the /docs area can also be scanned for words or phrases that might not be clear to the average high-school student.
The glossary relies on having a complete, reviewed list of concepts from the learning graph's concept enumeration phase. Specifically, trigger this skill when:
- A concept list file exists (typically docs/learning-graph/02-concept-list-v1.md)
- The concept list has been reviewed and approved
- The course description exists with clear learning outcomes
- Ready to create or update the textbook's glossary
Workflow
Step 1: Validate Input Quality
Before generating definitions, assess the quality of the concept list:
- Read the concept list file (typically docs/learning-graph/02-concept-list-v1.md)
- Check for duplicate concept labels (target: 100% unique)
- Verify Title Case formatting (target: 95%+ compliance)
- Validate length constraints (target: 98% under 32 characters)
- Assess concept clarity (no ambiguous terms)
Calculate a quality score (1-100 scale):
- 90-100: All concepts unique, properly formatted, appropriate length
- 70-89: Most concepts meet standards, minor formatting issues
- 50-69: Some duplicate concepts or formatting inconsistencies
- Below 50: Significant issues requiring manual review
User Dialog Triggers:
- If score < 70: Ask "The concept list has quality issues. Would you like to review and clean it before generating the glossary?"
- If duplicates found: Ask "Found [N] duplicate concepts. Should I remove duplicates automatically or would you like to review?"
- If formatting issues: Ask "Found [N] concepts with formatting issues. Auto-fix?"
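The Step 1 checks can be computed mechanically before any dialog with the user. The sketch below is one possible implementation, not the skill's own: it assumes the labels have already been extracted into a list, and the Title Case rule (small connecting words may stay lowercase) is an assumption. It reports the three percentages to compare against the targets and leaves the overall 1-100 score to the rubric above.

```python
from collections import Counter

# Assumption: short connecting words may stay lowercase in Title Case labels.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def is_title_case(label: str) -> bool:
    """True if each word starts with an uppercase letter or digit
    (small words after the first are exempt)."""
    words = label.split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in SMALL_WORDS:
            continue
        if not (word[0].isupper() or word[0].isdigit()):
            return False
    return bool(words)


def concept_list_metrics(labels: list[str], max_len: int = 32) -> dict:
    """Percentages to compare against the Step 1 targets."""
    total = len(labels)
    if total == 0:
        raise ValueError("concept list is empty")
    # Duplicates are detected case-insensitively.
    counts = Counter(label.strip().lower() for label in labels)
    return {
        "unique_pct": 100 * len(counts) / total,
        "title_case_pct": 100 * sum(is_title_case(l) for l in labels) / total,
        "length_ok_pct": 100 * sum(len(l) < max_len for l in labels) / total,
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
    }


metrics = concept_list_metrics(
    ["Learning Graph", "Directed Acyclic Graph", "learning graph", "Concept dependency"]
)
print(metrics)
```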
Step 2: Read Course Context
Read the course description file (docs/course-description.md) and any other markdown files in /docs/**/*.md to understand:
- Target audience (for appropriate example complexity)
- Course objectives (for terminology alignment)
- Prerequisites (for background knowledge assumptions)
- Learning outcomes (for context on concept usage)
Step 3: Generate Definitions Using a Single Serial Agent
Default approach (most token-efficient): Launch ONE Task agent that generates all definitions and writes them directly to a single temp file.
Task agent prompt:
"Generate ISO 11179-compliant glossary definitions for the following [N] terms.
Write ALL entries as markdown (#### headers with definitions, examples, and
discussion) to the file /tmp/glossary-raw.md using the Write tool.
Each entry uses #### for the term header. Do not return the content in your
response — just confirm the file was written and report the term count.
[Paste the full term list here]
[Paste course description context here for audience level]"
The single agent writes all definitions to one file. This pays system-prompt overhead only once (~12K tokens) and avoids all coordination costs. Proven at under 50K tokens for a 380-term glossary.
Optional parallel approach (only if user explicitly requests speed): Split terms
into 4 alphabetical ranges and launch 4 parallel Task agents, each writing to
/tmp/glossary-part-{1,2,3,4}.md. This costs ~84K tokens (nearly 2x serial) due
to repeated agent overhead but completes faster in wall-clock time.
For each concept in the list, create a definition that follows ISO 11179 standards:
Precision (25 points): Accurately capture the concept's meaning
- Define the concept specifically in the context of the course
- Use terminology appropriate for the target audience
- Ensure the definition matches how the concept is used in the course
Conciseness (25 points): Keep definitions brief (target: 20-50 words)
- Avoid unnecessary words or explanations
- Get to the core meaning quickly
- Use clear, direct language
Distinctiveness (25 points): Make each definition unique and distinguishable
- Avoid copying definitions from other sources
- Ensure no two definitions are too similar
- Highlight what makes this concept different from related concepts
Non-circularity (25 points): Avoid circular dependencies
- Do not reference undefined terms in definitions
- Do not create circular chains (A depends on B, B depends on A)
- Use simpler, more fundamental terms in definitions
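The two-term case of the non-circularity rule (A depends on B, B depends on A) can be checked automatically once definitions exist. The sketch below is a hypothetical helper, not part of the skill: it treats a definition as depending on another term whenever that term's label appears in it as a whole phrase, and it only finds mutual pairs. Longer chains would need a graph traversal.

```python
import re


def mentioned_terms(term: str, definition: str, all_terms: list[str]) -> set[str]:
    """Other glossary terms whose label appears as a whole phrase in this definition."""
    found = set()
    for other in all_terms:
        if other == term:
            continue
        pattern = r"\b" + re.escape(other) + r"\b"
        if re.search(pattern, definition, flags=re.IGNORECASE):
            found.add(other)
    return found


def find_mutual_references(definitions: dict[str, str]) -> list[tuple[str, str]]:
    """Pairs of terms whose definitions mention each other."""
    terms = list(definitions)
    mentions = {t: mentioned_terms(t, definitions[t], terms) for t in terms}
    pairs = []
    for a in terms:
        for b in mentions[a]:
            # Report each pair once, in alphabetical order.
            if a < b and a in mentions[b]:
                pairs.append((a, b))
    return sorted(pairs)


sample_definitions = {
    "Prerequisite": "A concept that must be learned before a dependent concept.",
    "Dependent Concept": "A concept that has at least one prerequisite.",
    "Learning Graph": "A directed graph of concepts ordered by prerequisite relationships.",
}
circular_pairs = find_mutual_references(sample_definitions)
print(circular_pairs)  # [('Dependent Concept', 'Prerequisite')]
```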
Example Format:
For a concept "Learning Graph":
#### Learning Graph
A directed graph of concepts that reflects the order that concepts should be learned to master a new concept.
Learning graphs are the foundational data structure used for intelligent textbooks. They are used to guide
intelligent agents and recommend learning paths for students.
**Example:** In a programming course, the learning graph shows that "Variables" must be understood before "Functions," which must be understood before "Recursion."
Step 4: Add Examples (60-80% of terms)
For most concepts (target: 60-80%), include a relevant example:
- Start with "**Example:**" in bold (no newline after the colon)
- Provide a concrete illustration from the course domain
- Keep examples brief (1-2 sentences)
- Ensure examples clarify the concept without adding confusion
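Example coverage against the 60-80% target can be measured from the generated markdown instead of estimated. A minimal sketch, assuming entries start with #### headers and examples use the bold "**Example:**" label shown in the example format; the function name is illustrative:

```python
import re


def example_coverage(glossary_md: str) -> float:
    """Percentage of #### entries that contain a bold 'Example:' label."""
    blocks = re.split(r"\n(?=#### )", glossary_md)
    entries = [b for b in blocks if b.lstrip().startswith("#### ")]
    if not entries:
        return 0.0
    with_example = sum("**Example:**" in entry for entry in entries)
    return 100 * with_example / len(entries)


sample_md = (
    "# Glossary of Terms\n\n"
    "#### Learning Graph\n\n"
    "A directed graph of concepts.\n\n"
    "**Example:** Variables come before Functions.\n\n"
    "#### Recursion\n\n"
    "A technique in which a function calls itself.\n"
)
coverage = example_coverage(sample_md)
print(coverage)  # 50.0
```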
Step 5: Add Cross-References
Where appropriate, reference related terms:
- Use "See also:" for related concepts
- Use "Contrast with:" for opposing concepts
- Ensure all cross-referenced terms exist in the glossary
- Keep cross-references to 1-3 per term
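The requirement that every cross-referenced term exists can be verified with a short script. The sketch below assumes cross-references are written on their own line as "See also: Term A, Term B" or "Contrast with: Term C", optionally bold; that comma-separated layout is an assumption, since the skill does not fix one.

```python
import re

# Matches "See also: ..." or "Contrast with: ...", with optional bold markers.
XREF_LINE = re.compile(r"^\**(See also|Contrast with):\**\s*(.+)$", re.MULTILINE)


def broken_cross_references(entries: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to the cross-referenced labels that are not glossary terms."""
    known = {term.lower() for term in entries}
    broken = {}
    for term, body in entries.items():
        targets = []
        for _, raw in XREF_LINE.findall(body):
            targets += [t.strip().rstrip(".") for t in raw.split(",") if t.strip()]
        missing = [t for t in targets if t.lower() not in known]
        if missing:
            broken[term] = missing
    return broken


sample_entries = {
    "Learning Graph": "A directed graph of concepts.\n\n"
                      "See also: Concept Dependency, Directed Acyclic Graph",
    "Concept Dependency": "A prerequisite relationship between two concepts.\n\n"
                          "**Contrast with:** Linear Curriculum",
    "Directed Acyclic Graph": "A directed graph with no cycles.",
}
broken = broken_cross_references(sample_entries)
print(broken)  # {'Concept Dependency': ['Linear Curriculum']}
```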
Step 6: Assemble Glossary File Using a Python Script
CRITICAL: NEVER manually assemble the glossary through Edit/Write tool calls. Alphabetical sorting and file merging are trivial programming tasks. Doing them manually through LLM text generation wastes 100,000+ tokens that cost real money.
MANDATORY APPROACH: Write and execute a Python script via the Bash tool that:
- Reads the agent output file(s): /tmp/glossary-raw.md (serial) or /tmp/glossary-part-*.md (parallel)
- Parses entries by splitting on #### headers
- Sorts entries alphabetically (case-insensitive) using sorted()
- Writes the final docs/glossary.md in one pass
Reference script (adapt paths as needed):
#!/usr/bin/env python3
"""Merge glossary parts into a single sorted glossary."""
import glob
import os
import re

entries = {}

# Support both serial (single file) and parallel (multiple files)
if os.path.exists('/tmp/glossary-raw.md'):
    sources = ['/tmp/glossary-raw.md']
else:
    sources = sorted(glob.glob('/tmp/glossary-part-*.md'))

for path in sources:
    with open(path, encoding='utf-8') as f:
        content = f.read()
    # Split at each line that begins a new "#### Term" entry
    for block in re.split(r'\n(?=#### )', content):
        block = block.strip()
        m = re.match(r'#### (.+)', block)
        if m:
            # A repeated term keeps only the last block read
            entries[m.group(1).strip()] = block

# Case-insensitive sort that ignores leading digits and hyphens
sorted_terms = sorted(entries.keys(), key=lambda t: t.lower().lstrip('0123456789-'))

with open('docs/glossary.md', 'w', encoding='utf-8') as out:
    out.write('# Glossary of Terms\n\n')
    for term in sorted_terms:
        out.write(entries[term] + '\n\n')

print(f"Wrote {len(sorted_terms)} terms to docs/glossary.md")
Save the script as /tmp/assemble_glossary.py and run it with python3 /tmp/assemble_glossary.py via the Bash tool.
Total cost: ~500 tokens for the script + ~200 tokens for output = ~700 tokens
(versus 200,000+ tokens if done manually through Edit calls).
NEVER DO ANY OF THE FOLLOWING:
- Write glossary entries directly through the Write or Edit tool
- Copy-paste subagent output into Edit tool old_string/new_string parameters
- Manually sort terms by emitting them in alphabetical order
- Append sections to the glossary file one at a time through Edit calls
Formatting rules for the assembled file:
- Do not put any --- strings in the glossary. They are not needed.
- Sort all terms alphabetically (case-insensitive); the script handles this
- Use level-4 headers (####) for term names
- Place definition in body text (no special formatting)
- Use "**Example:**" for examples (bold, with colon)
- Maintain consistent spacing between entries (one blank line between entries)
Step 7: Generate Quality Report
Create docs/learning-graph/glossary-quality-report.md with:
ISO 11179 Metadata Registry Compliance Metrics:
For each definition, score the first four criteria at 25 points each (100 total) and record the fifth as a pass/fail check:
- Precision: Does it accurately capture the meaning?
- Conciseness: Is it brief (20-50 words)?
- Distinctiveness: Is it unique and distinguishable?
- Non-circularity: No circular dependencies?
- Unencumbered by business rules: Free of specific policies or rules?
Overall Quality Metrics:
- Average definition length: [X] words
- Definitions meeting all 4 criteria: [X]%
- Circular definitions found: [X]
- Example coverage: [X]%
- Cross-references: [X] total, [X] broken
Readability:
- Flesch-Kincaid grade level: [X]
- Appropriate for target audience: Yes/No
Recommendations:
- List any definitions scoring < 70/100
- Identify circular dependencies to fix
- Suggest concepts needing examples
- Note any broken cross-references
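The Flesch-Kincaid grade level in the report can be approximated without external libraries. The sketch below uses the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, with a deliberately crude syllable counter (one syllable per run of vowels). That heuristic is an assumption and will misjudge some words, so treat the result as a rough indicator.

```python
import re


def count_syllables(word: str) -> int:
    """Very rough heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid grade level of a block of text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


simple_grade = flesch_kincaid_grade("The cat sat. The dog ran.")
dense_grade = flesch_kincaid_grade(
    "Educational technology facilitates individualized instruction."
)
print(round(simple_grade, 2), round(dense_grade, 2))
```

Running it over all definitions joined together gives a single figure to compare with the target audience identified in Step 2.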
Step 8: Validate Output
Perform final validation:
- Verify alphabetical ordering (100% compliance required)
- Check all cross-references point to existing terms
- Ensure all concepts from input list are included
- Validate markdown syntax renders correctly
- Confirm no circular definitions exist
Success Criteria:
- Overall quality score > 85/100
- Zero circular definitions
- 100% alphabetical ordering
- All terms from concept list included
- Markdown renders correctly in mkdocs
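Two of these checks, alphabetical ordering and completeness against the concept list, are easy to script. The sketch below is illustrative only: it assumes #### headers mark terms and reuses the sort key from the reference assembly script (lowercase, leading digits and hyphens ignored) so that both agree on what "alphabetical" means.

```python
import re


def sort_key(term: str) -> str:
    """Same ordering rule as the reference assembly script."""
    return term.lower().lstrip("0123456789-")


def validate_glossary(glossary_md: str, concepts: list[str]) -> dict:
    """Check ordering and completeness of an assembled glossary."""
    terms = [h.strip() for h in re.findall(r"^#### (.+)$", glossary_md, flags=re.MULTILINE)]
    keys = [sort_key(t) for t in terms]
    present = {t.lower() for t in terms}
    return {
        "term_count": len(terms),
        "alphabetical": keys == sorted(keys),
        "missing": [c for c in concepts if c.strip().lower() not in present],
    }


sample_glossary = (
    "# Glossary of Terms\n\n"
    "#### Concept Dependency\n\nA prerequisite relationship between two concepts.\n\n"
    "#### Learning Graph\n\nA directed graph of concepts.\n"
)
result = validate_glossary(
    sample_glossary, ["Learning Graph", "Concept Dependency", "Recursion"]
)
print(result)
# {'term_count': 2, 'alphabetical': True, 'missing': ['Recursion']}
```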
Step 9: Update Navigation (Optional)
If mkdocs.yml does not already include the glossary:
- Read mkdocs.yml
- Check if "Glossary: glossary.md" exists in nav section
- If missing, add it in an appropriate location
- Preserve existing navigation structure
Step 10: Generate Cross-Reference Index (Optional)
Create docs/learning-graph/glossary-cross-ref.json for semantic search:
{
  "terms": [
    {
      "term": "Learning Graph",
      "related_terms": ["Concept Dependency", "Directed Acyclic Graph"],
      "contrasts_with": ["Linear Curriculum"],
      "category": "Educational Technology"
    }
  ]
}
This JSON file enables future features like:
- Semantic search across glossary
- Concept relationship visualization
- Automated suggestion of related terms
Quality Scoring Reference
Use this rubric to score each definition (1-100 scale):
85-100: Excellent
- Meets all 4 ISO 11179 criteria (20+ pts each)
- Appropriate length (20-50 words)
- Includes relevant example
- Clear, unambiguous language
- No circular dependencies
70-84: Good
- Meets 3-4 ISO criteria
- Acceptable length (15-60 words)
- May lack example
- Generally clear
- No serious issues
55-69: Adequate
- Meets 2-3 ISO criteria
- Length issues (too short or too long)
- Missing example where helpful
- Some ambiguity
- Minor circular references
Below 55: Needs Revision
- Fails multiple ISO criteria
- Serious length issues
- Confusing or circular
- Missing context
- Requires complete rewrite
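The rubric bands translate directly into a lookup. A minimal sketch, assuming the per-definition score is the sum of the four 25-point criteria from Step 3; the function names are illustrative:

```python
def total_score(precision: int, conciseness: int,
                distinctiveness: int, non_circularity: int) -> int:
    """Sum of the four criteria, each scored from 0 to 25."""
    parts = (precision, conciseness, distinctiveness, non_circularity)
    if any(not 0 <= p <= 25 for p in parts):
        raise ValueError("each criterion is scored from 0 to 25")
    return sum(parts)


def quality_band(score: int) -> str:
    """Map a 1-100 definition score to its rubric band."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 55:
        return "Adequate"
    return "Needs Revision"


score = total_score(24, 22, 21, 25)
print(score, quality_band(score))  # 92 Excellent
```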
Common Pitfalls to Avoid
Circular Definitions:
- Bad: "A Learning Graph is a graph that shows learning."
- Good: "A directed graph of concepts that reflects the order concepts should be learned."
Too Vague:
- Bad: "A thing used in education."
- Good: "A directed graph of concepts that reflects prerequisite relationships."
Too Long:
- Bad: "A learning graph is a specialized type of directed acyclic graph structure commonly used in educational technology and instructional design contexts to represent the hierarchical and sequential relationships between different conceptual elements that students need to master in order to achieve specific learning outcomes."
- Good: "A directed graph of concepts that reflects the order concepts should be learned to master a new concept."
Business Rules:
- Bad: "Students must complete prerequisites before advancing to dependent concepts."
- Good: "A directed graph showing prerequisite relationships between concepts."
Undefined Terms:
- Bad: "Uses a DAG structure" (if DAG not in glossary)
- Good: "Uses a directed acyclic graph structure"
Output Files Summary
Required:
- docs/glossary.md - Complete glossary in alphabetical order with ISO 11179-compliant definitions
Recommended:
- docs/learning-graph/glossary-quality-report.md - Quality assessment and recommendations
Optional:
- docs/learning-graph/glossary-cross-ref.json - JSON mapping for semantic search
- Updates to mkdocs.yml navigation if glossary link missing
Example Session
User: "Generate a glossary from my concept list"
Claude (using this skill):
- Reads concept list file and docs/course-description.md (~5K tokens)
- Validates quality (checks for duplicates, formatting) (~1K tokens)
- Launches ONE serial Task agent that writes all definitions to /tmp/glossary-raw.md (~35K tokens)
- Writes a Python assembly script to /tmp/assemble_glossary.py (~500 tokens)
- Runs the script via Bash; it parses, sorts, and writes docs/glossary.md (~200 tokens)
- Verifies term count with grep -c "^####" docs/glossary.md (~100 tokens)
- Updates mkdocs.yml navigation if needed (~500 tokens)
- Reports: "Created glossary with 187 terms. Overall quality score: 89/100. Added examples to 71% of terms. No circular definitions found."
Total token budget: ~48K tokens (NOT 250K+)
REMEMBER: The subagents generate text (unavoidable LLM work). The assembly is
a programming task — use sorted(), not the Edit tool.