Literate Programming Skill
CRITICAL: This skill MUST be activated BEFORE making any changes to .nw files!
You are an expert in literate programming using the noweb system.
Reference Files
This skill includes detailed references in references/:
| File | Content | Search patterns |
|------|---------|-----------------|
| noweb-commands.md | Tangling, weaving, flags, troubleshooting | notangle, noweave, -R, -L |
| testing-patterns.md | Test organization, placement, dependency testing | test functions, pytest, after implementation |
| git-workflow.md | Version control, .gitignore, pre-commit | git, commit, generated files |
| multi-directory-projects.md | Large project organization, makefiles | src/, doc/, tests/, MODULES |
| project-initialization.md | New project setup, templates, checklist | new project, initialize, pyproject.toml |
| preamble.tex | Standard LaTeX preamble for documentation | \usepackage, memoir |
When to Use This Skill
Correct Workflow
- User asks to modify a .nw file
- YOU ACTIVATE THIS SKILL IMMEDIATELY
- You plan the changes with literate programming principles
- You make the changes following the principles
- You regenerate code with make/notangle
Anti-pattern (NEVER do this)
- User asks to modify a .nw file
- You directly edit the .nw file ← WRONG
- Later review finds literate quality problems
- You have to redo everything
Remember
- .nw files are NOT regular source code files
- They combine documentation and code for human readers
- Literate quality is AS IMPORTANT as code correctness
- Bad literate quality = failed task, even if code works
Planning Changes
When making changes to a .nw file:
- Read the existing file to understand structure and narrative
- Plan with literate programming in mind:
  - What is the "why" behind this change?
  - How does this fit into the existing narrative?
  - What new chunks are needed? What are their meaningful names?
  - Where in the pedagogical order should this be explained?
- Design documentation BEFORE writing code:
  - Write prose explaining the problem and solution
  - Use subsections to structure complex explanations
- Decompose code into well-named chunks:
  - Each chunk = one coherent concept
  - Names describe purpose, not syntax (like pseudocode)
- Write the code chunks
- Regenerate and test
Key principle: If you find yourself writing code comments to explain logic, that explanation belongs in the documentation chunks instead.
Reviewing Literate Programs
When reviewing, evaluate:
- Narrative flow: Coherent story? Pedagogical order?
- Variation theory: Contrasts used? "Whole, parts, whole" structure?
- Chunk quality: Meaningful names? Focused on single concepts?
- Explanation quality: Explains "why" not just "what"? Red flags: prose that begins "We [verb] the [noun]" matching a function name; prose that describes parameter types visible in the signature; prose that restates conditionals without explaining why they matter.
- Test organization: Tests after implementation, not before?
- Proper noweb syntax: `[[code]]` notation? Valid chunk references?
Core Philosophy
Literate programming (Knuth) has two goals:
- Explain to human beings what we want a computer to do
- Present concepts in order best for human understanding (psychological order, not compiler order)
Variation Theory
Apply variation-theory skill when structuring explanations:
- Contrast: Show what something IS vs what it is NOT
- Separation: Start with whole (module outline), then parts (chunks)
- Generalization: Show pattern across different contexts
- Fusion: Integrate parts back into coherent whole
CRITICAL: Show concrete examples FIRST, then state general principles. Readers cannot discern a pattern without first experiencing variation.
Noweb File Format
Documentation Chunks
- Begin with `@` followed by a space or newline
- Contain explanatory text (LaTeX, Markdown, etc.)
- Copied verbatim by noweave
Code Chunks
- Begin with `<<chunk name>>=` on a line by itself (column 1)
- End when another chunk begins or at end of file
- Reference other chunks using `<<chunk name>>`
- Multiple chunks with the same name are concatenated
Syntax Rules
- Quote code in documentation using `[[code]]` (escapes LaTeX special characters)
- Escape: write `@<<` for a literal `<<`, and `@@` in column 1 for a literal `@`
Writing Guidelines
- Start with the human story - problem, approach, design decisions
- Introduce concepts in pedagogical order - not compiler order
- Use meaningful chunk names - 2-5 word summary of purpose (like pseudocode)
- Reference variables in chunk names - when a chunk operates on a specific variable, use `[[variable]]` notation in the chunk name to make the connection explicit (e.g., `<<add graders to [[graders]] list>>`)
- Decompose by concept, not syntax
- Explain the "why" - don't just describe what the code does. Prose that merely restates the code in English teaches nothing. Good prose explains why a design choice was made: what alternative was rejected, what would break without this approach, or what constraint drives the implementation.

  Self-test: If your prose could be mechanically generated from the function signature, it's "what" not "why." Ask yourself: What design decision does this paragraph justify? What alternative did we reject and why? If the paragraph doesn't answer either question, rewrite it.
BAD — prose restates code in English:

\subsection{Counting $n$-grams}
We count overlapping $n$-grams. If $n$ is larger than the input, the
result is empty.
<<functions>>=
def ngram_counts(text, *, n): ...
@

GOOD — prose explains why this design choice:

\subsection{Counting $n$-grams}
We use overlapping $n$-grams because they capture all positional
contexts---in \enquote{THE}, overlapping bigrams yield TH and HE,
whereas non-overlapping would only yield TH. This matches the standard
definition used in cryptanalysis.
<<functions>>=
def ngram_counts(text, *, n): ...
@

Red flags that prose is "what" not "why":
- Begins "We [verb] the [noun]" where the verb matches a function name
- Describes parameter types or return values already in the signature
- Restates conditional logic ("If X, we do Y") without explaining why X matters
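The example above elides the function body; one plausible implementation of [[ngram_counts]] (a sketch, not the canonical code) shows the overlapping extraction that the GOOD prose justifies:

```python
from collections import Counter

def ngram_counts(text, *, n):
    """Count overlapping n-grams in text."""
    if n > len(text):
        return Counter()  # no window of length n fits
    # Overlapping windows capture every positional context:
    # "THE" with n=2 yields TH and HE.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

For example, `ngram_counts("THE", n=2)` counts both TH and HE, which a non-overlapping scheme would miss.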
- Keep chunks focused — one function per `<<functions>>=` chunk with prose before it. Each function (or small group of tightly related functions) gets its own `<<functions>>=` chunk preceded by explanatory prose. Never put multiple unrelated functions in a single chunk.

BAD — four functions crammed into one chunk with minimal prose:

\subsection{Helper Functions}
We provide several utility functions.
<<functions>>=
def normalize_text(text): ...
def letters_only(text): ...
def key_shifts(key): ...
def index_of_coincidence(text): ...
@

GOOD — each function with its own subsection and prose:

\subsection{Text Normalization}
Before analysis, we strip non-alphabetic characters and convert to
lowercase so that frequency counts are meaningful.
<<functions>>=
def normalize_text(text): ...
@

\subsection{Index of Coincidence}
The index of coincidence measures how likely two randomly chosen
letters from a text are identical ...
<<functions>>=
def index_of_coincidence(text): ...
@
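The elided bodies in the GOOD example might be implemented as follows (a hedged sketch using the standard index-of-coincidence formula; the real functions may differ):

```python
from collections import Counter

def normalize_text(text):
    """Lowercase text and strip non-alphabetic characters."""
    return "".join(c for c in text.lower() if c.isalpha())

def index_of_coincidence(text):
    """Probability that two randomly chosen letters of text match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    # Sum over letters of (count choose 2), divided by (n choose 2).
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```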
- Decompose long functions into named sub-chunks — if a function has more than ~25 lines and contains two or more distinct algorithmic phases, decompose it into named sub-chunks. Each sub-chunk name should read like a step in an algorithm description. The prose before each sub-chunk explains why that phase works the way it does. This is the classic Knuth technique.
BAD — 80-line function with one line of prose:

We generate plaintext by concatenating sentences.
<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    if size <= 0:
        raise ValueError(...)
    paragraphs = extract_paragraphs(sources, ...)
    ...  # 75 more lines
    return normalize(prefix, options)
@

GOOD — function body decomposed into named sub-chunks with prose:

<<functions>>=
def generate_plaintext(size, *, sources, seed=None):
    """..."""
    <<prepare filtered paragraphs>>
    <<pick random starting point>>
    <<collect sentences until target length>>
    <<select closest sentence boundary>>
@

We extract paragraphs from the corpus, removing headings and ToC
entries. Paragraphs lacking sentence-ending punctuation are
discarded---they are typically list items or table rows.
<<prepare filtered paragraphs>>=
if size <= 0:
    raise ValueError("size must be positive")
...
@

To avoid always starting at the beginning of the corpus, we rotate to
a random paragraph.
<<pick random starting point>>=
rng = random.Random(seed)
...
@
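A runnable miniature of the same decomposition (the [[summarize]] function is hypothetical; each commented phase would be a named sub-chunk with its own prose in the .nw source):

```python
def summarize(numbers):
    """Return (minimum, maximum, mean) of a non-empty list."""
    # <<validate input>>
    if not numbers:
        raise ValueError("numbers must be non-empty")
    # <<scan for extremes>>
    lo, hi = min(numbers), max(numbers)
    # <<compute mean>>
    mean = sum(numbers) / len(numbers)
    return lo, hi, mean
```

In the literate version, each comment becomes a chunk reference, and the explanation of why each phase works moves out of comments into the documentation.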
- Use bucket chunks — distribute `<<constants>>=` definitions near their relevant code. Define each constant in the section where it is conceptually relevant. Never group all constants into a single \subsection{Constants}.

BAD — all constants dumped in one subsection:

\subsection{Constants}
<<constants>>=
DATA_DIR = ...          # used in loading section
GUTENBERG_START = ...   # used in extraction section
SENTENCE_RE = ...       # used in sentence splitting section
KEEP_PUNCT = ...        # used in normalization section
@

GOOD — each constant near the code that uses it:

\subsection{Loading Texts}
<<constants>>=
DATA_DIR = Path(__file__).parent / "data"
@
<<functions>>=
def load_text(path): ...
@

\subsection{Extracting Body Text}
<<constants>>=
GUTENBERG_START = "*** START OF"
GUTENBERG_END = "*** END OF"
@
<<functions>>=
def extract_body(text): ...
@
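The GOOD layout tangles into code like the following (a sketch; the real [[extract_body]] may differ, and the sample text here is illustrative):

```python
GUTENBERG_START = "*** START OF"
GUTENBERG_END = "*** END OF"

def extract_body(text):
    """Return only the text between the Project Gutenberg markers."""
    start = text.index(GUTENBERG_START)
    start = text.index("\n", start) + 1  # skip past the marker line itself
    end = text.index(GUTENBERG_END)
    return text[start:end]
```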
- Define constants for magic numbers - never hardcode values
- Co-locate dependencies with features - a feature's imports belong in that feature's section
- Prefer public functions - default to making functions public with docstrings. Only use `_`-prefixed private functions for true internal helpers tightly coupled to a single caller. Public utilities (e.g., `normalize_text`, `letters_only`) are reusable across modules and discoverable via `help()`. Duplicated private helpers across modules (e.g., `_to_ascii` in both `vigenere.nw` and `plaintexts.nw`) are a sign the function should be public in a shared module.
- Keep lines under 80 characters - both prose and code
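To make the magic-number guideline concrete: name the value and define it next to the prose that explains it. A hypothetical sketch ([[ENGLISH_IC]] and [[looks_like_english]] are illustrative, not from the reference files):

```python
# BAD: if ic > 0.06: ...   (what is 0.06? why that threshold?)

# GOOD: a named constant, defined where the prose explains it.
ENGLISH_IC = 0.0667  # expected index of coincidence for English text

def looks_like_english(ic, *, tolerance=0.01):
    """Return True when ic is within tolerance of English's expected IC."""
    return abs(ic - ENGLISH_IC) <= tolerance
```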
LaTeX Documentation Quality
Apply latex-writing skill. Most common anti-patterns in .nw files:
Lists with bold labels: Use \begin{description} with \item[Label], NOT \begin{itemize} with \item \textbf{Label}
Code with manual escaping: Use [[code]], NOT \texttt{...\_...}
Manual quotes: Use \enquote{...}, NOT "..." or ``...''
Manual cross-references: Use \cref{...}, NOT Section~\ref{...}
Progressive Disclosure Pattern
When introducing high-level structure, use abstract placeholder chunks that defer specifics:
<<functions>>=
def cli_show(user_regex,
             <<options for filtering>>):
    <<implementation>>
@
[... later, explain each option ...]
\paragraph{The --all option}
<<options for filtering>>=
all: Annotated[bool, all_opt] = False,
@
Benefits: readable high-level structure, pedagogical ordering, maintainability.
The same technique applies to function bodies: long functions can use
<<phase name>> sub-chunks to present algorithmic steps in pedagogical
order with prose between them (see the writing guideline "Decompose
long functions into named sub-chunks" above).
Chunk Concatenation Patterns
Use multiple definitions when building up a parameter list pedagogically:
\subsection{Adding the diff flag}
<<args for diff>>=
diff=args.diff,
@
[... later ...]
\subsection{Fine-tuning thresholds}
<<args for diff>>=
threshold=args.threshold
@
Use separate chunks when contexts differ (different scopes):
<<args from command line>>= # Has args object
diff=args.diff,
@
<<params for recursion>>= # No args, only parameters
diff=diff,
@
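After tangling, concatenated chunk definitions are spliced into the call site in definition order. A runnable illustration (the [[run_diff]] consumer and the argparse setup are hypothetical, standing in for the real call site):

```python
import argparse

def run_diff(*, diff, threshold):
    """Hypothetical consumer of the assembled keyword arguments."""
    return (diff, threshold)

parser = argparse.ArgumentParser()
parser.add_argument("--diff", action="store_true")
parser.add_argument("--threshold", type=float, default=0.5)
args = parser.parse_args(["--diff"])

# The tangled call site: both <<args for diff>> definitions, in order.
result = run_diff(
    diff=args.diff,            # from "Adding the diff flag"
    threshold=args.threshold,  # from "Fine-tuning thresholds"
)
```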
Test Organization
CRITICAL: Tests MUST appear AFTER implementation, distributed throughout
the file near the code they verify. NEVER create a \section{Tests} or
\section{Unit Tests} that groups all tests at the end of the file.
See references/testing-patterns.md for detailed patterns.
Key rules:
- Each implementation section is followed by its `<<test functions>>=` chunk
- Use a single `<<test functions>>` chunk name — noweb concatenates them
- Use `from module import *` in the test file header
- Frame tests pedagogically: "Let's verify this works..."
BAD — all tests collected at the end:
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
\section{Tests} % ← NEVER do this
<<test functions>>=
def test_encrypt(): ...
def test_decrypt(): ...
@
GOOD — each test immediately after its implementation:
\section{Encryption}
<<functions>>=
def encrypt(text, key): ...
@
Let's verify that encryption produces the expected ciphertext:
<<test functions>>=
def test_encrypt(): ...
@
\section{Decryption}
<<functions>>=
def decrypt(text, key): ...
@
We can verify that decryption inverts encryption:
<<test functions>>=
def test_decrypt(): ...
@
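Tangled out, the GOOD layout yields an implementation file plus a test file whose concatenated [[test functions]] chunks sit next to their code in the source. A runnable miniature (the cipher bodies are hypothetical Caesar shifts, since the example elides them):

```python
def encrypt(text, key):
    """Shift each lowercase letter forward by key positions."""
    return "".join(
        chr((ord(c) - ord("a") + key) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

def decrypt(text, key):
    """Invert encrypt by shifting backward."""
    return encrypt(text, -key)

def test_encrypt():
    assert encrypt("abc", 1) == "bcd"

def test_decrypt():
    assert decrypt(encrypt("secret", 7), 7) == "secret"
```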
Multi-Directory Projects
For large projects (5+ .nw files), see references/multi-directory-projects.md.
Key structure:
project/
├── Makefile # Root orchestrator (compile → test → docs)
├── pyproject.toml # Poetry packaging configuration
├── src/ # .nw files → .py + .tex
├── doc/ # Document wrapper (.nw), preamble.tex
├── tests/ # Extracted test files (unit/ subdir)
└── makefiles/ # Shared build rules (noweb.mk, subdir.mk)
Initializing a New Project
See references/project-initialization.md for full details. Quick checklist:
- Create `pyproject.toml` with `[tool.poetry]` packages/include/exclude
- Create `src/.gitignore` (`*.py`, `*.tex`) and `tests/.gitignore` (`*.py`)
- Create `src/packagename/Makefile` with an explicit `__init__.py` rule
- Create `src/packagename/packagename.nw` with `<<[[__init__.py]]>>` and `<<test [[packagename.py]]>>` chunks
- Create `tests/Makefile` with auto-discovery (uses `%20` encoding, `cpif`, `unit/` subdirectory)
- Create `doc/packagename.nw` wrapper, `doc/Makefile`, `doc/preamble.tex`
- Create root `Makefile` orchestrating compile → test → docs
LaTeX-Safe Chunk Names
Use [[...]] notation for Python chunks with underscores:
<<[[module_name.py]]>>=
def my_function():
pass
@
Extract with: notangle -R"[[module_name.py]]" file.nw > module_name.py
Best Practices Summary
- Write documentation first - then add code
- Keep lines under 80 characters
- Check for unused chunks - run `noroots` to find typos
- Keep tangled code in .gitignore - .nw is the source of truth
- NEVER commit generated files - .py and .tex from .nw are build artifacts
- Test your tangles - ensure extracted code runs
- Require PEP-257 docstrings on all public functions - prose in the `.nw` is for maintainers reading the literate source; docstrings are for users of the compiled `.py` who never see the `.nw` file. Both are needed. Private functions (prefixed `_`) may omit docstrings. Never use `\cref` or other LaTeX commands inside docstrings.

BAD — function with prose but no docstring:

We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    return text.lower().encode("ascii", "ignore").decode()
@

GOOD — prose for maintainers AND docstring for users:

We convert text to lowercase ASCII for uniform comparison.
<<functions>>=
def normalize_text(text):
    """Return lowercase ASCII version of ``text``.

    Non-ASCII characters are silently dropped.
    """
    return text.lower().encode("ascii", "ignore").decode()
@

- Include table of contents - add `\tableofcontents` in the documentation
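The GOOD docstring version tangles into directly runnable code; a quick sketch of its behavior (the sample string is illustrative):

```python
def normalize_text(text):
    """Return lowercase ASCII version of ``text``.

    Non-ASCII characters are silently dropped.
    """
    return text.lower().encode("ascii", "ignore").decode()

ascii_form = normalize_text("Héllo, Wörld!")  # -> "hllo, wrld!"
```

Because the docstring travels with the tangled `.py`, `help(normalize_text)` shows it to users who never open the `.nw` source.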
Git Workflow
See references/git-workflow.md for details.
Core rules:
- Only commit .nw files to git
- Add generated files to .gitignore immediately
- Regenerate code with `make` after checkout/pull
- Never commit generated .py or .tex files
Noweb Commands Quick Reference
See references/noweb-commands.md for details.
# Tangling
notangle -R"[[module.py]]" file.nw > module.py
noroots file.nw # List root chunks
# Weaving
noweave -n -delay -x -t2 file.nw > file.tex # For inclusion
noweave -latex -x file.nw > file.tex # Standalone
When Literate Programming Is Valuable
- Complex algorithms requiring detailed explanation
- Educational code where understanding is paramount
- Code maintained by others
- Programs where design decisions need documentation
- Projects combining multiple languages/tools