Unix Review
Evaluate codebases against the Unix philosophy and SOLID principles, with a novel AI-Readiness dimension for the age of AI-assisted development.
Philosophy
Code Quality = f(Codebase, Architecture Type, Team Context, AI Collaboration)
The Unix philosophy—born from decades of building systems that survive—remains the most battle-tested framework for software design. Eric Raymond distilled it into 17 rules. SOLID principles complement them at the object/module level. Together they form a complete lens for evaluating code.
But we add a new dimension: AI-Readiness. Code that AI agents can navigate, understand, and safely modify is not a luxury—it's a competitive advantage. The same properties that make code good for humans (clarity, modularity, separation of concerns) make it good for AI, but AI has specific needs around context windows, file independence, and composable interfaces.
We collapse 17 Unix rules + 5 SOLID principles into 8 scoreable dimensions to avoid overwhelming output while preserving full analytical coverage.
Citation Fidelity Firewall (when reviewing — never violate)
A code review is evidence about real code. Citing is not the same as remembering. This skill demands path:line references for every exemplar, violation, and score rationale — and that demand is exactly the condition that tempts a model to invent a plausible-but-wrong line number, reconstruct a quoted snippet from memory, or name a violation it never actually located. A confident citation pointing at the wrong line (or a line that doesn't exist) is a failure, not a finding, no matter how reasonable it sounds.
The hard rule: every path:line reference, every quoted code snippet, and every named violation or exemplar in your report MUST be one you actually opened and read in this repo and verified exists at that location. If you have not opened a file, you may not cite a line in it. Never guess a line number, never reconstruct a quote from memory, never carry a citation over from a similar codebase you've seen before. When you cannot ground a claim in a line you have read, you have exactly three honest moves, in order:
- Cite what you actually read — point only to files and lines you opened. A
path:lineyou verified beats five you half-remember. - Flag the un-read area — if a dimension's weakness lives in code you did not open, say so explicitly: "Robustness in
src/net/not assessed — files not read." Name the gap; do not paper over it with an invented citation. - Reduce the claim to match the evidence — if you only sampled part of the codebase, scope the finding to what you saw ("in the files reviewed…") rather than generalizing to code you never opened.
A review with fewer, real citations beats one padded with invented ones. It is always acceptable — often more honest — to return three solid, verified citations and a list of un-read areas than to manufacture a citation per dimension to fill the template. Empty slots are a signal, not a defect.
Calibration is part of fidelity. Do not inflate scores to be agreeable. A genuinely poor codebase MUST be allowed to score low — 40s, 30s, lower if the evidence supports it. Do not cluster every dimension at 70–85 because it feels safe; that is its own form of fabrication (inventing quality that isn't there). Every score must be backed by cited evidence: if you can't point to read code justifying a 75, the score is unsupported and should drop to what you can actually defend.
Final pass — citation re-verification (outranks every other concern): before emitting the report, re-open each file you cite and confirm the line number, the snippet, and the named construct still match what you wrote. Roll back — or downgrade to "not assessed" — any citation you cannot re-verify against the actual file. This check outranks completeness, template-filling, and stylistic polish.
Quick Start
Quick Scan: Reconnaissance → Score 8 dimensions → Top 5 recommendations Deep Review: Full file analysis → Dimensional scoring with exemplars/violations → Architecture profile → Detailed remediation AI-Readiness Audit: Focus on AI-Readiness sub-scores with specific remediation
Operating Modes
Mode 1: Quick Scan (default)
When: User says "review," "evaluate," or "how's this codebase?"
Process:
- Reconnaissance (2-3 min exploration)
- Read project structure, entry points, config files
- Sample 5-8 representative files across the codebase
- Identify architecture type
- Score all 8 dimensions (0-100)
- Output architecture profile with top 5 recommendations
Mode 2: Deep Review
When: User says "deep review," "full audit," or "thorough analysis"
Process:
- Reconnaissance (thorough exploration)
- Map full project structure
- Read all key files, not just samples
- Trace dependency chains
- Analyze test coverage patterns
- Score all 8 dimensions with exemplar files and violation citations
- Diagnose dimension gaps
- Output full architecture profile with detailed remediation plan
Mode 3: AI-Readiness Audit
When: User says "AI readiness," "agent friendly," or "how AI-ready is this?"
Process:
- Standard reconnaissance
- Deep-dive on AI-Readiness sub-scores:
- Navigability, Contextual Independence, Modifiability, Agent Composability, Inspectability
- Load
references/ai-readiness.md - Output AI-readiness focused report with specific file-level recommendations
Three-Phase Workflow
Phase 1: Reconnaissance
Before scoring, understand the terrain:
- Structure scan:
treeor glob patterns to map the project - Entry points: Find main files, CLI entry, server bootstrap
- Architecture type detection:
- CLI Tool
- Library/SDK
- Web Application
- Microservice/API
- AI/ML Pipeline
- Monorepo/Multi-package
- Sample files: Read representative files across layers (entry, core logic, utilities, tests, config)
- Dependency analysis: Check package manifest, imports, coupling patterns
Phase 2: Dimensional Scoring
Score each dimension 0-100. Load references/scoring-rubric.md for calibration.
The 8 Dimensions:
| Dimension | Unix Rules | SOLID | What It Measures | |-----------|-----------|-------|------------------| | Modularity | Modularity, Parsimony | SRP | Do modules do one thing? Are interfaces narrow? | | Composability | Composition, Separation | ISP, DIP | Can parts be recombined? Are concerns separated? | | Clarity | Clarity, Transparency, Least Surprise | — | Can you understand what code does without tracing? | | Simplicity | Simplicity, Economy, Optimization | — | Is complexity justified? Is premature optimization absent? | | Robustness | Robustness, Repair, Silence | LSP (secondary) | Does it handle failure gracefully? Is error reporting clear? | | Data-Drivenness | Representation, Generation | — | Is complexity in data structures, not code? Is code generated where possible? | | Extensibility | Extensibility, Diversity | OCP, LSP, DIP (secondary) | Can behavior be extended without modifying internals? | | AI-Readiness | (cross-cutting) | (cross-cutting) | Can AI agents navigate, understand, and safely modify this code? |
For each dimension, identify:
- Exemplar: Best file/module demonstrating this principle (with path:line) — must be a file you actually read; see Citation Fidelity Firewall
- Violation: Worst offender against this principle (with path:line) — must be a real, verified location, not a reconstructed-from-memory line
- Score rationale: 1-2 sentence justification, grounded in cited evidence
If you did not read enough of the codebase to ground a dimension's exemplar or violation in a verified path:line, leave it as "not assessed — files not read" rather than inventing one. An empty slot is honest; a fabricated citation is not.
Phase 3: Synthesis
- Apply architecture-type calibration weights (load
references/scoring-rubric.md) - Compute weighted overall score
- Diagnose dimension gaps (high X + low Y = specific problem)
- Generate top 5 actionable recommendations ranked by impact
- Format output
Architecture-Type Calibration
Different architectures emphasize different dimensions:
| Dimension | CLI | Library | Web App | Microservice | AI/ML | Monorepo | |-----------|-----|---------|---------|--------------|-------|----------| | Modularity | 15% | 20% | 15% | 15% | 10% | 20% | | Composability | 15% | 20% | 10% | 20% | 10% | 15% | | Clarity | 15% | 15% | 10% | 10% | 15% | 10% | | Simplicity | 15% | 10% | 10% | 10% | 10% | 10% | | Robustness | 10% | 10% | 15% | 15% | 15% | 10% | | Data-Drivenness | 10% | 5% | 10% | 10% | 20% | 10% | | Extensibility | 10% | 15% | 15% | 10% | 10% | 15% | | AI-Readiness | 10% | 5% | 15% | 10% | 10% | 10% |
AI-Readiness Sub-Scores
AI-Readiness breaks into 5 sub-criteria (each 0-100, averaged for dimension score):
| Sub-Score | What It Measures | |-----------|-----------------| | Navigability | Can an agent find what it needs? Clear naming, logical structure, discoverable entry points | | Contextual Independence | Can a file be understood without reading the entire codebase? Minimal implicit state | | Modifiability | Can changes be made safely? Clear boundaries, good test coverage, predictable side effects | | Agent Composability | Are there CLI interfaces, scriptable APIs, or tool-friendly boundaries? | | Inspectability | Can an agent verify its changes? Observable state, testable contracts, clear feedback loops |
Load references/ai-readiness.md for detailed criteria and anti-patterns.
Dimension Gap Diagnostics
Certain dimension combinations reveal specific architectural problems:
| Pattern | Diagnosis | |---------|-----------| | High Modularity + Low Composability | Good boundaries but poor interfaces—modules can't talk | | High Clarity + Low Simplicity | Well-documented complexity—the docs explain unnecessary code | | High Robustness + Low Clarity | Defensive programming obscuring logic—error handling hides the happy path | | High Extensibility + Low Modularity | Plugin systems bolted onto monoliths—extension points but no seams | | High Simplicity + Low Robustness | Happy-path code that breaks on edge cases | | Low AI-Readiness + High Clarity | Human-readable but agent-hostile—implicit conventions, no scriptable interfaces | | High Data-Drivenness + Low Clarity | Magic configuration—config is powerful but opaque | | High Clarity + Low Data-Drivenness | Logic hard-coded in control flow instead of being data-driven |
Output Format
UNIX REVIEW — ARCHITECTURE PROFILE
Project: [name]
Architecture: [detected type] | Calibration: [applied]
Files analyzed: [N] | Lines of code: [~N]
DIMENSIONAL SCORES:
Modularity: ████████░░ 82 - [Brief interpretation]
Composability: ██████░░░░ 63 - [Brief interpretation]
Clarity: █████████░ 91 - [Brief interpretation]
Simplicity: ███████░░░ 74 - [Brief interpretation]
Robustness: ██████░░░░ 58 - [Brief interpretation]
Data-Drivenness: █████░░░░░ 48 - [Brief interpretation]
Extensibility: ████████░░ 79 - [Brief interpretation]
AI-Readiness: ███████░░░ 71 - [Brief interpretation]
OVERALL: ████████░░ 73 (Architecture-weighted)
DIMENSION GAP DIAGNOSIS:
[Diagnostic based on score patterns — e.g., "High clarity but low
data-drivenness suggests logic encoded in control flow rather than
data structures. Consider configuration-driven approaches."]
EXEMPLARS (Best Practices Found):
1. [path:line] — [What it does well and which dimension]
2. [path:line] — [...]
3. [path:line] — [...]
VIOLATIONS (Highest Impact):
1. [path:line] — [What principle it violates and why it matters]
2. [path:line] — [...]
3. [path:line] — [...]
AI-READINESS DETAIL:
Navigability: ████████░░ 80
Context Independence: ██████░░░░ 65
Modifiability: ███████░░░ 72
Agent Composability: ██████░░░░ 60
Inspectability: ████████░░ 78
TOP 5 RECOMMENDATIONS:
1. [Most impactful, actionable fix with specific files]
2. [...]
3. [...]
4. [...]
5. [...]
Scoring Philosophy
- Be calibrated, not generous. 70 is genuinely good. 90+ is exceptional. 50 means real problems. A genuinely poor codebase must be allowed to score in the 30s or 40s — do not inflate to be agreeable, and do not cluster scores in a safe 70–85 band. Inventing quality that isn't there is fabrication just like inventing a citation.
- Architecture type matters. A CLI tool doesn't need the composability of a library.
- Dimension gaps are diagnostic. The pattern of scores reveals more than any single number.
- Cite specific files — and only ones you read. Every exemplar and violation must reference actual code with
path:linethat you opened and verified. Never guess a line number or reconstruct a snippet from memory. Re-verify every citation against the file before emitting the report. See the Citation Fidelity Firewall — it outranks template completeness. - Recommendations must be actionable. "Improve modularity" is useless. "Extract the validation logic from
src/handlers/auth.ts:45-120into a separate module" is actionable.
Reference Files
Load as needed:
references/unix-rules.md— All 17 ESR rules with code smells and exemplar patternsreferences/solid-principles.md— 5 SOLID principles mapped to the 8 dimensionsreferences/scoring-rubric.md— Score bands, architecture weights, diagnostic combos, calibration examplesreferences/ai-readiness.md— AI-age thesis, sub-criteria detail, anti-patterns
Critical Principles
- Reconnaissance Before Judgment. Never score without reading code. Sample broadly.
- Architecture-Type Calibration Is Non-Negotiable. A microservice and a CLI have different priorities.
- Dimension Gaps Are Diagnostic. The pattern matters more than the average.
- Cite or It Didn't Happen — and Never Fabricate the Citation. Every claim must reference specific files and lines you actually read and verified. A fabricated
path:lineis worse than no citation: it didn't happen, and now it looks like it did. When you can't ground a claim, flag the un-read area instead. (See Citation Fidelity Firewall.) - Actionable Over Exhaustive. Five concrete recommendations beat twenty vague observations.
- The Unix Way Is Pragmatic. Rules are heuristics, not laws. Flag violations but acknowledge justified exceptions.
- AI-Readiness Is About Structure, Not Comments. Good structure for AI means good structure for humans—but the reverse isn't always true.
- Score the Code, Not the Idea. Evaluate implementation quality, not whether the project concept is good.