Agent Knowledge Optimizer
Transform accumulated documentation into a retrieval-optimized knowledge system.
Core Principle
File organization is a human concern. Agents don't browse—they search and load. Optimize for:
- Discovery: What knowledge exists?
- Relevance: Is it needed for this task?
- Efficiency: What's the minimum to load?
Workflow
Phase 1: Knowledge Extraction
Inventory all agent documentation:
# Find all agent doc sources
find . -maxdepth 2 -name "*.md" -path "*/.claude/*" -o \
-name "*.md" -path "*/.codex/*" -o \
-name "*.md" -path "*/.cursor/*" -o \
-name "CLAUDE.md" -o -name "AGENTS.md" -o -name "INSTRUCTIONS.md"
For each file, extract:
- Discrete facts (single pieces of actionable information)
- Instructions (procedures, rules, constraints)
- Context triggers (when is this knowledge needed?)
Phase 2: Chunk Analysis
Break content into retrieval units—the smallest self-contained piece of information that makes sense alone.
Good chunk:
## Adding API Endpoints
1. Create handler in src/handlers/
2. Register route in src/routes.rs
3. Add OpenAPI spec to docs/api.yaml
Bad chunk (too coupled):
See the API section for endpoint patterns, but first read the auth docs,
which reference the middleware guide...
Score each chunk:
- Self-contained? Can agent act on this without loading more?
- Task-specific? Clear when this is needed?
- Information-dense? High signal per token?
Phase 3: Build Knowledge Manifest
Generate .claude/KNOWLEDGE.md—a lightweight index the agent reads first:
# Knowledge Manifest
## Task → Knowledge Map
| When working on... | Load | Key terms |
|-------------------|------|-----------|
| API endpoints | references/api.md | route, handler, endpoint |
| Authentication | references/auth.md | token, session, login |
| Database changes | references/schema.md | migration, model, query |
| Testing | references/testing.md | spec, fixture, mock |
| Deployment | references/deploy.md | release, staging, prod |
## Quick Reference
### Build Commands
- `npm run dev` — Start dev server (port 3000)
- `npm test` — Run test suite
- `npm run build` — Production build
### Key Paths
- Handlers: `src/handlers/`
- Routes: `src/routes.ts`
- Tests: `tests/`
### Critical Rules
- Never commit .env files
- All PRs require tests
- Use conventional commits
The manifest contains:
- Task→Knowledge map: What to load for what context
- Quick reference: High-frequency facts (no file loading needed)
- Critical rules: Must-know constraints (always relevant)
Phase 4: Compile Optimized Artifacts
Transform verbose source docs into dense, agent-optimized versions.
Compression techniques:
| Source (verbose) | Compiled (dense) |
|-----------------|------------------|
| "When you want to add a new endpoint, you should first create a handler function..." | New endpoint: handler → route → spec |
| Long prose paragraphs | Structured tables |
| Repeated information | Single source of truth |
| Examples with explanation | Just the pattern |
Output structure:
.claude/
├── CLAUDE.md # Human-readable, can stay verbose
├── KNOWLEDGE.md # Agent manifest (generated)
└── compiled/ # Agent-optimized versions (generated)
├── api.md # Dense API reference
├── patterns.md # Code patterns as templates
└── rules.md # All constraints in one place
Phase 5: Generate Retrieval Hints
Add grep-friendly markers throughout compiled docs:
<!-- @task:new-endpoint @load:api,routes -->
## Adding Endpoints
<!-- @task:fix-auth @load:auth,middleware -->
## Authentication Flow
<!-- @task:write-test @load:testing -->
## Test Patterns
These markers enable:
# Find relevant sections for a task
grep -l "@task:new-endpoint" .claude/compiled/*.md
Phase 6: Validation
Test the optimized system:
- Coverage check: Every fact from source exists in compiled output
- Retrieval test: Can common tasks be served with minimal loading?
- Density check: Compiled versions smaller than sources?
# Compare sizes
wc -l .claude/references/*.md # Source
wc -l .claude/compiled/*.md # Compiled (should be smaller)
Manifest Format
The KNOWLEDGE.md manifest follows this structure:
# Knowledge Manifest
<!-- Auto-generated. Source: .claude/references/, CLAUDE.md -->
## Task Context Map
<!-- What to load based on current work -->
| Context | Load | Search |
|---------|------|--------|
| [task description] | [file path] | [grep terms] |
## Always-Loaded Facts
<!-- High-frequency, never needs file lookup -->
### Commands
[Most-used commands as a table]
### Paths
[Key directories and their purposes]
### Rules
[Critical constraints that always apply]
## Chunk Index
<!-- What exists and where -->
| Topic | Location | Lines | Summary |
|-------|----------|-------|---------|
| [topic] | [file:line-range] | [count] | [one-line summary] |
Information Density Principles
Convert Prose to Structure
Before:
"The authentication system uses JWT tokens stored in httpOnly cookies. When a user logs in, the server validates credentials against the database, generates a token with a 24-hour expiry, and sets it as a cookie..."
After:
## Auth Flow
- Method: JWT in httpOnly cookie
- Expiry: 24h
- Flow: credentials → DB validate → token → cookie
Eliminate Redundancy
If the same information appears in multiple places, create one canonical source and reference it:
## Token Handling
See: [Auth Flow](#auth-flow) — tokens section
Prefer Tables Over Lists
Before:
- The API endpoint for users is /api/users
- The API endpoint for posts is /api/posts
- The API endpoint for comments is /api/comments
After:
| Resource | Endpoint |
|----------|----------|
| Users | /api/users |
| Posts | /api/posts |
| Comments | /api/comments |
Use Patterns Over Examples
Before:
To create a user handler:
```javascript
export async function createUser(req, res) {
const { name, email } = req.body;
const user = await db.users.create({ name, email });
res.json(user);
}
After:
Handler pattern: `export async function {action}{Resource}(req, res)`
Body: Extract params → DB operation → Return result
Output Checklist
After optimization, verify:
- [ ]
KNOWLEDGE.mdexists and is under 100 lines - [ ] Task→knowledge mappings cover common workflows
- [ ] Quick reference has most-used facts
- [ ] Compiled docs are denser than sources
- [ ] No orphaned knowledge (everything indexed)
- [ ] Retrieval hints enable grep-based discovery
- [ ] Original source docs untouched (human reference)