Codebase Learn
Build a searchable knowledge graph of the codebase using SSL patterns and triplets.
[codebase-learn] oracle-based codebase understanding
principle: I compress, I reconstruct
store minimal seeds I can expand
triplets for structure, SSL for meaning
tags for retrieval
phases:
1. discover→find entry points, key files, config
2. analyze→extract purpose, patterns, relationships
3. connect→create triplets linking components
4. store→high-ε SSL seeds with proper tags
Process
Phase 1: Discovery
Identify the codebase structure:
- Entry points (main, CLI, API endpoints)
- Key directories and their purposes
- Configuration files
- Build/test infrastructure
Phase 2: Analysis
For each significant component, extract:
- Purpose: What does it do?
- Key functions/classes: What are the main abstractions?
- Dependencies: What does it use?
- Patterns: What architectural decisions were made?
Phase 3: Storage
Store using SSL format with triplets:
[LEARN] [project] component→purpose→key insight
[ε] One sentence that lets me reconstruct the full understanding.
[TRIPLET] component contains abstraction
[TRIPLET] component uses dependency
[TRIPLET] abstraction handles concern
Tagging Schema
Apply these tags for retrieval:
codebase- all codebase knowledgearchitecture- high-level patternsproject:{name}- project scopefile:{path}- specific filelayer:structure- file/directory organizationlayer:function- key functions/methodslayer:relationship- how components connectlayer:pattern- architectural decisions
Predicates for Triplets
| Predicate | Meaning | Example |
|-----------|---------|---------|
| contains | A has B inside | cli.cpp contains cmd_daemon |
| uses | A depends on B | Handler uses Mind |
| calls | A invokes B | poll calls accept |
| implements | A realizes B | SocketServer implements IPC |
| handles | A is responsible for B | decay handles memory aging |
| returns | A produces B | recall returns memories |
| triggers | A causes B to run | SessionStart triggers soul-hook |
Example Output
For cc-soul chitta:
[LEARN] [cc-soul] chitta→semantic memory substrate→decay, triplets, SSL storage
[ε] C++ daemon with tiered storage (hot/warm/cold), JSON-RPC over Unix socket.
[TRIPLET] chitta contains Mind
[TRIPLET] Mind uses TieredStorage
[TRIPLET] chittad handles daemon_loop
[TRIPLET] SocketServer handles client_connections
[TRIPLET] Handler dispatches rpc_tools
[LEARN] [cc-soul] cli.cpp→daemon entry point→socket server + subconscious
[ε] Runs cmd_daemon_with_socket: poll loop + decay/synthesis cycles.
[TRIPLET] cli.cpp contains cmd_daemon_with_socket
[TRIPLET] cmd_daemon_with_socket uses SocketServer
[TRIPLET] cmd_daemon_with_socket runs subconscious
[LEARN] [cc-soul] rpc/handler.hpp→JSON-RPC dispatcher→routes tools/call to handlers
[ε] Central handler with ~50 tools: recall, grow, observe, connect, etc.
[TRIPLET] Handler contains tool_recall
[TRIPLET] Handler contains tool_grow
[TRIPLET] Handler contains tool_connect
Execution
- Explore the current directory structure
- Identify key files by examining:
- README, CLAUDE.md for project overview
- Entry points (main.cpp, index.ts, etc.)
- Core modules by directory structure
- For each key component:
- Read to understand purpose
- Identify key abstractions
- Note relationships to other components
- Store using
[LEARN]markers with SSL format - Report summary: nodes created, triplets connected, domains covered
Benefits
After running:
recall("codebase architecture")→ instant overviewrecall("file.cpp purpose")→ specific file knowledgequery --subject "component"→ find all relationships- SessionStart auto-injects relevant architecture context
The soul now knows the codebase like I do.