Auto-Recall
Overview
Auto-recall performs semantic retrieval from the perpetual memory vector store at
query time. It parses query intent, searches the perpetual_memory LanceDB table,
ranks results by relevance and recency, and returns structured context for
injection into agent prompts.
Core principle: Every agent should be able to recall any past interaction instantly by meaning, not by filename or keyword. Auto-recall is the read side of the perpetual memory architecture.
When to Use
- At the start of any task to recall related past decisions and learnings
- When debugging to find previously encountered similar issues
- When an agent needs context about how a similar problem was solved before
- When building prompts that benefit from historical context
- When checking if a pattern or approach was tried previously
Do NOT use for:
- Searching code (use pnpm search:code instead)
- Searching markdown memory files (use memory-search.cjs for that)
- Real-time interaction monitoring (use perpetual-memory skill for writes)
Relationship to Existing Tools
| Tool | Searches | Use For |
| ------------------- | -------------------------- | ------------------------------------ |
| pnpm search:code | Code files (BM25+semantic) | Finding code patterns |
| memory-search.cjs | Markdown memory files | Searching learnings/decisions/issues |
| auto-recall | perpetual_memory table | Recalling past interactions |
Auto-recall is complementary -- it searches a different index (the perpetual memory vector store) that contains auto-captured interaction summaries rather than manually-written markdown entries.
Workflow
Step 1: Parse Query Intent
Before searching, classify the query intent:
| Intent Type | Query Pattern                             | Search Strategy           |
| ----------- | ----------------------------------------- | ------------------------- |
| Decision    | "Why did we choose X", "What was decided" | Filter: category=decision |
| Issue       | "Have we seen this error before"          | Filter: category=issue    |
| Pattern     | "How do we usually handle X"              | Filter: category=pattern  |
| Learning    | "What did we learn about X"               | Filter: category=learning |
| General     | Any other query                           | No category filter        |
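The intent table can be approximated with a handful of keyword rules. The sketch below is only an illustration: the regular expressions and the names INTENT_RULES and classifyIntent are assumptions made for this example, not the skill's actual classifier.

```javascript
// Hypothetical keyword rules mirroring the intent table above.
// The patterns are illustrative assumptions, not the skill's real logic.
const INTENT_RULES = [
  { category: 'decision', pattern: /\bwhy did we (choose|pick)\b|\bwhat was decided\b/i },
  { category: 'issue', pattern: /\bseen this (error|bug|issue)\b/i },
  { category: 'pattern', pattern: /\bhow do we usually\b/i },
  { category: 'learning', pattern: /\bwhat did we learn\b/i },
];

// Returns the category to filter on, or null for a General query
// (meaning: search without a category filter).
function classifyIntent(query) {
  const rule = INTENT_RULES.find((r) => r.pattern.test(query));
  return rule ? rule.category : null;
}
```

A non-null result would map to the --category flag shown in the CLI Reference; null means the flag is omitted.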
Step 2: Search Vector Store
# Basic semantic search
node .claude/tools/cli/auto-embed.cjs --query "how does the routing guard handle Write operations" --limit 10
# Or via the skill script
node .claude/skills/auto-recall/scripts/main.cjs --query "JWT refresh token pattern" --limit 5
Step 3: Rank by Relevance + Recency
Results are ranked by cosine similarity from LanceDB. For time-sensitive queries, apply a recency boost:
final_score = similarity * 0.7 + recency_score * 0.3
where recency_score = max(0, 1 - (days_since_creation / 30))
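This prefers recent interactions when similarity scores are close. Because recency_score falls linearly to 0 at 30 days, older memories receive no boost and are ranked on similarity alone (scaled by 0.7).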
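The recency boost can be sketched as a small re-ranking pass over the search results. This assumes each result carries a similarity number and a createdAt timestamp; both field names are hypothetical, as are the function names.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// recency_score = max(0, 1 - days_since_creation / 30)
function recencyScore(createdAt, now = new Date()) {
  const elapsedMs = now.getTime() - new Date(createdAt).getTime();
  // Guard (an addition to the formula): treat future timestamps as "just created".
  const days = Math.max(0, elapsedMs / MS_PER_DAY);
  return Math.max(0, 1 - days / 30);
}

// final_score = similarity * 0.7 + recency_score * 0.3
function finalScore(similarity, createdAt, now = new Date()) {
  return similarity * 0.7 + recencyScore(createdAt, now) * 0.3;
}

// Returns a new array sorted by final score, highest first.
// `similarity` and `createdAt` are assumed field names.
function rerank(results, now = new Date()) {
  return results
    .map((r) => ({ ...r, finalScore: finalScore(r.similarity, r.createdAt, now) }))
    .sort((a, b) => b.finalScore - a.finalScore);
}
```

With these weights, a 0.78-similarity memory created today outranks a 0.80-similarity memory from 40 days ago, which is the intended effect for time-sensitive queries.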
Step 4: Inject Context
Format retrieved memories for agent prompt injection:
## Recalled Context (from perpetual memory)
1. [decision] (sim=0.87, 2d ago, agent=architect)
Chose JWT RS256 over HS256 for key rotation support. ADR-045.
2. [learning] (sim=0.82, 5d ago, agent=developer)
Token refresh requires httpOnly cookies to prevent XSS.
3. [issue] (sim=0.74, 1d ago, agent=qa)
JWT expiry not propagated to frontend. Workaround in auth.middleware.ts:47.
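A formatter producing the layout above might look like the following sketch. The record fields (category, similarity, createdAt, agent, summary) are assumed names for illustration; the actual schema of the perpetual_memory table is not specified here.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Renders recalled memories in the "Recalled Context" layout.
// All field names on `memories` entries are hypothetical.
function formatRecalledContext(memories, now = new Date()) {
  const lines = ['## Recalled Context (from perpetual memory)'];
  memories.forEach((m, i) => {
    const elapsedMs = now.getTime() - new Date(m.createdAt).getTime();
    const days = Math.max(0, Math.floor(elapsedMs / DAY_MS));
    // Metadata line: category, similarity, age, and originating agent.
    lines.push(
      `${i + 1}. [${m.category}] (sim=${m.similarity.toFixed(2)}, ${days}d ago, agent=${m.agent})`
    );
    lines.push(m.summary);
  });
  return lines.join('\n');
}
```

Keeping category, agent, and age on every entry lets a reader of the prompt trace each recalled statement back to its source.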
CLI Reference
# Semantic query
node .claude/skills/auto-recall/scripts/main.cjs --query "routing guard behavior"
# Query with category filter
node .claude/skills/auto-recall/scripts/main.cjs --query "auth decision" --category decision
# Query with limit
node .claude/skills/auto-recall/scripts/main.cjs --query "memory system" --limit 5
# Query with recency boost
node .claude/skills/auto-recall/scripts/main.cjs --query "recent changes" --recency-boost
# Output as JSON
node .claude/skills/auto-recall/scripts/main.cjs --query "search" --json
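When invoking the CLI from another script, building the argument list in one place avoids quoting mistakes. The helper below is a hypothetical sketch: it uses only the flags listed above, and it assumes those flags can be combined in a single call, which the reference does not confirm.

```javascript
// Hypothetical helper: builds the argv for the auto-recall CLI.
// Assumes --category, --limit, --recency-boost and --json can be combined.
function buildRecallArgs({ query, limit, category, recencyBoost = false, json = false }) {
  if (!query) throw new Error('query is required');
  const args = ['.claude/skills/auto-recall/scripts/main.cjs', '--query', query];
  if (category) args.push('--category', category);
  if (limit !== undefined) args.push('--limit', String(limit));
  if (recencyBoost) args.push('--recency-boost');
  if (json) args.push('--json');
  return args;
}
```

The resulting array can be passed to Node's child_process.execFileSync('node', args), which hands each argument to the process directly, so the query text needs no shell escaping.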
Agent Integration Pattern
Agents should invoke auto-recall at the start of significant tasks:
// At task start, recall relevant context
Skill({ skill: 'auto-recall' });
// Then query for task-relevant history
// node .claude/skills/auto-recall/scripts/main.cjs --query "<task description>" --limit 5
This provides agents with historical context about similar past work, preventing repeated mistakes and leveraging prior decisions.
Iron Laws
- NEVER use auto-recall as a replacement for memory-search.cjs -- they search different indexes and are complementary.
- ALWAYS limit results to avoid context bloat -- default to 5-10 results, never more than 20.
- NEVER inject recalled context without relevance filtering -- results below 0.5 similarity are noise.
- ALWAYS include metadata (category, agent, timestamp) in recalled context for traceability.
- NEVER block on auto-recall failure -- if the perpetual memory table is unavailable, proceed without it.
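Three of these laws (result limit, relevance filter, non-blocking failure) can be enforced in a thin wrapper around whatever function performs the search. In this sketch, recallFn is a hypothetical synchronous function returning an array of results with a similarity field; the names and the synchronous shape are assumptions for illustration.

```javascript
const MIN_SIMILARITY = 0.5; // results below this are treated as noise
const MAX_RESULTS = 20; // hard cap from the Iron Laws

// Drops low-similarity results and caps the count.
function selectForInjection(results, limit = 10) {
  const cap = Math.min(Math.max(1, limit), MAX_RESULTS);
  return results
    .filter((r) => r.similarity >= MIN_SIMILARITY)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, cap);
}

// Never blocks the task: any failure in recallFn yields an empty list.
// `recallFn` is a hypothetical search function supplied by the caller.
function recallOrEmpty(recallFn, query, limit = 10) {
  try {
    return selectForInjection(recallFn(query), limit);
  } catch (err) {
    return []; // perpetual memory unavailable: proceed without recall
  }
}
```

Returning an empty list on failure lets the caller use the same code path whether or not the perpetual memory table is reachable.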
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
| ---------------------------------------- | ------------------------------------------------ | ---------------------------------------------- |
| Using auto-recall for code search | Wrong index; code is in BM25/semantic code index | Use pnpm search:code for code discovery |
| Injecting all results into context | Low-similarity results pollute the prompt | Filter to similarity > 0.5 before injecting |
| Blocking task on recall failure | Perpetual memory is a bonus, not a dependency | Gracefully degrade: proceed without recall |
| Recalling without specifying limit | Unbounded results consume too many tokens | Always set --limit (default: 10) |
| Trusting recall over fresh code analysis | Past context may be outdated | Use recall as starting context, verify current |
Memory Protocol (MANDATORY)
Before starting:
Read .claude/context/memory/learnings.md
After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.