Agent Skills: repo-rag

Perform high-recall codebase retrieval using semantic search and symbol indexing. Use when you need to find specific code, understand project structure, or verify architectural patterns before editing.

UncategorizedID: oimiragieo/agent-studio/repo-rag

Install this agent skill to your local

pnpm dlx add-skill https://github.com/oimiragieo/agent-studio/tree/HEAD/.claude/skills/_archive/dead/repo-rag

Skill Files

Browse the full folder contents for repo-rag.

Download Skill

Loading file tree…

.claude/skills/_archive/dead/repo-rag/SKILL.md

Skill Metadata

Name
repo-rag
Description
Perform high-recall codebase retrieval using semantic search and symbol indexing. Use when you need to find specific code, understand project structure, or verify architectural patterns before editing.

References (archive): SCAFFOLD_SKILLS_ARCHIVE_MAP.md — semantic search / vector patterns from everything-claude-code backend-patterns, tdd-workflow.

<identity> Repo RAG (Retrieval Augmented Generation) - Provides advanced codebase search capabilities beyond simple grep. </identity> <capabilities> - High-recall codebase retrieval using semantic search - Symbol indexing for finding classes, functions, and types - Understanding project structure - Verifying architectural patterns before editing </capabilities> <instructions> <execution_process> 1. **Symbol Search First**: Use `symbols` to find classes, functions, and types. This is more accurate than text search for code structures. 2. **Semantic Search**: Use `search` for concepts, comments, or broader patterns. 3. **Verification**: Always verify the file path and context returned before proposing edits. </execution_process>

<usage_patterns>

  • Architecture Review: Run symbol searches on key interfaces to understand the dependency graph.
  • Plan Mode: Use this skill to populate the "Context" section of a Plan Mode artifact.
  • Refactoring: Identify all usages of a symbol before renaming or modifying it. </usage_patterns> </instructions>
<examples> <code_example> **Symbol Search**:
symbols "UserAuthentication"

Semantic Search:

search "authentication middleware logic"

</code_example> </examples>

RAG Evaluation

Overview

Systematic evaluation of RAG quality using retrieval and end-to-end metrics. Based on Claude Cookbooks patterns.

Evaluation Metrics

Retrieval Metrics (from .claude/tools/repo-rag/metrics.py):

  • Precision: Proportion of retrieved chunks that are actually relevant
    • Formula: Precision = True Positives / Total Retrieved
    • High precision (0.8-1.0): System retrieves mostly relevant items
  • Recall: Completeness of retrieval - how many relevant items were found
    • Formula: Recall = True Positives / Total Correct
    • High recall (0.8-1.0): System finds most relevant items
  • F1 Score: Harmonic mean of precision and recall
    • Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
    • Balanced measure when both precision and recall matter
  • MRR (Mean Reciprocal Rank): Measures ranking quality
    • Formula: MRR = 1 / rank of first correct item
    • High MRR (0.8-1.0): Correct items ranked first

End-to-End Metrics (from .claude/tools/repo-rag/evaluation.py):

  • Accuracy (LLM-as-Judge): Overall correctness using Claude evaluation
    • Compares generated answer to correct answer
    • Focuses on substance and meaning, not exact wording
    • Checks for completeness and absence of contradictions

Evaluation Process

  1. Create Evaluation Dataset:

    {
      "query": "How is user authentication implemented?",
      "correct_chunks": ["src/auth/middleware.ts", "src/auth/types.ts"],
      "correct_answer": "User authentication uses JWT tokens...",
      "category": "authentication"
    }
    
  2. Run Retrieval Evaluation:

    # Using Python directly
    from .claude.tools.repo_rag.metrics import evaluate_retrieval
    metrics = evaluate_retrieval(retrieved_chunks, correct_chunks)
    print(f"Precision: {metrics['precision']}, Recall: {metrics['recall']}, F1: {metrics['f1']}, MRR: {metrics['mrr']}")
    
  3. Run End-to-End Evaluation:

    # Using Python directly
    from .claude.tools.repo_rag.evaluation import evaluate_end_to_end
    result = evaluate_end_to_end(query, generated_answer, correct_answer)
    print(f"Correct: {result['is_correct']}, Explanation: {result['explanation']}")
    

Expected Performance

Based on Claude Cookbooks results:

  • Basic RAG: Precision 0.43, Recall 0.66, F1 0.52, MRR 0.74, Accuracy 71%
  • With Re-ranking: Precision 0.44, Recall 0.69, F1 0.54, MRR 0.87, Accuracy 81%

Best Practices

  1. Separate Evaluation: Evaluate retrieval and end-to-end separately
  2. Create Comprehensive Datasets: Cover common and edge cases
  3. Evaluate Regularly: Run evaluations after codebase changes
  4. Track Metrics Over Time: Monitor improvements
  5. Use Both Metrics: Precision/Recall for retrieval, Accuracy for end-to-end

References

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

  • New pattern -> .claude/context/memory/learnings.md
  • Issue found -> .claude/context/memory/issues.md
  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.