Agent Skills: Exploring Codebases

Semantic search for codebases. Locates matches with ripgrep and expands them into full AST nodes (functions/classes) using tree-sitter. Returns complete, syntactically valid code blocks rather than fragmented lines. Use when looking for specific implementations, examples, or references where full context is needed.
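
The expand-to-full-node idea can be illustrated with Python's standard-library `ast` module standing in for tree-sitter. This is a swapped-in technique: the skill itself uses ripgrep and tree-sitter and covers more languages. The sketch assumes the search step has already produced a 1-based line number in a Python file, and `enclosing_node` is a hypothetical helper:

```python
import ast
from typing import Optional

def enclosing_node(source: str, line: int) -> Optional[str]:
    """Return the source of the innermost function or class whose span contains `line` (1-based)."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are the node's first and last lines (Python 3.8+);
            # decorators are not included in this span.
            if node.lineno <= line <= node.end_lineno:
                span = node.end_lineno - node.lineno
                if best is None or span < best.end_lineno - best.lineno:
                    best = node  # a smaller containing span means a more deeply nested node
    return ast.get_source_segment(source, best) if best is not None else None

# Example: a "match" on line 3 expands to the whole method it sits in.
code = "class Greeter:\n    def hello(self):\n        return 'hi'\n"
print(enclosing_node(code, 3))
```

Returning the innermost node is a design choice: a hit inside a method yields the method, not the whole class. Walk outward instead if you want the enclosing class.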

Category: Uncategorized
ID: oaustegard/claude-skills/exploring-codebases

Install this agent skill locally:

pnpm dlx add-skill https://github.com/oaustegard/claude-skills/tree/HEAD/exploring-codebases

exploring-codebases/SKILL.md

Skill Metadata

Name: exploring-codebases

Exploring Codebases

Exploratory code analysis for unfamiliar repositories. This skill is a workflow, not a tool — it orchestrates tree-sitting (structural) and featuring (semantic) into a progressive disclosure sequence.

Dependencies

  • tree-sitting — AST-powered code navigation (structural inventory)
  • featuring — Feature documentation generator (what/why layer)

Set up a virtual environment and install the Python packages:

uv venv /home/claude/.venv 2>/dev/null
uv pip install tree-sitter-language-pack fastmcp --python /home/claude/.venv/bin/python
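
To confirm the install landed in the right interpreter, a quick import check can be run with /home/claude/.venv/bin/python. This is a minimal sketch, assuming the import names of the two packages are `tree_sitter_language_pack` and `fastmcp`; `missing_modules` is a hypothetical helper:

```python
import importlib.util
import sys

# Assumed import names for the tree-sitter-language-pack and fastmcp packages.
REQUIRED = ("tree_sitter_language_pack", "fastmcp")

def missing_modules(names=REQUIRED):
    """Return the subset of top-level module names this interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules()
    status = f"missing {missing}" if missing else "all dependencies present"
    print(f"{sys.executable}: {status}")
```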

Workflow

Phase 1: Structural Inventory (tree-sitting)

Get oriented — what's here, how big, what languages?

cd /mnt/skills/user/tree-sitting/scripts
/home/claude/.venv/bin/python -c "
import sys; sys.path.insert(0, '.')
from engine import cache

stats = cache.scan('/path/to/repo')
print(cache.tree_overview())
"

This prints the directory tree with file counts, symbol counts, and languages per directory. The initial scan takes ~700 ms for a 250-file repo; after that, queries are sub-millisecond.
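
Before or alongside the scan, a rough stdlib-only inventory can sanity-check scope. This is a minimal sketch, not the tree-sitting engine. It groups files by directory and guesses languages from extensions only, so it reports no symbol counts. The `EXT_LANG` map and the `summarize`/`walk_files` helpers are hypothetical:

```python
import os
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Hypothetical extension-to-language map; extend it for your repo.
EXT_LANG = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
            ".go": "Go", ".rs": "Rust", ".java": "Java", ".rb": "Ruby"}

def summarize(paths: Iterable[str]) -> Dict[str, Counter]:
    """Group relative file paths by directory, counting files per guessed language."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for path in paths:
        lang = EXT_LANG.get(os.path.splitext(path)[1])
        if lang:  # files with unrecognized extensions are ignored
            summary[os.path.dirname(path) or "."][lang] += 1
    return dict(summary)

def walk_files(root: str) -> Iterable[str]:
    """Yield file paths relative to root, skipping hidden directories such as .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]  # prune in place
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

For example, `summarize(walk_files('/path/to/repo'))` maps each directory to a Counter of languages. Keeping the summarizing step separate from the directory walk makes it easy to feed in a filtered path list instead.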

Phase 2: Drill Into Structure

Follow what looks interesting. Use tree-sitting queries to build understanding:

/home/claude/.venv/bin/python -c "
import sys; sys.path.insert(0, '/mnt/skills/user/tree-sitting/scripts')
from engine import cache

# Assumes the Phase 1 scan is still cached for this process; if lookups come back empty, call cache.scan('/path/to/repo') first
print(cache.dir_overview('src/core'))       # Files + top symbols in a directory
print(cache.find_symbol('*Handler*'))       # Glob search across codebase
print(cache.file_symbols('src/api/routes.py'))  # Full API of a single file
print(cache.get_source('handle_request'))   # Read a specific implementation
"

Heuristics for what to drill into first:

  • Directories with high symbol counts relative to file counts (dense logic)
  • Entry point patterns: main, cli, app, server, routes, handler
  • Files with many imports (integration points)
  • The root directory's top-level files (often config + entry points)
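
The heuristics above can be turned into a rough priority score. This is a minimal sketch, assuming you already have per-directory file counts, symbol counts, and file names (for example, read off the Phase 1 overview). The `DirStats` type, `ENTRY_HINTS`, and the weights are hypothetical choices, not part of tree-sitting. The import-count heuristic is left out because it needs import data:

```python
from dataclasses import dataclass
from typing import List

# Name fragments that suggest entry points (taken from the heuristics above).
ENTRY_HINTS = ("main", "cli", "app", "server", "routes", "handler")

@dataclass
class DirStats:
    path: str              # directory path relative to the repo root ("." for the root)
    files: int             # number of source files
    symbols: int           # number of functions, classes, etc.
    file_names: List[str]  # file names in this directory

def priority(d: DirStats) -> float:
    """Higher means 'look here first': symbol density plus bonuses for entry points and the root."""
    density = d.symbols / max(d.files, 1)  # guard against empty directories
    # Naive substring match, so e.g. "mapper.py" also counts as "app".
    entry_bonus = sum(any(h in name.lower() for h in ENTRY_HINTS) for name in d.file_names)
    root_bonus = 1 if d.path in ("", ".") else 0  # top-level files: config + entry points
    return density + 2 * entry_bonus + root_bonus

def rank(dirs: List[DirStats]) -> List[str]:
    """Return directory paths ordered from most to least promising."""
    return [d.path for d in sorted(dirs, key=priority, reverse=True)]
```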

Phase 3: Feature Synthesis (featuring)

Once you understand the structure, generate the "what does it DO?" layer:

/home/claude/.venv/bin/python /mnt/skills/user/featuring/scripts/gather.py /path/to/repo \
  --skip tests,.github,node_modules --source-budget 8000

Read the gather output, then synthesize _FEATURES.md following the featuring skill's format. This is the LLM step: identify capabilities, group symbols into features, and write user-facing descriptions.
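
If you drive Phase 3 from Python rather than the shell, the same invocation can be wrapped with subprocess. This minimal sketch uses only the script path and the two flags shown in the command above (`--skip`, `--source-budget`). It assumes gather.py writes its report to stdout. The `gather_cmd`/`run_gather` helpers and the output file name are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

PYTHON = "/home/claude/.venv/bin/python"
GATHER = "/mnt/skills/user/featuring/scripts/gather.py"

def gather_cmd(repo: str, skip: Sequence[str], source_budget: int = 8000) -> List[str]:
    """Build the argv for gather.py using only the flags shown in Phase 3."""
    return [PYTHON, GATHER, repo, "--skip", ",".join(skip),
            "--source-budget", str(source_budget)]

def run_gather(repo: str, skip: Sequence[str], out: str = "gather_output.txt") -> Path:
    """Run gather.py and save its stdout (assumed to hold the report) for the synthesis step."""
    result = subprocess.run(gather_cmd(repo, skip), capture_output=True, text=True, check=True)
    path = Path(out)
    path.write_text(result.stdout)
    return path
```

For example, `run_gather('/path/to/repo', ['tests', '.github', 'node_modules'])` mirrors the shell command above.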

Phase 4: Targeted Deep Dives

With the structural inventory and feature map in hand, use tree-sitting's get_source() to read specific implementations where the feature narrative needs verification or where behavior isn't clear from signatures, and references() to see where a symbol is used.

/home/claude/.venv/bin/python -c "
import sys; sys.path.insert(0, '/mnt/skills/user/tree-sitting/scripts')
from engine import cache

# Read implementations that matter
print(cache.get_source('authenticate'))
print(cache.references('AuthToken'))
"

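To cross-check a references() result, a plain-text search gives a rough upper bound. This is a minimal sketch using Python's `re` with word boundaries. It is a swapped-in technique, not how tree-sitting resolves references, and it also matches comments and strings. `matches_in` and `find_references` are hypothetical helpers:

```python
import os
import re
from typing import Iterator, List, Tuple

def matches_in(text: str, name: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs where `name` appears as a whole word."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [(i, line) for i, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]

def find_references(root: str, name: str,
                    exts: Tuple[str, ...] = (".py",)) -> Iterator[Tuple[str, int, str]]:
    """Walk `root` and yield (path, line_number, line) for whole-word matches of `name`."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]  # skip .git and similar
        for fname in filenames:
            if fname.endswith(exts):
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in matches_in(fh.read(), name):
                        yield path, lineno, line
```

The word boundaries keep a search for AuthToken from also matching a longer identifier such as AuthTokenFactory.
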
When to Use This vs Other Skills

| Situation | Use |
|-----------|-----|
| "I just cloned this, what is it?" | exploring-codebases (this skill) |
| "Where is the retry logic?" | searching-codebases |
| "Find all files matching class.*Error" | searching-codebases |
| "Show me the symbols in auth.py" | tree-sitting directly |
| "Document what this codebase does" | featuring directly |

Exploring is the divergent skill — you don't know what you're looking for yet. Searching is the convergent skill — you know what you want, you need to find it.

Output

The exploration produces understanding, not necessarily files. Its outputs are:

  • _FEATURES.md: top-down feature documentation (via featuring), when a written artifact is warranted
  • Mental model of codebase structure, entry points, and architecture

Scaling

For large repos (>100 files), narrow the scope aggressively: exclude tests, vendored code, generated files, and docs (for example with --skip, as in the Phase 3 gather.py call), and point the initial scan at src/ or the primary source directory rather than the repo root. Expand scope as needed.
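
To check whether a repo crosses the >100-file threshold, and how much a skip list actually trims, a quick count helps. This is a minimal sketch. The `kept`/`repo_files` helpers are hypothetical. The default skip set mirrors the Phase 3 --skip example plus .git. It counts every file, not just source files:

```python
import os
from pathlib import PurePath
from typing import Collection, Iterable, List

# Mirrors the Phase 3 --skip example, plus .git (hypothetical default).
DEFAULT_SKIP = frozenset({"tests", ".github", "node_modules", ".git"})

def kept(paths: Iterable[str], skip: Collection[str] = DEFAULT_SKIP) -> List[str]:
    """Drop paths that sit under any directory whose name is in `skip`."""
    # parts[:-1] checks only directory components, never the file name itself.
    return [p for p in paths if not any(part in skip for part in PurePath(p).parts[:-1])]

def repo_files(root: str) -> List[str]:
    """All file paths under root, relative to root."""
    out: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        out.extend(os.path.relpath(os.path.join(dirpath, f), root) for f in filenames)
    return out
```

For example, `len(kept(repo_files('/path/to/repo'))) > 100` flags a repo as large under the skip list, and comparing it with `len(repo_files(...))` shows how much the list removed.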

For monorepos, treat each package/service as a separate exploration. Generate per-subsystem _FEATURES.md files linked from a root index.
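
For the monorepo case, the root index can be generated from whatever per-subsystem _FEATURES.md files exist. This is a minimal sketch, assuming each package directory holds its own _FEATURES.md and that the index is a Markdown list of relative links. `build_index`, `find_feature_files`, and the index layout are hypothetical, not the featuring skill's format:

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List

def find_feature_files(root: str) -> List[str]:
    """Relative paths of every _FEATURES.md below root, excluding one at the root itself."""
    base = Path(root)
    return [p.relative_to(base).as_posix()
            for p in base.rglob("_FEATURES.md") if p.parent != base]

def build_index(feature_files: Iterable[str], title: str = "Features by subsystem") -> str:
    """Render a Markdown index linking each subsystem's _FEATURES.md (paths relative to repo root)."""
    lines = [f"# {title}", ""]
    for path in sorted(feature_files):
        subsystem = PurePosixPath(path).parent.as_posix()
        lines.append(f"- [{subsystem}]({path})")
    return "\n".join(lines) + "\n"
```

For example, `build_index(find_feature_files('/path/to/monorepo'))` returns the index text, which you can write to whatever root file your repo uses.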