Extern Researcher
Overview
Enable agents to efficiently study external open-source repositories by leveraging existing research first, using temporary workspaces for clones, and persisting findings to the global thoughts system.
When to Use
- When needing to study patterns or implementations from external repositories
- When checking if a repository has already been researched
- When cloning an external repo for temporary study
- When persisting research findings about external code
- When cleaning up temporary workspace after research is complete
Key Concepts
Research-Centric Model
The catalog tracks research studies, not cloned repositories. Clones are temporary workspaces; research is the persistent artifact.
Directory Structure
The catalog lives inside the org-global shared/ directory so that thoughts-cli can scan it,
and it syncs across all projects via the org-global git repo.
{thoughtsHome}/{orgGlobal.path}/shared/extern/ # Persistent research storage (canonical)
├── catalog.md # Global research catalog
└── repos/{org-repo}/ # Research documents per repository
{any-project}/thoughts/global/org/shared/extern/ # Same location via project symlink
{any-project}/.extern/ # Temporary workspace (disposable)
└── {org-repo}/ # Cloned repos for study
The script reads ~/.config/thoughts/config.json to resolve thoughtsHome and
orgGlobal.path automatically. Default: ~/thoughts/global/shared/extern/.
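As a rough illustration of that resolution step, the sketch below maps a parsed config onto the extern directory. The nested `orgGlobal` → `path` layout and the fallback values (`~/thoughts` and `global`) are assumptions inferred from the default path above, and `resolve_extern_dir` and `load_config` are hypothetical helpers, not the script's actual code.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "thoughts" / "config.json"


def resolve_extern_dir(config: dict) -> Path:
    """Map a parsed config dict onto the extern research directory."""
    # Fallbacks are assumptions chosen to reproduce the documented default.
    thoughts_home = Path(config.get("thoughtsHome", "~/thoughts")).expanduser()
    org_global_path = config.get("orgGlobal", {}).get("path", "global")
    return thoughts_home / org_global_path / "shared" / "extern"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the parsed config, or an empty dict if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `resolve_extern_dir(load_config())` on a machine without a config file would then yield the documented default location.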
Catalog Format
The catalog uses markdown with YAML frontmatter:
---
type: extern-research-catalog
version: 1.0
last_updated: YYYY-MM-DD
total_repos_studied: N
total_studies: N
repos:
  - name: "org/repo"
    url: "https://github.com/org/repo"
    first_studied: "YYYY-MM-DD"
    study_count: N
    topics: ["topic1", "topic2"]
---
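To read the catalog programmatically, the frontmatter first has to be separated from the markdown body. The following is a minimal sketch of that split using only the standard library; `split_frontmatter` is a hypothetical helper and is not taken from catalog.py.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a markdown document into (frontmatter, body).

    Returns an empty frontmatter string when the document does not
    start with a '---' delimiter line or the block is never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return frontmatter, body
    return "", text
```

The returned frontmatter string could then be handed to a YAML parser (for example PyYAML's `yaml.safe_load`, assuming that dependency is available) to obtain the `repos` list.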
Scripts
Execute via justfile or directly with uv:
Via Justfile (Recommended)
just -f {base_dir}/justfile <recipe> [args...]
| Recipe | Arguments | Description |
|--------|-----------|-------------|
| list | | List all research in catalog |
| search | term | Search catalog by repo or topic |
| stats | | Show catalog statistics |
| add-study | repo url topic document context | Add a new study to catalog |
Direct Execution
uv run {base_dir}/scripts/catalog.py <command> [args...]
| Command | Arguments | Description |
|---------|-----------|-------------|
| list | | List all research studies |
| search | <term> | Search by repo name or topic |
| stats | | Show catalog statistics |
| add-study | --repo --url --topic --document --context | Add new study |
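When an agent drives the script from Python rather than a shell, the direct-execution form can be assembled as an argument list. This is a sketch only: it assumes each flag in the table takes exactly one value, and `build_add_study_command` and `add_study` are hypothetical wrappers.

```python
import subprocess


def build_add_study_command(base_dir, repo, url, topic, document, context):
    """Assemble the argv list for `catalog.py add-study`."""
    return [
        "uv", "run", f"{base_dir}/scripts/catalog.py", "add-study",
        "--repo", repo,
        "--url", url,
        "--topic", topic,
        "--document", document,
        "--context", context,
    ]


def add_study(base_dir, **fields):
    """Run add-study and raise CalledProcessError on a non-zero exit."""
    command = build_add_study_command(base_dir, **fields)
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing a list instead of a single shell string avoids quoting problems when the topic or context contains spaces or quotes.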
Core Workflow
Phase 1: Discovery (ALWAYS FIRST)
Before cloning any repository:
- Read the global catalog: {thoughtsHome}/global/shared/extern/catalog.md
- Search for the repo by name, URL, or topic
- If found:
  - Read existing research documents
  - Evaluate: Is this sufficient for current needs?
    - If YES: Use existing research, skip cloning
    - If NO: Proceed to Phase 2, noting what gaps exist
- If not found: Proceed to Phase 2
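The discovery decision above can be sketched as a lookup over the catalog's `repos` entries followed by a branch. The entry fields mirror the catalog format; the function names and return strings are hypothetical, and the "sufficient" judgment is left to the caller because it depends on the current research question.

```python
def find_repo_entries(repos: list[dict], term: str) -> list[dict]:
    """Return catalog entries whose name, URL, or topics contain `term`."""
    needle = term.lower()
    matches = []
    for entry in repos:
        haystack = [entry.get("name", ""), entry.get("url", "")]
        haystack.extend(entry.get("topics", []))
        if any(needle in value.lower() for value in haystack):
            matches.append(entry)
    return matches


def next_step(matches: list[dict], sufficient: bool) -> str:
    """Encode the Phase 1 decision: reuse research or move on to Phase 2."""
    if matches and sufficient:
        return "use-existing-research"
    return "proceed-to-phase-2"
```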
Phase 2: Workspace Initialization
If cloning is needed:
- Check if .extern/ exists in the current project
- If not, create it:
  mkdir -p .extern
- MUST copy the AGENTS.md template to the workspace:
  cp {base_dir}/assets/workspace-agents.md .extern/AGENTS.md
  This provides agent guidance for future sessions working in this directory.
- Ensure .gitignore includes .extern/
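For agents that prefer a single idempotent step over separate shell commands, the initialization above might look like the sketch below. `init_workspace` is a hypothetical helper; the template path is passed in by the caller rather than assumed.

```python
import shutil
from pathlib import Path


def init_workspace(project_root: Path, template: Path) -> Path:
    """Create .extern/, copy the AGENTS.md template, and gitignore the workspace."""
    workspace = project_root / ".extern"
    workspace.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p

    shutil.copyfile(template, workspace / "AGENTS.md")

    gitignore = project_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    entries = {line.strip() for line in existing.splitlines()}
    if ".extern/" not in entries and ".extern" not in entries:
        # Keep the existing content and make sure the new entry starts on its own line.
        separator = "" if existing == "" or existing.endswith("\n") else "\n"
        gitignore.write_text(existing + separator + ".extern/\n", encoding="utf-8")
    return workspace
```

Running it twice leaves a single `.extern/` entry in .gitignore, so it is safe to call at the start of every session.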
Phase 3: Clone Repository
- Parse the input (URL, clone command, org/repo format)
- Derive directory name: {org}-{repo} (lowercase, kebab-case)
  - https://github.com/facebook/react → facebook-react
  - https://github.com/vercel/next.js → vercel-next-js
- Clone with shallow depth (unless full history needed):
  git clone --single-branch --depth 1 {url} .extern/{org-repo}/
- Verify success
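The naming rule and clone command above can be sketched as follows. The handling of SSH URLs, `.git` suffixes, and full clone commands is an assumption about what "parse the input" covers, and both function names are hypothetical.

```python
import re


def derive_dir_name(source: str) -> str:
    """Turn a URL, clone command, or org/repo string into '{org}-{repo}'."""
    # Pick the first whitespace-separated token that contains a slash, so a
    # full "git clone <url>" command works as well as a bare URL.
    token = next((part for part in source.split() if "/" in part), None)
    if token is None:
        raise ValueError(f"cannot find org/repo in {source!r}")
    token = token.rstrip("/")
    if token.endswith(".git"):
        token = token[: -len(".git")]
    # Normalize the SSH form git@host:org/repo to a slash-separated path.
    token = token.replace(":", "/")
    org, repo = token.split("/")[-2:]
    # Lowercase, then collapse anything that is not a letter or digit to a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", f"{org}-{repo}".lower())
    return slug.strip("-")


def clone_command(url: str, dir_name: str, shallow: bool = True) -> list[str]:
    """Build the git clone argv; drop --depth when full history is needed."""
    command = ["git", "clone", "--single-branch"]
    if shallow:
        command += ["--depth", "1"]
    return command + [url, f".extern/{dir_name}/"]
```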
Phase 4: Research
- Focus on the specific question/topic
- Use appropriate subagents:
  - codebase-pattern-finder - Find implementations, usage examples
  - codebase-analyzer - Deep dive on specific components
  - codebase-locator - Find files by purpose/feature
- Document findings with file:line references
- Note connections to current project needs
Phase 5: Persist Research
- Create research document:
  - Location: {thoughtsHome}/global/shared/extern/repos/{org-repo}/
  - Filename: {YYYY-MM-DD}_{topic-slug}.md
  - Use the research document template below
- MUST update global catalog using the script (not manual file edits):
  just -f {base_dir}/justfile add-study \
    "org/repo" \
    "https://github.com/org/repo" \
    "Topic studied" \
    "repos/org-repo/YYYY-MM-DD_topic.md" \
    "Why this was studied"
  The script ensures consistent catalog formatting and validates entries.
- Sync thoughts if configured:
  thoughts sync
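The location and filename convention can be sketched as a small path builder. The slug rule used here (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, since the document does not define `{topic-slug}` precisely; both helpers are hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def topic_slug(topic: str) -> str:
    """Lowercase the topic and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def research_doc_path(extern_dir: Path, org_repo: str, topic: str, day: date) -> Path:
    """Build repos/{org-repo}/{YYYY-MM-DD}_{topic-slug}.md under extern_dir."""
    filename = f"{day.isoformat()}_{topic_slug(topic)}.md"
    return extern_dir / "repos" / org_repo / filename
```

The path relative to the extern directory (`repos/{org-repo}/...`) has the same shape as the document argument passed to add-study.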
Phase 6: Cleanup (Optional)
When workspace is no longer needed:
- Confirm with user before deletion
- Remove .extern/ directory:
  rm -rf .extern/
- Research remains in {thoughtsHome}/global/shared/extern/
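A guarded version of the cleanup step, sketched in Python, makes the confirmation requirement explicit. `cleanup_workspace` is a hypothetical helper; how the confirmation is obtained from the user is left to the caller.

```python
import shutil
from pathlib import Path


def cleanup_workspace(project_root: Path, confirmed: bool) -> bool:
    """Delete {project_root}/.extern only after explicit confirmation.

    Returns True if the directory was removed, False otherwise.
    """
    workspace = project_root / ".extern"
    if not confirmed or not workspace.is_dir():
        return False
    shutil.rmtree(workspace)
    return True
```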
Research Document Template
When creating research documents, use this structure:
---
date: {ISO timestamp}
repo: "{org}/{repo}"
url: "{full URL}"
topic: "{research topic}"
context: "{why this was studied}"
project: "{project that prompted this}"
tags: [extern-research, {domain-tags}]
---
# {Topic}: {Repo Name}
## Research Question
{What we set out to learn}
## Key Findings
### Finding 1: {Title}
{Description with code examples and file:line references}
### Finding 2: {Title}
{...}
## Applicable Patterns
{How these findings could apply to our work}
## References
- `{file}:{line}` - {description}
Suitable Subagents
| Agent | Use Case |
|-------|----------|
| codebase-pattern-finder | Find implementations, usage examples |
| codebase-analyzer | Deep dive on specific components |
| codebase-locator | Find files by purpose/feature |
| web-search-researcher | Additional context about the project |
Important Constraints
- ALWAYS check catalog first - Never clone without checking existing research
- Workspace is temporary - Do not store valuable data in .extern/
- Research is permanent - Always persist to {thoughtsHome}/global/shared/extern/
- Update catalog - Every study must update the global catalog
- Respect clone depth - Use shallow clones unless full history needed
- Use org-repo naming - Include organization to avoid conflicts
Error Handling
- Catalog not found → Auto-created on first add-study at {thoughtsHome}/global/shared/extern/catalog.md
- Clone fails → Check URL format, network, permissions; report to user
- Research exists but stale → Proceed with new research, reference existing findings
- Workspace already exists → Reuse existing .extern/, check if repo is already cloned
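For the last case, one simple way to check whether a repo is already cloned is to look for its git metadata inside the workspace. This is a sketch under the assumption that a usable clone always contains a `.git` entry; `is_already_cloned` is a hypothetical helper.

```python
from pathlib import Path


def is_already_cloned(project_root: Path, dir_name: str) -> bool:
    """Return True if .extern/{dir_name} already holds a git checkout."""
    return (project_root / ".extern" / dir_name / ".git").exists()
```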