Repo Scanning Process Skill

Step-by-step procedure for scanning GitHub repos to gather corroborating evidence for bug clusters and assigning an evidence tier to each finding.

Overview

Loaded by the repo-scanner agent inside the x-bug-triage plugin. Walks each clustered bug through a fixed evidence-gathering pipeline against the up-to-three repos most likely to host the bug: matching open/recent issues, recent commits in the impact window, affected code paths, and recent deploys correlated to the cluster's first_seen timestamp. Each finding is graded against the evidence-tier policy and recorded as cluster evidence.

Prerequisites

Cluster table populated upstream by the bug-clustering stage
surface_repo_mapping configured for every product_surface present in clusters
Triage MCP server reachable for mcp__triage__search_issues, mcp__triage__inspect_recent_commits, mcp__triage__inspect_code_paths, and mcp__triage__check_recent_deploys
GitHub access tokens with read scope on the target repos

Instructions

Step 1: Select Repos

For each cluster:

Look up repos from surface_repo_mapping using the cluster's product_surface
Cap at top 3 repos per cluster (hard limit — never scan more)
If no mapping exists, note it as a warning and skip
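
The selection rule above can be sketched as a small helper. This is an illustration only: the function name, the dict shape of surface_repo_mapping, and the assumption that repos are listed most-likely-first are not taken from the plugin.

```python
from typing import Dict, List, Optional, Tuple

MAX_REPOS_PER_CLUSTER = 3  # hard limit from Step 1


def select_repos(
    product_surface: str,
    surface_repo_mapping: Dict[str, List[str]],
) -> Tuple[List[str], Optional[str]]:
    """Return (repos_to_scan, warning).

    Assumes (hypothetically) that the mapping lists repos most-likely-first,
    so truncating the list keeps the top candidates.
    """
    repos = surface_repo_mapping.get(product_surface)
    if not repos:
        # No mapping: warn and skip rather than guessing a repo.
        return [], f"no surface_repo_mapping entry for {product_surface!r}; scan skipped"
    return repos[:MAX_REPOS_PER_CLUSTER], None


mapping = {"web": ["org/a", "org/b", "org/c", "org/d"]}
repos, warning = select_repos("web", mapping)
```

Returning the warning alongside the repo list keeps the "warn and skip" path explicit for the caller instead of raising.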

Step 2: Search Issues

For each repo, call mcp__triage__search_issues with the cluster's symptoms and error_strings:

Match error strings against open/recent issues
Assign evidence tier based on match confidence

Step 3: Inspect Recent Commits

Call mcp__triage__inspect_recent_commits for each repo:

Use a 7-day lookback window ending at the current date
Filter by affected paths if known from the cluster's feature_area
Look for commits that touch relevant code paths
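
A minimal sketch of the window and path filter, applied client-side to commit records. The record shape (a dict with committed_at and files keys) and the prefix-based path match are assumptions for illustration; the actual output of mcp__triage__inspect_recent_commits is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

WINDOW_DAYS = 7  # lookback window from Step 3


def filter_commits(
    commits: List[Dict],
    now: datetime,
    affected_prefixes: List[str],
) -> List[Dict]:
    """Keep commits inside the 7-day window that touch an affected path.

    With no known affected paths, every commit in the window is kept.
    """
    window_start = now - timedelta(days=WINDOW_DAYS)
    kept = []
    for commit in commits:
        if not (window_start <= commit["committed_at"] <= now):
            continue
        if affected_prefixes and not any(
            path.startswith(prefix)
            for path in commit["files"]
            for prefix in affected_prefixes
        ):
            continue
        kept.append(commit)
    return kept


now = datetime(2024, 5, 10, tzinfo=timezone.utc)
commits = [
    {"sha": "a1", "committed_at": datetime(2024, 5, 8, tzinfo=timezone.utc), "files": ["src/checkout/cart.py"]},
    {"sha": "b2", "committed_at": datetime(2024, 5, 9, tzinfo=timezone.utc), "files": ["docs/readme.md"]},
    {"sha": "c3", "committed_at": datetime(2024, 5, 1, tzinfo=timezone.utc), "files": ["src/checkout/cart.py"]},
]
relevant = filter_commits(commits, now, ["src/checkout/"])
```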

Step 4: Inspect Code Paths

Call mcp__triage__inspect_code_paths with the cluster's surface and feature_area:

Identify likely affected code paths
Check for recent changes or known fragile areas

Step 5: Check Recent Deploys

Call mcp__triage__check_recent_deploys for each repo:

Correlate deploy/release timing with cluster's first_seen timestamp
Recent deploy near first_seen is a stronger signal
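
One way to make the correlation concrete is to find the deploy that most closely precedes first_seen and report the gap. This sketch assumes that only deploys at or before first_seen are candidates; that rule and the function name are illustrative and not part of the documented policy.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def closest_prior_deploy(
    deploy_times: List[datetime],
    first_seen: datetime,
) -> Optional[Tuple[datetime, timedelta]]:
    """Return (deploy_time, gap) for the latest deploy at or before first_seen.

    Returns None when no deploy precedes first_seen. A smaller gap suggests
    a stronger signal; how the gap maps to a confidence value is left open.
    """
    prior = [t for t in deploy_times if t <= first_seen]
    if not prior:
        return None
    latest = max(prior)
    return latest, first_seen - latest


utc = timezone.utc
deploys = [
    datetime(2024, 5, 1, 12, 0, tzinfo=utc),
    datetime(2024, 5, 3, 9, 0, tzinfo=utc),
    datetime(2024, 5, 4, 18, 0, tzinfo=utc),
]
first_seen = datetime(2024, 5, 3, 11, 30, tzinfo=utc)
match = closest_prior_deploy(deploys, first_seen)
```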

Step 6: Assign Evidence Tiers

For each piece of evidence, assign a tier:

| Tier | Name | Criteria |
|------|------|----------|
| 1 | Exact | issue_match at >=0.9 confidence |
| 2 | Strong | issue_match >=0.7, recent_commit >=0.8, affected_path >=0.7, recent_deploy >=0.8 |
| 3 | Moderate | Lower confidence matches, sibling_failure |
| 4 | Weak | external_dependency, heuristic proximity |
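
The tier table can be read as a per-finding classifier. The sketch below is one possible reading, with several assumptions: the Tier 2 thresholds apply independently per evidence kind, the kind name heuristic_proximity is a made-up identifier for "heuristic proximity", and unrecognized kinds fall through to Tier 4. The evidence-policy reference remains the authority.

```python
# Per-kind minimum confidence for Tier 2, copied from the table.
TIER2_THRESHOLDS = {
    "issue_match": 0.7,
    "recent_commit": 0.8,
    "affected_path": 0.7,
    "recent_deploy": 0.8,
}
TIER1_ISSUE_MATCH = 0.9
# "heuristic_proximity" is a hypothetical identifier for the table's
# "heuristic proximity" entry.
TIER4_KINDS = {"external_dependency", "heuristic_proximity"}


def assign_tier(kind: str, confidence: float) -> int:
    """Map one piece of evidence to a tier (1 = Exact ... 4 = Weak)."""
    if kind == "issue_match" and confidence >= TIER1_ISSUE_MATCH:
        return 1
    if kind in TIER4_KINDS:
        return 4
    threshold = TIER2_THRESHOLDS.get(kind)
    if threshold is not None:
        # Known kind: Tier 2 at or above its threshold, else a
        # lower-confidence match (Tier 3).
        return 2 if confidence >= threshold else 3
    if kind == "sibling_failure":
        return 3
    return 4  # assumption: unknown kinds are treated as weak
```

For example, an issue_match at 0.75 confidence lands in Tier 2, while a recent_commit at the same confidence falls to Tier 3 because its threshold is 0.8.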

Step 7: Handle Degradation

If a repo is inaccessible or an API call fails:

Log a degraded scan result with the error reason
Continue scanning remaining repos — never abort the whole scan
Include degradation warnings in output
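
The degradation rule amounts to isolating failures per repo. A minimal sketch, assuming a hypothetical scan_one callable that returns a list of findings or raises on failure; the warning dict shape is also illustrative.

```python
from typing import Callable, Dict, List, Tuple


def scan_repos(
    repos: List[str],
    scan_one: Callable[[str], List[Dict]],
) -> Tuple[List[Dict], List[Dict]]:
    """Scan each repo independently and return (findings, warnings).

    A failure in one repo is recorded as a degraded result and the loop
    moves on, so the scan as a whole never aborts.
    """
    findings: List[Dict] = []
    warnings: List[Dict] = []
    for repo in repos:
        try:
            findings.extend(scan_one(repo))
        except Exception as exc:  # broad on purpose: any failure degrades this repo only
            warnings.append({"repo": repo, "status": "degraded", "reason": str(exc)})
    return findings, warnings


def fake_scan(repo: str) -> List[Dict]:
    # Stand-in for the real per-repo scan; "org/b" simulates an inaccessible repo.
    if repo == "org/b":
        raise PermissionError("403 forbidden")
    return [{"repo": repo, "kind": "issue_match"}]


findings, warnings = scan_repos(["org/a", "org/b", "org/c"], fake_scan)
```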

Output

cluster_evidence rows tagged with tier (1–4), source kind, repo, and finding link
cluster_scan_warnings rows for any repo skipped or degraded during scanning
Updated cluster evidence_summary field with per-tier counts
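
A sketch of how the per-tier counts might be derived from the tiers of a cluster's evidence rows. The tier_N key names and the dict representation are assumptions; the "empty" value follows the rule stated under Error Handling.

```python
from collections import Counter
from typing import Dict, List, Union


def summarize_evidence(tiers: List[int]) -> Union[Dict[str, int], str]:
    """Build an evidence_summary value from the tiers of a cluster's evidence rows."""
    if not tiers:
        return "empty"  # all signal sources came back empty
    counts = Counter(tiers)
    # Always emit all four tiers so downstream consumers see explicit zeros.
    return {f"tier_{tier}": counts.get(tier, 0) for tier in (1, 2, 3, 4)}


summary = summarize_evidence([1, 2, 2, 4])
```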

Error Handling

Missing surface→repo mapping: warn, skip the cluster's scan, proceed with remaining clusters
GitHub API rate limit hit: pause and resume, or degrade to "rate_limited" warning if budget exhausted
Single tool call failure: capture error reason, continue to next step, never abort a multi-step scan
All four signal sources empty for a cluster: record evidence_summary="empty" — downstream stages still run
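
The rate-limit rule (pause and resume, then degrade once the budget is spent) could look like the following. RateLimited, its retry_after_s attribute, and the wait budget in seconds are all hypothetical; the real tool may signal rate limits differently.

```python
import time
from typing import Callable, Optional, Tuple


class RateLimited(Exception):
    """Hypothetical error raised when the API reports a rate limit."""

    def __init__(self, retry_after_s: float):
        super().__init__(f"rate limited, retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


def call_with_budget(
    call: Callable[[], object],
    budget_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[object], Optional[str]]:
    """Retry a call across rate limits until the total wait would exceed budget_s.

    Returns (result, None) on success or (None, "rate_limited") once the
    budget is exhausted.
    """
    waited = 0.0
    while True:
        try:
            return call(), None
        except RateLimited as exc:
            wait = max(exc.retry_after_s, 1.0)  # floor guarantees the loop terminates
            if waited + wait > budget_s:
                return None, "rate_limited"
            sleep(wait)
            waited += wait


attempts = {"n": 0}


def flaky() -> str:
    # Fails twice with a rate limit, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited(10)
    return "ok"


slept = []
result, warning = call_with_budget(flaky, budget_s=30, sleep=slept.append)
```

Passing sleep as a parameter keeps the sketch testable without real delays.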

Examples

Triggered automatically after owner-routing for each clustered bug. Typical output for a 12-cluster batch scanned against 3 repos each: "12 clusters scanned, 7 with Tier 1 evidence, 3 with Tier 2 only, 2 with weak evidence only (Tier 4). 1 repo skipped (rate limit)."

Resources

Load evidence tier definitions for proper tier assignment:

!cat skills/x-bug-triage/references/evidence-policy.md
