# TiDB Optimizer Bugfix Skill

Fix TiDB optimizer bugs with minimal, reviewable changes and strong regression protection.
## When to use

Use this skill for any optimizer/planner correctness bug, including plan cacheability, logical/physical rewrite rules, cost-based selection regressions, and SQL compatibility edge cases.
## Non-negotiable rules
- Keep the fix minimal.
  - Only change logic that is on the proven root-cause path.
  - Do not mix unrelated cleanup, refactoring, or naming churn.
- Cover language features impacted by the bug and lock behavior with tests.
  - Expand coverage across affected SQL constructs (for example: subquery, CTE, set operator, derived table, partition pruning mode, prepare vs non-prepare).
  - Prefer behavior-focused assertions over implementation details.
- Test design MUST follow `tidb-test-guidelines`.
  - Read and apply `.agents/skills/tidb-test-guidelines/SKILL.md` in the TiDB repo.
  - Prefer appending to existing test suites/files.
  - Follow placement, naming, fixture reuse, and shard-count guidance from the guideline.
  - When adding new tests, MUST run `make bazel_prepare`.
- Persist debugging knowledge for future bugfixes.
  - Summarize issues encountered, observed symptoms, validated root cause, and why the final fix is minimal.
  - Add or append a note under TiDB `docs/note` in the relevant subsystem path.
  - Include the key regression SQL/test shape and validation commands in the note.
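The coverage rule above can be sketched as a family of SQL variants around one failing shape. The table `t`, column `a`, and the predicate below are hypothetical placeholders, not taken from a real bug:

```sql
-- Hypothetical failing shape: a predicate the optimizer mishandles.
-- Base case (the exact reported shape):
SELECT a FROM t WHERE a > 1 AND a < 1;

-- Same shape behind a derived table:
SELECT a FROM (SELECT a FROM t) d WHERE a > 1 AND a < 1;

-- Same shape behind a CTE:
WITH c AS (SELECT a FROM t) SELECT a FROM c WHERE a > 1 AND a < 1;

-- Same shape under a set operator:
SELECT a FROM t WHERE a > 1 AND a < 1
UNION ALL
SELECT a FROM t WHERE a > 1 AND a < 1;
```

Each variant should assert the same expected rows (or the same plan property), so that a fix in one rewrite path cannot silently regress a sibling path.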
## Required inputs
- A concrete bug case (issue link, failing SQL, or failing test).
- Target branch and module scope.
- Original PR link/number when the task is PR-driven and you need the related-issue bot sweep.
If any of these is missing, ask once, then proceed with explicit assumptions.
## Workflow
- **Reproduce and localize**
  - Reproduce with the smallest SQL/test case.
  - Locate ownership paths with `rg` and existing tests before writing code.
  - Build 1-3 root-cause hypotheses and rank them by likelihood.
- **Prove the hypothesis before fixing**
  - Confirm the highest-probability hypothesis with targeted inspection and/or a minimal failing test.
  - If evidence rejects it, move to the next hypothesis.
- **Implement the smallest defensible fix**
  - Change only the necessary branch/condition/state transition.
  - Keep intent explicit and avoid broad reordering unless required for correctness.
- **Mandatory self-check loop**

  ```text
  loop {
    ask: Is the fix minimal?
    answer: remove any code not required for the proven root cause.
    ask: Does every added line have meaning, and is it covered by a test case?
    answer: map each added line to one behavior assertion; if no mapping exists, remove it or add a test.
    verify:
      - reproduce the failure before the fix (or document why pre-fix replay is infeasible)
      - add/extend tests (prefer existing suites)
      - run targeted tests to prove the hypothesis
    if all checks pass: break
  }
  ```
- **Related-issue sweep for PR-driven fixes**
  - If there is an original PR, run:
    `curl -X POST https://tiara.hawkingrei.com/issues/trigger-reply/<issue_id>`
  - Replace `<issue_id>` with the ID of the issue being fixed (typically linked in the original PR).
  - After the request completes, inspect the bot reply on the original PR and collect the related issues it points out.
  - Before closing the task, check whether the current fix also resolves any of those related issues.
  - If a related issue is also fixed, add regression coverage or explicit validation for that SQL shape when practical.
  - If a related issue is not covered, call it out as remaining follow-up work instead of silently assuming it is fixed.
- **Validation and closure**
  - Confirm old behavior is rejected and new behavior is accepted.
  - Keep validation scope minimal but sufficient to prove no regression in nearby semantics.
- **Write reusable notes**
  - Record debugging pitfalls and verified behavior in `docs/note`.
  - Prefer appending to an existing note file over creating a new top-level note file.
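The reproduce-and-localize step above usually starts from a shrunken repro plus its plan, captured as evidence before touching code. The schema and query here are hypothetical placeholders:

```sql
-- Hypothetical minimal repro: shrink the original query until the wrong
-- result (or wrong plan) still appears, with the smallest schema possible.
CREATE TABLE t (a INT, b INT, KEY ia (a));
INSERT INTO t VALUES (1, 1), (2, 2);

-- Smallest query that still shows the bad behavior:
SELECT a FROM t WHERE a > 1 AND a < 1;

-- Capture the chosen plan before and after the fix for evidence:
EXPLAIN FORMAT = 'brief' SELECT a FROM t WHERE a > 1 AND a < 1;
```

Keeping the repro this small makes it directly reusable as the regression case later.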
## Test strategy checklist
- Start from existing optimizer/planner tests; append cases first.
- Include regression case for the exact failing SQL shape.
- Add one nearby semantic variant to prevent trivial overfitting.
- If the bug affects the prepare path, verify both prepare and non-prepare semantics when relevant.
- Keep assertions stable and deterministic.
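For the prepare-path item, one way to lock both paths is to assert the same result through direct execution and through `PREPARE`/`EXECUTE`. The table and predicate below are hypothetical placeholders:

```sql
-- Hypothetical setup for a prepare vs non-prepare regression pair.
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1), (2), (3);

-- Direct (non-prepare) path:
SELECT a FROM t WHERE a > 1 AND a < 3;

-- Prepared path with the same shape; both must return the same rows.
-- Running EXECUTE twice also exercises the plan-cache path.
PREPARE s FROM 'SELECT a FROM t WHERE a > ? AND a < ?';
SET @l = 1, @r = 3;
EXECUTE s USING @l, @r;
EXECUTE s USING @l, @r;
DEALLOCATE PREPARE s;
```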
## Command recipes (TiDB repo)
Use targeted commands and keep evidence explicit.
```shell
# Locate code and nearby tests
rg -n "<keyword_or_symbol>" pkg/planner pkg/expression pkg/executor

# Typical targeted optimizer/planner tests
go test -run <TestName> -tags=intest,deadlock ./pkg/planner/...

# If the package uses failpoints, decide via search first
rg -n "failpoint\.Inject|failpoint\.Enable" pkg/planner/<subpkg>

# Trigger the PR bot to surface related issues, then inspect the original PR reply
curl -X POST https://tiara.hawkingrei.com/issues/trigger-reply/<issue_id>

# MUST run bazel prepare after adding new tests (also run it for other repository-required cases)
make bazel_prepare
```
## Reporting contract
When finishing, report:
- Files changed and why each change is necessary.
- Risk check: correctness, compatibility, performance.
- Exact verification commands used.
- Whether the original PR bot surfaced related issues, and which ones were also fixed or left for follow-up.
- Note path added/updated under `docs/note` (or why the note update was skipped).
- What was not verified locally.
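One possible shape for the closing report that satisfies the contract above; the section names and the `TestXxx` command are suggestions, not a required format:

```markdown
## Bugfix report (template)

- **Files changed**: `pkg/planner/...` — why each change is necessary.
- **Risk check**: correctness / compatibility / performance.
- **Verification commands**:
  - `go test -run TestXxx -tags=intest,deadlock ./pkg/planner/...`
- **Related issues**: bot-surfaced issues; which were also fixed, which remain follow-up.
- **Note**: path added/updated under `docs/note`, or why the note update was skipped.
- **Not verified locally**: ...
```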