Cleaning Commit History
Clean up messy git histories into logical commit sequences that are easy to review and maintain.
VCS Detection
First, check which VCS to use:
jj root >/dev/null 2>&1 && echo "USE_JJ=true" || echo "USE_JJ=false"
If jj is available, prefer jj commands. When given -m, they run without opening an editor, and the operation log (oplog) provides automatic safety, so no manual backup branch is needed.
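The same check can be wrapped in a reusable function that also reports when neither VCS applies. This is a sketch: it assumes a colocated repo (both .jj and .git present) should be treated as jj, and the name detect_vcs is illustrative.

```shell
# Print "jj", "git", or "none" for the current directory.
# Assumption: a colocated jj+git repo is reported as jj, matching the
# preference stated above.
detect_vcs() {
  if command -v jj >/dev/null 2>&1 && jj root >/dev/null 2>&1; then
    echo jj
  elif git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo git
  else
    echo none
  fi
}

# Usage: VCS=$(detect_vcs)
```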
Operating Procedure
Phase 0: Safety
Git: Create a backup branch before any surgery:
CURRENT_BRANCH=$(git branch --show-current)
git branch "${CURRENT_BRANCH}-backup"
jj: Not needed -- oplog provides safety. Note the current operation ID with jj op log -n 1.
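The Git safety step can also be written as a function that derives the branch name itself. The "-backup" suffix follows the text. The refusal to run on a detached HEAD or to overwrite an existing backup is this sketch's own choice, and make_backup is a hypothetical name.

```shell
# Create <current-branch>-backup at HEAD and print its name.
# Refuses on a detached HEAD and never overwrites an existing backup.
make_backup() {
  local current backup
  current=$(git symbolic-ref --quiet --short HEAD) || {
    echo "make_backup: detached HEAD, check out a branch first" >&2
    return 1
  }
  backup="${current}-backup"
  if git show-ref --verify --quiet "refs/heads/$backup"; then
    echo "make_backup: $backup already exists" >&2
    return 1
  fi
  git branch "$backup" && echo "$backup"
}
```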
Phase 1: Inventory
- Determine the base branch (main, master, or dev)
- Use git merge-base (or jj log) to find the comparison point, $BASE
- Inventory all feature-only changes with git log --oneline $BASE..$FEATURE_BRANCH
- Note large files, generated paths, vendored code, and migrations
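The inventory steps above can be sketched as two shell functions. The candidate branch names come from the text. The path patterns and the 1 MiB "large file" threshold are assumptions to tune per repository, and the function names are hypothetical.

```shell
# Print the first local base branch that exists: main, master, or dev.
detect_base_branch() {
  local b
  for b in main master dev; do
    if git show-ref --verify --quiet "refs/heads/$b"; then
      echo "$b"
      return 0
    fi
  done
  return 1
}

# Print the merge base, the feature-only commits, and paths worth noting.
inventory() {
  local feature=${1:-HEAD} base_branch base
  base_branch=$(detect_base_branch) || { echo "no main/master/dev branch" >&2; return 1; }
  base=$(git merge-base "$base_branch" "$feature") || return 1
  echo "BASE=$base ($base_branch)"
  git log --oneline "$base..$feature"
  # Lockfiles, vendored or generated directories, and migrations (heuristic)
  git diff --name-only "$base" "$feature" |
    grep -E '(^|/)(vendor|node_modules|generated|migrations)/|\.lock$|-lock\.json$' |
    sed 's/^/NOTE: /'
  # Files over 1 MiB at the feature tip; deleted paths are skipped
  git diff --name-only --diff-filter=d "$base" "$feature" |
    while IFS= read -r f; do
      size=$(git cat-file -s "$feature:$f")
      if [ "$size" -gt 1048576 ]; then echo "LARGE: $f ($size bytes)"; fi
    done
}
```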
Phase 2: Sea of Changes
Compute the net diff from BASE to FEATURE_BRANCH as a whole (git diff $BASE $FEATURE_BRANCH), not commit-by-commit. This diff is the complete set of changes to reorganize.
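Saving the sea once, before any history is rewritten, gives Phase 6 a fixed reference to compare against. A minimal sketch follows. The default location inside the .git directory is an assumption, chosen so the patch never appears as an untracked file.

```shell
# Write the net BASE..FEATURE diff to a patch file and print a summary.
# Args: base [feature=HEAD] [output path, default <git-dir>/sea.patch]
snapshot_sea() {
  local base=$1 feature=${2:-HEAD}
  local out=${3:-"$(git rev-parse --git-dir)/sea.patch"}
  git diff --binary "$base" "$feature" > "$out" || return 1
  echo "Saved $out"
  git diff --stat "$base" "$feature"
}

# Usage: snapshot_sea "$BASE" "$FEATURE_BRANCH"
```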
Phase 3: Classify & Cluster
Cluster changes into logical buckets (strict priority):
- Generated/Vendored/Lockfiles -- isolated to dedicated commits
- Pure renames/moves -- separated from content changes
- Formatting-only (whitespace, import order, lint fixes) -- isolated
- Refactors without behavior change -- separate from logic
- Feature/Logic changes -- grouped by cohesive unit
- Tests -- co-located with their corresponding logic changes
Split when a commit mixes mechanical and semantic changes. Squash when multiple tiny edits serve the same concern.
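A first clustering pass can be automated from path names and Git's rename detection, with the remaining "logic" bucket reviewed by hand. The path patterns below are assumptions, not rules from the text. Formatting-only and refactor changes cannot be told apart from features by path, so they land in "logic" for manual review.

```shell
# Map one `git diff --name-status -M` entry to a bucket label.
# R100 = rename with identical content; patterns are heuristic assumptions.
classify_change() {
  local status=$1 path=$2
  case $status in
    R100) echo rename; return ;;
  esac
  case $path in
    *.lock|*-lock.json|*vendor/*|*generated*) echo generated ;;
    *_test.*|*.test.*|*tests/*|*test/*) echo test ;;
    *migrations/*) echo migration ;;
    *.md|docs/*|CHANGELOG*) echo docs ;;
    *) echo logic ;;   # refactor vs feature vs formatting: review manually
  esac
}

# Label every file in the sea as "<bucket><TAB><path>".
classify_sea() {
  local base=$1 feature=${2:-HEAD} tab
  tab=$(printf '\t')
  git diff --name-status -M "$base" "$feature" |
    while IFS="$tab" read -r status path newpath; do
      printf '%s\t%s\n' "$(classify_change "$status" "${newpath:-$path}")" "${newpath:-$path}"
    done
}
```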
Phase 4: Determine Commit Order
Order for buildability and minimal noise:
- Pure renames/moves
- Formatting-only sweep
- Refactors (non-behavioral)
- Schema/Migrations
- Feature/Logic in dependency order
- Tests (accompany or immediately follow their logic)
- Docs/Changelog
- Vendored/lockfile updates
Every intermediate state must build and pass tests.
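To apply this order mechanically, keep the commit plan as "bucket|title" lines and sort them stably by bucket rank. The bucket labels are this sketch's own vocabulary. A stable sort preserves the dependency order you wrote within each bucket.

```shell
# Reorder "bucket|title" plan lines into the Phase 4 order.
# Unknown buckets sort last; ties keep their input order (sort -s).
order_plan() {
  awk -F'|' '
    BEGIN {
      n = split("rename format refactor migration logic test docs generated", order, " ")
      for (i = 1; i <= n; i++) rank[order[i]] = i
    }
    {
      r = ($1 in rank) ? rank[$1] : 99
      print r "|" $0
    }' | sort -s -t'|' -k1,1n | cut -d'|' -f2-
}
```

Tests should accompany their logic, so you may prefer to label a test commit with its logic bucket rather than "test". That keeps the two adjacent.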
Phase 5: Rebuild Commits
Git: Run git reset --mixed $BASE, which moves the branch back to BASE but keeps every change in the working tree. Then, for each planned commit, stage its hunks with git add -p and commit.
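When every planned commit owns whole files, the Git rebuild can be scripted without the interactive git add -p. This is a minimal sketch assuming file-level buckets; commit_paths is a hypothetical helper. Paths and messages in the usage comment are placeholders. Fall back to git add -p for files whose hunks belong to different commits.

```shell
# Stage the given paths (including deletions) and commit them with a message.
commit_paths() {
  local msg=$1; shift
  git add -- "$@" || return 1
  git commit -q -m "$msg"
}

# Example plan (paths and messages are placeholders):
#   git reset --mixed "$BASE"
#   commit_paths "refactor(core): move helpers into util/" src/helpers/ src/util/
#   commit_paths "feat(api): add export endpoint" src/api/ tests/api/
```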
jj workflow:
# Squash related changes (always use -m!)
jj squash --from <change1> --into <change2> -m "combined message"
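# Selective restore (jj split is interactive, avoid it):
# create the first part on the change's parent, pull its files in,
# then move the rest of the change on top of it
jj new <change>- -m "first part"
jj restore --from <change> <files-for-first-commit>
jj rebase -r <change> -d @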
# Reorder
jj rebase -r <change> -d <new-parent>
# Clean up messages
jj describe <change> -m "feat(scope): message"
# If anything goes wrong
jj op restore <operation-id-from-phase-0>
Phase 6: Validation
- git diff $BASE..HEAD equals the original sea (no loss of intent)
- Each commit shows clean boundaries with minimal file overlap
- Every commit builds and tests successfully
- No secrets or large binary blobs
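The mechanical parts of this checklist can be scripted for the Git path. This sketch assumes the backup branch from Phase 0, a clean working tree, and a test command you supply (for example "make test"). The secret check is a crude pattern match, not a real scanner, and validate_history is a hypothetical name.

```shell
# Args: base backup-branch test-command
# 1) rewritten HEAD must have exactly the backup's tree
# 2) every commit in base..HEAD must pass the test command
# 3) warn on obvious secret patterns (heuristic only)
validate_history() {
  local base=$1 backup=$2 test_cmd=$3 orig c status=0
  if ! git diff --quiet "$backup" HEAD; then
    echo "FAIL: tree differs from $backup" >&2
    return 1
  fi
  if git diff "$base" HEAD | grep -qE 'BEGIN [A-Z ]*PRIVATE KEY|AKIA[0-9A-Z]{16}'; then
    echo "WARN: possible secret in diff" >&2
  fi
  orig=$(git symbolic-ref -q --short HEAD || git rev-parse HEAD)
  for c in $(git rev-list --reverse "$base..HEAD"); do
    git checkout -q "$c" || { status=1; break; }
    if ! sh -c "$test_cmd" >/dev/null 2>&1; then
      echo "FAIL: $(git log -1 --format='%h %s' "$c")" >&2
      status=1
    fi
  done
  git checkout -q "$orig"   # return to the original branch
  return $status
}

# Usage: validate_history "$BASE" "${CURRENT_BRANCH}-backup" "make test"
```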
Commit Message Style
<type>(<scope>): <short description in present tense, under 72 chars>
- <Bullet point starting with verb>
- <What changed and why>
[Optional: BREAKING CHANGE:, Refs:, Co-authored-by:]
Types: feat, fix, refactor, perf, chore, test, docs, build, ci
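A small linter for subject lines follows. It makes two assumptions that the style above does not fix: the 72-character limit applies to the whole subject line, and scopes contain only letters, digits, and "_ . / -".

```shell
# Return 0 if the subject matches "<type>(<scope>): <description>" and is
# under 72 characters; otherwise print the reason and return 1.
check_subject() {
  local subject=$1
  if [ "${#subject}" -ge 72 ]; then
    echo "too long (${#subject} chars): $subject" >&2
    return 1
  fi
  printf '%s\n' "$subject" |
    grep -Eq '^(feat|fix|refactor|perf|chore|test|docs|build|ci)(\([A-Za-z0-9_./-]+\))?: [^ ].*$' || {
      echo "bad format: $subject" >&2
      return 1
    }
}

# Usage: git log --format=%s "$BASE..HEAD" | while IFS= read -r s; do check_subject "$s"; done
```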
Strict Rules
- Never mix formatting/import-order with behavior changes
- Always separate file renames/moves from edits to those files
- Always keep generated and vendored changes isolated
- Always co-locate tests with their logic change
- Never create broken intermediate states
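Some of these rules can be audited after the rewrite. This sketch checks only the rename rule. It lists commits in which Git detects a rename with less than 100% similarity, meaning the file was moved and edited in the same commit. Git reports renames only above its default 50% similarity threshold, so heavily edited moves will not be flagged.

```shell
# Print "<short-sha>: old -> new (Rxxx)" for each rename-plus-edit in base..HEAD.
find_mixed_renames() {
  local base=$1 c
  for c in $(git rev-list --reverse "$base..HEAD"); do
    git diff-tree --no-commit-id -r -M --name-status "$c" |
      awk -F'\t' -v c="$(git log -1 --format=%h "$c")" '
        $1 ~ /^R/ && $1 != "R100" { print c ": " $2 " -> " $3 " (" $1 ")" }'
  done
}
```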
Deliverables
- Safety Confirmation: Backup branch (git) or oplog snapshot (jj)
- Commit Plan: Ordered list with title, scope, type, rationale, and files
- Applied History: Rewritten commits matching the plan
- Summary Report: Changes vs original, tradeoffs, recovery instructions
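The mechanical half of the Summary Report can be generated for the Git path. This is a minimal sketch assuming BASE and the backup branch from earlier phases; summary_report is a hypothetical name. Tradeoffs still need to be written by hand. For jj, the recovery line would instead be jj op restore with the operation ID noted in Phase 0.

```shell
# Print original vs rewritten commit lists, a net-diff check, and recovery.
summary_report() {
  local base=$1 backup=$2
  echo "Original: $(git rev-list --count "$base..$backup") commits"
  git log --oneline --reverse "$base..$backup" | sed 's/^/  /'
  echo "Rewritten: $(git rev-list --count "$base..HEAD") commits"
  git log --oneline --reverse "$base..HEAD" | sed 's/^/  /'
  if git diff --quiet "$backup" HEAD; then
    echo "Net diff: identical to $backup"
  else
    echo "Net diff: DIFFERS from $backup"
  fi
  echo "Recovery: git reset --hard $backup"
}
```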