publish-readiness
The gate between a private repo and a public one. Working-tree scans catch today's leaks; history is where leaks hide. Verdict at the end: ship or fix first, with exact commands.
The checklist
Run every check. Report each as pass/fail with evidence.
1. Working tree scan
Search tracked files for:
- Secrets: API keys, tokens, passwords, private keys (
grep -rEfor common prefixes likeAKIA,ghp_,sk-,AIza,-----BEGIN, pluspassword\s*=) - Private IPs: RFC 1918 ranges (
10\.,172\.(1[6-9]|2[0-9]|3[01])\.,192\.168\.). Docs and examples should use RFC 5737 (192.0.2.x,198.51.100.x,203.0.113.x) or RFC 2544 (198.18.0.0/15) ranges instead. - Internal hostnames, machine names, LAN service URLs, personal email addresses that are not the public git identity
- Agent session dirs accidentally tracked:
.claude/,.codex/, memory handoffs, transcripts
A policy-driven scanner beats ad hoc grep; content-guard does this with a pre-push hook. Scan only tracked files (git ls-files), not node_modules/.
2. History scan
The tree being clean means nothing if a secret was committed and later deleted:
git log --all -p | grep -nE '<patterns from check 1>'
git log --all --pretty=format:'%s%n%b' | grep -nE '<same>' # commit messages too
3. Hygiene
- LICENSE present and intentional
- README states what/why/how-to-run; no dead internal links or references to private repos
.gitignorecovers.claude/,.codex/, env files, build artifacts- No leftover TODO referencing private context ("ask <name> about the prod box")
4. Remote state (if already public)
- No backup branches holding pre-scrub history:
git ls-remote origin 'refs/heads/backup/*' - Tags match local after any history rewrite
Remediation: history rewrite
When check 2 fails, forward-only "scrub commits" do not fix it; old content stays reachable. The recipe:
- Tarball
.gitsomewhere outside the repo (the only real safety net; local backup branches get rewritten too). - Build a replacements file,
old==>newper line, longest matches first. git filter-repo --replace-text <file> --force(also--replace-messageif commit messages quoted the strings).- filter-repo strips the origin remote; re-add it.
- Force-push the branch AND every tag (filter-repo repoints tags to new SHAs; stale remote tags keep the leak alive).
- Delete any remote backup branches.
- Verify from a fresh clone in /tmp, never from the working tree.
Hard truth about GitHub: after a force-push, the old commits remain fetchable by their full SHA until GitHub's internal GC runs, which can take weeks and cannot be triggered. If the leaked content is genuinely sensitive, the instant fix is delete-and-recreate the repo (new object store, old SHAs 404 immediately; costs stars and created-date) and rotate any leaked credential regardless. A credential that touched a public repo is burned; rotation is not optional.
Verdict format
## publish-readiness: <repo> (<date>)
| Check | Result | Evidence |
|-------|--------|----------|
**Verdict:** SHIP / FIX FIRST
(if FIX FIRST: numbered fix list with exact commands, blocking items marked)
Common mistakes
- Scanning only the working tree and calling it clean.
- Trusting the local clone to verify a purge. Fresh clone or it did not happen.
- Treating example IPs as harmless. Real internal addressing leaks topology; use the reserved documentation ranges.
- Publishing first, planning to "clean up later". Later is after the crawlers.