Confidence Levels
Express confidence as a percentage, not vague certainty.
Core Principle
A thorough analysis that looks certain but isn't can mislead users into wrong decisions. Conflating explanation quality with evidence quality causes harm.
Critical Rules
| Rule | Enforcement | |------|-------------| | Express confidence as % | Not "probably" - use "70% confident" | | Explain gaps below 95% | Mandatory "Why not 100%?" | | Validate before presenting | If you can gather evidence, do it | | Show your math | Evidence adds confidence, gaps subtract |
Confidence Scale
| Range | Icon | Meaning | |-------|------|---------| | 0-30% | π΄ | Speculation - needs significant validation | | 31-60% | π‘ | Plausible - evidence exists but gaps remain | | 61-85% | π | Likely - strong evidence, minor gaps | | 86-94% | π’ | High confidence - validated, minor uncertainty | | 95-100% | π― | Confirmed - fully validated |
Calibration Guide
| Level | Meaning | |-------|---------| | 20% | One possibility among several | | 40% | Evidence points this direction, key assumptions unverified | | 60% | Evidence supports this, alternatives not ruled out | | 80% | Strong evidence, assumptions verified, alternatives less likely | | 95% | Validated with direct evidence, alternatives ruled out | | 100% | Mathematical/logical certainty only |
Pre-Conclusion Checkpoint
Before claiming ANY conclusion, complete this:
1. Evidence Inventory
- What hard evidence supports this?
- Direct evidence (code/logs that prove it)?
- What's the strongest piece of evidence?
2. Falsifiability Check
- What would INVALIDATE this theory?
- Have I looked for that data?
- If no: WHY NOT?
3. Assumption Audit
- What am I assuming WITHOUT verification?
- Mark each: [VERIFIED] or [ASSUMED]
4. Alternative Possibilities
- What else could explain these symptoms?
- Why is my conclusion more likely?
Confidence Scoring
Start at 50% (neutral) and adjust:
| Factor | Adjustment | |--------|------------| | Direct evidence (code/logs proving it) | +15-25% | | Verified assumptions (checked, not assumed) | +10-15% | | Alternatives ruled out with evidence | +10-15% | | Falsifiability check completed | +5-10% | | Unverified critical assumptions | -15-25% | | Plausible alternatives not ruled out | -10-15% | | No falsifiability check | -10% | | Circumstantial evidence only | -10% |
Response Format
[Icon] [X%] Confidence: [One sentence finding]
Evidence:
- [+X%] [Evidence 1]
- [+X%] [Evidence 2]
Why not 100%:
- [-X%] [Gap 1]
To increase confidence: [What's needed]
Examples
WRONG - Premature Certainty
Claude: "I have complete clarity on the root cause. The issue
is definitely the database connection pool being exhausted.
Here's my detailed analysis..."
[200 lines of report]
User: [provides data that contradicts analysis]
Claude: "This changes everything..."
Problem: Claimed certainty with unverified assumptions.
CORRECT - Honest Assessment
π‘ 40% Confidence: Database connection pool may be exhausted
Evidence:
- [+15%] Error pattern matches connection timeout
- [+10%] Load spike occurred around failure time
Why not 100%:
- [-25%] Haven't verified actual pool metrics
- [-15%] Haven't ruled out: network issues, DNS problems
- [-10%] This is mostly pattern matching, not direct evidence
To increase confidence: I need the connection pool metrics
from the time of failure. Can you provide those?
WRONG - Confidence in Explanation Quality
Claude: "Based on my thorough analysis of the code flow,
I'm certain the bug is in the authentication middleware."
Problem: Thoroughness of presentation β correctness. Detailed reasoning without verified evidence is speculation.
CORRECT - Evidence-Based Confidence
π 75% Confidence: Bug is in authentication middleware
Evidence:
- [+20%] Error message explicitly mentions "token expired"
- [+15%] Code trace shows validation at auth/middleware.ts:45
- [+10%] Token timestamp is from 2 hours ago
Why not 100%:
- [-10%] Haven't verified the clock sync between servers
- [-15%] Could be a different middleware in the chain
To increase confidence: Let me check the server time sync
and trace the full middleware chain.
Self-Validation Rule
Before returning to user with questions you can answer yourself:
Can I gather more evidence myself?
ββ Search codebase for confirming/denying data?
ββ Fetch a file that validates an assumption?
ββ Check actual state vs assumed state?
ββ Run a test to verify?
If YES β DO IT. Then reassess confidence.
If NO β Present with honest confidence + what you need.
Critical: If confidence is below 80% and you CAN gather more evidence β DO IT.
Trigger Words
Auto-invoke this skill when about to claim:
- "root cause is", "the problem is"
- "complete clarity", "definitely", "certainly"
- "clearly the issue", "obviously"
- Any conclusive claim during investigation
Integration
| Skill | Relationship |
|-------|--------------|
| critical-peer | Challenge conclusions lacking evidence |
| research-first | Gather evidence before concluding |
| debugging-methodology | Evidence-based investigation |
Anti-Patterns
| Anti-Pattern | Violation | |--------------|-----------| | "Complete clarity" | Claimed certainty without validation | | "Definitely the issue" | Unqualified conclusion | | Building detailed reports | Thoroughness β correctness | | "It's probably X" | Missing confidence % and gaps | | Skipping falsifiability | Haven't asked "what would prove me wrong?" |
Quick Reference
- [ ] Did I express confidence as a percentage?
- [ ] Did I explain what's stopping 100%?
- [ ] Did I show evidence for the % claimed?
- [ ] Could I gather more evidence myself?
- [ ] Did I check for falsifying evidence?