Tree of Thoughts Reasoning Methodology
Purpose: Systematic parallel exploration of solution spaces through recursive branching, self-reflection, and rigorous evaluation. Use this methodology when facing complex problems with multiple viable solution paths.
When to Use Tree of Thoughts
✅ Use ToT when:
- Problem has multiple viable solution approaches (3+ fundamentally different paths)
- Need to find optimal solution, not just any solution
- Can define clear evaluation criteria
- Complexity justifies systematic exploration
- Trade-offs exist between competing approaches
- Strategic or architectural decisions with long-term impact
❌ Don't use ToT when:
- Problem has obvious single solution path
- Time-critical decisions with simple trade-offs
- Problem is well-defined with standard solution
- Exploratory work where breadth matters more than depth
Examples:
- "Should we use REST, GraphQL, or gRPC?" (3 paths, clear trade-offs) ✅
- "Design distributed caching system balancing latency, consistency, cost" (multi-dimensional) ✅
- "Fix this syntax error" (single path) ❌
- "Research all available databases" (breadth-of-thought better) ❌
Core Methodology: 5-Step Process
Step 1: Problem Decomposition (5+ Branches)
Objective: Identify 5+ fundamentally different approaches to explore
Actions:
- Analyze problem to identify key dimensions (technical, organizational, risk, cost, timeline)
- Brainstorm 5-10 distinct approaches (not variations of same approach)
- Define evaluation criteria from problem constraints
- Validate diversity: Each branch explores different solution philosophy
Example (Distributed Caching):
Branch A: Write-through consistency (strong consistency, higher latency)
Branch B: Eventual consistency (performance, weaker guarantees)
Branch C: Hybrid tiered (hot data write-through, cold data eventual)
Branch D: Edge-centric (CDN-style, geography-aware)
Branch E: Cost-optimized minimal (single region, no replication)
Deliverable: 5+ distinct approach definitions
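As a rough illustration of the "validate diversity" action, the sketch below records each branch with a short philosophy tag and flags sets that are too small or that repeat a tag. The `Branch` class, the `philosophy` field, and the tag strings are hypothetical names introduced here; real diversity checks need human judgment, not string comparison.

```python
from dataclasses import dataclass

MIN_BRANCHES = 5  # Step 1 asks for 5+ branches


@dataclass(frozen=True)
class Branch:
    label: str       # e.g. "A"
    name: str        # short approach name
    philosophy: str  # hypothetical tag for the underlying solution philosophy


def validate_branches(branches: list[Branch]) -> list[str]:
    """Return a list of problems; an empty list means the set passes."""
    problems = []
    if len(branches) < MIN_BRANCHES:
        problems.append(f"only {len(branches)} branches, need {MIN_BRANCHES}+")
    seen: dict[str, str] = {}
    for b in branches:
        if b.philosophy in seen:
            problems.append(f"{b.label} repeats the philosophy of {seen[b.philosophy]}")
        else:
            seen[b.philosophy] = b.label
    return problems


# The five caching branches from the example, with assumed philosophy tags.
caching = [
    Branch("A", "Write-through consistency", "strong-consistency"),
    Branch("B", "Eventual consistency", "availability-first"),
    Branch("C", "Hybrid tiered", "tiered"),
    Branch("D", "Edge-centric", "geography-aware"),
    Branch("E", "Cost-optimized minimal", "cost-first"),
]
print(validate_branches(caching))  # prints []
```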
Step 2: Parallel Branch Exploration
Objective: Explore each branch systematically with self-reflection
For each branch:
- Analyze approach against problem requirements
- Consider strengths, weaknesses, trade-offs
- Identify assumptions and constraints
- End with self-reflection (see template below)
Self-Reflection Template (REQUIRED for each branch):
## Branch [X]: [Approach Name]
[Analysis of this approach: 2-4 paragraphs covering requirements, strengths, weaknesses, trade-offs]
### Self-Reflection
- **Confidence**: [0-100]/100
- **Strengths**: [What makes this approach compelling]
- **Weaknesses**: [Gaps, assumptions, limitations]
- **Trade-offs**: [What you gain vs what you lose]
- **Recommendation**: [Continue deeper exploration? Prune? Why?]
Execution Options:
- With Task tool: Spawn 5+ parallel tasks for independent exploration
- Without Task tool: Explore branches sequentially using TodoWrite to track progress
- Hybrid: Use Task for complex branches, sequential for simple ones
Deliverable: 5+ explored branches with self-reflections
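One way to keep self-reflections uniform across branches is to hold them in a small record and render them in the template's shape. This is a minimal sketch: `SelfReflection` and `render_branch` are hypothetical names, and the sample values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SelfReflection:
    confidence: int  # 0-100, matching the template's [0-100]/100
    strengths: str
    weaknesses: str
    trade_offs: str
    recommendation: str

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be in 0..100")


def render_branch(label: str, name: str, analysis: str, r: SelfReflection) -> str:
    """Render one explored branch using the template's headings and fields."""
    lines = [
        f"## Branch {label}: {name}",
        analysis,
        "### Self-Reflection",
        f"- **Confidence**: {r.confidence}/100",
        f"- **Strengths**: {r.strengths}",
        f"- **Weaknesses**: {r.weaknesses}",
        f"- **Trade-offs**: {r.trade_offs}",
        f"- **Recommendation**: {r.recommendation}",
    ]
    return "\n".join(lines)


# Illustrative values only; they are not taken from a real analysis.
reflection = SelfReflection(
    confidence=70,
    strengths="Low read latency",
    weaknesses="Stale reads are possible",
    trade_offs="Performance gained, strict consistency lost",
    recommendation="Continue: fits the latency requirement",
)
report = render_branch("B", "Eventual consistency", "Analysis goes here.", reflection)
print(report)
```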
Step 3: Branch Evaluation (Scoring)
Objective: Systematically evaluate all branches against criteria
Evaluation Criteria (100 points total, 5 categories × 20 points):
- Novelty (0-20): Does it explore new solution space vs obvious approaches?
  - 18-20: Innovative approach, fresh perspective
  - 12-17: Good approach with some novel elements
  - 6-11: Standard approach, minor tweaks
  - 0-5: Obvious/conventional approach
- Feasibility (0-20): Practically implementable with reasonable resources?
  - 18-20: Proven technology, clear implementation path
  - 12-17: Feasible with moderate effort/risk
  - 6-11: Significant technical challenges
  - 0-5: Impractical or resource-intensive
- Completeness (0-20): Addresses all stated requirements?
  - 18-20: Covers all requirements comprehensively
  - 12-17: Covers most requirements, minor gaps
  - 6-11: Missing key requirements
  - 0-5: Incomplete solution
- Confidence (0-20): Branch's self-reflection confidence score?
  - 18-20: High confidence (80-100%) with justification
  - 12-17: Medium confidence (60-79%)
  - 6-11: Low confidence (40-59%)
  - 0-5: Very low confidence (<40%)
- Alignment (0-20): Matches problem constraints and context?
  - 18-20: Perfect fit for constraints
  - 12-17: Good fit, minor misalignment
  - 6-11: Notable misalignment
  - 0-5: Poor fit for context
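The Confidence criterion is the only one with an explicit numeric mapping (self-reflection percentage to point band), so it can be looked up mechanically. The sketch below returns the band only; picking a point inside the band is left to the evaluator. `confidence_band` is a hypothetical helper name.

```python
def confidence_band(percent: float) -> tuple[int, int]:
    """Map a self-reflection confidence (0-100) to its (low, high) point band."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in 0..100")
    if percent >= 80:
        return (18, 20)  # high confidence
    if percent >= 60:
        return (12, 17)  # medium confidence
    if percent >= 40:
        return (6, 11)   # low confidence
    return (0, 5)        # very low confidence


print(confidence_band(70))  # prints (12, 17)
```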
Scoring Process:
- Review each branch's analysis and self-reflection
- Score each branch on all 5 criteria (0-20 per criterion)
- Calculate total score (0-100) for each branch
- Rank branches by total score
- Select highest-scoring branch for deeper exploration
Deliverable: Scored ranking of all branches, winner selected
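The scoring process reduces to summing five 0-20 scores per branch and sorting. A minimal sketch follows; the function names and the sample scores are made up for illustration and do not come from a real evaluation.

```python
CRITERIA = ("novelty", "feasibility", "completeness", "confidence", "alignment")


def total_score(scores: dict[str, int]) -> int:
    """Sum the five criterion scores after checking each is present and in 0..20."""
    for criterion in CRITERIA:
        if criterion not in scores:
            raise ValueError(f"missing criterion: {criterion}")
        if not 0 <= scores[criterion] <= 20:
            raise ValueError(f"{criterion} must be in 0..20")
    return sum(scores[c] for c in CRITERIA)


def rank(branches: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (branch, total) pairs, highest total first."""
    totals = {name: total_score(s) for name, s in branches.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only.
scored = {
    "A": dict(novelty=8, feasibility=18, completeness=16, confidence=15, alignment=14),
    "B": dict(novelty=12, feasibility=17, completeness=15, confidence=16, alignment=17),
    "C": dict(novelty=15, feasibility=12, completeness=17, confidence=13, alignment=15),
}
ranking = rank(scored)
print(ranking)        # prints [('B', 77), ('C', 72), ('A', 71)]
print(ranking[0][0])  # prints B, the branch to explore further
```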
Step 4: Recursive Depth Exploration (Level 1+)
Objective: Recursively expand the best branch
Actions:
- Take highest-scoring branch from Step 3
- Decompose that branch into 5+ sub-approaches or refinements
- Repeat Steps 2-3 for the new level (explore → evaluate → select)
- Continue recursion until stopping criteria met
Minimum Depth: 4 levels (Level 0 → 1 → 2 → 3)
Level Transition Example:
Level 0: "Distributed caching system" (5 approaches)
→ Winner: Branch B (Eventual consistency)
Level 1: "Eventual consistency variants" (5 refinements)
- B.1: Last-write-wins
- B.2: Version vectors
- B.3: CRDTs
- B.4: Causal consistency
- B.5: Session consistency
→ Winner: Branch B.3 (CRDTs)
Level 2: "CRDT implementations" (5 options)
- B.3.1: G-Counter
- B.3.2: PN-Counter
- B.3.3: LWW-Element-Set
- B.3.4: OR-Set
- B.3.5: RGA (Replicated Growable Array)
→ Winner: Branch B.3.4 (OR-Set)
Level 3: "OR-Set optimizations" (5 variants)
[Explore specific implementation strategies]
→ Winner: Branch B.3.4.2 (Tombstone compaction)
Deliverable: Recursive tree with minimum 4 levels explored
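The recursion in this step can be pictured as a greedy descent: decompose the current winner, score the children, keep the best, repeat. The sketch below assumes two caller-supplied functions, `decompose` and `evaluate`, which stand in for Steps 2-3; both names and the toy implementations are hypothetical, and the stopping criteria are reduced to a fixed depth for brevity.

```python
from typing import Callable


def explore(
    root: str,
    decompose: Callable[[str], list[str]],
    evaluate: Callable[[str], int],
    depth: int = 4,
) -> list[tuple[str, int]]:
    """Greedy descent: score every child, keep only the winner, recurse into it."""
    path: list[tuple[str, int]] = []
    current = root
    for _ in range(depth):
        children = decompose(current)
        if not children:
            break  # nothing left to refine
        scored = [(child, evaluate(child)) for child in children]
        winner, score = max(scored, key=lambda pair: pair[1])
        path.append((winner, score))
        current = winner
    return path


# Toy stand-ins: five numbered refinements per node, with ".3" always scoring best.
def toy_decompose(node: str) -> list[str]:
    return [f"{node}.{i}" for i in range(1, 6)]


def toy_evaluate(node: str) -> int:
    return 90 if node.endswith(".3") else 60


# Start from a Level 0 winner "B" and descend three more levels.
for level, (node, score) in enumerate(explore("B", toy_decompose, toy_evaluate, depth=3), start=1):
    print(f"Level {level}: {node} ({score}/100)")
```

Only the winning branch is expanded at each level, so the number of evaluated branches grows linearly with depth rather than exponentially.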
Step 5: Final Synthesis
Objective: Synthesize insights into final recommendation
Actions:
- Trace winning path: Document Level 0 → Level 1 → Level 2 → Level 3+
- Extract key insights: What was learned at each level?
- Document pruned branches: Why were alternatives discarded?
- Calculate confidence: Final confidence score (see Bayesian formula below)
- State assumptions: What assumptions underpin the recommendation?
- Provide recommendation: Clear, actionable guidance
Synthesis Template:
## Tree of Thoughts Analysis Complete
### Winning Path
- **Level 0**: [Chosen approach] (Score: X/100)
- **Level 1**: [Refinement] (Score: X/100)
- **Level 2**: [Sub-refinement] (Score: X/100)
- **Level 3**: [Implementation] (Score: X/100)
### Key Insights
1. [Insight from Level 0]
2. [Insight from Level 1]
3. [Insight from Level 2]
4. [Insight from Level 3]
### Alternatives Considered
- [Branch A]: Pruned because [reason]
- [Branch C]: Pruned because [reason]
- [Branch D]: Pruned because [reason]
### Final Confidence: [X]%
**Justification**: [Why this confidence level based on exploration depth, evidence, and remaining uncertainties]
### Recommendation
[Clear, actionable recommendation with next steps]
### Remaining Uncertainties
- [Assumption 1]
- [Assumption 2]
Deliverable: Comprehensive synthesis with traced path and confidence score
Stopping Criteria
Stop exploration when ANY of:
- ✅ Reached 4+ levels AND best branch confidence >80%
- ✅ Reached 6 levels (maximum recommended depth)
- ✅ All branches converge to same solution across multiple levels
- ✅ Diminishing returns (Level N scores similar to Level N-1)
Warning signs (don't stop yet):
- ❌ Only 2-3 levels explored
- ❌ Confidence <80% without clear reason
- ❌ Winner not clearly superior to alternatives
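The four stopping conditions can be checked with a small predicate. This sketch makes two assumptions not stated in the text: convergence is passed in as a caller judgment (`converged`), and "similar" scores means within `plateau_tolerance` points. All names are hypothetical.

```python
def should_stop(
    levels_explored: int,
    best_confidence: float,      # percent, 0-100
    winner_scores: list[int],    # winning score at each level so far
    converged: bool = False,     # caller's judgment that branches agree
    plateau_tolerance: int = 2,  # assumption: "similar" means within 2 points
) -> tuple[bool, str]:
    """Return (stop?, reason) according to the ANY-of stopping criteria."""
    if levels_explored >= 4 and best_confidence > 80:
        return True, "4+ levels and confidence above 80%"
    if levels_explored >= 6:
        return True, "maximum recommended depth reached"
    if converged:
        return True, "branches converge on the same solution"
    if (
        len(winner_scores) >= 2
        and abs(winner_scores[-1] - winner_scores[-2]) <= plateau_tolerance
    ):
        return True, "diminishing returns between levels"
    return False, "keep exploring"


print(should_stop(3, 88, [85, 88, 91]))      # prints (False, 'keep exploring')
print(should_stop(4, 85, [77, 84, 88, 91]))  # prints (True, '4+ levels and confidence above 80%')
```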
Bayesian Confidence Scoring
Purpose: Quantify confidence based on accumulated evidence
Formula:
Prior Odds = P(correct) / (1 - P(correct))
Likelihood Ratio = Evidence strength (from scores)
Posterior Odds = Prior Odds × Likelihood Ratio
Final Confidence = Posterior Odds / (1 + Posterior Odds)
Practical Calculation:
- Start with prior confidence: 50% (neutral)
- For each evaluation criterion score (0-20):
  - Convert to likelihood ratio: LR = 0.25 + (score/20) * 3.75
  - Update odds: Odds = Odds × LR
- Convert back to probability: Conf = Odds / (1 + Odds)
- Cap at 95% (Bayesian humility for unknown unknowns)
Example:
- Branch scores: Novelty 18/20, Feasibility 19/20, Completeness 17/20, Confidence 18/20, Alignment 19/20
- Likelihood ratios: 3.625, 3.8125, 3.4375, 3.625, 3.8125
- Final odds: 1.0 × 3.625 × 3.8125 × 3.4375 × 3.625 × 3.8125 ≈ 657
- Confidence: 657 / 658 ≈ 99.8% → Capped at 95%
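The practical calculation translates directly into a few lines of code. The sketch below follows the stated formula (50% prior, one odds update per criterion, 95% cap); the function names are hypothetical.

```python
def likelihood_ratio(score: float) -> float:
    """Map a 0-20 criterion score to a likelihood ratio between 0.25 and 4.0."""
    if not 0 <= score <= 20:
        raise ValueError("score must be in 0..20")
    return 0.25 + (score / 20) * 3.75


def bayesian_confidence(scores: list[float], prior: float = 0.5, cap: float = 0.95) -> float:
    """Update prior odds with one likelihood ratio per score, then cap the result."""
    odds = prior / (1 - prior)
    for score in scores:
        odds *= likelihood_ratio(score)
    return min(odds / (1 + odds), cap)


# Scores from the example: Novelty, Feasibility, Completeness, Confidence, Alignment.
print(bayesian_confidence([18, 19, 17, 18, 19]))  # prints 0.95
```

Under this mapping a score of 4/20 yields a likelihood ratio of exactly 1.0 (neutral), so any higher score raises confidence; five scores of 10/20 are already enough to reach the 95% cap.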
Confidence Interpretation:
- 90-95%: Exceptional evidence, suitable for critical decisions
- 80-89%: High confidence, suitable for important decisions
- 70-79%: Medium confidence, consider additional validation
- 60-69%: Low confidence, recommend further investigation
- <60%: Very low confidence, gather more information
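For reports that carry a numeric confidence, the interpretation bands can be attached automatically. A minimal lookup, with `interpret_confidence` as a hypothetical helper name:

```python
def interpret_confidence(percent: float) -> str:
    """Return the interpretation band for a final confidence given in percent."""
    if percent >= 90:
        return "Exceptional evidence, suitable for critical decisions"
    if percent >= 80:
        return "High confidence, suitable for important decisions"
    if percent >= 70:
        return "Medium confidence, consider additional validation"
    if percent >= 60:
        return "Low confidence, recommend further investigation"
    return "Very low confidence, gather more information"


print(interpret_confidence(88))  # prints High confidence, suitable for important decisions
```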
Self-Critique Checklist
After applying ToT methodology, verify:
- [ ] Branch Diversity: Are all 5+ branches fundamentally different (not variations)?
- [ ] Self-Reflection Quality: Does each branch have genuine self-reflection (not boilerplate)?
- [ ] Evaluation Rigor: Did I systematically score all 5 criteria for each branch?
- [ ] Depth Achievement: Did I reach minimum 4 levels of exploration?
- [ ] Confidence Validity: Is final confidence score justified by exploration depth?
- [ ] Pruning Rationale: Can I explain why each non-selected branch was discarded?
- [ ] Path Traceability: Can I clearly trace the winning path from root to leaf?
- [ ] Synthesis Clarity: Does final output provide actionable recommendation?
- [ ] Stopping Appropriateness: Did I stop for valid reasons per criteria?
Common Mistakes to Avoid
- Too Few Branches: Using <5 branches reduces exploration quality
- Variation vs Diversity: Creating 5 variations of same approach instead of 5 different approaches
- Shallow Depth: Stopping at 1-2 levels instead of minimum 4
- Biased Evaluation: Favoring familiar approaches without systematic scoring
- Missing Self-Reflection: Skipping confidence assessment in branches
- Premature Convergence: Selecting winner before thorough evaluation
- Over-Recursion: Going beyond 6 levels without clear benefit
- Poor Synthesis: Not clearly documenting winning path and rationale
Reference Documentation
Detailed Templates: ~/.claude/skills/tree-of-thoughts/references/tree-of-thoughts-patterns.md
Includes:
- Branch exploration template (detailed prompts)
- Self-reflection rubric (confidence scoring guide)
- Evaluation matrix (scoring examples per criterion)
- Level transition logic (when/how to deepen)
- Edge case handling (convergence, insufficient diversity)
Quick Start Examples
Example 1: Simple Decision (3 levels)
Problem: Choose between Redis, Memcached, or Hazelcast for caching
Level 0 (3 branches):
- Branch A: Redis (rich data structures)
- Branch B: Memcached (pure speed)
- Branch C: Hazelcast (distributed computing)
→ Winner: Branch A (Redis) - 85/100
Level 1 (Redis deployment options):
- A.1: Single instance
- A.2: Sentinel (high availability)
- A.3: Cluster (horizontal scaling)
- A.4: Redis Enterprise
- A.5: Managed service (AWS ElastiCache)
→ Winner: Branch A.3 (Cluster) - 88/100
Level 2 (Cluster configuration):
- A.3.1: 3 masters, no replicas
- A.3.2: 3 masters, 3 replicas
- A.3.3: 6 masters, 6 replicas
- A.3.4: Auto-scaling cluster
- A.3.5: Hybrid (critical data replicated)
→ Winner: Branch A.3.2 (3+3) - 91/100
Confidence: 88% (3 levels, clear winner at each level)
Example 2: Complex Architecture (5 levels)
Problem: Design microservices communication strategy
Level 0: REST, gRPC, Message Queue, Event Sourcing, GraphQL (5 approaches)
Level 1: [Winner] expanded into 5 sub-approaches
Level 2: [Winner] expanded into 5 implementation variants
Level 3: [Winner] expanded into 5 technology choices
Level 4: [Winner] expanded into 5 deployment patterns
Confidence: 93% (5 levels, 25+ branches explored total)
Summary
Tree of Thoughts is a systematic methodology for exploring complex problem spaces through:
- Parallel branching (5+ approaches per level)
- Self-reflection (confidence scoring for each branch)
- Rigorous evaluation (5 criteria, 0-100 scoring)
- Recursive depth (minimum 4 levels)
- Bayesian confidence (evidence-based scoring)
Use it for strategic decisions, architectural choices, and optimization problems where systematic exploration yields better outcomes than intuition alone.
Remember: Quality over speed. ToT trades time for rigor. The goal is high-confidence optimal solutions, not quick answers.