Hierarchical Matching Systems
This skill provides rigid diagnostic and architectural procedures for hierarchical matching systems. Follow checklists exactly—matching bugs often hide in skipped steps.
Quick Reference
- Algorithm selection: See references/decision-guide.md
- Algorithm details: See references/algorithms.md
1. Problem Classification Checklist
Before any work, classify the problem. Check ALL that apply:
□ TWO-SIDED: Both sides have preferences (students↔schools, workers↔jobs)
□ ONE-SIDED: Only one side has preferences (tasks→workers, items→bins)
□ HIERARCHICAL: Entities exist at multiple levels (org→dept→team→person)
□ WEIGHTED: Matches have costs/scores to optimize
□ CONSTRAINED: Hard limits exist (capacity, exclusions, required pairings)
□ STABLE: Solution must resist defection (no blocking pairs)
□ OPTIMAL: Solution must minimize/maximize objective function
□ FUZZY: Entities may partially match (entity resolution, deduplication)
Classification determines algorithm family. Proceed to Section 2 for architecture or Section 3 for debugging.
2. Architecture Procedure
Follow these phases in order when designing a new matching system.
Phase 2.1: Requirements Translation
Convert each business requirement into formal constraints:
| Business Requirement | Constraint Type | Formal Expression |
|---------------------|-----------------|-------------------|
| "Each student gets one school" | Capacity | |matches(s)| = 1 ∀ student s |
| "Schools have seat limits" | Capacity | |matches(school)| ≤ capacity |
| "Siblings must be together" | Coupling | school(s1) = school(s2) if siblings(s1,s2) |
| "Student X cannot attend Y" | Exclusion | (X, Y) ∉ matches |
| "Priority for residents" | Preference ordering | rank(resident) < rank(non-resident) |
Checklist:
□ List ALL business requirements
□ Classify each as: capacity | coupling | exclusion | ordering | soft preference
□ Identify conflicts between requirements (document tradeoffs)
□ Distinguish HARD constraints (must satisfy) from SOFT (optimize toward)
□ Validate translations with stakeholder examples
Phase 2.2: Algorithm Selection
Use references/decision-guide.md to select algorithm. Verify:
□ Algorithm handles all HARD constraints
□ Algorithm can optimize SOFT constraints (or document gaps)
□ Complexity acceptable for data size (see references/algorithms.md)
□ Stability requirements met (if two-sided)
□ Optimality requirements met (if weighted)
Phase 2.3: Data Model Design
Define entities, relationships, and preference representation:
□ Entity schema for each side (attributes, identifiers)
□ Preference representation (ranked list | score matrix | pairwise comparisons)
□ Constraint encoding (how exclusions/couplings are stored)
□ Hierarchy representation (if multi-level: tree | DAG | adjacency list)
□ Tie-breaking rules (deterministic ordering for equal preferences)
Phase 2.4: Interface Contracts
Specify inputs, outputs, and invariants:
Input Contract:
□ Preference format and validation rules
□ Constraint format and validation rules
□ Required vs optional fields
□ How missing preferences are handled (reject | default rank | exclude)
Output Contract:
□ Match format (pairs | assignment map | ranked list)
□ Unmatched entity handling (explicit list | null matches | error)
□ Match metadata (scores, stability proof, constraint satisfaction report)
Invariants:
□ Determinism: same input → same output (document randomness if any)
□ Completeness: all entities matched OR explicitly unmatched
□ Validity: all matches satisfy hard constraints
Phase 2.5: Testing Strategy
Define validation before implementation:
□ Unit tests for preference parsing and constraint validation
□ Property tests: stability, optimality, constraint satisfaction
□ Edge cases: empty inputs, single entity, all tied preferences
□ Regression tests from known-good examples
□ Performance benchmarks at target scale
3. Debugging Procedure
Follow this diagnostic sequence for any matching issue. Do not skip steps.
Phase 3.1: Symptom Classification
Identify the symptom category:
| Symptom | Category | Go To | |---------|----------|-------| | Same inputs, different outputs | INSTABILITY | 3.2 | | Matches violate business rules | CONSTRAINT VIOLATION | 3.3 | | Matches technically valid but "wrong" | PREFERENCE MISALIGNMENT | 3.4 | | Errors with nested/hierarchical data | HIERARCHY BUG | 3.5 | | Poor performance at scale | PERFORMANCE | 3.6 |
Phase 3.2: Instability Diagnosis
Root causes of non-deterministic matches:
□ RANDOMNESS: Check for unseeded RNG in tie-breaking
→ Fix: Use deterministic tie-breaker (lexicographic ID, timestamp)
□ FLOATING POINT: Check score comparisons for floating point issues
→ Fix: Use epsilon comparison or integer scores
□ HASH ORDERING: Check if iteration order depends on hash maps
→ Fix: Sort keys before iteration
□ PARALLEL RACE: Check for concurrent modifications
→ Fix: Synchronize or use sequential processing
□ INPUT ORDERING: Check if algorithm is order-sensitive
→ Fix: Canonicalize input order (sort by ID)
Verification:
□ Run matching 10x with identical inputs
□ Diff all outputs
□ If any differ, add logging to identify divergence point
Phase 3.3: Constraint Violation Diagnosis
Diagnostic sequence:
1. □ IDENTIFY: Which specific constraint is violated?
→ List the violated constraint and the violating match
2. □ TRACE: Where should constraint be enforced?
→ Map constraint to code location (filter | validation | algorithm step)
3. □ VERIFY ENCODING: Is constraint correctly represented?
→ Print constraint data structure, verify against requirement
4. □ VERIFY ENFORCEMENT: Is constraint checked at right time?
→ Add logging before/after enforcement point
5. □ CHECK ORDERING: Is constraint checked before conflicting decisions?
→ Trace decision sequence, verify constraint checked first
6. □ CHECK COMPLETENESS: Are all instances covered?
→ Enumerate all entities that should be constrained
Common failure patterns:
| Pattern | Symptom | Fix | |---------|---------|-----| | Late enforcement | Valid intermediate state, invalid final | Move check earlier | | Partial coverage | Some entities constrained, others not | Enumerate all cases | | Soft vs hard confusion | Constraint violated for "better" match | Reclassify as hard | | Stale data | Constraint on outdated values | Refresh before check |
Phase 3.4: Preference Misalignment Diagnosis
When matches are valid but don't reflect intended priorities:
1. □ EXTRACT: Get the actual preference data used
→ Log/print the exact preference structure at match time
2. □ COMPARE: Check against expected preferences
→ Side-by-side diff with business-stated priorities
3. □ TRACE TRANSFORMATION: Follow preference from input to algorithm
→ Log at each transformation step (parsing, normalization, scoring)
4. □ CHECK SCORING: Verify score calculation
→ Manual calculation for 2-3 example cases
5. □ CHECK AGGREGATION: If multi-criteria, verify combination
→ Test each criterion independently, then combined
6. □ CHECK NORMALIZATION: Verify scale/range handling
→ Check for min/max, z-score, or rank normalization bugs
Scoring function checklist:
□ Direction correct (higher = better or lower = better, consistently)
□ Scale appropriate (no single factor dominating)
□ Missing values handled (null → 0? → excluded? → default?)
□ Ties handled explicitly (not left to floating point chance)
□ Edge cases: extreme values, all same values, single candidate
Phase 3.5: Hierarchy Traversal Diagnosis
For multi-level matching issues:
1. □ VISUALIZE: Draw the hierarchy with the problematic match
→ Tree diagram showing all levels and the match path
2. □ CHECK INHERITANCE: Do child constraints inherit from parent?
→ Verify constraint propagation rules
3. □ CHECK AGGREGATION: How do child preferences roll up?
→ Verify aggregation function (sum | max | weighted | majority)
4. □ CHECK LEVEL INTERACTION: Can matches cross levels?
→ Document allowed/forbidden cross-level matches
5. □ CHECK TRAVERSAL ORDER: Top-down or bottom-up?
→ Verify algorithm processes levels in intended order
6. □ CHECK PARTIAL MATCHES: Can a parent match without all children?
→ Document completeness requirements per level
Common hierarchy bugs:
| Bug | Symptom | Fix | |-----|---------|-----| | Missing propagation | Child ignores parent constraint | Add inheritance logic | | Double counting | Same entity weighted multiple times | Deduplicate in aggregation | | Level skipping | Match at wrong level | Add level validation | | Orphan handling | Unattached children cause errors | Define orphan policy |
Phase 3.6: Performance Diagnosis
For scale and speed issues:
1. □ PROFILE: Identify the slow component
→ Time each phase: input parsing, preference building, matching, output
2. □ COMPLEXITY CHECK: Verify actual vs expected complexity
→ Log iteration counts, compare to theoretical O(n)
3. □ MEMORY CHECK: Profile memory usage
→ Watch for preference matrix explosion (n² space)
4. □ ALGORITHM FIT: Verify algorithm appropriate for scale
→ See references/algorithms.md for complexity comparison
5. □ CACHING: Check for redundant computation
→ Log cache hits/misses for preference lookups
6. □ BATCH VS STREAMING: Check processing model
→ Full recomputation vs incremental updates
4. Testing & Validation Procedure
4.1: Correctness Properties
Test these properties for every matching system:
□ DETERMINISM: run(input) = run(input) (10 trials minimum)
□ COMPLETENESS: all entities either matched or explicitly unmatched
□ VALIDITY: all matches satisfy all hard constraints
□ STABILITY (if applicable): no blocking pairs exist
□ OPTIMALITY (if applicable): objective function at expected value
4.2: Stability Verification (Two-Sided Matching)
For stable matching, verify no blocking pairs:
# Pseudocode - verify no blocking pair exists
for each unmatched_pair (a, b):
if a prefers b over current_match(a):
if b prefers a over current_match(b):
FAIL: blocking pair found (a, b)
□ Enumerate all non-matched pairs
□ Check mutual preference for each
□ Report any blocking pairs found
□ For large instances, sample-check (document coverage)
4.3: Constraint Satisfaction Verification
□ List all hard constraints
□ For each match, verify against each constraint
□ Generate constraint satisfaction report
□ Flag any violations with specific match and constraint
4.4: Edge Case Test Suite
Mandatory test cases:
□ Empty input (no entities on one or both sides)
□ Single entity (one-to-one degenerate case)
□ All identical preferences (maximum tie scenario)
□ Mutually exclusive preferences (everyone wants same thing)
□ Impossible constraints (unsatisfiable, should error clearly)
□ Maximum capacity (all slots exactly filled)
□ Minimum capacity (barely enough slots)
□ Self-referential (can entity match itself? test boundary)
□ Circular preferences (A→B→C→A)
4.5: Regression Test Maintenance
□ Capture real production cases that revealed bugs
□ Minimize to smallest reproducing example
□ Document expected behavior explicitly
□ Run on every change to matching logic
5. Review Checklist
When reviewing matching system code or design:
5.1: Design Review
□ Problem correctly classified (Section 1)
□ Algorithm appropriate for problem class (references/decision-guide.md)
□ All business requirements mapped to constraints (Section 2.1)
□ Hard vs soft constraints clearly distinguished
□ Tie-breaking is deterministic and documented
□ Hierarchy semantics defined (if applicable)
5.2: Implementation Review
□ Preference representation matches algorithm requirements
□ Constraints enforced at correct point in algorithm
□ No hidden randomness (unseeded RNG, hash iteration)
□ Floating point comparison handled correctly
□ Edge cases handled (empty, single, ties)
□ Error messages identify specific constraint violations
5.3: Testing Review
□ All properties from 4.1 tested
□ Edge cases from 4.4 covered
□ Performance benchmarked at realistic scale
□ Regression tests exist for past bugs
Appendix: Common Anti-Patterns
| Anti-Pattern | Problem | Solution | |--------------|---------|----------| | Greedy first-come | Order-dependent, non-optimal | Use proper algorithm | | Score = sum(all factors) | One factor dominates | Normalize scales | | Retry until valid | Non-deterministic, slow | Fix constraint order | | Global preference cache | Stale across updates | Invalidate on change | | String matching for entities | Case/whitespace bugs | Use canonical IDs | | Float equality for ties | Non-deterministic | Use epsilon or integer | | Recursive hierarchy walk | Stack overflow risk | Use iterative with explicit stack | | N² preference matrix | Memory explosion | Use sparse representation |