/marketplace:diagnose — Diagnose a Marketplace Problem
If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.
A routed diagnostic entry point. Describe the symptom in natural language; the skill routes to the right playbook, pulls live data to confirm, and prescribes fixes grounded in the knowledge libraries.
Usage
/marketplace:diagnose <free-form symptom description>
Examples:
/marketplace:diagnose the homefeed has been getting blander for 8 weeks; same 100 sitters dominate
/marketplace:diagnose new paid members aren't booking in their first 60 days
/marketplace:diagnose search relevance regressed 4% on nDCG after yesterday's mapping change
/marketplace:diagnose Lisbon has 200 sitters and 5 owners, nothing matches
/marketplace:diagnose our cold-start fallback is showing the same 10 items to every new user
Workflow
1. Load Company Context
Read the <company>-marketplace-context skill if present:
gotchas.md— most incidents repeat. Check for prior cases matching this symptom.surfaces.md/events.md/indexes.md/recipes.md— to know what the current system looks like.observability.md— to know which monitors should have caught this.
If no context skill, prompt the user to bootstrap, or fall back to direct discovery.
2. Classify the Symptom
Match the user's description against the five branches. If ambiguous, ask a clarifying question.
| Branch | Keywords / signals | |--------|-------------------| | Cold-start | "new visitor", "new member", "new sitter", "new listing", "new city", "cold start", "no history", "first week", "first booking", "after pay but before booking", "post-register not booking" | | Conversion funnel | "conversion leak", "funnel drop", "anonymous to register", "register to pay", "pay to first booking", "repeat rate", "renewal", "drop-off" | | Feed health | "stale", "blander", "same items", "death spiral", "popularity bias", "dominating", "never changes", "recency drift", "filter bubble" | | Relevance regression | "search regressed", "after change X", "mapping change", "synonym change", "analyzer change", "nDCG dropped", "zero-result spike" | | Liquidity imbalance | "too many sitters", "not enough owners", "ratio off", "supply thin", "geographic imbalance", "seasonal peak", "unworkable segment" |
Announce the chosen branch before proceeding, and ask the user to confirm or correct.
3. Branch A — Cold-Start
Question to answer: For what entity? And what's the fallback doing?
Data to pull:
- From
events.md/~~data warehouse: the cold-start cohort (users with < N interactions, or items with < M impressions, or a named geo with < K listings) - From
~~personalisation engine: the current cold-start solution / filter / fallback - From
~~observability: the cold-start fallback rate (how often is the fallback triggered?)
Decision tree:
- Is the cold-start solution actually firing? Check fallback rate. If rate < 5% when cohort > 20%, the router isn't detecting cold-start cases — fix the detection.
- Is the fallback using the right features? Cold-start should use the features the user HAS (inferred role, geo, inbound channel, onboarding answers). If the fallback only uses
popularity, it will show the same items to every new user. - Does the fallback decay as the user accumulates signal? Check whether the router transitions to the main model at the right interaction count (typically 5-15).
- Is the cold-start cohort distinguishable in metrics? Impressions should be tagged so you can measure cold-start conversion separately.
Prescribed rules (reference by title from marketplace-personalisation):
cold-best-of-segment-popularity— use segment-aware popularity, not global popularitycold-use-v2-recipe-with-metadata— metadata-aware recipes beat pure interaction-count gatescold-capture-onboarding-intent— use the onboarding answers as synthetic interactionscold-reserve-exploration-slots— reserve a small fraction for unfamiliar itemscold-tag-cold-start-recs— tag every cold-start recommendation so measurement can segment
Ship-or-kill: propose a specific fix, an A/B cohort, a primary metric (cold-start conversion within N days), a guardrail (overall conversion), and an MDE.
4. Branch B — Conversion Funnel
Question to answer: Which stage is leaking, and why is this stage-specific?
Data to pull:
- From
~~product analytics(e.g., Amplitude): the funnel for the last 30 days with stage-by-stage conversion - From
~~data warehouse: the cohort composition of users who drop at each stage - From
marketplace.md: the primary revenue event and the ranking label
Common leaks and the rules they map to:
| Leak | Likely root cause | Rules |
|------|------------------|-------|
| Anonymous → registered | Landing page doesn't match inferred intent; trust copy weak; signup friction | signal-classify-inbound-intent, owner-show-specific-local-reviews, onboard-ask-role-before-anything-else |
| Registered → paid | Paywall static / mistimed / cost not anchored | convert-trigger-paywall-on-specific-listings, convert-anchor-price-against-local-alternative, convert-never-interrupt-active-search |
| Paid → first booking | Post-pay experience is cold-start; no concrete next action; first-stay competition brutal | sitter-provide-concrete-first-stay-path, sitter-be-honest-about-first-stay-competition, gap-warn-about-cold-start-penalty |
| First booking → repeat | First-experience friction (not a ranking problem; escalate to product) | escalate |
| Repeat → renewal | Renewal reminder generic; no loss-aversion framing | convert-use-loss-aversion-framing-on-soft-locks, proof-use-specific-peer-stories-not-aggregates |
Ship-or-kill: propose one concrete intervention at the leaking stage, an A/B cohort, a primary metric (stage-specific conversion rate), guardrails (downstream stages), and an MDE.
5. Branch C — Feed Health (Death Spiral / Staleness / Popularity Bias)
Question to answer: Is this feedback-loop bias, model drift, or a content-side problem?
Data to pull:
- From
~~observability: top-N Gini trend over time, coverage 7d trend, impression concentration metric - From
~~data warehouse: impression distribution by item, by day; click distribution by item, by day - From
recipes.md: current serving-time filters and re-ranker
Decision tree:
- Is top-N Gini rising over time? → death spiral pattern. Run
/marketplace:explore-eventson impressions to confirm feedback-loop bias. - Is coverage (distinct items served / 7d) declining? → model is narrowing. Check
cap-provider-exposurerule — is there a serving-time cap? If not, add one. - Are impressions concentrating faster than clicks? → feedback loop is in training data. Inverse-propensity weight the training data.
- Is recency the problem? → check freshness weighting in the ranker (
rank-use-function-score-for-business-signalswith a decay term). - Is there a content-side collapse? → new listing volume may have dropped. Check
indexes.mdfor update pattern and volume trend. - Is there feature drift? → check
features.mdfor coverage gaps, PSI alarms, training-serving skew. If a feature has drifted, the model's effective input distribution has shifted even though no code changed. Loadmarketplace-recsys-feature-engineeringrules (quality-gate-features-on-coverage-and-drift,quality-serve-training-and-inference-from-one-store) to frame the fix. - Has a feature been silently killed? → check whether any feature the solution depends on is in a kill-review or has lost its upstream pipeline. A ranker silently degrades when a feature becomes null but the serving path doesn't re-score. Cross-reference
features.mdconsumer mapping with the current solution definition.
Prescribed rules:
loop-detect-death-spirals— the monitor that should have caught thismatch-cap-provider-exposure— serving-time cap on single-item or single-supplier dominanceloop-reserve-random-exploration— exploration slots in the feedobs-track-coverage-and-gini— add if not already tracked
Ship-or-kill: propose a re-ranker cap, an experiment against current production, and the monitor thresholds to add.
6. Branch D — Relevance Regression
Question to answer: Does the golden set confirm the regression, and what specifically broke?
Data to pull:
- From
golden-set.md: the existing golden query set for the regressed surface - From
~~search engine: run the golden queries against both the previous mapping / settings and the current ones (if snapshot / point-in-time is available) - From
~~observability: pull the relevant nDCG / MRR / zero-result-rate monitors around the change time
If no golden set exists, advise the user to run /marketplace:build-golden-set search first — or fall back to a hand-picked sample of regression candidates from gotchas.md.
Decision tree:
- Run the golden-set diff. Report per-query nDCG delta, ranked by worst-regression.
- Cluster the regressed queries by intent class. A regression concentrated in one intent class (e.g., "specific dates") points to a specific rule violation.
- Trace the cluster to the change. Common causes:
- Synonym expansion inflates recall on precision queries →
query-curate-synonyms-by-domain - Analyzer mismatch between index and query time →
index-match-index-and-query-time-analyzers - BM25 parameter change cascades to all queries →
rank-tune-bm25-parameters-last - Filter-to-must movement drops exact-match constraints →
retrieve-use-filter-clauses-for-exact-matches
- Synonym expansion inflates recall on precision queries →
Prescribed rules: whichever of the above match, plus plan-maintain-a-decisions-log as a process reminder.
Ship-or-kill: decide between rollback, partial rollback (e.g., keep the synonym, drop the weekend expansion), or forward-fix. If forward-fix, propose an A/B with a tight guardrail on nDCG@10.
7. Branch E — Liquidity Imbalance
Question to answer: Is this a structural supply-demand problem that ranking can't fix, or is ranking making it worse?
Data to pull:
- From
liquidity.md: current state per geo / season / side - From
~~data warehouse: the ratio per geo for the current quarter vs the prior-year equivalent - From
surfaces.md: any surfaces that match users across geos (e.g., saved search alerts, emails)
Decision tree:
- Is the ratio structurally broken, or seasonal? Compare to the same period last year.
- Is the long side's engagement dropping? When ratios get extreme, the side with excess supply disengages — they pay but the product becomes noise.
- Is ranking masking the imbalance or amplifying it? A ranker that recommends the same scarce-side items to everyone exhausts the scarce side faster. Check for a side-cap.
- Are there mitigations in flight? Check
liquidity.md→ "mitigations in flight" section. Is the current mitigation working?
Prescribed rules:
match-balance-supply-demand— do not rank feasibility-impossible itemsmatch-cap-provider-exposure— cap scarce-side item exposure to slow burnoutgap-route-unworkable-segments-to-alternatives— surface alternatives when the segment is structurally unworkablegap-surface-seasonal-supply-constraints— set expectations explicitly for the user
Ship-or-kill: distinguish between acceptable liquidity imbalance (transparent user expectations), ranking amplifying imbalance (fix ranking), and structural marketplace problem (requires growth / supply-acquisition work, not ranking work). Route accordingly.
8. Propose a Ship-or-Kill Decision
Every branch ends with a single recommended next step:
- Fix now (if the root cause is clear, low-risk, and a quick patch)
- A/B test (if the fix is non-trivial or carries risk; propose primary metric, guardrails, MDE, cohort, duration)
- Escalate (if the problem is outside ranking / personalisation — e.g., product, growth, supply acquisition)
9. Update gotchas.md
After diagnosis, offer to append the diagnosis to gotchas.md with:
- Symptom
- Root cause (confirmed or suspected)
- Fix proposed
- Leading indicator that would have caught this earlier
Read-only posture
Every query in every branch is read-only and announced before execution.
Tips
- Start with
gotchas.md. More than half of diagnoses are repeats of prior incidents. - Don't skip to prescription — always pull live data first. Research-grounded rules are priors, not conclusions.
- Be explicit about which branch you're in — say it out loud so the user can redirect if you've misclassified.
- If multiple branches apply (e.g., a death spiral causing a funnel leak), address the upstream branch first.
Related Commands
/marketplace:explore-events— often the right first step if the diagnosis points to instrumentation/marketplace:review-change— if the diagnosis points to a recent ship/marketplace:build-observability— add the monitor that would have caught this next time/marketplace:build-golden-set— prerequisite for Branch D (relevance regression)