Marketplace Engineering Two-Sided Search and Recsys Planning Best Practices
Comprehensive planning, design and diagnostic guide for search and recommendation systems in two-sided trust marketplaces. Covers OpenSearch index, query and ranking patterns, the methodology for planning retrieval work, the handoff points to recommendation-specific tooling, and the instrumentation and dashboard layer that turns measurement into ongoing decision making. Contains 57 rules across 10 categories ordered by cascade impact, plus two playbooks (plan a new system from scratch, diagnose an existing one) and explicit living-artefact conventions (decisions log, golden set, gotchas).
When to Apply
Reference this skill when:
- Planning a new marketplace retrieval project from scratch
- Reviewing an existing retrieval system that feels stale, unfair, or unpersonalised
- Designing the OpenSearch index mapping, analyzers, or query DSL
- Choosing retrieval primitives per product surface (search, recs, hybrid, curated)
- Deciding which search quality metrics to track and dashboard
- Running the weekly search-quality review ritual
- Diagnosing a silent regression in ranking, coverage, or zero-result rate
- Deciding when a retrieval problem is actually a personalisation problem
This skill is the precursor to marketplace-personalisation. Start here for
planning and search work; hand off to the personalisation skill when the diagnosed
bottleneck is impression tracking, feedback-loop bias, or AWS Personalize-specific
design.
Living Context
This skill treats the system as evolving. Three living artefacts carry context across sessions, releases, and team changes — read them before making suggestions, update them after every shipped change:
- `gotchas.md` (in this skill folder): append-only diagnostic lessons. Every gotcha has a date and a short description of what surprised the team and how it was resolved.
- Decisions log (maintained in the product repo, typically `decisions/*.md`): every ranking change, schema tweak, and synonym edit recorded with its hypothesis, offline and online evidence, ship criterion, outcome, and rollback path. See rule `plan-maintain-a-decisions-log`.
- Golden query set (frozen per eval cycle, committed to the product repo): the reference set of queries against which every ranking change is offline-evaluated before an online test. See rule `plan-version-the-golden-set`.
Rule Categories
Categories are ordered by cascade impact on the retrieval lifecycle: intent misunderstanding poisons architecture; wrong architecture poisons index; wrong index poisons retrieval forever until a reindex; every downstream layer inherits the upstream error.
| # | Category | Prefix | Impact |
|---|----------|--------|--------|
| 1 | Problem Framing and User Intent | intent- | CRITICAL |
| 2 | Surface Taxonomy and Architecture | arch- | CRITICAL |
| 3 | Index Design and Mapping | index- | HIGH |
| 4 | Planning and Improvement Methodology | plan- | HIGH |
| 5 | Query Understanding | query- | MEDIUM-HIGH |
| 6 | Retrieval Strategy | retrieve- | MEDIUM-HIGH |
| 7 | Relevance and Ranking | rank- | MEDIUM-HIGH |
| 8 | Search and Recommender Blending | blend- | MEDIUM |
| 9 | Measurement and Experimentation | measure- | MEDIUM |
| 10 | Instrumentation, Dashboards and Decision Triggers | monitor- | MEDIUM |
Quick Reference
1. Problem Framing and User Intent (CRITICAL)
- `intent-map-queries-to-intent-classes`: classify before retrieving
- `intent-separate-known-item-from-discovery`: different failure modes, different strategies
- `intent-audit-live-query-logs-first`: design from real data, not imagined data
- `intent-distinguish-transactional-from-exploratory`: precision vs diversity
- `intent-reject-one-search-for-everything`: per-surface query shapes
- `intent-treat-no-search-as-first-class-choice`: curated is a legitimate answer
2. Surface Taxonomy and Architecture (CRITICAL)
- `arch-map-surface-to-retrieval-primitive`: a single-source-of-truth routing table
- `arch-split-candidate-generation-from-ranking`: two-stage pipelines
- `arch-design-zero-result-fallback`: declare fallback owner per surface
- `arch-design-for-cold-start-from-day-one`: cold start is permanent, not bootstrap
- `arch-avoid-mono-stack-retrieval`: diversify primary dependencies
- `arch-route-surfaces-deliberately`: every routing decision recorded
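As an illustration of the routing-table idea behind `arch-map-surface-to-retrieval-primitive` and `arch-design-zero-result-fallback`, here is a minimal Python sketch. The surface names, owner names, and the `Route` and `route_for` helpers are hypothetical; only the four primitives (search, recs, hybrid, curated) come from this skill.

```python
from dataclasses import dataclass
from enum import Enum


class Primitive(Enum):
    SEARCH = "search"
    RECS = "recs"
    HYBRID = "hybrid"
    CURATED = "curated"


@dataclass(frozen=True)
class Route:
    primitive: Primitive
    fallback_owner: str  # team accountable for the zero-result fallback


# Hypothetical surfaces and owners, for illustration only.
ROUTING_TABLE = {
    "search_results": Route(Primitive.SEARCH, "search-team"),
    "home_feed": Route(Primitive.RECS, "recs-team"),
    "similar_listings": Route(Primitive.HYBRID, "search-team"),
    "seasonal_landing": Route(Primitive.CURATED, "merchandising"),
}


def route_for(surface: str) -> Route:
    """Look up the declared route; an undeclared surface fails loudly."""
    try:
        return ROUTING_TABLE[surface]
    except KeyError:
        raise KeyError(f"surface {surface!r} has no declared route") from None
```

Keeping the table in one module means a new surface cannot ship without someone declaring its primitive and its fallback owner.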
3. Index Design and Mapping (HIGH)
- `index-design-mappings-conservatively`: reindex is expensive
- `index-use-keyword-and-text-as-multi-fields`: full-text plus exact match
- `index-match-index-and-query-time-analyzers`: tokens must agree
- `index-use-language-analyzers-for-language-fields`: language-aware stemming
- `index-separate-searchable-from-display-fields`: index only what you search
- `index-use-index-templates-for-consistency`: prevent mapping drift
- `index-stream-listing-updates-via-cdc`: freshness in seconds, not hours
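Several of the index rules can be seen together in one small mapping. The sketch below is an assumption-laden example, not the skill's reference mapping: the field names (`title`, `category`, `thumbnail_url`) are hypothetical, and the body is written as a Python dict that could be passed to an index-creation call.

```python
# Hypothetical listing index body; field names are illustrative only.
LISTING_INDEX_BODY = {
    "mappings": {
        # Reject unmapped fields so the mapping cannot drift silently.
        "dynamic": "strict",
        "properties": {
            # Full-text field with a keyword sub-field for exact match.
            # No search_analyzer is set, so query time uses the same
            # "english" analyzer as index time and the tokens agree.
            "title": {
                "type": "text",
                "analyzer": "english",
                "fields": {"raw": {"type": "keyword"}},
            },
            # Exact-match facet used in filter clauses.
            "category": {"type": "keyword"},
            # Display-only field: stored in _source but not searchable.
            "thumbnail_url": {"type": "keyword", "index": False},
        },
    }
}


def searchable_fields(index_body: dict) -> list[str]:
    """Return the top-level fields that are indexed for search."""
    props = index_body["mappings"]["properties"]
    return sorted(name for name, spec in props.items() if spec.get("index", True))
```

Calling `searchable_fields(LISTING_INDEX_BODY)` is a cheap review aid: anything on the list that nobody queries is a candidate for `"index": False`.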
4. Planning and Improvement Methodology (HIGH)
- `plan-audit-before-you-build`: instrumentation gate on kick-off
- `plan-build-golden-query-set-first`: the first artefact, not the last
- `plan-find-bottleneck-before-optimising`: theory of constraints
- `plan-maintain-a-decisions-log`: living context across team changes
- `plan-version-the-golden-set`: frozen per eval cycle
- `plan-handoff-to-personalisation-skill`: recognise the boundary
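One possible way to make "frozen per eval cycle" checkable for `plan-version-the-golden-set` is to derive a version string from the golden set's content. The sketch below assumes a hypothetical record shape (a `query` string plus a `judgements` mapping); the skill itself does not prescribe a format.

```python
import hashlib
import json


def golden_set_version(queries: list[dict]) -> str:
    """Return a short content hash that is independent of record order."""
    ordered = sorted(queries, key=lambda q: q["query"])
    canonical = json.dumps(
        ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical golden set entries, for illustration only.
golden = [
    {"query": "dog walker", "judgements": {"listing-1": 3}},
    {"query": "plumber", "judgements": {"listing-2": 2}},
]
version = golden_set_version(golden)
```

Recording this version next to each offline evaluation in the decisions log makes it obvious when two results were measured against different golden sets.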
5. Query Understanding (MEDIUM-HIGH)
- `query-normalise-before-anything-else`: canonical string in
- `query-use-language-analyzers-for-stemming`: double-digit recall wins
- `query-curate-synonyms-by-domain`: domain vocabulary not thesaurus
- `query-use-fuzzy-matching-for-typos`: 10-15% of queries have typos
- `query-classify-before-routing`: single-pass classifier
- `query-build-autocomplete-on-separate-index`: latency isolation
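A minimal sketch of what `query-normalise-before-anything-else` could look like in application code, assuming that Unicode compatibility folding, case folding, and whitespace collapsing are the desired canonical form. The exact steps are an assumption; the rule file may specify others.

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def normalise_query(raw: str) -> str:
    """Return a canonical form of a raw query string."""
    # NFKC folds compatibility characters (e.g. full-width letters).
    text = unicodedata.normalize("NFKC", raw)
    # casefold() is a more aggressive, locale-independent lower().
    text = text.casefold()
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()
```

For example, `normalise_query("  Dog   WALKER\t")` yields `"dog walker"`, so classification, caching, and logging all see the same string.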
6. Retrieval Strategy (MEDIUM-HIGH)
- `retrieve-use-filter-clauses-for-exact-matches`: filter cache wins
- `retrieve-use-bool-structure-deliberately`: must vs should vs filter
- `retrieve-run-expensive-signals-in-rescore`: rescore window limits cost
- `retrieve-combine-bm25-and-knn-via-hybrid-search`: lexical plus semantic
- `retrieve-paginate-with-search-after`: constant-cost deep pagination
- `retrieve-choose-embedding-model-deliberately`: re-embedding is expensive
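The filter-clause and `search_after` rules can be combined in one request body. The following is a sketch under assumptions: the `title`, `category`, and `listing_id` fields and the `build_search_body` helper are hypothetical, and the dict is the JSON body one would send to a search endpoint.

```python
from typing import Optional, Sequence


def build_search_body(
    text: str,
    category: str,
    search_after: Optional[Sequence] = None,
    size: int = 20,
) -> dict:
    """Build a bool query with a scored match and an unscored filter."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # must: contributes to the relevance score.
                "must": [{"match": {"title": {"query": text}}}],
                # filter: exact match, cacheable, does not affect the score.
                "filter": [{"term": {"category": category}}],
            }
        },
        # search_after needs a deterministic sort, so a unique
        # tiebreaker field follows the score.
        "sort": [{"_score": "desc"}, {"listing_id": "asc"}],
    }
    if search_after is not None:
        # Values copied from the "sort" array of the previous page's last hit.
        body["search_after"] = list(search_after)
    return body
```

The first page omits `search_after`; each later page passes the last hit's sort values, which avoids the growing cost of large `from` offsets.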
7. Relevance and Ranking (MEDIUM-HIGH)
- `rank-tune-bm25-parameters-last`: upstream levers first
- `rank-use-function-score-for-business-signals`: explicit named functions
- `rank-deploy-ltr-only-after-golden-set-exists`: supervised learning needs labels
- `rank-apply-diversity-at-rank-time`: after scoring, not before
- `rank-normalise-scores-across-retrieval-primitives`: comparable scales
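To make "comparable scales" concrete for `rank-normalise-scores-across-retrieval-primitives`, here is a minimal sketch using min-max normalisation. Min-max is one common choice and an assumption here; the rule file may recommend a different scheme.

```python
def min_max_normalise(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one primitive's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie; treat them as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# Hypothetical raw BM25 scores for three listings.
bm25 = {"a": 2.0, "b": 4.0, "c": 6.0}
normalised = min_max_normalise(bm25)
```

Raw BM25 scores are unbounded while similarity scores are typically bounded, so some rescaling of this kind is needed before the two can be mixed.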
8. Search and Recommender Blending (MEDIUM)
- `blend-use-search-alone-for-specific-intent`: precision queries
- `blend-combine-search-and-personalisation-scores`: normalised weighted sum
- `blend-keep-hybrid-blending-explainable`: traceable results
- `blend-never-return-zero-results`: guaranteed cascade to non-empty
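The blending rules describe a normalised weighted sum, traceable results, and a cascade that never ends empty. A rough sketch of all three follows, assuming both inputs are already normalised to [0, 1]; the `blend` and `first_non_empty` helpers, the default weight, and the stage names are hypothetical.

```python
from typing import Callable


def blend(search: dict[str, float], personalisation: dict[str, float],
          weight: float = 0.7) -> list[tuple[str, dict[str, float]]]:
    """Weighted sum of two normalised score maps, keeping the components."""
    blended = {}
    for doc in set(search) | set(personalisation):
        s = search.get(doc, 0.0)
        p = personalisation.get(doc, 0.0)
        # Keep both components so every result is explainable.
        blended[doc] = {
            "search": s,
            "personalisation": p,
            "score": weight * s + (1 - weight) * p,
        }
    # Highest score first; document id breaks ties deterministically.
    return sorted(blended.items(), key=lambda kv: (-kv[1]["score"], kv[0]))


def first_non_empty(stages: list[tuple[str, Callable[[], list]]]):
    """Run fallback stages in order and return the first non-empty result."""
    for name, fetch in stages:
        results = fetch()
        if results:
            return name, results
    # The last stage (e.g. curated) is expected to be non-empty by design.
    raise RuntimeError("fallback cascade is misconfigured: every stage was empty")
```

Returning the stage name alongside the results lets the query log record which fallback actually served the request.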
9. Measurement and Experimentation (MEDIUM)
- `measure-define-session-success-per-surface`: one definition per surface
- `measure-track-ndcg-mrr-zero-result-rate`: three metrics for one picture
- `measure-track-reformulation-rate-as-failure-signal`: cheapest failure metric
- `measure-use-click-models-for-implicit-judgments`: scale beyond human judges
- `measure-run-interleaving-as-cheap-ab-proxy`: 10x less sample needed
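The three metrics named in `measure-track-ndcg-mrr-zero-result-rate` are simple to compute offline. The sketch below makes two simplifying assumptions that should be stated loudly: NDCG uses linear gain, and the ideal ranking is built only from the returned results rather than from every judged document.

```python
import math
from typing import Optional, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """NDCG@k; the ideal order is a re-sort of the returned gains (assumption)."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


def mrr(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank; ranks are 1-based, None means no relevant hit."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_relevant_ranks if r)
    return total / len(first_relevant_ranks)


def zero_result_rate(result_counts: Sequence[int]) -> float:
    """Share of queries that returned no results."""
    if not result_counts:
        return 0.0
    return sum(1 for c in result_counts if c == 0) / len(result_counts)
```

NDCG reflects ordering quality, MRR reflects how quickly the first good result appears, and zero-result rate reflects coverage, which is why the rule tracks all three together.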
10. Instrumentation, Dashboards and Decision Triggers (MEDIUM)
- `monitor-log-every-query-with-full-context`: structured replayable events
- `monitor-scrub-pii-from-query-logs`: redact before warehouse ingestion
- `monitor-build-search-health-dashboard`: threshold lines, colour bands
- `monitor-alert-on-decision-triggers`: quality metrics, not error rates
- `monitor-track-ranking-stability-churn`: RBO churn as leading indicator
- `monitor-run-weekly-search-quality-review`: calendar-driven ritual
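For `monitor-track-ranking-stability-churn`, churn can be derived from rank-biased overlap (RBO) between yesterday's and today's results for the same query. The sketch below uses the truncated, non-extrapolated form of RBO, so identical lists of length k score 1 - p^k rather than 1. The persistence value `p = 0.9` and the definition of churn as `1 - RBO` are assumptions for illustration.

```python
from typing import Sequence


def rbo_truncated(a: Sequence[str], b: Sequence[str], p: float = 0.9) -> float:
    """Truncated rank-biased overlap of two ranked lists (no extrapolation)."""
    k = min(len(a), len(b))
    total = 0.0
    for depth in range(1, k + 1):
        # Agreement at this depth: shared items in both prefixes / depth.
        overlap = len(set(a[:depth]) & set(b[:depth]))
        total += p ** (depth - 1) * overlap / depth
    return (1 - p) * total


def churn(previous: Sequence[str], current: Sequence[str], p: float = 0.9) -> float:
    """Ranking churn as 1 - RBO; higher means the ranking moved more."""
    return 1.0 - rbo_truncated(previous, current, p)
```

Because RBO weights the top of the list most heavily, a swap at positions 1 and 2 moves churn far more than a swap at positions 19 and 20, which matches how users experience ranking changes.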
Planning and Improving
Two playbooks compose the rules into end-to-end workflows:
- `references/playbooks/planning.md`: Plan a new marketplace retrieval system from scratch. Nine-step workflow from intent audit through the first A/B-tested online lift, with explicit exit criteria per step.
- `references/playbooks/improving.md`: Diagnose and improve an existing retrieval system. Decision tree that walks through telemetry, index freshness, coverage, baseline gap, cold start, segment regressions, and algorithm iteration in that order, with hand-off points to `marketplace-personalisation` when the bottleneck is personalisation-specific.
Read the playbooks first when the task is "design a new search and recommender project" or "this retrieval system needs to get better". Read individual rules when a specific question arises during implementation or review.
How to Use
- Read `references/_sections.md` for category structure and cascade rationale.
- Read `gotchas.md` for diagnostic lessons accumulated from prior incidents.
- Read `references/playbooks/planning.md` to plan a new system.
- Read `references/playbooks/improving.md` to diagnose an existing one.
- Read individual rule files when a specific task matches the rule title.
- Use `assets/templates/_template.md` to author new rules as the skill grows.
Related Skills
- `marketplace-personalisation`: the companion skill covering AWS Personalize implementation, impression tracking, schema design, two-sided matching, feedback loops, and the personalisation-specific diagnostic playbook. Hand off to this skill when the diagnostic identifies a personalisation-specific bottleneck.
Reference Files
| File | Description |
|------|-------------|
| references/_sections.md | Category definitions and impact ordering |
| references/playbooks/planning.md | Plan a new retrieval system |
| references/playbooks/improving.md | Diagnose an existing retrieval system |
| gotchas.md | Accumulated diagnostic lessons (living) |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, references |