Marketplace Engineering Two-Sided Personalisation Best Practices
Comprehensive guide for designing, building and improving personalisation and recommendation systems in two-sided trust marketplaces built on Amazon Personalize. Contains 49 rules across 9 categories, ordered by cascade impact on the personalisation lifecycle, plus two playbooks: one for planning a new system from scratch and one for diagnosing an existing one.
When to Apply
Reference this skill when:
- Designing the event schema and tracking for a new recommender system
- Choosing an Amazon Personalize recipe (USER_PERSONALIZATION_v2, SIMS, PERSONALIZED_RANKING_v2)
- Writing or reviewing candidate-generation and re-ranking code for marketplace search or homefeed
- Handling cold start for new providers, new seekers, or new catalog regions
- Diagnosing a live system that "mostly works but feels stale, unfair, or unpersonalised"
- Planning the next experiment, baseline comparison, or A/B test for the recommender
- Investigating concentration, coverage collapse, death spirals, or training-serving skew
- Adding observability dashboards, drift detection, or online metric slicing
Setup
This skill has no user-specific configuration — it is self-contained. References are live URLs to official Amazon Personalize documentation, academic papers on bias and exposure, and engineering blogs from Airbnb and DoorDash.
Rule Categories
Categories are ordered by cascade impact: earlier stages poison everything downstream.
| # | Category | Prefix | Impact |
|---|----------|--------|--------|
| 1 | Event Tracking and Capture | track- | CRITICAL |
| 2 | Dataset and Schema Design | schema- | CRITICAL |
| 3 | Two-Sided Matching Patterns | match- | CRITICAL |
| 4 | Simple Baselines and Theory of Constraints | simple- | HIGH |
| 5 | Feedback Loops and Bias Control | loop- | HIGH |
| 6 | Cold Start and Coverage | cold- | HIGH |
| 7 | Recipe and Pipeline Selection | recipe- | MEDIUM-HIGH |
| 8 | Inference, Filters and Re-ranking | infer- | MEDIUM-HIGH |
| 9 | Observability and Online Metrics | obs- | MEDIUM-HIGH |
Quick Reference
1. Event Tracking and Capture (CRITICAL)
- track-log-impressions-alongside-clicks — the denominator that turns clicks into a rate and unlocks unbiased training
- track-use-stable-opaque-item-ids — prevents history loss when listings rename or move
- track-stamp-events-with-request-id — the join key that enables impression-to-outcome attribution
- track-stream-events-via-putevents — real-time adaptation versus end-of-day bulk import
- track-capture-negative-signals — dismissal is information, silence is not
- track-measure-outcomes-not-clicks — reward the completed booking, not the clickbait
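The tracking rules above converge on a single event shape. A minimal sketch of an impression-aware click event streamed via the real `PutEvents` API — the function names, tracker ID, and `requestId` property are illustrative placeholders, not part of the skill's rule files:

```python
import datetime
import json

def build_click_event(item_id, impression, recommendation_id, request_id):
    """Build one Personalize event that carries the full impression list
    (the denominator) and the request id that joins click to impression."""
    return {
        "eventType": "click",
        "sentAt": datetime.datetime.now(datetime.timezone.utc),
        "itemId": item_id,
        "impression": impression,               # every item shown, in slot order
        "recommendationId": recommendation_id,  # returned by GetRecommendations
        "properties": json.dumps({"requestId": request_id}),
    }

def send_event(tracking_id, user_id, session_id, event):
    """Stream the event in real time instead of waiting for bulk import."""
    import boto3  # local import: only needed when actually calling AWS
    client = boto3.client("personalize-events")
    client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[event],
    )
```

Dismissals and other negative signals would use the same shape with a different `eventType`.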
2. Dataset and Schema Design (CRITICAL)
- schema-design-conservatively — Interactions schemas are immutable, Users/Items are painful to change
- schema-keep-user-item-thin — volatile fields belong in events
- schema-enforce-metadata-freshness — PutItems on every metadata change
- schema-prefer-categorical-fields — unlock per-value features
- schema-weight-event-value — align the model with the business outcome
- schema-include-context-everywhere — train-serve feature parity
- schema-meet-minimum-dataset-sizes — 50 users / 50 items / 1000 interactions before training
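Several of these rules show up directly in the Interactions schema. A sketch of a thin, conservative schema with an event-value field and categorical context — the `DEVICE` and `REGION` fields are illustrative examples, not prescribed names:

```python
import json

# Thin Interactions schema: volatile attributes travel on events, not on
# Users/Items.  EVENT_VALUE lets training weight bookings above clicks,
# and categorical context fields unlock per-value features.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": "float"},
        {"name": "DEVICE", "type": "string", "categorical": True},
        {"name": "REGION", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}

def register_schema(name):
    """Register the schema once; Interactions schemas cannot be altered later."""
    import boto3  # local import: only needed when actually calling AWS
    personalize = boto3.client("personalize")
    response = personalize.create_schema(
        name=name, schema=json.dumps(INTERACTIONS_SCHEMA)
    )
    return response["schemaArn"]
```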
3. Two-Sided Matching Patterns (CRITICAL)
- match-rank-mutual-fit — rank by mutual accept probability
- match-hard-filter-before-ranking — retrieval enforces feasibility
- match-cap-provider-exposure — diversity as a fairness constraint
- match-model-capacity-constraints — capacity-discounted scoring
- match-balance-supply-demand — per-segment strategy routing
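The mutual-fit and capacity rules compose into one scoring function. A minimal sketch, assuming the two accept probabilities come from upstream models and that a linear capacity discount is acceptable (both are assumptions, not the skill's prescription):

```python
def mutual_fit_score(p_seeker_accept, p_provider_accept, booked, capacity):
    """Rank by the probability that BOTH sides say yes, discounted as the
    provider fills up so nearly-full providers yield exposure to others."""
    remaining = max(0, capacity - booked)
    capacity_factor = remaining / capacity if capacity else 0.0
    return p_seeker_accept * p_provider_accept * capacity_factor

def rank_candidates(candidates):
    """candidates have already passed hard feasibility filters at retrieval."""
    return sorted(
        candidates,
        key=lambda c: mutual_fit_score(
            c["p_seeker"], c["p_provider"], c["booked"], c["capacity"]
        ),
        reverse=True,
    )
```

A provider at full capacity scores zero regardless of fit, which is the point: feasibility belongs before ranking, and capacity belongs inside the score.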
4. Simple Baselines and Theory of Constraints (HIGH)
- simple-ship-popularity-baseline — a reference point that every ML model must beat
- simple-find-bottleneck-first — diagnostic before optimisation
- simple-heuristic-rerank-cold-cohorts — trust × recency × proximity
- simple-budget-complexity — ship or kill criterion before running
- simple-audit-before-build — telemetry audit gates model work
- simple-measure-gap-to-baseline — baseline retained as permanent minority bucket
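The trust × recency × proximity heuristic is simple enough to state in a few lines. A sketch with illustrative defaults (the 30-day half-life and 10 km distance scale are assumptions to be tuned, not values from the rule files):

```python
def heuristic_score(trust, days_since_active, distance_km,
                    recency_half_life=30.0, distance_scale=10.0):
    """Trust x recency x proximity baseline for cold cohorts: the
    reference point every ML model must beat before it ships."""
    recency = 0.5 ** (days_since_active / recency_half_life)   # halves each half-life
    proximity = 1.0 / (1.0 + distance_km / distance_scale)     # decays with distance
    return trust * recency * proximity
```

Because the formula is monotone in each factor, it is easy to explain to stakeholders and cheap to keep running as the permanent minority-bucket baseline.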
5. Feedback Loops and Bias Control (HIGH)
- loop-log-ranking-slot — slot data for position-bias correction
- loop-reserve-random-exploration — unbiased training data
- loop-optimize-completed-outcome — reward the goal, not the proxy
- loop-decay-event-weights — old preferences fade
- loop-detect-death-spirals — exposure Gini as a leading indicator
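Two of these rules reduce to small, testable formulas: time-decayed event weights and the exposure Gini coefficient. A sketch (the 90-day half-life is an illustrative default):

```python
def decayed_weight(base_weight, age_days, half_life_days=90.0):
    """Old preferences fade: halve an event's training weight each half-life."""
    return base_weight * 0.5 ** (age_days / half_life_days)

def exposure_gini(impressions_per_provider):
    """Gini coefficient over per-provider impression counts.
    0 = perfectly even exposure, ~1 = all exposure on one provider.
    A rising value is a leading indicator of a death spiral."""
    xs = sorted(impressions_per_provider)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

Tracking `exposure_gini` daily, rather than waiting for conversion to dip, catches the loop while it is still cheap to break.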
6. Cold Start and Coverage (HIGH)
- cold-use-v2-recipe-with-metadata — metadata extrapolates to new listings
- cold-best-of-segment-popularity — segmentation beats global top-N
- cold-capture-onboarding-intent — ask instead of guessing
- cold-reserve-exploration-slots — promotions filter for fresh inventory
- cold-tag-cold-start-recs — warm-versus-cold metric slicing
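Reserving exploration slots and tagging the results can be sketched together. The fixed slot positions and the `source` tag name are illustrative choices, not specified by the rules:

```python
import random

def insert_exploration_slots(warm_ranked, cold_pool, num_results=20,
                             explore_slots=(4, 9, 14), seed=None):
    """Reserve fixed slots for cold-start inventory and tag every rec so
    online metrics can be sliced warm-versus-cold."""
    rng = random.Random(seed)
    results = [{"item": item, "source": "warm"} for item in warm_ranked]
    picks = rng.sample(cold_pool, min(len(explore_slots), len(cold_pool)))
    for slot, item in zip(sorted(explore_slots), picks):
        results.insert(slot, {"item": item, "source": "cold"})
    return results[:num_results]
```

The `source` tag is what makes cold-tag-cold-start-recs work downstream: without it, cold-item performance disappears into the aggregate.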
7. Recipe and Pipeline Selection (MEDIUM-HIGH)
- recipe-default-to-user-personalization-v2 — discovery default
- recipe-sims-for-item-page-only — similar-items is not a homepage recipe
- recipe-personalized-ranking-as-reranker — not a candidate generator
- recipe-build-candidate-rerank-pipeline — two layers, two concerns
- recipe-defer-hpo-until-baseline-measured — prove the model before tuning
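The candidate-then-rerank pipeline maps onto two real runtime API calls. A sketch assuming one campaign serves candidates and a second serves the re-ranker (the ARNs, counts, and helper name are placeholders):

```python
def cap_rerank_input(item_ids, limit=500):
    """GetPersonalizedRanking accepts at most 500 items per call."""
    return item_ids[:limit]

def recommend(user_id, candidate_campaign_arn, ranking_campaign_arn,
              filter_arn=None, num_candidates=200, num_results=25):
    """Two layers, two concerns: a wide candidate pull from the
    user-personalization campaign, then a personalized-ranking re-rank."""
    import boto3  # local import: only needed when actually calling AWS
    runtime = boto3.client("personalize-runtime")
    kwargs = {"campaignArn": candidate_campaign_arn, "userId": user_id,
              "numResults": num_candidates}
    if filter_arn:
        kwargs["filterArn"] = filter_arn
    candidates = runtime.get_recommendations(**kwargs)
    item_ids = [i["itemId"] for i in candidates["itemList"]]
    ranked = runtime.get_personalized_ranking(
        campaignArn=ranking_campaign_arn, userId=user_id,
        inputList=cap_rerank_input(item_ids))
    return [i["itemId"] for i in ranked["personalizedRanking"]][:num_results]
```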
8. Inference, Filters and Re-ranking (MEDIUM-HIGH)
- infer-use-filters-api — Personalize backfills to numResults
- infer-rerank-rules-after-model — preserve the model distribution
- infer-deduplicate-canonical-entity — provider-level dedup, not listing-level
- infer-enforce-exposure-caps — rolling fairness constraints
- infer-cache-responses-short-ttl — session continuity and cost control
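Provider-level dedup and exposure caps are both order-preserving passes over the model's ranking. A minimal sketch; the per-window cap value and argument names are illustrative:

```python
from collections import defaultdict

def dedupe_and_cap(ranked_items, provider_of, max_per_response=1,
                   rolling_impressions=None, rolling_cap=1000):
    """Apply rules AFTER the model: collapse listings to one per canonical
    provider and skip providers over their rolling exposure cap, while
    preserving the model's relative order for everything that survives."""
    rolling_impressions = rolling_impressions or {}
    kept_per_provider = defaultdict(int)
    out = []
    for item in ranked_items:
        provider = provider_of[item]
        if kept_per_provider[provider] >= max_per_response:
            continue  # provider-level dedup, not listing-level
        if rolling_impressions.get(provider, 0) >= rolling_cap:
            continue  # rolling fairness constraint
        kept_per_provider[provider] += 1
        out.append(item)
    return out
```

Because filtered-out items are simply skipped, the surviving order is exactly the model's order, which is what "preserve the model distribution" asks for.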
9. Observability and Online Metrics (MEDIUM-HIGH)
- obs-always-ab-test — before-and-after is never enough
- obs-track-coverage-and-gini — exposure-health signals
- obs-slice-metrics-by-segment — aggregate metrics hide segment regressions
- obs-watch-online-offline-divergence — proxy overfitting detector
- obs-alarm-on-prediction-drift — distribution KL-divergence as early warning
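The drift alarm is a short computation over two prediction-score histograms. A sketch, with the threshold as an assumption to be tuned against your own traffic:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) between two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def drift_alarm(current_hist, reference_hist, threshold=0.1):
    """Compare today's prediction-score histogram against a trusted
    reference window and alarm when the divergence exceeds the threshold."""
    p = [c / sum(current_hist) for c in current_hist]
    q = [r / sum(reference_hist) for r in reference_hist]
    return kl_divergence(p, q) > threshold
```

Drift in the prediction distribution typically precedes drift in online metrics, which is why it serves as the early warning rather than the verdict.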
Planning and Improving Recommendations
Two playbooks drive end-to-end workflows that compose the rules above:
- references/playbooks/planning.md — Plan a new recommender system from scratch: a nine-step workflow that starts with instrumentation and ends with the first A/B-tested ML lift over a popularity baseline.
- references/playbooks/improving.md — Diagnose and improve an existing recommender: a decision tree that identifies the current bottleneck (telemetry, freshness, coverage, feedback loop, algorithm) and routes to the specific rules that fix it.
Read the playbooks first when the task is "design a recommender" or "this recommender is underperforming". Read the individual rules when a specific question arises during implementation or review.
How to Use
- Read references/_sections.md for category structure and impact ordering.
- Read individual rule files under references/ when a specific rule matches the task at hand.
- Read references/playbooks/planning.md to design a new system.
- Read references/playbooks/improving.md to diagnose an existing system.
- Use assets/templates/_template.md to author new rules as the skill grows.
Reference Files
| File | Description |
|------|-------------|
| references/_sections.md | Category definitions, impact ordering, cascade rationale |
| references/playbooks/planning.md | Planning playbook for a new recommender |
| references/playbooks/improving.md | Diagnostic playbook for an existing recommender |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, authoritative reference URLs |