Marketplace Engineering Two-Sided Personalisation Best Practices
Comprehensive guide for designing, building and improving personalisation and recommendation systems in two-sided trust marketplaces built on Amazon Personalize. Contains 49 rules across 9 categories, ordered by cascade impact on the personalisation lifecycle, plus two playbooks: one for planning a new system from scratch and one for diagnosing an existing one.
When to Apply
Reference this skill when:
- Designing the event schema and tracking for a new recommender system
- Choosing an Amazon Personalize recipe (USER_PERSONALIZATION_v2, SIMS, PERSONALIZED_RANKING_v2)
- Writing or reviewing candidate-generation and re-ranking code for marketplace search or homefeed
- Handling cold start for new providers, new seekers, or new catalog regions
- Diagnosing a live system that "mostly works but feels stale, unfair, or unpersonalised"
- Planning the next experiment, baseline comparison, or A/B test for the recommender
- Investigating concentration, coverage collapse, death spirals, or training-serving skew
- Adding observability dashboards, drift detection, or online metric slicing
Setup
This skill has no user-specific configuration — it is self-contained. References are live URLs to official Amazon Personalize documentation, academic papers on bias and exposure, and engineering blogs from Airbnb and DoorDash.
Rule Categories
Categories are ordered by cascade impact: earlier stages poison everything downstream.
| # | Category | Prefix | Impact |
|---|----------|--------|--------|
| 1 | Event Tracking and Capture | track- | CRITICAL |
| 2 | Dataset and Schema Design | schema- | CRITICAL |
| 3 | Two-Sided Matching Patterns | match- | CRITICAL |
| 4 | Simple Baselines and Theory of Constraints | simple- | HIGH |
| 5 | Feedback Loops and Bias Control | loop- | HIGH |
| 6 | Cold Start and Coverage | cold- | HIGH |
| 7 | Recipe and Pipeline Selection | recipe- | MEDIUM-HIGH |
| 8 | Inference, Filters and Re-ranking | infer- | MEDIUM-HIGH |
| 9 | Observability and Online Metrics | obs- | MEDIUM-HIGH |
Quick Reference
1. Event Tracking and Capture (CRITICAL)
- track-log-impressions-alongside-clicks — the denominator that turns clicks into a rate and unlocks unbiased training
- track-use-stable-opaque-item-ids — prevents history loss when listings rename or move
- track-stamp-events-with-request-id — the join key that enables impression-to-outcome attribution
- track-stream-events-via-putevents — real-time adaptation versus end-of-day bulk import
- track-capture-negative-signals — dismissal is information, silence is not
- track-measure-outcomes-not-clicks — reward the completed booking, not the clickbait
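The tracking rules above converge on a single event shape. A minimal sketch of an impression-aware click event streamed via the real `PutEvents` API — the function names, tracker ID, and `requestId` property are illustrative placeholders, not part of the skill's rule files:

```python
import datetime
import json

def build_click_event(item_id, impression, recommendation_id, request_id):
    """Build one Personalize event that carries the full impression list
    (the denominator) and the request id that joins click to impression."""
    return {
        "eventType": "click",
        "sentAt": datetime.datetime.now(datetime.timezone.utc),
        "itemId": item_id,
        "impression": impression,               # every item shown, in slot order
        "recommendationId": recommendation_id,  # returned by GetRecommendations
        "properties": json.dumps({"requestId": request_id}),
    }

def send_event(tracking_id, user_id, session_id, event):
    """Stream the event in real time instead of waiting for bulk import."""
    import boto3  # local import: only needed when actually calling AWS
    client = boto3.client("personalize-events")
    client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[event],
    )
```

Dismissals and other negative signals would use the same shape with a different `eventType`.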
2. Dataset and Schema Design (CRITICAL)
- schema-design-conservatively — Interactions schemas are immutable, Users/Items are painful to change
- schema-keep-user-item-thin — volatile fields belong in events
- schema-enforce-metadata-freshness — PutItems on every metadata change
- schema-prefer-categorical-fields — unlock per-value features
- schema-weight-event-value — align the model with the business outcome
- schema-include-context-everywhere — train-serve feature parity
- schema-meet-minimum-dataset-sizes — 50 users / 50 items / 1000 interactions before training
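Several of these rules show up directly in the Interactions schema. A sketch of a thin, conservative schema with an event-value field and categorical context — the `DEVICE` and `REGION` fields are illustrative examples, not prescribed names:

```python
import json

# Thin Interactions schema: volatile attributes travel on events, not on
# Users/Items.  EVENT_VALUE lets training weight bookings above clicks,
# and categorical context fields unlock per-value features.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": "float"},
        {"name": "DEVICE", "type": "string", "categorical": True},
        {"name": "REGION", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}

def register_schema(name):
    """Register the schema once; Interactions schemas cannot be altered later."""
    import boto3  # local import: only needed when actually calling AWS
    personalize = boto3.client("personalize")
    response = personalize.create_schema(
        name=name, schema=json.dumps(INTERACTIONS_SCHEMA)
    )
    return response["schemaArn"]
```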
3. Two-Sided Matching Patterns (CRITICAL)
- match-rank-mutual-fit — rank by mutual accept probability
- match-hard-filter-before-ranking — retrieval enforces feasibility
- match-cap-provider-exposure — diversity as a fairness constraint
- match-model-capacity-constraints — capacity-discounted scoring
- match-balance-supply-demand — per-segment strategy routing
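The mutual-fit and capacity rules compose into one scoring function. A minimal sketch, assuming the two accept probabilities come from upstream models and that a linear capacity discount is acceptable (both are assumptions, not the skill's prescription):

```python
def mutual_fit_score(p_seeker_accept, p_provider_accept, booked, capacity):
    """Rank by the probability that BOTH sides say yes, discounted as the
    provider fills up so nearly-full providers yield exposure to others."""
    remaining = max(0, capacity - booked)
    capacity_factor = remaining / capacity if capacity else 0.0
    return p_seeker_accept * p_provider_accept * capacity_factor

def rank_candidates(candidates):
    """candidates have already passed hard feasibility filters at retrieval."""
    return sorted(
        candidates,
        key=lambda c: mutual_fit_score(
            c["p_seeker"], c["p_provider"], c["booked"], c["capacity"]
        ),
        reverse=True,
    )
```

A provider at full capacity scores zero regardless of fit, which is the point: feasibility belongs before ranking, and capacity belongs inside the score.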
4. Simple Baselines and Theory of Constraints (HIGH)
- simple-ship-popularity-baseline — a reference point that every ML model must beat
- simple-find-bottleneck-first — diagnostic before optimisation
- simple-heuristic-rerank-cold-cohorts — trust × recency × proximity
- simple-budget-complexity — ship or kill criterion before running
- simple-audit-before-build — telemetry audit gates model work
- simple-measure-gap-to-baseline — baseline retained as permanent minority bucket
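The trust × recency × proximity heuristic is simple enough to state in a few lines. A sketch with illustrative defaults (the 30-day half-life and 10 km distance scale are assumptions to be tuned, not values from the rule files):

```python
def heuristic_score(trust, days_since_active, distance_km,
                    recency_half_life=30.0, distance_scale=10.0):
    """Trust x recency x proximity baseline for cold cohorts: the
    reference point every ML model must beat before it ships."""
    recency = 0.5 ** (days_since_active / recency_half_life)   # halves each half-life
    proximity = 1.0 / (1.0 + distance_km / distance_scale)     # decays with distance
    return trust * recency * proximity
```

Because the formula is monotone in each factor, it is easy to explain to stakeholders and cheap to keep running as the permanent minority-bucket baseline.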
5. Feedback Loops and Bias Control (HIGH)
- loop-log-ranking-slot — slot data for position-bias correction
- loop-reserve-random-exploration — unbiased training data
- loop-optimize-completed-outcome — reward the goal, not the proxy
- loop-decay-event-weights — old preferences fade
- loop-detect-death-spirals — exposure Gini as a leading indicator
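Two of these rules reduce to small, testable formulas: time-decayed event weights and the exposure Gini coefficient. A sketch (the 90-day half-life is an illustrative default):

```python
def decayed_weight(base_weight, age_days, half_life_days=90.0):
    """Old preferences fade: halve an event's training weight each half-life."""
    return base_weight * 0.5 ** (age_days / half_life_days)

def exposure_gini(impressions_per_provider):
    """Gini coefficient over per-provider impression counts.
    0 = perfectly even exposure, ~1 = all exposure on one provider.
    A rising value is a leading indicator of a death spiral."""
    xs = sorted(impressions_per_provider)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

Tracking `exposure_gini` daily, rather than waiting for conversion to dip, catches the loop while it is still cheap to break.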
6. Cold Start and Coverage (HIGH)
- cold-use-v2-recipe-with-metadata — metadata extrapolates to new listings
- cold-best-of-segment-popularity — segmentation beats global top-N
- cold-capture-onboarding-intent — ask instead of guessing
- cold-reserve-exploration-slots — promotions filter for fresh inventory
- cold-tag-cold-start-recs — warm-versus-cold metric slicing
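Reserving exploration slots and tagging the results can be sketched together. The fixed slot positions and the `source` tag name are illustrative choices, not specified by the rules:

```python
import random

def insert_exploration_slots(warm_ranked, cold_pool, num_results=20,
                             explore_slots=(4, 9, 14), seed=None):
    """Reserve fixed slots for cold-start inventory and tag every rec so
    online metrics can be sliced warm-versus-cold."""
    rng = random.Random(seed)
    results = [{"item": item, "source": "warm"} for item in warm_ranked]
    picks = rng.sample(cold_pool, min(len(explore_slots), len(cold_pool)))
    for slot, item in zip(sorted(explore_slots), picks):
        results.insert(slot, {"item": item, "source": "cold"})
    return results[:num_results]
```

The `source` tag is what makes cold-tag-cold-start-recs work downstream: without it, cold-item performance disappears into the aggregate.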
7. Recipe and Pipeline Selection (MEDIUM-HIGH)
- recipe-default-to-user-personalization-v2 — discovery default
- recipe-sims-for-item-page-only — similar-items is not a homepage recipe
- recipe-personalized-ranking-as-reranker — not a candidate generator
- recipe-build-candidate-rerank-pipeline — two layers, two concerns
- recipe-defer-hpo-until-baseline-measured — prove the model before tuning
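The candidate-then-rerank pipeline maps onto two real runtime API calls. A sketch assuming one campaign serves candidates and a second serves the re-ranker (the ARNs, counts, and helper name are placeholders):

```python
def cap_rerank_input(item_ids, limit=500):
    """GetPersonalizedRanking accepts at most 500 items per call."""
    return item_ids[:limit]

def recommend(user_id, candidate_campaign_arn, ranking_campaign_arn,
              filter_arn=None, num_candidates=200, num_results=25):
    """Two layers, two concerns: a wide candidate pull from the
    user-personalization campaign, then a personalized-ranking re-rank."""
    import boto3  # local import: only needed when actually calling AWS
    runtime = boto3.client("personalize-runtime")
    kwargs = {"campaignArn": candidate_campaign_arn, "userId": user_id,
              "numResults": num_candidates}
    if filter_arn:
        kwargs["filterArn"] = filter_arn
    candidates = runtime.get_recommendations(**kwargs)
    item_ids = [i["itemId"] for i in candidates["itemList"]]
    ranked = runtime.get_personalized_ranking(
        campaignArn=ranking_campaign_arn, userId=user_id,
        inputList=cap_rerank_input(item_ids))
    return [i["itemId"] for i in ranked["personalizedRanking"]][:num_results]
```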
8. Inference, Filters and Re-ranking (MEDIUM-HIGH)
- infer-use-filters-api — Personalize backfills to numResults
- infer-rerank-rules-after-model — preserve the model distribution
- infer-deduplicate-canonical-entity — provider-level dedup, not listing-level
- infer-enforce-exposure-caps — rolling fairness constraints
- infer-cache-responses-short-ttl — session continuity and cost control
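Provider-level dedup and exposure caps are both order-preserving passes over the model's ranking. A minimal sketch; the per-window cap value and argument names are illustrative:

```python
from collections import defaultdict

def dedupe_and_cap(ranked_items, provider_of, max_per_response=1,
                   rolling_impressions=None, rolling_cap=1000):
    """Apply rules AFTER the model: collapse listings to one per canonical
    provider and skip providers over their rolling exposure cap, while
    preserving the model's relative order for everything that survives."""
    rolling_impressions = rolling_impressions or {}
    kept_per_provider = defaultdict(int)
    out = []
    for item in ranked_items:
        provider = provider_of[item]
        if kept_per_provider[provider] >= max_per_response:
            continue  # provider-level dedup, not listing-level
        if rolling_impressions.get(provider, 0) >= rolling_cap:
            continue  # rolling fairness constraint
        kept_per_provider[provider] += 1
        out.append(item)
    return out
```

Because filtered-out items are simply skipped, the surviving order is exactly the model's order, which is what "preserve the model distribution" asks for.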
9. Observability and Online Metrics (MEDIUM-HIGH)
- obs-always-ab-test — before-and-after is never enough
- obs-track-coverage-and-gini — exposure-health signals
- obs-slice-metrics-by-segment — aggregate metrics hide segment regressions
- obs-watch-online-offline-divergence — proxy overfitting detector
- obs-alarm-on-prediction-drift — distribution KL-divergence as early warning
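The drift alarm is a short computation over two prediction-score histograms. A sketch, with the threshold as an assumption to be tuned against your own traffic:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) between two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def drift_alarm(current_hist, reference_hist, threshold=0.1):
    """Compare today's prediction-score histogram against a trusted
    reference window and alarm when the divergence exceeds the threshold."""
    p = [c / sum(current_hist) for c in current_hist]
    q = [r / sum(reference_hist) for r in reference_hist]
    return kl_divergence(p, q) > threshold
```

Drift in the prediction distribution typically precedes drift in online metrics, which is why it serves as the early warning rather than the verdict.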
Planning and Improving Recommendations
Two playbooks drive end-to-end workflows that compose the rules above:
- references/playbooks/planning.md — Plan a new recommender system from scratch: a nine-step workflow that starts with instrumentation and ends with the first A/B-tested ML lift over a popularity baseline.
- references/playbooks/improving.md — Diagnose and improve an existing recommender: a decision tree that identifies the current bottleneck (telemetry, freshness, coverage, feedback loop, algorithm) and routes to the specific rules that fix it.
Read the playbooks first when the task is "design a recommender" or "this recommender is underperforming". Read the individual rules when a specific question arises during implementation or review.
How to Use
- Read references/_sections.md for category structure and impact ordering.
- Read individual rule files under references/ when a specific rule matches the task at hand.
- Read references/playbooks/planning.md to design a new system.
- Read references/playbooks/improving.md to diagnose an existing system.
- Use assets/templates/_template.md to author new rules as the skill grows.
Reference Files
| File | Description |
|------|-------------|
| references/_sections.md | Category definitions, impact ordering, cascade rationale |
| references/playbooks/planning.md | Planning playbook for a new recommender |
| references/playbooks/improving.md | Diagnostic playbook for an existing recommender |
| assets/templates/_template.md | Template for authoring new rules |
| metadata.json | Version, discipline, authoritative reference URLs |