Agent Skills: Dedupe + Rank


Category: Uncategorized
ID: willoscar/research-units-pipeline-skills/dedupe-rank

Install this agent skill locally:

pnpm dlx add-skill https://github.com/WILLOSCAR/research-units-pipeline-skills/tree/HEAD/.codex/skills/dedupe-rank


.codex/skills/dedupe-rank/SKILL.md

Skill Metadata

Name: dedupe-rank

Dedupe + Rank

Turns a raw candidate pool of papers into a deduplicated pool and a stable core set.

Input

  • papers/papers_raw.jsonl

Outputs

  • papers/papers_dedup.jsonl
  • papers/core_set.csv
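The skill does not spell out the record schema. As a minimal sketch, assuming each line of papers_raw.jsonl is one JSON object (with fields such as title and year, assumed here) and that the core-set CSV columns are chosen by the caller, the file I/O might look like this. The helper names read_jsonl, write_jsonl and write_core_csv are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Iterable


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line (e.g. papers/papers_raw.jsonl)."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def write_jsonl(path: Path, records: Iterable[dict]) -> None:
    """Write one record per line; sort_keys keeps the bytes identical across reruns."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False, sort_keys=True) + "\n")


def write_core_csv(path: Path, rows: list[dict], columns: list[str]) -> None:
    """Write the core set with a fixed, caller-chosen column order (extra keys ignored)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Serializing with sorted keys and a fixed column list means identical records always produce identical bytes, which makes rerun stability straightforward to verify.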

Script boundary

scripts/run.py should own only:

  • title/year deduplication
  • deterministic ranking
  • stable paper_id generation
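Deduplication and stable ID generation could be sketched as follows, under stated assumptions: two records are duplicates when their normalized title and year match, the first occurrence wins, and paper_id is a truncated SHA-1 of the dedup key. The normalization rules, the "P" prefix, the 12-character length and all function names are hypothetical choices, not taken from scripts/run.py.

```python
import hashlib
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Strip accents, casefold, turn punctuation into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[\W_]+", " ", text.casefold())  # \W is Unicode-aware for str
    return " ".join(text.split())


def dedup_key(record: dict) -> tuple[str, str]:
    """Duplicates share the same normalized title and the same year."""
    year = record.get("year")
    return normalize_title(record.get("title") or ""), "" if year is None else str(year).strip()


def make_paper_id(key: tuple[str, str]) -> str:
    """Derive the ID from the dedup key alone, so reruns reproduce it exactly."""
    digest = hashlib.sha1("|".join(key).encode("utf-8")).hexdigest()
    return "P" + digest[:12]  # hypothetical ID format


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each key and attach its stable paper_id."""
    seen: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen[key] = {**rec, "paper_id": make_paper_id(key)}
    return list(seen.values())
```

Because the ID depends only on the dedup key, not on input position or a counter, the same paper receives the same paper_id on every rerun. In this sketch, records with an empty title all share one key, so a real implementation may want to treat them separately.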

Topic-specific or product-specific behavior belongs in shared domain packs or pipeline contract metadata, not in the script.

Contract-driven behavior

The script should prefer pipeline contract metadata over profile-name branching.

The key field at present is:

  • quality_contract.candidate_pool_policy.keep_full_deduped_pool

If this field is true, the script writes the full deduped pool to papers/core_set.csv instead of cutting it down to a smaller core, unless the user explicitly overrides the core size.
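Deterministic ranking and contract-driven core selection can be combined without any profile-name branching. In this sketch, ranking sorts by a hypothetical numeric score field (descending), then year (descending), then paper_id as a final tie-break so ties always resolve the same way. The default core size of 50 and the helper names are assumptions, and the contract is assumed to be loaded as a nested dict mirroring the field path above.

```python
from typing import Optional


def _num(value, cast, default):
    """Parse a possibly missing or malformed field; fall back to a fixed default."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return default


def rank(papers: list[dict]) -> list[dict]:
    """Deterministic order: score desc, year desc, then paper_id asc as tie-break."""
    def sort_key(p: dict) -> tuple:
        score = _num(p.get("score"), float, 0.0)  # hypothetical relevance field
        year = _num(p.get("year"), int, 0)
        return (-score, -year, str(p["paper_id"]))
    return sorted(papers, key=sort_key)


def keep_full_pool(contract: dict) -> bool:
    """Read quality_contract.candidate_pool_policy.keep_full_deduped_pool (default False)."""
    quality = contract.get("quality_contract") or {}
    policy = quality.get("candidate_pool_policy") or {}
    return bool(policy.get("keep_full_deduped_pool", False))


def select_core(
    papers: list[dict],
    contract: dict,
    core_size: Optional[int] = None,
    default_size: int = 50,  # hypothetical default
) -> list[dict]:
    """Pick the core set from the ranked pool according to override and contract."""
    ranked = rank(papers)
    if core_size is not None:  # an explicit user override always wins
        return ranked[:core_size]
    if keep_full_pool(contract):  # contract asks for the whole deduped pool
        return ranked
    return ranked[:default_size]
```

The order of the checks in select_core encodes the rule stated above: an explicit core_size takes precedence, and only otherwise does keep_full_deduped_pool decide between the whole ranked pool and the default cut.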

Acceptance

  • deduped JSONL exists
  • core-set CSV exists
  • reruns on the same input produce identical outputs
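These acceptance criteria can be checked mechanically: run the step twice on the same input in fresh workspaces, require both output files to exist, and compare their SHA-256 digests. This is a minimal sketch; run_step is a hypothetical stand-in for however scripts/run.py is actually invoked, since its command-line interface is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

OUTPUTS = ("papers/papers_dedup.jsonl", "papers/core_set.csv")


def digest_outputs(root: Path) -> dict[str, str]:
    """SHA-256 of each expected output; a missing file fails the existence check."""
    digests = {}
    for rel in OUTPUTS:
        path = root / rel
        if not path.is_file():
            raise FileNotFoundError(f"missing output: {rel}")
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def check_acceptance(run_step: Callable[[Path], None], raw: bytes) -> bool:
    """Run the step twice on identical input in fresh workspaces; compare byte-for-byte."""
    results = []
    for _ in range(2):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "papers").mkdir()
            (root / "papers" / "papers_raw.jsonl").write_bytes(raw)
            run_step(root)
            results.append(digest_outputs(root))
    return results[0] == results[1]
```

Comparing raw bytes is deliberately strict: any nondeterminism in ordering, IDs or serialization shows up as a digest mismatch.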

Non-goals

  • retrieval
  • screening
  • manual topic authoring inside the script