Agent-Skills.md

Agent Skills: Dedupe + Rank

Category: Uncategorized

ID: willoscar/research-units-pipeline-skills/dedupe-rank

Author

willoscar

https://github.com/willoscar

Repository

willoscar/research-units-pipeline-skills

Install this agent skill locally:

pnpm dlx add-skill https://github.com/WILLOSCAR/research-units-pipeline-skills/tree/HEAD/.codex/skills/dedupe-rank

Skill Files

.codex/skills/dedupe-rank/SKILL.md

Skill Metadata

Name: dedupe-rank
Description: |

Dedupe + Rank

Turns a raw candidate pool into a deduped pool and a stable core set.

Input

papers/papers_raw.jsonl

Outputs

papers/papers_dedup.jsonl
papers/core_set.csv
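
The input and output files above could be read and written along the following lines. This is a minimal sketch, not the skill's actual `scripts/run.py`: the helper names and the CSV columns (`paper_id`, `title`, `year`) are assumptions made for illustration, since the source does not specify the CSV schema.

```python
import csv
import json
from pathlib import Path


def read_jsonl(path):
    """Load one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def write_jsonl(path, records):
    """Write one JSON object per line; sorted keys keep the bytes stable across reruns."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False, sort_keys=True) + "\n")


def write_core_csv(path, records, columns=("paper_id", "title", "year")):
    """Write the core set as CSV. The column list is hypothetical, not taken from the skill."""
    with Path(path).open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=list(columns),
            extrasaction="ignore",  # drop fields that are not listed in `columns`
            lineterminator="\n",    # fixed line ending, independent of platform
        )
        writer.writeheader()
        writer.writerows(records)
```

Sorting JSON keys and fixing the line terminator are small choices that make byte-for-byte comparison of reruns possible.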

Script boundary

scripts/run.py should own only:

- title/year deduplication
- deterministic ranking
- stable paper_id generation
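
The three responsibilities listed above can be sketched as follows. This is an illustration under stated assumptions rather than the script's real logic: the normalization rule, the `P` + SHA-1 prefix ID format, and the ranking key (newest year first, then title) are all hypothetical, and `year` is assumed to be an integer or missing.

```python
import hashlib
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def make_paper_id(title: str, year) -> str:
    """Derive the ID from content only, so it is identical on every rerun.

    The 'P' + 10 hex characters format is an assumption for this sketch.
    """
    key = f"{normalize_title(title)}|{year}"
    return "P" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]


def dedupe(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = {}
    for rec in records:
        title, year = rec.get("title", ""), rec.get("year")
        key = (normalize_title(title), year)
        if key not in seen:
            seen[key] = {**rec, "paper_id": make_paper_id(title, year)}
    return list(seen.values())


def rank(records):
    """Sort newest first (hypothetical criterion); ties fall back to title, then paper_id."""
    return sorted(
        records,
        key=lambda r: (
            -(r.get("year") or 0),
            normalize_title(r.get("title", "")),
            r["paper_id"],
        ),
    )
```

Because the sort key ends in `paper_id`, no two deduped records compare equal, so the ranked order of IDs does not depend on the order of the raw input.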

Topic-specific or product-specific behavior belongs in shared domain packs or pipeline contract metadata, not in the script.

Contract-driven behavior

The script should prefer pipeline contract metadata over profile-name branching.

Current important field:

quality_contract.candidate_pool_policy.keep_full_deduped_pool

If true, the script keeps the full deduped pool in papers/core_set.csv unless the user explicitly overrides core size.
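
The precedence described above (explicit user override first, then the contract flag, then a default) might look like the sketch below. The function name, its parameters, and `default_core_size` are hypothetical; only the `quality_contract.candidate_pool_policy.keep_full_deduped_pool` path comes from the skill description, and how the contract is loaded is not specified there.

```python
def resolve_core_size(contract: dict, deduped_count: int,
                      default_core_size: int, user_core_size=None) -> int:
    """Decide how many deduped papers go into papers/core_set.csv."""
    # An explicit user override always wins.
    if user_core_size is not None:
        return min(user_core_size, deduped_count)

    # Read the flag from contract metadata instead of branching on a profile name.
    policy = contract.get("quality_contract", {}).get("candidate_pool_policy", {})
    if policy.get("keep_full_deduped_pool") is True:
        return deduped_count

    # Otherwise fall back to the (hypothetical) default, capped at the pool size.
    return min(default_core_size, deduped_count)
```

For example, with the flag set to true and a deduped pool of 420 papers, the function returns 420 unless the user passes a core size of their own.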

Acceptance

- deduped JSONL exists
- core-set CSV exists
- reruns are stable for the same inputs
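
One way to check these criteria is to run the step twice and compare file digests. The sketch below assumes a hypothetical `run_once(workspace)` callable that wraps whatever invokes the script; it is not part of the skill itself.

```python
import hashlib
from pathlib import Path

OUTPUTS = ("papers/papers_dedup.jsonl", "papers/core_set.csv")


def digest_outputs(workspace: Path) -> dict:
    """Return a SHA-256 digest per output file.

    Raises FileNotFoundError if an output is missing, which covers the
    two "exists" criteria.
    """
    return {
        rel: hashlib.sha256((workspace / rel).read_bytes()).hexdigest()
        for rel in OUTPUTS
    }


def check_rerun_stability(workspace: Path, run_once) -> bool:
    """Run the step twice on the same inputs and compare output bytes."""
    run_once(workspace)
    first = digest_outputs(workspace)
    run_once(workspace)
    return first == digest_outputs(workspace)
```

Comparing raw bytes is stricter than comparing parsed records: it also catches unstable key order, row order, or line endings.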

Non-goals

- retrieval
- screening
- manual topic authoring inside the script