Enterprise Artifact Search Skill (Robust)
This skill delegates multi-hop artifact retrieval + structured entity extraction to a lightweight subagent, keeping the main agent’s context lean.
It is designed for datasets where a workspace contains many interlinked artifacts (documents, chat logs, meeting transcripts, PRs, URLs) plus reference metadata (employee/customer directories).
This version adds two critical upgrades:
- Product grounding & anti-distractor filtering (prevents mixing in CoFoAIX or other products when asked about CoachForce).
- Key reviewer extraction rules (prevents the “meeting participants == reviewers” mistake; prefers explicit reviewers, then evidence-based contributors).
When to Invoke This Skill
Invoke when ANY of the following is true:
- The question requires multi-hop evidence gathering (artifact → references → other artifacts).
- The answer must be retrieved from artifacts (IDs/names/dates/roles), not inferred.
- Evidence is scattered across multiple artifact types (docs + slack + meetings + PRs + URLs).
- You need precise pointers (doc_id/message_id/meeting_id/pr_id) to justify outputs.
- You must keep context lean and avoid loading large files into context.
Why Use This Skill?
Without this skill: you manually grep many files, risk missing cross-links, and often accept the first “looks right” report (common failure: wrong product).
With this skill, a subagent:
- locates candidate artifacts fast
- follows references across channels/meetings/docs/PRs
- extracts structured entities (employee IDs, doc IDs)
- verifies product scope to reject distractors
- returns a compact evidence map with artifact pointers
Typical context savings: 70–95%.
Invocation
Use this format:
```
Task(subagent_type="enterprise-artifact-search", prompt="""
Dataset root: /root/DATA
Question: <paste the question verbatim>
Output requirements:
- Return JSON-ready extracted entities (employee IDs, doc IDs, etc.).
- Provide evidence pointers: artifact_id(s) + short supporting snippets.
Constraints:
- Avoid oracle/label fields (ground_truth, gold answers).
- Prefer primary artifacts (docs/chat/meetings/PRs/URLs) over metadata-only shortcuts.
- MUST enforce product grounding: only accept artifacts proven to be about the target product.
""")
```
Core Procedure (Must Follow)
Step 0 — Parse intent + target product
- Extract:
- target product name (e.g., “CoachForce”)
- entity types needed (e.g., author employee IDs, key reviewer employee IDs)
- artifact types likely relevant (“Market Research Report”, docs, review threads)
If the product name is missing from the question, infer it cautiously from nearby context ONLY if explicitly supported by artifacts; otherwise mark AMBIGUOUS.
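A minimal sketch of Step 0, assuming the question phrasing matches this skill’s examples; the regex and entity keywords are illustrative, not a fixed contract:

```python
import re

def parse_intent(question: str) -> dict:
    """Extract the target product and requested entity types from the question.
    Assumes phrasing like '... for the <Product> product'; otherwise AMBIGUOUS."""
    m = re.search(r"for the (\w[\w-]*) product", question, re.IGNORECASE)
    product = m.group(1) if m else None
    entity_types = []
    if re.search(r"\bauthors?\b", question, re.IGNORECASE):
        entity_types.append("author_employee_ids")
    if re.search(r"\breviewers?\b", question, re.IGNORECASE):
        entity_types.append("key_reviewer_employee_ids")
    return {"target_product": product or "AMBIGUOUS",
            "entity_types": entity_types}

# Example:
# parse_intent("Find employee IDs of the authors and key reviewers of the "
#              "Market Research Report for the CoachForce product?")
# -> {'target_product': 'CoachForce', 'entity_types': ['author_employee_ids', ...]}
```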
Step 1 — Build candidate set (wide recall, then filter)
Search in this order:
- Product artifact file(s): `/root/DATA/products/<Product>.json`, if it exists.
- Global sweep (if needed): other product files and docs that mention the product name.
- Within found channels/meetings: follow doc links (e.g., `/archives/docs/<doc_id>`), referenced meeting chats, and PR mentions.
Collect all candidates matching:
- `type`/`document_type`/`title` contains “Market Research Report” (case-insensitive)
- OR doc links/slack text contain “Market Research Report”
- OR meeting transcripts are tagged with `document_type` “Market Research Report”
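A minimal Step 1 sketch. The `products/<Product>.json` layout comes from the invocation above; everything else (one artifact list per file, string-level matching over the serialized artifact) is an assumption for illustration:

```python
import json
import re
from pathlib import Path

REPORT_RE = re.compile(r"market research report", re.IGNORECASE)

def collect_candidates(data_root: str, product: str) -> list[dict]:
    """Wide-recall sweep: gather every artifact whose serialized form mentions
    'Market Research Report'. Filtering happens later, in Step 2."""
    product_file = Path(data_root) / "products" / f"{product}.json"
    paths = [product_file] if product_file.exists() else []
    # Global sweep fallback: any other product file that mentions the product name.
    paths += [p for p in (Path(data_root) / "products").glob("*.json")
              if p != product_file and product.lower() in p.read_text().lower()]
    candidates = []
    for path in paths:
        artifacts = json.loads(path.read_text())
        for artifact in artifacts if isinstance(artifacts, list) else [artifacts]:
            if REPORT_RE.search(json.dumps(artifact)):
                candidates.append({"source_file": str(path), "artifact": artifact})
    return candidates
```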
Step 2 — HARD Product Grounding (Anti-distractor gate)
A candidate report is VALID only if it passes at least 2 independent grounding signals:
Grounding signals (choose any 2+):
A) Located under the correct product artifact container (e.g., inside products/CoachForce.json and associated with that product’s planning channels/meetings).
B) Document content/title explicitly mentions the target product name (“CoachForce”) or a canonical alias list you derive from artifacts.
C) Shared in a channel whose name is clearly for the target product (e.g., planning-CoachForce, #coachforce-*) OR a product-specific meeting series (e.g., CoachForce_planning_*).
D) The document id/link path contains a product-specific identifier consistent with the target product (not another product).
E) A meeting transcript discussing the report includes the target product context in the meeting title/series/channel reference.
Reject rule (very important):
- If the report content repeatedly names a different product (e.g., “CoFoAIX”) and lacks CoachForce grounding → mark as DISTRACTOR and discard, even if it is found in the same file or near similar wording.
Why: benchmarks intentionally insert the same doc type across multiple products; “first hit wins” is a common failure.
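A hedged sketch of the 2-signal gate. The artifact field names (`title`, `content`, `channel`, `doc_id`, `meeting_title`) are assumptions about the schema; adapt them to the dataset at hand:

```python
def grounding_signals(candidate: dict, product: str, aliases: set[str]) -> set[str]:
    """Evaluate the five grounding signals (A-E) for one candidate report."""
    art = candidate["artifact"]
    names = {product.lower(), *(a.lower() for a in aliases)}
    hits = set()
    if product.lower() in candidate["source_file"].lower():
        hits.add("A")  # lives in the product's own artifact container
    text = (art.get("title", "") + " " + art.get("content", "")).lower()
    if any(n in text for n in names):
        hits.add("B")  # content/title names the target product
    if any(n in art.get("channel", "").lower() for n in names):
        hits.add("C")  # shared in a product-specific channel
    if any(n in art.get("doc_id", "").lower() for n in names):
        hits.add("D")  # product-specific identifier in the doc id/link
    if any(n in art.get("meeting_title", "").lower() for n in names):
        hits.add("E")  # discussed in a product-grounded meeting
    return hits

def is_grounded(candidate: dict, product: str, aliases: frozenset = frozenset()) -> bool:
    # VALID only with 2+ independent signals; otherwise treat as a distractor.
    return len(grounding_signals(candidate, product, set(aliases))) >= 2
```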
Step 3 — Select the correct report version
If multiple VALID reports exist, choose the “final/latest” by this precedence:
- Explicit “latest” marker (id/title/link contains `latest`, or a most-recent date field)
- Explicit “final” marker
- Otherwise, pick the most recent by `date` field
- If dates are missing, choose the one most frequently referenced in follow-up discussions (slack replies/meeting chats)
Keep the selected report’s doc_id and link as the anchor.
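A sketch of the Step 3 precedence, assuming each report dict may carry `doc_id`/`title`/`link`/`date` fields and a precomputed `reference_count`; all of these names are illustrative:

```python
def pick_final_report(valid_reports: list[dict]) -> dict:
    """Apply the Step 3 precedence: 'latest' marker, then 'final' marker,
    then most recent date, then most-referenced in follow-up discussion."""
    def has_marker(report: dict, word: str) -> bool:
        blob = " ".join(str(report.get(k, "")) for k in ("doc_id", "title", "link"))
        return word in blob.lower()

    for word in ("latest", "final"):  # 'latest' outranks 'final'
        marked = [r for r in valid_reports if has_marker(r, word)]
        if marked:
            valid_reports = marked
            break
    dated = [r for r in valid_reports if r.get("date")]
    if dated:
        return max(dated, key=lambda r: r["date"])  # ISO dates sort lexically
    # Fallback: the report referenced most often in replies/meeting chats.
    return max(valid_reports, key=lambda r: r.get("reference_count", 0))
```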
Step 4 — Extract author(s)
Extract authors in this priority order:
- Document fields: `author`, `authors`, `created_by`, `owner`
- PR fields, if the report is introduced via a PR: `author`, `created_by`
- Slack: the user who posted the “Here is the report…” message (only if it clearly links to the report doc_id and is product-grounded)
Normalize into employee IDs:
- If already an `eid_*`, keep it.
- If only a name appears, resolve via employee directory metadata (name → employee_id), but only after you have product-grounded evidence.
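A sketch of Step 4, under the assumption that `directory` is a name → employee_id mapping built from the employee metadata:

```python
def extract_author_ids(report: dict, directory: dict[str, str]) -> list[str]:
    """Pull author(s) in priority order and normalize to employee IDs."""
    raw = []
    for field in ("author", "authors", "created_by", "owner"):
        value = report.get(field)
        if value:
            raw.extend(value if isinstance(value, list) else [value])
            break  # highest-priority populated field wins
    ids = []
    for entry in raw:
        if isinstance(entry, str) and entry.startswith("eid_"):
            ids.append(entry)                 # already an employee ID
        elif entry in directory:
            ids.append(directory[entry])      # resolve name -> eid via directory
    return ids
```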
Step 5 — Extract key reviewers (DO NOT equate “participants” with reviewers)
Key reviewers must be evidence-based contributors, not simply attendees.
Use this priority order:
Tier 1 (best): explicit reviewer fields
- Document fields: `reviewers`, `key_reviewers`, `approvers`, `requested_reviewers`
- PR fields: `reviewers`, `approvers`, `requested_reviewers`
Tier 2: explicit feedback authors
- Document `feedback` sections that attribute feedback to specific people/IDs
- Meeting transcripts where turns are attributable to people AND those people provide concrete suggestions/edits
Tier 3: slack thread replies to the report-share message
- Only include users who reply with substantive feedback/suggestions/questions tied to the report.
- Exclude:
- the author (unless the question explicitly wants them included as a reviewer too)
- pure acknowledgements (“looks good”, “thanks”) unless no other reviewers exist
Critical rule:
- The meeting `participants` list alone is NOT sufficient.
- Only count someone as a key reviewer if the transcript shows they contributed feedback, OR they appear in explicit reviewer fields.
If the benchmark expects “key reviewers” to be “the people who reviewed in the review meeting”, then your evidence must cite the transcript lines/turns that contain their suggestions.
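A sketch of the tiered reviewer logic. The acknowledgement filter is a crude heuristic stand-in for “substantive feedback”, and the turn/reply shapes (`{'speaker': eid, 'text': str}`) are assumed:

```python
def extract_key_reviewers(report: dict, pr: dict | None,
                          transcript_turns: list[dict],
                          thread_replies: list[dict],
                          author_ids: list[str]) -> list[str]:
    """Tier 1: explicit fields; Tier 2: attributed transcript feedback;
    Tier 3: substantive slack replies to the report-share message."""
    # Tier 1: explicit reviewer fields on the doc or PR.
    for source in (report, pr or {}):
        for field in ("reviewers", "key_reviewers", "approvers", "requested_reviewers"):
            if source.get(field):
                return list(source[field])

    ACK_ONLY = ("looks good", "lgtm", "thanks")
    def substantive(text: str) -> bool:
        # Heuristic: drop pure acknowledgements; real feedback needs review by hand.
        return not any(text.lower().strip().startswith(a) for a in ACK_ONLY)

    # Tier 2: transcript turns with concrete suggestions, attributable to a speaker.
    tier2 = [t["speaker"] for t in transcript_turns
             if t["speaker"] not in author_ids and substantive(t["text"])]
    if tier2:
        return list(dict.fromkeys(tier2))  # de-dup, preserving order

    # Tier 3: substantive replies in the report-share slack thread.
    tier3 = [r["speaker"] for r in thread_replies
             if r["speaker"] not in author_ids and substantive(r["text"])]
    return list(dict.fromkeys(tier3))
```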
Step 6 — Validate IDs & de-duplicate
- All outputs must be valid employee IDs (pattern `eid_...`) and must exist in the employee directory, if one is provided.
- Remove duplicates while preserving order:
- authors first
- key reviewers next
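A sketch of Step 6 validation and ordered de-duplication (the `eid_\w+` pattern is an assumption about the directory’s ID format):

```python
import re

def finalize_ids(author_ids: list[str], reviewer_ids: list[str],
                 directory_ids: set[str] | None = None) -> list[str]:
    """Validate the eid_ pattern, optionally check directory membership, and
    de-duplicate while preserving order (authors first, key reviewers next)."""
    seen, union = set(), []
    for eid in [*author_ids, *reviewer_ids]:
        if not re.fullmatch(r"eid_\w+", eid):
            continue  # drop anything that is not a well-formed employee ID
        if directory_ids is not None and eid not in directory_ids:
            continue  # drop IDs absent from the employee directory
        if eid not in seen:
            seen.add(eid)
            union.append(eid)
    return union
```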
Output Format (Strict, JSON-ready)
Return:
1) Final Answer Object
```json
{
  "target_product": "<ProductName>",
  "report_doc_id": "<doc_id>",
  "author_employee_ids": ["eid_..."],
  "key_reviewer_employee_ids": ["eid_..."],
  "all_employee_ids_union": ["eid_..."]
}
```
2) Evidence Map (pointers + minimal snippets)
For each extracted ID, include:
- artifact type + artifact id (doc_id / meeting_id / slack_message_id / pr_id)
- a short snippet that directly supports the mapping
Example evidence record:
```json
{
  "employee_id": "eid_xxx",
  "role": "key_reviewer",
  "evidence": [
    {
      "artifact_type": "meeting_transcript",
      "artifact_id": "CoachForce_planning_2",
      "snippet": "…Alex: We should add a section comparing CoachForce to competitor X…"
    }
  ]
}
```
Recommendation Types
Return one of:
- USE_EVIDENCE — evidence sufficient and product-grounded
- NEED_MORE_SEARCH — missing reviewer signals; must expand search (PRs, slack replies, other meetings)
- AMBIGUOUS — conflicting product signals or multiple equally valid reports
Common Failure Modes (This skill prevents them)
- Cross-product leakage: picking a “Market Research Report” for another product (e.g., CoFoAIX) because it appears first. → Fixed by Step 2 (2-signal product grounding).
- Over-inclusive reviewers: treating all meeting participants as reviewers. → Fixed by Step 5 (evidence-based reviewer definition).
- Wrong version: choosing a draft over the final/latest. → Fixed by Step 3.
- Schema mismatch: returning a flat list when the evaluator expects split fields. → Fixed by Output Format.
Mini Example (Your case)
Question:
“Find employee IDs of the authors and key reviewers of the Market Research Report for the CoachForce product?”
Correct behavior:
- Reject any report whose content/links are clearly about CoFoAIX unless it also passes 2+ CoachForce grounding signals.
- Select CoachForce’s final/latest report.
- Author from the doc field `author`.
- Key reviewers from explicit `reviewers`/`key_reviewers` if present; else from transcript turns or slack replies showing concrete feedback.
Do NOT Invoke When
- The answer is in a single small known file and location with no cross-references.
- The task is a trivial one-hop lookup and product scope is unambiguous.