# Research & Literature Gathering Skill

## Overview

Use this skill when implementing calculations requires identifying applicable standards, gathering reference literature, and mapping gaps. It queries three existing data sources and produces a structured YAML research brief.
## Inputs

- `category`: engineering discipline (e.g. `geotechnical`, `structural`, `subsea`)
- `subcategory`: specific topic (e.g. `pile_capacity`, `fatigue`, `viv_analysis`)
## 8-Step Workflow
### Step 1 — Query the Standards Ledger

Find standards already tracked for the domain:

```bash
uv run --no-project python scripts/data/document-index/query-ledger.py \
  --domain <category> --verbose
```

Record each standard's status (`gap`, `done`, `wrk_captured`, `reference`).
### Step 2 — Query the Document Index

Search the 1M-record doc index for relevant documents:

```bash
uv run --no-project python -c "
import json
from collections import Counter

matches = []
with open('data/document-index/index.jsonl') as f:
    for line in f:
        rec = json.loads(line)
        path_lower = rec.get('path', '').lower()
        summary_lower = (rec.get('summary') or '').lower()
        if '<category>' in path_lower or '<subcategory>' in path_lower \
                or '<category>' in summary_lower or '<subcategory>' in summary_lower:
            matches.append(rec)

print(f'Found {len(matches)} documents')
by_source = Counter(r['source'] for r in matches)
for s, c in by_source.most_common():
    print(f'  {s}: {c}')
"
```

Prioritize `og_standards` and `ace_standards` sources over project files.
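The keyword filter in the snippet above can be factored into a standalone predicate, which makes it easier to reuse across categories. A minimal sketch (the function names are illustrative, not part of the existing scripts):

```python
import json


def record_matches(rec: dict, keywords: list[str]) -> bool:
    """True if any keyword appears in the record's path or summary."""
    path_lower = rec.get("path", "").lower()
    summary_lower = (rec.get("summary") or "").lower()
    haystack = path_lower + " " + summary_lower
    return any(kw.lower() in haystack for kw in keywords)


def scan_index(index_path: str, keywords: list[str]) -> list[dict]:
    """Stream the JSONL index and collect matching records."""
    matches = []
    with open(index_path) as f:
        for line in f:
            if record_matches(json.loads(line), keywords):
                matches.append(json.loads(line))
    return matches
```

Streaming line by line keeps memory flat even for a million-record index.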
### Step 3 — Cross-Reference Capability Map

Identify what is implemented vs. gap in the target repo:

```bash
uv run --no-project python -c "
import yaml

with open('specs/capability-map/digitalmodel.yaml') as f:
    data = yaml.safe_load(f)

for m in data['modules']:
    if '<subcategory>' in m['module'] or '<category>' in m['module']:
        print(f\"Module: {m['module']} ({m['standards_count']} standards)\")
        for s in m.get('standards', [])[:30]:
            print(f\"  {s['status']:15s} {s['org']:8s} {s['id'][:70]}\")
"
```

Also check `assetutilities.yaml` and `worldenergydata.yaml` if the category may span repos.
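For categories that span repos, the same match logic can be applied over each parsed map in turn. A sketch of the pure filtering step, operating on the dict that `yaml.safe_load` returns (the helper name and sample data are illustrative):

```python
def matching_modules(capability_map: dict, category: str, subcategory: str) -> list[dict]:
    """Return modules whose name mentions the category or subcategory."""
    hits = []
    for m in capability_map.get("modules", []):
        name = m.get("module", "")
        if subcategory in name or category in name:
            hits.append(m)
    return hits


# Hypothetical shape, mirroring specs/capability-map/*.yaml after yaml.safe_load:
sample = {"modules": [
    {"module": "geotechnical/pile_capacity", "standards_count": 4},
    {"module": "hydrodynamics/wave_loads", "standards_count": 7},
]}
```

Keeping the filter pure (no file I/O) makes it trivial to run over all three capability maps in one loop.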
### Step 4 — Produce the Research Brief

Save as `specs/capability-map/research-briefs/<category>-<subcategory>.yaml` using the template below.
### Step 5 — Search University & Academic Resources

University coursework and textbooks are high-value sources — they contain worked examples with verified answers, ideal for TDD test assertions and calculation-report YAML examples.

- Search the doc index for university/academic materials:
  - Keywords: course name, textbook author, university, lecture, homework, example problem
  - Sources: `ace_project`, `dde_project` (may contain archived coursework)
- Search for relevant textbook chapters and problem sets:
  - Structural: Roark's Formulas, Shigley, Timoshenko
  - Geotechnical: Das, Coduto, API RP 2GEO worked examples
  - Hydrodynamics: DNV-RP-C205 examples, Faltinsen, Chakrabarti
  - Pipeline: Bai & Bai, Mousselli, Palmer & King
  - Financial: Hull (Options), Bodie/Kane/Marcus (Investments)
- Archive all coursework material as dark intelligence:
  - Save to `knowledge/dark-intelligence/<category>/<subcategory>/`
  - These are private resources not publicly shared
  - Used to inform implementations, generate test data, and validate calculations
  - Include: problem statements, known inputs/outputs, solution methodology

Add to the research brief under `university_resources` and `worked_examples`.
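Worked examples with known answers map directly onto test assertions. A minimal sketch of that mapping, assuming a hypothetical calculation function and made-up numbers (none of these values come from a real textbook problem):

```python
def check_worked_example(example: dict, calc_fn, rel_tol: float = 0.01) -> bool:
    """Assert that calc_fn(**inputs) reproduces the example's expected outputs."""
    result = calc_fn(**example["inputs"])
    for key, want in example["expected_output"].items():
        got = result[key]
        assert abs(got - want) <= rel_tol * abs(want), f"{key}: got {got}, want {want}"
    return True


# Hypothetical entry, in the shape of a worked_examples item from the brief.
example = {
    "source": "textbook",
    "inputs": {"force_kN": 100.0, "area_m2": 0.02},
    "expected_output": {"stress_kPa": 5000.0},
}


def stress_calc(force_kN: float, area_m2: float) -> dict:
    """Toy calculation standing in for a real module function."""
    return {"stress_kPa": force_kN / area_m2}
```

Each archived worked example then becomes one parametrized test case, with the brief's `use_as_test` flag selecting which entries to generate.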
### Step 6 — Document Download Tasks

For each standard not yet available locally:

- First: check the doc index for existing copies (`og_standards`, `ace_standards`)
- Second: check the O&G Standards SQLite at `/mnt/ace/O&G-Standards/_inventory.db`
- Third: search public sources (standards body websites, university repos, OpenCourseWare)
- Fourth: search university digital libraries (MIT OCW, Stanford, TU Delft open access)
- Fifth: flag as `paywalled — manual download required` if not freely available

Hand off actual downloads to the doc-research-download skill.
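The table layout of `_inventory.db` is not documented here, so a safe first move before writing queries against it is to list its tables. A sketch using only the standard library:

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()
```

Once the table and column names are known, targeted `SELECT` queries can confirm whether a standard already has a local copy.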
### Step 7 — Deep Online Research

Use WebSearch to find freely available PDFs, papers, and technical references for standards identified as `needs_download` or `paywalled`:

```bash
# Generate research brief from existing data sources first
uv run --no-project python scripts/data/research-literature/research-domain.py \
  --category <category> --repo <repo>
```

Then use WebSearch/WebFetch to find:

- Free PDFs from standards body websites (DNV Veracity, API publications)
- Open-access papers from OnePetro, ISOPE, OTC archives
- University lecture notes and textbook chapters
- Technical guidance documents from BOEM, BSEE, HSE UK

Update the research brief with discovered URLs and availability status.
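Updating the brief amounts to patching its `download_tasks` and `applicable_standards` entries before re-saving the YAML. A sketch of the pure update step, where the brief is the plain dict `yaml.safe_load` returns (the helper name and URL are illustrative):

```python
def record_availability(brief: dict, standard_id: str, url: str, status: str) -> dict:
    """Attach a discovered URL and availability status to a brief's entries."""
    for task in brief.get("download_tasks", []):
        if task.get("standard") == standard_id:
            task["url"] = url
    for std in brief.get("applicable_standards", []):
        if std.get("id") == standard_id:
            std["status"] = status
    return brief


# Hypothetical brief fragment in the template's shape:
brief = {
    "applicable_standards": [{"id": "DNV-RP-C205", "status": "needs_download"}],
    "download_tasks": [{"standard": "DNV-RP-C205", "url": None}],
}
```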
### Step 8 — Download Script Generation

Generate a curl/wget-based download script for the domain:

```bash
uv run --no-project python scripts/data/research-literature/research-domain.py \
  --category <category> --repo <repo> --generate-download-script
```

This creates `download-literature.sh` at the domain's `/mnt/ace/` literature path. The script sources `scripts/lib/download-helpers.sh` and supports `--dry-run`.

After generation, manually curate the script:

- Add discovered URLs from Step 7
- Set proper filenames: `<author>-<year>-<short-title>.pdf`
- Run `--dry-run` to verify
- Execute and validate with `file *.pdf` (reject HTML/WAF responses)
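The `file *.pdf` check catches the common failure mode where a server returns an HTML error or WAF challenge page saved under a `.pdf` name. The same check can be scripted by testing for the PDF magic bytes; a minimal sketch:

```python
def is_real_pdf(path: str) -> bool:
    """True if the file starts with the PDF magic bytes (%PDF-)."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

Any file failing this check should be deleted and its URL flagged back to `paywalled` or `needs_download` in the brief.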
## Domain-to-Repo Mapping

See `config/research-literature/domain-repo-map.yaml` for the full mapping.

| Domain | Repo | Tier |
|--------|------|------|
| geotechnical | digitalmodel | 1 |
| cathodic_protection | digitalmodel | 1 |
| structural | digitalmodel | 1 |
| hydrodynamics | digitalmodel | 1 |
| drilling | OGManufacturing | 1 |
| pipeline | digitalmodel | 1 |
| bsee | worldenergydata | 1 |
| metocean | worldenergydata | 1 |
| subsea | digitalmodel | 1 |
| naval_architecture | digitalmodel | 1 |
| mooring | digitalmodel | 2 |
| risers | digitalmodel | 2 |
| economics | worldenergydata | 3 |
## Research Brief Template

```yaml
# research-brief-<category>-<subcategory>.yaml
category: "<category>"
subcategory: "<subcategory>"
generated: "YYYY-MM-DD"

applicable_standards:
  - id: "<STANDARD-ID>"
    title: "<full title>"
    org: "<DNV/API/ISO/etc>"
    status: "available|needs_download|paywalled"
    doc_path: "<path in index or null>"
    key_sections: ["Sec X.Y — relevant topic"]

available_documents:
  - path: "<path from index>"
    source: "<og_standards|ace_standards|etc>"
    summary: "<from Phase B if available>"
    relevance: "high|medium|low"

download_tasks:
  - standard: "<STANDARD-ID>"
    url: "<where to find it>"
    notes: "paywalled — check ace_standards first"

key_equations:
  - name: "<equation name>"
    standard: "<STANDARD-ID>"
    section: "<Sec X.Y>"
    latex: "<LaTeX if known>"
    description: "<what it computes>"

university_resources:
  - source: "<textbook/course/lecture>"
    title: "<title>"
    author: "<author or institution>"
    relevance: "high|medium|low"
    archived_at: "knowledge/dark-intelligence/<category>/<subcategory>/<filename>"
    worked_examples_count: N
    notes: "<what makes this useful>"

worked_examples:
  - source: "<STANDARD-ID or textbook>"
    section: "<Sec X.Y or Ch N>"
    description: "<example problem description>"
    inputs: {}
    expected_output: {}
    use_as_test: true  # flag for TDD test generation

implementation_target:
  repo: "<digitalmodel|worldenergydata|etc>"
  module: "<discipline>/<module>"
  existing_code: "<path if any>"
  calc_report_template: "examples/reporting/<name>.yaml"
```
## AC Checklist

- [ ] Standards ledger queried for domain
- [ ] Doc index searched with category and subcategory keywords
- [ ] Capability map cross-referenced for implementation status
- [ ] University/academic resources searched (textbooks, coursework, OCW)
- [ ] Worked examples with known answers identified for TDD tests
- [ ] Coursework materials archived in `knowledge/dark-intelligence/<category>/`
- [ ] Research brief YAML saved to `specs/capability-map/research-briefs/`
- [ ] Download tasks identified with availability status
- [ ] Brief reviewed for completeness before handing off to implementation WRK
- [ ] Deep online research performed (WebSearch for free PDFs and papers)
- [ ] Download script generated via `--generate-download-script`
- [ ] Download script manually curated with discovered URLs
- [ ] Downloads validated with `file *.pdf` (no HTML/WAF responses)
## See also

- `data/dark-intel` — dark intelligence archive where research materials and worked examples are stored