# Literature Search Strategies

## Effective Techniques for Finding Scientific Evidence

A comprehensive literature search is essential for grounding hypotheses in existing evidence. This reference covers strategies for PubMed (biomedical literature) and for general scientific web search.

## Search Strategy Framework

### Three-Phase Approach

1. **Broad exploration:** Understand the landscape and identify key concepts
2. **Focused searching:** Target specific mechanisms, theories, or findings
3. **Citation mining:** Follow references and related articles from key papers

### Before You Search

**Clarify search goals:**
- What aspects of the phenomenon need evidence?
- What types of studies are most relevant (reviews, primary research, methods)?
- What time frame is relevant (recent only, or historical context)?
- What level of evidence is needed (mechanistic, correlational, causal)?

## PubMed Search Strategies

### When to Use PubMed

Use WebFetch with PubMed URLs for:
- Biomedical and life sciences research
- Clinical studies and medical literature
- Molecular, cellular, and physiological mechanisms
- Disease etiology and pathology
- Drug and therapeutic research

### Effective PubMed Search Techniques

#### 1. Start with Review Articles

**Why:** Reviews synthesize literature, identify key concepts, and provide comprehensive reference lists.

**Search strategy:**
- Add "review" to search terms
- Use PubMed filters: Article Type → Review, Systematic Review, Meta-Analysis
- Look for recent reviews (last 2-5 years)

**Example searches:**
- `https://pubmed.ncbi.nlm.nih.gov/?term=wound+healing+diabetes+review`
- `https://pubmed.ncbi.nlm.nih.gov/?term=gut+microbiome+cognition+systematic+review`

#### 2. Use MeSH Terms (Medical Subject Headings)

**Why:** MeSH terms are standardized vocabulary that captures concept variations.

**Strategy:**
- PubMed automatically maps many common search terms to MeSH headings
- Helps find papers that use different terminology for the same concept
- More comprehensive than keyword-only searches

**Example:**
- Instead of just "heart attack," use the MeSH heading `"Myocardial Infarction"[MeSH Terms]`
- Captures papers using "MI," "heart attack," "cardiac infarction," etc.
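
As a minimal sketch of the MeSH approach, the snippet below builds a PubMed URL that tags a heading with `[MeSH Terms]` and optionally adds free-text terms. The helper name `mesh_query_url` is an assumption made for illustration; only the URL pattern and the field tag come from PubMed itself.

```python
from urllib.parse import urlencode

PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def mesh_query_url(mesh_heading: str, *extra_terms: str) -> str:
    """Build a PubMed URL for a MeSH heading plus optional free-text terms.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Quote the heading and tag it so PubMed matches the indexed MeSH heading
    parts = [f'"{mesh_heading}"[MeSH Terms]']
    parts.extend(extra_terms)
    query = " AND ".join(parts)
    # urlencode percent-escapes quotes and brackets and turns spaces into "+"
    return PUBMED_BASE + "?" + urlencode({"term": query})


print(mesh_query_url("Myocardial Infarction", "inflammation"))
# https://pubmed.ncbi.nlm.nih.gov/?term=%22Myocardial+Infarction%22%5BMeSH+Terms%5D+AND+inflammation
```

Very recent records may not yet carry MeSH indexing, so it can help to run a plain keyword search alongside the tagged one.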

#### 3. Boolean Operators and Advanced Syntax

**AND:** Narrow search (all terms must be present)
- `diabetes AND wound healing AND inflammation`

**OR:** Broaden search (any term can be present)
- `(Alzheimer OR dementia) AND gut microbiome`

**NOT:** Exclude terms
- `cancer treatment NOT surgery`

**Quotes:** Exact phrases
- `"oxidative stress"`

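**Wildcards:** Match word variants by truncation
- `gene*` finds gene, genes, genetic, genetics, but also unrelated words such as general and generation, so check results for noise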
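
The operators above compose mechanically, so a small set of string helpers can keep parentheses and quotes consistent. This is a sketch only; the function names (`any_of`, `all_of`, `phrase`, `without`) are hypothetical and not part of any PubMed tooling.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes for exact-phrase matching."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Broaden: join alternatives with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Narrow: require every term with AND."""
    return " AND ".join(terms)


def without(query: str, excluded: str) -> str:
    """Exclude a term with NOT, grouping the original query first."""
    return f"({query}) NOT {excluded}"


query = all_of(any_of("Alzheimer", "dementia"), phrase("gut microbiome"))
print(query)
# (Alzheimer OR dementia) AND "gut microbiome"

print(without(all_of("cancer", "treatment"), "surgery"))
# (cancer AND treatment) NOT surgery
```

Grouping with parentheses before applying NOT avoids depending on the search engine's operator precedence.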

#### 4. Filter by Publication Type and Date

**Publication types:**
- Clinical Trial
- Meta-Analysis
- Systematic Review
- Research Support, N.I.H., Extramural (a funding tag rather than a study design)
- Randomized Controlled Trial

**Date filters:**
- Recent work (last 2-5 years): Cutting-edge findings
- Historical work: Foundational studies
- Specific time periods: Track development of understanding
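
Filters can also be written directly into the search term with PubMed field tags: `[pt]` for publication type and `[dp]` for publication date. The sketch below assumes that approach instead of the sidebar filters; the function name `filtered_pubmed_url` and its parameters are illustrative.

```python
from urllib.parse import urlencode


def filtered_pubmed_url(topic: str, publication_type: str,
                        start_year: int, end_year: int) -> str:
    """Build a PubMed URL restricted by publication type and year range.

    Hypothetical helper; it relies on the [pt] and [dp] field tags.
    """
    term = (
        f'({topic}) AND "{publication_type}"[pt] '
        f"AND {start_year}:{end_year}[dp]"
    )
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": term})


# Reviews on diabetic wound healing from an example five-year window
print(filtered_pubmed_url("wound healing diabetes", "review", 2020, 2025))
```

Swapping `"review"` for `"randomized controlled trial"` or `"meta-analysis"` targets the other publication types listed above.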

#### 5. Use "Similar Articles" and "Cited By"

**Strategy:**
- Find one highly relevant paper
- Click "Similar articles" for related work
- Use "Cited by" to find newer work that builds on it

### PubMed Search Examples by Hypothesis Goal

Replace each bracketed placeholder (for example `[phenomenon]`) with your own terms. PubMed uses square brackets for search field tags, so drop the brackets when substituting.

**Mechanistic understanding:**
```
https://pubmed.ncbi.nlm.nih.gov/?term=(mechanism+OR+pathway)+AND+[phenomenon]+AND+(molecular+OR+cellular)
```

**Causal relationships:**
```
https://pubmed.ncbi.nlm.nih.gov/?term=[exposure]+AND+[outcome]+AND+(randomized+controlled+trial+OR+cohort+study)
```

**Biomarkers and associations:**
```
https://pubmed.ncbi.nlm.nih.gov/?term=[biomarker]+AND+[disease]+AND+(association+OR+correlation+OR+prediction)
```

**Treatment effectiveness:**
```
https://pubmed.ncbi.nlm.nih.gov/?term=[intervention]+AND+[condition]+AND+(efficacy+OR+effectiveness+OR+clinical+trial)
```
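
One way to fill these templates safely is to keep them as format strings and URL-encode the result. The sketch below mirrors the four templates above; the dictionary keys and the `build_url` helper are assumptions made for illustration.

```python
from urllib.parse import quote_plus

# Templates copied from the examples above, with named slots for placeholders
TEMPLATES = {
    "mechanistic": "(mechanism OR pathway) AND {phenomenon} AND (molecular OR cellular)",
    "causal": "{exposure} AND {outcome} AND (randomized controlled trial OR cohort study)",
    "biomarker": "{biomarker} AND {disease} AND (association OR correlation OR prediction)",
    "treatment": "{intervention} AND {condition} AND (efficacy OR effectiveness OR clinical trial)",
}


def build_url(goal: str, **slots: str) -> str:
    """Fill a template and return an encoded PubMed URL (hypothetical helper)."""
    # Parenthesize each value so multi-word inputs stay grouped
    grouped = {name: f"({value})" for name, value in slots.items()}
    query = TEMPLATES[goal].format(**grouped)
    return "https://pubmed.ncbi.nlm.nih.gov/?term=" + quote_plus(query)


print(build_url("mechanistic", phenomenon="diabetic wound healing"))
# https://pubmed.ncbi.nlm.nih.gov/?term=%28mechanism+OR+pathway%29+AND+%28diabetic+wound+healing%29+AND+%28molecular+OR+cellular%29
```

A missing slot raises `KeyError`, which makes an unfilled placeholder hard to overlook.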

## General Scientific Web Search Strategies

### When to Use Web Search

Use WebSearch for:
- Non-biomedical sciences (physics, chemistry, materials, earth sciences)
- Interdisciplinary topics
- Recent preprints and unpublished work
- Grey literature (technical reports, conference proceedings)
- Broader context and cross-domain analogies

### Effective Web Search Techniques

#### 1. Use Domain-Specific Search Terms

**Include field-specific terminology:**
- Chemistry: "mechanism," "reaction pathway," "synthesis"
- Physics: "model," "theory," "experimental validation"
- Materials science: "properties," "characterization," "synthesis"
- Ecology: "population dynamics," "community structure"

#### 2. Target Academic Sources

**Search operators:**
- `site:arxiv.org` - Preprints (physics, CS, math, quantitative biology)
- `site:biorxiv.org` - Biology preprints
- `site:edu` - Academic institutions
- `filetype:pdf` - PDF documents, which are often academic papers

**Example searches:**
- `superconductivity high temperature mechanism site:arxiv.org`
- `CRISPR off-target effects site:biorxiv.org`
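
The source-targeting operators can be appended programmatically when the same keywords need to run against several sites. A minimal sketch, assuming a hypothetical helper named `scoped_query`:

```python
from typing import Optional


def scoped_query(keywords: str, site: Optional[str] = None,
                 filetype: Optional[str] = None) -> str:
    """Append site: and filetype: operators to a keyword query."""
    parts = [keywords]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Run the same keywords against two preprint servers
for server in ("arxiv.org", "biorxiv.org"):
    print(scoped_query("CRISPR off-target effects", site=server))
# CRISPR off-target effects site:arxiv.org
# CRISPR off-target effects site:biorxiv.org
```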

#### 3. Search for Authors and Labs

**When you find a relevant paper:**
- Search for the authors' other work
- Find their lab website for unpublished work
- Identify key research groups in the field

#### 4. Use Google Scholar Approaches

**Strategies:**
- Use "Cited by" to find newer related work
- Use "Related articles" to expand search
- Set date ranges to focus on recent work
- Use the `author:` operator to find work by specific researchers

#### 5. Combine General and Specific Terms

**Structure:**
- Specific phenomenon + general concept
- "tomato plant growth" + "bacterial promotion"
- "cognitive decline" + "gut microbiome"

**Boolean logic:**
- Use quotes for exact phrases: `"spike protein mutation"`
- Use OR for alternatives: `(transmissibility OR transmission rate)`
- Combine: `"spike protein" AND (transmissibility OR virulence) AND mutation`

## Cross-Database Search Strategies

### Comprehensive Literature Search Workflow

1. **Start with reviews (PubMed or Web Search):**
   - Identify key concepts and terminology
   - Note influential papers and researchers
   - Understand current state of field

2. **Focused primary research (PubMed):**
   - Search for specific mechanisms
   - Find experimental evidence
   - Identify methodologies

3. **Broaden with web search:**
   - Find related work in other fields
   - Locate recent preprints
   - Identify analogous systems

4. **Citation mining:**
   - Follow references from key papers
   - Use "cited by" to find recent work
   - Track influential studies

5. **Iterative refinement:**
   - Add new terms discovered in papers
   - Narrow if too many results
   - Broaden if too few relevant results
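
The refinement step can be sketched as a rule that reacts to the result count: OR in synonyms when results are scarce, AND in an extra term when there are too many. The thresholds (10 and 500) and the function name `refine_query` are illustrative assumptions, not recommended values.

```python
from typing import Dict, List, Optional


def refine_query(terms: List[str], hits: int,
                 synonyms: Optional[Dict[str, List[str]]] = None,
                 extra_term: Optional[str] = None,
                 too_few: int = 10, too_many: int = 500) -> str:
    """Return a broadened, narrowed, or unchanged AND-query.

    Hypothetical helper; thresholds are placeholders to tune per topic.
    """
    synonyms = synonyms or {}
    groups = list(terms)
    if hits < too_few:
        # Broaden: swap each term for an OR-group of its known synonyms
        groups = [
            "(" + " OR ".join([t] + synonyms[t]) + ")" if synonyms.get(t) else t
            for t in terms
        ]
    elif hits > too_many and extra_term:
        # Narrow: require one more concept
        groups.append(extra_term)
    return " AND ".join(groups)


print(refine_query(["cognitive decline", "gut microbiome"], hits=3,
                   synonyms={"cognitive decline": ["dementia"]}))
# (cognitive decline OR dementia) AND gut microbiome
```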

## Topic-Specific Search Strategies

### Mechanisms and Pathways

**Goal:** Understand how something works

**Search components:**
- Phenomenon + "mechanism"
- Phenomenon + "pathway"
- Phenomenon + specific molecules/pathways suspected

**Examples:**
- `diabetic wound healing mechanism inflammation`
- `autophagy pathway cancer`

### Associations and Correlations

**Goal:** Find what factors are related

**Search components:**
- Variable A + Variable B + "association"
- Variable A + Variable B + "correlation"
- Variable A + "predicts" + Variable B

**Examples:**
- `vitamin D cardiovascular disease association`
- `gut microbiome diversity predicts cognitive function`

### Interventions and Treatments

**Goal:** Find evidence for what works

**Search components:**
- Intervention + condition + "efficacy"
- Intervention + condition + "randomized controlled trial"
- Intervention + condition + "treatment outcome"

**Examples:**
- `probiotic intervention depression randomized controlled trial`
- `exercise intervention cognitive decline efficacy`

### Methods and Techniques

**Goal:** Find out how to test a hypothesis

**Search components:**
- Method name + application area
- "How to measure" + phenomenon
- Technique + validation

**Examples:**
- `CRISPR screen cancer drug resistance`
- `measure protein-protein interaction methods`

### Analogous Systems

**Goal:** Find insights from related phenomena

**Search components:**
- Mechanism + different system
- Similar phenomenon + different organism/condition

**Examples:**
- If studying plant-microbe symbiosis: search `nitrogen fixation rhizobia legumes`
- If studying drug resistance: search `antibiotic resistance evolution mechanisms`

## Evaluating Paper Impact and Quality

### Citation Count Significance

Citation counts are a rough proxy for influence in a field, not a direct measure of quality. Interpret them relative to paper age and field norms:

| Paper Age | Citations | Interpretation |
|-----------|-----------|----------------|
| 0-3 years | 20+ | Noteworthy - gaining traction |
| 0-3 years | 100+ | Highly Influential - significant impact already |
| 3-7 years | 100+ | Significant - established contribution |
| 3-7 years | 500+ | Landmark - major contribution to field |
| 7+ years | 500+ | Seminal - widely recognized important work |
| 7+ years | 1000+ | Foundational - field-defining paper |

**Field-specific considerations:**
- Biomedical/clinical: Higher citation norms (NEJM papers often 1000+)
- Computer Science: Conference citations matter more than journals
- Mathematics/Physics: Lower citation norms, longer citation half-lives
- Social Sciences: Moderate citation norms, high book citation rates
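
The age and citation table in this section can be encoded as a lookup. The sketch below assumes the age bands are half-open (under 3 years, 3 to under 7, 7 and over), which the table itself leaves ambiguous, and it does not adjust for field norms. The function name is hypothetical.

```python
def interpret_citations(age_years: float, citations: int) -> str:
    """Map paper age and citation count to the table's interpretation label."""
    # Thresholds are listed highest first so the strongest label wins
    if age_years < 3:
        bands = [(100, "Highly Influential"), (20, "Noteworthy")]
    elif age_years < 7:
        bands = [(500, "Landmark"), (100, "Significant")]
    else:
        bands = [(1000, "Foundational"), (500, "Seminal")]
    for threshold, label in bands:
        if citations >= threshold:
            return label
    return "Below table thresholds"


print(interpret_citations(2, 25))     # Noteworthy
print(interpret_citations(5, 120))    # Significant
print(interpret_citations(10, 1200))  # Foundational
```

A paper that falls below the thresholds is not necessarily weak; in low-citation fields the cutoffs would need to be scaled down.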

### Journal Impact Factor Guidance

Impact factor values below are approximate and change with each annual release; treat them as orders of magnitude.

**Tier 1 - Premier Venues (Always Prefer):**
- **General Science:** Nature (IF ~65), Science (IF ~55), Cell (IF ~65), PNAS (IF ~12)
- **Medicine:** NEJM (IF ~175), Lancet (IF ~170), JAMA (IF ~120), BMJ (IF ~93)
- **Field Flagships:** Nature Medicine, Nature Biotechnology, Nature Methods, Nature Genetics

**Tier 2 - High-Impact Specialized (Strong Preference):**
- Impact Factor >10
- Examples: JAMA Internal Medicine, Annals of Internal Medicine, Circulation, Blood
- Top ML/AI conferences: NeurIPS, ICML, ICLR (equivalent to IF 15-25)

**Tier 3 - Respected Specialized (Include When Relevant):**
- Impact Factor 5-10
- Established society journals
- Well-indexed specialty journals

**Tier 4 - Other Peer-Reviewed (Use Sparingly):**
- Impact Factor <5
- Only cite if directly relevant AND no better source exists
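
The tier rules reduce to a named list for Tier 1 plus impact factor cutoffs for the rest. In this sketch an impact factor of exactly 10 is placed in Tier 3, which is an assumption since the tiers above do not say where the boundary falls. The journal used in the example is a made-up placeholder.

```python
# Tier 1 venues named in the list above
TIER1 = {
    "Nature", "Science", "Cell", "PNAS",
    "NEJM", "Lancet", "JAMA", "BMJ",
    "Nature Medicine", "Nature Biotechnology",
    "Nature Methods", "Nature Genetics",
}


def journal_tier(name: str, impact_factor: float) -> int:
    """Return the tier (1-4) for a journal under the rules above."""
    if name in TIER1:
        return 1
    if impact_factor > 10:
        return 2
    if impact_factor >= 5:
        return 3
    return 4


print(journal_tier("Nature Methods", 0.0))    # 1 (named Tier 1 venue)
print(journal_tier("Example Journal", 12.0))  # 2
print(journal_tier("Example Journal", 4.9))   # 4
```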

### Author Track Record Evaluation

Prefer papers from established researchers:

**Strong Author Indicators:**
- **High h-index:** >40 in established fields, >20 for early-career stars
- **Multiple Tier-1 publications:** Track record in Nature/Science/Cell family
- **Institutional affiliation:** Leading research universities and institutes
- **Recognition:** Awards, fellowships, editorial positions
- **First/last authorship:** On multiple highly-cited papers

**How to Check Author Reputation:**
1. Google Scholar profile: Check h-index, i10-index, total citations
2. PubMed: Search author name, review publication venues
3. Institutional page: Check position, awards, grants
4. ORCID profile: Full publication history

### Conference Ranking Awareness (Computer Science/AI)

For ML/AI and computer science topics, conference rankings matter:

**A\* (Flagship) - Equivalent to Nature/Science:**
- NeurIPS (Neural Information Processing Systems)
- ICML (International Conference on Machine Learning)
- ICLR (International Conference on Learning Representations)
- CVPR (Computer Vision and Pattern Recognition)
- ACL (Association for Computational Linguistics)

**A (Excellent) - Equivalent to Tier-2 Journals:**
- AAAI, IJCAI (AI general)
- EMNLP, NAACL (NLP)
- ECCV, ICCV (Computer Vision)
- SIGKDD, WWW (Data Mining)

**B (Good) - Equivalent to Tier-3 Journals:**
- COLING, CoNLL (NLP)
- WACV, BMVC (Computer Vision)
- Most ACM/IEEE specialized conferences

## Evaluating Source Quality

### Primary Research Quality Indicators

**Strong quality signals:**
- Published in Tier-1 or Tier-2 venues
- High citation count for paper age
- Written by established researchers with strong track records
- Large sample sizes (for statistical power)
- Pre-registered studies (reduces bias)
- Appropriate controls and methods
- Consistent with other findings
- Transparent data and methods

**Red flags:**
- No peer review (use cautiously)
- Conflicts of interest not disclosed
- Methods not clearly described
- Extraordinary claims without extraordinary evidence
- Contradicts large body of evidence without explanation

### Review Quality Indicators

**Systematic reviews (highest quality):**
- Pre-defined search strategy
- Explicit inclusion/exclusion criteria
- Quality assessment of included studies
- Quantitative synthesis (meta-analysis) where the included studies are comparable

**Narrative reviews (variable quality):**
- Expert synthesis of field
- May have selection bias
- Useful for context and framing
- Check author expertise and citations

## Time Management in Literature Search

### Allocate Search Time Appropriately

**For straightforward hypotheses (30-60 min):**
- 1-2 broad review articles
- 3-5 targeted primary research papers
- Quick web search for recent developments

**For complex hypotheses (1-3 hours):**
- Multiple reviews for different aspects
- 10-15 primary research papers
- Systematic search across databases
- Citation mining from key papers

**For contentious topics (3+ hours):**
- Systematic review approach
- Identify competing perspectives
- Track historical development
- Cross-reference findings

### Diminishing Returns

**Signs you've searched enough:**
- Finding the same papers repeatedly
- New searches yield mostly irrelevant papers
- Sufficient evidence to support/contextualize hypotheses
- Multiple independent lines of evidence converge
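
"Finding the same papers repeatedly" can be made measurable by tracking what fraction of each new result batch has already been seen. This is a sketch under assumptions: the identifiers are made-up placeholders, and any stopping threshold (for example 0.8) would be a judgment call rather than a rule from this guide.

```python
from typing import Iterable, Set


def overlap_fraction(seen_ids: Set[str], new_result_ids: Iterable[str]) -> float:
    """Fraction of a new result batch that was already collected."""
    batch = list(new_result_ids)
    if not batch:
        return 0.0
    repeats = sum(1 for paper_id in batch if paper_id in seen_ids)
    return repeats / len(batch)


seen = {"p1", "p2", "p3", "p4"}   # placeholder identifiers
batch = ["p2", "p3", "p4", "p9"]  # results of the latest search
print(overlap_fraction(seen, batch))  # 0.75
```

A fraction that stays high across several differently worded searches suggests the search is approaching saturation.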

**When to search more:**
- Major gaps in understanding remain
- Conflicting evidence needs resolution
- Hypothesis seems inconsistent with literature
- Need specific methodological information

## Documenting Search Results

### Information to Capture

**For each relevant paper:**
- Full citation (authors, year, journal, title)
- Key findings relevant to hypothesis
- Study design and methods
- Limitations noted by authors
- How it relates to hypothesis
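
The fields above map naturally onto a small record type. The sketch below is one possible layout; the class name, field names, and citation format are assumptions, and the example values are placeholders rather than a real paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperRecord:
    """One relevant paper, with the fields listed above."""
    authors: List[str]
    year: int
    journal: str
    title: str
    key_findings: List[str] = field(default_factory=list)
    study_design: str = ""
    limitations: List[str] = field(default_factory=list)
    relation_to_hypothesis: str = ""

    def citation(self) -> str:
        """Format a simple full citation string."""
        names = ", ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {self.journal}."


record = PaperRecord(
    authors=["Author A", "Author B"],  # placeholder values
    year=2020,
    journal="Example Journal",
    title="Placeholder title",
    study_design="cohort study",
)
print(record.citation())
# Author A, Author B (2020). Placeholder title. Example Journal.
```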

### Organizing Findings

**Group by:**
- Supporting evidence for hypothesis A, B, C
- Methodological approaches
- Conflicting findings requiring explanation
- Gaps in current knowledge

**Synthesis notes:**
- What is well-established?
- What is controversial or uncertain?
- What analogies exist in other systems?
- What methods are commonly used?

### Citation Organization for Hypothesis Reports

**For report structure:** Organize citations for two audiences:

**Main Text (15-20 key citations):**
- Most influential papers (highly cited, seminal studies)
- Recent definitive evidence (last 2-3 years)
- Key papers directly supporting each hypothesis (3-5 per hypothesis)
- Major reviews synthesizing the field

**Appendix A: Comprehensive Literature Review (40-60+ citations):**
- **Historical context:** Foundational papers establishing field
- **Current understanding:** Recent reviews and meta-analyses
- **Hypothesis-specific evidence:** 8-15 papers per hypothesis covering:
  - Direct supporting evidence
  - Analogous mechanisms in related systems
  - Methodological precedents
  - Theoretical framework papers
- **Conflicting findings:** Papers representing different viewpoints
- **Knowledge gaps:** Papers identifying limitations or unanswered questions

**Target citation count:** Aim for 50+ total references to provide comprehensive support for all claims and demonstrate thorough literature grounding.

**Grouping strategy for Appendix A:**
1. Background and context papers
2. Current understanding and established mechanisms
3. Evidence supporting each hypothesis (separate subsections)
4. Contradictory or alternative findings
5. Methodological and technical papers

## Practical Search Workflow

### Step-by-Step Process

1. **Define search goals (5 min):**
   - What aspects of phenomenon need evidence?
   - What would support or refute hypotheses?

2. **Broad review search (15-20 min):**
   - Find 1-3 review articles
   - Skim abstracts for relevance
   - Note key concepts and terminology

3. **Targeted primary research (30-45 min):**
   - Search for specific mechanisms/evidence
   - Read abstracts, scan figures and conclusions
   - Follow most promising references

4. **Cross-domain search (15-30 min):**
   - Look for analogies in other systems
   - Find recent preprints
   - Identify emerging trends

5. **Citation mining (15-30 min):**
   - Follow references from key papers
   - Use "cited by" for recent work
   - Identify seminal studies

6. **Synthesize findings (20-30 min):**
   - Summarize evidence for each hypothesis
   - Note patterns and contradictions
   - Identify knowledge gaps
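
Summing the per-step estimates gives the total time budget for one pass through this workflow. The short calculation below uses only the ranges stated in the steps above.

```python
# (step, minimum minutes, maximum minutes) taken from the steps above
STEPS = [
    ("Define search goals", 5, 5),
    ("Broad review search", 15, 20),
    ("Targeted primary research", 30, 45),
    ("Cross-domain search", 15, 30),
    ("Citation mining", 15, 30),
    ("Synthesize findings", 20, 30),
]

total_min = sum(low for _, low, _ in STEPS)
total_max = sum(high for _, _, high in STEPS)
print(f"{total_min}-{total_max} min")  # 100-160 min
```

The full workflow therefore takes roughly 1 h 40 min to 2 h 40 min, which sits inside the 1-3 hour range given for complex hypotheses.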

### Iteration and Refinement

**When initial search is insufficient:**
- Broaden terms if too few results
- Add specific mechanisms/pathways if too many results
- Try alternative terminology
- Search for related phenomena
- Consult review articles for better search terms

**Red flags requiring more search:**
- Only finding weak or indirect evidence
- All evidence comes from single lab or source
- Evidence seems inconsistent with basic principles
- Major aspects of phenomenon lack any relevant literature

## Common Search Pitfalls

### Pitfalls to Avoid

1. **Confirmation bias:** Only seeking evidence supporting preferred hypothesis
   - **Solution:** Actively search for contradicting evidence

2. **Recency bias:** Only considering recent work, missing foundational studies
   - **Solution:** Include historical searches, track development of ideas

3. **Too narrow:** Missing relevant work due to restrictive terms
   - **Solution:** Use OR operators, try alternative terminology

4. **Too broad:** Overwhelmed by irrelevant results
   - **Solution:** Add specific terms, use filters, combine concepts with AND

5. **Single database:** Missing important work in other fields
   - **Solution:** Search both PubMed and the general web, and try domain-specific databases

6. **Stopping too soon:** Insufficient evidence to ground hypotheses
   - **Solution:** Set minimum targets (e.g., 2 reviews + 5 primary papers per hypothesis aspect)

7. **Cherry-picking:** Citing only supportive papers
   - **Solution:** Represent full spectrum of evidence, acknowledge contradictions
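
The minimum-target idea from pitfall 6 can be checked mechanically. The sketch below counts collected papers per hypothesis aspect against the example target of 2 reviews and 5 primary papers; the function name, the tuple layout, and the aspect labels are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def unmet_targets(papers: List[Tuple[str, str]],
                  min_reviews: int = 2,
                  min_primary: int = 5) -> Dict[str, Dict[str, int]]:
    """Return how many papers each aspect still needs.

    papers: (hypothesis_aspect, kind) pairs, kind is "review" or "primary".
    """
    counts = Counter(papers)
    gaps = {}
    for aspect in sorted({aspect for aspect, _ in papers}):
        need_reviews = max(0, min_reviews - counts[(aspect, "review")])
        need_primary = max(0, min_primary - counts[(aspect, "primary")])
        if need_reviews or need_primary:
            gaps[aspect] = {"review": need_reviews, "primary": need_primary}
    return gaps


collected = (
    [("mechanism", "review")] * 2 + [("mechanism", "primary")] * 5
    + [("biomarker", "review")] + [("biomarker", "primary")] * 3
)
print(unmet_targets(collected))
# {'biomarker': {'review': 1, 'primary': 2}}
```

Counting says nothing about balance, so it complements rather than replaces the checks for confirmation bias and cherry-picking.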

## Special Cases

### Emerging Topics (Limited Literature)

**When little published work exists:**
- Search for analogous phenomena in related systems
- Look for preprints (arXiv, bioRxiv)
- Find conference abstracts and posters
- Identify theoretical frameworks that may apply
- Note the limited evidence in hypothesis generation

### Controversial Topics (Conflicting Literature)

**When evidence is contradictory:**
- Systematically document both sides
- Look for methodological differences explaining conflict
- Check for temporal trends (has understanding shifted?)
- Identify what would resolve the controversy
- Generate hypotheses explaining the discrepancy

### Interdisciplinary Topics

**When spanning multiple fields:**
- Search each field's primary databases
- Use field-specific terminology for each domain
- Look for bridging papers that cite across fields
- Consider consulting domain experts
- Translate concepts between disciplines carefully

## Integration with Hypothesis Generation

### Using Literature to Inform Hypotheses

**Direct applications:**
- Established mechanisms to apply to new contexts
- Known pathways relevant to phenomenon
- Similar phenomena in related systems
- Validated methods for testing

**Indirect applications:**
- Analogies from different systems
- Theoretical frameworks to apply
- Gaps suggesting novel mechanisms
- Contradictions requiring resolution

### Balancing Literature Dependence

**Too literature-dependent:**
- Hypotheses merely restate known mechanisms
- No novel insights or predictions
- "Hypotheses" are actually established facts

**Too literature-independent:**
- Hypotheses ignore relevant evidence
- Propose implausible mechanisms
- Reinvent already-tested ideas
- Inconsistent with established principles

**Optimal balance:**
- Grounded in existing evidence
- Extend understanding in novel ways
- Acknowledge both supporting and challenging evidence
- Generate testable predictions beyond current knowledge
