Agent Skills: Topic Modeling and Text Mining

Apply LDA, NMF, and other computational methods to discover patterns in large text corpora with appropriate parameter tuning

Category: Uncategorized
ID: a5c-ai/babysitter/topic-modeling-text-mining

Install this agent skill locally:

pnpm dlx add-skill https://github.com/a5c-ai/babysitter/tree/HEAD/plugins/babysitter/skills/babysit/process/specializations/domains/social-sciences-humanities/humanities/skills/topic-modeling-text-mining

plugins/babysitter/skills/babysit/process/specializations/domains/social-sciences-humanities/humanities/skills/topic-modeling-text-mining/SKILL.md

Skill Metadata

Name
topic-modeling-text-mining

Topic Modeling and Text Mining

Overview

This skill enables computational analysis of large text collections. It encompasses topic modeling, text mining techniques, and pattern discovery to reveal structures and themes in textual data for humanistic inquiry.

Capabilities

Topic Modeling

  • Latent Dirichlet Allocation (LDA)
  • Non-negative Matrix Factorization (NMF)
  • Structural topic models
  • Dynamic topic models
  • Parameter optimization
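
As an illustration of what an LDA fit involves, the sketch below implements collapsed Gibbs sampling using only the Python standard library. It is a teaching sketch, not the skill's own implementation: the function name `lda_gibbs`, the default values for `alpha`, `beta`, and `iterations`, and the toy corpus are all assumptions made for this example. Documents are assumed to be already encoded as lists of integer word ids.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of integer word ids.

    Returns (doc_topic, topic_word) count matrices.
    alpha and beta are symmetric Dirichlet priors (illustrative defaults).
    """
    rng = random.Random(seed)
    vocab_size = max(w for doc in docs for w in doc) + 1
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus (hypothetical): word ids 0-2 and 3-5 never share a document.
corpus = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
doc_topic, topic_word = lda_gibbs(corpus, num_topics=2)
```

Each row of `doc_topic` counts how many tokens of a document were assigned to each topic, and each row of `topic_word` counts how often each word id was assigned to a topic. Normalizing the rows gives the usual topic and word distributions. For real corpora an established library is likely a better choice than this pure-Python loop.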

Text Preprocessing

  • Tokenization
  • Stopword removal
  • Lemmatization/stemming
  • N-gram extraction
  • Document-term matrices
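
The preprocessing steps listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the `STOPWORDS` set, the ASCII-only regex tokenizer, and the helper names are invented for the example, and lemmatization is left out because it needs an external library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def tokenize(text):
    """Lowercase, keep runs of ASCII letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def ngrams(tokens, n=2):
    """Return contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(docs):
    """Build a dense document-term count matrix and its sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

docs = ["The whale and the sea.", "A ship in the sea of storms."]
vocab, dtm = document_term_matrix(docs)
bigrams = ngrams(tokenize(docs[1]))
```

Here `vocab` holds the column labels of `dtm`, and each row of `dtm` counts term occurrences in one document. A corpus with accented or non-Latin text would need a Unicode-aware tokenizer in place of the `[a-z]+` pattern.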

Pattern Discovery

  • Word frequency analysis
  • Collocation detection
  • Named entity recognition
  • Sentiment analysis
  • Network extraction
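
Collocation detection is often done by scoring adjacent word pairs with an association measure. The sketch below uses pointwise mutual information (PMI), which is one common choice and not necessarily the one this skill applies. The function name `bigram_pmi`, the `min_count` threshold, and the sample sentence are assumptions for illustration.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = "new york is big and new york is loud and paris is big".split()
ranked = bigram_pmi(tokens)
```

In this toy input the pair `("new", "york")` ranks first because the two words occur only together, while `"is"` also appears in other contexts. The frequency threshold matters in practice: without it, PMI favors pairs that appear once.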

Visualization

  • Word clouds
  • Topic distributions
  • Temporal trends
  • Network graphs
  • Interactive displays
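
Temporal trend displays usually start from a simple aggregation: the average share of each topic among the documents of each time period. The sketch below prepares that table with the standard library and leaves the plotting itself to whatever charting tool is in use. The function name, the per-document `years` list, and the sample proportions are hypothetical.

```python
from collections import defaultdict

def topic_share_by_year(years, doc_topic):
    """Average each topic's proportion over the documents of each year."""
    grouped = defaultdict(list)
    for year, row in zip(years, doc_topic):
        grouped[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in sorted(grouped.items())
    }

# Hypothetical per-document topic proportions and publication years.
years = [1850, 1850, 1851]
doc_topic = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
trend = topic_share_by_year(years, doc_topic)
```

The result maps each year to one averaged proportion per topic, which can be drawn as one line per topic over time.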

Usage Guidelines

Analysis Process

  1. Prepare text corpus
  2. Preprocess documents
  3. Select modeling approach
  4. Tune parameters
  5. Run analysis
  6. Interpret results
  7. Validate findings

Parameter Considerations

  • Number of topics
  • Iteration counts
  • Hyperparameters
  • Coherence metrics
  • Validation approaches
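
A common way to compare candidate topic counts is to compute a coherence score for each fitted model and prefer settings whose topics score higher. The sketch below computes the UMass coherence of a single topic from document co-occurrence counts. UMass is only one of several coherence metrics, and the source does not say which one this skill uses, so treat the choice, the function name, and the toy documents as assumptions.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """UMass coherence for one topic.

    top_words must be ordered from most to least probable, and every word
    is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    # combinations() keeps list order, so `earlier` is the higher-ranked word.
    for earlier, later in combinations(top_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score

# Hypothetical tokenized documents.
documents = [
    ["whale", "sea", "ship"],
    ["whale", "sea"],
    ["tax", "court"],
    ["tax", "court", "law"],
]
coherent = umass_coherence(["whale", "sea"], documents)
mixed = umass_coherence(["whale", "tax"], documents)
```

Words that tend to appear in the same documents yield a higher (less negative) score, so `coherent` exceeds `mixed` here. Averaging the score over all topics of a model gives one number per candidate topic count.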

Interpretation Guidelines

  • Examine topic words
  • Review representative documents
  • Consider domain knowledge
  • Validate with close reading
  • Acknowledge limitations
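
Reviewing representative documents means ranking documents by how much of each one a topic accounts for, then reading the top few closely. A minimal sketch follows, assuming a document-topic matrix of counts or proportions is already available. The function name and sample values are hypothetical.

```python
def representative_docs(doc_topic, topic, top_n=3):
    """Return indices of the documents with the largest share of `topic`."""
    shares = []
    for i, row in enumerate(doc_topic):
        total = sum(row)
        shares.append((row[topic] / total if total else 0.0, i))
    # Highest share first; ties fall back to document order.
    shares.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in shares[:top_n]]

# Hypothetical topic counts for four documents and two topics.
doc_topic = [[8, 2], [1, 9], [5, 5], [10, 0]]
top_for_topic_0 = representative_docs(doc_topic, topic=0, top_n=2)
top_for_topic_1 = representative_docs(doc_topic, topic=1, top_n=2)
```

The returned indices point back into the original corpus, so the selected texts can be read in full alongside the topic's top words.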

Integration Points

Related Processes

  • Text Mining and Distant Reading
  • Corpus Linguistics Analysis
  • Network Analysis for Humanities

Collaborating Skills

  • tei-text-encoding
  • gis-mapping-humanities
  • literary-close-reading
