nlp | Agent Skills

sentencepiece

Language-independent tokenizer that treats text as a raw Unicode sequence. Supports the BPE and Unigram algorithms. Fast (about 50k sentences/sec) and lightweight (about 6 MB memory footprint), with a deterministic vocabulary. Used by T5, ALBERT, XLNet, and mBART. Trains directly on raw text, with no pre-tokenization step. Use it when you need multilingual support, CJK languages, or reproducible tokenization.

tokenizer, multilingual, bpe, unigram

ovachiever
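
To make the sentencepiece description concrete, here is a toy, standard-library-only sketch of the two ideas it names: raw text handled as a flat sequence of Unicode characters (with whitespace kept as the visible "▁" symbol) and BPE-style merging learned without pre-tokenization. This is not the SentencePiece library or its API; the function names `to_symbols`, `learn_bpe`, `apply_merge`, `encode`, and `decode` are hypothetical, and the sketch omits normalization, the Unigram algorithm, and all performance work.

```python
from collections import Counter

SPACE = "\u2581"  # "▁", the meta symbol SentencePiece uses to mark whitespace


def to_symbols(text):
    """Turn raw text into single-character symbols; spaces become a visible symbol."""
    return list(SPACE + text.replace(" ", SPACE))


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the joined piece."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    sequences = [to_symbols(line) for line in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in sequences:
            # Assumption of this sketch: never merge across a word boundary,
            # so the whitespace symbol can only appear at the start of a piece.
            pairs.update(
                (a, b) for a, b in zip(seq, seq[1:]) if not b.startswith(SPACE)
            )
        if not pairs:
            break
        # Break ties on the pair itself so training is reproducible.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def encode(text, merges):
    """Apply the learned merges, in order, to new raw text."""
    seq = to_symbols(text)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode(pieces):
    """Invert encode: join the pieces, drop the leading marker, restore spaces."""
    return "".join(pieces)[1:].replace(SPACE, " ")


corpus = ["low lower lowest", "日本語 の テキスト"]
merges = learn_bpe(corpus, num_merges=10)
pieces = encode("lowest low", merges)
```

Because the whitespace symbol is part of the pieces, `decode(pieces)` restores the input text exactly for any input that does not itself contain "▁", and the same corpus always yields the same merge list.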

text-summarizer

Generate extractive summaries from long text documents. Control summary length, extract key sentences, and process multiple documents.

text-summarization, extractive-summarization, multi-document-processing, summary-length-control

dkyazzentwatwa
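
The text-summarizer entry does not say how it selects key sentences. As one common approach to extractive summarization, the sketch below scores each sentence by the average corpus frequency of its words and keeps the top few in their original order. The function name `summarize` and the `max_sentences` parameter are hypothetical, and the scoring is deliberately naive (no stop-word filtering, regex-based sentence splitting).

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences of `text`, in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank by score (descending); earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def summarize_many(documents, max_sentences=2):
    """Summarize several documents independently."""
    return [summarize(doc, max_sentences) for doc in documents]


text = "Cats sleep a lot. Cats chase mice and cats purr. The weather was mild."
one = summarize(text, max_sentences=1)
two = summarize(text, max_sentences=2)
```

Summary length is controlled only by the sentence count here; a word or character budget would need a different selection loop.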

named-entity-extractor

Extract named entities (people, organizations, locations, dates) from text using NLP. Use for document analysis, information extraction, or data enrichment.

named-entity-recognition, information-extraction, document-analysis, data-enrichment

dkyazzentwatwa

language-detector

Detect the language of a text and report confidence scores. Supports 50+ languages and batch text classification.

language-detection, text-classification, confidence-scores, multi-language

dkyazzentwatwa

nlp-basics

Process and analyze text using modern NLP techniques: preprocessing, embeddings, and transformers.

preprocessing, embeddings, transformers, natural-language-processing

pluginagentmarketplace
