huggingface-tokenizers
Fast tokenizers optimized for research and production. The Rust-based implementation tokenizes 1 GB of text in under 20 seconds. Supports the BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, and handle padding and truncation. Integrates with transformers. Use when you need high-performance tokenization or custom tokenizer training.
huggingface, tokenization, nlp, rust
ovachiever
81
data-anonymizer
Detect and mask PII (names, emails, phone numbers, SSNs, addresses) in text and CSV files. Offers multiple masking strategies, with optional reversible tokenization.
data-protection, PII-masking, tokenization, csv
dkyazzentwatwa
3
llm-basics
Covers LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models.
llm, transformers, tokenization, inference-optimization
pluginagentmarketplace
1