ray-data
Scalable data processing for ML workloads. Streaming execution across CPU and GPU; supports Parquet, CSV, JSON, and images. Integrates with Ray Train, PyTorch, and TensorFlow. Scales from a single machine to hundreds of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
flowio
Parse FCS (Flow Cytometry Standard) files, versions 2.0 to 3.1. Extract events as NumPy arrays, read metadata and channels, and convert to CSV or DataFrame for flow cytometry data preprocessing.
process-mining-assistant
Perform an end-to-end process mining analysis via a command-line workflow that progressively ingests, profiles, cleans, mines and reports on event logs using PM4Py. The workflow generates stage-based artefacts (including versioned notebooks) and pauses at decision checkpoints so the user can validate findings and choose how to proceed.
data-analyzer
A skill for data analysis and visualization.
field-extraction-parsing
Extract structured fields from unstructured log data using OPAL parsing functions. Covers extract_regex() for pattern matching with type casting, split() for delimited data, parse_json() for JSON logs, and JSONPath for navigating parsed structures. Use when you need to convert raw log text into queryable fields for analysis, filtering, or aggregation.
data-analysis
Analyze data files (CSV, JSON) and generate insights, summaries, and statistical analysis
ml-fundamentals
Master machine learning foundations: algorithms, preprocessing, feature engineering, and evaluation
python-analytics
Python data analysis with pandas, numpy, and analytics libraries
data-analytics-foundations
Core data analytics concepts, Excel/Google Sheets fundamentals, and data collection techniques
data-cleaning
Data cleaning, preprocessing, and quality assurance techniques
token-efficient
Use when processing 50+ items, analyzing CSV/log files, executing code in sandbox, or searching for tools. Load for data processing tasks. Achieves 98%+ token savings via in-sandbox execution, progressive disclosure, and pagination. Supports heredocs for multi-line bash.
data-processing
Process JSON with jq and YAML/TOML with yq. Filter, transform, query structured data efficiently. Triggers on: parse JSON, extract from YAML, query config, Docker Compose, K8s manifests, GitHub Actions workflows, package.json, filter data.
funsloth-check
Validate datasets for Unsloth fine-tuning. Use when the user wants to check a dataset, analyze tokens, calculate Chinchilla optimality, or prepare data for training.
symmetry-discovery-questionnaire
Use when ML engineers need to identify symmetries in their data but don't know where to start. Invoke when user mentions data symmetry, invariance discovery, what transformations matter, or needs help recognizing patterns their model should respect. Works collaboratively through domain analysis, transformation testing, and physical constraint identification.
pandas-pro
Use when working with pandas DataFrames, data cleaning, aggregation, merging, or time series analysis. Invoke for data manipulation, missing value handling, groupby operations, or performance optimization.
fine-tuning-expert
Use when fine-tuning LLMs, training custom models, or optimizing model performance for specific tasks. Invoke for parameter-efficient methods, dataset preparation, or model adaptation.
json-transformer
Transform, manipulate, and analyze JSON data structures with advanced operations.
csv-processor
Parse, transform, and analyze CSV files with advanced data manipulation capabilities.