Agent Skills: model-evaluation

Evaluates machine learning models for performance, fairness, and reliability using appropriate metrics and validation techniques. Covers training debugging, hyperparameter tuning, and production monitoring.

Trigger keywords: model evaluation, metrics, accuracy, precision, recall, F1, F1-score, ROC, AUC, ROC-AUC, confusion matrix, cross-validation, k-fold, stratified, overfitting, underfitting, bias, variance, bias-variance tradeoff, hyperparameter, hyperparameter tuning, loss, loss function, metric, benchmark, benchmarking, model performance, classification metrics, regression metrics, RMSE, MSE, MAE, MAPE, R2, R-squared, train-test split, validation set, test set, hold-out, learning curve, validation curve, model selection, error analysis, residual analysis, ML testing, training issues, convergence, gradient, vanishing gradient, exploding gradient, training instability, LLM evaluation, language model evaluation, prompt engineering evaluation, A/B testing, champion-challenger, model monitoring, model drift, data drift, concept drift, model decay.
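As a rough illustration of the classification metrics named above (accuracy, precision, recall, F1, confusion matrix), the following is a minimal pure-Python sketch. The function name `binary_classification_metrics` and the sample labels are hypothetical and are not taken from the skill itself; the skill's own files may use a library such as scikit-learn instead.

```python
from collections import Counter


def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and derived metrics for binary labels.

    Hypothetical helper for illustration only; not part of the skill.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn

    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels, purely for demonstration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

Returning the raw counts alongside the ratios makes it easier to spot cases where a metric is 0.0 only because its denominator was empty, which matters on small or heavily imbalanced evaluation sets.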

Category: Uncategorized
ID: cosmix/claude-loom/model-evaluation

Install this agent skill locally:

pnpm dlx add-skill https://github.com/cosmix/claude-loom/model-evaluation

