Agent Skills: Experiment Tracking

Use when "experiment tracking", "MLflow", "Weights & Biases", "wandb", "model registry", "hyperparameter logging", "ML experiments", "training metrics"

Uncategorized
ID: eyadsibai/ltk/experiment-tracking

Install this agent skill locally:

pnpm dlx add-skill https://github.com/eyadsibai/ltk/tree/HEAD/plugins/ltk-data/skills/experiment-tracking

plugins/ltk-data/skills/experiment-tracking/SKILL.md

Experiment Tracking

Track ML experiments, metrics, and models.

Comparison

| Platform | Best For | Self-hosted | Visualization |
|----------|----------|-------------|---------------|
| MLflow | Open-source, model registry | Yes | Basic |
| W&B | Collaboration, sweeps | Limited | Excellent |
| Neptune | Team collaboration | No | Good |
| ClearML | Full MLOps | Yes | Good |


MLflow

Open-source platform from Databricks.

Core components:

  • Tracking: Log parameters, metrics, artifacts
  • Projects: Reproducible runs (MLproject file)
  • Models: Package and deploy models
  • Registry: Model versioning and staging
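
The Tracking component can be sketched as follows. This is a minimal example, assuming the `mlflow` package is installed and the default local tracking store is acceptable. `flatten_params`, `log_training_run`, and the shape of the config are hypothetical names chosen for illustration, not part of MLflow.

```python
from typing import Any, Dict, List


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys (MLflow params are flat key/value pairs)."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_training_run(config: Dict[str, Any], epoch_metrics: List[Dict[str, float]]) -> str:
    """Log one run's parameters and per-epoch metrics; return the MLflow run ID."""
    import mlflow  # lazy import so flatten_params works without MLflow installed

    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        for epoch, metrics in enumerate(epoch_metrics):
            # `step` lets the UI plot each metric as a curve over epochs
            mlflow.log_metrics(metrics, step=epoch)
        return run.info.run_id
```

Files such as checkpoints or plots can be attached to the same run with `mlflow.log_artifact(path)` inside the `with` block.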

Strengths: Self-hosted, open-source, model registry, framework integrations

Limitations: Basic visualization, fewer collaboration features

Key concept: Autologging. A single call, mlflow.autolog(), enables automatic capture of parameters, metrics, and models for supported frameworks.
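
A minimal sketch of autologging, assuming `mlflow` and a supported training framework are installed. `run_with_autolog` and `train_fn` are hypothetical names; `train_fn` stands for whatever function fits your model. What gets captured depends on the framework in use.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_autolog(train_fn: Callable[[], T]) -> T:
    """Run `train_fn` inside an MLflow run with autologging enabled."""
    import mlflow  # lazy import: requires `pip install mlflow`

    mlflow.autolog()  # the "one line" that turns on automatic capture
    with mlflow.start_run():
        return train_fn()
```

Usage would look like `run_with_autolog(lambda: model.fit(X, y))`, where `model`, `X`, and `y` come from your own training code.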


Weights & Biases (W&B)

Cloud-first experiment tracking with excellent visualization.

Core features:

  • Experiment tracking: Metrics, hyperparameters, system stats
  • Sweeps: Hyperparameter search (grid, random, Bayesian)
  • Artifacts: Dataset and model versioning
  • Reports: Shareable documentation
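
A sweep is driven by a configuration dictionary. The sketch below assumes the `wandb` package is installed and you are logged in; the project name, the metric name `val_loss`, and the parameter values are placeholders. Discrete `values` lists are used here so the same config is usable with grid search as well as the other methods.

```python
from typing import Callable


def make_sweep_config(method: str = "random") -> dict:
    """Build a W&B sweep config for the given search method."""
    if method not in {"grid", "random", "bayes"}:
        raise ValueError(f"unsupported sweep method: {method!r}")
    return {
        "method": method,
        # The metric name must match a key your training code passes to wandb.log()
        "metric": {"name": "val_loss", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"values": [1e-4, 1e-3, 1e-2]},
            "batch_size": {"values": [16, 32, 64]},
        },
    }


def launch_sweep(train_fn: Callable[[], None], project: str = "demo-project", count: int = 5) -> None:
    """Register the sweep and run `count` trials of `train_fn` in this process."""
    import wandb  # lazy import: requires `pip install wandb`

    sweep_id = wandb.sweep(make_sweep_config(), project=project)
    wandb.agent(sweep_id, function=train_fn, project=project, count=count)
```

Inside `train_fn`, the chosen hyperparameters are read from `wandb.config` after calling `wandb.init()`.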

Strengths: Beautiful visualizations, team collaboration, hyperparameter sweeps

Limitations: Cloud-dependent, limited self-hosting

Key concept: wandb.init() starts a run and wandb.log() records metrics to it. Most tracking code needs little more than these two calls.
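
A minimal sketch of that pattern, assuming `wandb` is installed and authenticated. The project name is a placeholder, and `placeholder_loss` is a hypothetical stand-in for a real training step.

```python
def placeholder_loss(epoch: int) -> float:
    """Stand-in for a real training step; returns a decreasing fake loss."""
    return 1.0 / (epoch + 1)


def train(config: dict, epochs: int = 3) -> None:
    """Start a W&B run, log one loss value per epoch, then close the run."""
    import wandb  # lazy import: requires `pip install wandb`

    run = wandb.init(project="demo-project", config=config)  # hyperparameters go in `config`
    for epoch in range(epochs):
        wandb.log({"epoch": epoch, "loss": placeholder_loss(epoch)})
    run.finish()
```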


What to Track

| Category | Examples |
|----------|----------|
| Hyperparameters | Learning rate, batch size, architecture |
| Metrics | Loss, accuracy, F1, per-epoch values |
| Artifacts | Model checkpoints, configs, datasets |
| System | GPU usage, memory, runtime |
| Code | Git commit, diff, requirements |
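
The categories above can be gathered into one record per run. This is a platform-independent sketch using only the standard library; `RunRecord` and `current_git_commit` are hypothetical names. It records the Python version and OS as system info but does not measure GPU usage or memory, which the tracking platforms typically collect themselves.

```python
import json
import platform
import subprocess
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


def current_git_commit() -> Optional[str]:
    """Return the current commit hash, or None if git or a repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


@dataclass
class RunRecord:
    hyperparameters: Dict[str, Any]
    metrics: List[Dict[str, float]] = field(default_factory=list)  # one dict per epoch
    artifacts: List[str] = field(default_factory=list)             # paths to checkpoints, configs
    system: Dict[str, str] = field(
        default_factory=lambda: {
            "platform": platform.platform(),
            "python": platform.python_version(),
        }
    )
    code: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"git_commit": current_git_commit()}
    )
    started_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        """Serialize the whole record so it can be stored next to the artifacts."""
        return json.dumps(asdict(self), indent=2)
```

A record like this can be passed to whichever platform is in use, for example by logging `hyperparameters` as params and each entry of `metrics` as a step.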


Model Registry Concepts

| Stage | Purpose |
|-------|---------|
| None | Just logged, not registered |
| Staging | Testing, validation |
| Production | Serving live traffic |
| Archived | Deprecated, kept for reference |
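
The stage lifecycle can be modeled as a small state machine. This sketch is conceptual and does not call any registry API. The transition policy in `ALLOWED` (for example, requiring Staging before Production) is an assumption made for illustration, not a rule enforced by MLflow. Recent MLflow releases also deprecate fixed stages in favor of model version aliases, so check the documentation for the version you run.

```python
from enum import Enum


class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


# Hypothetical promotion policy: a model must pass through Staging before Production.
ALLOWED = {
    Stage.NONE: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED, Stage.NONE},
    Stage.PRODUCTION: {Stage.STAGING, Stage.ARCHIVED},
    Stage.ARCHIVED: {Stage.STAGING},
}


def transition(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise ValueError if the policy forbids the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Encoding the policy as data keeps it easy to review and adjust when a team's promotion rules differ.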


Decision Guide

| Scenario | Recommendation |
|----------|----------------|
| Self-hosted requirement | MLflow |
| Team collaboration | W&B |
| Model registry focus | MLflow |
| Hyperparameter sweeps | W&B |
| Beautiful dashboards | W&B |
| Full MLOps pipeline | MLflow + deployment tools |

Resources