Experiment Tracking Skill
Learn: Master ML experiment tracking for reproducibility and collaboration.
Skill Overview
| Attribute | Value |
|-----------|-------|
| Bonded Agent | 02-experiment-tracking |
| Difficulty | Intermediate |
| Duration | 30 hours |
| Prerequisites | mlops-basics |
Learning Objectives
- Set up experiment tracking infrastructure
- Log parameters, metrics, and artifacts systematically
- Compare experiments and identify best models
- Use model registry for version management
- Collaborate with team using shared tracking
Topics Covered
Module 1: Platform Setup (6 hours)
Platform Comparison:
| Feature | MLflow | W&B | Neptune |
|---------|--------|-----|---------|
| Self-hosted | ✅ | ❌ | ❌ |
| Free tier | ✅ | ✅ | ✅ |
| Real-time | ❌ | ✅ | ✅ |
| Git integration | ⚠️ | ✅ | ✅ |
Setup Exercises:
- [ ] Install MLflow and start local server
- [ ] Create W&B account and initialize project
- [ ] Compare UI/UX of both platforms
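A minimal setup sketch for the first two exercises, assuming MLflow and the wandb client are installed and a W&B account already exists; the project and experiment names are placeholders.

# Start a local MLflow tracking server first, e.g. from a shell:
#   mlflow server --host 127.0.0.1 --port 5000
import mlflow
import wandb

# Point the MLflow client at the local server started above.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("tracking-101")        # placeholder experiment name

# Initialize (and immediately close) a W&B run to verify the project is reachable.
run = wandb.init(project="tracking-101", name="setup-smoke-test")
run.finish()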
Module 2: Experiment Logging (10 hours)
What to Log:
# Complete logging example
# (assumes `model`, `train_loss`, and `val_loss` come from your own training loop)
import mlflow
import mlflow.pytorch

with mlflow.start_run():
    # 1. Parameters (hyperparameters, configs)
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "model_type": "transformer"
    })

    # 2. Metrics (per-step and final)
    for epoch in range(10):
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss
        }, step=epoch)

    # 3. Artifacts (models, plots, configs)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.pytorch.log_model(model, "model")

    # 4. Tags (for filtering)
    mlflow.set_tags({
        "experiment_type": "baseline",
        "dataset_version": "v2.1"
    })
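Once runs are logged this way, they can be compared programmatically. The sketch below uses mlflow.search_runs, which returns a pandas DataFrame; the experiment name is illustrative, and the metric and tag names assume the logging example above.

import mlflow

# Rank runs by validation loss; metric columns follow the "metrics.<name>" scheme.
runs = mlflow.search_runs(
    experiment_names=["my-experiment"],                 # illustrative experiment name
    filter_string="tags.experiment_type = 'baseline'",  # tag logged in the example above
    order_by=["metrics.val_loss ASC"],
)
best_run = runs.iloc[0]
print(best_run["run_id"], best_run["metrics.val_loss"])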
Module 3: Model Registry (8 hours)
Registry Workflow:
┌─────────────────────────────────────────────────────────────┐
│ MODEL REGISTRY FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ Train → Log Model → Register → Staging → Production → Archive
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Version 1 Validate Deploy │
│ Version 2 A/B Test Monitor │
│ Version N Approve Rollback │
│ │
└─────────────────────────────────────────────────────────────┘
Exercises:
- [ ] Register a trained model
- [ ] Promote model through stages
- [ ] Implement rollback procedure
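A sketch of the registry flow above using MLflow's client API; the model name is illustrative, <run_id> must be replaced with a real run ID, and newer MLflow releases favor model aliases over stages, so check the API for your version.

import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact logged in a finished run under a registry name.
result = mlflow.register_model(
    model_uri="runs:/<run_id>/model",   # replace <run_id> with an actual run ID
    name="churn-classifier",            # illustrative registered-model name
)

# Promote the new version to Staging; a rollback is the same call pointed
# at an earlier, known-good version number.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Staging",
)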
Module 4: Best Practices (6 hours)
Naming Conventions:
experiments/
├── {project_name}/
│ ├── {experiment_type}_{date}/
│ │ ├── run_{config_hash}/
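One way to generate the run_{config_hash} part deterministically, sketched with only the standard library; the function name and exact layout are illustrative.

import hashlib
import json
from datetime import datetime

def build_run_name(project: str, experiment_type: str, config: dict) -> str:
    """Build a run path following {project}/{experiment_type}_{date}/run_{config_hash}."""
    # Hash a canonical JSON encoding so identical configs map to the same run name.
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:8]
    date = datetime.now().strftime("%Y%m%d")
    return f"{project}/{experiment_type}_{date}/run_{config_hash}"

# build_run_name("churn", "baseline", {"lr": 1e-3, "batch_size": 32})
# -> e.g. "churn/baseline_<date>/run_<hash>"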
Reproducibility Checklist:
- [ ] Log git commit hash
- [ ] Capture environment (pip freeze)
- [ ] Set and log random seeds
- [ ] Log data version/hash
- [ ] Save config files as artifacts
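A minimal sketch covering the checklist items, meant to run inside an active MLflow run; the seed value and dataset tag are placeholders, the PyTorch seed line applies only if PyTorch is in use, and the git commit hash is handled by the tracker template in the next section.

import random
import subprocess

import mlflow
import numpy as np

SEED = 42  # placeholder seed

# Set and log random seeds (add torch.manual_seed(SEED) when using PyTorch).
random.seed(SEED)
np.random.seed(SEED)
mlflow.log_param("random_seed", SEED)

# Capture the environment and attach it as an artifact.
with open("requirements_frozen.txt", "w") as f:
    f.write(subprocess.check_output(["pip", "freeze"]).decode())
mlflow.log_artifact("requirements_frozen.txt")

# Record the data version; substitute your own scheme (e.g. a DVC tag or file hash).
mlflow.set_tag("dataset_version", "v2.1")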
Code Templates
Template: Production Experiment Tracker
# templates/experiment_tracker.py
import subprocess
from datetime import datetime
from typing import Optional

import mlflow


class ProductionExperimentTracker:
    """Production-ready experiment tracking wrapper."""

    def __init__(self, experiment_name: str, tracking_uri: str):
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment_name)
        self.run = None

    def start_run(self, run_name: Optional[str] = None):
        """Start a new tracked run."""
        self.run = mlflow.start_run(run_name=run_name)
        # Auto-log environment info
        self._log_environment()
        return self

    def _log_environment(self):
        """Capture reproducibility information."""
        # Git info (skipped if the code is not running inside a git repository)
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "HEAD"]
            ).decode().strip()
            mlflow.set_tag("git_commit", git_hash)
        except (subprocess.CalledProcessError, OSError):
            pass

        # Timestamp
        mlflow.set_tag("run_timestamp", datetime.now().isoformat())

    def log_config(self, config: dict):
        """Log a (possibly nested) configuration dict as parameters."""
        # Flatten nested config so every leaf becomes a single parameter
        flat_config = self._flatten_dict(config)
        mlflow.log_params(flat_config)

    def log_metrics(self, metrics: dict, step: Optional[int] = None):
        """Log metrics with an optional step."""
        mlflow.log_metrics(metrics, step=step)

    def log_model(self, model, artifact_path: str = "model"):
        """Log a PyTorch model as an artifact."""
        mlflow.pytorch.log_model(model, artifact_path)

    def end_run(self):
        """End the current run."""
        if self.run:
            mlflow.end_run()
            self.run = None

    def _flatten_dict(self, d: dict, parent_key: str = '') -> dict:
        """Flatten a nested dictionary using dot-separated keys."""
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}.{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(self._flatten_dict(v, new_key).items())
            else:
                items.append((new_key, v))
        return dict(items)
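Hypothetical usage of the tracker, assuming a local MLflow server at http://127.0.0.1:5000 and a PyTorch model plus a train_one_epoch helper defined elsewhere:

tracker = ProductionExperimentTracker(
    experiment_name="churn-baseline",
    tracking_uri="http://127.0.0.1:5000",
)
tracker.start_run(run_name="transformer_lr1e-3")
tracker.log_config({"optimizer": {"lr": 1e-3}, "model": {"type": "transformer"}})
for epoch in range(10):
    metrics = train_one_epoch(model)   # hypothetical helper returning {"train_loss": ..., "val_loss": ...}
    tracker.log_metrics(metrics, step=epoch)
tracker.log_model(model)
tracker.end_run()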
Troubleshooting Guide
| Issue | Cause | Solution |
|-------|-------|----------|
| Runs not syncing | Network issue | Check connectivity, use offline mode |
| Large artifacts fail | Size limit | Use cloud storage for large files |
| Duplicate run names | No uniqueness | Add timestamp or hash to names |
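For the "runs not syncing" row, both platforms can work without a network connection; a sketch of the two common approaches (the project name is a placeholder):

import os

import mlflow
import wandb

# W&B offline mode: runs are written locally and can be pushed later
# with `wandb sync <run_dir>` once connectivity returns.
os.environ["WANDB_MODE"] = "offline"
run = wandb.init(project="tracking-101")   # placeholder project name
run.finish()

# MLflow alternative: point at a local file store instead of a remote server.
mlflow.set_tracking_uri("file:./mlruns")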
Resources
- MLflow Documentation
- W&B Documentation
- [See: training-pipelines] - Integrate tracking with pipelines
Version History
| Version | Date | Changes |
|---------|------|---------|
| 2.0.0 | 2024-12 | Production-grade with templates |
| 1.0.0 | 2024-11 | Initial release |