Marimo Notebooks for IMSA Data Analysis
Purpose
Create interactive, reactive marimo notebooks for analyzing IMSA racing data with:
- Interactive filtering UIs for seasons, classes, events, and sessions
- Reactive programming where cells auto-update when filters change
- DuckDB integration for SQL-based data analysis
- Visualization dashboards with Altair charts and tables
- Git-friendly notebooks stored as pure Python files
Key Concepts
Reactive Execution
marimo automatically runs dependent cells when variables change. When you interact with a UI element (e.g., change a dropdown), all cells referencing that element automatically re-run with the new value.
# Cell 1: Define UI element (global variable)
season = mo.ui.dropdown(options=[2023, 2024, 2025], value=2025)
# Cell 2: Use the value - automatically reruns when season changes
data = load_season_data(season.value)
# Cell 3: Use data - automatically reruns when data changes
chart = create_visualization(data)
Critical Rules
- Assign UI elements to global variables - marimo can only synchronize elements assigned to global variables
- Reference the
.valueattribute - Access UI element values viaelement.value - No hidden state - Delete a cell and marimo deletes its variables
- Pure Python files - Notebooks are
.pyfiles, not JSON
Quick Start
Directory
Work in ./outputs/reports folder, create it if necissary
Installation
cd ./output/reports && uv init # if needed
cd ./output/reports && uv add marimo openai pandas ...
Create New Notebook
cd ./output/reports && uv run marimo edit imsa_analysis.py
Use Template
Copy the template from assets/imsa_analysis_template.py as a starting point for IMSA analysis.
Building IMSA Analysis Notebooks
Step 1: Import and Connect
import marimo as mo
@app.cell
def __():
import marimo as mo
import duckdb
import pandas as pd
import altair as alt
return alt, duckdb, mo, pd
@app.cell
def __(duckdb):
# Connect to IMSA database
conn = duckdb.connect("../imsa.duckdb", read_only=True)
return conn,
Step 2: Load Filter Options
@app.cell
def __(conn):
# Get available seasons, events, and classes
seasons_df = conn.execute("""
SELECT DISTINCT season FROM seasons
WHERE session = 'race'
ORDER BY season DESC
""").df()
events_df = conn.execute("""
SELECT DISTINCT event FROM seasons
WHERE session = 'race'
ORDER BY event
""").df()
classes_df = conn.execute("""
SELECT DISTINCT class FROM laps
WHERE class IS NOT NULL
ORDER BY class
""").df()
return seasons_df, events_df, classes_df
Step 3: Create Interactive Filters
@app.cell
def __(mo, seasons_df, events_df, classes_df):
# Create filter dropdowns
season_filter = mo.ui.dropdown(
options=seasons_df['season'].tolist(),
value=seasons_df['season'].iloc[0],
label="Season"
)
event_filter = mo.ui.dropdown(
options=events_df['event'].tolist(),
value=events_df['event'].iloc[0],
label="Event"
)
class_filter = mo.ui.dropdown(
options=classes_df['class'].tolist(),
value=classes_df['class'].iloc[0],
label="Class"
)
# Display filters horizontally
mo.hstack([
mo.vstack([mo.md("**Season**"), season_filter]),
mo.vstack([mo.md("**Event**"), event_filter]),
mo.vstack([mo.md("**Class**"), class_filter])
], justify="start")
return season_filter, event_filter, class_filter
Step 4: Get Session ID (Critical for IMSA)
@app.cell
def __(conn, season_filter, event_filter, mo):
# CRITICAL: Always work with session_id for lap time comparisons
# Never compare lap times across different session_ids
session_query = f"""
SELECT session_id, start_date, session
FROM seasons
WHERE season = {season_filter.value}
AND event = '{event_filter.value}'
AND session = 'race'
LIMIT 1
"""
session_result = conn.execute(session_query).df()
if len(session_result) > 0:
session_id = session_result['session_id'].iloc[0]
mo.md(f"**Session ID**: {session_id}")
else:
mo.md("⚠️ No race session found")
session_id = None
return session_id, session_result
Step 5: Query and Analyze Data
@app.cell
def __(conn, session_id, class_filter, mo):
# Query with proper IMSA filtering
if session_id:
query = f"""
SELECT
driver_name,
car_number,
lap_time,
lap_number,
bpillar_quartile
FROM laps
WHERE session_id = {session_id}
AND class = '{class_filter.value}'
AND bpillar_quartile IN (1, 2) -- Top 50% clean laps
AND flags = 'GF' -- Green flag only
ORDER BY lap_time
LIMIT 100
"""
laps_df = conn.execute(query).df()
if len(laps_df) > 0:
mo.md(f"Loaded {len(laps_df)} laps")
else:
mo.md("⚠️ No data found")
else:
laps_df = None
return laps_df,
Step 6: Visualize Results
@app.cell
def __(laps_df, alt, mo):
if laps_df is not None and len(laps_df) > 0:
chart = alt.Chart(laps_df).mark_boxplot().encode(
x=alt.X('driver_name:N', title='Driver', sort='-y'),
y=alt.Y('lap_time:Q', title='Lap Time (seconds)'),
color='driver_name:N'
).properties(
width=800,
height=400,
title='Lap Time Distribution'
)
chart
else:
mo.md("No data to visualize")
return chart,
IMSA-Specific Requirements
Always Use session_id
# ✅ CORRECT: Filter to specific session
WHERE session_id = {session_id}
AND class = '{class_filter.value}'
# ❌ WRONG: Comparing across sessions
WHERE season = 2025 # Don't do this for lap time analysis
Always Default to Race Sessions
# ✅ CORRECT: Default to 'race'
WHERE session = 'race'
# ⚠️ Only if explicitly requested
WHERE session = 'practice' # Different objectives, not comparable
Always Use BPillar Filtering for Races
# ✅ CORRECT: Filter to clean, representative laps
WHERE bpillar_quartile IN (1, 2) -- Top 50% of clean laps
AND flags = 'GF' -- Green flag
# ❌ WRONG: Including all laps (pit stops, traffic, etc.)
-- No quartile filtering
Never Compare Across Classes
# ✅ CORRECT: Analyze each class separately
WHERE session_id = {session_id} AND class = 'GTP'
# ❌ WRONG: Comparing GTP to GTD
WHERE session_id = {session_id} -- Missing class filter
Common UI Patterns
Multi-Select for Classes
class_selector = mo.ui.multiselect(
options=["GTP", "LMP2", "GTD"],
value=["GTP"],
label="Select Classes"
)
# Use in query with IN clause
classes_str = "', '".join(class_selector.value)
query = f"WHERE class IN ('{classes_str}')"
Slider for Top N Selection
top_n = mo.ui.slider(
start=5,
stop=50,
step=5,
value=10,
label="Top N Drivers",
show_value=True
)
# Use in LIMIT clause
query = f"... ORDER BY lap_time LIMIT {top_n.value}"
Form for Expensive Operations
# Gate expensive analysis behind a form
analysis_params = mo.md("""
### Configure Analysis
- Minimum laps: {min_laps}
- Include practice: {include_practice}
""").batch(
min_laps=mo.ui.slider(5, 50, value=10),
include_practice=mo.ui.checkbox(value=False)
).form()
# Only run when submitted
if analysis_params.value:
results = run_expensive_analysis(**analysis_params.value)
Conditional Display with mo.stop
# Stop if session not selected
mo.stop(
session_id is None,
mo.md("⚠️ Please select a valid session")
)
# Stop if no data
mo.stop(
laps_df is None or len(laps_df) == 0,
mo.md("⚠️ No data available for analysis")
)
Performance Optimization
Filter in SQL, Not Python
# ✅ GOOD: Filter in SQL
query = f"""
SELECT * FROM laps
WHERE session_id = {session_id}
AND class = '{class}'
AND bpillar_quartile IN (1, 2)
LIMIT 1000
"""
df = conn.execute(query).df()
# ❌ BAD: Load all data then filter
df = conn.execute("SELECT * FROM laps").df()
df = df[df['session_id'] == session_id] # Inefficient
Limit Data for Visualization
# Sample large datasets
query = f"""
SELECT * FROM laps
WHERE session_id = {session_id}
ORDER BY RANDOM()
LIMIT 1000 -- Sample for performance
"""
Visualization Patterns
Interactive Altair Charts
# Create selection
selection = alt.selection_point(fields=['driver_name'])
chart = alt.Chart(laps_df).mark_circle().encode(
x='lap_number:Q',
y='lap_time:Q',
color=alt.condition(selection, 'driver_name:N', alt.value('lightgray')),
tooltip=['driver_name', 'lap_time', 'lap_number']
).add_params(selection).properties(
width=800,
height=400
)
Interactive Tables
# Create interactive table
table = mo.ui.table(
laps_df,
selection='multi', # Allow row selection
page_size=20
)
# Display table
table
# Access selected rows in another cell
selected_rows = table.value
Dashboard Layout
# Create tabs for different analyses
tabs = mo.ui.tabs({
"Lap Times": lap_time_analysis,
"Consistency": consistency_metrics,
"Pit Strategy": pit_analysis,
"Weather Impact": weather_correlation
})
tabs
Reference Materials
Detailed Patterns
See references/marimo_patterns.md for comprehensive patterns including:
- Advanced UI element configurations
- DuckDB integration techniques
- Reactive design patterns
- Data visualization examples
- IMSA-specific query patterns
- Performance optimization tips
Deeper docs
See references/agent-docs.md are officially written docs for running a marimo agent. It's excellent and you should read it.
Template Notebook
See assets/imsa_analysis_template.py for a complete working example that demonstrates:
- Database connection setup
- Interactive filter creation
- Session ID retrieval
- Proper IMSA data filtering
- Visualization and analysis
- Error handling
Running Notebooks
Development Mode
# Edit notebook with live preview
cd ./output/reports && uv run marimo edit notebook.py
App Mode
# Run as read-only web app (hides code)
cd ./output/reports && uv run marimo run notebook.py
# Run with code visible
cd ./output/reports && uv run marimo run --include-code notebook.py
# Custom port
cd ./output/reports && uv run marimo run --port 8080 notebook.py
Script Mode
# Execute notebook as Python script
cd ./output/reports && uv run python notebook.py
# With command-line arguments
cd ./output/reports && uv run python notebook.py --season 2025 --event Sebring
Best Practices
1. One Cell, One Purpose
Each cell should have a single, clear purpose:
- Cell 1: Imports
- Cell 2: Database connection
- Cell 3: Load filter options
- Cell 4: Create UI filters
- Cell 5: Get session_id
- Cell 6: Query data
- Cell 7: Visualize
2. Validate Inputs
Always check if data exists before processing:
if session_id and laps_df is not None and len(laps_df) > 0:
# Process data
else:
mo.md("⚠️ No data available")
3. Use Descriptive Variable Names
# ✅ GOOD
season_filter = mo.ui.dropdown(...)
laps_df = conn.execute(query).df()
# ❌ BAD
s = mo.ui.dropdown(...)
df = conn.execute(query).df()
4. Add User Feedback
# Loading indicators
with mo.status.spinner(title="Loading data..."):
df = load_data()
# Success messages
mo.md(f"✅ Loaded {len(df)} laps")
# Warnings
if len(df) < 100:
mo.md("⚠️ Small sample size")
5. Document Complex Queries
# Add comments explaining IMSA-specific filtering
query = f"""
SELECT * FROM laps
WHERE session_id = {session_id} -- Single session for comparison
AND class = '{class}' -- Each class analyzed separately
AND bpillar_quartile IN (1, 2) -- Top 50% clean laps (excludes pit stops)
AND flags = 'GF' -- Green flag laps only
"""
Troubleshooting
UI Element Not Updating Other Cells
- Cause: Element not assigned to global variable
- Fix: Ensure
element = mo.ui.dropdown(...)at module level
Cell Not Re-Running on Change
- Cause: Cell doesn't reference the changed variable
- Fix: Check that cell reads
element.valuesomewhere
"No module named marimo"
- Cause: marimo not installed in current environment
- Fix:
pip install "marimo[sql]"
Database Connection Error
- Cause: Wrong path or database doesn't exist
- Fix: Check path in
duckdb.connect("path/to/imsa.duckdb")
Empty Results
- Cause: Filters too restrictive or no data for selection
- Fix: Check if session exists, validate filter values