Marimo Notebooks for IMSA Data Analysis Skill

Marimo Notebooks for IMSA Data Analysis

Purpose

Create interactive, reactive marimo notebooks for analyzing IMSA racing data with:

Interactive filtering UIs for seasons, classes, events, and sessions
Reactive programming where cells auto-update when filters change
DuckDB integration for SQL-based data analysis
Visualization dashboards with Altair charts and tables
Git-friendly notebooks stored as pure Python files

Key Concepts

Reactive Execution

marimo automatically runs dependent cells when variables change. When you interact with a UI element (e.g., change a dropdown), all cells referencing that element automatically re-run with the new value.

# Cell 1: Define UI element (global variable)
season = mo.ui.dropdown(options=[2023, 2024, 2025], value=2025)

# Cell 2: Use the value - automatically reruns when season changes
data = load_season_data(season.value)

# Cell 3: Use data - automatically reruns when data changes
chart = create_visualization(data)

Critical Rules

Assign UI elements to global variables - marimo can only synchronize elements assigned to global variables
Reference the .value attribute - Access UI element values via element.value
No hidden state - Delete a cell and marimo deletes its variables
Pure Python files - Notebooks are .py files, not JSON

Quick Start

Installation

cd ./output/reports && uv init # if needed
cd ./output/reports && uv add marimo openai pandas ...

Create New Notebook

cd ./output/reports && uv run marimo edit imsa_analysis.py

Use Template

Copy the template from assets/imsa_analysis_template.py as a starting point for IMSA analysis.

Building IMSA Analysis Notebooks

Step 1: Import and Connect

import marimo as mo

@app.cell
def __():
    import marimo as mo
    import duckdb
    import pandas as pd
    import altair as alt
    return alt, duckdb, mo, pd

@app.cell
def __(duckdb):
    # Connect to IMSA database
    conn = duckdb.connect("../imsa.duckdb", read_only=True)
    return conn,

Step 2: Load Filter Options

@app.cell
def __(conn):
    # Get available seasons, events, and classes
    seasons_df = conn.execute("""
        SELECT DISTINCT season FROM seasons 
        WHERE session = 'race'
        ORDER BY season DESC
    """).df()
    
    events_df = conn.execute("""
        SELECT DISTINCT event FROM seasons 
        WHERE session = 'race'
        ORDER BY event
    """).df()
    
    classes_df = conn.execute("""
        SELECT DISTINCT class FROM laps 
        WHERE class IS NOT NULL
        ORDER BY class
    """).df()
    
    return seasons_df, events_df, classes_df

Step 3: Create Interactive Filters

@app.cell
def __(mo, seasons_df, events_df, classes_df):
    # Create filter dropdowns
    season_filter = mo.ui.dropdown(
        options=seasons_df['season'].tolist(),
        value=seasons_df['season'].iloc[0],
        label="Season"
    )
    
    event_filter = mo.ui.dropdown(
        options=events_df['event'].tolist(),
        value=events_df['event'].iloc[0],
        label="Event"
    )
    
    class_filter = mo.ui.dropdown(
        options=classes_df['class'].tolist(),
        value=classes_df['class'].iloc[0],
        label="Class"
    )
    
    # Display filters horizontally
    mo.hstack([
        mo.vstack([mo.md("**Season**"), season_filter]),
        mo.vstack([mo.md("**Event**"), event_filter]),
        mo.vstack([mo.md("**Class**"), class_filter])
    ], justify="start")
    
    return season_filter, event_filter, class_filter

Step 4: Get Session ID (Critical for IMSA)

@app.cell
def __(conn, season_filter, event_filter, mo):
    # CRITICAL: Always work with session_id for lap time comparisons
    # Never compare lap times across different session_ids
    
    session_query = f"""
        SELECT session_id, start_date, session
        FROM seasons
        WHERE season = {season_filter.value}
          AND event = '{event_filter.value}'
          AND session = 'race'
        LIMIT 1
    """
    
    session_result = conn.execute(session_query).df()
    
    if len(session_result) > 0:
        session_id = session_result['session_id'].iloc[0]
        mo.md(f"**Session ID**: {session_id}")
    else:
        mo.md("⚠️ No race session found")
        session_id = None
    
    return session_id, session_result

Step 5: Query and Analyze Data

@app.cell
def __(conn, session_id, class_filter, mo):
    # Query with proper IMSA filtering
    if session_id:
        query = f"""
            SELECT 
                driver_name,
                car_number,
                lap_time,
                lap_number,
                bpillar_quartile
            FROM laps
            WHERE session_id = {session_id}
              AND class = '{class_filter.value}'
              AND bpillar_quartile IN (1, 2)  -- Top 50% clean laps
              AND flags = 'GF'  -- Green flag only
            ORDER BY lap_time
            LIMIT 100
        """
        
        laps_df = conn.execute(query).df()
        
        if len(laps_df) > 0:
            mo.md(f"Loaded {len(laps_df)} laps")
        else:
            mo.md("⚠️ No data found")
    else:
        laps_df = None
    
    return laps_df,

Step 6: Visualize Results

@app.cell
def __(laps_df, alt, mo):
    if laps_df is not None and len(laps_df) > 0:
        chart = alt.Chart(laps_df).mark_boxplot().encode(
            x=alt.X('driver_name:N', title='Driver', sort='-y'),
            y=alt.Y('lap_time:Q', title='Lap Time (seconds)'),
            color='driver_name:N'
        ).properties(
            width=800,
            height=400,
            title='Lap Time Distribution'
        )
        
        chart
    else:
        mo.md("No data to visualize")
    
    return chart,

IMSA-Specific Requirements

Always Use session_id

# ✅ CORRECT: Filter to specific session
WHERE session_id = {session_id}
  AND class = '{class_filter.value}'

# ❌ WRONG: Comparing across sessions
WHERE season = 2025  # Don't do this for lap time analysis

Always Default to Race Sessions

# ✅ CORRECT: Default to 'race'
WHERE session = 'race'

# ⚠️ Only if explicitly requested
WHERE session = 'practice'  # Different objectives, not comparable

Always Use BPillar Filtering for Races

# ✅ CORRECT: Filter to clean, representative laps
WHERE bpillar_quartile IN (1, 2)  -- Top 50% of clean laps
  AND flags = 'GF'  -- Green flag

# ❌ WRONG: Including all laps (pit stops, traffic, etc.)
-- No quartile filtering

Never Compare Across Classes

# ✅ CORRECT: Analyze each class separately
WHERE session_id = {session_id} AND class = 'GTP'

# ❌ WRONG: Comparing GTP to GTD
WHERE session_id = {session_id}  -- Missing class filter

Common UI Patterns

Multi-Select for Classes

class_selector = mo.ui.multiselect(
    options=["GTP", "LMP2", "GTD"],
    value=["GTP"],
    label="Select Classes"
)

# Use in query with IN clause
classes_str = "', '".join(class_selector.value)
query = f"WHERE class IN ('{classes_str}')"

Slider for Top N Selection

top_n = mo.ui.slider(
    start=5,
    stop=50,
    step=5,
    value=10,
    label="Top N Drivers",
    show_value=True
)

# Use in LIMIT clause
query = f"... ORDER BY lap_time LIMIT {top_n.value}"

Form for Expensive Operations

# Gate expensive analysis behind a form
analysis_params = mo.md("""
### Configure Analysis
- Minimum laps: {min_laps}
- Include practice: {include_practice}
""").batch(
    min_laps=mo.ui.slider(5, 50, value=10),
    include_practice=mo.ui.checkbox(value=False)
).form()

# Only run when submitted
if analysis_params.value:
    results = run_expensive_analysis(**analysis_params.value)

Conditional Display with mo.stop

# Stop if session not selected
mo.stop(
    session_id is None,
    mo.md("⚠️ Please select a valid session")
)

# Stop if no data
mo.stop(
    laps_df is None or len(laps_df) == 0,
    mo.md("⚠️ No data available for analysis")
)

Performance Optimization

Filter in SQL, Not Python

# ✅ GOOD: Filter in SQL
query = f"""
    SELECT * FROM laps
    WHERE session_id = {session_id}
      AND class = '{class}'
      AND bpillar_quartile IN (1, 2)
    LIMIT 1000
"""
df = conn.execute(query).df()

# ❌ BAD: Load all data then filter
df = conn.execute("SELECT * FROM laps").df()
df = df[df['session_id'] == session_id]  # Inefficient

Limit Data for Visualization

# Sample large datasets
query = f"""
    SELECT * FROM laps
    WHERE session_id = {session_id}
    ORDER BY RANDOM()
    LIMIT 1000  -- Sample for performance
"""

Visualization Patterns

Interactive Altair Charts

# Create selection
selection = alt.selection_point(fields=['driver_name'])

chart = alt.Chart(laps_df).mark_circle().encode(
    x='lap_number:Q',
    y='lap_time:Q',
    color=alt.condition(selection, 'driver_name:N', alt.value('lightgray')),
    tooltip=['driver_name', 'lap_time', 'lap_number']
).add_params(selection).properties(
    width=800,
    height=400
)

Interactive Tables

# Create interactive table
table = mo.ui.table(
    laps_df,
    selection='multi',  # Allow row selection
    page_size=20
)

# Display table
table

# Access selected rows in another cell
selected_rows = table.value

Dashboard Layout

# Create tabs for different analyses
tabs = mo.ui.tabs({
    "Lap Times": lap_time_analysis,
    "Consistency": consistency_metrics,
    "Pit Strategy": pit_analysis,
    "Weather Impact": weather_correlation
})

tabs

Reference Materials

Detailed Patterns

See references/marimo_patterns.md for comprehensive patterns including:

Advanced UI element configurations
DuckDB integration techniques
Reactive design patterns
Data visualization examples
IMSA-specific query patterns
Performance optimization tips

Deeper docs

See references/agent-docs.md are officially written docs for running a marimo agent. It's excellent and you should read it.

Template Notebook

See assets/imsa_analysis_template.py for a complete working example that demonstrates:

Database connection setup
Interactive filter creation
Session ID retrieval
Proper IMSA data filtering
Visualization and analysis
Error handling

Running Notebooks

Development Mode

# Edit notebook with live preview
cd ./output/reports && uv run marimo edit notebook.py

App Mode

# Run as read-only web app (hides code)
cd ./output/reports && uv run marimo run notebook.py

# Run with code visible
cd ./output/reports && uv run marimo run --include-code notebook.py

# Custom port
cd ./output/reports && uv run marimo run --port 8080 notebook.py

Script Mode

# Execute notebook as Python script
cd ./output/reports && uv run python notebook.py

# With command-line arguments
cd ./output/reports && uv run python notebook.py --season 2025 --event Sebring

Best Practices

1. One Cell, One Purpose

Each cell should have a single, clear purpose:

Cell 1: Imports
Cell 2: Database connection
Cell 3: Load filter options
Cell 4: Create UI filters
Cell 5: Get session_id
Cell 6: Query data
Cell 7: Visualize

2. Validate Inputs

Always check if data exists before processing:

if session_id and laps_df is not None and len(laps_df) > 0:
    # Process data
else:
    mo.md("⚠️ No data available")

3. Use Descriptive Variable Names

# ✅ GOOD
season_filter = mo.ui.dropdown(...)
laps_df = conn.execute(query).df()

# ❌ BAD
s = mo.ui.dropdown(...)
df = conn.execute(query).df()

4. Add User Feedback

# Loading indicators
with mo.status.spinner(title="Loading data..."):
    df = load_data()

# Success messages
mo.md(f"✅ Loaded {len(df)} laps")

# Warnings
if len(df) < 100:
    mo.md("⚠️ Small sample size")

5. Document Complex Queries

# Add comments explaining IMSA-specific filtering
query = f"""
    SELECT * FROM laps
    WHERE session_id = {session_id}  -- Single session for comparison
      AND class = '{class}'          -- Each class analyzed separately
      AND bpillar_quartile IN (1, 2) -- Top 50% clean laps (excludes pit stops)
      AND flags = 'GF'               -- Green flag laps only
"""

Troubleshooting

UI Element Not Updating Other Cells

Cause: Element not assigned to global variable
Fix: Ensure element = mo.ui.dropdown(...) at module level

Cell Not Re-Running on Change

Cause: Cell doesn't reference the changed variable
Fix: Check that cell reads element.value somewhere

"No module named marimo"

Cause: marimo not installed in current environment
Fix: pip install "marimo[sql]"

Database Connection Error

Cause: Wrong path or database doesn't exist
Fix: Check path in duckdb.connect("path/to/imsa.duckdb")

Empty Results

Cause: Filters too restrictive or no data for selection
Fix: Check if session exists, validate filter values

Agent Skills: Marimo Notebooks for IMSA Data Analysis

Install this agent skill to your local

Skill Files