Charting Intelligence & Data-to-Viz Pipeline Engineering Skill

Enable any agent to (1) identify the chart that best fits the raw data and the question being asked, (2) generate the sequence of queries and transforms needed to materialize that chart, and (3) evaluate and choose a charting library or stack based on performance, scale, and interactivity requirements.

This is not "just call a library" — it is full-stack visualization strategy.

1. Core Decision Framework — Choosing the Chart That Fits the Data AND the Story

Before any code runs, answer these questions in order:

What is the goal of the viewer?

| Goal | Chart Type |
|------|-----------|
| Compare values | Bar/Column (grouped or stacked) |
| Show trend over time | Line or Area |
| Show distribution / spread | Histogram, Box Plot, Violin |
| Show relationship / correlation | Scatter, Bubble, Heatmap |
| Show composition / parts-of-whole | Stacked Bar or Area (never pie if >5 slices) |
| Show hierarchy / flow | Treemap, Sunburst, Sankey |
| Show geographic pattern | Choropleth or Symbol Map |

How many variables and what types?

| Variables | Chart |
|-----------|-------|
| 1 numeric, unordered | Histogram / Density |
| 1 numeric + time | Line |
| 1 categorical + 1 numeric | Bar |
| 2 numeric | Scatter |
| 1 categorical + time series | Multi-series Line or Stacked Area |
| Many numeric variables (many-to-many relationships) | Heatmap or Parallel Coordinates |
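The variable-type table can be encoded as a first-pass lookup. This is a minimal sketch, assuming each column has already been classified as "numeric", "categorical", or "time"; those labels and the function name `recommend_chart` are hypothetical, not part of any library.

```python
from collections import Counter

def recommend_chart(var_types: list[str]) -> str:
    """Map column kinds ("numeric", "categorical", "time") to a first-choice chart."""
    counts = Counter(var_types)  # missing kinds count as 0
    n_num, n_cat, n_time = counts["numeric"], counts["categorical"], counts["time"]
    if n_num == 1 and n_cat == 0 and n_time == 0:
        return "Histogram / Density"
    if n_num == 1 and n_time == 1 and n_cat == 0:
        return "Line"
    if n_num == 1 and n_cat == 1 and n_time == 0:
        return "Bar"
    if n_num == 2 and n_cat == 0 and n_time == 0:
        return "Scatter"
    if n_num == 1 and n_cat == 1 and n_time == 1:
        # A time series split by a category: one line or band per category.
        return "Multi-series Line or Stacked Area"
    if n_num > 2:
        return "Heatmap or Parallel Coordinates"
    return "No direct match: revisit the viewer's goal"
```

The fallback string is deliberate: when the types do not match a row, the goal table is the better guide than a forced guess.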

Audience & Context Check

| Audience | Approach |
|----------|----------|
| Executive dashboard | Big numbers + simple bars/lines, zero clutter |
| Analyst/explorer | Interactive tooltips, zoom, hover details, multiple linked views |
| Mobile | Horizontal bars, large text, minimal colors |
| Accessibility | High contrast, patterns instead of color-only, alt-text descriptions |

Rule of Thumb Table

| Data Situation | Best Chart (first choice) | Avoid |
|---------------|--------------------------|-------|
| >5 categories | Bar (horizontal) | Pie |
| Time series >20 points | Line | Column |
| Correlation between 2 measures | Scatter | Line (unless ordered) |
| Parts of whole >5 slices | Stacked Bar or Treemap | Pie/Donut |
| Outliers or distribution shape | Box + Violin | Bar |
| Flow between stages | Sankey | Anything else |
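The "Avoid" column works well as a lint step that runs after a chart has been proposed. A minimal sketch covering the two numeric thresholds in the table (>5 slices, >20 time points); the `ChartPlan` fields and the `lint` name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ChartPlan:
    chart: str                    # hypothetical labels, e.g. "pie", "donut", "column", "line"
    n_categories: int = 0         # distinct slices or bars
    n_points: int = 0             # points along the time axis
    is_time_series: bool = False

def lint(plan: ChartPlan) -> list[str]:
    """Return a warning for each rule-of-thumb violation (empty list = OK)."""
    warnings = []
    if plan.chart in ("pie", "donut") and plan.n_categories > 5:
        warnings.append("More than 5 slices: use a horizontal bar, stacked bar, or treemap")
    if plan.is_time_series and plan.chart == "column" and plan.n_points > 20:
        warnings.append("Time series with more than 20 points: use a line chart")
    return warnings
```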

2. The Data Pipeline Engine

Source tables rarely hold the exact aggregation a chart needs. Generate the full pipeline automatically:

Step A — Inventory

Scan schema or sample 100 rows — detect column types, null rates, cardinality
Flag missing aggregations (e.g., "no daily_sales_by_region view exists")
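Step A can be sketched against any DB-API connection; the version below uses Python's built-in `sqlite3` so it runs without a server. It assumes the table name is a trusted identifier (it is interpolated into the SQL) and uses a deliberately crude type inference ("numeric" vs "text"); `profile_table` and `view_exists` are hypothetical helper names.

```python
import sqlite3

def profile_table(conn: sqlite3.Connection, table: str, sample_size: int = 100) -> dict:
    """Profile a sample of rows: inferred type, null rate, and cardinality per column."""
    # `table` must be a trusted identifier; only the LIMIT is parameterized.
    cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (sample_size,))
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    profile = {}
    for i, col in enumerate(columns):
        values = [r[i] for r in rows]
        non_null = [v for v in values if v is not None]
        is_numeric = bool(non_null) and all(isinstance(v, (int, float)) for v in non_null)
        profile[col] = {
            "type": "numeric" if is_numeric else "text",
            "null_rate": (len(values) - len(non_null)) / len(values) if values else 0.0,
            "cardinality": len(set(non_null)),
        }
    return profile

def view_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check whether a pre-built view or table (e.g. an aggregate) already exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('view', 'table') AND name = ?", (name,)
    ).fetchone()
    return row is not None
```

Low cardinality on a text column suggests a categorical axis; a `False` from `view_exists` is the "no daily_sales_by_region view exists" flag.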

Step B — Required Transformations

Auto-generate SQL or pandas code for:

Joins needed?
GROUP BY + SUM/AVG/COUNT?
Window functions for running totals or YoY?
Binning (e.g., age into decades)?
Pivot/unpivot?
Outlier flagging or imputation?
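Several of these transforms fit in plain SQL. The sketch below uses an in-memory SQLite database with a made-up `orders` table to show GROUP BY + SUM, window functions (running total and month-over-month change via `LAG`), and binning ages into decades. It assumes SQLite 3.25 or newer, which is required for window functions; the table, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, customer_age INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2024-01-05", 23, 100.0), ("2024-01-20", 37, 50.0),
     ("2024-02-03", 41, 80.0), ("2024-03-11", 29, 120.0)],
)

# GROUP BY + SUM in the subquery, then window functions over the monthly totals.
monthly_sql = """
SELECT month,
       revenue,
       SUM(revenue) OVER (ORDER BY month) AS running_total,
       revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM (
    SELECT strftime('%Y-%m', order_date) AS month, SUM(revenue) AS revenue
    FROM orders
    GROUP BY month
)
ORDER BY month
"""

# Binning: integer division maps 23 -> 20, 37 -> 30, and so on.
age_bins_sql = """
SELECT (customer_age / 10) * 10 AS age_decade, COUNT(*) AS n_orders
FROM orders
GROUP BY age_decade
ORDER BY age_decade
"""

monthly = conn.execute(monthly_sql).fetchall()
age_bins = conn.execute(age_bins_sql).fetchall()
```

The first month's `mom_change` is `NULL` (`None` in Python) because `LAG` has no previous row; a chart layer should skip or label it rather than plot zero.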

Step C — Materialization Strategy

| Scale | Strategy |
|-------|----------|
| One-off (<10k rows) | Run query on-the-fly |
| Medium | Create materialized view or cached table |
| Large/Real-time | Pre-aggregate in Spark/DuckDB, incremental refresh |
| Extreme | Stream + windowed aggregates (Flink/Kafka) |
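For the "cached table + incremental refresh" rows, one common pattern is to recompute only the most recent window of an aggregate table inside a single transaction. A minimal SQLite sketch, assuming a source table `sales(order_ts, region, amount)` and an aggregate table named `daily_sales_by_region` (both hypothetical); Spark or DuckDB versions would follow the same delete-then-insert shape.

```python
import sqlite3

def refresh_daily_sales(conn: sqlite3.Connection, since_day: str) -> None:
    """Recompute aggregates for days >= since_day; older days are left untouched."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_sales_by_region (
               day TEXT, region TEXT, revenue REAL, PRIMARY KEY (day, region))"""
    )
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM daily_sales_by_region WHERE day >= ?", (since_day,))
        conn.execute(
            """INSERT INTO daily_sales_by_region (day, region, revenue)
               SELECT date(order_ts) AS day, region, SUM(amount)
               FROM sales
               WHERE date(order_ts) >= ?
               GROUP BY day, region""",
            (since_day,),
        )
```

Charts then read from `daily_sales_by_region` instead of scanning `sales`; the caller decides how far back `since_day` must reach to cover late-arriving rows.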

Step D — Validation

Run a tiny sample query first — confirm the shape matches the chosen chart type
If not, loop back and adjust aggregation
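A validation pass can wrap the generated query in a `LIMIT` and compare column kinds with what the chart expects. A minimal sketch; the `EXPECTED_SHAPES` mapping is a hypothetical convention (for example, dates as ISO text), not a library schema, and the query must not end with a semicolon because it is embedded as a subquery.

```python
import sqlite3

# Hypothetical expected column kinds per chart type, in column order.
EXPECTED_SHAPES = {
    "bar": ["text", "numeric"],                   # category, value
    "line": ["text", "numeric"],                  # ISO date/month, value
    "scatter": ["numeric", "numeric"],            # x, y
    "stacked_area": ["text", "text", "numeric"],  # time, series, value
}

def kind(value) -> str:
    return "numeric" if isinstance(value, (int, float)) else "text"

def validate_shape(conn: sqlite3.Connection, sql: str, chart: str, sample: int = 20):
    """Run the query on a small sample and check its shape against the chart type."""
    rows = conn.execute(f"SELECT * FROM ({sql}) LIMIT ?", (sample,)).fetchall()
    if not rows:
        return False, "query returned no rows"
    expected = EXPECTED_SHAPES[chart]
    if len(rows[0]) != len(expected):
        return False, f"expected {len(expected)} columns, got {len(rows[0])}"
    for row in rows:
        for i, (value, want) in enumerate(zip(row, expected)):
            # NULLs are not evidence either way; only typed values are checked.
            if value is not None and kind(value) != want:
                return False, f"column {i} should be {want}, got {value!r}"
    return True, "ok"
```

A `False` result carries the reason, which is what the "loop back and adjust aggregation" step needs.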

Example

User says "show monthly revenue by product category":

"I need: LEFT JOIN orders -> products -> categories; GROUP BY month, category; SUM(revenue). No view exists -> I will create temp table or run inline. Chart type: Stacked Area. Library recommendation below."

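The example's plan can be written out end to end. A minimal sketch, assuming a schema of `orders(product_id, order_date, revenue)`, `products(product_id, category_id)`, and `categories(category_id, name)` (all hypothetical), run against SQLite. The LEFT JOINs keep orders whose product has no category; `COALESCE` labels them so they still appear in the stack. The pivot fills missing month/category pairs with 0 so the stacked areas line up.

```python
import sqlite3
from collections import defaultdict

MONTHLY_REVENUE_BY_CATEGORY = """
SELECT strftime('%Y-%m', o.order_date) AS month,
       COALESCE(c.name, 'Uncategorized') AS category,
       SUM(o.revenue) AS revenue
FROM orders AS o
LEFT JOIN products AS p ON p.product_id = o.product_id
LEFT JOIN categories AS c ON c.category_id = p.category_id
GROUP BY month, category
ORDER BY month, category
"""

def to_stacked_series(rows):
    """Pivot long (month, category, revenue) rows into one aligned series per category."""
    months = sorted({month for month, _, _ in rows})
    by_category = defaultdict(dict)
    for month, category, revenue in rows:
        by_category[category][month] = revenue
    # Missing month/category combinations become 0 so every series has len(months) points.
    series = {cat: [vals.get(m, 0) for m in months] for cat, vals in sorted(by_category.items())}
    return months, series
```

`months` becomes the shared x-axis and each entry in `series` becomes one stacked layer in whichever library is chosen.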
3. Library Selection Matrix

Always output the performance trade-off and recommended stack.

| Scale / Requirement | Recommended Library | Why | Fallback |
|--------------------|-------------------|-----|----------|
| <10k points, simple web dashboard | Chart.js or Recharts | <10 ms render, ~60 KB bundle | N/A |
| 10k-500k points, interactive | Apache ECharts or Plotly.js | Canvas + WebGL, 60 fps on 100k points | D3 (slower) |
| 500k-10M+ points, real-time | LightningChart or Highcharts Stock + WebGL | GPU accelerated, <50 ms at 5M points | Anything SVG-based fails |
| Python backend + web | Plotly Dash or Bokeh | Python server, browser-side rendering, data streamed to the client | Matplotlib (static only) |
| Python notebook exploration | Seaborn + Plotly | Instant, beautiful defaults | -- |
| Extremely large / streaming | DuckDB + Observable Plot or Perspective | In-memory columnar, sub-second on billions | -- |
| No JavaScript (PDF reports) | Matplotlib + WeasyPrint or ReportLab | Pure Python, vector output | -- |
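The matrix can serve as a first-pass rule, with the row-count thresholds taken directly from the table. A minimal sketch; the flag names are hypothetical, and a real selection would also weigh existing stack, licensing, and team familiarity, which the table does not cover.

```python
def recommend_library(n_points: int, *, static_report: bool = False,
                      streaming: bool = False, python_backend: bool = False) -> str:
    """First-pass library pick following the selection matrix."""
    if static_report:      # no JavaScript, e.g. PDF output
        return "Matplotlib + WeasyPrint or ReportLab"
    if streaming:
        return "DuckDB + Observable Plot or Perspective"
    if python_backend:
        return "Plotly Dash or Bokeh"
    if n_points < 10_000:
        return "Chart.js or Recharts"
    if n_points <= 500_000:
        return "Apache ECharts or Plotly.js"
    return "LightningChart or Highcharts Stock + WebGL"
```

Hard constraints (static output, streaming, Python-only backend) are checked before point counts because they rule out whole categories of libraries regardless of scale.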

Optimization Rules (apply automatically)

Downsample for overview, show full detail on zoom (ECharts built-in)
Use Canvas instead of SVG above ~5k elements
Pre-aggregate at DB level whenever possible (biggest single win)
Lazy load charts below the fold
Bundle size: tree-shake everything except the one chart type you need
GPU vs CPU: if >100k points and user needs pan/zoom, force WebGL path
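When a library's built-in sampling is not available (ECharts, for example, exposes its own `sampling` option on line series), downsampling can happen before the data is sent to the client. The sketch below is min-max decimation, a simple technique that keeps each bucket's lowest and highest point so peaks and troughs survive in the overview. It is not ECharts' algorithm, and `minmax_downsample` is a hypothetical name.

```python
def minmax_downsample(xs, ys, n_buckets):
    """Keep the min and max point of each bucket; returns (x, y) pairs in x order.

    Output has at most 2 * n_buckets points. Inputs short enough to fit are returned as-is.
    """
    n = len(xs)
    if n_buckets <= 0 or n <= 2 * n_buckets:
        return list(zip(xs, ys))
    out = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets          # each bucket has >= 2 points here
        idx = range(start, end)
        lo = min(idx, key=lambda i: ys[i])
        hi = max(idx, key=lambda i: ys[i])
        for i in sorted({lo, hi}):              # preserve x order within the bucket
            out.append((xs[i], ys[i]))
    return out
```

For the zoomed-in view, the full-resolution slice for the visible x-range is fetched instead of the decimated series.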

4. Full Workflow

Parse intent — identify required chart type from user request
Schema scan — detect column types, cardinality, row estimates
Decision framework — output chart recommendation + rationale
Generate transforms — exact SQL/pandas/transform code needed
Choose library — select by performance tier based on row estimate
Emit deliverables:
- Chart spec (JSON for the library or React component)
- SQL/transform script
- Performance warning or confirmation
- Accessibility note + alt-text template
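The alt-text deliverable can be generated from the same data the chart plots, so it stays accurate when the data changes. A minimal sketch for a single series; the template wording and the function name `alt_text` are hypothetical.

```python
def alt_text(chart_type: str, measure: str, dimension: str,
             labels: list[str], values: list[float]) -> str:
    """Fill a simple alt-text template: range covered, highest and lowest points."""
    if not values:
        return f"{chart_type} of {measure} by {dimension}: no data."
    hi = max(range(len(values)), key=values.__getitem__)
    lo = min(range(len(values)), key=values.__getitem__)
    return (
        f"{chart_type} of {measure} by {dimension}, {labels[0]} to {labels[-1]}. "
        f"Highest: {labels[hi]} ({values[hi]:,}). Lowest: {labels[lo]} ({values[lo]:,})."
    )
```

For example, `alt_text("Line chart", "revenue", "month", ["Jan", "Feb", "Mar"], [150, 80, 120])` describes the range and the extremes in one sentence each.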

5. Advanced Capabilities

"Show me what I should be charting but aren't" — auto-correlation scan + suggested visuals
"Optimize this dashboard for 10x speed" — rewrite query + switch library
"Make this mobile-first" — auto-switch to horizontal bars + simplify
Color-blind & accessibility mode — toggle patterns, high contrast
Export — SVG/PNG/PDF with embedded data table
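The "what should I be charting" scan can start as a pairwise correlation pass over numeric columns, suggesting a scatter plot for each strongly correlated pair. A minimal sketch using Pearson correlation; the 0.7 threshold is an arbitrary assumption, and correlation only catches linear relationships, so a low score does not rule a pair out.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def suggest_scatters(columns: dict[str, list[float]], threshold: float = 0.7):
    """Return (col_a, col_b, r) for pairs with |r| >= threshold, strongest first."""
    suggestions = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            suggestions.append((a, b, round(r, 3)))
    return sorted(suggestions, key=lambda s: -abs(s[2]))
```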
