Data Storyteller

Automatically transform raw data into insight-rich reports. Provide a CSV or Excel file (or any other format listed under Supported File Formats) and get back a complete analysis with visualizations, statistical summaries, and narrative explanations, without writing the analysis code yourself.

Core Workflow

1. Load and Analyze Data

from scripts.data_storyteller import DataStoryteller

# Initialize with your data file
storyteller = DataStoryteller("your_data.csv")

# Or from a pandas DataFrame
import pandas as pd
df = pd.read_csv("your_data.csv")
storyteller = DataStoryteller(df)

2. Generate Full Report

# Generate comprehensive report
report = storyteller.generate_report()

# Access components
print(report['summary'])           # Executive summary
print(report['insights'])          # Key findings
print(report['statistics'])        # Statistical analysis
print(report['visualizations'])    # Generated chart info

3. Export Options

# Export to PDF
storyteller.export_pdf("analysis_report.pdf")

# Export to HTML (interactive charts)
storyteller.export_html("analysis_report.html")

# Export charts only
storyteller.export_charts("charts/", format="png")

Quick Start Examples

Basic Analysis

from scripts.data_storyteller import DataStoryteller

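# Full analysis in a few lines. generate_report() returns the report data,
# so the export is called on the storyteller itself.
storyteller = DataStoryteller("sales_data.csv")
storyteller.generate_report()
storyteller.export_pdf("report.pdf")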

Custom Analysis

storyteller = DataStoryteller("data.csv")

# Focus on specific columns
storyteller.analyze_columns(['revenue', 'customers', 'date'])

# Set analysis parameters
report = storyteller.generate_report(
    include_correlations=True,
    include_outliers=True,
    include_trends=True,
    time_column='date',
    chart_style='business'
)

Features

Auto-Detection

Column Types: Numeric, categorical, datetime, text, boolean
Data Quality: Missing values, duplicates, outliers
Relationships: Correlations, dependencies, groupings
Time Series: Trends, seasonality, anomalies
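
The source does not show how column types are detected. As a rough illustration only, the kind of check involved can be sketched with pandas dtype helpers. The function name `classify_column` and the 20-category cutoff are assumptions made for this example, not part of the skill's API.

```python
import pandas as pd
from pandas.api import types as ptypes


def classify_column(series: pd.Series, max_categories: int = 20) -> str:
    """Guess a coarse column type (hypothetical helper, not the skill's API)."""
    # Check booleans first: pandas also treats bool as a numeric dtype.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    # Remaining columns hold strings or objects; few distinct values
    # suggests a category, many suggests free text.
    if series.nunique(dropna=True) <= max_categories:
        return "categorical"
    return "text"


# Made-up sample data for illustration.
df = pd.DataFrame({
    "revenue": [120.5, 98.0, 143.2, 110.0],
    "region": ["north", "south", "north", "east"],
    "signup": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-02", "2024-03-19"]),
    "active": [True, False, True, True],
})
column_types = {name: classify_column(df[name]) for name in df.columns}
print(column_types)
```

Dates stored as plain strings would be classified as categorical or text by this sketch; parsing them with `pd.to_datetime` first is one way around that.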

Generated Visualizations

| Data Type | Charts Generated |
|-----------|-----------------|
| Numeric | Histogram, box plot, trend line |
| Categorical | Bar chart, pie chart, frequency table |
| Time Series | Line chart, decomposition, forecast |
| Correlations | Heatmap, scatter matrix |
| Comparisons | Grouped bar, stacked area |

Narrative Insights

The storyteller generates plain-English insights including:

Executive summary of key findings
Notable patterns and anomalies
Statistical significance notes
Actionable recommendations
Data quality warnings

Output Sections

1. Executive Summary

High-level overview of the dataset and key findings in 2-3 paragraphs.

2. Data Profile

Row/column counts
Memory usage
Missing value analysis
Duplicate detection
Data type distribution
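
For readers who want to see what these profile items correspond to in pandas terms, here is a minimal sketch. The `profile` function and its dictionary keys are hypothetical and may not match what the skill produces.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect basic profile numbers for a DataFrame (illustrative only)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # deep=True also counts the memory held by string objects.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }


# Made-up sample data: one repeated row and one missing amount.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [10.0, 5.0, 5.0, None],
})
data_profile = profile(df)
print(data_profile)
```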

3. Statistical Analysis

For each numeric column:

Central tendency (mean, median, mode)
Dispersion (std dev, IQR, range)
Distribution shape (skewness, kurtosis)
Outlier count
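
The measures listed above can be computed for a single column roughly as follows. This is a sketch using pandas' built-in estimators, with a hypothetical helper name. It assumes outliers are counted with the common 1.5 × IQR rule, which the source does not confirm.

```python
import pandas as pd


def describe_numeric(series: pd.Series) -> dict:
    """Summarize one numeric column (hypothetical helper for illustration)."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    # Tukey's rule: values more than 1.5 * IQR beyond the quartiles.
    outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
    return {
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().tolist(),      # may contain several values
        "std": s.std(),                 # sample standard deviation (ddof=1)
        "iqr": iqr,
        "range": s.max() - s.min(),
        "skewness": s.skew(),           # bias-corrected sample skewness
        "kurtosis": s.kurt(),           # excess kurtosis (normal = 0)
        "outlier_count": len(outliers),
    }


# Made-up values with one obvious extreme point.
values = pd.Series([10, 12, 12, 13, 14, 15, 16, 95])
summary_stats = describe_numeric(values)
print(summary_stats)
```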

4. Categorical Analysis

For each categorical column:

Unique values count
Top/bottom categories
Frequency distribution
Category balance assessment
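
A comparable sketch for a categorical column is shown below. The helper name is hypothetical, and the `balance_ratio` measure (rarest count divided by most common count) is one simple assumption about what a "balance assessment" could mean, not the skill's documented metric.

```python
import pandas as pd


def describe_categorical(series: pd.Series, top_n: int = 3) -> dict:
    """Summarize one categorical column (illustrative helper only)."""
    counts = series.value_counts()      # sorted from most to least frequent
    shares = counts / counts.sum()
    return {
        "unique_values": int(series.nunique()),
        "top": counts.head(top_n).to_dict(),
        "bottom": counts.tail(top_n).to_dict(),
        "frequency": shares.round(3).to_dict(),
        # 1.0 means every category is equally common; near 0 means imbalance.
        "balance_ratio": float(counts.min() / counts.max()),
    }


# Made-up data: one dominant region and one rare one.
regions = pd.Series(["north"] * 6 + ["south"] * 3 + ["east"] * 1)
summary = describe_categorical(regions, top_n=2)
print(summary)
```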

5. Correlation Analysis

Correlation matrix with significance
Strongest relationships highlighted
Multicollinearity warnings
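
One way to pair each correlation with a significance test is `scipy.stats.pearsonr`, sketched below. The function name and the synthetic data are invented for this example. The default threshold and alpha simply reuse the values from the Configuration section, and the skill itself may use a different method.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def significant_correlations(df: pd.DataFrame, threshold: float = 0.5,
                             alpha: float = 0.05) -> list:
    """Return column pairs with |r| >= threshold and p < alpha, strongest first."""
    # Listwise deletion: drop any row with a missing numeric value.
    numeric = df.select_dtypes(include="number").dropna()
    found = []
    for a, b in combinations(numeric.columns, 2):
        r, p = stats.pearsonr(numeric[a], numeric[b])
        if abs(r) >= threshold and p < alpha:
            found.append({"pair": (a, b), "r": round(float(r), 3), "p_value": float(p)})
    return sorted(found, key=lambda row: abs(row["r"]), reverse=True)


# Synthetic data: new_customers depends on marketing_spend, noise does not.
rng = np.random.default_rng(0)
spend = rng.normal(100, 20, size=200)
df = pd.DataFrame({
    "marketing_spend": spend,
    "new_customers": 0.5 * spend + rng.normal(0, 5, size=200),
    "noise": rng.normal(0, 1, size=200),
})
strong_pairs = significant_correlations(df)
print(strong_pairs)
```

Testing many pairs at the same alpha inflates the chance of a false positive, so wide datasets may need a multiple-comparison correction.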

6. Time-Based Analysis

If a datetime column is detected:

Trend direction and strength
Seasonality patterns
Year-over-year comparisons
Growth rate calculations
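
Two of these items, trend direction and year-over-year growth, can be illustrated with a few lines of pandas and NumPy. The series below is made up, the least-squares slope is only one possible trend measure, and seasonality is not covered by this sketch.

```python
import numpy as np
import pandas as pd

# Made-up monthly revenue over two years, rising by 5 each month.
dates = pd.date_range("2023-01-01", periods=24, freq="MS")
revenue = pd.Series(np.linspace(100, 215, 24), index=dates)

# Year-over-year growth: compare each annual total with the previous year.
annual = revenue.groupby(revenue.index.year).sum()
yoy_growth = (annual / annual.shift(1) - 1).dropna()

# Trend: slope of a least-squares line fitted against the time step.
slope, intercept = np.polyfit(np.arange(len(revenue)), revenue.to_numpy(), deg=1)
trend = "upward" if slope > 0 else "downward" if slope < 0 else "flat"

print(f"trend: {trend}, slope per month: {slope:.2f}")
print(yoy_growth)
```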

7. Visualizations

Auto-generated charts saved to report:

Distribution plots
Trend charts
Comparison charts
Correlation heatmaps

8. Recommendations

Data-driven suggestions:

Columns needing attention
Potential data quality fixes
Analysis suggestions
Business implications

Chart Styles

# Available styles
styles = ['business', 'scientific', 'minimal', 'dark', 'colorful']

storyteller.generate_report(chart_style='business')

Configuration

storyteller = DataStoryteller(df)

# Configure analysis
storyteller.config.update({
    'max_categories': 20,       # Max categories to show
    'outlier_method': 'iqr',    # 'iqr', 'zscore', 'isolation'
    'correlation_threshold': 0.5,
    'significance_level': 0.05,
    'date_format': 'auto',      # Or specify like '%Y-%m-%d'
    'language': 'en',           # Narrative language
})
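
The `outlier_method` choice matters because the methods can disagree. The sketch below implements the usual textbook versions of the `'iqr'` and `'zscore'` rules with NumPy. It is an assumption about what those option names mean, not the skill's code, and `'isolation'` is left out.

```python
import numpy as np


def flag_outliers(values, method: str = "iqr", z_threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking outliers (illustrative helper only)."""
    x = np.asarray(values, dtype=float)
    if method == "iqr":
        q1, q3 = np.percentile(x, [25, 75])
        spread = q3 - q1
        return (x < q1 - 1.5 * spread) | (x > q3 + 1.5 * spread)
    if method == "zscore":
        z = (x - x.mean()) / x.std(ddof=1)
        return np.abs(z) > z_threshold
    raise ValueError(f"unsupported method: {method!r}")


values = [10, 12, 12, 13, 14, 15, 16, 95]
iqr_mask = flag_outliers(values, "iqr")
z_mask = flag_outliers(values, "zscore")
print("iqr flags:", iqr_mask.sum(), "| zscore flags:", z_mask.sum())
```

On this small sample the IQR rule flags 95 while the z-score rule flags nothing. With n points the largest possible sample z-score is (n - 1) / sqrt(n), about 2.47 for n = 8, so a threshold of 3 can never trigger on so few rows.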

Supported File Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detect delimiter |
| Excel | .xlsx, .xls | Multi-sheet support |
| JSON | .json | Records or columnar |
| Parquet | .parquet | For large datasets |
| TSV | .tsv | Tab-separated |
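
If you prefer to load the file yourself and pass a DataFrame to the storyteller, a dispatch on the file extension might look like the following. `load_table` and `READERS` are hypothetical names for this example; only the pandas reader functions are real.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Map each extension to a pandas reader. Excel and Parquet need optional
# engines (openpyxl for .xlsx, xlrd for .xls, pyarrow or fastparquet for Parquet).
READERS = {
    # sep=None with the python engine lets pandas sniff the delimiter.
    ".csv": lambda path: pd.read_csv(path, sep=None, engine="python"),
    ".tsv": lambda path: pd.read_csv(path, sep="\t"),
    # Reads the first sheet; sheet_name=None would return a dict of all sheets.
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def load_table(path) -> pd.DataFrame:
    """Load a file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file format: {suffix or '(none)'}")
    return READERS[suffix](path)


# Demo with a semicolon-delimited CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("region;revenue\nnorth;120\nsouth;98\n")
    loaded = load_table(sample)

print(loaded)
```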

Example Output

Sample Executive Summary

"This dataset contains 10,847 records across 15 columns, covering sales transactions from January 2023 to December 2024. Revenue shows a strong upward trend (+23% YoY) with clear seasonal peaks in Q4. The top 3 product categories account for 67% of total revenue. Notable finding: Customer acquisition cost has increased 15% while retention rate dropped 8%, suggesting potential profitability concerns worth investigating."

Sample Insight

"Strong correlation detected between marketing_spend and new_customers (r=0.78, p<0.001). However, this relationship weakens significantly after $50K monthly spend, suggesting diminishing returns beyond this threshold."

Best Practices

Clean data first: Remove obvious errors before analysis
Name columns clearly: Helps auto-detection and narratives
Include dates: Enables time-series analysis
Provide context: Tell the storyteller what the data represents

Limitations

Recommended maximum: 1M rows, 100 columns
Complex nested data may need flattening
Images/binary data not supported
PDF export requires the reportlab package
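
These limits can be checked before starting an analysis. The sketch below uses only the standard library. The function and constant names are hypothetical, and the thresholds are the recommended maximums stated above.

```python
import importlib.util

MAX_ROWS = 1_000_000    # recommended maximum from the list above
MAX_COLUMNS = 100


def preflight_warnings(n_rows: int, n_columns: int) -> list:
    """Return human-readable warnings for datasets beyond the recommended size."""
    warnings = []
    if n_rows > MAX_ROWS:
        warnings.append(f"{n_rows:,} rows exceeds the recommended {MAX_ROWS:,}")
    if n_columns > MAX_COLUMNS:
        warnings.append(f"{n_columns} columns exceeds the recommended {MAX_COLUMNS}")
    return warnings


# True only if reportlab is importable in the current environment.
pdf_export_available = importlib.util.find_spec("reportlab") is not None

print(preflight_warnings(2_000_000, 150))
print("PDF export available:", pdf_export_available)
```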

Dependencies

pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scipy>=1.10.0
reportlab>=4.0.0
openpyxl>=3.1.0
