Agent Skills: Data Storyteller

Transform CSV/Excel data into narrative reports with auto-generated insights, visualizations, and PDF export. Auto-detects patterns and creates plain-English summaries.

UncategorizedID: dkyazzentwatwa/chatgpt-skills/data-storyteller

Skill Files

Browse the full folder contents for data-storyteller.

Download Skill

Loading file tree…

data-storyteller/SKILL.md

Skill Metadata

Name
data-storyteller
Description
Transform CSV/Excel data into narrative reports with auto-generated insights, visualizations, and PDF export. Auto-detects patterns and creates plain-English summaries.

Data Storyteller

Automatically transform raw data into compelling, insight-rich reports. Upload any CSV or Excel file and get back a complete analysis with visualizations, statistical summaries, and narrative explanations - all without writing code.

Core Workflow

1. Load and Analyze Data

from scripts.data_storyteller import DataStoryteller

# Initialize with your data file
storyteller = DataStoryteller("your_data.csv")

# Or from a pandas DataFrame
import pandas as pd
df = pd.read_csv("your_data.csv")
storyteller = DataStoryteller(df)

2. Generate Full Report

# Generate comprehensive report
report = storyteller.generate_report()

# Access components
print(report['summary'])           # Executive summary
print(report['insights'])          # Key findings
print(report['statistics'])        # Statistical analysis
print(report['visualizations'])    # Generated chart info

3. Export Options

# Export to PDF
storyteller.export_pdf("analysis_report.pdf")

# Export to HTML (interactive charts)
storyteller.export_html("analysis_report.html")

# Export charts only
storyteller.export_charts("charts/", format="png")

Quick Start Examples

Basic Analysis

from scripts.data_storyteller import DataStoryteller

# One-liner full analysis
DataStoryteller("sales_data.csv").generate_report().export_pdf("report.pdf")

Custom Analysis

storyteller = DataStoryteller("data.csv")

# Focus on specific columns
storyteller.analyze_columns(['revenue', 'customers', 'date'])

# Set analysis parameters
report = storyteller.generate_report(
    include_correlations=True,
    include_outliers=True,
    include_trends=True,
    time_column='date',
    chart_style='business'
)

Features

Auto-Detection

  • Column Types: Numeric, categorical, datetime, text, boolean
  • Data Quality: Missing values, duplicates, outliers
  • Relationships: Correlations, dependencies, groupings
  • Time Series: Trends, seasonality, anomalies

Generated Visualizations

| Data Type | Charts Generated | |-----------|-----------------| | Numeric | Histogram, box plot, trend line | | Categorical | Bar chart, pie chart, frequency table | | Time Series | Line chart, decomposition, forecast | | Correlations | Heatmap, scatter matrix | | Comparisons | Grouped bar, stacked area |

Narrative Insights

The storyteller generates plain-English insights including:

  • Executive summary of key findings
  • Notable patterns and anomalies
  • Statistical significance notes
  • Actionable recommendations
  • Data quality warnings

Output Sections

1. Executive Summary

High-level overview of the dataset and key findings in 2-3 paragraphs.

2. Data Profile

  • Row/column counts
  • Memory usage
  • Missing value analysis
  • Duplicate detection
  • Data type distribution

3. Statistical Analysis

For each numeric column:

  • Central tendency (mean, median, mode)
  • Dispersion (std dev, IQR, range)
  • Distribution shape (skewness, kurtosis)
  • Outlier count

4. Categorical Analysis

For each categorical column:

  • Unique values count
  • Top/bottom categories
  • Frequency distribution
  • Category balance assessment

5. Correlation Analysis

  • Correlation matrix with significance
  • Strongest relationships highlighted
  • Multicollinearity warnings

6. Time-Based Analysis

If datetime column detected:

  • Trend direction and strength
  • Seasonality patterns
  • Year-over-year comparisons
  • Growth rate calculations

7. Visualizations

Auto-generated charts saved to report:

  • Distribution plots
  • Trend charts
  • Comparison charts
  • Correlation heatmaps

8. Recommendations

Data-driven suggestions:

  • Columns needing attention
  • Potential data quality fixes
  • Analysis suggestions
  • Business implications

Chart Styles

# Available styles
styles = ['business', 'scientific', 'minimal', 'dark', 'colorful']

storyteller.generate_report(chart_style='business')

Configuration

storyteller = DataStoryteller(df)

# Configure analysis
storyteller.config.update({
    'max_categories': 20,       # Max categories to show
    'outlier_method': 'iqr',    # 'iqr', 'zscore', 'isolation'
    'correlation_threshold': 0.5,
    'significance_level': 0.05,
    'date_format': 'auto',      # Or specify like '%Y-%m-%d'
    'language': 'en',           # Narrative language
})

Supported File Formats

| Format | Extension | Notes | |--------|-----------|-------| | CSV | .csv | Auto-detect delimiter | | Excel | .xlsx, .xls | Multi-sheet support | | JSON | .json | Records or columnar | | Parquet | .parquet | For large datasets | | TSV | .tsv | Tab-separated |

Example Output

Sample Executive Summary

"This dataset contains 10,847 records across 15 columns, covering sales transactions from January 2023 to December 2024. Revenue shows a strong upward trend (+23% YoY) with clear seasonal peaks in Q4. The top 3 product categories account for 67% of total revenue. Notable finding: Customer acquisition cost has increased 15% while retention rate dropped 8%, suggesting potential profitability concerns worth investigating."

Sample Insight

"Strong correlation detected between marketing_spend and new_customers (r=0.78, p<0.001). However, this relationship weakens significantly after $50K monthly spend, suggesting diminishing returns beyond this threshold."

Best Practices

  1. Clean data first: Remove obvious errors before analysis
  2. Name columns clearly: Helps auto-detection and narratives
  3. Include dates: Enables time-series analysis
  4. Provide context: Tell the storyteller what the data represents

Limitations

  • Maximum recommended: 1M rows, 100 columns
  • Complex nested data may need flattening
  • Images/binary data not supported
  • PDF export requires reportlab package

Dependencies

pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scipy>=1.10.0
reportlab>=4.0.0
openpyxl>=3.1.0