Agent Skills: Jupytext Skill

This skill should be used when the user asks to "convert notebook to text", "use jupytext", "version control notebooks", "share data between kernels", "set up multi-kernel project", "pair notebooks with Python files", "sync ipynb and py files", or needs multi-kernel projects (Python/R/Stata/SAS) with version-control-friendly notebooks.

ID: edwinhu/workflows/jupytext

Install this agent skill locally:

pnpm dlx add-skill https://github.com/edwinhu/workflows/tree/HEAD/skills/jupytext

Skill Files

skills/jupytext/SKILL.md

Skill Metadata

Name
jupytext
Description
Use when working with jupytext — converting notebooks to/from text formats, syncing paired .ipynb/.py files, multi-kernel projects (Python/R/Stata/SAS), or executing notebooks via papermill.

Contents

Jupytext Skill

Jupytext converts Jupyter notebooks to/from text formats (.py, .R, .md), enabling version control and multi-kernel workflows.

Execution Enforcement

IRON LAW: NO EXECUTION CLAIM WITHOUT OUTPUT VERIFICATION

Before claiming ANY jupytext script executed successfully, follow this sequence:

  1. EXECUTE using the papermill pipeline: jupytext --to notebook --output - script.py | papermill - output.ipynb
  2. CHECK for execution errors (papermill exit code and stderr)
  3. VERIFY output.ipynb exists and is non-empty
  4. INSPECT outputs using notebook-debug skill verification
  5. CLAIM success only after verification passes

This is non-negotiable. Skipping papermill execution is NOT HELPFUL — the user gets a notebook that fails on first run.

Rationalization Table - STOP If You Think:

| Excuse | Reality | Do Instead |
|--------|---------|------------|
| "I converted to ipynb, so it works" | Conversion ≠ execution | EXECUTE with papermill, not just convert |
| "The .py file looks correct" | Syntax correctness ≠ runtime correctness | RUN and CHECK outputs |
| "I'll let the user execute it" | You're passing broken code | VERIFY before claiming completion |
| "Just a conversion task, no execution needed" | User expects working notebook | EXECUTE to confirm it works |
| "I can use jupyter nbconvert --execute" | Papermill has better error handling | USE the recommended papermill pipeline |
| "I'll save the intermediate ipynb first" | Creates clutter | USE the recommended pipeline (no intermediate files) |
| "Exit code 0 means success" | Papermill can succeed with errors in cells | CHECK output.ipynb for tracebacks |

Red Flags - STOP Immediately If You Think:

  • "Let me just convert and return the ipynb" → NO. EXECUTE with papermill first.
  • "The .py file is simple, can't have errors" → NO. Simple code fails too.
  • "I'll execute without papermill" → NO. Use the recommended pipeline.
  • "Conversion completed, so job done" → NO. Execution verification required.

Execution Verification Checklist

Before EVERY "notebook works" claim:

Conversion:

  • [ ] Correct format specified (py:percent recommended)
  • [ ] Conversion command succeeded
  • [ ] No syntax errors in conversion

Execution (MANDATORY):

  • [ ] Used recommended papermill pipeline: jupytext --to notebook --output - script.py | papermill - output.ipynb
  • [ ] Papermill exit code is 0
  • [ ] No errors in stderr
  • [ ] output.ipynb file created
  • [ ] output.ipynb is non-empty (>100 bytes)

Output Verification:

  • [ ] Used notebook-debug skill's verification checklist
  • [ ] No tracebacks in any cell
  • [ ] All cells have execution_count (not null)
  • [ ] Expected outputs present (plots, dataframes, metrics)
  • [ ] No unexpected warnings or errors
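The cell-level checks above (tracebacks, null execution counts) can be sketched as a small scanner over the executed notebook's JSON. The function name is illustrative; an .ipynb file is plain JSON, so it can be loaded with `json.load()`.

```python
def find_failed_cells(notebook: dict) -> list:
    """Return indices of code cells that either raised (an 'error' output)
    or never ran (execution_count is None) in an executed .ipynb dict."""
    failed = []
    for i, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        has_error = any(out.get("output_type") == "error"
                        for out in cell.get("outputs", []))
        if has_error or cell.get("execution_count") is None:
            failed.append(i)
    return failed
```

An empty result from `find_failed_cells(json.load(open("output.ipynb")))` is the precondition for checking the remaining items (expected plots, dataframes, metrics), which require human or skill-level judgment.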

Multi-Kernel Projects (if applicable):

  • [ ] Correct kernel specified in header
  • [ ] Interchange files created (parquet/DTA)
  • [ ] Downstream notebooks can read interchange files

Only after ALL checks pass:

  • [ ] Claim "notebook executed successfully"

Gate Function: Jupytext Execution

Follow this sequence for EVERY jupytext task involving execution:

1. CONVERT  → jupytext --to notebook --output -
2. EXECUTE  → papermill - output.ipynb (with params if needed)
3. CHECK    → Verify exit code and stderr
4. INSPECT  → Use notebook-debug verification
5. VERIFY   → Outputs match expectations
6. CLAIM    → "Notebook works" only after all gates passed

NEVER skip execution gate. Converting without executing proves nothing about correctness.

Drive-Aligned Framing

Skipping papermill execution is NOT HELPFUL — the user gets a notebook that looks correct but fails when they run it.

This is not just format conversion - verify that the notebook executes correctly. The user expects a working notebook, not just syntactically valid code.

Core Concepts

Percent Format (Recommended)

Use percent format (py:percent) for all projects:

# %% [markdown]
# # Analysis Title

# %%
import pandas as pd
df = pd.read_csv("data.csv")

# %% tags=["parameters"]
input_file = "data.csv"

Cell markers: # %% for code, # %% [markdown] for markdown.

Markdown dollar signs: always wrap `$` in backticks to prevent LaTeX rendering. Write `` # Cost: `$50` `` rather than `# Cost: $50`.

Project Configuration

Create jupytext.toml in project root:

formats = "ipynb,py:percent"
notebook_metadata_filter = "-all"
cell_metadata_filter = "-all"

Essential Commands

# Convert notebook to percent-format Python file
jupytext --to py:percent notebook.ipynb

# Convert Python script to Jupyter notebook format
jupytext --to notebook script.py

# Enable bidirectional pairing to keep formats synchronized
jupytext --set-formats ipynb,py:percent notebook.ipynb

# Synchronize paired notebook and text file
jupytext --sync notebook.ipynb

Execution (Recommended Pattern)

Always pipe to papermill for execution - no intermediate files:

# Convert script to notebook and execute in atomic operation
jupytext --to notebook --output - script.py | papermill - output.ipynb

# Convert and execute with parameter injection
jupytext --to notebook --output - script.py | papermill - output.ipynb -p start_date "2024-01-01" -p n_samples 1000

# Convert and execute with detailed logging output
jupytext --to notebook --output - script.py | papermill - output.ipynb --log-output

# Convert and execute in memory without saving intermediate files
jupytext --to notebook --output - script.py | papermill - -

Key flags:

  • --output - tells jupytext to write to stdout
  • papermill - output.ipynb reads from stdin, writes to file
  • papermill - - reads from stdin, writes to stdout (for inspection)

Why this pattern:

  1. No intermediate .ipynb files cluttering the workspace
  2. Single atomic operation - convert and execute together
  3. Papermill handles parameters, logging, and error reporting
  4. Works in CI/CD pipelines without temp file cleanup

Debugging Runtime Errors

After execution, use notebook-debug skill to inspect tracebacks in the output ipynb.

Multi-Kernel Data Sharing

Share data between Python/R/Stata/SAS via files:

| Route | Format | Write | Read |
|-------|--------|-------|------|
| Python -> R | Parquet | df.to_parquet() | arrow::read_parquet() |
| Python -> Stata | DTA | df.to_stata() | use "file.dta" |
| Any -> Any | CSV | Native | Native |
| SQL queries | DuckDB | Query parquet directly | Query parquet directly |

Cross-Kernel Pipeline Pattern

Python (prep) -> Parquet -> R (stats) -> Parquet -> Python (report)
                    |
                    v
               Stata (.dta) -> Econometrics

Workflow Integration

Git Pre-commit Hook

Add the following to .pre-commit-config.yaml:

repos:
  - repo: https://github.com/mwouts/jupytext
    rev: v1.16.0
    hooks:
      - id: jupytext
        args: [--sync]  # Synchronize paired formats before commit

Version Control Strategy

Choose one approach:

  • Option A: Commit only .py files (add *.ipynb to .gitignore) for minimal repository size
  • Option B: Commit both formats to give reviewers format choice

Editor Integration

Configure editors for automatic synchronization:

  • VS Code: Install Jupytext extension for automatic bidirectional sync
  • JupyterLab: Right-click notebook and select "Pair Notebook" for synchronization

Project Structure

Standard multi-kernel project layout:

project/
├── jupytext.toml          # Project-wide settings
├── environment.yml        # Conda env with all kernels
├── notebooks/
│   ├── 01_python_prep.py  # Python percent format
│   ├── 02_r_analysis.R    # R percent format
│   └── 03_stata_models.do # Stata script
├── data/
│   ├── raw/
│   └── processed/         # Parquet/DTA interchange files
└── results/

Kernel Specification

Specify kernel in file header:

# ---
# jupyter:
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %% [markdown]
# # Python Analysis

Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| Sync conflict | Delete .ipynb, regenerate from .py |
| Wrong kernel | Add kernelspec header to .py file |
| Metadata noise | Set notebook_metadata_filter = "-all" |
| Cell order lost | Use percent format (preserves structure) |

Additional Resources

Reference Files

Detailed patterns and configurations:

  • references/formats.md - All format specifications (percent, light, sphinx, myst, rmd, quarto), cell metadata, configuration options
  • references/kernels.md - Kernel setup (IRkernel, xeus-r, stata_kernel, pystata, saspy), environment configuration, troubleshooting
  • references/data-sharing.md - Cross-kernel data sharing patterns (parquet, dta, csv, duckdb), full pipeline examples, validation patterns

Example Files

Working code in examples/:

  • examples/python_analysis.py - Python percent-format template with common patterns
  • examples/r_analysis.R - R percent-format template for statistical analysis
  • examples/cross_kernel_pipeline.py - Multi-kernel data sharing example

Scripts

Utility scripts in scripts/:

  • scripts/init_project.sh - Initialize jupytext project with standard structure
  • scripts/sync_all.sh - Sync all paired notebooks in project

Best Practices

  1. Use percent format - Best balance of readability and cell preservation
  2. Strip metadata for git - Use metadata filters for cleaner diffs
  3. Use parquet for interchange - Type-safe, cross-language compatible format
  4. Document kernel requirements - Include in README or environment.yml
  5. Enable pre-commit hooks - Ensure synchronization before commits