Spreadsheets
Opinionated tabular-data handling for macOS. TSV/CSV is the primary format; .xlsx is the exception. Stack: qsv for fast validation/profiling, duckdb for SQL, uv-run Python (stdlib csv + Decimal) for precision transforms, qsv excel/DuckDB/fastexcel for values-only Excel reads, XlsxWriter for new workbooks, openpyxl for existing workbook mechanics, and headless LibreOffice for formula recalculation. Run all Python through uv — never bare python or pip.
Tool Selection
scripts/peek.py, scripts/profile.py, and scripts/recalc.py are bundled with this skill. Resolve them relative to this SKILL.md's directory, not the user's current project. They are not supposed to exist in the target repo.
| Job | Use |
| ------------------------------------------------------------ | --------------------------------------------------- |
| First look or structural validation of CSV/TSV | uv run scripts/peek.py <file> [--strict] |
| Quality profile of CSV/TSV | uv run scripts/profile.py <file> [--markdown] |
| Counts, stats, frequencies, dedupe, column select | qsv (use --cache-threshold 0 for stats) |
| Joins, group-bys, pivots, cross-file SQL, format conversion | duckdb -c "..." |
| Row-level transforms, precision-critical edits | uv run Python with a PEP 723 header |
| Anything .xlsx in or out | read references/xlsx.md first |
| Recalculating .xlsx formulas | uv run scripts/recalc.py <file.xlsx> |
| Row/column-aware diff of two tables | bunx daff old.tsv new.tsv |
| Interactive viewing (suggest to the user; never launch TUIs) | csvlens, vd, Numbers.app |
Read references/recipes.md for common exact-decimal transforms, idempotent appends, schema validation, keyed diffs, and safe workbook output patterns.
Hard Rules
- Decimals, never floats. Crypto amounts carry up to 18 decimals — beyond float64. Keep amounts as strings end to end; compute with
decimal.Decimalor DuckDBDECIMAL(38, 18). Read withall_varchar = truein DuckDB and plain stdlibcsvin Python. If pandas is unavoidable, passdtype=str. - Touch only what was asked. No reordering, re-quoting, renumbering, or whitespace "tidying" outside the requested change. The diff must contain the change and nothing else.
- House format for authored files: TSV; UTF-8 without BOM; LF line endings; single trailing newline; lowercase
snake_caseheaders; ISO 8601 dates (YYYY-MM-DD; prb-finance timestamps useYYYY-MM-DD@HH:MM:SS);.decimal point; no thousands separators or currency symbols inside cells;-for null. Conventions already present in an existing file override every one of these. - Strip BOMs on read, never write them. Open files of unknown provenance with
encoding="utf-8-sig". - Validate after editing. Before no-shape-change edits, save a
peek.pyreport; after the edit, runpeek.py --strict --expect-like <before-report>. For intentional row/schema changes, use--strict --expect-columns <n>instead. Add--housefor authored TSVs that should follow this skill's house format. Do not skip this becausepeek.pyis absent from the target repo; the script lives next to thisSKILL.md. In prb-finance:just tsv-check, thenjust cli::write-changedto regenerate derived reports — never hand-edit generated.pool.tsv/.annual.tsv/.mdartifacts. - In-place edits are atomic. Write to a temp file next to the target, verify it, then
mvover the original. - Finance data stays local. Treat transaction logs and bank/exchange exports as private tax records: never send their contents to web services or external APIs.
- Escape spreadsheet formula injection when writing cells sourced from external data: prefix a leading
=,+, or@with'(a bare-null is exempt).
Inspect
peek.py is at scripts/peek.py inside this skill directory, beside this SKILL.md. In normal installed-skill use, that means:
~/.agents/skills/spreadsheets/scripts/peek.py
It is not a project-local helper and does not need to be installed in the repo being edited. If your current directory is the skill directory, run:
uv run scripts/peek.py <file> [--rows N] [--strict] [--house] [--expect-like before.peek.json]
If your current directory is the target project, run it by absolute path:
uv run ~/.agents/skills/spreadsheets/scripts/peek.py <file> [--rows N] [--strict] [--house] [--expect-like before.peek.json]
The report includes status, issues, encoding and BOM, newline style and trailing newline, delimiter and how it was detected, header with duplicates flagged, column/row counts, ragged and empty rows, - null usage, qsv validation metadata when available, and sample rows. Default inspection exits 0 for parseable delimited files. Validation flags exit 1 when they find issues. Operational errors such as missing files, binary spreadsheets, or empty files exit 2.
Fast validation loop:
# Before edits that should preserve shape and format
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv > txs.before.peek.json
# After the edit
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --strict --expect-like txs.before.peek.json
# If rows or headers intentionally changed, lock only the resulting width
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --strict --expect-columns 12
# If the file is private and you need to show the report, hide sample values
uv run ~/.agents/skills/spreadsheets/scripts/peek.py txs.tsv --redact-samples
--expect-like catches drift in column count, header, delimiter, encoding, newline style, trailing newline, and data row count; it also fails on newly introduced ragged, empty, duplicate-header, BOM, or qsv validation problems. --engine auto uses a fast parser for simple unquoted delimited files and falls back to stdlib csv; use --engine python if you need to force the conservative path. On a binary spreadsheet, peek.py exits with a pointer to the xlsx workflow.
Full local quality profile:
uv run ~/.agents/skills/spreadsheets/scripts/profile.py txs.tsv --markdown --redact-samples
The profile combines peek.py, qsv stats/frequencies without sidecar caches, header safety, formula-injection detection, and next-step recommendations. Use JSON output by default for machine reading; use --markdown for a concise human report.
Quick follow-ups with qsv:
qsv count txs.tsv # row count (excludes header)
qsv headers txs.tsv # numbered column names
qsv stats --cache-threshold 0 -E txs.tsv | qsv table # per-column types/ranges/cardinality without sidecars
qsv frequency -s event txs.tsv # value distribution of one column
qsv select date_utc,amount txs.tsv
qsv dedup txs.tsv
qsv infers the input delimiter from the file extension, but stdout is always comma-separated. When the result must stay TSV, write it with -o out.tsv (the output extension sets the delimiter) — never shell redirection. qsv stats creates sidecar caches by default when runs are slow; use --cache-threshold 0 unless the user explicitly wants reusable qsv caches or temporary indexes for very large files.
Query with DuckDB
Canonical read — everything as strings, - mapped to NULL:
FROM read_csv('txs.tsv', delim = '\t', header = true, all_varchar = true, nullstr = '-');
-- Profile every column
SUMMARIZE SELECT * FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-');
-- Aggregate with exact decimals
SELECT event, SUM(amount::DECIMAL(38, 18)) AS total
FROM read_csv('txs.tsv', delim = '\t', all_varchar = true, nullstr = '-')
GROUP BY event
ORDER BY total DESC;
-- Write a TSV back out
COPY (SELECT ...) TO 'out.tsv' (FORMAT csv, DELIMITER '\t', HEADER true, NULLSTR '-');
DuckDB also reads and writes .xlsx (read_xlsx, COPY ... (FORMAT xlsx)) — see references/xlsx.md.
Transform with uv-run Python
Stdlib csv keeps every cell a string — precision-safe by default. Script template:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = []
# ///
import csv
from decimal import Decimal
with open("in.tsv", encoding="utf-8-sig", newline="") as f:
rows = list(csv.DictReader(f, delimiter="\t"))
# transform here; use Decimal(row["amount"]) for arithmetic
with open("out.tsv", "w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=rows[0].keys(), delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
- Pass
newline=""to everyopen()thecsvmodule touches, andlineterminator="\n"for LF output. - Third-party deps go in the PEP 723 block; for one-liners use
uv run --with <pkg> python -c "...". - For idempotent backfills, dedupe by multiset difference — count existing identical rows and append only the surplus, because identical rows can be legitimate (e.g. batch payouts).
Excel (.xlsx)
Read references/xlsx.md whenever a .xlsx/.xlsm is input or deliverable: openpyxl create/edit, DuckDB xlsx I/O, styling, conversion recipes, and the recalculation loop. The two absolutes:
- Write real formulas (
=SUM(B2:B9)), not values precomputed in Python. - After writing any formula, run
uv run scripts/recalc.py <file.xlsx>and deliver only when it exits0with"status": "success". Formula errors and incomplete cached values exit nonzero by default; use--softonly when you deliberately want a report without failing automation.
Recalculation needs LibreOffice: brew install --cask libreoffice. The script finds the app bundle on its own; soffice does not need to be on PATH.