Soda CLI Guide
The Soda CLI (sodacli) manages data quality from the command line. All commands follow sodacli <resource> <action>.
Important: always use --output json when running commands
When calling sodacli from an agent, always pass --output json to get structured, parseable output. Without it, the CLI outputs human-readable tables that are harder to parse.
Quick Reference — What Works
Fully working (tested against live API):
auth— login, logout, status, switch profilesdatasource— list, get, create, update, delete, onboard (full wizard), test-connection, diagnosticsdataset— list (with filters), get, update, profiling, diagnostics, permissions, onboardcontract— list, push, pull, diff, lint (JSON schema validation), create (skeleton/copilot), verify (cloud or local via soda-core)monitor— list, config, add (column/custom), update, deleteresults— list (with all filters, sorting, date ranges)job— status, logsrunner— list, get, create (returns API keys for Kubernetes deployment), deleteiam— user list, user invite, group CRUD, role listsecret— list, get, create, update, delete (client-side encrypted with AES-256-GCM + RSA-OAEP)
Not yet available (API blocked):
incident— list, get, update (documented in OpenAPI spec but still returns HTML on dev)dataset attributes— (documented in OpenAPI spec but still returns HTML on dev)notification— rules and integrationsjob list,job cancelcontract proposal— list, pull, push, closemonitor add --type dataset(dataset monitors exist by default, no write endpoint)
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success / all checks passed |
| 1 | One or more checks failed |
| 2 | Execution error |
| 3 | Authentication error — run sodacli auth login |
Authentication
# Non-interactive (for agents and CI/CD)
sodacli auth login --host cloud.soda.io --api-key-id <id> --api-key-secret <secret>
# Check connection
sodacli auth status
# Named profiles
sodacli auth login --profile prod --host cloud.soda.io --api-key-id <id> --api-key-secret <secret>
sodacli auth switch prod
sodacli dataset list --profile prod # any command accepts --profile
Credentials stored in ~/.soda/credentials. Generate API keys at https://docs.soda.io/reference/generate-api-keys
Core Workflows
Discover what's there
# List datasources
sodacli datasource list --output json
# Get datasource details
sodacli datasource get <id> --output json
# List datasets (default limit: 10)
sodacli dataset list --output json
sodacli dataset list --datasource <name> --status onboarded --limit 50 --output json
# Filter datasets by name, date range, tag
sodacli dataset list --filter "orders" --from 2026-01-01 --until 2026-12-31 --output json
# Get dataset details
sodacli dataset get <dataset-id> --output json
# View profiling data (column stats, row count, missing %)
sodacli dataset profiling <dataset-id> --output json
Onboard a datasource (end-to-end)
# Full onboard: create datasource + discover + monitoring + profiling + contracts + verify
sodacli datasource onboard warehouse.yml --monitoring --profiling --contracts skeleton
# Or step by step:
sodacli datasource create warehouse.yml
# ⚠ Discovery is async — datasets won't appear immediately.
# Use `sodacli job logs <scan-id>` (printed by create) to confirm completion,
# or wait ~10-15 seconds before listing datasets.
sodacli datasource onboard <datasource-id> --monitoring --profiling --contracts none
Onboarding a single dataset from a new datasource:
dataset onboard only works after datasource onboard has run at least once. If you only need one table, you must still run datasource onboard first (which onboards all discovered datasets), then manage individual datasets afterwards.
# This will NOT work right after `datasource create`, even if the dataset appears in `dataset list`:
# sodacli dataset onboard <dataset-id> ← returns "dataset not found"
# Instead, onboard the datasource first, then work with individual datasets:
sodacli datasource onboard <datasource-id> --monitoring --profiling --contracts copilot
# Now `dataset onboard`, `dataset profiling`, `monitor add`, etc. work on individual datasets.
Contract generation modes:
skeleton— schema structure + not-null checks only (fast, deterministic)copilot— AI analyzes your data and generates format, range, uniqueness, and business-logic checks (takes longer, may need manual fixes — see Troubleshooting below)
When --contracts skeleton or --contracts copilot is used, the onboard flow automatically verifies the generated contracts against your data and displays pass/fail results.
Datasource config file format:
type: snowflake # snowflake, postgres, bigquery, mysql, redshift, etc.
name: my_warehouse
connection:
host: account.snowflakecomputing.com
database: ANALYTICS
schema: PUBLIC
user: soda_user
password: secret
role: SODA_ROLE
warehouse: COMPUTE_WH
Contracts (data quality checks)
# List contracts
sodacli contract list --output json
# Generate a contract from live schema
sodacli contract create --dataset datasource/db/schema/table --mode skeleton --output my_table.yml
# Pull existing contract from cloud
sodacli contract pull datasource/db/schema/table
# Edit locally, then push back
sodacli contract push my_table.yml
# Compare local vs cloud
sodacli contract diff my_table.yml
# Validate syntax (offline, no auth needed)
sodacli contract lint my_table.yml
sodacli contract lint contracts/*.yml # glob support
# Run checks via cloud Runner (local file)
sodacli contract verify my_table.yml --output json
# Run checks via cloud Runner using dataset DQN — no local file needed
sodacli contract verify datasource/db/schema/table --output json
# Run checks locally via soda-core (no cloud auth needed)
sodacli contract verify my_table.yml --local --datasource datasource.yml
# Fire and forget (returns immediately, cloud mode only)
sodacli contract verify my_table.yml --no-wait
Monitors (ML anomaly detection)
# List monitors on a dataset (--dataset is required)
sodacli monitor list --dataset <dataset-id> --output json
# View monitoring config
sodacli monitor config <dataset-id> --output json
# Enable monitoring with schedule
sodacli monitor config <dataset-id> --enable --schedule "0 6 * * *" --timezone "UTC"
# Add column monitor
sodacli monitor add --dataset <id> --type column --column revenue --metric avg
sodacli monitor add --dataset <id> --type column --column order_id --metric count --group-by region
# Group-by with excluded values (skip specific partitions)
sodacli monitor add --dataset <id> --type column --column amount --metric avg \
--group-by region --exclude-values region=EU,APAC
# Add custom SQL monitor
sodacli monitor add --dataset <id> --type custom \
--name "dup check" \
--sql "SELECT count(*) as c FROM t GROUP BY id HAVING count(*) > 1" \
--result-metric c
# Update a monitor (enable/disable, change SQL)
sodacli monitor update <monitor-id> --dataset <id> --disable
sodacli monitor update <monitor-id> --dataset <id> --name "renamed" --sql "SELECT 1 as c"
# Delete
sodacli monitor delete <monitor-id> --dataset <id>
Column metric types: count, missing-pct, duplicate-pct, distinct-count, min, max, avg, sum, std-dev, variance, q1, median, q3, min-length, max-length, avg-length, freshness
Results (check outcomes)
# Recent results
sodacli results list --output json
# Filter by dataset, status, date
sodacli results list --dataset <id> --status failing --output json
sodacli results list --dataset-name "orders" --from 2026-03-01 --until 2026-03-31 --output json
# Sort and paginate
sodacli results list --limit 50 --sort name --order asc --output json
Permissions
# List roles and users (to find IDs)
sodacli iam role list --output json
sodacli iam user list --output json
# List dataset permissions
sodacli dataset permissions list <dataset-id> --output json
# Grant / revoke
sodacli dataset permissions assign <dataset-id> --role <role-id> --user <user-id>
sodacli dataset permissions revoke <dataset-id> --role <role-id> --user <user-id>
Groups
sodacli iam group list --output json
sodacli iam group create --name "Data Engineers" --member alice@co.com --member bob@co.com
sodacli iam group update <id> --add-member carol@co.com
sodacli iam group update <id> --remove-member bob@co.com
sodacli iam group delete <id>
Jobs (scan status & logs)
# Check scan/job status (shows state, timing, check summary)
sodacli job status <scan-id> --output json
# View logs
sodacli job logs <scan-id>
sodacli job logs <scan-id> --follow # stream live
Secrets (encrypted credentials)
# List secrets
sodacli secret list --output json
# Create — value is encrypted client-side (AES-256-GCM + RSA-OAEP) before sending
sodacli secret create --name DB_PASSWORD # masked interactive prompt
sodacli secret create --name DB_PASSWORD --value "s3cret" # via flag
echo "s3cret" | sodacli secret create --name DB_PASSWORD # via stdin pipe
# Update value
sodacli secret update <id> # masked prompt
sodacli secret update <id> --value "new-value" # via flag
# Delete
sodacli secret delete <id>
# Reference in datasource configs: ${secret.DB_PASSWORD}
User invite
sodacli iam user invite --email alice@co.com --email bob@co.com # up to 10 per call
Datasource connection & diagnostics
# Test a datasource connection (async via Runner)
sodacli datasource test-connection config.yml
# Update datasource label, runner, or connection
sodacli datasource update <id> --label "Production DW"
# View diagnostics warehouse config
sodacli datasource diagnostics <id>
# Configure diagnostics warehouse
sodacli datasource diagnostics <id> --enable --warehouse same --collect-results --collect-failed-rows
sodacli datasource diagnostics <id> --max-failed-rows 5000 --expose-failed-rows-query
sodacli datasource diagnostics <id> --table-template "soda_{dataset_name}"
sodacli datasource diagnostics <id> --failed-rows-cta --failed-rows-cta-title "View in Snowflake" --failed-rows-cta-url "https://app.snowflake.com"
Profiling and dataset diagnostics
# Enable profiling
sodacli dataset profiling <dataset-id> --enable --schedule "0 6 * * *" --sampling-rows 1000000
# View profiling data
sodacli dataset profiling <dataset-id> --output json
# Set time-partition column
sodacli dataset time-partition <dataset-id> --column created_at
# Configure dataset-level diagnostics overrides
sodacli dataset diagnostics <dataset-id> --collect-results --collect-failed-rows
CI/CD Pattern
# Option 1: Cloud mode (needs auth)
sodacli auth login --host cloud.soda.io \
--api-key-id "$SODA_API_KEY_ID" \
--api-key-secret "$SODA_API_KEY_SECRET" \
--no-interactive
sodacli contract lint contracts/*.yml # validate syntax first
sodacli contract verify contracts/orders.yml --no-interactive --output json # verify via cloud Runner
# Option 2: Local mode (needs soda-core on PATH, no cloud auth)
sodacli contract lint contracts/*.yml
sodacli contract verify contracts/orders.yml --local --datasource datasource.yml
# Exit codes drive pipeline:
# 0 = all checks passed
# 1 = checks failed → fail the pipeline
# 2 = execution error → retry or alert
# 3 = auth error → check credentials
Troubleshooting
Contract verification fails with exit code 2
When contract verify returns exit code 2, the output may only show "0 checks passed" without details. Always check the scan logs for the actual error:
sodacli job logs <scan-id>
The scan ID is printed by the contract verify command.
Copilot contracts: UUID regex checks fail on Postgres
Copilot may generate valid_format regex checks on UUID columns (e.g., '^[0-9a-fA-F]{8}-...'). On Postgres, the ~ regex operator doesn't work on uuid type columns, causing a SQL error. Fix: Remove invalid / valid_format checks on UUID columns — the database data type already enforces UUID format. Keep missing and duplicate checks, which work fine.
Discovery scan not complete
After datasource create, the discovery scan runs asynchronously. If dataset list returns empty or dataset onboard returns "not found", the scan may still be running. Check with:
sodacli job logs <scan-id> # scan-id is printed by datasource create
Global Flags (all commands)
| Flag | Short | Description |
|------|-------|-------------|
| --output table\|json\|csv | -o | Output format (auto-detects TTY vs pipe) |
| --profile <name> | | Override active auth profile |
| --no-color | | Disable color output |
| --quiet | -q | Suppress non-essential output |
| --verbose | -v | Show detailed output |
| --no-interactive | | Never prompt; fail with clear error if input missing |
Full Command Reference
For detailed flags and usage of every command, see command-reference.md.