Database Analytics Skill
Query databases for data analysis using command-line clients, e.g.:
- PostgreSQL:
psql - MySQL:
mysql - ClickHouse:
clickhouse-client - BigQuery:
bq - Snowflake:
snowsqlor any toher database.
Connection
Connect using environment variables:
| Variable | Description | Example |
|---|---|---|
| DB_HOST | Database hostname | localhost |
| DB_PORT | Port number | 5432 |
| DB_DATABASE | Database name | analytics |
| DB_USER | Username | analyst |
| DB_PASSWORD | Password | (set in env) |
Verify connectivity before running queries:
psql -c "SELECT 1"
Prepare metadata
Explore database structure before writing any queries.
- Explore which datasets there are
- Explore which tables there are in db
- Using db-specific queries fetch the schema of a table
- If the table is small enough for each column fetch 10 most common disctinct values. If the table is too large, use sampling specific to db.
SQL Best Practices
- Use CTEs over subqueries
- Select specific columns, never SELECT *
- ORDER BY <date> DESC
- Always add
LIMIT 10(or less) when exploring data. Only remove the limit once you understand the result set size. Otherwise prefer outputing to the file in csv, parquet, json, or other formats - Use descriptive aliases, never one-symbol short abbreviations
- Before building complex analytics, verify assumptions doing MIN/MAX/COUNT aggregations for columns
- EXPLAIN ANALYZE for expensive queries
- Default time conditions When the user does not specify a time range, use sensible defaults:
| Query type | Default range | |---|---| | Trend / time-series | Last 30 days | | Snapshot / current state | Last 7 days | | Comparison (MoM, WoW) | Current vs previous period | | "Recent" | Last 7 days |
Use CURRENT_DATE - INTERVAL '...' for relative date filters
- Data freshness. Analytical databases may have ETL lag. If results look unexpectedly empty for recent dates - exclude it and report to the user
Error resolution
When a query returns zero rows, too many nulls, all-zeros columns
- Check column range - verify with
min/maxthat data exists for the period - Relax filters - remove one filter at a time to find which condition excludes all rows
- COUNT confirmation - run a simple
count(*)with each filter added incrementally - Explain to the user - state clearly that no data matched and why
- Never fabricate data - do not invent rows, approximate, or guess values
Output Guidelines
- Markdown tables for small results
- Shortened numbers
Use human-readable formats for large numbers:
- 1,234,567 -> 1.23M
- 45,678 -> 45.7K
- 0.1234 -> 12.3%
- Chart recommendations
- Time series -> line chart
- Category comparison -> bar chart
- Proportions -> pie chart or stacked bar
- Distribution -> histogram
- When the user asks something that resembles common known metrics
- Suggest the common definition of that metric
- Shortly explain how it is different from what user asks, PRO/CONs
Dashboards formats
- ASCII charts directly in the output
- png images using python
- interactive javascript charts