BigQuery Data Export
Use this skill when exporting data from BigQuery to Cloud Storage or local files.
Basic Extract Command
bq extract \
--location=LOCATION \
--destination_format=FORMAT \
--compression=COMPRESSION \
PROJECT:DATASET.TABLE \
gs://bucket/file.ext
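For example, a concrete invocation (the project, dataset, and bucket names below are illustrative):
bq extract \
--location=US \
--destination_format=CSV \
--compression=GZIP \
my-project:sales.orders \
gs://my-export-bucket/orders.csv.gz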
Extract to CSV
Basic CSV Export
bq extract \
--destination_format=CSV \
--print_header=true \
dataset.table \
gs://bucket/export.csv
CSV with Options
bq extract \
--destination_format=CSV \
--compression=GZIP \
--field_delimiter=',' \
--print_header=true \
dataset.table \
gs://bucket/export.csv.gz
CSV flags:
- --field_delimiter=',' - Column separator
- --print_header=true/false - Include header row
Tab-Delimited Export
bq extract \
--destination_format=CSV \
--field_delimiter=$'\t' \
dataset.table \
gs://bucket/export.tsv
Extract to JSON
Newline-Delimited JSON
bq extract \
--destination_format=NEWLINE_DELIMITED_JSON \
--compression=GZIP \
dataset.table \
gs://bucket/export.json.gz
Output format:
{"id": 1, "name": "Alice", "amount": 100.50}
{"id": 2, "name": "Bob", "amount": 250.75}
Extract to Avro
bq extract \
--destination_format=AVRO \
--compression=SNAPPY \
dataset.table \
gs://bucket/export.avro
Benefits:
- Preserves schema
- Efficient binary format
- Fast re-import to BigQuery
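Because Avro files embed their own schema, they can be loaded back without supplying a schema definition. A minimal sketch (the table and bucket names are illustrative):
bq load \
--source_format=AVRO \
dataset.restored_table \
gs://bucket/export.avro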
Extract to Parquet
bq extract \
--destination_format=PARQUET \
dataset.table \
gs://bucket/export.parquet
Benefits:
- Columnar format
- Good compression
- Compatible with many analytics tools
Compression Options
Available compression:
- GZIP - Good compression, slower (CSV, JSON, Parquet)
- SNAPPY - Fast, moderate compression (Avro, Parquet)
- DEFLATE - Similar to GZIP (Avro)
- NONE - No compression (fastest)
Example:
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table \
gs://bucket/export.csv.gz
Large Table Exports
Using Wildcards (>1 GB)
BigQuery limitation: 1 GB per file
Solution: Use wildcard in destination
bq extract \
--destination_format=CSV \
dataset.large_table \
'gs://bucket/export-*.csv'
Output:
gs://bucket/export-000000000000.csv
gs://bucket/export-000000000001.csv
gs://bucket/export-000000000002.csv
...
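If a downstream consumer needs a single object, the shards can be concatenated server-side with gsutil compose (limited to 32 source objects per call; export with --print_header=false first, or the header row repeats inside the combined file). A sketch, assuming fewer than 32 shards:
# Combine exported shards into one object in GCS
gsutil compose \
'gs://bucket/export-*.csv' \
gs://bucket/export-combined.csv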
Shard Pattern
# Create sharded exports
bq extract \
--destination_format=AVRO \
dataset.large_table \
'gs://bucket/shard/data-*.avro'
Note: The number of files depends on data size and is not configurable.
Export Specific Partitions
Single Partition
# Export 2024-01-15 partition only
bq extract \
--destination_format=CSV \
dataset.partitioned_table\$20240115 \
gs://bucket/export_20240115.csv
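To export several partitions, loop over the partition decorators in a shell script (the dates below are illustrative):
for d in 20240115 20240116 20240117; do
  bq extract \
  --destination_format=CSV \
  "dataset.partitioned_table\$$d" \
  "gs://bucket/export_$d.csv"
done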
Date Range (use WHERE in EXPORT DATA)
See EXPORT DATA section below.
EXPORT DATA SQL Statement
Basic EXPORT DATA
EXPORT DATA OPTIONS(
uri='gs://bucket/export-*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=','
) AS
SELECT * FROM `project.dataset.table`
WHERE date >= '2024-01-01';
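EXPORT DATA is a standard SQL statement, so it can be run from the CLI like any other query:
bq query --use_legacy_sql=false \
"EXPORT DATA OPTIONS(
  uri='gs://bucket/export-*.csv',
  format='CSV',
  overwrite=true,
  header=true
) AS
SELECT * FROM \`project.dataset.table\`
WHERE date >= '2024-01-01'"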
Export Query Results
EXPORT DATA OPTIONS(
uri='gs://bucket/aggregated-*.parquet',
format='PARQUET',
overwrite=true
) AS
SELECT
customer_id,
DATE(order_timestamp) as order_date,
SUM(amount) as total_amount,
COUNT(*) as order_count
FROM `project.dataset.orders`
WHERE DATE(order_timestamp) >= '2024-01-01'
GROUP BY customer_id, order_date;
Format Options
CSV:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.csv',
format='CSV',
header=true,
field_delimiter=',',
compression='GZIP'
) AS SELECT ...;
JSON:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.json',
format='JSON',
compression='GZIP'
) AS SELECT ...;
Avro:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.avro',
format='AVRO',
compression='SNAPPY'
) AS SELECT ...;
Parquet:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS SELECT ...;
Export to Local Files (Not Recommended)
Small Results via Query
# For small datasets only
bq query \
--format=csv \
--max_rows=10000 \
--use_legacy_sql=false \
'SELECT * FROM `project.dataset.table` LIMIT 10000' \
> local_export.csv
Limitation: Not suitable for large datasets. Use GCS for production.
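For larger local copies, extract to GCS first and then download the shards in parallel (the bucket and local path are illustrative):
# Extract to GCS, then copy shards down with parallel transfers
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table \
'gs://bucket/tmp/export-*.csv.gz'
gsutil -m cp 'gs://bucket/tmp/export-*.csv.gz' ./local_exports/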
Scheduled Exports (Automation)
Using Cloud Scheduler + EXPORT DATA
# Create scheduled query
bq mk --transfer_config \
--target_dataset=dataset \
--display_name='Daily Export' \
--schedule='every 24 hours' \
--params='{"query":"EXPORT DATA OPTIONS(uri='\''gs://bucket/daily-*.csv'\'', format='\''CSV'\'') AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
--data_source=scheduled_query
Using Cloud Composer (Airflow)
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Submit a BigQuery extract job via the Jobs API
export_task = BigQueryInsertJobOperator(
    task_id='export_to_gcs',
    configuration={
        'extract': {
            'sourceTable': {
                'projectId': 'project',
                'datasetId': 'dataset',
                'tableId': 'table'
            },
            'destinationUris': ['gs://bucket/export-*.csv'],
            'destinationFormat': 'CSV'
        }
    }
)
Monitoring Exports
Check Extract Jobs
bq ls --jobs --max_results=10
Job Details
bq show -j JOB_ID
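To block a script until a job finishes, use bq wait (the job ID and timeout below are illustrative):
# Wait up to 600 seconds for the job to complete
bq wait JOB_ID 600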
Failed Exports
SELECT
job_id,
user_email,
error_result.message as error_message,
creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
job_type = 'EXTRACT'
AND state = 'DONE'
AND error_result IS NOT NULL
AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY creation_time DESC;
Export Best Practices
Format Selection
CSV:
- ✅ Human-readable
- ✅ Universal compatibility
- ❌ Larger file size
- ❌ No schema preservation
JSON:
- ✅ Human-readable
- ✅ Preserves nested structures
- ❌ Larger file size
Avro:
- ✅ Preserves schema
- ✅ Efficient binary format
- ✅ Fast BigQuery re-import
- ❌ Not human-readable
Parquet:
- ✅ Columnar format
- ✅ Good compression
- ✅ Analytics tool compatible
- ❌ Not human-readable
Compression Recommendations
- For long-term storage: GZIP (best compression)
- For processing pipelines: SNAPPY (fast)
- For network transfer: GZIP (smaller size)
- For speed: NONE (no compression overhead)
Wildcards for Large Exports
Always use wildcards for:
- Tables >500 MB
- Unknown data size
- Distributed processing
Example:
bq extract --destination_format=AVRO dataset.large_table 'gs://bucket/export-*.avro'
Cost Considerations
Export Costs
- BigQuery extract: FREE
- GCS storage: Standard GCS pricing
- Network egress: Free within same region
Optimization
Reduce costs:
- Export only needed columns (use EXPORT DATA with SELECT)
- Filter rows before export (WHERE clause)
- Use compression (smaller files)
- Export to GCS in same region as BigQuery
Example - filtered export:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS
SELECT customer_id, order_date, amount -- Only needed columns
FROM `project.dataset.orders`
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Last 30 days only
AND amount > 0; -- Keep positive amounts only
Common Patterns
Daily Export
#!/bin/bash
DATE=$(date +%Y%m%d)
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table\$$DATE \
gs://bucket/exports/daily_export_$DATE.csv.gz
Incremental Export
-- Create temp table with new data
CREATE TEMP TABLE new_data AS
SELECT * FROM `project.dataset.table`
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
-- Export only new data
EXPORT DATA OPTIONS(
uri='gs://bucket/incremental/data-*.parquet',
format='PARQUET'
) AS
SELECT * FROM new_data;
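Note that a temp table lives only for a single session, so both statements must run together, e.g. as one multi-statement query:
bq query --use_legacy_sql=false \
"CREATE TEMP TABLE new_data AS
SELECT * FROM \`project.dataset.table\`
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
EXPORT DATA OPTIONS(
  uri='gs://bucket/incremental/data-*.parquet',
  format='PARQUET'
) AS
SELECT * FROM new_data;"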
Export with Transformation
EXPORT DATA OPTIONS(
uri='gs://bucket/transformed-*.csv',
format='CSV'
) AS
SELECT
customer_id,
UPPER(customer_name) as customer_name,
ROUND(amount, 2) as amount,
FORMAT_DATE('%Y-%m-%d', order_date) as order_date
FROM `project.dataset.orders`
WHERE order_date >= '2024-01-01';
Troubleshooting
"Permission denied"
Problem: No write access to GCS bucket
Solution: Grant the BigQuery service account the Storage Object Creator role on the bucket
"Table too large"
Problem: Export exceeds 1GB without wildcard
Solution: Use wildcard pattern gs://bucket/export-*.csv
"Invalid URI"
Problem: Incorrect GCS path format
Solution: Use gs://bucket/path/file format, not https://
"Quota exceeded"
Problem: Too many extract jobs
Solution: Batch exports or request a quota increase
Quick Reference
Format recommendations:
- Re-import to BigQuery → Avro
- Analytics tools → Parquet
- Data exchange → CSV
- API consumption → JSON
Compression guide:
- Best ratio → GZIP
- Fastest → SNAPPY or NONE
- Balance → SNAPPY
Size limits:
- 1 GB per file (use wildcards)
- 10 TB per extract job
- 50,000 URIs per export
Syntax patterns:
# Single file
gs://bucket/file.csv
# Wildcard (recommended)
'gs://bucket/prefix-*.csv'
# Sharded with path
'gs://bucket/path/to/shard-*.parquet'