Shell Tools Skill
Master jq, xargs, GNU parallel, and advanced pipelines
Learning Objectives
After completing this skill, you will be able to:
- [ ] Process JSON with jq
- [ ] Use xargs for argument handling
- [ ] Parallelize tasks with GNU parallel
- [ ] Build efficient data pipelines
- [ ] Use utility commands effectively
Prerequisites
- Strong Bash fundamentals
- Text processing basics
- Understanding of pipes
Core Concepts
1. jq Essentials
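```bash
# Basic queries
jq '.' file.json              # Pretty-print
jq '.key' file.json           # Value of a top-level key
jq '.array[0]' file.json      # First element of an array
jq '.nested.key' file.json    # Nested key

# Filtering (input is an array of objects)
jq '.[] | select(.active)' file.json       # Keep objects where .active is not false/null
jq '.[] | select(.count > 10)' file.json   # Numeric comparison

# Transform
jq '.[] | {id, name}' file.json               # Keep only these fields
jq 'map(.price * .qty)' file.json             # Array of per-item totals
jq -r '.[] | [.id, .name] | @csv' file.json   # @csv needs an array per row

# From shell variables (safer than splicing "$VAR" into the filter text)
jq -n --arg x "$VAR" '{value: $x}'
```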
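To see these filters on concrete data, here is a minimal sketch that pipes an inline sample through `select`, field projection, `@csv` and `--arg`. The `users` array and its `id`/`name`/`active`/`count` fields are made up for illustration.

```bash
# Hypothetical sample data: an array of user objects
users='[
  {"id": 1, "name": "ada",   "active": true,  "count": 12},
  {"id": 2, "name": "bob",   "active": false, "count": 3},
  {"id": 3, "name": "carol", "active": true,  "count": 7}
]'

# select(): keep active users, then project two fields (-c = one object per line)
printf '%s\n' "$users" | jq -c '.[] | select(.active) | {id, name}'
# {"id":1,"name":"ada"}
# {"id":3,"name":"carol"}

# @csv expects an array per row, so build one first
printf '%s\n' "$users" | jq -r '.[] | select(.count > 10) | [.id, .name] | @csv'
# 1,"ada"

# --arg binds a shell value as a JSON string inside the filter
who="carol"
printf '%s\n' "$users" | jq -r --arg who "$who" '.[] | select(.name == $who) | .id'
# 3
```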
2. Xargs
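```bash
# Basic usage: words on stdin become arguments
echo "a b c" | xargs echo

# Safe with spaces: NUL-delimited names from find (here, *.tmp files)
find . -type f -name '*.tmp' -print0 | xargs -0 rm --

# Limit arguments per invocation
xargs -n 1 process < list    # one item per command
xargs -n 10 process < list   # batches of 10

# Parallel: up to 4 processes at once
xargs -P 4 -n 1 process < list

# Placeholder: {} is replaced by each input line
xargs -I {} curl -s {} < urls
```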
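A minimal sketch of why `-print0`/`-0` matters, run in a throwaway directory (the file names are hypothetical), plus a deterministic look at how `-n` batches arguments:

```bash
# Scratch directory so the demo cannot touch real files
demo=$(mktemp -d)
touch "$demo/report one.tmp" "$demo/report two.tmp" "$demo/keep.txt"

# Default xargs splits on blanks, so "report one.tmp" would arrive as two
# arguments ("$demo/report" and "one.tmp"). -print0 / -0 pass each name intact.
find "$demo" -type f -name '*.tmp' -print0 | xargs -0 rm --

remaining=$(ls "$demo")
rm -r "$demo"
echo "$remaining"
# keep.txt

# -n caps how many arguments each invocation receives
printf '%s\n' a b c d e | xargs -n 2 echo
# a b
# c d
# e
```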
3. GNU Parallel
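```bash
# Basic: ::: takes arguments from the command line
parallel echo ::: a b c

# From file: :::: reads one argument per line
parallel process :::: list.txt

# With options
parallel -j 4 process ::: *.txt          # At most 4 jobs at once
parallel --progress process ::: *.txt    # Show job progress

# Complex: 0.5 s between job starts; {} is each URL
parallel -j 4 --delay 0.5 \
  'curl -s {} | jq .name' :::: urls.txt
```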
4. Pipeline Utilities
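```bash
# Sort and unique
sort file.txt
sort -n file.txt                # Numeric
sort -u file.txt                # Unique lines
sort file.txt | uniq -c         # Count occurrences (uniq needs sorted input)

# Cut and paste
cut -d',' -f1,3 file.csv        # Fields 1 and 3
paste file1.txt file2.txt       # Join lines side by side, tab-separated

# Transform
tr 'a-z' 'A-Z' < file           # Uppercase
tr -d '\r' < dos.txt > unix.txt # Strip CR (DOS line endings)
```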
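These utilities combine into the classic "top N by frequency" pipeline. A sketch on a hypothetical `user,role` CSV: `cut` picks the column, `sort | uniq -c` counts each value, `sort -rn` ranks by count and `head` keeps the top entries.

```bash
# Hypothetical CSV rows: user,role
printf '%s\n' alice,admin bob,user carol,admin dave,guest erin,admin frank,user |
  cut -d',' -f2 |
  sort | uniq -c |
  sort -rn |
  head -n 2
# 3 admin, then 2 user (uniq -c left-pads the counts with spaces)
```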
Common Patterns
API Data Pipeline
```bash
# Active users as CSV, sorted by email (field 2 only), first 20 rows
curl -s 'https://api.example.com/users' |
  jq -r '.[] | select(.active) | [.id, .email] | @csv' |
  sort -t',' -k2,2 |
  head -n 20
```
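Since `api.example.com` is a placeholder host, here is the same pipeline run offline with a hypothetical response inlined in place of `curl`, so each stage can be checked by eye:

```bash
# Stand-in for the API response; field names match the pipeline above
response='[
  {"id": 7, "email": "zoe@example.com", "active": true},
  {"id": 3, "email": "amy@example.com", "active": true},
  {"id": 5, "email": "max@example.com", "active": false}
]'

printf '%s\n' "$response" |
  jq -r '.[] | select(.active) | [.id, .email] | @csv' |
  sort -t',' -k2,2 |
  head -n 20
# 3,"amy@example.com"
# 7,"zoe@example.com"
```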
Parallel Processing
```bash
# Compress all logs in parallel (NUL-delimited, so odd filenames are safe)
find . -type f -name '*.log' -print0 |
  parallel -0 -j 4 gzip

# Batch API calls, starting at most one request every 0.2 s
parallel -j 5 --delay 0.2 \
  'curl -s "https://api.example.com/item/{}"' :::: ids.txt
```
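When GNU parallel is not installed, `xargs -0 -P` covers the simple compress-everything case. A sketch on a throwaway directory with hypothetical log files. One difference to keep in mind: xargs does not group each job's output the way parallel does, so jobs that print can interleave.

```bash
# Throwaway directory with hypothetical log files (one name contains a space)
demo=$(mktemp -d)
printf 'line\n' > "$demo/app one.log"
printf 'line\n' > "$demo/db.log"

# -0 matches find -print0; -n 1 gives each gzip one file; -P 4 runs up to 4 at once
find "$demo" -type f -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip

compressed=$(ls "$demo")
rm -r "$demo"
printf '%s\n' "$compressed"
# app one.log.gz
# db.log.gz
```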
Data Transformation
```bash
# JSON to aligned columns; split on tabs only, so names with spaces stay intact
jq -r '.items[] | "\(.id)\t\(.name)\t\(.price)"' data.json |
  column -t -s $'\t'
```
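An alternative to hand-built `\t` strings is jq's `@tsv`, which joins an array with tabs and escapes tabs or newlines inside values, so a stray tab in a name cannot shift the columns. A sketch with hypothetical `data.json` contents inlined:

```bash
# Hypothetical data.json contents
data='{"items": [
  {"id": 1, "name": "Desk lamp", "price": 24.5},
  {"id": 2, "name": "Chair",     "price": 89}
]}'

# Tab as the only column separator (printf keeps this portable beyond bash)
tab=$(printf '\t')
printf '%s\n' "$data" |
  jq -r '.items[] | [.id, .name, .price] | @tsv' |
  column -t -s "$tab"
```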
Anti-Patterns
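| Don't | Do | Why |
|-------|-----|-----|
| Parse JSON with grep | Use jq | grep matches text, not structure; nesting and formatting changes break it |
| Loop sequentially over independent jobs | Use parallel or xargs -P | Runs jobs concurrently across cores |
| cat file \| xargs | xargs < file | Avoids spawning an extra process |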
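To make the first row concrete, a small sketch with a hypothetical one-line JSON document that has a `name` key at two nesting levels:

```bash
# Compact JSON with "name" at two levels (hypothetical)
json='{"name":"web","owner":{"name":"ops"}}'

# grep matches text, so it returns both "name" values
printf '%s\n' "$json" | grep -o '"name":"[^"]*"'
# "name":"web"
# "name":"ops"

# jq addresses the key by path, so only the top-level value comes back
printf '%s\n' "$json" | jq -r '.name'
# web
```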
Practice Exercises
- JSON Processor: Transform API response
- Batch Processor: Parallel file processing
- Log Analyzer: Complex log pipeline
- Data Migrator: Transform and load data
Troubleshooting
Common Errors
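| Error | Cause | Fix |
|-------|-------|-----|
| parse error: ... | Input is not valid JSON | Validate with jq . < file |
| jq: error ... Cannot index | Filter does not match the data's shape | Inspect with jq 'type' or jq 'keys' |
| Argument list too long | Command line exceeds ARG_MAX | Feed arguments through xargs; cap batch size with -n |
| parallel: command not found | GNU parallel not installed | apt install parallel (Debian/Ubuntu) or brew install parallel (macOS) |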
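A preflight check catches the "not installed" case before a long pipeline starts. This is a sketch; `require` and `is_gnu_parallel` are hypothetical helper names, and the GNU check assumes `parallel --version` prints a first line beginning with "GNU parallel".

```bash
# Fail early with a clear message instead of "command not found" mid-pipeline
require() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing required tool: %s\n' "$cmd" >&2
      return 1
    fi
  done
}

# moreutils also ships a "parallel"; the GNU one identifies itself in --version
is_gnu_parallel() {
  parallel --version 2>/dev/null | head -n 1 | grep -q '^GNU parallel'
}

require jq xargs sort && echo "core tools present"
```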
Debug Techniques
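```bash
# Validate JSON (nonzero exit status on a parse error)
jq '.' < input.json

# Debug a pipeline: copy intermediate output to the terminal
command1 | tee /dev/stderr | command2

# Test a jq filter on a literal
echo '{"a":1}' | jq '.a'    # prints 1
```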
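In scripts, exit statuses are often more useful than pretty-printed output. A sketch: `jq empty` parses input and prints nothing, so its status alone says whether the JSON is valid, and `-e` turns a `null` or `false` result into exit status 1. The `is_valid_json` helper name is hypothetical.

```bash
# Exit status 0 means the argument parsed as JSON
is_valid_json() {
  printf '%s\n' "$1" | jq empty >/dev/null 2>&1
}

is_valid_json '{"a": 1}' && echo "valid"
is_valid_json '{"a": 1' || echo "invalid"

# -e: last output false or null gives exit status 1, so a missing key can fail the script
printf '%s\n' '{"a": 1}' | jq -e '.b' >/dev/null || echo ".b is missing or null"
```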
Performance Tips
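```bash
# Faster sorting: byte-wise comparison skips locale collation (order becomes byte order)
LC_ALL=C sort file.txt

# Parallel for CPU-bound work: one job per core (also GNU parallel's default)
parallel -j "$(nproc)" process ::: *.txt

# Stream large files: --stream parses incrementally instead of loading the whole
# array; fromstream(1|truncate_stream(inputs)) rebuilds each top-level element
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' large.json |
  while IFS= read -r line; do
    printf '%s\n' "$line"    # replace with per-record processing
  done
```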
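A quick way to check, on a small sample, that jq's streaming idiom `fromstream(1|truncate_stream(inputs))` yields the same records as a plain `.[]`. The sample is hypothetical and, like the typical large export, is an array of objects.

```bash
# Small stand-in for large.json: a top-level array of objects
sample='[{"id":1,"tags":["x"]},{"id":2,"tags":[]}]'

# Whole-document parse: jq reads the full array into memory first
printf '%s\n' "$sample" | jq -c '.[]'

# Streaming parse: --stream emits [path, leaf] events, truncate_stream drops the
# leading array index, fromstream reassembles each element once it is complete
printf '%s\n' "$sample" | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'

# Both print:
# {"id":1,"tags":["x"]}
# {"id":2,"tags":[]}
```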