Data Architecture Skill
Purpose
Design data architectures including data models, pipeline designs, governance frameworks, and quality management for operational and analytical systems.
Parameters
| Parameter | Type | Required | Validation | Default |
|-----------|------|----------|------------|---------|
| data_domain | string | ✅ | min: 20 chars | - |
| design_type | enum | ⚪ | model|pipeline|governance|quality | model |
| data_type | enum | ⚪ | operational|analytical|streaming | operational |
| volume_tier | enum | ⚪ | small|medium|large|massive | medium |
| output_format | enum | ⚪ | erd|yaml|json | erd |
Execution Flow
┌──────────────────────────────────────────────────────────┐
│ 1. VALIDATE: Check data domain and requirements │
│ 2. DISCOVER: Identify data sources and entities │
│ 3. MODEL: Create conceptual/logical/physical model │
│ 4. DESIGN: Pipeline or governance framework │
│ 5. QUALITY: Define data quality rules │
│ 6. VALIDATE: Check model consistency │
│ 7. DOCUMENT: Return data architecture │
└──────────────────────────────────────────────────────────┘
Retry Logic
| Error | Retry | Backoff | Max Attempts |
|-------|-------|---------|--------------|
| VALIDATION_ERROR | No | - | 1 |
| MODEL_GENERATION_ERROR | Yes | 1s | 2 |
| FORMAT_ERROR | Yes | 500ms | 3 |
Logging & Observability
log_points:
- event: design_started
level: info
data: [design_type, data_type]
- event: entities_identified
level: info
data: [entity_count, relationship_count]
- event: quality_rules_defined
level: info
data: [rule_count, dimensions_covered]
metrics:
- name: models_created
type: counter
labels: [design_type]
- name: design_time_ms
type: histogram
- name: entity_count
type: gauge
Error Handling
| Error Code | Description | Recovery |
|------------|-------------|----------|
| E401 | Missing data domain | Request domain description |
| E402 | Invalid relationships | Highlight circular/missing refs |
| E403 | Schema validation failed | Show validation errors |
| E404 | Unsupported volume tier | Suggest architectural changes |
Unit Test Template
test_cases:
- name: "E-commerce data model"
input:
data_domain: "E-commerce order management"
design_type: "model"
output_format: "erd"
expected:
has_entities: true
entities_include: ["Customer", "Order", "Product"]
has_relationships: true
valid_erd: true
- name: "Analytics pipeline"
input:
data_domain: "Customer analytics"
design_type: "pipeline"
data_type: "analytical"
expected:
has_ingestion: true
has_transformation: true
has_serving: true
- name: "Data quality rules"
input:
data_domain: "User profiles"
design_type: "quality"
expected:
has_dimensions: true
dimensions_include: ["completeness", "accuracy"]
has_rules: true
Troubleshooting
Common Issues
| Symptom | Root Cause | Resolution | |---------|------------|------------| | Missing relationships | Incomplete domain | Add missing entities | | Invalid ERD syntax | Format error | Validate Mermaid ERD | | Missing quality rules | Dimensions not specified | Add quality dimensions |
Debug Checklist
□ Is data domain clearly defined?
□ Are all entities identified?
□ Are relationships correctly typed?
□ Is output format valid?
□ Are quality dimensions covered?
Data Quality Dimensions
| Dimension | Example Rule | |-----------|--------------| | Completeness | NOT NULL checks | | Accuracy | Regex validation | | Consistency | Referential integrity | | Timeliness | SLA monitoring | | Uniqueness | Primary key constraints |
Integration
| Component | Trigger | Data Flow | |-----------|---------|-----------| | Agent 06 | Design request | Receives domain, returns model | | Agent 04 | Cloud data services | Cloud data platform |
Quality Standards
- Normalized: 3NF for operational, denormalized for analytical
- Documented: All entities and relationships described
- Quality-first: DQ rules for all critical fields
Version History
| Version | Date | Changes | |---------|------|---------| | 2.0.0 | 2025-01 | Production-grade: ERD, pipelines, DQ framework | | 1.0.0 | 2024-12 | Initial release |