Agent Skills: Senior Cloud Architect

Expert cloud architecture covering AWS, GCP, Azure, multi-cloud strategy, cost optimization, and cloud-native design.

UncategorizedID: borghei/claude-skills/senior-cloud-architect

Install this agent skill to your local

pnpm dlx add-skill https://github.com/borghei/Claude-Skills/tree/HEAD/engineering-team/senior-cloud-architect

Skill Files

Browse the full folder contents for senior-cloud-architect.

Download Skill

Loading file tree…

engineering-team/senior-cloud-architect/SKILL.md

Skill Metadata

Name
senior-cloud-architect
Description
Expert cloud architecture covering AWS, GCP, Azure, multi-cloud strategy, cost optimization, and cloud-native design.

Senior Cloud Architect

Expert-level cloud architecture and infrastructure design.

Core Competencies

  • Multi-cloud architecture
  • AWS, GCP, Azure platforms
  • Cloud-native design patterns
  • Cost optimization
  • Security and compliance
  • Migration strategies
  • Disaster recovery
  • Infrastructure automation

Cloud Platform Comparison

| Service | AWS | GCP | Azure | |---------|-----|-----|-------| | Compute | EC2, ECS, EKS | GCE, GKE | VMs, AKS | | Serverless | Lambda | Cloud Functions | Azure Functions | | Storage | S3 | Cloud Storage | Blob Storage | | Database | RDS, DynamoDB | Cloud SQL, Spanner | SQL DB, CosmosDB | | ML | SageMaker | Vertex AI | Azure ML | | CDN | CloudFront | Cloud CDN | Azure CDN |

AWS Architecture

Well-Architected Framework

Pillars:

  1. Operational Excellence

    • Infrastructure as Code
    • Monitoring and observability
    • Incident response
    • Continuous improvement
  2. Security

    • Identity and access management
    • Data protection
    • Infrastructure protection
    • Incident response
  3. Reliability

    • Fault tolerance
    • Disaster recovery
    • Change management
    • Failure testing
  4. Performance Efficiency

    • Right-sizing resources
    • Monitoring performance
    • Trade-off decisions
    • Keeping current
  5. Cost Optimization

    • Cost awareness
    • Right-sizing
    • Reserved capacity
    • Efficient resources
  6. Sustainability

    • Region selection
    • Efficient algorithms
    • Hardware utilization
    • Data management

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Route 53 (DNS)                       │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                    CloudFront (CDN)                         │
│                    WAF (Web Application Firewall)           │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                Application Load Balancer                     │
└──────────┬───────────────────────────────────┬──────────────┘
           │                                   │
┌──────────▼──────────┐             ┌──────────▼──────────┐
│   ECS/EKS Cluster   │             │   ECS/EKS Cluster   │
│   (AZ-a)            │             │   (AZ-b)            │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
┌──────────▼───────────────────────────────────▼──────────┐
│                    ElastiCache (Redis)                   │
└─────────────────────────────┬───────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────┐
│                    RDS Multi-AZ                          │
│                    (Primary + Standby)                   │
└─────────────────────────────────────────────────────────┘

Terraform AWS Module

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}"
  cidr = var.vpc_cidr

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets

  enable_nat_gateway     = true
  single_nat_gateway     = var.environment != "production"
  enable_dns_hostnames   = true
  enable_dns_support     = true

  tags = local.common_tags
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "${var.project}-${var.environment}"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    main = {
      instance_types = var.node_instance_types
      min_size       = var.node_min_size
      max_size       = var.node_max_size
      desired_size   = var.node_desired_size
    }
  }

  tags = local.common_tags
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project}-${var.environment}"

  engine               = "postgres"
  engine_version       = "15"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage

  db_name  = var.db_name
  username = var.db_username
  port     = 5432

  multi_az               = var.environment == "production"
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = local.common_tags
}

Cost Optimization

Reserved vs On-Demand vs Spot

| Type | Discount | Commitment | Use Case | |------|----------|------------|----------| | On-Demand | 0% | None | Variable workloads | | Reserved | 30-72% | 1-3 years | Steady-state | | Savings Plans | 30-72% | 1-3 years | Flexible compute | | Spot | 60-90% | None | Fault-tolerant |

Cost Optimization Strategies

Right-sizing:

def analyze_utilization(instance_id: str, days: int = 14):
    """Analyze CPU/memory utilization for right-sizing recommendations."""
    cloudwatch = boto3.client('cloudwatch')

    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=datetime.now() - timedelta(days=days),
        EndTime=datetime.now(),
        Period=3600,
        Statistics=['Average', 'Maximum']
    )

    avg_cpu = sum(p['Average'] for p in metrics['Datapoints']) / len(metrics['Datapoints'])
    max_cpu = max(p['Maximum'] for p in metrics['Datapoints'])

    if avg_cpu < 10 and max_cpu < 30:
        return 'downsize'
    elif avg_cpu > 80:
        return 'upsize'
    else:
        return 'optimal'

Cost Allocation Tags:

required_tags:
  - Environment: production|staging|development
  - Project: project-name
  - Owner: team-name
  - CostCenter: cost-center-id

automation:
  - Untagged resources alert after 24 hours
  - Auto-terminate development resources after 7 days
  - Weekly cost reports by tag

Cost Dashboard

┌─────────────────────────────────────────────────────────────┐
│                    Monthly Cost Summary                      │
├─────────────────────────────────────────────────────────────┤
│  Total: $45,231     vs Last Month: +5%                      │
│                                                              │
│  By Service:                    By Environment:              │
│  ├── EC2: $18,500 (41%)        ├── Production: $38,000      │
│  ├── RDS: $12,000 (27%)        ├── Staging: $4,500          │
│  ├── S3: $3,200 (7%)           └── Development: $2,731      │
│  ├── Lambda: $1,800 (4%)                                     │
│  └── Other: $9,731 (21%)       Savings Opportunity: $8,200   │
│                                                              │
│  Recommendations:                                            │
│  • Convert 12 instances to Reserved (save $4,200/mo)        │
│  • Delete 5 unused EBS volumes (save $180/mo)               │
│  • Resize 8 over-provisioned instances (save $1,800/mo)     │
└─────────────────────────────────────────────────────────────┘

Disaster Recovery

DR Strategies

| Strategy | RTO | RPO | Cost | |----------|-----|-----|------| | Backup & Restore | Hours | Hours | $ | | Pilot Light | Minutes | Minutes | $$ | | Warm Standby | Minutes | Seconds | $$$ | | Multi-Site Active | Seconds | Near-zero | $$$$ |

Multi-Region Architecture

┌────────────────────────────────────────────────────────────┐
│                      Global Load Balancer                   │
│                      (Route 53 / Cloud DNS)                 │
└──────────────┬─────────────────────────────┬───────────────┘
               │                             │
┌──────────────▼──────────────┐ ┌────────────▼──────────────┐
│      Primary Region         │ │     Secondary Region       │
│      (us-east-1)           │ │     (us-west-2)           │
│                            │ │                            │
│  ┌──────────────────────┐  │ │  ┌──────────────────────┐  │
│  │   Application Layer  │  │ │  │   Application Layer  │  │
│  │   (Active)          │  │ │  │   (Standby/Active)   │  │
│  └──────────┬───────────┘  │ │  └──────────┬───────────┘  │
│             │              │ │             │              │
│  ┌──────────▼───────────┐  │ │  ┌──────────▼───────────┐  │
│  │   Database           │──┼─┼──│   Database           │  │
│  │   (Primary)         │  │ │  │   (Read Replica)     │  │
│  └──────────────────────┘  │ │  └──────────────────────┘  │
└────────────────────────────┘ └────────────────────────────┘
                    │
                    │ Cross-Region Replication
                    ▼
        ┌──────────────────────┐
        │     S3 Backup        │
        │   (Multi-Region)     │
        └──────────────────────┘

Backup Strategy

backup_policy:
  database:
    frequency: continuous
    retention: 35 days
    cross_region: true
    encryption: aws/rds

  application_data:
    frequency: daily
    retention: 90 days
    versioning: enabled
    lifecycle:
      - transition_to_ia: 30 days
      - transition_to_glacier: 90 days
      - expiration: 365 days

  configuration:
    frequency: on_change
    retention: unlimited
    storage: git + s3

Security Architecture

Network Security

┌─────────────────────────────────────────────────────────────┐
│                           VPC                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Public Subnet                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   NAT GW    │  │     ALB     │  │   Bastion     │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Private Subnet                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   App Tier  │  │   App Tier  │  │   App Tier    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                               │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Data Subnet                          │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │     RDS     │  │    Redis    │  │  Elasticsearch│  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

IAM Best Practices

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LeastPrivilegeExample",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/uploads/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "engineering"
        },
        "IpAddress": {
          "aws:SourceIp": ["10.0.0.0/8"]
        }
      }
    }
  ]
}

Reference Materials

  • references/aws_patterns.md - AWS architecture patterns
  • references/gcp_patterns.md - GCP architecture patterns
  • references/multi_cloud.md - Multi-cloud strategies
  • references/cost_optimization.md - Cost optimization guide

Scripts

# Infrastructure cost analyzer
python scripts/cost_analyzer.py --account production --period monthly

# DR validation
python scripts/dr_test.py --region us-west-2 --type failover

# Security audit
python scripts/security_audit.py --framework cis --output report.html

# Resource inventory
python scripts/inventory.py --accounts all --format csv