Datadog Entity Generator Skill

Datadog Entity Generator

Generate comprehensive, validated Datadog Software Catalog entity YAML files (v3 schema) through project analysis and engineer interviews.

Workflow Overview

Analyze project - Scan source code for metadata signals
Fetch existing Datadog data - Query API for existing entities, teams, related services
Interview engineer - Fill gaps with targeted questions
Generate YAML - Create complete entity definition(s)
Validate - Check against official JSON schema
Merge/Output - Handle existing files, write to .datadog/

Step 1: Project Analysis

Run the project analyzer to extract metadata signals:

uv run scripts/project_analyzer.py /path/to/project

Detected signals:

pyproject.toml, package.json, pom.xml, build.gradle → name, description, language, dependencies
Dockerfile, docker-compose.yml → service type, dependencies
kubernetes/, helm/, terraform/ → infrastructure, dependencies
.github/workflows/, .gitlab-ci.yml → CI/CD pipelines
README.md, CODEOWNERS → description, owners
Existing .datadog/entity.datadog.yaml → merge base
openapi.yaml, swagger.json → API definitions

Output: JSON with detected values and confidence levels.

Step 2: Fetch Existing Datadog Data

Query Datadog API for context (requires DD_API_KEY and DD_APP_KEY in environment):

uv run scripts/datadog_fetcher.py --service-name <name>

Fetches:

Existing entity definitions for the service
Teams (for owner validation)
Related services with dependencies
Monitors, SLOs associated with service
APM service topology (dependencies)

Step 3: Engineer Interview

Conduct structured interview to fill gaps. See references/interview-guide.md for complete question bank.

Interview strategy:

Start with auto-detected values for confirmation
Ask only for missing required fields
Negotiate optional fields based on engineer preference
Validate URLs and integration patterns

Core questions by entity kind:

Service

Service name (confirm auto-detected or provide)
Display name (human-readable)
Owner team (validate against Datadog teams or accept new)
Tier: critical, high, medium, low
Lifecycle: production, experimental, deprecated
Type: web, grpc, rest, graphql, worker, custom
Languages (confirm from detection)
Dependencies (services, datastores, queues)
System membership (componentOf)
PagerDuty service URL
JIRA project
Confluence space
MS Teams channel
Runbook URL
Dashboard URL

Datastore

Type: postgres, mysql, redis, mongodb, elasticsearch, cassandra, dynamodb, etc.
What services depend on this datastore?

Queue

Type: kafka, rabbitmq, sqs, kinesis, pubsub, etc.
Producer and consumer services?

API

Type: openapi, graphql, rest, grpc
OpenAPI spec file reference?
Implementing service?

System

Component services, datastores, queues
Domain/product area

Required tags (HMH standards):

env: (production, staging, development)
service: (service name)
tier: (critical, high, medium, low)

Step 4: Generate YAML

Use the entity generator with collected data:

uv run scripts/entity_generator.py --input collected_data.json --output .datadog/

Multi-entity support: For monorepos, generate multiple entities separated by --- in single file or separate files.

Step 5: Validate

Validate against official Datadog JSON schema:

uv run scripts/schema_validator.py .datadog/entity.datadog.yaml

Validation checks:

Required fields present (apiVersion, kind, metadata.name)
Valid enum values (lifecycle, tier, kind)
URL formats for links and integrations
Contact email format
Tag format (key:value)

Step 6: Merge & Output

If .datadog/entity.datadog.yaml exists:

Parse existing definitions
Deep merge with new data (new values override, arrays extend)
Preserve custom extensions
Show diff for engineer approval

Output location: .datadog/entity.datadog.yaml

Entity Schema Quick Reference

See references/v3-schema.md for complete schema documentation.

Common Structure (all kinds)

apiVersion: v3
kind: service  # service | datastore | queue | api | system
metadata:
  name: my-service              # Required, unique identifier
  displayName: My Service       # Human-readable name
  namespace: default            # Optional, defaults to 'default'
  owner: team-name              # Primary owner team
  additionalOwners:             # Multi-ownership
    - name: sre-team
      type: operator
  description: Short description
  tags:
    - env:production
    - service:my-service
    - tier:critical
  contacts:
    - name: On-Call
      type: email
      contact: oncall@company.com
    - name: Team Channel
      type: microsoft-teams
      contact: https://teams.microsoft.com/l/channel/...
  links:
    - name: Runbook
      type: runbook
      url: https://confluence.company.com/runbook
    - name: Dashboard
      type: dashboard
      url: https://app.datadoghq.com/dashboard/xxx
    - name: Source Code
      type: repo
      provider: github
      url: https://github.com/org/repo
spec:
  lifecycle: production         # production | experimental | deprecated
  tier: critical                # critical | high | medium | low
  # ... kind-specific fields
integrations:
  pagerduty:
    serviceURL: https://company.pagerduty.com/service-directory/PXXXXXX
datadog:
  codeLocations:
    - repositoryURL: https://github.com/org/repo
      paths:
        - "src/**"
  logs:
    - name: Error Logs
      query: "service:my-service status:error"
  events:
    - name: Deployments
      query: "source:kubernetes service:my-service"
extensions:
  company.com/jira-project: PROJ
  company.com/confluence-space: https://confluence.company.com/space

Service-specific spec

spec:
  type: web                     # web | grpc | rest | graphql | worker
  languages:
    - python
    - go
  dependsOn:
    - service:auth-service
    - datastore:postgres-main
    - queue:events-kafka
  componentOf:
    - system:platform

Datastore-specific spec

spec:
  type: postgres                # postgres | mysql | redis | mongodb | etc.
  dependencyOf:                 # Services that depend on this
    - service:api-service

Queue-specific spec

spec:
  type: kafka                   # kafka | rabbitmq | sqs | kinesis
  componentOf:
    - system:messaging

System-specific spec

spec:
  components:
    - service:web-frontend
    - service:api-backend
    - datastore:main-db
    - queue:events

Integration URL Patterns

See references/integration-patterns.md for complete patterns.

PagerDuty: https://<subdomain>.pagerduty.com/service-directory/P<alphanumeric>

JIRA: Extension field company.com/jira-project: <PROJECT_KEY>

Confluence: Link with type: doc, provider: confluence

MS Teams: Contact with type: microsoft-teams, URL format: https://teams.microsoft.com/l/channel/...

Snyk/SonarQube/BrowserStack/Orca: Custom extensions field

Scripts Reference

| Script | Purpose | |--------|---------| | scripts/project_analyzer.py | Analyze project for metadata signals | | scripts/datadog_fetcher.py | Fetch existing Datadog entities and context | | scripts/entity_generator.py | Generate entity YAML from collected data | | scripts/schema_validator.py | Validate YAML against JSON schema |

Confidence Thresholds

Auto-apply (≥90%): Directly include in YAML without confirmation
Confirm (70-89%): Present to engineer for confirmation
Ask (< 70%): Ask engineer to provide value

Interview Best Practices

Present all auto-detected values first for batch confirmation
Group related questions (e.g., all contacts together)
Provide examples for complex fields (URLs, queries)
Validate URLs and patterns in real-time
Explain why each field matters for adoption
Offer to skip optional fields with explicit acknowledgment

Agent Skills: Datadog Entity Generator

Install this agent skill to your local

Skill Files