Datadog Entity Generator
Generate comprehensive, validated Datadog Software Catalog entity YAML files (v3 schema) through project analysis and engineer interviews.
Workflow Overview
- Analyze project - Scan source code for metadata signals
- Fetch existing Datadog data - Query API for existing entities, teams, related services
- Interview engineer - Fill gaps with targeted questions
- Generate YAML - Create complete entity definition(s)
- Validate - Check against official JSON schema
- Merge/Output - Handle existing files, write to
.datadog/
Step 1: Project Analysis
Run the project analyzer to extract metadata signals:
uv run scripts/project_analyzer.py /path/to/project
Detected signals:
pyproject.toml,package.json,pom.xml,build.gradle→ name, description, language, dependenciesDockerfile,docker-compose.yml→ service type, dependencieskubernetes/,helm/,terraform/→ infrastructure, dependencies.github/workflows/,.gitlab-ci.yml→ CI/CD pipelinesREADME.md,CODEOWNERS→ description, owners- Existing
.datadog/entity.datadog.yaml→ merge base openapi.yaml,swagger.json→ API definitions
Output: JSON with detected values and confidence levels.
Step 2: Fetch Existing Datadog Data
Query Datadog API for context (requires DD_API_KEY and DD_APP_KEY in environment):
uv run scripts/datadog_fetcher.py --service-name <name>
Fetches:
- Existing entity definitions for the service
- Teams (for owner validation)
- Related services with dependencies
- Monitors, SLOs associated with service
- APM service topology (dependencies)
Step 3: Engineer Interview
Conduct structured interview to fill gaps. See references/interview-guide.md for complete question bank.
Interview strategy:
- Start with auto-detected values for confirmation
- Ask only for missing required fields
- Negotiate optional fields based on engineer preference
- Validate URLs and integration patterns
Core questions by entity kind:
Service
- Service name (confirm auto-detected or provide)
- Display name (human-readable)
- Owner team (validate against Datadog teams or accept new)
- Tier:
critical,high,medium,low - Lifecycle:
production,experimental,deprecated - Type:
web,grpc,rest,graphql,worker,custom - Languages (confirm from detection)
- Dependencies (services, datastores, queues)
- System membership (componentOf)
- PagerDuty service URL
- JIRA project
- Confluence space
- MS Teams channel
- Runbook URL
- Dashboard URL
Datastore
- Type:
postgres,mysql,redis,mongodb,elasticsearch,cassandra,dynamodb, etc. - What services depend on this datastore?
Queue
- Type:
kafka,rabbitmq,sqs,kinesis,pubsub, etc. - Producer and consumer services?
API
- Type:
openapi,graphql,rest,grpc - OpenAPI spec file reference?
- Implementing service?
System
- Component services, datastores, queues
- Domain/product area
Required tags (HMH standards):
env:(production, staging, development)service:(service name)tier:(critical, high, medium, low)
Step 4: Generate YAML
Use the entity generator with collected data:
uv run scripts/entity_generator.py --input collected_data.json --output .datadog/
Multi-entity support: For monorepos, generate multiple entities separated by --- in single file or separate files.
Step 5: Validate
Validate against official Datadog JSON schema:
uv run scripts/schema_validator.py .datadog/entity.datadog.yaml
Validation checks:
- Required fields present (apiVersion, kind, metadata.name)
- Valid enum values (lifecycle, tier, kind)
- URL formats for links and integrations
- Contact email format
- Tag format (
key:value)
Step 6: Merge & Output
If .datadog/entity.datadog.yaml exists:
- Parse existing definitions
- Deep merge with new data (new values override, arrays extend)
- Preserve custom extensions
- Show diff for engineer approval
Output location: .datadog/entity.datadog.yaml
Entity Schema Quick Reference
See references/v3-schema.md for complete schema documentation.
Common Structure (all kinds)
apiVersion: v3
kind: service # service | datastore | queue | api | system
metadata:
name: my-service # Required, unique identifier
displayName: My Service # Human-readable name
namespace: default # Optional, defaults to 'default'
owner: team-name # Primary owner team
additionalOwners: # Multi-ownership
- name: sre-team
type: operator
description: Short description
tags:
- env:production
- service:my-service
- tier:critical
contacts:
- name: On-Call
type: email
contact: oncall@company.com
- name: Team Channel
type: microsoft-teams
contact: https://teams.microsoft.com/l/channel/...
links:
- name: Runbook
type: runbook
url: https://confluence.company.com/runbook
- name: Dashboard
type: dashboard
url: https://app.datadoghq.com/dashboard/xxx
- name: Source Code
type: repo
provider: github
url: https://github.com/org/repo
spec:
lifecycle: production # production | experimental | deprecated
tier: critical # critical | high | medium | low
# ... kind-specific fields
integrations:
pagerduty:
serviceURL: https://company.pagerduty.com/service-directory/PXXXXXX
datadog:
codeLocations:
- repositoryURL: https://github.com/org/repo
paths:
- "src/**"
logs:
- name: Error Logs
query: "service:my-service status:error"
events:
- name: Deployments
query: "source:kubernetes service:my-service"
extensions:
company.com/jira-project: PROJ
company.com/confluence-space: https://confluence.company.com/space
Service-specific spec
spec:
type: web # web | grpc | rest | graphql | worker
languages:
- python
- go
dependsOn:
- service:auth-service
- datastore:postgres-main
- queue:events-kafka
componentOf:
- system:platform
Datastore-specific spec
spec:
type: postgres # postgres | mysql | redis | mongodb | etc.
dependencyOf: # Services that depend on this
- service:api-service
Queue-specific spec
spec:
type: kafka # kafka | rabbitmq | sqs | kinesis
componentOf:
- system:messaging
System-specific spec
spec:
components:
- service:web-frontend
- service:api-backend
- datastore:main-db
- queue:events
Integration URL Patterns
See references/integration-patterns.md for complete patterns.
PagerDuty: https://<subdomain>.pagerduty.com/service-directory/P<alphanumeric>
JIRA: Extension field company.com/jira-project: <PROJECT_KEY>
Confluence: Link with type: doc, provider: confluence
MS Teams: Contact with type: microsoft-teams, URL format: https://teams.microsoft.com/l/channel/...
Snyk/SonarQube/BrowserStack/Orca: Custom extensions field
Scripts Reference
| Script | Purpose |
|--------|---------|
| scripts/project_analyzer.py | Analyze project for metadata signals |
| scripts/datadog_fetcher.py | Fetch existing Datadog entities and context |
| scripts/entity_generator.py | Generate entity YAML from collected data |
| scripts/schema_validator.py | Validate YAML against JSON schema |
Confidence Thresholds
- Auto-apply (≥90%): Directly include in YAML without confirmation
- Confirm (70-89%): Present to engineer for confirmation
- Ask (< 70%): Ask engineer to provide value
Interview Best Practices
- Present all auto-detected values first for batch confirmation
- Group related questions (e.g., all contacts together)
- Provide examples for complex fields (URLs, queries)
- Validate URLs and patterns in real-time
- Explain why each field matters for adoption
- Offer to skip optional fields with explicit acknowledgment