Engage Indicators
Manage indicators and metrics in the Engage Analytics dbt project at /Volumes/Biliba/github/engage-analytics/dbt.
How can I help?
Ask the user which task they need help with:
- Get Started (New User Setup) - Install prerequisites, configure dbt, set up database connection
- Add a New Questionnaire - Generate models for a new form added to the app
- Update Anonymization - Mark fields as PII/non-PII in questionnaire metadata
- Create a New Metric - Add a new indicator to the metrics catalog
- Modify an Existing Metric - Change how an existing metric is calculated
- Export Data to S3 - Export anonymized or PII data to S3 buckets
- Understand Existing Indicators - Query and explore current metrics in the system
Getting Started (New User Setup)
Prerequisites
- Install uv (Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
- Install direnv (optional, for auto-loading env vars):
# macOS
brew install direnv
# Add to shell (bash)
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc
# Add to shell (zsh)
echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc
dbt Setup
cd /Volumes/Biliba/github/engage-analytics/dbt
# Install dbt-postgres via uv
uv tool install dbt-postgres
# Install dbt package dependencies
uv run dbt deps --profiles-dir .
Database Configuration
Create environment variables for database connection. Choose one method:
Option A: Using direnv (recommended)
cd /Volumes/Biliba/github/engage-analytics/dbt
cp .envrc.example .envrc # or edit existing .envrc
direnv allow
Option B: Manual export
export DBT_HOST=localhost
export DBT_PORT=5432
export DBT_USER=postgres
export DBT_PASSWORD=your_password
export DBT_DATABASE=airbyte
export DBT_SCHEMA=engage_analytics
Edit values to match your PostgreSQL instance.
Verify Setup
cd /Volumes/Biliba/github/engage-analytics/dbt
# Test database connection
uv run dbt debug --profiles-dir .
# Run all models
uv run dbt run --profiles-dir .
Data Export Setup (Optional)
If you need to export data to S3:
cd /Volumes/Biliba/github/engage-analytics/dataexport
# Install Python dependencies
uv sync
# Configure environment
cp .env.example .env
# Edit .env with your database and AWS credentials
Quick Reference
- Indicator specs: indicators/engage-indicators.csv
- Metrics catalog: macros/metrics.sql
- Metrics fact table: engage_analytics.fct_metrics_long
- Project docs: docs/metrics.md
Scripts (in repo root):
- model_generator.py - Generate questionnaire models (named + anon)
- metadata_manager.py - Manage questionnaire metadata
- run_dbt.sh - Run dbt commands with env vars
Data export (in dataexport/):
- run_export.sh - Refresh dbt + export to S3
- export_to_s3.py - Direct S3 export
For full project structure, see references/project-structure.md. For indicator-to-metric mapping, see references/indicator-mapping.md.
Core Capabilities
1. Understand Existing Indicators
Query current metrics:
SELECT metric_id, description, max(value) AS max_value
FROM engage_analytics.fct_metrics_long
GROUP BY 1, 2
ORDER BY 1;
Check indicator coverage in docs/metrics.md, which maps all 32 CSV indicators to 64 dbt metrics.
2. Create New Indicator
Guided workflow - ask user these questions:
- What is the indicator name and description?
- What domain? (System Use, Programmatic, Treatment, Follow-up, Adoption)
- What module? (mwTool, IPC, SBIRT, SPI, FWS, Planning Next Steps, All)
- What is the data source?
  - Questionnaire ID (e.g., Questionnaire/1613532)
  - Task code (e.g., 040 for follow-up)
  - Existing model (e.g., patient, practitioners)
- What linkId or field contains the data? (for questionnaire-based sources)
- What is the data type? (Count, Percent)
- For percent metrics: what is the numerator and denominator?
- What disaggregation? (by organization, by practitioner, by month)
Implementation steps:
A. If new source model needed, create in models/metrics/:
-- models/metrics/new_metric_source.sql
-- ABOUTME: [What this model does]
-- ABOUTME: [What indicator it supports]
{{ config(materialized='view') }}
select
subject_patient_id,
organization_id,
-- metric-specific fields
from {{ ref('source_model') }}
where conditions
B. Add to metrics catalog in macros/metrics.sql:
- id: new_metric_name
unit: count # or percent
grain: day
entity_keys: [organization_id]
source_model: new_metric_source
expression: "count(distinct subject_patient_id)" # for count
# OR for percent:
# numerator: "count(distinct case when condition then subject_patient_id end)"
# denominator: "nullif(count(distinct subject_patient_id), 0)"
description: "Human-readable description"
version: v1
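For percent metrics, the numerator is divided by a null-safe denominator: nullif(denominator, 0) turns an empty group into NULL instead of raising a division-by-zero error. A minimal Python sketch of that semantics (illustrative only, not project code):

```python
def percent_value(numerator: int, denominator: int):
    """Mirror the SQL pattern: numerator / nullif(denominator, 0).

    Returns None (SQL NULL) when the denominator is zero, so groups
    with no eligible rows never raise a division-by-zero error.
    """
    if denominator == 0:
        return None
    return numerator / denominator
```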
C. Rebuild metrics:
cd /Volumes/Biliba/github/engage-analytics/dbt
uv run dbt run --profiles-dir . --select new_metric_source fct_metrics_long
D. Verify:
SELECT * FROM engage_analytics.fct_metrics_long
WHERE metric_id = 'new_metric_name';
3. Add New Questionnaire
When a new form is added to the app:
A. Find questionnaire ID in raw data:
SELECT DISTINCT questionnaire_id
FROM engage_analytics_engage_analytics_stg.stg_questionnaire_response
ORDER BY 1;
B. Extract metadata using metadata_manager.py:
cd /Volumes/Biliba/github/engage-analytics/dbt
python3 metadata_manager.py extract --questionnaire-id NEW_ID
This extracts linkIds and adds them to data/questionnaire_metadata.csv.
C. Review and edit metadata in data/questionnaire_metadata.csv:
- Set anon=TRUE for PII fields (names, DOB, phone, address)
- Set anon=FALSE for non-PII fields
- Add a readable label for each field
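The anon flag is what drives which columns survive into the anonymized view. A small Python sketch of that filtering logic (the column names and sample rows here are assumptions; check the header of the actual data/questionnaire_metadata.csv seed):

```python
import csv
import io

# Hypothetical metadata rows; real columns live in data/questionnaire_metadata.csv
metadata_csv = """questionnaire_id,link_id,label,anon
Questionnaire/1613532,patient-name,Patient name,TRUE
Questionnaire/1613532,visit-reason,Reason for visit,FALSE
"""

def non_pii_fields(raw_csv: str) -> list[str]:
    """Return linkIds safe to expose in the anonymized view (anon=FALSE)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [row["link_id"] for row in reader if row["anon"].upper() == "FALSE"]
```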
D. Generate models using model_generator.py:
# Generate both named and anonymized models
python3 model_generator.py all --table qr_new_form --questionnaire-id NEW_ID
# Or generate separately:
python3 model_generator.py named --table qr_new_form --questionnaire-id NEW_ID
python3 model_generator.py anon --table qr_new_form
E. Build:
uv run dbt seed --profiles-dir . # Reload metadata
uv run dbt run --profiles-dir . --select qr_new_form qr_new_form_anon
Manual model creation (if scripts unavailable):
Named model (models/marts/qr_named/qr_new_form.sql):
{{ config(materialized='view') }}
{% set identifiers = ["Questionnaire/NEW_ID"] %}
{% if identifiers|length == 0 %}
select null::text as placeholder where false
{% else %}
{{ build_qr_wide_readable(identifiers, this.name) }}
{% endif %}
Anonymized model (models/marts/qr_anon/qr_new_form_anon.sql):
{{ config(materialized='view') }}
{{ create_anonymized_qr_view('qr_new_form', []) }}
4. Update Anonymization
Edit data/questionnaire_metadata.csv:
- Set anon=TRUE for PII fields (names, DOB, phone, address, SSN, Medicaid)
- Set anon=FALSE for non-PII fields
Rebuild anonymized view:
uv run dbt seed --profiles-dir .
uv run dbt run --profiles-dir . --select "qr_*_anon"  # quote the wildcard so the shell doesn't expand it
5. Test Metrics
Verify metric logic matches source data:
-- Get metric value
SELECT metric_id, organization_id, value
FROM engage_analytics.fct_metrics_long
WHERE metric_id = 'metric_name';
-- Verify against source
SELECT organization_id, count(distinct subject_patient_id)
FROM engage_analytics.source_model
GROUP BY 1;
6. Export Data to S3
The dataexport/ directory contains tools for exporting data to S3 buckets.
Full refresh + export (recommended):
cd /Volumes/Biliba/github/engage-analytics/dataexport
./run_export.sh anon # Anonymized data only
./run_export.sh pii # PII data only
./run_export.sh # Both (default)
This runs:
- dbt run to refresh all models
- export_to_s3.py to export and upload
Direct export (skip dbt refresh):
cd /Volumes/Biliba/github/engage-analytics/dataexport
uv run python export_to_s3.py --type anon
uv run python export_to_s3.py --type pii
uv run python export_to_s3.py --type both
uv run python export_to_s3.py --type both --delete-local # Remove local files after upload
Environment setup - create dataexport/.env with:
# Database
DBT_HOST=localhost
DBT_PORT=5432
DBT_USER=postgres
DBT_PASSWORD=your_password
DBT_DATABASE=airbyte
# AWS
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
# S3 Buckets
S3_BUCKET_ANON=engage-analytics-exports-anon
S3_BUCKET_PII=engage-analytics-exports-pii
S3 file structure:
s3://engage-analytics-exports-anon/
└── 2024/01/12/
└── engage_analytics_export_anon_20240112_143022.zip
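The date-partitioned key layout above can be reproduced with a small helper. This is a sketch inferred from the example path, not the actual export_to_s3.py implementation:

```python
from datetime import datetime

def export_key(export_type: str, now: datetime) -> str:
    """Build an S3 object key matching the layout shown above:
    YYYY/MM/DD/engage_analytics_export_<type>_YYYYMMDD_HHMMSS.zip
    """
    return (f"{now:%Y/%m/%d}/"
            f"engage_analytics_export_{export_type}_{now:%Y%m%d_%H%M%S}.zip")
```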
What gets exported:
- Anon: qr_*_anon views, patient_anon, resource tables
- PII: qr_* views, patient, resource tables
Common Patterns
Eligibility from mwTool
-- Extract boolean from mwTool (Questionnaire/1613532)
(jsonb_path_query_first(items::jsonb,
'$.**.item[*] ? (@.linkId == "flag-name").answer[0].valueBoolean'))::boolean
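The same recursive extraction can be sketched in Python for spot-checking raw payloads outside the database (function name and the exact item-tree shape are assumptions about the QuestionnaireResponse JSON):

```python
import json

def extract_boolean(items, link_id: str):
    """Recursively search a QuestionnaireResponse item tree for the first
    answer with the given linkId and return its valueBoolean (or None).
    Mirrors the jsonb_path_query_first expression shown above.
    """
    if isinstance(items, str):
        items = json.loads(items)
    for item in items:
        if item.get("linkId") == link_id:
            for answer in item.get("answer", []):
                if "valueBoolean" in answer:
                    return answer["valueBoolean"]
        nested = extract_boolean(item.get("item", []), link_id)
        if nested is not None:
            return nested
    return None
```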
Acceptance from Planning Next Steps
-- Check acceptance field
WHERE planning_next_steps_did_the_client_accept_X = 'true'
Session Tracking
-- Count sessions by intervention type
SELECT intervention_type, count(distinct qr_id)
FROM engage_analytics.intervention_sessions
GROUP BY 1;
Assessment Score Severity
- PHQ-9: Severe 20-27, Moderate 10-19, Mild 5-9, Minimal 0-4
- GAD-7: Severe 15-21, Moderate 10-14, Mild 5-9, Minimal 0-4
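The severity bands above translate directly into lookup functions, which may be handy when validating score-based metrics (function names are illustrative, not from the repo):

```python
def phq9_severity(score: int) -> str:
    """Map a PHQ-9 total score (0-27) to its severity band."""
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 score out of range: {score}")
    if score >= 20:
        return "Severe"
    if score >= 10:
        return "Moderate"
    if score >= 5:
        return "Mild"
    return "Minimal"

def gad7_severity(score: int) -> str:
    """Map a GAD-7 total score (0-21) to its severity band."""
    if not 0 <= score <= 21:
        raise ValueError(f"GAD-7 score out of range: {score}")
    if score >= 15:
        return "Severe"
    if score >= 10:
        return "Moderate"
    if score >= 5:
        return "Mild"
    return "Minimal"
```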