HubSpot = ClickHouse Internal Analytics Tables
Core Principle: Query HubSpot CRM data via ClickHouse internal_analytics shard using stg_hubspot_* tables for contacts, forms, engagements, and lead qualification analysis.
HubSpot Tables Schema (Continuant - TD):
graph TD
HS[HubSpot Tables] --> Contacts[stg_hubspot_contacts]
HS --> Forms[stg_hubspot_form_submissions]
HS --> Engage[stg_hubspot_engagements]
Contacts --> CFields[email, pre_qual_status, segment, lead_score]
Forms --> FFields[form_id, submitted_at, values_extract_map]
Engage --> EFields[type: EMAIL/MEETING/CALL/TASK]
Engage --> Meetings[stg_hubspot_engagements_meetings]
Engage --> Calls[stg_hubspot_engagements_calls]
Engage --> Tasks[stg_hubspot_engagements_tasks]
Query Workflow (Occurrent - LR):
graph LR
A[Identify Data Need] --> B[Select Table]
B --> C[Build Query with Joins]
C --> D[Handle Arrays with arrayJoin]
D --> E[Execute via internal.query]
Primary source: 07_HubSpot_Schema_Description.md
Session ID: 76fd56b0-a051-44b2-b9e5-1563e48bd806 (2026-01-29)
1. Tables Reference
| Table | Purpose | Key Fields |
|-------|---------|------------|
| stg_hubspot_contacts | Contact info | email, hubspot_contacts_id, hubspot_contact_pre_qualification_status, hubspot_contact_segment, hubspot_create_datetime |
| stg_hubspot_form_submissions | Form fills | email, hubspot_form_id, hubspot_form_submitted_at, hubspot_page_url, hubspot_values_extract_map |
| stg_hubspot_engagements | All interactions | hubspot_engagements_id, hubspot_engagements_type, hubspot_contact_id_arr, hubspot_engagements_created_at |
| stg_hubspot_engagements_meetings | Meetings | hubspot_event_id, hubspot_event_title, hubspot_event_start_datetime, hubspot_contacts |
| stg_hubspot_engagements_calls | Calls | hubspot_engagement_call_status, hubspot_engagement_call_duration_milliseconds |
2. Pre-Qualification Statuses
Common values in hubspot_contact_pre_qualification_status:
Pre-Qualified- Qualified for salesWent Dark- No responseDisqualified- Not a fitDisqualified – spam- Spam leadActively Working- In progressAdded to sequence- In nurture sequence
3. Python Access Pattern
from data_sources.clickhouse import internal
# Get contacts with pre-qual status
df = internal.query("""
SELECT email, hubspot_contact_pre_qualification_status as status,
hubspot_contact_segment as segment, hubspot_create_datetime
FROM stg_hubspot_contacts
WHERE hubspot_create_datetime >= '2024-01-01'
AND hubspot_contact_pre_qualification_status != ''
""")
# Get form submissions with field values
df_forms = internal.query("""
SELECT email, hubspot_form_submitted_at,
hubspot_values_extract_map['monthly_ad_spend'] as ad_spend,
hubspot_values_extract_map['contact_channel__c'] as channel
FROM stg_hubspot_form_submissions
WHERE hubspot_form_submitted_at >= '2024-01-01'
""")
4. Working with Arrays
HubSpot tables use arrays for contact/company associations. Use arrayJoin to expand:
-- Get all meetings with contact info
SELECT
m.hubspot_event_title,
m.hubspot_event_start_datetime,
arrayJoin(e.hubspot_contact_id_arr) AS contact_id
FROM stg_hubspot_engagements e
JOIN stg_hubspot_engagements_meetings m
ON e.hubspot_engagements_id = m.hubspot_event_id
WHERE e.hubspot_engagements_type = 'MEETING'
5. Common Joins
Forms + Contacts:
SELECT fs.email, fs.hubspot_form_submitted_at,
c.hubspot_contact_pre_qualification_status,
c.hubspot_contact_segment
FROM stg_hubspot_form_submissions fs
JOIN stg_hubspot_contacts c ON fs.email = c.email
Contacts + Engagements:
WITH contact_engagements AS (
SELECT hubspot_engagements_id, hubspot_engagements_type,
arrayJoin(hubspot_contact_id_arr) AS contact_id
FROM stg_hubspot_engagements
)
SELECT c.email, ce.hubspot_engagements_type, COUNT(*) as cnt
FROM stg_hubspot_contacts c
JOIN contact_engagements ce ON c.hubspot_contacts_id = ce.contact_id
GROUP BY c.email, ce.hubspot_engagements_type
6. Form Field Extraction
hubspot_values_extract_map is a Map(String, String). Common fields:
monthly_ad_spend- Budget indicatorcontact_channel__c- How they found ushow_can_we_help- Interest areajobtitle- Job titlecompany- Company namefirstname,lastname,phone
Extract with: hubspot_values_extract_map['field_name']