# Marketing Funnel SQL Query Reference

ClickHouse SQL query templates and metric calculation formulas for Improvado's internal marketing attribution analysis.

> **⚠️ TABLE RENAMED (Jan 2026):** `biz_multitouch_all_attribution_metrics_model` → `biz_attribution`. Same schema, new name.

## Table of Contents

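- [Key Columns](#key-columns)
- [Grain Filtering (CRITICAL)](#grain-filtering-critical)
- [Metric Calculations](#metric-calculations)
- [Query Templates](#query-templates)
- [Cost & Efficiency Metrics](#cost--efficiency-metrics)
- [Quarter-over-Quarter Analysis](#quarter-over-quarter-analysis)
- [Standard Channels](#standard-channels)
- [Execution](#execution)
- [Common Query Patterns](#common-query-patterns)
- [Related Documentation](#related-documentation)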

## Key Columns

**Table:** `internal_analytics.biz_attribution`

| Column | Description | Usage |
|--------|-------------|-------|
| `event_date` | Event date | Time filtering, grouping by quarter |
| `metrics_name` | Funnel stage | 'Website Users', 'Lead', 'SQL', 'Closed Won', etc. |
| `channel_type` | Channel attribution | 'Organic', 'Paid', 'Direct', etc. |
| `medium` | Traffic medium | Use for Organic AI override |
| `source` | Traffic source | Use for Organic AI override |
| `linear_weight` | Attribution weight (fractional credit per record) | Always SUM it; never COUNT rows |
| `opportunity_amount` | Deal value | Multiply by linear_weight for ARR |
| `contact_email` | Contact email | Exclude internal emails for Lead and Disco Lead only |
| `spend` | Cost data | Sum for total spend |
| `max_event_datetime` | Event timestamp | Compare with grain columns for filtering |
| `year_grain` | Year boundary | Use for yearly aggregations |
| `quarter_grain` | Quarter boundary | Use for quarterly aggregations |
| `month_grain` | Month boundary | Use for monthly aggregations |
| `week_grain` | Week boundary | Use for weekly aggregations |

## Grain Filtering (CRITICAL)

### Why Grain Filtering?

**Without grain filtering, totals are inflated by roughly 15-20%** because records are double-counted at period boundaries.

### Pre-computed Grain Columns

The table has pre-computed grain columns - use them with `max_event_datetime`:

| Report Period | Grain Column | Pattern |
|---------------|--------------|---------|
| Yearly | `year_grain` | `max_event_datetime = year_grain` |
| Quarterly | `quarter_grain` | `max_event_datetime = quarter_grain` |
| Monthly | `month_grain` | `max_event_datetime = month_grain` |
| Weekly | `week_grain` | `max_event_datetime = week_grain` |

### Using sumIf with Grain Columns

```sql
-- YEAR grain
sumIf(linear_weight, max_event_datetime = year_grain AND metrics_name = 'Website Users')

-- QUARTER grain
sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'SQL')

-- MONTH grain
sumIf(linear_weight, max_event_datetime = month_grain AND metrics_name = 'Lead')

-- WEEK grain
sumIf(linear_weight, max_event_datetime = week_grain AND metrics_name = 'Disco Lead')
```
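The minimal Python sketch below shows what these expressions compute by mirroring `sumIf` over in-memory rows. It sums `linear_weight` only where `max_event_datetime` equals the chosen grain column and `metrics_name` matches. The `sum_if` helper and the sample rows are hypothetical. Only the column names come from the table above, and the timestamps are invented so that one SQL row matches the grain and one does not.

```python
from datetime import datetime


def sum_if(rows, grain_column, metric):
    """Mimic sumIf(linear_weight, max_event_datetime = <grain> AND metrics_name = <metric>)."""
    return sum(
        row["linear_weight"]
        for row in rows
        if row["max_event_datetime"] == row[grain_column] and row["metrics_name"] == metric
    )


# Hypothetical rows: the second SQL row fails the grain condition and is skipped.
rows = [
    {"metrics_name": "SQL", "linear_weight": 0.5,
     "max_event_datetime": datetime(2025, 4, 1), "quarter_grain": datetime(2025, 4, 1)},
    {"metrics_name": "SQL", "linear_weight": 0.5,
     "max_event_datetime": datetime(2025, 5, 1), "quarter_grain": datetime(2025, 4, 1)},
    {"metrics_name": "Lead", "linear_weight": 1.0,
     "max_event_datetime": datetime(2025, 4, 1), "quarter_grain": datetime(2025, 4, 1)},
]

print(sum_if(rows, "quarter_grain", "SQL"))  # 0.5
```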

### Example: Yearly Funnel with Grain

```sql
SELECT
    toYear(event_date) AS year,
    sumIf(linear_weight, max_event_datetime = year_grain AND metrics_name = 'Website Users') AS website_users,
    sumIf(linear_weight, max_event_datetime = year_grain AND metrics_name = 'Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%') AS leads,
    sumIf(linear_weight, max_event_datetime = year_grain AND metrics_name = 'SQL') AS sqls,
    sumIf(linear_weight * opportunity_amount, max_event_datetime = year_grain AND metrics_name = 'SQL') AS sql_arr
FROM internal_analytics.biz_attribution
WHERE event_date >= '2019-01-01' AND event_date < '2026-01-01'
GROUP BY year
ORDER BY year
```

### Example: Quarterly Funnel with Grain

```sql
SELECT
    toStartOfQuarter(event_date) AS quarter,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Website Users') AS website_users,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'SQL') AS sqls,
    sumIf(linear_weight * opportunity_amount, max_event_datetime = quarter_grain AND metrics_name = 'Closed Won') AS closed_won_arr
FROM internal_analytics.biz_attribution
WHERE event_date >= '2024-01-01' AND event_date < '2026-01-01'
GROUP BY quarter
ORDER BY quarter
```

### Example: Monthly Funnel with Grain

```sql
SELECT
    toStartOfMonth(event_date) AS month,
    sumIf(linear_weight, max_event_datetime = month_grain AND metrics_name = 'Website Users') AS website_users,
    sumIf(linear_weight, max_event_datetime = month_grain AND metrics_name = 'Lead') AS leads,
    sumIf(linear_weight, max_event_datetime = month_grain AND metrics_name = 'SQL') AS sqls
FROM internal_analytics.biz_attribution
WHERE event_date >= '2025-01-01' AND event_date < '2026-01-01'
GROUP BY month
ORDER BY month
```
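The three examples differ only in the bucket function and the grain column. The sketch below is a hypothetical Python helper that generates the grain-filtered query for any period in the grain table. The weekly bucket `toMonday(event_date)` is an assumption borrowed from Pattern 2. The metric list is an illustrative subset, and the lead email filter is omitted for brevity.

```python
# Report period -> (bucket expression, grain column), following the grain table.
# Assumption: weeks are bucketed with toMonday(), as in "Group by Time Period".
PERIODS = {
    "year": ("toYear(event_date)", "year_grain"),
    "quarter": ("toStartOfQuarter(event_date)", "quarter_grain"),
    "month": ("toStartOfMonth(event_date)", "month_grain"),
    "week": ("toMonday(event_date)", "week_grain"),
}

# Output alias -> metrics_name value (illustrative subset, no email filter).
METRICS = {
    "website_users": "Website Users",
    "sqls": "SQL",
}


def build_funnel_query(period: str, start: str, end: str) -> str:
    """Return a grain-filtered funnel query for `period` over [start, end)."""
    bucket, grain = PERIODS[period]
    columns = [
        f"    sumIf(linear_weight, max_event_datetime = {grain} "
        f"AND metrics_name = '{name}') AS {alias}"
        for alias, name in METRICS.items()
    ]
    return (
        "SELECT\n"
        f"    {bucket} AS {period},\n"
        + ",\n".join(columns)
        + "\nFROM internal_analytics.biz_attribution\n"
        f"WHERE event_date >= '{start}' AND event_date < '{end}'\n"
        f"GROUP BY {period}\n"
        f"ORDER BY {period}"
    )


print(build_funnel_query("quarter", "2024-01-01", "2026-01-01"))
```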

## Metric Calculations

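### Funnel Counts

These are the base formulas without a grain condition. In period reports, also add `max_event_datetime = <grain>` for the report period, as in the Grain Filtering examples.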

```sql
-- Website Users
SUM(CASE WHEN metrics_name = 'Website Users' THEN linear_weight ELSE 0 END)

-- Leads (WITH email filter)
SUM(CASE WHEN metrics_name = 'Lead'
    AND contact_email NOT LIKE '%@iubridge.com%'
    AND contact_email NOT LIKE '%@kirill-markin.com%'
    AND contact_email NOT LIKE '%@ozma.io%'
    THEN linear_weight ELSE 0 END)

-- Disco Leads (WITH email filter)
SUM(CASE WHEN metrics_name = 'Disco Lead'
    AND contact_email NOT LIKE '%@iubridge.com%'
    AND contact_email NOT LIKE '%@kirill-markin.com%'
    AND contact_email NOT LIKE '%@ozma.io%'
    THEN linear_weight ELSE 0 END)

-- Opportunities (NO filter)
SUM(CASE WHEN metrics_name = 'Opportunity Created' THEN linear_weight ELSE 0 END)

-- SQLs (NO filter)
SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight ELSE 0 END)

-- Closed Won (NO filter)
SUM(CASE WHEN metrics_name = 'Closed Won' THEN linear_weight ELSE 0 END)
```

### Revenue/Pipeline Metrics

```sql
-- SQL ARR (Pipeline Value - NOT revenue!)
SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight * opportunity_amount ELSE 0 END)

-- Closed Won ARR (Actual Revenue)
SUM(CASE WHEN metrics_name = 'Closed Won' THEN linear_weight * opportunity_amount ELSE 0 END)
```
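This Python sketch assumes that `linear_weight` follows the common linear multi-touch convention, where each touchpoint gets an equal share and the shares for one opportunity sum to 1. That is an assumption, not something the schema states. Under it, `SUM(linear_weight)` counts the deal once, and `SUM(linear_weight * opportunity_amount)` recovers its full value while splitting it across channels. The rows and the $40,000 amount are hypothetical.

```python
# Hypothetical SQL-stage rows for one $40,000 opportunity touched by 4 channels.
# Assumption: linear attribution gives every touch an equal share of credit.
touches = ["Organic", "Paid", "Direct", "Event"]
amount = 40_000
rows = [
    {"channel_type": ch, "linear_weight": 1 / len(touches), "opportunity_amount": amount}
    for ch in touches
]

sql_count = sum(r["linear_weight"] for r in rows)                           # SQLs
sql_arr = sum(r["linear_weight"] * r["opportunity_amount"] for r in rows)   # SQL ARR
per_channel_arr = {
    r["channel_type"]: r["linear_weight"] * r["opportunity_amount"] for r in rows
}

print(sql_count)               # 1.0
print(sql_arr)                 # 40000.0
print(per_channel_arr["Paid"]) # 10000.0
```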

## Query Templates

### Template 1: Full Funnel by Quarter (with grain)

```sql
SELECT
    toStartOfQuarter(event_date) AS quarter,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Website Users') AS website_users,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%') AS leads,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Disco Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%') AS disco_leads,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Opportunity Created') AS opportunities,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'SQL') AS sqls,
    sumIf(linear_weight * opportunity_amount, max_event_datetime = quarter_grain AND metrics_name = 'SQL') AS sql_arr,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Closed Won') AS closed_won,
    sumIf(linear_weight * opportunity_amount, max_event_datetime = quarter_grain AND metrics_name = 'Closed Won') AS closed_won_arr
FROM internal_analytics.biz_attribution
WHERE event_date >= '2025-01-01' AND event_date < '2026-01-01'
GROUP BY quarter
ORDER BY quarter
```

### Template 2: Channel Breakdown (Current Quarter with grain)

```sql
SELECT
    CASE
        WHEN medium = 'referral' AND source IN ('chatgpt.com', 'perplexity.ai', 'gemini.google.com')
        THEN 'Organic AI'
        ELSE channel_type
    END AS channel,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Website Users') AS website_users,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%') AS leads,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Disco Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%') AS disco_leads,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'Opportunity Created') AS opportunities,
    sumIf(linear_weight, max_event_datetime = quarter_grain AND metrics_name = 'SQL') AS sqls,
    sumIf(linear_weight * opportunity_amount, max_event_datetime = quarter_grain AND metrics_name = 'SQL') AS sql_arr
FROM internal_analytics.biz_attribution
WHERE toStartOfQuarter(event_date) = toStartOfQuarter(today())
GROUP BY channel
ORDER BY sql_arr DESC  -- CRITICAL: Rank by SQL ARR, not SQL count!
```
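The Python mirror of the Template 2 `CASE` expression below lets you check the Organic AI override on sample values before you change the source list. `channel` and `AI_REFERRAL_SOURCES` are hypothetical names. The matching follows the SQL: exact, case-sensitive equality on `medium` and `source`, with a fallback to `channel_type`.

```python
# Same AI referral sources as the CASE expression in Template 2.
AI_REFERRAL_SOURCES = {"chatgpt.com", "perplexity.ai", "gemini.google.com"}


def channel(medium: str, source: str, channel_type: str) -> str:
    """Return 'Organic AI' for AI referrals, otherwise the stored channel_type."""
    if medium == "referral" and source in AI_REFERRAL_SOURCES:
        return "Organic AI"
    return channel_type


print(channel("referral", "chatgpt.com", "Other"))   # Organic AI
print(channel("referral", "example.com", "Other"))   # Other
print(channel("organic", "chatgpt.com", "Organic"))  # Organic
```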

### Template 3: N-Day Pace Comparison

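Compare the first N days of the current quarter with the same first N days of prior quarters, so an incomplete quarter is measured against a like-for-like window. This example uses N = 28 for the quarters of 2025: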

```sql
SELECT
    toStartOfQuarter(event_date) AS quarter,
    SUM(CASE WHEN metrics_name = 'Website Users' THEN linear_weight ELSE 0 END) AS website_users,
    SUM(CASE WHEN metrics_name = 'Lead'
        AND contact_email NOT LIKE '%@iubridge.com%'
        AND contact_email NOT LIKE '%@kirill-markin.com%'
        AND contact_email NOT LIKE '%@ozma.io%'
        THEN linear_weight ELSE 0 END) AS leads,
    SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight ELSE 0 END) AS sqls,
    SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight * opportunity_amount ELSE 0 END) AS sql_arr
FROM internal_analytics.biz_attribution
WHERE (
    (event_date >= '2025-01-01' AND event_date < '2025-01-29') OR  -- Q1 first 28 days
    (event_date >= '2025-04-01' AND event_date < '2025-04-29') OR  -- Q2 first 28 days
    (event_date >= '2025-07-01' AND event_date < '2025-07-29') OR  -- Q3 first 28 days
    (event_date >= '2025-10-01' AND event_date < '2025-10-29')     -- Q4 first 28 days
)
GROUP BY quarter
ORDER BY quarter
```

**Why pace analysis matters:**
- Incomplete quarters show artificially low totals
- First N days comparison shows true run-rate
- Enables accurate quarter-over-quarter projections
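Template 3 hardcodes N = 28 and the year 2025. The minimal Python sketch below (hypothetical helper names) derives N from a given date and regenerates the same `WHERE` windows. It assumes all compared quarters fall in one calendar year and that N stays below 90 days, the shortest quarter length, so no window spills into the next quarter.

```python
from datetime import date, timedelta


def quarter_start(d: date) -> date:
    """Python equivalent of ClickHouse toStartOfQuarter for a date."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def pace_where_clause(year: int, n_days: int) -> str:
    """Build the Template 3 WHERE clause: first n_days of each quarter of `year`."""
    windows = []
    for month in (1, 4, 7, 10):
        start = date(year, month, 1)
        end = start + timedelta(days=n_days)  # exclusive upper bound
        windows.append(f"(event_date >= '{start}' AND event_date < '{end}')")
    return "WHERE (\n    " + " OR\n    ".join(windows) + "\n)"


today = date(2025, 10, 29)                 # hypothetical "today"
n = (today - quarter_start(today)).days    # fully elapsed days this quarter
print(n)                                   # 28
print(pace_where_clause(2025, n))
```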

## Cost & Efficiency Metrics

### Marketing Spend Calculation

```sql
SELECT
    SUM(spend) AS total_marketing_spend
FROM internal_analytics.biz_attribution
WHERE metrics_name IN (
    'Ads Spend',
    'Other Marketing Costs',
    'Marketing Payroll',
    'Event Cost',
    'Sales Payroll',
    'Other Sales Costs'
)
AND toStartOfMonth(event_date) >= toStartOfQuarter(today())
AND toStartOfMonth(event_date) < addQuarters(toStartOfQuarter(today()), 1)
```

**Important:** Cost metrics use **month grain** (toStartOfMonth), not daily grain.

### Efficiency Calculations

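Once you have the funnel metrics and the spend total, compute the ratios below. The names are the column aliases used in the queries above:

```sql
-- Inputs: total_marketing_spend (spend query);
-- leads, sqls, sql_arr, closed_won, closed_won_arr (funnel queries)

-- Cost per SQL
total_marketing_spend / sqls

-- Cost per Lead
total_marketing_spend / leads

-- Marketing ROI (pipeline multiple)
sql_arr / total_marketing_spend
-- Example: $1.1M / $38K ≈ 29x ($1 spent generates ~$29 in pipeline)

-- Cost per Closed Won Customer
total_marketing_spend / closed_won

-- Revenue ROI
closed_won_arr / total_marketing_spend
```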
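The funnel and spend results come from separate queries, so the ratios are often combined in a script. The sketch below uses a hypothetical `efficiency` helper that returns `None` instead of raising when a denominator is zero, which happens early in a quarter when there are no SQLs or wins yet. Only the $38K spend and $1.1M SQL ARR come from the example above. The lead count and zero values are hypothetical.

```python
def efficiency(total_spend, leads, sqls, sql_arr, closed_won, closed_won_arr):
    """Efficiency ratios from funnel + spend results; None where the denominator is 0."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "cost_per_lead": ratio(total_spend, leads),
        "cost_per_sql": ratio(total_spend, sqls),
        "cost_per_closed_won": ratio(total_spend, closed_won),
        "pipeline_roi": ratio(sql_arr, total_spend),        # SQL ARR / spend
        "revenue_roi": ratio(closed_won_arr, total_spend),  # Closed Won ARR / spend
    }


m = efficiency(total_spend=38_000, leads=190, sqls=0,
               sql_arr=1_100_000, closed_won=0, closed_won_arr=0)
print(round(m["pipeline_roi"], 1))  # 28.9
print(m["cost_per_lead"])           # 200.0
print(m["cost_per_sql"])            # None
```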

## Quarter-over-Quarter Analysis

### Full Quarter Comparison (for completed quarters)

```sql
WHERE event_date >= '2025-01-01' AND event_date < '2026-01-01'
GROUP BY toStartOfQuarter(event_date) AS quarter
ORDER BY quarter
```

### Current Quarter vs Prior Quarters (full timeframe)

```sql
-- Get current quarter start
toStartOfQuarter(today())

-- Get last 4 quarters
WHERE event_date >= addQuarters(toStartOfQuarter(today()), -3)
  AND event_date < addQuarters(toStartOfQuarter(today()), 1)
GROUP BY toStartOfQuarter(event_date)
```
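The small Python sketch below reproduces the window arithmetic (`toStartOfQuarter` plus `addQuarters(..., -3)` and `addQuarters(..., 1)`), so you can check which quarters a given day selects. The helper names are hypothetical, and `add_quarters` only handles first-of-month dates such as quarter starts.

```python
from datetime import date


def add_quarters(d: date, n: int) -> date:
    """Shift a first-of-month date by n quarters (like addQuarters on a quarter start)."""
    months = d.year * 12 + (d.month - 1) + 3 * n
    return date(months // 12, months % 12 + 1, 1)


def last_quarters_window(today: date, count: int = 4):
    """Return [start, end) covering the current quarter plus count - 1 prior quarters."""
    current = date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)  # toStartOfQuarter
    return add_quarters(current, -(count - 1)), add_quarters(current, 1)


print(last_quarters_window(date(2026, 2, 10)))
# (datetime.date(2025, 4, 1), datetime.date(2026, 4, 1))
```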

## Standard Channels

When analyzing channel breakdown results:

- **Organic** - SEO, organic search (zero direct cost - pure ROI channel)
- **Paid** - Paid advertising (controllable, scalable)
- **Direct** - Direct traffic (**UNCONTROLLABLE** - brand strength signal, do NOT recommend scaling)
- **Marketing Outbound** - Outbound campaigns, ABM
- **Event** - Conferences, webinars
- **Other** - Mixed/unclassified
- **Unattributed** - Attribution gap (tech limitations)
- **Organic AI** - AI search engines (ChatGPT, Perplexity, Gemini); requires the `medium`/`source` override shown in Template 2

## Execution

### Using ch internal

```bash
# Activate venv (NOT claude_venv)
source venv/bin/activate

# Execute query against Palantir shard
python ch internal "YOUR_SQL_QUERY" --palantir
```

**Important:**
- Always use `--palantir` flag for internal_analytics database
- Use double quotes for SQL query
- Use single quotes inside SQL
- Activate `venv`, not `claude_venv`
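When you call the same CLI from a Python script, you can pass the query as a separate argument instead of a shell string. That avoids the double/single quote juggling entirely. The sketch below assumes it runs from the directory containing the `ch` script and inside the activated `venv`, so `sys.executable` is the venv's Python. The helper names are hypothetical, and because the output format of `ch` is not documented here, the function returns raw stdout.

```python
import subprocess
import sys


def ch_internal_command(query: str) -> list[str]:
    """argv equivalent of: python ch internal "<query>" --palantir"""
    return [sys.executable, "ch", "internal", query, "--palantir"]


def run_ch_internal(query: str) -> str:
    """Run the query through the ch script; raises CalledProcessError on non-zero exit."""
    result = subprocess.run(
        ch_internal_command(query), capture_output=True, text=True, check=True
    )
    return result.stdout


query = (
    "SELECT toStartOfQuarter(event_date) AS quarter, "
    "SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight ELSE 0 END) AS sqls "
    "FROM internal_analytics.biz_attribution "
    "WHERE event_date >= '2025-01-01' GROUP BY quarter ORDER BY quarter"
)
# print(run_ch_internal(query))  # needs the repo's ch script and Palantir access
```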

### Example Full Execution

```bash
source venv/bin/activate && python ch internal "SELECT toStartOfQuarter(event_date) AS quarter, SUM(CASE WHEN metrics_name = 'SQL' THEN linear_weight ELSE 0 END) AS sqls FROM internal_analytics.biz_attribution WHERE event_date >= '2025-01-01' GROUP BY quarter ORDER BY quarter" --palantir
```

## Common Query Patterns

### Pattern 1: Filter by Date Range

```sql
-- Specific quarter
WHERE toStartOfQuarter(event_date) = '2025-01-01'

-- Date range
WHERE event_date >= '2025-01-01' AND event_date < '2025-04-01'

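-- Last 4 completed quarters (use -N for N quarters; excludes the current quarter)
WHERE event_date >= addQuarters(toStartOfQuarter(today()), -4)
  AND event_date < toStartOfQuarter(today())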
```

### Pattern 2: Group by Time Period

```sql
-- By quarter
GROUP BY toStartOfQuarter(event_date) AS quarter

-- By month
GROUP BY toStartOfMonth(event_date) AS month

-- By week
GROUP BY toMonday(event_date) AS week
```

### Pattern 3: Email Filtering (Lead and Disco Lead ONLY)

```sql
AND contact_email NOT LIKE '%@iubridge.com%'
AND contact_email NOT LIKE '%@kirill-markin.com%'
AND contact_email NOT LIKE '%@ozma.io%'
```

**NEVER apply to:** Opportunity Created, SQL, Closed Won
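The same three exclusions are repeated across many queries. The sketch below is a hypothetical Python helper that renders them from a single domain list and checks an email the way `LIKE '%@domain%'` does, with a case-sensitive substring match. The helper names are assumptions, and the domains are the ones listed above.

```python
# Internal domains excluded from Lead and Disco Lead counts (from Pattern 3).
INTERNAL_DOMAINS = ["iubridge.com", "kirill-markin.com", "ozma.io"]


def email_filter_sql(domains=INTERNAL_DOMAINS) -> str:
    """Render the Pattern 3 exclusion lines for a list of domains."""
    return "\n".join(f"AND contact_email NOT LIKE '%@{d}%'" for d in domains)


def is_internal(email: str, domains=INTERNAL_DOMAINS) -> bool:
    """Same test as LIKE '%@domain%': '@domain' appears anywhere in the email."""
    return any(f"@{d}" in email for d in domains)


print(email_filter_sql())
print(is_internal("someone@ozma.io"))    # True
print(is_internal("buyer@example.com"))  # False
```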

## Related Documentation

- **Main Skill:** `../SKILL.md` - Overview, concepts, workflow
- **Full Guide:** `algorithms/revenue_div/marketing_dpt/01_projects/mcp_analytics_with_claude/mcp_instructions.md` (relative to repository root, if it exists)
