The Last Click Attribution Model Using BigQuery
In this article, you will explore how to easily aggregate the Last Click Attribution Model using BigQuery.
CONTENTS
- Introduction
- Enable the export of Google Analytics 4 data to BigQuery.
- Flatten Table
- Standardize the session UTM values for all events within each session.
- Replace NULL session UTM values with the value “
(direct)
”. - Extract only the necessary columns for calculating attribution.
- Assign journey numbers to each user’s purchase event.
- Assign Priorities to indicate the order of events for applying the Last Click Attribution Model.
- The
session_campaign
with the highest priority will be attributed to 100% of the revenue. - Now, let’s perform the Last Click Attribution aggregation for each channel.
- Let’s create a dashboard using Redash.
1. Introduction
When you go to the Attribution Settings of Google Analytics 4, you can easily choose the Attribution Model that you love.
Google Analytics 4 > Settings > Attribution Settings
Data Driven
- It does not explicitly specify what kind of model is being used, and I think it’s a little risky when it comes to making a marketing decision.
Last Click
- The channel that immediately precedes the purchase is considered to have a 100% contribution.
First Click
- The channel that initially brings in the user is considered to have a 100% contribution.
Linear
- All channels, from the user’s initial acquisition channel to the channel immediately preceding the purchase, are considered to have an equal contribution of 1/N, where N represents the total number of channels.
Google Analytics 4 > Acquisition > User Acquisition
However, due to the post hoc nature of Attribution Settings in Google Analytics 4, there is a limitation where if the marketing strategy changes, you would need to aggregate the data again from the beginning.
Therefore, considering alternatives outside of Google Analytics 4, I have started contemplating using BigQuery to apply Attribution Models more flexibly and regularly monitor them.
In this article, you will explore how to easily aggregate the Last Click Attribution Model using BigQuery.
2. Enable the export of Google Analytics 4 data to BigQuery.
Google Analytics 4 > Settings > BigQuery Links
For detailed info and how to, refer to here.
3. Flatten Table
Unlike Bigtable, a column-wide store NoSQL database, BigQuery is a data warehouse. However, BigQuery still offers a lot of nested columns to handle future additions or deletions of schema, events, parameters, and more. For analytical purposes, it’s necessary to flatten tables composed of these columns to simplify queries.
BigQuery table has nested columns that look like this.
WITH
CTE_flattened AS (
SELECT
-- ==========================================================
-- [Datetime]
-- ==========================================================
event_date, -- Event Date
event_timestamp, -- Event Timestamp
user_first_touch_timestamp, -- User's First Visit Timestamp
-- ==========================================================
-- [User & Session ID]
-- ==========================================================
user_pseudo_id, -- User ID
ga_session_id.value.int_value AS ga_session_id, -- Session ID
ga_session_number.value.int_value AS ga_session_number, -- User's Session Index (starting from 1)
-- ==========================================================
-- [Event Name]
-- ==========================================================
event_name,
-- ==========================================================
-- [Geography & Device]
-- ==========================================================
geo.country, -- Country
device.category AS device_category, -- Device Category
device.operating_system AS device_os, -- Device OS
-- ==========================================================
-- [Acquisition]
-- ==========================================================
-- Landing Info
entrances.value.int_value AS entrances, -- If it's the first landing page of this session (page_view Event only)
REPLACE(REGEXP_REPLACE(LOWER(page_referrer.value.string_value), r'/$', ''), 'https://', '') AS page_referrer,
-- Session UTM
campaign.value.string_value AS session_campaign, -- UTM Campaign (Session-based)
medium.value.string_value AS session_medium, -- UTM Medium (Session-based)
source.value.string_value AS session_source, -- UTM Source (Session-based)
term.value.string_value AS session_term, -- UTM Term (Session-based)
-- First UTM
traffic_source.name AS first_campaign, -- UTM Campaign (First-based)
traffic_source.medium AS first_medium, -- UTM Medium (First-based)
traffic_source.source AS first_source, -- UTM Source (First-based)
-- Manual UTM
collected_traffic_source.manual_campaign_id AS manual_campaign_id, -- UTM Campaign ID (Manual-based)
collected_traffic_source.manual_campaign_name AS manual_campaign, -- UTM Campaign (Manual-based)
collected_traffic_source.manual_medium AS manual_medium, -- UTM Medium (Manual-based)
collected_traffic_source.manual_source AS manual_source, -- UTM Source (Manual-based)
collected_traffic_source.manual_term AS manual_term, -- UTM Term (Manual-based)
collected_traffic_source.manual_content AS manual_content, -- UTM Content (Manual-based)
-- Ads Identifiers
collected_traffic_source.gclid AS manual_gclid, -- Google Click Identifier
collected_traffic_source.dclid AS manual_dclid, -- Google Marketing Platform Identifier
collected_traffic_source.srsltid AS manual_srsltid, -- Google Merchant Center Identifier
-- ==========================================================
-- [Page]
-- ==========================================================
REGEXP_REPLACE(LOWER(device.web_info.hostname), r'/$', '') AS hostname, -- Domain or Subdomain
REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(LOWER(page_location.value.string_value), r'(\?.*)$', ''), r'/$', ''), 'https://', '') AS page_location, -- Current Page URL
-- ==========================================================
-- [Engagement]
-- ==========================================================
session_engaged.value.int_value AS session_engaged, -- If the session is engaged (session_start Event only)
engagement_time_msec.value.int_value AS engagement_time_msec, -- session engagement time (msec)
-- ==========================================================
-- [Click Event]
-- ==========================================================
outbound.value.string_value AS outbound, -- If the click Event is outbound from the current domain
REGEXP_REPLACE(LOWER(link_domain.value.string_value), r'/$', '') AS link_domain, -- Domain from the click Event
REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(LOWER(link_url.value.string_value), r'(\?.*)$', ''), r'/$', ''), 'https://', '') AS link_url, -- Page URL from the click Event
-- ==========================================================
-- [Scroll Event]
-- ==========================================================
percent_scrolled.value.int_value AS percent_scrolled, -- Scroll Percentage (default = 90 only)
-- ==========================================================
-- [Ecommerce Event]
-- ==========================================================
-- View Item ~
ecomm_pagetype.value.string_value AS ecomm_pagetype, -- Type of the Page (product, cart) (view_item, add_to_cart, begin_checkout Event)
ecomm_prodid.value.string_value AS ecomm_prodid, -- Product ID (view_item, begin_checkout, add_to_cart Event)
ecommerce.total_item_quantity AS ecommerce_total_item_quantity, -- Amount of Total Items purchased (view_item, add_to_cart, begin_checkout, purchase Event)
ecommerce.unique_items AS ecommerce_unique_items, -- Amount of Unique Items purchased (view_item, add_to_cart, begin_checkout, purchase Event)
-- Add to Cart ~
ecomm_totalvalue.value.int_value AS ecomm_totalvalue, -- Total Value of Items (add_to_cart, begin_checkout Event)
currency.value.string_value AS currency, -- Currency (add_to_cart, begin_checkout, add_payment_info, purchase Event)
-- Add Payment Info ~
total.value.int_value AS total, -- Total Value (add_payment_info Event only)
-- Purchase ~
ecommerce.transaction_id AS ecommerce_transaction_id, -- Transaction ID (purchase Event only)
event_value_in_usd, -- Total value of purchase (purchase Event only)
ecommerce.shipping_value_in_usd AS ecommerce_shipping_value_in_usd, -- shipping fee (purchase Event only)
-- Lifetime Value
user_ltv.revenue AS user_ltv_revenue, -- User's Lifetime Value
user_ltv.currency AS user_ltv_currency, -- User's Lifetime Value Currency
FROM `iotrust-data.analytics_123456789.events_*` MAIN
LEFT JOIN UNNEST (event_params) AS ga_session_id ON ga_session_id.key = 'ga_session_id'
LEFT JOIN UNNEST (event_params) AS ga_session_number ON ga_session_number.key = 'ga_session_number'
LEFT JOIN UNNEST (event_params) AS session_engaged ON session_engaged.key = 'session_engaged'
LEFT JOIN UNNEST (event_params) AS engagement_time_msec ON engagement_time_msec.key = 'engagement_time_msec'
LEFT JOIN UNNEST (event_params) AS percent_scrolled ON percent_scrolled.key = 'percent_scrolled'
LEFT JOIN UNNEST (event_params) AS page_location ON page_location.key = 'page_location'
LEFT JOIN UNNEST (event_params) AS entrances ON entrances.key = 'entrances'
LEFT JOIN UNNEST (event_params) AS page_referrer ON page_referrer.key = 'page_referrer'
LEFT JOIN UNNEST (event_params) AS campaign ON campaign.key = 'campaign'
LEFT JOIN UNNEST (event_params) AS medium ON medium.key = 'medium'
LEFT JOIN UNNEST (event_params) AS source ON source.key = 'source'
LEFT JOIN UNNEST (event_params) AS term ON term.key = 'term'
LEFT JOIN UNNEST (event_params) AS outbound ON outbound.key = 'outbound'
LEFT JOIN UNNEST (event_params) AS link_domain ON link_domain.key = 'link_domain'
LEFT JOIN UNNEST (event_params) AS link_url ON link_url.key = 'link_url'
LEFT JOIN UNNEST (event_params) AS transaction_id ON transaction_id.key = 'transaction_id'
LEFT JOIN UNNEST (event_params) AS ecomm_prodid ON ecomm_prodid.key = 'ecomm_prodid'
LEFT JOIN UNNEST (event_params) AS ecomm_pagetype ON ecomm_pagetype.key = 'ecomm_pagetype'
LEFT JOIN UNNEST (event_params) AS currency ON currency.key = 'currency'
LEFT JOIN UNNEST (event_params) AS ecomm_totalvalue ON ecomm_totalvalue.key = 'ecomm_totalvalue'
LEFT JOIN UNNEST (event_params) AS total ON total.key = 'total'
LEFT JOIN UNNEST (event_params) AS value ON value.key = 'value'
LEFT JOIN UNNEST (event_params) AS shipping ON shipping.key = 'shipping'
LEFT JOIN UNNEST (event_params) AS tax ON tax.key = 'tax'
WHERE
_table_suffix BETWEEN
FORMAT_DATE('%Y%m%d', 'yyyy-mm-dd')
AND FORMAT_DATE('%Y%m%d', 'yyyy-mm-dd')
AND user_pseudo_id IS NOT NULL
AND ga_session_id.value.int_value IS NOT NULL
AND ga_session_number.value.int_value IS NOT NULL
),
4. Standardize the session UTM values for all events within each session.
For some reason that is not entirely clear, due to the inaccuracies in GA4, there are cases where the UTM values that contributed to the acquisition of specific sessions are occasionally lost.
User ID | Session ID | Event Name | Session Campaign | Session Medium | Session Source |
---|---|---|---|---|---|
Joshua | 12345 | session_start |
(n/a) | (n/a) | (n/a) |
Joshua | 12345 | page_view |
summer-event | owned-media | |
Joshua | 12345 | user_engagement |
summer-event | owned-media | |
Joshua | 12345 | scroll |
(n/a) | (n/a) | (n/a) |
Joshua | 12345 | begin_checkout |
summer-event | owned-media | |
Joshua | 12345 | add_payment_info |
summer-event | owned-media |
I have encountered several weird cases like this.
Therefore, in such cases, if there are session UTM values within the same session, the UTM values from the first event that occurred should be propagated to all the other events within the same session, ensuring they have the same values.
CTE_session_campaign_sequence AS (
SELECT
user_pseudo_id, ga_session_id,
session_campaign,
session_medium,
session_source,
session_term,
CASE
WHEN session_campaign IS NOT NULL THEN
ROW_NUMBER() OVER (
PARTITION BY
user_pseudo_id, ga_session_id,
CASE WHEN session_campaign IS NOT NULL THEN 1 ELSE 0 END
ORDER BY
event_timestamp
)
WHEN session_campaign IS NULL THEN
1
END AS row_num
FROM
CTE_flattened
),
CTE_utm_spread AS ( -- spread session utm parameters throughout each user & session
SELECT
MAIN.* EXCEPT (session_campaign, session_medium, session_source, session_term),
SUB.real_session_campaign AS session_campaign,
SUB.real_session_medium AS session_medium,
SUB.real_session_source AS session_source,
SUB.real_session_term AS session_term
FROM
CTE_flattened MAIN
LEFT JOIN (
SELECT
user_pseudo_id, ga_session_id,
MAX(session_campaign) AS real_session_campaign,
MAX(session_medium) AS real_session_medium,
MAX(session_source) AS real_session_source,
MAX(session_term) AS real_session_term
FROM
CTE_session_campaign_sequence
WHERE
row_num = 1
GROUP BY
user_pseudo_id, ga_session_id
) SUB
ON
MAIN.user_pseudo_id = SUB.user_pseudo_id
AND MAIN.ga_session_id = SUB.ga_session_id
ORDER BY
MAIN.user_pseudo_id, MAIN.ga_session_id, MAIN.event_timestamp
),
5. Replace NULL session UTM values with the value “(direct)
”.
Personally, I tend to consider the “(direct)
” values more as “unknown” rather than the direct traffic. Nevertheless, to ensure that calculations are not affected by NULL values during the aggregation process, here I proceeded with replacing all NULL values with “(direct)
”.
CTE_fill_na AS ( -- Replace NULL session_utms with (direct)
SELECT
* EXCEPT (session_campaign, session_medium, session_source, session_term),
IFNULL(session_campaign, '(direct)') AS session_campaign,
IFNULL(session_medium, '(none)') AS session_medium,
IFNULL(session_source, '(direct)') AS session_source,
IFNULL(session_term, '(direct)') AS session_term
FROM CTE_utm_spread
),
6. Extract only the necessary columns for calculating attribution.
You only need the following columns:
user_pseudo_id
event_date
event_timestamp
session_campaign
session_medium
session_source
conversion
(whether they have made their purchase or not)revenue
(total value of purchase)
CTE_user_sessions_channel_conversion AS (
SELECT
user_pseudo_id,
event_date, event_timestamp,
session_campaign, session_medium, session_source,
CASE
WHEN event_name = 'session_start' THEN 0
WHEN event_name = 'purchase' THEN 1
END AS conversion,
event_value_in_usd AS revenue
FROM
CTE_fill_na
WHERE
event_name IN (
'session_start',
'purchase'
)
ORDER BY
user_pseudo_id, event_timestamp
),
7. Assign journey numbers to each user’s purchase event.
Since a user may make multiple purchases, it’s necessary to recalculate attribution after the first purchase event.
You’ll need to separate each journey when it comes to calculating Joshua’s attribution model.
CTE_purchase_journey AS (
SELECT
user_pseudo_id,
event_date, event_timestamp,
session_campaign, session_medium, session_source,
conversion,
revenue,
COALESCE(cumsum_conversion, 0) + 1 AS purchase_journey
FROM
(
SELECT
*,
SUM(conversion) OVER (
PARTITION BY user_pseudo_id
ORDER BY event_timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS cumsum_conversion
FROM
CTE_user_sessions_channel_conversion
)
ORDER BY
user_pseudo_id, event_timestamp
),
8. Assign Priorities to indicate the order of events for applying the Last Click Attribution Model.
When applying Last Click Attribution mechanically, there is a high possibility of incorrect attribution in cases like the following:
- Joshua was initially acquired through the Summer Event campaign, then explored other information, and later started a new session through a Google search before making a purchase.
- In this case, it should be considered that Joshua made the purchase not primarily due to the effectiveness of our Google SEO strategy, as he was already aware of our website. Instead, it can be attributed to the appeal of the Summer Event, which influenced his decision to make the purchase.
Therefore, I have assigned priorities based on the session_campaign as follows:
- 1 <
direct
< 2 - 2 <
organic
< 3 - 3 <
referral
< 4 - 4 <
identified campaign
CTE_attribution_priority AS ( -- Adjusted Last Click Model
SELECT
*,
CASE
WHEN session_campaign = '(direct)'
THEN 1 + ROW_NUMBER() OVER (
PARTITION BY user_pseudo_id, purchase_journey
ORDER BY event_timestamp
) / 1000
WHEN session_campaign = '(organic)'
THEN 2 + ROW_NUMBER() OVER (
PARTITION BY user_pseudo_id, purchase_journey
ORDER BY event_timestamp
) / 1000
WHEN session_campaign = '(referral)'
THEN 3 + ROW_NUMBER() OVER (
PARTITION BY user_pseudo_id, purchase_journey
ORDER BY event_timestamp
) / 1000
ELSE
4 + ROW_NUMBER() OVER (
PARTITION BY user_pseudo_id, purchase_journey
ORDER BY event_timestamp
) / 1000
END AS attribution_priority
FROM
CTE_purchase_journey
ORDER BY
user_pseudo_id, event_timestamp
),
9. The session_campaign
with the highest priority will be attributed to 100% of the revenue.
I have assigned priority classes based on the campaigns such as direct, organic, referral, and identifiable campaigns. However, within the same class, the attribution will be applied based on the most recent event, following the Last Click Attribution Model. Now, you can call this approach “Joshua’s Adjusted Last Click Attribution Model”. 😄
CTE_revenue_attribution AS (
SELECT
user_pseudo_id,
event_date, event_timestamp,
session_campaign, session_medium, session_source,
conversion,
revenue,
purchase_journey,
CASE
WHEN attribution_priority = MAX(attribution_priority) OVER (
PARTITION BY user_pseudo_id, purchase_journey
)
THEN MAX(revenue) OVER (
PARTITION BY user_pseudo_id, purchase_journey
)
ELSE
NULL
END AS revenue_attribution
FROM
CTE_attribution_priority
ORDER BY
user_pseudo_id, event_timestamp
)
10. Now, let’s perform the Last Click Attribution aggregation for each channel.
SELECT
CASE
WHEN session_campaign = '(organic)' THEN 'Search Engine'
WHEN session_campaign = '(direct)' THEN 'Direct or Unknown'
WHEN session_campaign = '(referral)' THEN 'Social Media, Blogs, etc. (without UTM)'
ELSE session_campaign
END AS session_campaign_edited,
CASE
WHEN session_campaign = '(organic)' THEN session_source
WHEN session_campaign = '(referral)' THEN session_source
ELSE NULL
END AS session_source,
SUM(revenue_attribution) AS revenue,
COUNT(DISTINCT user_pseudo_id) AS acquisition_user_cnt,
SUM(revenue_attribution) / COUNT(DISTINCT user_pseudo_id) AS avg_revenue_per_user,
COUNT(revenue_attribution) AS purchase_cnt,
SUM(revenue_attribution) / COUNT(revenue_attribution) AS avg_revenue_per_purchase
FROM
CTE_revenue_attribution
GROUP BY
session_campaign_edited, session_source
ORDER BY
revenue DESC
;