Looker Studio Function: APPROX_COUNT_DISTINCT
Category: Aggregation function
The APPROX_COUNT_DISTINCT function in Looker Studio (formerly Data Studio) is a powerful tool used to estimate the number of distinct (unique) values in a dataset. Unlike the exact COUNT_DISTINCT, this function uses an approximation method that improves performance on large datasets while offering a high degree of accuracy.
Purpose of the APPROX_COUNT_DISTINCT Function
1. Data Distribution Analysis :
APPROX_COUNT_DISTINCT helps identify the spread and uniqueness of data across large volumes. For example, you can analyze the number of distinct product IDs purchased in different regions without overloading your system.
2. Customer Segmentation :Use this function to estimate the number of unique customers across various segments such as age groups, locations, or product categories. This enables marketers to create more targeted campaigns efficiently.
3. Decision Making :
Accurate estimation of unique entities (e.g., users, sessions, transactions) helps in making quick and informed business decisions, especially when dealing with big data where exact counts are resource-intensive.
Type of Calculation:
| Use Case | Description |
|---|---|
| Performance Benchmark | Track the number of unique users or sessions across performance metrics. |
| Customer Segmentation | Analyze distinct users in marketing cohorts or demographics. |
| Time Series Analysis | Estimate unique visits or interactions across time intervals like days or weeks. |
APPROX_COUNT_DISTINCT(field_name)
| Parameter | Type | Description |
|---|---|---|
| field_name | Dimension | The column or field whose distinct values you want to estimate. |
How Does the APPROX_COUNT_DISTINCT Function Work?
The function uses a probabilistic algorithm (commonly based on HyperLogLog) to estimate the number of distinct values in a dataset. It is particularly useful for handling large-scale data where exact distinct counts are expensive to compute. It runs efficiently and is less memory-intensive.
When Should You Use APPROX_COUNT_DISTINCT?
When dealing with large datasets where performance is a concern.
For estimations in dashboards that refresh frequently.
In use cases where exact accuracy is not critical but speed is important.
During exploratory data analysis, where quicker insights are needed.
Example of APPROX_COUNT_DISTINCT with Result
Example 1: Basic Usage
APPROX_COUNT_DISTINCT(Customer_ID)
Result: Estimates the number of unique customers.
Example 2: Column Usage
APPROX_COUNT_DISTINCT(Session_ID)
Used in a report column to show estimated unique sessions per channel.
Example 3: Handling Decimals and Scientific Notation
APPROX_COUNT_DISTINCT(CAST(Purchase_Amount AS STRING))
When numeric values may appear in scientific notation, casting helps in accurate estimation.
Example 4: Combining with Other Functions
APPROX_COUNT_DISTINCT(CASE WHEN Country = 'USA' THEN User_ID ELSE NULL END)
This estimates unique users from the USA only.
Tips and Tricks
Use
APPROX_COUNT_DISTINCTwhen working with big data sources like BigQuery or GA4.Combine with
CASEstatements for conditional unique counts.Use
CASTto convert data types for better accuracy in estimation.Avoid in cases where exact numbers are business critical (e.g., billing).
Works best with non-volatile, deduplicated data.
It offers faster performance and uses less memory when dealing with large datasets.
It provides a highly accurate estimate—typically within 1-2% of the actual distinct count.
Yes, but it may be unnecessary since exact counting is not resource-heavy for small datasets.
It is supported by major data sources like BigQuery, GA4, and Google Sheets, though availability may vary.
Yes, it can be used within CASE, IF, or mathematical expressions for advanced logic.