arrow_back

Apply RFM method to segment customer data

ログイン 参加
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses

Apply RFM method to segment customer data

Lab 1時間 30分 universal_currency_alt クレジット: 2 show_chart 入門
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses
important icon IMPORTANT:

desktop/labtop icon Make sure to complete this hands-on lab on a desktop/laptop only.

check icon There are only 5 attempts permitted per lab.

quiz target icon As a reminder – it is common to not get every question correct on your first try, and even to need to redo a task; this is part of the learning process.

timer icon Once a lab is started, the timer cannot be paused. After 1 hour and 30 minutes, the lab will end and you’ll need to start again.

tip icon For more information review the Lab technical tips reading.

Activity overview

Data transformation techniques are used to prepare data in a way that makes it easier for users to get the answers to their business questions quickly and efficiently. One example of data analysis that often requires transformation is customer segmentation.

Segmentation is the process of dividing data into groups based on common characteristics. This can be done for a variety of purposes, such as understanding customer behavior, identifying trends, and targeting marketing campaigns.

The RFM method is a customer segmentation technique that stands for recency, frequency, and monetary value. Recency refers to the time since the customer's last purchase. Frequency refers to the number of purchases the customer has made. Monetary value refers to the total amount of money the customer has spent.

The RFM method is a data-driven approach to customer segmentation that can be used to identify customers who are most likely to churn, upsell, or cross-sell. It can also be used to target marketing campaigns more effectively.

In this lab, you'll use BigQuery to apply the RFM method to segment customer data to help the marketing team decide how to better target their efforts.

Scenario

TheLook eCommerce has experienced rapid customer growth in recent years. But, despite this growth, customer retention and satisfaction have declined.

As a cloud data analyst for TheLook eCommerce, you’ve been asked to collaborate with Martina, the head of the marketing team, to find a way to better target marketing efforts across the customer base.

To do this, you decide to conduct an RFM analysis by identifying three behaviors for each customer: the recency of their latest purchase, the frequency of purchasing, and the total amount of money spent. The RFM method is a customer segmentation technique that will help you rank customers in four groups including High-Value Customers, Loyal Customers, At Risk Customers, and Persuadable Customers.

This RFM analysis can help the marketing team identify customer behavior and focus their marketing efforts on these behaviors to keep them engaged.To complete the RFM analysis, you’ll use transformation techniques to explore a dataset, aggregate data, join data, derive data, and apply a statistical method to the data.

Here’s how you'll do this task: First, you’ll analyze the data. Second, you’ll use the orders table to determine the most recent orders and how often customers place orders. Third, you’ll use an inner join to deduce the total amount paid by the top 10 clients in 2022. Fourth, you’ll create a Common Table Expression (CTE) to consolidate query results. Finally, you’ll apply a statistical method to RFM calculations for customer segmentation.

Setup

Before you click Start Lab

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges incurred to your personal account.
  • Time to complete the lab---remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. On the left is the Lab Details panel with the following:

    • Time remaining
    • The Open Google Cloud console button
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
    Note: If you need to pay for the lab, a pop-up opens for you to select your payment method.
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.

    Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.

    Note: If the Choose an account dialog displays, click Use Another Account.
  3. If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.

{{{user_0.username | "Google Cloud username"}}}

You can also find the Google Cloud username in the Lab Details panel.

  1. Copy the Google Cloud password below and paste it into the Welcome dialog. Click Next.
{{{user_0.password | "Google Cloud password"}}}

You can also find the Google Cloud password in the Lab Details panel.

Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  1. Click through the subsequent pages:
    • Accept the terms and conditions
    • Do not add recovery options or two-factor authentication (because this is a temporary account)
    • Do not sign up for free trials

After a few moments, the Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left. Google Cloud console menu with the Navigation menu icon highlighted

Task 1. Explore the data

In this task, you'll explore the data available in the thelook_ecommerce dataset.

  1. In the Cloud console, from the Navigation menu (Navigation Menu icon), select BigQuery.
Note: The Welcome to BigQuery in the Cloud Console message box may appear, providing links to the quickstart guide and the release notes for UI updates. Click Done to proceed.
  1. Expand the list of datasets by clicking the drop-down arrow next to the project ID.

  2. Scroll down to the thelook_ecommerce dataset. Expand the dataset and study the tables listed.

Notice the orders and order_items tables once the dataset has been expanded.

  1. Select the orders table and explore the Schema, Details, and Preview tabs.

  2. On the Schema tab, notice the columns (listed under the Field name column) in the table. Study the columns along with their Type information.

  1. Select the Details tab; study the Table info and Storage info sections.

  2. Select the Preview tab; the Preview tab displays the first 1000 rows of a table.

  3. On the Explorer pane, select the order_items table and explore the Schema, Details, and Preview tabs.

Task 2. Find the 10 most recent orders

Recency refers to how recently a customer made a purchase. It is measured in units of time, such as days, weeks, or months. A higher recency score indicates that a customer has made a purchase more recently.

In this task, you'll calculate the recency of customer orders to determine how recently each customer placed an order. For example, if a customer last made a purchase 400 days ago, their recency score would be 400.

  1. In BigQuery Studio, click + Compose new query. A new Untitled tab opens.
Note: Each time you run a new query in the Query Editor, you can either replace the old query by copying and pasting the new query over the previous one in the same Untitled tab, or you can click on the Compose new query (+) icon to open a new Untitled tab to run the query in.
  1. Copy the following query into the Query Editor. Take a moment to review the SQL code. Can you anticipate what the query results will look like?
SELECT user_id AS customer_id, DATE_DIFF(CURRENT_TIMESTAMP(), MAX(created_at), DAY) AS recency, FROM `thelook_ecommerce.orders` GROUP BY user_id ORDER BY recency DESC LIMIT 10;
  1. Click Run.

In the Query results section on the Results tab, a table displays the 10 customers who have placed orders the longest time ago.

Click Check my progress to verify that you have completed this task correctly. Find the 10 most recent orders

Task 3. Determine the order frequency

Customers who place orders more frequently are considered to be more engaged with the brand and are more likely to make repeat purchases in the future.

In this task, you'll determine the frequency of customer orders, which means for each customer, you'll count the total number of orders they placed. For instance, if a customer made 10 separate orders, the frequency for that customer would be 10.

  1. Copy the following query into the Query Editor. Take a moment to review the SQL code. Can you anticipate what the query results will look like?
SELECT user_id AS customer_id, COUNT(order_id) as frequency, FROM `thelook_ecommerce.orders` WHERE created_at >= '2022-01-01' and created_at < '2023-01-01' GROUP BY customer_id ORDER BY frequency DESC LIMIT 10;
  1. Click Run.

Hint: To find the highest frequency, click on the down arrow in the table and select Sort Descending.

Click Check my progress to verify that you have completed this task correctly. Determine the order frequency

Task 4. Determine the total amount spent

Monetary value is the total amount of money a customer has spent with a company over a specified period of time. Identifying the customers who spend the most money can help businesses improve customer relationships and target their marketing efforts to their most profitable customers.

In this task, you'll determine the total amount of money spent in 2022 by the top 10 customers. To do this you'll use an inner join to connect the order and order_items tables. You'll also need to use the SUM() method to add the sale price.

  1. Copy the following query into the Query Editor. Copy the following query into the Query Editor. Take a moment to review the SQL code. Can you anticipate what the query results will look like?
SELECT o.user_id AS customer_id, SUM(oi.sale_price) as monetary FROM `thelook_ecommerce.orders` o INNER JOIN `thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id WHERE o.created_at >= '2022-01-01' and o.created_at < '2023-01-01' GROUP BY customer_id LIMIT 10;
  1. Click Run.

Click Check my progress to verify that you have completed this task correctly. Determine the total amount spent

Task 5. Create a CTE

A CTE or Common Table Expression is a temporary result set in a SQL query. A CTE is created using a WITH clause and a named SELECT statement. CTEs are usually referenced in other SELECT statements. A CTE makes it easier to write more complex queries without having to create a new table.

In this task, you’ll use a CTE to write a query that combines recency, frequency, and monetary computations you made earlier.

  1. Copy the following query into the Query Editor. Take a moment to review the SQL code. Can you anticipate what the query results will look like?
WITH rfm_calc AS ( SELECT o.user_id AS customer_id, DATE_DIFF(CURRENT_TIMESTAMP(), MAX(o.created_at), DAY) AS recency, COUNT(o.order_id) AS frequency, ROUND(SUM(oi.sale_price)) AS monetary FROM `thelook_ecommerce.orders` o INNER JOIN `thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY customer_id ) -- You'll now return values from this CTE SELECT * FROM Rfm_calc;
  1. Click Run.

Click Check my progress to verify that you have completed this task correctly. Create a CTE

Task 6. Apply a statistical method to RFM calculations

RFM values measure how recently a customer has made a purchase, how frequently they make purchases, and how much they spend each time they purchase something. Quantiles divide a set of data into equal parts. For example, tertiles divide a set of data into three equal parts and a quartile divides data into four equal parts.

In this task, you'll segment the customers in this example using their RFM values and quantiles. You'll use the NTILE() function in BigQuery to calculate the quantiles of the RFM values. This will give you a better understanding of how your customers are distributed in terms of their RFM values.

To do this, you'll create a new CTE called rfm_quant. This CTE will use the NTILE() function to calculate the quantiles of the RFM values in the rfm_calc CTE that you created in the previous step.

Once you've created the rfm_quant CTE, you can use it to segment your customers.

Finally, you'll use the CASE statement to assign customer segmentation category names based on the quantiles you produced. A CASE statement is a SQL statement that allows you to evaluate multiple conditions and return a different value based on which condition is met.

  1. Copy the following query into the Query Editor:
WITH rfm_calc AS ( SELECT o.user_id AS customer_id, DATE_DIFF(CURRENT_TIMESTAMP(), MAX(o.created_at), DAY) AS recency, COUNT(o.order_id) AS frequency, ROUND(SUM(oi.sale_price)) AS monetary FROM `thelook_ecommerce.orders` o INNER JOIN `thelook_ecommerce.order_items` oi ON o.order_id = oi.order_id GROUP BY customer_id ), -- Here you're leveraging the rfm_calc CTE and creating another CTE rfm_quant AS ( SELECT customer_id, NTILE(4) OVER (ORDER BY recency) AS recency_quantile, NTILE(4) OVER (ORDER BY frequency) AS frequency_quantile, NTILE(4) OVER (ORDER BY monetary) AS monetary_quantile FROM rfm_calc ) --And then you perform a select query that assigns categories based on quantile logic and returns values SELECT customer_id,recency_quantile, frequency_quantile, monetary_quantile, CASE WHEN monetary_quantile >= 3 AND frequency_quantile >= 3 THEN "High Value Customer" WHEN frequency_quantile >= 3 THEN "Loyal Customer" WHEN recency_quantile <= 1 THEN "At Risk Customer" WHEN recency_quantile >= 3 THEN "Persuadable Customer" END AS customer_segment FROM rfm_quant;
  1. Click Run.

Click Check my progress to verify that you have completed this task correctly. Apply a statistical method to RFM calculations

Conclusion

Great work!

You successfully helped Martina better understand their customers by segmenting the data.

First, you analyzed the data.

Second, you used the orders table to determine the most recent orders and how often customers place orders.

Third, you used an inner join to deduce the total amount paid by the top 10 clients in 2022.

Fourth, you created a CTE to consolidate query results.

Finally, you applied a statistical method to RFM calculations for customer segmentation.

You now have practical experience using BigQuery to calculate how recently and how often customers have made purchases. You also learned how to calculate how much money they have spent by combining two tables. Then, you used those tables and calculations to create a new temporary table. This allowed you to use statistical methods to group customers into segments.

You’re well on your way to understanding how to utilize the RFM method on data sources in BigQuery.

End your lab

Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.

Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.

Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.