Analyzing and Predicting Starbucks Promo Types

Everlyne Nalule
Nov 5, 2020

How can we maximize the effectiveness of offers that have been sent out?

This is the capstone project for Udacity's Data Scientist course. The data set contains simulated data that mimics customer behavior on the Starbucks mobile app. Once every few days, Starbucks sends out an offer, such as a discount or BOGO (buy one get one free), to users of the mobile app. The data set includes 3 files:

  • Portfolio, which contains the recorded offer types
  • Profile, which contains customer profiles
  • Transcript, which contains records of when a person received, viewed, and completed an offer, as well as their transactions.

Problem Statement

Many businesses run promotional incentives to boost sales and thereby profits. These incentives are usually effective, but they also come at a cost and in many cases barely break even. How, then, do we maximize all the potential that these incentives and offers have?

To answer this question, we address the following sub-questions:

What is the relationship between customer demographics and their reaction to the different offers? Is there a specific reason why some clients respond the way they do?

Is it possible to predict which offers are best suited to which customers, in order to maximize the desired goal of increased sales?

Which features most affect positive response to an offer?

Exploratory data analysis was done to tackle the first question, with visuals that can easily be interpreted. For the second question, we applied machine learning models, chiefly Random Forest, because it is one of the most accurate learning algorithms available: for many data sets it produces a highly accurate classifier, and it runs efficiently on large databases. We handled the third question using feature importance.

Data

Every few days, Starbucks sends out an offer to mobile app users. Some users might not receive any offers during certain weeks and not all users receive the same offer. There are 3 types of offers:

  • BOGO (buy one get one free) — spend amount A in ONE purchase before the offer expires to get reward R of equal value to A
  • Discount — spend amount A in ONE OR MORE purchases before the offer expires to get a discount D of equal or lesser value to A (all purchases within the validity period accumulate to meet the required amount A)
  • Informational — only provides information about a product

Customers do not opt into the offers they receive. In other words, a user can receive an offer, never actually view the offer, and still complete it. While these offers were recorded as completed, they really had no influence on the customer because they were not viewed.

There are 3 associated datasets:

  1. Portfolio (10 offers x 6 fields) — metadata for each offer

id — offer ID
offer_type— BOGO, discount, or informational
difficulty — required spending amount to complete the offer
reward — reward for completing the offer
duration — validity period in days (the offer expires after this period)
channels — web, email, mobile, social

2. Profile (17,000 users x 5 fields) — demographic data for each user

age — missing values were encoded as 118
became_member_on — date on which the customer created an account
gender — “M” for male, “F” for female, and “O” for other
id — customer ID
income — annual income of the customer

3. Transcript (306,534 events x 4 fields) — records of events that occurred during the month

event — transaction, offer received, offer viewed, or offer completed
person — customer ID
time — number of hours since the start of the test (begins at time t=0)
value — details of the event (offer metadata for offer-related events and amount for transactions)

A summary of the findings is given below.

Cleaning the Data

Before any analysis could be done, the data sets first had to be cleaned to give accurate results. Each data set was cleaned individually by removing null values, one-hot encoding, and creating dummy variables, so as to obtain the features needed for the analysis and the machine learning.

Data Preprocessing

For the portfolio data, we one-hot encode the channels column to extract web, email, social, and mobile columns, and the offer_type column to get BOGO, informational, and discount columns. We then check to ensure that the table has no null values.
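A minimal sketch of this step, assuming the raw data lives in the project's portfolio.json and that channels holds a Python list per row:

```python
import pandas as pd

portfolio = pd.read_json('portfolio.json', orient='records', lines=True)

# Expand the channels list column into one binary column per channel.
for channel in ['web', 'email', 'mobile', 'social']:
    portfolio[channel] = portfolio['channels'].apply(lambda ch: int(channel in ch))
portfolio = portfolio.drop(columns='channels')

# One-hot encode offer_type (keeping the original label column for later use).
portfolio = portfolio.join(pd.get_dummies(portfolio['offer_type']))

# Sanity check: the cleaned table should contain no null values.
assert portfolio.isnull().sum().sum() == 0
```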

For the profile data, we drop the null values, which were mainly associated with the age of 118, a value that is not plausible; I therefore dropped all rows with an age of 118. I also extracted the membership year of each client from the became_member_on column. Age groups were also created, as we required them for the analysis.
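A sketch of the profile cleaning, assuming profile.json and that became_member_on is stored as a YYYYMMDD integer; the decade-wide age bins here are illustrative:

```python
profile = pd.read_json('profile.json', orient='records', lines=True)

# age == 118 encodes missing demographics; drop those rows plus any nulls.
profile = profile[profile['age'] != 118].dropna()

# Pull the membership year out of became_member_on (e.g. 20170715 -> 2017).
profile['membership_year'] = pd.to_datetime(
    profile['became_member_on'], format='%Y%m%d').dt.year

# Bin ages into decade-wide groups for the later analysis.
profile['age_group'] = pd.cut(
    profile['age'], bins=range(10, 111, 10),
    labels=[f'{b}-{b + 9}' for b in range(10, 101, 10)])
```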

For the transcript data, I extracted the keys from the value column, which were offer_id, amount, and reward, then dropped the value column as it was no longer required. One-hot encoding was done on the event column to give the offer received, offer viewed, and offer completed columns.
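A sketch of the transcript cleaning, assuming transcript.json and that value holds a dict per event (in the raw data the offer key appears both as 'offer id' and 'offer_id', hence the double lookup):

```python
transcript = pd.read_json('transcript.json', orient='records', lines=True)

# Unpack the per-event dict in the value column into flat columns.
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript['reward'] = transcript['value'].apply(lambda v: v.get('reward'))
transcript = transcript.drop(columns='value')

# One-hot encode the event column into received / viewed / completed flags.
transcript = pd.get_dummies(transcript, columns=['event'])
```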

The three data sets were then merged into one clean dataframe, which we used for the analysis.
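The merge itself, sketched under the column names produced above (the join keys and suffixes are assumptions):

```python
# Join offer metadata and customer demographics onto the event log.
starbucks_df = (transcript
                .merge(portfolio, left_on='offer_id', right_on='id', how='left')
                .merge(profile, left_on='person', right_on='id', how='left',
                       suffixes=('_offer', '_customer'))
                .dropna())
```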

[Figure: summary of the final data set]

Exploratory Data Analysis

This includes analyzing data sets to summarize their main characteristics, often with visual methods.

From the visual above, one can clearly observe that whereas numerous offers are received, few are actually completed. This is key information to act on.

The graph above shows the reaction of the different age groups to the offer types: BOGO, discount, and informational. Most of the clients responding to offers are clearly in the 50–60 age group.

Overall, male customers outnumber female customers, which is contrary to popular expectation.

Data Modeling

I wanted to see whether a machine learning approach could predict which offer should be sent to whom, using training and testing data derived from the starbucks_df dataframe I generated.

Some data preprocessing was also done before fitting the data to the models, such as converting categorical data to numerical data.
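A sketch of that conversion, assuming the gender and age_group columns from the cleaning steps above:

```python
# Encode the remaining categorical columns as integers before modelling.
starbucks_df['gender'] = starbucks_df['gender'].map({'M': 0, 'F': 1, 'O': 2})
starbucks_df['age_group'] = starbucks_df['age_group'].cat.codes
```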

Metrics

Supervised machine learning models were used to answer the second question. I used Decision Tree and Random Forest models. A decision tree is built on the entire dataset, using all the features/variables of interest, whereas a random forest randomly selects observations/rows and specific features/variables to build multiple decision trees and then averages their results.

Among the various machine learning classifiers, we used decision trees because they are particularly well suited to classification tasks. They are easy for non-statisticians to interpret and intuitive to follow. They cope with missing values and can combine heterogeneous data types into a single model, while also performing automatic feature selection. However, decision trees are very prone to overfitting, a challenge we overcame using Random Forest, which does not suffer this problem to the same degree, is highly accurate, and can handle missing values.

The independent variables (X) were based on the factors most likely to affect one's decision on whether to respond to an offer or not. They included age, gender, reward, difficulty, etc. There were no complications encountered, save for the fact that the data set was relatively small.
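A sketch of the modelling step. The feature list and the use of offer_type as the target are illustrative: a single label column for the offer type is assumed to have been retained through the cleaning above.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Illustrative feature set; the target is the offer type the customer responded to.
feature_cols = ['age', 'gender', 'income', 'reward', 'difficulty', 'duration']
X = starbucks_df[feature_cols]
y = starbucks_df['offer_type']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit both models and compare their weighted F1 scores on the held-out set.
for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          f1_score(y_test, model.predict(X_test), average='weighted'))
```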

To evaluate the strength of the model, I went with the F1 score, which is used to measure a test's accuracy.

The F1 score is the harmonic mean of precision and recall, and its range is [0, 1]. It tells you both how precise your classifier is (how many of its positive predictions are correct) and how robust it is (whether it misses a significant number of instances).

High precision but low recall gives you an extremely accurate classifier that nevertheless misses a large number of instances that are difficult to classify. The greater the F1 score, the better the performance of our model. Mathematically, it can be expressed as:

F1 = 2 × (precision × recall) / (precision + recall)

I used a random forest classifier, and then grid search to fine-tune the model's hyperparameters and improve the F1 score. (However, this turned out not to be required, as both models returned a score of 1.) The F1 score measures a test's accuracy: its highest possible value is 1, indicating perfect precision and recall, and its lowest possible value is 0.
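A sketch of the tuning step, with an illustrative parameter grid:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Tune a few Random Forest hyperparameters against the weighted F1 score.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, scoring='f1_weighted', cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```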

This indicates that my model is accurate and can correctly predict which offer type a client would respond to, thereby eliminating unnecessary costs and getting the most out of the offers.

I also calculated feature importance on the data to see which features affected the success of an offer the most. This means that before an offer is sent, one could weigh the top features so as to invoke a response where there otherwise would not have been one.
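Pulling the ranking out of the fitted model, continuing the names from the sketches above:

```python
import pandas as pd

# Rank features by the fitted forest's impurity-based importances.
importances = pd.Series(grid.best_estimator_.feature_importances_,
                        index=feature_cols).sort_values(ascending=False)
print(importances)
```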

[Figure: feature importances, from highest to lowest]

Conclusions

  1. People in the age range of 50–65 are the most likely to visit a Starbucks.
  2. Overall, BOGO is the most popular kind of offer type.
  3. Looking at the different age groups, BOGO is more popular than any other type of offer, except among those in their 30s, where it is as popular as the discount offer, and those in their 60s, where the discount offer is more popular. Informational is the least popular of all in all age groups.
  4. In most cases, offers were received but not completed. The discount offer was received by the most customers and also completed the most, followed by BOGO.

I also created a machine learning model using a Random Forest Classifier with an accuracy of 1. I may be getting an accuracy of 1 because I considered only the most important features and dropped all unnecessary ones.

  5. It is also key to note that, contrary to what one would expect, income does not affect the choice of whether to complete an offer; rather, the duration, difficulty, and reward are the key factors.

There may be overfitting, which could be addressed by considering more data: as rows were eliminated due to NaN values and duplicates, the model had less data to work with. The data available on each customer should also be more in-depth, so as to define each individual customer; richer customer features would have helped produce better classification results.

As noted above, this is only a summary of my findings and is subject to correction. More detail can be found on my GitHub.
