Bayesian Modeling of Marketing Data

In this project I wanted to simulate the process of modeling return on investment (ROI) for marketing campaigns. I started by outlining the general concept: I would use a Bayesian statistical model to analyze marketing campaign data for an imagined company.

The first step in this process was creating reasonable data for a company. (I chose to simulate my own data instead of using an existing company’s because I was unsure of the legal restrictions on using that kind of data.) My plan was to use R to generate 100 simulated marketing campaigns with a randomizer constrained to reasonable values. Because I am not a marketing expert, I consulted ChatGPT to see which variables I would need and what counted as reasonable values.

ChatGPT first recommended splitting the marketing campaigns into four types (necessary for deciding which marketing tool is most effective): social media ads, email, display ads, and events (both in person and online). ChatGPT then suggested key variables for each campaign: campaign ID, type, money spent, successful conversions, revenue, and date.

The last piece of information I got from ChatGPT combined these types and variables: realistic averages for each marketing type. These are the values I used:

-          Social Media Ads: 3% conversion rate, $500 revenue per conversion

-          Email Campaigns: 5% conversion rate, $400 revenue per conversion

-          Display Ads: 2% conversion rate, $300 revenue per conversion

-          Events: 7% conversion rate, $600 revenue per conversion
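As a quick check on what these averages imply, the sketch below stores them in a small R data frame and multiplies conversion rate by revenue per conversion to get the expected revenue per contact reached. It assumes the conversion rates are measured per contact; the names averages, conv_rate, rev_per_conv, and rev_per_contact are illustrative and not taken from the original script.

```r
# Averages from the list above, stored as a small lookup table
averages <- data.frame(
  Type         = c("Social", "Email", "Display", "Event"),
  conv_rate    = c(0.03, 0.05, 0.02, 0.07),  # conversion rate per contact (assumed unit)
  rev_per_conv = c(500, 400, 300, 600)       # dollars per conversion
)

# Expected revenue per contact reached = conversion rate x revenue per conversion
averages$rev_per_contact <- averages$conv_rate * averages$rev_per_conv

# Print the types from highest to lowest expected revenue per contact
print(averages[order(-averages$rev_per_contact), ])
```

This works out to $42 per contact for events, $20 for email, $15 for social media ads, and $6 for display ads, so if every type cost the same to reach a contact, events would lead and display ads would trail.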

Once these values were set, I wrote R code to simulate 100 marketing campaigns, drawing random values within a reasonable margin of these averages. The code is shown to the side here. Its comments explain the general ideas, but I will expand on a few points, especially the generation of the marketing results.

Within the third code chunk (starting on line 22), I added three columns to the data set. These columns hold the results of each campaign. The first column, spent, is the money spent on the campaign; it was drawn from a normal distribution centered on the average spend for that campaign type. The second column, conversions, uses a binomial distribution to simulate how many successful conversions the campaign produced. The final column, revenue, was the simplest: conversions multiplied by revenue per conversion.
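A minimal sketch of this three-column simulation is shown below. The conversion rates and revenue per conversion come from the list above; everything else is an assumption, since the original script is not reproduced here: the average spend per type, the 20% spread of spend, the $100 floor, the cost per contact used to set the number of binomial trials, the seed, and the 2023 date range.

```r
set.seed(42)  # reproducible draws

# Per-type parameters: conv_rate and rev_per_conv are from the text;
# mean_spend and cost_per_contact are HYPOTHETICAL values for illustration
params <- data.frame(
  Type             = c("Social", "Email", "Display", "Event"),
  conv_rate        = c(0.03, 0.05, 0.02, 0.07),
  rev_per_conv     = c(500, 400, 300, 600),
  mean_spend       = c(5000, 2000, 4000, 8000),  # hypothetical
  cost_per_contact = c(4, 4, 4, 4)               # hypothetical
)

n <- 100
campaigns <- data.frame(
  CampaignID = sprintf("C%03d", 1:n),
  Type       = sample(params$Type, n, replace = TRUE),
  Date       = sample(seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day"),
                      n, replace = TRUE)
)

# Look up each campaign's type parameters (one row per campaign)
p <- params[match(campaigns$Type, params$Type), ]

# Spent: normal draw around the type's average spend, floored at $100
# so that no campaign ends up with a zero or negative budget
campaigns$Spent <- pmax(round(rnorm(n, mean = p$mean_spend, sd = 0.2 * p$mean_spend), 2), 100)

# Conversions: binomial draw; the number of trials is the number of
# contacts the budget can reach (an assumption about the trial count)
contacts <- floor(campaigns$Spent / p$cost_per_contact)
campaigns$Conversions <- rbinom(n, size = contacts, prob = p$conv_rate)

# Revenue: conversions multiplied by revenue per conversion
campaigns$Revenue <- campaigns$Conversions * p$rev_per_conv

head(campaigns)
```

Tying the binomial trial count to spend is one design choice among several. It makes conversions, and therefore revenue, grow with budget, which is the relationship a regression of Revenue on Spent sets out to measure.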

With the data simulated, I could move on to some surface-level analysis and visualization to better understand it. Since part of the overall goal is comparing the campaign types, I wrote another section of code to summarize the data by campaign type. This was a simple use of dplyr's group_by() and summarize() functions, as shown on the right.
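The summary step can be sketched with dplyr as follows. To keep the example self-contained it uses four made-up campaigns instead of the simulated data, and the summary column names (n_campaigns, total_spent, total_revenue, avg_conversions) are assumptions.

```r
library(dplyr)

# Toy input for illustration only: four made-up campaigns, not project results
campaigns <- data.frame(
  Type        = c("Email", "Email", "Event", "Event"),
  Spent       = c(1000, 3000, 5000, 7000),
  Conversions = c(10, 20, 40, 60),
  Revenue     = c(4000, 8000, 24000, 36000)
)

# One row per campaign type with counts, totals, and averages
typeInfo <- campaigns %>%
  group_by(Type) %>%
  summarize(
    n_campaigns     = n(),
    total_spent     = sum(Spent),
    total_revenue   = sum(Revenue),
    avg_conversions = mean(Conversions),
    .groups = "drop"
  )

print(typeInfo)
```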

Unsurprisingly, the averages for each type were close to the averages I used to simulate the values. This is far from groundbreaking, but it served as a good sanity check for my code.

Next was a quick calculation of the overall ROI for each campaign type: total revenue divided by total spend. I stored these values in a new column, ROI, of the typeInfo data frame from above.
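A sketch of the ROI column, using made-up per-type totals (not project results). It also computes net ROI, (revenue − spend) / spend, which is the more common textbook definition of ROI; revenue divided by spend is often called return on ad spend (ROAS). The column name ROI_net is an assumption.

```r
library(dplyr)

# Hypothetical per-type totals, for illustration only
typeInfo <- data.frame(
  Type          = c("Email", "Event"),
  total_spent   = c(4000, 12000),
  total_revenue = c(12000, 60000)
)

typeInfo <- typeInfo %>%
  mutate(
    ROI     = total_revenue / total_spent,                  # ratio used in the text
    ROI_net = (total_revenue - total_spent) / total_spent   # net return per dollar
  )

print(typeInfo)
```

Because net ROI is just the revenue ratio minus one, both versions rank the campaign types identically.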

With the calculated ROI, I moved on to creating the Bayesian model. I used the brms package for this.

The first step is to specify the formula the model will follow. I set up revenue as a function of money spent and gave each marketing campaign type its own random intercept. The formula took the following form:

bf(Revenue ~ Spent + (1 | Type))
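Written out as an equation (assuming the gaussian family with identity link used later in the fit), the formula describes the model below, where i indexes campaigns, t[i] is the type of campaign i, and σ and τ are standard deviations:

```latex
\begin{aligned}
\text{Revenue}_i &\sim \mathcal{N}\!\left(\beta_0 + \beta_1\,\text{Spent}_i + u_{t[i]},\ \sigma^2\right) \\
u_t &\sim \mathcal{N}\!\left(0,\ \tau^2\right), \qquad t \in \{\text{Social}, \text{Email}, \text{Display}, \text{Event}\}
\end{aligned}
```

Because β1 is shared across all types, each u_t shifts revenue by a fixed amount at every level of spend; it is the slope β1, not the intercepts, that measures dollars returned per extra dollar spent.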

Next I set the priors for the model to start with. Following ChatGPT’s original recommendation from when I asked it for simulation variables, I kept the priors relatively weak, since the data were simulated. In the code, the strength of a prior corresponds to its allowed spread: normal(0, 5), for example, centers the prior for the intercept at 0 with a standard deviation of 5. Whether a spread of 5 is actually weak depends on the scale of the data: it is wide for an outcome measured in single units, but fairly tight for revenue measured in raw dollars. I coded priors for the slope, the intercept, and the random effect; the code is shown to the side again. (One interesting thing to note is that I used a Cauchy distribution for the standard deviation of the random effect. Because brms restricts standard deviations to be positive, this acts as a half-Cauchy prior, a common choice in Bayesian modeling that seemed appropriate for this simulated situation.) The next step was actually fitting the model.
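The three priors might look like the following in brms. Only normal(0, 5) and the choice of a Cauchy distribution come from the text; reusing normal(0, 5) for the slope and the Cauchy scale of 5 are assumptions.

```r
library(brms)

# Weak priors for the three kinds of parameters in the model
priors <- c(
  prior(normal(0, 5), class = "Intercept"),  # population-level intercept (from the text)
  prior(normal(0, 5), class = "b"),          # slope on Spent (assumed same prior)
  prior(cauchy(0, 5), class = "sd")          # SD of the Type intercepts; brms bounds it
                                             # at zero, so this is effectively half-Cauchy
)

print(priors)
```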

My first attempt at fitting the model produced divergent transitions. To fix this, I added the argument control = list(adapt_delta = 0.95). That still produced divergence warnings, so next I increased adapt_delta to 0.99, raised the iterations from 2000 to 3000, and raised the warmup from 500 to 750. There were still warnings, so I increased the values again. The model finally sampled without divergent transitions using the following arguments (they look complicated, but are straightforward to use if you follow the notation in the brms documentation):

-          formula = mf

-          data = campaigns

-          family = gaussian()

-          prior = priors

-          chains = 4

-          iter = 4000

-          warmup = 1000

-          cores = 4

-          control = list(adapt_delta = 0.995)
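Put together, the brm() call with these arguments might look like the sketch below. The wrapper fit_roi_model() and the divergence counter count_divergences() are hypothetical helpers, and the slope prior and Cauchy scale are assumptions. Note that iter counts warmup, so each chain keeps 4000 − 1000 = 3000 draws, or 12,000 post-warmup draws across the four chains. Raising adapt_delta raises the sampler's target acceptance rate, which shrinks its step size and is the standard first remedy for divergences.

```r
library(brms)

# Model formula from the text: revenue as a function of spend,
# with a random intercept for each campaign type
mf <- bf(Revenue ~ Spent + (1 | Type))

# normal(0, 5) is from the text; using it for the slope and the
# Cauchy scale of 5 on the group-level SD are assumptions
priors <- c(
  prior(normal(0, 5), class = "Intercept"),
  prior(normal(0, 5), class = "b"),
  prior(cauchy(0, 5), class = "sd")
)

# Hypothetical wrapper around brm() with the final settings listed above
fit_roi_model <- function(campaigns) {
  brm(
    formula = mf,
    data    = campaigns,
    family  = gaussian(),
    prior   = priors,
    chains  = 4,
    iter    = 4000,   # per chain, warmup included
    warmup  = 1000,   # discarded, leaving 3000 draws per chain
    cores   = 4,      # run the four chains in parallel
    control = list(adapt_delta = 0.995)  # higher target acceptance, smaller steps
  )
}

# Hypothetical helper: number of post-warmup divergent transitions
count_divergences <- function(fit) {
  np <- nuts_params(fit)  # long format: Chain, Iteration, Parameter, Value
  sum(np$Value[np$Parameter == "divergent__"])
}

# Usage, with a data frame `campaigns` holding Revenue, Spent, and Type:
# ROIbm <- fit_roi_model(campaigns)
# count_divergences(ROIbm)  # the goal is 0
```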

With this complete, I started to look at the results. First I got the overall summary with the command

summary(ROIbm)

After that, I wanted to analyze each marketing campaign type separately, so I extracted the group-level (random) effects for Type with the command

type_differences <- ranef(ROIbm)$Type

This gave the results shown to the side.
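By default, ranef() returns a list with one three-dimensional array per grouping factor, indexed as [level, statistic, parameter], where the statistics are Estimate, Est.Error, Q2.5, and Q97.5. The sketch below is a hypothetical helper, tidy_type_effects(), that flattens the intercept slice into a data frame sorted by estimate and flags the types whose 95% credible interval excludes zero.

```r
# Hypothetical helper: flatten one parameter of ranef(fit)$Type into a
# data frame, sorted from the largest to the smallest estimate
tidy_type_effects <- function(re_type, parameter = "Intercept") {
  m <- re_type[, , parameter]  # matrix: one row per type, one column per statistic
  out <- data.frame(
    Type      = rownames(m),
    Estimate  = m[, "Estimate"],
    Est.Error = m[, "Est.Error"],
    Q2.5      = m[, "Q2.5"],
    Q97.5     = m[, "Q97.5"],
    row.names = NULL
  )
  # TRUE when the whole 95% interval lies on one side of zero
  out$excludes_zero <- out$Q2.5 > 0 | out$Q97.5 < 0
  out[order(out$Estimate, decreasing = TRUE), ]
}

# Usage with the object from the text:
# tidy_type_effects(type_differences)
```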

At first glance, the four types have similar estimated errors (the Est.Error column), but their estimates differ considerably. Events appear to give the greatest return by a wide margin, followed by email, social media, and finally display ads. (Strictly speaking, these random intercepts measure how much revenue each type produces relative to the others at the same level of spend rather than an ROI ratio; a model with a per-type slope, such as Revenue ~ Spent + (1 + Spent | Type), would estimate each type's return on spend directly.) To analyze the differences between the campaign types further, I created a quick plot using ggplot2. The plot confirms events as the clear winner, but it also shows that the credible intervals for email and social media overlap substantially.
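A plot like the one described can be sketched with ggplot2 as a point-and-interval chart. The helper name plot_type_effects() and its styling are assumptions; it expects a data frame with the Type, Estimate, Q2.5, and Q97.5 columns that a ranef() summary provides once the type names are added as a column. The dashed line at zero makes it easy to see which intervals cross zero.

```r
library(ggplot2)

# Hypothetical helper: per-type estimates with 95% credible intervals
plot_type_effects <- function(effects) {
  ggplot(effects, aes(x = reorder(Type, Estimate), y = Estimate,
                      ymin = Q2.5, ymax = Q97.5)) +
    geom_pointrange() +                             # point = estimate, bar = interval
    geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
    coord_flip() +                                  # one horizontal row per type
    labs(x = "Campaign type",
         y = "Random intercept (revenue offset)",
         title = "Per-type effects with 95% credible intervals")
}

# Usage with a fitted model (ROIbm as in the text):
# re <- ranef(ROIbm)$Type[, , "Intercept"]
# plot_type_effects(data.frame(Type = rownames(re), re))
```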

The plot and the data show the estimated return by marketing campaign type, so we can now use what we learned to advise our imagined company. The results can be summarized as follows: by far the most effective and efficient marketing campaigns are event based. Social media and email campaigns are nearly interchangeable in terms of ROI. While they are considerably less efficient than event-based campaigns, their comparative ease of use could make them worthwhile candidates. I would recommend against display advertisements, since their credible interval includes zero, leaving a real chance of no return on investment.

That concludes my analysis for this scenario. A copy of the entire R script I wrote for this project is included below.
