LumenMath: How to Better Predict Campaign Performance
This is a sequel article to one of our biggest hits of 2019, LumenMath: The Rule of Three, or Should I Keep This Audience or Platform Running?
For fans of statistics and ad tech, here’s some great news: I have more esoteric probability formulas to share!
For everyone else I have bad news: I’m about to totally upend how you forecast CPA, CPC, CTR (and basically everything except CPM).
The formula I will walk you through in this article provides a more nuanced and (dare I say) accurate way to predict campaign performance. It’s complicated on the surface and requires me to walk you through some pretty heady concepts, but by the end, you will have the tools to execute more effective campaigns.
Let’s get to it.
Why does this matter?
One of the most important parts of keeping your advertising strategy on track is being able to accurately predict campaign performance. Knowing where you’ve been is important, but so is knowing where you’re going.
Small changes can propagate, and a difference of 10%, 20% or 30% (like in our example later) can have huge repercussions, especially early on. Campaign optimization and maintenance hinge on being able to accurately calculate current and future performance. It should come as no surprise that making decisions based on data requires the right formulas and data.
You will be able to use this formula to improve your campaigns, and if you’re not quite ready for that, the concepts I introduce will help you start to think about predictive analysis in the right way.
There’s a difference between past, present and future values.
For better or worse, much of the digital advertising industry uses metrics like CPA, CPC, CTR, and others to measure success, monitor campaigns and predict campaign performance. They’re popular because they provide meaningful information with easy-to-understand formulas.
When you break down the formulas, it’s pretty clear why they’re so relied upon:
CPA = cost / acquisitions
CPC = cost / clicks
CTR = clicks / impressions
These are easy to calculate and rely entirely on information you should generally track anyway.
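As a quick illustration, the three formulas translate directly into code. The function names and the sample totals below are made up for this sketch and are not tied to any ad platform.

```python
def cpa(cost, acquisitions):
    """Cost per acquisition: total spend divided by acquisitions."""
    return cost / acquisitions


def cpc(cost, clicks):
    """Cost per click: total spend divided by clicks."""
    return cost / clicks


def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions


# Hypothetical campaign totals so far
cost, impressions, clicks, acquisitions = 5.00, 1000, 10, 2

print(cpa(cost, acquisitions))   # 2.5
print(cpc(cost, clicks))         # 0.5
print(ctr(clicks, impressions))  # 0.01
```

Note that each of these divides by a count, so they raise an error when that count is zero (for example, no clicks yet).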
However, these formulas only work for past and present values, not future values. In other words, they’re not well equipped to predict campaign performance. There’s a difference between observed data and estimated/forecasted data. Not only are the two measured in different ways, but they can vary in value dramatically.
We’ll get into the math soon, but for now let’s focus on getting an intuitive understanding of why measurements and forecasts are different.
The problem with these metrics (CPA, CPC, etc.) is that, at their core, they calculate the average of values without taking into account the true range of what they’re measuring. OK, let’s break that down in real people terms.
Imagine you bought an orange at the grocery store for $1. You know exactly how much that one orange costs. You could precisely determine the “Cost Per Orange” (CPO = $1 / 1 orange) because you know how much you spent and how much you received.
However, if you intend to go buy more oranges from somewhere else, it might not cost the same. Different stores may have different prices, seasonality may affect supply, and the size or quality of the orange itself may be significantly different.
The cost of your first orange only gives you a rough idea of what future ones may cost.
Continuing this example, imagine you buy 3 more oranges from different stores for $3.00, $1.75 and $0.50. Suddenly our understanding of orange prices has completely changed. After a single purchase, oranges appeared to cost about $1.00; now they can cost anywhere between $0.50 and $3.00.
You can still precisely measure how much you’ve spent on oranges ($1.00 + $3.00 + $1.75 + $0.50 = $6.25) and you can precisely measure how many oranges you bought (1 the first day and 3 more the second day = 4 total oranges). However, this new data has made it much less clear what future oranges may cost. We’ve now seen CPO as low as $0.50 and as high as $3.00.
How could we go about making cost estimates? You may be tempted to use the average of these four values, but that would actually be your past CPO, the very formula we’re trying to improve upon. You could also use either the high or low bound, but that would cause your estimates to be too pessimistic or too optimistic, respectively. Another way would be to take the midpoint of the high and low values: in this case ($0.50 + $3.00) / 2 = $1.75 per orange.
However, the validity of this approach depends on the width of the range. If we came across a $100 orange it would really throw a wrench into this calculation.
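The orange numbers above can be checked with a few lines of code. This is only a sketch of the two naive estimates (average and midpoint); the helper names are invented for illustration, and the $100 orange is the hypothetical outlier from the paragraph above.

```python
prices = [1.00, 3.00, 1.75, 0.50]  # the four oranges bought so far


def average(values):
    """Past 'cost per orange': total spent divided by oranges bought."""
    return sum(values) / len(values)


def midpoint(values):
    """Halfway point between the cheapest and most expensive orange."""
    return (min(values) + max(values)) / 2


print(average(prices))   # 1.5625
print(midpoint(prices))  # 1.75

# One extreme price drags both estimates far away from a typical orange
with_outlier = prices + [100.00]
print(average(with_outlier))   # 21.25
print(midpoint(with_outlier))  # 50.25
```

Neither estimate says anything about how much to trust it, which is the gap a confidence interval fills.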
Now, instead of oranges, think about advertising inventory. While it’s uncommon for you to run across a $0.50 orange one day and a $100 orange the next, it’s not as uncommon to find a wide range (though maybe not this wide) when it comes to buying ad space.
Also, it’s important to note that this example, while helpful for understanding the concepts, leaves out an important nuance. The more oranges you buy, the better idea you will have of what future oranges will cost. But you will have to buy a lot of oranges. Campaigns have a limited timespan. Do you have time to rely on this method to predict campaign performance? Not really. It’s a catch-22: to accurately predict campaign performance, you’d have to wait until most of the campaign has already run.
So, how do we go about calculating a more accurate metric (like CPO) that we can use to predict campaign performance? It turns out there’s a formula to do everything we need, which is to provide a high and low estimate that will improve with more data. However, there’s one last ingredient before we’re ready to use it: confidence.
The role of a confidence interval.
To put it simply, the original method (actions/impressions) will provide a better estimate over time as you get more actions. The method I’m about to introduce (which is based on a confidence interval) is applicable earlier on in a campaign and will still provide fairly accurate predictions. The early stages of a campaign are when decisions and optimizations are most important.
The more accurate your predictions early on, the better your performance will be. And confidence intervals give you more accurate predictions earlier on.
For the sake of brevity (and my sanity to an extent) I glossed over an explanation of a confidence interval in my previous article. So, let me address it in a little more detail here.
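A confidence interval is a range that contains a value with some level of confidence. Confidence is the probability the true value is contained within that interval. A 95% confidence interval would be 95% likely to contain the true value of whatever you’re trying to calculate or predict. (Strictly speaking, the 95% describes the method: if you repeated the measurement many times, about 95% of the intervals built this way would contain the true value.)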
We can get a better idea about what this actually means with a probabilistic model everyone’s familiar with: dice.
I’m talking about a six-sided die here. I love D&D, but let’s stick to easy Platonic solids for now. We also assume this is a fair die.
Rolling a single die can result in any number from 1 to 6. If you had to guess the next result of a roll, there’s no way of knowing exactly what it’ll be.
However, there are 2 things you do know:
- You know the odds of rolling any number are equal.
- You know there are only 6 possible outcomes.
We can restate this idea with confidence and confidence intervals. Your confidence of rolling a 5 is 1/6 ~ 17% (the number of chances for that outcome / the total number of outcomes). You can also say with 100% confidence the outcome will be 1, 2, 3, 4, 5 or 6 (because those are the only options).
This seems kind of obvious, but you can use these two ideas to approach more complicated questions.
What if we only cared about rolling 5 or less? Well, you’d be pretty confident (5/6 ~ 83%) that you could roll that. The true value of the roll will be between 1 and 5 with confidence of 83%. Notice, however, we’re now talking about a range of values instead of a single estimate. This is a confidence interval.
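The dice numbers above are easy to reproduce. The sketch below assumes a fair six-sided die and uses a helper named `confidence`, which is just a label chosen for this example.

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair six-sided die: 1 through 6


def confidence(outcomes):
    """Probability that a single roll lands on one of `outcomes`."""
    favorable = sum(1 for face in FACES if face in outcomes)
    return Fraction(favorable, len(FACES))


print(confidence({5}))              # 1/6
print(confidence({1, 2, 3, 4, 5}))  # 5/6
print(confidence(set(FACES)))       # 1

# As percentages: roughly 17% and 83%
print(f"{float(confidence({5})):.0%}")
print(f"{float(confidence({1, 2, 3, 4, 5})):.0%}")
```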
Now that we understand confidence and confidence intervals we’re ready for the formula.
In my Rule of Three article, we established how to express metrics with no actions as a confidence interval, but what about metrics with actions?
For that, we’re gonna need a lot more math, but let’s review what we’ve covered so far.
- Accurate forecasts are critical for optimizing campaigns.
- Simplified calculations we usually rely on are inaccurate (or at least not very useful) when used to predict campaign performance.
- We need to calculate a confidence interval that:
  - Will produce a high and low estimate.
  - Will improve over time and with more data.
  - Can work with any confidence level.
  - Will work for the null case (a.k.a. when actions = 0).
  - Works when the number of samples or success probability is very low (more about that in the “There are other intervals?” section).
We’re finally ready to pick an algorithm to calculate confidence intervals. I’ve chosen the Clopper–Pearson interval because it addresses some of the deficiencies of other methods, and it also applies when we have no actions.
There are other intervals?
There are a ton of different formulas for confidence intervals. You could feasibly use any of these, but I discourage the Normal Approximation interval due to the fact that it’s “unreliable when … the success probability is close to 0.” That’s going to be the case for basically any action you’re trying to measure.
That’s right, this formula will work for the 0-action (null-action) case that we covered last time as well!
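For reference, the Clopper–Pearson interval is commonly written in terms of quantiles of the Beta distribution. The symbols are labels chosen for this sketch: n is the number of trials, x is the number of successes, α is 1 minus the confidence level, and B(q; a, b) is the q-th quantile of a Beta(a, b) distribution.

```latex
\[
p_{\text{low}} = B\!\left(\tfrac{\alpha}{2};\; x,\; n - x + 1\right),
\qquad
p_{\text{high}} = B\!\left(1 - \tfrac{\alpha}{2};\; x + 1,\; n - x\right)
\]
```

By convention, the lower bound is 0 when x = 0 and the upper bound is 1 when x = n.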
The first thing you may notice is that this formula involves some gnarly math. Whenever the Beta Function is involved, you know it’s gonna get heavy.
Fear not, just because you don’t understand the entire derivation of this formula doesn’t mean you can’t use it.
You only need 3 numbers to power this beast.
- The number of “trials.” In our case, impressions served.
- The number of “successes.” The number of actions recorded in those impressions.
- A confidence level. You can pick whatever you want here, but I’ll provide some benchmarks.
  - Are you willing to lower your confidence for a more narrow estimate of your true metric probability? Try a confidence value like 63%.
  - Dealing with a high-profile client where you can’t afford mistakes? You can crank your confidence up to 95%.
  - Need something in the middle? Try 85%.
OK, it’s getting heavy. Here’s an example.
Let’s try this all out with a simple CPC (cost per click) example.
If we’ve served 1000 impressions, got 10 clicks and we want to specify 95% confidence, we’d get 0.0048 and 0.0183 from our formula.
Notice that this formula doesn’t provide us with estimates of the metric (CPA, CPC, CTR). Instead, it estimates the probability of an impression leading to the specified action. We’ll cover how to convert this back to a true metric shortly.
This means we’d have 95% confidence that the probability of a user clicking on one of these impressions is between 0.48% and 1.83%.
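If you’d like to see where those two bounds come from without any third-party library, here is a minimal standard-library sketch. It is not the statsmodels routine: it finds the same Clopper–Pearson bounds by bisection on the binomial distribution, and the helper names (`binom_cdf`, `solve_decreasing`, `clopper_pearson`) are invented for this example. It is fine for small action counts like these; for large counts a library routine is the more robust choice.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) when X is the number of successes in n trials with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, tol=1e-12):
    """Bisection on [0, 1] for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the solution lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.95):
    """Two-sided Clopper-Pearson interval for k successes in n trials."""
    alpha = 1 - conf
    # Lower bound: the p where P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2
    low = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p where P(X <= k) = alpha/2
    high = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return low, high


low, high = clopper_pearson(10, 1000, conf=0.95)
print(round(low, 4), round(high, 4))  # roughly 0.0048 and 0.0183

# Lower confidence levels give narrower ranges
for level in (0.63, 0.85, 0.95):
    print(level, clopper_pearson(10, 1000, conf=level))
```

The same function handles the zero-action case: with k = 0 the lower bound is simply 0.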
You can then use your high and low estimates to get a range of your cost. Assuming a CPM of $5 (that is, $5 per 1000 impressions), we can head back to high school math and use some good, old-fashioned unit conversion to get the CPC values we care about.
$5 / 1000 impressions * 1 impression / 0.0048 clicks = $1.04
$5 / 1000 impressions * 1 impression / 0.0183 clicks = $0.27
Finally, we arrive at our estimated CPC range: $0.27 – $1.04.
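The unit conversion above can be sketched as a small helper. The function name `cpc_range` is made up for this example, and the $5 CPM is the figure used in the calculation above.

```python
def cpc_range(cpm, p_low, p_high):
    """Convert click-probability bounds into a (best, worst) CPC range."""
    cost_per_impression = cpm / 1000
    # A higher click probability means fewer impressions per click, so a lower CPC
    best_case = cost_per_impression / p_high
    worst_case = cost_per_impression / p_low
    return best_case, worst_case


best, worst = cpc_range(cpm=5.00, p_low=0.0048, p_high=0.0183)
print(f"${best:.2f} to ${worst:.2f}")  # $0.27 to $1.04
```

Note the flip: the low probability bound produces the high cost estimate, and vice versa.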
I think it’s worth comparing this to the old method (laid out at the beginning of this article). If we assume the probability of a click is just the number of clicks / number of impressions we’d have:
$5 / 1000 impressions * 1000 impressions / 10 clicks = $0.50
At first this doesn’t seem so bad: $0.50 is comfortably within the range of our upper and lower bounds (which are $0.27 – $1.04). But let’s compare it to the midpoint of our confidence interval:
($1.04 + $0.27)/2 = $0.655
As we can see, if we want a confidence of 95% our CPC estimate goes up by about $0.15 (from $0.50 to roughly $0.65).
That may not seem like much, but that’s an increase of about 30%.
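Here is the same comparison as a short sketch, using the rounded CPC bounds from above. The variable names are just labels for this example.

```python
cost, clicks = 5.00, 10
naive_cpc = cost / clicks  # the old method: 0.5

cpc_low, cpc_high = 0.27, 1.04  # rounded bounds from the confidence interval
midpoint_cpc = (cpc_low + cpc_high) / 2  # about 0.655

increase = (midpoint_cpc - naive_cpc) / naive_cpc
print(f"{increase:.0%}")  # 31%
```

With the rounded bounds the gap comes out to about 31%, which is the "roughly 30%" quoted above.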
An increase of that size on a $0.50 CPC isn’t much, but imagine this kind of error on a campaign with a $100 CPA (remember, these calculations could apply to any action/metric; CPC is just an example). Worse, incorrectly forecasting metrics could cause you to make optimization decisions that are premature or completely wrong, which could cost you more than just money.
Imagine adjusting your margins thinking you could get $0.50 CPC only to find out weeks later it was closer to $0.65 CPC. That 30% error would suddenly be a lot more important and someone would be in trouble.
So remember, do the math upfront and save yourself a big headache at the end.
How do you actually do this?
As a final footnote, here’s my recommended approach for calculating confidence intervals on your own. If you’re feeling adventurous and want to bust out some good, old-fashioned calculus, you can use the formulas from the previous section to calculate this by hand. However, the much faster method is to use a computer.
You could do this in just about any programming language, but I chose Python for its ease of use and wide variety of statistics libraries.
I’m currently using the statsmodels module with Python 3.7.6.
If you want to give this a try here’s the code you’ll need:
from statsmodels.stats.proportion import proportion_confint

k = 10       # number of successes (e.g. clicks)
n = 1000     # number of trials (e.g. impressions)
conf = 0.95  # confidence level; adjust accordingly

# method='beta' selects the Clopper-Pearson interval
low_bound, high_bound = proportion_confint(k, n, alpha=(1 - conf), method='beta')

# if you want the midpoint of your range:
mid_point = (high_bound + low_bound) / 2.0
Thanks for sticking with me this long. Look forward to LumenMath part 3!