Are you investing heavily in pay-per-click (PPC) advertising without seeing significant returns? Do you find it challenging to determine the optimal bids for specific keywords? Are you struggling to effectively allocate your budget across various keywords?
If you answered yes to any of these questions, we're here to assist you!
This blog is aimed at individuals grappling with driving traffic or generating revenue through PPC advertising, seeking a quantitative method to establish dynamic and optimal bids with minimal human intervention.
Pay-per-click (PPC) advertising stands as one of the most effective tools for driving targeted traffic to websites. At its core lies the Google Ads auction, a dynamic marketplace where advertisers bid on keywords to display their ads prominently in search engine results. With PPC, you only pay when a potential customer clicks on your ad, making it a cost-effective method to reach your target audience. Understanding this auction process is crucial for optimising your PPC campaigns and maximising return on investment (ROI).
Most businesses opt to use Google Smart Bidding to automate and optimise their bidding strategies. This approach leverages Google's machine learning algorithms to adjust bids in real time based on various signals, aiming to maximise conversions or achieve other specified campaign objectives. This out-of-the-box approach allows businesses to expend minimal effort in the setup and design of their Google Ads campaigns. Approximately 80% of Google Ads users use Google Smart Bidding, making it a very popular product.
However, this black-box approach is not always ideal for businesses aiming to maximise the effectiveness of their PPC advertising and who want to understand where their money is going. Manual bidding allows companies to have direct control over their bidding strategies, enabling them to tailor bids based on specific campaign goals, market conditions and performance data. By manually setting bids, businesses can adjust their strategies in real-time, capitalise on opportunities and optimise their ad spend more effectively. This method empowers advertisers to finely tune their campaigns, ensuring that budget allocations align closely with the desired outcomes and optimise the return on their advertising investment.
Let’s delve into understanding the mechanics of the Google auction. Imagine we're a vegetable company eager to increase website traffic, particularly targeting keywords like “healthy eating” or “vitamin C”. Our objective is clear: drive traffic and achieve a defined return on investment set by our finance team. The question arises: how do we determine the optimal bid for keywords like “healthy eating”?
Consider the extremes: bidding nothing results in zero sessions and zero expenditure, achieving no visibility. Conversely, bidding £100 per click may reliably secure the top ad position, yet it comes with the risk of depleting our budget rapidly without ensuring a proportional increase in conversions or profitability. It is worth noting that Google uses a second-price auction model, so you won’t necessarily pay £100 per click: the price of a click is driven by the next-highest competing bid rather than by your own maximum. These extreme examples provide us with some constraints on the expected distribution of sessions or revenue as a function of bid.
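To make the second-price idea concrete, here is a minimal sketch. It assumes a heavily simplified rule in which the winner pays one penny more than the highest competing bid; the real Google Ads auction also weighs factors such as ad quality, which this toy ignores. The function name and the example bids are hypothetical.

```python
def second_price_cost(our_bid, competitor_bids, increment=0.01):
    """Simplified second-price rule (an assumption, not Google's exact formula).

    Returns the cost per click if we win the top slot, or None if we lose.
    """
    top_rival = max(competitor_bids, default=0.0)
    if our_bid <= top_rival:
        return None  # outbid: no top position, no cost
    # Pay just enough to beat the best rival, never more than our own bid.
    return round(min(our_bid, top_rival + increment), 2)


# Bidding £100 against made-up rivals at £1.20 and £0.80.
print(second_price_cost(100.0, [1.20, 0.80]))
print(second_price_cost(1.00, [1.20, 0.80]))
```

Under this toy rule the £100 bid wins but pays only slightly more than the best rival, while the £1.00 bid does not win the top slot at all.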
We can borrow the idea of a “dose response curve” from biochemistry, which quantifies the transition from minimal, to tolerable, to harmful effects of applying a dose of some stimulus, such as a drug or nutrients. The dose response curve is characteristically an S-shaped curve, where initially, small increases in dose result in minimal responses, followed by a rapid increase in response, which then levels off as the dose continues to increase.
Applying this analogy in our non-biochemical setting, the dose is the bid that we submit to the auction, and the response is the number of sessions achieved. We want to bid high enough to get a significant number of sessions, but not so high that this response is outweighed by the detrimental effects of an “overdose”: marketing overspend.
This model can help advertisers understand the optimal point where bids maximise sessions, and ensure efficient allocation of advertising budget in Google Ads campaigns. The dose response sigmoid describing how the expected number of sessions, s, varies with the bid, x, has the form
where the resulting curve is characterised by three parameters:
In the plot below, we see three different dose response curves for three different keywords. We will soon see how we can use these models to determine an optimal bid for each keyword.
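The dose response sigmoid described above can be sketched in code. The exact formula is not reproduced here, so the snippet assumes a Hill-type curve, one common three-parameter S-shaped form. The parameter names (s_max for the session ceiling, half_bid for the bid that achieves half of that ceiling, steepness for how sharply the curve rises) and all numbers are illustrative assumptions rather than the model used in practice.

```python
def expected_sessions(bid, s_max, half_bid, steepness):
    """Assumed Hill-type dose response: expected sessions for a given bid."""
    if bid <= 0:
        return 0.0  # bidding nothing yields no sessions
    ratio = (bid / half_bid) ** steepness
    return s_max * ratio / (1.0 + ratio)


# Hypothetical keywords with made-up parameters, for illustration only.
keywords = {
    "healthy eating": (400.0, 1.2, 3.0),
    "vitamin C": (150.0, 0.6, 4.0),
}

for name, params in keywords.items():
    # Evaluate each curve at bids of £0.00, £0.50, ..., £3.00.
    curve = [round(expected_sessions(b / 2, *params), 1) for b in range(7)]
    print(name, curve)
```

This form respects the two extremes discussed earlier: a zero bid gives zero sessions, and very large bids flatten out at s_max rather than growing without limit.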
We've explored how to model the expected number of sessions based on our bidding strategy for each keyword. Now, the question is: how do we strategically determine our bids for each keyword to meet our return on investment goals and maximise traffic to our vegetable business’s website?
To begin, we formulate our goal: bid to maximise the total number of sessions achieved. Without imposing constraints, the obvious solution here is to bid big: the more we bid, the more sessions we get! Of course, it is healthy to set guardrails on the maximum allowed bid; perhaps we’ve observed in the past that bidding over £3 yields diminishing returns, and so we set a global limit of £3. While these figures vary depending on the business and its specific keywords, digital marketing teams typically have a general understanding of these thresholds.
Even with these guardrails, the solution is to bid big. Well, as big as possible: every keyword would get a bid of £3, or whatever the upper limit might be.
The key decision is how to balance the trade-off between the cost of our PPC investments and their value. One approach is to aim for a target return on investment that satisfies our finance team’s expectations. Typically, this target is somewhat flexible. For instance, our finance team may aim for a £1.50 return for every £1 spent on PPC, with a tolerance of plus or minus 10p to give some wiggle room for maximising the number of sessions we observe. The key consequence of this constraint is that bidding as high as possible on every keyword is unlikely to be allowed: some bids will need to be reduced to achieve the target ROI, while bids on some valuable keywords could be increased.
Of course, to implement this constraint we need to know the expected marginal return, or value, attributable to a session for each keyword. In our post we will assume that this is given to us, but attribution modelling is a complete topic of its own.
We also need to consider our budget constraint, which is often specified as a daily limit. Therefore, while striving to maximise sessions and achieve our target ROI, it's crucial to ensure that we do not exceed our allocated budget.
These business constraints provide us with the ingredients needed to build an optimiser!
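As a rough sketch of how these ingredients fit together, the snippet below searches a grid of bids for the combination that maximises total expected sessions subject to a bid cap, a daily budget and an ROI band. Everything specific is an assumption made for illustration: the Hill-type curve, the brute-force search (only practical for a handful of keywords), the simplification that cost per click equals the bid, and all numbers. A production optimiser would use a proper constrained solver.

```python
from itertools import product


def expected_sessions(bid, s_max, half_bid, steepness):
    """Assumed Hill-type dose response curve (illustrative only)."""
    if bid <= 0:
        return 0.0
    ratio = (bid / half_bid) ** steepness
    return s_max * ratio / (1.0 + ratio)


def optimise_bids(curves, values, budget, roi_min, roi_max, max_bid=3.0, step=0.25):
    """Brute-force search over a bid grid; returns (best_bids, best_sessions)."""
    grid = [i * step for i in range(int(round(max_bid / step)) + 1)]
    best_bids, best_sessions = None, -1.0
    for bids in product(grid, repeat=len(curves)):
        sessions = [expected_sessions(b, *c) for b, c in zip(bids, curves)]
        # Assumption: every click costs exactly the bid (an upper bound).
        spend = sum(b * s for b, s in zip(bids, sessions))
        if spend == 0 or spend > budget:
            continue  # nothing spent, or over the daily budget
        roi = sum(v * s for v, s in zip(values, sessions)) / spend
        if roi_min <= roi <= roi_max and sum(sessions) > best_sessions:
            best_bids, best_sessions = bids, sum(sessions)
    return best_bids, best_sessions


# Two hypothetical keywords: curve parameters and value per session in pounds.
curves = [(400.0, 1.2, 3.0), (150.0, 0.6, 4.0)]
values = [1.0, 2.0]

# Target ROI of 1.50 with 10p of wiggle room, and a £500 daily budget.
best_bids, best_sessions = optimise_bids(curves, values, 500.0, 1.4, 1.6)
print(best_bids, round(best_sessions, 1))
```

Note that the optimiser does not simply bid the £3 cap everywhere: the ROI band forces bids down on keywords whose sessions are worth less.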
Having formulated an optimiser means we should be able to determine the bid amount for each keyword to maximise sessions while meeting the specified constraints. However, our optimiser relies on knowing the dose response curves of each keyword: how many sessions can we expect for a given bid? We have not yet identified these curves, and need to calibrate them against data. But what if certain keywords lack historical data, with responses only observed for very few bids? Moreover, what if any dose response curves that we do identify become outdated, because competitors adjust their bids? How do we identify and update the curves, and use them to make good bidding decisions at the same time?
Enter Thompson sampling.
Thompson sampling is an advanced method in machine learning and decision-making, particularly useful in dynamic environments like online advertising. Unlike traditional approaches, it balances exploration (testing different strategies) and exploitation (using the best-known strategy) by maintaining a probabilistic framework through Bayesian inference.
In our example, we are taking sequential decisions at a regular time interval (e.g. each day). Our decision is the bid to set for each keyword on that day. For each keyword, we have some historic data on past bids, and the number of sessions those bids generated.
We could follow a classical approach and fit a single dose response curve for each keyword. This single curve could then be used in our optimiser and produce bids. However, the single curve does not take into account any uncertainty we might have. For example, we might have only ever set one bid for a keyword, and so have no knowledge of the shape of that keyword’s curve. We might have only ever set low bids, and so have limited knowledge of the potential maximum number of sessions achievable.
The ability to quantify uncertainty in this setting is vital. In the Thompson sampling approach, rather than fitting a single curve, we maintain a distribution of dose response curves for each keyword. This distribution represents our uncertainty in the true underlying dose response, as shown in the figure below.
On any one day, a near-optimal decision is taken by first sampling one of these curves for each keyword. These randomly sampled curves are then used in the optimiser to take the bidding decision for that day. The key to this method is that for those keywords where the data means that the dose response is highly uncertain, then on any given day we sample a curve from a wide range of possible curves. Conversely, when a keyword’s response is well-characterised, we keep using similar curves for that keyword each time. Balancing the relative uncertainties in this way allows a guided exploration of the space of potentially optimal decisions.
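The daily sampling step can be sketched as follows. For simplicity, the snippet assumes each keyword's uncertainty is represented by a small set of candidate curves (parameter tuples) with weights that sum to one; this discrete representation, the function name and the numbers are all illustrative assumptions. In practice the sampled curves would then be handed to the optimiser.

```python
import random


def sample_curves(posteriors, rng):
    """Draw one plausible curve (parameter tuple) per keyword."""
    return {
        keyword: rng.choices(candidates, weights=weights, k=1)[0]
        for keyword, (candidates, weights) in posteriors.items()
    }


# Candidate (s_max, half_bid, steepness) tuples shared by both keywords.
candidates = [(100.0, 1.0, 2.0), (200.0, 1.0, 2.0), (400.0, 1.0, 2.0)]

posteriors = {
    # Well-characterised keyword: almost all weight on one curve.
    "healthy eating": (candidates, [0.02, 0.96, 0.02]),
    # Poorly characterised keyword: every curve is equally plausible.
    "vitamin C": (candidates, [1 / 3, 1 / 3, 1 / 3]),
}

rng = random.Random(0)
for day in range(1, 6):
    todays_curves = sample_curves(posteriors, rng)
    print(day, todays_curves)
```

Over many days the well-characterised keyword keeps drawing nearly the same curve, while the uncertain keyword jumps between candidates, which is what drives the exploration.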
Where Thompson sampling shines is that it is adaptive and self-improving. After trying a potentially optimal bid for each keyword on any given day, we now have new data points. This data can be integrated with existing data, and used to change the uncertainty in the dose response of each keyword. So, tomorrow, we play the same game of sampling and optimising. But now, we are sampling from a new distribution of dose response curves for each keyword, each of which has been improved by incorporating that extra data point. As we continue day by day, more data is gathered and our decision-making automatically adapts to the new data.
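The update step might look like the sketch below, which re-weights a set of candidate curves after one new observation using Bayes' rule. It assumes that daily session counts are Poisson-distributed around the curve's prediction, and it reuses an assumed Hill-type curve; both are modelling assumptions made for illustration, as are the names and numbers.

```python
import math


def expected_sessions(bid, s_max, half_bid, steepness):
    """Assumed Hill-type dose response curve (illustrative only)."""
    if bid <= 0:
        return 0.0
    ratio = (bid / half_bid) ** steepness
    return s_max * ratio / (1.0 + ratio)


def update_weights(candidates, weights, bid, observed_sessions):
    """Bayes update of candidate-curve weights after one day's observation."""
    log_post = []
    for params, w in zip(candidates, weights):
        mu = max(expected_sessions(bid, *params), 1e-9)
        # Poisson log-likelihood of the observed count (assumed noise model).
        log_lik = (observed_sessions * math.log(mu) - mu
                   - math.lgamma(observed_sessions + 1))
        log_prior = math.log(w) if w > 0 else float("-inf")
        log_post.append(log_prior + log_lik)
    peak = max(log_post)  # subtract the peak for numerical stability
    unnormalised = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


candidates = [(100.0, 1.0, 2.0), (200.0, 1.0, 2.0), (400.0, 1.0, 2.0)]
prior = [1 / 3, 1 / 3, 1 / 3]

# Yesterday we bid £1.00 and observed 105 sessions.
posterior = update_weights(candidates, prior, 1.0, 105)
print([round(p, 4) for p in posterior])
```

At a £1.00 bid the three candidates predict 50, 100 and 200 sessions, so an observation of 105 shifts almost all of the weight onto the middle curve, and tomorrow's samples will reflect that.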
In fact, we can take this auto-adaptive approach even further. Consider downweighting or even forgetting older data and only using more recent data in defining the uncertain distribution of dose response curves. The figure below illustrates, for one example keyword, how using only recent observations updates the space of possible dose response curves to sample from. As the model updates, so does Thompson sampling’s estimation of the optimal bid. By prioritising the most recent data, we ensure our bids remain competitive. The coral point represents the outcome of setting a bid for one day and observing the number of sessions obtained. Following this, we update our model and its uncertainty.
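One way to downweight older data is to discount each observation's contribution to the likelihood by its age. The sketch below assumes an exponential decay factor, a Poisson noise model and a Hill-type curve; these choices, the function names and the numbers are illustrative assumptions rather than the scheme used in practice.

```python
import math


def expected_sessions(bid, s_max, half_bid, steepness):
    """Assumed Hill-type dose response curve (illustrative only)."""
    if bid <= 0:
        return 0.0
    ratio = (bid / half_bid) ** steepness
    return s_max * ratio / (1.0 + ratio)


def forgetful_posterior(candidates, prior, history, decay=0.8):
    """Weights over candidate curves, discounting old observations.

    history is a list of (bid, sessions) pairs, oldest first. An observation
    that is `age` days old contributes decay ** age of its log-likelihood.
    """
    log_post = [math.log(p) for p in prior]  # prior weights must be positive
    newest = len(history) - 1
    for i, (bid, sessions) in enumerate(history):
        discount = decay ** (newest - i)
        for j, params in enumerate(candidates):
            mu = max(expected_sessions(bid, *params), 1e-9)
            # Poisson log-likelihood, dropping the term shared by all curves.
            log_post[j] += discount * (sessions * math.log(mu) - mu)
    peak = max(log_post)
    unnormalised = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


candidates = [(100.0, 1.0, 2.0), (400.0, 1.0, 2.0)]
prior = [0.5, 0.5]

# Six older days with about 200 sessions at a £1.00 bid, then two recent
# days with about 50 sessions, as if a competitor had raised their bids.
history = [(1.0, 200)] * 6 + [(1.0, 50)] * 2

print(forgetful_posterior(candidates, prior, history, decay=1.0))
print(forgetful_posterior(candidates, prior, history, decay=0.3))
```

With no forgetting (decay of 1.0) the six older observations dominate and the high-traffic curve is favoured; with strong forgetting the two recent observations win and the weight moves to the low-traffic curve.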
By sampling from these distributions, Thompson sampling adapts flexibly and quickly to changes in competitors' bids and user behaviour, making it highly effective for optimising bidding strategies and maximising advertising ROI. The bottom line is, if our veggie competitors change their bidding strategy, we will update our models automatically.
So, how do we know this works? Because we have tested it!
We collaborated with a client for 6-8 weeks to improve their website traffic. During this period, we developed and applied the Thompson sampling technique to maximise the number of sessions while meeting the ROI goals set by their finance department.
We conducted a 4-week A/B test and achieved a 15% increase in the number of sessions, all while maintaining the target ROI across approximately 5,000 keywords. This allowed us to allocate their marketing budget more effectively.
Despite the success of this technique, there are several ways to enhance the methods further.
Firstly, we currently model each keyword independently, which becomes inefficient as the number of keywords increases. To address this, we could develop a hierarchical model that transfers learning between similarly behaving keywords.
Secondly, instead of using a fixed number of observations per keyword to produce our models, we could scale the importance of each observation based on uncertainty. For instance, we could adjust the significance of each session observation according to its age, ensuring that outdated information plays a smaller role in the overall modelling.
In summary, we've found that by using Thompson sampling and Bayesian modelling, we can increase our client’s sessions by 15% to 20% with no loss in ROI. This means they are much more competitive with Google Smart Bidding and will continue to be so. If you're interested in applying Thompson sampling to your PPC ads, get in touch with us today.