Dynamic pricing isn't rocket science… but it can look similar

How did a Soviet engineer inspire a recent Datasparq dynamic pricing product?

As dynamic pricers and revenue managers, we might unthinkingly reach for metaphors: launching a product, using deep discounts to turbo-charge demand, or applying the brakes with a price increase. Demand could drop like a stone, or maybe it's just undergoing some turbulence.

Sometimes, though, it pays to be a literal-minded Data Scientist. Could we double down on this metaphor, and think about how engineering science—rocket science!—could apply to the yield management problem?

It turns out that yes, we can, as recent work on an ancillaries pricing product showed us.

Targeting revenue

Suppose we launch a product and we have a year to sell it before it expires. Let's also assume that we know the incremental cost of a sale, and that we know exactly how demand responds to price. We even know how the market size varies over the upcoming year. 

Given all these things we know, we might be in a position to be greedy. That is, we could look at any one day (or any other time unit) from launch to expiry. On that day, we could choose a price to set that will maximise the incremental revenue that we can get for that day. Using our models of how demand responds to price, this gives us a profile of prices we can set from launch to expiry to maximise incremental revenue on each day.
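As a concrete illustration, here is a minimal sketch of greedy pricing in Python. The demand model, cost, and price grid are all toy assumptions invented for this example, not a real product's numbers:

```python
import numpy as np

def demand(price, day, base=100.0):
    """Toy demand model (an assumption for illustration): a seasonal
    market size multiplied by a price-sensitive conversion rate."""
    market_size = base * (1.0 + 0.5 * np.sin(2 * np.pi * day / 365))
    conversion = np.exp(-price / 50.0)  # higher price, lower conversion
    return market_size * conversion

def greedy_price(day, cost=10.0, grid=np.linspace(10, 200, 381)):
    """Pick the price that maximises that single day's incremental
    revenue, ignoring any stock constraint."""
    margin = (grid - cost) * demand(grid, day)
    return grid[np.argmax(margin)]

# A price profile from launch (day 0) to expiry (day 364).
greedy_profile = [greedy_price(d) for d in range(365)]
```

With this particular toy conversion curve, the greedy price works out flat across the year (cost plus 50 here): each day's revenue-maximising price ignores how much stock is left, which is exactly the weakness discussed next.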

However, that hasn't dealt correctly with the true problem: we want to maximise achievable revenue. If we have a limited supply of our product and we price greedily on each day at the incremental revenue-optimal prices, then we might easily sell out too early—before the product expires—and miss out on demand arriving after we sell out. If there is scarcity, this means that we should be able to price higher and achieve greater overall revenue. We should forgo jam today to account for the likelihood of jam tomorrow.

Just get to the rockets, please

Let's think about our problem a bit more metaphorically. We could think of the price as a throttle, where lower prices speed up sales and higher prices slow them down, and our model tells us exactly how the price can affect the speed. We want to use this throttle to guide our product from 0% sold at launch to 100% sold at expiry, for the greatest amount of incremental revenue possible.

This is the optimal control problem: what sequence of control inputs (in our case, prices) maximises a reward (incremental revenue), given a model of the system's dynamics (how underlying demand varies over time) and of how the control affects the system's state (how prices influence conversion)?
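Written out explicitly (in notation we introduce here, under the assumptions above: a known incremental cost, a known demand response, and a limited stock), one possible formulation is:

```latex
\max_{p(t)} \int_{0}^{T} \bigl(p(t) - c\bigr)\, d\bigl(p(t), t\bigr)\, \mathrm{d}t
\quad \text{subject to} \quad \int_{0}^{T} d\bigl(p(t), t\bigr)\, \mathrm{d}t \le S,
```

where $p(t)$ is the price at time $t$, $c$ the incremental cost of a sale, $d(p, t)$ the demand rate at price $p$ and time $t$, $T$ the time of expiry, and $S$ the stock available.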

The Optimal Control problem

And this problem is pervasive in Control Engineering. The solution we turned to was invented in 1956, just as the Space Race was really taking off, by Lev Pontryagin and his colleagues in the USSR. They were trying to determine the flight plan needed to maximise the terminal speed of a rocket. Similarly to us, they found that it makes no sense to apply full power to the rocket at all times to achieve the maximum speed. The applied thrust should be shaped over time and the natural system dynamics should be exploited to achieve the speed-maximisation goal.

Pontryagin's maximum principle, applied to the dynamic pricing problem, allows us to select prices at each time point that take into account scarcity when looking across an entire selling period. We apply an opportunity cost to a sale at any time point, to increase the price appropriately whenever we forecast the likelihood of scarcity. In other words, we ease off the throttle and slow sales at just the right rate to get the best overall outcome.
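For a deterministic problem like the one above, where stock only enters through the total sold, the costate in the maximum principle reduces to a single constant opportunity cost: we treat it as an extra per-unit cost and tune it so the planned sales just exhaust the stock. Here is a minimal sketch of that idea, with the same toy demand model and illustrative numbers as before (all assumptions, not a real product's model):

```python
import numpy as np

def demand(price, day, base=100.0):
    # Toy demand model (an assumption for illustration).
    market_size = base * (1.0 + 0.5 * np.sin(2 * np.pi * day / 365))
    return market_size * np.exp(-price / 50.0)

def price_with_opportunity_cost(day, lam, cost=10.0,
                                grid=np.linspace(10, 400, 781)):
    # Treat the shadow price lam as an extra per-unit cost:
    # each day we maximise (p - c - lam) * demand(p, day).
    margin = (grid - cost - lam) * demand(grid, day)
    return grid[np.argmax(margin)]

def total_sales(lam, horizon=365):
    # Sales over the whole selling period under a given opportunity cost.
    return sum(demand(price_with_opportunity_cost(d, lam), d)
               for d in range(horizon))

def solve_shadow_price(stock, lo=0.0, hi=300.0, tol=1e-3):
    # Bisect on lam so the planned sales just use up the stock.
    if total_sales(lo) <= stock:
        return lo  # no scarcity: greedy pricing is already feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_sales(mid) > stock:
            lo = mid  # still selling too fast: raise the opportunity cost
        else:
            hi = mid
    return hi
```

When stock is scarce, the solved shadow price is positive and pushes every day's price above the greedy price, slowing sales at just the rate that stretches the stock to expiry.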

Beyond "fire and forget"

Above, we've described a technique invented right at the start of the space race. For better or for worse, control theory further progressed at a rapid rate throughout the Cold War and beyond, into the networked age. We also brought some of these techniques to bear in our pricing product.

Suppose we launch a rocket with a flight plan, but after 10 seconds an unexpected gust of wind points the nose slightly off the planned trajectory. Do we now have to hope for a gust in the other direction to even things out, on average? Or can we do better than this? One deceptively simple but important technique in the control engineer's toolbox is known as Model Predictive Control, or MPC.

Model Predictive Control

Remember that using optimal control, we designed our price profile on launch day, recommending optimal prices from launch to expiry to maximise achievable revenue. So we know how to set our price for launch day itself: it's the first price on the list. It's not greedy; it's taking into account the possible scarcity of our product by the time of expiry.

But now suppose it's the next day, and time has moved on. The product may have sold slightly more or less than expected, either because our model is slightly off or because randomness has played its part. The situation is not necessarily exactly as modelled.

In this case, we could look at the launch day's solution, from yesterday, and take its price recommendation for today, hoping that's good enough.

Or we can now re-optimise in mid-flight. That is, we can re-solve the optimal control problem today, but starting from the current state of the product instead of its launch state. In fact, if we were feeling clever, we could also update our forward-looking model given the new data point (see our work on Thompson Sampling for more).

So, by using MPC, each time we refresh the price we do the next right thing (with apologies to the makers of Frozen II). Each day, we set our prices by taking into account our uncertain knowledge of the future, while also feeding in the constantly updating reality of the present.
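The receding-horizon loop can be sketched as follows: each day, re-solve the scarcity-pricing problem from today's remaining stock, post only the first price of the new plan, observe what actually sold, and repeat. Again, the demand model, noise, and all numbers are toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def demand(price, day, base=100.0):
    # Toy demand model (an assumption for illustration).
    market_size = base * (1.0 + 0.5 * np.sin(2 * np.pi * day / 365))
    return market_size * np.exp(-price / 50.0)

def plan_price(today, remaining_stock, horizon=365, cost=10.0,
               grid=np.linspace(10, 400, 781)):
    """Re-solve the scarcity-pricing problem from today's state and
    return only the first price of the re-optimised plan."""
    days = range(today, horizon)

    def sales(lam):
        # Planned sales to expiry under opportunity cost lam.
        prices = [grid[np.argmax((grid - cost - lam) * demand(grid, d))]
                  for d in days]
        return sum(demand(p, d) for p, d in zip(prices, days))

    lo, hi = 0.0, 300.0
    if sales(lo) <= remaining_stock:
        hi = lo  # no scarcity left: price greedily
    while hi - lo > 1e-2:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if sales(mid) > remaining_stock else (lo, mid)
    lam = hi
    return grid[np.argmax((grid - cost - lam) * demand(grid, today))]

# MPC loop: plan, act, observe, repeat.
stock = 5000.0
for today in range(5):              # first few days, for illustration
    price = plan_price(today, stock)
    noise = rng.normal(1.0, 0.1)    # reality deviates from the model
    sold = max(0.0, demand(price, today) * noise)
    stock -= sold                   # tomorrow's plan starts from here
```

Note that only the first price of each plan is ever used; the rest of the plan exists purely to make today's price account for the future.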

Control and Reinforcement Learning at Datasparq

Even in a 21st-century AI consultancy, we're never above reusing some 20th-century science. But, looking more deeply, there is a fascinating connection between the contemporary preoccupation with (and Datasparq's established strengths in) Reinforcement Learning (RL), and established control engineering principles. For example, Bellman – a name instantly familiar to those working in RL – was a contemporary of Pontryagin on the other side of the Iron Curtain at RAND Corporation. He was also a major contributor to the theory of optimal control, developing the ideas of dynamic programming. These connections between RL and control are under intense exploration by many academics and practitioners, including ourselves.

At Datasparq, we enjoy exploiting the depth of our expertise and applying that knowledge collaboratively to the problem at hand. Like the best engineers, we don't blindly recommend fashionable techniques. We deeply consider the context and details of data science techniques to make AI products that are truly aligned with the business problems of our clients.
