# Statistical Learning for Decision

Let’s say you have a set L of levers, which you can pull to get rewards. Pulling a lever takes effort, so you want to find the best lever to pull. But you know nothing about the rewards beyond what you learn by pulling levers, so you’re going to need trial and error.

I’ve just described two problems in statistical learning – the Multi-Armed Bandit (MAB) problem and Bayesian Optimisation (BO).

In the MAB problem, our set L of levers is finite (often of size 2) and the rewards have some randomness associated with them so the answer isn’t just “pull each lever once to find the best lever and then keep pulling that until you’ve expended all your effort”. Your “overall reward” is the sum of the rewards of all the levers you have pulled, so you’re always trying to balance exploration (finding the best lever) and exploitation (pulling the best lever you’ve found so far).
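To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch for a two-armed bandit. Everything specific in it – Bernoulli rewards, the success probabilities 0.4 and 0.6, the 10% exploration rate, the budget of 1000 pulls – is a hypothetical choice for illustration, not something from the text above:

```python
import random

def epsilon_greedy(true_probs, pulls=1000, epsilon=0.1, seed=0):
    """Pull levers with Bernoulli rewards; mostly exploit, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # times each lever has been pulled
    totals = [0.0] * len(true_probs)  # summed reward per lever
    overall = 0.0                     # the "overall reward" we care about
    for _ in range(pulls):
        if rng.random() < epsilon or 0 in counts:
            # explore: pull a lever at random (forced until every lever is tried)
            arm = rng.randrange(len(true_probs))
        else:
            # exploit: pull the lever with the best empirical mean so far
            arm = max(range(len(true_probs)),
                      key=lambda a: totals[a] / counts[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        overall += reward
    return counts, overall

counts, overall = epsilon_greedy([0.4, 0.6])
print(counts, overall)
```

With enough pulls, the better lever ends up pulled far more often, while the occasional random pull keeps refining the estimate of the worse one.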

In BO, the set L is continuous, and pulling a lever x yields an evaluation of an expensive function f: L -> R at the point x (for example, “set up an experiment with parameters x and record the results”). We assume some notion of “if y is close to x, f(y) will be close to f(x)”. Although evaluations of f may be stochastic, randomness plays a much smaller part in this problem, which is more about how to choose evaluation points x from an infinite set. Our overall reward is the maximum reward over all the levers we pull, i.e. our “best guess” at a maximum of f. Again, we’re trying to balance exploration and exploitation.
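A common way to make this concrete is a Gaussian-process surrogate: the “close x gives close f(x)” assumption becomes a kernel, and an acquisition function (here, expected improvement) decides where to evaluate next. The objective f, the length scale, the candidate grid, and the budget below are all hypothetical choices for the sake of a small runnable sketch:

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, length=0.2):
    # RBF kernel: encodes "if y is close to x, f(y) will be close to f(x)"
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression: posterior mean and std at candidate points Xs,
    # given noisy-ish observations y at X (jitter added for stability).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.ones(len(Xs)) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # By how much do we expect a point to beat the best value seen so far?
    z = (mu - best) / sigma
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])  # normal CDF
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)                 # normal PDF
    return (mu - best) * Phi + sigma * phi

def f(x):
    # hypothetical "expensive" function with its maximum at x = 0.6
    return -(x - 0.6) ** 2

grid = np.linspace(0.0, 1.0, 201)  # candidate evaluation points in L
X = np.array([0.1, 0.9])           # two initial evaluations
y = f(X)
for _ in range(8):                 # small evaluation budget
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))

print(X[np.argmax(y)])  # best lever found
```

The loop naturally alternates between exploring regions where sigma is large and exploiting regions where mu is already high, which is the same trade-off as in the bandit setting.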

These two classes of problems have mostly been worked on in isolation, and can easily be formulated so that they look like they have nothing to do with one another. But the parallels between them allow solutions found in one area to be transferred to the other – for example, the “upper confidence bound” rule developed in the MAB setting has been tested in the BO setting and performs surprisingly well.
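The transferred idea is “be optimistic under uncertainty”: pull the lever whose plausible upside is highest. In the bandit setting this is the classic UCB1 rule, sketched below with the same hypothetical Bernoulli levers as before; in the BO setting the analogous acquisition maximises mu(x) + beta * sigma(x) of the surrogate, the same “empirical mean plus uncertainty bonus” shape:

```python
import math
import random

def ucb1(true_probs, pulls=1000, seed=1):
    """UCB1: pull the lever maximising mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    n = [0] * len(true_probs)    # pulls per lever
    s = [0.0] * len(true_probs)  # summed reward per lever
    total = 0.0
    for t in range(1, pulls + 1):
        if 0 in n:
            arm = n.index(0)     # try every lever once first
        else:
            # optimism: empirical mean plus an uncertainty bonus that
            # shrinks as a lever accumulates pulls
            arm = max(range(len(true_probs)),
                      key=lambda a: s[a] / n[a] + math.sqrt(2 * math.log(t) / n[a]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        n[arm] += 1
        s[arm] += reward
        total += reward
    return n, total

n, total = ucb1([0.4, 0.6])
print(n, total)
```

Unlike epsilon-greedy, UCB1 never explores at random: a neglected lever’s bonus grows until it becomes worth pulling again, which is exactly the mechanism that carries over to the continuous setting.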

In conclusion: it’s good for mathematicians to talk to each other, and some problems are more similar than they seem on the surface.