## Why Not Let Bandits in to Help in Clinical Trials?

### 15th January 2016

In my last blog post I spoke about Thompson Sampling within
Multi-Armed Bandit Problems (MABPs) applied to testing multiple versions of the same website. Earlier this week, Lancaster
University hosted a conference on
just that subject, which has helped me to understand the problem a little more. I managed to get to a couple of the talks, one of
which, by Sofía Villar from
the Medical Research Council (MRC) Biostatistics Unit, UK, was particularly interesting. She said that she spends most of her working
hours trying to persuade the people who run clinical trials to use MABPs. Her talk looked at different allocation methods
and their advantages and disadvantages in the context of clinical trials. As I was interested, I went and looked up
the paper [1] for more detailed information.

The first allocation method the paper describes is the Gittins index (GI). Under certain circumstances, including an infinite
horizon, the Gittins index of bandit $k$ is a function of that bandit's current information state $x_{k,t}$ alone,
$$\mathcal{G}_k(x_{k,t}) = \text{sup}_{\tau\geq 1}\frac{E_{X_{k,t}=x_{k,t}}\sum_{i=0}^{\tau-1}\mathcal{R}(X_{k,t+i},1)d^i}{E_{X_{k,t}=x_{k,t}}\sum_{i=0}^{\tau-1}\mathcal{C}(X_{k,t+i},1)d^i},$$
with reward function $\mathcal{R}$, resource consumption function $\mathcal{C}$ and discount factor $d$. The principle of the
Gittins index is that the allocation at time $t+1$ goes to the arm with the largest $\mathcal{G}_k(x_{k,t})$. Although
its optimality is an infinite-horizon result, the rule can still be applied to finite-horizon problems.
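
To get a feel for the index, here is a sketch of one standard way to approximate it for a treatment with binary outcomes, the calibration (or "retirement") approach. It rests on several assumptions of mine rather than anything taken from the paper: the belief about an arm is Beta(successes, failures), the reward is the patient outcome with $\mathcal{C} = 1$, the discount $d = 0.9$ is purely illustrative, and the look-ahead is truncated at a fixed depth. The names `gittins_index` and `next_arm` are hypothetical.

```python
def gittins_index(successes, failures, discount=0.9, depth=100, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(successes, failures) belief.

    Calibration idea: find the constant per-period reward `lam` at which we are
    indifferent between retiring on `lam` forever and playing the arm at least once.
    """
    n0 = successes + failures

    def advantage(lam):
        # Value of retiring now on the known reward lam.
        retire = lam / (1.0 - discount)
        n_end = n0 + depth
        # Truncation: at the last layer, commit forever to the better option.
        values = [max((successes + j) / n_end, lam) / (1.0 - discount)
                  for j in range(depth + 1)]
        # Backward induction over the number of extra observations t.
        for t in range(depth - 1, -1, -1):
            n = n0 + t
            new_values = []
            for j in range(t + 1):
                p = (successes + j) / n  # posterior mean success probability
                cont = (p * (1.0 + discount * values[j + 1])
                        + (1.0 - p) * discount * values[j])
                # At the root we are forced to play once; elsewhere we may retire.
                new_values.append(cont if t == 0 else max(cont, retire))
            values = new_values
        return values[0] - retire

    # Bisection on lam: the advantage of playing is positive below the index.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if advantage(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def next_arm(states, discount=0.9):
    """Index rule: allocate the next patient to the arm with the largest index."""
    indices = [gittins_index(s, f, discount) for s, f in states]
    return max(range(len(states)), key=lambda k: indices[k])


if __name__ == "__main__":
    # Three arms described by (successes, failures) pseudo-counts.
    print(next_arm([(1, 1), (3, 2), (2, 4)]))
```

The index always sits at or above the posterior mean of the arm, and the excess is the bonus the rule gives to arms that are still poorly understood.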

This was compared with several other allocation methods in the context of a clinical trial. These included adaptive rules,
Thompson Sampling, other index methods (such as the Whittle index) and a Fixed Randomised Design (FR) that allocates patients to
treatments with equal, fixed probabilities throughout the trial. The aim was to demonstrate the different strengths of the different methods.
Consider the case where there are four different treatments to be tested, with success probabilities $p_0=p_1=p_2=0.3$
and $p_3=0.5$. Let the trial size (or horizon) be 423, chosen so that the FR has a power of 80%. The power is
the probability of rejecting the null hypothesis (all treatments are equally good) given that the alternative is true.
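
To make this setup concrete, here is a minimal simulation sketch of two of the rules on the four-arm trial just described. It assumes plain Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors and outcomes that are observed immediately, which may not match the exact variant studied in the paper, so its output should not be expected to reproduce the published numbers. The names `run_trial` and `summarise` are my own.

```python
import random

SUCCESS_PROBS = [0.3, 0.3, 0.3, 0.5]  # p_0, ..., p_3 from the text
HORIZON = 423                          # trial size from the text


def run_trial(rule, rng, probs=SUCCESS_PROBS, horizon=HORIZON):
    """Simulate one trial; return (patients per arm, total successes)."""
    k = len(probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(horizon):
        if rule == "fixed":
            # Fixed randomised design: every arm is equally likely.
            arm = rng.randrange(k)
        elif rule == "thompson":
            # Sample each arm's Beta posterior and play the largest draw.
            draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                     for a in range(k)]
            arm = max(range(k), key=lambda a: draws[a])
        else:
            raise ValueError(f"unknown rule: {rule}")
        # Outcome is observed before the next patient is allocated (assumption).
        if rng.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    allocated = [successes[a] + failures[a] for a in range(k)]
    return allocated, sum(successes)


def summarise(rule, replications=200, seed=0):
    """Average share of patients on the best arm and average successes."""
    rng = random.Random(seed)
    best = max(range(len(SUCCESS_PROBS)), key=lambda a: SUCCESS_PROBS[a])
    share_total = 0.0
    success_total = 0.0
    for _ in range(replications):
        allocated, successes = run_trial(rule, rng)
        share_total += allocated[best] / HORIZON
        success_total += successes
    return share_total / replications, success_total / replications


if __name__ == "__main__":
    for rule in ("fixed", "thompson"):
        share, successes = summarise(rule)
        print(f"{rule}: share on best arm {share:.3f}, mean successes {successes:.1f}")
```

The sketch only tracks patient benefit. Estimating power would need a hypothesis test at the end of each simulated trial, which I have left out.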

The results showed that the best method depends largely on the desired outcome. Do you care mostly about
the results of the trial, i.e. discovering which of the treatments tested has the highest success rate and so would be most useful in
the future? Or are you trying to treat the patients within the trial itself in the best possible way? The first target
is known as exploration, whilst the second is exploitation. Unfortunately, none of the allocation methods proved fully
effective on both fronts. There is a trade-off that must be made.

Of all the methods, Thompson Sampling had the most statistical power, 88.4%, but other adaptive rules also had high
power. Therefore, if the aim is exploration, these are the best options. However, it is then questionable whether this
is right for the patients involved, as the proportion of patients assigned to the best treatment (and therefore the expected number of
successful treatments) is lower than with other methods. On the other hand, GI sends 83% of patients to treatment 3, and so its expected number of successful treatments
is 198.25, with the theoretical limit being 211.5. This far surpasses Thompson Sampling (172.15), but GI has a power of only 42.8%.
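
The link between allocation shares and expected successes is simple arithmetic, sketched below. The function name `expected_successes` is my own, and the "GI-like" split assumes that the 17% of patients not on treatment 3 are spread evenly over the other three arms, which the text does not state.

```python
SUCCESS_PROBS = [0.3, 0.3, 0.3, 0.5]  # p_0, ..., p_3 from the text
HORIZON = 423                          # trial size from the text


def expected_successes(shares, probs=SUCCESS_PROBS, horizon=HORIZON):
    """Expected successes when a fraction shares[k] of patients receives arm k."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("allocation shares must sum to 1")
    return horizon * sum(s * p for s, p in zip(shares, probs))


# Everyone on the best treatment: the theoretical limit.
upper_limit = expected_successes([0.0, 0.0, 0.0, 1.0])
# Fixed randomised design: a quarter of the patients on each arm.
fixed_design = expected_successes([0.25] * 4)
# Assumption: 83% on treatment 3, the rest split evenly over the others.
gi_like = expected_successes([0.17 / 3] * 3 + [0.83])

print(f"upper limit: {upper_limit:.2f}")
print(f"fixed design: {fixed_design:.2f}")
print(f"GI-like split: {gi_like:.2f}")
```

The upper limit comes out as 211.5, matching the figure quoted above, and equal allocation gives about 148. The 83% split gives about 197, close to but not identical with the reported 198.25; this back-of-envelope sum does not replicate the paper's own calculation.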

This is clearly a problem: the user must decide which is more important. For websites, as in my previous blog post, the choice is more clear-cut,
so Thompson Sampling may be the best option. Clinical trials, however, raise more ethical issues. There are also practical problems with MABPs
in clinical trials. Each allocation can only be made once the outcomes of all previous patients are known, which is rarely the case. Whilst MABPs may provide
some insights beyond the current method of running clinical trials, it seems to me that some decisions on ethics need to be discussed first.
Perhaps developing a theory of pulling multiple arms at once would be more useful than single-arm allocations.

#### References

[1] *Multi-Armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges*, Villar S., Bowden J. and Wason J., Statistical Science, Vol. 30, No. 2, 199-215 (2015)