
Why Not Let Bandits in to Help in Clinical Trials?

15th January 2016

In my last blog post I wrote about Thompson Sampling within Multi-Armed Bandit Problems (MABPs), applied to testing multiple versions of the same website. Earlier this week, Lancaster University hosted a conference on just that subject, which has helped me to understand the problem a little better. I managed to get to a couple of the talks, one of which, by Sofía Villar from the MRC Biostatistics Unit, UK, was particularly interesting. She said that she spends most of her working hours trying to persuade the people who run clinical trials to use MABPs. Her talk looked at different allocation methods and their advantages and disadvantages in the context of clinical trials. As I was interested, I went and looked up the paper [1] for more detail.

The first allocation method the paper describes is the Gittins Index (GI). Under certain conditions, including an infinite horizon, the Gittins Index is a function of the current information state of bandit $k$, $$\mathcal{G}_k(x_{k,t}) = \sup_{\tau\geq 1}\frac{E_{X_{k,t}=x_{k,t}}\sum_{i=0}^{\tau-1}\mathcal{R}(X_{k,t+i},1)d^i}{E_{X_{k,t}=x_{k,t}}\sum_{i=0}^{\tau-1}\mathcal{C}(X_{k,t+i},1)d^i},$$ with reward function $\mathcal{R}$, resource consumption function $\mathcal{C}$ and discount factor $d$. The principle of the Gittins Index is that the arm allocated at time $t+1$ is the one with the largest $\mathcal{G}_k(x_{k,t})$. Although it is derived for an infinite horizon, it can still be used for finite-horizon problems.
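To make the index less abstract, here is a rough sketch of how it could be approximated numerically for a single Bernoulli arm. It is not the paper's implementation. It assumes a Beta posterior with `a` successes and `b` failures (prior pseudo-counts included), a reward equal to the success probability, a resource cost of 1 per pull, and it uses the equivalent "retirement" (calibration) formulation: the index is the fixed payment `lam` at which you are indifferent between pulling the arm and retiring. The function name, the lookahead `depth` and the tolerance `tol` are all my own choices for illustration.

```python
from functools import lru_cache


def gittins_index(a, b, d=0.9, depth=50, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Assumptions (not from the paper): truncated lookahead of `depth` pulls,
    and bisection on the retirement payment `lam`.
    """

    def continue_minus_retire(lam):
        retire = lam / (1 - d)  # value of taking `lam` per period forever

        @lru_cache(maxsize=None)
        def value(s, f, n):
            p = s / (s + f)  # posterior mean success probability
            if n == 0:
                # Truncation: stop learning, commit to the better option.
                return max(retire, p / (1 - d))
            pull = p * (1 + d * value(s + 1, f, n - 1)) \
                + (1 - p) * d * value(s, f + 1, n - 1)
            return max(retire, pull)

        p = a / (a + b)
        # Value of pulling at least once, then acting optimally afterwards.
        pull_now = p * (1 + d * value(a + 1, b, depth - 1)) \
            + (1 - p) * d * value(a, b + 1, depth - 1)
        return pull_now - retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if continue_minus_retire(mid) > 0:
            lo = mid  # pulling still beats retiring, so the index is higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for a, b in [(1, 1), (3, 3), (10, 10)]:
        print(a, b, round(gittins_index(a, b), 3))
```

All three states in the loop have the same posterior mean of 0.5, but the index shrinks towards 0.5 as more observations accumulate. That gap is the bonus the index gives to arms we still know little about.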

This was compared with several other allocation methods in the context of a clinical trial. These included adaptive rules, Thompson Sampling, other index methods (such as the Whittle Index) and a Fixed Randomised Design (FR) that allocates patients to treatments with equal fixed probabilities throughout the trial. The aim was to demonstrate the different strengths of the different methods. Consider the case where there are four treatments to be tested, with success probabilities $p_0=p_1=p_2=0.3$ and $p_3=0.5$. Let the trial size (or horizon) be 423, chosen so that FR achieves a power of 80%. The power is the probability of rejecting the null hypothesis (all treatments are equally good) given that the alternative is true.
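As a quick sanity check on this setup, the expected number of successes is easy to work out for any fixed allocation. The sketch below does the arithmetic for equal allocation (FR) and for an imaginary "oracle" that sends every patient to treatment 3. The function and variable names are mine, chosen for illustration.

```python
def expected_successes(allocation, success_probs, n_patients):
    """Expected successes when a fixed share of patients goes to each arm."""
    return n_patients * sum(a * p for a, p in zip(allocation, success_probs))


SUCCESS_PROBS = (0.3, 0.3, 0.3, 0.5)  # p_0, p_1, p_2, p_3 from the example
N_PATIENTS = 423

fixed = expected_successes((0.25,) * 4, SUCCESS_PROBS, N_PATIENTS)
oracle = expected_successes((0, 0, 0, 1), SUCCESS_PROBS, N_PATIENTS)

print(f"FR (equal allocation): {fixed:.2f}")
print(f"Oracle (all on arm 3): {oracle:.2f}")
```

Equal allocation gives $423 \times 0.35 = 148.05$ expected successes, while the oracle gives $423 \times 0.5 = 211.5$. Any sensible adaptive method should land somewhere between the two.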

The results showed that the best method depends largely on the desired outcome. Do you care mostly about the results of the trial, i.e. discovering which of the treatments tested has the highest success rate and so would be most useful in the future? Or are you trying to treat the patients within the trial itself in the best possible way? The first target is known as exploration, whilst the second is exploitation. Unfortunately, none of the allocation methods proved fully effective on both fronts. There is a trade-off that must be made.

Of all the methods, it was Thompson Sampling that had the most statistical power, 88.4%, but other adaptive rules also had high power. Therefore, if the aim is exploration, these are the best options. However, it is then questionable whether this is right for the patients involved, as the proportion of patients assigned to the best treatment (and therefore the expected number of successful treatments) is lower than with other methods. On the other hand, GI sends 83% of patients to treatment 3, and so the expected number of successful treatments is 198.25, against a theoretical limit of 211.5. This far surpasses Thompson Sampling (172.15), but GI only has a power of 42.8%.
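To get a feel for the exploitation side of these numbers, here is a minimal simulation sketch of Thompson Sampling on the four-treatment example. It assumes uniform Beta(1, 1) priors, one patient at a time and an outcome that is observed immediately. These are my assumptions, and the paper's variant may well differ, so the output should not be expected to reproduce the 172.15 figure exactly. All names are hypothetical.

```python
import random


def thompson_trial(success_probs, n_patients, rng):
    """Run one simulated trial; return (patients per arm, total successes)."""
    k = len(success_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        # Draw one plausible success rate per arm from its Beta posterior
        # (assumed Beta(1, 1) prior) and treat with the best-looking arm.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        if rng.random() < success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    allocations = [s + f for s, f in zip(successes, failures)]
    return allocations, sum(successes)


def average_over_trials(success_probs, n_patients, n_trials, seed=0):
    """Average share on the last arm and average successes over many trials."""
    rng = random.Random(seed)
    share_on_last, total_successes = 0.0, 0.0
    for _ in range(n_trials):
        allocations, wins = thompson_trial(success_probs, n_patients, rng)
        share_on_last += allocations[-1] / n_patients
        total_successes += wins
    return share_on_last / n_trials, total_successes / n_trials


if __name__ == "__main__":
    share, wins = average_over_trials((0.3, 0.3, 0.3, 0.5), 423, n_trials=200)
    print(f"Mean share on treatment 3: {share:.2f}")
    print(f"Mean successes per trial:  {wins:.1f}")
```

The two averages are the quantities discussed above: the share of patients who end up on the best treatment, and the resulting number of successes per trial.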

This is clearly a problem. The user must decide which is more important. For websites, as in my previous post, the choice is clearer, so Thompson Sampling may be the best option. Clinical trials, however, raise more ethical issues. There are other problems with MABPs in clinical trials too. Each allocation can only be made once the outcomes of all previous patients are known, which is rarely the case in practice. Whilst MABPs may provide some insights beyond the current method of running clinical trials, it seems to me that some decisions on ethics need to be discussed first. Perhaps developing a theory of pulling multiple arms at once would be more useful than single-arm allocations.


[1] Villar S., Bowden J. and Wason J., Multi-Armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges, Statistical Science, Vol. 30, No. 2, 199-215 (2015)