Bayesian and Computational Statistics


About us

Most real-life applications of statistics require the use of computational methods.

This is particularly true for Bayesian statistics, where the output of an analysis is the posterior distribution — a distribution over models and parameters that quantifies the uncertainty of inferences from the data. In almost all applications the posterior distribution is intractable, and instead of analytical evaluation of probabilities and expectations we use algorithms to draw samples from the posterior. These algorithms are important in many areas of statistics, particularly when we need to average over uncertainty within our statistical models.

At Lancaster we have expertise in a range of Bayesian computational methods, including Markov chain Monte Carlo, sequential Monte Carlo and approximate Bayesian computation. A particular focus of the group is developing new algorithms with excellent computational properties in settings where we wish to fit complex stochastic models to large datasets. Our research is collaborative, with involvement in large multi-institutional projects aiming to develop the next generation of methods, motivated by applications ranging from the health sciences to engineering and security.

The research group meets weekly over coffee, where we discuss recent papers, present our own new research ideas and hear directly from external researchers on their latest innovations. Past talks and discussions are detailed on GitHub.

Case Study: The Apogee to Apogee Path Sampler


Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is perhaps the most widely used Markov chain Monte Carlo technique in the world; however, it is notoriously difficult to tune. A transformation of the Bayesian posterior (its negative logarithm) can be viewed as a very high-dimensional, irregular U-shaped surface, and at the start of each iteration of the HMC algorithm the current value can be viewed as a small ball on this surface. The algorithm “kicks” the ball in a random direction and follows it by numerically solving the equations of motion using a time-discretisation with step size epsilon. The ball’s position after a time T = L × epsilon, i.e. after L discrete steps, is the proposed next point of the algorithm and should, ideally, be a good distance from the current point. This suggests choosing a large value of L; however, if L is too large then the ball will roll up the side of the U and back down again, perhaps ending its journey very close to where it started. Both epsilon and L must therefore be “just right” for HMC to achieve close to its optimal efficiency.
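The following is a minimal sketch of a single HMC iteration, assuming a potential U(x) equal to the negative log posterior and its gradient grad_U; the function and parameter names are illustrative, not taken from any particular library. The leapfrog loop is the numerical solution of the equations of motion described above, and the final accept/reject step corrects for the discretisation error.

```python
import numpy as np

def leapfrog(x, p, grad_U, eps, L):
    """Follow the ball for L leapfrog steps of size eps."""
    x, p = x.copy(), p.copy()
    p = p - 0.5 * eps * grad_U(x)        # initial half step for momentum
    for _ in range(L - 1):
        x = x + eps * p                  # full step for position
        p = p - eps * grad_U(x)          # full step for momentum
    x = x + eps * p
    p = p - 0.5 * eps * grad_U(x)        # final half step for momentum
    return x, p

def hmc_step(x, U, grad_U, eps, L, rng):
    """One HMC iteration: random kick, follow the ball, accept or reject."""
    p = rng.standard_normal(x.shape)     # the random "kick"
    x_new, p_new = leapfrog(x, p, grad_U, eps, L)
    # Metropolis correction compensates for the time-discretisation error
    log_alpha = (U(x) + 0.5 * p @ p) - (U(x_new) + 0.5 * p_new @ p_new)
    return x_new if np.log(rng.uniform()) < log_alpha else x

# usage: sample a 10-dimensional standard Gaussian (U is its negative log-density)
rng = np.random.default_rng(1)
U, grad_U = lambda x: 0.5 * x @ x, lambda x: x
x = np.zeros(10)
for _ in range(1000):
    x = hmc_step(x, U, grad_U, eps=0.2, L=25, rng=rng)
```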


Apogee to Apogee Path Sampler

The Apogee to Apogee Path Sampler (AAPS) relies on the same transformation of the posterior as HMC, but it looks backwards in time as well as forwards and splits the path of the ball into segments according to the locally highest points, or apogees, reached along its trajectory around the multi-dimensional U. It also chooses the proposed next point randomly from the whole trajectory, with a bias towards points further from the start, rather than forcing it to be the very end point. AAPS achieves similar efficiencies to those of HMC; its efficiency is, however, far less sensitive to the choice of its tuning parameters (epsilon and the number of apogees, K) than HMC’s.
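The sketch below illustrates the apogee-splitting idea only: an apogee is detected where the potential energy along the path switches from rising to falling, i.e. where the sign of p · grad_U(x) flips from positive to negative. All names are illustrative, the backward extension of the trajectory and the exact accept/reject step of the published algorithm are omitted, and the squared-distance bias shown is a stand-in for the paper’s weighting scheme.

```python
import numpy as np

def aaps_step_sketch(x0, grad_U, eps, K, rng):
    """Illustration only: follow one leapfrog trajectory forwards until K
    apogees have been passed, then pick a point from the path, biased
    towards points far from the start."""
    x = x0.copy()
    p = rng.standard_normal(x0.shape)          # random initial kick
    points = [x.copy()]
    uphill = p @ grad_U(x) > 0                 # is potential energy rising?
    apogees = 0
    while apogees < K:
        p_half = p - 0.5 * eps * grad_U(x)     # one leapfrog step
        x = x + eps * p_half
        p = p_half - 0.5 * eps * grad_U(x)
        points.append(x.copy())
        now_uphill = p @ grad_U(x) > 0
        if uphill and not now_uphill:          # rising -> falling: an apogee
            apogees += 1
        uphill = now_uphill
    # bias the choice towards points far from the start of the path
    w = np.array([np.sum((pt - x0) ** 2) for pt in points])
    return points[rng.choice(len(points), p=w / w.sum())]
```

Because the segment boundaries adapt to the shape of the surface rather than to a fixed integration time, the number of apogees K plays the role that L plays in HMC, which is the intuition behind the method’s robustness to tuning.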