Scalable Monte Carlo Methods
Arguably the main deterrent to more widespread use of Bayesian methods is their reliance on Monte Carlo methods, such as MCMC, which scale poorly to big-data settings and are often unsuitable for implementation in a parallel computing environment. Approximate approaches, such as variational methods, offer scalable alternatives. While these can perform well at approximating the body of the posterior and at making point predictions, they often give unreliable approximations to its tails. Yet in many health science applications it is the tails of the posterior that are crucial in determining the best decisions.
The challenge with continuous-time MCMC methods stems from the extra difficulty of simulating a continuous-time process, and lies in developing general implementations for important classes of statistical model. This will be an initial focus, particularly building on recent insights we have developed into using such algorithms in place of reversible-jump MCMC for problems of model choice within association studies. In some situations the challenge of simulating the continuous-time dynamics exactly may lead to excessive computational overhead, and we will therefore investigate, theoretically and empirically, the properties of algorithms that use approximate simulation ideas.
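To make the continuous-time simulation concrete, the sketch below implements a one-dimensional Zig-Zag sampler, a well-known continuous-time MCMC scheme chosen here purely for illustration (the text does not fix a particular algorithm), for a standard Gaussian target. For this target the event times can be simulated exactly by inverting the integrated switching rate:

```python
import math
import random

def zigzag_gaussian(t_max=10000.0, dt=0.5, seed=1):
    """Minimal 1-D Zig-Zag sampler for a standard Gaussian target.

    The particle moves with velocity v in {-1, +1} and switches
    direction at rate lambda(x, v) = max(0, v * x), the directional
    derivative of the potential U(x) = x^2 / 2.  For this target the
    event times are drawn exactly by inverting the integrated rate,
    so no thinning is required.
    """
    random.seed(seed)
    x, v, t = 0.0, 1.0, 0.0
    next_grid = dt
    samples = []
    while t < t_max:
        e = -math.log(1.0 - random.random())   # Exp(1) variate
        vx = v * x
        # Invert integral of max(0, vx + s) to get the next switch time.
        tau = -vx + math.sqrt(max(vx, 0.0) ** 2 + 2.0 * e)
        # Record the piecewise-linear trajectory on a regular time grid
        # up to the switch (time-uniform samples from the path).
        while next_grid <= t + tau and next_grid <= t_max:
            samples.append(x + v * (next_grid - t))
            next_grid += dt
        x += v * tau
        v = -v
        t += tau
    return samples
```

For general targets the integrated rate cannot usually be inverted in closed form, which is precisely where thinning and the approximate simulation ideas discussed above, with their attendant computational overhead, become relevant.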
Fusing Information from Disparate Sources
As data becomes ever cheaper to collect, combining information from different data sources and types will become an increasingly important challenge. It has already been widely identified as one of the key problems of data science, and it arises across all of our motivating health science applications. Bayesian approaches give a natural framework for developing methods that combine information from different sources whilst appropriately quantifying and propagating uncertainty.
Arguably they give a trivial solution to this challenge through repeated application of Bayes' theorem. However, this solution masks a range of practical, modelling and computational challenges, including differences in the quality and reliability of the data, differing availability of data types, and high levels of missingness. Our motivating applications will give us a range of scenarios in which to develop practicable and reliable methods. Our initial work in this area will be based on models and applications from the motivating areas. We will then develop theoretical understanding of these approaches, which will guide their development and use in wider applications.
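As a minimal illustration of this repeated updating, the sketch below fuses two hypothetical sources of differing reliability by chaining Bayes' theorem in a conjugate Gaussian model (the numbers and the model are illustrative choices only); each source's noise variance encodes its quality, so the less reliable source is automatically down-weighted:

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """One conjugate Bayes update: Gaussian prior times Gaussian likelihood."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Fuse two sources of differing reliability by repeated application of
# Bayes' theorem: yesterday's posterior is today's prior.
mean, var = 0.0, 100.0             # vague prior on the quantity of interest
for obs, noise_var in [(2.1, 1.0),    # precise source (small noise variance)
                       (3.5, 25.0)]:  # unreliable source (large noise variance)
    mean, var = gaussian_update(mean, var, obs, noise_var)
```

The final posterior mean sits close to the precise source's observation, showing how differing data quality is handled automatically once it is encoded in the model; the practical difficulty, as noted above, is specifying those quality and missingness mechanisms in realistic settings.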
Robust Bayesian Methods
Bayesian methods are motivated by their natural ability to model complex co-variation through hierarchical models, and to facilitate coherent decision making through the calculation of marginal probability statements on the parameters affecting the utility of actions. One issue with Bayesian methods, however, is their reliance on models (joint likelihood functions) for the data. As data increases in size and complexity, specifying such models becomes challenging, and this raises the question of how we can make inferences or decisions robust to the model error that will necessarily exist. Moreover, the requirement for a full joint likelihood is at odds with the principle of 'keeping models simple': using abstractions of the scientific domain that capture the major salient features.
Recent work has led to generalisations of Bayesian updating that require only a model of how the data relate to the features of interest, together with tractable ideas for how to make robust decisions. These methods replace the likelihood function with a loss function as a probabilistic description of how salient features, or summary statistics, link to observations. These two pieces of work open up new avenues for developing robust, scalable Bayesian methods that keep models simple as the data dimension or the number of data modalities increases.
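A minimal sketch of such a generalised (loss-based, or 'Gibbs') posterior is given below, assuming a simple grid approximation and an absolute-error loss; both are illustrative choices, not the specific methods of the work described above. The posterior weight of each parameter value is proportional to the prior times exp(-eta times the accumulated loss), so only the link between the feature of interest (here, a location) and the data needs to be specified:

```python
import math

def gibbs_posterior(data, grid, loss, eta=1.0, log_prior=lambda th: 0.0):
    """Generalised-Bayes update on a parameter grid.

    The likelihood is replaced by exp(-eta * total loss), so no full
    joint model of the data is required -- only a loss linking the
    feature of interest to each observation.
    """
    log_w = [log_prior(th) - eta * sum(loss(th, x) for x in data)
             for th in grid]
    m = max(log_w)                      # stabilise before exponentiating
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [wi / z for wi in w]

# Robust location inference with an absolute-error loss: the outlier at
# 50 barely moves the posterior, unlike a Gaussian (squared-loss) model.
data = [1.9, 2.1, 2.0, 2.2, 50.0]
grid = [i * 0.01 for i in range(0, 1001)]    # theta in [0, 10]
post = gibbs_posterior(data, grid, loss=lambda th, x: abs(th - x))
point = max(zip(post, grid))[1]              # posterior mode
```

With the absolute-error loss the posterior concentrates near the median of the data, illustrating the robustness to model error that motivates this line of work; the choice of the scaling parameter eta, which calibrates how strongly the loss is weighted against the prior, is itself a subject of the research described above.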