Statistics Seminar (PhD Forum): Callum Vyner and Chantelle Clark
Vyner, Callum (2.00-2.30pm)
Divide and Conquer MCMC: Marginal Views
MCMC algorithms are a standard tool in the Bayesian's arsenal. However, these algorithms can suffer from computational, memory and disk bottlenecks in big-data environments. As the number of data points increases, both the amount of data to be stored on a single machine and the cost of evaluating the likelihood grow, which can make standard MCMC methods impractical for big data. There has recently been a surge in 'divide and conquer' methods, which make MCMC algorithms more scalable. These methods partition the data and prior information across machines to produce what we define as 'sub-posteriors', from which multiple machines can then draw samples in parallel. This framework can simultaneously resolve disk, memory and computational bottlenecks. The question of interest here is: how do we combine the samples from each of the sub-posteriors? Most methods in this field concentrate on estimating the full posterior. Typically these scale well with the number of data points, but with even a moderate number of parameters (e.g. d > 20) they can suffer from the 'curse of dimensionality'. Interest, however, often lies in several low-dimensional summaries of the full posterior, such as the marginal predictive distribution, which motivates us to bypass estimation of the full posterior. Since the marginal of the full posterior is not the product of the marginals of the sub-posteriors, the solution is not immediately obvious. We propose a fast and efficient method for estimating any low-dimensional marginal of the full posterior from the sub-posterior samples, using a combination of kernel density estimation and a well-known conditional-probability identity. We evaluate this method and compare it with competing algorithms through simulation studies.
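The divide-and-conquer pipeline described in the abstract can be illustrated with a toy sketch. The following is a minimal, hypothetical example, not the talk's proposed marginal method: a conjugate normal model with data split across k machines, each machine given the prior raised to the power 1/k (a 'fractionated' prior), and the resulting sub-posterior samples recombined by multiplying kernel density estimates on a grid, in the spirit of standard density-product combination algorithms. All model and parameter choices here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy model: x_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
n, k = 1000, 4                       # data points, machines
x = rng.normal(2.0, 1.0, n)          # synthetic data, true theta = 2
shards = np.array_split(x, k)        # partition the data across machines
prior_var = 10.0**2

def subposterior_samples(shard, m=5000):
    # Each sub-posterior pairs one shard with the fractionated prior
    # N(0, k * prior_var), so the product of the k sub-posterior
    # densities recovers the full posterior up to a constant.
    post_var = 1.0 / (len(shard) + 1.0 / (k * prior_var))
    post_mean = post_var * shard.sum()
    return rng.normal(post_mean, np.sqrt(post_var), m)

samples = [subposterior_samples(s) for s in shards]  # drawn in parallel in practice

# Combine: fit a KDE to each sub-posterior sample set, multiply the
# estimated densities on a grid, and renormalise.
grid = np.linspace(1.7, 2.3, 400)
log_dens = sum(np.log(gaussian_kde(s)(grid)) for s in samples)
dens = np.exp(log_dens - log_dens.max())
dens /= dens.sum() * (grid[1] - grid[0])

# Exact full posterior mean, for comparison with the combined estimate.
full_mean = x.sum() / (n + 1.0 / prior_var)
print(full_mean, grid[dens.argmax()])
```

In this conjugate setting the product of sub-posteriors is exact, so the combined density's mode lands on the full posterior mean up to Monte Carlo and KDE error; the abstract's point is that such product-based combinations degrade in higher dimensions, which is what the proposed marginal approach aims to avoid.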
Clark, Chantelle (2.30-3.00pm)
Title and abstract: TBA