
Can Hamiltonian Win At Monte Carlo?

31st January 2016

As I said in my last post, Optimising with Ants and Ice-cream Cones, the MRes students at STOR-i are currently in a fortnight of lectures giving brief outlines of research areas at Lancaster University. The rest of the week consisted of three different topics.

On Tuesday, the main topic was Dynamic Programming (DP). Chris Kirkbride gave us an introduction to DP, as well as talking about the problems it faces when scaled up. Peter Jacko then went further into this topic and introduced us to the idea of restless bandits. The session was concluded by Professor Kevin Glazebrook with our first real introduction to the Multi-Armed Bandit Problem. He also gave an explanation of the Gittins Index that I spoke about in Why Not Let Bandits in to Help in Clinical Trials?.

On Wednesday, the main topic was Computational Statistics, given by Chris Sherlock and Peter Neal, both part of the Mathematics and Statistics Department. Friday was on the use of Wavelets and Changepoints, given by Idris Eckley, Matt Nunes and Azadeh Khaleghi. These were both interesting topics, involving some clever ideas.

The main thing I would like to talk about is a little comment made by Chris Sherlock during his part of the Computational Statistics talk. He gave the basic idea of Hamiltonian Monte Carlo (HMC) and how it can simulate from a difficult posterior distribution. A variant on HMC for Big Data was briefly mentioned in Can MCMC Be Updated For The Age Of Big Data?, but I didn't go into the mechanics in much detail, focussing mainly on its performance. I thought it was a wonderful idea, possibly because my background is in physics and so using those principles appeals to me. I decided to look it up quickly and write about how it works. Unfortunately, if you decide to search for "Hamiltonian Monte Carlo" online, you mainly get information about Formula One's Lewis Hamilton and the famous racetrack in Monte Carlo. Eventually I found a paper by B. Shahbaba, S. Lan, W. O. Johnson and R. M. Neal [1], which begins by going through the basic ideas.

Hamiltonian physics is based around a function called the Hamiltonian, $H(p,q)$. This quantifies the energy of a particle and is a function of its position $q$ and momentum $p$. In many cases, the Hamiltonian is the sum of the particle's kinetic energy, $K(p)=p^TM^{-1}p/2$ (where $M$ is a diagonal matrix representing mass), and its potential energy, $U(q)$. If one considers the case of a particle near the earth's surface with only gravity acting upon it, $U(q)$ is (approximately) proportional to the particle's height above ground. From the Hamiltonian, one can calculate the trajectory of the particle, since each component of $q$ and $p$ satisfies Hamilton's equations: $$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} = \frac{\partial K}{\partial p_i},\qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} = -\frac{\partial U}{\partial q_i}.$$
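To make this concrete, here is a minimal Python sketch of the leapfrog scheme, a standard way of stepping Hamilton's equations forward in time (numerical integration comes up again below). The function names and the toy quadratic potential at the end are my own illustrative choices, not anything from the lectures or the paper.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps, M=1.0):
    """Step Hamilton's equations forward with the leapfrog scheme:
    dq/dt = p / M (from K) and dp/dt = -dU/dq (from U)."""
    q, p = np.copy(q), np.copy(p)
    p -= 0.5 * step_size * grad_U(q)      # initial half step in momentum
    for _ in range(n_steps - 1):
        q += step_size * p / M            # full step in position
        p -= step_size * grad_U(q)        # full step in momentum
    q += step_size * p / M
    p -= 0.5 * step_size * grad_U(q)      # final half step in momentum
    return q, p

# Toy check: a particle in the quadratic potential U(q) = q^2 / 2
# (so grad_U(q) = q) should trace out an ellipse in (q, p) space.
q, p = leapfrog(np.array([1.0]), np.array([0.0]), lambda q: q,
                step_size=0.1, n_steps=10)
```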

This has been very cleverly applied to Bayesian statistics to simulate from non-standard distributions. Let $q$ be the vector of parameters of a posterior distribution with prior $\pi$ and likelihood $L$. Then we can set $$U(q) = - \log(\pi(q)L(q)).$$ This turns the posterior upside down, providing a potential (which can be pictured as a landscape) in which a particle can slide around. Now that we have a Hamiltonian, each component of $p$ can be drawn from a normal distribution with variance $M_i$, and the particle at $q$ can be given this momentum. The particle will slide around the distribution, exchanging kinetic and potential energy as it traverses parts of the posterior with different heights. After a certain amount of time, the position is read again, and in principle it can be found from Hamilton's equations.
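As a toy illustration of this construction (my own example, not one from the talk), here is the potential and its gradient for a one-dimensional normal model with a normal prior on its mean; all the names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical one-dimensional example: normal observations with known
# noise, and a vague normal prior on the unknown mean q.
data = np.array([1.2, 0.7, 1.9, 1.1])
prior_mean, prior_sd = 0.0, 10.0
obs_sd = 1.0

def U(q):
    """Potential energy: minus the log of prior times likelihood."""
    log_prior = -0.5 * ((q - prior_mean) / prior_sd) ** 2
    log_lik = -0.5 * np.sum(((data - q) / obs_sd) ** 2)
    return -(log_prior + log_lik)

def grad_U(q):
    """Gradient of U with respect to q, as needed by Hamilton's equations."""
    return (q - prior_mean) / prior_sd**2 - np.sum(data - q) / obs_sd**2
```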

The aim is to sample from the distribution. If your current sample is $q_i$, then a new sample point can be proposed by generating a set of momenta in each dimension of the parameters, letting the particle slide around the potential for a given time, $t$, and then taking its position at time $t$ as $q_{i+1}$. Because Hamiltonian dynamics conserves the value of the Hamiltonian (and preserves volume in $(q,p)$ space), the acceptance probability of this proposal is exactly 1.

Unfortunately, it is often too difficult to solve Hamilton's equations exactly, and so numerical integration is required. This lets approximations creep into the calculation, so the Hamiltonian is no longer exactly conserved and the acceptance rate may not be exactly 1; an accept/reject step is used to correct for this. Another problem is how long to let the particle travel: too short and it will hardly move anywhere, so mixing of the chain will be very slow; too long and the numerical integration will have too high a computational cost.
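Putting the pieces together, a single iteration of the sampler might look like the following sketch, reusing the leapfrog and U/grad_U functions from above. Since the leapfrog only approximates the true dynamics, the proposal goes through a Metropolis accept/reject step rather than being accepted outright; the step size and trajectory length here are arbitrary illustrative values.

```python
import numpy as np

def hmc_step(q, U, grad_U, step_size=0.1, n_steps=20, M=1.0):
    """One HMC iteration: draw momenta, run leapfrog, Metropolis-correct."""
    p = np.sqrt(M) * np.random.standard_normal(np.shape(q))  # fresh momenta
    current_H = U(q) + 0.5 * np.sum(p**2) / M
    q_new, p_new = leapfrog(q, p, grad_U, step_size, n_steps, M)
    proposed_H = U(q_new) + 0.5 * np.sum(p_new**2) / M
    # Exact dynamics would conserve H (acceptance probability 1); the
    # leapfrog error is corrected by accepting with prob min(1, e^{H - H'}).
    if np.log(np.random.rand()) < current_H - proposed_H:
        return q_new
    return q

# Draw a short chain from the toy posterior sketched above.
q = np.array([0.0])
samples = []
for _ in range(1000):
    q = hmc_step(q, U, grad_U)
    samples.append(q)
```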

I really enjoyed looking into this, and thinking about it in a physical manner. My background in physics has got me thinking about whether other physical principles could be applied to these problems. For example, could Lagrangian mechanics (which involves the principle of least action) be used to solve similar problems? Or could you treat probability densities as forms and use the mathematical principles of general relativity to move around them more efficiently? Unfortunately, I suspect the resulting differential equations would be even more computationally difficult than the simple cases of Hamiltonian physics.

References

[1] B. Shahbaba, S. Lan, W. O. Johnson and R. M. Neal, "Split Hamiltonian Monte Carlo", Statistics and Computing, Vol. 24(3), pp. 339-349 (2014)