
A Jump Back to Population Estimation

10th April 2016

The next project for the MRes students in STOR-i is to research and present aspects of one of the Masterclasses that have taken place this term (excluding the one on simulation, which I spoke about in Optimisation and Error in Simulation). Emily Graham and I are looking into the Masterclass by Professor Ruth King of the School of Mathematics at the University of Edinburgh, who came to talk about ways of estimating population sizes using incomplete contingency tables. I have already spoken about this in my blog post Estimating Near Invisible Populations. In that post, I discussed the basic framework of the problem. In this post, I wish to talk about some of it in more detail, particularly the computational side.

As King used a Bayesian approach to the problem, the aim is to get to the posterior distribution, $\pi(n_0,\theta|\mathbf{x})$, for the log-linear parameters and the number of unobserved individuals given the data, $\mathbf{x}$. This is often very difficult to sample from, as it is not a standard probability distribution. One way of going about this is to use Markov chain Monte Carlo (MCMC), which I spoke about briefly in Can MCMC Be Updated For The Age Of Big Data?. For MCMC, initial values are given to each of the parameters; the values are then updated by simulating either from conditional distributions (if they are in a nice form) or from distributions that approximate the posterior, accepting the proposed values with a certain probability. Due to the property of detailed balance (see later), the Markov chain tends to a stationary distribution equal to the posterior. Therefore, provided it has enough time to run and the early stages are removed, an MCMC sample will be a good approximate sample from the posterior.
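To make this concrete, here is a minimal sketch of the simplest such sampler, random-walk Metropolis, in Python. This is a toy illustration rather than the model from the Masterclass; `log_post` stands in for any posterior density we can evaluate up to a normalising constant.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=10_000, step=0.5, burn_in=1_000):
    """Random-walk Metropolis, the simplest MCMC scheme.

    log_post -- log posterior density, known only up to a constant
    theta0   -- initial value for the parameter
    step     -- standard deviation of the Gaussian proposal
    burn_in  -- early iterations discarded ("removing the early stages")
    """
    rng = np.random.default_rng(1)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_iter):
        proposal = theta + step * rng.normal()   # draw from an approximating distribution
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); detailed balance
        # then makes the posterior the chain's stationary distribution.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples.append(theta)
    return np.array(samples[burn_in:])

# Toy check: target a N(2, 1) "posterior"; the sample mean should be near 2.
draws = metropolis(lambda t: -0.5 * (t - 2.0) ** 2, theta0=0.0)
print(draws.mean(), draws.std())
```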

There is a recurring problem in statistics: how to pick which model, $m$, to use. In the application to population estimation, the different models correspond to different choices of main-effect and interaction log-linear terms to include. The different models gave quite a wide range of estimates for the size of the population, so which one is correct? Really, we don't know, but just choosing one model and quoting its results will not reflect the fact that we are uncertain about our choice. One way to include this uncertainty is model averaging.

For model averaging, the model itself is treated as a discrete parameter, $m\in\mathcal{M}$, and is added to the posterior. The parameters used for a particular model are $\theta_m$. This means that the posterior has the form

$$\pi(n_0,\theta_m,m|\mathbf{x})\propto f(\mathbf{x}| n_0,\theta_m,m) \pi(n_0|\theta_m,m)\pi(\theta_m|m)\pi(m).$$ Here $f(\mathbf{x}| n_0,\theta_m,m)$ is the likelihood of the data, and the other terms are prior distributions that incorporate current beliefs about the other parameters. This makes MCMC really hard: you can't just add in parameters between steps in a chain! Fortunately, a clever way of doing this was suggested by Professor Peter Green of the University of Bristol: Reversible Jump MCMC.
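Before getting to how the jumps work, it is worth making explicit why this joint posterior delivers model averaging: marginalising over the models weights each model's inference about $n_0$ by its posterior probability,

$$\pi(n_0|\mathbf{x}) = \sum_{m\in\mathcal{M}} \pi(n_0|m,\mathbf{x})\,\pi(m|\mathbf{x}),$$

so a sample from the joint posterior automatically averages over the model uncertainty.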

Let the chain be at the parameter vector $\theta$ in model $m$. For now, just consider a move to a model, $m’$, with more parameters than the current model. A move of type $j$ to model $m’$ is proposed with probability $r(j,m,\theta)$. How do we then choose values for the new parameter vector $\theta’$? Well, Green’s suggestion was that the parameters that stay in the model are transformed deterministically by a function $g$ (which could just be the identity function), and the additional parameters are proposed from a distribution $q(u)$. Therefore, we would have $\theta’=(g(\theta),u)$.
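As a sketch of what such a "birth" proposal looks like in code (Python, assuming for illustration that $g$ is the identity and $q$ is a standard normal; both choices are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def propose_birth(theta):
    """Propose a jump to a model with one extra parameter.

    The retained parameters pass through g (here the identity) and the
    new parameter u is drawn from the proposal density q (here N(0, 1)),
    giving theta' = (g(theta), u).
    """
    u = rng.normal()                                    # u ~ q(u)
    log_q_u = -0.5 * u**2 - 0.5 * np.log(2.0 * np.pi)   # log q(u), needed later
    theta_new = np.append(theta, u)                     # theta' = (g(theta), u)
    return theta_new, log_q_u
```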

Now that we have proposed a move, we need some way of deciding whether or not to accept it. This comes naturally out of the requirement of detailed balance. Detailed balance basically means that, at stationarity, the probability flow from state $i$ to state $j$ equals the flow from state $j$ back to state $i$. To satisfy this requirement, Green showed that a valid acceptance probability is $\alpha(\theta,\theta’)=\min\{1,A\}$ where $$A = \frac{\pi(m’,\theta’|\mathbf{x}) r(j,m’,\theta’)}{ \pi(m,\theta|\mathbf{x}) r(j,m,\theta)q(u)}\left|\frac{\partial\theta’}{\partial(\theta,u)}\right|.$$ This is basically the ratio of the posterior at $\theta’$ times the probability of proposing the reverse move back to $\theta$, to the posterior at $\theta$ times the probability of proposing the move to $\theta’$, corrected by a Jacobian for the change of variables. The acceptance probability of the reverse move, that is, to a model with fewer parameters, is therefore $\alpha(\theta’,\theta)=\min\{1,1/A\}$.
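Continuing the sketch, the acceptance step for the birth move might look as follows; `log_post(m, theta)` is a hypothetical function returning $\log\pi(m,\theta|\mathbf{x})$ up to a constant, the move probabilities are taken to be a flat $1/2$ each way purely for illustration, and with $g$ the identity the Jacobian is $1$.

```python
import numpy as np

rng = np.random.default_rng(7)

def accept_birth(log_post, m, theta, m_new, theta_new, log_q_u,
                 log_r_fwd=np.log(0.5), log_r_rev=np.log(0.5)):
    """Accept or reject a dimension-increasing move with probability min(1, A).

    On the log scale,
    log A = log pi(m', theta'|x) + log r(j, m', theta')
            - log pi(m, theta|x) - log r(j, m, theta) - log q(u);
    the Jacobian term vanishes because g is the identity.
    """
    log_A = (log_post(m_new, theta_new) + log_r_rev
             - log_post(m, theta) - log_r_fwd - log_q_u)
    return np.log(rng.uniform()) < min(0.0, log_A)
```

The corresponding "death" move would simply negate `log_A`, matching $\alpha(\theta’,\theta)=\min\{1,1/A\}$.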

I found this method of model averaging by Reversible Jump MCMC quite clever, and it seems to be a good way to incorporate model uncertainty into one’s answers. This is something that I haven’t seen done in classical statistics. Selecting a single model involves quite strong assumptions, so being able to weaken them is very important to me.

References

[1] On the Bayesian Analysis of Population Size, R. King and S. P. Brooks, Biometrika, Vol. 88, No. 2, pp. 317-336 (2001)
[2] Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, P. J. Green, Biometrika, Vol. 82, No. 4, pp. 711-732 (1995)