MATH454/554: Project III
TO BE HANDED IN BY MONDAY 15/01/2018 (WEEK 11), 10:00.
This project will contribute 17% towards the final module mark.
Submission: Upload the pdf of your answers and your R code file to the Moodle site. Your R code should be a .r or .txt file so that it can be copied and pasted to run. Also submit a printed copy of your answers (no need for a printed copy of the R code), together with a plagiarism cover sheet, to the MSc submissions pigeon hole. Please write your student ID on your answers, not your name.
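Let $Y$ be a random variable with probability mass function

$$P(Y = y) = p(1 - p)^{y}, \qquad y = 0, 1, 2, \ldots, \qquad 0 < p < 1.$$

Then $Y$ is a geometric random variable. The mean and variance of $Y$ are $(1 - p)/p$ and $(1 - p)/p^{2}$, respectively. Since $0 < p < 1$, the variance equals the mean divided by $p$ and therefore exceeds the mean. The geometric distribution can thus be preferable to the Poisson distribution for data that are over-dispersed, i.e. that show more variability than we would expect from a Poisson distribution.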
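The mean and variance quoted above can be checked numerically. The sketch below assumes the parametrisation $P(Y = y) = p(1-p)^{y}$ on $y = 0, 1, 2, \ldots$, which is also the one used by R's `dgeom`; the value $p = 0.3$, the truncation point and the helper name `geom_moments` are arbitrary illustrative choices.

```r
# Numerical check of the geometric moments for P(Y = y) = p (1 - p)^y,
# y = 0, 1, 2, ..., which is the parametrisation used by dgeom().
geom_moments <- function(p, y_max = 10000) {
  y <- 0:y_max              # truncate the support; the remaining tail mass is negligible
  f <- dgeom(y, prob = p)   # f(y) = p (1 - p)^y
  m <- sum(y * f)           # E[Y]
  v <- sum(y^2 * f) - m^2   # Var(Y) = E[Y^2] - (E[Y])^2
  c(mean = m, var = v)
}

p <- 0.3
mom <- geom_moments(p)
exact <- c(mean = (1 - p) / p, var = (1 - p) / p^2)
rbind(numerical = mom, exact = exact)
```

Both rows should agree (mean about 2.333, variance about 7.778), and the variance exceeds the mean, as the over-dispersion argument requires.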
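We take into account explanatory variables (covariates) by using the following model:

$$Y_i \mid \beta \sim \text{Geometric}(p_i), \qquad i = 1, \ldots, n, \tag{1}$$

where the probability $p_i$ depends on the covariate vector $x_i$ of observation $i$ through the linear predictor $x_i^{\top}\beta$. Thus, for $i = 1, \ldots, n$,

$$P(Y_i = y \mid \beta) = p_i (1 - p_i)^{y}, \qquad y = 0, 1, 2, \ldots$$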
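Writing a log-likelihood requires an explicit link between $p_i$ and $x_i^{\top}\beta$. The sketch below **assumes** $p_i = 1/(1 + \exp(x_i^{\top}\beta))$, so that $E[Y_i] = (1 - p_i)/p_i = \exp(x_i^{\top}\beta)$, mirroring the log link of Poisson regression; if model (1) uses a different link, only the two `plogis` terms need changing. The function name `loglik_geom` and the toy inputs are hypothetical.

```r
# Geometric regression log-likelihood under an ASSUMED link:
#   p_i = 1 / (1 + exp(eta_i)),  eta_i = x_i' beta,  so E[Y_i] = exp(eta_i).
# log-likelihood = sum_i [ log p_i + y_i * log(1 - p_i) ], computed stably via
#   log p_i      = plogis(-eta_i, log.p = TRUE)
#   log(1 - p_i) = plogis( eta_i, log.p = TRUE)
loglik_geom <- function(beta, X, y) {
  eta <- drop(X %*% beta)   # linear predictor, one entry per observation
  sum(plogis(-eta, log.p = TRUE) + y * plogis(eta, log.p = TRUE))
}

# Toy inputs (illustrative only, not the pupil data)
X_toy <- cbind(1, c(0, 1, 1))
y_toy <- c(0, 2, 5)
loglik_geom(c(0, 0), X_toy, y_toy)   # beta = 0 gives p_i = 1/2, so this equals 10 * log(1/2)
```

Working on the log scale with `plogis(..., log.p = TRUE)` avoids computing `log(0)` when the linear predictor is large in magnitude.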
Data
The data are provided in pupil.txt, which contains the number of days absent for each of 314 pupils together with three covariates. The number of days absent is given in column 1, and the remaining three columns are:
- Gender: 0 = female, 1 = male (column 2);
- Maths test score, range 0 to 100 (column 3);
- Programme the pupil is on: 1, 2 or 3 (column 4).
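We will analyse the absence data using the geometric regression model above with

$$x_i = \left(1,\; g_i,\; m_i,\; 1_{\{r_i = 2\}},\; 1_{\{r_i = 3\}}\right)^{\top},$$

which includes an intercept term, and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)^{\top}$, where $g_i$, $m_i$ and $r_i$ denote the gender, maths score and programme, respectively, of pupil $i$. Remember that $1_{\{A\}}$ is an indicator random variable which is 1 if $A$ occurs and 0 otherwise.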
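The covariate vectors $x_i$ can be stacked into an $n \times 5$ design matrix. The sketch below assumes the four columns are ordered as described under Data, with programme 1 as the baseline category; the helper name `build_design` and the toy rows are hypothetical, and reading the file with `read.table("pupil.txt")` assumes a whitespace-separated file without a header row.

```r
# Build the n x 5 design matrix x_i = (1, g_i, m_i, 1{r_i = 2}, 1{r_i = 3})
# from a 4-column table laid out as in pupil.txt:
#   column 1 = days absent, 2 = gender, 3 = maths score, 4 = programme.
# For the real data one might use  dat <- read.table("pupil.txt")
# (assumes a whitespace-separated file with no header row).
build_design <- function(dat) {
  dat <- as.matrix(dat)
  g <- dat[, 2]          # gender: 0 = female, 1 = male
  m <- dat[, 3]          # maths score, 0 to 100
  r <- dat[, 4]          # programme: 1, 2 or 3 (programme 1 is the baseline)
  X <- cbind(intercept = 1,
             gender    = g,
             maths     = m,
             prog2     = as.numeric(r == 2),   # indicator 1{r_i = 2}
             prog3     = as.numeric(r == 3))   # indicator 1{r_i = 3}
  list(X = X, y = dat[, 1])
}

# Illustrative rows only (not taken from pupil.txt)
toy <- rbind(c(4, 0, 55, 1),
             c(0, 1, 72, 3),
             c(9, 1, 40, 2))
des <- build_design(toy)
des$X
```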
The unknown parameters are $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)^{\top}$. The objective here is to conduct posterior inference on these parameters.
In order to conduct Bayesian inference, we need to elicit a prior distribution for $\beta$, the model parameters. Let us consider the following prior distribution:

$$\beta \sim N_5(\mu, \Sigma), \qquad \Sigma = \operatorname{diag}(\sigma_0^2, \sigma_1^2, \ldots, \sigma_4^2). \tag{2}$$

In other words, the prior for $\beta$ is 5-variate Normal with mean vector $\mu$ and diagonal covariance matrix $\Sigma$. You may choose the hyperparameter values $\mu = 0_5$ (a 5-dimensional vector of 0 entries) and a common, large value for the variances $\sigma_j^2$, which corresponds to a fairly uninformative prior distribution.
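Because $\Sigma$ is diagonal, the components of $\beta$ are independent a priori and the log of prior (2) is a sum of univariate normal log-densities. A minimal sketch, in which the function name `log_prior` is hypothetical and `sigma2` is left as an argument since its value is a modelling choice:

```r
# Log-density of the prior (2): beta ~ N_5(mu, diag(sigma2)).
# With a diagonal covariance the components are independent a priori, so the
# log-prior is a sum of univariate normal log-densities.
log_prior <- function(beta, mu = rep(0, length(beta)), sigma2) {
  sum(dnorm(beta, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# Illustrative call; sigma2 = 100 is an arbitrary large variance chosen here,
# not necessarily the value intended in the project
log_prior(c(0.1, -0.2, 0.01, 0.3, 0.5), sigma2 = rep(100, 5))
```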
1. Write down, up to a constant of proportionality, the joint posterior distribution of $\beta$. [2]
2. Write a function in R to compute the log-likelihood given $\beta$, the covariates $X$ and $y$ (the absence data). [3]
3. Write a function in R to compute the log of the prior distribution. [1]
4. Write in R a random walk Metropolis algorithm to obtain samples from the joint posterior distribution of $\beta$. Use a multivariate Gaussian proposal for $\beta$ with variance matrix V.prop. [3]
   Hint: you can use the random walk Metropolis algorithm for the Poisson regression example in Lab 5 as a template.
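The core of a random walk Metropolis sampler can be written as a generic function of the log posterior, i.e. the log-likelihood plus the log-prior up to an additive constant. The sketch below is one possible structure, not the Lab 5 code; the names `rwm` and `log_post` are hypothetical. The Gaussian proposal is drawn through a Cholesky factor of V.prop, so no extra package is needed.

```r
# Random walk Metropolis sampler (a generic sketch).
#   log_post : function(beta) returning the log posterior up to an additive
#              constant, e.g. log-likelihood + log-prior
#   beta0    : initial parameter vector
#   n_iter   : number of iterations
#   V.prop   : covariance matrix of the Gaussian random walk proposal
rwm <- function(log_post, beta0, n_iter, V.prop) {
  d <- length(beta0)
  U <- chol(V.prop)                      # upper triangular, V.prop = t(U) %*% U
  draws <- matrix(NA_real_, nrow = n_iter + 1, ncol = d)
  draws[1, ] <- beta0
  cur <- beta0
  cur_lp <- log_post(cur)
  n_acc <- 0
  for (it in seq_len(n_iter)) {
    # Proposal beta* = beta + t(U) z with z ~ N(0, I_d), i.e. beta* ~ N(beta, V.prop)
    prop <- cur + drop(crossprod(U, rnorm(d)))
    prop_lp <- log_post(prop)
    # The proposal is symmetric, so accept with probability
    # min(1, pi(beta*) / pi(beta)), compared on the log scale
    if (log(runif(1)) < prop_lp - cur_lp) {
      cur <- prop
      cur_lp <- prop_lp
      n_acc <- n_acc + 1
    }
    draws[it + 1, ] <- cur
  }
  list(draws = draws, acc_rate = n_acc / n_iter)
}
```

For the pupil data, `log_post` would combine the functions from Questions 2 and 3 with the design matrix and absence counts fixed. The returned acceptance rate is a useful diagnostic when tuning V.prop; a common heuristic, not specified in the project, is to scale the covariance matrix of a pilot run's samples by $2.38^2/d$, here with $d = 5$.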
Throughout the following questions, apply the random walk Metropolis algorithm to the pupil data set, using initial parameter values … and 10000 iterations of the algorithm.
6. Perform a run with V.prop = diag(rep(0.01,5)) (0.01 times the $5 \times 5$ identity matrix). Comment on the performance of the random walk Metropolis algorithm. [2]
7. Use tuning runs to find a good choice of V.prop. State your final choice of V.prop. [2]
8. Run the random walk Metropolis algorithm with your chosen V.prop for 110000 iterations and estimate the joint posterior distribution of the parameters. Give a short report of the results obtained. [2]
9. Using samples from the posterior distribution of $\beta$, estimate the probability that a male on programme 3 with a maths score of 60 has no absences. [2]
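Under the geometric model $P(Y = 0 \mid \beta) = p$, so one estimate of the required probability is the average of $p$ over the retained posterior draws, evaluated at $x = (1, 1, 60, 0, 1)^{\top}$ (intercept, male, maths score 60, not programme 2, programme 3). The sketch below again **assumes** the link $p = 1/(1 + \exp(x^{\top}\beta))$ and that the columns of the draws matrix are ordered $(\beta_0, \ldots, \beta_4)$ with programme 1 as baseline; the function name `prob_no_absence` and its `burn_in` argument are hypothetical.

```r
# Posterior estimate of P(Y = 0) for a male on programme 3 with maths score 60.
# draws: matrix of posterior samples, one row per iteration, columns beta_0..beta_4.
# ASSUMED link: P(Y = 0 | beta) = p = 1 / (1 + exp(x' beta)).
prob_no_absence <- function(draws, burn_in = 0) {
  stopifnot(burn_in < nrow(draws))
  keep <- draws[(burn_in + 1):nrow(draws), , drop = FALSE]  # discard burn-in rows
  x_new <- c(1, 1, 60, 0, 1)    # (intercept, male, maths = 60, prog2 = 0, prog3 = 1)
  p0 <- plogis(-drop(keep %*% x_new))    # P(Y = 0 | beta) for each retained draw
  # Posterior mean (the Monte Carlo estimate) plus a 95% credible interval for p
  c(estimate = mean(p0), quantile(p0, c(0.025, 0.975)))
}
```

Applied to the matrix of draws from the 110000-iteration run of Question 8, after discarding a burn-in, the first entry is the Monte Carlo estimate and the interval summarises the posterior uncertainty about that probability.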