{"id":390,"date":"2021-05-14T00:36:17","date_gmt":"2021-05-14T00:36:17","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/?p=390"},"modified":"2021-05-14T12:12:21","modified_gmt":"2021-05-14T12:12:21","slug":"particle-filters","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/2021\/05\/14\/particle-filters\/","title":{"rendered":"Particle Filters (Part I)"},"content":{"rendered":"\n<p><span class=\"has-inline-color has-black-color\">Remember our hidden Markov model for the rabbits and foxes from the previous blog? Well, let us recap quickly. It was a hidden Markov process, the state of which consisted of the number of rabbits and foxes at each integer time step, that we could not observe, but only had access to noisy observations of only the number of rabbits at some given times. Here we introduce some techniques that would help us approximate the hidden state of the rabbits.<\/span><\/p>\n\n\n\n<p><span class=\"has-inline-color has-black-color\"><strong>Particle filters<\/strong>, also known as sequential Monte Carlo methods, are predominantly used to solve filtering problems arising in signal processing and Bayesian statistical inference. This means that the methods try to estimate a hidden Markov process (i.e. a process we do not actually see), based only on partial noisy observations. 
The term &#8220;particle filters&#8221; was first introduced in 1996 by Del Moral, <a rel=\"noreferrer noopener\" href=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/1140078\" data-type=\"URL\" data-id=\"https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/1140078\" target=\"_blank\">[1]<\/a>, and was subsequently referred to as &#8220;sequential Monte Carlo&#8221; by Liu and Chen in 1998, <a rel=\"noreferrer noopener\" href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/01621459.1998.10473765\" data-type=\"URL\" data-id=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/01621459.1998.10473765\" target=\"_blank\">[2]<\/a>. This blog aims to introduce a number of filtering techniques, starting from some simple methods and gradually improving them by identifying their weak points.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"has-inline-color has-black-color\"><strong>Background<\/strong><\/span><\/h3>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">The majority of research articles are devoted to the problem of filtering: characterising the distribution of the state of the hidden Markov model at the present time, given the information provided by all of the observations received up to that time. In a Bayesian setting, this problem can be formalised as the computation of the distribution <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|Y_{1:t})<\/span>, i.e. the distribution from which we can sample to estimate the hidden state at time <span class=\"wp-katex-eq\" data-display=\"false\">t<\/span>, given the observations. We can compute this distribution in a recursive fashion, using prediction and update steps. 
Imagine that you know <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|Y_{1:t-1})<\/span>, which can be thought of as a prior for <span class=\"wp-katex-eq\" data-display=\"false\">\\mathbf{X}_{t}<\/span> before receiving the most recent observation <span class=\"wp-katex-eq\" data-display=\"false\">Y_{t}<\/span>. In the <strong>prediction step<\/strong>, we can compute <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|Y_{1:t-1})<\/span> from the filtering distribution at time <span class=\"wp-katex-eq\" data-display=\"false\">t-1<\/span>:<\/span><\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong><span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|Y_{1:t-1}) = \\int p(\\mathbf{X}_{t}|\\mathbf{X}_{t-1}) p(\\mathbf{X}_{t-1}|Y_{1:t-1}) d\\mathbf{X}_{t-1}<\/span>,<\/strong><\/p>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">where the distribution <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t-1}|Y_{1:t-1})<\/span> is assumed to be known from the previous steps. In the following <strong>update step<\/strong>, this prior is updated with the new measurement using Bayes&#8217; rule to give the posterior, which is then used as the filtering distribution in the next prediction step:<\/span><\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong><span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|Y_{1:t}) \\propto p(Y_{t}|\\mathbf{X}_{t})p(\\mathbf{X}_{t}|Y_{1:t-1})<\/span>.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"has-inline-color has-black-color\"><strong>SIS algorithm and particle degeneracy<\/strong><\/span><\/h3>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">SIS stands for &#8220;sequential importance sampling&#8221;. 
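<\/span><\/p>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">Before describing the algorithm in detail, here is a minimal sketch of the weight recursion, using a hypothetical one-dimensional linear-Gaussian model rather than our rabbit-and-fox model (the model, its parameters and all variable names below are illustrative assumptions, not the blog&#8217;s own simulation). Each particle is propagated through the transition density (the prediction step) and its weight is multiplied by the likelihood of the new observation (the update step):<\/span><\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D linear-Gaussian state-space model (illustration only):
#   X_t = 0.9 * X_{t-1} + N(0, 1),   Y_t = X_t + N(0, 0.5^2)
N, T = 100, 10
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(0.0, 0.5, size=T)

# SIS with the transition density as the proposal distribution, so each
# weight is multiplied by the likelihood p(Y_t | X_t). Only the current
# state of each path is stored here, not the whole trajectory.
particles = rng.normal(size=N)   # initial samples from the prior
log_w = np.zeros(N)              # log-weights, for numerical stability
for t in range(1, T):
    particles = 0.9 * particles + rng.normal(size=N)   # prediction step
    log_w += -0.5 * ((y[t] - particles) / 0.5) ** 2    # update step
w = np.exp(log_w - log_w.max())
w /= w.sum()                     # normalised weights sum to 1
```

<p><span class=\"has-inline-color has-black-color\">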
The idea behind the SIS algorithm is to approximate the full posterior distribution at time <span class=\"wp-katex-eq\" data-display=\"false\">t-1<\/span>, <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{0:t-1}|Y_{1:t-1})<\/span>, with a weighted set of samples <span class=\"wp-katex-eq\" data-display=\"false\">\\{\\mathbf{x}^{i}_{0:t-1}, w^{i}_{t-1}\\}^{N}_{i=1}<\/span>, also called particles. The algorithm then recursively updates the particle weights to approximate the posterior distribution at subsequent steps. SIS is based on importance sampling, hence we need a proposal distribution from which to draw the particles; a common choice is the transition density <span class=\"wp-katex-eq\" data-display=\"false\">p(\\mathbf{X}_{t}|\\mathbf{X}_{t-1})<\/span>.<\/span> <span class=\"has-inline-color has-black-color\">After running the algorithm for 10 observation points, we arrive at the following plot, where the green path is the best fit and the brown, the worst:<\/span><\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"622\" height=\"437\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/Multiple_Observations-1.png\" alt=\"\" class=\"wp-image-421\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/Multiple_Observations-1.png 622w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/Multiple_Observations-1-300x211.png 300w\" sizes=\"(max-width: 622px) 100vw, 622px\" \/><figcaption><span class=\"has-inline-color has-black-color\"><strong>Simulation of the SIS algorithm<\/strong><\/span><\/figcaption><\/figure><\/div>\n<\/div>\n<\/div>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">Since we used the weights to determine 
which path is the best one, we can plot how the weights change and in particular record the weights at the final observation:<\/span><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"650\" height=\"428\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/degeneracy-2.png\" alt=\"\" class=\"wp-image-424\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/degeneracy-2.png 650w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/degeneracy-2-300x198.png 300w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><figcaption><span class=\"has-inline-color has-black-color\"><strong>Degeneracy of the SIS algorithm<\/strong><\/span><\/figcaption><\/figure><\/div>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">However, as we can see, the weight of the green path overshadows the others. In fact, if we add a few more observations, the weight of the green path will converge to 1. At this stage, we are wasting computational effort simulating the other paths: since we are using the weights to determine the best representation of the hidden states, only the green path will be important. This is known as particle degeneracy. At this point it is useful to introduce a quantity that tells us how many of our particles are significant.<\/span><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"has-inline-color has-black-color\"><strong>Effective Sample Size<\/strong><\/span><\/h3>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">The effective sample size (ESS) is a measure of the efficiency of different Monte Carlo methods, such as Markov chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. 
ESS is theoretically defined as the equivalent number of independent samples generated directly from the target distribution, which yields the same efficiency in the estimation obtained by the MCMC or IS algorithms. The most widely used and simplest form of the ESS is:<\/span><\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong><span class=\"has-inline-color has-black-color\"><span class=\"wp-katex-eq\" data-display=\"false\">\\text{ESS}_{j} = \\frac{1}{\\sum_{i=1}^{N}(w_{j}^{i})^{2}}<\/span>,<\/span><\/strong><\/p>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">where <span class=\"wp-katex-eq\" data-display=\"false\">w_{j}^{i}<\/span> is the normalised weight for the <span class=\"wp-katex-eq\" data-display=\"false\">i^{th}<\/span> particle at observation <span class=\"wp-katex-eq\" data-display=\"false\">j<\/span>. This is intuitive, as, if each of the particles has weight <span class=\"wp-katex-eq\" data-display=\"false\">1\/N<\/span>, where <span class=\"wp-katex-eq\" data-display=\"false\">N<\/span> is the number of simulated particles, the ESS will also be <span class=\"wp-katex-eq\" data-display=\"false\">N<\/span>. Therefore it is a measure of how many of our particles are properly exploring the space around the observations. We expect the ESS to deteriorate as the number of observations increases, since we essentially collapse onto one &#8220;important&#8221; particle with weight almost 1. 
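<\/span><\/p>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">The formula above is straightforward to compute from a set of weights; here is a minimal sketch (the helper name <code>ess<\/code> is our own, not from any library):<\/span><\/p>\n\n\n\n

```python
import numpy as np

def ess(weights):
    """Effective sample size computed from a set of importance weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()              # normalise, in case they do not sum to 1
    return 1.0 / np.sum(w ** 2)

# Uniform weights over N = 100 particles give the maximum ESS of N,
print(round(ess(np.full(100, 0.01))))                 # 100
# while a near-degenerate set gives an ESS close to 1.
print(round(ess(np.array([0.97, 0.01, 0.01, 0.01])), 2))   # 1.06
```

<p><span class=\"has-inline-color has-black-color\">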
Upon plotting the ESS for 10 observations, we get:<\/span><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"652\" height=\"430\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/ess.png\" alt=\"\" class=\"wp-image-428\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/ess.png 652w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-content\/uploads\/sites\/31\/2021\/05\/ess-300x198.png 300w\" sizes=\"(max-width: 652px) 100vw, 652px\" \/><figcaption><span class=\"has-inline-color has-black-color\"><strong>ESS changing with more observations<\/strong><\/span><\/figcaption><\/figure><\/div>\n\n\n\n<p><span class=\"has-inline-color has-black-color\">As we can see, the ESS starts from a value close to <span class=\"wp-katex-eq\" data-display=\"false\">100<\/span> (<span class=\"wp-katex-eq\" data-display=\"false\">91.8<\/span> in our case). In our simulation we used <span class=\"wp-katex-eq\" data-display=\"false\">100<\/span> particles, so on the first observation quite a lot of them are representative of the hidden state, hence we can continue to simulate them. However, after the <span class=\"wp-katex-eq\" data-display=\"false\">3^{rd}<\/span> observation, the ESS quickly drops to zero, meaning that we have reached particle degeneracy. One solution could be to resample the particles at each observation from a multinomial distribution defined by their weights.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Remember our hidden Markov model for the rabbits and foxes from the previous blog? Well, let us recap quickly. 
It was a hidden Markov process, the state of which consisted of the number of rabbits and foxes at each integer time step, that we could not observe, but only had access to noisy observations of&hellip; <br \/> <a class=\"read-more\" href=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/2021\/05\/14\/particle-filters\/\">Read more<\/a><\/p>\n","protected":false},"author":32,"featured_media":393,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-390","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/posts\/390","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/comments?post=390"}],"version-history":[{"count":22,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/posts\/390\/revisions"}],"predecessor-version":[{"id":430,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/posts\/390\/revisions\/430"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/media\/393"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/media?parent=390"}],"wp:term":[{"taxonomy":"cat
egory","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/categories?post=390"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/martin-dimitrov\/wp-json\/wp\/v2\/tags?post=390"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}