{"id":225,"date":"2020-02-21T22:37:21","date_gmt":"2020-02-21T22:37:21","guid":{"rendered":"http:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/?p=225"},"modified":"2020-04-01T14:12:39","modified_gmt":"2020-04-01T14:12:39","slug":"methods-for-missing-data","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/2020\/02\/21\/methods-for-missing-data\/","title":{"rendered":"Methods for Missing Data"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"225\" class=\"elementor elementor-225\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-b7c0de4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"b7c0de4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-ed54038\" data-id=\"ed54038\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4c9ca74 elementor-widget elementor-widget-image\" data-id=\"4c9ca74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"766\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-1024x766.jpg\" class=\"attachment-large size-large wp-image-228\" alt=\"\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-1024x766.jpg 1024w, 
https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-300x224.jpg 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-768x575.jpg 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-1536x1149.jpg 1536w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/Puzzle_black-white_missing-2048x1532.jpg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-66 elementor-top-column elementor-element elementor-element-d59f151\" data-id=\"d59f151\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-39fd1e0 elementor-widget elementor-widget-text-editor\" data-id=\"39fd1e0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>It is common when collecting data for some entries to be absent. This can have a significant impact on any attempt to gain useful information from the data, so methods have been developed to make it possible to draw useful insights from data of this kind. A simple approach is to discard any record which contains a missing entry; however, this can leave a sample too small to yield reliable information. 
In addition, there may be reasons why certain groups of people do not want to supply certain information, so this approach can result in those groups being ignored.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-62fc823 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"62fc823\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3ea51f4\" data-id=\"3ea51f4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d8ebc90 elementor-widget elementor-widget-text-editor\" data-id=\"d8ebc90\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>To use data with absent entries, the absent entries are often filled in; this is called imputation. A variety of methods can be used for this, some of which are explained below.<\/p><ul><li><p><strong><em>Unconditional mean imputation:<\/em><\/strong> The simplest method is to take the average of the observed values of a variable which has missing entries, and use this as the value for all those which are missing. 
Whilst convenient, this can distort the data, for example by understating how much the variable varies.<\/p><\/li><li><p><em><strong>Conditional mean imputation:<\/strong><\/em> Unconditional mean imputation can be improved upon by identifying a variable which seems to have a connection with the one with missing values and grouping the records according to this variable. The average value within each group for the variable with missing values is then calculated and used to fill in the missing values in the respective group. Distortions in the data are still present here.<\/p><\/li><li><p><em><strong>Regression imputation:<\/strong><\/em>\u00a0This method involves identifying a variable which has a connection to the one with missing values and, in effect, plotting the two against each other and fitting a line of best fit to their relationship. This line is then used to predict the missing values. Distortions in the data are still present here.<\/p><\/li><li><p><em><strong>Stochastic regression imputation:<\/strong><\/em> This method involves performing regression imputation as described above, but moving every imputed value by a random amount. 
This is intended to reflect the randomness in the data and prevent the previously mentioned distortion.<\/p><\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-bac8dc3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"bac8dc3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-cc79713\" data-id=\"cc79713\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6008fd0 elementor-widget elementor-widget-text-editor\" data-id=\"6008fd0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><\/p><p class=\"wp-block-paragraph\">To reflect the uncertainty in imputation, when using a method with some randomness to it (such as stochastic regression imputation) it can be useful to perform the imputation multiple times to obtain multiple completed data sets. These data sets are then analysed separately, and the results are averaged, together with an estimate of how uncertain these averages are. 
This is called multiple imputation and can be useful because it gives an idea of how accurate the method used is.<\/p><p><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-baab459\" data-id=\"baab459\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2875a91 elementor-widget elementor-widget-image\" data-id=\"2875a91\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"910\" height=\"607\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/data-dataset-word-data-deluge.jpg\" class=\"attachment-large size-large wp-image-229\" alt=\"\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/data-dataset-word-data-deluge.jpg 910w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/data-dataset-word-data-deluge-300x200.jpg 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-content\/uploads\/sites\/11\/2020\/02\/data-dataset-word-data-deluge-768x512.jpg 768w\" sizes=\"(max-width: 910px) 100vw, 910px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0ddd7e7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0ddd7e7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0370043\" data-id=\"0370043\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-05ddb61 elementor-widget elementor-widget-text-editor\" data-id=\"05ddb61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>When studying data, the choice of which variables to study is important. There are well-established methods for this; however, with missing data things are not quite as straightforward. Methods for dealing with this range from simply applying the standard method to the imputed data to altering the chance of each variable being selected according to how much of it is missing.<\/p><p>Overall, this is a wide area with a range of methods associated with it, only a few of which have been mentioned here. It is important to keep researching in this area in order to make collected data as useful as possible.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>It is common when collecting data for some entries to be absent. This can have a significant impact on any attempt to gain useful information from these data, hence methods have been developed in order to make it possible to gain useful insights into data of this kind. 
A simple method for this is to [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":231,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12],"tags":[10],"class_list":["post-225","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-statistics","tag-statistics"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/posts\/225","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/comments?post=225"}],"version-history":[{"count":5,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/posts\/225\/revisions"}],"predecessor-version":[{"id":244,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/posts\/225\/revisions\/244"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/media\/231"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/media?parent=225"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/categories?post=225"},{"taxo
nomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/matthew-randall\/wp-json\/wp\/v2\/tags?post=225"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}