{"id":250,"date":"2022-03-29T10:45:00","date_gmt":"2022-03-29T10:45:00","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/?p=250"},"modified":"2022-04-17T19:59:47","modified_gmt":"2022-04-17T19:59:47","slug":"anomaly-detection","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/anomaly-detection\/","title":{"rendered":"Anomaly detection"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">My previous <a href=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/k-means-clustering\/\">blog<\/a> post gave an overview of one unsupervised learning method, <strong><em>K-means Clustering<\/em><\/strong>. This post introduces another unsupervised learning method, anomaly detection. Note that anomaly detection can also be done using supervised and semi-supervised techniques. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is an Anomaly?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Anomalies (outliers) are patterns in data that do not conform to a well-defined notion of normal behaviour. An unexpected change within these data patterns, or an event that does not conform to the expected data pattern, is considered an anomaly. 
In other words, an anomaly is a deviation from the usual happenings.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-7387b849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large is-resized is-style-default\"><img loading=\"lazy\" decoding=\"async\" data-id=\"259\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/image-4-1024x530.png\" alt=\"\" class=\"wp-image-259\" width=\"393\" height=\"203\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/image-4-1024x530.png 1024w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/image-4-300x155.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/image-4-768x398.png 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/image-4.png 1085w\" sizes=\"auto, (max-width: 393px) 100vw, 393px\" \/><figcaption><em>Figure 1., Wide Variety of Anomalies<\/em><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"433\" data-id=\"262\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-1024x433.png\" alt=\"\" class=\"wp-image-262\" 
srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-1024x433.png 1024w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-300x127.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-768x325.png 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-1536x650.png 1536w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2-1140x482.png 1140w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/Anomaly-2.png 1914w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>Figure 2., Illustration of simple anomalies in <\/em>2D<\/figcaption><\/figure>\n<figcaption class=\"blocks-gallery-caption\">Source :<a href=\"https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00265-1\"> Figure 1 ,<\/a><a href=\"https:\/\/towardsdatascience.com\/a-note-about-finding-anomalies-f9cedee38f0b\"> Figure 2<\/a><\/figcaption><\/figure>\n<\/div><\/div>\n<\/div>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The red occurrences in Figure 1 illustrate the various anomalies that can occur. In Figure 2, most of the data points lie in the regions <span class=\"wp-katex-eq\" data-display=\"false\"> N_1 <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> N_2 <\/span>. The points far from these normal regions, such as <span class=\"wp-katex-eq\" data-display=\"false\"> O_1, O_2 <\/span> and <span class=\"wp-katex-eq\" data-display=\"false\"> O_3 <\/span>, are anomalies.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Anomalies can occur for various reasons. 
They show up in customer acquisition costs, malicious activity, web page views, revenue per click, credit card fraud and more. These are valuable metrics for any data analyst, and spotting anomalies in them is an advantage in the decision-making process. Anomaly detection is therefore an important process: if it is efficient and timely, it enables intervention and action to minimize or avoid the effects of the underlying cause. Even in a purely statistical context, detecting anomalies helps make methods more robust and efficient. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Types of Anomalies<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Anomalies are broadly classified into the three main categories discussed below. However, there is a wide variety of anomalies. If you are interested in learning about the various anomaly types and how they are distinguished, this <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00265-1\">paper<\/a> could be helpful.  <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Point Anomaly<\/strong>:  A data point is considered an anomaly if it lies far away from the other data points. It is also called a global outlier. For example, <span class=\"wp-katex-eq\" data-display=\"false\"> O_1 <\/span> in Figure 2 is a point anomaly.<\/li><li><strong>Contextual Anomaly:<\/strong> Contextual outliers are data points whose values deviate significantly from other data within some defined context. Also known as conditional outliers, their values are not outside the global range but are abnormal compared with the trend, period or seasonal pattern. For example, a temperature of <span class=\"wp-katex-eq\" data-display=\"false\"> 20 ^\\circ C <\/span> on Christmas day.<\/li><li><strong>Collective Anomaly:<\/strong> Collective anomalies are defined as sequences of observations that are not anomalous when considered individually, but together form an anomalous pattern. 
In layman's terms, having all cars on the freeway move to the left lane simultaneously would be a collective outlier because, even though moving to the left lane is not uncommon, it is unusual for all cars to do so at the same time.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Different types of data give rise to different types of outliers. In database systems, we speak of insertion, update and deletion anomalies. In time series data, we have additive outliers, innovational outliers, level shift outliers, seasonal outliers and more. If you are interested, this <a href=\"https:\/\/www.ibm.com\/docs\/en\/spss-modeler\/18.1.1?topic=series-outliers\">documentation<\/a> from IBM gives a general overview of outliers in time-series data.  <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Functional data (data consisting of curves varying over a continuum, such as time, frequency or wavelength) appear in a variety of disciplines, including biology, medicine, meteorology and engineering. In such data we come across magnitude, amplitude and shape outliers.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Magnitude Outlier:<\/strong> A curve that is outlying in some part of, or across the whole, design domain.<\/li><li><strong>Amplitude Outlier:<\/strong> A curve with unusual oscillation levels.<\/li><li><strong>Shape\/Pattern Outliers<\/strong>: Curves with an unusual shape, or ones that differ significantly from the pattern exhibited by the other curves even after centering and normalizing. 
<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"638\" height=\"788\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/functional-outliers.jpg\" alt=\"\" class=\"wp-image-270\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/functional-outliers.jpg 638w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-content\/uploads\/sites\/35\/2022\/04\/functional-outliers-243x300.jpg 243w\" sizes=\"auto, (max-width: 638px) 100vw, 638px\" \/><figcaption><br><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167947320300517?via%3Dihub\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Anomaly detection is the process of identifying items or events in a data set that differ from the norm. It is often performed on unlabeled data, in which case it is known as unsupervised anomaly detection. Anomaly detection is based on two assumptions: <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Anomalies occur very rarely in the data.<\/li><li>Their features differ significantly from those of the normal cases.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Behaviour that is considered normal for a business system today may not be so in the future. Most business systems change over time as a result of various factors. The best example is the Covid-19 situation the whole world is currently experiencing. The unprecedented scenarios seen everywhere, often termed the <em>\u201cnew normal\u201d<\/em>, would earlier have been considered abnormal or anomalous outcomes. These anomalous outcomes can have a positive or negative impact on any organization, and it is important to keep track of them when formulating a long-term business strategy. 
Hence, anomaly detection is important in every domain, and finding more effective ways to detect anomalies will be an increasingly important skill in the future. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In my next blog post, we will get more statistical and discuss a method for detecting outliers in functional data. I hope this post gave you an insight into what an anomaly is and why anomaly detection matters. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Until then, I found this small game of eyeballing and finding outliers interesting. Try it yourself and see whether you can spot them. <\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-speaker-deck wp-block-embed-speaker-deck wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"A note about anomalies (article)\" id=\"talk_frame_437940\" class=\"speakerdeck-iframe\" src=\"\/\/speakerdeck.com\/player\/eba645f19889412b8e0799ab93b02ede\" width=\"960\" height=\"720\" style=\"aspect-ratio:960\/720; border:0; padding:0; margin:0; background:transparent;\" frameborder=\"0\" allowtransparency=\"true\" allowfullscreen=\"allowfullscreen\" mozallowfullscreen=\"true\" webkitallowfullscreen=\"true\"><\/iframe>\n<\/div><figcaption><a href=\"https:\/\/towardsdatascience.com\/a-note-about-finding-anomalies-f9cedee38f0b\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Further Reading:<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This <a href=\"https:\/\/towardsdatascience.com\/a-note-about-finding-anomalies-f9cedee38f0b\">blog<\/a> gives a basic overview of anomalies. 
I have shared links to a few papers that may be of interest.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00265-1\">https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00265-1<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167947320300517?via%3Dihub\">https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167947320300517?via%3Dihub<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My previous blog post gave an overview of one unsupervised learning method, K-means Clustering. This post introduces another unsupervised learning method, anomaly detection. Note that anomaly detection can also be done using supervised and semi-supervised techniques. What is an Anomaly? Anomalies (outliers) are patterns in data that do not conform to a well-defined notion of normal behaviour. An unexpected change within these data patterns, or an event that does not conform to the expected data pattern, is considered an anomaly. In other words, an anomaly is a deviation from the usual happenings. The red occurrences in Figure 1 illustrate the various anomalies that can occur. In Figure 2, most of the data points lie in the regions N_1 and N_2. The points far from these normal regions, such as O_1, O_2 and O_3, are anomalies. Anomalies can occur for various reasons. They show up in customer acquisition costs, malicious activity, web page views, revenue per click, credit card fraud and more. These are valuable metrics for any data analyst, and spotting anomalies in them is an advantage in the decision-making process. Anomaly detection is therefore an important process: if it is efficient and timely, it enables intervention and action to minimize or avoid the effects of the underlying cause. 
Even in a purely statistical context, detecting anomalies helps make methods more robust and efficient. Types of Anomalies Anomalies are broadly classified into the three main categories discussed below. However, there is a wide variety of anomalies. If you are interested in learning about the various anomaly types and how they are distinguished, this paper could be helpful. Point Anomaly: A data point is considered an anomaly if it lies far away from the other data points. It is also called a global outlier. For example, O_1 in Figure 2 is a point anomaly. Contextual Anomaly: Contextual outliers are data points whose values deviate significantly from other data within some defined context. Also known as conditional outliers, their values are not outside the global range but are abnormal compared with the trend, period or seasonal pattern. For example, a temperature of 20 °C on Christmas day. Collective Anomaly: Collective anomalies are defined as sequences of observations that are not anomalous when considered individually, but together form an anomalous pattern. In layman's terms, having all cars on the freeway move to the left lane simultaneously would be a collective outlier because, even though moving to the left lane is not uncommon, it is unusual for all cars to do so at the same time. Different types of data give rise to different types of outliers. In database systems, we speak of insertion, update and deletion anomalies. In time series data, we have additive outliers, innovational outliers, level shift outliers, seasonal outliers and more. If you are interested, this documentation from IBM gives a general overview of outliers in time-series data. Functional data (data consisting of curves varying over a continuum, such as time, frequency or wavelength) appear in a variety of disciplines, including biology, medicine, meteorology and engineering. In such data we come across magnitude, amplitude and shape outliers. 
Magnitude Outlier: A curve that is outlying in some part of, or across the whole, design domain. Amplitude Outlier: A curve with unusual oscillation levels. Shape\/Pattern Outliers: Curves with an unusual shape, or ones that differ significantly from the pattern exhibited by the other curves even after centering and normalizing. Anomaly detection is the process of identifying items or events in a data set that differ from the norm. It is often performed on unlabeled data, in which case it is known as unsupervised anomaly detection. Anomaly detection is based on two assumptions: Anomalies occur very rarely in the data. Their features differ significantly from those of the normal cases. Behaviour that is considered normal for a business system today may not be so in the future. Most business systems change over time as a result of various factors. The best example is the Covid-19 situation the whole world is currently experiencing. The unprecedented scenarios seen everywhere, often termed the \u201cnew normal\u201d, would earlier have been considered abnormal or anomalous outcomes. These anomalous outcomes can have a positive or negative impact on any organization, and it is important to keep track of them when formulating a long-term business strategy. Hence, anomaly detection is important in every domain, and finding more effective ways to detect anomalies will be an increasingly important skill in the future. In my next blog post, we will get more statistical and discuss a method for detecting outliers in functional data. I hope this post gave you an insight into what an anomaly is and why anomaly detection matters. Until then, I found this small game of eyeballing and finding outliers interesting. Try it yourself and see whether you can spot them. Further Reading: This blog gives a basic overview of anomalies. I have shared links to a few papers that may be of interest. 
https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00265-1 https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167947320300517?via%3Dihub<\/p>\n","protected":false},"author":38,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"slim_seo":{"title":"Anomaly detection - Harini Jayaraman","description":"As we had an overview of one of the unsupervised learning method, K-means Clustering , in my previous blog post, this post will introduce you to an another unsu"},"footnotes":""},"categories":[9],"tags":[],"class_list":["post-250","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/posts\/250","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/comments?post=250"}],"version-history":[{"count":8,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/posts\/250\/revisions"}],"predecessor-version":[{"id":273,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/posts\/250\/revisions\/273"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/media?parent=250"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/categories?pos
t=250"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/harini-jayaraman\/wp-json\/wp\/v2\/tags?post=250"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}