{"id":204,"date":"2022-03-07T09:00:00","date_gmt":"2022-03-07T09:00:00","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/?p=204"},"modified":"2024-03-10T16:45:17","modified_gmt":"2024-03-10T16:45:17","slug":"fighting-in-the-karate-club-stochastic-block-models","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/2022\/03\/07\/fighting-in-the-karate-club-stochastic-block-models\/","title":{"rendered":"Fighting in the Karate Club: Stochastic block models"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<p>Imagine you love karate. You love the principles of the martial arts, you love the physical activity, and well&#8230; you love fighting. So you deal with your violent desires in a responsible way &#8211; you join your university&#8217;s Karate Club. Everything is going great, until there is some conflict over the price of lessons between the club president and the part-time karate instructor. Before you know it, the whole club is divided on the matter and it has become a conflict of ideology rather than just fees. Ultimately, the club leaders fire the instructor, and all of his supporters leave with him and form their own karate club. This is obviously not the type of fighting you had in mind when you joined.<\/p>\n\n\n\n<p>This <a rel=\"noreferrer noopener\" href=\"https:\/\/www.journals.uchicago.edu\/doi\/abs\/10.1086\/jar.33.4.3629752\" data-type=\"URL\" data-id=\"https:\/\/www.journals.uchicago.edu\/doi\/abs\/10.1086\/jar.33.4.3629752\" target=\"_blank\">exact situation<\/a> was studied by Wayne Zachary, an anthropologist. This and many other situations can be represented as networks to describe the social, physical and other structures where interactions between pairs of units are observed.  These include social networks, biological structures, and collections of websites, documents and words. <\/p>\n\n\n\n<p><strong>Stochastic block models (SBMs)<\/strong> are a class of random graph models which are widely<br>studied and popular for statistical analysis of networks.  These networks are modelled to discover and understand their underlying structure, which can be used to group similar elements or simulate how the network could grow. The goal is to infer unknown characteristics of the elements in the network from the observed measurements on pairwise properties. <\/p>\n\n\n\n<p>In this blog, we discuss <strong>SBMs<\/strong> using the Karate club example.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Stochastic Block Models<\/li><li>Extensions of the model<ol><li>Degree-corrected SBMs<\/li><li>Multi-membership SBMs<\/li><li>Assortative SBMs<\/li><\/ol><\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">1. Stochastic Block Models (SBMs)<\/h2>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>Consider an SBM for the karate club. There are <span class=\"wp-katex-eq\" data-display=\"false\"> n <\/span> members, each representing a node in the graph. The members can be divided into <span class=\"wp-katex-eq\" data-display=\"false\"> K <\/span> mutually exclusive groups, <span class=\"wp-katex-eq\" data-display=\"false\">(B_1 \\cdots B_K) <\/span>. These groups are unknown. What is known about this club are all the relationships between its members, which can be represented in the <strong><em>adjacency matrix<\/em><\/strong>   <span class=\"wp-katex-eq\" data-display=\"false\"> \\mathbf{Y} <\/span>. This is an <span class=\"wp-katex-eq\" data-display=\"false\"> n \\times n <\/span> matrix for the graph where for a pair of nodes <span class=\"wp-katex-eq\" data-display=\"false\">(p,q), \\, \\mathbf{Y}_{pq} <\/span> is 1 if there is an edge (connection) between the nodes (members) <span class=\"wp-katex-eq\" data-display=\"false\">p<\/span> and <span class=\"wp-katex-eq\" data-display=\"false\">q<\/span> and 0 otherwise.<\/p>\n\n\n\n<p>The SBM is defined by the <strong><em>stochastic block matrix,<\/em><\/strong>    <span class=\"wp-katex-eq\" data-display=\"false\"> \\mathbf{C}<\/span>, a <span class=\"wp-katex-eq\" data-display=\"false\">K \\times K<\/span> matrix for the graph where   <span class=\"wp-katex-eq\" data-display=\"false\">\\mathbf{C}_{ij} \\in [0,1]<\/span> is the probability that there is an edge from a node in   <span class=\"wp-katex-eq\" data-display=\"false\">B_i<\/span> to a node in   <span class=\"wp-katex-eq\" data-display=\"false\">B_j<\/span>. A key feature of SBMs is <strong>stochastic equivalence<\/strong>. This means that two persons in the same group have the same probability distribution of being connected to other persons, both those within and outside the group. It then follows that for node    <span class=\"wp-katex-eq\" data-display=\"false\">p \\in B_i<\/span> and node   <span class=\"wp-katex-eq\" data-display=\"false\">q \\in B_j<\/span>,<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\mathbf{Y}_{pq} \\sim \\text{Bernoulli}(\\mathbf{C}_{ij}). <\/span>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"688\" height=\"646\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-193357.png\" alt=\"\" class=\"wp-image-216\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-193357.png 688w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-193357-300x282.png 300w\" sizes=\"auto, (max-width: 688px) 100vw, 688px\" \/><figcaption>The Karate club network. Graph from SBM review paper listed below (1).<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">2. Modifications to the SBM<\/h2>\n\n\n\n<p>There are several limitations to the simple SBM described above:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Nodes within a group are expected to all have the same degree.<\/li><li>Nodes are restricted to belong to only one group.<\/li><li>The model is not guaranteed to produce assortative groups.<\/li><\/ol>\n\n\n\n<p>We will now look at a few modifications to the simple SBM that address each of these limitations. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Degree-corrected SBM<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:60%\">\n<p>In the karate club (and many other networks), it is unlikely that every person in a particular group would have the same number of friends in the group. The <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1008.3926\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/abs\/1008.3926\" target=\"_blank\">degree-corrected SBM<\/a> extends the simple SBM to account for this possibility of different node degrees. <\/p>\n\n\n\n<p>In an undirected graph, the <strong>degree<\/strong> of node <span class=\"wp-katex-eq\" data-display=\"false\">p <\/span>  is the number of edges between <span class=\"wp-katex-eq\" data-display=\"false\">p <\/span> and another node. Consider the network as an undirected multi-graph. The stochastic block matrix   <span class=\"wp-katex-eq\" data-display=\"false\">\\mathbf{C}<\/span>  is redefined  such that    <span class=\"wp-katex-eq\" data-display=\"false\">\\mathbf{C}_{ij} \\in [0,1]<\/span> is the expected number of edges between nodes in   <span class=\"wp-katex-eq\" data-display=\"false\">B_i<\/span>  and   <span class=\"wp-katex-eq\" data-display=\"false\">B_j<\/span>. <\/p>\n\n\n\n<p>Also included in the model is an <em>n<\/em>-vector  <span class=\"wp-katex-eq\" data-display=\"false\">\\phi<\/span> where for each node <span class=\"wp-katex-eq\" data-display=\"false\">\\phi_p<\/span> which controls the expected degrees of node <span class=\"wp-katex-eq\" data-display=\"false\">p<\/span>. <span class=\"wp-katex-eq\" data-display=\"false\">\\phi_p<\/span>  is the probability that an edge connected to the group containing<br><em>p<\/em> is connected to <em>p<\/em> itself. It then follows that for node    <span class=\"wp-katex-eq\" data-display=\"false\">p \\in B_i<\/span> and node   <span class=\"wp-katex-eq\" data-display=\"false\">q \\in B_j<\/span>,<\/p>\n\n\n\n<span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\mathbf{Y}_{pq} \\sim \\text{Poisson}(\\phi_p\\phi_q\\mathbf{C}_{ij}). <\/span>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"375\" height=\"456\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-204657.png\" alt=\"\" class=\"wp-image-221\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-204657.png 375w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/03\/Screenshot-2022-03-10-204657-247x300.png 247w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\" \/><figcaption>Divisions of the karate club network found using the (a) uncorrected and (b) corrected SBMs. The size of each node is proportional to its degree and the shading reflects inferred group membership. The dashed line indicates the split observed in real life.<\/figcaption><\/figure><\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Mixed-membership SBMs<\/h3>\n\n\n\n<p>In the Karate club example, the 2nd issue is not all that major relevant (although it definitely could have been possible to be part of both clubs after the split), but it&#8217;s clear how for other networks, this can be a major limitation. Further, the strength of their affiliation with each group can be different. The <a href=\"https:\/\/jmlr.org\/papers\/v9\/airoldi08a.html\" target=\"_blank\" rel=\"noreferrer noopener\">mixed-membership SBM<\/a> (MMSBM) extends the simple SBM to accommodate these multi-faceted relationships using a mixed-membership approach.<\/p>\n\n\n\n<p>In addition to the variables in the simple SBM, there is also an    <span class=\"wp-katex-eq\" data-display=\"false\">n \\times K<\/span> matrix of membership probabilities   <span class=\"wp-katex-eq\" data-display=\"false\">\\Theta <\/span>  where each element   <span class=\"wp-katex-eq\" data-display=\"false\">\\Theta_{pi} <\/span>    represents the probability that node   <span class=\"wp-katex-eq\" data-display=\"false\">p \\in B_i<\/span>. Each row is not restricted to have only 1 non-zero element, so each node can simultaneously belong to multiple groups.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Assortative SBMs<\/h3>\n\n\n\n<p>Another consideration is if we want to model a network with a particular goal of grouping<br>similar elements. For the karate club, there are probably members who are more casual about the situation (they&#8217;re just there to fight and go home, not to make friends), and so only have a couple of friends in the club. The simple SBM may conclude that all such persons belong to the same group even if there are hardly any connections in the group because they each have a comparatively small but similar number of friends. This is because the SBM is not designed to prioritize community detection. However, work has been done to extend the model to improve its usefulness in clustering tasks.<\/p>\n\n\n\n<p><strong>Community detection<\/strong> or <strong>assortativeness<\/strong> is the property of nodes being partitioned into blocks in such a way that the edge density is high within a group and low between groups. The <a href=\"https:\/\/proceedings.neurips.cc\/paper\/2012\/file\/d6ef5f7fa914c19931a55bb262ec879c-Paper.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">assortative mixed-membership stochastic block model<\/a> (a-MMSB) is a special case of the MMSBM we looked at in the previous section. The model includes a parameter<br>for &#8220;community strength&#8221;, <span class=\"wp-katex-eq\" data-display=\"false\">\\beta_i\u2208 \\in (0, 1)<\/span> for each group which represents how closely the nodes in the group are linked.<\/p>\n\n\n\n<p> <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Learn More<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>A really good review of <a rel=\"noreferrer noopener\" href=\"https:\/\/appliednetsci.springeropen.com\/articles\/10.1007\/s41109-019-0232-2\" target=\"_blank\">stochastic block models<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>Many situations can be represented as networks to describe the social, physical and other structures where interactions between pairs of units are observed. Stochastic block models (SBMs) are a class of random graph models which are widely<br \/>\nstudied and popular for statistical analysis of networks. In this blog, we discuss SBMs using the Karate club example.<\/p>\n","protected":false},"author":41,"featured_media":214,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[14],"tags":[15,12],"class_list":{"0":"post-204","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","6":"hentry","7":"category-random-research","8":"tag-statistics","9":"tag-stochastic-block-models","11":"post-with-thumbnail","12":"post-with-thumbnail-large"},"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/204","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/comments?post=204"}],"version-history":[{"count":19,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/204\/revisions"}],"predecessor-version":[{"id":297,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/204\/revisions\/297"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/media\/214"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/media?parent=204"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/categories?post=204"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/tags?post=204"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}