{"id":504,"date":"2021-05-01T17:56:47","date_gmt":"2021-05-01T17:56:47","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/?p=504"},"modified":"2021-05-10T12:30:12","modified_gmt":"2021-05-10T12:30:12","slug":"reinforcement-learning-the-end-game-2","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/2021\/05\/01\/reinforcement-learning-the-end-game-2\/","title":{"rendered":"Reinforcement Learning: The End Game"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"504\" class=\"elementor elementor-504\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-80c3554 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"80c3554\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3ef841d\" data-id=\"3ef841d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-99f3689 elementor-widget elementor-widget-text-editor\" data-id=\"99f3689\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><em>So Reinforcement Learning intelligence can now beat us at a game that we humans have been playing for a good few millennia. So what? 
And more importantly, where is Reinforcement Learning heading now?<\/em><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-45c90a6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"45c90a6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f35baa9\" data-id=\"f35baa9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-11ad3ad elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"11ad3ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9a83975 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9a83975\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5c9c414\" data-id=\"5c9c414\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d0c609d elementor-widget elementor-widget-text-editor\" data-id=\"d0c609d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Well, something to point out is that it&#8217;s not just <em>Go<\/em>&#8230; or Chess or Backgammon. It&#8217;s the better part of everything. In 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game in question was Dota 2, a free-to-play Real Time Strategy (RTS) \/ Multiplayer Online Battle Arena (MOBA) game that regularly sees hundreds of thousands of active players at any given time. <span dir=\"ltr\">The game is actively played by full-time professionals, and the prize <\/span><span dir=\"ltr\">pool for the 2020 international championship exceeded $40 million<\/span><span dir=\"ltr\">. <strong>Forty Million Dollars.<\/strong><\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-58e59c4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"58e59c4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9a77634\" data-id=\"9a77634\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-524bdf3 elementor-widget elementor-widget-video\" data-id=\"524bdf3\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-settings=\"{&quot;youtube_url&quot;:&quot;https:\\\/\\\/www.youtube.com\\\/watch?v=BkW_m33OEcE&quot;,&quot;video_type&quot;:&quot;youtube&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-wrapper elementor-open-inline\">\n\t\t\t<div class=\"elementor-video\"><\/div>\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-600e762 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"600e762\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-654f0dd\" data-id=\"654f0dd\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-dbd7b5a elementor-widget elementor-widget-text-editor\" data-id=\"dbd7b5a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center\">Dota 2 International Announcement<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-47b7a0d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"47b7a0d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div 
class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ec48baa\" data-id=\"ec48baa\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7da3aff elementor-widget elementor-widget-text-editor\" data-id=\"7da3aff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Such an open environment poses serious challenges for AI systems, such as long time horizons, imperfect information, and complex, continuous state-action spaces (https:\/\/arxiv.org\/abs\/1912.06680). OpenAI Five would go on to defeat the then world champion team, demonstrating that self-play reinforcement learning can completely surpass human ability in an extremely complex environment. This was done by applying existing reinforcement learning techniques, scaled up to learn from batches of approximately 2 million frames every 2 seconds. 
This process took 10 months.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-506c89d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"506c89d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b507695\" data-id=\"b507695\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0c572a1 elementor-widget elementor-widget-image\" data-id=\"0c572a1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"688\" height=\"382\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five.png\" class=\"attachment-large size-large wp-image-512\" alt=\"OpenAi Five\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five.png 726w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five-300x167.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five-24x13.png 24w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five-36x20.png 36w, 
https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/OpenAi-Five-48x27.png 48w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">OpenAI Five Progress in Skill vs Computation (Dota 2 with Large Scale Deep Reinforcement Learning, Berner, Brockman, Chan et al., 2019, https:\/\/arxiv.org\/abs\/1912.06680)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-1b85241 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1b85241\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c1831d0\" data-id=\"c1831d0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5748522 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"5748522\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a2e4c71 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a2e4c71\" 
data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-395b213\" data-id=\"395b213\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-375ae89 elementor-widget elementor-widget-text-editor\" data-id=\"375ae89\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>So Reinforcement Learning, given enough time, is <span data-dobid=\"hdw\">incredibly powerful at solving one specific task. It&#8217;s not unreasonable to assume that if self-play can get Reinforcement Learning to surpass humans at Dota 2, then it should be possible for nearly any <\/span>competitive game. But extending this, from one AI learning one game to one AI learning multiple different and unique rulesets, is a different problem entirely. 
This leads us into the realm of Meta-Learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-1e507e5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1e507e5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8a95e9e\" data-id=\"8a95e9e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-628517d elementor-widget elementor-widget-text-editor\" data-id=\"628517d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Meta-Learning is the idea of teaching an AI to <em>learn<\/em> how to learn<span dir=\"ltr\">. With many Artificial Intelligence developers aiming to eventually <\/span><span dir=\"ltr\">reach some form of Artificial General Intelligence, Meta Reinforcement Learning is commonly seen as <\/span><span dir=\"ltr\">the next step. There is also the possibility that learning how to learn can be leveraged to speed up existing learning in deep RL.<\/span><\/p><p>The team at DeepMind have been working on MuZero, which, rather than trying to model the entire environment the agent might encounter, models only the aspects that are important to the agent\u2019s decision-making process. 
As the DeepMind blog (https:\/\/deepmind.com\/blog\/article\/muzero-mastering-go-chess-shogi-and-atari-without-rules) clarifies, &#8220;knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air&#8221;.<\/p><p>MuZero only looks to model three elements of the environment (see the earlier example with MDPs), the only three strictly necessary components: the value of the current position, the policy the agent is following, and the reward it just received. That&#8217;s it. By focusing only on the strictly necessary, MuZero is already able to learn and compete with specifically designed Reinforcement Learning AI at Atari games that it has never seen before, does not know the rules of, and for which it has had no human input and no access to any prior\/external knowledge. Not only this, but a variant called MuZero Reanalyze can use the learned model 90% of the time to re-plan what should have been done in past episodes.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-32a4024 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"32a4024\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-54c8498\" data-id=\"54c8498\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e1c4ded elementor-widget elementor-widget-image\" data-id=\"e1c4ded\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"688\" height=\"314\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-1024x467.png\" class=\"attachment-large size-large wp-image-506\" alt=\"MuZero\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-1024x467.png 1024w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-300x137.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-768x350.png 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-24x11.png 24w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-36x16.png 36w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero-48x22.png 48w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-content\/uploads\/sites\/28\/2021\/05\/MuZero.png 1084w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Performance of MuZero in Chess, Shogi, Go and Atari over time (Schrittwieser, J., Antonoglou, I., Hubert, T. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604\u2013609 (2020). 
https:\/\/doi.org\/10.1038\/s41586-020-03051-4)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-4f55210 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4f55210\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b55d490\" data-id=\"b55d490\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4039cf7 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"4039cf7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d665a05 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d665a05\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-afce4b0\" data-id=\"afce4b0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9d50af1 elementor-widget elementor-widget-text-editor\" data-id=\"9d50af1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Is MuZero going to take over the world, Terminator-style? <em>Probably not&#8230;<\/em> But the potential applications to, for example, healthcare, are genuinely exciting. All the best,<\/p><p style=\"text-align: center\">&#8211; Jordan J Hood<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>So Reinforcement Learning intelligence can now beat us at a game that we humans have been playing for a good&hellip;<\/p>\n","protected":false},"author":29,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"slim_seo":{"title":"Reinforcement Learning: The End Game - Jordan J Hood","description":"So Reinforcement Learning intelligence can now beat us at a game that we humans have been playing for a good few millennia. So what? 
And more importantly, wher"},"footnotes":""},"categories":[3,5],"tags":[],"class_list":["post-504","post","type-post","status-publish","format-standard","hentry","category-academic","category-mres"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/posts\/504","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/comments?post=504"}],"version-history":[{"count":7,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/posts\/504\/revisions"}],"predecessor-version":[{"id":518,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/posts\/504\/revisions\/518"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/media?parent=504"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/categories?post=504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/jordan-j-hood\/wp-json\/wp\/v2\/tags?post=504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}