In 2014 I graduated from St. Catharine's College at the University of Cambridge with a BA and MMath, having studied a variety of courses in statistics and operational research, as well as some probability. My Part III essay was on cake cutting, in particular the Robertson–Webb model. Anyone interested in papers with humorous names might consider going into research in this area, with examples including Cake Cutting Really Is Not a Piece of Cake (Edmonds and Pruhs, 2006) and How to Cut a Cake Before the Party Ends (Kurokawa, Lai and Procaccia, 2013).
In the summer of 2012 I was an intern with STOR-i at Lancaster University, where I am now a student. My internship project was on a variant of the travelling salesman problem, which involved both mixed integer linear programming and dynamic programming formulations. These formulations were implemented in MPL and R respectively and run on a small example.
A year later I completed a placement with the Department of Earth Sciences at the University of Oxford. The placement involved analysing event count data over time from Volcán de Colima, Mexico, which was mainly done using spectral analysis.
In 2014 I started a four-year programme with STOR-i at Lancaster University. Last year I completed my MRes, which involved attending statistics and operational research lecture courses as well as completing research projects, giving presentations and taking part in poster sessions.
Over the next three years I will be working on my PhD, which is on sequential decision problems. In particular I am looking at the multi-armed bandit problem and the contextual bandit problem, especially when the context space is large. My supervisors are Steffen Grünewälder and Nicos Pavlidis. For more information click 'Research'.
My work focusses on two sequential decision problems known as the multi-armed bandit problem and the contextual bandit problem. In the multi-armed bandit problem we have a sequence of time points, at each of which we must select an action, referred to as an arm. This generates a sequence of independent random rewards, with rewards from the same arm sharing the same unknown distribution.
We will generally be interested in maximising the expectation of the sum of these rewards, particularly when the number of time points is large. However, to do this it is necessary to learn about the reward distributions of all of the arms. These competing objectives must be balanced by the use of good policies, which decide which arm should be selected next based on past observations.
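One simple policy that balances these objectives is epsilon-greedy, which usually exploits the arm with the highest estimated mean reward but occasionally explores a random arm. The following is a minimal sketch of that idea on a Bernoulli bandit with made-up arm means; the function name and parameters are just illustrative, and this is not the method studied in my research.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy policy on a Bernoulli multi-armed bandit.

    true_means holds the success probability of each arm, which is
    unknown to the policy; rewards are independent draws from the
    distribution of whichever arm is selected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward of each arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, estimates

reward, est = epsilon_greedy_bandit([0.2, 0.5, 0.8], n_rounds=10000)
```

Over many rounds the estimate for the best arm concentrates around its true mean, and most pulls go to that arm, so the average reward approaches the best achievable rate.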
The contextual bandit problem is identical to the multi-armed bandit problem except that the reward at a given time point depends not only on which arm is selected but also on some context, which varies from one time point to the next. The context of the current time point is observed before an arm is chosen. If the contexts come from a very large context space, then modelling the rewards as a function of the context becomes much more difficult.
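When the context space is small and discrete, the problem can be handled by running a separate bandit estimate for every (context, arm) pair, as in the illustrative sketch below (the names and numbers are hypothetical). The table of estimates grows with the number of contexts, which is exactly why this tabular approach breaks down for large context spaces and more structured models of the rewards are needed.

```python
import random

def contextual_epsilon_greedy(arm_means, n_rounds, epsilon=0.1, seed=0):
    """Tabular epsilon-greedy for a contextual bandit with a small
    discrete context space.

    arm_means[c][a] is the hidden mean reward of arm a in context c.
    The policy keeps a separate running estimate for every
    (context, arm) pair, so its memory grows linearly in the number
    of contexts.
    """
    rng = random.Random(seed)
    n_contexts = len(arm_means)
    n_arms = len(arm_means[0])
    counts = [[0] * n_arms for _ in range(n_contexts)]
    estimates = [[0.0] * n_arms for _ in range(n_contexts)]
    total_reward = 0.0
    for _ in range(n_rounds):
        c = rng.randrange(n_contexts)  # context is revealed before acting
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)  # explore
        else:
            a = max(range(n_arms), key=lambda i: estimates[c][i])  # exploit
        reward = 1.0 if rng.random() < arm_means[c][a] else 0.0
        counts[c][a] += 1
        estimates[c][a] += (reward - estimates[c][a]) / counts[c][a]
        total_reward += reward
    return total_reward, estimates

# In context 0 arm 1 is best; in context 1 arm 0 is best, so the
# policy must learn a different choice for each context.
total, est = contextual_epsilon_greedy([[0.2, 0.8], [0.8, 0.2]],
                                       n_rounds=20000)
```

Note that a policy ignoring the context would see every arm as having mean 0.5 here, so conditioning on the context is what makes higher reward possible.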
If you want to get in touch, feel free to use the form below. If you are really keen, on weekdays there is a good chance I will be in my office, Fylde B56.