Interns and Blogs
The STOR-i Summer Research Internships run for eight weeks, July to September, each year. Every cohort writes a weekly blog on their experience of the programme, and you can read about them here.
Click on the links below to see details of the interns by year.
Here you can find details of the summer 2020 interns including a description of their research project.
Modelling extremes of environmental data
Degree: BSc (Hons) Mathematics, Manchester University
Supervisor: Callum Barltrop
Non-stationary is a term that is used to describe data for which the underlying distribution is not fixed. This is a common feature in many environmental datasets; for example, we often see temperature data increasing over time. When this is observed, standard statistical methodology that assumes data is identically distributed cannot be applied. This can lead to many inferential challenges, especially in the case of extreme value theory.
This branch of statistics is the theoretical framework used to model ‘extreme’ events (events considered to be rare or uncommon). By definition, there is little data for such events, so any analysis relies heavily upon theoretical results. However, when non-stationarity is present, the standard theory cannot be applied. Moreover, we have to question what we mean by an ‘extreme’ event. For example, an ‘extreme’ event today may not be considered ‘extreme’ in 10 years’ time.
Non-stationarity can be present for a range of different factors. One such factor of particular interest is climate change. It is expected that climate change will continue to increase temperatures, while also increasing the frequency and magnitude of some extreme weather events, such as storms and floods. Therefore, it is important to find ways to build climate change into our framework.
In this project, we will be considering methods for modelling extreme events of environmental data using variables that drive climate change, such as CO2 levels. Initially, we will use a range of simulated datasets to develop a model that can capture climate trends in extreme events. We will then apply this model to UK climate projection data for a range of different variables, including temperature and humidity.
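As a rough illustration of what a simulated non-stationary dataset might look like, the sketch below draws maxima from a generalised extreme value (GEV) distribution whose location parameter increases linearly with a covariate. The linear form, the parameter values and the `co2_index` covariate are all assumptions made for this example, not the model developed in the project.

```python
import math
import random


def gev_quantile(u, mu, sigma, xi):
    """Inverse CDF of the GEV distribution (shape xi must be non-zero)."""
    return mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi


def simulate_nonstationary_maxima(covariate, mu0, mu1, sigma, xi, seed=0):
    """Draw one GEV value per covariate value, with location mu0 + mu1 * c."""
    rng = random.Random(seed)
    sample = []
    for c in covariate:
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        sample.append(gev_quantile(u, mu0 + mu1 * c, sigma, xi))
    return sample


# Hypothetical covariate: a CO2-like index rising steadily over 50 "years".
co2_index = [0.1 * t for t in range(50)]
maxima = simulate_nonstationary_maxima(
    co2_index, mu0=20.0, mu1=1.5, sigma=2.0, xi=-0.1
)
```

Because the location parameter drifts upwards, a value that looks extreme early in the series can be unremarkable later on, which is exactly the difficulty described above.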
Click here for Malvina - presentation
Modelling Populations of Networks
Degree: MMath Mathematics, University of Warwick
Supervisor: George Bolt
Network data arises when we have relational information between entities of a system. A canonical example is social network data, where we may observe information on friendships within a sample of a population. We typically represent this data as a mathematical graph, i.e. a set of vertices and edges, where vertices correspond to entities and edges represent relationships between them.
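The graph representation can be sketched in a few lines. The names and friendships below are made up purely for illustration.

```python
# Hypothetical friendship data: each pair is an undirected edge.
vertices = {"Ann", "Bob", "Cat", "Dan"}
edges = {("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat")}

# Adjacency list: map each vertex to the set of its neighbours.
adjacency = {v: set() for v in vertices}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree of a vertex = number of friends in the sample.
degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
```

Here "Dan" is a vertex with no edges, which shows why the vertex set is stored separately from the edge set.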
The development of statistical models for network data has become an active area of research; see Goldenberg et al. (2009) for a review. The majority of these models assume the observed network was generated in some pre-specified stochastic manner, dependent on a choice of model parameters. The task of the statistician is then to take an observed network and infer what parameters could have, or were most likely to have, led to its appearance.
Many of the traditional network models were constructed with application to individually observed networks in mind. However, with datasets becoming larger and richer there has been recent interest in developing models to describe the generative process of a population of networks. An interesting example of such data are connectomes, which are network representations of brain connectivity inferred from MRI scans. Typically, a scan would be taken on a sample of patients, and so after some data processing we end up with a network for each patient.
In this project, the student will be introduced to the statistical problem of modelling network data. Through simulation experiments, they will explore the benefits and drawbacks of some popular network models, before comparing these with models recently proposed in the literature that deal specifically with the problem of modelling a population of networks.
Click here for James - presentation
Extreme events: what are the odds?
Degree: BSc Mathematics, Lancaster University
Supervisor: Stan Tendijck
In this project, we compare different models for bivariate extremes. This has oceanographic applications, for example in the joint modelling of wave height and wind speed, both of which are important variables in the calculation of failure probabilities of offshore facilities. Other applications include the joint modelling of losses on financial assets like the FTSE 100 and AEX, or the modelling of the composition of certain gases in the atmosphere.
In the project, we compare the Heffernan-Tawn model introduced in  and a number of derived models. The Heffernan-Tawn model is a conditional extremes model that captures a wide variety of different dependency structures; essentially, it is a form of regression model fitted only to the extremes. It is currently one of the most flexible models used in the field (500+ citations). Heffernan and Tawn chose their particular model form because it worked asymptotically on almost all bivariate dependency structures (copulas) that had been developed. The model is not perfect, as many small variations have been proposed since its introduction, e.g. in . We will compare a few of these and potentially come up with a new one.
The Heffernan-Tawn model is an asymptotic model, i.e. it works as long as we push observations far enough away. However, in the area of extremes, we do not want to model observations with an occurrence probability of 10^-30 but rather 10^-2. At such levels, the Heffernan-Tawn limit may not have converged enough for its model form to be the best possible one.
Based on different model choices, we compare the models by estimating probabilities of extreme sets, e.g. the probability that a wave larger than 5m occurs together with a wind speed higher than 40 knots.
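To make the target quantity concrete, here is a minimal sketch of the naive empirical estimate of such a joint exceedance probability. It is not the Heffernan-Tawn model; the function name and the four observations are made up for illustration.

```python
def joint_exceedance_probability(pairs, x_threshold, y_threshold):
    """Fraction of observations exceeding both thresholds simultaneously."""
    exceed = sum(1 for x, y in pairs if x > x_threshold and y > y_threshold)
    return exceed / len(pairs)


# Made-up (wave height in m, wind speed in knots) observations.
observations = [(6.0, 45.0), (4.0, 50.0), (7.0, 30.0), (5.5, 41.0)]
p_hat = joint_exceedance_probability(observations, 5.0, 40.0)
```

For genuinely rare sets this estimate is usually zero, because no observation falls in the set. That is why model-based extrapolation, as in the conditional extremes approach, is needed.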
Click here for Luke - presentation
Uncertain Predictions in Resource Allocation Models
Degree: MSc Natural Sciences, Durham University
Supervisor: Ben Black
Resource allocation problems are extremely common in the operational research literature. They consist of allocating a fixed set of resources, such as human workers or machines, to a set of skills, jobs or functions to try to meet as much demand as possible. This could be allocating electricians to jobs (Chen et al., 2018) or multi-skilled handlers to calls (Koole and Pot, 2005). These problems handle one day at a time and, as such, the demand (jobs and calls) is known.
However, in some problems such as that of the medium-term planning of a telecommunications company’s engineers (Ainslie et al., 2015), we need to plan for a large number of days in the future. This means we need forecasts of the demand that we will need to meet over this period. These forecasts are almost always uncertain, but many optimisation models still treat them as though they aren’t. This can lead to big issues in the resulting allocations, such as wasted resources and unmet demand. This project will entail studying a variety of methods that can be used in mathematical and dynamic programming to reduce or incorporate this uncertainty in the models used. An example starting point could be training a reinforcement learning (RL) model (Sutton and Barto, 1998) that learns how best to correct a poor forecast in real time.
Click here for Daniel H - presentation
Solving small-scale Arc Routing Problems
Degree: MSci Mathematics and Statistics, University of Glasgow
Supervisor: Thu Dang
The Arc Routing Problem (ARP) arises in several applications, such as postal delivery, meter reading, snow removal, salt spreading, and waste collection. The aim is to find a vehicle route or a set of paths in a network at minimum cost, such that certain arcs are traversed by at least one vehicle, possibly subject to various side constraints such as limited vehicle capacities, time windows, one-way streets and so on.
Linear programming (LP) is a method to guide decision-makers toward the choice of the best options by making use of a mathematical model whose requirements are expressed by linear relationships. Linear programming is a special case of mathematical optimization. It is expected that the focus of this project will be how to model ARPs in small-scale instances.
Initially, simulated data will be used and suitable working code will be provided. Then, once a general framework for the model is established, small-scale real data will be used to test it. The model will then be tweaked to make use of all the available specific features of the data.
Click here for Katharina - presentation
Classification in an on-line setting
Degree: MSci Mathematics, Lancaster University
Supervisor: Chloe Fearn
In classification, a model is trained on some historic data, then for subsequent data, the features are used to predict the responses. However, sometimes the underlying distribution of the responses given the features changes over time. In this case, if a model is trained once and used to classify incoming data (whose responses are not known) forever, it will eventually be rendered useless, since the responses of the test data will not be related to their features in the same way as those of the training data were.
Another problem that arises in classification is that sometimes it is expensive to view the responses of instances. In this situation, we need to view the instances that bring the most information to the classifier, and view as few as possible to save on cost.
There is plenty of literature on on-line classification, much of which involves forgetting factors or sliding windows. This project will first explore off-line binary classification methods, then move on to on-line methods. If time allows, active learning will be explored, which involves methods that select a subset of the full data set to learn from when label requests are expensive.
Click here for Adeeb - presentation
Modelling Waves in the Ocean
Degree: MPhys Physics with Theoretical Physics, University of Manchester
Supervisor: Jake Grainger
The world’s oceans continue to play an important part in many aspects of modern life. Waves in the ocean can cause damage to structures and ships alike, endangering their crews and causing significant financial and environmental damage. Waves also propagate onshore, where they cause erosion and flooding. As such, it is important to understand their behaviour.
Observations of ocean waves come in the form of measurements of the displacement of the sea surface at a given location. These observations can then be used to develop parametric models that can describe the sea surface, many of which are summarised by Michel (1999). Typically these models describe the spectral density function of the process of interest (the frequency domain analogue of the autocovariance). To fit such models to actual data we can use pseudo-likelihood approaches such as the de-biased Whittle likelihood (Sykulski et al., 2019).
The way in which these parameters evolve over time is of increasing interest to engineers, especially in rapidly developing weather systems such as tropical cyclones. During this project, the student will use existing techniques to fit models to data-sets and then use the model fits to explore the behaviour of the different parameters as the sea evolves.
Click here for Jack - presentation
Input Uncertainty Quantification for Stochastic Simulation
Degree: BSc MORSE, Lancaster University
Supervisor: Drupad Parmar
The behaviour of many real-world systems, such as airports, hospitals, and manufacturing lines, depends greatly upon some level of inherent randomness, and therefore such systems are frequently modelled using stochastic simulation. The randomness in the simulation is driven by input models, represented by probability distributions or processes, which are often estimated via data collected from the real-world system. Since the samples of data are finite, uncertainty arises in the estimated input models and this propagates through the simulation model to performance measure outputs.
Rarely is this propagation of input model uncertainty considered in simulation output analysis. Common practice is to report simulation-based confidence intervals for performance measures; however, these typically ignore input uncertainty and only include stochastic estimation error. Without considering the propagation of input model uncertainty in simulation output analysis, decisions are at risk of being made with misleading levels of confidence. Interest therefore lies in quantifying the uncertainty that arises in the simulation output as a result of the uncertainty in estimating the input models.
The aim of this project is to develop an understanding of input uncertainty and implement some existing methodologies for quantifying input uncertainty on provided stochastic simulation models, whilst considering the relative advantages and disadvantages of each method.
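One way to picture input uncertainty is a bootstrap over the input data. The sketch below assumes an M/M/1 queue with a known arrival rate and uses the closed-form mean time in system, 1 / (mu - lambda), as a stand-in for a full simulation run. The function names and all numbers are illustrative assumptions, not one of the methodologies studied in the project.

```python
import random
import statistics


def mm1_mean_time_in_system(arrival_rate, service_rate):
    """Mean time in system for a stable M/M/1 queue: 1 / (mu - lambda)."""
    if service_rate <= arrival_rate:
        raise ValueError("queue is unstable")
    return 1.0 / (service_rate - arrival_rate)


def bootstrap_performance(service_times, arrival_rate, n_boot=500, seed=1):
    """Re-estimate the service rate on resampled data and recompute the output."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):
        resample = rng.choices(service_times, k=len(service_times))
        service_rate = 1.0 / statistics.fmean(resample)
        results.append(mm1_mean_time_in_system(arrival_rate, service_rate))
    return sorted(results)


# Hypothetical "real-world" data: 200 observed service times.
data_rng = random.Random(0)
observed = [data_rng.expovariate(1.0) for _ in range(200)]

estimates = bootstrap_performance(observed, arrival_rate=0.5)
lower, upper = estimates[12], estimates[487]  # rough 95% percentile interval
```

The width of the interval from `lower` to `upper` reflects only the uncertainty in the estimated input model. A confidence interval built from simulation replications alone would miss it.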
Click here for Daniel M - presentation
Machine learning in simulation
Degree: MSci Statistics, University of Glasgow
Supervisor: Graham Laidler
Simulation is commonly used to model many real-world operations. For example, queueing systems naturally arise in the operational running of facilities such as hospitals, call centres, and manufacturing processes. The complexity of such systems often makes mathematical analysis infeasible, so a simulation model is used instead to optimise their performance. Briefly, a simulation model replaces the random processes that occur in the real-world system, such as customer arrivals and service times, with appropriately distributed random variables. Sampling these random variables allows the system to be simulated, and its performance can then be evaluated with regard to some measurable performance indicator, such as customer waiting times. However, by the stochastic nature of the systems being modelled, performance indicators can fluctuate significantly over time. As such, traditional time-averaged performance indicators give an incomplete picture of a highly variable system. There is growing interest in obtaining a deeper understanding of simulation behaviour; for example, we want to uncover the main causes of time-varying performance.
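As a minimal sketch of such a simulation model, the code below simulates customer waiting times in a single-server queue using the Lindley recursion, with exponential inter-arrival and service times. The rates and the block size are assumptions chosen for illustration.

```python
import random


def simulate_waiting_times(n_customers, arrival_rate, service_rate, seed=0):
    """Waiting times in a single-server FIFO queue via the Lindley recursion."""
    rng = random.Random(seed)
    waits = [0.0]  # the first customer arrives to an empty system
    for _ in range(n_customers - 1):
        service = rng.expovariate(service_rate)       # current customer's service
        interarrival = rng.expovariate(arrival_rate)  # gap until the next arrival
        waits.append(max(0.0, waits[-1] + service - interarrival))
    return waits


waits = simulate_waiting_times(10_000, arrival_rate=0.8, service_rate=1.0)

# Averages over consecutive blocks of 1,000 customers show how much
# performance fluctuates, which a single overall average would hide.
block_means = [sum(waits[i:i + 1000]) / 1000 for i in range(0, len(waits), 1000)]
```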
This project will include some exploratory data analysis of the data generated by a simulation model, with focus on visualising the fluctuations in performance. We will then turn our attention to some common machine learning methods, and consider ways to exploit them for our purpose. Namely, we want to uncover the driving factors behind observed simulation performance. This project offers the chance to produce some novel methodology, and can be flexible depending on the interests and prior experience of the student.
Click here for Thomas - presentation
Dynamic Latent Space Network Models
Degree: BSc Mathematics with Statistics, University of Warwick
Supervisor: Amiee Rice
Networks are often used to represent real world interactions, and therefore the ability to model real world behaviour is paramount in the field of network analysis and modelling.
Latent space models have been used to capture a high level of transitivity in networks (the common phrase that you will come across is “a friend of a friend is a friend of mine too”). The first task in this project is to understand latent space modelling of networks.
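A common form of latent space model is the latent distance model, in which each node has a position in an unobserved space and the log-odds of an edge decrease with the distance between positions. The sketch below assumes that form; the intercept `alpha` and the positions are hypothetical.

```python
import math


def edge_probability(z_i, z_j, alpha):
    """P(edge) under a latent distance model: logistic(alpha - ||z_i - z_j||)."""
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))


# Hypothetical latent positions in two dimensions.
positions = {"i": (0.0, 0.0), "j": (0.5, 0.0), "k": (3.0, 0.0)}

p_close = edge_probability(positions["i"], positions["j"], alpha=1.0)
p_far = edge_probability(positions["i"], positions["k"], alpha=1.0)
```

Transitivity arises naturally. If i is close to j and j is close to k, the triangle inequality forces i to be fairly close to k, so that edge is also likely.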
Dynamic network models allow us to capture how time affects an interaction network. How do we accurately show the change in affinity for connection between two individuals as time goes on? Then using this information, you will be able to implement models that use both methods to capture realistic interaction patterns.
Click here for Taj - presentation
Anomaly detection using functional data analysis, with applications to sea surface temperature data
Degree: BSc Mathematics with Finance, Newcastle University
Supervisor: Edward Austin
Functional Data Analysis is used to model phenomena observed over a period of time as a continuous function. This is of particular use in situations where the observations are recorded at a high frequency over the time period, as this means that a large collection of points can be represented as a single observed curve. Inference using the observed functions can then take a variety of forms, and this project will focus on anomaly detection using Functional Data. Anomaly detection is the process by which the data are examined to test whether an observation differs significantly from the other observations, or some underlying expected process.
Anomaly detection for point data is a well-studied area, and the challenge is to extend the classical notions of an anomaly to the functional domain. In particular, how can outlyingness be measured with respect to a continuous function, given that each observation will be a smooth curve which varies over time? Furthermore, how can this definition of outlyingness be described so that sensible conclusions can be drawn from the data?
This project will seek to address these challenges, first by performing a review of the existing functional data anomaly detection methods, and then using these to detect anomalies within Pacific Sea Surface Temperature Data. The aim will be not only to detect the effect of a changing climate on the sea surface temperature, but also to identify periods where anomalous weather has led to unexpected temperatures being recorded.
Click here for Ryan - presentation
Online Sparse Temporal Disaggregation
Degree: BSc Mathematics, Lancaster University
Supervisor: Luke Mosley
Due to the significant adverse effect of the coronavirus pandemic on the global economy, there has never been a more important time to understand the short-term movements of headline macroeconomic variables. We can no longer rely on infrequent publications of GDP or traditional annual business surveys to inform us on the current state of the economy. Ever since the global financial crash of 2007-2008, national statistics institutes, such as the ONS here in the UK, have motivated the need for a vast set of high frequency indicator time series that are readily available and measure numerous processes. This set will be used to create disaggregated series of infrequent headline variables, which will provide early warning signals of potential large economic impacts such as financial crashes and pandemics, and deliver more accurate measurements of the rapidly evolving modern economy.
With the digital revolution we witness today, there are many potential resources for high frequency indicators. To disaggregate GDP, we could use credit card transactions data or VAT returns data. To disaggregate inflation, we could use scanned price data in supermarkets or social media news articles. To disaggregate unemployment, we could use web-scraped online job advertisement data. In the econometrics literature, the process of disaggregating a low frequency time series by making use of indicator series recorded at the desired high frequency is known as temporal disaggregation. This is a two-step procedure that involves finding a preliminary estimate for the high frequency disaggregated series (usually by performing GLS regression) and then distributing the aggregated residuals among the preliminary series. With the vast number of indicator series we would now like to use when performing temporal disaggregation, standard techniques such as GLS become statistically infeasible due to the curse of dimensionality, and therefore current methods fail. To resolve this difficulty, one can set up the temporal disaggregation problem in the sparse modelling framework by incorporating a LASSO regularization penalty, which will focus on selecting a small set of the indicators having the most informative power on the variable of interest. The resulting high frequency estimates from sparse temporal disaggregation will be informative on two fronts: firstly, they provide accurate visualisation of the short-term movements of the headline variable and, secondly, they give insight into which indicator series are most relevant for future estimations.
The aim of this project is to devise a way to perform sparse temporal disaggregation in the online setting, i.e. to automatically update the model in light of data revisions. Data revisions will be very common when performing temporal disaggregation. For example, they may be due to a new time period occurring for the low frequency variable, or changes in the indicator series set due to improved data sources. More major revisions occur when there is a change in legislation or a change in accounting definitions or in times of financial crisis. Understanding how estimates are affected by revisions plays an important role in assessing how reliable the sparse temporal disaggregation model is. We would like estimates to remain precise but also stable over time.
Click here for Matthew - presentation
Here you can find details of the summer 2019 interns including a description of their research project.
Investigating the Effect of Dependence on Averaging Extremes
Degree: BSc (Hons) Mathematics, University of Manchester
Supervisor: Jordan Richards
Extreme Value Theory is often used to model extreme weather events, such as extreme rainfall, which is the main cause of river flooding. Given data at separate locations, we can fit simple models to understand the distribution of heavy rainfall at these single locations. However, flooding is generally not caused by an extreme event at a single location; it is caused by extreme rainfall averaged over several locations, often referred to as a catchment area. This project aims to investigate how the distribution of extreme rainfall at single locations, and the dependence between extreme events at different sites, affect the distribution of the average over all sites. We do this using a subset of data provided by the Met Office, which consists of gridded hourly rainfall across the north of England. Taking each grid box to be a single location, we can fit distributions that model extreme events at each particular location. Dependence between grid boxes can also be quantified using empirical measures. We then average over adjacent grid boxes and fit the same distributions. We are particularly interested in how the parameters of the distributions change as we average over an increasing number of grid boxes, and how the dependence between locations influences this change.
Optimal learning for multi-armed bandits
Degree: BSc (Hons) Mathematics, University of Warwick
Supervisor: Livia Stark
Optimal learning is concerned with efficiently gathering information (via observations) that is used in decision making. It becomes important when the way information is gathered is expensive, so that we are willing to put some effort into making the process more efficient. Learning can take place in one of two settings: offline or online. In offline learning, we make a decision after a number of observations have taken place, while in online learning decisions are made sequentially so that a decision results in a new observation that in turn informs our next decision. This project will focus on learning for multi-armed bandits.
Multi-armed bandits present an online learning problem. They are easiest to visualise as a collection of slot machines (sometimes referred to as one-armed bandits). The rewards from the slot machines are random, and each machine has a different, unknown expected reward. The goal is to maximise one’s earnings from playing the slot machines. That can be achieved by playing the machine with the highest expected reward. However, the expected rewards can only be estimated by playing the machines and observing their random rewards. Therefore there is a trade-off between exploring bandits to learn more about their expected rewards and exploiting bandits with known high expected rewards.
Recruitment to Phase III Clinical Trials
Degree: BSc (Hons) Mathematics and Statistics, Lancaster University
Supervisor: Szymon Urbas
Clinical trials are a series of rigorous experiments examining the effect of a new treatment in humans. They are essential in the drug-approval process, as per the European Medicines Agency guidelines. In order for a drug to be made available to the public, it must pass a number of statistical tests each with sufficient certainty in the outcomes. The most costly part of the trials process is Phase III, which is composed of randomised controlled studies with large samples of patients. Patients are continuously enrolled across a number of recruitment centres.
The standard way of modelling recruitment in a practical setting is to use a hierarchical Poisson-gamma (PG) model, as introduced in Anisimov and Fedorov (2007). The framework assumes that the rates at which patients come into each centre do not change over time. The main arguments for using the simple model are the limited data available for inference and its tractable predictive distributions. Recent work by Lan et al. (2018) explores the idea of decaying recruitment rates. However, the proposed model lacks flexibility in accounting for a multitude of recruitment patterns appearing across different studies.
The internship project will concern itself with the analysis of a flexible class of recruitment models in data-rich scenarios. The project will likely tackle an open problem in the area, which is the presence of a mixture of different recruitment patterns appearing in a single study. This will likely involve novel ways of clustering centres based on the observed recruitments. The project will entail a mixture of applied probability, likelihood/Bayesian inference and predictive modelling. There will be a strong computing component in the form of efficient optimisation or simulation methods.
Investigating Optimism in the Exploration/Exploitation Dilemma
Degree: MPhys Physics with Astrophysics and Cosmology, Lancaster University
Supervisor: Alan Wise
A stochastic multi-armed bandit problem is one where a learner/agent has to maximise their sum of rewards by playing a row of slot machines, or ‘arms’, in sequence. In each round, the learner pulls an arm and receives a reward corresponding to this arm. The rewards that are generated from each arm are assumed to be distributed as a noisy realisation of some unknown mean; therefore, maximising the sum of the rewards relies on finding the arm with the highest mean. We wish to create policies to tell us which arm to play next in order to maximise our reward sum.
The challenge in finding the best arm in the multi-armed bandit problem is the exploration-exploitation dilemma. This dilemma occurs since, at any time point, we need to decide between playing arms which have been played a low number of times (exploration) or the arm with the best-estimated mean (exploitation). If the learner explores too much then they will miss out on playing the optimum arm; however, if the learner chooses to exploit the best arm, without exploring other options, then they could end up exploiting a sub-optimal arm. It is clear that the best policies balance both exploration and exploitation. The policies which we will study in this project follow the philosophy of optimism in the face of uncertainty.
These policies work by being optimistic towards options which we are uncertain about. For instance, consider an intern visiting Lancaster University for the first time. For lunch, if they are optimistic about the local food places (Sultan’s/ Go Burrito) over chains (Subway/ Greggs), then they will be more likely to explore the places that they are more uncertain about. In multi-armed bandits, these policies give each arm an upper confidence bound (UCB) index, which usually takes the form of the estimated mean reward plus some bias, and the arm with the largest value of the index is played. These types of policies are mathematically guaranteed never to perform badly - but can we do better? This is the major question of this project.
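As an illustration of an index of this form, here is a minimal sketch of the classical UCB1 policy, where the bias term is sqrt(2 ln t / n) for an arm played n times by round t. The Bernoulli rewards and the arm means are assumptions made for the example; the project may study different indices.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Play Bernoulli arms with UCB1 and return how often each arm was played."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of plays per arm
    totals = [0.0] * n_arms  # sum of rewards per arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            indices = [
                totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=indices.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical arms; the learner does not see these means.
counts = ucb1([0.2, 0.5, 0.8], n_rounds=2000)
```

Arms that have been played rarely get a large bias term, so they are revisited occasionally. As the counts grow, the bias shrinks and play concentrates on the arm with the best estimated mean.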
Predicting ocean current speed using drifter trajectories
Degree: MPhys Theoretical Physics, Durham University
Supervisor: Mike O'Malley
The dataset I am using comes from drifting objects in the sea that are tracked by GPS. These drifting objects are commonly referred to as drifters. In summary, the location of drifters is processed to obtain quarter-daily longitude, latitude and velocity data. In order to model complex phenomena in the ocean, one of the first pre-processing steps is to remove a large scale mean velocity of the drifters. In other words, we need a model which predicts velocity given location, so that we can focus on the residuals. Currently, one of the most popular methods to do this involves binning the data, then extracting a mean in each bin and using this mean as the prediction. This project will aim to develop a better, more accurate method to predict velocity at a given location.
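The binning approach can be sketched as follows. The square bins, the scalar velocity and the three observations are simplifying assumptions for illustration, not the real drifter data.

```python
import math
from collections import defaultdict


def bin_key(lon, lat, width):
    """Index of the square bin of side `width` (in degrees) containing a point."""
    return (math.floor(lon / width), math.floor(lat / width))


def fit_binned_means(observations, width):
    """observations: iterable of (lon, lat, velocity). Returns mean velocity per bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, velocity in observations:
        key = bin_key(lon, lat, width)
        sums[key] += velocity
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(bin_means, lon, lat, width):
    """Predicted velocity at a location, or None if its bin holds no data."""
    return bin_means.get(bin_key(lon, lat, width))


# Made-up observations, for illustration only.
toy = [(0.5, 0.5, 1.0), (0.7, 0.2, 3.0), (1.5, 0.5, 10.0)]
means = fit_binned_means(toy, width=1.0)
residuals = [v - predict(means, lon, lat, 1.0) for lon, lat, v in toy]
```

The prediction is piecewise constant and undefined in empty bins. Smoother nonparametric methods could avoid both limitations.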
The general scope of methods you will be focusing on are classed as nonparametric regression, and this includes spline regression, Gaussian processes, local polynomial regression and more. These methods are generally applied to independently distributed data. One of the more difficult aspects of modelling this type of data is accounting for the non-independent nature. In particular, the next sampled location in a trajectory strongly depends on the current location and velocity at that location. This sequential sampling can strongly affect model selection, which will be a large part of this project.
Initially, the project will look at modelling a relatively simple toy simulation which I will supply. The reasoning behind this is that we know the true underlying process, therefore the models you fit can be compared to the known ground truth. The method which is found to work best can then be applied to the real dataset with empirical evidence that it works on similar data.
Point Processes on Categorical Data
Degree: BSc (Hons) MORSE, University of Warwick
Supervisor: Jess Gillam
The aim of this project is to explore point process methods to model categorical time series, specifically data provided by Howz. Howz is a home monitoring system based on research that indicates changes in daily routine can identify potential health risks. Howz uses appliances placed around the house and other low-cost sources such as smart meter data to detect these changes. This data is a great example of how categorical data applies to real-life situations. One potential way of modelling this data is to use point processes.
Point processes are composed of a time series of binary events (Daley and Vere-Jones, 2003). There exist many different point processes that could be useful for modelling this data, such as Poisson processes, Hawkes processes and Renewal processes (Rizoiu et al., 2017). The goal of this project is to find ways to model multiple sensors, looking at the time between sensors being triggered to see if this indicates a change in routine. One extension to this project would be exploring the relationship between the categories; thus having different models for each category. We could also look into subject specific effects in the data.
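The simplest of these models is the homogeneous Poisson process, which can be simulated by accumulating exponential waiting times. The sketch below is illustrative only; the rate and the time horizon are assumptions and are not derived from Howz data.

```python
import random


def simulate_poisson_process(rate, horizon, seed=0):
    """Event times of a homogeneous Poisson process on [0, horizon)."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)  # waiting time until the first event
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)  # exponential gap to the next event
    return times


# Hypothetical sensor triggered on average twice per hour over 100 hours.
events = simulate_poisson_process(rate=2.0, horizon=100.0)

# Times between consecutive triggers, the quantity of interest above.
gaps = [later - earlier for earlier, later in zip(events, events[1:])]
```

Under this model the gaps are independent with mean 1 / rate, so a sustained shift in the observed gaps is one possible signal of a change in routine.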
Detection Boundaries of Univariate Changepoints in Gaussian Data
Degree: BSc (Hons) Computer Science / MSc Data Science, LMU Munich
Supervisor: Mirjam Kirchner / Tom Grundy
Changepoint detection deals with the problem of identifying structural changes in sequential data, such as deviations in mean, volatility, or trend. In many applications, these points are of interest as they might be linked to some exogenous cause. In the univariate case, the factors impacting on the detectability of a changepoint are well known: size of the change, location of the change, type of change, number of observations, and noise. However, the interplay of these parameters with the detectability of a changepoint hidden within a data sequence is yet to be studied in detail.
In this project, we investigate the reliability of the likelihood ratio test (LRT) statistic for detecting a single change in a univariate Gaussian process. To this end, we will conduct a simulation study testing different settings of the following parameters: change in mean, change in variance, location of the change, and sample size. In particular, we are interested in finding parameter combinations for which the LRT becomes unreliable. For example, for a fixed variance, sequence length, and changepoint position in the data, we would decrease the change in mean until we find a region in which the LRT scatters around zero. The overall aim is to derive a surface that splits the parameter space of the LRT statistic into a detectable (LRT > 0) and an undetectable (LRT → 0) changepoint region. Ideally, as a next step, an explicit relation between the simulation parameters and the detection boundary would be determined empirically. Further experiments on detection boundaries are possible, such as analysing non-Gaussian data or alternative test statistics.
How the difficulty of multivariate changepoint problems varies with dimension and sparsity
Multivariate changepoint detection aims to identify structural changes in multivariate time series. Increasing attention is being paid to developing methods for identifying multivariate changes, yet little is known about the difficulty of multivariate changepoint problems and how they scale with dimension and sparsity. This work aims to answer the question: ‘If we have a user-defined significant change size, could a change this size actually be detected using changepoint methods?’
In the univariate changepoint setting, it is well understood that there are many factors that affect the detectability of a changepoint, including the size of the change, location of the change, type of change and length of the time series. Moving from a univariate to a multivariate setting adds several layers to the detection problem, including the dimension of the time series and the sparsity of the changepoint.
Recent work at Lancaster University has considered, computationally, the case of a multivariate change in the mean problem where the change size is identical in all dimensions. Under these constraints, a relationship was identified between the size of the change and the number of dimensions that ensures the true and false positive rates remain constant. This project seeks to identify a relationship between the sparsity of a changepoint and the difficulty of detecting it, as well as exploring the problems theoretically to give more justification to the computational findings. If time and interest allow, we will also explore the effect of varying the size of the change in each series on the detectability of the changepoint.
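The kind of simulation study described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it covers only a change in mean with known variance, and the function name `lrt_change_in_mean`, the sequence length, the changepoint position and the grid of change sizes are all hypothetical choices. For a candidate split after the first tau observations, the statistic used is tau(n - tau)/n times the squared difference of the two segment means, divided by the variance, maximised over tau.

```python
import numpy as np

def lrt_change_in_mean(x, sigma=1.0):
    """Likelihood ratio statistic for a single change in mean (known variance).

    Returns (statistic, tau_hat), where tau_hat is the number of observations
    in the first segment at the maximising split.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x)
    taus = np.arange(1, n)                       # candidate split points
    mean_left = csum[:-1] / taus                 # mean of x[:tau]
    mean_right = (csum[-1] - csum[:-1]) / (n - taus)   # mean of x[tau:]
    stats = taus * (n - taus) / n * (mean_left - mean_right) ** 2 / sigma**2
    best = int(np.argmax(stats))
    return float(stats[best]), int(taus[best])

# Hypothetical settings: shrink the change in mean and watch the statistic.
rng = np.random.default_rng(0)
n, tau = 200, 100
for delta in (2.0, 1.0, 0.5, 0.1):
    x = rng.normal(0.0, 1.0, size=n)
    x[tau:] += delta                             # shift the mean after the changepoint
    stat, tau_hat = lrt_change_in_mean(x)
    print(f"delta={delta:4.1f}  LRT={stat:8.2f}  estimated changepoint={tau_hat}")
```

Repeating each setting many times, rather than once as here, would give the distribution of the statistic needed to trace out a detection boundary.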
Evaluating A Response Adaptive Clinical Trial using simulations
Degree: MSci Mathematics, Lancaster University
Supervisor: Holly Jackson
Before a new drug can be distributed to the public, it must first go through rigorous testing to make sure it is safe and effective. This evaluation in humans is undertaken in a series of clinical trials. The approach most often used in clinical trials is the randomised controlled trial (RCT), which assigns all patients with equal probability to each treatment in the trial. The equal allocation of patients to each treatment maximises the power of the study, so RCTs are an efficient way to identify whether there is a significant difference between the treatments. However, RCTs do not allow the probability of assigning a patient to each treatment to change during the trial. If it emerges before the end of the trial that one treatment is clearly more effective than the other, then to maximise the number of patients treated successfully, logic dictates the remaining patients should be allocated to the more effective treatment.
Response adaptive designs use information from previous patients to decide which treatment to assign to the next patient. They vary the arm allocation in order to favour the treatment which is estimated to be best. Multi-Armed Bandits (MAB) are an example of a response-adaptive design. They allocate patients to competing treatments in order to balance learning (identifying the best treatment) and earning (treating as many patients as effectively as possible). One issue with some response adaptive designs is that every patient is expected to produce the same outcome if given the same treatment. However, some patients will have certain characteristics (also known as covariates) which mean they will react to the same treatment differently. For example, an overweight man in his twenties may react differently to a drug than an underweight woman in her eighties.
This internship will focus on a randomised allocation method with nonparametric estimation for a multi-armed bandit problem with covariates. This method uses nonparametric regression techniques (including polynomial regression, splines and random forests) to estimate which treatment is best for the next patient given their particular covariates. The main emphasis of this project is the endpoint. An endpoint could be binary, such as the treatment curing the patient or not; integer-valued, such as the number of epileptic fits in 6 months; continuous, such as a change in blood pressure; or it could be the survival time of a patient.
Heuristic procedures for the resource-constrained project scheduling problem
Degree: MSci Natural Sciences, University of Bath
Supervisor: Matt Bold
The resource-constrained project scheduling problem (RCPSP) is a well-studied problem in operational research. Given a set of precedence-related activities of known duration and resource requirements, and a limited amount of resource, the RCPSP consists of finding a schedule that minimises the time to complete all the activities (known as the project makespan). Solving this problem on a large scale is very difficult. Hence, whilst many exact solution methods exist for solving the RCPSP, these are too slow and therefore largely ineffective at solving this problem on a realistic scale. Therefore, the study and evaluation of fast but inexact procedures (known as heuristic procedures) for solving the RCPSP is critical for real-world application.
Priority-rule heuristics are simple, yet effective, scheduling procedures, consisting of a rule for ordering activities into a so-called activity list representation, and a rule for turning the activity list representation into a complete schedule. This simple class of heuristics forms the basis of many of the most successful heuristic procedures for the RCPSP. This project aims to compare the effectiveness of a number of different procedures from this large subset of heuristics, by testing them on a large database of RCPSP test instances, as well as to investigate possible further improvements and extensions to them.
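The two ingredients of a priority-rule heuristic can be sketched in a few lines. The example below is an illustrative sketch, not one of the procedures studied in the project: it assumes a single renewable resource, integer durations, a shortest-duration priority rule, and a serial schedule generation scheme that places each activity at the earliest feasible time. The function names and the four-activity instance are hypothetical.

```python
def activity_list(durations, preds, priority):
    """Build a precedence-feasible activity list.

    Repeatedly picks, among activities whose predecessors are already listed,
    the one with the smallest priority value (ties broken by name).
    """
    remaining = set(durations)
    order = []
    while remaining:
        eligible = [a for a in remaining if all(p in order for p in preds[a])]
        nxt = min(eligible, key=lambda a: (priority(a), a))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def serial_sgs(order, durations, demands, preds, capacity):
    """Schedule activities in list order at their earliest feasible start.

    Assumes every demand is at most the capacity of the single resource.
    """
    horizon = sum(durations.values())
    usage = [0] * (horizon + 1)          # resource units in use in each period
    start = {}
    for a in order:
        # Earliest start allowed by precedence constraints.
        t = max((start[p] + durations[p] for p in preds[a]), default=0)
        # Delay until the resource is available for the whole duration.
        while any(usage[s] + demands[a] > capacity
                  for s in range(t, t + durations[a])):
            t += 1
        for s in range(t, t + durations[a]):
            usage[s] += demands[a]
        start[a] = t
    makespan = max(start[a] + durations[a] for a in order)
    return start, makespan

# Hypothetical instance: C must follow A, D must follow B, capacity is 3 units.
durations = {"A": 3, "B": 2, "C": 2, "D": 1}
demands = {"A": 2, "B": 1, "C": 2, "D": 1}
preds = {"A": [], "B": [], "C": ["A"], "D": ["B"]}

order = activity_list(durations, preds, priority=lambda a: durations[a])
start, makespan = serial_sgs(order, durations, demands, preds, capacity=3)
print(order, start, makespan)
```

Swapping the `priority` function changes the heuristic without touching the schedule generation step, which is what makes comparing many priority rules on the same test instances straightforward.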
Approximate posterior sampling via stochastic optimisation
Degree: MMath Mathematics, Durham University
Supervisor: Srshti Putcha
We now have access to so much data that many existing statistical methods are not very effective in terms of computation. This has prompted considerable interest amongst the machine learning and statistics communities in developing methods which can scale easily in relation to the size of the data. The “size” of a data set can refer to either the number of observations it has (tall data) or to its dimensions (wide data). This project will focus on a class of methods designed to scale up as the number of available observations increases.
In recent years, there has been a demand for large scale machine learning models based on stochastic optimisation methods. These algorithms are mainly used for their computational efficiency, making it possible to train models even when it is necessary to incorporate a large number of observations. The speed offered by stochastic optimisation can be attributed to the fact that only a subset of examples from the dataset is used at each iteration. The main drawback of this approach is that parameter uncertainty cannot be captured since only a point estimate of the local optimum is produced.
Bayesian inference methods allow us to get a much better understanding of the parameter uncertainty present in the learning process. The Bayesian posterior distribution is generally simulated using statistical algorithms known as Markov chain Monte Carlo (MCMC). Unfortunately, MCMC algorithms often involve calculations over the whole dataset at each iteration, which means that they can be very slow for large datasets. To tackle this issue, a whole host of scalable MCMC algorithms have been developed in the literature. In particular, stochastic gradient MCMC (SGMCMC) methods combine the computational savings offered by stochastic optimisation with posterior sampling, allowing us to capture parameter uncertainty more effectively. This project will focus on implementing and testing the stochastic gradient Langevin dynamics (SGLD) algorithm. SGLD exploits the similarity between Langevin dynamics and stochastic optimisation methods to construct a robust sampler for tall data.
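To make the SGLD update concrete, here is a minimal sketch on a toy problem chosen purely for illustration: sampling the posterior of the mean of Gaussian data with unit variance and a wide Gaussian prior. The model, step size, batch size and function name are assumptions rather than anything specified by the project. Each iteration takes half a step along a minibatch estimate of the gradient of the log posterior and then adds Gaussian noise whose variance equals the step size.

```python
import numpy as np

def sgld_gaussian_mean(data, n_iter=5000, batch_size=32, step=1e-4,
                       prior_sd=10.0, seed=0):
    """SGLD samples for the mean of N(theta, 1) data with a N(0, prior_sd^2) prior."""
    rng = np.random.default_rng(seed)
    N = data.size
    theta = 0.0
    samples = np.empty(n_iter)
    for i in range(n_iter):
        # Subsample the data: only batch_size observations are touched per step.
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_prior = -theta / prior_sd**2
        # Rescale the minibatch gradient so it estimates the full-data gradient.
        grad_lik = (N / batch_size) * np.sum(batch - theta)
        # Langevin step: half a gradient step plus injected Gaussian noise.
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.normal(0.0, np.sqrt(step))
        samples[i] = theta
    return samples

# Hypothetical tall dataset: 1000 observations with true mean 2.
data = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
samples = sgld_gaussian_mean(data)
print("data mean:", data.mean())
print("SGLD mean and sd after burn-in:", samples[1000:].mean(), samples[1000:].std())
```

With a fixed step size the sampler is only approximate: the minibatch gradient noise inflates the spread of the samples somewhat relative to the exact posterior, which is one of the trade-offs a study of SGLD would examine.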
Using Pairwise Comparison in Sports to Rank and Forecast
Degree: MMath Mathematics, Durham University
Supervisor: Harry Spearing
The aim of this project is to develop a ranking system for sports. Defining a ‘good’ ranking depends on the aim. A ranking system that is used to predict future results must provide accurate predictions and could have a complex structure, whereas a system designed to seed players for a tournament needs to be robust to exploitation, fair, and easy to understand. Generally, a system that excels in one of these areas will fail in the other.
It is expected that the focus of this project will be the former, namely, to develop an accurate and robust ranking system. The accuracy of the system can be measured by comparing its predictive performance against existing benchmarks as well as bookmakers’ odds, and robustness can be measured by the ranking’s sensitivity to small changes in match outcomes. A ranking system that is applicable to all sports is, of course, ideal, but sport-specific features will need to be considered to achieve state-of-the-art prediction accuracy, and some general knowledge or interest in sports will be of use. Initially, simulated data will be used to design the ranking system. Then, once a general framework for the model is established, real data from a sport of the student’s choice will be used to test it. The model will then be tweaked to make use of all the available sport-specific features of the data.
Bid Price Controls for Dynamic Pricing in the Airline Industry
Degree: BSc (Hons) Mathematics and Psychology, University of St Andrews
Supervisor: Nicola Rennie
In the airline industry, revenue management systems seek to maximise revenue by forecasting the expected demand for different flights, and optimally determining the prices at which to sell tickets over time. Ideally, rather than setting prices at the start of the booking horizon, they should be updated over time depending on how many people have so far purchased tickets and how much time remains until departure. One such method of dynamically pricing tickets is the use of bid price controls. Bid price controls set threshold values for each leg of a flight network, such that an itinerary (path on the network requested by a potential passenger) is sold only if its fare exceeds the sum of the threshold values along the path (Talluri and van Ryzin, 1998).
Bid price controls require forecasts of demand. If demand is not as expected, for example due to increased sales around the time of major sporting events or carnivals, the result is non-optimal pricing, which leads to a decrease in potential revenue. So far, we have considered the potential gains in revenue when incorrect forecasts are updated under simpler revenue management pricing control mechanisms and found that revenue can be increased by up to 20%. This project will similarly seek to quantify the potential gains in revenue from updating the bid prices when unexpected demand is detected.
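The acceptance rule behind bid price controls is simple enough to state in code. The sketch below only illustrates that rule: the function name, the two-leg network and the bid price values are made up, and it says nothing about how the bid prices themselves would be computed or updated.

```python
def accept_request(fare, legs, bid_prices):
    """Sell the itinerary only if its fare exceeds the sum of the bid prices
    (threshold values) of the legs it uses."""
    threshold = sum(bid_prices[leg] for leg in legs)
    return fare > threshold

# Hypothetical two-leg network with one bid price per leg.
bid_prices = {("LHR", "AMS"): 80.0, ("AMS", "JFK"): 250.0}
itinerary = [("LHR", "AMS"), ("AMS", "JFK")]

print(accept_request(350.0, itinerary, bid_prices))  # fare above the 330.0 threshold
print(accept_request(300.0, itinerary, bid_prices))  # fare below the 330.0 threshold
```

Updating the bid prices when unexpected demand is detected amounts to replacing the values in `bid_prices` partway through the booking horizon, which changes which requests pass this check.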
Here you can find details of the summer 2018 interns including a description of their research project.
Estimation of Diffusivity in the Ocean
Degree: Lancaster University, BSc Mathematics
Supervisor: Sarah Oscroft
Diffusivity plays an important role in many real world problems, such as recovering missing objects lost at sea or predicting how an oil spill will spread. Specifically, it measures the rate at which particles spread out over time, for instance organisms or sediments transported through water. We can estimate diffusivity using satellite-tracked drifting instruments known as drifters. However, the ocean is highly unpredictable – two particles that start at the same location at the same time can end up following completely different paths to very different locations. This requires a statistical approach for the estimation of diffusivity.
Current techniques for estimating diffusivity provide inconsistent results, so we aim to improve these techniques through statistical research. My project compares some of these methods and uses them to estimate diffusivity for a part of the ocean using real data collected by the Global Drifter Program. This project applies time series techniques, with a particular focus on spectral analysis. I have used MATLAB to compare different estimators using both simulated and real data before plotting my results.
View Eleanor's Poster - Eleanor.
Investigating models for potential self-excitation
Degree: Lancaster University, BSc Mathematics
Supervisor: Zak Varty
This project explores models for data in which the points occur randomly in space and time. The aim with this type of data is to model the locations of data points or events in addition to any information or marks associated with each occurrence. This can be achieved through point process models. The simplest example of this is the homogeneous Poisson process. In the homogeneous Poisson process model, events occur independently at random with a uniform intensity.
The first aim of the project is to look at methods for assessing whether the assumptions of the homogeneous Poisson process model are valid for a given data set, and to fit the model where the assumptions are satisfied. The next aim is to study complex data sets where these assumptions no longer hold, then to use different models which have fewer or weaker assumptions, subsequently assessing any improvements in the model fit.
During the project there is a choice of two data sets. The first is about armed conflicts across the globe. The second is about earthquakes above magnitude 1.5 in the Netherlands, for which the events are induced by gas extraction from the reservoir below the region.
Clustering On Web-Scraped Data
Degree: London School of Economics, BSc Statistics with Finance
Supervisor: Hankui Peng
The Office for National Statistics (ONS) are currently experimenting with new data sources to improve the representativeness of the Consumer Price Index (CPI), which is the official indicator for the inflation and deflation rates for the country. Web-scraped data is considered a promising data source that comes in huge volume and can be scraped easily and at high frequency. Therefore, if we could incorporate web-scraped data into the index generating procedure, then price indices could be generated more effectively and at higher frequency.
However, web-scraped data do not always come in a way that can be immediately used for price index generation. The category labels for web-scraped prices usually follow the website categorisation that the data are scraped from, which does not necessarily match the categorisation that is used for the national price index generation. Also, some product information (product name, price, etc.) might be incorrectly scraped, due to the quality of the web-scrapers.
Clustering methods are a useful tool for tackling the aforementioned challenges that come with web-scraped data. The problems that we are interested in include both recognising the main clusters of products, given the web-scraped data, and identifying the incorrectly scraped products. In this project, we will start by exploring the fundamental clustering methods that exist in the literature (k-means and spectral clustering methods, in particular). At a further stage, we will apply these techniques to a web-scraped dataset. Clustering performance evaluation shall be carried out to compare the existing methods, and further extensions to the existing techniques shall be explored.
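As a reference point for the first of the methods named above, here is a short sketch of k-means (Lloyd's algorithm) written with NumPy. It is a generic illustration, not the project's implementation: the function name, the random initialisation from data points and the tiny two-dimensional dataset are assumptions, and real web-scraped product data would first need to be turned into numeric features.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialise centroids at k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical feature vectors forming two well-separated groups.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Points that sit far from every centroid after fitting are natural candidates for the incorrectly scraped products mentioned above, although a threshold for "far" would have to be chosen for the data at hand.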
Investigating Trend in the Locally Stationary Wavelet Model
Degree: The University of Cambridge, BA Natural Sciences
Supervisor: Euan Mcgonigle
Outside of neat theoretical settings, time series are most commonly non-stationary. In fields from finance to biomedical statistics, time series rarely occur which have constant mean and/or autocovariance.
Wavelets are a class of oscillatory functions which are well localised in both time and frequency, allowing wavelet-based transforms to capture information in a time series by examining it over a range of time scales. One prominent method for doing so with non-stationary time series is the locally stationary wavelet (LSW) model of Nason et al. (2000). Time series in the LSW model are assumed to be zero-mean. In practice this is rarely the case. Our aim is to explore the behaviour of the model when this assumption is weakened by investigating the effect of different trends on the LSW estimate of the wavelet spectrum.
We also plan to examine the treatment of boundary effects that appear in the wavelet coefficients of data near the end points of the time series. The time series are usually assumed to be periodic; however, this too is a poor assumption in most non-zero mean cases. Our project will attempt to analyse the boundary effects caused by a trend and implement methods to reduce them.
View Cyrus' Poster - Cyrus.
Detecting Changes through Transformations
Degree: Newcastle University, BSc Mathematics and Statistics
Supervisor: Sean Ryan
Changepoint detection relates to the problem of locating abrupt changes in data when the properties of a given time series have changed. This can be extended into finding whether or not a changepoint has actually occurred and if there are multiple changepoints. This area of statistics is hugely important and has many real world applications such as medical condition monitoring and financial fluctuation detection.
The most studied method for detecting changepoints looks at changes in mean within a time series. This is a popular approach due to the fact that changes like these can be detected by transforming the data and then analysing changes in the mean of the transformed data. Other methods which may prove more accurate at detecting changepoints include looking at changes in variance.
My project aims to analyse various methods of identifying changepoints, whilst studying the advantages and limitations of each approach. This involves the construction and evaluation of numerous algorithms which are used to detect changepoints.
Optimisation Problems with Fixed Charges Associated with Subsets
Degree: Lancaster University, MSci Natural Sciences
Supervisor: Georgia Souli
Optimisation problems appear in a wide range of applications from investment banking to manufacturing. They involve finding the values of a number of decision variables (for example, the amount of different products that should be manufactured) to maximise (or minimise) a particular objective function (for example, profit), subject to a number of constraints. In many situations, the value of one or more of the decision variables must be an integer to give a feasible solution. These are called Mixed Integer Programs (MIPs).
The particular focus of my project is cutting planes. These are inequalities which are satisfied by all the feasible solutions to the MIP but not by all of the solutions that would be feasible if we ignored the integer constraints. The aim is to investigate different cutting planes in problems where we have fixed charges associated with subsets. In these problems, we have a set of continuous variables whose sum is bounded. We also have subsets of variables defined such that, if any variable in that subset takes a positive value, then a fixed charge is incurred. For example, the variables may represent the amounts of various items to be manufactured and the fixed charges would be start-up costs associated with machines involved in the production of subsets of these items. Cutting planes can be used to remove infeasible solutions to the MIP to focus in on the feasible region and hence the optimal solution to the problem.
Modelling the behaviour of Kepler light curve data with the aim of exoplanet detection
Degree: Warwick, BSc Mathematics
Supervisor: Alexander Fisch
Many exoplanets are detected via the so-called transit method. This involves measuring the luminosity of a certain star at regular time intervals to obtain graphs known as light curves. A regular short sharp dip in luminosity could be caused by an exoplanet passing in front of the star. This sounds simple in theory, but in reality there is lots of random noise, and the signal induced by planetary transits is very weak (even a planet the size of Jupiter reduces the luminosity of the sun by only 1% during a transit).
In order to remove some noise caused by phenomena such as sun spots, NASA preprocesses their data to produce a so-called whitened light curve. However, their current method introduces complications and affects the signature of the transits, which makes the detection of the planets from the whitened data much harder.
My project will be focused on modelling the data in such a way as to not distort the transit signals. So far I have been using R to remove dominant sine waves from the data and will go on to investigate periodicity and autocorrelation within the data.
View James' Poster - James M.
Allocation of limited number of assets
Degree: Lancaster University, MSci Mathematics and Statistics
Supervisor: Stephen Ford
Having just completed my third year at Lancaster University and considering doing a PhD, I found the STOR-i internship a great way to gain an insight into PhD life. The project I have been assigned concerns assigning limited assets to a dynamical system. The problem that arises is that if we choose to deploy an asset in the present, it can’t be used later, when it may be more rewarding. We wish to deploy the assets so that the reward gained is optimal. To do this, we use dynamic programming, which starts from the end and works backwards to the start, optimising in stages. This doesn’t always yield an optimal solution, but under certain properties of the system it will. The task at hand is finding the optimal policy, where a policy is a mathematical rule for deciding what should be done in the present given the current state of the system.
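The backward-working idea can be illustrated on a toy version of the problem. The model below is an assumption made for the sake of the example, not the system studied in the project: in each of T periods a reward drawn from a known discrete distribution is observed, and we either deploy one of our remaining assets to collect it or hold the asset for later. All names are hypothetical.

```python
def solve_deployment(T, K, rewards, probs):
    """Backward induction for a toy asset deployment problem.

    V[t][k] is the expected total reward from period t onwards with k assets
    left, before the reward on offer in period t has been observed.
    thresholds[t][k] is the smallest reward worth deploying an asset for.
    """
    V = [[0.0] * (K + 1) for _ in range(T + 1)]      # V[T][k] = 0: nothing left to gain
    thresholds = [[0.0] * (K + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):                   # work backwards from the end
        for k in range(1, K + 1):
            # Deploying is worthwhile iff the reward is at least the value
            # lost by having one fewer asset in the future.
            thresholds[t][k] = V[t + 1][k] - V[t + 1][k - 1]
            V[t][k] = sum(
                p * max(r + V[t + 1][k - 1], V[t + 1][k])   # deploy vs. hold
                for r, p in zip(rewards, probs)
            )
    return V, thresholds

# Hypothetical instance: two periods, one asset, reward 1 or 3 with equal probability.
V, thresholds = solve_deployment(T=2, K=1, rewards=[1.0, 3.0], probs=[0.5, 0.5])
print(V[0][1], thresholds[0][1])
```

In this instance the policy in the first period is to hold the asset unless the larger reward appears, because keeping it for the final period is worth 2 on average.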
Heuristics for Real-time Railway Rescheduling
Degree: University of Bath, MMath Mathematics with Industrial Placement
Supervisor: Edwin Reynolds
In railway networks, a single delayed train can delay other trains by getting in their way. This is called reactionary delay and is responsible for over half of all railway delays in the UK. Railway controllers therefore have to make decisions in real-time that minimise the amount of reactionary delay. Such decisions include ‘should I cancel a train, and if so which one?’ and ‘which train should leave the station first if they can only go one at a time?’ There currently exist algorithms that can find the optimal solution to these problems. However, the amount of computational time required to run the algorithms, especially on a large network, makes solving these problems in real-time infeasible. An alternative approach is to use a heuristic, which solves the problem with a lower degree of accuracy but produces an answer in much less time. My project involves developing multiple heuristics, comparing their advantages and limitations and deciding on a final idea.
Preventing overfitting in Natural Language Processing
Degree: The University of Manchester, BSc Mathematics
Supervisor: Henry Moss
Natural Language Processing (NLP) allows computers to understand human speech and writing. The standard approach in NLP is to fit the model in a way that avoids relying on features over-represented in the sample (a problem known as overfitting). There are two methods for doing this: regularization and term-frequency weighting. There is no clear consensus on which method is best. The project’s aim is to investigate the relationship between these two approaches, alongside tests across a range of NLP tasks.
Here you can find details of the summer 2017 interns including a description of their research project.
Modelling Risk in Hazardous Material Transport
Degree: Lancaster University, BSc Mathematics
Supervisor: Chrissy Wright
The risk of an accident is an important factor to consider when transporting hazardous materials.
Because accidents can be deadly the route with the least overall risk should be chosen.
This project looks at how best to model the risk to enable safer routes to be taken.
Investigating bias in return level estimates due to the use of a stopping rule
Degree: Lancaster University, BSc Mathematics
Supervisor: Anna Barlow
There are many situations in which rare and extremely large (or small) events are of interest. For example, the focus of my project is the statistical modelling of extreme flood events. Extreme Value Theory is concerned with the modelling of the tails of the distribution and provides a theoretically sound framework for the study of extreme values. In particular, the Generalised Extreme Value distribution is used to model the maxima of a process within blocks of time (often a year). Usually, we are mostly interested in estimating the x-year return levels of a distribution, that is, the value we'd expect to be exceeded on average once every x years. However, the point at which we decide to stop sampling and analyse the data is not arbitrary and this choice of stopping point can result in biased return level estimates. After the December 2015 floods there was much interest in re-evaluating the return level estimates, as the inclusion of such a large event often led to significant changes in the value of these estimates. In this project, we will consider possible ‘stopping criteria’ (i.e. rules that tell us when to stop sampling data and do our analysis) to approximate the procedures used in reality and investigate the bias in the standard estimates. We will implement a variety of new estimators developed with the intention to improve upon the existing standard methods.
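For reference, the return level mentioned above has a closed form under the Generalised Extreme Value distribution. The notation here is an assumption, since the description does not introduce symbols: location $\mu$, scale $\sigma > 0$ and shape $\xi$, with annual block maxima so that the $x$-year return level $z_x$ is the value exceeded by the annual maximum with probability $1/x$.

```latex
% GEV distribution function for block maxima (defined where 1 + \xi (z - \mu)/\sigma > 0)
G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\}

% x-year return level: the solution of G(z_x) = 1 - 1/x
z_x =
\begin{cases}
  \mu - \dfrac{\sigma}{\xi} \left[ 1 - \left\{ -\log\left( 1 - \tfrac{1}{x} \right) \right\}^{-\xi} \right], & \xi \neq 0, \\[2ex]
  \mu - \sigma \log\left\{ -\log\left( 1 - \tfrac{1}{x} \right) \right\}, & \xi = 0.
\end{cases}
```

A standard estimate of $z_x$ plugs fitted values of $\mu$, $\sigma$ and $\xi$ into this expression, so any bias that a stopping rule introduces into the parameter estimates carries through to the return level.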
Time series Classification
Degree: Lancaster University, MSci Mathematics and Statistics
Supervisor: Harjit Hullait
The internship project will be focused on Time series classification, an area that has applications in various fields. The idea is to build a classifier, which is able to label a time series from a defined list of possibilities.
For example, if we have heart rate time series for people walking and people running, we have two labels: runner or walker. There are two main challenges in classification: first, a set of labels needs to be chosen, and second, a classifier needs to be built that can label the time series.
Analysis of Armed Conflict Data
Degree: Lancaster University, MSc Mathematics
Supervisor: Christian Rohrbeck
The Armed Conflict Location & Event Data Project (ACLED) has aggregated the exact location, date, and other characteristics of several violent events in unstable and warring states. The analysis of this data is challenging due to the vast number of factors influencing such events. Koren and Bagozzi (2017, Journal of Peace Research, 54(3)) find, for instance, that, in times of war, violence against civilians occurs more frequently in areas with a high percentage of cropland. This result is derived based on a zero-inflated model which accounts for armed conflicts not being present in all areas at all times. The proposed project considers the publicly available data and aims to slightly extend the model by Koren and Bagozzi (2017), for instance, by accounting for the spatial aspect of the data. In particular, the project can be split over three steps: (i) exploratory analysis of the data, (ii) estimation of a model similar to the one by Koren and Bagozzi (2017) and (iii) extending the model.
Assessing the Use of Spatial Models for Extremes
Degree: Lancaster University, MSc Mathematics
Supervisor: Rob Shooter
Being able to model spatial extremal behaviour (in particular spatial dependence) is an important area of Extreme Value Theory and this project will aim to give an introduction into the various methods of trying to capture this behaviour. The first part of this project will provide a short introduction to univariate extreme value theory and also will look at some methods of spatial statistics - in particular looking at Gaussian Processes, which will be simulated and have interpolation methods performed on them. The second part will introduce the Smith process (a particular type of max-stable process) and will compare this to using Gaussian Process techniques on data, with the aim of comparing how well the two types of spatial model are able to describe the nature of the data.
Sequential Changepoint Detection: Anticipating the next Financial Crash
Degree: Durham University, MMath Mathematics
Supervisor: Sam Tickle
Changepoint detection underpins virtually all questions of interest surrounding data analysis in a variety of contexts. Understanding the nature of a change, and when it occurred, is often of vital importance in preventing problems surfacing in the future. With the advent of Big Data, more sophisticated tools are increasingly required to search for changes on datasets of ever-growing size. Most existing methods for changepoint detection are offline, requiring the collection of an entire dataset prior to analysis, and interest in online techniques, where informed statements regarding changes of the recent past can be made in tandem with data collection, is growing.
This project will examine various existing methodologies which employ an online approach to changepoint detection, both Bayesian and frequentist, and attempt to apply these ideas to real-time datasets (for example, share price data for various FTSE100 companies) in order to find the best performing algorithms which can operate most efficiently in the greatest number of contexts. Depending on specific interests, this can involve exploring prior selection, investigating various ‘control charts’ or using likelihood-based approaches, among other options. There is also potential scope for helping to pioneer entirely new techniques which can then be tested against some of the existing methods.
Combination therapies: improving outcomes via the probability of success
Degree: University of York, MMath Mathematics
Combination therapies are able to hit the many mechanisms of diseases/cancers simultaneously by combining existing drugs and new molecular entities. When developing a combination therapy, the aim is to produce a synergistic effect while reducing side effects. However, drug development is a long and expensive process which is subject to a considerable amount of uncertainty. Therefore it is important that the decisions made are well informed and are expected to be the most beneficial to both the pharmaceutical company and the patient population.
Methods for decision making often require several parameters relating to a drug. We are interested in the estimation of the probability of study success for combination therapies. Current methods do not allow information to be shared across similar combinations. We believe that incorporating this information in a Bayesian setting will improve the accuracy of our estimates. This will lead to better decision making and improve the outcomes of the development programmes.
Simulation Optimisation Techniques for Time-Dependent Staffing Problems
Degree: Lancaster University, BSc Physics
Supervisor: Luke Rhodes-Leader
In many real-world problems, such as complex queueing problems, mathematical models of the system can be too complex to solve analytically. An alternative way to study stochastic systems is to use a simulation to produce realisations of the system. Simulation can be used to optimise a system by testing alternative settings. The choice of optimisation technique depends heavily on the properties of the problem, such as the size of the solution space, how many objectives there are and whether the decision variables are discrete or continuous. Due to the stochastic nature of the problems, the optimisation is further complicated as the objective must be estimated, rather than evaluated exactly. This project will focus on finding simulation optimisation techniques appropriate for the optimal staffing problem for a time-dependent queueing system, such as that of an emergency call centre.
Executing Offshore Maintenance Activities
Degree: Lancaster University, MPhys Physics
Supervisor: Toby Kingsman
At the start of the internship, time will need to be spent learning about the general offshore maintenance problem and the literature associated with it. This could include simpler sub-problems such as the travelling salesman problem, the travelling repairman problem or the scheduling of tasks. Depending on the student’s knowledge of linear programming and coding, time could be spent trying to implement one of these models on the computer.
The goal of the project is likely to be creating some simple construction heuristics to solve the offshore maintenance routing and scheduling problem. These could be extended to more general problems depending on the student’s interest, e.g. several vessels or tasks completed in stages. The performance and results of these heuristics could be compared across several instances.
Scheduling using Optimisation
Degree: University of Southampton, BSc Mathematics and Statistics
Supervisor: David Torres Sanchez
The project will focus on one of the main optimisation scheduling problems: project planning, which refers to the programming of the different activities that need completion for a given project. It is also heavily conditioned by the specifications on the resources and activities, making the problem really interesting for mathematicians. In this project we will be focussing on understanding the so-called resource-constrained project scheduling problem (RCPSP). The generality of the RCPSP allows it to have a wide range of applications where the aim is to schedule some activities or jobs over a period of time such that precedence and resource constraints are satisfied, and a certain objective function is optimised. Depending on the student’s knowledge of linear programming and optimisation, we can study the varied formulations or, if the student is already familiar with them, we can jump straight into the pre-emptive case for long-term planning horizons. Either of these ties in with testing in Python using Gurobi, which will be learnt if needed.
Associated Interns 2017
Management Science Intern
Waseem joins the STOR-i Internships from the Lancaster University Management Science department.
Here you can find details of the summer 2016 interns including a description of their research project.
Regression with Dependencies and Non-Gaussian Noise
Degree: University of Cambridge, BA Mathematics
Supervisor: Stephen Page
The linear model is a widely used tool in regression analysis. Linear regression models are most commonly fitted using the least-squares approach, which is both conceptually and computationally simple. A frequently made assumption in linear least squares regression is that the error terms between the observed responses and the corresponding expected values are independent and identically distributed normal random variables. This assumption greatly simplifies the matter of obtaining confidence intervals for the unknown parameters of our model. However, whether this is a sound assumption depends on the size and nature of the particular dataset under consideration. This project will investigate the case when the assumption is not satisfied. Various techniques for obtaining confidence sets will be examined and compared to the sets obtained via normal approximation. The effects of different possible violations of the Gaussian assumption on the constructed confidence sets will be investigated.
Input Uncertainty in Simulation Models
Degree: University of Birmingham, MSci Mathematics
Supervisor: Lucy Morgan
Simulation uses mathematical modelling in order to mimic real-world systems which cannot be tested in reality, perhaps due to time, cost or safety constraints. The information gained by running the simulation can then be used to make decisions about the real-world system. For example, retailers want to ensure they have enough servers to prevent customers from having to queue for long periods of time. A simulation model can be used to understand how the queue behaves and make a decision about how many servers are needed for each shift in order to keep the queue length below a certain level. The inputs in simulation models are usually approximated by observing real-world data; for example, observing the number of customers that are served in a shop over a period of time. Input uncertainty arises from the fact that we only have a finite amount of real-world data, and therefore cannot be certain that the values of the input parameters that are being used to drive the simulation are the true values of the input parameters. This project aims to quantify the input uncertainty in a queueing simulation model.
How good is the Lancaster University Mathematics Department? – An investigation using Data Envelopment Analysis
Degree: BSc Mathematics, Heriot-Watt University
Supervisor: Emma Stubington
Each year university league tables are released, but many are based on different criteria and give slightly different results. We are interested in testing the efficiency and productivity of mathematics departments across the country. Because we are considering multiple inputs and outputs (student satisfaction, entry requirements, academic and career attainment, the cost of university, etc.), it is difficult to make direct comparisons between institutions. We therefore need to use a management science method, Data Envelopment Analysis (DEA), which can cope with lots of constraints. What I am finding particularly interesting are the additional questions that arise from examining the data and implementing this approach, for example: Should universities that produce high numbers of good degrees be considered the best? Are some students not reaching their potential and being let down by their institution, given they entered university with extremely high entry requirements? Are some universities awarding an unrepresentative number of good degrees considering their place in current league tables, or is the data just extremely biased, with a small sample size? Should all universities be charging the same fees, given that the career opportunities afterwards are significantly fewer at some? Is university location skewing the career prospects of students, whilst not taking into consideration the living costs and average salary of non-graduates in some locations? As my project advances, I have realised that what seemed like a simple linear programming problem evolves into a complex social and economic issue, which questions the real cost to students when choosing which university is best for them.
Detecting Match-Fixing in Tennis
Supervisor: Oliver Hatfield
In January 2016, tennis was hit by allegations of widespread match-fixing prompted by the release of secret documents from reviews into tennis’ integrity. The documents detailed widespread accusations of corruption within the sport. The aim of the project is to create simulations of tennis matches and explore sudden changes in performance, which could be linked to match-fixing, using simple changepoint methods. Features such as dependence and the importance of critical points will also be taken into account to create accurate simulations. In addition, the current rating system within tennis only takes into consideration the previous year's results and takes no account of the strength of opponents. A further aim of the project is to create a rating system based around the Elo system with improvements.
Detecting Unwanted Variation in Time Series
Supervisor: Aaron Lowther
A statistical outlier in a set of data is defined to be “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. In the context of time series, examples of outliers may include the number of complaints received by BT after a power outage, or the increase in supermarket sales during the days leading up to Christmas. It is important that we are able to detect these outliers as they may have a significant impact on the model selected to fit the data, the parameter estimates for the model, and consequently, on any forecasts made from the model. This project will look into methods and algorithms that are able to automatically detect outliers in time series.
Assessing Dependence in Extreme Values
Supervisor: Emma Simpson
Extreme value theory models the maxima (minima) of random variables. By their very nature, these extremes occur infrequently and so are hard to model. A robust framework already exists, with block-maxima and threshold-based approaches providing parametric distributions for the maxima. Known as the Generalised Extreme Value (GEV) and Generalised Pareto (GP) distributions, these allow us to estimate the maximum value that we would expect to see over n years. My project looks into the bivariate case, where our variables have extremes that either occur simultaneously (Asymptotic Dependence) or independently (Asymptotic Independence). Several statistical measures of this behaviour already exist; however, it is hard to obtain reliable estimates of their values. I am looking at developing an alternative method to simultaneously estimate two of these measures, with the hope of finding some synergies.
Improving question selection in education software
Degree: University of Sheffield, BSc Mathematics
Supervisor: Ciara Pike-Burke
With the advance of technology in education, it is becoming more possible to personalise education software, providing students with questions tailored to their individual learning styles and abilities. The data gathered from the students' previous interactions with the education software can be used to simulate students' responses to future data. This enables us to model student performance. The main aim will be to investigate whether Bayesian methods can provide a more accurate prediction of student performance than frequentist methods. The Bayesian approach will look into Markov chain Monte Carlo methods, including random walk Metropolis. The models will be used to predict whether students would pass an exam of particular questions.
Assigning Drones in Military Search
Degree: University of Edinburgh, MMath
Supervisor: James Grant
Drone technology is fast becoming a vital component of military operations. Unmanned Aerial Vehicles (UAVs), as they are known within the military, can perform a variety of tasks remotely, making things both more efficient and safer for military personnel. This project revolves around optimising the UAV Search Problem by maximising the number of events detected within a given border by a fleet of UAVs equipped with cameras. The UAVs aim to detect the locations of events of some sort occurring on the border (one example may be crossings of the border). Each UAV is to be assigned a specific subsection of the border to patrol, with the assumption being that the larger its subsection is, the less likely it will be to actually detect an event. Some UAVs may be naturally better at detecting events than others (because of better cameras etc.) and some UAVs may be better equipped to detect events in certain parts of the border (e.g. different types of terrain).
Univariate methods for time series forecasting
Degree: University of Cambridge, BA Mathematical Tripos
Supervisor: Daniel Waller
Time series are often grouped in a hierarchical structure. For example, the time series for the total number of tourists visiting a country may be split into more time series according to the purpose of travel, and each of these time series may, in turn, be split into more time series according to the length of stay, thus creating a tree-like hierarchical structure. The issue of forecasting hierarchical series in a way that allows for a similar hierarchical disaggregation of the forecasts is very important. This project will combine two methods that have recently been proposed, optimal combination and temporal aggregation. It will then test the accuracy of this new method against that of optimal combination and other standard techniques such as bottom-up and top-down forecasting.
Detecting Changes in Multivariate Time Series
Degree: University of Edinburgh, Mathematics BSc (Hons)
Supervisor: Rebecca Wilson
Changepoint detection of univariate time series has been widely covered but the increasing availability of multivariate data has motivated the study of multivariate detection methods. Time series data of a multivariate flavour can be found in finance, health monitoring, signal processing, bioinformatics, and detecting credit card fraud. In my project, I explore a few methods to detect change points of multivariate time series data. I also discuss the drawbacks of these methods and suggest ways in which these drawbacks could be overcome.
Here you can find details of the summer 2015 interns including a description of their research project.
Statistical inference for evolving network structure
Degree: University of Cambridge, BA Mathematics
Supervisor: Matthew Ludkin
Networks are prominent in today’s world. The volume of telecommunications and social network data has exploded in the last two decades. Gaining a statistical understanding of the processes generating and maintaining network structure can be used to make confident statements about properties of a network, detect anomalous behaviour or target adverts. In recent years more data has been collected alongside the network. Can such covariate information improve inference for network structure compared to network data alone? Many have attempted to model how networks grow; however, most models have poor statistical properties. This project will investigate approaches for combining statistical methodology from static modelling techniques with methods for analysing data indexed through time.
Modelling extra-tropical cyclones using extreme value methods
Degree: Lancaster University, MSci Mathematics
Supervisor: Paul Sharkey
The prevalence of extra-tropical cyclones in the mid-latitudes is a dominant feature of the weather landscape affecting the United Kingdom. The UK has come to expect a consistent annual pattern of temperate summers and mild winters. However, in recent years it has been a focus of extreme weather events, for example, major floods and damaging windstorms. Accurate modelling and forecasting of extreme weather events are essential to protect human life, minimise potential damage and economic losses, and to aid the design of appropriate defence mechanisms. In this context, an extreme event is one that is very rare, with the consequence that datasets of extreme observations are small. The statistical field of extreme value theory is focused on modelling such rare events, with the ideology of extrapolating physical processes from the observed data to unobserved levels. This project will focus on applying extreme value methods to remote sites in the North Atlantic and European domain.
A Linguistically-Motivated Changepoint Problem from a Bayesian Perspective
Degree: University of Glasgow, MSci Mathematics
Supervisor: Sean Malory
Sequences arise naturally in linguistics, with the number of occurrences of a linguistically salient feature changing over time as language attitudes evolve. One such feature is the use of flat adverbs. For instance, in the phrase “fresh ground coffee” the word “fresh” is a flat adverb, since it functions as an adverb but lacks the typical suffix “ly”. While not as widespread nowadays, flat adverbs were commonly used during 1700-1900. Authors of this period used flat adverb forms and were publicly criticised for doing so. This project will introduce a Bayesian statistical framework to investigate whether the rate of flat adverb use changed significantly after an author's writing had been subjected to such criticism. This will focus on the detection of changes in a sequence of data points using a Bayesian approach; specifically, we will be interested in quantifying (in a precise way) whether or not a change in the sequence has occurred at some point.
Density-based cluster analysis
Degree: University of Reading, MMath Mathematics
Supervisor: Katie Yates
Cluster analysis is the process of partitioning a set of data vectors into disjoint groups (clusters) such that elements within the same cluster are more similar to each other than elements in different clusters. Clustering has a wide range of application areas including Biology, Physics, Computer Science, Social Science and Market Research. There are three main categories of algorithms which can be applied in order to find solutions to data clustering problems: hierarchical, partitioning and density-based. The main focus of this project is to explore density-based clustering methods and to compare the performance of these algorithms via simulation studies.
Classification in streaming environments
Degree: Newcastle University, MMath Mathematics
Supervisor: Andrew Wright
The aim of a classification model is to predict the class label of a new observation using only historical observations. Traditional classification approaches assume this historical dataset is a fixed size and is drawn from some fixed probability distribution(s). However, in recent years a new paradigm of data stream classification has emerged. In this setting, the observations arrive in rapid succession, classifiers must be capable of being trained sequentially, and the underlying probability distribution may change over time. These classifiers have applications in areas as diverse as spam email filtering, analysing the sentiment of tweets and high-frequency finance. This project will investigate how models can be used to produce streaming versions of classifiers.
Auto-Correlation Estimates of Locally Stationary Time Series
Degree: London School of Economics, BSc Mathematics and Economics
Supervisor: Jamie-Leigh Chapman
A time series is a sequence of data points measured at equally spaced time intervals. Examples of time series include FTSE 100 Daily Returns and the total annual rainfall in London, UK. Often we assume that such series are second-order stationary. In other words, the statistical properties of the time series, e.g. the autocorrelation, remain constant over time. However, the reality is that many time series are not second-order stationary and therefore it is not appropriate to model them using such methods. Instead, we must consider time-varying equivalents of the autocorrelation or autocovariance. One method that analysts use to adapt the regular autocorrelation function to be a time-varying quantity is to apply it over rolling windows of the data. Unfortunately, this can give quite different answers depending on the choice of window length and the location of the window within the time series sample. This project will explore alternative methods of estimating a time-varying autocorrelation function in order to overcome these problems.
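The rolling-window approach mentioned above can be illustrated with a minimal Python sketch. The function names `autocorr` and `rolling_autocorr`, the toy series and the window length are all assumptions made for illustration; this is the standard rolling-window estimate, not the alternative methods the project set out to explore.

```python
def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0  # a constant window carries no information about correlation
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def rolling_autocorr(x, window, lag=1):
    """Autocorrelation recomputed on each window of consecutive observations."""
    return [autocorr(x[i:i + window], lag) for i in range(len(x) - window + 1)]


# Toy series (an assumption): an alternating segment followed by a steady trend.
series = [1, -1] * 10 + list(range(20))
estimates = rolling_autocorr(series, window=10)
print(round(estimates[0], 3), round(estimates[-1], 3))
```

Changing `window` changes the estimates, which is the sensitivity to segment length that the paragraph above describes.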
Regression, curve fitting and optimisation algorithms
Degree: University of Cambridge, BSc Mathematics Tripos
Supervisor: Elena Zanini
The underlying strategy for most statistical modelling is to find parameter values that best describe the fit of the model to the data. This requires optimising an objective function, typically by minimising the difference between the model and the observations. When analytical solutions to the optimisations are unavailable, statisticians often rely on numerical optimisation routines to perform this fit, trusting that this will produce stable estimates of the parameters. Firstly, some issues may arise in the choice of the best algorithm given the characteristics of the problem at hand. Secondly, the algorithm considered may not actually perform well and needs to be understood and adapted to work better on the model considered. This project will investigate different numerical optimisation algorithms used in statistical inference and curve fitting, and how to overcome some of the problems associated with these types of algorithms.
Degree: Lancaster University, MSci Mathematics
Supervisor: Helen Barnett
In medical research, in both pre-clinical and clinical trials, the objective is to learn about the behaviour and effect of potential new drugs in the body. This breaks down into two categories: how the drug affects the body (pharmacodynamics) and how the body affects the drug (pharmacokinetics). This application-driven project focuses on pharmacokinetic modelling, which involves modelling the concentration of a compound in the blood over time. The aim of the project is to apply statistical modelling techniques to real data in order to obtain an understanding of the role of pharmacokinetics in the drug development process.
Here you can find details of the summer 2014 interns including a description of their research project.
Anna Maria Barlow
Spatiotemporal Modelling of Economic Data using Disease Mapping
Degree: University of Durham, MMath Mathematics
Supervisor: Christian Rohrbeck
The field of statistics focusing on models incorporating spatial information is called Spatial Statistics. Spatial statistics generally distinguishes between three types of data: geostatistical data, lattice data and spatial point patterns. This project will focus on lattice data, where the number of sites at which observations are recorded is finite, for example, the population in each county of the UK or the results of the last general election per district. Spatial statistical methods for lattice data are often applied in epidemiology to model the occurrence of a disease in a region depending on covariates. This is known as Disease Mapping, with models aiming to predict the occurrence rate or the number of cases of a particular disease. This project will investigate the basic methods used in Disease Mapping and apply them to economic data.
Relocation Operations in One-Way Car-Sharing Problems
Degree: University of Glasgow, MSci Mathematics
Supervisor: Burak Boyaci
Car-sharing is a new concept that enables the general public to access a fleet of vehicles for short rental periods. These systems have several benefits including environmental, energy and societal considerations. Car-sharing systems are of two general types: the restrictive “two-way” system, where users pick up and drop off the vehicle at the same location, and the more flexible “one-way” system, which enables users to choose a drop-off location different from the pick-up station. For the customer, the one-way system is generally preferred; however, one of the difficulties in implementing a one-way system is managing the relocation of vehicles and personnel. This project will develop and implement models for improving relocation operations for the one-way car-sharing problem.
Modelling ocean environments with extreme value theory
Degree: University of Durham, MMath Mathematics
Supervisor: Monika Kereszturi
Offshore structures such as oil platforms and vessels must be designed to have very low probabilities of failure due to extreme weather conditions. Inadequate design can lead to structural damage, lost revenue, danger to operating staff and environmental pollution. Design codes demand that all offshore structures exceed specific levels of reliability, most commonly expressed in terms of an annual probability of failure or return period. Hence, interest lies in environmental phenomena that occur extremely rarely, and we want to estimate the rate and size of future occurrences. The aim of this project is to gain a deep understanding of extreme value theory and its application to ocean environments.
Analysis of Algorithms for yield optimization and batch scheduling problems
Degree: University of Birmingham, MSci Theoretical Physics and Applied Mathematics
Supervisor: Trivikram Dokka
A common scheduling problem in industrial settings is concerned with scheduling jobs on identical machines with the objective of minimizing the total active time. The problem finds important applications in the field of (energy-aware) scheduling, especially in applications relating to optimal network design. The aim of this project is to investigate the performance of some natural heuristics proposed for finding near-optimal solutions to these computationally hard problems. This will involve learning about methods based on integer and linear programming formulations and using computer programming to implement algorithms and solve linear programs.
Seasonally Adjusting Official Time Series
Degree: Lancaster University, BSc Mathematics
Supervisor: Rebecca Killick
The Office for National Statistics (ONS) publishes thousands of seasonally-adjusted time series which are used to produce the official statistics that create the news headlines regarding, for example, increase/decrease in unemployment and double or triple-dip recessions. Seasonal adjustment involves estimating and removing a seasonal component from a time series. This project aims to develop and test a method for the automatic detection of changes in the seasonal pattern of time series by comparing alternative methods and assessing the impact on the estimation of seasonal factors for series that do and do not present changes in the seasonal pattern.
Fast inference for processing intelligence information
Degree: University of Bath, MMath Mathematics
Supervisor: Lisa Turner
Intelligence is information regarding threats to national security and potentially hostile forces. After raw intelligence data is collected it must be processed and screened, often in time-critical situations. Only relevant information is then passed on for further analysis. With huge amounts of intelligence data collected daily, potentially relevant information can be missed. Given a set of intercepted communications, how should we process the communications to maximise the amount of relevant information passed on for analysis? This project will develop a model for processing intercepted information and explore how to overcome problems associated with this type of model.
Assessing Performance of Changepoint Detection Algorithms
Degree: University of Cambridge, BA Mathematics
Supervisor: Kaylea Haynes
Changepoints are a widely studied area of statistics with applications including, but not restricted to, finance (detecting changes in volatility), computer science (detecting instant messaging worms and viruses) and environmental sciences such as oceanography and climatology. Changepoints are considered to be the points in a time series where we experience a change in some statistical property, for example, a change in mean or a change in variance. There are many different approaches to changepoint analysis; however, current methods have the trade-off of being fast but approximate or exact but slow. The aim of this project is to develop an understanding of changepoint detection methods and in particular explore ways in which we can assess the performance of different detection methods.
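To make the idea of a change in mean concrete, here is a minimal Python sketch that locates a single changepoint by exhaustive search. It assumes at most one change and a least-squares cost; the function name `best_single_changepoint` and the toy data are hypothetical, and this is not one of the specific algorithms the project assessed.

```python
def best_single_changepoint(x):
    """Return (tau, cost): the split x[:tau] | x[tau:] that minimises the total
    within-segment sum of squared deviations from each segment's own mean."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    # Prefix sums of x and x^2 let each segment cost be computed in O(1).
    s, s2 = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(a, b):
        # Sum of squared deviations from the mean over x[a:b].
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best_tau, best_cost = None, float("inf")
    for tau in range(1, n):  # try every possible split point
        cost = seg_cost(0, tau) + seg_cost(tau, n)
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau, best_cost


# Toy series (an assumption): the mean jumps from about 0 to about 5 at index 5.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
tau, cost = best_single_changepoint(series)
print(tau)
```

Checking every split is exact, and with several changepoints the number of candidate segmentations grows quickly, which is one source of the speed versus accuracy trade-off noted above.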
Explaining changes in aggregated time series
Degree: Lancaster University, MSci Physics with Mathematics
Supervisor: Lawrence Bardwell
In many applications, there is some indicator that is constantly monitored as new data are collected, for example in an industrial setting, the number of faults recorded on a large network per week. Typically at a managerial level interest lies in the total number of faults over the entire network and patterns or changes that may occur. One important change in this indicator is a spike (outlier) where suddenly there is a large increase in the number of faults over the entire network. Understanding why these sudden increases occur is important so they can be prevented from happening again. This project will investigate methods for detecting outliers in large time-series datasets.
Selection of Tolerance Level for Approximate Bayesian Computation
Degree: University of Warwick, MMath Mathematics
Supervisor: Wentao Li and Paul Fearnhead
For many complex datasets, one feature is that the likelihood of the statistical model is intractable, in the sense that it is difficult to evaluate the likelihood values of the observations, and standard inference methods for unknown parameters, like Maximum Likelihood Estimation and Markov chain Monte Carlo, do not work. For intractable problems where simulating data from the model given parameter values is easy, Approximate Bayesian Computation (ABC) is a useful Bayesian inference method using Monte Carlo simulations. The project will investigate the impact of the tolerance level, a core parameter of the ABC algorithm, in various situations and try to design an automatic algorithm to select the tolerance level.
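The role of the tolerance level can be seen in a minimal rejection-ABC sketch in Python. Everything specific here is an assumption chosen for illustration: a Normal model with unit variance, a Uniform(-5, 5) prior on its mean, the sample mean as summary statistic, and the hypothetical function name `abc_rejection`. It is the basic rejection sampler, not the automatic tolerance selection the project aimed to design.

```python
import random
import statistics


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated data have a sample mean within
    `tolerance` of the observed sample mean."""
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a parameter from the assumed prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]  # simulate from the model
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy observations (an assumption), centred near 2.
observed = [1.8, 2.3, 2.1, 1.6, 2.4, 2.0, 1.9, 2.2]
loose = abc_rejection(observed, 5000, tolerance=1.0)
tight = abc_rejection(observed, 5000, tolerance=0.2)
print(len(loose), len(tight))
```

A smaller tolerance accepts fewer draws but gives a closer approximation to the posterior, so the choice balances accuracy against computational cost.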
Travelling Salesman Problem
Degree: Sheffield University, BSc Mathematics
Supervisor: Ivar Struijker-Boudier
Scheduling problems can be found in many industrial settings. The complexity of scheduling problems is often such that optimal solutions cannot be guaranteed to be found in short computational time. However, many companies need to produce schedules on a daily basis, so they need a computationally fast way of implementing this. A well-known example of a difficult-to-solve scheduling problem is the travelling salesman problem (TSP), which is concerned with finding the shortest route which visits each of a number of locations exactly once. If every location can be travelled to directly from every other location, then the number of possible solutions increases very quickly as more locations are added to the problem. Evaluating every possible solution then becomes impossible. This project will explore the travelling salesman problem and will assess and compare various solution methods for the TSP.
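How quickly the number of solutions grows can be seen in a short Python sketch. It assumes a round trip that returns to the starting location and symmetric distances, so that n locations give (n - 1)!/2 distinct tours; the function names and the distance matrix are hypothetical.

```python
import itertools
import math


def count_tours(n):
    """Number of distinct round trips through n locations with symmetric distances."""
    return math.factorial(n - 1) // 2 if n > 2 else 1


def brute_force_tsp(dist):
    """Evaluate every tour starting and ending at location 0; return (length, tour)."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        # Sum consecutive legs, including the leg back to the start.
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical symmetric distance matrix for four locations.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
best_len, best_tour = brute_force_tsp(dist)
print(best_len, best_tour)
print(count_tours(10), count_tours(20))
```

Four locations give only three distinct tours, but ten give 181,440 and twenty give roughly 6 x 10^16, which is why exhaustive evaluation stops being practical and heuristic or exact optimisation methods are compared instead.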
Modelling solar irradiance for energy generation
Degree: University of Bath, MMath Mathematics
Supervisor: Nikos Kourentzes
The increasing investment in renewable energy is essential to guarantee immediate answers both to the high and fluctuating prices of crude oil and to the diversification of energy supplies, thus reducing external dependence on oil, gas and coal. Therefore, solar power generation becomes an area of paramount research. Various time series methods have been implemented to forecast solar irradiance for power generation; however, a complication with solar irradiance data is that of multiple seasonalities: seasonality from the day-night cycle and from the annual earth cycle. This project will attempt to tackle some of the questions related to modelling the seasonal element of solar irradiance using time series and forecasting models.
Here you can find details of the summer 2013 interns including a description of their research project.
Statistical Modelling in Sports
Degree: University of York, MMath Mathematics (2010-present)
Supervisor: George Foulds
To aid the application of betting and investment strategies, an edge must be sought over the market. Simulation modelling is a crucial part of this process, providing evidence to support real-world data analysis and professional conjecture. This project will introduce the student to the use of statistical modelling in the prediction of sports results and allow them to adapt a well-known model using their findings from freely available real-world data.
Detecting Changepoints in Multivariate Data Series
Degree: University of Edinburgh, BSc Mathematics (2010-present)
Supervisor: Ben Pickering
Data collection is a huge component of the workings of any modern organisation. There are many examples of situations where data is collected from multiple sources which may be related in some way, for example, the stock prices of multiple companies in the same industrial sector. While the nature of these data values may stay fairly constant over time, occasionally some event may occur which causes a sudden change in the values being recorded at all sources; in financial data, for example, there may be a stock market crash. The times at which such changes occur are known as multivariate changepoints. This project will explore the effectiveness of current multivariate changepoint methods.
Learning in Dynamic Environments
Degree: University of Cambridge, BA Mathematics (2010-2014)
Supervisor: David Hofmeyr
Machine learning is a field of artificial intelligence focused on developing algorithms which allow computers to evolve and improve their behaviour as a result of empirical data. In the context of this project, this refers to the construction of a data-driven model to aid in a predefined task. The task might be something basic like making predictions based on a simple regression model, or it might be highly complex like describing intricate biological systems. This project offers a variety of possibilities due to the lack of specificity in online learning, and there is considerable flexibility for its direction depending on the student’s preference.
Bayes Sequential Decision Problems
Degree: University of Bath, MMath Mathematics (2010-present)
Supervisor: James Edwards
Many important decisions have to be made under uncertainty because the information that is relevant to the problem is missing or is only known imperfectly. Often, these decisions are not taken in isolation but in a sequence. New information that becomes available as a result of our actions can then be used to make better decisions in the future. However, the actions that give the best short term results may not be the same actions that give the most information. This presents a trade-off between taking the actions that are best in the short term and the need to learn for better long term results. This project will explore a number of statistical theories for dealing with decision problems followed by testing and selecting optimal methods.
Multiple Changepoint Detection in Non-Trivial Models
Degree: University of Durham, MMath Mathematics (2010-2014)
Supervisor: Rob Maidstone
Time series data sometimes experiences abrupt changes in structure. These changes are called changepoints. To model the data effectively, these changepoints need to be detected and subsequently built into the model. Changepoints occur in a variety of real-world situations. For example, when analysing human genome data, the average DNA copy value is usually around the same level, but occasionally sudden changes away from this level occur. These sudden changes in average DNA level often relate to tumorous cells, and therefore the detection of these changes is critical for classifying the tumour type and progression. This project will introduce the student to changepoint models and involve programming these models using statistical computing software.
Background Subtraction: Methods for Video Analysis
Degree: Lancaster University, BSc Mathematics (2011-2014)
Supervisor: Rhian Davies
Surveillance cameras have become ubiquitous in many countries, collecting a huge amount of data, most of which is stored and never analysed. Converting this data into useful information can be problematic, particularly as large companies often use many cameras simultaneously. Often it is of interest to the user to detect anomalies in video footage, for example a person placing an item in their bag instead of their shopping trolley. In order to detect such anomalies, we first need to separate the foreground and the background of a video. One popular method for splitting the foreground from the background is background subtraction. The aim of this project is to investigate the effectiveness of different algorithms for background subtraction under a number of real and challenging scenarios.
Evaluating the Structure of the Excitability Curve of Motor Neurons
Degree: University of Manchester, BSc Mathematics with Spanish (2010-2014)
Supervisor: Simon Taylor
Scientists in the field of neuromuscular research are interested in understanding the structures and processes involved in operating a working muscle. The fundamental component of this process is the motor unit, which consists of a single motor neuron and the collection of muscle fibres that it governs. Evaluating the number of motor units that form a working muscle is very important in understanding the effects of various neuro-degenerative disorders and also in assessing the effectiveness of proposed treatments. The aim of this project is to analyse data from the stimulation of a single motor unit using importance sampling and Bayesian statistics.
The Unit Commitment Problem and Wind Energy
Degree: University of Bath, BSc Mathematics (2011-2014)
Supervisors: Pedro Crespo del Granado and Franklin Djeumou Fomeni
The share of wind power in the UK’s grid energy generation mix is expected to be around 20-30% by 2020. Wind generation, however, creates new planning challenges to maintain a stable and reliable supply-demand balance. Since wind generation fluctuates independently from energy demand, this creates a disturbance for the short-term generation planning and scheduling of other generation units (such as gas or coal power plants). This brings a new degree of uncertainty to stabilising the power network equilibrium between supply and demand in real time. This project will use optimisation modelling to answer the question: what is the optimal cost-effective mix of energy units needed to achieve carbon reduction targets whilst also coping with high wind input?
Betting Markets and Strategies
Degree: University of Bath, BSc Mathematics (2010-2013)
Supervisor: Tom Flowerdew
Markets come in many forms. From buying and selling livestock to trading complex financial derivatives, the key to making long-term profits is to establish an edge on the market. Once an edge has been established, the question is how wealth can be optimised. This project will investigate ways in which existing theory can be adapted to fit into sports betting markets, and ways in which underlying assumptions can be removed to allow the theory to become more general.
A study of the air quality of major cities in China
Degree: University of Durham, MMath Mathematics (2010-2014)
Supervisor: Ye Liu
The air quality in some major cities in China has long suffered from rapid industrialisation and increasing vehicle usage. With the help of social networks and media coverage, this issue has gradually come to the attention of the government as well as the general public. This research project will aim to gain some insight into the air pollution problem in China using classical statistical techniques such as time series analysis and extreme value theory.
Modelling droughts with extreme value theory
Degree: University of Durham, MMath Mathematics (2011-present)
Supervisor: Hugo Winter
Droughts are large-scale climatic phenomena that can lead to social and economic damage. In Africa, periods of drought can lead to food instability and large death tolls as well as having a knock-on effect on the economies of major aid providers. In the UK, a drought could cause reservoirs to run low and lead to government legislation such as the hosepipe bans seen over the last few summers. It is of great concern to governments and industry where and when these events may occur and also whether their occurrences will differ in the future with anticipated global climate change. Using standard statistical techniques for rare events will potentially result in badly fitting models and, worse, misleading policies. With such rare and sparse data a more reliable approach is needed; this is called extreme value theory. This project will introduce the student to extreme value theory and its applications for drought data.
Resource Allocation in Service Industries
Degree: University of Durham, MMath Mathematics (2010-2014)
Supervisor: Emma Ross
The effective allocation of resources to meet demand is an essential consideration of any company hoping to survive in a competitive market. This and many other important decision problems can be formulated as a well-known combinatorial optimisation problem called the knapsack problem. It forms a basis from which to study such decision problems, but we quickly run into difficulty when the complexity and scale of real problems faced in industry are incorporated. This project will allow the student to investigate the impact of uncertainty in resource allocation problems by introducing them to linear programming.
Here you can find details of the summer 2012 interns including a description of their research project.
Scenario Generation for Stochastic Programming
Degree: University of York, MSci Mathematics (2009-2013)
Supervisor: Jamie Fairbrother
Often we have to make decisions in the face of uncertainty. A shop manager has to decide what stock to order without knowing the exact demand for each item. An investment banker has to choose a portfolio without knowing how the values of different assets will evolve. Taking this uncertainty into account allows us to make good robust decisions. This project uses stochastic programming as a tool to investigate such decision-making processes.
Modelling anti-terrorist surveillance systems from a queueing perspective
Degree: University of Cambridge, BA Mathematics with Physics (2010-2013)
Supervisor: Terry James
It is without question that surveillance is very much a part of the modern world. A growing interest in the need for surveillance has been matched by technological advances in the area. Surveillance cameras, either static or as part of an unmanned aerial vehicle, have the ability to feed real-time information to a control centre. Here the subject under surveillance can be properly assessed in terms of their identity or possible intentions in a biometric fashion. This project explores an aspect of the emerging operational research field of Homeland Security. More specifically, this project will consider the challenge of modelling the defensive surveillance of public areas which are subject to attack by terrorists.
Spectral Analysis of Multivariate Time Series
Degree: University of Cambridge, BA Mathematics (2010-present)
Supervisor: Tim Park
The advent of smartphones has opened up new possibilities for the collection of data. These phones contain sensors such as accelerometers, gyroscopes and GPS making them a cheap and easy way for companies to collect time-series data. This data is often multivariate and nonstationary and often the main challenge is deciding which channels to focus the analysis on rather than the choice of analysis method itself. This project uses principal components analysis to identify which channel to focus on when analysing a multivariate time series.
Hybrid simulation models for maintenance processes
Degree: University of Birmingham, MSci Mathematics (2009-2013)
Supervisor: Mark Bell
One of the most widely used dynamic modelling methods in Operational Research for understanding and improving organisational systems is discrete event simulation (DES); one application of this method is in modelling maintenance processes. In a large organisation, there are often many additional interactions that affect maintenance operations. When this is the case, modelling the system using DES alone is sometimes not sufficient, and System Dynamics (SD) may be utilised as well. In Operational Research these two approaches have traditionally been kept separate, but in recent years hybrid models that contain both techniques have emerged, as the two approaches have been said to compensate for one another's limitations. This project initially involves building DES and SD models separately before finally combining the two models to create hybrid models of maintenance processes.
Clustering customers to estimate willingness-to-pay
Degree: Newcastle University, MMathStat Mathematics and Statistics (2009-2013)
Supervisor: Shreena Patel
Simple probability models are often inadequate for describing the data we encounter in reality because of heterogeneity in the population we are attempting to model. One way to overcome this is to use a mixture model which represents the population as consisting of several sub-populations (or clusters), each of which can be modelled by a standard parametric distribution. This project concerns a population of customers each of whom has an (unobservable) maximum price which they are willing to pay for a product, called a referral price. We wish to cluster customers to capture differences in their price-sensitivity by assuming that referral prices are generated by a mixture of normal distributions. Standard clustering techniques will be adapted in order to estimate how likely a customer is to accept future quotes.
Resource Allocation problems in queueing theory
Degree: Cardiff University, BSc Mathematics (2010-2013)
Supervisor: Jak Marshall
Queues occur naturally in business and computer science applications. So ubiquitous are queues in various situations that being able to model their behaviour is an essential skill for any practitioner or researcher of operations research. Often it is of benefit to simultaneously manage the flow of work in and out of multiple queues given limitations of service resources. This project introduces the rich theory of queueing systems and presents an opportunity to explore efficient ways of coping with random demands on a system with multiple parallel queues with cost structures imposed on them.
Prize-Collecting Steiner Travelling Salesman Problem with Time Windows
Degree: University of Cambridge, BA Mathematics (2010-present)
Supervisor: Saeideh Dehghan-Nasiri
The travelling salesman problem is a very well-known optimization problem. This project studies aspects of this problem with additional time window restrictions on the service time of customers and uses a real road network graph. Small-scale versions of the problem may be solved using exact optimization techniques. This project looks at solving the problem using exact solution methods, and at developing and applying a dynamic programming algorithm that provides a lower bound for larger-scale problems.
Modelling the North Sea wave climate
Degree: University College Dublin, BSc Mathematical Science (2009-2013)
Supervisor: Ross Towe
Wave height is of inherent interest to oil companies with offshore operations. Once the distribution of wave heights has been determined, this information can be used to minimise the risk and consequently the cost of future offshore operations. A current consideration is also whether climate change will have an impact on the distribution of wave heights. This project considers extreme value theory for modelling wave heights in the North Sea.
Semi-Markov processes in a healthcare setting
Degree: Lancaster University, MSci Mathematics (2010-2013)
Supervisor: Dan Suen
Analysing healthcare systems has been an important concern of healthcare modellers for many years. Understanding patient flows and the number of patients in healthcare systems is an important tool when trying to improve hospital efficiency and, among other things, reduce patient waiting times. This project seeks to highlight similarities between healthcare models and the types of systems multigrade population models are applied to using data from a healthcare case.
Parameter Estimation with Particle Filtering Algorithms
Degree: University of Edinburgh, BSc Applied Mathematics (2009-2013)
Supervisor: Chris Nemeth
There exist numerous problems in statistics, engineering, signal processing, etc. which require the estimation of a hidden process. One such example can be found in target tracking, where the aim is to estimate the state of a target (e.g. position, velocity) given only partial, noisy observations (e.g. bearing measurements only). The process of estimating a target's state given only partial, noisy observations is known as filtering. This project involves gaining an understanding of particle filtering techniques and reviewing the current literature before using particle filtering methods to assess various models.
Here you can find details of the summer 2011 interns including a description of their research project.
Breast Cancer Screening
Degree: Lancaster University, BSc Mathematics (2009-2012)
Supervisor: Matt Sperrin
Breast density is a substantial risk factor for breast cancer. It can be estimated from mammograms, which are taken regularly for middle-aged women. A breast density reading can then be used to produce individualised monitoring for women (e.g. screening women with high breast density more frequently). However, breast density is estimated by radiologists subjectively. It is of interest to calibrate the breast density readings so that each radiologist’s scores are on the same scale, and assess the consistency of each radiologist. Data is available for the readings made by radiologists: we can attempt to exploit the fact that each mammogram is read twice by each radiologist, and each mammogram is read by two radiologists.
New Penalty Methods for Bilevel Optimisation
Degree: University of Warwick, MMath Mathematics (2008-2012)
Supervisor: Konstantinos Kaparis
Bilevel problems appear in areas such as economics, engineering, medicine and ecology. These types of problems are optimisation problems which include as part of their constraints a second optimisation problem. The upper level (or leader's) problem corresponds to our aim to optimise a certain function. The notion of optimality takes into account the subaltern part of the upper-level decisions. This part is represented by the lower level (or follower's) problem. This project concerns the linear case of bilevel programs.
The ABC of model choice
Degree: University of St Andrews, MMath Mathematics and Statistics (2008-2013)
Supervisors: Dennis Prangle and Paul Fearnhead
While Approximate Bayesian Computation (ABC) is now well-established for estimating parameters, its use for model-choice is still in its infancy. There have been recent papers disagreeing about whether ABC can be used for model-choice, and if so how it should be implemented. This project looks at some simple applications, to see whether ABC can give reliable inferences about the underlying statistical model; and if so, how to implement ABC so as to infer the model as accurately as possible.
Choice Modelling with Links to Optimisation and Compressed Sensing
Degree: University of Edinburgh, BSc Mathematics (2008-2012)
Supervisor: Arne Strauss
In many business applications, frequent decisions need to be made that depend on the choice behaviour of customers. For example, e-retailers such as Amazon.com must decide on the assortment of results to display in response to a customer query; airlines or hotels need to decide on the available booking classes to display in response to a customer request. Similar situations arise for many other firms. An often-used approach to choice modelling is to identify product attributes that influence the customer’s decision and to select and calibrate a structural model based on these attributes that fits the observed data. Recently, an intriguing way was proposed to learn a choice model from data using concepts from Revenue Management, Inventory Optimisation and Compressed Sensing. This project gives an insight into these respective fields whilst working on a topic that is currently at the forefront of research and has wide applicability.
Forecasting using time series methods
Degree: Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australia) (2008-2012)
Supervisor: Robert Fildes
One of the most important applications of statistics is time series forecasting. The key application area is forecasting demand (for a product or service). This project gives an introduction to the area of business forecasting using a newly written textbook. It includes some software testing (and development if appropriate) as well as the evaluation of different methods on test problems.
Degree: Lancaster University, MPhys Physics First Class (2005-2010)
The aim of any investor is to maximise their return. The return must be maximised for a given amount of risk, or equivalently the risk must be minimised for a given expected return. A mixture of analytical and simulation-based methods will be used to derive the properties of a portfolio and consequently the weight of investment that is given to each individual asset.
Compressed sensing methods for problems in statistics
Degree: Heriot-Watt University, Edinburgh, BSc Mathematics and Statistics (2008-2012)
Compressed sensing (CS) has recently emerged as an important area of scientific research for efficient signal sensing and compression. The main idea behind CS is that certain signals can be entirely reconstructed, using numerical optimisation algorithms, from a relatively small number of “well-chosen” signal samples. This project is exploratory in nature and provides the opportunity to learn about and research the area of compressed sensing, focussing on the role of CS in statistical applications for particular signals of interest.
Parametric inference for missing data problems
Degree: University of Cambridge, BA Mathematics (2009-2012)
Supervisors: Giorgos Sermaidis and Paul Fearnhead
A typical complication in parametric inference for missing data problems is the intractability of the likelihood. A well-established approach to maximum likelihood estimation is the simulated likelihood, where estimation is based on the optimisation of an unbiased Monte Carlo estimate of the likelihood. An important drawback, however, is that parameter consistency is achieved only when the Monte Carlo effort increases as a function of the data sample size, thus leading to computationally expensive algorithms. The aim of this project is to tackle this problem by constructing unbiased estimators of the log-likelihood, in which case consistency can be achieved even for fixed Monte Carlo size. The project involves standard techniques for Monte Carlo simulation and unbiased integral estimation and programming in R.
Analysing the structure of (multivariate) time series
Degree: Imperial College London, MSci Mathematics (2008-2012)
Supervisors: Karolina Krzemieniewska and Matt Nunes
Time series that are observed in practice are often highly complex in nature, for example, accelerometry signals arising from human movement experiments. The underlying behaviour of these signals is sometimes hidden or difficult to detect in the first instance. This project focuses on applied data analysis for complex time series and using statistical techniques to investigate changes in the underlying structure of time series. The project involves analysing real-world data arising from investigative health studies conducted by external collaborators.
Stochastic actor-based models for network dynamics
Degree: Durham University, MSci Mathematics and Physics (2008-2012)
Supervisor: Stephan Onggo
A stochastic actor-based model is a model for network dynamics that can represent a wide variety of influences on network change, and allow us to estimate parameters expressing such influences, and test corresponding hypotheses. The nodes in the network represent social actors, and the collection of ties represents a social relation. The project involves reading and summarising the relevant research literature on stochastic actor-based models, learning how to use RSiena, preparing a set of data, and applying the technique to the data.
Ivar Struijker Boudier
Exploring a new class of probability models for tail estimation in extreme value modelling
Degree: University of Glasgow, BSc Statistics (2008-2012)
Supervisor: Ioannis Papastathopoulos
Statistical modelling of extreme values plays an important role in understanding the behaviour of unusual events such as extreme weather conditions, earthquakes and financial crashes. The most common approach to the modelling of extreme values is to fit an appropriate probability distribution to the tail of the data and extrapolate it to levels above which no data are observed. This class of distributions is called the generalised Pareto distribution, which contains the exponential distribution as a special case. However, fits to finite samples are not always adequate and more flexible models might be appropriate. The project explores a new class of probability models that incorporates existing models as special cases. The project involves exposure to the theory of extremes, simulation studies for the applicability of the new models and the statistical analysis of a medical dataset.
Degree: Durham University, MMath Mathematics (2008-2012)
Supervisors: Yifei Zhao and Stein W. Wallace
Facility layout, in its simplicity, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor?
Here you can find details of the summer 2010 interns including a description of their research project.
Optimisation on road networks
Degree: Lancaster University, MSci Mathematics (2008-2012)
Supervisor: Richard Eglese
There are many problems that involve optimising an objective that is relevant to journey planning over a road network. The first part of the project will be to review some of the existing methods for finding the shortest (or least cost) paths in a network. The second part of the project is to develop an effective algorithm for finding the least cost path between two points where the speed and cost of travelling along an arc depend on the time of day.
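As an illustration of one standard least-cost path method, here is a minimal Python sketch of Dijkstra's algorithm. It is not taken from the project itself: the function name, the toy network and its costs are all hypothetical, and arc costs are assumed to be fixed and non-negative, so it does not handle the time-of-day dependence described above.

```python
import heapq

def dijkstra(graph: dict, source: str) -> dict:
    """Least-cost distance from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbour: non-negative arc cost}.
    """
    dist = {source: 0}
    queue = [(0, source)]  # priority queue of (cost so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbour, cost in graph.get(node, {}).items():
            new_d = d + cost
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(queue, (new_d, neighbour))
    return dist

# Hypothetical toy road network; the arc costs are illustrative only.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
print(dijkstra(roads, "A"))
```

In this toy network the cheapest route from A to D goes via C and B rather than along either direct-looking alternative. Extending the idea to time-dependent costs would require the arc cost to be a function of the arrival time at each node.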
Investigation of Approximate Bayesian Computation
Degree: Lancaster University, BSc Mathematics (2008-2011)
Supervisor: Dennis Prangle
For many complex phenomena, fitting realistic statistical models is mathematically intractable by standard methods. A recent computational alternative is to repeatedly simulate the model to find good fits. This project investigates one such method (Approximate Bayesian Computation) on data from a Tuberculosis outbreak. The aim is to assess various implementations of this method through computer experiments, which will involve exposure to modern statistical methods and software.
Multi-scale methods for texture analysis
Degree: University of Warwick, MMath Mathematics (with a year abroad: Europe) (2006-2010)
Supervisor: Idris Eckley
Wavelets are a recent and powerful mathematical tool which was developed in the 1980s. They provide a novel way of decomposing the information within signals and images, providing information at various scales (you can think of these as viewing windows). Texture analysis is a particular application area in which wavelets have been successfully used in recent years. Broadly speaking, the texture of an image is the visual character of a region whose structure is, in some sense, regular (e.g. the appearance of a woven material). This project will investigate the potential of wavelets and related methods for modelling structure within textured images.
Time-Dependent Queueing Systems
Degree: University of Manchester, MMath Mathematics (2006-2010)
Supervisors: Navid Izady and Dave Worthington
In the area of mathematical modelling, modellers often make simplifying assumptions in order to make a problem ‘solvable’. In doing so the modeller is hoping that the solutions produced by the simplified model will nevertheless be valid (in some sense) despite the simplifying assumptions. Important examples in the area of modelling queueing systems include call centres, accident and emergency departments, hospital emergency admission units and intensive care units. Our interest is in modelling aspects of such queueing systems that typically exhibit time of day (and possibly day of week) variations in their underlying arrival rates of ‘customers’ as well as the usual stochastic variation in arrival times and service times.
Dynamic modelling for wind-prediction
Degree: Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australian National University) (2007-2011)
Supervisor: Ben Taylor
Dynamic linear modelling is a technique for the analysis of time series data when the governing parameters of the model themselves evolve over time. In particular, it is easy to obtain predictions using these methods. This project concerns the short term modelling and prediction of wind speeds and hence power output at wind farms. The application is important in deciding whether a potential new site will deliver an acceptable amount of energy.
Pricing on-demand online services
Degree: Lancaster University, MPhys Physics with Particle Physics and Cosmology (2007-2011)
Supervisor: Chris Kirkbride
Cloud computing is a relatively new concept for Internet-based computing in which resources, software, information and applications are provided to user devices (PC, laptop, mobile) on-demand. This project will consider various models for the cloud environment in order to determine how resources can best be utilised to meet demands for service and how to price such services effectively.
Examining the applicability of a new technique for threshold selection in extreme value modelling
Degree: Lancaster University, BSc Mathematics (2008-2011)
Supervisor: Jenny Wadsworth
It is the extreme values that are important in many applications, such as flooding, stock market crashes, and wind storms. To estimate the frequency of extreme events, a statistical model is fitted to the extreme values and extrapolated to the value of interest. This project is concerned with investigating appropriate probability models for “extreme values”, or more precisely the tails of a probability distribution. However, there is a challenge in defining what makes a value “extreme”, i.e., from what point should we begin to model the tail? The project will examine the applicability of a new method for helping to define a suitable threshold. This project will involve mathematical computation and exposure to real-life problems using a variety of different data sets.
Detecting changes in mean
Degree: Lancaster University, MSci Mathematics (with a year abroad: Australian National University) (2007-2011)
Supervisor: Rebecca Killick
In recent work, we collaborated with a company to identify whether there was a change in storminess in the Gulf of Mexico. This project arises out of this work. Detecting changes in properties of a process, such as the mean, is important in many other areas of research such as quality control. Although there are many algorithms designed to detect changes in mean, there has been little comparison of the performance of these algorithms. This project will provide an opportunity to research different algorithms, program them and then conduct simulation studies to test their performance under various circumstances.
The Change-Making Problem
Degree: Lancaster University, BSc Mathematics (2008-2011)
Supervisor: Adam Letchford
The Change-Making Problem is concerned with finding the minimum number of coins needed, in a given currency, to reach a certain amount. Suppose, for example, you are in Britain and you wish to give somebody 39p. The minimum number of coins needed is five (20p, 10p, 5p, 2p, 2p). If you were in the US and you wished to give somebody 39c, the minimum number of coins would be six (25c, 10c, 1c, 1c, 1c, 1c). This topic may seem, at first sight, to belong to recreational mathematics but it is in fact a classical operational research (OR) problem with many applications.
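The two examples above can be checked with a short dynamic programming sketch in Python. This is one common textbook approach rather than the method used in the project; the function name and the coin lists (standard UK and US denominations, with amounts in pence and cents) are assumptions made for illustration.

```python
def min_coins(amount: int, denominations: list[int]) -> list[int]:
    """Return a smallest collection of coins summing to `amount`."""
    inf = float("inf")
    best = [0] + [inf] * amount  # best[a] = fewest coins needed to make amount a
    last = [0] * (amount + 1)    # last[a] = one coin used in an optimal solution for a
    for a in range(1, amount + 1):
        for coin in denominations:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
                last[a] = coin
    if best[amount] == inf:
        raise ValueError("amount cannot be made with these coins")
    # Walk back through the table to recover the coins themselves.
    coins = []
    while amount > 0:
        coins.append(last[amount])
        amount -= last[amount]
    return coins

UK_PENCE = [1, 2, 5, 10, 20, 50, 100, 200]  # assumed UK coin values
US_CENTS = [1, 5, 10, 25, 50, 100]          # assumed US coin values

print(len(min_coins(39, UK_PENCE)))  # 5 coins
print(len(min_coins(39, US_CENTS)))  # 6 coins
```

The table is filled in for every amount from 1 up to the target, so the running time grows with the target amount multiplied by the number of denominations.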
Non-stationary time series analysis
Degree: Lancaster University, MPhys Physics (with a year abroad: North America) (2006-2010)
Supervisors: Idris Eckley & Matt Nunes
Most signals (i.e. time series) observed in the real world are non-stationary in nature. This project will explore the behaviour of datasets related to financial data. We will investigate the structure of these signals using wavelets, a form of localised basis functions. The project will give an opportunity to learn about wavelets and their application to time series, and will provide experience of conducting advanced exploratory data analyses.
Degree: University of Edinburgh, MA Mathematics (2007-2011)
Supervisors: Yifei Zhao and Stein W. Wallace
Facility layout, at its simplest, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor? To do this we shall numerically solve small cases of the problem so as to try to understand the emerging structures (designs).
Selecting a portfolio in finance
Degree: University of Oxford, BSc Mathematics (2009-2012)
Supervisors: Ye Liu and Jonathan Tawn
In finance, the aim is typically to make as much money as possible while incurring as little risk as possible. One way of reducing the risk is to hold a selection of investments (a portfolio). However, as some investments are correlated, statistical methods are required to find the best way of balancing risk and expected return. In this project, you will explore the basic assumption that returns of investments are multivariate normal using a range of financial data and investigate some extensions of this assumption which are more realistic and result in better decision making in optimising the portfolio choice. The project will involve a real problem with real data, and the need for statistical modelling, simulation and optimisation.
Agent-based Physical Asset Maintenance Simulation Modelling
Degree: University of Birmingham, BSc Physics (2008-2011)
Supervisor: Stephan Onggo
Physical assets such as houses, motorways/roads, water pipes and electrical networks need maintenance because the condition of a physical asset deteriorates with time and usage. The risk of an asset failure (e.g. flooding) or of not being able to provide the required service quality (e.g. due to weak water pressure) increases as the asset's condition deteriorates. The cost of a repair/replacement process, including the liability incurred due to an asset failure, can be very high. Therefore, a good maintenance strategy is needed. In this project, we will use one of the least explored OR modelling techniques for evaluating asset maintenance strategies, that is, an agent-based simulation model.
Here you can find details of the summer 2009 interns including a description of their research project.
Detecting changes in regression for time series: a review and application
Degree: MSci Hons Mathematics with Statistics/North America-Australasia at Lancaster University
Supervisors: Idris Eckley and Rebecca Killick
This project aimed to detect changes in regression (trend) in industrial datasets, each including several variables, provided by Unilever. Several existing methods for detecting changes in the regression were investigated, including (normal) maximum likelihood (with and without penalty), residual sum of squares and cumulative sum of squares, before conducting a simulation study looking at their effectiveness. From this simulation study, the most appropriate algorithm was chosen using statistical methods and finally the algorithm was applied to the various industrial datasets. Anna produced a technical report of the findings and had the opportunity to present to statisticians at Unilever in Amsterdam.
Anna is now pursuing a PhD at Imperial College, London.
Optimal Control Policies in adjustable queue systems
Degree: MSci Hons Mathematics/North America-Australasia at Lancaster University
Supervisor: Kevin Glazebrook
Countless industrial processes include some variety of queueing system, for example, telecommunications and transport. Problems regularly arise in how queue operators manage the demand for their services. The challenge is to find an optimal way of allocating resource towards providing service across a collection of independent service stations serving customers in corresponding queues given the delicate balance of overspending on service infrastructure versus underspending and incurring costs due to system neglect. The approach to solving this problem relies heavily on computation and a good understanding of queueing objects in order to simulate an ideal queueing system. The key outcome of this project was to deliver a near-optimal method of managing queueing systems by considering a case study involving queues with only limited modes of service available at any time.
Jak joined STOR-i in 2010 to pursue a PhD in STOR.
Queueing Systems and Optimisation of Computer Component Repairs
Degree: BSc Hons Mathematics at Lancaster University
Supervisor: Kevin Glazebrook
Repair companies often offer a promise of a turn-around period in which a faulty product will be repaired and returned to the customer, ensuring optimal customer satisfaction. In the majority of cases, the repair company will not complete all of the repairs themselves, if any at all, but will instead outsource the work to several different sub-companies. Upon receiving a broken product, a computer for example, the repair company must then decide to which of its contracted sub-companies to send the machine. Company A, for example, may be a larger, more specialist or better equipped business, and as such may be able to perform a given repair a lot quicker than Company B or C. If a company has a quick turnaround on their repairs, it may be desirable to send more broken machines to them than to the other companies. However, a balance must be struck between using the ‘best’ company and making efficient use of all the resources. With different companies being different distances away from a location (the repair company warehouse, for example), the time taken for travel and dispatch must also be considered. Taking into account all of these different factors, a model can be built in order to decide how many repairs to send to each company. Once a basic model has been designed, different probabilities can be assigned to factors, such as the probability of machine breakdown and machine repair, in order for choices and allocations to be made in the most intelligent manner.
Erin started a PhD at Lancaster University in collaboration with Garrad Hassan in 2009.
Graphical modelling of divergence weighted independence graphs in the Criminal Justice System
Degree: BSc Hons Mathematics at Lancaster University
Supervisor: Joe Whittaker
Graphical models represent the relationships between several variables in graphical form. This project required learning the theory behind divergence weighted independence graphs and the modelling of such graphs using the statistical package R. A key part of the research focused on illustrating how these graphs can be used to identify relationships between factors which affect trust in the Criminal Justice System. On completion of the internship, Daniel produced a comprehensive scientific report including applications using British Crime Survey data.
Daniel joined STOR-i in 2010 to pursue a PhD in STOR.
Click on the links below to see the blogs written by each cohort.
Here you can find out more about the STOR-i internships experience as told by the 2020 STOR-i interns.
Written by Katharina Limbeck
Summer 2020 was a very interesting time to start working as interns at STOR-i. For the first time, the whole internship was to be held fully online due to the COVID-19 pandemic. Many of us were very grateful this program was still happening but also very new to the concept of working from home. We adapted quickly, however, as the first week was full of interesting talks, social activities and engaging introductions to our summer projects.
We got to meet our supervisors on our first day for a virtual lunch meeting, which was a nice introduction to the program. Later that week all interns started reading about our projects and getting to know more about research into statistics, operational research and mathematics. The course we started as an introduction to programming in R was especially well organised and a great experience of online learning.
During this first week, we also got to know all our fellow interns and some of the MRes students by doing 4 hours of teambuilding exercises. Just like the virtual coffee breaks scheduled throughout the internship, these activities gave us a chance to connect with other interns while working from home.
Even the social activities that were usually planned throughout the internship were translated to this new virtual environment. Instead of doing a scavenger hunt in person, we did a scavenger hunt at the campus of Lancaster University online using Google Maps. As not all of us had been to Lancaster before, it was an entertaining opportunity to get to know the university from the comfort of our homes. The bonus exercises that asked us to take pictures with masks, make a video of how to wash your hands properly, build the highest toilet paper tower and find the largest stockpile of cans were especially fun. Some of us also participated in a virtual pub quiz, which was a great chance to engage with other students at STOR-i and answer interesting questions related to multiple topics.
On Thursday, we all gave a presentation about ourselves, which was mostly an excuse to show cute cat pictures and introduce ourselves to everyone at STOR-i. Overall, we got a very warm welcome to the team and were looking forward to learning more throughout the next few weeks of our internships.
Written by Daniel Morton & Taj Patel
We started the week with our second group coding challenge: building an algorithm to play noughts and crosses. We worked together in groups to try and build the best code we could before pitting them against each other on Wednesday. Though it proved more difficult than we initially expected, there was a clear winning group whose algorithm could never lose - to the dismay of my own team. Sadly, there were no prizes for the winners, but they had their bragging rights and it was an exciting way to end the R Programming tutorials.
This week mainly saw interns reading up more on their projects and beginning some deeper analysis on the problems after being able to talk through more ideas with fellow interns and further discussions with our supervisors.
The social event for the second week was “Cards with Cocktails”. The creativity with the cocktails was top tier: I think 2 of us had managed to muster the strength to walk to our respective fridges and grab a cider. As the name suggests, we played a few card games and for some reason Daniel H kept winning “Crazy Eights” so I suspect something fishy was going on. Overall, this was great fun, and a nice opportunity to get to know each other and the other MRes students.
A highlight of the week was the talks given by current students on their respective studies - it is both interesting and helpful to listen to other things people are working on and how they approach the problems that they’ve encountered. Tuesday saw a number of project presentations from MRes students, and Friday's forum was given by Anja Stein, who talked about her work on recommender systems that allow for regular updates using Sequential Monte Carlo methods.
Written by Luke Fairley and Daniel Hodgson
The third week eased in nicely with Monday, as with no specific sessions happening, we all got on with our personal projects, aside from a quick weekly meeting with the cohort split into two groups, to discuss how we were finding our projects. On Tuesday, we had our first LaTeX session, introducing us to some of the basic functionality and structure of the document software. One of us also had a supervisor meeting, in which we discussed our progress and issues thus far, and what to do next. That same supervisor then sent a fixed piece of code at around 11pm, a welcome surprise.
Wednesday introduced us to our first problem solving day, during which we were split into three teams, given lots of data on the top 200 streamed songs on Spotify over a period of time, and asked to find the features of songs best correlated with a high number of streams; that is, to build the perfect hit song. We tackled problems such as dealing with slightly messy data, handling strings and words in R when analysing the names of the songs, and other types of data analysis. Some even employed more advanced techniques such as Generalised Linear Models.
We began Thursday with our second LaTeX session, which proved to be more in depth and challenging than the first, quickly advancing to the creation of tables and matrices, and introducing the inclusion of pictures. The rest of the day was characterised by more work on our personal projects, broken up with another weekly meeting, this time in groups of 4. The week wound down nicely on Friday, consisting mostly of more personal work and writing this blog, but also featuring a short forum by Tom Grundy, who presented some of his research on changepoints. His work seemed to be a fusion of changepoint statistics and linear algebra, with the discussion of subspaces and multivariate time series. These ideas were applied to motion capture, where the changepoints being detected were different actions, such as walking or punching.
Written by Taj Patel and Daniel Morton
Week 4 was quite an exciting time for us. A lot of us were given reading material to be covered in the first 3 weeks, so it was great to start taking our projects in the directions we wanted, and even start coding some solutions to our respective problems.
Week 4 seems (thus far) to be the UK’s hottest week, with temperatures hitting the low 30s. This meant during any phone call at least one person’s laptop would be a few degrees away from exploding. Every Monday we’d have a group catch-up call with one of the STOR-i leaders (Luke or Jake) to discuss how our projects were progressing and for any questions we might have. This week we’d had a conversation about the mountains and hikes we’ve missed out on, thus I made it my mission to visit Lancaster at least once in my lifetime and do all the trails. Tuesday started off a bit more relaxed: Ed, Matt R and Kim (MRes students) gave talks about their projects, all of which were insightful.
On Wednesday we had our final LaTeX session, in which we learnt how to create a professional academic poster. This was quite an interesting session, as it really demonstrated the power of LaTeX. During this session we also had the opportunity to view/critique some of the posters done by previous MRes students. The August OR reading group also took place that day, and it was great to see how (admittedly difficult) papers can be interpreted and discussed in a group to reach a deeper understanding.
Thursday was quite the eventful day. Hamish and Peter had organised an “Online Escape room”. You can imagine our faces upon initially hearing the online nature of this event; however, we all found it to be great fun. The escape room consisted of a series of online clues and questions. It’s fair to say Daniel H had pretty much carried our team to freedom; he’d solved one of the riddles that, even after he explained it to me and Peter, we’d still failed to understand. Our team did take great pride as we didn’t just beat the other teams to escape, but also beat Hamish's previous time.
Written by James Boyle
So week five has been and gone. We learnt quite a lot about the ins and outs of doing a PhD this week, in the form of a Q&A session with a couple of PhD students, and a session with the STOR-i co-director about PhD admissions and applications. It was interesting to hear more from the PhD students, and to learn more about the sort of things to consider when deciding whether, where and how to apply for a PhD. In particular it was interesting to hear about the differences in the kinds of jobs one might get after a PhD compared to an undergraduate degree, which is something I at least hadn’t even considered before.
In other events, this week saw the arrival of the second problem solving day. This week was all about how airports allocate usage slots to airlines. Airlines request the use of airport facilities for specific arrival and departure times, and the airport then has to figure out how best to allocate slots in accordance with this, as in general for peak times demand outstrips supply. Obviously, you want to try and minimise the amount by which the slots airlines are given are different from those requested (termed the displacement). We were tasked with finding ways to ensure that slots were allocated fairly (the meaning of which was up to us to decide). As it turns out there’s quite a lot to consider here, such as ensuring that no individual airline suffers significantly more disruption, proportional to the number of flights run by the airline, than any other one, or that no flight gets displaced from the requested arrival/departure times by too much. You’ve also got to ensure that the allocation algorithm can’t be “gamed” by airline companies by, for example, requesting different slots to what they actually want.
As for my project, this week has mostly consisted of a lot of simulation. I’ve recently been learning about various different models for network data, such as brain scans, and this week have been using simulation methods to compare their performance. It can be quite frustrating to have your code spend five minutes simulating a bunch of data to make a graph, only to find out at the end that you misspelt a word in the title and are going to have to run the simulation all over again, but other than that it’s been quite fun.
Anyhow, next week’s social involves a baking competition, so I’m going to have to stop writing this blog now and come up with a way of encapsulating the idea of “lockdown” in a cake…
Written by Luke Fairley
Week six was characterised by STOR-i social events and time to work on our individual projects. On Monday afternoon the interns were split into two groups for the weekly meeting. This allowed us to discuss our projects, the Austrian education system, how fortunate it is to have a cat who shows affection and the A-levels U-turn made on Monday, as well as other topics. I cannot speak for the other group, but it is safe to assume that different but similarly varied conversations took place during their meeting also.
On Tuesday morning talks were given by three MRes students, which was an interesting insight into their respective PhDs. Other than that, we had all of Tuesday to work on our personal projects and there was a bake-off social activity during the evening. The theme of the social was lockdown, which allowed for extensive individual interpretation and creativity - including a gingerbread prison and a cake shaped like a loo roll. It was a tight contest, but it was eventually won by Jamie’s gingerbread.
Wednesday and Thursday had no timetabled sessions so there was plenty of time for us to work on projects. As we are about to enter the penultimate week of the internship, this time was particularly useful to finish the research portion of our projects and collect some results!
Friday consisted largely of time to work on individual projects but there was also the weekly STOR-i forum, this week given by Jess Gillam. The work presented involved detecting changes in the daily routines of elderly people by analysing the data recorded by various sensors around their home. There is significant research showing that a change in daily routine can be an indication of change in health and well-being. The method presented by Jess allowed for the model for an individual sensor, over time, to alter its probability of being triggered given information from surrounding sensors.
On Friday evening, the week was wrapped up by the virtual STOR-i ball, using Zoom. The main bulk of the ball was given by the pub quiz, hosted by Jordan, which covered a variety of topics from general knowledge, to weird trivia about STOR-i staff. This was followed by Jon’s speech and the awards, characterised by banter, quick jabs, and some innuendo for good measure. While a few people left after this, remaining people were randomly sectioned off into Zoom breakout rooms to chat amongst themselves.
Written by Jack McGinn
Week 7 went in a similar way to all the previous weeks. With the end of the internship on the horizon, our thoughts as interns turned to how to wrap up our individual projects. Less was scheduled this week in comparison to most to give us time to complete our projects and start making the presentations for the following week. The Monday still had the usual catch up in the afternoon with all the other interns. This was a good opportunity to catch up and see where the other interns were at with their own projects. It was also a good chance to hear about what else they had been doing with their time beyond the project. Tuesday had nothing scheduled as a group, so it was a good chance to really get to work on our projects. I found the fact that the project was ending soon to be particularly good fuel when it came to getting plenty of work done this day.
On the Wednesday was the final problem-solving day of the internship. We were split into three teams and posed with the question “What models could we implement in order to improve the efficiency of A&E?”. This problem-solving day was less orientated around using a data set than the previous two, and more about drafting up ideas and how they can be implemented. It was a very interesting question and it was fascinating to see what problems an A&E department has and how statistics can be used to resolve them. Like the other problem-solving days, it ended with a 10-minute presentation from each team to feed back to the other teams the ideas each had come up with. Thursday and Friday were again mainly used to complete the work of our own projects. For me that meant some extra meetings with my supervisor to discuss the work I had done so far and what might be good to show in the presentation. Friday gave us one last opportunity to have our regular Friday meetings with the other interns. It was a great opportunity to see where the other interns were at with getting ready for their presentations.
Written by Adeeb Mahmood and Matthew Speers
This week was very quiet as everyone was putting the finishing touches on the presentations which were to be given at the end of the week to all of STOR-i. However, unlike in previous years, we did not have to make a poster this time, mainly due to the COVID-19 situation. It turns out that summarizing 8 weeks of work into a quick and snappy 10-minute talk is quite difficult, although looking at the end product of everyone's presentation we feel everyone summarized it quite well.
To start off the week, we had an operational research reading group where we read a paper together. The paper we looked at was ‘Performance Variability in Mixed-Integer Programming’. It was a good chance to talk about some maths and bounce some ideas around. Another big event that happened this week was our exit interviews. These consisted of a one-to-one talk with either Jake or Luke for about 15 minutes where they asked for improvements to STOR-i, about our future plans, if we can see ourselves joining STOR-i and what we felt we had contributed to STOR-i. It was a good chance to reflect on our last 8 weeks and to think about what we want in the future now we’ve come to the end of the internship experience. We’ve all been able to get a better idea of what research is really like and if it is the right choice for us. We are very grateful to our supervisors, PhD and MRes students, and anyone else that made the experience what it was. It is greatly appreciated, and we hope everyone enjoyed it as much as we did!
Here you can find out more about the STOR-i internships experience as told by the 2019 STOR-i interns.
Written by Joe Holey and Katy Ring
At the weekend we moved into the building in Furness College where many of the STOR-i interns are staying for the duration of the internship. Katy was impressed with the campus environment (particularly the large duck community), having never lived in student accommodation before.
The first day of the internship started with some introductory talks from STOR-i director Jonathan Tawn. We then did some icebreaker activities to get to know the other interns and some of the MRes students better. These included trying to untie a human knot which wasn’t our forte and team juggling which suited our skillset better.
Then we got to the highlight of the day – the buffet lunch! The feast lasted for 2 whole hours and much merriment was had by all.
On Tuesday, we got under way with our work, beginning with an R workshop (more of these followed throughout the week) and meeting with our respective supervisors who introduced us to their work and our projects. Following the day’s work, a few of us joined some members of staff to enjoy a game of football in the sun.
On Thursday, we each gave a 5-minute presentation about ourselves. We were surprised to find that we weren’t just giving these presentations to each other but seemingly to all STOR-i staff members and their extended family. These were the perfect opportunity to see embarrassing baby photos, cute pets and Shyam’s hairstyle woes. After the presentations we went on a scavenger hunt organised by the MRes students, which was a great opportunity to get to know the campus better. There were several opportunities to gain bonus points throughout the event, including getting a photo of someone in your group getting soaked in the fountain by the great hall – Joe gladly obliged by sticking his head right in the water. At the end of the scavenger hunt all the groups met up for a barbecue which featured loads more free food as well as some frisbee and more football.
The working week ended with us going to our first STOR-i forum which was presented by Sam Tickle. These are an opportunity for members of staff to share their work with the rest of the department and for everyone else to learn something about a field of STOR that they may not be familiar with.
Finally, some of the interns attended an “Applied Probability Night” on Friday where we were able to test our poker skills against each other.
Written by Matthew Darlington and Shyam Popat
After work on Monday there was badminton, which was a good chance to get to know some of the PhD students in a relaxed atmosphere. On Tuesday there was football as usual in extremely hot conditions, and also a meal and pub quiz organised in Lancaster. We split up into three teams; one team won a prize for being the most average team and was awarded £10 and a Curly Wurly. We also had the R course where we competed with our code for the travelling salesman problem, with a box of Celebrations for the winners.
During the middle of the week we had a break from the activities which was a good opportunity for us to make progress on our projects. On Thursday afternoon we wrote R code to play noughts and crosses and then we had a tournament with chocolate again for the winners. There was a bit of controversy with the final results as one team had accidentally made their method overwrite the other team's moves! On Friday we had the second forum by Anja Stein, who talked about her work on recommender systems, followed by tea, coffee and biscuits in the hub.
At the weekend, there was a trip to climb Scafell Pike. One of the two cars got lost on the way there and ended up going up Hardknott Pass, which is one of the worst roads in Europe! It was a very steep climb up to the top, but once there it started to rain for the remainder of the walk. Hard work but worth it at the end with a pub meal back at the Boot and Shoe.
Written by Connie Trojan and Shyam Popat
On Monday, we moved into our new base room in STOR-i, right next to the kitchen and our endless coffee supply. We spent (wasted) some time moving our tables and sofas around to create group working and social areas, including a ‘mini-hub’ where we promptly enforced a mandatory 11am coffee break.
Since we had already learned all there was to know about R, on Tuesday we started an introduction to LaTeX, learning the basics of document and presentation creation. We once again put in an appearance at the pub quiz, where one of our teams tied for second place.
On Friday, we attended a PhD talk from Jess on modelling categorical data. We ended the week with a pub crawl in the city centre, taking full advantage of National Pub Fortnight to claim free pints at the White Cross.
Written by Liv Watson
The realisation that we were already coming up to the halfway point of the internship meant that Monday morning started with a sinking feeling, but then Matt pulled out some homemade banana bread and all was well with the world again. The stressed-out nervous laughs about broken code that could be heard in the base room showed that we were all wondering how we were going to get our R code working correctly in time.
Tuesday brought talks from some of the MRes students – Chloe, Aimee, Graham, Drupad and Thu – all of which were highly interesting and a welcome break from working on our own projects. The afternoon break was spent figuring out the picture round for the quiz that night, and the celebration that occurred when it was finally figured out was a mighty one. That evening, we decided to take advantage of Study Rooms Tasty Tuesday offer of 50% off mains before heading to the White Cross for the weekly pub quiz. Whilst we didn’t win the overall quiz we did win the most calorific prize they have had – so really who won here!? Still not us.
On Wednesday we had our final LaTeX workshop, where we learnt how to create posters so we would be fully equipped to make our end of project posters. It also featured a viewing of first year PhD students’ posters, where we had to say what we liked and disliked about their posters – all I can say is that I hope they are nicer about ours than we were about theirs!
Friday’s forum was a really engaging talk given by Matt Bold all about his BIG new scheduling problem – the decommissioning and safe clean-up of legacy nuclear waste at the Sellafield nuclear site in West Cumbria. That evening a couple of interns headed down to the bar to watch the kick off of the Premier League and to play some pool; I went home and played with the puppy...
Sadly, we had to postpone our planned trip of boating and a picnic in Coniston due to the harsh weather in the Lake District on Saturday. Instead we had a boardgames day in the Hub and left the brave(?) to tackle the storm.
Written by Katie Dixon
On Tuesday, a handful of the PhD students held a session titled ‘Life as a PhD student’. This was an opportunity to hear first hand what to expect should we choose to continue our studies as a STOR-i student and it allowed us to ask any questions that we had. In the evening, we managed to put together a team for the pub quiz. The team came in third place and they were only 3 points off winning the whole quiz!
As usual, on Friday we attended the STOR-i Forum but this week there was a twist. Instead of the normal set up (one 30-minute presentation), we were given mini lectures from a range of the STOR-i team where they had to outline their area of study in a maximum of pi minutes. Following the forum, we all attended a bake sale in the hub where the fantastic bakers managed to raise over £150 for Mind.
On Saturday, some of the interns chose to tackle Fairfield Horseshoe. In classic Lake District style, they managed to experience all four seasons in one day – some even happening at the same time! This was followed by a trip to Grasmere where Dylan bought 30 bits of the notorious gingerbread all for himself!
Written by Matthew Gorton
We finished Tuesday with a barbecue to celebrate Katy’s birthday. During this, Shyam came up with the innovation of barbecued curly fries, which I think it’s fair to say proved a mixed success. In a last-minute moment of ingenuity, Katie and Liv improvised a birthday cake by sticking a candle into the last remaining bread roll. We managed to get everything cooked and eaten just before it started tipping it down! Four of us, plus Sam, then went to the pub quiz. We were sadly unsuccessful prize-wise, but had fun, nonetheless.
Wednesday was not a normal working day; instead we had a ‘problem-solving day’. Our task, set by Rob Shone, a researcher in Management Science, was to find a solution to the problem of scheduling aircraft at an airport. It turns out that this is a very complicated task!
Airports can only have a certain number of planes taking off and landing within a certain time period (say, per hour). So, airlines bid on arrival and departure times. Our task was to come up with a method to schedule arrival and departure times that minimises the ‘displacement’ – the difference between the time requested and the time assigned.
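To make the idea of ‘displacement’ concrete, here is a minimal Python sketch. It is not the method any of our groups (or Rob) used; it is just a simple greedy heuristic under assumed inputs: each request is an hour of the day, every hour has the same capacity, and the function names `assign_slots` and `total_displacement` are made up for illustration.

```python
from collections import Counter


def assign_slots(requests, capacity, n_slots):
    """Greedily give each requested slot the nearest slot that still has room.

    requests: list of requested slot indices (e.g. hours of the day).
    capacity: maximum number of movements allowed per slot (assumed constant).
    n_slots:  number of slots available (e.g. 24 for hourly slots).
    Returns a list of (requested, assigned) pairs.
    """
    used = Counter()
    assignment = []
    for req in sorted(requests):
        # Slots with spare capacity, nearest first; ties go to the earlier slot.
        # min() raises ValueError if every slot is already full.
        best = min(
            (s for s in range(n_slots) if used[s] < capacity),
            key=lambda s: (abs(s - req), s),
        )
        used[best] += 1
        assignment.append((req, best))
    return assignment


def total_displacement(assignment):
    """Sum of |assigned - requested| over all requests."""
    return sum(abs(assigned - req) for req, assigned in assignment)


# Three airlines want the 8 o'clock slot, but only two movements fit per hour.
requests = [8, 8, 8, 9, 9, 10]
plan = assign_slots(requests, capacity=2, n_slots=24)
print(plan)
print("total displacement:", total_displacement(plan))
```

A greedy rule like this is easy to write but is not guaranteed to give the smallest possible total displacement, which is part of why the real problem is so complicated.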
All three groups ended up coming up with different ideas and solutions, and we all thought of other issues that you might need to consider: leaving time for bad weather or emergencies when scheduling, different types of aircraft requiring different turn-around times. Rob told us that we had exceeded his expectations, and he even asked us to send our slides to him for them to look at for ideas! Quite impressive for a bunch of interns only working for a single day!
Friday’s forum was given by Livia Stark. Her work is trying to narrow down sources of information to be used by intelligence agencies in a novel way, which was of particular interest to me as we are both investigating the same technique for solving problems (multi-armed bandits).
Straight after work on Friday, the interns went to Spaghetti House for dinner. We managed to get there early enough to take advantage of their ‘Happy Hour’, giving us pizza or pasta for £5.75. A lovely way to finish the week!
Written by Dylan Bahia and Jack Trainer
This week, most of our time was spent trying to knuckle down and get our posters finished so that they could be printed ready for the poster session next week. This meant that most of the week (except the sacred bank holiday) was spent in the STOR-i base room. Of course, we still had our usual half hour on Tuesday morning trying to decipher this week’s clue for the pub quiz and those who attended football on Tuesday had the privilege of seeing STOR-i footballing legend Harjit score his last goal ever. A slow, tough week was all worth it however as we had the opportunity to unwind at the annual STOR-i ball.
The whole department gathered at Lancaster Golf Club for an evening of food, drinks, magic and karaoke. The evening began with a delectable three course meal, accompanied by a small selection of aromatic wines. During this, a magician circulated the tables, flabbergasting us with acts which could be nothing other than sorcery. As the meal became ever more evanescent, Jon entertained us with his light-hearted speech, immediately followed by an awards ceremony. The most notable award was the Tickle Sam award, praising his contribution to the STOR-i department. The mingling of the guests then ensued, with the bar helping to facilitate conversation and creating a night to remember (or in some cases forget). With the highlight of the evening on the horizon, it was time for the guests to muster their brethren and deliver a spectacular karaoke performance. The combination of singing, dancing and laughter birthed a night nothing short of perfect. As the hour past witching hour approached, the guests said their farewells and went home, thus concluding the night. There were some exceptions, who continued the night by sauntering toward the city.
Written by Gwen Williams
We started the week making the final edits to our posters, which were intended to summarise our project and findings. We had been warned some STOR-i members had very particular views about poster appearance, and so making sure our text boxes were correctly aligned was given high priority.
On Tuesday, to celebrate submitting our posters, the majority of interns headed into town for the last weekly pub quiz. Although our general knowledge may have let us down, our luck did not, and we managed to win some free drinks.
The rest of the week was spent preparing our presentations. Summarising seven weeks of work in a 10-minute presentation proved challenging; however, feedback from other interns during practice sessions made this a lot easier.
On Friday morning everyone gave their presentation, with a brief interlude for coffee, of course. Once the presentations were over we all breathed a sigh of relief and went for a celebratory Go Burrito lunch.
That afternoon was the poster-session, fuelled by a generous spread of sweet treats. We each stood by our poster while members of STOR-i had the opportunity to walk around and ask us questions (or measure how well our text boxes were aligned). This was a great opportunity to talk in some more depth about our research, as well as to say goodbye to members of the department before we all left.
After all the excitement of the presentations and poster session, we were hit by the sad realisation that the internship had come to an end. For a final goodbye, we headed off to a bar, before going back to one of the flats for a big meal. The meal was intended to use up our leftover food before we moved out the next day. Despite having a somewhat unusual set of ingredients, the chefs (thank you) made some delicious tacos. After the meal, we said our goodbyes and wished each other well going back to our different universities. While we were sad to be leaving, we felt very grateful to have spent a fantastic summer as interns at STOR-i and to have been made to feel so welcome by everyone there.
Here you can find out more about the STOR-i internships experience as told by the 2018 STOR-i interns.
Written by Peter Greenstreet
I moved into the STOR-i flat on Sunday, and within 3 hours I had met 8 of the interns and we had started to get to know each other. Monday began with an introductory talk from Jonathan Tawn, then we had an IT session with Oli, who set us up and gave us all laptops (sadly only for the 8 weeks of the internship)! We all quickly discovered that Oli is a master of all tech, as he can sort any problem out. Next up we had a team-building session, which began with holding hands and getting knotted together. This was followed by trying to make a square with some rope whilst blindfolded, which ended up having only 2 corners and no sides of equal length. Finally, we finished with a game where we had to call each other different vegetables. It was all a great laugh, and I really got to bond with both the interns and the Masters students. Next up was a 2-hour lunch with FREE food, where we also got to meet our supervisors. Everyone was super chatty and friendly. Following this was another lecture, and we finished with a university tour. After this, some of us headed to the sports centre to play badminton with PhD students, who were really good.
For the next 3 days, we all met with our supervisors to discuss our projects and find out what we needed to learn for the first couple of weeks. We also had lab sessions on both R and LaTeX, which helped refresh my memory as well as teaching me new skills in both. On Tuesday night we went to the legendary White Cross quiz night, and one of the STOR-i teams even managed to come second!
Thursday started with some more R, followed by our introductory presentations, which contained loads of cute baby photos. This was followed by a great scavenger hunt. We were split into teams of 4, with 2 interns, a PhD student and a Masters student in each. We were given a list of things to find, as well as challenges all around campus. It was great fun. Then we had a barbecue with lots of food. However, we did lose a sausage to the ducks!
On Friday we had a presentation from Sarah about her PhD project followed by cake and then some more time to work on our projects and meet with our supervisors. That evening some of us went to the gym and the others went to a poker night. On Saturday we went for a 3-hour walk and then bought some famous sticky toffee pudding. It was a great opportunity to get to know some of the MRes and PhD students. Then some of us had a roast that evening followed by our lovely sticky toffee pudding.
Written by Mason Pearce
We started the week by working on our second challenge using RStudio in our assigned groups. The challenge was to code a strategy to win a game of tic-tac-toe; we then simulated 1000 games using our strategy against each of the other groups. It was very close: group 1 drew with everyone, but group 2 won more when playing group 3, so they were the overall victors, receiving a giant Toblerone as their prize. Later in the day we moved over to our new base room and got settled in. In the evening some of us went down to the sports hall to play badminton with the MRes and PhD students.
The next day began with presentations from first-year students on project ideas for their PhDs, which gave us a taste of the different areas of research that go on at STOR-i. In the afternoon we were taught how to make beamer presentations and posters in LaTeX to prepare us for the later weeks, and a few of us then went to play football. Later on in the evening we attended The White Cross weekly quiz, splitting into two teams. One of the teams even won a gallon of beer!
We all met with our project supervisors again later in the week, and the following days were spent working hard on our individual projects in the new base room. Most of us used the skills we had been taught in RStudio to code what our supervisors had asked us to, whilst some of us used Python as it was a more suitable programming language for the project-related tasks. Although we were working on our own topics, there was plenty of talking, sharing ideas and lending people a hand if they needed it.
On Friday, Sam organised a board game night at Pizzetta on campus. We all attended and a lot of the PhD students came too, which was nice. At the weekend we had planned to go on a trek up Scafell Pike in the Lake District but, due to the weather, we decided to postpone and instead went to escape rooms in the city centre. We were trapped in a jail cell, accused of being witches, and if we didn’t solve the puzzles to escape within one hour we would be ‘left to rot’. At first, we made great time, but towards the end the puzzles got more difficult and slowed us down. We just managed to escape with only two and a half minutes to spare!
Written by Niamh Lamin
Week three was a much quieter week in terms of scheduled academic activities but this gave us all a good chance to get our teeth into our projects. I spent most of the week studying the types of inequalities produced by a program called PORTA for optimisation problems. This involved the production of three or four items with start-up costs associated with machines involved in the production of various sub-sets of these items. Even though my supervisor was away, I found this wasn’t a problem because she was always available by email or phone if I got stuck or needed to ask any questions.
As well as speaking with our individual supervisors, we also had a group meeting on Friday afternoon. As in the previous weeks, I found this meeting really useful as it gave me a chance to explain what I had been working on that week to some of the other interns. As well as giving us all a chance to find out about the interesting projects the others were working on, I found that explaining my progress helped me to consolidate and check my own understanding and provided useful practice ready for the presentations at the end of the programme.
On Friday morning, we had a STOR-i Forum with a difference - rather than just having a presentation from a single PhD student, we were treated to a series of ‘Pi Minute Theses’. Each PhD student had exactly three minutes and fourteen seconds to introduce us to their research topic. I really liked this format as it meant we got to hear about a wider range of different projects and the general introductions were easier to follow than the more detailed presentations of the previous two weeks. All the projects sounded really interesting but I particularly enjoyed Emily’s presentation about Combination Therapies and how information could be borrowed between similar combinations of drugs to decide which ones to investigate in clinical trials.
Even though the academic timetable was slightly less hectic, the social calendar was just as full as normal so there was a lot to entertain us all in the evenings. The weekly badminton and football sessions continued, as well as the pub quiz on Tuesday evening, but there were also some special activities. For example, a group got together to watch the Love Island final on Monday evening and a group of us went for an impromptu ice-cream from Walling’s in Alex Square on Wednesday afternoon. It was the best chocolate-chip ice-cream I’ve ever tasted, though I fear I managed to get more of it down my shorts than actually in my mouth!
To round off the week, there was a bar crawl on Friday evening. I’d never been on a bar crawl before but I actually really enjoyed it. We visited some really nice pubs and it was a great chance to socialise and spend time with some of the MRes and PhD students as well as the other interns (that’s one of the things I love about STOR-i: there are always plenty of chances for integration between year groups, which creates a great atmosphere). We started at the Water Witch and then made our way down into town, visiting a series of pubs on a route planned for us especially by Tom and Alan. Though I was really having fun, since I was quite new to this sort of event, I decided to head home after the third pub, especially since the next destination was one whose name would strike fear into even the bravest of souls (which I most certainly am not): The Pub!
Anyway, I have it on good authority that everyone made it back safely and the event was definitely a success.
Written by Sean Hooker
Week four’s timetable again gave the interns the chance to focus on their projects, with lots of time available for independent research. My project involves identifying points in a time series where there has been an abrupt change in its properties, such as a change in the mean or variance. I spent the past week building on techniques that I had coded previously and developing these into computationally more efficient methods.
This culminated in running my chosen method over multiple simulated time series, all of differing lengths. The main measurement I was comparing was the speed of the algorithms. The code took a little longer than expected to get through all the sets of data but I got a nice looking graph out of it and plenty of ideas for improvement.
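For anyone curious what detecting a change in the mean can look like, here is a minimal Python sketch. It is not the method from my project, just one textbook-style approach under simplifying assumptions: exactly one changepoint, and a squared-error cost for each segment. The function name `best_mean_changepoint` and the example data are made up for illustration. Prefix sums make each segment cost quick to evaluate, which is one simple route to a faster search.

```python
def best_mean_changepoint(xs):
    """Return the split index k (1 <= k < len(xs)) that minimises the total
    squared error when xs[:k] and xs[k:] are each fitted with their own mean."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    # Prefix sums of x and x^2 so any segment cost is available in O(1).
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        # Sum of squared deviations from the segment mean on xs[i:j].
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return min(range(1, n), key=lambda k: cost(0, k) + cost(k, n))


# Made-up series whose mean jumps from about 0 to about 5 after 5 points.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
print(best_mean_changepoint(series))
```

Real data usually needs more than this, for example allowing several changepoints or changes in variance, but the cost-minimising idea is the same.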
I’m beginning to feel accustomed to the weekly activities of STOR-i members. Tuesday was football: there was a good turnout from the interns, as well as the regulars, this week, and after an exhausting 90 minutes the match ended with a close score. Also that evening was the pub quiz at the White Cross pub in town. STOR-i fielded two teams, whose members spent the night answering questions on topics from Pokémon to world records on blowing balloons and pretty much everything in between.
The rest of the week flew by, with the occasional hangman session to break up some of the days in our base room. This week’s edition of the Friday Applied Probability (poker) night was held in the interns’ flat and the home advantage was clear, with Mason winning the night.
Saturday was the main event of the week, with a hike up Scafell Pike. This had already been cancelled once due to bad weather, and it’s clear why: even on a (mostly) bright and clear day this was a challenge. The entire group, made up of interns and PhD students, made it up and down before daylight faded, although they didn’t quite miss the rain. Still, this provided the group with some picturesque scenes of the mountains and the drizzle. Their impressions of the hike are currently skewed by the mental images of them all climbing down a mountain in the heavy rain, but given time, and a few more warm drinks, they’ll be able to reminisce about what will be the main achievement of the internship so far.
Written by James Price
In terms of the project, Week 6 appeared to be a bit of a breakthrough week for a lot of the interns. With the prospect of the presentation and poster session looming, a lot of the work towards the project has taken shape and overall end goals are being achieved.
My project is on finding heuristics for real-time railway rescheduling. I’ve spent the past few weeks exploring various methods for finding the shortest path through various graphs and so last week was spent finding ways to measure both how good the methods were and how long they took to calculate their chosen route through the graph. The results were encouraging and allowed me to observe where certain methods could be refined further.
This week saw the addition of a shadowy character to the interns’ base room, the puzzle-maker. This man (or woman) of mystery would leave us a new puzzle every day which, usually after a few hours of head-scratching, led to a piece of paper hidden somewhere in the room containing a five-letter word. The ingenuity of these clues ranged from noticing a blue arrow pattern in a grid of chairs to colouring numbers on a grid according to an extensive set of rules. This all culminated in having to ask a specifically worded question at the Friday forum. And as if by magic, our prize, in the form of a cake, appeared in the base room.
I really enjoyed the puzzles, which the other interns can testify to due to my regular wanderings around the room to peer under a table or on a ledge. However, I discovered quite a few new hiding places which will come in useful should my supervisor unexpectedly turn up asking why I got no work done this week.
The regular White Cross pub quiz on Tuesday was also a triumph: Sam Tickle’s beautifully obscure knowledge of Enid Blyton’s ‘Famous Five’ novels in the Pointless round resulted in a tidy cash prize for the entire team. I guess you could say we had a wonderful time*.
The week closed with the big social event, the STOR-i ball, this year held at Lancaster Golf Club. This full-on night contained a group Ceilidh, complete with skipping, clapping and of course plenty of spins, and also a wonderful three-course meal, although thankfully not in that order. And then just when I thought it was all over, it turned out the night was only beginning. There was a quick taxi ride into town and before I knew it I was in Hustle nightclub, still in a full suit, having a great time.
I’ve managed to block from my memory the time when I finally got to bed, but if anything that’s the sign of a fabulous night.
*The joke is left as an exercise for the avid reader.
Read on to find out more about the STOR-i internships experience as told by the current STOR-i interns.
Written by Kostya Siroki
The week started with an enjoyable day off. But it didn’t make Monday any less entertaining, as the Murder Mystery Day took place. Three STOR-i teams participated in it. The aim was to find a “murderer” by answering questions, exploring Lancaster and collecting evidence from witnesses. All the participants found this event fascinating. Moreover, good results were achieved. The “Mafiamaticians” won the best costume prize, and team “STORlock Holmes”, containing 2 interns, came second.
For the rest of the week, we were pushing our creativity to the limits so as to produce eye-catching posters. It made us especially collaborative this week due to the regular LaTeX errors and the subsequent necessity to find someone who had already encountered that issue.
A Foosball charity tournament was organized on Tuesday in order to raise money for MIND and also to test out the BRAND NEW FOOSBALL TABLE decorating the hub from now on. Two of the interns participated in the competition as a team and successfully won the first round. Sadly, luck wasn’t on their side in the second game and they lost against the eventual competition runners-up.
Week 7 was enriched with football. We had two wonderful games, on Tuesday and Thursday. Interns led by the MRes students opposed PhD students on Tuesday. After an exhausting 90-minute game the score was 5-5 and so a golden point game began. This time we lost, but next week we will be sure to come back stronger than ever before.
Tuesday was a very busy day as, in addition to the above-stated activities, it also included the pub quiz. Three teams represented STOR-i this time, and every team ended up winning prizes. One of the teams won “The most average team” prize, the other one was the closest when guessing the exact number of “Big Bang Theory” episodes and the last team, but certainly not the least, WON the quiz.
As usual, the week was concluded by the forum. This time the presentation was given by Christian Rohrbeck and of course, it was followed by coffee with cookies.
Written by Nicolo Grometto
The final week of the internship has finally begun!
We spent Monday morning making last-minute changes to our posters before sending them off for printing. In the afternoon, we all had a good start on our presentations, trying to condense the results obtained throughout the previous 7 weeks into a ten-minute presentation.
Unfortunately, Tuesday did not see any of the interns attending the weekly pub quiz. Making our posters and slides look pretty in LaTeX and feeling the final day approaching took up a great deal of energy, and almost no interns showed up for the last football session, either. Quite an animated 4-a-side still took place on the field, and the sun shining made it even more enjoyable for those who played.
Wednesday quickly went by, as we spent the whole day making fast progress with our presentations. On Thursday, we had our exit interviews with the Director of STOR-i, Jonathan Tawn. We had the possibility to discuss our experience throughout the internship, as well as the progress we made with our projects in the past weeks. We concluded the day by gathering in groups in the Postgraduate Statistics Centre for rehearsing and giving each other constructive feedback.
And at last, Friday! The day began with an unusual atmosphere at STOR-i, as we were all so excited about showing our work to others, whilst also feeling nervous about having to speak in front of the audience. After rearranging the interns’ office for the afternoon poster session, at 9:45 we started off with the presentations. It was incredible to see how much progress each one of us made during the internship and how well we all managed to present our work.
After a short break, the day continued with the poster display session, which also went exceedingly well. A number of visitors came along to see our work, including the MRes and PhD students, as well as members of staff from STOR-i and the Mathematics and Statistics, and Management Science Departments. We all received positive comments about our research projects, as well as posters, which made us extremely proud and satisfied with our work. We concluded the day with a final meal in town and celebrated our results all together.
On Saturday morning, the time to leave had come. Whilst feeling sad for having to say goodbye to each other, we were all so happy for having spent a fantastic summer at STOR-i and for feeling part of such an inclusive community. Thank you to everyone who worked hard in order to make this happen.
Here you can find out more about the STOR-i internships experience as told by the 2017 STOR-i interns.
Written by Callum Barltrop
The week started with an introductory talk from Jonathan Tawn, followed by some team-building exercises. We also got a tour of the STOR-i facilities and were introduced to the saving grace of the organisation - the coffee machine.
On Tuesday, we had our first lectures on R and LaTeX. Many of us also met with our supervisors on this day to discuss our projects and decide what reading to do to familiarise ourselves with the content. In the evening, a bunch of us met in the White Cross for the 'world-renowned' pub quiz. Whilst my team didn't win, we did win a gallon of beer for having the closest guess on the bonus round - how much is a kg of donkey cheese? (clue: it's very expensive!)
Wednesday and Thursday were fairly similar, with more lectures and reading. In the intern group, we had all got to know each other fairly well by this point and had started making some plans for over the rest of the internship, including booking out a lecture theatre for Game of Thrones!
Friday was slightly different - instead of the usual lectures, we were set our first group R challenge, which involved a famous old puzzle. We also attended our first STOR-i forum, where we found out about some of the fascinating research being done by one of the PhD students at the organisation. In the evening, a few of us met up for a couple of drinks in one of the bars of campus.
Saturday involved a trip to Grasmere in the Lake District, organised by the one known formally as 'Mr Tickle'. After a long 10 mile hike with plenty of rain, cloud and mud, we bought some incredible gingerbread and stared up angrily at the sky as the sun came out... Just our luck!
Finally, on Sunday, George (another intern) and I met up in the morning for a nice steady run in the sun. Later on in the day, a bunch of us met up at a bar on campus to catch the Wimbledon final, where Federer made the game look easy.
All in all, a fantastic first week at STOR-i, with a lot still to look forward to!
Written by Edward Austin
Week 2 began with us comparing who had written the best algorithm to solve a Travelling Salesman Problem. I can confirm, with a score nearly 100 times that of the winning group – Jake, Jonny and George’s code – that it was not my group. Not to be put off by this, though, we set to work on our next challenge – making our laptops play noughts and crosses.
Over the course of the week, this certainly brought out some of the competitors in the group, with Jake managing to write code that simply could not be beaten. Indeed, after playing a million games against random opponents it never lost! This piece of programming mastery led to the hypothesis that caffeine, as tracked on our new caffeine chart, was the secret to his coding success.
Wednesday not only saw my group’s noughts and crosses code crushed by everyone else’s but also saw us finish the LaTeX courses with an introduction to creating posters. This will certainly be of great use to us when it comes to the end of the project!
Thursday was a strange day insofar as we had no scheduled activities and instead were left to work solely on our projects. This could have marked the start of a long and prosperous relationship with RStudio; however, judging from the number of error messages on my screen, this might have to wait a couple more weeks yet! In the evening we attended a board games night with some of the other MRes and PhD students, and fun was had by all.
The following day was our second STOR-i coffee morning, with a talk by David Torres Sanchez on optimising aircraft engine maintenance schedules. It was a very enjoyable talk in the sense that not only could we all follow what was being said, but it was delivered with cheerful humour too! Lunchtime then saw Callum entrench his tradition of burritos on a Friday, and then in the afternoon we all decided to combine the group meetings into a group presentation where each group member gave a small talk on what they had done that week. This was great as not only was the subject matter interesting, but we all learnt a bit more about the work we were doing and what direction we should head in next too!
At the weekend some of us headed into the Lake District for a walk on Saturday, and others spent time with their girlfriends. On Sunday there was also a cinema trip to watch Dunkirk, a film I cannot recommend highly enough!
Written by Jake Grainger and Graham Laidler
With the end of the previous weeks’ coding lessons, we were able to fully sink our teeth into our individual projects. This allowed us to really make some solid progress, gaining a fuller sense of our projects’ complexities.
We all began to make some headway with our projects, and our coding skills went through the roof. Here is a spatial dependency plot that Jake produced. It shows the wave height dependency of different points with the central point, represented using his favourite colours.
On Friday, we enjoyed the usual STOR-i forum. This week, each of 5 PhD students attempted to explain their research in just 3 minutes 14 seconds. With the volume of the buzzer helpfully set to maximum, we were left under no illusions as to when this time was up. Jake had a headache for the rest of the day, but some hot milk and spatial extremes perked him up again. This was another great week on the STOR-i internship, and we are all bonding well.
Written by Chloe Fearn and Jonnie Bevan
Monday of week 5 saw the continuation of the after-work Game of Thrones watching tradition that has developed. We don’t watch Game of Thrones but given all the exciting talk since, we think it was a great episode!
On Wednesday of this week, we had to tackle the problem-solving day, which involved finding reasons why a cycling company was receiving less custom. We split off into three teams and spent the day analysing the relevant data and coming to conclusions about what the company could do to boost their customer base. At the end of the day, we presented our findings to the other groups and some of the MRes students; it was interesting to see the different approaches we all decided to go with. It was a fun break from the routine that we have settled into with our projects and gave us a chance to work collaboratively for the first time since the noughts and crosses project in week 2. On Thursday morning we talked to three of the PhD students about what life is like at STOR-i. They were very helpful and answered a lot of questions that we had!
This week’s STOR-i forum involved five pi-minute theses as opposed to the general half-hour presentation of a single thesis. We heard short presentations on a range of topics, and the loud buzzer at the end of each one kept us all on our toes! Afterwards, we headed to The Hub for the usual coffee and biscuits, before we rounded off the week with an afternoon of work (and a bit of Pictionary on the whiteboard).
Written by Callum Barltrop
This week started off a little differently from the other weeks of the internship, since it was the first week where we did not have anything scheduled! This was to allow us time to work on our project posters and presentations, which we would be presenting the following week.
Whilst we were working independently, this week really showed us the difference that having a good working group can make. We regularly took coffee breaks and had chats about how our projects were going, as well as getting second opinions on some of the stuff we were working on. For many of us, this really helped to clarify what we were working on.
On Friday, we had the regular STOR-i forum - this week was a 'Pi Forum' where 5 PhD students had exactly 'Pi' minutes to present their work and the progress they had made.
Saturday began with an early start for some of us as we had decided to go and climb Scafell Pike - the highest point in England! A few of us got a little lost on the way there and spent a fair bit of time driving through farmers’ fields (cough cough Graham) but we made it in the end for some rather anticlimactic views. A great day all in all though!
Finally, on the Sunday, George and I once again went out for a gentle run down some of the beautiful country roads around Lancaster, making for some awesome views.
All in all, this week really gave us good experience in what it would be like to work independently on a PhD, as well as how to summarise and present your work in a concise manner!