# Interns and Blogs

The STOR-i Summer Research Internships run for eight weeks, July to September, each year. Every cohort writes a weekly blog on their experience of the programme, and you can read about them here.

## Interns

Details of the interns are listed below by year.

## 2021 Interns

Here you can find details of the summer 2021 interns including a description of their research project.

### Itamar Aharoni

#### Optimal Search with an Improving Searcher

**Degree**: MMath Mathematics, University of Warwick

**Supervisor**: Jake Clarkson

There are many important real-life situations involving a hidden object needing to be found by a searcher, for example, a survivor of a disaster by a rescue team. The classic search problem splits the area to be searched into n separate parts (called boxes), with the objective to search the boxes in an order which minimises the expected time to find the hidden object. A simple and easily-calculated optimal policy was found in 1962 by Blackwell.
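As a rough illustration of the classic problem, the sketch below uses the index rule usually associated with Blackwell's result: always search the box with the largest value of p·q/t, where p is the current probability that the object is in the box, q is the probability of detecting it in one search of that box, and t is the time one search takes. The function names, the symbols p, q and t, and the Bayesian update after a failed search are illustrative assumptions rather than details taken from the project.

```python
def next_box(posterior, detect, cost):
    """Return the index of the box with the largest p_i * q_i / t_i."""
    return max(
        range(len(posterior)),
        key=lambda i: posterior[i] * detect[i] / cost[i],
    )


def update_after_failure(posterior, detect, box):
    """Bayes update of the location probabilities after `box` is searched without success."""
    miss = 1.0 - posterior[box] * detect[box]  # probability that this search fails
    updated = [p / miss for p in posterior]
    updated[box] = posterior[box] * (1.0 - detect[box]) / miss
    return updated


def search_order(prior, detect, cost, steps):
    """List the boxes searched in the first `steps` searches, assuming none succeeds."""
    posterior, order = list(prior), []
    for _ in range(steps):
        box = next_box(posterior, detect, cost)
        order.append(box)
        posterior = update_after_failure(posterior, detect, box)
    return order
```

For example, `search_order([0.5, 0.3, 0.2], [0.5, 0.9, 0.8], [1, 1, 1], 3)` searches the second box first: it is not the most likely location, but it is much easier to search reliably.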

The classic problem assumes that the searcher's detection capabilities in each box are constant throughout the search. However, in real life, the searcher's methods may improve upon repeated visits of a box due to increasing knowledge of the box’s geography. With such an ‘improving searcher’, an adjusted version of Blackwell's policy is known to be optimal, but its calculation is difficult, meaning it is no longer a practical solution.

The aim of this project is to investigate simpler policies with close-to-optimal performance, possibly with the aid of some simplifying theoretical results.

### Sam Bell

#### Time series forecasting

**Degree**: MSci Mathematics, Lancaster University

**Supervisor**: Hamish Thorburn

Dealing with uncertainty is crucial in modern life. Every business/organisation faces some uncertainty in their future, and to be successful, must find innovative ways to account for this.

One such method to try and reduce uncertainty is forecasting. Forecasting is the process of using historical data to predict future values. Areas in which forecasting is commonly used include:

- Future weather conditions (e.g. rainfall, temperature)
- The price of various financial instruments (e.g. stocks, bonds)
- Demands for various products for a business

In this project, we will be forecasting univariate data using time series models. This will involve examining the data, considering different types of forecasting models, selecting the most appropriate one and implementing it. We will have to consider various aspects, such as the assumptions made by each method, the effect of the chosen forecast error measure, and the forecast horizon to use.

The initial scope of the project will be to use ARIMA and Exponential Smoothing models for the forecasting. Time and data permitting, we may move on to some more advanced machine-learning techniques (Abolghasemi et al., 2019).
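To give a flavour of the simplest member of the Exponential Smoothing family, here is a minimal sketch of simple exponential smoothing together with one possible forecast error measure (mean absolute error). The function names, the choice to initialise the level at the first observation, and the fixed smoothing parameter `alpha` are assumptions made for illustration; in practice `alpha` would normally be estimated from the data.

```python
def simple_exponential_smoothing(series, alpha):
    """Return (one_step_forecasts, next_forecast) for a univariate series.

    one_step_forecasts[i] is the forecast of series[i + 1] made at time i.
    """
    if not series:
        raise ValueError("series must be non-empty")
    level = series[0]  # assumption: initialise the level at the first observation
    one_step_forecasts = []
    for observation in series[1:]:
        one_step_forecasts.append(level)  # forecast made before seeing the observation
        level = alpha * observation + (1 - alpha) * level
    return one_step_forecasts, level


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observations and their forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


data = [10, 12, 14]
forecasts, next_forecast = simple_exponential_smoothing(data, alpha=0.5)
error = mean_absolute_error(data[1:], forecasts)
```

Because simple exponential smoothing produces a flat forecast, `next_forecast` would be used for every horizon, which is one reason the choice of forecast horizon matters when comparing models.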

### Emma Costello

#### Accounting for Seasonality in Extreme Sea Level Estimation

**Degree**: BSc Mathematics, University College Dublin

**Supervisor**: Eleanor D’Arcy

Storm surges pose an increasing risk to coastline communities. These events, combined with high tide, can result in coastal flooding. To reduce the impact of storm surges, an accurate estimate of coastal flood risk is necessary. Specifically, estimates are required for the return level of sea levels (still water). This is the level with annual exceedance probability p; estimates can be used as inputs to determine the height for a coastal defence, such as a sea wall. The return level estimation requires statistical analysis based on extreme value theory, as we need to know about the frequency of events that are more extreme than those previously observed.
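As a textbook illustration of a return level (and not the seasonal skew surge and tide methodology of this project), suppose annual maximum sea levels follow a generalised extreme value (GEV) distribution with location `mu`, scale `sigma` and shape `xi`. These parameter names and the GEV assumption are ours. The level with annual exceedance probability p then has a closed form, sketched below.

```python
import math


def gev_return_level(mu, sigma, xi, p):
    """Level exceeded in any one year with probability p under a GEV(mu, sigma, xi) model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter tends to zero
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))
```

Smaller values of p correspond to rarer events and therefore to higher return levels, which is why estimates for very small p depend so heavily on the fitted tail.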

Large storm surges exhibit seasonality: they are typically at their worst in the winter and least extreme in the summer. We focus on the skew surge: the difference between the observed and predicted high water within a tidal cycle. As well as an annual cyclic trend for seasonality, we investigate a linear trend in the mean skew surge and any residual trend in each season by sharing information spatially.

The seasonal pattern of skew surge differs from that of the tide, whose seasonality is driven astronomically, resulting in tidal peaks at the spring and autumn equinoxes. Hence, the worst levels of the two components of still water level are likely to peak at different times in the year, and so statistical methods that treat them as independent variables are likely to over-estimate return levels.

This project aims to understand the seasonal model for the skew surges and how we combine this with the known seasonality of tides to derive estimates of still water level return levels. We will apply the methodology to various coastal locations in the UK using data from the British Oceanographic Data Centre (BODC).

### Megan Harries

#### Exploring the impact of input uncertainty on the response of floating wind turbines

**Degree**: BSc Mathematics with Statistics, Lancaster University

**Supervisor**: Jake Grainger

Floating wind-turbines have the potential to drastically increase the quantity of energy which can be produced from the wind, whilst simultaneously reducing the visual impact on the landscape and the potential disruption to shipping. However, building such wind-turbines presents significant challenges. Understanding the design conditions generated by the environment plays a major role in solving such challenges. Models for the response of the structure under these design conditions are then used to help quantify the risk to a structure. The response of the turbine can be reasonably represented by a single-degree-of-freedom system. Such systems and their relation to the environmental conditions are described in Section 5 of Giske et al. (2017).

An important part of the design conditions is the spectral characteristics of the wave-field in which the turbine will be situated. Typically, this is summarised by a parametric model for the spectral density function of a wave process, such as those described by Michel (1999). To use such a model, appropriate parameters must be chosen based on the conditions at hand. Usually, this is done by fitting the model to observed data taken from an instrument such as a floating buoy, using, for example, the de-biased Whittle likelihood (Sykulski et al., 2019). However, there is uncertainty in the recovered parameters which can be significant and may impact the results of later analysis.

This leads us to ask the following question: how does the uncertainty in our model for the wave behaviour translate to uncertainty about the behaviour of the turbine? To answer this question we will need to understand the uncertainty around the model, as well as the way in which uncertainty propagates through to the structural response. Uncertainty in the parameters of the model can be quantified using the technique described in Section 6.2 of Sykulski et al. (2019), whilst some techniques for handling input uncertainty are described by Song et al. (2014).

### Max Howell

#### Anomaly Detection in Streaming Data

**Degree**: MMath Mathematics, University of York

**Supervisor**: Kes Ward

The world around us is filled with moving data. Every day, billions of sources produce trillions of data streams in real time - much of which cannot be stored long-term, and is lost before it can be analysed.

This project will look at ways to monitor real-time data streams searching for signals that characterise anomalous events. From satellite monitoring grids in space, to the traffic volume of cables that bring internet to your home, many of these data streams will only be looked over by human eyes if an algorithm first detects that something is odd. The choice of which algorithms to use is therefore crucial to shaping our understanding of an increasingly data-rich world.

Our challenge when constructing an anomaly detection algorithm is to keep the rate of false detections low, while finding anomalies as soon as possible after they start to happen, all the while working with very limited computational resources compared to more traditional domains of AI and machine learning.

### Adam Keyes

#### Search and Stop

**Degree**: BSc Applied and Computational Mathematics, University College Dublin

**Supervisor**: Ed Mellor

Effective search strategies are necessary in a wide range of real-world situations. The unsuccessful search for Malaysia Airlines Flight 370 cost more than two hundred million Australian dollars.

The classical search problem assumes that a target is hidden in one of n distinct locations and that when searching the correct location there is a known probability of discovery. In this case, the best possible order to search the locations can be found by modelling the search process as a multi-armed bandit. This is a well-studied mathematical model inspired by slot machines where a series of decisions are made in order to maximise some reward. (A good explanation of this can be found in [Clarkson et al., 2020].)

While conducting a search, it is necessary to expend some resource – usually time and/or money. Depending on the reward given upon finding the object it may or may not be cost-effective to carry out the proposed search at all. In some cases it may also be preferred to search for some period of time and then if unsuccessful give up [Ross, 1969]. This is especially true if there is any possibility that the object in question does not exist [Chew et al., 1967].

The aim of this project is to gain an understanding of search theory, focusing on the classical search problem and its variations which necessitate early termination. This project will also involve some hands-on programming to apply these ideas.

### Kitti Kovacs

#### Constraint Programming for Network Slot Allocation Problems

**Degree**: BSc Data Science, University of Warwick

**Supervisor**: David Torres Sanchez

Solving large combinatorial optimisation problems is a very challenging task. In some cases, the number of possible solutions that have to be checked is prohibitively large. This is the case for most scheduling problems. Fortunately, there’s constraint programming (CP), where different deduction techniques are employed in order to reduce the set of possible solutions. In the general case, CP doesn’t require an objective function, since finding a solution that satisfies a set of constraints is already hard enough. However, in most applications, there’s (at least!) one objective we would like to optimise.

At OR-MASTER, we deal with capacity allocation problems at airports. Of particular interest is the single airport slot allocation problem, where we try to find schedules that comply with airport capacity while satisfying airline slot requests. A slot here means an interval of time in which an airline is granted access to the airport facilities it requires. In this problem, there is at least one objective, namely, to minimise the (absolute) difference between the allocated time and the requested one. There are further extensions, but the simplest version of the model can be found in Zografos et al. (2012).

CP has been found to be a very effective modelling and solution method for many problems, including scheduling problems. Google’s OR-Tools CP-SAT is a very efficient and easy-to-use implementation.
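To make the objective concrete, the toy sketch below enumerates every possible assignment of requests to slots, discards those that exceed a per-slot capacity, and keeps the one with the smallest total absolute displacement from the requested slots. This is plain brute force, not constraint programming and not the model of Zografos et al. (2012); the function name, the single capacity value shared by all slots, and the integer slot indices are assumptions for illustration. It also shows why enumeration does not scale: the number of assignments grows as `n_slots ** len(requests)`.

```python
from collections import Counter
from itertools import product


def allocate_slots(requests, n_slots, capacity):
    """Return (total_displacement, assignment) minimising sum |allocated - requested|.

    requests[i] is the slot index requested by movement i; each slot can hold
    at most `capacity` movements. Returns (None, None) if no assignment is feasible.
    """
    best_cost, best_assignment = None, None
    for assignment in product(range(n_slots), repeat=len(requests)):
        # Capacity constraint: no slot may be used more than `capacity` times.
        if assignment and max(Counter(assignment).values()) > capacity:
            continue
        cost = sum(abs(a - r) for a, r in zip(assignment, requests))
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment
```

With three requests for slot 2, five slots and a capacity of one movement per slot, the best schedule moves two of the requests by one slot each; raising the capacity to two reduces the total displacement to one.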

### Euan McNaughton

#### Calculating the effect on FWER and Power when adding interim analyses to the Tukey test

**Degree**: BSc Mathematics and Statistics, University of Glasgow

**Supervisor**: Peter Greenstreet

In the pharmaceutical industry, there may be multiple treatments that need to be compared with each other at the same time. In some situations a control treatment is not needed or not possible, for example when testing multiple treatments which are already being used as the current standard of care for a condition. In this scenario, we are interested in finding out if any of the treatments are superior or inferior compared to any of the others.

The aim of this project is to study the effects on the properties of a clinical trial when conducting a Tukey test with interim analyses. (Interim analyses allow the data to be analysed part way through the trial. From this, different trial decisions can be made.) This type of trial design will compare multiple treatments to one another at the same time as well as allowing for interim analyses. We will find the stopping boundaries and number of patients we need for our trial in two ways. The first is to use the Tukey test (Tukey (1949) and Kramer (1956)). The second is to use the multi-arm multi-stage (MAMS) approach (Magirr et al. (2012)). Once we have these for the two designs, we will study the effect on the errors of using them in our setting. The project will be coded in R, using the MAMS R package (Jaki et al. (2019)).

### Owain Morgan

#### Heuristic algorithms for Network Pricing

**Degree**: BSc Mathematics, University of York

**Supervisor**: Aaditya Bhardwaj

Network models are inherent in many real-world situations, and pricing problems are among them. Prices at traditional fuel stations, as well as at EV charging stations, depend heavily on the connectivity available to the particular station and the prices available at alternative outlets. We have developed a deterministic pricing model (including a few algorithms to solve it sub-optimally) that considers network structure to optimise such pricing decisions.

Outlet matching is a key component of the developed model. In outlet matching, we are required to identify the most favourable outlet for consumers given the entire set of alternative outlets, and the prices available there. The aim of this project is to develop a unique algorithm to solve the outlet matching problem, draw valuable insights from the model under different scenarios as well as compare heuristics (existing and developed) by using a simulation study.

### Adam Page

#### Reinforcement Learning in Stochastic Games

**Degree**: MSci Mathematics, Lancaster University

**Supervisor**: Matt Darlington

Game theory is an area which spans many disciplines such as biology, computer science, economics and psychology. Rather than being able to simply optimise our actions, we are forced to consider how our choices will impact the decisions of others. The most common way to encapsulate this is the Nash equilibrium.

In real-world situations, however, we cannot always solve for these exactly, since we do not always have access to the reward function from which we receive our utility. This motivates the need for reinforcement learning techniques where we can learn from past observations to make informed decisions about future actions.

Additionally, we can generalise the standard normal-form games to stochastic games where we not only have to consider our immediate payoff, but also balance that with the ability to obtain future rewards. This complicates the problem and will be the end goal to think about in this project.

### Martin Smit

#### The convergence of Wasserstein distance estimates

**Degree**: BSc Mathematics and Statistics, University of Warwick

**Supervisor**: Tamas Papp

The problem of quantifying the distance between two probability distributions occurs frequently in statistics and machine learning, both in theory and application. Formulating this problem as one of moving mass from one place to another, the theory of optimal transport (Villani, 2009) provides a principled way of defining metrics between probability distributions. These metrics are known as Wasserstein distances.

While they possess attractive theoretical properties, Wasserstein distances can be challenging to work with in practice. Aside from a few specific scenarios, Wasserstein distances do not have analytical expressions, even when the two distributions to be compared are known explicitly. Fortunately, if one has, or can generate, samples from both distributions, then one can also compute several estimators of the distance of interest. As the number of samples from both distributions grows to infinity, the naive Wasserstein distance estimator is known to converge to the true distance.
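In one dimension, the plug-in estimate between two samples of equal size is easy to compute, because the optimal transport plan simply matches the sorted values. The sketch below takes "naive estimator" to mean this plug-in estimate and restricts attention to the one-dimensional, equal-sample-size case; both are simplifying assumptions, and the function name is illustrative.

```python
def wasserstein_1d(xs, ys, p=1):
    """Empirical p-Wasserstein distance between two equal-size samples on the real line."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D the optimal coupling pairs the i-th smallest x with the i-th smallest y.
    paired = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(x - y) ** p for x, y in paired) / len(xs)
    return mean_cost ** (1.0 / p)
```

A simulation study of convergence could, for instance, call this function on samples of increasing size drawn from two known distributions and record how the estimate approaches the true distance.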

The usefulness of such estimators is dependent on how quickly they converge to the quantity of interest, as well as the amount of variation they exhibit. Theoretical results are available on both fronts, but many are sub-optimal or are subject to rather strict conditions. The aim of this project is therefore to complement the theory with a simulation study. This study could investigate the potential for conditions to be weakened, whether central limit theorems can be extended, or whether bounds on rates of mean-convergence are sharp, but it need not be limited to these topics.

### Rachel Wood

#### Detecting Changepoints for Condition Monitoring

**Degree**: BSc Mathematics, University of Bristol

**Supervisor**: Tessa Wilkie

Many industrial systems are equipped with sensors. We can apply statistical or machine learning techniques to sensor data to tell us whether a system is running properly or whether there is a fault that needs fixing. This is known as condition monitoring.

One way of monitoring the condition of a system is by applying changepoint detection. Flagging a change in a statistical property (such as the mean or variance) of a sequence of sensor data can give us an early warning that something is up. However, with an industrial application in mind we need to take especial care of certain practical issues: we do not want to identify a lot of spurious changes and we need methods that are fast.
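As a minimal sketch of the idea, the function below looks for a single change in mean by trying every split point and choosing the one that minimises the total squared deviation from the segment means. This least-squares criterion, the single-change assumption and the function name are illustrative choices, not the methods studied in the project; running sums keep the search linear in the length of the data, which reflects the need for speed mentioned above.

```python
def best_mean_split(data):
    """Return (tau, cost): data[:tau] and data[tau:] are the two best-fitting constant-mean segments."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(data)
    total_sq = sum(x * x for x in data)
    best_tau, best_cost = None, None
    left_sum = left_sq = 0.0
    for tau in range(1, n):
        x = data[tau - 1]
        left_sum += x
        left_sq += x * x
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared deviations from the mean in each segment.
        cost = (left_sq - left_sum ** 2 / tau) + (right_sq - right_sum ** 2 / (n - tau))
        if best_cost is None or cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau, best_cost
```

Note that this always returns a split, even when the data contain no change; a practical detector would compare the reduction in cost against a threshold or penalty to avoid flagging spurious changes.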

The aim of this project is to investigate changepoint detection applied to challenging datasets. Accurate and speedy changepoint detection becomes more difficult with very large, high-dimensional datasets and/or those with dependence between series.

## 2020 Interns

Here you can find details of the summer 2020 interns including a description of their research project.

### Malvina Bozhidarova

#### Modelling extremes of environmental data

**Degree**: BSc (Hons) Mathematics, Manchester University

**Supervisor**: Callum Barltrop

Non-stationary is a term that is used to describe data for which the underlying distribution is not fixed. This is a common feature in many environmental datasets; for example, we often see temperature data increasing over time. When this is observed, standard statistical methodology that assumes data is identically distributed cannot be applied. This can lead to many inferential challenges, especially in the case of extreme value theory.

This branch of statistics is the theoretical framework used to model ‘extreme’ events (events considered to be rare or uncommon). By definition, there is little data for such events, so any analysis relies heavily upon theoretical results. However, when non-stationarity is present, the standard theory cannot be applied. Moreover, we have to question what we mean by an ‘extreme’ event. For example, an ‘extreme’ event today may not be considered ‘extreme’ in 10 years’ time.

Non-stationarity can be present for a range of different factors. One such factor of particular interest is climate change. It is expected that climate change will continue to increase temperatures, while also increasing the frequency and magnitude of some extreme weather events, such as storms and floods. Therefore, it is important to find ways to build climate change into our framework.

In this project, we will be considering methods for modelling extreme events of environmental data using variables that drive climate change, such as CO2 levels. Initially, we will use a range of simulated datasets to develop a model that can capture climate trends in extreme events. We will then apply this model to UK climate projection data for a range of different variables, including temperature and humidity.

### James Boyle

#### Modelling Populations of Networks

**Degree**: MMath Mathematics, University of Warwick

**Supervisor**: George Bolt

Network data arises when we have relational information between entities of a system. A canonical example is social network data, where we may observe information on friendships within a sample of a population. We typically represent this data as a mathematical graph, i.e. a set of vertices and edges, where vertices correspond to entities and edges represent relationships between them.

The development of statistical models for network data has become an active area of research, see Goldenberg et al. (2009) for a review. The majority of these models assume the observed network was generated in some pre-specified stochastic manner, dependent on a choice of model parameters. The task of the statistician is then to take an observed network and infer what parameters could have, or were most likely to have, led to its appearance.

Many of the traditional network models were constructed with application to individually observed networks in mind. However, with datasets becoming larger and richer there has been recent interest in developing models to describe the generative process of a population of networks. An interesting example of such data are connectomes, which are network representations of brain connectivity inferred from MRI scans. Typically, a scan would be taken on a sample of patients, and so after some data processing we end up with a network for each patient.

In this project, the student will be introduced to the statistical problem of modelling network data. Through simulation experiments, they will explore the benefits and drawbacks of some popular network models, before comparing these with models recently proposed in the literature that deal specifically with the problem of modelling a population of networks.

### Luke Fairley

#### Extreme events: what are the odds?

**Degree**: BSc Mathematics, Lancaster University

**Supervisor**: Stan Tendijck

In this project, we compare different models for bivariate extremes. This has oceanographic applications, for example, in the joint modelling of wave height and wind speed, both of which are important variables in the calculation of failure probabilities of offshore facilities. Other applications include the joint modelling of losses on financial assets like the FTSE 100 and AEX, or the modelling of the composition of certain gases in the atmosphere.

In the project, we compare the Heffernan-Tawn model introduced in [2] and a number of derived models. The Heffernan-Tawn model is a conditional extremes model that captures a wide variety of different dependency structures; essentially, it is a form of regression model fitted only to the extremes. It is currently one of the most flexible models used in the field (500+ citations). The authors chose their particular model form since it worked asymptotically on almost all bivariate dependency structures (copulas) that had been developed. The model is not perfect, however, and many small variations have been proposed since its introduction, e.g. in [3]. We will compare a few of these and potentially come up with a new one.

The Heffernan-Tawn model is an asymptotic model, i.e., it works as long as we push observations far enough away. However, in the area of extremes, we do not want to model observations with an occurrence probability of $10^{-30}$ but rather $10^{-2}$. At such levels, the Heffernan-Tawn model might not have converged enough for its model form to be the best possible one.

Based on different model choices, we compare the differences by estimating probabilities of extreme sets, e.g. the probability that a wave larger than 5m occurs together with a wind speed higher than 40 knots.

### Daniel Hodgson

#### Uncertain Predictions in Resource Allocation Models

**Degree**: MSc Natural Sciences, Durham University

**Supervisor**: Ben Black

Resource allocation problems are extremely common in the operational research literature. They consist of allocating a fixed set of resources, such as human workers or machines, to a set of skills, jobs or functions to try to meet as much demand as possible. This could be allocating electricians to jobs (Chen et al., 2018) or multi-skilled handlers to calls (Koole and Pot, 2005). These problems handle one day at a time, and as such, the demand (jobs and calls) is known.

However, in some problems, such as that of the medium-term planning of a telecommunications company’s engineers (Ainslie et al., 2015), we need to plan for a large number of days in the future. This means we need forecasts of the demand that we will need to meet over this period. These forecasts are almost always uncertain, but many optimisation models still treat them as though they aren’t. This can lead to big issues in the resulting allocations, such as wasted resources and unmet demand. This project will entail studying a variety of methods that can be used in mathematical and dynamic programming to reduce or incorporate this uncertainty in the models used. An example starting point could be training a reinforcement learning (RL) model (Sutton and Barto, 1998) that learns how best to correct a poor forecast in real time.

### Katharina Limbeck

#### Solving small-scale Arc Routing Problems

**Degree**: MSci Mathematics and Statistics, University of Glasgow

**Supervisor**: Thu Dang

The Arc Routing Problem (ARP) arises in several applications, such as postal delivery, meter reading, snow removal, salt spreading, and waste collection. The aim is to find a vehicle route or a set of paths in a network at minimum cost, such that certain arcs are traversed by at least one vehicle, possibly subject to various side constraints such as limited vehicle capacities, time windows, one-way streets and so on.

Linear programming (LP) is a method to guide decision-makers toward the choice of the best options by making use of a mathematical model whose requirements are expressed by linear relationships. Linear programming is a special case of mathematical optimization. It is expected that the focus of this project will be how to model ARPs in small-scale instances.

Initially, simulated data will be used and suitable working code will be provided. Then, once a general framework for the model is established, real data at small-scale will be used to test it. The model will then be tweaked to make use of all the available specific features of the data.

### Adeeb Mahmood

#### Classification in an on-line setting

**Degree**: MSci Mathematics, Lancaster University

**Supervisor**: Chloe Fearn

In classification, a model is trained on some historic data, then for subsequent data, the features are used to predict the responses. However, sometimes the underlying distribution of the responses given the features changes over time; in this case, if a model is trained once and used to classify incoming data (whose responses are not known) forever it will eventually be rendered useless, since the responses of the test data will not be related to their features in the same way that the training data was.

Another problem that arises in classification is that sometimes it is expensive to view the responses of instances. In this situation, we need to view the instances that bring the most information to the classifier, and view as few as possible to save on cost.

There is plenty of literature on on-line classification, much of which involves forgetting factors or sliding windows. This project will first explore off-line binary classification methods, then move on to on-line methods. If time allows, active learning will be explored, which involves methods that select a subset of the full data set to learn from when label requests are expensive.

### Jack McGinn

#### Modelling Waves in the Ocean

**Degree**: MPhys Physics with Theoretical Physics, University of Manchester

**Supervisor**: Jake Grainger

The world’s oceans continue to play an important part in many aspects of modern life. Waves in the ocean can cause damage to structures and ships alike, endangering their crews and causing significant financial and environmental damage. Waves also propagate onshore, where they cause erosion and flooding. As such, it is important to understand their behaviour.

Observations of ocean waves come in the form of measurements of the displacement of the sea surface at a given location. These observations can then be used to develop parametric models that can describe the sea surface, many of which are summarised by Michel (1999). Typically these models describe the spectral density function of the process of interest (the frequency domain analogue of the autocovariance). To fit such models to actual data we can use pseudo-likelihood approaches such as the de-biased Whittle likelihood (Sykulski et al., 2019).

The way in which these parameters evolve over time is of increasing interest to engineers, especially in rapidly developing weather systems such as tropical cyclones. During this project, the student will use existing techniques to fit models to data-sets and then use the model fits to explore the behaviour of the different parameters as the sea evolves.

### Daniel Morton

#### Input Uncertainty Quantification for Stochastic Simulation

**Degree**: BSc MORSE, Lancaster University

**Supervisor**: Drupad Parmar

The behaviour of many real-world systems, such as airports, hospitals, and manufacturing lines, depends greatly upon some level of inherent randomness, and therefore such systems are frequently modelled using stochastic simulation. The randomness in the simulation is driven by input models, represented by probability distributions or processes, which are often estimated via data collected from the real-world system. Since the samples of data are finite, uncertainty arises in the estimated input models and this propagates through the simulation model to performance measure outputs.

Rarely is this propagation of input model uncertainty considered in simulation output analysis. Common practice is to report simulation-based confidence intervals for performance measures; however, these typically ignore input uncertainty and only include stochastic estimation error. Without considering the propagation of input model uncertainty in simulation output analysis, decisions are at risk of being made with misleading levels of confidence. Interest therefore lies in quantifying the uncertainty that arises in the simulation output as a result of the uncertainty in estimating the input models.

The aim of this project is to develop an understanding of input uncertainty and implement some existing methodologies for quantifying input uncertainty on provided stochastic simulation models, whilst considering the relative advantages and disadvantages of each method.

### Thomas Newman

#### Machine learning in simulation

**Degree**: MSci Statistics, University of Glasgow

**Supervisor**: Graham Laidler

Simulation is commonly used to model many real-world operations. For example, queueing systems naturally arise in the operational running of facilities such as hospitals, call centres, and manufacturing processes. To optimise the performance of such systems, their complexity often makes mathematical analysis infeasible and a simulation model is used instead. Briefly, a simulation model replaces the random processes that occur in the real-world system, such as customer arrivals and service times, with appropriately distributed random variables. Sampling these random variables allows the system to be simulated, and its performance can then be evaluated with regards to some measurable performance indicator, such as customer waiting times. However, by the stochastic nature of the systems being modelled, performance indicators can fluctuate significantly over time. As such, traditional time-averaged performance indicators give an incomplete picture of a highly variable system. There is growing interest in obtaining a deeper understanding of simulation behaviour; for example, we want to uncover the main causes of time-varying performance.

This project will include some exploratory data analysis of the data generated by a simulation model, with focus on visualising the fluctuations in performance. We will then turn our attention to some common machine learning methods, and consider ways to exploit them for our purpose. Namely, we want to uncover the driving factors behind observed simulation performance. This project offers the chance to produce some novel methodology, and can be flexible depending on the interests and prior experience of the student.

Click here for Thomas - presentation

### Taj Patel

#### Dynamic Latent Space Network Models

**Degree**: BSc Mathematics with Statistics, University of Warwick

**Supervisor**: Amiee Rice

Networks are often used to represent real world interactions, and therefore the ability to model real world behaviour is paramount in the field of network analysis and modelling.

Latent space models have been used to capture a high level of transitivity in networks (the common phrase that you will come across is “a friend of a friend is a friend of mine too”). The first task in this project is to understand latent space modelling of networks.

Dynamic network models allow us to capture how time affects an interaction network. How do we accurately show the change in affinity for connection between two individuals as time goes on? Then using this information, you will be able to implement models that use both methods to capture realistic interaction patterns.

Click here for Taj - presentation

### Ryan Pownall

#### Anomaly detection using functional data analysis, with applications to sea surface temperature data

**Degree**: BSc Mathematics with Finance, Newcastle University

**Supervisor**: Edward Austin

Functional Data Analysis is used to model phenomena observed over a period of time as a continuous function. This is of particular use in situations where the observations are recorded at a high frequency over the time period, as this means that a large collection of points can be represented as a single observed curve. Inference using the observed functions can then take a variety of forms, and this project will focus on anomaly detection using Functional Data. Anomaly detection is the process by which the data are examined to test whether an observation differs significantly from the other observations, or some underlying expected process.

Anomaly detection for point data is a well-studied area, and the challenge is to extend the classical notions of an anomaly to the functional domain. In particular, how can outlyingness be measured with respect to a continuous function, given that each observation will be a smooth curve which varies over time? Furthermore, how can this definition of outlyingness be described so that sensible conclusions can be drawn from the data?

This project will seek to address these challenges, first by performing a review of the existing functional data anomaly detection methods, and then using these to detect anomalies within Pacific Sea Surface Temperature Data. The aim of this will be not only to detect the effect of a changing climate on the sea surface temperature, but also to identify periods where anomalous weather has led to unexpected temperatures being recorded.

Click here for Ryan - presentation

### Matthew Speers

#### Online Sparse Temporal Disaggregation

**Degree**: BSc Mathematics, Lancaster University

**Supervisor**: Luke Mosley

Due to the significant adverse effect on the global economy caused by the coronavirus pandemic, there has never been a more important time to understand the short-term movements of headline macroeconomic variables. We can no longer rely on infrequent publications of GDP or traditional annual business surveys to inform us about the current state of the economy. Ever since the global financial crash of 2007-2008, national statistics institutes, such as ONS here in the UK, have motivated the need for a vast set of high frequency indicator time series that are readily available and measure numerous processes. This set will be used to create disaggregated series of infrequent headline variables, which will provide early warning signals of potential large economic impacts such as financial crashes and pandemics, and administer more accurate measurements of the rapidly evolving modern economy.

With the digital revolution we witness today, there are many potential resources for high frequency indicators. To disaggregate GDP, we could use credit card transactions data or VAT returns data. To disaggregate inflation, we could use scanned price data in supermarkets or social media news articles. To disaggregate unemployment, we could use web-scraped online job advertisement data. In the econometrics literature, the process of disaggregating a low frequency time series by making use of indicator series recorded at the desired high frequency is known as temporal disaggregation. This is a two-step procedure that involves finding a preliminary estimate for the high frequency disaggregated series (usually by performing GLS regression) and then distributing the aggregated residuals among the preliminary series. With the vast number of indicator series we would now like to use when performing temporal disaggregation, standard techniques such as GLS become statistically infeasible due to the curse of dimensionality, and therefore current methods fail. To resolve this difficulty, one can set up the temporal disaggregation problem in the sparse modelling framework by incorporating a LASSO regularization penalty which will focus on selecting a small set of the indicators having the most informative power on the variable of interest. The resulting high frequency estimates from sparse temporal disaggregation will be informative on two fronts: firstly, they provide accurate visualisation of the short-term movements of the headline variable and secondly, they give an interpretation of which indicator series are most relevant for future estimations.

The aim of this project is to devise a way sparse temporal disaggregation can be performed in the online setting, i.e. how to automatically update the model in light of data revisions. Data revisions will be very common when performing temporal disaggregation. For example, they may be due to a new time period occurring for the low frequency variable, or changes in the indicator series set due to improved data sources. More major revisions occur when there is a change in legislation or a change in accounting definitions or in times of financial crisis. Understanding how estimates are affected by revisions plays an important role in assessing how reliable the sparse temporal disaggregation model is. We would like estimates to remain precise but also stable over time.

Click here for Matthew - presentation

## 2019 Interns

Here you can find details of the summer 2019 interns including a description of their research project.

### Dylan Bahia

#### Investigating the Effect of Dependence on Averaging Extremes

**Degree:** BSc (Hons) Mathematics, University of Manchester

**Supervisor:** Jordan Richards

Extreme Value Theory is often used to model extreme weather events, such as extreme rainfall, which is the main cause of river flooding. Given data at separate locations, we can fit simple models to understand the distribution of heavy rainfall at these single locations. However, flooding is generally not caused by an extreme event at a single location; it is caused by extreme rainfall averaged over several locations, often referred to as a catchment area. This project aims to investigate how the distribution of extreme rainfall at single locations, and the dependence between extreme events at different sites, affect the distribution of the average over all sites. We do this using a subset of data provided by the Met Office, which consists of gridded hourly rainfall across the north of England. Taking each grid box to be a single location, we can fit distributions that model extreme events at each particular location. Dependence between grid boxes can also be quantified using empirical measures. We then average over adjacent grid boxes and fit the same distributions. We are particularly interested in how the parameters of the distributions change as we average over an increasing number of grid boxes, and how the dependence between locations influences this change.

Click here to view Dylan - poster and Dylan - presentation

### Matthew Darlington

#### Optimal learning for multi-armed bandits

**Degree:** BSc (Hons) Mathematics, University of Warwick

**Supervisor:** Livia Stark

Optimal learning is concerned with efficiently gathering information (via observations) that is used in decision making. It becomes important when the way information is gathered is expensive, so that we are willing to put some effort into making the process more efficient. Learning can take place in one of two settings: offline or online. In offline learning, we make a decision after a number of observations have taken place, while in online learning decisions are made sequentially so that a decision results in a new observation that in turn informs our next decision. This project will focus on learning for multi-armed bandits.

Multi-armed bandits present an online learning problem. They are easiest to visualise as a collection of slot machines (sometimes referred to as one-armed bandits). The rewards from the slot machines are random, and each machine has a different, unknown expected reward. The goal is to maximise one’s earnings from playing the slot machines. That can be achieved by playing the machine with the highest expected reward. However, the expected rewards can only be estimated by playing the machines and observing their random rewards. Therefore there is a trade-off between exploring bandits to learn more about their expected rewards and exploiting bandits with known high expected rewards.

Click here to view Matthew D - poster and Matthew D - presentation

### Katie Dixon

#### Recruitment to Phase III Clinical Trials

**Degree:** BSc (Hons) Mathematics and Statistics, Lancaster University

**Supervisor:** Szymon Urbas

Clinical trials are a series of rigorous experiments examining the effect of a new treatment in humans. They are essential in the drug-approval process, as per the European Medicines Agency guidelines. In order for a drug to be made available to the public, it must pass a number of statistical tests each with sufficient certainty in the outcomes. The most costly part of the trials process is Phase III, which is composed of randomised controlled studies with large samples of patients. Patients are continuously enrolled across a number of recruitment centres.

The standard way of modelling recruitment in a practical setting is to use a hierarchical Poisson-gamma (PG) model, as introduced in Anisimov and Fedorov (2007). The framework assumes that the rates at which patients come into each centre do not change over time. The main arguments for using the simple model are the limited data available for inference and its tractable predictive distributions. Recent work by Lan et al. (2018) explores the idea of decaying recruitment rates. However, the proposed model lacks flexibility in accounting for a multitude of recruitment patterns appearing across different studies.

The internship project will concern itself with the analysis of a flexible class of recruitment models in data-rich scenarios. The project will likely tackle an open problem in the area, which is the presence of a mixture of different recruitment patterns appearing in a single study. This will likely involve novel ways of clustering centres based on the observed recruitments. The project will entail a mixture of applied probability, likelihood/Bayesian inference and predictive modelling. There will be a strong computing component in the form of efficient optimisation or simulation methods.

Click here to view Katie D - poster and Katie D - presentation

### Matthew Gorton

#### Investigating Optimism in the Exploration/Exploitation Dilemma

**Degree:** MPhys Physics with Astrophysics and Cosmology, Lancaster University

**Supervisor:** Alan Wise

A stochastic multi-armed bandit problem is one where a learner/agent has to maximise their sum of rewards by playing a row of slot machines, or ‘arms’, in sequence. In each round, the learner pulls an arm and receives a reward corresponding to this arm. The rewards that are generated from each arm are assumed to be distributed as a noisy realisation of some unknown mean; therefore, maximising the sum of the rewards relies on finding the arm with the highest mean. We wish to create policies to tell us which arm to play next in order to maximise our reward sum.

The challenge in finding the best arm in the multi-armed bandit problem is the exploration-exploitation dilemma. This dilemma occurs since, at any time point, we need to decide between playing arms which have been played a low number of times (exploration) and playing the arm with the best estimated mean (exploitation). If the learner explores too much then they will miss out on playing the optimum arm; however, if the learner chooses to exploit the best arm, without exploring other options, then they could end up exploiting a sub-optimal arm. It is clear that the best policies balance both exploration and exploitation. The policies which we will study in this project follow the philosophy of optimism in the face of uncertainty.

These policies work by being optimistic towards options which we are uncertain about. For instance, consider an intern visiting Lancaster University for the first time. For lunch, if they are optimistic about the local food places (Sultan’s/ Go Burrito) over chains (Subway/ Greggs), then they will be more likely to explore the places that they are more uncertain about. In multi-armed bandits, these policies give each arm an upper confidence bound (UCB) index, which usually takes the form of the estimated mean reward plus some bias, and the arm with the largest value of the index is played. These types of policies are mathematically guaranteed never to perform badly - but can we do better? This is the major question of this project.
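
As a rough illustration of an index of the form "estimated mean plus some bias", the sketch below plays a small bandit with the well-known UCB1 index. It is not necessarily the policy studied in the project: the Gaussian rewards with unit variance, the arm means, the bonus `sqrt(2 log t / n)` and the function name `ucb1` are assumptions made for the example.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Play a Gaussian bandit using the UCB1 index; return the pull count of each arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm has been played
    totals = [0.0] * n_arms  # sum of rewards observed from each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # index = estimated mean + a bonus that shrinks as the arm is played more
            indices = [
                totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda a: indices[a])
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy realisation of the arm's mean
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.1, 0.5, 0.9], n_rounds=5000)
print(counts)
```

Arms that have been played rarely keep a large bonus, so they are still tried occasionally, while most plays go to the arm whose estimated mean is highest.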

Click here to view Matthew G - poster and Matthew G - presentation

### Joseph Holey

#### Predicting ocean current speed using drifter trajectories

**Degree:** MPhys Theoretical Physics, Durham University

**Supervisor:** Mike O'Malley

The dataset I am using consists of GPS tracks of objects drifting in the sea. These drifting objects are commonly referred to as drifters. In summary, the location of drifters is processed to obtain quarter-daily longitude, latitude and velocity data. In order to model complex phenomena in the ocean, one of the first pre-processing steps is to remove a large-scale mean velocity of the drifters. In other words, we fit a model which predicts velocity given location, and then focus on the residuals. Currently, one of the most popular methods to do this involves binning the data, then extracting a mean in each bin and using this mean as the prediction. This project will aim to develop a better, more accurate method to predict velocity at a given location.

The methods you will be focusing on are classed as nonparametric regression, which includes spline regression, Gaussian processes, local polynomial regression and more. These methods are generally applied to independently distributed data. One of the more difficult aspects of modelling this type of data is accounting for its non-independent nature. In particular, the next sampled location in a trajectory strongly depends on the current location and the velocity at that location. This sequential sampling can strongly affect model selection, which will be a large part of this project.

Initially, the project will look at modelling a relatively simple toy simulation which I will supply. The reasoning behind this is that we know the true underlying process, so the models you fit can be compared to the known ground truth. The method which is found to work best can then be applied to the real dataset with empirical evidence that it works on similar data.

Click here to view Joe - poster and Joe - presentation

### Shyam Popat

#### Point Processes on Categorical Data

**Degree:** BSc (Hons) MORSE, University of Warwick

**Supervisor:** Jess Gillam

The aim of this project is to explore point process methods to model categorical time series, specifically data provided by Howz. Howz is a home monitoring system based on research that indicates changes in daily routine can identify potential health risks. Howz uses appliances placed around the house and other low-cost sources such as smart meter data to detect these changes. This data is a great example of how categorical data applies to real-life situations. One potential way of modelling this data is to use point processes.

Point processes are composed of a time series of binary events (Daley and Vere-Jones, 2003). There exist many different point processes that could be useful for modelling this data, such as Poisson processes, Hawkes processes and renewal processes (Rizoiu et al., 2017). The goal of this project is to find ways to model multiple sensors, looking at the time between sensors being triggered to see if this indicates a change in routine. One extension to this project would be exploring the relationship between the categories, thus having different models for each category. We could also look into subject-specific effects in the data.

Click here to view Shyam - poster and Shyam - presentation

### Katy Ring

#### Detection Boundaries of Univariate Changepoints in Gaussian Data

**Degree:** BSc (Hons) Computer Science / MSc Data Science, LMU Munich

**Supervisor:** Mirjam Kirchner / Tom Grundy

Changepoint detection deals with the problem of identifying structural changes in sequential data, such as deviations in mean, volatility, or trend. In many applications, these points are of interest as they might be linked to some exogenous cause. In the univariate case, the factors impacting on the detectability of a changepoint are well known: size of the change, location of the change, type of change, number of observations, and noise. However, the interplay of these parameters with the detectability of a changepoint hidden within a data sequence is yet to be studied in detail.

In this project, we investigate the reliability of the likelihood ratio test (LRT) statistic for detecting a single change in a univariate Gaussian process. To this end, we will conduct a simulation study testing different settings of the parameters: change in mean, change in variance, location of the change, and sample size. In particular, we are interested in finding parameter combinations for which the LRT becomes unreliable. For example, for a fixed variance, sequence length, and changepoint position in the data, we would decrease the change in mean until we find a region in which the LRT scatters around zero. The overall aim is to derive a surface that splits the parameter space of the LRT statistic into a detectable (LRT > 0) and undetectable (LRT → ±0) changepoint region. Ideally, as a next step, an explicit relation between the simulation parameters and the detection boundary would be determined empirically. Further experiments on detection boundaries are possible, such as analysing non-Gaussian data or alternative test statistics.
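
A simulation study of this kind could be sketched along the following lines. The code is an illustrative assumption rather than the project's implementation: it considers only a change in mean with known variance, and the function name `lrt_change_in_mean`, the sequence length and the change sizes are made up for the example.

```python
import random


def lrt_change_in_mean(y, sigma=1.0):
    """Return (statistic, tau) for a single change in mean after observation tau.

    Assumes independent Gaussian observations with known standard deviation sigma.
    The statistic is the drop in residual sum of squares, scaled by sigma**2,
    from fitting two means instead of one, maximised over the change location.
    """
    n = len(y)
    total = sum(y)
    best_stat, best_tau = float("-inf"), None
    left = 0.0
    for tau in range(1, n):
        left += y[tau - 1]    # sum of the first tau observations
        right = total - left  # sum of the remaining observations
        gain = left**2 / tau + right**2 / (n - tau) - total**2 / n
        stat = gain / sigma**2
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_stat, best_tau


# Shrink the change in mean while keeping the variance, length and location fixed
rng = random.Random(1)
n, true_tau = 200, 100
for delta in (2.0, 1.0, 0.5, 0.0):
    y = [rng.gauss(0.0, 1.0) + (delta if i >= true_tau else 0.0) for i in range(n)]
    stat, tau = lrt_change_in_mean(y)
    print(f"delta={delta:.1f}  LRT={stat:.1f}  estimated tau={tau}")
```

Repeating each setting many times, rather than once as here, would show how the distribution of the statistic moves towards its no-change behaviour as the change in mean shrinks.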

*How the difficulty of multivariate changepoint problems varies with dimension and sparsity*

Multivariate changepoint detection aims to identify structural changes in multivariate time series. Increasing attention is being paid to developing methods to identify multivariate changes, yet little is known about the difficulty of multivariate changepoint problems and how they scale with dimension and sparsity. This work aims to answer the question: ‘If we have a user-defined significant change size, could a change of this size actually be detected using changepoint methods?’

In the univariate changepoint setting, it is well understood that there are many factors that affect the detectability of a changepoint, including the size of the change, location of the change, type of change and length of the time series. Moving from a univariate to multivariate setting adds several layers to the detection problem, including the dimension of the time series and the sparsity of the changepoint.

Recent work at Lancaster University has considered, computationally, the case of a multivariate change in the mean problem where the change size is identical in all dimensions. Under these constraints, a relationship was identified between the size of the change and the number of dimensions that ensures the true and false positive rates remain constant. This project seeks to identify a relationship between the sparsity of a changepoint and the difficulty of detecting it as well as exploring the problems theoretically to give more justification to the computational findings. If time and interest allow, we will also explore the effect of varying the size of the change in each series on the detectability of the changepoint.

Click here to view Katy R - poster and Katy R - presentation

### Moaaz Sidat

#### Evaluating A Response Adaptive Clinical Trial using simulations

**Degree:** MSci Mathematics, Lancaster University

**Supervisor:** Holly Jackson

Before a new drug can be distributed to the public, it must first go through rigorous testing to make sure it is safe and effective. This evaluation in humans is undertaken in a series of clinical trials. The approach most often used in clinical trials is the randomised controlled trial (RCT), which assigns all patients with equal probability to each treatment in the trial. This equal allocation of patients to each treatment maximises the power of the study, so RCTs are an efficient way to identify whether there is a significant difference between the treatments. However, RCTs do not allow the possibility of changing the probability of assigning a patient to the treatments. If it emerges before the end of the trial that one treatment is clearly more effective than the other, then to maximise the number of patients treated successfully, logic dictates the remaining patients should be allocated to the more effective treatment.

Response adaptive designs use information from previous patients to decide which treatment to assign to the next patient. They vary the arm allocation in order to favour the treatment which is estimated to be best. Multi-Armed Bandits (MAB) are an example of a response-adaptive design. They allocate patients to competing treatments in order to balance learning (identifying the best treatment) and earning (treating as many patients as effectively as possible). One issue with some response adaptive designs is that every patient is expected to produce the same outcome if given the same treatment. However, some patients will have certain characteristics (also known as covariates) which mean they will react to the same treatment differently. For example, an overweight man in his twenties may react differently to a drug than an underweight woman in her eighties.

This internship will focus on a randomised allocation method with nonparametric estimation for a multi-armed bandit problem with covariates. This method uses nonparametric regression techniques (including polynomial regression, splines and random forests) to estimate which treatment is best for the next patient given their particular covariates. The main emphasis of this project is the endpoint. An endpoint could be binary, such as the treatment curing the patient or not, integer-valued, such as the number of epileptic fits in 6 months, continuous, such as a change in blood pressure, or it could be the survival time of a patient.

Click here to view Moaaz - poster and Moaaz - presentation

### Jack Trainer

#### Heuristic procedures for the resource-constrained project scheduling problem

**Degree:** MSci Natural Sciences, University of Bath

**Supervisor:** Matt Bold

The resource-constrained project scheduling problem (RCPSP) is a well-studied problem in operational research. Given a set of precedence-related activities of known duration and resource requirements, and a limited amount of resource, the RCPSP consists of finding a schedule that minimises the time to complete all the activities (known as the project makespan). Solving this problem on a large scale is very difficult. Hence, whilst many exact solution methods exist for solving the RCPSP, these are too slow and therefore largely ineffective at solving this problem on a realistic scale. Therefore, the study and evaluation of fast, but inexact procedures (known as heuristic procedures) for solving the RCPSP is critical for real-world application.

Priority-rule heuristics are simple, yet effective, scheduling procedures, consisting of a rule for ordering activities into a so-called activity list representation, and a rule for turning the activity list representation into a complete schedule. This simple class of heuristics forms the basis of many of the most successful heuristic procedures for the RCPSP. This project aims to compare the effectiveness of a number of different procedures from this large subset of heuristics, by testing them on a large database of RCPSP test-instances, as well as to investigate possible further improvements and extensions to them.
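
The two ingredients of a priority-rule heuristic can be sketched as follows. This is a minimal illustration under assumptions that are not part of the project description: a single renewable resource, integer durations, a "shortest duration first" priority rule, a serial schedule generation scheme for building the schedule, and a made-up four-activity instance.

```python
def priority_list(duration, preds, priority):
    """Order activities by a priority rule while respecting precedence."""
    remaining, ordered = set(duration), []
    while remaining:
        # an activity is eligible once all of its predecessors are in the list
        eligible = [a for a in remaining if all(p in ordered for p in preds[a])]
        nxt = min(eligible, key=lambda a: (priority(a), a))  # ties broken by name
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered


def serial_sgs(activity_list, duration, demand, preds, capacity):
    """Turn a precedence-feasible activity list into start times (serial scheme)."""
    if any(demand[a] > capacity for a in activity_list):
        raise ValueError("an activity needs more resource than is available")
    start, usage = {}, {}  # usage[t] = resource in use during period [t, t + 1)
    for act in activity_list:
        # earliest start allowed by the predecessors
        t = max((start[p] + duration[p] for p in preds[act]), default=0)
        # delay the start until the resource is free for the whole duration
        while any(usage.get(s, 0) + demand[act] > capacity
                  for s in range(t, t + duration[act])):
            t += 1
        start[act] = t
        for s in range(t, t + duration[act]):
            usage[s] = usage.get(s, 0) + demand[act]
    return start


# Hypothetical instance: four activities sharing 3 units of one resource
duration = {"A": 3, "B": 2, "C": 2, "D": 1}
demand = {"A": 2, "B": 1, "C": 2, "D": 1}
preds = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"]}

order = priority_list(duration, preds, priority=lambda a: duration[a])
start = serial_sgs(order, duration, demand, preds, capacity=3)
makespan = max(start[a] + duration[a] for a in start)
print(order, start, makespan)
```

Swapping the `priority` function changes the activity list, and hence possibly the makespan, which is the kind of comparison between rules that the project describes.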

Click here to view Jack - poster and Jack - presentation

### Connie Trojan

#### Approximate posterior sampling via stochastic optimisation

**Degree:** MMath Mathematics, Durham University

**Supervisor:** Srshti Putcha

We now have access to so much data that many existing statistical methods are not very effective in terms of computation. These changes have prompted considerable interest amongst the machine learning and statistics communities to develop methods which can scale easily in relation to the size of the data. The “size” of a data set can refer to either the number of observations it has (tall data) or to its dimensions (wide data). This project will focus on a class of methods designed to scale up as the number of available observations increases.

In recent years, there has been a demand for large scale machine learning models based on stochastic optimisation methods. These algorithms are mainly used for their computational efficiency, making it possible to train models even when it is necessary to incorporate a large number of observations. The speed offered by stochastic optimisation can be attributed to the fact that only a subset of examples from the dataset is used at each iteration. The main drawback of this approach is that parameter uncertainty cannot be captured since only a point estimate of the local optimum is produced.

Bayesian inference methods allow us to get a much better understanding of the parameter uncertainty present in the learning process. The Bayesian posterior distribution is generally simulated using statistical algorithms known as Markov chain Monte Carlo (MCMC). Unfortunately, MCMC algorithms often involve calculations over the whole dataset at each iteration, which means that they can be very slow for large datasets. To tackle this issue, a whole host of scalable MCMC algorithms have been developed in the literature. In particular, stochastic gradient MCMC (SGMCMC) methods combine the computational savings offered by stochastic optimisation with posterior sampling, allowing us to capture parameter uncertainty more effectively. This project will focus on implementing and testing the stochastic gradient Langevin dynamics (SGLD) algorithm. SGLD exploits the similarity between Langevin dynamics and stochastic optimisation methods to construct a robust sampler for tall data.
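
To give a flavour of how SGLD combines minibatch gradients with injected noise, here is a minimal sketch for a toy problem. Everything specific in it is an assumption made for illustration: the model (Gaussian data with unknown mean and unit variance), the Gaussian prior, the constant step size, the simulated data and the function name `sgld_gaussian_mean`.

```python
import math
import random


def sgld_gaussian_mean(data, n_iter=10_000, batch_size=50, step=1e-5,
                       prior_sd=10.0, seed=0):
    """Approximately sample the posterior of a Gaussian mean (unit variance) with SGLD."""
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_iter):
        batch = rng.sample(data, batch_size)  # only a subset of the data per iteration
        grad_prior = -theta / prior_sd**2     # gradient of the log prior
        # minibatch estimate of the log-likelihood gradient, rescaled to the full data
        grad_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        # gradient step plus Gaussian noise with variance equal to the step size
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.gauss(0.0, math.sqrt(step))
        samples.append(theta)
    return samples


data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]  # simulated tall data

samples = sgld_gaussian_mean(data)
kept = samples[2000:]  # discard early iterations as burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in kept) / len(kept))
print(f"data mean {sum(data) / len(data):.3f}, SGLD mean {post_mean:.3f}, SGLD sd {post_sd:.3f}")
```

Unlike plain stochastic gradient ascent, which would settle near a point estimate, the injected noise keeps the iterates spread out, so their spread gives a rough picture of the uncertainty in the parameter.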

Click here to view Connie - poster and Connie - presentation

### Liv Watson

#### Using Pairwise Comparison in Sports to Rank and Forecast

**Degree:** MMath Mathematics, Durham University

**Supervisor:** Harry Spearing

The aim of this project is to develop a ranking system for sports. Defining a ‘good’ ranking depends on the aim. A ranking system that is used to predict future results must provide accurate predictions and could have a complex structure, whereas a system designed to seed players for a tournament needs to be robust to exploitation, fair, and easy to understand. Generally, a system that excels in one of these areas will fail in the other.

It is expected that the focus of this project will be the former, namely, to develop an accurate and robust ranking system. The accuracy of the system can be measured by comparing its predictive performance against existing benchmarks as well as bookmakers’ odds, and robustness can be measured by the ranking’s sensitivity to small changes in match outcomes. A ranking system that is applicable to all sports is, of course, ideal, but sport-specific features will need to be considered to achieve state-of-the-art prediction accuracy, and some general knowledge or interest in sports will be of use. Initially, simulated data will be used to design the ranking system. Then, once a general framework for the model is established, real data from a sport of the student’s choice will be used to test it. The model will then be tweaked to make use of all the available sport-specific features of the data.

Click here to view Olivia - poster and Olivia - presentation

### Gwen Williams

#### Bid Price Controls for Dynamic Pricing in the Airline Industry

**Degree:** BSc (Hons) Mathematics and Psychology, University of St Andrews

**Supervisor:** Nicola Rennie

In the airline industry, revenue management systems seek to maximise revenue by forecasting the expected demand for different flights, and optimally determining the prices at which to sell tickets over time. Ideally, rather than setting prices at the start of the booking horizon, they should be updated over time depending on how many people have so far purchased tickets and how much time remains until departure. One such method of dynamically pricing tickets is the use of bid price controls. Bid price controls set threshold values for each leg of a flight network, such that an itinerary (path on the network requested by a potential passenger) is sold only if its fare exceeds the sum of the threshold values along the path (Talluri and van Ryzin, 1998).
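
The acceptance rule described above can be written in a few lines. The leg names, bid prices and fares below are hypothetical values chosen for illustration, as is the function name `accept_itinerary`; how the bid prices themselves are computed is not shown.

```python
def accept_itinerary(fare, legs, bid_prices):
    """Sell an itinerary only if its fare exceeds the sum of the bid prices on its legs."""
    threshold = sum(bid_prices[leg] for leg in legs)
    return fare > threshold


# Hypothetical bid prices for two legs of a small network
bid_prices = {"A-B": 120.0, "B-C": 80.0}

print(accept_itinerary(250.0, ["A-B", "B-C"], bid_prices))  # fare above the threshold of 200
print(accept_itinerary(150.0, ["A-B", "B-C"], bid_prices))  # fare below the threshold of 200
print(accept_itinerary(150.0, ["A-B"], bid_prices))         # single leg, threshold of 120
```

Updating the values in `bid_prices` as bookings arrive is what turns this static rule into a dynamic pricing control.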

Bid price controls require forecasts of demand. If demand is not as expected, for example, due to increased sales around the time of major sporting events or carnivals, this results in non-optimal pricing, which leads to a decrease in potential revenue. So far, we have considered the potential gains in revenue when incorrect forecasts are updated under simpler revenue management pricing control mechanisms and found that revenue can be increased by up to 20%. This project will similarly seek to quantify the potential gains in revenue from updating the bid prices when unexpected demand is detected.

Click here to view Gwen - poster and Gwen - presentation

## 2018 Interns

Here you can find details of the summer 2018 interns including a description of their research project.

### Eleanor D'Arcy

#### Estimation of Diffusivity in the Ocean

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Sarah Oscroft

Diffusivity plays an important role in many real world problems, such as recovering missing objects lost at sea or predicting how an oil spill will spread. Specifically, it measures the rate at which particles spread out over time, for instance organisms or sediments transported through water. We can estimate diffusivity using satellite-tracked drifting instruments known as drifters. However, the ocean is highly unpredictable – two particles that start at the same location at the same time can end up following completely different paths to very different locations. This requires a statistical approach for the estimation of diffusivity.

Current techniques for estimating diffusivity provide inconsistent results, so we aim to improve these techniques through statistical research. My project compares some of these methods and uses them to estimate diffusivity for a part of the ocean using real data collected by the Global Drifter Program. This project applies time series techniques, with a particular focus on spectral analysis. I have used MATLAB to compare different estimators using both simulated and real data before plotting my results.

### Peter Greenstreet

#### Investigating models for potential self-excitation

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Zak Varty

This project explores models for data in which points occur randomly in space and time. The aim with this type of data is to model the locations of data points or events, in addition to any information or marks associated with each occurrence. This can be achieved through point process models. The simplest example is the homogeneous Poisson process, in which events occur independently at random with a uniform intensity.
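
As a minimal sketch of the simplest case, a homogeneous Poisson process in time only (one dimension) can be simulated by adding up independent exponential waiting times. The function name, rate and time horizon are illustrative assumptions, not taken from the project.

```python
import random
from typing import List


def simulate_homogeneous_poisson(rate: float, horizon: float, seed: int = 0) -> List[float]:
    """Return event times on [0, horizon) for a homogeneous Poisson
    process with constant intensity `rate` (events per unit time)."""
    rng = random.Random(seed)
    times = []
    # Waiting times between events are independent Exponential(rate) draws.
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


events = simulate_homogeneous_poisson(rate=2.0, horizon=1000.0, seed=1)
print(len(events))  # on average rate * horizon events
```

Comparing the number of events in equal-length windows of real data against this kind of simulation is one simple way to probe the uniform intensity assumption.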

The first aim of the project is to look at methods for assessing whether a data set satisfies the assumptions of the homogeneous Poisson process model, and to fit the model where it does. The next aim is to study complex data sets where these assumptions no longer hold, then to use different models which have fewer or weaker assumptions and subsequently assess any improvements in the model fit.

During the project there is a choice of two data sets. The first is about armed conflicts across the globe. The second is about earthquakes above magnitude 1.5 in the Netherlands, where the events are induced by gas extraction from the reservoir below the region.

### Nicolo Grometto

#### Clustering On Web-Scraped Data

**Degree:** London School of Economics, BSc Statistics with Finance

**Supervisor:** Hankui Peng

The Office for National Statistics (ONS) are currently experimenting with new data sources to improve the representativeness of the Consumer Price Index (CPI), which is the official indicator for the inflation and deflation rates for the country. Web-scraped data is considered a promising data source that comes in huge volume and can be scraped easily and at high frequency. Therefore, if we could incorporate web-scraped data into the index generating procedure, then price indices could be generated more effectively and at higher frequency.

However, web-scraped data do not always come in a way that can be immediately used for price index generation. The category labels for web-scraped prices usually follow the website categorisation that the data are scraped from, which does not necessarily match the categorisation that is used for the national price index generation. Also, some product information (product name, price, etc.) might be incorrectly scraped, due to the quality of the web-scrapers.

Clustering methods are a useful tool for tackling the aforementioned challenges that come with web-scraped data. The problems that we are interested in include both recognising the main clusters of products, given the web-scraped data, and identifying the incorrectly scraped products. In this project, we will start by exploring the fundamental clustering methods that exist in the literature (k-means and spectral clustering methods, in particular). At a further stage, we will apply these techniques to a web-scraped dataset. Clustering performance evaluation shall be carried out to compare the existing methods, and further extensions to the existing techniques shall be explored.

### Cyrus Hafezparast

#### Investigating Trend in the Locally Stationary Wavelet Model

**Degree:** The University of Cambridge, BA Natural Sciences

**Supervisor:** Euan Mcgonigle

Outside of neat theoretical settings, time series are most commonly non-stationary. In fields from finance to biomedical statistics, time series with constant mean and/or autocovariance rarely occur.

Wavelets are a class of oscillatory functions which are well localised in both time and frequency, allowing wavelet-based transforms to capture information in a time series by examining it over a range of time scales. One prominent method for doing so with non-stationary time series is the locally stationary wavelet (LSW) model of Nason et al. (2000). Time series in the LSW model are assumed to be zero-mean. In practice this is rarely the case. Our aim is to explore the behaviour of the model when this assumption is weakened by investigating the effect of different trends on the LSW estimate of the wavelet spectrum.

We also plan to examine the treatment of boundary effects that appear in the wavelet coefficients of data near the end points of the time series. The time series are usually assumed to be periodic; however, this too is a poor assumption in most non-zero mean cases. Our project will attempt to analyse the boundary effects caused by a trend and implement methods to reduce them.

### Sean Hooker

#### Detecting Changes through Transformations

**Degree:** Newcastle University, BSc Mathematics and Statistics

**Supervisor:** Sean Ryan

Changepoint detection relates to the problem of locating abrupt changes in data when the properties of a given time series have changed. This can be extended into finding whether or not a changepoint has actually occurred and if there are multiple changepoints. This area of statistics is hugely important and has many real world applications such as medical condition monitoring and financial fluctuation detection.

The most studied method for detecting changepoints looks at changes in mean within a time series. This is a popular approach due to the fact that changes like these can be detected by transforming the data and then analysing changes in the mean of the transformed data. Other methods which may prove more accurate at detecting changepoints include looking at changes in variance.

My project aims to analyse various methods of identifying changepoints, whilst studying the advantages and limitations of each approach. This involves the construction and evaluation of numerous algorithms which are used to detect changepoints.
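
To make the change-in-mean idea concrete, here is a minimal sketch that locates a single changepoint by choosing the split that minimises the total squared error around each segment's mean. It is one simple least-squares approach, not necessarily one of the algorithms studied in the project, and the function names and data are illustrative assumptions. It always returns a split, so it does not by itself decide whether a change has actually occurred.

```python
from statistics import fmean
from typing import Sequence


def sum_squared_error(segment: Sequence[float]) -> float:
    """Squared deviation of a segment from its own mean."""
    centre = fmean(segment)
    return sum((x - centre) ** 2 for x in segment)


def single_mean_changepoint(data: Sequence[float]) -> int:
    """Return the index tau such that data[:tau] and data[tau:] are best
    described by two different constant means (least-squares cost)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    best_tau, best_cost = 1, float("inf")
    for tau in range(1, len(data)):
        cost = sum_squared_error(data[:tau]) + sum_squared_error(data[tau:])
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau


# Hypothetical series whose mean jumps from about 0 to about 5.
series = [0.1, -0.2, 0.0, 0.1, 5.1, 4.9, 5.0, 5.2]
print(single_mean_changepoint(series))
```

Applying the same function to transformed data, for example squared deviations from the overall mean, turns it into a rough detector of changes in variance.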

### Niamh Lamin

#### Optimisation Problems with Fixed Charges Associated with Subsets

**Degree:** Lancaster University, MSci Natural Sciences

**Supervisor:** Georgia Souli

Optimisation problems appear in a wide range of applications from investment banking to manufacturing. They involve finding the values of a number of decision variables (for example, the amount of different products that should be manufactured) to maximise (or minimise) a particular objective function (for example, profit), subject to a number of constraints. In many situations, the value of one or more of the decision variables must be an integer to give a feasible solution. These are called Mixed Integer Programs (MIPs).

The particular focus of my project is cutting planes. These are inequalities which are satisfied by all the feasible solutions to the MIP but not by all of the solutions that would be feasible if we ignored the integer constraints. The aim is to investigate different cutting planes in problems where we have fixed charges associated with subsets. In these problems, we have a set of continuous variables whose sum is bounded. We also have subsets of variables defined such that, if any variable in that subset takes a positive value, then a fixed charge is incurred. For example, the variables may represent the amounts of various items to be manufactured and the fixed charges would be start-up costs associated with machines involved in the production of subsets of these items. Cutting planes can be used to remove infeasible solutions to the MIP to focus in on the feasible region and hence the optimal solution to the problem.

### James Mabon

#### Modelling the behaviour of Kepler light curve data with the aim of exoplanet detection

**Degree:** Warwick, BSc Mathematics

**Supervisor:** Alexander Fisch

Many exoplanets are detected via the so-called transit method. This involves measuring the luminosity of a certain star at regular time intervals to obtain graphs known as light curves. A regular short sharp dip in luminosity could be caused by an exoplanet passing in front of the star. This sounds simple in theory, but in reality there is a lot of random noise, and the signal induced by planetary transits is very weak (even a planet the size of Jupiter reduces the luminosity of the sun by only 1% during a transit).

In order to remove some noise caused by phenomena such as sun spots, NASA preprocesses their data to produce a so-called whitened light curve. However, their current method introduces complications and affects the signature of the transits, which makes the detection of the planets from the whitened data much harder.

My project will be focused on modelling the data in such a way as to not distort the transit signals. So far I have been using R to remove dominant sine waves from the data and will go on to investigate periodicity and autocorrelation within the data.

### Mason Pearce

#### Allocation of limited number of assets

**Degree:** Lancaster University, MSci Mathematics and Statistics

**Supervisor:** Stephen Ford

Having just completed my third year at Lancaster University and considering doing a PhD, I found the STOR-i internship a great way to gain an insight into PhD life. The project I have been assigned concerns allocating a limited number of assets to a dynamical system. The problem is that an asset deployed in the present can’t be used later, yet it may be more rewarding to use it in the future. We wish to deploy the assets so that the reward gained is optimal. To do this, we use dynamic programming, which starts from the end and works backwards to the start, optimising in stages. This doesn’t always yield an optimal solution, but it will if the system has certain properties. The task at hand is finding the optimal policy, where a policy is a mathematical rule for deciding what should be done in the present given the current state of the system.
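
The backward-working idea can be sketched on a toy model. Everything about the model here is an assumption made for illustration and not the system studied in the project: at each stage a random reward is observed, drawn from a known discrete distribution, and we either deploy one asset to collect it or hold the asset for later.

```python
from typing import List, Sequence


def solve_asset_allocation(
    horizon: int,
    n_assets: int,
    rewards: Sequence[float],
    probs: Sequence[float],
) -> List[List[float]]:
    """Backward induction for the toy model.

    value[t][k] is the optimal expected future reward at stage t
    with k assets left, before the stage-t reward is observed.
    """
    # Terminal condition: nothing can be earned after the final stage.
    value = [[0.0] * (n_assets + 1) for _ in range(horizon + 1)]
    for t in range(horizon - 1, -1, -1):  # work from the end to the start
        for k in range(1, n_assets + 1):
            hold = value[t + 1][k]            # keep all k assets
            after_deploy = value[t + 1][k - 1]  # one asset fewer afterwards
            # For each possible reward, take the better of deploy and hold.
            value[t][k] = sum(
                p * max(r + after_deploy, hold) for r, p in zip(rewards, probs)
            )
    return value


# Hypothetical example: rewards of 1 or 3 with equal probability.
value = solve_asset_allocation(horizon=2, n_assets=2, rewards=[1.0, 3.0], probs=[0.5, 0.5])
print(value[0][1], value[0][2])
```

In this toy model the optimal policy deploys at stage `t` with `k` assets exactly when the observed reward is at least `value[t + 1][k] - value[t + 1][k - 1]`, the value of keeping the extra asset.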

### James Price

#### Heuristics for Real-time Railway Rescheduling

**Degree:** University of Bath, MMath Mathematics with Industrial Placement

**Supervisor:** Edwin Reynolds

In railway networks, a single delayed train can delay other trains by getting in their way. This is called reactionary delay and is responsible for over half of all railway delays in the UK. Railway controllers therefore have to make decisions in real-time that minimise the amount of reactionary delay. Such decisions include ‘should I cancel a train, and if so which one?’ and ‘which train should leave the station first if they can only go one at a time?’ There currently exist algorithms that can find the optimal solution to these problems. However, the amount of computational time required to run these algorithms, especially on a large network, makes solving these problems in real-time infeasible. An alternative approach is to use a heuristic, which solves the problem with a lower degree of accuracy but produces an answer in much less time. My project involves developing multiple heuristics, comparing their advantages and limitations and deciding on a final idea.

### Konstantin Siroki

#### Preventing overfitting in Natural Language Processing

**Degree:** The University of Manchester, BSc Mathematics

**Supervisor:** Henry Moss

Natural Language Processing (NLP) allows computers to understand human speech and writing. The standard approach in NLP is to fit the model in a way that avoids relying on features over-represented in the sample (known as overfitting). There are two methods for doing this: regularization and term-frequency weighting. There is no clear consensus on which method is best. The project’s aim is to investigate the relationship between these two approaches, alongside tests across a range of NLP tasks.

## 2017 Interns

Here you can find details of the summer 2017 interns including a description of their research project.

### Edward Austin

#### Modelling Risk in Hazardous Material Transport

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Chrissy Wright

The risk of an accident is an important factor to consider when transporting hazardous materials. Because accidents can be deadly, the route with the least overall risk should be chosen. This project looks at how best to model the risk to enable safer routes to be taken.

### Callum Barltrop

#### Investigating bias in return level estimates due to the use of a stopping rule

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Anna Barlow

There are many situations in which rare and extremely large (or small) events are of interest. For example, the focus of my project is the statistical modelling of extreme flood events. Extreme Value Theory is concerned with the modelling of the tails of the distribution and provides a theoretically sound framework for the study of extreme values. In particular, the Generalised Extreme Value distribution is used to model the maxima of a process within blocks of time (often a year). Usually, we are mostly interested in estimating the x-year return levels of a distribution, that is, the value we'd expect to be exceeded on average once every x years. However, the point at which we decide to stop sampling and analyse the data is not arbitrary and this choice of stopping point can result in biased return level estimates. After the December 2015 floods there was much interest in re-evaluating the return level estimates, as the inclusion of such a large event often led to significant changes in the value of these estimates. In this project, we will consider possible ‘stopping criteria’ (i.e. rules that tell us when to stop sampling data and do our analysis) to approximate the procedures used in reality and investigate the bias in the standard estimates. We will implement a variety of new estimators developed with the intention to improve upon the existing standard methods.
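
As a minimal sketch of what an x-year return level means, the function below evaluates the standard return level of a Generalised Extreme Value distribution with location `mu`, scale `sigma` and shape `xi`, assuming annual block maxima. The parameter values are hypothetical and would normally be estimated from data; the bias studied in the project enters through those estimates, not through this formula.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, period: float) -> float:
    """Level exceeded by the annual maximum with probability 1/period.

    Requires period > 1. Uses the Gumbel limit when xi is (numerically) zero.
    """
    p = 1.0 / period
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical fitted parameters for annual maximum river levels.
level_100 = gev_return_level(mu=10.0, sigma=2.0, xi=0.1, period=100)
print(round(level_100, 2))
```

Adding a single very large observation to the sample typically moves the fitted `xi` and `sigma`, which is why return level estimates can change noticeably after an extreme event.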

### Jonathan Bevan

#### Time series Classification

**Degree:** Lancaster University, MSci Mathematics and Statistics

**Supervisor:** Harjit Hullait

The internship project will be focused on time series classification, an area that has applications in various fields. The idea is to build a classifier which is able to label a time series from a defined list of possibilities.

For example, if we have heart rate time series for people walking and people running, we have two labels: runner or walker. There are two main challenges in classification: firstly, a set of labels needs to be chosen, and secondly, a classifier needs to be built that can label the time series.

### Chloe Fearn

#### Analysis of Armed Conflict Data

**Degree:** Lancaster University, MSc Mathematics

**Supervisor:** Christian Rohrbeck

The Armed Conflict Location & Event Data Project (ACLED) has aggregated the exact location, date, and other characteristics of several violent events in unstable and warring states. The analysis of this data is challenging due to the vast number of factors influencing such events. Koren and Bagozzi (2017, Journal of Peace Research, 54(3)) find, for instance, that, in times of war, violence against civilians occurs more frequently in areas with a high percentage of cropland. This result is derived from a zero-inflated model which accounts for armed conflicts not being present in all areas at all times. The proposed project considers the publicly available data and aims to slightly extend the model by Koren and Bagozzi (2017), for instance, by accounting for the spatial aspect of the data. In particular, the project can be split into three steps: (i) exploratory analysis of the data, (ii) estimation of a model similar to the one by Koren and Bagozzi (2017) and (iii) extending the model.

### Jake Grainger

#### Assessing the Use of Spatial Models for Extremes

**Degree:** Lancaster University, MSc Mathematics

**Supervisor:** Rob Shooter

Being able to model spatial extremal behaviour (in particular spatial dependence) is an important area of Extreme Value Theory and this project will aim to give an introduction into the various methods of trying to capture this behaviour. The first part of this project will provide a short introduction to univariate extreme value theory and also will look at some methods of spatial statistics - in particular looking at Gaussian Processes, which will be simulated and have interpolation methods performed on them. The second part will introduce the Smith process (a particular type of max-stable process) and will compare this to using Gaussian Process techniques on data, with the aim of comparing how well the two types of spatial model are able to describe the nature of the data.

### Graham Laidler

#### Sequential Changepoint Detection: Anticipating the next Financial Crash

**Degree:** Durham University, MMath Mathematics

**Supervisor:** Sam Tickle

Changepoint detection underpins virtually all questions of interest surrounding data analysis in a variety of contexts. Understanding the nature of a change, and when it occurred, is often of vital importance in preventing problems surfacing in the future. With the advent of Big Data, more sophisticated tools are increasingly required to search for changes on datasets of ever-growing size. Most existing methods for changepoint detection are offline, requiring the collection of an entire dataset prior to analysis, and interest in online techniques, where informed statements regarding changes of the recent past can be made in tandem with data collection, is growing.

This project will examine various existing methodologies which employ an online approach to changepoint detection, both Bayesian and frequentist, and attempt to apply these ideas to real-time datasets (for example, share price data for various FTSE100 companies) in order to find the best performing algorithms which can operate most efficiently in the greatest number of contexts. Depending on specific interests, this can involve exploring prior selection, investigating various ‘control charts’ or using likelihood-based approaches among other options. There is also potential scope in helping to pioneer entirely new techniques which can then be tested against some of the existing methods.

### George Phillips

#### Combination therapies: improving outcomes via the probability of success

**Degree:** University of York, MMath Mathematics

**Supervisor:** Emily Graham

Combination therapies are able to hit the many mechanisms of diseases/cancers simultaneously by combining existing drugs and new molecular entities. When developing a combination therapy, the aim is to produce a synergistic effect while reducing side effects. However, drug development is a long and expensive process which is subject to a considerable amount of uncertainty. Therefore it is important that the decisions made are well informed and are expected to be the most beneficial to both the pharmaceutical company and the patient population.

Methods for decision making often require several parameters relating to a drug. We are interested in the estimation of the probability of study success for combination therapies. Current methods do not allow information to be shared across similar combinations. We believe that incorporating this information in a Bayesian setting will improve the accuracy of our estimates. This will lead to better decision making and improve the outcomes of the development programmes.

### Harry Spearing

#### Simulation Optimisation Techniques for Time-Dependent Staffing Problems

**Degree:** Lancaster University, BSc Physics

**Supervisor:** Luke Rhodes-Leader

In many real world problems, such as complex queueing problems, mathematical models of the system can be too complex to solve analytically. An alternative way to study stochastic systems is to use a simulation to produce realisations of the system. Simulation can be used to optimise a system by testing alternative settings. The choice of optimisation technique depends heavily on the properties of the problem, such as the size of the solution space, how many objectives there are and whether the decision variables are discrete or continuous. Due to the stochastic nature of the problems, the optimisation is further complicated as the objective must be estimated, rather than evaluated exactly. This project will focus on finding simulation optimisation techniques appropriate for the optimal staffing problem for a time dependent queueing system, such as that of an emergency call centre.

### Livia Stark

#### Executing Offshore Maintenance Activities

**Degree:** Lancaster University, MPhys Physics

**Supervisor:** Toby Kingsman

At the start of the internship time will need to be spent learning about the general offshore maintenance problem and literature associated with it. This could be simpler sub-problems such as the travelling salesman problem, travelling repairman or scheduling of tasks. Depending on the student’s knowledge of linear programming and coding, time could be spent trying to implement one of these models on the computer.

The goal of the project is likely to be creating some simple construction heuristics to solve the offshore maintenance routing and scheduling problem. These could be extended to more general problems depending on the student’s interest, e.g. several vessels or tasks completed in stages. The performance and results of these heuristics could be compared across several instances.

### Jinran Zhan

#### Scheduling using Optimisation

**Degree:** University of Southampton, BSc Mathematics and Statistics

**Supervisor:** David Torres Sanchez

The project will focus on one of the main optimisation scheduling problems: project planning. This refers to the programming of the different activities that need to be completed for a given project. It is also heavily conditioned by the specifications on the resources and activities, making the problem really interesting for mathematicians. In this project we will be focussing on understanding the so-called resource-constrained project scheduling problems (RCPSP). The generality of the RCPSP allows it to have a wide range of applications where the aim is to schedule some activities or jobs over a period of time such that precedence and resource constraints are satisfied, and a certain objective function is optimised. Depending on the student’s knowledge of linear programming and optimisation, we can study the varied formulations or, if the student is familiar with them, we can jump straight into the pre-emptive case for long term planning horizons. Either of these ties in with testing in Python using Gurobi, which will be learnt if needed.

## Associated Interns 2017

### Waseem Aslam

#### Management Science Intern

Waseem joins the STOR-i Internships from the Lancaster University Management Science department.

## 2016 Interns

Here you can find details of the summer 2016 interns including a description of their research project.

### Stefanos Bennett

#### Regression with Dependencies and Non-Gaussian Noise

**Degree:** University of Cambridge, BA Mathematics

**Supervisor:** Stephen Page

The linear model is a widely used tool in regression analysis. Linear regression models are most commonly fitted using the least-squares approach, which is both conceptually and computationally simple. A frequently made assumption in linear least squares regression is that the error terms between the observed responses and the corresponding expected values are independent and identically distributed normal random variables. This assumption greatly simplifies the matter of obtaining confidence intervals for the unknown parameters of our model. However, whether this is a sound assumption depends on the size and nature of the particular dataset under consideration. This project will investigate the case when the assumption is not satisfied. Various techniques for obtaining confidence sets will be examined and compared to the sets obtained via normal approximation. The effects of different possible violations of the Gaussian assumption on the constructed confidence sets will be investigated.

### Matthew Bold

#### Input Uncertainty in Simulation Models

**Degree:** University of Birmingham, MSci Mathematics

**Supervisor:** Lucy Morgan

Simulation uses mathematical modelling to mimic real-world systems which cannot be tested in reality, perhaps due to time, cost or safety constraints. The information gained by running the simulation can then be used to make decisions about the real-world system. For example, retailers want to ensure they have enough servers to prevent customers from having to queue for long periods of time. A simulation model can be used to understand how the queue behaves and make a decision about how many servers are needed for each shift in order to keep the queue length below a certain level. The inputs in simulation models are usually approximated by observing real-world data; for example, observing the number of customers that are served in a shop over a period of time. Input uncertainty arises from the fact that we only have a finite amount of real-world data, and therefore cannot be certain that the values of the input parameters that are being used to drive the simulation are the true values of the input parameters. This project aims to quantify the input uncertainty in a queueing simulation model.

### Bronwen Edge

#### How good is the Lancaster University Mathematics Department? – An investigation using Data Envelopment Analysis

**Degree:** BSc Mathematics, Heriot-Watt University

**Supervisor:** Emma Stubington

Each year university league tables are released, but many are based on different criteria and have slightly different results. We are interested in testing the efficiency and productivity of mathematics departments across the country. As we are considering multiple inputs and outputs (student satisfaction, entry requirements, academic and career attainment, the cost of university, etc.), it is difficult to make direct comparisons between institutions. We therefore need to use a management science method, Data Envelopment Analysis (DEA), which can cope with lots of constraints. What I am finding particularly interesting is the additional questions that arise from examining the data and implementing this approach, for example: Should universities that produce high numbers of good degrees be considered the best? Are some students not reaching their potential and being let down by their institution, given they entered university with extremely high entry requirements? Are some universities awarding an unrepresentative number of good degrees considering their place in current league tables, or is the data just extremely biased with a small sample size? Should all universities be charging the same fees, given that the career opportunities afterwards are significantly fewer for some? Is university location skewing the career prospects of students, whilst not taking into consideration the living costs and average salary of non-graduates of some locations? As my project advances, I have realised that what seemed like a simple linear programming problem evolves into a complex social and economic issue, which questions the real cost to students when choosing which university is best for them.

### Thomas Grundy

#### Detecting Match-Fixing in Tennis

**Supervisor:** Oliver Hatfield

In January 2016, tennis was hit by allegations of widespread match-fixing prompted by the release of secret documents from reviews into tennis’ integrity. The documents detailed widespread accusations of corruption within the sport. The aim of the project is to create simulations of tennis matches and explore sudden changes in performance, which could be linked to match-fixing, using simple change point methods. Features such as dependence and the importance of critical points will also be taken into account to create accurate simulations. In addition, the current rating system within tennis only takes into consideration the previous year’s results and takes no account of the strength of opponents. A further aim of the project is to create a rating system based around the Elo system with improvements.
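
For reference, the standard Elo update can be sketched in a few lines. This is the basic textbook rule, not the improved system the project aims to build; the K-factor of 32 and the starting ratings are illustrative assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo's predicted probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> Tuple[float, float]:
    """Return the new (winner, loser) ratings after one match.

    The winner gains more when the win was less expected, so the
    strength of the opponent is taken into account automatically.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


print(update_elo(1500.0, 1500.0))  # evenly matched players
print(update_elo(1500.0, 1900.0))  # an upset moves the ratings further
```

Because `expected_score` is also a win probability, it can drive match simulations, and a run of results that is very unlikely under these probabilities is the kind of sudden change in performance a changepoint method would look for.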

### Ben Miller

#### Detecting Unwanted Variation in Time Series

**Supervisor:** Aaron Lowther

A statistical outlier in a set of data is defined to be “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. In the context of time series, examples of outliers may include the number of complaints received by BT after a power outage, or the increase in supermarket sales during the days leading up to Christmas. It is important that we are able to detect these outliers as they may have a significant impact on the model selected to fit the data, the parameter estimates for the model, and consequently, on any forecasts made from the model. This project will look into methods and algorithms that are able to automatically detect outliers in time series.

### Henry Moss

#### Assessing Dependence in Extreme Values

**Supervisor:** Emma Simpson

Extreme value theory models the maxima (or minima) of random variables. By their very nature, these extremes occur infrequently and so are hard to model. A robust framework already exists, with block-maxima and threshold-based approaches providing parametric distributions for the maxima. Known as the Generalised Extreme Value (GEV) and Generalised Pareto (GP) distributions, these allow us to estimate the maximum value that we would expect to see over n years. My project looks into the bivariate case, where our variables have extremes that either occur simultaneously (Asymptotic Dependence) or independently (Asymptotic Independence). There already exist several statistical measures of this behaviour; however, it is hard to obtain reliable estimates of their values. I am looking at developing an alternative method to simultaneously estimate two of these measures, with the hope of finding some synergies.

### Emma Oldfield

#### Improving question selection in education software

**Degree:** University of Sheffield, BSc Mathematics

**Supervisor:** Ciara Pike-Burke

With the advance of technology in education, it is becoming increasingly possible to personalise education software, providing students with questions tailored to their individual learning styles and abilities. The data gathered from the students' previous interactions with the education software can be used to simulate students' responses to future data. This enables us to model student performance. The main aim will be to investigate whether Bayesian methods can provide a more accurate prediction of student performance than frequentist methods. The Bayesian approach looks into Markov chain Monte Carlo and Random Walk Metropolis. The models will be used to predict whether students would pass an exam of particular questions.

### Anja Stein

#### Assigning Drones in Military Search

**Degree:** University of Edinburgh, MMath

**Supervisor:** James Grant

Drone technology is fast becoming a vital component of military operations. Unmanned Aerial Vehicles (UAVs), as they are known within the military, can perform a variety of tasks remotely making things both more efficient and safer for military personnel. This project revolves around optimizing the UAV Search Problem by maximising the number of events detected within a given border by a fleet of UAVs equipped with cameras. The UAVs aim to detect the locations of events of some sort occurring on the border (one example may be crossings of the border). Each UAV is to be assigned a specific subsection of the border to patrol, with the assumption being that the larger its subsection is, the less likely it will be to actually detect an event. Some UAVs may be naturally better at detecting events than others (because of better cameras etc.) and some UAVs may be better equipped to detect events in certain parts of the boundary (e.g. different types of terrain).

### Georgios Topaloglou

#### Univariate methods for time series forecasting

**Degree:** University of Cambridge, BA Mathematical Tripos

**Supervisor:** Daniel Waller

Time series are often grouped in a hierarchical structure. For example, the time series for the total number of tourists visiting a country may be split into more time series according to the purpose of travel, and each of these time series may, in turn, be split into more time series according to the length of stay, thus creating a tree-like hierarchical structure. The issue of forecasting hierarchical series in a way that allows for a similar hierarchical disaggregation of the forecasts is very important. This project will combine two methods that have recently been proposed, optimal combination and temporal aggregation. It will then test the accuracy of this new method against that of optimal combination and other standard techniques such as bottom-up and top-down forecasting.

### Alan Wise

#### Detecting Changes in Multivariate Time Series

**Degree:** University of Edinburgh, Mathematics BSc (Hons)

**Supervisor:** Rebecca Wilson

Changepoint detection of univariate time series has been widely covered but the increasing availability of multivariate data has motivated the study of multivariate detection methods. Time series data of a multivariate flavour can be found in finance, health monitoring, signal processing, bioinformatics, and detecting credit card fraud. In my project, I explore a few methods to detect change points of multivariate time series data. I also discuss the drawbacks of these methods and suggest ways in which these drawbacks could be overcome.

## 2015 Interns

Here you can find details of the summer 2015 interns including a description of their research project.

### Ana Daglis

#### Statistical inference for evolving network structure

**Degree**: University of Cambridge, BA Mathematics

**Supervisor**: Matthew Ludkin

Networks are prominent in today’s world. The volume of telecommunications and social network data has exploded in the last two decades. Gaining a statistical understanding of the processes generating and maintaining network structure can be used to make confident statements about properties of a network, detect anomalous behaviour or target adverts. In recent years more data has been collected alongside the network. Can such covariate information improve inference for network structure compared to network data alone? Many have attempted to model how networks grow; however, most models have poor statistical properties. This project will investigate approaches for combining statistical methodology from static modelling techniques with methods for analysing data indexed through time.

### Lawrence Latter

#### Modelling extra-tropical cyclones using extreme value methods

**Degree**: Lancaster University, MSci Mathematics

**Supervisor**: Paul Sharkey

The prevalence of extra-tropical cyclones in the mid-latitudes is a dominant feature of the weather landscape affecting the United Kingdom. The UK has come to expect a consistent annual pattern of temperate summers and mild winters. However, in recent years it has been a focus of extreme weather events, for example, major floods and damaging windstorms. Accurate modelling and forecasting of extreme weather events are essential to protect human life, minimise potential damage and economic losses, and to aid the design of appropriate defence mechanisms. In this context, an extreme event is one that is very rare, with the consequence that datasets of extreme observations are small. The statistical field of extreme value theory is focused on modelling such rare events, with the ideology of extrapolating physical processes from the observed data to unobserved levels. This project will focus on applying extreme value methods to remote sites in the North Atlantic and European domain.

### Euan McGonigle

#### A Linguistically-Motivated Changepoint Problem from a Bayesian Perspective

**Degree**: University of Glasgow, MSci Mathematics

**Supervisor**: Sean Malory

Sequences arise naturally in linguistics with the number of occurrences of a linguistically salient feature changing over time as language attitudes evolve. One such feature is the use of flat adverbs: for instance, in the phrase “fresh ground coffee” the word “fresh” is a flat adverb, since it functions as an adverb but lacks the typical suffix “ly”. While not as widespread nowadays, flat adverbs were commonly used during 1700-1900. Authors of this period used flat adverb forms and were publicly criticised for doing so. This project will introduce a Bayesian statistical framework to investigate whether the rate of flat adverb use changed significantly after an author's writing had been subjected to such criticism. This will focus on the detection of changes in a sequence of data points using a Bayesian approach; specifically, we will be interested in quantifying (in a precise way) whether or not a change in the sequence has occurred at some point.

### Daniel Miles

#### Density-based cluster analysis

**Degree**: University of Reading, MMath Mathematics

**Supervisor**: Katie Yates

Cluster analysis is the process of partitioning a set of data vectors into disjoint groups (clusters) such that elements within the same cluster are more similar to each other than elements in different clusters. Clustering has a wide range of application areas including Biology, Physics, Computer Science, Social Science and Market Research. There are three main categories of algorithms which can be applied in order to find solutions to data clustering problems: hierarchical, partitioning and density-based. The main focus of this project is to explore density-based clustering methods and to compare the performance of these algorithms via simulation studies.

### Sarah Oscroft

#### Classification in streaming environments

**Degree**: Newcastle University, MMath Mathematics

**Supervisor**: Andrew Wright

The aim of a classification model is to predict the class label of a new observation using only historical observations. Traditional classification approaches assume this historical dataset is a fixed size and is drawn from some fixed probability distribution(s). However, in recent years a new paradigm of data stream classification has emerged. In this setting, the observations arrive in rapid succession, with classifiers capable of being trained sequentially, and an adaptable underlying probability distribution. These classifiers have applications in areas as diverse as spam email filtering, analysing the sentiment of tweets and high-frequency finance. This project will investigate how models can be used to produce streaming versions of classifiers.

### Srshti Putcha

#### Auto-Correlation Estimates of Locally Stationary Time Series

**Degree**: London School of Economics, BSc Mathematics and Economics

**Supervisor**: Jamie-Leigh Chapman

A time series is a sequence of data points measured at equally spaced time intervals. Examples of time series include FTSE 100 Daily Returns and the total annual rainfall in London, UK. Often we assume that such series are second-order stationary. In other words, the statistical properties of the time series, e.g. the autocorrelation, remain constant over time. However, the reality is that many time series are not second-order stationary and therefore it is not appropriate to model them using such methods. Instead, we must consider time-varying equivalents of the autocorrelation or autocovariance. One method that analysts use to adapt the regular autocorrelation function to be a time-varying quantity is to apply rolling windows to the data. Unfortunately, this can give quite different answers depending on the choice of window length and the location of the window within the time series sample. This project will explore alternative methods of estimating a time-varying auto-correlation function in order to overcome these problems.
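
The rolling-window idea, and its sensitivity to the window length, can be sketched as follows. The lag-1 estimator, the function names and the toy series are illustrative assumptions rather than the method studied in the project.

```python
def lag1_autocorrelation(window):
    """Sample lag-1 autocorrelation of one window of observations."""
    mean = sum(window) / len(window)
    denom = sum((x - mean) ** 2 for x in window)
    if denom == 0:
        return 0.0  # a constant window carries no correlation information
    num = sum((a - mean) * (b - mean) for a, b in zip(window, window[1:]))
    return num / denom


def rolling_lag1(series, width):
    """Lag-1 autocorrelation in every window of `width` consecutive points."""
    return [
        lag1_autocorrelation(series[i:i + width])
        for i in range(len(series) - width + 1)
    ]


# Made-up non-stationary series: it alternates, then trends upwards.
series = [1, -1] * 4 + list(range(8))
narrow_windows = rolling_lag1(series, width=4)
wide_windows = rolling_lag1(series, width=8)
```

Comparing `narrow_windows` with `wide_windows` shows how the estimate at a given time depends on the window length, which is the difficulty described above.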

### Sam Tickle

#### Regression, curve fitting and optimisation algorithms

**Degree**: University of Cambridge, BSc Mathematics Tripos

**Supervisor**: Elena Zanini

The underlying strategy for most statistical modelling is to find parameter values that best describe the fit of the model to the data. This requires optimising an objective function while minimising the difference between the model and the observations. When analytical solutions to the optimisations are unavailable, statisticians often rely on numerical optimisation routines to perform this fit, trusting that this will produce stable estimates of the parameters. Firstly, some issues may arise in the choice of the best algorithm given the characteristics of the problem at hand. Secondly, the algorithm considered may not actually perform well and needs to be understood and adapted to work better on the model considered. This project will investigate different numerical optimisation algorithms used in statistical inference and curve fitting, and how to overcome some of the problems associated with these types of algorithms.

### Zak Varty

#### Pharmacokinetic Modelling

**Degree**: Lancaster University, MSci Mathematics

**Supervisor**: Helen Barnett

In medical research, in both pre-clinical and clinical trials, the objective is to learn about the behaviour and effect of potential new drugs in the body. This breaks down into two categories: how the drug affects the body (Pharmacodynamics) and how the body affects the drug (Pharmacokinetics). This application-driven project focuses on pharmacokinetic modelling, which involves modelling the concentration of a compound in the blood over time. The aim of the project is to apply statistical modelling techniques to real data in order to obtain an understanding of the role of pharmacokinetics in the drug development process.

## 2014 Interns

Here you can find details of the summer 2014 interns including a description of their research project.

### Anna Maria Barlow

#### Spatiotemporal Modelling of Economic Data using Disease Mapping

**Degree**: University of Durham, MMath Mathematics

**Supervisor**: Christian Rohrbeck

The field of statistics focusing on models incorporating spatial information is called Spatial Statistics. Spatial statistics generally distinguishes between three types of data: geostatistical data, lattice data and spatial point patterns. This project will focus on lattice data, where the number of sites at which observations are recorded is finite, for example, the population in each county of the UK or the results of the last general election per district. Spatial statistical methods for lattice data are often applied in epidemiology to model the occurrence of a disease in a region depending on covariates. This is known as Disease Mapping, with models aiming to predict the occurrence rate or the number of cases of a particular disease. This project will investigate the basic methods used in Disease Mapping and apply them to economic data.

### Dawid Bernaciak

#### Relocation Operations in One-Way Car-Sharing Problems

**Degree**: University of Glasgow, MSci Mathematics

**Supervisor**: Burak Boyaci

Car-sharing is a new concept that enables the general public to access a fleet of vehicles for short rental periods. These systems have several benefits including environmental, energy and societal considerations. Car-sharing systems come in two general types: the restrictive “two-way” system where users pick up and drop off the vehicle at the same location, and the more flexible “one-way” system enabling the users to choose a different drop-off location to the pick-up station. For the customer, the one-way system is generally preferred; however, one of the difficulties in implementing a one-way system is managing the relocation of vehicles and personnel. This project will develop and implement models for improving relocation operations for the one-way car-sharing problem.

### Helen Coupland

#### Modelling ocean environments with extreme value theory

**Degree**: University of Durham, MMath Mathematics

**Supervisor**: Monika Kereszturi

Offshore structures such as oil platforms and vessels must be designed to have very low probabilities of failure due to extreme weather conditions. Inadequate design can lead to structural damage, lost revenue, danger to operating staff and environmental pollution. Design codes demand that all offshore structures exceed specific levels of reliability, most commonly expressed in terms of an annual probability of failure or return period. Hence, interest lies in environmental phenomena that occur extremely rarely, and we want to estimate the rate and size of future occurrences. The aim of this project is to gain a deep understanding of extreme value theory in the application of ocean environments.

### Toby Kingsman

#### Analysis of Algorithms for yield optimization and batch scheduling problems

**Degree**: University of Birmingham, MSci Theoretical Physics and Applied Mathematics

**Supervisor**: Trivikram Dokka

A common scheduling problem in industrial settings is concerned with scheduling jobs on identical machines with the objective of minimizing the total active time. The problem finds important applications in the field of (energy-aware) scheduling, especially in applications relating to optimal network design. The aim of this project is to investigate the performance of some natural heuristics proposed for finding near-optimal solutions to these computationally hard problems. This will involve learning about methods based on integer and linear programming formulations and using computer programming to implement algorithms and solve linear programs.

### Aaron Lowther

#### Seasonally Adjusting Official Time Series

**Degree**: Lancaster University, BSc Mathematics

**Supervisor**: Rebecca Killick

The Office for National Statistics (ONS) publishes thousands of seasonally-adjusted time series which are used to produce the official statistics that create the news headlines regarding, for example, increase/decrease in unemployment and double or triple-dip recessions. Seasonal adjustment involves estimating and removing a seasonal component from a time series. This project aims to develop and test a method for the automatic detection of changes in the seasonal pattern of time series by comparing alternative methods and assessing the impact on the estimation of seasonal factors for series that do and do not present changes in the seasonal pattern.

### Rachel Naylor

#### Fast inference for processing intelligence information

**Degree**: University of Bath, MMath Mathematics

**Supervisor**: Lisa Turner

Intelligence is information regarding threats to national security and potentially hostile forces. After raw intelligence data is collected it must be processed and screened, often in time-critical situations. Only relevant information is then passed on for further analysis. With huge amounts of intelligence data collected daily, potentially relevant information can be missed. Given a set of intercepted communications, how should we process the communications to maximise the amount of relevant information passed on for analysis? This project will develop a model for processing intercepted information and explore how to overcome problems associated with this type of model.

### Emily Olesker

#### Assessing Performance of Changepoint Detection Algorithms

**Degree**: University of Cambridge, BA Mathematics

**Supervisor**: Kaylea Haynes

Changepoints are a widely studied area of statistics with applications including, but not restricted to, finance (detecting changes in volatility), computer science (detecting instant messaging worms and viruses) and environmental sciences such as oceanography and climatology. Changepoints are considered to be the points in a time series where we experience a change in some statistical property, for example, a change in mean or a change in variance. There are many different approaches to changepoint analysis; however, current methods have the trade-off of being fast but approximate or exact but slow. The aim of this project is to develop an understanding of changepoint detection methods and in particular explore ways in which we can assess the performance of different detection methods.
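
As a small illustration of a change in mean, the sketch below finds a single changepoint by exhaustive search, choosing the split that minimises the within-segment sum of squared deviations. The cost function, the function name and the data are assumptions made for illustration; this is not one of the algorithms assessed in the project.

```python
def best_single_changepoint(xs):
    """Return the index k that best splits xs into two constant-mean segments.

    The cost of a segment is its sum of squared deviations from its own mean;
    k is the first index of the second segment.
    """
    def segment_cost(seg):
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg)

    candidates = range(1, len(xs))
    return min(candidates, key=lambda k: segment_cost(xs[:k]) + segment_cost(xs[k:]))


# Made-up series: the mean jumps from about 0 to about 5 at index 6.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 5.1, 4.8, 5.0, 5.3, 4.9]
changepoint = best_single_changepoint(series)
```

Checking every possible split is exact for this cost, but as written it recomputes each segment from scratch, so the work grows quadratically with the length of the series.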

### Luke Rhodes-Leader

#### Explaining changes in aggregated time series

**Degree**: Lancaster University, MSci Physics with Mathematics

**Supervisor**: Lawrence Bardwell

In many applications, there is some indicator that is constantly monitored as new data are collected, for example in an industrial setting, the number of faults recorded on a large network per week. Typically at a managerial level interest lies in the total number of faults over the entire network and patterns or changes that may occur. One important change in this indicator is a spike (outlier) where suddenly there is a large increase in the number of faults over the entire network. Understanding why these sudden increases occur is important so they can be prevented from happening again. This project will investigate methods for detecting outliers in large time-series datasets.

### Matthew Robinson

#### Selection of Tolerance Level for Approximate Bayesian Computation

**Degree**: University of Warwick, MMath Mathematics

**Supervisor**: Wentao Li and Paul Fearnhead

For many complex datasets, one feature is that the likelihood of the statistical model is intractable, in the sense that it is difficult to evaluate the likelihood values of the observations, and standard inference methods for unknown parameters, like Maximum Likelihood Estimation and Markov chain Monte Carlo, do not work. For intractable problems for which sampling from the likelihood given parameter values is easy, Approximate Bayesian Computation (ABC) is a useful Bayesian inference method using Monte Carlo simulations. The project will investigate the impact of the tolerance level, a core parameter of the ABC algorithm, in various situations and try to design an automatic algorithm to select the tolerance level.
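
As a rough illustration of the role the tolerance plays, here is a minimal rejection-ABC sketch. The model (a Normal distribution with unknown mean and unit variance), the uniform prior, the sample-mean summary and the function name `abc_rejection` are all assumptions chosen for illustration and are not taken from the project.

```python
import random


def abc_rejection(observed_mean, n_obs, tolerance, n_draws, rng):
    """Rejection ABC for the mean of a Normal(theta, 1) model.

    Assumed prior: theta ~ Uniform(-10, 10). A prior draw is kept when the
    mean of data simulated from it lies within `tolerance` of the observed mean.
    """
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-10, 10)
        simulated = [rng.gauss(theta, 1) for _ in range(n_obs)]
        sim_mean = sum(simulated) / n_obs
        if abs(sim_mean - observed_mean) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
wide = abc_rejection(observed_mean=2.0, n_obs=20, tolerance=1.0, n_draws=5000, rng=rng)
narrow = abc_rejection(observed_mean=2.0, n_obs=20, tolerance=0.1, n_draws=5000, rng=rng)
```

A larger tolerance accepts more draws but approximates the posterior more loosely, while a smaller one is more accurate but discards most simulations. That is the trade-off an automatic selection rule would need to balance.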

### Emma Stubington

#### Travelling Salesman Problem

**Degree**: Sheffield University, BSc Mathematics

**Supervisor**: Ivar Struijker-Boudier

Scheduling problems can be found in many industrial settings. The complexity of scheduling problems is often such that optimal solutions cannot be guaranteed to be found in short computational time. However, many companies need to produce schedules on a daily basis, so they need a computationally fast way of implementing this. A well-known example of a difficult to solve scheduling problem is the travelling salesman problem (TSP) which is concerned with finding the shortest route which visits each of a number of locations exactly once. If every location can be travelled to directly from every other location, then the number of possible solutions increases very quickly as more locations are added to the problem. Evaluating every possible solution then becomes impossible. This project will explore the travelling salesman problem and will assess and compare various solution methods for the TSP.
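
To see how quickly the number of candidate routes grows, the sketch below counts the distinct round trips through n locations and solves a tiny instance by checking every ordering. It assumes a round trip with symmetric straight-line distances; the coordinates and function names are made up for illustration.

```python
from itertools import permutations
from math import dist, factorial


def count_tours(n):
    """Number of distinct round trips through n locations (symmetric distances)."""
    return factorial(n - 1) // 2 if n > 2 else 1


def brute_force_tsp(points):
    """Try every ordering that starts at points[0]; only feasible for tiny n."""
    start, rest = points[0], points[1:]
    best_len, best_tour = float("inf"), None
    for order in permutations(rest):
        tour = (start, *order, start)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Hypothetical locations: the four corners of a unit square.
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
shortest_length, shortest_tour = brute_force_tsp(square)
tour_counts = {n: count_tours(n) for n in (5, 10, 15)}
```

Under these assumptions there are 12 tours for 5 locations, 181,440 for 10 and over 43 billion for 15, which is why exhaustive evaluation stops being practical so quickly.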

### Luke Whincop

#### Modelling solar irradiance for energy generation

**Degree**: University of Bath, MMath Mathematics

**Supervisor**: Nikos Kourentzes

The increasing investment in renewable energy is essential to guarantee immediate answers both to the high and fluctuating prices of crude oil and to the diversification of energy supplies, thus reducing external dependence on oil, gas and coal. Therefore, solar power generation becomes an area of paramount research. Various time series methods have been implemented to forecast solar irradiance for power generation; however, a complication with solar irradiance data is that of multiple seasonalities: seasonality from the day-night cycle and from the annual earth cycle. This project will attempt to tackle some of the questions related to modelling the seasonal element of solar irradiance using time series and forecasting models.

## 2013 Interns

Here you can find details of the summer 2013 interns including a description of their research project.

### Martin Andla

#### Statistical Modelling in Sports

**Degree**: University of York, MMath Mathematics (2010-present)

**Supervisor**: George Foulds

To aid the application of betting and investment strategies an edge must be sought over the market. Simulation modelling is a crucial part of this process, providing evidence to support real-world data analysis and professional conjecture. This project will introduce the student to the use of statistical modelling in the prediction of sports results and allow them to adapt a well-known model using their findings from freely available real world data.

### Jenny August

#### Detecting Changepoints in Multivariate Data Series

**Degree**: University of Edinburgh, BSc Mathematics (2010-present)

**Supervisor**: Ben Pickering

Data collection is a huge component of the workings of any modern organisation. There are many examples of situations where data is collected from multiple sources which may be related in some way, for example, the stock prices of multiple companies in the same industrial sector. While the nature of these data values may stay fairly constant over time, occasionally some event may occur which causes a sudden change in the values being recorded at all sources, for example, in financial data, there may be a stock market crash. The times at which such changes occur are known as multivariate changepoints. This project will explore the effectiveness of current multivariate changepoint methods.

### Thomas Berrett

#### Learning in Dynamic Environments

**Degree**: University of Cambridge, BA Mathematics (2010-2014)

**Supervisor**: David Hofmeyr

Machine learning is a field of artificial intelligence focused on developing algorithms which allow computers to evolve and improve their behaviour as a result of empirical data. In the context of this project, this refers to the construction of a data-driven model to aid in a predefined task. The task might be something basic like making predictions based on a simple regression model, or it might be highly complex like describing intricate biological systems. This project offers a variety of possibilities due to the lack of specificity in online learning, and there is considerable flexibility for its direction depending on the student’s preference.

### Simon Crawford

#### Bayes Sequential Decision Problems

**Degree**: University of Bath, MMath Mathematics (2010-present)

**Supervisor**: James Edwards

Many important decisions have to be made under uncertainty because the information that is relevant to the problem is missing or is only known imperfectly. Often, these decisions are not taken in isolation but in a sequence. New information that becomes available as a result of our actions can then be used to make better decisions in the future. However, the actions that give the best short term results may not be the same actions that give the most information. This presents a trade-off between taking the actions that are best in the short term and the need to learn for better long term results. This project will explore a number of statistical theories for dealing with decision problems followed by testing and selecting optimal methods.

### Oliver Hatfield

#### Multiple Changepoint Detection in Non-Trivial Models

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Rob Maidstone

Time series data sometimes experiences abrupt changes in structure. These changes are called changepoints. To model the data effectively these changepoints need to be detected and subsequently built into the model. Changepoints occur in a variety of real-world situations. For example, when analysing human genome data, the average DNA copy value is usually around the same level; occasionally, however, sudden changes away from this level occur. These sudden changes in average DNA level often relate to tumorous cells, and therefore the detection of these changes is critical for classifying the tumour type and progression. This project will introduce the student to changepoint models and involve programming these models using statistical computing software.

### Lucy Morgan

#### Background Subtraction: Methods for Video Analysis

**Degree**: Lancaster University, BSc Mathematics (2011-2014)

**Supervisor**: Rhian Davies

Surveillance cameras have become ubiquitous in many countries, collecting a huge amount of data, most of which is stored and never analysed. Converting this data into useful information can be problematic, particularly as large companies often use many cameras simultaneously. Often it is of interest to the user to detect anomalies in video footage, for example a person placing an item in their bag instead of their shopping trolley. In order to detect such anomalies, we first need to separate the foreground and the background of a video. One popular method for splitting the foreground from the background is background subtraction. The aim of this project is to investigate the effectiveness of different algorithms for background subtraction under a number of real and challenging scenarios.

### Ciara Pike-Burke

#### Evaluating the Structure of the Excitability Curve of Motor Neurons

**Degree**: University of Manchester, BSc Mathematics with Spanish (2010-2014)

**Supervisor**: Simon Taylor

Scientists in the field of neuromuscular research are interested in understanding the structures and processes involved in operating a working muscle. The fundamental component to this process is a motor unit: consisting of a single motor neuron and a collection of muscle fibres that it governs. Evaluating the number of motor units that form a working muscle is very important in understanding the effects of various neuro-degenerative disorders and also in assessing the effectiveness of proposed treatments. The aim of this project is to analyse data from the stimulation of a single motor unit using importance sampling and Bayesian statistics.

### Michelle Pinharry

#### The Unit Commitment Problem and Wind Energy

**Degree**: University of Bath, BSc Mathematics (2011-2014)

**Supervisors**: Pedro Crespo del Granado and Franklin Djeumou Fomeni

The UK’s wind renewable resources share in the grid energy generation mix is expected to be around 20-30% by 2020. Wind generation, however, creates new planning challenges to maintain a stable and reliable supply-demand balance. Since wind generation fluctuates independently from energy demand, this creates a disturbance for the short-term generation planning and scheduling of other generation units (such as gas or coal power plants). This brings a new degree of uncertainty to stabilising the power network equilibrium between supply and demand in real time. This project will use optimisation modelling to answer the question: what is the optimal cost-effective mix of energy units needed to achieve carbon reduction targets whilst also coping with high wind input?

### Benjamin Pring

#### Betting Markets and Strategies

**Degree**: University of Bath, BSc Mathematics (2010-2013)

**Supervisor**: Tom Flowerdew

Markets come in many forms. From buying and selling livestock to trading complex financial derivatives the key to making long-term profits is to establish an edge on the market. Once an edge has been established, the question is how can wealth be optimised? This project will investigate ways in which existing theory can be adapted to fit into sports betting markets, and ways in which underlying assumptions can be removed to allow the theory to become more general.

### Emma Simpson

#### A study of the air quality of major cities in China

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Ye Liu

The air quality in some major cities in China has long suffered from rapid industrialisation and increasing vehicle usage. With the help of social networks and media coverage, this issue has gradually come to the attention of the government as well as the general public. This research project will aim to gain some insight into the air pollution problem in China using classical statistical techniques such as time series analysis and extreme value theory.

### Kathryn Turnbull

#### Modelling droughts with extreme value theory

**Degree**: University of Durham, MMath Mathematics (2011-present)

**Supervisor**: Hugo Winter

Droughts are large-scale climatic phenomena that can lead to social and economic damages. In Africa, periods of drought can lead to food instability and large death tolls as well as having a knock-on effect on the economies of major aid providers. In the UK, a drought could cause reservoirs to run low and lead to government legislation such as the hose-pipe bans seen over the last few summers. It is of great concern to governments and industry where and when these events may occur and also whether their occurrences will differ in the future with anticipated global climate change. Using standard statistical techniques for rare events will potentially result in badly fitting models and, worse, in misleading policies. With such rare and sparse data a more reliable approach is needed; this is called extreme value theory. This project will introduce the student to extreme value theory and its applications for drought data.

### Christina Wright

#### Resource Allocation in Service Industries

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Emma Ross

The effective allocation of resources to meet demand is an essential consideration of any company hoping to survive in a competitive market. This and many other important decision problems can be formulated as a well-known combinatorial optimisation problem called the knapsack problem. It forms a basis from which to study such decision problems, but we quickly run into difficulty when the complexity and scale of real problems faced in industry are incorporated. This project will allow the student to investigate the impact of uncertainty in resource allocation problems by introducing them to linear programming.
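
For readers unfamiliar with the knapsack problem, a minimal sketch of the basic deterministic 0/1 version is given below, solved by dynamic programming over integer capacities. The job values, resource requirements and function name are hypothetical, and the sketch ignores the uncertainty that the project goes on to study.

```python
def knapsack(values, weights, capacity):
    """Best total value for a 0/1 knapsack with non-negative integer weights."""
    # best[c] holds the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Sweep capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical jobs: value earned and staff-hours required by each.
values = [60, 100, 120]
weights = [10, 20, 30]
best_value = knapsack(values, weights, capacity=50)
```

Here the best choice is the second and third jobs, which use all 50 hours for a total value of 220.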

## 2012 Interns

Here you can find details of the summer 2012 interns including a description of their research project.

### Jamie-Leigh Chapman

#### Scenario Generation for Stochastic Programming

**Degree**: University of York, MSci Mathematics (2009-2013)

**Supervisor**: Jamie Fairbrother

Often we have to make decisions in the face of uncertainty. A shop manager has to decide what stock to order without knowing the exact demand for each item. An investment banker has to choose a portfolio without knowing how the values of different assets will evolve. Taking this uncertainty into account allows us to make good robust decisions. This project uses stochastic programming as a tool to investigate such decision-making processes.

### William Cook

#### Modelling anti-terrorist surveillance systems from a queueing perspective

**Degree**: University of Cambridge, BA Mathematics with Physics (2010-2013)

**Supervisor**: Terry James

It is without question that surveillance is very much a part of the modern world. A growing interest in the need for surveillance has been matched by technological advances in the area. Surveillance cameras, either static or as part of an unmanned aerial vehicle have the ability to feed real-time information to a control centre. Here the subject under surveillance can be properly assessed in terms of their identity or possible intentions in a biometric fashion. This project explores an aspect of the emerging operational research field of Homeland Security. More specifically this project will consider the challenge of modelling the defensive surveillance of public areas which are subject to attack by terrorist subjects.

### Josephine Evans

#### Spectral Analysis of Multivariate Time Series

**Degree**: University of Cambridge, BA Mathematics (2010-present)

**Supervisor**: Tim Park

The advent of smartphones has opened up new possibilities for the collection of data. These phones contain sensors such as accelerometers, gyroscopes and GPS making them a cheap and easy way for companies to collect time-series data. This data is often multivariate and nonstationary and often the main challenge is deciding which channels to focus the analysis on rather than the choice of analysis method itself. This project uses principal components analysis to identify which channel to focus on when analysing a multivariate time series.

### Matthew Ludkin

#### Hybrid simulation models for maintenance processes

**Degree**: University of Birmingham, MSci Mathematics (2009-2013)

**Supervisor**: Mark Bell

One of the most widely used dynamic modelling methods in Operational Research for understanding and improving organisational systems is discrete event simulation (DES); an application of this method is in modelling maintenance processes. In a large organisation, there are often many additional interactions that affect maintenance operations. When this is the case, modelling the system using DES alone is sometimes not sufficient, and System Dynamics (SD) may be utilised. In Operational Research these two approaches have traditionally been kept separate, but in recent years hybrid models that contain both techniques have emerged, as the limitations of each have been said to complement one another. This project initially involves building DES and SD models separately before finally combining the two to create hybrid models of maintenance processes.

### Helen Mossop

#### Clustering customers to estimate willingness-to-pay

**Degree**: Newcastle University, MMathStat Mathematics and Statistics (2009-2013)

**Supervisor**: Shreena Patel

Simple probability models are often inadequate for describing the data we encounter in reality because of heterogeneity in the population we are attempting to model. One way to overcome this is to use a mixture model which represents the population as consisting of several sub-populations (or clusters), each of which can be modelled by a standard parametric distribution. This project concerns a population of customers each of whom has an (unobservable) maximum price which they are willing to pay for a product, called a referral price. We wish to cluster customers to capture differences in their price-sensitivity by assuming that referral prices are generated by a mixture of normal distributions. Standard clustering techniques will be adapted in order to estimate how likely a customer is to accept future quotes.

### Gwern Owain

#### Resource Allocation problems in queueing theory

**Degree**: Cardiff University, BSc Mathematics (2010-2013)

**Supervisor**: Jak Marshall

Queues occur naturally in business and computer science applications. So ubiquitous are queues in various situations that being able to model their behaviour is an essential skill for any practitioner or researcher of operations research. Often it is of benefit to simultaneously manage the flow of work in and out of multiple queues given limitations of service resources. This project introduces the rich theory of queueing systems and presents an opportunity to explore efficient ways of coping with random demands on a system with multiple parallel queues with cost structures imposed on them.

### Stephen Page

#### Prize-Collecting Steiner Travelling Salesman Problem with Time Windows

**Degree**: University of Cambridge, BA Mathematics (2010-present)

**Supervisor**: Saeideh Dehghan-Nasiri

The travelling salesman problem is a very well-known optimization problem. This project studies aspects of this problem with additional time window restrictions on the service time of customers and uses a real road network graph. Small scale versions of the problem may be solved using exact optimization techniques. This project looks at solving the problem using exact solution methods and developing and applying a dynamic programming algorithm that provides a lower bound for the problems of a larger scale.

### Paul Sharkey

#### Modelling the North Sea wave climate

**Degree**: University College Dublin, BSc Mathematical Science (2009-2013)

**Supervisor**: Ross Towe

Wave height is of inherent interest to oil companies with offshore operations. Knowing the distribution of wave heights can help to minimise the risk, and consequently the cost, of future offshore operations. A current consideration is also whether climate change will have an impact on the distribution of wave heights. This project considers extreme value theory for modelling wave heights in the North Sea.

### Faye Williamson

#### Semi-Markov processes in a healthcare setting

**Degree:** Lancaster University, MSci Mathematics (2010-2013)

**Supervisor:** Dan Suen

Analysing healthcare systems has been an important concern of healthcare modellers for many years. Understanding patient flows and the number of patients in healthcare systems is an important tool when trying to improve hospital efficiency and, among other things, reduce patient waiting times. This project seeks to highlight similarities between healthcare models and the types of systems to which multigrade population models are applied, using data from a healthcare case.

### Elena Zanini

#### Parameter Estimation with Particle Filtering Algorithms

**Degree**: University of Edinburgh, BSc Applied Mathematics (2009-2013)

**Supervisor**: Chris Nemeth

There exist numerous problems in statistics, engineering, signal processing, etc. which require the estimation of a hidden process. One such example can be found in target tracking, where the aim is to estimate the state of a target (e.g. position, velocity) given only partial, noisy observations (e.g. bearing measurements only). The process of estimating a target's state given only partial, noisy observations is known as filtering. This project involves gaining an understanding of particle filtering techniques and reviewing the current literature before using particle filtering methods to assess various models.
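
As a rough illustration of the filtering idea described above, the following is a minimal sketch of a bootstrap particle filter. It is not the project's own model: the one-dimensional random-walk state, the Gaussian observation noise, the N(0, 1) prior and every name and parameter value here are assumptions chosen purely for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, rng=None):
    """Return filtered means for an assumed toy model:
    x_t = x_{t-1} + N(0, process_sd^2),  y_t = x_t + N(0, obs_sd^2)."""
    rng = rng or random.Random()
    # Assumed prior: particles start from a standard normal distribution.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state equation.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the (unnormalised) likelihood of y.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Every weight underflowed; fall back to equal weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. The weighted average estimates the current hidden state.
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 4. Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


# Simulate a hidden track and its noisy observations, then filter them.
rng = random.Random(0)  # fixed seed so the demo is repeatable
truth, observations, x = [], [], 0.0
for _ in range(50):
    x += rng.gauss(0.0, 1.0)
    truth.append(x)
    observations.append(x + rng.gauss(0.0, 1.0))

estimates = bootstrap_particle_filter(observations, rng=rng)
error = sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)
print(f"mean absolute error: {error:.2f}")
```

Resampling at every step keeps the sketch short; practical implementations often resample only when the weights become too uneven.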

## 2011 Interns

Here you can find details of the summer 2011 interns including a description of their research project.

### Lawrence Bardwell

#### Breast Cancer Screening

**Degree:** Lancaster University, BSc Mathematics (2009-2012)

**Supervisor:** Matt Sperrin

Breast density is a substantial risk factor for breast cancer. It can be estimated from mammograms, which are taken regularly for middle-aged women. A breast density reading can then be used to produce individualised monitoring for women (e.g. screening women with high breast density more frequently). However, breast density is estimated by radiologists subjectively. It is of interest to calibrate the breast density readings so that each radiologist’s scores are on the same scale, and assess the consistency of each radiologist. Data is available for the readings made by radiologists: we can attempt to exploit the fact that each mammogram is read twice by each radiologist, and each mammogram is read by two radiologists.

### Elizabeth Buckingham-Jeffery

#### New Penalty Methods for Bilevel Optimisation

**Degree:** University of Warwick, MMath Mathematics (2008-2012)

**Supervisor:** Konstantinos Kaparis

Bilevel problems appear in areas such as economics, engineering, medicine and ecology. These types of problems are optimisation problems which include as part of their constraints a second optimisation problem. The upper level (or leader's) problem corresponds to our aim to optimise a certain function. The notion of optimality takes into account the subaltern part of the upper-level decisions. This part is represented by the lower level (or follower's) problem. This project concerns the linear case of bilevel programs.

### David Ewing

#### The ABC of model choice

**Degree:** University of St Andrews, MMath Mathematics and Statistics (2008-2013)

**Supervisors:** Dennis Prangle and Paul Fearnhead

While Approximate Bayesian Computation (ABC) is now well-established for estimating parameters, its use for model-choice is still in its infancy. There have been recent papers disagreeing about whether ABC can be used for model-choice, and if so how it should be implemented. This project looks at some simple applications, to see whether ABC can give reliable inferences about the underlying statistical model; and if so, how to implement ABC so as to infer the model as accurately as possible.

### Thomas Facer

#### Choice Modelling with Links to Optimisation and Compressed Sensing

**Degree:** University of Edinburgh, BSc Mathematics (2008-2012)

**Supervisor:** Arne Strauss

In many business applications, frequent decisions need to be made that depend on the choice behaviour of customers. For example, e-retailers such as Amazon.com must decide on the assortment of results to display in response to a customer query; airlines or hotels need to decide on the available booking classes to display in response to a customer request. Similar situations arise for many other firms. An often-used approach to choice modelling is to identify product attributes that influence the customer’s decision and to select and calibrate a structural model based on these attributes that fits the observed data. Recently, an intriguing way was proposed to learn a choice model from data using concepts from Revenue Management, Inventory Optimisation and Compressed Sensing. This project gives an insight into these respective fields whilst working on a topic that is currently at the forefront of research and has wide applicability.

### Liam Fielder

#### Forecasting using time series methods

**Degree:** Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australia) (2008-2012)

**Supervisor:** Robert Fildes

One of the most important applications of statistics is time series forecasting. The key application area is forecasting demand (for a product or service). This project gives an introduction to the area of business forecasting using a newly written textbook. It includes some software testing (and development if appropriate) as well as the evaluation of different methods on test problems.

### George Foulds

#### Portfolio Optimisation

**Degree:** Lancaster University, MPhys Physics First Class (2005-2010)

**Supervisor:** Jonathan Tawn

The aim of any investor is to maximise their return. The return must be maximised for a given amount of risk or, equivalently, the risk must be minimised for a given expected return. A mixture of analytical and simulation-based methods will be used to derive the properties of a portfolio and consequently the weight of investment that is given to each individual asset.

### Kaylea Haynes

#### Compressed sensing methods for problems in statistics

**Degree:** Heriot-Watt University, Edinburgh, BSc Mathematics and Statistics (2008-2012)

**Supervisor:** Matt Nunes

Compressed sensing (CS) has recently emerged as an important area of scientific research for efficient signal sensing and compression. The main idea behind CS is that certain signals can be reconstructed in their entirety, using numerical optimisation algorithms, from a relatively small number of “well-chosen” signal samples. This project is exploratory in nature and it provides the opportunity to learn about and research the area of compressed sensing, focussing on the role of CS in statistical applications for particular signals of interest.

### Clive Newstead

#### Parametric inference for missing data problems

**Degree:** University of Cambridge, BA Mathematics (2009-2012)

**Supervisors:** Giorgos Sermaidis and Paul Fearnhead

A typical complication in parametric inference for missing data problems is the intractability of the likelihood. A well-established approach to maximum likelihood estimation is the simulated likelihood, where estimation is based on the optimisation of an unbiased Monte Carlo estimate of the likelihood. An important drawback, however, is that parameter consistency is achieved only when the Monte Carlo effort increases as a function of the data sample size, thus leading to computationally expensive algorithms. The aim of this project is to tackle this problem by constructing unbiased estimators of the log-likelihood, in which case consistency can be achieved even for fixed Monte Carlo size. The project involves standard techniques for Monte Carlo simulation and unbiased integral estimation and programming in R.

### Ragnhild Noven

#### Analysing the structure of (multivariate) time series

**Degree:** Imperial College London, MSci Mathematics (2008-2012)

**Supervisors:** Karolina Krzemieniewska and Matt Nunes

Time series that are observed in practice are often highly complex in nature, for example, accelerometry signals arising from human movement experiments. The underlying behaviour of these signals is sometimes hidden or difficult to detect in the first instance. This project focuses on applied data analysis for complex time series and using statistical techniques to investigate changes in the underlying structure of time series. The project involves analysing real-world data arising from investigative health studies conducted by external collaborators.

### Robert Stainforth

#### Stochastic actor-based models for network dynamics

**Degree:** Durham University, MSci Mathematics and Physics (2008-2012)

**Supervisor:** Stephan Onggo

A stochastic actor-based model is a model for network dynamics that can represent a wide variety of influences on network change, and allow us to estimate parameters expressing such influences, and test corresponding hypotheses. The nodes in the network represent social actors, and the collection of ties represents a social relation. The project involves reading and summarising the relevant research literature on stochastic actor-based models, learning how to use RSiena, preparing a set of data, and applying the technique to the data.

### Ivar Struijker Boudier

#### Exploring a new class of probability models for tail estimation in extreme value modelling

**Degree:** University of Glasgow, BSc Statistics (2008-2012)

**Supervisor:** Ioannis Papastathopoulos

Statistical modelling of extreme values plays an important role in understanding the behaviour of unusual events such as extreme weather conditions, earthquakes and financial crashes. The most common approach to the modelling of extreme values is to fit an appropriate probability distribution to the tail of the data and extrapolate it to levels above which no data are observed. The class of distributions used is the generalised Pareto distribution, which contains the Exponential distribution as a special case. However, fits to finite samples are not always adequate and more flexible models might be appropriate. The project explores a new class of probability models that incorporates existing models as special cases. The project involves exposure to the theory of extremes, simulation studies for the applicability of the new models and the statistical analysis of a medical dataset.

### Lisa Turner

#### Facility layout

**Degree:** Durham University, MMath Mathematics (2008-2012)

**Supervisors:** Yifei Zhao and Stein W. Wallace

Facility layout, in its simplicity, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor?

## 2010 Interns

Here you can find details of the summer 2010 interns including a description of their research project.

### Helen Blue

#### Optimisation on road networks

**Degree:** Lancaster University, MSci Mathematics (2008-2012)

**Supervisor:** Richard Eglese

There are many problems that involve optimising an objective that is relevant to journey planning over a road network. The first part of the project will be to review some of the existing methods for finding the shortest (or least cost) paths in a network. The second part of the project is to develop an effective algorithm for finding the least cost path between two points where the speed and cost of travelling along an arc depend on the time of day.
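
One well-known existing method for least cost paths is Dijkstra's algorithm. The sketch below is an illustration only, not the algorithm developed in the project: it handles fixed, non-negative arc costs rather than costs that vary with the time of day, and the small `roads` network and its costs are made up.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) for a least cost path from source to target.

    `graph` maps each node to a dict {neighbour: arc cost}; all costs must
    be non-negative. Returns (inf, []) if the target cannot be reached.
    """
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry for an already settled node
        visited.add(node)
        if node == target:
            break
        for neighbour, cost in graph.get(node, {}).items():
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                prev[neighbour] = node
                heapq.heappush(queue, (new_dist, neighbour))
    if target not in dist:
        return float("inf"), []
    # Walk back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical road network: arc costs could be minutes or pounds.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}
print(dijkstra(roads, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

Extending this to time-of-day costs would mean making each arc cost a function of the departure time, which is where the second part of the project begins.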

### Rhian Davies

#### Investigation of Approximate Bayesian Computation

**Degree:** Lancaster University, BSc Mathematics (2008-2011)

**Supervisor:** Dennis Prangle

For many complex phenomena, fitting realistic statistical models is mathematically intractable by standard methods. A recent computational alternative is to repeatedly simulate the model to find good fits. This project investigates one such method (Approximate Bayesian Computation) on data from a Tuberculosis outbreak. The aim is to assess various implementations of this method through computer experiments, which will involve exposure to modern statistical methods and software.

### Jamie Fairbrother

#### Multi-scale methods for texture analysis

**Degree:** University of Warwick, MMath Mathematics (with a year abroad: Europe) (2006-2010)

**Supervisor:** Idris Eckley

Wavelets are a recent and powerful mathematical tool, developed in the 1980s. They provide a novel way of decomposing the information within signals and images, providing information at various scales (you can think of these as viewing windows). Texture analysis is a particular application area in which wavelets have been successfully used in recent years. Broadly speaking, the texture of an image is the visual character of a region whose structure is, in some sense, regular (e.g. the appearance of a woven material). This project will investigate the potential of wavelets and related methods for modelling structure within textured images.

### Dave Grant

#### Time-Dependent Queueing Systems

**Degree:** University of Manchester, MMath Mathematics (2006-2010)

**Supervisors:** Navid Izady and Dave Worthington

In mathematical modelling, modellers often make simplifying assumptions in order to make a problem ‘solvable’. In doing so the modeller is hoping that the solutions produced by the simplified model will nevertheless be valid (in some sense) despite the simplifying assumptions. Important examples of queueing systems include call centres, accident and emergency departments, hospital emergency admission units and intensive care units. Our interest is in modelling aspects of such queueing systems that typically exhibit time of day (and possibly day of week) variations in their underlying arrival rates of ‘customers’ as well as the usual stochastic variation in arrival times and service times.

### Rachael Griffiths

#### Dynamic modelling for wind-prediction

**Degree:** Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australia National University) (2007-2011)

**Supervisor:** Ben Taylor

Dynamic linear modelling is a technique for the analysis of time series data when the governing parameters of the model themselves evolve over time. In particular, it is easy to obtain predictions using these methods. This project concerns the short term modelling and prediction of wind speeds and hence power output at wind farms. The application is important in deciding whether a potential new site will deliver an acceptable amount of energy.

### Dominic Hickie

#### Pricing on-demand online services

**Degree:** Lancaster University, MPhys Physics with Particle Physics and Cosmology (2007-2011)

**Supervisor:** Chris Kirkbride

Cloud computing is a relatively new concept for Internet-based computing in which resources, software, information and applications are provided to user devices (PC, laptop, mobile) on-demand. This project will consider various models for the cloud environment in order to determine how resources can best be utilised to meet demands for service and how to price such services effectively.

### Samantha Hinsley

#### Examining the applicability of a new technique for threshold selection in extreme value modelling

**Degree:** Lancaster University, BSc Mathematics (2008-2011)

**Supervisor:** Jenny Wadsworth

It is the extreme values that are important in many applications, such as flooding, stock market crashes, and wind storms. To estimate the frequency of extreme events, a statistical model is fitted to the extreme values and extrapolated to the value of interest. This project is concerned with investigating appropriate probability models for “extreme values”, or more precisely the tails of a probability distribution. However, there is a challenge in defining what makes a value “extreme”, i.e., from what point should we begin to model the tail? The project will examine the applicability of a new method for helping to define a suitable threshold. This project will involve mathematical computation and exposure to real-life problems using a variety of different data sets.

### Nicola Huxley

#### Detecting changes in mean

**Degree:** Lancaster University, MSci Mathematics (with a year abroad: Australia National University) (2007-2011)

**Supervisor:** Rebecca Killick

In recent work, we collaborated with a company to identify whether there was a change in storminess in the Gulf of Mexico. This project arises out of this work. Detecting changes in properties of a process, such as the mean, is important in many other areas of research such as quality control. Although there are many algorithms designed to detect changes in mean, there has been little comparison of the performance of these algorithms. This project will provide an opportunity to research different algorithms, program them and then conduct simulation studies to test their performances under various circumstances.
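
To give a flavour of what a change-in-mean algorithm does, here is a minimal sketch of one simple approach: choose the single split point that minimises the total squared deviation of each segment from its own mean. This is an illustrative choice, not necessarily one of the algorithms compared in the project, and the data are invented.

```python
def best_single_changepoint(data):
    """Return (k, cost) where splitting into data[:k] and data[k:]
    (1 <= k < n) minimises the summed squared deviation of each segment
    from its own mean."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Prefix sums give any segment's sum and sum of squares in O(1).
    sums, squares = [0.0], [0.0]
    for x in data:
        sums.append(sums[-1] + x)
        squares.append(squares[-1] + x * x)

    def segment_cost(i, j):
        # Sum of squared deviations from the mean of data[i:j].
        total = sums[j] - sums[i]
        return (squares[j] - squares[i]) - total * total / (j - i)

    candidates = ((k, segment_cost(0, k) + segment_cost(k, n))
                  for k in range(1, n))
    return min(candidates, key=lambda pair: pair[1])


# Invented series whose mean jumps from about 0 to about 5 after 5 points.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
k, cost = best_single_changepoint(series)
print(k)  # 5
```

Note that this always returns a split, even when no change has occurred; deciding whether the reduction in cost is large enough to declare a change is the job of a penalty or a hypothesis test.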

### Robert Maidstone

#### The Change-Making Problem

**Degree:** Lancaster University, BSc Mathematics (2008-2011)

**Supervisor:** Adam Letchford

The Change-Making Problem is concerned with finding the minimum number of coins needed, in a given currency, to reach a certain amount. Suppose, for example, you are in Britain and you wish to give somebody 39p. The minimum number of coins needed is five (20p, 10p, 5p, 2p, 2p). If you were in the US and you wish to give somebody 39c, the minimum number of coins is six (25c, 10c, 1c, 1c, 1c, 1c). This topic may seem, at first sight, to belong to recreational mathematics but it is in fact a classical operational research (OR) problem with many applications.
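
The two examples above can be checked with a short dynamic programme. This is a minimal sketch rather than the method studied in the project, and the coin lists `UK_PENCE` and `US_CENTS` are assumptions about which denominations are in circulation.

```python
def min_coins(amount, denominations):
    """Return a shortest list of coins summing to `amount`,
    or None if the amount cannot be made."""
    INF = float("inf")
    # best[a] is the fewest coins needed to make amount a.
    best = [0] + [INF] * amount
    # last[a] is one coin used in an optimal solution for amount a.
    last = [None] * (amount + 1)
    for a in range(1, amount + 1):
        for coin in denominations:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
                last[a] = coin
    if best[amount] == INF:
        return None
    # Trace back through `last` to list the coins actually used.
    coins = []
    while amount > 0:
        coins.append(last[amount])
        amount -= last[amount]
    return coins


UK_PENCE = [1, 2, 5, 10, 20, 50, 100, 200]  # assumed denominations
US_CENTS = [1, 5, 10, 25, 50, 100]          # assumed denominations

print(len(min_coins(39, UK_PENCE)))  # 5
print(len(min_coins(39, US_CENTS)))  # 6
```

The table `best` is filled in order of increasing amount, so the running time grows with the amount multiplied by the number of denominations.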

### Tim Park

#### Non-stationary time series analysis

**Degree:** Lancaster University, MPhys Physics (with a year abroad: North America) (2006-2010)

**Supervisors:** Idris Eckley & Matt Nunes

Most signals (i.e. time series) observed in the real-world are non-stationary in their nature. This project will explore the behaviour of datasets related to financial data. We will investigate the structure of these signals using wavelets - a form of localised basis functions. The project will give an opportunity to learn about wavelets, their application to time series and provide the experience of conducting advanced exploratory data analyses.

### Emma Ross

#### Facility layout

**Degree:** University of Edinburgh, MA Mathematics (2007-2011)

**Supervisors:** Yifei Zhao and Stein W. Wallace

Facility layout, in its simplicity, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor? To do this we shall numerically solve small cases of the problem so as to try to understand the emerging structures (designs).

### Ben Sloman

#### Selecting a portfolio in finance

**Degree:** University of Oxford, BSc Mathematics (2009-2012)

**Supervisors:** Ye Liu and Jonathan Tawn

In finance, the aim is typically to make as much money as possible while incurring as little risk as possible. One way of reducing the risk is to hold a selection of investments (a portfolio). However, because some investments are correlated, statistical methods are required to find the best way of balancing risk and expected return. In this project, you will explore the basic assumption that returns of investments are multivariate normal using a range of financial data and investigate some extensions of this assumption which are more realistic and result in better decision making in optimising the portfolio choice. The project will involve a real problem with real data, and the need for statistical modelling, simulation and optimisation.

### Michael Thistlethwaite

#### Agent-based Physical Asset Maintenance Simulation Modelling

**Degree:** University of Birmingham, BSc Physics (2008-2011)

**Supervisor:** Stephan Onggo

Physical assets such as houses, motorways/roads, water pipes and electrical networks need maintenance because the condition of a physical asset deteriorates with time and usage. The risk of an asset failure (e.g. flooding) or not being able to provide the required service quality (due to weak water pressure) increases as the asset’s condition deteriorates. The cost of a repair/replacement process, including the liability incurred due to an asset failure, can be very high. Therefore, a good maintenance strategy is needed. In this project, we will use one of the least explored OR modelling techniques for evaluating asset maintenance strategies, that is, an agent-based simulation model.

## 2009 Interns

Here you can find details of the summer 2009 interns including a description of their research project.

### Anna Fowler

#### Detecting changes in regression for time series: a review and application

**Degree:** MSci Hons Mathematics with Statistics/North America-Australasia at Lancaster University

**Supervisors:** Idris Eckley and Rebecca Killick

This project aimed to detect changes in regression (trend) in industrial datasets, each including several variables, provided by Unilever. Several existing methods for detecting changes in the regression were investigated, including (normal) maximum likelihood (with and without penalty), residual sum of squares and cumulative sum of squares, before conducting a simulation study of their effectiveness. From this simulation study, the most appropriate algorithm was chosen using statistical methods and, finally, the algorithm was applied to the various industrial datasets. Anna produced a technical report of the findings and had the opportunity to present to statisticians at Unilever in Amsterdam.

Anna is now pursuing a PhD at Imperial College, London.

### Jak Marshall

#### Optimal Control Policies in adjustable queue systems

**Degree:** MSci Hons Mathematics/North America-Australasia at Lancaster University

**Supervisor:** Kevin Glazebrook

Countless industrial processes include some variety of queueing system, for example, telecommunications and transport. Problems regularly arise in how queue operators manage the demand for their services. The challenge is to find an optimal way of allocating resource towards providing service across a collection of independent service stations serving customers in corresponding queues given the delicate balance of overspending on service infrastructure versus underspending and incurring costs due to system neglect. The approach to solving this problem relies heavily on computation and a good understanding of queueing objects in order to simulate an ideal queueing system. The key outcome of this project was to deliver a near-optimal method of managing queueing systems by considering a case study involving queues with only limited modes of service available at any time.

Jak joined STOR-i in 2010 to pursue a PhD in STOR.

### Erin Mitchell

#### Queueing Systems and Optimisation of Computer Component Repairs

**Degree:** BSc Hons Mathematics at Lancaster University

**Supervisor:** Kevin Glazebrook

Repair companies often offer a promise of a turn-around period in which a faulty product will be repaired and returned to the customer, ensuring optimal customer satisfaction. In the majority of cases, the repair company will not complete all of the repairs themselves, if any at all, but will instead outsource the work to several different sub-companies. Upon receiving a broken product, a computer for example, the repair company must then decide to which of its contracted sub-companies to send the machine. Company A, for example, may be a larger, more specialist or better equipped business, and as such may be able to perform a given repair a lot quicker than Company B or C. If a company has a quick turnaround on their repairs, it may be desirable to send more broken machines to them than to the other companies. However, a balance must be struck between using the ‘best’ company and making efficient use of all the resources. With different companies being different distances away from a location (the repair company warehouse, for example), the time taken for travel and dispatch must also be considered. Taking into account all of these different factors, a model can be built in order to decide how many repairs to send to each company. Once a basic model has been designed, different probabilities can be assigned to factors, such as the probability of machine breakdown and machine repair, in order for choices and allocations to be made in the most intelligent manner.

Erin started a PhD at Lancaster University in collaboration with Garrad Hassan in 2009.

### Daniel Suen

#### Graphical modelling of divergence weighted independence graphs in the Criminal Justice System

**Degree:** BSc Hons Mathematics at Lancaster University

**Supervisor:** Joe Whittaker

Graphical models represent the relationships between several variables in graphical form. This project required learning the theory behind divergence weighted independence graphs and the modelling of such graphs using the statistical package R. A key part of the research focused on illustrating how these graphs can be used to identify relationships between factors which affect trust in the Criminal Justice System. On completion of the internship, Daniel produced a comprehensive scientific report including applications using British Crime Survey data.

Daniel joined STOR-i in 2010 to pursue a PhD in STOR.

## Blogs

Click on the links below to see the blogs written by each cohort.

## 2021 Blog

Here you can find out more about the STOR-i internships experience as told by the 2021 STOR-i interns.

**Week 1 written by Sam Bell and Emma Costello**

We’ve just finished the first week of our STOR-i summer internship. It’s been a whirlwind of meeting new people - both our fellow interns and actual MRes and PhD students. After a year of distance learning, it seemed almost natural to meet our supervisors on Teams coffee mornings and hang out with new friends on Zoom quizzes.

To ease us in we’ve had a lot of social events, for example, the Introductory Presentation. All the interns made a 5-minute presentation so we could find out more about each other and compare hobbies and interests. Aside from the York vs Lancaster rivalry flaring up, it was great. A lot of the staff at STOR-i showed up to look at our pet pictures. Emma doesn’t have any pets, so they had to make do with her baby pictures.

We’ve also been doing some R sessions. We had to program a “solution” to the travelling salesman problem which was great fun. Both Emma’s and Sam’s group opted to use a simulated annealing method. It’s been nice to flex our problem solving muscles again after the summer break.

Work for this first week has been pretty light, as recompense for having the internship remotely. I expect next week we’ll be buckling down a lot more. We’ve been assigned some reading to do in order to get a basic understanding of the subjects our PhD level supervisors have given us, which sounds more daunting than it is, as we’ve got them there to help.

**Week 2 written by Kitti Kovacs and Euan McNaughton**

Our second week began with the penultimate introduction to R-programming class. Here we were challenged with producing the best algorithm to play noughts and crosses (or tic-tac-toe). The group was split into three breakout rooms to work together on the challenge in preparation for a battle between the teams on Wednesday to see who had coded the best algorithm. Later in the day we had the opportunity to meet Lesley English, the faculty librarian, who showed us how to use the online library to find books and papers we may need for our projects.

This week also included presentations from some of the MRes students who talked us through their research topics. This was very interesting as it provided a strong idea of the potential research areas on offer at STOR-i.

The battle of the R code occurred on Wednesday morning during the final R-programming session. We began by running each group's algorithm against a ‘random next move’ algorithm (which should be easy to beat) for 1000 matches, to see how each performed. Each group emerged with a good record. Once the groups' algorithms were put head-to-head, one of the groups got completely destroyed by the other two, and so their built-in ‘trash talk’ function luckily remained unused. When the two better groups played each other, they drew every single time! This meant that there was no clear winner, but they had the bragging rights over the group who had prematurely prepared to talk down their opponents in their algorithm.

There was more free time this week compared to last which allowed all the interns to work more with their projects through reading and potentially beginning some coding and tackling problems. This was enjoyable because there was more time to get stuck into our projects and make some real progress with our work.

On Thursday we had our second evening social organized by the MRes students. We played poker this time (not for money, of course). We had some initial difficulties with the online poker site, but this did not stop us from playing a few practice rounds and one super serious round of poker.

The STOR-i forum on Friday finished off the week with some PhD students and their pi-minute theses. This involves the students describing and presenting their research within 3.14 minutes exactly! This was amusing at times because some of the presentations were cut short when they ran out of time.

**Week 3 written by Megan Harries and Itamar Aharoni**

This week, the majority of our time was spent continuing work on our individual projects and having meetings with our supervisors. Our small coffee groups changed over, so it was nice having a chat with different interns and getting to know them more (although we will miss having a daily catch up with each other and Owain).

On Tuesday the introduction to LaTeX course started where we ran through the basics of creating documents and inputting maths into them, led by two PhD students Ben and Kes. It was a nice and gentle introduction, and we especially liked the beautiful beamer template.

Wednesday was quite different to the standard intern day, with a problem-solving session scheduled. One of the STOR-i alumni, Jamie-Leigh, facilitated this; we learnt about some of her work as a Data Scientist at a local NHS trust and were set the question for the day: “How could Statistics, Operational Research, Machine Learning or Data Science be used to improve our patient care?”. This open question gave us a lot of freedom to brainstorm, and we worked in small groups to come up with a potential solution and present it to the other interns. Our group’s solution of integrating incoming A&E data with 111 calls and health tracking technology got the most votes at the end of the day, therefore winning bragging rights.

It’s been fascinating learning about the different research areas within STOR-i through the forums each Friday and the week ended with Luke Mosley presenting his work on economic time series and how different indicators influence GDP in England and Wales. The past 3 weeks have flown by, and it’s surprising to know that we’re halfway through the internship already.

**Week 4 written by Adam Page and Owain Morgan**

We have now finished week 4 which means we are well and truly into the second half of our internship. The week started off with our final Introduction to LaTeX session, where we honed our typesetting skills with a focus on Beamer presentations to help us with our presentations in the final week. The day then ended with an intern-only coffee meeting. These meetings are scattered throughout the week and are a great time to relax and have fun with your peers in the cohort, as well as seeing what everyone else has been up to. I specifically remember Owain, Sam and me spending nearly an hour talking about “Bacon Numbers” and “The Seven Degrees of Separation”, which was surprisingly fun.

Tuesday gave us a chance to listen to what the MRes students had been working on for their dissertation projects. One was about using a range of statistics to help Tesco mark down the price of perishable goods (reduced yellow stickers, if that helps), while another one was about optimising an extended travelling salesman problem with regards to supermarket deliveries. To mirror this, we had a STOR-i forum, where a PhD student presented their work on “Urban Heat Islands” to the whole department.

On top of that we also had meetings with our PhD supervisors and a slightly bigger group of interns - this is a great way to see new people and also get checked up on if something is going wrong or if you need some help.

Adam’s supervisor invited him to an OR reading group, the premise of which is that the group reads a paper beforehand and then discusses it during the meeting. It was really interesting to be a part of, especially listening to the way higher level researchers talk about papers and mathematical ideas.

We have also been doing a lot of work on our own individual projects - it hasn't all just been meetings! We’ve all had the opportunity to get some work done on our projects, especially now we feel like we’ve cracked the nut! To help us have some fun there was a range of social events organised by the MRes students. The first was a virtual escape room (exactly what it sounds like), while the second was an online board game social. We played Pictionary and Codenames; it was really fun to see how some people thought, but mainly to laugh at how hard it is to draw with a mouse instead of a pencil.

Looking forward to seeing what the last 2 weeks of the internship will bring!

## 2020 Blog

Here you can find out more about the STOR-i internships experience as told by the 2020 STOR-i interns.

### Week 1

#### Written by Katharina Limbeck

Summer 2020 was a very interesting time to start working as interns at STOR-i. For the first time, the whole internship was to be held fully online due to the COVID-19 pandemic. Many of us were very grateful this program was still happening but also very new to the concept of working from home. We adapted quickly however, as the first week was full of interesting talks, social activities and engaging introductions to our summer projects.

We got to meet our supervisors on our first day for a virtual lunch meeting, which was a nice introduction to the program. Later this week all interns started reading about our projects and getting to know more about research into statistics, operational research and mathematics. The introductory course on programming in R that we started was especially well organised and a great experience of online learning.

During this first week, we also got to know all our fellow interns and some of the MRes students by doing 4 hours of teambuilding exercises. Just like the virtual coffee breaks scheduled throughout the internship, these activities gave us a chance to connect with other interns while working from home.

Even the social activities that were usually planned throughout the internship were translated to this new virtual environment. Instead of doing a scavenger hunt in person, we did a scavenger hunt at the campus of Lancaster University online using Google Maps. As not all of us had been to Lancaster before, it was an entertaining opportunity to get to know the university from the comfort of our homes. The bonus exercises that asked us to take pictures with masks, make a video of how to wash your hands properly, build the highest toilet paper tower and find the largest stockpile of cans were especially fun. Some of us also participated in a virtual pub quiz, which was a great chance to engage with other students at STOR-i and answer interesting questions related to multiple topics.

On Thursday, we all held a presentation about ourselves, which was mostly an excuse to show cute cat pictures and introduce ourselves to everyone at STOR-i. Overall, we got a very warm welcome to the team and were looking forward to learning more throughout the next few weeks of our internships.

### Week 2

#### Written by Daniel Morton & Taj Patel

We started the week with our second group coding challenge: building an algorithm to play noughts and crosses. We worked together in groups to try and build the best code we could before pitting them against each other on Wednesday. Though it proved more difficult than we initially expected, there was a clear winning group whose algorithm could never lose - to the dismay of my own team. Sadly, there were no prizes for the winners, but they had their bragging rights and it was an exciting way to end the R Programming tutorials.

This week mainly saw interns reading up more on their projects and beginning some deeper analysis on the problems after being able to talk through more ideas with fellow interns and further discussions with our supervisors.

The social event for the second week was “Cards with Cocktails”. The creativity with the cocktails was top tier: I think 2 of us had managed to muster the strength to walk to our respective fridges and grab a cider. As the name suggests, we played a few card games and for some reason Daniel H kept winning “Crazy Eights” so I suspect something fishy was going on. Overall, this was great fun, and a nice opportunity to get to know each other and the other MRes students.

A highlight of the week was the talks given by current students on their respective studies - it is both interesting and helpful to listen to other things people are working on and how they approach the problems that they’ve encountered. Tuesday saw a number of project presentations from MRes students, and Friday's forum was given by Anja Stein who talked about her work on recommender systems while allowing for regular updates using Sequential Monte Carlo methods.

### Week 3

#### Written by Luke Fairley and Daniel Hodgson

The third week eased in nicely with Monday, as with no specific sessions happening, we all got on with our personal projects, aside from a quick weekly meeting with the cohort split into two groups, to discuss how we were finding our projects. On Tuesday, we had our first LaTeX session, introducing us to some of the basic functionality and structure of the document software. One of us also had a supervisor meeting, in which we discussed our progress and issues thus far, and what to do next. That same supervisor then sent a fixed piece of code at around 11pm, a welcome surprise.

Wednesday introduced us to our first problem solving day, during which we were split into three teams, given lots of data on the top 200 streamed songs on Spotify over a period of time, and asked to find the features of songs best correlated with a high number of streams; that is, to build the perfect hit song. We tackled problems such as dealing with slightly messy data, handling strings and words in R when analysing the names of the songs, and other types of data analysis. Some even employed more advanced techniques such as Generalised Linear Models.

We began Thursday with our second LaTeX session, which proved to be more in depth and challenging than the first, quickly advancing to the creation of tables and matrices, and introducing the inclusion of pictures. The rest of the day was characterised by more work on our personal projects, broken up with another weekly meeting, this time in groups of 4. The week wound down nicely on Friday, mostly comprised of more personal work and writing this blog, but also featuring a short forum presented by Tom Grundy, presenting some of his work and research regarding changepoints. His work seemed to be a fusion of changepoint statistics and linear algebra, with the discussion of subspaces and multivariate time series. These ideas were applied to motion capture, where the changepoints being detected were different actions, such as walking or punching.

### Week 4

#### Written by Taj Patel and Daniel Morton

Week 4 was quite an exciting time for us. A lot of us were given reading material to be covered in the first 3 weeks, so it was great to start taking our projects in the directions we wanted, and even start coding some solutions to our respective problems.

Week 4 (thus far) seems to be the UK’s hottest week, with temperatures hitting the low 30s. This meant during any phone call at least one person’s laptop would be a few degrees away from exploding. Every Monday we’d have a group catch-up call with one of the STOR-i leaders (Luke or Jake) to discuss how our projects were progressing and for any questions we might have. This week we’d had a conversation about the mountains and hikes we’ve missed out on, thus I made it my mission to visit Lancaster at least once in my lifetime and do all the trails. Tuesday started off a bit more relaxed: Ed, Matt R and Kim (MRes students) gave talks about their projects, all of which were insightful.

On Wednesday we had our final LaTeX session, in which we learnt how to create a professional academic poster. This was quite an interesting session, as it really demonstrated the power of LaTeX. During this session we also had the opportunity to view/critique some of the posters done by previous MRes students. On this day was also the August OR reading group, which was great to see how (admittedly difficult) papers can be interpreted and discussed in a group to enjoy a deeper understanding.

Thursday was quite the eventful day. Hamish and Peter had organised an “Online Escape room”. You can imagine our faces upon initially hearing the online nature of this event; however, we all found it to be great fun. The escape room consisted of a series of online clues and questions. It’s fair to say Daniel H pretty much carried our team to freedom: he solved one of the riddles that, even after he explained it to me and Peter, we still failed to understand. Our team did take great pride, as we didn’t just beat the other teams to escape but also beat Hamish's previous time.

### Week 5

#### Written by James Boyle

So week five has been and gone. We learnt quite a lot about the ins and outs of doing a PhD this week, in the form of a Q&A session with a couple of PhD students, and a session with the STOR-i co-director about PhD admissions and applications. It was interesting to hear more from the PhD students, and to learn more about the sort of things to consider when deciding whether, where and how to apply for a PhD. In particular it was interesting to hear about the differences in the kinds of jobs one might get after a PhD compared to an undergraduate degree, which is something I at least hadn’t even considered before.

In other events, this week saw the arrival of the second problem solving day. This week was all about how airports allocate usage slots to airlines. Airlines request the use of airport facilities for specific arrival and departure times, and the airport then has to figure out how best to allocate slots in accordance with this, as in general for peak times demand outstrips supply. Obviously, you want to try and minimise the amount by which the slots airlines are given are different from those requested (termed the displacement). We were tasked with finding ways to ensure that slots were allocated fairly (the meaning of which was up to us to decide). As it turns out there’s quite a lot to consider here, such as ensuring that no individual airline suffers significantly more disruption, proportional to the number of flights run by the airline, than any other one, or that no flight gets displaced from the requested arrival/departure times by too much. You’ve also got to ensure that the allocation algorithm can’t be “gamed” by airline companies by, for example, requesting different slots to what they actually want.

As for my project, this week has mostly consisted of a lot of simulation. I’ve recently been learning about various different models for network data, such as brain scans, and this week have been using simulation methods to compare their performance. It can be quite frustrating to have your code spend five minutes simulating a bunch of data to make a graph, only to find out at the end that you misspelt a word in the title and are going to have to run the simulation all over again, but other than that it’s been quite fun.

Anyhow, next week’s social involves a baking competition, so I’m going to have to stop writing this blog now and come up with a way of encapsulating the idea of “lockdown” in a cake…

### Week 6

#### Written by Luke Fairley

Week six was characterised by STOR-i social events and time to work on our individual projects. On Monday afternoon the interns were split into two groups for the weekly meeting. This allowed us to discuss our projects, the Austrian education system, how fortunate it is to have a cat who shows affection and the A-levels U-turn made on Monday, as well as other topics. I cannot speak for the other group, but it is safe to assume that different but similarly varied conversations took place during their meeting also.

On Tuesday morning talks were given by three MRes students, which was an interesting insight into their respective PhDs. Other than that, we had all of Tuesday to work on our personal projects and there was a bake-off social activity during the evening. The theme of the social was lockdown, which allowed for extensive individual interpretation and creativity - including a gingerbread prison and a cake shaped like a loo roll. It was a tight contest, but it was eventually won by Jamie’s gingerbread. Wednesday and Thursday had no timetabled sessions so there was plenty of time for us to work on projects. As we are about to enter the penultimate week of the internship, this time was particularly useful to finish the research portion of our projects and collect some results!

Friday consisted largely of time to work on individual projects but there was also the weekly STOR-i forum, this week given by Jess Gillam. The work presented involved detecting changes in the daily routines of elderly people by analysing the data recorded by various sensors around their home. There is significant research showing that a change in daily routine can be an indication of change in health and wellbeing. The method presented by Jess allowed for the model for an individual sensor, over time, to alter its probability of being triggered given information from surrounding sensors.

On Friday evening, the week was wrapped up by the virtual STOR-i ball, using Zoom. The main bulk of the ball was given by the pub quiz, hosted by Jordan, which covered a variety of topics from general knowledge, to weird trivia about STOR-i staff. This was followed by Jon’s speech and the awards, characterised by banter, quick jabs, and some innuendo for good measure. While a few people left after this, remaining people were randomly sectioned off into Zoom breakout rooms to chat amongst themselves.

### Week 7

#### Written by Jack McGinn

Week 7 went in a similar way to all the other previous weeks. With the end of the internship on the horizon, our thoughts as interns began to turn to how to wrap up our individual projects. Less was scheduled this week in comparison to most to give us time to complete our projects and start making the presentations for the following week. The Monday still had the usual catch up in the afternoon with all the other interns. This was a good opportunity to catch up and see where the other interns were at with their own projects. It was also a good chance to hear about what else they had been doing with their time beyond the project. Tuesday had nothing as a group scheduled, so it was a good chance to really get to work on our projects. I found the fact that the project was ending soon to be particularly good fuel when it came to getting plenty of work done this day.

On the Wednesday was the final problem-solving day of the internship. We were split into three teams and posed with the question “What models could we implement in order to improve the efficiency of A&E?”. This problem-solving day was less orientated around using a data set than the previous two, and more about drafting up ideas and how they can be implemented. It was a very interesting question and it was fascinating to see what problems an A&E department has and how statistics can be used to resolve them. Like the other problem-solving days, it ended with a 10-minute presentation from each team to feed back to the other teams the ideas each had come up with. Thursday and Friday were again mainly used to complete the work of our own projects. For me that meant some extra meetings with my supervisor to discuss the work I had done so far and what might be good to show in the presentation. Friday gave us one last opportunity to have our regular Friday meetings with the other interns. It was a great opportunity to see where the other interns were at with getting ready for their presentations.

### Week 8

#### Written by Adeeb Mahmood and Matthew Speers

This week was very quiet as everyone was putting the finishing touches on the presentations which were to be given at the end of the week to all of STOR-i. However, unlike in previous years, we did not have to make a poster this time, mainly due to the COVID-19 situation. Summarizing 8 weeks of work into a quick and snappy 10-minute talk proved to be quite difficult, although looking at the end product of everyone's presentation we feel everyone summarized it quite well.

To start off the week, we had an operational research reading group where we read a paper together. The paper we looked at was ‘Performance Variability in Mixed-Integer Programming’. It was a good chance to talk about some maths and bounce some ideas around. Another big event that happened this week was our exit interviews. These consisted of a one-to-one talk with either Jake or Luke for about 15 minutes where they asked for improvements to STOR-i, about our future plans, if we can see ourselves joining STOR-i and what we felt like we contributed to STOR-i. It was a good chance to reflect on our last 8 weeks and to think about what we want in the future now we’ve come to the end of the internship experience. We’ve all been able to get a better idea of what research is really like and if it is the right choice for us. We are very appreciative to our supervisors, PhD and MRes students, and anyone else that made the experience what it was. It is greatly appreciated, and we hope everyone enjoyed it as much as we did!

## 2019 Blog

Here you can find out more about the STOR-i internships experience as told by the 2019 STOR-i interns.

### Week 1

#### Written by Joe Holey and Katy Ring

At the weekend we moved into the building in Furness College where many of the STOR-i interns are staying for the duration of the internship. Katy was impressed with the campus environment (particularly the large duck community) having never lived in student accommodation before.

The first day of the internship started with some introductory talks from STOR-i director Jonathan Tawn. We then did some icebreaker activities to get to know the other interns and some of the MRes students better. These included trying to untie a human knot which wasn’t our forte and team juggling which suited our skillset better.

Then we got to the highlight of the day – the buffet lunch! The feast lasted for 2 whole hours and much merriment was had by all.

On Tuesday, we got under way with our work, beginning with an R workshop (more of these followed throughout the week) and meeting with our respective supervisors who introduced us to their work and our projects. Following the day’s work, a few of us joined some members of staff to enjoy a game of football in the sun.

On Thursday, we each gave a 5-minute presentation about ourselves. We were surprised to find that we weren’t just giving these presentations to each other but seemingly to all STOR-i staff members and their extended family. These were the perfect opportunity to see embarrassing baby photos, cute pets and Shyam’s hairstyle woes. After the presentations we went on a scavenger hunt organised by the MRes students, which was a great opportunity to get to know the campus better. There were several opportunities to gain bonus points throughout the event, including getting a photo of someone in your group getting soaked in the fountain by the Great Hall – Joe gladly obliged by sticking his head right in the water. At the end of the scavenger hunt all the groups met up for a barbecue which featured loads more free food as well as some frisbee and more football.

The working week ended with us going to our first STOR-i forum which was presented by Sam Tickle. These are an opportunity for members of staff to share their work with the rest of the department and for everyone else to learn something about a field of STOR that they may not be familiar with.

Finally, some of the interns attended an “Applied Probability Night” on Friday where we were able to test our poker skills against each other.

### Week 2

#### Written by Matthew Darlington and Shyam Popat

After work on Monday there was badminton, which was a good chance to get to know some of the PhD students in a relaxed atmosphere. On Tuesday there was football as usual in extremely hot conditions, and also a meal and pub quiz organised in Lancaster. We split up into three teams; one team won the prize for the most average team and was awarded £10 and a Curly Wurly. We also had the R course where we competed with our code for the travelling salesman problem, with a box of Celebrations for the winners.

During the middle of the week we had a break from the activities which was a good opportunity for us to make progress on our projects. On Thursday afternoon we made R code to play noughts and crosses and then we had a tournament with chocolate again for the winners. There was a bit of controversy with the final results as one team had accidentally made their method overwrite the other team's moves! On Friday we had the second forum by Anja Stein, who talked about her work on recommender systems, followed by tea, coffee and biscuits in the hub.

At the weekend, there was a trip to climb up Scafell Pike. One of the two cars got lost on the way there and ended up going up Hardknott Pass, which is one of the worst roads in Europe! It was a very steep climb up to the top, but once there it started to rain for the remainder of the walk. Hard work but worth it at the end with a pub meal back at the Boot and Shoe.

### Week 3

#### Written by Connie Trojan and Shyam Popat

On Monday, we moved into our new base room in STOR-i, right next to the kitchen and our endless coffee supply. We spent (wasted) some time moving our tables and sofas around to create group working and social areas, including a ‘mini-hub’ where we promptly enforced a mandatory 11am coffee break.

Since we had already learned all there was to know about R, on Tuesday we started an introduction to LaTeX, learning the basics of document and presentation creation. We once again put in an appearance at the pub quiz, where one of our teams tied for second place.

On Friday, we attended a PhD talk from Jess on modelling categorical data. We ended the week with a pub crawl in the city centre, taking full advantage of National Pub Fortnight to claim free pints at the White Cross.

### Week 4

#### Written by Liv Watson

The realisation that we were already coming up to the halfway point of the internship started Monday morning off with a sinking feeling, but then Matt pulled out some homemade banana bread and all was well with the world again. Stressed-out nervous laughs about broken code could be heard in the base room, which showed that we were all wondering how we were going to get our R code working correctly in time.

Tuesday brought talks from some of the MRes students – Chloe, Aimee, Graham, Drupad and Thu – all of which were highly interesting and a welcome break from working on our own projects. The afternoon break was spent figuring out the picture round for the quiz that night, and the celebration that occurred when it was finally figured out was a mighty one. That evening, we decided to take advantage of Study Rooms Tasty Tuesday offer of 50% off mains before heading to the White Cross for the weekly pub quiz. Whilst we didn’t win the overall quiz we did win the most calorific prize they have had – so really who won here!? Still not us.

On Wednesday we had our final LaTeX workshop, where we learnt how to create posters so we would be fully equipped to make our end of project posters. It also featured a viewing of first year PhD students’ posters, where we had to say what we liked and disliked about their posters – all I can say is that I hope they are nicer about ours than we were about theirs!

Friday’s forum was a really engaging talk given by Matt Bold all about his *BIG new scheduling problem* – the decommissioning and safe clean-up of legacy nuclear waste at the Sellafield nuclear site in West Cumbria. That evening a couple of interns headed down to the bar to watch the kick off of the Premier League and to play some pool; I went home and played with the puppy...

Sadly, we had to postpone our planned trip of boating and a picnic in Coniston due to the harsh weather in the Lake District on Saturday. Instead we had a boardgames day in the Hub and left the brave(?) to tackle the storm.

### Week 5

#### Written by Katie Dixon

On Tuesday, a handful of the PhD students held a session titled ‘Life as a PhD student’. This was an opportunity to hear first hand what to expect should we choose to continue our studies as a STOR-i student and it allowed us to ask any questions that we had. In the evening, we managed to put together a team for the pub quiz. The team came in third place and they were only 3 points off winning the whole quiz!

As usual, on Friday we attended the STOR-i Forum but this week there was a twist. Instead of the normal set up (one 30-minute presentation), we were given mini lectures from a range of the STOR-i team where they had to outline their area of study in a maximum of pi minutes. Following the forum, we all attended a bake sale in the hub where the fantastic bakers managed to raise over £150 for Mind.

On Saturday, some of the interns chose to tackle Fairfield Horseshoe. In classic Lake District style, they managed to experience all four seasons in one day – some even happening at the same time! This was followed by a trip to Grasmere where Dylan bought 30 bits of the notorious gingerbread all for himself!

### Week 6

#### Written by Matthew Gorton

We finished Tuesday with a barbecue to celebrate Katy’s birthday. During this, Shyam came up with the innovation of barbecued curly fries, which I think it’s fair to say proved a mixed success. In a last-minute moment of ingenuity, Katie and Liv improvised a birthday cake by sticking a candle into the last remaining bread roll. We managed to get everything cooked and eaten just before it started tipping it down! Four of us, plus Sam, then went to the pub quiz. We were sadly unsuccessful prize-wise, but had fun, nonetheless.

Wednesday was not a normal working day; instead we had a ‘problem-solving day’. Our task, set by Rob Shone, a researcher in Management Science, was to find a solution to the problem of scheduling aircraft at an airport. It turns out that this is a very complicated task!

Airports can only have a certain number of planes taking off and landing within a certain time period (say, per hour). So, airlines bid on arrival and departure times. Our task was to come up with a method to schedule arrival and departure times that minimises the ‘displacement’ – the difference between the time requested and the time assigned.
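To make the idea of ‘displacement’ concrete, here is a minimal sketch in Python. It is not the method any of our groups used, and it is not guaranteed to be optimal: it is just a simple greedy rule, assuming time is split into numbered slots with the same capacity in each. The function names `assign_slots` and `total_displacement` are made up for illustration.

```python
def assign_slots(requests, capacity, n_slots):
    """Greedily give each request the closest slot that still has room.

    requests: list of requested slot indices (one per flight)
    capacity: assumed maximum number of flights per slot
    n_slots:  number of available slots, indexed 0..n_slots-1
    """
    remaining = [capacity] * n_slots
    assignment = []
    for wanted in requests:
        # Order slots by distance from the requested one (ties go to the earlier slot).
        candidates = sorted(range(n_slots), key=lambda s: (abs(s - wanted), s))
        chosen = next((s for s in candidates if remaining[s] > 0), None)
        if chosen is None:
            raise ValueError("not enough capacity for all requests")
        remaining[chosen] -= 1
        assignment.append(chosen)
    return assignment


def total_displacement(requests, assignment):
    """Sum of |assigned slot - requested slot| over all flights."""
    return sum(abs(a - r) for r, a in zip(requests, assignment))


# Three flights want slot 0, but only two fit, so one is pushed to slot 1.
requests = [0, 0, 0, 1, 3]
assignment = assign_slots(requests, capacity=2, n_slots=4)
print(assignment, total_displacement(requests, assignment))
```

A greedy rule like this depends on the order in which requests are processed, which is one reason the real problem is so much harder than it first looks.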

All three groups ended up coming up with different ideas and solutions, and we all thought of other issues that you might need to consider: leaving time for bad weather or emergencies when scheduling, different types of aircraft requiring different turn-around times. Rob told us that we had exceeded his expectations, and he even asked us to send our slides to him for them to look at for ideas! Quite impressive for a bunch of interns only working for a single day!

Friday’s forum was given by Livia Stark. Her work is trying to narrow down sources of information to be used by intelligence agencies in a novel way, which was of particular interest to me as we are both investigating the same technique for solving problems (multi-armed bandits).

Straight after work on Friday, the interns went to Spaghetti House for dinner. We managed to get there early enough to take advantage of their ‘Happy Hour’, giving us pizza or pasta for £5.75. A lovely way to finish the week!

### Week 7

#### Written by Dylan Bahia and Jack Trainer

This week, most of our time was spent trying to knuckle down and get our posters finished so that they could be printed ready for the poster session next week. This meant that most of the week (except the sacred bank holiday) was spent in the STOR-i base room. Of course, we still had our usual half hour on Tuesday morning trying to decipher this week’s clue for the pub quiz and those who attended football on Tuesday had the privilege of seeing STOR-i footballing legend Harjit score his last goal ever. A slow, tough week was all worth it however as we had the opportunity to unwind at the annual STOR-i ball.

The whole department gathered at Lancaster Golf Club for an evening of food, drinks, magic and karaoke. The evening began with a delectable three course meal, accompanied by a small selection of aromatic wines. During this, a magician circulated the tables, flabbergasting us with acts which could be nothing other than sorcery. As the meal became ever more evanescent, Jon entertained us with his light-hearted speech, immediately followed by an awards ceremony. The most notable award was the Tickle Sam award, praising his contribution to the STOR-i department. The mingling of the guests then ensued, with the bar helping to facilitate conversation and creating a night to remember (or in some cases forget). With the highlight of the evening on the horizon, it was time for the guests to muster their brethren and deliver a spectacular karaoke performance. The combination of singing, dancing and laughter birthed a night nothing short of perfect. As the hour past witching hour approached, the guests said their farewells and went home, thus concluding the night. There were some exceptions, who continued the night by sauntering toward the city.

### Week 8

#### Written by Gwen Williams

We started the week making the final edits to our posters, which were intended to summarise our project and findings. We had been warned some STOR-i members had very particular views about poster-appearance, and so making sure our text boxes were correctly aligned was given high priority. On Tuesday, to celebrate submitting our posters, the majority of interns headed into town for the last weekly pub quiz. Although our general knowledge may have let us down, our luck did not, and we managed to win some free drinks.

The rest of the week was spent preparing our presentations. Summarising seven weeks of work in a 10-minute presentation proved challenging; however, feedback from other interns during practice sessions made this a lot easier. On Friday morning everyone gave their presentation, with a brief interlude for coffee, of course. Once the presentations were over we all breathed a sigh of relief and went for a celebratory Go Burrito lunch.

That afternoon was the poster-session, fuelled by a generous spread of sweet treats. We each stood by our poster while members of STOR-i had the opportunity to walk around and ask us questions (or measure how well our text boxes were aligned). This was a great opportunity to talk in some more depth about our research, as well as to say goodbye to members of the department before we all left.

After all the excitement of the presentations and poster session, we were hit by the sad realisation that the internship had come to an end. For a final goodbye, we headed off to a bar, before going back to one of the flats for a big meal. The meal was intended to use up our leftover food before we moved out the next day. Despite having a somewhat unusual set of ingredients, the chefs (thank you) made some delicious tacos. After the meal, we said our goodbyes and wished each other well going back to our different universities. While we were sad to be leaving, we felt very grateful to have spent a fantastic summer as interns at STOR-i and to have been made to feel so welcome by everyone there.

## 2018 Blog

Here you can find out more about the STOR-i internships experience as told by the 2018 STOR-i interns.

### Week 1

#### Written by Peter Greenstreet

I moved into the STOR-i flat on Sunday and within 3 hours I had met 8 of the interns and we started to get to know each other. The Monday began with an introductory talk from Jonathan Tawn, then we had an IT session with Oli who set us up and gave us all laptops! (Sadly only for the 8-week internship.) We all quickly discovered Oli was a master of all tech, as he can sort any problem out. Next up we had a team-building session which began by holding hands and getting knotted together. This was followed by trying to make a square with some rope whilst blindfolded, which ended up having only 2 corners and not one side of equal length. Finally, we finished with a game where we had to call each other different vegetables. It was all a great laugh and I really got to bond with both the interns and the Masters students. Next up was a 2-hour lunch with FREE food where we also got to meet our supervisors. Everyone was super chatty and friendly. Following this was another lecture and we finished with a university tour. After this, some of us headed to the sports centre to play badminton with PhD students, who were really good.

For the next 3 days, we all met with our supervisors to discuss our projects and find out what we needed to learn for the first couple of weeks. We also had lab sessions on both R and LaTeX which helped refresh my memory as well as teaching me new skills in both. On Tuesday night we went to the legendary White Cross quiz night and one of the STOR-i teams even managed to come second!

Thursday started with some more R followed by our introductory presentations which contained loads of cute baby photos. This was followed by a great scavenger hunt. We were split into teams of 4 with 2 interns, a PhD student and a Masters student in each. We were given a list of things to find, as well as challenges all around campus. It was great fun. Then we had a barbecue with lots of food. However, we did lose a sausage to the ducks!

On Friday we had a presentation from Sarah about her PhD project followed by cake and then some more time to work on our projects and meet with our supervisors. That evening some of us went to the gym and the others went to a poker night. On Saturday we went for a 3-hour walk and then bought some famous sticky toffee pudding. It was a great opportunity to get to know some of the MRes and PhD students. Then some of us had a roast that evening followed by our lovely sticky toffee pudding.

### Week 2

#### Written by Mason Pearce

We started the week by working on our second challenge using RStudio in our assigned groups. The challenge was to code a strategy to win a game of tic-tac-toe; we then simulated 1000 games using our strategy against the other groups. It was very close: group 1 drew with everyone, but group 2 won more when playing group 3 and they were the overall victors, receiving a giant Toblerone as their prize. Later in the day we moved over to our new base room and got settled in. In the evening some of us went down to the sports hall to play badminton with the MRes and PhD students.

The next day began with presentations from first-year students on project ideas for their PhD; this gave us a taste for the different areas of research that go on at STOR-i. In the afternoon we were taught how to make beamer presentations and posters in LaTeX to prepare us for the later weeks, and a few of us then went to play football. Later on in the evening we attended The White Cross weekly quiz, splitting into two teams. One of the teams even won a gallon of beer!

We all met with our project supervisors again later in the week and the following days were spent working hard on our individual projects in the new base room. Most of us used the skills we had been taught in RStudio to code what our supervisor had asked us to, whilst some of us used Python as it is a more suitable programming language for the project-related tasks. Although we were working on our own topics, there was plenty of talking, sharing ideas and lending people a hand if they needed it.

On Friday, Sam organised a board game night at Pizzetta on campus. We all attended and a lot of the PhD students came too, which was nice. At the weekend we had planned to go on a trek up Scafell Pike in the Lake District, but due to the weather, we decided to postpone and instead went to escape rooms in the city centre. We were trapped in a jail cell accused of being witches and if we didn’t solve the puzzles to escape we would be ‘left to rot’; we had one hour. At first, we made great time, but towards the end, the puzzles got more difficult and slowed us down. We just managed to escape with only two and a half minutes to spare!

### Week 3

#### Written by Niamh Lamin

Week three was a much quieter week in terms of scheduled academic activities but this gave us all a good chance to get our teeth into our projects. I spent most of the week studying the types of inequalities produced by a program called PORTA for optimisation problems. This involved the production of three or four items with start-up costs associated with machines involved in the production of various sub-sets of these items. Even though my supervisor was away, I found this wasn’t a problem because she was always available by email or phone if I got stuck or needed to ask any questions.

As well as speaking with our individual supervisors, we also had a group meeting on Friday afternoon. As in the previous weeks, I found this meeting really useful as it gave me a chance to explain what I had been working on that week to some of the other interns. As well as giving us all a chance to find out about the interesting projects the others were working on, I found that explaining my progress helped me to consolidate and check my own understanding and provided useful practice ready for the presentations at the end of the programme.

On Friday morning, we had a STOR-i Forum with a difference - rather than just having a presentation from a single PhD student, we were treated to a series of ‘Pi Minute Theses’. Each PhD student had exactly three minutes and fourteen seconds to introduce us to their research topic. I really liked this format as it meant we got to hear about a wider range of different projects and the general introductions were easier to follow than the more detailed presentations of the previous two weeks. All the projects sounded really interesting but I particularly enjoyed Emily’s presentation about Combination Therapies and how information could be borrowed between similar combinations of drugs to decide which ones to investigate in clinical trials.

Even though the academic timetable was slightly less hectic, the social calendar was just as full as normal so there was a lot to entertain us all in the evenings. The weekly badminton and football sessions continued, as well as the pub quiz on Tuesday evening, but there were also some special activities. For example, a group got together to watch the Love Island final on Monday evening and a group of us went for an impromptu ice-cream from Walling’s in Alex Square on Wednesday afternoon - it was the best chocolate-chip ice-cream I’ve ever tasted, though I fear I managed to get more of it down my shorts than actually in my mouth!

To round off the week, there was a bar crawl on Friday evening. I’d never been on a bar crawl before but I actually really enjoyed it. We visited some really nice pubs and it was a great chance to socialise and spend time with some of the MRes and PhD students as well as the other interns (that’s one of the things I love about STOR-i: there are always plenty of chances for integration between year groups, which creates a great atmosphere). We started at the Water Witch and then made our way down into town visiting a series of pubs on a route planned for us especially by Tom and Alan. Though I was really having fun, since I was quite new to these sorts of events, I decided to head home after the third pub, especially since the next destination was one whose name would strike fear into even the bravest of souls (which I most certainly am not) - The Pub!

Anyway, I have it on good authority that everyone made it back safely and the event was definitely a success.

### Week 4

#### Written by Sean Hooker

Week four’s timetable again gave the interns the chance to focus on their projects, with lots of time available for independent research. My project involves identifying points in a time series where there has been an abrupt change in its properties, such as a change in the mean or variance. I spent the past week building on techniques that I had coded previously and developing these into computationally more efficient methods.

This culminated in running my chosen method over multiple simulated time series, all of differing lengths. The main measurement I was comparing was the speed of the algorithms. The code took a little longer than expected to get through all the sets of data but I got a nice looking graph out of it and plenty of ideas for improvement.
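For anyone curious what detecting a change in the mean can look like, here is a minimal sketch in Python. It is not the method from my project: it is a brute-force search for a single changepoint, assuming the series has one constant mean before the change and another after it. The function name `best_mean_changepoint` is made up for illustration.

```python
def best_mean_changepoint(series):
    """Find the split index k that best divides the series into two constant-mean segments.

    The cost of a split is the total squared error of each segment around its own mean.
    Returns (k, cost), where the segments are series[:k] and series[k:].
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# A toy series whose mean jumps from about 0 to about 5 after the fourth point.
series = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
k, cost = best_mean_changepoint(series)
print(k, cost)
```

This version recomputes each segment’s error from scratch, so the work grows quadratically with the length of the series; that kind of cost is exactly what more efficient methods try to avoid.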

I’m beginning to feel accustomed to the weekly activities of STOR-i members. Tuesday was football; there was a good turnout from the interns, as well as the regulars, this week, and after an exhausting 90 minutes the match ended with a close score. Also that evening was the pub quiz at the White Cross pub in town. STOR-i fielded two teams, whose members spent the night answering questions on topics from Pokémon to world records on blowing balloons and pretty much everything in between.

The rest of the week flew by, with the occasional hangman session to break up some of the days in our base room. This week’s edition of the Friday Applied Probability (poker) night was held in the interns’ flat and the home advantage was clear with Mason winning the night.

Saturday was the main event of the week with a hike up Scafell Pike. This had already been cancelled once due to bad weather, and it’s clear why: even on a (mostly) bright and clear day this was a challenge. The entire group, made up of interns and PhD students, made it up and down before daylight fell, but they didn’t quite miss the rain. Still, this provided the group with some picturesque scenes of the mountains and the drizzle. Their impressions of the hike are currently skewed by the mental images of them all climbing down a mountain in the heavy rain, but given time, and a few more warm drinks, they’ll be able to reminisce about what will be the main achievement of the internship so far.

### Week 6

#### Written by James Price

In terms of the project, Week 6 appeared to be a bit of a breakthrough week for a lot of the interns. With the prospect of the presentation and poster session looming, a lot of the work towards the project has taken shape and overall end goals are being achieved.

My project is on finding heuristics for real-time railway rescheduling. I’ve spent the past few weeks exploring various methods for finding the shortest path through various graphs and so last week was spent finding ways to measure both how good the methods were and how long they took to calculate their chosen route through the graph. The results were encouraging and allowed me to observe where certain methods could be refined further. This week saw the addition of a shadowy character to the interns’ base room, the puzzle-maker. This man (or woman) of mystery would leave us a new puzzle every day which, usually after a few hours of head-scratching, led to a piece of paper hidden somewhere in the room containing a five-letter word. The ingenuity of these clues ranged from noticing a blue arrow pattern in a grid of chairs to colouring numbers on a grid according to an extensive set of rules. This all culminated in having to ask a specifically worded question at the Friday forum. And as if by magic, our prize, in the form of a cake, appeared in the base room.

I really enjoyed the puzzles, which the other interns can testify to due to my regular wanderings around the room to peer under a table or on a ledge. However, I discovered quite a few new hiding places which will come in useful should my supervisor unexpectedly turn up asking why I got no work done this week.

The regular White Cross Pub quiz on Tuesday was also a triumph, which due to Sam Tickle’s beautifully obscure knowledge of Enid Blyton’s ‘Famous Five’ novels in the Pointless round resulted in a tidy cash prize for the entire team. I guess you could say we had a wonderful time*.

The week closed with the big social event, the STOR-i ball, this year held at Lancaster Golf Club. This full-on night contained a group Ceilidh, complete with skipping, clapping and of course plenty of spins, and also a wonderful three-course meal, although thankfully not in that order. And then just when I thought it was all over, it turned out the night was only beginning. There was a quick taxi ride into town and before I knew it I was in Hustle nightclub, still in a full suit, having a great time.

I’ve managed to block from my memory the time when I finally got to bed, but if anything that’s the sign of a fabulous night.

*The joke is left as an exercise for the avid reader.

### Week 7

#### Written by Kostya Siroki

The week started with an enjoyable day off. But it didn’t make Monday any less entertaining, as the Murder Mystery Day took place. Three STOR-i teams participated in it. The aim was to find a “murderer” by answering questions, exploring Lancaster and collecting evidence from witnesses. All the participants found this event fascinating. Moreover, good results were achieved. The “Mafiamaticians” won the best costume prize, and team “STORlock Holmes”, containing 2 interns, came second.

For the rest of the week, we were pushing our creativity to the limits so as to produce eye-catching posters. It made us especially collaborative this week due to the regular LaTeX errors and the subsequent necessity to find someone who had already encountered that issue.

A Foosball charity tournament was organized on Tuesday in order to raise money for MIND and also to test out the BRAND NEW FOOSBALL TABLE decorating the hub from now on. Two of the interns participated in the competition as a team and successfully won the first round. Sadly, luck wasn’t on their side in the second game and they lost against the eventual competition runners-up.

Week 7 was enriched with football. We had two wonderful games, on Tuesday and Thursday. Interns led by the MRes students opposed PhD students on Tuesday. After an exhausting 90-minute game the score was 5-5 and so a golden point game began. This time we lost, but next week we will be sure to come back stronger than ever before.

Tuesday was a very busy day as, in addition to the above-stated activities, it also included the pub quiz. Three teams represented STOR-i this time, and every team ended up winning prizes. One of the teams won “The most average team” prize, the other one was the closest when guessing the exact number of “Big Bang Theory” episodes and the last team, but certainly not the least, WON the quiz.

As usual, the week was concluded by the forum. This time the presentation was given by Christian Rohrbeck and of course, it was followed by coffee with cookies.

### Week 8

#### Written by Nicolo Grometto

The final week of the internship has finally begun!

We spent Monday morning making last-minute changes to our posters before sending them off for printing. In the afternoon, we all had a good start on our presentations, trying to condense the results obtained throughout the previous 7 weeks into a ten-minute presentation.

Unfortunately, Tuesday did not see any of the interns attending the weekly pub quiz. Making our posters and slides look pretty in LaTeX and feeling the final day approaching took up a great deal of energy, and almost no interns showed up for the last football session, either. Quite an animated 4-a-side still took place on the field, and the sun shining made it even more enjoyable for those who played.

Wednesday quickly went by, as we spent the whole day making fast progress with our presentations. On Thursday, we had our exit interviews with the Director of STOR-i, Jonathan Tawn. We had the possibility to discuss our experience throughout the internship, as well as the progress we made with our projects in the past weeks. We concluded the day by gathering in groups in the Postgraduate Statistics Centre for rehearsing and giving each other constructive feedback.

And at last, Friday! The day began with an unusual atmosphere at STOR-i, as we were all so excited about showing our work to others, whilst also feeling nervous about having to speak in front of the audience. After rearranging the interns’ office for the afternoon poster session, at 9:45 we started off with the presentations. It was incredible to see how much progress each one of us made during the internship and how well we all managed to present our work.

After a short break, the day continued with the poster display session, which also went exceedingly well. A number of visitors came along to see our work, including the MRes and PhD students, as well as members of staff from STOR-i and the Mathematics and Statistics, and Management Science Departments. We all received positive comments about our research projects, as well as posters, which made us extremely proud and satisfied with our work. We concluded the day with a final meal in town and celebrated our results all together.

On Saturday morning, the time to leave had come. Whilst feeling sad for having to say goodbye to each other, we were all so happy for having spent a fantastic summer at STOR-i and for feeling part of such an inclusive community. Thank you to everyone who worked hard in order to make this happen.

## 2017 Blog

Here you can find out more about the STOR-i internships experience as told by the 2017 STOR-i interns.

### Week 1

#### Written by Callum Barltrop

The week started with an introductory talk from Jonathan Tawn, followed by some team-building exercises. We also got a tour of the STOR-i facilities and were introduced to the saving grace of the organisation - the coffee machine.

On Tuesday, we had our first lectures on R and LaTeX. Many of us also met with our supervisors on this day to discuss our projects and decide what reading to do to familiarise ourselves with the content. In the evening, a bunch of us met in the White Cross for the 'world-renowned' pub quiz. Whilst my team didn't win, we did win a gallon of beer for having the closest guess on the bonus round - how much is a kg of Donkey Cheese? (clue: it's very expensive!)

Wednesday and Thursday were fairly similar, with more lectures and reading. In the intern group, we had all got to know each other fairly well by this point and had started making some plans for over the rest of the internship, including booking out a lecture theatre for Game of Thrones!

Friday was slightly different - instead of the usual lectures, we were set our first group R challenge, which involved a famous old puzzle. We also attended our first STOR-i forum, where we found out about some of the fascinating research being done by one of the PhD students at the organisation. In the evening, a few of us met up for a couple of drinks in one of the bars on campus.

Saturday involved a trip to Grasmere in the Lake District, organised by the one known formally as 'Mr Tickle'. After a long 10 mile hike with plenty of rain, cloud and mud, we bought some incredible gingerbread and stared up angrily at the sky as the sun came out... Just our luck!

Finally, on Sunday, George (another intern) and I met up in the morning for a nice steady run in the sun. Later on in the day, a bunch of us met up at a bar on campus to catch the Wimbledon final, where Federer made the game look easy.

All in all, a fantastic first week at STOR-i, with a lot still to look forward to!

### Week 2

#### Written by Edward Austin

Week 2 began with us comparing who had written the best algorithm to solve a Travelling Salesman Problem. I can confirm, with a score nearly 100 times that of the winning group – Jake, Jonny and George’s code – that it was not my group. Not to be put off with this, though, we set to work on our next challenge – making our laptops play noughts and crosses.

Over the course of the week, this certainly brought out some of the competitors in the group with Jake managing to make a code that simply could not be beaten. Indeed, after playing a million games against random opponents it never lost! This piece of programming mastery led to the hypothesis that caffeine, as tracked on our new caffeine chart, was the secret to his coding success.

Wednesday not only saw my group’s noughts and crosses code crushed by everyone else’s but also saw us finish the LaTeX courses with an introduction to creating posters. This will certainly be of great use to us when it comes to the end of the project!

Thursday was a strange day insofar as we had no scheduled activities and instead were left to work solely on our projects. This could have marked the start of a long and prosperous relationship with RStudio; however, judging from the number of error messages on my screen, this might have to wait a couple more weeks yet! In the evening we attended a board games night with some of the other MRes and PhD students, and fun was had by all.

The following day was our second STOR-i coffee morning with a talk by David Torres Sanchez on optimising aircraft engine maintenance schedule. It was a very enjoyable talk in the sense that not only could we all follow what was being said, but it was delivered with cheerful humour too! Lunchtime then saw Callum entrench his tradition of burritos on a Friday and then in the afternoon we all decided to combine the group meetings into a group presentation where each group member gave a small talk on what they had done that week. This was great as not only was the subject matter interesting, but we all learnt a bit more about the work we were doing and what direction we should head in next too!

At the weekend some of us headed into the Lake District for a walk on Saturday, and others spent time with their girlfriends. On Sunday there was also a cinema trip to watch Dunkirk, a film I cannot recommend highly enough!

### Week 3

#### Written by Jake Grainger and Graham Laidler

With the end of the previous weeks’ coding lessons, we were able to fully sink our teeth into our individual projects. This allowed us to really make some solid progress, gaining a fuller sense of our projects’ complexities.

We all began to make some headway with our projects, and our coding skills went through the roof. Here is a spatial dependency plot that Jake produced. It shows the wave height dependency of different points with the central point, represented using his favourite colours.

On Friday, we enjoyed the usual STOR-i forum. This week, each of 5 PhD students attempted to explain their research in just 3 minutes 14 seconds. With the volume of the buzzer helpfully set to maximum, we were left under no illusions as to when this time was up. Jake had a headache for the rest of the day, but some hot milk and spatial extremes perked him up again. This was another great week on the STOR-i internship, and we are all bonding well.

### Week 5

#### Written by Chloe Fearn and Jonnie Bevan

Monday of week 5 saw the continuation of the after-work Game of Thrones watching tradition that has developed. We don’t watch Game of Thrones but given all the exciting talk since, we think it was a great episode!

On Wednesday of this week, we had to tackle the problem-solving day, which involved finding reasons why a cycling company was receiving less custom. We split off into three teams and spent the day analysing the relevant data and coming to conclusions about what the company could do to boost their customer base. At the end of the day, we presented our findings to the other groups and some of the MRes students; it was interesting to see the different approaches we all decided to go with. It was a fun break from the routine that we have settled into with our projects and gave us a chance to work collaboratively for the first time since the noughts and crosses project in week 2. On Thursday morning we talked to three of the PhD students about what life is like at STOR-i. They were very helpful and answered a lot of questions that we had!

This week’s STOR-i forum involved five pi-minute theses as opposed to the general half-hour presentation of a single thesis. We heard short presentations on a range of topics, and the loud buzzer at the end of each one kept us all on our toes! Afterwards, we headed to The Hub for the usual coffee and biscuits, before we rounded off the week with an afternoon of work (and a bit of Pictionary on the whiteboard).

### Week 7

#### Written by Callum Barltrop

This week started off a little differently from the other weeks of the internship since it was the first week where we did not have anything scheduled! This was to allow us time to work on our project posters and presentations, which we would be presenting the following week.

Whilst working independently can be difficult at times, this week really showed us the difference that having a good working group can make. We regularly took coffee breaks and had chats about how our projects were going, as well as getting second opinions on some of the stuff we were working on. For many of us, this really helped to clarify what we were working on.

On Friday, we had the regular STOR-i forum - this week was a 'Pi Forum' where 5 PhD students had exactly 'Pi' minutes to present their work and the progress they had made.

Saturday began with an early start for some of us as we had decided to go and climb Scafell Pike - the highest point in England! A few of us got a little lost on the way there and spent a fair bit of time driving through farmers' fields (cough cough Graham) but we made it in the end for some rather anticlimactic views. A great day all in all though!

Finally, on the Sunday, George and I once again went out for a gentle run down some of the beautiful country roads around Lancaster, making for some awesome views.

All in all, this week really gave us good experience in what it would be like to work independently on a PhD, as well as how to summarise and present your work in a concise manner!