# Interns and Blogs

The STOR-i Summer Research Internships run for eight weeks, July to September, each year. Every cohort writes a weekly blog on their experience of the programme, and you can read about them here.

## Interns

Click on the links below to see details of the interns by year.

## 2019 Interns

Here you can find details of the summer 2019 interns including a description of their research project.

### Dylan Bahia

#### Investigating the Effect of Dependence on Averaging Extremes

**Degree:** BSc (Hons) Mathematics, University of Manchester

**Supervisor:** Jordan Richards

Extreme Value Theory is often used to model extreme weather events, such as extreme rainfall, which is the main cause of river flooding. Given data at separate locations, we can fit simple models to understand the distribution of heavy rainfall at these single locations. However, flooding is generally not caused by an extreme event at a single location; it is caused by extreme rainfall averaged over several locations, often referred to as a catchment area. This project aims to investigate how the distribution of extreme rainfall at single locations, and the dependence between extreme events at different sites, affect the distribution of the average over all sites. We do this using a subset of data provided by the Met Office, which consists of gridded hourly rainfall across the north of England. Taking each grid box to be a single location, we can fit distributions that model extreme events at each particular location. Dependence between grid boxes can also be quantified using empirical measures. We then average over adjacent grid boxes and fit the same distributions. We are particularly interested in how the parameters of the distributions change as we average over an increasing number of grid boxes, and how the dependence between locations influences this change.

Click here to view Dylan - poster and Dylan - presentation

### Matthew Darlington

#### Optimal learning for multi-armed bandits

**Degree:** BSc (Hons) Mathematics, University of Warwick

**Supervisor:** Livia Stark

Optimal learning is concerned with efficiently gathering information (via observations) that is used in decision making. It becomes important when the way information is gathered is expensive, so that we are willing to put some effort into making the process more efficient. Learning can take place in one of two settings: offline or online. In offline learning, we make a decision after a number of observations have taken place, while in online learning decisions are made sequentially, so that a decision results in a new observation that in turn informs our next decision. This project will focus on learning for multi-armed bandits.

Multi-armed bandits present an online learning problem. They are easiest to visualise as a collection of slot machines (sometimes referred to as one-armed bandits). The rewards from the slot machines are random, and each machine has a different, unknown expected reward. The goal is to maximise one’s earnings from playing the slot machines. That can be achieved by playing the machine with the highest expected reward. However, the expected rewards can only be estimated by playing the machines and observing their random rewards. Therefore, there is a trade-off between exploring bandits to learn more about their expected rewards and exploiting bandits with known high expected rewards.

Click here to view Matthew D - poster and Matthew D - presentation

### Katie Dixon

#### Recruitment to Phase III Clinical Trials

**Degree:** BSc (Hons) Mathematics and Statistics, Lancaster University

**Supervisor:** Szymon Urbas

Clinical trials are a series of rigorous experiments examining the effect of a new treatment in humans. They are essential in the drug-approval process, as per the European Medicines Agency guidelines. In order for a drug to be made available to the public, it must pass a number of statistical tests, each with sufficient certainty in the outcomes. The most costly part of the trials process is Phase III, which is composed of randomised controlled studies with large samples of patients. Patients are continuously enrolled across a number of recruitment centres.

The standard way of modelling recruitment in a practical setting is to use a hierarchical Poisson-gamma (PG) model, as introduced in Anisimov and Fedorov (2007). The framework assumes that the rates at which patients come into each centre do not change over time. The main arguments for using the simple model are the limited data available for inference and its tractable predictive distributions. Recent work by Lan et al. (2018) explores the idea of decaying recruitment rates. However, the proposed model lacks flexibility in accounting for the multitude of recruitment patterns appearing across different studies.

The internship project will concern itself with the analysis of a flexible class of recruitment models in data-rich scenarios. The project will likely tackle an open problem in the area, which is the presence of a mixture of different recruitment patterns appearing in a single study. This will likely involve novel ways of clustering centres based on the observed recruitments. The project will entail a mixture of applied probability, likelihood/Bayesian inference and predictive modelling. There will be a strong computing component in the form of efficient optimisation or simulation methods.

Click here to view Katie D - poster and Katie D - presentation

### Matthew Gorton

#### Investigating Optimism in the Exploration/Exploitation Dilemma

**Degree:** MPhys Physics with Astrophysics and Cosmology, Lancaster University

**Supervisor:** Alan Wise

A stochastic multi-armed bandit problem is one where a learner/agent has to maximise their sum of rewards by playing a row of slot machines, or ‘arms’, in sequence. In each round, the learner pulls an arm and receives a reward corresponding to this arm. The rewards generated from each arm are assumed to be noisy realisations of some unknown mean; therefore, maximising the sum of the rewards relies on finding the arm with the highest mean. We wish to create policies that tell us which arm to play next in order to maximise our reward sum.

The challenge in finding the best arm in the multi-armed bandit problem is the exploration-exploitation dilemma. This dilemma occurs since, at any time point, we need to decide between playing arms which have been played a low number of times (exploration) or the arm with the best estimated mean (exploitation). If the learner explores too much, then they will miss out on playing the optimum arm; however, if the learner chooses to exploit the best arm without exploring other options, then they could end up exploiting a sub-optimal arm. It is clear that the best policies balance both exploration and exploitation. The policies which we will study in this project follow the philosophy of optimism in the face of uncertainty.

These policies work by being optimistic towards options which we are uncertain about. For instance, consider an intern visiting Lancaster University for the first time. For lunch, if they are optimistic about the local food places (Sultan’s / Go Burrito) over chains (Subway / Greggs), then they will be more likely to explore the places that they are more uncertain about. In multi-armed bandits, these policies give each arm an upper confidence bound (UCB) index, which usually takes the form of the estimated mean reward plus some bias, and the arm with the largest index value is played. These types of policies are mathematically guaranteed never to perform badly - but can we do better? This is the major question of this project.
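As a rough illustration of an index of the form "estimated mean reward plus some bias", the sketch below uses the classical UCB1 bonus, `sqrt(2 * log(t) / n)`, on Bernoulli arms. The choice of UCB1, the Bernoulli rewards and the function name `ucb1` are assumptions made for this example; the project description does not say which UCB policies were studied.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Play Bernoulli arms with the UCB1 index and return the pull count per arm.

    arm_means holds the true success probabilities; the policy never sees
    them directly and only observes the simulated rewards.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm has been played
    totals = [0.0] * n_arms  # sum of rewards observed for each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so that each estimated mean is defined.
            arm = t - 1
        else:
            # Index = estimated mean + optimism bonus that shrinks with plays.
            index = [
                totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=index.__getitem__)

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.9], horizon=2000))
```

Arms that have been played rarely receive a large bonus, so they are explored; as an arm accumulates plays its bonus shrinks and the index is dominated by the estimated mean.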

Click here to view Matthew G - poster and Matthew G - presentation

### Joseph Holey

#### Predicting ocean current speed using drifter trajectories

**Degree:** MPhys Theoretical Physics, Durham University

**Supervisor:** Mike O'Malley

The dataset I am using consists of GPS-tracked objects drifting in the sea, commonly referred to as drifters. In summary, the locations of the drifters are processed to obtain quarter-daily longitude, latitude and velocity data. In order to model complex phenomena in the ocean, one of the first pre-processing steps is to remove a large-scale mean velocity of the drifters. In other words, we focus on the residuals from a model which predicts velocity given location. Currently, one of the most popular methods to do this involves binning the data, then extracting a mean in each bin and using this mean as the prediction. This project will aim to develop a better, more accurate method to predict velocity at a given location.
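The binning approach described above can be sketched in a few lines. This is only a minimal illustration: the square bins in degrees, the scalar speed values and the function names `binned_mean_speed` and `predict_speed` are assumptions for the example, not details of the actual drifter processing.

```python
import math
from collections import defaultdict


def _bin_key(lon, lat, bin_size):
    """Map a location to the integer index of the square bin containing it."""
    return (math.floor(lon / bin_size), math.floor(lat / bin_size))


def binned_mean_speed(observations, bin_size):
    """Average the observed speed within each longitude/latitude bin.

    observations is an iterable of (lon, lat, speed) tuples.
    Returns a dict mapping bin index -> mean speed in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, speed in observations:
        key = _bin_key(lon, lat, bin_size)
        sums[key] += speed
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict_speed(bin_means, lon, lat, bin_size):
    """Predict speed at a location as the mean of its bin (None if the bin is empty)."""
    return bin_means.get(_bin_key(lon, lat, bin_size))


if __name__ == "__main__":
    obs = [(0.5, 0.5, 1.0), (0.7, 0.2, 3.0), (1.5, 0.5, 10.0)]
    means = binned_mean_speed(obs, bin_size=1.0)
    print(predict_speed(means, 0.1, 0.9, bin_size=1.0))
```

The prediction is piecewise constant and undefined in empty bins, which is one reason smoother nonparametric regression methods are attractive alternatives.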

The methods you will be focusing on are classed as nonparametric regression; these include spline regression, Gaussian processes, local polynomial regression and more. These methods are generally applied to independently distributed data. One of the more difficult aspects of modelling this type of data is accounting for its non-independent nature. In particular, the next sampled location in a trajectory strongly depends on the current location and the velocity at that location. This sequential sampling can strongly affect model selection, which will be a large part of this project.

Initially, the project will look at modelling a relatively simple toy simulation which I will supply. The reasoning behind this is that we know the true underlying process, therefore the models you fit can be compared to the known ground truth. The method which is found to work best can then be applied to the real dataset with empirical evidence that it works on similar data.

Click here to view Joe - poster and Joe - presentation

### Shyam Popat

#### Point Processes on Categorical Data

**Degree:** BSc (Hons) MORSE, University of Warwick

**Supervisor:** Jess Gillam

The aim of this project is to explore point process methods to model categorical time series, specifically data provided by Howz. Howz is a home monitoring system based on research that indicates changes in daily routine can identify potential health risks. Howz use appliances placed around the house and other low-cost sources such as smart meter data to detect these changes. This data is a great example of how categorical data applies to real-life situations. One potential way of modelling this data is to use point processes.

Point processes are composed of a time series of binary events (Daley and Vere-Jones, 2003). There exist many different point processes that could be useful for modelling this data, such as Poisson processes, Hawkes processes and renewal processes (Rizoiu et al., 2017). The goal of this project is to find ways to model multiple sensors, looking at the time between sensors being triggered to see if this indicates a change in routine. One extension to this project would be exploring the relationship between the categories, and thus having different models for each category. We could also look into subject-specific effects in the data.

Click here to view Shyam - poster and Shyam - presentation

### Katy Ring

#### Detection Boundaries of Univariate Changepoints in Gaussian Data

**Degree:** BSc (Hons) Computer Science / MSc Data Science, LMU Munich

**Supervisor:** Mirjam Kirchner / Tom Grundy

Changepoint detection deals with the problem of identifying structural changes in sequential data, such as deviations in mean, volatility, or trend. In many applications, these points are of interest as they might be linked to some exogenous cause. In the univariate case, the factors impacting on the detectability of a changepoint are well known: size of the change, location of the change, type of change, number of observations, and noise. However, the interplay of these parameters with the detectability of a changepoint hidden within a data sequence is yet to be studied in detail.

In this project, we investigate the reliability of the likelihood ratio test (LRT) statistic for detecting a single change in a univariate Gaussian process. To this end, we will conduct a simulation study testing different settings of the parameters: change in mean, change in variance, location of the change, and sample size. In particular, we are interested in finding parameter combinations for which the LRT becomes unreliable. For example, for a fixed variance, sequence length, and changepoint position in the data, we would decrease the change in mean until we find a region in which the LRT scatters around zero. The overall aim is to derive a surface that splits the parameter space of the LRT statistic into a detectable (LRT > 0) and an undetectable (LRT → ±0) changepoint region. Ideally, as a next step, an explicit relation between the simulation parameters and the detection boundary would be determined empirically. Further experiments on detection boundaries are possible, such as analysing non-Gaussian data or alternative test statistics.
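To make the test statistic concrete, here is a minimal sketch of one common form of the LRT for a single change in mean in independent Gaussian data with known, constant variance. The exact statistic used in the project is not given in the description, so this form, the known-variance assumption and the function name `lrt_change_in_mean` are all assumptions for illustration. For a candidate split after observation `tau`, twice the log-likelihood ratio reduces to `tau * (n - tau) / n * (mean_left - mean_right) ** 2 / sigma2`, and the statistic is the maximum over `tau`.

```python
def lrt_change_in_mean(x, sigma2=1.0):
    """Return (statistic, tau) for a single change in mean, variance assumed known.

    tau is the number of observations before the estimated change.
    The statistic is twice the maximised log-likelihood ratio and is never negative.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(x)
    left_sum = 0.0
    best_stat, best_tau = -1.0, 0

    for tau in range(1, n):
        left_sum += x[tau - 1]
        mean_left = left_sum / tau
        mean_right = (total - left_sum) / (n - tau)
        # Reduction in residual sum of squares from fitting two means instead of one.
        stat = tau * (n - tau) / n * (mean_left - mean_right) ** 2 / sigma2
        if stat > best_stat:
            best_stat, best_tau = stat, tau

    return best_stat, best_tau


if __name__ == "__main__":
    print(lrt_change_in_mean([0, 0, 0, 0, 5, 5, 5, 5]))
```

In a simulation study of the kind described, one would repeat this calculation over many simulated sequences while shrinking the change in mean, and record where the statistic stops separating from its no-change behaviour.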

*How the difficulty of multivariate changepoint problems varies with dimension and sparsity*

Multivariate changepoint detection aims to identify structural changes in multivariate time series. More and more attention is being paid to developing methods to identify multivariate changes, yet little is known about the difficulty of multivariate changepoint problems and how they scale with dimension and sparsity. This work aims to answer the question: ‘If we have a user-defined significant change size, could a change this size actually be detected using changepoint methods?’

In the univariate changepoint setting, it is well understood that there are many factors that affect the detectability of a changepoint, including the size of the change, location of the change, type of change and length of the time series. Moving from a univariate to a multivariate setting adds several layers to the detection problem, including the dimension of the time series and the sparsity of the changepoint.

Recent work at Lancaster University has considered, computationally, the case of a multivariate change in the mean problem where the change size is identical in all dimensions. Under these constraints, a relationship was identified between the size of the change and the number of dimensions that ensures the true and false positive rates remain constant. This project seeks to identify a relationship between the sparsity of a changepoint and the difficulty of detecting it, as well as exploring the problems theoretically to give more justification to the computational findings. If time and interest allow, we will also explore the effect of varying the size of the change in each series on the detectability of the changepoint.

Click here to view Katy R - poster and Katy R - presentation

### Moaaz Sidat

#### Evaluating A Response Adaptive Clinical Trial using simulations

**Degree:** MSci Mathematics, Lancaster University

**Supervisor:** Holly Jackson

Before a new drug can be distributed to the public, it must first go through rigorous testing to make sure it is safe and effective. This evaluation in humans is undertaken in a series of clinical trials. The approach most often used in clinical trials is the randomised controlled trial (RCT), which assigns all patients with equal probability to each treatment in the trial. This equal allocation of patients to each treatment maximises the power of the study, so RCTs are an efficient way to identify whether there is a significant difference between the treatments. However, RCTs do not allow the possibility of changing the probability of assigning a patient to the treatments. If it emerges before the end of the trial that one treatment is clearly more effective than the other, then to maximise the number of patients treated successfully, logic dictates the remaining patients should be allocated to the most effective treatment.

Response adaptive designs use information from previous patients to decide which treatment to assign to the next patient. They vary the arm allocation in order to favour the treatment which is estimated to be best. Multi-Armed Bandits (MAB) are an example of a response-adaptive design. They allocate patients to competing treatments in order to balance learning (identifying the best treatment) and earning (treating as many patients as effectively as possible). One issue with some response adaptive designs is that every patient is expected to produce the same outcome if given the same treatment. However, some patients will have certain characteristics (also known as covariates) which mean they will react to the same treatment differently. For example, an overweight man in his twenties may react differently to a drug than an underweight woman in her eighties.

This internship will focus on a randomised allocation method with nonparametric estimation for a multi-armed bandit problem with covariates. This method uses nonparametric regression techniques (including polynomial regression, splines and random forests) to estimate which treatment is best for the next patient given their particular covariates. The main emphasis of this project is the endpoint. An endpoint could be binary, such as the treatment curing the patient or not; integer-valued, such as the number of epileptic fits in 6 months; continuous, such as a change in blood pressure; or it could be the survival time of a patient.

Click here to view Moaaz - poster and Moaaz - presentation

### Jack Trainer

#### Heuristic procedures for the resource-constrained project scheduling problem

**Degree:** MSci Natural Sciences, University of Bath

**Supervisor:** Matt Bold

The resource-constrained project scheduling problem (RCPSP) is a well-studied problem in operational research. Given a set of precedence-related activities of known duration and resource requirements, and a limited amount of resource, the RCPSP consists of finding a schedule that minimises the time to complete all the activities (known as the project makespan). Solving this problem on a large scale is very difficult. Hence, whilst many exact solution methods exist for solving the RCPSP, these are too slow and therefore largely ineffective at solving this problem on a realistic scale. Therefore, the study and evaluation of fast but inexact procedures (known as heuristic procedures) for solving the RCPSP is critical for real-world application.

Priority-rule heuristics are simple, yet effective, scheduling procedures, consisting of a rule for ordering activities into a so-called activity list representation, and a rule for turning the activity list representation into a complete schedule. This simple class of heuristics forms the basis of many of the most successful heuristic procedures for the RCPSP. This project aims to compare the effectiveness of a number of different procedures from this large subset of heuristics by testing them on a large database of RCPSP test instances, as well as to investigate possible further improvements and extensions to them.

Click here to view Jack - poster and Jack - presentation

### Connie Trojan

#### Approximate posterior sampling via stochastic optimisation

**Degree:** MMath Mathematics, Durham University

**Supervisor:** Srshti Putcha

We now have access to so much data that many existing statistical methods are not very effective in terms of computation. These changes have prompted considerable interest amongst the machine learning and statistics communities in developing methods which can scale easily in relation to the size of the data. The “size” of a data set can refer to either the number of observations it has (tall data) or to its dimensions (wide data). This project will focus on a class of methods designed to scale up as the number of available observations increases.

In recent years, there has been a demand for large scale machine learning models based on stochastic optimisation methods. These algorithms are mainly used for their computational efficiency, making it possible to train models even when it is necessary to incorporate a large number of observations. The speed offered by stochastic optimisation can be attributed to the fact that only a subset of examples from the dataset is used at each iteration. The main drawback of this approach is that parameter uncertainty cannot be captured since only a point estimate of the local optimum is produced.

Bayesian inference methods allow us to get a much better understanding of the parameter uncertainty present in the learning process. The Bayesian posterior distribution is generally simulated using statistical algorithms known as Markov chain Monte Carlo (MCMC). Unfortunately, MCMC algorithms often involve calculations over the whole dataset at each iteration, which means that they can be very slow for large datasets. To tackle this issue, a whole host of scalable MCMC algorithms have been developed in the literature. In particular, stochastic gradient MCMC (SGMCMC) methods combine the computational savings offered by stochastic optimisation with posterior sampling, allowing us to capture parameter uncertainty more effectively. This project will focus on implementing and testing the stochastic gradient Langevin dynamics (SGLD) algorithm. SGLD exploits the similarity between Langevin dynamics and stochastic optimisation methods to construct a robust sampler for tall data.
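The SGLD update can be sketched on a toy problem. The example below assumes a model that is not part of the project description: observations are Gaussian with unknown mean `theta` and unit variance, the prior on `theta` is Gaussian with variance `prior_var`, and the step size is held fixed. Each iteration takes a half-step along a minibatch estimate of the gradient of the log posterior and then adds Gaussian noise whose variance equals the step size. The function name and default values are illustrative only.

```python
import math
import random


def sgld_gaussian_mean(data, step_size=1e-4, batch_size=50, n_steps=2000,
                       burn_in=500, prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean using SGLD.

    Assumed toy model: x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var).
    Returns the samples collected after the burn-in period.
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []

    for step in range(n_steps):
        batch = rng.sample(data, batch_size)
        grad_log_prior = -theta / prior_var
        # Minibatch gradient of the log-likelihood, rescaled to the full data set.
        grad_log_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Injected noise has variance equal to the step size.
        noise = rng.gauss(0.0, math.sqrt(step_size))
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        if step >= burn_in:
            samples.append(theta)

    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    draws = sgld_gaussian_mean(data)
    print(sum(draws) / len(draws))
```

Only `batch_size` observations are touched per iteration, which is where the computational saving over full-data MCMC comes from; the price is extra variance from the minibatch gradient, so the spread of the draws only approximates the true posterior spread.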

Click here to view Connie - poster and Connie - presentation

### Liv Watson

#### Using Pairwise Comparison in Sports to Rank and Forecast

**Degree:** MMath Mathematics, Durham University

**Supervisor:** Harry Spearing

The aim of this project is to develop a ranking system for sports. Defining a ‘good’ ranking depends on the aim. A ranking system that is used to predict future results must provide accurate predictions and could have a complex structure, whereas a system designed to seed players for a tournament needs to be robust to exploitation, fair, and easy to understand. Generally, a system that excels in one of these areas will fail in the other.

It is expected that the focus of this project will be the former, namely, to develop an accurate and robust ranking system. The accuracy of the system can be measured by comparing its predictive performance against existing benchmarks as well as bookmakers’ odds, and robustness can be measured by the ranking’s sensitivity to small changes in match outcomes. A ranking system that is applicable to all sports is, of course, ideal, but sport-specific features will need to be considered to achieve state-of-the-art prediction accuracy, and some general knowledge of or interest in sports will be of use. Initially, simulated data will be used to design the ranking system. Then, once a general framework for the model is established, real data from a sport of the student’s choice will be used to test it. The model will then be tweaked to make use of all the available sport-specific features of the data.

Click here to view Olivia - poster and Olivia - presentation

### Gwen Williams

#### Bid Price Controls for Dynamic Pricing in the Airline Industry

**Degree:** BSc (Hons) Mathematics and Psychology, University of St Andrews

**Supervisor:** Nicola Rennie

In the airline industry, revenue management systems seek to maximise revenue by forecasting the expected demand for different flights, and optimally determining the prices at which to sell tickets over time. Ideally, rather than setting prices at the start of the booking horizon, they should be updated over time depending on how many people have so far purchased tickets and how much time remains until departure. One such method of dynamically pricing tickets is the use of bid price controls. Bid price controls set threshold values for each leg of a flight network, such that an itinerary (a path on the network requested by a potential passenger) is sold only if its fare exceeds the sum of the threshold values along the path (Talluri and van Ryzin, 1998).
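The acceptance rule described above is simple enough to write down directly. In this minimal sketch the leg names, fares and threshold values are made up for illustration, and the function name `accept_itinerary` is hypothetical.

```python
def accept_itinerary(fare, legs, bid_prices):
    """Sell the itinerary only if its fare exceeds the summed bid prices of its legs.

    legs is the sequence of flight legs on the requested path and
    bid_prices maps each leg to its current threshold value.
    """
    threshold = sum(bid_prices[leg] for leg in legs)
    return fare > threshold


if __name__ == "__main__":
    # Hypothetical two-leg network with illustrative bid prices.
    bid_prices = {"LHR-JFK": 300.0, "JFK-LAX": 150.0}
    print(accept_itinerary(500.0, ["LHR-JFK", "JFK-LAX"], bid_prices))
    print(accept_itinerary(400.0, ["LHR-JFK", "JFK-LAX"], bid_prices))
```

Updating the entries of `bid_prices` as bookings arrive and departure approaches is what makes the control dynamic; the rule itself stays the same.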

Bid price controls require forecasts of demand. If demand is not as expected, for example, due to increased sales around the time of major sporting events or carnivals, this results in non-optimal pricing, which leads to a decrease in potential revenue. So far, we have considered the potential gains in revenue when incorrect forecasts are updated under simpler revenue management pricing control mechanisms and found that revenue can be increased by up to 20%. This project will similarly seek to quantify the potential gains in revenue from updating the bid prices when unexpected demand is detected.

Click here to view Gwen - poster and Gwen - presentation

## 2018 Interns

Here you can find details of the summer 2018 interns including a description of their research project.

### Eleanor D'Arcy

#### Estimation of Diffusivity in the Ocean

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Sarah Oscroft

Diffusivity plays an important role in many real world problems, such as recovering missing objects lost at sea or predicting how an oil spill will spread. Specifically, it measures the rate at which particles spread out over time, for instance organisms or sediments transported through water. We can estimate diffusivity using satellite-tracked drifting instruments known as drifters. However, the ocean is highly unpredictable – two particles that start at the same location at the same time can end up following completely different paths to very different locations. This requires a statistical approach for the estimation of diffusivity.

Current techniques for estimating diffusivity provide inconsistent results, so through statistical research we aim to improve these techniques. My project compares some of these different methods and uses them to estimate diffusivity for a part of the ocean using real data collected by the Global Drifter Program. This project applies time series techniques, with a particular focus on spectral analysis. I have used MATLAB to compare different estimators using both simulated and real data before plotting my results.

View Eleanor's Poster - Eleanor.

### Peter Greenstreet

#### Investigating models for potential self-excitation

**Degree:** Lancaster University, BSc Mathematics

**Supervisor:** Zak Varty

This project explores models for which the data points occur randomly in space and time. The aim with this type of data is to model the locations of data points or events, in addition to any information or marks associated with each occurrence. This can be achieved through point process models. The simplest example of this is the homogeneous Poisson process. In the homogeneous Poisson process model, events occur independently at random with a uniform intensity.
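As a small illustration of the simplest model mentioned above, a homogeneous Poisson process in time (ignoring the spatial component and any marks) can be simulated by accumulating independent exponential waiting times. The one-dimensional setting, the rate and window used, and the function name are assumptions for this sketch.

```python
import random


def simulate_homogeneous_poisson(rate, t_end, seed=0):
    """Simulate event times of a homogeneous Poisson process on (0, t_end).

    Waiting times between consecutive events are independent
    exponential random variables with mean 1 / rate.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    events = simulate_homogeneous_poisson(rate=5.0, t_end=200.0)
    print(len(events), "events; expected about", 5.0 * 200.0)
```

Simulated data of this kind give a baseline against which real event data can be compared when checking whether the constant-intensity and independence assumptions are plausible.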

The first aim of the project is to look at methods for assessing whether a given data set satisfies the assumptions of the homogeneous Poisson process model, and to fit the model where they are satisfied. The next aim is to study complex data sets where these assumptions no longer hold, then to use different models which have fewer or weaker assumptions and subsequently assess any improvements in the model fit.

During the project there is a choice of two data sets. The first is about armed conflicts across the globe. The second is about earthquakes above magnitude 1.5 in the Netherlands, where the events are induced by gas extraction from the reservoir below the region.

View Peter's Presentation - Peter and Poster - Peter.

### Nicolo Grometto

#### Clustering On Web-Scraped Data

**Degree:** London School of Economics, BSc Statistics with Finance

**Supervisor:** Hankui Peng

The Office for National Statistics (ONS) are currently experimenting with new data sources to improve the representativeness of the Consumer Price Index (CPI), which is the official indicator for the inflation and deflation rates for the country. Web-scraped data are considered a promising data source that comes in huge volume and can be scraped easily and at high frequency. Therefore, if we could incorporate web-scraped data into the index generating procedure, then price indices could be generated more effectively and at higher frequency.

However, web-scraped data do not always come in a way that can be immediately used for price index generation. The category labels for web-scraped prices usually follow the website categorisation that the data are scraped from, which does not necessarily match the categorisation that is used for the national price index generation. Also, some product information (product name, price, etc.) might be incorrectly scraped, due to the quality of the web-scrapers.

Clustering methods are a useful tool for tackling the aforementioned challenges that come with web-scraped data. The problems that we are interested in include both recognising the main clusters of products, given the web-scraped data, and identifying the incorrectly scraped products. In this project, we will start by exploring the fundamental clustering methods that exist in the literature (k-means and spectral clustering methods, in particular). At a further stage, we will apply these techniques to a web-scraped dataset. Clustering performance evaluation shall be carried out to compare the existing methods, and further extensions to the existing techniques shall be explored.

View Nicolo's Presentation - Nicolo and Poster - Nicolo.

### Cyrus Hafezparast

#### Investigating Trend in the Locally Stationary Wavelet Model

**Degree:** The University of Cambridge, BA Natural Sciences

**Supervisor:** Euan Mcgonigle

Outside of neat theoretical settings, time series are most commonly non-stationary. In fields from finance to biomedical statistics, time series with constant mean and/or autocovariance rarely occur.

Wavelets are a class of oscillatory functions which are well localised in both time and frequency, allowing wavelet-based transforms to capture information in a time series by examining it over a range of time scales. One prominent method for doing so with non-stationary time series is the locally stationary wavelet (LSW) model of Nason et al. (2000). Time series in the LSW model are assumed to be zero-mean. In practice this is rarely the case. Our aim is to explore the behaviour of the model when this assumption is weakened by investigating the effect of different trends on the LSW estimate of the wavelet spectrum.

We also plan to examine the treatment of boundary effects that appear in the wavelet coefficients of data near the end points of the time series. The time series are usually assumed to be periodic; however, this too is a poor assumption in most non-zero mean cases. Our project will attempt to analyse the boundary effects caused by a trend and implement methods to reduce them.

View Cyrus' Poster - Cyrus.

### Sean Hooker

#### Detecting Changes through Transformations

**Degree:** Newcastle University, BSc Mathematics and Statistics

**Supervisor:** Sean Ryan

Changepoint detection relates to the problem of locating abrupt changes in data when the properties of a given time series have changed. This can be extended into finding whether or not a changepoint has actually occurred and if there are multiple changepoints. This area of statistics is hugely important and has many real world applications such as medical condition monitoring and financial fluctuation detection.

The most studied method for detecting changepoints looks at changes in mean within a time series. This is a popular approach due to the fact that changes like these can be detected by transforming the data and then analysing changes in the mean of the transformed data. Other methods which may prove more accurate at detecting changepoints include looking at changes in variance.

My project aims to analyse various methods of identifying changepoints, whilst studying the advantages and limitations of each approach. This involves the construction and evaluation of numerous algorithms which are used to detect changepoints.

### Niamh Lamin

#### Optimisation Problems with Fixed Charges Associated with Subsets

**Degree:** Lancaster University, MSci Natural Sciences**Supervisor:** Georgia Souli

Optimisation problems appear in a wide range of applications from investment banking to manufacturing. They involve finding the values of a number of decision variables (for example, the amounts of different products that should be manufactured) to maximise (or minimise) a particular objective function (for example, profit), subject to a number of constraints. In many situations, the value of one or more of the decision variables must be an integer to give a feasible solution. These are called Mixed Integer Programs (MIPs).

The particular focus of my project is cutting planes. These are inequalities which are satisfied by all the feasible solutions to the MIP but not by all of the solutions that would be feasible if we ignored the integer constraints. The aim is to investigate different cutting planes in problems where we have fixed charges associated with subsets. In these problems, we have a set of continuous variables whose sum is bounded. We also have subsets of variables defined such that, if any variable in that subset takes a positive value, then a fixed charge is incurred. For example, the variables may represent the amounts of various items to be manufactured and the fixed charges would be start-up costs associated with machines involved in the production of subsets of these items. Cutting planes can be used to remove infeasible solutions to the MIP to focus in on the feasible region and hence the optimal solution to the problem.
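One possible way to write such a problem down is sketched here. All of the notation is assumed for illustration and is not taken from the project: $x_i \ge 0$ is the amount of item $i \in N$, $b$ bounds the sum, $S_1, \dots, S_m \subseteq N$ are the subsets, $f_k$ is the fixed charge for subset $S_k$, $c_i$ is the unit profit, and $y_k$ is a binary variable recording whether the charge is incurred.

```latex
\begin{aligned}
\max\quad & \sum_{i \in N} c_i x_i - \sum_{k=1}^{m} f_k y_k \\
\text{s.t.}\quad & \sum_{i \in N} x_i \le b, \\
& x_i \le b\, y_k && \text{for all } i \in S_k,\ k = 1, \dots, m, \\
& x_i \ge 0,\quad y_k \in \{0, 1\}.
\end{aligned}
% A simple valid inequality for this formulation:
% \sum_{i \in S_k} x_i \le b\, y_k
```

Under these assumptions, the inequality $\sum_{i \in S_k} x_i \le b\, y_k$ holds for every feasible solution of the MIP, yet it removes fractional points of the relaxation. For instance, with $S_k = \{1, 2\}$, the point $x_1 = x_2 = b/2$, $y_k = 1/2$ satisfies the constraints above once integrality is dropped but violates the inequality.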

### James Mabon

#### Modelling the behaviour of Kepler light curve data with the aim of exoplanet detection

**Degree:** Warwick, BSc Mathematics**Supervisor:** Alexander Fisch

Many exoplanets are detected via the so-called transit method. This involves measuring the luminosity of a certain star at regular time intervals to obtain graphs known as light curves. A regular, short, sharp dip in luminosity could be caused by an exoplanet passing in front of the star. This sounds simple in theory, but in reality there is lots of random noise, and the signal induced by planetary transits is very weak (even a planet the size of Jupiter reduces the luminosity of the sun by only 1% during a transit).

In order to remove some noise caused by phenomena such as sun spots, NASA preprocesses their data to produce a so-called whitened light curve. However, their current method introduces complications and affects the signature of the transits, which makes the detection of the planets from the whitened data much harder.

My project will be focused on modelling the data in such a way as to not distort the transit signals. So far I have been using R to remove dominant sine waves from the data and will go on to investigate periodicity and autocorrelation within the data.
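The project itself uses R; purely as an illustration of what removing a dominant sine wave can look like, here is a small Python sketch. It scans the Fourier frequencies, picks the one with the largest amplitude, and subtracts the fitted sinusoid. The function name `remove_dominant_sine` and the synthetic light curve are assumptions for this example and do not reproduce the project's code or the Kepler data.

```python
import math


def remove_dominant_sine(xs):
    """Subtract the strongest single Fourier-frequency sinusoid from xs.

    Returns (residual, j), where j is the number of cycles the removed
    sinusoid completes over the length of the series."""
    n = len(xs)
    mean = sum(xs) / n
    centred = [x - mean for x in xs]
    best_j, best_a, best_b, best_power = 0, 0.0, 0.0, -1.0
    # Frequencies strictly between zero and the Nyquist frequency.
    for j in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * j / n
        # Least-squares cosine and sine coefficients at this frequency.
        a = 2 / n * sum(c * math.cos(w * t) for t, c in enumerate(centred))
        b = 2 / n * sum(c * math.sin(w * t) for t, c in enumerate(centred))
        power = a * a + b * b
        if power > best_power:
            best_j, best_a, best_b, best_power = j, a, b, power
    w = 2 * math.pi * best_j / n
    residual = [
        x - best_a * math.cos(w * t) - best_b * math.sin(w * t)
        for t, x in enumerate(xs)
    ]
    return residual, best_j


# Synthetic example (hypothetical): constant brightness plus one sine wave.
light_curve = [10 + 3 * math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
residual, cycles = remove_dominant_sine(light_curve)
print(cycles)
```

Repeating the step on the residual would strip further sine waves one at a time. Whether this leaves a transit dip undistorted depends on the data, which is the kind of question the project investigates.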

View James' Poster - James M.

### Mason Pearce

#### Allocation of limited number of assets

**Degree:** Lancaster University, MSci Mathematics and Statistics**Supervisor:** Stephen Ford

Having just completed my third year at Lancaster University and considering doing a PhD, I found the STOR-i internship a great way to gain an insight into PhD life. The project I have been assigned is to do with assigning limited assets to a dynamical system. The problem that arises is that if we choose to deploy an asset in the present it can’t be used later, but it may be more rewarding to use it in the future. We wish to deploy the assets so that the reward gained is optimal. To do this, we use dynamic programming, which starts from the end and works backwards to the start, optimising in stages. This doesn’t always yield an optimal solution, but assuming certain properties of the system it will. The task at hand is finding the optimal policy, where a policy is a mathematical way to decide what decision should be made in the present given the current state of the system.
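To make the backwards recursion concrete, here is a minimal sketch under a deliberately simple model that is assumed for illustration only: in each period one reward opportunity appears, drawn with equal probability from a known list, and we either spend one asset to collect it or hold the asset for later. The function name `solve_deployment` and the numbers are hypothetical and do not describe the system studied in the project.

```python
def solve_deployment(horizon, assets, rewards):
    """Backward induction for a toy asset-deployment problem.

    value[t][a] is the expected total reward from period t onwards with
    a assets left. threshold[t][a] is the smallest reward worth spending
    an asset on in period t with a assets left."""
    value = [[0.0] * (assets + 1) for _ in range(horizon + 1)]
    threshold = [[0.0] * (assets + 1) for _ in range(horizon)]
    # Start from the final period and work backwards to the first.
    for t in range(horizon - 1, -1, -1):
        for a in range(1, assets + 1):
            # Future value lost by using an asset now rather than keeping it.
            threshold[t][a] = value[t + 1][a] - value[t + 1][a - 1]
            # Average over the possible rewards of the better of deploy and hold.
            value[t][a] = sum(
                max(r + value[t + 1][a - 1], value[t + 1][a]) for r in rewards
            ) / len(rewards)
    return value, threshold


value, threshold = solve_deployment(horizon=2, assets=2, rewards=[1.0, 3.0])
print(value[0][1], threshold[0][1])
```

In this toy model the resulting policy is a threshold rule: in state (period, assets left) deploy when the reward on offer is at least `threshold[t][a]`. With one asset and two periods, that means passing on a reward of 1 in the first period because waiting is worth 2 on average.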

### James Price

#### Heuristics for Real-time Railway Rescheduling

**Degree:** University of Bath, MMath Mathematics with Industrial Placement**Supervisor:** Edwin Reynolds

In railway networks, a single delayed train can delay other trains by getting in their way. This is called reactionary delay and is responsible for over half of all railway delays in the UK. Railway controllers therefore have to make decisions in real-time that minimise the amount of reactionary delay. Such decisions include ‘should I cancel a train, and if so which one?’ and ‘which train should leave the station first if they can only go one at a time?’ There currently exist algorithms that can find the optimal solution to these problems. However, the amount of computational time required to run these algorithms, especially on a large network, makes solving these problems in real-time infeasible. An alternative approach is to use a heuristic, which solves the problem with a lower degree of accuracy but produces an answer in much less time. My project involves developing multiple heuristics, comparing their advantages and limitations and deciding on a final idea.

View James' Presentation - James P and Poster - James P.

### Konstantin Siroki

#### Preventing overfitting in Natural Language Processing

**Degree:** The University of Manchester, BSc Mathematics**Supervisor:** Henry Moss

Natural Language Processing (NLP) allows computers to understand human speech and writing. The standard approach in NLP is to fit the model in a way that avoids relying on features over-represented in the sample (known as overfitting). There are two methods for doing this: regularization and term-frequency weighting. There is no clear consensus on which method is best. The project’s aim is to investigate the relationship between these two approaches, alongside tests across a range of NLP tasks.

View Konstantin's Presentation - Konstantin and Poster - Konstantin.

## 2017 Interns

Here you can find details of the summer 2017 interns including a description of their research project.

### Edward Austin

#### Modelling Risk in Hazardous Material Transport

**Degree:** Lancaster University, BSc Mathematics**Supervisor:** Chrissy Wright

The risk of an accident is an important factor to consider when transporting hazardous materials. Because accidents can be deadly, the route with the least overall risk should be chosen. This project looks at how best to model the risk to enable safer routes to be taken.

### Callum Barltrop

#### Investigating bias in return level estimates due to the use of a stopping rule

**Degree:** Lancaster University, BSc Mathematics**Supervisor:** Anna Barlow

There are many situations in which rare and extremely large (or small) events are of interest. For example, the focus of my project is the statistical modelling of extreme flood events. Extreme Value Theory is concerned with the modelling of the tails of the distribution and provides a theoretically sound framework for the study of extreme values. In particular, the Generalised Extreme Value distribution is used to model the maxima of a process within blocks of time (often a year). Usually, we are mostly interested in estimating the x-year return levels of a distribution, that is, the value we'd expect to be exceeded on average once every x years. However, the point at which we decide to stop sampling and analyse the data is not arbitrary and this choice of stopping point can result in biased return level estimates. After the December 2015 floods there was much interest in re-evaluating the return level estimates, as the inclusion of such a large event often led to significant changes in the value of these estimates. In this project, we will consider possible ‘stopping criteria’ (i.e. rules that tell us when to stop sampling data and do our analysis) to approximate the procedures used in reality and investigate the bias in the standard estimates. We will implement a variety of new estimators developed with the intention to improve upon the existing standard methods.
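For reference, the x-year return level mentioned above has a closed form under the Generalised Extreme Value distribution. The symbols $\mu$ (location), $\sigma > 0$ (scale) and $\xi$ (shape) are the usual parameter names and are assumed here, as is taking blocks of one year; they are not notation from the project.

```latex
G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\},
\qquad 1 + \xi \left( \frac{z - \mu}{\sigma} \right) > 0 .

% The x-year return level z_x solves G(z_x) = 1 - 1/x:
z_x =
\begin{cases}
\mu - \dfrac{\sigma}{\xi} \left[ 1 - \left\{ -\log\left( 1 - \tfrac{1}{x} \right) \right\}^{-\xi} \right], & \xi \neq 0, \\[2ex]
\mu - \sigma \log\left\{ -\log\left( 1 - \tfrac{1}{x} \right) \right\}, & \xi = 0 .
\end{cases}
```

In practice the return level estimate is obtained by plugging estimated parameters into this expression, so any bias in the parameter estimates caused by a stopping rule carries through to the estimated return level.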

### Jonathan Bevan

#### Time series Classification

**Degree:** Lancaster University, MSci Mathematics and Statistics**Supervisor:** Harjit Hullait

The internship project will be focused on time series classification, an area that has applications in various fields. The idea is to build a classifier, which is able to label a time series from a defined list of possibilities.

For example, if we have heart rate time series for people walking and people running, we have two labels: runner or walker. There are two main challenges in classification: firstly, a set of labels needs to be chosen, and secondly, a classifier needs to be built that can label the time series.

### Chloe Fearn

#### Analysis of Armed Conflict Data

**Degree:** Lancaster University, MSc Mathematics**Supervisor:** Christian Rohrbeck

The Armed Conflict Location & Event Data Project (ACLED) has aggregated the exact location, date, and other characteristics of several violent events in unstable and warring states. The analysis of this data is challenging due to the vast number of factors influencing such events. Koren and Bagozzi (2017, Journal of Peace Research, 54(3)) find, for instance, that, in times of war, violence against civilians occurs more frequently in areas with a high percentage of cropland. This result is derived from a zero-inflated model which accounts for armed conflicts not being present in all areas at all times. The proposed project considers the publicly available data and aims to slightly extend the model by Koren and Bagozzi (2017), for instance, by accounting for the spatial aspect of the data. In particular, the project can be split into three steps: (i) exploratory analysis of the data, (ii) estimation of a model similar to the one by Koren and Bagozzi (2017) and (iii) extending the model.

### Jake Grainger

#### Assessing the Use of Spatial Models for Extremes

**Degree:** Lancaster University, MSc Mathematics**Supervisor:** Rob Shooter

Being able to model spatial extremal behaviour (in particular spatial dependence) is an important area of Extreme Value Theory and this project will aim to give an introduction to the various methods of trying to capture this behaviour. The first part of this project will provide a short introduction to univariate extreme value theory and will also look at some methods of spatial statistics, in particular Gaussian Processes, which will be simulated and have interpolation methods performed on them. The second part will introduce the Smith process (a particular type of max-stable process) and will compare this to using Gaussian Process techniques on data, with the aim of comparing how well the two types of spatial model are able to describe the nature of the data.

### Graham Laidler

#### Sequential Changepoint Detection: Anticipating the next Financial Crash

**Degree:** Durham University, MMath Mathematics**Supervisor:** Sam Tickle

Changepoint detection underpins virtually all questions of interest surrounding data analysis in a variety of contexts. Understanding the nature of a change, and when it occurred, is often of vital importance in preventing problems surfacing in the future. With the advent of Big Data, more sophisticated tools are increasingly required to search for changes on datasets of ever-growing size. Most existing methods for changepoint detection are offline, requiring the collection of an entire dataset prior to analysis, and interest in online techniques, where informed statements regarding changes of the recent past can be made in tandem with data collection, is growing.

This project will examine various existing methodologies which employ an online approach to changepoint detection, both Bayesian and frequentist, and attempt to apply these ideas to real-time datasets (for example, share price data for various FTSE100 companies) in order to find the best performing algorithms which can operate most efficiently in the greatest number of contexts. Depending on specific interests, this can involve exploring prior selection, investigating various ‘control charts’ or using likelihood-based approaches among other options. There is also potential scope in helping to pioneer entirely new techniques which can then be tested against some of the existing methods.

### George Phillips

#### Combination therapies: improving outcomes via the probability of success

**Degree:** University of York, MMath Mathematics**Supervisor:** Emily Graham

Combination therapies are able to hit the many mechanisms of diseases/cancers simultaneously by combining existing drugs and new molecular entities. When developing a combination therapy, the aim is to produce a synergistic effect while reducing side effects. However, drug development is a long and expensive process which is subject to a considerable amount of uncertainty. Therefore it is important that the decisions made are well informed and are expected to be the most beneficial to both the pharmaceutical company and the patient population.

Methods for decision making often require several parameters relating to a drug. We are interested in the estimation of the probability of study success for combination therapies. Current methods do not allow information to be shared across similar combinations. We believe that incorporating this information in a Bayesian setting will improve the accuracy of our estimates. This will lead to better decision making and improve the outcomes of the development programmes.

### Harry Spearing

#### Simulation Optimisation Techniques for Time-Dependent Staffing Problems

**Degree:** Lancaster University, BSc Physics**Supervisor:** Luke Rhodes-Leader

In many real-world problems, such as complex queueing problems, mathematical models of the system can be too complex to solve analytically. An alternative way to study stochastic systems is to use a simulation to produce realisations of the system. Simulation can be used to optimise a system by testing alternative settings. The choice of optimisation technique depends heavily on the properties of the problem, such as the size of the solution space, how many objectives there are and whether the decision variables are discrete or continuous. Due to the stochastic nature of the problems, the optimisation is further complicated as the objective must be estimated, rather than evaluated exactly. This project will focus on finding simulation optimisation techniques appropriate for the optimal staffing problem for a time-dependent queueing system, such as that of an emergency call centre.

### Livia Stark

#### Executing Offshore Maintenance Activities

**Degree:** Lancaster University, MPhys Physics**Supervisor:** Toby Kingsman

At the start of the internship time will need to be spent learning about the general offshore maintenance problem and literature associated with it. This could be simpler sub-problems such as the travelling salesman problem, travelling repairman or scheduling of tasks. Depending on the student’s knowledge of linear programming and coding, time could be spent trying to implement one of these models on the computer.

The goal of the project is likely to be creating some simple construction heuristics to solve the offshore maintenance routing and scheduling problem. These could be extended to more general problems depending on the student’s interest, e.g. several vessels or tasks completed in stages. The performance and results of these heuristics could be compared across several instances.

### Jinran Zhan

#### Scheduling using Optimisation

**Degree:** University of Southampton, BSc Mathematics and Statistics**Supervisor:** David Torres Sanchez

The project will focus on one of the main optimisation scheduling problems: project planning. This refers to the scheduling of the different activities that need to be completed for a given project. It is heavily conditioned by the specifications of the resources and activities, making the problem really interesting for mathematicians. In this project we will be focussing on understanding the so-called resource-constrained project scheduling problems (RCPSP). The generality of the RCPSP allows it to have a wide range of applications where the aim is to schedule some activities or jobs over a period of time such that precedence and resource constraints are satisfied, and a certain objective function is optimised. Depending on the student’s knowledge of linear programming and optimisation, we can study the varied formulations or, if the student is already familiar with them, we can jump straight into the pre-emptive case for long-term planning horizons. Either of these ties in with testing in Python using Gurobi, which will be learnt if needed.

## Associated Interns 2017

### Waseem Aslam

#### Management Science Intern

Waseem joins the STOR-i Internships from the Lancaster University Management Science department.

## 2016 Interns

Here you can find details of the summer 2016 interns including a description of their research project.

### Stefanos Bennett

#### Regression with Dependencies and Non-Gaussian Noise

**Degree:** University of Cambridge, BA Mathematics**Supervisor:** Stephen Page

The linear model is a widely used tool in regression analysis. Linear regression models are most commonly fitted using the least-squares approach, which is both conceptually and computationally simple. A frequently made assumption in linear least squares regression is that the error terms between the observed responses and the corresponding expected values are independent and identically distributed normal random variables. This assumption greatly simplifies the matter of obtaining confidence intervals for the unknown parameters of our model. However, whether this is a sound assumption depends on the size and nature of the particular dataset under consideration. This project will investigate the case when the assumption is not satisfied. Various techniques for obtaining confidence sets will be examined and compared to the sets obtained via normal approximation. The effects of different possible violations of the Gaussian assumption on the constructed confidence sets will be investigated.

### Matthew Bold

#### Input Uncertainty in Simulation Models

**Degree:** University of Birmingham, MSci Mathematics**Supervisor:** Lucy Morgan

Simulation uses mathematical modelling in order to mimic real-world systems which cannot be tested in reality, perhaps due to time, cost or safety constraints. The information gained by running the simulation can then be used to make decisions about the real-world system. For example, retailers want to ensure they have enough servers to prevent customers from having to queue for long periods of time. A simulation model can be used to understand how the queue behaves and make a decision about how many servers are needed for each shift in order to keep the queue length below a certain level. The inputs in simulation models are usually approximated by observing real-world data; for example, observing the number of customers that are served in a shop over a period of time. Input uncertainty arises from the fact that we only have a finite amount of real-world data, and therefore cannot be certain that the values of the input parameters that are being used to drive the simulation are the true values of the input parameters. This project aims to quantify the input uncertainty in a queueing simulation model.

### Bronwen Edge

#### How good is the Lancaster University Mathematics Department? – An investigation using Data Envelopment Analysis

**Degree:** BSc Mathematics, Heriot-Watt University**Supervisor:** Emma Stubington

Each year university league tables are released, but many are based on different criteria and have slightly different results. We are interested in testing the efficiency and productivity of mathematics departments across the country. As we are considering multiple inputs and outputs (student satisfaction, entry requirements, academic and career attainment, the cost of university, etc.), it is difficult to make direct comparisons between institutions. We therefore need to use a management science method, Data Envelopment Analysis (DEA), which can cope with lots of constraints. What I am finding particularly interesting are the additional questions that arise from examining the data and implementing this approach, for example: Should universities that produce high numbers of good degrees be considered the best? Are some students not reaching their potential and being let down by their institution, given they entered university with extremely high entry requirements? Are some universities awarding an unrepresentative number of good degrees considering their place in current league tables, or is the data just extremely biased with a small sample size? Should all universities be charging the same fees, when the career opportunities they lead to afterwards can be significantly fewer? Is university location skewing the career prospects of students, whilst not taking into consideration the living costs and average salary of non-graduates of some locations? As my project advances, I have realised that what seemed like a simple linear programming problem evolves into a complex social and economic issue, which questions the real cost to students when choosing which university is best for them.

### Thomas Grundy

#### Detecting Match-Fixing in Tennis

**Supervisor:** Oliver Hatfield

In January 2016, tennis was hit by allegations of widespread match-fixing prompted by the release of secret documents from reviews into tennis’ integrity. The documents detailed widespread accusations of corruption within the sport. The aim of the project is to create simulations of tennis matches and explore sudden changes in performance, which could be linked to match-fixing, using simple change point methods. Features such as dependence and the importance of critical points will also be taken into account to create accurate simulations. In addition, the current rating system within tennis only takes into consideration the previous year's results and takes no account of the strength of opponents. A further aim of the project is to create a rating system based around the Elo system with improvements.
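For readers unfamiliar with it, the standard Elo update can be sketched in a few lines. This shows only the textbook rule, not the improved rating system developed in the project; the K-factor of 32 and the starting ratings are assumed values chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Elo's predicted score for player A (between 0 and 1) against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, score_a, k=32):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1 if A won and 0 if A lost. k controls how far a single
    result moves the ratings (32 is an assumed, commonly quoted value)."""
    change = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change


# Two equally rated players: the winner gains exactly what the loser drops.
print(update_ratings(1500, 1500, 1))
```

Because the size of the update depends on the expected score, beating a higher-rated opponent earns more points than beating an equal one, which is how the strength of opponents enters the rating.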

### Ben Miller

#### Detecting Unwanted Variation in Time Series

**Supervisor:** Aaron Lowther

A statistical outlier in a set of data is defined to be “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. In the context of time series, examples of outliers may include the number of complaints received by BT after a power outage, or the increase in supermarket sales during the days leading up to Christmas. It is important that we are able to detect these outliers as they may have a significant impact on the model selected to fit the data, the parameter estimates for the model, and consequently, on any forecasts made from the model. This project will look into methods and algorithms that are able to automatically detect outliers in time series.

### Henry Moss

#### Assessing Dependence in Extreme Values

**Supervisor:** Emma Simpson

Extreme value theory models the maxima (minima) of random variables. By their very nature, extremes occur infrequently and so are hard to model. A robust framework already exists, with block-maxima and threshold-based approaches providing parametric distributions for the maxima. Known as the Generalised Extreme Value (GEV) and Generalised Pareto (GP) distributions, these allow us to estimate the maximum value that we would expect to see over n years. My project looks into the bivariate case, where our variables have extremes that either occur simultaneously (Asymptotic Dependence) or independently (Asymptotic Independence). Several statistical measures of this behaviour already exist; however, it is hard to obtain reliable estimates of their values. I am looking at developing an alternative method to simultaneously estimate two of these measures, with the hope of finding some synergies.

### Emma Oldfield

#### Improving question selection in education software

**Degree:** University of Sheffield, BSc Mathematics**Supervisor:** Ciara Pike-Burke

With the advance of technology in education, it is becoming more possible to personalise education software, providing students with questions tailored to their individual learning styles and abilities. The data gathered from the students' previous interactions with the education software can be used to simulate students' responses to future data. This enables us to model student performance. The main aim will be to investigate whether Bayesian methods can provide a more accurate prediction of student performance than frequentist methods. The Bayesian approach looks into Markov chain Monte Carlo methods such as Random Walk Metropolis. The models will be used to predict whether students would pass an exam of particular questions.

### Anja Stein

#### Assigning Drones in Military Search

**Degree:** University of Edinburgh, MMath**Supervisor:** James Grant

Drone technology is fast becoming a vital component of military operations. Unmanned Aerial Vehicles (UAVs), as they are known within the military, can perform a variety of tasks remotely, making things both more efficient and safer for military personnel. This project revolves around optimising the UAV Search Problem by maximising the number of events detected within a given border by a fleet of UAVs equipped with cameras. The UAVs aim to detect the locations of events of some sort occurring on the border (one example may be crossings of the border). Each UAV is to be assigned a specific subsection of the border to patrol, with the assumption being that the larger its subsection is, the less likely it will be to actually detect an event. Some UAVs may be naturally better at detecting events than others (because of better cameras etc.) and some UAVs may be better equipped to detect events in certain parts of the boundary (e.g. different types of terrain).

### Georgios Topaloglou

#### Univariate methods for time series forecasting

**Degree:** University of Cambridge, BA Mathematical Tripos**Supervisor:** Daniel Waller

Time series are often grouped in a hierarchical structure. For example, the time series for the total number of tourists visiting a country may be split into more time series according to the purpose of travel, and each of these time series may, in turn, be split into more time series according to the length of stay, thus creating a tree-like hierarchical structure. The issue of forecasting hierarchical series in a way that allows for a similar hierarchical disaggregation of the forecasts is very important. This project will combine two methods that have recently been proposed, optimal combination and temporal aggregation. It will then test the accuracy of this new method against that of optimal combination and other standard techniques such as bottom-up and top-down forecasting.

### Alan Wise

#### Detecting Changes in Multivariate Time Series

**Degree:** University of Edinburgh, Mathematics BSc (Hons)**Supervisor:** Rebecca Wilson

Changepoint detection of univariate time series has been widely covered but the increasing availability of multivariate data has motivated the study of multivariate detection methods. Time series data of a multivariate flavour can be found in finance, health monitoring, signal processing, bioinformatics, and detecting credit card fraud. In my project, I explore a few methods to detect change points of multivariate time series data. I also discuss the drawbacks of these methods and suggest ways in which these drawbacks could be overcome.

## 2015 Interns

Here you can find details of the summer 2015 interns including a description of their research project.

### Ana Daglis

#### Statistical inference for evolving network structure

**Degree**: University of Cambridge, BA Mathematics**Supervisor**: Matthew Ludkin

Networks are prominent in today’s world. The volume of telecommunications and social network data has exploded in the last two decades. Gaining a statistical understanding of the processes generating and maintaining network structure can be used to make confident statements about properties of a network, detect anomalous behaviour or target adverts. In recent years more data has been collected alongside the network. Can such covariate information improve inference for network structure compared to network data alone? Many have attempted to model how networks grow; however, most models have poor statistical properties. This project will investigate approaches for combining statistical methodology from static modelling techniques with methods for analysing data indexed through time.

### Lawrence Latter

#### Modelling extra-tropical cyclones using extreme value methods

**Degree**: Lancaster University, MSci Mathematics**Supervisor**: Paul Sharkey

The prevalence of extra-tropical cyclones in the mid-latitudes is a dominant feature of the weather landscape affecting the United Kingdom. The UK has come to expect a consistent annual pattern of temperate summers and mild winters. However, in recent years it has been a focus of extreme weather events, for example, major floods and damaging windstorms. Accurate modelling and forecasting of extreme weather events are essential to protect human life, minimise potential damage and economic losses, and to aid the design of appropriate defence mechanisms. In this context, an extreme event is one that is very rare, with the consequence that datasets of extreme observations are small. The statistical field of extreme value theory is focused on modelling such rare events, with the aim of extrapolating physical processes from the observed data to unobserved levels. This project will focus on applying extreme value methods to remote sites in the North Atlantic and European domain.

### Euan McGonigle

#### A Linguistically-Motivated Changepoint Problem from a Bayesian Perspective

**Degree**: University of Glasgow, MSci Mathematics**Supervisor**: Sean Malory

Sequences arise naturally in linguistics with the number of occurrences of a linguistically salient feature changing over time as language attitudes evolve. One such feature is the use of flat adverbs. For instance, in the phrase “fresh ground coffee” the word “fresh” is a flat adverb, since it functions as an adverb but lacks the typical suffix “ly”. While not as widespread nowadays, flat adverbs were commonly used during 1700-1900. Authors of this period used flat adverb forms and were publicly criticised for doing so. This project will introduce a Bayesian statistical framework to investigate whether the rate of flat adverb use changed significantly after an author's writing had been subjected to such criticism. This will focus on the detection of changes in a sequence of data points using a Bayesian approach. Specifically, we will be interested in quantifying (in a precise way) whether or not a change in the sequence has occurred at some point.

### Daniel Miles

#### Density-based cluster analysis

**Degree**: University of Reading, MMath Mathematics**Supervisor**: Katie Yates

Cluster analysis is the process of partitioning a set of data vectors into disjoint groups (clusters) such that elements within the same cluster are more similar to each other than elements in different clusters. Clustering has a wide range of application areas including Biology, Physics, Computer Science, Social Science and Market Research. There are three main categories of algorithms which can be applied in order to find solutions to data clustering problems: hierarchical, partitioning and density-based. The main focus of this project is to explore density-based clustering methods and to compare the performance of these algorithms via simulation studies.

### Sarah Oscroft

#### Classification in streaming environments

**Degree**: Newcastle University, MMath Mathematics**Supervisor**: Andrew Wright

The aim of a classification model is to predict the class label of a new observation using only historical observations. Traditional classification approaches assume this historical dataset is a fixed size and is drawn from some fixed probability distribution(s). However, in recent years a new paradigm of data stream classification has emerged. In this setting, the observations arrive in rapid succession, with classifiers capable of being trained sequentially, and an adaptable underlying probability distribution. These classifiers have applications in areas as diverse as spam email filtering, analysing the sentiment of tweets and high-frequency finance. This project will investigate how models can be used to produce streaming versions of classifiers.

### Srshti Putcha

#### Auto-Correlation Estimates of Locally Stationary Time Series

**Degree**: London School of Economics, BSc Mathematics and Economics

**Supervisor**: Jamie-Leigh Chapman

A time series is a sequence of data points measured at equally spaced time intervals. Examples of time series include FTSE 100 Daily Returns and the total annual rainfall in London, UK. Often we assume that such series are second-order stationary. In other words, the statistical properties of the time series, e.g. the autocorrelation, remain constant over time. However, the reality is that many time series are not second-order stationary and therefore it is not appropriate to model them using such methods. Instead, we must consider time-varying equivalents of the autocorrelation or autocovariance. One method that analysts use to turn the regular autocorrelation function into a time-varying quantity is to apply rolling windows to the data. Unfortunately, this can give quite different answers depending on the chosen window length and on where the window sits within the time series. This project will explore alternative methods of estimating a time-varying auto-correlation function in order to overcome these problems.
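
The rolling-window idea described above can be sketched in a few lines of Python. This is only an illustration and not the method developed in the project: the function names, the toy series and the two window lengths are all assumptions chosen to show how the estimate depends on the window.

```python
import math


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence (standard estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0  # constant window: autocorrelation is undefined, report 0
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return num / denom


def rolling_lag1_autocorr(x, window):
    """Lag-1 autocorrelation computed on every window of the given length."""
    return [lag1_autocorr(x[s:s + window]) for s in range(len(x) - window + 1)]


# Hypothetical non-stationary series: a smooth oscillation followed by a
# rapidly alternating segment, so the autocorrelation changes over time.
series = [math.sin(0.1 * t) for t in range(100)] + [(-1) ** t for t in range(100)]

short = rolling_lag1_autocorr(series, window=20)
wide = rolling_lag1_autocorr(series, window=80)

print(f"window=20: first {short[0]:.2f}, last {short[-1]:.2f}")
print(f"window=80: first {wide[0]:.2f}, last {wide[-1]:.2f}")
```

Comparing the two outputs at the same position shows the sensitivity to window length that the project sets out to address.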

### Sam Tickle

#### Regression, curve fitting and optimisation algorithms

**Degree**: University of Cambridge, BSc Mathematics Tripos

**Supervisor**: Elena Zanini

The underlying strategy for most statistical modelling is to find parameter values that best describe the fit of the model to the data. This requires optimising an objective function while minimising the difference between the model and the observations. When analytical solutions to the optimisations are unavailable, statisticians often rely on numerical optimisation routines to perform this fit, trusting that this will produce stable estimates of the parameters. Firstly, some issues may arise in the choice of the best algorithm given the characteristics of the problem at hand. Secondly, the algorithm considered may not actually perform well and needs to be understood and adapted to work better on the model considered. This project will investigate different numerical optimisation algorithms used in statistical inference and curve fitting, and how to overcome some of the problems associated with these types of algorithms.

### Zak Varty

#### Pharmacokinetic Modelling

**Degree**: Lancaster University, MSci Mathematics

**Supervisor**: Helen Barnett

In medical research, in both pre-clinical and clinical trials, the objective is to learn about the behaviour and effect of potential new drugs in the body. This breaks down into two categories: how the drug affects the body (Pharmacodynamics) and how the body affects the drug (Pharmacokinetics). This application-driven project focuses on pharmacokinetic modelling, which involves modelling the concentration of a compound in the blood over time. The aim of the project is to apply statistical modelling techniques to real data in order to obtain an understanding of the role of pharmacokinetics in the drug development process.

## 2014 Interns

Here you can find details of the summer 2014 interns including a description of their research project.

### Anna Maria Barlow

#### Spatiotemporal Modelling of Economic Data using Disease Mapping

**Degree**: University of Durham, MMath Mathematics

**Supervisor**: Christian Rohrbeck

The field of statistics focusing on models incorporating spatial information is called Spatial Statistics. Spatial statistics generally distinguishes between three types of data: geostatistical data, lattice data and spatial point patterns. This project will focus on lattice data, where the number of sites at which observations are recorded is finite, for example, the population in each county of the UK or the results of the last general election per district. Spatial statistical methods for lattice data are often applied in epidemiology to model the occurrence of a disease in a region depending on covariates. This is known as Disease Mapping, with models aiming to predict the occurrence rate or the number of cases of a particular disease. This project will investigate the basic methods used in Disease Mapping and apply them to economic data.

### Dawid Bernaciak

#### Relocation Operations in One-Way Car-Sharing Problems

**Degree**: University of Glasgow, MSci Mathematics

**Supervisor**: Burak Boyaci

Car-sharing is a new concept that enables the general public to access a fleet of vehicles for short rental periods. These systems have several benefits including environmental, energy and societal considerations. Car-sharing systems have two general types: the restrictive “two-way” system, where users pick up and drop off the vehicle at the same location, and the more flexible “one-way” system, which enables users to choose a different drop-off location to the pick-up station. For the customer, the one-way system is generally preferred; however, one of the difficulties in implementing a one-way system is managing the relocation of vehicles and personnel. This project will develop and implement models for improving relocation operations for the one-way car-sharing problem.

### Helen Coupland

#### Modelling ocean environments with extreme value theory

**Degree**: University of Durham, MMath Mathematics

**Supervisor**: Monika Kereszturi

Offshore structures such as oil platforms and vessels must be designed to have very low probabilities of failure due to extreme weather conditions. Inadequate design can lead to structural damage, lost revenue, danger to operating staff and environmental pollution. Design codes demand that all offshore structures exceed specific levels of reliability, most commonly expressed in terms of an annual probability of failure or return period. Hence, interest lies in environmental phenomena that occur extremely rarely, and we want to estimate the rate and size of future occurrences. The aim of this project is to gain a deep understanding of extreme value theory in the application of ocean environments.

### Toby Kingsman

#### Analysis of Algorithms for yield optimization and batch scheduling problems

**Degree**: University of Birmingham, MSci Theoretical Physics and Applied Mathematics

**Supervisor**: Trivikram Dokka

A common scheduling problem in industrial settings is concerned with scheduling jobs on identical machines with the objective of minimizing the total active time. The problem finds important applications in the field of (energy-aware) scheduling especially in applications relating to optimal network design. The aim of this project is to investigate the performance of some natural heuristics proposed for finding near-optimal solutions to these computationally hard problems. This will involve learning about the integer and linear programming formulation based methods and using computer programming to implement algorithms and solve linear programs.

### Aaron Lowther

#### Seasonally Adjusting Official Time Series

**Degree**: Lancaster University, BSc Mathematics

**Supervisor**: Rebecca Killick

The Office for National Statistics (ONS) publishes thousands of seasonally-adjusted time series which are used to produce the official statistics that create the news headlines regarding, for example, increase/decrease in unemployment and double or triple-dip recessions. Seasonal adjustment involves estimating and removing a seasonal component from a time series. This project aims to develop and test a method for the automatic detection of changes in the seasonal pattern of time series by comparing alternative methods and assessing the impact on the estimation of seasonal factors for series that do and do not present changes in the seasonal pattern.

### Rachel Naylor

#### Fast inference for processing intelligence information

**Degree**: University of Bath, MMath Mathematics

**Supervisor**: Lisa Turner

Intelligence is information regarding threats to national security and potentially hostile forces. After raw intelligence data is collected it must be processed and screened, often in time-critical situations. Only relevant information is then passed on for further analysis. With huge amounts of intelligence data collected daily, potentially relevant information can be missed. Given a set of intercepted communications, how should we process the communications to maximise the amount of relevant information passed on for analysis? This project will develop a model for processing intercepted information and explore how to overcome problems associated with this type of model.

### Emily Olesker

#### Assessing Performance of Changepoint Detection Algorithms

**Degree**: University of Cambridge, BA Mathematics

**Supervisor**: Kaylea Haynes

Changepoints are a widely studied area of statistics with applications including, but not restricted to, finance (detecting changes in volatility), computer science (detecting instant messaging worms and viruses) and environmental sciences such as oceanography and climatology. Changepoints are considered to be the points in a time series where we experience a change in some statistical property, for example, a change in mean or a change in variance. There are many different approaches to changepoint analysis; however, current methods have the trade-off of being fast but approximate or exact but slow. The aim of this project is to develop an understanding of changepoint detection methods and in particular explore ways in which we can assess the performance of different detection methods.
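
As a small illustration of a change in mean, the Python sketch below finds the single split point that minimises the total squared error about the two segment means. It is an assumed toy example rather than any of the algorithms assessed in the project; the function name and data are hypothetical. The exhaustive search is exact but recomputes every segment cost, which hints at the speed versus exactness trade-off mentioned above.

```python
def best_single_changepoint(x):
    """Return (tau, cost) for the split x[:tau] / x[tau:] with the smallest
    total squared error about the two segment means."""

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_tau, best_cost = None, float("inf")
    for tau in range(1, len(x)):  # both segments must be non-empty
        cost = sse(x[:tau]) + sse(x[tau:])
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau, best_cost


# Hypothetical data: the mean jumps from about 0 to about 5 after 5 points.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
tau, cost = best_single_changepoint(data)
print(tau)  # index of the first observation after the change
```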

### Luke Rhodes-Leader

#### Explaining changes in aggregated time series

**Degree**: Lancaster University, MSci Physics with Mathematics

**Supervisor**: Lawrence Bardwell

In many applications, there is some indicator that is constantly monitored as new data are collected, for example, in an industrial setting, the number of faults recorded on a large network per week. Typically, at a managerial level, interest lies in the total number of faults over the entire network and patterns or changes that may occur. One important change in this indicator is a spike (outlier) where suddenly there is a large increase in the number of faults over the entire network. Understanding why these sudden increases occur is important so they can be prevented from happening again. This project will investigate methods for detecting outliers in large time-series datasets.

### Matthew Robinson

#### Selection of Tolerance Level for Approximate Bayesian Computation

**Degree**: University of Warwick, MMath Mathematics

**Supervisors**: Wentao Li and Paul Fearnhead

For many complex datasets, one feature is that the likelihood of the statistical model is intractable, in the sense that it is difficult to evaluate the likelihood values of the observations, and standard inference methods for unknown parameters, like Maximum Likelihood Estimation and Markov chain Monte Carlo, do not work. For intractable problems in which simulating data from the model given parameter values is easy, Approximate Bayesian Computation (ABC) is a useful Bayesian inference method using Monte Carlo simulations. The project will investigate the impact of the tolerance level, a core parameter of the ABC algorithm, in various situations and try to design an automatic algorithm to select the tolerance level.
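
To make the role of the tolerance level concrete, here is a minimal sketch of the basic ABC rejection sampler in Python. Everything specific in it is an assumption made for illustration: the normal model with known unit variance, the uniform prior on (-5, 5), the sample mean as summary statistic, and the two tolerance values. It is not the automatic selection algorithm the project aims to design.

```python
import random
import statistics


def abc_rejection(observed, n_draws, tolerance, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lies within `tolerance` of the observed summary."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed prior
        simulated = [rng.gauss(theta, 1.0) for _ in observed]  # assumed model
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(30)]  # toy data, true mean 2

tight = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
loose = abc_rejection(observed, n_draws=5000, tolerance=1.0, rng=rng)

print(len(tight), len(loose))
```

A smaller tolerance accepts fewer draws but gives a sample closer to the true posterior, while a larger tolerance accepts more draws at the cost of a cruder approximation.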

### Emma Stubington

#### Travelling Salesman Problem

**Degree**: Sheffield University, BSc Mathematics

**Supervisor**: Ivar Struijker-Boudier

Scheduling problems can be found in many industrial settings. The complexity of scheduling problems is often such that optimal solutions cannot be guaranteed to be found in short computational time. However, many companies need to produce schedules on a daily basis, so they need a computationally fast way of implementing this. A well-known example of a difficult-to-solve scheduling problem is the travelling salesman problem (TSP), which is concerned with finding the shortest route which visits each of a number of locations exactly once. If every location can be travelled to directly from every other location, then the number of possible solutions increases very quickly as more locations are added to the problem. Evaluating every possible solution then becomes impossible. This project will explore the travelling salesman problem and will assess and compare various solution methods for the TSP.
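
The growth in the number of solutions can be seen with a short Python sketch. It assumes the common formulation in which the route is a closed tour returning to the start and distances are symmetric, so that n locations give (n - 1)!/2 distinct tours. The function names and the four example points are hypothetical, and brute force is shown only to illustrate why it stops being practical.

```python
import itertools
import math


def count_tours(n):
    """Number of distinct closed tours through n >= 3 locations
    when distances are symmetric."""
    return math.factorial(n - 1) // 2


def brute_force_tsp(points):
    """Try every tour that starts and ends at points[0]; return the best."""
    n = len(points)
    best_order, best_length = None, float("inf")
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = sum(
            math.dist(points[order[i]], points[order[(i + 1) % n]])
            for i in range(n)
        )
        if length < best_length:
            best_order, best_length = order, length
    return best_order, best_length


# Four corners of a unit square: the shortest tour walks around the edge.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
best_order, best_length = brute_force_tsp(points)
print(best_order, best_length)

for n in (5, 10, 15, 20):
    print(n, count_tours(n))
```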

### Luke Whincop

#### Modelling solar irradiance for energy generation

**Degree**: University of Bath, MMath Mathematics

**Supervisor**: Nikos Kourentzes

The increasing investment in renewable energy is essential to guarantee immediate answers both to the high and fluctuating prices of crude oil and to the diversification of energy supplies, thus reducing external dependence on oil, gas and coal. Therefore, solar power generation becomes an area of paramount research. Various time series methods have been implemented to forecast solar irradiance for power generation; however, a complication with solar irradiance data is that of multiple seasonalities: seasonality from the day-night cycle and from the annual earth cycle. This project will attempt to tackle some of the questions related to modelling the seasonal element of solar irradiance using time series and forecasting models.

## 2013 Interns

Here you can find details of the summer 2013 interns including a description of their research project.

### Martin Andla

#### Statistical Modelling in Sports

**Degree**: University of York, MMath Mathematics (2010-present)

**Supervisor**: George Foulds

To aid the application of betting and investment strategies, an edge must be sought over the market. Simulation modelling is a crucial part of this process, providing evidence to support real-world data analysis and professional conjecture. This project will introduce the student to the use of statistical modelling in the prediction of sports results and allow them to adapt a well-known model using their findings from freely available real-world data.

### Jenny August

#### Detecting Changepoints in Multivariate Data Series

**Degree**: University of Edinburgh, BSc Mathematics (2010-present)

**Supervisor**: Ben Pickering

Data collection is a huge component of the workings of any modern organisation. There are many examples of situations where data is collected from multiple sources which may be related in some way, for example, the stock prices of multiple companies in the same industrial sector. While the nature of these data values may stay fairly constant over time, occasionally some event may occur which causes a sudden change in the values being recorded at all sources, for example, in financial data, there may be a stock market crash. The times at which such changes occur are known as multivariate changepoints. This project will explore the effectiveness of current multivariate changepoint methods.

### Thomas Berrett

#### Learning in Dynamic Environments

**Degree**: University of Cambridge, BA Mathematics (2010-2014)

**Supervisor**: David Hofmeyr

Machine learning is a field of artificial intelligence focused on developing algorithms which allow computers to evolve and improve their behaviour as a result of empirical data. In the context of this project, this refers to the construction of a data-driven model to aid in a predefined task. The task might be something basic like making predictions based on a simple regression model, or it might be highly complex like describing intricate biological systems. This project offers a variety of possibilities due to the lack of specificity in online learning, and there is considerable flexibility for its direction depending on the student’s preference.

### Simon Crawford

#### Bayes Sequential Decision Problems

**Degree**: University of Bath, MMath Mathematics (2010-present)

**Supervisor**: James Edwards

Many important decisions have to be made under uncertainty because the information that is relevant to the problem is missing or is only known imperfectly. Often, these decisions are not taken in isolation but in a sequence. New information that becomes available as a result of our actions can then be used to make better decisions in the future. However, the actions that give the best short-term results may not be the same actions that give the most information. This presents a trade-off between taking the actions that are best in the short term and the need to learn for better long-term results. This project will explore a number of statistical theories for dealing with decision problems followed by testing and selecting optimal methods.

### Oliver Hatfield

#### Multiple Changepoint Detection in Non-Trivial Models

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Rob Maidstone

Time series data sometimes experiences abrupt changes in structure. These changes are called changepoints. To model the data effectively these changepoints need to be detected and subsequently built into the model. Changepoints occur in a variety of real-world situations. For example, when analysing human genome data, the average DNA copy value is usually around the same level; however, sudden changes away from this level occasionally occur. These sudden changes in average DNA level often relate to tumorous cells and therefore the detection of these changes is critical for classifying the tumour type and progression. This project will introduce the student to changepoint models and involve programming these models using statistical computing software.

### Lucy Morgan

#### Background Subtraction: Methods for Video Analysis

**Degree**: Lancaster University, BSc Mathematics (2011-2014)

**Supervisor**: Rhian Davies

Surveillance cameras have become ubiquitous in many countries, collecting a huge amount of data, most of which is stored and never analysed. Converting this data into useful information can be problematic, particularly as large companies often use many cameras simultaneously. Often it is of interest to the user to detect anomalies in video footage, for example a person placing an item in their bag instead of their shopping trolley. In order to detect such anomalies, we first need to separate the foreground and the background of a video. One popular method for splitting the foreground from the background is background subtraction. The aim of this project is to investigate the effectiveness of different algorithms for background subtraction under a number of real and challenging scenarios.

### Ciara Pike-Burke

#### Evaluating the Structure of the Excitability Curve of Motor Neurons

**Degree**: University of Manchester, BSc Mathematics with Spanish (2010-2014)

**Supervisor**: Simon Taylor

Scientists in the field of neuromuscular research are interested in understanding the structures and processes involved in operating a working muscle. The fundamental component of this process is a motor unit, consisting of a single motor neuron and a collection of muscle fibres that it governs. Evaluating the number of motor units that form a working muscle is very important in understanding the effects of various neuro-degenerative disorders and also in assessing the effectiveness of proposed treatments. The aim of this project is to analyse data from the stimulation of a single motor unit using importance sampling and Bayesian statistics.

### Michelle Pinharry

#### The Unit Commitment Problem and Wind Energy

**Degree**: University of Bath, BSc Mathematics (2011-2014)

**Supervisors**: Pedro Crespo del Granado and Franklin Djeumou Fomeni

The share of wind in the UK’s grid energy generation mix is expected to be around 20-30% by 2020. Wind generation, however, creates new planning challenges to maintain a stable and reliable supply-demand balance. Since wind generation fluctuates independently from energy demand, this creates a disturbance for the short-term generation planning and scheduling of other generation units (such as gas or coal power plants). This brings a new degree of uncertainty to stabilizing the power network equilibrium between supply and demand in real time. This project will use optimisation modelling to answer the question: what is the optimal cost-effective mix of energy units needed to achieve carbon reduction targets whilst also coping with high wind input?

### Benjamin Pring

#### Betting Markets and Strategies

**Degree**: University of Bath, BSc Mathematics (2010-2013)

**Supervisor**: Tom Flowerdew

Markets come in many forms. From buying and selling livestock to trading complex financial derivatives, the key to making long-term profits is to establish an edge on the market. Once an edge has been established, the question is how wealth can be optimised. This project will investigate ways in which existing theory can be adapted to fit into sports betting markets, and ways in which underlying assumptions can be removed to allow the theory to become more general.

### Emma Simpson

#### A study of the air quality of major cities in China

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Ye Liu

The air quality in some major cities in China has long suffered from rapid industrialisation and increasing vehicle usage. With the help of social networks and media coverage, this issue has gradually come to the attention of the government as well as the general public. This research project will aim to gain some insight into the air pollution problem in China using classical statistical techniques such as time series analysis and extreme value theory.

### Kathryn Turnbull

#### Modelling droughts with extreme value theory

**Degree**: University of Durham, MMath Mathematics (2011-present)

**Supervisor**: Hugo Winter

Droughts are large-scale climatic phenomena that can lead to social and economic damages. In Africa, periods of drought can lead to food instability and large death tolls as well as having a knock-on effect on the economies of major aid providers. In the UK, a drought could cause reservoirs to run low and lead to government legislation such as the hose-pipe bans seen over the last few summers. It is of great concern to governments and industry where and when these events may occur and also whether their occurrences will differ in the future with anticipated global climate change. Using standard statistical techniques for rare events will potentially result in badly fitting models and, worse, in misleading policies. With such rare and sparse data a more reliable approach is needed; this is called extreme value theory. This project will introduce the student to extreme value theory and its applications for drought data.

### Christina Wright

#### Resource Allocation in Service Industries

**Degree**: University of Durham, MMath Mathematics (2010-2014)

**Supervisor**: Emma Ross

The effective allocation of resources to meet demand is an essential consideration of any company hoping to survive in a competitive market. This and many other important decision problems can be formulated as a well-known combinatorial optimisation problem called the knapsack problem. It forms a basis from which to study such decision problems, but we quickly run into difficulty when the complexity and scale of real problems faced in industry are incorporated. This project will allow the student to investigate the impact of uncertainty in resource allocation problems by introducing them to linear programming.
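
For readers unfamiliar with the knapsack problem, the sketch below solves the deterministic 0/1 version with a standard dynamic programme in Python. It is an illustrative assumption, not the project's approach, which uses linear programming and considers uncertainty; the function name and the example numbers are hypothetical.

```python
def knapsack(values, weights, capacity):
    """Maximum total value of items that fit within an integer capacity,
    where each item can be chosen at most once (0/1 knapsack)."""
    best = [0] * (capacity + 1)  # best[c] = best value using capacity c
    for value, weight in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Three hypothetical resources with a value and a cost, and a budget of 50.
print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))
```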

## 2012 Interns

Here you can find details of the summer 2012 interns including a description of their research project.

### Jamie-Leigh Chapman

#### Scenario Generation for Stochastic Programming

**Degree**: University of York, MSci Mathematics (2009-2013)

**Supervisor**: Jamie Fairbrother

Often we have to make decisions in the face of uncertainty. A shop manager has to decide what stock to order without knowing the exact demand for each item. An investment banker has to choose a portfolio without knowing how the values of different assets will evolve. Taking this uncertainty into account allows us to make good robust decisions. This project uses stochastic programming as a tool to investigate such decision-making processes.

### William Cook

#### Modelling anti-terrorist surveillance systems from a queueing perspective

**Degree**: University of Cambridge, BA Mathematics with Physics (2010-2013)

**Supervisor**: Terry James

It is without question that surveillance is very much a part of the modern world. A growing interest in the need for surveillance has been matched by technological advances in the area. Surveillance cameras, either static or as part of an unmanned aerial vehicle, have the ability to feed real-time information to a control centre. Here the subject under surveillance can be properly assessed in terms of their identity or possible intentions in a biometric fashion. This project explores an aspect of the emerging operational research field of Homeland Security. More specifically, this project will consider the challenge of modelling the defensive surveillance of public areas which are subject to attack by terrorist subjects.

### Josephine Evans

#### Spectral Analysis of Multivariate Time Series

**Degree**: University of Cambridge, BA Mathematics (2010-present)

**Supervisor**: Tim Park

The advent of smartphones has opened up new possibilities for the collection of data. These phones contain sensors such as accelerometers, gyroscopes and GPS making them a cheap and easy way for companies to collect time-series data. This data is often multivariate and nonstationary and often the main challenge is deciding which channels to focus the analysis on rather than the choice of analysis method itself. This project uses principal components analysis to identify which channel to focus on when analysing a multivariate time series.

### Matthew Ludkin

#### Hybrid simulation models for maintenance processes

**Degree**: University of Birmingham, MSci Mathematics (2009-2013)

**Supervisor**: Mark Bell

One of the most widely used dynamic modelling methods in Operational Research for understanding and improving organisational systems is discrete event simulation (DES); an application of this method is in modelling maintenance processes. In a large organisation, there are often many additional interactions that affect maintenance operations. When this is the case, there are occasions where modelling the system using DES alone is not sufficient, so System Dynamics (SD) may be utilised. In Operational Research these two approaches have traditionally been separated, but in recent years hybrid models that contain both techniques have emerged, as the limitations of each have been said to complement one another. This project initially involves building DES and SD models separately before finally combining the two models to create hybrid models of maintenance processes.

### Helen Mossop

#### Clustering customers to estimate willingness-to-pay

**Degree**: Newcastle University, MMathStat Mathematics and Statistics (2009-2013)

**Supervisor**: Shreena Patel

Simple probability models are often inadequate for describing the data we encounter in reality because of heterogeneity in the population we are attempting to model. One way to overcome this is to use a mixture model which represents the population as consisting of several sub-populations (or clusters), each of which can be modelled by a standard parametric distribution. This project concerns a population of customers each of whom has an (unobservable) maximum price which they are willing to pay for a product, called a referral price. We wish to cluster customers to capture differences in their price-sensitivity by assuming that referral prices are generated by a mixture of normal distributions. Standard clustering techniques will be adapted in order to estimate how likely a customer is to accept future quotes.

### Gwern Owain

#### Resource Allocation problems in queueing theory

**Degree**: Cardiff University, BSc Mathematics (2010-2013)

**Supervisor**: Jak Marshall

Queues occur naturally in business and computer science applications. So ubiquitous are queues in various situations that being able to model their behaviour is an essential skill for any practitioner or researcher of operations research. Often it is of benefit to simultaneously manage the flow of work in and out of multiple queues given limitations of service resources. This project introduces the rich theory of queueing systems and presents an opportunity to explore efficient ways of coping with random demands on a system with multiple parallel queues with cost structures imposed on them.

### Stephen Page

#### Prize-Collecting Steiner Travelling Salesman Problem with Time Windows

**Degree**: University of Cambridge, BA Mathematics (2010-present)

**Supervisor**: Saeideh Dehghan-Nasiri

The travelling salesman problem is a very well-known optimization problem. This project studies aspects of this problem with additional time window restrictions on the service time of customers and uses a real road network graph. Small-scale versions of the problem may be solved using exact optimization techniques. This project looks at solving the problem using exact solution methods and at developing and applying a dynamic programming algorithm that provides a lower bound for larger-scale problems.

### Paul Sharkey

#### Modelling the North Sea wave climate

**Degree**: University College Dublin, BSc Mathematical Science (2009-2013)

**Supervisor**: Ross Towe

Wave height is of inherent interest to oil companies with offshore operations. Through determining the distribution of wave heights, this information can be used to minimise the risk and consequently the cost of future offshore operations. A current consideration is also whether climate change will have an impact on the distribution of wave heights. This project considers extreme value theory for modelling wave heights in the North Sea.

### Faye Williamson

#### Semi-Markov processes in a healthcare setting

**Degree:** Lancaster University, MSci Mathematics (2010-2013)

**Supervisor:** Dan Suen

Analysing healthcare systems has been an important concern of healthcare modellers for many years. Understanding patient flows and the number of patients in healthcare systems is an important tool when trying to improve hospital efficiency and, among other things, reduce patient waiting times. This project seeks to highlight similarities between healthcare models and the types of systems to which multigrade population models are applied, using data from a healthcare case.

### Elena Zanini

#### Parameter Estimation with Particle Filtering Algorithms

**Degree**: University of Edinburgh, BSc Applied Mathematics (2009-2013)

**Supervisor**: Chris Nemeth

There exist numerous problems in statistics, engineering, signal processing, etc. which require the estimation of a hidden process. One such example can be found in target tracking, where the aim is to estimate the state of a target (e.g. position, velocity) given only partial, noisy observations (e.g. bearing measurements only). The process of estimating a target's state given only partial, noisy observations is known as filtering. This project involves gaining an understanding of particle filtering techniques and reviewing the current literature before using particle filtering methods to assess various models.

## 2011 Interns

Here you can find details of the summer 2011 interns including a description of their research project.

### Lawrence Bardwell

#### Breast Cancer Screening

**Degree:** Lancaster University, BSc Mathematics (2009-2012)

**Supervisor:** Matt Sperrin

Breast density is a substantial risk factor for breast cancer. It can be estimated from mammograms, which are taken regularly for middle-aged women. A breast density reading can then be used to produce individualised monitoring for women (e.g. screening women with high breast density more frequently). However, breast density is estimated by radiologists subjectively. It is of interest to calibrate the breast density readings so that each radiologist’s scores are on the same scale, and assess the consistency of each radiologist. Data is available for the readings made by radiologists: we can attempt to exploit the fact that each mammogram is read twice by each radiologist, and each mammogram is read by two radiologists.

### Elizabeth Buckingham-Jeffery

#### New Penalty Methods for Bilevel Optimisation

**Degree:** University of Warwick, MMath Mathematics (2008-2012)**Supervisor:** Konstantinos Kaparis

Bilevel problems appear in areas such as economics, engineering, medicine and ecology. These types of problems are optimisation problems which include as part of their constraints a second optimisation problem. The upper level (or leader's) problem corresponds to our aim to optimise a certain function. The notion of optimality takes into account the subaltern part of the upper-level decisions. This part is represented by the lower level (or follower's) problem. This project concerns the linear case of bilevel programs.

### David Ewing

#### The ABC of model choice

**Degree:** University of St Andrews, MMath Mathematics and Statistics (2008-2013)**Supervisors:** Dennis Prangle and Paul Fearnhead

While Approximate Bayesian Computation (ABC) is now well-established for estimating parameters, its use for model-choice is still in its infancy. There have been recent papers disagreeing about whether ABC can be used for model-choice, and if so how it should be implemented. This project looks at some simple applications, to see whether ABC can give reliable inferences about the underlying statistical model; and if so, how to implement ABC so as to infer the model as accurately as possible.

### Thomas Facer

#### Choice Modelling with Links to Optimisation and Compressed Sensing

**Degree:** University of Edinburgh, BSc Mathematics (2008-2012)**Supervisor:** Arne Strauss

In many business applications, frequent decisions need to be made that depend on the choice behaviour of customers. For example, e-retailers such as Amazon.com must decide on the assortment of results to display in response to a customer query; airlines or hotels need to decide on the available booking classes to display in response to a customer request. Similar situations arise for many other firms. An often-used approach to choice modelling is to identify product attributes that influence the customer’s decision and to select and calibrate a structural model based on these attributes that fits the observed data. Recently, an intriguing way was proposed to learn a choice model from data using concepts from Revenue Management, Inventory Optimisation and Compressed Sensing. This project gives an insight into these respective fields whilst working on a topic that is currently at the forefront of research and has wide applicability.

### Liam Fielder

#### Forecasting using time series methods

**Degree:** Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australia) (2008-2012)**Supervisor:** Robert Fildes

One of the most important applications of statistics is time series forecasting. The key application area is forecasting demand (for a product or service). This project gives an introduction to the area of business forecasting using a newly written textbook. It includes some software testing (and development if appropriate) as well as the evaluation of different methods on test problems.

### George Foulds

#### Portfolio Optimisation

**Degree:** Lancaster University, MPhys Physics First Class (2005-2010)**Supervisor:** Jonathan Tawn

The aim of any investor is to maximise their return. The return must be maximised for a given amount of risk, or equivalently the risk must be minimised for a given expected return. A mixture of analytical and simulation-based methods will be used to derive the properties of a portfolio and consequently the weight of investment that is given to each individual asset.

### Kaylea Haynes

#### Compressed sensing methods for problems in statistics

**Degree:** Heriot-Watt University, Edinburgh, BSc Mathematics and Statistics (2008-2012)**Supervisor:** Matt Nunes

Compressed sensing (CS) has recently emerged as an important area of scientific research for efficient signal sensing and compression. The main idea behind CS is that certain signals can be constructed entirely, using numerical optimisation algorithms, from a relatively small number of “well-chosen” signal samples. This project is exploratory in nature and provides the opportunity to learn about and research the area of compressed sensing, focussing on the role of CS in statistical applications for particular signals of interest.

### Clive Newstead

#### Parametric inference for missing data problems

**Degree:** University of Cambridge, BA Mathematics (2009-2012)**Supervisors:** Giorgos Sermaidis and Paul Fearnhead

A typical complication in parametric inference for missing data problems is the intractability of the likelihood. A well-established approach to maximum likelihood estimation is the simulated likelihood, where estimation is based on the optimisation of an unbiased Monte Carlo estimate of the likelihood. An important drawback, however, is that parameter consistency is achieved only when the Monte Carlo effort increases as a function of the data sample size, thus leading to computationally expensive algorithms. The aim of this project is to tackle this problem by constructing unbiased estimators of the log-likelihood, in which case consistency can be achieved even for fixed Monte Carlo size. The project involves standard techniques for Monte Carlo simulation and unbiased integral estimation and programming in R.

### Ragnhild Noven

#### Analysing the structure of (multivariate) time series

**Degree:** Imperial College London, MSci Mathematics (2008-2012)**Supervisors:** Karolina Krzemieniewska and Matt Nunes

Time series that are observed in practice are often highly complex in nature, for example, accelerometry signals arising from human movement experiments. The underlying behaviour of these signals is sometimes hidden or difficult to detect in the first instance. This project focuses on applied data analysis for complex time series and using statistical techniques to investigate changes in the underlying structure of time series. The project involves analysing real-world data arising from investigative health studies conducted by external collaborators.

### Robert Stainforth

#### Stochastic actor-based models for network dynamics

**Degree:** Durham University, MSci Mathematics and Physics (2008-2012)**Supervisor:** Stephan Onggo

A stochastic actor-based model is a model for network dynamics that can represent a wide variety of influences on network change, and allow us to estimate parameters expressing such influences, and test corresponding hypotheses. The nodes in the network represent social actors, and the collection of ties represents a social relation. The project involves reading and summarising the relevant research literature on stochastic actor-based models, learning how to use RSiena, preparing a set of data, and applying the technique to the data.

### Ivar Struijker Boudier

#### Exploring a new class of probability models for tail estimation in extreme value modelling

**Degree:** University of Glasgow, BSc Statistics (2008-2012)**Supervisor:** Ioannis Papastathopoulos

Statistical modelling of extreme values plays an important role in understanding the behaviour of unusual events such as extreme weather conditions, earthquakes and financial crashes. The most common approach to the modelling of extreme values is to fit an appropriate probability distribution to the tail of the data and extrapolate it to levels above which no data are observed. This class of distributions is called the generalised Pareto distribution, which contains the Exponential distribution. However, fits to finite samples are not always adequate and more flexible models might be appropriate. The project explores a new class of probability models that incorporates existing models as special cases. The project involves exposure to the theory of extremes, simulation studies for the applicability of the new models and the statistical analysis of a medical dataset.

### Lisa Turner

#### Facility layout

**Degree:** Durham University, MMath Mathematics (2008-2012)**Supervisors:** Yifei Zhao and Stein W. Wallace

Facility layout, in its simplicity, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor?

## 2010 Interns

Here you can find details of the summer 2010 interns including a description of their research project.

### Helen Blue

#### Optimisation on road networks

**Degree:** Lancaster University, MSci Mathematics (2008-2012)**Supervisor:** Richard Eglese

There are many problems that involve optimising an objective that is relevant to journey planning over a road network. The first part of the project will be to review some of the existing methods for finding the shortest (or least cost) paths in a network. The second part of the project is to develop an effective algorithm for finding the least cost path between two points where the speed and cost of travelling along an arc depend on the time of day.
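As a rough illustration of the time-dependent setting, the sketch below adapts Dijkstra's algorithm so that each arc's travel time is a function of the departure time. It is not the algorithm developed in the project: it minimises arrival time rather than a general cost, and it assumes that leaving later never lets you arrive earlier on any arc (the "first in, first out" property). The function names, the toy network and the congestion profile are all hypothetical.

```python
import heapq


def earliest_arrival(graph, source, target, depart_time):
    """Earliest arrival time at `target` when leaving `source` at `depart_time`.

    `graph` maps each node to a list of (neighbour, travel_time) pairs, where
    travel_time(t) gives the time needed to traverse the arc when entering it
    at time t. Assumes every arc is FIFO (departing later never arrives earlier).
    Returns None if the target cannot be reached.
    """
    best = {source: depart_time}
    queue = [(depart_time, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if node == target:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale queue entry, a better arrival time is already known
        for neighbour, travel_time in graph.get(node, []):
            arrival = t + travel_time(t)
            if arrival < best.get(neighbour, float("inf")):
                best[neighbour] = arrival
                heapq.heappush(queue, (arrival, neighbour))
    return None


def congested(base, extra):
    """Hypothetical profile: travel time peaks at 08:30 (minute 510) and
    returns to `base` an hour either side. Times are minutes after midnight."""
    return lambda t: base + extra * max(0, 1 - abs(t - 510) / 60)


graph = {
    "A": [("B", congested(10, 30)), ("C", lambda t: 15)],
    "B": [("D", lambda t: 10)],
    "C": [("D", lambda t: 15)],
}

off_peak = earliest_arrival(graph, "A", "D", 420)  # leave at 07:00
peak = earliest_arrival(graph, "A", "D", 480)      # leave at 08:00
print(off_peak, peak)
```

In this toy network the route through B is quickest at 07:00, while at 08:00 congestion on the arc from A to B makes the route through C preferable, so the best path depends on when the journey starts.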

### Rhian Davies

#### Investigation of Approximate Bayesian Computation

**Degree:** Lancaster University, BSc Mathematics (2008-2011)**Supervisor:** Dennis Prangle

For many complex phenomena, fitting realistic statistical models is mathematically intractable by standard methods. A recent computational alternative is to repeatedly simulate the model to find good fits. This project investigates one such method (Approximate Bayesian Computation) on data from a Tuberculosis outbreak. The aim is to assess various implementations of this method through computer experiments, which will involve exposure to modern statistical methods and software.
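The idea of repeatedly simulating the model to find good fits can be sketched with the simplest form of ABC, rejection sampling. The example below is a toy problem and not the Tuberculosis analysis from the project: it assumes normally distributed data with known standard deviation, a uniform prior on the mean, and the sample mean as the summary statistic. All names, the tolerance and the number of draws are illustrative choices.

```python
import random


def abc_rejection(observed, prior_sampler, simulate, summary, tolerance, n_draws, rng):
    """Keep the parameter draws whose simulated data look like the observed data.

    A draw is accepted when the summary of its simulated dataset lies within
    `tolerance` of the summary of the observed dataset.
    """
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        fake = simulate(theta, len(observed), rng)
        if abs(summary(fake) - target) <= tolerance:
            accepted.append(theta)
    return accepted


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(1)

# Toy "observed" data: 20 draws from a normal distribution with mean 3.
observed = [rng.gauss(3.0, 1.0) for _ in range(20)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(0.0, 10.0),          # assumed prior
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=mean,
    tolerance=0.2,
    n_draws=20000,
    rng=rng,
)

posterior_mean = mean(accepted)
print(len(accepted), round(posterior_mean, 2), round(mean(observed), 2))
```

The accepted values approximate the posterior distribution of the parameter. A smaller tolerance gives a better approximation but accepts fewer draws, which is one of the trade-offs that different implementations of ABC handle differently.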

### Jamie Fairbrother

#### Multi-scale methods for texture analysis

**Degree:** University of Warwick, MMath Mathematics (with a year abroad: Europe) (2006-2010)**Supervisor:** Idris Eckley

Wavelets are a recent and powerful mathematical tool which was developed in the 1980s. They provide a novel way of decomposing the information within signals and images, providing information at various scales (you can think of these as viewing windows). Texture analysis is a particular application area in which wavelets have been successfully used in recent years. Broadly speaking, the texture of an image is the visual character of a region whose structure is, in some sense, regular (e.g. the appearance of a woven material). This project will investigate the potential of wavelets and related methods for modelling structure within textured images.

### Dave Grant

#### Time-Dependent Queueing Systems

**Degree:** University of Manchester, MMath Mathematics (2006-2010)**Supervisors:** Navid Izady and Dave Worthington

In general, in the area of mathematical modelling, modellers often make simplifying assumptions in order to make a problem ‘solvable’. In doing so the modeller is hoping that the solutions produced by the simplified model will nevertheless be valid (in some sense) despite the simplifying assumptions. Important examples in the area of modelling queueing systems include call centres, accident and emergency departments, hospital emergency admission units and intensive care units. Our interest is in modelling aspects of such queueing systems that typically exhibit time of day (and possibly day of week) variations in their underlying arrival rates of ‘customers’ as well as the usual stochastic variation in arrival times and service times.

### Rachael Griffiths

#### Dynamic modelling for wind-prediction

**Degree:** Lancaster University, MSci Mathematics with Statistics (with a year abroad: Australian National University) (2007-2011)**Supervisor:** Ben Taylor

Dynamic linear modelling is a technique for the analysis of time series data when the governing parameters of the model themselves evolve over time. In particular, it is easy to obtain predictions using these methods. This project concerns the short term modelling and prediction of wind speeds and hence power output at wind farms. The application is important in deciding whether a potential new site will deliver an acceptable amount of energy.

### Dominic Hickie

#### Pricing on-demand online services

**Degree:** Lancaster University, MPhys Physics with Particle Physics and Cosmology (2007-2011)**Supervisor:** Chris Kirkbride

Cloud computing is a relatively new concept for Internet-based computing in which resources, software, information and applications are provided to user devices (PC, laptop, mobile) on-demand. This project will consider various models for the cloud environment in order to determine how resources can best be utilised to meet demands for service and how to price such services effectively.

### Samantha Hinsley

#### Examining the applicability of a new technique for threshold selection in extreme value modelling

**Degree:** Lancaster University, BSc Mathematics (2008-2011)**Supervisor:** Jenny Wadsworth

It is the extreme values that are important in many applications, such as flooding, stock market crashes, and wind storms. To estimate the frequency of extreme events a statistical model is fitted to the extreme values and extrapolated to the value of interest. This project is concerned with investigating appropriate probability models for “extreme values”, or more precisely the tails of a probability distribution. However, there is a challenge in defining what makes a value “extreme”, i.e., from what point should we begin to model the tail? The project will examine the applicability of a new method for helping to define a suitable threshold. This project will involve mathematical computation and exposure to real-life problems using a variety of different data sets.

### Nicola Huxley

#### Detecting changes in mean

**Degree:** Lancaster University, MSci Mathematics (with a year abroad: Australian National University) (2007-2011)**Supervisor:** Rebecca Killick

In recent work, we collaborated with a company to identify whether there was a change in storminess in the Gulf of Mexico. This project arises out of this work. Detecting changes in the properties of a process, such as the mean, is important in many other areas of research such as quality control. Although there are many algorithms designed to detect changes in mean, there has been little comparison of the performance of these algorithms. This project will provide an opportunity to research different algorithms, program them and then conduct simulation studies to test their performance under various circumstances.
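To give a flavour of what such an algorithm looks like, here is a minimal sketch of one of the simplest approaches: choosing the single split point that minimises the total squared deviation of each segment from its own mean. It is only an illustration and not necessarily one of the algorithms compared in the project. The function name and the data are made up, and the sketch always returns a split, so in practice a penalty or a test would be needed to decide whether a change has occurred at all.

```python
def single_change_in_mean(series):
    """Return (k, cost), where splitting into series[:k] and series[k:]
    minimises the summed squared deviation of each segment from its own mean."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    total_sq = sum(x * x for x in series)
    best_k, best_cost = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for k in range(1, n):
        # Update running sums so each candidate split costs O(1) to evaluate.
        left_sum += series[k - 1]
        left_sq += series[k - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squares about the mean: sum(x^2) - (sum(x))^2 / length.
        cost = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Made-up series: the mean jumps from about 0 to about 5 after five points.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 4.8, 5.0]
k, cost = single_change_in_mean(data)
print(k, round(cost, 3))
```

Here `k` is the number of observations before the estimated change, so the sketch places the change between the fifth and sixth values of the made-up series.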

### Robert Maidstone

#### The Change-Making Problem

**Degree:** Lancaster University, BSc Mathematics (2008-2011)**Supervisor:** Adam Letchford

The Change-Making Problem is concerned with finding the minimum number of coins needed, in a given currency, to reach a certain amount. Suppose, for example, you are in Britain and you wish to give somebody 39p. The minimum number of coins needed is five (20p, 10p, 5p, 2p, 2p). If you were in the US and you wish to give somebody 39c, the minimum number of coins is six (25c, 10c, 1c, 1c, 1c, 1c). This topic may seem, at first sight, to belong to recreational mathematics but it is in fact a classical operational research (OR) problem with many applications.
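The two examples above can be checked with a short dynamic programming sketch, which builds up the best answer for every amount from 1 up to the target. This is a standard textbook approach rather than anything specific to the project, and the coin lists are assumptions about which denominations are in circulation.

```python
def min_coins(amount, denominations):
    """Return a shortest list of coins summing to `amount`, or None if impossible."""
    # best[a] holds a fewest-coins list for amount a (None if a is unreachable).
    best = [None] * (amount + 1)
    best[0] = []
    for a in range(1, amount + 1):
        for coin in denominations:
            if coin <= a and best[a - coin] is not None:
                candidate = best[a - coin] + [coin]
                if best[a] is None or len(candidate) < len(best[a]):
                    best[a] = candidate
    return best[amount]


# Assumed denominations, in pence and cents respectively.
UK_PENCE = [1, 2, 5, 10, 20, 50, 100, 200]
US_CENTS = [1, 5, 10, 25, 50, 100]

uk = min_coins(39, UK_PENCE)
us = min_coins(39, US_CENTS)
print(len(uk), sorted(uk, reverse=True))
print(len(us), sorted(us, reverse=True))
```

For these two currencies, simply taking the largest coin that fits at each step happens to give the same answers, but that greedy rule can fail for other coin systems, which is part of what makes the problem interesting.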

### Tim Park

#### Non-stationary time series analysis

**Degree:** Lancaster University, MPhys Physics (with a year abroad: North America) (2006-2010)**Supervisors:** Idris Eckley & Matt Nunes

Most signals (i.e. time series) observed in the real-world are non-stationary in their nature. This project will explore the behaviour of datasets related to financial data. We will investigate the structure of these signals using wavelets - a form of localised basis functions. The project will give an opportunity to learn about wavelets, their application to time series and provide the experience of conducting advanced exploratory data analyses.

### Emma Ross

#### Facility layout

**Degree:** University of Edinburgh, MA Mathematics (2007-2011)**Supervisors:** Yifei Zhao and Stein W. Wallace

Facility layout, in its simplicity, is about where to place different machines on a production floor in situations where the use of conveyor belts is not possible because the different products do not all visit all the machines and, even if they did, not necessarily in the same order. So transportation of the products from machine to machine can be complicated if the machines are far apart. In fact, it can result in total chaos. The ultimate goal is to place machines close to each other if it is likely that products need to be transported between them. The problem we study is simply: how should the machines be placed on the production floor? To do this we shall numerically solve small cases of the problem so as to try to understand the emerging structures (designs).

### Ben Sloman

#### Selecting a portfolio in finance

**Degree:** University of Oxford, BSc Mathematics (2009-2012)**Supervisors:** Ye Liu and Jonathan Tawn

In finance, the aim is typically to make as much money as possible while incurring as little risk as possible. One way of reducing the risk is to hold a selection of investments (a portfolio). However, as some investments are correlated, statistical methods are required to find the best way of balancing risk and expected return. In this project, you will explore the basic assumption that returns of investments are multivariate normal using a range of financial data and investigate some extensions of this assumption which are more realistic and result in better decision making in optimising the portfolio choice. The project will involve a real problem with real data, and the need for statistical modelling, simulation and optimisation.

### Michael Thistlethwaite

#### Agent-based Physical Asset Maintenance Simulation Modelling

**Degree:** University of Birmingham, BSc Physics (2008-2011)**Supervisor:** Stephan Onggo

Physical assets such as houses, motorways/roads, water pipes and electrical networks need maintenance because the condition of a physical asset deteriorates with time and usage. The risk of an asset failure (e.g. flooding) or not being able to provide the required service quality (due to weak water pressure) increases as the asset's condition deteriorates. The cost of a repair/replacement process, including the liability incurred due to an asset failure, can be very high. Therefore, a good maintenance strategy is needed. In this project, we will use one of the least explored OR modelling techniques for evaluating asset maintenance strategies, that is, an agent-based simulation model.

## 2009 Interns

Here you can find details of the summer 2009 interns including a description of their research project.

### Anna Fowler

#### Detecting changes in regression for time series: a review and application

**Degree:** MSci Hons Mathematics with Statistics/North America-Australasia at Lancaster University**Supervisors:** Idris Eckley and Rebecca Killick

This project aimed to detect changes in regression (trend) in industrial datasets, each including several variables, provided by Unilever. Several existing methods for detecting changes in the regression were investigated, including (normal) maximum likelihood (with and without penalty), residual sum of squares and cumulative sum of squares, before conducting a simulation study looking at their effectiveness. From this simulation study, the most appropriate algorithm was chosen using statistical methods and finally, the algorithm was applied to the various industrial datasets. Anna produced a technical report of the findings and had the opportunity to present to statisticians at Unilever in Amsterdam.

Anna is now pursuing a PhD at Imperial College, London.

### Jak Marshall

#### Optimal Control Policies in adjustable queue systems

**Degree:** MSci Hons Mathematics/North America-Australasia at Lancaster University**Supervisor:** Kevin Glazebrook

Countless industrial processes include some variety of queueing system, for example, telecommunications and transport. Problems regularly arise in how queue operators manage the demand for their services. The challenge is to find an optimal way of allocating resource towards providing service across a collection of independent service stations serving customers in corresponding queues given the delicate balance of overspending on service infrastructure versus underspending and incurring costs due to system neglect. The approach to solving this problem relies heavily on computation and a good understanding of queueing objects in order to simulate an ideal queueing system. The key outcome of this project was to deliver a near-optimal method of managing queueing systems by considering a case study involving queues with only limited modes of service available at any time.

Jak joined STOR-i in 2010 to pursue a PhD in STOR.

### Erin Mitchell

#### Queueing Systems and Optimisation of Computer Component Repairs

**Degree:** BSc Hons Mathematics at Lancaster University**Supervisor:** Kevin Glazebrook

Repair companies often offer a promise of a turn-around period in which a faulty product will be repaired and returned to the customer, ensuring optimal customer satisfaction. In the majority of cases, the repair company will not complete all of the repairs themselves, if any at all, but will instead outsource the work to several different sub-companies. Upon receiving a broken product, a computer for example, the repair company must then decide to which of its contracted sub-companies to send the machine. Company A, for example, may be a larger, more specialist or better equipped business, and as such may be able to perform a given repair a lot quicker than Company B or C. If a company has a quick turnaround on their repairs, it may be desirable to send more broken machines to them than to the other companies. However, a balance must be struck between using the ‘best’ company and making efficient use of all the resources. With different companies being different distances away from a location (the repair company warehouse, for example), the time taken for travel and dispatch must also be considered. Taking into account all of these different factors, a model can be built in order to decide how many repairs to send to each company. Once a basic model has been designed, different probabilities can be assigned to factors, such as the probability of machine breakdown and machine repair, in order for choices and allocations to be made in the most intelligent manner.

Erin started a PhD at Lancaster University in collaboration with Garrad Hassan in 2009.

### Daniel Suen

#### Graphical modelling of divergence weighted independence graphs in the Criminal Justice System

**Degree:** BSc Hons Mathematics at Lancaster University**Supervisor:** Joe Whittaker

Graphical models show how the relationships between several variables can be shown in graphical form. This project required learning the theory behind divergence weighted independence graphs and the modelling of such graphs using the statistical package, R. A key part of the research focused on illustrating how these graphs can be used to identify relationships between factors which affect trust in the Criminal Justice System. On completion of the internship, Daniel produced a comprehensive scientific report including applications using British Crime Survey data.

Daniel joined STOR-i in 2010 to pursue a PhD in STOR.

## Blogs

Click on the links below to see the blogs written by each cohort.

## 2019 Blog

Here you can find out more about the STOR-i internships experience as told by the 2019 STOR-i interns.

### Week 1

#### Written by Joe Holey and Katy Ring

At the weekend we moved into the building in Furness College where many of the STOR-i interns are staying for the duration of the internship. Katy was impressed with the campus environment (particularly the large duck community) having never lived in student accommodation before.

The first day of the internship started with some introductory talks from STOR-i director Jonathan Tawn. We then did some icebreaker activities to get to know the other interns and some of the MRes students better. These included trying to untie a human knot which wasn’t our forte and team juggling which suited our skillset better.

Then we got to the highlight of the day – the buffet lunch! The feast lasted for 2 whole hours and much merriment was had by all.

On Tuesday, we got under way with our work, beginning with an R workshop (more of these followed throughout the week) and meeting with our respective supervisors who introduced us to their work and our projects. Following the day’s work, a few of us joined some members of staff to enjoy a game of football in the sun.

On Thursday, we each gave a 5-minute presentation about ourselves. We were surprised to find that we weren’t just giving these presentations to each other but seemingly to all STOR-i staff members and their extended family. These were the perfect opportunity to see embarrassing baby photos, cute pets and Shyam’s hairstyle woes. After the presentations we went on a scavenger hunt organised by the MRes students, which was a great opportunity to get to know the campus better. There were several opportunities to gain bonus points throughout the event, including getting a photo of someone in your group getting soaked in the fountain by the great hall – Joe gladly obliged by sticking his head right in the water. At the end of the scavenger hunt all the groups met up for a barbecue which featured loads more free food as well as some frisbee and more football.

The working week ended with us going to our first STOR-i forum which was presented by Sam Tickle. These are an opportunity for members of staff to share their work with the rest of the department and for everyone else to learn something about a field of STOR that they may not be familiar with.

Finally, some of the interns attended an “Applied Probability Night” on Friday where we were able to test our poker skills against each other.

### Week 2

#### Written by Matthew Darlington and Shyam Popat

After work on Monday there was badminton, which was a good chance to get to know some of the PhD students in a relaxed atmosphere. On Tuesday there was football as usual in extremely hot conditions, and also a meal and pub quiz organised in Lancaster. We split up into three teams; one team won the prize for the most average team and was awarded £10 and a Curly Wurly. We also had the R course where we competed with our code for the travelling salesman problem, with a box of Celebrations for the winners.

During the middle of the week we had a break from the activities which was a good opportunity for us to make progress on our projects. On Thursday afternoon we made R code to play noughts and crosses and then we had a tournament with chocolate again for the winners. There was a bit of controversy with the final results as one team had accidentally made their method overwrite the other team's moves! On Friday we had the second forum by Anja Stein, who talked about her work on recommender systems, followed by tea, coffee and biscuits in the hub.

At the weekend, there was a trip to climb Scafell Pike. One of the two cars got lost on the way there and ended up going up Hardknott Pass, which is one of the worst roads in Europe! It was a very steep climb up to the top, but once there it started to rain for the remainder of the walk. Hard work but worth it at the end with a pub meal back at the Boot and Shoe.

### Week 3

#### Written by Connie Trojan and Shyam Popat

On Monday, we moved into our new base room in STOR-i, right next to the kitchen and our endless coffee supply. We spent (wasted) some time moving our tables and sofas around to create group working and social areas, including a ‘mini-hub’ where we promptly enforced a mandatory 11am coffee break.

Since we had already learned all there was to know about R, on Tuesday we started an introduction to LaTeX, learning the basics of document and presentation creation. We once again put in an appearance at the pub quiz, where one of our teams tied for second place.

On Friday, we attended a PhD talk from Jess on modelling categorical data. We ended the week with a pub crawl in the city centre, taking full advantage of National Pub Fortnight to claim free pints at the White Cross.

### Week 4

#### Written by Liv Watson

The realisation that we were already coming up to the halfway point of the internship meant that Monday morning started with a sinking feeling, but then Matt pulled out some homemade banana bread and all was well with the world again. Stressed-out nervous laughs about broken code could be heard in the base room, showing that we were all wondering how we were going to get our R code working correctly in time.

Tuesday brought talks from some of the MRes students – Chloe, Aimee, Graham, Drupad and Thu – all of which were highly interesting and a welcome break from working on our own projects. The afternoon break was spent figuring out the picture round for the quiz that night, and the celebration that occurred when it was finally figured out was a mighty one. That evening, we decided to take advantage of Study Rooms Tasty Tuesday offer of 50% off mains before heading to the White Cross for the weekly pub quiz. Whilst we didn’t win the overall quiz we did win the most calorific prize they have had – so really who won here!? Still not us.

On Wednesday we had our final LaTeX workshop, where we learnt how to create posters so we would be fully equipped to make our end of project posters. It also featured a viewing of first year PhD students’ posters, where we had to say what we liked and disliked about their posters – all I can say is that I hope they are nicer about ours than we were about theirs!

Friday’s forum was a really engaging talk given by Matt Bold all about his *BIG new scheduling problem* – the decommissioning and safe clean-up of legacy nuclear waste at the Sellafield nuclear site in West Cumbria. That evening a couple of interns headed down to the bar to watch the kick-off of the Premier League and to play some pool. I went home and played with the puppy...

Sadly, we had to postpone our planned trip of boating and a picnic in Coniston due to the harsh weather in the Lake District on Saturday. Instead we had a boardgames day in the Hub and left the brave(?) to tackle the storm.

### Week 5

#### Written by Katie Dixon

On Tuesday, a handful of the PhD students held a session titled ‘Life as a PhD student’. This was an opportunity to hear first hand what to expect should we choose to continue our studies as a STOR-i student and it allowed us to ask any questions that we had. In the evening, we managed to put together a team for the pub quiz. The team came in third place and they were only 3 points off winning the whole quiz!

As usual, on Friday we attended the STOR-i Forum but this week there was a twist. Instead of the normal set up (one 30-minute presentation), we were given mini lectures from a range of the STOR-i team where they had to outline their area of study in a maximum of pi minutes. Following the forum, we all attended a bake sale in the hub where the fantastic bakers managed to raise over £150 for Mind.

On Saturday, some of the interns chose to tackle Fairfield Horseshoe. In classic Lake District style, they managed to experience all four seasons in one day – some even happening at the same time! This was followed by a trip to Grasmere where Dylan bought 30 bits of the notorious gingerbread all for himself!

### Week 6

#### Written by Matthew Gorton

We finished Tuesday with a barbecue to celebrate Katy’s birthday. During this, Shyam came up with the innovation of barbecued curly fries, which I think it’s fair to say proved a mixed success. In a last-minute moment of ingenuity, Katie and Liv improvised a birthday cake by sticking a candle into the last remaining bread roll. We managed to get everything cooked and eaten just before it started tipping it down! Four of us, plus Sam, then went to the pub quiz. We were sadly unsuccessful prize-wise, but had fun, nonetheless.

Wednesday was not a normal working day; instead we had a ‘problem-solving day’. Our task, set by Rob Shone, a researcher in Management Science, was to find a solution to the problem of scheduling aircraft at an airport. It turns out that this is a very complicated task!

Airports can only have a certain number of planes taking off and landing within a certain time period (say, per hour). So, airlines bid on arrival and departure times. Our task was to come up with a method to schedule arrival and departure times that minimises the ‘displacement’ – the difference between the time requested and the time assigned.
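To make the idea of displacement concrete, here is a minimal sketch in Python. It is not any of the groups' actual methods: it assumes requests are whole hours of the day, that every hour has the same capacity, and it uses a simple greedy rule (the function name `greedy_schedule` and the input format are hypothetical).

```python
from collections import Counter


def greedy_schedule(requests, capacity):
    """Assign each requested hour to the nearest hour with spare capacity.

    requests: list of requested hours, 0-23 (hypothetical input format)
    capacity: maximum number of movements allowed in any one hour (assumed constant)
    Returns (assigned_hours, total_displacement_in_hours).
    """
    used = Counter()  # how many movements are already assigned to each hour
    assigned = []
    for hour in requests:
        offset = 0
        while True:
            # Try the requested hour first, then one hour earlier/later, and so on.
            candidates = [hour] if offset == 0 else [hour - offset, hour + offset]
            free = [h for h in candidates if 0 <= h <= 23 and used[h] < capacity]
            if free:
                used[free[0]] += 1
                assigned.append(free[0])
                break
            offset += 1
            if offset > 23:
                raise ValueError("not enough capacity in the day")
    # Displacement: difference between the time requested and the time assigned.
    displacement = sum(abs(a - r) for a, r in zip(assigned, requests))
    return assigned, displacement


if __name__ == "__main__":
    print(greedy_schedule([9, 9, 9, 10], capacity=2))
```

A greedy rule like this depends on the order in which requests are processed and will not, in general, find the schedule with the smallest total displacement, which is part of why the real problem is hard.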

All three groups ended up coming up with different ideas and solutions, and we all thought of other issues that you might need to consider: leaving time for bad weather or emergencies when scheduling, different types of aircraft requiring different turn-around times. Rob told us that we had exceeded his expectations, and he even asked us to send our slides to him for them to look at for ideas! Quite impressive for a bunch of interns only working for a single day!

Friday’s forum was given by Livia Stark. Her work is trying to narrow down sources of information to be used by intelligence agencies in a novel way, which was of particular interest to me as we are both investigating the same technique for solving problems (multi-armed bandits).

Straight after work on Friday, the interns went to Spaghetti House for dinner. We managed to get there early enough to take advantage of their ‘Happy Hour’, giving us pizza or pasta for £5.75. A lovely way to finish the week!

### Week 7

#### Written by Dylan Bahia and Jack Trainer

This week, most of our time was spent trying to knuckle down and get our posters finished so that they could be printed ready for the poster session next week. This meant that most of the week (except the sacred bank holiday) was spent in the STOR-i base room. Of course, we still had our usual half hour on Tuesday morning trying to decipher this week’s clue for the pub quiz and those who attended football on Tuesday had the privilege of seeing STOR-i footballing legend Harjit score his last goal ever. A slow, tough week was all worth it however as we had the opportunity to unwind at the annual STOR-i ball.

The whole department gathered at Lancaster Golf Club for an evening of food, drinks, magic and karaoke. The evening began with a delectable three course meal, accompanied by a small selection of aromatic wines. During this, a magician circulated the tables, flabbergasting us with acts which could be nothing other than sorcery. As the meal became ever more evanescent, Jon entertained us with his light-hearted speech, immediately followed by an awards ceremony. The most notable award was the Tickle Sam award, praising his contribution to the STOR-i department. The mingling of the guests then ensued, with the bar helping to facilitate conversation and creating a night to remember (or in some cases forget). With the highlight of the evening on the horizon, it was time for the guests to muster their brethren and deliver a spectacular karaoke performance. The combination of singing, dancing and laughter birthed a night nothing short of perfect. As the hour past witching hour approached, the guests said their farewells and went home, thus concluding the night. There were some exceptions, who continued the night by sauntering toward the city.

### Week 8

#### Written by Gwen Williams

We started the week making the final edits to our posters, which were intended to summarise our project and findings. We had been warned some STOR-i members had very particular views about poster-appearance, and so making sure our text boxes were correctly aligned was given high priority.

On Tuesday, to celebrate submitting our posters, the majority of interns headed into town for the last weekly pub quiz. Although our general knowledge may have let us down, our luck did not, and we managed to win some free drinks.

The rest of the week was spent preparing our presentations. Summarising seven weeks of work in a 10-minute presentation proved challenging; however, feedback from other interns during practice sessions made this a lot easier.

On Friday morning everyone gave their presentation, with a brief interlude for coffee, of course. Once the presentations were over we all breathed a sigh of relief and went for a celebratory Go Burrito lunch.

That afternoon was the poster-session, fuelled by a generous spread of sweet treats. We each stood by our poster while members of STOR-i had the opportunity to walk around and ask us questions (or measure how well our text boxes were aligned). This was a great opportunity to talk in some more depth about our research, as well as to say goodbye to members of the department before we all left.

After all the excitement of the presentations and poster session, we were hit by the sad realisation that the internship had come to an end. For a final goodbye, we headed off to a bar, before going back to one of the flats for a big meal. The meal was intended to use up our leftover food before we moved out the next day. Despite having a somewhat unusual set of ingredients, the chefs (thank you) made some delicious tacos. After the meal, we said our goodbyes and wished each other well going back to our different universities. While we were sad to be leaving, we felt very grateful to have spent a fantastic summer as interns at STOR-i and to have been made to feel so welcome by everyone there.

## 2018 Blog

Here you can find out more about the STOR-i internships experience as told by the 2018 STOR-i interns.

### Week 1

#### Written by Peter Greenstreet

I moved into the STOR-i flat on Sunday and within 3 hours I had met 8 of the interns and we started to get to know each other. The Monday began with an introductory talk from Jonathan Tawn, then we had an IT session with Oli, who set us up and gave us all laptops (sadly only for the 8-week internship)! We all quickly discovered Oli was a master of all tech, as he can sort any problem out. Next up we had a team-building session which began by holding hands and getting knotted together. This was followed by trying to make a square with some rope whilst blindfolded, which ended up having only 2 corners and not one side of equal length. Finally, we finished with a game where we had to call each other different vegetables. It was all a great laugh and I really got to bond with both the interns and the Masters students. Next up was a 2-hour lunch with FREE food where we also got to meet our supervisors. Everyone was super chatty and friendly. Following this was another lecture and we finished with a university tour. After this, some of us headed to the sports centre to play badminton with PhD students, who were really good.

For the next 3 days, we all met with our supervisors to discuss our projects and find out what we needed to learn for the first couple of weeks. We also had lab sessions on both R and LaTeX which helped refresh my memory as well as teaching me new skills in both. On Tuesday night we went to the legendary White Cross quiz night and one of the STOR-i teams even managed to come second!

Thursday started with some more R followed by our introductory presentations, which contained loads of cute baby photos. This was followed by a great scavenger hunt. We were split into teams of 4, with 2 interns, a PhD student and a Masters student in each. We were given a list of things to find, as well as challenges all around campus. It was great fun. Then we had a barbecue with lots of food. However, we did lose a sausage to the ducks!

On Friday we had a presentation from Sarah about her PhD project followed by cake and then some more time to work on our projects and meet with our supervisors. That evening some of us went to the gym and the others went to a poker night. On Saturday we went for a 3-hour walk and then bought some famous sticky toffee pudding. It was a great opportunity to get to know some of the MRes and PhD students. Then some of us had a roast that evening followed by our lovely sticky toffee pudding.

### Week 2

#### Written by Mason Pearce

We started the week by working on our second challenge using RStudio in our assigned groups. The challenge was to code a strategy to win a game of tic-tac-toe; we then simulated 1000 games using our strategy against the other groups. It was very close: group 1 drew with everyone, but group 2 won more when playing group 3 and they were the overall victors, receiving a giant Toblerone as their prize. Later in the day we moved over to our new base room and got settled in. In the evening some of us went down to the sports hall to play badminton with the MRes and PhD students.

The next day began with presentations from first-year students on project ideas for their PhD; this gave us a taste for the different areas of research that go on at STOR-i. In the afternoon we were taught how to make beamer presentations and posters in LaTeX to prepare us for the later weeks, and a few of us then went to play football. Later on in the evening we attended The White Cross weekly quiz, splitting into two teams. One of the teams even won a gallon of beer!

We all met with our project supervisors again later in the week and the following days were spent working hard on our individual projects in the new base room. Most of us used the skills we had been taught in RStudio to code what our supervisors had asked us to, whilst some of us used Python as it is a more suitable programming language for the project-related tasks. Although we were working on our own topics, there was plenty of talking and sharing ideas and lending people a hand if they needed.

On Friday, Sam organised a board game night at Pizzetta on campus. We all attended and a lot of the PhD students came too, which was nice. At the weekend we had planned to go on a trek up Scafell Pike in the Lake District, but due to the weather, we decided to postpone and instead went to escape rooms in the city centre. We were trapped in a jail cell accused of being witches and if we didn’t solve the puzzles to escape we would be ‘left to rot’; we had one hour. At first, we made great time, but towards the end, the puzzles got more difficult and slowed us down. We just managed to escape with only two and a half minutes to spare!

### Week 3

#### Written by Niamh Lamin

Week three was a much quieter week in terms of scheduled academic activities but this gave us all a good chance to get our teeth into our projects. I spent most of the week studying the types of inequalities produced by a program called PORTA for optimisation problems. This involved the production of three or four items with start-up costs associated with machines involved in the production of various sub-sets of these items. Even though my supervisor was away, I found this wasn’t a problem because she was always available by email or phone if I got stuck or needed to ask any questions.

As well as speaking with our individual supervisors, we also had a group meeting on Friday afternoon. As in the previous weeks, I found this meeting really useful as it gave me a chance to explain what I had been working on that week to some of the other interns. As well as giving us all a chance to find out about the interesting projects the others were working on, I found that explaining my progress helped me to consolidate and check my own understanding and provided useful practice ready for the presentations at the end of the programme.

On Friday morning, we had a STOR-i Forum with a difference - rather than just having a presentation from a single PhD student, we were treated to a series of ‘Pi Minute Theses’. Each PhD student had exactly three minutes and fourteen seconds to introduce us to their research topic. I really liked this format as it meant we got to hear about a wider range of different projects and the general introductions were easier to follow than the more detailed presentations of the previous two weeks. All the projects sounded really interesting but I particularly enjoyed Emily’s presentation about Combination Therapies and how information could be borrowed between similar combinations of drugs to decide which ones to investigate in clinical trials.

Even though the academic timetable was slightly less hectic, the social calendar was just as full as normal so there was a lot to entertain us all in the evenings. The weekly badminton and football sessions continued, as well as the pub quiz on Tuesday evening, but there were also some special activities. For example, a group got together to watch the Love Island final on Monday evening and a group of us went for an impromptu ice-cream from Walling’s in Alex Square on Wednesday afternoon - it was the best chocolate-chip ice-cream I’ve ever tasted, though I fear I managed to get more of it down my shorts than actually in my mouth!

To round off the week, there was a bar crawl on Friday evening. I’d never been on a bar crawl before but I actually really enjoyed it. We visited some really nice pubs and it was a great chance to socialise and spend time with some of the MRes and PhD students as well as the other interns (that’s one of the things I love about STOR-i: there are always plenty of chances for integration between year groups, which creates a great atmosphere). We started at the Water Witch and then made our way down into town visiting a series of pubs on a route planned for us especially by Tom and Alan. Though I was really having fun, since I was quite new to this sort of event, I decided to head home after the third pub, especially since the next destination was one whose name would strike fear into even the bravest of souls (which I most certainly am not) - The Pub!

Anyway, I have it on good authority that everyone made it back safely and the event was definitely a success.

### Week 4

#### Written by Sean Hooker

Week four’s timetable again provided the interns with the possibility to focus on their projects with lots of time available for independent research. My project involves identifying points in a time series where there has been an abrupt change in its properties, such as a change in the mean or variance. I spent the past week building on techniques that I had coded previously and developing these into computationally more efficient methods.

This culminated in running my chosen method over multiple simulated time series, all of differing lengths. The main measurement I was comparing was the speed of the algorithms. The code took a little longer than expected to get through all the sets of data but I got a nice looking graph out of it and plenty of ideas for improvement.
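As an illustration of the kind of problem described above, here is a minimal sketch in Python. It is not the method used in the project: it assumes there is exactly one change in the mean and picks the split that minimises the squared error, and the function name `best_mean_changepoint` is hypothetical.

```python
def best_mean_changepoint(series):
    """Return the split index k (1 <= k < n) that minimises the total squared
    error when series[:k] and series[k:] are each fitted by their own mean."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    total_sq = sum(x * x for x in series)
    best_k, best_cost = None, float("inf")
    left_sum = left_sq = 0.0
    for k in range(1, n):
        # Move one more observation into the left-hand segment.
        x = series[k - 1]
        left_sum += x
        left_sq += x * x
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Squared deviations from a segment mean: sum(x^2) - (sum x)^2 / length
        cost = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


if __name__ == "__main__":
    print(best_mean_changepoint([0, 0, 0, 0, 5, 5, 5, 5]))
```

Keeping running sums means each candidate split is scored in constant time, so the whole scan is linear in the length of the series, which is the sort of saving that matters when comparing the speed of algorithms on long simulated series.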

I’m beginning to feel accustomed to the weekly activities of STOR-i members. Tuesday was football: there was a good turnout from the interns, as well as the regulars, this week, and after an exhausting 90 minutes the match ended with a close score. Also that evening was the pub quiz at the White Cross pub in town; STOR-i fielded two teams whose members spent the night answering questions on topics from Pokémon to world records on blowing balloons and pretty much everything in between.

The rest of the week flew by, with the occasional hangman session to break up some of the days in our base room. This week’s edition of the Friday Applied Probability (poker) night was held in the interns’ flat and the home advantage was clear with Mason winning the night.

Saturday was the main event of the week with a hike up Scafell Pike. This had already been cancelled once due to bad weather, and it’s clear why: even on a (mostly) bright and clear day this was a challenge. The entire group, made up of interns and PhD students, made it up and down before night fell, but they didn’t quite miss the rain. But this provided the group with some picturesque scenes of the mountains and the drizzle. Their impressions of the hike are currently skewed with the mental images of them all climbing down a mountain in the heavy rain, but given time, and a few more warm drinks, they’ll be able to reminisce about what will be the main achievement of the internship so far.

### Week 6

#### Written by James Price

In terms of the project, Week 6 appeared to be a bit of a breakthrough week for a lot of the interns. With the prospect of the presentation and poster session looming, a lot of the work towards the project has taken shape and overall end goals are being achieved.

My project is on finding heuristics for real-time railway rescheduling. I’ve spent the past few weeks exploring various methods for finding the shortest path through various graphs and so last week was spent finding ways to measure both how good the methods were and how long they took to calculate their chosen route through the graph. The results were encouraging and allowed me to observe where certain methods could be refined further.

This week saw the addition of a shadowy character to the interns’ base room, the puzzle-maker. This man (or woman) of mystery would leave us a new puzzle every day which, usually after a few hours of head-scratching, led to a piece of paper hidden somewhere in the room containing a five-letter word. The ingenuity of these clues ranged from noticing a blue arrow pattern in a grid of chairs to colouring numbers on a grid according to an extensive set of rules. This all culminated in having to ask a specifically worded question at the Friday forum. And as if by magic, our prize, in the form of a cake, appeared in the base room.

I really enjoyed the puzzles, which the other interns can testify to due to my regular wanderings around the room to peer under a table or on a ledge. However, I discovered quite a few new hiding places which will come in useful should my supervisor unexpectedly turn up asking why I got no work done this week.

The regular White Cross pub quiz on Tuesday was also a triumph: thanks to Sam Tickle’s beautifully obscure knowledge of Enid Blyton’s ‘Famous Five’ novels in the Pointless round, it resulted in a tidy cash prize for the entire team. I guess you could say we had a wonderful time*.

The week closed with the big social event, the STOR-i ball, this year held at Lancaster Golf Club. This full-on night contained a group Ceilidh, complete with skipping, clapping and of course plenty of spins, and also a wonderful three-course meal, although thankfully not in that order. And then just when I thought it was all over, it turned out the night was only beginning. There was a quick taxi ride into town and before I knew it I was in Hustle nightclub, still in a full suit, having a great time.

I’ve managed to block from my memory the time when I finally got to bed, but if anything that’s the sign of a fabulous night.

*the joke is left as an exercise for the avid reader.

### Week 7

#### Written by Kostya Siroki

The week started with an enjoyable day off. But it didn’t make Monday any less entertaining, as the Murder Mystery Day took place. Three STOR-i teams participated in it. The aim was to find a “murderer” by answering questions, exploring Lancaster and collecting evidence from witnesses. All the participants found this event fascinating. Moreover, good results were achieved. The “Mafiamaticians” won the best costume prize, and team “STORlock Holmes”, containing 2 interns, came second.

For the rest of the week, we were pushing our creativity to the limits so as to produce eye-catching posters. It made us especially collaborative this week due to the regular LaTeX errors and the subsequent necessity to find someone who had already encountered that issue.

A Foosball charity tournament was organized on Tuesday in order to raise money for MIND and also to test out the BRAND NEW FOOSBALL TABLE decorating the hub from now on. Two of the interns participated in the competition as a team and successfully won the first round. Sadly, luck wasn’t on their side in the second game and they lost against the eventual competition runners-up.

Week 7 was enriched with football. We had two wonderful games on both Tuesday and Thursday. Interns led by the MRes students opposed PhD students on Tuesday. After an exhausting 90-minute game the score was 5-5 and so a golden point game began. This time we lost, but next week we will be sure to come back stronger than ever before.

Tuesday was a very busy day as, in addition to the above-stated activities, it also included the pub quiz. Three teams represented STOR-i this time, and every team ended up winning prizes. One of the teams won “The most average team” prize, the other one was the closest when guessing the exact number of “Big Bang Theory” episodes and the last team, but certainly not the least, WON the quiz.

As usual, the week was concluded by the forum. This time the presentation was given by Christian Rohrbeck and of course, it was followed by coffee with cookies.

### Week 8

#### Written by Nicolo Grometto

The final week of the internship has finally begun!

We spent Monday morning making last-minute changes to our posters before sending them off for printing. In the afternoon, we all had a good start on our presentations, trying to condense the results obtained throughout the previous 7 weeks into a ten-minute presentation.

Unfortunately, Tuesday did not see any of the interns attending the weekly pub quiz. Making our posters and slides look pretty in LaTeX and feeling the final day approaching took up a great deal of energy, and almost no interns showed up for the last football session, either. Quite an animated 4-a-side still took place on the field, and the sun shining made it even more enjoyable for those who played.

Wednesday quickly went by, as we spent the whole day making fast progress with our presentations. On Thursday, we had our exit interviews with the Director of STOR-i, Jonathan Tawn. We had the possibility to discuss our experience throughout the internship, as well as the progress we made with our projects in the past weeks. We concluded the day by gathering in groups in the Postgraduate Statistics Centre for rehearsing and giving each other constructive feedback.

And at last, Friday! The day began with an unusual atmosphere at STOR-i, as we were all so excited about showing our work to others, whilst also feeling nervous about having to speak in front of the audience. After rearranging the interns’ office for the afternoon poster session, at 9:45 we started off with the presentations. It was incredible to see how much progress each one of us made during the internship and how well we all managed to present our work.

After a short break, the day continued with the poster display session, which also went exceedingly well. A number of visitors came along to see our work, including the MRes and PhD students, as well as members of staff from STOR-i and the Mathematics and Statistics, and Management Science Departments. We all received positive comments about our research projects, as well as posters, which made us extremely proud and satisfied with our work. We concluded the day with a final meal in town and celebrated our results all together.

On Saturday morning, the time to leave had come. Whilst feeling sad for having to say goodbye to each other, we were all so happy for having spent a fantastic summer at STOR-i and for feeling part of such an inclusive community. Thank you to everyone who worked hard in order to make this happen.

## 2017 Blog

Here you can find out more about the STOR-i internships experience as told by the 2017 STOR-i interns.

### Week 1

#### Written by Callum Barltrop

The week started with an introductory talk from Jonathan Tawn, followed by some team-building exercises. We also got a tour of the STOR-i facilities and were introduced to the saving grace of the organisation - the coffee machine.

On Tuesday, we had our first lectures on R and LaTeX. Many of us also met with our supervisors on this day to discuss our projects and decide what reading to do to familiarise ourselves with the content. In the evening, a bunch of us met in the White Cross for the 'world-renowned' pub quiz. Whilst my team didn't win, we did win a gallon of beer for having the closest guess on the bonus round - how much is a kg of Donkey Cheese? (clue: it's very expensive!)

Wednesday and Thursday were fairly similar, with more lectures and reading. In the intern group, we had all got to know each other fairly well by this point and had started making some plans for the rest of the internship, including booking out a lecture theatre for Game of Thrones!

Friday was slightly different - instead of the usual lectures, we were set our first group R challenge, which involved a famous old puzzle. We also attended our first STOR-i forum, where we found out about some of the fascinating research being done by one of the PhD students at the organisation. In the evening, a few of us met up for a couple of drinks in one of the bars on campus.

Saturday involved a trip to Grasmere in the Lake District, organised by the one known formally as 'Mr Tickle'. After a long 10 mile hike with plenty of rain, cloud and mud, we bought some incredible gingerbread and stared up angrily at the sky as the sun came out... Just our luck!

Finally, on Sunday, George (another intern) and I met up in the morning for a nice steady run in the sun. Later on in the day, a bunch of us met up at a bar on campus to catch the Wimbledon final, where Federer made the game look easy.

All in all, a fantastic first week at STOR-i, with a lot still to look forward to!

### Week 2

#### Written by Edward Austin

Week 2 began with us comparing who had written the best algorithm to solve a Travelling Salesman Problem. I can confirm, with a score nearly 100 times that of the winning group – Jake, Jonny and George’s code – that it was not my group. Not to be put off with this, though, we set to work on our next challenge – making our laptops play noughts and crosses.

Over the course of the week, this certainly brought out some of the competitors in the group with Jake managing to make a code that simply could not be beaten. Indeed, after playing a million games against random opponents it never lost! This piece of programming mastery led to the hypothesis that caffeine, as tracked on our new caffeine chart, was the secret to his coding success.

Wednesday not only saw my group’s noughts and crosses code crushed by everyone else’s but also saw us finish the LaTeX courses with an introduction to creating posters. This will certainly be of great use to us when it comes to the end of the project!

Thursday was a strange day insofar as we had no scheduled activities and instead were left to work solely on our projects. This could have marked the start of a long and prosperous relationship with RStudio; however, judging from the number of error messages on my screen, this might have to wait a couple more weeks yet! In the evening we attended a board games night with some of the other MRes and PhD students, and fun was had by all.

The following day was our second STOR-i coffee morning with a talk by David Torres Sanchez on optimising aircraft engine maintenance schedules. It was a very enjoyable talk in the sense that not only could we all follow what was being said, but it was delivered with cheerful humour too! Lunchtime then saw Callum entrench his tradition of burritos on a Friday and then in the afternoon we all decided to combine the group meetings into a group presentation where each group member gave a small talk on what they had done that week. This was great as not only was the subject matter interesting, but we all learnt a bit more about the work we were doing and what direction we should head in next too!

At the weekend some of us headed into the Lake District for a walk on Saturday, and others spent time with their girlfriends. On Sunday there was also a cinema trip to watch Dunkirk, a film I cannot recommend highly enough!

### Week 3

#### Written by Jake Grainger and Graham Laidler

With the end of the previous weeks’ coding lessons, we were able to fully sink our teeth into our individual projects. This allowed us to really make some solid progress, gaining a fuller sense of our projects’ complexities.

We all began to make some headway with our projects, and our coding skills went through the roof. Here is a spatial dependency plot that Jake produced. It shows the wave height dependency of different points with the central point, represented using his favourite colours.

On Friday, we enjoyed the usual STOR-i forum. This week, each of 5 PhD students attempted to explain their research in just 3 minutes 14 seconds. With the volume of the buzzer helpfully set to maximum, we were left in no illusions as to when this time was up. Jake had a headache for the rest of the day, but some hot milk and spatial extremes perked him up again. This was another great week on the STOR-i internship, and we are all bonding well.

### Week 5

#### Written by Chloe Fearn and Jonnie Bevan

Monday of week 5 saw the continuation of the after-work Game of Thrones watching tradition that has developed. We don’t watch Game of Thrones but given all the exciting talk since, we think it was a great episode!

On Wednesday of this week, we had to tackle the problem-solving day, which involved finding reasons why a cycling company was receiving less custom. We split off into three teams and spent the day analysing the relevant data and coming to conclusions about what the company could do to boost their customer base. At the end of the day, we presented our findings to the other groups and some of the MRes students; it was interesting to see the different approaches we all decided to go with. It was a fun break from the routine that we have settled into with our projects and gave us a chance to work collaboratively for the first time since the noughts and crosses project in week 2. On Thursday morning we talked to three of the PhD students about what life is like at STOR-i. They were very helpful and answered a lot of questions that we had!

This week’s STOR-i forum involved five pi-minute theses as opposed to the general half-hour presentation of a single thesis. We heard short presentations on a range of topics, and the loud buzzer at the end of each one kept us all on our toes! Afterwards, we headed to The Hub for the usual coffee and biscuits, before we rounded off the week with an afternoon of work (and a bit of Pictionary on the whiteboard).

### Week 7

#### Written by Callum Barltrop

This week started off a little different from the other weeks over the internship since it was the first week where we did not have anything scheduled! This was to allow us time to work on our project posters and presentations, which we would be presenting the following week.

Whilst we were working independently, this week really showed us the difference that having a good working group can make. We regularly took coffee breaks and had chats about how our projects were going, as well as getting second opinions on some of the stuff we were working on. For many of us, this really helped to clarify what we were working on.

On Friday, we had the regular STOR-i forum - this week was a 'Pi Forum' where 5 PhD students had exactly 'Pi' minutes to present their work and the progress they had made.

Saturday began with an early start for some of us as we had decided to go and climb Scafell Pike - the highest point in England! A few of us got a little lost on the way there and spent a fair bit of time driving through farmers' fields (cough cough Graham) but we made it in the end for some rather anticlimactic views. A great day all in all though!

Finally, on the Sunday, George and I once again went out for a gentle run down some of the beautiful country roads around Lancaster, making for some awesome views.

All in all, this week really gave us good experience in what it would be like to work independently on a PhD, as well as how to summarise and present your work in a concise manner!