
Postgraduate Research

Selecting a PhD Topic

The process of PhD project identification and vetting, the construction of supervisory teams, and the allocation of projects to students are managed closely by STOR-i's Executive Committee. Cross-disciplinary work is intrinsic to the operation of STOR-i, and all students are supervised by a team representing at least two of the centre's three constituencies (Statistics, OR and industry). The majority of projects are with industry, but we also have a number of projects with our academic strategic partners. Typically 50% more projects are offered than are needed, to ensure a wide range of options.

Approved projects are presented to the students in written form and via a series of talks at a Project Market, which leads on to in-depth discussions between students, supervisors and external partners at the end of the second term.

At the start of the third term, students select a sub-list of projects that they are interested in. Through a series of meetings with the Leadership Team, their motivation for selecting these topics is explored. An allocation of projects is arrived at in May.

The three-month PhD Research Proposal project (STOR603) which concludes the MRes year (June-September) gives an opportunity to test the fit of students to projects and supervisory teams. In exceptional cases, students are able to change projects at the end of the MRes year.

PhD Projects

STOR-i projects have been developed with our industrial partners and use real-life issues to ensure our graduates are equipped to make a significant impact in the commercial world.

To see current and previous PhD projects choose a cohort below:

2018 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2017, and started their PhD research in 2018. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

 

Optimal Scheduling for the Decommissioning of Nuclear Sites

Matthew Bold

Supervisors: Christopher Kirkbride, Burak Boyaci and Marc Goerigk

With production having come to an end at the Sellafield nuclear site in West Cumbria, the focus is now turning to the decommissioning of the site, and safe clean-up of legacy nuclear waste. This is a project that is expected to take in excess of 100 years to complete and cost over £90 billion. Given the large scale and complexity of the decommissioning project, it is crucial that each task is systematically choreographed according to a carefully designed schedule. This schedule must ensure that the site is decommissioned in a way that satisfies multiple targets with respect to decommissioning speed, risk reduction and cost, whilst accounting for the inherent uncertainty regarding the duration of many of the decommissioning tasks. My research aims to develop optimisation methods to help construct such a schedule.

In partnership with Sellafield.

Click here to see a technical description.

 

Online Changepoint Methods for Improving Care of the Elderly


Jess Gillam

Supervisor: Rebecca Killick

The NHS is under great pressure from an ageing population. Thanks to advances in modern medicine and other factors, the NHS and other social care services must provide the necessary care to a growing population of elderly people. This PhD project is partnered with Howz. Howz is based on research suggesting that changes in daily routine can indicate potential health risks. Howz use data from sensors placed around the house, together with other low-cost sources such as smart meter data, to detect these changes. Where permission has been granted, alerts are then sent to the household or immediate care providers so that they can check on the person's safety and wellbeing. For the NHS, early intervention of this kind is likely to result in fewer ambulance call-outs for elderly patients and fewer elderly people requiring long hospital stays.

The objective of this PhD is to provide novel ways of automatically detecting changes in human behaviour using passive sensors. The first focus of the PhD will be on sensor-specific activity, considering changes in behaviour as an individual evolves over time.

In partnership with Howz.

Click here to see a technical description.

 

On Topics Around Multivariate Changepoint Detection

Thomas Grundy

Supervisor: Rebecca Killick

Royal Mail deliver between forty and fifty million letters and parcels daily. In order for this process to run smoothly and efficiently, the data science team at Royal Mail are using innovative techniques from statistics and operational research to improve certain application areas within the company.

My research aims to create and develop time-series analysis techniques to help tackle some of the open application areas within Royal Mail. Time-series data are collected over time, and a key analysis is to identify time-points where the structure of the data changes: changepoints. Current changepoint detection methods for multivariate time series (time series with multiple components) are either highly inefficient (they take too long to return an answer) or highly inaccurate (they do not correctly identify the changepoints) when the number of time-points and variables grows large. Hence, my research aims to produce a multivariate changepoint detection method that remains computationally efficient as the number of time-points and dimensions grows large, while still accurately detecting changepoints. Such a method will be extremely useful within many of the open application areas within Royal Mail.
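As a minimal illustration of the univariate version of this problem, the sketch below scans for a single change in mean using a simple cost-comparison (CUSUM-style) rule; the simulated series and setup are invented for the example and this is not the method being developed in the project.

```python
import numpy as np

def single_changepoint(x):
    """Scan all split points and return the one maximising the gain in fit
    from modelling the two segments with separate means."""
    n = len(x)
    best_tau, best_gain = None, 0.0
    total_cost = n * np.var(x)          # cost of a single-mean model
    for tau in range(2, n - 1):
        left, right = x[:tau], x[tau:]
        split_cost = len(left) * np.var(left) + len(right) * np.var(right)
        gain = total_cost - split_cost
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau, best_gain

# Hypothetical series with a change in mean halfway through.
rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
tau, gain = single_changepoint(series)
print(f"estimated changepoint at t={tau} (gain {gain:.1f})")
```

The multivariate problem studied in the project is much harder than this sketch suggests, because the scan must be repeated across many dimensions and only some components may change.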

In partnership with Royal Mail.

Click here to see a technical description.

 

Rare Disease Trials: Beyond the Randomised Controlled Trial

Holly Jackson

Supervisor: Thomas Jaki

Before a new medical treatment can be given to the public, it must first go through a number of clinical trials to test its safety and efficacy. Most clinical trials at present use a randomised controlled design, in which a fixed proportion (usually 50%) of patients are allocated to the new treatment and the remaining patients are given the control treatment. This design allows the best treatment to be detected with high probability, so that all future patients benefit. However, it does not take into account the wellbeing of the patients within the trial.

Response-adaptive designs allow the allocation probabilities to change depending on the results of previous patients. Hence, more patients are assigned to the treatment that currently appears better, in order to increase the wellbeing of patients within the trial. Multi-armed bandits are a form of response-adaptive design which maximises the chance of patients benefiting from treatment. They balance ‘learning’ (trying each treatment to decide which is best) and ‘earning’ (allocating patients to the current best treatment to produce more patient successes).
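One standard bandit allocation rule of this kind is Thompson sampling; the sketch below applies it to a two-arm trial with binary outcomes. The response rates and trial size are invented for illustration, and this is not presented as the design studied in the project.

```python
import numpy as np

rng = np.random.default_rng(0)
true_success = [0.45, 0.65]          # hypothetical control / new-treatment response rates
alpha = np.ones(2)                   # Beta(1, 1) priors on each arm's success probability
beta = np.ones(2)
successes = 0

for patient in range(200):
    # 'Learning' and 'earning' in one step: sample a plausible success
    # probability for each arm and treat the patient on the larger draw.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled))
    outcome = rng.random() < true_success[arm]
    successes += outcome
    alpha[arm] += outcome
    beta[arm] += 1 - outcome

print(f"{successes} successes out of 200 patients")
print("posterior means:", alpha / (alpha + beta))
```

Because allocation drifts towards the better arm, the expected number of in-trial successes is higher than under fixed 50:50 randomisation, which is exactly the trade-off against statistical power discussed below.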

Response-adaptive designs are not often used in practice because of their low power. This low power means it can be difficult to detect a meaningful difference between the treatments within a trial. Hence, more research is needed to extend response-adaptive methods so that they both maximise patient successes and retain enough power to detect a meaningful difference between the treatments.

In partnership with Quanticate.

Click here to see a technical description.

 

Detecting Changes Within Networked Data Streams

Mirjam Kirchner

Supervisor: Idris Eckley

Recent advances in network infrastructure and parallel data storage have now reached the point where it is possible to continuously collect and stream data from almost any operational system one might be interested in monitoring.

To make sense of this abundance of data, further processing and analysis is essential. In many applications, such as fraud detection or the identification of malfunctions in a system, we are interested in capturing deviations from the regularly observed behaviour, as these might indicate events that need to be investigated further. Such structural changes at a certain time point in the data-generating process are termed changepoints.

If the monitored system is very large, then information is usually gathered at many different locations, resulting in a high-dimensional, multivariate data set, sometimes referred to as panel data. An example of such a system is the BT telecommunications network - a data cable network that provides over 27.6 million homes and business premises in the United Kingdom with fixed-line, mobile and broadband services. Because of the interconnectedness within the system, emerging events may depend on one another, and it has been observed that changes travel through the network over time.

During my PhD, I aim to develop efficient methods for the detection of changes in network panel data. I intend to improve the reliability of my methods by integrating additional knowledge about the underlying network structure into the detection algorithm.

This project is conducted in partnership with BT, as part of the NG-CDI programme — an EPSRC Prosperity Partnership.

Click here to see a technical description.

 

Statistical Learning for GPS trajectories

Michael O'Malley

Supervisors: Adam Sykulski and David Leslie

Evaluating risk is extremely important across many industries. For example, in the motor insurance industry, a fair premium price is set by fitting statistical models to predict an individual's risk to the insurer. These predictions are based on demographic information and prior driving history. However, this information does not account for how an individual actually drives. By accurately assessing this factor, insurers could price premiums better: good drivers would receive discounts and bad drivers penalties.

Recently insurers have started to record driving data via an onboard diagnostic device known as a black box. These devices give information such as speed and acceleration. In this project, we aim to gain an understanding of how this information can be used to better understand driving ability. This will involve developing statistical models that can predict risk more accurately than traditional methods.

Click here to see a technical description.

 

Scalable Monte Carlo in the General Big Data Setting

Srshti Putcha

Supervisors: Christopher Nemeth and Paul Fearnhead

Technological advances in the past several decades have ushered in the era of “big data”. Typical data-intensive applications include genomics, telecommunications, high-frequency financial markets and brain imaging. There has been a growing demand from industry for competitive and efficient techniques to make sense of the information collected.

We now have access to so much data that many existing statistical methods are not very effective in terms of computation. In recent years, the machine learning and statistics communities have been seeking to develop methods which can scale easily in relation to the size of the data.

Much of the existing methodology assumes that the data is independent, where individual observations do not influence each other. My research will seek to address a separate challenge, which has often been overlooked. We are interested in extending “big data” methods to dependent data sources, such as time series and networks.
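One widely used family of scalable methods in this area is stochastic gradient MCMC, which replaces full-data gradients with minibatch estimates. The sketch below shows stochastic gradient Langevin dynamics targeting the posterior mean of a simple Gaussian model; the data, step size and model are invented for illustration, and the sketch assumes independent observations rather than the dependent data this project targets.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(3.0, 1.0, size=100_000)    # hypothetical large data set
n, batch_size = len(data), 100
step = 1e-5                                  # must be small relative to n for stability
theta, samples = 0.0, []

for it in range(5_000):
    batch = rng.choice(data, batch_size, replace=False)
    # Unbiased minibatch estimate of the log-posterior gradient
    # (flat prior, unit observation variance): sum_i (x_i - theta).
    grad = (n / batch_size) * np.sum(batch - theta)
    # Langevin update: half a gradient step plus Gaussian noise of matched scale.
    theta += 0.5 * step * grad + np.sqrt(step) * rng.normal()
    if it > 1_000:                           # discard burn-in
        samples.append(theta)

print("posterior mean estimate:", np.mean(samples))
```

Each update touches only 100 of the 100,000 observations, which is the sense in which such methods "scale with the size of the data"; the price is extra Monte Carlo noise from the minibatch gradient.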

In partnership with University of Washington, Seattle.

Click here to see a technical description.

 

Data-Driven Alerts in Airline Revenue Management


Nicola Rennie

Supervisors: Catherine Cleophas and Florian Dost

Airlines monitor and control passenger demand by adjusting the number of seats available to passengers in different fare classes, with the objective of increasing revenue. Forecasts of passenger demand are made based on booking data from previous flights. Passenger booking behaviour that deviates from the expected demand, such as for flights approaching carnivals or major sporting events, needs to be brought to the attention of an analyst. Due to the large networks of flights and the complexity of the forecasts, it is often difficult for analysts to correctly adjust seat availability on flights.

In partnership with Deutsche Lufthansa, my PhD aims to develop methods that highlight such deviations from expected behaviour and potentially make a recommendation to an analyst about what action should be taken. By employing such methods, the project will lead to the development of a prototypical alert system that is able to predict, with a degree of confidence, likely targets for analyst interventions.

In partnership with Deutsche Lufthansa.

Click here to see a technical description.

 

Aggregation and Downscaling of Spatial Extremes

Jordan Richards

Supervisors: Jonathan Tawn and Jenny Wadsworth

Historical records show a consistent rise in global temperatures and in intense rainfall events over the last 70 years. Climate change is an indisputable fact of life, and its effect on the frequency and magnitude of extreme weather events is evident from recent events. The Met Office develops global climate models, which detail changes and developments in global weather patterns caused by climate change. However, very little research has been conducted into establishing a relationship between extreme weather behaviour globally and locally, whether within smaller regions or at specific locations.

My PhD aims to develop statistical downscaling methods that construct a link between global and local extreme weather. We hope that these methods can be used by the Met Office to improve meteorological forecasting of future, localised, extreme weather events. This improvement will help to avoid the large-scale costs associated with preventable damage to infrastructure caused by extreme weather events such as droughts or flooding.

In partnership with the Met Office.

Click here to see a technical description.

 

Ranking Systems in Sport

Harry Spearing

Supervisor: Jonathan Tawn

The age-old question: “who is the best?”

Pick your favourite sport. Chances are, you have an opinion on who the best in the world is, at this current moment, or of all time, or who would win if A played B. But is it possible to develop a system which returns an objective answer to these questions?

In developing such systems, it is crucial to capture as much information as possible about the dynamic world in which we live. Understand it. Learn from it. Predict it. Athletes' injuries, the weather, and even economic factors all impact the outcome of these events and the implied ability of the athletes or teams. This project requires a wide range of strategies to capture these signals, from graph theory to extreme value theory, together with contextual information from news websites, so that the most accurate possible system for ranking sports teams or athletes can be formulated.

Ranking systems in sport are not only interesting to the inquisitive fan, but a fair and accurate system is at the core of all sports organisational bodies and the multi-billion pound industries that they represent.

But these systems are not exclusive to sports.

Methodological advances in the field of sports ranking systems have far-reaching consequences. Ranking systems are used to rank webpages, or to rank schools and hospitals, or even to determine the most essential medical treatments. So, a ranking system based on poor methodology can have much more severe repercussions than incorrectly seeding a tennis tournament… Ultimately, the importance of ranking systems is self-evident, and sport creates a fruitful playground in which ample advancements can be made.
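As a concrete point of reference, the sketch below implements the classic Elo update, one of the simplest rating systems of this kind; the ratings, K-factor and match outcome are invented for illustration, and this is not the system being developed in the project.

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Update two Elo ratings after a match.

    score_a is 1 if player A wins, 0 if they lose, 0.5 for a draw.
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    change = k * (score_a - expected_a)
    return rating_a + change, rating_b - change

# Hypothetical example: a 1600-rated player beats an 1800-rated favourite.
new_a, new_b = elo_update(1600, 1800, score_a=1)
print(round(new_a), round(new_b))   # the underdog gains more than k/2 points
```

Elo only uses match outcomes; the richer systems described above additionally try to account for injuries, weather, margins of victory and other contextual signals.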

In partnership with ATASS.

Click here to see a technical description.

 

Evaluation of the Intelligence Gathering and Analysis Process

Livia Stark

Supervisors: Kevin Glazebrook and Peter Jacko

Intelligence is defined as the product resulting from the collection, processing, integration, evaluation, analysis and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements or areas of actual or potential operations. It is crucially important in national security and anti-terror settings.

The rapid technological advancement of the past few decades has enabled a significant growth in the information collection capabilities of intelligence agencies. This information is collected from many different sources, such as satellites, social networks and human informants, to the extent that processing and analytical resources may be insufficient to evaluate all the gathered data. Consequently, the focus of the intelligence community has shifted from collection to efficient processing and analysis of the gathered information.

We aim to devise effective approaches to guide analysts in identifying information with the potential to become intelligence, based on the source of the information, whose characteristics need to be learnt. The novelty of our approach is to consider not only the probability of an information source providing useful intelligence but also the time it takes to evaluate a piece of information. We aim to modify existing index-based methods to incorporate this additional characteristic.

In partnership with the Naval Postgraduate School in Monterey, California.

Click here to see a technical description.

 

Recommending Mallows

Anja Stein

Supervisors: David Leslie and Arnoldo Frigessi

Recommender systems have become prevalent in present-day technological developments. They are machine learning algorithms which make recommendations by selecting, for each individual, a specific range of items which they are most likely to be interested in. For example, on an e-commerce website, having a search tool or filter is simply not enough to ensure a good user experience. Users want to receive recommendations for things which they may not have considered or known existed. The challenge recommender systems face is to sort through a large database and select a small subset of items which are considered to be the most attractive to each user, depending on the context.

In a recommendation setting, we might assume that an individual has specified a ranking of the items available to them. For a group of individuals, we may also assume that a distribution exists over the rankings. The Mallows model can summarise this ranking information in the form of a consensus ranking and a scale parameter whose value indicates the variability in rankings within the group of individuals.
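A minimal sketch of this model, under the common choice of Kendall's tau as the distance between rankings: the probability of a ranking decays exponentially in its distance from the consensus, at a rate set by the scale parameter. The rankings and parameter value below are invented for illustration, and the brute-force normalising constant is only feasible because the toy example has four items.

```python
from itertools import combinations, permutations
import math

def kendall_tau_distance(r1, r2):
    """Number of item pairs ordered differently by the two rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
        for a, b in combinations(r1, 2)
    )

def mallows_probability(ranking, consensus, alpha):
    """P(ranking) under a Mallows model with the given consensus and scale alpha."""
    weights = {
        sigma: math.exp(-alpha * kendall_tau_distance(sigma, consensus))
        for sigma in permutations(consensus)
    }
    return weights[tuple(ranking)] / sum(weights.values())

consensus = ("A", "B", "C", "D")
print(mallows_probability(("A", "B", "C", "D"), consensus, alpha=0.7))  # most likely
print(mallows_probability(("D", "C", "B", "A"), consensus, alpha=0.7))  # least likely
```

With thousands of items, neither the normalising constant nor full rankings from users are available, which is precisely why the partial-ranking and sequential-learning questions described next arise.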

We aim to incorporate the Mallows model into a recommender system scenario where there are thousands of items and individuals. Since the set of items that an individual may be asked to rank is too large, we usually receive data in the form of partial rankings or pairwise comparisons. Therefore, we need methods to predict a user's full ranking from this preference information. However, many users interact with a recommender system regularly and in real time. Here, the system has to learn about the unknown environment in which it operates whilst simultaneously choosing which items to recommend, receiving potentially noisy feedback from users. Hence, the open problem we are most concerned with is how to use the Mallows model to make better recommendations to users in the future.

In partnership with Oslo University, Norway.

Click here to see a technical description.

 

Predicting Recruitment to Phase III Clinical Trials 

Szymon Urbas

Supervisor: Christopher Sherlock

In order for a new treatment to be made available to the general public, it must be proven to have a beneficial effect on a disease, with tolerable side effects. This is done through clinical trials, a series of rigorous experiments examining the performance of a treatment in humans. It is a complicated process which often takes several years and costs millions of pounds. The most costly part is Phase III, which is composed of randomised controlled studies with large samples of patients. The large samples are required to establish the statistical significance of the beneficial effect, and their size is estimated using data from Phases I and II of the trials.

The project concerns itself with the design of new methodologies for predicting the length of time to recruit the required number of patients for Phase III trials. It aims to use available patient recruitment data across multiple hospitals and clinics including early data from the current trial. The current methods rely on unrealistic assumptions and very often underestimate the time to completion, giving a false sense of confidence in the security of the trial process. Providing accurate predictions can help researchers measure the performance of the recruitment process and aid them when making decisions on adjustments to their operations such as opening more recruitment centres.
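One simple illustrative way to frame such a prediction is to treat each recruiting centre as a Poisson process and simulate the time until the target number of patients is reached, with centre-to-centre variation in recruitment rates described by a Gamma distribution. All rates, centre counts and targets below are invented, and this is not presented as the methodology being developed in the project.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_completion_time(n_centres, target, rate_shape, rate_scale, n_sims=2_000):
    """Monte Carlo distribution of the time (in days) to recruit `target` patients
    when each centre recruits as a Poisson process with a Gamma-distributed rate."""
    times = []
    for _ in range(n_sims):
        rates = rng.gamma(rate_shape, rate_scale, size=n_centres)   # patients per day
        total_rate = rates.sum()
        # Superposition of Poisson processes: the waiting time to the `target`-th
        # arrival is Gamma(target, 1 / total_rate).
        times.append(rng.gamma(target, 1.0 / total_rate))
    return np.array(times)

times = simulate_completion_time(n_centres=40, target=600, rate_shape=2.0, rate_scale=0.05)
print(f"median completion: {np.median(times):.0f} days, "
      f"90th percentile: {np.percentile(times, 90):.0f} days")
```

Reporting an upper percentile rather than a single point estimate is one way such a model can guard against the over-optimistic completion dates mentioned above.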

In partnership with AstraZeneca.

Click here to see a technical description.

 

Interactive Machine Learning for Improved Customer Experience

Alan Wise

Supervisor: Steffen Grunewalder

Machine learning is a field inspired by human and animal learning, with the objective of creating automated systems which learn from their past in order to solve complicated problems. These methods often appear as algorithms which are fixed once trained: for example, an algorithm trained on images of animals to recognise the difference between a cat and a dog. This project instead concentrates on statistical and probabilistic problems which involve an interaction between the learner and some environment. For instance, our learner might be an online store which wishes to learn customer preferences by recommending adverts and receiving feedback on these adverts through whether or not a customer clicks on them. Multi-armed bandit methods are often used here. These methods are designed to pick the best option out of a set of options through some learner-environment interaction. However, multi-armed bandit methods often rely on unrealistic assumptions; a major objective of this project is therefore to design alterations to multi-armed bandit methods for use in real-world applications.

In partnership with Amazon.

Click here to see a technical description.

2017 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2016, and started their PhD research in 2017. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Anomaly detection in streaming data

Alex Fisch

Supervisors: Idris Eckley and Paul Fearnhead

The low cost of sensors means that the performance of many mechanical devices, from plane engines to routers, is now monitored continuously. This is done to detect problems with the underlying device so that action can be taken. However, the amount of data gathered has become so large that manual inspection is no longer possible. This makes automated methods for monitoring performance data indispensable.

My PhD focusses on developing novel methods to detect anomalies, or untypical behaviour, in such data streams. More effective methods would allow a wider range of anomalies to be detected, which in turn would allow problems to be detected earlier, thus reducing their impact. Anomaly detection methods are also used in a range of other applications, from fraud prevention to cyber security.

In Partnership with BT

Click here to see a technical description.

 

Dynamic Allocation of Assets Subject To Failure and Replenishment

Stephen Ford

Supervisors: Kevin Glazebrook and Peter Jacko

It is often the case that we have a set of assets to assign to some tasks, in order to reap some rewards. My problem is as follows: we have a limited number of drones, which we wish to use to search several areas. These drones have only limited endurance, and so will need to return and be recharged or refuelled at some point.

The complication of failure and replenishment adds all sorts of possible difficulties: what if one area takes more fuel to traverse, so that drones deployed there fail more quickly? What if the drones are not all identical, with some capable of searching better than others?

These sorts of problems are simply too complicated to solve exactly, so my research will look at heuristics – approximate methods that still give reasonably good results.

This project is in collaboration with Mike Atkinson at the Naval Postgraduate School in Monterey, California.

Click here to see a technical description.

 

Novel wavelet models for nonstationary time series

Euan McGonigle

Supervisors: Rebecca Killick and Matthew Nunes

In statistics, if a time series is stationary – meaning that its statistical properties, like the mean, do not change over time – there is a huge wealth of methods available to analyse the time series. However, it is normally the case that a time series is nonstationary. For example, a time series might display a trend – slow, long-running behaviour in the data. Nonstationary time series arise in many diverse areas, for example finance and environmental statistics, but these types of time series are less well-studied.

The Numerical Algorithms Group (NAG), the industrial partner of the project, is a numerical software company that provides services to both industry and academia. There is an obvious demand to continually update and improve existing software libraries: statistical software for use with nonstationary time series is no exception.

The main focus of the PhD is to develop new models for nonstationary time series using a mathematical concept known as wavelets. A wavelet is a “little wave” – it oscillates up and down but only for a short time. Wavelets allow us to capture the information in a time series by examining it at different scales or frequencies. The ultimate aim of the PhD is to develop a model for nonstationary time series that can be used to estimate both the mean and variance of a time series. Such a model could then be used, for example, to test for the presence of trend in a time series.
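As a small illustration of the kind of multi-scale decomposition wavelets provide, the sketch below applies a discrete wavelet transform to a noisy trend-plus-oscillation series. It assumes the PyWavelets package and a Daubechies wavelet; the signal is invented and this is not the model being developed in the PhD.

```python
import numpy as np
import pywt  # PyWavelets

# Hypothetical nonstationary series: slow trend + oscillation + noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512)
series = 2 * t + np.sin(20 * np.pi * t) + rng.normal(0, 0.3, t.size)

# Multi-level discrete wavelet transform with a Daubechies-4 wavelet.
coeffs = pywt.wavedec(series, "db4", level=4)
approx, details = coeffs[0], coeffs[1:]

# The coarse approximation captures the slow trend; the detail coefficients
# capture behaviour at successively finer scales.
print("approximation length:", len(approx))
print("detail lengths (coarse to fine):", [len(d) for d in details])
```

Separating the series into coarse and fine scales in this way is what makes it possible to model a slowly varying mean and a time-varying variance at the same time.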

In Partnership with NAG

Click here to see a technical description.

 

Real-Time Speech Analysis and Decision Making

Henry Moss

Supervisors: David Leslie and Paul Rayson

Many techniques from machine learning, specifically those that allow computers to understand human speech and writing, are used to aid decision making. Unfortunately, these procedures usually provide just a final prediction. No information is provided about the underlying reasoning or the confidence of the procedure in its output. This lack of interpretability means that the decision maker has to guess the validity of the analysis and so limits their ability to make optimal decisions.

We plan to combine procedures from computer science and statistics to analyse transcriptions of speech. By using statistical models for grammar, style and sentiment, we will be able to provide interpretable and reliable decision aids.

 

Estimating diffusivity from oceanographic particle trajectories

Sarah Oscroft

Supervisors: Adam Sykulski and Idris Eckley

The ocean plays a major role in regulating the weather and climate across the globe. Its circulation transports heat between the tropics and the poles, balancing temperatures around the world. Ocean currents impact weather patterns worldwide, while transporting organisms and sediments through the water. Studies of the ocean have a number of practical applications: for example, knowledge of the currents allows ships to take the most fuel-efficient path across the ocean, helps track pollution such as an oil or sewage spill, and can aid search and rescue operations. These studies can also help with building models of the climate and weather which can be used in predicting severe weather events such as hurricanes.

To accurately build models for the ocean, we require knowledge of how it varies geographically in space and time. Such data is obtained from a variety of sources including satellites, underwater gliders, and instruments which freely drift in the ocean, known as floats and drifters.

This project will build new statistical methods for analysing such data, using novel methods from time series analysis and spatial statistics. A particular focus will be to find accurate methods for measuring key oceanographic quantities such as mean flow (a measure of currents), diffusivity (the spread of particles in the ocean), and damping timescales (how quickly energy in the ocean dissipates over time). Such quantities feed directly into global and regional climate models, as well as environmental and biological models.

An early focus of the project will be on diffusivity. Knowledge of how particles spread with time allows us to gain a better understanding of, for example, how an oil spill will spread in the water and the impact that it will cause. Diffusivity can also be used to give an insight into the spreading of radioactive materials which are released into the water or how ocean life such as fish larvae and plankton will disperse. Another application is in aeroplane crashes in the ocean, as using the diffusivity to predict where debris came from can aid recovery missions.
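A textbook starting point for diffusivity estimation, included purely to illustrate the quantity (not the methods the project will develop): for particles undergoing simple diffusion, the mean squared displacement grows linearly in time with slope 2dD in d dimensions, so an estimate of D can be read off a regression of MSD against time lag. The simulated trajectories below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
true_D, dt, n_steps, n_particles, dim = 1.5, 0.1, 1_000, 200, 2

# Simulate 2-D Brownian trajectories: increments are N(0, 2 * D * dt) per axis.
steps = rng.normal(0, np.sqrt(2 * true_D * dt), size=(n_particles, n_steps, dim))
paths = np.cumsum(steps, axis=1)

# Mean squared displacement from the starting point as a function of time lag.
lags = np.arange(1, n_steps + 1) * dt
msd = np.mean(np.sum(paths**2, axis=2), axis=0)

# MSD = 2 * dim * D * t  =>  estimate D from the least-squares slope through the origin.
D_hat = np.sum(lags * msd) / np.sum(lags**2) / (2 * dim)
print(f"estimated diffusivity {D_hat:.2f} (true value {true_D})")
```

Real drifter and float data are far messier than this idealised random walk, with mean flow, measurement noise and irregular sampling, which is why more careful statistical methods are needed.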

 

Efficient clustering for high-dimensional data sets

Hankui Peng

Supervisors: Nicos Pavlidis and Idris Eckley

Clustering is the process of grouping a large number of data objects into a smaller number of groups, where data within each group are more similar to each other than to data in different groups. We call these groups clusters, and cluster analysis involves the study of different methods for grouping data in a reasonable way, depending on the nature of the data. Clustering permeates almost every facet of our lives: music is classified into different genres, movies and stocks into different types and sectors, and food and groceries that are similar to each other are presented together in supermarkets.
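As a minimal concrete example of this idea, the sketch below clusters a small two-dimensional data set with k-means, assuming scikit-learn is available; the data and number of clusters are invented, and the project itself concerns far larger, higher-dimensional and partially missing data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated groups of points in two dimensions.
rng = np.random.default_rng(11)
centres = np.array([[0, 0], [5, 5], [0, 6]])
X = np.vstack([rng.normal(c, 0.6, size=(50, 2)) for c in centres])

# Group the points into three clusters by minimising within-cluster variance.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(model.labels_))
print("estimated cluster centres:\n", model.cluster_centers_.round(1))
```

Scaling this idea to millions of web-scraped prices, with missing observations and clusters that drift over time, is where the research challenge lies.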

The Office for National Statistics (ONS) are currently using web-scraping tools to collect price data from three leading supermarket websites (TESCO, Sainsbury's, Waitrose) in the UK. We are motivated by the problem of efficiently grouping and transforming a large volume of web-scraped price data into a price index that is competitive with the current CPI. Exploring novel clustering schemes that can cluster efficiently in the face of missing data, and that can monitor the changes in each cluster over time, will be the main focus of my PhD.

In Partnership with ONS Data Science Campus

Click here to see a technical description.

 

Real-time Railway Rescheduling

Edwin Reynolds

Supervisor: Matthias Ehrgott

Punctuality is incredibly important in the delivery of the UK's railway system. However, more than half of all passenger delay is caused by the late running of other trains, in what is known as knock-on, or reactionary, delay. Signallers and controllers attempt to limit and manage reactionary delay by making good decisions about cancelling, delaying or rerouting trains. However, they often face multiple highly complex, interrelated decisions that can have far-reaching and unpredictable effects, and they must make these decisions in real time. I am interested in optimisation software which can help them by suggesting good, or even optimal, solutions. In particular, my research concerns the mathematical and computational techniques behind the software. I am sponsored by Network Rail, who hope to benefit from improvements to decision making and therefore a reduction in reactionary delay.

In Partnership with Network Rail

Click here to see a technical description.

 

Changepoint in Multivariate Time Series

Sean Ryan

Supervisor: Rebecca Killick

Whenever we examine data over time there is always a possibility that the underlying structure of that data may change. The time when this change occurs is known as a changepoint. Detecting and locating changepoints is a key issue for a range of applications. 

My research focuses on the problem of locating changepoints in multivariate data (data with multiple components). This problem is challenging because not all of the individual components of the data may experience a given change. As a result we need to be able to find the location of the changepoints and the components affected by the change. Current methods that try to solve this problem are either computationally inefficient (it takes too long to calculate an answer) or don't identify the affected components. The aim of my project is to develop methods that can locate changepoints alongside their affected components accurately and efficiently.

In Partnership with Tesco

Click here to see a technical description.

 

Large-Scale Optimisation Problems 

Georgia Souli

Supervisors: Adam Letchford and Michael Epitropakis

Optimisation is concerned with methods for finding the ‘best’ among a huge range of alternatives. Optimisation problems arise in many fields, such as Operational Research, Statistics, Computer Science and Engineering. In practice, Optimisation consists of the following steps. First, the problem in question needs to be formulated mathematically. Then, one must design, analyse and implement one or more solution algorithms, which should be capable of yielding good quality solutions in reasonable computing times. Next, the solutions proposed by the algorithms need to be examined. If they are acceptable, they can be implemented; otherwise, the formulation and/or algorithm(s) may need to be modified, and so on.

In recent years, the optimisation problems arising have become more complex, due for example to increased legislation. Moreover, the problems have increased in scale, to the point where it is now common to have hundreds of thousands of variables and/or constraints. The goal of this project is to develop new mathematical theory, algorithms and software for tackling such problems. The software should be capable of providing good solutions within reasonable computing times.

In Partnership with Morgan Stanley 

 

Statistical methods for induced seismicity modelling

Zak Varty

Supervisors: Jonathan Tawn and Peter Atkinson

The Groningen gas field, located in the north-east of the Netherlands, supplies a large proportion of the natural gas that is used both within the Netherlands and in the surrounding regions. This natural resource is an important part of the Dutch economy, but extraction of gas from the reservoir is associated with induced seismic activity in the area of extraction.

The aim of my PhD is to allow seismicity forecasts to be used to inform future extraction procedures, so that future seismicity can be reduced. In order to do this, a framework needs to be produced for comparing the abilities of current and future forecasting methods. Extra challenges, and therefore opportunities, are added to this task by the sparse nature of the events being predicted and the evolving structure of the sensor network that is used to detect them.

In Partnership with Shell

Click here to see a technical description.

2016 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2015, and started their PhD research in 2016. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Statistical models of widespread flood events as a consequence of extreme rainfall and river flow

Anna Barlow
Supervisors: Jonathan Tawn and Christopher Sherlock

Flooding can have a severe impact on society causing huge disruptions to life and great loss to homes and businesses. The December 2015 floods across Cumbria, Lancashire and Yorkshire caused widespread damage and tens of thousands of properties were left without power. Governments, environmental agencies and insurance companies are keen to know more about the causes and the probabilities of the re-occurrence of such events to prepare for future events. Therefore we wish to better understand the flood risk and the magnitude of losses that can be incurred. This PhD project with JBA Risk Management is concerned with modelling such extreme events and estimating the total impact.

In order to assess the risk from flooding one needs to simulate extreme flood events, and improving upon the existing model for this is the main focus of this project. The simulation of flood events is important in understanding the flood risk and determining the potential loss. This part of the project is based on extreme value theory since we are interested in the events that create the greatest losses for which there may be little or no past data. Extreme value theory is the development of statistical models and techniques for describing rare events.

The second part of the project will be concerned with improving the efficiency of the estimation of large potential losses from the simulated flood events. Current methods involve multiple simulations of the loss at an extremely large number of properties over many flooding events. We therefore wish to reduce the computational burden by reducing the number of simulations while retaining an acceptable degree of accuracy.

In Partnership with JBA

Click here to see a technical description.

Optimal Search Accounting for Speed and Detection Capability

Jake Clarkson
Supervisors: Kevin Glazebrook and Peter Jacko

There are many real-life situations which involve a hidden object needing to be found by a searcher. Examples include a bomb squad seeking a bomb or a land mine; a salvage team, the remains of a ship or plane; and a rescue team, survivors after a disaster. In all of these applications, there is a lot at stake. There can be huge costs involved in conducting the searches, in marine salvage, for example. Or there can be consequences if the search is unsuccessful: valuable equipment could be forfeited or damaged, or, even worse, human life could be lost. Therefore, it is very important to search in the most efficient manner, so that the search ends in a minimal amount of time.

When the space to be searched is split into distinct areas, the search process can be modelled as playing a multi-armed bandit, which is a mathematical process, named after slot machines, in which consecutive decisions must be made. Existing bandit theory can then be used to easily calculate the optimal order in which areas should be searched, thus solving this classical search problem.

The main focus of this PhD is to expand the existing theory for the classical search problem to accommodate search problems with two extra features. The first is to allow the searcher a choice of a fast or slow search speed. This idea is often prominent in real-life problems: for example, the bomb squad may have a choice between travelling quickly down a stretch of road in a vehicle with sensors, and proceeding on foot with trained sniffer dogs. The vehicle travel, analogous to the fast search, covers the road more quickly, but the chance of missing a potential bomb on that road will increase. Being on foot with the trained dogs, corresponding to the slow speed, will take more time to cover the same distance, but may well detect a hidden bomb with a larger probability. The second feature removes the ability of the searcher to search any area at any time, another often realistic assumption. For example, the bomb squad can only examine roads adjacent to their current location; to reach roads further away, they must first make other searches.

Click here to see a technical description.

Late stage combination drug development for improved portfolio-level decision-making

Emily Graham
Supervisors: Thomas Jaki, Nelson Kinnersley and Chris Harbron

Pharmaceutical companies will often have a variety of drugs undergoing development and we call this collection a pharmaceutical portfolio. Since drug development is a long, expensive and uncertain process it is important that the decisions we make regarding this portfolio are well informed and are expected to be the most beneficial to the company and the patient population. We are interested in the problem of optimal portfolio decision making in the context of a pharmaceutical portfolio containing combination therapies.

Combination therapies combine existing drugs and new molecular entities with the aim of producing an efficacious effect with fewer side effects. While some methods do exist for portfolio decision making, they do not take into account combination therapies or the information which can be gained from trials containing similar combinations. For example, if drug A+B is performing well, this may influence the beliefs that are held about how A+C will perform and whether or not it should be added to the portfolio. We believe that taking into account similarities between combinations and sharing information across trials could lead to better decision making and hence better outcomes for the portfolio.

In Partnership with Roche

Click here to see a technical description.

Automated Data Inspection in Jet Engines

Harjit Hullait
Supervisors: David Leslie, Nicos Pavlidis, Azadeh Khalegh and Steve King

Advances in technology have seen an explosion of high-dimensional data. This has brought a lot of exciting opportunities to gain crucial insights into the world. Developing statistical methods for gaining meaningful insights from this rich source of data has brought some interesting challenges and some very notable failures. There is a need for consistent statistical methods to understand and utilise the vast amounts of data available.

My PhD is focused on developing statistical techniques for finding anomalies in jet engine data. A jet engine is a complicated system, with various sensors monitoring a huge number of features such as temperature and air pressure. Applying standard anomaly detection methods to this data would be computationally expensive, potentially taking years to run. The challenge is therefore to find methods for capturing the important information from this vast amount of data and making meaningful inferences.

We need to find ways of extracting the important information from this high-dimensional data in a computationally efficient way. We must also ensure that what we extract retains the information necessary for identifying the true anomalies in the full data. My focus will therefore be on developing novel methods for identifying and extracting meaningful information, and finding anomalies that correspond to issues in the full data.

In Partnership with Rolls-Royce

Click here to see a technical description.

Operational MetOcean Risk Management under Uncertainty

Toby Kingsman
Supervisors: Burak Boyaci and Jonathan Tawn

One of the main ways that the UK is increasing the amount of renewable energy it generates is by building more offshore wind farms. With the advent of new technologies it is possible to both build bigger turbines and situate them further out at sea. Though these developments are big improvements, wind turbines still require a large amount of government subsidy to make them competitive with fossil fuelled power stations. One way of helping to reduce the need for this subsidy is by carrying out maintenance activities more efficiently.

An example of this is the question of how to route vessels around the wind farm to carry out repairs in the most cost-efficient manner. Sending a large number of ships to deal with the tasks will get them completed quickly but at a large cost, whereas sending only a few ships will be cheap but risks leaving some failures unaddressed overnight. As a result, it is important to find a balance between the two approaches.

This problem is further complicated by the fact that there is a large degree of uncertainty in the accessibility of the wind farm. If the conditions are too choppy or too windy then vessels will be unable to travel to the wind farm. To account for this we will need to build a statistical model of how the metocean conditions change over time near the wind farm.

The aim of the PhD will be to develop an optimisation model that can account for the key factors and constraints that affect the problem to help determine which vessels should be utilised at which times.

In Partnership with JBA

Click here to see a technical description.

Symbiotic Simulation in an Airline Operations Environment

Luke Rhodes-Leader
Supervisors: Stephan Ongo, Dave Worthington and Barry Nelson

Disruption within the airline industry is a severe problem. It is quite rare that an airline will operate a whole day without some form of delay to its schedule. The causes of these delays vary widely, from weather to mechanical failures. This often means that the schedule has to be revised quickly to minimise the impact on passengers and the airline. This could potentially be done with a form of simulation called Symbiotic Simulation.

A simulation is a computer model of a system that can estimate the performance of the system. A symbiotic simulation involves an interaction between the system being modelled and the simulation, which exchange information. This allows the simulation to use up-to-date information to improve its representation of the system. In turn, the predictions of the simulation can then be used to improve the way that the system operates. In our application, the simulation will estimate how well a schedule performs, and the airline can then implement the best one.

However, there are issues with the current state of Symbiotic Simulation. These include choosing how to use the up-to-date information in the best way and how to find a “good” schedule quickly. Such areas will be part of the research during my PhD.

In Partnership with Rolls-Royce

Click here to see a technical description.

Realistic Models for Spatial Extremes

Robert Shooter
Supervisors: Jonathan Tawn, Jenny Wadsworth and Phil Jonathan

Being able to model wave heights accurately is very important to Shell, for both economic and safety reasons. Knowing the characteristics of waves allows the safe design of offshore structures (such as oil rigs), while also ensuring that the appropriate amount of money is spent on each structure; a small increase in the required strength of a structure costs a significant amount. As it is large waves in particular that have to be factored into the assessment of meeting safety criteria, Extreme Value Analysis (EVA) is used, since this allows appropriate modelling of extreme waves. For this project, attention will largely be paid to modelling of waves in the North Sea off the coast of Scotland.

The particular focus of this project will be to consider the effect of location, as well as direction, on the properties of the extreme waves, and to model these appropriately. For instance, it could be expected that Atlantic storms (as seen in the UK autumn and winter) create large waves, while very little extreme weather approaches from the east, so that there are fewer extreme waves from that direction.

Another issue to be considered is that distant sites are very unlikely to exhibit extreme values at the same time whilst nearby locations are likely to be very similar in nature. Practically, a mix of these two possibilities is the probable underlying situation. The exact nature of this kind of behaviour needs to be determined both for modelling and important theoretical reasons.

In Partnership with Shell

Click here to see a technical description.

Change and Anomaly Detection for Data Streams

Sam Tickle
Supervisors: Paul Fearnhead and Idris Eckley

It's a changing world.

Pick a system. Any will do. You could choose something simple, like the population dynamics in Lancaster's duck pond, or something fantastically complex, such as the movements of the stock markets around the world or the flow of information between every human being on the planet. Ultimately, every action in every system is governed by change and reaction to it. Every problem is a changepoint problem.

And yet, despite the increased interest in studying the detection of change in recent years, understanding of many aspects of the problem still lags behind where we would like it to be. The first major issue arises from the necessary consideration of multiple variables simultaneously. In most real-life situations, it will be necessary to examine the evolution of more than one quantity (in our duck pond example, the population of ducks and of the students who feed them would both be pertinent considerations). Yet unpacking what a change means in this context, and how to detect it, is still very much in its infancy.

The second important strand of the project will involve speeding up existing changepoint detection methods. In order for such a detection method to be of any real world use, changes need to be detected as soon as possible, especially in situations where the nature of the change is subtle. Failure to do so can lead to the retention of policies which can be actively detrimental to the system in the medium to long term.

At the same time, however, distinguishing between true changes and mere anomalies in the data is important. Correctly identifying an anomalous occasion, and setting it apart from a true, persistent change, is vital for any decision maker. Doing this effectively and quickly, in a context where multiple data sets (known as a data stream) are observed simultaneously, is the central goal of this research.

In Partnership with BT

Click here to see a technical description.

Optimising Aircraft Engine Maintenance Scheduling Decisions

David Torres Sanchez
Supervisors: Konstantinos Zografos and Guglielmo Lulli

In our modern era, thousands of flights are in operation every minute. Each of these requires a meticulous inspection of its mechanical components to ensure that it can operate safely. As there are several types of maintenance interventions, varying in rigour and duration, they have to be scheduled to occur at certain times. One of the world's major jet engine manufacturers, Rolls-Royce (RR), is interested in knowing not only when, but also where and what type of intervention is optimal to perform.

My PhD focusses on exploring the most appropriate ways of modelling the problem and ultimately solving it. This involves developing a mathematical formulation which then has to be solved via an efficient algorithm. The efficiency is linked with the ability of the algorithm to cope with the scale of the combinatorial problem, which, due to the scale at which RR operates, is very large.

In Partnership with Rolls-Royce

Click here to see a technical description.

Efficient Bayesian Inference for High-Dimensional Networks

Kathryn Turnbull
Supervisors: Christopher Nemeth, Matthew Nunes and Tyler McCormick

Network data arise in a diverse range of disciplines. Examples include social networks describing friendships between individuals, protein-protein interaction networks describing physical connections between proteins, and trading networks describing financial trading relationships. Currently there is an abundance of network data where, typically, the networks are very large and exhibit complex dependence structures.

When studying a network, there are many things we may be interested in learning. For instance, we may want to understand the underlying structures in a network, study the changes in a network over time, or predict future observations.

There already exists a collection of well-established models for networks. However, these models generally do not scale well to large (high-dimensional) networks. The dimension of the data and the dependence structures present interesting statistical challenges. This motivates my PhD, where the focus will be to develop new and efficient ways of modelling high-dimensional network data.

Click here to see a technical description.

2015 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2014, and started their PhD research in 2015. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Computational Statistics for Big Data


Jack Baker
Supervisors: Paul Fearnhead and Chris Nemeth

Medical scientists have decoded DNA sequences of thousands of organisms, companies are storing data on millions of customers, and governments are collecting traffic data from around the country. These are examples of how information has exploded in recent years. But broadly, these data are being collected so they can be analysed using statistics.

At the moment, statistics is struggling to keep up with the explosion in the quantity of data available. In short - it needs a speed up. This is the area I’ll be working on over the next few years.

Schemes to speed up statistical methods have been proposed, but it’s not obvious how well they all work. So to start with I’ll be comparing them in different cases. This comparison should outline any issues, which I can then try and resolve.

Click here to see a technical description.

Optimal Partition-Based Search

James Grant
Supervisors: Kevin Glazebrook and David Leslie

As within industry, optimal allocation of resources is an important planning consideration in military activities. Certain cases of military search and patrol (such as tracking border crossings, detecting hostile actions in some planar region, searching for missing objects or parts etc.) can be performed remotely using Unmanned Aerial Vehicles (UAVs). A UAV is a class of drone which is capable of detecting events from the sky and relaying the event locations to searchers on the ground.

Searching in this context will typically be performed by a fleet of several UAVs, with each UAV being allocated a certain distinct portion of the search region. UAVs allocated a larger search region will have to spread their time more thinly and as such this may reduce their probability of detecting an event. Furthermore, some UAVs may be better equipped than others to search certain parts of the region – this may be due to varying terrains or altitudes.

The question this project seeks to answer is how the resources (UAVs) should be allocated to maximise the number of events detected. To answer this question, information on the capabilities of the UAVs and estimated information on where events are most likely to occur must be considered. Then an optimisation method which identifies the best of many options should be used to select the optimal partitioning of the search region. Existing methods do little to account for the fact that information on where events will occur is merely estimated, and a novel aspect of this project will be to take account of this uncertainty.

In Partnership with Naval Postgraduate School

Click here to see a technical description.

Forensic Sports Analytics

Oliver Hatfield
Supervisors: Chris Kirkbride, Jonathan Tawn and Nicos Pavlidis

When data are collected about a process occurring over time, it is often of interest to be able to tell when its behaviour departs from the norm. Because of this, an important research area is the detection of anomalies in random processes. These anomalies can take many forms - some may be sudden, whereas others may see gradual drift away from expected behaviour. This project aims to develop new ways of identifying both sorts of anomalous patterns in random processes of a variety of structures and forms, both when observations are independent, and when the processes evolve over time. Anomaly detection has a huge range of applications, such as observing fluctuations in the quality of manufactured goods in order to detect machine faults.

The application that forms the focus of this project is match-fixing in sport. Corrupt gamblers with certainty about the outcomes of matches can bet risk-free, and hence have potential to make substantial illegal gains. The cost of fixing matches, for example via bribes, can be high, and so the corrupt gamblers may need to wager significant amounts of money to make profit. However, betting large amounts can distort the markets, whether gambled in lump sums or disguised more subtly, as bookmakers alter their odds to mitigate potentially large losses. This project attempts to detect suspicious betting activity by looking for unusual behaviour in the odds movements over time. These are considered both before matches and during them, when the in-play markets also react to match events themselves. The aim is to be able to identify fixed matches as early as possible, so that gambling markets can be suspended with the lowest possible losses.

In Partnership with ATASS Sports

Click here to see a technical description.

Novel Inference methods for dynamic performance assessment

Aaron Lowther
Supervisors: Matthew Nunes and Paul Fearnhead

Organisations are complex, constantly changing systems that can be responsible for carrying out many important tasks. The importance of these tasks means that a sound understanding of how the system behaves is needed, but the complex structure makes this difficult.

My PhD focuses on modelling and understanding how aggregations of variables (or tasks, for example) evolve, where we think of these variables as components of a system. Aggregation is important because the effect of any individual variable on the system may be negligible, and because the huge number of variables means that modelling each one separately is impractical.

Ultimately we are interested in how changes in the behaviour of these variables affect the performance of the system, but in order to predict the future state of the system we must have a thorough understanding of the variables. We aim to achieve this by developing accurate models and methodology that can determine the structure of the aggregations; existing methodology for this is quite limited.

In Partnership with BT

Click here to see a technical description.

Uncertainty Quantification for Simulation Arrival Processes

Lucy Morgan
Supervisors: Barry Nelson, David Worthington and Andrew Titman

Simulation is a widely used tool in many industries where trial-and-error testing is either too expensive, too time consuming or both. It is therefore very important to build simulation models that mimic real-world processes with high fidelity. This means utilising complex, potentially non-stationary, input distributions. Input uncertainty describes the uncertainty that propagates from the input distributions to the simulation output and is therefore key to understanding how well a model captures a process.

Currently there are methods to quantify input uncertainty when input distributions are homogeneous, but non-stationary input models have yet to be considered. The aim of my project is to create methods that can quantify the input uncertainty in a simulation model with non-stationary inputs, starting with queueing models with non-stationary arrival processes.
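
For context, the sketch below shows what a non-stationary arrival process looks like: it simulates arrivals from a Poisson process whose rate varies over a 24-hour day using the standard thinning algorithm. The rate function is invented purely for illustration.

    import math
    import random

    def rate(t):
        # Hypothetical time-varying arrival rate (arrivals per hour) over a 24-hour day
        return 5 + 4 * math.sin(2 * math.pi * t / 24)

    def simulate_nonstationary_poisson(horizon, rate_fn, rate_max):
        """Simulate arrivals on [0, horizon] by thinning a constant-rate process."""
        arrivals, t = [], 0.0
        while True:
            t += random.expovariate(rate_max)              # candidate arrival time
            if t > horizon:
                return arrivals
            if random.random() < rate_fn(t) / rate_max:    # keep with probability rate(t)/rate_max
                arrivals.append(t)

    random.seed(1)
    arrivals = simulate_nonstationary_poisson(horizon=24, rate_fn=rate, rate_max=9)
    print(len(arrivals), "arrivals; first few:", [round(a, 2) for a in arrivals[:5]])

Input uncertainty arises because, in practice, the rate function itself has to be estimated from finite data, and that estimation error propagates through to the simulation output.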

Click here to see a technical description.

Large Scale Statistics with Applications to the Bandit Problem and Statistical Learning

Stephan Page
Supervisors: Steffen Grünewälder, Nicos Pavlidis and David Leslie

The bandit problem is a name given to a large class of sequential decision problems and derives from a term for slot machines. In these problems we are faced with a series of similar situations and for each one we receive a reward by selecting an action based on what has happened so far. It is necessary to select actions in such a way that we learn a lot about the different rewards while still obtaining a good reward from the situation we currently face. Often these objectives are referred to as exploration and exploitation. Usually we are interested in making the sum of our rewards as big as possible after having faced many situations.

The multi-armed bandit problem, in which the rewards we receive are only influenced by the actions (or arms) we select, has been well studied. However, in the contextual bandit problem each situation also comes with extra information (or context), revealed before we have to select an action, which influences the rewards; this setting is much less well understood. When we are given a large amount of extra information it is necessary to work out which parts of it are relevant, and this requires the use of large scale statistical methods.
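
To make the standard (non-contextual) setting concrete, here is a minimal sketch of the well-known UCB1 rule playing three hypothetical Bernoulli arms; the contextual problem studied in this project is considerably harder.

    import math
    import random

    true_probs = [0.3, 0.5, 0.7]          # hypothetical success probability of each arm
    counts = [0] * len(true_probs)        # times each arm has been played
    totals = [0.0] * len(true_probs)      # total reward from each arm

    random.seed(0)
    for t in range(1, 2001):
        if 0 in counts:
            arm = counts.index(0)         # play every arm once first
        else:
            # UCB1 index: observed mean plus an exploration bonus
            arm = max(range(len(true_probs)),
                      key=lambda a: totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    print("Plays per arm:", counts)       # plays should concentrate on the best arm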

Click here to see a technical description.

Statistical Learning for Interactive Education Software

Ciara Pike-Burke
Supervisors: Jonathan Tawn, Steffen Grünewälder and David Leslie

In recent years, the education sector has moved away from the traditional pen and paper approach to learning and started to incorporate new technologies into the classroom. Sparx is an education research company that uses technology, data and daily involvement in the classroom to scientifically investigate how students learn. As students interact with the system, data on the way they are learning can be securely gathered. The aim of this project is to be able to use a discreet and anonymised data set to improve students' experience and attainment.

Multi-armed bandits are a popular way of modelling the trade-off between exploration and exploitation which arises naturally in many situations. As part of the PhD, they will be applied alongside other statistical learning techniques to help develop systems that interact with the students in order to provide a personalised route through the content and exercises. Another aim of this research will be to develop more accurate predictions of student performance. An accurate prediction of student performance in exams is extremely important for students, teachers and parents.

In Partnership with Sparx

Click here to see a technical description.

Multivariate extreme value modelling for vines and graphical models

Emma Simpson
Supervisors: Jenny Wadsworth and Jon Tawn

There are many real-life situations where we might want to know the chance that a rare event will occur. For instance, if we were interested in the building of flood defences, we would want to take into account the amount of rainfall that the construction should be able to withstand, and knowing how often particularly bad rainfall events are likely to occur would be an important design consideration. Often with rare situations, it may be the case that we are interested in an event that has never happened before, making modelling a huge challenge. The area of statistics known as extreme value theory is dedicated to studying rare events such as these, and allows the development of techniques that are robust to the fact that there is an intrinsically limited amount of data available concerning these infrequent events.

The main aim of this PhD project is to develop techniques related to extreme value theory where there are multiple variables to consider, and of particular interest is developing models that can encapsulate the various ways that these different variables may affect one another. This aim will be achieved by drawing on methods from other areas of statistics that are also concerned with capturing dependence between different variables, and more information about this is available on my webpage.

Although this project is not associated with a specific application, it is hoped that the methods developed could be useful in a variety of areas, with the most common uses of extreme value theory coming from the environmental and financial sectors.

Click here to see a technical description.

Supporting the design of radiotherapy treatment plans

Emma Stubington
Supervisor: Matthias Ehrgott

Radiotherapy is a common treatment for many types of cancer. It uses ionising radiation to control or kill cancerous cells. Although there has been rapid development in radiotherapy equipment in the past decades, it has come at the cost of increased complexity in radiotherapy treatment plan design.

Treatment planning involves multiple interlinked optimisation problems to determine the optimal beam directions, radiation intensities, machine parameters and so on. The process is complicated further by conflicting objectives: an ideal plan would maximise the radiation to the tumour whilst minimising the radiation to surrounding healthy cells. Neither extreme is acceptable, since minimising alone would result in no radiation therapy and maximising alone would kill all the healthy cells. Therefore, a compromise must be struck between these two objectives. Currently this is done by comparing a plan to a set of clinical criteria; if a plan does not meet all the criteria it must be re-optimised by trial and error until an acceptable plan is found.

The project will aim to remove the trial-and-error element of treatment planning. Data Envelopment Analysis (DEA) will be used to assess the quality of individual treatment plans against a database of existing achievable plans, highlighting plans that could be improved further. To begin with, the project will focus mainly on prostate cancer cases due to their frequency and the relative conformity in shape and location of the tumour. The hope is that the methods can then be extended to all cancer types. There is also scope to develop an automatic treatment planning technique to remove clinician subjectivity and speed up the planning process. These aims ensure individual patients receive the best possible treatment for their unique tumour.
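
Loosely speaking, DEA scores each plan against the best trade-offs achieved by the other plans in the database. The sketch below is a minimal single-input, single-output version using scipy's linear programming routine; the numbers are invented, and clinical DEA models for radiotherapy use many more quality measures.

    import numpy as np
    from scipy.optimize import linprog

    # Hypothetical treatment plans: one "input" to minimise (dose to healthy tissue)
    # and one "output" to maximise (tumour coverage)
    inputs = np.array([[30.0], [25.0], [40.0], [35.0]])    # rows = plans
    outputs = np.array([[0.95], [0.90], [0.97], [0.90]])

    def dea_efficiency(o, X, Y):
        """Input-oriented CCR efficiency of plan o relative to all plans."""
        n = X.shape[0]
        c = np.r_[1.0, np.zeros(n)]                        # minimise theta; variables = [theta, lambdas]
        A_in = np.c_[-X[o], X.T]                           # sum_j lambda_j x_j <= theta * x_o
        A_out = np.c_[np.zeros(Y.shape[1]), -Y.T]          # sum_j lambda_j y_j >= y_o
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(X.shape[1]), -Y[o]]
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.x[0]

    for o in range(len(inputs)):
        print(f"Plan {o}: efficiency = {dea_efficiency(o, inputs, outputs):.3f}")

A score of 1 indicates the plan lies on the efficient frontier of the database; lower scores suggest room for improvement.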

Click here to see a technical description.

Customer Analytics for Supply-Chain Forecasting

Daniel Waller
Supervisors: John Boylan and Nikolaos Kourentzes

Forecasting demand has long been a fundamental issue for retailers. Long-term strategic planning is all about prediction, and demand forecasts inform such processes at the top level. At a lower level, marketing departments value the capacity to predict demand under various arrays of promotions. At the micro-level, supply chain and inventory management processes rely on fast, accurate, tactical forecasts for each stock-keeping unit (SKU) to keep stock at suitable levels.

Demand forecasting techniques traditionally employed in industry have focussed on extrapolation of past sales data to predict future demand. However, as demand forecasting becomes more complex, with ever increasing ranges of products, there is an increasing need for forecasting tools which use more information. Causal factors, such as promotional activity, have a driving effect on demand patterns and accurate modelling of these can prove crucial to forecasting accuracy.

A further challenge is the huge amount of data now collected at point-of-sale in retail, right down at the micro-level of individual SKUs and transactions. The massive datasets that are compiled as a result pose challenges for forecasting, but also may hold the key to the major gains that can be obtained in the development of prescriptive models for demand.

My PhD aims to bring together these different strands of thought to develop a demand forecasting framework that harnesses the potential in big datasets and incorporates causal factors in demand, such as promotions, to produce accurate forecasts which can provide value at all levels of a retail business.

In Partnership with Aimia

Click here to see a technical description.

Novel methods for distributed acoustic sensing data

Rebecca Wilson
Supervisor: Idris Eckley

Distributed Acoustic Sensing (DAS) techniques use a fibre-optic cable as the measurement instrument. The whole cable is treated as the sensor rather than individual points, which allows for a greater degree of control over the measurements that are collected.

In recent years, the use of DAS has become more widespread, with the approach being implemented across a range of applications including security (e.g. border monitoring) and the oil and gas industry. Whilst DAS has proven incredibly useful, since it allows for real-time recording that is relatively cheap compared to other methods, there are drawbacks to its use. As with most data collection methods, the measurements obtained can be corrupted easily.

This PhD aims to develop methods that allow us to detect corruption in DAS signals so that this can be removed, leaving as much of the original signal intact as possible.

In Partnership with Shell

Modelling and solving dynamic and stochastic vehicle routing and scheduling problems using efficiently forecasted link attributes

Christina Wright
Supervisors: Konstantinos Zografos, Nikos Kourentzes and Matt Nunes

There are many risks associated with the transport of hazardous materials. An accident can escalate into something much worse, such as a fire or explosion, because of the hazardous material being carried. Of most pressing concern is the danger to those nearby should an accident occur: fewer people are likely to be injured or killed on a country lane than if the accident happens in a busy city centre.

Vehicles carrying hazardous materials should travel on routes where they are least likely to crash and which pose the least danger should an accident occur. The selection of the best route uses an optimisation model. Some of the factors that contribute to the risk, such as vehicle speed, are unknown beforehand; these values can be predicted using forecasting methods. My PhD will focus on combining forecasting with an optimisation model to find the best routes for hazardous material vehicles to take.
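
As a toy illustration of the optimisation half of the problem (forecasting the link attributes is the other half), the sketch below uses the networkx library to find the route with the lowest total forecast risk on a small, invented road network; summing risk along a route is of course a simplification.

    import networkx as nx

    # Hypothetical road network: edge weights are forecast accident risk for a hazmat vehicle
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("depot", "A", 0.02), ("depot", "B", 0.05),
        ("A", "city_centre", 0.20), ("A", "ring_road", 0.03),
        ("B", "ring_road", 0.04), ("ring_road", "plant", 0.02),
        ("city_centre", "plant", 0.01),
    ], weight="risk")

    # Choose the route that minimises total forecast risk rather than distance or time
    route = nx.shortest_path(G, "depot", "plant", weight="risk")
    total_risk = nx.shortest_path_length(G, "depot", "plant", weight="risk")
    print("Least-risk route:", route, "total risk:", round(total_risk, 3))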

Click here to see a technical description.

2014 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2013, and started their PhD research in 2014. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Optimising Pharmacokinetic Studies Utilising Microsampling

Helen Barnett
Lead supervisor: Thomas Jaki

In the drug development process, the use of laboratory animals has long been a necessity to ensure the protection of both human subjects in clinical studies and future human patients. The parts of the process that involve the use of animals are called pre-clinical studies. The motivation for developing new laboratory techniques in pre-clinical studies is to reduce, refine and replace the use of animals. Pre-clinical pharmacokinetic studies use measurements of drug concentration in blood taken from animals such as rats and dogs to learn about the movement of the drug in the body. The technique of microsampling takes considerably less blood per sample than previous sampling techniques, in the hope of reducing and refining the use of animals.

In my PhD I aim to make a formal comparison between the results of traditional sampling techniques and microsampling in pre-clinical pharmacokinetic studies, in order to show that the results from microsampling are of the same quality as those from traditional methods. I also aim to develop optimal designs for trials utilising microsampling, which includes deciding when and how many blood samples to take from the animals in order to achieve the best quality of results. I aim to do this both for single dose studies, where one dose of the drug is given at the beginning of the trial, and for repeated dose studies, where doses are given at regular intervals throughout the trial.

In Partnership with Janssen Pharmaceutica

Click here to see a technical description.

Predicting the Times of Future Changepoints

Jamie-Leigh Chapman
Lead supervisors: Idris Eckley and Rebecca Killick

Changepoint detection and forecasting are, separately, two well established research areas. However, literature focusing on the prediction, or forecasting, of changepoints is quite limited.

From an applied perspective, there is a need to predict the existence of changepoints. Some examples include:

  • Finance – changepoints in financial data could be a result of major changes in market sentiments, bubble bursts, recessions and a range of other factors. Being able to predict these would be very beneficial to the economy.
  • Technology – predicting changepoints in the data produced from hybrid cars, for example, would allow proactive control of the vehicle. This could also apply to drones.
  • Environment – being able to predict changes in wind speed would allow us to predict when turbines need to be turned off. This would improve efficiency and maintenance.

This PhD aims to develop models which can predict the times of future changepoints.

Click here to see a technical description.

Inference Methods for Evolving Networks: Detecting Changes in Network Structure

Matthew Ludkin
Lead supervisors: Idris Eckley and Peter Neal

The world around us is made up of networks, from the roads we drive on to the emails we send and the friendships we make. These networks can change in structure over time and, in some cases, the changes can be sudden. In a network of computer connections a sudden change could mean an attack by hackers or email spam. Predicting such a change could reduce the effect of such an attack.

The project will look at modelling the structure of a network as groups of nodes with similar patterns of network links. This modelling technique can then be adapted to account for the network changing through time and, finally, methods will be developed to detect sudden changes.

Much work has been done in the areas of 'network modelling' and 'detecting changes through time' but the two areas have only overlapped in recent years thanks to the availability of data on networks through time.

In Partnership with DSTL

Click here to see a technical description.

Inference using the Linear Noise Approximation

Sean Moorhead
Lead supervisor: Chris Sherlock

The Linear Noise Approximation (LNA) provides a tractable approximate transition density for stochastic differential equations (SDEs). This transition density, given the initial point, is in fact a Gaussian distribution and allows one to simulate the evolution of an SDE quickly. This is particularly useful in statistical inference schemes where the transition density is needed to simulate sample paths of the SDE.
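
To give a flavour of the idea, the sketch below integrates the LNA ordinary differential equations for the mean and variance of a hypothetical scalar SDE and uses the resulting Gaussian as an approximate transition density; real applications involve multi-dimensional systems and more careful numerics.

    import math

    # Hypothetical scalar SDE: dX = f(X) dt + sqrt(g(X)) dW  (a noisy logistic growth model)
    r, K, sigma2 = 0.5, 100.0, 4.0
    f = lambda x: r * x * (1 - x / K)           # drift
    df = lambda x: r * (1 - 2 * x / K)          # derivative of the drift
    g = lambda x: sigma2 * x                    # diffusion (variance) coefficient

    def lna_transition(x0, delta, n_steps=1000):
        """LNA approximation to X_{t+delta} | X_t = x0: returns (mean, variance) of a Gaussian."""
        h = delta / n_steps
        m, v = x0, 0.0
        for _ in range(n_steps):                # Euler integration of the LNA ODEs
            m, v = m + h * f(m), v + h * (2 * df(m) * v + g(m))
        return m, v

    def lna_log_density(x1, x0, delta):
        """Approximate transition log-density log p(x1 | x0) under the LNA."""
        m, v = lna_transition(x0, delta)
        return -0.5 * (math.log(2 * math.pi * v) + (x1 - m) ** 2 / v)

    print(lna_transition(10.0, 1.0))
    print(lna_log_density(14.0, 10.0, 1.0))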

My research involves developing more efficient algorithms that use the LNA as an approximate transition density within a framework for statistical inference for SDEs.

An application of my research will involve applying these efficient algorithms to SDE approximations of Markov Jump Processes (MJPs). In particular, my research will focus on data on the numbers of different types of fish in the Barents Sea, off the north coast of Scandinavia. These data are provided by Statistics for Innovation (a Norwegian Centre for Research-based Innovation that is partnered with STOR-i) and pose a computational challenge due to their multi-compartmental nature, which highlights the need for more efficient algorithms.

Click here to see a technical description.

Physically-Based Statistical Models of Extremes arising from Extratropical Cyclones

Paul Sharkey
Lead supervisors: Jon Tawn and Jenny Wadsworth

In the UK, major weather-related events such as floods and windstorms are often associated with complex storm activity in the North Atlantic Ocean. Such events have caused mass infrastructural damage, transport chaos and, in some instances, even human fatalities. The ongoing threat of these North Atlantic storms is of great concern to the Met Office and its clients. Accurate modelling and forecasting of extreme weather events related to these cyclones is essential to minimise the potential damage caused, to aid the design of appropriate defences against the threat to human life, and to limit the economic difficulties such an event may cause.

Floods and windstorms are both examples of extreme events. In this context, an extreme event is one that is very rare, with the consequence that datasets of extreme observations are usually quite small. The statistical field of extreme value theory is focused on modelling such rare events, with the goal of predicting the size and rate of occurrence of events with levels that have not yet been observed. This allows a rigorous statistical modelling procedure to be followed in spite of the data constraints.
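
For readers unfamiliar with the approach, the sketch below carries out the most basic extreme value analysis on entirely synthetic data: fit a generalised extreme value distribution to annual maxima and read off a return level more extreme than anything in the record. The project itself goes well beyond this, adding physical covariates and dependence over space and time.

    from scipy.stats import genextreme

    # Synthetic stand-in for 50 years of annual maximum wind gusts (mph) at one site
    annual_maxima = genextreme.rvs(c=-0.1, loc=60, scale=8, size=50, random_state=42)

    # Fit a generalised extreme value (GEV) distribution to the annual maxima
    shape, loc, scale = genextreme.fit(annual_maxima)

    # 100-year return level: the level exceeded in any given year with probability 1/100
    return_level_100 = genextreme.isf(1 / 100, shape, loc=loc, scale=scale)
    print(f"Estimated 100-year gust: {return_level_100:.1f} mph")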

This PhD research will focus on building an extreme value model that is a statistically consistent representation of the physics that generate the extremes of interest. This will involve exploring the effect of covariates related to the atmospheric dynamics of these storms as well as the joint relationship of rain and wind over space and time.

In Partnership with Met Office and EDF

Click here to see a technical description.

Bayesian Bandit Models for the Optimal Design of Clinical Trials

Faye Williamson
Lead supervisors: Peter Jacko and Thomas Jaki

Before any new medical treatment is made available to the public, clinical trials must be undertaken to ensure that the treatment is safe and efficacious. The current gold standard design is the randomised controlled trial (RCT), in which patients are randomised to either the experimental or control treatment in a pre-fixed proportion. Although this design can detect a clinically meaningful treatment difference with a high probability, which is of benefit to future patients outside of the trial, it lacks the flexibility to incorporate other desirable criteria, such as the participant’s well-being.

Bandit models present a very appealing alternative to RCTs because they perform well according to multiple criteria. These models provide an idealised mathematical decision making framework for deciding how to optimally allocate a resource (i.e. patients) to a number of competing independent experimental arms (i.e. treatments). It is clear that a clinical trial which aims to identify the superior treatment (i.e. explore) whilst treating the participants as effectively as possible (i.e. exploit) is a very natural application area for bandit models seeking to balance the exploration versus exploitation trade-off.

Although the use of bandit models to optimally design a clinical trial has long been the primary motivation for their study, they have never actually been implemented in clinical practice. Further research is therefore required in order to bridge the gap between bandit models and clinical trial design. It is hoped that the research undertaken during this PhD will help achieve this goal, so that one day bandit models can finally be employed in real clinical trials.
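
As a toy illustration of a Bayesian bandit allocation rule (not the specific designs developed in this PhD), the sketch below runs Thompson sampling on a hypothetical two-arm trial: each patient is allocated to whichever arm looks better under a random draw from its current posterior, so allocation gradually favours the apparently superior treatment while still exploring.

    import random

    # Hypothetical (unknown to the trial) success probabilities of the two arms
    true_success = {"control": 0.45, "experimental": 0.60}

    # Beta(1, 1) priors on each arm's success probability
    successes = {"control": 0, "experimental": 0}
    failures = {"control": 0, "experimental": 0}

    random.seed(7)
    for patient in range(200):
        # Thompson sampling: draw a success probability from each arm's posterior,
        # then treat the patient with the arm that drew the larger value
        draws = {arm: random.betavariate(1 + successes[arm], 1 + failures[arm])
                 for arm in true_success}
        arm = max(draws, key=draws.get)
        if random.random() < true_success[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    for arm in true_success:
        n = successes[arm] + failures[arm]
        rate = round(successes[arm] / n, 2) if n else "n/a"
        print(arm, "patients:", n, "observed success rate:", rate)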

Click here to see a technical description.

Classification in Dynamic Streaming Environments

Andrew Wright
Lead supervisors: Nicos Pavlidis and Paul Fearnhead

A data stream is a potentially endless sequence of observations obtained at high frequency relative to the available processing and storage capabilities. Data streams arise in a number of “Big Data” environments including sensor networks, video surveillance, social media and telecommunications. My PhD will focus on the problem of classification in a data stream setting. This problem differs from the traditional classification problem in two ways. First, the velocity of a stream means that storing anything more than a small fraction of the data is infeasible. As such, a data stream classifier must use minimal memory and must be capable of being sequentially updated without access to past data. Second, the underlying data distribution of a stream can change with time, a phenomenon known as concept drift. Data stream classifiers must therefore have the ability to adapt to changes in the underlying data-generating mechanism. The aim of my PhD is to develop robust classification methods which address both of these problems.
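
A minimal sketch, on a synthetic stream, of the kind of sequential updating such a classifier needs: an online logistic regression that updates its weights one observation at a time and then discards the observation, with a constant step size so that recent data dominate and the classifier can track concept drift.

    import math
    import random

    w = [0.0, 0.0]     # weights for two features
    b = 0.0            # intercept
    step = 0.1         # constant step size: recent observations dominate

    def predict_prob(x):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        z = max(-30.0, min(30.0, z))       # clamp for numerical safety
        return 1.0 / (1.0 + math.exp(-z))

    random.seed(3)
    errors = 0
    for t in range(5000):
        # Hypothetical stream: the true decision boundary switches halfway through (concept drift)
        x = [random.gauss(0, 1), random.gauss(0, 1)]
        y = 1 if (x[0] if t < 2500 else x[1]) > 0 else 0

        p = predict_prob(x)
        errors += int((p > 0.5) != (y == 1))
        # Stochastic gradient update; the observation is then discarded, never stored
        for i in range(2):
            w[i] += step * (y - p) * x[i]
        b += step * (y - p)

    print("Online error rate:", errors / 5000)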

In Partnership with DSTL

Evolutionary Clustering Algorithms For Large, High-dimensional Data Sets

Katie Yates
Lead supervisors: Nicos Pavlidis and Chris Sherlock

In recent years, increased computing power has made the generation and storage of large datasets commonplace. In particular, information may be available on a large number of features relating to a particular system or item of interest. Such datasets are high dimensional and pose a number of additional challenges for data analysis. My PhD project is concerned with how one may locate “meaningful groups” within these high dimensional datasets, a problem commonly known as clustering. It is assumed that objects belonging to the same group are in some way more similar to each other than to objects assigned to other groups. If such groups can be located effectively, it may then be possible to model each cluster independently, given that all its members exhibit similar behaviour. This may allow the detection of outlying data points as well as the identification of patterns present within the dataset. A number of methods exist that can perform this task for low dimensional datasets, but the additional challenges faced in the high dimensional setting call for specialist techniques. The initial focus of this project will be on methods which first reduce the dimensionality of the problem, without losing information required for the analysis, thus reducing the problem to one which may be solved more efficiently.

A further consideration is that a system may be monitored over time, so new datasets will be generated as time progresses. In this instance, it is desirable to maintain some level of consistency between successive groupings so that the results remain meaningful for the user. This is made possible by considering not only how the current data are grouped but also how previously observed datasets were grouped; in general it is considered inappropriate for radical changes in the clustering structure to occur from one dataset to the next. In our opinion, there is a lack of methodology for analysing high dimensional data which evolve over time. Hence, it is our intention to extend any methodology developed for clustering high dimensional data to incorporate historical information, giving rise to more meaningful groupings of evolving data.

Click here to see a technical description.

Non-stationary environmental extremes

Elena Zanini
Lead supervisors: Emma Eastoe and Jon Tawn

As one of the six oil and gas "supermajors", Shell has a vested interest in the design, construction and maintenance of marine vessels and offshore structures, a common example of which is the oil platform. The design of robust and reliable offshore sites is, in fact, a key concern in oil extraction. Design codes set specific levels of reliability, expressed in terms of annual probability of failure, which need to be met or exceeded by companies. A correct estimate of such levels is essential to prevent structural damage which could lead not only to losses in revenue, but also to environmental pollution and danger to staff. Hence, it is essential to understand the extreme conditions marine structures are likely to experience in their lifetime.

Environmental phenomena that have very low probabilities of occurrence are of interest here, and are characterised by scarce data, with the events that need to be estimated often being more extreme than anything already observed. Extreme Value Theory (EVT) provides the right framework to model and study such phenomena. This project will focus on the extreme wave heights which affect offshore sites, and their relationship with known and unknown factors, such as wind speed and storm direction. These factors need to be selected and properly included in the model, and this project will focus on developing such theory. Further in the future, existing methods will also be considered and attention will be devoted to optimising the model fit they provide.

In Partnership with Shell

Click here to see a technical description.

2013 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2012, and started their PhD research in 2013. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Efficient search methods for high dimensional data

Lawrence Bardwell
Lead supervisors: Idris Eckley and Paul Fearnhead

My PhD is concerned with finding efficient statistical methods for detecting changepoints in high dimensional time series. Much work has already been done in the case of a single dimension, however when we increase the number of dimensions in a time series there are many more subtleties introduced which complicate the matter and make existing techniques either too limited or too inefficient to be of practical use.

These problems are worth studying because combining many one-dimensional time series provides more information and leads to better inferences. A potential application of this work is to assess when and which parts of a network become defective, and then to react quickly so that delays emanating from the fault are minimised. We have begun looking at these sorts of problems in a simplified context where the individual time series are mostly at some baseline level and abnormal regions occur where the mean value is either raised or lowered. This is an interesting problem in its own right, with applications in genomics, but for the most part it allows us to simplify the main problem and focus on certain aspects of it.

In Partnership with B.T.

Click here to see a technical description.

Location, relocation and dispatching for the North West Ambulance Service

Andrew Bottomley
Lead supervisors: Richard Eglese and David Worthington

Ambulance services are responsible for responding to demand for urgent medical care. The level of such demand is unpredictable and the resources to meet it are limited, so decisions must be made about how to position these resources in order to best meet the response targets in place.

Such decisions involve the positioning of stations, the dispatching of ambulances, and the movement of available ambulances to continue to provide satisfactory coverage across the region. Results from classical queueing theory can be used to model the possible unavailability of resources more realistically. Computational strategies can then incorporate such a model and solve the simplified problem to suggest the most preferable placement and movement of the vehicles.
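
One classical queueing analogy is the Erlang loss model, which gives the chance that every ambulance is busy at the moment a call arrives; a minimal sketch with invented figures is below.

    def erlang_loss(c, offered_load):
        """Erlang loss formula: probability all c ambulances are busy (M/M/c/c model)."""
        b = 1.0
        for k in range(1, c + 1):
            b = offered_load * b / (k + offered_load * b)
        return b

    # Hypothetical figures: 6 calls per hour, each job occupies an ambulance for 45 minutes
    offered_load = 6 * 0.75    # arrival rate x mean job duration (in hours)
    for c in range(3, 9):
        print(c, "ambulances -> probability none available:", round(erlang_loss(c, offered_load), 3))

In reality calls arrive unevenly over space and time, which is why dynamic relocation is needed.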

Approaches for the static positioning of ambulances have already been studied quite extensively, but building models that allow for the dynamic movement of ambulances throughout the day is a newly emerging field that I will be researching.

In Partnership with North West Ambulance Service

Click here to see a technical description.

Detection of Abrupt Changes in High Frequency Data

Kaylea Haynes
Lead supervisors: Idris Eckley and Paul Fearnhead

High frequency data (or "Big Data") has recently become a phenomenon across many different sectors due to the vast amount of data readily available via sources such as mobile technology, social media, sensors and the internet. An example of data collected and stored at a high frequency is data from an accelerometer which monitors the activity of the object it is attached to. This project will look at big data sets which have abrupt changes in structure; these changes are known as changepoints. For example this could be data from an accelerometer attached to a person who is alternating from walking to running.

Changepoints are widely studied in many disciplines, and the ability to detect them quickly and accurately can have a significant impact. For example, the ability to detect changes in patients' heartbeats can help doctors spot signs of disease more quickly and can potentially save lives.

Current changepoint detection methods do not scale well to high frequency data. This research aims to develop methods which are both accurate and computationally efficient at detecting changepoints in Big Data.
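
As a simple, entirely synthetic illustration of the core task (a single change in mean, rather than the multiple-changepoint, high-frequency settings the research targets), the sketch below scans every candidate location in linear time using cumulative sums.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic signal: the mean changes abruptly at time 600 (e.g. walking -> running)
    x = np.concatenate([rng.normal(0.0, 1.0, 600), rng.normal(2.0, 1.0, 400)])

    def best_single_changepoint(x):
        """Location minimising the squared-error cost of one mean before and one mean after."""
        n = len(x)
        s1, s2 = np.cumsum(x), np.cumsum(x ** 2)
        best_tau, best_cost = None, np.inf
        for tau in range(1, n):                # candidate changepoint locations
            left = s2[tau - 1] - s1[tau - 1] ** 2 / tau
            right = (s2[-1] - s2[tau - 1]) - (s1[-1] - s1[tau - 1]) ** 2 / (n - tau)
            if left + right < best_cost:
                best_tau, best_cost = tau, left + right
        return best_tau

    print("Estimated changepoint:", best_single_changepoint(x))   # should be close to 600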

In Partnership with Defence Science & Technology Laboratory

Click here to see a technical description.

Modelling ocean basins with extremes

Monika Kereszturi
Lead supervisors: Jonathan Tawn and Paul Fearnhead

Offshore structures, such as oilrigs and vessels, must be designed to withstand extreme weather conditions with a low level of risk. Inadequate design can lead to structural damage, lost revenue, danger to operating staff and environmental pollution. In order to reduce the probability of a structure failing due to storm loading, the most extreme events that could occur during its lifetime must be considered. Hence, interest lies in environmental phenomena that have very low probabilities of occurrence. This means that, by definition, data are scarce, and often the events that need to be estimated are more extreme than what has already been observed. Such extreme and rare environmental events can be characterised statistically using Extreme Value Theory (EVT).

EVT is used to estimate the size and rate of occurrence of future extreme events. Offshore structures are affected by multiple environmental variables, such as wave height, wind speed and currents, so the joint effect of these ought to be estimated. Storms may affect multiple structures in different locations simultaneously; hence spatial models are needed to estimate the joint risk of several structures failing at the same time.

This research aims to develop spatial models for extreme ocean environments, estimating the severity and rate of occurrence of extreme events in an efficient manner over large spatial domains.

In Partnership with Shell

Click here to see a technical description.

Spatial methods for weather-related insurance claims (joint with SFI)

Christian Rohrbeck
Lead supervisors: Deborah Costain, Emma Eastoe and Jonathan Tawn

Storms, precipitation, droughts and snow lead to high economic losses each year. Nowadays, insurance companies offer protection against such weather events in the form of policies which insure a property against damage. In order to set appropriate premiums, the insurance companies require adequate models relating the claims to observed and predicted weather events.

The modelling of weather-related insurance claims is unique in several ways. Firstly, the weather variables vary smoothly over space, but their effect on insurance claims in some locations depends on other factors such as geography; for example, a location close to a river carries a higher risk of flooding. Secondly, past weather data do not provide a reliable basis for predicting future insurance claims because the climate is changing: the IPCC report describes changes leading to higher sea levels and increasing average temperatures in the coming decades. Therefore the fundamental questions any approach to modelling insurance claims needs to address are: (i) which events lead to a claim? (ii) what is the expected number of insurance claims given a weather forecast? and (iii) what is the impact of climate change? Unfortunately, existing methods cannot answer these questions adequately.

This PhD project aims to improve existing models for weather-related insurance claims by better accounting for the spatial variation of weather and geographical features. In order to build up an appropriate model, statistical methods from spatial statistics, statistical modelling and extreme value theory will be used in the research.

In Partnership with Statistics for Innovation.

Click here to see a technical description.

Multi-faceted scheduling for the National Nuclear Laboratory

Ivar Struijker Boudier
Lead supervisors: Kevin Glazebrook and Michael G. Epitropakis

The National Nuclear Laboratory (NNL) operates a facility which undertakes work covering research into nuclear materials and waste processing services. Each job that passes through this facility requires specialist equipment and skilled operatives to carry out the work. This means that a job cannot be processed until such resources have become available. It is therefore of interest to schedule each job to take place at a time when the required equipment and operative(s) are not engaged in the processing of another job.

Radioactive materials have to be handled with great care and it is not always possible to know the duration of each job in advance. If a job takes longer than expected, the equipment may not be available on time for the next job and this introduces delays to the schedule. Additionally, the equipment being used sometimes breaks down, causing further delays. This PhD aims to develop tools to schedule the work at the NNL facility. Such scheduling tools will have to take into account the uncertainty in job processing times, as well as the possibility of equipment unavailability due to breakdowns or planned maintenance.

In Partnership with National Nuclear Laboratory

Click here to see a technical description.

Inference and Decision in Large Weakly Dependent Graphical Models

Lisa Turner
Lead supervisors: Paul Fearnhead and Kevin Glazebrook

In the world we live in, the threat of a future terrorist attack is very real. In order to try to prevent such attacks, intelligence organisations collect as much relevant information as possible on potentially hostile forces. The timely processing of this intelligence can be critical in identifying and defeating future terrorists. However, improvements in technology have resulted in a huge amount of data being collected, far more than can be processed and analysed. This is particularly true of communications intelligence, as a result of the increased use of social media, email and text messaging. Hence, the problem becomes one of deciding which intelligence items to process so that the amount of relevant intelligence information analysed is as great as possible.

My research looks at how this problem can be dealt with for communications intelligence. The set of communications can be modelled as a network, where nodes represent the people involved in the communications and an edge exists between nodes if they share at least one conversation. Once a conversation has been processed and analysed, the outcome can provide valuable knowledge about the communication network. The research looks at how that outcome can be incorporated into the model, so that the model learns from it, and at how the updated model can then be used to decide which item to screen next.

In Partnership with Naval Postgraduate School

Click here to see a technical description.

2012 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2011, and started their PhD research in 2012. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Stochastic methods for video analysis

Rhian Davies
Lead supervisors: Lyudmila Mihaylova and Nicos Pavlidis

Surveillance cameras have become ubiquitous in many countries, constantly collecting large volumes of data. Due to an over-abundance of data it can be extremely difficult to convert this into useful information. There exists considerable interest in being able to process such data efficiently and effectively to monitor and classify the activities which are identified in the video.

We aim to develop a smart video system able to classify behaviour into normal and abnormal activities, which could allow the user to be alerted to anomalous behaviour in the monitored area without the need to manually sift through all of the video. For example, the system could be used to alert a shop owner to a customer placing goods into their bag instead of their shopping trolley.

In order to develop such a system, we intend to start by adapting simple background subtraction techniques to improve their accuracy. These algorithms separate the foreground from the background, allowing us to monitor the activities of interest clearly.
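
As a rough sketch of what background subtraction does (the project works with far more sophisticated, adaptive variants), the code below maintains a slowly updated running-average background for a tiny synthetic greyscale sequence and flags pixels that differ from it.

    import numpy as np

    def detect_foreground(frames, learning_rate=0.05, threshold=30):
        """Simple running-average background subtraction on a sequence of greyscale frames."""
        background = frames[0].astype(float)
        masks = []
        for frame in frames[1:]:
            diff = np.abs(frame.astype(float) - background)
            masks.append(diff > threshold)               # True where a pixel looks like foreground
            # Slowly update the background so gradual lighting changes are absorbed
            background = (1 - learning_rate) * background + learning_rate * frame
        return masks

    # Tiny synthetic example: a static 'scene' with a bright moving blob
    rng = np.random.default_rng(0)
    scene = rng.integers(0, 50, size=(20, 20))
    frames = []
    for t in range(10):
        frame = scene.copy()
        frame[5 + t, 5:8] = 255                          # the moving object
        frames.append(frame)

    masks = detect_foreground(frames)
    print("Foreground pixels per frame:", [int(m.sum()) for m in masks])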

Click here to see a technical description.

Effective learning in sequential decision problems

James Edwards
Lead supervisor: Kevin Glazebrook

Many important problems involve making a sequence of decisions over time even though our knowledge of the problem is incomplete. Learning more about the problem can improve the quality of future decisions. Good long term decisions therefore require choosing actions that yield useful information about the problem as well as being effective in the short term.

The complexity involved in solving these problems often leads to the learning aspects of the problem being modelled only approximately. This simplification can result in poor decision making. This research aims to use modern statistical methods to overcome this difficulty.

Potential applications include: choosing a route to bring emergency relief into a disaster zone with disrupted communications; setting and adjusting the price for a new product; allocating a research budget between competing projects; planning an energy policy for the UK; and responding to an emerging epidemic of uncertain virulence and seriousness.

Click here to see a technical description.

Betting Markets and Strategies

Tom Flowerdew
Lead supervisor: Chris Kirkbride

When gambling on the outcome of a sporting event, or investing in the stock markets, no one would turn down the opportunity to hold an ‘edge’ on the market. An edge could come either from some form of added analysis not available to the market as a whole, or from more nefarious means, such as insider trading.

When an edge has been found, the problem remains of how best to invest money in order to take advantage of this favourable opportunity. A strategy proposed by John Kelly in the 1950s involves betting some proportion of your current bankroll, depending on the magnitude of your edge. Therefore, when you have a bigger edge, you would bet a larger proportion of your current wealth.
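
In its simplest form the Kelly stake for a single bet can be written down directly; the sketch below uses made-up numbers.

    def kelly_fraction(p, decimal_odds):
        """Kelly stake as a fraction of bankroll for a single bet.

        p             -- your estimated probability of winning
        decimal_odds  -- decimal odds offered (payout per unit staked, including the stake)
        """
        b = decimal_odds - 1              # net odds: profit per unit staked if the bet wins
        q = 1 - p
        return max(0.0, (b * p - q) / b)  # never bet if there is no edge

    # Example: you believe the true win probability is 55% and the market offers even money (odds of 2.0)
    print(kelly_fraction(0.55, 2.0))      # 0.10 -> stake 10% of the bankroll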

This scenario is simplistic, and only applies for very simple situations. When more interesting betting or investing opportunities arise (for example, betting on accumulators, or investing in options), the Kelly criterion is not suitable to deal with the new scenario. This project investigates methods to expand the Kelly criterion (or other similar strategies) to new areas, and is in partnership with ATASS Sports, a statistical analysis company based in Exeter.

In Partnership with ATASS

Click here to see a technical description.

Sports data analysis

George Foulds
Lead supervisors: Mike Wright and Roger Brooks

Sports data analysis often uses basic techniques and draws conclusions from little more than common sense. The importance of applying better statistical techniques to sports data analysis and model building can be seen through the rise of investment strategies based on sports betting. Centaur Galileo, the first sports betting hedge fund, collapsed in early 2012 due to investments guided by inferior models. Therefore, the proposal of more advanced methods to obtain better results is an important one. Two areas of sports data analysis which could be better served by a higher level of analysis are those of home advantage and the effect of technology in sport:

Home advantage is a term used to describe the positive effect experienced by a home team. Although it is a well-documented phenomenon, most research does little to quantify the underlying factors - an issue that will be addressed. A more subtle analysis will allow much greater insight into the effect, from which better predictions may be produced.
Some level of technology is used in most sports, whether it is a simple pole for vaulting or a relatively advanced piece of engineering such as a carbon fibre bicycle. Identifying the effect of technology on performance, consistency and other factors important to the outcome is an essential step in creating models which give better predictions. This will allow us to update our predictions about sporting outcomes faster and more accurately when new technologies and equipment are introduced.

In Partnership with ATASS

Click here to see a technical description.

Machine learning in time varying environments

David Hofmeyr
Lead supervisor: Nicos Pavlidis

We all know the feeling that what we’ve learnt is somehow out of date; that our skills have become redundant or obsolete. The fact of the matter is that times change, and we need to be able to adapt our skills so that they remain relevant and useful.

Machine learning refers to the idea of designing computer programs in a way that they become better at performing some predefined tasks the more experience they have. Much in the same way we, as people, become better at our jobs, at sports, at everything, the more time we spend doing them, computer programs can get better at handling information the more information they have been given. Just like for us, however, these abilities can become redundant when the nature of information changes. It is therefore crucially important to design these programs so that they are adaptive and thus able to accommodate information change without their skill sets becoming obsolete.

Not all changes, however, even those which fundamentally affect the nature of information, render old skills irrelevant. In being adaptive, therefore, it is important to be selective when adjusting the way we do things, since these adjustments might be time consuming and unnecessary if the changes do not affect the specific tasks of interest.

This research will approach the problem of information change in two ways. Firstly, by factoring in the nature of a change, rather than just detecting it, it should be possible to be more discerning when deciding whether or not to make an adjustment when changes occur. Secondly, knowledge will be partitioned into multiple simple aspects, so that only those aspects which are not relevant in the current environment are “forgotten”.

Click here to see a technical description.

Detecting Abrupt Changes in Ordered Data

Rob Maidstone
Lead supervisors: Paul Fearnhead and Adam Letchford

When data are collected over time, the result is called a time series. Often the structure of a time series changes suddenly; we call such a change a “changepoint”. To model the data effectively these changepoints need to be detected and subsequently built into the model.

Changepoints occur in many real-world situations and detecting them can have a significant impact. For example, when analysing human genome data, the average DNA copy value usually remains at about the same level, but occasionally sudden changes away from this level occur. These sudden changes often relate to tumorous cells, and therefore detecting them is critical for classifying the tumour type and its progression.

Another example of where changepoint detection methods are effective is in finance. Stock data (such as the Dow Jones Index) form a constantly changing time series in which many changes in mean and variance occur and can be detected. This is useful when it comes to modelling the data and forecasting future returns.

This research looks at some of the methods for detecting these changepoints efficiently across a variety of different underlying models. The required methods combine statistical techniques for data analysis with optimisation tools typically used in Operational Research.

Click here to see a technical description.

Detecting Changes in Multiple Sensor Signals

Ben Pickering
Lead supervisor: Idris Eckley

Companies in the oil industry often place sensors within their equipment in order to monitor various properties, such as the temperature of the local geology or the vibration levels of the flowing oil, at multiple locations throughout the extraction system. This is done in order to ensure that the system continues to run smoothly. For example, a change in the vibrations of flowing oil could indicate the presence of an impurity deposit in the oil well, which could cause a blockage in the valve at the top of the well. Hence, knowledge of any changes in the properties of the data recorded by the sensors is extremely valuable.

Such changes in the properties of data are known as changepoints. The ability to effectively detect changepoints in a given set of data has significant practical implications. However, the task of developing such changepoint detection methods is complicated by the fact that the data sets are often very large and consist of measurements from multiple variables which are related in some way.

This research aims to utilise cutting-edge techniques to develop changepoint detection methods which are able to efficiently detect changes in data arising from multiple related variables, improving upon some of the weaknesses of current detection methods.

In Partnership with Shell

Click here to see a technical description.

Resource Planning Under Uncertainty

Emma Ross
Lead supervisor: Chris Kirkbride

As markets have grown increasingly competitive, the efficient use of available resources has become paramount for the maximisation of profits and increasingly to ensure the survival of companies. For example, to run the UK's telecommunications network, BT deploy thousands of engineers to repair, maintain and upgrade the network infrastructure. This ensures a high level of network reliability which results in customer satisfaction.

To deliver this service, the engineering field force must be carefully allocated to tasks in each time period. Of particular concern are the risks to BT of a sub-optimal allocation. The over-supply of engineers to tasks can result in unnecessary costs to the business; external contractors may need to be brought in at additional expense and other tasks may suffer without adequate resourcing. Conversely, the under-supply of engineers may lead to missed deadlines and failure to meet customer service targets.

This allocation task is made extremely complex by the unpredictable nature of demand. Plans for the workforce are made far in advance, when only vague forecasts can be made of the level of demand expected to materialise. The supply of engineers is also rendered uncertain by varying efficiency, absence and holidays.

This research explores effective methods for optimal decision making under uncertainty with particular emphasis on modelling the risk (or cost-implications) of an imbalance or gap between supply and demand.

In Partnership with B.T.

Click here to see a technical description.

Modelling droughts and heatwaves

Hugo Winter
Lead supervisor: Jonathan Tawn

Natural disasters such as droughts and heat waves can cause widespread social and economic damage. For example, a drought in the UK may lead to a decrease in soil moisture and a reduction in reservoir levels. In this situation, water companies will be affected economically as they are required to ensure regions are supplied with water. Sustained dry weather may require government policy such as the hosepipe bans seen in recent years. In Saharan regions of Africa, a period of drought can lead to crop failure and famine. This situation can lead to large death tolls if the required aid is not supplied in time.

Heat waves and droughts occur when there are days that are very hot or very dry respectively. These events are referred to as extreme events and by definition rarely occur. Since extreme events do not occur often, there is little data in the historical record. It might also be possible to observe future events that are more extreme than any that have been previously seen. Such a scenario is possible due to global climate change being driven by greenhouse gas emissions. A mathematical modelling technique often used in this type of situation is extreme value theory.

This research aims to model different aspects of dependence within extreme events. Broadly, the main goal is to characterise the severity, spatial extent and duration of extreme events. For example, if an extreme event has been observed at a specific location, is it possible to infer other locations where extreme events might occur? Of particular interest will be how the above aspects of extreme events may change under different climate change scenarios.

In Partnership with the Met Office

Click here to see a technical description.

2011 PhD Cohort

Here you can find details of the PhD research projects undertaken by students who joined STOR-i in 2010, and started their PhD research in 2011. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Maintaining the telecommunications network

Mark Bell
Lead supervisor: Dave Worthington

Continuous access to the UK’s growing telecommunications network is essential for a huge number of organisations, businesses and individuals. If access to the network is interrupted, even for a short period, the effects on public services and businesses can be severe. The network has a complex structure, so faults can occur frequently and for many reasons. When a fault does occur it is vital that repair work is performed as soon as possible.

Openreach, part of the BT group, is solely responsible for maintenance and repair of the vast majority of the UK’s network. Performing this effectively means ensuring that at any time they have enough staff available to meet the current demands, which can vary considerably. Models that can understand the effects of these changing demands on the available workforce and the existing workload are of considerable benefit in ensuring that the organisation is prepared for the ‘busiest’ periods. Of particular importance is the model’s ability to understand key performance measures, such as the expected time for a repair job to be completed. Keeping these measures within the targets is central to public satisfaction.

The current models are required to understand behaviour across the entire UK and so there are limits regarding the level of detail they can capture; otherwise the time required to run the models would be impractical. It is therefore vital that the detail included in the model is as accurate as possible. The performance measures output from the models are partly determined by the model inputs which are selected by the analyst; these are based on current knowledge of the system. The research aims to find accurate and robust techniques for the estimation of these input parameters using statistical techniques. This will enable calibration of the models, improving their accuracy when modelling behaviour of the key performance measures, which in the real system are subject to regular fluctuations in the short-term.

In Partnership with B.T.

Click here to find out more about Mark's research.

Effective Decision Making under Uncertainty

Jamie Fairbrother
Lead supervisor: Amanda Turner

Often we have to make decisions in the face of uncertainty. A shop manager has to decide what stock to order without knowing the exact demand of each item. An investment banker has to choose a portfolio without knowing how the values of different assets will evolve. Taking this uncertainty into account allows us to make good robust decisions.

Using available information and data, a scenario tree describes many different possible futures and, importantly, is in a form which can be used to "optimise" our decision.

Generally, the more futures a scenario tree takes into account, the more reliable the decision it will yield. However, if the scenario tree is too large the problem becomes intractable. The aim of this research project is to develop a way of generating scenario trees which are small but give reliable decisions. This research would have applications in finance, energy supply and logistics.
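
The sketch below is a deliberately tiny example of optimising a decision against a handful of scenarios, a single-stage stand-in for a scenario tree (the project concerns multi-stage trees and how to keep them small without losing reliability); all numbers are invented.

    # Toy decision under uncertainty: choose an order quantity now, demand is revealed later.
    scenarios = [(80, 0.3), (100, 0.5), (130, 0.2)]   # (demand, probability)
    cost, price = 4.0, 10.0                           # unit purchase cost and selling price

    def expected_profit(order):
        return sum(prob * (price * min(order, demand) - cost * order)
                   for demand, prob in scenarios)

    best_order = max(range(60, 151), key=expected_profit)
    print("Best order quantity:", best_order,
          "expected profit:", round(expected_profit(best_order), 1))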

Click here for a technical version of Jamie's project.

Defensive Surveillance

Terry James
Lead supervisors: Kevin Glazebrook and Professor Kyle Lin

Defensive surveillance is of great importance in the modern world, motivated by the threats faced on a daily basis and the technology which now exists to mitigate them. Adversaries seeking to carry out an illicit activity are often intelligent and strategic, and wish to remain covert as they do so. Surveillance policies must therefore take the strategic nature of adversaries into account.

This research project aims to identify defensive surveillance policies which can mitigate the threats posed by adversaries in a public setting. For example, consider a surveillance resource responsible for a number of public areas, each of which is a potential target for an adversary. How should the resource be controlled, given that the adversary can strike at any time amongst any of the randomly evolving public crowds?

In Partnership with the Naval Postgraduate School

Click here for further details of Terry's project.

Fighting Terrorism in the Information Swamp

Jak Marshall
Lead supervisors: Kevin Glazebrook and Roberto Szechtman

In a world where the threat of terrorist activity is very real, intelligence and homeland security organisations across the globe have an interest in gathering as much information as possible about such activities, so that preventative measures can be taken before something like a bomb attack is executed. Problems arise because these agencies generate intelligence data in enormous volumes and of highly variable quality. Satellites taking countless images, field agents submitting their reports and various other high-traffic streams all add up to more intelligence than can reasonably be processed in high-pressure scenarios.

Further problems arise because any piece of information from this glut needs to be processed by technical experts before it can be contextualised by analysts to fight terrorist threats. The processing staff are usually not fully aware of the importance (or unimportance!) of a piece of intelligence before they commit their attention to it. This research concentrates on modelling the role of the processors in this situation and on developing methods that can efficiently search this information swamp for vital information.

Two approaches to the problem are considered. The first is a time-saving exercise that asks how a processor should decide whether an individual piece of intelligence needs further scrutiny and, if not, whether it should be flagged as important or cast aside. The second approach takes that decision away from the processor: the processor only ever considers the latest report to arrive in their inbox and decides its fate only when the next report arrives. The problem is then to determine how stringent the quality control on intelligence should be, given that high arrival rates leave the processor little time to consider each report.
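The sketch below is a purely hypothetical illustration of this quality-control trade-off, not the models developed in the project: reports arrive with noisy quality scores, a single threshold decides which are flagged for analysts, and tightening the threshold trades missed important reports against analyst workload.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stream of reports: each has a latent importance and a noisy
# quality score that the processor observes before the next report arrives.
n = 10_000
important = rng.random(n) < 0.05                   # 5% of reports truly matter
score = important * 2.0 + rng.normal(0.0, 1.0, n)  # noisy observed signal

def evaluate(threshold):
    flagged = score > threshold
    missed = np.mean(important & ~flagged)         # important reports cast aside
    workload = flagged.mean()                      # fraction passed to analysts
    return missed, workload

for t in (0.5, 1.0, 1.5, 2.0):
    missed, workload = evaluate(t)
    print(f"threshold {t:.1f}: miss rate {missed:.3%}, workload {workload:.1%}")
```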

In Partnership with the Naval Postgraduate School

Click here to see a technical description of Jak's project.

Fuel Pricing

Shreena Patel
Lead supervisor: Chris Sherlock

In the market for home-delivered fuel, price takes on a number of different roles. Given that capacity on a delivery truck has zero value once the truck has left the depot, pricing should minimise the risk of capacity being left unfilled. However, this must be balanced against the firm’s ultimate aim of maximising profit. Prices therefore need to be varied over time and across customers to manage demand and make the best use of a limited delivery capacity.

Ongoing work with a fuel consultancy firm is looking to develop a model which combines these roles into a single pricing strategy. In particular, price customisation will be achieved using statistical techniques which group together customers according to their price sensitivity.
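As a hypothetical illustration of that grouping step (simulated customers and a simple k-means clustering, not the firm's data or the project's actual method), the sketch below segments customers by estimated price sensitivity and order volume; each segment could then be offered its own price schedule.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical per-customer features: estimated price sensitivity (relative
# drop in demand per unit price increase) and average order volume (litres).
sensitivity = np.concatenate([rng.normal(0.2, 0.05, 100),
                              rng.normal(0.8, 0.10, 100)])
volume = np.concatenate([rng.normal(900, 150, 100),
                         rng.normal(400, 100, 100)])
X = np.column_stack([sensitivity, volume / volume.std()])  # crude scaling

# Group customers into segments with similar behaviour.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for k in range(2):
    members = segments == k
    print(f"segment {k}: mean sensitivity {sensitivity[members].mean():.2f}, "
          f"{members.sum()} customers")
```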

In Partnership with KSS

Click here for a technical description of Shreena's research project.

Patient flows in A&E Departments

Daniel Suen
Lead supervisor: Dave Worthington

Steadily rising patient numbers and a shrinking budget have been a major concern for the NHS for many years. Rising pressure to maintain the quality of care within a limited budget motivates the need to improve the efficiency of hospitals, in particular the way they utilise their available resources.

Understanding patient flows in healthcare systems is an important tool for improving hospital efficiency and, among other things, reducing patient waiting times. Better insight into hospitals helps decision-makers improve the management of hospital resources (e.g. hospital beds, staff) and avoid patient blockages, where a build-up for one type of resource has knock-on effects on the rest of the system.

The focus of this research will be on how best to describe these healthcare systems and on improving existing modelling techniques such as simulation-based methods.
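As a minimal example of a simulation-based method (a crude multi-cubicle queue with made-up rates, not a model of any real A&E department), the Python sketch below compares mean waiting times under two resource levels.

```python
import heapq
import numpy as np

rng = np.random.default_rng(7)

def simulate_ae(arrival_rate, treatment_rate, n_cubicles, n_patients=50_000):
    """Crude first-come-first-served multi-server queue: patients arrive,
    wait for a free treatment cubicle, are treated and leave.
    Returns the mean waiting time (in hours)."""
    arrivals = np.cumsum(rng.exponential(1 / arrival_rate, n_patients))
    services = rng.exponential(1 / treatment_rate, n_patients)
    free_at = [0.0] * n_cubicles           # times at which each cubicle frees up
    heapq.heapify(free_at)
    waits = np.empty(n_patients)
    for i, (t, s) in enumerate(zip(arrivals, services)):
        earliest = heapq.heappop(free_at)  # cubicle that becomes free first
        start = max(t, earliest)
        waits[i] = start - t
        heapq.heappush(free_at, start + s)
    return waits.mean()

# e.g. 10 arrivals/hour, 20-minute average treatment: compare 4 vs 5 cubicles.
for c in (4, 5):
    print(c, "cubicles: mean wait", round(simulate_ae(10, 3, c), 2), "hours")
```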

Click here to see further details of Dan's project.

These are just a few of the very real and practical issues our graduates will be well equipped to tackle, giving them the skills and experience to enable their careers to progress rapidly.

Associated Students

Here you can find details of PhD research projects from STOR-i associated students. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.

Improving consumer demand predictions

Devon Barrow
Supervisor: Sven Crone

The ability to accurately predict the demand for goods and services is important across all sectors of society, particularly business, economics and finance. In the business and retail sector, for example, prediction of consumer demand affects both the profitability of suppliers and the quality of service delivered. Unreliable demand forecasts can lead to inefficient order quantities, suboptimal inventory levels, and increased inventory, administrative and processing costs, all of which affect the revenue, profitability and cash flow of a company. Improvements in demand forecasts therefore have the potential for major cost savings.

Traditionally, a major source of this improvement has been the selection of an appropriate forecasting method. Improvements in accuracy can also be achieved by combining the output of several forecasting methods rather than relying on any single best one. This research investigates existing techniques, and develops new ones, for combining predictions from one or more forecasting methods. The potential improvements in accuracy and reliability will help to support management decisions and allow managers to respond better to circumstances, events and conditions affecting seasonal demand, price sensitivities and supply fluctuations for both the company and its competitors.
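The toy Python sketch below illustrates the combination idea on a simulated demand series: two deliberately simple forecasting methods are averaged with equal weights and the accuracy of each forecast is measured. The series, the two methods and the equal weights are illustrative assumptions only, not the techniques developed in this research.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weekly demand with trend, seasonality and noise.
t = np.arange(156)
demand = 200 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 10, t.size)
train, test = demand[:-12], demand[-12:]

# Two deliberately simple forecasting methods (stand-ins for real ones).
naive = np.repeat(train[-1], 12)                              # last observed value
drift = train[-1] + np.arange(1, 13) * (train[-1] - train[0]) / (len(train) - 1)

combined = 0.5 * naive + 0.5 * drift                          # equal-weight combination

def mae(forecast):
    """Mean absolute error over the 12-week hold-out period."""
    return np.mean(np.abs(test - forecast))

print("naive:", round(mae(naive), 2), " drift:", round(mae(drift), 2),
      " combined:", round(mae(combined), 2))
```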

Analysis and Classification of sounds

Karolina Krezmienewska
Supervisors: Idris Eckley and Paul Fearnhead

We are surrounded by a huge number of different sounds in our daily lives. Some of these are generated by natural phenomena, like the sound we hear during seismic activity or when the wind blows. Other sounds are generated by man-made devices. By analysing these sounds we can learn valuable information about their source. This can include either (a) identifying the type of the source or (b) assessing its condition. 

Classification of sounds is currently used in a variety of settings e.g. speech recognition, diagnosing cardiovascular diseases through the sound of the heart, and environmental studies. This project involves the development of more accurate methods for analysing and classifying sounds in collaboration with a leading industrial partner.

Extreme risks of financial investments 

Ye Liu
Supervisor: Jonathan Tawn

A few years have passed, but aftershocks of the credit crunch have spread far beyond just the financial sector and influenced everyone's life in many ways - housing, education, jobs etc. Living in a world yet to recover fully from the crisis-led recession, we cannot help but wonder what went wrong and how we can better prepare ourselves for the future.

Statistical methods have been used for many decades to address the fundamental issue of understanding uncertainty in the financial sector. However, standard statistical analysis relies on a good amount of past information, whereas events like the credit crunch have occurred only a few times in history. Traditional risk management tends to use one model for all situations and to make the simplistic assumption that rare events like the credit crunch happen in the same way as the normal ups and downs of the financial market. This research shows that such beliefs can lead to very inaccurate risk assessments, which were at the root of many failed investments during the credit crunch.

Analysing the whole financial sector jointly is very difficult, and it is usually assumed that all financial products react similarly to a market crash. This research shows that this is not true and proposes a method which allows each financial product to be treated individually. The new method provides a much more accurate risk assessment when multiple financial products are involved, and is being adopted by a top UK fund manager to identify the true extent of their future risk.
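As a standard single-asset illustration of extreme value methods (a peaks-over-threshold fit to simulated heavy-tailed losses, not the multivariate approach this research proposes), the sketch below fits a generalised Pareto distribution to losses above a high threshold and reads off an extreme quantile of the loss distribution.

```python
import numpy as np
from scipy.stats import genpareto, t as student_t

rng = np.random.default_rng(11)

# Hypothetical daily losses (% of portfolio value) with heavy tails.
losses = student_t.rvs(df=3, scale=1.2, size=5000, random_state=rng)

u = np.quantile(losses, 0.95)            # threshold: the 95% empirical quantile
excess = losses[losses > u] - u
p_u = (losses > u).mean()                # probability of exceeding the threshold

# Fit a generalised Pareto distribution to the threshold exceedances.
shape, loc, scale = genpareto.fit(excess, floc=0)

# 99.9% Value-at-Risk implied by the fitted tail model.
q = 0.999
var_999 = u + genpareto.ppf(1 - (1 - q) / p_u, shape, scale=scale)
print(f"threshold {u:.2f}, fitted shape {shape:.2f}, 99.9% VaR {var_999:.2f}")
```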

Modelling Wind Farm Data and the Short Term Prediction of Wind Speeds    

Erin Mitchell
Supervisors: Paul Fearnhead and Idris Eckley

Wind energy is a fast-developing market in the United Kingdom and across the world. With the ever-looming threat of the Earth’s fossil fuels running out, the world is increasingly turning to renewable energy sources, and wind energy is a popular and growing part of the renewables sector.

In 2007 only 1.8% of the energy in the United Kingdom came from renewable sources, but the UK Government is aiming to produce 20% of its energy from renewable sources by 2020. With the profile of, and demand for, wind energy constantly increasing, there is an expanding market in its analysis and prediction. Because there are financial penalties for both under- and over-prediction, accurate predictions are important to maximise the profit made from sales to the market. Wind energy producers sell their energy in advance of its production, so accurate forecasts of wind speeds and energies up to 36 hours ahead are essential.

Alongside a leading renewables company, this research is looking at developing novel methods for accurate forecasts for wind power output, in particular by implementing dynamic systems with evolving model parameters.
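A minimal sketch of a dynamic model with an evolving parameter is given below: a local-level state-space model filtered with the Kalman recursions, applied to simulated wind speeds. The data and noise variances are assumed for illustration; the models developed in the research itself are more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical hourly wind-speed series (m/s) with a slowly drifting level.
n = 500
level = 8.0 + np.cumsum(rng.normal(0, 0.1, n))
wind = level + rng.normal(0, 0.8, n)

# Local-level state-space model: the underlying mean wind speed is allowed
# to evolve over time, and the Kalman filter tracks it.
q_var, r_var = 0.1**2, 0.8**2      # state and observation noise (assumed known)
m, p = wind[0], 1.0                # filtered mean and variance
forecasts = np.empty(n)
for t in range(n):
    forecasts[t] = m                       # one-step-ahead prediction
    p_pred = p + q_var                     # predict step
    k = p_pred / (p_pred + r_var)          # Kalman gain
    m = m + k * (wind[t] - m)              # update with the new observation
    p = (1 - k) * p_pred

print("one-step MAE:", round(np.mean(np.abs(wind[1:] - forecasts[1:])), 3), "m/s")
```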

Click here to read more about Erin's research.

Demand learning and assortment optimization 

Jochen Schurr
Supervisor: Kevin Glazebrook

In the retail industry, the most constraining resource is shelf space, so decision-makers in the field should give careful consideration to how to make optimal use of it. In the context of seasonal consumer goods, e.g. fashion, this decision-making process is dynamic for two reasons: first, the assortment changes seasonally, or even within a season; second, the demand for each product has to be estimated ever more precisely as actual sales data accumulate.

The purpose of this project is to identify the key quantities and to study their sensitivity in the decision making process, both in existing and to-be-formulated models.

Modelling and Analysis of Image Texture 

Sarah Taylor
Supervisor: Idris Eckley

When one thinks about texture, typical examples that come to mind are woven material, straw or a brick wall. More formally, image texture is the visual property of an image region with some degree of regularity or pattern: it describes variation in the data at scales smaller than the current perspective. In many settings it is useful to be able to detect differing fabric structure, for example to identify whether there is an area of uneven wear within a sample of material. To avoid the subjectivity of human inspection of materials, it is desirable to develop an automatic detection method for uneven wear. Developing such methods is the focus of this PhD project.
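As a hypothetical illustration (a simulated fabric image and simple local-variance features, rather than the wavelet-based methods studied in the project), the sketch below flags a "worn" patch where the regular weave pattern is weaker than in the rest of the image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(2)

# Hypothetical grey-scale image of a woven fabric: a regular stripe pattern
# plus noise, with one patch where the weave is worn and the pattern is lost.
x = np.arange(256)
fabric = 0.5 + 0.2 * np.sin(2 * np.pi * x / 8)[None, :] + rng.normal(0, 0.02, (256, 256))
fabric[80:140, 80:140] = 0.5 + rng.normal(0, 0.02, (60, 60))   # worn patch

# Local variance in 16x16 windows: a crude small-scale texture measure.
win = 16
local_mean = uniform_filter(fabric, win)
local_var = uniform_filter(fabric**2, win) - local_mean**2

# Worn areas show much weaker small-scale variation than the regular weave.
flagged = local_var < 0.5 * np.median(local_var)

mask = np.zeros((256, 256), dtype=bool)
mask[80:140, 80:140] = True
print("flagged inside worn patch:", round(flagged[mask].mean(), 2))
print("flagged elsewhere:        ", round(flagged[~mask].mean(), 2))
```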

Determining the future wave climate of the North Sea

Ross Towe
Supervisor: Jonathan Tawn

Wave heights are of inherent interest to oil firms, given that many of their operations take place offshore. Information about the meteorological processes that determine the occurrence of extreme waves influences plans for any future operations, so clarifying the risk to these operations is important for oil firms.

This project will analyse the distribution of extreme wave heights and how this distribution will change under future climate change scenarios. Determining the distribution of future wave heights depends on knowledge of other factors such as wind speed and storm direction. Data from global climate models can provide insight into future large-scale processes; however, this information has to be downscaled to produce the site-specific estimates that oil firms can use. Past information from a specific site, as well as from other sites across the region, can naturally be used to predict the distribution at that site.

Facility layout design under uncertainty 

Yifei Zhao
Supervisor: Stein W Wallace

The facility layout problem (FLP) considers how to arrange the physical locations of facilities (such as machine tools, work centres, manufacturing cells, departments, warehouses, etc.) in a production or delivery system. The layout of facilities is one of the most fundamental and strategic issues in many manufacturing industries: any modification or re-arrangement of an existing layout involves substantial financial investment and planning effort, while an efficient layout can reduce operational cost and contribute to overall production efficiency. One of the most frequently considered criteria for layout design is the minimisation of material handling distance/cost; it is claimed that material handling accounts for 20 to 50 percent of total operating expenses in manufacturing.

Classical FLPs consider only deterministic cases, where the flows between each pair of machines are known with certainty. However, real production environments involve uncertain factors such as changes in technology and market requirements, under which flows between machines are uncertain and can vary from period to period. We are interested in designing a robust layout which adapts to these flow changes. The criterion for the robust layout is to minimise the expected material handling cost over all possible production scenarios.
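The toy Python sketch below illustrates this robust criterion on a tiny made-up instance: four machines, four locations along a line, two flow scenarios, and a brute-force search for the layout that minimises expected material handling cost. Real FLP instances are far too large for enumeration and require the optimisation methods this project studies.

```python
import numpy as np
from itertools import permutations

# Location slots along a line and the distances between them.
positions = np.array([0.0, 10.0, 20.0, 30.0])
dist = np.abs(positions[:, None] - positions[None, :])

# Hypothetical flows (pallets/week) between 4 machines under two demand
# scenarios, with probabilities 0.6 and 0.4.
scenarios = [
    (0.6, np.array([[0, 30, 5, 1], [0, 0, 25, 2], [0, 0, 0, 20], [0, 0, 0, 0]])),
    (0.4, np.array([[0, 5, 30, 2], [0, 0, 1, 25], [0, 0, 0, 3], [0, 0, 0, 0]])),
]

def handling_cost(layout, flow):
    """layout[i] = location slot assigned to machine i."""
    idx = np.array(layout)
    return float(np.sum(flow * dist[np.ix_(idx, idx)]))

def expected_cost(layout):
    return sum(p * handling_cost(layout, flow) for p, flow in scenarios)

# Brute force over all 4! layouts -- fine for a toy instance only.
best = min(permutations(range(4)), key=expected_cost)
print("robust layout:", best, "expected cost:", round(expected_cost(best), 1))
```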

These are just a few of the very real and practical issues our graduates will be well equipped to tackle, giving them the skills and experience to enable their careers to progress rapidly.

Research Funding Opportunities

As a STOR-i PhD student you have access to four key sources of funding:

A Personal Fund to cover attendance at training courses and conferences, and the purchase of books; in addition, you are supplied with a high-specification laptop. You manage your own research fund spending, which typically covers attendance at two international conferences and two to three national meetings or conferences across your studies.

You can bid for additional research support for more substantial activities that your Personal Fund cannot cover. Applications to the Research Fund are competitive and require a full case to be put forward. Successful applicants are responsible for managing the award and reporting its outcomes. The process of applying for and managing grants gives the opportunity to practise and develop key skills acquired on the STOR-i programme.

STOR-i’s Executive Committee is responsible for selecting applications to the Research Fund and gives full feedback to every applicant.

On PhD completion, STOR-i students are able to apply for a 1 year post-doctoral Impact Fellowship. One is typically awarded per cohort. Impact Fellowships are aimed at enhancing STOR-i students’ career development and ensuring the rapid impact of their research. Applications are assessed against PhD performance and a written research proposal describing how the fellowship will be used to further develop research ideas and achieve impact.

The current Impact Fellows are:

  • James Grant whose project is Optimal Partition-Based Search; and
  • Emma Stubington whose project is Supporting the design of radiotherapy treatment plans.

Why the scheme exists

Gwern Owain joined STOR-i in 2013 having completed a Mathematics degree at Cardiff University. He obtained an MRes in Statistics and Operational Research in 2014 and started a PhD in statistical modelling for low-count time series, supervised by Nikos Kourentzes and Peter Neal. Gwern died of leukaemia in October 2015.

In recognition of Gwern’s happy experiences at STOR-i, his family (Robin, Eirian and Erin) has very generously offered STOR-i substantial funding in Gwern’s name. The funding provides bursaries for students to help them understand and address humanitarian and environmental problems they would not otherwise have considered in their PhD, reflecting Gwern’s strong personal interests.

STOR-i students have undertaken activities in memory of Gwern, such as the STOR-i Yorkshire Three Peaks Challenge.

What the scheme funds

The purpose of the bursary is to fund Statistics and Operational Research work by STOR-i students that improves humanitarian or environmental causes: doing good for people or the planet in some form.

Examples of what it could fund include:

  • Attendance at humanitarian/environmental meetings, when existing funds would not naturally cover this.
  • A short period stepping outside the PhD to use statistics and operational research skills for a humanitarian/environmental cause, e.g. carrying out an analysis for a relevant group or self-funding an internship.
  • Funding for group activities for students on a humanitarian theme, e.g. school visits to interest pupils in environmental/ethical Statistics and Operational Research.

Funding and Reporting Process

Bids should be submitted to the Director by 4pm on 1st September each year. However, if the proposed project or student has particular time constraints, bids can be submitted at any time; in such cases it is best to discuss this possibility with the Director in advance.

The bid document (one page maximum) should explain what is intended to be done and how it will benefit the student, and should provide an outline breakdown of how the funding will be used.

Typically, successful proposals will be around £1K in value, though exceptionally we would be willing to consider proposals of up to £2K, provided suitable justification is given.

Decisions on which bids to fund will be made by a panel consisting of two representatives from the STOR-i Management Team and Phil Jonathan (Shell, and a close friend of the Owain family), with input from the Owain family as required.

Successful Awards

2016 Awards:

Jamie-Leigh Chapman: To work with the Vegan Society, to analyse survey data, helping them develop a profile for the characteristics of vegans and understand regional differences in their numbers. The Vegan Society plan to use this information in a drive to increase veganism in the UK. Award £1K.

Emma Stubington: To work with Coeliac UK, to undertake data analysis that will contribute to evidence supporting their campaign to stop clinical commissioning groups restricting, or even removing, gluten-free prescription services. Award £1K.

Elena Zanini: To work with Mercy Corps (a global humanitarian aid agency) to improve understanding of why people in certain parts of the world use violence. With this understanding, aid programmes can more effectively address these factors and thereby reduce violence. Award £1K.