STOR-i PhD Projects

STOR-i projects have been developed with our industrial partners and use real-life issues to ensure our graduates are equipped to make a significant impact in the commercial world.

To see current and previous PhD projects choose a cohort below:

PhD Projects: 2018

Here you can find details of current PhD research projects from students joining STOR-i in 2017. You can also access further technical descriptions by clicking on the links to the student's personal webpage below each project description.


Optimal Scheduling for the Decommissioning of Nuclear Sites

Matthew Bold

Supervisors: Christopher Kirkbride, Burak Boyaci and Marc Goerigk

With production having come to an end at the Sellafield nuclear site in West Cumbria, focus is now turning to the decommissioning of the site, and safe clean-up of legacy nuclear waste. This is a project that is expected to take in excess of 100 years to complete and cost over £90 billion. Given the large scale and complexity of the decommissioning project, it is crucial that each task is systematically choreographed according to a carefully designed schedule. This schedule must ensure that the site is decommissioned in a way that satisfies multiple targets with respect to decommissioning speed, risk reduction and cost, whilst accounting for the inherent uncertainty regarding the durations of many of the decommissioning tasks. My research aims to develop optimisation methods to help construct such a schedule.

In partnership with Sellafield.

Click here to see a technical description.


Online Changepoint Methods for Improving Care of the Elderly

Jess Gillam

Supervisors: Rebecca Killick

The NHS is under great pressure from an ageing population. Due to great advancements in modern medicine and other factors, the NHS and other social care services must provide the necessary care to a growing population of elderly people. This PhD project is partnered with Howz. Howz is based on research that implies changes in daily routine can indicate potential health risks. Howz use data from sensors placed around the house and other low-cost sources such as smart meter data to detect these changes. Alerts are then sent to the household or immediate care facilitators, where permission has been granted, to check on their safety and well-being. To the NHS, early intervention such as this is likely to result in fewer ambulance call outs for elderly patients and fewer elderly requiring long hospital stays.

The objective of this PhD is to provide novel ways of automatically detecting changes in human behaviour using passive sensors. The first focus of the PhD will be on sensor-specific activity and on changes in behaviour as an individual evolves over time.

In partnership with Howz.

Click here to see a technical description.


On Topics Around Multivariate Changepoint Detection

Thomas Grundy

Supervisors: Rebecca Killick

Royal Mail deliver between forty and fifty million letters and parcels daily. In order for this process to run smoothly and efficiently, the data science team at Royal Mail are using innovative techniques from statistics and operational research to improve certain application areas within the company.

My research will aim to create and develop time-series analysis techniques to help tackle some of the open application areas within Royal Mail. Time-series data are collected over time, and a key analysis is to identify time-points where the structure of the data changes; such a time-point is called a changepoint. Current changepoint detection methods for multivariate time-series (time-series with multiple components) are either highly inefficient (they take too long to produce an answer) or highly inaccurate (they do not correctly identify the changepoints) when the number of time-points and variables grows large. Hence, my research will aim to produce a multivariate changepoint detection method that remains computationally efficient as the number of time-points and dimensions grows large, while still accurately detecting changepoints. This method will be extremely useful within many of the open application areas within Royal Mail.
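As a rough illustration of what a changepoint is, the sketch below finds a single change in mean in a univariate series by trying every possible split and keeping the one with the smallest within-segment squared error. It is only a toy example and not the method developed in this project: the function names are hypothetical, the data are simulated, and the brute-force search is quadratic in the number of time-points, which is exactly the kind of cost that becomes impractical for large multivariate data.

```python
import random


def segment_cost(values):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_mean_changepoint(x):
    """Return the split index k that best divides x into x[:k] and x[k:].

    The best split minimises the total within-segment squared error,
    so it marks the most plausible location of one change in mean.
    """
    x = list(x)
    if len(x) < 2:
        raise ValueError("need at least two observations")
    return min(
        range(1, len(x)),
        key=lambda k: segment_cost(x[:k]) + segment_cost(x[k:]),
    )


if __name__ == "__main__":
    # Simulated series (an assumption for illustration): the mean shifts
    # from 0 to 3 after the first 100 observations.
    rng = random.Random(0)
    series = [rng.gauss(0.0, 1.0) for _ in range(100)]
    series += [rng.gauss(3.0, 1.0) for _ in range(100)]
    print("estimated changepoint:", single_mean_changepoint(series))
```

On the simulated series the estimate should fall close to index 100, where the mean actually shifts.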

In partnership with Royal Mail.

Click here to see a technical description.


Rare Disease Trials: Beyond the Randomised Controlled Trial

Holly Jackson

Supervisors: Thomas Jaki

Before a new medical treatment can be given to the public, it must first go through a number of clinical trials to test its safety and efficacy. Most clinical trials at present use a randomised controlled design, such that a fixed proportion (usually 50%) of patients are allocated to the new treatment and the other patients are given the control treatment. This design allows the detection of the best treatment with high probability so that all future patients will benefit. However, it does not take into account the well-being of the patients within the trial.

Response-adaptive designs allow the allocation probability of patients to change depending on the results of previous patients. Hence, more patients are assigned to the treatment that is considered better, in order to increase the well-being of patients within the trial. Multi-armed bandits are a form of response-adaptive design which aims to maximise each patient's chance of benefiting from the treatment. They balance ‘learning’ (trying each treatment to decide which is best) and ‘earning’ (allocating the patients to the current best treatment to produce more patient successes).

Response-adaptive designs are not often used in practice, due to their low power. This low power means it can be difficult to find a meaningful difference between the treatments within a trial. Hence, more research is needed to extend response-adaptive methods so that they both maximise patient successes and produce high enough power to find a meaningful difference between the treatments.
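The balance between ‘learning’ and ‘earning’ can be illustrated with a small simulation. The sketch below uses Thompson sampling, one common bandit rule, chosen here purely for illustration and not necessarily the design studied in this project. The function name, the uniform Beta(1, 1) prior and the true success probabilities are all assumptions made for the example.

```python
import random


def thompson_trial(true_success, n_patients, seed=0):
    """Simulate a response-adaptive trial using Thompson sampling.

    true_success[a] is the (in practice unknown) success probability of
    treatment a. Returns per-treatment counts of successes and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_success)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_patients):
        # Draw a plausible success rate for each treatment from its
        # Beta posterior (uniform Beta(1, 1) prior plus observed outcomes).
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Allocate the next patient to the treatment with the highest draw.
        arm = max(range(n_arms), key=lambda a: draws[a])
        if rng.random() < true_success[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = thompson_trial([0.3, 0.7], n_patients=500)
    for arm in range(2):
        print(f"treatment {arm}: {s[arm] + f[arm]} patients, {s[arm]} successes")
```

Because allocation drifts towards the better treatment, few patients end up on the worse one. That is good for patients within the trial, but it leaves little data for comparing the treatments, which is one way to see where the low power comes from.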

In partnership with Quanticate.

Click here to see a technical description.


Detecting Changes Within Networked Data Streams

Mirjam Kirchner

Supervisors: Idris Eckley

Recent advances in network infrastructure and parallel data storage have now reached the point where it is possible to continuously collect and stream data from almost any operational system one might be interested in monitoring. 

To make sense of this abundance of data, further processing and analysis is essential. In many applications, such as fraud detection or the identification of malfunctions in a system, we are interested in capturing deviations from the regularly observed behaviour, as these might indicate events that need to be further investigated. Such structural changes at a certain time point in the data generating process are termed changepoints.

If the monitored system is very large, then information is usually gathered at a plurality of different locations resulting in a high dimensional, multivariate data set, sometimes referred to as panel data. An example of such a system is the BT telecommunications network - a data cable network that provides over 27.6 million homes and business premises in the United Kingdom with fixed-line, mobile and broadband services. Because of the interconnectedness within the system, emerging events may depend on one another, and it has been observed that changes travel through the network over time.

During my PhD, I aim to develop efficient methods for the detection of changes in network panel data. I intend to improve the reliability of my methods by integrating additional knowledge about the underlying network structure into the detection algorithm.

This project is conducted in partnership with BT, as part of the NG-CDI programme, an EPSRC Prosperity Partnership.

Click here to see a technical description.


Statistical Learning for GPS trajectories

Michael O'Malley

Supervisors: Adam Sykulski and David Leslie

Evaluating risk is extremely important across many industries. For example, in the motor insurance industry, a fair premium price is set by fitting statistical models to predict an individual's risk to the insurer. These predictions are based on demographic information and prior driving history. However, this information does not account for how an individual drives. By accurately assessing this factor, insurers could price premiums better: good drivers would receive discounts and bad drivers penalties.

Recently insurers have started to record driving data via an onboard diagnostic device known as a black box. These devices give information such as speed and acceleration. In this project, we aim to gain an understanding of how this information can be used to better understand driving ability. This will involve developing statistical models that can predict risk more accurately than traditional methods.

Click here to see a technical description.


Scalable Monte Carlo in the General Big Data Setting

Srshti Putcha

Supervisors: Christopher Nemeth and Paul Fearnhead

Technological advances in the past several decades have ushered in the era of “big data”. Typical data-intensive applications include genomics, telecommunications, high-frequency financial markets and brain imaging. There has been a growing demand from industry for competitive and efficient techniques to make sense of the information collected.

We now have access to so much data that many existing statistical methods are not very effective in terms of computation. In recent years, the machine learning and statistics communities have been seeking to develop methods which can scale easily in relation to the size of the data.

Much of the existing methodology assumes that the data are independent, meaning that individual observations do not influence each other. My research will seek to address a separate challenge, which has often been overlooked: extending “big data” methods to dependent data sources, such as time series and networks.

In partnership with University of Washington, Seattle.

Click here to see a technical description.


Data-Driven Alerts in Airline Revenue Management

Nicola Rennie

Supervisors: Catherine Cleophas and Florian Dost

Airlines monitor and control passenger demand by adjusting the number of seats available to passengers in different fare classes, with the objective being to increase revenue. Forecasts of passenger demand are made, based on booking data from previous flights. Passenger booking behaviour that deviates from the expected demand, such as for flights approaching carnivals or major sporting events, needs to be brought to the attention of an analyst. Due to the large networks of flights and the complexity of the forecasts, it is often difficult for analysts to correctly adjust seat availability on flights.

In partnership with Deutsche Lufthansa, my PhD aims to develop methods that highlight such deviations from expected behaviour and potentially make a recommendation to an analyst about what action should be taken. By employing such methods, the project will lead to the development of a prototypical alert system that is able to predict, with a degree of confidence, likely targets for analyst interventions.

In partnership with Deutsche Lufthansa.

Click here to see a technical description.


Aggregation and Downscaling of Spatial Extremes

Jordan Richards

Supervisors: Jonathan Tawn and Jenny Wadsworth

Historical records show a consistent rise in global temperatures and intense rainfall events over the last 70 years. Climate change is an indisputable fact of life, and its effect on the frequency and magnitude of extreme weather events is evident from recent events. The Met Office develops global climate models, which detail changes and developments in global weather patterns caused by climate change. However, very little research has been conducted into establishing a relationship between extreme weather behaviour globally and locally, either within smaller regions or at specific locations.

My PhD aims to develop statistical downscaling methods that construct a link between global and local extreme weather. We hope that these methods can be used by the Met Office to improve meteorological forecasting of future, localised, extreme weather events. This improvement should help to avoid the large-scale costs associated with preventable damage to infrastructure caused by extreme weather such as droughts or flooding.

In partnership with the Met Office.

Click here to see a technical description.


Ranking Systems in Sports – and Beyond

Harry Spearing

Supervisors: Jonathan Tawn

The goal of this project is to investigate the statistical properties of world ranking systems in competitive sports, and to evaluate the effectiveness of such systems in assessing the relative strengths of different players or teams. Ranking systems in sports offer a fertile ground for statistical research, and there is a wealth of interesting research questions – for instance:

  • What characteristics of a ranking system may lead a statistician to assess it as being ‘good’ or ‘bad’? What might be the real-world / practical justification for ‘bad’ characteristics being present?
  • To what extent are existing ranking systems ‘fit-for-purpose’ to determine tournament eligibility / seeding, or team selection? (Through such mechanisms, the ranking system itself may exert an influence on the fortunes of individual competitors, and thus, the ultimate result of the competition.)
  • Ranking systems must carefully balance long-term performance with recent performance: they must adapt quickly to new information, without over-reacting to the random fluctuations caused by individual results. Do the rankings in different sports tend to ‘evolve’ at different speeds – and if so, why? Is there any evidence that rankings for team sports evolve more quickly than for individual sports?
  • What level of predictive power is provided by the rankings, in terms of forecasting the result of a match or a tournament? How does this predictive power vary across different sports?
  • In many ranking systems, different tournaments are assigned different weightings or point values to reflect their perceived level of importance. What statistical evidence is there that these weightings are set appropriately?
  • Can ranking systems be exploited by teams or individuals to gain a competitive advantage? (In golf, for instance, might a player be able to gain an edge by playing in carefully-chosen tournaments?)
  • Conversely, can ranking systems be designed to promote / reward specific behaviours?
  • In which situations might it be valuable to assign a multi-dimensional ranking – representing, for instance, a tennis player’s ability on different surfaces, or a golfer’s ability on different kinds of course? How quickly might we be able to identify players or teams who perform strongly or weakly under particular circumstances?
  • Are there new applications of ranking systems to areas of sport where they aren’t currently being used?

In partnership with ATASS.

Click here to see a technical description.


Evaluation of the Intelligence Gathering and Analysis Process

Livia Stark

Supervisors: Kevin Glazebrook and Peter Jacko

Intelligence is defined as the product resulting from the collection, processing, integration, evaluation, analysis and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements or areas of actual or potential operations. It is crucially important in national security and anti-terror settings.

The rapid technological advancement of the past few decades has enabled a significant growth in the information collection capabilities of intelligence agencies. Said information is collected from many different sources, such as satellites, social networks, human informants, etc. to the extent that processing and analytical resources may be insufficient to evaluate all the gathered data. Consequently, the focus of the intelligence community has shifted from collection to efficient processing and analysis of the gathered information.

We aim to devise effective approaches to guide analysts in identifying information with the potential to become intelligence, based on the source of the information, whose characteristics need to be learnt. The novelty of our approach is to consider not only the probability of an information source providing useful intelligence, but also the time it takes to evaluate a piece of information. We aim to modify existing index-based methods to incorporate this additional characteristic.

In partnership with the Naval Postgraduate School in Monterey, California.

Click here to see a technical description.


Recommending Mallows

Anja Stein

Supervisors: David Leslie and Arnoldo Frigessi

Recommender systems have become prevalent in present-day technological developments. They are machine learning algorithms which make recommendations by selecting, for each individual, a specific range of items in which that individual is most likely to be interested. For example, on an e-commerce website, having a search tool or filter is simply not enough to ensure a good user experience. Users want to receive recommendations for things which they may not have considered or known existed. The challenge recommender systems face is to sort through a large database and select a small subset of items which are considered to be the most attractive to each user, depending on the context.

In a recommendation setting, we might assume that an individual has specified a ranking of the items available to them. For a group of individuals, we may also assume that a distribution exists over the rankings. The Mallows model can summarise the ranking information in the form of a consensus ranking and a scale parameter value to indicate the variability in rankings within the group of individuals.
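To make the roles of the consensus ranking and the scale parameter concrete, the sketch below computes Mallows probabilities for a tiny set of items by enumerating every possible ranking. It assumes the Kendall tau distance (the number of item pairs ordered differently), which is one common choice for the Mallows model. The function names and the symbol `theta` for the scale parameter are labels introduced here for illustration, and full enumeration is only feasible for a handful of items.

```python
import math
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def mallows_probabilities(consensus, theta):
    """Probability of every ranking under a Mallows model.

    Each ranking r has weight exp(-theta * d(r, consensus)); the weights
    are normalised by brute-force enumeration (small item sets only).
    """
    weights = {
        perm: math.exp(-theta * kendall_tau_distance(perm, consensus))
        for perm in permutations(consensus)
    }
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}


if __name__ == "__main__":
    probs = mallows_probabilities(("A", "B", "C"), theta=1.0)
    for ranking, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(ranking, round(p, 3))
```

Under this parameterisation a larger `theta` concentrates probability on rankings close to the consensus, while `theta = 0` makes every ranking equally likely.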

We aim to incorporate the Mallows model into a recommender system scenario, where there are thousands of items and individuals. Since the set of items that an individual may be asked to rank is too large, we usually receive data in the form of partial rankings or pairwise comparisons. Therefore, we need to use methods to predict a user's ranking from their preference information. However, many users will be interacting with a recommender system regularly in real time. Here, the system would have to learn about the unknown environment in which it is operating whilst simultaneously choosing alternative items with potentially unknown feedback from users. Hence, the open problem we are most concerned with is how to use the Mallows model to make better recommendations to users in future.

In partnership with Oslo University, Norway.

Click here to see a technical description.


Predicting Recruitment to Phase III Clinical Trials 

Szymon Urbas

Supervisors: Christopher Sherlock

In order for a new treatment to be made available to the general public, it must be proven to have a beneficial effect on a disease with tolerable side effects. This is done through clinical trials, a series of rigorous experiments examining the performance of a treatment in humans. It is a complicated process which often takes several years and costs millions of pounds. The most costly part is Phase III, which is composed of randomised controlled studies with large samples of patients. The large samples are required to establish the statistical significance of the beneficial effect, and the required sample sizes are estimated using data from Phases I and II of the trials.

The project concerns itself with the design of new methodologies for predicting the length of time to recruit the required number of patients for Phase III trials. It aims to use available patient recruitment data across multiple hospitals and clinics including early data from the current trial. The current methods rely on unrealistic assumptions and very often underestimate the time to completion, giving a false sense of confidence in the security of the trial process. Providing accurate predictions can help researchers measure the performance of the recruitment process and aid them when making decisions on adjustments to their operations such as opening more recruitment centres.

In partnership with AstraZeneca.

Click here to see a technical description.


Interactive Machine Learning for Improved Customer Experience

Alan Wise

Supervisors: Steffen Grunewalder

Machine learning is a field inspired by human and animal learning; its objective is to create automated systems that learn from their past in order to solve complicated problems. These methods often appear as algorithms which are set in stone: for example, an algorithm trained on images of animals to tell a cat from a dog. This project instead concentrates on statistical and probabilistic problems which involve an interaction between the learner and some environment. For instance, the learner might be an online store which wishes to learn customer preferences by recommending adverts and receiving feedback through whether or not a customer clicks on them. Multi-armed bandit methods are often used here. These methods are designed to pick the best option out of a set of options through some learner-environment interaction. However, standard multi-armed bandit methods often rely on unrealistic assumptions, so a major objective of this project is to design alterations to these methods for use in real-world applications.
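The advert example can be sketched with UCB1, a standard textbook bandit rule used here only as an illustration and not as the project's own method. The function name and the click probabilities are hypothetical. The simulation also makes the simplifying assumption that click rates stay fixed over time, one of the idealisations that may not hold in practice.

```python
import math
import random


def ucb1_adverts(click_rates, n_rounds, seed=0):
    """Simulate choosing adverts with the UCB1 rule.

    click_rates[a] is the (unknown to the learner) probability that a
    customer clicks advert a. Returns how often each advert was shown
    and how many clicks it received.
    """
    rng = random.Random(seed)
    n_ads = len(click_rates)
    shows = [0] * n_ads
    clicks = [0] * n_ads
    for t in range(1, n_rounds + 1):
        if t <= n_ads:
            # Show every advert once so each has an estimate.
            ad = t - 1
        else:
            # Pick the advert with the highest optimistic estimate:
            # observed click rate plus an exploration bonus that
            # shrinks as the advert is shown more often.
            ad = max(
                range(n_ads),
                key=lambda a: clicks[a] / shows[a]
                + math.sqrt(2 * math.log(t) / shows[a]),
            )
        shows[ad] += 1
        if rng.random() < click_rates[ad]:
            clicks[ad] += 1
    return shows, clicks


if __name__ == "__main__":
    shows, clicks = ucb1_adverts([0.1, 0.5, 0.9], n_rounds=2000)
    print("times shown:", shows)
    print("clicks:", clicks)
```

Over time the learner shows the most-clicked advert far more often than the others, while still occasionally revisiting the alternatives.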

In partnership with Amazon.

Click here to see a technical description.