Novel methods for the detection of emergent phenomena in data streams
Student Edward Austin
Supervisors Idris Eckley and Lawrence Bardwell
Industrial partner BT
Every day, more than 90% of households and over 95% of businesses rely on the BT network for internet access. This is not only for personal use, such as streaming content or browsing social media, but also for commercial use, such as performing transactions. Given the number of users and the importance of digital networks in our everyday lives, any faults on the network must be detected and rectified as soon as possible.
To facilitate this, the volume of data passing through the network is monitored at several locations. This PhD aims to develop new statistical approaches that are capable of detecting when the volume of data being observed differs substantially from that expected. These differences can take a variety of forms and may emerge gradually over time. The project aims not only to detect the onset of these phenomena but also to perform the detection in real time.
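As a rough illustration of the streaming idea (not the project's method), the sketch below runs a one-sided CUSUM monitor over a simulated traffic stream, with made-up baseline figures, and flags the first point at which the observed volume drifts above what is expected.

```python
# A minimal sketch (illustrative only): a one-sided CUSUM monitor for
# detecting an upward shift in the mean of a stream of traffic volumes.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical traffic volumes: baseline mean 100, shifting to 115 after time 300.
stream = np.concatenate([rng.normal(100, 10, 300), rng.normal(115, 10, 200)])

mu0, sigma = 100.0, 10.0   # assumed in-control mean and standard deviation
k, h = 0.5, 5.0            # reference value and decision threshold (in sigma units)

s = 0.0
for t, x in enumerate(stream):
    z = (x - mu0) / sigma
    s = max(0.0, s + z - k)          # CUSUM recursion
    if s > h:
        print(f"Change flagged at observation {t}")
        break
```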
Multivariate extremes for nuclear regulation
Student Callum Barltrop
Supervisor Jenny Wadsworth
Industrial partner Office for Nuclear Regulation
The Office for Nuclear Regulation (ONR) is responsible for regulating the nuclear safety and security of GB nuclear-licensed sites. ONR’s Safety Assessment Principles (SAPs) expect nuclear installations to be designed to withstand natural hazards with a return frequency of one in 10,000 years, conservatively defined with adequate margins to failure and avoiding ‘cliff-edge’ effects. This involves extrapolating beyond the observed range of data, and a statistical framework is used to model and estimate such events.
For my PhD project, I am working with the ONR to investigate methods for applying multivariate extreme value theory. In particular, I am looking at techniques for estimating ‘hazard curves’ (graphs of frequency and magnitude) for combinations of natural hazards that could inform the design bases for nuclear installations. I am also considering new methods for incorporating factors such as climate change and seasonal variability into the analysis of environmental data.
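For a flavour of what a univariate hazard curve involves, the sketch below uses entirely synthetic data: a generalised Pareto distribution is fitted to threshold exceedances and return levels, up to the one-in-10,000-year event, are read off. The data, threshold and modelling choices here are illustrative assumptions, not ONR practice.

```python
# A minimal sketch of a univariate 'hazard curve': fit a generalised Pareto
# distribution (GPD) to threshold exceedances and read off return levels.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(7)
data = rng.gumbel(loc=20, scale=5, size=50 * 365)   # ~50 years of synthetic daily hazard values
u = np.quantile(data, 0.95)                         # threshold
excess = data[data > u] - u

shape, _, scale = genpareto.fit(excess, floc=0)     # fit GPD to the exceedances
rate = excess.size / data.size                      # probability a day exceeds u

# Return level exceeded on average once every T years (365 observations/year).
for T in (100, 1_000, 10_000):
    m = T * 365
    level = u + (scale / shape) * ((m * rate) ** shape - 1)
    print(f"{T:>6}-year return level: {level:.1f}")
```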
Automated resource planning through reinforcement learning
Student Ben Black
Supervisors Chris Kirkbride, Vikram Dokka and Nikos Kourentzes
Industrial partner BT
BT is the UK’s largest telecommunications company, employing over 20,000 engineers who work in the field. These engineers carry out jobs relating to television, internet and phone services, and different jobs require different skills to complete. Planning is therefore essential for BT: it must have enough person-hours available to complete all of the jobs it has taken on, and enough engineers with the required skills to do them. Planning the workforce entails, for example, deciding how many hours of supply BT should make available for each type of job and assigning the engineers’ hours to their different skills.
My project is concerned with these two aspects of planning. Due to the size of the problem at hand, it is naturally best to solve it automatically. The main approach we will use to help automate BT’s planning process is reinforcement learning (RL). RL is a set of methodologies based on how humans and animals learn through reinforcement. For example, dogs learn to sit on command by being given a treat when they sit, which acts as positive reinforcement. This is the general idea we aim to use: good planning actions will be rewarded, and bad ones will be penalised. Over time, this allows us to learn which planning actions should be taken in which demand scenarios. This approach is not common in workforce planning, so the research will deliver a novel, automatic and fast planning method capable of producing optimal plans for even the largest of workforces.
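As a hypothetical illustration of the reward-and-penalty idea, the toy sketch below applies tabular Q-learning to a two-state demand scenario in which the planner chooses a supply level each week; the states, actions and penalties are invented for the example.

```python
# A toy sketch of tabular Q-learning for a planning problem: each week demand
# is 'low' or 'high', the planner chooses a supply level, and unmet demand
# and idle hours are penalised.
import numpy as np

rng = np.random.default_rng(0)
states = [0, 1]          # 0 = low demand, 1 = high demand
actions = [0, 1, 2]      # supply levels (in units of, say, 100 hours)
demand = {0: 1, 1: 2}    # hours actually needed in each state

Q = np.zeros((len(states), len(actions)))
alpha, gamma, eps = 0.1, 0.9, 0.1

s = rng.choice(states)
for step in range(20_000):
    a = rng.choice(actions) if rng.random() < eps else int(Q[s].argmax())
    shortfall = max(demand[s] - actions[a], 0)
    idle = max(actions[a] - demand[s], 0)
    r = -(5 * shortfall + 1 * idle)          # heavier penalty for unmet demand
    s_next = rng.choice(states)              # demand regime for the next week
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Learned supply level per demand state:", Q.argmax(axis=1))
```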
Learning to group research profiles through online academic services
Student George Bolt
Supervisors Simon Lunagomez and Chris Nemeth
Industrial partner Elsevier
Elsevier is a company which specialises in the provision of online content and information to researchers. Through a large portfolio of products, such as the reference manager Mendeley or the searchable literature database ScienceDirect, they aim to help academics with every aspect of the research life cycle.
As a joint venture between STOR-i and Elsevier, this PhD project looks to develop and apply tools from network analysis to make sense of their often high-dimensional but structured datasets. Of particular interest are data from their various platforms, which lend themselves to a natural network-based representation. Successful analysis of these data would allow Elsevier to better understand how its platforms are being used, thus guiding their future development and the improvement of the user experience. The end goal is the development of methodologies which are not only applicable and useful for the problems at hand but also novel within the wider network analysis literature.
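As a purely illustrative sketch of this network-based representation, the snippet below stores hypothetical usage records as a bipartite user-product graph and projects it onto users, so that users who touch the same products become connected; the names and records are made up.

```python
# Illustrative only: a bipartite user-product network built from hypothetical
# usage records, then projected onto the users.
import networkx as nx
from networkx.algorithms import bipartite

usage = [("user_a", "Mendeley"), ("user_a", "ScienceDirect"),
         ("user_b", "ScienceDirect"), ("user_c", "Mendeley"),
         ("user_c", "ScienceDirect")]

G = nx.Graph()
G.add_nodes_from({u for u, _ in usage}, bipartite=0)   # users
G.add_nodes_from({p for _, p in usage}, bipartite=1)   # products
G.add_edges_from(usage)

users = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
user_graph = bipartite.projected_graph(G, users)       # users linked via shared products
print(sorted(user_graph.edges()))
```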
Route optimisation for waste collection
Student Thu Dang
Supervisors Burak Boyaci and Adam Letchford
Industrial partner Webaspx
Many countries now have kerbside collection schemes for waste materials, including recyclable ones (such as paper, card, glass, metal and plastic) and non-recyclable ones (such as food and garden waste). Optimising the routes taken by the collection vehicles can have dramatic benefits in terms of cost, reliability and CO2 emissions. Although there is a huge academic literature on vehicle routing, many councils still use relatively simple heuristic methods to plan their routes. This PhD project is concerned with the development of improved algorithms for this task.
The routing problems that emerge in the context of waste collection have several key characteristics. First, they are often large-scale, with thousands of roads or road segments needing treatment. Second, the area under consideration usually has to be partitioned into regions or districts, with each region being served on a different day. Third, the frequency of service may depend on the material being collected and on the season (e.g., garden waste might be collected more often in summer than in winter). Fourth, the vehicles have limited capacity, in terms of both weight and volume. As a result, they periodically need to travel to specialised facilities (such as recycling plants or landfill sites) to unload, before continuing the rest of their route. Fifth, there is a limit on the total time spent travelling by each driver. Finally, one must consider the issues of fairness between drivers.
Due to the complexity of these problems, it is unlikely that they can be solved to proven optimality in a reasonable amount of time. Thus, in this PhD, we will develop fast heuristics that can compute good feasible solutions, along with lower-bounding techniques, which will enable us to assess the quality of the heuristic solutions.
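The toy sketch below gives the flavour of this pairing of a fast construction heuristic with a lower bound, on a drastically simplified (node-based, uncapacitated) version of the problem; the coordinates are invented and the real waste-collection setting is far richer.

```python
# Toy sketch only: a nearest-neighbour construction heuristic for a simplified
# routing problem, plus a crude lower bound to judge the heuristic's quality.
import math

depot = (0.0, 0.0)
stops = [(2, 1), (3, 4), (-1, 2), (-2, -3), (4, -2)]   # hypothetical collection points

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Greedy construction: always visit the nearest unvisited stop next.
current, remaining, total = depot, set(range(len(stops))), 0.0
while remaining:
    nxt = min(remaining, key=lambda i: dist(current, stops[i]))
    total += dist(current, stops[nxt])
    current = stops[nxt]
    remaining.remove(nxt)
total += dist(current, depot)          # return to the depot

# Simple lower bound: every point must be entered and left, so half the sum of
# each point's two cheapest incident edges is a valid bound on any tour length.
points = [depot] + stops
bound = 0.0
for p in points:
    d = sorted(dist(p, q) for q in points if q != p)
    bound += (d[0] + d[1]) / 2

print(f"heuristic length {total:.2f}, lower bound {bound:.2f}")
```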
Methods for streaming fraud detection in online payments
Student Chloe Fearn
Supervisors David Leslie and Robin Mitra
Industrial partner Featurespace
Whilst credit cards and online purchases are very convenient, the presence of fraudulent transactions is problematic. Fraud is both distressing for the customer and expensive for banks to investigate and refund, so where possible, transactions made without the cardholder’s permission should be blocked. Featurespace has designed a modelling approach which blocks fraudulent transactions as often as possible while rarely blocking transactions that were genuinely attempted by the customer.
Due to the ever-evolving behaviours that fraudsters adopt to avoid getting caught, the classifier that decides whether or not transactions are fraudulent needs to be updated frequently. We call this model retraining, and the process requires up-to-date labelled data. However, when transactions are blocked, the truth of whether they were fraudulent or not is unknown. As a result, these transactions are difficult to use for model retraining, so they must be used, if at all, with caution. My project is concerned with how best to use the information we have. We aim first to look at how to accept transactions in a way that provides the classifier with the most information, and second, to think about using the transactions that were blocked for model training, by carefully predicting whether they were fraudulent or genuine.
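The hypothetical sketch below illustrates the selective-labelling issue, not Featurespace's system: blocked transactions never receive a true label, so an epsilon-greedy rule occasionally accepts a risky-looking transaction purely to gather labels before the classifier is retrained.

```python
# Hypothetical sketch of selective labelling: only accepted transactions are
# ever labelled, so a small fraction of risky ones is accepted for learning.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5_000
X = rng.normal(size=(n, 2))                         # toy transaction features
p_fraud = 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.0)))  # hidden true fraud mechanism
y = rng.random(n) < p_fraud

model = LogisticRegression().fit(X[:500], y[:500])  # initial model on historical data

eps = 0.05
labelled_X, labelled_y = [], []
for i in range(500, n):
    score = model.predict_proba(X[i : i + 1])[0, 1]
    block = score > 0.5 and rng.random() > eps      # occasionally accept anyway
    if not block:                                   # accepted => outcome observed
        labelled_X.append(X[i])
        labelled_y.append(y[i])

# Retrain only on transactions whose outcome we actually observed.
model = LogisticRegression().fit(np.array(labelled_X), np.array(labelled_y))
print(f"retrained on {len(labelled_y)} labelled transactions")
```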
Modelling wave interactions over space and time
Student Jake Grainger
Supervisors Adam Sykulski and Phil Jonathan
Industrial partner JBA Trust
The world’s oceans play an important role in many aspects of modern life, from transportation to energy generation. Ocean waves are one of the main challenges faced by vessels and structures operating at sea, and they also drive the coastal flooding and erosion experienced onshore. In certain conditions, these waves can cause severe damage, endangering structures, vessels, communities and lives.
The resulting scientific challenge is to understand the conditions that can cause instances of catastrophic damage. To do this, it is common to describe the conditions in a given area of the ocean. It is then possible to understand what kind of impact we would expect on a structure or vessel operating in these conditions, or on coastal communities when these waves propagate onshore.
To do this, we use data taken from a measuring device, such as a buoy, situated in the area of interest. We then try to estimate the conditions that could have given rise to these observations. Usually, scientists and engineers do this by developing general models for ocean wave behaviour, which they then fit to the observed data. In most cases, these models have to account for multiple wave systems. The wave systems behave differently if they are generated locally (wind sea waves) than if they have travelled from elsewhere in the ocean (swell waves). An added complexity is that these wave systems interact with one another in ways that are very difficult to predict, presenting an extra challenge to those interested in modelling wave behaviour.
Throughout this project, we aim to utilise state-of-the-art techniques from time series analysis to improve how practitioners estimate model parameters and to model how conditions change over time. More advanced techniques can also be employed to explore the non-linear interactions between swell and wind sea systems, which play an important role in determining the conditions that are experienced in practice.
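As a simple illustration (not the project's model), the sketch below generates a synthetic surface-elevation record containing a low-frequency 'swell' component and a higher-frequency 'wind sea' component, then recovers the two spectral peaks from the periodogram; all frequencies and amplitudes are invented.

```python
# Illustrative sketch: recovering the spectral peaks of two wave systems
# (swell and wind sea) from a synthetic buoy record via the periodogram.
import numpy as np
from scipy.signal import periodogram, find_peaks

fs = 1.0                                   # 1 Hz sampling (hypothetical buoy)
t = np.arange(0, 3600, 1 / fs)             # one hour of data
rng = np.random.default_rng(11)
swell = 1.5 * np.sin(2 * np.pi * 0.07 * t + rng.uniform(0, 2 * np.pi))
wind_sea = 1.0 * np.sin(2 * np.pi * 0.20 * t + rng.uniform(0, 2 * np.pi))
eta = swell + wind_sea + rng.normal(0, 0.3, t.size)   # surface elevation + noise

f, S = periodogram(eta, fs=fs)
peaks, _ = find_peaks(S, height=S.max() * 0.2)
print("estimated peak frequencies (Hz):", np.round(f[peaks], 3))
```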
Simulation analytics for deeper comparisons
Student Graham Laidler
Supervisors Lucy Morgan and Nicos Pavlidis
Industrial partner Northwestern University (Evanston, USA)
Businesses and industries across every sector are reliant on complex operations involving the movement of commodities such as products, customers or resources. Many manufacturing processes, for instance, move a constant flow of products through a production sequence. To allow for informed and cost-effective decision-making, managers need to understand how their system is likely to perform under different conditions. However, the interactions of uncertain variables such as service times and waiting times lead to complex system behaviour, which can be difficult to predict.
Building a computer model of such a system is an important step towards understanding its behaviour. Stochastic simulation provides a probabilistic modelling approach through which the performance of these systems can be estimated numerically. With a combination of machine learning and data analytic techniques, this project aims to develop a methodology for simulation output analysis which can uncover deeper insights into simulated systems.
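For a concrete, if minimal, picture of stochastic simulation, the sketch below simulates a single-server queue with exponential arrival and service times and replicates it to estimate the mean customer waiting time; the parameters are illustrative.

```python
# Minimal sketch: a single-server queue simulated over several replications
# to estimate the mean customer waiting time.
import numpy as np

def replicate(n_customers, lam=0.8, mu=1.0, seed=0):
    rng = np.random.default_rng(seed)
    inter = rng.exponential(1 / lam, n_customers)   # times between arrivals
    service = rng.exponential(1 / mu, n_customers)  # service durations
    arrival = np.cumsum(inter)
    start = np.empty(n_customers)
    finish = np.empty(n_customers)
    for i in range(n_customers):
        start[i] = arrival[i] if i == 0 else max(arrival[i], finish[i - 1])
        finish[i] = start[i] + service[i]
    return np.mean(start - arrival)                 # average wait in the queue

waits = [replicate(10_000, seed=s) for s in range(20)]
half_width = 1.96 * np.std(waits, ddof=1) / np.sqrt(len(waits))
print(f"estimated mean wait: {np.mean(waits):.2f} +/- {half_width:.2f}")
```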
Information fusion for non-homogeneous panel and time-series data
Student Luke Mosley
Supervisors Idris Eckley and Alex Gibberd
Industrial partner Office for National Statistics
The Office for National Statistics (ONS) has the responsibility of collecting, analysing and disseminating statistics about the UK economy, society and population. Official statistics have traditionally relied on sample surveys and questionnaires; however, in this rapidly evolving economy, response rates to these surveys are falling. Moreover, there is a concern that full use is not being made of new data sources and the continuously expanding volume of information that is now available. Today, information is being gathered in countless ways, from satellite and sensor data to social network and transactional data. Hence, the ONS is exploring how administrative and alternative data sources might be used within their statistics. In other words, how might they remodel the 20th-century, survey-centric approach into a 21st-century combination of structured survey data with administrative and unstructured alternative digital data sources?
My PhD project aims to assist the ONS with this transformation by developing novel methods for combining insight from alternative information, recorded at different periodicities and with different reliabilities, with traditional surveys, to meet the ever-increasing demand for improved and more detailed statistics.
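As a deliberately simple, hypothetical illustration of fusion, the sketch below combines a survey-based estimate with a noisier alternative-data estimate of the same quantity by inverse-variance weighting; the figures are made up.

```python
# Illustrative only: combining a survey estimate with a noisier alternative-data
# estimate of the same quantity by inverse-variance weighting.
survey_est, survey_var = 2.1, 0.30 ** 2        # e.g. quarterly growth, % (survey)
altdata_est, altdata_var = 2.6, 0.50 ** 2      # same quantity from an alternative source

w_survey = (1 / survey_var) / (1 / survey_var + 1 / altdata_var)
fused = w_survey * survey_est + (1 - w_survey) * altdata_est
fused_var = 1 / (1 / survey_var + 1 / altdata_var)

print(f"fused estimate: {fused:.2f} (sd {fused_var ** 0.5:.2f})")
```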
Input uncertainty quantification for large-scale simulation models
Student Drupad Parmar
Supervisors Lucy Morgan, Richard Williams and Andrew Titman
Industrial partner Naval Postgraduate School
Stochastic simulation is a well-known tool for modelling and analysing real-world systems with inherent randomness such as airports, hospitals, and manufacturing lines. It enables the behaviour of the system to be better understood and performance measures such as resource usage, queue lengths or waiting times to be estimated, thus facilitating direct comparisons between different decisions or policies.
The ‘stochastic’ in stochastic simulation comes from the input models that drive the simulation. These input models are often estimated from observations of the real-world system and thus contain error. Currently, few practitioners consider this source of error, known as input uncertainty, when using simulation as a decision-support tool. Consequently, decisions based on simulation results are at risk of being made with misleading levels of confidence, which can have significant implications. Although existing methods allow input uncertainty to be quantified, and hence this risk to be mitigated, these methods do not work well for simulation models that are large and complex. This project aims to develop methodology for quantifying input uncertainty in large-scale simulation models so that crucial and expensive decisions can be made with better risk assessments.
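One common way of exposing input uncertainty, sketched below with invented data and a toy queue, is to bootstrap the real-world observations, refit the input model each time and re-run the simulation, so that the spread of the outputs reflects input-model error as well as simulation noise.

```python
# Sketch of bootstrap-based input uncertainty quantification on a toy queue:
# resample the observed arrival data, refit the input model, re-run the simulation.
import numpy as np

rng = np.random.default_rng(42)
observed_inter = rng.exponential(1 / 0.8, 200)      # 'real-world' inter-arrival data

def sim_mean_wait(arrival_rate, n=5_000, seed=0):
    r = np.random.default_rng(seed)
    inter = r.exponential(1 / arrival_rate, n)
    service = r.exponential(1.0, n)
    wait, total = 0.0, 0.0
    for i in range(1, n):
        wait = max(0.0, wait + service[i - 1] - inter[i])   # Lindley recursion
        total += wait
    return total / n

estimates = []
for b in range(100):
    resample = rng.choice(observed_inter, observed_inter.size, replace=True)
    rate_hat = 1 / resample.mean()                  # refitted input model
    estimates.append(sim_mean_wait(rate_hat, seed=b))

print(f"mean wait {np.mean(estimates):.2f}, spread (partly input uncertainty): sd {np.std(estimates):.2f}")
```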
Statistical analysis of large-scale hypergraph data
Student Amiee Rice
Supervisors Chris Nemeth and Simon Lunagomez
Industrial partner The University of Washington (Seattle, USA)
Connections between individuals happen countless times every day in a plethora of ways, from messages sent on social media to co-authorship of papers. Graphs provide a way of representing these relationships, with individuals represented by points (or nodes) and the connections between them represented by lines (or edges). This graph structure has been well studied in statistics, but when a connection involves more than two individuals (say, an email chain with three or more people in it), a graph might not capture the whole story. An alternative construction that enables us to represent connections involving two or more individuals is known as a hypergraph. Hypergraphs can capture a single connection between three or more individuals, making statistical analysis of these kinds of connections more feasible.
As technology advances, collecting and storing data becomes increasingly easy. This abundance of data makes analysing large-scale groups of connections between individuals problematic. The PhD will focus on exploring the ways that hypergraphs can be used to represent connections, as well as developing scalable methods that can handle many individuals.
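As a small illustration of the basic data structure (with invented names), a hypergraph can be stored as a list of hyperedges of any size, or equivalently as a node-by-hyperedge incidence matrix:

```python
# Illustrative sketch: hyperedges are sets of any size, and the hypergraph can
# be represented as a node-by-hyperedge incidence matrix.
import numpy as np

nodes = ["alice", "bob", "carol", "dave"]
hyperedges = [
    {"alice", "bob"},                   # a two-person message
    {"alice", "bob", "carol"},          # an email chain with three people
    {"bob", "carol", "dave"},           # co-authorship of a paper
]

incidence = np.array([[int(n in e) for e in hyperedges] for n in nodes])
print(incidence)
node_degrees = incidence.sum(axis=1)    # number of hyperedges each individual appears in
print(dict(zip(nodes, node_degrees)))
```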
Multivariate oceanographic extremes in time and space
Student Stan Tendijck
Supervisors Emma Eastoe and Jonathan Tawn
Industrial partner Shell
In the design of offshore facilities such as oil platforms or vessels, it is very important, for both safety and reliability reasons, that structures old and new can survive the most extreme storms.
Hence, this project focuses on modelling the ocean during the most extreme storms. In particular, we are interested in the aspects of the ocean that are related to structural reliability. Wave height is widely considered to be the most important; however, other environmental variables, such as wind speed, can also play a significant role. Together with Shell, we intend to develop models that can be used to capture (1) the dependence between environmental variables, such as wind speed and wave height, to characterise the ocean environment, (2) the dependence of these variables over time, since it should be taken into account that large waves occur throughout a storm, and (3) the dependence of all these characteristics of the ocean at different locations. These models can then be used, for example, to assess whether or not old oil rigs are strong and safe enough.
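A first diagnostic for dependence between such variables, sketched below on simulated data, is the empirical tail dependence coefficient chi(q) = P(Y > y_q | X > x_q), here computed between invented wave-height and wind-speed series at increasingly extreme quantiles.

```python
# Illustrative sketch: empirical tail dependence between simulated wave height
# and wind speed, evaluated at increasingly extreme quantiles q.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)                                       # shared storm driver
wave_height = np.exp(0.8 * z + 0.6 * rng.normal(size=n))
wind_speed = np.exp(0.8 * z + 0.6 * rng.normal(size=n))

for q in (0.9, 0.99, 0.999):
    xq, yq = np.quantile(wave_height, q), np.quantile(wind_speed, q)
    joint = np.mean((wave_height > xq) & (wind_speed > yq))   # both extreme at once
    chi = joint / (1 - q)
    print(f"q={q}: empirical chi = {chi:.2f}")
```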
Moreover, a key part of the research will be to develop novel methods for modelling mixture structures in extremes. This is directly applicable to the above, since waves can be classified into two types: wind waves and swell waves. Even though the two types of wave have different characteristics, in most scenarios it is impossible to classify a wave with complete certainty. Hence, it is of practical importance to develop models that can deal with these types of dependence structure.