My primary research interest for my PhD is the broad area of Markov Decision Processes, or MDPs, which can be used to model complex sequential decision-making problems under uncertainty. When modelling problems in this way, our goal is to find a “policy”, which is a function that maps any given “state” of a system to an optimal “action”, and we want this policy to minimise the long-run average cost (or maximise the long-run average reward) incurred by operating this system. For our application, we use this framework to model the problem of actively maintaining systems with redundant components. Specifically, we ask the question: for any given combination of components being healthy, damaged, or currently being repaired, which combination of damaged components do we want to start repairing? For an individual component which has just failed, we ask whether or not we want to repair this component straight away, given full knowledge of the system. Traditionally, dynamic programming algorithms such as Value Iteration and Policy Iteration were used to solve MDPs. However, with recent and ongoing advancements in Linear Programming (LP) solvers such as Gurobi or CPLEX, we have found that the LP formulation is more desirable and can be solved orders of magnitude faster than bespoke dynamic programming algorithms.
The novelty of our research is that we are not only considering the dynamic optimisation of some pre-existing system, but we are additionally optimising the initial design of the system itself. This is known as an MDP design problem, and it is very rarely seen in the literature due to its complexity. MDPs are known to suffer from the “curse of dimensionality”, where the number of potential states in the MDP grows exponentially as you consider larger problem instances. In our case, we have components that can each take on one of three different conditions – healthy, repairing, or damaged. Therefore, if we have N components, we would have 3^N total states. However, despite this, MDPs such as ours can still be solved exactly for moderate problems. The MDP Design problem suffers from the curse of dimensionality even more greatly, as it must model a state space not just for one MDP, but for all possible initial designs which could be constructed from the decision variables. To further complicate matters, we consider a bi-objective model where we wish to strike a balance between system reliability and operational costs such as using and maintaining components. This means that there is not just one solution, but a set of solutions which strike different balances between the two different objectives.
Due to the vast computational complexity of the MDP Design problem when trying to solve it exactly, the goal of our research is to design bespoke heuristics which approximately solve the problem in far more reasonable timeframes. These heuristics will first be rolled out on small problem instances where we can solve the problems exactly, for the sake of comparison between the heuristics and the exact result. We are currently doing this for a relatively simple model, which despite its simplicity cannot be solved exactly as the size of the problem increases. Our future work will extend the methodologies we develop now to work for more complicated problems, and we hope that this will demonstrate to the wider OR community that solutions to MDP Design problems can be found efficiently and in a way which is scalable to larger problem instances.
