Residual-Conditioned Policy Iteration for Markov Games and Robust Markov Decision Processes

Wednesday 14 January 2026, 1:00pm to 2:00pm

Venue

MAN - Mngt School Robinson LT16 WPA019

Open to

Postgraduates, Public, Staff

Registration

Registration not required - just turn up

Event Details

Jefferson Huang of the Naval Postgraduate School, California, will present a seminar to the Management Science Department.

Abstract: A Markov game (MG) can be viewed as a Markov decision process (MDP) with multiple players who jointly determine the one-step rewards and transition probabilities. This talk considers two-player zero-sum MGs, which are closely related to robust MDPs. For such MGs, a generalization of policy iteration due to Pollatschek and Avi-Itzhak (1969) often performs well, but may diverge (van der Wal, 1978). Filar and Tolwinski (1991) proposed a modification of this algorithm that is equivalent to applying Newton's method with Armijo's rule to approximate the root of a suitably defined functional. We show via a simple example that this modification does not guarantee convergence either. We then present a provably convergent algorithm, called residual-conditioned policy iteration, that retains the desirable empirical performance of Pollatschek and Avi-Itzhak's original algorithm.
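For readers unfamiliar with the Pollatschek and Avi-Itzhak scheme, the following is a minimal Python sketch of that base algorithm for a discounted tabular two-player zero-sum MG, together with the Shapley-operator residual ||Tv - v|| that a residual-based variant would monitor. The residual-conditioned algorithm presented in the talk is not reproduced here; the setup, array shapes, and function names are illustrative assumptions, and each state's matrix game is solved with SciPy's linear-programming routine.

```python
# Sketch (not the speaker's algorithm) of Pollatschek and Avi-Itzhak's
# policy-iteration scheme for a discounted two-player zero-sum Markov game.
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(M):
    """Value and maximizer's mixed strategy of the zero-sum matrix game M."""
    m, n = M.shape
    # Variables (x_1..x_m, v): maximize v s.t. x^T M >= v, x in the simplex.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # linprog minimizes, so minimize -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v - (x^T M)_j <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                            # sum_i x_i = 1 (v excluded)
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]


def solve_state_game(M):
    """Value and strategies (x, y) for the maximizer/minimizer of game M."""
    val, x = matrix_game_value(M)
    # The minimizer of M is the maximizer of the game -M^T.
    _, y = matrix_game_value(-M.T)
    return val, x, y


def pollatschek_avi_itzhak(r, P, gamma, tol=1e-8, max_iter=100):
    """r[s,a,b]: one-step rewards; P[s,a,b,s']: transitions; gamma in (0,1)."""
    S, A, B = r.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Solve the matrix game at each state given the current values v.
        x, y, Tv = np.zeros((S, A)), np.zeros((S, B)), np.zeros(S)
        for s in range(S):
            Q = r[s] + gamma * P[s] @ v          # shape (A, B)
            Tv[s], x[s], y[s] = solve_state_game(Q)
        # Shapley-operator residual; a convergent variant would decide how
        # to step based on this quantity rather than always taking the
        # Newton-like step below (which van der Wal showed can diverge).
        if np.max(np.abs(Tv - v)) < tol:
            break
        # Newton-like step: evaluate the joint policy pair (x, y) exactly.
        P_xy = np.einsum('sa,sb,sabt->st', x, y, P)
        r_xy = np.einsum('sa,sb,sab->s', x, y, r)
        v = np.linalg.solve(np.eye(S) - gamma * P_xy, r_xy)
    return v, x, y
```

The exact-evaluation step is what makes the scheme Newton-like: each iteration linearizes the fixed-point equation v = Tv at the current policy pair and solves the resulting linear system, rather than merely applying T as value iteration would.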

Speaker

Jefferson Huang

Naval Postgraduate School, California

Contact Details

Name: Gay Bentinck
Email: g.bentinck@lancaster.ac.uk

Directions to MAN - Mngt School Robinson LT16 WPA019

Lancaster University Management School, Lancaster, UK