PhD

Technical Description

This project applies reinforcement learning (RL) algorithms, such as Q-learning, to the dynamic pricing of perishable goods over a finite planning horizon with limited supply. We develop algorithms that keep the RL agent safe during its exploration phase: they prevent the agent from taking unsafe or fatal actions without compromising its convergence to the optimal policy.
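
As a rough illustration of the idea, the sketch below runs tabular Q-learning on a toy pricing problem and restricts exploration to a set of safe prices. The description above does not say how safety is enforced, so the mechanism here (masking out any price below a floor) is an assumption made for illustration. The price grid, the floor, the demand model, and the names `safe_actions`, `demand`, and `train` are all hypothetical.

```python
import random

PRICES = [4.0, 6.0, 8.0, 10.0]  # hypothetical price grid
PRICE_FLOOR = 6.0               # assumed safety rule: never price below this
HORIZON = 5                     # number of selling periods
STOCK = 3                       # units available at the start


def safe_actions(t, stock):
    """Indices of prices allowed in state (t, stock) under the assumed floor."""
    return [a for a, p in enumerate(PRICES) if p >= PRICE_FLOOR]


def demand(price, rng):
    """Toy demand: at most one sale per period, less likely at higher prices."""
    return 1 if rng.random() < max(0.0, 1.0 - price / 12.0) else 0


def train(episodes=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning that only ever samples safe actions."""
    rng = random.Random(seed)
    q = {}           # maps (period, stock, action) to estimated revenue-to-go
    visited = set()  # every action index actually tried, kept for inspection
    for _ in range(episodes):
        stock = STOCK
        for t in range(HORIZON):
            if stock == 0:
                break
            allowed = safe_actions(t, stock)
            if rng.random() < epsilon:
                a = rng.choice(allowed)  # explore, but only among safe prices
            else:
                a = max(allowed, key=lambda i: q.get((t, stock, i), 0.0))
            visited.add(a)
            sold = min(stock, demand(PRICES[a], rng))
            reward = PRICES[a] * sold
            next_stock = stock - sold
            # Unsold stock is worthless after the final period (perishable).
            if t + 1 < HORIZON and next_stock > 0:
                future = max(
                    q.get((t + 1, next_stock, i), 0.0)
                    for i in safe_actions(t + 1, next_stock)
                )
            else:
                future = 0.0
            old = q.get((t, stock, a), 0.0)
            q[(t, stock, a)] = old + alpha * (reward + future - old)
            stock = next_stock
    return q, visited


if __name__ == "__main__":
    q, visited = train()
    print(sorted(PRICES[a] for a in visited))
```

Because the mask is applied both when choosing an action and when computing the bootstrap target, the agent in this sketch learns the best policy within the safe set. Whether that coincides with the unconstrained optimum depends on how the safe set is defined, which is the question the project addresses.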