In this project, we addressed the airline revenue maximization problem using a neural network-based approach to approximate dynamic programming. The objective was to determine an optimal booking policy for two connected flights, balancing ticket sales across different pricing tiers to maximize revenue while considering limited seating capacities.
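As a sketch of the underlying formulation (the notation here is assumed for illustration, not the project's exact model): with t booking periods remaining, remaining capacities (c₁, c₂) on the two flights, a request of type j arriving with fare p_j and seat-consumption vector a_j, the exact dynamic program solves a Bellman recursion of the form

```latex
V_t(c_1, c_2) = \mathbb{E}_j\left[ \max\Big\{\; p_j + V_{t-1}\big((c_1, c_2) - a_j\big),\;\; V_{t-1}(c_1, c_2) \;\Big\} \right], \qquad V_0 \equiv 0,
```

where accepting a request is permitted only if $(c_1, c_2) \ge a_j$ componentwise. The state space grows with the product of the capacities and the horizon, which is the curse of dimensionality the neural-network approach below is designed to sidestep.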

Neural Network Solution

To overcome the limitations of traditional dynamic programming methods, such as the curse of dimensionality and infeasibility for real-time updates, we implemented Deep Q-Learning. This reinforcement learning approach trains a neural network to learn optimal booking policies by simulating millions of booking scenarios, allowing it to explore the decision space and learn from its own decisions. Key features of the implementation include:

  • State Representation: Incorporating time, remaining capacities, and request types.
  • Reward Structure: Rewards assigned based on ticket pricing, guiding the network toward profitable decisions.
  • Training: Iterative updates using simulated scenarios, allowing the network to ‘learn by doing.’
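The three ingredients above can be sketched in a minimal NumPy implementation. Everything here is illustrative: the fares, capacities, horizon, network size, and hyperparameters are placeholders, not the project's actual values, and the toy environment (two flights plus a connecting itinerary, with accept/reject decisions) stands in for the real booking simulator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fares (scaled): flight A, flight B, connecting itinerary A+B.
PRICES = np.array([1.0, 1.2, 2.0])
CAP = (5, 5)      # seats remaining on each flight
HORIZON = 20      # booking periods

def step(state, action):
    """Accept (1) or reject (0) the current request; return (reward, next state)."""
    t, c1, c2, req = state
    reward = 0.0
    if action == 1:
        if req == 0 and c1 > 0:
            c1 -= 1; reward = PRICES[0]
        elif req == 1 and c2 > 0:
            c2 -= 1; reward = PRICES[1]
        elif req == 2 and c1 > 0 and c2 > 0:
            c1 -= 1; c2 -= 1; reward = PRICES[2]
    return reward, (t - 1, c1, c2, int(rng.integers(3)))

def encode(state):
    """State representation: time, remaining capacities, one-hot request type."""
    t, c1, c2, req = state
    x = np.zeros(6)
    x[0], x[1], x[2] = t / HORIZON, c1 / CAP[0], c2 / CAP[1]
    x[3 + req] = 1.0
    return x

# One-hidden-layer Q-network: 6 inputs -> 16 tanh units -> 2 action values.
W1 = rng.normal(0, 0.3, (16, 6)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.3, (2, 16)); b2 = np.zeros(2)

def q_values(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h

gamma, lr, eps = 1.0, 0.005, 0.2
for episode in range(1000):
    state = (HORIZON, CAP[0], CAP[1], int(rng.integers(3)))
    while state[0] > 0:
        x = encode(state)
        q, h = q_values(x)
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
        r, nxt = step(state, a)
        # Bootstrapped target: reward now plus best estimated future value.
        target = r if nxt[0] == 0 else r + gamma * q_values(encode(nxt))[0].max()
        # Semi-gradient descent on 0.5 * (Q(s, a) - target)^2.
        d = q[a] - target
        dpre = (d * W2[a]) * (1 - h ** 2)
        W2[a] -= lr * d * h; b2[a] -= lr * d
        W1 -= lr * np.outer(dpre, x); b1 -= lr * dpre
        state = nxt
```

The loop mirrors the bullet points: `encode` builds the state representation, the fare earned on acceptance supplies the reward, and the epsilon-greedy rollouts with bootstrapped targets are the "learn by doing" training updates. The real project would replace the hand-rolled network with a deep-learning framework and far more simulated scenarios.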

Performance and Results

The neural network achieved a significant reduction in variance compared to other approximate methods like standard Q-Learning. While exact dynamic programming produced the highest average revenue, the Deep Q-Learning approach demonstrated a strong balance of computational efficiency and revenue generation, with:

  • A smaller standard deviation in outcomes, indicating greater reliability.
  • An average revenue of £16,927, slightly lower than dynamic programming’s £17,792 but achieved with a more scalable method.

This work highlights the potential of neural networks for solving dynamic, high-dimensional optimisation problems efficiently. It was also the first time I had the opportunity to implement a reinforcement learning agent for a problem of practical significance.
