ISS Informal Systems Seminar

Periodic agent-state based Q-learning for POMDPs

Jul 4, 2024   10:00 AM — 11:00 AM

Amit Sinha, McGill University, Canada

**Hybrid seminar at McGill University or Zoom.**

The traditional approach to POMDPs is to convert them into fully observed MDPs by treating the belief state as an information state. However, a belief-state based approach requires perfect knowledge of the system dynamics and is therefore not applicable in the learning setting, where the system model is unknown. Various approaches to circumvent this limitation have been proposed in the literature. A unified treatment of these approaches involves considering the "agent state", which is a model-free, recursively updatable function of the observation history. Examples of an agent state include frame stacking and recurrent neural networks. Since the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms such as Q-learning learn a deterministic stationary policy. Since the agent state is not an information state, the results that hold for MDPs do not carry over, so we must first consider how the different policy classes compare: stationary versus non-stationary and deterministic versus stochastic. Our main thesis, which we illustrate via examples, is that because the agent state is not an information state, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based Q-learning), a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.
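
As a rough illustration of the idea of learning a periodic policy over an agent state, the Python sketch below keeps one tabular Q-function per phase `t mod period` and bootstraps each update from the table of the next phase. It is not the speaker's implementation. The toy environment `AliasedFlipEnv`, the helper names, the uniform random behavior policy, and all hyperparameters are assumptions made for this example; the agent state is simply the latest observation.

```python
import random


class AliasedFlipEnv:
    """Hypothetical toy POMDP: a hidden bit flips every step and the
    observation is always 0, so the agent cannot see the hidden bit."""

    def reset(self):
        self.hidden = 0
        return 0  # the only observation

    def step(self, action):
        reward = 1.0 if action == self.hidden else 0.0
        self.hidden = 1 - self.hidden  # deterministic flip
        return 0, reward


def update_agent_state(z, action, obs):
    # Model-free recursive update; here the agent state is the last observation.
    return obs


def train_periodic_q(env, period, n_agent_states=1, n_actions=2,
                     n_steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Q-learning with one table per phase; period=1 is the stationary case."""
    rng = random.Random(seed)
    q = [[[0.0] * n_actions for _ in range(n_agent_states)]
         for _ in range(period)]
    z = env.reset()
    for t in range(n_steps):
        phase, next_phase = t % period, (t + 1) % period
        a = rng.randrange(n_actions)  # uniform behavior policy (assumption)
        obs, r = env.step(a)
        z_next = update_agent_state(z, a, obs)
        # Bootstrap from the Q-table of the NEXT phase.
        target = r + gamma * max(q[next_phase][z_next])
        q[phase][z][a] += alpha * (target - q[phase][z][a])
        z = z_next
    return q


def average_reward(env, q, n_steps=1000):
    """Run the greedy periodic policy derived from q and average the reward."""
    period = len(q)
    z = env.reset()
    total = 0.0
    for t in range(n_steps):
        row = q[t % period][z]
        a = row.index(max(row))  # greedy action for the current phase
        obs, r = env.step(a)
        total += r
        z = update_agent_state(z, a, obs)
    return total / n_steps


env = AliasedFlipEnv()
periodic = average_reward(env, train_periodic_q(env, period=2))
stationary = average_reward(env, train_periodic_q(env, period=1))
print(periodic, stationary)
```

In this toy problem the agent state carries no information about the hidden bit, so any stationary deterministic policy is rewarded on only half of the steps, whereas a period-2 policy can alternate actions in step with the hidden bit.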

Peter E. Caines organizer
Aditya Mahajan organizer
Shuang Gao organizer
Borna Sayedana organizer
Alex Dunyak organizer

Location

Room MC 437
CIM
McConnell Building
McGill University
3480, rue University
Montréal QC H3A 0E9
Canada

Associated organization

Centre for Intelligent Machines (CIM)

Research Axis

Research application