ISS Informal Systems Seminar

Recurrent Natural Policy Gradient for POMDPs


Jan 24, 2025   10:00 AM — 11:00 AM

Semih Cayci, RWTH Aachen University, Germany

Hybrid seminar at McGill University or on Zoom.

In this talk, we introduce a natural policy gradient method that leverages recurrent neural networks (RNNs) to address the challenges in reinforcement learning for partially observable Markov decision processes (POMDPs) that stem from non-Markovian dynamics. Our method adopts an actor-critic framework, incorporating RNNs into multi-step temporal difference learning and natural policy gradient updates to enable efficient learning in POMDPs.
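The overall recipe described above — an RNN that encodes the observation history into a hidden state, a softmax policy acting on that state, and a parameter update preconditioned by the Fisher information — can be sketched as follows. This is a minimal illustrative sketch in numpy, not the speaker's implementation: the toy environment, the fixed (untrained) recurrent encoder, the batch size, and the damping constant are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a recurrent natural policy gradient step (assumptions:
# toy returns, fixed RNN encoder weights, one-shot batch; not the authors' code).
rng = np.random.default_rng(0)
obs_dim, hid_dim, n_actions = 3, 4, 2

# Recurrent encoder parameters (held fixed here for simplicity)
W_h = rng.normal(scale=0.3, size=(hid_dim, hid_dim))
W_x = rng.normal(scale=0.3, size=(hid_dim, obs_dim))

# Policy parameters: action logits = theta @ h_t
theta = np.zeros((n_actions, hid_dim))

def encode(obs_seq):
    """Run the RNN over an observation history; return the final hidden state."""
    h = np.zeros(hid_dim)
    for o in obs_seq:
        h = np.tanh(W_h @ h + W_x @ o)
    return h

def policy(h):
    """Softmax policy over actions given the hidden state."""
    logits = theta @ h
    z = np.exp(logits - logits.max())
    return z / z.sum()

def score(h, a):
    """Gradient of log pi(a | h) w.r.t. theta (shape: n_actions x hid_dim)."""
    p = policy(h)
    g = -np.outer(p, h)
    g[a] += h
    return g

# Collect (hidden state, action, return) samples from a toy process:
# action 0 is rewarded exactly when the first hidden unit is positive.
batch = []
for _ in range(256):
    h = encode(rng.normal(size=(5, obs_dim)))
    a = rng.choice(n_actions, p=policy(h))
    G = 1.0 if (a == 0) == (h[0] > 0) else 0.0
    batch.append((h, a, G))

# Vanilla policy gradient g and empirical Fisher matrix F (flattened params)
g = np.zeros(theta.size)
F = np.zeros((theta.size, theta.size))
for h, a, G in batch:
    s = score(h, a).ravel()
    g += G * s
    F += np.outer(s, s)
g /= len(batch)
F = F / len(batch) + 1e-3 * np.eye(theta.size)  # damping keeps F invertible

# Natural gradient step: theta <- theta + lr * F^{-1} g
nat_g = np.linalg.solve(F, g)
theta += 1.0 * nat_g.reshape(theta.shape)
```

The natural gradient preconditions the vanilla gradient by the inverse Fisher matrix, making the update invariant to how the policy is parameterized; in practice (and in the kernel-regime analysis discussed in the talk) the critic would supply the returns `G` via multi-step temporal difference estimates rather than the toy rewards used here.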

We present a rigorous theoretical analysis in the kernel regime, providing finite-time and finite-width guarantees for both the critic and the policy optimization. Our results include explicit bounds on the required network widths and sample complexity, highlighting the potential of RNNs to address challenges in reinforcement learning with partial observability. Additionally, we discuss the limitations of this approach when dealing with long-term dependencies, outlining critical challenges and open problems. This talk will provide insights into the interplay between memory, network architecture, and learning efficiency in POMDPs.


Bio: Semih Cayci is a tenure-track Assistant Professor in the Department of Mathematics at RWTH Aachen University, Germany. Previously, he was an NSF TRIPODS Postdoctoral Fellow at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. His research focuses on the theoretical and algorithmic foundations of reinforcement learning, deep learning theory, and optimization.

Peter E. Caines organizer
Aditya Mahajan organizer
Shuang Gao organizer
Borna Sayedana organizer
Alex Dunyak organizer

Location

Room MC 437
CIM
McConnell Building
McGill University
3480, rue University
Montréal QC H3A 0E9
Canada