Decision Awareness in Reinforcement Learning
Pierre-Luc Bacon – Université de Montréal, Canada
Decision awareness is the learning principle according to which the components of a learning system ought to be optimized directly to satisfy the global performance criterion: to produce optimal decisions. This end-to-end perspective has recently led to significant advances in model-based reinforcement learning by addressing the problem of compounding errors plaguing alternative approaches. In this talk, I will present some of our recent work on this topic: 1. on learning control-oriented transition models by implicit differentiation and 2. on learning neural ordinary differential equations end-to-end for nonlinear trajectory optimization. Along the way, we will also discuss some of the computational challenges associated with those methods and our attempts at scaling up performance, specifically: using an efficient factorization of the Jacobians in the forward mode of automatic differentiation through novel constrained optimizers inspired by adversarial learning.
Location
Pavillon André-Aisenstadt
Campus de l'Université de Montréal
2920, chemin de la Tour
Montréal Québec H3T 1J4
Canada