Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning


August 21, 2023, 11:00 a.m. to 12:00 p.m.

Marc Lanctot – Google DeepMind, Canada


In this talk, I will cover a recent AAMAS paper that was joint work with Zun Li from the University of Michigan and several others. Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute approximate Nash equilibria, and its performance in two negotiation games: Colored Trails and Deal or No Deal. We conduct behavioral studies in which human participants negotiate with our agents (N=346). We find that search with generative modeling finds stronger policies at both training and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare comparable to that of humans trading among themselves.
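The abstract mentions meta-strategy solvers based on the Nash bargaining solution (NBS). As a reminder, the NBS picks the feasible payoff vector that maximizes the product of each player's gain over a disagreement point d, that is, argmax of (u1 - d1)(u2 - d2) subject to u1 >= d1 and u2 >= d2. The Python sketch below applies that criterion to a two-player meta-game (rows and columns are the policies in each player's population) by brute-force grid search over independent mixed strategies. The function names, the grid search, the disagreement point, and the example payoffs are all hypothetical illustrations, not the solvers from the paper, which may optimize this objective quite differently.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def simplex_grid(n: int, steps: int) -> Iterator[List[float]]:
    """Yield every mixed strategy over n actions whose probabilities are multiples of 1/steps."""

    def compositions(remaining: int, parts: int) -> Iterator[Tuple[int, ...]]:
        # All ways to split `remaining` units into `parts` non-negative integer counts.
        if parts == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for rest in compositions(remaining - k, parts - 1):
                yield (k,) + rest

    for counts in compositions(steps, n):
        yield [k / steps for k in counts]


def expected_payoff(payoff: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff of a matrix when the row and column players mix independently."""
    return sum(x[i] * y[j] * payoff[i][j] for i in range(len(x)) for j in range(len(y)))


def nash_bargaining_meta_strategy(
    A: Matrix,
    B: Matrix,
    disagreement: Tuple[float, float] = (0.0, 0.0),
    steps: int = 20,
) -> Optional[Tuple[List[float], List[float], float, float]]:
    """Hypothetical solver: grid-search independent mixed strategies (x, y) that maximize
    the Nash product (u1 - d1) * (u2 - d2), keeping only profiles where u_i >= d_i.
    Returns (x, y, u1, u2), or None if no profile meets the disagreement point."""
    d1, d2 = disagreement
    best = None
    best_value = float("-inf")
    for x in simplex_grid(len(A), steps):
        for y in simplex_grid(len(A[0]), steps):
            u1 = expected_payoff(A, x, y)
            u2 = expected_payoff(B, x, y)
            if u1 < d1 or u2 < d2:  # individual rationality constraint
                continue
            value = (u1 - d1) * (u2 - d2)
            if value > best_value:  # strict '>' keeps the first maximizer found
                best_value = value
                best = (x, y, u1, u2)
    return best


# Hypothetical 2x2 meta-game: index 0 = "cooperative" policy, index 1 = "greedy" policy.
A = [[3.0, 0.0], [5.0, 1.0]]  # row player's payoffs
B = [[3.0, 5.0], [0.0, 1.0]]  # column player's payoffs
x, y, u1, u2 = nash_bargaining_meta_strategy(A, B, disagreement=(1.0, 1.0))
print(x, y, u1, u2)
```

With mutual greed (1, 1) as the disagreement point, both players cooperating yields payoffs (3, 3) and a Nash product of (3 - 1)(3 - 1) = 4, which is the maximum here, so the sketch returns the pure cooperative profile. Grid search scales poorly with population size; it is used only because it is easy to verify by hand.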

Bio: Marc Lanctot is a research scientist at DeepMind. His research interests include multiagent reinforcement learning, computational game theory, multiagent systems, and game-tree search. In the past few years, Marc has investigated game-theoretic approaches to multiagent reinforcement learning, with applications to fully and partially observable zero-sum games, sequential social dilemmas, and negotiation/communication games. Marc received a Ph.D. in artificial intelligence from the Department of Computer Science, University of Alberta, in 2013. Before joining DeepMind, Marc completed a postdoctoral research fellowship on Monte Carlo tree search methods in games at the Department of Knowledge Engineering, Maastricht University, in Maastricht, the Netherlands.

Federico Bobbio, organizer

Defeng Liu, organizer

Location

Hybrid event at GERAD

Zoom and room 4488
Pavillon André-Aisenstadt
Université de Montréal campus
2920, chemin de la Tour
Montréal Québec H3T 1J4
Canada


Associated organization

Canada Excellence Research Chair in Data Science for Real-Time Decision-Making