Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning
Marc Lanctot – Google DeepMind, Canada
In this talk, I will cover a recent AAMAS paper that was joint work with Zun Li from the University of Michigan and several others. Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute an approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents (N=346). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare comparable to that of humans trading among themselves.
Bio: Marc Lanctot is a research scientist at DeepMind. His research interests include multiagent reinforcement learning, computational game theory, multiagent systems, and game-tree search. In the past few years, Marc has investigated game-theoretic approaches to multiagent reinforcement learning with applications to fully and partially observable zero-sum games, sequential social dilemmas, and negotiation/communication games. Marc received a Ph.D. degree in artificial intelligence from the Department of Computer Science, University of Alberta, in 2013. Before joining DeepMind, Marc completed a Postdoctoral Research Fellowship at the Department of Knowledge Engineering, Maastricht University, in Maastricht, The Netherlands, on Monte Carlo tree search methods in games.
Location
Pavillon André-Aisenstadt
Campus de l'Université de Montréal
2920, chemin de la Tour
Montréal Québec H3T 1J4
Canada