Research

Stackelberg Equilibria & RL for System and Mechanism Design

Using RL to learn the design of economic mechanisms and other aspects of multi-agent systems has gained increasing attention in recent years. A natural way to formulate this approach is through a Stackelberg equilibrium, an asymmetric equilibrium concept featuring a leader and one or more followers: the leader commits to a strategy first, and the followers best-respond to it. By implementing the mechanism or system designer as the leader agent, we can turn system design into an end-to-end learning problem that is learned alongside agent behavior. In a series of papers, we develop a theory of Stackelberg equilibria in multi-agent RL, as well as applications to mechanism design.
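To make the leader/follower structure concrete, here is a minimal sketch of a pure-strategy Stackelberg equilibrium in a two-player matrix game. It shows only the underlying solution concept, not the RL methods from the papers. It assumes the leader commits to a single row, the follower observes that row and best-responds with a column, and follower ties are broken in the leader's favor. The function name and payoff matrices are illustrative.

```python
from typing import List, Optional, Tuple


def stackelberg_pure(leader_payoff: List[List[float]],
                     follower_payoff: List[List[float]]) -> Tuple[int, int, float]:
    """Pure-strategy Stackelberg equilibrium of a bimatrix game.

    The leader commits to a row; the follower observes it and best-responds
    with a column. Follower ties are broken in the leader's favor.
    Returns (leader_row, follower_col, leader_payoff).
    """
    best: Optional[Tuple[int, int, float]] = None
    for i, (l_row, f_row) in enumerate(zip(leader_payoff, follower_payoff)):
        top = max(f_row)
        # All columns that are best responses to the committed row i.
        responses = [j for j, v in enumerate(f_row) if v == top]
        # Among tied best responses, pick the one the leader likes most.
        j = max(responses, key=lambda c: l_row[c])
        if best is None or l_row[j] > best[2]:
            best = (i, j, l_row[j])
    return best


# Rows are leader actions, columns are follower actions.
leader = [[2, 4],
          [1, 3]]
follower = [[1, 0],
            [0, 1]]
print(stackelberg_pure(leader, follower))  # (1, 1, 3)
```

In this game, row 0 dominates for the leader if both players move simultaneously (yielding 2), yet committing to row 1 yields 3, because the follower's best response changes. That gap is the advantage a leader such as a mechanism designer gains from moving first. Allowing mixed-strategy commitments can raise the leader's payoff further, which this pure-strategy sketch does not cover.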

Papers: [arXiv 2022] [NeurIPS WS Meta-Learning 2022] [arXiv 2022] [ICML WS RL Theory 2021] [AAAI 2021] [arXiv 2021] [NeurIPS WS Econ Policy 2021]

The RL Co-Pilot: Modular RL and Critics as Counterfactual Oracles

We propose a new research direction: modular RL, or within-agent cooperative RL. Here, an agent is composed of multiple sub-agents, each responsible for one aspect of its behavior. Our aim is to learn each of these modules separately, in a way that lets us mix and match different policies for different sub-tasks. We have two specific applications in mind. First, we envision a world where AIs augment human agents rather than replace them, with specific tasks delegated to AI policies while others remain under human control. Second, a modular design might greatly simplify training even in purely AI scenarios. For instance, in complex multi-agent scenarios, we may wish to separate learning the underlying domain skills from learning multi-agent interaction. In ongoing work, we use the critics of actor-critic methods as proxies for counterfactuals. This allows sub-policies to query an agent for its valuation of multiple “what-if” scenarios and to act accordingly.
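A minimal sketch of the critic-as-oracle idea, under assumptions that are ours rather than the ongoing work's: the critic is a state-value function V(s) (here a lookup table standing in for a learned critic), and a sub-policy knows, for each of its options, the hypothetical state that option would lead to. The sub-policy asks the critic to value each what-if state and picks the highest-valued option. The function name, states, and values are all hypothetical.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Critic = Callable[[State], float]  # state-value critic V(s)


def pick_by_counterfactual_value(critic: Critic, what_ifs: Dict[str, State]) -> str:
    """Return the option whose hypothetical resulting state the critic values most.

    `what_ifs` maps each option available to a sub-policy to the state it
    would (hypothetically) lead to; the critic is queried once per scenario.
    """
    values = {option: critic(state) for option, state in what_ifs.items()}
    return max(values, key=values.get)


# Hypothetical toy critic: a lookup table in place of a trained value network.
toy_values = {"near_goal": 5.0, "near_enemy": -2.0, "start": 0.0}


def toy_critic(state: State) -> float:
    return toy_values.get(state, 0.0)


options = {"left": "near_enemy", "right": "near_goal", "wait": "start"}
print(pick_by_counterfactual_value(toy_critic, options))  # right
```

Because the sub-policy only needs a callable critic, the same module could in principle be paired with a different agent's critic (or a human-controlled agent's value estimate) without retraining the module itself.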

Papers: [Coming Soon] Related: [ICLR 2022] [ICLR 2022 Talk] [ICML WS HMCaT 2022]

Multi-Agent RL as Interdependent Decision-Making

Multi-agent RL is generally considered a hard problem. In joint work with Sarah Keren, we show that this is only half the story: other agents can also be an asset. If agents are able to communicate or collaborate, this can help both during training and when adapting to unexpected changes. For instance, we demonstrate that sharing even a small fraction of experiences between agents can lead to drastically faster training and higher converged performance. Furthermore, the approach can be implemented in a “decentralized training with communication” paradigm, unlike most multi-agent RL algorithms, which require a centralized training stage. We view this as a form of “interdependent” or semi-decentralized decision-making, sitting between the usual fully centralized and fully independent points of view.
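Experience sharing between otherwise independent learners can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the papers: each agent keeps its own replay buffer, and each agent sends a uniformly random subset of round(fraction * buffer size) of its own transitions to every other agent. The papers may use a different rule for choosing which experiences to share; the function name and transition format are hypothetical.

```python
import random
from typing import List, Tuple

# (state, action, reward, next_state); contents are left abstract here.
Transition = Tuple[object, object, float, object]


def share_experiences(buffers: List[List[Transition]], fraction: float,
                      rng: random.Random) -> List[List[Transition]]:
    """Give every agent a copy of a small random subset of each peer's transitions.

    Each agent samples round(fraction * len(own buffer)) of its own
    transitions uniformly without replacement and sends them to all other
    agents. Returns new buffers; the inputs are not modified.
    """
    outgoing = [rng.sample(buf, round(fraction * len(buf))) for buf in buffers]
    merged = []
    for i, buf in enumerate(buffers):
        # Collect what every *other* agent sent; an agent never re-receives its own.
        incoming = [t for j, sent in enumerate(outgoing) if j != i for t in sent]
        merged.append(list(buf) + incoming)
    return merged


# Three agents with 10 local transitions each; each transition is tagged with its owner's id.
rng = random.Random(0)
local = [[(agent, step, 0.0, None) for step in range(10)] for agent in range(3)]
merged = share_experiences(local, fraction=0.2, rng=rng)
print([len(b) for b in merged])  # [14, 14, 14]
```

Each agent still trains only on its own buffer; the only coordination is the message passing in `share_experiences`, which is why a scheme of this shape fits a decentralized-training-with-communication setup rather than requiring a centralized learner.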

Papers: [preprint 2022] [NeurIPS WS Deep RL 2022] [arXiv 2021] [IJCAI WS Ad-Hoc 2022] [NeurIPS WS Cooperative AI 2021] [NeurIPS WS Strategic ML 2021]

Atari 2600 Games for Multi-Agent RL

Atari 2600 games have become a standard benchmark domain for single-agent RL. We believe they are also an interesting testbed for multi-agent scenarios, because they demand harder underlying domain skills than most multi-agent RL benchmarks. We further show that small modifications to the emulator's execution let us smoothly adjust the interaction between agents, for instance by artificially limiting a (shared) resource in the game.

Papers: [Coming Soon]

Two-Sided and Composable Mechanism Design

Traditionally, (algorithmic) mechanism design considers an economic mechanism such as an auction in isolation: a seller is assumed to be naturally endowed with an item, and buyers are assumed to hold an innate valuation for it. In reality, of course, items and valuations come from somewhere: items are bought or produced, and then sold on or consumed in further economic activity. We consider two-sided mechanisms, in which the seller must first procure an item before selling it on. This is a first step toward a more general framework of composable mechanism design, in which individual mechanisms can be linked to form a larger economic network. In the course of this work, we also show that reverse (or procurement) auctions behave rather differently from (sales) auctions. As a particularly surprising result, we show that in reverse auctions it can be (cost-)optimal to buy (and then discard!) multiple copies of an item, even when only one is wanted.
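For readers less familiar with reverse auctions, here is a minimal sketch of the simplest one: a single-item procurement auction with a second-price rule. Sellers submit asks, the lowest ask wins, and the winner is paid the second-lowest ask, which makes reporting one's true cost a dominant strategy. It only illustrates how the roles flip relative to a sales auction; it does not reproduce the two-sided setting or the multi-copy result described above. Function and seller names are illustrative.

```python
from typing import Dict, Tuple


def reverse_second_price(asks: Dict[str, float]) -> Tuple[str, float]:
    """Single-item procurement (reverse) auction with a second-price rule.

    The seller with the lowest ask wins and is paid the second-lowest ask.
    Ties are broken by seller name so the outcome is deterministic.
    """
    if len(asks) < 2:
        raise ValueError("need at least two sellers")
    # Sort sellers from cheapest to most expensive ask.
    ranked = sorted(asks.items(), key=lambda kv: (kv[1], kv[0]))
    winner = ranked[0][0]
    payment = ranked[1][1]
    return winner, payment


print(reverse_second_price({"A": 7.0, "B": 5.0, "C": 9.0}))  # ('B', 7.0)
```

Compared with a standard second-price sales auction, the ordering is reversed (lowest ask rather than highest bid wins) and money flows from the auctioneer to the winning bidder.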

Papers: [AAAI] [IPL] [AAMAS] [SAGT]

Matthias Gerstgrasser