Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players’ beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

We present Cicero, an AI agent that achieved human-level performance in the strategy game Diplomacy. In Diplomacy, seven players conduct private natural language negotiations to coordinate their actions, both cooperating and competing with one another. In contrast, prior major successes for multi-agent AI have come in purely adversarial environments, such as chess (2), Go (3), and poker (4), where communication has no value. Because it demands both open-ended negotiation and strategic play, Diplomacy has served as a challenging benchmark for multi-agent learning (5–8).

Cicero couples a controllable dialogue module with a strategic reasoning engine. At each point in the game, Cicero models how the other players are likely to act based on the game state and their conversations. It then plans how the players can coordinate to their mutual benefit and maps these plans into natural language messages.
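The turn structure described above can be sketched as a simple loop: infer other players' likely intents from dialogue, choose a plan given those intents, and generate messages conditioned on the plan. The sketch below is purely illustrative; all names, data structures, and decision rules are hypothetical stand-ins for Cicero's learned dialogue-conditional action model, planning/RL engine, and controllable dialogue model, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GameState:
    power: str                                     # the power our agent controls
    dialogue: dict = field(default_factory=dict)   # other power -> list of messages

def infer_intents(state: GameState) -> dict:
    """Guess each other power's intended move from its latest message.

    Toy rule: a message mentioning 'hold' maps to HOLD, anything else
    to MOVE. A real system would use a learned model conditioned on
    both the game state and the full conversation."""
    intents = {}
    for power, msgs in state.dialogue.items():
        last = msgs[-1].lower() if msgs else ""
        intents[power] = "HOLD" if "hold" in last else "MOVE"
    return intents

def plan_action(state: GameState, intents: dict) -> str:
    """Choose our action given the predicted intents (stand-in for the
    planning and reinforcement learning component)."""
    if intents and all(a == "HOLD" for a in intents.values()):
        return "SUPPORT_ALLY"
    return "DEFEND"

def generate_messages(state: GameState, plan: str) -> dict:
    """Map the chosen plan into outgoing messages (stand-in for the
    controllable dialogue model, which conditions generation on the plan)."""
    return {p: f"{state.power} intends to {plan}" for p in state.dialogue}

def take_turn(state: GameState) -> tuple:
    intents = infer_intents(state)
    plan = plan_action(state, intents)
    return plan, generate_messages(state, plan)

state = GameState(
    power="FRANCE",
    dialogue={"ENGLAND": ["I will hold in the Channel"],
              "GERMANY": ["Let's both hold this year"]},
)
plan, msgs = take_turn(state)
print(plan)             # SUPPORT_ALLY
print(msgs["ENGLAND"])  # FRANCE intends to SUPPORT_ALLY
```

The key design point this mirrors is that dialogue generation is downstream of planning: messages are produced in pursuit of an already-chosen plan rather than the other way around.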
