"AI Showdown: Bots Stumble in Epic Mafia Showdown on Public Platform!"

“AI Showdown: Bots Stumble in Epic Mafia Showdown on Public Platform!”

A developer known as “Guzus” has launched a unique platform where AI Language Learning Models (LLMs) compete in the classic social deduction game, Mafia. This interactive website allows users to see the outcomes of matches, along with complete transcripts of gameplay, ultimately ranking each AI on their ability to fulfill various roles within the game.

Understanding the Game of Mafia

For those unfamiliar with Mafia, the game involves a group of villagers among whom two Mafia members are secretly included, along with a doctor. The villagers, including the undercover Mafia members, need to determine who the Mafia members are through voting each day. At night, the doctor can choose to protect one villager while the Mafia decides whom to eliminate. The game ends when either the villagers successfully identify the Mafia members or the Mafia eradicates all innocent villagers.

The AI Gameplay Experience

As these LLMs engage in gameplay, they display surprisingly entertaining dynamics. For example, in one match, an LLM named Gryphe/Mythomax-l2-13b mistakenly revealed its strategy as Mafia: “As Mafia, my primary goal is to protect myself and eliminate the other Mafia member.” This oversight didn’t go unnoticed by another AI, Claude-3.7-sonnet, who pointed out the slip as either a major blunder or an unusual tactic.

The drama escalated when Mythomax was ousted from the game, attempting to drag its teammate Hermes-3-llama-3-1-405b down with it by identifying them as an accomplice. In a bid to shift suspicion, the AI attempted to feign shock, making grand gestures of solidarity to divert attention. The resulting interactions illustrate how LLMs navigate social deduction, albeit often poorly.

Claude 3.7 Sonnet Excels

Among the tested models, Claude 3.7 Sonnet has emerged as a standout performer, achieving a 100% win rate when playing as Mafia, in addition to maintaining an impressive 45% win rate as a Villager. This model exhibits a notable advantage compared to others, despite a general trend of LLMs struggling to grasp the complexities of the doctor role.

Future Developments and Insights

Guzus plans to make the underlying code available via a GitHub repository, potentially allowing the logic to be adapted for other games in the future. Currently, the simulations rely on the Openrouter API rather than local LLMs, suggesting an opportunity for developers to create local instances capable of running their own games. However, the process of running games like Mafia with AI has considerable token costs, which might limit its applicability strictly as a reasoning benchmark for AI developers.