AI bots now play Mafia with each other on a public website, and almost all of them are terrible at it

A pegboard illustrating deduction of some kind.
(Image credit: Shutterstock)

A developer named "Guzus" has created a website where a selection of AI large language models (LLMs) can play the classic social deduction game Mafia with one another.

You can see who won each match and read a complete transcript of every game played. The results feed into a full ranking of the LLMs, showing which model performs best in each Mafia role.
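As a rough illustration of how a per-role ranking like this could be derived from match results, here is a minimal Python sketch. The record format, the model names, and the function names `win_rates` and `rank_by_role` are all assumptions made for this example; the site's actual code has not been published, and the sample data is invented rather than taken from the leaderboard.

```python
from collections import defaultdict

# Hypothetical match records: (model, role, won). These values are invented
# for illustration and are not results from the site.
matches = [
    ("model-a", "Mafia", True),
    ("model-a", "Villager", False),
    ("model-a", "Villager", True),
    ("model-b", "Mafia", False),
    ("model-b", "Mafia", True),
    ("model-b", "Doctor", False),
]

def win_rates(records):
    """Return {(model, role): win rate} from (model, role, won) records."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for model, role, won in records:
        games[(model, role)] += 1
        wins[(model, role)] += int(won)
    return {key: wins[key] / games[key] for key in games}

def rank_by_role(records, role):
    """Sort models by win rate in one role, best first."""
    rates = win_rates(records)
    table = [(model, rate) for (model, r), rate in rates.items() if r == role]
    return sorted(table, key=lambda item: item[1], reverse=True)

mafia_ranking = rank_by_role(matches, "Mafia")
print(mafia_ranking)
```

Keeping win rates separate per role matters here because a model can be strong as Mafia and weak as a Villager, which a single overall win rate would hide.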

Claude 3.7 Sonnet bucks the trend

But out of every LLM listed, there's one clear winner in the tests so far: Claude 3.7 Sonnet. Anthropic's latest thinking model boasts a 100% win rate as a Mafia member, in addition to having the highest Villager win rate, at 45%.

Something about Anthropic's model is giving it a distinct advantage over the others tested, even if none of the models quite understand how to play the role of the doctor.

Guzus says the GitHub repository for the game will soon be opened to all, so that the basic logic might also be applied to other kinds of games.

He also shares that the simulations were not run using local LLMs, relying instead on the OpenRouter API. Once the repository is public, though, the project could possibly be forked to work on local LLM clusters, if you have the hardware to run a game with several language models concurrently.

There's likely a significant token cost to running a game like Mafia with AI models, meaning its usefulness is perhaps limited to being a new reasoning benchmark for AI developers to play with.

Sayem Ahmed
Subscription Editor

Sayem Ahmed is the Subscription Editor at Tom's Hardware. He covers a broad range of deep dives into hardware both new and old, including the CPUs, GPUs, and everything else that uses a semiconductor.

  • Jame5
    I'm in my 40s, and honestly I had never heard of the game Mafia before a few years ago. Going from never hearing about it to having people call it a classic is very odd.

    *Edit: it was created in 1986, first played in 1987 according to Wikipedia.