Turns out, AI can actually build competent Minesweeper clones: four AI coding agents put to the test reveal OpenAI's Codex as the best and Google's Gemini CLI as the worst

(Image credit: Getty Images)

As the world burns around us because of corporations chasing AI with seemingly unlimited resources, we ought to see what all this commotion has bought us. Recently, the folks over at Ars Technica put four of the most popular AI coding agents to the test, with a deceptively simple ask: build Minesweeper for the web. The clone had to include sound effects, mobile touchscreen support, and a "fun" gameplay twist.

For those unaware, Minesweeper is a game of logic: every revealed tile shows how many mines touch it, and you deduce from those numbers which tiles are safe. Pair that with a reasonably clean UI/UX and you get a decent challenge. It's not exactly hard to make a Minesweeper clone, but its underlying mechanics require at least some ingenuity, the kind that usually comes from humans. After all, AGI is the goal, right?
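To give a sense of how small that core logic is, here is a minimal TypeScript sketch of how each tile's number can be derived. It assumes a hypothetical board stored as a 2D array of booleans (`true` marks a mine) and uses `-1` as a made-up sentinel for mine cells; none of this comes from the clones Ars Technica tested.

```typescript
// Hypothetical board representation: true marks a mine.
type MineGrid = boolean[][];

// Count the mines among the up-to-eight cells surrounding (row, col).
function adjacentMines(mines: MineGrid, row: number, col: number): number {
  let count = 0;
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue; // skip the cell itself
      const r = row + dr;
      const c = col + dc;
      // Ignore neighbors that fall off the edge of the board.
      if (r >= 0 && r < mines.length && c >= 0 && c < mines[r].length && mines[r][c]) {
        count++;
      }
    }
  }
  return count;
}

// Build the full grid of numbers; mine cells get -1 as a sentinel.
function numberGrid(mines: MineGrid): number[][] {
  return mines.map((rowCells, r) =>
    rowCells.map((isMine, c) => (isMine ? -1 : adjacentMines(mines, r, c))),
  );
}
```

For a 3x3 board with mines in the top-left and bottom-right corners, `numberGrid` yields `[[-1, 1, 0], [1, 2, 1], [0, 1, -1]]`: the center tile touches both mines, so it reads 2.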

OpenAI Codex - 9/10 🏅

The best performer by far was Codex, which not only did a decent job with the visuals but was also the only agent to include "chording," a technique where clicking a revealed number that already has the matching number of flags around it uncovers all of its remaining neighbors at once. Chording is a favorite among seasoned players, so its omission automatically makes any Minesweeper clone feel less polished.
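Chording sounds fancy, but the rule is compact. The TypeScript sketch below shows one way to express it; the `Cell` shape and function names are hypothetical, and the sketch is deliberately simplified: it does not cascade through zero tiles the way a full clone would.

```typescript
// Hypothetical per-cell state; not taken from any of the tested clones.
interface Cell {
  mine: boolean;
  revealed: boolean;
  flagged: boolean;
  adjacent: number; // precomputed count of neighboring mines
}

// Return the [row, col] pairs of the up-to-eight neighbors that lie on the board.
function neighbors(board: Cell[][], row: number, col: number): [number, number][] {
  const out: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const r = row + dr;
      const c = col + dc;
      if (r >= 0 && r < board.length && c >= 0 && c < board[r].length) out.push([r, c]);
    }
  }
  return out;
}

// Chord on a revealed number: if the surrounding flag count matches the number,
// reveal every unflagged, unrevealed neighbor. Returns true if that uncovered a
// mine (meaning a flag was misplaced). Does nothing if the counts don't match.
function chord(board: Cell[][], row: number, col: number): boolean {
  const cell = board[row][col];
  if (!cell.revealed || cell.adjacent === 0) return false;
  const around = neighbors(board, row, col);
  const flags = around.filter(([r, c]) => board[r][c].flagged).length;
  if (flags !== cell.adjacent) return false;
  let hitMine = false;
  for (const [r, c] of around) {
    const n = board[r][c];
    if (n.flagged || n.revealed) continue;
    n.revealed = true;
    if (n.mine) hitMine = true;
  }
  return hitMine;
}
```

The key design point is that chording trusts the player's flags: if a flag is in the wrong place, the chord uncovers a real mine, which is exactly why it rewards careful play.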

Codex's build had every button working properly, including a sound toggle for its era-accurate beeps and boops, along with on-screen instructions for both mobile and desktop. As for the gameplay twist, a "Lucky Sweep" button in the corner would occasionally reveal one safe tile once you had earned it.
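Ars Technica doesn't describe how Codex implemented Lucky Sweep or how the power is earned, so the TypeScript sketch below covers only the core operation: picking one random hidden safe tile and revealing it. The `Tile` shape, the `luckySweep` name, and the injectable `rand` parameter are all hypothetical.

```typescript
// Hypothetical minimal tile shape for this sketch.
interface Tile {
  mine: boolean;
  revealed: boolean;
  flagged: boolean;
}

// Reveal one randomly chosen hidden, unflagged, non-mine tile.
// `rand` defaults to Math.random but can be swapped for a deterministic source.
// Returns the coordinates revealed, or null if no safe hidden tile remains.
function luckySweep(
  board: Tile[][],
  rand: () => number = Math.random,
): [number, number] | null {
  const candidates: [number, number][] = [];
  board.forEach((row, r) =>
    row.forEach((t, c) => {
      if (!t.mine && !t.revealed && !t.flagged) candidates.push([r, c]);
    }),
  );
  if (candidates.length === 0) return null;
  // rand() is in [0, 1), so this index is always within bounds.
  const pick = candidates[Math.floor(rand() * candidates.length)];
  board[pick[0]][pick[1]].revealed = true;
  return pick;
}
```

Whatever "earning" rule a clone uses would simply gate calls to a function like this one.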

The coding experience with Codex was also smooth, with a command-line interface featuring nice animations and local permission management, though the agent did take its sweet time writing the code. Ars Technica described this effort as the closest to something ready to ship with minimal human intervention, scoring it an impressive 9/10.

Claude Code - 7/10

The runner-up was Anthropic's Claude Code, which took half as long as Codex to write its code and delivered a more aesthetically pleasing product. In fact, it was the most refined-looking version of all, with custom graphics for the bomb and a device-agnostic smiley at the top. The sound effects were also pleasant, and the sound toggle worked fine across mobile and desktop.

However, the experience fell apart because there was no chording support, which the reviewer called "unacceptable." The gameplay twist came in the form of a "Power Mode" that hands out simple power-ups, which took some genuine creativity on the agent's part to come up with. On mobile, there's also a "Flag Mode" button that's a decent alternative to long-pressing to mark tiles.
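A Flag Mode button works by changing what a plain tap means. The TypeScript sketch below captures that idea as a pure decision function; the mode names, the `tapAction` and `toggleMode` helpers, and the rule that flagged tiles ignore taps in reveal mode are all assumptions, not a description of Claude Code's actual output.

```typescript
// Hypothetical input modes: "reveal" is the default, "flag" is what a Flag Mode button switches to.
type TapMode = "reveal" | "flag";
type TapAction = "reveal" | "toggleFlag" | "none";

interface TileState {
  revealed: boolean;
  flagged: boolean;
}

// Decide what a plain tap should do, given the current mode and the tile's state.
function tapAction(mode: TapMode, tile: TileState): TapAction {
  // Revealed tiles can't be flagged or re-revealed (a clone with chording could trigger it here instead).
  if (tile.revealed) return "none";
  if (mode === "flag") return "toggleFlag";
  // In reveal mode, a flagged tile is protected from accidental taps.
  return tile.flagged ? "none" : "reveal";
}

// Flip between modes, as a Flag Mode button would.
function toggleMode(mode: TapMode): TapMode {
  return mode === "reveal" ? "flag" : "reveal";
}
```

Keeping this decision separate from touch handling means the same logic can serve a mode button on mobile and right-click on desktop.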

When we tried it ourselves, this was also the best-feeling clone. Claude Code, running Anthropic's Opus 4.5 model, built the Minesweeper clone in less than five minutes, and its coding interface was the cleanest of the bunch. Overall, the presentation is very solid, leading to a 7/10 score that would have been higher had chording been included.

Mistral Vibe - 4/10

In third place is Mistral Vibe, which lived up to its name: the result felt exactly like something that had been vibe-coded. The game worked and looked fine, but it lacked the all-important chording feature and had no sound effects. There was also a "Custom" button at the bottom that did nothing. Vibe didn't add a fun gameplay twist either, and all of that knocks off a few points.

The smiley at the top was all black, which the testers found off-putting, and selecting "Expert" mode stretched the grid beyond the confines of its square backdrop, though that's just a visual glitch. You can right-click to flag on desktop, but on mobile you're forced to press and hold, which might awkwardly bring up your device's context menu (it didn't in our case).
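Press-and-hold flagging usually comes down to timing a touch and checking that the finger didn't wander. The TypeScript sketch below classifies a finished touch; the 500 ms and 10 px thresholds and all names are hypothetical, not values from Mistral Vibe's build. In a browser, calling `preventDefault()` on the `contextmenu` event is one common way to keep the system menu from appearing, but how well that works varies by mobile browser.

```typescript
// Hypothetical thresholds; real clones tune these by feel.
const LONG_PRESS_MS = 500;
const MOVE_TOLERANCE_PX = 10;

type PressKind = "tap" | "longPress" | "cancel";

interface PressSample {
  startMs: number; // timestamp when the touch began
  endMs: number; // timestamp when the touch ended
  dx: number; // horizontal finger travel in px
  dy: number; // vertical finger travel in px
}

// Classify a completed touch: moving too far cancels (the user was probably scrolling),
// holding for at least the threshold counts as a long press (flag), anything shorter is a tap (reveal).
function classifyPress(p: PressSample): PressKind {
  if (Math.hypot(p.dx, p.dy) > MOVE_TOLERANCE_PX) return "cancel";
  return p.endMs - p.startMs >= LONG_PRESS_MS ? "longPress" : "tap";
}
```

This sketch decides on release for simplicity; many implementations instead start a timer on touch start so the flag appears while the finger is still down, which feels more responsive.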

The coding interface was solid and easy to use, but not exactly the fastest, though the last-place finisher was so slow that the bar isn't very high. Ars Technica's editors were impressed by how well it performed despite lacking the large-scale resources of the big names. In the end, Mistral Vibe got a 4/10, which seems lower than it deserved based on their description.

Google Gemini - 0/10 ❌

Dead last was Google's Gemini CLI, which might surprise some, considering how often Google tops benchmarks these days and the broader comeback story around co-founder Sergey Brin's return to help steer frontier AI at the California giant. Gemini's Minesweeper clone simply didn't work. It had buttons but no tiles to speak of, so there was no game to play or even score.

Visually, it looked eerily similar to Claude Code's final result, as if someone had stopped the agent mid-build. Gemini also took the longest, with each code run taking an hour and the agent constantly asking for external dependencies. Even when the testers bent the rules to give it a second chance, with a hard-and-fast instruction to use HTML5, it couldn't produce a usable result.

Ars Technica does note that Gemini CLI didn't have access to the latest Gemini 3 coding models and relied on a cluster of Gemini 2.5 systems instead. Paying for a higher tier of Google AI might have produced a better outcome, which arguably makes this test "incomplete," but the result is still pretty disappointing.

So, there you have it: this is what we allowed to quadruple our memory prices and ruin computers for the time being. Codex won, with Claude Code a respectable second and Mistral Vibe a distant third, while Google didn't even manage a working game. But at what cost? If you weren't already all-in on AI, this experiment probably won't convince you of anything.


Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds.

Hassam Nasir
Contributing Writer

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.