Musk challenges legendary AI researcher Karpathy to an AI coding contest against Grok 5 — gets a polite 'no' to a Deep Blue-style showdown

Elon Musk, Grok 3.5, xAI
(Image credit: Shutterstock)

Elon Musk has proposed a public coding contest between xAI’s Grok 5 and former OpenAI research lead Andrej Karpathy, comparing it to the 1997 showdown between Garry Kasparov and IBM’s Deep Blue. Karpathy declined, saying he’d rather collaborate with Grok than compete against it.

The challenge came in response to a clip from Karpathy’s recent interview on the Dwarkesh Podcast, where he argued that AGI is likely still a decade away and described Grok as trailing OpenAI’s latest models by a few months.

Musk, who has said Grok 5 has a 10% and rising chance of reaching AGI, took that as an invitation: “Are you down for an AI coding contest?” he posted on X, tagging Karpathy directly.

Karpathy replied that his contribution would “trend to ~zero” in such a matchup, and emphasized that he sees current models more as collaborators than adversaries.

The idea of a formal model-versus-human coding contest is not far-fetched. At this year’s ICPC World Finals, Google DeepMind said Gemini 2.5 Deep Think solved 10 of 12 problems under contest conditions, a gold-medal result, while OpenAI reported a perfect 12 of 12 on the same problem set using GPT-5 paired with an experimental reasoning model. These problems are drawn from university-level algorithm competitions, judged for both correctness and runtime performance, and run within strict resource and time constraints.
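The judging itself is mechanical, which is part of why these results are credible: a submission is run against hidden test cases under a fixed time limit, and a single wrong output or timeout fails the problem. A rough sketch of that loop in Python (the test cases, time limit, and `solution.py` entry point here are illustrative, not the actual ICPC infrastructure):

```python
import subprocess

TIME_LIMIT_S = 2          # illustrative per-test time limit
TEST_CASES = [            # hypothetical (input, expected output) pairs
    ("3\n1 2 3\n", "6\n"),
    ("2\n10 -4\n", "6\n"),
]

def judge(cmd: list[str]) -> str:
    """Run a contestant's program on every test case.

    Returns 'Accepted' only if all outputs match exactly;
    otherwise the first failing verdict, ICPC-style.
    """
    for stdin_data, expected in TEST_CASES:
        try:
            result = subprocess.run(
                cmd, input=stdin_data, capture_output=True,
                text=True, timeout=TIME_LIMIT_S,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            return "Runtime Error"
        if result.stdout != expected:
            return "Wrong Answer"
    return "Accepted"

# Hypothetical contestant program read from stdin, writing to stdout
print(judge(["python3", "solution.py"]))
```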

Earlier this year, a Polish programmer beat OpenAI’s custom model in a 10-hour head-to-head final at the AtCoder World Tour Finals, prompting speculation that it may be the last human win at the highest level. That contest was tightly controlled and fully transparent.

If Musk wants Grok 5 to be taken seriously in that class, he’ll need to subject it to the same conditions. The Deep Blue comparison only works if the match is measurable. That means a fixed-length contest on a public problem set, identical access to tooling and compute, and no outside help, human or otherwise, during the run. The results would need to be scored independently and published in full.
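One way to make "same conditions" concrete would be to pre-register the terms in a machine-readable spec before either side competes. The sketch below is purely hypothetical; every field name and value is an assumption about what such a spec might contain:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContestSpec:
    """Hypothetical pre-registered terms for a human-vs-model match."""
    problem_set: str                  # public, versioned problem archive
    duration_minutes: int             # fixed contest length for both sides
    compute_budget: str               # identical hardware/API budget
    tooling: tuple[str, ...]          # compilers, interpreters, docs allowed
    human_assistance: bool = False    # no outside help for either entrant
    independent_judging: bool = True  # results scored by a third party

# Illustrative instance, not an actual proposal from either camp
GROK_VS_HUMAN = ContestSpec(
    problem_set="ICPC World Finals 2025 (public set)",
    duration_minutes=300,
    compute_budget="one workstation-class machine per entrant",
    tooling=("gcc", "python3", "offline documentation"),
)
```

Publishing something like this ahead of time is what separated the AtCoder final from a publicity stunt: both sides knew the rules, and the scoring left nothing to interpretation.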

Karpathy’s decision not to participate reflects a broader shift in the way machine learning practitioners talk about performance. Rather than staging head-to-head contests, many are now focused on how well models can accelerate human output. But competitive programming still offers a clear and well-defined benchmark. And so far, Grok has yet to post a score.

If xAI wants to demonstrate parity or superiority, a formal run on ICPC-grade tasks would be the obvious place to start.

Luke James
Contributor

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.

  • Zaranthos
    Only a few months behind? OpenAI started a full year before Grok and OpenAI is backed by Microsoft. Sounds like some pretty rapid progress by the new kid on the block Grok. I really have no technical opinion on either since I've largely ignored AI and only did a few basic searches with both. Sometimes the results were OK, sometimes they were garbage in garbage out just like Google often is. I do think they have potential once they can start to truly reason and research problems and not just regurgitate stuff posted online at face value.