Musk challenges legendary AI researcher Karpathy to an AI coding contest against Grok 5 — gets a polite 'no' to a Deep Blue-style showdown

Elon Musk, Grok 3.5, xAI
(Image credit: Shutterstock)

Elon Musk has proposed a public coding contest between xAI’s Grok 5 and former OpenAI research lead Andrej Karpathy, comparing it to the 1997 showdown between Garry Kasparov and IBM’s Deep Blue. Karpathy declined, saying he’d rather collaborate with Grok than compete against it.

The challenge came in response to a clip from Karpathy’s recent interview on the Dwarkesh Podcast, where he argued that AGI is likely still a decade away and described Grok as trailing OpenAI’s latest models by a few months.

Musk, who has said Grok 5 has a 10% and rising chance of reaching AGI, took that as an invitation: “Are you down for an AI coding contest?” he posted on X, tagging Karpathy directly.

Karpathy replied that his contribution would “trend to ~zero” in such a matchup, and emphasized that he sees current models more as collaborators than adversaries.

The idea of a formal model-versus-human coding contest is not far-fetched. At this year’s ICPC World Finals, Google DeepMind said Gemini 2.5 Deep Think solved 10 of 12 problems under contest conditions, a gold-medal result, while OpenAI reported a perfect 12 of 12 on the same problem set using GPT-5 paired with an experimental reasoning model. These problems are drawn from university-level algorithm competitions, judged for both correctness and runtime performance, and run within strict resource and time constraints.
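The judging itself is mechanical, which is part of why these results are credible: a submission is run against hidden test cases under a fixed time limit, and a single wrong output or timeout fails the problem. A rough sketch of that loop in Python (the test cases, time limit, and `solution.py` entry point here are illustrative, not the actual ICPC infrastructure):

```python
import subprocess

TIME_LIMIT_S = 2          # illustrative per-test time limit
TEST_CASES = [            # hypothetical (input, expected output) pairs
    ("3\n1 2 3\n", "6\n"),
    ("2\n10 -4\n", "6\n"),
]

def judge(cmd: list[str]) -> str:
    """Run a contestant's program on every test case.

    Returns 'Accepted' only if all outputs match exactly;
    otherwise the first failing verdict, ICPC-style.
    """
    for stdin_data, expected in TEST_CASES:
        try:
            result = subprocess.run(
                cmd, input=stdin_data, capture_output=True,
                text=True, timeout=TIME_LIMIT_S,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            return "Runtime Error"
        if result.stdout != expected:
            return "Wrong Answer"
    return "Accepted"

# Hypothetical contestant program read from stdin, writing to stdout
print(judge(["python3", "solution.py"]))
```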

Earlier this year, a Polish programmer beat OpenAI’s custom model in a 10-hour head-to-head final at the AtCoder World Tour Finals, prompting speculation that it may be the last human win at the highest level. That contest was tightly controlled and fully transparent.

If Musk wants Grok 5 to be taken seriously in that class, he’ll need to subject it to the same conditions. The Deep Blue comparison only works if the match is measurable. That means a fixed-length contest on a public problem set, identical access to tooling and compute, and no outside help, human or otherwise, during the run. The results would need to be scored independently and published in full.
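One way to make "same conditions" concrete would be to pre-register the terms in a machine-readable spec before either side competes. The sketch below is purely hypothetical; every field name and value is an assumption about what such a spec might contain:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContestSpec:
    """Hypothetical pre-registered terms for a human-vs-model match."""
    problem_set: str                  # public, versioned problem archive
    duration_minutes: int             # fixed contest length for both sides
    compute_budget: str               # identical hardware/API budget
    tooling: tuple[str, ...]          # compilers, interpreters, docs allowed
    human_assistance: bool = False    # no outside help for either entrant
    independent_judging: bool = True  # results scored by a third party

# Illustrative instance, not an actual proposal from either camp
GROK_VS_HUMAN = ContestSpec(
    problem_set="ICPC World Finals 2025 (public set)",
    duration_minutes=300,
    compute_budget="one workstation-class machine per entrant",
    tooling=("gcc", "python3", "offline documentation"),
)
```

Publishing something like this ahead of time is what separated the AtCoder final from a publicity stunt: both sides knew the rules, and the scoring left nothing to interpretation.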

Karpathy’s decision not to participate reflects a broader shift in the way machine learning practitioners talk about performance. Rather than staging head-to-head contests, many are now focused on how well models can accelerate human output. But competitive programming still offers a clear and well-defined benchmark. And so far, Grok has yet to post a score.

If xAI wants to demonstrate parity or superiority, a formal run on ICPC-grade tasks would be the obvious place to start.

Luke James
Contributor

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.

  • Zaranthos
    Only a few months behind? OpenAI started a full year before Grok and OpenAI is backed by Microsoft. Sounds like some pretty rapid progress by the new kid on the block Grok. I really have no technical opinion on either since I've largely ignored AI and only did a few basic searches with both. Sometimes the results were OK, sometimes they were garbage in garbage out just like Google often is. I do think they have potential once they can start to truly reason and research problems and not just regurgitate stuff posted online at face value.