Jim Keller criticizes Nvidia's CUDA, x86 — 'Cuda’s a swamp, not a moat. x86 was a swamp too'

Jim Keller and Raja Koduri (Image credit: Tom's Hardware)

Jim Keller, a legendary processor architect who has worked on x86, Arm, MISC, and RISC-V processors, this weekend criticized Nvidia's CUDA architecture and software stack and likened it to x86, which he called a swamp. He pointed out that even Nvidia itself has multiple special-purpose software packages that rely on open-source frameworks for performance reasons. 

"CUDA is a swamp, not a moat," Keller wrote in an X post. "x86 was a swamp too. […] CUDA is not beautiful. It was built by piling on one thing at a time." 

Indeed, just like x86, CUDA has gradually added functionality while maintaining backward compatibility in software and hardware. This keeps Nvidia's platform complete and backward compatible, but the accumulated layers can cost performance and make program development harder. Meanwhile, many open-source frameworks let developers work more efficiently than they could by writing CUDA directly.

"Basically nobody writes CUDA," wrote Keller in a follow-up post. "If you do write CUDA, it is probably not fast. […] There is a good reason there is Triton, Tensor RT, Neon, and Mojo." 

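Even Nvidia itself has tools that do not exclusively rely on CUDA. For example, Triton Inference Server is an open-source tool by Nvidia that simplifies deploying AI models at scale, supporting models from frameworks like TensorFlow and PyTorch as well as the ONNX format. Triton also provides features like model versioning, multi-model serving, and concurrent model execution to optimize the utilization of GPU and CPU resources. (Keller's "Triton" may instead refer to OpenAI's similarly named open-source Triton language and compiler, which lets developers write GPU kernels in Python rather than in CUDA C++.)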

Nvidia's TensorRT is a high-performance deep learning inference optimizer and runtime library that accelerates deep learning inference on Nvidia GPUs. TensorRT takes trained models from various frameworks, such as TensorFlow and PyTorch, and optimizes them for deployment, reducing latency and increasing throughput for real-time applications like image classification, object detection, and natural language processing. 

But although architectures like Arm, CUDA, and x86 might be considered swamps because of their relatively slow evolution, mandated backward compatibility, and bulkiness, these platforms are also far less fragmented than the rest of the GPGPU landscape, which may not be a bad thing at all.

It isn't clear what Jim Keller thinks of AMD's ROCm and Intel's oneAPI, but it is clear that even though he spent many years of his life designing x86 architectures, he isn't enamored with its future prospects. His statements also suggest that even though he has had stints at some of the largest chipmakers in the world, including the likes of Apple, Intel, AMD, and Broadcom (and now leads Tenstorrent), we might not see his name on the Nvidia roster any time soon.

Anton Shilov
Freelance News Writer

Anton Shilov is a Freelance News Writer at Tom’s Hardware US. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • NinoPino
    There is a typo, it is MIPS not MISC.
  • ezst036
    Maybe Jim should stop flapping his lips and see to it that Tenstorrent produces some motherboards.

    Without some ATX boards, we can't use his RISC-V designs.

    With that said, we're stuck in the swamp and he isn't supplying any rope so we can try to stop drowning in it. So what do I care what he says? He isn't producing any solutions to my problem that's for sure. Talk is cheap.

    This is just another installment in the daily drama.
  • Argolith
    ezst036 said:
    Maybe Jim should stop flapping his lips and see to it that Tenstorrent produces some motherboards.

    Without some ATX boards, we can't use his RISC-V designs.

    With that said, we're stuck in the swamp and he isn't supplying any rope so we can try to stop drowning in it. So what do I care what he says? He isn't producing any solutions to my problem that's for sure. Talk is cheap.

    This is just another installment in the daily drama.
    Sorry if a single guy doesn't break up corporate monopolies on his own.
  • The Hardcard
    Tenstorrent does have silicon out. I don’t think it’s high volume manufacturing yet.

    Also, by “we” you probably mean corporations. I think all of these AI startups are chasing data center dollars, so I'd be surprised if there's anything selling for less than $25K for a complete running system. And that’s for efficiency plays. Any system that they can show is faster than an H100 system and already has a usable software stack will cost far more.
  • Findecanor
    People rooting for RISC-V have been rooting for Tenstorrent's announced 8-wide Ascalon cores.

    But apparently, those are not actually what Tenstorrent is prioritising. They are mostly intended to be "companion processors" to TPUs.
    Ascalon has been announced to support only RV64GCV, whereas to reach feature parity with x86 and ARM and for RISC-V software to kick off, you'd need RVA23 compliance.

    So, IMHO, Tenstorrent is also wading around in the fragmented RISC-V swamp instead of helping build solid software foundations.
  • Pierce2623
    Findecanor said:
    People rooting for RISC-V have been rooting for Tenstorrent's announced 8-wide Ascalon cores.

    But apparently, those are not actually what Tenstorrent is prioritising. They are mostly intended to be "companion processors" to TPUs.
    Ascalon has been announced to support only RV64GCV, whereas to reach feature parity with x86 and ARM and for RISC-V software to kick off, you'd need RVA23 compliance.

    So, IMHO, Tenstorrent is also wading around in the fragmented RISC-V swamp instead of helping build solid software foundations.
    Fully agreed. At least CISC computing is mostly unified behind one ISA. RISC-V is useless until it has a software stack that actually does things. The Tenstorrent CPU that people were excited about doesn’t even comply well enough with RISC-V to run much of what is a very thin software stack anyway. It basically turned out to be a driver for their AI solution and nothing else.
  • hsv-compass
    Pierce2623 said:
    Fully agreed. At least CISC computing is mostly unified behind one ISA. RISC-V is useless until it has a software stack that actually does things. The Tenstorrent CPU that people were excited about doesn’t even comply well enough with RISC-V to run much of what is a very thin software stack anyway. It basically turned out to be a driver for their AI solution and nothing else.
    x86, Arm, and MIPS all have a unified ISA and a mature software stack. I never understood the need for yet another poorly supported ISA (YARI: yet another RISC ISA). Is the MIPS ISA really royalty-free (MIPS Open)?
  • bit_user
    NinoPino said:
    There is a typo, it is MIPS not MISC.
    When did he work on MIPS? He co-architected a DEC Alpha, but that was its own (RISC) ISA, and not MIPS-based.

    I also thought this was a typo, but I'm not sure what was intended. Anyway, I looked it up and MISC is apparently a thing. I'm still unconvinced that was intended, as it doesn't seem very useful to mention in such a brief summary.
    https://en.wikipedia.org/wiki/Minimal_instruction_set_computer
    Honestly, it would be easier just to list the places Jim worked (and delivered!) as a chip architect:
    DEC
    AMD (twice)
    Apple
    Tesla
    He also briefly oversaw developments at Intel, but that was higher-level and more forward looking than a chip architect role.

    we might not see his name on the Nvidia roster any time soon.
    I think that's a safe bet. He surely doesn't need another salaried position. With Tenstorrent, he's clearly decided to throw his hat into the startup game.

    Plus, at this point, I'm sure Nvidia has quite a deep roster. They don't really need him like his other recent employers did.
  • bit_user
    ezst036 said:
    Maybe Jim should stop flapping his lips and see to it that Tenstorrent produces some motherboards.

    Without some ATX boards, we can't use his RISC-V designs.
    First, they have built boards, such as PCIe cards. However, that's for AI chips where you just need to add DRAM, a VRM, and then you have everything you need to plug it into a PC and start using it.

    For RISC-V, Tenstorrent has focused on just the IP and building CPU tiles. Jim has given talks (you can find them on YouTube), where he's extolled the virtues of chiplets. He's said a small company like his shouldn't have to deal with all of the various enablement IP, like memory controllers, PCIe, storage controllers, etc.

    So, not only are they not building motherboards for their RISC-V cores, they're not even making complete SoCs. I think it totally makes sense. If he's right, you'll be able to use their cores in CPUs and SoCs from others. However, the lead time on such developments is usually a couple years from the point at which the IP is completed and validated. So, I wouldn't expect to see their RISC-V cores in any devices, just yet.

    ezst036 said:
    With that said, we're stuck in the swamp and he isn't supplying any rope so we can try to stop drowning in it.
    Sure, he is! Tenstorrent makes AI accelerators backed by an open API & toolchain and supported in major AI frameworks.

    I think you need to read between the lines, a little more. His statement is probably meant to counter the narrative Tenstorrent might be facing from potential customers. Maybe they keep hearing the refrain "...but CUDA!", in their sales calls? Just a guess.
  • bit_user
    The Hardcard said:
    I think all of these AI startups are chasing data center dollars, so I'd be surprised if there's anything selling for less than $25K
    Last year, Tenstorrent inked a deal with LG, under which its AI chiplets will be used in future TVs for AI postprocessing and upscaling.
    https://tenstorrent.com/research/tenstorrent-partners-with-lg-to-build-ai-and-risc-v-chiplets-for-smart-tvs-of-the-future/
    While looking that up, I also found a partnership announcement in the automotive industry:
    https://tenstorrent.com/research/bos-and-tenstorrent-partner-to-develop-automotive-semiconductor/