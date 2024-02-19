Jim Keller, a legendary processor architect who has worked on x86, Arm, MISC, and RISC-V processors, this weekend criticized Nvidia's CUDA architecture and software stack and likened it to x86, which he called a swamp. He pointed out that even Nvidia itself has multiple special-purpose software packages that rely on open-source frameworks for performance reasons.

"CUDA is a swamp, not a moat," Keller wrote in an X post. "x86 was a swamp too. […] CUDA is not beautiful. It was built by piling on one thing at a time."

Indeed, just like x86, CUDA has gradually added functionality while maintaining backward compatibility in software and hardware. This makes Nvidia's platform complete and backward compatible, but it can also hurt performance and make program development harder. Meanwhile, many open-source software development frameworks can be used more efficiently than CUDA.

"Basically nobody writes CUDA," wrote Keller in a follow-up post. "If you do write CUDA, it is probably not fast. […] There is a good reason there is Triton, Tensor RT, Neon, and Mojo."

Even Nvidia itself has tools that do not exclusively rely on CUDA. For example, Triton Inference Server is an open-source tool by Nvidia that simplifies deploying AI models at scale, supporting frameworks like TensorFlow, PyTorch, and ONNX. Triton also provides features like model versioning, multi-model serving, and concurrent model execution to optimize the utilization of GPU and CPU resources.

Nvidia's TensorRT is a high-performance optimizer and runtime library that accelerates deep learning inference on Nvidia GPUs. TensorRT takes trained models from various frameworks, such as TensorFlow and PyTorch, and optimizes them for deployment, reducing latency and increasing throughput for real-time applications like image classification, object detection, and natural language processing.

Architectures like Arm, CUDA, and x86 might be considered swamps because of their relatively slow evolution, mandated backward compatibility, and bulkiness. But these platforms are also less fragmented than the broader GPGPU ecosystem, and that cohesion may not be a bad thing at all.

It isn't clear what Jim Keller thinks of AMD's ROCm and Intel's OneAPI, but it is clear that even though he spent many years of his life designing x86 processors, he isn't enamored with the architecture's future prospects. His statements also imply that even though he has done stints at some of the largest chipmakers in the world, including Apple, Intel, AMD, and Broadcom (and now Tenstorrent), we might not see his name on the Nvidia roster any time soon.