A project to bring CUDA to non-Nvidia GPUs is making major progress: ZLUDA now has two full-time developers working on 32-bit PhysX support and LLMs, among other things

(Image credit: AMD)

ZLUDA, the CUDA translation layer that nearly shut down last year before an unknown party stepped in to save it, this week shared an update on its technical progress and team expansion over the last quarter, reports Phoronix. The project continues to build out its ability to run CUDA workloads on non-Nvidia GPUs, with AI workloads currently taking priority over everything else. That said, work has also begun on 32-bit PhysX support, which is required for compatibility with older CUDA-based games.

Perhaps the most important news for ZLUDA is that its development team has grown from one full-time developer to two. The second developer, Violet, joined less than a month ago and has already delivered important improvements, particularly in advancing support for large language model (LLM) workloads through the llm.c project, according to the update.

32-bit PhysX

A community contributor named @Groowy began the initial work to enable 32-bit PhysX support in ZLUDA by collecting detailed CUDA logs, which quickly revealed several bugs. Since some of these problems could also impact 64-bit CUDA functionality, fixing them was added to the official roadmap. However, completing full 32-bit PhysX support will still rely on further help from open-source contributors.

Compatibility with llm.c

The ZLUDA developers are using llm.c, a small program that runs a GPT-2 model directly on CUDA, as a test case. The workload itself is modest, but it matters because it is the first time ZLUDA has had to handle both the regular CUDA APIs and specialized performance libraries like cuBLAS (Nvidia's accelerated linear-algebra library) in the same application.

The test program makes 8,186 separate calls into CUDA, spread across 44 distinct API functions. Initially, ZLUDA crashed on the very first call; after a string of fixes contributed by Violet, it now gets to the 552nd call before failing. Support for 16 of the 44 required functions is already complete, so the team is steadily closing in on running the whole test. Once it does, the same groundwork should help ZLUDA support larger frameworks such as PyTorch.
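
For a sense of what ZLUDA has to intercept here, the sketch below mixes plain CUDA runtime calls with a single cuBLAS matrix multiplication, the combination llm.c leans on thousands of times. It is a minimal, hypothetical example rather than code taken from llm.c, though the API calls themselves are the real CUDA and cuBLAS ones.

```cpp
// Minimal sketch of the runtime-API + cuBLAS mix that llm.c exercises.
// Hypothetical example, not llm.c code. Build (assumed): nvcc demo.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 256;  // small square matrices for the demo
    std::vector<float> hostA(n * n, 1.0f), hostB(n * n, 2.0f), hostC(n * n, 0.0f);

    // Plain CUDA runtime calls: the part ZLUDA handles at the runtime/driver level.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hostA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hostB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    // cuBLAS call: the performance-library side ZLUDA must also cover.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hostC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hostC[0]);  // expect 512.0 (sum of 256 products of 1 * 2)

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```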

Improving accuracy of ZLUDA

ZLUDA's core objective is to run standard CUDA programs on non-Nvidia GPUs while matching the behavior of Nvidia hardware as precisely as possible. This means each instruction must either deliver identical results down to the last bit or stay within strict numerical tolerances compared to Nvidia hardware. Earlier versions of ZLUDA, before the major code reset, often compromised on accuracy by skipping certain instruction modifiers or failing to maintain full precision.

The current implementation has made substantial progress on this front. To verify accuracy, it runs PTX 'sweep' tests, systematic checks written against Nvidia's intermediate GPU language, to confirm that every instruction and modifier combination produces correct results across all inputs, a level of validation the project had not used before. Running these checks has already revealed several compiler defects, which have since been fixed. ZLUDA admits that not every instruction has been through this rigorous validation yet, but stresses that some of the most complex cases, such as the cvt instruction, are now confirmed bit-accurate.
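
ZLUDA's own harness compares the output of its compiled code against Nvidia hardware, and those details are not published in the update, but the sketch below illustrates the sweep idea for a single cvt variant: push a contiguous block of float bit patterns through a round-to-nearest conversion on the GPU and compare every result against a CPU reference. The kernel and parameters are illustrative, not ZLUDA's actual test code.

```cpp
// Illustrative sweep test, not ZLUDA's harness: exhaustively check one conversion
// (compiled to a cvt.rni.s32.f32 on Nvidia hardware) over a block of bit patterns.
// Build (assumed): nvcc sweep.cu
#include <cuda_runtime.h>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

__global__ void cvt_rn_kernel(const uint32_t *bits, int32_t *out, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float f = __uint_as_float(bits[i]);  // reinterpret the raw bit pattern
        out[i] = __float2int_rn(f);          // round-to-nearest-even conversion
    }
}

int main() {
    const int count = 1 << 20;  // one chunk of the full 2^32 sweep
    std::vector<uint32_t> bits(count);
    std::vector<int32_t> gpu(count);
    for (int i = 0; i < count; ++i) bits[i] = 0x40000000u + i;  // floats starting at 2.0f

    uint32_t *dBits; int32_t *dOut;
    cudaMalloc(&dBits, count * sizeof(uint32_t));
    cudaMalloc(&dOut, count * sizeof(int32_t));
    cudaMemcpy(dBits, bits.data(), count * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cvt_rn_kernel<<<(count + 255) / 256, 256>>>(dBits, dOut, count);
    cudaMemcpy(gpu.data(), dOut, count * sizeof(int32_t), cudaMemcpyDeviceToHost);

    long mismatches = 0;
    for (int i = 0; i < count; ++i) {
        float f;
        std::memcpy(&f, &bits[i], sizeof(f));
        if (!std::isfinite(f) || std::fabs(f) > 2e9f) continue;  // skip NaN/overflow edge cases
        int32_t ref = (int32_t)std::lrint(f);  // round-to-nearest-even on the CPU
        if (ref != gpu[i]) ++mismatches;
    }
    printf("%ld mismatches in %d inputs\n", mismatches, count);
    cudaFree(dBits); cudaFree(dOut);
    return 0;
}
```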

Improving logging

The foundation for getting any CUDA-based software to work on ZLUDA, whether it is a game, a 3D application, or an ML framework, is a log of how the program talks to CUDA. That means tracking direct API calls, undocumented corners of the CUDA runtime and drivers, and any use of specialized performance libraries.

With the recent update, ZLUDA's logging system has been significantly upgraded. The new implementation captures activity that was not visible before, including detailed traces of internal behavior, such as when cuBLAS calls into cuBLASLt or how cuDNN interacts with the lower-level Driver API.
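
The update does not describe how ZLUDA's logger hooks in, but the usual technique for this kind of tracing is to interpose on the CUDA libraries and record each call before forwarding it to the real implementation. A minimal LD_PRELOAD-style sketch for one Driver API entry point, cuInit, might look like the following; the function and library names are the real CUDA ones, everything else is illustrative.

```cpp
// Illustrative LD_PRELOAD shim (Linux): intercept one CUDA Driver API call,
// log it, then forward it to the real libcuda. ZLUDA's own tracer is far more
// complete; this only shows the interception idea.
// Build (assumed): g++ -shared -fPIC shim.cpp -o libshim.so -ldl
#include <dlfcn.h>
#include <cstdio>

using CUresult = int;  // CUresult is an enum in cuda.h; int is ABI-compatible for this sketch

extern "C" CUresult cuInit(unsigned int flags) {
    // Resolve the real implementation once, straight from the driver library.
    static auto real = [] {
        void *lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
        return reinterpret_cast<CUresult (*)(unsigned int)>(
            lib ? dlsym(lib, "cuInit") : nullptr);
    }();

    fprintf(stderr, "[trace] cuInit(flags=%u)\n", flags);
    if (!real) return 3;  // CUDA_ERROR_NOT_INITIALIZED, used here as a fallback
    CUresult result = real(flags);
    fprintf(stderr, "[trace] cuInit -> %d\n", result);
    return result;
}
```

Built as a shared library and loaded via LD_PRELOAD, a shim like this prints a line for every intercepted call while the application otherwise runs as normal; extending the same pattern to cuBLAS or cuDNN entry points is what exposes library-to-library activity such as cuBLAS handing work to cuBLASLt.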

Runtime compiler compatibility

Modern GPU frameworks like CUDA, ROCm/HIP, ZLUDA, and OpenCL all need to compile device code on the fly while applications run, so that older GPU programs can still be built and executed correctly on newer hardware generations without changes to the original code.
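
On the CUDA side, the standard mechanism for this is NVRTC, which takes CUDA source at run time and emits PTX for whatever architecture is requested. The sketch below is a minimal illustration of that mechanism, not ZLUDA code; ROCm/HIP routes the equivalent work through comgr, as described next.

```cpp
// Minimal NVRTC example: compile a trivial kernel to PTX at run time.
// Build (assumed): nvcc nvrtc_demo.cpp -lnvrtc
#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main() {
    const char *src =
        "extern \"C\" __global__ void scale(float *x, float a) {\n"
        "    x[threadIdx.x] *= a;\n"
        "}\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);

    // The target architecture is chosen at run time to match the installed GPU.
    const char *opts[] = { "--gpu-architecture=compute_70" };
    if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
        printf("compilation failed\n");
        return 1;
    }

    size_t ptxSize = 0;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    printf("generated %zu bytes of PTX\n", ptxSize);

    nvrtcDestroyProgram(&prog);
    return 0;
}
```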

In AMD's ROCm/HIP ecosystem, this on-the-fly compilation depends on comgr (the Code Object Manager, part of AMD's ROCm-CompilerSupport project), a compact library that nonetheless handles compiling, linking, and disassembling code, available on both Linux and Windows.

With ROCm/HIP version 6.4 came a significant application binary interface (ABI) change: the numeric codes representing comgr actions were rearranged in a new v3 ABI. As a result, ZLUDA could end up invoking the wrong operation, for example attempting to link when it meant to compile, which led to errors. The situation was worse on Windows, where the library reported itself as version 2.9 while internally using the v3 ABI, mixing the two behaviors. The ZLUDA team has now addressed these problems as well.
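
The failure mode is easy to picture: comgr's C interface takes the requested action as a numeric enum value, so a caller that hard-codes one version's numbering sends the wrong request to a library that has reordered it. The sketch below is purely illustrative; the enum names, numeric values, and version check are hypothetical placeholders, not comgr's real v2/v3 codes.

```cpp
// Illustrative only: why a renumbered enum breaks callers across a C ABI.
// Names and numeric codes are hypothetical placeholders, NOT comgr's real values.
#include <cstdio>

// Numbering the caller was originally built against (old headers).
enum OldActionKind { OLD_COMPILE_SOURCE = 4, OLD_LINK_BC = 5 };
// Numbering the newer (v3-style) library actually expects after the rearrangement.
enum NewActionKind { NEW_LINK_BC = 4, NEW_COMPILE_SOURCE = 5 };

// Pick the numeric code to send based on the ABI version the loaded library reports.
int action_code_for_compile(int abi_major) {
    return abi_major >= 3 ? NEW_COMPILE_SOURCE : OLD_COMPILE_SOURCE;
}

int main() {
    // Without a guard like this, a caller built against the old headers would send 4
    // ("compile" in the old numbering), which the newer library reads as "link":
    // the same kind of mix-up described above.
    printf("old library: send %d for 'compile'\n", action_code_for_compile(2));
    printf("v3 library:  send %d for 'compile'\n", action_code_for_compile(3));
    return 0;
}
```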

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • hotaru251
    ngl...i hope this goes well but I fully expect if it does NVIDIA will try their best to shut it down somehow.
  • -Fran-
    hotaru251 said:
    ngl...i hope this goes well but I fully expect if it does NVIDIA will try their best to shut it down somehow.
    They already have, kind of, forcing any "pro" user to abide by the TOS, so they can't use ZLUDA for anything and must use nVidia hardware. Because we somehow allow Companies to do that with the hardware we purchase from them.

    Still, that wouldn't be a problem for the regular user at home with non-nVidia hardware.

    Regards.
  • Mr Majestyk
    -Fran- said:
    They already have, kind of, forcing any "pro" user to abide by the TOS, so they can't use ZLUDA for anything and must use nVidia hardware. Because we somehow allow Companies to do that with the hardware we purchase from them.

    Still, that wouldn't be a problem for the regular user at home with non-nVidia hardware.

    Regards.
    Why would the user of a Radeon card for example care one bit about Nvidia's TOS?