AMD's software stack remains a weak spot — ROCm won't support RDNA 4 at launch

AMD RDNA 4 and Radeon RX 9000-series GPUs
(Image credit: AMD)

AMD's upcoming RDNA 4 consumer graphics cards will not get official ROCm support at launch. According to Phoronix, AMD confirmed during its press briefing that ROCm support for RDNA 4 would not arrive until sometime after launch day, and the company has not yet clarified what the timeline for consumer-card support will look like.

ROCm, or Radeon Open Compute Ecosystem, is AMD's open-source answer to Nvidia's CUDA platform. The ROCm software stack is meant to enable HPC and AI workflows on consumer and prosumer products, and it has offered official support for AMD's consumer products on Windows since 2022. Of course, ever since launching in the pro sector, ROCm has lagged behind the standard of consumer support set by CUDA; Nvidia has offered launch-day CUDA support for its consumer products for several generations.
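To put the stakes in context, ROCm's HIP API deliberately mirrors CUDA, down to the kernel-launch syntax, which is why launch-day support matters to developers who want the same code running on either vendor's cards. The snippet below is only a rough illustration of what that looks like (our own sketch, not AMD sample code); it assumes a working ROCm install and compilation with hipcc.

    // saxpy_hip.cpp -- an illustrative HIP kernel; the CUDA version differs
    // mainly in the "hip" prefixes on the runtime calls.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
        float *dx = nullptr, *dy = nullptr;
        hipMalloc(&dx, n * sizeof(float));
        hipMalloc(&dy, n * sizeof(float));
        hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);  // hipcc accepts the familiar launch syntax
        hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("y[0] = %.1f (expect 4.0)\n", hy[0]);
        hipFree(dx);
        hipFree(dy);
        return 0;
    }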

Interestingly, AMD first teased its Navi 48 GPU die through ROCm Validation Suite documentation, with its first sighting coming in April of last year. For the cards using Navi 48 not to be ROCm-ready at launch is a bit silly, if not ironic (in the Alanis Morissette sense of the word). The RX 9070 XT and RX 9070 may go without official ROCm support for days, weeks, or even months after launch, but this does not mean that they will not work with the software, nor is it unusual for a new AMD release.

Since first bringing ROCm to Windows, AMD has had interesting support rollouts for the stack. When it first broadened its reach to consumer cards, ROCm's support list included only the RX 6900 XT, RX 6600, and, surprisingly, the R9 Fury from 2015, with only the R9 Fury receiving "full" software support and the 6000-series cards working with just parts of the HIP runtime. Currently, AMD's list of supported GPUs includes the full RX 7000-series, most of the RX 6000-series, and the Radeon VII on Windows, though the lower end of the 6000-series does not support the HIP SDK, and Linux support extends only to the RX 7900-series and the Radeon VII.

Compared to CUDA's history of supporting Nvidia's newest consumer cards on launch day and its extensive backward compatibility stretching back to 2006, ROCm has a long way to go. We'll keep our eyes peeled for any announcements from AMD on the future of ROCm support being extended to the RX 9070 XT, 9070, and the rest of the 9000 series. AMD has also recently acknowledged user polls calling for extending full support to the 6000-series and Strix/Strix Halo mobile chips, so compatibility may take longer than hoped to arrive as AMD works through everyone's ROCm wishlists.

Of course, a lack of full official support does not mean that the ROCm software will not run successfully on the newest cards, so those needing their ROCm fix will not be entirely left in the cold come March when RDNA 4 hits shelves (theoretically). For more on the RX 9070-series, Navi 48, and RDNA 4 at large, check out our deep dive into today's announcement of AMD's newest GPU architecture.

Dallin Grimm
Contributing Writer

Dallin Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Dallin has a handle on all the latest tech news. 

  • Peksha
    Great news for gamers! No AI-bots among the first buyers jacking up prices
  • Makaveli
    Peksha said:
    Great news for gamers! No AI-bots among the first buyers jacking up prices
    Agreed.

    As someone that uses ROCm on my 7900 XTX it also means I won't touch these.
  • virgult
    And this is exactly why AMD can't gain any market share.

    There's legions of CS students and pros who would ditch Nvidia the second ROCm becomes a bit less of a bad-taste parody of CUDA. Every single generation, AMD spectacularly fails to deliver. I'm amazed they sell any HPC accelerators at all, considering how pathetic their SW stack is.

    Since they have no interest in offering any versatility in their GPUs, let's at least hope that their "strictly for gaming" stock makes gamers happy so that there's less pressure on (sigh...) Nvidia.
  • bit_user
    virgult said:
    And this is exactly why AMD can't gain any market share.
    I think there's definitely more to it than just lacking and inconsistent compute support.

    virgult said:
    There's legions of CS students and pros who would ditch Nvidia the second ROCm becomes a bit less of a bad-taste parody of CUDA. Every single generation, AMD spectacularly fails to deliver. I'm amazed they sell any HPC accelerators at all, considering how pathetic their SW stack is.
    Agreed. A couple years ago, I recall reading an impassioned plea by a prof (maybe adjunct) who had to make Nvidia GPUs a requirement for his class, after countless hours of headaches and failed attempts trying to help students get various AMD GPUs to work.

    virgult said:
    SInce they have no interest in offering any versatility in their GPUs,
    It's not for lack of interest. At least, not any time recently.

    It turns out there's a purely technical explanation for why AMD's track record of ROCm support is so spotty. Here's an excerpt of a post by a former AMD-GPU employee who I think retired only about 6 months ago. I've only quoted the key bit, but I'd encourage you to follow the link and read the entire post.
    "... rather than compiling compute code to an IR such as PTX and distributing that, the ROCm stack currently compiles direct to ISA and uses a fat binary mechanism to support multiple GPUs in a single binary. This was a decent approach back in the early days of ROCm when we needed a way to support compute efficiently without exposing the IR and shader compiler source used for Windows gaming, but it needed to be followed up with an IR-based solution before the number of chips we wanted to support became too large.

    At the time we were thinking about needing one ISA version per chip generation but that changed quickly as hardware teams started introducing new features when ready rather than saving them up for a new generation so we could have 3 or 4 different ISA versions per GPU generation. This meant that binaries for libraries and application code grew much more quickly than initially expected, to the point where they were blowing past 2GB and causing other problems. I believe we fixed the 2GB issue quickly but that does not help to prevent stupidly large binaries if we support too many chips at once."

    https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1520112-amd-seeking-feedback-around-what-radeon-gpus-you-would-like-supported-by-rocm?p=1520611#post1520611
    Now, you can argue that AMD should've fixed this by now, and I wouldn't disagree. However, I think they've probably been consumed by trying to keep up with Nvidia in making HIP fully CUDA-compatible, porting various software packages to HIP, and doing all the bring-up work on their ambitious MI300. Not to mention having to add support for various RDNA GPUs.

    AMD's GPU team is still far smaller than the equivalent parts of Nvidia and I can understand it's hard to prioritize activities that primarily don't benefit new hardware or revenue. Yet, failing to do so is strategically bad. So, it's a classic "rock and hard place" scenario - or, perhaps "ROCm and hard place", as the case may be.

    Also, one fact that's uncomfortable for some of us Linux fans is that AMD has long been biased towards Windows. To a significant degree, their mainstream GPU Compute efforts have been consumed by supporting Microsoft initiatives, like C++ AMP and DirectCompute. Within the organization, the Linux efforts were long seen as something just needed to support the relatively small workstation and server GPU markets and funded from those revenue streams.
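    To make the quoted fat-binary point concrete: every ROCm code object is keyed to a specific gfx ISA string, and you can see which string a card reports with a few lines of HIP. This is a rough sketch from memory (field names as I recall them from the HIP headers), not anything official:

    // prints the gfx ISA string each visible GPU reports; this is the key a
    // ROCm fat binary's code objects are looked up by at load time.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        hipGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            hipGetDeviceProperties(&prop, i);
            // e.g. "gfx1100" for Navi 31; an RDNA 4 part will report a gfx12xx
            // string that binaries built before launch simply don't contain.
            printf("device %d: %s [%s]\n", i, prop.name, prop.gcnArchName);
        }
        return 0;
    }
    // A library built with, say,
    //   hipcc --offload-arch=gfx1030 --offload-arch=gfx1100 ...
    // carries exactly those code objects and nothing else -- the "stupidly
    // large binaries" trade-off the quote describes. CUDA sidesteps it by also
    // embedding PTX (e.g. nvcc -gencode arch=compute_70,code=compute_70) that
    // the driver can JIT for GPUs released after the binary shipped.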
  • virgult
    bit_user said:
    It's not for lack of interest. At least, not any time recently.
    I think the lack of interest is shown by the fact that they were *asked*. Intel published a guide on how to use their new GPUs with PyTorch and their ML stack when they released Alchemist. They gave Alchemist and Battlemage samples to AI YouTubers, not just gaming reviewers. If AMD cared, they would mention any progress with ROCm and make some effort at promoting it.

    You make great points. I'm still not sold on the idea that investing in a good ML stack isn't the best move an Nvidia competitor can make right now:
    - Their HPC hardware benefits from engineers who know the APIs and the framework already. That's the strength of CUDA, the same code that you debugged at home runs on the cluster.
    - I'm still sceptical that the "homebrew ML" market is as small as people seem to think. Why does Tom's Hardware bother testing all new GPUs with ML workloads? (The tests aren't great, but they're there!). And homebrew ML engineers make new networks, which could run on new clusters... And see point 1.
    - Ultimately, consumer GPU support *is* what made CUDA successful in the first place. If it worked once, it could work again?
  • jp7189
    CUDA support across all cards is the single biggest reason Nvidia has dominated the market. People choose what they are comfortable with. Even if Nvidia offered a slower card, it would be chosen because people know how to use it.

    In every ML buying decision I've been a part of, it's always which nvidia card is the right choice. Never, ever has AMD entered the conversation. We just need one person at the table to say something like.. "hey I've been playing around with my AMD card at home and it's working great." For the same money ML researchers will pick the slower nvidia card for their personal use because of uncertainty around ROCm.
  • bit_user
    virgult said:
    I think the lack of interest is shown by the fact the they were *asked*. Intel published a guide on how to use their new GPUs with PyTorch and their ML stack when they released Alchemist. They gave Alchemist and Battlemage samples to AI YouTubers, not just gaming reviewers. If AMD cared, they would mention any progress with ROCm and make some effort at promoting it.
    I'm not the best person to comment on what outreach AMD has or hasn't done in this area. I expect they'll do more as their hardware becomes more competitive. They do have a YouTube channel, however, with plenty of AI-specific content.
    https://www.youtube.com/user/AMDDevCentral
    They also have a new blogging platform, with some posts specific to AI usage:
    https://rocm.blogs.amd.com/
    virgult said:
    You make great points. I'm still not sold to the idea that investing in a good ML stack isn't the best move an Nvidia competitor can make right now:
    Of course they are. Why do you think they announced ROCm support for the RX 7900 GPUs and on Windows? They do finally understand that people want to use their GPUs for compute and AI.

    This is also why they're working on support for RDNA 4, even though it won't be ready at launch. Maybe the reason for that delay is that they're finally addressing that IR (Intermediate Representation) problem described in the post I quoted!
  • bit_user
    jp7189 said:
    For the same money ML researchers will pick the slower nvidia card for their personal use because of uncertainty around ROCm.
    They definitely earned themselves a bad reputation. About 10 years ago, they rewrote their Linux driver stack, and their OpenGL (and later Vulkan) support became pretty rock-solid and properly competitive with Nvidia. Many people assumed ROCm would rapidly improve and stabilize in a similar fashion, only to be met with repeated disappointments. This has created a lot of hurt feelings and bad will in the community. As the post I quoted outlined, I think this wasn't planned, but it was managed horribly. I hope the same people are no longer in charge.

    Meanwhile, there's been another userspace taking shape. Mesa developed the "Rusticl" frontend for easily supporting OpenCL on GPUs that Mesa supports. ROCm supports OpenCL, but at this point Rusticl might've already surpassed its implementation. I prefer OpenCL to CUDA/HIP anyhow, but support for AI on OpenCL isn't as good.

    Lastly, there are also people using Vulkan for compute, which is an area I haven't followed as closely. However, I can say that Vulkan is much lower level than OpenCL and I'm not convinced it really addresses deficiencies of OpenCL the same way it does for OpenGL.

    I guess I should also mention WebGPU, which is an API & framework usable from WebAssembly and I think JavaScript. This is an area I know even less about than Vulkan. I believe browsers are implementing WebGPU atop Vulkan, but I'm not entirely sure.
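    If you're curious which userspace is actually answering for a given card, a bare-bones OpenCL platform dump is enough to tell ROCm's implementation apart from Mesa's Rusticl. Untested sketch; the platform-name strings are from memory:

    // lists OpenCL platforms and their GPUs; ROCm's stack typically reports
    // itself as "AMD Accelerated Parallel Processing", Mesa's as "rusticl".
    #include <CL/cl.h>
    #include <cstdio>

    int main() {
        cl_platform_id platforms[8];
        cl_uint np = 0;
        clGetPlatformIDs(8, platforms, &np);
        for (cl_uint p = 0; p < np; ++p) {
            char pname[256] = {0};
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, nullptr);
            printf("platform %u: %s\n", p, pname);

            cl_device_id devs[8];
            cl_uint nd = 0;
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devs, &nd) != CL_SUCCESS)
                continue;   // this platform exposes no GPUs
            for (cl_uint d = 0; d < nd; ++d) {
                char dname[256] = {0};
                clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(dname), dname, nullptr);
                printf("  gpu %u: %s\n", d, dname);
            }
        }
        return 0;
    }
    // build: g++ list_cl.cpp -lOpenCL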
  • bit_user
    In fact, just yesterday there was some news on this subject. I think AMD's pursuit of MLIR (Multi-Level Intermediate Representation) might indeed be aimed at addressing the problem I cited in post #5.
    https://www.phoronix.com/news/AMD-Vulkan-SPIR-V-Wide-AI
    Some background:
    IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

    Key features:
    - Ahead-of-time compilation
    - Support for advanced model features
    - Designed for CPUs, GPUs, and other accelerators
    - Low overhead, pipelined execution
    - Binary size as low as 30KB on embedded systems
    - Debugging and profiling support
    Supported ML frameworks:
    - JAX
    - ONNX
    - PyTorch
    - TensorFlow and TensorFlow Lite
    Support for hardware accelerators and APIs:
    - Vulkan
    - ROCm/HIP
    - CUDA
    - Metal (for Apple silicon devices)
    - AMD AIE (experimental)
    - WebGPU (experimental)
    They also claim OS support includes Linux, Windows, MacOS, Android, iOS, and WebAssembly (experimental). Supported host ISAs include Arm, x86, and RISC-V.
    MLIR, itself:
    Multi-Level Intermediate Representation Overview

    The MLIR project is a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain specific compilers, and aid in connecting existing compilers together.

    https://mlir.llvm.org/
    Perhaps this is what was referred to by part of the post I quoted (again, see #5 for link):
    "... it seems to me that we should be able to package up our extended LLVM IR as the "compiler output" then run the back half of LLVM on the IR at runtime."
    I think we'll know if AMD has truly solved this problem, based on whether & the degree to which they can properly support all of their recent hardware. Certainly, when you add XDNA to the mix, the matrix of hardware they need to support is reaching the level of a first-order problem.
  • abufrejoval
    This doesn't bode well for Strix Halo and is especially troublesome when that system has been advertised as an ML secret weapon with its unified high bandwidth memory.

    You can finally pre-order a Framework desktop with 128GB of RAM and ca. 200GB/s of bandwidth, but without the software it's rather useless, or at least overpriced for what it can do.

    Not that I think 200GB/s bandwidth will be that much fun with LLMs filling 128GB capacity, but that's another issue and perhaps a sign that AMD marketing doesn't talk to their engineers.
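    (Back-of-envelope, assuming decoding a dense model is purely memory-bandwidth-bound: tokens per second is roughly bandwidth divided by the bytes of weights read per token, so 200 GB/s over ~120 GB of resident weights works out to well under 2 tokens/s before any other overhead.)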

    About the only good GPU news I've been able to observe lately is that Limited Edition B580 cards have finally appeared near MSRP in Europe, so I grabbed one.

    Unfortunately enabling resizable BAR on my Broadwell Xeon failed a BIOS checksum check so for now it's basically a spare in case one of my Nvidia GPUs should fail: can't even get spare parts any more, should the need arise...