Nvidia engineer breaks and then quickly fixes AMD GPU performance in Linux

AMD RDNA 4 and Radeon RX 9000-series GPUs
(Image credit: AMD)

In a surprising turn of events, an Nvidia engineer pushed a fix to the Linux kernel, resolving a performance regression seen on AMD integrated and dedicated GPU hardware (via Phoronix). As it turns out, the same engineer had inadvertently introduced the problem in the first place with a set of kernel changes last week that attempted to increase the PCI BAR space beyond 10TiB. That change ended up incorrectly flagging AMD GPUs as DMA-limited and hampering performance, but thankfully it was quickly caught and fixed.

In the open-source world, fixing what you break is an unwritten rule. The Linux kernel is open source and accepts contributions from everyone, subject to review, and responsible contributors are expected to help fix issues that arise from their changes. So, despite the two companies' rivalry in the GPU market, FOSS (Free and Open Source Software) is an avenue that bridges the chasm between AMD and Nvidia.

(Image credit: Git.kernel)

The regression was caused by a commit intended to increase the PCI BAR space beyond 10TiB, likely for systems with very large memory configurations. This indirectly reduced KASLR (Kernel Address Space Layout Randomization) entropy on consumer x86 devices, that is, the randomness of where the kernel's data is loaded into memory on each boot for security purposes. At the same time, it artificially inflated the upper bound of the kernel's direct-mapped physical memory (direct_map_physmem_end), typically to 64TiB.

In Linux, memory is divided into different zones, one of which (ZONE_DEVICE) can be associated with device memory such as a GPU's. The problem here is that when the kernel initialized ZONE_DEVICE memory for Radeon GPUs, max_pfn, the variable tracking the highest page frame number and thus the total RAM the kernel considers addressable, would artificially balloon to cover 64TiB.

Since the GPU likely cannot address the entire 64TiB range, dma_addressing_limited() would return true. That check essentially restricts the GPU to the DMA32 zone, which offers only 4GB of addressable memory and explains the performance regression.

The good news is that the fix should land as soon as the pull request is merged, right before the Linux 6.15-rc1 merge window closes today. With the kernel's typical six-to-eight-week release cadence, we can expect the stable 6.15 release to be available around late May or early June.

Hassam Nasir
Contributing Writer

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.

  • Rob1C
    You'd need an 8-way CPU to practically have enough slots for 64TiB.
  • qxp
    Rob1C said:
    You'd need an 8-way CPU to practically have enough slots for 64TiB.
    Not necessarily. On newer systems you can expand RAM by using PCIe 5.0 slots. You could also memory map SSDs, it takes only 8 8TB SSDs to need more than 64TB addressing space.
  • USAFRet
    So he didn't "improve performance".
    Rather, he undid the performance limiter he pushed last week.

    It was fine, before he screwed with it.
  • bit_user
    The article said:
    In the open-source paradigm, it's an unwritten rule to fix what you break. The Linux kernel is open-source and accepts contributions from everyone, which are then reviewed. Responsible contributors are expected to help fix issues that arise from their changes. So, despite their rivalry in the GPU market, FOSS (Free Open Source Software) is an avenue that bridges the chasm between AMD and Nvidia.
    It's not just about good manners. If a contributor is found to behave in a malicious or excessively reckless manner, they could face a ban. I'm not aware of a case where this has happened, but I think the potential is real.
  • bkuhl
    bit_user said:
    It's not just about good manners. If a contributor is found to behave in a malicious or excessively reckless manner, they could face a ban. I'm not aware of a case where this has happened, but I think the potential is real.
It's happened:

    https://www.tomshardware.com/news/university-researchers-apologize-linux-community
  • bit_user
    bkuhl said:
It's happened:

    https://www.tomshardware.com/news/university-researchers-apologize-linux-community
    Yeah, I knew of that incident. What I meant was one vendor acting maliciously towards another, in a strictly anti-competitive fashion.
  • Rob1C
    Rob1C said:
    You'd need an 8-way CPU to **practically** have enough slots for 64TiB.

    qxp said:
    Not necessarily. On newer systems you can expand RAM by using PCIe 5.0 slots. You could also memory map SSDs, it takes only 8 8TB SSDs to need more than 64TB addressing space.

    I did say to be practical.

    Using your suggested ideology you say we wouldn't need 8-way CPUs to get enough slots for the DIMM, because we could use 8 SSDs. With modern SSDs you'd only need one. Access and execution speed would be slow.

    Similarly we could use fewer DIMM slots by simply using larger DIMMs:
    https://www.tomshardware.com/news/samsung-talks-1tb-ddr5-modules-ddr5-7200
    Large DIMMs like that are reserved for preferred customers, and available at eye watering prices.

    So, going the extreme either way isn't practical, we'd need to land somewhere near the middle ground.

    More to the point of the comment, which you completely missed, they submitted a change to support a configuration which is as unlikely for Intel systems as it is impossible for AMD systems.
  • nogaard777
    USAFRet said:
    So he didn't "improve performance".
    Rather, he undid the performance limiter he pushed last week.

    It was fine, before he screwed with it.
    And it was better after he fixed it. Don't let your hatred of a corporation make brainless assumptions of the individuals that work there. A large portion of Linux exists because of Nvidia engineers' contributions, and I'd wager far more than from AMD's much smaller team.
  • bit_user
    nogaard777 said:
    A large portion of Linux exists because of Nvidia engineers' contributions, and I'd wager far more than from AMD's much smaller team.
    Why wager? If you know, you know. If you don't, well...

    The latest data I found was from 2022:
By changesets:
Employer               Changesets   Share
Huawei Technologies    1281         9.2%
Intel                  1254         9.0%
(Unknown)              1097         7.9%
Google                 917          6.6%
Linaro                 837          6.0%
AMD                    750          5.4%
Red Hat                672          4.8%
(None)                 564          4.0%
Meta                   414          3.0%
NVIDIA                 389          2.8%

By lines changed:
Employer               Lines    Share
Oracle                 91852    12.0%
AMD                    89761    11.7%
Google                 56504    7.4%
Intel                  44062    5.8%
(Unknown)              33765    4.4%
Realtek                33277    4.3%
Linaro                 31234    4.1%
Huawei Technologies    27856    3.6%
NVIDIA                 25441    3.3%
Red Hat                24073    3.1%

Source: https://lwn.net/Articles/915435/

So, AMD changed about 3.53 times as many lines as Nvidia, in 1.93 times as many changesets.
  • qxp
    Rob1C said:
    I did say to be practical.

    Using your suggested ideology you say we wouldn't need 8-way CPUs to get enough slots for the DIMM, because we could use 8 SSDs. With modern SSDs you'd only need one. Access and execution speed would be slow.
    If you use 8x 8TB PCIe 4.0 SSD you get read bandwidth of at least 56 GB/s - not stellar, but definitely usable. Using Sabrent Rocket 8TB, this will set you back less than $10K.
    Rob1C said:
    Similarly we could use fewer DIMM slots by simply using larger DIMMs:
    https://www.tomshardware.com/news/samsung-talks-1tb-ddr5-modules-ddr5-7200
    Large DIMMs like that are reserved for preferred customers, and available at eye watering prices.

    So, going the extreme either way isn't practical, we'd need to land somewhere near the middle ground.

    More to the point of the comment, which you completely missed, they submitted a change to support a configuration which is as unlikely for Intel systems as it is impossible for AMD systems.
    The change was likely in response to customer request, as there are plenty of people that need systems with lots of RAM. And, of course, we will see these in wider uses as prices drop, and by this time the issue has been worked out.