Nvidia Says Feature Similar to AMD's Smart Access Memory Tech is Coming to Ampere
Nvidia levels the playing field
If Nvidia has its way, AMD's latest performance-boosting technology for RX 6000 "Big Navi" graphics cards might not be a huge one-sided advantage after all.
According to a statement Nvidia gave to Gamers Nexus, the company says it will soon enable a feature similar to AMD's Smart Access Memory (SAM) tech, which boosts data transfer efficiency between the GPU and CPU. In fact, Nvidia already has the feature working on its Ampere graphics cards in its labs.
Additionally, Nvidia claims its feature will work equally well with Intel and AMD processors and can use the PCIe 3.0 bus, while AMD has already said that its solution requires an AMD Ryzen 5000 series processor, X570 motherboard, and Radeon RX 6000 GPU to work.
Nvidia also suggests that AMD's feature, which AMD hasn't fully detailed yet, consists of adjusting PCIe's resizable BAR feature, which can be done on almost any modern motherboard if the manufacturer supports the option.
From NVIDIA, re:SAM: “The capability for resizable BAR is part of the PCI Express spec. NVIDIA hardware supports this functionality and will enable it on Ampere GPUs through future software updates. We have it working internally and are seeing similar performance results.”
November 12, 2020
AMD says that Smart Access Memory allows the CPU and GPU to share information across a broader PCIe pipe, but the company hasn't divulged the details of the tech fully yet. AMD merely says that the CPU and GPU are usually constrained to a 256MB ‘aperture’ for data transfers. That limits game developers and requires frequent trips between the CPU and main memory if the data set exceeds that size, causing inefficiencies and capping performance. Smart Access Memory removes that limitation, thus boosting performance due to more efficient data transfer between the CPU and GPU.
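As a rough illustration of the 256MB limit AMD describes, the sketch below counts how many aperture-sized windows the CPU would have to step through to reach a given amount of graphics memory. The function name `windows_needed` and the 16GB example are assumptions for illustration, not figures from AMD, and the code models only the arithmetic, not actual driver behavior.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def windows_needed(data_bytes: int, aperture_bytes: int = 256 * MIB) -> int:
    """Count the aperture-sized windows required to cover data_bytes.

    Hypothetical helper: it only illustrates the ceiling division,
    not how a real driver remaps the aperture.
    """
    if data_bytes < 0 or aperture_bytes <= 0:
        raise ValueError("sizes must be non-negative and the aperture positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-data_bytes // aperture_bytes)


if __name__ == "__main__":
    # Assumed example: a card with 16 GiB of VRAM behind a 256 MiB aperture.
    print(windows_needed(16 * GIB))
    # With an aperture as large as the VRAM, a single window covers everything.
    print(windows_needed(16 * GIB, aperture_bytes=16 * GIB))
```

Under these assumptions, a larger aperture simply reduces the number of windows; whether that translates into frame-rate gains depends on the game and driver.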
However, the feature looks akin to resizable BAR (Base Address Register), a standard feature of the PCIe spec, and Nvidia's statement suggests that the company feels likewise. If the GPU supports it, adjusting this setting in the motherboard BIOS essentially allows mapping of the full frame buffer, thus improving performance.
Nvidia says its hardware already supports the feature, though it will need to be enabled. Any PCIe-compliant CPU, whether Intel or AMD, should also be able to use the tech with Nvidia's graphics cards.
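For readers who want to see what their own card currently exposes, the following is a minimal sketch that assumes a Linux system, where each PCI device has a sysfs `resource` file listing its memory regions as hexadecimal start, end, and flags fields. The helper names and the device address `0000:01:00.0` are placeholders, not something taken from Nvidia's or AMD's statements.

```python
from pathlib import Path


def parse_pci_resources(text: str) -> list[int]:
    """Return the size in bytes of each region in a sysfs `resource` file.

    Each line is expected to hold three hex fields: start, end, flags.
    Unused regions are reported as all zeros and get a size of 0.
    """
    sizes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that does not look like a region line
        start, end, _flags = (int(field, 16) for field in fields)
        sizes.append(0 if start == 0 and end == 0 else end - start + 1)
    return sizes


def largest_region(device: str) -> int:
    """Size of the largest region for a device address (Linux only)."""
    path = Path("/sys/bus/pci/devices") / device / "resource"
    return max(parse_pci_resources(path.read_text()), default=0)


if __name__ == "__main__":
    # Placeholder address; find the real one for your GPU with `lspci`.
    device = "0000:01:00.0"
    try:
        print(f"{largest_region(device) / 1024 ** 2:.0f} MiB")
    except OSError as err:
        print(f"could not read resources for {device}: {err}")
```

On a card limited to the traditional aperture, the largest region would typically come out at 256 MiB; with a resized BAR it should approach the size of the card's memory.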
That seemingly takes the shine off of AMD's requirement of an AMD GPU, CPU, and high-end X570 motherboard, especially given that Nvidia plans to enable its competing (yet similar) functionality on all platforms - Intel, AMD, and PCIe 3.0 motherboards included.
Nvidia says that its early testing shows similar performance gains to AMD's SAM and that it will enable the feature through future software updates. However, the company hasn't announced a timeline for the updates.
It certainly feels like Nvidia is trying to steal AMD's thunder. If Nvidia's Ampere silicon experiences similar gains from the Smart Access Memory-like tech, it will definitely complicate matters for AMD's push to create a walled all-AMD gaming PC garden.
Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.
-
Murissokah If NVidia had this capability all along I feel this begs the question of why they had to wait for AMD to implement it. Looks bad either way.
On the other hand, considering Ryzen's appetite for fast memory and the fact that NVidia uses GDDR6X, wouldn't it be an interesting day if it turns out Ryzen 5000 runs faster with an NVidia 3000 series card? -
InvalidError Got to love when one company brags about marketing brand-new features that are merely a firmware tweak away from getting matched by every other hardware manufacturer that has the necessary flexibility (maximum BAR mask size in this case) built into older products, just not exposed yet for whatever reason. -
BaRoMeTrIc OK, so no matter if it's pcie 3 or pcie 4 the pci base address registers can be raised for 256mb x 16 = 4gb to potentially what? 8gb? a full 16? -
Chung Leong
Murissokah said: If NVidia had this capability all along I feel this begs the question of why they had to wait for AMD to implement it. Looks bad either way.
From a marketing standpoint, it's better to let everyone get subpar performance than to have some consumers getting subpar performance due to incompatible hardware. -
InvalidError
BaRoMeTrIc said: OK, so no matter if it's pcie 3 or pcie 4 the pci base address registers can be raised for 256mb x 16 = 4gb to potentially what? 8gb? a full 16?
The PCIe spec has had a 64-bit version of the initialization registers for a long time to accommodate CPUs with >4GB memory space, so I'm actually surprised it took this long for hardware manufacturers to add or unlock support for >256MB BAR.
The maximum size of BAR blocks depends on how many bits the address decoders support. Logically, the BAR should support enough address bits to let memory-mapped IO sit outside the maximum physical RAM address space at a bare minimum so memory-mapped IO doesn't carve usable space out of it. If the system allows a device to use up to half of the memory-mapped IO space, then the maximum BAR size might be 64GB on CPUs with 128GB max RAM.
There was a time where CPUs only decoded the lower 36bits of memory addresses and OSes used the MSBs to encode flags for internal use, which required kernel rewrites when address decoding got expanded to 40+bits. Chances are that all of the hardware necessary to support larger BAR sizes has been around since then. -
cryoburner
Murissokah said: If NVidia had this capability all along I feel this begs the question of why they had to wait for AMD to implement it. Looks bad either way.
On the other hand, considering Ryzen's appetite for fast memory and the fact that NVidia uses GDDR6X, wouldn't it be an interesting day if it turns out Ryzen 5000 runs faster with an NVidia 3000 series card?
Yeah, it does seem a bit questionable why they wouldn't have enabled such functionality previously. Are there any drawbacks to doing so?
It's also possible that AMD's solution might cover more than just adjusting the PCIe BAR size though, and that might only be a part of it that affects existing games. I think it was suggested that games would have to be optimized for Smart Access Memory to get the most from it, so perhaps it also enables something like direct control over the contents of the "infinity cache" for example. Nvidia's performance gains from adjusting the BAR size alone might not be as large, though we just have speculation to work with for now.
As for Ryzen and RAM speed, that's not how it works. Ryzen's fabric matches the speed of system RAM, but isn't affected by VRAM, and applications are typically processing data stored in system RAM, not on the graphics card. And again, at least from what AMD has shown, the memory bandwidth of their 6000-series cards can effectively be far higher than GDDR6X for data that can fit inside the large 128MB block of L3 cache that they are calling the "infinity cache", which accounts for a relatively large portion of the GPU chip itself. That cache can hold the framebuffer, for example, allowing the GPU to perform operations on it much quicker and more efficiently than if it were stored in VRAM. I'm sure there will be some cases where having faster VRAM would be better, but this new cache is a large part of where AMD's performance and efficiency gains come from this generation, and it reduces the need for faster graphics memory. -
hannibal Most likely the difference between PCIe 3.0 and 4.0 is the normal one. PCIe 4.0 has more bandwidth, so it can access the memory faster than 3.0. I think there was news that this has been working in Linux for some time. -
InvalidError
cryoburner said: but this new cache is a large part of where AMD's performance and efficiency gains come from this generation, and it reduces the need for faster graphics memory.
Modern games use GBs worth of assets to render a scene, a larger L3$ won't alleviate the need for more VRAM. All extra cache does is reduce the frequency of cache misses so the GPU as a whole can make better use of available VRAM and PCIe bandwidth. I also bet modern GPUs have far more pressing uses for L3$ than the frame buffer, such as all of the temporary data shaders need to pass around between passes. -
mitch074 There was a comment on Phoronix forums a while ago about it and yes, it's about modifying BAR. This however requires BIOS and kernel support, and wasn't possible in Windows for quite a while.
AMD enabled the feature on Windows when they had enough control to make sure it works, they never said it was a feature only they could support.
Yes, one could ask why Microsoft/Intel/Nvidia never came together to make it possible before... Oh wait.
Edit: typo -
VforV
InvalidError said: Got to love when one company brags about marketing brand-new features that are merely a firmware tweak away from getting matched by every other hardware manufacturer that has the necessary flexibility (maximum BAR mask size in this case) built into older products, just not exposed yet for whatever reason.
So you're hating on AMD for bringing this forward, just because they brag about it? Sure, it would have been better to let this go, ignore it like nvidia did and we the gamers would have never had it... perfect reasoning.
Does it really matter if it's a brag or easy to implement as long as NOW thanks to AMD we will get it on both of them?
Some people have such a narrow mind... meh.