AMD RDNA 3 professional GPUs with 48GB can beat Nvidia 24GB cards in AI — putting the 'Large' in LLM

Radeon Pro W7900 Dual Slot
(Image credit: AMD)

AMD is swinging back at Nvidia with new DeepSeek benchmarks that claim its monster 48GB RDNA 3 GPUs can outperform Team Green's previous-generation RTX 4090.

David McAfee, AMD vice president and general manager of Ryzen CPUs and Radeon graphics, posted on X that the Radeon Pro W7900 and Pro W7800 48GB cards can outperform an RTX 4090 by up to 7.3x in DeepSeek R1.

McAfee shared a graph of the three GPUs benchmarked in four DeepSeek R1 configurations using LM Studio 0.3.12 and the llama.cpp runtime 1.18. The tests covered two distilled models, Distill Qwen 32B at 8-bit and Distill Llama 70B at 4-bit, each run in two workloads: conversational prompts (with 20 tokens) and summarization prompts (with 3,017 tokens).
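
For readers who want to sanity-check numbers like these at home, a comparable tokens-per-second measurement can be scripted against llama.cpp through its Python bindings. This is a minimal sketch under stated assumptions, not AMD's actual harness: the GGUF file name is a placeholder for whichever quant of the distill you have on disk, and LM Studio's bundled runtime may differ in detail.

```python
# Minimal tokens-per-second probe using llama-cpp-python (pip install llama-cpp-python).
# Assumes a local GGUF quant of DeepSeek R1 Distill Qwen 32B (the path below is a
# placeholder) and a llama.cpp build compiled with GPU offload support.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-32B-Q8_0.gguf",  # hypothetical file name
    n_gpu_layers=-1,  # offload all layers, mirroring a fully VRAM-resident run
    n_ctx=4096,       # enough context for longer, summarization-style prompts
    verbose=False,
)

prompt = "Explain the difference between GDDR6 and GDDR7 in two sentences."
start = time.perf_counter()
out = llm(prompt, max_tokens=20)  # 20 output tokens, like AMD's conversational test
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
```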


In the conversational workload, the RTX 4090 allegedly produced 2.7 tokens per second in DeepSeek R1 Distill Qwen 32B 8-bit, while the Pro W7800 48GB produced 19.1 and the Pro W7900 48GB 19.8. In Distill Llama 70B 4-bit, the RTX 4090 produced 2.3 tokens per second, the Pro W7800 48GB 12.8, and the Pro W7900 48GB 12.7.

In the summarization workload, the RTX 4090 produced 2.5 tokens per second in Distill Qwen 32B 8-bit, against 15.7 for the Pro W7800 48GB and 16.2 for the Pro W7900 48GB. In Distill Llama 70B 4-bit, the RTX 4090 produced 2.0 tokens per second, the Pro W7800 48GB 10.1, and the Pro W7900 48GB 10.4.

All told, AMD's benchmarks claim the Radeon Pro W7800 and Pro W7900 48GB GPUs are up to 7.3x faster than the RTX 4090 in the conversational Distill Qwen 32B 8-bit test, 5.5x faster in conversational Distill Llama 70B 4-bit, 6.5x faster in summarization Distill Qwen 32B 8-bit, and 5.2x faster in summarization Distill Llama 70B 4-bit.
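
Those multipliers follow directly from the tokens-per-second figures quoted above; a few lines of arithmetic reproduce them to within rounding:

```python
# Reproduce AMD's claimed speedups from the tokens-per-second figures above.
# Each tuple: (RTX 4090, Pro W7800 48GB, Pro W7900 48GB).
results = {
    "Qwen 32B 8-bit, conversational":  (2.7, 19.1, 19.8),
    "Llama 70B 4-bit, conversational": (2.3, 12.8, 12.7),
    "Qwen 32B 8-bit, summarization":   (2.5, 15.7, 16.2),
    "Llama 70B 4-bit, summarization":  (2.0, 10.1, 10.4),
}

for test, (rtx4090, w7800, w7900) in results.items():
    best_radeon = max(w7800, w7900)  # "up to" means the faster of the two Radeons
    print(f"{test}: up to {best_radeon / rtx4090:.1f}x")
# Prints 7.3x, 5.6x, 6.5x, 5.2x -- matching AMD's 7.3/5.5/6.5/5.2 to within rounding.
```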

David McAfee claims the 48GB trims of the Pro W7800 and Pro W7900 have enough VRAM to run the largest DeepSeek R1 distilled models. VRAM is one of the most critical resources for running large language models: a model's weights are stored directly in VRAM, and their footprint scales directly with the parameter count and the precision they're stored at. The larger an LLM is, the more VRAM you need. But that extra capacity commands very high prices.
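
As a rule of thumb, the weights alone occupy roughly (parameter count × bits per weight ÷ 8) bytes before the KV cache and runtime overhead are counted. A back-of-the-envelope estimate, which ignores the small per-format overhead of real GGUF quants, shows why these two models overwhelm a 24GB card:

```python
# Back-of-the-envelope VRAM estimate: weights only, no KV cache or runtime overhead.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, params, bits in [
    ("Distill Qwen 32B @ 8-bit", 32, 8),
    ("Distill Llama 70B @ 4-bit", 70, 4),
]:
    print(f"{name}: ~{weight_gb(params, bits):.0f} GB of weights")
# ~32 GB and ~35 GB respectively: both spill out of a 24GB RTX 4090,
# but fit with headroom for the KV cache in a 48GB Radeon Pro.
```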

The W7900 48GB costs a whopping $3,500: that's $1,500 over the RTX 5090's $2,000 MSRP and $2,000 over the RTX 4090's $1,500 MSRP (though hardly any 4090s actually sold at that price). On the flip side, the 48GB RDNA 3 GPU is less than half the price of the closest 48GB Nvidia GPU you can buy today, the RTX 6000 Ada.

AMD's marketing looks great, but we have seen this play out before. AMD previously shared benchmarks of its RX 7900 XTX (mostly) outperforming the RTX 4090 in DeepSeek R1. However, Nvidia responded with benchmarks showing the RTX 4090 (and RTX 5090) drastically outperforming the flagship RDNA 3 GPU in the same DeepSeek R1 configurations.

AMD also neglected to share any benchmarks comparing Nvidia's newest flagship, the RTX 5090, against its RDNA 3-based 48GB workstation graphics cards. It will be interesting to see whether Nvidia follows up with another round of benchmarks to counter AMD, particularly since AMD's 48GB cards carry more VRAM than even the RTX 5090 and its 32GB of GDDR7.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

Read more
AMD claims RX 7900 XTX outperforms RTX 4090 in DeepSeek benchmarks
Nvidia counters AMD DeepSeek AI benchmarks, claims RTX 4090 is nearly 50% faster than 7900 XTX
AMD slides claim Strix Halo can beat the RTX 4070 laptop GPU by up to 68% in modern games
AMD's beastly 'Strix Halo' Ryzen AI Max+ matches the RTX 4060 laptop in leaked 3DMark tests
RTX 5090 exhibits 27% higher CUDA performance than RTX 4090 — exceeds 500K points in Geekbench
Nvidia GeForce RTX 5090 versus RTX 4090 — How does the new halo GPU compare with its predecessor?
  • derekullo
    Don't forget you can use multiple GPUs with LLMs.

    Radeon PRO W7900 has 48 gigabytes of vram and 864 gigabytes a second of bandwidth.

    Nvidia 5090 has 32 gigabytes of vram and 1792 gigabytes a second of bandwidth.

    For $500 more, assuming MSRP, you could have dual Nvidia 5090s totaling 64 gigabytes of vram and 3584 gigabytes a second of bandwidth.
    Reply
  • Taslios
    derekullo said:
    Don't forget you can use multiple GPUs with LLMs.

    Radeon PRO W7900 has 48 gigabytes of vram and 864 gigabytes a second of bandwidth.

    Nvidia 5090 has 32 gigabytes of vram and 1792 gigabytes a second of bandwidth.

    For $500 more, assuming MSRP, you could have dual Nvidia 5090s totaling 64 gigabytes of vram and 3584 gigabytes a second of bandwidth.
    rather bold of you to assume anything AI related made by any company will be MSRP.... or even available for purchase at all.
    Reply
  • John Freiman
    Ok, a Pro W7900 or any 'Pro' graphics card is out of the question for someone like me who just uses (or wants to use) a GPU for generating subtitles using Whisper, but...
    My question is, what LLM applications require or take advantage of such large pools of RAM?
    To me, it seems like these benchmarks only apply to a very small percentage of the market, and that percentage (I'm sure I'll be corrected/schooled) is looking for budget GPU performance vs. faster, premium GPUs with equivalent memory available? Are there no Nvidia vendors with 48GB memory? - or like another poster noted, buying 2 cards (even with elevated pricing) will outperform the Pro W series.
    I understand all about prices and cost of entry - this is why I still use Whisper on my Ryzen CPU and not GPU, but if your use case requires vast amounts of RAM, aren't you more likely to have the $$ for more costly and effective options? It's not like you have one task to process; they likely have multiple dozens, hundreds or thousands of tasks, and speed would be of the utmost importance?
    For me, I can set up Whisper to transcribe a season of TV, let it run for a day or 4 and then move to the next season/series.
    SURE, I'd love for it to be done much more quickly and without hogging my CPU cycles, but it can be done.
    People/businesses that require tasks to be done quickly and efficiently are going to drop $$$ on the best tools/GPUs for speed and efficiency - AMD going on about how its last-gen GPU is still relevant or beating the competition seems like a PR/shareholder battle and not truly directed at the people buying or needing truly large LLM tasks.
    What am I missing?
    Ps. Anyone have a cheap Nvidia GPU that might be busted, but otherwise had working Media Engine and NPU they want to sell cheap? 😋
    Reply
  • abufrejoval
    derekullo said:
    Don't forget you can use multiple GPUs with LLMs.
    Please forget about using multiple GPUs with LLMs.

    Of course, if you can afford NVLink switches and proper DC GPUs, there is a bit of scaling to be had, if you've got the engineering teams to go with that.
    derekullo said:
    Radeon PRO W7900 has 48 gigabytes of vram and 864 gigabytes a second of bandwidth.

    Nvidia 5090 has 32 gigabytes of vram and 1792 gigabytes a second of bandwidth.

    For $500 more, assuming MSRP, you could have dual Nvidia 5090s totaling 64 gigabytes of vram and 3584 gigabytes a second of bandwidth.
    And with all that money you'll have the performance of a GTX 1030 with an LLM.

    Because as soon as you run out of memory on one of your GPUs and have to go across the PCIe bus for weights on the other GPU or in CPU RAM, you might as well just stick with CPUs; some of them have a memory bus that's faster than the PCIe link the GPUs have to share.

    The reason Nvidia can charge so much for its 4TB/s 96GB HBM GPUs with NVLink is that it understands the limitations of LLMs.

    If all you had to do was cobble GPUs together in a chassis like they used to do for Ethereum, Nvidia would still be flogging gaming GPUs.

    Of course, thinking myself smart, I actually had to try that.

    I did put an RTX 4090 and a 4070 (5 slots total width) into my Ryzen 9 desktop and then observed what happened when I split models between them...

    Pretty much the same thing that happens once layers are loaded into CPU RAM: performance goes down the drain, very close to what CPUs alone can do. At least when you have 16 good cores, like I do.

    And if you look closely via HWinfo, you'll see that the memory bus is at 100% utilization while everybody else, CPUs and GPUs alike, is just twiddling their thumbs waiting for data.
    Reply
  • abufrejoval
    Admin said:
    AMD published DeepSeek R1 benchmarks of its W7900 and W7800 Pro series 48GB GPUs, massively outperforming the 24GB RTX 4090.

    AMD RDNA 3 professional GPUs with 48GB can beat Nvidia 24GB cards in AI — putting the 'Large' in LLM : Read more
    They had the same story at CES, how a single Strix Halo could outperform an RTX 4090 by a factor of 2.2...

    The reason: they loaded a 70B model at Q4, which eats around 42GB.

    With only 24GB of 1TB/s VRAM on board, the RTX 4090 has to go across the 64GB/s PCIe bus for any weights left in system RAM.

    The Strix Halo, meanwhile, can use its 256-bit LPDDR5 RAM at 256GB/s and thus will obviously be faster. Should be 4x, really.

    But that's like driving a Ferrari into a corn field and having it run against a tractor: not a fair race.

    Please, AMD marketing has pulled this cheap trick before, and it's totally misleading. So don't go mindlessly echoing their bull; use your brain to filter disinformation.

    Or you'll just put people off TH.
    Reply
  • Gururu
    Get ready for a second smackdown by nVidia...
    Reply
  • Li Ken-un
    the closest current-generation 48GB Nvidia GPU you can buy today, the RTX A6000 Ada.
    There’s no “A” before the numbers in “RTX 6000 Ada.”
    Reply
  • salgado18
    abufrejoval said:
    I did put an RTX 4090 and a 4070 (5 slots total width) into my Zen 9 desktop...
    This guy is living in 3025
    Reply
  • abufrejoval
    salgado18 said:
    This guy is living in 3025
    Thanks for the hint: fixed Zen 9 to Ryzen 9, not quite the same thing...
    Reply
  • usertests
    abufrejoval said:
    Please forget about using multiple GPUs with LLMs .
    I think most people are doing inference, not training? And there are things like this.
    Reply