AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for a single GPU

MI300X (Image credit: AMD)

The MI300X is AMD's latest and greatest AI GPU flagship, designed to compete with the Nvidia H100 — the upcoming MI325X will take on the H200, with MI350 and MI400 gunning for the Blackwell B200. Chips and Cheese tested AMD's monster GPU in various low-level and AI benchmarks and found that it often vastly outperforms Nvidia's H100.

However, before we get started, there are some caveats worth mentioning. Chips and Cheese's article does not say what level of tuning was done on the various test systems, and software can have a major impact on performance — Nvidia says it has doubled the H100's inference performance via software updates since launch, for example. The site also had limited contact with AMD but apparently none with Nvidia, so some settings that affect results may have been missed. More critically, the company that provided access to the MI300X, Hot Aisle, was specifically looking for MI300X benchmarks. Chips and Cheese also compared the MI300X primarily against the PCIe version of the H100 in its low-level testing, which is the weakest, lowest-spec version of the card.

Caveats and disclaimers aside, Chips and Cheese's low-level benchmarks reveal that the MI300X, built on AMD's bleeding-edge CDNA 3 architecture, is a strong design from a hardware perspective. The chip's caching performance looks downright impressive, thanks to its four caches: a 32KB L1 cache, a 16KB scalar cache, a 4MB L2 cache, and a massive 256MB Infinity Cache (which serves as an L3 cache). CDNA 3 is the first CDNA architecture to inherit Infinity Cache, which debuted with RDNA 2 (AMD's second-generation gaming graphics architecture powering the RX 6000 series).

(Image credit: Chips and Cheese)

Not only do the MI300X's GPU cores have four caches to play with, but those caches are also fast. Chips and Cheese's cache benchmarks show the MI300X's cache bandwidth is substantially better than the H100's at every relevant level: 1.6x the H100's L1 cache bandwidth, 3.49x its L2 bandwidth, and 3.12x its bandwidth at the last-level cache, which on the MI300X is the Infinity Cache.
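For readers curious what a cache bandwidth sweep actually measures, here's a minimal sketch of the general approach in Python with PyTorch. This is not the methodology Chips and Cheese used (they wrote their own kernels), and the buffer sizes and reduction below are illustrative assumptions; the point is simply that effective bandwidth falls off in steps as the working set spills out of each cache level.

```python
# Minimal sketch (not Chips and Cheese's code): time repeated reads of buffers
# of increasing size and report effective bandwidth. Real microbenchmarks keep
# the data resident and loop inside a single GPU kernel; here, kernel-launch
# overhead inflates the small-buffer results, so treat the output as a shape,
# not a measurement.
import torch

def read_bandwidth_gbs(size_bytes: int, iters: int = 100) -> float:
    """Effective GB/s when summing a float32 buffer of the given size."""
    buf = torch.rand(size_bytes // 4, device="cuda")  # works on ROCm builds too
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        buf.sum()                                     # forces a read of every element
    end.record()
    torch.cuda.synchronize()
    return size_bytes * iters / (start.elapsed_time(end) / 1000.0) / 1e9

if __name__ == "__main__":
    for kib in (16, 256, 2_048, 65_536, 262_144, 1_048_576):  # 16KiB .. 1GiB
        print(f"{kib:>9} KiB working set: {read_bandwidth_gbs(kib * 1024):8.1f} GB/s")
```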

Even with the higher clocks on the SXM version of the H100, we wouldn't expect these cache results to change radically. But cache bandwidth and latency on their own don't necessarily tell you how a GPU will perform in real-world workloads. The RTX 4090, for example, has 27% higher LLC bandwidth than the H100 PCIe, but there are plenty of workloads where the H100 would prove far more capable.

(Image credit: Chips and Cheese)

Similar advantages show up in the MI300X's VRAM and local memory (LDS, or shared memory in Nvidia terms) performance. The AMD GPU has 2.72x as much local HBM3 memory, with 2.66x more VRAM bandwidth than the H100 PCIe. The only area in the memory tests where the AMD GPU loses is latency, where the H100 is 57% faster.

Keep in mind that this is the lowest-spec H100: the PCIe card with 80GB of HBM2E. Later versions like the H200 include up to 141GB of HBM3E with up to 4.8 TB/s of bandwidth, and the H100 SXM variant also has substantially faster HBM providing up to 3.35 TB/s, so using the 2.0 TB/s PCIe card clearly handicaps the H100 in memory bandwidth.
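As for the latency result above, memory latency is typically measured with a pointer chase, where every load depends on the one before it so the hardware can't overlap accesses. The CPU-side sketch below only illustrates the pattern (the array sizes are arbitrary, and Python's interpreter overhead inflates the absolute numbers); GPU microbenchmarks run the same dependent-load loop inside a kernel.

```python
# Conceptual pointer-chase sketch: each access depends on the previous one, so
# the measured time approximates access latency rather than bandwidth. This is
# a CPU illustration of the pattern only; Python overhead dominates at small
# sizes, but the jump at DRAM-sized arrays still shows through.
import random
import time

def pointer_chase_ns(num_elements: int, hops: int = 2_000_000) -> float:
    """Average nanoseconds per dependent access over a shuffled index ring."""
    order = list(range(num_elements))
    random.shuffle(order)
    ring = [0] * num_elements
    for i in range(num_elements):                     # ring[i] -> next random slot
        ring[order[i]] = order[(i + 1) % num_elements]
    idx = 0
    start = time.perf_counter()
    for _ in range(hops):
        idx = ring[idx]                               # serial dependency chain
    return (time.perf_counter() - start) / hops * 1e9

if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):            # roughly cache-sized to DRAM-sized
        print(f"{n:>11,} elements: {pointer_chase_ns(n):6.1f} ns per access")
```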

(Image credit: Chips and Cheese)

Moving on, raw compute throughput is another category where Chips and Cheese saw the MI300X dominate Nvidia's H100. Instruction throughput is lopsidedly in favor of the AMD chip: at times the MI300X was 5x faster than the H100, and at worst it was roughly 40% faster. Chips and Cheese's instruction throughput results cover INT32, FP32, FP16, and INT8 operations.

It's also interesting to look at the current and previous generation results from these data center GPUs. The H100 PCIe shows stronger performance in certain workloads, like FP16 FMAs and adds, but elsewhere it's only slightly faster than the A100. AMD's MI300X, on the other hand, shows universally massive improvements over the previous-generation MI210.
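To make "instruction throughput" a bit more concrete, here's a rough sketch of the idea in PyTorch: issue a fused multiply-add across a large tensor at different precisions and count operations per second. Chips and Cheese used hand-written kernels that keep operands in registers; this elementwise version ends up memory-bound, so it only shows what is being counted, not how the real test works.

```python
# Rough sketch only: count elementwise FMAs per second at different precisions.
# A real instruction-throughput test loops on register-resident values inside
# one kernel; this version streams tensors from memory, so it is bandwidth-
# limited and the numbers will be far below peak compute.
import torch

def fma_rate_gops(dtype: torch.dtype, n: int = 1 << 26, iters: int = 20) -> float:
    a = torch.rand(n, device="cuda", dtype=dtype)
    b = torch.rand(n, device="cuda", dtype=dtype)
    c = torch.rand(n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        c = torch.addcmul(c, a, b)   # c = c + a * b, one FMA per element
    end.record()
    torch.cuda.synchronize()
    return n * iters / (start.elapsed_time(end) / 1000.0) / 1e9

if __name__ == "__main__":
    for dtype in (torch.float32, torch.float16):
        print(f"{dtype}: ~{fma_rate_gops(dtype):,.0f} G-FMA/s")
```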

One of the last and likely most important tests Chips and Cheese conducted was AI inference, not only with the MI300X and H100 but also with the GH200 (for one of the tests) — and unlike the low-level testing, the Nvidia GPUs here are the faster SXM variants. Chips and Cheese ran two tests, using Mixtral 8x7B and LLaMA3-70B. Apparently due to how the servers were rented, the hardware configurations are a more diverse and inconsistent lot, so not every configuration was tested in each benchmark.

The Mixtral results show how various configuration options can make a big difference — a single H100 80GB card runs out of memory, for example, while the MI300X without KV cache also performs poorly. The GH200 does much better, though the MI300X still holds a lead, while two H100 SXM5 GPUs achieve about 40% higher performance. (The two H100 GPUs were necessary to even attempt to run the model at the selected settings.)
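The KV cache point is worth unpacking: it's the per-token attention state that inference engines keep resident, and it grows with batch size and sequence length on top of the model weights. The sketch below uses the commonly published configurations for these models (layer counts, grouped-query KV heads, head size) plus an arbitrary batch size; treat all of those as our assumptions, not figures from the Chips and Cheese article.

```python
# Back-of-the-envelope KV-cache sizing (FP16). Model configs and the batch size
# are assumptions for illustration, not numbers from the article.
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, batch: int, bytes_per_val: int = 2) -> float:
    """K and V tensors: 2 x layers x kv_heads x head_dim x tokens x batch."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_val / 2**30

# Mixtral 8x7B (assumed config: 32 layers, 8 KV heads, head_dim 128)
print(f"Mixtral 8x7B, 4096 tokens, batch 32: {kv_cache_gib(32, 8, 128, 4096, 32):5.1f} GiB")
# LLaMA-3-70B (assumed config: 80 layers, 8 KV heads, head_dim 128)
print(f"LLaMA-3-70B,  4096 tokens, batch 32: {kv_cache_gib(80, 8, 128, 4096, 32):5.1f} GiB")
```

Numbers like these are why an 80GB card can fall over once weights, KV cache, and activations all have to fit at the same time.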

Shifting over to the LLaMA3-70B results, we get a different set of hardware. This time even two H100 GPUs failed to run the model due to a lack of memory (with input and output lengths set to 2048 and using FP16). A single H100 with INT8 also performed quite poorly with the same 2048 input/output length setting. Dropping the lengths to 128 improved performance quite a lot, though it was still far behind the MI300X. Two H100 GPUs with input/output lengths of 128 using INT8 finally start to look at least somewhat competitive.

The MI300X, with its massive 192GB of memory, was able to run both the 2048 and 128 lengths using FP16, with the latter providing the best result of 4,858. Unfortunately, Nvidia's H200 wasn't tested here due to time and server rental constraints. We'd like to have seen it, as it potentially would have delivered better results than the H100.
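Some quick arithmetic shows why the memory capacities line up with these results. The sketch below covers weights only; KV cache and activations come on top, which helps explain why even two 80GB H100s struggled at the 2048-token settings.

```python
# Weight footprint only (decimal GB), ignoring KV cache, activations, and
# framework overhead. 70B parameters at 2 bytes each (FP16) or 1 byte (INT8).
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

print(f"LLaMA-3-70B FP16: ~{weights_gb(70, 2):.0f} GB")  # ~140 GB: too big for one 80GB H100
print(f"LLaMA-3-70B INT8: ~{weights_gb(70, 1):.0f} GB")  # ~70 GB: one H100, with little headroom
print("MI300X HBM3 capacity: 192 GB")                    # room for FP16 weights plus KV cache
```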

More testing, please!

While the compute and cache performance results show how powerful AMD's MI300X can be, the AI tests clearly demonstrate that AI-inference tuning can be the difference between a horribly performing product and a class-leading product. The biggest problem we have with many AI performance results in general — not just these Chips and Cheese results — is that it's often unclear what level of optimizations are present in the software stack and settings for each GPU.

It's a safe bet that AMD knows a thing or two about improving performance on its GPUs. Likewise, we'd expect Nvidia to have some knowledge about how to improve performance on its hardware. Nvidia fired back at AMD's MI300X performance claims last year, for example, saying the numbers AMD presented were clearly suboptimal. And this is where some questions remain unanswered.

The introduction to the Chips and Cheese article says, "We would also like to thank Elio from NScale who assisted us with optimizing our LLM runs as well as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." Nvidia did not weigh in on whether the H100 results were reproducible. Even if they could be reproduced, were the tests being run in a less than optimal way? Again, getting access to MI300X and other server hardware via a company specifically promoting the MI300X can influence the testing and results.

Hopefully, future testing can also involve Nvidia folks, and ideally people from both parties can help with any tuning or other questions. And while we're talking about things we'd like to see, getting Intel's Ponte Vecchio or Gaudi 3 into the testing would be awesome. We'd also like to see the SXM variant of the H100 used for testing, as that's more directly comparable to the OAM MI300X GPU.

[Note: clamchowder provided additional details on Twitter about the testing and hardware. We've reached out to suggest some Nvidia contacts, which they apparently didn't have, because we really do appreciate seeing these sorts of benchmarks and would love to have any questions about testing methodology addressed. And if there's one thing I'm certain of, it's that tuning for AI workloads can have a dramatic impact. We've also adjusted the wording in several areas of the text to clarify things. —Jarred]

The conclusion begins with "Final Words: Attacking NVIDIA’s Hardware Dominance." That's definitely AMD's intent, and the CDNA 3 architecture and MI300X are a big step in the right direction. Based on these results, there are workloads where MI300X not only competes with an H100 but can claim the performance crown. However, as we've seen with so many other benchmarks of data center AI hardware, and as the site itself states, "the devil is in the details."

From using a clearly slower PCIe H100 card for the non-inference tests to a scattershot selection of hardware for inference benchmarks, there's the potential for missing pieces of information. Chips and Cheese tells us this was due to limited hardware availability, and getting access to this sort of hardware is difficult. In short, we want to see more of these sorts of benchmarks — independent tests — done in a way that lets all the hardware perform to the best of its ability.

The MI300X's raw cache, bandwidth, and compute results look very good. But these GPUs are also purpose-built for scale-out and large installations, so even if a single MI300X clearly beats a single H100 (or H200, for that matter), that doesn't say how the picture might change with dozens, hundreds, or even thousands of GPUs working in tandem. The software and ecosystem are also important, and Nvidia has held a lead there with CUDA in the past. Low-level benchmarks of hardware like this can be interesting, and the inference results show what can happen when your GPU doesn't have enough VRAM for a particular model. However, we suspect this is far from the final word on the AMD MI300X and Nvidia H100.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • Makaveli
Why does it seem like this write-up of the Chips and Cheese article has a negative spin on it in the conclusion here?

    That whole take these results with a pinch of salt section....
    Reply
  • edzieba
    There's a big truckload of salt not mentioned with that pinch: they benchmarked the chips for model inference, but everyone is buying up H100s for model training. Two different workloads.

There are two other frontpage articles that gloss over this fundamental error ( & ), which would be like comparing consumer GPUs based on which can play back FMVs with the lowest power - it's nice, but nothing to do with why you bought the GPU in the first place.
    Reply
  • bit_user
    edzieba said:
    There's a big truckload of salt not mentioned with that pinch: they benchmarked the chips for model inference, but everyone is buying up H100s for model training. Two different workloads.
    True, but it's not practical to train a LLM on a single GPU, no matter how powerful. There are lighter-weight models they could use to benchmark training, but then you'd have to consider whether those are a good proxy for LLM training performance.

    MLPerf might be a good avenue to explore, though I have no experience with it:
    https://mlcommons.org/benchmarks/training/
    Maybe @cheesecake1116 knows more about this decision.
    Reply
  • KnightShadey
    This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.


    *OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*


    These parts seem inaccurate at best, and possibly deliberately misleading given the rest of the tone, and statement of possible bias (which is just shy of a flat out accusation of intent to deceive IMO).

    "Chips and Cheese also mentions getting specific help from AMD with its testing" ... "so there could be some bias in the benchmark results "
    bookended with another accusation not supported by the original write-up..

    as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." No mention is made of any consultation with any Nvidia folks, and that suggests this is more of an AMD-sponsored look at the MI300X.
    The use of the term sponsorship doubles-down on the implied bias, for something not conveyed by the testing or the acknowledgment itself which is more about ensuring external validity and reproducibility regarding the product that is the focus of the testing.

    When THG does reviews and asks for the latest drivers for pre-release hardware, to ensure tests reflect shipping final production hardware, and doesn't allow competitors to provide pre-production drivers/software or early access to the review, does that then become a 'sponsored' article because it required assistance from the products mfr/vendor to ensure validity without equal time/efforts/words focused on the competitors hardware?

    Sure there should be a bunch of caveats, and you can disagree with the chosen tests, methodology, but this article, and the above in particular goes well beyond that.

    Even C&C's testing also shows there is a lot of untapped potential in the MI300X, and it's still on AMD to improve the ecosystem that can provide that potential to customers instead of leaving that potential/value locked behind less than stellar software support, and that shortcoming is clearly also addressed in the article too.

    Perhaps in the effort to seem balanced THG's overshot the mark, but it has resulted in the appearance of defensive bias on Nvidia's behalf or worse. 🫤

    I could be wrong, as you (we all) could be, but it sure feels like an attempt to undermine their work.
    Reply
  • JarredWaltonGPU
    KnightShadey said:
    This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.


    *OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*


    These parts seem inaccurate at best, and possibly deliberately misleading given the rest of the tone, and statement of possible bias (which is just shy of a flat out accusation of intent to deceive IMO).

    "Chips and Cheese also mentions getting specific help from AMD with its testing" ... "so there could be some bias in the benchmark results "
    bookended with another accusation not supported by the original write-up..

    as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." No mention is made of any consultation with any Nvidia folks, and that suggests this is more of an AMD-sponsored look at the MI300X.
    The use of the term sponsorship doubles-down on the implied bias, for something not conveyed by the testing or the acknowledgment itself which is more about ensuring external validity and reproducibility regarding the product that is the focus of the testing.

    When THG does reviews and asks for the latest drivers for pre-release hardware, to ensure tests reflect shipping final production hardware, and doesn't allow competitors to provide pre-production drivers/software or early access to the review, does that then become a 'sponsored' article because it required assistance from the products mfr/vendor to ensure validity without equal time/efforts/words focused on the competitors hardware?

    Sure there should be a bunch of caveats, and you can disagree with the chosen tests, methodology, but this article, and the above in particular goes well beyond that.

    Even C&C's testing also shows there is a lot of untapped potential in the MI300X, and it's still on AMD to improve the ecosystem that can provide that potential to customers instead of leaving that potential/value locked behind less than stellar software support, and that shortcoming is clearly also addressed in the article too.

    Perhaps in the effort to seem balanced THG's overshot the mark, but it has resulted in the appearance of defensive bias on Nvidia's behalf or worse. 🫤

    I could be wrong, as you (we all) could be, but it sure feels like an attempt to undermine their work.
    Whether you like it or not, receiving help to get things running from AMD and not receiving equivalent input from Nvidia is inherently biased. I appreciate what Chips and Cheese has done, I think it's interesting, and we wrote up an article to promote it. But we have to call out potential issues, because those issues are absolutely real.

FWIW, I have reached out to clamchowder and offered to provide my Nvidia contacts. Whether or not Nvidia will respond and offer help is beside the point. It should be allowed to at least give some suggestions. I always do this when looking at new and unusual benchmarks, and likewise when I run tests on a game where results seem odd.

    I get accused of being heavily biased in favor of Nvidia on a regular basis — like for example because I have a bunch of ray tracing enabled games that I use in benchmarks and reviews. But not having those would show significant bias in the other direction. It's a catch-22, damned if you do, damned if you don't sort of situation. Chips and Cheese gets to be in the same boat, simply by virtue of writing about MI300X and H100. (What, no Ponte Vecchio benchmarks!? LOL)

    What I can tell you is that I'm routinely in contact with representatives from all three GPU companies to field questions and potentially find workarounds to problems I encounter. The Stable Diffusion benchmarks are a great example of this. When I first started testing SD performance, I couldn't get it to work on anything but Nvidia GPUs, because they were the default that most public projects targeted. Over time, I received instructions that allowed me to get things running on AMD and Intel GPUs. It took over six months in some cases... and now we have Stable Diffusion 3 and ComfyUI, and I'm almost back to square one it feels.
    Reply
  • bit_user
    KnightShadey said:
    This article does seem to have a heavy slant on it due to all the salt & shade being pre-applied.


    *OR* since we're throwing around unsubstantiated ulterior motive accusations willy-nilly, could this be a hit-piece to undermine C&C's article in the lead-up to THG's own forthcoming article mentioned in yesterday's Geekbench thread? 🤔🤨 *dramatic conspiracy music plays*

    Nah, I think Toms knows they're not in quite the same market as Chips&Cheese. I've never seen them do anything to actively undermine another tech publication, like you're suggesting.

    A lot of these writers and editors have worked for different publications and outlets. I doubt any of them would want to burn bridges like that, especially for such small potential upside.
    Reply
  • cheesecake1116
    bit_user said:
    True, but it's not practical to train a LLM on a single GPU, no matter how powerful. There are lighter-weight models they could use to benchmark training, but then you'd have to consider whether those are a good proxy for LLM training performance.

    MLPerf might be a good avenue to explore, though I have no experience with it:
    https://mlcommons.org/benchmarks/training/
    Maybe @cheesecake1116 knows more about this decision.
    So we tried to run MLPerf training but it ended up being a lot more involved then I was thinking it was going to be and it would have taken more time then we had, it's why we didn't have any training data.... we did try....

    As for MLPerf Inference, we decided that running actual LLM models on the hardware would be more interesting data.... that was our logic, for better or for worse.....

    As for the Tom's article itself, I have major issues with it.....

    For example, the paragraph that says that we used H100 PCIe for our inference results is just wrong... We clearly stated that our inference results were run on H100 SXM, even in the charts....
    Also, we did not receive any assistance from AMD.... All AMD did was verify that our numbers were reproducible on their systems... that is all....
    Frankly it was a judgement call on whether or not to even mention AMD here, but I thought it was more honest to say that we did reach out to AMD then to say nothing......

    As for why no GH200 numbers for LLaMA3-70B, it's simple, we didn't have access to that machine anymore.... One of the Co-Authors', Neggles, rented that machine out, from Hydra.ai, for testing and we only had that machine for about 6 hours thanks to a mess up on Hydra's end. And we weren't just interested in the Hopper part of the system but also the Grace part of that system so we were running CPU microbenchmarks on it as well which limited the amount of time we had for testing even more.....

    Luckily Neggles had access to a H100 SXM machine where we ran the rest of our inference testing, we didn't have the chance to rerun the rest of our suite hence why we disclaimed at the start why our numbers for the microbenchmarks and the OCL tests were run on H100 PCIe... Would I have liked to rerun all of our benchmarks on H100 SXM or GH200, of course... But we simply didn't have the time or resources to do so.....

    That's why the article is as it is..... we are not a large site with a ton of resources.... we take what we can get and we do our best with what we have......
    Reply
  • JarredWaltonGPU
    cheesecake1116 said:
    So we tried to run MLPerf training but it ended up being a lot more involved then I was thinking it was going to be and it would have taken more time then we had, it's why we didn't have any training data.... we did try....

    As for MLPerf Inference, we decided that running actual LLM models on the hardware would be more interesting data.... that was our logic, for better or for worse.....

    As for the Tom's article itself, I have major issues with it.....

    For example, the paragraph that says that we used H100 PCIe for our inference results is just wrong... We clearly stated that our inference results were run on H100 SXM, even in the charts....
    Also, we did not receive any assistance from AMD.... All AMD did was verify that our numbers were reproducible on their systems... that is all....
    Frankly it was a judgement call on whether or not to even mention AMD here, but I thought it was more honest to say that we did reach out to AMD then to say nothing......

    As for why no GH200 numbers for LLaMA3-70B, it's simple, we didn't have access to that machine anymore.... One of the Co-Authors', Neggles, rented that machine out, from Hydra.ai, for testing and we only had that machine for about 6 hours thanks to a mess up on Hydra's end. And we weren't just interested in the Hopper part of the system but also the Grace part of that system so we were running CPU microbenchmarks on it as well which limited the amount of time we had for testing even more.....

    Luckily Neggles had access to a H100 SXM machine where we ran the rest of our inference testing, we didn't have the chance to rerun the rest of our suite hence why we disclaimed at the start why our numbers for the microbenchmarks and the OCL tests were run on H100 PCIe... Would I have liked to rerun all of our benchmarks on H100 SXM or GH200, of course... But we simply didn't have the time or resources to do so.....

    That's why the article is as it is..... we are not a large site with a ton of resources.... we take what we can get and we do our best with what we have......
    Thanks for coming here, cheesecake, and I really don't want any bad blood between us. News pieces do get funneled through and things slip in that maybe shouldn't be. I know my initial response when reading the writeup on your site was, "Wait, why didn't they ask Nvidia for comment / help / input?"

    We've tried to make it clear in the text that the PCIe H100 was used for the low-level testing, while SXM H100 (and GH200) were used for the inference testing. If there's a specific sentence that I've missed that suggests PCIe was used, LMK and I'll fix that.

    I've also toned down any rhetoric suggesting AMD sponsorship. That was my bad to begin with, because I've seen stuff over the years where it's obvious one company basically provided a lot of help to do specific testing that would make their products look better. And whether intentional or not, this MI300X piece feels a bit that way (mostly due to a lack of resources, which I totally get).

    If there's anything still in the text (refresh it to get the latest updates) that really strikes you as being off base or wrong, let me know.
    Reply
  • KnightShadey
    JarredWaltonGPU said:
    Whether you like it or not, receiving help to get things running from AMD and not receiving equivalent input from Nvidia is inherently biased.

    It's not whether I like it or not, it's accusations towards actions that don't rise to the level of those statements, which at least to me, seems unwarranted and factually incorrect, or at the very least bad form, especially if you have experienced similarly likely baseless claims. 🤨
    Again, did AMD get things running, or just help make sure that their results were valid for other MI300X? Those are two different claims. One implies a direct hand, the other implies confirming results or informing of dissimilarities with other MI300s

    The criticism as presented seems too harsh for what amounts to the crime of omission or limitations due to time with the hardware which needed to be returned to Hot Aisle.

    Additionally, it's not inherently biased, which implies a deliberate act for/against that is unreasonable (you had time but didn't bother) or prejudicial (your obvious hatred of Apple, errr.. ATi, errr.. whatever) to the point of invalidating the content, requiring a willful act. Whereas throughout C&C constantly acknowledged the limitations of their setup and testing, even questioning too favourable AMD results, including (but not limited to);

    Starting the 3rd Paragraph;
    " Please note that all of our H100 data, except for the inference data, was generated using the PCIe version of H100, which features slower HBM2e memory, fewer CUDA cores, and a reduced TDP of 350 watts. Our inference data was generated on a H100 SXM5 box which has 3.35TB per second of memory bandwidth (compared to the 2.04TB/s of H100 PCIe), 18 more SMs compared to the PCIe version, as well as having an increased TDP of 700 watts.:"

    "My INT32 add test also behaves as if MI300X is packing values and performing math at double rate, but something definitely went wrong.."

    "NVIDIA doesn’t advertise support for FP16 through OpenCL, so I’m testing on H100 by not doing the check and using the half datatype anyway."

    and my personal fav as mentioned in the other thread the CGP test (which also included the Radeon 780M and 610M because... everyone is using those for training, and is Industry Standard hardware... or perhaps for humour & illumination... you decide);
    "I spent about a day writing this and did not put effort into optimization. The code should be considered representative of what happens if you throw a cup of coffee at a high school intern and tell them to get cracking."
    "The Raphael iGPU included with AMD’s Zen 4 desktop CPUs is another option. It can do the same work in 4.2 hours, which leaves time for a coffee break."

    Sure it's not representative of the full picture due to choices made and limitations they expressed, but that is different from bias.

    Also for what opened & ended with comments on the lack of CUDA support for AMD hardware and their struggles to address that, I don't see many reviewers saying they refuse to use nV tools as it disadvantages the others, they just say they tried to use equivalent tools... To what extent do you go with what you have in front of you and can easily access? Does everyone get a hotline to every IHV/ISV and delay their investigation?

    nVidia has the same right as anyone else, the right of reply. And they will. of that I am sure. And if C&C get the chance I doubt they'd turn down an opportunity for a follow-up investigation.

    My issue is that the level of accusations/criticism/skepticism of a well documented test is harsher than the criticism of THG's reporting of nVidia claims of future CopilotAsterixRT++ support that they are supposedly working on with M$, or AMD's IPC slide claims, or intel's claims, all with far FAR Less supporting details or caveats than that opening 3rd paragraph.

    The balance of skepticism seems to be focused in the wrong area IMO...

    ... but as ever that my 2 frames worth, your mileage may vary. 🤷🏻‍♂️
    Reply
  • KnightShadey
    Well, guess I shouldn't have wasted time re-reading the article if Cheese was going to reply himself...... Oh well. 🤣

    ps, I STILL stand by my Original Claim.. Soup Should Be Eaten WITH A SPORK !!🤪
    Reply