'Clarified' US sanctions do not impact Nvidia RTX 4090D 'Dragon' or H20 GPUs [Updated]

Nvidia RTX 4090D
(Image credit: Nvidia)

Update 4/4/2024 6:15am PT: We have been notified that the refined and 'clarified' U.S. sanctions do not affect Nvidia's existing sanctions-compliant GPUs for China, specifically the H20 and RTX 4090D.

The new document includes "Corrections and Clarifications" to the export controls, and some of its language was confusing and was misinterpreted, by us and by other sites. Specifically, the document details "adjusted peak performance" (APP) and "weighted teraflops" (WT), with the threshold set at 70 weighted teraflops. We have received additional information from Nvidia on the restrictions and clarifications, and the short version is that the sanctions-compliant H20 and 4090D GPUs are not affected.

The reasons the 4090D isn't affected come down to the definitions. First, the guidelines apply to computer systems, not individual GPUs, and more specifically to systems with memory coherence. A 4-way DGX H100 system, for example, would fall under this classification.

In an email, Nvidia states: "Processor combinations share memory when any processor is capable of accessing any memory location in the system through the hardware transmission of cache lines or memory words, without the involvement of any software mechanism, which may be achieved using “electronic assemblies” specified in 4A003.c, z.1, or z.3."

The other important detail is that "adjusted peak performance" applies to FP64 throughput, and it's "weighted" because the value is scaled according to whether it comes from a vector processor or a scalar (non-vector) processor. In other words, FP64 done via vector units like Nvidia Tensor cores is treated differently from FP64 done via a CPU running 64-bit calculations. (That's a simplification, as CPUs can also include vector units.)

To determine the "weighted teraflops" and "adjusted peak performance," take the aggregate FP64 throughput of the system. Then multiply by 0.9 for vector processors or by 0.3 for non-vector processors. So going back to the 4-way DGX H100 as an example, the H100 SXM variant of the GPU has 67 teraflops of vector FP64 throughput. Four of them in aggregate would deliver 268 teraflops, and multiplied by 0.9 gives 241.2 — well above the 70 weighted teraflops limit. And of course, the HGX H100 would have already been restricted even prior to the more recent updates.
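The weighting described above is simple arithmetic, and the sketch below reproduces the 4-way H100 example. The helper name and the choice to pass a per-processor FP64 figure plus a processor count are our own illustration, not anything defined in the regulation; it applies only the 0.9 and 0.3 weights as summarized in this article.

```python
# Weights as summarized in this article: 0.9 for vector processors,
# 0.3 for non-vector (scalar) processors.
VECTOR_WEIGHT = 0.9
SCALAR_WEIGHT = 0.3
WT_LIMIT = 70.0  # weighted TFLOPS threshold discussed above


def weighted_tflops(fp64_tflops_per_processor, count, is_vector=True):
    """Aggregate FP64 throughput of a memory-coherent system, then apply the weight.

    Hypothetical helper for illustration only, not an official implementation.
    """
    aggregate = fp64_tflops_per_processor * count
    weight = VECTOR_WEIGHT if is_vector else SCALAR_WEIGHT
    return aggregate * weight


# 4-way H100 SXM system from the example: 67 TFLOPS of vector FP64 per GPU.
wt = weighted_tflops(67.0, 4, is_vector=True)
print(f"{wt:.1f} weighted TFLOPS, over limit: {wt > WT_LIMIT}")
```

Run as-is, this prints "241.2 weighted TFLOPS, over limit: True", matching the figure above.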

So, what has actually changed? In short, not much. These are not new export controls or restrictions but rather an addendum that attempts to clarify the official "speed limits." The RTX 4090D, for its part, offers very little FP64 throughput, only 1.15 TFLOPS, though its 4,707 TPP still comes close to the 4,800 TPP limit.
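For a single RTX 4090D, the same arithmetic shows how far below the weighted-teraflops line it sits. This sketch rests on two assumptions of ours: that consumer Ada Lovelace GPUs run FP64 at 1/64 of their FP32 rate (which is consistent with the 1.15 TFLOPS figure above), and that the card is conservatively treated as a vector processor with the 0.9 weight.

```python
FP64_RATIO_ADA = 1 / 64  # assumption: consumer Ada FP64 rate relative to FP32
VECTOR_WEIGHT = 0.9      # vector-processor weight, as summarized above
WT_LIMIT = 70.0          # weighted TFLOPS threshold


def fp64_from_fp32(fp32_tflops, ratio=FP64_RATIO_ADA):
    """Estimate FP64 throughput from FP32 throughput and the FP64:FP32 ratio."""
    return fp32_tflops * ratio


fp64 = fp64_from_fp32(73.5)   # RTX 4090D FP32 TFLOPS, from this article
wt = fp64 * VECTOR_WEIGHT     # treat the GPU as a vector processor
print(f"FP64 ~{fp64:.2f} TFLOPS, weighted ~{wt:.2f} (limit {WT_LIMIT})")
```

Under these assumptions a single card lands at roughly 1 weighted TFLOPS, nowhere near 70, and consumer cards are not memory-coherent systems in the first place.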

Original unedited article (which misinterpreted the 'clarifications' described above):

The United States government has revised its Chinese semiconductor export restrictions to encompass more high-performance hardware. Specifically, any semiconductor chip offering over 70 "Weighted TeraFLOPS" of performance is now banned from export to China without a license. This lowered limit now includes Nvidia's Chinese-exclusive RTX 4090D "Dragon" graphics card.

The RTX 4090D was created several months ago specifically to comply with U.S. export bans on China. The RTX 4090 exceeded the 4,800 Total Processing Power (TPP) limit by 10%, so Nvidia created the 4090D to come in below that limit (it lands at 4,707 TPP). Remarkably, the new 70 TFLOPS limit is only about 5% lower than the RTX 4090D's 73.5 TFLOPS performance figure.

While this change was seemingly inevitable, we have to question whether it's even meaningful. After the launch of the RTX 4090D, the U.S. government has warned Nvidia that its tactics wouldn't go unnoticed, and it has now moved to ban Nvidia's China-exclusive GPU. But does a 5% reduction in the GPU 'speed limit' even matter, and if so, what happens when Nvidia makes a new GPU that comes in below that limit?

The RTX 4090D is a cut-down variant of the RTX 4090, featuring 14,592 CUDA cores and a 425W TGP. Compared to the outgoing RTX 4090, the RTX 4090D has 10.9% fewer CUDA cores and a 5.6% lower TGP. Apart from the unit counts that scale with the number of SMs, the core specifications are the same between the two, with one exception: the base clock, which has been raised slightly to 2,280 MHz from 2,235 MHz.
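Percent differences like these are easy to get backwards, because dividing by the 4090D's figure instead of the 4090's inflates them. The quick check below uses the numbers from the specification table and measures every change relative to the RTX 4090; the dictionary layout and helper name are just for illustration.

```python
# Figures from the specification table in this article: (RTX 4090D, RTX 4090).
specs = {
    "CUDA cores": (14_592, 16_384),
    "TGP (W)": (425, 450),
    "Base clock (MHz)": (2_280, 2_235),
}


def pct_change(new, old):
    """Percent change going from the RTX 4090 (old) to the RTX 4090D (new)."""
    return (new - old) / old * 100


for name, (cut_down, full) in specs.items():
    print(f"{name}: {pct_change(cut_down, full):+.1f}%")
```

This prints -10.9% for CUDA cores, -5.6% for TGP, and +2.0% for the base clock.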

RTX 4090D vs 4090 Specifications
Specification           RTX 4090D      RTX 4090
SMs                     114            128
CUDA Cores              14,592         16,384
Tensor Cores            456            512
RT Cores                114            128
Boost Clock             2,520 MHz      2,520 MHz
Base Clock              2,280 MHz      2,235 MHz
VRAM Speed              21 Gbps        21 Gbps
VRAM Capacity           24GB GDDR6X    24GB GDDR6X
VRAM Bus Width          384-bit        384-bit
VRAM Bandwidth          1,008 GB/s     1,008 GB/s
L2 Cache                72MB           72MB
ROPs                    176            176
TMUs                    456            512
TGP                     425W           450W
Total Processing Power  4,707          5,285

According to other websites that have tested the card, the RTX 4090D is roughly 10% slower than the RTX 4090 in AI workloads and only 5% slower in gaming. Ironically, Nvidia never fully "locked" the RTX 4090D, enabling Chinese gamers and professionals to overclock the RTX 4090D to RTX 4090 performance levels.

The RTX 4090D was expressly designed for the purpose of complying with America's China export restrictions. These laws were put in place to prevent China and non-NATO countries from acquiring too much computing power — particularly AI processing power — for security reasons. These sanctions have been repeatedly changed over the past few years, first targeting data center chips like the Nvidia A100 and Nvidia H100, but later the RTX 4090 fell victim to the restrictions as it was 10% "too fast."

The current metric used to define the maximum allowed performance is known as TPP, or Total Processing Power. It is calculated by taking the peak compute at a given bit depth, in TFLOPS (or TOPS for integer work), and multiplying it by the number of bits. For the RTX 4090, TPP is 660.6 * 8 ≈ 5,285 for FP8 work running on the Tensor cores (sparsity doesn't count).
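TPP can also be worked out from the hardware counts in the table. The sketch below assumes each Ada Lovelace Tensor core performs 512 dense FP8 operations per clock; that figure does not appear in this article and is our inference from Nvidia's published dense FP8 peak for 512 Tensor cores at a 2.52 GHz boost clock. With that assumption, both TPP values in the table fall out.

```python
TPP_LIMIT = 4800
FP8_OPS_PER_CLOCK_PER_TC = 512  # assumption: dense FP8 ops per Ada Tensor core per clock
FP8_BITS = 8


def fp8_tflops(tensor_cores, boost_ghz):
    """Peak dense FP8 throughput in TFLOPS (sparsity excluded).

    cores * GHz * ops-per-clock gives giga-ops per second; divide by 1000 for tera.
    """
    return tensor_cores * boost_ghz * FP8_OPS_PER_CLOCK_PER_TC / 1000


def tpp(tensor_cores, boost_ghz):
    """Total Processing Power: peak TFLOPS multiplied by the operation's bit width."""
    return fp8_tflops(tensor_cores, boost_ghz) * FP8_BITS


for name, tensor_cores in (("RTX 4090", 512), ("RTX 4090D", 456)):
    value = tpp(tensor_cores, 2.52)
    print(f"{name}: {fp8_tflops(tensor_cores, 2.52):.1f} TFLOPS FP8, "
          f"TPP {value:.0f}, under {TPP_LIMIT}: {value < TPP_LIMIT}")
```

The RTX 4090 comes out at about 660.6 TFLOPS and a TPP of 5,285 (over the limit), while the RTX 4090D lands at about 588.3 TFLOPS and 4,707 (under it). Cutting 56 Tensor cores, rather than lowering clocks, is what moves the 4090D under the line.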

The new regulations apparently change the defined limit to include "Weighted TeraFLOPS" but neglect to clearly define what that means. Based on the language, however, we assume this refers to FP32 TFLOPS. For reference, the RTX 4090 offers 82.6 TFLOPS of compute, while the RTX 4090D drops that to 73.5 TFLOPS, and the next step down among Nvidia's consumer GPUs is the RTX 4080 Super at 'only' 52.2 TFLOPS. Note also that these repeatedly lowered limits are starting to encroach on AMD's RX 7900 XTX, which offers 61.4 TFLOPS of compute.
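The FP32 figures for the Nvidia cards follow the usual peak-throughput rule: CUDA core count times two operations per clock (one fused multiply-add) times the boost clock. The sketch below applies that rule under the same FP32 reading of the limit assumed above. It also checks the hypothetical 108-SM "Double Dragon" configuration proposed later in this article, under our added assumption that it keeps the 2.52 GHz boost clock and 128 CUDA cores per SM.

```python
FP32_OPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as two FLOPs
ASSUMED_FP32_LIMIT = 70.0        # the article's assumed FP32 reading of the new limit


def fp32_tflops(cuda_cores, boost_ghz):
    """Peak FP32 TFLOPS from CUDA core count and boost clock in GHz."""
    return cuda_cores * FP32_OPS_PER_CORE_PER_CLOCK * boost_ghz / 1000


cards = {
    "RTX 4090": 16_384,
    "RTX 4090D": 14_592,
    "RTX 4090 DD (hypothetical, 108 SMs x 128 cores)": 108 * 128,
}
for name, cores in cards.items():
    t = fp32_tflops(cores, 2.52)
    print(f"{name}: {t:.1f} TFLOPS, under 70: {t < ASSUMED_FP32_LIMIT}")
```

This yields 82.6, 73.5, and 69.7 TFLOPS respectively, so only the hypothetical part would slip under the line.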

While the government doesn't specifically name the RTX 4090D as the reason for the new restrictions, it's a safe bet that the card will be discontinued in the near future. Nvidia might keep the 4090D around and rebrand it for a Western audience... or it might just come out with a new GPU that once again complies with the sanctions shenanigans. Let's call it the RTX 4090 DD "Double Dragon" and give it 108 Streaming Multiprocessors and 13,824 CUDA cores, and perhaps a 400W TGP — and most importantly, 69.7 TFLOPS of compute. Then it would once again become compliant, shift tens of thousands of units into China, and probably result in yet another cut to the allowed export performance.

The new restrictions will reportedly go into effect on April 4, 2024. Nvidia has not announced any response to the lower limits, though of course it will comply with them. But fundamentally, if 80 TFLOPS was too fast, and now 70 TFLOPS is also too fast, at some point the U.S. needs to set a hard limit and stick with it — or it will end up 'banning' GPUs that have long since been discontinued. It's doubtful the restrictions are even fully effective, as there are still plenty of Chinese customers hungry for GPUs, and the hardware likely continues to flow into the country through indirect means.

Aaron Klotz
Freelance News Writer

Aaron Klotz is a freelance writer for Tom’s Hardware US, covering news topics related to computer hardware such as CPUs and graphics cards.

  • ThomasKinsley
    After the launch of the RTX 4090D, the U.S. government has warned Nvidia that its tactics wouldn't go unnoticed, and it has now moved to ban Nvidia's China-exclusive GPU.
    Ah, yes. The questionable tactics of . . . *squints* . . . complying with US sanctions.
  • atomicWAR
    I have been waiting for this to happen. Regardless of whether you approve of the sanctions, the constant compute-lowering tactics by both sides are getting a little old. Why make a standard for manufacturers to follow, have them comply, only to lower the processing power bar again so their new 'high-end' products must be redesigned to fall under the new max allowable limit? You want sanctions, that is all fine and well, but make up your mind about what you're trying to limit. Because you know companies will design products that walk right up to that line. Not that Nvidia is free of guilt here, but, AND I can't believe I am going to say this, they aren't doing anything that any other company wouldn't do.

    Let's make an analogy. Let's say the government decided it wanted to sanction soda cans sold in large quantities to country X. First they say, look, we can't have anything sold that contains more than eight cans of soda in a single package. So Coke stops selling twelve-packs of cans and starts selling their newly designed eight-packs as their high-end can count. Then a few months later the government goes, 'Hey, we see what you're doing here, trying to skirt right up to the edge of what the law allows, and we don't like it,' so they lower the bar on how much you can sell again, but this time make it so soda manufacturers can only sell packages that have six cans. So if the government only ever wanted soda makers to sell six-packs, why not start there? This isn't political; it's just common sense. You make a rule/law...someone will always push right up to the edge of what is allowed.

    If governments didn't want China to have anything faster than a 4080/7900 XTX, for example, then just say that, be it literally (only 4080s and 7900 XTXs) or via their compute numbers. smh at compute antics...
  • JarredWaltonGPU
    atomicWAR said:
    I have been waiting for this to happen. Regardless of whether you approve of the sanctions, the constant compute-lowering tactics by both sides are getting a little old. Why make a standard for manufacturers to follow, have them comply, only to lower the processing power bar again so their new 'high-end' products must be redesigned to fall under the new max allowable limit? You want sanctions, that is all fine and well, but make up your mind about what you're trying to limit. Because you know companies will design products that walk right up to that line. Not that Nvidia is free of guilt here, but, AND I can't believe I am going to say this, they aren't doing anything that any other company wouldn't do.

    Let's make an analogy. Let's say the government decided it wanted to sanction soda cans sold in large quantities to country X. First they say, look, we can't have anything sold that contains more than eight cans of soda in a single package. So Coke stops selling twelve-packs of cans and starts selling their newly designed eight-packs as their high-end can count. Then a few months later the government goes, 'Hey, we see what you're doing here, trying to skirt right up to the edge of what the law allows, and we don't like it,' so they lower the bar on how much you can sell again, but this time make it so soda manufacturers can only sell packages that have six cans. So if the government only ever wanted soda makers to sell six-packs, why not start there? This isn't political; it's just common sense. You make a rule/law...someone will always push right up to the edge of what is allowed.

    If governments didn't want China to have anything faster than a 4080/7900 XTX, for example, then just say that, be it literally (only 4080s and 7900 XTXs) or via their compute numbers. smh at compute antics...
    The problem is that the government can't specifically target a product or company by name. So it can define rules and say, "nothing above this level" and then it can say what products fail that criterion, but it can't say, "no RTX 4090 cards."

    But I totally agree with the rest of what you're saying. It's asinine that the US DoC created rules, realized that they didn't like how companies were complying with those rules, and so created new rules that affected more products... and then when products were still being sold, lowered the limit yet again. That's idiot bureaucracy at its worst.

    And an even bigger part of the problem here is that if the US wants to limit sales of AI and GPU hardware that can do 70 teraflops of FP32, well, China just needs twice as many 35+ teraflops GPUs. And so many GPUs have already been sold to China before the updated restrictions were put into place that it's an attempt to put the cat back in the box.

    I felt like the initial rules were fine and were designed to prevent the sale of future GPUs to China. And they would have worked for that — nothing using Blackwell B200 will be allowed for sale in China. But trying to retcon the whole situation will never work.
  • Notton
    Okay, so hear me out.
    The people implementing these sanctions have no idea what a TFLOP is.
    They are out of touch with technology, and haven't bothered to do their homework by hiring an expert in the field.
    They probably think TFLOP is a physical object that gets consumed when it is used, and applied sanctions like they were done historically for consumables.

    They are, literally, trying to put restrictions on how fast mathematics can be done.
    I am sure we all know that 1+1=2, and all you have to do is buy 2x 4080s to circumvent this new rule.
  • Amdlova
    Notton said:
    Okay, so hear me out.
    The people implementing these sanctions have no idea what a TFLOP is.
    They are out of touch with technology, and haven't bothered to do their homework by hiring an expert in the field.
    They probably think TFLOP is a physical object that gets consumed when it is used, and applied sanctions like they were done historically for consumables.

    They are, literally, trying to put restrictions on how fast mathematics can be done.
    I am sure we all know that 1+1=2, and all you have to do is buy 2x 4080s to circumvent this new rule.
    GPUs don't scale well... remember the SLI and CrossFire days.
    The RTX 4090D is an attempt to defy the American government. Nvidia should be sanctioned by Uncle Sam.
  • SirStephenH
    The reason for the change is obviously because the "compliant" 4090D can easily be overclocked to full 4090 performance. Maybe instead of simply lowering the overall performance limit the government could, oh, I don't know, target the specific ways manufacturers can skirt the law.
  • JarredWaltonGPU
    Amdlova said:
    GPUs don't scale well... remember the SLI and CrossFire days.
    The RTX 4090D is an attempt to defy the American government. Nvidia should be sanctioned by Uncle Sam.
    We’re not talking about scaling for games. AI tends to scale much better with multi-GPU, though the inter-GPU communications do become a serious bottleneck as you start moving to hundreds and thousands of GPUs.
    SirStephenH said:
    The reason for the change is obviously because the "compliant" 4090D can easily be overclocked to full 4090 performance. Maybe instead of simply lowering the overall performance limit the government could, oh, I don't know, target the specific ways manufacturers can skirt the law.
    First, the 4090D isn’t skirting the law. It was in full compliance. The law just changed (again) because of morons who don’t know how the tech industry works.

    Second, I can guarantee it’s not about end user overclocking. That only hit the news recently, and this was in the works basically since the last changes happened. Large-scale installations are not going to bother with redlining the cards for 5-10 percent more performance if it compromises stability.

    The problem is that the sanctions aren’t working as well as the govt would like and so they keep trying to plug the holes with their thumbs. Meanwhile, head-sized holes keep leaking.
  • tracker1
    The govt should have set the limit at the second tier. Like between the 7900xt and xtx. Just chopping off the highest end altogether.

    I think some of the people involved just like the conflict and drama in the back and forth.

    As TFA mentioned, it's likely not cutting off the resellers and side markets, just capping NVidia. Not that I mind knocking NVidia down a peg or two.
  • hotaru251

    Nvidia never fully "locked" the RTX 4090D, enabling Chinese gamers and professionals to overclock the RTX 4090D to RTX 4090 performance levels.
    which you know was done on purpose... Nvidia is very proactive about limiting and locking performance down; any time they leave performance like that on the table, it's by choice.
  • bit_user
    JarredWaltonGPU said:
    First, the 4090D isn’t skirting the law. It was in full compliance.
    Being overclockable to exceed the limit shows Nvidia acting in bad faith. IMO, they should get slapped with a fine amounting to multiple times the value of any such units sold in China. Not that they would feel it, right now, but such a violation should not pass without some response. Especially after all the noise they made about wanting to cooperate.