Nvidia's Jensen Huang expects GAA-based technologies to bring a 20% performance uplift
Is that enough for GPU production to jump to a cutting-edge node?

Nvidia's Jensen Huang said during a Q&A session at GTC that next-generation process technologies relying on gate-all-around (GAA) transistors will likely bring about a 20% performance boost for the company's processors, reports EE Times. However, the most significant performance uplifts for Nvidia's GPUs come from the company's architectures and software innovations.
When asked about future-generation Nvidia GPU architectures like Feynman, which is expected two generations from now (2028), Huang mentioned that if Nvidia transitions to a process technology that relies on GAA transistors, it should bring about a 20% increase in performance.
Our own Jarred Walton was at the Q&A, and says Huang seemed to downplay the importance of process node changes, emphasizing that the slowdown in Moore's Law means brand-new process technologies going forward are only likely to bring around a 20% improvement — in density, power, and/or efficiency. It wasn't a definitive statement on what node Nvidia might intend to use, though the answer was in response to an analyst question looking for his comments about the potential for Nvidia to use Samsung Foundry in particular.
Huang also noted that while improvements enabled by leading-edge process technologies are welcome, they're no longer transformative. "We'll take it," he said, but indicated other factors were more important. As AI systems scale, the efficiency of managing vast numbers of processors is becoming more important than the raw performance of each processor. Data centers are increasingly looking at performance per watt, Jensen said, noting that "we're at the limit of physics."
Unlike Apple, which is TSMC's alpha customer for all leading-edge nodes, Nvidia is not typically a company that adopts TSMC's latest process technologies first. Instead, it uses proven technologies. Nvidia has used tailored versions of TSMC's 4nm-class process technologies — 4N and 4NP — to produce its Ada Lovelace, Hopper, and Blackwell GPUs for client PCs and data centers. TSMC's 4nm-class production nodes belong to the company's 5nm-class family (sharing its process design kit) and are essentially refined versions of the foundry's 5nm technology.
The company's next-generation GPUs for AI (codenamed Rubin, with custom Vera CPUs) are expected next year and are projected to use TSMC's 3nm-class fabrication process (presumably N3P, or a tailored version like "3NP"). Given that cadence, it makes sense to expect Nvidia to adopt a GAA-based process technology for Feynman, which is expected in 2028.
TSMC itself expects its first GAA-based process technology — N2 — to increase performance by 10% to 15% compared to N3E, the company's second-generation 3nm-class process technology that precedes N3P. Again, Nvidia's Huang likely wasn't referring to TSMC's N2, Samsung's alternative, or even Intel's 18A, but rather suggesting that a 20% improvement in general is what he expects.
It's worth noting that since Nvidia does not use first-generation process technologies (or at least has not in years), we would expect Feynman GPUs to adopt N2P (if Nvidia sticks with TSMC), which enhances performance, reduces resistance, and stabilizes power delivery, or even A16, which adds backside power delivery and promises an 8% to 10% performance uplift compared to N2. Both N2P and A16 are expected to ramp in 2027.
If Nvidia adopts N2P or A16 for its 2028 products, then it's reasonable for the company to expect a 20% performance per watt gain for its Feynman GPUs at N2P or A16 compared to Rubin GPUs at N3P. It could be even more than that, though Nvidia seems to be pushing for maximum performance at times rather than maximum efficiency, given the voracious demands for AI compute right now.
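As a rough sanity check, the node-over-node figures quoted above can be compounded to see how close they land to that 20% ballpark. This is an illustrative back-of-the-envelope sketch, not TSMC guidance: the ranges come from the article (and TSMC's public claims), while the assumption that vendor percentages simply multiply across two node steps is ours.

```python
# Illustrative only: compound TSMC's quoted node-to-node speed gains
# to estimate a cumulative A16-over-N3E uplift. Figures are from the
# article; simple multiplication of marketing ranges is an assumption.

n2_over_n3e = (1.10, 1.15)   # N2: 10-15% faster than N3E (per TSMC)
a16_over_n2 = (1.08, 1.10)   # A16: 8-10% faster than N2 (per TSMC)

low = n2_over_n3e[0] * a16_over_n2[0]    # worst case of both ranges
high = n2_over_n3e[1] * a16_over_n2[1]   # best case of both ranges

print(f"A16 vs. N3E: +{(low - 1) * 100:.0f}% to +{(high - 1) * 100:.0f}%")
# Roughly +19% to +26%
```

Note the caveat that Rubin is expected on N3P, not N3E, so the cumulative gain over the actual predecessor node would be somewhat smaller than this range suggests. Even so, the stacked figures land in the neighborhood of Huang's 20% quip.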
While Nvidia is one of the leading developers of processors these days, Jensen Huang emphasized multiple times that his company is not simply a semiconductor company anymore. Instead, he described the company as a provider of large-scale AI infrastructure. He also described it as a leader in algorithm development, especially for computer graphics, robotics, and fields like computational lithography.
But while Nvidia has been gradually shifting from developing 'just' compute GPUs to building AI servers, and now entire server racks and clusters, Huang believes that Nvidia does not necessarily compete with its own customers. According to him, Nvidia does not build actual solutions for end users, but rather supplies foundational technologies.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
usertests We need "3D" process nodes. But GAAFETs and backside power delivery will be a nice snack, especially for chips using <15 Watts instead of 500.
Neilbob - If it results in limited performance, prices will rise and Nvidia will increase profit.
- If it results in plenty of performance, prices will rise and Nvidia will increase profit.
- If the chips are knitted by Grandma using metal scouring pads, prices will rise and Nvidia will increase profit.
bit_user
Anton, are you for real?

The article said: "TSMC itself expects its first GAA-based process technology — N2 — to increase performance by 10% to 15% compared to N3E, the company's second generation 3nm-class process technology that precedes N3P. Given these metrics, it looks like Nvidia's Huang is a bit more optimistic about N2 than TSMC is."
I have a ton of respect for Anton, but every once in a while, he says something that it seems like he should really know better than to believe. All of TSMC's estimates are based on a mountain of assumptions and organized around a theoretical (or actual?) reference CPU core. This dictates the mix of different cells and doesn't permit designers to adapt the microarchitecture to the process node, but instead looks at only the impact of directly porting the core between nodes. I've previously seen articles mention specifically which CPU core ARM uses for these estimates, but I'm having trouble digging it up.
Essentially, when they quote a performance number, it's just looking at how much higher you could clock the same design at the same power, when doing a direct port. But, the thing is that people rarely do direct ports from one node to the next, especially if they're in different families.
So, the first point of divergence between TSMC & Nvidia's numbers is that GPUs are different in their mix of cells than CPUs. Secondly, if you use the new node's additional density and timing budget to increase IPC, then you can definitely beat their performance estimates. As I said, the way they estimate performance gains is essentially just by looking at how much you could increase the clock speed. Yet, in most cases, you can do better with a mix of IPC and clockspeed improvements. This is especially true of something like a GPU or NPU, where a feature like "tensor cores" really falls into the category of an IPC increase.
bit_user
Neilbob said: "- If it results in limited performance, prices will rise and Nvidia will increase profit.
- If it results in plenty of performance, prices will rise and Nvidia will increase profit.
- If the chips are knitted by Grandma using metal scouring pads, prices will rise and Nvidia will increase profit."

That's too simplistic. Nvidia actually does need to compete with alternatives on perf/$. The more they can increase performance, the higher a price premium they can justify. If the increase is small, it gives others a chance to close the gap and will put more pricing pressure on Nvidia.
Rob1C TSMC isn't quite as optimistic; clocks (alone) could reach 20% higher at 2nm, and that is all.
It's not until the next node, which isn't far off, that we get that much of a gain:
https://www.tsmc.com/english/dedicatedFoundry/technology/logic/l_A16
"A16 is best suited for HPC products with complex signal routes and dense power delivery network, as they can benefit the most from backside power delivery. Compared with N2P (mine: "better than 2nm"), A16 offers 8%~10% speed improvement at the same Vdd, 15%~20% power reduction at the same speed, and 1.07~1.10X chip density.".
Still, glad they are pushing forward; or downward, if you prefer.
Giroro I wouldn't buy a new GPU for a 20% uplift in performance - especially not when Nvidia would increase prices and TDP by more than 20% to get there. If in 5 years I'll have to make the hard choice between 1440p144 high or 1440p120 ultra, I think I'll manage to find a way to survive. Come back when you can double performance at the same price/power, or when there's some revolutionary new rendering technology that a game I want to play absolutely requires.
But Nvidia doesn't care about me, or fps, or games in general anymore. It's all about how many Tegraflops they can FP4ize per jiggawatt.
So if the new Nvidia can Amazon suggestive-sell me some ALLCAPS junk in a slightly faster in-browser pop ad, or help Google employees with direct access to uncensored Gemini see their fictional interpretations of celebrity body parts faster ... I guess I just don't care. If it takes me 36 seconds instead of 30 to figure out if stable diffusion randomly decided to insert a stripper into the image I was upscaling, again, it's not a big deal. Or it's at least not a problem anyone should pay $2000+ to fix.
JarredWaltonGPU
bit_user said: "Anton, are you for real? […]"

Anton wasn't at the Q&A, and of course he understands there are a LOT of factors going into the discussion. In fact, I would say Jensen Huang wasn't even speaking about N2 and GAA when he said "20%." You'd have to have been there, and I don't have a recording, but the "20%" figure was more of a quip than an actual estimate.
IIRC, some analyst from Korea asked about the possibility of Nvidia using Samsung's upcoming GAA node, and how much of a gain it might bring. Jensen responded very quickly and said something to the effect of: "20%. That's about all we can expect from a process node transition these days, with the end of Moore's Law. So if you ask what we'll get from a newer, more advanced node, I'd say 20%." I'm paraphrasing here, but it was something along those lines.
He then went into a lengthier discussion to reiterate that Nvidia is now an AI infrastructure company, that performance per watt is the limiting factor for data centers, and that the plan is to have even higher-power, denser racks in the future.
Another caveat: I may be mixing together different responses to different questions, because there's a lot of overlap in what was said. Anyway, I'll tweak that sentence because I most definitely don't feel like Jensen was implying TSMC N2 GAA would be 20% higher while TSMC is only projecting 10~15 percent. And Jensen was more focused on performance per watt rather than just performance.
Neilbob Tsk! You should know by now that almost everything I say is dripping with snark and sarcasm. Snarkasm, one might say.
Edit: belatedly realised I forgot to quote bit_user, so now I look like a nutter making random out of context remarks. Whoops. -
Pierce2623
bit_user said: "Anton, are you for real? […]"

Bud, chill. Half of those gains you're talking about are what will be referred to as "architectural gains". You know I agree with you 9 times out of 10, but you just seem like you're making a big deal out of node gains vs architectural gains for no reason here. I mean, IPC gains literally have nothing to do with the node. The only way a node affects IPC is through secondary benefits from increased density or efficiency.