TSMC: Shortage of Nvidia's AI GPUs to Persist for 1.5 Years
Insufficient packaging capacity is to blame.
The chairman of TSMC admitted that the ongoing short supply of compute GPUs for artificial intelligence (AI) and high-performance computing (HPC) applications is caused by constraints on its chip-on-wafer-on-substrate (CoWoS) packaging capacity. This shortage is expected to persist for around 18 months due to rising demand for generative AI applications and the relatively slow expansion of CoWoS capacity at TSMC.
"It is not the shortage of AI chips, it is the shortage of our CoWoS capacity," said Mark Liu, the chairman of TSMC, in a conversation with Nikkei at Semicon Taiwan. "Currently, we cannot fulfill 100% of our customers' needs, but we try to support about 80%. We think this is a temporary phenomenon. After our expansion of [advanced chip packaging capacity], it should be alleviated in one and a half years."
TSMC produces the majority of AI processors, including Nvidia's A100 and H100 compute GPUs, which are integral to AI tools like ChatGPT and are predominantly used in AI data centers. These processors, like competing products from AMD, AWS, and Google, use HBM (which supplies the memory bandwidth that large AI language models need) and CoWoS packaging, which puts additional strain on TSMC's advanced packaging facilities.
Liu said that demand for CoWoS surged unexpectedly earlier this year, tripling year-over-year, leading to the current supply constraints. TSMC recognizes that demand for generative AI services is growing and so is demand for appropriate hardware, so it is speeding up expansion of CoWoS capacity to meet demand for compute GPUs as well as specialized AI accelerators and processors.
At present, the company is installing additional tools for CoWoS at its existing advanced packaging facilities, but this takes time and the company expects its CoWoS capacity to double only by the end of 2024.
In addition, TSMC recently announced its intention to invest $2.9 billion in a new facility dedicated to advanced chip packaging. This facility, located near Miaoli, Taiwan, is a testament to the company's commitment to addressing demand for advanced packaging from all sectors and to the recognized importance of advanced chip packaging in the semiconductor industry going forward.
This focus on advanced chip packaging is not exclusive to TSMC; other industry giants like Intel and Samsung are also prioritizing it, with Intel aiming to quadruple its capacity for its top-tier chip packaging by 2025. Traditional outsourced semiconductor assembly and test (OSAT) companies like ASE and Amkor also have technologies similar to CoWoS, but they have yet to build up capacity for them comparable to that of TSMC, Intel, and Samsung.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
hotaru251 leather jacket jensen: "but thats profit im losing out on!"
not really but prolly does go through his head. -
derekullo Title almost makes it sound like there is a packing peanuts shortage!
...Ohh The Humanity! -
edzieba Whilst he was talking about the AI market (because that was what the conference session was about) any package that relies on TSMCs advanced packaging would also be affected. That includes Ryzen with 3D vCache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability. -
The Hardcard edzieba said: Whilst he was talking about the AI market (because that was what the conference session was about) any package that relies on TSMCs advanced packaging would also be affected. That includes Ryzen with 3D vCache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability.
A big part of the issue is that packaging capacity had already been fully allocated. Nvidia's main problem is trying to get more allocation because of surging market demand. So companies that are not increasing their demand above what they have already allocated won’t be affected as much.
also, some companies will be able to shift needs. AMD definitely needs to increase capacity for their new AI products, but given that they use the same modular designs for multiple products they can shift needs. In fact, I wouldn’t be surprised if that contributed to the cancellation of Navi 41. It appears that that was going to have a similar active interposer design as their MI 300 series.
Whatever the reasons it was cancelled, they can now turn that packaging allocation to the big money AI GPU chips. -
PEnns derekullo said: Title almost makes it sound like there is a packing peanuts shortage!
...Ohh The Humanity!
I was about to donate my 20 lbs of those to that poor company.....😄 -
Matt_ogu812
edzieba said: Whilst he was talking about the AI market (because that was what the conference session was about) any package that relies on TSMCs advanced packaging would also be affected. That includes Ryzen with 3D vCache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability.
Merely reading about this will create a shortage.
The Power of Suggestion. -
Matt_ogu812
MoxNix said: How convenient for NVidia. Let the price gouging continue!
Anything to justify a price increase so it gets by the convenient excuse of a shortage. -
Matt_ogu812
The Hardcard said: A big part of the issue is that packaging capacity had already been fully allocated. Nvidia's main problem is trying to get more allocation because of surging market demand. So companies that are not increasing their demand above what they have already allocated won’t be affected as much.
also, some companies will be able to shift needs. AMD definitely needs to increase capacity for their new AI products, but given that they use the same modular designs for multiple products they can shift needs. In fact, I wouldn’t be surprised if that contributed to the cancellation of Navi 41. It appears that that was going to have a similar active interposer design as their MI 300 series.
Whatever the reasons it was cancelled, they can now turn that packaging allocation to the big money AI GPU chips.
A big part of the issue is finding a place to store/hide all these 'obscene profits' from the auditors :cool: