TSMC: Shortage of Nvidia's AI GPUs to Persist for 1.5 Years

Nvidia Hopper H100 GPU and DGX systems
(Image credit: Nvidia)

The chairman of TSMC has acknowledged that the ongoing shortage of compute GPUs for artificial intelligence (AI) and high-performance computing (HPC) applications is caused by limits on its chip-on-wafer-on-substrate (CoWoS) packaging capacity. The shortage is expected to persist for around 18 months because demand for generative AI applications keeps rising while TSMC's CoWoS capacity is expanding relatively slowly.

"It is not the shortage of AI chips, it is the shortage of our CoWoS capacity," said Mark Liu, the chairman of TSMC, in a conversation with Nikkei at Semicon Taiwan. "Currently, we cannot fulfill 100% of our customers' needs, but we try to support about 80%. We think this is a temporary phenomenon. After our expansion of [advanced chip packaging capacity], it should be alleviated in one and a half years."

TSMC produces the majority of AI processors, including Nvidia's A100 and H100 compute GPUs, which are integral to AI tools like ChatGPT and are predominantly used in AI data centers. These processors, like solutions from AMD, AWS, and Google, use high-bandwidth memory (HBM), which supplies the bandwidth that large AI language models need to run properly. They also use CoWoS packaging, which puts additional strain on TSMC's advanced packaging facilities.

Liu said that demand for CoWoS surged unexpectedly earlier this year, tripling year-over-year and causing the current supply constraints. TSMC recognizes that demand for generative AI services is growing, and with it demand for the hardware that runs them, so it is accelerating the expansion of its CoWoS capacity to serve compute GPUs as well as specialized AI accelerators and processors.

At present, the company is installing additional CoWoS tools at its existing advanced packaging facilities, but this takes time, and TSMC expects its CoWoS capacity to double only by the end of 2024.

In addition, TSMC recently announced its intention to invest $2.9 billion in a new facility dedicated to advanced chip packaging. The facility, located near Miaoli, Taiwan, reflects the company's commitment to meeting demand for advanced packaging from all sectors and the growing importance of advanced chip packaging to the semiconductor industry.

This focus on advanced chip packaging is not exclusive to TSMC; other industry giants like Intel and Samsung are also prioritizing it, with Intel aiming to quadruple its capacity for its top-tier chip packaging by 2025. Traditional outsourced semiconductor assembly and test (OSAT) companies like ASE and Amkor also have technologies similar to CoWoS, but they have yet to build up capacity comparable to that of TSMC, Intel, and Samsung.

 

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • hotaru251
    leather jacket jensen: "but that's profit I'm losing out on!"

    not really but prolly does go through his head.
  • derekullo
    Title almost makes it sound like there is a packing peanuts shortage!

    ...Ohh The Humanity!
  • edzieba
    Whilst he was talking about the AI market (because that was what the conference session was about), any package that relies on TSMC's advanced packaging would also be affected. That includes Ryzen with 3D V-Cache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability.
  • The Hardcard
    edzieba said:
    Whilst he was talking about the AI market (because that was what the conference session was about), any package that relies on TSMC's advanced packaging would also be affected. That includes Ryzen with 3D V-Cache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability.

    A big part of the issue is that packaging capacity had already been fully allocated. Nvidia’s main problem is trying to get more allocation because of surging market demand. So companies that are not increasing their demand above what they have already allocated won’t be affected as much.

    Also, some companies will be able to shift needs. AMD definitely needs to increase capacity for their new AI products, but given that they use the same modular designs for multiple products, they can shift needs. In fact, I wouldn’t be surprised if that contributed to the cancellation of Navi 41. It appears that it was going to use an active interposer design similar to their MI300 series.

    Whatever the reasons it was cancelled, they can now turn that packaging allocation to the big-money AI GPU chips.
  • PEnns
    derekullo said:
    Title almost makes it sound like there is a packing peanuts shortage!

    ...Ohh The Humanity!

    I was about to donate my 20 lbs of those to that poor company.....😄
  • MoxNix
    How convenient for Nvidia. Let the price gouging continue!
  • Matt_ogu812
    edzieba said:
    Whilst he was talking about the AI market (because that was what the conference session was about), any package that relies on TSMC's advanced packaging would also be affected. That includes Ryzen with 3D V-Cache (SoIC), chiplet-based Radeon (InFO_oS), Nvidia GPUs with HBM (CoWoS-S), etc. Intel would not be affected (by this particular bottleneck) as they have their own packaging capability.

    Merely reading about this will create a shortage.
    The Power of Suggestion.
  • Matt_ogu812
    MoxNix said:
    How convenient for Nvidia. Let the price gouging continue!

    Anything to justify a price increase, and a shortage makes a convenient excuse.
  • Matt_ogu812
    The Hardcard said:
    A big part of the issue is that packaging capacity had already been fully allocated. Nvidia’s main problem is trying to get more allocation because of surging market demand. So companies that are not increasing their demand above what they have already allocated won’t be affected as much.

    Also, some companies will be able to shift needs. AMD definitely needs to increase capacity for their new AI products, but given that they use the same modular designs for multiple products, they can shift needs. In fact, I wouldn’t be surprised if that contributed to the cancellation of Navi 41. It appears that it was going to use an active interposer design similar to their MI300 series.

    Whatever the reasons it was cancelled, they can now turn that packaging allocation to the big-money AI GPU chips.

    A big part of the issue is finding a place to store/hide all these 'obscene profits' from the auditors 😎