AI GPU clusters with one million GPUs are planned for 2027 — Broadcom says three AI supercomputers are in the works

Google TPUv4 in the data center (Image credit: Google)

When Elon Musk announced plans to expand xAI's Colossus AI supercomputer from 100,000 GPUs today to 1 million GPUs in the future, the plan seemed overwhelming. But xAI will not be alone in having such a gargantuan supercomputer. Broadcom predicts that three of its hyperscaler clients will deploy AI supercomputers with one million XPUs each in fiscal 2027.

"As you know, we currently have three hyperscale customers who have developed their own multi-generational AI XPU roadmap to be deployed at varying rates over the next three years," said Hock Tan, President and CEO of Broadcom, at the company's Q4 2024 earnings call. "In 2027, we believe each of them plans to deploy 1,000,000 XPU clusters across a single fabric."

In addition to serving three major hyperscaler customers, Broadcom disclosed during the call that it had landed orders from two more hyperscalers and is in advanced development of their own next-generation AI XPUs. ByteDance and OpenAI are rumored to have teamed up with Broadcom to develop their AI chips, though Broadcom, of course, does not name names.

Broadcom develops custom chips for AI, general-purpose data processing, and other data center hardware for multiple big-name companies, including Google and Meta. Broadcom and its customers first identify the workload demands, such as AI training, inference, or data processing. The company and its partners then define the chip's specifications and develop its key differentiators, such as the architecture of the processing units, leveraging Broadcom's expertise in silicon design. Broadcom then implements the architecture in silicon and equips it with platform-specific IP, caches, inter-chip interconnects, and interfaces. The resulting high-performance XPUs are manufactured by TSMC.

Depending on the contract with a particular customer, Broadcom may sell its XPUs or custom ASICs directly under long-term supply agreements. In addition, Broadcom may assist in developing products, charging for collaborative engineering and/or research and development work.

Broadcom's XPU business is a key ingredient of its strategy to capitalize on the growing demand for AI and cloud infrastructure, making it a critical player in the AI hardware ecosystem. The company believes that in 2027, the serviceable addressable market (SAM) for AI XPUs and networking will be between $60 billion and $90 billion, and that it is positioned to command a leading share of this market. It is unclear how the company counts that SAM, though, as Nvidia alone will earn about $100 billion this year selling its GPUs, DPUs, and networking hardware to the AI market.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom's Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • JRStern
    Can this possibly be true?
    If so it is madness at an unprecedented level.
    Let's just do some math: if we're talking $30,000 per B200 GPU, a million times that is thirty billion dollars. And that's probably about half the cost of the completed, installed system, not to mention ongoing facility costs and the electric bill. Depreciate it over what, ten years, finance it at even 5% opportunity cost, ... even with minimal staffing, you're talking $5B-$10B per year and more to own and operate such a thing.
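
    Rough sketch of that arithmetic in Python (every input is an assumption):

        # Back-of-envelope cost model for a 1M-GPU cluster -- every number is an assumption.
        gpu_count     = 1_000_000
        gpu_price     = 30_000                 # assumed per-GPU price, USD
        hardware_cost = gpu_count * gpu_price  # $30B in GPUs alone
        facility_cost = hardware_cost * 2      # assume GPUs are ~half the installed cost

        years       = 10                       # assumed depreciation period
        opportunity = 0.05                     # assumed cost of capital

        depreciation = facility_cost / years        # straight-line, per year
        financing    = facility_cost * opportunity  # per year

        print(f"Installed cost:      ${facility_cost / 1e9:.0f}B")
        print(f"Yearly depreciation: ${depreciation / 1e9:.1f}B")
        print(f"Yearly financing:    ${financing / 1e9:.1f}B")
        # ~$6B + ~$3B = ~$9B per year, before power, cooling, and staff.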

    Now perhaps using a much cheaper chip, slower but with not too different price/performance, you get down to a tenth of that, still not cheap but you get the "million GPU" bragging rights.

    Still it's my perception the technology has already moved on, a cluster of even 1,000 B200s should be enough for any team, though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis.
  • gg83
    How many cores will be in each GPU? 1,000,000(gpu) x 20,000(cores) is a lot of cores. Just to run a crappy computer version of a 4-year-old child with instant access to all the world's Twitter statements.
  • bit_user
    The article said:
    The company believes that in 2027, the serviceable addressable market (SAM) for AI XPU and networking will be between $60 and $90 billion, and the firm is positioned to command a leading share of this market.
    Meanwhile, AMD and Intel are squabbling over table scraps.

    I wonder about software support for Broadcom's AI accelerators. Do popular machine learning frameworks already have backends for them? I think they must, but it hasn't come up on my radar. This seems to be one of the issues AMD long struggled with, and Intel recently said Gaudi 3 will miss sales projections because its software support is running behind schedule. That makes me really curious what Broadcom has been doing!

    Also, what are the chances they include a small version of one of the AI accelerator blocks in the next Raspberry Pi SoC?
  • bit_user
    JRStern said:
    Can this possibly be true?
    If so it is madness at an unprecedented level.
    Let's just do some math: if we're talking $30,000 per B200 GPU, a million times that is thirty billion dollars. And that's probably about half the cost of the completed, installed system, not to mention ongoing facility costs and the electric bill. Depreciate it over what, ten years, finance it at even 5% opportunity cost, ... even with minimal staffing, you're talking $5B-$10B per year and more to own and operate such a thing.
    I think your numbers are a bit high, but you're in the right order of magnitude. I doubt customers buying 1M units will pay that $30k list price, even in spite of the extremely high demand. I'd peg the final build cost of the facility at not much more than $30B, although that's a pretty insane amount of money. It makes me wonder what other sorts of construction projects have similar price tags!

    JRStern said:
    Now perhaps using a much cheaper chip, slower but with not too different price/performance, you get down to a tenth of that, still not cheap but you get the "million GPU" bragging rights.
    I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.

    JRStern said:
    Still it's my perception the technology has already moved on, a cluster of even 1,000 B200s should be enough for any team,
    How do you figure that?

    JRStern said:
    though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis.
    A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
  • bit_user
    gg83 said:
    How many cores will be in each GPU? 1,000,000(gpu) x 20,000(cores) is a lot of cores.
    They're not cores in the CPU sense of the word. They're just ALU pipelines, bundled into SIMD-32 blocks (SIMD-32 is 1024 bits wide, if you'd prefer to look at it that way). The H100 has 528 (enabled) SIMD-32 engines, arranged into 132 SMs (Streaming Multiprocessors). So, depending on how you look at it, it's either 528 or 132 cores per GPU. The 528 count puts each engine roughly on par with an Intel server core, since that has 2x 512-bit AVX-512 FMA ports per core.
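
    As a sanity check, here's how those numbers decompose (quick sketch; counts are Nvidia's published figures for the H100 SXM part):

        # How Nvidia's advertised "CUDA core" count decomposes for an H100 SXM.
        sms            = 132   # enabled Streaming Multiprocessors
        simd32_per_sm  = 4     # 32-wide FP32 pipelines per SM
        lanes_per_simd = 32    # each pipeline is SIMD-32, i.e. 1024 bits of FP32

        simd_engines = sms * simd32_per_sm            # 528 "core-like" SIMD engines
        cuda_cores   = simd_engines * lanes_per_simd  # 16,896 -- the marketing core count

        print(simd_engines, cuda_cores)  # 528 16896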
  • JRStern
    bit_user said:
    I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.
    Same price/perf but someone might use 10 slower chips at 1/10 the price, just for variety.
    bit_user said:
    How do you figure that?
    Well even NVidia bills the new chips at 20x faster (by using FP4, though that only applies to maybe half the training). Say they're right. Then 1000 B200s is like 20,000 H100s or whatever.

    But more than that ... see next item.
    bit_user said:
    A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
    The hunger for huge numbers of GPUs came from Altman's "scale is everything!" mantra five years ago, but even the training for GPT-4o was done in four pieces, using rather less than 100k GPUs of slower vintage.

    Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale.

    Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too.

    No doubt someone still wants to try their hand at mega-machine monolithic models, but the "scale, scale, scale" idea never made actual sense when computational cost rises exponentially with scale, scale, scale. That's not how the history of computation has gone: algorithms generally improve as fast as or faster than hardware, and things get exponentially easier.
  • bit_user
    JRStern said:
    Same price/perf but someone might use 10 slower chips at 1/10 the price, just for variety.
    Training is difficult to scale like that. There's a lot of communication, hence why NVLink is such a beast. Communication doesn't scale linearly, so by having lots more nodes, your communication overhead is going to increase by an even greater amount, possibly even to the point where you spend more energy on communication than computation. That's why I think a solution maybe half or one-third as fast might be viable, but 1/10th probably isn't.
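
    A toy model of why (every number below is invented, including the assumption that the cheaper chips also come with slower links):

        # One data-parallel training step with ring all-reduce gradient sync.
        def step_time(n, chip_tflops, link_gbs,
                      total_pflops=2000.0,  # assumed total FLOPs per step (fixed global batch)
                      grad_gb=20.0,         # assumed gradient bytes to all-reduce per step
                      hop_us=5.0):          # assumed per-hop latency
            compute  = (total_pflops * 1e15) / (n * chip_tflops * 1e12)
            bw_term  = 2 * (n - 1) / n * grad_gb / link_gbs  # ring all-reduce traffic per node
            lat_term = 2 * (n - 1) * hop_us * 1e-6           # 2(N-1) ring steps per all-reduce
            return compute, bw_term + lat_term

        # Few fast chips vs. 10x more chips, each 10x slower (and on slower links).
        for n, tflops, link in [(1_000, 1000, 100), (10_000, 100, 25)]:
            c, m = step_time(n, tflops, link)
            print(f"{n:>6} chips: compute {c:.2f}s, comm {m:.2f}s, comm share {m / (c + m):.0%}")

    Same aggregate compute either way, but the communication share of each step roughly triples in the second case.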

    JRStern said:
    Well even NVidia bills the new chips at 20x faster (by using FP4, though that only applies to maybe half the training).
    No, they said it's only 4x as fast at training as Hopper.
    https://www.tomshardware.com/pc-components/gpus/nvidias-next-gen-ai-gpu-revealed-blackwell-b200-gpu-delivers-up-to-20-petaflops-of-compute-and-massive-improvements-over-hopper-h100
    JRStern said:
    Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale.
    Link?

    JRStern said:
    Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too.
    There's a long-practiced concept called "transfer learning," which is exactly what you're talking about. It has been standard practice for at least 6 years.
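
    The usual pattern, sketched in PyTorch with a torchvision ResNet as an arbitrary stand-in:

        # Minimal transfer-learning sketch: keep what the backbone already learned,
        # train only a new head for the new task. Model choice and sizes are arbitrary.
        import torch
        import torch.nn as nn
        from torchvision import models

        base = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # pretrained backbone

        for p in base.parameters():   # freeze everything learned on the original task
            p.requires_grad = False

        base.fc = nn.Linear(base.fc.in_features, 10)  # new head; 10 classes is arbitrary

        # Only the new head's weights get updated -- nothing already learned is "wiped".
        optimizer = torch.optim.AdamW(base.fc.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()

        x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))  # dummy batch
        loss = loss_fn(base(x), y)
        loss.backward()
        optimizer.step()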
  • RedBaron616
    Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media. I personally don't care as long as I am not required to help pay for more powerplants for them. I merely point out the hypocrisy.
  • bit_user
    RedBaron616 said:
    Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media.
    No, I did not. I've heard lots of people complaining about how much power AI is using. It's literally one of the top complaints I hear about it, even in non-techie contexts and media.

    Even on this site, it's been covered quite heavily, from multiple different angles. Here's just a small sampling of recent articles discussing the subject:
    https://www.tomshardware.com/news/power-consumption-of-ai-workloads-approaches-that-of-small-country-report
    https://www.tomshardware.com/tech-industry/artificial-intelligence/us-govt-wants-to-talk-to-tech-companies-about-ai-electricity-demands-eyes-nuclear-fusion-and-fission
    https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musks-new-worlds-fastest-ai-data-center-is-powered-by-massive-portable-power-generators-to-sidestep-electricity-supply-constraints
    https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musks-massive-ai-data-center-gets-unlocked-xai-gets-approved-for-150mw-of-power-enabling-all-100-000-gpus-to-run-concurrently
    https://www.tomshardware.com/tech-industry/artificial-intelligence/aws-ceo-estimates-large-city-scale-power-consumption-of-future-ai-model-training-tasks-an-individual-model-may-require-somewhere-between-one-to-5gw-of-power
    https://www.tomshardware.com/tech-industry/artificial-intelligence/meta-turns-to-nuclear-power-for-ai-training-asking-for-developer-proposals-for-small-modular-reactors-or-larger-nuclear-solutions
    https://www.tomshardware.com/tech-industry/artificial-intelligence/amazon-web-services-hints-at-1000-watt-next-generation-trainium-ai-chip-aws-lays-the-groundwork-for-liquid-cooled-data-centers-to-house-next-gen-ai-chips
    RedBaron616 said:
    I personally don't care as long as I am not required to help pay for more powerplants for them.
    It always has to get paid for, somehow. As powerless consumers, we're sure to foot some of that bill whether we like it or not. I think pretty much the only thing you can do is just try to boycott AI-based services and features as much as you can, in order to make it less profitable for the companies using it.
  • JRStern
    bit_user said:
    No, they said it's only 4x as fast at training as Hopper.
    https://www.tomshardware.com/pc-components/gpus/nvidias-next-gen-ai-gpu-revealed-blackwell-b200-gpu-delivers-up-to-20-petaflops-of-compute-and-massive-improvements-over-hopper-h100

    In that same article:
    "B200 ends up with theoretically 1.25X more compute per chip with most number formats that are supported by both H100 and B200."

    NVDA puts out numbers based on chip, module, same, different, peak, total, etc. There are bits of validity to them all, and also bits of invalidity.