AI GPU clusters with one million GPUs are planned for 2027 — Broadcom says three AI supercomputers are in the works
AI supercomputers are getting bigger.
When Elon Musk announced plans to expand xAI's Colossus AI supercomputer from 100,000 GPUs today to 1 million GPUs in the future, the plan seemed overwhelming. But xAI will not be alone in having such a gargantuan supercomputer. Broadcom predicts that three of its hyperscaler clients will deploy AI supercomputers with one million XPUs each in fiscal 2027.
"As you know, we currently have three hyperscale customers who have developed their own multi-generational AI XPU roadmap to be deployed at varying rates over the next three years," said Hock Tan, President and CEO of Broadcom, at the company's Q4 2024 earnings call. "In 2027, we believe each of them plans to deploy 1,000,000 XPU clusters across a single fabric."
In addition to serving three major hyperscaler customers, Broadcom disclosed during the call that it has landed orders from two more hyperscalers and is 'in advanced development for their own next-generation AI XPUs.' ByteDance and OpenAI are rumored to have teamed up with Broadcom to develop their AI chips, though Broadcom, of course, does not name names.
Broadcom develops custom chips for AI, general-purpose data processing, and other data center hardware for multiple big-name companies, including Google and Meta. Broadcom and its customers first identify the workload demands, such as AI training, inference, or data processing. The company and its partners then define the chip's specifications and develop its key differentiators, such as the architecture of the processing units, leveraging Broadcom's expertise in silicon design. Broadcom implements that architecture in silicon and equips it with platform-specific IP, caches, inter-chip interconnects, and interfaces. The resulting Broadcom-designed high-performance XPUs are then manufactured by TSMC.
Depending on the contract with a particular customer, Broadcom may sell its XPUs or custom ASICs directly under long-term supply agreements. It may also assist in developing products, charging for collaborative engineering and/or research and development efforts.
Broadcom's XPU business is a key ingredient of its strategy to capitalize on the growing demand for AI and cloud infrastructure, making it a critical player in the AI hardware ecosystem. The company believes that in 2027 the serviceable addressable market (SAM) for AI XPUs and networking will be between $60 billion and $90 billion, and that it is positioned to command a leading share of it. It is unclear how the company counts SAM, though, as this year Nvidia alone will earn about $100 billion selling its GPUs, DPUs, and networking hardware to the AI market.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
JRStern: Can this possibly be true?
If so, it is madness at an unprecedented level.
Let's do some math: if we're talking $30,000 per B200 GPU, a million times that is thirty billion dollars, and that's probably only about half the cost of the completed, installed system. Not to mention ongoing facility costs and the electric bill. Depreciate it over what, ten years, finance it at even 5% opportunity cost... even with minimal staffing, you're talking $5B-$10B per year and more to own and operate such a thing (roughed out in the sketch below).
Now, perhaps using a much cheaper chip, slower but with not too different price/performance, you could get down to a tenth of that. Still not cheap, but you get the "million GPU" bragging rights.
Still, it's my perception that the technology has already moved on; a cluster of even 1,000 B200s should be enough for any team, though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis.
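A back-of-the-envelope version of that math, as a minimal sketch. Every figure below (the $30,000 list price, GPUs being roughly half the installed-system cost, ten-year straight-line depreciation, 5% opportunity cost) is an assumption taken from the comment above, not a reported number.

```python
# Rough total-cost-of-ownership sketch for a hypothetical 1M-GPU cluster.
# All inputs are assumptions from the comment above, not quoted figures.

NUM_GPUS = 1_000_000
PRICE_PER_GPU = 30_000        # assumed B200-class list price, USD
SYSTEM_MULTIPLIER = 2.0       # GPUs assumed to be ~half the installed-system cost
DEPRECIATION_YEARS = 10
OPPORTUNITY_COST_RATE = 0.05  # 5% per year on the invested capital

gpu_capex = NUM_GPUS * PRICE_PER_GPU         # $30B in GPUs alone
total_capex = gpu_capex * SYSTEM_MULTIPLIER  # ~$60B installed

annual_depreciation = total_capex / DEPRECIATION_YEARS
annual_capital_cost = total_capex * OPPORTUNITY_COST_RATE

print(f"GPU capex:           ${gpu_capex / 1e9:.0f}B")
print(f"Installed system:    ${total_capex / 1e9:.0f}B")
print(f"Yearly depreciation: ${annual_depreciation / 1e9:.0f}B")
print(f"Yearly capital cost: ${annual_capital_cost / 1e9:.0f}B")
print(f"Yearly carrying cost (before power and staff): "
      f"${(annual_depreciation + annual_capital_cost) / 1e9:.0f}B")
```

Under these assumptions the carrying cost alone comes to roughly $9 billion per year before electricity and staffing, consistent with the commenter's $5B-$10B range.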
gg83: How many cores will be in each GPU? 1,000,000 (GPUs) x 20,000 (cores) is a lot of cores. Just to run a crappy computer version of a 4-year-old child with instant access to all the world's Twitter statements.
bit_user:
The article said: "The company believes that in 2027 the serviceable addressable market (SAM) for AI XPUs and networking will be between $60 billion and $90 billion, and that it is positioned to command a leading share of it."
Meanwhile, AMD and Intel are squabbling over table scraps.
I wonder about software support for Broadcom's AI accelerators. Do popular machine learning frameworks already have backends for them? I think they must, but it hasn't come up on my radar. This seems to be one of the issues AMD long struggled with, and Intel recently said Gaudi 3 will miss sales projections because its software support is running behind schedule. That makes me really curious what Broadcom has been doing!
Also, what are the chances they include a small version of one of the AI accelerator blocks in the next Raspberry Pi SoC?
bit_user:
JRStern said: "Can this possibly be true? If so, it is madness at an unprecedented level. Let's do some math: if we're talking $30,000 per B200 GPU, a million times that is thirty billion dollars, and that's probably only about half the cost of the completed, installed system. Not to mention ongoing facility costs and the electric bill. Depreciate it over what, ten years, finance it at even 5% opportunity cost... even with minimal staffing, you're talking $5B-$10B per year and more to own and operate such a thing."
I think your numbers are a bit high, but you're in the right order of magnitude. I doubt customers buying 1M units will pay that $30K list price, even in spite of the extremely high demand. I'd peg the final build cost of the facility at not much more than $30B, although that's a pretty insane amount of money. It makes me wonder what other sorts of construction projects have similar price tags!
JRStern said: "Now, perhaps using a much cheaper chip, slower but with not too different price/performance, you could get down to a tenth of that. Still not cheap, but you get the 'million GPU' bragging rights."
I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is.
JRStern said: "Still, it's my perception that the technology has already moved on; a cluster of even 1,000 B200s should be enough for any team ..."
How do you figure that?
JRStern said: "... though you might have ten or twenty teams going. So if these things are overbuilt because of some technoid FOMO, they won't ever operate 90% of it on any regular basis."
A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that.
bit_user:
gg83 said: "How many cores will be in each GPU? 1,000,000 (GPUs) x 20,000 (cores) is a lot of cores."
They're not cores in the CPU sense of the word. They're just ALU pipelines, bundled into SIMD-32 blocks (SIMD-32 is 1,024 bits wide, if you'd prefer to look at it that way). The H100 has 528 (enabled) SIMD-32 engines, arranged into 132 SMs (Streaming Multiprocessors). So, depending on how you look at it, it's either 528 or 132 cores per GPU. The 528 figure puts each engine roughly on par with an Intel server core, since a core has 2x AVX-512 FMA ports.
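To make the different ways of counting concrete, here is a small sketch that derives the usual H100 figures from the numbers in the reply above. The per-SM layout (four 32-wide processing blocks, 32 FP32 lanes each) reflects Nvidia's published H100 configuration, but treat the sketch as an illustration rather than a spec sheet.

```python
# Counting H100 "cores" three different ways, based on the figures in the
# reply above (132 SMs, 528 SIMD-32 engines) plus the per-SM layout.

SMS_ENABLED = 132      # Streaming Multiprocessors enabled on the shipping H100
SIMD32_PER_SM = 4      # 32-wide processing blocks per SM
LANES_PER_SIMD32 = 32  # FP32 ALU lanes per block (marketed as "CUDA cores")

simd32_engines = SMS_ENABLED * SIMD32_PER_SM    # 528
fp32_lanes = simd32_engines * LANES_PER_SIMD32  # 16,896 "CUDA cores"

print(f"'Cores' counted as SMs:             {SMS_ENABLED}")
print(f"'Cores' counted as SIMD-32 engines: {simd32_engines}")
print(f"'Cores' counted as FP32 lanes:      {fp32_lanes:,}")
print(f"FP32 lanes across a 1,000,000-GPU cluster: {fp32_lanes * 1_000_000:,}")
```

So the ~20,000-cores-per-GPU figure in the question is really a lane count (about 17,000 on an H100); by the stricter definitions in the reply it is 528 or 132 per GPU.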
JRStern:
bit_user said: "I doubt Nvidia's price/perf is that far off what's possible. I could believe better price/perf by maybe a factor of 2, but not 10. Not if we're talking about training, that is."
Same price/perf, but someone might use 10 slower chips at 1/10 the price, just for variety.
bit_user said: "How do you figure that?"
Well, even Nvidia bills the new chips as 20x faster (by using FP4, though that only applies to maybe half the training). Say they're right. Then 1,000 B200s is like 20,000 H100s or whatever.
But more than that ... see next item.
bit_user said: "A lot of the article focuses on hyperscalers, which means they definitely will have many customers using subsets of those pools. It's definitely not going to be 1M GPUs all training a single model, or anything silly like that."
The hunger for huge numbers of GPUs came from Altman's "scale is everything!" mantra five years ago, but even the training for GPT-4o was done in four pieces, using rather less than 100K GPUs of slower vintage.
Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale.
Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too.
No doubt someone still wants to try their hand at mega-machine monolithic models, but the "scale, scale, scale" idea never made actual sense when computational cost rises exponentially with scale. It's not the history of computation that stuff works like that; algorithms generally improve as fast as or faster than hardware, and things get exponentially easier.
bit_user:
JRStern said: "Same price/perf, but someone might use 10 slower chips at 1/10 the price, just for variety."
Training is difficult to scale like that. There's a lot of communication, hence why NVLink is such a beast. Communication doesn't scale linearly, so by having lots more nodes, your communication overhead is going to increase by an even greater amount, possibly even to the point where you spend more energy on communication than computation. That's why I think a solution maybe half or 1/3rd as fast might be viable, but 1/10th probably isn't.
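To make that concrete, here is a crude latency/bandwidth (alpha-beta) sketch of one synchronous data-parallel training step, comparing a group of fast GPUs against a ten-times-larger group of chips one-tenth as fast. All of the figures (model size, chip throughput, link bandwidth, latency) are illustrative assumptions, not measurements, and real clusters overlap communication with compute and use more parallelism dimensions than this.

```python
# Toy alpha-beta model of a synchronous data-parallel step: same aggregate
# FLOPS, but 10x more, 10x slower chips. All numbers are assumptions.

def ring_allreduce_time(n_gpus, grad_bytes, link_bw, latency):
    """Standard ring all-reduce estimate: ~2x the gradient volume through
    each GPU's link, plus 2*(N-1) latency hops."""
    bandwidth_term = 2 * grad_bytes * (n_gpus - 1) / n_gpus / link_bw
    latency_term = 2 * (n_gpus - 1) * latency
    return bandwidth_term + latency_term

PARAMS = 10e9                    # assumed 10B-parameter model
GRAD_BYTES = 2 * PARAMS          # BF16 gradients
TOTAL_FLOPS = 6 * PARAMS * 1e6   # ~6*P FLOPs per token, 1M-token global batch

configs = [
    ("64 fast GPUs ", 64,  5e14, 4e11),   # ~500 TFLOP/s chip, ~400 GB/s link
    ("640 slow GPUs", 640, 5e13, 1e11),   # 1/10th the speed, weaker link
]

for label, n, chip_flops, link_bw in configs:
    compute = TOTAL_FLOPS / (n * chip_flops)   # identical aggregate throughput
    comm = ring_allreduce_time(n, GRAD_BYTES, link_bw, latency=1e-5)
    total = compute + comm
    print(f"{label}: compute {compute:.2f}s, comm {comm:.2f}s, "
          f"comm share {comm / total:.0%}")
```

With equal aggregate FLOPS, per-GPU compute time is unchanged, but the all-reduce does not get cheaper as the group grows (and the cheaper part is assumed to come with a weaker link), so the communication share of each step more than triples in this toy setup.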
JRStern said: "Well, even Nvidia bills the new chips as 20x faster (by using FP4, though that only applies to maybe half the training)."
No, they said it's only 4x as fast at training as Hopper:
https://www.tomshardware.com/pc-components/gpus/nvidias-next-gen-ai-gpu-revealed-blackwell-b200-gpu-delivers-up-to-20-petaflops-of-compute-and-massive-improvements-over-hopper-h100
JRStern said: "Now, they are moving more work out of training and into inference time, but that's probably the right move, too. But it means all work is done in much smaller chunks, giving huge economies of scale."
Link?
JRStern said: "Plus the search is on for more continuous, human-like learning. Nobody has to wipe your brain in order to accommodate reading one more book. And, some other stuff, too."
There's a long-practiced concept called "transfer learning", which is exactly what you're talking about. That's been standard practice for at least 6 years.
RedBaron616: Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media. I personally don't care as long as I am not required to help pay for more power plants for them. I merely point out the hypocrisy.
bit_user:
RedBaron616 said: "Ever notice how all the climate doomsday types are okay with AI centers sucking up electricity as if it were free? Not a peep from the compliant media."
No, I did not. I've heard lots of people complaining about how much power AI is using. It's literally one of the top complaints I hear about it, even in non-techie contexts and media.
Even on this site, it's been covered quite heavily, from multiple different angles. Here's just a small sampling of recent articles discussing the subject:
https://www.tomshardware.com/news/power-consumption-of-ai-workloads-approaches-that-of-small-country-report
https://www.tomshardware.com/tech-industry/artificial-intelligence/us-govt-wants-to-talk-to-tech-companies-about-ai-electricity-demands-eyes-nuclear-fusion-and-fission
https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musks-new-worlds-fastest-ai-data-center-is-powered-by-massive-portable-power-generators-to-sidestep-electricity-supply-constraints
https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musks-massive-ai-data-center-gets-unlocked-xai-gets-approved-for-150mw-of-power-enabling-all-100-000-gpus-to-run-concurrently
https://www.tomshardware.com/tech-industry/artificial-intelligence/aws-ceo-estimates-large-city-scale-power-consumption-of-future-ai-model-training-tasks-an-individual-model-may-require-somewhere-between-one-to-5gw-of-power
https://www.tomshardware.com/tech-industry/artificial-intelligence/meta-turns-to-nuclear-power-for-ai-training-asking-for-developer-proposals-for-small-modular-reactors-or-larger-nuclear-solutions
https://www.tomshardware.com/tech-industry/artificial-intelligence/amazon-web-services-hints-at-1000-watt-next-generation-trainium-ai-chip-aws-lays-the-groundwork-for-liquid-cooled-data-centers-to-house-next-gen-ai-chips
RedBaron616 said: "I personally don't care as long as I am not required to help pay for more power plants for them."
It always has to get paid for, somehow. As powerless consumers, we're sure to foot some of that bill whether we like it or not. I think pretty much the only thing you can do is try to boycott AI-based services and features as much as you can, in order to make them less profitable for the companies using them.
JRStern:
bit_user said: "No, they said it's only 4x as fast at training as Hopper."
https://www.tomshardware.com/pc-components/gpus/nvidias-next-gen-ai-gpu-revealed-blackwell-b200-gpu-delivers-up-to-20-petaflops-of-compute-and-massive-improvements-over-hopper-h100
In that same article:
"B200 ends up with theoretically 1.25X more compute per chip with most number formats that are supported by both H100 and B200."
Nvidia puts out numbers based on chip, module, same format, different format, peak, total, etc. There are bits of validity to them all, and also bits of invalidity.
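For what it's worth, the different multipliers quoted in this thread give very different answers to the "1,000 B200s equals how many H100s?" question. A trivial sketch, taking each figure above at face value without endorsing any of them:

```python
# How many "H100 equivalents" 1,000 B200s represent under each multiplier
# quoted in this thread; the multipliers themselves are taken at face value.

B200_COUNT = 1_000
multipliers = {
    "20x   (FP4 marketing peak)":            20,
    "4x    (claimed training speedup)":      4,
    "1.25x (same number formats, per chip)": 1.25,
}

for label, factor in multipliers.items():
    print(f"{label}: {B200_COUNT:,} B200s ~ {int(B200_COUNT * factor):,} H100s")
```

The spread (1,250 to 20,000 H100 equivalents) is exactly why the per-chip, per-module, peak, and format-specific numbers need to be kept apart.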