Chinese firm's AI breakthrough can meld GPUs from different brands into one training cluster — Baidu says new tech fuses thousands of GPUs together to help sidestep shortages

Baidu
(Image credit: Shutterstock)

Tech companies are racing to develop AI technologies. Baidu, operator of China’s largest search engine and a company often compared to Google, is pushing hard to gain a global lead. According to CEO Robin Li, the company has developed a system that combines GPUs from different vendors into a single compute cluster for AI training, a capability highlighted in its Q1 2024 earnings call.

Li said on the call that Baidu is transforming from an Internet-centric business into an AI-first one, with its generative AI model, ERNIE, becoming the new core of its products in 2025. However, Baidu faces headwinds: as a Chinese company, it is subject to severe US restrictions on the export of advanced technologies to China, including the latest-generation Nvidia, AMD, and Intel chips that are crucial to AI development.

Because of this hardware shortage, Chinese companies have been forced to rely on homegrown GPUs that trail their American-designed counterparts. But, as the proverb says, necessity is the mother of invention: Chinese companies are finding ways around US sanctions, from procuring advanced GPUs on the black market to developing novel engineering workarounds.

A case in point is Baidu’s announcement of an advanced GPU cluster management technology during its earnings call, which could be a game-changer for China’s AI ambitions. “Leveraging our technical expertise, we can now integrate GPUs from various vendors into a unified computing cluster to train an LLM,” Li says. “Our platform has demonstrated high efficiency with this setup on a GPU cluster composed of hundreds, even thousands of GPUs. This is an important breakthrough because of the limited availability of imported GPUs.” (transcript via The Globe and Mail)

If Li’s claim holds up, Baidu has achieved an impressive technical feat. The technology would let the company mix and match different GPUs, pairing powerful but scarce export-restricted American GPUs with slower but more readily available Chinese-made parts, such as the Lingjiu GP201 or Biren BR100, among many others.
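One caveat worth quantifying: in a data-parallel cluster, slow GPUs contribute only in proportion to their speed. Under idealized assumptions of perfect scaling and zero interconnect overhead (the best case, and with purely illustrative numbers, not Baidu's figures), the math looks like this:

```python
# Back-of-envelope estimate: aggregate throughput when mixing fast and slow
# GPUs, assuming perfect scaling and no communication overhead (best case).
# All numbers are illustrative, not measured.

def relative_speedup(n_fast: int, n_slow: int, slow_ratio: float) -> float:
    """Throughput of n_fast fast GPUs plus n_slow GPUs that each run at
    slow_ratio times a fast GPU's speed, relative to the fast GPUs alone."""
    return (n_fast + n_slow * slow_ratio) / n_fast

# Eight fast GPUs plus eight GPUs at one-quarter speed: 25% more throughput.
print(relative_speedup(8, 8, 0.25))  # 1.25
```

Even in this best case, doubling the GPU count with parts at a quarter of the speed buys only a 25% gain, and real-world communication overhead would eat into that further.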

While patching together multiple GPUs might sound simple to the average consumer, it’s a complex problem requiring creative solutions. GPU manufacturers use different architectures with wildly different processing speeds, programming models, and software stacks, and Baidu would have to account for all of these when integrating such systems. Non-deterministic latency, scaling across interconnect fabrics, and memory errors must also be handled, to name just a few challenges.
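Baidu hasn't published details, but one piece of the puzzle, keeping GPUs of very different speeds equally busy, can be sketched as a proportional batch-sharding scheme. The function, device names, and throughput figures below are hypothetical illustrations, not Baidu's actual method:

```python
# Hypothetical sketch: split a global training batch across a heterogeneous
# cluster in proportion to each GPU's measured throughput (samples/second),
# so fast and slow devices finish each step at roughly the same time.

def shard_batch(global_batch: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Assign each GPU a slice of `global_batch` proportional to its speed."""
    total = sum(throughputs.values())
    shares = {gpu: int(global_batch * tp / total) for gpu, tp in throughputs.items()}
    # Hand any rounding leftovers to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += global_batch - sum(shares.values())
    return shares

# Illustrative mix: one imported accelerator plus two slower domestic parts.
cluster = {"imported_gpu": 1000.0, "domestic_gpu_a": 400.0, "domestic_gpu_b": 100.0}
print(shard_batch(1024, cluster))
```

A scheme like this only balances compute; the harder problems the paragraph above mentions, such as bridging different programming models and synchronizing gradients over slow fabrics, are where the real engineering effort would lie.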

If Baidu’s claims are true, this is a massive development. However, it’s worth noting that the company announced it during an earnings call. Baidu (NASDAQ: BIDU) is a publicly traded corporation, so it is in its interest to claim that technologies like this will help it weather the geopolitical storm brewing between China and the US.

And despite the technology’s potential, this was just an announcement, offered without any performance data. Until Baidu publishes research or demonstrates the viability and efficiency of such a system, we won’t know how it performs in the real world or whether it gives the company a measurable advantage over its competitors.

Nevertheless, this development shows that China can progress technologically regardless of American tariffs, sanctions, and export limits. As Chinese President Xi Jinping said after the Netherlands blocked the export of ASML’s lithography equipment, “The Chinese people also have the right to legitimate development, and no force can stop the pace of China’s scientific and technological development and progress.”

The White House’s attempts to stop technology transfers from the US and its allies may hinder China’s scientific advancements in the short term. But given the creativity of the human mind and the Chinese government’s deep pockets, China may yet catch up with the US within a few decades, or perhaps in just a few years.

Freelance News Writer
  • Metal Messiah.
    As Chinese President Xi Jinping said after the Netherlands blocked the export of ASML's lithography equipment,

    Is that what the Chinese President said via that link? This is funny! :LOL:

    How to speed up a slow Mac ?
    https://www.macworld.com/article/668632/how-to-speed-up-a-mac.html
    The White House's attempts to stop technology transfers from the US and its allies may hinder China's scientific advancements in the short term.

    Sorry! Page not found.
    The page you're looking for has either been moved or removed from the site.
    Please try searching our site or start again on our homepage.



    Maybe try proof-reading articles BEFORE publishing?
    Reply
  • bit_user
    The article said:
    The company can combat AI chip scarcity by combining GPUs from different vendors.
    ...
    If Baidu's claims are true, this is a massive development.
    How much is that going to help, really? I'm just not seeing a huge upside, here. If you pair a handful of Chinese GPUs with a Nvidia GPU that's 10x as fast, then the total benefit on training time won't add up to much. Also, for anyone building AI systems, they're dealing in aggregates and I'll bet they build systems with either all Nvidia or all AMD GPUs. It's probably much more the exception that they're down to just a couple boards of either kind, and if you were, you just build another all-AMD system (for instance).

    Furthermore, any time you don't have a high-speed fabric and have to rely on PCIe for interconnectivity, you're going to be at a significant disadvantage. The software overhead of abstracting each GPU API is going to add a little more, but I see it as neither a huge win nor a major impediment.

    The article said:
    If Li's claim is true, Baidu has achieved a brilliant technical breakthrough. This technology will allow the company to mix and match different GPUs
    On a purely technical level, I think it's less impressive than prior techniques for mixing & matching different GPUs in the same machine.
    https://www.anandtech.com/show/2844
    https://www.anandtech.com/show/4522/ecs-p67h2a-review-a-visit-back-to-lucids-hydra/8
    While searching for that, I found a project enabling disparate multi-GPU configurations for CFD:
    https://www.reddit.com/r/CFD/comments/107mzx5/the_fluidx3d_v20_multigpu_uptate_is_now_out_on/
    Sorta shows it's not quite the genius breakthrough the article claims.
    Reply
  • JTWrenn
    This has always been doable; the question is how large the efficiency drop is from doing it.
    Reply
  • ThomasKinsley
    bit_user said:
    How much is that going to help, really? I'm just not seeing a huge upside, here. If you pair a handful of Chinese GPUs with a Nvidia GPU that's 10x as fast, then the total benefit on training time won't add up to much. Also, for anyone building AI systems, they're dealing in aggregates and I'll bet they build systems with either all Nvidia or all AMD GPUs. It's probably much more the exception that they're down to just a couple boards of either kind, and if you were, you just build another all-AMD system (for instance).

    Furthermore, any time you don't have a high-speed fabric and have to rely on PCIe for interconnectivity, you're going to be at a significant disadvantage. The software overhead of abstracting each GPU API is going to add a little more, but I see it as neither a huge win nor a major impediment.


    On a purely technical level, I think it's less impressive than prior techniques for mixing & matching different GPUs in the same machine.
    https://www.anandtech.com/show/2844
    https://www.anandtech.com/show/4522/ecs-p67h2a-review-a-visit-back-to-lucids-hydra/8
    While searching for that, I found a project enabling disparate multi-GPU configurations for CFD:
    https://www.reddit.com/r/CFD/comments/107mzx5/the_fluidx3d_v20_multigpu_uptate_is_now_out_on/
    Sorta shows it's not quite the genius breakthrough the article claims.
    I think you've nailed it. To me this sounds like SLI on steroids, but SLI suffered from overhead with just two cards that made it a losing proposition. I can't imagine what the overhead would be with hundreds or thousands of GPUs in a cluster. While the current tech is not impressive on a technical level, I still think it would probably be useful for LLM training during a supply shortage from the sanctions.
    Reply
  • Pierce2623
    I’ll happily bet a hundred dollars that it’s literally just the recompilers made to run CUDA code on AMD or Intel just combined into one package.
    Reply
  • bit_user
    ThomasKinsley said:
    SLI suffered from overhead with just two cards that made it a losing proposition. I can't imagine what the overhead would be with hundreds or thousands of GPUs in a cluster. While the current tech is not impressive on a technical level, I still think it would probably be useful for LLM training during a supply shortage from the sanctions.
    The communication patterns are very different for training vs. rendering. One of the first things the US sanctions targeted was communication bandwidth, because they knew how much that would impede training large-scale models. There are good reasons for Nvidia's focus on NVLink. AMD and Intel each have their own version, as well.
    Reply
  • Gillerer
    ThomasKinsley said:
    I think you've nailed it. To me this sounds like SLI on steroids, but SLI suffered from overhead with just two cards that made it a losing proposition. I can't imagine what the overhead would be with hundreds or thousands of GPUs in a cluster. While the current tech is not impressive on a technical level, I still think it would probably be useful for LLM training during a supply shortage from the sanctions.

    SLI overhead affected gaming because of the need for precise synchronization and low latency. (And what finally killed it was the impending move to low-level APIs such as DX12 and Vulkan, where it would have fallen on game developers to implement multi-GPU support in each game.)

    Depending on the kind of compute workload you have, these might not be a factor at all. Multi-GPU still has applications for compute and rendering tasks (just not real-time rendering).
    Reply