Nvidia's China presence hits zero, says CEO Jensen Huang, and companies are already working around it — Alibaba reduces reliance on H20 as U.S.–China divide deepens
Jensen Huang says Nvidia has been wiped from China, and Alibaba is already showing the country how to work around it.

Alibaba Cloud has revealed a new GPU pooling system that slashed the number of Nvidia accelerators needed for large-scale inference by more than 80%. The system, known as Aegaeon, was presented at the 2025 SOSP conference in Korea and piloted in Alibaba’s own production environment. It allows multiple large language models to share a single GPU. By doing so, it cuts the hardware footprint for inference workloads to a fraction of what was previously required.
The company claims it served dozens of LLMs of up to 72 billion parameters using just 213 H20 GPUs — down from 1,192 for the same workload, a reduction of 82% in real-world usage. For Chinese companies forced to work around shortages of Nvidia parts, it demonstrates that even with limited access, software ingenuity can stretch each GPU much further than raw hardware counts suggest.
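The headline figure follows directly from the reported GPU counts:

```python
# Reduction in GPUs for the same inference workload, per Alibaba's numbers.
before, after = 1192, 213              # H20 GPUs before and after Aegaeon
reduction = (before - after) / before
print(f"{reduction:.1%}")              # → 82.1%
```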
Alibaba’s findings arrive at a critical moment for Nvidia. CEO Jensen Huang said last week that the company had lost its entire Chinese market share. “At the moment, we are 100% out of China. We went from 95% market share to 0%,” he said during an interview at Citadel Securities’ Future of Global Markets 2025 event in New York on October 6, blaming U.S. export controls that have progressively shut down Nvidia’s ability to sell into the mainland.
The H100 and A100 have long been banned, and the cut-down, for-China H20 model has now met the same fate. In September, Chinese regulators instructed tech giants to cancel all orders of H20 parts, citing security concerns, and port crackdowns followed a few weeks later. This effectively leaves Nvidia with nothing to sell.
So while Alibaba’s pooling system is based on Nvidia silicon, it illustrates that the Chinese market is slowly adapting to the reality of a no-Nvidia future. Systems like Aegaeon prepare the stack for alternative accelerators. The techniques it relies on — token-level scheduling, dynamic model sharing, disaggregated decoding — could just as easily be used to allocate inference across Huawei’s Ascend chips or Biren’s BR100.
The rise of the soft decoupling
Unlike Huawei’s Kirin chip comeback in smartphones, there’s no single flagship product replacing Nvidia in the data center. What’s happening instead is a software-led reshaping of the AI stack that lets domestic chipmakers compete on smaller margins and non-identical capabilities.
Take Baidu, which earlier this year deployed a 30,000-chip cluster based on its Kunlun AI processors — or Moore Threads, whose GPUs are already running Alibaba and DeepSeek models at scale. Even Cambricon and MetaX are back in the conversation as startups pair with model developers to co-design tailored stacks.
These chips may not beat Nvidia’s on raw performance, but performance is no longer the only metric that matters. Aegaeon, for example, doesn’t make any model run faster. It just makes it possible to serve far more models on far fewer chips. Alibaba says its cloud workloads are dominated by a handful of models, while hundreds of others see infrequent calls that would normally tie up entire GPUs. By decoupling GPU access from model provisioning and breaking inference into separate scheduling pools, Aegaeon keeps Nvidia cards saturated without being locked into Nvidia’s ecosystem.
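Aegaeon's internals aren't public beyond the SOSP paper, but the core idea — many models multiplexed over a few GPUs, with devices reassigned at token boundaries instead of pinned to one model — can be sketched in miniature. Everything below (class name, round-robin-style policy, request format) is an illustrative assumption, not Aegaeon's actual design:

```python
from collections import deque

class PooledScheduler:
    """Toy token-level scheduler: many models share a few GPUs.

    Rather than dedicating one GPU per model, each scheduling tick
    hands the available GPUs to whichever models have pending work,
    yielding the device back after a single decoded token. This is a
    hypothetical sketch of the general technique, not Aegaeon itself.
    """

    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self.queues = {}                    # model name -> pending requests

    def submit(self, model, request):
        self.queues.setdefault(model, deque()).append(request)

    def step(self):
        """One tick: assign up to num_gpus models with pending requests."""
        active = [m for m, q in self.queues.items() if q]
        assigned = active[: self.num_gpus]  # simple policy for illustration
        for model in assigned:
            req = self.queues[model][0]
            req["tokens_left"] -= 1         # decode one token, then yield GPU
            if req["tokens_left"] == 0:
                self.queues[model].popleft()
        return assigned

# Three models share two GPUs; an idle model ties up no hardware.
sched = PooledScheduler(num_gpus=2)
for name in ("model-a", "model-b", "model-c"):
    sched.submit(name, {"tokens_left": 3})
while any(q for q in sched.queues.values()):
    sched.step()
```

The point of the sketch is the decoupling: a model with infrequent calls costs nothing while idle, instead of reserving a whole accelerator.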
Alibaba isn’t the only company moving in this direction. Tencent is reportedly investing in alternatives to CUDA, building internal runtime infrastructure that will allow models to target its in-house hardware directly. Huawei, meanwhile, has made its Ascend AI software toolkit open-source and is moving more toward state-aligned deployments.
While CUDA was once the binding glue that gave Nvidia a durable advantage, it’s now being replicated. Model developers are adapting. Tooling is diversifying. And the new software primitives being deployed in Alibaba’s stack are highly transferable to other silicon. This is what makes the current moment different from past attempts at silicon substitution. The Chinese AI industry isn’t trying to match Nvidia chip-for-chip; it’s gunning to beat it at the system level.
In that light, Aegaeon begins to look like a whole lot more than a bit of clever optimization. It introduces a scheduling model that abstracts the GPU as a shared resource, making it far easier to integrate accelerators of varying capabilities and provenance. In principle, one could imagine a future version of the system allocating token decoding to domestic chips while handling high-throughput prefill phases on legacy Nvidia cards.
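No such hybrid deployment has been described publicly, but the routing logic it would need is simple to sketch: direct a request's compute-bound prefill phase and memory-bound decode phase to different device pools. All pool and device names below are hypothetical:

```python
# Hypothetical phase-aware router: prefill on legacy Nvidia cards,
# token decoding on domestic accelerators. Pool contents and the
# modular-hash policy are illustrative assumptions, not a real system.
POOLS = {
    "prefill": ["nvidia-h20-0", "nvidia-h20-1"],    # compute-bound phase
    "decode":  ["ascend-910b-0", "biren-br100-0"],  # memory-bound phase
}

def route(phase, request_id):
    """Pick a device for a request phase by simple modular hashing."""
    devices = POOLS[phase]
    return devices[request_id % len(devices)]

print(route("prefill", 7))   # → nvidia-h20-1
print(route("decode", 7))    # → biren-br100-0
```

The scheduler, not the silicon, decides where each phase of each request lands — which is exactly the vendor-neutral abstraction the article describes.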
Can Nvidia’s loss be undone?
There’s no precedent for a market this large going from total Nvidia control to zero virtually overnight. And there’s no obvious path back. Even if the U.S. relaxes controls, China is unlikely to re-adopt Nvidia at scale. Regulators have warned against buying chips with embedded telemetry, and firms like Tencent are already pushing to replace CUDA internally. Some, like Alibaba, are leaning into open architectures such as RISC-V for their future silicon.
In the short term, Nvidia’s loss is likely China’s gain. Each GPU shipped before the ban now stretches further, thanks to pooling systems like Aegaeon. And because the biggest breakthroughs are architectural, they lay a foundation for future hybrid deployments: part Nvidia, part domestic, all orchestrated by infrastructure that owes little to any one vendor.
There are questions still unanswered. Can domestic fabs scale quickly enough to handle the growing demand? Will the performance gap between Chinese accelerators and Nvidia widen again as Hopper successors roll out? Can software abstraction truly erase years of CUDA lock-in across training workloads, not just inference?
But whatever the answers, it’s clear that China isn’t waiting for parity. It’s building alternatives that lean into architectural flexibility and cost efficiency. If that means reimagining GPU usage from the ground up, so be it. Alibaba has just provided a potential blueprint.

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.