Tencent boosts 100,000 GPU-capable AI clusters with network optimization — Xingmai 2.0 increases communication efficiency by 60% and LLM training efficiency by 20%
Tencent can build a 100,000-GPU cluster for AI.
Tencent Holdings has significantly upgraded its high-performance computing network, Xingmai 2.0, reports the South China Morning Post. The upgrade boosts the company's AI capabilities and improves large language model (LLM) training efficiency. It also aligns with China's efforts to advance its AI prowess despite restrictions on shipments of advanced processors, such as Nvidia's H100, to China.
The Xingmai 2.0 network supports over 100,000 GPUs in a single computing cluster, double the capacity of the initial network launched in 2023. According to the report, the upgraded network increases network communication efficiency by 60% and LLM training efficiency by 20%. Tencent achieved these gains by optimizing existing infrastructure rather than investing in new processors, which are difficult, and in many cases impossible, for Chinese entities to obtain under U.S. export rules.
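The article doesn't explain how the two figures connect, but an Amdahl's-law-style back-of-envelope sketch shows they are plausibly consistent: if only the communication portion of a training step gets 1.6x faster, a 1.2x end-to-end speedup implies communication originally took a specific share of step time. The communication-share value below is derived from that assumption, not reported by Tencent.

```python
# Back-of-envelope sketch (an assumption, not from the article): relate a
# 60% communication speedup to a 20% overall training speedup, Amdahl-style.

def overall_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Overall step speedup when only the communication portion
    (comm_fraction of step time) runs comm_speedup times faster."""
    new_time = (1 - comm_fraction) + comm_fraction / comm_speedup
    return 1 / new_time

# Solve (1 - f) + f / 1.6 = 1 / 1.2 for f, the communication share of
# step time that would make the two reported figures consistent.
f = (1 - 1 / 1.2) / (1 - 1 / 1.6)
print(f"implied communication share of step time: {f:.1%}")   # ~44.4%
print(f"check: overall speedup = {overall_speedup(f, 1.6):.2f}x")  # 1.20x
```

On these assumptions, communication would have to account for roughly 44% of step time for a 1.6x network gain to yield a 1.2x training gain, which is in the range often cited for large distributed training jobs, though Tencent has not published its own breakdown.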
Tencent's efforts are part of a strategic move to strengthen its position in China's rapidly evolving AI sector. The Shenzhen-based high-tech giant has been promoting the use of its proprietary LLMs in enterprise applications and offering services to assist other businesses in developing their own AI models.
China's AI industry is currently in a price war, with major companies like Alibaba, Baidu, and ByteDance slashing prices to promote their AI technologies. In May, Tencent made the lite version of its Hunyuan LLM available for free and reduced prices for its standard versions. This competitive pricing strategy aims to increase the commercial adoption of its AI technologies.
Tencent's approach reflects China's broader push to enhance its technological capabilities using available resources. For example, Baidu has reported significant efficiency improvements in its Ernie LLM, citing a fivefold increase in training efficiency and a 99% reduction in inference costs. These gains highlight the ongoing efforts by Chinese tech companies to make AI training more efficient and cost-effective, advancements that are crucial amid the price war, as they make AI technologies more accessible and affordable.
By improving the efficiency of AI training and reducing costs, Chinese tech companies are positioning themselves to compete more effectively globally while advancing their technological self-reliance. Along with other Chinese tech giants, Tencent strives to compete with Western counterparts by leveraging efficiency improvements rather than relying on advanced processors.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
bit_user: It's kind of disappointing to get zero technical details. However, the tidbits about the AI market in China were an unexpected surprise that displaced at least some of my disappointment.