Chinese Firms Foil US AI Sanctions With Older GPUs, Software Tweaks


After Chinese companies lost access to Nvidia's leading-edge A100 and H100 compute GPUs, which are widely used to train AI models, they had to find ways to train those models without the most advanced hardware. To compensate for the lack of powerful GPUs, Chinese AI model developers are simplifying their programs to reduce compute requirements and combining whatever compute hardware they can get, the Wall Street Journal reports.
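The report doesn't spell out which software tweaks the companies use, but one common way to "simplify a program to reduce requirements" is mixed-precision training, which stores most tensors in 16-bit rather than 32-bit floats. Below is a minimal, hypothetical PyTorch sketch; the model, batch, and hyperparameters are invented for illustration and are not from the WSJ report.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 is the usual choice on GPUs; bf16 is the safe choice on CPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Placeholder model and optimizer, not any company's actual workload.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# GradScaler guards fp16 gradients against underflow; it is a no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 512, device=device)        # placeholder batch
target = torch.randn(32, 512, device=device)   # placeholder labels

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs matmuls in 16-bit precision, roughly halving activation
    # memory and raising throughput at some cost in numerical headroom.
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Quantization, smaller model variants, and sparsity are other common ways to trim a compute budget; the report doesn't specify which of these the Chinese firms rely on.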

Nvidia cannot sell its A100 and H100 compute GPUs to Chinese entities such as Alibaba or Baidu without an export license from the U.S. Department of Commerce, and any application would almost certainly be denied. Nvidia has therefore developed the A800 and H800 processors, which offer reduced performance and handicapped NVLink capabilities, limiting buyers' ability to build the high-performance multi-GPU systems traditionally required to train large-scale AI models.

Because of high costs and the difficulty of physically obtaining all the GPUs they need, Chinese companies have devised methods to train large-scale AI models across different chip types, something U.S.-based companies rarely do because of technical challenges and reliability concerns. For example, companies like Alibaba, Baidu, and Huawei have explored using combinations of Nvidia's A100s, V100s, and P100s, and Huawei's Ascend processors, according to research papers reviewed by the WSJ.
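The papers reviewed by the WSJ aren't described in enough detail to reconstruct the companies' actual techniques, but the basic idea of splitting one model across mismatched accelerators can be sketched as naive model (pipeline) parallelism: different layers live on different devices, and activations are copied between them. Everything below, including the layer sizes, the two-stage split, and the device choices, is a hypothetical illustration.

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Two pipeline stages pinned to (possibly mismatched) devices."""

    def __init__(self, dev0: str, dev1: str):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # A newer card could host the heavier stage, an older V100/P100-class
        # card the lighter one; the split below is purely illustrative.
        self.stage0 = nn.Sequential(nn.Linear(512, 2048), nn.GELU()).to(dev0)
        self.stage1 = nn.Linear(2048, 512).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to(self.dev0))
        # This explicit hop between devices is where a cluster without fast
        # NVLink-style interconnects pays its latency and bandwidth cost.
        return self.stage1(x.to(self.dev1))

if __name__ == "__main__":
    n = torch.cuda.device_count()
    dev0 = "cuda:0" if n > 0 else "cpu"
    dev1 = "cuda:1" if n > 1 else dev0
    out = SplitModel(dev0, dev1)(torch.randn(8, 512))
    print(out.shape, "| stage0 on", dev0, "| stage1 on", dev1)
```

The explicit transfers in forward() also illustrate why the NVLink handicap on the A800 and H800 matters: the slower the link between devices, the more those cross-device hops dominate training time.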

"If it works well, they can effectively circumvent the sanctions," Dylan Patel, chief analyst at SemiAnalysis, is reported to have said.


Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • bit_user
    "Meanwhile, a paper published last year by Baidu and Peng Cheng Laboratory demonstrated that researchers were training large language models using a method that could render the additional feature irrelevant."
    Does anyone know which paper that is?