China's Moore Threads polishes homegrown CUDA alternative — MUSA supports porting CUDA code using Musify toolkit

The Moore Threads MTT S4000 graphics card.
(Image credit: Moore Threads)

The first traces of Moore Threads' GPU programming software stack, dubbed MUSA, have surfaced online, furthering China's pursuit of tech autarky. MUSA serves as an alternative to Nvidia's CUDA environment and targets Moore Threads' own MTT GPU lineup. The company has not said whether the SDK is open source, so it is likely proprietary and won't be of much benefit to developers outside China.

The U.S. has implemented a series of export restrictions on China, covering advanced AI chips, high-bandwidth memory (HBM), manufacturing equipment, and silicon wafers from leading players like Intel, TSMC, and Samsung. In a bid to reduce reliance on Western hardware, China is hard at work developing its semiconductor ecosystem with in-house silicon, fab equipment, memory, CPUs, and even GPUs. GPUs are of particular importance, as modern machine learning (sometimes under the buzzword banner of AI) is largely accelerated by parallel computing, at which GPUs excel.

A strong GPU programming ecosystem offers high-level abstraction, ready-to-use libraries, documentation, and profiling tools. With high-performance Nvidia GPU exports still in limbo, Moore Threads is offering an alternative to CUDA.

MUSA provides a built-in compiler (MCC), runtime libraries (MUSA Runtime), a set of specialized libraries (MUSA-X), debuggers, and profilers. To ensure compatibility with existing CUDA code, the MUSA SDK also includes Musify, a tool that translates CUDA code for the MUSA environment, possibly by translating PTX code at runtime, similar to ZLUDA.
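
Porting tools can also work at the source level rather than at runtime. AMD's hipify tools, for instance, rename CUDA runtime identifiers to their HIP equivalents. The Python sketch below illustrates that general idea with a small, hand-picked subset of CUDA-to-HIP renames. It is not Musify's implementation, whose internals are not documented here, and the function name `port_source` is hypothetical.

```python
import re

# Illustrative subset of CUDA -> HIP renames, in the style of AMD's hipify
# tools. This is NOT Musify's mapping table; it only shows the technique.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, so names such as my_cudaMalloc_wrapper
# are left untouched. Longer names are listed first in the alternation.
_NAMES = sorted(CUDA_TO_HIP, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(name) for name in _NAMES)
    + r")(?![A-Za-z0-9_])"
)


def port_source(src: str) -> str:
    """Rewrite known CUDA identifiers in a source string (hypothetical helper)."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], src)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, n);\n"
        "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(port_source(cuda_src))
```

A pure rename like this only works because the target API mirrors CUDA's names and signatures one-to-one. Anything without a direct equivalent still has to be ported by hand.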

The MUSA SDK version 4.0.1 is compatible with x86 processors from Intel (on Ubuntu) and Hygon (on Kylin). Moore Threads is showing off its stack through several demos on its website, including speech synthesis, AI image generation, image processing, and AI-powered 3D face modeling, among others. You can try out a number of these demos right now (though you might need an account), some of which are reportedly running on Moore Threads' MTT S3000 datacenter GPUs.

Despite CUDA's clear advantage in maturity, features, and support, MUSA could find many domestic customers in small-scale environments and evolve over time. AI developers and researchers envision a heterogeneous future, championing the adoption of hardware-agnostic and open-source platforms. Breaking free from CUDA's reign requires superior alternatives, with AMD's ROCm being a key contender. However, AMD's hardware support still trails Nvidia's.

Hassam Nasir
Contributing Writer

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.

  • bit_user
    Notice how it parallels CUDA, exactly like AMD's HIP.
    https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_porting_guide.html#library-equivalents
    I'd bet money on MUSA being a HIP ripoff, if not even a direct fork (HIP is open source).

    BTW, Moore Threads promised to release MUSA what... 2, maybe 3 years ago?
  • Mindstab Thrull
    Chinese tech autonomy is definitely going to be interesting. The only concern I really have at the end of the day (or maybe decade?) is compatibility with other systems. China, as far as I'm aware, still wants to be part of the global society - unlike at least the perception of some other nations who want to stay behind closed doors.
    But credit where it's due: they've made huge technological strides in a relatively short span. Even if they've "stolen" (pre-empting some expected comments) some of it, they still have to understand enough about what they have to produce other relevant materials.
    Definitely living in interesting times.
  • hotaru251
    Mindstab Thrull said:
    Even if they've "stolen" (pre-empting some expected comments) some of it, they still have to understand enough about what they have to produce other relevant materials.
    Which isn't hard to do if your people are trained in the field and it's your thing.

    If the best chef in the world writes a recipe, even a one-star chef can follow it, because they understand how to cook, even if someone who doesn't cook can't.

    Also, I don't care how it's done; the world as a whole needs alternatives to CUDA. Its effective monopoly has only harmed the industry.
  • bit_user
    Mindstab Thrull said:
    But credit where it's due: they've made huge technological strides in a relatively short span.
    Moore Threads' GPUs are based on hardware IP they licensed from Imagination Technologies.

    Much of the software support certainly could be based on open source implementations for AMD and Intel GPUs. And, as I mentioned, MUSA could be largely based on AMD's HIP. Nothing is illegal about any of this, so long as they abide by those license terms and provide proper credit. This part will be interesting to watch.
  • fiyz
    Mindstab Thrull said:
    Chinese tech autonomy is definitely going to be interesting. The only concern I really have at the end of the day (or maybe decade?) is compatibility with other systems. China, as far as I'm aware, still wants to be part of the global society - unlike at least the perception of some other nations who want to stay behind closed doors.
    But credit where it's due: they've made huge technological strides in a relatively short span. Even if they've "stolen" (pre-empting some expected comments) some of it, they still have to understand enough about what they have to produce other relevant materials.
    Definitely living in interesting times.
    I was thinking something similar, if only to make their encrypted government communication harder to break. As China makes more and more of its tech at home, they will have the opportunity to develop their own "flavor" that isn't easily compatible with western systems. But will they make ternary computers?
  • bit_user
    fiyz said:
    But will they make ternary computers?
    Could be RISC-20?
    :D
    But, why stop at base-3? QLC NAND is already base-16 and PLC is base-32. And while GDDR7 uses base-3, PCIe 6.0 uses base-4.
  • Mindstab Thrull
    bit_user said:
    Could be RISC-20?
    :D
    But, why stop at base-3? QLC NAND is already base-16 and PLC is base-32. And while GDDR7 uses base-3, PCIe 6.0 uses base-4.
    Base 3 because 4 is unlucky in China...?