China's Loongson Unveils 32-Core CPU, Reportedly 4X Faster Than Arm Chip

Loongson, a Chinese fabless chipmaker, has launched the new 3D5000 processor for data centers and cloud computing. MyDrivers reported that Loongson claims its 32-core domestic chips deliver 4X higher performance than rival Arm processors.

The 3D5000 still leverages LoongArch, Loongson's homemade instruction set architecture (ISA) from 2020. The chipmaker was previously a firm believer in MIPS. However, Loongson eventually built LoongArch from the ground up with the sole objective of not relying on foreign technology to develop its processors. LoongArch is a RISC (reduced instruction set computer) ISA, similar to MIPS or RISC-V.

In addition to the 3D5000 and 7A2000, Loongson also announced the 2K050, the company's baseboard management controller (BMC). The 2K050 features LA264 cores at 500 MHz, an integrated 2D GPU, and 32-bit DDR3 support, and it outputs at 1080p (1920x1080) at 60 Hz.

Zhiye Liu
News Editor and Memory Reviewer

Zhiye Liu is a news editor and memory reviewer at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.

  • DaveLTX
    Despite Loongson claiming LSA is developed on their own, it's MIPS with their own AVX-like extensions.

    So yes, it's modified MIPS.
  • bit_user
    Thanks for the continuing coverage. I think it's worth keeping tabs on China's tech industry.

    Loongson claims its 32-core domestic chips deliver 4X higher performance than rival Arm processors.
    That's immediately quite suspect, so I plugged in the MyDrivers link to Google Translate and here's what it claims they said:
    "In terms of performance, the SPEC 2006 score of Loongson 3D5000 exceeds 425. The floating point part adopts dual 256bit vector units, and the double precision floating point performance can reach 1TFLOPS (1 trillion times), which is 4 times of the typical ARM core performance." (the bold is theirs)

    That's a far more nuanced statement. For one thing, the typical ARM core has dual 128-bit vectors, which gives it an automatic 2x. I don't know where the other 2x comes from, but it's not hard to imagine theirs has dual-FMA units, whereas their basis for comparison doesn't. That still doesn't get us quite to 4x, but now we're in the ballpark.

    It's way off the mark to generalize 4x the vector fp64 throughput to an overall 4x performance increase, however.
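The per-core arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope model, not Loongson's published method: the 2.0 GHz clock is an assumption (the translated quote gives no frequency), the function name `fp64_flops_per_cycle` is made up for illustration, and a fused multiply-add is counted as two floating-point operations.

```python
def fp64_flops_per_cycle(vector_units: int, vector_bits: int, fma: bool) -> int:
    """Theoretical FP64 operations one core can issue per cycle."""
    lanes = vector_bits // 64        # number of 64-bit doubles per vector
    ops_per_lane = 2 if fma else 1   # a fused multiply-add counts as 2 ops
    return vector_units * lanes * ops_per_lane


# Loongson's claim: dual 256-bit vector units (FMA support is assumed here).
loongson_core = fp64_flops_per_cycle(2, 256, fma=True)

# Hypothetical "typical" Arm baselines: dual 128-bit units, with and without FMA.
arm_core_with_fma = fp64_flops_per_cycle(2, 128, fma=True)
arm_core_no_fma = fp64_flops_per_cycle(2, 128, fma=False)

ratio_with_fma = loongson_core / arm_core_with_fma
ratio_no_fma = loongson_core / arm_core_no_fma

ASSUMED_CLOCK_HZ = 2.0e9  # assumption: not stated in the quoted text
CORES = 32
chip_peak_flops = CORES * loongson_core * ASSUMED_CLOCK_HZ

print(f"per-core ratio, baseline with FMA:    {ratio_with_fma:.1f}x")
print(f"per-core ratio, baseline without FMA: {ratio_no_fma:.1f}x")
print(f"chip peak at assumed clock: {chip_peak_flops / 1e12:.3f} TFLOPS")
```

Under these assumptions, vector width alone accounts for 2x, and a 4x figure only appears if the baseline core performs one operation per lane per cycle. The chip-level total at the assumed clock lands close to the quoted 1 TFLOPS, which suggests the claim is a theoretical peak rather than a measured result.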

    According to Loongson's provided numbers, the 3D5000 scores over 425 points in SPEC CPU 2006
    That's the claimed single-CPU score. The article also mentions scores of 800 and 1500 for dual- and quad-CPU configurations.

    For comparison, the last set of SPEC2006 scores I could find for an AMD CPU are the Zen 2-based Rome Epyc:

    https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/10

    I think the score they're quoting is a simple average of the sub-scores, in which case the 128-core dual-CPU Rome Epyc delivered 3434. That's about 2.3x the 1500 Loongson claims for a quad-CPU configuration with the same core count.
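The comparison can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in this thread: the Loongson scores are vendor claims (the single-CPU figure is stated as "exceeds 425", so it is a lower bound), and treating the Epyc number as directly comparable assumes both were computed the same way.

```python
# Vendor-claimed SPEC CPU 2006 scores, keyed by number of 3D5000 CPUs.
CLAIMED_SCORES = {1: 425, 2: 800, 4: 1500}
CORES_PER_CPU = 32

# Dual-socket Rome Epyc figure cited in the comment (128 cores total).
EPYC_ROME_DUAL_SCORE = 3434

# How well the claimed scores scale relative to perfect linear scaling.
single = CLAIMED_SCORES[1]
scaling_efficiency = {
    cpus: score / (cpus * single) for cpus, score in CLAIMED_SCORES.items()
}

# Four 32-core CPUs give the same 128-core count as the dual Epyc system.
loongson_128_core_score = CLAIMED_SCORES[4]
epyc_advantage = EPYC_ROME_DUAL_SCORE / loongson_128_core_score

for cpus, eff in scaling_efficiency.items():
    print(f"{cpus} CPU(s), {cpus * CORES_PER_CPU} cores: {eff:.1%} of linear")
print(f"Epyc advantage at 128 cores: {epyc_advantage:.2f}x")
```

Because the single-CPU score is only a lower bound, the scaling percentages here are upper bounds on how well the multi-CPU configurations scale.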

    Meanwhile, the processor's stream performance with eight channels of DDR4-3200 memory crosses the 50GB mark.
    This probably refers to the OpenMP-based Stream Triad benchmark, which adds 2 streams of numbers and writes out a 3rd. That means actual memory traffic is 3-4x whatever number they're quoting, so 150 to 200 GB/s. That aligns well with a raw bandwidth of about 205 GB/s from 8x DDR4-3200. In multi-CPU configurations, the benchmark particularly stresses the inter-processor link. Presumably, the number they're quoting is just from a single-CPU config.
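The raw bandwidth figure follows from the memory configuration. The sketch below assumes standard 64-bit (8-byte) DDR4 channels, and it applies the comment's 3-4x multiplier purely as that comment's estimate. Whether the quoted figure should be multiplied at all depends on how the benchmark counted bytes, which the source does not specify.

```python
CHANNELS = 8
TRANSFERS_PER_SECOND = 3200 * 10**6   # DDR4-3200: 3200 megatransfers/s
BYTES_PER_TRANSFER = 8                # assumption: 64-bit wide channel

# Theoretical peak bandwidth across all channels, in GB/s (10^9 bytes).
peak_gb_s = CHANNELS * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9

QUOTED_STREAM_GB_S = 50               # "crosses the 50GB mark"

# The comment's estimate: real traffic is 3x to 4x the quoted number.
implied_low_gb_s = QUOTED_STREAM_GB_S * 3
implied_high_gb_s = QUOTED_STREAM_GB_S * 4

print(f"theoretical peak: {peak_gb_s:.1f} GB/s")
print(f"quoted figure as share of peak: {QUOTED_STREAM_GB_S / peak_gb_s:.0%}")
print(f"implied traffic (3x-4x): {implied_low_gb_s}-{implied_high_gb_s} GB/s, "
      f"{implied_low_gb_s / peak_gb_s:.0%}-{implied_high_gb_s / peak_gb_s:.0%} of peak")
```

Taken at face value, 50 GB/s is roughly a quarter of the theoretical peak. Only with the multiplier applied does it approach the raw 8-channel limit.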

    It would be great to know what process node these chips were designed for.
  • bit_user
    DaveLTX said:
    Despite Loongson claiming LSA is developed on their own, it's MIPS with their own AVX-like extensions.
    It's certainly more than that, but I'm not an authority on the subject. My best understanding is that they borrowed much of the MIPS system architecture, while the instruction set architecture is substantially different.

    If you have a good source on the matter, please share it.
  • The Historical Fidelity
    bit_user said:
    It would be great to know what process node these chips were designed for.
    I suspect it is SMIC 14nm, since the 3A5000 was originally designed to TSMC 14nm design rules and this new CPU is just two 3A5000s glued together. Since SMIC 14nm is an unsanctioned copy of TSMC 14nm and Loongson no longer has access to TSMC services, the CPU design would be compatible with SMIC’s node, so this seems the most likely assumption.
  • shady28
    The Historical Fidelity said:
    I suspect it is SMIC 14nm, since the 3A5000 was originally designed to TSMC 14nm design rules and this new CPU is just two 3A5000s glued together. Since SMIC 14nm is an unsanctioned copy of TSMC 14nm and Loongson no longer has access to TSMC services, the CPU design would be compatible with SMIC’s node, so this seems the most likely assumption.

    Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

    Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

    This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.
  • anonymousdude
    shady28 said:
    Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

    Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

    This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.

    You are correct. TSMC never did formally name a node 14nm. Their 16nm and 12nm constituted the "14nm-class" node.
  • DaveLTX
    bit_user said:
    It's certainly more than that, but I'm not an authority on the subject. My best understanding is that they borrowed much of the MIPS system architecture, while the instruction set architecture is substantially different.

    If you have a good source on the matter, please share it.
    The programming manual mentioned by Chips and Cheese shows the lineage of the LSA clear as day. That, and they were using MIPS until recently, which suggests it's based on MIPS with their own additions.
  • The Historical Fidelity
    shady28 said:
    Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

    Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

    This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.
    Good catch, I meant TSMC 12nm
  • thisisaname
    It is easy to make claims and I do not think any third party benchmarks are going to be run any time soon.
  • DaveLTX
    thisisaname said:
    It is easy to make claims and I do not think any third party benchmarks are going to be run any time soon.
    Except there is?
    https://chipsandcheese.com/2023/04/09/loongsons-3a5000-chinas-best-shot/
    https://chipsandcheese.com/2023/01/29/previewing-chinas-loongson-3a5000-with-performance-counters/