China's Loongson Unveils 32-Core CPU, Reportedly 4X Faster Than Arm Chip

Loongson 3D5000 (Image credit: MyDrivers)

Loongson, a Chinese fabless chipmaker, has launched the new 3D5000 processor for data centers and cloud computing. MyDrivers reported that Loongson claims its 32-core domestic chips deliver 4X higher performance than rival Arm processors.

The 3D5000 still leverages LoongArch, Loongson's homemade instruction set architecture (ISA) from 2020. The chipmaker was previously a firm believer in MIPS. However, Loongson eventually built LoongArch from the ground up with the sole objective of not relying on foreign technology to develop its processors. LoongArch is a RISC (reduced instruction set computer) ISA, similar to MIPS or RISC-V.

The 3D5000 arrives with 32 LA464 cores running at 2 GHz. The 32-core processor has 64MB of L3 cache, supports eight-channel DDR4-3200 ECC memory, and offers up to five HyperTransport (HT) 3.0 interfaces. It also supports dynamic voltage and frequency scaling. Officially, the 3D5000 has a 300W TDP; however, Loongson stated that typical power consumption is around 150W, which works out to roughly 5W per core.
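The "roughly 5W per core" figure is simple division. The sketch below reproduces it under one loud assumption: package power is split evenly across the 32 cores. In practice part of that budget goes to the uncore (L3, memory controllers, HT links), so the real per-core draw is likely somewhat lower.

```python
# Per-core power arithmetic for the figures quoted in the article.
# Assumption: package power is split evenly across cores; real chips also
# spend power on the uncore (L3, memory controllers, HT links).
CORES = 32
TYPICAL_WATTS = 150  # Loongson's stated typical draw
TDP_WATTS = 300      # official TDP


def watts_per_core(total_watts: float, cores: int) -> float:
    """Naive even split of package power across cores."""
    return total_watts / cores


typical = watts_per_core(TYPICAL_WATTS, CORES)
at_tdp = watts_per_core(TDP_WATTS, CORES)
print(f"typical: {typical:.2f} W/core, at TDP: {at_tdp:.2f} W/core")
```

At the typical figure this is about 4.7W per core, which the text rounds to 5W; at the full TDP it doubles to about 9.4W.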

While performance isn't the 3D5000's strong suit, security is. The 32-core processor reportedly has a custom mechanism to defend against vulnerabilities such as Meltdown and Spectre. The chip also has its own built-in Trusted Platform Module (TPM), so it doesn't rely on an external solution. In addition, according to MyDrivers' report, the 3D5000 supports China's national cryptographic algorithms through an embedded security module, with encryption and decryption throughput reportedly above 5 Gbps.

In addition to the 3D5000 and 7A2000, Loongson also announced the 2K050, the company's baseboard management controller (BMC). The 2K050 features LA264 cores at 500 MHz, an integrated 2D GPU, 32-bit DDR3 support, and 1080p (1920x1080) display output at 60 Hz.

Loongson's 3D5000 is no match for AMD's EPYC Genoa or Intel's Sapphire Rapids Xeon processors. It was never about beating the foreign competition but pushing for self-sufficiency. Unfortunately, with the ongoing U.S. sanctions, Chinese companies have no means to secure chipmaking tools originating from the U.S. In addition, the U.S. Department of Commerce recently blacklisted Loongson, which likely derailed some of the company's plans.


Zhiye Liu is a news editor, memory reviewer, and SSD tester at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.

21 comments from the forums

DaveLTX

Despite Loongson claiming LSA was developed on their own, it's MIPS with their own AVX-like extensions

So yes, it's modified MIPS
bit_user

Thanks for the continuing coverage. I think it's worth keeping tabs on China's tech industry.

Loongson claims its 32-core domestic chips deliver 4X higher performance than rival Arm processors.
That's immediately quite suspect, so I plugged the MyDrivers link into Google Translate, and here's what it claims they said:
"In terms of performance, the SPEC 2006 score of Loongson 3D5000 exceeds 425. The floating point part adopts dual 256bit vector units, and the double precision floating point performance can reach 1TFLOPS (1 trillion times), which is 4 times of the typical ARM core performance." (the bold is theirs)

That's a far more nuanced statement. For one thing, the typical ARM core has dual 128-bit vectors, which gives it an automatic 2x. I don't know where the other 2x comes from, but it's not hard to imagine theirs has dual-FMA units, whereas their basis for comparison doesn't. That still doesn't get us quite to 4x, but now we're in the ballpark.

It's way off the mark to generalize 4x the vector fp64 throughput to an overall 4x performance increase, however.
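The vector-width argument can be checked with back-of-the-envelope arithmetic. The sketch assumes both of the 3D5000's 256-bit units can issue an FMA every cycle and that the comparison Arm core has two 128-bit units at the same clock; neither detail comes from Loongson. Under those assumptions, 32 cores × 2 GHz × 16 FP64 FLOPs per cycle gives about 1.02 TFLOPS, consistent with the quoted 1TFLOPS. Per core, that is 2x an FMA-capable dual-128-bit Arm core, or 4x one without FMA.

```python
# Peak FP64 throughput from vector width, unit count, FMA support and clock.
# Assumptions (not confirmed by Loongson): both 256-bit units on the LA464 can
# issue an FMA every cycle, and the "typical ARM core" has two 128-bit units.


def flops_per_cycle(units: int, vector_bits: int, fma: bool) -> int:
    lanes = vector_bits // 64          # FP64 lanes per vector unit
    ops_per_lane = 2 if fma else 1     # an FMA counts as a multiply plus an add
    return units * lanes * ops_per_lane


def peak_flops(cores: int, ghz: float, per_cycle: int) -> float:
    return cores * ghz * 1e9 * per_cycle


la464 = flops_per_cycle(units=2, vector_bits=256, fma=True)        # 16
arm_fma = flops_per_cycle(units=2, vector_bits=128, fma=True)      # 8
arm_no_fma = flops_per_cycle(units=2, vector_bits=128, fma=False)  # 4

chip = peak_flops(cores=32, ghz=2.0, per_cycle=la464)
print(f"3D5000 peak FP64: {chip / 1e12:.3f} TFLOPS")
print(f"per core, same clock: {la464 / arm_fma:.0f}x vs FMA Arm, "
      f"{la464 / arm_no_fma:.0f}x vs non-FMA Arm")
```

This is peak vector FP64 only, which is exactly why it can't stand in for overall performance.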

According to Loongson's provided numbers, the 3D5000 scores over 425 points in SPEC CPU 2006
That's the claimed single-CPU score. The article also mentions scores of 800 and 1500 for dual- and quad-CPU configurations.

For comparison, the last set of SPEC2006 scores I could find for an AMD CPU are the Zen 2-based Rome Epyc:

https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/10

I think the score they're quoting is a simple average of the sub-scores, in which case the 128-core dual-CPU Rome Epyc delivered 3434. That's about 2.3x what they claimed to achieve with the same core count.
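The 2.3x figure, along with how well the claimed scores scale across sockets, falls out of a few divisions. The sketch treats Loongson's "over 425/800/1500" lower bounds as exact values and uses the 3434 Rome figure cited above. Dividing an aggregate score by core count is only a rough normalization, not a SPEC-sanctioned metric.

```python
# Per-core normalization of the SPEC CPU 2006 figures discussed above.
# Assumptions: Loongson's "over 425/800/1500" are treated as exact values, and
# dividing an aggregate score by core count is only a rough normalization.
LOONGSON_SCORES = {1: 425, 2: 800, 4: 1500}  # sockets -> claimed score
CORES_PER_SOCKET = 32
ROME_SCORE, ROME_CORES = 3434, 128           # dual-socket Rome figure cited above


def summarize(scores: dict[int, int], cores_per_socket: int):
    """Return (sockets, cores, score per core, scaling vs. perfect linear)."""
    base = scores[1]
    rows = []
    for sockets in sorted(scores):
        score = scores[sockets]
        cores = sockets * cores_per_socket
        rows.append((sockets, cores, score / cores, score / (sockets * base)))
    return rows


for sockets, cores, per_core, scaling in summarize(LOONGSON_SCORES, CORES_PER_SOCKET):
    print(f"{sockets}P / {cores} cores: {per_core:.1f} per core, {scaling:.0%} of linear")
print(f"Rome: {ROME_SCORE / ROME_CORES:.1f} per core, "
      f"{ROME_SCORE / LOONGSON_SCORES[4]:.2f}x the 4P Loongson total")
```

On these numbers the quad-socket claim is roughly 88% of perfect linear scaling from one socket, and Rome's per-core figure is a bit more than double Loongson's.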

Meanwhile, the processor's stream performance with eight channels of DDR4-3200 memory crosses the 50GB mark.
This probably refers to the OpenMP-based Stream Triad benchmark, which adds 2 streams of numbers and writes out a 3rd. That means actual memory traffic is 3-4x whatever number they're quoting, so 150 to 200 GB/s. That aligns well with a raw bandwidth of about 205 GB/s from 8x DDR4-3200. In multi-CPU configurations, it particularly stresses the inter-processor link. Presumably, the number they're quoting is just from a single-CPU config.
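The ~205 GB/s raw figure comes from 8 channels × 3,200 MT/s × 8 bytes per transfer on a 64-bit DDR4 channel. When comparing that with a Triad result, it helps to know how STREAM does its accounting. The reference STREAM code credits each Triad iteration (a[i] = b[i] + q*c[i]) with 3 × 8 bytes in double precision, covering both reads and the write. On caches that use write-allocate, the store to a[] triggers an extra read that STREAM does not count, so actual DRAM traffic can be up to 4/3 of the reported figure. The sketch below illustrates both. The function names are hypothetical, and this is illustrative Python rather than the STREAM source.

```python
# Theoretical DDR4 bandwidth and STREAM-style Triad byte accounting.
# The helper names are hypothetical; this is illustrative Python, not the
# STREAM reference code.


def ddr_peak_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Peak GB/s = channels x MT/s x bytes per transfer (64-bit channel = 8 bytes)."""
    return channels * mega_transfers * bus_bytes / 1000


def triad(a: list[float], b: list[float], c: list[float], q: float) -> None:
    """The Triad kernel: a[i] = b[i] + q * c[i] (two input streams, one output)."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes_counted(n: int, elem_bytes: int = 8) -> int:
    """Bytes STREAM credits per Triad pass: read b and c, write a."""
    return 3 * elem_bytes * n


def triad_bytes_with_write_allocate(n: int, elem_bytes: int = 8) -> int:
    """Adds the read of a[] that a write-allocate cache performs before the store."""
    return 4 * elem_bytes * n


peak = ddr_peak_gbs(channels=8, mega_transfers=3200)
print(f"8x DDR4-3200 peak: {peak:.1f} GB/s; 50 GB/s is {50 / peak:.0%} of that")
```

Here GB means 10^9 bytes, matching how DDR bandwidth is usually quoted.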

It would be great to know what process node these chips were designed for.
bit_user

DaveLTX said:
Despite loongson claiming LSA is developed on their own, it's MIPS with their own AVX like extensions
It's certainly more than that, but I'm not an authority on the subject. My best understanding is that they borrowed much of the MIPS system architecture, while the instruction set architecture is substantially different.

If you have a good source on the matter, please share it.
The Historical Fidelity

bit_user said:
It would be great to know what process node these chips were designed for.
I suspect it is SMIC 14nm, since the 3A5000 was originally designed to TSMC 14nm design rules and this new CPU is just two 3A5000s glued together. Since SMIC 14nm is an unsanctioned copy of TSMC 14nm and Loongson no longer has access to TSMC services, the CPU design would be compatible with SMIC's node, so this seems the most likely assumption.
shady28

The Historical Fidelity said:
I suspect it is SMIC 14nm, since the 3A5000 was originally designed to TSMC 14nm design rules and this new CPU is just two 3A5000s glued together. Since SMIC 14nm is an unsanctioned copy of TSMC 14nm and Loongson no longer has access to TSMC services, the CPU design would be compatible with SMIC's node, so this seems the most likely assumption.

Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.
anonymousdude

shady28 said:
Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.

You are correct. TSMC never did formally name a node 14nm. Their 16nm and 12nm constituted the "14nm-class" node.
DaveLTX

bit_user said:
It's certainly more than that, but I'm not an authority on the subject. My best understanding is that they borrowed much of the MIPS system architecture, while the instruction set architecture is substantially different.

If you have a good source on the matter, please share it.
The programming manual mentioned by Chips and Cheese shows the lineage of the LSA as clear as day. That, and the fact that they were using MIPS until recently, suggests it's based on MIPS with their own additions
The Historical Fidelity

shady28 said:
Pretty sure that TSMC never had a 14nm. They went from 16->12->10 (Apple only) -> 7

Worth noting that Intel 14nm is about 10-12% higher density than TSMC 12FFC and about 20% higher density than SMIC 14nm.

This is basically on a SMIC node that is just a smidge better than the old 16nm node that TSMC was using in 2013.
Good catch, I meant TSMC 12nm
thisisaname

It is easy to make claims and I do not think any third party benchmarks are going to be run any time soon.
DaveLTX

thisisaname said:
It is easy to make claims and I do not think any third party benchmarks are going to be run any time soon.
Except there is?
https://chipsandcheese.com/2023/04/09/loongsons-3a5000-chinas-best-shot/
https://chipsandcheese.com/2023/01/29/previewing-chinas-loongson-3a5000-with-performance-counters/
