Zhaoxin's 12- and 16-Core CPUs Tested: Centaur Lives On


Zhaoxin, a China-based CPU developer with an x86 license, has yet to formally introduce its next-generation KaiSheng KH-40000 processors with up to 16 cores for datacenters. However, it has already started to submit benchmark results to the Geekbench 5 database. The new CPUs show noticeable microarchitecture-related performance improvements over their predecessors but still fall well short of modern CPUs from AMD and Intel.

Mysterious CPUs

Zhaoxin, co-owned by Via Technologies and the Shanghai Municipal Government, has been gradually leveraging microarchitectures designed by Via (or rather by Centaur) since the mid-2010s, and its upcoming KaiSheng KH-40000 series processors for datacenters are based on the CentaurHauls microarchitecture that some claim resembles Intel's Haswell microarchitecture from 2013.  

The KaiSheng KH-40000/16 and KaiSheng KH-40000/12 CPUs run at 2.20 GHz, have 16 and 12 cores, and are equipped with 32MB and 24MB of L3 cache, respectively. In addition, the 16-core model seems to feature simultaneous multithreading technology (SMT), so it can process up to 32 threads concurrently, assuming that Geekbench 5 correctly reads its capabilities. Based on specifications of Zhaoxin's KaiSheng KH-40000/16 and KaiSheng KH-40000/12 published in the Geekbench 5 database, these CPUs look very similar to Centaur's never-released CHA processor unearthed earlier this year.  

There are differences, though: CHA had eight cores, did not support SMT, and was architected for TSMC's N16 node, whereas KaiSheng KH-40000 has up to 16 cores, seems to feature SMT, and is believed to be designed for TSMC's N7 fabrication process. Furthermore, the processor IDs of both KH-40000 CPUs read 'CentaurHauls Family 7 Model 11 Stepping 3,' whereas the processor ID of Centaur's CHA is 'CentaurHauls Family 6 Model 71 Stepping 2,' so the CPUs in question use different silicon.

What is odd, though, is that both CHA and KH-40000 operate at 2.20 GHz, so if we did not know the CPU IDs, we could speculate that the KH-40000/16 uses two eight-core CHA dies produced on TSMC's N16 node and glued together using an interconnect.

Mediocre Performance

For Zhaoxin, CentaurHauls should be a significant microarchitectural advancement over its LuJiazui microarchitecture from 2019. Furthermore, the higher core count should make KaiSheng KH-40000 CPUs more competitive in the server market. So, let's look at the performance numbers submitted by the CPU developer.

| | Zhaoxin KH-40000/16 | Zhaoxin KH-40000/12 | Centaur CHA | Zhaoxin KX-U6780A | AMD FX-8350 | Core i9-12900K | Ryzen 9 5950X |
|---|---|---|---|---|---|---|---|
| General specifications | 16C/32T, 2.20GHz, 32MB L3 | 12C/12T, 2.20GHz, 24MB L3 | 8C/8T, 2.20GHz, 16MB L3 | 8C/8T, 2.70GHz, 8MB L3 | 4C/8T | 8P, 8E, 3.20 ~ 5.10GHz, 30MB | 16C, 3.40 ~ 5.0 GHz, 64MB |
| Microarchitecture | CentaurHauls | CentaurHauls | CentaurHauls | LuJiaZui | Bulldozer/Piledriver | Golden Cove + Gracemont | Zen 3 |
| OS | UnionTech OS DT 20 Pro | Windows 10 Pro | Windows 10 Pro | Windows 10 Pro | ? | Windows 11 Pro | Windows 10 Pro |
| Single-Core, Integer | 450 | 439 | 476 | 366 | 670 | 1830 | 1435 |
| Single-Core, Float | 559 | 538 | 541 | 318 | 607 | 2189 | 1881 |
| Single-Core, Crypto | 1039 | 934 | 782 | 583 | 1040 | 6064 | 4089 |
| Single-Core, Score | 512 | 493 | 511 | 362 | 670 | 2149 | 1702 |
| Multi-Core, Integer | 9293 | 3452 | 3307 | 2364 | 3570 | 20631 | 16695 |
| Multi-Core, Float | 11875 | 4176 | 3723 | 2089 | 3563 | 23205 | 18695 |
| Multi-Core, Crypto | 5233 | 2119 | 4825 | 3390 | 2431 | 17413 | 8145 |
| Multi-Core, Score | 9915 | 3603 | 3508 | 2333 | 3511 | 21242 | 16868 |
| Link | https://browser.geekbench.com/v5/cpu/15706425 | https://browser.geekbench.com/v5/cpu/16875254 | https://browser.geekbench.com/v5/cpu/12878360 | https://browser.geekbench.com/v5/cpu/12878360 | https://browser.geekbench.com/v5/cpu/15900997 | https://browser.geekbench.com/v5/cpu/15911328 | https://browser.geekbench.com/v5/cpu/9506672 |

When it comes to single-threaded performance, Zhaoxin's (or Centaur's) CentaurHauls microarchitecture significantly outpaces the company's previous-generation LuJiazui microarchitecture in both integer (by 22%) and floating point (by 75%) workloads, even though the new CPU operates at 2.20 GHz while the older one works at 2.70 GHz. The FPU performance uplift seems rather dramatic, but one should remember that we are dealing with a synthetic benchmark.
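For readers who want to check the arithmetic, here is a minimal Python sketch of how such uplift figures can be derived from the Geekbench 5 single-core subscores of the KH-40000/16 and the KX-U6780A. The per-GHz normalization is our own illustrative addition, and it assumes both chips actually ran at the nominal clocks listed in their Geekbench entries, which we cannot verify. The names `uplift_pct` and `per_ghz` are hypothetical helpers.

```python
# Geekbench 5 single-core subscores and nominal clocks as listed in the article's table.
KH40000_16 = {"integer": 450, "float": 559, "ghz": 2.20}
KX_U6780A = {"integer": 366, "float": 318, "ghz": 2.70}


def uplift_pct(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def per_ghz(score: float, ghz: float) -> float:
    """Score per GHz; assumes the chip held its nominal clock (unverified)."""
    return score / ghz


for test in ("integer", "float"):
    raw = uplift_pct(KH40000_16[test], KX_U6780A[test])
    clock_adjusted = uplift_pct(
        per_ghz(KH40000_16[test], KH40000_16["ghz"]),
        per_ghz(KX_U6780A[test], KX_U6780A["ghz"]),
    )
    print(f"{test}: raw uplift {raw:.1f}%, per-GHz uplift {clock_adjusted:.1f}%")
```

Because the newer chip is clocked lower, the per-GHz uplift comes out larger than the raw one, which is the point the paragraph above makes in words.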

While the new microarchitecture is significantly better than the preceding one, KaiSheng KH-40000 CPUs with 12 and 16 cores cannot compete against any modern CPUs. Moreover, their single-threaded performance is even lower than that of AMD's ill-fated Bulldozer/Piledriver architecture from mid-2012.

As for multi-threaded performance, we see a rather odd advantage that Zhaoxin's 16-core KaiSheng KH-40000/16 with SMT has over the 12-core KaiSheng KH-40000/12. In theory, the 16C/32T chip can process about 2.67 times as many threads as its 12C/12T sibling, and we have never seen SMT scale that efficiently on any well-known CPU microarchitecture. Yet its actual performance advantage is even higher than that hypothetical 2.67X (2.69X in integer, 2.84X in float). Since one CPU has only four more cores than its rival, yet its performance is almost three times higher, we believe that factors beyond the number of cores are affecting the results.
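The comparison above can be reproduced with a short sketch. It simply divides the reported thread counts and Geekbench 5 multi-core subscores of the two chips; the dictionary layout and the `ratio` helper are illustrative assumptions, not anything Geekbench provides.

```python
# Reported configuration and Geekbench 5 multi-core subscores from the article's table.
KH40000_16 = {"cores": 16, "threads": 32, "mc_integer": 9293, "mc_float": 11875}
KH40000_12 = {"cores": 12, "threads": 12, "mc_integer": 3452, "mc_float": 4176}


def ratio(a: float, b: float) -> float:
    """How many times larger `a` is than `b`."""
    return a / b


core_ratio = ratio(KH40000_16["cores"], KH40000_12["cores"])
thread_ratio = ratio(KH40000_16["threads"], KH40000_12["threads"])
integer_ratio = ratio(KH40000_16["mc_integer"], KH40000_12["mc_integer"])
float_ratio = ratio(KH40000_16["mc_float"], KH40000_12["mc_float"])

print(f"cores:   {core_ratio:.2f}x")
print(f"threads: {thread_ratio:.2f}x")
print(f"integer: {integer_ratio:.2f}x")
print(f"float:   {float_ratio:.2f}x")

# Scaling beyond the thread ratio cannot be explained by thread count alone.
print("exceeds thread ratio:", integer_ratio > thread_ratio and float_ratio > thread_ratio)
```

Both measured ratios land above the thread ratio, so even perfect scaling across all 32 reported threads would not account for the gap.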

Keeping in mind that the Windows 10/11 scheduler does not always work optimally with unfamiliar multi-core CPUs, we believe that the 12-core KaiSheng KH-40000/12 results obtained on Windows 10 Pro do not reflect the chip's true potential.

Yet, even under Windows 10 Pro and without SMT, CentaurHauls is substantially faster than LuJiazui in multi-threaded integer (by 40%) and multi-threaded floating point (by 78%) workloads. The problem is that the absolute performance numbers demonstrated by both KaiSheng KH-40000 and Centaur CHA CPUs are deficient by today's standards.

Interestingly, the multi-threaded performance numbers demonstrated by Zhaoxin's 12-core KaiSheng KH-40000/12 under Windows and without SMT are comparable to those of AMD's FX-8350 processor (four modules, eight threads), which the company once marketed as an eight-core CPU. We can hardly call the performance of a decade-old processor competitive by today's standards, at least in Geekbench 5, which is not the best benchmark.

Some Thoughts

While 12-core and 16-core configurations seem okay for desktops and entry-level servers, 12 and 16 cores from Zhaoxin do not deliver performance comparable to that of 12-core or 16-core processors from AMD and Intel. Under Windows and judging only by Geekbench 5 scores, Zhaoxin seems to be a decade behind AMD and Intel regarding performance. Even if Zhaoxin enables SMT on its upcoming CentaurHauls-based CPUs (for client and server applications) and Windows 'learns' how to properly use those cores, KaiSheng KH-40000/16 will still be two times slower than 2021 processors from AMD and Intel with the same core count.
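As a rough sanity check on the size of that gap, the sketch below divides the overall Geekbench 5 multi-core scores of the two 2021 processors by that of the KH-40000/16. It uses only the submitted results cited in this article; the `relative_to` helper is a hypothetical name, and a single synthetic benchmark run per chip is a thin basis for any firm conclusion.

```python
# Overall Geekbench 5 multi-core scores from the article's table.
MULTI_CORE_SCORE = {
    "KaiSheng KH-40000/16": 9915,
    "Core i9-12900K": 21242,
    "Ryzen 9 5950X": 16868,
}


def relative_to(baseline: str, scores: dict[str, int]) -> dict[str, float]:
    """Express every score as a multiple of the baseline chip's score."""
    base = scores[baseline]
    return {name: score / base for name, score in scores.items()}


rel = relative_to("KaiSheng KH-40000/16", MULTI_CORE_SCORE)
for name, factor in rel.items():
    print(f"{name}: {factor:.2f}x the KH-40000/16 multi-core score")
```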

Anton Shilov
Freelance News Writer


  • Tralallak
    Hi Mr. Shilov.
    Zhaoxin KaiSheng KH-40000 does not support Hyper-Threading Technology!
    This KH-40000/16 is a two-socket mainboard solution => 2x 16C/16T CPUs = 32C/32T.

    link: https://0.rc.xiniu.com/g4/M00/47/15/CgAG0mJXwJKANRaOAAG8BesnTC0690.jpg
    link: https://tieba.baidu.com/p/7796218400#/

    Regards

    Tralalak
  • tomscomments
    I don't think Zhaoxin is their top priority; it's more something for maintaining a sort of x86 compatibility that's going to get developed according to the market.
    Loongson seems to be the future contender to Western SoCs, the one that'll take the consumer market (maybe along with some newcomers). They already built ShenWei for supercomputing.
    I think Loongson is the processor used most in China, even in space programs. They'll go with Arm-based Kunpeng, RISC-V, etc. They'll let many actors emerge, knowing they are heavily investing in quantum computing. So I don't see Zhaoxin attracting most of the resources and the best engineers.
  • Geef
    When will the Chinese finally make something that the west can steal the IP from them instead of the other way around like this?
  • regs01
    That's single thread performance of Zen 1 on same clocks

    https://browser.geekbench.com/v5/cpu/compare/8603346?baseline=15706425