
Zhaoxin KaiXian x86 CPU Tested: The Rise of China's Chips

Zhaoxin KX-6000 IPC and Performance Scaling

(Image credit: Tom's Hardware)

We prefer to measure instructions per cycle (IPC) throughput by locking all processors to the same frequency, typically matching the minimum base speed of the fastest processor to minimize the effect on cache and fabric timings that impact performance. However, the Zhaoxin KX-U6780A ticks at a mere 2.7 GHz, well below the minimum base speeds of comparable processors, and the spartan BIOS doesn't support modifying the multiplier. We also can't adjust memory timings. As such, we tested against comparable processors that lack boost mechanisms, or fixed the clock rate (A10-9700) to ensure static frequencies, set the memory to the supported frequency for each chip, and then normalized the results. This isn't our preferred method, but it is good enough for the task at hand.
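The normalization step can be sketched roughly as follows. This is a minimal illustration, assuming that "normalized" means dividing each score by the chip's fixed clock and expressing the result relative to the KX-U6780A baseline; the exact formula isn't spelled out in the text. The scores and the second chip's clock are hypothetical placeholders, not measured results.

```python
def per_clock_score(score: float, freq_ghz: float) -> float:
    """Benchmark points per GHz, a rough proxy for IPC at a fixed clock."""
    if freq_ghz <= 0:
        raise ValueError("frequency must be positive")
    return score / freq_ghz


def normalize_to_baseline(results: dict, baseline: str) -> dict:
    """Express each chip's per-clock score relative to the baseline (baseline = 1.0)."""
    base = per_clock_score(*results[baseline])
    return {
        name: per_clock_score(score, ghz) / base
        for name, (score, ghz) in results.items()
    }


# Hypothetical (score, fixed clock in GHz) pairs. Only the 2.7 GHz
# KX-U6780A clock comes from the text; everything else is made up.
results = {
    "KX-U6780A": (100.0, 2.7),
    "Chip B": (240.0, 3.6),
}

relative = normalize_to_baseline(results, "KX-U6780A")
for name, value in relative.items():
    print(f"{name}: {value:.2f}x per clock vs. baseline")
```

A ratio above 1.0 means the chip does more work per clock than the baseline. Because memory speeds differ per chip, this kind of figure is an approximation rather than a pure IPC measurement.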

We assigned the Zhaoxin KX-U6780A as our baseline model, but it obviously lags the competing chips by a large margin. AMD's Bristol Ridge A10-9700 with the pre-Zen Excavator cores beats the Zhaoxin chip in every metric, while the 12nm Zen+ architecture on the Ryzen 3 3200G extends the lead further. The Ryzen 5 3600 with Zen 2 cements AMD's lead. More expansive results showing the current state of Zen 2's IPC performance are available separately.

Intel's Kaby Lake opens up a big lead over the Zhaoxin, representing the refreshed Skylake architecture's general IPC trend. Intel's current-gen Coffee Lake processors offer mostly identical IPC performance, so this is an accurate depiction of the current state of play for Intel, security mitigations included. Intel's stagnation on the microarchitectural front hurts versus AMD, but it has plenty of breathing room against Zhaoxin. 

Zhaoxin's key remit will be to improve per-core performance in the future through IPC-boosting architectural enhancements and increased frequencies, but that isn't a straightforward proposition: other aspects of the chip will also have to move forward in lockstep.

(Album of four charts. Image credit: Tom's Hardware)

The first chart in our album has the multi-threaded Cinebench score we attained with each processor (Multi-Core), along with that score divided by the number of cores (Multi-Threaded Per-Core Score). We also included the result of the Cinebench single thread test (Single Threaded Score). 

These heavily-threaded applications give us an idea of how well each workload scales on the respective architectures. Threading plays a role in boosting the per-core performance in each of these tests, but we're focusing simply on the performance of each physical core regardless of the number of cores. The KX-U6780A obviously suffers from poor per-core performance, but there may be other architectural issues at play that hinder scalability.

Pay attention to the per-core score we calculated from the multi-threaded result and the score from the single-threaded test. As you'll notice, chips with Hyper-Threading gain some performance in our calculations from the multi-thread test, often in the ~20% range, because both threads are then active on a single core. Chips that feature boost technology also get the added benefit of a higher single-core boost frequency, albeit typically for a short time considering the length of this test. It goes without saying, but the combination of those technologies would benefit Zhaoxin's processors.

However, there are other factors that can limit performance scalability. The Core i3-8100 doesn't have Hyper-Threading or boost technology, and you'll notice that it loses some performance, albeit not a drastic amount, when comparing the single-thread test results to our calculated per-core results from the multi-threaded test. These types of scaling losses can come from cache and fabric contention, a condition that can be exacerbated by bandwidth-consuming thread dependencies, so these factors must be accounted for during the design stages of the processor. You have to right-size the fabric for the job, and here we can see that Zhaoxin's chip loses only four points between the two measurements. You'll see a similar trend with the POV-Ray tests.
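The comparison between the calculated per-core score and the single-thread score can be sketched as below. This is a minimal illustration of the arithmetic only. The function names are our own, and the example scores and core count are hypothetical placeholders chosen to produce a four-point gap, not the measured Cinebench results.

```python
def per_core_score(multi_score: float, cores: int) -> float:
    """Multi-threaded score divided by the number of physical cores."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return multi_score / cores


def scaling_delta(single_score: float, multi_score: float, cores: int):
    """Return (points, percent) by which the per-core score differs from
    the single-thread score.

    Negative values indicate a scaling loss (for example, cache or fabric
    contention); positive values indicate a gain (for example, from SMT).
    """
    points = per_core_score(multi_score, cores) - single_score
    return points, 100.0 * points / single_score


# Hypothetical numbers: an 8-core chip with no SMT and no boost.
points, percent = scaling_delta(single_score=100.0, multi_score=768.0, cores=8)
print(f"per-core vs. single-thread: {points:+.1f} points ({percent:+.1f}%)")
```

With these placeholder inputs the per-core score is 96 against a single-thread score of 100, which is the kind of small loss the paragraph above describes. A chip with Hyper-Threading would typically show a positive delta instead.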

Scaling up per-core performance requires faster interconnects to handle inter-core traffic, not to mention access to memory and I/O devices. That's the key reason why both AMD and Intel pound the interconnect drums so frequently in marketing materials – it has a tremendous impact on workload scalability.

It's fully possible that the KX-U6780A's process node could clock higher, but striking the correct balance between per-core performance and chip interconnects could be best achieved by dialing back the frequency to match the interconnect saturation point, thus landing lower on the voltage/frequency curve and yielding better power efficiency and thermals. We won't know if that is a concern here until Zhaoxin shares more details of its architecture. 

We also included scaling tests with the V-Ray and Stockfish benchmarks, both of which scale very well and fully saturate the cores during operation. We don't have comparable single-threaded test results (there isn't a benchmark for that), but it does provide an interesting holistic view of how Zhaoxin relies upon more cores to compete with chips with far more efficient designs. 

Zhaoxin KX-U6780A Power Consumption

Measuring power consumption is always a tricky proposition, with different methodologies yielding different results. Intercepting power at the physical layer (i.e., measuring at the 8-pin connector) provides the most accurate measurements, but VRM inefficiencies lead to higher power draw measurements that don't match the actual power consumed by the processor. 

Many software utilities provide granular power logging features, but these reports can be inaccurate with some motherboards. However, the advantage of polling the sensor loop is that this technique measures the actual amount of power consumed by the processor itself. To merge the best of both worlds while still ensuring accuracy, we typically compare the power measurements at the physical layer with those we pull from the sensor loop to verify that the software output plausibly coincides. That technique enables fine-grained power testing that represents the real power consumption of the processor under test.

(Gallery of two images. Image credit: Tom's Hardware)

Unfortunately, the Zhaoxin development board doesn't support sensor loop-based power logging, so we turned to Passmark's Inline PSU tester to measure the amount of power flowing into the 8-pin connector. This device measures in a pass-through mode with a high level of accuracy and has expansive logging capabilities, making it an excellent addition to our arsenal of power-testing tools. However, the measurements for the Zhaoxin processor come directly from the 8-pin connector as opposed to the sensor loop, so you'll have to account for VRM inefficiencies that can lop roughly 10% to 15% off the power readings.

(Gallery of two images. Image credit: Tom's Hardware)

Zhaoxin's design also complicates matters. We know little about the LuJiaZui architecture, but the company informed us that while the chipset and graphics units are part of the single monolithic die underneath the heatspreader, these units pull power from a separate power domain that's fed through the 24-pin connector. The company uses a special motherboard to measure the total power draw of the processor, but we don't have access to that equipment.

Because of this power delivery arrangement, we can't ascertain the real total power draw of the package (we have no way of knowing how much of the power flowing from the 24-pin connector goes to the processor specifically), so you'll have to take these power measurements with a grain of salt. However, we did measure ~55W of power consumption through the 8-pin connector, and after accounting for VRM losses, we're looking at roughly the same power draw as, if not slightly less than, AMD's A10-9700 that's fabbed on a 28nm process.
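As a rough illustration of that VRM adjustment, the sketch below applies the 10% to 15% loss range to the ~55W connector reading. It assumes the loss is a simple percentage of the measured input power. The function name and defaults are our own, and the result covers only the 8-pin power domain, not whatever the chipset and graphics units draw through the 24-pin connector.

```python
def estimate_cpu_power(connector_watts: float,
                       loss_low: float = 0.10,
                       loss_high: float = 0.15):
    """Return a (min, max) estimate in watts of the power that reaches the
    processor after VRM conversion losses are subtracted from the reading
    taken at the 8-pin connector."""
    if not 0 <= loss_low <= loss_high < 1:
        raise ValueError("losses must satisfy 0 <= low <= high < 1")
    return (connector_watts * (1 - loss_high),
            connector_watts * (1 - loss_low))


# ~55 W is the 8-pin reading quoted in the text.
low, high = estimate_cpu_power(55.0)
print(f"Estimated power on the 8-pin domain: {low:.2f} W to {high:.2f} W")
```

Under these assumptions, the estimate lands in the high 40s of watts for the cores' power domain.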

Even with the somewhat unclear power results, we can see the power burden bestowed by the older 16nm process. This will certainly improve when the company moves to the 7nm process with the KX-7000 series, but unsurprisingly, the KX-U6780A isn't very power efficient compared to competing processors with smaller process nodes that yield lower power consumption and higher performance.   

Zhaoxin HX002EH1 Dev Board

Regular consumers will never see this reference validation board because it's designed for Zhaoxin's own internal dev work. We can see that the stock cooler topping the chip bears three heatpipes for efficient thermal dissipation, but the fan is loud and the BIOS doesn't offer any bells or whistles, like custom fan curves. Instead, the fan runs on its own based on load. 

The custom cooler mounts over the BGA package that sits adjacent to the 16-lane PCIe slot (lane width is x8, though). The board also houses one 4-lane and three single-lane PCIe slots, along with an old-school PCI slot. 

The four-layer slab of PCB measures 244 mm x 305 mm, meaning it adheres to the ATX specification. As such, it also supports the standard ATX power connections, like a 24-pin and a single 8-pin for power delivery. The 8-pin feeds a three-phase power delivery subsystem that comes with no additional cooling, such as a heatsink. That's not too much of a concern given the 70W TDP of the chip.

(Gallery of six images. Image credit: Tom's Hardware)

The board sports a decent helping of connectivity that includes VGA, HDMI and DisplayPort outs along with the following accommodations: 

- Four SATA 3.0 connectors
- One PCIe M.2
- One USB 3.1 Gen 2 port on one Type C connector
- One USB 3.1 Gen 2 port on one Type C pin header
- Two USB 3.1 Gen 1 ports on one Type A connector
- Two USB 3.1 Gen 1 ports on one pin header
- Two USB 2.0 ports on one Type A connector
- Eight USB 2.0 ports on four pin headers
- Two UART ports
- One Audio Codec ALC662

The motherboard sports the ZX-200 IO expansion chip (6W chipset) that provides eight lanes of PCIe 2.0 and houses the built-in SATA and USB controllers. The 40nm chip sports up to 11 USB ports, as listed above. Zhaoxin says this chipset is used for desktop PCs, all-in-ones, and laptops.

The board is extremely spartan, and has the BIOS to match. You can't specify memory frequencies or timings, overclocking isn't permitted, and almost all of the features are handled automatically. We imagine that custom motherboards will come with more of the enthusiast-minded trimmings, though given the chip's capabilities, we wouldn't expect the fantastic RGB light shows and muscular power delivery cooling solutions we see on high end boards.  

Zhaoxin HFCBGA
- KX-U6780A
- HX002EH1 Development Board
- 2x 8GB SK Hynix DDR4-2666

AMD Socket AM4 (X570/B450M/X370)
- Athlon 200GE, 220GE, 3000G, Ryzen 3 3200G, Athlon A10-9700
- MSI MEG X570 Godlike / ASUS B450M Plus (iGPU) / MSI X370 Xpower Gaming Titanium (A10-9700)
- 2x 8GB G.Skill Flare DDR4-3200
- Ryzen 3000 - DDR4-3200, DDR4-3600
- Second-gen Ryzen - DDR4-2933, DDR4-3466

Intel LGA 1151 (Z390)
- Intel Core i5-9600K, Core i5-9400F, i3-9350KF, i3-9100
- MSI MEG Z390 Godlike / MSI MPG Z390 (iGPU)
- 2x 8GB G.Skill FlareX DDR4-3200 @ DDR4-2667 & DDR4-3600

All Systems
- Nvidia GeForce RTX 2080 Ti
- 2TB Intel DC4510 SSD
- EVGA Supernova 1600 T2, 1600W
- Windows 10 Pro (1903 - All Updates)

Cooling
- Corsair H115i, Zhaoxin Stock Cooler

  • nofanneeded
    China CPU making Potential is in the ARM not the X86 market ... their Huwawei Kirin ARM CPU is the thing not X86 Chips.
  • alextheblue
    we're looking at roughly the same power draw, if not slightly less, than AMD's A10-9700 that's also fabbed on a 28nm process.

    Even with the somewhat unclear power results, we can clearly see the power burden bestowed by the older 28nm process.
    It seems you're implying both chips are fabbed on the same process. The first page says 16nm FinFET.
  • PaulAlcorn
    alextheblue said:
    It seems you're implying both chips are fabbed on the same process. The first page says 16nm FinFET.

    Good eye, thanks Alex. Fixed.
  • jimmysmitty
    Remember AMD's Phoenix-like rise from the relative ashes of the semiconductor market to the value and performance leader?


    I do. It only took one daring new architecture with a massive 52% IPC gain paired with a good-enough 14nm GlobalFoundries process, and perhaps a little bit of luck with Intel's delays on the 10nm node, to upset both the desktop PC and data center markets.


    Except AMD already had what they needed to meet Intel performance wise and their first step was a catch up after using a uArch that was just bad all around. Bulldozer launched to being beaten by K10.5 CPUs in some areas.

    You also have to consider that Intel and AMD had a big settlement a few years ago that allowed for cross patent sharing so AMD has a lot to work with to design.

    This CPU is extremely underwhelming and will probably only exist in the Chinese market or places that are too cheap to buy AMD or Intel.
  • JarredWaltonGPU
    jimmysmitty said:
    Except AMD already had what they needed to meet Intel performance wise and their first step was a catch up after using a uArch that was just bad all around. Bulldozer launched to being beaten by K10.5 CPUs in some areas.

    You also have to consider that Intel and AMD had a big settlement a few years ago that allowed for cross patent sharing so AMD has a lot to work with to design.

    This CPU is extremely underwhelming and will probably only exist in the Chinese market or places that are too cheap to buy AMD or Intel.
    Zhaoxin is at least capable of making x86-64 CPUs, and there's been a pretty decent uplift in performance over its previous gen chip. Still a long way to go, sure, but I don't think closing the uarch gap is going to be that difficult -- especially if patents and licensing are ignored. I'm sure these chips violate hundreds of Intel and AMD patents, but proving that will be difficult, and as long as they remain a China-only product there's not much to be gained by AMD or Intel in trying to fight it.
  • Dsplover
    What a kind review for a crappy chip.
    Can’t wait to see the excitement over the next major achievement...
  • pug_s
    JarredWaltonGPU said:
    Zhaoxin is at least capable of making x86-64 CPUs, and there's been a pretty decent uplift in performance over its previous gen chip. Still a long way to go, sure, but I don't think closing the uarch gap is going to be that difficult -- especially if patents and licensing are ignored. I'm sure these chips violate hundreds of Intel and AMD patents, but proving that will be difficult, and as long as they remain a China-only product there's not much to be gained by AMD or Intel in trying to fight it.

    Zhaoxin exist only because China's contingency plans if they are not allowed to buy intel or AMD desktop processors. Thanks to Obama's and Trump's exclusion to buy US technologies, China's 2025 plans is to get away from US technologies in the next few years. In the next few years, the Chinese government will probably use some kind of Linux dist using Risc V chips utilizing open source software.
  • Gurg
    For reference, performance in Fire Strike Physics is about 87% of the 7857 score generated by my old 2700K eight years ago.
  • jimmysmitty
    JarredWaltonGPU said:
    Zhaoxin is at least capable of making x86-64 CPUs, and there's been a pretty decent uplift in performance over its previous gen chip. Still a long way to go, sure, but I don't think closing the uarch gap is going to be that difficult -- especially if patents and licensing are ignored. I'm sure these chips violate hundreds of Intel and AMD patents, but proving that will be difficult, and as long as they remain a China-only product there's not much to be gained by AMD or Intel in trying to fight it.

    It's an extremely long way to go. Its using more power and in a lot of cases giving half the performance of an i3.

    I don't see AMD or Intel fighting it but still even in China this will only sell well to basic users if that unless the government limits or stops sales from AMD and Intel. No one wants to pay money for a product thats that far behind.
  • jonathan1683
    Raise your hand if you want a POS Chinese CPU.