Zhaoxin KX-6000 IPC and Performance Scaling
We prefer to measure instructions per cycle (IPC) throughput by locking all processors to the same frequency, typically matching the minimum base speed of the fastest processor, to minimize the effect of differing cache and fabric timings on performance. However, the Zhaoxin KX-U6780A ticks at a mere 2.7 GHz, which falls well below the minimum base speeds of comparable processors, and the spartan BIOS doesn't support modifying the multiplier. We also can't adjust memory timings. As such, we tested comparable processors that lack boost mechanisms, or fixed their clock rates (A10-9700), to ensure static frequencies. We then set the memory to each chip's supported frequency and normalized the results. This isn't our preferred method, but it is good enough for the task at hand.
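The exact normalization formula isn't spelled out here. A minimal sketch, assuming each score is divided by the chip's fixed clock frequency and then expressed relative to the KX-U6780A baseline (pinned at 100), might look like the following. The function name and the scores in the usage line are hypothetical placeholders, not our test results.

```python
def normalize_per_clock(results: dict[str, tuple[float, float]], baseline: str) -> dict[str, float]:
    """Express each chip's per-clock score relative to a baseline chip.

    results maps a chip name to (benchmark score, fixed clock in GHz).
    The baseline chip comes out at exactly 100.0.
    """
    # With static frequencies, score per GHz approximates per-clock throughput.
    per_clock = {chip: score / ghz for chip, (score, ghz) in results.items()}
    base = per_clock[baseline]
    return {chip: 100.0 * value / base for chip, value in per_clock.items()}


# Hypothetical placeholder scores, for illustration only.
relative = normalize_per_clock(
    {"KX-U6780A": (1000.0, 2.7), "Chip B": (1500.0, 3.0)},
    baseline="KX-U6780A",
)
print(relative)
```

Dividing by frequency only removes the clock-speed difference. Memory and fabric speeds still differ between platforms, which is why this approach is a fallback rather than a true fixed-frequency IPC test.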
We used the Zhaoxin KX-U6780A as our baseline, but it obviously lags the competing chips by a large margin. AMD's Bristol Ridge A10-9700 with the pre-Zen Excavator cores beats the Zhaoxin chip in every metric, while the 12nm Zen+ architecture on the Ryzen 3 3200G extends the lead further. The Ryzen 5 3600 with Zen 2 cements AMD's lead. You can see the current state of Zen 2's IPC performance in our more expansive IPC testing.
Intel's Kaby Lake opens up a big lead over the Zhaoxin chip, reflecting the general IPC trend of the refreshed Skylake architecture. Intel's current-gen Coffee Lake processors offer nearly identical IPC, so this is an accurate depiction of the current state of play for Intel, security mitigations included. Intel's stagnation on the microarchitectural front hurts it against AMD, but it has plenty of breathing room against Zhaoxin.
Zhaoxin's key remit will be to improve performance in the future through IPC-boosting architectural enhancements and higher frequencies, but that isn't a straightforward proposition: other aspects of the chip will also have to move forward in lockstep.
The first chart in our album has the multi-threaded Cinebench score we attained with each processor (Multi-Core), along with that score divided by the number of cores (Multi-Threaded Per-Core Score). We also included the result of the Cinebench single thread test (Single Threaded Score).
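The per-core figure in that chart is simple division. A minimal sketch, assuming the divisor is physical cores rather than threads, follows. The class name and the numbers in the usage line are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CinebenchRun:
    chip: str
    physical_cores: int   # physical cores, not hardware threads
    multi_core: float     # Cinebench multi-threaded score
    single_thread: float  # Cinebench single-threaded score

    @property
    def mt_per_core(self) -> float:
        """Multi-Threaded Per-Core Score: multi-core result divided by physical cores."""
        return self.multi_core / self.physical_cores


# Hypothetical placeholder numbers, not measured results.
run = CinebenchRun("Example chip", physical_cores=8, multi_core=1600.0, single_thread=190.0)
print(f"{run.chip}: {run.mt_per_core:.1f} per core vs {run.single_thread:.1f} single-thread")
```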
These heavily threaded applications show how well each workload scales on the respective architectures. Threading boosts per-core performance in each of these tests, but we're focusing simply on the performance of each physical core, regardless of the number of cores. The KX-U6780A obviously suffers from poor per-core performance, but there may be other architectural issues at play that hinder scalability.
Pay attention to the per-core score we calculated from the multi-threaded result and the score from the single-threaded test. As you'll notice, chips with Hyper-Threading post a calculated per-core score that is often ~20% higher than their single-threaded score, because both threads are active on each core during the multi-threaded run. Chips that feature boost technology also benefit from a higher single-core boost frequency in the single-threaded test, albeit typically only for a short time given the length of this test. It goes without saying that the combination of those technologies would benefit Zhaoxin's processors.
However, other factors can limit performance scalability. The Core i3-8100 has neither Hyper-Threading nor boost technology, and you'll notice that it loses some performance, albeit not a drastic amount, when you compare its single-threaded score to our calculated per-core score from the multi-threaded test. These types of scaling losses can come from cache and fabric contention, a condition that bandwidth-consuming thread dependencies can exacerbate, so these factors must be accounted for during the design stages of the processor. You have to right-size the fabric for the job, and here we can see that Zhaoxin's chip loses only four points between the two measurements. You'll see a similar trend with the POV-Ray tests.
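The comparison described above boils down to the percentage gap between the two scores, read in light of whether a chip has SMT or boost. The sketch below is a rough heuristic that follows the reasoning in the text. The function names, the attribution rules, and the sample values are assumptions, not a formal methodology.

```python
def scaling_delta_pct(mt_per_core: float, single_thread: float) -> float:
    """Percent by which the multi-threaded per-core score beats (+) or trails (-)
    the single-threaded score."""
    return 100.0 * (mt_per_core - single_thread) / single_thread


def explain(delta_pct: float, has_smt: bool, has_boost: bool) -> str:
    """Rough, hypothetical attribution of the gap, mirroring the article's reasoning."""
    if delta_pct > 0 and has_smt:
        return "a second thread per core lifts the multi-threaded per-core score"
    if delta_pct < 0 and has_boost:
        return "single-core boost may be inflating the single-threaded score"
    if delta_pct < 0:
        return "points to scaling losses such as cache or fabric contention"
    return "no clear gain or loss"


# Placeholder values: a chip without SMT or boost that trails by 10 points out of 200.
delta = scaling_delta_pct(190.0, 200.0)
print(f"{delta:+.1f}%: {explain(delta, has_smt=False, has_boost=False)}")
```

A chip like the KX-U6780A, which has neither SMT nor boost, lands in the contention branch. That is why its small gap reads as a sign of a reasonably sized fabric.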
Scaling up per-core performance requires faster interconnects to handle inter-core traffic, not to mention access to memory and I/O devices. That's the key reason why both AMD and Intel pound the interconnect drums so frequently in marketing materials – it has a tremendous impact on workload scalability.
It's fully possible that the KX-U6780A's process node could clock higher, but striking the correct balance between per-core performance and chip interconnects could be best achieved by dialing back the frequency to match the interconnect saturation point, thus landing lower on the voltage/frequency curve and yielding better power efficiency and thermals. We won't know if that is a concern here until Zhaoxin shares more details of its architecture.
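The following is a deliberately simplified toy model that illustrates this trade-off. It is not based on any disclosed Zhaoxin data. It assumes that delivered performance stops rising once the interconnect saturates, while dynamic power keeps climbing roughly with V² × f. The linear voltage/frequency curve and the 2.7 GHz saturation point are hypothetical parameters.

```python
def voltage(freq_ghz: float) -> float:
    """Hypothetical linear voltage/frequency curve, in volts."""
    return 0.7 + 0.15 * freq_ghz


def relative_perf(freq_ghz: float, saturation_ghz: float) -> float:
    """Performance tracks frequency until the fabric saturates, then flattens."""
    return min(freq_ghz, saturation_ghz)


def relative_power(freq_ghz: float) -> float:
    """Dynamic power scales roughly with V^2 * f (capacitance folded into the units)."""
    return voltage(freq_ghz) ** 2 * freq_ghz


SATURATION_GHZ = 2.7  # hypothetical interconnect saturation point
for f in (2.2, 2.7, 3.2):
    perf = relative_perf(f, SATURATION_GHZ)
    print(f"{f:.1f} GHz: perf {perf:.2f}, perf/W {perf / relative_power(f):.3f}")
```

In this model, clocking past the saturation point buys no performance but costs power. Clocking below it improves efficiency at the expense of performance. The balance point is where the two meet.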
We also included scaling tests with the V-Ray and Stockfish benchmarks, both of which scale very well and fully saturate the cores during operation. We don't have comparable single-threaded results (there isn't a single-threaded test for these workloads in our suite), but these results do provide an interesting holistic view of how Zhaoxin relies on more cores to compete with chips that have far more efficient designs.
Zhaoxin KX-U6780A Power Consumption
Measuring power consumption is always a tricky proposition, with different methodologies yielding different results. Intercepting power at the physical layer (i.e., measuring at the 8-pin connector) provides the most accurate measurements, but VRM inefficiencies lead to higher power draw measurements that don't match the actual power consumed by the processor.
Many software utilities provide granular power logging features, but their reports can be inaccurate with some motherboards. The advantage of polling the sensor loop, however, is that it measures the actual amount of power consumed by the processor itself. To merge the best of both worlds while still ensuring accuracy, we typically compare the power measurements at the physical layer with those we pull from the sensor loop to verify that the software output is plausible. That technique enables fine-grained power testing that represents the real power consumption of the processor under test.
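One way to express that plausibility check is shown below. Because the sensor loop sits downstream of the VRM, its reading should fall below the 8-pin figure by roughly the VRM loss. The function name and the 5% to 20% loss band are hypothetical assumptions, not measured VRM characteristics.

```python
def software_reading_plausible(physical_w: float, software_w: float,
                               min_vrm_loss: float = 0.05,
                               max_vrm_loss: float = 0.20) -> bool:
    """Check that a sensor-loop reading sits where VRM losses would put it.

    The software figure should be lower than the 8-pin (physical) figure by a
    loss fraction inside the assumed band; the band defaults are hypothetical.
    """
    low = physical_w * (1.0 - max_vrm_loss)
    high = physical_w * (1.0 - min_vrm_loss)
    return low <= software_w <= high


print(software_reading_plausible(100.0, 88.0))   # within the assumed band
print(software_reading_plausible(100.0, 100.0))  # no loss at all: suspicious
```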
Unfortunately, the Zhaoxin development board doesn't support sensor loop-based power logging, so we turned to PassMark's In-Line PSU Tester to measure the amount of power flowing into the 8-pin connector. This device measures in a pass-through mode with a high level of accuracy and has expansive logging capabilities, making it an excellent addition to our arsenal of power-testing tools. However, the measurements for the Zhaoxin processor come directly from the 8-pin connector rather than from the sensor loop, so you'll have to account for VRM inefficiencies, which means lopping roughly 10 to 15% off the power readings.
Zhaoxin's design also complicates matters. We know little about the LuJiaZui architecture, but the company informed us that while the chipset and graphics units are part of the single monolithic die underneath the heatspreader, these units pull power from a separate power domain that's fed through the 24-pin connector. The company uses a special motherboard to measure the total power draw of the processor, but we don't have access to that equipment.
Because of this power delivery arrangement, we can't ascertain the real total power draw of the package (we have no way of knowing how much of the power flowing from the 24-pin connector goes to the processor specifically), so you'll have to take these power measurements with a grain of salt. However, we did measure ~55W of power consumption through the 8-pin connector, and after accounting for VRM losses, we're looking at roughly the same power draw as, if not slightly less than, AMD's A10-9700, which is fabbed on a 28nm process.
Even with the somewhat unclear power results, we can see the power burden imposed by the older 16nm process. This should improve when the company moves to the 7nm process with the KX-7000 series, but unsurprisingly, the KX-U6780A isn't very power efficient compared to competing processors built on smaller process nodes that deliver lower power consumption and higher performance.
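The VRM adjustment is straightforward arithmetic. Taking 10 to 15% off the ~55W 8-pin reading leaves roughly 47W to 49.5W for the processor cores. That figure still excludes the chipset and graphics domain fed from the 24-pin connector. The function below is a hypothetical helper that hard-codes the loss range quoted above.

```python
def cpu_power_range(measured_8pin_w: float,
                    loss_low: float = 0.10,
                    loss_high: float = 0.15) -> tuple[float, float]:
    """Strip an assumed VRM loss fraction from an 8-pin power reading.

    Returns (lower, upper) estimates of the power reaching the processor.
    Power drawn through the 24-pin connector is not included.
    """
    return measured_8pin_w * (1.0 - loss_high), measured_8pin_w * (1.0 - loss_low)


low, high = cpu_power_range(55.0)
print(f"Estimated processor power: {low:.1f} W to {high:.1f} W")
```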
Zhaoxin HX002EH1 Dev Board
Regular consumers will never see this reference validation board because it's designed for Zhaoxin's own internal dev work. We can see that the stock cooler topping the chip bears three heatpipes for efficient thermal dissipation, but the fan is loud and the BIOS doesn't offer any bells or whistles, like custom fan curves. Instead, the fan runs on its own based on load.
The custom cooler mounts over the BGA package, which sits adjacent to the x16 PCIe slot (the slot is wired for only eight lanes, though). The board also houses one x4 and three x1 PCIe slots, along with an old-school PCI slot.
The four-layer slab of PCB measures 244 mm x 305 mm, meaning it adheres to the ATX specification. As such, it also uses the standard ATX power connections: a 24-pin and a single 8-pin connector for power delivery. The 8-pin feeds a three-phase power delivery subsystem that comes with no additional cooling, such as a heatsink. That's not much of a concern given the chip's 70W TDP.
The board sports a decent helping of connectivity that includes VGA, HDMI and DisplayPort outs along with the following accommodations:
- Four SATA 3.0 connectors
- One PCIe M.2
- One USB 3.1 Gen 2 port on one Type-C connector
- One USB 3.1 Gen 2 port on one Type-C pin header
- Two USB 3.1 Gen 1 ports on one Type-A connector
- Two USB 3.1 Gen 1 ports on one pin header
- Two USB 2.0 ports on one Type-A connector
- Eight USB 2.0 ports on four pin headers
- Two UART ports
- One Realtek ALC662 audio codec
The motherboard sports the ZX-200 I/O expansion chip (a 6W chipset), which provides eight lanes of PCIe 2.0 and integrates the SATA and USB controllers. The 40nm chip supports up to 11 USB ports. Zhaoxin says this chipset is used in desktop PCs, all-in-ones, and laptops.
The board is extremely spartan and has the BIOS to match. You can't specify memory frequencies or timings, overclocking isn't permitted, and almost all of the features are handled automatically. We imagine that custom motherboards will come with more of the enthusiast-minded trimmings, though given the chip's capabilities, we wouldn't expect the fantastic RGB light shows and muscular power delivery cooling solutions we see on high-end boards.
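For context on those eight PCIe 2.0 lanes: PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, which works out to 500 MB/s per lane in each direction before protocol overhead. The sketch below computes the raw aggregate. The function name is hypothetical, and how the ZX-200 links back to the processor isn't specified here.

```python
def pcie2_bandwidth_gb_per_s(lanes: int) -> float:
    """Raw per-direction bandwidth, in GB/s, of a PCIe 2.0 link.

    Each lane signals at 5 GT/s; 8b/10b encoding leaves 4 Gb/s (500 MB/s)
    of payload per direction, before packet/protocol overhead.
    """
    transfers_per_s = 5.0e9       # 5 GT/s per lane
    encoding_efficiency = 8 / 10  # 8b/10b line code
    bits_per_s = lanes * transfers_per_s * encoding_efficiency
    return bits_per_s / 8 / 1e9   # bits -> bytes -> GB


print(f"x8 PCIe 2.0: {pcie2_bandwidth_gb_per_s(8):.1f} GB/s per direction")
```

Real-world throughput is lower once packet headers and flow control are counted, so treat the result as an upper bound.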
| Platform | Test Configuration |
| --- | --- |
| Zhaoxin HFCBGA | KX-U6780A |
| | HX002EH1 Development Board |
| | 2x 8GB SK Hynix DDR4-2666 |
| AMD Socket AM4 (X570/B450M/X370) | Athlon 200GE, 220GE, 3000G, Ryzen 3 3200G, A10-9700 |
| | MSI MEG X570 Godlike / ASUS B450M Plus (iGPU) / MSI X370 Xpower Gaming Titanium (A10-9700) |
| | 2x 8GB G.Skill Flare DDR4-3200 |
| | Ryzen 3000 - DDR4-3200, DDR4-3600 |
| | Second-gen Ryzen - DDR4-2933, DDR4-3466 |
| Intel LGA 1151 (Z390) | Intel Core i5-9600K, Core i5-9400F, i3-9350KF, i3-9100 |
| | MSI MEG Z390 Godlike / MSI MPG Z390 (iGPU) |
| | 2x 8GB G.Skill FlareX DDR4-3200 @ DDR4-2667 & DDR4-3600 |
| All Systems | Nvidia GeForce RTX 2080 Ti |
| | 2TB Intel DC P4510 SSD |
| | EVGA Supernova 1600 T2, 1600W |
| | Windows 10 Pro (1903 - All Updates) |
| Cooling | Corsair H115i, Zhaoxin Stock Cooler |