China's newest homegrown AI chip matches industry standard at 45 TOPS — 6nm Arm-based 12-core Cixin P1 starting mass production
New SoC uses Arm designs to give China AI PCs
With AI PC hype at a fever pitch, chip vendors Intel, AMD, and Qualcomm are engaged in a fierce dogfight. Meanwhile, relative newcomer Cixin Technology hopes to offer China a domestic alternative for AI performance with the Cixin P1, a brand-new SoC for AI PCs.
The Cixin P1, announced yesterday at a press conference, is an Arm processor built on a 6nm process, with a 12-core CPU at 3.2 GHz, 10-core integrated graphics, and an NPU capable of 30 TOPS — which pushes the entire package to a total of 45 TOPS. The CPU is built on the Armv9.2-A architecture in a configuration of 8 performance cores and 4 efficiency cores. Little info about the chip has made it out of China, so we're relying heavily on coverage from the Electronic Engineering Times China.
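For context on how these headline figures fit together, peak TOPS numbers are conventionally derived as 2 operations per multiply-accumulate (MAC) unit, times the MAC count, times the clock. The MAC count and clock in this sketch are illustrative placeholders — Cixin has not disclosed its actual configuration:

```python
# How vendors typically derive peak INT8 TOPS figures: each MAC
# (multiply-accumulate) unit counts as 2 ops per cycle. The MAC count
# and clock below are illustrative placeholders, NOT Cixin's numbers.

def peak_tops(mac_units: int, clock_ghz: float) -> float:
    """Theoretical peak: 2 ops/MAC * MAC units * clock, in tera-ops/s."""
    return 2 * mac_units * clock_ghz * 1e9 / 1e12

# e.g., a 15,000-MAC array at 1.0 GHz reaches the NPU's quoted 30 TOPS
npu_tops = peak_tops(15_000, 1.0)

# The 45 TOPS "package" figure is the sum across CPU + GPU + NPU, so the
# CPU and GPU together account for the remaining 15 TOPS.
total_tops = npu_tops + 15
print(npu_tops, total_tops)  # 30.0 45.0
```

Note these are theoretical peaks; sustained throughput depends on utilization, precision, and memory bandwidth.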
Cixin declined to share specifics on the CPU, GPU, or NPU used in the P1 SoC. Based on the specs it did share, and Arm's presence as a guest at the launch event, it is highly likely that Cixin is licensing Arm's CPU and GPU designs. Arm's Cortex-A and Cortex-X series are its licensable RISC CPU designs; the P1's cores are likely variants of these.
The GPU appears to be an Arm Immortalis part. Speculation suggests the specific model is the Immortalis-G720 — Arm's fifth-generation flagship mobile GPU, designed for mobile gaming and on-device AI. The NPU likely comes from a partner other than Arm, since Arm's largest licensable NPU only scales up to 10 TOPS.
The Cixin P1 has a modern slate of features, including support for LPDDR5-6400 memory and a PCIe 4.0 x16 interface for discrete GPUs and AI accelerators, and it can drive displays at up to 4K 120 FPS. The P1 was built to be a chameleon of a processor, developed under a strategy of "one chip for many uses". This extends to its firmware, which is compatible with Windows, Android, Kylin, Tongxin, and a host of other operating systems, foreign and domestic.
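As a rough illustration of what LPDDR5-6400 buys, peak bandwidth is just the transfer rate times the bus width. Cixin hasn't disclosed the P1's bus width, so the 128-bit figure here is purely an assumption typical for this class of SoC:

```python
# Peak bandwidth implied by LPDDR5-6400. The 128-bit bus width is an
# assumption (Cixin hasn't disclosed it), typical for this SoC class.

def peak_bandwidth_gbs(megatransfers: int, bus_width_bits: int) -> float:
    """MT/s * bytes per transfer -> GB/s."""
    return megatransfers * 1e6 * (bus_width_bits / 8) / 1e9

print(peak_bandwidth_gbs(6400, 128))  # 102.4 (GB/s)
```

Memory bandwidth matters for on-device AI: local LLM inference is usually bandwidth-bound, not compute-bound.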
"One chip for many uses" was also the guiding light in determining Cixin P1's use cases. "Notebooks, mini PCs, all-in-one computers, desktops, home entertainment consoles, enterprise edge hosts, etc.," were cited as potential uses for the P1, and launch attendees saw the processor power notebooks and desktops. They also saw the many uses in person, as the company had benchmarks like 3DMark06 and the popular video game Genshin Impact running on the floor, along with other productivity and AI LLM demos. No numbers, such as benchmark results or Genshin's frames per second, were shown — though reporters qualitatively describe demos as running smoothly.
The Cixin P1 is a unique processor. While it does not fulfill Beijing's goal of a fully homegrown processor — it uses Arm CPU and GPU designs — the chip is tailored to the needs of the Chinese market and is well positioned for some level of success. It does not, however, meet Microsoft's arbitrary "AI PC" requirements: the P1's NPU on its own delivers only 30 TOPS, shy of the 40 TOPS target, though Cixin likely could not care less about Windows Copilot+ certification.
Cixin is a very young company, established in 2021, that has been growing thanks to investments from the 15-20 public and private investment partners listed on its website. Its newness to the Chinese tech scene may limit widespread adoption early on, so we'll have to wait and see whether Cixin succeeds in a huge, hungry tech market seeking separation from U.S. interference.
Dallin Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Dallin has a handle on all the latest tech news.
-
ThomasKinsley
This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve and the industry is slow-walking progress to maximize sales. Per Nvidia's site, the 3060 has 102 AI TOPS. If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?
-
usertests
ThomasKinsley said:
This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve and the industry is slow-walking progress to maximize sales. Per Nvidia's site, the 3060 has 102 AI TOPS. If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?

40 TOPS is the threshold Microsoft chose for Copilot+. It may be arbitrary, but it is a level at which performance is starting to get good for a number of applications.
As for NPU vs dGPU, if the tinier NPU is delivering higher TOPS/Watt than those GPUs, then it has value. For example, the RTX 3060 has a TDP of 170W, although maybe consumption during an AI workload is less than that, IDK.
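That TOPS-per-watt point is easy to sketch with the numbers in this thread — keeping in mind the 5 W NPU figure is a pure guess, and the 3060's 170 W board TDP likely overstates its draw during an AI workload:

```python
# TOPS per watt using the figures in this thread. The 5 W NPU power is
# an assumption; 170 W is the RTX 3060's board TDP, which likely
# overstates its draw during an AI workload.

def tops_per_watt(tops: float, watts: float) -> float:
    return tops / watts

npu_eff = tops_per_watt(30, 5)     # Cixin P1 NPU at an assumed 5 W
gpu_eff = tops_per_watt(102, 170)  # RTX 3060: 102 AI TOPS at 170 W

print(f"NPU: {npu_eff} TOPS/W, RTX 3060: {gpu_eff} TOPS/W")
# NPU: 6.0 TOPS/W, RTX 3060: 0.6 TOPS/W
```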
I don't know how much power the Cixin P1 NPU uses to get to 30 TOPS, or XDNA2 to get to 50 TOPS, etc. I wish that info was easy to find. If it's about 5 Watts or less, it seems superior in efficiency. Hopefully all TOPS numbers are measuring the same thing (usually INT8).
-
ThomasKinsley
usertests said:
40 TOPS is the threshold Microsoft chose for Copilot+. It may be arbitrary, but it is a level at which performance is starting to get good for a number of applications.

As for NPU vs dGPU, if the tinier NPU is delivering higher TOPS/Watt than those GPUs, then it has value. For example, the RTX 3060 has a TDP of 170W, although maybe consumption during an AI workload is less than that, IDK.

I don't know how much power the Cixin P1 NPU uses to get to 30 TOPS, or XDNA2 to get to 50 TOPS, etc. I wish that info was easy to find. If it's about 5 Watts or less, it seems superior in efficiency. Hopefully all TOPS numbers are measuring the same thing (usually INT8).

You're absolutely right that efficiency can matter in laptops, but in terms of raw performance, especially for generative AI models, you would think that would get more attention. Especially when most mainstream gamers already have this raw power in their PCs.
-
usertests
ThomasKinsley said:
You're absolutely right that efficiency can matter in laptops, but in terms of raw performance, especially for generative AI models, you would think that would get more attention. Especially when most mainstream gamers already have this raw power in their PCs.

Nvidia's marketers certainly tried to give it more attention:
https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-criticizes-ai-pcs-says-microsofts-45-tops-requirement-is-only-good-enough-for-basic-ai-tasks
But the fact is, Microsoft is pushing for NPUs to go everywhere, Apple already had them, etc. Rapidly creating a minimum baseline means developers can target it. NPUs aren't just going into laptops, but also millions of office desktops without discrete GPUs (starting with Arrow Lake, mostly).
It's also an additional resource you can use while using 100% of your GPU for something else, like a game. However, the NPU is taking up die space that could have been omitted or used for more cache, cores, iGPU, etc. Microsoft has forced everyone to pay the price.
-
bit_user
ThomasKinsley said:
This doesn't tell me that China is ahead in the AI race. It tells me that 45 TOPS is relatively easy to achieve

They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real-world performance.
ThomasKinsley said:
the industry is slow-walking progress to maximize sales.

Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.
ThomasKinsley said:
If you have a 4080 then you have 780 AI TOPS, and a 4090 has 1321 AI TOPS. So why the fuss over 45 TOPS?

Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 Arm cores, a PCIe interface, and the rest of the standard SoC stuff.
If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.
-
ThomasKinsley
bit_user said:
They didn't provide cost or power figures, did they? Furthermore, I'm sure the spec is theoretical. I'd love to know their sustained, real world performance.

None that I can find.
bit_user said:
Lots of companies have tried to build AI chips and most of them are defunct. That tells me it's not as easy as it seems. I think there's a long tail, where even though the main thing you need is a powerful tensor product engine, a lot more flexibility and functionality is needed to have a viable product.

OK, I can agree there. When I see that generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized a task as recent marketing suggests.
bit_user said:
Those are expensive products, made on a TSMC 4nm-class node. This is purportedly made on a 6nm-class node and also contains 12 ARM cores, PCIe interface, and the rest of the standard SoC stuff.

If 45 TOPS were such an inexpensive proposition, then AMD's Phoenix and Intel's Meteor Lake would've easily cleared this bar.

What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway. It seems as if 45 is an arbitrary number picked because they knew they were going to reach it soon, to get people to buy new laptops (just a theory).
I've recently been dabbling in offline AI models. I finally found one that works with my pre-W11 specs, and I'm surprised that it does. It's slow because the CPU is doing most of the heavy lifting instead of the GPU, but if my aging specs can work, then I'm fairly confident that modern desktops can do it much better, especially if the GPU is leveraged properly. But that requires far more than 45 TOPS.
-
bit_user
ThomasKinsley said:
OK, I can agree there. When I see generative AI can be done on a Commodore, which was not built with generative AI in mind, it doesn't seem as specialized of a task as recent marketing suggests.

Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!
ThomasKinsley said:
What is the point of the 45 TOPS standard in the first place?

Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for an LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.
-
ThomasKinsley
bit_user said:
Did you read the article?? It takes twenty minutes to generate 8x8 pixel images! That proves nothing!

What it demonstrates is that there are no hardware barriers to creating generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time. So what is it that these new 45 TOPS chips give us that current chips and graphics cards do not?
Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are CoPilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models. And if I was a developer trying to make the latest software/AI model, I'd be skipping the laptops and tuning my model for commonly-used GPUs, including the RTX3060 and up.
Somehow this entire segment is being ignored as marketing is going for 45 TOPS instead. What are the benefits? If the AI is on the cloud then your system's TOPS don't matter because the servers are doing the work for you. If your system is running the load, then why not get better hardware and leave the laptops in the dust?
bit_user said:
Fair question. I assume it's what they deemed necessary to generate tokens at a reasonable speed, for a LLM of reasonable complexity. Rather than speculate further, it'd probably make sense to see if they ever provided a justification.

Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better. Not trying to be argumentative. I'm trying to think of a good argument to justify their position as well.
-
bit_user
ThomasKinsley said:
What it demonstrates is that there are no hardware barriers to create generative AI. It doesn't require special GPU shaders or codecs or even an NPU. A humble chip from the '80s can do it given enough time.

I don't know if you're familiar with the concept of a Turing machine, but anything that can be reduced to digital computation is computable by one. All you get by adding complexity is making it faster.
https://en.wikipedia.org/wiki/Turing_machine
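To make that concrete, here's a toy Turing machine in Python — a sketch of the idea, not anything performant. A handful of state-transition rules suffices to increment a binary number; given enough tape and time, machines this simple can compute anything a GPU or NPU can.

```python
# Minimal Turing machine: increments a binary number on the tape.
# The machine is trivial, yet with enough steps (and tape) it can
# compute anything a GPU or NPU can -- just far more slowly.

def run(tape, state, rules, head=0, max_steps=1000):
    tape = dict(enumerate(tape))  # sparse tape, blank cells are "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# Rules for binary increment, head starting at the most significant bit:
# scan right to the end of the number, then carry 1s back to the left.
rules = {
    ("scan", "0"): ("0", "R", "scan"),
    ("scan", "1"): ("1", "R", "scan"),
    ("scan", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),  # 1 + carry -> 0, keep carrying
    ("carry", "0"): ("1", "L", "halt"),   # 0 + carry -> 1, done
    ("carry", "_"): ("1", "L", "halt"),   # overflow: prepend a 1
}

print(run("1011", "scan", rules))  # 1100  (11 + 1 = 12)
```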
ThomasKinsley said:
Assuming there is an AI renaissance and I wanted to get ahead of it, I wouldn't want to buy one of these 45 TOPS laptops that are CoPilot+ certified. I'd purchase 192GB of RAM (4x48GB) and the best AI GPU out there to crunch AI models.

In a laptop? Maybe for you, but most corporate users (including me) simply will not lug around such a huge gaming laptop. They want something thin & light, with decent battery life - even if they're using AI features like MS CoPilot.
ThomasKinsley said:
Somehow this entire segment is being ignored

Huh? You can buy high-end business laptops that have a dGPU. They're just big, loud, and often have poor battery life (at least, in my experience).
ThomasKinsley said:
If your system is running the load, then why not get better hardware and leave the laptops in the dust?

Yes, if you don't need a laptop, then don't buy one! Geez...
ThomasKinsley said:
Absolutely, but then the counterargument could be that if 45 TOPS is sufficient then 105 TOPS is even better.

Well, if 45 TOPS is sufficient, then inflating the spec would be bad due to making the minimum hardware more expensive, hot, loud, bulky, etc. That means fewer people will adopt it, which is contrary to MS' interests, being a software company that's trying to make $$$ off of their AI software.
-
anoldnewb
ThomasKinsley said:
What is the point of the 45 TOPS standard in the first place? I can understand program specs that require 16GB of RAM or a certain GPU shader version, but is there any test or product that needs 45 TOPS over 37 TOPS? Most of the AI is done on the cloud anyway.

The primary purpose of an NPU on an end-user computer is not to train LLMs but to run inference on pre-trained models, customized with your local information. It can be used to process voice, text, or image inputs and generate outputs. By including data from your emails, web searches, and documents, it can (theoretically) provide you with more appropriate answers. 45 TOPS is an estimate of what's needed for near-real-time responses.