Chinese CPU Maker Debuts 32-Core Chiplet-Based Processor

(Image credit: Loongson)

Building large monolithic many-core CPUs is extremely hard even for renowned chip designers. For Chinese CPU developers, many of whom do not have access to leading-edge production nodes, the only way to build a processor with a high core count is to adopt a chiplet design. As it turns out, this is exactly what Chinese CPU maker Loongson has done with its 32-core 3D5000 processor, Sina reports.

Earlier this year Loongson began to ship its 3C5000 processor, which relies on 16 LA464 cores based on the company's LoongArch instruction set architecture, up to 64MB of cache, and four 64-bit DDR4-3200 memory interfaces with ECC support. The Loongson 3D5000 takes two 3C5000 CPUs and places them on a single substrate to build a 32-core processor with eight memory channels. The 32-core processor supports up to 4-way symmetric multiprocessing (SMP) configurations, so it is possible to build a server with up to 128 cores.
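The core and memory-channel arithmetic in the paragraph above can be sketched in a few lines of Python. This is an illustration only: the names are hypothetical, and the peak-bandwidth figure is a theoretical number derived from the DDR4-3200 transfer rate (3200 MT/s over a 64-bit channel), not a figure from the report.

```python
# Illustrative arithmetic only. Names are hypothetical; core and channel
# counts come from the article, bandwidth is a derived theoretical peak.
CORES_PER_3C5000 = 16
CHANNELS_PER_3C5000 = 4
CPUS_PER_3D5000 = 2   # two 3C5000 CPUs on one substrate
MAX_SOCKETS = 4       # up to 4-way configurations

# DDR4-3200: 3200 mega-transfers/s over a 64-bit (8-byte) channel.
PEAK_GBPS_PER_CHANNEL = 3200 * 8 / 1000  # 25.6 GB/s per channel


def system_summary(sockets: int = 1) -> dict:
    """Return core count, memory channels and peak bandwidth for a 3D5000 system."""
    if not 1 <= sockets <= MAX_SOCKETS:
        raise ValueError(f"sockets must be between 1 and {MAX_SOCKETS}")
    cores = CORES_PER_3C5000 * CPUS_PER_3D5000 * sockets
    channels = CHANNELS_PER_3C5000 * CPUS_PER_3D5000 * sockets
    return {
        "cores": cores,
        "memory_channels": channels,
        "peak_bandwidth_gbps": channels * PEAK_GBPS_PER_CHANNEL,
    }


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, system_summary(n))
```

A single 3D5000 works out to 32 cores and eight channels, and a 4-way machine to 128 cores, matching the figures quoted above. Sustained bandwidth in practice would be lower than the theoretical peak.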

Loongson recently completed verification of its 3D5000 processor, the report says. The CPU reportedly consumes 130W at 2.0 GHz and 170W at 2.2 GHz. Loongson's 3D5000 CPU comes in an LGA-4129 package.
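To put the two reported operating points in perspective, here is a minimal sketch that compares them. It assumes the wattage figures apply to the whole 32-core package, which the report does not spell out, and the function names are made up for illustration.

```python
# Reported operating points from the article: (clock in GHz, power in watts).
# Assumption: the wattage covers the whole 32-core package.
LOW_POINT = (2.0, 130.0)
HIGH_POINT = (2.2, 170.0)
CORES = 32


def watts_per_core(watts: float, cores: int = CORES) -> float:
    """Average package power divided evenly across all cores."""
    return watts / cores


def relative_increase(low: float, high: float) -> float:
    """Fractional increase going from `low` to `high`."""
    return (high - low) / low


if __name__ == "__main__":
    clock_gain = relative_increase(LOW_POINT[0], HIGH_POINT[0])
    power_gain = relative_increase(LOW_POINT[1], HIGH_POINT[1])
    print(f"clock: +{clock_gain:.0%}, power: +{power_gain:.0%}")
    print(f"W/core at {LOW_POINT[0]} GHz: {watts_per_core(LOW_POINT[1]):.2f}")
    print(f"W/core at {HIGH_POINT[0]} GHz: {watts_per_core(HIGH_POINT[1]):.2f}")
```

In other words, the last 10% of clock speed costs roughly 31% more power, which is typical of a design pushed toward the top of its voltage and frequency curve.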

While building a 32-core CPU based on a proprietary microarchitecture is an achievement, it should be noted that the product is also a way for Loongson to test its ability to build chiplet-based designs. While China-based SMIC, which produces processors for Loongson, is slowly adopting more advanced nodes, it is significantly behind market leader TSMC. Therefore, companies like Loongson cannot offer products that are comparable to those from AMD and Intel.

Chiplets represent a real opportunity for Loongson to build serious processors and server platforms with a significant number of cores and a proprietary microarchitecture tuned for servers and supercomputers. Meanwhile, we have no idea whether other CPU developers from China will follow suit.

When it comes to performance, Loongson says that its 32-core 3D5000 CPU scores 400 points in the SPEC CPU2006 base test, whereas a 2-way machine built from two 3D5000 CPUs exceeded 800 points in the same benchmark. The CPU designer believes that a 4-way machine will hit 1,600 points.
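Loongson's figures imply near-perfect linear scaling with socket count. The sketch below simply reproduces that assumption; it is not a performance model, the names are hypothetical, and real multi-socket throughput usually falls somewhat short of linear because of interconnect and memory effects.

```python
# Linear-scaling sketch based on the scores quoted in the article.
# Assumption: throughput scales proportionally with the number of sockets.
SINGLE_SOCKET_SCORE = 400   # claimed SPEC CPU2006 base score, one 3D5000
CORES_PER_SOCKET = 32


def projected_score(sockets: int) -> int:
    """Projected score if throughput scaled perfectly with socket count."""
    return SINGLE_SOCKET_SCORE * sockets


def score_per_core(score: float, sockets: int) -> float:
    """Score divided by the total number of cores in the system."""
    return score / (CORES_PER_SOCKET * sockets)


if __name__ == "__main__":
    for sockets in (1, 2, 4):
        score = projected_score(sockets)
        print(sockets, score, score_per_core(score, sockets))
```

Under this assumption the per-core figure stays constant at 12.5 points regardless of socket count, which is why the 1,600-point projection for a 4-way machine should be read as a best case.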

Loongson is gearing up to ship samples of its 32-core processors in the first half of 2023, whereas commercial versions will be shipped later.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • atomicWAR
    Smart way to work around their node restrictions.
  • bit_user
    32-core CPUs from China incoming.
    Subtitle is rather ironic, given that China has stopped exports of Loongson:

    https://www.tomshardware.com/news/china-bans-exports-of-its-loongson-cpus-to-russia-other-countries

    LoongArch resources, for the curious:
    https://en.wikipedia.org/wiki/Loongson#Loongson_3_LoongArch_processors
    https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html
    As interesting as it is to watch the development of this architecture, I think RISC-V is likely to be more relevant in the near/mid-term, at least.
  • bit_user
    atomicWAR said:
    Smart way to work around their node restrictions.
    The key question is whether they really gain anything from sharing a package.

    Years ago, Intel packaged 2 Cascade Lake Xeon dies together in a single CPU, but it was mostly just a PR stunt. They still talked to each other using the same UPI links they'd use, if they were in separate packages.

    https://www.anandtech.com/show/15149/how-to-tarnish-platinum-sell-it-as-xeon-9200
  • atomicWAR
    bit_user said:
    The key question is whether they really gain anything from sharing a package.

    Years ago, Intel packaged 2 Cascade Lake Xeon dies together in a single CPU, but it was mostly just a PR stunt. They still talked to each other using the same UPI links they'd use, if they were in separate packages.

    https://www.anandtech.com/show/15149/how-to-tarnish-platinum-sell-it-as-xeon-9200

    I vaguely recall the debacle with all that. The Pentium D was another poor example of gluing chips together, if I recall correctly. No question, how they execute their link will be extremely important to how well these chips perform and scale, not to mention the potential heat issues from using older nodes. Anyway, I always applaud solid ingenuity in working a problem with the tools available to you (and that's all I'll say on that part of it as I don't wish to be political here, not that anyone has been so far). Correctly executed, though, this could be quite the performance bump for their homegrown chips. Or it could just end up being a hot mess... time will tell.
  • ohio_buckeye
    Yawn. I’ll take an AMD or Intel CPU any day.
  • sylas
    It will be at least a decade before their chips can run Crysis.
  • bit_user
    atomicWAR said:
    I vaguely recall the debacle with all that.
    I think the issue was that it didn't solve a real problem, so there was no market demand. The main thing it let Intel do was seem like they were keeping up in the core-count race, by having a 56-core CPU.

    atomicWAR said:
    The Pentium D was another poor example of gluing chips together, if I recall correctly.
    Perhaps it was poorly executed, but at least in that case we're talking about fitting 2 cores into a uni-processor system, where you wouldn't otherwise have an option to install a second CPU. I actually had a Pentium D at work, and it wasn't bad for that point in time. It had a giant heatsink with a 140 mm fan, but that was enough to keep it fairly quiet.

    In the case of the server CPU I mentioned above, it added no real value because it didn't change the total number of cores you could put in an Intel server machine.

    atomicWAR said:
    No question, how they execute their link will be extremely important to how well these chips perform and scale, not to mention the potential heat issues from using older nodes.
    I think the M1 Ultra stands out as the best-case scenario. It uses a purpose-built 2.5 TB/s interconnect for joining the two dies. It's highly optimized and not just reusing a package-level interconnect like what Intel did with UPI or what AMD did with Infinity Fabric (over PCIe) in their first-gen EPYC.

    atomicWAR said:
    Correctly executed, though, this could be quite the performance bump for their homegrown chips. Or it could just end up being a hot mess... time will tell.
    Whether it's a performance win or just a space-saving way to hook together the same 128 cores that they could before, they'll definitely gain valuable experience from working with chiplets. So, I think the key point is the trajectory they're on, rather than how good this particular CPU will be.
  • gg83
    Build a giant old-tech supercomputer to design new-tech supercomputers. If China puts enough resources into it, they will get ahead of everyone else. It will be the battle of the AIs and supercomputers.