After six years of promises and no shipping silicon, Tachyum revises Prodigy processor specs to 1,024 cores with 1,600W of power consumption — likely another 5-year delay, company claims its chip is 20 times faster than Nvidia's Rubin NVL576 rack


(Image credit: Tachyum)

This week, Tachyum, a company that has been promising a processor for six years and counting without shipping it, published new target specifications and expected performance figures for its Prodigy universal processor. The announcement came just a month after the company disclosed its latest round of financing and its intention to 'upgrade' Prodigy, a processor that still exists only on paper.

Having set target specifications for its most powerful Prodigy processor, some of which seem unattainable in any realistic timeframe, Tachyum claims that a rack powered by its Prodigy Ultimate hardware will be over 21 times faster than Nvidia's upcoming NVL576 rack based on Rubin Ultra GPUs. However, the details about Prodigy released this week suggest that the chip will be delayed by another four to five years, even under favorable assumptions.

Prodigious hardware

As reported a month ago, Tachyum's Prodigy processor (or rather, system-in-package, SiP) is said to adopt a multi-chiplet design, with each chiplet fabricated on TSMC's 2nm-class node and featuring up to 256 highly custom cores with 8-way out-of-order superscalar execution pipelines, plus matrix and vector accelerators.

Tachyum intends to introduce 12 Prodigy SKUs. The range-topping Prodigy Ultimate carries four chiplets and offers 768 or 1,024 cores, up to 1 GB of L2 and L3 cache, 128 PCIe lanes, and a 24-channel memory subsystem supporting up to 48 TB of DDR5-17600 memory and up to 3.38 TB/s of peak bandwidth per socket. The Prodigy Premium SKU uses two chiplets and offers 256 – 512 cores and a 16-channel memory subsystem, while the Prodigy Entry SKU has 32 – 256 cores and an 8-channel memory subsystem.
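The 3.38 TB/s figure follows directly from the channel count and data rate. A minimal Python sketch of that arithmetic, assuming a standard 64-bit (8-byte) data path per DDR5 channel with ECC bits excluded:

```python
# Theoretical peak DRAM bandwidth: channels x data rate x bytes per transfer.
# Assumption: each DDR5 channel moves 64 data bits (8 bytes) per transfer,
# counting both 32-bit subchannels and excluding ECC bits.

def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s (decimal units, 1 GB = 1e9 bytes)."""
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9

per_channel = peak_bandwidth_gb_s(1, 17_600)   # one DDR5-17600 channel
per_socket = peak_bandwidth_gb_s(24, 17_600)   # Prodigy Ultimate: 24 channels

print(f"{per_channel:.1f} GB/s per channel, {per_socket / 1000:.2f} TB/s per socket")
```

Each DDR5-17600 channel contributes about 140.8 GB/s, so 24 of them land at the advertised 3.38 TB/s; the figure is a theoretical peak, not a sustained number.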

According to a Tachyum document, each chiplet contains what appears to be a systolic array of 264 cores organized into four 11×6 groups (66 cores per group), with eight redundant cores in total, leaving 256 cores (and a 256-element matrix unit) visible to software per chiplet.
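The core count can be checked with simple arithmetic. This Python sketch uses only the layout figures above; the even split of spares across groups is our assumption, not something the Tachyum document states:

```python
import math

# Core budget per chiplet, using the layout described above.
GROUPS = 4
ROWS, COLS = 11, 6           # each group is an 11 x 6 block of cores
VISIBLE_CORES = 256          # cores exposed to software per chiplet

physical_cores = GROUPS * ROWS * COLS          # 4 x 66 = 264
spare_cores = physical_cores - VISIBLE_CORES   # 8 per chiplet
# Assumption: spares are spread evenly across the four groups.
spares_per_group = spare_cores / GROUPS
side = math.isqrt(VISIBLE_CORES)               # 256 visible cores = 16 x 16
assert side * side == VISIBLE_CORES

print(physical_cores, spare_cores, spares_per_group, side)
```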

This is consistent with Tachyum's claim that its built-in matrix processor supports 16×16, 8×8, and 4×4 operations. Such a design would also provide one spare CPU core/MAC element per row and one per column, in line with systolic array design practice, which tends to include spare elements for yield and repairability. Keep in mind, however, that CPUs tend not to use systolic-array-like arrangements because of their complicated data flows and increased latencies.
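For readers unfamiliar with the term, here is a hedged Python sketch of how a generic output-stationary systolic array computes a matrix product. It illustrates the technique only: Tachyum has not published its dataflow, and the model ignores spare elements and pipelining of back-to-back tiles. Operands are skewed so that `A[i][k]` and `B[k][j]` meet in processing element (i, j) on cycle `i + j + k`:

```python
# Cycle-level model of a generic n x n output-stationary systolic array.
# Not Tachyum's design; spare elements and tile pipelining are not modeled.
# PE (i, j) accumulates C[i][j]. Row i of A is fed from the left delayed by
# i cycles, column j of B from the top delayed by j cycles, so A[i][k] and
# B[k][j] meet in PE (i, j) on cycle t = i + j + k.

def systolic_matmul(a, b):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    cycles = 3 * n - 2  # the last pair (i = j = k = n - 1) meets on cycle 3n - 3
    for t in range(cycles):
        for i in range(n):
            for j in range(n):
                k = t - i - j
                if 0 <= k < n:
                    c[i][j] += a[i][k] * b[k][j]  # at most one MAC per PE per cycle
    return c, cycles


def reference_matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


n = 4  # a 4 x 4 tile, the smallest operation size Tachyum lists
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j - 1 for j in range(n)] for i in range(n)]
c, cycles = systolic_matmul(a, b)
assert c == reference_matmul(a, b)
print(f"{n}x{n} tile finished in {cycles} cycles")
```

The diagonal wavefront is why such arrays suit dense matrix math but not general-purpose code: every element waits for operands to ripple in from its neighbors, which is the latency concern raised above.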

From what we can tell, each chiplet is designed to be a fully functional processor with up to 256 cores, 256 MB of L2 and L3 cache, its own eight-channel DDR5 memory subsystem, and I/O that includes up to 96 PCIe 7.0 lanes with 16 controllers. Tachyum appears to reuse the PCIe PHYs for die-to-die and socket-to-socket interconnects, which is why the range-topping Prodigy Ultimate 'only' offers 128 PCIe 7.0 lanes.
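The lane budget implied by these figures can be tallied in a few lines of Python. The split between external and internal lanes is an inference from the numbers above, not a breakdown Tachyum has published:

```python
# PCIe lane budget for a four-chiplet Prodigy Ultimate, based on the figures above.
CHIPLETS = 4
LANES_PER_CHIPLET = 96   # up to 96 PCIe 7.0 lanes per chiplet
EXPOSED_LANES = 128      # lanes advertised for Prodigy Ultimate

physical_lanes = CHIPLETS * LANES_PER_CHIPLET
# Inference, not a Tachyum figure: lanes not exposed as PCIe are presumably
# consumed by die-to-die and socket-to-socket links.
internal_lanes = physical_lanes - EXPOSED_LANES

print(f"{physical_lanes} lanes on package, {internal_lanes} "
      f"({internal_lanes / physical_lanes:.0%}) not exposed as PCIe")
```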

As always, Tachyum's specifications look impressive on paper, but their sheer scale, combined with the company's track record of failing to deliver, makes them hard to believe, let alone expect to materialize.

For example, a general-purpose CPU with 1,024 cores running at up to 6.0 GHz and consuming up to 1,600W seems unrealistic today, especially from a company with zero experience in producing such designs.
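One way to see why is to divide the power budget by the core count. This Python sketch gives an upper bound per core; the `uncore_share` parameter is hypothetical, since Tachyum has not said how much of the 1,600W goes to caches, interconnect, memory controllers, and PHYs:

```python
def watts_per_core(package_w: float, cores: int, uncore_share: float = 0.0) -> float:
    """Average power available per core.

    uncore_share is a hypothetical fraction of package power spent on caches,
    interconnect, memory controllers, and PHYs; Tachyum has not published one.
    """
    return package_w * (1.0 - uncore_share) / cores

print(f"{watts_per_core(1_600, 1_024):.2f} W per core with no uncore power")
print(f"{watts_per_core(1_600, 1_024, uncore_share=0.3):.2f} W per core if 30% goes to uncore")
```

Even in the unrealistic case where the cores get the entire package budget, each one has only about 1.56W to run an 8-way out-of-order pipeline plus matrix and vector units at up to 6.0 GHz.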

Also, while MRDIMM technology could potentially enable DDR5-17600 modules, with the DRAM ICs themselves transferring data at DDR5-8800 rates, no such specification exists for now. Furthermore, 48 TB spread across 24 channels works out to 2 TB per channel, yet 2 TB DDR5 memory modules do not exist today and are not expected any time soon, so promising support for up to 48 TB of memory per socket seems a bit premature.
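Both points reduce to simple multiplication and division. A Python sketch, assuming MRDIMMs multiplex two ranks onto the module interface; the two-modules-per-channel case is hypothetical, and such configurations would not necessarily run at DDR5-17600:

```python
# MRDIMMs run two ranks in parallel and multiplex them onto the module
# interface, so the host-side data rate is twice the DRAM ICs' own rate.
DRAM_IC_RATE_MT_S = 8_800   # DDR5-8800 ICs on the module
MUX_FACTOR = 2              # two ranks multiplexed per MRDIMM
module_rate = DRAM_IC_RATE_MT_S * MUX_FACTOR   # data rate seen by the CPU


def module_capacity_tb(total_tb: float, channels: int, dimms_per_channel: int) -> float:
    """Capacity each module needs for a socket to reach total_tb."""
    return total_tb / (channels * dimms_per_channel)


print(module_rate)
print(module_capacity_tb(48, 24, 1))  # one module per channel
print(module_capacity_tb(48, 24, 2))  # two modules per channel (hypothetical)
```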

Prodigious performance promises

Tachyum's specifications for its Prodigy universal processor look overwhelming, but keep in mind that the devices won't launch until the end of the decade. Even so, Tachyum's performance promises, made relative to hardware that does not exist yet, look outright odd.

Tachyum used to promise that Prodigy would deliver 'orders of magnitude higher AI performance, 3x the performance of the best x86 processors, and 6x HPC performance of the fastest GPGPU,' but without providing any quantitative data.

The company changed its tune in the latest press release, describing Prodigy as delivering up to five times the integer throughput, up to 16 times the AI performance, eight times the memory bandwidth, four times the inter-chip and I/O bandwidth, four times greater multi-socket scaling with support for 16 sockets, and roughly double the power efficiency, again without providing any actual numbers. The only exception is perhaps the memory bandwidth claim (3.38 TB/s), yet that figure is not eight times higher than the bandwidth of AMD's EPYC 9005-series CPUs.
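A quick Python check of the bandwidth claim, assuming the EPYC 9005 baseline is a socket with 12 channels of DDR5-6400; that configuration is our assumption for the comparison, since Tachyum does not name its baseline:

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int) -> float:
    # 8 bytes per transfer per DDR5 channel (ECC excluded):
    # MT/s x bytes = MB/s; divide by 1,000 for GB/s
    return channels * mt_per_s * 8 / 1000

prodigy = peak_bandwidth_gb_s(24, 17_600)   # Prodigy Ultimate, 24 channels
epyc_9005 = peak_bandwidth_gb_s(12, 6_400)  # assumed EPYC 9005 configuration
ratio = prodigy / epyc_9005
print(f"{prodigy:.1f} GB/s vs. {epyc_9005:.1f} GB/s: {ratio:.1f}x")
```

Under that assumption, the advantage is roughly 5.5x rather than 8x.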

Perhaps addressing concerns about its persistent lack of numbers, Tachyum revealed that its 2nm Prodigy would deliver over '1,000 PFLOPS on inference' and compared this figure to Nvidia's Rubin GPU, which is claimed to deliver 50 NVFP4 PFLOPS. The comparison suggests that Tachyum's figure refers to a similar data format (e.g., FP4, MXFP4, or a proprietary 4-bit format).

However, the claim appears to contradict common sense: achieving 20 times the performance of the Rubin GPU (1,000 vs. 50 PFLOPS) with roughly 3.8 times less memory bandwidth is extremely difficult in bandwidth-bound AI inference workloads. Meanwhile, the 20x figure seems to serve as the basis for the claim that a rack-scale Prodigy-based solution will be 21.3 times faster than Nvidia's NVL576, which will feature 144 Rubin Ultra GPU packages.
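The standard way to reason about this is the roofline model: attainable throughput is the lower of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A Python sketch using the vendors' claimed peaks; Rubin's bandwidth here is simply implied by the ~3.8x ratio above, the 100 FLOP/byte kernel is hypothetical, and on-chip caches are ignored:

```python
def attainable_pflops(peak_pflops: float, bw_tb_s: float, flop_per_byte: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    # TB/s x FLOP/byte = TFLOP/s; divide by 1,000 to get PFLOP/s
    return min(peak_pflops, bw_tb_s * flop_per_byte / 1_000)

PRODIGY = (1_000, 3.38)      # claimed PFLOPS, TB/s
RUBIN = (50, 3.38 * 3.8)     # 50 PFLOPS; bandwidth implied by the ~3.8x ratio

for name, (peak, bw) in (("Prodigy", PRODIGY), ("Rubin", RUBIN)):
    ridge = peak * 1_000 / bw   # FLOP/byte needed before compute becomes the limit
    print(f"{name}: compute-bound only above ~{ridge:,.0f} FLOP/byte")

intensity = 100  # hypothetical low-intensity, bandwidth-bound kernel
ratio = attainable_pflops(*PRODIGY, intensity) / attainable_pflops(*RUBIN, intensity)
print(f"Prodigy/Rubin at {intensity} FLOP/byte: {ratio:.2f}x")
```

Below both ridge points, the performance ratio collapses to the bandwidth ratio, so a bandwidth-bound workload would run slower on Prodigy than on Rubin no matter how high the compute peak is.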

Perhaps the only useful number Tachyum revealed this week was the '400 FP64 TFLOPS for HPC' claim for its top-of-the-range 1,024-core Prodigy Ultimate processor. If true, the processor would be 10 times faster than Nvidia's Blackwell B200 (40 FP64 TFLOPS) at 400W higher power, and about five times faster than AMD's Instinct MI355X (78.6 FP64 TFLOPS). However, since we do not know which unit produced that result or how it was achieved, we cannot really make this comparison. In fact, given Tachyum's tendency to create proprietary metrics (like 'TAI PFLOPS'), the 400 FP64 TFLOPS figure may not follow standard FLOP accounting (e.g., if it uses DP-equivalent precision).
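A sanity check is to convert the claim into FP64 work per core per clock. This Python sketch assumes all 1,024 cores run at the 6.0 GHz peak clock, which flatters the result; any lower all-core clock would require even more work per cycle:

```python
FP64_TFLOPS = 400      # Tachyum's claim for the 1,024-core Prodigy Ultimate
CORES = 1_024
CLOCK_GHZ = 6.0        # peak clock; an all-core clock under load is likely lower

flops_per_core_per_cycle = FP64_TFLOPS * 1e12 / (CORES * CLOCK_GHZ * 1e9)
fmas_per_core_per_cycle = flops_per_core_per_cycle / 2   # an FMA counts as 2 FLOPs

print(f"{flops_per_core_per_cycle:.1f} FP64 FLOPs/cycle/core "
      f"(~{fmas_per_core_per_cycle:.0f} FMAs) even at {CLOCK_GHZ} GHz")
print(f"vs. B200: {FP64_TFLOPS / 40:.1f}x, vs. MI355X: {FP64_TFLOPS / 78.6:.1f}x")
```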

Another major delay

This week's announcement from Tachyum covers some performance aspects of the Prodigy universal processor, reveals major design changes (a multi-chiplet layout and a move to a 2nm-class fabrication technology, presumably at TSMC), and discloses changes to previously announced specifications (more cores per chiplet, fewer memory channels per chiplet, PCIe 7.0 support, etc.). All of this is meant to paint a positive picture of the processor. However, these same details point to another major delay for Prodigy.

Around a year ago, Tachyum planned to tape out its 192-core Prodigy, implemented on a 5nm-class fabrication technology, in 2025. This suggests that the design at least existed as HDL code (RTL complete) and that its verification and simulation were underway or mostly done, so the company only had to synthesize the physical design and then send its GDSII file to its foundry partner to make photomasks and process the first wafers.

However, now that Tachyum plans to enhance the design and move from a FinFET-based 5nm-class process technology to a 2nm-class node based on gate-all-around (GAA) transistors, it has to revise its high-level design and then return to the RTL design phase, as almost all of the chip's physical constraints change with the transistor type.

Since everything about Prodigy changes with the redesign and the move to 2nm GAA technology, Tachyum will have to rework its RTL essentially from scratch. With a team of between 51 and 200 employees, that will take well over a year (more likely 1.5 years, and we're being optimistic). Full-chip verification and validation (pre-layout) will likely take another 12 to 18 months, given that this is a complex chiplet implemented on a state-of-the-art fabrication technology.

Realistic scenarios point toward late 2030

After the worst functional bugs are shaken out, Tachyum's team may start synthesizing the physical design, a step that will partly overlap with verification and validation but will still take well over 18 months. After that, the company may proceed to tape-out, which will take another six months, followed by first-silicon bring-up and post-silicon validation, which take around a year if the first chip works fine (add another 18 months if a respin is needed). Once these steps are complete, Prodigy will be ready for mass production, but the silicon and platform will take at least another six months to ramp.
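The phases above stack up quickly. This Python sketch adds them together using the 1.5-year RTL estimate and the low ends of the other ranges quoted in this section; the six-month overlap between verification and physical design is a hypothetical placeholder, since the actual overlap is unknown and shifts the total month for month:

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    months: float
    overlap: float = 0.0   # months run in parallel with the previous phase


def total_months(phases: list[Phase]) -> float:
    return sum(p.months - p.overlap for p in phases)


schedule = [
    Phase("RTL rework", 18),
    Phase("Verification and validation", 12),
    Phase("Physical design", 18, overlap=6),   # overlap is hypothetical
    Phase("Tape-out", 6),
    Phase("Bring-up and post-silicon validation", 12),
    Phase("Production ramp", 6),
]
RESPIN_MONTHS = 18

base = total_months(schedule)
print(f"{base:.0f} months without a respin, {base + RESPIN_MONTHS:.0f} with one")
```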

As a result, even if it starts work today, Tachyum will be exceedingly lucky to have Prodigy silicon ready to ship by late 2029 in the very best-case scenario, with actual products shipping in 2030. A more realistic scenario is getting the silicon done in around five years (by late 2030), and if the silicon needs a respin, everything slips to 2031 – 2032.

Of course, this assumes that Tachyum handles everything internally. Alternatively, the company could complete its RTL design in-house (outsourcing microarchitecture-level RTL is rare, risky, expensive, and hard to debug) and then outsource everything else to an experienced contract chip designer. In that case, we might see Prodigy in production this decade, if Tachyum is lucky.

But time may not be Tachyum's biggest problem: it may run out of money well before it gets Prodigy silicon back from the fab, as taking a 2nm GAA-based chip from RTL to mass production costs hundreds of millions of dollars, likely north of $300 million depending on the chip's complexity. Perhaps the company can still pull the Prodigy project off with massive outsourcing, but even then, will the processor be competitive with the solutions on the market around 2030? Furthermore, if Tachyum were willing to outsource the Prodigy design, why hasn't it done so already?

Formidable, but costs and competition loom

Tachyum's upgraded specifications for its Prodigy universal processor make it look like a formidable competitor in the CPU world. However, these new specifications imply that Tachyum must restart much of the design and verification work, pushing the project back by at least four to five years. Given the company's limited resources, history of missed timelines, and the immense cost of designing a cutting-edge 2nm GAA chip, Prodigy may struggle to remain competitive by the time it could realistically ship.


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.