Meta is reportedly testing its first RISC-V-based chip for AI training
Could be one of the industry's first RISC-V-based accelerators for AI training.

Meta was one of the first companies to build its own RISC-V-based chips for AI inference several years ago to cut costs and reduce reliance on Nvidia. Reuters reports that the company has now gone a step further and designed, presumably with Broadcom's assistance, an in-house accelerator for AI training. If the chip meets Meta's goals, it may reduce the company's reliance on high-end Nvidia AI GPUs, such as the H100/H200 and B100/B200, for training advanced large language models.
Meta and Broadcom have taped out Meta's first AI training accelerator with TSMC; the foundry produced the first working samples, and the partners have successfully brought up the silicon, according to the report. Meta has now begun a limited deployment of the accelerator to assess its performance before scaling up production and deployment. It is unclear whether Meta's engineers are still running benchmarks on the new chip or whether it has already been put to useful work.
The chip's specifications are unknown, though AI training chips typically use a design known as a systolic array. This architecture consists of a structured grid of identical processing elements (PEs) arranged in rows and columns. Each PE handles matrix or vector computations, and data flows rhythmically through the network from one PE to its neighbors.
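To make that concrete, here is a minimal Python sketch of a weight-stationary systolic array computing a matrix product. This is a generic textbook model, not Meta's actual design, and every name in it is illustrative:

```python
# Toy cycle-by-cycle simulation of a weight-stationary systolic array
# computing C = A x B. Generic textbook model, not Meta's design.
# PE(i, j) permanently holds weight B[i][j]; activations stream rightward
# across rows, and partial sums flow downward, exiting at the bottom edge.

def systolic_matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])    # A is n x k, B is k x m
    C = [[0] * m for _ in range(n)]
    h = [[None] * m for _ in range(k)]     # activation inside PE(i, j); None = bubble
    v = [[0] * m for _ in range(k)]        # partial sum leaving PE(i, j) downward

    for t in range(n + k + m):             # enough cycles for the array to drain
        # Update bottom-right to top-left so each PE reads its neighbors'
        # values from the *previous* cycle before they are overwritten.
        for i in reversed(range(k)):
            for j in reversed(range(m)):
                if j > 0:
                    left = h[i][j - 1]     # activation from the left neighbor
                else:
                    # Left edge: inject row t - i of A, skewed one cycle per array row
                    left = A[t - i][i] if 0 <= t - i < n else None
                above = v[i - 1][j] if i > 0 else 0
                h[i][j] = left
                if left is None:
                    v[i][j] = 0            # bubble: nothing to compute
                else:
                    v[i][j] = above + left * B[i][j]  # multiply-accumulate
                    if i == k - 1:         # bottom row: C[t-k+1-j][j] is complete
                        C[t - k + 1 - j][j] = v[i][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In real silicon, all PEs fire in parallel on every clock; the nested Python loops simply serialize that schedule for clarity.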
Custom RISC-V Accelerator For AI
Since the processor is designed for AI training, which means processing vast amounts of data, expect it to feature HBM3 or HBM3E memory. Because this is a bespoke processor, Meta could define the supported data formats and instructions to optimize die size, power consumption, and performance. To be worthwhile, the accelerator has to offer performance-per-watt competitive with Nvidia's current AI GPUs, such as the H200 and B200, and possibly the next-generation B300.
The chip is the latest addition to the company's Meta Training and Inference Accelerator (MTIA) program, which has faced various setbacks over the years, including chips whose development was halted at a similar stage.
For example, Meta discontinued an internal inference processor after it failed to meet its performance and power targets during limited deployment tests. That failure led Meta to shift its strategy in 2022 and place large orders for Nvidia GPUs to meet its immediate AI processing requirements.
Meta's Push for AI Hardware Independence
Since then, Meta has become one of Nvidia's largest customers, acquiring tens of thousands of GPUs. These units have been critical in training AI models for recommendations, advertisements, and the Llama foundation model series. The green company's GPUs have also been used for inference, supporting interactions for over three billion daily users across Meta's platforms, according to Reuters.
Despite these challenges, Meta has continued advancing its custom silicon program. Last year, the company began using an MTIA chip for inference tasks, and its leadership has outlined plans to start using in-house chips for AI training by 2026. The plan is to gradually increase usage if the chip meets performance and power targets, a critical step in Meta's long-term goal of designing more customized hardware for its data center operations.
One interesting thing to note is that MTIA's inference accelerators use cores based on the open RISC-V ISA. This lets Meta customize the instruction set to meet its requirements at its own cadence, and it does not need to pay royalties to any third party for the ISA itself. It is unclear whether MTIA's training accelerator is also based on the RISC-V ISA, but it is possible. If so, Meta may have developed one of the industry's highest-performing RISC-V-based chips.
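As an illustration of that flexibility (not anything Meta has disclosed), RISC-V reserves dedicated opcode space for vendor extensions, so designers can add instructions without breaking the base ISA. The sketch below encodes a hypothetical "dot-product accumulate" R-type instruction in the custom-0 slot; the instruction itself and all field values are made up for illustration:

```python
# Encode a hypothetical R-type instruction in RISC-V's reserved "custom-0"
# opcode space (0b0001011). R-type layout, high to low bits:
# funct7[31:25] | rs2[24:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=0b0001011):
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | \
           (funct3 << 12) | (rd << 7) | opcode

# Hypothetical custom instruction: rd += rs1 . rs2 (dot-product accumulate)
insn = encode_r_type(funct7=0b0000001, rs2=11, rs1=10, funct3=0b000, rd=12)
print(f"0x{insn:08x}")  # 0x02b5060b: a well-formed custom-0 encoding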
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

COLGeek
Fixed in the forums. Hopefully the author will see and adjust on the article side (title).

ezst036
Gotta fix the messed up URL also.
tomshardware.com/tech-industry/artificial-intelligence/meta-is-reportedly-testing-its-first-rsic-v-based-ai-chip-for-ai-training

COLGeek
ezst036 said:
Gotta fix the messed up URL also.
tomshardware.com/tech-industry/artificial-intelligence/meta-is-reportedly-testing-its-first-rsic-v-based-ai-chip-for-ai-training
I saw that, but that would break the link from the forum side.

bit_user
The article said:
One interesting thing to note is that MTIA's accelerators for inference use open-source RISC-V cores. This enables Meta to customize instruction set architecture as it wishes to meet its requirements at its cadence, but on the other hand, it does not need to pay royalties to any third party. It is unclear whether MTIA's training accelerator is also based on the RISC-V ISA, but this is possible.
The authors on this site continue to struggle with language around the open standard that is RISC-V. Not knowing whether it's even using RISC-V, the author then walks onto a speculative branch of assuming the cores were designed in-house, which is pretty much the only way they could've completely avoided paying for them. It seems most likely they were designed by Broadcom or maybe SiFive; in either case, Meta would've paid something for them (whether it was a one-time up-front charge or an ongoing royalty is more of a footnote).

bit_user
Edit: I think it's a different chip, but they discussed their next-gen inference accelerator at Hot Chips 2024, where they went into a little detail about its use of RISC-V (but sadly not the cores' microarchitecture or whether they were internal or licensed):
https://www.servethehome.com/meta-ai-acceleration-in-the-next-gen-meta-mtia-for-recommendation-inference-risc-v/
In terms of memory, that presentation claims 256 MiB of on-die SRAM and 16-channel LPDDR5, providing 128 GiB at ~205 GB/s. So, they must mean 16-bit channels, because that would be a 256-bit datapath, which aligns with the capacity and bandwidth numbers. I've got to say that's rather underwhelming for an ASIC made on TSMC N5 that's 421 mm^2.
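A quick back-of-the-envelope check of those numbers, assuming a 6400 MT/s LPDDR5 speed grade (the speed bin is an assumption; the presentation reportedly only gives the ~205 GB/s total):

```python
# Sanity check: 16 LPDDR5 channels x 16 bits per channel at 6400 MT/s.
channels = 16
bits_per_channel = 16                  # inferred 16-bit channels
transfers_per_sec = 6400e6             # assumed LPDDR5-6400: 6.4 GT/s per pin

bus_width_bits = channels * bits_per_channel             # 256-bit datapath
bandwidth_gbps = bus_width_bits * transfers_per_sec / 8 / 1e9

print(f"{bus_width_bits}-bit bus -> {bandwidth_gbps:.1f} GB/s")
# 256-bit bus -> 204.8 GB/s, matching the claimed ~205 GB/s
```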
I think the key thing is that they're relying on batching (via the on-die SRAM) and model parallelism to minimize the chip's external DRAM bandwidth requirements. They show 2 processors + 256 GB LPDDR5 per PCIe card, with a single server hosting 12 cards via two PCIe switches.
More detail below, although beware that this page covers a bunch of other presentations as well. There's a table of contents at the top; the Meta chip is 2nd.
https://irrationalanalysis.substack.com/p/hot-chips-2024-irrational-recap

Mr Majestyk
Every company that can is looking for ways to wean themselves off Leatherman's proprietary iron grip. As much as I despise Meta, it's good to see them make this move.