How Nvidia's $20 billion Groq 3 LPU deal reshapes the Nvidia Vera Rubin Platform — Samsung 4nm process serves as bedrock for SRAM-based AI accelerator chip

Rubin GPU next to Groq LPU
(Image credit: Nvidia)

Nvidia unveiled the Groq 3 language processing unit at GTC 2026 in San Jose on Monday, marking the first chip to emerge from its $20 billion licensing and talent deal with AI inference startup Groq, which was struck on Christmas Eve last year. The SRAM-based inference accelerator slots into the Vera Rubin platform as a dedicated decode-phase co-processor, and Nvidia plans to ship it in Q3 2026, manufactured by Samsung on a 4nm process. It is the company's first rack-scale product built around non-GPU silicon — and its arrival has already displaced a homegrown Nvidia chip from the roadmap.

The LP30 chip at the heart of the Groq 3 LPX rack carries 512 MB of on-chip SRAM per die, delivering 150 TB/s of memory bandwidth. That figure dwarfs the 22 TB/s available from the 288 GB of HBM4 on each Rubin GPU. A full LPX rack houses 256 LPUs for a total of 128 GB of SRAM and 40 PB/s of aggregate bandwidth. Nvidia claims the LPX rack, paired with a Vera Rubin NVL72, delivers 35 times higher throughput per megawatt than Blackwell NVL72 alone for trillion-parameter models, at a target price point of $45 per million tokens.
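The rack-level totals follow from the per-chip figures with simple multiplication. The short Python sketch below reproduces that arithmetic as a sanity check. It uses only the numbers quoted above; the variable and function names are illustrative, and the unit conventions (1,024 MB per GB for capacity, 1,000 TB per PB for bandwidth) are assumptions rather than anything Nvidia has stated.

```python
# Figures quoted in the article (per LP30 chip, per Rubin GPU, per LPX rack).
SRAM_PER_LPU_MB = 512        # on-chip SRAM per die
BW_PER_LPU_TBPS = 150        # SRAM bandwidth per die, TB/s
LPUS_PER_RACK = 256          # LPUs in one LPX rack
HBM4_BW_PER_GPU_TBPS = 22    # HBM4 bandwidth per Rubin GPU, TB/s


def rack_totals(lpus, sram_mb, bw_tbps):
    """Return (total SRAM in GB, total bandwidth in PB/s) for one rack.

    Assumption: 1 GB = 1024 MB for capacity, 1 PB = 1000 TB for bandwidth.
    """
    total_sram_gb = lpus * sram_mb / 1024
    total_bw_pbps = lpus * bw_tbps / 1000
    return total_sram_gb, total_bw_pbps


rack_sram_gb, rack_bw_pbps = rack_totals(
    LPUS_PER_RACK, SRAM_PER_LPU_MB, BW_PER_LPU_TBPS
)

# Per-chip bandwidth advantage of LP30 SRAM over a Rubin GPU's HBM4.
sram_vs_hbm_ratio = BW_PER_LPU_TBPS / HBM4_BW_PER_GPU_TBPS

print(f"Rack SRAM: {rack_sram_gb:.0f} GB")
print(f"Rack bandwidth: {rack_bw_pbps:.1f} PB/s")
print(f"LP30 vs Rubin HBM4 bandwidth: {sram_vs_hbm_ratio:.1f}x")
```

Under these assumptions the capacity works out to exactly 128 GB, while the bandwidth comes to 38.4 PB/s, which suggests the quoted 40 PB/s is a rounded figure. Per chip, the LP30's SRAM bandwidth is a little under seven times that of a Rubin GPU's HBM4.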

Luke James
Contributor

Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.