Meta reveals four new MTIA chips built for AI inference

Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. “We’ve developed a competitive strategy for MTIA by prioritizing rapid, iterative development, reads Meta’s press release, along with an inference-first focus and frictionless adoption by building natively on industry standards.

Go deeper with TH Premium: Chipmaking

The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendations training, while 400 is currently in lab testing ahead of data center deployment. MTIA 450 and 500 are targeted at AI inference and are scheduled for mass deployment in early 2027 and later in 2027, respectively. According to Meta's technical blog, from MTIA 300 through to MTIA 500, HBM bandwidth increases 4.5 times, and compute FLOPs increase 25 times.

Meta says MTIA 450 doubles the HBM bandwidth of MTIA 400, describing it as “much higher than that of existing leading commercial products,” or, in other words, Nvidia’s H100 and H200. MTIA 500 then adds another 50% HBM bandwidth on top of 450, along with up to 80% more HBM capacity. Indeed, it’s HBM bandwidth and not raw FLOPs that are the main bottleneck during the decode phase of transformer inference, and mainstream GPUs are architected to maximize FLOPs for large-scale pre-training. This means they carry a cost and power overhead that Meta says is unnecessary for inference workloads.

Swipe to scroll horizontally

Row 0 - Cell 0	MTIA 300	MTIA 400	MTIA 450	MTIA 500
Workload Focus	R&R Training	General	AI Inference	AI Inference
Module TDP	800 W	1,200 W	1,400 W	1,700 W
HBM Bandwidth	6.1 TB/s	9.2 TB/s	18.4 TB/s	27.6 TB/s
HBM Capacity	216 GB	288 GB	288 GB	384-512 GB
MX4 Performance	-	12 PFLOPS	21 PFLOPS	30 PLOPS
FP8/MX8 Performance	1.2 PFLOPS	6 PFLOPS	7 PFLOPS	10 PFLOPS
BF16 Performance	0.6 PLOPS	3 PFLOPS	3.5 PFLOPS	5 PFLOPS

Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, delivering six times the MX4 FLOPs of FP16/BF16, with mixed low-precision computation that avoids the software overhead of data type conversion.

In terms of eventual deployment, MTIA 400, 450, and 500 will all use the same chassis, rack, and network infrastructure, meaning each new chip generation drops into the existing physical footprint for easy interchange. It’s this modularity, Meta says, that’s behind MTIA’s roughly six-month chip cadence, which itself is much faster than the industry’s typical one-to-two year cycle.

The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites. Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.

All this comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting that there’s a broader effort at play to reduce dependence on Nvidia across different parts of Meta’s AI stack while keeping MTIA at the core of inference workloads.