Date: 2026-03-12

Meta reveals four new MTIA chips built for AI inference, to be released on a six-month cadence

The chiplet-based accelerators are designed to run AI inference more efficiently than GPUs optimized for training workloads.

Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. “We’ve developed a competitive strategy for MTIA by prioritizing rapid, iterative development,” reads Meta’s press release, along with an inference-first focus and frictionless adoption by building natively on industry standards.

| Specification | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
| --- | --- | --- | --- | --- |
| Workload Focus | R&R Training | General | AI Inference | AI Inference |
| Module TDP | 800 W | 1,200 W | 1,400 W | 1,700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | - | 12 PFLOPS | 21 PFLOPS | 30 PFLOPS |
| FP8/MX8 Performance | 1.2 PFLOPS | 6 PFLOPS | 7 PFLOPS | 10 PFLOPS |
| BF16 Performance | 0.6 PFLOPS | 3 PFLOPS | 3.5 PFLOPS | 5 PFLOPS |

Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4 and delivers six times its FP16/BF16 throughput in that format (21 PFLOPS in MX4 versus 3.5 PFLOPS in BF16), with mixed low-precision computation that avoids the software overhead of data type conversion.
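The six-times figure can be checked against the published peak numbers. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the helper name `speedup_over_bf16` are illustrative choices rather than anything from Meta, and the values are simply the peak PFLOPS figures quoted in the spec table.

```python
# Peak throughput in PFLOPS, copied from the spec table in this article.
# None marks an entry for which the table lists no figure ("-").
SPECS = {
    "MTIA 300": {"mx4": None, "fp8": 1.2, "bf16": 0.6},
    "MTIA 400": {"mx4": 12.0, "fp8": 6.0, "bf16": 3.0},
    "MTIA 450": {"mx4": 21.0, "fp8": 7.0, "bf16": 3.5},
    "MTIA 500": {"mx4": 30.0, "fp8": 10.0, "bf16": 5.0},
}


def speedup_over_bf16(chip, fmt):
    """Return peak throughput in `fmt` divided by peak BF16 throughput.

    Hypothetical helper for illustration. Returns None when the table
    has no figure for that format on that chip.
    """
    value = SPECS[chip][fmt]
    if value is None:
        return None
    return round(value / SPECS[chip]["bf16"], 2)


# Print the MX4 and FP8/MX8 ratios for every chip in the table.
for chip in SPECS:
    mx4 = speedup_over_bf16(chip, "mx4")
    fp8 = speedup_over_bf16(chip, "fp8")
    print(f"{chip}: MX4/BF16 = {mx4}, FP8/BF16 = {fp8}")
```

By these figures, MTIA 450 and MTIA 500 both reach a sixfold MX4-to-BF16 ratio, MTIA 400 reaches fourfold, and FP8/MX8 is twice BF16 on every chip. These are ratios of quoted peak numbers only and say nothing about sustained performance on real models.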

MTIA 400, 450, and 500 will all use the same chassis, rack, and network infrastructure, so each new chip generation drops into the existing physical footprint for easy interchange. It’s this modularity, Meta says, that’s behind MTIA’s roughly six-month chip cadence, which is much faster than the industry’s typical one-to-two-year cycle.

The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites. Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.

All this comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting that there’s a broader effort at play to reduce dependence on Nvidia across different parts of Meta’s AI stack while keeping MTIA at the core of inference workloads.

