Microsoft introduces newest in-house AI chip — Maia 200 is faster than rival hyperscalers' bespoke Nvidia competitors, built on TSMC 3nm with 216GB of HBM3e
30% more performance per dollar than Maia 100, and faster than Amazon's and Google's custom silicon.
Microsoft has introduced its newest AI accelerator, the Microsoft Azure Maia 200. The new in-house AI chip is the next generation of Microsoft's Maia accelerator line, a server chip designed for AI inference, with speeds and feeds ludicrous enough to outperform the custom offerings from hyperscaler competitors Amazon and Google.
Maia 200 is billed as Microsoft's "most efficient inference system" ever deployed, with its press releases splitting time between praising big performance numbers and paying lip service to environmentalism. Microsoft claims the Maia 200 delivers 30% more performance per dollar than the first-gen Maia 100, an impressive feat considering the new chip also advertises a 50% higher TDP than its predecessor.
Maia 200 is built on TSMC's 3nm process node and packs 140 billion transistors. The chip can hit a claimed 10 petaflops of FP4 compute, roughly four times the figure for Amazon's competing Trainium3. The Maia 200 also carries 216 GB of HBM3e memory onboard with 7 TB/s of HBM bandwidth, joined by 272MB of on-die SRAM.
| Specification | Azure Maia 200 | AWS Trainium3 | Nvidia Blackwell B300 Ultra |
|---|---|---|---|
| Process technology | N3P | N3P | 4NP |
| FP4 petaFLOPS | 10.14 | 2.517 | 15 |
| FP8 petaFLOPS | 5.072 | 2.517 | 5 |
| BF16 petaFLOPS | 1.268 | 0.671 | 2.5 |
| HBM memory size | 216 GB HBM3e | 144 GB HBM3e | 288 GB HBM3e |
| HBM memory bandwidth | 7 TB/s | 4.9 TB/s | 8 TB/s |
| TDP | 750 W | Undisclosed | 1400 W |
| Bi-directional bandwidth | 2.8 TB/s | 2.56 TB/s | 1.8 TB/s |
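What those speeds and feeds mean in practice depends on arithmetic intensity. As a rough illustration, the back-of-envelope roofline sketch below uses only the published figures from the table; the 70B-parameter model and batch-1 decode scenario are hypothetical assumptions, not anything Microsoft has benchmarked.

```python
# Back-of-envelope roofline math using only the figures published
# in the spec table above. The 70B-parameter model and batch-1
# decode scenario are illustrative assumptions, not Microsoft
# benchmarks.

PEAK_FP4_FLOPS = 10.14e15  # claimed FP4 peak, FLOP/s
HBM_BANDWIDTH = 7e12       # claimed HBM3e bandwidth, bytes/s

# Ridge point: arithmetic intensity needed before the chip becomes
# compute-bound rather than bandwidth-bound.
ridge = PEAK_FP4_FLOPS / HBM_BANDWIDTH
print(f"Ridge point: ~{ridge:,.0f} FLOPs per HBM byte")  # ~1,449

# Batch-1 decode streams every weight once per token. At FP4
# (0.5 bytes/weight, ~2 FLOPs per weight) that is ~4 FLOPs/byte,
# far below the ridge, so single-stream decode is bandwidth-bound.
params = 70e9  # hypothetical 70B-parameter model
tokens_per_sec = HBM_BANDWIDTH / (params * 0.5)
print(f"Batch-1 decode ceiling: ~{tokens_per_sec:.0f} tokens/s")  # ~200
```

At these ratios, single-stream inference lives or dies on memory bandwidth rather than raw compute, which goes some way toward explaining the attention Microsoft paid to the memory subsystem.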
Looking at the table, the Maia 200 holds a clear lead in raw compute over Amazon's in-house competition, and it invites an interesting comparison with Nvidia's top-dog GPU. Of course, treating the two as direct competitors is a fool's errand: no outside customers can purchase a Maia 200, the Blackwell B300 Ultra is tuned for much higher-powered use cases than the Microsoft chip, and Nvidia's software stack puts it miles ahead of any contemporary.
However, the Maia 200 does beat the B300 in efficiency, a notable win at a time when public concern over AI's environmental impact is steadily mounting. The Maia 200 operates at nearly half the B300's TDP (750W vs. 1400W), and if it's anything like the Maia 100, it will run below its theoretical maximum; the Maia 100 was designed as a 700W chip, but Microsoft says it was limited to 500W in operation.
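For a rough sense of that efficiency gap, here is a minimal sketch dividing the peak-FP4 figures by the rated TDPs from the table. Trainium3 is left out because Amazon hasn't disclosed its TDP, and nameplate TDP is a ceiling rather than measured draw, so treat this as an upper-bound comparison, not a benchmark.

```python
# Peak FP4 throughput per watt of rated TDP, from the table above.
# Trainium3 is omitted since Amazon hasn't disclosed a TDP, and
# nameplate TDP is a ceiling rather than measured draw, so this is
# a rough upper-bound comparison, not a benchmark.

chips = {
    "Azure Maia 200": {"fp4_pflops": 10.14, "tdp_w": 750},
    "Blackwell B300 Ultra": {"fp4_pflops": 15.0, "tdp_w": 1400},
}

for name, spec in chips.items():
    tflops_per_watt = spec["fp4_pflops"] * 1000 / spec["tdp_w"]
    print(f"{name}: ~{tflops_per_watt:.1f} FP4 TFLOPS per watt")

# Azure Maia 200: ~13.5 FP4 TFLOPS per watt
# Blackwell B300 Ultra: ~10.7 FP4 TFLOPS per watt
```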
Maia 200 is tuned for FP4 and FP8 throughput, targeting customers running inference on AI models hungry for low-precision math rather than more complex operations. Much of Microsoft's R&D effort appears to have gone into the memory hierarchy built around the chip's 272MB of high-efficiency SRAM, which is partitioned into "multi‑tier Cluster‑level SRAM (CSRAM) and Tile‑level SRAM (TSRAM)" to boost operating efficiency and spread workloads intelligently and evenly across the HBM and SRAM dies.
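Microsoft hasn't detailed how that 272MB is divided between the tiers or how data placement is decided, but the general shape of a tiered scratchpad can be sketched as below; the capacity split and the nearest-fit policy are illustrative assumptions, not Maia 200's actual design.

```python
# A hypothetical model of a two-tier scratchpad in the spirit of
# Microsoft's TSRAM/CSRAM description. The 272MB total and 216GB
# of HBM3e are published; the split between tiers and the
# nearest-fit placement policy are illustrative assumptions, not
# Maia 200's actual design.

from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_bytes: int

HIERARCHY = [  # ordered fastest/smallest to slowest/largest
    Tier("TSRAM (tile-level)", 72 * 2**20),      # assumed split
    Tier("CSRAM (cluster-level)", 200 * 2**20),  # assumed split
    Tier("HBM3e", 216 * 2**30),
]

def place(working_set_bytes: int) -> str:
    """Return the fastest tier that can hold the working set."""
    for tier in HIERARCHY:
        if working_set_bytes <= tier.capacity_bytes:
            return tier.name
    raise ValueError("working set exceeds on-package memory")

print(place(48 * 2**20))   # small KV-cache slice -> TSRAM
print(place(150 * 2**20))  # larger activation block -> CSRAM
print(place(8 * 2**30))    # full weight shard -> HBM3e
```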
It's difficult to measure Maia 200's improvements over its predecessor, as Microsoft's official spec sheets for the two chips have almost no measurements in common. All we can say this early is that Maia 200 will run hotter than Maia 100 did, and that it is apparently 30% better on a performance-per-dollar basis.
Maia 200 has already been deployed in Microsoft's US Central Azure data center, with future deployments announced for US West 3 in Phoenix, AZ, and more to come as Microsoft receives more chips. The chip will be part of Microsoft's heterogeneous deployment strategy, operating in tandem with other AI accelerators.
Maia 200, originally codenamed Braga, made waves for its heavily delayed development. The chip was slated for a 2025 release and deployment, potentially even beating the B300 out of the gate, but that was not to be. Microsoft's next hardware release isn't certain, but per reports from October, it will likely be fabricated on Intel Foundry's 18A process.
Microsoft's efficiency-first messaging around the Maia 200 follows its recent trend of stressing the corporation's supposed concern for communities near its data centers, going to great lengths to dampen the backlash against the AI boom. Microsoft CEO Satya Nadella recently told the World Economic Forum that if companies cannot help the public see the supposed perks of AI development and data center buildout, they risk losing "social permission" and creating a dreaded AI bubble.

Sunny Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Sunny has a handle on all the latest tech news.