AMD announces MI350P PCIe AI accelerator card with 144GB of HBM3E — roughly 40% faster in FP16 and FP8 theoretical compute compared to Nvidia's H200 NVL competitor

AMD Instinct MI350P PCIe Card — (Image credit: AMD)

AMD has launched a new member of the MI350-series that comes in a PCIe form factor. The new Instinct MI350P comes with 128 CUs and 144GB of HBM3E memory and is designed to be a drop-in upgrade solution for existing air-cooled servers.


The MI350P is a 10.5-inch, dual-slot card with a fanless cooler built around a 600W power envelope; it relies on the chassis fans of a rack-mounted server for airflow. The card can also be configured to run at a lower 450W power target to maintain compatibility with more thermally or power-constrained chassis.


AMD Instinct MI350P, MI325X, MI350X, and MI355X specifications
Specifications (peak theoretical)	AMD Instinct MI350P GPU	AMD Instinct MI325X GPU	AMD Instinct MI350X GPU	AMD Instinct MI350X Platform	AMD Instinct MI355X GPU	AMD Instinct MI355X Platform
GPUs	Instinct MI350P PCIe	Instinct MI325X OAM	Instinct MI350X OAM	8 x Instinct MI350X OAM	Instinct MI355X OAM	8 x Instinct MI355X OAM
GPU Architecture	CDNA 4	CDNA 3	CDNA 4	CDNA 4	CDNA 4	CDNA 4
Dedicated Memory Size	144 GB HBM3E	256 GB HBM3E	288 GB HBM3E	2.3 TB HBM3E	288 GB HBM3E	2.3 TB HBM3E
Memory Bandwidth	4 TB/s	6 TB/s	8 TB/s	8 TB/s per OAM	8 TB/s	8 TB/s per OAM
FP64 Performance	36 TFLOPS	Not listed	72 TFLOPS	577 TFLOPS	78.6 TFLOPS	628.8 TFLOPS
FP16 Performance	2.3 PFLOPS	2.61 PFLOPS	4.6 PFLOPS	36.8 PFLOPS	5 PFLOPS	40.2 PFLOPS
FP8 Performance	4.6 PFLOPS	5.22 PFLOPS	9.2 PFLOPS	73.82 PFLOPS	10.1 PFLOPS	80.5 PFLOPS
FP6 Performance	Not listed	Not listed	18.45 PFLOPS	147.6 PFLOPS	20.1 PFLOPS	161 PFLOPS
FP4 Performance*	Not listed	Not listed	18.45 PFLOPS	147.6 PFLOPS	20.1 PFLOPS	161 PFLOPS

The card's headline specs are exactly half of what AMD's high-end MI350X offers, and slightly under half of the MI355X's compute figures. The MI350P is based on AMD's CDNA 4 architecture and is built on TSMC's 3nm and 6nm FinFET processes. The GPU has 8,192 cores across 128 CUs, 512 Matrix Cores, and a 2.2GHz maximum clock speed. It is paired with 144GB of HBM3E memory offering 4TB/s of bandwidth, plus a 128MB last-level cache.
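The "half of the bigger cards" comparison can be checked against the specification table above. The following is a minimal Python sketch, not anything AMD publishes: the `SPECS` dictionary and the `ratio_to` helper are illustrative names, and the numbers are copied from the table.

```python
# Peak theoretical figures copied from the specification table above.
# Units: memory in GB, bandwidth in TB/s, FP64 in TFLOPS, FP16/FP8 in PFLOPS.
# SPECS and ratio_to are hypothetical names used only for this illustration.
SPECS = {
    "MI350P": {"memory_gb": 144, "bandwidth_tbs": 4, "fp64_tflops": 36.0,
               "fp16_pflops": 2.3, "fp8_pflops": 4.6},
    "MI350X": {"memory_gb": 288, "bandwidth_tbs": 8, "fp64_tflops": 72.0,
               "fp16_pflops": 4.6, "fp8_pflops": 9.2},
    "MI355X": {"memory_gb": 288, "bandwidth_tbs": 8, "fp64_tflops": 78.6,
               "fp16_pflops": 5.0, "fp8_pflops": 10.1},
}


def ratio_to(baseline, card="MI350P"):
    """Return card/baseline for every metric, rounded to three decimals."""
    return {
        metric: round(SPECS[card][metric] / SPECS[baseline][metric], 3)
        for metric in SPECS[card]
    }


if __name__ == "__main__":
    for baseline in ("MI350X", "MI355X"):
        print(baseline, ratio_to(baseline))
```

Against the MI350X every ratio comes out at one half. Against the MI355X, memory and bandwidth are also halved, while the compute ratios land a little below one half because the MI355X's peak figures are higher.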


Just like the MI350X and MI355X, the MI350P offers native support for the lower-precision MXFP6 and MXFP4 formats to accelerate LLMs. Up to eight MI350P cards can be installed in a single system, allowing data centers to scale performance with the number of cards used. The MI350P is geared towards small, medium, and large AI workloads involving inference and retrieval-augmented generation (RAG) pipelines. AMD claims the GPU is the fastest enterprise PCIe card, citing an estimated 2,299 TFLOPS and 4,600 peak TFLOPS of performance using MXFP4.
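To give a feel for why lower-precision formats and multi-card scaling matter, here is a rough Python sketch that estimates how many 144GB cards a model's weights alone would occupy. It rests on several assumptions that are not from AMD: the MX bit widths assume the OCP Microscaling layout (blocks of 32 elements sharing one 8-bit scale, adding 0.25 bits per weight), the model sizes in the usage note are made up for illustration, and the estimate ignores KV cache, activations, and runtime overhead, so real deployments need more headroom.

```python
import math
from typing import Optional

CARD_MEMORY_GB = 144  # per-card HBM3E capacity, from the article
MAX_CARDS = 8         # maximum cards per system, from the article

# Effective bits per weight. The MX entries ASSUME 32-element blocks that
# share one 8-bit scale (8 / 32 = 0.25 extra bits per weight).
BITS_PER_WEIGHT = {"FP16": 16.0, "FP8": 8.0, "MXFP6": 6.25, "MXFP4": 4.25}


def weight_footprint_gb(params_billions: float, fmt: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 weights) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billions * BITS_PER_WEIGHT[fmt] / 8


def cards_needed(params_billions: float, fmt: str) -> Optional[int]:
    """Cards required to hold the weights, or None if more than MAX_CARDS."""
    n = math.ceil(weight_footprint_gb(params_billions, fmt) / CARD_MEMORY_GB)
    return n if n <= MAX_CARDS else None


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for size in (70, 405):
        for fmt in BITS_PER_WEIGHT:
            print(f"{size}B @ {fmt}: {weight_footprint_gb(size, fmt):.1f} GB, "
                  f"cards: {cards_needed(size, fmt)}")
```

Under these assumptions, a hypothetical 405-billion-parameter model needs six cards' worth of memory for FP16 weights but only two at MXFP4, which is the kind of saving the lower-precision formats are aimed at.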

The introduction of the MI350P finally gives AMD a proper competitor to Nvidia's fastest PCIe AI accelerator, currently the H200 NVL. The MI350P is based on a newer architecture and edges out the H200 NVL in performance, featuring 20% better FP64, 43% better FP16, and 39% better FP8 theoretical compute performance.
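For readers who want to see how such percentage comparisons work, the sketch below shows the arithmetic in Python. It uses only the MI350P figures from the table and the percentages quoted in the paragraph above; the H200 NVL baselines it prints are back-solved from those percentages and are not Nvidia's published specifications. The function names are illustrative.

```python
# MI350P peak theoretical figures from the specification table.
MI350P = {"FP64 (TFLOPS)": 36.0, "FP16 (PFLOPS)": 2.3, "FP8 (PFLOPS)": 4.6}

# Advantages over the H200 NVL as quoted in the article, in percent.
QUOTED_UPLIFT_PCT = {"FP64 (TFLOPS)": 20, "FP16 (PFLOPS)": 43, "FP8 (PFLOPS)": 39}


def uplift_pct(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def implied_baseline(new: float, pct: float) -> float:
    """Baseline value that would make `new` exactly `pct` percent higher."""
    return new / (1.0 + pct / 100.0)


if __name__ == "__main__":
    for metric, value in MI350P.items():
        base = implied_baseline(value, QUOTED_UPLIFT_PCT[metric])
        # These baselines are derived from the quoted percentages only.
        print(f"{metric}: MI350P {value}, implied H200 NVL baseline {base:.3f}")
```

The two helpers are inverses of each other, so feeding an implied baseline back into `uplift_pct` returns the quoted percentage.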


AMD Instinct MI350P PCIe Card Press Deck — (Image credit: AMD)

Nvidia has not announced a PCIe version of its latest B200 Blackwell GPUs running HBM memory, so for now, AMD will have the most bleeding-edge AI accelerator that fits in a PCIe form factor. It remains to be seen how widely adopted AMD's new card will be, given Nvidia's hold on the market with CUDA. But AMD is working to improve its competing ROCm software stack, as the GPU maker explained to us at CES 2026.


Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

7 Comments

Kindaian

Can I have one of these with "slow" memory and a cheap price point? I'm asking as a hobbyist who would like to be able to run high-memory LLM models locally on the cheap. Asking 20k (an imaginary number, but probably not far from reality) for one card like this is way too expensive for a hobby consumer AI computer.
User of Computers

Kindaian said:
Can I have one of these with "slow" memory and a cheap price point? I'm asking as a hobbyist...
AMD: No. You're too poor.
Kindaian

Me: As a hobbyist, I don't just throw silly money at my project. If you (AMD) don't do it, someone else will; if not, as a hobbyist, I can afford to work around the lack of an offer at a reasonable price point.
GenericUser2001

Kindaian said:
Can I have one of these with "slow" memory and a cheap price point? I'm asking as a hobbyist who would like to be able to run high-memory LLM models locally on the cheap. Asking 20k (an imaginary number, but probably not far from reality) for one card like this is way too expensive for a hobby consumer AI computer.
Maybe look at some of the Ryzen AI Max machines? If you look around, you can find those with 128 GB of RAM for under $3k. I think that is what AMD intends to be the hobbyist local AI option.
User of Computers

Kindaian said:
Me: As a hobbyist, I don't just throw silly money at my project. If you (AMD) don't do it, someone else will; if not, as a hobbyist, I can afford to work around the lack of an offer at a reasonable price point.
AMD: so buy the MI350P.
bit_user

Kindaian said:
Can I have one of these with "slow" memory and a cheap price point? I'm asking as a hobbyist who would like to be able to run high-memory LLM models locally on the cheap. Asking 20k (an imaginary number, but probably not far from reality) for one card like this is way too expensive for a hobby consumer AI computer.
Sad to say (at least, for me it is, since I'm not a Mac-head), I think the best thing for this was the high-memory Mac Studio machines with M3 Ultra and 512 GB of RAM (although 256 GB is now the only one you can buy). They can do 819 GB/s of memory bandwidth. Their NPU is only capable of 36 TOPS, but I'm not sure how much you could get out of them by also harnessing the GPU and CPU cores (which have matrix cores).
bit_user

User of Computers said:
AMD: No. You're too poor.
Eh, if you look at the actual silicon inside these cards, the price isn't as unreasonable as that of some cards out there. Take cards like the RTX Pro 6000 Blackwell, which are basically just gaming GPUs with more memory and slightly lower clock speeds. This MI350P has 2.23x the bandwidth of even that card, as well as 50% more memory and 14.1% more TOPS.

Kindaian said:
Me: As a hobbyist, I don't just throw silly money at my project. If you (AMD) don't do it, someone else will; if not, as a hobbyist, I can afford to work around the lack of an offer at a reasonable price point.
There's always a nonlinear relationship between cost and time. If you want faster hardware, you either pay an exorbitant price for it now, or you just wait a few years.

GenericUser2001 said:
Maybe look at some of the Ryzen AI Max machines? If you look around, you can find those with 128 GB of RAM for under $3k. I think that is what AMD intends to be the hobbyist local AI option.
They sort of backed into this one, almost by accident. Their original goal was just to build a Mac Pro competitor. They didn't initially set out to make it into an edge AI machine that could go toe-to-toe with competitors like Nvidia's GB10.

Lead times on CPUs are like 3-4 years. So, we'll have to see what they bring to market in 2027 or 2028, in order to see what their most competitive edge AI offering looks like.
