AMD Introduces Radeon Instinct Accelerators, One Based On Vega 10

The Tom’s Hardware U.S. and German teams are still unpacking their bags from spending two days in Sonoma, California with AMD’s CEO, Dr. Lisa Su, CTO Mark Papermaster, Radeon Technologies Group chief architect Raja Koduri, and other company leaders. We naturally covered several topics during this Technology Summit (yes, including Vega 10 and Zen), and they’ll be dissected in the days and weeks to come. First, though, the company wanted to announce a trio of add-in cards designed to accelerate deep learning inference and training, along with an open-source library tuned to exploit their abilities. Nvidia has a big head start in the GPU-accelerated machine intelligence market, so AMD has no time to waste here.

Meet Radeon Instinct

All three cards belong to a new family of products called Radeon Instinct. Despite what’s written on their shrouds, though, the underlying hardware we’re talking about is largely familiar to the Tom’s Hardware audience. The entry-level model, Radeon Instinct MI6, employs a Polaris GPU, the middle MI8 is based on the Radeon R9 Nano’s Fiji processor, and the flagship MI25 boasts an upcoming Vega-based chip. AMD said the entire lineup is passively cooled, but that’s a little disingenuous. Each model is going to have its own airflow requirement in a rack-mounted chassis, and those chassis fans are anything but quiet.

The Radeon Instinct MI6 is specified to include 16GB of memory and hit the same 150W board power figure as AMD’s desktop-class Radeon RX 480. A claimed peak FP16 rate of 5.7 TFLOPS lets us work backward to a clock frequency of around 1,240MHz, because the Ellesmere processor executes half-precision FP16 at the same rate as FP32. Further, a memory bandwidth specification of 224GB/s corresponds exactly to the 7Gb/s GDDR5 found on the 4GB Radeon RX 480.

Like Polaris, the Fiji processor has a 1:1 ratio of FP16 to FP32 throughput, so the MI8 card’s 8.2 TFLOPS half-precision compute rate is identical to the Radeon R9 Nano’s single-precision spec: a 1,000MHz core frequency with all 4,096 of its shaders enabled. Of course, that card is limited to four stacks of first-gen HBM, so the Radeon Instinct MI8 inherits a 4GB capacity ceiling as well.
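
Both figures are easy to sanity-check: peak throughput is shader count × two FLOPS per clock (one fused multiply-add) × clock frequency. A minimal sketch of that arithmetic, assuming MI6 uses a fully enabled Ellesmere with the same 2,304 shaders as the desktop RX 480:

```cpp
#include <cstdio>

// Peak FLOPS = shaders * 2 (one fused multiply-add per clock) * clock (Hz).
double peak_tflops(int shaders, double clock_mhz) {
    return shaders * 2.0 * clock_mhz * 1e6 / 1e12;
}

double implied_clock_mhz(int shaders, double tflops) {
    return tflops * 1e12 / (shaders * 2.0) / 1e6;
}

int main() {
    // MI6: working backward from the claimed 5.7 TFLOPS FP16 rate.
    printf("MI6 implied clock: %.0f MHz\n", implied_clock_mhz(2304, 5.7)); // ~1,237
    // MI8: Fiji's 4,096 shaders at 1,000MHz, FP16 rate equal to FP32.
    printf("MI8 peak FP16: %.1f TFLOPS\n", peak_tflops(4096, 1000.0));     // ~8.2
    return 0;
}
```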

AMD said the MI6 and MI8 are ideal for GPU-accelerated inference, which is what a neural network does after it has gone through its more intensive training stage. Inference typically processes a single input or a small batch at a time in order to yield the best response time possible, whereas training churns through enormous data sets. So, it’s possible that MI8’s comparatively small complement of HBM won’t hamstring performance, particularly given the card’s 512GB/s of memory bandwidth.

Of course, the flagship MI25 is going to garner the most attention. It hosts a still-unannounced Vega-based GPU, so speeds and feeds are scarce. But as you likely saw from the previous two models, AMD’s Instinct names correlate to Float16 performance, which means we’re looking at up to 25 TFLOPS. This is enabled by a more flexible mixed-precision engine that accommodates two 16-bit operations through the architecture’s 32-bit pipelines. Sony’s PlayStation 4 Pro recently received the same capability, and you can bet other upcoming graphics products from AMD will include it, as well. Dramatically increased compute throughput, and presumably lots of on-package HBM2, make the card well suited to training workloads.
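
The “two 16-bit operations per 32-bit pipeline” idea comes down to data packing: two half-precision values ride in one 32-bit register, and a single packed-math instruction operates on both lanes at once. The instruction itself is hardware, but a minimal sketch of the storage layout looks like this:

```cpp
#include <cstdint>
#include <cstdio>

// Two 16-bit values packed into one 32-bit register: the layout behind
// mixed-precision "packed math," where one instruction works on both lanes.
uint32_t pack_half2(uint16_t lo, uint16_t hi) {
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

int main() {
    // 0x3C00 is 1.0 in IEEE-754 half precision; 0x4000 is 2.0.
    uint32_t reg = pack_half2(0x3C00, 0x4000);
    printf("packed register: 0x%08X\n", reg);  // 0x40003C00
    return 0;
}
```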

Why is AMD emphasizing FP16 performance at all? Academic research has shown that half precision achieves classification accuracy similar to FP32 in deep learning workloads. Simply storing data as FP16 relieves a lot of pressure on the memory subsystem in bandwidth-limited tasks. And in an architecture like Vega’s, the throughput of FP16 math theoretically doubles as well, again with minimal accuracy loss compared to FP32.
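
The bandwidth argument is simple arithmetic: halving the element size halves the bytes a bandwidth-bound kernel has to move, so throughput in elements per second roughly doubles. A back-of-the-envelope sketch using MI8’s 512GB/s figure:

```cpp
#include <cstdio>

int main() {
    const double bandwidth_gbs = 512.0;  // MI8's memory bandwidth

    // Elements per second a purely bandwidth-bound kernel can stream.
    double fp32_gelems = bandwidth_gbs / 4.0;  // 4 bytes per FP32 value
    double fp16_gelems = bandwidth_gbs / 2.0;  // 2 bytes per FP16 value

    printf("FP32: %.0f Gelem/s, FP16: %.0f Gelem/s (%.0fx)\n",
           fp32_gelems, fp16_gelems, fp16_gelems / fp32_gelems);
    return 0;
}
```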

It’s no wonder AMD wanted to get this information out as quickly as possible; Nvidia’s GP100 also supports mixed precision, with up to 21.2 TFLOPS of half-precision throughput at the GPU’s boost clock rate. Further, Nvidia is beating the drum hard on CUDA 8. Although we’re still months away from seeing Vega-based hardware, AMD can at least tell the software side of its story now to get developers thinking about programming for these deep learning accelerators.

Leveraging The Hardware Through MIOpen

To go along with its hardware disclosure, AMD also took the wraps off a GPU-accelerated software library for its Radeon Instinct cards called MIOpen, which should become available in Q1’17, and showed some preliminary benchmark results generated with Baidu Research’s DeepBench tool. For those unfamiliar, DeepBench tests training performance using 32-bit floating-point arithmetic. AMD is reporting only one of the benchmark’s operations (GEMM), so the outcome is fairly cherry-picked. Then again, this is also very early hardware.
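
For reference, GEMM is general matrix-matrix multiplication, the operation at the heart of a neural network’s fully connected layers. DeepBench times vendor-tuned libraries rather than a loop like the one below, but a naive reference version shows what is being measured:

```cpp
// Naive single-precision GEMM: C = alpha * A * B + beta * C.
// A is M x K, B is K x N, C is M x N, all row-major. DeepBench times the
// vendor-optimized equivalent of this at sizes drawn from real networks.
void sgemm(int M, int N, int K, float alpha, const float* A,
           const float* B, float beta, float* C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```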

AMD is using a GeForce GTX Titan X (Maxwell) as its baseline. That card has a peak FP32 rate of 6.14 TFLOPS at its base frequency. The Titan X (Pascal) jumps to 10.2 TFLOPS, thanks mostly to a significantly higher GPU clock rate. Using its MIOpen library, AMD showed the 8.2 TFLOPS Radeon Instinct MI8 beating Nvidia’s flagship. Meanwhile, the MI25 appears almost 50% faster than Nvidia’s Pascal-based Titan and 90% quicker than the previous-gen Titan X. Based on what we know about Vega’s mixed-precision handling, we can surmise that the MI25 has an FP32 rate of around 12.5 TFLOPS. Clearly, to achieve the advantage AMD is already reporting, MIOpen must be very well optimized for the hardware.
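
To spell out that surmise: if Vega runs FP16 at twice its FP32 rate, the 25 TFLOPS implied by the MI25’s name corresponds to roughly 12.5 TFLOPS of single precision. And since DeepBench’s GEMM test runs in FP32, raw hardware alone gives the MI25 only a modest edge over Titan X (Pascal). A quick sketch of the arithmetic:

```cpp
#include <cstdio>

int main() {
    const double mi25_fp16 = 25.0;          // TFLOPS, per the product name
    const double mi25_fp32 = mi25_fp16 / 2; // packed math: FP16 is 2x FP32
    const double titan_xp_fp32 = 10.2;      // Titan X (Pascal), TFLOPS

    // Raw FP32 throughput gives MI25 only ~23% headroom over Titan X
    // (Pascal), yet AMD reports ~50%, implying MIOpen does the rest.
    printf("MI25 implied FP32: %.1f TFLOPS (%.0f%% over Titan X Pascal)\n",
           mi25_fp32, (mi25_fp32 / titan_xp_fp32 - 1) * 100);
    return 0;
}
```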

The MIOpen library is but one part of AMD’s software strategy, and it sits above the open-source Radeon Open Compute platform’s support for the Heterogeneous Compute Compiler, HIP (for converting CUDA code to portable C++), OpenCL, and Python. ROCm also incorporates support for NCCL, a library of collective communication routines for multi-GPU topologies, all of the relevant math libraries, and the C++ standard template library.
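
To give a flavor of what HIP buys developers, a kernel looks almost exactly like CUDA, which is the point. This is a minimal vector-add sketch, assuming a ROCm-supported GPU and the HIP runtime; error handling and host-to-device copies are omitted for brevity:

```cpp
#include <hip/hip_runtime.h>

// A CUDA-style kernel; HIP's conversion tools turn existing CUDA code
// into this portable C++ dialect with little or no manual work.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    float *a, *b, *c;
    // hipMalloc/hipFree mirror cudaMalloc/cudaFree one-for-one.
    hipMalloc((void**)&a, n * sizeof(float));
    hipMalloc((void**)&b, n * sizeof(float));
    hipMalloc((void**)&c, n * sizeof(float));
    // Launch 4 blocks of 256 threads; arguments follow grid/block/shmem/stream.
    hipLaunchKernelGGL(vec_add, dim3(n / 256), dim3(256), 0, 0, a, b, c, n);
    hipDeviceSynchronize();
    hipFree(a); hipFree(b); hipFree(c);
    return 0;
}
```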

Early in 2016, AMD introduced its FirePro S-series cards with Multi-User GPU (MxGPU) technology based on the Single Root I/O Virtualization (SR-IOV) standard, which granted virtual machines direct access to the graphics hardware and was apparently popular for cloud gaming. AMD is exposing MxGPU on its Radeon Instinct boards for machine intelligence applications as well, and it was quick to point out that Nvidia is much more stringent about segregating such features into separate products.

Putting It All Together

Beyond merely announcing its Radeon Instinct accelerators, AMD also invited a handful of integrators to show off their plans for the cards. At one end of the spectrum, Supermicro populated its existing SuperServer 1028GQ-TFT (a dual-socket Xeon E5-based 1U machine) with four MI25 cards, yielding up to 100 TFLOPS of FP16 performance. Optional Xeon Phi cards don’t even support half-precision math, and the Tesla P100 add-in cards on Supermicro’s qualification list are slower than the mezzanine version, topping out at 18.7 TFLOPS each.

At the other end, Inventec discussed a 39U platform with 120 MI25 GPUs, six 2U 24-bay servers, and six 4U PCIe switches delivering up to 3 PFLOPS.

Naturally, AMD couldn’t help but tease an imminent future when its Zen CPUs and Radeon Instinct accelerators operate together in a heterogeneous compute environment. However, we’re at least six months away from the realization of such a combination, so speculation about its potential will have to wait.

Chris Angelini
Chris Angelini is an Editor Emeritus at Tom's Hardware US. He edits hardware reviews and covers high-profile CPU and GPU launches.
  • IceMyth
    Wow, I think AMD is now on the right track to beat Nvidia and stand on top (AMD fanboy comment). Deep learning and neural networks are the kind of things companies should strive for, whether Nvidia, Intel, or AMD. What I would love to see is a comparison of what the coming AMD GPUs will introduce to the public in terms of gaming, design, and all fields, but as the article said, it still needs some time.
  • AMD's CEO Lisa Su seems to really know what she's doing. She's a model example of why you sometimes need an engineer CEO instead of a sales type.
  • captaincharisma
    AMD has turned into a company that's all talk, but as always the results will be lackluster. But hey, it makes the shareholders happy.
  • jaber2
    Does this mean my next CPU will be a GPU?
  • falchard
    But AMD has been in this market for years, how are they late when they are several years early? They were even competing with nVidia on the contracts they got recently.
  • JakeWearingKhakis
    JABER2 - No, it's just that, because of the way GPUs are built to calculate, they happen to be very good at neural networks and things like bitcoin mining. I believe they will be (and already are) helping the CPU when it needs the help, so long as the GPU isn't busy with graphics processing at the time.

    CAPTAINCHARISMA - They are not all talk. They just went the wrong way with CPUs for a while, which is why hyperthreading is coming back with Zen. They aren't all talk with GPUs either. What you've seen this year is their low-to-mid range lineup. Once Vega and the 490X come, there is no doubt it will be wonderful tech.

    Nvidia doesn't make CPUs (other than Tegra?), and AMD, being the smaller company, can keep up without stealing technology, while keeping its work open source.

    Intel is a mega corporation, and AMD, with fewer than (I think) 10k employees, is on their ass. AMD gets way less credit than it deserves.
  • falchard
    It's amazing how much has changed in the last couple of months with the RX 480 and GTX 1060. Before, you could say the 1060 would perform better in DX11 and worse in Vulkan/DX12. Now, the RX 480 performs the same in DX11 and better in Vulkan/DX12. I guess this means the actual competition for the RX 480 8GB was the GTX 1070.
  • alidan
    18993940 said:
    Wow, I think AMD is now on the right track to beat Nvidia and stand on top (AMD fanboy comment). Deep learning and neural networks are the kind of things companies should strive for, whether Nvidia, Intel, or AMD. What I would love to see is a comparison of what the coming AMD GPUs will introduce to the public in terms of gaming, design, and all fields, but as the article said, it still needs some time.

    Tomorrow should be the stream for Zen. Given they also showed off a Vega GPU getting 70-100 FPS at 4K ultra in Doom, we may see Vega there too.

    18996427 said:
    It's amazing how much has changed in the last couple of months with the RX 480 and GTX 1060. Before, you could say the 1060 would perform better in DX11 and worse in Vulkan/DX12. Now, the RX 480 performs the same in DX11 and better in Vulkan/DX12. I guess this means the actual competition for the RX 480 8GB was the GTX 1070.

    Nah, the 1070 is too far ahead hardware-wise; there is only so much ground you can make up in drivers. The 480 sitting between a 1060 and a 1070 is about right. To beat a 1070 they would need to refine the process and make more of the silicon higher-end; some chips can get damn near a 1500MHz clock rate while some can barely go above base. If the process improves and they can put out more higher-end parts, it could compete with a 1070, but that likely won't happen unless they rebrand for Navi.
  • omikronsc
    "AMD said the MI6 and MI8 are ideal for GPU-accelerated inference, which is what a neural network does after it has gone through its more intensive training stage."

    So the MI6 and MI8 are crap. Amdahl's law says the best speedup comes from accelerating the LONGEST operation in a process. The image shows that training takes hours (days, in fact) and inference takes seconds (milliseconds, in fact), and AMD is best at speeding up the inference part.

    I'm wondering if this is what AMD really presented or whether the author of this article messed it up.
  • problematiq
    18997563 said:
    "AMD said the MI6 and MI8 are ideal for GPU-accelerated inference, which is what a neural network does after it has gone through its more intensive training stage."

    So the MI6 and MI8 are crap. Amdahl's law says the best speedup comes from accelerating the LONGEST operation in a process. The image shows that training takes hours (days, in fact) and inference takes seconds (milliseconds, in fact), and AMD is best at speeding up the inference part.

    I'm wondering if this is what AMD really presented or whether the author of this article messed it up.

    The problem with benchmarking the training portion is that it's heavily dependent on what you are training. It also depends on your training style: my brother and I trained the same thing (face recognition) on two boxes with identical hardware, yet it took his box a whole day and a half longer to train than mine. (His performed the task better in the end, though.)