Radeon Instinct MI100 Arcturus Early Specifications Show Impressive 200W TDP

Radeon Instinct Accelerator (Image credit: AMD)

It's been a while since we've seen any leaks regarding Arcturus, AMD's rumored upcoming professional accelerator. Respected hardware leaker @KOMACHI_ENSAKA has shared what appears to be the potential specifications for the Radeon Instinct MI100, which is reportedly based on the Arcturus silicon.

Although it's not confirmed, Arcturus is believed to be a derivative of AMD's Vega microarchitecture. Like the other Radeon Instinct accelerators, Arcturus is likely coming out of TSMC's 7nm FinFET furnace. However, it remains to be seen whether it'll be the 7nm or 7nm+ process node.

An early prototype of the Radeon Instinct MI100 suggests that the accelerator utilizes a variant (D34303) of the Arcturus XL die. It seemingly runs with a 1,090 MHz base clock and 1,333 MHz boost clock. There is no mention of the number of Stream Processors (SPs) on the Radeon Instinct MI100, but the Arcturus silicon is rumored to carry up to 128 Compute Units (CUs), which would equal a whopping 8,192 SPs.

It's not carved in stone, but the model name usually holds some clue to the accelerator's performance numbers. The Radeon Instinct MI60 and MI50 accelerators offer up to 58.9 TFLOPS and 53 TFLOPS of peak INT8 performance, respectively. Therefore, it seems reasonable to assume that the Radeon Instinct MI100's INT8 performance will scale up to the 100 TFLOPS mark.

| | Instinct MI100* | Instinct MI60 | Instinct MI50 (32GB) | Instinct MI50 (16GB) |
|---|---|---|---|---|
| Architecture (GPU) | Arcturus (Arcturus XL) | GCN 5.1 (Vega 20) | GCN 5.1 (Vega 20) | GCN 5.1 (Vega 20) |
| Compute Units | ? | 64 | 60 | 60 |
| Stream Processors | ? | 4,096 | 3,840 | 3,840 |
| Peak Half Precision (FP16) Performance | ? | 29.5 TFLOPS | 26.5 TFLOPS | 26.5 TFLOPS |
| Peak Single Precision (FP32) Performance | ? | 14.7 TFLOPS | 13.3 TFLOPS | 13.3 TFLOPS |
| Peak Double Precision (FP64) Performance | ? | 7.4 TFLOPS | 6.6 TFLOPS | 6.6 TFLOPS |
| Peak INT8 Performance | ? | 58.9 TFLOPS | 53 TFLOPS | 53 TFLOPS |
| Memory Size | 32GB | 32GB | 32GB | 16GB |
| Memory Type | HBM2 | HBM2 | HBM2 | HBM2 |
| Memory Clock | 1 GHz - 1.2 GHz | 1 GHz | 1 GHz | 1 GHz |
| Memory Interface | ? | 4,096-bit | 4,096-bit | 4,096-bit |
| Memory Bandwidth | ? | 1,024 GBps | 1,024 GBps | 1,024 GBps |
| Total Board Power | 200W | 300W | 300W | 300W |

*Specifications are unconfirmed.

The Radeon Instinct MI100 will reportedly show up with 32GB of HBM2 memory that could operate at 1 GHz or 1.2 GHz. The Radeon Instinct MI60 and MI50 have their memory running at 1 GHz across a 4,096-bit memory interface to provide a memory bandwidth up to 1,024 GBps. 

If the Radeon Instinct MI100 retains the 4,096-bit memory bus and 1 GHz memory, it would deliver the same level of memory bandwidth as the Radeon Instinct MI60 and MI50. However, if the memory is clocked at 1.2 GHz, the Radeon Instinct MI100 could supply memory bandwidth of up to 1,229 GBps.
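The bandwidth arithmetic here is straightforward: HBM2 transfers data on both clock edges, so effective bandwidth is twice the memory clock times the bus width. A quick sketch; the 1.2 GHz clock and the retained 4,096-bit bus for the MI100 are assumptions from the leak:

```python
def hbm2_bandwidth_gbps(mem_clock_ghz: float, bus_width_bits: int) -> float:
    # HBM2 is double data rate: two transfers per clock cycle.
    # GBps = clock (GHz) * 2 transfers/cycle * bus width (bits) / 8 bits per byte
    return mem_clock_ghz * 2 * bus_width_bits / 8

print(hbm2_bandwidth_gbps(1.0, 4096))  # 1024.0 GBps -- matches the MI60/MI50
print(hbm2_bandwidth_gbps(1.2, 4096))  # 1228.8 GBps -- the rumored MI100 upper bound
```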

The leaker highlights that the test board for the Radeon Instinct MI100 is rated for 200W; however, the final product could vary. Assuming AMD maintains this value, the Radeon Instinct MI100 would be a remarkably efficient performance monster, considering that the existing Radeon Instinct MI60 and MI50 conform to a 300W TBP (Total Board Power). A 100W improvement almost sounds too good to be true, but we're crossing our fingers that AMD can pull it off.

AMD reorganized its Radeon Instinct accelerator product stack a few months ago. The chipmaker relegated the Radeon Instinct MI60, which was the previous flagship, to a request basis. The Radeon Instinct MI50 (32GB) has since taken the Radeon Instinct MI60's place on the throne. However, AMD will, in all likelihood, pass the flagship mantle down to the Radeon Instinct MI100 once the Arcturus-powered accelerator debuts.

  • bit_user
    Phoronix has long reported that open source driver patches indicate Arcturus will lack any 3D graphics hardware engines. This is to be a pure compute-accelerator die.

    https://www.phoronix.com/scan.php?page=news_item&px=AMD-Arcturus-Linux-5.5
    The Radeon Instinct MI60 and MI50 accelerators offer up to 58.9 TFLOPs and 53 TFLOPs of peak INT8 performance
    Two issues, here. Both nit picks, but it's what I do.
    The 'S' in TFLOPS should be capitalized, because it's part of the unit: Trillion Floating Point Operations Per Second.
    When describing integer performance, you omit the "Floating Point" part, leaving just TOPS. I noticed the table repeats these same errors. TFLOPs should be either TFLOPS or TOPS, depending on the row.
    Reply
  • alextheblue
    bit_user said:
    Phoronix has long reported that open source driver patches indicate Arcturus will lack any 3D graphics hardware engines. This is to be a pure compute-accelerator die.
    It could be built on Vega or Vega-derived CUs (as rumored), and they disable anything unnecessary. Only time will tell. Not sure how much I buy any of these rumored specs though.
    Reply
  • bit_user
    alextheblue said:
    It could be built on Vega or Vega-derived CUs (as rumored), and they disable anything unnecessary. Only time will tell. Not sure how much I buy any of these rumored specs though.
    Lisa Su has acknowledged that AMD will be pursuing a bifurcated strategy of HPC and consumer products. We've already seen the beginnings of this, with Vega 20, however it makes sense that they could go further.

    I'm not really sure what would be gained by disabling "anything unnecessary", as they already do clock gating that dynamically powers down parts of the chip that are idle. I've heard estimates that graphics hardware blocks consume up to 25% of their die space, which would only increase with things like ray tracing and some of their other recent additions (DSBR, mesh shaders, etc.). So, the incentive is there to reclaim that for general-purpose compute hardware.
    Reply
  • alextheblue
    bit_user said:
    Lisa Su has acknowledged that AMD will be pursuing a bifurcated strategy of HPC and consumer products. We've already seen the beginnings of this, with Vega 20, however it makes sense that they could go further.
    Yeah that's been the case for them internally for a while now I suspect, given RDNA's focus. It remains to be seen how many resources they will throw at each design.
    bit_user said:
    I'm not really sure what would be gained by disabling "anything unnecessary", as they already do clock gating that dynamically powers down parts of the chip that are idle. I've heard estimates that graphics hardware blocks consume up to 25% of their die space, which would only increase with things like ray tracing and some of their other recent additions (DSBR, mesh shaders, etc.). So, the incentive is there to reclaim that for general-purpose compute hardware.
    I'm not sure if there's anything else in the chip related to that which they could gate, that they aren't already. I mostly meant if they are using existing designs, and the graphics hardware is present, they aren't exposing it in the drivers. I know there's incentive to get rid of the superfluous blocks, but I don't know if we're going to see a redesign like that so soon. Especially since that piece of silicon couldn't also be used in professional graphics cards. What would they use for those? RDNA? Older Vega? Or would they end up with three designs? Who knows at this stage. :p
    Reply
  • bit_user
    alextheblue said:
    Especially since that piece of silicon couldn't also be used in professional graphics cards. What would they use for those? RDNA? Older Vega?
    Nvidia and AMD both offer workstation cards that mirror their consumer range, even reusing the same consumer chips, but with a few professional features enabled.

    alextheblue said:
    Or would they end up with three designs? Who knows at this stage. :p
    Nvidia's P100 and V100 are good examples, here. I suspect neither saw much use as actual graphics cards. They were simply too expensive and didn't offer enough performance advantage vs. the top-end consumer GPUs.

    I think the Titan V was mainly sold as a lower-cost deep learning accelerator. I'm betting most people who bought them weren't using them for gaming or other graphics tasks.
    Reply
  • alextheblue
    bit_user said:
    Nvidia and AMD both offer workstation cards that mirror their consumer range, even reusing the same consumer chips, but with a few professional features enabled.
    I know. I was specifically referring to a hypothetical graphics-less piece of silicon. They wouldn't be able to use that silicon in a pro card, which puts their future workstation cards in an interesting position. Would they be using GCN, or some variant of RDNA? If GCN, a new generation, or a rehash?
    bit_user said:
    Nvidia's P100 and V100 are good examples, here. I suspect neither saw much use as actual graphics cards. They were simply too expensive and didn't offer enough performance advantage vs. the top-end consumer GPUs.

    I think the Titan V was mainly sold as a lower-cost deep learning accelerator. I'm betting most people who bought them weren't using them for gaming or other graphics tasks.
    Those both had Quadro models obviously, but I'm not sure how big the market is for that level of graphical capability. I wouldn't have suggested anyone game on these, nor do I suspect many people bought a Titan V for gaming. My point is ditching the graphics would limit the market for that particular design, and having additional layouts costs them precious resources. Not sure if it's worth it. Guess we'll find out.
    Reply
  • bit_user
    alextheblue said:
    I know. I was specifically referring to a hypothetical graphics-less piece of silicon. They wouldn't be able to use that silicon in a pro card,
    Right, which is how we ended up with the example of Quadro P100 and V100.

    alextheblue said:
    Those both had Quadro models obviously, but I'm not sure how big the market is for that level of graphical capability. I wouldn't have suggested anyone game on these, nor do I suspect many people bought a Titan V for gaming.
    Yeah, and I'm trying to say that even for professional graphics, the performance per $ is nearly as lousy. I just don't believe the majority of these get purchased for graphics. Aside from V100 being used to prototype their interactive ray tracing, I'm betting most Quadro P100 and V100 cards are purchased for deep learning or GPU compute. You can't justify their price any other way.

    alextheblue said:
    My point is ditching the graphics would limit the market for that particular design, and having additional layouts costs them precious resources. Not sure if it's worth it. Guess we'll find out.
    If almost nobody is buying them for graphics tasks, then the size of the market you're foreclosing is almost zero.

    Besides, consider this: AMD could still include the display engine and video codec engine (which you need to be competitive for video processing). Even if they drop the rest of the hardware assist, they could still emulate all of that stuff in software. So, one could still use it as a graphics card. I doubt they'll go to all of that trouble, but it's possible.
    Reply