AMD Navy Flounder and Sienna Cichlid GPU Specs Leak in ROCm Update

AMD Radeon RX 6000 (Image credit: AMD)

A sharp-eyed redditor found a listing in the new ROCm (Radeon Open Compute) firmware update that reveals some of the specs for the highly anticipated RDNA2 Sienna Cichlid (Big Navi) and Navy Flounder (Navi 22 or 23) graphics cards. Things could change over the next month as AMD fine-tunes its gear, so take these specs with a grain of salt: even though the numbers come from an official firmware update, they aren't confirmed.

The firmware update indicates Sienna Cichlid (Navi 21, i.e., Big Navi) will feature 80 CUs and a 256-bit memory bus. The CU count is interesting because it suggests the earlier Big Navi rumors could be accurate: most signs have pointed to 80 CUs, or thereabouts, as the spec. If this is true, and if the GPU runs on TSMC's latest 7nm process, we could see RTX 3080 levels of performance with much better efficiency.


Navi GPU Specifications from Linux Kernel Firmware Update:
Parameter	Navi 10	Navi 14	Navi 12	Sienna Cichlid (Big Navi)	Navy Flounder (Navi 22/23)
gc_num_se	2	1	2	4	2
gc_num_cu_per_sh	10	12	10	10	10
gc_num_sh_per_se	2	2	2	2	2
gc_num_rb_per_se	8	8	8	4	4
gc_num_tccs	16	8	16	16	12
gc_num_gprs	1024	1024	1024	1024	1024
gc_num_max_gs_thds	32	32	32	32	32
gc_gs_table_depth	32	32	32	32	32
gc_gsprim_buff_depth	1792	1792	1792	1792	1792
gc_double_offchip_lds_buffer	1024	512	1024	1024	1024
gc_wave_size	32	32	32	32	32
gc_max_waves_per_simd	20	20	20	16	16
gc_lds_size	64	64	64	64	64
num_sc_per_sh	1	1	1	1	1
num_packer_per_sc	2	2	2	4	4
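
The headline figures can be reproduced from the table itself. The sketch below multiplies shader engines by shader arrays per engine by CUs per array to get the CU total. It also estimates bus width from `gc_num_tccs`, assuming each TCC pairs with a 16-bit GDDR6 channel. That assumption matches the 256-bit Navi 10 but is not stated in the firmware, and it does not apply to the HBM2-based Navi 12. The names `GPUS`, `BITS_PER_TCC`, `total_cus`, and `bus_width_bits` are illustrative only.

```python
# Values copied from the table above:
# (gc_num_se, gc_num_sh_per_se, gc_num_cu_per_sh, gc_num_tccs)
GPUS = {
    "Navi 10": (2, 2, 10, 16),
    "Navi 14": (1, 2, 12, 8),
    "Navi 12": (2, 2, 10, 16),
    "Sienna Cichlid": (4, 2, 10, 16),
    "Navy Flounder": (2, 2, 10, 12),
}

# Assumption (not from the firmware): one TCC per 16-bit GDDR6 channel.
BITS_PER_TCC = 16

# Navi 12 uses HBM2, so the GDDR6 bus estimate is skipped for it.
HBM_PARTS = {"Navi 12"}


def total_cus(num_se, sh_per_se, cu_per_sh):
    """CUs on the full die: engines x arrays per engine x CUs per array."""
    return num_se * sh_per_se * cu_per_sh


def bus_width_bits(num_tccs):
    """Estimated GDDR6 bus width under the 16-bits-per-TCC assumption."""
    return num_tccs * BITS_PER_TCC


for name, (se, sh, cu, tccs) in GPUS.items():
    line = f"{name}: {total_cus(se, sh, cu)} CUs"
    if name not in HBM_PARTS:
        line += f", ~{bus_width_bits(tccs)}-bit bus (estimate)"
    print(line)
```

These are full-die figures; shipping cards can have some CUs disabled.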

The firmware specifications also list AMD's Navy Flounder with 40 CUs and a 192-bit memory bus. This GPU appears to be a direct replacement for the RX 5700 XT and/or RX 5700 on the new RDNA2 architecture. If each CU has the same 64 stream processors as RDNA1, then Navy Flounder will have the same core count as the 5700 XT (2,560).
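
The core-count comparison is simple multiplication. This is a minimal sketch that assumes, as the paragraph above does, that RDNA2 keeps RDNA1's 64 stream processors per CU. The names `SP_PER_CU` and `stream_processors` are illustrative.

```python
# RDNA1 figure; carrying it over to RDNA2 is an assumption, not a confirmed spec.
SP_PER_CU = 64


def stream_processors(cus, sp_per_cu=SP_PER_CU):
    """Total stream processors for a given CU count."""
    return cus * sp_per_cu


rx_5700_xt = stream_processors(40)      # Navi 10, 40 CUs
navy_flounder = stream_processors(40)   # 40 CUs per the firmware listing
sienna_cichlid = stream_processors(80)  # 80 CUs per the firmware listing

print(rx_5700_xt, navy_flounder, sienna_cichlid)
```

Under that assumption, Sienna Cichlid would carry twice the stream processors of the RX 5700 XT.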


Strangely, at 192 bits the memory bus is narrower than the 256-bit bus on Navi 10 chips (RX 5700/XT). AMD could be pulling an Ampere tactic by using higher-frequency GDDR6X memory to compensate for the lower bus width. Unfortunately, there is not enough information to indicate just where these GPUs will be placed in AMD's lineup: they could compete with the rumored RTX 3060 Ti, possibly the RTX 3070, or even other SKUs in the future.
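
To put the bus-width trade-off in numbers, peak memory bandwidth is bus width times per-pin data rate, divided by eight. The sketch below uses the RX 5700 XT's 256-bit, 14 Gbps GDDR6 configuration as the baseline. The memory speed for Navy Flounder is unknown, so 14 Gbps on the 192-bit bus is purely a hypothetical input, and the function names are illustrative.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def required_rate_gbps(target_gb_s, bus_width_bits):
    """Per-pin data rate needed to reach a target bandwidth on a given bus."""
    return target_gb_s * 8 / bus_width_bits


# Baseline: RX 5700 XT (256-bit, 14 Gbps GDDR6).
navi10 = bandwidth_gb_s(256, 14)

# Hypothetical: the same 14 Gbps memory on a 192-bit bus.
narrow = bandwidth_gb_s(192, 14)

# Data rate a 192-bit bus would need to match the baseline.
needed = required_rate_gbps(navi10, 192)

print(f"256-bit @ 14 Gbps: {navi10:.0f} GB/s")
print(f"192-bit @ 14 Gbps: {narrow:.0f} GB/s")
print(f"192-bit needs about {needed:.1f} Gbps to match")
```

With the same memory, the narrower bus loses a quarter of the bandwidth, which is why faster memory (or some other compensating mechanism) would be needed to keep pace with Navi 10.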

Hopefully, this time around, AMD can make a bigger dent in Nvidia's market share over the upcoming months and years. AMD will announce these new GPUs under the RX 6000 series branding on October 28th.


Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

14 comments from the forums

Endymio

> " if the GPU runs on TSMC's latest 7nm process, we could see RTX 3080-levels of performance, but with much better efficiency. ..."
How exactly is that conclusion reached? Isn't that the same node the 3080 is manufactured on?
craigss

no its samsung 8nm
Endymio

craigss said:
no its samsung 8nm
Yes, but I thought the two nodes were essentially equal in performance (and density), but that NVidia switched to Samsung because the thermals were better?
Reply
no TSMC is better
dwards

Endymio said:
Yes, but I thought the two nodes were essentially equal in performance (and density), but that NVidia switched to Samsung because the thermals were better?
No, I remember reading that Samsung’s node can hold 80 somethings per square mm, while TSMC’s can hold 96, which is exactly 20% more, so I assume that Nvidia could’ve made GPUs with better thermals if it used TSMC’s node instead of Samsung’s
spentshells

Nvidia went with samsung due to availability from what I've read, sorry no references to offer as it was a while ago. TSMC is super busy.
digitalgriffin

spentshells said:
Nvidia went with samsung due to availability from what I've read, sorry no references to offer as it was a while ago. TSMC is super busy.
It was a combination of factors. 8nm had a higher yield and depreciation cost so it was much cheaper per chip than the 7nm (Speculation) This decision was made 2 years ago. That's when you start to plan out the layout of the chip based on the fab rules. THIS IS ALL SPECULATION. But I cannot find fault in Adored's analysis.

https://www.youtube.com/watch?v=tXb-8feWoOE&t=0s

Plus 8nm Samsung was available. AMD and all the other seized that silicon capacity early on.
MasterMadBones

Endymio said:
Yes, but I thought the two nodes were essentially equal in performance (and density), but that NVidia switched to Samsung because the thermals were better?
Samsung 8nm is roughly equivalent to early-mid 2019 TSMC 7nm in terms of performance and yield, but efficiency is much worse. I'm not sure truly how efficient either of these nodes is, but when density is 20% higher and efficiency is 15% higher on TSMC, thermal density should be roughly the same. The absolute amount of heat that needs to be dissipated from the Samsung die is larger though, which has resulted in large and elaborate cooling solutions on the FE cards.

digitalgriffin said:
8nm had a higher yield and depreciation cost so it was much cheaper per chip than the 7nm (Speculation) This decision was made 2 years ago
This is comparing Samsung 8nm to Samsung 7nm. Samsung 7nm can be considered a 'broken' node.

digitalgriffin said:
Plus 8nm Samsung was available. AMD and all the other seized that silicon capacity early on.
Nvidia tried to bully TSMC into lowering prices but they didn't budge because demand for 7nm was already high. AMD took up a large part of the capacity that Nvidia was looking to get in the meantime.
Giroro

256-bit memory bus sounds like memory bandwidth could be too bottle-necked to reach 3080 performance, even if they use best available GDDR6X.
But, then again The Xbox Series S is targeting 1440p120 using only 10GB total system memory... so maybe Navi2 is some miracle architecture with way better memory efficiency than anybody thought was possible.
digitalgriffin

Giroro said:
256-bit memory bus sounds like memory bandwidth could be too bottle-necked to reach 3080 performance, even if they use best available GDDR6X.
But, then again The Xbox Series S is targeting 1440p120 using only 10GB total system memory... so maybe Navi2 is some miracle architecture with way better memory efficiency than anybody thought was possible.

It's the caching architecture. There is a very large cache on Navi. And to be honest this is the first step to MCM Navi chips that scale using a similar type of infinity fabric.

When you render to a small block, only a small subset of data is needed because blocks are relatively small. And you can only execute so many at a time. This limits the memory needed. BUT and this is a biggggggggg but, this is highly dependent on the look ahead predicting what memory is going to be needed for which tile and having it ready. That is some VERY tricky architectural challenges there. This might be possible as final texture render doesn't happen till later stages of the pipe. It could be easy to trip up the cache system. Textures are what eat up memory.
