Nvidia Details RTX 30-Series Core Enhancements

RTX 3080 Ampere architecture
(Image credit: Nvidia)

Nvidia revealed the RTX 30-series Ampere architecture on September 1, celebrating the 21st anniversary of its first GPU, the GeForce 256. The features and specifications certainly look impressive, as you can read in our GeForce RTX 3090, GeForce RTX 3080, and GeForce RTX 3070 breakdowns. However, we ended up with quite a few questions, and Nvidia provided plenty of additional information that we're summarizing here. We'll be adding much of this to our main Ampere architecture hub, so this article covers only the new details.

(Image credit: Nvidia)

First, let's talk about the Ampere streaming multiprocessor (SM). The biggest change for gaming is likely the doubling of FP32 performance. Each SM now has two FP32 clusters, providing for up to 128 FMA (fused multiply-add) operations per cycle. Half of that throughput comes from cores that can run either FP32 or INT work, while the other half comes from FP32-only cores. That might sound like a potential problem, but generally speaking (particularly for gaming workloads) FP32 is the most important, INT less so. It's a balanced approach to boost overall performance without bloating the core too much.
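To put the 128 FMA per cycle figure in context, here's a rough back-of-the-envelope sketch in Python. The SM count (68) and boost clock (1.71 GHz) are figures we're assuming for illustration, not numbers from Nvidia's briefing, and the second function is a deliberately simplified toy model of the shared FP32/INT path rather than a description of how the hardware actually schedules work.

```python
def peak_fp32_tflops(sm_count: int, boost_clock_ghz: float,
                     fma_per_sm_per_clock: int = 128) -> float:
    """Peak FP32 rate in TFLOPS. Each FMA counts as two floating-point ops."""
    flops_per_clock = sm_count * fma_per_sm_per_clock * 2
    return flops_per_clock * boost_clock_ghz / 1000.0  # GFLOPS -> TFLOPS


def fp32_per_clock_shared(int_fraction: float, lanes_per_path: int = 64) -> float:
    """Toy model (an assumption, not Nvidia's scheduler): one FP32-only path
    plus one path that can issue either FP32 or INT each clock.

    int_fraction is the share of all issued instructions that are INT.
    Returns the FP32 operations issued per SM per clock.
    """
    if not 0.0 <= int_fraction <= 1.0:
        raise ValueError("int_fraction must be between 0 and 1")
    total_lanes = 2 * lanes_per_path
    if int_fraction * total_lanes <= lanes_per_path:
        # All INT work fits on the shared path, so both paths stay full.
        issued = float(total_lanes)
    else:
        # The shared path is the bottleneck because only it can run INT.
        issued = lanes_per_path / int_fraction
    return issued * (1.0 - int_fraction)


# Hypothetical configuration: 68 SMs at 1.71 GHz.
full_rate = peak_fp32_tflops(68, 1.71)
half_rate = peak_fp32_tflops(68, 1.71, fma_per_sm_per_clock=64)
```

With those assumed inputs, `full_rate` works out to roughly 29.8 TFLOPS, exactly double what the same chip would manage with 64 FMA per SM per clock. In the toy model, an all-FP32 workload gets the full 128 operations per clock, while a 50/50 FP32/INT mix drops to 64, which is why the instruction mix matters for how much of the doubling a real game sees.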

To help feed the beast (TM!), the data path was doubled, along with L1 bandwidth. L1 capacity is also 33% larger, with twice the partition size.

One of the other changes made is that Ampere can simultaneously run work through the CUDA cores, RT cores, and Tensor cores. This allows a game to run DLSS to upscale one frame while at the same time doing the CUDA and RT calculations for the next frame, cutting down on rendering time and improving overall performance.
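As a simple way to think about that overlap, consider the sketch below. It's an idealized model with made-up timings: it assumes the upscaling step for one frame can be hidden entirely behind the shading and ray tracing work for the next frame, which real hardware won't achieve perfectly since the units share caches, bandwidth, and power.

```python
def serial_frame_ms(render_ms: float, upscale_ms: float) -> float:
    """Frame interval when upscaling must finish before the next frame starts."""
    return render_ms + upscale_ms


def overlapped_frame_ms(render_ms: float, upscale_ms: float) -> float:
    """Idealized steady-state frame interval when upscaling frame N runs
    at the same time as the CUDA and RT work for frame N+1."""
    return max(render_ms, upscale_ms)


# Hypothetical timings, chosen only for illustration.
render_ms = 6.0    # CUDA + RT work per frame
upscale_ms = 1.5   # Tensor core upscaling per frame

serial = serial_frame_ms(render_ms, upscale_ms)
overlapped = overlapped_frame_ms(render_ms, upscale_ms)
```

In this idealized case the frame interval falls from 7.5 ms to 6.0 ms. Actual gains will be smaller and will vary by game.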

(Image credit: Nvidia)

For the RT cores, Ampere also added functionality to interpolate triangle position. This is particularly important for things like motion blur, where not every triangle used to render a scene is at the same position or time. I'm still not a huge fan of motion blur in games, even if it might be more realistic looking, but whatever. This change potentially speeds up ray traversal by 8X, so it's an important addition.
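Conceptually, interpolating a triangle's position means each ray carries a timestamp, and the triangle is tested where it sits at that moment. The Python sketch below illustrates the idea using plain linear interpolation between a start and end position. The function names are our own, and this is not a description of how the RT core implements it in hardware.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linearly interpolate between two points, with t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def triangle_at(start: Triangle, end: Triangle, t: float) -> Triangle:
    """Position of a moving triangle at time t, where t=0 is the start of
    the exposure (shutter open) and t=1 is the end (shutter close)."""
    return tuple(lerp(v0, v1, t) for v0, v1 in zip(start, end))


# A triangle that slides two units along the x axis during the exposure.
tri_start: Triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tri_end: Triangle = ((2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0))

# A ray stamped with t=0.5 sees the triangle halfway along its path.
tri_mid = triangle_at(tri_start, tri_end, 0.5)
```

Averaging the results of many rays with different timestamps is what produces the blur.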

That's it for the truly new information. Much of the remainder consists of previously known details, but we've provided the full slide deck below for those who want to see more. There are additional details looking into the performance of Wolfenstein: Youngblood, as well as RTX IO (which we've covered elsewhere in our discussion of Microsoft DirectStorage and RTX IO).

Nvidia Ampere architecture - more details and performance metrics (slide deck, 50 images)

(Image credit: Nvidia)
  • hannibal
    So this is indeed Nvidia's version of AMD's Bulldozer... ;)
    Interesting to see!
    In practice they just widened the data pathway...