Meet Polaris 10
Six months ago, AMD started teasing features its next-gen GPUs would offer, beginning with a display controller revamped to support HDMI 2.0b and DisplayPort 1.3 HBR3, FreeSync over HDMI and an HDR-capable pipeline. Other bits and pieces emerged in the weeks that followed, pointing to a launch that’d include two distinct GPUs deliberately constructed to reclaim market share in the mainstream desktop market and present a mobile solution offering console-class performance in thin and light form factors.
That latter design includes 16 of AMD’s Compute Units matched to a 128-bit memory bus and 4K video encode/decode acceleration. It’s still forthcoming. The Radeon RX 480 we have today is based on the larger Polaris 10 design. But it’s not large in the sense that Nvidia’s 15.3 billion-transistor GP100 processor is large. Rather, the GPU is just complex enough to drive today’s highest-end virtual reality headsets, putting it at least in the league of AMD’s Radeon R9 290 and Nvidia’s GeForce GTX 970.
Mid-range performance isn’t going to knock anyone’s socks off on its own, especially a month after GP104 redefined the high-end. But by pricing Radeon RX 480 well below similarly quick boards and limiting power consumption to 150W, AMD hopes to make VR accessible to more gamers (if only the companies selling $600 and $800 HMDs would play along).
We’re expecting two versions of the Radeon RX 480: a $200 model with 4GB of on-board GDDR5 operating at 7 Gb/s and a $240 version with 8GB of 8 Gb/s GDDR5. Naturally, we have the 8GB one on hand.
Inside of Polaris 10
Polaris 10 is composed of 5.7 billion transistors on a 230mm² die. Compare that to Hawaii’s 6.2 billion transistors on a 438mm² die. As you’ll see across our benchmark pages, RX 480 typically lands somewhere between R9 290 and 390…with fewer transistors and about 55% of the power budget. Much of that is naturally attributable to GlobalFoundries’ 14nm FinFET process, which AMD credits for delivering fundamental performance and power benefits over the 28nm node’s planar transistors. At any given power level, FinFET enables higher clocks. At a chosen frequency, a 14nm device uses less power. For Polaris, AMD is grabbing from both bins to bump clock rates up and cut consumption. That’s how it’s able to outperform more resource-rich GPUs like Hawaii at a 150W ceiling (though our measurements show RX 480 fudges a bit on its TDP).
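To put the process shrink in perspective, here is a quick sketch that turns the transistor counts and die sizes quoted above into transistor density. The density figure is our own derived number, not an AMD specification, and the function name is purely illustrative.

```python
# Transistor counts and die areas as quoted in the text above.
chips = {
    "Polaris 10": {"transistors": 5.7e9, "die_mm2": 230},
    "Hawaii": {"transistors": 6.2e9, "die_mm2": 438},
}

def density_mtr_per_mm2(transistors, die_mm2):
    """Millions of transistors per square millimeter (derived, not a vendor spec)."""
    return transistors / die_mm2 / 1e6

densities = {name: density_mtr_per_mm2(**spec) for name, spec in chips.items()}
for name, value in densities.items():
    print(f"{name}: {value:.1f} MTr/mm^2")

# How much more densely packed Polaris 10 is than Hawaii.
ratio = densities["Polaris 10"] / densities["Hawaii"]
print(f"Polaris 10 packs {ratio:.2f}x the transistors per mm^2")
```

By this rough measure, Polaris 10 lands near 24.8 million transistors per mm² against Hawaii's 14.2 million or so, roughly a 1.75x increase.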
Despite the new code-name, Polaris 10 is based on a fourth-gen implementation of AMD’s Graphics Core Next architecture. With this in mind, most enthusiasts already familiar with GCN are going to recognize the Polaris design’s building blocks, making our step-through of the design fairly straightforward.
A single Graphics Command Processor up front is still responsible for dispatching graphics queues to the Shader Engines. So too are the Asynchronous Compute Engines tasked with handling compute queues. Only now AMD says its command processing logic consists of four ACEs instead of eight, with two Hardware Scheduler units in place for prioritized queues, temporal/spatial resource management and offloading CPU kernel mode driver scheduling tasks. These aren’t separate or new blocks per se, but rather an optional mode the existing pipelines can run in. Dave Nalasco, senior technology manager for graphics at AMD, helps clarify their purpose:
"The HWS (Hardware Workgroup/Wavefront Schedulers) are essentially ACE pipelines that are configured without dispatch controllers. Their job is to offload the CPU by handling the scheduling of user/driver queues on the available hardware queue slots. They are microcode-programmable processors that can implement a variety of scheduling policies. We used them to implement the Quick Response Queue and CU Reservation features in Polaris, and we were able to port those changes to third-generation GCN products with driver updates."
Quick Response Queues allow developers to prioritize certain tasks running asynchronously without preempting other processes entirely. Dave's blog post on this feature goes into more detail. In short, though, flexibility is the point AMD wants to drive home. Its architecture allows multiple approaches to improving utilization and minimizing latency, both of which are immensely important in applications like VR.
The Compute Units we know so well consist of 64 IEEE 754-2008-compliant shaders split between four vector units, a scalar unit and 16 texture fetch load/store units. Each CU also hosts four texture units, 16KB of L1 cache, a 64KB local data share, and register space for the vector and scalar units. AMD says it made a number of tweaks to improve the CU’s efficiency, including the addition of native FP16 (and Int16) support, tuned cache access and better instruction prefetching. Altogether, the changes purportedly yield up to 15% more performance per CU than the Radeon R9 290’s Hawaii GPU, which is based on a second-gen GCN architecture.
Nine CUs are organized into a Shader Engine, and Polaris 10 boasts four such SEs, consistent with what we know to be the architecture’s maximum. The math (64 shaders * nine CUs * four SEs) adds up to 2304 Stream processors and 144 texture units.
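The resource math above is simple enough to check in a few lines. This sketch uses only the per-CU and per-SE counts given in the text; the constant names are ours.

```python
# Per-unit counts taken from the text above.
SHADERS_PER_CU = 64
TEXTURE_UNITS_PER_CU = 4
CUS_PER_SE = 9
SHADER_ENGINES = 4

total_cus = CUS_PER_SE * SHADER_ENGINES
stream_processors = SHADERS_PER_CU * total_cus
texture_units = TEXTURE_UNITS_PER_CU * total_cus

print(f"{total_cus} CUs, {stream_processors} Stream processors, {texture_units} texture units")
```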
Each Shader Engine is associated with a Geometry Engine, which AMD says it improved by adding a primitive discard accelerator for tossing any primitive that won't rasterize to a pixel prior to scan conversion, thus increasing throughput. This is an automatic function of the graphics pipeline's pre-rasterization stage, and is entirely new to Polaris. There's also an index cache for instanced geometry, though we're not sure how large this is, or how significant its impact is when instancing is used.
Similar to Hawaii, Polaris 10 is capable of up to four primitives per clock cycle. But whereas the quickest Hawaii/Grenada-based GPUs run at up to 1050MHz (in the case of R9 390X), AMD pushes Radeon RX 480 to a base clock rate of 1120MHz and a "boost" rating of 1266MHz, compensating for some of what it loses in on-die resources using higher frequencies. Whereas Radeon R9 290X offered 5.6 TFLOPS of single-precision floating-point performance, RX 480 reaches up to 5.8 TFLOPS using that "boost" specification.
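The TFLOPS ratings follow from shader count and clock rate. Here is a minimal sketch, assuming the usual convention of two floating-point operations per shader per clock (one fused multiply-add). The R9 290X inputs of 2816 shaders at up to 1000MHz are our assumption and do not appear in the text above; the function name is illustrative.

```python
def peak_tflops(shaders, clock_mhz, flops_per_clock=2):
    """Theoretical FP32 peak, assuming one fused multiply-add (2 FLOPs) per shader per clock."""
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12

# RX 480: 2304 Stream processors at the base and "boost" clocks quoted above.
rx480_base = peak_tflops(2304, 1120)
rx480_boost = peak_tflops(2304, 1266)

# Assumption: R9 290X with 2816 shaders at up to 1000MHz (figures not given in the text).
r9_290x = peak_tflops(2816, 1000)

print(f"RX 480 base:  {rx480_base:.2f} TFLOPS")
print(f"RX 480 boost: {rx480_boost:.2f} TFLOPS")
print(f"R9 290X:      {r9_290x:.2f} TFLOPS")
```

Under those assumptions, the boost figure rounds to the 5.8 TFLOPS AMD quotes, while the base clock works out closer to 5.2 TFLOPS.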
Just how realistic is the 1266MHz number? Hawaii had a real issue maintaining AMD's clock rate spec as it got hot, and we wanted to make sure the same behavior doesn't affect Polaris. Using Metro: Last Light Redux's built-in benchmark looped 10 times, we recorded frequencies using GPU-Z and got the following graph:
There's about 147MHz between the lowest and highest points on this line chart. The floor is 1118MHz and the ceiling is 1265MHz. We'd say AMD nails its base and boost ratings almost exactly, even if what happens in between is subject to constant adjustment. At least an average of 1208MHz is closer to the top than the bottom.
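To show where that average sits within the observed range, here is a small sketch using only the floor, ceiling and average reported above. The variable names are ours, and the result describes this one logged run, not the card's behavior in general.

```python
# Values read from our GPU-Z log, as reported above (MHz).
floor_mhz = 1118
ceiling_mhz = 1265
average_mhz = 1208

span_mhz = ceiling_mhz - floor_mhz
# 0.0 would mean the average sits at the floor, 1.0 at the ceiling.
position = (average_mhz - floor_mhz) / span_mhz

print(f"Observed span: {span_mhz}MHz")
print(f"Average sits {position:.0%} of the way from floor to ceiling")
```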
The Hawaii and Fiji SEs have four render back-ends each, capable of 16 pixels per clock (or 64 across the GPU). Polaris 10 cuts that figure in half. Two render back-ends per SE, each with four ROPs, total 32 pixels per clock. This is a significant reduction compared to the Hawaii-based Radeon R9 290 AMD needs to beat with its RX 480. To compound matters, Polaris 10 employs a 256-bit memory bus—much narrower than Hawaii’s aggregate 512-bit path. A 4GB version of Radeon RX 480 will include 7 Gb/s GDDR5, enabling 224 GB/s of bandwidth, while the 8GB model we’re testing today utilizes 8 Gb/s memory, boosting throughput to 256 GB/s. Still, that’s a lot less than R9 290’s 320 GB/s.
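Both back-end figures fall out of simple multiplication. The sketch below assumes peak bandwidth equals bus width times per-pin data rate divided by eight bits per byte, and that each render back-end holds four ROPs on both GPUs. The R9 290's 5 Gb/s data rate is our assumption, chosen because it matches the 320 GB/s quoted above on a 512-bit bus. Function names are illustrative.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

def pixels_per_clock(shader_engines, backends_per_se, rops_per_backend=4):
    """Peak pixel output per clock across the whole GPU."""
    return shader_engines * backends_per_se * rops_per_backend

rx480_4gb = bandwidth_gb_s(256, 7)
rx480_8gb = bandwidth_gb_s(256, 8)
# Assumption: 5 Gb/s GDDR5 on R9 290 (not stated in the text).
r9_290 = bandwidth_gb_s(512, 5)

polaris_rops = pixels_per_clock(shader_engines=4, backends_per_se=2)
hawaii_rops = pixels_per_clock(shader_engines=4, backends_per_se=4)

print(f"RX 480 4GB: {rx480_4gb:.0f} GB/s, RX 480 8GB: {rx480_8gb:.0f} GB/s, R9 290: {r9_290:.0f} GB/s")
print(f"Pixels per clock: Polaris 10 {polaris_rops}, Hawaii {hawaii_rops}")
```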
Some of the deficit is offset with improved delta color compression, which reduces the amount of information transferred across the bus. AMD now supports 2/4/8:1 lossless ratios, similar to Nvidia’s Pascal architecture. Polaris 10 also benefits from the larger 2MB L2 cache first seen on Fiji. This can help dial back on trips to GDDR5, further reducing the GPU’s reliance on a wide bus and high data rates.
Still, leaning out the GPU’s back end must have an impact on performance as resolution and anti-aliasing utilization increase. Curious about how Polaris compares to Hawaii as the workload intensifies, we fired up Grand Theft Auto V at a modest 1920x1080 with Very High detail settings, then started scaling up anti-aliasing.
Sure enough, you can see the Radeon RX 480 bleeding off average frame rate much faster than the R9 390 as MSAA is toggled from Off to 2x to 4x. With AA disabled, the 480 achieves 97.3 FPS to the 390’s 90.4. But by the end, AMD’s Radeon RX 480 is down to 57.5 frames per second while the 390 averages 62.9.
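Expressed as a percentage of each card's starting frame rate, the gap is easier to see. This sketch uses only the four averages quoted above, with the end point being 4x MSAA; the helper name is ours.

```python
def percent_drop(before_fps, after_fps):
    """Frame rate lost, as a percentage of the starting value."""
    return (before_fps - after_fps) / before_fps * 100

# Average frame rates from our Grand Theft Auto V run: MSAA off, then 4x MSAA.
rx480_drop = percent_drop(97.3, 57.5)
r9_390_drop = percent_drop(90.4, 62.9)

print(f"Radeon RX 480 loses {rx480_drop:.1f}% of its frame rate")
print(f"Radeon R9 390 loses {r9_390_drop:.1f}% of its frame rate")
```

In other words, the RX 480 gives up roughly 41% of its average frame rate going from no AA to 4x MSAA, compared to about 30% for the R9 390.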