AMD Radeon HD 7970: Promising Performance, Paper-Launched
By , Igor Wallossek,
1. Radeon HD 7970: A Holiday Surprise That You Can't Buy

Leading into December, we didn’t really expect to see a next-generation graphics card in the 31 days before 2012. In fact, even mid-month, after we’d already been briefed, the plan was to launch in January. Windows 8 and its accompanying DirectX 11.1 API update aren’t expected for months still, and today’s high-end graphics cards are well-equipped to handle modern games. Despite the fact that AMD purportedly stopped production of its Radeon HD 6990 months ago, we were worried that rumors of poor 28 nm yields at TSMC meant there was no way a new GPU could be readied in time.

When AMD moved its launch date up to today, we were even more bowled over. The official line from AMD was that “After collecting feedback from our partners and evaluating our overall readiness…we believe this new date allows us to get ahead of the Christmas season rush and CES.” Getting ahead of the Christmas rush by launching 72 hours before the big day is a tough line to swallow, especially after a follow-up confirming that cards won't be shipping until January 9th. The unfortunate result is that a lot of AMD’s software partners were unprepared to provide us with the applications needed to properly test the GPU’s new features. So, this article officially goes down as a preview, rather than a review. We will, of course, follow up when all of the proper tools are available for testing.

Meet Radeon HD 7970

Regardless of whether or not it’s ready for the world, or the world is ready for it, AMD’s Radeon HD 7970 is up and running in the Tom’s Hardware lab. This card is no minor revision of the Radeon HD 6000 series. The company’s “Southern Islands” architecture was re-designed from the ground up with a long list of new features and capabilities, including DirectX 11.1 compatibility. Composed of 4.31 billion transistors etched on a 28 nm process, the flagship Tahiti GPU sports about 160% of the Cayman design’s reported transistor count. However, adopting the latest lithography node allows AMD to cram that extra complexity into a 365 mm² die, which is smaller than its predecessor’s 389 mm² surface area.

Before we delve into the major architectural redesign, let’s have a closer look at the new card’s specifications compared to its competition.


| | Radeon HD 7970 | Radeon HD 6970 | Radeon HD 6990 | GeForce GTX 580 |
|---|---|---|---|---|
| Stream processors | 2048 | 1536 | 3072 | 512 |
| Texture Units | 128 | 96 | 192 | 64 |
| Full Color ROPs | 32 | 32 | 64 | 48 |
| Graphics (Shdr) Clock | 925 MHz | 880 MHz | 830 MHz | 772 (1544) MHz |
| Texture Fillrate | 118.4 Gtex/s | 84.5 Gtex/s | 159.4 Gtex/s | 49.4 Gtex/s |
| Memory Clock | 1375 MHz | 1375 MHz | 1250 MHz | 1002 MHz |
| Memory Bus | 384-bit | 256-bit | 2x 256-bit | 384-bit |
| Memory Bandwidth | 264 GB/s | 160 GB/s | 160 GB/s | 192.4 GB/s |
| Graphics RAM | 3 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5 | 1.5-3 GB GDDR5 |
| Die Size | 365 mm² | 389 mm² | 2 x 389 mm² | 520 mm² |
| Transistors (Billion) | 4.31 | 2.64 | 5.28 | 3 |
| Process Technology | 28 nm | 40 nm | 40 nm | 40 nm |
| Power Connectors | 1 x 8-pin, 1 x 6-pin | 1 x 8-pin, 1 x 6-pin | 2 x 8-pin | 1 x 8-pin, 1 x 6-pin |
| Maximum power (TDP) | 250 W | 250 W | 375 W | 244 W |
| Price | $549 MSRP | $340-$380 (Newegg) | $700-$750 (EOL) | $500-$530 (1.5 GB), $590-$730 (3 GB) |


This product boasts notable advantages over the Radeon HD 6970, with 33% more stream processors and texture units, and a 65% net memory bandwidth increase thanks to its 384-bit memory bus. The only specifications that these cards share are 32 ROPs and a 250 W TDP. Based on those figures alone (and the fact that this is apparently going to be a $550 card), we’d expect the Radeon HD 7970 to decimate the 6970, edge past the GeForce GTX 580, and fall behind AMD’s Radeon HD 6990. There’s frankly a lot more to this story than gaming performance, though, and we’ll get to that in an in-depth exploration of AMD’s new Graphics Core Next architecture.
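The percentages quoted above can be checked directly against the specification table. The short Python sketch below does that arithmetic; the dictionary layout and the `percent_gain` helper are our own naming for illustration, and the numbers are simply the table's figures.

```python
# Figures copied from the specification table; names are illustrative only.
SPECS = {
    "Radeon HD 7970": {"stream_processors": 2048, "texture_units": 128, "bandwidth_gbs": 264.0},
    "Radeon HD 6970": {"stream_processors": 1536, "texture_units": 96, "bandwidth_gbs": 160.0},
}


def percent_gain(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


new_card = SPECS["Radeon HD 7970"]
old_card = SPECS["Radeon HD 6970"]
gains = {key: percent_gain(new_card[key], old_card[key]) for key in new_card}

for key, gain in gains.items():
    print(f"{key}: +{gain:.0f}%")
```

Run as written, this reports roughly a third more stream processors and texture units and about 65% more memory bandwidth, matching the figures in the text.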

But first, we’ll share what we know about the Radeon HD 7000 series. Despite rumors to the contrary, all of the 28 nm Radeon 7000 series GPUs, previously code-named Southern Islands, are based on the Graphics Core Next architecture. That includes the Radeon HD 7700 series (Cape Verde Core), 7800 series (Pitcairn), and 7900 series (Tahiti), at the very least. AMD may include some 40 nm products under the 7000-series umbrella, and those would employ rebranded VLIW4/5 architectures.

The Southern Islands-based cards share the same features and abilities, which is good news. Here is a slide showing the placement of new product families relative to the Radeon HD 6000 series:

As you’ll see in our tests, the Radeon HD 7900 series appears to perform as its position in the deck would suggest. Note the Q1, 2012 expected date and the unnamed dual-GPU product at the top of the food chain.

With the relative performance of Radeon HD 7000-series cards established by AMD’s marketing department, let’s have a look at the family’s unique features. We’ll start with the basics: the Southern Islands architecture.

2. Graphics Core Next: The Southern Islands Architecture

While the Radeon HD 7970 is the first commercially-available product based on AMD’s Graphics Core Next architecture, the design itself is certainly no secret. In order to give developers some lead time to better exploit its upcoming hardware, Graphics Core Next was exposed in June 2011 at the Fusion Developer Summit. According to Eric Demers, the CTO of AMD’s graphics division, the existing VLIW architecture that was released with Radeon 2000 could still be leveraged for more graphics potential. However, it’s limited in general-purpose computing tasks. Instead of massaging old technology yet again, the company chose to invest in a completely new architecture. 

More compute performance and flexibility are great, but gaming alacrity and visual quality remain the most pertinent responsibilities of high-end desktop graphics hardware. Thus, AMD’s challenge was to create a GPU with a broader focus, simultaneously improving the 3D experience. In order to do that, the company abandoned the Very Long Instruction Word architecture in favor of Graphics Core Next.

The Efficiency Advantage Of Graphics Core Next

AMD’s VLIW architecture is very efficient at handling graphics instructions. Its compiler is optimized for mapping dot product math, which is at the heart of 3D graphics calculations. The design’s weakness is exposed when it has to schedule the scalar instructions seen in more general-purpose applications, though. Sometimes, it turns out that an instruction set, called a wavefront, can’t execute until another wavefront has been resolved. This is called a dependency. The problem is that the compiler can’t change the wavefront queue after it has been scheduled, so precious ALU potential is often wasted as instructions wait until the dependencies are addressed.

Here’s a theoretical example of how a Radeon HD 6970’s VLIW4 SIMD engine and its 16 banks of shader processors (each SP with four ALUs, totaling 64 ALUs per SIMD engine), would handle a wavefront queue that includes dependencies:

As you can see, the VLIW architecture doesn’t handle dependencies ideally. Wavefronts needlessly wait in the queue while free ALUs sit idle.

So, how do you optimize the amount of scalar work you can perform per clock cycle? Enter the Compute Unit, or CU, which replaces the SIMD engines to which we’ve grown accustomed.

Each CU has four Vector Units (VUs), each with 16 ALUs, for a total of 64 ALUs per CU. Thus, the number of ALUs per SIMD/CU is the same. The main difference is that, unlike the shader processors in a SIMD engine, each of the four VUs can be scheduled independently. The CU has its own hardware scheduler that's able to assign wavefronts to available VUs with limited out-of-order capability to avoid dependency bottlenecks.

This is the key to better compute performance because it gives each VU the ability to work on different wavefronts if a dependency exists in the queue:

In our example, the same wavefront queue that took six clock cycles to complete on a VLIW4 SIMD engine can be executed in four clock cycles on Graphics Core Next. AMD suggests that Radeon HD 7970 can achieve up to a 7.5x peak theoretical compute performance improvement over the Radeon HD 6970 due to higher utilization. The real-world difference depends on compiler efficiency, and in some compute tasks, the Radeon HD 7970 is barely better than the 6970 on a per-ALU and per-clock basis. We certainly saw lots of variance in our own benchmarks, as you’ll see. But it’s safe to say, based on our synthetic testing, that the Graphics Core Next compute potential exceeds VLIW4.
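To make the scheduling difference more tangible, here is a toy Python model. It is a deliberate simplification and not AMD's actual scheduler: the wavefront queue, the four issue slots per cycle, the one-cycle execution time, and the function names are all assumptions made for illustration, and the queue does not reproduce the six-versus-four-cycle example above. The first function issues strictly in queue order and stalls at the first unresolved dependency; the second may pick any ready wavefront, without the limits a real hardware scheduler would have.

```python
# Hypothetical queue: (wavefront name, name of the wavefront it depends on or None).
# Every dependency points at an earlier entry in the queue.
QUEUE = [
    ("A", None), ("B", "A"), ("C", None), ("D", "C"),
    ("E", None), ("F", None), ("G", "F"), ("H", None),
]


def cycles_in_order(queue, slots=4):
    """Issue strictly in queue order; stop at the first unresolved dependency."""
    done, pending, cycles = set(), list(queue), 0
    while pending:
        issued = []
        for name, dep in pending:
            if len(issued) == slots or (dep is not None and dep not in done):
                break
            issued.append(name)
        pending = pending[len(issued):]
        done.update(issued)  # results become visible in the next cycle
        cycles += 1
    return cycles


def cycles_out_of_order(queue, slots=4):
    """Each cycle, issue up to `slots` wavefronts whose dependencies are resolved."""
    done, pending, cycles = set(), list(queue), 0
    while pending:
        issued = [name for name, dep in pending if dep is None or dep in done][:slots]
        pending = [(name, dep) for name, dep in pending if name not in issued]
        done.update(issued)
        cycles += 1
    return cycles


print("in-order cycles:", cycles_in_order(QUEUE))
print("out-of-order cycles:", cycles_out_of_order(QUEUE))
```

With this particular made-up queue, the in-order model needs four cycles and the out-of-order model needs two, because the latter fills otherwise idle slots with independent wavefronts.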

Deeper Into The Compute Unit

As mentioned previously, the CU replaces the SIMD engines (as named by AMD) we’ve seen since the Radeon 2000 days. We also noted that the CU is composed of four vector units, which, in turn, contain 16 ALUs and a register file each. And we know that the VUs operate independently of each other.

Let’s go into more depth on those VUs, though. Unlike the simplified clock cycle we use in the example above, each VU can process one-quarter of one wavefront per cycle. Equipped with four VUs, each CU can consequently process four wavefronts every four cycles, the equivalent of one wavefront per cycle per CU.
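The throughput figures in that paragraph can be worked through as simple arithmetic. The sketch below assumes a wavefront of 64 work-items, which the article does not state outright but which is implied by a 16-ALU vector unit handling one-quarter of a wavefront per cycle; the constant names are ours.

```python
WAVEFRONT_SIZE = 64  # work-items per wavefront (assumption inferred from the text)
ALUS_PER_VU = 16     # ALUs in each vector unit
VUS_PER_CU = 4       # vector units in each compute unit

# One VU covers 16 of the 64 work-items per cycle, so a wavefront takes 4 cycles.
cycles_per_wavefront = WAVEFRONT_SIZE // ALUS_PER_VU

# Four VUs working on four different wavefronts finish 4 wavefronts every 4 cycles.
wavefronts_per_cycle_per_cu = VUS_PER_CU / cycles_per_wavefront

print("cycles per wavefront on one VU:", cycles_per_wavefront)
print("wavefronts per cycle per CU:", wavefronts_per_cycle_per_cu)
```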

We haven’t yet talked about the CU’s scalar unit, which is primarily responsible for branching code and pointer arithmetic. The vector units could handle those tasks too, but this co-processor’s strength is in offloading scalar work in order to allow the vector units to flex their muscles in more appropriate ways.

Each CU has four texture units tied to a 16 KB read/write L1 cache, which is two times bigger than the VLIW4’s read-only cache. Historically, L1 was only used to read textures. Now, though, data can be both read and written through the same cache.

3. Bringing It All Together: The Tahiti GPU And Radeon HD 7970

The Tahiti GPU in AMD’s Radeon HD 7970 plays host to 32 CUs. With 64 ALUs per CU, that adds up to 2048 ALUs in total. Do the multiplication, assuming a 925 MHz core clock, and you have a GPU capable of about 3.8 TFLOPS of 32-bit math and 947 double-precision GFLOPS. The L1 cache offers about 2 TB/s of bandwidth at this card’s clock rate, backed by a larger 768 KB L2 cache.
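For readers who want to do that multiplication themselves, the following Python sketch spells it out. Two values are assumptions rather than figures from AMD's material: each ALU is counted as two floating-point operations per clock (one fused multiply-add), and double precision is taken to run at one-quarter of the single-precision rate, which is what the 947 GFLOPS figure implies.

```python
CUS = 32
ALUS_PER_CU = 64
CORE_CLOCK_GHZ = 0.925

FLOPS_PER_ALU_PER_CLOCK = 2  # assumption: one fused multiply-add counts as 2 FLOPs
DOUBLE_PRECISION_RATE = 0.25  # assumption: quarter rate, implied by the 947 GFLOPS figure

total_alus = CUS * ALUS_PER_CU
single_precision_gflops = total_alus * FLOPS_PER_ALU_PER_CLOCK * CORE_CLOCK_GHZ
double_precision_gflops = single_precision_gflops * DOUBLE_PRECISION_RATE

print(f"ALUs: {total_alus}")
print(f"Single precision: {single_precision_gflops:.1f} GFLOPS")
print(f"Double precision: {double_precision_gflops:.1f} GFLOPS")
```

The result, about 3789 GFLOPS single precision and about 947 GFLOPS double precision, lines up with the rounded numbers quoted above.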

There are eight render back-ends capable of 32 full-color raster operations per clock, the same as Radeon HD 6970. But while the raw specifications are identical, efficiency is improved by virtue of six 64-bit memory controllers yielding a wider 384-bit memory interface. Between the expanded bus and faster 1375 MHz GDDR5 memory, the Radeon HD 7970 boasts an impressive 264 GB/s of memory bandwidth, which is roughly 100 GB/s more than the Radeon HD 6970.
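The 264 GB/s figure falls out of the bus width and memory clock. A minimal sketch, assuming GDDR5's four data transfers per pin per memory clock (the function and parameter names are ours):

```python
def gddr5_bandwidth_gbs(bus_width_bits, memory_clock_mhz, transfers_per_clock=4):
    """Peak bandwidth in GB/s; GDDR5 moves four bits per pin per memory clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * transfers_per_clock / 1000


bus_width_bits = 6 * 64  # six 64-bit memory controllers
hd7970_bandwidth = gddr5_bandwidth_gbs(bus_width_bits, 1375)

print(f"Bus width: {bus_width_bits}-bit")
print(f"Radeon HD 7970 peak bandwidth: {hd7970_bandwidth:.0f} GB/s")
```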

Revamped Tessellation Engines

Each GPU has two revamped geometry engines optimized for tessellation. Though they’re still limited to 2 billion vertices, AMD claims a 1.7x to 4x performance increase, depending on the number of subdivisions applied to the source primitive. The large parameter buffer cache has also been increased.

PowerTune and ZeroCore

PowerTune should be familiar from our Radeon HD 6900-series launch coverage. To recap, the feature monitors work performed by the GPU and adjusts frequencies so that the board only uses the power that its maximum TDP allows. According to AMD, without PowerTune, the Radeon HD 7970’s core would have to be cut to about 720 MHz in order to fit within the 250 W envelope, taking worst-case scenarios into account.

The Tahiti GPU does have new power management functionality up its silicon-encrusted sleeve though, and it’s called ZeroCore technology. Comprised of several developments, including a deep sleep mode to reduce GPU consumption, a DRAM stutter mode to reduce memory power, and the ability to compress the frame buffer’s contents, ZeroCore has a measurable effect on power draw during idle and monitor-off situations. AMD claims that the card only consumes 15 W in a static Windows environment, and its GPU is completely turned off when the monitor is not in use. The fan even stops, and minimal heat is dissipated. Our power readings confirm that the Radeon HD 7970 uses notably less power than the Radeon HD 6970 at idle.

CrossFire users—the folks who often have to contend with the biggest thermal issues—will welcome another component of ZeroCore that turns off the second, third, or fourth board in a multi-GPU environment when they’re not needed. With supply of available Radeon HD 7970s painfully low and only a single sample available to test, we cannot yet confirm that this feature works as AMD is advertising. However, it’s on our list of things to double-check when the company’s supply stabilizes.

PCI Express 3.0

Data is fed into and from the GPU through PCI Express, of course, and AMD’s Radeon HD 7970 is the first graphics card to boast compatibility with the third-gen standard. Frankly, today’s desktop software cannot seem to saturate PCI Express 2.0 slots, even when they’re halved into eight-lane links. So, we doubt we’ll see any performance increase from the interface currently only supported by Intel’s Core i7-3000-series processors. However, AMD hints that the 16 GB/s of bidirectional bandwidth may help compute applications in some cases. Again, though, vendors aren’t even ready to show off the applications they demonstrated at the Radeon HD 7970 briefing, and we weren’t given enough time to test the effects of third-gen PCI Express ahead of today’s embargo anyway.
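For context on those link speeds, the sketch below estimates peak PCI Express throughput per direction from the published signaling rates and line encodings (5 GT/s with 8b/10b for second-gen, 8 GT/s with 128b/130b for third-gen). The helper name is ours, and protocol overhead beyond line encoding is ignored, so real transfers will be slower.

```python
def pcie_gbs_per_direction(gigatransfers_per_s, payload_bits, encoded_bits, lanes):
    """Peak one-way throughput in GB/s after line-encoding overhead."""
    return gigatransfers_per_s * payload_bits / encoded_bits / 8 * lanes


gen2_x8 = pcie_gbs_per_direction(5.0, 8, 10, 8)
gen2_x16 = pcie_gbs_per_direction(5.0, 8, 10, 16)
gen3_x16 = pcie_gbs_per_direction(8.0, 128, 130, 16)

print(f"PCIe 2.0 x8:  {gen2_x8:.2f} GB/s per direction")
print(f"PCIe 2.0 x16: {gen2_x16:.2f} GB/s per direction")
print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s per direction")
```

By this estimate, a third-gen x16 link offers a little under 16 GB/s in each direction, roughly double a second-gen x16 link and four times an eight-lane second-gen link.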

Meet Radeon HD 7970

At 10.5” long by 4.5” tall, the Radeon HD 7970’s PCB is exactly the same size as the 6970. It appears smaller, though, thanks to a design trick used by automakers and Apple: the heat sink is tapered at the end.

Despite the dimensional similarity, there are some notable differences between AMD’s successive single-GPU flagships. The back of the new card isn’t covered by a metal reinforcement plate, for starters. Moreover, its axial fan intake is just under three inches, while the Radeon HD 6970’s is about two and a half.

Speaking of that fan, it has larger, wider blades designed for better airflow at lower rotational speeds. Despite the seeming improvement, our experience with fan noise was not a positive one. Check our noise benchmarks for more. AMD uses a newer version of the phase-changing thermal interface material (essentially a type of thermal paste) used in the Radeon HD 6990 to mate the cooler to the GPU. The two-step, three-level vapor chamber purportedly has an easier job pushing air out of the back of the card because AMD removed the stacked DVI connector for better airflow.

So, with the second DVI connector removed, what’s left? The reference card comes with two mini-DisplayPort outputs, an HDMI output, and one dual-link DVI output. Before you get too torn up about a triple-monitor Eyefinity setup requiring an expensive investment in DVI adapters, there’s some good news: AMD says it's going to bundle an HDMI-to-DVI adapter and active mini-DisplayPort-to-DVI adapter with its cards. So, despite the loss of one previously-valuable connector, triple-monitor users should be better-supported now than they were with the Radeon HD 6970. Of course, that's not to say the company's add-in board partners will be as generous. Do your homework before picking a brand and make sure those extras come bundled before jumping on the lowest price.

With the cooler removed, you can see the large plate protecting the GPU, designed to resist the warping seen in previous-generation products. Here are some shots of the naked card and exposed GPU, posing in unabashed glory.

Note the six- and eight-pin power connectors, similar to the Radeon HD 6970. These should come as no surprise considering the similar TDP of both cards, although AMD claims the power delivery is overkill for everyone except overclockers. Also pay attention to the dual BIOS switch, which is another welcome carry-over from the 6900 series that facilitates firmware tweaking with a little less risk. 

4. PRTs, DirectX 11.1, Eyefinity, Stereoscopic 3D, And More

The Tahiti GPU isn’t just an improved gaming and compute engine. AMD is introducing a number of new features to its Southern Islands-based product line:

DirectX 11.1, OpenCL 1.2, and DirectCompute 11.1

Windows 8 is slated to include DirectX 11.1, and the Radeon HD 7970 supports it. You can read more about the new features of Direct3D 11.1 on Microsoft’s Dev Center website.

Partially Resident Textures (PRT)

PRTs present a nifty way to take advantage of Tahiti’s hardware virtual memory, treating the card’s GDDR5 like a managed texture cache. PRTs allow for at least two interesting benefits: reducing stutter and texture popping, and facilitating hardware-managed megatextures (similar to the ones used by id Software, the developer of Quake and Rage).

As textures are fetched for rendering, only the visible segments of that texture are loaded into memory (in 64 KB chunks). But the magic occurs when a texture call is missed. In that case, the GPU can give feedback to the application and ask for further instruction, giving the application unprecedented control to choose the textures it wants to load and determine how to prioritize as it moves forward. In turn, the GPU can display a low-resolution texture before the higher-resolution version is loaded to (ideally) reduce stuttering.
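The flow described above can be sketched in a few lines of Python. This is purely a conceptual model, not AMD's hardware behavior or any real graphics API: the `PartiallyResidentTexture` class, its methods, and the example texture size are all hypothetical, and only the 64 KB tile size comes from the text.

```python
TILE_BYTES = 64 * 1024  # tile size quoted in the text


def tile_count(width, height, bytes_per_texel=4):
    """Number of 64 KB tiles needed for one uncompressed texture level."""
    total_bytes = width * height * bytes_per_texel
    return -(-total_bytes // TILE_BYTES)  # ceiling division


class PartiallyResidentTexture:
    """Hypothetical model: only some tiles of a large texture live in memory."""

    def __init__(self, total_tiles):
        self.total_tiles = total_tiles
        self.resident = set()  # tiles currently loaded
        self.missed = []       # feedback for the application

    def load(self, tile):
        self.resident.add(tile)

    def sample(self, tile):
        if tile in self.resident:
            return "high-res"
        if tile not in self.missed:
            self.missed.append(tile)  # report the miss instead of stalling
        return "low-res fallback"


texture = PartiallyResidentTexture(tile_count(16384, 16384))
texture.load(0)
first = texture.sample(0)    # tile is resident
second = texture.sample(7)   # miss: fallback now, feedback recorded

for tile in texture.missed:  # the application decides what to stream in
    texture.load(tile)
texture.missed.clear()
third = texture.sample(7)    # resident after the application reacted

print(texture.total_tiles, first, second, third)
```

In this model a 16384x16384 texture at four bytes per texel spans 16,384 tiles, of which only the ones actually touched ever need to occupy memory.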

Eyefinity 2.0, Display Support, and Desktop Enhancements

According to AMD, the Radeon HD 7970 is the first graphics card able to provide multiple simultaneous independent audio output streams, called Discrete Digital Multi-Point Audio (DDMA). This means that each attached screen can output its own unique audio signal. Although this isn’t a significant value-add to most enthusiasts who shy away from monitor-attached speakers in favor of discrete speaker systems or headphones, it does have an interesting application to multi-display video conferencing.

The Radeon HD 7970 also introduces simplified ultra-high-resolution display support. To date, 4K-pixel displays required multiple inputs, but the Radeon HD 7900 series can purportedly do this from a single connector now using 3 GHz Fast HDMI or DisplayPort 1.2 HBR2.

Finally, some notes about upcoming features planned for the Catalyst Control Center driver. AMD says that, in February 2012, the 12.2 driver will offer custom desktop resolutions, a feature it says Eyefinity users have asked for since day one. Other enhancements include preset manager improvements and the ability to drop the Windows taskbar on a preferred display in a multi-monitor environment.

Stereoscopic 3D Enhancements

The first stereoscopic 3D announcement ties in directly with Eyefinity, but it’s not specific to the Radeon HD 7900s: the Catalyst 11.12 driver includes Eyefinity support for HD3D, and the already-available Catalyst 12.1 preview driver includes CrossFire support for both HD3D and Eyefinity, removing one of the biggest obstacles to 3D gaming on AMD graphics cards. Rendering in stereo is incredibly demanding, and CrossFire could definitely help. Until now, though, it simply wasn’t an option.

Radeon HD 7970 is the first graphics card to support the aforementioned 3 GHz Fast HDMI standard, capable of frame packing a 1080p stereoscopic 3D signal to televisions at 120 Hz, or 60 Hz per eye. Unfortunately, this doesn’t work on existing televisions. The display must support Fast HDMI explicitly. On a side note, the Radeon HD 7970 also supports 60 Hz (30 per eye) stereoscopic gaming over standard HDMI 1.4a, although we’re not sure how much of a difference it makes compared to the current 48 Hz implementation.

Finally, it’s worth noting that Microsoft will include a Stereo3D API in Windows 8, and AMD’s Radeon HD 7970 is already claimed to support it. The hope is that this vendor-agnostic solution will embolden more developers to get on-board.

UVD and the new Video Codec Engine (VCE)

AMD added dual-stream HD+HD acceleration to its newest iteration of the Unified Video Decoder (UVD), but that's the only change to UVD's decoding feature set. There is something new to talk about on the encoding side, though, and that’s the Video Codec Engine (VCE). Although AMD isn’t specific about VCE’s implementation (making it tough to compare to Intel’s Quick Sync), it does call the feature a multi-stream hardware-based H.264 encoder. Chris Angelini predicted the arrival of something like this nearly a year ago in his Sandy Bridge launch coverage.

The thing is, limited to H.264 encode acceleration, VCE isn’t as comprehensive as Quick Sync. AMD claims that the encode pipeline is capable of working faster than real-time. However, it’s not as scalable on its own as it would be if you were using VCE with the GPU’s compute improvements in parallel.

To that end, the company recommends using VCE-only on lower-end graphics cards, or perhaps in mobile environments where the power savings of fixed-function logic has a big impact on battery life.

The hybrid mode, shown below, is more interesting. Because it involves compute resources (the ALUs), power consumption increases dramatically. The programmable hardware takes over a greater number of encode tasks, while the VCE retains entropy encode duties.

Although we’d love to compare VCE to Intel’s Quick Sync, or even Nvidia’s CUDA-accelerated encode performance, this is yet another component of the Radeon HD 7970 launch that isn’t ready for testing. In fact, none of the features discussed on this page could be tested to make sure they work. AMD simply couldn’t supply us with the software tools able to exploit the virtues of what it says its GPU can do.

5. Test System And Benchmarks

Our original plan was to test Radeon HD 7970 on an X79-based platform with Core i7-3960X. However, AMD pulled the rug out from under us as our desired platform was en route to the snowy whiteness that is Canada in December. Instead, we were forced to run all of our performance data on an admittedly more common LGA 1155-based Core i5-2500K overclocked to 4 GHz. The Sandy Bridge-E-based testing will have to wait until next month, when AMD lets its next batch of information spill on the 7900-series family.   

Test Hardware

| Component | Details |
|---|---|
| Processor | Intel Core i5-2500K (Sandy Bridge), overclocked to 4 GHz, 6 MB L3 Cache, power-saving settings enabled, Turbo Boost disabled |
| Motherboard | MSI P67A-GD65, Intel P67 Chipset |
| Memory | OCZ DDR3-2000, 2 x 2 GB, at 1338 MT/s, CL 9-9-9-20-1T |
| Hard Drive | Western Digital Caviar Black 750 GB, 7200 RPM, 32 MB Cache, SATA 3Gb/s; Samsung 470 Series SSD 256 GB, SATA 3Gb/s |
| Graphics Cards | Radeon HD 7970 3 GB GDDR5; Radeon HD 6970 2 GB GDDR5; Radeon HD 6990 4 GB GDDR5; GeForce GTX 580 1.5 GB GDDR5; GeForce GTX 590 3 GB GDDR5 |
| Power Supply | Seasonic X760 SS-760KM: ATX12V v2.3, EPS12V, 80 PLUS Gold |
| CPU Cooler | Cooler Master Hyper TX 2 |

System Software And Drivers

| Item | Details |
|---|---|
| Operating System | Microsoft Windows 7 Ultimate x64 |
| DirectX Version | DirectX 11 |
| Graphics Driver | GeForce: 285.88 Beta; Radeon: 7900 Launch Beta Driver |

Synthetic Benchmarks

| Benchmark | Details |
|---|---|
| 3DMark 11 | Version 1.0.3.0, Extreme Preset |
| Unigine Heaven | Version 2.1, two runs, Tessellation Off and Tessellation Normal |

Games

| Game | Details |
|---|---|
| Battlefield 3 | Version 1.0.0.0, Operation Swordbreaker, Fraps Run |
| Batman: Arkham City | Version 1.0.0.0, Built-In Benchmark |
| Metro 2033 | Version 1.0.0.1, Built-In Benchmark |
| DiRT 3 | Version 1.2.0.0, Built-In Benchmark |
| Crysis 2 | Version 1.9, FRAPS runs |
| Elder Scrolls V: Skyrim | Version 1.2.14.0, FRAPS runs |
| World of Warcraft | Version 4.3.0.150.50, FRAPS runs |

6. Synthetic And Tessellation Benchmarks

If 3DMark is any indication, the Radeon HD 7970 has a great deal of potential. It lands right between the GeForce GTX 580 and dual-chip solutions like Radeon HD 6990 and GeForce GTX 590. Being natural cynics, we can’t say we put a lot of faith in the correlation between these scores and what our real-world tests will demonstrate, but it’s good to see that, in this one ubiquitous title for which graphics vendors religiously optimize, the possibility exists for this single-GPU board to at least approach the last generation of hot, power-hungry dual-GPU cards.

Let’s continue with another synthetic, Unigine’s Heaven Benchmark. Since we’re taking the benchmark twice (once with tessellation off and once with it set to Normal), these results give us a point of comparison to gauge any of Tahiti’s tessellation improvements. Bear in mind that we disabled AMD’s tessellation optimizations in the CCC driver for all tests, making this a fairer metric.

We again see what the Radeon HD 7970 can do. However, the results come across as confusing because the frame rate penalty incurred when enabling tessellation is more pronounced than it is on other cards.

The tessellation setting in Batman: Arkham City only allows you to choose between normal and high tessellation without turning other DirectX 11 features off entirely. So, we benchmarked both settings to isolate tessellation as a variable. The frame rates are so close here that it’s hard to declare a winner.

H.A.W.X. 2 lets us turn tessellation on or off, so it should make an ideal comparison. As we might have guessed given its standing as a TWIMTBP title, the game is clearly optimized for GeForce graphics cards, making a more general assessment difficult.

However, the vendor-specific optimizations can be minimized by comparing tessellated performance relative to non-tessellated results:

Here we see that the Radeon HD 7970 is better able to cope with a tessellation load compared to the Radeon HD 6970. It performs on par with the GeForce GTX 580 and Radeon HD 6990.
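The normalization used for that comparison is simply each card's tessellated frame rate expressed as a share of its own non-tessellated frame rate. A minimal sketch follows; the card names and frame rates are placeholders invented for illustration, not our measurements.

```python
def tessellation_retention(fps_off, fps_on):
    """Percentage of non-tessellated performance kept with tessellation enabled."""
    return fps_on / fps_off * 100.0


# Placeholder data only: (fps with tessellation off, fps with tessellation on).
samples = {
    "Card A": (120.0, 90.0),
    "Card B": (150.0, 105.0),
}

retention = {card: tessellation_retention(off, on) for card, (off, on) in samples.items()}

for card, share in retention.items():
    print(f"{card}: keeps {share:.0f}% of its non-tessellated frame rate")
```

In this made-up case, Card B is faster in absolute terms with tessellation on, yet Card A loses a smaller share of its performance, which is the kind of difference the relative view is meant to expose.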

7. Benchmark Results: Battlefield 3

Especially in the early stages of a product’s life cycle, synthetic benchmarks give us a glimpse at how well drivers are coming along, as vendors almost always devote lots of time optimizing for those tests. But gamers don’t play synthetics, making real-world metrics much more critical to us enthusiasts. Let’s start with Battlefield 3, one of the most graphically-impressive titles of 2011:

With ultra detail enabled, the Radeon HD 7970 blows past the Radeon HD 6970 and GeForce GTX 580, nipping at the heels of the GeForce GTX 590. It provides smooth Eyefinity performance across three 1080p monitors, too. Note that the GeForce GTX 580 is simply unable to compete in that configuration, as Nvidia’s single-GPU graphics cards require SLI to facilitate at least three cumulative display outputs.

Now let’s make things more interesting by adding 4x MSAA to the mix:

The Radeon HD 7970 doesn’t do as well as the dual-GPU cards with 4xAA at 1080p, but it still manages a playable frame rate, easily surpassing the single-GPU competition.

The triple-monitor Eyefinity result looks impressive, but it’s not really playable on any of these graphics cards. The GeForce GTX 590 couldn’t run in Surround mode consistently without crashing.

8. Benchmark Results: Crysis 2

The new card performs really well at 1080p, but it falters in the three-monitor configuration that so many of our readers have been asking to see us start using. Then again, the dual-GPU cards can’t muster a 30 FPS minimum, either. It looks like you’d have to reduce graphics detail across all of these cards to achieve more playable settings on a trio of displays.

Running at 1080p still isn’t a problem for the Radeon HD 7970, but it’s no surprise that 5760x1080 crawls on all of these cards.

9. Benchmark Results: The Elder Scrolls V: Skyrim

The Elder Scrolls V: Skyrim has already secured a place for itself as one of the best games of 2011. Let’s see how these graphics cards can handle its beautifully-crafted landscapes:

The close results suggest a CPU bottleneck. Skyrim didn’t present much of a challenge for these cards at its maximum settings with 4x MSAA, so we’ll enable transparency anti-aliasing and see if anything changes.

Once again, the playing field is very tight. It’s safe to say that this title is limited by our CPU. The only graphics card unable to provide a minimum 30 FPS at 1080p is the Radeon HD 6970.

10. Benchmark Results: DiRT 3

Codemasters’ racing titles are famous for demanding both CPU and GPU resources, so DiRT 3 is a good metric for testing high-end graphics cards on an overclocking enthusiast-oriented platform.

The Ultra details preset with anti-aliasing disabled is no challenge at all at 1080p, but this game doesn’t scale well with triple-monitor resolutions, and the Radeon HD 6970 can’t manage 30 FPS on average.

Performance at 1080p is smooth from one contender to the next, and although the frame rates certainly drop at 5760x1080, the Radeon HD 7970 manages to maintain a minimum performance level that doesn’t dip below 30 FPS.

11. Benchmark Results: World Of Warcraft

World of Warcraft isn’t a particularly demanding game, but with over 10 million players, it might be the most relevant.

Unfortunately, this isn’t the apples-to-apples comparison we were hoping for. AMD’s cards cannot run at 5760x1080 in DirectX 11 mode, a surprising misstep given this title’s popularity. As such, we’re forced to use DirectX 9 rendering.

Additionally, Nvidia’s GeForce cards don’t support transparency anti-aliasing in DirectX 9 mode, necessitating those numbers be run using an incomparable code path. If it’s not one thing…

With that in mind, let’s see how AMD’s new card performs with 8x MSAA:

The frame rates are incredibly high, and performance never drops below 30 FPS. How about with 8x transparency anti-aliasing, enabled through each vendor’s driver interface?

While transparency anti-aliasing slows things down at 1080p, all of the cards maintain smooth performance. The same can’t be said of our findings at 5760x1080, though. In this case, only the dual-GPU solutions are viable, and even the Radeon HD 6990 struggles with its minimum frame rates. Enabling 8x transparency AA is probably an unreasonable setting, though all of these cards would probably have performed smoothly at 4x.

12. Benchmark Results: Batman: Arkham City

Batman: Arkham City has a reputation for providing unplayable frame rates with DirectX 11 features enabled (Ed.: Yeah, because it was busted), but a patch was recently released that supposedly fixes the problems encountered on 64-bit Windows. We thought we’d give it a try:

The Radeon HD 7970 delivers amazing performance relative to the rest of the pack at 1080p, and it almost catches the GeForce GTX 590 at 5760x1080. But the minimum frame rates are terrible across the board. It looks like the patch didn’t solve all of the bugs in DirectX 11 mode. We’ve given this one a couple of chances now—it’s safe to say it’ll likely see its way out of our benchmark suite in favor of something a little more stable.

Let’s add 4x MSAA, just to see what happens (and because we enjoy the pain; meow).

The Radeon HD 7970 destroys its competition at 1080p. But 5760x1080 is much harder on the card, and it remains behind the GeForce GTX 590.

13. Benchmark Results: Metro 2033

Metro 2033 might not be the newest title available—in fact it’s downright old by most gaming standards—but it’s one of the most brutal on graphics hardware, with a DirectCompute-based depth of field filter selectable in DirectX 11 mode. Let’s try High details using DirectX 9 first:

AMD’s new übercard leads the pack at 1080p, and it sits just behind the dual-GPU cards at 5760x1080. Now we’ll pour on the DirectX 11, DoF, and anti-aliasing syrup:

Well that’s disappointing, isn’t it? Minimum and average frame rates drop to the floor at 1080p. While the Radeon HD 7970 holds up well compared to the competition, 5760x1080 can only provide a pyrrhic win for the new Radeon.

14. GPGPU Benchmarks: This Time, With A Preface

It is our goal to be as thorough as possible and to include as many real-world applications as we can, instead of simply relying on synthetic performance metrics. Unfortunately, by pulling its launch date up before Christmas, AMD ran out of time to square away the details. So, the company had to focus its development efforts on gaming (admittedly, the most important subject for an introduction like this one).

As a consequence, other areas didn’t get any development love, and we're missing the ability to test a lot of what AMD is claiming as features. The blame certainly doesn’t fall on the company's software partners, since they were working on a different timeline than what actually ended up transpiring. Regardless, while some of the general-purpose compute applications work to some degree, most don’t. The ones that do don’t show much of a benefit over their predecessors, or even the unaccelerated code path. Ironically, video acceleration is one of the casualties, so we can’t even highlight one of Tahiti’s marquee features: VCE. In short, this is meant to be a fixed-function block of logic not unlike Intel’s Quick Sync.

So, this time around, we are forced to rely more heavily on synthetics. We will follow up with more real-world applications as soon as compatible versions of the software and supporting drivers are available.

Bitmining

Bitmining is one of the few real-world applications we're able to run, although it's a bit one-sided. Since the server would not let us verify, we had to use solo mode.

Now, Radeons have traditionally been very strong in Bitmining. However, efficiency is almost more important than sheer performance, and that’s where things are less clear. Sure, the Radeon HD 7970 is the fastest single-GPU card in this group, but its comparatively small performance improvement comes with a steep increase in power consumption. For reference, while the aging Radeon HD 5870 attains its respectable performance using 190 W, the new Radeon HD 7970 guzzles down 254 W.
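To put those two power figures in perspective, here is a minimal sketch of the arithmetic. Only the 190 W and 254 W values come from our measurements above; the function names are our own, and no hash rates are assumed because none are quoted in this text.

```python
# Power figures quoted in the text above (watts).
HD5870_WATTS = 190.0
HD7970_WATTS = 254.0


def relative_increase(new: float, old: float) -> float:
    """Fractional increase of `new` over `old` (0.10 means 10% more)."""
    return (new - old) / old


def break_even_speedup(new_watts: float, old_watts: float) -> float:
    """Speedup the newer card needs just to match the older card's
    performance per watt. Any smaller gain means lower efficiency."""
    return new_watts / old_watts


power_increase = relative_increase(HD7970_WATTS, HD5870_WATTS)
needed_speedup = break_even_speedup(HD7970_WATTS, HD5870_WATTS)

print(f"Extra power draw: {power_increase:.1%}")
print(f"Speedup needed to match performance per watt: {needed_speedup:.2f}x")
```

In other words, the Radeon HD 7970 would have to be roughly a third faster than the Radeon HD 5870 simply to break even on efficiency.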

LuxMark

LuxMark is based on the freeware application LuxRender, making it the second real-world application in our GPGPU suite. The results are nothing short of spectacular, as the Radeon HD 7970 returns an almost twofold improvement over its predecessor, which takes third place to the older Radeon HD 5870.

Meanwhile, Nvidia's GeForce GTX 580 trails the pack and comes in last. Granted, this benchmark doesn’t seem to like the GeForce cards to begin with, but the fact that the Radeon HD 7970 is almost three times as fast as Nvidia’s current single-GPU flagship is a bit of an embarrassment. It also goes to show that a little optimization can go a long way.

GPU Caps Viewer

That brings us to our synthetic benchmarks. GPU Caps Viewer uses a combination of OpenCL computations, post-processing, and normal graphics output without anti-aliasing, letting us draw some interesting conclusions.

The Post-FX test is a direct implementation of Nvidia’s own demo for oclPostprocessGL from the Nvidia GPU Computing SDK. A blur effect is added to the image output during post-processing. Interestingly, the Radeon HD 7970 is able to beat the GeForce GTX 580, even though the demo was originally developed by Nvidia. The older Radeons fall behind by a sizeable margin.

In the particle test, the GeForce GTX 580 chalks up a clear win. Meanwhile, none of the Radeons can keep up, although the Radeon HD 7970 is able to close the gap a little.

NQueen

The N-Queen puzzle (also known as the eight queens puzzle) is a complex mathematical problem from the world of chess. The goal is to arrange eight queens on a chess board in such a way that no two queens can attack each other according to the rules of chess. The color of the piece is irrelevant, so any queen may attack any other queen. In the end, the point is to find the number of possible solutions as quickly as possible.

This problem forms the basis of this benchmark, and the NQueen test proves once more that AMD's Radeon HD 7970 tremendously benefits from leaving behind the VLIW architecture in complex workloads. Both the HD 7970 and the GTX 580 are nearly twice as fast as the older Radeons. So, while the VLIW-based cards are great for crunching numbers, they’re not as well suited to this sort of task.
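For readers who want to see what counting solutions actually involves, the following is a minimal single-threaded sketch using backtracking with bitmasks. It is not the benchmark's OpenCL code, which we have not inspected; the function name and structure are our own illustration of the puzzle described above.

```python
def count_n_queens(n: int) -> int:
    """Count every way to place n mutually non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column; all bits set means every row is filled

    def place(cols: int, diag_left: int, diag_right: int) -> int:
        # cols:       columns already occupied by a queen
        # diag_left:  squares in the current row attacked along one diagonal direction
        # diag_right: squares in the current row attacked along the other direction
        if cols == full:
            return 1  # a queen sits in every row, so this is a complete solution
        total = 0
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free square in this row
            free ^= bit
            total += place(
                cols | bit,
                ((diag_left | bit) << 1) & full,  # diagonal attacks shift one column per row
                (diag_right | bit) >> 1,
            )
        return total

    return place(0, 0, 0)


print(count_n_queens(8))  # prints 92
```

The branching, data-dependent recursion is the kind of irregular control flow the paragraph above refers to. A GPU implementation would need to split the search tree across many threads, which this sketch does not attempt.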

DirectComputeBenchmark

Since this is one of the few benchmarks out there that can test DirectCompute performance as well, that was originally on our list too. However, the result we got for the Radeon HD 7970 was far too high to be plausible. Until we can prove otherwise, we’ll discard that result and chalk it up to a bug in the benchmark. The result of the OpenCL benchmark was more believable:

Interestingly, the Radeons rule this benchmark, with the HD 5870 taking the top spot ahead of the HD 6970 and the HD 7970. Despite the fact that it uses an architecture similar to that of the HD 7970, Nvidia’s GeForce GTX 580 trails the AMD group by a wide margin.

First Impressions

While these results hold great promise, it’s certainly too early for the AMD fans out there to celebrate. Due to the distinct lack of usable real-world apps and the beta state of the drivers we had at our disposal, it’s hard to come to any conclusion about Tahiti’s real compute performance. There is definitely a very positive trend, though, so we can hope to see some compelling performance in real-world applications once they surface. 

Moving away from the previous VLIW architecture doesn’t hurt the Radeon HD 7970 too much (if at all, as borne out in Bitmining) in areas where Radeons have traditionally ruled the roost, while simultaneously helping it gain ground in disciplines that Fermi-based cards dominated in the past. Thus, the card appears to be a potent solution able to leave behind the previous generation's limitations. Of course, the drivers and third-party apps have to come around, too. AMD certainly has its work cut out for it in this department.

15. 2D Performance Benchmarks

2D Performance Via GDI and GDI+

While it may not be as sexy as 3D performance, 2D rendering is still important. Although there is a clear trend towards rendering 2D content using Microsoft’s more modern Direct2D API, it’s a safe bet that more than 90 percent of all applications in use today still rely on the drawing functionality provided by the older GDI (Graphics Device Interface) and GDI+. Most user interface elements, such as frames, buttons, and toolbars, are rendered using these components. Meanwhile, older programs created for very specific purposes rely completely on this rendering method for all of their 2D objects. That’s why we decided to test 2D performance as well.

Hardware Accelerated, or Not?

To start with, let’s take a look at actions that aren’t accelerated. Windows 7 reserves a special part of system memory (non-local memory, also referred to as aperture space), to which the graphics card has direct access. This area serves as a buffer for anything that can’t be accelerated in hardware. If the content of this buffer changes because a window has been moved or added on top or its content has been altered, for example, its elements are copied directly into the video card’s memory.

Unfortunately, only very few GDI and GDI+ operations actually enjoy GPU support under Windows 7. Among them are text rendering, color filling, copying and stretching of images (BitBlt using the standard ROPs, StretchBlt), and transparencies (AlphaBlend, TransparentBlt). While drawing of geometric shapes is no longer accelerated in hardware at all, copying and color fills can actually be output directly, circumventing the aperture space. Since graphics cards haven’t actually contained dedicated 2D units for a while now, a card’s 2D performance completely depends on the quality of its driver.

Text Output

AMD’s Radeon HD 7970 is the only card to do badly here, at least if we’re talking about direct (hardware-accelerated) output to the display and not buffered and unaccelerated output in the form of a DIB (device-independent bitmap). While that shouldn’t result in any serious disadvantages in daily use, a look at the older Radeons shows that the driver could still use a lot of optimizing. Our guess is that hardware acceleration for direct text rendering is still faulty, since that result is even slower than the non-accelerated software solution using a DIB.

Image Manipulation

Looking at stretching performance, we see a similar result. The newest Radeon trails the rest of the field in direct output mode. Interestingly, performance in software mode using the buffer is actually significantly higher in stretching operations than in direct output mode.

Meanwhile, simple copy operations (blitting) don’t show much variation between cards, and of our four cards, only the GeForce GTX 580 is faster taking the direct route than taking the detour through the buffer (a clear sign that hardware acceleration is being used more efficiently here).

Geometry Performance

The Radeon HD 7970 only falls behind by a small margin when drawing lines. Meanwhile, the rest of the benchmarks show all of our contenders performing quite similarly. It is interesting to note that both splines and rectangles are apparently accelerated quite well when they are rendered sequentially, as the direct output path is faster than the software version in either case. This delta is especially pronounced in the triangles test. The exact opposite applies when it comes to drawing polygons, where buffered output is much faster.

Impressions

AMD has certainly improved its drivers since the first time we took a closer look at 2D performance. The Radeon HD 7970 only falls behind its predecessors when it has to handle hardware-accelerated text output, achieving half the performance of older cards. While it is unlikely that this would translate into any visible slow-downs in everyday tasks, you’re bound to notice it when moving around longer texts (floating) in certain programs. The situation is much better than what we saw right after the launch of the Radeon HD 5870.

16. Benchmark Results: Overclocking

AMD representatives were quite confident about the Radeon HD 7970’s overclocking headroom. They went so far as to say that most cores will make it over 1 GHz, that many will get past 1.1 GHz, and that some will even slam right into 1.2 GHz. Moreover, they claimed that the stock 1375 MHz GDDR5 memory could make it to 1625 MHz or higher.

We were happy to perform the tests needed to validate AMD’s claims. Unfortunately, the card’s stock BIOS is more of a prude than the company’s representatives. Its software-based Overdrive tool won’t push any higher than 1125 MHz on the core and 1575 MHz on the memory. Compared to the stock 925 MHz core and 1375 MHz memory settings, though, those are respectable ceilings for a consumer card. Of course, board partners will be able to customize the overclocking limits of their own offerings, and if the Radeon HD 7970 has the overclocking headroom that AMD says it does, we’d hope to see factory-overclocked cards with nice boosts.
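As a quick sanity check on how much headroom those Overdrive ceilings represent, here is a small sketch of the percentage math. The four clock values are taken from the paragraph above; the dictionary and function names are ours.

```python
# Clock rates in MHz, as quoted in the text above.
STOCK = {"core": 925, "memory": 1375}
OVERDRIVE_MAX = {"core": 1125, "memory": 1575}


def headroom_percent(stock_mhz: float, limit_mhz: float) -> float:
    """Percentage increase from the stock clock to the Overdrive ceiling."""
    return (limit_mhz - stock_mhz) / stock_mhz * 100.0


headroom = {
    domain: headroom_percent(STOCK[domain], OVERDRIVE_MAX[domain])
    for domain in STOCK
}

for domain, pct in headroom.items():
    print(f"{domain}: {STOCK[domain]} -> {OVERDRIVE_MAX[domain]} MHz (+{pct:.1f}%)")
```

That works out to roughly 21.6% on the core and 14.5% on the memory. Keep in mind that a clock increase of a given size rarely turns into a frame rate increase of the same size.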

With all of that said, we increased PowerTune’s slider by the maximum 20% setting and shot for the moon. The result on this undoubtedly hand-picked sample was the top available frequencies. Will your card be chosen for its headroom? Probably not. Can we guarantee you’ll see a similar overclock? Unfortunately not.

How does that frequency jump translate into performance? We’ll show you:

17. Power, Temperature, And Noise Benchmarks

Let’s say that the Radeon HD 7970 has the potential to be an amazing performer. Would it still be worth $550 if its maximum power load were enough to cause blackouts at Candlestick Park during a 49ers game? Fortunately, we don’t have to speculate.

Surprisingly, the 7970 draws less load power than a GeForce GTX 580, while pulling less at idle than a Radeon HD 6970. AMD’s power management advancements should pay dividends amongst our European audience, which has to pay significantly more for electricity than our North American readers.

Now it’s time to turn our attention to GPU temperatures. We should mention that the GeForce GTX 580 we’re testing with is a Gigabyte GV-N580SO-15I clocked down to reference frequencies and equipped with an aftermarket cooler.

The temperatures are right where we’d expect them to be in comparison to the Radeon HD 6970, a card with a similar TDP.

Finally, let’s have a look at the noise generated by these products. Once again, keep in mind that the GeForce GTX 580 isn’t a reference card, and its aftermarket cooler provides an advantage you’d typically have to pay extra for.

Uh-oh. That’s a significant amount of noise, which gives us the first concerning design-oriented issue we’ve seen thus far. I’m almost afraid to mention it, because when Chris Angelini railed against the Radeon HD 6990 for its noise problems, he got a lot of negative feedback. But I’m not willing to bury it, so there it is.

Concerned about a possible heat sink seating issue, I took the card apart and put it back together again with fresh thermal paste. Now, AMD doesn’t recommend this because it claims the phase-changing thermal interface material it uses enables a few-degree advantage over normal thermal paste. In light of our negative results, though, we had little to lose by at least trying. In the end, my surgical procedure made no difference, and we recorded the same acoustic output playing through Battlefield 3.  

Unfortunately, AMD’s time frame for this launch didn’t make testing a second card possible. However, we’ll keep our eyes peeled for a replacement and follow up should our findings change.

18. Radeon HD 7970: Fast, Forward-Looking, But Not Fully Baked

Enthusiasts who’re only interested in gaming might argue that investing staggering amounts of time and money in a complete GPU redesign to better facilitate general-purpose computing tasks is a bold gamble, and we’d certainly agree, especially after witnessing the recent outcome of AMD’s processor division presenting a redesign of its own. Right now, the primary purpose of a high-end graphics card is indisputably to serve up uncontested game performance. Although we’ve watched AMD push its heterogeneous compute initiative for a while now, it’s still a work in progress. And though we’ve tasted the sweet possibilities of GPU-accelerated video transcoding, password cracking, and Wi-Fi brute-forcing, the real-world applications of compute on the desktop are still disappointingly limited. We continue leaning on our CPUs for a majority of tasks. We hold out hope that this stuff will start catching on in a more serious way, though. The HPC space knows what parallelism can do, after all.

Fortunately, the Radeon HD 7970 doesn’t rely on its compute potential to turn heads. It pushes game performance in a big way, too. With months to go before Nvidia can retaliate with its upcoming Kepler architecture, AMD is able to claim it sells (well, samples) the fastest single-GPU graphics card, no small achievement for the company more known for its value proposition as of late.

Beyond its frame rates, the Radeon HD 7970 (and, we have to assume, the other 7000-series boards that will follow it) includes a number of interesting features. Some of them, like ZeroCore, are truly worthy of praise. Especially in multi-GPU configurations, improved power management should help keep thermals and acoustics in check.

Unfortunately, we aren’t able to speak to the value of several others, or even confirm that they work. AMD rushed this launch to an extent that I don’t think we’ve ever seen before. First, it preempted the introduction of Windows 8, which will realize the Radeon HD 7000-series’ DirectX 11.1 support. Then, it pulled in its debut so far that its software partners—the ones responsible for enabling important stuff like the Video Codec Engine—couldn’t react fast enough to facilitate testing. AMD even jumped the gun on its own driver team. We have to tell you about important features coming in future builds because they’re simply not ready yet. At the very same time, AMD doesn’t have enough Radeon HD 7970 boards on hand to allow for testing Tahiti in CrossFire.

With so many balls up in the air, it’s certainly hard to recommend that you take any action right now, and that’s why we’re calling this a preview. We haven’t done extensive-enough testing to feel comfortable making a recommendation one way or the other, even if this card were for sale. How about buying an older high-end card? It’s hard to get excited about the Radeon HD 6970 or GeForce GTX 580 knowing that more 7000-series cards are right behind this one, likely priced to compete. As such, we’re now in a slightly awkward position, waiting for the market to look a little less like murky soup and a little more like firm, tasty Jell-O.

Are there any downright negative points to mention? Its $550 price tag is acceptable compared to the GeForce GTX 580, though we’d like to see it a little lower. If supply is poor after January 9th, when AMD expects the 7970 to start selling, however, there’s a very good chance that street prices will actually end up higher. There’s not much competition for this card, and therefore no catalyst to push prices any lower. It might not be as fast, but the Radeon HD 7970 is definitely more desirable than expensive, quirky, and hard-to-find dual-GPU flagships like the Radeon HD 6990 and GeForce GTX 590. Aside from pricing, we also have to complain about the loud noise created by the reference cooler on our test sample. If you can wait until there’s more selection, you might be well served by purchasing a Radeon HD 7970 from a manufacturer that offers a quieter non-reference cooler, or by holding off until we can get our hands on another card to double-check our findings.

As for GPU compute, we don’t deny that there’s some real potential there. We saw some impressive results in the tests we did run, although they were mingled with mediocre ones. It seems clear that AMD’s software framework needs further optimization, and that ISVs need more time to figure out what code is truly able to benefit from the parallelization of a GPU. Only then will we see the same sort of pick-up already in progress in the environments where products like Tesla currently live.

Time will tell if AMD’s commitment to GPU-based compute will pay off. In the short term, though, its Radeon HD 7970 is a desirable graphics card for gamers looking for great single-GPU performance. It'll take Nvidia’s answer to the Radeon HD 7000 series, based on its next-generation Kepler architecture and expected in the first half of 2012, to contend with this powerhouse. Before you're able to buy it in January, however, we'll have a more in-depth follow-up for your decision-making pleasure.