The Myths Of Graphics Card Performance: Debunked, Part 1
1. Performance That Matters: Going Beyond A Graphics Card's Lap Time

If you're an auto enthusiast, you've no doubt debated the performance of two sports cars with a friend at some point. One might have made more horsepower. Maybe it had a higher top speed, superior handling, or lighter weight. Typically, those conversations come down to comparing lap times on the Nürburgring and end when someone spoils the fun by reminding us that we can't afford any of the contenders anyway. 

In many ways, high-end graphics cards can be quite similar. You have average frame rate, frame time variance, noise from the cooling solution, and a range of price points that can, incidentally, run to double the cost of a current-gen gaming console. And if you needed any further convincing, some of the latest video cards have aluminum and magnesium alloy frames, just like race cars. Alas, some differences remain. Despite my best attempts at impressing my wife with the latest graphics processor, she remains impervious.

So, what is the lap time equivalent for a video card? What is the one measure that distinguishes winners from losers, cost being equal? It's clearly not just average frames per second, as demonstrated by all of the coverage we've given to frame time variance, tearing, stuttering, and fans that sound like jet engines. Then you get into the more technical specifications: texture fill rate, compute performance, memory bandwidth. What significance do all of those numbers hold? And, like a Formula 1 pit crew member, does your new card require headphones just to be tolerated? How do you account for the overclocking headroom of each card in an evaluation?

Before we dig into the myths that envelop modern graphics cards, let's start by defining what performance is and what it is not. 

Performance Is An Envelope, Not One Number

Discussions of GPU performance are often distilled down to generalizations based on FPS, or average frames per second. In reality, a graphics card's performance includes far more than the rate at which it renders frames; it's better to think in terms of an envelope rather than a single data point. This envelope has four major dimensions: speed (frame rate, frame latency, and input lag), quality (resolution and image quality), quietness (acoustic performance, driven by power consumption and cooler design), and, of course, affordability. 

Other factors play into a card's value, such as game bundles and vendor-specific technologies. I'll cover them briefly, but won't try to weigh them quantitatively. Truly, the importance of CUDA, Mantle, and ShadowPlay support is very user-dependent.

The above graph illustrates the GeForce GTX 690's position in this variable envelope I'm describing. Stock, it achieves 71.5 FPS in Unigine Valley 1.0 at the ExtremeHD preset, using a test system I'll detail on the following page. It generates an audible, but not bothersome, 42.5 dB(A). If you're willing to live with a borderline-noisy 45.5 dB(A), you can easily overclock the card and get a stable 81.5 FPS using the same preset. Lower the resolution or anti-aliasing level (affecting quality), and you get a big bump in frame rate, all else being equal. Of course, the (un)affordable $1000 price point doesn't change.

For the sake of running tests in a more controlled manner than you're used to seeing, let's define a reference for video card performance.

MSI Afterburner and EVGA PrecisionX are free tools that let you manually set a card's fan speed, and hence configure its noise level accordingly

For the purposes of today's story, I'll specify performance as the frames per second a graphics board can output at a given resolution, within a specific application along the described envelope (and under the following conditions):

  1. Quality settings in a given application set to their highest value (typically the Ultra or Extreme preset)
  2. Resolution set to a constant level (typically 1920x1080, 2560x1440, 3840x2160, or 5760x1080 in a three-monitor array)
  3. Driver settings at each manufacturer's defaults (whether global or application-specific)
  4. Operating in a closed enclosure at a set 40 dB(A) noise level measured three feet away from the enclosure (ideally, tested on a reference platform that gets updated annually)
  5. Operating with an ambient temperature of 20 °C/68 °F and one atmosphere air pressure (this is important; it directly affects thermal throttling)
  6. Core and memory operating at temperature equilibrium as far as thermal throttling is concerned (so that core/memory clock speeds under load remain fixed, or vary within a tight range, given a constant 40 dB(A) noise level and corresponding fan speed target)
  7. Maintaining a 95th percentile frame time variance below 8 ms, which is half a frame at a typical display refresh rate of 60 Hz
  8. Operating at or near 100% of GPU utilization (this is important to demonstrate a lack of platform bottlenecks; if there are bottlenecks, GPU utilization will be below 100% and the test results will not be very meaningful)
  9. Averaged FPS and frame time variance data from no fewer than three runs per data point, each run no less than one minute long, with individual runs exhibiting no more than 5% deviation from the mean (ideally, we want to sample different cards of the same type, particularly when there is reason to believe a vendor's products exhibit significant variance); a sketch of how conditions 7 and 9 can be checked against frame-time logs follows this list
  10. Measured with either Fraps for a single card or any built-in frame counter; FCAT is required for multiple cards in SLI/CrossFire
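
As a companion to conditions 7 and 9, here is a minimal sketch (in Python, not our actual tooling) of how they can be checked against per-run frame-time logs, such as the frametimes CSV that Fraps exports. It interprets "frame time variance" as the absolute difference between consecutive frame times; adjust if you define it differently.

```python
# Minimal sketch: check conditions 7 and 9 against per-run frame-time logs
# (milliseconds per frame, e.g. exported from Fraps). "Frame time variance"
# is taken as the absolute difference between consecutive frame times.

def percentile(values, pct):
    """Nearest-rank percentile; good enough for a sanity check."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def run_passes_condition_7(frame_times_ms, limit_ms=8.0):
    """95th-percentile frame time variance must stay below 8 ms."""
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return percentile(deltas, 95) < limit_ms

def runs_pass_condition_9(runs, max_deviation=0.05):
    """At least three runs, each run's average FPS within 5% of the overall mean."""
    if len(runs) < 3:
        return False
    fps_per_run = [1000.0 * len(r) / sum(r) for r in runs]
    mean_fps = sum(fps_per_run) / len(fps_per_run)
    return all(abs(f - mean_fps) / mean_fps <= max_deviation for f in fps_per_run)
```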

As you can imagine, the reference performance level is both application- and resolution-dependent. But it's defined in a way that allows for independent repetition and verification of tests. In this sense, it's a truly scientific approach. As a matter of fact, we encourage the industry and enthusiasts alike to repeat the tests we perform and bring any discrepancies to our attention. Only in this way will the integrity of our work be assured.

This definition of reference performance does not account for overclocking, or the range of behaviors a given GPU might exhibit from one card to another. Fortunately, we'll see that's only an issue in a few cases. Modern thermal throttling mechanisms are designed to eke out maximum frame rates in as many situations as possible, so cards are operating closer than ever to their limits. Ceilings are often hit before overclocking adds any real-world benefit.

Unigine Valley 1.0 is a benchmark we use extensively in this article. It showcases a number of DirectX 11-based effects and produces highly repeatable results. It also doesn't rely on physics (and thus the CPU) as much as 3DMark does (at least in its overall and combined tests).

What Are We Setting Out To Do Here?

In the course of this two-part story, I plan to look at each of the dimensions that compose a video card's performance envelope, and then try to answer common questions about them. We'll extend the conversation to input lag, display ghosting, and tearing, all of which relate to your gaming experience, but not specifically to frame rates. I'd also like to compare cards using these criteria. As you can imagine, testing this way is extremely time consuming. However, I think the additional insight is worth the effort. That doesn't mean our graphics card reviews are going to change; we're experimenting, and taking you with us.

With the definition of graphics card performance already covered, the rest of today's piece involves methodology, V-sync, noise and the noise level-adjusted performance of graphics cards, and a look at the amount of video memory you really need. Part two will look at anti-aliasing technologies, the impact of display choice, various PCI Express link configurations, and the idea of value for your money.

Time to move on to the test system setup. More so here than in other reviews, you will want to read that page carefully, since it contains important information about the tests themselves.

2. Graphics Card Myth Busting: How We Tested

Two Systems; Two Purposes

All of today's tests are performed on two separate rigs. One plays host to an older Intel Core i7-950, while the other is based on Intel's Core i7-4770K.

Test System 1
Enclosure
Corsair Obsidian Series 800D, Full Tower Case
CPU
Intel Core i7-950 (Bloomfield), Overclocked to 3.6 GHz, Hyper-Threading and power-saving features disabled
CPU Cooler
CoolIT Systems ACO-R120 ALC, Tuniq TX-4 TIM, Scythe GentleTyphoon 1850 RPM radiator fan
Motherboard
Asus Rampage III Formula
Intel LGA 1366, Intel X58 Chipset, BIOS: 903
Memory
Corsair CMX6GX3M3A1600C9, 3 x 2 GB, 1600 MT/s, CL 9
Graphic Cards
AMD Radeon R9 290X 4 GB (Press Board)
Nvidia GeForce GTX 690 4 GB
(Retail Board)
Nvidia GeForce GTX Titan 6 GB (Press Board)
Hard Drive
Samsung 840 Pro, 128GB SSD, SATA 6 Gb/s
Power Supply Unit
Corsair AX850, 850 W
Networking
Cisco-Linksys WMP600N (Ralink RT286) 
Audio Card
Asus Xonar Essence STX
Software and Drivers
Operating system
Windows 7 Enterprise x64, Aero disabled (see note below)
Windows 8.1 Pro x64 (for reference only)
DirectX
DirectX 11
Graphic Drivers
AMD Catalyst 13.11 Beta 9.5; Nvidia GeForce 331.82 WHQL
Test System 2
Enclosure
Cooler Master HAF XB, Desktop/Test bench hybrid format
CPU
Intel Core i7-4770k (Haswell), Overclocked to 4.6 GHz, Hyper-Threading and power-saving features disabled
CPU Cooler
Xigmatek Aegir SD128264, Xigmatek TIM, Xigmatek 120 mm fan
Motherboard
ASRock Extreme6/ac
Intel LGA 1150, Intel Z87 Chipset, BIOS: 2.20
Memory
G.Skill F3-2133C9D-8GAB, 2 x 4 GB, 2133 MT/s, CL 9
Graphic Cards
AMD Radeon R9 290X 4 GB (Press Board)
Nvidia GeForce GTX 690 4 GB
(Retail Board)
Nvidia GeForce GTX Titan 6 GB (Press Board)
Hard Drive
Samsung 840 Pro, 128GB SSD, SATA 6 Gb/s
Power Supply Unit
Cooler Master V1000, 1000 W
Networking
On-board 802.11ac mini-PCIe Wi-Fi card
Audio Card
On-board Realtek ALC1150
Software and Drivers
Operating system
Windows 8.1 Pro x64
DirectX
DirectX 11
Graphic Drivers
AMD Catalyst 13.11 Beta 9.5; Nvidia GeForce 332.21 WHQL

The first test system needed to facilitate repeatable results in a real-world environment. So, I set up a relatively old, but still very capable LGA 1366-based machine in a full-tower enclosure.

Test system number two needed to fulfill more specific requirements:

  1. Support PCIe 3.0 with a limited number of lanes (an LGA 1150-equipped board with a Haswell-based CPU, which only offers 16 lanes)
  2. Do not employ a PLX bridge chip
  3. Support three-way CrossFire in x8/x4/x4 or SLI in x8/x8 configurations

ASRock sent us its Z87 Extreme6/ac, which fit that description. We previously tested this board (minus the Wi-Fi module) in Five Z87 Motherboards Under $220, Reviewed, where it received our prestigious Smart Buy award. The sample we received was easy to set up and had no problem overclocking our Core i7-4770K to 4.6 GHz.

The board's UEFI gave me the option to set PCI Express transfer rates on a slot-by-slot basis, which enabled testing of PCIe Gen 1, 2 and 3 on the same motherboard. You will see the results of these tests in part 2 of this article.

Cooler Master supplied the second test system's chassis and power supply. The unconventional HAF XB enclosure, which also received Smart Buy honors in Cooler Master's HAF XB: Give Your LAN Party Box Breathing Room, proved comfortable to work with. It's very open, of course, so the components inside can get noisy if you don't have the right cooling setup. The case benefits from good airflow though, particularly if you hook up all of the optional fans.

The modular V1000 power supply allowed us to drive three high-end graphics cards, while containing cable clutter in a setting that was destined to get messy.

Comparing Test System 1 And 2

It's striking to see how similarly these systems perform once we get past their underlying architectures and focus on their frame rates. Here's a head-to-head between them in 3DMark Firestrike.

As you can see, the performance in graphics tests is essentially the same, even though the second machine has faster system memory (2133 versus 1800 MT/s, counter-balanced by Nehalem's triple-channel architecture compared to Haswell's two channels). Only in the host processor-dependent tests does the Core i7-4770K demonstrate an advantage.

The second system's main advantage is more overclocking headroom. Our Core i7-4770K sits at a stable 4.6 GHz on air, while the Core i7-950 can't exceed 4 GHz cooled by water. 

It's also worth noting that the first test system is benchmarked using Windows 7 x64 instead of Windows 8.1. There are three reasons for this:

  • First, the Desktop Window Manager (dwm.exe), which renders the Windows Aero interface, uses a significant amount of graphics memory. At 2160p, it ties up an additional 200 MB in Windows 7 and 300 MB in Windows 8.1, on top of the 123 MB already reserved by Windows. This cannot be disabled without significant side effects in Windows 8.1, while it can be disabled in Windows 7 by switching to a basic theme. Four hundred megabytes is 20% of a 2 GB card's memory.
  • The memory usage in Windows 7 is consistent with a basic theme enabled. It is always 99 MB at 1080p and 123 MB at 2160p on a GeForce GTX 690. This makes for more repeatable tests. In contrast, the additional ~200 MB of memory used by Aero varies up and down by roughly 40 MB.
  • As of Nvidia's 331.82 WHQL driver, a bug exists affecting 2160p when Windows Aero is enabled. This only surfaces when Aero is enabled on a tiled 4K display, and it manifests itself as lowered GPU utilization during benchmarks (bouncing in the 60-80% range, instead of close to 100%), and a resulting drop in performance of 15% or so. Nvidia was notified of this.
Additional testing equipment
Screen photography
Canon EOS 400D
Canon EF 50 mm f/1.8 lens
1/400s, ISO 800, f/1.8-2.8
Sound Pressure Level monitor
ART SPL-8810, dB(A)/Low/Fast setting

Tearing and ghosting effects do not show up in regular screen shots or game videos; I used a fast camera to capture the actual on-screen image.

Case ambient temperature is measured with the Samsung 840 Pro integrated temperature sensor. Background ambient temperature was in the range of 20-22 °C (68-72 °F). Background sound pressure level for all noise tests was 33.7 dB(A), +/- 0.5 dB(A).

Benchmark Configuration
3D Games
The Elder Scrolls V: Skyrim
Version 1.9.32.0.8, Custom THG Benchmark, 25-Sec. HWiNFO64
Hitman: Absolution
Version 1.0.447.0, Built-in Benchmark, HWiNFO64
Total War: Rome 2
Patch 7, Built-in "Forest" Benchmark, HWiNFO64
BioShock Infinite
Patch 11, Version 1.0.1593882, Built-in Benchmark, HWiNFO64
Synthetic Benchmarks
Unigine Valley
Version 1.0, ExtremeHD Preset, HWiNFO64
3DMark Fire Strike [Extreme]
Version 1.1

A variety of tools can be used for measuring graphics card memory use. We went with HWiNFO64, taking advantage of its maximum mark. The same results can be obtained through MSI Afterburner, EVGA Precision X, or simply the RivaTuner Statistics Server stand-alone.

3. To Enable Or Disable V-Sync: That Is The Question

Speed is the first dimension that comes to mind in a graphics card evaluation. How much faster is the latest and greatest than whatever came before? The Internet is littered with benchmarking data from thousands of sources trying to answer that question.

So, let's start by exploring speed and the variables to consider if you really want to know how fast a given graphics card is. 

Myth: Frame rate is the indicator of graphics performance

Let's start with something that the Tom's Hardware audience probably knows already, but remains a misconception elsewhere. Common wisdom suggests that for a game to be playable, it should run at 30 frames per second or more. Some folks believe lower frame rates are still alright, and others insist that 30 FPS is far too low.

In the debate, however, it's not always reinforced that FPS is just a rate, and there's a host of complexity behind it. Most notably, while the frame rate of a movie is constant, a rendered game's frame rate varies over time and is consequently expressed as an average. That variation is a byproduct of the horsepower required to process any given scene; as the on-screen content changes, so does the frame rate.

The simple point is that there is more to quality of a gaming experience than the instantaneous (or average) rate at which frames are rendered. The consistency of their delivery is an additional factor. Imagine traveling on a highway at a constant 65 MPH compared to the same trip at an average of 65 MPH, spending a lot more time switching between accelerator and brake. You reach your destination in roughly the same amount of time, but the experience is quite a bit different.
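
The same point in numbers: two made-up runs with identical average frame rates but very different pacing.

```python
# Two hypothetical runs with the same average frame rate but different pacing.
steady  = [20, 20, 20, 20, 20, 20]    # every frame takes 20 ms
erratic = [10, 30, 10, 30, 10, 30]    # same total time, alternating 10/30 ms
for label, times in (("steady", steady), ("erratic", erratic)):
    avg_fps = 1000.0 * len(times) / sum(times)
    worst_swing = max(abs(b - a) for a, b in zip(times, times[1:]))
    print(f"{label}: {avg_fps:.0f} FPS average, worst frame-to-frame swing {worst_swing} ms")
# Both report 50 FPS, but only one of them feels like 50 FPS.
```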

So, let's set the question "How much performance is enough?" aside for a moment. We'll get back to it after touching a few other relevant topics.

Introducing V-sync

Myths: Frame rates over 30 FPS aren't necessary; the human eye can't tell a difference. Values above 60 FPS on a 60 Hz display aren't necessary; the monitor is already refreshing 60 times a second. V-sync should always be enabled. V-sync should always be disabled.

How are rendered frames actually displayed? Because of the way almost all LCD displays work, the image on-screen is updated a fixed number of times per second. Typically, the magic number is 60, though there are also 120 and 144 Hz panels capable of more refreshes per second. When you talk about this mechanism, you're referring to the refresh rate, which of course is measured in Hertz.

Now, the mismatch between the graphics card's variable frame rate and the display's fixed refresh rate can be problematic. When the former happens faster than the latter, you end up with multiple frames displayed in the same scan, resulting in an artifact called screen tearing. In the image above, the colored bars denote unique frames from the graphics card getting thrown up on-screen as they're ready. This can be highly distracting, particularly in a fast-paced shooter.

The image below shows another artifact commonly seen on-screen, but rarely documented. Because it's a display artifact, it doesn't show up in screen shots, but instead represents the image your eyes actually see. You need a fast camera to capture it. FCAT, which is what Chris Angelini used to create the traffic cone shot in Battlefield 4, does reflect tearing, but not the ghosting effect I'm illustrating.

Screen tearing is evident in both of my BioShock Infinite images. But it's more evident on the 60 Hz Sharp than on the 120 Hz Asus panel, because the VG236HE runs at a refresh rate that's twice as high. This artifact is the clearest indicator that a game is running with V-sync, or vertical synchronization, disabled. 

The other issue in the BioShock image is ghosting, which you can see especially in the bottom of the left image. This is attributable to screen latency. In short, individual pixels don't change color quickly enough and show this type of afterglow. The in-game effect is far more dramatic than my images suggest. With an 8 ms gray-to-gray response time, which is what the Sharp screen on the left is specified for, fast on-screen movement appears blurry.

Back to tearing. The aforementioned V-sync is an old solution to the problem, which synchronizes the rate at which the video card presents frames to the screen's refresh rate. Because multiple frames no longer show up in a single panel refresh, tearing is no longer an issue. However, if you crank up the graphics quality of your favorite title and its frame rate drops below 60 FPS (or whatever your panel's refresh is set to), then your effective frame rate bounces between integer fractions of the refresh rate (60, 30, 20 FPS, and so on), as illustrated below. Now, you face another artifact called stuttering.
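
To put numbers on that, here's a toy calculation (assuming plain double buffering, with no triple buffering) of how render time maps to displayed frame rate with V-sync enabled.

```python
# Toy model of double-buffered V-sync: a finished frame waits for the next
# refresh, so the effective frame rate snaps to refresh_rate / n for whole n.
import math

def vsync_fps(render_ms, refresh_hz=60):
    refresh_ms = 1000.0 / refresh_hz
    refreshes_per_frame = math.ceil(render_ms / refresh_ms)
    return refresh_hz / refreshes_per_frame

for render_ms in (10, 17, 20, 34):
    print(f"{render_ms:>2} ms render -> {vsync_fps(render_ms):.0f} FPS at 60 Hz")
# 10 ms -> 60 FPS, but 17 ms (just missing the 16.7 ms budget) -> 30 FPS,
# 20 ms -> 30 FPS, 34 ms -> 20 FPS: hence the visible stuttering.
```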

One of the Internet's oldest arguments is whether you should turn V-sync on or leave it off. Some folks insist it's one or the other, and some enthusiasts will change the setting based on the game they're playing.

So, V-sync On, Or V-sync Off?

Let's say you're in the majority and own a typical 60 Hz display:

  • If you play first-person shooter games competitively, and/or have issues with perceived input lag, and/or if your system cannot sustain at least 60 FPS in a given title, and/or you're benchmarking your graphics card, then you should turn V-sync off
  • If none of the above applies to you and you experience significant screen tearing, then you should turn V-sync on.
  • As a general rule, or if you don’t feel strongly either way, just keep V-sync off.

If you own a gaming-oriented 120/144 Hz display (if you have one, there's a good chance you bought it specifically for its higher refresh rate):

  • Consider turning V-sync on only in older games, where frame rates are sustained above 120 FPS and you still see screen tearing.

Note that there are certain cases where the frame rate-halving impact of V-sync doesn't apply, such as applications supporting triple buffering, though those cases aren't common. Also, in some games (like The Elder Scrolls V: Skyrim), V-sync is enabled by default. Forcing it off by modifying certain files can cause issues with the game engine itself. In those cases, you're best off leaving V-sync on.

G-Sync, FreeSync, and the Future

G-Sync Technology Preview: Quite Literally A Game Changer was a preview of Nvidia's solution to all of this. AMD made a somewhat feeble attempt at responding by showing off its FreeSync technology at CES 2014, though that might only be viable on laptops for now - that said, we applaud AMD's open-source approach to the technology as the right way to go. Both capabilities work around V-sync's compromises by allowing the display to operate at a variable refresh.

It is hard to say where the industry is heading, but as I mentioned in my G-Sync coverage, we're not fans of proprietary standards (and I bet most OEMs agree). I'd like to see Nvidia consider opening up G-Sync to the rest of the community, though we know from experience that the company tends not to do this.

4. Do I Need To Worry About Input Lag?

Myth: Graphics Cards Affect Input Lag

Let’s say you’re getting shot up in your favorite multi-player shooter before you have the chance to even react. Is your opposition really that much better than you? Could they be cheating? Or is something else going on?

Aside from the occasional cheat, which does happen, the truth might be that those seemingly super-human reflexes are at least partly assisted by technology. And they might have very little to do with your graphics card.

It takes time for what happens in a game to show up on your screen. It takes time for you to react. And it takes time for your mouse and keyboard inputs to register. Somewhat improperly, the delay between you issuing a command and the on-screen action is commonly called input lag. So, if you press the trigger in a first-person shooter and your weapon fires 0.1 seconds later, your input lag is effectively 100 milliseconds. 

Human reaction times to visual inputs vary. According to a 1986 U.S. Navy study, F-14 fighter pilots reacted to a simple visual stimulus in an average of 223 ms. It might seem counterintuitive, but human beings actually react faster to sound than to visual input; reactions to auditory stimuli tend to be in the ~150 ms range.

If you're curious, you can test for yourself how quickly you react to either by clicking the simple visual test and then the audio test.

Fortunately, no matter how poorly-configured your PC may be, it probably won't hit 200 ms of input lag. So, your personal reaction time remains the biggest influencer of how quickly your character responds in a game.

As differences in input lag increase, however, they increasingly do affect gameplay. Imagine a professional gamer with reflexes comparable to the best fighter pilots at 150 ms. A 50 ms slow-down in input means that person will be roughly 33% slower (that's three frames on a 60 Hz display) than the competition. At the professional level, that's notable.

For mere mortals (including me; I scored 200 ms in the visual test linked above), and for anyone who would rather play Civilization V leisurely than Counter Strike 1.6 competitively, it’s an entirely different story; you can likely ignore input lag altogether.

Here are some of the factors that can worsen input lag, all else being equal:

  • Playing on an HDTV (even more so if its game mode is disabled) or playing on an LCD display that performs some form of video processing that cannot be bypassed. Check out DisplayLag's Input Lag database for a great list organized by model.
  • Playing on LCDs that employ higher-response-time IPS panels (5-7 ms GtG typical), versus TN+Film panels (1-2 ms GtG possible), versus CRT displays (the fastest available).
  • Playing on displays with lower refresh rates; the newest gaming displays support 120 or 144 Hz natively.
  • Playing at low frame rates (30 FPS is one frame every 33 ms; 144 FPS is one frame every 7 ms).
  • Using a USB-based mouse with a low polling rate. The default 125 Hz rate is an 8 ms cycle time, adding ~4 ms of input lag on average. Meanwhile, gaming mice can poll at ~1000 Hz, for ~0.5 ms of average added lag.
  • Using a low-quality keyboard (keyboard input lag is 16 ms typically, but can be higher for poor ones).
  • Enabling V-sync, especially so when using triple buffering as well (there is a myth that Direct3D does not implement triple buffering; the reality is that Direct3D does account for the option of multiple back buffers, but few games exploit this). Check out Microsoft's write-up, if you're technically inclined.
  • Playing with high render-ahead queues. The default in Direct3D is three frames, or 50 ms at 60 FPS. This figure can be increased to 20 for greater “smoothness” and dropped to one for increased responsiveness, at the cost of greater frame time variance and, in some cases, somewhat lower FPS overall. There is no true zero setting; entering zero simply resets the value to the default of three. Check out Microsoft's write-up, if you're technically inclined.
  • Playing on a high-latency Internet connection. While this goes beyond what would strictly be defined as input lag, it effectively stacks with it. (A rough tally of these contributions appears in the sketch below.)
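
To illustrate how these contributions stack, here's a back-of-the-envelope tally. Every number is a placeholder drawn from the bullet points above, not a measurement.

```python
# Back-of-the-envelope sum of the lag contributions listed above.
# All numbers are hypothetical placeholders; measure your own chain.

def mouse_polling_lag_ms(polling_hz):
    """A device polled at f Hz adds 1/(2f) seconds of lag on average."""
    return 1000.0 / (2 * polling_hz)

def render_queue_lag_ms(queued_frames, fps):
    """Frames sitting in the render-ahead queue before reaching the display."""
    return queued_frames * 1000.0 / fps

total_ms = (
    mouse_polling_lag_ms(125)        # stock 125 Hz mouse: ~4 ms
    + 16.0                           # typical keyboard controller
    + render_queue_lag_ms(3, 60)     # default three-frame queue at 60 FPS: 50 ms
    + (1000.0 / 60) / 2              # ~half a refresh waiting for scan-out: ~8 ms
    + 10.0                           # display processing (varies widely by panel)
)
print(f"Estimated input lag: ~{total_ms:.0f} ms")   # ~88 ms with these guesses
```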

Factors that do not make a difference include:

  • Using a PS/2 or USB keyboard (see a dedicated page in our article: Five Mechanical-Switch Keyboards: Only The Best For Your Hands)
  • Using a wireless or wired network connection (just try pinging your router if you don’t believe us; you should see ping times of less than 1 ms). 
  • Enabling SLI or CrossFire. The longer render queues required to enable these technologies are generally compensated by higher frame throughput.

Bottom Line: Input lag only matters in "twitch" games, and really matters only at highly competitive levels.

There is a lot more to input lag than just display technology or a graphics card. Your hardware, hardware settings, display, display settings, and application settings all influence this measurement.

5. The Myths Surrounding Graphics Card Memory

Video memory enables higher resolutions and quality settings; it does not improve speed

Graphics memory is often used by card vendors as a marketing tool. Because gamers have been conditioned to believe that more is better, it's common to see entry-level boards with far more RAM than they need. But enthusiasts know that, as with every subsystem in their PCs, balance is most important.

Broadly, graphics memory is dedicated to a discrete GPU and the workloads it operates on, separate from the system memory plugged in to your motherboard. There are a couple of memory technologies used on graphics cards today, the most popular being DDR3 and GDDR5 SDRAM.

Myth: Graphics cards with 2 GB of memory are faster than those with 1 GB

Not surprisingly, vendors arm inexpensive cards with too much memory (and eke out higher margins) because there are folks who believe more memory makes their card faster. Let's set the record straight on that. The memory capacity a graphics card ships with has no impact on that product's performance, so long as the settings you're using to game with don't consume all of it.

What does having more video memory actually help, then? In order to answer that, we need to know what graphics memory is used for. This is simplifying a bit, but it helps with:

  • Loading textures
  • Holding the frame buffer
  • Holding the depth buffer ("Z Buffer")
  • Holding other assets that are required to render a frame (shadow maps, etc.)

Of course, the size of the textures getting loaded into memory depends on the game you're playing and its quality preset. As an example, the Skyrim high-resolution texture pack includes 3 GB of textures. Most applications dynamically load and unload textures as they're needed, though, so not all textures need to reside in graphics memory. The textures required to render a particular scene do need to be in memory, however.

The frame buffer is used to store the image as it is rendered, before or during the time it is sent to the display. Thus, its memory footprint depends on the output resolution (a 1920x1080 image at 32 bpp is ~8.3 MB; a 3840x2160 image is ~33.2 MB) and the number of buffers (at least two; rarely three or more).

As specific anti-aliasing modes (FSAA, MSAA, CSAA, CFAA, but not FXAA or MLAA) effectively increase the number of pixels that need to be rendered, they proportionally increase overall required graphics memory. Render-based anti-aliasing in particular has a massive impact on memory usage, and that grows as sample size (2x, 4x, 8x, etc) increases. Additional buffers also occupy graphics memory.
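
If you want to estimate the buffer portion yourself, the following sketch computes color- and depth-buffer sizes for a given resolution and MSAA level, in decimal megabytes to match the figures above. It ignores driver overhead and compression, so treat it as a rough floor rather than an exact figure.

```python
# Rough estimate of color- and depth-buffer memory for a given resolution and
# MSAA level. Real drivers add overhead (and may compress), so this is a floor.

def buffer_memory_mb(width, height, msaa=1, color_buffers=2,
                     bytes_per_color=4, bytes_per_depth=4):
    samples = width * height * msaa                    # MSAA stores extra samples per pixel
    color = samples * bytes_per_color * color_buffers
    depth = samples * bytes_per_depth
    return (color + depth) / 1e6                       # decimal megabytes

print(buffer_memory_mb(1920, 1080))             # ~24.9 MB: double-buffered 1080p, no AA
print(buffer_memory_mb(3840, 2160, msaa=4))     # ~398 MB: double-buffered 2160p with 4xMSAA
```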

So, a graphics card with more memory allows you to:

  1. Play at higher resolutions
  2. Play at higher texture quality settings
  3. Play with higher render-based antialiasing settings

Now, to address the myth.

Myth: You need 1, 2, 3, 4, or 6 GB of graphics memory to play at (insert your display's native resolution here).

The most important factor affecting the amount of graphics memory you need is the resolution you game at. Naturally, higher resolutions require more memory. The second most important factor is whether you're using one of the anti-aliasing technologies mentioned above. Assuming a constant quality preset in your favorite game, other factors are less influential.

Before we move on to the actual measurements, allow me to express one more word of caution. There is a particular class of high-end cards with two GPUs (AMD's Radeon HD 6990 and 7990, along with Nvidia's GeForce GTX 590 and 690) that are advertised with a certain total amount of on-board memory. But as a result of their dual-GPU designs, data is essentially duplicated, halving the effective capacity. A GeForce GTX 690 with 4 GB, for instance, behaves like two 2 GB cards in SLI. Moreover, when you add a second card to your gaming configuration in CrossFire or SLI, the array's graphics memory doesn't double. Each card still has access only to its own memory.

These tests were run on a Windows 7 x64 setup with Aero disabled. If you're using Aero (or Windows 8/8.1, where desktop composition can't be switched off), you should add ~300 MB to each and every individual measurement you see listed below.

As you can see from the latest Steam hardware survey, most gamers (about half) tend to own video cards with 1 GB of graphics memory, ~20% have about 2 GB, and the number of users with 3 GB or more is less than 2%.

We tested Skyrim with the official high-resolution texture pack enabled. As you can see, 1 GB of graphics memory is barely enough to play the game at 1080p without AA or with MLAA/FXAA enabled. Two gigabytes will let you run at 1920x1080 with details cranked up, and at 2160p with reduced levels of AA. To enable the full Ultra preset and 8xMSAA, not even a 2 GB card is sufficient.

Bethesda’s Creation Engine is a unique creature in this set of benchmarks. It is not easily GPU-bound, and is instead often limited by platform performance. But these tests demonstrate that Skyrim can also be bottlenecked by graphics memory at the highest-quality settings.

It's also worth noting that enabling FXAA uses no memory whatsoever. There's a value trade-off to be made in cases where MSAA is not an option.

6. More Graphics Memory Measurements

Io Interactive's Glacier 2 engine, which powers Hitman: Absolution, is memory-hungry, second only (in our tests) to the Warscape engine from Creative Assembly (Total War: Rome II) when the highest-quality presets are taken into account.

In Hitman: Absolution, a 1 GB card is not sufficient for playing at the game’s Ultra Quality level at 1080p. A 2 GB card does allow you to set 4xAA at 1080p, or to play without MSAA at 2160p.

To enable 8xMSAA at 1080p you need a 3 GB card, and nothing short of a 6 GB Titan supports 8xMSAA at 2160p.

Once again, enabling FXAA uses no additional memory.

Note: Unigine’s latest benchmark, Valley 1.0, does not support MLAA/FXAA directly. Thus, the results you see represent memory usage when MLAA/FXAA is force-enabled in CCC/NVCP.

The data shows us that Valley runs fine on a 2 GB card at 1080p (at least as far as memory use goes). You can even use a 1 GB card with 4xMSAA enabled, which is not the case for most games. At 2160p, however, the benchmark will only run properly on a 2 GB card so long as you don't turn on AA, or use a post-processing effect instead. The 2 GB ceiling gets hit with 4xMSAA turned on.

Ultra HD with 8xMSAA enabled gobbles up over 3 GB of graphics memory, which means this benchmark will only run properly at that preset using Nvidia's GeForce GTX Titan or one of AMD's 4 GB Hawaii-based boards.

Total War: Rome II uses an updated Warscape engine from Creative Assembly. It doesn't support SLI at the moment (CrossFire does work, however). It also doesn't support any form of MSAA. The only form of anti-aliasing that works is AMD's proprietary MLAA, which is a post-processing technique like SMAA and FXAA.

One notable feature of this engine is its ability to auto-downgrade image quality based on available video memory. That's a good way to keep the game playable with minimal end-user involvement. But a lack of SLI support cripples the title on Nvidia cards at 3840x2160. At least for now, you'll want to play on an AMD board if 4K is your resolution of choice.

With MLAA disabled, Total War: Rome II’s built-in “forest” benchmark at the Extreme preset uses 1848 MB of graphics memory. The GeForce GTX 690’s 2 GB limit is exceeded with MLAA enabled at 2160p. At 1920x1080, memory use is in the 1400 MB range.

Note the surprising fact that a supposedly AMD-only technology (MLAA) runs on Nvidia hardware. As both FXAA and MLAA are post-processing techniques, there is no technical reason they can't run on either vendor's hardware. Either Creative Assembly is switching behind the scenes to FXAA (despite what the configuration file says), or AMD's marketing department hasn't picked up on the fact above.

You need at least a 2 GB card to play Total War: Rome II at its Extreme quality preset at 1080p, and likely a CrossFire array with 3 GB+ to play smoothly at 2160p. If you only have a 1 GB card, the game might still be playable at 1080p, but you'll have to make some quality compromises.

What happens when graphics memory is completely consumed? The short answer is that graphics data starts getting swapped to system memory over the PCI Express bus. Practically, this means performance slows dramatically, particularly when textures are being loaded. You don't want this to happen. It'll make any game unplayable due to massive stuttering.

So, how much graphics memory do I need?

If you own a 1 GB card and a 1080p display, there's probably no need to upgrade right this very moment. A 2 GB card would let you turn on more demanding AA settings in most games though, so consider that a minimum benchmark if you're planning a new purchase and want to enjoy the latest titles at 1920x1080.

As you scale up to 1440p, 1600p, 2160p or multi-monitor configurations, start thinking beyond 2 GB if you also want to use MSAA. Three gigabytes becomes a better target (or multiple 3 GB+ cards in SLI/CrossFire).

Of course, as I mentioned, balance is critical across the board. An underpowered GPU outfitted with 4 GB of GDDR5 memory (rather than 2 GB) isn't going to automatically be playable at high resolutions just because it's complemented by the right amount of memory. And that's why, when we review graphics cards, we test multiple games, resolutions, and detail settings. It takes fleshing out a card's bottlenecks before smart recommendations can be made.

7. Thermal Management In A Modern Graphics Card

Modern graphics cards from both AMD and Nvidia employ protection mechanisms to ramp up fan speeds, and eventually throttle back clock rates and voltages if they get too hot. This technology doesn't always work to keep your system stable (particularly when you're overclocking). Rather, it's meant to keep the hardware from getting damaged. So it's not unheard of for an over-tuned card to crash, requiring a reset.

There has been much debate about how hot is too hot for a GPU. However, higher temperatures, if they're tolerated by the equipment, are actually desirable, as they result in better heat dissipation overall (the larger the difference with ambient temperature, the more heat can be transferred). At least from a technical perspective, AMD's frustration over reactions to the Hawaii GPU's thermal ceiling is understandable. There are no long-term studies that I'm aware of speaking to the viability of given temperature set points, so beyond my own experience with device stability, I have to rely on manufacturer specifications.

On the other hand, it is a well-known fact that silicon transistors broadly perform better at lower temperatures. That is the main reason you see competitive overclockers using liquid nitrogen to get the chips they're testing as cold as possible. In general, lower temperatures help facilitate more overclocking headroom.

Some of the most power-hungry cards in the world are the Radeon HD 7990 (375 W TDP) and GeForce GTX 690 (300 W TDP). Both are dual-GPU cards. Single-GPU boards tend to be quite a bit lower, though the Radeon R9 290-series cards creep up closer to 300 W. In either case, that's a lot of heat to dissipate.

Volumes have been written about graphics card cooling, so we won't delve into that here. Rather, we're interested in what actually happens when you begin applying load to a modern GPU.

  1. You launch a processing-intensive application like a 3D game or your favorite bitcoin miner
  2. The card's clock rates increase to their nominal/boost values; the board starts warming up due to greater current draw
  3. Fan speed progressively rises, up to a point defined by firmware; usually it'll taper off when acoustics approach 50 dB(A)
  4. If the programmed fan speed isn't enough to keep the GPU's temperature below a certain level, clock rates scale back until the temperature falls below the set threshold
  5. Your card should operate stably within a relatively narrow frequency and temperature range until the application driving the load is shut down (a toy model of this feedback loop follows below)
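
The toy model below (with entirely made-up constants, including a hypothetical 95 °C limit and a 60% fan ceiling) mimics that loop: the fan ramps first, and once it hits its cap, the core clock becomes the variable that gives way.

```python
# Toy model of the steps above: fan speed ramps toward a firmware-defined cap,
# and once that cap is reached, the core clock steps down until temperature
# settles at the limit. All constants are invented purely for illustration.

def settle(load_watts=250.0, ambient_c=20.0, temp_limit_c=95.0,
           fan_cap_pct=60.0, steps=2000):
    temp, fan_pct, clock_mhz = ambient_c, 30.0, 1000.0
    for _ in range(steps):
        heat_in = load_watts * clock_mhz / 1000.0             # crude: heat tracks clock
        cooling = (0.5 + fan_pct / 25.0) * (temp - ambient_c)  # better when hotter / faster fan
        temp += 0.01 * (heat_in - cooling)
        if temp > temp_limit_c - 10 and fan_pct < fan_cap_pct:
            fan_pct += 1.0                                     # step 3: fan ramps first
        elif temp > temp_limit_c:
            clock_mhz = max(clock_mhz - 13.0, 700.0)           # step 4: then the clock throttles
    return round(temp, 1), fan_pct, round(clock_mhz)

print(settle())   # ends up pinned near the limit at a reduced clock (step 5)
```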

As you can imagine, the exact thermal throttling point depends on many factors, including the specific load, the enclosure's airflow, the ambient air temperature, and even ambient air pressure. That's why cards throttle at different times, or not at all. This thermal throttling point can be used to define a reference level of performance. And if we set a card's fan speed (and thus noise level) manually, we can create a noise-dependent measurement level. What use is that? Let's find out...

8. Testing Performance At A Constant 40 dB(A)

Why 40 dB(A)?

First of all, note the A in the decibel notation. That stands for A-weighting. It means that sound pressure levels are adjusted using a curve that mimics human sensitivity to noise levels at different frequencies.

Forty decibels is generally considered to be the background noise level of a reasonably quiet apartment. Recording studios might be in the 30 dB range, while 50 dB might be a quiet suburb or a conversation at home. Zero is commonly considered the threshold of human hearing, although it's uncommon to hear in the 0-5 dB range unless you're less than five years old. The decibel scale is logarithmic, not linear: every 10 dB step represents ten times the sound power and is perceived as roughly twice as loud. So 50 dB sounds about twice as loud as 40, which sounds about twice as loud as 30.
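
If you want to do the arithmetic yourself, here's a quick sketch; the "twice as loud per 10 dB" figure is a perceptual rule of thumb, not an exact law.

```python
# dB arithmetic: every 10 dB adds a factor of ten in sound power, and is
# perceived as roughly twice as loud (a rule of thumb, not an exact law).
def power_ratio(delta_db):
    return 10 ** (delta_db / 10)

def loudness_ratio(delta_db):
    return 2 ** (delta_db / 10)

print(power_ratio(10))      # 10.0  -> ten times the acoustic power
print(loudness_ratio(10))   # 2.0   -> sounds about twice as loud
print(loudness_ratio(3))    # ~1.23 -> the stock-vs-overclocked GTX 690 gap (42.5 vs. 45.5 dB(A))
```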

Trivia: The world's quietest room has a -9 dB background noise level, and will reportedly give you hallucinations in less than an hour if you stand inside in the dark, due to sensory deprivation.

A PC operating at 40 dB(A) tends to blend in with the background noise level of your home/apartment. Generally, it shouldn't be noticeable.

How Do You Dial In 40 dB(A) Consistently?

A card's noise profile is affected by a few variables, one of which is the speed of its fan. Not all fans make the same amount of noise at the same RPM level, but each fan, on its own, should be consistent at a given rotational speed.

So, by measuring directly with an SPL meter from three feet away, I manually set each card's fan profile right at 40 dB(A).
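
If you'd rather formalize that manual process, it amounts to a bisection over the fan duty cycle. The sketch below is hypothetical: set_fan_percent() and read_spl_dba() are placeholders for whatever combination of Afterburner/PrecisionX sliders and SPL-meter readings you actually use. The settings that resulted for each card are in the table below.

```python
# Hypothetical sketch of dialing in a 40 dB(A) target by bisecting the fan
# duty cycle. set_fan_percent() and read_spl_dba() are placeholders: in
# practice I move the slider by hand and read the SPL meter myself.
import time

def find_fan_setting(set_fan_percent, read_spl_dba,
                     target_dba=40.0, tolerance_dba=0.2):
    low, high = 20.0, 100.0                 # duty-cycle search range, percent
    while high - low > 0.5:
        mid = (low + high) / 2.0
        set_fan_percent(mid)
        time.sleep(30)                      # let fan speed and noise stabilize
        if read_spl_dba() > target_dba + tolerance_dba:
            high = mid                      # too loud: try a slower fan
        else:
            low = mid                       # at or under target: try faster
    return low
```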

Card                  Fan Setting    Fan RPM    dB(A) ±0.5
Radeon R9 290X        41%            2160       40.0
GeForce GTX 690       61%            2160       40.0
GeForce GTX Titan     65%            2780       40.0

The table above shows us that the Radeon R9 290X and GeForce GTX 690 achieve 40 dB(A) at the same fan speed, although at different fan settings. The Radeon's fan can be pushed higher overall, hitting rotational speeds and noise levels that the GTX 690's cooler cannot. Nvidia's GeForce GTX Titan, on the other hand, has a different noise profile, hitting 40 dB(A) at a higher 2780 RPM, but at a setting (65%) similar to the GeForce GTX 690 (61%).

This table illustrates fan profiles across a variety of presets. Overclocked cards under load can get pretty loud; I measured around 47 dB(A). The Titan is the quietest under a typical load, at 38.3 dB(A), while the GeForce GTX 690 is the loudest, at 42.5 dB(A).

9. Can Overclocking Hurt Performance At 40 dB(A)?

Myth: Overclocking always yields performance benefits

Setting a specific fan profile, and letting cards throttle until they reach stability, yields an interesting and repeatable test.

Card                             Ambient (°C)   Fan Setting   Fan RPM   dB(A) ±0.5   GPU1 Clock (MHz)   GPU2 Clock (MHz)   Memory Clock (MHz)   FPS
Radeon R9 290X                   30             41%           2160      40.0         870-890            n/a                1250                 55.5
Radeon R9 290X (Overclocked)     28             41%           2160      40.0         831-895            n/a                1375                 55.5
GeForce GTX 690                  42             61%           2160      40.0         967-1006           1032               1503                 73.1
GeForce GTX 690 (Overclocked)    43             61%           2160      40.0         575-1150           1124               1801                 71.6
GeForce GTX Titan                30             65%           2780      40.0         915-941            n/a                1503                 62.0
GeForce GTX Titan (Overclocked)  29             65%           2780      40.0         980-1019           n/a                1801                 68.3

Only the GeForce GTX Titan performs better when it's overclocked. The Radeon R9 290X gets absolutely no benefit, while the GeForce GTX 690 actually loses performance at our 40 dB(A) test point, cutting clock rate as low as 575 MHz when we overclock.

This test shows how much more performance headroom the Titan has compared to the other cards. Although it doesn't match the GeForce GTX 690, the overclocked Titan gets close, leaving the Radeon R9 290X further behind than more typical benchmarks might suggest.

Another interesting point is how much higher the ambient temperature inside my case climbs with a GeForce GTX 690 installed (12-14 °C higher). That's the effect of its center-mounted axial fan, which blows hot air back into the chassis, limiting thermal headroom. In most real-world cases, we'd expect a similar scenario. So, the trade-off between more noise and more performance (or the other way around) needs to be weighed according to your own tastes.

Now, with V-sync, input lag, graphics memory, and benchmarking at a specific acoustic footprint explored in depth, we'll get back to work on part two, which will cover PCIe transfer rates, display sizes, deep dives on proprietary vendor technologies, and value for your dollar. Of course, if there are other topics you'd like to see us broach, please let us know in the comments section!