Nvidia GeForce GTX 1080 Pascal Review

Simultaneous Multi-Projection And Async Compute

The Simultaneous Multi-Projection Engine

Certain parts of GP104 affect the performance of every game we test today—the increased core count, its clock rate and Nvidia’s work to enable 10 Gb/s GDDR5X. Other features can’t be demonstrated yet, but have big implications for the future. Those are the tough ones to assign value to in a review. Nevertheless, the Pascal architecture incorporates several capabilities that we can’t wait to see show up in shipping games.

Nvidia calls the first its Simultaneous Multi-Projection Engine. This feature is enabled through a hardware block added to GP104’s PolyMorph Engines. This piece of logic takes the geometry data and processes it through as many as 16 projections from one viewpoint. Or it can offset the viewpoint for stereo applications, replicating geometry as many as 32 times in hardware and without the expensive performance overhead you’d incur if you tried to achieve the same effect without SMP.

Single-plane projection

Alright, so let’s back up a minute and give this some context. I game almost exclusively on three monitors in a Surround configuration. But my monitors are tilted inward to “wrap” around my desk, if only for office productivity reasons. Games don’t know this, though. A city street spanning all three displays bends at each bezel, and a circular table on the periphery appears distorted. The proper way to render for my configuration would be one projection straight ahead, a second projection to the left, as if out of a panoramic airplane cockpit, and a third oriented similarly on the right. The previously bent street could be straightened out this way, and I’d end up with a much wider field of view. The entire scene still has to be rasterized and shaded, but you save the setup, driver and GPU front-end overhead of rendering the scene three times.
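The geometry of that fix is simple to sketch. In the toy example below (our own code, not Nvidia's API; the 30-degree tilt and the trivial perspective divide are assumptions), all three displays share one viewpoint, and each gets its own projection rotated about the vertical axis. With SMP, the geometry would be submitted once and replicated into each of these projections in hardware:

```python
# Illustrative sketch of multi-projection: one viewpoint, one rotated
# projection per Surround display. Angles and scene are invented.
import math

def rotate_y(point, degrees):
    """Rotate a 3D point about the Y (vertical) axis."""
    x, y, z = point
    r = math.radians(degrees)
    return (x * math.cos(r) + z * math.sin(r), y,
            -x * math.sin(r) + z * math.cos(r))

def project(point):
    """Trivial perspective divide onto a z=1 image plane."""
    x, y, z = point
    return (x / z, y / z)

# One projection per display, all sharing a single viewpoint.
monitor_angles = [-30.0, 0.0, 30.0]   # left, center, right tilt (assumed)
vertex = (0.0, 0.0, 5.0)              # a point straight ahead of the viewer

for angle in monitor_angles:
    sx, sy = project(rotate_y(vertex, angle))
    print(f"display at {angle:+5.1f} deg -> screen ({sx:+.3f}, {sy:+.3f})")
```

The point lands dead-center on the middle display and shifts symmetrically on the angled side displays, which is exactly the continuity you lose when all three monitors share one flat projection.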

The incorrect perspective with angled displays
Corrected perspective with SMP

The catch is that an application must support wide FOV settings and use SMP API calls. That means game developers have to embrace the feature before you can enjoy it. We’re not sure how much effort will go into accommodating the relative few folks gaming in Surround. But there are other applications where it makes sense to implement this functionality immediately.

Using Single Pass Stereo, SMP creates one projection for each eye

Take VR as an example. You already need one projection for each eye. Today, games simply render to the two screens separately, incurring all of the aforementioned inefficiencies. But because SMP supports a pair of projection centers, they can both be rendered in one pass using a feature Nvidia calls Single Pass Stereo. Vertex processing happens once, and SMP kicks back two positions for each vertex corresponding to your left and right eyes. From there, SMP can apply additional projections to enable a feature referred to as Lens Matched Shading.
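The idea is easy to model. In this sketch (our own names and numbers, not Nvidia's API; the 64mm IPD and the simple perspective divide are assumptions), the expensive per-vertex work runs once, and a cheap hardware-style step then emits one screen position per eye from that single result:

```python
# Hedged sketch of the Single Pass Stereo concept: vertex work once,
# two output positions offset by the eye separation.

IPD = 0.064  # interpupillary distance in meters (typical assumption)

def vertex_shader(view_pos):
    """Expensive per-vertex work: runs ONCE per vertex, not once per eye."""
    # ... skinning, animation, lighting setup would go here ...
    return view_pos

def emit_stereo_positions(view_pos):
    """SMP-style step: produce left/right screen x from one vertex result."""
    x, y, z = view_pos
    # Shifting the camera left by IPD/2 moves the point right, and vice versa.
    left  = ((x + IPD / 2) / z, y / z)
    right = ((x - IPD / 2) / z, y / z)
    return left, right

p = vertex_shader((0.1, 0.0, 2.0))
left, right = emit_stereo_positions(p)
print("left eye:", left, " right eye:", right)
```

Note the horizontal disparity between the two outputs is IPD/z, shrinking with distance, which is what gives stereo rendering its depth cue; the point is that only the last, cheap step runs twice.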

First pass image with Lens Matched Shading
Final image sent to HMD

Briefly, Lens Matched Shading attempts to make VR more efficient by avoiding much of the work that would normally go into rendering a traditional planar projection before it is warped to match the distortion of an HMD’s lenses (wasting pixels out where the bend is most pronounced). SMP approximates this effect by dividing the display region into quadrants. So instead of rendering a square projection and manipulating it afterward, the GPU creates images that already match the lens distortion, keeping it from generating more pixels than needed. And so long as developers match or exceed each HMD’s per-eye sampling rate requirements, you won’t be able to tell a difference in quality.
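To get a feel for why the quadrant trick saves pixels, here is a deliberately crude model (entirely our own; the render-target size, the 0.7 scale factor and the averaging are invented, not Nvidia's parameters): each quadrant keeps full resolution toward the center of the eye and a reduced resolution toward the periphery, where the lens compresses the image anyway.

```python
# Crude model of quadrant-based shading reduction. All numbers invented.
def quadrant_pixels(width, height, outer_scale=0.7):
    """Pixel count if each quadrant averages full resolution at the center
    edges and outer_scale resolution at the periphery."""
    half_w, half_h = width / 2, height / 2
    per_quadrant = (half_w * (1 + outer_scale) / 2) * \
                   (half_h * (1 + outer_scale) / 2)
    return 4 * per_quadrant

width, height = 1344, 1600                # example per-eye target (assumed)
full = width * height
lms  = quadrant_pixels(width, height)
print(f"planar: {full / 1e6:.2f} MPix, quadrant-scaled: {lms / 1e6:.2f} MPix")
```

With `outer_scale=1.0` the model degenerates to the full planar target, and anything below 1.0 trims exactly the periphery the lens would have squashed anyway.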

When you combine Single Pass Stereo and Lens Matched Shading, Nvidia claims it’s possible to see a 2x performance improvement in VR compared to a GPU without SMP support. Some of this comes from pixel throughput: by using Lens Matched Shading to avoid shading pixels that never reach the display, Nvidia’s conservative preset knocks a 4.2 MPix/s (Oculus Rift) workload down to 2.8 MPix/s, a 1.5x improvement in shading throughput. Then, by processing geometry once and shifting it in hardware (rather than redoing everything for the second eye), Single Pass Stereo effectively halves the geometry work being done today. For those of you who were awestruck by Jen-Hsun’s “2x Perf and 3x Efficiency Vs. Titan X” slide during the 1080 livestream, now you know what went into that math.
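The arithmetic behind those claims can be reconstructed from the figures above (the numbers are Nvidia's; the way we combine them is our own back-of-the-envelope model, and the real frame-time gain depends on how pixel- versus geometry-bound a given title is):

```python
# Reconstructing the quoted speedups. Figures from the text above;
# the combination model is an assumption, not Nvidia's methodology.
pixels_planar = 4.2   # Rift workload without Lens Matched Shading
pixels_lms    = 2.8   # same workload with the conservative LMS preset
pixel_speedup = pixels_planar / pixels_lms        # shading throughput gain

geometry_passes_before = 2   # one full geometry pass per eye today
geometry_passes_after  = 1   # Single Pass Stereo: process once, replicate
geometry_speedup = geometry_passes_before / geometry_passes_after

print(f"pixel speedup:    {pixel_speedup:.2f}x")   # 1.50x
print(f"geometry speedup: {geometry_speedup:.2f}x")  # 2.00x
```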

Asynchronous Compute

The Pascal architecture also incorporates some changes relating to asynchronous compute, which is timely for several reasons relating to DirectX 12, VR and AMD’s architectural head-start.

With its Maxwell architecture, Nvidia supported static partitioning of the GPU to accommodate overlapping graphics and compute workloads. In theory, this was a good way to maximize utilization, so long as both segments stayed active. If you set aside 75% of the processor for graphics and that segment went idle waiting for the compute side to finish, you could burn through whatever gains might have been possible by running those tasks concurrently. Pascal addresses this with a form of dynamic load balancing. GPU resources can still be allocated, but if the driver determines that one partition is underutilized, it’ll allow the other to jump in and finish, preventing a stall from negatively affecting performance.
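A toy model makes the difference concrete (the workload sizes and the 75/25 split are invented for illustration; real scheduling is far messier than this):

```python
# Static partitioning vs. dynamic load balancing, in arbitrary work units.
def static_partition_time(gfx_work, compute_work, gfx_share=0.75):
    """Each partition runs only its own queue; the frame ends when the
    slower partition finishes, while the other half of the GPU idles."""
    return max(gfx_work / gfx_share, compute_work / (1 - gfx_share))

def dynamic_balance_time(gfx_work, compute_work):
    """Idealized balancing: idle units pick up the remaining queue,
    so the whole GPU stays busy until all work is done."""
    return gfx_work + compute_work

gfx, comp = 6.0, 4.0   # invented per-frame workloads
print(f"static 75/25: {static_partition_time(gfx, comp):.1f} time units")
print(f"dynamic:      {dynamic_balance_time(gfx, comp):.1f} time units")
```

With these invented numbers the static split takes 16 units, because the 25% compute partition becomes the bottleneck while three quarters of the GPU sits idle, versus 10 units when the idle resources are allowed to help.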

Nvidia also improves Pascal’s preemption capabilities: its ability to interrupt a task in order to address a more time-sensitive workload with very low latency. As you know, GPUs are highly parallel machines with big buffers intended to keep their many execution resources busy. An idle shader does you no good, so by all means, queue up work to feed through the graphics pipeline. But not everything a GPU does, especially nowadays, is as tolerant of delays.

In VR, you want to send your preemption request as late as possible to capture the most up-to-date tracking data

A perfect example is the asynchronous timewarp feature Oculus enabled for the launch of its Rift. In the event that your graphics card cannot get a fresh frame out every 11ms on a 90Hz display, ATW generates an intermediate frame using the rendering thread’s most recent work, correcting for head position. But it has to have enough time to create a timewarped frame, and unfortunately graphics preemption isn’t particularly granular. In fact, the Fermi, Kepler and Maxwell architectures support draw-level preemption, meaning they can only switch at draw call boundaries, potentially holding up ATW. Preemption requests consequently have to be made early in order to guarantee control over the GPU in time to get a warped frame out ahead of the display refresh. This is a bummer because you really want ATW to do its work as late as possible before that refresh.
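The timing pressure is easy to quantify. In the budget sketch below, the 90Hz refresh interval comes from the text; the 1ms warp cost and the 5ms worst-case draw call are our own assumed figures, while the sub-100µs number is Nvidia's claim discussed later:

```python
# Rough ATW timing budget at 90 Hz. Warp cost and the draw-level
# worst case are assumptions for illustration.
frame_interval_ms = 1000.0 / 90.0   # ~11.1 ms between refreshes
atw_cost_ms = 1.0                   # assumed time to generate a warped frame

def latest_preempt_request(preempt_worst_case_ms):
    """Latest point in the frame (ms after it starts) at which you can
    still request the GPU and get a warped frame out before the refresh."""
    return frame_interval_ms - preempt_worst_case_ms - atw_cost_ms

# Draw-level preemption: one long draw call can block for milliseconds.
print(f"draw-level  (5 ms worst case): request by "
      f"{latest_preempt_request(5.0):.1f} ms into the frame")
# Pixel-level preemption: Nvidia claims under 100 microseconds.
print(f"pixel-level (0.1 ms):          request by "
      f"{latest_preempt_request(0.1):.1f} ms into the frame")
```

Every millisecond of preemption uncertainty is a millisecond earlier ATW must grab the GPU, and therefore a millisecond staler the head-tracking data baked into the warped frame.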

Pascal implements far more granular pixel-level preemption for graphics, so GP104 can stop what it’s doing at a pixel level, save the pipeline’s state off-die and switch contexts. Instead of the millisecond-class preemption we’ve seen Oculus write about, Nvidia is claiming less than 100µs.

The Maxwell architecture already supported the equivalent of pixel-level preemption on the compute side by enabling thread-level granularity. Pascal has this as well, but adds support for instruction-level preemption in CUDA compute tasks. Nvidia’s drivers don’t expose the functionality yet, but it, along with pixel-level preemption, should become accessible soon.

Comments
  • toddybody
    These power consumption charts are making me cross eyed :/
  • JeanLuc
    Chris, were you invited to the Nvidia press event in Texas?

    About time we saw some cards based of a new process, it seemed like we were going to be stuck on 28nm for the rest of time.

    As normal Nvidia is creaming it up in DX11 but DX12 performance does look ominous IMO, there's not enough gain over the previous generation and makes me think AMD new Polaris cards might dominate when it comes to DX12.
  • slimreaper
    Could you run an Otoy octane bench? This really could change the motion graphics industry!?
  • F-minus
    Seriously I have to ask, did nvidia instruct every single reviewer to bench the 1080 against stock maxwell cards? Cause i'd like to see real world scenarios with an OCed 980Ti, because nobody runs stock or even buys stock, if you can even buy stock 980Tis.
  • cknobman
    Nice results but honestly they dont blow me away.

    In fact, I think Nvidia left the door open for AMD to take control of the high end market later this year.

    And fix the friggin power consumption charts, you went with about the worst possible way to show them.
  • FormatC
    Stock 1080 vs. stock 980 Ti :)

    Both cards can be oc'ed and if you have a real custom 1080 in your hand, the oc'ed 980 Ti looks in direct comparison to an oc'ed 1080 worse than the stock card in this review to the other stock card. :)
  • Gungar
    @F-minus, i saw the same thing. The gtx 980Ti overclocks way better thn 1080, i am pretty sure OC vs OC, there is nearly no performance difference. (disappointing)
  • toddybody
    Quote:
    @F-minus, i saw the same thing. The gtx 980Ti overclocks way better thn 1080, i am pretty sure OC vs OC, there is nearly no performance difference. (disappointing)


    LOL. My 980ti doesnt hit 2.2Ghz on air. We need to wait for more benchmarks...I'd like to see the G1 980ti against a similar 1080.
  • F-minus
    Exactly, but it seems like nvidia instructed every single outlet to bench the Reference 1080 only against stock Maxwell cards, which is honestly bullshit - pardon. I bet an OCed 980Ti would come super close to the stock 1080, which at that point makes me wonder why even upgrade now, sure you can push the 1080 too, but I'd wait for a price drop or at least the supposed cheaper AIB cards.
  • FormatC
    I have a handpicked Gigabyte GTX 980 Ti Xtreme Gaming Waterforce at 1.65 Ghz in one of my rigs, it's slower.
  • WildCard999
    I have to say i'm a bit disapointed with 4K performance even though its better then the 980ti/Titan X I still wouldn't consider it a 4K GPU. I would like to see a follow-up review for SLI since the bandwith has nearly doubled with the new bridges.

    "So why does the card still have two connectors? Using new SLI bridges, both connectors can be used simultaneously to enable a dual-link mode. Not only do you get the benefit of a second interface, but Pascal also accelerates the I/O to 650MHz, up from the previous generation’s 400MHz. As a result, bandwidth between processors more than doubles."
  • xenol
    Quote:
    Exactly, but it seems like nvidia instructed every single outlet to bench the Reference 1080 only against stock Maxwell cards, which is honestly bullshit - pardon. I bet an OCed 980Ti would come super close to the stock 1080, which at that point makes me wonder why even upgrade now, sure you can push the 1080 too, but I'd wait for a price drop or at least the supposed cheaper AIB cards.

    The thing is not every card is OC'd to the same level, and some cards won't OC to the highest level of performance you can get. Stock is the only way to keep things fair because every card can do at least stock or better, but not every card can OC to the same level.
  • CraigN
    Why did you cap Witcher 3 at 60 FPS?

    Sure, it has some inconsistent performance, but it's a bit meaningless for the 1440p benchmark to see it just smack up against the wall with the Titan X and 980 Ti when you could have let them off the leash to at least see the maximum gains you would get from it, like you did for every other game in the review.
  • Badelhas
    Nice review, congrats! But what about including tht HTC Vive on your benchmarks? If you talk about the VR benefits, you have to show them in graphs!
  • tical2399
    Not enough reason to move from my 980 ti. I don't even think that the 1080 ti will do 4k 60 in all games. I'll probably just wait another year and a half to 2 years for the 1180 ti or whatever it will be
  • FarmerFran
    Currently the 1080 is priced pretty close to the 980ti. Within ~100ish. So if you recently purchased a 980ti then an upgrade might not be worth it.
  • crisan_tiberiu
    performance wise, no comment. Price wise, really? if the 1080 costs 700 @ launch, the 1080ti, or whatever, will cost how much? 1000? then the Pascal Titan 1500? I dont like the road we are heading, really.
  • tical2399
    Anonymous said:
    performance wise, no comment. Price wise, really? if the 1080 costs 700 @ launch, the 1080ti, or whatever, will cost how much? 1000? then the Pascal Titan 1500? I dont like the road we are heading, really.


    The 1080 costs 600 at launch, that extra 100 is the suckers price that nvidia is charging for day one people. They are charging 100 because they know most are stupid enough to pay it. The actual price is 600
  • FarmerFran
    Like all things inflation... the pricing seems to be set right with the 900 series.
  • chaosmassive
    Quote:
    Seriously I have to ask, did nvidia instruct every single reviewer to bench the 1080 against stock maxwell cards? Cause i'd like to see real world scenarios with an OCed 980Ti, because nobody runs stock or even buys stock, if you can even buy stock 980Tis.


    Nvidia : My card, my rules