Why Re-Organize GF100? One Word: Geometry
Successful architectures don’t get reworked just to impress the ladies. There’s rhyme and reason behind Nvidia’s decision to arm each GPC with its own raster engine and each SM with what it calls a PolyMorph engine (no, WoW players, we’re not talking about sheeping the stream processors here…).
First things first: the PolyMorph engine is a five-stage block of fixed-function logic that works in conjunction with the rest of the SM to handle vertex fetch, tessellation, viewport transformation, attribute setup, and stream output to memory. Between these stages, the SM’s cores take care of vertex/hull shading and domain/geometry shading. Each PolyMorph engine then sends its primitives on to the raster engines, each of which is capable of eight pixels per clock, for a total of 32 pixels per clock across the chip’s four raster engines.
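To make the stage order and the throughput arithmetic concrete, here is a minimal Python sketch. The stage names follow the paragraph above; `POLYMORPH_FLOW`, `GF100Geometry`, and their fields are hypothetical illustrations rather than any Nvidia API, and the four-SMs-per-GPC figure is simply the 16 PolyMorph engines divided across four GPCs (one raster engine each).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model of one SM's geometry flow: fixed-function PolyMorph
# stages interleaved with programmable shading that runs on the SM's cores.
POLYMORPH_FLOW = [
    ("vertex fetch", "PolyMorph"),
    ("vertex/hull shading", "SM"),
    ("tessellation", "PolyMorph"),
    ("domain/geometry shading", "SM"),
    ("viewport transform", "PolyMorph"),
    ("attribute setup", "PolyMorph"),
    ("stream output", "PolyMorph"),
]


def fixed_function_stages() -> List[str]:
    """Return only the PolyMorph (fixed-function) stages, in pipeline order."""
    return [name for name, unit in POLYMORPH_FLOW if unit == "PolyMorph"]


@dataclass(frozen=True)
class GF100Geometry:
    """Hypothetical summary of GF100's distributed geometry hardware."""
    gpcs: int = 4                    # one raster engine per GPC
    sms_per_gpc: int = 4             # 16 PolyMorph engines / 4 GPCs
    pixels_per_raster_clock: int = 8 # per raster engine, per the text

    @property
    def polymorph_engines(self) -> int:
        return self.gpcs * self.sms_per_gpc  # one PolyMorph engine per SM

    @property
    def pixels_per_clock(self) -> int:
        return self.gpcs * self.pixels_per_raster_clock  # 4 x 8


if __name__ == "__main__":
    chip = GF100Geometry()
    print(fixed_function_stages())
    print(chip.polymorph_engines, "PolyMorph engines,",
          chip.pixels_per_clock, "pixels per clock")
```

Listing the fixed-function stages separately from the SM work makes the “five-stage” count easy to check, and the chip-wide 32 pixels per clock falls straight out of four raster engines at eight pixels each.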
Now, why was it necessary to handle geometry at a finer granularity when a monolithic, fixed-function front end had worked so well in the past? After all, hasn’t ATI included tessellation hardware in something like six generations of its GPUs, going as far back as TruForm in 2001? Ah, yes. But how many games actually took advantage of tessellation between then and now? That’s the point.
Ever since the days of Nvidia’s GeForce 2 architecture, we’ve been hearing about programmable pixel and then vertex shading. Today we’re getting some very impressive shaders that add tremendous detail to the latest DirectX 9 and 10 games (Nvidia claims a 150x increase in shading performance from the GeForce FX 5800-series to GT200). But I know we’ve all seen some of the terri-bad geometry that shatters the illusion of realism in our favorite games. Purportedly, the next frontier in graphics realism involves cranking the dial on geometry.
DirectX 11 aims to fix this with three new stages in the rendering pipeline: the hull shader, which transforms the patch’s control points and computes tessellation factors; the fixed-function tessellator, which takes those factors from the hull shader and outputs domain points; and the domain shader, which runs on each of those points to produce the final vertices.
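A rough Python sketch of how the three new stages hand data to one another, assuming a four-control-point bilinear quad patch and a single integer tessellation factor. Real Direct3D 11 hull shaders emit separate edge and inside factors, and the tessellator also supports fractional partitioning, so `hull_shader`, `tessellator`, and `domain_shader` below are hypothetical, heavily simplified stand-ins, not HLSL.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
UV = Tuple[float, float]


def hull_shader(control_points: List[Point3], factor: int) -> Tuple[List[Point3], int]:
    """Per-patch stage. Simplified: passes the control points through
    unchanged and emits one uniform tessellation factor."""
    if len(control_points) != 4:
        raise ValueError("this sketch expects a 4-point quad patch")
    if factor < 1:
        raise ValueError("tessellation factor must be >= 1")
    return list(control_points), factor


def tessellator(factor: int) -> List[UV]:
    """Fixed-function stage: emit (u, v) domain points on the unit square.
    Integer partitioning only, so factor n yields (n + 1) ** 2 points."""
    n = factor
    return [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]


def domain_shader(patch: List[Point3], uv: UV) -> Point3:
    """Runs once per domain point: evaluate a bilinear patch ordered
    p00, p10, p01, p11 at (u, v) to get a final vertex position."""
    u, v = uv
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    x, y, z = (sum(w * p[axis] for w, p in zip(weights, patch)) for axis in range(3))
    return (x, y, z)


def tessellate_patch(control_points: List[Point3], factor: int) -> List[Point3]:
    """Chain hull shader -> tessellator -> domain shader for one patch."""
    patch, f = hull_shader(control_points, factor)
    return [domain_shader(patch, uv) for uv in tessellator(f)]


if __name__ == "__main__":
    quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    verts = tessellate_patch(quad, 4)
    print(len(verts), "vertices from one 4-point patch")
```

In this sketch a factor of 4 turns a single four-point patch into 25 vertices; turning the factor up multiplies the geometry the GPU must process without touching the source mesh, which is where front-end throughput starts to matter.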
But to deliver the performance needed to make tessellation feasible, Nvidia had to shift away from that monolithic front end toward a more parallel design. Hence the four raster engines and 16 PolyMorph engines. The company naturally has its own demos showing how much more efficient GF100 is than the Cypress architecture, which employs the “bottlenecked” monolithic design. However, we’ll want to measure a more balanced title, such as Aliens Vs. Predator from Rebellion Developments, with tessellation switched on and off. Up front, though, Nvidia claims that GF100 delivers up to 8x the performance of GT200 in geometry-bound scenarios.
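Why does a single front end become the bottleneck, and why is a balanced game a fairer test than a vendor demo? A toy model helps: treat frame time as the slower of geometry throughput and shading work. Every rate and primitive count below is hypothetical, chosen only to illustrate the argument; none are GF100 or Cypress measurements, and the model ignores load balancing and the cost of keeping primitives in order across parallel engines.

```python
def geometry_clocks(primitives: int, units: int, prims_per_unit_per_clock: float) -> float:
    """Clocks to process `primitives` if work splits evenly across `units`."""
    if units < 1 or prims_per_unit_per_clock <= 0:
        raise ValueError("need at least one unit and a positive rate")
    return primitives / (units * prims_per_unit_per_clock)


def frame_clocks(primitives: int, units: int, prims_per_unit_per_clock: float,
                 shading_clocks: float) -> float:
    """Toy bottleneck model: the frame takes as long as its slowest part."""
    geometry = geometry_clocks(primitives, units, prims_per_unit_per_clock)
    return max(geometry, shading_clocks)


if __name__ == "__main__":
    # Hypothetical workload in which tessellation has inflated the primitive count.
    prims, rate, shading = 2_000_000, 1.0, 800_000
    mono = frame_clocks(prims, 1, rate, shading)  # single monolithic front end
    dist = frame_clocks(prims, 4, rate, shading)  # four parallel geometry units
    print(f"monolithic: {mono:,.0f} clocks, distributed: {dist:,.0f} clocks, "
          f"speedup: {mono / dist:.1f}x")
```

With these made-up numbers, quadrupling the geometry units yields only a 2.5x gain, because the frame becomes shading-bound once geometry stops being the slowest part. A geometry-heavy demo hides that effect, while a balanced game exposes it.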