I wrote a piece last year about DX10 and what we can expect from the hardware built around that specification, entitled "What Direct3D is all about." In it I discussed the graphics portion of the DirectX 10 API. Overall, there were five major changes to the API:
- Improved programmer expressiveness (Shader Model 4.0 and Geometry Shaders)
- Tight hardware specifications
- Improved performance (lower command cycle counts per frame)
- Unified instruction sets (High Level Shader Language 10)
- Stream I/O (Geometry Shader can write to memory)
While more expressive code and tighter specification guidelines might sound great to independent software and hardware developers, the meat for enthusiasts is the final three changes, all part of D3D10's unified shader architecture. It isn't the hardware itself but what it can do that is amazing.
Looking at the architecture first, the unified approach to handling tasks makes the most logical sense. Why have fixed function units that can only do pixel or vertex shading when you can scale the number you need to fit the workload? The unified instruction set makes it possible for all of the available streaming or floating point processors to change function on the fly. This flexibility allowed Nvidia's G80 (GeForce 8800 Series) to handle twice the work that the previous generation could. While the transistor count doubled from G70 to G80, extra transistors normally return far less than a 1:1 gain in performance. Instead of bottlenecking in one area, the core can shift its resources to meet the challenge.
From "The Direct3D 10 System" by David Blythe
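The load-balancing argument for unified shaders can be illustrated with a toy model. This is only a sketch under simplifying assumptions: work is counted in abstract items, every unit retires one item per cycle, and the unit counts and workloads are hypothetical numbers chosen for illustration, not measurements of any real GPU.

```python
import math

def fixed_pipeline_cycles(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles per frame when each unit can only run its own kind of work."""
    # The two pools run side by side, so the slower pool sets the frame time.
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))

def unified_pipeline_cycles(vertex_work, pixel_work, total_units):
    """Cycles per frame when any unit can take vertex or pixel work."""
    return math.ceil((vertex_work + pixel_work) / total_units)

# Hypothetical split: 8 vertex units + 24 pixel units versus 32 unified units.
pixel_heavy_fixed = fixed_pipeline_cycles(100, 2000, 8, 24)
pixel_heavy_unified = unified_pipeline_cycles(100, 2000, 32)
vertex_heavy_fixed = fixed_pipeline_cycles(1600, 500, 8, 24)
vertex_heavy_unified = unified_pipeline_cycles(1600, 500, 32)

print(pixel_heavy_fixed, pixel_heavy_unified)    # 84 66
print(vertex_heavy_fixed, vertex_heavy_unified)  # 200 66
```

With the same total number of units, the fixed design is held back by whichever pool is overloaded, while the unified design takes the same time regardless of how the work is split.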
A powerful Input Assembler (IA) provides the processor's components with enough threads of work to keep them 100% occupied. The IA also tags data so it can be used in conjunction with other threads, and it can even replicate data to minimize the reordering of completed work. This is called instancing. The IA can now create and tag hundreds if not thousands of complete objects and flood a scene with them. This was simply not possible under previous architectural generations.
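The idea behind instancing can be sketched in a few lines: one shared mesh is stored once, and each copy is described only by a small piece of per-instance data plus an instance tag. The function name `draw_instanced` and the use of a simple position offset as the per-instance data are assumptions made for illustration; real hardware does this replication in the pipeline rather than in a loop.

```python
def draw_instanced(mesh, instance_offsets):
    """Return tagged world-space vertices for every instance of one mesh."""
    out = []
    for instance_id, (ox, oy, oz) in enumerate(instance_offsets):
        # The mesh is never copied by the caller; only the offset differs.
        for (x, y, z) in mesh:
            out.append((instance_id, (x + ox, y + oy, z + oz)))
    return out

# One triangle stored once, placed 1000 times along the x axis.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
offsets = [(i * 2.0, 0.0, 0.0) for i in range(1000)]

vertices = draw_instanced(triangle, offsets)
print(len(vertices))  # 3000
```

The caller supplies three vertices and a thousand offsets instead of three thousand vertices, which is where the savings come from when a scene is flooded with copies of the same object.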
These changes created greater efficiencies in how the processor handles work. To complement that, steps were also taken to minimize the overhead caused by dealing with large drivers. As mentioned before, the bulk of the driver is moved out of the kernel in Vista. Lower overhead frees the CPU and other components to do more of the work you actually want them to do. In the case of games, this means more frames per second, better image quality, or additional physics simulation at current frame rates.
Lastly, there is the Geometry Shader (GS). While existing games will not benefit from the functions it provides, future titles will make use of it. The power of the GS comes from its ability to write to memory. This is done using Stream Output (SO). Using SO allows data written by the geometry shader to be read back into the pipeline at any stage. This means that data could stop coming from the IA and Vertex Shader entirely. Scenes could be sustained entirely by manipulations of output data. Just think of the game types that could be generated in real time (give me my vertical and horizontal scrolling games in 3D).
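The feedback loop that Stream Output makes possible can be sketched as a buffer that is rewritten on every pass and then fed back in as the next pass's input. This is a CPU-side toy, not Direct3D code: the names `geometry_pass` and `run`, the tuple layout, and the `SPARK_LIFETIME` value are all hypothetical. It shows the two things a geometry-shader-style pass can do that a vertex shader cannot: emit new primitives and discard existing ones.

```python
SPARK_LIFETIME = 3  # passes a spark survives; hypothetical value

def geometry_pass(buffer_in):
    """One pass: emitters spawn sparks, sparks move and eventually expire."""
    buffer_out = []
    for kind, x, age in buffer_in:
        if kind == "emitter":
            buffer_out.append(("emitter", x, age + 1))  # emitter persists
            buffer_out.append(("spark", x, 0))          # emit a new primitive
        elif age + 1 < SPARK_LIFETIME:
            buffer_out.append(("spark", x + 1.0, age + 1))
        # Sparks that reach SPARK_LIFETIME are simply not written out.
    return buffer_out

def run(seed, passes):
    """Feed each pass's output buffer back in as the next pass's input."""
    buf = list(seed)
    for _ in range(passes):
        buf = geometry_pass(buf)
    return buf

seed = [("emitter", 0.0, 0)]
final = run(seed, 10)
print(len(final))  # 4
```

After the single seed element is supplied, nothing new enters from outside the loop: the scene of one emitter and three live sparks is sustained entirely by reprocessing the previous pass's output.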