The main complaint from developers and hardware manufacturers was how heavily previous Direct3D versions relied on the CPU. Graphics is a subsystem: the CPU and the graphics processor share some of the same components, such as main system memory. The graphics card also depends on the CPU for coordination and instructions. And when complex situational or dynamic geometry is required, the CPU may have to calculate the effect that alters the object, which costs performance as well.
Even with better validation and error handling, improved resource mapping and access, and an improved shader language, the biggest problem lay elsewhere. Blythe stated that the API runtime had changed very little over the previous 10 years. Three causes of additional processing in the driver and the runtime were identified:
- Miscommunication over application requirements
- Differing processing styles
- Mismatches between the API and the hardware
The first and the second were easy to fix: a tighter standard between hardware and software makers clears up the first, while the second is in the hands of the software developers.
Here is a set of images from DX10 content samples used in Microsoft's testing
The last is the toughest situation, as Blythe states: "Our analysis failed to show any significant advantage in retaining fine grain changes on the remaining state, so we collected the fine grain state into larger, related, immutable aggregates called state objects. This has the advantage of establishing an unambiguous model for which pieces of state should and should not be independent, and reducing the number of API calls required to substantially reconfigure the pipeline. This model provides a better match for the way we have observed applications using the API."
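To make the state-object idea concrete, here is a minimal C++ sketch. It does not use the real Direct3D API: `MockDevice`, `BlendDesc`, `BlendState`, and every method name are hypothetical stand-ins chosen for illustration. The only thing it models is how many API calls are needed to reconfigure blending when state is set piece by piece versus bound as one immutable aggregate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical blend settings; the field names are illustrative only.
struct BlendDesc {
    bool enabled;
    int srcFactor;
    int dstFactor;
    int op;
};

// Immutable aggregate: filled in once at creation and never modified,
// so it could be validated a single time up front.
class BlendState {
public:
    explicit BlendState(const BlendDesc& desc) : desc_(desc) {}
    const BlendDesc& desc() const { return desc_; }
private:
    const BlendDesc desc_;
};

// Mock device (an assumption, not a real driver) that counts API calls.
class MockDevice {
public:
    // Fine-grained style: one call per individual piece of state.
    void setBlendEnabled(bool v) { current_.enabled = v; ++calls_; }
    void setSrcFactor(int v) { current_.srcFactor = v; ++calls_; }
    void setDstFactor(int v) { current_.dstFactor = v; ++calls_; }
    void setBlendOp(int v) { current_.op = v; ++calls_; }

    // State-object style: one call swaps the whole related group.
    void bindBlendState(const BlendState& s) { current_ = s.desc(); ++calls_; }

    std::size_t calls() const { return calls_; }
    const BlendDesc& current() const { return current_; }
private:
    BlendDesc current_{false, 0, 0, 0};
    std::size_t calls_ = 0;
};

// Reconfigure blending one setting at a time; returns the calls made.
std::size_t configureFineGrained(MockDevice& dev, const BlendDesc& d) {
    const std::size_t before = dev.calls();
    dev.setBlendEnabled(d.enabled);
    dev.setSrcFactor(d.srcFactor);
    dev.setDstFactor(d.dstFactor);
    dev.setBlendOp(d.op);
    return dev.calls() - before;
}

// Reconfigure blending with a prebuilt state object; returns the calls made.
std::size_t configureWithStateObject(MockDevice& dev, const BlendState& state) {
    const std::size_t before = dev.calls();
    dev.bindBlendState(state);
    return dev.calls() - before;
}
```

In this toy model both paths leave the device in the same configuration, but the state-object path does it in one call instead of four. The sketch says nothing about real cycle counts; it only illustrates the reduction in API calls that the quote describes.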
| Operation | Direct3D 9 (command cycles) | Direct3D 10 (command cycles) |
|---|---|---|
| Bind VS Shader | 6636 | 416 |
| Set Blend Function | 787 | 530 |
The table above compares the command cycle counts for individual operations under Direct3D 9 and Direct3D 10, as measured in Blythe's analysis. On the Intel Pentium 4 system his team used, the cost of a draw call was reduced tenfold. Reduced overhead in the driver leaves more CPU time for other, more important tasks and makes rendering less dependent on the CPU. That should free up the CPU for other activities, such as better artificial intelligence (AI) computations, effects physics calculations, or putting more objects into our scenes.
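The two rows in the table improve by very different amounts, which is easy to miss when reading raw cycle counts. The short C++ sketch below works out the ratio for each row; the helper name `speedup` is ours, and the only inputs are the figures from the table.

```cpp
#include <cassert>
#include <cmath>

// How many times cheaper a command is under Direct3D 10,
// given its cycle count under each API version.
double speedup(double d3d9Cycles, double d3d10Cycles) {
    return d3d9Cycles / d3d10Cycles;
}

// Figures taken from the table above.
const double kBindVsShader = speedup(6636.0, 416.0);    // roughly 16x
const double kSetBlendFunction = speedup(787.0, 530.0); // roughly 1.5x
```

Binding a vertex shader becomes about sixteen times cheaper, while setting the blend function improves by only about half again, so the savings depend heavily on which commands an application issues most.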
One major improvement comes in the form of something that Microsoft calls Instancing 2.0. Instancing applies when multiple objects in the scene are extremely similar and all need to be drawn, lit, textured, and so forth. Under Direct3D 9, each unit in an army could require its own draw call, adding software overhead and CPU load. Under Direct3D 10, all of the units in your army can be drawn with a single draw call. This will allow future games to instance thousands of units to fill our battlefields and add to the realism in our games.
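The difference in draw-call count can be sketched in a few lines of C++. This is a toy model under stated assumptions, not Direct3D code: `MockRenderer`, `UnitTransform`, and both render functions are hypothetical names, and the renderer only counts calls rather than drawing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-unit data that differs between otherwise identical objects.
struct UnitTransform {
    float x, y, z;
};

// Mock renderer (an assumption for illustration) that counts draw calls.
class MockRenderer {
public:
    // One call draws one unit.
    void draw(const UnitTransform&) {
        ++drawCalls_;
        ++unitsDrawn_;
    }

    // One call draws every unit in the instance list.
    void drawInstanced(const std::vector<UnitTransform>& instances) {
        ++drawCalls_;
        unitsDrawn_ += instances.size();
    }

    std::size_t drawCalls() const { return drawCalls_; }
    std::size_t unitsDrawn() const { return unitsDrawn_; }
private:
    std::size_t drawCalls_ = 0;
    std::size_t unitsDrawn_ = 0;
};

// Per-unit approach: the call count grows with the size of the army.
void renderPerUnit(MockRenderer& r, const std::vector<UnitTransform>& army) {
    for (const UnitTransform& unit : army) {
        r.draw(unit);
    }
}

// Instanced approach: the whole army goes out in a single call.
void renderInstanced(MockRenderer& r, const std::vector<UnitTransform>& army) {
    if (!army.empty()) {
        r.drawInstanced(army);
    }
}
```

For an army of 1,000 units, `renderPerUnit` issues 1,000 draw calls in this model while `renderInstanced` issues one, and both report the same number of units drawn.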
Here is a screenshot of Supreme Commander (THQ), in which you can see over a hundred high-resolution units.