To The Core, Continued
From the setup engine and "Ultra-Threading Dispatch Processor," the threads go to the pixel shading units. That is right, units. ATI clustered four pixel pipelines together into cellular units called Quad Pixel Shader Cores. These cores support Pixel Shader 3.0 and each core performs at 16 pixels per thread in order to maintain high efficiency. Each thread can have six shader instructions on four pixels per clock. The diagram below depicts how a shadow map is broken apart into 4x4 pixel blocks.
This diagram shows the power of dynamic branching using an example of shadow mapping. The logic on the processor allows for "if / else" operations. The green areas represent the first half of the operation or the "if" and the blue shows the second half "else". "If" the there is no need for a shadow then the textures stay the same. However, if they do need to be shaded with a shadow, the "else" statement applies the shadow. The pink sections represent 4x4 pixel blocks that need both the first and second operations performed on them.
With the pixel count being 4x4 (16 pixels), the number of pixel blocks that can be handled by only one half of the equation reaches a maximum. Utilizing this block size minimizes the amount of blocks that have to pass through two rendering passes in order to correct the shadowing effect.
This threading allows for faster branch execution over a traditional architecture.
Each of the family members has a very similar composition and execution process flow. The variances are a matter of the number of items in the core such as vertex shaders and pixel shading units as well as bus sizes and memory densities. The diagram below show how this looks in a flow chart format.