So what's the big deal about this PowerVR technology? Let's start with a basic look at the part of the rendering process where PowerVR differs from the "regular" approach. Normally, in the case of other accelerators, triangles (which make up our beloved polygons) are sent to the card in any order (alpha blended triangles restrict some of that ordering flexibility), and the Z-buffer decides at the pixel level which polygons appear in front of the others. This common method requires a random access Z-buffer with the same dimensions as the screen itself (though not necessarily the same number of bits per pixel). It's generally less efficient, since visibility is resolved with a comparison at every pixel.
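To see why that's wasteful, here's a minimal sketch of conventional Z-buffered rendering in Python (a toy model, not actual driver or hardware code): triangles arrive in arbitrary order, and a screen-sized depth buffer resolves visibility one pixel at a time.

```python
# Toy model of conventional Z-buffered rendering. All names here are
# illustrative; real hardware does this with external buffer memory.

WIDTH, HEIGHT = 8, 8  # toy "screen"; real buffers match display resolution

framebuffer = [[None] * WIDTH for _ in range(HEIGHT)]
zbuffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]

def draw_fragment(x, y, depth, color):
    """Per-pixel depth test: every covered pixel costs a Z read, a
    compare, and (on pass) a Z write plus a color write -- all of it
    hitting external memory on a conventional accelerator."""
    if depth < zbuffer[y][x]:      # closer than what's already there?
        zbuffer[y][x] = depth      # Z write
        framebuffer[y][x] = color  # color write (may be overwritten later)

# Two overlapping "triangles" (reduced to fragment lists for brevity),
# submitted in arbitrary order:
far_tri = [(x, y, 5.0, "red") for x in range(4) for y in range(4)]
near_tri = [(x, y, 2.0, "blue") for x in range(2) for y in range(2)]

for frag in far_tri + near_tri:
    draw_fragment(*frag)
```

After this runs, the 2x2 overlap region was textured and written twice: first red, then blue. The red work there was wasted, yet it still consumed texture fetches and Z-buffer bandwidth.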
With PowerVR the process is as follows:
- Gather all the triangles for the scene and sort them into chunks (an intersection check determines which chunks each triangle touches). Please note that this is done completely in hardware; the triangle data is bus mastered from host memory.
- Starting with the first chunk, the scene within that chunk is rendered as normal, except that it all takes place in the on-chip rendering cache.
- The finished tile in that cache is then copied to the back buffer, so each pixel is written to display memory at most once.
- Repeat for each chunk.
- Flip the back buffer to the front buffer.
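The steps above can be sketched in software (a hypothetical Python model under assumed names like `render_scene` and `triangle_touches`; the real work is done inside the PowerVR hardware):

```python
# Toy software model of the chunked (tile-based) pipeline described above.

TILE_W, TILE_H = 32, 16   # tile size cited for the Neon 250
SCREEN_W, SCREEN_H = 64, 32

def tiles():
    """Visit tiles left to right, top to bottom."""
    for ty in range(0, SCREEN_H, TILE_H):
        for tx in range(0, SCREEN_W, TILE_W):
            yield tx, ty

def triangle_touches(tri, tx, ty):
    """Intersection check: does the triangle's bounding box overlap the tile?"""
    x0, y0, x1, y1 = tri["bbox"]
    return x0 < tx + TILE_W and x1 > tx and y0 < ty + TILE_H and y1 > ty

def render_scene(triangles, back_buffer):
    for tx, ty in tiles():
        # 1. Chunking: gather the triangles that intersect this tile.
        chunk = [t for t in triangles if triangle_touches(t, tx, ty)]
        # 2. Render into the small on-chip tile cache; depth testing
        #    happens here, never touching external Z-buffer memory.
        tile_color = [[None] * TILE_W for _ in range(TILE_H)]
        tile_z = [[float("inf")] * TILE_W for _ in range(TILE_H)]
        for tri in chunk:
            for x, y, depth in tri["fragments"]:
                lx, ly = x - tx, y - ty
                if 0 <= lx < TILE_W and 0 <= ly < TILE_H and depth < tile_z[ly][lx]:
                    tile_z[ly][lx] = depth
                    tile_color[ly][lx] = tri["color"]
        # 3. Copy the finished tile out: each pixel hits the back
        #    buffer in external memory at most once.
        for ly in range(TILE_H):
            for lx in range(TILE_W):
                if tile_color[ly][lx] is not None:
                    back_buffer[ty + ly][tx + lx] = tile_color[ly][lx]

# Usage: one small "triangle" (pre-rasterized into fragments for brevity).
tri = {"bbox": (0, 0, 4, 4),
       "fragments": [(x, y, 1.0) for x in range(4) for y in range(4)],
       "color": "green"}
back = [[None] * SCREEN_W for _ in range(SCREEN_H)]
render_scene([tri], back)
```

Note one simplification: this toy shades every fragment, whereas the real deferred hardware resolves visibility first and only textures the surviving pixel, which is exactly where the savings come from.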
Why is this a good thing? Z-buffer memory and memory bandwidth are not used (all the Z comparisons occur on chip), and only the visible pixels that will actually appear in display memory are textured, shaded and lit, saving both unnecessary graphics processing and memory bandwidth for texture fetches. Keep in mind that the PowerVR does indeed have a Z-buffer, but it's only a tile (chunk) in size, 32x16 pixels in the Neon 250's case, and it lives entirely on chip. This "cache" of sorts is reused for each chunk, so the bandwidth of a full-sized memory-based Z-buffer implementation is traded for this high-speed on-chip one. The on-chip Z-buffer runs in parallel with the other pipeline stages, so there is no performance cost.
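To get a feel for the savings, here's a rough back-of-the-envelope calculation. The resolution, depth format, frame rate, and overdraw figures are assumed for illustration, not measured numbers:

```python
# Rough, illustrative arithmetic: the external Z-buffer traffic a
# conventional card might generate, which the on-chip tile Z-buffer
# avoids entirely. All figures below are assumptions, not benchmarks.

width, height = 640, 480
z_bytes = 2            # 16-bit Z values
fps = 60
overdraw = 3           # average times each pixel is covered per frame

# Each covered fragment costs at least one Z read; passing fragments
# also cost a Z write. Assume roughly half pass, for a crude estimate.
z_reads = width * height * overdraw
z_writes = width * height * overdraw // 2
bytes_per_frame = (z_reads + z_writes) * z_bytes
mb_per_second = bytes_per_frame * fps / (1024 * 1024)
print(f"~{mb_per_second:.0f} MB/s of external Z traffic avoided")
```

Even with these modest assumptions, the external Z traffic lands in the hundreds of megabytes per second, which is bandwidth a conventional card must share with texture fetches and color writes.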
So how the heck are they doing all this? PowerVR has something called a 'display list render', which lets it batch polygons before rendering them with the 3D hardware; normally a board would draw polygons one by one. Because of this batching, the scene can be rendered in regions, or tiles. Some of you may be familiar with the BitBoys' method of tiled rendering, which is not a deferred rendering scheme: their Glaze 3D part is a traditional accelerator that uses screen tiling simply as a way to order its pixel/texel accesses in a manner that's friendlier to its texture cache.
If you watched a regular card generate a scene (a single frame), you would see the picture appear polygon by polygon. The same scene drawn by the PowerVR would fill in tile by tile, left to right, top to bottom. Thanks to this method, they were able to get rid of the external Z-buffer, saving both memory and memory bandwidth. A big thanks to Paul for providing some major technical assistance!