ATi's New Radeon - Smart Technology Meets Brute Force

B - Radeon's HyperZ

The biggest draw of ATi's new 3D chip is clearly this feature. In fact, I like HyperZ so much that I really have to commend ATi's engineering team for it. The only complaint I have is the name. 'HyperZ' is definitely 'too funky for me'. 'Accelerated Z' would have catered more to the grown-ups who are supposed to shell out the significant amount of money for a Radeon card.

The actual 'HyperZ' technology consists of three different techniques that cut down on the wasted fill rate and wasted memory bandwidth that kill so much performance in modern 3D accelerators.

Hierarchical Z

The first technique to reduce unnecessary Z-buffer accesses and wasted pixel rendering is called 'Hierarchical Z'. It comes into play AFTER the triangle setup and BEFORE the rendering unit. Before a pixel gets sent from the triangle setup to the rendering unit, Hierarchical Z looks up a defined area of the Z-buffer and checks whether the pixel will be visible or not. If the pixel would be hidden, it gets discarded right away, so that the rendering unit doesn't waste its time with it. The trick here is that this defined area of the Z-buffer is kept in a special cache, which avoids unnecessary Z-buffer reads. I don't want to go into any deeper detail, because ATi doesn't want to disclose any more than necessary.
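Since ATi keeps the details to itself, the following is only a toy sketch of the general idea, not Radeon's actual design. It assumes an 8x8 pixel area (`TILE`), keeps one summary value per area (the farthest depth stored in it), and follows this article's convention that a zero means 'nothing drawn yet', so larger values are taken to be nearer. All names are made up for illustration.

```python
TILE = 8  # hypothetical edge length of the cached area, in pixels


class HierarchicalZ:
    """Toy depth buffer with a coarse per-tile summary.

    Assumed convention: 0.0 means "nothing drawn yet" and larger
    values are nearer to the viewer.
    """

    def __init__(self, width, height):
        self.width = width
        self.height = height
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        self.z = [[0.0] * width for _ in range(height)]
        # Farthest (smallest) depth currently stored in each tile.
        self.tile_far = [[0.0] * tiles_x for _ in range(tiles_y)]
        self.full_reads = 0  # per-pixel reads of the full-size z-buffer

    def submit(self, x, y, depth):
        """Return True if the pixel is visible and was written."""
        tx, ty = x // TILE, y // TILE
        # Coarse test: the pixel lies behind everything in its tile,
        # so it is rejected without touching the full z-buffer.
        if depth <= self.tile_far[ty][tx]:
            return False
        self.full_reads += 1
        if depth <= self.z[y][x]:
            return False
        self.z[y][x] = depth
        self._refresh_tile(tx, ty)
        return True

    def _refresh_tile(self, tx, ty):
        # The toy simply recomputes the summary; real hardware would
        # presumably maintain it far more cheaply.
        rows = range(ty * TILE, min((ty + 1) * TILE, self.height))
        cols = range(tx * TILE, min((tx + 1) * TILE, self.width))
        self.tile_far[ty][tx] = min(self.z[r][c] for r in rows for c in cols)
```

Once a tile is completely covered by a near surface, any pixel behind that surface is thrown away by the coarse test alone, which is the saving the technique is after.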

Z-Compression

This one is rather easy to understand, although the implementation takes a bit more thought. We have learned that Z-buffer accesses are the biggest threat to local video memory bandwidth, so it is easy to see that (lossless) compression of the Z-coordinates will increase the performance of any chip that is held back by memory bandwidth. The programmers amongst you will know that compressing one z-coordinate at a time won't buy you much, so you can figure that ATi is obviously compressing a whole area. It is also not too difficult to guess that this area will be the very same one that is kept in the cache for the operations of Hierarchical Z.
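To illustrate why compressing an area pays off while compressing a single value does not, here is a minimal sketch of one possible lossless scheme: store the smallest depth of a tile once, then only the small offsets from it. This is an assumed stand-in chosen for simplicity, not ATi's undisclosed method; the 24-bit raw depth and the 5-bit width field are likewise assumptions.

```python
def compress_tile(depths):
    """Losslessly encode integer depths as a base value plus offsets."""
    base = min(depths)
    offsets = [d - base for d in depths]
    bits = max(offsets).bit_length()  # bits needed for the largest offset
    return base, bits, offsets


def decompress_tile(base, offsets):
    """Rebuild the original depths exactly."""
    return [base + o for o in offsets]


def compressed_size_bits(bits, count, raw_bits=24):
    # One raw base value, a 5-bit field holding the offset width,
    # then `count` offsets of `bits` bits each.
    return raw_bits + 5 + bits * count


# A hypothetical 8x8 tile covering a gently sloped surface.
tile = [9_000_000 + (i % 8) * 3 + (i // 8) * 5 for i in range(64)]
base, bits, offsets = compress_tile(tile)
tile_raw_bits = 64 * 24
tile_packed_bits = compressed_size_bits(bits, len(tile))

# A single value gains nothing: the header costs more than it saves.
_, one_bits, _ = compress_tile([9_000_000])
single_packed_bits = compressed_size_bits(one_bits, 1)
```

Neighbouring pixels of the same surface have similar depths, so the offsets stay small. A lone z-coordinate has no neighbours to exploit, and the bookkeeping makes it larger than the raw value.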

Fast Z-Clear

'Z-Clear' is something that most of us would forget when thinking about the Z-buffer impact on local video memory bandwidth, because it's not required while a frame is rendered. However, 'Z-Clear' is needed each time a frame has been fully rendered, and before the next frame can get drawn.

Hidden behind the term 'Z-Clear' is something rather mundane, but important. We've learned above that each pixel that was rendered gets its z-coordinate stored in the z-buffer, so that the rendering unit can find out whether a pixel that 'wants' to get rendered in the same spot is in front of or behind that other pixel. Once the frame is fully rendered, the z-buffer holds the z-coordinates of all pixels that are visible on the screen. Those values need to be cleared before the next frame gets rendered, which is done by filling the z-buffer with zeros. A zero in the z-buffer tells the rendering pipeline that no pixel has been rendered in that spot so far, with the result that the pixel in the rendering pipeline will get rendered and not discarded.
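The per-pixel test and the clear described here can be sketched in a few lines. This is a simplified software model, assuming (to match the 'zero means empty' description) that larger depth values are nearer to the viewer; the function name and the fragment tuple layout are made up for the example.

```python
def render(fragments, width, height):
    """Draw (x, y, depth, color) fragments and return the color buffer."""
    # The 'Z-Clear': every entry starts at zero, meaning "nothing here yet".
    zbuf = [[0.0] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, depth, c in fragments:
        # Keep the pixel only if it is in front of what is stored.
        if depth > zbuf[y][x]:
            zbuf[y][x] = depth
            color[y][x] = c
    return color


frame = render(
    [(0, 0, 0.5, "red"), (0, 0, 0.2, "blue"),
     (1, 0, 0.1, "green"), (0, 0, 0.9, "white")],
    width=2, height=1,
)
```

The first pixel in each spot always passes because it is compared against the cleared zero. If the buffer still held values from the previous frame, new pixels would be wrongly discarded.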

Filling a respectable amount of memory with zeros takes a considerable amount of time and, of course, memory bandwidth. At a screen resolution of 1600x1200 with 32-bit color, the accompanying z-buffer holds about 5.5 MB of depth data (1600 x 1200 pixels at 24 bits of Z each). This amount of memory needs to be cleared after each frame, which can take quite a while. ATi's 'Fast Z-Clear' is able to clear the z-buffer more than 50 times faster than such a plain fill, thus saving time and memory bandwidth. Again, the programmers amongst my readers will have a pretty good idea how this works, especially after my comments about the 'special areas' of the Z-buffer.
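One plausible way to get such a speed-up, and this is a guess based on the hints about 'special areas' rather than anything ATi has confirmed, is to keep a 'cleared' flag per area and flip only the flags instead of overwriting every depth value. The sketch below assumes 64 pixels per area and uses a flat pixel index to stay short; the class and its methods are hypothetical.

```python
TILE_PIXELS = 64  # hypothetical area size, e.g. 8x8 pixels


class FastClearZ:
    """Toy z-buffer whose clear only touches one flag per tile."""

    def __init__(self, pixel_count):
        self.z = [0.0] * pixel_count
        tiles = (pixel_count + TILE_PIXELS - 1) // TILE_PIXELS
        self.tile_cleared = [True] * tiles

    def clear(self):
        """Mark all tiles as cleared; return how many entries were written."""
        for i in range(len(self.tile_cleared)):
            self.tile_cleared[i] = True
        return len(self.tile_cleared)

    def read(self, index):
        # A cleared tile reads as zero, whatever stale data it still holds.
        if self.tile_cleared[index // TILE_PIXELS]:
            return 0.0
        return self.z[index]

    def write(self, index, depth):
        tile = index // TILE_PIXELS
        if self.tile_cleared[tile]:
            # First write after a clear: zero the tile's stale data now.
            start = tile * TILE_PIXELS
            end = min(start + TILE_PIXELS, len(self.z))
            self.z[start:end] = [0.0] * (end - start)
            self.tile_cleared[tile] = False
        self.z[index] = depth


zb = FastClearZ(1600 * 1200)
zb.write(100, 0.7)
entries_written = zb.clear()  # 30,000 flags instead of 1,920,000 depths
```

With these assumed numbers the clear writes 64 times fewer entries than a full fill. The real ratio depends on the area size and flag storage the hardware actually uses, which ATi has not published.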