Page 2:ATi's Business Strategy
Page 3:Hardware T&L
Page 4:Hardware T&L, Continued
Page 5:Vertex Skinning
Page 6:Excurse - The Next Step Of The 3D Pipeline After T&L, The Triangle Setup
Page 7:Fill Rate, Rendering Pipelines And Triangle Size
Page 8:Wasted Energy - The Rendering Of Hidden Surfaces
Page 9:B - Radeon's HyperZ
Page 10:Fill Rate And Memory Bandwidth - They Belong Together!
Page 11:Fill Rate And Memory Bandwidth - They Belong Together! Continued
Page 12:C - The Pixel Tapestry Architecture
Page 13:The Pixel Tapestry Architecture, Continued
Page 14:3D - Textures
Page 15:Range Based Fog
Page 16:Card Details
Page 17:Driver Interface
Page 18:Driver Interface, Continued
Page 19:Test Setup
Page 20:Benchmark Expectations
Page 21:Benchmark Results - Quake 3 Arena Demo001
Page 22:Benchmark Results - Quake 3 Arena Demo001 FSAA
Page 23:Benchmark Results - Expendable Demo
Page 24:Benchmark Results - Expendable Demo FSAA
Page 25:Benchmark Results - Dagoth Moor Zoological Gardens
Page 26:Benchmark Results - Evolva Rolling Demo
Page 27:Benchmark Results - Evolva Rolling Demo Bump Mapped
Page 28:Benchmark Results - MDK2 Demo
B - Radeon's HyperZ
The biggest highlight of ATi's new 3D chip is clearly this feature. In fact, I like HyperZ so much that I really have to commend ATi's engineering team for it. My only complaint is the name. 'HyperZ' is definitely 'too funky for me'. 'Accelerated Z' would have appealed more to the grown-ups who are supposed to shell out a significant amount of money for a Radeon card.
The actual 'HyperZ' technology consists of three different techniques that cut down the waste of fill rate and memory bandwidth that kills so much performance in modern 3D accelerators.
The first technique to reduce unnecessary Z-buffer accesses and wasted pixel rendering is called 'Hierarchical Z'. It sits AFTER the triangle setup and BEFORE the rendering unit. Before a pixel gets sent from the triangle setup to the rendering unit, Hierarchical Z looks up a defined area of the Z-buffer and checks whether the pixel will be visible or not. If the pixel would be hidden, it gets discarded right away, so that the rendering unit doesn't waste its time on it. The trick here is that this defined area of the Z-buffer is kept in a special cache, which avoids unnecessary Z-buffer reads. I don't want to go into any deeper detail, because ATi doesn't want to disclose any more than necessary.
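Since ATi keeps the details to itself, the following is only a minimal sketch of how such an early rejection could work, not ATi's actual implementation. It assumes an 8x8 pixel area (`TILE`), a z-range of 0.0 to 1.0 where smaller means closer, and a cache that holds the farthest z-value of each area; the class and method names are made up for illustration.

```python
TILE = 8  # assumed edge length of one cached area; the real size is not public


class HierarchicalZ:
    """Coarse per-tile depth cache sitting in front of the full z-buffer."""

    def __init__(self, width, height, far=1.0):
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        # Full-resolution z-buffer; smaller z means closer to the viewer.
        self.zbuf = [[far] * width for _ in range(height)]
        # Cached value per tile: the farthest z currently stored in that tile.
        self.tile_max = [[far] * tiles_x for _ in range(tiles_y)]
        self.zbuffer_reads = 0  # counts per-pixel z-buffer lookups

    def submit(self, x, y, z):
        """Return True if the pixel should go on to the rendering unit."""
        tx, ty = x // TILE, y // TILE
        # Early rejection: the pixel lies behind everything in its tile,
        # so the per-pixel z-buffer does not have to be read at all.
        if z >= self.tile_max[ty][tx]:
            return False
        # Ordinary per-pixel z-test for everything that survives.
        self.zbuffer_reads += 1
        if z >= self.zbuf[y][x]:
            return False
        self.zbuf[y][x] = z
        self._refresh_tile(tx, ty)
        return True

    def _refresh_tile(self, tx, ty):
        # Recompute the farthest z in the tile (hardware would do this
        # incrementally; a full rescan keeps the sketch short).
        rows = self.zbuf[ty * TILE:(ty + 1) * TILE]
        self.tile_max[ty][tx] = max(
            max(row[tx * TILE:(tx + 1) * TILE]) for row in rows
        )
```

Once an area is completely covered by a nearby surface, every pixel behind that surface is thrown away after a single comparison against the cached value, and `zbuffer_reads` stays where it was.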
The second technique, Z-compression, is rather easy to understand, although the implementation takes a bit more thought. We have learned that Z-buffer accesses are the biggest threat to local video memory bandwidth, so it follows that (lossless) compression of the Z-coordinates will increase the performance of any chip that is held back by memory bandwidth. The programmers amongst you will know that compressing one z-coordinate at a time won't buy you much, so you can figure that ATi is obviously compressing a whole area. It is also not too difficult to guess that this area will be the very same one that is kept in the cache for the operations of Hierarchical Z.
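To illustrate why compressing a whole area pays off where a single value does not, here is a small sketch of one possible lossless scheme: store the smallest z-value of the area once and keep only the (much shorter) differences for each pixel. This is a textbook base-plus-delta encoding chosen for illustration, not ATi's undisclosed method; the function names, the 24-bit z-width and the 5-bit header field are assumptions.

```python
def compress_tile(depths):
    """Losslessly pack integer z-values of one area as base + deltas."""
    base = min(depths)
    deltas = [d - base for d in depths]
    bits = max(deltas).bit_length()  # width needed for the largest delta
    return base, bits, deltas


def decompress_tile(packed):
    """Restore the exact original z-values."""
    base, _bits, deltas = packed
    return [base + d for d in deltas]


def packed_size_bits(packed, z_bits=24):
    """Bits needed: one full base value, a 5-bit width field, then the deltas."""
    _base, bits, deltas = packed
    return z_bits + 5 + bits * len(deltas)


# Example: 64 z-values of a gently sloped surface (hypothetical numbers).
tile = [1_000_000 + 3 * i for i in range(64)]
packed = compress_tile(tile)
raw_bits = 24 * len(tile)
```

Neighbouring pixels of the same surface have very similar z-values, so the deltas need far fewer bits than the full coordinates. A single z-value has no neighbours to be compared with, which is why per-value compression gains next to nothing.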
The third technique concerns 'Z-Clear', something that most of us would forget when thinking about the Z-buffer's impact on local video memory bandwidth, because it's not required while a frame is being rendered. However, 'Z-Clear' is needed each time a frame has been fully rendered, before the next frame can be drawn.
Hidden behind the term 'Z-Clear' is something rather mundane, but important. We've learned above that each pixel that was rendered gets its z-coordinate stored in the z-buffer, so that the rendering unit can find out whether a pixel that 'wants' to get rendered in the same spot is in front of or behind that other pixel. Once the frame is fully rendered, the z-buffer holds the z-coordinates of all pixels that are visible on the screen. Those values need to be cleared before the next frame gets rendered, which is done by filling the z-buffer with zeros. A zero in the z-buffer tells the rendering pipeline that no pixel has been rendered in that spot so far, with the result that the pixel in the rendering pipeline will get rendered and not discarded.
Now filling such a respectable amount of memory with zeros takes a considerable amount of time and of course memory bandwidth. At a screen resolution of 1600x1200 and a color depth of 32-bit, the z-buffer is no less than 5.5 MB big, assuming 24 bits of depth per pixel (1600 x 1200 x 3 bytes); padded to 32 bits per pixel it grows to about 7.3 MB. This amount of memory needs to be cleared after each frame, which can take quite a while. ATi's 'Fast Z-Clear' is able to clear the z-buffer more than 50 times faster, thus saving time and memory bandwidth. Again, the programmers amongst my readers will have a pretty good idea how this works, especially after my comments about the 'special areas' of the Z-buffer.
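For those who want to check the numbers and see the idea in code, here is a rough sketch. It assumes that the z-buffer is split into areas of 64 pixels and that a clear only sets one 'cleared' flag per area instead of rewriting every z-value; the tile size, the flat pixel indexing and all names are my assumptions, since ATi has not published how 'Fast Z-Clear' really works.

```python
TILE_PIXELS = 64  # assumed pixels per area (e.g. 8x8); not a published figure


def zbuffer_bytes(width, height, bits_per_pixel):
    """Size of a z-buffer in bytes."""
    return width * height * bits_per_pixel // 8


class FastClearZBuffer:
    """Z-buffer that clears by flagging whole tiles instead of rewriting them."""

    def __init__(self, num_pixels, clear_value=0):
        self.clear_value = clear_value
        self.data = [clear_value] * num_pixels
        self.num_tiles = (num_pixels + TILE_PIXELS - 1) // TILE_PIXELS
        self.cleared = [True] * self.num_tiles
        self.flag_writes = 0  # memory writes spent on clearing

    def clear(self):
        # One flag per tile is touched, not one value per pixel.
        for t in range(self.num_tiles):
            self.cleared[t] = True
        self.flag_writes += self.num_tiles

    def read(self, i):
        # A flagged tile reads as cleared, whatever stale data it still holds.
        if self.cleared[i // TILE_PIXELS]:
            return self.clear_value
        return self.data[i]

    def write(self, i, z):
        t = i // TILE_PIXELS
        if self.cleared[t]:
            # First write after a clear: reset the tile's stale values lazily.
            start = t * TILE_PIXELS
            end = min(start + TILE_PIXELS, len(self.data))
            self.data[start:end] = [self.clear_value] * (end - start)
            self.cleared[t] = False
        self.data[i] = z


size_24 = zbuffer_bytes(1600, 1200, 24)  # 5,760,000 bytes, roughly 5.5 MB
size_32 = zbuffer_bytes(1600, 1200, 32)  # 7,680,000 bytes, roughly 7.3 MB
```

With 64 pixels per flag, a clear touches 64 times fewer memory locations than a brute-force fill, which is at least in the same ballpark as ATi's 'more than 50 times' claim. That match follows from the tile size assumed here and proves nothing about the real hardware.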