LMA II - The New Light Speed Memory Architecture - PC Graphics Beyond XBOX - NVIDIA Introduces GeForce4

LMA II - The New Light Speed Memory Architecture

For me, the most important feature, and the one that's predominantly responsible for GeForce4 Ti's impressive performance leap over GeForce3, is the new and improved 'LMA II'. Please follow this link to learn what 'LMA' stood for in GeForce3. It explains why a special memory controller makes so much sense in 3D graphics, so I will not explain it again here.

LMA II is GeForce3's LMA with each component tweaked, tuned and advanced.

Crossbar Memory Controller
GeForce3 was already equipped with this feature, which lets it access memory in 64-bit and 128-bit chunks as well as the usual 256-bit chunks, significantly improving memory bandwidth usage. For LMA II, NVIDIA improved the load-balancing algorithms for the different memory partitions and refined the priority scheme to make more efficient use of memory across the four partitions.
Visibility Subsystem - Z-Occlusion Culling
This feature was also already present in GeForce3, but for NV25 it has been tuned to cull more pixels while using less memory bandwidth to do so. The culling is now done in an on-chip culling surface cache to avoid off-chip memory accesses.
Lossless Z-Buffer Compression
This is another feature that was already included in GeForce3. In LMA II, however, a new compression algorithm is supposed to achieve the 4:1 compression more often.
Vertex Cache
The vertex cache stores vertices after they are sent across the AGP bus. It makes AGP transfers more efficient by avoiding multiple transmissions of the same vertices (e.g. for primitives that share edges).
Primitive Cache
The primitive cache assembles vertices after processing (i.e. after the vertex shader) into fundamental primitives to pass on to triangle setup.
Dual Texture Caches
These were already found in GeForce3. The new cache algorithms have been improved to 'look ahead' more efficiently in cases of multitexturing or higher-quality filtering. This contributes to the significantly improved 3- and 4-texture performance of GeForce4 Ti.
Pixel Cache
This cache at the end of the rendering pipeline is a coalescing cache, which is very similar to the 'write combining' feature of Intel and AMD processors. It waits until a certain number of pixels have been drawn before it writes them to memory in burst mode.
Auto Pre-charge
Memory banks need to be pre-charged before they can be read, adding a nasty clock penalty to every read from a new bank of memory. To avoid this waste of time, GeForce4 Ti is able to schedule memory banks for pre-charge ahead of time, according to a prediction algorithm.
Fast Z-Clear
This feature has already been around for several years. It was used for the first time on ATI's Radeon chip. What it does is simply set a flag for a defined area of the frame buffer so that, instead of filling that whole area with zeros, only the flag has to be set, saving memory bandwidth.
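
To make the Fast Z-Clear idea more concrete, here is a minimal software sketch of the 'flag instead of fill' principle. It is not NVIDIA's or ATI's actual implementation: the class name `TiledDepthBuffer`, the 8x8 tile size, the clear value and the write counter are all assumptions made purely for illustration.

```python
class TiledDepthBuffer:
    """Toy depth buffer with one 'cleared' flag per tile (hypothetical layout)."""

    def __init__(self, width, height, tile=8, clear_value=0.0):
        # Assumption: buffer dimensions are multiples of the tile size.
        assert width % tile == 0 and height % tile == 0
        self.width = width
        self.tile = tile
        self.clear_value = clear_value
        self.tiles_x = width // tile
        self.depth = [clear_value] * (width * height)
        self.cleared = [False] * (self.tiles_x * (height // tile))
        self.writes = 0  # simulated memory writes, to compare both clears

    def _tile_index(self, x, y):
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def slow_clear(self):
        # Conventional clear: every depth value is rewritten.
        for i in range(len(self.depth)):
            self.depth[i] = self.clear_value
        self.writes += len(self.depth)

    def fast_clear(self):
        # Fast clear: only the per-tile flags are set.
        for i in range(len(self.cleared)):
            self.cleared[i] = True
        self.writes += len(self.cleared)

    def read(self, x, y):
        # A flagged tile reads as the clear value without touching its data.
        if self.cleared[self._tile_index(x, y)]:
            return self.clear_value
        return self.depth[y * self.width + x]

    def write(self, x, y, z):
        t = self._tile_index(x, y)
        if self.cleared[t]:
            # First write into a flagged tile: fill it for real, then unflag it.
            x0 = (x // self.tile) * self.tile
            y0 = (y // self.tile) * self.tile
            for yy in range(y0, y0 + self.tile):
                for xx in range(x0, x0 + self.tile):
                    self.depth[yy * self.width + xx] = self.clear_value
            self.writes += self.tile * self.tile
            self.cleared[t] = False
        self.depth[y * self.width + x] = z
        self.writes += 1
```

In this toy model, a 64x64 buffer needs 4096 writes for `slow_clear()` but only 64 flag writes for `fast_clear()`. Tiles that are never drawn to in a frame never have to be filled at all, which is where the bandwidth saving would come from.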

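The coalescing behaviour of the pixel cache described in the list above can be sketched in a few lines as well. This is only a rough model under stated assumptions: the class `CoalescingPixelCache`, the burst size and the dictionary standing in for frame buffer memory are hypothetical, and the real hardware's buffering policy is not documented here.

```python
class CoalescingPixelCache:
    """Toy write-combining buffer: collects pixel writes, flushes them in bursts."""

    def __init__(self, memory, burst_size=8):
        self.memory = memory          # dict address -> colour, stands in for the frame buffer
        self.burst_size = burst_size  # hypothetical number of pixels per burst
        self.pending = {}             # pixels drawn but not yet written to memory
        self.bursts = 0               # number of burst transfers issued so far

    def write(self, address, colour):
        # A later write to the same address simply replaces the pending one.
        self.pending[address] = colour
        if len(self.pending) >= self.burst_size:
            self.flush()

    def flush(self):
        # Write all pending pixels to memory in one simulated burst.
        if not self.pending:
            return
        self.memory.update(self.pending)
        self.pending.clear()
        self.bursts += 1


frame_buffer = {}
cache = CoalescingPixelCache(frame_buffer, burst_size=4)
for address in range(10):
    cache.write(address, address * 2)
cache.flush()  # push out the last, partially filled burst
```

Ten pixel writes end up as three bursts here instead of ten individual memory transactions, which is the effect the pixel cache is after.
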
NView

All GeForce4 cards are equipped with dual 350 MHz RAMDACs as well as TMDS transmitters for dual CRT or flat panel solutions. On top of that, NVIDIA supplies a plethora of software features that let you do all kinds of things with your personal dual-display setup. Going into more detail on this feature would go beyond the scope of this article.