Source: Tom's Hardware US – Keywords: performance, leap
Topics: NVIDIA
Pixel Processing Engine
Each of the 16 pixel pipelines has two shader units (a superscalar design) and one floating-point texture processor. NV40 also has four L1 texture caches, each serving four pipelines, and a large L2 cache further relieves the memory interface. The shader units follow a true SIMD (single instruction, multiple data) design. The first shader unit in each pipe can handle arithmetic operations as well as texture reads and normalization, while the second unit is limited to arithmetic. In other words, shader unit 1 is "shared" with texturing: whenever it is not fetching textures during a pass, it is available for pixel shading. Shader unit 2 is always available for pixel shading.
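The capability split described above can be captured in a small model. This is a minimal sketch, not NVIDIA's design: the names (`ShaderUnit`, `PixelPipe`, `units_for`) are hypothetical, and the rule that four consecutive pipes share one L1 cache is an assumption that merely matches "four L1 texture caches, each of which serves four pipelines".

```python
from dataclasses import dataclass

NUM_PIPES = 16      # pixel pipelines in NV40
PIPES_PER_L1 = 4    # each L1 texture cache serves four pipelines

@dataclass(frozen=True)
class ShaderUnit:
    """Capability descriptor for one shader unit inside a pixel pipe."""
    name: str
    ops: frozenset  # operation kinds this unit may issue

# Unit 1 handles arithmetic, texture reads and normalization;
# unit 2 is limited to arithmetic.
UNIT1 = ShaderUnit("unit1", frozenset({"arith", "tex", "nrm"}))
UNIT2 = ShaderUnit("unit2", frozenset({"arith"}))

@dataclass(frozen=True)
class PixelPipe:
    index: int
    units: tuple = (UNIT1, UNIT2)  # every pipe has the same two-unit layout

    @property
    def l1_cache(self) -> int:
        # Hypothetical mapping: four consecutive pipes share one L1 cache.
        return self.index // PIPES_PER_L1

    def units_for(self, op_kind: str) -> list:
        """Names of the units in this pipe that are able to issue op_kind."""
        return [u.name for u in self.units if op_kind in u.ops]

pipes = [PixelPipe(i) for i in range(NUM_PIPES)]
```

With this model, `pipes[0].units_for("tex")` yields only `["unit1"]`, while `units_for("arith")` yields both units, which is exactly why unit 1 is the one "shared" with texturing.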

Simply put, NV40 usually behaves like a 16-pipe design (16x1: classical texture mapping with color and Z). For example, imagine a shader with an arithmetic-to-texture ratio of 4:1. In such a scenario, shader unit 1 could spend 75% of its time during the passes on arithmetic, while shader unit 2 does nothing but arithmetic. In this example, one pixel pipe can calculate 7 ops/clock.
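The 7 ops/clock figure can be reproduced with simple arithmetic if we assume each shader unit retires up to 4 ops per clock, i.e. half of the 8 ops per pixel per clock given for NV40 in this article. That per-unit rate is an assumption, and `ops_per_clock` is a hypothetical helper, so treat this as a back-of-the-envelope sketch.

```python
# Assumption: each shader unit retires up to 4 ops per clock
# (half of the 8 ops/pixel/clock peak quoted for NV40).
OPS_PER_UNIT = 4
NUM_PIPES = 16

def ops_per_clock(unit1_arith_share: float, unit2_arith_share: float = 1.0) -> float:
    """Arithmetic ops one pipe completes per clock, given the fraction of
    time each unit spends on arithmetic (unit 1 spends the rest texturing)."""
    for share in (unit1_arith_share, unit2_arith_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie in [0, 1]")
    return OPS_PER_UNIT * (unit1_arith_share + unit2_arith_share)

per_pipe = ops_per_clock(0.75)         # 4 * 0.75 + 4 * 1.0 = 7.0
per_chip = per_pipe * NUM_PIPES        # 7 * 16 = 112.0
peak_chip = ops_per_clock(1.0) * NUM_PIPES  # 8 * 16 = 128.0
print(per_pipe, per_chip, peak_chip)   # 7.0 112.0 128.0
```

Under the same assumption, all 16 pipes together would reach 112 ops per clock in this scenario, against a theoretical 128 when unit 1 never has to texture.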

In the case of shaders, we have to differentiate between instructions and operations (ops). Instructions define functions to be applied to certain components (R, G, B or alpha) of a pixel; the shader units then carry out the corresponding calculations (ops).
NV40 is able to carry out 4 or more instructions and 8 or more operations per pixel per clock cycle. According to NVIDIA, ATi's R3xx series can carry out only 2 instructions and at most four operations per pixel per clock cycle.
In short, I think it's safe to say that the NV40's pixel shader engine is blazingly fast and highly efficient.
Here's a summary of the new features of the pixel shader engine:
- Full Support for shader model 3.0
- Pixel shader programs up to 65,535 (2^16 − 1) instructions long - shatters the PS2.0 limit of 96
- Dynamic flow control - Loops & Branching, Call & Return, Subroutines
- Highest precision pixel shading - Native/optimized FP32 processing
- Flexible data type support - FP32, FP16 operand & texture formats
- Full-speed non-power-of-2 textures with mipmapping
- Multiple Render Target Support
- Centroid Sampling AA Support
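Two of the features listed above are easy to illustrate numerically: the precision gap between FP16 and FP32, and what a mip chain looks like for a texture whose sides are not powers of two. The sketch below uses Python's `struct` module (`'e'` is IEEE half precision, `'f'` is single precision) to emulate storage in each format. The helper names `round_trip` and `mip_chain` are hypothetical, and the floor-rounding rule for mip sizes is an assumed convention, not a statement about NV40's hardware.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Pack value into an IEEE binary format and unpack it again.
    fmt is a struct code: 'e' = FP16 (half), 'f' = FP32 (single)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]

def mip_chain(width: int, height: int) -> list:
    """Sizes of every mip level for a texture of any size, assuming each
    level halves both dimensions with floor rounding, clamped to 1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

# FP16 keeps an 11-bit significand, so 2049 rounds to 2048;
# FP32 keeps 24 bits and stores it exactly.
print(round_trip(2049.0, "e"), round_trip(2049.0, "f"))  # 2048.0 2049.0
print(mip_chain(640, 480))
```

For a 640x480 texture, the assumed rule gives ten levels, passing through odd sizes such as 20x15, 10x7 and 5x3 before reaching 1x1.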