Pixel Processing Engine

1:00 PM - April 14, 2004 by Lars Weinand
Source: Tom's Hardware US – Keywords: performance, leap
Topics: NVIDIA



Each of the 16 pixel pipelines sports two shader units (a superscalar design) and one floating-point texture processor. NV40 also comes with four L1 texture caches, each of which serves four pipelines, and a large L2 cache that further relieves the memory interface. The architecture of the shader units follows a true SIMD (single instruction, multiple data) design. While the first shader unit of every pipe can handle arithmetic operations as well as texture reads and normalization, the second unit is limited to arithmetic. In other words, shader unit 1 is "shared" with texturing: when it is not texturing during a given pass, it is available for pixel shading. Shader unit 2 is always available for pixel shading.


Simply put, the NV40 usually acts like a 16-pipe design (16x1: classical texture mapping with color and Z). Imagine, for example, a shader with an arithmetic-to-texture ratio of 4:1. In such a scenario, shader unit 1 could spend 75% of its time during the passes on arithmetic, while shader unit 2 does 100% arithmetic. In this example, one pixel pipe can calculate 7 ops/clock.
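The 7 ops/clock figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes that each shader unit can issue up to four ops per clock (one per R, G, B and alpha component), which is inferred from the numbers quoted in this article rather than stated by NVIDIA, and the function name `pipe_ops_per_clock` is hypothetical.

```python
# Rough throughput model for a single NV40 pixel pipe.
# ASSUMPTION: each shader unit issues up to 4 ops per clock (one per
# R, G, B, alpha component). This is inferred from the article's figures,
# not taken from any official NVIDIA documentation.
OPS_PER_UNIT = 4


def pipe_ops_per_clock(unit1_arith_fraction, unit2_arith_fraction=1.0):
    """Estimate arithmetic ops per clock for one pipe.

    unit1_arith_fraction: share of time shader unit 1 spends on arithmetic
                          (the remainder goes to texturing).
    unit2_arith_fraction: share of time shader unit 2 spends on arithmetic
                          (it cannot texture, so this defaults to 1.0).
    """
    return OPS_PER_UNIT * (unit1_arith_fraction + unit2_arith_fraction)


# The article's example: unit 1 is 75% arithmetic, unit 2 is 100% arithmetic.
example_ops = pipe_ops_per_clock(0.75)

# Upper bound under the same assumption: no texturing at all.
peak_ops = pipe_ops_per_clock(1.0)

print(example_ops, peak_ops)
```

Under that assumption, the texturing-free upper bound works out to 8 ops per clock per pipe, which lines up with the "8 or more operations per pixel and clock cycle" that NVIDIA quotes.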

In the case of shaders, we have to differentiate between instructions and operations (ops). Instructions define functions that are supposed to be applied to certain components (R, G, B or alpha) of a pixel. The shader units then carry out their calculations (ops) according to these instructions.
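To make the distinction concrete, here is a minimal sketch that counts ops for a tiny, made-up shader fragment. It assumes one op per component an instruction touches, which is a simplification for illustration only; the `Instruction` class, the `op_count` helper and the two-instruction program are all hypothetical and do not correspond to any real shader assembler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """A hypothetical shader instruction with a component mask."""
    name: str         # e.g. "mul" or "add" (illustrative only)
    components: str   # subset of "rgba" that the instruction applies to


def op_count(instr):
    # ASSUMPTION: one op per distinct component the instruction touches.
    return len(set(instr.components))


# Two instructions: one on the color channels, one on alpha.
program = [
    Instruction("mul", "rgb"),  # 1 instruction, 3 ops
    Instruction("add", "a"),    # 1 instruction, 1 op
]

total_instructions = len(program)
total_ops = sum(op_count(i) for i in program)

print(total_instructions, total_ops)
```

In this toy model, two instructions amount to four ops, which is why instruction counts and op counts have to be quoted separately.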

NV40 is able to carry out 4 or more instructions per pixel and 8 or more operations per pixel and clock cycle. According to NVIDIA, ATi's R3xx series can only carry out 2 instructions per pixel and a maximum of four operations per pixel and clock cycle.

In short, I think it's safe to say that the NV40's pixel shader engine is blazingly fast and highly efficient.

Here's a summary of the new features of the pixel shader engine:

  • Full Support for shader model 3.0
  • 2^16 (65,535) length pixel shader programs - shatters PS2.0 limit of 96
  • Dynamic Flow control - Loops & Branching, Call & Return, Subroutines
  • Highest precision pixel shading - Native/optimized FP32 processing
  • Flexible data type support - FP32, FP16 operand & texture formats
  • Full speed non-power of 2 textures with mipmapping
  • Multiple Render Target Support
  • Centroid Sampling AA Support
