If you’re due for a graphics card purchase, odds are good that you’re not starting from scratch. Perhaps you’re doing one of those every-three-years-or-so PC makeovers common in the mainstream desktop crowd. Well, three years ago, ATI (still independent from AMD at that point) had just released its Radeon X1950 parts, with the XTX variant ringing in right around $450. At the time, everyone was still obsessing over speeds, feeds, and gaming features. As such, the press liked to tout the fact that the XTX was making the jump to GDDR4 memory, although the other X1950 models were all doing well with their 256-bit GDDR3 memory buses. Chips were built on a 90 nm fabrication process, cards generally used the PCI Express x16 slot interface, and graphics library support topped out at the then-current DirectX 9.0c and OpenGL 2.0. The object of the game was to take traditional architectures and just ratchet them up as fast as possible.
What a difference a few years makes. Just as we’ve seen happen with CPUs, graphics processors made several architectural leaps. The new goal wasn’t necessarily to strive for lightning-fast frequencies. If cores could be made more efficient and make better use of parallel processing, total performance would naturally follow. For example, the following year, with the advent of the HD 2000-series, ATI made the leap to unified shaders in its desktop GPUs. A shader is a little program designed primarily to perform a graphics rendering task. For example, vertex shaders would alter the shape of an object, while pixel shaders could apply textures to individual pixels. Inside the GPU, designers dedicated blocks of circuitry to running those shader tasks. The X1950 had integrated hardware for eight vertex shaders and 48 pixel shaders. The downside of this architecture was that if, for instance, an application needed a lot of vertex operations done but not much pixel processing, all eight vertex shaders would be cranking away while most of the pixel shader circuitry would sit around idle, filing its nails, waiting for something to do. The 2000-series introduced unified, programmable shaders, so any block of shader circuitry could perform any suitable shader task—vertex, pixel, or otherwise. A mid-range 2000-series card could suddenly blow through that vertex-intensive task in less time than the X1950 XTX, even with lower clock speeds and at a lower price point.
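To see why unified shaders help on lopsided workloads, here's a toy Python sketch. The unit counts (eight vertex, 48 pixel shaders) come from the X1950 figures above; the one-task-per-unit-per-cycle scheduling is a deliberate simplification, not how a real GPU dispatches work, and the unified pool size of 56 is just the sum of the two fixed pools for comparison's sake.

```python
# Toy model: fixed-function vs. unified shader pools on a
# vertex-heavy workload. Each unit retires one task per "cycle".

def cycles_fixed(vertex_tasks, pixel_tasks, vertex_units=8, pixel_units=48):
    """Fixed pools: each unit type can only run its own task type."""
    v_cycles = -(-vertex_tasks // vertex_units)   # ceiling division
    p_cycles = -(-pixel_tasks // pixel_units)
    return max(v_cycles, p_cycles)                # slowest pool gates the frame

def cycles_unified(vertex_tasks, pixel_tasks, units=56):
    """Unified pool: any unit can run any task type."""
    return -(-(vertex_tasks + pixel_tasks) // units)

# A vertex-heavy frame: lots of geometry, comparatively little pixel work.
print(cycles_fixed(8000, 1200))    # -> 1000 (vertex pool is the bottleneck)
print(cycles_unified(8000, 1200))  # -> 165  (whole pool shares the load)
```

In the fixed design, the 48 pixel units finish in 25 cycles and then idle while the eight vertex units grind through the remaining geometry; the unified pool keeps every unit busy until the frame is done.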
Similar sorts of advances have appeared in other areas around the GPU. The ring bus architecture that debuted with the X1000 evolved, widened, and grew to encompass a PCI Express bus connection, allowing for more efficient data exchanges with the surrounding system. The PCI Express bus itself migrated from version 1.0 to 2.0, doubling the interface bandwidth. Graphics library support updated all the way to DirectX 10.1 and OpenGL 3.0. By the time AMD/ATI reached the 4650/70 chips (code-named RV730), the fab process had shrunk to 55 nm and the number of transistors in each processor mushroomed to 514 million—over five times the 80 to 90 million found in the X1950 chips made only two years prior.
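That bandwidth doubling is easy to put numbers on. A quick back-of-envelope check, assuming the standard per-lane signaling rates (2.5 GT/s for PCIe 1.x, 5.0 GT/s for 2.0) and the 8b/10b line encoding both generations use:

```python
# PCIe x16 usable bandwidth, one direction.
# 8b/10b encoding puts 10 bits on the wire for every data byte.

def pcie_bandwidth_gbs(gt_per_s, lanes=16):
    """Usable GB/s per direction for a given per-lane transfer rate."""
    return gt_per_s * 8 / 10 / 8 * lanes  # GT/s -> data GB/s per lane, x lanes

print(pcie_bandwidth_gbs(2.5))  # PCIe 1.x x16 -> 4.0 GB/s
print(pcie_bandwidth_gbs(5.0))  # PCIe 2.0 x16 -> 8.0 GB/s
```

Four gigabytes per second versus eight, each way—exactly the doubling the spec bump promised.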
What was AMD doing with all of that extra circuitry? Keep reading.
Got it really cheap from newegg. It'll do fine with my Intel E5200. Nothing like a super gaming machine, but hope to play TF2 and L4D with good gfx. That's all i play atm.
Not that it's not good content, but come on. Doesn't Tom's make enough from normal ads?
on their gaming charts the 4670 is listed. plays FEAR 2 pretty well. i assume it can then handle all Source games as well but at lower resolutions, medium settings, no AA, the usual.
I was hoping to see more of their mid-range cards.
Installed a Sapphire 4650 AGP on a backup system in August.
Overclocked it & almost pissed myself on how good the image quality was on that system.
value based articles are refreshing