After an impressive Radeon HD 4000-series lineup, AMD has maintained its momentum with an equally impressive Radeon HD 5000-series.
One of the innovations that AMD adopted with its GPU design strategy was a "small die" concept. Whereas Nvidia typically architects a flagship GPU and then scales it down into budget versions by disabling processing cores, reducing clock speeds, or using fewer 64-bit memory interfaces, AMD elected to build around an optimized-die philosophy: develop the best mid-range GPU that can be built on a given manufacturing process. Those GPUs can then be paired up on a flagship card. So, while Nvidia has historically offered the highest-performing single GPU for a given generation, the last two product cycles have seen AMD offering the best price/performance ratio.
ATI's (and now AMD's) approach to GPU design has always been gaming-focused and yet conservative. While Nvidia was trying to introduce mixed 16/32-bit floating-point precision in its GeForce FX line-up, a feature that proved too slow for games at the time, ATI stuck to fixed 24-bit shader precision in its Radeon 9700.
We've seen the same "problems" with Nvidia's GT200 core (GeForce GTX 260/275/280/285/295) and its support for IEEE-754-compliant double-precision math. This capability is only just being introduced in the Radeon HD 5800-series (and is absent from the HD 5700-series). Nvidia's ambition will carry over to GF100, which many see as architected more for CUDA than for gaming.
What's important to recognize is that AMD has kept pace with the times. When hardware and software reached the point where FP32 shaders were needed, AMD had the feature available. Though Nvidia was first to market with IEEE-754-compliant double-precision math, now that this feature is growing in importance, the Radeon HD 5800-series is making that capability available as well. For the most part, Nvidia's first-to-market advantage in non-gaming features has not been a major driving force for sales (more on this later).
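To make the precision levels discussed above more concrete, here is a minimal Python sketch that repeats the same addition in 16-bit, 32-bit, and 64-bit IEEE-754 formats and reports how far each result drifts from the exact answer. It runs on the CPU using the standard `struct` module and is not shader or GPU code; the helper names `round_to` and `accumulate` are hypothetical, and ATI's 24-bit format is left out because `struct` has no equivalent for it.

```python
import struct


def round_to(fmt, value):
    """Round a Python float (IEEE-754 binary64) to the format named by fmt."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(fmt, step, count):
    """Add `step` to a running total `count` times, rounding after every add."""
    step = round_to(fmt, step)
    total = 0.0
    for _ in range(count):
        # Rounding after each addition imitates arithmetic done in that format.
        total = round_to(fmt, total + step)
    return total


if __name__ == "__main__":
    exact = 1000.0  # 10,000 additions of 0.1 in exact arithmetic
    formats = [
        ("half (16-bit)", "<e"),
        ("single (32-bit)", "<f"),
        ("double (64-bit)", "<d"),
    ]
    for name, fmt in formats:
        result = accumulate(fmt, 0.1, 10_000)
        error = abs(result - exact)
        print(f"{name:16s} sum = {result!r:<22} error = {error:.3g}")
```

The narrower the format, the larger the accumulated error, which is why long-running numerical workloads care about double precision while most game shaders of the era did not.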
Looking Further Ahead
AMD's designs are leading toward the goal of Fusion, in which AMD plans to integrate traditional GPU technology with the CPU. Not only can this reduce latency while increasing bandwidth (faster), but the CPU and GPU can also share a substantial set of resources (cheaper, less silicon). In much the same way that today's CPUs include floating-point units (FPUs), integrated components optimized for single- and double-precision math, rounding, and so on, AMD envisions future processors that integrate the GPU alongside the CPU.
In the initial phase, AMD will most likely integrate existing CPU cores with GPU cores. In the same way the company developed today's Radeon GPUs with multi-chip scalability, Fusion processors will be scalable by controlling the number of processing elements. Down the line, the integration will likely be more thorough, with no clear difference between the CPU and GPU components of the chip. Instead, it'll be a CPU with an integrated FPU and "stream" cores, which, through a software driver, will act as a graphics chip. With proper power management, this should deliver strong performance per watt. It will be possible to run a fully capable desktop environment on the integrated GPU, saving battery life, while keeping on-demand access to a faster GPU for applications able to benefit from it.
Nvidia tried this with its Hybrid SLI technology, but it did not work for the enthusiast market. While Hybrid SLI (GeForce Boost) allowed you to split the 3D workload between the integrated GPU and the GPU on an add-in card, it was useless with a flagship GPU. The latency and processing overhead required to split the workload were greater than the performance benefit. AMD has a higher chance of success with Fusion, thanks to the lower latency of the integrated component.
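The overhead argument above can be sketched with a toy model. The Python snippet below assumes a frame's work can be divided perfectly between two GPUs and that splitting adds a fixed per-frame cost; the function name `split_frame_time` and every millisecond figure are hypothetical values chosen for illustration, not measurements of Hybrid SLI or of any real hardware.

```python
def split_frame_time(t_fast_ms, t_slow_ms, overhead_ms):
    """Frame time when work is split ideally between two GPUs.

    t_fast_ms / t_slow_ms: time each GPU needs to render a frame alone.
    overhead_ms: assumed fixed cost of dividing and recombining the work.
    """
    # With an ideal split both GPUs finish together, so their rates add up.
    parallel_ms = (t_fast_ms * t_slow_ms) / (t_fast_ms + t_slow_ms)
    return parallel_ms + overhead_ms


if __name__ == "__main__":
    integrated_ms = 60.0  # hypothetical integrated GPU
    overhead_ms = 2.0     # hypothetical splitting cost per frame
    cards = [("entry-level card", 40.0), ("flagship card", 10.0)]
    for name, discrete_ms in cards:
        combined_ms = split_frame_time(discrete_ms, integrated_ms, overhead_ms)
        verdict = "faster" if combined_ms < discrete_ms else "slower"
        print(f"{name}: alone {discrete_ms:.1f} ms, "
              f"split {combined_ms:.1f} ms ({verdict})")
```

In this model the gain from adding the integrated GPU shrinks as the add-in card gets faster, so a fixed splitting cost that is harmless next to an entry-level card can wipe out the benefit entirely next to a flagship.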
Whereas Fusion is about integrating graphics technology into the CPU, AMD's Torrenza initiative is about providing direct-to-CPU connectivity via HyperTransport or a similarly advanced, high-bandwidth, low-latency interconnect protocol.
Direct-to-CPU add-ons are already shipping in niche markets. For example, you can use an XtremeData XDI device in a standard AMD Opteron socket. It features two Altera Stratix II FPGAs that have direct access to the primary Opteron CPU and RAM. Even though these devices may not match the peak GFLOPS that an eight-core CPU or GPGPU setup can deliver, their real-world results in compute-intensive algorithmic applications, such as financial market data analysis, data encryption, or military radar systems, are considerably better in both absolute performance and performance per watt. You can also get HTX cards with similar FPGAs or multi-port 10 GigE network adapters with direct HyperTransport links to the CPU.
AMD's graphics division has demonstrated its ability to compete at the top of the market. Its Radeon HD 4000-series cards offered better price/performance than Nvidia's GeForce GTX (GT200) line-up, and the Radeon HD 5000-series remains the uncontested DirectX 11 champion.
AMD's CPU division has also demonstrated an ability to design high-performance multi-core CPUs, from the Athlon 64 X2 to today's six-core server-oriented Opterons. Throughout the last decade, AMD and ATI have each been either the performance leader or a close second. As it enters the next ten years, AMD is the only technology company with both the technical resources and a consistent track record in CPU and GPU technology alike. When AMD talks about Fusion, one cannot help but believe that the company can succeed. We have high expectations for AMD in the upcoming decade.