AMD launches "stream processor" accelerator board

Tampa (FL) - AMD today introduced what it calls the world's first dedicated stream processor. What sounds like a completely new product is in fact based on the R580 graphics processor used in Radeon X1900 graphics cards built by the firm's graphics division, formerly known as ATI. The slightly modified GPU promises to inject massive floating point performance into computers.

BlueGene/L, currently the world's fastest supercomputer on record, uses 131,072 processor cores (65,536 dual-core processors) to achieve a peak performance of 367 TFlops. If AMD has its way, "stream processors" can either reduce the number of processors needed to achieve the same performance or dramatically accelerate such a system. The newly announced stream processor is based on the 384-million-transistor R580 graphics core, which is more commonly used in Radeon X1900 graphics cards and is known for hiding a powerful number-crunching engine: In fact, and at least in theory, BlueGene/L's 367 TFlops could be achieved with fewer than 1,000 graphics processors, which provide a performance of about 375 GFlops each.
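The back-of-the-envelope comparison above can be checked with a few lines of Python. This is only a sketch using the peak figures quoted in this article; the variable names are illustrative, and real-world throughput would depend on how well a workload maps onto the hardware.

```python
import math

# Peak figures as quoted in the article (theoretical, not measured).
BLUEGENE_PEAK_GFLOPS = 367 * 1000   # 367 TFlops expressed in GFlops
BLUEGENE_CORES = 131_072
R580_PEAK_GFLOPS = 375              # approximate, per graphics processor

# Number of R580 processors needed to match BlueGene/L's peak on paper.
boards_needed = math.ceil(BLUEGENE_PEAK_GFLOPS / R580_PEAK_GFLOPS)

# Peak throughput of a single BlueGene/L core, for comparison.
gflops_per_core = BLUEGENE_PEAK_GFLOPS / BLUEGENE_CORES

print(f"R580 processors needed: {boards_needed}")
print(f"BlueGene/L peak per core: {gflops_per_core:.2f} GFlops")
```

On these numbers, roughly 979 graphics processors would match the supercomputer's theoretical peak, which is where the "fewer than 1,000" figure comes from.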

The concept of stream computing is based on the idea of using "massively parallel processors," which, in the case of the R580, amounts to 48 individual cores. A few years ago, software developers discovered that the GPU can be used not only to render graphics, but also to process other data, especially in environments where applications depend heavily on calculations. BionicFX, for example, was one of the first developers to use Nvidia 6800 graphics cards for audio processing; more recently, Stanford's Folding@Home research team announced that it would enable R580 graphics cards to accelerate the simulation of protein folding processes.
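To illustrate the general idea behind stream computing, the sketch below applies one small function (a "kernel") independently to every element of a data stream. It is a plain, sequential Python simulation and not AMD's programming model: the function names are hypothetical, and the 48 "lanes" merely mirror the core count mentioned above. On real hardware the lanes would run concurrently.

```python
def saxpy_kernel(a, x, y):
    """Kernel: one independent calculation per stream element."""
    return a * x + y


def run_stream(kernel, a, xs, ys, lanes=48):
    """Apply the kernel to every element, interleaving work across lanes.

    Each lane handles elements lane, lane + lanes, lane + 2 * lanes, ...
    No element depends on another, so the lanes could run in parallel.
    Here they are simply executed one after the other.
    """
    out = [0.0] * len(xs)
    for lane in range(lanes):
        for i in range(lane, len(xs), lanes):
            out[i] = kernel(a, xs[i], ys[i])
    return out


result = run_stream(saxpy_kernel, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)
```

The key property is that the kernel never reads another element's result, which is what allows the same calculation to be spread across many cores.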

Compared to the regular graphics card, the stream processor board is equipped with 1 GB of GDDR3 memory (instead of 512 MB) and has a modified memory controller to enable stream computing applications. While the accelerator is aligned with AMD's Torrenza effort and may support HyperTransport connectivity in the future, for now it uses a PCI Express interface. The platform does not support Crossfire connectivity, but AMD spokesperson Will Willis told TG Daily that it is "a matter of server vendors" to provide systems with multiple stream processor boards. The modifications to the GPU are somewhat expensive, as AMD charges $2,600 for the board. While this is more than five times the price of an X1900 graphics card, it is only about a third of what Clearspeed asks for its accelerator board, which delivers about 100 GFlops of performance.
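The price comparison above can be turned into a rough cost per GFlop. In this sketch the Clearspeed price is an assumption derived from the "about a third" statement (three times AMD's price), not a quoted list price, and both performance numbers are the peak figures cited in this article.

```python
# AMD stream processor board, figures from the article.
AMD_PRICE_USD = 2600
AMD_PEAK_GFLOPS = 375

# Clearspeed board: price is an ASSUMPTION inferred from "about a third".
CLEARSPEED_PRICE_USD = 3 * AMD_PRICE_USD
CLEARSPEED_PEAK_GFLOPS = 100

amd_usd_per_gflop = AMD_PRICE_USD / AMD_PEAK_GFLOPS
clearspeed_usd_per_gflop = CLEARSPEED_PRICE_USD / CLEARSPEED_PEAK_GFLOPS
ratio = clearspeed_usd_per_gflop / amd_usd_per_gflop

print(f"AMD:        ${amd_usd_per_gflop:.2f} per peak GFlop")
print(f"Clearspeed: ${clearspeed_usd_per_gflop:.2f} per peak GFlop (estimated)")
print(f"Ratio:      {ratio:.2f}x")
```

Under these assumptions the AMD board comes out at about $7 per peak GFlop against roughly $78 for the Clearspeed board, a difference of about a factor of eleven on paper.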

Key to a successful implementation are applications that can actually take advantage of the processor. AMD simplifies the creation of such software by providing a new thin hardware interface which it calls CTM (for "close to metal"). The company claims that the technology can increase the performance of processing applications by as much as eightfold compared with traditional 3D application programming interfaces. CTM promises "unfettered access to the native instruction set and memory" of the GPU.

At this time it is unclear what applications could emerge from stream computing in the near future, and AMD officials hesitated to make any predictions. However, while the technology is initially targeted at high performance computing, the company believes that stream computing will reach the consumer market as well. Company representatives told TG Daily that the processors are well suited for image and video processing, for example, especially H.264 encoding applications. Physics processing has also been mentioned as a potential application for stream computing in consumer PCs. AMD expects streaming software to surface within the next two years.