R600: Finally DX10 Hardware from ATI

Memory Read/Write Cache

Among the new caches in the R600 design is a memory read/write cache for the general purpose register (GPR) array. DX10 set out to "virtualize" the available resources, making them appear larger than they were before, and in keeping with that theme ATI needed to virtualize the GPR stack. Under the DX9 API standard, a thread had access to only 16 or 32 GPRs, and R5xx already went above that. For R600, ATI virtualized its GPRs by adding a bidirectional read/write cache that sits in parallel with the vertex cache and the texture cache. This allows the shader core to write to memory and read the data back. The cache can also handle write combining and other performance enhancements. Write combining is the ability to group data together before sending it to memory; it saves on write commands and should benefit the GS Stream Out functionality. Under this new setup, each pixel can have access to 4K 128-bit vectors, or 64 KB of data per pixel. With tens of thousands of pixels in flight in the shader core, it would not be possible to hold all of that data on chip. Again, this is why virtualization is so important in DX10.
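To put those figures in perspective, here is a rough Python sketch of the arithmetic, along with a toy model of write combining. The 4K-vector and 128-bit numbers come from the text above; the 10,000-pixel count, the 64-byte burst size, and every function name are assumptions made for illustration, not details of how R600 actually works.

```python
# Figures from the text: up to 4K vectors of 128 bits each, per pixel.
VECTORS_PER_PIXEL = 4 * 1024
BYTES_PER_VECTOR = 128 // 8  # 128 bits = 16 bytes


def gpr_bytes_per_pixel():
    """Virtualized GPR space available to a single pixel, in bytes."""
    return VECTORS_PER_PIXEL * BYTES_PER_VECTOR


def total_gpr_bytes(pixels_in_flight):
    """Worst-case GPR footprint if every pixel used its full allotment."""
    return pixels_in_flight * gpr_bytes_per_pixel()


def combine_writes(writes, burst_bytes=64):
    """Toy write-combining model (burst_bytes is an assumed value).

    `writes` is a list of (address, size) pairs with size > 0.
    Returns the base addresses of the aligned bursts that cover them,
    in first-touched order. Each burst stands for one memory command.
    """
    bursts = []
    for addr, size in writes:
        first = addr // burst_bytes
        last = (addr + size - 1) // burst_bytes
        for index in range(first, last + 1):
            base = index * burst_bytes
            if base not in bursts:
                bursts.append(base)
    return bursts


if __name__ == "__main__":
    per_pixel = gpr_bytes_per_pixel()
    print(per_pixel // 1024, "KB per pixel")

    # 10,000 pixels in flight is an assumed count, not a measured one.
    total = total_gpr_bytes(10_000)
    print(total // (1024 * 1024), "MB for 10,000 pixels")

    # Four adjacent 16-byte vector writes fall into one 64-byte burst.
    vector_writes = [(0, 16), (16, 16), (32, 16), (48, 16)]
    print(len(vector_writes), "writes ->",
          len(combine_writes(vector_writes)), "memory command(s)")
```

Even at the low end of "tens of thousands" of pixels, the worst-case footprint runs to hundreds of megabytes, so the registers have to be able to spill to memory through the cache. The second half shows why grouping adjacent vector writes cuts down on the number of write commands.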