ROPs, Memory Controller
ROPs
ROPs were another weak point of the preceding generation for AMD, given their poor performance with antialiasing enabled. As with the texture units, the engineers started from scratch, again with the goal of maximizing the efficiency of the units per die area.
The first improvement is to Z rendering. ATI had introduced the ability to double the fill rate on depth-only passes with its preceding architecture, but was still behind Nvidia, whose chips multiply the fill rate by eight in these situations. With the RV770, AMD is still behind, only quadrupling the fill rate, to 64 pixels per cycle. Let’s check that with the trusty fill rate tester:
Again, the pure fill rates hold no surprises. Z rendering, on the other hand, is a little disappointing. There is some improvement, but where the RV670 came close to its theoretical value (x1.89 instead of x2), the RV770 is far from it (x2.41 instead of x4). That’s just not enough to compete with the G92, which, though it’s also fairly far off its theoretical value (x5.2 instead of x8), is still out of reach.
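To put those ratios side by side, each measured multiplier can be expressed as a fraction of its theoretical value. The Python below is only a minimal sketch: the `Z_MULTIPLIERS` table and the `z_efficiency` helper are names invented for illustration, and the figures are simply the multipliers quoted above, not new measurements.

```python
# Measured vs. theoretical Z-only fill-rate multipliers, as quoted in the text.
# (Illustrative table; the names here are hypothetical, not from any tool.)
Z_MULTIPLIERS = {
    "RV670": (1.89, 2.0),
    "RV770": (2.41, 4.0),
    "G92": (5.2, 8.0),
}


def z_efficiency(measured, theoretical):
    """Return the fraction of the theoretical Z-only speedup actually reached."""
    return measured / theoretical


if __name__ == "__main__":
    for chip, (measured, theoretical) in Z_MULTIPLIERS.items():
        ratio = z_efficiency(measured, theoretical)
        print(f"{chip}: x{measured} measured vs x{theoretical} theoretical = {ratio:.2%}")
```

Seen this way, the RV670 reaches about 94.5% of its theoretical gain, the G92 about 65% and the RV770 only about 60%, yet the G92's absolute multiplier (x5.2) remains more than twice the RV770's (x2.41).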
However, that’s not the main improvement to the ROPs. ATI’s engineers focused on correcting the antialiasing performance, which was catastrophic compared to the competition. Where the RV670 could write only 8 pixels per cycle in MSAA 2X or 4X, halving its fill rate, the RV770 doesn’t take a performance hit and can still write 16 pixels per cycle in these situations. In the same way, rendering to an FP16 frame buffer has been optimized and is now done at full speed, whereas on the RV670 it also halved the fill rate.
Memory controller
Since introducing the ring bus with the R520, AMD has continued to work on its memory controller. The latest change consists of separating clients that are “bandwidth-greedy” (like the L2 texture cache or the ROPs) from clients that can settle for less bandwidth (the PCI Express controller, the display controller, etc.). The less greedy clients share a single hub, whereas the memory controllers are distributed around the chip, near the high-bandwidth consumers.