The GPU Side: VLIW4 > VLIW5
While Trinity benefits from the very latest work in AMD’s processor portfolio, the APU employs a one-generation-old graphics architecture (that is to say it doesn’t use the Graphics Core Next design at the heart of Radeon HD 7000-series cards). Nevertheless, Trinity’s on-die GPU, ominously code-named Devastator, isn’t a re-hash of the old VLIW5 arrangement first seen in the Radeon HD 2900 XT, either. It’s based on the more efficient VLIW4 design exclusive to the Radeon HD 6900 series.
Based on the nomenclature alone, you might assume that VLIW4 is simply VLIW5-minus-one, and you probably wouldn’t expect it to be an improvement. VLIW4 makes better use of available resources in today’s games, though. Let’s dig into why that’s the case.
VLIW stands for very long instruction word (describing AMD’s architecture), while the number indicates how many ALUs each thread processor contains. Due to wavefront dependencies, it’s difficult to keep all five of the ALUs in a VLIW5 thread processor fed. According to AMD’s analysis, modern games tend to utilize between three and four ALUs each clock cycle. VLIW4 removes the underused fifth ALU and enhances the capabilities of the remaining four, resulting in better utilization of resources and freeing up die space for more thread processors. VLIW4 provides better performance per mm², improved flow control, and improved GPU compute results compared to VLIW5. The newer design also accommodates improvements in the hardware tessellator, with better thread management and buffering support. For more details about the advantages of VLIW4, check out “Radeon HD 6970 And 6950 Review: Is Cayman A Gator Or A Crock?”
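To put rough numbers on the utilization argument, here is a minimal Python sketch. It assumes (as a simplification, not something AMD states) that a given workload keeps the same three or four ALUs busy per clock on either design; the function name `slot_utilization` is purely illustrative.

```python
def slot_utilization(busy_alus, alus_per_thread_processor):
    """Fraction of a thread processor's ALUs doing useful work in a clock.

    Simplifying assumption: the workload fills the same number of slots
    regardless of the design, capped at the number of ALUs available.
    """
    busy = min(busy_alus, alus_per_thread_processor)
    return busy / alus_per_thread_processor


# AMD's quoted range for modern games: three to four ALUs busy per clock.
for busy in (3, 4):
    vliw5 = slot_utilization(busy, 5)
    vliw4 = slot_utilization(busy, 4)
    print(f"{busy} busy ALUs: VLIW5 {vliw5:.0%}, VLIW4 {vliw4:.0%}")
```

Under that assumption, the same instruction mix leaves a smaller share of the hardware idle on VLIW4, which is the core of the efficiency claim.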
Now, the Devastator GPU portion of Trinity has six SIMD engines, each with four texture units and 16 thread processors. With four ALUs per thread processor, Devastator boasts a grand total of 384 ALUs and 24 texture units. Two render back ends control the output, each able to handle four full-color raster operations per clock for an aggregate 128-bit memory interface and eight ROPs.
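The totals above follow directly from the per-unit counts. A short Python sketch of the arithmetic, using only the figures quoted in the text (the constant and function names are illustrative):

```python
# Per-unit counts quoted for Trinity's Devastator GPU.
SIMD_ENGINES = 6
THREAD_PROCESSORS_PER_SIMD = 16
ALUS_PER_THREAD_PROCESSOR = 4  # VLIW4
TEXTURE_UNITS_PER_SIMD = 4
RENDER_BACK_ENDS = 2
ROPS_PER_BACK_END = 4


def devastator_totals():
    """Multiply out the per-unit counts into chip-wide totals."""
    alus = (SIMD_ENGINES
            * THREAD_PROCESSORS_PER_SIMD
            * ALUS_PER_THREAD_PROCESSOR)
    texture_units = SIMD_ENGINES * TEXTURE_UNITS_PER_SIMD
    rops = RENDER_BACK_ENDS * ROPS_PER_BACK_END
    return {"alus": alus, "texture_units": texture_units, "rops": rops}


print(devastator_totals())
```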
These specifications appear quite similar to Llano’s Sumo core, the key differences being that Devastator has 16 fewer ALUs and four more texture units. But when you consider the inefficiencies of VLIW5, as many as 80 of Llano’s ALUs could be sitting idle at any given time. So, we expect the newer APU to outperform its predecessor in gaming and compute applications.
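The figure of 80 idle ALUs can be reconstructed from the numbers in the text. The Python sketch below derives Llano’s totals from the stated differences, then assumes the fifth ALU in every VLIW5 thread processor goes unused, which is a worst case for that slot rather than a measured result. Clock speeds and memory bandwidth are ignored, and the function name is illustrative.

```python
TRINITY_ALUS = 384
TRINITY_TEXTURE_UNITS = 24


def llano_comparison():
    """Derive Llano's counts from the differences stated in the text."""
    llano_alus = TRINITY_ALUS + 16                    # Devastator has 16 fewer ALUs
    llano_texture_units = TRINITY_TEXTURE_UNITS - 4   # and four more texture units
    thread_processors = llano_alus // 5               # VLIW5: five ALUs each
    # Assumption: the fifth ALU in every thread processor sits idle.
    idle_alus = thread_processors * 1
    busy_alus = llano_alus - idle_alus
    return {
        "llano_alus": llano_alus,
        "llano_texture_units": llano_texture_units,
        "llano_thread_processors": thread_processors,
        "llano_idle_alus": idle_alus,
        "llano_busy_alus": busy_alus,
    }


print(llano_comparison())
```

In that scenario Llano would have fewer ALUs doing useful work than Trinity’s 384, despite the larger headline count.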
There are a handful of other differences as well. For instance, Trinity supports three independent display outputs (four if you’re using DisplayPort to daisy chain one panel to another), while Llano is limited to two outputs and DisplayPort 1.1a. Each of the four displays accommodates its own protected high-bitrate 7.1-channel audio stream, and display grouping is also supported.
In addition, the new model includes AMD’s Video Codec Engine (VCE), fixed-function logic dedicated to H.264 encode that we haven’t been able to test on any of the company’s Radeon HD 7000-series cards because the feature was not enabled in software. We’ll be testing it for the first time later in this article, though.