Iris Pro Graphics 6200
Intel changes how it productizes its graphics from one generation to the next. In the Sandy and Ivy Bridge families, high-end desktop processors came equipped with the company’s top trims: HD Graphics 3000 (12 execution units) and HD Graphics 4000 (16 EUs). Haswell saw the company deploy HD Graphics 4600 (referred to as GT2, with 20 EUs), saving HD Graphics 5000, Iris Graphics 5100 and Iris Pro Graphics 5200 (that’s GT3, GT3 and GT3e, respectively, all rocking 40 EUs) for soldered-down CPUs.
In the image above, I numbered the six domains composing Haswell’s GT2, otherwise known as HD Graphics 4600. Domain three demarcates the Sub-Slice—a building block containing EUs, texture samplers, L1 instruction cache and a Media Sampler. Domain two is referred to as Slice Common, and it hosts the rasterizer, pixel back-ends and L3 cache. Together, those blocks make up the Slice.
A Slice in Haswell’s GT2 config included one Slice Common and two Sub-Slices, totaling 20 EUs. For Broadwell, Intel juggles the organization of resources to optimize for performance and power: each Sub-Slice is made up of eight EUs, rather than 10. But as a result of its shift to 14nm manufacturing, Intel can put a third Sub-Slice on GT2, yielding 24 EUs and more sampling throughput and cache per EU (while still reducing power versus Haswell, according to Jason Ross, graphics architect at Intel). The EUs themselves receive targeted architectural and implementation improvements that raise their performance and cut power. For instance, both SIMD floating-point units in each EU now support native 32-bit integer operations. Previously, only one did. The result is a doubling of integer computation throughput within each EU. The execution units also get native 16-bit floating-point support.
Broadwell’s GT3 adds a complete second Slice, doubling the already-faster GT2’s resources, including its fixed-function media capabilities. The math on that adds up to 48 EUs, a 2.4x increase compared to Core i7-4790K’s HD Graphics 4600. And because there are three Sub-Slices per Slice instead of two, texture sampler performance increases 1.5x, while the FLOPS-to-texture ratio falls from 40:1 to 32:1.
The gains are palpable. For a 140% increase in EUs, we measure between a 109% and 141% performance improvement, depending on the operation in question.
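The EU counts and ratios quoted above follow directly from the Slice and Sub-Slice layout. The sketch below reproduces that arithmetic in Python. The class and variable names are made up for illustration, and the two per-clock constants (16 FLOPs per EU and 4 texels per sampler) are assumptions chosen because they match the 40:1 and 32:1 ratios in the text, not figures stated in this article.

```python
from dataclasses import dataclass

# Assumed per-clock rates (not stated in the article); they reproduce
# the 40:1 and 32:1 FLOPS-to-texture ratios quoted in the text.
FLOPS_PER_EU_PER_CLOCK = 16
TEXELS_PER_SAMPLER_PER_CLOCK = 4


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of a graphics configuration."""
    name: str
    slices: int
    subslices_per_slice: int
    eus_per_subslice: int

    @property
    def subslices(self) -> int:
        return self.slices * self.subslices_per_slice

    @property
    def eus(self) -> int:
        return self.subslices * self.eus_per_subslice

    @property
    def flops_per_texel(self) -> float:
        # One texture sampler per Sub-Slice, as described in the text.
        flops = self.eus_per_subslice * FLOPS_PER_EU_PER_CLOCK
        return flops / TEXELS_PER_SAMPLER_PER_CLOCK


haswell_gt2 = GpuConfig("Haswell GT2", 1, 2, 10)
broadwell_gt2 = GpuConfig("Broadwell GT2", 1, 3, 8)
broadwell_gt3 = GpuConfig("Broadwell GT3", 2, 3, 8)

eu_ratio = broadwell_gt3.eus / haswell_gt2.eus
eu_increase_pct = (broadwell_gt3.eus - haswell_gt2.eus) * 100 / haswell_gt2.eus
sampler_ratio_per_slice = (
    broadwell_gt2.subslices_per_slice / haswell_gt2.subslices_per_slice
)

for cfg in (haswell_gt2, broadwell_gt2, broadwell_gt3):
    print(f"{cfg.name}: {cfg.eus} EUs, {cfg.flops_per_texel:.0f}:1 FLOPS per texel")
print(f"GT3 vs. Haswell GT2: {eu_ratio:.1f}x the EUs (+{eu_increase_pct:.0f}%)")
print(f"Samplers per Slice: {sampler_ratio_per_slice:.1f}x")
```

Note that the 1.5x sampler figure is per Slice. With its second Slice, GT3 carries six samplers against the two in Haswell's GT2.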
GT3e further incorporates 128MB of embedded DRAM on the processor package, behind its shared L3 cache on a dedicated ring bus stop. Not only does this benefit performance, but Intel says there are also advantages to power (and thus efficiency) as you avoid transactions that would have previously gone to system memory. The eDRAM operates in its own clock domain and, according to the firmware on MSI's Z97A Gaming 6, runs at 1.8GHz. At that frequency, and given read and write buses capable of 32 bytes/cycle, you’re looking at throughput of over 57 GB/s in each direction.
Of course, as we know from last generation’s Iris Pro Graphics 5200, the eDRAM isn’t married to the graphics engine; it’s available to the IA cores as well.
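The eDRAM bandwidth figure is simply clock rate times bus width. Here is a minimal sketch of that calculation, assuming the 32 bytes/cycle applies to each of the read and write buses separately. The function and constant names are hypothetical.

```python
# Figures taken from the text: eDRAM clock as reported by the
# motherboard firmware, and the width of each eDRAM bus.
EDRAM_CLOCK_HZ = 1.8e9
BYTES_PER_CYCLE = 32


def bus_bandwidth_gb_s(clock_hz: float, bytes_per_cycle: int) -> float:
    """Peak bandwidth of one bus in GB/s (1 GB = 1e9 bytes)."""
    return clock_hz * bytes_per_cycle / 1e9


per_bus = bus_bandwidth_gb_s(EDRAM_CLOCK_HZ, BYTES_PER_CYCLE)
print(f"Peak eDRAM bandwidth per bus: {per_bus:.1f} GB/s")
```

This is a theoretical peak. Sustained throughput depends on access patterns and on how often requests actually hit the eDRAM.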
|Processor||Processor Graphics||Graphics Architecture||EUs||Max. Frequency||Peak GFLOPS|
|Core i7-5775C||Iris Pro Graphics 6200||Gen 8||48||1150MHz||883 GFLOPS|
|Core i5-5675C||Iris Pro Graphics 6200||Gen 8||48||1100MHz||844 GFLOPS|
|Core i7-4790K||HD Graphics 4600||Gen 7.5||20||1250MHz||400 GFLOPS|
|Core i5-4690K||HD Graphics 4600||Gen 7.5||20||1200MHz||384 GFLOPS|
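The peak GFLOPS column can be reproduced from the EU count and maximum frequency. The sketch below assumes each EU retires 16 single-precision FLOPs per clock (two SIMD-4 units, each counting a fused multiply-add as two operations). That constant is not stated in the table, but it matches every row once the results are truncated to whole numbers. The names are illustrative only.

```python
# Assumption: 16 single-precision FLOPs per EU per clock.
FLOPS_PER_EU_PER_CLOCK = 16

# (processor, EUs, max graphics frequency in MHz), taken from the table.
PROCESSORS = [
    ("Core i7-5775C", 48, 1150),
    ("Core i5-5675C", 48, 1100),
    ("Core i7-4790K", 20, 1250),
    ("Core i5-4690K", 20, 1200),
]


def peak_gflops(eus: int, mhz: int) -> int:
    """Peak GFLOPS, truncated to a whole number as the table appears to do."""
    return eus * FLOPS_PER_EU_PER_CLOCK * mhz // 1000


results = {name: peak_gflops(eus, mhz) for name, eus, mhz in PROCESSORS}
for name, gflops in results.items():
    print(f"{name}: {gflops} GFLOPS")
```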
Beefing Up Media In Broadwell
Intel has a storied history of design decisions rooted in simultaneous performance and power gains. More than four years ago, it introduced Quick Sync, again leveraging its manufacturing advantage to build a fixed-function engine for media encode/decode acceleration. The company lobbied ISVs to support its hardware, and a number of apps surfaced right off the bat to exploit it. Over time, Quick Sync has evolved to accelerate the latest formats, while giving developers more balance between quality and performance (target usages).
With Broadwell, Intel continues its quest to push more work to fixed-function blocks optimized for specific tasks. These are faster than parallelized programmable logic (like the EUs), which is in turn quicker than general-purpose IA cores. Because they involve fewer transistors, they also use a lot less power. That’s a win on two fronts, if you can afford to throw hardware at the problem. Intel, with its 14nm process, can.
So what does Broadwell on the desktop enable above and beyond Haswell? The Multi-Format Codec engine gets native support for 4096x2048 content, accelerating HEVC decode at up to 4Kp30 and VP9 at up to 4Kp24. This isn’t handled by a fixed-function block, though. Rather, Intel describes an approach involving the IA and graphics cores. This isn’t ideal, and the company is obviously working on a fully hardware-accelerated solution, but it’s better than nothing.
AVC/H.264 encoding receives a more substantial speed-up by virtue of the additional Sub-Slices (and the second Slice on GT3), since there’s a fixed-function Media Sampler—responsible for motion estimation—in each one. And because the EUs are used for rate control and mode decision, several steps along Intel’s familiar two-stage encoder run faster.
The Ivy Bridge graphics architecture included a sixth domain called the Video Quality Engine (VQE), which uses dedicated hardware for video and image processing at very low power. Prior to that, those jobs were handled by the EUs. With Broadwell, the VQE is purportedly up to 2x faster.
Taken together, these improvements should have a profound impact on media performance, particularly in the context of desktop Haswell’s GT2 engine versus desktop Broadwell’s GT3e. One Multi-Format Codec becomes two. One Video Quality Engine becomes two, each with up to 2x throughput. Two Media Samplers become six, also sporting up to 2x throughput each.
SiSoftware Sandra 2015 appears to use Quick Sync for encoding, and these are the transcode results we measure. The H.264->H.264 task gets a 39% speed-up compared to Core i7-4790K and the WMV->H.264 workload enjoys 44% more throughput on Core i7-5775C.
Intel is also touting its end-to-end 4K support, which could be more relevant to the Core i5-5675C and Core i7-5775C than most other Broadwell-based processors, provided these wind up in small form factor and media PCs. The CPUs accelerate AVC/H.264 encode and decode at 4Kp60, along with HEVC decoding at 4Kp30 through the EUs and IA cores. Intel’s display controller can do up to 3840x2160 at 60Hz using DisplayPort 1.2 or 4096x2160 at 24Hz with HDMI 1.4. Unfortunately, HDMI 2.0 support didn’t make the cut.
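To see why the missing HDMI 2.0 support matters, it helps to compare the pixel data each display mode needs against what each link can carry. The rough sketch below does that under several simplifying assumptions: 24 bits per pixel, blanking intervals ignored (real modes need somewhat more), a four-lane HBR2 link for DisplayPort 1.2, and a 340MHz TMDS clock for HDMI 1.4, both with 8b/10b coding overhead already removed. The function and constant names are hypothetical.

```python
def payload_gbit_s(width: int, height: int, hz: int, bpp: int = 24) -> float:
    """Active-pixel data rate in Gbit/s, ignoring blanking intervals."""
    return width * height * hz * bpp / 1e9


# Usable link rates after 8b/10b coding (assumed link configurations).
DP12_GBIT_S = 4 * 5.4 * 0.8      # four HBR2 lanes at 5.4 Gbit/s each
HDMI14_GBIT_S = 0.340 * 3 * 8    # 340MHz TMDS clock, 3 channels, 8 data bits

uhd60 = payload_gbit_s(3840, 2160, 60)
dci4k24 = payload_gbit_s(4096, 2160, 24)

print(f"3840x2160 @ 60Hz needs ~{uhd60:.2f} Gbit/s")
print(f"4096x2160 @ 24Hz needs ~{dci4k24:.2f} Gbit/s")
print(f"DisplayPort 1.2 carries ~{DP12_GBIT_S:.2f} Gbit/s")
print(f"HDMI 1.4 carries ~{HDMI14_GBIT_S:.2f} Gbit/s")
print("4K60 fits on DisplayPort 1.2:", uhd60 < DP12_GBIT_S)
print("4K60 fits on HDMI 1.4:", uhd60 < HDMI14_GBIT_S)
```

Under these assumptions, 3840x2160 at 60Hz fits comfortably on DisplayPort 1.2 but exceeds what HDMI 1.4 can carry, which is consistent with the 24Hz ceiling quoted for the HDMI output.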