HD Graphics 4000: The Plus In Intel’s Tick+
Tom Piazza, the Intel fellow who unveiled Ivy Bridge’s graphics subsystem at last year’s IDF, made the case that even though we’re looking at a die shrink—a tick in the company’s cadence, by definition—the integrated GPU is more accurately characterized as a tock.
I already mentioned that Intel improved the performance of its integrated GPU by adding four more execution units to its highest-end implementation, that it finally folded in DirectX 11 support, that Quick Sync is faster now than it was last generation, and that up to three displays are supported natively.
Getting there required a reorganization of how Intel approached processor-based graphics, allowing the company not only to set a more aggressive roadmap for scaling graphics in the future, but also to fix some of the inefficiencies that held Sandy Bridge back. The result is an architecture partitioned into five domains.
- The first domain includes global assets like the geometry pipeline. Here, programmable hull and domain shader stages complement a fixed-function tessellation unit as requisites for DirectX 11 support.
- Intel refers to the second domain as Slice Common, which hosts the rasterizer, the pixel back-ends, and L3 cache. Sandy Bridge didn’t have L3 dedicated to graphics because Intel wasn’t able to derive meaningful performance from it; the processor’s ring bus provided enough bandwidth that the shared L3 worked well enough. But because Ivy Bridge pushes graphics harder, the dedicated L3 helps meet its bandwidth requirements and also reduces power consumption whenever the engine can go to its own repository rather than spinning up the ring.
- Domain three, dubbed Slice, includes the shaders, texture samplers, L1 instruction cache, and the Media Sampler used by Quick Sync. In future generations, this is one collection of resources Intel plans to use to scale performance up. Intel is also able to lay down additional Slice Commons to scale back-end throughput accordingly.
- The fourth domain is made up of fixed-function media features. This can also be scaled up or down, depending on how granular Intel chooses to get with media-oriented performance.
- Display outputs constitute the last domain. You can do three digital outputs on a desktop platform (provided a motherboard vendor enables them), but two have to be DisplayPort connections—one at up to 2560x1600 and one at up to 1920x1200. The third screen can be HDMI (up to 1080p), DVI, VGA, or DisplayPort at up to 1920x1200.
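The display rules in that last bullet can be sketched as a small check. This is only an illustration of the limits as described here, not anything from Intel's driver: the function and constant names are hypothetical, it covers only the three-display case, and it assumes the 1920x1200 cap applies to DVI and VGA as well as DisplayPort on the third output.

```python
from itertools import permutations

# Limits (width, height) as read from the display-output bullet above.
DP_PRIMARY = (2560, 1600)    # first required DisplayPort output
DP_SECONDARY = (1920, 1200)  # second required DisplayPort output

# Assumption: the 1920x1200 cap covers DVI, VGA, and DisplayPort alike;
# HDMI is taken as 1080p (1920x1080).
THIRD_LIMITS = {
    "HDMI": (1920, 1080),
    "DVI": (1920, 1200),
    "VGA": (1920, 1200),
    "DisplayPort": (1920, 1200),
}


def fits(mode, limit):
    """True if a (width, height) mode is within a (width, height) limit."""
    return mode[0] <= limit[0] and mode[1] <= limit[1]


def valid_triple(displays):
    """Check a three-display setup given as (connector, (width, height)) tuples.

    Tries every assignment of the displays to the three roles and accepts
    the setup if any assignment satisfies the limits.
    """
    if len(displays) != 3:
        return False
    for first, second, third in permutations(displays):
        if (first[0] == "DisplayPort" and fits(first[1], DP_PRIMARY)
                and second[0] == "DisplayPort" and fits(second[1], DP_SECONDARY)
                and third[0] in THIRD_LIMITS
                and fits(third[1], THIRD_LIMITS[third[0]])):
            return True
    return False
```

Under those assumptions, two DisplayPort screens (one at 2560x1600, one at 1920x1200) plus a 1080p HDMI screen pass the check, while any setup with fewer than two DisplayPort connections does not.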
Within each domain, Intel says it tweaked and tuned for additional performance, increasing geometry throughput, optimizing buffer-clearing, improving anisotropic sampling quality, maximizing sustained compute performance, and bolstering performance per watt by leveraging that dedicated graphics L3 cache.
From Theory To HD Graphics 4000
All of those features materialize on Core i7-3770K as the HD Graphics 4000 engine, armed with 16 EUs, a base frequency of 650 MHz, and a maximum dynamic frequency of 1.15 GHz. At idle, the graphics logic spins down to 350 MHz, creating more thermal headroom for the IA cores.
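To put those three clocks in perspective, the arithmetic below expresses each as a multiple of the base frequency. It is a back-of-the-envelope sketch using only the figures quoted above; the helper name is hypothetical, and clock ratios say nothing on their own about delivered frame rates.

```python
# Clock figures quoted above for HD Graphics 4000 on Core i7-3770K, in MHz.
IDLE_MHZ = 350
BASE_MHZ = 650
MAX_DYNAMIC_MHZ = 1150


def relative_to_base(mhz):
    """Return a clock as a multiple of the base frequency."""
    return mhz / BASE_MHZ


if __name__ == "__main__":
    print(f"max dynamic vs. base: {relative_to_base(MAX_DYNAMIC_MHZ):.2f}x")
    print(f"idle vs. base: {relative_to_base(IDLE_MHZ):.2f}x")
```

The maximum dynamic frequency works out to roughly 1.77x the base clock, and the idle clock to roughly 0.54x.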
Last year, I lamented the fact that Intel armed most of its mobile processors and its K-series desktop SKUs with HD Graphics 3000, the fastest implementation available. Meanwhile, 12 other desktop-oriented models got stuck with HD Graphics 2000, hobbling their performance. Over the course of 2011, Intel slowly rectified that situation by launching additional SKUs with HD Graphics 3000.
This time around, Intel divides up 3D alacrity a little differently. All launched mobile and desktop Core i7s get HD Graphics 4000, and every desktop Core i5 except the Core i5-3570K gets HD Graphics 2500.
Instead of 16 EUs, HD Graphics 2500 only offers six. Intel says to expect somewhere between 10 and 20% better performance from HD Graphics 2500 compared to HD Graphics 2000. We have some i5s in the lab and will have a closer look at HD Graphics 2500 in the days to come.
What about HD Graphics 4000, though?