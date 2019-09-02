Intel Gen12 Graphics Linux Patches Reveal New Display Feature for Tiger Lake
Some information about the upcoming Gen12 (aka Xe) graphics architecture from Intel has surfaced via recent Linux kernel patches. In particular, Gen12 will have a new display feature called the Display State Buffer. This engine would improve Gen12 context switching.
Phoronix reported on the patches on Thursday. The patches provide clues about the new Display State Buffer (DSB) feature of the Gen12 graphics architecture, which will find its way to Tiger Lake (and possibly Rocket Lake) and the Xe discrete graphics cards in 2020. In the patches, DSB is generically described as a hardware capability that will be introduced in the Gen12 display controller. This engine will only be used for some specific scenarios for which it will deliver performance improvements, and after completion of its work, it will be disabled again.
Some additional (technical) documentation of the feature is available, but the benefits of the DSB are described as follows: “[It] helps to reduce loading time and CPU activity, thereby making the context switch faster.” In other words, it is a new engine that offloads some work from the CPU and helps to improve context switching time.
Of course, the bigger picture here is the enablement for Gen12 that has been going on in the Linux kernel (similar to Gen11), which is especially of interest given that it will mark the first graphics architecture from Intel to get released as a discrete GPU. To that end, Phoronix reported in June that the first Tiger Lake graphics driver support was added to the kernel, with more batches in August.
Tiger Lake and Gen12 Graphics: What we know so far
With the first 10th Gen (10nm) Ice Lake laptops nearly getting into customers’ hands after almost a year of disclosures, Intel has already provided some initial information about what to expect for the 11th-Gen processors next year, codenamed Tiger Lake (with Rocket Lake on 14nm still in the rumor mill). Ice Lake focused on integration and a strong CPU and GPU update, and with the ‘mobility redefined’ tag line, Tiger Lake looks to be another 10nm product solely for the mobile market.
Credit: Intel
On the CPU side, Tiger Lake will incorporate the latest Willow Cove architecture. Intel has said that it will feature a redesigned cache, transistor optimizations for higher frequency (possibly 10nm++), and further security features.
While the company has been teasing its Xe discrete graphics cards for even longer than it has talked about Ice Lake, details remain scarce. Intel said it had split the Gen12 (aka Xe) architecture into two microarchitectures, one that is client optimized, and another one that is data center optimized, intending to scale from teraflops to petaflops. From a couple of leaks from 2018, it is rumored that the Arctic Sound GPU would consist of a multi-chip package (MCP) with 2-4 dies (likely using EMIB for packaging), and was targeted for qualification in the first half of next year. The leak also stated that Tiger Lake would incorporate power management from Lakefield.
The MCP rumor is also corroborated by some recent information from an Intel graphics driver, with the (Discrete Graphics) DG2 family coming in variants of what is presumably 128, 256 and 512 execution units (EUs). This could indicate one, two and four chiplet configurations of a 128EU die. Ice Lake’s integrated graphics (IGP) has 64EUs, and the small print from Intel’s Tiger Lake performance numbers revealed that it would have 96EUs.
A GPU with 512EUs would have in the neighborhood of 10 TFLOPS, which does not look sufficient to compete with 2020 GPU offerings from AMD and NVIDIA in the high-end space. However, not all the gaps are filled yet. A summary chart posted by @KOMACHI_ENSAKA talks about three variants of Gen12:
- Gen12 (LP) in DG1, Lakefield-R, Ryefield, Tiger Lake, Rocket Lake and Alder Lake (the successor of Tiger Lake)
- Gen12.5 (HP) in Arctic Sound
- Gen12.7 (HP) in DG2
How those differ is still unclear. For some speculation, the regular Gen12 probably refers simply to the integrated graphics in Tiger Lake and other products. However, the existence of DG1 and information about Rocket Lake could indicate that Intel has also put this IP in a discrete chiplet. This chiplet could then serve as the graphics for Rocket Lake by packaging it together via EMIB. If we assume Arctic Sound is the mainstream GPU, then Gen12.5 would refer to the client optimized version and Gen12.7 to the data center optimized version of Xe. In that case, the number of EUs Intel intends to offer to the gaming community remains unknown.
Moving to the display, it remains to be seen if the Display State Buffer is what Intel referred to with the ‘latest display technology’ bullet point, or if the DSB is just one of multiple new display improvements. Tiger Lake will also feature next-gen I/O, likely referring to PCIe 4.0.
Given the timing of Ice Lake and Comet Lake, Tiger Lake is likely set for launch in the second half of next year.
Display Improvements in Gen11 Graphics Engine
With display being one of the key pillars of Tiger Lake, it is worth recapping the big changes in Gen11’s display block (we covered the graphics side of Gen11 previously).
Credit: Intel
As the name implies, the display controller controls what is displayed on the screen. In Ice Lake’s Gen11, it is part of the system agent, and it had some hefty improvements. The Gen11 display engine introduced support for Adaptive-Sync (variable display refresh rate technology) as well as HDR and a wider color gamut. The Gen11 platform also integrated the USB Type-C subsystem, and the display controller has specific outputs for Type-C, and it can also target the Thunderbolt controller.
Intel also introduced some features for power management, most notably Panel Self Refresh (PSR), a technology first introduced in the smartphone realm. With PSR, a copy of the last frame is stored in a small frame buffer of the display. In the case of a static screen, the panel will refresh out of the locally stored image, which allows the display controller to go to a low power state. As another power-saving feature, Intel added a buffer to the display controller, to fetch pixels for the screen into. This allows the display engine to concentrate its memory accesses into a burst, meanwhile shutting down the rest of the display controller. This is effectively a form of race to halt, reminiscent of the duty cycle Intel introduced in Broadwell’s Gen8 graphics engine.
Lastly, on the performance side, in response to the increasing monitor resolutions, the display controller now has a two-pixel-per-clock pipeline (instead of one). This reduces the required clock rate of the display controller by 50%, effectively trading die area for power efficiency (since transistors are more efficient at lower clocks as voltage is reduced). Additionally, the pipeline has also gained in precision in response to HDR and wider color gamut displays. The Gen11 controller now also supports a compressed memory format generated by the graphics engine to reduce bandwidth.
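To make the two-pixel-per-clock point concrete, here is a rough sketch of the arithmetic. The 3840x2160 at 60 Hz mode is only an illustrative assumption, the function name is hypothetical, and blanking intervals are ignored, so the real pixel clocks would be somewhat higher than these figures.

```python
def required_clock_mhz(width, height, refresh_hz, pixels_per_clock):
    """Pipeline clock (MHz) needed to emit every active pixel of each frame.

    Simplification: blanking intervals are ignored, so this is a lower bound.
    """
    pixel_rate = width * height * refresh_hz  # active pixels per second
    return pixel_rate / pixels_per_clock / 1e6


# Compare a one-pixel-per-clock pipeline with a two-pixel-per-clock one
# for an assumed 4K, 60 Hz display mode.
for ppc in (1, 2):
    print(ppc, round(required_clock_mhz(3840, 2160, 60, ppc), 1))
```

Whatever the display mode, doubling the pixels processed per clock halves the clock the pipeline has to sustain, which is the 50% reduction described above.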
I'm guessing that's referring to Thunderbolt 3, DisplayPort 2.0, and maybe HDMI 2.1.
None of the roadmap leaks or info on LGA-1200 mention any client CPUs having PCIe 4.0 this year or next.
That was 20 years ago!
AMD still made uncompetitive CPUs as recently as 3 years ago, and look at them now!
It's a lot more informative to look at the strides Intel has made in their integrated graphics. They're hardly starting from zero, this time around.
And at one point their roadmaps also showed Intel would be manufacturing 7nm nearly 3 years ago. And we're still on 14nm for the desktop with just mediocre 10nm mobile processors...
https://www.anandtech.com/show/13405/intel-10nm-cannon-lake-and-core-i3-8121u-deep-dive-review/2
At the end of the day, they are projections. They can be, and usually are, wrong, especially when they look many months or years into the future, just like the slide above.
It's understandable when they under-deliver on their roadmap promises, because technology is actually hard, sometimes.
However, what you're saying is that they're going to over-deliver on their roadmap promises. It could happen, but I don't think they've had a track record of doing that.
They're not projections as in stock market or weather forecasts - these are their plans that they're communicating to partners and customers! And work that's not planned doesn't usually get done. Furthermore, features that their partners weren't expecting wouldn't necessarily get enabled or properly supported.
Correct. I'm just reiterating Intel.
It says in that corner of the slide pictured, "projected"
They also started project Larrabee, but no consumer devices were sold.
So I did not count it.
Just stating that this is their third attempt to enter the discrete GPU market; the first 2 attempts were complete failures.
The article states this is their first.
Yeah, forcing x86 into GPUs was an exercise in trying to fit a square peg into a round hole.
To be fair, there were a lot of 3D graphics chips, back then - S3, 3D Labs, Matrox, Rendition, Tseng, Cirrus Logic, Number Nine, PowerVR, 3DFX, and of course ATI and Nvidia. Plus, even a few more I'm forgetting. Most weren't very good. Even Nvidia's NV1 could be described as a failure.
From the sound of it, the i740 was far from the worst. Perhaps it just didn't meet with the level of success that Intel was used to. Being a late entrant to a crowded market surely didn't help.
http://www.vintage3d.org/i740.php#sthash.e4kIOqFj.MxbFM9tE.dpbs
Ah, I had missed that.
Ah, you're right. I mis-remembered the ET6000 having 3D acceleration, but it seems that was to be introduced in the ET6300 that was never finished. Interestingly, I just learned that ATI bought Tseng... so, I wonder if any of that IP ever did see the light of day.
Oh man. I used to think the NV1 was cool, until I read this:
http://www.vintage3d.org/nv1.php
What a disaster! If that had been a product in a larger company, they'd have killed off the entire graphics division, after such a showing. The only reason Nvidia kept going is because that's all they had.
Be sure to check out the gallery, if you can (requires flash). Its quadric patch rendering is nothing to brag about, even on games that supported it!
BTW, that article states:
Ain't gonna happen. Not for consumers. PCIe 5.0 is more power-hungry and costly to implement. It might carry other limitations, as well. Moreover, there's not a strong need for more bandwidth, in the mainstream/consumer segment. And, according to this, Comet Lake-S (mainstream CPU for 2020) will still be PCIe 3.0.
https://www.tomshardware.com/news/intel-comet-lake_s-early-impressions-amd-ryzen-3000,40260.html
However, it is on their Server roadmap for 2021, while Ice Lake servers are slated to get PCIe 4.0 in Q2 of 2020.
This argument makes no sense to me. PCIe is fully forward-and-backward compatible. If you have a PCIe 3.0 peripheral, you can still use it in your Ryzen 3k X570 board! Likewise, there's no downside in buying (or a company building) a PCIe 4.0 SSD, since it'll still work in PCIe 5.0 boards. And, for companies, I'd hazard a guess that they'd learn a few things in building PCIe 4.0 devices that would carry-over to PCIe 5.0, easing the transition relative to jumping straight from PCIe 3.0 to 5.0.
In fact, the only case I can see for having PCIe 5.0 in consumer devices would be if you could reap some cost savings by cutting lane counts. However, the problem you'll run into is that people have x16 PCIe 3.0 GPUs and x4 NVMe PCIe 3.0 SSDs that they'll want to carry forward to any new motherboard, and they won't want to lose any of those lanes. So, unless a mobo somehow has an additional set of lower-speed lanes that only become active if the higher-speed lanes drop back, there's no real cost savings in it. And, for consumers, PCIe speeds just aren't a big bottleneck.
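A minimal sketch of the lane-cutting problem described above, assuming the usual behaviour that a PCIe link trains to the highest speed and width both ends support. The x8 PCIe 5.0 slot is a hypothetical example, the function names are made up for illustration, and the bandwidth figures are approximate one-direction rates that ignore protocol overhead.

```python
# Per-lane raw transfer rates in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line code used from PCIe 3.0 onward


def negotiated_link(device, slot):
    """Each argument is (generation, lanes); the link trains to the lower of each."""
    return (min(device[0], slot[0]), min(device[1], slot[1]))


def bandwidth_gb_per_s(link):
    """Approximate one-direction bandwidth of a (generation, lanes) link."""
    gen, lanes = link
    return GT_PER_S[gen] * ENCODING * lanes / 8  # bits to bytes


gpu = (3, 16)          # an existing PCIe 3.0 x16 graphics card
narrow_slot = (5, 8)   # hypothetical lane-reduced PCIe 5.0 slot
link = negotiated_link(gpu, narrow_slot)
print(link, round(bandwidth_gb_per_s(link), 2))
print(gpu, round(bandwidth_gb_per_s(gpu), 2))
```

Under these assumptions the older card keeps working in the narrower slot but ends up with about half the bandwidth it would get from a full x16 slot, which is the loss people would object to.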
Faster bus speeds are all about cloud & datacenter. For things like all-flash storage arrays, 100 & 200 Gbps networking, and AI accelerators. That's why PCIe 5.0 came on so quick, and why PCIe 6.0 is hot on its heels.
I predict it'll be quite a while, before you see any consumer CPUs, GPUs, or SSDs with PCIe 5.0. I'd bet at least 2025. Maybe Intel or AMD could upgrade their Southbridge connection to 5.0 before then, so they can cut back on direct-connected CPU lanes (and I'm really looking at AMD, here), but not for their GPU slots.
Oh, they're just extrapolating. Each Gen11 EU has two 128-bit SIMD pipelines, together delivering ~16 FLOP/cycle. So, if you clock 512 of those EUs at about 1.2 GHz, you get roughly 10 TFLOPS. Intel tends to clock their GPUs in the neighborhood of 1 GHz, but they could obviously go a fair bit higher. They could also increase the SIMD width per EU, but then 512 would really be a lot.
https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units#Gen11
For reference, Vega 64 delivers 10.2 - 12.7 TFLOPS, RX 5700 XT gives 8.2 to 9.8 TFLOPS, GTX 1080 Ti gives 10.6 to 11.3 TFLOPS, and RTX 2080 Ti gives 11.8 to 13.4 TFLOPS.
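For anyone who wants to check the extrapolation, here is a quick sketch of the arithmetic. It assumes each 128-bit pipeline handles four FP32 lanes and that a fused multiply-add counts as two FLOPs, which is how the ~16 FLOP/cycle per EU figure is usually arrived at. The 512 EU count and the 1.2 GHz clock are the speculative numbers from above, not confirmed specs, and the function name is hypothetical.

```python
def peak_tflops(eus, flop_per_cycle_per_eu, clock_ghz):
    """Theoretical peak FP32 throughput in TFLOPS."""
    return eus * flop_per_cycle_per_eu * clock_ghz / 1000.0


# Assumed per-EU breakdown: 2 pipelines x 4 FP32 lanes x 2 FLOPs per FMA.
FLOP_PER_CYCLE_PER_EU = 2 * (128 // 32) * 2

# Try a few clock speeds for the rumored 512 EU part.
for clock_ghz in (1.0, 1.2, 1.6, 2.0):
    print(clock_ghz, round(peak_tflops(512, FLOP_PER_CYCLE_PER_EU, clock_ghz), 2))
```

These are theoretical peaks only, the same kind of number as the TFLOPS ranges quoted for the AMD and Nvidia cards.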
Yup. Architectures are starting to converge in a way. AMD/Nvidia/Intel are all going to be clocked in the ballpark range. I expect 1.6-2GHz range clocks for Xe graphics.
Also leaks show PCIe 4.0 support for Tiger Lake. Not only that, DMI is being boosted to x8. That's quadruple the bandwidth (twice the lanes, each at twice the rate). It is needed though because DMI x4 on PCIe 3 is limiting.
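A rough check of the "quadruple" figure, assuming DMI runs at the same per-lane rate as the matching PCIe generation and that the x8 link uses PCIe 4.0 signalling. Both are assumptions based on the leak, not confirmed specs, and the function name is made up for illustration.

```python
GT_PER_S = {3: 8.0, 4: 16.0}   # PCIe per-lane transfer rates in GT/s
ENCODING = 128 / 130           # 128b/130b line code (PCIe 3.0 and 4.0)


def link_gb_per_s(gen, lanes):
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # bits to bytes


current_dmi = link_gb_per_s(3, 4)   # DMI x4 at PCIe 3.0 rates
leaked_dmi = link_gb_per_s(4, 8)    # assumed DMI x8 at PCIe 4.0 rates
print(round(current_dmi, 2), round(leaked_dmi, 2), round(leaked_dmi / current_dmi, 1))
```

That works out to a bit under 4 GB/s today versus close to 16 GB/s for the leaked configuration.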
In servers, the chip after Ice Lake/Cooper Lake is going PCIe 5/CXL. CXL is the Compute Express Link they announced quite recently. Actually CXL is based off PCIe 5 or something.
But I agree on the client side, PCIe 4 will last quite a while.
Marketing tells you technology is moving faster than before. Reality tells you scaling difficulties are slowing it down.
That's true. I think at least the barriers are far lower on their third attempt. Actually I would say there were many half-attempts with Iris Pros and such.
Intel was too focused on only the hardware and fabrication side with the i740 and Larrabee. They are starting to get the drivers down with their Gen architecture, and they already have a consumer base. Gen 12/Xe builds on that.
Now they need to prove Xe can scale up, because current Intel iGPUs absolutely suck at it. The Skylake Iris Pro was a total failure scaling up.