Intel VP claims up to 30% of CPU performance is untapped by modern games — software optimization is critical to unlocking full potential of hybrid CPUs
Intel has been making CPUs with a hybrid architecture for the past 5 years.
Intel switched to a hybrid architecture for its CPUs back in 2021 with Alder Lake, mixing performance and efficiency cores on the same package, similar to ARM-based chips. Since then, the company has iterated on the design slowly but steadily. And even though a "unified core" is in the works, for now, the hybrid architecture seems to have reached maturity from a hardware standpoint, according to Intel VP Robert Hallock.
Hallock sat down with PC Games Hardware and, during the interview, blamed the software side of things for performance issues on hybrid chips. He was referring to the practice of disabling E-cores on modern Intel CPUs to get better performance; some people report higher FPS in games when playing only on the P-cores. Hallock's response to this was that "they are virtually identical in performance… it’s about 1% difference."
He went on to explain how the early state of the Intel Thread Director back in the day contributed to better P-core-only performance. Windows' task scheduler is essentially blind without the Thread Director actually telling it which process is better suited to which core (hence the name "director"). With better optimization, even though the E-cores aren't as powerful, they still contribute their fair share to the overall operation. Even simple tools like Intel's APO can help in this regard.
Beyond that, though, there were also issues with low ring bus frequencies when the E-cores were enabled at the time, which contributed to worse performance. With them enabled, even if your P-cores were ready to boost much higher, the interconnect would be held to lower speeds because the E-cores on the same die just couldn't keep up. Intel has worked to decouple the core clusters more clearly in subsequent generations, like Raptor Lake and Arrow Lake.
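The P-core-only comparison described above can be roughly approximated without a trip to the BIOS by restricting a game's process to a subset of logical processors. The snippet below is a minimal sketch of how such an affinity bitmask is built. The core layout in it is an assumption for illustration only (logical CPUs 0-7 as P-cores and 8-23 as E-cores, as one might see on an 8P+16E chip without Hyper-Threading); the real numbering depends on the CPU and OS and should be checked first. Note that an affinity mask leaves the E-cores powered on, so it is not equivalent to disabling them in firmware.

```python
def affinity_mask(cpu_ids):
    """Return an integer bitmask with one bit set per logical CPU ID."""
    mask = 0
    for cpu in cpu_ids:
        if cpu < 0:
            raise ValueError("logical CPU IDs must be non-negative")
        mask |= 1 << cpu
    return mask


# Hypothetical layout (an assumption, not taken from the article):
# logical CPUs 0-7 are P-cores, 8-23 are E-cores.
P_CORE_CPUS = range(0, 8)
E_CORE_CPUS = range(8, 24)

p_mask = affinity_mask(P_CORE_CPUS)
e_mask = affinity_mask(E_CORE_CPUS)

print(f"P-core mask: {p_mask:X}")  # prints "P-core mask: FF"
print(f"E-core mask: {e_mask:X}")  # prints "E-core mask: FFFF00"
```

On Windows, a hexadecimal mask like this can be passed to `start /affinity` in the command prompt, and on Linux `taskset` accepts the same form. Comparing frame rates with and without the mask gives only a rough picture of how much the E-cores help or hurt in a given game.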
Hallock continued, saying that he truly believes "that the general PC gaming market and especially enthusiasts [...] are significantly underestimating the importance of software to the PC experience." Software optimization is the next frontier of efficiency, of extracting more performance from the same silicon, because the silicon itself is seemingly not the bottleneck.
Things like the new binary optimization feature Intel has packed inside the Arrow Lake refresh chips are an example of this. Even though it doesn't work for most apps and games yet — Geekbench even flagged it — it's proof that tuning code can lead to better performance on the same hardware. Beyond the program, everything from the driver to even the BIOS adds its own overhead, leaving performance on the table.
"Yes, you can make the game faster with a faster piece of hardware, but there's always going to be 10, 20, 30% performance hidden behind the fact that that game was just not optimized for your CPU," claimed Hallock. AMD's solution to this problem has been rather simple: just add a lot of SRAM next to the cores, aka 3D V-cache, so that the CPU's L3 cache needs are met quickly, helping achieve higher FPS in games.
Nova Lake has something similar in the works with its bLLC (Big Last Level Cache), but that's still a hardware solution. The thought of up to 30% performance just waiting to be extracted through better software optimization is therefore not overstated. In a way, Hallock is pointing fingers at developers and engineers who have optimized for AMD's relatively conventional silicon first, which affects the true potential of Intel's hybrid architecture.

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.
-
Gururu Writing better software to increase hardware performance is essentially cheating at Geekbench, I think.
-
Vanderlindemedia Reminds me of the FX era. AMD launched a chip that lacked single-threaded performance but shined in multithreading. This was in an era where games were reliant on single-core performance, making it a "bad" chip. I don't think a lot of people are or were waiting on hybrid cores. If I buy a CPU, it needs to be all performance cores, nothing else. There's no trade-off with AMD products these days, as they have none of that hybrid stuff.
Great that Intel finds 30% on the table, but the same story applied to AMD back then. Release a P-core-only chip.
-
TerryLaze
The article said: In a way, Hallock is pointing fingers at developers and engineers who have optimized for AMD's relatively conventional silicon first, which affects the true potential of Intel's hybrid architecture.
That's not at all what he is pointing at or what's happening. He is blaming games being developed for consoles first, which means that even AMD desktop CPUs with more than 8 cores lose just as much performance as Intel's, if not more.
Original article (translated from German):
Many titles are said to be developed and optimized for consoles first and only ported to the PC afterward. This approach can result in PC versions that are not fully tuned to the hardware in question. As a consequence, part of the possible performance remains unused. In this context, Hallock speaks of performance reserves in the range of 10 to 30 percent that cannot be tapped without targeted software adaptation.
-
bit_user The phrase "up to" is doing a lot of work here. The iBOT test found an average of only 8%, with some games experiencing less speedup and the greatest improvement being only 18%. On Cyberpunk 2077, the speedup was only 1.8%.
https://www.tomshardware.com/pc-components/cpus/intels-binary-optimization-tool-tested-and-explained-how-the-ibot-translation-delivers-up-to-18-percent-faster-gaming-performance-8-percent-on-average
The article said: AMD's solution to this problem has been rather simple: just add a lot of SRAM next to the cores, aka 3D V-cache, so that the CPU's L3 cache needs are met quickly, helping achieve higher FPS in games.
If the issue is "the fact that that game was just not optimized for your CPU", then a larger L3 cache isn't directly addressing that problem. It's addressing an issue, but a complementary one. Consoles have higher memory latency and smaller L3 caches than standard desktop CPUs, so the X3D CPUs certainly aren't addressing a deficit of desktop CPUs, relative to their console counterparts.
-
wussupi83 I was initially anti-E-core. But after some testing, I've come to appreciate them... mostly. Except, interestingly, in my testing the 285K can actually be slower than the 265K in some tasks, and disabling 4 or 8 E-cores on the 285K improves performance. But, interestingly, disabling E-cores entirely hurts performance. And these are not heavy, thermally throttled CPU tests, so it's not like the 265K is just boosting higher.
-
daworstplaya Honestly, not a fan of the whole hybrid big Little, e-Core architecture. Also not sure, why you need more than 2 e-cores on a CPU designed for gaming and heavy workloads. 1 e-Core to run the OS, the other to run some low level back ground tasks/apps. More e-cores is just a waste of space on the silicon that could've been used for more p-Cores.
-
usertests
daworstplaya said: Honestly, not a fan of the whole hybrid big Little, e-Core architecture. Also not sure, why you need more than 2 e-cores on a CPU designed for gaming and heavy workloads. 1 e-Core to run the OS, the other to run some low level back ground tasks/apps. More e-cores is just a waste of space on the silicon that could've been used for more p-Cores.
If used properly, E-cores are the opposite of a waste of space. They give more multi-threaded performance than the same area of P-cores. For single-threaded performance or gaming, 6-8, or soon 16, should be more than enough P-cores.
However, I suspect Intel will try to reverse course on E-cores eventually. More cores/threads than ~52 won't help the typical consumer. If they can get cores to subdivide and combine into "super cores" as needed, that could be helpful. We may be seeing the groundwork for that in Nova Lake: Intel will apparently move from 3 MiB L2 cache for P-cores (Arrow Lake), to 4 MiB L2 cache shared between 2 P-cores (Nova Lake).
LPE-cores will be around forever, actually filling the "background task" role and offering true efficiency benefits, often on a separate tile, instead of increasing multi-threading performance like E-cores.
-
bit_user
daworstplaya said: not sure, why you need more than 2 e-cores on a CPU designed for gaming and heavy workloads. 1 e-Core to run the OS, the other to run some low level back ground tasks/apps.
Some people talk about these classes of cores as "latency cores" and "throughput cores", rather than P and E. That's because the main advantage of P-cores is that they can provide low-latency computation, but at a high cost in die area and energy. Therefore, the best use of them is to run only the most latency-sensitive threads on them.
daworstplaya said: More e-cores is just a waste of space on the silicon that could've been used for more p-Cores.
The E-cores provide more computation per area and more computation per Watt. So, if you have some workload that divides well among a lot of threads, the way to maximize throughput is actually to spend most of your silicon budget on E-cores.
If anything, Intel's current designs are too P-core biased. However, that's assuming software was optimally designed to harness E-cores, which it's not. So, we're stuck with a compromise that's still not maximizing throughput.
-
razor512 Intel needs to make a gaming-focused CPU with something like 8 P-cores plus 1 to 2 super performance cores that delve well into diminishing returns, using whatever die space would otherwise go to an iGPU to build a super performance core that pushes 6+ GHz in addition to seeking a higher IPC.
Most CPU bottlenecks in games are related to a single thread within the game that ends up maxing out at 1 core's worth of CPU time.
-
bit_user
usertests said: However, I suspect Intel will try to reverse course on E-cores eventually.
I'm sure you've heard they're unifying them, right? The P-core microarchitecture is dying off and the successor to Arctic Wolf (I think) will be the first Unified core. It'll be featured in whatever comes after Razer Lake.
usertests said: More cores/threads than ~52 won't help the typical consumer.
🤣
IMO, 52-core Nova Lake is about as much of a "typical consumer" part as the 9950X3D!
usertests said: If they can get cores to subdivide and combine into "super cores" as needed, that could be helpful.
Rumored, but not coming for a while, if ever. I've managed to convince myself it's a viable path, but challenging to implement in both hardware and software.
usertests said: We may be seeing the groundwork for that in Nova Lake: Intel will apparently move from 3 MiB L2 cache for P-cores (Arrow Lake), to 4 MiB L2 cache shared between 2 P-cores (Nova Lake).
I think that might have more to do with Intel deciding the 4-tier cache hierarchy (not to mention the memory side-cache) of Arrow Lake was a mistake. So, now they're going to flatten it, more like their E-cores and what Apple & Qualcomm have done. I'm really curious to know if Coyote Cove will still have a L0 cache.
usertests said: LPE-cores will be around forever, actually filling the "background task" role
I think they're not so much "background task" cores, but actually near-idle cores. Intel made a big thing about how Meteor Lake could do video playback using just the SoC tile and the I/O tile, with the GPU and CPU tiles put completely to sleep.
For extremely low-intensity usage, you can eke out more battery life on them. However, background tasks definitely won't be confined to them, once the system gets busy enough. When I log into my i5-1250P work machine, the CPU is fully pegged for like 2-3 minutes. From my perspective, most of that is "background tasks", but if the OS wants to run them, it'll run them. Maybe the LPE cores fill up first, but then they'll overflow onto the E-cores and even the P-cores.
usertests said: instead of increasing multi-threading performance like E-cores.
Cinebench will still use them. So what, if they're only like 60% as fast as a regular E-core? Points is points.