Core i7-4770K: Haswell's Performance, Previewed
1. Core i7-4770K Gets Previewed

Editor's Note: The Core i7-4770K is now available! Get my impressions of the final silicon in The Core i7-4770K Review: Haswell Is Faster; Desktop Enthusiasts Yawn. It shouldn't surprise you to learn that the numbers and conclusions in this preview were pretty much spot-on!

We recently got our hands on a Core i7-4770K, based on Intel's Haswell micro-architecture. It’s not final silicon, but compared to earlier steppings (and earlier drivers), we’re comfortable enough with the way this chip performs to preview it against the Ivy and Sandy Bridge designs.

Presentations at last year's Developer Forum in San Francisco taught us as much as there is to know about the Haswell architecture itself. But as we get closer to the official launch, more details become known about how Haswell will materialize into actual products. Fortunately for us, some of the first CPUs based on Intel's newest design will be aimed at enthusiasts.

Fourth-Generation Intel Core Desktop Line-Up

| Model | Cores / Threads | TDP (W) | Base Clock | Turbo, 1 Core | Turbo, 2 Cores | Turbo, 3 Cores | Turbo, 4 Cores | L3 | GPU | Max. GPU Clock | TSX |
|---|---|---|---|---|---|---|---|---|---|---|---|
| i7-4770K | 4 / 8 | 84 | 3.5 GHz | 3.9 GHz | 3.9 GHz | 3.8 GHz | 3.7 GHz | 8 MB | GT2 | 1.25 GHz | No |
| i7-4770 | 4 / 8 | 84 | 3.4 GHz | 3.9 GHz | 3.9 GHz | 3.8 GHz | 3.7 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4670K | 4 / 4 | 84 | 3.4 GHz | 3.8 GHz | 3.8 GHz | 3.7 GHz | 3.6 GHz | 6 MB | GT2 | 1.2 GHz | No |
| i5-4670 | 4 / 4 | 84 | 3.4 GHz | 3.8 GHz | 3.8 GHz | 3.7 GHz | 3.6 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i5-4570 | 4 / 4 | 84 | 3.2 GHz | 3.6 GHz | 3.6 GHz | 3.5 GHz | 3.4 GHz | 6 MB | GT2 | 1.15 GHz | Yes |
| i5-4430 | 4 / 4 | 84 | 3 GHz | 3.2 GHz | 3.2 GHz | 3.1 GHz | 3 GHz | 6 MB | GT2 | 1.1 GHz | No |
| i7-4770S | 4 / 4 | 65 | 3.1 GHz | 3.9 GHz | 3.8 GHz | 3.6 GHz | 3.5 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4570S | 4 / 4 | 65 | 2.9 GHz | 3.6 GHz | 3.5 GHz | 3.3 GHz | 3.2 GHz | 6 MB | GT2 | 1.15 GHz | Yes |
| i5-4670S | 4 / 4 | 65 | 3.1 GHz | 3.8 GHz | 3.7 GHz | 3.5 GHz | 3.4 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i5-4430S | 4 / 4 | 65 | 2.7 GHz | 3.2 GHz | 3.1 GHz | 2.9 GHz | 2.8 GHz | 6 MB | GT2 | 1.1 GHz | No |
| i7-4770T | 4 / 4 | 45 | 2.5 GHz | 3.7 GHz | 3.6 GHz | 3.4 GHz | 3.1 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4670T | 4 / 4 | 45 | 2.3 GHz | 3.3 GHz | 3.2 GHz | 3 GHz | 2.9 GHz | 6 MB | GT2 | 1.2 GHz | Yes |
| i7-4765T | 4 / 4 | 35 | 2 GHz | 3 GHz | 2.9 GHz | 2.7 GHz | 2.6 GHz | 8 MB | GT2 | 1.2 GHz | Yes |
| i5-4570T | 2 / 4 | 35 | 2.9 GHz | 3.6 GHz | 3.3 GHz | - | - | 4 MB | GT2 | 1.15 GHz | Yes |


According to Intel’s current plans, you’ll find dual- and quad-core LGA 1150 models with the GT2 graphics configuration sporting 20 execution units. There will also be dual- and quad-core socketed rPGA-based models for the mobile space, featuring the same graphics setup. Everything in the table above is LGA 1150, though. All of those models share support for two channels of DDR3-1600 at 1.5 V and 800 MHz minimum core frequencies. They also share a 16-lane PCI Express 3.0 controller, AVX2 support, and AES-NI support. Interestingly, four of the listed models do not support Intel's new Transactional Synchronization Extensions (TSX). We're not sure why Intel would want to differentiate its products with a feature intended to handle locking more efficiently, but that appears to be what it's doing.
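If you want to check which of these instruction set extensions a given chip actually exposes, the operating system will usually tell you. Here is a minimal Python sketch, assuming a Linux host where `/proc/cpuinfo` lists CPU flags and where TSX shows up as the `hle` and/or `rtm` flags; the helper names `detect_features` and `read_cpu_flags` are ours, not part of any Intel tool.

```python
# Map the features named above to Linux /proc/cpuinfo flag names.
# Assumption: a Linux host; other operating systems expose this differently.
FEATURE_FLAGS = {
    "AVX2": ["avx2"],
    "AES-NI": ["aes"],
    "TSX": ["hle", "rtm"],  # TSX has two parts; either flag counts here
}


def detect_features(flags_line):
    """Return {feature: bool} for a whitespace-separated CPU flags string."""
    present = set(flags_line.split())
    return {
        name: any(flag in present for flag in candidates)
        for name, candidates in FEATURE_FLAGS.items()
    }


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' entry from a cpuinfo-style file, or '' if unavailable."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""
```

Calling `detect_features(read_cpu_flags())` on a Linux box returns a dictionary of booleans. On one of the four non-TSX models in the table, we would expect the TSX entry to come back false while AVX2 and AES-NI come back true.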

The much-anticipated GT3 graphics engine, with 40 EUs, is limited to BGA-based applications, meaning it won’t be upgradeable. Intel will have quad-core with GT3, quad-core with GT2, and dual-core with GT2 versions in ball grid array packaging. GT3 will also make an appearance in a BGA-based multi-chip package that includes a Lynx Point chipset. That’ll be a dual-core part, though.

In addition to the processors Intel plans to launch here in a few months, we’ll also be introduced to the 8-series Platform Controller Hubs, currently code-named Lynx Point. The most feature-complete version of Lynx Point will incorporate six SATA 6Gb/s ports, 14 total USB ports (six of which are USB 3.0), eight lanes of second-gen PCIe, and VGA output.

Eight-series chipsets are going to be physically smaller than their predecessors (23x22 millimeters on the desktop, rather than 27x27) with lower pin-counts. This is largely attributable to more capabilities integrated on the CPU itself. Previously, eight Flexible Display Interface lanes connected the processor and PCH. Although the processor die hosted an embedded DisplayPort controller, the VGA, LVDS, digital display interfaces, and audio were all down on the chipset. Now, the three digital ports are up in the processor, along with the audio and embedded DisplayPort. LVDS is gone altogether, as are six of the FDI lanes.
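To put the package shrink in perspective, here is the arithmetic as a quick Python sketch. The dimensions come straight from the paragraph above; the helper name is ours.

```python
def package_area_mm2(width_mm, height_mm):
    """Footprint of a rectangular package in square millimeters."""
    return width_mm * height_mm


new_area = package_area_mm2(23, 22)  # 8-series desktop PCH
old_area = package_area_mm2(27, 27)  # previous-generation desktop PCH
shrink = 1 - new_area / old_area

print(f"{new_area} mm^2 vs. {old_area} mm^2, {shrink:.1%} smaller")
```

That works out to 506 mm² against 729 mm², or a footprint reduction of a little over 30%.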

2. Results: Sandra 2013

Dhrystone isn’t necessarily representative of real-world performance, but a lack of software already optimized for AVX2 means we need to turn to SiSoftware’s diagnostic for an idea of how Haswell’s support for the instruction set might affect general integer performance in properly optimized software.

The Whetstone module employs SSE3, so Haswell’s improvements over Ivy Bridge are far more incremental. 

Sandra’s Multimedia benchmark generates an image of the Mandelbrot Set fractal using 255 iterations for each pixel, representing vectorised code that runs as close to perfectly parallel as possible.
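For readers who haven't seen the kernel behind this kind of test, here is a plain scalar Python sketch of a Mandelbrot render capped at 255 iterations per pixel. It is not Sandra's code (which is vectorised and hand-optimized); the function names and the default viewing window are our own assumptions, chosen only to illustrate why the workload parallelizes so cleanly.

```python
MAX_ITER = 255  # per-pixel iteration cap quoted above


def escape_count(c, max_iter=MAX_ITER):
    """Iterations of z = z*z + c completed before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width, height, x_min=-2.0, x_max=1.0, y_min=-1.0, y_max=1.0):
    """Compute an escape count for each pixel (width and height must be >= 2).

    No pixel depends on any other, so the grid can be split across
    SIMD lanes and threads without any coordination.
    """
    rows = []
    for py in range(height):
        y = y_min + (y_max - y_min) * py / (height - 1)
        row = []
        for px in range(width):
            x = x_min + (x_max - x_min) * px / (width - 1)
            row.append(escape_count(complex(x, y)))
        rows.append(row)
    return rows
```

The inner loop is nothing but multiplies and adds on independent data, which is exactly the pattern wider vector units and fused multiply-add instructions are built to accelerate.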

The integer test employs the AVX2 instruction set on Intel’s Haswell-based Core i7-4770K, while the Ivy and Sandy Bridge-based processors are limited to AVX support. As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x.

Floating-point performance also enjoys a significant speed-up from Intel’s first implementation of FMA3 (AMD’s Bulldozer design supports FMA4, while Piledriver supports both the three- and four-operand versions). The Ivy and Sandy Bridge-based processors utilize AVX-optimized code paths, falling quite a bit behind at the same clock rate.

Why do doubles seem to speed up so much more than floats on Haswell? The code path for FMA3 is actually latency-bound. If we turn off FMA3 support altogether in Sandra’s options and use AVX instead, the scaling proves similar.

All three of these chips feature AES-NI support, and we know from past reviews that because Sandra runs entirely in hardware, our platforms are processing instructions as fast as they’re sent from memory. The Core i7-4770K’s slight disadvantage in our AES256 test is indicative of slightly less throughput—something I’m comfortable chalking up to the early status of our test system.

Meanwhile, SHA2-256 performance is all about each core’s compute performance. So, the IPC improvements that go into Haswell help propel it ahead of Ivy Bridge, which is in turn faster than Sandy Bridge.

The memory bandwidth module confirms our findings in the Cryptography benchmark. All three platforms are running 1,600 MT/s data rates; the Haswell-based machine just looks like it needs a little tuning.
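As a yardstick for those results, the theoretical ceiling for two channels at 1,600 MT/s is easy to work out. A short Python sketch, assuming the standard 64-bit (8-byte) width of a DDR3 channel; the function name is ours.

```python
def peak_dram_bandwidth_gbs(channels, transfers_per_sec, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_sec / 1e9


peak = peak_dram_bandwidth_gbs(channels=2, transfers_per_sec=1600e6)
print(f"{peak:.1f} GB/s")
```

That puts the ceiling at 25.6 GB/s for all three platforms, so any difference between them in a bandwidth test comes down to how efficiently each memory controller uses the same pipe.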

We already know that Intel optimized Haswell’s memory hierarchy for performance, based on information discussed at last year’s IDF. As expected, Sandra’s cache bandwidth test shows an almost-doubling of performance from the 32 KB L1 data cache.

Gains from the L2 cache are actually a lot lower than we’d expect though; we thought that number would be close to 2x as well, given 64 bytes/cycle throughput (theoretically, the L2 should be capable of more than 900 GB/s). The L3 cache actually drops back a bit, which could be related to its separate clock domain.
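The 900 GB/s figure is easy enough to sanity-check. Here is a rough Python sketch, assuming the 64 bytes/cycle applies per core and that we sum across all four cores; both of those are assumptions on our part, as is the choice of clock rates, which we pulled from the Core i7-4770K's row in the spec table.

```python
def l2_peak_gbs(bytes_per_cycle, clock_ghz, cores):
    """Aggregate theoretical L2 bandwidth in GB/s: bytes/cycle x GHz x cores."""
    return bytes_per_cycle * clock_ghz * cores


# Base clock, four-core turbo, and single-core turbo for the Core i7-4770K
for clock_ghz in (3.5, 3.7, 3.9):
    print(f"{clock_ghz} GHz: {l2_peak_gbs(64, clock_ghz, 4):.0f} GB/s")
```

Under those assumptions, the total lands just shy of 900 GB/s at the base clock and comfortably above it at either turbo frequency.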

It still isn’t clear whether something’s up with our engineering sample CPU, or if there’s still work to be done on the testing side. Either way, this is a pre-production chip, so we aren’t jumping to any conclusions.

3. Results: OpenCL Performance

Intel enabled OpenCL 1.1 support on its Ivy Bridge-based processors with HD Graphics 4000 and 2500, giving developers an option to exploit the graphics component’s execution units for general-purpose workloads. Popular desktop applications like WinZip and Photoshop now offer sometimes-substantial performance gains on platforms able to more granularly parallelize workloads that would have previously been handled by fewer processing cores. With Haswell, support is being expanded to OpenCL 1.2.

Our Photoshop CS6 benchmark is most effective at showing the difference between processors that lack OpenCL support and those with it. The Core i7-2700K tackles this workload using its four Hyper-Threaded cores, while the -3770K and -4770K get their HD Graphics components involved.

The Haswell-based Core i7-4770K is slightly faster than its predecessor, likely due to a combination of additional EUs, more bandwidth, and higher IPC.

We run our WinZip test with and without OpenCL enabled on all processors, and you can clearly see there isn’t as much differentiation as there was in Photoshop. The explanation is easy enough, though. WinZip 17 is really well-threaded (much more so than 16.5 was). So, the CPU cores are taxed, even without OpenCL support. With OpenCL turned on, WinZip only offloads compression for files larger than 8 MB. So, if our 1.3 GB folder of files is full of documents, PowerPoint presentations, PDFs, and music (which it is), acceleration isn’t going to help much.

We do observe small speed-ups from the Core i7-4770K and -3770K, whereas the -2700K actually slows down when we try turning OpenCL on. The moral of the story? OpenCL is only going to register as a benefit insofar as the tasks you run are well-suited to heterogeneous computing. The Photoshop benchmark represents one end of that spectrum, and our WinZip test illustrates the other.
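To see how much of a given folder could even qualify for offload under that 8 MB rule, a quick estimate helps. This Python sketch is ours, not WinZip's logic: it assumes the cutoff means 8 MiB and simply measures what share of the total bytes sits in files above it.

```python
OFFLOAD_THRESHOLD = 8 * 1024 * 1024  # 8 MB cutoff from the text, assumed to mean 8 MiB


def offload_share(file_sizes, threshold=OFFLOAD_THRESHOLD):
    """Fraction of total bytes held in files larger than the threshold."""
    total = sum(file_sizes)
    if total == 0:
        return 0.0
    eligible = sum(size for size in file_sizes if size > threshold)
    return eligible / total


# Hypothetical folder: ten 1 MB documents plus one 20 MiB video
sizes = [1_000_000] * 10 + [20 * 1024 * 1024]
print(f"{offload_share(sizes):.0%} of the bytes are eligible for offload")
```

A folder dominated by small documents returns a share near zero, which is consistent with the modest gains we measured. Feeding the function real sizes (for example, from `os.path.getsize`) gives a rough sense of whether turning OpenCL on is worth it for a particular archive.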

LuxMark 2.0 quantifies the speed-up from HD Graphics 4000 to 4600, simultaneously reminding us that the Core i7-2700K, capable as it is, doesn’t help in OpenCL-enabled software. As a side note, AMD's A10-5800K registered 225,000 samples per second, less than the Core i7-3770K.

Now, with that said, is OpenCL always going to be the performance win that each of our tests seems to show? Not necessarily. As we see in Sandra 2013’s GP Processing module, FP32 math is significantly faster on Intel’s HD Graphics engine than its x86 cores. However, doubles have to be emulated on all three processors, and the Sandy Bridge-based Core i7-2700K turns in better results there. It turns out that Intel’s powerful x86 cores emulate those results faster than Ivy Bridge or Haswell can on the GPU.

4. Results: Performance Teaser, Per-Clock Perf And Threaded Apps

Per-Clock Performance

First, let's take a look at how Haswell fares against Ivy and Sandy Bridge at a constant 3.5 GHz, with power-saving features and Turbo Boost disabled.

At least in our single-threaded LAME conversion test, Haswell is just over 3% faster than Ivy Bridge and over 5% faster than Sandy Bridge.
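A note on how a "percent faster" figure falls out of a timed test like this one. The Python sketch below uses one common convention, the ratio of completion times minus one; the timings are made-up placeholders for illustration, not our measured results.

```python
def percent_faster(baseline_seconds, new_seconds):
    """Speed-up of the new run over the baseline, as a percentage.

    Uses the throughput convention: baseline time / new time - 1.
    """
    return (baseline_seconds / new_seconds - 1.0) * 100.0


# Hypothetical timings, chosen only to illustrate the calculation
print(f"{percent_faster(103.0, 100.0):.1f}% faster")
```

Keep in mind that quoting the reduction in elapsed time instead (1 minus new over baseline) yields a slightly smaller number for the same pair of results, so it pays to know which convention a chart uses.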

Threaded Performance

How about when we turn on all of the chip’s features, and let the Core i7-4770K stand up against Ivy Bridge, Sandy Bridge, and Sandy Bridge-E?

In Blender 2.64, the quad-core Haswell-based part is more than 7% quicker than Core i7-3770K and 14% faster than Core i7-2700K. At the same time, a stock Core i7-3970X is still more than 23% faster than Core i7-4770K. If you were hoping that Haswell would offer an inexpensive quad-core substitute for what will be the two-generation-old Sandy Bridge-E architecture, it’d appear that the six-core design will continue to make sense for anyone with a true workstation.

Our Visual Studio-based Chrome compile benchmark would seem to concur. Think the -3770K and -2700K seem too close together? That’s what I thought at first, until I looked back at the results from our Core i7-3770K launch and saw the same thing. In comparison, Haswell has a huge impact on performance, pretty much cutting the gap between Sandy Bridge and Sandy Bridge-E in half. The six-core chip still reigns supreme in workstation-oriented tasks, but the Core i7-4770K’s >13% advantage over -3770K is stellar.

5. Results: More Common Desktop Apps

The Core i7-4770K plants itself right between Intel's four-core Core i7-3770K and six-core -3970X.

More aggressive threading places Haswell ahead of Ivy Bridge in this OCR-based test, but quite a ways behind the flagship CPU representing Sandy Bridge-E.

Like LAME on the previous page, our iTunes benchmark is single-threaded. In this test, however, we're leaving Turbo Boost enabled. Higher IPC throughput pushes Core i7-4770K into a lead. The Sandy Bridge-E-based Core i7-3970X doesn't benefit from its six cores, and slides back to third place. It's only faster than the Core i7-2700K because of a higher single-core Turbo Boost frequency.

Haswell is certainly faster than Ivy and Sandy Bridge, but our Core i7-4770K cannot catch the Core i7-3970X in a benchmark able to utilize all processing resources.

Again, Core i7-3970X proves the viability of an older Sandy Bridge-E configuration in threaded software. The more mainstream processors trail behind.

6. Results: HD Graphics 4600 In Hitman And DiRT

When Intel introduced Ivy Bridge’s architecture back at IDF 2011, company engineers made it a point to discuss the graphics pipeline’s modularity, even hinting that future implementations of the HD Graphics engine would see the multiplication of certain on-die components.

Based on early platform documentation, Intel currently has two configurations of Haswell's GPU planned: GT2 and GT3. Conceivably, there's a GT1 as well, though we've seen no mention of it pertaining to any of Intel's LGA, rPGA, or BGA models. Core i7-4770K sports GT2, with 20 execution units (up from Core i7-3770K’s 16). That part is branded as HD Graphics 4600.

Although it represents a notable speed-up compared to HD Graphics 4000, this isn’t earth-shattering enough to make Hitman: Absolution playable. Intel was showing off a system with GT3 in its CES 2013 booth, and that implementation looks to be significantly faster. However, HD Graphics 4600 is going to be far more incremental. We can’t turn this game down any more than its Low preset, and even at 1366x768, it’s just not fluid.

AMD's Trinity-based A10-5800K achieves an average of 20.39 FPS at 1920x1080, besting the Core i7-4770K in its current state.

We’re afforded a little more flexibility in DiRT Showdown, where HD Graphics 4600 runs the game’s Medium-quality preset fairly well at 1366x768. Stepping up to 1920x1080 still isn’t playable.  

The Core i7-3770K’s HD Graphics 4000 engine might be expected to outpace its predecessor by a greater delta. However, we used Intel’s latest pre-production drivers on Haswell and Ivy Bridge, while HD Graphics 3000 was backed by the latest public release.

With an average frame rate of 35.8 at 1920x1080, the A10-5800K is again quicker than the Core i7-4770K with its beta drivers.

7. Results: HD Graphics 4600 In Skyrim And WoW

Despite the early nature of our hardware and software, it’s likely that you still won’t be able to game comfortably at 1920x1080 on an HD Graphics 4600-equipped processor. A title like Skyrim will probably be accessible at 1366x768, even on a lower-end chip than the Core i7-4770K. An almost-30% speed-up is certainly commendable, particularly from a 25% increase in graphics resources (plus some frequency) compared to Ivy Bridge’s GT2.

Nevertheless, the Trinity-based A10-5800K currently looks faster still, achieving more than 45 FPS at 1920x1080 in this same test.
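How does Skyrim's almost-30% gain stack up against what the extra hardware could deliver on paper? Here is a back-of-the-envelope Python sketch. It assumes performance scales linearly with execution units and clock rate (a simplification on our part), and it takes the HD Graphics 4000 peak as 1.15 GHz, derived from the Core i7-4770K's 1.25 GHz maximum and the 100 MHz gap between the two chips.

```python
def resource_scaling(eus_old, eus_new, clock_old_ghz, clock_new_ghz):
    """Ideal speed-up if throughput scaled linearly with EU count and clock."""
    return (eus_new / eus_old) * (clock_new_ghz / clock_old_ghz)


eu_only = resource_scaling(16, 20, 1.0, 1.0)       # execution units alone
with_clock = resource_scaling(16, 20, 1.15, 1.25)  # EUs plus peak frequency

print(f"EUs only: {eu_only - 1:.0%}, EUs plus clock: {with_clock - 1:.0%}")
```

On paper, that is 25% from the execution units and roughly 36% once peak frequency is factored in, so a measured gain just under 30% on pre-production drivers sits about where you'd expect.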

Although WoW is commonly derided for its age and cartoonish graphics, it remains a popular title. And the Core i7-3770K handles it pretty well at 1366x768, averaging almost 70 FPS. Our Core i7-4770K sample is about 16% quicker at the same resolution, easily qualifying as playable.

It’d even appear to do pretty well at 1920x1080, though we’ll caution that the experience is far less pleasant than a sustained 50 FPS. And this is taken from a flight path in Pandaria. You’d be completely slammed in a raid situation. AMD's A10-5800K, averaging more than 60 FPS at 1920x1080, is currently much more playable.

A Little Context On Graphics

How will these results affect our next comparison between Intel’s CPUs and AMD’s Richland-based APUs? The first thing we have to remember is that Core i7-4770K will likely be a $300+ processor (and that most enthusiasts who buy one will use discrete graphics, rather than on-die). So, while it’d appear that Haswell’s GT2 implementation could get Intel close to Trinity's performance, at least in traditionally platform-bound games like Skyrim and WoW, remember that price-competitive models won’t be as fast as our -4770K.

I want to wait for Intel's final silicon and drivers before putting AMD into the same charts as Haswell-based chips, but based on the numbers I've been running, it appears likely that processors equipped with GT2 will come up short against AMD's fastest APUs on the desktop.

Almost certainly, however, a (mobile) part with twice as many execution units and 128 MB of L4/eDRAM at 1.2 GHz would blow Trinity out of the water in games. A comparison to Richland probably won't change much there. And we'll likely need to wait until 2014 to see how Kaveri affects AMD's position.

8. A Taste Of Things To Come…On The Desktop

So, now enthusiasts have a general sense for how Haswell will compare to high-end Sandy Bridge, Sandy Bridge-E, and Ivy Bridge processors. You probably could have guessed this before even looking at our benchmarks, but the pre-production Core i7-4770K is in the neighborhood of 7 to 13% faster than Core i7-3770K in today’s threaded workloads. That’s pretty consistent with the evolution from Sandy to Ivy Bridge, even as the flagship Haswell-based part keeps its thermal ceiling under 84 W.

Processors with Intel’s HD Graphics 4600 engine should offer notably better 3D performance than today’s HD Graphics 4000, though most enthusiasts purchasing unlocked K-series parts won’t even notice. An additional four execution units and a maximum dynamic frequency 100 MHz faster than Core i7-3770K are only good for incrementally-faster frame rates—but nothing that’ll replace discrete graphics (seems to be the conclusion we draw every generation, huh?). As before, desktop gamers will continue buying graphics cards.

The mobile space is where Intel’s efforts should become more apparent…and it has something for that market we anticipate will give AMD’s and Nvidia’s entry-level GPUs a serious run. CPUs with the GT3 graphics engine will only be available in BGA packaging, though.

Where does that leave you as a power user on the desktop? Well, you’ll have access to quad- and dual-core Haswell-based CPUs armed with GT2 and two memory channels each. The LGA 1150 interface means you’ll need a new motherboard with an 8-series chipset. Fortunately, the updated platform gives you six SATA 6Gb/s ports and six USB 3.0 ports (14 total ports, including USB 2.0). At least from the enthusiast angle, everything else is pretty much the same.

Overclocking is undoubtedly what many of you are going to base your buying decision on. Many enthusiasts assumed that Ivy Bridge-based CPUs manufactured at 22 nm would be far more tunable than 32 nm Sandy Bridge processors. When that turned out not to be the case, many folks said they’d stick with fast second-gen Core chips running comfortably at 4.4 and 4.5 GHz. I wasn’t able to overclock the Core i7-4770K I tested, largely because that’s no way to treat a borrowed CPU. But we’re certainly curious to see how a more mature process affects the architecture’s scalability.

Making Way For Mobile

We know from our talks with motherboard vendors at this year’s CES that you’ll be able to buy Haswell in LGA 1150 trim, but that its successor, Broadwell, is going to be BGA-only (meaning it’ll ship soldered onto motherboards). Now, it’s possible that Skylake, the architecture to follow Broadwell, will see Intel re-introduce an upgradeable interface. However, Core i7-4770K is going to get a lot of attention, if only because of its position as the last flagship before we’re subject to less flexibility.  

Of course, where we’re ultimately headed is a world where these desktop-class architectures are pulled down into smaller computing devices. It’s already happening with Ivy Bridge-based chips, but will continue with Intel’s Y-series parts and AMD’s Kabini. I know a lot of enthusiasts are bemoaning the slow erosion of unfettered configurability. However, the sky is not falling, and we're not ready to throw in the towel as power users. To the contrary, I’m looking forward to getting my hands on Ivy Bridge-E, a Haswell-based Surface, and the next generation of x86-based consoles.

Follow Chris Angelini on Twitter