Results: OpenCL Performance
Intel enabled OpenCL 1.1 support on its Ivy Bridge-based processors with HD Graphics 4000 and 2500, giving developers a way to use the graphics engine’s execution units for general-purpose workloads. Popular desktop applications like WinZip and Photoshop now offer sometimes-substantial performance gains on platforms that can spread work across many parallel execution resources, rather than leaving it to a handful of processing cores. With Haswell, Intel is expanding support to OpenCL 1.2.
Our Photoshop CS6 benchmark is most effective at showing the difference between processors that lack OpenCL support and those with it. The Core i7-2700K tackles this workload using its four Hyper-Threaded cores, while the -3770K and -4770K get their HD Graphics components involved.
The Haswell-based Core i7-4770K is slightly faster than its predecessor, likely due to a combination of additional EUs, more bandwidth, and higher IPC.
We run our WinZip test with and without OpenCL enabled on all processors, and there clearly isn’t as much differentiation as there was in Photoshop. The explanation is simple enough. WinZip 17 is well-threaded (much more so than 16.5 was), so the CPU cores are taxed even without OpenCL support. And with OpenCL turned on, WinZip only offloads compression for files larger than 8 MB. Because our 1.3 GB folder is full of documents, PowerPoint presentations, PDFs, and music, acceleration isn’t going to help much.
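To make the 8 MB cutoff concrete, here is a minimal sketch that estimates how much of a folder would even be eligible for offload. It is not WinZip's code: the function name, the reading of "8 MB" as 8 MiB, and the folder contents are all assumptions made for illustration.

```python
# Hypothetical sketch; WinZip's real dispatch logic is not public in this article.
MIB = 1024 * 1024
OFFLOAD_THRESHOLD_BYTES = 8 * MIB  # assumption: "8 MB" is treated as 8 MiB


def offload_share(file_sizes):
    """Return the fraction of total bytes that sit in files above the threshold."""
    total = sum(file_sizes)
    if total == 0:
        return 0.0
    # Only files strictly larger than the threshold count as eligible.
    eligible = sum(size for size in file_sizes if size > OFFLOAD_THRESHOLD_BYTES)
    return eligible / total


# Made-up folder: 100 small documents (2 MiB each) plus two large files.
sizes = [2 * MIB] * 100 + [50 * MIB, 150 * MIB]
share = offload_share(sizes)
print(f"{share:.0%} of the bytes are in files large enough to offload")
```

A folder dominated by small documents drives that share toward zero, which is the situation our test folder is in.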
We do observe small speed-ups from the Core i7-4770K and -3770K, whereas the -2700K actually slows down when we try turning OpenCL on. The moral of the story? OpenCL is only going to register as a benefit insofar as the tasks you run are well-suited to heterogeneous computing. The Photoshop benchmark represents one end of that spectrum, and our WinZip test illustrates the other.
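One way to reason about that spectrum is Amdahl's law: only the portion of a job that actually moves to the graphics engine gets faster. The sketch below is a back-of-the-envelope model, not a measurement. The function name and the example fractions and speed-ups are hypothetical, and it ignores transfer and setup overhead.

```python
def overall_speedup(offload_fraction, device_speedup):
    """Amdahl's law: speed-up of the whole job when only part of it is accelerated.

    offload_fraction: share of the original run time that can be offloaded (0..1).
    device_speedup:   how much faster that share runs once offloaded
                      (a value below 1 means the offloaded part got slower).
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if device_speedup <= 0:
        raise ValueError("device_speedup must be positive")
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / device_speedup)


# Hypothetical numbers, chosen only to show the two ends of the spectrum.
mostly_offloaded = overall_speedup(0.9, 4.0)   # most of the work suits the GPU
barely_offloaded = overall_speedup(0.1, 4.0)   # little of the work suits the GPU
print(f"{mostly_offloaded:.2f}x vs. {barely_offloaded:.2f}x")
```

With the same assumed 4x device speed-up, the job that offloads 90% of its work ends up roughly three times faster, while the job that offloads 10% barely moves.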
LuxMark 2.0 quantifies the speed-up from HD Graphics 4000 to 4600, while reminding us that the Core i7-2700K, capable as it is, gets no help from its graphics engine in OpenCL-enabled software. As a side note, AMD’s A10-5800K registered 225,000 samples per second, fewer than the Core i7-3770K.
With that said, is OpenCL always going to be the performance win that our tests seem to suggest? Not necessarily. As we see in Sandra 2013’s GP Processing module, FP32 math is significantly faster on Intel’s HD Graphics engine than on its x86 cores. However, doubles have to be emulated on all three processors, and the Sandy Bridge-based Core i7-2700K turns in better results there. It turns out that Intel’s powerful x86 cores emulate those results faster than Ivy Bridge or Haswell can on the GPU.
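Why is emulation so costly? We don't know how Sandra or Intel's driver emulates doubles, so the following is only an illustration of one well-known software approach: carrying extra precision as a pair of FP32 values (often called float-float or double-single arithmetic, which approximates rather than reproduces true FP64). The helper names are hypothetical. The point is that a single extended-precision addition turns into six FP32 operations.

```python
import struct


def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum_f32(a, b):
    """Add two FP32 values and return (rounded sum, rounding error), both FP32.

    This is Knuth's TwoSum: six FP32 operations where a native add needs one.
    """
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


one = f32(1.0)
tiny = f32(1e-10)

plain = f32(one + tiny)          # plain FP32 add: the small term is lost
hi, lo = two_sum_f32(one, tiny)  # float-float add: the small term survives in lo

print(plain == 1.0)              # the FP32 result cannot represent 1 + 1e-10
print(hi, lo)                    # the pair (hi, lo) still carries it
```

Multiplication and division need even longer operation sequences under this scheme, which is consistent with emulated doubles running far slower than FP32 on the same hardware.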