For many years, I was the lead programmer for custom, industry-specific business applications. Instead of relying on synthetic benchmarks, which may not be relevant to real-world workloads, I created three tests using software from that world.
One Thread Solving a Complex Problem
The first program optimizes building a complex brick wall. A 60-foot-long stretch contains eight windows and two door openings, including the frames and jambs. Small holes for the spools of horizontal blinds and three interlocking walls also have to be placed. The goals are minimizing the number of split bricks and optimizing re-use of split bricks so that waste is almost eliminated. Since placement of bricks depends on where other bricks are located, the task can't be parallelized easily. So, it runs in a single thread, while memory use is negligible.
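To illustrate why this kind of task resists parallelization, here is a deliberately tiny sketch. It is not the benchmark program: the function name `lay_courses`, the 8-inch brick length, and the greedy reuse rule are all assumptions made for illustration. The point is only that every course reads and updates the pile of offcuts left by the courses before it, so the courses cannot be processed independently.

```python
def lay_courses(course_lengths, brick=8):
    """Lay courses one after another, reusing offcuts where possible.

    Hypothetical model: lengths are in inches, and each course needs at
    most one partial brick to close the gap at its end.
    Returns (number of whole bricks split, remaining offcuts).
    """
    offcuts = []        # pieces left over from earlier splits (shared state)
    bricks_split = 0
    for length in course_lengths:
        gap = length % brick
        if gap == 0:
            continue    # course fits with whole bricks only
        # Prefer the smallest existing offcut that still covers the gap.
        usable = sorted(piece for piece in offcuts if piece >= gap)
        if usable:
            piece = usable[0]
            offcuts.remove(piece)
            if piece > gap:
                offcuts.append(piece - gap)   # trimmed remainder stays available
        else:
            bricks_split += 1                 # no offcut fits: split a new brick
            offcuts.append(brick - gap)
    return bricks_split, offcuts


print(lay_courses([20, 20, 12]))
```

In this toy run, the second course only avoids splitting a brick because the first course already produced a matching offcut. Reordering or splitting the loop across threads would change the result, which is the dependency the real program has to respect.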
Intel's Core i3 wins this benchmark comfortably, delivering performance that's 20% faster than AMD's A10-7850K. However, since we are only talking about a three-second run time, the absolute difference isn't significant. Then again, we've seen similar scaling in our single-threaded iTunes and LAME benchmarks, too.
4 Threads = 4 Jobs?
The next application optimizes solar panel placement by considering the sun's position in one-hour intervals, from sunrise to sunset, for each of the 365 days in a year. Two trees, a neighbor’s house, and a few chimneys create shadows, and in some light conditions, front-row solar panels throw shadows on second-row panels as well. This program also optimizes the placement of wiring. Furthermore, based on historical meteorological data, the expected energy output for the whole year is estimated. This software can be easily parallelized, as the energy output corresponding to each sun position is calculated independently.
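The independence of the per-position calculations can be sketched as follows. This is a minimal illustration rather than the benchmark's code: `sun_elevation` is a made-up toy model, the energy figure is in arbitrary units, and shadows, wiring, and weather data are left out entirely. A thread pool is used to keep the example simple; for CPU-bound work in CPython, a process pool would be the more realistic choice.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def sun_elevation(day, hour):
    """Toy elevation model in radians (hypothetical, for illustration only)."""
    seasonal = 0.4 * math.cos(2 * math.pi * (day - 172) / 365)
    daily = math.cos(math.pi * (hour - 12) / 12)
    return max(0.0, (0.6 + seasonal) * daily)   # 0.0 while the sun is down


def energy_for_position(position):
    """Energy for one (day, hour) sample; depends on no other sample."""
    day, hour = position
    return math.sin(sun_elevation(day, hour))


def yearly_energy(workers=4):
    """Sum the energy over one-hour steps for all 365 days."""
    positions = [(day, hour) for day in range(365) for hour in range(24)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the sum is deterministic.
        return sum(pool.map(energy_for_position, positions))


print(yearly_energy(workers=4) == yearly_energy(workers=1))
```

Because no sample needs the result of another, the work can be handed to four workers (or any other number) without changing the total.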
While the Core i3-4330 is still in front, it’s a closer race since Hyper-Threading technology doesn't quite match the effectiveness of four integer units. The Haswell-based Intel CPU is barely faster than the A10-7850K, while beating the lower-clocked A10-7800 by 6%. The older Core i3-2100, which we included for comparison purposes, clearly shows its age.
We further increase the degree of parallelization by running a photo-realistic renderer on several computers on a network. For instance, all of a company's office PCs can be harnessed for a computation-intensive task like rendering. One PC serves as the controller, which farms out jobs to other PCs based on their hardware capabilities. In this test, each of the compute clients runs four worker threads.
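One way a controller might farm out work according to hardware capability is a proportional split. The sketch below is an assumption about how such a scheduler could look, not a description of the renderer used here: the function `assign_tiles`, the integer capability scores, and the client names are all hypothetical.

```python
def assign_tiles(num_tiles, capabilities):
    """Split render tiles among clients in proportion to a capability score.

    capabilities maps a client name to a positive integer score
    (hypothetical; a real controller might derive it from cores and clocks).
    """
    total = sum(capabilities.values())
    # Integer share for each client, rounded down.
    shares = {name: num_tiles * score // total
              for name, score in capabilities.items()}
    # Rounding down can leave a few tiles over; give them to the
    # strongest clients first.
    leftover = num_tiles - sum(shares.values())
    strongest_first = sorted(capabilities, key=capabilities.get, reverse=True)
    for name in strongest_first[:leftover]:
        shares[name] += 1
    return shares


print(assign_tiles(10, {"pc1": 4, "pc2": 2, "pc3": 2}))
```

Each client would then work through its share with its own worker threads, four per client in this test.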
The different CPU and APU models are even closer together due to communication overhead and increased memory footprint. The A10-7850K manages a win against the Core i3-4330, and the A10-7800 finishes a close third. What do we learn from this benchmark? Real-world applications involve more moving parts than video transcoding or gaming. While IPC is naturally an important consideration, it doesn't tell the whole story.
A Typical Consumer Application: Video Compression
When an application doesn't support OpenCL, or OpenCL acceleration is disabled, the Kaveri architecture's two modules address up to four threads in parallel. We decided to use HandBrake as a benchmark to test this. As expected, the A10-7800 winds up in the middle of our test field.