Geekbench investigates up to 30% jump with Intel's iBOT — performance gain attributed to newly-vectorized instructions

Intel Core Ultra 7 270K Plus CPU installed in a motherboard socket
(Image credit: Tom's Hardware)

A few days after reviews of Intel's Core Ultra 7 270K Plus and Core Ultra 5 250K Plus rolled out, Geekbench said it would invalidate all results recorded with the two CPUs. That's because Geekbench is the only non-gaming application that currently supports Intel's Binary Optimization Tool, or iBOT, which modifies a binary to optimize it for a specific Intel architecture. A week later, Geekbench has published the findings of its investigation into what iBOT is doing behind the scenes, attributing an uplift of up to 30% in certain workloads to newly-vectorized instructions.

With iBOT enabled, Geekbench saw a 14% reduction in overall instructions and a 62% drop in scalar instructions. However, it saw a 1,366% increase in vector instructions. To see which instructions were executing, Geekbench used Intel's Software Development Emulator, or SDE.

With iBOT disabled and after 100 runs of the HDR subtest, Geekbench saw a total of 220 billion scalar instructions and 1.25 billion vector instructions. With iBOT on, that went to 84.6 billion scalar and 18.3 billion vector. By vectorizing a large share of the instructions in this subtest, iBOT significantly improves performance, relying on SIMD (single instruction, multiple data) execution rather than a linear stream of scalar instructions (single instruction, single data, or SISD).
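For readers unfamiliar with the distinction, here is a minimal C++ sketch of the same idea; it is purely illustrative and is not Geekbench's HDR code. The scalar loop processes one value per iteration, while the AVX2 version processes eight floats at a time, which is the kind of transformation that collapses the scalar instruction count while inflating the vector count.

```cpp
// Illustrative scalar (SISD) vs. AVX2 (SIMD) brightness scaling.
// This is a hypothetical example, not Geekbench's actual HDR code.
#include <immintrin.h>
#include <cstddef>

// Scalar version: one multiply per loop iteration.
void scale_scalar(const float* in, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}

// AVX2 version: eight multiplies per loop iteration.
void scale_avx2(const float* in, float* out, std::size_t n, float gain) {
    const __m256 g = _mm256_set1_ps(gain);       // broadcast gain to 8 lanes
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);      // load 8 floats at once
        _mm256_storeu_ps(out + i, _mm256_mul_ps(v, g));
    }
    for (; i < n; ++i)                           // scalar tail for leftovers
        out[i] = in[i] * gain;
}
```

A tool like iBOT performs a comparable rewrite on the already-compiled binary rather than on source code, which is how the instruction mix can shift this dramatically without the application itself changing.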

The change in instruction mix is what's interesting here. Geekbench's conclusion is what you'd probably expect: it doesn't appreciate an optimization that only applies to a short list of applications. "[iBOT] undermines this by replacing that varied code with processor-tuned, fully optimized binaries, measuring peak rather than typical performance."

Geekbench's view is rather negative, and understandably so, but the peek behind the curtain here has a lot of implications for the future of iBOT. Vectorized instructions on modern CPU architectures can vastly improve performance with a relatively small hit to power consumption — just look at Zen 5's performance in an AVX-512 workload like Y-Cruncher. This investigation shows that Intel is able to do that on the backend with a shipping binary.

There are downsides here, however. Geekbench noted a 40-second startup delay in its initial testing with iBOT, which shrank to a consistent two-second delay on subsequent passes. There was no delay with iBOT disabled. Additionally, it found no performance improvement with Geekbench 6.7. iBOT computes a checksum of the executable, which suggests it only applies its optimizations to specific binaries it recognizes.
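Geekbench hasn't detailed how that check works beyond the checksum observation, but checksum-gated matching is easy to picture. Below is a hypothetical C++ sketch (the FNV-1a hash, the lookup table, and the profile name are assumptions for illustration, not Intel's implementation) of how a tool might decide whether it has a pre-built optimization for a given executable.

```cpp
// Hypothetical sketch of checksum-gated binary matching; not Intel's code.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a hash over the executable's bytes (hash choice is assumed).
std::uint64_t fnv1a(const std::vector<char>& bytes) {
    std::uint64_t h = 1469598103934665603ull;
    for (char c : bytes) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream f(argv[1], std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());

    // Table of binaries the tool has optimizations for. A new build of the
    // same app hashes differently, so it falls through and runs unmodified.
    const std::unordered_map<std::uint64_t, std::string> known = {
        {0x1234abcd5678ef00ull, "example-optimized-profile"},  // placeholder entry
    };

    auto it = known.find(fnv1a(bytes));
    std::cout << (it != known.end()
                      ? "apply optimized profile: " + it->second
                      : std::string("unrecognized binary, run unmodified"))
              << '\n';
}
```

That matching step is also why a point update like Geekbench 6.7 sees no benefit: the new executable hashes to a value the tool doesn't recognize, so it runs untouched.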

Jake Roach
Senior Analyst, CPUs

Jake Roach is the Senior CPU Analyst at Tom’s Hardware, writing reviews, news, and features about the latest consumer and workstation processors.

  • bit_user
    Nice of them to be transparent about their findings. Reading between the lines, I sense they're a little insulted by how much performance it's implying they left on the table through poor usage of vectorization, in some of those workloads.

    Now, a nice question to answer would be whether the compilers for ARM CPUs were also more aggressive at vectorization, as that could help explain some of the giant lead in GB6 scores that Apple and Qualcomm have pulled vs. x86.
    Reply
  • wakuwaku
    bit_user said:
    Nice of them to be transparent about their findings. Reading between the lines, I sense they're a little insulted by how much performance it's implying they left on the table through poor usage of vectorization, in some of those workloads.
    You mean reading between Tom's lines? Because there are no lines to read in between if you read the source.

    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense. But for now it might as well be a purely synthetic benchmark that tests one kind of workload.
    Reply
  • bit_user
    wakuwaku said:
    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense.
    They said the big gains were:
    "object removal at a 24.6% jump and HDR processing at a 28.5% jump"
    I'm not sure how knowledgeable you are about those sorts of algorithms, but they align closely with standard use cases for vectorization. The only surprising thing here is that the source code wasn't more explicitly vectorized, from the outset. It's pretty much a baseline expectation for code like that to be vectorized.
    Reply
  • powwow84
    Some of the replies here seem a bit backwards, maybe intentionally so. The issue Geekbench has is that it's supposed to tell you how you can expect the hardware to perform, and Intel is trying to game things for benchmarks in a way a user wouldn't see if they bought and deployed the product.

    To put this in simple terms, imagine a task that generally takes 3 seconds on a CPU and runs occasionally. You can make it take 2.2 seconds, but it will have to precompile the task into an optimized function, which takes 20 seconds. That makes a benchmark look faster, especially if you are struggling in benchmarks, but your occasionally-run task now takes dramatically longer for the user when they encounter it, so there's no way it will actually be used. Hence why nobody else is doing it, and Intel is trying to portray an apples-to-oranges comparison as apples-to-apples.
    Reply
  • TerryLaze
    wakuwaku said:
    You mean reading between Tom's lines? Because there are no lines to read in between if you read the source.

    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense. But for now it might as well be a purely synthetic benchmark that tests one kind of workload.
    iBOT cannot change the amount or type of workload there is.......
    Geekbench already had all of the vectorizable work in it, it just did it in a bad way, either because that's what they found in most real apps or... because they are the bad app... so, same thing.
    powwow84 said:
    Some of the replies here seem a bit backwards, maybe intentionally so. The issue Geekbench has is that it's supposed to tell you how you can expect the hardware to perform, and Intel is trying to game things for benchmarks in a way a user wouldn't see if they bought and deployed the product.

    To put this in simple terms, imagine a task that generally takes 3 seconds on a CPU and runs occasionally. You can make it take 2.2 seconds, but it will have to precompile the task into an optimized function, which takes 20 seconds. That makes a benchmark look faster, especially if you are struggling in benchmarks, but your occasionally-run task now takes dramatically longer for the user when they encounter it, so there's no way it will actually be used. Hence why nobody else is doing it, and Intel is trying to portray an apples-to-oranges comparison as apples-to-apples.
    It doesn't precompile, it's real time.
    Reply
  • usertests
    TerryLaze said:
    It doesn't precompile, it's real time.
    I think the point is that iBOT only works with a handful of games/apps, so it's not representative of the CPU's performance.

    If Intel "games" almost everything that runs, we could just call it a special IPC increase.

    Maybe Intel will move towards that but they have to avoid triggering anti-cheat detectors and other security measures.
    Reply
  • icebox768
    Additionally, it found no performance improvement with Geekbench 6.7. iBOT computes a checksum of the executable, which suggests it only applies its optimizations to specific binaries it recognizes.
    So even a minor software update will disable this feature. It interferes with the normal operation of the software and requires an unknown number of Intel personnel for targeted adaptation, and it's limited to specific versions of specific software.

    I don't know what you guys think, but I usually call this kind of behavior "cheating."
    Reply
  • patriotpa
    ...almost like GB was compiled to deliberately de-optimize the workload on certain CPUs for an easily guessed reason. $$$$
    Reply
  • Gururu
    Seems pretty cool. Not sure Geekbench is the mark to get all twisted up about. It’s the real applications that matter, and if it works for the reviewers it should work for us the same way. There will be plenty of reviewers that scour the archives to find unsupported games for “fairer” comparisons. I’d say that would be cheating…
    Reply
  • King_V
    TerryLaze said:
    It doesn't precompile, it's real time.
    What, then, is that initial 40-second delay on the first run?
    Reply