Geekbench investigates up to 30% jump with Intel's iBOT — performance gain attributed to newly-vectorized instructions

Intel Core Ultra 7 270K Plus CPU installed in a motherboard socket
(Image credit: Tom's Hardware)

A few days after reviews of Intel's Core Ultra 7 270K Plus and Core Ultra 5 250K Plus rolled out, Geekbench said it would invalidate all results recorded with the two CPUs. That's because Geekbench is the only non-gaming application that currently supports Intel's Binary Optimization Tool, or iBOT, which modifies a binary to optimize it for a specific Intel architecture. A week later, Geekbench has published the findings of its investigation into what iBOT is doing behind the scenes, attributing an uplift of up to 30% in certain workloads to newly-vectorized instructions.

With iBOT enabled, Geekbench saw a 14% reduction in overall instructions and a 62% drop in scalar instructions. However, it saw a 1,366% increase in vector instructions. To see which instructions were executing, Geekbench used Intel's Software Development Emulator, or SDE.

With iBOT disabled and after 100 runs of the HDR subtest, Geekbench saw a total of 220 billion scalar instructions and 1.25 billion vector instructions. With iBOT on, that went to 84.6 billion scalar and 18.3 billion vector. By vectorizing a large share of the instructions in this subtest, iBOT significantly improves performance, relying on SIMD (single instruction, multiple data) execution rather than a linear stream of scalar instructions (single instruction, single data, or SISD).
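For readers unfamiliar with the distinction, here is a minimal C++ sketch of the same idea; it is purely illustrative and is not Geekbench's HDR code. The scalar loop processes one value per iteration, while the AVX2 version processes eight floats at a time, which is the kind of transformation that collapses the scalar instruction count while inflating the vector count.

```cpp
// Illustrative scalar (SISD) vs. AVX2 (SIMD) brightness scaling.
// This is a hypothetical example, not Geekbench's actual HDR code.
#include <immintrin.h>
#include <cstddef>

// Scalar version: one multiply per loop iteration.
void scale_scalar(const float* in, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}

// AVX2 version: eight multiplies per loop iteration.
void scale_avx2(const float* in, float* out, std::size_t n, float gain) {
    const __m256 g = _mm256_set1_ps(gain);       // broadcast gain to 8 lanes
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);      // load 8 floats at once
        _mm256_storeu_ps(out + i, _mm256_mul_ps(v, g));
    }
    for (; i < n; ++i)                           // scalar tail for leftovers
        out[i] = in[i] * gain;
}
```

A tool like iBOT performs a comparable rewrite on the already-compiled binary rather than on source code, which is how the instruction mix can shift this dramatically without the application itself changing.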

The change in instruction mix is what's interesting here. Geekbench's conclusion is what you'd probably expect: it doesn't appreciate an optimization that only applies to a short list of applications. "[iBOT] undermines this by replacing that varied code with processor-tuned, fully optimized binaries, measuring peak rather than typical performance."

Geekbench's view is rather negative, and understandably so, but the peek behind the curtain here has a lot of implications for the future of iBOT. Vectorized instructions on modern CPU architectures can vastly improve performance with a relatively small hit to power consumption — just look at Zen 5's performance in an AVX-512 workload like Y-Cruncher. This investigation shows that Intel is able to do that on the backend with a shipping binary.

There are downsides here, however. Geekbench noted a 40-second startup delay in its initial testing with iBOT, which shrank to a consistent two-second delay on subsequent passes. There was no delay with iBOT disabled. Additionally, it found no performance improvement with Geekbench 6.7. iBOT computes a checksum of the executable, which suggests it only applies its optimizations to specific binaries it recognizes.
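Geekbench hasn't detailed how that check works beyond the checksum observation, but checksum-gated matching is easy to picture. Below is a hypothetical C++ sketch (the FNV-1a hash, the lookup table, and the profile name are assumptions for illustration, not Intel's implementation) of how a tool might decide whether it has a pre-built optimization for a given executable.

```cpp
// Hypothetical sketch of checksum-gated binary matching; not Intel's code.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a hash over the executable's bytes (hash choice is assumed).
std::uint64_t fnv1a(const std::vector<char>& bytes) {
    std::uint64_t h = 1469598103934665603ull;
    for (char c : bytes) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream f(argv[1], std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());

    // Table of binaries the tool has optimizations for. A new build of the
    // same app hashes differently, so it falls through and runs unmodified.
    const std::unordered_map<std::uint64_t, std::string> known = {
        {0x1234abcd5678ef00ull, "example-optimized-profile"},  // placeholder entry
    };

    auto it = known.find(fnv1a(bytes));
    std::cout << (it != known.end()
                      ? "apply optimized profile: " + it->second
                      : std::string("unrecognized binary, run unmodified"))
              << '\n';
}
```

That matching step is also why a point update like Geekbench 6.7 sees no benefit: the new executable hashes to a value the tool doesn't recognize, so it runs untouched.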

Jake Roach
Senior Analyst, CPUs

Jake Roach is the Senior CPU Analyst at Tom’s Hardware, writing reviews, news, and features about the latest consumer and workstation processors.

  • bit_user
    Nice of them to be transparent about their findings. Reading between the lines, I sense they're a little insulted by how much performance it's implying they left on the table through poor usage of vectorization, in some of those workloads.

    Now, a nice question to answer would be whether the compilers for ARM CPUs were also more aggressive at vectorization, as that could help explain some of the giant lead in GB6 scores that Apple and Qualcomm have pulled vs. x86.
    Reply
  • wakuwaku
    bit_user said:
    Nice of them to be transparent about their findings. Reading between the lines, I sense they're a little insulted by how much performance it's implying they left on the table through poor usage of vectorization, in some of those workloads.
    You mean reading between Tom's lines? Because there are no lines to read in between if you read the source.

    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense. But for now it might as well be a purely synthetic benchmark that tests one kind of workload.
    Reply
  • bit_user
    wakuwaku said:
    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense.
    They said the big gains were:
    "object removal at a 24.6% jump and HDR processing at a 28.5% jump"
    I'm not sure how knowledgeable you are about those sorts of algorithms, but they align closely with standard use cases for vectorization. The only surprising thing here is that the source code wasn't more explicitly vectorized, from the outset. It's pretty much a baseline expectation for code like that to be vectorized.
    Reply
  • powwow84
    Some of the replies here seem a bit backwards, maybe intentionally so. The issue Geekbench has is that it's supposed to tell you how you can expect the hardware to perform, and Intel is trying to game things for benchmarks in a way a user wouldn't see if they bought and deployed the product.

    To put this in simple terms, imagine a task that generally takes 3 seconds on a CPU and runs occasionally. You can make it take 2.2 seconds, but it will have to precompile the task into an optimized function, which takes 20 seconds. That makes a benchmark look faster, especially if you are struggling in benchmarks, but your occasionally-run task now takes dramatically longer for the user when they encounter it, so there's no way it will actually be used. Hence why nobody else is doing it, and Intel is trying to portray an apples-to-oranges comparison as apples-to-apples.
    Reply
  • TerryLaze
    wakuwaku said:
    You mean reading between Tom's lines? Because there are no lines to read in between if you read the source.

    Geekbench staff clearly said that Geekbench was designed to reflect workloads using different techniques. If that is their philosophy, then using more of a single technique, vectorization in this case, just because it ekes out more performance in their workloads makes zero sense. That doesn't reflect the real world. If the real world uses a lot of vectorization, sure, it makes more sense. But for now it might as well be a purely synthetic benchmark that tests one kind of workload.
    iBOT cannot change the amount or type of workload there is.......
    Geekbench already had all of the vectorizable work in it, it just did it in a bad way, either because that's what they found in most real apps or... because they are the bad app... so, same thing.
    powwow84 said:
    Some of the replies here seem a bit backwards, maybe intentionally so. The issue Geekbench has is that it's supposed to tell you how you can expect the hardware to perform, and Intel is trying to game things for benchmarks in a way a user wouldn't see if they bought and deployed the product.

    To put this in simple terms, imagine a task that generally takes 3 seconds on a CPU and runs occasionally. You can make it take 2.2 seconds, but it will have to precompile the task into an optimized function, which takes 20 seconds. That makes a benchmark look faster, especially if you are struggling in benchmarks, but your occasionally-run task now takes dramatically longer for the user when they encounter it, so there's no way it will actually be used. Hence why nobody else is doing it, and Intel is trying to portray an apples-to-oranges comparison as apples-to-apples.
    It doesn't precompile, it's real time.
    Reply
  • usertests
    TerryLaze said:
    It doesn't precompile, it's real time.
    I think the point is that iBOT only works with a handful of games/apps, so it's not representative of the CPU's performance.

    If Intel "games" almost everything that runs, we could just call it a special IPC increase.

    Maybe Intel will move towards that but they have to avoid triggering anti-cheat detectors and other security measures.
    Reply
  • icebox768
    Additionally, it found no performance improvement with Geekbench 6.7. iBOT computes a checksum of the executable, which suggests it only applies its optimizations to specific binaries it recognizes.
    So even a minor software update will disable this feature. It interferes with the normal operation of the software and requires an unknown number of Intel personnel for targeted adaptation, and it's limited to specific versions of specific software.

    I don't know what you guys think, but I usually call this kind of behavior "cheating."
    Reply
  • patriotpa
    ...almost like GB was compiled to deliberately de-optimize the workload on certain CPUs for an easily guessed reason. $$$$
    Reply
  • Gururu
    Seems pretty cool. Not sure Geekbench is the mark to get all twisted up about. It’s the real applications that matter, and if it works for the reviewers it should work for us the same way. There will be plenty of reviewers that scour the archives to find unsupported games for “fairer” comparisons. I’d say that would be cheating…
    Reply
  • King_V
    TerryLaze said:
    It doesn't precompile, it's real time.
    What, then, is that initial 40-second delay on the first run?
    Reply