Intel Finds Bug in AMD's Spectre Mitigation, AMD Issues Fix

News of a fresh Spectre BHB vulnerability that only impacts Intel and Arm processors emerged this week, but Intel's research around these new attack vectors unearthed another issue: One of the patches that AMD has used to fix the Spectre vulnerabilities has been broken since 2018. Intel's security team, STORM, found the issue with AMD's mitigation. In response, AMD has issued a security bulletin and updated its guidance to recommend an alternative method of mitigating the Spectre vulnerabilities, thus closing the gap.

As a reminder, the Spectre vulnerabilities allow attackers unhindered and undetectable access to information being processed in a CPU through a side-channel attack that can be exploited remotely. Among other things, attackers can steal passwords and encryption keys, thus giving them full access to an impacted system.

Intel's research into AMD's Spectre fix began in a roundabout way: Intel's processors were recently found to still be susceptible to Spectre v2-based attacks via a new Branch History Injection variant, despite the company's use of the Enhanced Indirect Branch Restricted Speculation (eIBRS) and/or Retpoline mitigations that were thought to prevent further attacks.

In need of a newer Spectre mitigation approach to patch the wide-ranging issue, Intel turned to studying alternative mitigation techniques. There are several other options, but all entail varying levels of performance tradeoffs. Intel says its ecosystem partners asked the company to consider using AMD's LFENCE/JMP technique. The "LFENCE/JMP" mitigation is a Retpoline alternative commonly referred to as "AMD's Retpoline."

As a result of Intel's investigation, the company discovered that the mitigation AMD has used since 2018 to patch the Spectre vulnerabilities isn't sufficient — the chips are still vulnerable. The issue impacts nearly every modern AMD processor spanning almost the entire Ryzen family for desktop PCs and laptops (second-gen to current-gen) and the EPYC family of datacenter chips.

The paper, titled "You Cannot Always Win the Race: Analyzing the LFENCE/JMP Mitigation for Branch Target Injection," lists three authors who hail from Intel's STORM security team: Alyssa Milburn, Ke Sun, and Henrique Kawakami. The abstract sums up the bug the researchers found pretty succinctly:

"LFENCE/JMP is an existing software mitigation option for Branch Target Injection (BTI) and similar transient execution attacks stemming from indirect branch predictions, which is commonly used on AMD processors. However, the effectiveness of this mitigation can be compromised by the inherent race condition between the speculative execution of the predicted target and the architectural resolution of the intended target, since this can create a window in which code can still be transiently executed. This work investigates the potential sources of latency that may contribute to such a speculation window. We show that an attacker can "win the race", and thus that this window can still be sufficient to allow exploitation of BTI-style attacks on a variety of different x86 CPUs, despite the presence of the LFENCE/JMP mitigation."
 
Intel's Strategic Offensive Research and Mitigation group (STORM) is an elite team of hackers that attempts to hack Intel's own chips.

AMD Security Bulletin

In response to the STORM team's discovery and paper, AMD issued a security bulletin (AMD-SB-1026) that states it isn't aware of any currently active exploits using the method described in the paper. AMD also instructs its customers to switch to using "one of the other published mitigations (V2-1 aka ‘generic retpoline’ or V2-4 aka ‘IBRS’)." The company also published updated Spectre mitigation guidance reflecting those changes [PDF].
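For readers on Linux who want to see which Spectre v2 mitigation their kernel reports, the following is a minimal sketch rather than an official tool. It reads the kernel's sysfs status file and applies a simple keyword check. The helper names (`uses_lfence_mitigation`, `read_status`) are hypothetical, and the keyword match is an assumption: the exact status wording differs between kernel versions, so treat the result as a hint only.

```python
from pathlib import Path
from typing import Optional

# Linux sysfs file where the kernel reports its Spectre v2 status.
SPECTRE_V2_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def uses_lfence_mitigation(status: str) -> bool:
    """Heuristically detect the LFENCE/JMP ("AMD retpoline") mitigation.

    Assumption: the kernel's wording contains "lfence" or "amd retpoline".
    The exact text varies by kernel version, so this is only a hint.
    """
    text = status.lower()
    return "lfence" in text or "amd retpoline" in text


def read_status(path: Path = SPECTRE_V2_STATUS) -> Optional[str]:
    """Return the kernel's status line, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    status = read_status()
    if status is None:
        print("No spectre_v2 status file found (not Linux, or an older kernel).")
    else:
        print(status)
        if uses_lfence_mitigation(status):
            print("LFENCE/JMP appears to be active; AMD's bulletin "
                  "recommends generic retpoline or IBRS instead.")
```

On Linux, the choice between these mitigations is typically controlled through the kernel's `spectre_v2=` boot parameter; the accepted values depend on the kernel version, so check the kernel documentation for your release before changing anything.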

AMD commented on the matter to Tom's Hardware, saying, "At AMD, product security is a top priority and we take security threats seriously. AMD follows coordinated vulnerability disclosure practices within the ecosystem, including Intel, and seeks to respond quickly and appropriately to reported issues. For the mentioned CVE, we followed our process by coordinating with the ecosystem and publishing our resulting guidance on our product security website."

We asked Intel whether it had found other vulnerabilities in AMD's processors in the past, or whether this was an isolated event. "We invest extensively in vulnerability management and offensive security research for the continuous improvement of our products. We also work to get outside perspectives, collaborating with researchers and leading academic institutions to find and address vulnerabilities," a company representative responded. "If we identify an issue that we believe may impact the broader industry, we follow coordinated vulnerability disclosure practices to report potential vulnerabilities to vendors and release findings and mitigations together."

Security vulnerabilities obviously make for what would normally be strange bedfellows. In this case, that's a good thing: The Spectre vulnerabilities threaten the very foundations of security in the silicon that powers the world. AMD's security bulletin thanks Intel's STORM team by name and notes that it engaged in coordinated vulnerability disclosure, thus allowing AMD enough time to address the issue before making it known to the public. That's good for everyone.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • The_King
    When can we expect new Benchmarks to show just how much the performance difference there is now, between Intel and AMD after the new Spectre V2 mitigation.

    https://www.tomshardware.com/news/amd-cpus-see-less-than-10-performance-drop-from-revised-spectre-v2-mitigations
  • USAFRet
    The_King said:
    When can we expect new Benchmarks to show just how much the performance difference there is now, between Intel and AMD after the new Spectre V2 mitigation.

    https://www.tomshardware.com/news/amd-cpus-see-less-than-10-performance-drop-from-revised-spectre-v2-mitigations
    With all the previous mitigations, how much performance change did we see?
  • TerryLaze
    The_King said:
    When can we expect new Benchmarks to show just how much the performance difference there is now, between Intel and AMD after the new Spectre V2 mitigation.

    https://www.tomshardware.com/news/amd-cpus-see-less-than-10-performance-drop-from-revised-spectre-v2-mitigations
    Take a look inside the article, don't go by the headline alone, epyc is only impacted by about 10% while the 5950x is upto™ 54% and that is in context-switching which is heavily used in games and even more so when gaming while doing more stuff on your PC at the same time.
    The Ryzen 9 5950X (Vermeer) suffered a 54% performance reduction with the Stress-NG (Context Switching) benchmark.

    Compared to the Ryzen 9 5950X, the Ryzen 9 5900HX (Cezanne) wasn't affected as much with Stress-NG. The mobile Zen 3 chip only saw 22% lower performance.


    with the EPYC 72F3, Phoronix only logged 8.9% lower networking performance and 7.2% lower storage performance.
  • tommo1982
    Which of these Spectre, and other, vulnerabilities can be executed remotely and have been seen to be used? It's starting to look like a PR war to me at the moment.
  • hotaru.hino
    tommo1982 said:
    Which of these Spectre, and other, vulnerabilities can be executed remotely and have been seen to be used? It's starting to look like a PR war to me at the moment.
    On one hand, yes. On the other, a vulnerability with potentially severe consequences if exploited should be patched if there's an actual way to exploit it, regardless of how feasible it is to do so. It's a known vulnerability; someone's going to use it for bad purposes.
  • domih
    TerryLaze said:
    Take a look inside the article, don't go by the headline alone, epyc is only impacted by about 10% while the 5950x is upto™ 54% and that is in context-switching which is heavily used in games and even more so when gaming while doing more stuff on your PC at the same time.

    The chip is going down! Save the children! AMD is dead before the end of the week!

    Mmm... Let's be real and use some facts.

    I tested before and after the patches with Passmark and then with a more real life workload(*) on my TR 3960X as well as my 5950X. In both cases Passmark (CPU and Memory tests) speed degradation was less than 1% and the real life workload speed degradation was less than 3%. Tested on Ubuntu 20.04 LTS.

    (*) A complete unit-testing run on a project from work that takes 10 minutes to complete (heavy computation, crypto security and database).

    I don't use Windows so I let another good Samaritan do the same thing on Win 10 and Win 11.

    This is consistent with what Phoronix found and published (see https://www.phoronix.com/scan.php?page=article&item=amd-retpoline-2022&num=1).

    Tom's Hardware reports the same (see https://www.tomshardware.com/news/amd-cpus-see-less-than-10-performance-drop-from-revised-spectre-v2-mitigations)

    Meanwhile BHI / Spectre-BHB affects INTEL and ARM, but so far AMD is not considered affected.

    CONCLUSION: from my seat, with AMD, it's a non story. The children are fine. The Sun rose this morning.
  • hotaru.hino
    TerryLaze said:
    Take a look inside the article, don't go by the headline alone, epyc is only impacted by about 10% while the 5950x is upto™ 54% and that is in context-switching which is heavily used in games and even more so when gaming while doing more stuff on your PC at the same time.
    Stress-ng is also like prime95 in that it's a torture test. It's right there in the manpage:

    stress-ng was originally intended to make a machine work hard and trip hardware issues such as thermal overruns as well as operating system bugs that only occur when a system is being thrashed hard. Use stress-ng with caution as some of the tests can make a system run hot on poorly designed hardware and also can cause excessive system thrashing which may be difficult to stop.

    I would file this under an unrealistic test, much like prime95. It's only useful in seeing how the processor performs in extreme cases.
  • TerryLaze
    hotaru.hino said:
    Stress-ng is also like prime95 in that it's a torture test. It's right there in the manpage:
    Yes but a stress test heavy in context switching which happens always no matter how many cores you have (because windows) , while prime is heavy in calculating prime numbers that ..is used?! Like at all?! ... Somewhere?!
  • hotaru.hino
    TerryLaze said:
    Yes but a stress test heavy in context switching which happens always no matter how many cores you have (because windows) , while prime is heavy in calculating prime numbers that ..is used?! Like at all?! ... Somewhere?!
    Context switching can be an expensive operation yes, but that means the scheduler is aware of this as well. Threads and processes don't simply swap in and out all the time, they only do that if they're ready to run. And considering that CPU % utilization on any given OS's resource manager (Task Manager, top, etc.) means how often the CPU did not run system idle task, a low CPU % utilization indicates to me there's not a lot of context switching going on.

    Otherwise you're free to prove to us that the numbers stress-ng uses are similar to those in a given user scenario.
  • TerryLaze
    hotaru.hino said:
    Context switching can be an expensive operation yes, but that means the scheduler is aware of this as well. Threads and processes don't simply swap in and out all the time, they only do that if they're ready to run. And considering that CPU % utilization on any given OS's resource manager (Task Manager, top, etc.) means how often the CPU did not run system idle task, a low CPU % utilization indicates to me there's not a lot of context switching going on.

    Otherwise you're free to prove to us that the numbers stress-ng uses are similar to those in a given user scenario.
    If you are buying a 32 thread monster to have it sit idle that's up to you...
    You can always just use it for cinebench and the like where there is zero context switching.