Intel finds root cause of CPU crashing and instability errors, prepares new and final microcode update

Raptor Lake (Image credit: Intel)

Intel acknowledged in late July that its microcode was making 13th and 14th Generation Core 'Raptor Lake' processors request voltages beyond safe limits, but it had not delivered a precise diagnosis until now. The company has outlined an issue it calls Vmin Shift Instability, which can occur under four operating conditions.

The problem stems from a clock tree circuit in the IA core that is prone to failure under high voltage and temperature, causing a shift in the clock duty cycle and leading to system instability. Intel has pinpointed four key operating conditions that trigger this issue and implemented mitigations through various microcode updates.

  • First, motherboard power settings exceed Intel's recommended guidelines, causing Vmin shift. Intel advises users to follow its default power settings to avoid this problem. 
  • Second, the eTVB (Enhanced Thermal Velocity Boost) microcode allowed certain 13th and 14th Generation Core i9 processors to stay in high performance states even at elevated temperatures. This was corrected with the 0x125 microcode update released in June 2024.
  • Third, the SVID microcode sometimes requests higher voltages over an extended period, increasing the risk of instability. Intel resolved this with the 0x129 microcode update, distributed in August 2024. 
  • Lastly, both microcode and BIOS were requesting elevated voltages during idle or light activity. This is mitigated by the 0x12B microcode update, which also combines the previous fixes.
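
For readers who want to see which microcode revision their system is running, the comparison against 0x12B can be sketched as below. This is a minimal sketch, not an Intel tool: it assumes a Linux system where `/proc/cpuinfo` exposes a `microcode` field, and the helper names `parse_microcode_revision` and `has_target_microcode` are hypothetical. Revision numbers are specific to a CPU family, so the check is only meaningful on an affected Raptor Lake desktop processor.

```python
import re
from typing import Optional

# Revision the article names as combining the earlier fixes.
TARGET_REVISION = 0x12B


def parse_microcode_revision(cpuinfo_text: str) -> Optional[int]:
    """Return the first 'microcode' revision in /proc/cpuinfo-style text, or None."""
    match = re.search(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", cpuinfo_text, re.MULTILINE
    )
    return int(match.group(1), 16) if match else None


def has_target_microcode(
    cpuinfo_text: str, target: int = TARGET_REVISION
) -> Optional[bool]:
    """True if the reported revision is at least `target`; None if no revision is found."""
    revision = parse_microcode_revision(cpuinfo_text)
    if revision is None:
        return None
    return revision >= target


if __name__ == "__main__":
    try:
        # Assumption: Linux exposes the loaded microcode revision here.
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        text = ""
    result = has_target_microcode(text)
    if result is None:
        print("Could not read a microcode revision on this system.")
    elif result:
        print("Microcode revision is 0x12B or newer.")
    else:
        print("Microcode revision is older than 0x12B; check for a BIOS update.")
```

A revision older than 0x12B on an affected chip suggests the motherboard BIOS has not yet picked up the update; the script only reports the number and cannot tell whether a processor has already degraded.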

Intel's internal tests show that the 0x12B update does not noticeably affect performance. Benchmarks and gaming tests, including popular titles like Cyberpunk 2077 and Shadow of the Tomb Raider, showed results within normal expected variations when compared to the earlier 0x125 update. 

Intel is working with motherboard makers to ensure the 0x12B microcode update is distributed via BIOS updates. This rollout may take several weeks, but Intel is pushing for quick validation and implementation. 

Intel also took time to assure its customers once again that its existing mobile processors, as well as its upcoming processors codenamed Lunar Lake and Arrow Lake, are not affected by this issue.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • bit_user
    The drip-feed of these mitigations doesn't inspire a ton of confidence. How do we know they aren't just pacing them out until Arrow Lake launches, at which point they'll release the final mitigation that actually has a significant performance impact.

    The article said:
    Intel's internal tests show that the 0x12B update does not noticeably affect performance. Benchmarks and gaming tests, including popular titles like Cyberpunk 2077 and Shadow of the Tomb Raider, showed results within normal expected variations when compared to the earlier 0x125 update.
    Yeah, relative to 0x125 - and that includes running at stock power limits and Tau, which is not what a lot of gaming boards defaulted to, or how most reviewers benchmarked it.

    I'd love to see how much performance the i9-14900K lost, between those original reviews and this latest update.
  • Marlin1975
    Hahaha. So this time it's going to be fixed.... trust Intel... haha

    I have to agree. This just seems like they are delaying recalls and hope people will forget.
  • ThisIsMe
    In a world of infinite possibilities, that certainly is a possibility. No need to stop there you two. Keep the imagination flowing. lol
  • setx
    Look, it's 14+++ again!
  • rluker5
    I don't know if messing with how load line calibration works is a good thing or not. That is what this microcode update is doing. And it is doing it because the motherboard manufacturers are setting the LLC to have too much vdroop by default.

    When you apply a lot of amps to your CPU and the motherboard has set the LLC to a large amount of vdroop the volts drop a lot. But they have to stay high enough to be stable. So the time when the CPU is not running a lot of amps the voltages will be much higher.

    If you lower the vdroop you can set your load voltages the same and your low load voltages will be much lower.

    What the motherboard manufacturers have done, in an attempt to bend the rules to apply a factory undervolt, is responsible for nearly all of the degradation issues.
  • TheHerald
    bit_user said:
    The drip-feed of these mitigations doesn't inspire a ton of confidence. How do we know they aren't just pacing them out until Arrow Lake launches, at which point they'll release the final mitigation that actually has a significant performance impact.


    Yeah, relative to 0x125 - and that includes running at stock power limits and Tau, which is not what a lot of gaming boards defaulted to, or how most reviewers benchmarked it.

    I'd love to see how much performance the i9-14900K lost, between those original reviews and this latest update.
    Why would it have significant performance impact? Even if it's let's say restricted to the 12900k or ks clockspeeds (which supposedly have no such issue) performance should be around 36k in cbr23 at 253w PL2. The 8 extra ecores are scoring 9k on their own.
  • RodroX
    Too bad the article forgot to mention that once the damage is done to your chip, there's no turning back, and that Intel should accept an RMA request for it.
  • Elusive Ruse
    bit_user said:
    The drip-feed of these mitigations doesn't inspire a ton of confidence. How do we know they aren't just pacing them out until Arrow Lake launches, at which point they'll release the final mitigation that actually has a significant performance impact.


    Yeah, relative to 0x125 - and that includes running at stock power limits and Tau, which is not what a lot of gaming boards defaulted to, or how most reviewers benchmarked it.

    I'd love to see how much performance the i9-14900K lost, between those original reviews and this latest update.
    TH owes us a couple of new benchmarks, the post damage mitigation versions of Raptor lake and Windows 24H2 update that lifted Zen4 and 5 performance.
  • great Unknown
    "All of the known Raptor Lake issues are now mitigated."

    In my programming days, I tended to comment my code with
    statements like, "This prevents such-and-such problems. FLW."

    Famous Last Words.
  • EzzyB
    bit_user said:
    The drip-feed of these mitigations doesn't inspire a ton of confidence. How do we know they aren't just pacing them out until Arrow Lake launches, at which point they'll release the final mitigation that actually has a significant performance impact.


    Yeah, relative to 0x125 - and that includes running at stock power limits and Tau, which is not what a lot of gaming boards defaulted to, or how most reviewers benchmarked it.

    I'd love to see how much performance the i9-14900K lost, between those original reviews and this latest update.
    Probably none, if you consider that stock power limits are the manufacturer's recommended power limits. The fact that board makers and reviewers test things outside those limits is absolutely not Intel's fault.

    It's interesting that Puget Systems, a manufacturer of high-end workstations, reported that at the same time they actually had more failures in AMD processors (though no higher than normal.) The reason? They never exceeded Intel's recommended power limits.

    This whole thing smacks me like the techno-amateur with his 15% "stable overclock" (the long standing Greatest Lie in Computing) ranting about a BSOD and how Windows sucks! I mean it passed Prime 95, right?

    Think about this: I'm still using a system based on a 9700K. I bought the components years ago, assembled them and, right out of the box, the processor that is supposed to run 6 of its 8 cores at a 3.6 GHz base was running every core at a base of 4.8 GHz with no intervention whatsoever from the user. This is because, at some point, Tom's Hardware or some other site is going to review that board and they need it to seem as fast or faster than other boards. Millions are spent because this board does 197 FPS instead of that board at 194 FPS, and our brains can't even tell the difference!

    The fact that all these mitigations are forcing the CPU to not accept out of parameter power should tell you something, right?