Intel doesn't have a tool to detect if a chip is affected by crashing errors yet — Intel Default Settings still recommended after patch is applied, but power limits can be raised

(Image credit: Tom's Hardware)

Intel says it hasn’t yet developed a reliable tool to detect whether a Raptor Lake or Raptor Lake Refresh processor has been affected by the instability issue, which can permanently damage a chip. Nevertheless, the company told Tom’s Hardware that it “continues to investigate the possibility of a detection tool” and “will issue an update if one becomes available.” The company also clarified its stance on power limits under the new microcode, which can prevent the crashing errors on chips that haven't experienced instability yet.

Several 13th- and 14th-gen Intel Core chips have suffered instability issues that led to system crashes, and it took the company a few months to find the root cause. Intel finally released its findings in late September, along with a microcode update that is expected to fully address the issue, at least for processors whose clock tree circuit hasn't already degraded.

Unfortunately, the microcode update cannot reverse any damage already done to your processor. So, even if you install it as soon as it becomes available for your motherboard, you might still experience some instability. Thankfully, Intel added two years on top of the existing three-year warranty for all affected processors from the Intel Core i5, i7, and i9 families, giving them five years of coverage. So, even if you run into a problem a couple of years from now, you could still RMA the affected chip and get a new one without the issue (or maybe even a free upgrade).

However, it’s still inconvenient to discover that your Intel Core processor is dead or dying only as it happens. After all, getting a replacement chip through the RMA process could take a few days, leaving you without your computer during that time unless you have a backup device. That’s why a tool that could detect whether your CPU has degraded and is about to fail would be welcome.

Aside from the microcode updates, the chipmaking giant has advised everyone to stick to the Intel Default Settings as part of the mitigation against the Vmin shift causing the instability issues. “The microcode update 0x12B (which includes previous microcode updates 0x125 and 0x129), in addition to Intel Default settings, is the full mitigation for the Intel 13th and 14th Generation Desktop Processor Vmin Shift Instability issue,” Intel said. There are a few caveats, though.
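On Linux, the kernel reports the microcode revision each core is running in the "microcode" field of /proc/cpuinfo, so you can at least confirm that 0x12B (or later) is actually loaded. The following is a minimal sketch, not an Intel tool: it assumes Linux's /proc/cpuinfo format and that Intel's revision numbers only increase from one release to the next, and it cannot tell you whether a chip has already degraded.

```python
import os

# Revision Intel calls the full mitigation (it includes 0x125 and 0x129).
FULL_MITIGATION_REVISION = 0x12B


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect every 'microcode' value found in Linux /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            try:
                # Values look like '0x12b'; base 16 accepts the 0x prefix.
                revisions.add(int(value.strip(), 16))
            except ValueError:
                pass  # skip empty or malformed values
    return revisions


def has_full_mitigation(cpuinfo_text: str) -> bool:
    """True only if at least one revision was found and all are 0x12B or later."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= FULL_MITIGATION_REVISION


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        found = [hex(r) for r in sorted(microcode_revisions(text))]
        print("Microcode revisions:", found)
        print("0x12B or later on every core:", has_full_mitigation(text))
    else:
        print("No /proc/cpuinfo found; this sketch assumes Linux.")
```

Windows users would need a different way to read the revision, such as their motherboard's BIOS information screen.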

Intel says that while using its default power profile is still recommended after applying the patches, users are free to raise the PL1 and PL2 power limits beyond the 'recommended values' (as high as 4096W) and still remain within warranty. However, users should still follow the safety settings, such as IccMax and the other settings listed at the top of Intel's table, to stay within warranty.


(Image credit: Intel)

Intel says its warranty will still cover processors that use higher power delivery profiles, as long as they remain within the IccMax threshold of the particular processor. So, users can still adjust the PL1/PL2 settings of their Intel Core CPUs without losing warranty coverage. However, the company also says, “Users who desire to overclock or utilize higher power delivery settings than recommended can still do so at their own risk as overclocking may void warranty or affect system health.”
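Intel's warranty guidance boils down to a simple rule: PL1 and PL2 can go as high as 4096W, but IccMax must stay within the Intel Default Settings value for that specific processor. Here is a minimal Python sketch of that rule. The PowerSettings and warranty_issues names are hypothetical, the per-processor IccMax has to be looked up in Intel's table (no real values are hard-coded), and the table's other safety settings are not checked, so treat it as an illustration rather than a warranty guarantee.

```python
from dataclasses import dataclass

# Ceiling cited in the article: PL1/PL2 can be raised "to 4096W".
MAX_POWER_LIMIT_W = 4096


@dataclass
class PowerSettings:  # hypothetical container for BIOS values
    pl1_w: int     # long-duration power limit (PL1), watts
    pl2_w: int     # short-duration power limit (PL2), watts
    iccmax_a: int  # current limit (IccMax), amps


def warranty_issues(settings: PowerSettings, default_iccmax_a: int) -> list:
    """Return reasons the settings break the PL1/PL2/IccMax guidance.

    default_iccmax_a must be looked up in Intel's Intel Default Settings
    table for the specific processor. Other settings in that table are
    not checked here.
    """
    issues = []
    for name, value in (("PL1", settings.pl1_w), ("PL2", settings.pl2_w)):
        if value > MAX_POWER_LIMIT_W:
            issues.append(f"{name} {value} W exceeds {MAX_POWER_LIMIT_W} W")
    if settings.iccmax_a > default_iccmax_a:
        issues.append(
            f"IccMax {settings.iccmax_a} A exceeds the default {default_iccmax_a} A"
        )
    return issues


if __name__ == "__main__":
    # 300 A is a made-up placeholder, not a value from Intel's table.
    print(warranty_issues(PowerSettings(4096, 4096, 300), default_iccmax_a=300))
    # prints []
    print(warranty_issues(PowerSettings(200, 200, 320), default_iccmax_a=300))
    # prints ['IccMax 320 A exceeds the default 300 A']
```

In other words, maxed-out PL1/PL2 values pass on their own; it is the current limit that decides whether a profile stays inside Intel's guidance.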

Jowi Morales
Contributing Writer

Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, with a focus on tech hardware and consumer electronics.

  • -Fran-
    Didn't nVidia have a tool for this? You need an nVidia GPU though, heh.

    Also, in case the sarcasm wasn't obvious: the reports of the issues started because people were getting errors when installing/updating nVidia drivers and/or compiling shaders in games.

    I'm sure you could simulate that load in a test to stress the chips for this specific issue. At least it'd be something, rather than just waiting for your CPU to die or stop working correctly?

    Regards.
  • bit_user
    I think it's really not in Intel's interest to develop such a tool. It would simply amount to a stress test, and people who might not be experiencing degradation-related crashes would run it, but they would certainly RMA their CPU if the tool managed to trigger one.

    In other words, it would probably just lead to more RMAs, not fewer.
  • Amdlova
    The 13600T I have has some issues. When I clear the bios, the CPU doesn't work anymore. I need to put in another CPU, then swap mine back in after that.
    I think it has a bad IMC or is just a dying Intel brick.
  • bit_user
    Amdlova said:
        The 13600T I have has some issues. When I clear the bios, the CPU doesn't work anymore. I need to put in another CPU, then swap mine back in after that.
        I think it has a bad IMC or is just a dying Intel brick.

    Sorry to hear that, bro. Good luck getting your system recovered. Let us know what else you learn about the state of that CPU.
    : (
  • rluker5
    I don't like how this article claims that all possible instability is due to vmin shift aka excessive voltage induced degradation.

    I don't have vmin shift but I have experienced instability from what I believe are more common sources such as:

    Excessive undervolting for better efficiency and/or better clocks within my cooling budget. This has at one time or another affected nearly every chip I own, yet is easily permanently fixed by raising the voltage closer to stock. You even find out how far you can undervolt by continuing until you get instability. Then you increase voltage until it is stable. Motherboard default settings also undervolt your CPU when it is under load with their default LLC settings.

    Enabling XMP, further RAM overclocking, and tightening timings. In previous years, one could just enable XMP and assume that it worked. This is no longer the case, especially if you have a lower-priced 4-DIMM motherboard. If you buy an XMP kit of 8000 MT/s RAM, it is almost certainly going to be unstable at XMP settings. Also, RAM overclocks and tightened timings are tested to the point of creating instability, then pulled back a bit.

    Right now I have 73 mods active when I run CP2077. Mods aren't always stable, and the more you have, the greater the chance of instability. Sometimes game updates break mods. The same can be said for some Windows updates, like the latest one that had to be pulled because some systems would no longer boot. Some drivers are poor and/or don't work with the latest versions of Windows, like soundcard drivers.

    Claiming that these three sources of instability I listed (and all others) are the same as chip degradation is a lie. And it is doing a disservice to people with different sources of instability to tell them that everything is because their chip has degraded.

    Instability that persists when testing with the other likely causes removed indicates vmin shift. Simple instability in general use does not indicate vmin shift; it only indicates the possibility of it.

    Excessive voltage is the main cause. If you use voltage monitoring software and your voltages are frequently above 1.5v, your CPU is likely at high risk. If your CPU never gets over 1.4v, you are at very low risk of voltage degrading your CPU.
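rluker5's rule of thumb can be written as a small classifier over logged voltage readings. This sketch encodes the commenter's heuristic, not Intel guidance. It assumes the readings are core voltages in volts exported from whatever monitoring tool you use, and the 5% cutoff standing in for "frequently" is an arbitrary assumption.

```python
def voltage_risk(samples_v, frequent_fraction=0.05):
    """Classify core-voltage readings (volts) with the commenter's rule of thumb.

    frequent_fraction is an assumed stand-in for "frequently": the share of
    readings above 1.5 V that counts as frequent. It is not an Intel figure.
    """
    if not samples_v:
        raise ValueError("need at least one voltage reading")
    if not 0 < frequent_fraction <= 1:
        raise ValueError("frequent_fraction must be in (0, 1]")
    share_above_1_5 = sum(v > 1.5 for v in samples_v) / len(samples_v)
    if share_above_1_5 >= frequent_fraction:
        return "high risk"      # frequently above 1.5 V
    if max(samples_v) <= 1.4:
        return "very low risk"  # never over 1.4 V
    return "unclassified"       # the rule of thumb doesn't cover this range


if __name__ == "__main__":
    print(voltage_risk([1.20, 1.35, 1.40]))        # very low risk
    print(voltage_risk([1.30, 1.52, 1.55, 1.45]))  # high risk
```

Readings that peak between 1.4 and 1.5 V, or only rarely cross 1.5 V, fall outside what the comment describes, so the sketch reports them as unclassified rather than guessing.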
  • rluker5
    Amdlova said:
        The 13600T I have has some issues. When I clear the bios, the CPU doesn't work anymore. I need to put in another CPU, then swap mine back in after that.
        I think it has a bad IMC or is just a dying Intel brick.

    That is very vague, so I'll offer some vague responses.
    Usually, when a problem appears after I change something, the thing I changed is the culprit.
    Perhaps the new bios is the issue and the old one was fine? It is a less common chip and some support for it may have dropped.

    I've also had issues with bios settings after resetting them to default while I was making a lot of changes, and what fixed it was reflashing the bios. It seemed that not everything was resetting to optimized defaults when I chose that option, and I had partially modified bios settings that weren't stable. The motherboard recognizing a new chip did a better job of resetting those leftover options. Reflashing the bios reset all of the options and gave a clean bios. But this was only on one or two motherboards. I think most do a good job.

    Also, I have only ever had one CPU fail (specifically, the mobile-to-desktop interposer on a 5950HQ-to-LGA 1150 socket chip) and have had two bios chips fail: one on an eBay-bought motherboard and one on a store-bought motherboard. Both potentially failed from having the plug-in chips shifted in their sockets. Both were fixed by replacing the bios chip, and the store-bought one is still running fine in my daughter's PC.

    Also, I have had socket mounting issues with both delidded CPUs and adapted mobile CPUs, some of them from moving the PC around with a big air cooler installed. I had to play around to get the socket mounting pressure right.

    I'm bringing up bios and mounting because it sounds like bios resets cause the problem and remounting fixes it.
    Maybe, if you are using that old Audigy soundcard or have some USB device plugged in, that might be the issue. I have an old HT Omega Claro PCI soundcard that gets some fuss from a system running a UEFI bios, some old GPUs don't work on UEFI, and I have had some trouble booting when some USB hardware is plugged in, specifically a Creative SoundBlaster Play3 (might be fried) and a USB DVD drive. I've also had a broken Hauppauge TV card block bootup, and a swapped-out wifi card block startup on a motherboard that came from the factory with a different one.

    These were all on LGA1150 motherboards, though. But I have heard that if you aren't careful with one of those aftermarket CPU mounting plates, things don't work well. Both of the PCs I have that use them have been fine, however.

    Perhaps you brought up the IMC because you are getting a RAM error light or code with the 13600T, and your temporary replacement CPU gets the JEDEC RAM timings working, after which your 13600T works with those same timings? I don't think motherboards change anything for JEDEC, but maybe some do. If it is an IMC-related RAM problem, maybe starting on a "cleaned" bios with just one stick, to be easier on the IMC, will help, and then you could go to two? It would be easier, and probably lower risk to your motherboard's fragile bits, to try that first.
  • Amdlova
    I have contacted Intel in my country to see if they will replace the CPU... but there's a fuuu 2D QR code...
    You need that key to send it for RMA, and I can't find any working app to read that QR code. Intel has an old app that can read it :S "Intel RL kit", but you need to find it on Google because it isn't on the Play Store anymore.

    I will try to send this CPU for RMA. I don't think they will have the same CPU; maybe they'll send another 65W CPU.
  • bit_user
    Amdlova said:
        I have contacted Intel in my country to see if they will replace the CPU... but there's a fuuu 2D QR code...
        You need that key to send it for RMA, and I can't find any working app to read that QR code.

    There are some web-based QR code decoders, where I think you just upload the image and it gives you the resulting string. Perhaps you could try those.
  • TheHerald
    rluker5 said:
        Excessive undervolting for better efficiency and/or better clocks within my cooling budget. This has at one time or another affected nearly every chip I own, yet is easily permanently fixed by raising the voltage closer to stock. You even find out how far you can undervolt by continuing until you get instability. Then you increase voltage until it is stable. Motherboard default settings also undervolt your CPU when it is under load with their default LLC settings.

    The problem with undervolting is that you never know if you are stable. Passing stress tests doesn't mean you are stable; it means you haven't crashed yet. That's why it's highly recommended to test your undervolts at high temperatures, like 90+C, and raise the temp limit to 110C while you are stress testing. If you are stable at 100C, then surely you will be stable at 90C or lower.
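The undervolting procedure rluker5 describes (keep lowering the voltage until instability, then back off) combined with TheHerald's caution that a passed stress test doesn't prove stability can be sketched as a step search with a safety margin. The is_stable callback is hypothetical: in practice it would mean applying the offset and running a stress test, ideally at high temperature as TheHerald suggests. The 10 mV step, 200 mV search limit, and one-step margin are arbitrary assumptions, not recommendations.

```python
from typing import Callable


def find_undervolt_offset(
    is_stable: Callable[[int], bool],
    step_mv: int = 10,
    limit_mv: int = 200,
    margin_steps: int = 1,
) -> int:
    """Return a voltage offset in mV (0 or negative): step down, then back off.

    is_stable(offset_mv) is a hypothetical callback that applies the offset,
    runs a stress test, and returns False on a crash or error.
    """
    offset = 0
    # Keep lowering the voltage while the next step still passes the test.
    while offset - step_mv >= -limit_mv and is_stable(offset - step_mv):
        offset -= step_mv
    # offset is the lowest setting that passed; passing is not proof of
    # stability, so move back toward stock by a safety margin.
    return min(0, offset + margin_steps * step_mv)


if __name__ == "__main__":
    # Simulated chip that fails anywhere below -75 mV.
    print(find_undervolt_offset(lambda mv: mv >= -75))  # prints -60
```

The margin is the part that addresses TheHerald's point: the last offset that happened to pass is treated as the edge, not as the setting to run day to day.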
  • rluker5
    TheHerald said:
        The problem with undervolting is that you never know if you are stable. Passing stress tests doesn't mean you are stable; it means you haven't crashed yet. That's why it's highly recommended to test your undervolts at high temperatures, like 90+C, and raise the temp limit to 110C while you are stress testing. If you are stable at 100C, then surely you will be stable at 90C or lower.

    Alder Lake and Raptor Lake both adjust voltage with temperature changes. And even with undervolting, it is not difficult to reach high temperatures with the 13900KF, which throttles at 100C, BTW. How much power do you think an i9 can consume at 1.4v? 150W? It is closer to 350W.

    But as far as stability is concerned, if you don't crash in testing and don't crash in applications, how are you not stable? Applications are also clearly tests. Is there some new definition that doesn't involve crashing, or getting full expected performance numbers?

    Reducing the max voltage that some of these i9s see to normal levels is the way to fix their heat and degradation issues. You can easily have a long-lasting, performance-enhanced chip, as the majority do.

    It does not surprise me that bit_user is against it.