Inside Intel's Secret Overclocking Lab: The Tools and Team Pushing CPUs to New Limits

How to Void Your Warranty as Safely as Possible

Intel spends a tremendous amount of time and treasure ensuring that its chips will run beyond the rated speed, thus delivering some value for the extra dollars you plunk down for an overclockable K-Series chip. However, in spite of what some might think given Intel's spate of overclocking-friendly features and software, we have to remember that, unless you pay an additional fee for an insurance policy, overclocking voids your warranty.

The reason behind that is simple, but the physics is mind-bogglingly complex. Every semiconductor process has a point on its voltage/frequency curve beyond which a processor will wear out at an untenable rate. Much of that wear comes from electromigration (the gradual displacement of metal atoms in the chip's interconnects by the electrons flowing through them), which leads to premature chip death. Some factors are known to increase the rate of wear, such as the higher current and thermal density that come as a result of overclocking.

All this means that, like the carton of milk in your refrigerator, your chip has an expiration date. It's the job of semiconductor engineers to predict that expiration date and control it with some accuracy, but Intel specs lifespan at out-of-the-box settings. Because increasing frequency through overclocking requires pumping more power through the chip, thus generating more heat, higher frequencies typically result in faster aging and a shorter lifespan.

In other words, all bets are off for Intel's failure rate predictions once you start bumping up the voltage. But there are settings and techniques that overclockers can use to minimize the impact of overclocking, and if done correctly, premature chip death from overclocking isn't a common occurrence.

Because Intel doesn’t cover overclocking with its warranty, the company doesn't specify what it would consider 'safe' voltages or settings.

But we're in a lab with what are arguably some of the smartest overclockers in the world, and these engineers spend their time analyzing failure rate data (and its relationship to the voltage/frequency curve) that will never be shared with the public.

We're aware that, due to company policy, the engineers couldn't give us an official answer to the basic question of what is considered a safe voltage, but that didn't stop us from asking what voltages and settings the lab members use in their own home machines. Given that, for a living, they study data that quantifies life expectancy at given temperatures and voltages... Well, connect the dots. 

Speaking as enthusiasts, the engineers told us they feel perfectly fine running their Coffee Lake chips at home at 1.4V with conventional cooling, which is higher than the 1.35V we typically recommend as the 'safe' ceiling in our reviews. For Skylake-X, the team says they run their personal machines anywhere from 1.4V to 1.425V if they can keep it cool enough, with the latter portion of the statement being strongly emphasized.

At home, the lab engineers consider a load temperature above 80C to be a red alert, meaning that's the no-fly zone, but temps that remain steady in the mid-70s are considered safe. The team also strongly recommends using adaptive voltage targets for overclocking, leaving C-States enabled, and using AVX offsets to keep temperatures in check during AVX-heavy workloads.
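For readers who like to keep their limits written down, the rules of thumb above can be captured in a few lines of Python. This is only a sketch: the function and constant names are our own invention, the numbers are the engineers' personal at-home figures as quoted above (not official Intel guidance), and the article gives no verdict for load temperatures between the mid-70s and 80C, so the check only flags readings above 80C.

```python
# Rule-of-thumb figures quoted by the engineers for their own home machines.
# These are NOT official Intel limits; the names below are hypothetical.
VCORE_CEILING_V = {
    "coffee_lake": 1.40,   # "perfectly fine" with conventional cooling
    "skylake_x": 1.425,    # upper end, only "if they can keep it cool enough"
}
RED_ALERT_LOAD_TEMP_C = 80.0  # load temperatures above this are the "no-fly zone"


def check_overclock(family: str, vcore_v: float, load_temp_c: float) -> list[str]:
    """Return a list of warnings; an empty list means no quoted limit is exceeded."""
    warnings = []
    ceiling = VCORE_CEILING_V[family]
    if vcore_v > ceiling:
        warnings.append(f"Vcore {vcore_v:.3f}V exceeds the {ceiling:.3f}V rule of thumb")
    if load_temp_c > RED_ALERT_LOAD_TEMP_C:
        warnings.append(f"Load temperature {load_temp_c:.0f}C is above the 80C red alert")
    return warnings


if __name__ == "__main__":
    for problem in check_overclock("skylake_x", vcore_v=1.45, load_temp_c=85):
        print(problem)
```

An empty result does not make a setting 'safe' in any warranted sense; it only means the setting sits inside what the lab members say they run themselves.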

As Ragland explained, the amount of time a processor stays in elevated temperature and voltage states has the biggest impact on lifespan. You can control the temperature of your chip with better cooling, which increases lifespan as long as the voltage stays the same. At a constant voltage, each successive drop in temperature results in a non-linear increase in life expectancy, so the 'first drop' in temps from 90C to 80C yields a huge increase in chip longevity. In turn, colder chips run faster at lower voltages, so dropping the temperature significantly by using a beefier cooling solution also allows you to drop the voltage further, which then helps control the voltage axis.
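To get a feel for why the temperature relationship is non-linear, here is a minimal sketch using the textbook Arrhenius acceleration factor, a standard model in reliability engineering. Intel did not share its models or data with us, so this is not the lab's method. The activation energy below is an illustrative assumption rather than a figure for any real chip, and the function name is our own.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
ACTIVATION_ENERGY_EV = 0.7           # ASSUMPTION: illustrative value, not Intel data


def lifetime_gain(hot_c: float, cool_c: float,
                  activation_energy_ev: float = ACTIVATION_ENERGY_EV) -> float:
    """Estimated lifetime multiplier from running at cool_c instead of hot_c.

    Arrhenius model at constant voltage:
        gain = exp((Ea / k) * (1 / T_cool - 1 / T_hot)), with T in kelvin.
    """
    t_hot_k = hot_c + 273.15
    t_cool_k = cool_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Compare successive 10C drops under the assumed activation energy.
    for hot, cool in [(90, 80), (80, 70), (70, 60)]:
        print(f"{hot}C -> {cool}C: estimated lifetime x{lifetime_gain(hot, cool):.2f}")
```

Because temperature sits inside an exponential, each drop multiplies the estimated lifetime instead of adding a fixed number of hours to it. The size of the multiplier depends entirely on the assumed activation energy, so treat the printed values as a shape, not a prediction.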

In the end, though, voltage is the hardest variable to contain. Ragland pointed out that voltages are really the main limiter that prevents Intel from warrantying overclocked processors, as higher voltages definitely reduce the lifespan of a processor.

But Ragland has some advice: "As an overclocker, if you manage these two [voltage and temperature], but especially think about 'time in state' or 'time at high voltage,' you can make your part last quite a while if you just think about that. It's the person that sets their system up at elevated voltages and just leaves it there 24/7 [static overclock], that's the person that is going to burn that system out faster than someone who uses the normal turbo algorithms to do their overclocking so that when the system is idle your frequency drops and your voltage drops with it. So, there's a reason we don't warranty it, but there's also a way that overclockers can manage it and be a little safer."

That means manipulating the turbo boost ratio is much safer than assigning a static clock ratio via multipliers. As an additional note, you should shoot for idle temperatures below 30C, though that isn't much of a problem if you overclock via the normal turbo algorithms as described by Ragland.
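Ragland's 'time in state' point can be illustrated with a toy model that weights the hours spent in each voltage/temperature state by a relative wear rate. Everything numeric here is an assumption made for illustration: the activation energy, the exponential voltage term and its coefficient, the reference point, and the two daily profiles are hypothetical and do not come from Intel. The only claim is the qualitative one from the quote: dropping voltage and temperature at idle reduces accumulated wear.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
ACTIVATION_ENERGY_EV = 0.7   # ASSUMPTION: illustrative, not Intel data
VOLTAGE_ACCEL_PER_V = 10.0   # ASSUMPTION: hypothetical exponential voltage coefficient
REF_VCORE_V = 1.0            # ASSUMPTION: arbitrary reference state (wear rate = 1.0)
REF_TEMP_C = 30.0


def wear_rate(vcore_v: float, temp_c: float) -> float:
    """Relative wear per hour versus the reference state (toy model)."""
    t_k = temp_c + 273.15
    t_ref_k = REF_TEMP_C + 273.15
    thermal = math.exp((ACTIVATION_ENERGY_EV / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_k))
    electrical = math.exp(VOLTAGE_ACCEL_PER_V * (vcore_v - REF_VCORE_V))
    return thermal * electrical


def daily_wear(states: list[tuple[float, float, float]]) -> float:
    """Sum of hours * wear rate over (hours, vcore_v, temp_c) states."""
    return sum(hours * wear_rate(vcore_v, temp_c) for hours, vcore_v, temp_c in states)


# Hypothetical day: 4 hours under load, 20 hours idle.
static_oc = [(4, 1.40, 75), (20, 1.40, 40)]    # voltage pinned high around the clock
adaptive_oc = [(4, 1.40, 75), (20, 0.80, 30)]  # voltage and temperature drop at idle

if __name__ == "__main__":
    ratio = daily_wear(static_oc) / daily_wear(adaptive_oc)
    print(f"Static profile accumulates {ratio:.1f}x the wear of the adaptive one (toy model)")
```

Both profiles spend the same four hours at full voltage; the difference comes entirely from the 20 idle hours, which is exactly the window the normal turbo algorithms and C-States are meant to cover.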

Feedback 

Hanging out with Intel's OC lab team was certainly a learning experience. The engineers have a passion for their work that's impossible to fake: Once you start talking shop you can get a real sense of a person's passion, or lack thereof, for their craft. From our meeting, we get the sense that Intel's OC lab crew members measure up to any definition of true tech enthusiasts, and we got the very real impression this is more than "just a job" to them.

As at any company, there are certain things the engineers simply aren't allowed to answer, but they were forthright about what they could share and what they couldn't. We're accustomed to slippery non-answer answers to our questions from media-trained representatives (from pretty much every company) when a simple "I can't answer that" would suffice. There were plenty of "I can't answer that" responses during our visit, but we appreciate the honesty.

We peppered the team with questions, but they also asked us plenty of questions. The team was almost as interested in our observations and our take on the state of the enthusiast market as we were interested in their work, which is refreshing. We had a Q&A session where we were free to give feedback, and while Ragland obviously can't make the big C-suite-level decisions, his team sits at the nexus of the company's overclocking efforts, so we hope some of our feedback is taken upstream.

In our minds, overclocking began all those years ago as a way for enthusiasts to get more for their dollar. Sure, it's a wonderful hobby, but the underlying concept is simple: Buy a cheaper chip and spend some time tuning to unlock the performance of a more expensive model. Unfortunately, over the years, Intel's segmentation practices have turned overclocking into a premium-driven affair, with prices for overclockable chips taking some of the shine off the extra value. Those same practices have filtered out to motherboard makers, too. We should expect to pay extra for the premium components required to unlock the best of overclocking, but in many cases, the "overclocking tax" has reached untenable levels. 

While segmentation is good for profits, it also leaves Intel ripe for disruption. That disruptor is AMD, which freely allows overclocking on every one of its new chips.

Unfortunately, we can't rationally expect Intel to suddenly unlock every chip and abandon a segmentation policy that has generated billions of dollars in revenue, but there are reasonable steps it could take to improve its value proposition.

Case in point: Intel's policy of restricting overclocking to pricey Z-series motherboards. AMD allows overclocking on nearly all of its more value-centric platforms (A-series excluded), which makes overclocking more accessible to mainstream users. We feel very strongly that Intel should unlock overclocking on its downstream B- and H-series chipsets to open up overclocking to a broader audience.

Intel's team expressed concern that the power delivery subsystems on many of these downstream motherboards aren't suitable for overclocking, which is a fair observation. However, there is surely at least some headroom for tuning, and we bandied about our suggestion of opening up some level of restricted overclocking on those platforms. Remember, back in the Sandy Bridge era, Intel restricted overclocking to four bins (400 MHz), so there is an established method to expose at least some headroom. We're told our feedback will be shared upstream, and hopefully it is considered. We'd like to see Intel become more competitive in this area, as that would benefit enthusiasts on both sides of the ball. 

We also suggested that Intel work on a more dynamic approach to its auto-overclocking mechanisms. AMD's Precision Boost Overdrive (PBO) opened up overclocking to less-knowledgeable users by creating a one-click tool to auto-overclock your system. Intel's relatively new Intel Performance Maximizer (IPM) is also a great one-click tool that accomplishes many of the same goals, but it is based on static overclock settings that don't automatically adapt in real time to the chip's properties or changes in thermal conditions. Instead, you have to re-run the utility. We'd like to see a more dynamic approach taken, and we're told that Intel is already evaluating that type of methodology.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • Dark Lord of Tech
    Can you get the AMD tour? Would love to see that.
  • PaulAlcorn
    Dark Lord of Tech said:
    Can you get the AMD tour? Would love to see that.

    I'll jump on a plane the second it is offered :)
  • bit_user
    @PaulAlcorn , thanks for the awesome piece!

    I'm still making my way through it, but wanted to draw special attention to this bit:
    the engineers told us they feel perfectly fine running their Coffee Lake chips at home at 1.4V with conventional cooling, which is higher than the 1.35V we typically recommend as the 'safe' ceiling in our reviews. For Skylake-X, the team says they run their personal machines anywhere from 1.4V to 1.425V if they can keep it cool enough, with the latter portion of the statement being strongly emphasized.

    At home, the lab engineers consider a load temperature above 80C to be a red alert, meaning that's the no-fly zone, but temps that remain steady in the mid-70’s are considered safe. The team also strongly recommends using adaptive voltage targets for overclocking and leaving C-States enabled. Not to mention using AVX offsets to keep temperatures in check during AVX-heavy workloads.
    Thanks for that!
  • StewartHH
    Someone should do a comparison between different vendors' die sizes, like Intel 10nm vs. AMD 7nm, to see if there is actually a performance gain. I would use per-core speed and not take multiple cores into account.
  • bit_user
    @PaulAlcorn , uh oh. Now that I just finished heaping praise, I've got a gripe. In the penultimate paragraph:

    ... assures that the learnings lessons and advances made in the overclocking realm ...

    I was saddened to see the "learnings" virus infecting your otherwise admirable writing.

    I think "learnings" is one of those pseudo-jargon words that MBAs and other B-school types like to throw around, out of jealousy for practitioners of real professions. Everyone from auto mechanics to accountants, lawyers, and doctors needs jargon to adequately and efficiently express concepts and constructs central to their work. However, common sense pervades business to such a degree that I think they're embarrassed by how easily understandable it'd be, if they didn't inject some fake jargon to obscure the obvious. The resulting assault on the English language is disheartening, at best.

    Yes, if you've ever heard of her, you probably guessed I'm a fan of Lucy Kellaway, former journalist of the Financial Times and BBC. Worth a read:

    The 8 Lucy Kellaway rules for claptrap and the fundamental theorem of corporate BS
    Lucy Kellaway’s dictionary of business jargon and corporate nonsense
  • Gurg
    AMD CTO Mark Papermaster: "you can't rely on that frequency bump from every new semiconductor node." AMD's future outlook of very limited frequency bumps, performance increases only from more cores and expensive software modifications to use more cores.
    Versus Intel's Ragland: "People who think this the end of the world for overclocking because our competitors' 7nm has very little headroom, that's not true. Intel is all about rock-solid reliability; our parts aren't going to fail...you can count on your part running at spec, so there's so much inherent margin that we will always have overclocking headroom...I think users will be happy with the margin we can offer in the future."

    Ouch! Intel's Ragland really "punked" AMD's negative outlook.

    PS Great fascinating article
  • jiang-v
    Does anyone know how to contact them? I found a big bug in adaptive-mode overclocking on 10th-gen Core X chips:
    https://www.reddit.com/r/overclocking/comments/ehxa7c/big_bug_in_10th_core_x_vid_mechanism_worst_avx512/
  • nofanneeded
    In the past, OC made a huge difference. Today we can easily hit 4.4 on all cores without OC, and this is more than enough for me.

    For me, OCing is dead, and I don't care about missing 5 fps.

    I put the price difference in a better GPU ...
  • CompuTronix
    Outstanding article! Thank you, Paul! I would love to have been there. I have a few dozen questions that the Team may or may not have been allowed to answer.

    However, like bit_user, I found it of particular interest that the Team was forthcoming regarding specific voltage and temperature values they're comfortable with running on their personal home rigs, which max out at 1.425V and 80°C. With respect to electromigration and longevity, every day in the forums we see many overclockers express their concerns over these very issues.

    On their website, Silicon Lottery shows Historical Binning Statistics that include the Core voltages used to validate their overclocked 14 and 22nm processors. For 22nm the maximum is 1.360. For 14nm the maximum is 1.456. While Intel's warranty is 3 years, Silicon Lottery's warranty is 1 year, which suggests at least one reason for the voltage difference between Intel's Team and Silicon Lottery.

    Here's a forgotten link to a revealing Tom's Hardware video interview of July, 2016, with Intel's Principal Engineer (Client Computing Group), Paul Zagacki: Intel Discusses i7-4790K Core Temperatures and Overclocking. The video coincides with the formation of Intel's Overclocking Lab, also in 2016. In the video, Intel points out that overclocking abilities begin to "roll off" above 80°C, which agrees with the value the Team revealed in your article.

    While Core temperatures, overclocking and Vcore are often highly controversial and hotly debated topics in at least the overclocking forums, the term "electromigration" is closely related to a much less known term, which is "Vt (Voltage threshold) Shift". With respect to voltage and temperature, the two terms describe the causes and effects of processor and transistor "degradation" at the atomic level.

    In the Intel Temperature Guide, in Section 8 - Overclocking and Voltage, I created a table for Maximum Recommended Vcore per microarchitecture from 2006 to the present. For 22 and 14nm, those values are 1.300 and 1.400 respectively. I also created a graph showing the Degradation Curves for 22 and 14nm processors. The table and graph help overclockers get a better perspective of the degradation and longevity issue.

    Sparing our members and visiting readers the deep dive, Vt Shift basically represents the potential for permanent loss of normal transistor performance. Excessively high Core voltage drives excessively high current, power consumption and Core temperatures, all of which contribute to gradual Vt Shift over time. Core voltages that impose high Vt Shift values are not recommended. The 14nm curve suggests 1.425-ish is the practical limit, which also agrees with the value the Team revealed in your article. The curve also suggests that Silicon Lottery might be pushing the edge of the envelope a bit.

    The concern here is that when novice overclockers casually glance around the computer tech forums, where conflicting and misleading numbers get flung around like gorilla poo in a cage, many don't realize through the fog of all the confusion that one size Vcore does not fit all. Aside from high Core temperatures, Vcore that might be reasonable for one microarchitecture can degrade another. So 22nm Haswell users now wanting to overclock their aging processors to keep up with today's games need to heed the degradation curves, which apply as well to 14nm Skylake and Kaby Lake users.

    CT :sol:
  • JamesSneed
    Paul this was a wonderful article. Seriously, nice job.