Anandtech: The truth of CPU degradation

yomamafor1

Distinguished
Jun 17, 2007
2,462
1
19,790
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3251&p=6

This is the quality of article I would expect from Tom's. This article is simply superb: very well written, very well explained, and very technical.

This article not only reviews the E8500, it also explains the E8400's temperature errata and its degradation under overclocking. It should debunk a lot of erroneous claims about CPU temperature and degradation under overclocks.

Temperature:
In the past, internal CPU temperatures were sensed using a single on-die diode connected to an external measurement circuit, which allowed for an easy means of monitoring and reporting "actual" processor temperatures in near real-time. Many motherboard manufacturers took advantage of this capability by interfacing the appropriate processor pins/pads to an onboard controller, such as one of any of the popular Super I/O chips available from Winbond. Super I/O chips typically control most if not all of the standard motherboard input/output traffic associated with common interfaces including floppy drives, PS/2 mice and keyboards, high-speed programmable serial communications ports (UARTs), and SPP/EPP/ECP-enabled parallel ports. Using either a legacy ISA bus interface or a newer LPC (low pin-count) interface, the Super I/O also monitors several critical PC hardware parameters like power supply voltages, temperatures, and fan speeds.

This method of monitoring CPU temperature functioned satisfactorily up until Intel conducted their first process shrink to 65nm. The reduction in circuit size influenced some of the temperature-sensing diode's operating characteristics enough that no amount of corrective calibration effort would be able to ensure sufficient accuracy over the entire reporting range. From this point on Intel engineers knew they would need something better. From this came the design we see effectively utilized in every CPU produced by Intel today, starting with Yonah - one of the first 65nm processors and a precursor to the now wildly-successful Core 2 architecture.

The new design, called a Digital Thermal Sensor (DTS), no longer relied on the use of an external biasing circuit where power conditioning tolerances and slight variances in sense line impedances can introduce rather large signaling errors. Because of this, many of the reporting discrepancies noted using the older monitoring methods were all but eliminated. Instead of relying on each motherboard manufacturer to design and implement this external interface, Intel made it possible for core temperatures to be retrieved easily, all without the need for any specialized hardware. This was accomplished through the development and documentation of a standard method for reading these values directly from a model-specific register (MSR) and then computing actual temperatures by applying a simple transformation formula. This way the complicated process of measuring these values would be well hidden from the vendor.
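To make the "simple transformation formula" concrete, here is a minimal Python sketch of how monitoring tools generally turn the raw register into a temperature. It assumes the IA32_THERM_STATUS layout from Intel's documentation (MSR 0x19C, digital readout in bits 22:16, reading-valid flag in bit 31). The function name, the sample raw value, and the Tjunction of 100C are made up for illustration, and actually reading the MSR from the OS is left out.

```python
# IA32_THERM_STATUS lives at MSR address 0x19C per Intel's documentation.
IA32_THERM_STATUS = 0x19C


def decode_dts(msr_value, tjunction_c):
    """Decode a raw IA32_THERM_STATUS value.

    Returns (margin_c, estimated_temp_c), or None when the
    reading-valid bit (bit 31) is not set.
    tjunction_c is an ASSUMED value supplied by the caller.
    """
    if not (msr_value >> 31) & 1:
        return None
    # Bits 22:16 hold the distance below Tjunction, in degrees C.
    margin_c = (msr_value >> 16) & 0x7F
    return margin_c, tjunction_c - margin_c


# Hypothetical raw value: valid bit set, digital readout of 40.
raw = (1 << 31) | (40 << 16)
print(decode_dts(raw, tjunction_c=100))  # (40, 60)
```

Note that only the margin (40 here) comes from the chip. The 60C figure is only as good as the Tjunction you feed in.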

This confirms Intel's documented errata about motherboards being unable to read the E8400's DTS accurately, which causes thermal interrupts.


CPU degradation from overclocks:
As soon as you concede that overclocking by definition reduces the useful lifetime of any CPU, it becomes easier to justify its more extreme application. It also goes a long way to understanding why Intel has a strict "no overclocking" policy when it comes to retaining the product warranty. Too many people believe overclocking is "safe" as long as they don't increase their processor core voltage - not true. Frequency increases drive higher load temperatures, which reduces useful life. Conversely, better cooling may be a sound investment for those that are looking for longer, unfailing operation as this should provide more positive margin for an extended period of time.
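A rough way to see why frequency alone raises load temperatures is the standard first-order approximation that dynamic power scales with V^2 * f. The Python sketch below uses only that approximation; it ignores leakage, and the function name and the 3.0 to 3.6GHz example are my own illustration, not figures from the article.

```python
def relative_dynamic_power(f_ratio, v_ratio=1.0):
    """Dynamic power relative to stock, using P ~ C * V^2 * f.

    First-order approximation only: leakage power is ignored.
    """
    return f_ratio * v_ratio ** 2


# 3.0GHz -> 3.6GHz at stock voltage: frequency alone adds about 20%.
print(round(relative_dynamic_power(3.6 / 3.0), 2))        # 1.2
# Same overclock with a 10% voltage bump on top.
print(round(relative_dynamic_power(3.6 / 3.0, 1.10), 2))  # 1.45
```

So even a "no extra voltage" overclock pushes more heat through the same cooler, and voltage makes it worse quadratically.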

As a result, those who believe they can run an E8400 at 4.x GHz on simple stock cooling, overvolted by about 0.2~0.3V, and still outlast the warranty period are morons. As shown in Anand's graph, the E8400's VID is 1.125V, and to run at 4.x GHz most people put more than 30% more voltage into the processor. Compare that to only 15% more voltage to overclock a 65nm dual-core Core 2 to 4.x GHz (1.5V), or 28% for a Q6600 to hit 4.0GHz (1.6V). So far I have not seen a Q6600 overclocked to 4.0GHz without exotic cooling that is capable of running for more than a few days of benching.
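For anyone who wants to check the percentages, here is a small Python sketch of the arithmetic. The function names are made up; the only inputs are the numbers quoted above. The stock voltages it prints for the 65nm chips are just what those percentages imply, not verified VIDs.

```python
def overvolt_percent(vid, vcore):
    """Percent increase of vcore over the stock VID."""
    return (vcore / vid - 1.0) * 100.0


def implied_stock_voltage(vcore, percent):
    """Stock voltage implied by an overclocked vcore and an overvolt percent."""
    return vcore / (1.0 + percent / 100.0)


# E8400: VID 1.125V, so +30% lands at about 1.4625V.
print(round(overvolt_percent(1.125, 1.4625), 1))  # 30.0
# Stock voltages implied by "15% -> 1.5V" and "28% -> 1.6V".
print(round(implied_stock_voltage(1.5, 15), 2))   # 1.3
print(round(implied_stock_voltage(1.6, 28), 2))   # 1.25
```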

EDIT:
I believe Anand summed up very well in terms of overclocking Wolfdale in his conclusion.
...Never before has achieving these levels of overclocks been so easy. However, don't become tempted by the incredible range of core voltage selections your premium motherboard offers; it's important not to lose sight of the bigger picture.

These processors are built on a new 45nm High-K process that invariably makes them predisposed to accelerated degradation when subjected to the same voltages used with last-generation's 65nm offerings. Although we certainly support overclocking as an easy and inexpensive means of improving overall system performance, we also advocate the appropriate use of self-restraint when it comes to choosing a final CPU voltage. Pushing 0.1V more Vcore through a processor for that last 50MHz does not make a lot of sense when you think about it.

This means 1.8V for a 4.0GHz E8400 is similar to putting 2.1~2.2V into a 65nm Core 2. I can guarantee you, unless you have a golden chip or you're extremely lucky, your chip will fail within weeks, if not days.

Regardless, this is a great read. Definitely a good article for those interested in getting a Wolfdale, and for the editors of Tom's to learn from.
 

yomamafor1

Distinguished
Jun 17, 2007
2,462
1
19,790
More than a few programs have been released over the last few years, each claiming to accurately report these DTS values in real-time. The truth is that none can be fully trusted as the Tjunction values utilized in these transformations may not always be correct. Moreover, Intel representatives have informed us that these as-of-yet unpublished Tjunction values may actually vary from model to model - sometimes even between different steppings - and that the temperature response curves may not be entirely accurate across the whole reporting range. Since all of today's monitoring programs have come to incorrectly assume that Tjunction values are a function of the processor family/stepping only, we have no choice but to call everything we thought we had come to know into question. Until Intel decides to publish these values on a per-model basis, the best these DTS readings can do for us is give a relative indication of each core's remaining thermal margin, whatever that may be.

This explains why people are encountering temperature reading problems on the Wolfdales.
 

ocguy31

Distinguished
Feb 27, 2008
84
0
18,630
Nice post, but I don't think anyone is using 1.8V for 4GHz. 1.4V would be the max. But I'm just fine using 1.27V @ 3.8GHz; why ruin such a great chip that is almost impossible to replace right now?
 

SpinachEater

Distinguished
Oct 10, 2007
1,769
0
19,810
Nice post yomama. I really like the reviews that come out of Anandtech. Especially their mobo reviews. Their articles are definitely held to higher standards than TH and they are usually quite thorough. I just wish they would kick them out faster… :??:
 
I've been waiting for this for a long time. Granted, I'm using an overclocked Athlon 2800+, and with its 130nm process (feel lucky, those of you who have 90nm CPUs) it's fine pushing an extra 200-400MHz when I need it to. But it just makes sense, doesn't it, people? Pushing more power through something smaller will put more strain on it.
I won't be surprised if even smaller processes start losing Intel's fabulous overclocking abilities of late.
Only time will tell, but I'm sure there's a remedy to this - the most obvious one will probably be lower power usage. We may find sub-45nm processors that have voltages not only very near 1V, but where we make changes in the thousandths for overclocking. 1.001V...1.002V...

New technology is so much fun.
 

zenmaster

Splendid
Feb 21, 2006
3,867
0
22,790
I would rate the article as "FAIR".

The writing is excellent, but they fail to provide a technical reference for many of their claims. This is perfectly acceptable if the results are based upon your own tests which you can present.

The claims made in the article, as valid as they may be, carry no weight without documentation of their sources.
 

Asian PingPong

Distinguished
Jan 21, 2008
131
0
18,680
Wait, does this mean that the temperature readings on the Core 2s are inaccurate, or has the Tjunction been clearly established for each model?
 

yomamafor1

Distinguished
Jun 17, 2007
2,462
1
19,790
It means that since Tjunction (or, to put it better, maximum operating temperature) differs between chips, models, and steppings, it's hard to come up with a universal maximum value for it.

For example, while the first iteration of Core 2 has a maximum temperature of 105C, later revisions lowered that to 95C, and that is still considered inaccurate. The truth is, we'll never know the accurate value until Intel publishes them.
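To show what that uncertainty does to a reading, here is a tiny Python sketch. The DTS margin of 45 is a made-up sample, and the function name is just for illustration. The two Tjunction values are the 105C and 95C figures mentioned above, neither of which is confirmed by Intel.

```python
def estimated_temp(tjunction_c, dts_margin_c):
    """Absolute temperature estimate: assumed Tjunction minus the DTS margin."""
    return tjunction_c - dts_margin_c


margin = 45  # hypothetical DTS readout (degrees below Tjunction)
for tj in (105, 95):
    print(tj, estimated_temp(tj, margin))
# 105 60
# 95 50
```

Same chip, same readout, and the reported temperature moves by 10C depending on which Tjunction the program assumes. The margin itself does not change, which is why it is the only number worth trusting for now.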
 

sailer

Splendid
I find the article interesting, if not a bit unsettling. For sure, I'll be watching the temps a bit more closely and taking more effort to reduce them. I don't know if I've been just lucky with some past overclocking efforts or the fact that I usually change CPUs more often than every three years which has kept me out of trouble and my credit card intact.
 

scyle

Distinguished
Mar 2, 2008
60
0
18,630
I honestly don't know much about OCing, but would that article apply to, say, when Tom OC'd that 1.8GHz Intel dual-core chip to 3GHz?
 

epsilon84

Distinguished
Oct 24, 2006
1,689
0
19,780
Excellent article, thanks for the link. I'm sure even the seasoned overclocker can learn something from this article.

I think it has been well established that overclocking *will* shorten the lifespan of your CPU - however, the point is that overclocking within reason (i.e. 10% overvolt with a good HSF) will not shorten the CPU's life enough for you to notice, since most enthusiasts upgrade their CPU at least once every couple of years.

FWIW, I've yet to have an overclocked CPU die, though I usually only keep them for 2 - 3 years max. However, 5 years ago I built my cousin a rig based on an Athlon XP 2500+ overclocked to 2.2GHz, and it is still running strong to this day.

However, I've had a Radeon 9800 Pro die after more than a year of prolonged overclocking; I don't know if it was just bad luck or not. Thankfully, by the time it died I was due for an upgrade anyway. ;)
 

dragonsprayer

Splendid
Jan 3, 2007
3,809
0
22,780



the article itself says it does not apply to all chips in the same manner - if you read it, it says high hafnium gate cpu's are more likely to degrade faster.

this article, while nice for scientists and engineers, is useless for normal people

i will translate it:

1) use the best cooling possible and keep temps as low as possible
2) keep voltage as low as possible

3) the more you run a cpu at 100% output the more damage is done to it from high end overclocking


my perspective - "the sweet spot"

the sweet spot is the speed slightly lower than where you see a greater increase in temps after the same increase in speed. in many cases the multiplier and fsb are interchangeable.

fsb should be maximized until performance drops off - today that is 1600-1700 fsb with intel chipsets

the multiplier should be optimal for the cpu - for c2d that is 9, give or take a little. that's why the E6600 is such a great chip, and the e8500/e8400 are close, as is the q6600

you increase the speed of the cpu until you see a spike in temps, then back off slightly. typically the voltage is 1.38-1.48. as you increase voltage above 1.45 or so volts the temps increase greatly, and so does the degradation process, or diffusion of silicon from resistance.

heat = resistance. this results in electrons being pushed hard, bumping into and energizing the atoms in the cpu material and pushing the atoms along: diffusion - like sand at the beach. the sand stays put until a strong enough current fluidises the sand.

bottom line - new wolfdale cpu's may suffer more from overclocking than cpu's in the past. buy the best cooler, use as many low speed high output fans as possible and keep your voltage in line!

for gamers and such that do not run their systems 24/7, who cares. if you're folding at home, turn down the oc and voltage

 

dev1se

Distinguished
Oct 8, 2007
483
0
18,780
My E8400 hits 4GHz no problem and is very stable, and the motherboard manages the voltages for that (not sure of the exact voltage), but it's cooled very well, with a huge copper waterblock with copper fins that the water passes through... it keeps the CPU at 20 - 25C on idle and 35C at full load, with the cores between 38 - 45C on idle and up to 60C at full load.

Would these be safe temps? I'll get the voltage in a moment since I've not got it clocked at the minute.

3.6GHz is a safe OC: no voltage changes required, very little extra heat and a huge performance increase.

 

yomamafor1

Distinguished
Jun 17, 2007
2,462
1
19,790


Mind proving that with a quotation? I looked around and the article clearly states that overclocking does indeed result in faster degradation, regardless of a specific type of CPU.

Anand just made this statement, which I quoted earlier.
These processors are built on a new 45nm High-K process that invariably makes them predisposed to accelerated degradation when subjected to the same voltages used with last-generation's 65nm offerings.

this article, while nice for scientists and engineers, is useless for normal people

This is the reason why I don't like people overclocking when they clearly have no background in it. Most of them do not understand the inherent danger of overclocking, and how it affects their system. It's one thing to overclock to 3.6GHz (for example), and it's another to use 1.45V to achieve it.

bottom line - new wolfdale cpu's may suffer more from overclocking than cpu's in the past. buy the best cooler, use as many low speed high output fans as possible and keep your voltage in line!
Proof?


 
I was actually going to post the same link, but I guess you beat me to it ;). Nice to see that more people read Anandtech. I agree with you on the quality of the recent Tom's articles. Most of them are OK, but some are just outright dumb (e.g. the CPU cooler charts, WTF? Failing all the backplated CPU coolers just because they couldn't install them?). I can install a CPU, motherboard, and a backplated cooler (e.g. Zalman CNPS 8700) in about 15 minutes or less if it's a good case to work with.

@Everyone: By the time the CPU dies you probably would have upgraded.
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
1) use the best cooling possible and keep temps as low as possible
2) keep voltage as low as possible

3) the more you run a cpu at 100% output the more damage is done to it from high end overclocking
Bolded part cannot be relied on. Many times your overclocked CPU can be suffering just as much damage when idle as when at load. Many of the fried Wolfdales died when near idle. Reasons:

1) Vcore is higher when idle and not using Speedstep; Vdroop occurs only during load
2) Temperature may not drop much when idle because CPU and system fans slow down

When someone says they ran 1.6V for 10 seconds to bench SuperPi-1M, you can infer they probably used 1.6V for several minutes at least.

my perspective - "the sweet spot"

the sweet spot is the speed slightly lower than where you see a greater increase in temps after the same increase in speed
Here I'd have to disagree and say it depends on the process and die size. A lot of the Wolfdale deaths were at the "sweet spot," which happened to be ~1.4V and 4+ GHz. Unlike with Conroe, the "sweet spot" is not a good indicator of what safe voltages are for Wolfdale because a 107 mm^2 45nm die doesn't produce a lot of heat to manifest diminishing returns on Vcore before you reach heavy electromigration.

Conroes stopped you from frying them because they got hot/unstable and made Vcore bumping futile. Wolfdales don't get hot till it's too late.
 

SpinachEater

Distinguished
Oct 10, 2007
1,769
0
19,810



There is something that I don't like about the format of their forums, but I am starting to frequent them more often. They feel less techy than here. Posts tend to get buried more easily...I always lose the threads that I post on...maybe I will figure it out.. :pt1cable:

JonnyGuru is still over there though...he is pretty much a bad ass.
 

homerdog

Distinguished
Apr 16, 2007
1,700
0
19,780
Funny, I read this earlier today and didn't see anything special about it. Just typical AnandTech quality, which nothing else even comes close to.

And yes, Tom's CPU cooler charts (along with most of their articles) are ridiculous. I prefer the forums here though.

Edit: The guys over at Tom's games do a great job though, especially Ben and Rob.
 

Zorg

Splendid
May 31, 2004
6,732
0
25,790
He misspoke and for some reason you are having a hard time understanding what he said. You actually posted the proof below that you were asking him for. What he meant to say was "the Hafnium high K gate CPUs". Intel uses Hafnium as the dielectric in the high K gate 45nm CPUs. Don't get me wrong, I'm not one to defend dragonsprayer, and he said stuff, in his post, that I'm pretty sure contradicts previous posts of his, but c'mon let us not split hairs.
Technology@Intel · 45-Nanometer Hafnium-based High-k Dielectric Metal Gate
And here is the proof you asked for, if you can call it proof. I didn't say it was not accurate just not proof.
These processors are built on a new 45nm High-K process that invariably makes them predisposed to accelerated degradation when subjected to the same voltages used with last-generation's 65nm offerings.
The same "proof" that you presented in the Anandtech quote earlier.
 

Mathos

Distinguished
Jun 17, 2007
584
0
18,980
Actually this does indeed affect all processors. But I think as the die's base transistor process size shrinks we will be seeing this more and more. I honestly think this will hit AMD's SOI and super SOI process just as badly, if not more so, than Intel's High-K.

@DS Actually heat= wasted power or wattage that isn't used to do any work inside the processor. I could throw a few formulas at ya'll from my old intro to ac/dc electronics book from college, but I'll refrain from that for now. You'd be amazed but in my early 20s I was going to school for Electronics engineering as well as Computer system and internetworking technology degrees.

It frankly amazes me that current CPU's use so much current (amperage) considering the wire lead and transistor size in question. When you consider the fact that these CPU's are using close to 10 amps for the entire cpu it makes you think. Granted I know there are probably current flow limiting circuits and obviously there are voltage divider circuits involved. And when you consider that most complete electronic devices run on mere milliAmps it gets even more fun.

But back to the subject at hand, it's completely believable that modern processors, or anything below 90nm, are becoming more and more susceptible to electron path ablation (I think that's what it's called). Especially when you start raising the voltage more than, say, 5-10% above spec to try and OC a CPU.

I was going to link to a thread on XtremeSystems but I can't find it at the moment; anyway, it was related to some issues with Phenom overclocking. What's happening with some people's CPUs is that after about a month to a month and a half the OC suddenly becomes unstable at the present voltage, even though it had been running completely stable prior to that. Sometimes upping the voltage again will work for a time, but then the same thing happens. In some cases, no matter how far the voltages are raised, they're unable to regain OC stability. This seems to happen more on the ones that require higher than 1.3V for OC stability at 2.6 and 2.7GHz. I got lucky, for example, and got one that does 2.6/2.7 at 1.248V, which is only .016V above stock. Also note that some of the higher voltage requirements are due to HSF performance, or at least that is what we've found. The cooler the processor stays, the less voltage it requires.

*and this where my train of thought pulls an Amtrak and I can't continue the post*
 

Zorg

Splendid
May 31, 2004
6,732
0
25,790
I assume that you are referring to electromigration.