(better) Understanding electromigration.

m25

Distinguished
May 23, 2006
2,363
0
19,780
Now, I have done some quick readings around and know that EM is that thing that ruins your CPU, especially when you OC and it increases with heat and voltage. Until here it's OK but nobody mentions frequency; does frequency increase accelerate EM anyhow or does it not?
 

Bache

Distinguished
Dec 3, 2006
344
0
18,780
I've read that it only affects Northwood CPUs when you increase the Vcore over a set voltage.

Prescott CPUs did not suffer from it at all, and I don't think C2Ds do either.
 

Mex

Distinguished
Feb 17, 2005
479
0
18,780
Overclocking early stepping Northwood cores yielded a startling phenomenon. When VCore was increased past 1.7 V, the processor would slowly become more unstable over time, before dying and becoming totally unusable. This is believed to have been caused by the physical phenomenon known as Electromigration, where the internal pathways of the CPU become degraded over time due to excessive electron energy. This was also known as Sudden Northwood Death Syndrome.
Electromigration occurs in CPUs (and other products) if the voltage for a given CPU is raised too high. This passage suggests that electromigration was a fairly unknown phenomenon until the high failure rates of overclocked Northwoods brought EM into the open. It sounds more like a design flaw or overclockers overestimating the maximum tolerance of the Northwood, since this is the only real epidemic failure of CPUs in recent history.

Note that electromigration is a slow process. It can take years for a modern CPU to fail from EM. Most products are retired long before they have the chance to fail, which is likely why you haven't heard of this occurring in newer CPUs.

EDIT: Jack, don't embarrass me too badly; I'm still a n00b at this.
 

kwalker

Distinguished
May 3, 2006
856
0
18,980
I would have to say that voltage outside the limits of a given device would be the cause of EM.
With that in mind, you need increased voltage to drive up the frequency in anything more than a moderate overclock.
Frequency by itself would cause errata and data loss more than EM.
But I would rather hear Jack's take on this subject any day :wink:
 

pausert20

Distinguished
Jun 28, 2006
577
0
18,980
You need a susceptible metal. This would be aluminum. Copper will do this too, but it is much less susceptible than aluminum. All of the lower layers of a Core 2 use copper for the interconnects. The top 1 or 2 layers are usually still made from aluminum, and they are much thicker, so electromigration will take a very long time to occur.

The original Northwoods used only aluminum interconnects, so as you turn up the voltage, electromigration occurs at a fast rate.

That's my 2 cents.
 

Mex

Distinguished
Feb 17, 2005
479
0
18,780
Ah, okay. All of those Northwoods died due to a design choice that went horribly wrong when the resulting CPUs were placed into the hands of rabid overclockers. Despite the vulnerable aluminum interconnects, death from electromigration would still take many years as long as the VCore was kept at a normal level, right?

Jack, I too would like to hear an explanation from you.
 

pausert20

Distinguished
Jun 28, 2006
577
0
18,980
Yes, that should be true. Since Intel switched to copper interconnects, the issue of electromigration has not been heard from.

It was not really a design choice per se. Aluminum interconnects were the standard way to manufacture a processor in those days. From what I had heard, it came as a surprise to the design/process guys.

What it all comes down to is that the electrons have enough energy that, as they move through the very thin aluminum interconnect wires, they hit the aluminum atoms and move them. Aluminum was found to be much more susceptible to this than copper at the very small widths of the interconnect wires.

I do know that both copper and aluminum get brittle as you work-harden them in the macro world. I'm just wondering whether that is also happening in the nano/micro world of the interconnects?
 

Mex

Distinguished
Feb 17, 2005
479
0
18,780
Okay, thank you for clearing that up. It sounds as if this incident was a factor in Intel's decision to switch to copper interconnects, which is a bit surprising to me, since in the grand scheme of things very few processors are overclocked by the end user (I'd guess <2%, if not lower) and in turn would be susceptible to EM. Oh well, it's better to be safe than sorry.

Again, thank you for the explanation. Although if he shows up, Jack might be mad that you stole some of his thunder. :lol:
 

pausert20

Distinguished
Jun 28, 2006
577
0
18,980
Oh, I'm not too worried. I'm sure Jack will flesh out what I said and use links. I tried to find the article I had read to get my understanding, but I could not find it.

I actually found it interesting that Intel still uses aluminum for the topmost interconnects. From my understanding, it is very easy to deposit aluminum compared to the copper process.

I was also trying to find a cross section of an Intel chip that shows how the topmost interconnect wires are much thicker than the bottommost ones next to all of the transistors.
 

shinigamiX

Distinguished
Jan 8, 2006
1,107
0
19,280
Now, I have done some quick readings around and know that EM is that thing that ruins your CPU, especially when you OC and it increases with heat and voltage. Until here it's OK but nobody mentions frequency; does frequency increase accelerate EM anyhow or does it not?

Ok, here we go.....

First, in doing a search I found a 'term paper' on a college server, apparently written by a student, that does a great job summarizing the whole electromigration thing. I really like this paper, as it goes into basic detail without a huge technical overhead:
http://www.eng.uwaterloo.ca/~asultana/Project_ECE730vlsi.pdf

Particularly, see figure 2-1 on page 12, it essentially summarizes electromigration in a simple picture. I will refer to this PDF in this reply.

Other articles on electromigration to get the interest going (some are subscription based, but a library would also have them):
http://pmos.upc.es/blues/publications/ICTesting/JLG_MR_37_7_97.PDF
http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1044338
http://www.mrs.org/s_mrs/sec_subscribe.asp?CID=2467&DID=141052&action=detail

And my favorite:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1493069
This one concerns ESD induced electromigration failures as latent ESD failures that can shorten the lifetime to as little as a few weeks.

Ok, now to the answer ----

The answer is actually simpler than you think.... the short of it is, increasing frequency also increases the total current through the device, hence, the metal lines will experience higher current density and higher electromigration degradation.

Here is the explanation .... those who have watched me post know I am keen on td = CV/I, where td is gate delay, C is total capacitance, V of course is voltage, and I is current, or Idsat, the drive current. This is a fundamental equation describing the max switching speed of a device. In this form and this way of thinking, td is the dependent variable and is a function of C, V, and I --- all three of which are design parameters. What happens after we have optimized the process and nothing changes any longer? Then we rearrange this equation and, in this case, look at I as the dependent variable:

I = CV/td or CV*(1/td)

But 1/td is one over time, which is frequency, f. So ---

I = CV*f

Thus, since C is fixed by the oxides, wires, and transistors in the CPU, V is dialed by you the user, and f is set by the clock generator, then I is a direct function of frequency AND voltage.
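A quick way to play with I = CV*f is the short Python sketch below. The capacitance and voltage figures are made-up placeholders, not measurements from any real CPU; only the ratios mean anything, since C cancels out when comparing the same chip at two settings.

```python
def switching_current(capacitance_f, voltage_v, frequency_hz):
    """Average switching current from I = C * V * f."""
    return capacitance_f * voltage_v * frequency_hz


# Hypothetical numbers, chosen only for illustration.
C_EFF = 20e-9  # assumed effective switched capacitance, in farads

base = switching_current(C_EFF, 1.40, 2.0e9)       # stock clock, stock voltage
oc_same_v = switching_current(C_EFF, 1.40, 2.6e9)  # overclock, stock voltage
oc_more_v = switching_current(C_EFF, 1.55, 2.6e9)  # overclock plus extra voltage

print(f"clock only:      {oc_same_v / base:.2f}x current")
print(f"clock + voltage: {oc_more_v / base:.2f}x current")
```

Raising the clock alone scales the current by the frequency ratio; adding voltage on top multiplies that again by the voltage ratio.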

In the term paper linked above, electromigration lifetimes are modeled by Black's equation:

tf = A * (1/J)^n * EXP(Ea/kT)

A is material dependent, J is the current density, which is I per unit cross-sectional area of the wire, n is an empirically determined exponent, Ea is the activation energy, k is the Boltzmann constant, and T is temperature. The key here is the current density J: as the current goes up, J goes up and the time to failure tf goes down.
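To get a feel for Black's equation, here is a minimal Python sketch that compares two operating points of the same wire, so the material constant A cancels out. The defaults n = 2 and Ea = 0.7 eV are assumed placeholder values of a plausible order of magnitude, not fitted data for any particular process, and the function name is just for illustration.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_lifetime(j_ratio, t_base_c, t_new_c, n=2.0, ea_ev=0.7):
    """Return tf_new / tf_base from Black's equation.

    j_ratio is J_new / J_base; temperatures are in degrees Celsius.
    n and ea_ev are assumed values; real ones are determined empirically.
    """
    t_base_k = t_base_c + 273.15
    t_new_k = t_new_c + 273.15
    current_term = (1.0 / j_ratio) ** n
    thermal_term = math.exp(
        (ea_ev / K_BOLTZMANN_EV) * (1.0 / t_new_k - 1.0 / t_base_k)
    )
    return current_term * thermal_term


# Example: 30% more current density, wire running 10 degrees hotter.
print(relative_lifetime(1.3, 50.0, 60.0))
# Same current increase with the temperature held constant.
print(relative_lifetime(1.3, 50.0, 50.0))
```

A result below 1 means a shorter expected lifetime than the baseline. Comparing the two printed values shows how much of the loss comes from the temperature term versus the current term under these assumptions.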

So really, the increase of electromigration with frequency is no more than an increase in current driven by the frequency generator. Pretty simple.

Side note: to validate my I = CV*f equation, recall that I have also posted many times that the equation for dynamic power is P = CV^2*f. Well, with a little algebra, check this out ---

P = I*V ===> fundamental electrical power equation.

Substitute the expression for current as a function of frequency from my argument above,

P = (CV*f)*V = C*V^2*f wow, now we see where the dynamic power equation comes from, and that power really goes as the cube of the 'speed' fundamental variables --- 2 orders in voltage and 1 order in frequency.
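The same algebra can be checked numerically. This is only a sketch with a made-up capacitance value; it confirms that P = I*V with I = CV*f reproduces C*V^2*f, and shows the cube-like scaling when voltage and frequency are raised by the same fraction.

```python
def switching_current(c, v, f):
    """I = C * V * f"""
    return c * v * f


def dynamic_power(c, v, f):
    """P = C * V^2 * f"""
    return c * v ** 2 * f


def power_ratio(v_ratio, f_ratio):
    """Relative dynamic power when V and f are each scaled by a ratio."""
    return v_ratio ** 2 * f_ratio


# Hypothetical operating point: 20 nF effective capacitance, 1.4 V, 2.0 GHz.
c, v, f = 20e-9, 1.4, 2.0e9
p_from_current = switching_current(c, v, f) * v  # P = I * V
p_direct = dynamic_power(c, v, f)                # P = C * V^2 * f
print(p_from_current, p_direct)

# 10% more voltage and 10% more frequency: 1.1^2 * 1.1 = 1.1^3
print(power_ratio(1.1, 1.1))
```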

EDIT: NOTE --- though volting and clocking up your CPU can increase the rate of electromigration, the time scale here is still VERY long. Pausert20 is correct: the heavier Cu atoms, the lower resistance, and the resulting lower temperature of the wire have pretty much eliminated electromigration problems. They still exist, but the lifetime is so long that it makes little difference given the turnover rate at which we typically upgrade (we meaning enthusiasts).

TDDB is probably the most common failure mode when suiciding a chip.
http://www.dfrsolutions.com/page.asp?id=114&mstr=4
Notice that it has the same exponential form as electromigration; they are both first-order effects.

Jack
I'm just curious, but... what do you do for a living?
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
Thanks a lot, Jack, you're always great when it comes to these things. So in a consolidated process, an increase in current does not alter the silicon structure as much as voltage and temperature do, right?
Basically, if you OC a CPU on stock voltage and keep it cool enough, you're not altering its life expectancy by any significant amount. Am I right?
 

I

Distinguished
May 23, 2004
533
2
18,995
Now, I have done some quick readings around and know that EM is that thing that ruins your CPU, especially when you OC and it increases with heat and voltage. Until here it's OK but nobody mentions frequency; does frequency increase accelerate EM anyhow or does it not?

No, you don't "know" anything yet. Electromigration will not ruin a CPU in any reasonable or problematic period of time. People today are still running Celeron 300 o'c to 450, just how many decades did you think a CPU needs to run? NASA is still using 80486, even older.

Electromigration happens with or without overclocking, so the real question is whether the overclocking has a useful result, rather than retiring a CPU and having to spend more $ to get same resultant performance.

Yes, frequency would accelerate EM, but again we don't care; what you are supposing is important does not matter in any real-world scenario. If you were sending a deep space probe on its mission, maybe then it matters. For anything you own, the CPU is not going to be among the earlier failure points. The only thing that changes that is if you let it run extremely hot, in which case it's not EM that matters; it's merely a question of getting the temp down to a better level.
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
Yes, now I've got it: while temperature and heat have an exponential effect, the effect of the current increase is only linear (proportional to frequency).
I was asking this because I'm now left with the 'old' 3000+ I had, and before it's relegated to an office PC, I thought it still had performance to spare with a decent board at about 2.6GHz. It only reaches 38°C @2.0G, so I am pretty confident it won't go above 45°C at 2.6GHz.
The other option is a silent PC; undervolted at 1.25V it only reaches 34° on stock. It only takes a passive CPU cooler and a 120 mm fan blowing air away from the case.