No HT For Penryn & Nehalem

February 2, 2007 6:39:14 AM

Apparently HT won't be in Penryn & Nehalem... :( 

http://sg.vr-zone.com/?i=4574


February 2, 2007 7:12:32 AM

Noooooooooo! Mommy!!!!!! I want HT on my Penryn! I'm gonna hold my breath until I turn blue!!!!!!!!!!!!!!!!!! :lol: 
February 2, 2007 7:16:09 AM

Who cares? Intel makes a decent processor, but it doesn't need Hyper-Threading. Although it would probably be a good idea, as it could boost performance for certain apps.

Dahak

AMD X2-4400+@2.6 S-939
EVGA NF4 SLI MB
2X EVGA 7800GT IN SLI
2X1GIG DDR IN DC MODE (soon 2b crucial balistix ddr500)
WD300GIG HD
EXTREME 19IN.MONITOR 1280X1024
THERMALTAKE TOUGHPOWER 850WATT PSU
COOLERMASTER MINI R120
February 2, 2007 7:16:56 AM

That might even be beneficial for Intel; think "Man Dies From Asphyxiation Over Intel CPU Features." That would result in a crapload of free publicity.
February 2, 2007 11:48:39 AM

Quote:
Noooooooooo! Mommy!!!!!! I want HT on my Penryn! I'm gonna hold my breath until I turn blue!!!!!!!!!!!!!!!!!! :lol: 

That's because it hurts performance more than it improves it.
February 2, 2007 12:46:03 PM

Well, we don't know for absolute sure. The only implementation of HT was during Netburst, which was bad in and of itself, and that was also the age when most programs weren't set up to handle more than one logical core. When applications and games were forced to be multithreaded through HT, you started to see an incredible drop in performance compared to running the same programs with HT off. Though there's the possibility that, if done right, HT could have some use.
February 2, 2007 1:29:44 PM

HT is not a magic weapon.
It's just an architectural trick that, depending on the characteristics of the CPU and its own implementation, may or may not give a noticeable performance boost.
Given the current Core architecture, I don't see it giving a noticeable performance boost, and as such I'm not surprised if it's not used on Penryn.
We'll see what Intel does with Nehalem.
February 2, 2007 1:48:09 PM

Hmmm, didn't Prescott have 31 stages in its pipeline and C2D only 14?
Wouldn't putting Hyper-Threading on C2D mean increasing the pipeline length again, and possibly hurt more than benefit?
February 2, 2007 2:00:23 PM

Quote:
Hmmm, didn't Prescott have 31 stages in its pipeline and C2D only 14?

Yes.

Quote:

Wouldn't putting Hyper-Threading on C2D mean increasing the pipeline length again, and possibly hurt more than benefit?

No, it's the other way around. :p 
If C2D had a 31-stage pipeline, then maybe HT would make sense for it, but implementing HT does *not* imply extending the pipeline to 31 stages.
The reason for having such a long pipeline is to reach high clock frequencies (Prescott's architecture was supposed to scale past 7GHz).
The Netburst architecture has several limitations: for example, Northwood lacked an integer multiplier and a barrel shifter (i.e. shifts and multiplications were costly there), and Prescott had high-latency caches and an insanely long pipeline.
The idea of using HT was: if the pipeline is stalling because of one of those limitations (or there is a branch mispredict and the pipeline has to be flushed), then having a second thread is handy, because it is free to proceed and use the otherwise idle ALUs (arithmetic-logic units) to actually do some useful work.
C2D instead has a pipeline which is much more flexible and balanced (and short), and as such stalls are much rarer (i.e. there's not much "idle time" to exploit for a second thread).
Also, its high decoding bandwidth keeps the scheduler buffers quite full, giving a wide range of potential instructions to execute from a single thread in case of a stall.
In other words, even when some instructions in C2D stall, the CPU can find other useful instructions to execute from the same thread far more easily than Netburst could.
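To make the "idle execution units" point concrete, here's a minimal C sketch (my own analogy, not how HT is actually implemented in silicon, and the constants are arbitrary): a single dependent chain of multiplies leaves issue slots empty, while interleaving a second independent chain fills them, which is roughly what HT lets a second thread's instructions do in hardware.

Code:
/*
 * Minimal sketch of the stall-hiding principle (my own illustration):
 * when one dependency chain is waiting on a long-latency operation,
 * independent work can use the execution units that would otherwise
 * sit idle. SMT does this with a second hardware thread; here the
 * second "thread" is just a second independent chain in the same loop.
 */
#include <stdio.h>

#define N 100000000L

/* One long dependent chain: each multiply must wait for the previous
 * result, so most issue slots go unused (like a stalled pipeline). */
static double one_chain(void)
{
    double a = 1.0;
    for (long i = 0; i < N; i++)
        a *= 1.0000001;
    return a;
}

/* Two independent chains interleaved: chain B fills the cycles chain A
 * spends waiting, roughly the way HT feeds a second thread's
 * instructions into an otherwise idle pipeline. */
static double two_chains(void)
{
    double a = 1.0, b = 1.0;
    for (long i = 0; i < N; i++) {
        a *= 1.0000001;
        b *= 1.0000002;
    }
    return a + b;
}

int main(void)
{
    /* Compile with -O2 (no -ffast-math) and time each call: on an
     * out-of-order CPU, two_chains() does twice the work in not much
     * more time than one_chain(). */
    printf("%f %f\n", one_chain(), two_chains());
    return 0;
}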
February 2, 2007 2:08:17 PM

Quote:
Well, we don't know for absolute sure. The only implementation of HT was during Netburst, which was bad in and of itself, and that was also the age when most programs weren't set up to handle more than one logical core. When applications and games were forced to be multithreaded through HT, you started to see an incredible drop in performance compared to running the same programs with HT off. Though there's the possibility that, if done right, HT could have some use.

:?: :!:
Don't think so; it worked with Netburst because it somehow made up for the inefficiencies there, but with today's efficient Core 2, K8 and future K8L architectures, HT has no chance of reappearing. I'll repeat it: it was just Viagra for Netburst, and keeping with the Viagra parallel, you only need it if you are... well, 'inefficient' :D 
Keep in mind that Netburst was Core's contemporary and they didn't even use it in Core, they're not using it for Core 2, AMD never thought about anything like it, etc. I'm not Jack, so I can't give you killer numbers and phrases, but conceptually, at the end of the day, why the heck would you need two virtual cores within one, aside from helping hide its inefficiency?! On a good core design, HT only has the effect of enabling multithreading of an app on a single-core system; it drags down performance because data gets split and re-merged uselessly.
It's true that HT gave almost 30% on Netburst, but that doesn't mean it can give even 5-10% on Core 2, because Core 2 is ahead of Netburst not by 30% but by almost 90%.
I could keep up such reasoning till tomorrow, but the greatest proof is that Intel discarded it in both Core and Core 2, and AMD has never even mentioned it.
February 2, 2007 2:11:20 PM

Quote:
HT is not a magic weapon.
It's just an architectural trick that, depending on the characteristics of the CPU and its own implementation, may or may not give a noticeable performance boost.

The only CPUs with such characteristics were the P4s, and those conditions will never be replicated again.

Quote:
Given the current Core architecture, I don't see it giving a noticeable performance boost, and as such I'm not surprised if it's not used on Penryn.
We'll see what Intel does with Nehalem.

It may only work if that's a 62-stage CPU aimed at 100GHz :wink:
February 2, 2007 2:19:54 PM

Wow, thanks for the thorough answer.

I now feel stupid! /kidding!

I understand the utilization of the C2D pipeline is a lot more efficient (layman's terms). So Hyper-Threading can be implemented on C2D without increasing the pipeline length? And actually be beneficial?
February 2, 2007 3:07:48 PM

Quote:

I understand the utilization of the C2D pipeline is a lot more efficient (layman's terms).

Yes. :) 

Quote:
So Hyper-Threading can be implemented on C2D without increasing the pipeline length?

Yes, but:

Quote:
And actually be beneficial?

No. :p 
It wouldn't be beneficial, because the C2D pipeline already runs at very high efficiency.
HT helps fill Netburst's pipeline, because Netburst is not very good at doing that on its own.
On C2D it wouldn't really help in most cases, and HT has a certain overhead, which would likely result in *reduced* overall performance.
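To put rough numbers on that intuition, here's a toy C model (the idle and overhead fractions below are made-up illustrative values, not measurements of any real CPU): when lots of issue slots are idle, a second SMT thread reclaims them and HT looks great; when the pipeline is already well fed, the sharing overhead cancels the gain or turns it into a loss.

Code:
/*
 * Toy back-of-the-envelope model (my own assumptions, not Intel data):
 * if a core's issue slots sit idle a fraction `idle` of the time, a
 * second SMT thread can at best reclaim those slots, while sharing the
 * core's resources costs some fixed `overhead`.
 */
#include <stdio.h>

static double smt_throughput_gain(double idle, double overhead)
{
    double single_thread = 1.0 - idle;      /* useful work, one thread  */
    double smt_pair      = 1.0 - overhead;  /* useful work, two threads */
    return smt_pair / single_thread;        /* aggregate gain from HT   */
}

int main(void)
{
    /* Netburst-like: frequent stalls, so HT has lots of idle time to fill. */
    printf("Netburst-ish: %.2fx\n", smt_throughput_gain(0.40, 0.10));
    /* Core 2-like: pipeline already well fed, overhead eats the benefit. */
    printf("Core 2-ish:   %.2fx\n", smt_throughput_gain(0.10, 0.10));
    return 0;
}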
February 2, 2007 3:13:31 PM

So there's no more point to Hyper-Threading? R.I.P.?

After all the effort Intel went through to get Hyper-Threading supported by software makers (games, video editing, etc.), suddenly hearing that there's no more need for it is kinda hard to digest.
February 2, 2007 3:17:11 PM

I still think it *might* be beneficial in some future architecture, but I don't have time for details now.
On the software side, though, Hyper-Threading has in a way paved the way for dual-core CPUs... so nothing has really been wasted. :) 
February 2, 2007 3:56:23 PM

Hyper-Threading was just a lousy, half-assed attempt at a quasi-dual-core implementation. It was a major kludge, and given that the era of multi-core is well at hand, there should be no need for HT ever.
February 2, 2007 4:00:21 PM

Quote:

I understand the utilization of the C2D pipeline is a lot more efficient (layman's terms).

Yes. :) 

Quote:
So Hyper-Threading can be implemented on C2D without increasing the pipeline length?

Yes, but:

Quote:
And actually be beneficial?

No. :p 
It wouldn't be beneficial, because the C2D pipeline already runs at very high efficiency.
HT helps fill Netburst's pipeline, because Netburst is not very good at doing that on its own.
On C2D it wouldn't really help in most cases, and HT has a certain overhead, which would likely result in *reduced* overall performance.

I read that the plan was to use HT to evaluate both sides of a branch, reducing pipeline stalls because they could just throw away the side that isn't taken once they evaluate the condition. This is an old trick (so old I'd be surprised if they don't do this already), but apparently HT would make it zero-cost.
February 2, 2007 4:15:23 PM

I will attest, though, owning a 3.2 Northwood with an 800MHz FSB and HT...

when I turn it on, 1080p QuickTime movies play somewhat smoothly

when I turn it off, they're rendered unwatchable, and games seem noticeably more sluggish, even with a 7600GT

It seems like it's doing something right for some applications, but if I had a C2D, I wouldn't miss it all that much :p 
February 2, 2007 7:14:22 PM

Quote:
That might even be beneficial for Intel; think "Man Dies From Asphyxiation Over Intel CPU Features." That would result in a crapload of free publicity.


Nah. I'm the one who asphyxiates chicks. You can find some of my videos on the web... 8)
February 3, 2007 1:11:19 PM

Quote:

I read that the plan was to use HT to evaluate both sides of a branch, reducing pipeline stalls because they could just throw away the side that isn't taken once they evaluate the condition. This is an old trick (so old I'd be surprised if they don't do this already), but apparently HT would make it zero-cost.

I think I also saw something like that written somewhere, but it's wrong.
That's a technique called "predication", and it is used on Itanium processors.
It has nothing to do with HT, because predication works within a single thread.
Also, while it makes sense on Itanium (Itanium has no HW scheduling, but it has a lot of execution units which sit idle without the right code), it does not on Core 2.
You say that a branch would cost 0 with such a technique, but in fact it would cost a lot.
Basically, with such a technique half of the execution resources of Core 2 would do work for nothing on every single branch instruction; compare this with the current approach, where branches are typically predicted correctly in more than 90% of cases.
The difference is that the HW scheduling of Core 2 does an excellent job of using the execution resources for useful work most of the time (also, Itanium does not exactly have top-class integer performance).
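For anyone wondering what predication actually looks like, here's a minimal C sketch (my own illustration; Itanium does this with predicate registers and x86 compilers approximate it with cmov, and the function names here are just made up): both sides are computed and one result is selected, so there's no branch to mispredict, but the untaken side's work is paid on every call, which is why it loses to a >90%-accurate predictor.

Code:
/*
 * Minimal sketch of predication (my illustration): compute both sides
 * of the condition and select one result, so there is no branch to
 * mispredict -- but the work for the untaken side is never free.
 */
#include <stdio.h>

/* Branchy version: nearly free when predicted correctly, costly only
 * on the occasional mispredict. */
static int max_branchy(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* "Predicated" version: evaluates both operands and selects one
 * without branching. No mispredict penalty, but nothing is ever
 * thrown away for free either. */
static int max_predicated(int a, int b)
{
    int take_a = (a > b);                    /* the predicate */
    return take_a * a + (1 - take_a) * b;    /* select a or b */
}

int main(void)
{
    printf("%d %d\n", max_branchy(3, 7), max_predicated(3, 7));
    return 0;
}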
February 3, 2007 6:36:11 PM

Uarch efficiency aside, could it also be because HT has no market anymore? A Wolfdale with HT has four logical cores, but Yorkfield will have four actual cores, making that implementation pointless. A Yorkfield with HT has eight logical cores, but hardly any programs will be multithreaded to take advantage of 8 whopping cores. Ultimately, HT has been supplanted by the real deal.

Plus, Intel may want to avoid HT because it is associated with Netburst and its inefficiency.

Pippero, you're probably right, but Intel may just not see a place for it marketwise. Besides, it wouldn't be the first time Intel touted a tech that had no practical purpose *cough*Viiv*cough*.