 

Penryn to have Hyperthreading

 
Profile: enthusiast

The "always" accurate folks at the Inquirer say Penryn will have hyperthreading in addition to 6 MB of cache, SSE4, low-k dielectrics/high-k gates, and a 45 nm process.
 
There have always been "rumours" that Core 2 Duo actually has HT circuitry (much like Willamette and early Northwoods did) but that Intel never turned it on.
 
HT was never worth a whole lot in the desktop universe, a couple of percent at best; it did, however, make differences of 25% or more in selected server-type applications.
 
The big problem was that Netburst could only RETIRE two micro-ops per clock, so regardless of how full the pipeline got stuffed, what came out the other end was always limited.
 
With Core doing 4 (or kinda 5, depending on how you want to argue and/or count things) instructions per clock, hyperthreading may actually work this time.
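To see why a wider retire stage leaves more room for a second thread, here is a toy model. It is a sketch under loudly stated assumptions: the per-cycle distribution of ready micro-ops (`READY`) is made up, the two threads are treated as independent, and every other shared resource (caches, schedulers, decoders) is ignored. It compares a 2-wide and a 4-wide retire stage, using the figures quoted in the post.

```python
from fractions import Fraction as F
from itertools import product

# HYPOTHETICAL distribution of ready micro-ops per cycle for ONE thread
# (illustrative only, not measured on any real CPU).
READY = {0: F(3, 10), 1: F(3, 10), 2: F(2, 10), 3: F(1, 10), 4: F(1, 10)}


def retired_per_cycle(width: int, threads: int) -> F:
    """Expected micro-ops retired per cycle when `threads` independent threads
    feed a retire stage that can retire at most `width` micro-ops per cycle."""
    total = F(0)
    # Enumerate every combination of per-thread ready counts exactly.
    for combo in product(READY.items(), repeat=threads):
        prob = F(1)
        ready = 0
        for ops, p in combo:
            prob *= p
            ready += ops
        total += prob * min(ready, width)  # the retire width caps the output
    return total


def smt_gain(width: int) -> F:
    """Throughput with two threads relative to one thread at the same width."""
    return retired_per_cycle(width, 2) / retired_per_cycle(width, 1)


if __name__ == "__main__":
    for w in (2, 4):
        print(w, float(retired_per_cycle(w, 1)), float(retired_per_cycle(w, 2)),
              round(float(smt_gain(w)), 2))
```

In this made-up model the second thread adds about 49% at 2-wide and about 79% at 4-wide. The absolute numbers are far above real HT gains because contention is ignored; only the direction of the trend is the point.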
 
A more interesting possibility is that Intel may actually have Penryn use hyperthreading to (under some conditions) execute BOTH halves of a conditional in the code: run both branches as hyperthreads and then keep the one that turns out to be correct, greatly reducing the penalty of having to flush the pipeline after a branch misprediction.
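The trade-off behind running both sides of a branch (usually called eager or dual-path execution) fits in a back-of-the-envelope cost model. This is a hypothetical sketch, not a description of anything Intel has announced: `flush_penalty` and `fork_overhead` are illustrative cycle counts, and the model assumes forking always avoids the flush while spending a fixed amount of the core's throughput on the path that gets thrown away.

```python
def predict_only_cost(mispredict_rate: float, flush_penalty: float) -> float:
    """Expected cycles lost per branch when we just predict and flush on a miss."""
    return mispredict_rate * flush_penalty


def dual_path_cost(fork_overhead: float) -> float:
    """Expected cycles lost per branch when both paths run and the wrong one is
    discarded. Assumption: no flush at all, only a fixed fork overhead."""
    return fork_overhead


def fork_pays_off(mispredict_rate: float, flush_penalty: float,
                  fork_overhead: float) -> bool:
    """True when executing both paths is cheaper than predicting."""
    return dual_path_cost(fork_overhead) < predict_only_cost(mispredict_rate, flush_penalty)


def break_even_rate(flush_penalty: float, fork_overhead: float) -> float:
    """Mispredict rate above which forking beats predicting in this model."""
    return fork_overhead / flush_penalty


if __name__ == "__main__":
    # HYPOTHETICAL numbers: 20-cycle flush, 3 cycles spent on the wrong path.
    print(fork_pays_off(0.05, 20, 3))  # well-predicted branch: False
    print(fork_pays_off(0.40, 20, 3))  # hard-to-predict branch: True
    print(break_even_rate(20, 3))
```

The model only favours forking on branches the predictor is unsure about, which is one way to read "under some conditions" above.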


Profile: enthusiast

You are a little late. I first posted this a couple of days ago. As I said, there are lots of recycled posts.  :roll:

Profile: newbie

Interesting post.  However, I have never given much credence to HT tech.  It just seems like a marketing-driven term rather than an actual benefit.  This doesn't mean that as a "free" feature I wouldn't take it.  I am just skeptical.  :wink:

Profile: enthusiast


 
It was actually pretty good: in applications that could support it, it gave a bigger advantage than doubling the L2 cache ever could. Here's another comparison:
http://www.anandtech.com/showdoc.aspx?i=1746&p=6
 
And there's the smoothness of running with it on.

Profile: Forum Veteran

Quote :

Interesting post.  However, I have never given much credence to HT tech.  It just seems like a marketing-driven term rather than an actual benefit.  This doesn't mean that as a "free" feature I wouldn't take it.  I am just skeptical.  :wink:

Methinks you've never used an HT-enabled CPU.

Profile: nimble knuckle

Quote :

Interesting post.  However, I have never given much credence to HT tech.  It just seems like a marketing-driven term rather than an actual benefit.  This doesn't mean that as a "free" feature I wouldn't take it.  I am just skeptical.  :wink:


 
In multithreaded apps it improves performance by up to 25%, but since these apps are rare on the desktop (and were even rarer a few years ago), I found HT more useful for keeping the PC 'responsive' under high CPU load. It's also particularly useful when an application 'hangs' Windows and is using 100% of CPU resources: HT keeps the system responsive enough to actually reach Task Manager and end the hung application there.

Profile: newbie

I've been outed :o  
 
OK, so I am talking about something I don't really know about... BUT even taking this info with a grain of salt, why should HT be something compelling on the Penryn CPU?  I am not bashing HT, just mentioning that it occupies ~5% of the transistor space (or so I have heard) and does not offer many benefits outside a few particular apps.  AGAIN, I will take HT / SSE4+  :wink:  (didn't Conroe already have SSE4? :wink: ) / 1333 FSB, etc. without so much as a grumble :lol:

Profile: nimble knuckle

Quote :

I've been outed :o  
 
OK, so I am talking about something I don't really know about... BUT even taking this info with a grain of salt, why should HT be something compelling on the Penryn CPU?  I am not bashing HT, just mentioning that it occupies ~5% of the transistor space (or so I have heard) and does not offer many benefits outside a few particular apps.  AGAIN, I will take HT / SSE4+  :wink:  (didn't Conroe already have SSE4? :wink: ) / 1333 FSB, etc. without so much as a grumble :lol:


 
I would imagine the main point would be to improve multithreaded performance. It would allow a dual-core CPU to execute four threads, although not as fast as a true quad-core CPU.
 
Take Alan Wake: we have all heard the hype about separate threads for physics, AI, audio, etc. The physics thread is very intensive and should max out one core, but apparently the AI and audio threads are far less intensive, so in a case like that we may see the benefit of HT in executing more threads simultaneously.

Profile: newbie

An application that fully utilizes a multicore CPU has to be multithreaded anyway, and in that case the chances are good that it will scale to 4-8 threads when the CPU has HT.
 
I've seen ~30% performance gains from hyperthreading in a server app, so it does work and is not just a "marketing" feature.

m25
Profile: Faithful Poster

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I wouldn't be surprised if they reintroduced it, just to boast a little more.
 
To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.

Profile: nimble knuckle

Quote :

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I wouldn't be surprised if they reintroduced it, just to boast a little more.
 
To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.


 
Reintroducing it just to boast makes no sense. Intel chased the magic GHz once just to boast, and I'm certain they have learned from that.
 
Having HT with a long-pipeline CPU like the P4 wasn't a bad idea. I'm not sure how well it would work with a 14- or 15-stage CPU. I like the idea though. Since development goes to multicore anyway, why not reintroduce the virtual cores?

Profile: newbie

Quote :

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I wouldn't be surprised if they reintroduced it, just to boast a little more.
 
To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.


 
I think the performance advantage from HT will be more than 5%, more like 10-15% at least. If it were only 5%, that would mean C2D basically has 5x better branch prediction and 5x fewer cache misses than NetBurst, which I find hard to believe. AFAIK the main advantage of C2D is being able to execute more instructions per cycle, which doesn't really affect HT, does it?

Profile: newbie

Quote :

I like the idea though. Since development goes to multicore anyway, why not reintroduce the virtual cores?


 
My thoughts exactly  :D Also, the apps I work with would benefit fully from HT, so I'm all for it...

Profile: enthusiast

The differences in branch prediction and cache misses don't have to be 5x for HT to make so much less difference on a Core 2.  Since the processor itself is so much more efficient, the penalties for cache misses and mispredictions are far smaller.  That means there is less room for the virtual thread to run in the processor (it's not empty for as long).
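The point about smaller penalties can be put into numbers with a crude upper bound. In this hypothetical sketch, a thread spends `busy_cycles` doing useful work and stalls for `misses * penalty_cycles`; a second hardware thread can at best fill those stall cycles, and can never more than double throughput. The figures in the usage lines are invented for illustration and hold busy cycles fixed, which a higher-IPC core would not actually do.

```python
def smt_headroom(busy_cycles: float, misses: int, penalty_cycles: float) -> float:
    """Upper bound on throughput with a second hardware thread, relative to one
    thread, when the second thread can only use the first thread's stall cycles."""
    stall_cycles = misses * penalty_cycles
    # Best case: every stall cycle is filled with the other thread's work,
    # capped at 2x because there are only two threads.
    return min(2.0, 1.0 + stall_cycles / busy_cycles)


if __name__ == "__main__":
    # HYPOTHETICAL figures, not measurements of any real core.
    old = smt_headroom(busy_cycles=1000, misses=10, penalty_cycles=30)
    new = smt_headroom(busy_cycles=1000, misses=6, penalty_cycles=15)
    print(f"long-pipeline core:  up to {old - 1:.0%} gain")
    print(f"short-pipeline core: up to {new - 1:.0%} gain")
```

Here cutting misses by about 1.7x and halving the penalty takes the ceiling from 30% to 9%, so neither figure has to improve 5x on its own; the stall time is their product.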

Profile: Forum Veteran

Quote :

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I wouldn't be surprised if they reintroduced it, just to boast a little more.
 
To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.

In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives: it helps in some scenarios and doesn't in others.  :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it leaves some horsepower, while the CPU is heavily loaded, to do minor things, or at least to get to Task Manager without waiting 10 minutes for the CPU to summon up a few cycles.

Profile: newbie

Ok, so are we going to see 2 threads/core or 4?  :D

m25
Profile: Faithful Poster

They haven't enabled HT in 65 nm Core2 CPUs, and that tells a lot about its impact on the new architecture. Pentium Ds have it, while Core2 dual-cores don't, not even the Extreme Editions.

Profile: newbie

Correct - no HT on Penryn.
 
Nehalem (phase II 45nm) will be the next CPU w/ hyper-threading enabled... and it sounds like they are going to make some "modifications" to how it works, to minimize the penalty for using it (when you would lose performance) and to make the CPUs work better together to gain more performance when possible... i.e. all CPUs will share a cache, so many things can be done smarter / more efficiently w/ this...

m25
Profile: Faithful Poster

Quote :

In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives: it helps in some scenarios and doesn't in others.  :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it leaves some horsepower, while the CPU is heavily loaded, to do minor things, or at least to get to Task Manager without waiting 10 minutes for the CPU to summon up a few cycles.


Even the gain in multitasking is only relative to the inefficient NetBurst architecture; the multitasking of an HT-enabled P4 is often not distinctly superior to that of a comparable K8 Athlon. Looking at the charts, the reverse may even be true: a single-core Athlon 64 sometimes beats an HT-enabled P4:
http://www23.tomshardware.com/cpu. [...] &chart=192

m25
Profile: Faithful Poster

Looking at the charts... I just found some mistakes there:
1- As far as I know, the Toledo core is a dual-core K8 with 1024K of L2 per core, yet they label the S939 4200+ and 4600+, both with 512K of L2 per core, as Toledo, when they might be Manchester.
2-Even worse, the X2 4200+ appears to be faster than the X2 4600+ in this multitasking test:
http://www23.tomshardware.com/cpu. [...] &chart=193
 :roll:

Profile: Faithful Poster
n°1481527
01-30-2007 at 11:11:43 PM