Penryn to have Hyperthreading
The "always" accurate folks at the Inquirer say Penryn will have hyperthreading in addition to 6 megs of cache, SSE4, low-k/high-k dielectrics and gates, and a 45nm process.
There have always been "rumours" that Core2duo actually had HT circuitry (much like Willamette and early Northwoods did) but that Intel never turned it on.
HT was never worth a whole lot in the desktop universe, a couple % at best; it did, however, make 25%+ differences in selected server-type applications.
The big problem was that Netburst could only RETIRE two micro-ops per clock, so regardless of how full the pipeline got stuffed, what came out the other end was always limited.
With Core doing 4 (or kinda 5, depending how you want to argue and/or count things) instructions per clock, hyperthreading may actually work this time.
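The retire-width argument can be put as a toy model: whatever two hardware threads offer, sustained throughput cannot exceed what the core retires per clock. The per-thread supply figure (1.5 uops/clock) and the widths compared are made-up illustrations, not measurements of any Netburst or Core 2 part, and the model ignores contention for caches, ports and queues, so it only gives an upper bound.

```python
# Toy upper-bound model of SMT throughput limited by retire width.
# The uop supply numbers in the example are hypothetical, not measured.

def sustained_uops_per_clock(thread_supply, retire_width):
    """Uops retired per clock when the listed threads share one core.

    thread_supply: uops per clock each thread could deliver on its own
    retire_width:  maximum uops the core can retire per clock
    """
    return min(sum(thread_supply), retire_width)


def smt_gain_upper_bound(per_thread_supply, retire_width):
    """Best-case throughput gain from two identical threads instead of one."""
    one = sustained_uops_per_clock([per_thread_supply], retire_width)
    two = sustained_uops_per_clock([per_thread_supply, per_thread_supply], retire_width)
    return two / one - 1.0


if __name__ == "__main__":
    supply = 1.5  # hypothetical uops/clock a single thread keeps flowing
    for width in (2, 4):
        gain = smt_gain_upper_bound(supply, width)
        print(f"retire width {width}: best-case SMT gain {gain:.0%}")
```

With these made-up numbers the narrow retire stage caps the second thread at about a third more throughput, while the wider one leaves room for up to double; real HT gains were far smaller because the threads also fight over everything else in the core.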
A more interesting possibility is that Intel may actually have Penryn use hyperthreading to (under some conditions) execute BOTH halves of a conditional in the code: hyperthread both branches, then keep the one that turns out to be correct, greatly reducing the penalty of flushing the pipeline after a branch misprediction.
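The payoff of running both paths can be sketched with simple expected-cost arithmetic: flush cycles = branches × mispredict rate × flush penalty, reduced by the share of mispredictions that land on branches whose both paths were already running. Every parameter here (branch count, 5% mispredict rate, 20-cycle penalty, the "coverage" fraction) is hypothetical, and this is not a description of any actual Intel design; it is optimistic because it ignores the resources the wrong path consumes.

```python
# Toy model of branch-misprediction cost, with and without running both
# paths of some branches on a second hardware thread ("eager execution").
# All parameters are hypothetical illustrations.

def flush_cycles(branches, mispredict_rate, flush_penalty):
    """Cycles lost to pipeline flushes with ordinary branch prediction."""
    return branches * mispredict_rate * flush_penalty


def flush_cycles_dual_path(branches, mispredict_rate, flush_penalty, coverage):
    """Cycles lost when a fraction `coverage` of the mispredictions occur on
    branches whose both paths were already executing, so no flush is paid
    for them. Optimistic: the cost of running the wrong path is ignored."""
    return flush_cycles(branches, mispredict_rate, flush_penalty) * (1.0 - coverage)


if __name__ == "__main__":
    base = flush_cycles(1000, 0.05, 20)
    eager = flush_cycles_dual_path(1000, 0.05, 20, coverage=0.5)
    print(f"predict only: {base:.0f} cycles lost, dual-path: {eager:.0f} cycles lost")
```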
It was actually pretty good: in applications that could use it, it gave a bigger advantage than doubling the L2 cache ever could. Here's another comparison:
http://www.anandtech.com/showdoc.aspx?i=1746&p=6
And then there's the smoothness of running with it on.
Quote:
Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean that as a "free" feature I wouldn't take it. I am just skeptical. :wink:
In multithreaded apps it improves performance by up to 25%, but since these apps are rare on the desktop (especially a few years ago), I found HT more useful for keeping the PC 'responsive' during high CPU load. It's also particularly useful if an application 'hangs' Windows while using 100% of CPU resources: HT lets the system stay responsive enough to reach Task Manager and end the hung application there.
I've been outed
OK, so I am talking about something I don't really know about... BUT even taking this info with a grain of salt, why should HT be something compelling on the Penryn CPU? I am not bashing HT, just mentioning that it occupies ~5% of the transistor space (or so I have heard) and does not offer many benefits outside a few particular apps. AGAIN, I will take HT tech/SSE4+ :wink: (didn't Conroe already have SSE4? :wink: ) + 1333 FSB, etc. without so much as a grumble
I would imagine the main point would be to improve multithreaded performance. It would allow a dual core CPU to execute four threads, although not at the same levels as a true quad core CPU.
Take Alan Wake: we have all heard the hype about how there are individual threads for physics, AI, audio, etc. The physics component is very intensive and should max out one core, but apparently the AI and audio threads are far less intensive, so in such a case we may see the benefit of HT in being able to execute more threads simultaneously.
An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just a kind of Viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the Netburst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.
To darkz:
The 30% you mention is for a Netburst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core only gained ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.
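The "swallowed by thread synchronization" point is easy to check with a bit of arithmetic, assuming (as a simplification, not a measured relationship) that the HT gain and the synchronization cost combine as independent multiplicative factors. The 5% overhead used below is a hypothetical figure; the 5/15/30% gains are the ones quoted in the post.

```python
# Toy arithmetic: treat HT gain and synchronization cost as independent
# multiplicative factors (an assumption, not a measured relationship).

def net_speedup(smt_gain, sync_overhead):
    """Net relative change in throughput once synchronization is paid for.

    smt_gain:      raw throughput gain from HT, e.g. 0.05 for +5%
    sync_overhead: fraction of throughput lost to locks and coordination
    """
    return (1.0 + smt_gain) * (1.0 - sync_overhead) - 1.0


if __name__ == "__main__":
    for gain in (0.05, 0.15, 0.30):  # the HT gains quoted in the post
        print(f"HT gain {gain:.0%}, 5% sync overhead -> net {net_speedup(gain, 0.05):+.2%}")
```

At +5% raw gain and a hypothetical 5% overhead, the net result is slightly negative (1.05 × 0.95 = 0.9975), which is the case the post describes.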
Reintroducing it just to boast makes no sense. Intel chased the magic GHz once just to boast, and I'm certain they have learned from that.
Having HT on a long-pipeline CPU like the P4 wasn't a bad idea. I'm not sure how well it would work with a 14- or 15-stage CPU. I like the idea though. Since development is going multicore anyway, why not reintroduce the virtual cores?
I think the performance advantage from HT will be more than 5%, more like 10-15% at least. If it were only 5%, it would mean that C2D basically has 5x better branch prediction and 5x fewer cache misses than Netburst, which I find hard to believe. AFAIK the main advantage of C2D is being able to execute more instructions per cycle, which doesn't really affect HT, does it?
The differences in branch prediction and cache misses don't have to be 5x for HT to make so much less difference on a Core 2. Since the processor itself is so much more efficient, the penalties for cache misses and mispredictions are far smaller. That means there is less room for the virtual thread to run in the processor (the pipeline isn't sitting empty for as long).
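The "less room for the virtual thread" argument can be sketched as a toy model: one thread keeps the core busy for a fraction of the time, and a second thread can at best fill the stalled cycles, never exceeding the whole core. The cycle counts, miss count and penalties below are hypothetical, chosen only to show that shrinking the miss penalty shrinks the headroom without any change in the miss count.

```python
# Toy model: SMT headroom is bounded by how often one thread leaves the core idle.
# All cycle counts and penalties in the example are hypothetical.

def busy_fraction(work_cycles, misses, miss_penalty):
    """Fraction of cycles a single thread keeps the core busy."""
    stall_cycles = misses * miss_penalty
    return work_cycles / (work_cycles + stall_cycles)


def smt_headroom(busy):
    """Best-case gain from a second thread that can only fill stalled cycles.

    One thread delivers `busy` of the core's capacity; two identical threads
    can at most double that, but never exceed the whole core (1.0).
    """
    return min(2.0 * busy, 1.0) / busy - 1.0


if __name__ == "__main__":
    for penalty in (30, 15):  # same miss count, smaller penalty on the "efficient" core
        b = busy_fraction(1000, 10, penalty)
        print(f"miss penalty {penalty}: busy {b:.1%}, SMT headroom {smt_headroom(b):.0%}")
```

Here halving the penalty (30 to 15 cycles, same 10 misses) cuts the best-case headroom from 30% to 15%, which matches the post's point that a less-stalled core leaves less for HT to fill.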
Correct - no HT on Penryn.
Nehalem (phase II 45nm) will be the next CPU with hyper-threading enabled... and it sounds like they are going to make some "modifications" to how it works, to minimize the penalty for using it (the cases where you would lose performance) and to make the CPUs work better together to gain more performance where possible... i.e. all CPUs will share a cache, so many things can be done smarter and more efficiently with this...
Quote:
In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003).
I liken HT to NCQ on hard drives: it helps in certain scenarios and doesn't in others. HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it leaves some horsepower while the CPU is heavily loaded to do minor things, or at least to get to Task Manager without waiting 10 minutes for the CPU to summon up a few cycles.
Even the gain in multitasking is only relative to the inefficient Netburst architecture; the multitasking of an HT-enabled P4 is often not distinctly superior to that of a comparable K8 Athlon. Looking at the charts, the inverse may be true: a single-core Athlon 64 sometimes beats an HT-enabled P4:
http://www23.tomshardware.com/cpu.html?modelx=33&model1...
Looking at the charts... I just found some mistakes there:
1- The Toledo core, as far as I know, is a dual-core K8 with 1024K of L2 per core, yet they label the S939 4200+ and 4600+, both with 512K of L2 per core, as Toledo, when they might be Manchester.
2-Even worse, the X2 4200+ appears to be faster than the X2 4600+ in this multitasking test:
http://www23.tomshardware.com/cpu.html?modelx=33&model1...
:roll:
Quote:
With my 3.0C, HT gave a 19% advantage in Cinebench (2003)
I know Cinebench is a respected benchmark and lots of users/reviewers use it, but as a benchmark it's a bad one. It's no less synthetic than the PCMark benchmarks. It may be a real-world application, but when its behavior is similar to synthetic ones, it's not really useful.
Take a look at multiprocessor scaling for all processors: Pentium 4/D/M, Core Duo, Core 2 Duo, Athlon, Athlon FX, etc. They have nearly identical numbers for 2P, 4P and 8P. Since it's a rendering benchmark, Athlon 64s should scale better because they have more bandwidth, right? No. It scales just as well as any Intel processor.
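"Scaling" in comparisons like this is usually judged as parallel efficiency: the speedup over the single-processor score divided by the processor count, where 1.0 means perfect scaling. The scores below are hypothetical placeholders, not Cinebench results for any real CPU; the sketch only shows how two chips with very different bandwidth could still produce near-identical efficiency numbers.

```python
# Parallel scaling efficiency from benchmark scores (higher score = faster).
# The example scores are hypothetical, not real Cinebench results.

def scaling_efficiency(scores):
    """Map processor count -> efficiency = (score_n / score_1) / n.

    scores: dict of {processor_count: score}; must include key 1.
    """
    base = scores[1]
    return {n: (s / base) / n for n, s in sorted(scores.items())}


if __name__ == "__main__":
    hypothetical = {1: 100, 2: 190, 4: 360}
    for n, eff in scaling_efficiency(hypothetical).items():
        print(f"{n}P: {eff:.0%} efficiency")
```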
Cinebench is one of the few benchmarks that shows Core 2 Duo pretty close to the X2.
I just found this....
http://www.dailytech.com/Welcome+Back+HyperThreading/ar...
And it seems that we really are going to have HT after all...! But not in the way we used to know it...
Check it out...
Any comments???
Cheers
Sam
Hyperthreading is Intel's name for simultaneous multi-threading (SMT), which lets a processor work on two threads at once. Intel did this by having the CPU appear as two virtual CPUs so that the OS will schedule tasks on both. Other manufacturers use multi-threading techniques similar to HT: Sun's UltraSPARC T1 uses a fine-grained threading technique to let each of its 4, 6 or 8 cores look like 16, 24 or 32 cores to the OS, respectively. IBM's dual-core POWER5 has SMT enabled, so the CPU looks like 4 cores to the OS. The PowerPC core in the Cell CPU (the Power Processor Element) has SMT enabled too.
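The core counts above boil down to one multiplication: logical CPUs seen by the OS = physical cores × hardware threads per core. The threads-per-core values here are derived from the post's own figures (16/4 = 4 for the T1, 4/2 = 2 for POWER5, two virtual CPUs for an HT Pentium 4); the table name and helper are just illustrative. `os.cpu_count()` is Python's standard call for the logical CPU count of the machine running the script, and it may return None.

```python
import os

def logical_cpus(cores, threads_per_core):
    """Logical processors the OS sees when every core exposes N hardware threads."""
    return cores * threads_per_core

# (physical cores, hardware threads per core), derived from the figures in the post
CHIPS = {
    "UltraSPARC T1, 4 cores": (4, 4),
    "UltraSPARC T1, 6 cores": (6, 4),
    "UltraSPARC T1, 8 cores": (8, 4),
    "POWER5, 2 cores": (2, 2),
    "Pentium 4 with HT": (1, 2),
}

if __name__ == "__main__":
    for name, (cores, tpc) in CHIPS.items():
        print(f"{name}: {logical_cpus(cores, tpc)} logical CPUs")
    # Logical CPUs on the machine running this script (None if undeterminable)
    print("this machine:", os.cpu_count())
```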
The POWER5 CPU has a long-ish pipeline, 20 stages IIRC, which is similar to the P4 Northwood's 20 and much shorter than the P4 Prescott/Cedar Mill's 31 stages. However, the T1 supposedly has 6-stage pipelines (part of why it runs at only 1.4 GHz maximum), so apparently this kind of multi-threading isn't only useful for mitigating branch mispredictions on long-pipeline CPUs. I'd like to see Intel bring it back, even if it's just to see how well it would work.
I didn't know that HT-style multi-threading was implemented in CPU microarchitectures outside of Intel. Thanks for the info...!
The thing I was wondering about is how Intel is going to implement HT. DailyTech reports that the implementation of HT on Penryn's derivatives is different from the P4's; as far as I can recall, they say that the OS is not going to see any virtual cores...!
So if we are only going to see 2 cores for Wolfdale and 4 cores for Yorkfield, how is this version of HT supposed to work? :roll:
Anyone?
Thx
Sam
8)
Penryn to NOT have HT.
http://www.theinquirer.net/default.aspx?article=37316
Quote:
Penryn does not have HT, nor will it ever. That banner is left up to Nehalem in late 2008. More than enough engineers, Lenovo sales people in outer Mongolia and the usual rabble picketing my house all confirm that it is not there.
Quote:
Multiple wrong sources
That's what happens when you wait beside the unemployment line, thinking that these ex-Intel, ex-AMD employees can now spill their guts on upcoming projects. The problem is, they're disgruntled, and will now say anything to blemish their ex-employer's name.
Quote:
Sorry for the confusion.
He should have that as a siggy, like guys do here. It would stave off carpal tunnel syndrome that much longer by not having to type all these apologies. :wink:
Also, the apps I work with would benefit fully from HT, so I'm all for it...