
Penryn to have Hyperthreading

January 29, 2007 2:25:00 AM

The "always" accurate folks at The Inquirer say Penryn will have Hyper-Threading in addition to 6 MB of cache, SSE4, low-k/high-k dielectrics and gates, and a 45 nm process.

There have always been "rumours" that Core 2 Duo actually had HT circuitry (much like Willamette and early Northwoods did) but that Intel never turned it on.

HT was never worth a whole lot in the desktop universe, a couple of percent at best; it did, however, make 25%+ differences in selected server-type applications.

The big problem was that NetBurst could only RETIRE two micro-ops per clock, so regardless of how full the pipeline got stuffed, what came out the other end was always limited.

With Core doing 4 instructions per clock (or kinda 5, depending on how you want to argue and/or count things), Hyper-Threading may actually work this time.

A more interesting possibility is that Intel may have Penryn use Hyper-Threading to (under some conditions) execute BOTH halves of a conditional in the code: run both branches as hyperthreads and then keep the one that turns out to be correct. That would greatly reduce the penalty of having to flush the pipeline after a branch misprediction.
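The trade-off behind running both paths (usually called eager or dual-path execution) can be sketched with a little arithmetic. This is a toy Python model, not anything Intel has described for Penryn: the flush penalty and the per-branch cost of feeding the wrong path through a second thread (`FORK_COST`) are hypothetical numbers, and the function names are made up for illustration.

```python
# Toy model of eager (dual-path) branch execution; all numbers are hypothetical.


def expected_flush_penalty(mispredict_rate: float, flush_penalty: float) -> float:
    """Average cycles lost per branch when a misprediction flushes the pipeline."""
    return mispredict_rate * flush_penalty


def should_fork_both_paths(mispredict_rate: float, flush_penalty: float,
                           fork_cost: float) -> bool:
    """Fork only if the expected flush cost exceeds the cost of running both
    paths (fork_cost is a hypothetical per-branch overhead: shared issue
    slots, cache pollution, and so on)."""
    return expected_flush_penalty(mispredict_rate, flush_penalty) > fork_cost


if __name__ == "__main__":
    FLUSH_PENALTY = 20.0  # hypothetical pipeline refill cost, in cycles
    FORK_COST = 3.0       # hypothetical per-branch cost of running both paths
    for name, rate in [("well-predicted loop branch", 0.02),
                       ("data-dependent branch", 0.30)]:
        penalty = expected_flush_penalty(rate, FLUSH_PENALTY)
        fork = should_fork_both_paths(rate, FLUSH_PENALTY, FORK_COST)
        action = "run both paths" if fork else "just predict"
        print(f"{name}: expected penalty {penalty:.1f} cycles -> {action}")
```

The point of the sketch: forking only pays off for branches that are hard to predict, which is why such a scheme would apply "under some conditions" rather than to every branch.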


January 29, 2007 2:50:30 AM

You are a little late. I first posted this a couple of days ago. As I said, there are lots of recycled posts. :roll:
January 29, 2007 2:55:25 AM

Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean I wouldn't take it as a "free" feature. I am just skeptical. :wink:
January 29, 2007 3:57:39 AM

Quote:
Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean I wouldn't take it as a "free" feature. I am just skeptical. :wink:

Methinks you never used an HT-enabled CPU.
January 29, 2007 4:41:47 AM

Quote:
Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean I wouldn't take it as a "free" feature. I am just skeptical. :wink:


In multithreaded apps it improves performance by up to 25%, but since such apps are rare on the desktop (and were even rarer a few years ago), I found HT more useful for keeping the PC 'responsive' during periods of high CPU load. It's also particularly useful when an application 'hangs' Windows and is using 100% of CPU resources: HT lets the system remain responsive enough to actually reach Task Manager and end the hung application there.
January 29, 2007 5:09:58 AM

I've been outed :o 

OK, so I am talking about something I don't really know about... BUT even taking this info with a grain of salt, why should HT be compelling on the Penryn CPU? I am not bashing HT, just mentioning that it occupies ~5% of the transistor budget (or so I have heard) and does not offer many benefits outside a few particular apps. AGAIN, I will take HT/SSE4+ :wink: (didn't Conroe already have SSE4? :wink: ) + 1333 FSB, etc. without so much as a grumble :lol:
January 29, 2007 6:52:29 AM

Quote:
I've been outed :o 

OK, so I am talking about something I don't really know about... BUT even taking this info with a grain of salt, why should HT be compelling on the Penryn CPU? I am not bashing HT, just mentioning that it occupies ~5% of the transistor budget (or so I have heard) and does not offer many benefits outside a few particular apps. AGAIN, I will take HT/SSE4+ :wink: (didn't Conroe already have SSE4? :wink: ) + 1333 FSB, etc. without so much as a grumble :lol:


I would imagine the main point would be to improve multithreaded performance. It would allow a dual core CPU to execute four threads, although not at the same levels as a true quad core CPU.

Take Alan Wake: we have all heard the hype about it having individual threads for physics, AI, audio, etc. The physics component is very intensive and should max out one core, but apparently the AI and audio threads are far less intensive, so in a case like that we may see the benefit of HT in being able to execute more threads simultaneously.
January 29, 2007 7:48:09 AM

An application that fully utilizes a multicore CPU has to be multithreaded anyway, and in that case the chances are good that it will scale to 4-8 threads when the CPU has HT.
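As a minimal sketch of that point, assuming Python: a program that sizes its worker pool from the logical CPU count picks up the extra HT threads automatically. `os.cpu_count()` reports logical processors, so a dual-core chip with HT enabled shows up as 4. The sum-of-squares workload and the function names are hypothetical stand-ins for real work.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MODULUS = 1_000_003  # arbitrary prime so partial results can be combined


def worker_count() -> int:
    # os.cpu_count() counts *logical* processors: on a dual-core CPU with
    # Hyper-Threading enabled it reports 4, not 2. It may return None.
    return os.cpu_count() or 1


def sum_of_squares(chunk: range) -> int:
    # Hypothetical CPU-bound task standing in for real work.
    return sum(i * i for i in chunk) % MODULUS


def parallel_sum_of_squares(n: int) -> int:
    if n <= 0:
        return 0
    workers = worker_count()
    step = -(-n // workers)  # ceiling division: one chunk per logical CPU
    chunks = [range(start, min(start + step, n)) for start in range(0, n, step)]
    # Note: in CPython the GIL serializes pure-Python bytecode, so real
    # CPU-bound code would use ProcessPoolExecutor or native threads; the
    # sizing logic is the same.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks)) % MODULUS


if __name__ == "__main__":
    print(f"logical CPUs: {worker_count()}")
    print(parallel_sum_of_squares(1_000_000))
```

The design choice is simply to ask the OS how many logical CPUs it sees rather than hard-coding a core count, which is what lets an already multithreaded app scale from 2 to 4 (or 4 to 8) threads when HT is present.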

I've seen ~30% performance gains from hyperthreading in a server app so it does work and is not just a "marketing" feature.
January 29, 2007 10:23:02 AM

An efficient architecture like Core 2 or K8/K8L cannot benefit from HT; it was just some kind of Viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core 2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.
January 29, 2007 10:45:01 AM

Quote:
An efficient architecture like Core 2 or K8/K8L cannot benefit from HT; it was just some kind of Viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core 2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.


Reintroducing it just to boast makes no sense. Intel chased the magic GHz once just to boast, and I'm certain they have learned from that.

Having HT on a long-pipeline CPU like the P4 wasn't a bad idea. I'm not sure how well it would work with a 14- or 15-stage CPU. I like the idea, though. Since development is heading to multicore anyway, why not reintroduce the virtual cores?
January 29, 2007 11:24:30 AM

Quote:
An efficient architecture like Core 2 or K8/K8L cannot benefit from HT; it was just some kind of Viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core 2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.


I think the performance advantage from HT will be more than 5%, more like 10-15% at least. If it were only 5%, it would mean that C2D basically has 5x better branch prediction and 5x fewer cache misses than NetBurst, which I find hard to believe. AFAIK the main advantage of C2D is being able to execute more instructions per cycle, which doesn't really affect HT, does it?
January 29, 2007 11:28:24 AM

Quote:
I like the idea though. Since development goes to multicore anyway, why not reintroduce the virtual cores?


My thoughts exactly :D  Also, the apps I work with would benefit fully from HT, so I'm all for it...
January 29, 2007 11:59:45 AM

The differences in branch prediction and cache misses don't have to be 5x for HT to make so much less difference on a Core 2. Since the processor itself is so much more efficient, the penalties for cache misses and mispredictions are far smaller. That means there is less room for the virtual thread to run in the processor (it isn't sitting idle for as long).
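That argument can be put into a toy model, assuming each thread on its own keeps the core busy a fraction `u` of cycles and that the two threads' stalls are independent, so the core has useful work whenever at least one thread is ready. This is an illustration, not a measurement of any real CPU, and the function name is hypothetical.

```python
def smt_speedup(u: float) -> float:
    """Idealized throughput of two SMT threads relative to one thread.

    Toy assumptions: each thread alone keeps the core busy a fraction u of
    cycles, the two threads stall independently, and the core does useful
    work in any cycle where at least one thread is ready.
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be in (0, 1]")
    busy_with_two = 1.0 - (1.0 - u) ** 2  # probability at least one thread is ready
    return busy_with_two / u              # simplifies to 2 - u


if __name__ == "__main__":
    for u in (0.5, 0.7, 0.9):
        gain = smt_speedup(u) - 1.0
        print(f"core busy {u:.0%} of cycles with one thread -> {gain:.0%} gain")
```

Under these assumptions a core that one thread leaves idle half the time gains 50% from a second thread, while one that is already busy 90% of the time gains only about 10%, which is the shape of the NetBurst versus Core 2 argument. Contention for shared caches would push the real numbers lower.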
January 29, 2007 4:00:23 PM

Quote:
An efficient architecture like Core 2 or K8/K8L cannot benefit from HT; it was just some kind of Viagra to keep P4s and some PDs competitive a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the NetBurst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a NetBurst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core gained only ~15% from HT, and something like a Core 2 will get no more than 5%, an amount that can easily be swallowed by thread synchronization in some cases.


In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives: it helps in some scenarios and doesn't in others. :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it leaves some horsepower while the CPU is heavily loaded to do minor things, or, as said, at least to get to Task Manager without having to wait 10 minutes for the CPU to summon up a few cycles.
January 29, 2007 5:07:21 PM

Ok, so are we going to see 2 threads/core or 4? :D 
January 30, 2007 6:58:08 PM

They haven't enabled HT in the 65 nm Core 2 CPUs, and that tells you a lot about its impact on the new architecture. Some Pentium Ds (the Extreme Editions) have it, while Core 2 dual cores don't, not even the Extreme Editions.
January 30, 2007 7:10:17 PM

Correct, no HT on Penryn.

Nehalem (phase II 45nm) will be the next CPU with Hyper-Threading enabled, and it sounds like they are going to make some "modifications" to how it works, both to minimize the penalty for using it (the cases where you would lose performance) and to make the cores work better together to gain more performance when possible. For example, all cores will share a cache, so many things can be done more efficiently.
January 30, 2007 7:33:52 PM

Quote:
In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives: it helps in some scenarios and doesn't in others. :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it leaves some horsepower while the CPU is heavily loaded to do minor things, or, as said, at least to get to Task Manager without having to wait 10 minutes for the CPU to summon up a few cycles.

Even the gain in multitasking is only relative to the inefficient NetBurst architecture; the multitasking of an HT-enabled P4 is often not distinctly superior to that of a comparable K8 Athlon. Looking at the charts, the inverse may be true: a single-core Athlon 64 sometimes beats an HT-enabled P4:
http://www23.tomshardware.com/cpu.html?modelx=33&model1...
January 30, 2007 7:54:20 PM

Looking at the charts, I just found some mistakes there:
1. As far as I know, the Toledo core is a dual-core K8 with 1024K of L2 per core, yet they label the S939 4200+ and 4600+, both with 512K of L2 per core, as Toledo, when they might actually be Manchester.
2. Even worse, the X2 4200+ appears to be faster than the X2 4600+ in this multitasking test:
http://www23.tomshardware.com/cpu.html?modelx=33&model1...
:roll:
January 30, 2007 8:11:43 PM

It is not a mistake; there are X2s with the Toledo core but with half of the L2 disabled.
January 31, 2007 12:45:54 AM

Quote:
With my 3.0C, HT gave a 19% advantage in Cinebench (2003)


I know Cinebench is a respected benchmark and lots of users/reviewers use it, but as a benchmark it's a bad one. It's no less synthetic than the PCMark benchmarks. It may be based on a real application, but when its behavior is similar to synthetic ones, it's not really useful.

Take a look at the multiprocessor scaling for all processors: Pentium 4/D/M, Core Duo, Core 2 Duo, Athlon, Athlon FX, etc. They have nearly identical numbers for 2P, 4P and 8P. Since it's a rendering benchmark, the Athlon 64 should scale better because it has more bandwidth, right? No. It scales just as well as any Intel processor.
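For reference, comparing multiprocessor scaling across chips usually comes down to two numbers: speedup (multi-threaded score divided by single-threaded score) and efficiency (speedup divided by the thread count). A minimal Python sketch for a higher-is-better score such as Cinebench's; the function name is made up, and the scores in the usage block are hypothetical, not real results.

```python
def scaling(single_score: float, multi_score: float, n: int) -> tuple[float, float]:
    """Speedup and parallel efficiency for a higher-is-better benchmark score."""
    if single_score <= 0 or n < 1:
        raise ValueError("need a positive single-thread score and n >= 1")
    speedup = multi_score / single_score
    return speedup, speedup / n


if __name__ == "__main__":
    # Hypothetical scores, not real Cinebench results.
    one_thread, two_threads = 300.0, 570.0
    s, e = scaling(one_thread, two_threads, 2)
    print(f"speedup {s:.2f}x, efficiency {e:.0%}")
```

If every CPU lands at nearly the same efficiency regardless of memory bandwidth, that supports the point that the benchmark isn't stressing the memory subsystem much.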

Cinebench is one of the few benchmarks that show the Core 2 Duo pretty close to the X2.
January 31, 2007 1:54:53 AM

Hyper-Threading is Intel's name for simultaneous multithreading (SMT), which lets a processor work on two threads at once. Intel did this by having the CPU look like two virtual CPUs so that the OS will schedule tasks on both. Other manufacturers use multithreading techniques similar to HT: Sun's UltraSPARC T1 uses fine-grained multithreading (four threads per core) so that its 4-, 6-, or 8-core versions look like 16, 24, or 32 CPUs to the OS, respectively. IBM's dual-core POWER5 has SMT enabled, so the chip looks like 4 CPUs to the OS. The PowerPC Processing Engine core in the Cell CPU also has SMT enabled.
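On Linux/x86 you can see whether SMT is exposing extra logical CPUs by comparing the number of `processor` entries in `/proc/cpuinfo` with the per-package `cpu cores` field. A rough Python sketch under that assumption; the function name is hypothetical, and other architectures such as SPARC or POWER format `/proc/cpuinfo` differently, so this parser would not apply to the T1 or POWER5.

```python
from pathlib import Path


def summarize_cpuinfo(text: str) -> dict:
    """Count logical and physical cores from Linux x86 /proc/cpuinfo text."""
    cores_per_package = {}  # "physical id" -> "cpu cores"
    logical = 0
    entry = {}
    # Entries are separated by blank lines; the trailing "" flushes the last one.
    for line in text.splitlines() + [""]:
        if line.strip():
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
            continue
        if "processor" in entry:
            logical += 1
            package = entry.get("physical id", "0")
            cores_per_package[package] = int(entry.get("cpu cores", "1"))
        entry = {}
    physical = sum(cores_per_package.values())
    return {"logical": logical, "physical": physical, "smt": logical > physical}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(summarize_cpuinfo(path.read_text()))
```

On an HT-enabled single-core P4, for example, you would expect two `processor` entries but `cpu cores : 1`, so the sketch would report `smt: True`.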

The POWER5 CPU has a longish pipeline, about 20 stages IIRC, which is similar to the P4 Northwood's 21 and much shorter than the P4 Prescott/Cedar Mill's 31 stages. However, the T1 supposedly has 6-stage pipelines (part of why it runs at only 1.4 GHz maximum), so apparently this kind of multithreading isn't only useful for mitigating branch mispredictions on long-pipeline CPUs. I'd like to see Intel bring it back, even if it's just to see how well it would work.
January 31, 2007 3:40:22 AM

I didn't know HT-style multithreading was implemented in CPU microarchitectures outside Intel. Thanks for the info!

What I was wondering about is how Intel is going to implement HT. DailyTech reports that the implementation of HT in Penryn's derivatives is different from the P4's; as far as I can recall, they say the OS is not going to see any virtual cores.

So if we are only going to see 2 cores for Wolfdale and 4 cores for Yorkfield, how is this version of HT supposed to work? :roll:

Anyone?

Thx

Sam
8)
January 31, 2007 9:14:02 AM

Penryn to NOT have HT.

http://www.theinquirer.net/default.aspx?article=37316

Quote:
Penryn does not have HT, nor will it ever. That banner is left up to Nehalem in late 2008.

More than enough engineers, Lenovo sales people in outer Mongolia and the usual rabble picketing my house all confirm that it is not there.
January 31, 2007 3:39:55 PM

Quote:
Penryn to NOT have HT.

http://www.theinquirer.net/default.aspx?article=37316

Penryn does not have HT, nor will it ever. That banner is left up to Nehalem in late 2008.

More than enough engineers, Lenovo sales people in outer Mongolia and the usual rabble picketing my house all confirm that it is not there.


Quote:
Multiple wrong sources


That's what happens when you wait beside the unemployment line, thinking that these ex-Intel and ex-AMD employees can now spill their guts on upcoming projects. The problem is, they're disgruntled and will now say anything to blemish their ex-employers' names.

Quote:
Sorry for the confusion.


He should have that as a siggy, like guys do here. It would stave off carpal-tunnel syndrome that much longer by not having to type all these apologies. :wink: