Penryn to have Hyperthreading

the_vorlon · Jan 29, 2007

The "always" accurate folks at the Inquirer say Penryn will have hyperthreading in addition to 6 MB of cache, SSE4, low/high-k dielectrics/gates, and a 45 nm process.

There have always been "rumours" that Core 2 Duo actually had HT circuitry (much like Willamette and early Northwoods did) but that Intel never turned it on.

HT was never worth a whole lot in the desktop universe, a couple of percent at best. It did, however, make 25%+ differences in selected server-type applications.

The big problem was that Netburst could only RETIRE two micro-ops per clock, so regardless of how full the pipeline got stuffed, what came out the other end was always limited.

With Core doing 4 (or kinda 5, depending how you want to argue and/or count things) instructions per clock, hyperthreading may actually work this time.

A more interesting possibility is that Intel may actually have Penryn use hyperthreading to (under some conditions) execute BOTH halves of a conditional in the code, hyperthread both branches, and then actually use the one that turns out to be correct, greatly reducing the penalty of having to flush the pipeline in the event of a branch prediction error.
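A rough way to see when running both halves of a branch could pay off is to compare the average flush cost of ordinary prediction with the cost of always doing the extra work. The sketch below is only a toy model, not a description of anything Intel has announced; the function names, the 20-cycle flush penalty, the misprediction rates and the 4 cycles of discarded work are all assumed values for illustration.

```python
def predict_cost(mispredict_rate, flush_penalty):
    """Average cycles lost per branch when the CPU guesses one path."""
    return mispredict_rate * flush_penalty


def eager_cost(wasted_cycles):
    """Average cycles lost per branch when both paths are executed.

    No flush ever happens, but the work done on the discarded path
    is paid for on every branch.
    """
    return wasted_cycles


def eager_wins(mispredict_rate, flush_penalty, wasted_cycles):
    """True when executing both paths costs less than predicting."""
    return eager_cost(wasted_cycles) < predict_cost(mispredict_rate, flush_penalty)


if __name__ == "__main__":
    # Assumed numbers: 20-cycle flush, 4 cycles of discarded work.
    for rate in (0.02, 0.10, 0.50):
        print(rate, eager_wins(rate, flush_penalty=20, wasted_cycles=4))
```

Under these assumed numbers the trick only helps for branches that are hard to predict, which would fit the "under some conditions" caveat.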

DavidC1 · Jan 29, 2007

You are a little late. I first posted this a couple of days ago. As I said, there are lots of recycled posts. :roll:

HeavyF · Jan 29, 2007

Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean that as a "free" feature I wouldn't take it. I am just skeptical.

DavidC1 · Jan 29, 2007

http://www.tomshardware.com/2002/11/14/single_cpu_in_dual_operation/page12.html

It was actually pretty good: in applications that could support it, it gave more of an advantage than doubling the L2 cache ever could. Here's another comparison:
http://www.anandtech.com/showdoc.aspx?i=1746&p=6

And then there is the smoothness of running with it on.

1Tanker · Jan 29, 2007

Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean that as a "free" feature I wouldn't take it. I am just skeptical.

Methinks you never used an HT-enabled CPU.

epsilon84 · Jan 29, 2007

Interesting post. However, I have never given much credence to HT tech. It just seems like a marketing-driven term rather than an actual benefit. This doesn't mean that as a "free" feature I wouldn't take it. I am just skeptical.

In multithreaded apps it improves performance by up to 25%, but since these apps are rare on the desktop (especially a few years ago) I found HT more useful for keeping the PC 'responsive' during high CPU load times. It's also particularly useful if an application 'hangs' Windows and is using 100% of CPU resources: HT lets the system remain responsive enough to actually reach Task Manager and end the hung application there.

HeavyF · Jan 29, 2007

I've been outed

OK, so I am talking about something I don't really know about... BUT even with this info taken with a grain of salt, why should HT technology be something that is compelling on the Penryn CPU? I am not bashing HT, just mentioning that it occupies ~5% of the transistor space (or so I have heard) and does not offer many benefits besides a few particular apps. AGAIN, I will take HT tech / SSE4+ (didn't Conroe already have SSE4?) + 1333 FSB, etc. without so much as a grumble :lol:

epsilon84 · Jan 29, 2007

I've been outed

OK, so I am talking about something I don't really know about... BUT even with this info taken with a grain of salt, why should HT technology be something that is compelling on the Penryn CPU? I am not bashing HT, just mentioning that it occupies ~5% of the transistor space (or so I have heard) and does not offer many benefits besides a few particular apps. AGAIN, I will take HT tech / SSE4+ (didn't Conroe already have SSE4?) + 1333 FSB, etc. without so much as a grumble :lol:

I would imagine the main point would be to improve multithreaded performance. It would allow a dual core CPU to execute four threads, although not at the same levels as a true quad core CPU.

Imagine Alan Wake: we have all heard the hype about how there are individual threads for physics, AI, audio etc. The physics component is very intensive and should max out one core, but apparently the AI and audio threads are far less intensive, so in such an instance we may see the benefits of HT in being able to execute more threads simultaneously.

darkz · Jan 29, 2007

An application that fully utilizes a multicore CPU has to be multithreaded anyway, and in that case the chances are good that it will scale to 4-8 threads when the CPU has HT.

I've seen ~30% performance gains from hyperthreading in a server app so it does work and is not just a "marketing" feature.

m25 · Jan 29, 2007

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive for a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the Netburst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a Netburst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core only gained ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can well be swallowed by thread synchronization in some cases.

Slobogob · Jan 29, 2007

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive for a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the Netburst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a Netburst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core only gained ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can well be swallowed by thread synchronization in some cases.

Reintroducing it just to boast makes no sense. Intel chased the magic GHz once just to boast and I'm certain they have learned from that.

Having HT with a long-pipeline CPU like the P4 wasn't a bad idea. I'm not sure how well it would work with a 14 or 15 stage CPU. I like the idea though. Since development goes to multicore anyway, why not reintroduce the virtual cores?

darkz · Jan 29, 2007

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive for a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the Netburst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a Netburst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core only gained ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can well be swallowed by thread synchronization in some cases.

I think the performance advantage from HT will be more than 5%, more like 10%-15% at least. If it was only 5% it would mean that C2D basically has 5x better branch prediction and 5x fewer cache misses than Netburst, which I find hard to believe. AFAIK the main advantage of C2D is being able to execute more instructions per cycle, which doesn't really affect HT, does it?

darkz · Jan 29, 2007

I like the idea though. Since development goes to multicore anyway, why not reintroduce the virtual cores?

My thoughts exactly

Also, the apps I work with would benefit fully from HT, so I'm all for it...

Aragorn · Jan 29, 2007

The differences in branch prediction and cache misses don't have to be 5x for HT to make so much less difference on a Core 2. Since the processor itself is so much more efficient, the penalties for cache misses and mispredictions are far smaller. That means there is less room for that virtual thread to run in the processor (it's not empty for as long).
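The argument here can be put into a small back-of-the-envelope model: the idle time a second thread could fill depends on the product of how often stalls happen and how long each one lasts, so both can shrink modestly and still cut the idle share a lot. The sketch below is a toy calculation only; the function names, the base CPI of 1.0, and the stall rates and penalties are assumed numbers, not measurements of any real chip.

```python
def stall_cycles_per_instr(stalls_per_instr, penalty_cycles):
    """Average stall cycles added to each instruction."""
    return stalls_per_instr * penalty_cycles


def idle_fraction(stalls_per_instr, penalty_cycles, base_cpi=1.0):
    """Share of cycles the core sits idle waiting on stalls.

    base_cpi is the assumed cycles per instruction with no stalls.
    """
    stall = stall_cycles_per_instr(stalls_per_instr, penalty_cycles)
    return stall / (base_cpi + stall)


if __name__ == "__main__":
    # Hypothetical "long pipeline" chip: frequent, expensive stalls.
    old = idle_fraction(stalls_per_instr=0.01, penalty_cycles=30)
    # Hypothetical "efficient" chip: 2x fewer stalls, 2.5x cheaper each.
    new = idle_fraction(stalls_per_instr=0.005, penalty_cycles=12)
    print(f"idle share, old: {old:.3f}  new: {new:.3f}")
```

In this made-up case the stall rate only halves, yet the idle share the second thread could use drops by roughly a factor of four, because the per-stall penalty fell as well.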

1Tanker · Jan 29, 2007

An efficient architecture like Core2 or K8/K8L cannot benefit from HT; it was just some kind of viagra to keep P4s and some PDs competitive for a little longer. HT just made up for the incredible rate of cache misses and overall pipeline stalls the Netburst architecture had. However, I'd not be surprised if they reintroduced it, just to boast a little more.

To darkz:
The 30% you mention is for a Netburst P4 or Xeon with a Prescott core. The (somewhat) more efficient Northwood core only gained ~15% from HT, and something like a Core2 will get no more than 5%, an amount that can well be swallowed by thread synchronization in some cases.

In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives. It helps in certain scenarios, and doesn't in some. :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it does leave some horsepower, while the CPU is heavily loaded, to do minor things, and/or as said... at least get to Task Manager without having to wait 10 minutes for the CPU to summon up a few cycles.

mwswami · Jan 29, 2007

Ok, so are we going to see 2 threads/core or 4?

m25 · Jan 30, 2007

They haven't enabled HT in 65 nm Core2 CPUs, and this tells a lot about its impact on the new arch. Pentium Ds have it, while Core2 dual-cores don't, not even the Extreme Editions.

AZHeat · Jan 30, 2007

Correct - no HT on Penryn.

Nehalem (phase II 45nm) will be the next CPU w/ hyper-threading enabled... and it sounds like they are going to make some "modifications" to how it works to minimize the penalty for using it (when you would lose performance) and to make the CPUs work better together to gain more performance when possible... i.e. all CPUs will share a cache, thus many things can be done smarter / more efficiently w/ this...

m25 · Jan 30, 2007

In some benches HT actually hurt performance (ever so slightly). With my 3.0C, HT gave a 19% advantage in Cinebench (2003). I liken HT to NCQ on hard drives. It helps in certain scenarios, and doesn't in some. :? HT does give a great increase in responsiveness (undeniably), and as someone else mentioned, it does leave some horsepower, while the CPU is heavily loaded, to do minor things, and/or as said... at least get to Task Manager without having to wait 10 minutes for the CPU to summon up a few cycles.

Even the gain in multitasking is only relative to the inefficient Netburst architecture; the multitasking of an HT-enabled P4 is often not distinctly superior to that of a comparable K8 Athlon. Looking at the charts, the inverse may be true: a single-core Athlon 64 sometimes beats an HT-enabled P4:
http://www23.tomshardware.com/cpu.html?modelx=33&model1=444&model2=491&chart=192

m25 · Jan 30, 2007

Looking at the charts... I just found some mistakes there:
1-The Toledo core, as far as I know, is a dual-core K8 with 1024K/core of L2, yet they label the S939 4200+ and 4600+, both with 512K/core of L2, as Toledo, when they might be Manchester.
2-Even worse, the X2 4200+ appears to be faster than the X2 4600+ in this multitasking test:
http://www23.tomshardware.com/cpu.html?modelx=33&model1=477&model2=479&chart=193
:roll:

gOJDO · Jan 30, 2007

It is not a mistake; there are X2s with the Toledo core, but with half the L2 disabled.

DavidC1 · Jan 31, 2007

With my 3.0C, HT gave a 19% advantage in Cinebench (2003)

I know Cinebench is a respected benchmark and lots of users/reviewers use it, but for benchmarking, it's a bad one. It's no less synthetic than the PCMark benchmarks. It may be based on a real application, but when the behavior is similar to synthetic ones, then it's not really useful.

Take a look at multiprocessor scaling for all processors, Pentium 4/D/M/Core Duo/Core 2 Duo/Athlon/AthlonFX/etc. They have nearly similar numbers for 2P, 4P, 8P. Since it's a rendering benchmark, Athlon 64s should scale better since they have more bandwidth, right?? No. They scale just as well as any Intel processor.

Cinebench is one of the few benchmarks that shows Core 2 Duo pretty close to the X2.

samxxxii · Jan 31, 2007

I just found this....

http://www.dailytech.com/Welcome+Back+HyperThreading/article5921.htm

And it seems that we are really going to have HT after all...! But not in the way that we used to know...

Check it out...

Any comments???

Cheers

Sam

MU_Engineer · Jan 31, 2007

Hyperthreading is Intel's name for simultaneous multi-threading, which lets a processor work on two threads at once. Intel did this by having the CPU look like two virtual CPUs so that the OS will schedule tasks on both of them. Other manufacturers use multi-threading techniques similar to HT: Sun's UltraSPARC T1 uses a fine-grained threading technique to let each of its 4, 6, or 8 cores look like 16, 24, or 32 cores to the OS, respectively. IBM's POWER5 dual-core CPU has SMT enabled so the CPU looks like 4 cores to the OS. The Power Processor Element (PPE) core in the Cell CPU has SMT enabled also.

The POWER5 CPU has a long-ish pipeline at 20 stages IIRC, which is similar to the P4 Northwood's 21 and much less than the P4 Prescott/Cedar Mill's 31 stages. However, the T1 supposedly has 6-stage pipelines (part of why it runs at only 1.4 GHz maximum), so apparently this multi-threading isn't only useful for mitigating branch mispredictions on long-pipeline CPUs. I'd like to see Intel bring it back, even if it's just to see how well it would work.
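Because each hardware thread shows up to the OS as its own CPU, software normally sees the logical count rather than the number of physical cores. A minimal sketch of checking that from Python: `os.cpu_count()` is a real standard-library call that reports logical CPUs, while the helper names and the physical core count passed in below are assumptions for illustration (the standard library does not report physical cores).

```python
import os


def logical_cpus():
    """Number of logical CPUs the OS exposes (hardware threads included)."""
    # os.cpu_count() can return None if the count is unknown.
    return os.cpu_count() or 1


def threads_per_core(logical, physical_cores):
    """Hardware threads per core, given a known physical core count."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return logical // physical_cores


if __name__ == "__main__":
    print("logical CPUs seen by the OS:", logical_cpus())
    # Hypothetical dual-core chip with 2-way SMT, seen as 4 CPUs.
    print("threads per core:", threads_per_core(4, 2))
```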

samxxxii · Jan 31, 2007

I didn't know that HT was implemented in other CPU uArchs outside of Intel. Thanks for the info...!

The thing that I was wondering about is the way that Intel is going to implement HT. DailyTech reports that the implementation of HT on Penryn's derivatives is different compared to the P4; they say that the OS is not going to see any virtual cores, as far as I can recall now...!

So if we are only going to see 2 cores for Wolfdale and 4 cores for Yorkfield, then how is this version of HT supposed to work? :roll:

Anyone?

Thx

Sam
8)
