Hyper-Threading all the way from the P4 to the Core i7

September 1, 2009 3:02:41 AM

They used Hyper-Threading back in the P4, then it disappeared for a while and the Core 2 processors went back to the P3 approach.

So I want to ask: why did they give the Core i7 Hyper-Threading? Does it gain more performance by doing that? If so, how does Hyper-Threading help with that?
September 1, 2009 3:28:54 AM

I don't know all the technical details about Hyper-Threading, but I know it creates logical cores. Back when the P4 was out, it was a single-core processor, but Hyper-Threading made the OS see it as a dual-core processor. That improved multitasking and performance in multithreaded apps, similar to what a true dual core would do, but to a lesser extent.

I can't say for sure why they left it out of the C2 design and reimplemented it in the i7 design, but once again it is back to create more logical cores and improve multitasking.
September 1, 2009 3:38:25 AM

The implementation for P4 and i7 is not the same.
September 1, 2009 3:41:43 AM

How are they not the same? Could you explain in detail, please?

Thank you.
September 1, 2009 4:36:46 AM

P4 and i7 have different microarchitectures - the P4 has a lot of very small pipeline stages that run at a (relatively) faster clock rate, while the i7 has fewer pipeline stages that do more work and run at a slower clock rate.

Hyperthreading takes advantage of unused cycles in the execute stages of a pipeline. Modern CPUs are "superscalar", meaning they have multiple integer and floating point adders and memory access units in the execution stages of their pipelines. A single stream of instructions can't keep all of these units busy, so hyperthreading fetches two streams of instructions into the same pipeline so that there's more work for the execution units to do.
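To make the "can't keep all of these units busy" point concrete, here's a toy Python sketch. It assumes a hypothetical 4-wide core and made-up counts of how many independent instructions each stream has ready per cycle; none of these numbers come from a real P4 or i7.

```python
# Toy model of a superscalar core: count how many execution slots get
# filled per cycle by one instruction stream versus two (Hyper-Threading).
# The width and the per-cycle "ready" counts are made-up illustrative
# numbers, not measurements of a real P4 or i7.


def utilization(streams: list[list[int]], width: int) -> float:
    """Fraction of execution slots filled across all cycles.

    streams[i][c] is how many independent instructions stream i has ready
    in cycle c; the core can issue at most `width` instructions per cycle.
    """
    cycles = len(streams[0])
    issued = 0
    for c in range(cycles):
        ready = sum(stream[c] for stream in streams)
        issued += min(ready, width)  # extra ready instructions must wait
    return issued / (width * cycles)


if __name__ == "__main__":
    width = 4                        # hypothetical 4-wide execution engine
    thread_a = [2, 1, 0, 3, 1, 2]    # hypothetical ready counts per cycle
    thread_b = [1, 2, 2, 0, 3, 1]
    print(f"one stream:  {utilization([thread_a], width):.0%}")
    print(f"two streams: {utilization([thread_a, thread_b], width):.0%}")
```

With these made-up numbers, one stream fills 9 of the 24 slots and adding the second stream fills 18 of 24. The min() call is where a real core runs out of units and extra instructions have to wait.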

The effectiveness of hyperthreading depends on the stages in the pipeline, their ability to decode and queue multiple instructions at once, the types of execution units in the pipeline, and other things.
September 1, 2009 4:58:12 AM

So what you're saying is, for example, take a P4 3.0 with Hyper-Threading and a P4 3.0 without it.

Both have the same clock speed and the same specs, but one has Hyper-Threading and the other does not.

Now when opening a program that requires a full load, the Hyper-Threading CPU will use its core at 100% to use all of its speed, but the CPU without Hyper-Threading will take time to reach 100% and will not use all of its core's speed? Is that what you're saying?
September 1, 2009 5:54:47 AM

surda said:

Now when opening a program that requires a full load, the Hyper-Threading CPU will use its core at 100% to use all of its speed, but the CPU without Hyper-Threading will take time to reach 100% and will not use all of its core's speed? Is that what you're saying?


To give an example, suppose you have a CPU core with two integer arithmetic units and one floating point unit. With a single thread, a program might be using just one of those integer units while the other integer unit and the floating point unit sit idle for that clock cycle.

If the CPU core supports two threads and the next instruction in the other thread only needs the floating point unit, or one integer unit with or without the floating point unit, then the CPU core can run instructions from both threads simultaneously. If the next instruction from the first thread then needs both integer units and the next instruction from the second thread needs one of the integer units, then the first thread will execute an instruction while the second has to wait. Consequently you get one thread running just about as fast as it would without hyperthreading (there are issues with sharing the cache between threads) while another progresses slowly, but better than not progressing at all without hyperthreading.
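Here's a rough Python sketch of that two-integer-unit, one-FP-unit example. The instruction mixes are hypothetical, every instruction is assumed to take one cycle, and the first thread always gets first pick of the free units, as described above.

```python
from collections import Counter

# Hypothetical core from the example: 2 integer units and 1 FP unit.
UNITS = Counter({"int": 2, "fp": 1})


def run(threads: list[list[dict[str, int]]]) -> list[int]:
    """Return the cycle in which each thread issues its last instruction.

    Simplifying assumptions: every instruction takes one cycle, each
    thread issues in order, thread 0 always gets first pick of the free
    units, and no instruction needs more units than the core has.
    """
    pos = [0] * len(threads)
    done = [0 if not prog else None for prog in threads]
    cycle = 0
    while any(d is None for d in done):
        cycle += 1
        free = UNITS.copy()  # all units are free at the start of a cycle
        for t, prog in enumerate(threads):
            if done[t] is not None:
                continue
            need = prog[pos[t]]
            if all(free[unit] >= n for unit, n in need.items()):
                free.subtract(need)  # claim the units for this cycle
                pos[t] += 1
                if pos[t] == len(prog):
                    done[t] = cycle
            # otherwise this thread waits and retries next cycle
    return done


if __name__ == "__main__":
    thread1 = [{"int": 1}, {"int": 2}, {"int": 1}]  # hypothetical mix
    thread2 = [{"fp": 1}, {"int": 1}, {"int": 1}]
    print("thread 1 alone:", run([thread1]))
    print("both threads:  ", run([thread1, thread2]))
```

Thread 1 still finishes in 3 cycles with the second thread present, while thread 2 takes 4 cycles instead of the 3 it would take alone. Run back to back without HT, the pair would need 6 cycles.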

In addition, if the first thread then accesses data that's not in the cache, the thread will stall for a number of clock cycles while it waits for data from RAM. That means the second thread can immediately take over the CPU and run unimpeded until the data arrives for the first thread. This is particularly important for simple hyperthreading CPUs like the Atom, which don't have out-of-order execution to reorder the program and make the most of the CPU's available resources in a single thread. Much of the time a thread is waiting for data or for an earlier instruction to complete, and for all that time the other thread can be executing transparently.
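The cache-miss point can be sketched the same way. This toy Python model assumes a single-issue, in-order core (simpler than a real Atom) and a hypothetical 20-cycle miss penalty; each instruction is written as its latency in cycles.

```python
# Toy model of latency hiding on a single-issue, in-order core.
# Each instruction is given as its latency in cycles: 1 for an ordinary
# operation, a larger hypothetical number for a load that misses the
# cache. A thread cannot issue its next instruction until the previous
# one has completed.


def cycles_one_at_a_time(threads: list[list[int]]) -> int:
    """No SMT: the threads run back to back, so every stall is dead time."""
    return sum(sum(prog) for prog in threads)


def cycles_smt(threads: list[list[int]]) -> int:
    """SMT: each cycle, issue one instruction from the first ready thread."""
    pos = [0] * len(threads)
    ready_at = [0] * len(threads)  # cycle when each thread may issue again
    finish = 0
    cycle = 0
    while any(p < len(prog) for p, prog in zip(pos, threads)):
        for t, prog in enumerate(threads):
            if pos[t] < len(prog) and ready_at[t] <= cycle:
                ready_at[t] = cycle + prog[pos[t]]
                finish = max(finish, ready_at[t])
                pos[t] += 1
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


if __name__ == "__main__":
    miss = 20                   # hypothetical cache-miss penalty
    thread_a = [1, 1, miss, 1]  # stalls on one cache miss
    thread_b = [1] * 10         # never misses
    print("one at a time:", cycles_one_at_a_time([thread_a, thread_b]))
    print("with SMT:     ", cycles_smt([thread_a, thread_b]))
```

Thread B's ten instructions all fit inside thread A's 20-cycle stall, so the pair finishes in 23 cycles instead of 33.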

Oh yeah, hyperthreading wasn't a big win on the P4 because there wasn't enough spare capacity in the CPU for the second thread to execute on most clock cycles. i7 has been designed with more capacity so that a second thread can execute more often.
September 1, 2009 6:57:13 AM

surda said:
Now when opening a program that requires a full load, the Hyper-Threading CPU will use its core at 100% to use all of its speed, but the CPU without Hyper-Threading will take time to reach 100% and will not use all of its core's speed? Is that what you're saying?
No... It's difficult to understand without a good grasp of how pipelines work and what the CPU state of a program is. But basically a single CPU has some spare capacity in its pipeline to do additional work. There's no way that one thread of execution can use all that capacity, because of the need to execute instructions sequentially and avoid dependency problems, but if you have two threads of execution then more of that capacity can be put to use.
September 1, 2009 1:18:11 PM

surda said:
How are they not the same? Could you explain in detail, please?

Thank you.

http://www.tomshardware.com/reviews/Intel-i7-nehalem-cp...
Paragraph 3:
Quote:
A simple observation led to the introduction of SMT: the “wider” (meaning more execution units) and “deeper” (meaning more pipeline stages) processors become, the harder it is to extract enough parallelism to use all the execution units at each cycle. Where the Pentium 4 was very deep, with a pipeline having more than 20 stages, Nehalem is very wide. It has six execution units capable of executing three memory operations and three calculation operations. If the execution engine can’t find sufficient parallelism of instructions to take advantage of them all, “bubbles”—lost cycles—occur in the pipeline.

I think the original HT had little benefit (aside from better apparent system responsiveness when using Win XP), because it was hard to keep the deeper pipeline well fed.

It is hard to explain in layman's terms as the nature of Hyper-Threading requires quite a bit of know-how about CPU architecture and pipelines.

Here's another article:
http://www.hardwareanalysis.com/content/article/1557.3/

About Scheduling and Caching:
Quote:
Many of the processor resources inside the Hyper-Threading processor are not SMT aware, which means that they cannot distinguish whether a thread is being executed by the first or the second logical processor. For example to the execution unit a sequence of instructions is just data, no information is given about to what thread or logical processor this information belongs to and it is just executed as is. So a problem can arise when one thread claims a shared resource, for example the floating point unit, which the second thread also needs, and as a result the second thread will stall until the first one is completed.
September 1, 2009 5:27:49 PM

amnotanoobie said:
the “wider” (meaning more execution units) and “deeper” (meaning more pipeline stages) processors become, the harder it is to extract enough parallelism to use all the execution units at each cycle.
I like that description, thanks for bringing it to my attention!
September 3, 2009 4:58:53 AM

I'm still new to this Hyper-Threading thing and might not get this 100%, but from what I understand now, a single one can't keep the whole core busy, so basically you won't get the full advantage of that core. But with Hyper-Threading, 2 of them will fetch and try to use all of the core's speed, keeping all if not most of the core busy so your programs, or whatever you're doing, run faster.

And what's a pipeline? Could I get a simple explanation of that one, please?
September 3, 2009 8:17:23 AM

surda said:
I'm still new to this Hyper-Threading thing and might not get this 100%, but from what I understand now, a single one can't keep the whole core busy, so basically you won't get the full advantage of that core. But with Hyper-Threading, 2 of them will fetch and try to use all of the core's speed, keeping all if not most of the core busy so your programs, or whatever you're doing, run faster.

If by "one", you mean thread, then yes, that's the basic idea.
September 3, 2009 9:22:39 AM

That is almost the gist of it.

Here's a simplified article about dual-cores and pipelines:
http://icrontic.com/articles/dual_core/2
September 3, 2009 5:46:47 PM

surda said:
What's a pipeline? Could I get a simple explanation of that one, please?

Think of it like an automobile assembly line. Let's say that it takes 4 hours from start to finish to put all of the parts together to build an automobile. By that reasoning you might expect that an auto assembly plant could only produce 6 cars a day (4 hours per car x 6 cars = 24 hours).

But of course they produce many, many more cars by using a line where workers do the various assembly stages in parallel for multiple cars. Pipelines in a computer work exactly the same way.

To execute an instruction, the CPU must typically:

- fetch the instruction from memory
- decode the instruction (figure out what it's supposed to do)
- get the data to be used in the instruction
- perform the actual operation (add the two data items, for example)
- store the result

Rather than doing all of this work for one instruction and THEN doing it all for the next instruction, the CPU separates these steps (or "pipeline stages") into different circuits and passes multiple instructions through them in parallel, just like an automobile assembly line.
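The assembly-line arithmetic can be written out as a short Python sketch. It assumes ideal stages of equal length with no stalls, and it assumes the 4-hour car build splits into four 1-hour stations; the 100-instruction figure is just an illustration.

```python
# Ideal pipeline timing: assumes every stage takes the same time and
# nothing ever stalls, which real CPUs only approximate.


def finish_time(n_items: int, n_stages: int, pipelined: bool) -> int:
    """Stage-times needed to push n_items through n_stages."""
    if n_items == 0:
        return 0
    if pipelined:
        # the first item needs every stage; each later one finishes
        # exactly one stage-time after the item ahead of it
        return n_stages + n_items - 1
    return n_stages * n_items  # finish one item completely before the next


def items_done_by(deadline: int, n_stages: int, pipelined: bool) -> int:
    """How many items are finished within `deadline` stage-times."""
    if pipelined:
        return max(0, deadline - n_stages + 1)
    return deadline // n_stages


if __name__ == "__main__":
    # Car example, assuming the 4 hours split into four 1-hour stations.
    print("cars/day, one at a time:", items_done_by(24, 4, pipelined=False))
    print("cars/day, assembly line:", items_done_by(24, 4, pipelined=True))
    # The 5 steps listed above, one clock cycle each, for 100 instructions.
    print("unpipelined cycles:", finish_time(100, 5, pipelined=False))
    print("pipelined cycles:  ", finish_time(100, 5, pipelined=True))
```

The car line goes from 6 to 21 cars a day (the first car still takes 4 hours, then one rolls off every hour), and 100 instructions through the 5 listed steps take 104 cycles instead of 500.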

Here's a Wikipedia article with more info.
September 4, 2009 5:30:07 AM

OK, pipelines are inside the core. With Hyper-Threading, each thread will access more pipelines per nanosecond than a single thread, making things work a little faster, using more pipelines and getting almost all of them busy.

Another advantage of Hyper-Threading is that you can do two things at once. For example, some programs need only one stream (one thread) to do the work while the other stays idle, so you can play a game or use another program to keep the other thread working. That's multitasking, another advantage of Hyper-Threading.

If you want to do this without Hyper-Threading, you will have to wait until the whole operation is done before you can play or use another program smoothly.

But the problem with Hyper-Threading is that if the 2 threads need to access the RAM over the bus at the same time, they can't both squeeze onto the bus together because that would cause errors. So one thread uses the bus to get its data from RAM, while the other has to wait in the core, doing its job, until the first thread gets its data back, and only then can it access the RAM. I think the same goes for the cache.

If anything I said is not correct, please let me know.

Thanks in advance.
September 4, 2009 6:53:41 AM

I think that's pretty much it, unless someone really wants to get into the small details. That's the major advantage of dual cores: they only have to contend for L2 or L3 cache space, but each can get data from RAM on its own.
September 4, 2009 8:15:03 AM

So the difference between a dual core (without Hyper-Threading) and a P4 (with Hyper-Threading) is that the dual core has 2 cores with one stream each, each core has its own cache, and both can access RAM at the same time, while the P4 has only 1 core with 2 streams working on it, both streams share the same cache, and they can't access RAM at the same time?

Is this basically what it is, or am I wrong?
September 4, 2009 10:00:26 PM

Why do some say that Hyper-Threading makes things slower sometimes? How?

I mean, if it's 1 thread, it's going to be the same speed; if it's 2 threads, it's either going to be the same as having 1 thread or faster, not slower.

Could someone explain this to me, please?
September 4, 2009 11:42:58 PM

surda said:
Why do some say that Hyper-Threading makes things slower sometimes? How?


For one thing, if both threads want to access a lot of memory, then they'll be fighting over the cache.

I believe the P4 also had bugs which reduced performance when hyperthreading was enabled, but I forget the details.
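Cache fighting is easy to see in a toy model. This Python sketch uses a fully associative LRU cache, which is a simplification (real CPU caches are set-associative), with a hypothetical 8-line capacity and two threads that each loop over 6 lines of their own.

```python
from collections import OrderedDict

# Toy cache: fully associative with LRU replacement. Real CPU caches are
# set-associative and may use other replacement policies; this only
# shows the basic effect of two threads sharing one cache.


def hit_rate(trace: list[int], capacity: int) -> float:
    """Fraction of accesses in `trace` that hit an LRU cache of `capacity` lines."""
    lines = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in lines:
            hits += 1
            lines.move_to_end(addr)        # mark as most recently used
        else:
            lines[addr] = None
            if len(lines) > capacity:
                lines.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)


if __name__ == "__main__":
    capacity = 8                           # hypothetical cache size in lines
    thread_a = list(range(0, 6)) * 10      # loops over 6 lines
    thread_b = list(range(100, 106)) * 10  # a different 6 lines
    shared = [x for pair in zip(thread_a, thread_b) for x in pair]
    print(f"thread A alone:  {hit_rate(thread_a, capacity):.0%}")
    print(f"A and B sharing: {hit_rate(shared, capacity):.0%}")
```

Each thread's 6 lines fit in the 8-line cache, so thread A alone hits 90% of the time. Interleaved, the two threads need 12 lines, and with LRU every line gets evicted just before it's reused, so the hit rate drops to 0%.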
September 5, 2009 1:08:55 AM

Ah, Hyper-Threading. Still using my HT P4 today.

Just to make it simple, HT makes 1 core act like 2.

So if I'm using a single-threaded app, only 1 thread will be active. If it's a dual-threaded application, then 2 threads are used.

Yes, they do use the same cache. As for the RAM, it seems to me both threads share it.

Now for the HT slowdown issue: I haven't seen it. If anything, I normally see slower results with HT off. Although I guess back in the day, when programs were only single-threaded, HT maybe caused problems with apps that didn't know how to use it. I don't know, just a guess.
September 5, 2009 3:08:34 AM

So a P4 3.0 with Hyper-Threading always wins over a P4 3.0 without it?
September 5, 2009 3:21:10 AM

In single-threaded applications the HT chip may be a little slower. But it's application dependent (as is pretty much everything).
September 5, 2009 5:16:02 AM

The slowdown is caused by the fact that there's still just one core, doing two threads simultaneously. Each thread gets less cache to work with, and if they both need memory or need the same resources, switching back and forth between them can cause a very minor performance loss.

Or when one thread is a game and the other thread is something you don't really care about: both threads together will compute faster (calculations for A + B with HT finish sooner than calculations for A + B without HT), but A's calculations take longer, which means lower FPS. You get more work done, but if it's doing work on things you don't care about, it can hurt the things you do care about. It's like doing two things at once: if you did them one at a time each might take 2 hours, but if you do them together both might be done in 3 hours. However, if you need the first one done in two hours, it's better to do them one at a time.
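The "2 hours each, 3 hours together" arithmetic can be checked with a few lines of Python. The speed value is an assumption: 2/3 of solo speed per thread is just what the 3-hour figure in the example implies, not a measured HT number.

```python
from fractions import Fraction

# Arithmetic behind the "two hours each, three hours together" example.
# `speed` is the fraction of its solo speed each thread keeps while both
# share the core (hypothetical; 2/3 matches the example's numbers).


def finish_one_at_a_time(work_a, work_b):
    """Run A to completion, then B. Returns (A's finish, B's finish)."""
    return work_a, work_a + work_b


def finish_shared(work_a, work_b, speed):
    """Run A and B together on one HT core. Returns (A's finish, B's finish)."""
    short, long_ = min(work_a, work_b), max(work_a, work_b)
    first = short / speed              # both threads slowed while sharing
    second = first + (long_ - short)   # leftover work then runs alone at full speed
    return (first, second) if work_a <= work_b else (second, first)


if __name__ == "__main__":
    a, b = finish_one_at_a_time(2, 2)
    print(f"one at a time: A done at {a}h, B done at {b}h")
    a, b = finish_shared(2, 2, Fraction(2, 3))
    print(f"shared core:   A done at {a}h, B done at {b}h")
```

Both jobs are done at hour 3 instead of hour 4, but A (the game) finishes at hour 3 instead of hour 2, which is where the lower FPS comes from. Fraction keeps the hours exact instead of picking up floating-point rounding.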
September 5, 2009 7:39:30 AM

Yes Dekasav, I understand that, but let's take that one as an example.

You said that when you play a game and do something else, both will calculate faster, but you will get lower FPS in the game, right? Because both threads are going back and forth on the cache and RAM.

Imagine that on a P4 without Hyper-Threading: wouldn't it be slower than a Hyper-Threading processor on both the game and the other program?

The example above is for multitasking.

Now let's take an example without multitasking.

Let's say you open the same program on both a P4 with Hyper-Threading and a P4 without it.

On which processor will it open faster and run faster (just doing one thing at a time)?

I bet it's the Hyper-Threading one, because with Hyper-Threading 2 threads will be busy filling the idle pipelines instead of 1, keeping all of the core busy.

So overall, having Hyper-Threading always wins, or does it not?

Note: both the Hyper-Threading and non-Hyper-Threading P4s have the same clock speed, for example 3.0 GHz.

September 5, 2009 3:37:43 PM

Hyperthreading on AMD CPUs will rock. Yeah, I know this has nothing to do with the thread, but I just think it's awesome.
September 5, 2009 4:28:34 PM

If the program only uses a single thread, then neither one will open faster. While HT does execute two threads, if a program only has one, you can't magically spawn another one. That's ignoring that how fast a program opens has more to do with your HDD.
September 5, 2009 6:25:24 PM

surda said:
Let's say you open the same program on both a P4 with Hyper-Threading and a P4 without it.

On which processor will it open faster and run faster (just doing one thing at a time)?

I bet it's the Hyper-Threading one, because with Hyper-Threading 2 threads will be busy filling the idle pipelines instead of 1, keeping all of the core busy.
No, if there really are no other programs running, and if the one program does not incorporate multiple threads, then the program will run exactly the same whether the processor has hyperthreading enabled or not.

The problem is that programs are written as a sequence of instructions. The processor MUST produce the same results as if it executed the instructions in the order they're written. But sometimes there are instructions with no dependencies on each other that can be executed simultaneously, and this is why pipelines have extra execution units.

For example, consider a CPU with a pipeline that has two integer ALUs (Arithmetic Logic Units). If the program has the following two instructions:

a = b + c
d = e + f

...then the CPU can execute both instructions at the same time because the 2nd one has no dependency on the first one. But with these instructions:

a = b + c
d = a + e

...the 2nd instruction can't be started until the first one completes because it needs to use the result of the 1st instruction. In this case that 2nd ALU is going to be idle while the 1st instruction executes.

In a hyperthreaded CPU, during that idle time the 2nd ALU can be used for an instruction from another program (or thread). But if there's only one program (with one thread running) there's no way to take advantage of it.
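Here's a toy Python scheduler for those a = b + c examples. It assumes two ALUs, one-cycle adds whose results are usable the next cycle, in-order issue within each thread, and separate registers per thread (each logical processor keeps its own register state); the second thread's instructions are hypothetical.

```python
# Toy in-order scheduler for the a = b + c examples. Assumptions: two
# ALUs, every add takes one cycle, a result can be used starting the
# next cycle, each thread has its own registers, and thread 0 gets
# first pick of the ALUs. Other hazard types are ignored.

Instr = tuple[str, str, str]  # (destination, source1, source2)


def schedule(threads: list[list[Instr]], alus: int = 2) -> list[list[tuple[int, Instr]]]:
    """Return, for each cycle, the (thread, instruction) pairs issued in it."""
    pos = [0] * len(threads)
    cycles = []
    while any(p < len(prog) for p, prog in zip(pos, threads)):
        issued = []
        for t, prog in enumerate(threads):
            written = set()  # registers this thread writes during this cycle
            while len(issued) < alus and pos[t] < len(prog):
                dest, *srcs = prog[pos[t]]
                if any(src in written for src in srcs):
                    break  # depends on a result that isn't ready yet
                issued.append((t, prog[pos[t]]))
                written.add(dest)
                pos[t] += 1
        cycles.append(issued)
    return cycles


if __name__ == "__main__":
    independent = [("a", "b", "c"), ("d", "e", "f")]
    dependent = [("a", "b", "c"), ("d", "a", "e")]
    other_thread = [("x", "y", "z"), ("w", "y", "z")]  # hypothetical second thread
    for name, threads in [("independent", [independent]),
                          ("dependent", [dependent]),
                          ("dependent + HT", [dependent, other_thread])]:
        cycles = schedule(threads)
        used = sum(len(c) for c in cycles)
        print(f"{name}: {len(cycles)} cycles, {used} of {2 * len(cycles)} ALU slots used")
```

The independent pair finishes in 1 cycle, the dependent pair takes 2 cycles and leaves half the ALU slots idle, and adding the second thread fills those idle slots without adding any cycles.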
September 5, 2009 6:45:31 PM

OK, got it.

Now just one more:

Can Hyper-Threading be slower than non-Hyper-Threading? I know they can be the same speed, or Hyper-Threading can be faster, but can a processor without Hyper-Threading beat a Hyper-Threading one?
September 5, 2009 7:35:30 PM

In the P4 there were some situations it could run slower with HT on, but they're fairly rare, and I don't think they happen on Core i7.
September 5, 2009 7:57:43 PM

I'm sure there are, even on i7, but Windows 7 is much smarter about how to divvy up the threads and when to turn HT on and off.
September 5, 2009 8:04:23 PM

surda said:
Can Hyper-Threading be slower than non-Hyper-Threading? I know they can be the same speed, or Hyper-Threading can be faster, but can a processor without Hyper-Threading beat a Hyper-Threading one?
If you have a task that runs at the highest CPU priority, hyperthreading can impact its performance. In a single-core non-HT processor, the ONLY instructions that execute are the ones for the process with the highest priority (assuming it's not waiting for something). But a single-core processor with hyperthreading can execute instructions for both the high-priority task and another task.

The issue is that within the pipeline there is no prioritization for one task or the other. Instructions for both tasks go into the pipeline equally, and it's possible for instructions from the lower priority task to use pipeline resources that instructions for the higher priority task could have used. Thus the high priority task will not be able to execute quite as many instructions per second as it would if it had the pipeline all to itself.

This is not that big an issue because it's actually rare for one task to be "top dog" and get exclusive access to the CPU - the OS is always handling interrupts and doing round-robin task switching anyway. And it's even less of an issue with the more modern HT CPUs because each pipeline has more resources and so there's a lower rate of contention between two threads that can use it.
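The priority effect can be sketched in Python too. The rule that the two threads split the issue slots evenly, while either one may use slots the other leaves idle, is a hypothetical stand-in for whatever the hardware actually does, and the ready counts are made up.

```python
from typing import Optional

# Toy model of the priority problem: inside the pipeline there is no
# notion of OS priority. Here the two threads split the issue slots
# evenly, and either may take slots the other leaves unused; that rule
# is a hypothetical stand-in, not Intel's actual policy.


def high_priority_issued(high_ready: list[int], width: int,
                         low_ready: Optional[list[int]] = None) -> int:
    """Instructions the high-priority thread issues over all cycles.

    high_ready[c] / low_ready[c]: independent instructions each thread has
    ready in cycle c. low_ready=None means the low-priority thread is not
    running (the non-HT case, where the OS gives the core to one task).
    """
    half = width // 2
    total = 0
    for c, ready in enumerate(high_ready):
        if low_ready is None:
            cap = width
        else:
            # guaranteed half the slots, plus whatever the other thread leaves idle
            cap = max(half, width - low_ready[c])
        total += min(ready, cap)
    return total


if __name__ == "__main__":
    high = [2, 1, 2, 2, 1, 0, 2, 1]  # hypothetical ready counts
    low = [1] * len(high)            # low-priority task always has 1 ready
    for width in (2, 4):
        alone = high_priority_issued(high, width)
        shared = high_priority_issued(high, width, low)
        print(f"width {width}: {alone} alone, {shared} with HT")
```

On the 2-wide model the high-priority task issues 11 instructions alone but only 7 with the other thread present; on the 4-wide model it issues all 11 either way, which matches the point that more resources per pipeline mean less contention.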
September 5, 2009 10:59:04 PM

Well, here's HT where an app is only allowed to use 1 thread.

[Image: app restricted to one HT thread]

Now here is where the app is allowed to use both threads.

[Image: app using both HT threads]

Just some images to show how apps use the threads.

I know, I know, this is an old computer that's still using Socket 478. Well, when I build stuff I like it to last a long while.
September 6, 2009 2:32:57 AM

K, thanks for your help guys, really appreciated.

Edit: just one more:

What's the difference between Hyper-Threading on the P4 and Hyper-Threading on the Core i7?
September 6, 2009 5:25:19 AM

I think probably only Intel knows the complete picture, but the major differences are that the P4 pipeline is "deeper" (ie, has more stages, akin to assembly line stations) and "narrower" (ie, doesn't have as many execution units such as ALUs and floating point units). The latter difference is especially important for hyperthreading since it relies on there being idle execution units in order to execute instructions for two threads at once.

There's also a whole lot more cache on the i7 which will help keep more code and data being used by two instruction streams available without having to wait for slow fetches from RAM.
September 6, 2009 5:36:15 PM

As sminlal said, only Intel knows fully. The main difference is that the i7 has L1, L2 and L3 cache, compared to the P4 having only L1 and L2. Also, the i7's L3 cache is 8 times larger than my P4's L2 cache. Other than that, they act very much the same.
September 6, 2009 9:39:25 PM

Thanks for the help, bro. Any more info will be appreciated.
September 7, 2009 9:51:14 PM

No problem. Glad to help.