Hyper-threading all the way back from the P4 to the Core i7

surda

They used hyper-threading back in the P4, then it disappeared for a while and the Core 2 processors went back to an approach more like the P3's.

So I want to ask: why did they bring hyper-threading back for the Core i7? Does it actually gain more performance? If so, how does hyper-threading help do that?
 

loneninja

I don't know all the technical details of hyper-threading, but I know it creates logical cores. Back when the P4 was out it was a single-core processor, but hyper-threading tricked the OS into thinking there was a dual-core processor, which improved multitasking and performance in multithreaded apps, similar to what a true dual core would do but to a lesser extent.

I can't say for sure why they left it out of the Core 2 design and reimplemented it in the i7 design, but once again it is back for the purpose of creating more logical cores to improve multitasking.
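If you want to see the "logical cores" idea for yourself, here's a minimal sketch, assuming a Linux machine where /proc/cpuinfo exposes the "siblings" and "cpu cores" fields (other operating systems need different tools):

```python
# Minimal sketch (assumes Linux, where /proc/cpuinfo lists "siblings"
# and "cpu cores" per package); other OSes need different calls.
import os

siblings = cores = None
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("siblings"):
            siblings = int(line.split(":")[1])
        elif line.startswith("cpu cores"):
            cores = int(line.split(":")[1])

print(f"logical CPUs the OS schedules on : {os.cpu_count()}")
if siblings is not None and cores is not None:
    print(f"logical CPUs per physical package: {siblings}")
    print(f"physical cores per package       : {cores}")
    print("Hyper-Threading (SMT) is on" if siblings > cores
          else "no Hyper-Threading")
```

On a hyper-threaded chip the "siblings" count comes out at twice the "cpu cores" count, which is exactly the OS seeing more logical cores than physical ones.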
 
P4 and i7 have different microarchitectures - the P4 has a lot of very small pipeline stages that run at a (relatively) faster clock rate, while the i7 has fewer pipeline stages that do more work and run at a slower clock rate.

Hyperthreading takes advantage of unused cycles in the execute stages of a pipeline. Modern CPUs are "superscalar" - meaning they have multiple integer and floating point adders, and memory access units in the execution stages of their pipelines. A single stream of instructions can't keep all of these units busy, so hyperthreading fetches two streams of instructions into the same pipeline so that there's more work for the execution units to do.

The effectiveness of hyperthreading depends on the stages in the pipeline, their ability to decode and queue multiple instructions at once, the types of execution units in the pipeline, and other things.
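Here's a toy model of that idea, using the same kind of core discussed later in this thread (two integer units, one floating-point unit). The unit counts and the two-ops-per-cycle dependency limit are made up for illustration, not real P4 or i7 numbers:

```python
# Toy model only -- not real hardware.  The "core" can start 2 integer
# ops and 1 floating-point op per cycle; dependencies limit a single
# thread to starting at most 2 of its own ops per cycle.
import random

def run(threads, int_units=2, fp_units=1, per_thread_limit=2):
    queues = [list(t) for t in threads]
    cycles = issued = 0
    while any(queues):
        free = {"int": int_units, "fp": fp_units}
        for q in queues:
            budget = per_thread_limit
            # Issue in program order: stop at the first op that can't start.
            while q and budget > 0 and free[q[0]] > 0:
                free[q[0]] -= 1
                budget -= 1
                issued += 1
                q.pop(0)
        cycles += 1
    used = issued / (cycles * (int_units + fp_units))
    print(f"{len(threads)} thread(s): {cycles} cycles, {used:.0%} of issue slots used")

random.seed(0)
work = [random.choice(["int", "int", "fp"]) for _ in range(300)]
run([work])                      # one thread leaves slots idle
run([work[:150], work[150:]])    # same total work split across two threads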
 

surda

So what you're saying is, take for example a P4 3.0 with hyper-threading and a P4 3.0 without it.

Both have the same clock speed and the same specs, but one has hyper-threading and the other doesn't.

Now when you open a program that puts it under full load, the hyper-threading CPU will keep its core 100% busy and use all of its speed, but the CPU without hyper-threading will take time to reach 100% and won't use all of its core's speed? Is that what you're saying?
 

MarkG



To give an example, suppose you have a CPU core with two integer arithmetic units and one floating point unit. With a single thread, a single program might be using just one of those integer units and the other integer and the floating point unit are idle for that clock cycle.

If the CPU core supports two threads and the next instruction in the other thread only needs the floating point unit, or one integer unit with or without the floating point unit, then the CPU core can run instructions from both threads simultaneously. If the next instruction from the first thread then needs both integer units and the next instruction from the second thread needs one of the integer units, then the first thread will execute an instruction while the second has to wait. Consequently you get one thread running just about as fast as it would without hyperthreading (there are issues with sharing the cache between threads) while another progresses slowly, but better than not progressing at all without hyperthreading.

In addition, if the first thread then accesses data that's not in the cache, the thread will stall for a number of clock cycles while it waits for data from RAM. That means that the second thread can immediately take over the CPU and run unimpeded for that period until the data is available for the first thread. This is particularly important for simple hyperthreading CPUs like the Atom, which don't have out-of-order execution to let them reorder the program and make the most of the available resources of the CPU in a single thread... much of the time a thread is waiting for data or for an earlier instruction to complete, and for all that time the other thread can be executing transparently.

Oh yeah, hyperthreading wasn't a big win on the P4 because there wasn't enough spare capacity in the CPU for the second thread to execute on most clock cycles. i7 has been designed with more capacity so that a second thread can execute more often.
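Here's a toy simulation of that cache-miss case. The miss rate and the RAM latency below are invented numbers, just to show the shape of the effect: when one thread stalls waiting for RAM, the other keeps the core busy.

```python
# Toy model of hiding a RAM stall -- miss rate and latency are assumed.
# The "core" runs one instruction from one ready thread per cycle; a
# thread that misses the cache is stalled for MISS_LATENCY cycles.
MISS_LATENCY = 100      # cycles spent waiting for RAM after a miss (assumed)
MISS_EVERY = 50         # one miss per 50 instructions (assumed)
WORK = 10_000           # instructions each thread has to retire

def simulate(num_threads):
    remaining = [WORK] * num_threads
    stalled_until = [0] * num_threads
    cycle = 0
    while any(r > 0 for r in remaining):
        for t in range(num_threads):
            if remaining[t] > 0 and stalled_until[t] <= cycle:
                remaining[t] -= 1                   # retire one instruction
                if remaining[t] % MISS_EVERY == 0:
                    stalled_until[t] = cycle + MISS_LATENCY
                break                               # only one thread runs per cycle
        cycle += 1                                  # clock ticks even if everyone is stalled
    return cycle

one = simulate(1)
two = simulate(2)
print(f"1 thread : {WORK} instructions in {one} cycles ({WORK / one:.2f} per cycle)")
print(f"2 threads: {2 * WORK} instructions in {two} cycles ({2 * WORK / two:.2f} per cycle)")
```

With these made-up numbers the two-thread run gets through roughly twice the work in about the same number of cycles, because one thread's stall time is spent running the other thread.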
 
No... It's difficult to understand without a good grasp of how pipelines work and what the CPU state of a program is. But basically a single CPU has some spare capacity in its pipeline to do additional work. There's no way that one thread of execution can use all that capacity, because of the need to execute instructions sequentially and avoid dependency problems, but if you have two threads of execution then more of that capacity can be put to use.
 

amnotanoobie


http://www.tomshardware.com/reviews/Intel-i7-nehalem-cpu,2041-5.html
Paragraph 3:
A simple observation led to the introduction of SMT: the “wider” (meaning more execution units) and “deeper” (meaning more pipeline stages) processors become, the harder it is to extract enough parallelism to use all the execution units at each cycle. Where the Pentium 4 was very deep, with a pipeline having more than 20 stages, Nehalem is very wide. It has six execution units capable of executing three memory operations and three calculation operations. If the execution engine can’t find sufficient parallelism of instructions to take advantage of them all, “bubbles”—lost cycles—occur in the pipeline.
I think the original HT had little benefit (aside from better apparent system responsiveness when using Win XP), because it was hard to keep the deeper pipeline well fed.

It is hard to explain in layman's terms as the nature of Hyper-Threading requires quite a bit of know-how about CPU architecture and pipelines.

Here's another article:
http://www.hardwareanalysis.com/content/article/1557.3/

About Scheduling and Caching:
Many of the processor resources inside the Hyper-Threading processor are not SMT aware, which means that they cannot distinguish whether a thread is being executed by the first or the second logical processor. For example to the execution unit a sequence of instructions is just data, no information is given about to what thread or logical processor this information belongs to and it is just executed as is. So a problem can arise when one thread claims a shared resource, for example the floating point unit, which the second thread also needs, and as a result the second thread will stall until the first one is completed.
 
I like that description, thanks for bringing it to my attention!
 

surda

I'm still new to this hyper-threading thing and I might not have gotten it 100%, but from what I understand now, a single one can't keep the whole core busy, so basically you're not taking full advantage of that core. With hyper-threading, two of them fetch instructions and try to use all of the core's speed to gain more performance, keeping all (or at least most) of the core busy so your programs, or whatever you're doing, run faster.

And what's a pipeline? Could I get a simple explanation of that one, please?
 

If by "one", you mean thread, then yes, that's the basic idea.
 

amnotanoobie



That's almost the gist of it.

Here's a simplified article about dual-cores and pipelines:
http://icrontic.com/articles/dual_core/2
 

Think of it like an automobile assembly line. Let's say that it takes 4 hours from start to finish to put all of the parts together to build an automobile. By that reasoning you might expect that an auto assembly plant could only produce 6 cars a day (4 hours per car x 6 cars = 24 hours).

But of course they produce many, many more cars by using a line where workers do the various assembly stages in parallel for multiple cars. Pipelines in a computer work exactly the same way.

To execute an instruction, the CPU must typically:

- fetch the instruction from memory
- decode the instruction (figure out what it's supposed to do)
- get the data to be used in the instruction
- perform the actual operation (add the two data items, for example)
- store the result

Rather than doing all of this work for one instruction, THEN do it all for the next instruction, the CPU separates these steps (or "pipeline stages") into different circuits and passes multiple instructions through them in parallel, just like an automobile assembly line.
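If it helps, here's a back-of-the-envelope version of that assembly-line math, assuming the five stages listed above and that one new instruction can enter the pipeline every cycle (real CPUs are messier than this):

```python
# Back-of-the-envelope pipeline math (assumes 5 stages, 1 new
# instruction entering per cycle; purely illustrative).
STAGES = 5

def cycles_without_pipelining(n):
    return n * STAGES              # finish one instruction before starting the next

def cycles_with_pipelining(n):
    return STAGES + (n - 1)        # fill the pipe once, then one finishes per cycle

for n in (1, 10, 100):
    print(f"{n:>3} instructions: {cycles_without_pipelining(n):>4} cycles unpipelined, "
          f"{cycles_with_pipelining(n):>4} cycles pipelined")
```

Just like the car plant, the first instruction still takes the full five stages, but after that one instruction rolls off the end of the line every cycle.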

Here's a Wikipedia article with more info.
 

surda

OK, so the pipeline is inside the core, and with hyper-threading the two threads together keep more of the pipeline busy per cycle than a single thread would, making things work a little faster by using more of the execution units and getting almost all of them busy.

Another advantage of hyper-threading is that you can do two things at once. Say, for example, a program only needs one stream (one thread) to do its work while the other sits idle; you can then play a game or use another program to put the other thread to work. That's multitasking, another advantage of hyper-threading.

If you want to do that without hyper-threading, you'll have to wait until the whole operation is done before you can play or use another program smoothly.

But the problem with hyper-threading is that if the two threads need to access the RAM over the bus at the same time, they can't both squeeze onto the bus together, so one has to go over the bus to the RAM to get its data while the other waits in the core, doing what it can, until the first one comes back and it gets its turn to access the RAM. I think it's the same with the cache.

If anything I said is not correct, please let me know.

Thanks in advance.
 

amnotanoobie



I think that's pretty much it, unless someone else really wants to get into the fine details. That's the major advantage of dual cores: they only have to contend for L2 or L3 cache space, and each core can fetch data from RAM on its own.
 

surda

So the difference between a dual core (without hyper-threading) and a P4 (with hyper-threading) is that the dual core has two cores with one stream each, each core has its own cache, and both can access RAM at the same time, while the P4 has only one core with two streams working on it, both sharing the same cache, and they can't access the RAM at the same time?

Is that basically it, or am I wrong?
 

surda

Why do some people say that hyper-threading sometimes makes things slower? How?

I mean, if it's one thread it's going to be the same speed, and if it's two threads it's either going to be the same as having one thread or faster, not slower.

Could someone explain, please?
 

MarkG



For one thing, if both threads want to access a lot of memory, then they'll be fighting over the cache.

I believe the P4 also had bugs which reduced performance when hyperthreading was enabled, but I forget the details.
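To put a rough picture on the "fighting over the cache" part, here's a toy LRU cache model. The capacity and working-set sizes are made up: one thread's data fits in the cache, but two threads' data together doesn't, so the hit rate collapses.

```python
# Toy model of two threads fighting over one shared cache (sizes assumed).
# The cache holds CAPACITY lines with LRU replacement; each thread
# sweeps its own working set over and over.
from collections import OrderedDict

CAPACITY = 64        # cache lines (assumed)
WORKING_SET = 48     # lines each thread keeps re-reading (assumed)

def hit_rate(num_threads, passes=20):
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for i in range(WORKING_SET):
            for t in range(num_threads):        # interleave the threads' accesses
                line = (t, i)                   # each thread touches its own lines
                accesses += 1
                if line in cache:
                    hits += 1
                    cache.move_to_end(line)     # mark as most recently used
                else:
                    cache[line] = True
                    if len(cache) > CAPACITY:
                        cache.popitem(last=False)   # evict least recently used
    return hits / accesses

print(f"1 thread using the cache   : {hit_rate(1):.0%} hits")
print(f"2 threads sharing the cache: {hit_rate(2):.0%} hits")
```

One thread's working set fits, so almost everything hits after the first pass; with two threads the combined working set overflows the cache and nearly every access ends up going to RAM.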
 





Ah, hyper-threading. Still using my HT P4 today.

Just to make it simple, HT makes one core act like two.

So if I'm using a single-threaded app, only one thread will be active. If it's a dual-threaded application, then two threads are used.

Yes, they do use the same cache. As for the RAM, it seems to me both threads share the RAM.

Now for the HT slowdown issue: I haven't seen it. If anything, I normally see slower results with HT off. Although I guess back in the day, when programs were only single-threaded, HT may have caused problems with apps that didn't know how to use it. I don't know, just a guess.
 

Dekasav

The slowdown is caused by the fact that there's still just one core, doing two threads simultaneously. Each thread gets less cache to work with, and if they both need memory or need the same resources, switching back and forth between them can cause a very minor performance loss.

Or when one thread is a game and the other thread is something you don't really care about: the total work gets done faster (the calculations for A + B finish sooner with HT than without it), but A's own calculations take longer = lower FPS. You get more work done, but if it's doing work on things you don't care about, it can hurt the things you do care about. It's like doing two things at once: if you did one at a time each might take 2 hours, but if you do them together it might only take 3 hours total. However, if you need the first one done in two hours, it's better to do them one at a time.
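To put rough numbers on that game example (every number here is invented, just to show the shape of the trade-off): say HT lets the core do 25% more total work, but the game only gets half the core.

```python
# Rough arithmetic for the example above; all values are assumptions.
game_alone_fps = 60.0     # assumed: the game with the whole core to itself
ht_extra_work = 1.25      # assumed: HT squeezes 25% more total work out of the core
game_share = 0.5          # assumed: the two threads split the core evenly

game_fps_with_background = game_alone_fps * ht_extra_work * game_share
print(f"game alone            : {game_alone_fps:.0f} FPS")
print(f"game + background job : {game_fps_with_background:.1f} FPS")
print("more total work gets done, but the game itself runs slower")
```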
 

surda

Yes Dekasav, I understand that, but let's take that as an example.

You said that when you play a game and do something else at the same time, the total work gets done faster, but you get lower FPS in the game, right, because both threads are going back and forth to the cache and RAM.

Now imagine the same thing on a P4 without hyper-threading: wouldn't it be slower than a hyper-threading processor, on both the game and the other program?

^ That example is for multitasking.

Now let's take an example without multitasking.

Let's say you open the same single program on a P4 with hyper-threading and on a P4 without it.

On which processor will it open faster and run faster (just doing one thing at a time)?

I bet it's the hyper-threading one, because with hyper-threading two threads will be busy filling the idle execution units instead of one, keeping the whole core busy.

So overall, does having hyper-threading always win, or not?

Note: both the hyper-threading and non-hyper-threading P4 have the same clock speed, for example 3.0 GHz.