Hyperthreading performance?

ricno

Distinguished
Apr 5, 2010
582
0
19,010
On Intel CPUs (mainly the server line), how is the performance when executing code on the hyperthreaded "cores"? That is, how do they compare to the normal cores?

If a CPU is equipped with hyperthreading support, should it generally be enabled, or only in certain cases?

 

ricno



Yes, that is natural if I run systems/applications that place load on only a small number of threads, but I am really wondering about two things:

1. If I get all logical CPUs loaded, how will the hyperthreaded ones compare to the real ones?

2. Is there any disadvantage to having it enabled even if there is only light load on a server?
 

ricno

Thank you for your reply! So heat increase could be a disadvantage, that is interesting.

I seem to remember that when HT became available in servers some 6-7 years ago, there was some talk that it could actually lower the performance of a system if not used properly (perhaps when running applications without good multithreading and similar), and that even ordinary execution could be worse, but I do not know if that was true.

So, if heat is not an issue and on a modern processor, is there any risk of the system performing worse under low/medium usage?

And on a system with really high thread usage, how is the performance of an HT core compared to a "real" one?
 

ricno



Mostly the newer processors, but I am also interested in whether there is any performance difference between the older HT and the new one in the Nehalem line.

Is it correct that HT has been "missing" for some years in Intel's server CPUs?
 

ricno



Thank you for your reply!

I know that VMware ESX servers are hyperthreading-aware and take this into account when doing process scheduling. Do you know if Windows Server (say 2003 or 2008) can do this too?

 

ricno



Thanks, it was a good white paper. Interesting that even Windows XP has awareness for HT in the thread scheduler. Do you happen to know if Linux has the same feature? (It seems very reasonable it has.)




From both the whitepaper above and from other sources it is mentioned that on older HT systems, if running threads on both logical CPUs, the performance gain would be around 10-30% compared to physical processors/cores. Do you know if this has changed with the Nehalems?

That is, how does the second logical processor compare to the first if there is competition for the CPU resources today?

Do the logical HT processors share the CPU cache with the first one on each core?
 
I am out of date as far as newer Xeons go, but I run a dual Xeon 3.2 HT (Prestonia, 1 MB L2), and on encoders and other multithreaded apps it obviously runs better than a single CPU. On 3DMark03 my ancient system matches up to Core 2 Duos, e.g. E6600s, with cards like the 9800 GT.
XP Pro, Vista Ultimate, Server 2003 and Server 2008 all recognise at least 4 cores, while Vista Basic and Premium don't.
It also makes running multiple apps way better. Even though my system is old, I have had a 700 MB AVI encoding and burning, browsing with multiple IE windows, MS Outlook open and WMP playing music at the same time with no lag.
I will run two instances of DVDFlick at the same time, with two threads apiece, and get about 75 fps per encode.
Remember this is 2003 technology.
Check out 2cpu.com regarding Xeons and HT.
That is that website's specialty.
"Creamy Smooth Xeon Goodness"
 

ricno



Even just an opinion is interesting, thanks for your replies.

 
Thanks, it was a good white paper. Interesting that even Windows XP has awareness for HT in the thread scheduler. Do you happen to know if Linux has the same feature? (It seems very reasonable it has.)
It definitely does.
From both the whitepaper above and from other sources it is mentioned that on older HT systems, if running threads on both logical CPUs, the performance gain would be around 10-30% compared to physical processors/cores. Do you know if this has changed with the Nehalems?
It depends on how well threaded an application is and certainly other factors.
That is, how does the second logical processor compare to the first if there is competition for the CPU resources today?

Do the logical HT processors share the CPU cache with the first one on each core?
This article explains it (to a degree): http://en.wikipedia.org/wiki/Hyperthreading
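On the Linux question quoted above: one way to see which logical CPUs are hyperthread siblings is to read the kernel's sysfs topology files. Here is a minimal Python sketch, assuming a Linux system that exposes /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The function names parse_cpu_list and sibling_groups are made up for this example, and on a system without those files it simply prints nothing.

```python
import glob

def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def sibling_groups(pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
    """Return the distinct groups of logical CPUs that share one physical core."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.add(tuple(parse_cpu_list(handle.read())))
    return sorted(groups)

if __name__ == "__main__":
    for group in sibling_groups():
        print("logical CPUs sharing one core:", group)
```

If each group contains two numbers, HT is enabled and those two logical CPUs compete for the same core.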
 
Nice link GhislainG.
If you look in that article, modern HT uses a replay queue to reduce cache thrashing, so that definitely answers the OP's question.
This link from some little known website is interesting:
http://www.tomshardware.com/reviews/single-cpu-dual-operation,549.html
I run a dual Xeon HT, so it is fascinating to read. I actually have the Xeon Prestonia (Northwood) 3.06s in my parts "vault".
On Activision's website, their system evaluator read it as a 4.44 chip.
The point is, if you can build a modern Xeon HT system, especially a dual rig, it would be radical.
Not only for multithreaded apps; the ability to multitask with many open apps using the right OS is awesome.
Basically, the more threads you can run the better.
2cpu.com is a cool website, to beat a dead horse.
(though it is no THG)
 

GunBladeType-T

Distinguished
Jul 8, 2010
553
0
19,010



Hyperthreading is meant to keep the processor's execution resources as busy as possible. A single program only uses certain execution units in the pipeline at any moment while others sit idle, so hyperthreading lets a second program or thread use those idle units to push utilization closer to 100%. You should check out the NetBurst architecture on Wikipedia; I remember reading up on the tech a long time ago, when NetBurst first came out!
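To put rough numbers on that idea, here is a toy model in Python. It is only a sketch under a big assumption: a core has a fixed amount of execution capacity, one thread on its own keeps some fraction of it busy, and two threads together can never use more than all of it. Real cores are far more complicated (caches, memory bandwidth and so on), and the function name ht_gain is made up for the example.

```python
def ht_gain(utilization):
    """Throughput gain from a second thread in a toy shared-core model.

    utilization: fraction (0 < u <= 1) of the core's execution capacity
    that one thread keeps busy on its own. Two threads combined are
    capped at 1.0, i.e. the whole core.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    combined = min(1.0, 2 * utilization)
    return combined / utilization - 1.0

for u in (1.0, 0.8, 0.5):
    print(f"one thread uses {u:.0%} of the core -> HT gain {ht_gain(u):.0%}")
```

In this model a thread that already saturates the core leaves nothing for its sibling, so the gain is zero, while a thread that stalls a lot leaves plenty of room.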
 

iqvl

Distinguished
Apr 17, 2010
244
0
18,710
@OP:

Get an i7 only if more than 50% of your time is spent on multithreaded jobs such as video encoding. Though I don't think that's the case even for most professional CADers.
 

ricno



I have actually recently bought an i7 860 which will be used mostly for virtualization of operating systems. I think that it will do well even when running many simultaneous virtual machines.







I am mostly asking from a theoretical point of view, as it will of course be different in a real situation of how a certain application is written in terms of multithreading.

Let us take this situation:

One physical CPU with four cores and HT enabled.

We first start four single-threaded applications (for simplicity) which are all very CPU-intensive. The OS should schedule these to the first logical processor of each core. Assume that they use all CPU time available and also assume that the performance of each application is "100 points". :)

So when running four instances we get the hypothetical score of 100 for each process. Assume now that we start another four instances of this application, which will all consume all available CPU time. Now the OS scheduler will have to place these eight processes across all 8 logical CPUs.

The question is now what the "performance score" could reasonably be. For the older implementation of Hyper-Threading, the gain from the second logical CPU is said to be 10-30%, but I wonder how the newer Nehalem CPUs will perform.

(Of course it is impossible to say anything specific in a made-up situation, but is it still 30%, or is it 50%, or something else?)
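To make the arithmetic of this made-up scenario concrete, here is a small Python sketch. It assumes that a core running two threads delivers (1 + gain) times its single-thread throughput and that the two threads split it evenly. Both are simplifications, and the function name per_process_score is just for illustration.

```python
def per_process_score(base_score, cores, processes, ht_gain):
    """Hypothetical per-process score when CPU-bound processes share cores.

    base_score: score of one process alone on a core (100 in the scenario).
    ht_gain: assumed extra throughput of a core running two threads (0.30 = 30%).
    """
    if processes <= cores:
        return float(base_score)  # every process has a core to itself
    if processes != 2 * cores:
        raise ValueError("sketch only covers one or two processes per core")
    # Two threads share one core's boosted throughput equally.
    return base_score * (1 + ht_gain) / 2

for gain in (0.10, 0.30, 0.50):
    score = per_process_score(100, cores=4, processes=8, ht_gain=gain)
    print(f"HT gain {gain:.0%}: each process scores about {score:.0f}, "
          f"all eight together about {8 * score:.0f}")
```

So under these assumptions a 30% gain means each of the eight processes runs at roughly two thirds of its solo speed, while the machine as a whole gets more work done than with four processes.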


 

ricno



It is mostly out of interest in the technology, actually, but the systems I have mostly in mind are VMware ESX servers with many virtual servers running. That is a lot of operating systems with lots of processes/threads with different amounts of CPU usage.

However, when you say that the performance increase could be as low as 0%, how is that possible?

 

ricno



I have done some testing now with Windows 7 on an Intel i7 860 (4 cores and HT enabled), and when placing some load on the system the operating system scheduler seems to almost exclusively use logical CPUs 1, 3, 5 and 7, i.e. the real cores. It really seems to avoid the other four logical CPUs, which are the "faked" hyperthreaded ones.

So Win7 is also certainly hyperthreading-aware, but I still wonder about the performance of the HT threads, since it at least seems to be unwilling to use them.


 

Yianpap

Distinguished
Jan 6, 2011
2
0
18,510
I have an i7 920 (4 physical, 8 virtual cores with HT) on Windows 7 64-bit. I ran a Monte Carlo simulation on a single thread/core, then on 4 (without HT) and then on 8 (with HT). I do not remember the exact times now, but with HT I was able to perform about 5 times as many simulations in the same time as when using a single core.
That is, with HT it was effectively like having an extra physical core, or 25% more power.
Alternatively, if you count your cores in logical units, then each counts on average for about 60-65% of a physical one.
When running without HT, I got slightly less than the 4x gain I expected (the code is 100% "parallelizable"; it just needs to run independent iterations that can be spread equally across cores), which I found strange. So if I compare the non-HT run with the HT run (both multithreaded), the gain was even more than 25% (30-35%).
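The arithmetic behind those figures, written out as a short Python sketch. The 5x speedup is the approximate number quoted above; the function names are made up, and the "implied" non-HT speedup is only what follows from the quoted percentages, not a measured value.

```python
def ht_summary(speedup_with_ht, physical_cores, logical_cpus):
    """Turn a measured speedup over one core into the two ratios used above."""
    return {
        # Extra throughput relative to an ideal run on the physical cores alone.
        "gain_over_physical": speedup_with_ht / physical_cores - 1,
        # How much of one physical core each logical CPU is "worth" on average.
        "per_logical_cpu": speedup_with_ht / logical_cpus,
    }

def implied_speedup_without_ht(speedup_with_ht, observed_gain):
    """Speedup the non-HT run must have had for the observed HT gain to hold."""
    return speedup_with_ht / (1 + observed_gain)

summary = ht_summary(5.0, physical_cores=4, logical_cpus=8)
print(f"gain over 4 ideal cores: {summary['gain_over_physical']:.0%}")
print(f"each logical CPU is worth {summary['per_logical_cpu']:.1%} of a core")
for gain in (0.30, 0.35):
    print(f"{gain:.0%} observed gain implies a non-HT speedup of "
          f"{implied_speedup_without_ht(5.0, gain):.2f}x")
```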

I guess it all depends on how much of your application's work can be parallelised internally. If it's not 100% "multithreaded" as mine was, then your gain will probably be less.
Then again, I am not sure that the way I implemented multithreading is optimal, so don't take my figures for granted!
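For anyone who wants to repeat this kind of test, here is a rough Python sketch of such a benchmark. It is not the code used for the figures above. It assumes a simple stand-in workload (a Monte Carlo estimate of a quarter circle's area), uses processes rather than threads because of Python's global interpreter lock, and guesses the physical core count as half the logical CPUs, which is only true with two-way HT enabled. The timing includes the cost of starting the worker processes, so treat the ratios as approximate.

```python
import os
import random
import time
from multiprocessing import Pool

def count_hits(args):
    """Count random points that fall inside the unit quarter circle."""
    seed, samples = args
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def throughput(workers, samples_per_worker=200_000):
    """Return samples processed per second using the given number of processes."""
    jobs = [(seed, samples_per_worker) for seed in range(workers)]
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(count_hits, jobs)
    elapsed = time.perf_counter() - start
    return workers * samples_per_worker / elapsed

if __name__ == "__main__":
    logical = os.cpu_count() or 1
    base = throughput(1)
    # Assumption: logical // 2 approximates the number of physical cores.
    for workers in sorted({1, max(1, logical // 2), logical}):
        rate = throughput(workers)
        print(f"{workers} workers: {rate / base:.2f}x the single-worker rate")
```

Comparing the ratio at the physical core count with the ratio at the logical CPU count gives a rough idea of what HT adds for this particular workload.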