Hyperthreading performance?

July 20, 2010 11:23:31 AM


On Intel CPUs (mainly the server line), how is the performance when executing code on the hyperthreaded "cores"? That is, how do they compare to the normal cores?

If a CPU is equipped with hyperthreading support, should it generally be enabled, or only in certain cases?

July 20, 2010 11:27:52 AM

It will depend on the programs that you are running whether it will utilize hyper threading.
July 20, 2010 11:34:50 AM

Tamz_msc said:
It will depend on the programs that you are running whether it will utilize hyper threading.


Yes, that is natural if I run systems/applications that place load on only a small number of threads, but I am really wondering about two things:

1. If I get all logical CPUs loaded, how will the hyperthreaded ones compare to the real ones?

2. Is there any disadvantage to having it enabled even if there is only light load on a server?
July 20, 2010 11:52:49 AM

Generally speaking, only certain benchmarks show performance gains with Hyper-Threading enabled. There are no noticeable changes in real-world scenarios. And disabling it will lower your CPU temperatures.
July 20, 2010 12:07:05 PM


Thank you for your reply! So heat increase could be a disadvantage, that is interesting.

I seem to remember that when HT became available in servers some 6-7 years ago, there was some talk that it could actually lower the performance of a system if not used properly (perhaps when running applications without good multithreading, and similar), so that even ordinary execution could be worse. I do not know if that was true.

So, if heat is not an issue and on a modern processor, is there any risk of the system performing worse under low/medium usage?

And on a system with really high thread usage, how is the performance on a HT core compared to a "real" one?
July 20, 2010 12:30:09 PM

ricno, are you referring to the older Xeon or the Nehalem (55xx series) processors? If you use Oracle or SQL Server, then you probably want HT enabled on Nehalem based servers.
July 20, 2010 12:37:33 PM

GhislainG said:
ricno, are you referring to the older Xeon or the Nehalem (55xx series) processors? If you use Oracle or SQL Server, then you probably want HT enabled on Nehalem based servers.


Mostly the newer processors, but I am also interested in whether there is any performance difference between the older HT and the new one in the Nehalem line.

Is it correct that HT has been "missing" for some years in Intel's server CPUs?
July 20, 2010 12:45:20 PM

HT on Nehalem is much better than it was several years ago. You are correct when you say that HT has been "missing" for some years in Intel Xeon processors. You may be interested in this article: http://www.anandtech.com/show/2743/6 Note that some applications don't benefit from HT, but only you know what your servers are being used for. I love Nehalem processors when running Oracle and SQL Server.
July 20, 2010 12:52:16 PM

GhislainG said:
HT on Nehalem is much better than it was several years ago. You are correct when you say that HT has been "missing" for some years in Intel Xeon processors. You may be interested in this article: http://www.anandtech.com/show/2743/6 Note that some applications don't benefit from HT, but only you know what your servers are being used for. I love Nehalem processors when running Oracle and SQL Server.


Thank you for your reply!

I know that VMware ESX servers are hyperthreading-aware and take this into account when doing process scheduling. Do you know if Windows Server (say 2003 or 2008) can do this too?

July 20, 2010 8:56:52 PM

GhislainG said:
Even Windows XP is HT aware. However, some applications still won't benefit from HT. This document explains how an HT aware OS works: http://download.microsoft.com/download/5/7/7/577a5684-8...


Thanks, it was a good white paper. Interesting that even Windows XP is aware of HT in the thread scheduler. Do you happen to know if Linux has the same feature? (It seems very reasonable that it has.)


GhislainG said:
HT on Nehalem is much better than it was several years ago.


Both the white paper above and other sources mention that on older HT systems, running threads on both logical CPUs gave a performance gain of around 10-30% compared to physical processors/cores. Do you know if this has changed with the Nehalems?

That is, how does the second logical processor compare to the first today if there is competition for the CPU resources?

Do the logical HT processors share the CPU cache with the first one on each core?
July 20, 2010 10:11:22 PM

I am outdated as far as newer Xeons go, but I run a dual Xeon 3.2 HT (Prestonia, 1 MB L2), and on encoders and other multithreaded apps it obviously runs better than a single CPU. On 3DMark03 my ancient system matches up to Core 2 Duos, e.g. E6600s, with cards like the 9800 GT.
XP Pro, Vista Ultimate, Server 2003 and Server 2008 all recognise at least 4 cores, while Vista Basic and Premium don't.
It also makes running multiple apps way better. Even though my system is old, I have had a 700 MB AVI encoding and burning, browsing with multiple IE windows, MS Outlook and WMP playing music at the same time with no lag.
I will use DVDFlick with two instances, with two threads apiece, running at the same time and get about 75 fps per encode.
Remember this is 2003 technology.
Check out 2cpu.com regarding Xeons and HT.
That is that website's specialty.
"Creamy Smooth Xeon Goodness"
July 20, 2010 10:13:56 PM

Regarding your question:
I would think that on each physical processor the logical one has to share the cache, but with today's larger caches and shorter pipelines combined with fast FSBs it should execute well.
Just my opinion, I could be wrong...
July 20, 2010 10:51:57 PM

king smp said:

I would think that on each physical processor the logical one has to share the cache, but with today's larger caches and shorter pipelines combined with fast FSBs it should execute well.

Just my opinion, I could be wrong...


Even just an opinion is interesting, thanks for your replies.

July 21, 2010 1:23:36 AM

Quote:
Thanks, it was a good white paper. Interesting that even Windows XP is aware of HT in the thread scheduler. Do you happen to know if Linux has the same feature? (It seems very reasonable that it has.)
It definitely does.
Quote:
Both the white paper above and other sources mention that on older HT systems, running threads on both logical CPUs gave a performance gain of around 10-30% compared to physical processors/cores. Do you know if this has changed with the Nehalems?
It depends on how well threaded an application is, and certainly on other factors.
Quote:
That is, how does the second logical processor compare to the first today if there is competition for the CPU resources?

Do the logical HT processors share the CPU cache with the first one on each core?
This article explains it (to a degree): http://en.wikipedia.org/wiki/Hyperthreading
July 21, 2010 3:15:38 AM

Nice link GhislainG.
If you look in that article, modern HT uses a replay queue to reduce cache thrashing, so that definitely answers the OP's question.
This link from some little known website is interesting:
http://www.tomshardware.com/reviews/single-cpu-dual-ope...
I run a dual xeon HT so it is fascinating to read. I actually have the Xeon Prestonia (northwood) 3.06's in my parts "vault"
On Activision's website their system evaluator read it as a 4.44 chip.
The point is if you can do a modern Xeon HT system especially a dual rig it would be radical.
Not only multithreaded apps but the ability to multitask with many open apps using the right OS is awesome.
Basically the more threads you can run the better.
2cpu.com is a cool website, to beat a dead horse.
(though it is no THG)
July 21, 2010 3:18:56 AM

ricno said:
On Intel CPUs (mainly the server line), how is the performance when executing code on the hyperthreaded "cores"? That is, how do they compare to the normal cores?

If a CPU is equipped with hyperthreading support, should it generally be enabled, or only in certain cases?



Hyperthreading aims to keep the processor fully loaded. At any given moment a program uses only some of the execution resources in the pipeline while others sit idle, so hyperthreading hands those unused resources to another program or thread to keep the pipeline as close to 100% utilized as possible. You should check out the NetBurst architecture on Wikipedia; I remember reading up on the tech a long time ago, when the NetBurst architecture first came out!
July 21, 2010 4:08:42 AM

@OP:

Get an i7 only if more than 50% of your time is spent on multithreaded jobs such as video encoding. Though I don't think that's the case even for most professional CAD users.
July 21, 2010 1:11:41 PM

iqvl said:
@OP:
Get an i7 only if more than 50% of your time is spent on multithreaded jobs such as video encoding.


I have actually recently bought an i7 860 which will be used mostly for virtualization of operating systems. I think that it will do well even when running many simultaneous virtual machines.



ricno said:

Both the white paper above and other sources mention that on older HT systems, running threads on both logical CPUs gave a performance gain of around 10-30% compared to physical processors/cores. Do you know if this has changed with the Nehalems?


GhislainG said:
It depends on how well threaded an application is and certainly other factors.


I am mostly asking from a theoretical point of view, as in a real situation it will of course differ depending on how a certain application is written in terms of multithreading.

Let us take this situation:

One physical CPU with four cores and HT enabled.

We first start four single-threaded applications (for simplicity) which are all very CPU-consuming. The OS should schedule these to the first logical processor of each core. Assume that they use all CPU time available and also assume that the performance of each application is "100 points". :)

So when running four instances we get the hypothetical score of 100 for each process. Assume now that we start another four instances of this application, which will all consume all available CPU time. Now the OS scheduler will have to place these eight processes across all 8 logical CPUs.

The question is now what the "performance score" could reasonably be. The older implementation of Hyper-Threading is said to give a gain of 10-30% for the second logical CPU, but I wonder how the newer Nehalem CPUs will perform.

(Of course it is impossible to say anything specific in a made-up situation, but is it still 30%, or is it 50%, or something else?)
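
To put numbers on the made-up situation above, here is a rough sketch. It assumes that a core running two busy threads delivers (1 + gain) times the score of one thread alone, split evenly between its two logical CPUs. The function name `ht_scores` and the even split are assumptions for illustration only, not measured behaviour.

```python
def ht_scores(cores, ht_gain, base_score=100.0):
    """Return (score per process, total score) with two busy threads per core.

    Assumption: a core running two threads delivers base_score * (1 + ht_gain)
    in total, split evenly between its two logical CPUs.
    """
    per_core = base_score * (1.0 + ht_gain)
    per_process = per_core / 2.0
    return per_process, per_core * cores


if __name__ == "__main__":
    # Four cores, eight CPU-bound processes, for a few assumed HT gains.
    for gain in (0.10, 0.30, 0.50):
        per_process, total = ht_scores(cores=4, ht_gain=gain)
        print(f"gain {gain:.0%}: {per_process:.0f} per process, {total:.0f} total")
```

Under these assumptions, a 30% gain would give each of the eight processes a score of about 65 and the whole CPU about 520, compared with 400 for four processes on four cores.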


July 21, 2010 1:26:55 PM

Back then it was a theory, but the Nehalem can achieve a performance increase between 0% and 40%. Again it depends on the application. You need to find benchmarks that represent what you'll use the system for.
July 21, 2010 1:56:09 PM

GhislainG said:
Back then it was a theory, but the Nehalem can achieve a performance increase between 0% and 40%. Again it depends on the application. You need to find benchmarks that represent what you'll use the system for.


It is mostly out of interest in the technology, actually, but the systems I am mostly thinking of are VMware ESX servers with many virtual servers running. That is, a lot of operating systems with lots of processes/threads with different amounts of CPU usage.

However, when you say that the performance increase could be as low as 0%, how is that possible?

July 30, 2010 7:56:20 PM


GhislainG said:
Even Windows XP is HT aware.


I have now done some testing with Windows 7 on an Intel i7 860 (4 cores and HT enabled), and when placing some load on the system the operating system scheduler seems to almost exclusively use logical CPUs 1, 3, 5 and 7, i.e. the real cores. It really seems to avoid the other four logical CPUs, which are the "faked" hyperthreaded ones.

So Win7 is certainly also hyperthreading-aware, but I still wonder about the performance of the HT threads, since it at least seems unwilling to use them.
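
For anyone wanting to check which logical CPUs belong to the same core, here is a minimal sketch for Linux (not the Windows 7 system tested above). It assumes the kernel exposes `thread_siblings_list` files under sysfs, which list the logical CPUs sharing one physical core; the helper names `parse_cpu_list` and `sibling_groups` are made up for this example.

```python
import glob


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-1" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups():
    """Return the distinct groups of logical CPUs that share a physical core.

    Reads thread_siblings_list from Linux sysfs; returns [] where it is absent.
    """
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    groups = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                groups.add(tuple(parse_cpu_list(handle.read())))
        except OSError:
            continue  # CPU offline or file unreadable: skip it
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("core shared by logical CPUs:", group)
```

On a 4-core CPU with HT this should print four groups of two logical CPUs each; with HT disabled each group would contain a single CPU.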


July 31, 2010 1:17:52 PM

It will use them once all physical cores are busy. You can probably achieve 100% CPU utilization when using Prime95. HT cores will never be as fast as physical cores, but they still help when running well threaded applications.
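
If you want to see this on your own machine, a small scaling test is one way to do it. The sketch below is only an illustration: it runs a CPU-bound Monte Carlo loop in 1, 4 and 8 worker processes (numbers chosen to match a 4-core CPU with HT, which is an assumption) and compares throughput. The function names and sample counts are made up for the example, and pure-Python loops will not behave exactly like your real workload.

```python
import random
import time
from multiprocessing import Pool


def count_hits(args):
    """Count random points inside the unit quarter circle (CPU-bound work)."""
    samples, seed = args
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def throughput(workers, samples_per_worker):
    """Run one task per worker process and return samples processed per second."""
    tasks = [(samples_per_worker, seed) for seed in range(workers)]
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(count_hits, tasks)
    elapsed = time.perf_counter() - start  # includes process start-up time
    return workers * samples_per_worker / elapsed


if __name__ == "__main__":
    # 1, 4 and 8 workers match a 4-core CPU with HT; adjust for your machine.
    # Raise the sample count for real measurements so start-up cost is negligible.
    baseline = None
    for workers in (1, 4, 8):
        rate = throughput(workers, 300_000)
        baseline = baseline or rate
        print(f"{workers} workers: {rate / baseline:.2f}x the single-worker rate")
```

If the 8-worker figure is clearly above the 4-worker figure, the extra logical CPUs are helping for this kind of load; if the two are about equal, they are not.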
January 6, 2011 1:07:41 PM

I have an i7 920 (4 physical, 8 virtual cores with HT) on Windows 7 64-bit. I ran a Monte Carlo simulation on a single thread/core, then 4 (without HT) and then 8 (with HT). I do not remember the exact times now but I was able to perform about 5 times more simulations in the same time with HT, compared to when using a single core.
That is with HT it was like effectively having an extra physical core, or 25% more power.
Alternatively, if you count your cores in logical units then they count on average like 60-65% of a physical one.
When running without HT, I got slightly less than the 4x gain I expected (because the code is 100% "parallelizable", it just needs to run independent iterations that can be spread equally across cores) which I found strange. So if I compare the non-HT run with the HT run (both multithreaded) then the gain was even more than 25% (30-35%).
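
The arithmetic behind these figures can be written out as a small sketch. The inputs are the approximate numbers from this post (about 5x with 8 logical CPUs on 4 physical cores, and a 30-35% gain over the non-HT run); the function name `ht_summary` and its dictionary keys are just labels chosen for the example.

```python
def ht_summary(ht_speedup, physical_cores, logical_cpus, ht_gain):
    """Derive the figures quoted in the post from the measured speedup."""
    return {
        # Gain over an ideal run on the physical cores alone (5 / 4 - 1).
        "extra_vs_ideal_cores": ht_speedup / physical_cores - 1.0,
        # Average worth of one logical CPU, in units of a physical core.
        "per_logical_cpu": ht_speedup / logical_cpus,
        # Non-HT speedup implied by the reported HT gain.
        "implied_non_ht_speedup": ht_speedup / (1.0 + ht_gain),
    }


if __name__ == "__main__":
    for gain in (0.30, 0.35):
        summary = ht_summary(5.0, physical_cores=4, logical_cpus=8, ht_gain=gain)
        print(gain, {key: round(value, 3) for key, value in summary.items()})
```

With these inputs the result is 25% over an ideal 4x, 0.625 of a physical core per logical CPU, and an implied non-HT speedup of roughly 3.7x to 3.85x, which matches the "slightly less than 4x" observation.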

I guess it all depends on how much of your application's work can be parallelised internally. If it's not 100% "multithreaded" as mine was, then your gain will probably be less.
Then again I am not sure that the way I implemented multithreading is the optimum so don't take my figures for granted!
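
The point about how much of the work can be parallelised is usually expressed with Amdahl's law. A minimal sketch follows; note that the law treats all workers as equal, so it is an assumption here to apply it to physical cores only and not to HT logical CPUs.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Expected speedup on 4 equal cores for different parallel fractions.
    for fraction in (1.0, 0.95, 0.80):
        print(f"{fraction:.0%} parallel: {amdahl_speedup(fraction, 4):.2f}x")
```

On 4 cores this gives 4x for fully parallel code, about 3.5x at 95% and 2.5x at 80%, so even a small serial part (or dispatch overhead that behaves like one) keeps the result below 4x.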
January 6, 2011 8:14:15 PM

Your test is very good. Dispatching requires some overhead and that explains why you can't achieve a 4x gain when running without HT. If you achieved a 30-35% gain when enabling HT, then your code is well written.
April 4, 2013 1:39:07 PM

Hi,
I have a very CPU intensive application (I see 98% CPU usage for more than 95% of the time on a single core machine).
So when I use an Intel-based server like the "Kontron CG2100" with 12 cores and 24 vCPUs, I noticed I have to run 24 instances of my application to enable the parallel processing of independent tasks and fully utilize the available CPU.
What do I lose if I use 1:1 virtualization, so that I only have 12 vCPUs for 12 cores? Then I only need to run 12 instances of my application and can still use the complete CPU!
For this kind of CPU-intensive application scenario, is there any added benefit of hyper-threading or 1:2 virtualization, other than actually forcing ourselves to run double the number of processes and hence double the RAM usage, and also an increased I/O burden if any is involved?

My understanding is that this architecture is beneficial only if we want to use the machine for running many, many (compared to the number of physical cores) non-CPU-intensive apps (being idle, waiting on I/O, etc.).
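
To make the trade-off concrete, here is a rough sketch comparing the two setups. It assumes one instance alone on a core counts as 1.0 unit of throughput and that two instances sharing a core deliver (1 + gain) units together. The gain value and the RAM per instance are hypothetical inputs that would have to be measured for the real application.

```python
def compare_instances(cores, ht_gain, ram_per_instance_gb):
    """Compare one instance per core against two instances per core (HT)."""
    one_per_core = {
        "instances": cores,
        "throughput": float(cores),
        "ram_gb": cores * ram_per_instance_gb,
    }
    two_per_core = {
        "instances": 2 * cores,
        # Assumption: two threads on a core deliver (1 + ht_gain) units together.
        "throughput": cores * (1.0 + ht_gain),
        "ram_gb": 2 * cores * ram_per_instance_gb,
    }
    return one_per_core, two_per_core


if __name__ == "__main__":
    # 12 cores as in the post; 25% gain and 2 GB per instance are made-up inputs.
    for setup in compare_instances(12, ht_gain=0.25, ram_per_instance_gb=2.0):
        print(setup)
```

With these made-up inputs, 24 instances would give 15 units of throughput instead of 12 while using 48 GB instead of 24 GB, so whether it is worth it depends on the gain actually measured and on how much RAM is available.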

Thanks for your replies.