Does Windows know how to maximally assign workloads to threads/cores in a quad-core processor that has Hyper-Threading on?


wulfay

Distinguished
Oct 17, 2011
6
0
18,510
So, I have been trying to Google for this answer, but I cannot find a direct technical answer to the question (or when I did find one, it was explained in a confusing way).

So, say you have a program/game that uses 2 threads, and only 2 threads. With a quad-core CPU and hyperthreading on, will it always know to still use two PHYSICAL cores instead of just 2 logical threads, using 50% of the CPU instead of 25% of it? Does hyperthreading ever cause problems of this type, artificially using half of what it really could/should? How does Windows know how to handle all of this?

Does having the cores split up into threads decrease efficiency vs. using 1 thread per core, is essentially what I'm asking. I have seen some discussions and benchmarks showing that hyperthreading does not affect performance much one way or the other, but I am curious how it works.

Thanks! Sorry if the way I asked the question was odd, just trying to be clear over concise.
 

Rui Martins

Honorable
Jan 24, 2014
1
0
10,510
To be clear, let's define these terms first:

Core - The concept of what an original CPU is. A core executes a software thread.

Software thread - A sequence of instructions that a program follows.

Hyper-Threading - It's a fake core. Only a minute part of the core is cloned (namely the registers); in essence it simulates an extra core, but it is only able to use the real core's "free time" to run another software thread. It shares all other resources (ALU, cache, memory access, bus, etc.).

Core thread - Similar to hyperthreading, in the sense that it's a fake core, i.e. an incomplete core clone. The real difference is that a core thread includes more private resources than hyperthreading alone provides (another set of registers); how many more depends on the actual technology being used (for example, it could have its own instruction fetcher).

CPU - Represents a processing unit, or in other words a processor of a stream of instructions. So you need something that knows how to process instructions (a core) and a stream of instructions (a thread). This can be made by pairing hyperthreading with an existing core, or by using a core thread in an existing core.

Windows knows "CPU"s only!
Windows does NOT know what a "Core" or "Core Thread" (in CPU terms) is.
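
A minimal sketch of what this means in practice (Win32 C; nothing here is specific to any particular machine): GetSystemInfo reports dwNumberOfProcessors, which counts logical processors - the "CPUs" in the sense above - not physical cores.

#include <windows.h>
#include <stdio.h>

int main(void) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    // dwNumberOfProcessors counts logical processors ("CPUs"), so a
    // quad-core with Hyper-Threading enabled reports 8 here.
    printf("Logical processors visible to Windows: %lu\n",
           si.dwNumberOfProcessors);
    return 0;
}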

A typical application can have tens of software threads.

An operating system refers to an application as a "process", and you can see these listed in the Windows Task Manager, in the "Processes" tab. A process is composed of one or more software threads.
A software thread is also called a lightweight process, since it shares memory with the process but has a different and parallel execution sequence.
The operating system schedules processes (by scheduling their threads) by priority.
By default a process can run on any of the existing CPUs.

If you open Task Manager, go to the "Processes" tab and right-click on a process, there is an option called "Set Affinity", which allows you to define on which CPUs your process (its software threads) can run.

In some very special cases, it can be faster to force a process to run on a single CPU, to minimize context switching.
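
For completeness, the same thing can be done from code rather than Task Manager. A hedged sketch (Win32 C; the mask value is just an example): SetProcessAffinityMask restricts a process to the logical processors whose bits are set in the mask.

#include <windows.h>
#include <stdio.h>

int main(void) {
    // Example: mask 0x3 = binary 11 allows only logical processors 0 and 1.
    // Whether those are two physical cores or two threads of one core
    // depends on how the system enumerates them.
    DWORD_PTR mask = 0x3;
    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Process restricted to logical processors 0 and 1.\n");
    return 0;
}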

So the best of all worlds is to have full cores, also sometimes named just "cores".
So having 4 cores and 4 threads (one thread per core) is better than having 2 cores and 4 threads (2 threads per core).

NOTE: Assuming the same core speed, the less you share, the faster you go.

To summarize, imagine this:
- A core is like a pipe. More MHz means a larger-diameter pipe, hence more water (or instructions) goes through.
- A core thread is like a tube or hose that carries water (or instructions).
- A CPU is a pairing of a pipe with a hose inside it.

So Windows only sees CPUs (pipe+hose pairs), i.e. output streams (of water or instructions).

If you have 2 pipes and 2 hoses, there is only 1 hose inside each pipe, so each hose can be as large as its pipe.
You have 2 output streams (water or instructions), since you have 2 hoses.

If you have 1 pipe and 2 hoses, there are 2 hoses inside the same pipe, so the hoses have to be smaller to fit inside it.
But you still have two output streams (water), since you still have 2 hoses, although smaller ones.
 

DeathAndPain

Honorable
Jul 12, 2013
358
0
10,860
Yes, but that does not really answer his question, which is an interesting question after all. I am not sure how aware Windows actually is of the difference between real and fake cores. All it displays, e.g. in Task Manager, are "cores". I know for sure that older generations of Windows, such as NT 4.0 Server, knew nothing about hyperthreading. You could still use HT on them; they simply regarded every fake core as a real CPU and treated it accordingly - all the way down to licensing! (The cost of your Windows Server license depends on how many CPUs you are employing, so if Windows treats every core of your modern processor as a separate CPU, things quickly get expensive. That is why this was fixed in later Windows versions: they recognize multi-core processors as such - but still display every core as a core, fake or not.)

Returning to wulfay's question: he talks about an application which can utilize a maximum of, say, 2 cores (Starcraft 2 would be an example of this). So the game allocates two "cores" from the CPU. wulfay now argues that this is a matter of luck: the cores being allocated could actually belong to the same physical (hyperthreaded) core, so that this single core must do the work of both threads while the other 3 cores sit idle. This is obviously less efficient than if the 2 allocated cores belong to 2 different physical cores. The question is whether Windows - or the CPU - is intelligent enough to make sure the first case never occurs.

All I know is that when an application can use fewer cores than are available, e.g. only one core, Windows detects that this core is at high load while the others are idle. In order to better balance the load, Windows shifts the application to another core. However, all this achieves is that the latter core goes to high load and the first one goes idle. This way Windows keeps desperately rotating the application from core to core within milliseconds without ever achieving anything in the process - a long-criticized weakness of Windows. Funnily enough, this causes Task Manager to display a fairly equal utilization of all cores, so it looks as if the application scaled nicely over several cores when it does not.

I am not sure whether this core-shifting has an impact on performance (apparently it does not), but it sure has an impact on power saving, because it prevents the other cores from entering the deeper sleep states they could use if they were not being touched every few milliseconds. I remember reading that with the introduction of Windows 8.1, Microsoft reportedly altered the allocation algorithm to detect these situations and refrain from pointless core-switching. A pretty late reaction, seeing that the problem had been known since at least Windows XP.

Personally, I always try to determine how many cores a game can use, and then set its core affinity to the corresponding number of cores. That way I prevent Windows from shifting the application to any of the other cores, allowing those to go to sleep without losing the faintest amount of performance.

The only game I know that does this automatically is Medieval 2 - Total War. When you Alt+Tab out of the game and check its CPU affinity in Task Manager, you find that it allocates itself to core 0 alone.

wulfay's question remains unanswered though (unless you count bouncedk's short and reasonless answer).
 

ganon11000

Honorable
Jul 21, 2012
1,102
3
11,660
Here is the simplest answer: it's closed source, so NOBODY but Microsoft can know (without hacking). You can either trust it, if you use applications that have more threads than your CPU has cores, or you can turn it off and not have to worry.
 

DeathAndPain

Honorable
Jul 12, 2013
358
0
10,860

You are a little quick to affirm such a thing. Besides the fact that hackers do exist (and they are good enough to disable the forced activation system), there are other ways to find out. For instance, you could take an application that can use exactly 2 cores (such as Starcraft 2) and set its affinity to cores 0 and 1 (so that it may not use any other cores). Then measure its performance. Then pick 2 different cores that it may run on. Whenever you pick 2 cores which belong to the same physical core, you should notice a significant performance drop.

Once you have recorded what performance looks like on two different physical cores as opposed to 2 hyperthreads on the same core, remove the affinity setting, so that Windows may assign any cores of its liking to the application. Then measure again. Windows (below 8.1) keeps rotating the application across all its cores in a vain attempt to balance the load (this is a known and provable fact). If you see performance lower than what you measured when you forced 2 different physical cores to be used, you know that at times the same physical core is being assigned. On the other hand, if Windows is intelligent enough to detect hyperthreaded cores and tries to spread the load across the real cores first, then the performance should match your best case.
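
A minimal sketch of that experiment in code (Win32 C; it assumes logical processors 0 and 1 share a physical core while 0 and 2 do not, which you should verify first - see the detection snippet in the accepted solution below): pin two busy threads to a chosen pair of logical processors and time a fixed amount of work.

#include <windows.h>
#include <stdio.h>

// Fixed synthetic workload: spin on some arithmetic.
static DWORD WINAPI worker(LPVOID param) {
    volatile unsigned long long x = 0;
    for (unsigned long long i = 0; i < 500000000ULL; ++i) x += i;
    return 0;
}

// Run two workers pinned to the given logical processors; return seconds.
static double run_with_masks(DWORD_PTR maskA, DWORD_PTR maskB) {
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    HANDLE a = CreateThread(NULL, 0, worker, NULL, CREATE_SUSPENDED, NULL);
    HANDLE b = CreateThread(NULL, 0, worker, NULL, CREATE_SUSPENDED, NULL);
    SetThreadAffinityMask(a, maskA);  // pin each thread to one logical CPU
    SetThreadAffinityMask(b, maskB);
    QueryPerformanceCounter(&t0);
    ResumeThread(a);
    ResumeThread(b);
    HANDLE both[2] = { a, b };
    WaitForMultipleObjects(2, both, TRUE, INFINITE);
    QueryPerformanceCounter(&t1);
    CloseHandle(a);
    CloseHandle(b);
    return (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
}

int main(void) {
    // Assumption: 0x1/0x2 = two threads of one core, 0x1/0x4 = two cores.
    printf("same physical core:       %.2f s\n", run_with_masks(0x1, 0x2));
    printf("different physical cores: %.2f s\n", run_with_masks(0x1, 0x4));
    return 0;
}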

I am honest enough to say that I cba to do this, but that does not mean that it cannot be done.
 
Windows can tell that a processor has hyperthreading and can therefore plan the thread distribution accordingly; this is built into the Windows system libraries:
http://msdn.microsoft.com/en-us/library/aa394373(v=vs.85).aspx
To determine if hyperthreading is enabled for the processor, compare NumberOfLogicalProcessors and NumberOfCores. If hyperthreading is enabled in the BIOS for the processor, then NumberOfCores is less than NumberOfLogicalProcessors. For example, a dual-processor system that contains two processors enabled for hyperthreading can run four threads or programs simultaneously. In this case, NumberOfCores is 2 and NumberOfLogicalProcessors is 4.

If you know that hyperthreading is enabled, and you know whether the hyperthreaded logical cores are enumerated alternately or placed all at the end, you can easily aim your threads at the primary cores and avoid pushing multiple tasks to a single physical core before you have made use of all the others.
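
Those two WMI properties have a direct Win32 counterpart. A hedged sketch (Win32 C, assuming a single processor group): GetLogicalProcessorInformation reports one RelationProcessorCore record per physical core, with a bit set in ProcessorMask for each of its logical processors, and printing the masks also reveals the enumeration order mentioned above.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    DWORD len = 0;
    // First call fails with ERROR_INSUFFICIENT_BUFFER and sets the size needed.
    GetLogicalProcessorInformation(NULL, &len);
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = malloc(len);
    if (!info || !GetLogicalProcessorInformation(info, &len)) return 1;

    int cores = 0, logical = 0;
    for (DWORD i = 0; i < len / sizeof(*info); ++i) {
        if (info[i].Relationship != RelationProcessorCore) continue;
        cores++;
        // Each set bit is one logical processor belonging to this core.
        for (ULONG_PTR m = info[i].ProcessorMask; m; m >>= 1)
            logical += (int)(m & 1);
        printf("Core %d: logical-processor mask 0x%llx\n",
               cores - 1, (unsigned long long)info[i].ProcessorMask);
    }
    printf("NumberOfCores=%d, NumberOfLogicalProcessors=%d -> HT %s\n",
           cores, logical, logical > cores ? "enabled" : "disabled/absent");
    free(info);
    return 0;
}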

It is ignorant to say that nobody but Microsoft can know; computer science and architecture are not a mystery field. Microsoft provides tons of APIs for Windows which give you lots of information about the system, so that you can better tune your application to the environment if you don't like how Windows organizes loads by default.
 
Solution

technochi

Reputable
Mar 29, 2014
1
0
4,510


I should imagine it would depend on the programmer's design for the game: either instructing the processor to make up its own mind, or ordering it to do one or the other.

technochi computers
 

xaminmo

Reputable
Oct 7, 2014
3
0
4,510
The Windows scheduler is not very intelligent. Windows NT 6.1 (e.g. Windows 7 and Server 2008 R2) will rebalance processes across threads periodically, but all it knows about is virtual processors, so all threads across all cores look equal to it. If two threads are on the same core, or if they are backed by fractional VMware resources, Windows has no clue.

In general, soft threads (i.e. instruction pipelines) are assigned sequentially: 0 and 1 come off core 0; 2 and 3 come off core 1; and so on.

If you run two high-resource workloads on a multi-core, HT system, Windows will NOT make the best use of the CPU. You'll have one core working at 100%, with both of its threads spinning away, competing for the same logic units.

Compare this to higher-end server operating systems, such as AIX, which are generally aware of the underlying architecture and will rebalance workloads for peak performance, taking into account I/O wait (waited cycles), memory affinity, cache affinity, and wattage usage. High-CPU-time processes which run in user space most of the time are automatically separated onto different cores by the OS scheduler as necessary.

For lower-end operating systems like Windows, you can play with the processor affinity masks for your processes, but most software I've seen does not go to this effort, leaving it instead as a manual exercise for the admin/user.
 

Derek Roberts

Reputable
Jan 24, 2015
7
0
4,510
By default, Windows uses a CPU with 4 physical and 4 logical cores as if it were a CPU with 2 physical and 2 logical cores (a dual core w/HT). To save energy when they're not needed, cores 3 and 4 are disabled (along with their logical cores), so most of the time there are 4 "parked" cores on the system.

0-3 are the physical cores
4-7 are the logical cores

If parked cores are enabled ("unparked") in the registry, Windows will always utilize all 4 physical cores (and 4 logical cores), but the load on all cores will still not normally be balanced, unless you are running four 32-bit processes that all require roughly the same amount of computing power, or one 64-bit process that knows how to evenly disperse its computing needs. When possible, a program or process will choose to run all of its threads on a single physical core to avoid the logistics of sending that information to two different physical locations instead of one. In other words, if running 4 different processes that each contribute 10% to the total CPU load, the CPU will run each process on a different physical core, so each core has a 10% load. If running 1 process that accounts for 70% of the total CPU load, the CPU will try to run all of its threads on one physical core and its logical core rather than sending them to two different physical places.

Another reason that the loads on the CPUs won't often "balance" has to do with the physical and logical pairs of cores. Generally a physical core will not "balance" well with its logical core, because both of them combined can still only process the workload of one core, so while the physical core is busy the logical core has to wait, and vice versa.

After disabling the "core parking" feature, the OS is noticeably more aggressive in utilizing the CPU.
Disable core parking with this tutorial:
https://www.youtube.com/watch?v=pl3u9eiskM4

You can also set the number of CPU cores to use at startup by running msconfig: on the Boot tab, choose Advanced Options, then check Number of Processors and select the highest value in the list (physical + logical). If you choose a lower value, the OS will disable cores at the next reboot. I think the default (unchecked box) is all processors.
 


Yes and no.

Windows uses a preemptive round robin scheduler with 32 priority levels.

Originally, Windows XP treated microprocessors with SMT (aka Hyper-Threading) no differently than a traditional multi-processor system. For the purposes of scheduling, all logical processors were treated equally; a single-core microprocessor with SMT was treated the same as a dual-core microprocessor without SMT, as was a pair of single-core microprocessors without SMT in a two-socket configuration. The onus to properly determine the optimal scheduling policy and set kernel thread affinity rested entirely with the process. This policy had few consequences because there were very few highly multithreaded programs at the time, and those that did exist did not have tight real-time constraints.

Starting in Windows Vista, the Windows scheduler got a bit more intelligent when multi-core CPUs and/or SMT are used. There are now a number of different ways to tune the scheduler, especially when minimizing power consumption is desired. Assuming that power consumption is not an obstacle, the scheduler will by default try to balance threads so as to make optimal use of execution and cache resources. Idle logical processors that do not share resources will be scheduled before idle logical processors that do share resources with a non-idle logical processor. Threads belonging to different processes may be scheduled on logical processors that do not share caches (multi-socket configurations), whereas threads that belong to the same process may be scheduled on processors that do have a common cache architecture.

Given that SMT threads share the same L1 caches, there are some cases where scheduling two threads from the same process on a pair of logical processors that share resources may be beneficial, if they share a lot of data. A good example of this would be a high-priority IO thread passing messages to a normal-priority logic thread through shared memory. If both threads share a cache hierarchy, cache lines will not have to be invalidated when the synchronization objects are locked.

Thread scheduling is not a one-size-fits-all scenario, so the best policy is for the developer to tune thread priority and processor affinity on a per-thread basis. Developers know much, much more about what their program is doing than the thread scheduler does.
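
A hedged sketch of what such per-thread tuning can look like in Win32 C (the priority and mask values are illustrative assumptions, not recommendations, and the thread bodies are stubs):

#include <windows.h>

static DWORD WINAPI io_thread(LPVOID p)    { /* pass messages ... */ return 0; }
static DWORD WINAPI logic_thread(LPVOID p) { /* consume messages ... */ return 0; }

int main(void) {
    HANDLE io    = CreateThread(NULL, 0, io_thread,    NULL, CREATE_SUSPENDED, NULL);
    HANDLE logic = CreateThread(NULL, 0, logic_thread, NULL, CREATE_SUSPENDED, NULL);

    // Raise the IO thread's priority so it preempts the logic thread.
    SetThreadPriority(io, THREAD_PRIORITY_ABOVE_NORMAL);
    SetThreadPriority(logic, THREAD_PRIORITY_NORMAL);

    // Pin the two threads to the two logical processors of one physical
    // core (assuming 0 and 1 share a core) so they share a cache
    // hierarchy, per the shared-memory example above.
    SetThreadAffinityMask(io,    0x1);
    SetThreadAffinityMask(logic, 0x2);

    ResumeThread(io);
    ResumeThread(logic);
    HANDLE both[2] = { io, logic };
    WaitForMultipleObjects(2, both, TRUE, INFINITE);
    return 0;
}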
 


Pretty much everything that you wrote in here is nonsense. Stop making stuff up.
 