Is a dual-core 1.0 GHz equal to single-core 2.0 GHz (read more)?

NugieDX

Reputable
May 28, 2014
16
0
4,510
Assuming ALL other parameters are the same (microarchitecture, FSB speed, etc.),
does a dual-core 1.0 GHz have the same performance as a single-core 2.0 GHz?
Actually, those numbers are just an example. You can change them to, say, a quad-core
1.8 GHz vs a dual-core 3.6 GHz. You know what I'm saying, right? ;)

If they're not equal, could and would you tell me why?
Thanks.
 

Powerbolt

Honorable
Oct 21, 2013
413
0
10,960
Assuming all things equal, I would suppose that they would be relatively equivalent in terms of performance; I can't come up with any logical reason why they wouldn't be. I'm sure someone will be along shortly to go into deeper semantics about it. These kinds of questions get things fired up around the CPU forums. Lol.
 
That's a bit more complicated, really. Multi-core systems tend to feel more responsive because Windows (or another OS) can allocate different tasks to different cores concurrently without making any one core switch tasks as often. Whether a single program is accelerated, however, depends on the program. In general you can't expect a program written for one core to use multiple cores when run on a multi-core system (it will run on one core allocated to it by the OS). To be accelerated by a multi-core system, a program has to be written to be multi-core aware, which is not a simple programming exercise in general.
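
Very rough sketch of what "multi-core aware" means in practice, in Python just for illustration (the function names and numbers are made up):

```python
# Hypothetical example: the same work written single-core vs. multi-core aware.
from multiprocessing import Pool

def work(n):
    # stand-in for some CPU-heavy task
    return sum(i * i for i in range(n))

def single_core(jobs):
    # runs on one core no matter how many the machine has
    return [work(n) for n in jobs]

def multi_core(jobs):
    # the OS can now spread the jobs across cores,
    # but only because the code was written to allow it
    with Pool() as pool:
        return pool.map(work, jobs)

if __name__ == "__main__":
    jobs = [200_000] * 8
    assert single_core(jobs) == multi_core(jobs)
```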

Multi-core CPUs are excellent natural multi-taskers and came about largely because of heat and power issues (running up clock speeds takes more power than adding cores).
 


No, a dual-core system will NEVER be as good at a single task as a single core of twice the speed. When you put multi-core systems into the mix, there is a loss of productivity across cores from the code, i.e., memory calls and such in the processors and motherboard. So they will never be as fast as a single core at a single task.
 

Omegaclawe

Honorable
Sep 28, 2013
156
0
10,710
No. All other things equal, the single core will be much faster with current software. As developers improve, this gap will shrink, but there will always be inefficiencies. The single core will even be faster running two completely unrelated loads in most cases, due to memory load, though that could improve in the future.

However, that's rarely a good comparison, as architectures vary wildly in per-GHz efficiency. For instance, when I first built a PC, AMD had the Athlon 64 series, which, at 2 GHz, was considerably faster than the 3 GHz Pentium 4s of the time. In more modern times, AMD's 4+ GHz FX cores are, per core, slower than 2 GHz Ivy Bridge / Haswell cores.

I suggest looking at reviews and benchmarks of processors in the sort of programs you'll be using (CAD, Games, etc.) instead of just trying to compare the total GHz, because really, they don't count for shit.

tl;dr: The single core is better, even when going to 2 vs 4 cores, etc. But GHz is a pretty useless measurement in the real world.
 

Vlad Rose

Reputable
Apr 7, 2014
732
0
5,160
Pretty much what SchizTech said. However, it should be noted that the cores in a multi-core CPU still share the same resources (bus, etc.). Also, no program has been fully optimized to take 100% advantage of multiple cores. I don't even believe it's possible to reach 100%, because code results rely on previous code results (for example, z = x + y relies on x and y being computed first). As a result, from a purely performance point of view, the single core would win. From a user perspective though, the computer may feel faster with a dual core since an OS runs more than one program at once at all times.
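
A tiny sketch of that dependency in Python (made-up functions, just to show the shape of it): x and y can be computed on separate cores, but z has to wait for both.

```python
from concurrent.futures import ProcessPoolExecutor

def compute_x():
    return sum(range(1_000_000))

def compute_y():
    return sum(range(2_000_000))

if __name__ == "__main__":
    # x and y are independent, so they can run on two cores at once...
    with ProcessPoolExecutor() as pool:
        fx = pool.submit(compute_x)
        fy = pool.submit(compute_y)
        x, y = fx.result(), fy.result()
    # ...but z cannot start until both results exist; this part stays serial.
    z = x + y
    print(z)
```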
 

NugieDX

Reputable
May 28, 2014
16
0
4,510
So it really depends on the program.
Now that you mention it, I remember reading that somewhere.

So, if it's a single-core task, the dual-core would lose (in the aforementioned scenario).
But single-core tasks are already scarce nowadays, right?
 

Omegaclawe

Honorable
Sep 28, 2013
156
0
10,710

There are actually loads that are significantly better with more cores, but in those cases, 4 is nowhere near enough. You want hundreds, or thousands, of cores. For instance, bitcoin mining, graphics rendering, physics, etc. This is why we use GPUs for that sort of work. They're basically processors with thousands of cores (for the high end), just really stupid ones.
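
A toy sketch of that kind of workload in Python (a made-up "mining"-style loop, not real bitcoin): every attempt is completely independent, so throughput scales with however many workers you can throw at it.

```python
import hashlib
from multiprocessing import Pool

def attempt(nonce):
    # each nonce is checked without needing any other nonce's result
    digest = hashlib.sha256(f"block-data-{nonce}".encode()).hexdigest()
    return nonce if digest.startswith("0000") else None

if __name__ == "__main__":
    with Pool() as pool:
        hits = [n for n in pool.map(attempt, range(200_000)) if n is not None]
    print(f"found {len(hits)} matching nonces")
```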
 

Omegaclawe

Honorable
Sep 28, 2013
156
0
10,710


Actually, single-core tasks are still the norm, just like 32-bit is still more common than 64-bit despite almost all PC hardware being capable of the latter. Multicore coding is hard and programmers are lazy.
 

NugieDX

Reputable
May 28, 2014
16
0
4,510

I'm learning a lot today. :lol:
There's something I want to ask about GPU too, but not here and not now.
 

Vlad Rose

Reputable
Apr 7, 2014
732
0
5,160


True, but each of those is really a bunch of independent pieces of work in itself. Each piece of a bitcoin run is independent of the others. Each spline in a rendering runs independently of the others. That's where multiple low-power (stupid) cores really shine, since they aren't reliant on one another's results.
 


Engineer here, I can answer this question with some authority.

The go-to academic rule for expected gains from increasing concurrency is Amdahl's Law. All programs have some components which simply cannot be explicitly parallelized through any method (microarchitectures perform implicit parallelization through superscalar and reordered execution). The fraction of the program that this strictly serial code comprises establishes an upper bound on the expected speedup.
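
For reference, Amdahl's Law can be written as speedup = 1 / ((1 - p) + p/n), where p is the fraction of the program that can run in parallel and n is the number of cores. A quick sketch in Python (the fractions are just example values):

```python
# Amdahl's Law as a quick calculation (illustrative numbers only).
def amdahl_speedup(parallel_fraction, cores):
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# A program that is 90% parallelizable, on 2 cores at the same clock:
print(amdahl_speedup(0.90, 2))     # ~1.82x, not 2x
# The serial 10% caps the speedup no matter how many cores you add:
print(amdahl_speedup(0.90, 1000))  # ~9.9x, approaching the 1 / 0.10 = 10x ceiling
```

In the thread's example, the 1.0 GHz dual-core only matches the 2.0 GHz single-core when p = 1 and overhead is ignored, which is exactly the point made below.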

So, from a purely computer-science approach, a 2.0 GHz single-core microprocessor is better than a 1.0 GHz dual-core microprocessor if and only if all else is equal. If there were no strictly serial components in the code, and overhead were not considered, they would be equal.

From a computer engineering approach, all bets are off. The trouble with Amdahl's Law is that it is an observation made in the early days of computing, before multiprogramming and resource management became standard parts of most micro-architectures. Now, since we're analysing the hardware rather than the software, we have to change our perspective. Rather than quantifying performance in terms of time taken to complete a particular task, we quantify performance in terms of aggregate operations completed across all running tasks within a fixed unit of time. The former is rather self-explanatory, but the latter is often denoted as instructions per clock, or IPC. IPC is a useless measure for comparing architectures with dissimilar instruction sets (such as ARM vs x86) but works fine for comparing architectures with compatible instruction sets (such as AMD's FX series vs Intel's Core i7 series).
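
As a back-of-envelope illustration of that aggregate view (all numbers invented), throughput is roughly cores x clock x IPC:

```python
# aggregate instructions per second ~= cores * clock_hz * IPC
def throughput(cores, clock_ghz, ipc):
    return cores * clock_ghz * 1e9 * ipc

single = throughput(cores=1, clock_ghz=2.0, ipc=1.5)
dual   = throughput(cores=2, clock_ghz=1.0, ipc=1.5)
print(single == dual)  # True on paper, but only if IPC really stays equal,
                       # which is what the rest of this post calls into question
```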

IPC is where things start to get really interesting. A traditional ISA core executes instructions in sequence until it is powered off. This sequence is altered by flow-control instructions that are part of the program itself, or by interrupts that come from the hardware. There is, however, no requirement that the ISA core execute instructions exclusively from one context (commonly called a thread) at a time. The only requirement is that the logical equivalence of each context be preserved over time and that the contexts be isolated from each other. The technology which enables a single core to pick and choose instructions from two or more contexts at once in order to maximize instruction throughput is called Simultaneous Multithreading, or SMT. Intel's proprietary two-thread implementation of this is called Hyperthreading.

SMT allows a single core to be more efficient when running two programs at once as it suppresses the effects of long-latency events such as cache misses. This seems to immediately sidestep Amdahl's law. The performance of the system as a whole is improved by enabling two programs to be executed simultaneously on a single core, including the serial-only sections. The net effect is that while each program may execute somewhat slower due to resource sharing, they will both complete faster when run concurrently than when run sequentially.
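
Here is a toy model of that effect in Python (nothing like real hardware, just to show why hiding stalls helps aggregate throughput): two threads each need 1000 instructions, and every 10th instruction stalls for 20 cycles.

```python
WORK, MISS_EVERY, MISS_LATENCY = 1000, 10, 20

def run_sequential():
    # one thread at a time: every stall cycle is dead time
    per_thread = WORK + (WORK // MISS_EVERY) * MISS_LATENCY
    return 2 * per_thread

def run_smt():
    # idealized SMT: while one thread is stalled the other issues instructions,
    # so stalls are hidden as long as the other thread has work to issue
    issue_cycles = 2 * WORK
    stall_cycles = 2 * (WORK // MISS_EVERY) * MISS_LATENCY
    hidden = min(stall_cycles, issue_cycles)
    return issue_cycles + stall_cycles - hidden

print(run_sequential(), run_smt())  # 6000 vs 4000 cycles to finish both threads
```

In this toy each thread finishes later than it would running alone (4000 vs 3000 cycles), but both are done well before the sequential 6000.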

Now, let's look at what happens when there are multiple ISA cores accessing the same physical memory space.
Multi-core and multi-socket systems are variations of the same thing, Symmetric Multiprocessing, or SMP. SMP sounds somewhat similar to SMT and indeed they have quite a bit in common. The difference is that while SMT allows a single ISA core to more efficiently use the resources that it has, SMP duplicates that ISA core multiple times and hence duplicates the execution resources. For the sake of simplicity, let's focus on a single-socket deployment only, where all cores exist within the same physical package and on the same physical die (the analysis is no different for multiple sockets).

Most modern architectures from all manufacturers have separate level 1 (L1) instruction and data caches. These caches are kept small to reduce access time, and are kept separate because they are both accessed on nearly every cycle and in different stages of the instruction pipeline. The L1 cache is exclusive to a single core, but is shared between contexts running on that core (meaning that SMT shares the L1 cache). The L2 cache is larger and slower than L1, and may be shared by multiple cores. The L3 cache is larger and slower than L2, and is typically shared by all cores within the package (excepting packages that contain multiple CPU dies, such as the Core 2 Quad).

More cores means more cache; L1 at a minimum, but most likely L2 as well. A higher operating frequency means that the data in the cache gets accessed more frequently, and the more frequently it gets accessed, the more frequently it will not find the data that it is looking for; this is called a cache miss. A cache miss requires that data be loaded from main memory into the cache, all the way down to L1, where it can then be used for execution. If the data is not in main memory and has been swapped out to a backing store such as a hard disk drive, the microprocessor will most likely suspend the running process and switch to another until the load is completed.

Modern microprocessors have many features to mask the effects of cache misses, such as out-of-order execution and prefetching, but these only stretch so far. SMT can fill in the gap by shifting resources to another thread on the microprocessor that has not encountered a miss, but again, this only goes so far. If a microprocessor simply cannot continue, it must stall. Stall cycles are cycles in which no instructions are executed. Stall cycles are inevitable, especially in superscalar micro-architectures which execute multiple independent instructions at once (in which case the stall is issued on a per-port basis), but minimizing stall cycles is key to maximizing IPC. If the CPU's operating frequency gets too far ahead of the cache controller's ability to write and load data to and from main memory, it will introduce more and more stall cycles. This is why Intel's Celeron microprocessors were very popular for setting overclocking records: they had almost no CPU cache to speak of, and really just set the world record for the fastest rate at which nothing was done.
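
A rough back-of-envelope version of that stall-cycle effect (all numbers invented): the miss penalty is fixed in nanoseconds, so the faster the clock, the more cycles each miss wastes.

```python
MEM_LATENCY_NS = 50.0   # time for a miss to be serviced from main memory
BASE_CPI = 1.0          # cycles per instruction when everything hits in cache
MISS_RATE = 0.02        # misses per instruction

def instructions_per_second(clock_ghz):
    miss_penalty_cycles = MEM_LATENCY_NS * clock_ghz  # same ns cost, more cycles lost
    effective_cpi = BASE_CPI + MISS_RATE * miss_penalty_cycles
    return clock_ghz * 1e9 / effective_cpi

print(instructions_per_second(1.0))  # ~0.50e9
print(instructions_per_second(2.0))  # ~0.67e9 -- doubling the clock gains well under 2x
```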

SMP, on the other hand, introduces more cache to work with, albeit not to the direct benefit of any other core. A memory controller can also keep up with a slower SMP system more easily than it can with a faster uniprocessor system.

So, in summary, there are an awful lot of factors to consider when determining whether or not a 1.0 GHz dual-core system is superior to a 2.0 GHz single-core system. If the work is aggregate, the dual-core will win in most cases, but if it's a single very specific task that is highly sequential, the 2.0 GHz single-core might win.
 
Solution