
Why is a Core 2 Duo 2.4GHz faster than a P4 3.0GHz?

October 11, 2009 2:39:06 AM

I know that cache and FSB speed affect CPU performance, but what other factors have a significant enough effect to make a lower-clocked processor outperform a higher-clocked one?

I'm asking this both for the P4 dual cores versus the Core 2 Duos, and for single-core performance: a single-core P4 versus one core of a 2.4GHz Core 2 Duo.

Does it have to do with the instruction sets that they use?
October 11, 2009 3:59:54 AM

The simple answer is transistor count. Each successive generation of processors fits more and more transistors on the chip. All the clock speed rating defines is how many times per second data can pass through the CPU; what it doesn't account for is the amount of data the CPU can process on each pass. In this link, you'll see the P4 has 42 million transistors, but the Core 2 Duo has 291 million. Halve that for a single-core comparison and that's 42 vs 145.5, over 3 times as many transistors. Now if you have an operation that only needs 40 million, the faster-clocked CPU will win, but if you have one that needs 100 million, the P4 would have to do 2 cycles vs less than 1 for the C2D.

Clear as mud?

http://en.wikipedia.org/wiki/Transistor_count
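The per-core arithmetic above can be checked in a few lines. This only reproduces the post's numbers; the even split between cores is a simplification (a large share of the Core 2 Duo's transistors is its shared L2 cache), and the count alone doesn't say how much work a core does per cycle. The constant and function names are just illustrative.

```python
# Transistor counts quoted in the post, in millions (from the linked Wikipedia table).
P4_TRANSISTORS_M = 42.0
CORE2_DUO_TRANSISTORS_M = 291.0


def per_core(total_millions: float, cores: int) -> float:
    """Naive even split of a chip's transistor budget across its cores."""
    return total_millions / cores


c2d_core = per_core(CORE2_DUO_TRANSISTORS_M, 2)
ratio = c2d_core / P4_TRANSISTORS_M
print(f"Core 2 Duo per core: {c2d_core} million; ratio vs P4: {ratio:.2f}x")
```

The ratio comes out at about 3.46x, which is where "over 3 times" comes from.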
October 11, 2009 4:33:27 AM

Instruction sets only help if a program can use them, but if it can, newer instructions are usually more efficient than older ones. Architecture has a greater impact on performance than the number of transistors. The transition from the Pentium 3 to the Pentium 4 demonstrates that quite well: clock for clock, the P3 was faster than the P4 in most things when the P4 first came out. The silicon and process used, along with electrical leakage, affect the speed of the processor as well.
October 11, 2009 4:46:40 AM


Transistor count is not always the main factor in making a processor more efficient. Referring to the link provided by skora, the Intel Core 2 Duo has 291 million transistors and AMD K10 (Phenom I) has 463 million, yet we all know the Core 2 Duo is more efficient than the Phenom. On the other hand, the Phenom II X4 with 6MB L3 cache has roughly 758 million transistors, and we know from the benchmarks that it competes with the Core 2 Quad very well. Clock speed doesn't always indicate the performance of a processor either.

http://www.tomshardware.com/reviews/phenom-ii-940,2114-...

So overall, a processor's efficiency depends on the following factors:

1. Processor Architecture.
2. Instruction set.
3. Cache accessibility.
4. Steppings.
5. FSB/HyperTransport Link/Quick Path.
6. Memory type and accessibility.
7. Interconnects between the cores of a multi-core processor.
and there may be more.

Hope this throws some light on your question. :)


October 11, 2009 6:41:45 AM

Basically, the Core 2 is a wider-issue design, its prefetch is much more aggressive than the P4's, and its cache latency is lower.

The Core 2 doesn't suffer from the terrible problem the P4 had with its REPLAY function:

REPLAY (a complete cache flush) is only meant to happen when branch prediction (speculation) goes wrong...

That is ... the correct instructions are executed ... with the wrong data.
REPLAY flushes the data and inserts the correct data and instructions ... taking huge chunks of time to do so.

Unfortunately the logic in the area of branch prediction on the P4 is so bad that REPLAY happens a lot ...

So the beast basically spins producing nothing ...

That massively affects IPC ... efficiency.
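How much a frequent, expensive recovery event costs can be sketched with the textbook average-CPI model: every instruction pays a base cost, plus a penalty each time speculative work has to be thrown away and redone. The 30-cycle penalty, the base CPI and the event rates are hypothetical and are not measurements of REPLAY; the sketch only shows how quickly such events eat into IPC.

```python
def effective_cpi(base_cpi: float, events_per_instr: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once expensive recovery events are counted."""
    return base_cpi + events_per_instr * penalty_cycles


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical figures, chosen only to show the shape of the effect.
BASE_CPI = 0.5        # cost per instruction if speculation never goes wrong
PENALTY_CYCLES = 30   # cycles lost each time work has to be redone

for rate in (0.0, 0.01, 0.02, 0.05):
    cpi = effective_cpi(BASE_CPI, rate, PENALTY_CYCLES)
    print(f"events per instruction = {rate:.2f}: CPI = {cpi:.2f}, IPC = {ipc(cpi):.2f}")
```

With these made-up numbers, one recovery event every 50 instructions (0.02) already cuts IPC from 2.0 to about 0.91.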

This is well known but poorly documented as I think Intel doesn't want the world + dog to know.

Hence I like to bring it up to remind them.

All CPUs have design faults ... AMD produced a heap of CPUs during the Athlon era with insufficient layers, and a few models had hotspots on the core ... which meant they didn't overclock well (didn't scale as well in frequency).

The Athlons had better IPC than the Pentiums for most of that generation (clock for clock), but Intel's superior process allowed them to scale better in frequency.

Clockspeed doesn't always win though.

A Pentium 4 running at 5.3GHz has about the same performance as a 2.75GHz Athlon64 ... or a 2.5GHz Core 2 ... roughly speaking.
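The comparison above rests on single-thread performance being roughly IPC times clock speed, so a lower-clocked chip wins whenever its IPC advantage exceeds the clock ratio. A minimal sketch, using the clock speeds from this post and from the original question; the function names are illustrative, and real performance also depends on memory, cache and workload.

```python
def performance(ipc: float, clock_ghz: float) -> float:
    """Single-thread throughput: instructions per cycle times cycles per second (in billions)."""
    return ipc * clock_ghz


def ipc_ratio_for_equal_performance(fast_clock_ghz: float, slow_clock_ghz: float) -> float:
    """IPC advantage the lower-clocked chip needs just to match the higher-clocked one."""
    return fast_clock_ghz / slow_clock_ghz


# Clock speeds from the post above and from the original question.
comparisons = [
    ("Core 2 @ 2.5GHz vs P4 @ 5.3GHz", 5.3, 2.5),
    ("Athlon64 @ 2.75GHz vs P4 @ 5.3GHz", 5.3, 2.75),
    ("Core 2 @ 2.4GHz vs P4 @ 3.0GHz", 3.0, 2.4),
]
for label, fast, slow in comparisons:
    ratio = ipc_ratio_for_equal_performance(fast, slow)
    # Sanity check: with the P4's IPC normalised to 1.0, both sides match.
    assert abs(performance(ratio, slow) - performance(1.0, fast)) < 1e-9
    print(f"{label}: needs {ratio:.2f}x the P4's IPC")
```

Taken at face value, the post's rough figures imply the Core 2 has about 2.12x the P4's IPC (5.3 / 2.5), well above the 1.25x a 2.4GHz part needs to beat a 3.0GHz P4.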
October 11, 2009 6:48:08 AM

MU ... don't post and I might stand a chance of best answer ...
October 11, 2009 7:17:51 AM

^lol
October 11, 2009 7:24:25 AM

trudat?
October 11, 2009 12:38:43 PM

cache
October 11, 2009 2:14:25 PM

Simple: more cache, two floating-point units across two cores, vastly improved IPC, more efficient pipeline stages, and a LOT more.
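Why more cache helps can be sketched with the standard average memory access time formula: every access pays the cache hit time, and the fraction that misses also pays the trip to main memory. The latencies and miss rates below are hypothetical, not measured Core 2 or P4 figures.

```python
def amat(hit_time_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time: hit time plus the expected cost of a miss."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles


# Hypothetical last-level cache figures, for illustration only.
small_cache = amat(hit_time_cycles=20, miss_rate=0.10, miss_penalty_cycles=200)
large_cache = amat(hit_time_cycles=15, miss_rate=0.04, miss_penalty_cycles=200)
print(f"smaller cache: {small_cache:.1f} cycles, larger cache: {large_cache:.1f} cycles")
```

With these made-up numbers, cutting the miss rate from 10% to 4% (and shaving a few cycles off the hit time) drops the average access from 40 to 23 cycles.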
October 11, 2009 4:17:07 PM

In reply to enayet_redeemer's post above:

Thanks for your help. Do you have any reliable sources for this information, so I can do further reading on the subject?
October 11, 2009 5:40:19 PM


Regarding the sources for further reading, a long list is waiting for you. :D 

1. Processor Architecture.

http://en.wikipedia.org/wiki/CPU_design

2. Instruction set.

http://en.wikipedia.org/wiki/Instruction_set

3. Cache accessibility.

http://developer.amd.com/documentation/articles/Pages/1...
http://en.wikipedia.org/wiki/CPU_cache

4. Steppings.

http://www.tomshardware.com/reviews/dual-quad,1720-3.ht...
http://www.techpowerup.com/articles/overclocking/29

5. FSB/HyperTransport Link/Quick Path.

http://en.wikipedia.org/wiki/Front-side_bus
http://en.wikipedia.org/wiki/Intel_QuickPath_Interconne...
http://en.wikipedia.org/wiki/HyperTransport

6. Memory type and accessibility.

http://en.wikipedia.org/wiki/Random-access_memory
http://en.wikipedia.org/wiki/Processor-in-memory
http://en.wikipedia.org/wiki/DDR3_SDRAM
http://benchmarkreviews.com/index.php?option=com_conten...

7. Interconnects between the cores of a multi-core processor.

http://en.wikipedia.org/wiki/Multi-core

The links provide overviews of the technologies; they are not in-depth or purely technical.

Hope this will help for further reading.



October 11, 2009 7:53:07 PM

Think of it like this:

If there is one really strong man carrying a log (P4)

OR

Two stronger men carrying a log (Core 2 Duo)

It's just newer technology, and the chips are built on a smaller process (45nm vs 65nm), which means more transistors and simply better performance.
October 11, 2009 8:12:47 PM

For all you people talking about ISA: the ISA is quite irrelevant in modern processors. Everyone uses x86... sort of. All the code that runs on almost any processor you put into a PC is x86 ISA, but once an instruction is inside the processor, it almost always gets translated on the fly into micro-ops.
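As a rough illustration of that translation: a read-modify-write x86 instruction such as `add [ebx], eax` does three separate jobs (read memory, add, write memory), which is why a decoder splits it into simpler micro-ops. The `MicroOp` type, the operation names and the `tmp0` register below are invented for illustration; real Intel micro-op formats are internal and undocumented, and some Intel cores fuse some of these steps back together.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MicroOp:
    op: str                 # hypothetical internal operation name
    dst: str                # destination: temp register or memory operand
    srcs: Tuple[str, ...]   # source operands


def decode_add_mem_reg(addr_reg: str, src_reg: str) -> List[MicroOp]:
    """Toy decoder: split x86 `add [addr_reg], src_reg` into load / add / store."""
    mem = f"[{addr_reg}]"
    return [
        MicroOp("load", "tmp0", (mem,)),             # read the memory operand
        MicroOp("add", "tmp0", ("tmp0", src_reg)),   # do the arithmetic in a temp
        MicroOp("store", mem, ("tmp0",)),            # write the result back
    ]


for uop in decode_add_mem_reg("ebx", "eax"):
    print(uop)
```

Once split like this, the load, add and store can be scheduled independently by the out-of-order core, which is what the ISA-agnostic part of the design works on.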

Those of you using transistor count as a metric: transistor count is less relevant today than it was 10 or 15 years ago. It used to be that more transistors meant you could implement more functions in single cycles rather than as multi-cycle instructions, and they also allowed for deeper pipelines. But then Intel ran into the laws of physics at full speed, and now transistor count is not making processors any faster. What it does is increase the number of cores you can put on a processor (as many of you know), but that is not what makes processors fast; it simply allows two threads to run simultaneously.

The speed of the core is currently being increased by increasing ILP, or instruction-level parallelism, which means you take the instructions in a single thread and run them out of order, some at the same time, plus a bunch of other techniques that let single threads run faster.

While it is true that advances in hardware allow more stuff (caches/cores) to be placed on a chip, the real gains are currently coming from the actual core design, allowing individual threads to run at higher IPCs.
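The ILP idea can be sketched by treating a snippet as a dependency graph: instructions that don't depend on each other can run at the same time, so the longest dependency chain sets the minimum number of cycles. This assumes every instruction takes one cycle and the core can issue any number per cycle; real out-of-order cores are limited by issue width, instruction latencies and how far ahead they can look, so the result is an upper bound, not a model of any particular CPU. The `ideal_ilp` function and the snippet are hypothetical.

```python
from typing import Dict, List


def ideal_ilp(deps: Dict[str, List[str]]) -> float:
    """Instructions divided by dependency-chain length (unit latency, unlimited width)."""
    depth: Dict[str, int] = {}

    def level(instr: str) -> int:
        # Earliest cycle an instruction can run: one after its latest input.
        if instr not in depth:
            depth[instr] = 1 + max((level(d) for d in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(level(i) for i in deps)
    return len(deps) / critical_path


# Hypothetical 6-instruction snippet: instruction -> instructions it depends on.
snippet = {
    "i1": [],            # a = load x
    "i2": [],            # b = load y
    "i3": ["i1"],        # c = a + 1
    "i4": ["i2"],        # d = b * 2
    "i5": ["i3", "i4"],  # e = c + d
    "i6": [],            # f = load z
}
print(f"ideal ILP: {ideal_ilp(snippet):.1f}")
```

Here the six instructions need at least three cycles (i1 → i3 → i5), so the idealized ILP is 2.0; a fully serial chain would score 1.0 no matter how wide the core is.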