Pro Llama

Dec 8, 2009


The short answer is yes, L2 is faster.
 

Kewlx25

It's a trade-off.

If you miss in L2 and the data is in the L3 cache, then having an L3 cache is faster. If you miss in L2 and the L3 cache does not have the data either, then it's slower, because the L3 lookup adds latency before the request goes out to RAM.
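To put numbers on that trade-off, here is a minimal sketch of the expected cost of one L2 miss with and without an L3. The latencies are hypothetical round numbers, not measurements of any real CPU, and the model assumes the L3 is checked before RAM with no overlap between the two lookups.

```python
# Hypothetical latencies in CPU cycles (illustrative only, not measured values)
L3_LATENCY = 40
RAM_LATENCY = 200

def l2_miss_cost(l3_miss_rate: float, has_l3: bool = True) -> float:
    """Expected cycles to service one L2 miss, assuming the L3 is checked
    first and RAM is only accessed after an L3 miss (serial lookup)."""
    if not has_l3:
        return RAM_LATENCY
    return L3_LATENCY + l3_miss_rate * RAM_LATENCY

def break_even_l3_miss_rate() -> float:
    """L3 miss rate above which the extra L3 lookup costs more than it saves."""
    return 1 - L3_LATENCY / RAM_LATENCY

for rate in (0.0, 0.5, 0.8, 1.0):
    print(f"L3 miss rate {rate:.0%}: with L3 {l2_miss_cost(rate):.0f} cycles, "
          f"without L3 {l2_miss_cost(rate, has_l3=False):.0f} cycles")
print(f"L3 pays off below a {break_even_l3_miss_rate():.0%} L3 miss rate")
```

With these made-up numbers an L3 hit costs 40 cycles instead of 200, while an L3 miss costs 240, so the L3 helps as long as it catches more than about 20% of L2 misses.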

The idea is that with lots of threads running and decent prefetching, the L3 cache will help more than it hurts overall, and cache misses are hidden by working on other stuff while waiting.

Since Hyper-Threading just makes the CPU "look like" it can run two threads, when one thread has a cache miss the CPU switches threads and starts work on the other one. This means that even though one thread is waiting for data, another thread can keep working, assuming it has its data. The L3 cache greatly improves the chance that the needed data is ready. This is one of the ways misses are "hidden".
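A toy simulation can show how running a second thread fills the cycles one thread spends waiting on a miss. This is a simplified switch-on-miss model of the description above, not how Hyper-Threading is actually implemented (real SMT can issue from both threads in the same cycle). The instruction names, the one-instruction-per-cycle core and the 10-cycle miss latency are all hypothetical.

```python
MISS_LATENCY = 10  # hypothetical cycles a thread waits after a cache miss

def run(threads: list[list[str]]) -> int:
    """Cycles to finish all threads on one core that issues one instruction
    per cycle and only switches threads when the current one is stalled."""
    n = len(threads)
    pcs = [0] * n        # index of the next instruction for each thread
    ready_at = [0] * n   # cycle at which each thread can issue again
    current, cycle = 0, 0
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        # Prefer the current thread, otherwise fall back to any other ready one.
        order = [current] + [i for i in range(n) if i != current]
        runnable = [i for i in order
                    if pcs[i] < len(threads[i]) and ready_at[i] <= cycle]
        if runnable:
            current = runnable[0]
            op = threads[current][pcs[current]]
            pcs[current] += 1
            # A "miss" blocks this thread until its data arrives.
            ready_at[current] = cycle + 1 + (MISS_LATENCY if op == "miss" else 0)
        # If nothing was runnable, this cycle is simply wasted.
        cycle += 1
    return max(ready_at, default=0)

work = ["alu", "alu", "miss", "alu", "alu", "miss", "alu"]
print("one thread:", run([work]))
print("two threads, back to back:", 2 * run([work]))
print("two threads, interleaved:", run([work, work]))
```

In this model one thread takes 27 cycles, so two threads run back to back take 54, while interleaving them finishes in 30 because each thread's compute work runs inside the other's miss latency.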
 

Pointertovoid

Dec 31, 2008
On single-threaded programmes, all these cores deliver the same performance per MHz, as you can check in the SuperPi benchmark results posted on forums.

More precisely, CPUs with a direct connection to RAM (an integrated memory controller) have a 5% advantage.

So unless you run multithreaded games or video encoding, take a dual-core with a high frequency. The C2D's LGA775 socket limits upgrade possibilities; that is its only drawback.
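The rule of thumb above reduces to simple arithmetic: single-threaded speed scales with clock frequency, plus roughly 5% for an integrated memory controller. The sketch below applies it to two hypothetical chips; the clock speeds are made-up examples, and the formula is only the post's rule of thumb, not a real performance model.

```python
# Rule of thumb from the post: same performance per MHz on single-threaded code,
# plus about 5% for CPUs with an integrated memory controller (direct RAM link).
IMC_BONUS = 1.05

def single_thread_score(clock_mhz: float, integrated_memory_controller: bool) -> float:
    """Relative single-threaded speed under the 'same speed per MHz' rule."""
    return clock_mhz * (IMC_BONUS if integrated_memory_controller else 1.0)

# Hypothetical example: a 3330 MHz dual-core without an integrated memory
# controller versus a 2660 MHz quad-core with one.
dual = single_thread_score(3330, integrated_memory_controller=False)
quad = single_thread_score(2660, integrated_memory_controller=True)
print(f"dual-core: {dual:.0f}, quad-core: {quad:.0f}")
```

Under this rule the higher-clocked dual-core stays ahead on single-threaded work unless the other chip's clock is within about 5% of it.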

-----

Hyper-Threading is better than described above. The CPU has two sets of registers for data, state and so on, and feeds its execution units (e.g. SSE) with instructions from both threads in every clock cycle. This is how HT achieves up to a 30% improvement on multithreaded software: far better than just improved context switching, but not equivalent to one more core, and certainly useless for single-threaded software.
 

Kewlx25



I was focusing on the "hiding of latency" part of HT. Like you said, current i7 CPUs have many duplicated units, which allow two different threads to actually work at roughly the same time, unlike the P4, which had only a few duplicated units.