Reading through the architecture of AMD's Bulldozer and Piledriver processors, it seems clear that each module has the hardware to handle 2 simultaneous threads, each on its own pipeline. The exception is the rare case of a 256-bit floating-point calculation, where a thread has to use both of the module's shared FPU units to perform the operation. Every other operation has a full path on its own hardware through the CPU, so 2 threads can pass through each module without contention.
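To make the FPU point concrete, here is a toy throughput model in Python. It is only a sketch of the claim above, not a description of the real scheduler: it assumes two 128-bit FP pipes per module, that each thread issues at most one FP operation per cycle, and that a 256-bit operation occupies both pipes. The names `PIPES_PER_MODULE`, `PIPE_WIDTH_BITS` and `cycles_needed` are made up for illustration.

```python
import math

# Assumptions of this toy model (not measured hardware behaviour):
PIPES_PER_MODULE = 2    # FP pipes shared by the two threads of a module
PIPE_WIDTH_BITS = 128   # width of each shared FP pipe


def cycles_needed(threads, ops_per_thread, op_width_bits):
    """Cycles for every thread to finish its FP ops (throughput only, no latency).

    Each thread is assumed to issue at most one FP op per cycle.
    """
    # A 256-bit op needs two 128-bit pipe slots, a 128-bit op needs one.
    slots_per_op = math.ceil(op_width_bits / PIPE_WIDTH_BITS)
    # Pipe slots requested per cycle if every thread issues one op.
    demand = threads * slots_per_op
    # Threads only slow down when demand exceeds the available pipes.
    slowdown = max(1, demand / PIPES_PER_MODULE)
    return math.ceil(ops_per_thread * slowdown)


if __name__ == "__main__":
    for width in (128, 256):
        for threads in (1, 2):
            print(f"{threads} thread(s), {width}-bit ops:",
                  cycles_needed(threads, 1000, width), "cycles")
```

Under these assumptions, two threads doing 128-bit work finish in the same number of cycles as one thread, and only the case of two threads both issuing 256-bit operations takes twice as long.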
As for sharing the L2 cache: if each pair of cores gets 2MB to share instead of 1MB dedicated to each core, there should be no performance hit. Intel shares its L3 cache between cores, too.
It feels like everyone is missing these facts, but maybe it's just me. Why would someone say that AMD does not use "real" cores?