Interesting benchmarks of individual FX cores and modules

So I just built a new VMware ESXi 5.1 server using Bulldozer Opteron chips, and I figured this was a great chance to finally understand how the modules and cores perform. There is a lot of talk about how each module has one "real" core and one slower sister core, or how it isn't really a "true" dual-core module, or whatever.

These are 2.7 GHz Opteron chips (so slightly slower than an FX-8000-series chip). I built a 2-vCPU virtual machine and then assigned CPU affinity to force it to use either both cores of one module, one primary core in each of two different modules, or one secondary core in each of two different modules. I verified in VMware's performance monitoring that the intended cores were being used for each test.
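
For anyone who wants to reproduce this without ESXi, here is a rough Python sketch of the same idea on a bare-metal Linux box. This is not what I did above (I set the affinity in the VM's settings on ESXi), and the assumption that logical CPUs 0 and 1 land in the same module needs checking against your own chip's topology:

import glob, os
from collections import defaultdict

# Group logical CPUs by (package, core_id) to see which ones the kernel
# reports as sharing a core/module. On Bulldozer-family chips, whether the
# two halves of a module show up as siblings here depends on the kernel
# version, so cross-check the output with lscpu.
groups = defaultdict(list)
for path in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*")):
    cpu = int(os.path.basename(path)[3:])
    pkg = int(open(path + "/topology/physical_package_id").read())
    core = int(open(path + "/topology/core_id").read())
    groups[(pkg, core)].append(cpu)

for (pkg, core), cpus in sorted(groups.items()):
    print("package %d core %d -> logical CPUs %s" % (pkg, core, cpus))

# Pin this process (and any benchmark it then launches) to two chosen logical
# CPUs, e.g. both halves of one module vs. one core from each of two modules.
os.sched_setaffinity(0, {0, 1})  # example pair only; adjust to your topology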

Here are the results:

2 cores in one module:

Novabench Integer 109M
Novabench Floating 43.6M

Passmark CPU 1921
Passmark Integer 2860
Passmark Floating 1426
Passmark Single Core 1023


1 primary core in 2 different modules:

Novabench Integer 114M
Novabench Floating 44.4M

Passmark CPU 2187
Passmark Integer 2915
Passmark Floating 1764
Passmark Single Core 1022

1 secondary core in 2 different modules:

Novabench Integer 114M
Novabench Floating 44.4M

Passmark CPU 2178
Passmark Integer 2919
Passmark Floating 1766
Passmark Single Core 1023

What I think this shows is that the individual cores within a module are in fact exactly the same speed and performance. But something is bottlenecking them slightly within the module, and it looks like it is probably the FPU scheduler: floating point performance drops about 20% on Passmark when both cores of a single module are used, but is identical when the cores are tested individually.
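
To put rough numbers on that, here is the simple arithmetic (plain Python, using the Passmark scores posted above):

# Relative drop when both cores share one module vs. one core per module.
fp_shared, fp_split   = 1426, 1766
int_shared, int_split = 2860, 2915
cpu_shared, cpu_split = 1921, 2187

print("Floating point drop: %.1f%%" % (100 * (1 - fp_shared / fp_split)))    # ~19%
print("Integer drop:        %.1f%%" % (100 * (1 - int_shared / int_split)))  # ~2%
print("Overall CPU drop:    %.1f%%" % (100 * (1 - cpu_shared / cpu_split)))  # ~12%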

Anyway, make of this what you will. There are definitely eight identical-speed cores in an 8-core FX, but they are bottlenecked somewhere at the module level.

I thought it was interesting as I'd never seen individual cores within a module isolated in any benchmarks.
  1. C'mon man, somebody at least pretend this was interesting! :)
  2. I have been running tests like this ever since AMD announced the module architecture, but some people still can't fathom the concept and insist it's one real core plus a fake core. Even AMD themselves demonstrated how the modules work, yet the armchair experts suggest otherwise.

    I did this using an FX-8350 ES and an FX-6300: I disabled all but one x86 core, cycled through the same test suite with every core isolated, and the results were the same within a 0.00002% margin of error. The module is simply two x86 cores sharing the same L2 and front end. Therein lies the issue: the front end is too small to feed both cores quickly enough. Steamroller addresses this by adding more front-end resources to feed the cores faster, and also improves L1, L2 and L3 speeds and latencies by around 40%. With the current architecture the cores are starved and waste cycles, hence the performance penalty when both cores in a module are utilized. Steamroller is expected to mitigate this by as much as 15%; in theory, if a Zambezi/Vishera module performs like about 1.8 cores' worth of a true dual core, Steamroller would be about 1.95 cores' worth, compared with Intel's Hyper-Threading at roughly 1.1 cores' worth.
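
    As a rough cross-check against the OP's overall Passmark CPU marks (back-of-envelope Python, just illustrative arithmetic rather than a proper scaling test):

    # One module with both cores busy vs. two independent cores from two modules.
    shared_module = 1921   # Passmark CPU, both cores of one module
    two_modules   = 2187   # Passmark CPU, one core in each of two modules

    scaling = shared_module / two_modules           # ~0.88 of two independent cores
    print("One module delivers about %.2f cores' worth" % (2 * scaling))  # ~1.76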
  3. That is interesting, Twelve25.

    It really shows the inherent issue with building architectures that rely too heavily on a managing component further up the hierarchy (in this case the scheduler).

    What AMD needs is better coherency between the individual cores that make up a module: an improved communication path that would let each core adapt to incoming workloads and schedule FPU work accordingly, on the fly.

    If AMD can nail that (perhaps by leveraging an improved/shared caching mechanism), they'd have an amazing architecture on their hands in terms of multi-core scaling.
  4. ElMoIsEviL said:
    That is interesting, Twelve25.

    It really shows the inherent issue with building architectures that rely too heavily on a managing component further up the hierarchy (in this case the scheduler).

    What AMD needs is better coherency between the individual cores that make up a module: an improved communication path that would let each core adapt to incoming workloads and schedule FPU work accordingly, on the fly.

    If AMD can nail that (perhaps by leveraging an improved/shared caching mechanism), they'd have an amazing architecture on their hands in terms of multi-core scaling.


    They didn't expect to get it right the first time with Zambezi, and Vishera did reduce the scheduler bottleneck to a fair extent. Steamroller is the biggest change yet: added FPU resources, wider instruction execution, increased L2 and L3 cache speeds, and latency reductions of around 40%. The new, wider and deeper front end should keep those pipelines fed better.