Why Do People Say the AMD FX Is Not A Real 8 Core Processor

Status
Not open for further replies.
Solution
The FX-8150 isn't a "true" octa-core processor but rather has four Bulldozer modules packed with a pair of processing cores each; this is the biggest change from the older Phenom II architecture. The CPU has plenty of cache: 8MB of L2 (2MB per module) and 8MB of L3 (2MB per module). Bulldozer has also been updated with the FMA, XOP, AES, AVX, and SSE 4.2 instruction sets, giving it a boost in applications that support them. The total package has an amazing 2+ billion transistors, more than twice that of the latest Phenom II X4/X6s, fit on a ~315 mm² die.

Pulled from:

http://www.silentpcreview.com/amd-fx8150
 
People don't know what they are talking about, have no idea how the hardware works, and pull stupid generalizations out of their asses.

They think that shared resources make the 2-core module comparable to a hyperthreaded core, but that's completely wrong in every aspect when looking at the hardware. SMT, as used in Hyper-Threading, uses the scheduler to cram a single core with instructions from two threads that can issue in the same cycle, to increase core efficiency. Bulldozer instead lowers the amount of hardware in a module that isn't used as much in a traditional CPU core. The approaches are completely different.

As it stands, Bulldozer's cores are still pretty much full cores, just not as efficient. Anyway, the Bulldozer FX doesn't have 2 billion transistors, it has 1.2 billion; the 16-core Interlagos is the chip with 2 billion transistors.

Bulldozer modules share an L2 cache, and both cores use the same FPU for most instruction sets. Each core is able to do one 256-bit integer calculation and one 128-bit floating-point calculation. The two 128-bit floating-point units are joined into a single 256-bit FPU shared within the module.

Just because a CPU isn't just the same cores taped together doesn't mean it has fewer cores than one that is; it just means the cores are weaker if used inefficiently.
 

popatim

Titan
Moderator
See, to my thinking, it's the job of the 'instruction managers' (built into the CPU) to schedule instructions appropriately for max efficiency, not the programmer's job to make programs/instructions more Bulldozer friendly. Instead we see the same 'this thread has to run in this core' mentality we've had for years, when it should be easy enough to run instructions in the core that can handle them best and then reintegrate them into the thread on the way out of the processor.
 

Actually, it's the OS that does the scheduling; the CPU only reorders operations based on efficiency with the registers and other components. The programmer's job is to make a program that runs well, and they would be advised to look into scheduling as well as efficient compilation of the code for the CPU it runs on.

The programmer should look at every way to make his program run as efficiently as possible if he wants to code anything well.
 

loneninja

Distinguished
It's because Bulldozer had 4 modules with 2 integer cores in each module giving it a total of 8 integer cores, but some resources are shared within the module rather than having 8 full cpu cores.

The Win7 patch for Bulldozer actually turned Bulldozer into a 4 core 8 thread processor, like an Intel i7, when running Cinebench.
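A rough sketch of the scheduling idea behind that kind of patch: give each module one thread first, and only double up once every module is busy. The numbering (logical CPUs 2m and 2m+1 belonging to module m) and the helper names `module_first_order` and `place_threads` are assumptions made for illustration, not anything taken from the patch itself, so check the real topology before pinning anything.

```python
def module_first_order(num_modules=4, cores_per_module=2):
    """Return logical CPU ids ordered so each module gets one thread
    before any module gets a second one.

    Assumption (hypothetical numbering): module m owns logical CPUs
    m * cores_per_module through m * cores_per_module + cores_per_module - 1.
    """
    order = []
    for slot in range(cores_per_module):
        for module in range(num_modules):
            order.append(module * cores_per_module + slot)
    return order


def place_threads(num_threads, num_modules=4, cores_per_module=2):
    """Pick the logical CPUs that the first num_threads threads would use."""
    order = module_first_order(num_modules, cores_per_module)
    return order[:num_threads]


if __name__ == "__main__":
    # Four threads land on four different modules, so none share an FPU.
    print(place_threads(4))
    # Eight threads fill both cores in every module.
    print(place_threads(8))
```

With four threads, nothing shares a module, which is roughly the "4 core 8 thread" behavior described above. On Linux, something like `os.sched_setaffinity(0, {cpu})` could pin a process to one of the chosen CPUs, but whether adjacent ids really share a module depends on the system.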
 

ElMoIsEviL

Distinguished

I'll complete the puzzle.

Bulldozer FX-8150 has 4 modules, each with 2 integer scheduling units. This means that Bulldozer has a total of 8 Integer Scheduling Units, a.k.a. execution pipelines. (As you can see below, the FX-8150 contains four Bulldozer modules.)

amd_bulldozer_fx8150_die.jpg


The reason many people rightfully claim that Bulldozer is not a true 8 core processor is that when processing floating point operations... Bulldozer is left with the equivalent of half the Floating Point Scheduling Units found in a traditional processor.

bulldozer-module-diagram-img1.jpg

As you can see above each Floating Point Scheduling Unit, found in a Bulldozer module, is shared by two Integer Scheduling Units. So when it comes to processing floating point operations... Bulldozer acts more like a Quad Core than an 8 Core Processor and thus comes out looking severely crippled.
 
No, it does not. The 256-bit FPU inside each module has 2x the floating-point power of a Phenom II's. If things were using the right instructions and were optimized for Bulldozer, it wouldn't look crippled at all.
 
Solution

ElMoIsEviL

Distinguished


Yes it would. Bulldozer has two 128-bit units (or a single 256-bit AVX unit) per two Integer Scheduling Units. Sandy Bridge, for comparison, has one 256-bit unit (capable of AVX) per Integer Scheduling Unit (so the equivalent of two 128-bit units per single Integer Scheduling Unit).
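To put numbers on that comparison, here is a minimal sketch that just divides the FP width of a shared block by the number of integer cores using it at once. The widths are the ones quoted in this post; the `FpuLayout` class and its field names are made up for illustration, and the model ignores clock speed, FMA, and everything else that affects real throughput.

```python
from dataclasses import dataclass


@dataclass
class FpuLayout:
    name: str
    fp_bits_per_unit: int    # total FP datapath width of one FPU block
    int_cores_per_unit: int  # integer cores that feed that block

    def fp_bits_per_core(self, busy_cores=None):
        """FP width available to each core when busy_cores cores issue
        FP work at once (defaults to every core sharing the block)."""
        if busy_cores is None:
            busy_cores = self.int_cores_per_unit
        if not 1 <= busy_cores <= self.int_cores_per_unit:
            raise ValueError("busy_cores out of range")
        return self.fp_bits_per_unit // busy_cores


# Figures as stated in the post above, not measured values.
bulldozer_module = FpuLayout("Bulldozer module", 256, 2)
sandy_bridge_core = FpuLayout("Sandy Bridge core", 256, 1)

if __name__ == "__main__":
    for layout in (bulldozer_module, sandy_bridge_core):
        print(layout.name, layout.fp_bits_per_core(), "bits per core, all cores busy")
    # A lone thread on a module gets the whole shared FPU to itself.
    print(bulldozer_module.fp_bits_per_core(busy_cores=1), "bits with one core busy")
```

Under this simple model the gap only shows up when both cores in a module are doing floating-point work at the same time.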

Phenom II, as shown below, has far more FPU resources than Bulldozer (in terms of the number of FPU units). AMD did, however, add AVX and other optimizations to Bulldozer's FPU units.
ConventionalCPU2.jpg


AMD cut its potential FPU resources in half with this new Bulldozer architecture, and they've done that in part because AMD feels confident that OpenCL, DirectCompute and other such technologies will leverage the FPU resources found in AMD's GPUs.
 

ElMoIsEviL

Distinguished


Length of the execution pipeline is mostly responsible for that discrepancy, based on my understanding of the current reviews. In other words... AMD lengthened Bulldozer's execution pipeline (adding more stages) in order to bolster the potential clock speeds. So Bulldozer does less work per MHz (per clock) than Sandy Bridge. They could have done this for several reasons... they haven't been exactly upfront as to why, it would seem, they've gone the NetBurst route.
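The trade-off can be sketched as throughput = work per clock x clock speed, so a chip that does less per clock has to run faster to break even. The IPC and clock numbers below are hypothetical, picked only to show the arithmetic, and are not measurements of Bulldozer or Sandy Bridge; the function names are likewise just for illustration.

```python
def throughput(ipc, clock_ghz):
    """Instructions per nanosecond: instructions per clock times GHz."""
    return ipc * clock_ghz


def clock_needed_to_match(ipc_slow, ipc_fast, clock_fast_ghz):
    """Clock the lower-IPC chip needs to equal the higher-IPC chip."""
    if ipc_slow <= 0:
        raise ValueError("ipc_slow must be positive")
    return clock_fast_ghz * ipc_fast / ipc_slow


# Hypothetical values, NOT real figures for either architecture.
deep_pipeline_ipc = 1.0
short_pipeline_ipc = 1.25
short_pipeline_clock = 3.2  # GHz

needed = clock_needed_to_match(deep_pipeline_ipc, short_pipeline_ipc,
                               short_pipeline_clock)

if __name__ == "__main__":
    print(f"Deep-pipeline chip needs about {needed:.2f} GHz to break even")
```

If the longer pipeline doesn't actually buy that much extra clock speed, the per-clock deficit shows up directly in the benchmarks.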

One possible explanation could be, looking at other "similar" architectures in terms of execution pipelining, the fact that Intel's P4 ran its ALUs (execution units) at a higher clock rate than the core itself (so that if the branch predictor returned a bad result it could be flushed and re-executed without too many issues).
AMD may have opted to do the same with Bulldozer. Given that the ALUs run at the same speed as the rest of the CPU... AMD could have had to rely on higher clock speeds on the whole (akin to how AMD GPUs run ALUs at the same core speed as the GPU whereas NVIDIA runs its SPUs at a higher rate than the core).

I couldn't tell you for sure exactly what is faulting Bulldozer, only that cache-wise, FPU-wise and even execution-unit-wise (ALU) it is inferior (per clock) to Sandy Bridge, even a lowly Core i5 2500K.
 
Well, I tend to go by what AMD calls it. They say 2 modules per core and 4 cores per chip... if that's how they want to advertise it, then so be it.
To be honest, I think it's just sales talk, because if they had sold it as an 8 core CPU they would have been hung in the press for underperforming on a single-thread basis. This way they can say "we only have 4 cores", which competes nicely with Intel, which it does if you make it compete with the first-gen Nehalem parts...
 