Pulssqt :
Hello.
I'll just type what I think I know, then you can correct me if I'm wrong.
So basically, it's an 8-core processor.
It has 4 modules, each containing 2 cores that share some resources, including the FPU and L2 cache.
Each module has a 256-bit FPU, which can also act as 2x128-bit FPUs if needed, so each core can do its own job.
Each module has decoders, fetch, EUs, and an I/O pipeline; the Piledriver cores share the decoders and fetch.
Now, the thing I don't understand:
When can they and can't they use 4 or 8 cores? If they share the FPU, does that mean they can constantly use all 8 cores normally, since each has a 128-bit FPU?
Or, when one core is working, is the other one 'waiting'? But why is there 2x128-bit then? Is it because the 8 cores can sometimes do the work, and sometimes they need to do it with 4?
All x86 CPUs from AMD and Intel use a compatible logical processor interface. The physical arrangement of the resources inside of the CPU package may affect performance, but they do not affect capability.
Understanding how this affects the FPU requires knowledge of how the FPU used in Intel's and AMD's microprocessors works.
In the early 1990s the FPU was a coprocessor that performed scalar floating point arithmetic. Intel integrated the FPU into the CPU starting with the 80486. AMD of course followed suit. This operational stack is called x87 and uses its own set of 8x80-bit CPU registers in addition to the original 8x32-bit general purpose registers.
In the mid to late 1990s Intel began adding additional instruction sets to extend the arithmetic capabilities of their microarchitecture. The first of these was called MMX (it's not an acronym). MMX shares the same CPU registers as the x87 floating point operations but enables vector integer math. Vector operations are not possible on the general purpose registers. Vector operations enable data-level parallelism by performing the same operation on multiple sets of data. For example, a general purpose CPU register on a Pentium MMX can be treated as an 8-bit value, a 16-bit value, or a 32-bit value. A 64-bit MMX register (sharing the same physical location as an 80-bit x87 register) can be treated as a pair of 32-bit values, four 16-bit values, or eight 8-bit values. This enables acceleration of certain mathematical formulas that are highly data parallel, such as matrix multiplication. The CPU can perform up to eight arithmetic operations at once.
Next up is the SSE stack. SSE (Streaming SIMD Extensions) was designed to integrate and replace x87 and MMX. SSE debuted with 8x128-bit registers, and successive revisions (SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2) have added new instructions. Right now the SSE stack is responsible for vector integer arithmetic, vector logicals, scalar floating point arithmetic, and vector floating point arithmetic. The SSE stack was extended to 16x128-bit registers for use in 64-bit long mode.
The newest revision is AVX (Advanced Vector Extensions), which extends the SSE stack to 256-bit registers. AVX adds 256-bit vector floating point operations, and AVX2 adds 256-bit vector integer operations. AVX-512 further extends the registers to 512 bits, but right now AVX-512 is only used on Intel's Xeon Phi coprocessors.
AMD's implementation of the floating point hardware on their FX series microprocessors is split. The unit may operate on separate SSE instructions from both of the frontends to which it is coupled at the same time, or it may operate on AVX instructions from one of the frontends to which it is coupled. If both frontends issue AVX instructions, one of them will have to wait in the reservation station temporarily. In other words: with 128-bit SSE code, both cores in a module can use the FPU simultaneously, so all 8 cores stay busy; with 256-bit AVX code, the two cores in a module take turns on the single shared unit.