The thing is, AMD decided to pair two CPU cores together into what is known as a module. To reduce die size, they also made some components shared between the two cores. Each module still has two separate integer cores, but they now share the front end and the FPU.
Here's a BD/PD (Bulldozer/Piledriver) block diagram:
As for the FPU, it actually consists of two 128-bit FPUs, but it's still called a single FPU because both are fed by a single FPU scheduler. If either core needs to execute a 256-bit FPU instruction, the two 128-bit FPUs can combine and operate on it together. A very smart idea actually, given that 256-bit FPU operations are encountered very rarely.
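To make the sharing idea concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: every FP op takes exactly one cycle, ops issue strictly in order, and the function name cycles_to_drain is made up for illustration. Real hardware is out of order and has multi-cycle latencies, so treat this as a picture of the idea, not of the actual scheduler.

```python
from collections import deque


def cycles_to_drain(op_widths):
    """Count cycles a toy shared FPU needs to issue a stream of FP ops.

    op_widths: iterable of 128 or 256 (bit width of each op), coming
    from either core in the module.
    Assumptions (not from any AMD document): two 128-bit pipes,
    one cycle per op, strictly in-order issue.
    """
    queue = deque(op_widths)
    cycles = 0
    while queue:
        first = queue.popleft()
        if first not in (128, 256):
            raise ValueError(f"unsupported op width: {first}")
        cycles += 1
        # A 128-bit op uses one pipe, so a second 128-bit op can
        # issue on the other pipe in the same cycle.
        if first == 128 and queue and queue[0] == 128:
            queue.popleft()
        # A 256-bit op gangs both pipes together, so nothing else
        # issues this cycle.
    return cycles


if __name__ == "__main__":
    print(cycles_to_drain([128, 128, 128, 128]))  # four narrow ops
    print(cycles_to_drain([256, 256]))            # two wide ops
```

In this model four 128-bit ops and two 256-bit ops both move 512 bits of work in the same number of cycles, which is the point of letting the two halves combine.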
On the other hand, sharing the front end has the biggest impact on per-core performance. AMD plans to address this in the Steamroller module, which will probably be released in a year's time or less, by giving each core its own instruction decoder. Observe the difference here:
Intel's Hyper-Threading, meanwhile, looks similar but follows quite a different approach. Each core can support 2 threads, but it is only one core in the end. If 2 threads run on a hyperthreaded core, they have to share that core's execution resources, competing for the same units. This differs from AMD modules, where each integer core has its own execution units and can process a different instruction simultaneously.
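The difference can be sketched with a very crude per-cycle model. Everything here is hypothetical: the function names, the idea of a flat "units" count, and the numbers in the usage lines are made up for illustration and are not the real issue widths of any AMD or Intel chip.

```python
def ops_per_cycle_module(demand_a, demand_b, units_per_core):
    """Integer ops issued per cycle by two threads on one module.

    Each thread runs on its own integer core with its own units,
    so the two threads never contend with each other here.
    """
    return min(demand_a, units_per_core) + min(demand_b, units_per_core)


def ops_per_cycle_smt(demand_a, demand_b, shared_units):
    """Integer ops issued per cycle by two threads on one SMT core.

    Both threads draw from a single pool of execution units.
    """
    return min(demand_a + demand_b, shared_units)


if __name__ == "__main__":
    # Hypothetical numbers: each thread wants 2 ops per cycle.
    print(ops_per_cycle_module(2, 2, units_per_core=2))
    print(ops_per_cycle_smt(2, 2, shared_units=3))
```

The model deliberately ignores the shared front end and FPU of a module, so it only shows the integer side of the argument.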
PS: This topic has been a source of very heated arguments in the past.
If you ask me, AMD isn't lying, and each module is 2 cores. Technically speaking, the original 8086 'core' never even had an FPU in the first place! Some people prefer to believe differently, however.