Heh, IIRC the 'multiple cores working on a thread' idea is the old "reverse hyperthreading" scenario, sorta like macro-op fusion but spread across cores instead of happening inside a single core. I think the hardware requirements would be extreme.
I believe BD will be the first AMD 4-issue core, comparable to Intel's. K8 through the current Phenoms are 3-issue cores, which is one of the reasons K8 and later have lower IPC than Intel's recent cores. However, I'm not sure whether the 4 decoders are per-core or per-module on BD.

IIRC a BD module is 2 integer pipes with a shared FP unit. Since these are complete integer pipes, AMD calls them cores, I think. In contrast, Intel's core is an integer pipe plus an FP unit plus extra hardware that lets a second thread use whatever execution resources the current thread leaves idle (hyperthreading). Intel's philosophy is that for lightly loaded threads (e.g., <70% of clock cycles used), it makes sense to run another thread alongside, to keep the core working as near 100% capacity as possible. AMD's BD takes this one step further by making the second integer pipe much more complete, so the second thread gets an integer pipe of its own rather than just the leftover cycles.
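
To put rough numbers on the hyperthreading vs. BD module comparison, here's a toy sketch. Everything in it is an assumption for illustration: the function names are made up, the percentages are not measurements, and it only models the integer pipes (it ignores the shared FP unit, the front end, caches, and so on).

```python
# Toy model only: names and percentages are illustrative assumptions,
# not measurements or vendor terminology.

def shared_pipe_busy(thread_a_pct: int, thread_b_pct: int) -> int:
    """Hyperthreading-style: both threads share one integer pipe.

    The second thread can only fill cycles the first leaves idle,
    so the pipe saturates at 100% of one pipe's capacity.
    """
    return min(100, thread_a_pct + thread_b_pct)


def dedicated_pipes_busy(thread_a_pct: int, thread_b_pct: int) -> int:
    """BD-module-style: each thread gets its own integer pipe.

    Integer work simply adds up (in percent of ONE pipe's capacity).
    Contention for the shared FP unit and front end is ignored here.
    """
    return thread_a_pct + thread_b_pct


if __name__ == "__main__":
    # (demand of thread A, demand of thread B), each as % of one pipe
    for a, b in [(70, 30), (70, 70)]:
        print(a, b, shared_pipe_busy(a, b), dedicated_pipes_busy(a, b))
```

With a 70% thread and a 30% thread, both approaches get the same work done. With two 70% threads, the shared pipe tops out at one pipe's worth while the two dedicated pipes can, in this simplified picture, serve both threads fully. That's the "more complete second integer pipe" advantage, at the cost of extra hardware.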