So my question is hard to explain, but... a dual-core i3 can match a quad-core Phenom II X4, right? What I can't understand is why AMD can't take advantage of their extra cores and push the work through them regardless of threading. I mean, since a single CPU core can process multiple instructions per clock, making it faster per core (Intel SB), isn't there a way to take, say, 2 AMD Phenom cores and make them work as 1 to get better performance when there is only 1 thread? Can't you push a single thread through multiple cores? Surely this is possible... is it? Can someone tell me why it hasn't been or can't be done?
The problem with threads is that there tend to be data dependencies. When a program is split into parallel threads at build time, the compiler (or the programmer) makes sure there are no data dependencies between the threads until the end of their execution. Once the program is compiled, there is no reasonably efficient way to check on the fly that there are no data dependencies. You would basically have to run it, confirm there are no dependencies, and then modify the program for future use, and you would also have to be sure that a logical branch wouldn't produce a data dependency either, so it's very difficult to do once code is compiled.
While much of the time you might get by fine just forcing things to run in parallel, the one time a data dependency gets executed out of order it might crash the program, or even worse, it might not crash and just keep going with wrong results.
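To make the dependency problem concrete, here's a rough Python sketch. It's my own toy example, not what a real CPU or compiler does, and the names (`step`, `run_serial`, `run_naive_split`, `squares_split`) and the seed value are made up for illustration. It compares a loop whose iterations are independent with one where every iteration needs the previous result, and shows what happens if you pretend a second core can start its half without waiting.

```python
def step(x):
    # Each new value depends on the previous one (a data dependency).
    return (x * 3 + 1) % 1000


def run_serial(seed, n):
    # Correct: every step sees the result of the step before it.
    x = seed
    for _ in range(n):
        x = step(x)
    return x


def run_naive_split(seed, n):
    # Pretend two cores each take half of the steps "in parallel".
    # The second core cannot know the first core's result yet, so in
    # this sketch it starts from the original seed: a wrong input.
    half = n // 2
    _first_half = run_serial(seed, half)    # result the second half needed
    second_half = run_serial(seed, n - half)
    return second_half


def squares_serial(data):
    # Independent work: no item needs any other item's result.
    return [x * x for x in data]


def squares_split(data):
    # Splitting independent work in two gives the same answer.
    mid = len(data) // 2
    return [x * x for x in data[:mid]] + [x * x for x in data[mid:]]


if __name__ == "__main__":
    print(run_serial(7, 4), run_naive_split(7, 4))
    print(squares_serial([1, 2, 3, 4]) == squares_split([1, 2, 3, 4]))
```

The squaring case splits cleanly, the chained one doesn't, and nothing crashes in either case. That's the "keeps on going" scenario: the naive split just quietly returns a different number.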
^^ Basically, compilers make certain assumptions about data dependencies, and those assumptions no longer hold once you start combining cores' resources.
Also, once a thread [a sequence of instructions] gets assigned to a core, it's up to the CPU to determine how to process it; the OS simply waits for the results. So we're looking at a purely hardware-level solution, which takes up die space that could otherwise be used to improve IPC. Hence, combining cores' resources is simply not an effective way to go about things.
At the end of the day, the serial portion of a program [memory/disk access] limits how effective parallelization can be [see Amdahl's law: http://en.wikipedia.org/wiki/Amdahl%27s_law]. That's why I've always viewed AMD's many-weak-cores approach as flawed: most non-benchmark programs are limited in their ability to make use of all those resources.
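For anyone who wants numbers, here's a quick Python sketch of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the program that can run in parallel and n is the core count. The function name and the 50% parallel fraction are just assumptions I picked for illustration, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction, cores):
    # Amdahl's law: the serial part always runs at 1x; only the
    # parallel part gets divided across the cores.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed example: half of the program can be parallelized.
    for cores in (1, 2, 4, 6, 8):
        print(cores, "cores:", round(amdahl_speedup(0.5, cores), 2), "x")
```

With half the work stuck in serial code, the speedup can never reach 2x no matter how many cores you add, which is why extra cores stop helping quickly.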