In essence, that's what their 4x4 platform is. But doing what you're thinking of isn't a viable option.
Intel's dual-die chips work by having both dies talk to one FSB. Their 2-way workstation and server setups have worked this way for years, so adapting it to a situation where both chips sit in the same socket wasn't too big of an investment.
AMD's 2-way setups took a different approach. Instead of sharing one FSB, each chip had its own, giving both CPUs dedicated bandwidth. This approach was carried over to the X2, the primary difference being that the FSB now only covers a few mm instead of cm. In the K8, all I/O activity is handled by the half of the northbridge that has been integrated (HTT and the IMC), which allows for vastly greater FSB speeds and internal bandwidth. In order to place two K8 dies in one socket, one die would have to be connected to the other via HTT and have no direct memory access, placing the load of 4 cores on one memory controller. This would have a substantial impact on the performance of those 2 cores, making the entire venture relatively pointless.
At the end of the day, both Kentsfield and AMD's 4x4 platform will be very expensive and have no impact on the mainstream. Don't expect this to change until the second half of next year.
Pure speculation here... but simply building a processor out of essentially two dual-core dies on one physical chip translates to twice the heat and twice the power consumption. That alone is unacceptable, never mind that a shared cache requires one die, not two. This is why Intel and AMD have to engineer native quad-cores first before releasing them... anything less than that would be a hack job and of poor quality. This is, of course, in response to the previously mentioned idea of gluing two dies together to make a quad core. Not feasible nor profitable in the least.
Unfortunately it is a bit more complex than that. Quad cores will cause many a bottleneck... starting with the FSB. Dual-core processors were successful because, as things stood, they did not saturate the front side bus (which on AMD procs is, and remains, 200 MHz) except at the very high end, where it becomes a bottleneck (which is why the performance gain above the FX-60 isn't much at all). With preliminary Intel quad-core benchmarks out on the net, the performance difference between a dual-core and a quad-core proc is less than 25% in most applications and less than 10% in others. The widely spread claim is that apps simply aren't optimized for "multiple cores", but it is more fundamental than that. Even should quad cores come to the market, they will not be truly effective until the FSB bandwidth increases by at least 1.5-fold (by that I mean a MINIMUM of 333 MHz actual speed, 667 MHz DDR), and they would be MUCH more effective at a true FSB rating of 400 MHz actual or 800 MHz DDR. Until then quad cores will not prove to be much of a performance benefit at all... at least not for the money they will cost.
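To put numbers on the clock targets named above, here is a quick sketch that just divides each target by the 200 MHz base quoted in the post. The function name and the dictionary are made up for illustration; nothing here is measured data.

```python
# Base clock and targets are the figures quoted in the post above.
BASE_MHZ = 200

# Hypothetical labels, chosen only for this illustration.
TARGETS_MHZ = {
    "333 MHz actual (667 DDR)": 333,
    "400 MHz actual (800 DDR)": 400,
}


def scale_factor(target_mhz, base_mhz=BASE_MHZ):
    """Return how many times faster the target clock is than the base."""
    return target_mhz / base_mhz


for label, mhz in TARGETS_MHZ.items():
    print(f"{label}: {scale_factor(mhz):.2f}x the {BASE_MHZ} MHz base")

# For reference, an exact 1.5-fold increase would land here:
print(f"1.5x of {BASE_MHZ} MHz = {1.5 * BASE_MHZ:.0f} MHz")
```

So the "minimum" target in the post is already a bit more than 1.5-fold, and the preferred target is a straight doubling.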
This is also why AMD is doing their 4x4 platform. Two independent HT buses and two independent memory banks mean two independent FSBs, so each one is less likely to bottleneck. This is where AMD figures its performance advantage will come from over a true quad core like what Intel is releasing. I am curious as to what the actual benchmarks will turn out to be.
I don't think it's an issue of being able to do it. They sort of painted themselves into a corner by mocking Intel's "glued" process and praising their own "native" technology. They would look rather stupid now if they resorted to their rival's tech... that they've been poo-pooing. :wink:
Putting two dual-core CPUs onto one socket does not necessarily double the heat output versus comparable single-core CPUs. Dual-cores are generally a little undervolted and run slower than their SC brethren. On the chips that are a single die, some parts that a single-core CPU has are only present once, such as the memory interface and sometimes even the L2 cache. Here's an example: an Athlon 64 3500+ Venice has 512KB of L2 cache, runs at 2.2 GHz, has a Vcore of 1.400 V, and is rated at 67W TDP. The Athlon 64 X2 4200+ also has 512KB of L2 cache per core and runs at 2.2 GHz, but has a Vcore of 1.350 V and a TDP of 89W. It's nowhere near double.
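For anyone who wants to check the arithmetic, here is a rough sketch using only the Vcore and TDP figures quoted above. The one assumption added is the usual rule of thumb that dynamic power scales with roughly the square of the voltage at a fixed clock; also keep in mind that TDP is a thermal design rating, not measured power draw, so this is a ballpark comparison at best.

```python
# Figures quoted in the post above; the dictionary layout is just for illustration.
single_core = {"name": "Athlon 64 3500+ (Venice)", "vcore": 1.400, "tdp_w": 67}
dual_core = {"name": "Athlon 64 X2 4200+", "vcore": 1.350, "tdp_w": 89}

# Rated TDP of the dual-core relative to the single-core.
tdp_ratio = dual_core["tdp_w"] / single_core["tdp_w"]

# Assumption (not from the post): dynamic power ~ V^2 at the same clock speed.
voltage_factor = (dual_core["vcore"] / single_core["vcore"]) ** 2

# Naive estimate if you simply doubled the cores at the lower voltage.
naive_two_core = 2 * voltage_factor

print(f"Rated TDP ratio (X2 / single): {tdp_ratio:.2f}x")
print(f"Per-core power factor from lower Vcore: {voltage_factor:.2f}")
print(f"Naive 2-core estimate at lower Vcore: {naive_two_core:.2f}x")
```

The rated ratio comes out to about 1.33x. The lower Vcore alone does not explain all of the gap from 2x, which fits the point about shared parts only being present once (and about TDP being a rating class rather than a measurement).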
As for a multi-chip module, Intel did just that with the Pentium D chips. They were simply two 1MB L2 cache Prescotts or two Cedar Mill P4s under one IHS, fitting in one LGA775 socket. They shared the FSB and ran very hot (although that's not the fault of the MCM, just that the NetBursts were furnaces). Surprisingly, they largely did not suffer from FSB choking, even though the P4s needed a ton of FSB bandwidth to run well. The early reviews of Kentsfield showed that it scales reasonably well even with its low-ish 1066 MHz FSB. Of course if I were Intel, I'd bump it up to 1333 in an eye blink, but it seems that the Core 2s aren't that starved for FSB bandwidth at the current clock speed. Well, at least when running apps that may or may not have been very memory-bandwidth-intensive. I bet that even when bandwidth-heavy apps are run, a 1333 FSB will be enough for speeds below 3 GHz. My comparison would be the Core 2 Duo notebook chips, which go along happily on a 667 MHz FSB in the low 2 GHz range. Not that great a comparison, but Kentsfield is not shipping yet. Time will tell, and I bet it will be okay. Expensive, yes, but it will probably work okay.
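Here is a back-of-the-envelope sketch of that notebook comparison. It assumes the FSB is a 64-bit (8-byte) data bus and treats the rated "MHz" figure as millions of transfers per second, so peak bandwidth is simply rate times 8 bytes. The helper names are made up for illustration, and splitting the bus evenly across cores is a simplification; real workloads never divide it that cleanly.

```python
# Assumption: 64-bit data bus, so 8 bytes move per transfer.
FSB_BUS_WIDTH_BYTES = 8


def fsb_peak_mb_s(rated_mt_s):
    """Theoretical peak FSB bandwidth in MB/s for a rated transfer rate."""
    return rated_mt_s * FSB_BUS_WIDTH_BYTES


def per_core_share_mb_s(rated_mt_s, cores):
    """Even split of peak FSB bandwidth across all cores on the bus."""
    return fsb_peak_mb_s(rated_mt_s) / cores


# (label, rated FSB, cores sharing it); FSB ratings are the ones from the post.
configs = [
    ("Core 2 Duo notebook, 667 FSB", 667, 2),
    ("Kentsfield, 1066 FSB", 1066, 4),
    ("Kentsfield, hypothetical 1333 FSB", 1333, 4),
]

for label, rate, cores in configs:
    peak = fsb_peak_mb_s(rate)
    share = per_core_share_mb_s(rate, cores)
    print(f"{label}: {peak} MB/s peak, {share:.0f} MB/s per core")
```

Under these assumptions a quad on a 1333 FSB gets almost exactly the same per-core share as the notebook dual-cores on 667, while the 1066 part gets about 20% less per core.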
Now put 8 cores on one die and I bet you will see FSB choking become a bigger issue, as supposedly the FSB speed is tapped out at about 400-ish MHz. This is where I see Intel dropping the FSB or running dual FSBs to the socket. But they are putting an IMC into their long-term plans, so I have to believe that they are at least thinking about it.
Intel's MCM chips sucked because they were based on NetBurst. If they had put two Pentium Ms in an MCM and sold it for the desktop, that would likely have performed well, and much closer to the Core 2 than the Pentium Ds did. If they had made a single-die NetBurst chip, it likely would have sucked just a tad less than the MCM versions because of the large shared L2 cache.
The packaging doesn't seem to make that big of a difference. I betcha that the "glued" quad-cores (Kentsfield) won't perform that much worse than a native quad-core of a similar arch (Penryn). You can do neat things on a single-die chip, like large L3s and unified L2s, that you can't do with an MCM, but that bump in performance is not all that huge.