I decided to post this in a separate thread so as not to go too far off topic in the thread where we have been discussing this issue.
First of all, I will explain my system setup and my analysis method. My system is an S939 X2 based system. I reset the bus speed to 200MHz for this experiment, which brought my processor back to its stock speed of 2.2GHz. Since we are looking at the performance of the HT bus, it's also important for me to point out the devices that will be accessed over this bus:
Radeon X1950 Pro 256MB, PCIe 16x
2xSATA150 HDs in RAID 1
PCI Wireless Card
I decided to use two applications to run these tests. Sisoft Sandra was chosen because it gives a detailed analysis of individual components, and is often much more sensitive to changes than real world benchmarks. Its weakness, however, is that it tests components in isolation, making it difficult to saturate the HT bus. Because of this, I also used 3DMark06. This seemed like a good choice because it stresses multiple components simultaneously, and because gamers are the most likely to be affected by lower HT speeds (the graphics card being the most bandwidth heavy device using the bus).
I started out with the default multiplier of 5x, giving an HT speed of 1000MHz. I then dropped down to 2x and 1x (400MHz and 200MHz respectively), keeping all other settings constant.
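For reference, the HT speeds tested follow directly from the 200MHz bus speed multiplied by the HT multiplier. A quick Python sketch of that arithmetic (the names are made up purely for illustration):

```python
# HT link clock = bus (reference) clock x HT multiplier.
# The values are the ones used in this post; the names are illustrative only.
BUS_MHZ = 200
DEFAULT_MULTIPLIER = 5


def ht_clock_mhz(multiplier, bus_mhz=BUS_MHZ):
    """Return the HT link clock in MHz for a given HT multiplier."""
    return bus_mhz * multiplier


def fraction_of_default(multiplier):
    """Return the HT clock as a fraction of the default 5x setting."""
    return multiplier / DEFAULT_MULTIPLIER


for m in (5, 2, 1):
    print(f"{m}x: {ht_clock_mhz(m)}MHz ({fraction_of_default(m):.0%} of default)")
```

So the 2x setting runs the HT bus at 40% of its default speed, and 1x at 20%.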
The Sisoft Sandra benchmarks I ran showed no statistically significant drop in performance at lower HT speeds. This is perhaps understandable because Sandra tests components in isolation. As such, I will not show these results here. The 3DMark06 results were more interesting:
2x: 4693 (1.2% lower than 5x)
1x: 4462 (6.1% lower than 5x)
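The 5x score itself is not listed above, but it can be roughly back-calculated from each score and its percentage drop. A small Python sketch (the function name is illustrative, and the result is only an estimate because the percentages are rounded):

```python
def implied_baseline(score, pct_change):
    """Back-calculate the baseline score from a score and its % change.

    pct_change is negative for a drop, e.g. -1.2 for a 1.2% loss.
    """
    return score / (1 + pct_change / 100)


# Scores and percentage changes as quoted in this post.
results = {"2x": (4693, -1.2), "1x": (4462, -6.1)}

for label, (score, pct) in results.items():
    estimate = implied_baseline(score, pct)
    print(f"{label}: implied 5x score of roughly {estimate:.0f}")
```

Both rows should point to about the same baseline; a large disagreement between them would suggest a typo in one of the figures.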
What this shows is that, on my system, you have to reduce the HT bus to 40% of its normal speed before any performance loss is observed. Lowering it below this results in a significant loss in performance.
If we assume, therefore, that my system is utilising approximately 40% of its available bandwidth, then in the worst case an equivalent system with a K8 quad core would utilise 80% of the available bandwidth. Since this still leaves 20% headroom, I would have to be running a quad core CPU at above 2.64GHz (2.2 x 1.2) on my system, in the worst case, before I saw any significant loss in performance.
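As a rough sketch of the arithmetic above in Python (all names are made up for illustration). The last function is an extra assumption not used elsewhere in this post: if HT traffic scaled strictly linearly with clock speed, 80% utilisation would reach 100% at 2.2 / 0.8 rather than 2.2 x 1.2, so the 2.64GHz figure is slightly on the conservative side.

```python
STOCK_GHZ = 2.2                # stock clock of the dual core tested
DUAL_CORE_UTILISATION = 0.40   # estimated from the 2x multiplier result


def quad_core_utilisation(dual_util=DUAL_CORE_UTILISATION):
    """Worst case: twice the cores generate twice the HT traffic."""
    return 2 * dual_util


def post_estimate_ghz(stock_ghz=STOCK_GHZ):
    """Estimate used in this post: stock clock plus the remaining headroom."""
    headroom = 1 - quad_core_utilisation()
    return stock_ghz * (1 + headroom)


def linear_saturation_ghz(stock_ghz=STOCK_GHZ):
    """Assumption: traffic scales linearly with clock, so the bus
    saturates when utilisation * (clock / stock clock) reaches 1."""
    return stock_ghz / quad_core_utilisation()


print(f"Worst-case quad core utilisation: {quad_core_utilisation():.0%}")
print(f"Estimate used in the post: {post_estimate_ghz():.2f}GHz")
print(f"Strict linear scaling: {linear_saturation_ghz():.2f}GHz")
```

Either way the saturation point lands in the 2.6GHz to 2.8GHz range, so the conclusion is not very sensitive to which of the two is used.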
There are some caveats with these results when translating the performance across to K8L. Firstly, it is possible that the use of DDR2 memory would affect these results. I think this is unlikely, however, since memory access is independent of the HT bus on single socket systems. More significantly, more demanding graphics cards than mine would place more strain on the PCIe bus. Therefore, a Crossfire or SLI system may well saturate the bus faster.
What this does show is that verndewd may indeed have a point that K8L quad cores may suffer a substantial performance hit on AM2. On my system, given that I would start to see performance loss on a 2.64GHz quad core, it is reasonable to assume that a 2.9GHz K8L would also show a loss, even if K8L has similar I/O characteristics to K8. This loss would be greater with more intensive graphics setups than mine.
In light of this evidence, I now agree with verndewd that quad core K8L may well have some significant performance limitations on AM2, in games. Further tests would have to be run to see if this also holds for other applications. I would hypothesise however, that dual core K8L based systems will not suffer the same problem.
I'm only extrapolating based upon what I've observed on my own machine. Bear in mind that my calculations assumed the worst case - that is, twice as many cores will result in twice as many I/O operations. In reality, this may not be the case.
Intel and AMD's architectures are very different, and as I'm not an expert I'm not really in a position to comment on the differences between them. I do seem to remember reading on one of AMD's presentation slides though that HT1 has about equivalent bandwidth (but lower latency) to Intel's current FSB. HT3 will pull ahead of this again. If this is true then it isn't unreasonable to assume that FSB1066 is at the very edge of what it can handle with quad core, and if K8L has a higher IPC than C2, then that may push it over.
Also remember that at stock, C2Q runs at 2.66GHz (about the point at which performance would suffer with a quad core K8, I believe). I haven't seen any benchmarks that have increased the multiplier when overclocking past this, so it's hard to say what effect there would be beyond this point. If the performance increase from raising the multiplier is not linear, then that would show that the FSB is indeed bottlenecking the chip.
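A rough sketch of how that linearity check could be done in Python, given benchmark scores at two multipliers with the FSB held constant. The function name is my own, and the numbers in the usage line are hypothetical placeholders, not real benchmark results.

```python
def scaling_efficiency(clock_a, score_a, clock_b, score_b):
    """Ratio of relative score gain to relative clock gain.

    1.0 means the score scaled perfectly linearly with clock speed.
    A value well below 1.0 suggests something other than the CPU
    clock (possibly the FSB) is holding performance back.
    """
    clock_gain = clock_b / clock_a - 1
    score_gain = score_b / score_a - 1
    return score_gain / clock_gain


# Hypothetical numbers, for illustration only (not measurements).
print(f"{scaling_efficiency(2.66, 1000, 2.93, 1050):.2f}")
```

Note that a low value on its own only shows that scaling is sub-linear. To pin it on the FSB you would want to repeat the run with a faster FSB and see whether the efficiency recovers.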