AnandTech has a great summary of AMD's plans for the near future.
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2768&p=1
Specifically, I want to draw attention to the details on the upcoming K8L architecture.
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2768&p=3
First of all, the claims about the doubling of floating-point and SSE resources are correct. However, it isn't exactly what it appears to be. My understanding of the K8 design philosophy was that every execution unit would be given its own port so that there could not be any conflicts. In this case, AMD has shied away from that approach and has decided to have 2 FPUs and 2 SSE units share 2 ports, which is a more Intel-like approach. This should mean that the potential of these additional units isn't as great (i.e. less than a 100% performance increase), but some of that is offset by single-cycle 128-bit operation, assuming the code takes advantage of it. I'd be interested to know whether the FPU and SSE units that share a port also share any resources that may impair their ability to operate in parallel.
Also, at first I was impressed by the marketing of the dual 128-bit loads per cycle, but it appears that this is just a euphemism for widening the data buses to 256 bits. This will bring K8L in line with Intel designs, which have used 256-bit buses for quite some time. Still, this should lead to a much-needed increase in bandwidth to supply this enhanced core.
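To put rough numbers on the port-sharing point, here is a toy peak-throughput model. It is only a sketch: the function name, the unit and port counts for each configuration, and the assumption that a 64-bit-wide unit needs two passes for a 128-bit operation are my own simplifications for illustration, not AMD figures.

```python
import math

def peak_128bit_ops_per_cycle(units, ports, unit_width_bits):
    """Toy upper bound on 128-bit SSE ops completed per cycle.

    Assumes (hypothetically) that each port issues one op per cycle to a
    unit behind it, and that a unit narrower than 128 bits needs several
    passes to finish one op.
    """
    passes_per_op = math.ceil(128 / unit_width_bits)
    issue_limit = min(units, ports)  # units beyond the port count cannot be fed
    return issue_limit / passes_per_op

# Hypothetical configurations, chosen only to illustrate the argument.
k8_like = peak_128bit_ops_per_cycle(units=2, ports=2, unit_width_bits=64)
k8l_shared = peak_128bit_ops_per_cycle(units=4, ports=2, unit_width_bits=128)
k8l_dedicated = peak_128bit_ops_per_cycle(units=4, ports=4, unit_width_bits=128)

print(k8_like, k8l_shared, k8l_dedicated)  # 1.0 2.0 4.0
```

In this toy model the shared-port configuration peaks at the same rate that two wide units alone would reach, so the gain comes from the 128-bit width rather than from the extra units. Real scheduling is far more involved, so treat it as an upper-bound intuition only.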
http://www.realworldtech.com/page.cfm?ArticleID=RWT060206035626
Integer resources have not been increased, so performance improvements there will have to come from other areas, such as the better caches and improved out-of-order operation. On the latter, what AMD is adding is the ability for loads to pass previous loads, something Intel has had since the P6 and Pentium M. K8L will not be able to let loads pass previous stores as Core can. Additionally, it is easy to deduce from the information about the load/store units that the bus between the L1 and L2 caches has been widened to 256 bits.
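The difference between the two reordering rules can be sketched with a toy model. The function name and the boolean policy flag are hypothetical, and the model ignores address checks entirely: it assumes a load that is allowed to pass a store never aliases it, which real hardware has to verify or predict.

```python
def earliest_slot(ops, i, loads_pass_stores):
    """Return the earliest program-order position the load at index i can move to.

    ops is a program-order list of "load" / "store" strings.
    Toy rules: a load may always pass older loads; it may pass older
    stores only when loads_pass_stores is True.
    """
    if ops[i] != "load":
        raise ValueError("op at index i must be a load")
    slot = i
    while slot > 0:
        older = ops[slot - 1]
        if older == "store" and not loads_pass_stores:
            break  # blocked behind the older store
        slot -= 1
    return slot

program = ["load", "store", "load", "load"]

k8l_like = earliest_slot(program, 3, loads_pass_stores=False)
core_like = earliest_slot(program, 3, loads_pass_stores=True)

print(k8l_like, core_like)  # 2 0
```

Under the K8L-like rule the last load can only jump ahead of the load in front of it and then stalls behind the store; under the Core-like rule it can move all the way to the front.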
One of my concerns about the way AMD is implementing its quad core is the size of the caches. I know AMD is proud that its architecture doesn't need to rely on the "brute force" approach of large caches, but that still isn't a reason to decrease the cache size. The quad cores will only have 512kB of L2 cache per core. Even on AMD's architecture, going to 1MB of L2 cache does make a performance difference, which is why all the Opterons have it. Now AMD is expanding its cores, which need more bandwidth at lower latency to keep them fed, and yet it is decreasing the L2 cache size. There is an L3 cache, but it's only 2MB, and 512kB of L2 cache + 512kB of higher-latency L3 cache per core is not equivalent to 1MB of L2 cache. It's great that going to DDR2 doubles the available bandwidth from memory, but it'd be better to avoid the RAM subsystem as much as possible to begin with. I think the reduced cache size is a consequence of AMD projecting that it may be manufacturing constrained. This isn't necessarily a bad thing, since it means they are selling chips as fast as they can make them, but it means they have to sacrifice cache to save transistors and improve yields.
The load/store units also have somewhat more flexible execution; they can re-order loads with respect to other loads (although loads cannot move around stores).
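Going back to the cache argument, a back-of-the-envelope average access time calculation shows why 512kB of L2 plus a 512kB share of L3 is not the same as 1MB of L2. Every latency and hit rate below is a made-up placeholder chosen for illustration, not a measured or published figure; only the 2MB L3 and the four cores come from the discussion above.

```python
def avg_cycles_after_l1_miss(l2_hit_rate, l2_latency, mem_latency,
                             l3_hit_rate=None, l3_latency=None):
    """Average cycles to service an access that has already missed L1.

    If l3_hit_rate is None, L2 misses go straight to memory.
    l3_hit_rate is the fraction of L2 misses that hit in L3.
    """
    l2_miss = 1.0 - l2_hit_rate
    if l3_hit_rate is None:
        return l2_latency + l2_miss * mem_latency
    l3_miss = 1.0 - l3_hit_rate
    return l2_latency + l2_miss * (l3_latency + l3_miss * mem_latency)

# Per-core share of the shared L3: 2MB across 4 cores.
l3_share_kb = 2 * 1024 // 4  # 512

# Hypothetical numbers: the 512kB L2 + L3 share catches the same 90% of
# accesses overall as the 1MB L2, but a third of them pay L3 latency.
big_l2 = avg_cycles_after_l1_miss(l2_hit_rate=0.90, l2_latency=12, mem_latency=200)
small_l2_plus_l3 = avg_cycles_after_l1_miss(l2_hit_rate=0.85, l2_latency=12,
                                            mem_latency=200,
                                            l3_hit_rate=1 / 3, l3_latency=40)

print(l3_share_kb)                 # 512
print(f"{big_l2:.1f}")             # 32.0
print(f"{small_l2_plus_l3:.1f}")   # 38.0
```

Even with the same overall hit rate, the split hierarchy comes out slower in this sketch because every L2 miss pays the L3 lookup before it can go to memory. The real gap depends entirely on the actual latencies and workloads.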
Overall, K8L is a great improvement over K8, but I really don't see it completely dominating Core. We'll have to see who's better (probably not by a significant margin) closer to launch, since clock speeds and other factors of course come into play. In platform terms though, AMD's HyperTransport approach is really blooming, with an additional link per Opteron, unganging ability and the HTX slot.