There is no information from reliable sources about the next AMD architecture, but there are rumors that it will be an improved version of the old K8. Some of the K8 design engineers have said that this architecture can still be improved and that it has promising room to evolve. I have been thinking about it, but I don't have any new ideas that haven't been mentioned before. So I was wondering about some weak points of the K8, how they could be improved, and how that would affect K8 performance.
The new DDR2 IMC on the pre-release Socket AM2 (sAM2) K8 proved ineffective because of the high latency of DDR2 memory modules.
The single-core K8s with the 128-bit DDR IMC are not starved for bandwidth, but I think that is not the case for the dual cores. I guess they would benefit from more bandwidth, but only if memory access latency stays as low as it is with DDR. Anyway, the ratio of L2 cache frequency to RAM frequency is much better with DDR2-800 than with DDR-400, but the higher latency penalizes the double-clocked DDR2, and the overall performance remains almost the same.
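To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch. The CAS timings (CL2 for DDR-400, CL5 for DDR2-800) and the 2400 MHz core clock are my own assumed example values, not measurements, and the function names are made up for illustration. It only looks at peak bandwidth and CAS latency, so it ignores everything else in the memory path.

```python
def peak_bandwidth_gb_s(transfers_mt_s, bus_width_bits):
    """Theoretical peak bandwidth in GB/s for a given transfer rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


def cas_latency_ns(cas_cycles, transfers_mt_s):
    """CAS latency in nanoseconds; the I/O clock is half the DDR transfer rate."""
    io_clock_mhz = transfers_mt_s / 2
    return cas_cycles * 1000 / io_clock_mhz


def core_to_ram_clock_ratio(core_mhz, transfers_mt_s):
    """Ratio of core (and L2) clock to the memory I/O clock."""
    return core_mhz / (transfers_mt_s / 2)


# Assumed example configurations (hypothetical timings, 128-bit dual-channel bus).
CORE_MHZ = 2400  # assumed core clock; the K8 L2 runs at core speed
configs = {
    "DDR-400 CL2": {"rate": 400, "cl": 2},
    "DDR2-800 CL5": {"rate": 800, "cl": 5},
}

for name, cfg in configs.items():
    bw = peak_bandwidth_gb_s(cfg["rate"], 128)
    lat = cas_latency_ns(cfg["cl"], cfg["rate"])
    ratio = core_to_ram_clock_ratio(CORE_MHZ, cfg["rate"])
    print(f"{name}: {bw:.1f} GB/s peak, CAS {lat:.1f} ns, core/RAM clock ratio {ratio:.0f}")
```

With these assumed timings, the DDR2 setup doubles the peak bandwidth and halves the clock ratio, but the CAS latency in nanoseconds goes up rather than down, which fits the "almost the same overall performance" observation.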
I was thinking that a load/store reorder scheduler (like the Smart Memory Access feature on the upcoming Intel Core architecture chips) would improve the efficiency of the DDR2 IMC on the K8. Maybe there will be lower-latency DDR2 modules once DRAM manufacturers start producing 65nm DRAM cells, but I think the scheduler would improve performance in that case too (unlike the Core architecture chips, which access memory via the FSB and the northbridge and gain almost nothing from lower-latency DDR2).
A shared L3 cache built with Z-RAM is another rumor that seems possible. If this happens, then a large L3 will boost K8L performance in cache-sensitive apps, but that is not the spirit of the K8. K8 chips are very fast for multimedia and gaming because this kind of software does not need a large on-chip cache, but needs faster memory access and more memory bandwidth. With the scheduler I am thinking about and the large shared L3, it would be a data-streaming monster.
There are rumors that the improved K8 (K8L or K10) will have wider-issue superscalar cores. That means reworking almost the whole K8 architecture: a new fetch unit, decoder, branch predictor, wider on-chip buses, and so on. I guess this will help too, but I am not sure how effective the improvement will be considering the number of extra transistors involved.
If the K8L (or K10) is wider-issue, then there will be more execution units. I guess they will be 128-bit, so a 128-bit SIMD instruction could be completed each cycle. I also wonder how reducing the number of FP execution stages would affect the K8's FP performance.
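A rough way to see what 128-bit units would buy is to count peak packed single-precision operations per cycle. This is only a sketch under simplifying assumptions of mine: two FP arithmetic pipes (add and multiply), a 128-bit SSE operation split into passes when the unit is narrower than 128 bits, and no other bottlenecks. The function names are hypothetical.

```python
import math


def passes_per_simd_op(simd_width_bits, unit_width_bits):
    """Number of passes a unit needs to process one full SIMD operation."""
    return math.ceil(simd_width_bits / unit_width_bits)


def peak_flops_per_cycle(simd_width_bits, unit_width_bits, element_bits, num_pipes):
    """Peak element operations per cycle across all pipes (idealized, no stalls)."""
    elements_per_op = simd_width_bits // element_bits
    passes = passes_per_simd_op(simd_width_bits, unit_width_bits)
    return num_pipes * elements_per_op / passes


# Assumed: 2 arithmetic pipes, 128-bit SSE ops on 32-bit floats.
current = peak_flops_per_cycle(128, 64, 32, num_pipes=2)   # 64-bit wide units
widened = peak_flops_per_cycle(128, 128, 32, num_pipes=2)  # hypothetical 128-bit units

print(f"64-bit units:  {current:.0f} SP FLOPs/cycle")
print(f"128-bit units: {widened:.0f} SP FLOPs/cycle")
```

Under these assumptions the peak doubles, but real code would only see that if fetch, decode, and the load/store path can keep the wider units fed.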
So, what do you think about all this? Is my reasoning right?
And what else might boost the current K8's performance?