I was reading about how Intel is using internal routing on their 48-core CPU. Wouldn't the whole process be simplified by stripping down the execution pipeline and farming branch prediction out to additional cores? For that matter, why not stream them together like ATI does on their GPUs?
As for the CPU itself, it is actually based on Terascale, which was already extremely fast. With Terascale, Intel could control each core individually, so if you didn't need certain ones, each could be shut off or put into a low-power state.
And as for speed, Intel's 80-core Terascale CPU was pushing out about 1 TFLOPS at 2.5GHz using 62W under load, while ATI's HD4870 needed all 800 of its SP units and probably 3x the power draw to do the same.
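To put rough numbers on that comparison, here's a quick back-of-the-envelope sketch in Python. It only uses the figures from this post: about 1 TFLOPS at 62W for the Terascale chip, and my guess of "3x the power" for the HD4870 at the same throughput. The GPU numbers are an assumption, not a measurement, so treat the result as a ballpark.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Throughput per watt: higher means more work for the same power."""
    return gflops / watts

# Figures quoted in this post for the 80-core Terascale chip.
TERASCALE_GFLOPS = 1000.0   # about 1 TFLOPS
TERASCALE_WATTS = 62.0      # under load

# ASSUMPTION: same throughput at roughly 3x the power (a guess, not a spec).
GPU_GFLOPS = 1000.0
GPU_WATTS = 3 * TERASCALE_WATTS

terascale_eff = gflops_per_watt(TERASCALE_GFLOPS, TERASCALE_WATTS)
gpu_eff = gflops_per_watt(GPU_GFLOPS, GPU_WATTS)

print(f"Terascale: {terascale_eff:.1f} GFLOPS/W")
print(f"GPU (assumed): {gpu_eff:.1f} GFLOPS/W")
print(f"Ratio: {terascale_eff / gpu_eff:.1f}x")
```

With those inputs the ratio is 3x by construction, so the only really interesting number is the GFLOPS-per-watt figure for the Terascale chip itself.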
The router setup Intel has for that 48-core CPU is actually quite good and is probably what will go into Larrabee.
Remember back when the Pentium 4 had that REALLY long pipeline? It was really fast when the branch predictor guessed right, but on a miss it was like the computer stalled. I probably should go do some more reading, but I was thinking about having a virtual back door to each core that could render GUI actions in real time, using unused CPU cycles to do something that the user "might" do. How many things can you click on, on your desktop? Now imagine each core had already executed a huge amount of the code for those applications and cached it in a large cache that sits side by side with the existing cache, which then gets burst into RAM at full speed. I am just imagining something like this on a 1024-core CPU.... (Yeah, I know, out of reality, but someone's gotta dream)
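Just to make the idea concrete, here's a toy software version in Python. It is only a sketch of the "do the work before the click" idea using a thread pool in place of spare cores. Everything in it (`prepare_launch`, `SpeculativeDesktop`, the icon names) is made up for illustration and has nothing to do with how any real CPU or desktop works.

```python
from concurrent.futures import ThreadPoolExecutor

def prepare_launch(app: str) -> str:
    """Hypothetical stand-in for the expensive work an app does at startup."""
    return f"{app}: ready"

class SpeculativeDesktop:
    """Starts work for every icon the user *might* click, then keeps one."""

    def __init__(self, icons, workers=4):
        # The pool plays the role of the idle cores.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # Speculate: kick off the work for every clickable icon up front.
        self._pending = {
            icon: self._pool.submit(prepare_launch, icon) for icon in icons
        }

    def click(self, icon: str) -> str:
        # Take the speculative result for the icon that was actually clicked.
        result = self._pending.pop(icon).result()
        # Throw away the other guesses. cancel() only stops tasks that have
        # not started yet; finished ones are simply dropped.
        for other in self._pending.values():
            other.cancel()
        self._pending.clear()
        return result

    def close(self) -> None:
        self._pool.shutdown(wait=True)

desktop = SpeculativeDesktop(["browser", "mail", "editor"])
result = desktop.click("mail")
print(result)
desktop.close()
```

The catch is the same one the Pentium 4 had: every wrong guess is wasted work, so this only pays off if the cores really would have been idle otherwise.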