AMD Could Follow Intel with Variation

Bache

Distinguished
Dec 3, 2006
344
0
18,780
What's stopping AMD from producing CPUs similar to Intel's build, but with variation?

I mean, since Intel has hit on a great CPU build with their C2Ds, it can and always will be improved on.

Surely AMD can follow the same build line/method?

Just add new improvements, e.g. an onboard memory controller, etc.

It might be that Intel's build direction is simply the best one.

AMD's path could lead nowhere, with no future in the long run.

Intel, with their C2Ds etc., might have "stumbled on the best future path".
 

epsilon84

Distinguished
Oct 24, 2006
1,689
0
19,780
I think the biggest issue with AMD performance right now is that it's still a 3-issue design trying to compete with a 4-issue design in Core 2. The IMC helps bridge the gap somewhat, but it's not enough, and what will AMD do once that advantage disappears with Nehalem?

I think AMD have some tougher times coming up. For the next few years they will have to remain the budget alternative to Intel and compete on pricing rather than performance. Their platforms are top notch, with the 780G gaining wide praise, and their GPUs aren't doing too badly either.

They're missing one third of the puzzle, that being a competitive CPU, but hopefully their platform and GPU sales will make them enough money to spend on R&D so that they can devote resources towards building the next 'Intel killer'. As improbable a goal as that may seem at this time, it's still better to aim high than die wondering, right?
 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980


The K8 memory system prevents the use of two dual-core K8s in the same package.

You need to go multi socket.


Surely AMD can follow the same build line/method?

See above.

Just add new improvements, e.g. an onboard memory controller, etc.

AMD already have an onboard memory controller; it is Intel that do not.


It might be that Intel's build direction is simply the best one.

AMD's path could lead nowhere, with no future in the long run.

Intel, with their C2Ds etc., might have "stumbled on the best future path".


You've got it totally the wrong way around.

AMD are on the right path, the Barcelona arch is just not as good as it should be.

Intel are migrating to AMD's approach with Nehalem, with the onboard memory controller.



More interestingly, the Barcelona core has apparently been designed so that the IMC can work with more than one die in the same package, so AMD can take an MCM approach to 8 or 16 cores.
 

epsilon84

Distinguished
Oct 24, 2006
1,689
0
19,780


Which is kind of ironic considering Intel is going the 'native' route with Nehalem - all dual/quad/octo variations will be monolithic. The PR wars between AMD and Intel are going to be interesting, to say the least. ;)
 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980



Indeed.



No doubt AMD will claim their design does not suffer from the same "problems"* as Kentsfield, Clovertown, Harpertown et al., as it has been "designed to operate in an MCM environment" or similar.



*I bet AMD wish they had the same "problems" as Intel right now!
 

FHDelux

Distinguished
Jan 25, 2008
99
0
18,630
This is just speculation on my part, but it seems to me a lot of the magic in the C2D/C2Q is the enormous L2 cache size in comparison to the Phenom/Barcelona procs. The Intel chips probably don't have that much of an advantage over the AMD quad cores with the cache turned off on both processors (meaning AMD's core design is almost, if not just, as efficient as Intel's). That being said, with AMD's native quad-core design there's just no room on the die for 12MB of L2 cache, so they are stuck with half a MB or a MB per core instead of the insane 2MB to 3MB per core. That's just my opinion; I bet when AMD drops to 45nm they can increase their cache size and things can get better for them. I used Athlons throughout the whole P4 era and I never had any problems with the speed. However, AMD sat on their butts way too long, so now they are playing the catch-up game. But they'll get there.
 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980


The large L2 cache for the Conroe/Penryn based chips helps mask the FSB bottleneck with Intel CPUs.



Since AMD do not use an FSB, that same bottleneck simply doesn't exist for them. There are people saying Barcelona is IMC limited, so memory latency would seem to be an issue for AMD at the moment (which has a similar effect to the FSB bottleneck, but scale the IMC speed up and it disappears).



Indeed, a quick google and this page shows what I'm talking about remarkably clearly:

http://techreport.com/articles.x/13176/3


Compare the old K8 to K10: memory bandwidth is well improved in K10 over K8 (which is still better than the Conroe/Penryn Intels), but memory latency has worsened from K8 to K10, to levels similar to Conroe/Penryn.


AMD have two approaches that will improve K10:

1. Bigger L3 cache levels to reduce amount of running to system memory (L3 cache jumps to 6MB in H2 this year with 45nm)

2. Increase IMC speeds, as that will improve the effectiveness of the existing cache - techreport comments that the 1.8GHz 2350 has an L3 latency of around 23ns, but the 2.0GHz 2360 has an L3 latency of roughly 19ns. That's a 17% L3 latency improvement with an 11% IMC speed rise.
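
The percentages in point 2 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two latencies and two clocks are the rounded figures quoted above, and the names `LATENCY_NS`, `CLOCK_GHZ` and `percent_change` are made up for the illustration.

```python
# Rounded figures quoted in the post above (not re-measured here):
# L3 latency in ns and the clock in GHz given for each Opteron.
LATENCY_NS = {"2350": 23.0, "2360": 19.0}
CLOCK_GHZ = {"2350": 1.8, "2360": 2.0}


def percent_change(old, new):
    """Signed percent change going from old to new."""
    return (new - old) / old * 100.0


# Latency went down, so negate the change to express it as an improvement.
latency_drop = -percent_change(LATENCY_NS["2350"], LATENCY_NS["2360"])
clock_rise = percent_change(CLOCK_GHZ["2350"], CLOCK_GHZ["2360"])

# Latency expressed in cycles of the quoted clock: ns * GHz = cycles.
cycles = {chip: LATENCY_NS[chip] * CLOCK_GHZ[chip] for chip in LATENCY_NS}

print(f"L3 latency improvement: {latency_drop:.1f}%")
print(f"Clock rise:             {clock_rise:.1f}%")
for chip, n in cycles.items():
    print(f"Opteron {chip}: about {n:.0f} cycles at the quoted clock")
```

With those inputs the latency improvement comes out a little over 17% and the clock rise a little over 11%, matching the post. In cycles of the quoted clock the L3 access goes from roughly 41 to 38, so the latency gain is slightly better than clock scaling alone would give. Since the source latencies are only "around" and "roughly" figures, treat the cycle counts as ballpark.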

 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980



To slightly contradict myself.




According to Fudzilla, AMD's 8-core chip (Montreal) will be a native 8.
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
Why AMD doesn't follow Intel's design path:

1) Processors take years to design before hitting market. That design work is confidential until the CPU is nearly completed, so you'd be years behind playing catch-the-leader.

2) Intel has a huge R&D budget to maintain a lead in process technology. Notice their smaller node CPUs always come out before IBM's or AMD's. Even if design info leaked, it would have to be scaled back to an older process - not as many transistors, possibly less thermal headroom and more defect problems.

3) Just because Intel is awesome at designing and shrinking cores doesn't mean the rest of their work is flawless. People generally agree that Intel's use of a desktop optimized architecture (making money off chipsets as well as CPUs) for servers leaves it behind AMD, which designed around server needs first.

I do believe the K10 core is still behind the Penryn cores even excluding L2 discrepancies - there is good evidence the K8 core was well behind the Conroes. Performance comparisons of K8 cores with 512K and 1MB of L2 showed that the extra cache was not helping much at all, which explains why AMD discontinued the 1MB L2 lines. In line with theory, the IMC is supposed to make large amounts of cache irrelevant. So giving a K8 X2 4MB of cache would likely fail to bring its performance up to Conroe-4M levels.

K10 core looks dated on paper. There are improvements over K8 but those are quite situational. They need an across-the-board boost like increasing issue rate and general execution strength... but I understand that will take some creativity and extra efficiency if your process is simply behind.
 
I think you're missing the key point, which is Intel's superior prefetchers and cache design ... it is optimised (as a whole) to make the absolute best of any situation.

There is a lot of logic tied up in that particular part of the Core 2 uarch.

Yes, a large L2 cache is part of the key ... but it is not as simple as that.

Intel have done a fantastic job of avoiding the FSB bottleneck ... which is a bit of a misnomer anyway ... name any useful process which currently exposes this "weakness" on a single socket system ... only huge virtualised memory processes.

Multisocket communication is a different story.

Phenom's cache system is exposed as having a real weakness in comparison (on a single socket level) ... namely the L3 cache latency.

Intel's L1 and L2 cache design is faster, more efficient and more complex, and the transistors use less power.

Of course I could be wrong ...

The disastrous long pipes and cache flush issues with Netburst were a real focus for the team working on Core 2 .... they learned from their mistakes.

In stark contrast I don't yet see evidence that AMD's L3 cache is anything but a simple "bolt on" accessory.



 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980


I can expose the bottleneck quite easily on a single socket (quad core) system.

But admittedly it is workstation-type engineering software I'm using (not virtualised though).



Optimising around the bottleneck gives a 15%-ish speedup.
 

FHDelux

Distinguished
Jan 25, 2008
99
0
18,630




Wow, that is a really good article. It appears as though AMD has a few more optimizations that need to be made, as you said, before their latency issues disappear. I still think perhaps more cache per core could mask this problem a little more, especially at the speeds we are talking about here. I realize the entire setup of the Barcelona core is different than a K8 or C2D; however, L2 cache is always a good way to hide your memory latency issues to an extent lol.
 

WR

Distinguished
Jul 18, 2006
603
0
18,980
In stark contrast I don't yet see evidence that AMD's L3 cache is anything but a simple "bolt on" accessory.
It's more complicated than "bolt-on," but the complications are mostly to take care of compatibility with 4 disparately clocked cores, not to enhance performance as with Conroe, which is just 2 synchronous cores and very convenient to optimize around.

Having the L3 at all is a detriment, too. AMD worked a little on the L1s and L2s in K10, but that was countered by the slowdown the L3 would bring.
 

Mathos

Distinguished
Jun 17, 2007
584
0
18,980


Yeah, but the L3 is only causing a slowdown because it is linked to the speed of the NB/IMC, meaning a Phenom X4 running at a 2.3GHz core and 1.8GHz NB/IMC has to wait multiple cycles to get info from the L3 to the L2 and vice versa. When the cores and NB/IMC are running in sync with each other at the same speed, that all changes, as does the performance of the processor, especially at higher clock speeds. That is why, according to some articles, the Phenom sometimes seems to scale in performance at a better than 1:1 ratio.
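
A toy model of the point above, in Python. It assumes an L3 access costs a fixed number of NB cycles, so the wait as seen by the core scales with the core-to-NB clock ratio. The 40-cycle figure and the function name `core_cycles_stalled` are placeholders picked for the example, not measured Phenom numbers, and real hardware adds synchronisation overhead that this ignores.

```python
def core_cycles_stalled(l3_nb_cycles, core_ghz, nb_ghz):
    """Core cycles spent waiting on one L3 access.

    Assumes the access takes a fixed count of NB cycles, so the time
    in ns is l3_nb_cycles / nb_ghz, and the core ticks core_ghz times
    per ns while it waits.
    """
    latency_ns = l3_nb_cycles / nb_ghz
    return latency_ns * core_ghz


ASSUMED_L3_NB_CYCLES = 40  # hypothetical placeholder, not a measured value

# The 2.3GHz core / 1.8GHz NB case from the post, versus NB in sync.
mismatched = core_cycles_stalled(ASSUMED_L3_NB_CYCLES, core_ghz=2.3, nb_ghz=1.8)
in_sync = core_cycles_stalled(ASSUMED_L3_NB_CYCLES, core_ghz=2.3, nb_ghz=2.3)

print(f"NB at 1.8GHz: {mismatched:.1f} core cycles per L3 access")
print(f"NB at 2.3GHz: {in_sync:.1f} core cycles per L3 access")
```

Under this assumption the core loses more cycles per L3 access whenever the NB runs slower than the core, and raising the core clock alone makes the gap worse. That is one way to read the "better than 1:1" scaling when both clocks go up together.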
 

Amiga500

Distinguished
Jul 3, 2007
631
0
18,980


Indeed.

That is the overall point I was trying (poorly) to make earlier.


They need to improve IMC speeds as a priority over increased L3 cache sizes. Perhaps B3 will fix it.
 

thefumigator

Distinguished
Jul 3, 2005
142
0
18,680
@amiga500
I think -almost sure- that you are wrong -almost sure, I repeat-
I agree that you can't glue 2 AMD cores together like "let's put 2 Brisbanes together to make a quad core on one socket", since that's not possible.

But from what I understand, 2 AMD64 cores can be glued together using HyperTransport inside the package (requires a new core stepping btw), exactly the same way as two Opteron 200 series chips go in 2 different sockets, except that all the HT traces that go from socket 0 to socket 1 can be included in one package to make a single socket 4-core processor. (Cons: the power consumption is equal to that of dual sockets, which is too much. Also the cost: you are really buying 2 chips in one socket, so it may be expensive.)

If what you say is really the way it is, then you wouldn't be able to put just 1 memory DIMM into an AMD multiple socket computer, which isn't true, since you can have 2 Opteron 200 series chips with just 1 memory DIMM that connects directly to the IMC of the CPU next to the module.

Also, there are benchmarks -very old ones- out there that compare the impact of leaving one Opteron without its own memory. There are also several low end server motherboards that have just 4 memory slots, which connect to one of the two sockets (always talking about old 940).

The impact (of leaving just 1 memory DIMM on multiple CPUs) is low (or it's far from a disaster).
 

Mathos

Distinguished
Jun 17, 2007
584
0
18,980


Yeah, I noticed a lot of improvement; the clock speeds on the cores scale a lot better performance-wise too with the higher speed IMC. For example, my 2.6 by 2.4 performance was right on par (within say 5%) with what a QX6700 could do in a lot of things with the same vid card and a similar setup. Never got to try 2.6 by 2.6 though, since that's when my video card crapped out and I had to RMA it; it was a defective card. But I've got a Sapphire Toxic edition 3870 on its way that should be here around Monday or Tuesday, so I'll start messing around with it again. Not to mention the new BIOS seems to be a far sight more stable, BIOS p0j on the K9A2 Plat. Only problem with it is there are no options to change the NB/IMC multi like there were with the BIOSes that had the custom p-states section. So I think I'm gonna contact MSI customer service and pester them to see if they'll put together a custom beta BIOS with the IMC/NB multi plane control included again. Need to bug 'em and see if they'll make it so the voltages can be more finely tuned as well, though the auto setting seems to be doing pretty well, and is pretty conservative in the new BIOS. It's not as fast as the 1.1b3 BIOS but it appears to be a hell of a lot more stable so far.
 

TMSter

Distinguished
Jan 1, 2007
130
0
18,680
This topic almost made me throw up. AMD aren't followers, they are leaders; they don't need to do anything like Intel. :kaola: