Anand dissects K10 numbers at CeBIT




Profile: Forum Resident
More Information

AnandTech has an interesting update on Barcelona's performance numbers: it compares Woodcrest, Clovertown and the Opteron 8220, then extrapolates using some new info from AMD.

Linkage!


C’est magnifique, mais ce n’est pas la guerre.
Profile: Forum Master
More Information

Quote :

However, stating that the data cache bandwidth is twice as high as Intel's Core is ignoring a few things. Eric Bron, probably one of the most knowledgeable developers when it comes to SSE, stated: "Intel Core can sustain one 128-bit load and one 128-bit store per cycle (I've measured actual timings very near this theoretical peak), so Core can copy 128 bits per cycle. Barcelona (K10) can only copy 64 bits per cycle from the above store bandwidth limitation." So the twice as much "load bandwidth" is only a small part of the story:

* Intel Core can do a 128-bit Load and 128-bit Store in one cycle if possible
* AMD's K10 can either do two 128-bit loads, or two 64-bit stores, or one 128-bit Load and one 64-bit Store

Depending on the situation, AMD's K10 can do twice as much, about equal or about 33% less work in one cycle. So you cannot conclude that the AMD K10 has twice as much SSE bandwidth as the Intel quad core Xeon. It will only be faster if loads happen twice as often (or more) as Stores. In most "harder to vectorize" FP code, this is the case, so here the K10 chip will probably win by a small margin (as the percentage of SSE code is low). An example of this is the SpecFP benchmark. In some "easy to vectorize" SSE code this is not the case, and in that case the K10 will probably not be beaten per clock cycle, but the clock speed disadvantage might give the Xeon the edge.
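The "twice as much, about equal or about 33% less" figures can be reproduced with a rough slot-counting model. This is only a sketch built on the two bullet points above, assuming the K10 combinations can be mixed freely from cycle to cycle and that every operation is a 128-bit SSE access; the function names are made up for illustration and nothing here is a measurement.

```python
from fractions import Fraction

def core_cycles(loads, stores):
    # Assumed Core model: one 128-bit load and one 128-bit store can issue
    # in the same cycle, so the larger of the two counts sets the time.
    return Fraction(max(loads, stores))

def k10_cycles(loads, stores):
    # Assumed K10 model: two slots per cycle. A 128-bit load takes one slot,
    # a 128-bit store takes two 64-bit store slots.
    return Fraction(loads + 2 * stores, 2)

def k10_vs_core(loads, stores):
    # Relative K10 throughput per clock (values above 1 mean K10 is faster).
    return core_cycles(loads, stores) / k10_cycles(loads, stores)

print(k10_vs_core(4, 0))  # loads only
print(k10_vs_core(0, 4))  # stores only
print(k10_vs_core(4, 4))  # copy: one store per load
print(k10_vs_core(8, 4))  # two loads per store
```

Under these assumptions the load-only case comes out at 2, the store-only case at 1, and the copy case at 2/3 (about 33% less). Two loads per store is the break-even point, which lines up with the remark that K10 is only faster when loads happen at least twice as often as stores.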

Profile: nimble knuckle
More Information

Good quotation.

It seems to me that optimizing code specifically for AMD or Intel would be very beneficial on today's systems. If only the development costs weren't so prohibitive.

Profile: member
More Information

Nice selective quote there Ninja. How 'bout this.

Quote :

The result is that the dual Xeon x5355 (eight cores) is heavily bottlenecked by a lack of bandwidth and hardly faster than the dual Opteron 2220SE (four cores) in CPU FP2000 rates. If we take a look at the best dual core Xeon 5150 (2.66 GHz) score, it gets a score of 78.2. That means that the quad Xeon 2.66 GHz is only about 32% faster than its dual core brother at the same clock speed, another clear indication that the dual Xeon x5355 scores are seriously limited by memory bandwidth. It is no surprise that the quad socket Opteron 8220 is about 34% faster than the Xeon x5355 (and we are ignoring the probably inflated result of 184 you can get with the Sun Studio Compiler).

This puts AMD's claim that the best "K10" (most likely at 2.3 GHz) will be 42% faster than the Xeon x5355 in Spec FP rate in the right perspective. We reported in our Barcelona architecture article that the AMD K10's Northbridge is set up to handle higher bandwidth than the current AMD chips. As has been shown numerous times, the current Athlon 64 X2/Opteron architecture is not able to use the extra bandwidth that DDR2 gives.

So most of the 42% advantage is probably due to K10's better Northbridge and better use of DDR2. Ron Myers of AMD claimed that the difference is now already greater than 42%. Combine this with the fact that the K10 is running at only 2.3 GHz, and we can conclude that the memory subsystem (Load/store unit, L1, L2, Northbridge) of the K10 is simply (vastly) superior compared to the Athlon 64s and to the quad Xeon. This confirms our and Intel's assumption that the K10 will probably make the largest impact as a very potent HPC chip. The hardware virtualization features in AMD's K10 are quite impressive, but we'll discuss them later.



In other words, no matter how good Intel's Core may be, it is severely constrained by its 1970's era FSB. Core2 is good for desktop baby benches, but when it comes to serious bandwidth, it falls flat on its face. No surprise really.
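For anyone who wants to check the scaling figures in the quoted passage, here is a small sketch of the arithmetic. The only inputs taken from the article are the 78.2 score for the Xeon 5150 and the "about 32% faster" figure; the quad-core score it prints is implied by those two numbers, not quoted anywhere, and the helper names are hypothetical.

```python
def implied_score(base_score, percent_faster):
    # Score implied by a baseline and a "percent faster" claim.
    return base_score * (1 + percent_faster / 100)

def scaling_efficiency(percent_faster, core_ratio):
    # Achieved speedup divided by the ideal speedup (the core-count ratio).
    return (1 + percent_faster / 100) / core_ratio

# Dual Xeon 5150 score from the article, and the quoted 32% gain.
quad_xeon_score = implied_score(78.2, 32)

# Doubling the cores would ideally double the score.
efficiency = scaling_efficiency(32, 2)

print(round(quad_xeon_score, 1))
print(round(efficiency, 2))
```

In this simple view, doubling the cores delivers roughly two thirds of the ideal throughput, which is the behavior the article attributes to a memory bandwidth limit.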

C’est magnifique, mais ce n’est pas la guerre.
Profile: Forum Master
More Information

Technically, you are engaging in selective quoting too. Hello pot, my name is kettle.

Profile: addict
More Information

So, Xeon will be better at some things, Barcelona at others. As expected I suppose.

And quoting from an article, by its very nature, has to be selective. Otherwise, you'd be quoting the entire article, which would be rather repetitive considering the link is given.

My ass does all my talking!
Profile: nimble knuckle
More Information

Quote :

Technically, you are engaging in selective quoting too. Hello pot, my name is kettle.



Hi Kettle! Nice to meet you.

Profile: Forum Resident
More Information

Quote :

Nice selective quote there Ninja. How 'bout this.

The result is that the dual Xeon x5355 (eight cores) is heavily bottlenecked by a lack of bandwidth and hardly faster than the dual Opteron 2220SE (four cores) in CPU FP2000 rates. If we take a look at the best dual core Xeon 5150 (2.66 GHz) score, it gets a score of 78.2. That means that the quad Xeon 2.66 GHz is only about 32% faster than its dual core brother at the same clock speed, another clear indication that the dual Xeon x5355 scores are seriously limited by memory bandwidth. It is no surprise that the quad socket Opteron 8220 is about 34% faster than the Xeon x5355 (and we are ignoring the probably inflated result of 184 you can get with the Sun Studio Compiler).

This puts AMD's claim that the best "K10" (most likely at 2.3 GHz) will be 42% faster than the Xeon x5355 in Spec FP rate in the right perspective. We reported in our Barcelona architecture article that the AMD K10's Northbridge is set up to handle higher bandwidth than the current AMD chips. As has been shown numerous times, the current Athlon 64 X2/Opteron architecture is not able to use the extra bandwidth that DDR2 gives.

So most of the 42% advantage is probably due to K10's better Northbridge and better use of DDR2. Ron Myers of AMD claimed that the difference is now already greater than 42%. Combine this with the fact that the K10 is running at only 2.3 GHz, and we can conclude that the memory subsystem (Load/store unit, L1, L2, Northbridge) of the K10 is simply (vastly) superior compared to the Athlon 64s and to the quad Xeon. This confirms our and Intel's assumption that the K10 will probably make the largest impact as a very potent HPC chip. The hardware virtualization features in AMD's K10 are quite impressive, but we'll discuss them later.



In other words, no matter how good Intel's Core may be, it is severely constrained by its 1970's era FSB. Core2 is good for desktop baby benches, but when it comes to serious bandwidth, it falls flat on its face. No surprise really.


This just confirms what I meant about FSB saturation. In heavy loads, the cores are fighting for too little bandwidth from RAM. The fact that C2Q doesn't improve much over C2D illustrates this. Anand makes note of the fact also that Intel is pushing up the DIB.

K10 should effectively double the bandwidth and actually be able to use more of it. This should give Kuma a real push on the desktop. I guess people forget that there is a dual core version of K10 with L3.

Profile: addict
More Information

Quote :

Technically, you are engaging in selective quoting too. Hello pot, my name is kettle.



Hi Kettle! Nice to meet you.

Sup, I'm grill, can I chill with you guys?

Profile: nimble knuckle
More Information

Get lost, sharikou

Profile: addict
More Information

so... basically all this article shows is that AMD has a good chance of putting up a fight in the next generation of processors


woot i guess

C’est magnifique, mais ce n’est pas la guerre.
Profile: Forum Master
More Information

Two small, tiny, possibly insignificant things you might have forgotten in your rush to always have something to say against the "big bad Intel (or VIA or AMD, depending on who is doing the calling) fanboy" Ninja. One, I posted the conclusion to the statement you quoted (which is being selective, I might add), meaning that the people in the know, when looking at all the data, showed that the bandwidth problem isn't as big as it's been played out to be.

Two, you still haven't shown any proof that a monolithic multicore performs better than the same arch in a dual die - single processor solution. :P

With that, I await someone calling me a fool, biased, idiot, moron, jackass, meanie or a derivative of these terms.

Have a nice day,
Ninja

Profile: Forum Resident
More Information

I think the point wasn't the inefficiency of the MCM, but the lack of scaling going from two cores to 4 in terms of bandwidth. Opteron doubles it, Clovertown doesn't.
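A toy model makes the scaling point concrete. It is a sketch under loud assumptions: the 10 GB/s figure is a made-up round number, not a spec for either platform, and the model ignores NUMA effects, caches and Intel's dual independent bus. It only contrasts cores sharing one fixed bus with sockets that each bring their own memory controller.

```python
def shared_bus_per_core(bus_gbps, cores):
    # All cores compete for one fixed pool of bandwidth.
    return bus_gbps / cores

def per_socket_per_core(controller_gbps, sockets, cores_per_socket):
    # Each socket adds its own memory controller, so the total grows with sockets.
    total = controller_gbps * sockets
    return total / (sockets * cores_per_socket)

HYPOTHETICAL_GBPS = 10.0  # illustrative value only

# Going from two cores to four on a shared bus halves the share per core.
print(shared_bus_per_core(HYPOTHETICAL_GBPS, 2))
print(shared_bus_per_core(HYPOTHETICAL_GBPS, 4))

# Going from one dual-core socket to two keeps the share per core constant.
print(per_socket_per_core(HYPOTHETICAL_GBPS, 1, 2))
print(per_socket_per_core(HYPOTHETICAL_GBPS, 2, 2))
```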

Profile: Faithful Poster
More Information

Thanks for the link BM, I can't wait for people to start giving this post one star simply because it came from you/is pro AMD.

I've heard about the load/store "problem" with K10 before. It worried me then, and it still worries me. I love the "can be twice as fast, as fast or 33% slower" quote. If I understand this correctly, it means that when executing SSE code, K10's performance will be better than, equal to, or vastly slower than C2D's. Does anyone know, or can anyone explain, when each of these will be true? For example, when using an SSE encoder, will ripping video be faster while running LAME is slower? I want to know when this worst-case scenario will take place.

I'm also worried about the comment about the better memory arch in K10. If much of the improvement is because of better memory management, why didn't they simply tweak the K8L arch? I remember when AMD dropped DDR and switched to DDR2. You had to run DDR2-800 in order to get 1-3% better performance than DDR. The memory management on the DDR2 chips has been horrid. If they could make these tweaks, why not just do it? If they could get ~30-40% gains just by tweaking the memory controller, I'd do that while working on K10 more. I'd keep working on K10 if the max frequency is only 2.3GHz (nearly 30% slower than the current max frequency).

Profile: Forum Resident
More Information

I think it's actually a good idea to totally update rather than trying to redo the current K8.

But as an aside, if the reports about the R600 shindig on April 22nd are true, there will be demos using Barcelona, but there is the conspicuous absence of an HT3 chipset. 790G is supposedly on the way, so I guess the systems will be using that.

That would be a great chance to show off Agena FX. Imagine what two of those will do with CrossFire. And they say they were working on 4 way CrossFire.

That would definitely be the first huge thing from the "New AMD."

Profile: addict
More Information