Clovertown (double Conroe) and Athlon64 compared

sharikou

http://sharikou.blogspot.com/

Please visit the link to see more details

Clovertown scores revealed

Clovertown compared to Athlon 64 2800+ (1.8GHz, Socket 754, 130nm, single-channel DDR)


Intel showed off Clovertown quad-core server CPUs running on the Bensley platform with FB-DIMM memory at Spring IDF Taipei. Daniel J. Casaletto, Intel Vice President, Digital Enterprise Group Director, Microprocessor Architecture and Planning, was running the demo. Clovertown is basically two 65nm Conroe CPUs stacked together, with a total of 8MB L2 cache. This page contained the benchmark scores for a 2P Clovertown. The clock speed was 2GHz. In the single-threaded test, it got a Cinebench 9.5 score of 362. For 2P with 8 cores, the score scaled to 1723, or about 4.8x. Adding 7 cores led to about 3.8x more performance. I think this is quite poor: each added core gives you only about half a core's worth of performance, which points to an FSB bottleneck.
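The scaling arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch using the two scores quoted there; the function names are made up for illustration, and "gain per added core" is simply the extra speedup divided by the number of extra cores.

```python
# Scores quoted in the post (Cinebench 9.5, 2GHz Clovertown, 2P system).
SINGLE_THREAD_SCORE = 362
EIGHT_CORE_SCORE = 1723
CORES = 8


def speedup(multi_score, single_score):
    """Ratio of the multi-core score to the single-threaded score."""
    return multi_score / single_score


def gain_per_added_core(multi_score, single_score, cores):
    """Average extra performance per added core, in units of one core."""
    return (speedup(multi_score, single_score) - 1) / (cores - 1)


s = speedup(EIGHT_CORE_SCORE, SINGLE_THREAD_SCORE)
g = gain_per_added_core(EIGHT_CORE_SCORE, SINGLE_THREAD_SCORE, CORES)

print(f"speedup on {CORES} cores: {s:.2f}x")
print(f"average gain per added core: {g:.2f} of a core")
print(f"parallel efficiency: {s / CORES:.1%}")
```

The speedup comes out at roughly 4.76x, so each of the 7 added cores contributes a little over half a core on average. The numbers alone do not say whether the FSB, the memory system, or the benchmark itself is the limit.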

Let's pay more attention to this photo here, which shows the 2P Clovertown in action and is quite exciting. Look at the upper left corner: it reads Cinebench 64 Bit Edition. Finally, we can see Intel got 64-bit working; it's running the 64-bit version of Cinebench 9.5!

In comparison, a 3-year-old 2GHz single-core Opteron 246 achieves a score of 366 in the single-threaded test, 1.1% faster than the NGMA Core at the same clock speed. Clock for clock, Intel CORE (Merom/Conroe) is slower than Hammer.

On my old Athlon 64 2800+ (1.8GHz, Socket 754, 130nm), I got a Cinebench 9.5 score of 294. My ClawHammer is a bit slower than Conroe CORE, but only a little. If you consider that my CPU is only 1.8GHz and only uses single-channel DDR, and that my old PC only has integrated S3 UniChrome graphics, which eats some memory, it's quite good. I managed to overclock it to 1.9GHz and got a score of 312. I expect the old ClawHammer to get a score of 294*2/1.8 = 327 at 2GHz.
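The 2GHz estimate above rests on the assumption that the score scales linearly with clock speed. A small Python sketch of that extrapolation, using only the scores quoted in the post (the helper name `scale_score` is made up for illustration):

```python
def scale_score(score, from_ghz, to_ghz):
    """Linearly extrapolate a benchmark score to another clock speed.

    Assumes the score is proportional to core clock, which ignores
    memory and bus speed; treat the result as a rough estimate only.
    """
    return score * to_ghz / from_ghz


# Extrapolate the stock (1.8GHz) and overclocked (1.9GHz) runs to 2GHz.
from_stock = scale_score(294, 1.8, 2.0)
from_overclock = scale_score(312, 1.9, 2.0)

# Sanity check of the linear assumption: predict the 1.9GHz run from
# the 1.8GHz run and compare with the measured 312.
predicted_at_1_9 = scale_score(294, 1.8, 1.9)

print(f"2GHz estimate from 1.8GHz run: {from_stock:.1f}")
print(f"2GHz estimate from 1.9GHz run: {from_overclock:.1f}")
print(f"predicted at 1.9GHz: {predicted_at_1_9:.1f} (measured: 312)")
```

Both runs land in the high 320s at 2GHz, and the measured 1.9GHz score sits within a couple of points of the linear prediction, so the assumption looks reasonable over this small range.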

I am interested in seeing some Clovertown and Sempron socket 939 comparisons. If you have such a machine running Windows x64, please submit your results in the comments. Don't underestimate AMD desktop CPUs; check out this Athlon 64 and Xeon comparison.

The Conroe performance analysis is here. I pointed out that when the working set is larger than Conroe's cache (4MB), Conroe performs slower than Athlon 64. Cinebench 9.5 needs over 150MB to run; as a result, Clovertown's 8MB cache didn't help.
 

spud

Hmm blog sites always good for a laugh.
 

ltcommander_data

We've already had this posted.

The thing is, I don't see how you can justifiably compare benchmarks from completely different websites and test setups. The most glaring thing is that the 2CPU review uses Cinebench 2003, which is based on Cinema 4D R8, while the Intel benchmarks use Cinebench 9.5. We also have no idea how the Intel system is set up. It has also been pointed out by someone else in the other thread that Cinebench may not scale 100% to 8 cores anyway. In other words, this doesn't prove anything one way or another.
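One rough way to put a number on the "may not scale 100%" point is Amdahl's law. Nobody in the thread invoked it; it is brought in here purely as a simple model. The Python sketch below assumes the only limit is a serial fraction of the work and solves for the parallel fraction implied by the quoted scores (362 single-threaded, 1723 on 8 cores). The function names are illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def implied_parallel_fraction(observed_speedup, cores):
    """Invert Amdahl's law: which parallel fraction gives this speedup?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cores)


observed = 1723 / 362  # 8-core score divided by single-threaded score
p = implied_parallel_fraction(observed, 8)

print(f"observed speedup: {observed:.2f}x")
print(f"implied parallel fraction: {p:.1%}")
print(f"model check at 8 cores: {amdahl_speedup(p, 8):.2f}x")
```

Under this model, a workload that is only about 90% parallel would be enough to produce the observed result with no bus bottleneck at all. The model cannot tell a serial portion in Cinebench apart from FSB or memory contention, which is why the quoted scores do not settle the question.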

It should also be noted that this is still an early engineering sample, with the real launch not until 2007 or Christmas at the earliest. The current Clovertown supposedly uses a 1066MHz FSB while Intel is tweaking the CPU and the northbridges to expand it to 1333MHz. That should help with the scaling issue.

As well, if you are the same person that wrote the blog, I really don't agree with your analysis of the cache thrashing issue. What you seem to have done is compare Yonah, Sossaman, and Core's shared cache with that of a Hyper-Threading-enabled Netburst. The thing is, there is a major difference between the two. The caches in Netburst are completely passive and have no control over their contents and allocation, which is why thrashing can occur. The entire point of the shared cache in Yonah and Core is to avoid this issue, which is why they are labelled "Smart". They can actively control cache allocation to each core, preventing thrashing from occurring. If both cores are heavily loaded and need lots of cache, each will be assigned 50%, while if one is heavily loaded and the other lightly loaded, the cache will be divided appropriately. A core cannot simply take control and eject the other core's data like in Netburst. Granted, Intel may be totally incompetent and have flawed sharing logic, but I think it's better to assume it at least kind of works instead of being completely nonexistent until some real tests on shipping products are done.

In regards to whether Intel is wasting transistors on the large 4MB cache, it really isn't if you compare it to their previous designs or even AMD's. Smithfield on 90nm had 2x1MB, so doubling the cache size while implementing the process shrink to 65nm isn't unreasonable. AMD processors also have 2x1MB on the 90nm process. I'm also not sure if Intel production is really facing "limited capacity" as you put it in the blog.
 

sharikou

Please read it carefully; this is a completely different benchmark comparison. http://sharikou.blogspot.com/2006/04/clovertown-scores-revealed.html

The benchmarks are

1) Clovertown (double Conroe) at 2GHz running Cinebench 9.5 64-bit edition. The result was just reported from IDF Taipei. The Clovertown (Intel Core) got a score of 362.

2) Athlon 64 2800+ (Socket 754, 130nm, 1.9GHz) got a score of 312 on exactly the same benchmark.
http://i35.imagethrust.com/i/402784/athlon643000754occinebenc.jpg

3) Someone did the same test on Opteron 246, got a score of 366.
http://www.2cpu.com/review.php?id=110&page=6

This is additional proof that Intel CORE won't demonstrate any IPC advantage over the current AMD64 implementation.

On cache thrashing, it's a possibility; it depends on how Intel manages the cache.

On cache size, 4MB will bring some benefit, but it won't help much in today's compute environment, where the big apps which really need performance are also memory intensive. Adding cache is not an architectural solution.
 

9-inch

On cache size, 4MB will bring some benefit, but it won't help much in today's compute environment, where the big apps which really need performance are also memory intensive. Adding cache is not an architectural solution.
...It only shows how bandwidth-starved Intel's upcoming processors are. :wink:
I expect Conroe to be a bad multitasker thanks to its shared L2 cache.
 

spud

They are all relatively close to each other in score regardless of whether it's a Netburst, K8, or Core. I fail to see any issues at this moment in time.
 

sharikou

I don't see any bandwidth starvation problem for a 2GHZ Clovertown running one instance of Cinebench 9.5. But it's slower than single core Opteron 246 at the same clockspeed. What Intel has is a low IPC problem.
 

accord99

I don't see any bandwidth starvation problem for a 2GHZ Clovertown running one instance of Cinebench 9.5. But it's slower than single core Opteron 246 at the same clockspeed. What Intel has is a low IPC problem.
It's not. You're comparing against a Cinebench 2003 score. In 9.5, a 2GHz Opteron 270 scores 334 in 64-bit mode. Clovertown is already faster, and it's likely Maxon hasn't yet implemented optimizations to take advantage of its increased SIMD capabilities.
 

rettihSlluB

I don't see any bandwidth starvation problem for a 2GHZ Clovertown running one instance of Cinebench 9.5. But it's slower than single core Opteron 246 at the same clockspeed. What Intel has is a low IPC problem.
...Then it's worse than I thought. 8O
 

endyen

Look, and read the post first. That way you won't look like such an idiot. The benches were all done on 9.5.
What is worse is that Cinebench is highly Intel floptimized.
 

accord99

Look, and read the post first. That way you won't look like such an idiot. The benches were all done on 9.5.
What is worse is that Cinebench is highly Intel floptimized.
The Clovertown score is in 9.5. The quoted score for the Opteron 246 is from Cinebench 2003. His own scores in 9.5 show that Clovertown is faster at the same clock and that the 2GHz Opteron will score in the 330 range.
 

accord99

Please read
The Clovertown (Intel Core) got a score of 362.

2) Athlon 64 2800+ (Socket 754, 130nm, 1.9GHZ) got a score of 312 on exactly the same benchmark
http://i35.imagethrust.com/i/402784/athlon643000754occinebenc.jpg

3) Someone did the same test on Opteron 246, got a score of 366.

What? 1.9GHz scores 312, scale it to 2GHz and you get a score of ~330, which is less than 362 that Clovertown scored. The 366 score of the 246 comes from a 2cpu review using Cinebench 2003. The scores aren't comparable.

http://www.2cpu.com/review.php?id=110&page=6
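The per-clock comparison argued over in this thread can be laid out in one place. The Python sketch below uses only scores that posters say were obtained in Cinebench 9.5, and deliberately leaves out the 366 figure for the Opteron 246 because it comes from Cinebench 2003. Dividing score by clock assumes linear scaling with frequency, and the systems differ in memory, platform, and possibly 32-bit versus 64-bit builds, so the percentages are indicative at best. The labels and helper names are made up for illustration.

```python
# (score, clock in GHz) pairs quoted in this thread, all Cinebench 9.5.
results = {
    "Clovertown (1 thread)": (362, 2.0),
    "Athlon 64 2800+ (stock)": (294, 1.8),
    "Athlon 64 2800+ (overclocked)": (312, 1.9),
    "Opteron 270 (64-bit)": (334, 2.0),
}


def points_per_ghz(score, ghz):
    """Benchmark points per GHz, a crude stand-in for IPC."""
    return score / ghz


def per_clock_advantage(a, b):
    """Fraction by which system a beats system b per clock."""
    return points_per_ghz(*results[a]) / points_per_ghz(*results[b]) - 1


for name, (score, ghz) in results.items():
    print(f"{name}: {points_per_ghz(score, ghz):.1f} points/GHz")

for other in list(results)[1:]:
    adv = per_clock_advantage("Clovertown (1 thread)", other)
    print(f"Clovertown vs {other}: {adv:+.1%} per clock")
```

On these figures Clovertown comes out roughly 8% to 11% ahead per clock, depending on which AMD result it is compared against.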
 

endyen

You are right. I am the idiot for believing
3) Someone did the same test on Opteron 246, got a score of 366.
meant the same test.
It doesn't change the fact that a very old AMD single-channel desktop chip with bad timings scores almost as well (on one of Intel's favorite benches, to boot).
 

JonathanDeane

4 cores? 2 cores is more than anyone will ever need... heheheh. All jokes aside, at this time 4 cores seems to be more of a pissing contest :( 99% of the programs out there still do not use multiple threads :( Hmm, is there any difference between programs being optimized for 2, 4, or 8 processors? If there is no difference, then I guess more IS better; if code has to be updated to go from 2 cores to 4, then it will be a long time before that happens. Anyway, the tech for this chip does seem impressive, and that unified cache sounds awesome! I wonder if you could turn off 3 cores, have a huge cache, and OC the remaining core for games? (Note: this would be most useful for current games that do not support more than one core.)
 

ltcommander_data

Tell me, do you even look at what you're posting?

That Opteron 146 was running at 3GHz with a 300MHz base HTT, which would explain why the results are so high. In fact, if you take that 502 and multiply by 2/3 to get the score at 2GHz, you only have 335. And that still includes the increase in HTT speed.

Besides, the point is that it's really hard to compare systems from different sites without having things side-by-side on a workbench.
 

shinigamiX

Hmm blog sites always good for a laugh.
Seconded.
 

sharikou

I know, the Clovertown is still 10% faster than my Athlon 64 at the same clock, but compare the specs:

1) Clovertown: 65nm, 2x4MB cache, multi-channel FB-DIMM, cost $xxx
2) My Athlon 64: 130nm, 0.5MB cache, socket 754, rev CG, no SSE3, single-channel DDR400, cost $50. Its score is within striking distance of the Clovertown.

Some guy posted on my blog saying he got a score of 371 on a socket 939 platform at 2GHZ. I am asking him to post a screen dump. I think doubling the cache and using dual channel DDR should bump the speed a bit.
 

levicki

These are additional proof that Intel CORE won't demonstrate any IPC advantage over current AMD64 implementation.

You:

#1 do not understand CPU architecture
#2 are comparing "grandmothers to frogs"
#3 are deriving faulty conclusions based on your lack of knowledge and logical reasoning

On cache size, 4MB

Is pretty much the standard nowadays (as in 2x 2MB L2 in Presler) thank you.

where the big apps which really need performance are also memory intensive.

Name one memory intensive application where bigger cache doesn't help but instead hurts.

Adding cache is not an architectural solution.

Neither is adding REX prefixes to enable 64-bit computing on top of legacy x86 junk, thus breathing life back into something that should have been left to die.
 

sharikou

read this transcript:

http://seekingalpha.com/article/9012

When he was asked about Intel's coming products, AMD's CEO said:

"we expected, we had planned that our competitor would eventually have to follow and react to what we have done and get better. It will be interesting to see the things that we're going to do later, which will again continue to force them(Intel) to react and figure out what else to do next. We don't intend to in any shape or form give any leeway in our leadership relative to product and technology."

It seems AMD has something up its sleeve. Hector Ruiz is usually a very conservative guy.

Dirk Meyer, the DEC Alpha guy, is now AMD's President. He said:

"Yes, this is Derrick. We will be transitioning our desktop, mobile and server offerings all from DDR1-based technologies to DDR2-based technologies. We feel that is the right time, because that is the rough timeframe where the price transition will occur between DDR1 and DDR2. To your point, there is not a huge incremental performance benefit to be had as a result of that transition, though there is some. We will be, along with the transition, introducing higher capability product as well that are kind of independent of the DRAM technology."

If I read Meyer's words correctly, DDR2 is just one part of the transition; there is other stuff involved.
 

gOJDO

WOW! 8O 8O 8O 8O 8O
AMAZING!
Not the AMD fanboy blogger, but those who believe the stories he is wasting his time on... :roll:
Any fanboy want to trade? I will trade three s754 3000+ for 1 Clovertown at whatever freq. Anytime, just PM me.
 

ltcommander_data

If I read Meyer's words correctly, DDR2 is just one part of the transition; there is other stuff involved.
I would hope so, considering the benefit from DDR2 alone in the AM2 parts isn't particularly impressive. They're probably referring to K8L, which I'd hope will be able to use all that new bandwidth. I haven't actually heard many details about it other than the marketed 2x increase in floating point performance. I wonder whether they are going to achieve that by increasing the number of FPUs, expanding the width to 128-bit, using some form of micro-op or macro-op fusion, or some other creative way.

If THG is to be believed, Intel has another new architecture in the wings for 2007-2008.

http://www.tgdaily.com/2006/04/14/intel_quad_core/

Personally, this is the first I've heard of it, but everyone always has something up their sleeves. It's just a matter of how it actually turns out for the masses in the end.
 

K8MAN

If I read Meyer's words correctly, DDR2 is just one part of the transition; there is other stuff involved.
Improved SSE, SSE2, SSE3, and more x86-64 instructions, an improved FPU, built-in DDR3 compatibility, the inverse threading or whatever the heck it's called, and further optimisations to the DDR controller. Not to mention HT3.0, Z-RAM, and other surprises as well, I'm sure.
 

sharikou

Dude, try to comprehend, and be calm.

Of course adding cache improves performance; you don't have to be a genius to know that. The question is to what extent. Once your working set is much bigger than the cache size, doubling the cache won't do miracles. Right now, we are talking about whether Intel's CORE will show a 20% IPC advantage over AMD, as Mooly Eden bragged. All the evidence shows that to be false. As I have pointed out, from all independently verifiable data, the only cases where Intel showed a 20% IPC advantage are when the whole working set fits in the 4MB cache.

There is a major difference between Presler's 2x2MB and Conroe's 4MB. I hope you understand the difference.

Regarding the 64 bit extension on x86, Intel had 4 groups trying to figure it out, but they failed. Now Intel is simply following AMD.

See this http://news.com.com/2100-1001-985432.html

"Four separate design teams at Intel examined how the company could take one of its 32-bit chips and transform it into a 64-bit machine, said Richard Wirt, another senior fellow at Intel. After running simulations, all four teams concluded that such a transition wouldn't be economically feasible, he said. "

So, it's not trivial to do AMD64.
 
