K8L 10-15% faster in SPECint 40% in SPECfp

January 30, 2007 9:36:25 PM

According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415
January 30, 2007 9:49:33 PM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415


Because he doesn't realize that Barcelona will have dual core derivatives, I'd take his report with a grain of salt. Maybe I should make that my sig. Yeah, I think I will.

He also notes a significant deficit for AMD with AM2 vs. Core2 which is far from the actual situation.

We will hear SPEC numbers soon and TPC numbers will follow.

Hey, I actually said that on one of the forums I visit.
January 30, 2007 10:01:34 PM

Quote:
He also notes a significant deficit for AMD with AM2 vs. Core2 which is far from the actual situation.


I can't help but lol at that one.
January 30, 2007 10:02:48 PM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415


Because he doesn't realize that Barcelona will have dual core derivatives, I'd take his report with a grain of salt. Maybe I should make that my sig. Yeah, I think I will.

He also notes a significant deficit for AMD with AM2 vs. Core2 which is far from the actual situation.

We will hear SPEC numbers soon and TPC numbers will follow.

Hey, I actually said that on one of the forums I visit.

So what does him not realizing that K8L will have dual core derivatives have to do with numbers??? Do you believe that they will be larger with dual cores?
January 30, 2007 10:06:58 PM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415

Well, if AMD boasts 40%, that is clearly a maximum; however, the figures are pretty unclear at this point without any benchmarks. In one article I even saw a claim of 80% per core over K8, which would mean 50%+ over Core2.
Until we have at least one proven benchmark, speculation will be the predominant form of input we'll get :roll:
January 30, 2007 10:35:15 PM

hey, looks like we've signed up the same day :D 
January 30, 2007 10:52:14 PM

Not specINT and specFP, but TPC-C (a database/transaction processing benchmark) and specFP_rate assuming that they are continuing to use the two released benchmarks from December.

http://xtremesystems.org/forums/showpost.php?p=1873442&...

The following are my estimated scores for the Barcelona system:

SPECfp_rate:
fastest 2S Opteron 2220SE score is 96.0 (peak)
fastest 2S Xeon 5160 score is 83.4
fastest 2S Xeon 5355 score is 104

40% faster than the 2220SE gives a score of about 135-140 for a 2S, quad-core Barcelona. By comparison, a 4S Opteron 8220SE has a best score of 178.
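As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes, as the estimate does, that the 40% gain is applied to the 2S Opteron 2220SE score; the `project` helper and the variable names are made up for illustration, and the scores are simply the ones listed above.

```python
# SPECfp_rate scores for 2-socket systems, as listed in the post above.
BASELINES = {
    "Opteron 2220SE": 96.0,
    "Xeon 5160": 83.4,
    "Xeon 5355": 104.0,
}


def project(baseline: float, gain_pct: float) -> float:
    """Scale a baseline score by a percentage gain."""
    return baseline * (1.0 + gain_pct / 100.0)


# Assumption: the 40% claim is measured against the Opteron 2220SE result.
estimate = project(BASELINES["Opteron 2220SE"], 40.0)

# How far ahead of the listed Xeon 5355 score that projection would be.
lead_over_5355 = (estimate / BASELINES["Xeon 5355"] - 1.0) * 100.0

print(f"Projected 2S Barcelona SPECfp_rate: {estimate:.1f}")
print(f"Lead over Xeon 5355: {lead_over_5355:.1f}%")
```

Under that assumption the projection lands at roughly 134, just under the 135-140 range, and around 29% ahead of the listed Xeon 5355 score.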

SPECfp_rate is heavily memory bandwidth dependent and really isn't indicative of desktop application performance and most server type applications.

Results are from here:

http://www.spec.org/cpu2000/results/rfp2000.html

OLTP benchmark:

I think the existing systems mentioned in the test are:

Opteron 2220SE - HP DL385G2 with a score of 139,693
Xeon Woodcrest - HP DL380G5 with a score of 140,246
Xeon Clovertown - HP BL480c with a score of 222,117 (~60% higher than the Woodcrest system, which matches the graph)

A 70% increase over the Opteron system gives a score of around 235,000-240,000.

The best score for a Clovertown system currently is the HP ML370G5 with a score of 240,737.

The best score for a 4S Opteron system is the HP ProLiant DL585G2 with 8220SE with a score of 262,989. The best score for a 4S Intel system is 331,087 from the IBM x3950 with 3.5GHz Tulsa-based Xeons.
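The OLTP estimate can be checked the same way. This is only a sketch of the arithmetic, assuming the 70% gain is applied to the Opteron 2220SE result; the dictionary keys and the `pct_gain` helper are illustrative names, and the scores are the ones quoted above.

```python
# TPC-C scores for the 2-socket systems quoted in the post above.
TPMC = {
    "Opteron 2220SE (HP DL385G2)": 139_693,
    "Xeon Woodcrest (HP DL380G5)": 140_246,
    "Xeon Clovertown (HP BL480c)": 222_117,
    "Xeon Clovertown (HP ML370G5)": 240_737,
}


def pct_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old` (negative if it is lower)."""
    return (new / old - 1.0) * 100.0


# Assumption: the 70% claim is measured against the Opteron 2220SE system.
barcelona_tpmc = TPMC["Opteron 2220SE (HP DL385G2)"] * 1.70

clovertown_vs_woodcrest = pct_gain(
    TPMC["Xeon Clovertown (HP BL480c)"], TPMC["Xeon Woodcrest (HP DL380G5)"]
)
barcelona_vs_best_clovertown = pct_gain(
    barcelona_tpmc, TPMC["Xeon Clovertown (HP ML370G5)"]
)

print(f"Projected 2S Barcelona: {barcelona_tpmc:,.0f}")
print(f"Clovertown over Woodcrest: {clovertown_vs_woodcrest:.1f}%")
print(f"Projection vs best Clovertown: {barcelona_vs_best_clovertown:.1f}%")
```

With these inputs the projection comes out near 237,000, which sits slightly below the best Clovertown score listed above rather than ahead of it.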
January 30, 2007 11:02:33 PM

Quote:
synthetic benches suck donkey


spec.org's are the only IEEE/ASTM/ANSI approved benchmarks since they 1. are peer reviewed for accuracy, 2. are designed to prevent testing manipulation, and 3. represent real operation power, since you can't use benchmarks that fit in CPU cache.

"The SPEC CPU2000 benchmarks are intended to exercise the CPU itself, the memory hierarchy, and the compilers. How much memory do they actually use?

The data collected here show that SPEC met its goals for memory footprint: most benchmarks are larger than common cache sizes, many are larger than 100MB, and none are larger than 200MB.

* It is useful to have benchmarks that are larger than common caches, because SPEC would like to differentiate its benchmarks from "toy benchmarks" that are too easy to run or that simply reflect MHz.
* It is useful to keep the benchmarks under 200MB so that the suite leaves a reasonable margin on a 256MB machine. The other 56MB are available for the operating system, graphics system, network daemons, etc, without using 'single user mode' on Unix systems, or killing processes on NT systems. (Such measures may not be representative of how most people use their systems.)

The SPEC CPU2000 benchmarks are derived from real applications, and they exercise more of the system than just the CPU chip. " http://www.spec.org/cpu/analysis/memory/

"SPEC's Background

The System Performance Evaluation Cooperative, now named the Standard Performance Evaluation Corporation (SPEC), was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. The key realization was that an ounce of honest data was worth more than a pound of marketing hype.

SPEC has grown to become one of the more successful performance standardization bodies with more than 60 member companies. SPEC publishes several hundred different performance results each quarter spanning across a variety of system performance disciplines." http://www.spec.org/spec/

The operative words are "toy benchmarks". So if you have a problem with SPEC's benchmarks then you believe everything George W. Bush has said.
January 30, 2007 11:03:47 PM

Oh crap someone mentioned spec. Hows the IEEE going these days?
January 30, 2007 11:15:01 PM

Quote:
synthetic benches suck donkey


spec.org's are the only IEEE/ASTM/ANSI approved benchmarks since they 1. are peer reviewed for accuracy, 2. are designed to prevent testing manipulation, and 3. represent real operation power, since you can't use benchmarks that fit in CPU cache.

"The SPEC CPU2000 benchmarks are intended to exercise the CPU itself, the memory hierarchy, and the compilers. How much memory do they actually use?

The data collected here show that SPEC met its goals for memory footprint: most benchmarks are larger than common cache sizes, many are larger than 100MB, and none are larger than 200MB.

* It is useful to have benchmarks that are larger than common caches, because SPEC would like to differentiate its benchmarks from "toy benchmarks" that are too easy to run or that simply reflect MHz.
* It is useful to keep the benchmarks under 200MB so that the suite leaves a reasonable margin on a 256MB machine. The other 56MB are available for the operating system, graphics system, network daemons, etc, without using 'single user mode' on Unix systems, or killing processes on NT systems. (Such measures may not be representative of how most people use their systems.)

The SPEC CPU2000 benchmarks are derived from real applications, and they exercise more of the system than just the CPU chip. " http://www.spec.org/cpu/analysis/memory/

"SPEC's Background

The System Performance Evaluation Cooperative, now named the Standard Performance Evaluation Corporation (SPEC), was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. The key realization was that an ounce of honest data was worth more than a pound of marketing hype.

SPEC has grown to become one of the more successful performance standardization bodies with more than 60 member companies. SPEC publishes several hundred different performance results each quarter spanning across a variety of system performance disciplines." http://www.spec.org/spec/

The operative words are "toy benchmarks". So if you have a problem with SPEC's benchmarks then you believe everything George W. Bush has said.

You saved me the trouble. Thx.
January 30, 2007 11:16:28 PM

Quote:
CPU2000 will be retired in February 2007

...

As of February 24, 2007, no further CPU2000 results will be accepted for publication on the SPEC web site.


http://www.spec.org/cpu2000/
January 30, 2007 11:18:29 PM

Quote:
Not specINT and specFP, but TPC-C (a database/transaction processing benchmark) and specFP_rate assuming that they are continuing to use the two released benchmarks from December.

http://xtremesystems.org/forums/showpost.php?p=1873442&...

The following are my estimated scores for the Barcelona system:

SPECfp_rate:
fastest 2S Opteron 2220SE score is 96.0 (peak)
fastest 2S Xeon 5160 score is 83.4
fastest 2S Xeon 5355 score is 104

40% faster than the 2220SE gives a score of about 135-140 for a 2S, quad-core Barcelona. By comparison, a 4S Opteron 8220SE has a best score of 178.

SPECfp_rate is heavily memory bandwidth dependent and really isn't indicative of desktop application performance and most server type applications.

Results are from here:

http://www.spec.org/cpu2000/results/rfp2000.html

OLTP benchmark:

I think the existing systems mentioned in the test are:

Opteron 2220SE - HP DL385G2 with a score of 139,693
Xeon Woodcrest - HP DL380G5 with a score of 140,246
Xeon Clovertown - HP BL480c with a score of 222,117 (~60% higher than the Woodcrest system, which matches the graph)

A 70% increase over the Opteron system gives a score of around 235,000-240,000.

The best score for a Clovertown system currently is the HP ML370G5 with a score of 240,737.

The best score for a 4S Opteron system is the HP ProLiant DL585G2 with 8220SE with a score of 262,989. The best score for a 4S Intel system is 331,087 from the IBM x3950 with 3.5GHz Tulsa-based Xeons.



Barcelona will allow AMD to own TPC-H forever. Clusters will be beating up Itanium 2 and Power.

If their OLTP numbers are accurate, they will crack TPC-C.
January 30, 2007 11:19:50 PM

Quote:

If their OLTP numbers are accurate, they will crack TPC-C.

If their OLTP numbers are accurate, they're already matched by an existing Clovertown system.
January 30, 2007 11:29:04 PM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415

There isn't any info about integer performance. Also, it is impossible to boost ALU performance out of nothing, and K8L has no improvements to the ALU.
TPC-C is not an ALU benchmark; it is an on-line transaction processing benchmark. It mostly measures system bandwidth.
That quote is borrowed from David Kanter's article at RealWorldTech.
http://www.realworldtech.com/page.cfm?ArticleID=RWT012707024759
Quote:
However, what happens beyond the middle of the year is subject to quite a bit of uncertainty, with rhetoric issuing from both camps. AMD has claimed an advantage based on performance models, which are extremely accurate but may not account for faster speed grades from Intel, of around 10-15% for TPC-C and 40% for SPECfp_rate. The latter is likely to be somewhat of an outlier, but it is clear that AMD will be strongest in high performance computing workloads. AMD’s performance is largely attributed to microarchitectural improvements and a high level of system integration.

Those claims come directly from AMD's mouth. Until we see a K8L ES benchmark, we can't conclude how fast Barcelona will be.
January 30, 2007 11:33:51 PM

Quote:

If their OLTP numbers are accurate, they will crack TPC-C.

If their OLTP numbers are accurate, they're already matched by an existing Clovertown system.

Go to www.tpc.org

Click on Non clustered TPC-H

Opteron OWNS 100GB and 300GB DB sizes. Only Power and Itanium's ability to expand beyond 8Way is keeping Opteron at bay for higher sizes and TPC-C. Barcelona should reach 16Way(a whopping 64 cores) at least with the L3 helping with cache coherency. Even an 8Way makes 32 Opteron+ cores.

The clustered results show Opteron leading the way at most DB sizes.
January 30, 2007 11:50:25 PM

Quote:

If their OLTP numbers are accurate, they will crack TPC-C.

If their OLTP numbers are accurate, they're already matched by an existing Clovertown system.

Go to www.tpc.org

Click on Non clustered TPC-H
Who cares about TPC-H. The benchmark shown by AMD is TPC-C and in that the score is already achieved by an existing Clovertown system.
January 31, 2007 12:00:34 AM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.


Buddy. Did you even read the article that you linked?? The article states 10-15% for TPC-C. And it's not SpecFP, it's SpecFP_Rate. I can show you why. It looks like you didn't.

Quote:
David Kanter: AMD has claimed an advantage based on performance models, which are extremely accurate but may not account for faster speed grades from Intel, of around 10-15% for TPC-C and 40% for SPECfp_rate.
January 31, 2007 12:01:10 AM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415


I didn't read any other answer, so I might sound off track.

I couldn't care less about SPEC numbers. What I want to know is how fast it'll encode my home video and play my games... all at the same time. Everything else is not worth my time.

Gotta say though that I'm desperately looking for real app numbers. Keep me posted if you can find some. :wink:
January 31, 2007 12:05:29 AM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415

There isn't any info about integer performance. Also, it is impossible to boost ALU performance out of nothing, and K8L has no improvements to the ALU.
TPC-C is not an ALU benchmark; it is an on-line transaction processing benchmark. It mostly measures system bandwidth.
That quote is borrowed from David Kanter's article at RealWorldTech.
http://www.realworldtech.com/page.cfm?ArticleID=RWT012707024759
Quote:
However, what happens beyond the middle of the year is subject to quite a bit of uncertainty, with rhetoric issuing from both camps. AMD has claimed an advantage based on performance models, which are extremely accurate but may not account for faster speed grades from Intel, of around 10-15% for TPC-C and 40% for SPECfp_rate. The latter is likely to be somewhat of an outlier, but it is clear that AMD will be strongest in high performance computing workloads. AMD’s performance is largely attributed to microarchitectural improvements and a high level of system integration.

Those claims come directly from AMD's mouth. Until we see a K8L ES benchmark, we can't conclude how fast Barcelona will be.

Then maybe you should ask Jack. The widened L1-L2, OoO loads, 2x128 bit loads/retires per cycle, enhanced prediction, larger branch history, updated stack handler, better TLBs, SSE4A(yes there are SSE int instructions), etc will enhance int performance probably significantly(>10%). I would say even more (closer to 20%) but we'll see.
January 31, 2007 12:07:02 AM

Quote:
According to this guy, the 40% number that AMD announced is for floating point only. For integer they expect an increase of 10-15% over the clovertown. Still wonder how this translates to real world apps though and if this 40% advantage is only when comparing clock for clock.

http://blogs.zdnet.com/Ou/?p=415


I didn't read any other answer, so I might sound off track.

I couldn't care less about SPEC numbers. What I want to know is how fast it'll encode my home video and play my games... all at the same time. Everything else is not worth my time.

Gotta say though that I'm desperately looking for real app numbers. Keep me posted if you can find some. :wink:

You can believe that SPEC is a relevant benchmark for determining general performance. Much better than 3DMark.
January 31, 2007 12:29:55 AM

Quote:
Then maybe you should ask Jack. The widened L1-L2, OoO loads, 2x128 bit loads/retires per cycle, enhanced prediction, larger branch history, updated stack handler, better TLBs, SSE4A(yes there are SSE int instructions), etc will enhance int performance probably significantly(>10%). I would say even more (closer to 20%) but we'll see.


Sorry Baron, despite your fancy marketing talk, the majority of the advantage comes from simpler reasons. Relative to its platform, Barcelona is crappier than Clovertown is relative to the 1333MHz FSB. Or to put it differently, Barcelona is equal to Clovertown per core, but it gains an advantage from its better memory subsystem.

Desktop apps couldn't care less about platform superiority. It's different in servers, however, which is why Core microarchitecture CPUs will be uncompetitive in greater than 4P environments due to the crappier platform (despite the superior core).
January 31, 2007 1:00:39 AM

Quote:

If their OLTP numbers are accurate, they will crack TPC-C.

If their OLTP numbers are accurate, they're already matched by an existing Clovertown system.

Go to www.tpc.org

Click on Non clustered TPC-H
Who cares about TPC-H. The benchmark shown by AMD is TPC-C and in that the score is already achieved by an existing Clovertown system.

The lead score in TPC-C is IBM Power 5+. The Top 10 doesn't contain an X86 processor.

The Opteron owns the other relevant bench for transactions.
January 31, 2007 1:04:29 AM

Quote:
Yep, that is a wide range all right :) ... people have caught on to the play on words, very much like AMD's 40% improvement for 65 nm announcement last year .... which of course has yet to materialize.



The spin doctor says I believe they meant "native" 65nm designs.
January 31, 2007 1:10:48 AM

Quote:
What did C2D score in SPECfp before release? This would give us an idea of actual averages.


The talk here is not Core 2 Duo, but the Xeon 5100. Plus, AMD is talking about SpecFP_Rate. The first benchmarks that were touted on the Xeon were SpecFP, the single threaded benchmark. AMD is really talking about 2P systems.
January 31, 2007 1:11:38 AM

I keep hearing talk about "native" 65nm. Is there somehow some "glued" or otherwise non-native version of 65nm?
January 31, 2007 1:12:41 AM

Quote:
What did C2D score in SPECfp before release? This would give us an idea of actual averages.


I have no idea. I think we need to wait a few weeks for numbers to start "officially" coming out.
January 31, 2007 1:17:35 AM

Quote:
I keep hearing talk about "native" 65nm. Is there somehow some "glued" or otherwise non-native version of 65nm?


I can't give you specifics, but A64 was originally designed for 130nm. Since Kuma is designed for 65nm, certain aspects of the wafer process can be optimized to take advantage of the new design. I guess it's like putting a 7950GX2 on AGP 2.0 first and then updating it to PCIe (loosely).
January 31, 2007 1:26:09 AM

Quote:
http://www.behardware.com/news/8484/amd-to-demonstrate-...

Here's an owie to add to the current claim.

At the same time as the release of the 4x4 platform, AMD made the demonstration during the Industry Analyst Forum of a server using 4 Quad-Core Opteron and a total of 16 cores.

The Opteron is codenamed Barcelona and will use the 65 nm process. Just like with the Athlon 64 X2, AMD insist on the fact that this is a "native" Quad core solution and not two dies packaged together like Intel does.

AMD also said that the Opteron will be compatible with existent Socket F platforms. About performances, AMD announces respectively 13% and 46% improvements with TPC OLTP and SPECfp compared to a Xeon 5355. These performances are however only based on estimations and that means that the chip isn't yet fully functional.

AMD also showed that their employees can have their picture taken with their Quad Core like Intel does. They even took two pictures, one with the Barcelona on the left and one with a Xeon without its protection:


:wink: :lol: 


That music.......

The SPECfp numbers are slightly lower and the OLTP numbers are slightly higher, so it looks like their analog equipment is pretty accurate.
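To see what the quoted 13% and 46% figures would imply in absolute terms, here is a small sketch. The baselines are assumptions borrowed from the estimate posted earlier in this thread (a SPECfp_rate of 104 for a 2S Xeon 5355, and 222,117 for the HP BL480c Clovertown system in TPC-C); the quoted article does not say which Xeon 5355 results AMD compared against, so treat the outputs as illustrative only. The `apply_gain` helper is a made-up name.

```python
# Assumed Xeon 5355 baselines, taken from the estimate earlier in the thread.
XEON_5355_SPECFP_RATE = 104.0
XEON_5355_TPMC = 222_117  # HP BL480c; assumed to be the comparison system


def apply_gain(baseline: float, gain_pct: float) -> float:
    """Return the score implied by a percentage improvement over a baseline."""
    return baseline * (100.0 + gain_pct) / 100.0


# Percentages from the quoted article: 46% (SPECfp) and 13% (TPC OLTP).
implied_specfp_rate = apply_gain(XEON_5355_SPECFP_RATE, 46.0)
implied_tpmc = apply_gain(XEON_5355_TPMC, 13.0)

print(f"Implied SPECfp_rate: {implied_specfp_rate:.1f}")
print(f"Implied TPC-C score: {implied_tpmc:,.0f}")
```

Note that these figures depend entirely on which Clovertown results are taken as the baseline, which is why the sketch keeps the baselines as named constants that are easy to swap.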
January 31, 2007 1:31:51 AM

Quote:
I keep hearing talk about "native" 65nm. Is there somehow some "glued" or otherwise non-native version of 65nm?


I can't give you specifics, but A64 was originally designed for 130nm. Since Kuma is designed for 65nm, certain aspects of the wafer process can be optimized to take advantage of the new design. I guess it's like putting a 7950GX2 on AGP 2.0 first and then updating it to PCIe (loosely).

Oh. Thanks. So it's the transition away from the microarchitecture's original process size that makes it non-native.
January 31, 2007 1:37:26 AM

Quote:

The lead score in TPC-C is IBM Power 5+. The Top 10 doesn't contain an X86 processor.

The Opteron owns the other relevant bench for transactions.

The Opteron does well in the TPC benchmarks other companies don't really care about, and usually don't even participate in with many models. The TPC-C benchmark receives the most attention from companies and here, Intel scores higher and based on the preliminary information, Barcelona will be no faster than today's Clovertown.
January 31, 2007 2:47:37 AM

Quote:

The lead score in TPC-C is IBM Power 5+. The Top 10 doesn't contain an X86 processor.

The Opteron owns the other relevant bench for transactions.

The Opteron does well in the TPC benchmarks other companies don't really care about, and usually don't even participate in with many models. The TPC-C benchmark receives the most attention from companies and here, Intel scores higher and based on the preliminary information, Barcelona will be no faster than today's Clovertown.

The TPC-C Top 10 DOESN'T CONTAIN AN X86 PROCESSOR.

I think Barcelona will change that.

Do you think the humongous indexes Google uses don't need tremendous levels of bandwidth and transactional power while keeping purchase and power costs down?

I would bet that Dell, Rackable and Google will see the first "official" benchmarks and get the first procs and systems.

But then MS may want more for Longhorn server.
January 31, 2007 3:06:01 AM

Quote:

The lead score in TPC-C is IBM Power 5+. The Top 10 doesn't contain an X86 processor.

The Opteron owns the other relevant bench for transactions.

The Opteron does well in the TPC benchmarks other companies don't really care about, and usually don't even participate in with many models. The TPC-C benchmark receives the most attention from companies and here, Intel scores higher and based on the preliminary information, Barcelona will be no faster than today's Clovertown.

The TPC-C Top 10 DOESN'T CONTAIN AN X86 PROCESSOR.

I think Barcelona will change that.
So what? Most companies buy 2S to 4S servers, where x86 does very well. The top x86 server is currently 15th fastest and that's only using Paxvilles.

The fact is that AMD used TPC-C for their estimate, and that score is not particularly great.

Quote:
Do you think the humongous indexes Google uses don't need tremendous levels of bandwidth and transactional power while keeping purchase and power costs down?

Google is reputed to use many commodity PC systems for their servers, not big multi-socket servers.
January 31, 2007 3:29:05 AM

Quote:

The lead score in TPC-C is IBM Power 5+. The Top 10 doesn't contain an X86 processor.

The Opteron owns the other relevant bench for transactions.

The Opteron does well in the TPC benchmarks other companies don't really care about, and usually don't even participate in with many models. The TPC-C benchmark receives the most attention from companies and here, Intel scores higher and based on the preliminary information, Barcelona will be no faster than today's Clovertown.

The TPC-C Top 10 DOESN'T CONTAIN AN X86 PROCESSOR.

I think Barcelona will change that.

Do you think the humongous indexes Google uses don't need tremendous levels of bandwidth and transactional power while keeping purchase and power costs down?

I would bet that Dell, Rackable and Google will see the first "official" benchmarks and get the first procs and systems.

But then MS may want more for Longhorn server.

Don't wager too much.

Intel inside again for new Google servers: http://news.com.com/Google+using+Intel+servers+again/21...
January 31, 2007 3:55:47 AM

Quote:
I keep hearing talk about "native" 65nm. Is there somehow some "glued" or otherwise non-native version of 65nm?
yeah, they're actually at 32nm already and glued them together for some reason... :wink:
January 31, 2007 9:12:35 AM

Quote:
Isn't Xeon a Core uarch Woodcrest core?


Doesn't matter. There is no 2P/4P Conroe system.