Why does AMD win in Floating Point?

March 28, 2007 2:57:14 AM

Won't have the time to read tonight since my eyes are drooping and such, but the question is as above. Why do K8 processors and for that matter, possibly the upcoming K10, always beat Intel processors when it comes to floating point benches and apps?
Thanks.

March 28, 2007 3:34:14 AM

Quote:
Won't have the time to read tonight since my eyes are drooping and such, but the question is as above. Why do K8 processors and for that matter, possibly the upcoming K10, always beat Intel processors when it comes to floating point benches and apps?
Thanks.



Architecturally, I'd say it's because Intel went away from x87 code and towards SSE while AMD remained on X87 with 3DNow! extensions. I wish I had time to delve more deeply but alas, such is not the case.

Maybe tomorrow.
March 28, 2007 4:03:22 AM

Quote:
Won't have the time to read tonight since my eyes are drooping and such, but the question is as above. Why do K8 processors and for that matter, possibly the upcoming K10, always beat Intel processors when it comes to floating point benches and apps?
Thanks.



Architecturally, I'd say it's because Intel went away from x87 code and towards SSE while AMD remained on X87 with 3DNow! extensions. I wish I had time to delve more deeply but alas, such is not the case.

Maybe tomorrow.

Yes, the Baron is correct. Floating point math originally ran on a separate 80x87 co-processor and moved onto the CPU die from the 486DX on (not the SX, as that denoted no on-die co-processor). Just a side note the SX / DX notation meant something different for the 386 (SX=16bit, DX=32bit). When Intel developed SSE (an answer to AMD's 3DNow), they believed that software developers would move away from the old x87 instructions towards the parallelized SSE instructions. For the most part this was true, but x87 instructions are still used in some areas (I have no explanation for why). So Intel sacrificed on-die space (and consequently efficiency) meant for the x87 core and used that space for the SSE execution unit. The old FPU is still there, but Intel hasn't put much into it, in an effort to coerce developers to move their code to SSE.

To say AMD is better at floating point math is a little misleading. If you are talking x87 instructions, yes, AMD is better. If you are talking SSE (SIMD) instructions, Intel crushes AMD. It just depends whether the app supports SSE or not. Most modern apps (written in the last 5 years or so) support SSE. Again, I'm not sure why x87 would be used in favor of SSE. SSE handles repetitive (same) instructions on large data sets more efficiently than x87 instructions.
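
To make the "same instruction on a large data set" point concrete, here is a toy Python model. It is not real SIMD code, and the function names and the operation counter are made up for illustration. It assumes single-precision data, where a 128-bit SSE register holds four floats, and it only counts issued add operations, not cycles.

```python
LANES = 4  # a 128-bit SSE register holds four 32-bit floats


def scalar_add(a, b):
    """Add element by element, one add operation per pair (x87-style loop)."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops


def packed_add(a, b):
    """Add in chunks of LANES elements, one add operation per chunk (SSE-style)."""
    out, ops = [], 0
    for i in range(0, len(a), LANES):
        # In hardware, all lanes of the chunk are added by a single instruction.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        ops += 1
    return out, ops


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [2.0] * 10
    print(scalar_add(a, b)[1], "scalar adds vs", packed_add(a, b)[1], "packed adds")
```

Both functions produce the same sums; only the number of issued operations differs. How much of that turns into real speed depends on how wide the core's execution units actually are.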

One thing to mention, MMX does for integer math what SSE does for floating point math. It was a great selling feature for Intel (as AMD didn't have it for a long time) at the time, but there wasn't much use for it. I think voice recognition uses MMX quite heavily, but I can't think of too many other examples that utilize heavy integer computations.
March 28, 2007 7:15:27 AM

Are floaters like loggers? Think I just saw a loggerhead...
March 28, 2007 12:25:44 PM

@All

Ah thanks. So it's not floating point per se, but apps that depend on x87 code.
March 28, 2007 2:37:36 PM

Who the frack keeps one-starring Da Ninja???

Shame on you. :wink:
March 28, 2007 2:50:22 PM

Quote:
@All

Ah thanks. So it's not floating point per se, but apps that depend on x87 code.



For now, at least until K10 which will reportedly do 2x128bit SSE. There are still a lot of things that use X87. That's why things like ScienceMark run so well on K8.
March 28, 2007 6:07:28 PM

ScienceMark is a synthetic benchmark, so things like it have no practical use.
March 28, 2007 6:40:06 PM

Quote:
ScienceMark is a synthetic benchmark, so things like it have no practical use.


Not!

Science Mark is more valid than SuperPi as Primordia and the like do actual mathematical code not just divisions and remainders.
March 28, 2007 11:02:39 PM

Quote:
To say AMD is better at floating point math is a little misleading. If you are talking x87 instructions, yes, AMD is better. If you are talking SSE (SIMD) instructions, Intel crushes AMD. It just depends whether the app supports SSE or not. Most modern apps (written in the last 5 years or so) support SSE. Again, I'm not sure why x87 would be used in favor of SSE. SSE handles repetitive (same) instructions on large data sets more efficiently than x87 instructions.


x87 shows up in a lot of the programs I've seen.
fadd, fmul etc... All over the place!
That said, a lot of the instructions in the set aren't used.

I'm not sure what compilers are doing these days. I can't see everyone writing their bog standard float stuff in assembler. I know you can turn sse options on, never looked to see what the difference in output is.

I think x87 will be phased out, though only for programs so demanding that they wouldn't run anyway on a processor old enough to lack SSE.

Sorry if that doesn't make any sense, I have a headache!
March 29, 2007 2:11:23 AM

Quote:
ScienceMark is a synthetic benchmark, so things like it have no practical use.


Not!

Science Mark is more valid than SuperPi as Primordia and the like do actual mathematical code not just divisions and remainders.

Baron, you are incorrect.... ScienceMark is indeed a synthetic benchmark, it is a collection of 7 different scientific code blocks intended to stress the CPU, particularly the floating point capabilities of the CPU. There are several benchmarks that do this, PCMark and 3DMark to name a few. Oddly, if you download the software, Softpedia describes the benchmark as not optimized for an architecture, yet it consistently shines on AMD platforms.... several reasons, really --- the FPU is stronger on the K8.... but I wonder..... let's see....

Download a copy of ScienceMark 2.0 and do 'About ScienceMark', 5 people contributed to the project:

Tim Wilkens, Ph.D.
Sean Stanek, B.S.
Julian Ruhe, Ph.D.
Per Kjellgren, M.S.
Alexander Goodrich, B.S.

Now, I wonder.... if you use google (it is your friend as you pointed out in another thread) would any of these people pop up....

a) Let's start with Tim....
Yep, he did a presentation --
http://www.amd.com/us-en/assets/content_type/Downloadab...
Wow.... what a coincidence.... ok, so could happen.... let's try another one.

b) Ok, well let's check out Julian.... not him... he could not possibly be a programmer who optimizes for AMD....
Ooops, here it is again:
Quote:
Julian Ruhe, a very talented ASM programmer, sent us an Athlon-optimized version of Stream. This optimized binary makes use of 3DNow! and MMX to get the most out of the memory subsystem.
An ASM programmer optimizing for AMD.... http://www.aceshardware.com/Spades/read.php?article_id=...
Well, it turned up.... another coincidence...

c) Sean, Sean Stanek could not be connected to AMD could he???

Yep:
Quote:
ApusHardware interviews AMD's code guru Sean Stanek

http://techreport.com/news_archive_overview.x/2000/8/28
Fascinating.... alas, the link TechReport did not point to the original interview, or I did not read down far enough.

d) Ok, how about Alexander, he is just a B.S., he could not have anything to do with AMD optimizations... well, nope...
Quote:
Alexander Goodrich and Sean Stanek, who are both software engineers that are involved with AMD-optimizations

http://www.thg.ru/cpu/20001206/print.html

Could not find anything on Per Kjellgren M.S., perhaps this is all just a coincidence and ScienceMark actually is not biased ;)  ... what is funny about this is I knew one of the developers worked for AMD, per other discussions around the net but never dug... I just found all this within the last 10-15 minutes by simple googling. :p 

Ok... well, I do agree, ScienceMark 2.0 does run real code... but no more nor less real than SuperPI, which perplexes me to no end.... I would think that SuperPI would be your favorite benchmark of them all.... for you see, Pi is irrational --- which you should relate to very easily.

Pi is the essence of the universe, simply put it is the ratio of the circumference of a circle to the diameter of that circle. Some may say that it even has cosmic meaning or proves the existence of God.

I can draw a circle, I can sketch the diameter of such a circle. I can even measure both circumference and diameter exactly, but if I measure one, even though Pi is the ratio of the two, I cannot precisely calculate the other... for you see, Pi is irrational -- very much like you, our dear friend Baron, it just keeps going and going and going -- ...

The mathematical concept is that Pi has an infinite, non-repeating decimal expansion: it is an irrational number that cannot be specified to exact certainty. This little fact has puzzled philosophers, scientists, and mathematicians for centuries. Entire doctoral dissertations have been devoted to calculating Pi to the last digit; they have been unsuccessful. Carl Sagan romanticized Pi in his fiction. Pi is what binds us and what drives us crazy.

It is within this computational challenge that calculating Pi is useful: producing 1 million decimal digits of Pi, or even 8 million, requires heavy lifting, many many iterative processes of successive approximation followed by minimization. The speed at which one may arrive at the 1 million digit calculation is a direct measure of the speed of the calculator. A person attempting it on an abacus would take some 2132 years, a person attempting it on a slide rule may do it in 1722 years, a person with a hand calculator might do it in 826 years, on an early mainframe -- just a guess, but maybe a few months. On an 8088 years ago, perhaps a day.... but with Core 2 at about 4.6 GHz it takes 11 seconds.
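
As a rough illustration of the "successive approximation" described above, here is a minimal Python sketch of the Gauss-Legendre iteration, which Super PI is commonly reported to use (an assumption here, not verified against its source). The function name, the 10 guard digits, and the iteration count are illustrative choices. Each pass needs multiplies, a divide, and a square root at full precision, and roughly doubles the number of correct digits.

```python
from decimal import Decimal, getcontext


def gauss_legendre_pi(digits):
    """Return Pi as a string with `digits` decimals (truncated, not rounded)."""
    getcontext().prec = digits + 10  # guard digits to absorb rounding error
    a = Decimal(1)
    b = Decimal(1) / Decimal(2).sqrt()
    t = Decimal(1) / Decimal(4)
    p = Decimal(1)
    # Correct digits roughly double per pass, so about log2(digits) passes suffice.
    for _ in range(max(1, digits.bit_length())):
        a_next = (a + b) / 2
        b = (a * b).sqrt()
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2
    pi = (a + b) ** 2 / (4 * t)
    return str(pi)[: digits + 2]  # keep "3." plus the requested decimals


if __name__ == "__main__":
    print(gauss_legendre_pi(50))
```

Timing this for a large `digits` value gives a feel for why the benchmark scales with arithmetic throughput, though Python's `decimal` module says nothing about how a tuned native implementation behaves.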

"Computing pi is the ultimate stress test for a computer - a kind of digital cardiogram" - Ivars Peterson

(The above was posted by myself in another thread with a minor edit ....... did not feel like reinventing the wheel but the truth in Pi remains).

Jack


Well, we appreciate your well thought and admittedly long and drawn out attempt to show that to mathematically calculate Pi you need only do division with remainders.
Science Mark actually uses differential and integral calculations with x87 code and memory accesses, but your continued insistence upon demeaning and harassing statements merely allows me to conclude that I am under your skin, through no fault of my own.
March 29, 2007 2:32:34 AM

Quote:
Won't have the time to read tonight since my eyes are drooping and such, but the question is as above. Why do K8 processors and for that matter, possibly the upcoming K10, always beat Intel processors when it comes to floating point benches and apps?
Thanks.

Because the K8 has 3 floating point units and 3 integer units (per core).
The 486 - has 1 FPU, 2 INT
Pentium - 2 FPU, 2 INT
Pentium Pro, II, III, IV has 2 FPU, 3 INT
C2D has 2(or 3) FPU, 4 INT.
The Nehalem will be totally different. I think 8 & 8 !!

SSE are special purpose, so they can't speed up 3D Studio much for example. But can speed up video encoding.

But there are many other factors also.
March 29, 2007 3:37:22 AM

Quote:

Just a side note the SX / DX notation meant something different for the 386 (SX=16bit, DX=32bit).


16/32 bit BUS. Both are 32-bit internally, but the 386SX has a different socket with only a 16-bit data bus.

Quote:

For the most part this was true, but x87 instructions are still used in some areas (I have no explanation for why).


Obviously, because of backward compatibility. The majority of software does not depend on FPU performance anyway, so there is no reason why it should not run on a PIII or original Athlon.

Mirek
March 29, 2007 10:32:13 AM

Quote:
Obviously, because of backward compatibility. The majority of software does not depend on FPU performance anyway, so there is no reason why it should not run on a PIII or original Athlon.

Mirek


Yep, even when you do see FPU instructions, they are few and far between.
Integer strength is much more important imo. There are no floating point memory addresses (0x10.400 etc.). Pretty much everything a program does uses integer math in some way or form.
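
A small Python sketch of that point: even reading one element out of an array of doubles starts with integer arithmetic to find its address. `element_address` is a hypothetical helper written for this example; `array.buffer_info()` and `itemsize` are standard library and report the real base address and element size.

```python
from array import array


def element_address(base, index, itemsize):
    """Address of element `index`: a pure integer multiply and add."""
    return base + index * itemsize


values = array("d", [1.5, 2.5, 3.5])    # floating point data
base, count = values.buffer_info()       # (address, number of elements)
addr = element_address(base, 2, values.itemsize)

if __name__ == "__main__":
    print(hex(base), hex(addr), addr - base)
```

The data is floating point, but the index scaling and the pointer math that locate it are all integer work.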
March 29, 2007 12:53:23 PM

Quote:
SSE are special purpose, so they can't speed up 3D Studio much for example.


Why? I believe SSE is the reason why Lightwave renders used to run much faster on Pentium 4 than supposedly equivalent AMD chips (don't know how they compare today).

If Lightwave can do it, then 3D Studio can too.
March 29, 2007 2:28:55 PM

Quote:
SSE are special purpose, so they can't speed up 3D Studio much for example.


Why? I believe SSE is the reason why Lightwave renders used to run much faster on Pentium 4 than supposedly equivalent AMD chips (don't know how they compare today).

If Lightwave can do it, then 3D Studio can too.
Ok, 3D studio "can" but doesn't.
I remember reading this from the early days of the Pentium 2/3
Let me get more info..
March 29, 2007 7:04:57 PM

Quote:
Obviously, because of backward compatibility. The majority of software does not depend on FPU performance anyway, so there is no reason why it should not run on a PIII or original Athlon.

Mirek


Yep, even when you do see FPU instructions, they are few and far between.
Integer strength is much more important imo. There are no floating point memory addresses (0x10.400 etc.). Pretty much everything a program does uses integer math in some way or form.

That depends on the application. Windows, Word, Explorer, Outlook are examples of applications that do not do too much FP (afaik...).

OTOH, games or video processing can be pretty FP intensive.

Mirek
March 29, 2007 9:08:00 PM

The importance of the CPU's FP performance has diminished quite a lot since the days of the Pentium III and AMD TB.
Today most FP number crunching is done on the GPU, unless your full time hobby is calculating Pi, or similar.

What I'm waiting to see is how AMD will glue the CPU and GPU (with all its FP units) together. Will it be an el cheapo system on a chip or some kind of an uber-CPU, I don't know, but I'm waiting for it with interest.