HAMMER scaling ability is scary

October 19, 2002 6:46:21 PM

http://www.3dcenter.org/artikel/2002/10-18.php
*corrected link*

As seen here, Hammer murders all the other processors in scaling: add more MHz and the scores rise much higher.

Opteron gains about 17% in SPEC FPU per 200 MHz boost
P4/Xeon gains about 5% in SPEC FPU per 200 MHz boost

This means that if AMD can ramp the clock speeds, they will be very potent against the next level of P4s.
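
To get a feel for what those per-step gains would add up to if they held, here is a rough sketch. The compounding model, the name `project_score`, and the base score of 1000 are all assumptions for illustration; none of them come from the article.

```python
def project_score(base_score, gain_per_step, steps):
    """Score after `steps` clock bumps of 200 MHz each, assuming every bump
    multiplies the score by (1 + gain_per_step). Hypothetical model only."""
    return base_score * (1.0 + gain_per_step) ** steps

# Gains quoted in the post: ~17% (Opteron) and ~5% (P4/Xeon) per 200 MHz.
# The base score of 1000 is a made-up placeholder, not a measured result.
opteron = project_score(1000.0, 0.17, 2)   # two steps, i.e. +400 MHz
p4_xeon = project_score(1000.0, 0.05, 2)
```

Whether real chips keep the same percentage gain at every step is exactly what would need checking against the measured numbers.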

Edited by popegoldx on 10/19/02 05:54 PM.
October 19, 2002 7:05:36 PM

That's not a correct link.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
October 19, 2002 7:07:10 PM

Well, since I can't see anything other than a case in the right corner, I'll just assume you're crazy and go about my day.

-Jeremy

Just some advice from your friendly neighborhood blue man :smile:
October 19, 2002 7:30:34 PM

:eek: My bad, sorry. I'm sleepy today and the store is so very cold.

-Jeremy

Just some advice from your friendly neighborhood blue man :smile:
October 19, 2002 7:57:33 PM

Is that you, poopy? Maybe you should post a link that makes sense instead of spreading AMD FUD.
October 19, 2002 9:55:53 PM

Didn't know graphs and numbers needed translating.
October 19, 2002 10:02:05 PM

Looking at the SPEC databases they linked to, I don't see Opteron scores. I wonder where they got them from.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
October 19, 2002 11:42:52 PM

Opteron gets beaten in SPECfp 2000? Is that significant? I have no clue what these tests are.

...And all the King's horses and all the King's men couldn't put my computer back together again...
October 19, 2002 11:52:04 PM

Hmm, the performance is definitely not scaling linearly. Notice the 1.6 GHz to 1.8 GHz step: POWERFUL, more than 200 points in Int, but from 1.8 to 2 it only gives ~117. Once again the scaling has problems; I am surprised AMD didn't even look at that!
Had it been consistent like the 1.6 to 1.8, it would indeed be scary, as 1 GHz would lead directly to 1000 points, completely busting any P4 of any type, including Prescott!
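
The extrapolation is just a linear rate, which can be sketched like this. The function name `points_per_ghz` is made up for illustration, and the inputs are the rough deltas quoted above (200 is a floor, since the first step gave "more than 200").

```python
def points_per_ghz(delta_points, delta_mhz):
    """Linear extrapolation: points gained per 1000 MHz at the given rate."""
    return delta_points * 1000.0 / delta_mhz

# Deltas taken from the post; each step is 200 MHz wide.
step_low = points_per_ghz(200, 200)    # 1.6 -> 1.8 GHz step
step_high = points_per_ghz(117, 200)   # 1.8 -> 2.0 GHz step
```

At the first step's rate a full GHz is worth about 1000 points; at the second step's rate it is closer to 585, which is the drop-off being pointed out.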

--
"Let Go." -Avril Lavigne
October 19, 2002 11:55:13 PM

Strange, since the K8 core is based on the K7, which was the king of x86 FP. It wasn't beaten by the 2.8 GHz though, so it still maintains the FPU lead in the x86 world. What I'd like is an SSE2 fight.

--
"Let Go." -Avril Lavigne
October 20, 2002 12:01:03 AM

Eden, you've gotta remember this is Opteron, not ClawHammer. If AMD can get Opteron chips down to $500 USD, then it'll compete with the Prescott; otherwise, that's just a moot point.

...And all the King's horses and all the King's men couldn't put my computer back together again...
October 20, 2002 12:07:07 AM

Opteron is just the MP name, CH and SH can both be Opterons!
Athlon DT can also be Sledge but it'll be for small 1-2 way servers.

And yes, it shouldn't go against Prescott, but I said that for the sake of the CPUs in the graphs' comparison chart. What I find odd is why the Xeon 2.8 GHz, using an NW core, is weaker than the P4?! I mean, I thought it used a smarter cache design and SMP capabilities (which the P4 doesn't have anyways), Hyper Threading enab...Ooohhh, maybe that's why...
--
"Let Go." -Avril Lavigne
Edited by Eden on 10/19/02 08:09 PM.
October 20, 2002 12:13:08 AM

lol... seems like HT isn't all that great for the Xeons, huh?

...And all the King's horses and all the King's men couldn't put my computer back together again...
October 20, 2002 12:54:09 AM

Quote:
Strange, since the K8 core is based on the K7, which was the king of x86 FP. It wasn't beaten by the 2.8 GHz though, so it still maintains the FPU lead in the x86 world.

Looks like you answered your own musings there. :wink: AMD is the king of <i>x87</i> FP, and x87 FP just isn't that great.

<i>I can love my fellow man...but I'm damned if I'll love yours.</i>
October 20, 2002 12:57:41 AM

Heheh, I guess SPECfp must use some FP ops that aren't x86-limited then? (Considering IA64 FPUs seem to dominate, and only at 1 GHz, so you can imagine at 2 GHz!)

BTW, why does the FPU use an x86+1 number, x87?

--
"Let Go." -Avril Lavigne
Edited by Eden on 10/19/02 08:58 PM.
October 20, 2002 3:32:22 AM

Quote:
BTW, why does the FPU use an x86+1 number, x87?

Back in the old days of the 386 and before, Intel CPUs didn't have FPUs built-in, they were sold separately. Motherboards had a second socket for the FPU chip, which was called an x87 (387 for the 386, 287 for the 286, etc). Both chips would then run together simultaneously.

The 486DX was the first Intel CPU to integrate the FPU into the core. Intel also sold a budget 486SX, which was the 486 core with the FPU removed. You could then buy the 487 unit separately if you decided later that you wanted the FPU. Curiously, the 487 was actually a full-fledged 486, and when you put it on the motherboard it would simply deactivate the 486SX completely. So in this case the two chips would not execute together; the 486SX was actually doing nothing!
October 20, 2002 3:36:50 AM

But that doesn't explain why they named it with a 7!

Also, back then, with no FPU, how the heck did the CPU possibly live through this? How did it calculate anything decimal?
It scares me to think of an FPU-less workaround!

--
"Let Go." -Avril Lavigne
October 20, 2002 5:22:19 AM

Quote:
Also, back then, with no FPU, how the heck did the CPU possibly live through this? How did it calculate anything decimal?

You are young yet, Jedi apprentice.

Quote:
It scares me to think of an FPU-less workaround!

Fear leads to anger. Anger leads to stress. Stress leads to doobies... :wink:

1) Apps could do IEEE-compliant floating-point operations using generic bit-manipulation techniques and produce the same end results as a numeric coprocessor. Very, very slow.

2) The O/S could trap the "numeric coprocessor not present" exception and do the work of (1). Again, very, very slow. Advantageous in that apps usually didn't have to account for the possibility of a missing FPU.

3) Apps could store and manipulate numbers in BCD (Binary Coded Decimal) format. Many developers were doing this anyways, simply because it allowed greater range/precision than most FPUs. Still rather slow.
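
For a taste of what option (1) looks like, here is a minimal sketch of a single-precision multiply done with nothing but integer operations. It is not any particular emulation library's code: the function names are made up, and it only handles zero and normal numbers (no NaN, infinity, subnormals, or overflow/underflow checks).

```python
import struct

def float_to_bits(x):
    """Reinterpret a value as IEEE-754 single-precision bits (rounds to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b):
    """Reinterpret 32 bits as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def soft_fmul(a_bits, b_bits):
    """Multiply two single-precision values using integer ops only.
    Sketch: zero and normal inputs only, round to nearest even."""
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a == 0 or exp_b == 0:
        return sign                       # treat zero/subnormal input as zero
    # Put the implicit leading 1 back to get 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000
    product = man_a * man_b               # top set bit is bit 46 or bit 47
    exp = exp_a + exp_b - 127             # add exponents, drop one bias
    if product & (1 << 47):               # significand product was >= 2.0
        shift = 24
        exp += 1
    else:
        shift = 23
    man = product >> shift                # keep the top 24 bits
    rem = product & ((1 << shift) - 1)    # bits that were shifted out
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (man & 1)):
        man += 1                          # round to nearest, ties to even
        if man == (1 << 24):              # rounding carried out of 24 bits
            man >>= 1
            exp += 1
    return sign | (exp << 23) | (man & 0x7FFFFF)
```

Calling `bits_to_float(soft_fmul(float_to_bits(1.5), float_to_bits(2.5)))` gives 3.75. Even in this stripped-down form, one multiply turns into a couple dozen integer steps, which is why it was so slow.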

<i>I can love my fellow man...but I'm damned if I'll love yours.</i>
October 20, 2002 5:36:28 AM

Oh great Jedi Master, it pains me to mention that you forgot one of the great tools of the early 90s: the fixed-point technique. It certainly had its limitations, but at one point I had a whole 3D transformation pipeline coded in fixed point.
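
For anyone who hasn't seen it, a fixed-point transform can be sketched roughly as below. The 16.16 format and every name here are assumptions for illustration, not the actual pipeline mentioned above.

```python
Q = 16            # assumed 16.16 format: 16 integer bits, 16 fractional bits
ONE = 1 << Q

def to_fixed(x):
    """Convert a real number to its 16.16 integer representation."""
    return int(round(x * ONE))

def to_real(f):
    """Convert a 16.16 integer back to a real number."""
    return f / ONE

def q_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> Q

def transform(matrix, vec):
    """3x3 fixed-point matrix times a 3-component fixed-point vector."""
    return [sum(q_mul(m, v) for m, v in zip(row, vec)) for row in matrix]

# 90-degree rotation about the z axis, stored as fixed-point integers.
ROT_Z_90 = [[to_fixed(0), to_fixed(-1), to_fixed(0)],
            [to_fixed(1), to_fixed(0),  to_fixed(0)],
            [to_fixed(0), to_fixed(0),  to_fixed(1)]]

point = [to_fixed(1), to_fixed(2), to_fixed(3)]
rotated = [to_real(c) for c in transform(ROT_Z_90, point)]
```

Python integers never overflow, so this hides one of the real limitations: on 32-bit hardware the intermediate product needs 64 bits or some careful range management.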

Complicated proofs are proofs of confusion.
October 21, 2002 10:05:36 AM

As others have said, calculations were made using alternative techniques. But if you are interested in the "power" of an old CPU, just select it in SiSoft Sandra and see the numbers. The progress made is just astonishing!


DIY: read, buy, test, learn, reward yourself!
October 21, 2002 2:56:05 PM

Just some trivia.
The 387 FPU was much more expensive than the 386 CPU, IIRC.
October 21, 2002 4:01:12 PM

Quote:
Also, back then, with no FPU, how the heck did the CPU possibly live through this? How did it calculate anything decimal?

Well practically all games until Quake didn't use the FPU at all and they ran fine! Not sure whether System Shock or Ultima Underworld used the FPU, but they were both fully 3D games that came out before Quake. Clearly, games with full 3D graphics can be coded to be tremendously faster if they use the FPU, so they all do today.

Not sure about modern 2D games though (which today are becoming fewer and farther between). Anyone know if games like Age of Wonders 2, Red Alert 2, or HOMM4 use the FPU? I know they often use MMX or SSE.

Ritesh
October 21, 2002 6:36:28 PM

Yes, but they're incredibly inaccurate. That is, pictures don't look as smooth, shapes don't look as round, and models aren't positioned as accurately. That's really the point of FP in games. If all you wanted was speed, integer would be much better.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
October 21, 2002 7:33:43 PM

Ironically, integer arithmetic is still slower on modern processors. If you were to do a 3D pipeline in integer, you would have to shift-adjust each calculation, which causes a non-parallel dependency. Thus, each calculation would cost you at least 3 ticks, whereas floating-point calculations can enter the sub-tick range.

Complicated proofs are proofs of confusion.
October 21, 2002 11:52:24 PM

That would be true, but in modern MPUs, integer calculations can take as little as 1 clock, while almost all FP operations are pipelined and take up to 45 clocks to complete. Plus modern MPUs usually have greater integer resources (by which I refer to the P4).

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
October 22, 2002 2:07:01 AM

Oh the argument with god. Please believe me when I say floating-point beats integer in every case.

The 20+ tick floating-point operations you speak of are all <A HREF="http://www.dictionary.com/search?q=transcendental" target="_new">transcendental</A> in nature (e.g. sin (96-192), cos (97-196), sqrt (19-35), etc.), which can't be done with integer anyways. Division is also costly but can be avoided with reciprocal multiplication. Even a Pentium class processor can achieve floating-point multiply and addition operations at 1 per tick when proper pipelining is used. Pentium class processors cannot do this with integer operations. With modern Athlon class processors, you can get close to 1 tick per integer multiply. However, the latency for an integer operation to complete is greater than that of a floating-point operation (integer 4-9 vs. floating-point 4). Athlon class processors can achieve close to 2 fp mults/adds a tick due to their 3 floating-point units, and the Pentium 4 can do even better with SSE2.
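
The reciprocal trick mentioned above, as a minimal sketch (the function names are made up for illustration): pay for one divide up front, then multiply every time after that.

```python
def scale_by_division(values, divisor):
    """Straightforward version: one divide per element."""
    return [v / divisor for v in values]

def scale_by_reciprocal(values, divisor):
    """One divide in total, hoisted out of the loop, then only multiplies."""
    inv = 1.0 / divisor
    return [v * inv for v in values]
```

The two versions can differ in the last bit for some inputs, since `v * (1.0 / d)` rounds twice where `v / d` rounds once. Python itself won't show the speed difference; the point is the shape of the transformation, which is what pays off in a tight inner loop on real hardware.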

Some comparisons of Athlon integer vs floating point latencies.

imul: signed 5-9, unsigned 4-8 (the lower numbers are reg to reg, the higher numbers are mem to mem)
fmul: 4 (single/double precision)

idiv: word 26-27, dword 43-44, depending on addressing mode
fdiv: 16 single precision, 20 double precision

add (register to register): 1
add (mem to mem): 4
fadd: 4 (single/double precision)

So on an Athlon processor (and any other superscalar processor) floating point beats integer in every case.

Add to this the fact that if you are using integer arithmetic, you have to shift-adjust your product after multiplication. This causes a dependent operation that prevents pipelining.

Complicated proofs are proofs of confusion.
October 22, 2002 1:59:40 PM

Not exactly sure on the Athlon but the Pentium 3 optimization guide states that average imul latency is 4 cycles vs average fmul latency of 5, with a throughput of 1/2 in the case of imul and 1/9 in the case of fmul:

http://gcc.gnu.org/ml/gcc/2001-11/msg00205.html

I'll look up some numbers on the Athlon as well.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
October 22, 2002 7:06:43 PM

Please read
<A HREF="http://developer.intel.com/design/pentium4/manuals/2489..." target="_new">Intel® Pentium® 4 Processor Optimization Reference Manual</A>
and
<A HREF="http://developer.intel.com/design/pentiumii/manuals/245..." target="_new">Intel® Architecture Optimization Reference Manual</A>

Sadly, Intel shows no timings for the PIII other than MMX, but here are some comparisons for the P4.

add latency 0.5 throughput 0.5
fadd latency 5 throughput 1

imul latency 14-18 throughput 3-5
fmul latency 7 throughput 2

idiv latency 56-70 throughput 23
fdiv latency 23-58 throughput 23-58

So the P4 kicks ass in adding but loses in multiplying, which is much more important to graphics. Floating point still wins hands down.

I know for a fact that the PIII is no magic monster. I've seen the docs in the past, before they moved the timings to the P4, and they were equal to or worse than the Athlon's.
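
To see how latency and throughput play out differently, here is a very simplified cycle-count sketch. The model and function names are assumptions for illustration; it ignores everything else going on in the core and uses the P4 multiply figures quoted above, taking the low end of the imul ranges.

```python
def dependent_chain_cycles(n_ops, latency):
    """Each op needs the previous result, so the latencies simply add up."""
    return n_ops * latency

def pipelined_cycles(n_ops, latency, throughput):
    """Independent ops: a new one can issue every `throughput` cycles,
    and the last one issued still takes `latency` cycles to finish."""
    return latency + (n_ops - 1) * throughput

# P4 figures from the post (low end of the quoted imul ranges).
FMUL = {"latency": 7, "throughput": 2}
IMUL = {"latency": 14, "throughput": 3}

fmul_independent = pipelined_cycles(100, FMUL["latency"], FMUL["throughput"])
imul_independent = pipelined_cycles(100, IMUL["latency"], IMUL["throughput"])
fmul_chained = dependent_chain_cycles(100, FMUL["latency"])
imul_chained = dependent_chain_cycles(100, IMUL["latency"])
```

With independent multiplies the throughput figure dominates; with a dependent chain the latency figure is what you pay every time.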

Just so you understand fixed point arithmetic.

0x00000fff x 0x00000fff = 0x00ffe001
0x00ffe001 >> 8 = 0x0000ffe0

You multiply then you shift.
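
The same multiply-then-shift step as a small sketch. The 8 fractional bits are implied by the shift in the example above; the name `fx_mul` is made up for illustration.

```python
FRAC_BITS = 8   # implied by the ">> 8" in the example

def fx_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values, then shift the product back down
    so it has `frac_bits` fractional bits again."""
    product = a * b            # carries 2 * frac_bits fractional bits
    return product >> frac_bits

raw = 0x00000fff * 0x00000fff            # unadjusted product
result = fx_mul(0x00000fff, 0x00000fff)  # shift-adjusted product
```

The shift can't start until the multiply has finished, which is the dependent operation being talked about.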

Complicated proofs are proofs of confusion.