Athlon XP and SSE

G

Guest

Guest
I've been starting to wonder how efficiently the Athlon XP uses SSE instructions. Very little software seems to even notice that it can.

SSE is used on AXP processors by, for example, the Intel-optimized DivX 4.11 codec, which seems to speed up quite a bit.

But in gaming it looks like a totally different case. Is DirectX 8.1 automatically using SSE (when working with an AXP CPU)?
How about 3DNOW! and SSE working simultaneously, is that possible?

The next line is an excerpt from my wolfconfig.cfg file (Return to Castle Wolfenstein):
<b>seta r_lastValidRenderer "GeForce2 GTS/AGP/3DNOW!"</b>
I suppose this implies that SSE _is_not_used_.
Hopefully some Intel users can cut and paste their equivalent of this line here so that we can be sure.
If anyone has more information about this issue, or just wants to correct my spelling, please go ahead, the stage is yours ;)

tyia
-raaggu
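
For anyone comparing config files, here is a minimal Python sketch that pulls the last slash-separated field out of an `r_lastValidRenderer` line. The function names are made up for illustration, and it is only an assumption that the last field names the SIMD instruction set; the string may come from the graphics driver rather than from the game itself.

```python
import re

# Matches a Quake-engine style config line: seta <name> "<value>"
SETA_RE = re.compile(r'^seta\s+(\S+)\s+"([^"]*)"\s*$')


def parse_renderer_line(line):
    """Return the slash-separated fields of an r_lastValidRenderer line, or None."""
    # Drop the forum's bold tags in case the line was pasted from this thread.
    cleaned = re.sub(r"</?b>", "", line.strip())
    match = SETA_RE.match(cleaned)
    if match is None or match.group(1) != "r_lastValidRenderer":
        return None
    return match.group(2).split("/")


def simd_label(line):
    """Return the last field (assumed here to name the SIMD set), or None."""
    fields = parse_renderer_line(line)
    return fields[-1] if fields else None


if __name__ == "__main__":
    print(simd_label('seta r_lastValidRenderer "GeForce2 GTS/AGP/3DNOW!"'))
```

Running it over the line from several machines makes it easy to see which label each system reports.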
 

MadCat

Distinguished
Jan 6, 2001
230
0
18,680
On my Pentium3 system I get this:

<b>seta r_lastValidRenderer "GeForce 256/AGP/SSE"</b>

Interesting!

This <A HREF="http://forumz.tomshardware.com/modules.php?name=Forums&file=faq&notfound=1&code=1" target="_new">thread</A> also raised the question as to when SSE was utilized in applications. I was looking for this Intel bias in RTCW but gave up. Looks like you found it. Seems like SSE is not always selected when present.
 

MadCat

The point I think he was trying to make was that it might be to the Athlon XP's advantage to use SSE over 3DNOW!. For example, it might result in higher benchmark scores when comparing processors with the competition. Even the RTCW MP Test Demo that Anandtech uses as a benchmarking tool has this same line in the config file, indicating that 3DNOW! is preferred over SSE.
 

8procstooslow

Distinguished
Dec 31, 2007
40
0
18,530
AMD must have a much better implementation of SSE than the P4 has. I mean, for the same instruction at a lower clock speed, the AXP is faster.
OK, I don't know much about chip design: do the SSE instructions use the great FPU, or is the implementation just so much better than Intel's?
 
Guest

The Athlon XP's implementation of the SSE instructions is called 3DNow! Professional. That's probably why 3DNow! shows up instead of SSE.
 

MadCat

RTCW was well into development before the Athlon XP came out. If they wanted to make a distinction between 3DNow! and SSE, it was probably to indicate which instruction set was being used. The selection of which instruction set to use is sometimes based on the processor vendor ID (i.e. "AuthenticAMD" or "GenuineIntel").
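
To make the vendor ID idea concrete, here is a rough Python sketch. CPUID leaf 0 returns the 12-character vendor string in the registers EBX, EDX, ECX (in that order); the selection function `pick_path_by_vendor` is purely hypothetical and only illustrates how a vendor-keyed check would behave. It is not taken from RTCW or any real driver.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the vendor ID from CPUID leaf 0 register values (EBX, EDX, ECX)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def pick_path_by_vendor(vendor):
    """Hypothetical vendor-keyed selection: it never looks at feature bits."""
    if vendor == "GenuineIntel":
        return "SSE"
    if vendor == "AuthenticAMD":
        return "3DNOW!"
    return "x87"


if __name__ == "__main__":
    amd = vendor_string(0x68747541, 0x69746E65, 0x444D4163)
    intel = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
    print(amd, "->", pick_path_by_vendor(amd))
    print(intel, "->", pick_path_by_vendor(intel))
```

A check like this would put every AMD chip on the 3DNOW! path, whether or not it also supports SSE, which would be consistent with what the config files in this thread show.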
 

FatBurger

Illustrious
"The Athlon XP's implementation of the SSE instructions is called 3DNow! Professional. That's probably why 3DNow! shows up instead of SSE."

Exactly right, LosingStreak.


Interesting information few people seem to know: the Athlon XP has 52 SSE commands. According to Intel, the Pentium 3 has 70. Hold on though, that 70 includes the original MMX commands (that's right, when Intel says "SSE", they actually mean SSE and MMX). So if you include the original 19 MMX commands, the Athlon XP actually has 71 total SSE commands. Yeup, one more than the Pentium 3. Interesting, no? I would assume that the P4 has the same as the P3.

SSE2 is the P4's advantage. The problem is, SSE2 is just starting to show up in the marketplace. Sure, Tribes 2 and other games have had SSE2 for several months. But just like with MMX and SSE, the first programs utilizing it don't do so very well. Good SSE2 implementation is still a little way off. Should AMD be putting SSE2 in their processors right away? Of course, but Intel won't let them. Interpret that how you will.

As for FPU, most people think that the P4 has a screwed up FPU.
Let me ask you something, if you had a team of two guys and a team of three guys that had to push a car the length of a city block, and the three guys did it in 1:00 and the two guys did it in 1:10, which guys are stronger? The team of two guys are obviously stronger, since there's only two of them.

The Pentium 4 has two FPUs, the Athlon has 3. Let the myth of the P4's "screwed up FPU" be laid to rest.

<font color=orange>Quarter</font> <font color=blue>Pounder</font> <font color=orange>Inside</font>
 

MadCat

Who knows, maybe the developers of RTCW would have indicated "3DNow!+ Pro" if they had taken advantage of the SSE feature of AXP. It may simply mean they employed 3DNow! instructions. So we really don't know at this point.
 

AMD_Man

Splendid
Jul 3, 2001
7,376
2
25,780
"As for FPU, most people think that the P4 has a screwed up FPU.
"Let me ask you something, if you had a team of two guys and a team of three guys that had to push a car the length of a city block, and the three guys did it in 1:00 and the two guys did it in 1:10, which guys are stronger? The team of two guys are obviously stronger, since there's only two of them.

"The Pentium 4 has two FPUs, the Athlon has 3. Let the myth of the P4's 'screwed up FPU' be laid to rest."

I don't care how many FPUs the P4 has, its FPU performance is still terrible. It takes a ~3GHz P4 to match the FPU performance of a 1.2GHz Athlon in SiSoft Sandra (not including SSE2 performance).

AMD technology + Intel technology = Intel/AMD Pentathlon IV; the <b>ULTIMATE</b> PC processor
 

FatBurger

The Pentium 4's FPU underperforms the Athlon's, there's no doubt about that.

The myth I am trying to dispel is that the P4's FPU needed to be fixed. Lots of people think that there is something wrong with it, when in reality, it's just that there is one less.

 

MadCat

This is what I found in the 'q3config' file for Quake 3 Demo!

<b>seta r_lastValidRenderer "GeForce3/AGP/3DNOW!"</b>

Now if I can just figure out how to force this program to use SSE (assuming of course this information indicates that SSE is not being used at all)!
 

FatBurger

I thought the Q3 engine didn't support SSE?

 

MadCat

Here's a quote from <A HREF="http://firingsquad.gamers.com/hardware/pentium3/page4.asp" target="_new"> Firing Squad's interview of John Carmack</A>, Lead Programmer, id Software on Quake 3: Arena

"Most of the Katmai optimizations [for Quake 3] are in the OpenGL drivers. <b>We may have some loops in the main code Katmai optimized,</b> but it is a low priority. Because up to 75% of the execution time of the game is in the graphics driver, most of the burden of optimization is theirs. I know that Intel is working with ATI and Katmai on their drivers.
"In theory, Katmai provides 4x the single precision floating point performance, but you would never see that on a real algorithm, let alone a full system level benchmark.

"I believe that the driver guys are getting about a 25% total speedup with Katmai optimizations. Combined with the clock rate boost, that is a significant win."

Maybe the SIMD type information is passed back from the graphics drivers?
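
The figures in that quote can be put through Amdahl's law as a quick sanity check. This is only a sketch: the 75% driver share and the 4x and 25% numbers come from the quote, while treating them as inputs to this formula is my own assumption about how to read them.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall gain when only `fraction` of run time gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    driver_share = 0.75  # "up to 75%" of execution time in the graphics driver

    # Reading 1: the driver code alone runs 1.25x faster.
    print(round(overall_speedup(driver_share, 1.25), 3))

    # Reading 2: the theoretical 4x applies to all driver time (an upper bound).
    print(round(overall_speedup(driver_share, 4.0), 3))
```

Under the first reading the whole game gains roughly 1.18x; even the unrealistic 4x case tops out at about 2.29x, because the remaining 25% of the time is untouched.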
 

kief

Distinguished
Aug 27, 2001
709
0
18,980
The 2 or 3 FPU description is great, but that is a design limitation of the chip. A car with a 4 popper may produce more power per cylinder than a 6 popper, but the 6 is more powerful overall. I'd still take the 6!!! Just cuz the P4 design team didn't have the room, budget, or any other excuse for a 3rd does not make it right or faster =)

Jesus saves, but Mario scores!!!
 

FatBurger

I never said the P4 performed better in floating point operations. In fact, I explicitly said the exact opposite.

 
Guest

Please read the original post before replying to this thread. The point <b>is not</b> whether the P4 has a good FPU structure or not.
 

texas_techie

Distinguished
Oct 12, 2001
466
0
18,780
Alright, as someone pointed out, AMD has licensed SSE from Intel and called it 3DNow! Professional. Yes, AMD added some additional instructions on top of the original SSE.
AMD has NOT been given the license for SSE2, but that agreement (or lack of one) is only good through 2002. So you may see SSE2 in AMD chips in 2003.
Not sure if SSE or 3DNow! is better. Never really paid attention. Anyone know of some head-to-head comparisons?

year 2010: Intel? Who's that?
 
Guest

Q1: Why isn't SSE used by many apps/games that support it when they run on an AXP instead of a P3+ CPU?

A1: My theory is that such software was written <b>before</b> the AXP came to market. The code in these programs was written (wrongly) to check the processor ID: if the result is an Intel P3+, SSE is used; if it's an AMD Athlon+, 3DNOW!+ is used. (The K6-2 has a simpler set of 3DNOW! instructions than the Athlon.)
The code <b>should</b> have been written to check whether SSE instructions are present.
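
Here is roughly what a feature-based check could look like, sketched in Python on raw register values. The bit positions are the documented CPUID feature flags (leaf 1 EDX for MMX/SSE/SSE2, extended leaf 0x80000001 EDX for 3DNow!). The function names, the preference order, and the example register values are my own assumptions for illustration, not dumps from real chips. Real code would also need to confirm that the operating system supports SSE state before using it.

```python
# Feature bits in EDX, per the documented CPUID layout.
MMX_BIT = 1 << 23            # standard leaf 1
SSE_BIT = 1 << 25            # standard leaf 1
SSE2_BIT = 1 << 26           # standard leaf 1
EXT_3DNOW_BIT = 1 << 30      # extended leaf 0x80000001: extended 3DNow!
AMD_3DNOW_BIT = 1 << 31      # extended leaf 0x80000001: 3DNow!


def supported_sets(std_edx, ext_edx=0):
    """Return the set of SIMD extensions flagged in the given register values."""
    found = set()
    if std_edx & MMX_BIT:
        found.add("MMX")
    if std_edx & SSE_BIT:
        found.add("SSE")
    if std_edx & SSE2_BIT:
        found.add("SSE2")
    if ext_edx & AMD_3DNOW_BIT:
        found.add("3DNOW!")
    if ext_edx & EXT_3DNOW_BIT:
        found.add("3DNOW!+")
    return found


def pick_path_by_feature(std_edx, ext_edx=0):
    """Hypothetical preference order: SSE if present, else 3DNow!, else x87."""
    found = supported_sets(std_edx, ext_edx)
    if "SSE" in found:
        return "SSE"
    if "3DNOW!" in found:
        return "3DNOW!"
    return "x87"


if __name__ == "__main__":
    # Illustrative values only: an SSE-capable AMD chip, then a pre-SSE one.
    both = (MMX_BIT | SSE_BIT, AMD_3DNOW_BIT | EXT_3DNOW_BIT)
    only_3dnow = (MMX_BIT, AMD_3DNOW_BIT | EXT_3DNOW_BIT)
    print(pick_path_by_feature(*both))
    print(pick_path_by_feature(*only_3dnow))
```

With a check like this, a chip that flags both SSE and 3DNow! is free to take the SSE path no matter which vendor made it.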

Q2: How much difference does SSE make compared to no multimedia optimization?

A2:
1) J. Carmack guesses 25% in the drivers, but not much in their Q3 code itself. (Well, 25% sounds like quite a lot to me. -raaggu)
2) Jim Belcher, Director of Simulations at Infogames, says that SSE optimization made their Descent 3 and Wargasm engines 10-15% faster. (And combined with the graphics driver optimizations, that has a HUGE impact! -raaggu)

Source: http://firingsquad.gamers.com/hardware/pentium3/page4.asp
and
http://firingsquad.gamers.com/hardware/pentium3/page5.asp

Q3: Is 3DNOW!+ superior to SSE (and how about SSE2?)

A3: No it is not. SSE is faster but 3DNOW! optimization is much easier to do.

Sources:
SSE versus 3DNow!, A Developer's Perspective
http://www.hardwareanalysis.com/content/front_page/news_spotlights/article/1320/

and voodooextreme, quoting among others:
John Carmack, id Software (DOOM, Quake series)
Tim Sweeney, Epic Games (Unreal series)
Jim Malmros, Insomnia Software
Dean Sekulic, Croteam
Nathan d'Obrenan, Firetoads Software
Phil Steinmeyer, PopTop
Rik Heywood, Synaptic Soup
Joel Huenink, 4D Rulers
Brian Hook, formerly of id and Verant
Chris Rhinehart, Humanhead Studios

http://www.voodooextreme.com/articles/intelvsamddeveloperbattle.html

That's all, folks. Let me know what you think :)
 
Guest

Not a word about <b>3DNOW! Pro in there..</b> but I believe it's 3DNOW! plus an exact copy of SSE, since Intel made the DivX 4.11 SSE optimization and that works with the AXP too.

But how about those games, games, games... I haven't seen a single game that tells me it utilizes SSE or <b>3DNOW! PRO</b> when running on an AXP. Therefore I'm "persuaded" to believe that the AXP is using only 3DNOW!+ in them.
 

Matisaro

Splendid
Mar 23, 2001
6,737
0
25,780
"The Pentium 4 has two FPUs, the Athlon has 3. Let the myth of the P4's 'screwed up FPU' be laid to rest."

I think when people say bad fpu performance, they mean in general, not per fpu.

"The Cash Left In My Pocket,The BEST Benchmark"
No Overclock+stock hsf=GOOD!
 
Guest

"A3: No it is not. SSE is faster but 3DNOW! optimization is much easier to do."

That is an untrue statement: 3DNow! requires a library to be implemented. That means more hassle and more time to optimize the code. Also note that MMX, SSE, SSE2, and OpenGL extensions are used for advanced math calculations, to lower overhead and to extend the calculations done on the floating-point units and graphics processing units.

-Spuddy


:lol: Go Ahead Peel Me. I Dare Ya" :lol:
 

Kelledin

Distinguished
Mar 1, 2001
2,183
0
19,780
"That is an untrue statement: 3DNow! requires a library to be implemented."

And compared to SSE...a library is harder to use than hand-coded assembly? Especially if you have to define new macros within your assembly-language source code to handle "unsupported" SSE instruction mnemonics?

Compared to using Intel's compiler, yes, it could be more difficult--depending on whether AMD's library just transparently replaces other libraries, or requires a source code rewrite. Of course, Intel's compiler costs money <i>on top</i> of already paying for MSVC (unless it's purely for non-commercial use), which is why it isn't used more often. With either Intel's compiler or AMD's library, compiled code needs another round of testing to make sure it doesn't trip on any bugs in the add-ons.

<i>If a server crashes in a server farm and no one pings it, does it still cost four figures to fix?