ExtremeTech: 3.2Ghz P4 vs Opteron 2Ghz

mr_gobbledegook

Distinguished
Sep 3, 2001
468
0
18,780
More conclusive proof that this is a 'gamer's' processor.

<A HREF="http://www.extremetech.com/print_article/0,3998,a=59324,00.asp" target="_new">ExtremeTech - </A><i><font color=blue>It's apparent that the Opteron 146 is a natural born killer when it comes to gaming performance. The Pentium 4 only manages a dead heat in one test, Comanche 4, which had previously been a "no contest" lead for the P4. In all the other tests, the Opteron 146, running at a 1.2GHz deficit, walks all over the Intel CPU.</font color=blue></i>

<b>However, rather worryingly, it seems to fall short in the rest of the benchmarks, i.e. Content Creation and any software that utilises Hyper-Threading or SSE2. I don't think AMD can easily claim the performance crown in this respect.</b>
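
For a rough sense of what a "1.2GHz deficit" implies, here is a minimal sketch of the arithmetic. The clock speeds come from the thread title; the function name is made up for illustration, and the dead-heat case is the only performance ratio taken from the quoted review.

```python
P4_GHZ = 3.2       # from the thread title
OPTERON_GHZ = 2.0  # from the thread title


def relative_per_clock(perf_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Per-clock throughput of chip A relative to chip B,
    given A's overall performance as a ratio of B's."""
    return perf_ratio * clock_b_ghz / clock_a_ghz


deficit_ghz = P4_GHZ - OPTERON_GHZ
dead_heat = relative_per_clock(1.0, OPTERON_GHZ, P4_GHZ)

print(f"clock deficit: {deficit_ghz:.1f} GHz")
print(f"dead heat means the Opteron does {dead_heat:.2f}x the work per clock")
```

So even the one tied benchmark implies a large per-clock advantage in that test; it says nothing about the workloads where the P4 wins.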

<font color=purple>Ladies and Gentlemen, it's...Hammer Time!</font color=purple>
 

c0d1f1ed

Distinguished
Apr 14, 2003
266
0
18,780
Nice results, but I'm afraid all future games will use SSE2 and support Hyper-Threading...

I'm also very curious how the non-server model, Athlon 64, will perform, and especially how much it will cost. Is it possible that Opterons are so expensive in order to recover some of the research costs? Or will high-clocked Hammers cost as much?
 

mr_gobbledegook

Then why do some of the latest games using DX9 perform up to 40% better on Opteron?

<A HREF="http://www.anandtech.com/cpu/showdoc.html?i=1856" target="_new">Anandtech</A> - <font color=blue><i>When you find game benchmarks 10% to 20% higher, you are genuinely impressed.<b> However, in some of the very latest DX9 benchmarks, Athlon64/Opteron was 40% to 50% faster.</b> </i></font color=blue>

 

c0d1f1ed

Everything? As far as I know, the DirectX library and driver run in a single thread, and SSE is only used in software emulation mode.
 

AMD_Man

Splendid
Jul 3, 2001
7,376
2
25,780
Um, the DX library runs with direct access to the video card and CPU resources. It supports SSE and SSE2, and every other instruction set that can potentially speed up 2D and 3D operations.

Intelligence is not merely the wealth of knowledge but the sum of perception, wisdom, and knowledge.
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
Not really. SIMD optimizations usually require hand-tuning to achieve the best results. That is very difficult, if not impossible, to do in a high-level language without some kind of SIMD primitive, and DX9 currently doesn't offer one.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 

c0d1f1ed

Sure, it 'supports' it. But exactly what would use it, except for the software emulator? Geometry processing is nowadays fully hardware accelerated with vertex shaders. Physics is done with hand-optimized libraries.

So I believe there must be another reason for Opteron's performance increase with DirectX games. And I also think that games that do use libraries optimized for SSE (Comanche?) and Hyper-Threading (next-gen games?) will benefit from Intel's architecture.

AMD has always had rather poor SIMD performance and they probably are not going to add Hyper-Threading soon. On the other hand they have more execution units for plain 386 instructions, which still form the majority in most applications. So I still think that, when applications take advantage of Intel's architecture, we might see a totally different picture.

Anyway, I'm glad AMD is finally back in the high-end market and kicking some Intel ass!
 

imgod2u

Why is the Opteron such a great gaming platform? Let's look at the advancements that brought gaming performance increases in the past:

P2 increased FSB and memory from 66MHz to 100MHz.
P3 added better caching and a higher FSB/memory.
Athlon added higher FSB (DDR) and memory.
P4 added a higher FSB.
P4 added more cache and an even higher FSB.
P4 yet again added a higher FSB and memory subsystem.
Opteron adds an integrated memory controller that has lower latency and achieves a higher throughput than the P4-C.

Notice a pattern?
 

eden

Champion
Ok let's not get overhyped here.

I agree totally on the gaming performance, and I've said before what I think of it.

But if this CPU indicates how a 2GHz 3200+ will perform, I don't believe it will do well in multimedia. The scores, albeit better than the 3200+ AXP's, are still NO MATCH for the P4, and FAR from it. The outdated K7 design is so blatantly apparent. I wished it would've finally sped ahead with SSE2, but imgod2u is right, it really now rests on clock speed mainly, since we are talking streaming SIMD.

Oh and btw, those 3dMark scores? They sure contradict Tbreak's results if we extrapolate the 1024*768 results, which would lie, what, around 20000 points with the FX5900 Ultra?

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>Are you ugly and looking into showing your mug? Then the THGC Album is the right place for you!</b></font color=blue></A>
 

eden

Then why do some of the latest games using DX9 perform up to 40% better on Opteron ?

Anandtech - When you find game benchmarks 10% to 20% higher, you are genuinely impressed. However, in some of the very latest DX9 benchmarks, Athlon64/Opteron was 40% to 50% faster.
40-50% is a great number, but it means jack if the game already ran choppy!
It doesn't mean anything until you see the actual numbers. It jumped from the 30s to the 50s; that doesn't make the game THAT much more playable, since it is still under the 60FPS smoothly playable barrier. If 40-50% came at 60FPS and gave you like 80FPS, then yeah, that means a lot, but if a game ran at 15FPS and now runs at 22FPS, I couldn't care less if the % is 50%! It's still not any better for my game!
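
To put numbers on that, a small sketch follows. The 60FPS threshold is the one named above; the function names and the three starting frame rates are just picked for illustration.

```python
def sped_up_fps(base_fps: float, gain_percent: float) -> float:
    """Frame rate after a relative speedup of gain_percent."""
    return base_fps * (1.0 + gain_percent / 100.0)


def frame_time_ms(fps: float) -> float:
    """Time spent rendering one frame, in milliseconds."""
    return 1000.0 / fps


SMOOTH_FPS = 60.0  # the "smoothly playable" barrier from the post

for base in (15.0, 30.0, 60.0):
    new = sped_up_fps(base, 50.0)
    verdict = "smooth" if new >= SMOOTH_FPS else "still below 60"
    print(f"{base:.0f} -> {new:.1f} FPS, "
          f"{frame_time_ms(base):.1f} -> {frame_time_ms(new):.1f} ms per frame: {verdict}")
```

The same 50% gain only crosses the threshold when the starting point was already close to it, which is why the raw frame rates matter more than the percentage.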

 

imgod2u

I left out many of the Athlon's FSB increases for a reason: they didn't provide much of a speedup in gaming at all. The K7 (and K8) architectures are much more latency-dependent than bandwidth-dependent, unlike the P4. The only advancements in latency have been the nForce2 chipset for the K7 (which, yes, I forgot to mention) and the integrated memory controller in the K8. And surprise, surprise, both offered significantly higher gaming performance.
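
One way to see why latency can matter more than bandwidth is the textbook average-memory-access-time model, sketched below. Every figure in it (hit time, miss rate, both latencies) is hypothetical and chosen only to show the shape of the effect; none of them are measurements of a K7, K8 or P4.

```python
def average_access_ns(cache_hit_ns: float, miss_rate: float, memory_latency_ns: float) -> float:
    """Average memory access time: every access pays the cache hit time,
    and the fraction that misses also pays the main-memory latency."""
    return cache_hit_ns + miss_rate * memory_latency_ns


# Hypothetical values, for illustration only.
CACHE_HIT_NS = 1.0
MISS_RATE = 0.05

for label, latency_ns in (("off-chip controller", 100.0), ("on-die controller", 70.0)):
    amat = average_access_ns(CACHE_HIT_NS, MISS_RATE, latency_ns)
    print(f"{label}: {amat:.2f} ns per access on average")
```

Bandwidth does not appear in this model at all: for code that waits on individual cache misses, shaving the latency term helps while a wider or faster bus alone does not.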

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 

Schmide

Distinguished
Aug 2, 2001
1,442
0
19,280
Could the separation of AGP and memory controller, via the Xbar, have something to do with this gaming performance? The Opteron seems to do average on operations that strictly involve memory and above average on operations that involve the HyperTransport tunnel to AGP.

Dichromatic for your viewing pleasure...
 

imgod2u

It could be, but if that were true, wouldn't the K7 benefit from having a higher FSB running async with memory as well?
 

Kelledin

Distinguished
Mar 1, 2001
2,183
0
19,780
I wished it would've finally sped ahead with SSE2, but imgod2u is right, it really now rests on clock speed mainly, since we are talking streaming SIMD.
Do we know which core stepping this Opteron is?

Remember, the first released Opteron revisions stumble a bit on SSE2--specifically converting between integer and SSE2 data types. It's supposedly corrected in later steppings, but which stepping do we have here? If it's an engineering sample (review pieces often are), it's likely to be an early stepping.

<i>I can love my fellow man...but I'm damned if I'll love yours.</i>
 

spud

Distinguished
Feb 17, 2001
3,406
0
20,780
They never improved SSE on the K7, so I hardly think they will on the K8; that's R&D money they don't have. They will probably focus on process techniques and technologies with IBM. They need to scale, not increase the IPC any more, because if Prescott is anything like I think it is, they are going to need to scale quickly to stay in this race.

-Jeremy

:evil: <A HREF="http://service.futuremark.com/compare?2k1=6940439" target="_new">Busting Sh@t Up!!!</A> :evil:
:evil: <A HREF="http://service.futuremark.com/compare?2k3=1228088" target="_new">Busting More Sh@t Up!!!</A> :evil:
 

mr_gobbledegook

Schmide, I think you are right that the graphics performance is mainly attributable to the HT AGP tunnel/chipset and not the Opteron...

<A HREF="http://www.anandtech.com/cpu/showdoc.html?i=1856&p=8" target="_new">AnandTech </A> - <font color=blue><i> To satisfy curiosity, we also compared performance of the Workstation nVidia Quadro FX2000 video card on both the <b>dual Xeon</b> Intel 875 platform and the single-CPU Opteron platform.

You would expect that 2 Xeon 3.06 CPUs with 1MB of cache would be the clear winner of this comparison. The results, however, are quite surprising.

<b>The results are basically even, which is amazing considering we are comparing a single 2.0 GHz Opteron to Dual 3.06 Xeon with 1Mb cache.</b></i></font color=blue>

 

Kelledin

They never improved SSE on the K7, so I hardly think they will on the K8; that's R&D money they don't have.
They didn't "improve" Palomino's SSE support because it worked just fine from the official get-go. It just wasn't SSE2, which was arguably a much bigger mod than was technically feasible--mainly the XMM register size expansion. Updated instructions are relatively simple to add on a microcoded CPU; bigger registers are not.

(IIRC the K7 core taped out with some SSE instruction support, it just wasn't officially finalized--and didn't set the feature register bit--until the Palomino. So implementing SSE with Palomino's introduction wasn't a terribly big deal engineering-wise.)

Also, note that AMD already committed to correcting the SSE2 implementation hangups. According to their optimization guide, the most recent steppings do indeed fix the glitches. Problem is, considering that many (most? all?) review sites are currently working off older-revision overclocked parts, or possibly even engineering samples, reviews probably won't reflect the SSE2 revisions until sometime after the official release.

They need to scale, not increase the IPC any more, because if Prescott is anything like I think it is, they are going to need to scale quickly to stay in this race.
So far it doesn't seem like AMD will have much problem scaling. AFAIK they're already planning a quick bump to 2.4GHz not long after the September 23rd release.

Plus, Prescott clearly has a few teething problems of its own, so don't expect it to ramp immediately.

 

SJJM

Distinguished
Aug 7, 2003
228
0
18,680
I would have to agree on this too. I think that HyperTransport makes a difference.

<font color=blue>"You know, that my backstab attack does double the damage. I can make an off button for him." </font color=blue> :cool:
 

Kelledin

Not sure I follow you on that...if AGP8x gains us so little over AGP4x, and its host-bus bandwidth demands could be met by simple PC133, why would the HyperTransport tunnel make much difference?
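
For reference, the nominal peak figures behind that comparison can be worked out as clock x transfers per clock x bus width. This is a sketch using the standard published bus parameters (66.67MHz 32-bit AGP, 133.33MHz 64-bit PC133, a 16-bit HyperTransport link at 800MHz double data rate); the helper name is made up, and these are theoretical peaks, not measured throughput.

```python
def peak_mb_per_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_bits / 8


AGP_CLOCK_MHZ = 200 / 3    # 66.67 MHz base clock
PC133_CLOCK_MHZ = 400 / 3  # 133.33 MHz

buses = {
    "AGP 4x": peak_mb_per_s(AGP_CLOCK_MHZ, 32, 4),
    "AGP 8x": peak_mb_per_s(AGP_CLOCK_MHZ, 32, 8),
    "PC133 SDRAM": peak_mb_per_s(PC133_CLOCK_MHZ, 64, 1),
    "HyperTransport 16-bit, one direction": peak_mb_per_s(800, 16, 2),
}

for name, mb in buses.items():
    print(f"{name}: {mb:.0f} MB/s")
```

On paper AGP 4x and PC133 have the same peak, and the HyperTransport link has headroom over AGP 8x, so raw link bandwidth alone is an unlikely explanation for the gaming results.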

 

imgod2u

More importantly, if there were such a lack of bandwidth on the processor -> AGP bus path, why wouldn't running a K7's FSB at higher speeds (async with memory) bring the same gains in performance?