A64, P4EE and THG: the bottom line

TknD

Distinguished
Dec 31, 2007
I'm getting tired of the whole A64 vs. P4EE debate. I'm also getting tired of all the accusations about this and that, and about what's wrong and what's right.

So here is what I think is the bottom line:

<b>A64 vs P4EE:</b> there is no clear winner here. The CPU architectures are quite different and offer different advantages (and disadvantages), and that obviously shows in the benchmarks. For 32-bit code, at this moment, I think they are equal; there is no other sound way to look at it. People are basing entire arguments on a single benchmark where the other CPU is just a couple of "hairs" away. In my book, that's not a substantial win for either processor.

<b>A64 and 64bit code:</b> It will be a while before we see optimized applications, and even longer before compilers use optimization techniques tuned for the new architecture. Compilers are a lot of work and in many ways an art form. I think over the next few months we will see a significant amount of hand-optimized software for AMD64 come out.

I'm not sure why nobody is excited about AMD64 (because it's made/owned by AMD?), but it is new and available DESKTOP technology, and in my opinion new stuff is always exciting even if it means a lot of change... yet people seem more excited about Intel when they add another 2MB of cache to an existing CPU. My opinion is obviously skewed because I do program, and working with just 8 general-purpose registers in x86 is not fun. (Not that having 16 would be more fun, just that it makes some fairly simple things easier to optimize.) But this is an opinion, not something to argue about.

<b>THG accusations:</b> Yes, the added overclocked P4EE benchmarks were wrong, but the article has been updated. Everyone who needs to know about the flaw knows. It's over; stop beating a dead horse.
 

pitsi

Distinguished
Jan 19, 2003
Just a little something I was thinking. I don't think it is correct for a hardware review site like THG to recommend a product that doesn't exist (P4 EE) over another (A64 FX) that does exist (and don't tell me about limited supply, because right now anyone who wants an FX can get one). In two months, if the P4 EE is available, then OK, it's their right to do so if they think it's better, but right now it's a product that doesn't exist. It's like benchmarking the Athlon 64 FX-53 and declaring it the best CPU available, while everyone knows that this CPU won't be available to consumers for a few more months.
 

ytoledano

Distinguished
Jan 16, 2003
I read at Ace's Hardware that they compared a P4 3.4 EE, so does it exist?

Roses are <font color=red>red</font color=red>, violets are <font color=blue>blue</font color=blue>, post something stupid and I won't reply to you!
 

spud

Distinguished
Feb 17, 2001
Ace's also had an OC'd A64 at 2.4 GHz, making it the much-touted FX-53. Well, the CPU was unlocked, so it's really moot there. I don't see people hanging Ace's up on the cross for that. It's the double standard that's the problem, and the only ones who got upset are the AMD fanboys. I for one was like, meh, can't get a P4EE, probably won't either.

-Jeremy

:evil: <A HREF="http://forumz.tomshardware.com/modules.php?name=Forums&file=faq&notfound=1&code=1" target="_new">Busting Sh@t Up In My Buddies Face!!!</A> :evil:
 

dinkster9

Distinguished
Jun 27, 2001
Reminds me of the first-to-1GHz war. It was all on paper for quite a while. Intel made like 5 or something so reviewers could "play with them" and supposedly claim some title which they were late on. It does get a bit ridiculous to see a bunch of reviews and recommendations for CRAP you CAN'T get your hands on for months, or EVER in some situations... anyone seen that world-renowned (by THG) Q17 $350 LCD? No, I couldn't find the one with DVI either, and especially not for $350.

<font color=purple>A7n8xDX2.0|B2500+@200x11=3200+(2.2ghz)(1.775v)Corsair512twinxpc3200llpt(2-2-2-6)R9700P(stock)160Gx2WD7200RPM 8MB HD|Enermax460W|T:43C@N,50C@FL,MB 28C|3DMark2001 17087|3DMark2003 5134</font color=purple>
 

ksoth

Distinguished
Dec 31, 2007
On Ace's Hardware he had an overclocked Athlon 64 FX as well as an overclocked P4EE. I'm sure that no one would have been mad at THG if their review had both chips overclocked. THG only had the P4 overclocked, which made it look unfairly good against the Athlon 64 FX. I think it was bad of THG to do this, because a lot of enthusiasts got their first impression from the review the day it came out. I'm sure a lot of people didn't come back and reread the article and THG's explanation.

As for my opinion on the Athlon 64, I just don't know. The THG review doesn't make it look that good, but Anandtech and Ace's both have it doing pretty well against the P4, meeting or beating it in most tests. My next computer will be an Athlon 64, but that's mainly because I expect the Athlon to be a bit cheaper than the P4, yet perform well enough that I won't really notice the difference. I don't care about 5% or so. I mainly use my computer for web surfing and games; for web surfing it doesn't matter, and the Athlon seems to be a bit better with games (according to Ace's and Anandtech).

Also, AMD needs the support. :) Intel is great and all, but we really do need 2 big CPU makers to make sure prices stay down and innovation stays up.

-------------------------------------------
<font color=blue> "Trying is the first step towards failure." </font color=blue>
 

bronibbear

Distinguished
Dec 25, 2001
I'm pretty impressed with the A64. Remember, it’s a radical new design and still has a lot of room for improvement, e.g. better chipsets to support it.

Even so, it’s holding its own against Intel’s best P4s. That's more than can be said for Intel’s début of the P4, which struggled to keep up with its predecessor, the P3, in many benchmarks, let alone the Athlons of the time.
 

eden

Champion
Even then!

Even back then, the Athlon 64's architecture would still have been inferior to the P4's.

P4s have, IMO, the most amazing core architecture technologies for x86.
Radical new design, lol! It's only a K7 with extra features slapped on; get real, people. (That wasn't directed at you, reever.)

--
<A HREF="http://www.lochel.com/THGC/album.html" target="_new"><font color=blue><b>This just in, over 56 no-lifers have their pics up on THGC's Photo Album! </b></font color=blue></A> :lol:
 

eden

Champion
d00d, IMO the only reason it looks so superior on other sites is that most didn't test as many applications as THG did. Let's put it into perspective: the A64 rules games. OK, but just how much did Anandtech show in other areas?

THG used a WIDE variety of multimedia apps, not just a few, including many new ones like Nuendo. It was clear the P4 is still, and will continue to be, the ruler until AMD64 takes off. (Even then, I don't think for one bit that it will be due to 64-bit processing; it will be the added registers.)

THG so far seems to have the most diverse test suite, especially in anything other than gaming; however, I have no problem admitting the A64 is the killer gamer. Whenever the P4 won in recent games, it was by a slim margin.
However, ExtremeTech's multitasking benchmarks show the P4 easily ahead, at almost twice the performance. So I still think AMD should adopt HT.

 

spud

Distinguished
Feb 17, 2001
HT is very complex technology; I don't think AMD will be slapping it into any cores anytime soon. By the looks of it, AMD will settle for physical dual cores instead. But hey, IBM now has something similar to HT; they may license it.

-Jeremy

 

ksoth

Distinguished
Dec 31, 2007
The problem with comparing application and multimedia performance between the two is that there is so much P4-optimized software. I realize that's the way it is and the P4 justly performs faster, but it's because of the optimizations and not necessarily because the P4 is a better chip. I'm sure that if companies went ahead and optimized their software for the Athlon 64, the two would be much closer in many applications. It's just a shame that the software industry usually puts AMD to the side like that, but oh well.

Maybe the Athlon 64 will be a turning point for AMD and will actually get companies optimizing their software for it, especially when WinXP 64 comes out. If you had Photoshop, 3DStudio Max, Maya and all these other programs recompiled for 64-bit, making use of the extra registers and all that, you'd see quite a big improvement for the Athlon. And it would be really good, because with the Athlon you can run 32-bit applications and 64-bit programs in the same OS, so you'd get top-notch performance from recompiled and optimized programs, yet still be able to use all your older 32-bit software. I can't wait until the 64-bit part of the new Athlon goes mainstream.

 

TknD

Distinguished
Dec 31, 2007
Quote: HT is very complex technology; I don't think AMD will be slapping it into any cores anytime soon.
They won't, and HT wouldn't really benefit AMD's processors as much as Intel's.

Intel knows the P4 design and intentions; HT was essentially their way of squeezing maximum efficiency out of the chip architecture.

AMD went a different route to efficiency, which is more like good plumbing: basically making sure all of the pipes are always being utilized.

Both strategies have benefits and disadvantages. Personally, I think the AMD way is more logical (though it gets no multithreading benefit), but at times Intel's HT can be useful (and at other times not).
 

infiltrator

Distinguished
Sep 24, 2003
I think both AMD and Intel have interesting architectures; if it were not for one of them, the other would dominate, and we all know what that means. It's good to know that these vendors are competing. Intel have been around for a long time, and they definitely have the PR and production advantage (and knowledge). But anyone who underestimates AMD is a fool. AMD have been producing their own chipsets for MUCH less time than Intel, yet they have come to take on the industry's top x86 chip manufacturer. AMD IS a threat to Intel any way you see it. They may fail in the future, and they may not, but for now they are giving Intel a jolly good amount to worry about.

Don't get me wrong, Intel have a good footing to even take AMD to the cleaners. My point here is that neither of these 2 should be underestimated. For many years now, AMD have given us more BANG for our BUCK. Intel has been the Power Performer (for those who need power and have the cash), and AMD have changed this for now (until Prescott perhaps changes it back).

Also, each chip performs differently in different areas (Intel = number crunching and AMD = graphics?). Overclocking is yet another factor: most (95%?) people who buy chips will never OC them, so I think stock speeds should be used in major reviews. I like OC reviews too, but they should be labeled differently, so people can clearly see the difference. I like both, but I think reviews should put the CPUs head to head, with no 'to be released' chips, so the REAL comparison can be made.

Separate reviews can show how much you can get out of each chip and rate them as an OC review.
 

Shielder

Distinguished
Jul 18, 2003
We use Windows, Linux and Unix machines, and I have to say, based on our real-life tests (no optimised benchmarks here), the Athlon is much better at number crunching than the Pentium.

We trialled a number of systems against each other; the most notable ones were a 4-node PII cluster, a 1.0GHz UltraSPARC III system (Sun Blade 2000), a 1.4GHz Athlon and a 2.0GHz P4.

Which was fastest?

The Athlon.

Times for execution:
Sun - 128 minutes
P4 - 109 minutes
Cluster - 81 minutes
Athlon - 73 minutes.
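To put those times in perspective, here is the arithmetic as a short Python sketch (illustrative only; the `speedups` helper is a hypothetical name): speedup is the baseline time divided by each system's time. It only restates the four numbers above and says nothing about per-clock efficiency, since the machines ran at different clock speeds.

```python
def speedups(times, baseline):
    """Speedup of each system relative to `baseline`: baseline time / system time."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}

# Execution times in minutes, as listed in the post.
times_min = {"Sun": 128, "P4": 109, "Cluster": 81, "Athlon": 73}

vs_sun = speedups(times_min, "Sun")
fastest = min(times_min, key=times_min.get)  # shortest run time

for name, s in sorted(vs_sun.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {times_min[name]:4d} min  {s:.2f}x vs Sun")
print(f"Athlon vs P4: {times_min['P4'] / times_min['Athlon']:.2f}x")
```

On these figures the Athlon comes out about 1.75x faster than the Sun box and about 1.49x faster than the P4.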

Based on this, I would recommend AMD based processors over Intel for any number crunching application.

Games may be different (probably will be) but for our Monte Carlo calculations, AMD rules.

However, this is a pointless debate. Each processor has its strengths and weaknesses. What one user wants will be different to another's. You can't please everyone. If you like Intel, buy Intel. If you like AMD, buy AMD. But saying that you are right and everyone else is wrong/stupid/arrogant etc. etc. is not going to help any civilised debate.

Just my 2p.

(Sorry to ramble a bit, I need to break these boots in...)

There are lies, damn lies and statistics - Mark Twain.
 

eden

Champion
When I speak of superior/inferior architecture, I never make note of final performance. It is clear the K8 can hold its own.

What I meant is simply how the architecture is made. The P4 is in every aspect newer than the K7, whose design closely resembles the traditional P6. Same kind of layout, really.



 

eden

Champion
Quote: The problem with comparing application and multimedia performance between the two is that there is so much P4-optimized software. I realize that's the way it is and the P4 justly performs faster, but it's because of the optimizations and not necessarily because the P4 is a better chip. I'm sure that if companies went ahead and optimized their software for the Athlon 64, the two would be much closer in many applications. It's just a shame that the software industry usually puts AMD to the side like that, but oh well.
The only things I see as optimizable are the AMD64 features. Aside from those there isn't much, since it's a K7, so basically you have little to add beyond what you did before.
The P4 is being optimized because it offers more than just SSE2. Programmers can try programming the trace cache, the different ports, the way it decodes, HT, etc.
The more linear feel of the P4 requires constant streaming to perform adequately; it's a shame it doesn't work as well as the K8 per clock.

<P ID="edit"><FONT SIZE=-1><EM>Edited by Eden on 10/02/03 01:04 PM.</EM></FONT></P>
 

ksoth

Distinguished
Dec 31, 2007
How similar really is the K8 to the K7? Also, why does the fact that they are similar mean no optimization is possible? No programs were optimized for the old Athlon either.

 

TknD

Distinguished
Dec 31, 2007
Quote: The P4 is being optimized because it offers more than just SSE2.
No, they optimize for the P4 because it has a bigger market share. Pretty much every Dell sold has a P4.

Quote: Programmers can try programming the trace cache, the different ports, the way it decodes, HT, etc.
You can't program the trace cache; that's something the chip manages internally, storing already-decoded instructions along the path it predicts the branches will take. What you can do is try to keep the code that runs next in the chip's cache, but that's hard to guarantee and not something you can count on when programming.

What you CAN program for is HT, the instruction set extensions (SSE), and instruction latency. With HT, you have to divide your work into threads, which isn't always the best way to go, or even an option in some cases.
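As a rough illustration of what "dividing your work into threads" means, here is a minimal Python sketch (the function names are hypothetical) that splits a sum into chunks, hands each chunk to a worker thread, and combines the partial results. Note that in CPython the global interpreter lock stops pure-Python threads like these from running in parallel, so this only shows the decomposition the programmer has to do, not an HT speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(data, start, stop):
    """Sum one contiguous slice; each worker thread handles one slice."""
    return sum(data[start:stop])

def threaded_sum(data, n_threads=2):
    """Split `data` into about n_threads chunks, sum them on worker threads, then combine."""
    chunk = max(1, (len(data) + n_threads - 1) // n_threads)  # ceiling division, at least 1
    bounds = [(i, min(i + chunk, len(data))) for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda b: partial_sum(data, *b), bounds)
        return sum(parts)  # combine step: add up the per-thread results

print(threaded_sum(list(range(1_000_001)), 2))
```

Splitting works here because addition is associative, so the chunks can be summed independently. A calculation where each step needs the previous step's result has no such split, which is one case where threading is not an option.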

Quote: The more linear feel of the P4 requires constant streaming to perform adequately; it's a shame it doesn't work as well as the K8 per clock.
By definition, all processors need to be "linear", or something is inherently wrong with them. Pipelining is not linear. In fact, due to pipelining, the order of execution becomes non-linear: with 2 instructions, the 2nd may finish before the 1st. This isn't unique to the P4; AMD processors do it too, and it has been going on since the days of the first Pentium.

Program execution is expected to be linear in output, so although the processor may be able to add 1 + 1 together faster than 1 * 1, if we ran

1 * 1 = sum

first then

sum + 1

We can't start the add operation until the multiply operation completes. In this case, neither HT nor pipelining will help you; you simply have to wait for that multiply no matter how slow it is.
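A toy model makes the dependency point concrete. The Python sketch below is a simplified dataflow timing model with assumed latencies (4 cycles for a multiply, 1 for an add; made-up numbers, not real P4 or K8 figures) and unlimited execution units: each operation starts as soon as its inputs are ready. The register names and the `finish_times` helper are hypothetical.

```python
# Assumed latencies in cycles, for illustration only.
LATENCY = {"mul": 4, "add": 1}

def finish_times(program, latency=LATENCY):
    """Toy dataflow timing: an op starts when all its source registers are ready
    (unlimited execution units, no issue limits) and finishes `latency` cycles later.
    Registers that are never written count as ready at cycle 0."""
    ready = {}  # register name -> cycle its value becomes available
    for dest, op, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + latency[op]
    return ready

dependent = [
    ("sum", "mul", ["one", "one"]),  # sum = 1 * 1
    ("res", "add", ["sum", "one"]),  # res = sum + 1, must wait for the multiply
]
independent = [
    ("sum", "mul", ["one", "one"]),  # sum = 1 * 1
    ("res", "add", ["one", "one"]),  # res = 1 + 1, no dependency on sum
]

print(finish_times(dependent))    # {'sum': 4, 'res': 5}
print(finish_times(independent))  # {'sum': 4, 'res': 1}
```

In the dependent case the add cannot finish before cycle 5, however many units are free. In the independent case the add finishes at cycle 1, before the multiply, which is the out-of-order completion described above.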



Do programmers want to spend all of their time looking at every bit of code to make sure it runs optimally on a case-by-case (per-processor) basis? Of course not. This is where I think Intel's biggest downfall occurred: releasing the P4 with low IPC. Intel has obviously relied on several techniques to get around this: HT, high clock frequency and, of course, a large market share that forces software to be optimized where necessary. Encoding, where performance is critical, is a good example of this.

On the other hand you have AMD: low market share, low influence on software development, and fewer manufacturing resources and research funds. So naturally, their decision is to go for high IPC so MOST programs will benefit. That plays out to some degree, and particularly well when the code is not really optimized for either platform. But in many cases, the critical code where most of the work is done is optimized for one processor, and usually not AMD's.
 

eden

Champion
Quote: How similar really is the K8 to the K7? Also, why does the fact that they are similar mean no optimization is possible? No programs were optimized for the old Athlon either.
Exactly, because the architecture was similar to the P6. Neither had many programmable or optimizable paths; they were fairly straightforward.
Again, the P7 OTOH features tons more things that can be utilized to an advantage.

And to answer your first question, I'd say you'd have to be pretty stupid if you think it's a very different core. Simply going on THG's article and looking at the two cores near each other nullifies whatever doubt you had. The only architectural advances inside the core relative to K7 are the extended History Counter lists. Dunno if they did anything on the TLBs.

 

eden

Champion
Quote: You can't program the trace cache; that's something the chip manages internally, storing already-decoded instructions along the path it predicts the branches will take. What you can do is try to keep the code that runs next in the chip's cache, but that's hard to guarantee and not something you can count on when programming.
You're welcome to debate the Trace Cache. However, I've heard that programmers do find the Trace Cache's capacity insufficient, so it must have some implications for programming and for making sure it is used well. Indeed, you can help by writing code that lets it store as much as possible. Again, don't take my word for it; take that of the programmers imgod2u reported on. I don't recall the thread, alas.
Quote: By definition, all processors need to be "linear", or something is inherently wrong with them. Pipelining is not linear. In fact, due to pipelining, the order of execution becomes non-linear: with 2 instructions, the 2nd may finish before the 1st. This isn't unique to the P4; AMD processors do it too, and it has been going on since the days of the first Pentium.

Program execution is expected to be linear in output, so although the processor may be able to add 1 + 1 together faster than 1 * 1, if we ran

1 * 1 = sum

first then

sum + 1

We can't start the add operation until the multiply operation completes. In this case, neither HT nor pipelining will help you; you simply have to wait for that multiply no matter how slow it is.
A big "MY BAD"!
I certainly fudged up my words with "linear". What I meant was that the P4 follows a narrow-and-deep design principle, and so it doesn't always help. (But in SSE2 it sure does.)
Quote: This is where I think Intel's biggest downfall occurred: releasing the P4 with low IPC.
Why do people keep beating this dead horse? Why don't we at least acknowledge that it was released this way because of market competition, and that it came out prematurely? According to many, the P4 was supposed to come out 2 years later. That's 2 years of development lost. Hence Willamette. That was it. Some estimate Prescott is the real P4. So obviously things didn't play out the way Intel wanted, and yes, lately against the A64 FX, it can be up to twice as slow per clock in games. What Intel can still do is give Nehalem another total redesign (which is likely), this time taking note of the P4's weaknesses and strengths. The P4 has some of the most innovative features, but they are used in a setup that isn't optimal and often requires optimizing. Had they developed a wide and deep design and added an extra decoder to help keep the instructions flowing (even though in x86 that barely helps), the features would likely rock.

How people still tell me HT is not for AMD is beyond me. I understand AMD is not ready for it yet, for a few reasons, one of them being its "suck-o-matic" prefetching. I also understand it isn't easy. But their design is very optimal, if we look at the layout. I find it logical. It has 9 IPCs, more than half of which are estimated to be on standby most of the time (according to HP's research, a P4 doing SSE2 number crunching averages less than 1 IPC; the Athlon probably manages a bit more). HT has proven to truly boost application performance, specifically multimedia, even though its actual purpose is to boost multithreading, which often translates into multitasking. So what is stopping AMD from trying it out on a prototype? I may not have any semiconductor credibility, but I am almost 100% certain HT would give AMD so much more, as it is known that the IPC is never at 9, nor does it even peak that often at 4.5. It also makes sense when you often have dependencies: rather than waiting, you bring in instructions from another thread. (Consider that 2 pipelines are used for an FP op, namely the FPU and the AGU. Obviously, if there is a dependency, I see it as logical to feed in another thread's instructions, as that at least keeps the flow going and helps use more of the units.)
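A toy, cycle-by-cycle model can show why idle issue slots are what an HT-style (SMT) design tries to fill. Everything here is an assumption, not a model of the K8 or the P4: an issue width of 3 ops per cycle shared by all threads, in-order issue within each thread, a fixed 4-cycle latency for the dependent ops, and no cache, fetch or retire limits. The `total_cycles` helper and the two workloads are hypothetical.

```python
def total_cycles(threads, width):
    """Toy SMT issue model (assumptions only, not a real core).

    threads: list of instruction streams; each op is (latency, depends_on_previous).
    width:   max ops issued per cycle, shared by all threads.
    Ops issue in order within a thread; a dependent op waits until the previous
    op of the same thread has completed. Returns the cycle the last op completes.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    pos = [0] * len(threads)        # index of each thread's next op
    prev_done = [0] * len(threads)  # completion cycle of each thread's previous op
    cycle = finish = 0
    while any(p < len(ops) for p, ops in zip(pos, threads)):
        slots = width
        for i, ops in enumerate(threads):  # fixed priority order, kept simple on purpose
            while slots and pos[i] < len(ops):
                latency, dependent = ops[pos[i]]
                if dependent and cycle < prev_done[i]:
                    break  # thread stalled: leftover slots go to the next thread
                prev_done[i] = cycle + latency
                finish = max(finish, prev_done[i])
                pos[i] += 1
                slots -= 1
        cycle += 1
    return finish

chain = [(4, True)] * 6  # dependent chain: each op waits 4 cycles on the one before
wide = [(1, False)] * 6  # independent 1-cycle ops that can fill every issue slot

# One thread after the other vs. both threads sharing the core, issue width 3.
print(total_cycles([chain], 3) * 2, total_cycles([chain, chain], 3))
print(total_cycles([wide], 3) * 2, total_cycles([wide, wide], 3))
```

With two stalled dependency chains, sharing the core finishes both in the time one would take (24 cycles instead of 48), because each thread uses slots the other leaves idle. With two streams that already fill every slot, sharing gains nothing (4 cycles either way). Real results would fall somewhere in between and depend on the caches, fetch bandwidth and retirement limits this sketch leaves out.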

Feel free to shoot me down; I have little knowledge of this and no programming skills. However, I've learned some of this from the many knowledgeable people here and from website resources, so I hope some of it makes sense. I feel the HT part does.

 

ksoth

Distinguished
Dec 31, 2007
Quote: And to answer your first question, I'd say you'd have to be pretty stupid if you think it's a very different core. Simply going on THG's article and looking at the two cores near each other nullifies whatever doubt you had.

Actually, looking at the two cores side by side is what gave me the doubt. The outlines show that the overall layout of the chip is similar, but each individual part looks very different. Just look at the "FP Execution Unit" on the Hammer, then look at it on the K7. The individual units themselves are very different. I really have no idea if just looking different makes it truly different, but what it tells me is that AMD put a lot of effort into at least redesigning and improving the K7 architecture, even if they kept a similar layout for their chips. I would like to see a similar comparison between the P3 and P4 to see how different those 2 dies "look."

 

spud

Distinguished
Feb 17, 2001
Quote: It has 9 IPCs,
It's like the P4 and any other superscalar machine: they retire at most 3 operations per cycle. No more, and usually a lot less.
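Putting the two posts' figures side by side gives a simple upper bound. Taking the 9 execution units from eden's post and spud's limit of 3 retired operations per cycle, and counting operations the way spud does, sustained throughput cannot exceed the smaller of the two, however many units happen to be busy:

$$\text{IPC}_{\text{sustained}} \le \min(\text{execution units},\ \text{retire width}) = \min(9,\ 3) = 3$$

On this simple bound, HT-style sharing could fill idle units but would not raise that ceiling unless the retire width grew as well.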

-Jeremy
Unofficial Intel PR Spokesman.
