Why do P4's kill Athlon 64 in media encoding?

June 24, 2005 6:14:59 PM

Does anyone know why P4s do so much better than Athlon 64s in the media encoding tests?
Is it SSE3 and Hyperthreading?

Take a look at the MPEG2 tests in:
http://www.sharkyextreme.com/hardware/cpu/article.php/3...

The P4s are generally 40% faster than the Athlons.
Even the dual-core A64 X2 4800+ is bested by the dual-core P4 EE 840, which is only running at 3.2GHz.

What really stood out for me in that test was the Pentium D 820 dual core, which at $260 outruns both Intel and AMD processors costing $800-1000. Looks like a good value for those like me doing video editing.


June 24, 2005 6:18:17 PM

The Venice core also includes SSE3.

If you work on a thing long enough to improve it, it will break.
June 24, 2005 6:56:22 PM

That still doesn't explain why the A64 gets killed in this area, unless it's due to HT?
June 24, 2005 7:02:13 PM

Clock speed. Plain and simple: raw clock speed.

Video encoding is a streaming process where the NetBurst architecture of the P4 really shines. Most other uses are branchy, and the CPU has to stop, start, and keep changing direction (this is an analogy to the technical reality), while encoding lets it open up and run full out in a straight line. SSE2/SSE3/SIMD/etc. help, but the primary reason is raw clock speed.

Video editing, on the other hand, is branchy and causes the CPU to be much less efficient. The A64 (and the P-M, PIII, Athlon XP, etc.) is a much more nimble CPU in this context, but just doesn't have the top speed that the P4 has when going in a straight line.

I guess a car analogy fits pretty well - P4 = muscle car: Lots of top speed, but has trouble in the corners. A64 = rice rocket: Decent top speed (but generally is beat by the muscle car at the end of a looooong straightaway), but corners like it was on rails.
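A toy sketch of the streaming-vs-branchy distinction (the function names are mine, purely illustrative): the same reduction written with a data-dependent branch, and branch-free. On a deep pipeline like the P4's, the mispredicted branches in the first version keep flushing the core; the second is the kind of straight-line code that lets it "run full out".

```python
import random

def branchy_sum(xs):
    # Data-dependent branch: unpredictable on random input, so a deep
    # pipeline pays a full flush on every mispredict.
    total = 0
    for x in xs:
        if x >= 128:
            total += x
    return total

def streaming_sum(xs):
    # Branch-free equivalent: the predicate becomes arithmetic
    # (x >= 128 is 0 or 1), the kind of straight-line work SIMD units
    # chew through without stalling.
    return sum(x * (x >= 128) for x in xs)

data = [random.randint(0, 255) for _ in range(100_000)]
assert branchy_sum(data) == streaming_sum(data)
```

Both produce the same answer; the difference is only in how predictable the instruction stream is, which is exactly the corner-vs-straightaway point.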

For video EDITING, the A64 provides a little better performance (at a given price point), but when you are ENCODING, the P4 can shine. If I were to give you my opinion (I can't resist :lol: ): if you want a single rig to do both, get the P4. But if you have a separate rig for encoding so you can encode and edit at the same time (on different PCs), then get a P4 for encoding and an A64 for editing.

Mike.


Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark to read.
-- Groucho Marx
June 24, 2005 7:04:12 PM

The differences are very close now... if you look at the more recent tests.
The reason Intel has an advantage here is that Intel has a much higher clock speed than AMD.
But fishmahn put it better...:-)
Cheers,
Charles
June 24, 2005 9:15:20 PM

What makes you think video editing is more branch-sensitive? Not saying you are wrong or right, just wondering... gut feeling would put video editing in the same class of apps as encoding (it actually is encoding, so...?)

I do agree, though, that media encoding is one of those tasks where clock speed matters, and most of the disadvantages of a long pipeline are mitigated. However, as someone else pointed out, the differences really aren't that big, and depending on what settings you use, what codec, what app, one or the other CPU is faster. I guess it has as much to do with what they optimized the codec for as anything else.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
June 24, 2005 10:07:05 PM

I'm not 100% certain I'm right, but here's my reasoning.

Editing involves short stretches of streaming (play to 'here', perform this effect, etc.) but most of what you do in editing isn't encoding, it's viewing, backing up, scrolling forward - skipping every Nth frame and such, find the right spot, now grab this other piece from somewhere else (either in the same file or somewhere else), perform these effects, etc.

Except for short stretches where the info is streamed serially and the CPU can stretch its legs, most of editing is a lot like photo editing, or using an Excel/Word/PowerPoint-type app - pull down this menu (or activate this floating toolbar, etc.), activate this feature/function in the program, apply it to a short stretch of streamed data, now stop, back up, stream a few secs, stop, scroll forward, pick a different function, insert this short clip, etc. Also, while editing you aren't reading the whole file - you're usually encoding/decoding in low-res, not full resolution, so the raw amount of data is much smaller.

Well, that's where I get that idea. Also, and I don't have links off the top of my head (or even remember where - but likely Anand/similar sites), but I remember seeing benchmarks/reviews where a comparable A64 (and AXP in its day) was slightly faster than the P4 in many/most editing simulations. Not as much faster as in gaming or even business apps, but still noticeably faster. Hmm, could have been within the error margin of the tool, but I don't know.

If I have time tonight (or if I don't forget tomorrow as tonight is my wife & I's 'date night' to help keep us close) I'll see if I can track a couple benches down.

Good point on different codecs. I should have brought that up. I think (without any real proof - just my 'feel' for systems) that if you were to take the same app/codec, with two different compiles and optimizations, one tuned for P4, one for A64, the P4 would still outperform, though not by the 40% that some benches report. I think it'd be more like 10% (WAG). IMO, those benchies that show the P4 totally dusting the A64 are run with binaries tuned for P4, and maybe not consciously, but detuned for A64, making it a quite unfair, though understandable, comparison (tune for the majority product).

Mike.

June 25, 2005 12:53:01 AM

>Editing involves short stretches of streaming (play to
>'here', perform this effect, etc.) but most of what you do in
> editing isn't encoding, it's viewing, backing up, scrolling
>forward - skipping every Nth frame and such, find the right
>spot, now grab this other piece from somewhere else (either
>in the same file or somewhere else), perform these effects,
>etc

Hmm... maybe we should define what we are discussing here; if editing in your vocabulary is the act of cutting, applying effects, etc., then I could agree, but I must immediately add that this doesn't seem like a very CPU-intensive activity to me. Bottlenecks would rather include hard disk, I/O, ...

If you mean previewing effects and, mostly after the editing, rendering the movie, AFAIK that is just like encoding/recoding.

>Good point on different codecs. I should have brought that
>up. I think (without any real proof - just my 'feel' for
>systems) that if you were to take the same app/codec, with 2
> different compiles and optimizations, one tuned for P4, one
> for A64, the P4 will still outperform, though not by the
>40% that some benches report. I think it'd be more like 10%
>(WAG)

My WAG is that most codecs have different codepaths for different ISAs, since these things are really small, and hugely CPU-intensive. Anything else would be truly dumb. That said, given the performance gains you can get from proper optimization (especially hand-coding critical paths in asm), the potential gain/loss should be expressed in orders of magnitude rather than percentages.
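The per-ISA codepath idea can be sketched like this (all names here are hypothetical stand-ins - a real codec would query CPUID and dispatch to hand-tuned kernels): pick the widest implementation the CPU supports once, at startup.

```python
def encode_scalar(block):
    # Fallback path: plain x86, runs anywhere.
    return ("scalar", block)

def encode_sse2(block):
    # Stand-in for an SSE2-optimized kernel.
    return ("sse2", block)

def encode_sse3(block):
    # Stand-in for an SSE3-optimized kernel.
    return ("sse3", block)

def select_encoder(features):
    # Prefer the most capable codepath the CPU reports; order matters.
    for name, fn in (("sse3", encode_sse3), ("sse2", encode_sse2)):
        if name in features:
            return fn
    return encode_scalar

# E.g. a Venice A64 would report SSE3; an older chip might report only SSE2.
encode = select_encoder({"sse2", "sse3"})
```

This is also why a single binary can favor one vendor: the dispatch table can exist while only one of its entries got the hand-tuning effort.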

>(tune for the majority product)

A single binary can contain codepaths for different CPUs. Of course, it's quite feasible that more effort has been spent on certain codepaths than others; furthermore, it's even quite thinkable that Intel themselves hand-code certain parts of popular apps (Photoshop filters) or codecs to ensure they perform well on popular benchmarks (and as long as you end up using that exact same app, it's even quite 'fair', as you will get that very speedup as a consumer).

June 25, 2005 4:27:54 AM

Editing comes in many forms...
For cuts-only editing (cut and paste), the two are quite close, but Intel has the edge because of their clock speed.
For transitions and effects (wipes, dissolves, etc.), this involves rendering. While Intel used to have an edge there as well, this has changed. Now it's AMD with the slight edge, and with multi-threaded renders (this will depend on which software you're using) the X2 has a big edge.

Cheers,
Charles
June 25, 2005 5:20:46 AM

The P4 makes better use of RAM than A64s. In fact, the P4's performance is centered around the high-bandwidth memory bus. These are simple things...

Now, my experience with video encoding shows that, more than anything else, it's RAM-dependent.

Only a place as big as the internet could be home to a hero as big as Crashman!
Only a place as big as the internet could be home to an ego as large as Crashman's!
June 25, 2005 5:58:36 AM

I see your point re: bottlenecks, and a lot of time will also be spent waiting on user input, so the better performer for that bit is the CPU that's more responsive, or 'nimble'. Assuming the user uses the same HDD in either situation, then we're back to the same result - not much difference, but the nod going to the more nimble cpu.
Quote:
If you mean previewing effects, and mostly after the editing, renderin the movie, AFAIK, that is just like encoding/recoding.

Here's where I agree and disagree. I see that as two parts. First is the previewing, in which you're playing the video. Sure, it's possibly (probably?) not in DivX or MPEG2 or whichever format yet - it's still in some 'raw' format, and it may be split up all over your HDD instead of in one file for a nice serial stream (which is precisely what the P4 likes) - but that's not very CPU-intensive either - how much CPU do you use when you're viewing a DVD? So, in that bit, HD and I/O speed are your primary bottlenecks. In that case, given the same HD is used, the more 'nimble' CPU will get the nod, possibly by a minuscule amount.

Second is rendering, which is definitely encoding. Actually, that's precisely what I assumed was encoding, along with any conversion from/to formats. That's something you do when you're done editing, and in some situations is best served by a separate rig (or cluster, as in a rendering farm), or done at night when you're not sitting there twiddling your thumbs. Admittedly that's more big production work, but even at home, you start the render, and go away for an hour or so (exception is the short bits, where AMD/Intel is a moot point - are you going to notice a render took 10 or 11 min? - well, you might, but its not real likely to matter).

I tend to the cynical on programmers. While you can have multiple codepaths for different CPUs, given the near-monopoly majority of Intel CPUs in the overall market (more than 80% by anyone's benchmark - close to 90% overall), why spend more than perfunctory time on 'merely' 10% of your potential market? I agree that benchmark tuning happens as well, and it's a nice bonus if you're doing exactly what the benchmark does.

Mike.

June 25, 2005 6:05:41 AM

Hmm, now that's exactly backwards from what I would have thought. I would have thought that rendering would be definitively P4 territory, along with encoding, and cut & paste performance would be more AMD-oriented (though close, as you'd said).

Clock speed can overcome efficiency... A motto I once heard, "If force don't work, use more force", fits that concept well.

-- Off to cogitate on that a bit - gotta get my brain around that thought and see if I can find corroboration/proof.

Mike.

June 25, 2005 7:21:54 AM

That's interesting, that you say the P4 makes better use of RAM than the A64. Especially since AMD is promoting their memory controller as 'integrated into the chip', therefore eliminating the FSB and instead using HyperTransport at 1GHz or 2GHz speeds. That sounds like it would be faster than Intel's 800MHz FSB, but maybe it's marketing hype and not fact.
Anyone know what the benchmarks say about A64 vs. P4 in just pure memory bandwidth?
June 25, 2005 7:26:04 AM

AMD's on-die memory controller has lower latency, but the P4 makes use of more bandwidth.

You can see that fairly easily if you compare both processors in single and dual channel mode, the P4 gets a huge performance increase from dual-channel, while the A64 only gets a small one.

Low latency is great for programs that cache a lot of tiny files to RAM, such as games. Given that everything else on the A64 is great for games, this added boost makes the P4 look terrible by comparison.
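The dual-channel point can be put in back-of-envelope numbers (peak figures only; real sustained bandwidth is lower): DDR400 moves 400 MT/s over a 64-bit (8-byte) channel, and the P4's "800MHz" FSB is also 64 bits wide.

```python
def peak_gb_per_s(mega_transfers, bus_bytes, channels=1):
    # Peak bandwidth = transfer rate x bus width x channel count.
    return mega_transfers * bus_bytes * channels / 1000.0

ddr400_single = peak_gb_per_s(400, 8)     # 3.2 GB/s
ddr400_dual   = peak_gb_per_s(400, 8, 2)  # 6.4 GB/s
p4_fsb800     = peak_gb_per_s(800, 8)     # 6.4 GB/s

# A single channel feeds only half of what the 800 FSB can carry, which is
# one reason the P4 gains so much more from dual channel than the A64 does.
assert ddr400_single == p4_fsb800 / 2
```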

June 25, 2005 7:27:17 AM


So that is to say, the P4 makes use of more memory BANDWIDTH, even though the A64 does a great job at reducing memory controller latency.

June 25, 2005 8:12:10 AM

I think you were right all along despite what viditor said:
Quote:
For transitions and effects (wipes, dissolves, etc...), this involves rendering. While Intel used to have an edge there as well, this has changed. Now it's AMD with the slight edge, and with multi-threaded renders (this will depend on which software you're using) the X2 has a big edge.

That is simply not borne out in the benchmark that I referred to in starting this discussion. See the MPEG2 tests under:
<A HREF="http://www.sharkyextreme.com/hardware/cpu/article.php/3..." target="_new">http://www.sharkyextreme.com/hardware/cpu/article.php/3...</A>

That benchmark was just done 2 days ago and the P4's have a substantial advantage over the A64s.

viditor also mentioned 'multi-threaded renders', and I have to disagree and say that Intel still retains an advantage here too. Nearly all of the single-core P4s feature Hyperthreading, which AMD can't do. If you compare the very top of the line, the AMD X2 can run two threads because of dual core, but the Intel EE 840 can run four simultaneous threads because of dual core AND Hyperthreading. In all the encoding benchmarks referred to above, the X2 and the EE 840 are nearly identical. The single-core processors are a different story, with Intel having a definite advantage (around 40%, which is substantial).
June 25, 2005 8:19:22 AM

So all this talk about AMD having the FSB 'integrated into the chip', and HyperTransport running at 1 or 2GHz, which makes it sound like it has tons of memory bandwidth, is just marketing hype.
Intel still has more memory bandwidth than AMD, correct?
June 25, 2005 8:43:59 AM

No. The AMD chip has just as much memory bandwidth. Even THG shows <A HREF="http://www.tomshardware.com/cpu/20050221/prescott-10.ht..." target="_new">that</A>.
Most of the time, Intel's bandwidth is gobbled up. When programs are well optimized for Intel's SSE2 and other throughput enhancements, the speed and bandwidth can be better utilized.
This shows up better "on the rails" than in actual use. Video encoding seems to be the exception. Even there, it is only true, in comparison, with well-optimized progs.
June 25, 2005 9:06:00 AM

Both have the same memory bandwidth, it's just that the P4 uses more of it in this application, from what I can tell. And "from what I can tell" comes from experience, using various processors at various bus speeds through various memory systems, for video encoding.

June 25, 2005 12:12:26 PM

>Both have the same memory bandwidth, it's just that the P4
>uses more of it in this application

Nonsense. Memory bandwidth requirements are simply defined by the software and, to some extent, by how efficient the CPU is at executing it. If the CPU is dog-slow at certain routines, more memory bandwidth will simply not be needed. If you'd somehow manage to make that code run 10x or 100x faster, either processor would be bandwidth-starved.

It's an urban legend that the P4 thrives on bandwidth and the A64 on low latency; the reality is both thrive on low latency, and only on more bandwidth for those apps where they are fast enough for the available bandwidth to be a constraint.
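The argument above can be put in rough numbers (all figures illustrative, not measured): a program's bandwidth demand is the bytes it touches per unit of work times how fast the core gets through that work, so speeding up the core raises the demand until the bus, not the core, becomes the limit.

```python
def demand_gb_per_s(bytes_per_item, items_per_s):
    # Bandwidth the code *asks for*, set by the software and the core speed.
    return bytes_per_item * items_per_s / 1e9

BUS_LIMIT = 6.4  # GB/s, peak for dual-channel DDR400 / an 800 MT/s FSB

slow_cpu = demand_gb_per_s(64, 50_000_000)    # 3.2 GB/s: core-bound
fast_cpu = demand_gb_per_s(64, 500_000_000)   # 32 GB/s: bus-starved

# The same code is core-bound on a slow chip and bandwidth-bound on a
# fast one - bandwidth "hunger" is a property of software x core speed.
assert slow_cpu < BUS_LIMIT < fast_cpu
```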

June 25, 2005 5:29:46 PM

Quote:
No. The AMD chip has just as much memory bandwidth. Even THG shows that

The URL you referenced, <A HREF="http://www.tomshardware.com/cpu/20050221/prescott-10.ht... " target="_new">http://www.tomshardware.com/cpu/20050221/prescott-10.ht... </A>,

does show a huge advantage for Intel in the PCMark CPU bench. Why is this, if the A64 is supposed to be a faster CPU? Could this, rather than memory bandwidth, be the reason the video encoding benches are won by Intel?
June 25, 2005 7:29:16 PM

The reason PCMark shows an advantage is because it's optimized for Intel's HT. If you compare scores where the X2 is pitted against the Pentium D, where both CPUs can run multiple threads, the scores are a lot closer; don't remember who wins.
June 25, 2005 7:36:11 PM

That would likely take you back to branch prediction, which the P4 does poorly, and video editing making less use of it, taking full advantage of the P4's higher clock speed. Which would in turn put data through the RAM at a higher rate.

Still, you see the P4 getting a bigger boost from dual-channel mode than the A64. And still you see the A64 beating the P4 in most applications, even with dual channel enabled.

June 25, 2005 9:52:34 PM

Umm, the P4 has better branch prediction than the A64 - it needs it, because the penalty for a missed branch is almost twice the A64's.
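The prediction-vs-penalty trade-off can be written as an expected-cost formula (the cycle counts and miss rates below are rough illustrative figures, not exact numbers for either chip): average branch cost ≈ base cost + miss rate × flush penalty, so a better predictor can partially pay for a deeper pipeline.

```python
def avg_branch_cost(miss_rate, flush_penalty, base_cost=1.0):
    # Expected cycles per branch: a correctly predicted branch costs about
    # one cycle; a miss adds a full pipeline flush on top.
    return base_cost + miss_rate * flush_penalty

# Illustrative: deep pipeline (~20-cycle flush) with a better predictor
# vs. a shorter pipeline (~11-cycle flush) with a slightly worse one.
deep    = avg_branch_cost(0.05, 20)  # 2.00 cycles/branch
shallow = avg_branch_cost(0.07, 11)  # 1.77 cycles/branch

# Even with the better predictor, the deeper pipeline loses per branch
# here - and the gap widens on branchy code where miss rates climb.
assert deep > shallow
```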

Mike.

June 25, 2005 10:17:22 PM

I'm looking at the end result: the P4 acting like it has a lot of latency, caused by whatever reason related to its deeper pipeline.

June 26, 2005 12:00:08 AM

>That would likely take you back to branch prediction, which
>the P4 does poorly,

As pointed out, it does the prediction rather excellently; it is however, generally still slower on branchy code than K8 because of, among other reasons, its longer pipeline.

>Still, you see the P4 getting a bigger boost from
>dual-channel mode than the A64.

I'd like to see some good benchmarks on that before accepting it as fact. Secondly, it wouldn't necessarily contradict my claim, for several reasons I'm too lazy to explain right now.

It's my opinion that this "P4 loves bandwidth and A64 loves low latency" is, for the most part, just a myth; in that other long thread where I discussed with slvrphoenix, I gave a list of reasons why people think this, but I have yet to see convincing data showing any significant difference between the A64 and P4 in bandwidth or RAM latency scaling.

Besides, I think DDR2 pretty much disproves this theory. Even though DDR533 has considerably more bandwidth than DDR400, and even though its absolute latency numbers (expressed in ns, not CAS latency!) are quite close, there is nearly no gain whatsoever.

June 26, 2005 12:03:09 AM

>I'm looking at the end result,

Be careful doing that, because CPU performance is just an incredibly complex matter, depending on so many factors that you just can't decide 'it does better on streaming media encoding, so it must be because of X'. To name just one potential (and likely) reason why the P4 performs so well: its SSE2 units basically have the same IPC as the K8's (at least, AFAIK), yet run at a ~50% higher clock.

>the P4 acting like it has a lot of latency

Please do explain... a CPU having a lot of latency?

June 26, 2005 12:17:58 AM

Taking longer to execute.

Come on now, everything has latency - how long does it take for light to get from the sun to the earth...

June 26, 2005 12:21:06 AM

A P4 with a 64-bit 800MHz bus can't use the extra bandwidth of a 128-bit 667MHz RAM bus. See, that's an easy one.
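That bottleneck is easy to put in numbers (peak figures; "800" and "667" are transfer rates in MT/s): whatever the RAM can supply, the narrower front-side bus caps what actually reaches the CPU.

```python
def peak_gb_per_s(mega_transfers, bus_bits):
    # Peak bandwidth = transfer rate x bus width in bytes.
    return mega_transfers * (bus_bits // 8) / 1000.0

fsb_800   = peak_gb_per_s(800, 64)    # 6.4 GB/s front-side bus
ram_667x2 = peak_gb_per_s(667, 128)   # ~10.7 GB/s from the RAM side

# The CPU sees min(bus, RAM): the extra ~4 GB/s the memory could deliver
# is stranded behind the 64-bit FSB.
usable = min(fsb_800, ram_667x2)
assert usable == fsb_800
```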

June 26, 2005 1:29:14 AM

I think it's a good observation. When you start a procedure (load a prog, select a menu choice, etc.), the shorter pipeline on the A64 means you start seeing results in a matter of dozens of clock ticks, compared to the P4's hundred or so. Even though the clock speed is 50% higher, it takes 75% more ticks to get the same work done (numbers in the rough - not to be taken as gospel or even real - just to show the point).

I notice it a lot between work and home. Home is an XP 3200+ (well, almost - 191 FSB, 2196MHz); work is a Prescott 3.0GHz with HT. Which is a better performer, in your opinion? The P4 seems to lumber along, and there's a slight delay when you click something before it gets up to speed. The AXP is more responsive and reacts faster. Some of that could be RAM (1 gig on the AMD, 512MB on the Intel), but for basic Office apps and web surfing? I doubt much, if any.

Mike.

June 26, 2005 1:29:54 AM

OOps major duplication - Deleted!<P ID="edit"><FONT SIZE=-1><EM>Edited by fishmahn on 06/25/05 08:43 PM.</EM></FONT></P>
June 26, 2005 11:06:00 AM

>I think its a good observation. When you start a procedure
>(load a prog, select a menu choice, etc.) the shorter
>pipeline on the A64 means you start seeing results in a
>matter of dozens of clock ticks compared to the P4's hundred
> or so ticks. Even though the clock speed is 50% higher, it
>takes 75% more ticks to get the same work done (numbers in
>the rough - not to be taken as gospel or even real - just to
> show the point).

You may want to rethink that; a P4 pipeline "cycle" takes 0.000000007 seconds (3GHz NW). I somehow doubt that by itself could cause any noticeable delays, especially when the A64 takes roughly 0.0000000055 seconds (@2GHz) to do the same. Are you claiming you could notice the difference?

Any differences in responsiveness are more likely caused by hard disk performance, HD fragmentation, the OS, background apps, the size of the Windows registry, ... your own stress level, ...
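Those figures come straight from stage-count arithmetic (stage counts below are approximate public numbers; "NW" is Northwood): the time to refill a flushed pipeline is stages divided by clock.

```python
def refill_ns(stages, clock_ghz):
    # One stage advances per cycle, and a GHz clock does one cycle per ns,
    # so a full refill takes stages / clock_ghz nanoseconds.
    return stages / clock_ghz

p4_northwood = refill_ns(20, 3.0)   # ~6.7 ns, i.e. ~0.000000007 s
athlon       = refill_ns(11, 2.0)   # 5.5 ns, i.e. 0.0000000055 s

# Single-digit nanoseconds either way, and barely a nanosecond apart -
# far below anything a user could feel as "snappiness".
assert p4_northwood - athlon < 2
```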




June 26, 2005 12:01:22 PM

Please do compare FSB 1066 with DDR2-533 using mediocre timings against FSB 800 with low-latency DDR1-400, so both memory subsystems would have comparable absolute latency (in ns). I doubt the performance difference will be anything other than negligible.

June 26, 2005 1:18:04 PM

Ok, if it's not that...

Quote:
more likely caused by harddisk performance, HD fragmentation, the OS, background apps, size of the windows registry,.. your own stress level, ..

Both HDDs are 7200rpm, 8MB cache, 80 gig. Don't know what brand is in the work machine (it's a Dell), but at home it's a Seagate Barracuda 7200.7. Both SATA.

Work's PC is a pretty new installation (March, IIRC) and there's about 50 gig of free space, so fragmentation is minimal. At home the installation is a year old, there's 6 gig of space left (95% full), and I've never defragged it yet. Both XP Pro SP2 w/auto updates.

At home I run the monster of CPU suckers for AV (Norton); McAfee at work, which doesn't take as much.

Don't know about the size of the registry, but off the top of my head: the home PC, with numerous games installed, a 1-yr-old installation, and lots of installs/uninstalls, would seem to get a larger one than work, with just 3 months, a couple of apps, and lots of free space.

Swapfile is set static on both. Don't remember sizes, but sufficient - 1gig or more in both cases.

Stress level? I doubt it, as I'm normally more interested in 'instant response' at home, when it's 'my' time, than at work, when it's 'their' time, and I work in the most un-stressful environment for a job I've ever had (and I've been working for over 25 yrs).

Besides, I'm not talking about a single instruction. The simple act of pulling down a menu is going to involve hundreds if not thousands of instructions - branches out to the language table, loading a new program module, Windows calls to give the visual effect of clicking on the menu name, drawing the menu, etc. The excessively branchy nature of that sort of operation means the P4 branch predictor is likely at its lowest accuracy point, so...

I'd also wonder if, because of the longer pipeline, the P4 translates x86 instructions into more micro-ops than the A64. I'm (almost) certain they both don't use the exact same micro-op structure. That's going to be a difficult one to determine.

Oh yeah... and F@H loaded as a service on both systems. And the one at home allows large work units, the one at work small only.

All this just leads me to that conclusion. I'm open to other possibilities though, if they are feasible.

Mike.

June 26, 2005 1:54:40 PM

>Besides I'm not talking about a single instruction. The
>simple act of pulling down a menu is going to involve
>hundreds if not thousands of instructions

Exactly, and pipeline length and the associated potential slow responsiveness are thereby completely mitigated, and only overall performance counts, where the P4 is pretty much neck and neck with the A64. And I sincerely doubt opening a menu and that sort of operation would take longer than a nanosecond of CPU time on anything as fast as or faster than a P3. Anything taking longer than that is caused by other reasons, so it's just not realistic that there would be a noticeable difference caused by different CPU architectures.

Try this: underclock your Athlon system as much as you can, and tell me if it's still snappier than the P4 at work. Or wait, you probably don't even have to, since if you enabled C&Q it's already running at 800MHz, and there is no way an 800MHz A64 is faster than a 3GHz P4 on anything but the most obscure, CPU-performance-unrelated benchmark.

June 26, 2005 7:34:52 PM

It's an XP not a 64. No C&Q.

Assuming what you say is true, what is your diagnosis of the perception, then? I notice it on a regular basis - start 'x' program, or use 'y' function of Word or Excel - and at home the response is perceptibly quicker than doing the same thing at work. Not over a long time (i.e. the whole program load), just that split second (i.e. the splash screen display).

I suppose it's possible that other factors come into play - MS mouse at home, Logitech at work; LCD at work, CRT at home; integrated graphics at work, 9600XT at home. It's possible that the combination of all these things *could* present what I see here, but I still feel a perceptible difference.

Mike.

June 26, 2005 9:43:51 PM

>Assuming what you say is true, what is your diagnosis of the
>perception then?

Tough call, could be so many things... Just a WAG: at work you are in a domain (Active Directory or otherwise) while at home you are not? As I said, try downclocking your Athlon, and if you still perceive a snappier system, you can be assured it's NOT the CPU.

June 26, 2005 11:36:22 PM

Quote:
Memory bandwidth requirements are simply defined by the software and, to some extent, by how efficient the CPU is at executing it. If the CPU is dog-slow at certain routines, more memory bandwidth will simply not be needed. If you'd somehow manage to make that code run 10x or 100x faster, either processor would be bandwidth-starved.

That pretty much says it all. After all, the P4 does run 50% faster, and executes SSE2 better, so it does indeed need more bandwidth.
You also seem to forget that latency is unidirectional, while the FSB is bidirectional, so an increase in FSB will have twice the impact on bandwidth that it has on latency.
June 26, 2005 11:58:47 PM

But the Pentium D is a $259-600 CPU and the X2 is $600-1100.
The EE 840 is $1100, but it is hyperthreaded and dual core, so it can run four threads.
June 27, 2005 2:10:38 AM

We've had a lot of discussion here of a number of factors but I think I finally found something that points out clearly the advantage that Intel has in video performance.

I think it boils down to floating point performance, possibly compounded by better SSE2, SSE3 performance.

What really stood out to me was a benchmark I found on this site under the title "The Mother of All CPU Charts Part 2".
The Intel bars are blue and the AMD bars are green.
If you skip down to the <A HREF="http://www.tomshardware.com/cpu/20041221/cpu_charts-18...." target="_new">video performance benchmarks</A>
you'll see a lot of blue (Intel) at the top of the charts.
What's impressive is that some of these near the top are $270 CPUs (P4 3.4E) beating $500 (A64 4000+) and better CPUs.

What becomes even clearer is when you look at the synthetic benchmarks under <A HREF="http://www.tomshardware.com/cpu/20041221/cpu_charts-22...." target="_new">Whetstones</A>, which measure floating point performance. This is completely dominated by Intel.
The same is true for the <A HREF="http://www.tomshardware.com/cpu/20041221/cpu_charts-23...." target="_new">multimedia bench</A>; both integer and floating point are dominated by Intel.

Clearly these differences are REAL and not explainable just by saying one benchmark is optimized for Intel, especially since the video performance benchmarks are real applications and they tested four or five different ones.

I may sound like an Intel fan, but believe me, I am not! I just want the best performance for my money, and if you are doing heavy video processing it just seems that Intel does better, while the A64 does better in gaming.

Like everyone, I started out with Intel but switched to a 750 MHz Athlon when AMD had double the FSB and better CPU performance (including floating point). I switched back to Intel when they came out with dual channel DDR and an 800 MHz FSB and were way ahead in GHz. I'd really like to switch back to an A64, but not if it's going to cost more money or give less performance. Right now I just can't see anything that convinces me that a reasonably priced A64 is going to outperform a reasonably priced P4 (reasonably priced being $150-300).

Maybe AMD has dominated 64-bit computing, but that's still in the future; it hasn't dominated the floating point and multimedia performance I need right now.
June 27, 2005 2:20:24 AM

It's like AMD and Intel pull up to a stoplight and Intel starts revving its engine, and AMD, knowing it's gonna lose the race by a little bit, goes extra slow to prove it had no intention of racing at all.
That's my story and I'm stickin' to it.


The know-most-of-it-all formally known as BOBSHACK
June 27, 2005 2:21:44 AM

The guy in the AMDmobile also gets the ladies, whereas the Intel guy is sad and lonely.

The know-most-of-it-all formally known as BOBSHACK
June 27, 2005 2:23:14 AM

You do however see 3 little blue heads bobbing up and down on intel's lap and this draws a lot of attention.

The know-most-of-it-all formally known as BOBSHACK
June 27, 2005 2:23:46 AM

drugs are bad mmmmmkay

The know-most-of-it-all formally known as BOBSHACK
June 27, 2005 3:57:46 AM

I may just do that.

I'm on an NT4 domain in both locations.

Mike.

<font color=blue>Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read.
-- Groucho Marx</font color=blue>
June 27, 2005 5:50:13 AM

Yes, an Intel 3.6 GHz Prescott will beat an A64 3500+ in some well-optimized encoding programs; DivX is a prime example. The A64 will win most other benchmarks. On the other hand, that P4 will not beat a similarly priced A64 3800+. True, the Intel will still win half the encoding benches, but the AMD chip will win the other half of the encoding benches and eat the Intel chip in every other benchmark.
Add to that the simple fact that, without high-end extreme cooling, the Intel system will spend a good deal of time throttling, and you have a good understanding of why AMD offers the better option for everyone right now.
June 27, 2005 6:18:11 AM

>You also seem to forget that latency is unidirectional, while
> FSB is bi-directional, so an increase in fsb will have twice
> the impact on bandwidth that it will have on latency.

Nice logical fallacy, I almost fell for it :)  But a 10% increase in FSB speed will still reduce (FSB) latency by 10%, not 20% as you seem to suggest. Think about it...

= The views stated herein are my personal views, and not necessarily the views of my wife. =
June 27, 2005 6:40:37 AM

You mean an increase of 10% on memory calls and 10% on memory. Not quite true, though, since memory isn't the only thing using the FSB on Intel chips; the actual gain is marginally better than 10% for each.
The thing is, getting more calls through sooner helps negate the effect of latency, especially with streaming extensions. The increased memory bandwidth then gives a further boost on the return.
I agree that doesn't work out to a 20% gain, but I think it may account for more of the total gain than latency does.
June 27, 2005 6:56:29 AM

>What becomes even more clear is when you look at the
>synthetic benchmarks under Whetstones which is floating
>point performance.

Whetstone and Dhrystone are nonsense benchmarks these days; each is a minuscule piece of 1980s code that simply doesn't reflect anything useful. They used to be somewhat useful for gauging performance in technical computing ("supercomputers"), but even there they're considered obsolete. Just looking at those graphs should tell you that much; try to find me one real-world app that runs faster on a 2.4 GHz Northwood than on an FX-55. Writing an app that reports the clock speed and adds the square root of the amount of cache would be a more meaningful performance indicator.

In general, synthetic benchmarks are totally useless for comparing performance between different architectures. They're only useful as analysis tools if you know exactly what the benchmark does; they can help provide insight into what bottlenecks there might be (think apps like cachemem), but since no one *runs* synthetic apps, why should you care about the results, especially when they don't reflect actual app performance?
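To illustrate the "minuscule piece of 1980s code" point: here is a toy Python loop in the *spirit* of Whetstone (not the actual benchmark source, which I don't have in front of me). Its entire working set is one scalar, so it lives completely in registers and cache and never touches RAM -- one reason such a score tells you nothing about how a real encoder behaves:

```python
import math
import time

def whetstone_style_kernel(iterations: int) -> float:
    """A toy transcendental-heavy FP loop, Whetstone-style.

    The whole state is a single float, so the loop never leaves
    the CPU core: no RAM traffic, no branches that mispredict,
    nothing resembling a real application's memory behaviour.
    """
    x = 1.0
    for _ in range(iterations):
        # A dependent chain of FP ops; each result stays in [0, 1].
        x = math.sin(math.cos(math.atan(x)))
    return x

start = time.perf_counter()
result = whetstone_style_kernel(100_000)
elapsed = time.perf_counter() - start
print(f"result={result:.6f}, elapsed={elapsed:.3f}s")
```

A chip can post a great number on a loop like this and still lose badly on anything with a real dataset.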

>This is completely dominated by Intel.
>The same is true for the multimedia bench; both integer and
>floating point are dominated by Intel.

It's not as easy as that. In fact, in x87 floating point, the AMD chips perform considerably better. Still, it's true that NetBurst is generally better at video encoding, mostly because of its SSE2 performance. Other reasons include software optimization; Intel provides excellent compilers and libraries which perform very well, although of course primarily for their own CPUs. Lastly, performance really does depend on what codec/app you use, and even the workload can make a huge difference. I've seen tests where, for instance, one CPU would beat the other by a large margin in 3ds Max (or a similar 3D renderer) on a certain scene, yet with the very same app rendering a different scene, the tables turned. Now, Intel provides reviewers with 'guidelines', which are more often than not at least partially followed, so guess which of the two scenes they will recommend? How come so many review sites use the exact same scenes?

Anyway, just a quick link to prove my point. Have a look here:
<A HREF="http://www.digit-life.com/articles2/intelamdcpuroundupv..." target="_new">http://www.digit-life.com/articles2/intelamdcpuroundupv...</A>

Yes, it's dated, but just look how different CPUs excel with one codec and fail with another. Everyone (especially THG) uses DivX to measure encoding speed, while Xvid is the more popular codec (it's free, and generally better). Coincidence?



In short: don't believe everything you read on THG, but it's still true that NetBurst generally has an edge over K8 in media encoding. If you mainly use a single app/codec for your CPU-hungry conversions, you should find a benchmark that uses that specific app/codec and base your purchase decision on that. There's not much point in getting CPU A because it's generally better at encoding if it offers considerably less bang/$ in the app *you* need it for.
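And if no review site covers your exact app/codec, running your own timing is trivial. A minimal Python sketch; `my_encoder`, its flags, and the clip name are placeholders for whatever encoder and source file you actually use:

```python
import statistics
import subprocess
import time

def median_runtime(run_job, repeats: int = 3) -> float:
    """Time run_job() several times and return the median wall-clock
    seconds. Repeating and taking the median smooths out disk-cache
    warmup and background-task noise between runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_job()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical encode job -- substitute your real encoder command:
encode = lambda: subprocess.run(
    ["my_encoder", "--codec", "xvid", "clip.avi", "out.avi"], check=True)

# Uncomment once `encode` points at a real command:
# print(f"median encode time: {median_runtime(encode):.1f}s")
```

Run that on both machines (or borrow a friend's) with the same clip and you have a benchmark for the one workload that actually matters to you.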

One last thing: whatever CPU you end up buying, make sure it's 64-bit capable. Expect fairly considerable speedups for media encoding once 64-bit ports are released and mature.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
June 27, 2005 7:13:09 AM

You're not making much sense here; using correct terminology might help get your point across.

>You mean an increase of 10% on memory calls, and 10% on
>memory

No, I meant it doesn't matter that memory access latency works "both ways" whereas bandwidth works "one way". If you decrease latency by increasing the clock speed of the FSB and RAM, you get at best a linear speedup with that clock speed. A 10% faster FSB/RAM can never result in >10% better performance on anything, even low-level synthetic performance tests. And to get that full 10% speedup, your benchmark would have to be 100% bottlenecked by RAM latency.

The exact same thing applies to bandwidth: a 10% clock increase (of FSB/RAM) can only result in up to 10% better performance.

Worse even: 10% higher bandwidth and 10% lower latency combined can still only result in up to 10% better application performance, never more; you can't add those percentages. In fact, to ensure a 10% faster application, ALL performance-sensitive clocks would need a 10% boost: 10% faster CPU clock AND 10% faster FSB AND 10% faster RAM (both bandwidth and latency). Only then COULD you see a 10% actual performance boost. Performance can't scale superlinearly with the clock speed of anything, only with things like cache *size* (doubling the cache can in theory increase performance by more than 2x on low-level synthetic benches, if the bigger cache holds the entire dataset and the smaller one can't, since L2 cache is much more than 2x faster than RAM).
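The argument above can be sketched with a simple additive timing model. The 70/30 compute/memory split below is an invented illustration, not measured data; the point is only that boosting the FSB/RAM clock shrinks just the memory component, so the overall gain stays well under the clock gain:

```python
def runtime(compute_s: float, memory_s: float, mem_clock_boost: float) -> float:
    """Toy model: total time = compute time + memory time.
    Raising the FSB/RAM clock by mem_clock_boost (0.10 = +10%)
    shrinks only the memory component, and only linearly."""
    return compute_s + memory_s / (1.0 + mem_clock_boost)

base = runtime(7.0, 3.0, 0.0)      # hypothetical 70/30 compute/memory split
faster = runtime(7.0, 3.0, 0.10)   # same workload, +10% FSB/RAM clock
speedup = base / faster - 1.0
print(f"overall speedup: {speedup:.1%}")   # → overall speedup: 2.8%

# Only a workload that is 100% memory-bound reaches the full 10%:
bound = runtime(0.0, 10.0, 0.0) / runtime(0.0, 10.0, 0.10) - 1.0
print(f"fully memory-bound speedup: {bound:.1%}")   # → 10.0%
```

No split of compute vs. memory time ever pushes the result past the 10% clock increase, which is exactly the "X > Y" claim.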

>I agree that that doesn't work out to a 20% gain, but think
>that it may account for more of the total gain than latency
>does.

Nope. Increase any or all clocks by X% and performance will increase by Y%, where X>Y. The only possible exceptions are where you change other things as well, like running the FSB in or out of sync.

= The views stated herein are my personal views, and not necessarily the views of my wife. =