These quotes are from Ace's Hardware's testing of the P4:
"Both the P4 and the Itanium load FP instructions from the L2 cache (6 cycles latency), another indication that data dependencies are few and far between. As long as enough data arrives, the FPU can continue crunching and it doesn't matter if it takes a bit longer to get a certain piece of data. Therefore, we say that FP intensive code is more dependent upon bandwidth than Integer code.
The cachemem benchmark in the first article indicated that the latency of DDR SDRAM is about 16% lower than the average latency of dual-channel Rambus (i850). The stream test pointed out that in the best circumstances, the i850 (DRDRAM) can offer almost 80% more bandwidth than the AMD-760 (DDR SDRAM) (1574 vs 889). It is this massive amount of bandwidth that enabled the Pentium 4 to perform twice as well as the Athlon in the FP-intensive Linpack benchmark (Array size > 1024 KB)."
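For what it's worth, the "almost 80%" figure can be checked from the two STREAM numbers in the quote. A minimal Python sketch; the variable names are mine, and the units are assumed to be MB/s, which the quote does not state:

```python
# STREAM results as quoted; units assumed to be MB/s (not stated in the quote)
i850_bandwidth = 1574    # Pentium 4 with dual-channel DRDRAM
amd760_bandwidth = 889   # Athlon with DDR SDRAM


def percent_more(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


advantage = percent_more(i850_bandwidth, amd760_bandwidth)
print(f"i850 offers {advantage:.1f}% more bandwidth than the AMD-760")
```

That works out to roughly 77%, so "almost 80%" is a fair rounding.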
"Intel added two instruction prefixes (to P4): "HWNT," Hint Weakly Not Taken, and "HST," Hint Strongly Taken. As such, the programmer or compiler can lessen the impact of branch misprediction somewhat. But how much?
The hint instructions are useful if (1) there are some branches in which the pattern does not converge well, but some degree of bias does nevertheless exist, or (2) there are so many branches that the CPU is running out of branch predictor resources.
The first can only be easily collected with "feedback-directed optimizations," in which you perform a "test run" to collect information about the branch direction, ratio, and how well the predictors can predict them. This data is then used in a recompile.
Intel's latest version of its C++ compiler (5.0) is probably the most advanced x86 compiler on Earth. Intel C++ 5.0 was able to use this method to enable branch hint instructions to boost the Pentium 4's performance in SpecInt. Since feedback-directed optimization, or profiling, is permitted in both the base and peak CPU2000 benchmarks, we unfortunately are unable to compare the two results to determine the overall impact of this optimization.
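To make the "test run" idea a bit more concrete, here is a rough Python simulation of the profiling step: record a branch's outcomes, see how well a simple predictor copes with them, and only suggest a hint when the branch is biased but poorly predicted. The 2-bit counter, the thresholds and the function names are my own illustrative assumptions, not how Intel's compiler actually works; only the hint names HST and HWNT come from the quote.

```python
def two_bit_accuracy(outcomes):
    """Fraction of outcomes a 2-bit saturating counter predicts correctly."""
    state = 2  # 0-1 predict not taken, 2-3 predict taken; start weakly taken
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move the counter toward the observed direction, saturating at 0 and 3
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


def suggest_hint(outcomes, bias_threshold=0.6, accuracy_threshold=0.9):
    """Pick a hint from a profile run (both thresholds are arbitrary assumptions)."""
    taken_ratio = sum(outcomes) / len(outcomes)
    if two_bit_accuracy(outcomes) >= accuracy_threshold:
        return None  # the hardware predictor already copes, no hint needed
    if taken_ratio >= bias_threshold:
        return "HST"
    if taken_ratio <= 1 - bias_threshold:
        return "HWNT"
    return None  # no useful bias to exploit


print(suggest_hint([True] * 100))               # always taken, predictor is fine
print(suggest_hint([True, True, False] * 10))   # biased taken, poorly predicted
print(suggest_hint([False, False, True] * 10))  # biased not taken, poorly predicted
```

A real compiler would attach this decision to each branch in the recompile; the sketch only shows the decision itself.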
Integer applications, which have a rather large memory footprint like many of the benchmarks in the SPEC2000int suite, may perform much better on the Pentium 4 if developers decide to compile with special Pentium 4 optimized options.
Yes, the Pentium 4 has a lot of untapped potential here.
As you can see, all three Pentium 4 systems, from 1.3 GHz to 1.5 GHz, perform nearly identically in the swim benchmark, indicating that increased clock speed, and thus compute performance, is not the deciding factor in this particular benchmark. Further evidence supporting this claim is visible in the 1.2 GHz Athlon DDR's performance, which shows a considerable increase over its SDR SDRAM counterpart. This difference is not as pronounced in the other SPEC benchmarks shown here, but if enough do show this bias towards high-bandwidth memory interfaces, then this could boost the Pentium 4's overall scores considerably. To see more CPU2000 results, broken down into the individual sub-tests, see the Appendix.
P4 1200 AMD 794 "
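The swim observation fits a very simple model in which a run takes as long as the slower of its compute part and its memory part. The sketch below is a toy model only: the workload sizes, the flops-per-cycle figure and the bandwidth values are made-up assumptions for illustration, not measurements from the article.

```python
def run_time(clock_ghz, bandwidth_gb_s, gflop=10.0, gbytes=40.0, flops_per_cycle=1.0):
    """Toy model: run time is the larger of compute time and memory-transfer time."""
    compute_s = gflop / (clock_ghz * flops_per_cycle)
    memory_s = gbytes / bandwidth_gb_s
    return max(compute_s, memory_s)


# Same memory system, three clock speeds: the memory term dominates every time
for clock in (1.3, 1.4, 1.5):
    print(f"{clock} GHz at 1.5 GB/s: {run_time(clock, 1.5):.2f} s")

# Same clock, twice the bandwidth: the run gets faster
print(f"1.3 GHz at 3.0 GB/s: {run_time(1.3, 3.0):.2f} s")
```

With these assumed numbers the clock speed makes no difference at all until the memory side gets faster, which is the pattern the quote describes.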
"The third reason, in combination with some SSE2 optimizations (the second reason), might explain why some CPU-intensive benchmarks are still running faster on the Pentium 4.
Still, let us take a look at some SpecFP numbers on different configurations.
AMD 1.2 359
The results of SpecFP 2000 show that the benchmark as a whole benefits more from increased memory bandwidth than from higher clock speeds."
So you can see that the bandwidth and compilers for the P4 make the P4 much faster, and this is why some lame testers do not make fair tests.
I guess in your mind "fair" would mean not telling people how the P4 sucks in the apps people use every day, and only praising Intel for how the P4 would run apps if they were recompiled and bandwidth-dependent. When cows fly, sell them as airplanes, but not before...
You are the most idiotic person I have seen in my entire life.
First of all this topic is old and we don't care anymore, it's been proven the P4 sucks so BOO HOO
Second, your last comment, "so you can see that the bandwidth and compilers for the P4 make the P4 much faster, and this is why some lame testers do not make fair tests", doesn't even make sense. If the compilers are optimized and designed for P4 then doesn't that make <i>it</i> an unfair test? ROFLMAO you crack me up, boy.
"648kb is all the space anyone will ever need!"
Will you stupid lamers <b><i>shut up</i></b>. I am really getting fed up of all this AMD bashing. I can buy two Athlons for the price of your P4, which will kick the P4 even in the apps it supposedly shines in, like encoding, video editing, etc, even with optimised FlasK. I really don't see the argument.
No matter how slow you speak it won't be as slow as your P4, you complete and total muppet-boy.
~ I'm not AMD biased, I just think their chips are better. ~
March 12, 2001 6:20:05 PM
No, you misunderstood.
What I was trying to say is that all their apps are being redone for the P4 as we speak,
including Office XP, Windows XP, all the games in development, Corel, etc.
It's called upgrading software to match hardware, something that is done often.
There are over 400 apps developed for the P4 coming.
The P3 and AMD Athlon had the same problem at first.
It's called looking toward the future and being prepared.
It's interesting to me that the author of the article you are quoting (Ace's Hardware) has a little more to say about the performance on this particular benchmark:
"If the FPU of the P4 is so bad, why does it perform so well in the industry standard benchmark, SpecFP? Indeed, the Pentium 4 1.5 GHz achieves 562 (peak) where a PIII 1 GHz achieves only 314 (peak). However, Intel compiler 5.0 was used, which is heavily optimized for SSE-2. It will take some time before applications will have such degree of optimization.
SpecFP is an indicator for a few high-end industrial applications that use massive amounts of polygons (100-300k polygons) where memory bandwidth becomes a bottleneck and that the bandwidth to the FPU is more important than the peak FPU calculation power"
In other words, it's not so much the FPU of the Pentium 4, which by most accounts performs worse than the PIII's, that is tested by this benchmark. It's the memory bandwidth. And I acknowledge that in this category the P4 kicks SERIOUS butt.
But in another post, you said that one of the P4's strong suits was its FPU. That is simply not the case, and Ace's spent a good deal of time proving that. And as for optimization being the "key" to the P4 exhibiting superior floating point performance: "For SPECfp2000 the new SSE/SSE2 instructions offer about a 5% performance gain compared to an x87-only version. Five percent is not much, but is not very surprising, as SIMD can only show its strength in a limited portion of an application and the compiler is not clever enough to extract enough parallelism". If optimization is only going to yield 5% performance gains (as in this case), it's not going to help an awful lot. And finally, their conclusion: "The Athlon 1200 with DDR is a more balanced and less pricey solution than the Pentium 4 with Rambus. The Athlon 1200 DDR came in first or second place in every benchmark while the Pentium 4 was very capricious with some ups but more downs. Most people hate upgrading their favorite software all the time and the Athlon runs legacy applications faster. IOHO, the FPU of the Pentium 4 should have been made more powerful." Not a resounding endorsement of their "superior" FPU, even given the "Amazing" (your words in another post, I think) performance in Spec FP.
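The 5% figure is roughly what Amdahl's law predicts when SIMD only applies to a small part of the run time. In the sketch below, the fraction and speedup values are assumptions I picked for illustration, not numbers from Ace's article:

```python
def amdahl_speedup(vectorizable_fraction, simd_speedup):
    """Overall speedup when only part of the run time is accelerated (Amdahl's law)."""
    return 1.0 / ((1.0 - vectorizable_fraction) + vectorizable_fraction / simd_speedup)


# Assumed example: 10% of the run time is vectorized and becomes 2x faster
gain = (amdahl_speedup(0.10, 2.0) - 1.0) * 100.0
print(f"{gain:.1f}% overall gain")
```

Even with a 2x speedup on the vectorized part, the overall gain stays around 5% when that part is only a tenth of the run time.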
March 12, 2001 7:28:47 PM
Yes, I agree that upgrading software for the sake of a CPU is not ideal, but if a new version comes out that you would get anyway, and it's P4-compiled and all future versions will be too, then what is the harm?
Also, the Pentium 4 FPU is well designed, but P3 code does not work well on it.
It's like another language: not understanding it does not mean someone is stupid.
Given proper code, the FPU is amazing and superior in performance and design to that of the Athlon and P3. It operates at 3 GHz regardless of the code,
and in larger chunks as well.
SPEC FP is using code that is not optimized for the P4, yet the P4
outperforms the Athlon by 200 points, 550 to 350.
Imagine if it were compiled for the P4 with SSE2.
The score would likely be 600+.
Two Athlons, huh? Well, good for you. Now go find a board to put both your Athlons on.
March 12, 2001 7:50:31 PM
Ahhh...but it's NOT the FPU that is responsible for the performance. It's the instruction set. Imagine what would happen if either the P3 or Athlon supported these optimizations as well. I would think that the superior FPU would continue to dominate. I think that AMD plans to support them in upcoming processors. And as Ace's points out, there are just some FP applications that do not lend themselves to optimization at all. They simply require a powerful FPU to execute quickly. In those cases, AMD's will win by a significant margin. At any rate, it's not fair to say that P4 has a superior FPU. The FPU itself is pretty sad from a performance perspective. It IS fair to say that this FPU, paired with SSE-SSE2 will be able to dominate in SOME (not all) applications. And although the SPEC FP did not use a P4 optimized compiler, I do think that it was optimized for SSE2. But as they point out, SPEC FP is more a test of BANDWIDTH than of true FP power anyway.
March 12, 2001 8:05:08 PM
Yes, but when I mentioned SPEC FP, that was an example of what I was trying to say: the P4 can perform well even on a cross-platform, unoptimized FP benchmark of SPEC's reputation, because the P4 FPU is well designed...