P4 v Athlon: Number Crunching

I just read Tom's comparison of P4 and Athlon systems for office and gaming use, "To cry or Not to Cry" (Jan 30).

But I have a single objective: speed of floating point number crunching, say using C++. Do people think that the P4 with PC800 RDRAM is going to be faster than Athlon systems for heavy floating point number crunching?

  1. I'm not sure which is better,
    but I would say keep the money and get more RAM.
    At school, the computer I use for VC++ 5.0 has a Celeron 500
    with 128 MB of RAM.

    At home I have VC++ 6.0 with a 450 MHz P3 and 96 MB of RAM.

    I guess that the extra RAM speeds it up, because for some reason my computer at home compiles slower with the newer version of VC++ than the ones at my school.
  2. http://www.spec.org/osg/cpu2000/results/cpu2000.html

    P4 wins, and comes in close to Alpha 64bit.

    It's gonna be nice when the P4 gets a die shrink to .13 micron.
  3. No. The FPU on the Athlon is superior to anything of Intel's.

    Satan Clara...... 'Nuff said.
  4. Grizely...interesting, and thanks for replying, but is there any evidence for that? How do you reconcile what you said with the FP run time benchmarks at SPEC where the P4 1.4 seems to be a lot quicker, more than twice as fast for 183equake for example?

  5. I'm not saying anything about benchmarks or anything, just that Athlon's FP unit is much more advanced and faster than Intel's. Ask anyone, they know. Look around for some articles on it, I'm sure you could find some.

    Satan Clara...... 'Nuff said.
  6. Yes, the AMD FPU is vastly better than Intel's: 20% at the same clock. Factor in the price comparison and you have to choose AMD. Whenever Tom releases benchmarks for CPUs he always includes a 3dsmax 2 renderings-per-hour table. Everyone seems to look past this, but that is the only thing I look for. Btw, for FPU the Duron and T-bird have the same scores, since cache doesn't factor in at all here. A Duron/T-bird running at 700 MHz ties a Pentium 3 at 1000 MHz. Not bad at all.
  7. hi mkelder

    I can't find Tom's P4 and AMD benchmarks for 3dsmax... do you know if they're on the site?
  8. I believe the P4 has a worse FPU than the P3. I know it can't touch the Duron/T-bird for sure.
    Here are the P4 FPU benchmarks (you'll laugh hard; Intel wants to sell processors to people who want to surf the net, not have power)

    Here is the duron/tbird vs the P3 1000mhz
  9. mkelder

    Thanks, very interesting ... but now I'm puzzled because the SPEC FP benchmarks at


    show the Intel chip a lot faster than the AMD 1.2 GHz.
    How do we reconcile these different findings?
  10. From what I have been reading, the P4 seems to perform better in graphics-heavy applications. The Athlon T-bird is a well-rounded chip that performs well in all areas. If you want a pure gaming chip and don't mind the cost, the P4 is nice. For a well-priced, all-around sweet chip, the Athlon T-bird simply rocks.
  11. Only because the P4 has superior bandwidth (tons of it)!

    Satan Clara...... 'Nuff said.
  12. Howard,
    Take a look at this.
    It has results for the same file on different processors/machines. Maya uses huge amounts of FPU calculations for 3D rendering. It is also a good test of RAM speed because of its high RAM usage. The single Athlon is the highest single in the group and higher than most of the dual P3s. I personally am waiting for the dual Athlon boards to come out. If the numbers for the dual Athlon 1200 are two times a single (like the P3s), the dual Athlon will post numbers that have never been seen before.
  13. When are the dual Athlons coming out anyway? I have a P3 500 and it really needs to be upgraded. I was going to get an 850 T-bird and OC it to 1 GHz. If they are coming soon I will hold off.
  14. According to AMD Zone, the board will be released at the end of Q1, next month, and the 760 MP will be released early in Q2.
    It's anybody's guess as to when we can actually get our hands on them; going by the example of the 1200, maybe June???
  15. Hmm, sounds interesting, I think I could wait that long.
  16. I have yet to find any Athlon that performs better than a P4 with Rambus. Even the Athlon 1.2 with the 266 bus and DDR does not outperform it in graphics-heavy applications. If anyone has a link, please put it up; I would like to compare.
  17. That depends on the use of the cpu. If you want to render in any 3d app AMD has it hands down. That is all I care about really, I could care less if I get a few fps less in quake 3, that loss is made up by the 100's of dollars you'd save going with AMD. With that money you could upgrade other stuff or just have more pocket money.
  18. That's called bandwidth usage, and we're not talking about that.

    Satan Clara...... 'Nuff said.
  19. Seems like you don't have a clue, Grizely1, sorry to say.
    The Athlon's FPU can't be called SUPERIOR to that of the Pentium 4. SPEC numbers, which can be called "the maximum potential of the core," basically show you what the CPU can do when the software is properly optimized for the CPU. Athlon/Duron are pretty much optimized for programs that are designed to run best on Intel's P6 cores... after all, they don't have a choice.
    On the other hand, the P4 takes a different approach to improving FPU performance. That is, as you have probably heard a million times already, SSE2. SSE2 is a double-precision 128-bit instruction set, and the x87 FPU is not. The P4's cache lines are optimized to move data in 128-bit chunks, so when the P4 is working with old 80-bit x87 FPU instructions, it is not efficient. That's why the P4 performs rather poorly on programs that heavily use old x87 FPU instructions, like the 3dsmax that Tom likes so much (though not badly at all; some of the lost performance is because the data in the benchmark programs is not properly arranged for the P4's 20-stage pipeline). In some programs like Quake 3, though, the P4's massive bandwidth kicks in and makes up for some of the inefficiency.

    SPEC numbers tell you what P4 can do... smash Athlons and Durons and even some Alphas when it gets the SSE2 support, which I think it will get enough when Northwood is launched.

    Sorry for the poor English.. I'm not a native speaker.
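
To make the packed-double point above concrete, here is a minimal sketch (not taken from any post in this thread) of one loop written twice: once as plain scalar C++ and once with SSE2 intrinsics. The names `axpy_scalar` and `axpy_sse2` are made up for illustration. The intrinsics come from the standard `<emmintrin.h>` header, and the code falls back to the scalar loop if SSE2 is not available at compile time. Both functions assume `y` is at least as long as `x`.

```cpp
#include <cstddef>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// y[i] = a * x[i] + y[i], one double per operation.
void axpy_scalar(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Same computation, two doubles per operation where SSE2 is available.
void axpy_sse2(double a, const std::vector<double>& x, std::vector<double>& y) {
    std::size_t i = 0;
#if HAVE_SSE2
    const __m128d va = _mm_set1_pd(a);           // broadcast a into both lanes
    for (; i + 2 <= x.size(); i += 2) {
        __m128d vx = _mm_loadu_pd(&x[i]);        // unaligned load of 2 doubles
        __m128d vy = _mm_loadu_pd(&y[i]);
        vy = _mm_add_pd(_mm_mul_pd(va, vx), vy); // a * x + y on both lanes
        _mm_storeu_pd(&y[i], vy);
    }
#endif
    // Odd tail element, or the whole array when SSE2 is not compiled in.
    for (; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Each `_mm_mul_pd` / `_mm_add_pd` works on two doubles at once, which is the 128-bit behaviour described above. Whether this actually beats x87 code on a given chip is something only a timing run on that chip can show.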
  20. Dude, no matter how you explain this to the "AMD puppies," they will never get it.

    When a benchmark tool fails to see and use the P4's 144 new instructions, you know that the numbers are bogus.

    They should all label their test scores "handicapped P4 results."

    AMD puppies are just as blind as some of these benchmark tools!
  21. Exactly, the problem is most people don't really comprehend how the SPEC benchmarks work. If you've ever worked with Borland C++ or similar that has the Intel optimized compiler (bcc32i) as well as the regular compiler (bcc32), you can see huge performance differences even on the same processor. It's simple: Intel makes very good compilers for their processors, whereas AMD does not seem to shine in this area. If AMD would put some more resources into making compilers that optimize for their processors, it's possible that they could gain some serious ground in the SPEC benchmarks. For home use, or regular office use, I'd go with an AMD any day, but for in-house development I'll go with Intel; the performance differences are usually pretty staggering.
  22. You made some good points. Yes, the P4 will be faster with SSE2 software, but how many such programs are there at this point? Not very many.

    Don't forget that AMD has the rights to use SSE2, and it will be in their next chip. Right now the P4 is a bad choice: (1) very few programs use SSE2, and (2) the price of a P4 sucks.

  23. So, are you saying the SPEC has been SSE2 optimized? I honestly have very little clue about SPEC. Is it really an apples to apples comparison when what the person wanted to know was raw FPU performance? It *seemed* like he was using compiling as an example, not really the heart of the question. Just asking.
  24. The Athlon's FPU can be, and rightly is, called superior to the P4's. The PIII's FPU is stronger than the P4's. SSE2 can make up for it, and even make it perform better in some cases, but most software isn't optimized for this yet. If you are talking about CPU performance on today's software, the Athlon is hands down the superior performer, unless you are buying your machine specifically to DivX encode movies and play Q3A, and in the case of Q3A the extra performance does not make up for the additional cost of the CPU and memory, since the increases are imperceptible to the human senses.

    Once more software filters down the pipe I would consider purchasing a P4, but right now it just doesn't make sense. For the money you'd spend on a 1.5GHz P4/RDRAM system today, you could buy a T-Bird/PC133 SDRAM system and swap out the motherboard/CPU/memory 6 months down the road, when software can cope with the P4 and Intel/Rambus offerings become competitively priced. You'd have a screaming system now, and a screaming system then, and come out better than if you'd bought just the P4 system today.

    The biggest thing to consider with the P4 is what you plan on doing with it. It beats the Athlon in some real world benchmarks (optimal condition benchmarks like SPEC do not relate directly to real world performance!) quite convincingly, but unless you plan on using it for that specific purpose (Q3A and MFlask) I would recommend the Athlon. It performs solidly across the board, so you don't have to worry about buying a program and having it run slower on that spanky new P4 1.5GHz than on the 800 Celeron you just replaced because its developers didn't know how to optimize for SSE2, or more likely just didn't care to. You also don't have to sell your daughter's virginity to buy it and its memory.

    If you look at the P4 performance in today's real world benchmarks, and stack that up against its price, and the price of the memory you are required to use with it, there really doesn't seem to be a lot of thought required.

    Everything in moderation
  25. I would think the Benchmarks would be SSE2 optimized. The SPEC benchmarks aren't like a 3DMark or any other benchmark, they don't even provide a build, just the source code for anyone who wants to build their own executable. So, Intel would take this code and obviously build it so that it runs as efficiently as possible on their processors, in this case, the P4. Basically what these benchmarks show, like one of the above posts mentioned, is what a processor is capable of when software is fully optimized for it. I would still say, that if AMD put more effort/resources into their compilers, we would most certainly see higher SPEC CPU 2000 scores. Everyone knows that AMD's FPU is faster than Intel's PIII, but why would the SPEC scores be so similar between these two??? The only other factor involved here, that most people disregard altogether, is the compiler used. I was just looking at the disclosures from AMD and Intel with regards to the SPEC results and AMD isn't even using their own compiler (it might even be that they don't have one), they use Intel's C compiler 5.0.
  26. "Athlon's FPU can't be called SUPERIOR to that of Pentium4's" Yet you go on to ramble about the CPU itself. I'm talking about the FPU (Floating Point Unit), in case you haven't noticed. The Athlon's FPU is far superior to the P4's and even the P3's. I'm not talking about benchmarks or anything either, so don't go blabbing about them. Numbers mean nothing unless the P4 can prove itself in the real world.

    Satan Clara...... 'Nuff said.
  27. Lucol, that doesn't explain the Alpha 64, MIPS, Sun, QED and others that scored lower than the P4.

    Grizly1, more info is being released soon on the FPU and the P4 for the next rev @ .13 that I'm sure you will be very interested in.

    Athlon has two 64-bit MMX/Enhanced 3DNow! pipes, and again the dedicated store pipe. Willamette has one 128-bit SSE(2) pipe and one dedicated load/store pipe. So when it comes to single-precision floating-point execution, both can do calculations on 4 numbers per cycle.

    However, with the Willamette it looks easier to me because there is only one execution pipe and one doesn't have to look at pairing restrictions. But of course information about Willamette is still very early, so nothing is certain yet.

    Willamette's SSE2 unit contains many more instructions, which is an advantage for Willamette. The fact that Willamette can do double precision (2 per cycle) and 128-bit integer, and Athlon cannot, is also an advantage. A last advantage is that Willamette is designed to reach higher clock speeds, so even if it is clock for clock slower (which probably is only the case with the FPU) it will still be competitive, and if SSE(2) is used it will probably beat Athlon.

    But of course if someone else has another opinion I'm very willing to hear it.

    Intel has taken notice of the FPU disparity and is looking at resolving it soon.

    I wish AMD would do the same regarding thermal protection. Then I wouldn't have to keep responding to posts regarding that subject.
  28. Quote:

    Grizly1, more info is being released soon on the FPU and the P4 for the next rev @ .13 that I'm sure you will be very interested in.

    I am. If they price it considerably I might even consider getting it!

    Yes, I believe the P4 will be better once a lot of the SSE2-optimized software comes out. But please don't forget the Athlon wasn't designed to compete against the P4; the Palomino and SledgeHammer were.

    It will be nice when AMD adds thermal protection.

    Satan Clara...... 'Nuff said.
  29. I was simply stating the fact that AMD doesn't use their own compiler; I don't even really know if they have one, for that matter. They're using Intel's C 5.0 compiler, and thus it would not optimize the SPEC benchmarks for the Athlon. So if AMD ever makes a compiler for its processors, the SPEC scores would most likely be a lot higher.

    Just wondering, do you even know how SPEC benchmarks work?
  30. I was talking about the FPU... read it again carefully. I don't have anything to say about other features of the P4 or Athlon, because they don't matter.
    I think you still don't get it... ramble about the CPU itself?? Did I?? Go read it again... You think the Athlon's FP unit is technologically "superior" just because it runs 80-bit code better? NOT!!! I agree that the Athlon's FP unit is better suited to most of today's apps, but that doesn't make it a CPU with BETTER FP... what about the Alpha then... just because it doesn't run x87 FP programs, can you call it garbage??
  31. < Yes, I believe the P4 will be better once alot of the SSE2 optimized software comes out. But please don't forget Athlon wasn't designed to compete against P4, the Palomino and SledgeHammer were. >

    In my opinion, Palomino can't compete with the P4 for long... without SSE2-optimized programs it will be good, just like the Thunderbird, but it won't be for long. You have to realize that Palomino is nothing but a Thunderbird with lower power consumption... maybe it will come out with a better branch predictor (the one in the TB stinks big time), and maybe even SSE support.

    Sledgehammer is not AMD's answer to the P4... Sledgehammer will compete mostly with Itanium/Foster/McKinley because it is a 'SERVER' CPU... the desktop version of the Hammer family is called 'Clawhammer'... this one will be a direct competitor to the P4... get this one clear, dude.
  32. Whose C++ compiler are you using, and is it SSE2 compatible? If you are using an SSE2 compiler or are going to use one at some point, then the P4 is the way, the truth and the light. If you are using a non-SSE2 compiler, then AMD is a better all-round choice for the money. As with many things these days, it is about software, not CPU. If you mail order a system, double check the heatsink.

    Please do not read this next paragraph if you already know what SSE2 is. SSE2 is the future. Both AMD and Intel are going SSE2 in future chip designs. SSE2 is a kind of second super power FPU. The p4 has a decent (not great) regular FPU that works on all modern software. Athlon's FPU is argued to be better than P4's and that may very well be so. P4's have a second FPU named SSE2. Most of today's software cannot use the SSE2 part of the p4 so on today's non SSE2 software Athlon trades wins with p4 and is a better value for the money. In the future software will be SSE2 and p4 will shine.

    How long does the system need to last? Will you upgrade CPU/mobo in a year or will it be several years? If it is going to be several years then the p4 is more future proof because it has SSE2. If you will upgrade soon you could save money and get an Athlon now and rethink the question after some new stuff is out.
  33. Correct me if I'm wrong, but the FPU can't be optimized the way you people say. My knowledge is pretty much limited to graphics rendering, which is all FPU, memory bandwidth and special instructions aside. When I bought my PIII 500 when it was brand new (sold one of my kidneys for my damn system), I was pleased that I'd be able to render really quickly because of SSE.

    Much to my disappointment, I didn't see any gain in rendering times over a CPU with the same FPU ratings. Why is that? Well, I emailed Discreet about it (we're talking 3dsmax3) and they said there was very little they could do to optimize rendering with SSE or 3DNow!. Granted, using the P3 gave me a 20% boost in the viewports, but rendering is where it counts. I suspect this will most likely be the same with SSE2. This is a real-life benchmark. It seems SSE and 3DNow! both work with things like Quake 3. You see enormous gains in games that support SSE and 3DNow!

    Of all the animators I've talked to, all use AMD or can't afford to update their workstations yet (trying to recover from the last Intel expense). Flame me if you must, Intel lovers, or say that rendering times don't matter, but we are talking about the most power-hungry benchmark ever. And don't think that Discreet didn't want to support SSE the best they could; we all bitch about rendering times and want them to speed it up, but we can't sacrifice quality either. To sum it up, to me an FPU in its purest form has to come with muscle from the start and not try to drink lots of water to puff up (make special instructions fix a poor FPU).

    Just to add, we animators aren't too happy with the P4. We've never seen a boost in rendering times like the Athlons gave us. Sure, the P4 can perform better in graphics apps in certain situations, but not when it comes to rendering. I just hope the animation community is big enough to keep AMD from forgetting about the FPU and going with the mostly software-dependent boosts like Intel does. Geez, the CPU wars are pretty much run by Quake and fps. Don't forget about us, AMD, we love you!
    Edited by m_kelder on 02/15/01 07:57 AM.
  34. Actually it's a TBird with lower power consumption and higher clock speeds. We all know how crappy the P4 is clock for clock. (The Palomino is supposed to go up to at least 1.7GHz.)

    Satan Clara...... 'Nuff said.
  35. Quote:
    Sledgehammer is not AMD's answer to the P4... Sledgehammer will compete mostly with Itanium/Foster/McKinley because it is a 'SERVER' CPU... the desktop version of the Hammer family is called 'Clawhammer'... this one will be a direct competitor to the P4... get this one clear, dude.

    That is why I said SledgeHammer *AND* Palomino. I read the roadmaps as avidly as I look at Jessica Alba pics (don't ask), I've heard all there is about them.

    And also I figured I better put SledgeHammer in there because, well, you're comparing the P4 to the TBird. Same as comparing Sledge to P4. Just a switch of sides.

    Satan Clara...... 'Nuff said.
  36. For rendering, raytracing and such, it's pure FPU x87, and very little can be done to optimize it with either 3DNow or SSE/SSE2. Some guy optimized POVRay to use SSE2, but the gains were a mere 1 or 2 seconds from what they were before. One thing that you can do, is optimize an application for a specific processor, this usually shows very large performance boosts as you can see with the following page:
  37. Howard has asked about C++ not rendering.

    Flask is a video conversion program written in C++. Flask was originally put out after being compiled with M$'s non-SSE2 compiler. At this point there was little difference between the P4 and the Athlon. Then Intel got hold of the source code and recompiled it using Intel's SSE2 C++ compiler. The AMD Athlon's times improved quite a bit using Intel's compiler over M$'s. The P4's times improved even more than the Athlon's. No matter which chip you buy, Intel's compiler seems a good investment.
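
The "same source, different compiler" comparison described above can be tried on your own code with a small timing harness. The following is only a sketch under stated assumptions: `fp_kernel` is a made-up stand-in for whatever floating-point loop you actually care about, and `best_time_seconds` is a hypothetical helper name. Only standard C++ (`std::chrono::steady_clock`) is used, so the same file can be built with each compiler and setting you want to compare.

```cpp
#include <chrono>
#include <vector>

// Hypothetical floating-point kernel: sum of a * x[i] * x[i] + b.
// Replace this with the loop you actually want to measure.
double fp_kernel(const std::vector<double>& x, double a, double b) {
    double sum = 0.0;
    for (double v : x) {
        sum += a * v * v + b;
    }
    return sum;
}

// Runs the kernel `repeats` times and returns the best wall-clock time in
// seconds. The kernel result is written to result_out so the compiler
// cannot discard the work. Returns -1.0 if repeats is zero or negative.
double best_time_seconds(const std::vector<double>& x, int repeats, double& result_out) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        result_out = fp_kernel(x, 1.5, 0.25);
        auto t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best) {
            best = s;
        }
    }
    return best;
}
```

Calling `best_time_seconds(x, 5, result)` on a large vector and printing both the time and `result` for each build gives a like-for-like comparison. Taking the best of several runs is a design choice to reduce noise from other processes; it does not remove it.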
  38. sorry, lost track of main topic.
  39. Sorry, lost track of the main topic. Whenever people talk about FPU I forget everything but rendering, my bad.
  40. I didn't mean to make anybody feel bad or anything. BTW rendering is one of the few good reasons to care about FPU and worthy of a thread of its own. Used to do rendering on Amiga 25MHz, hours for one picture. I can't believe that SSE2 helps so little in rendering. Ooops sorry, off topic.
  41. If the software-based renderer is not SSE2 optimized, of course it won't help much. Duh!

    Software rendering is dumb in the first place; hardware rendering is the only way to go.

    Research Wildcat and GLint: up to 16 processors for geometry and masking layers.

    For example, Quake 3 uses hardware to render scenes, and some video cards can render up to 200 frames in a second. ATI developed an Unreal map with 40 MB of textures??

    Now step back to 3D Studio and its ancient software-based rendering: it takes too long to render even 1 frame of a simple scene.

    All the cool features like raytracing, bump mapping, per-pixel shading, and so on are all done in hardware now.

    Now let's look at Evans and Sutherland. For those that don't know, E&S is used by the biggest companies in the world for modeling, rendering, and RVD. Chrysler Motors Corp. uses E&S to design new cars, with every detail in a car from fan belt to tread pattern all rendered in hardware in real time.
    This wouldn't be possible with software-based rendering.

    Maya has software-based rendering; good for modeling, though.
    Spielberg, ILM, and other movie production companies will use an Onyx or other high-end SGI to render with hardware vs. doing all the frames via software rendering.
  42. Hey hey, once again, let's just bash and argue back and forth. Gee, what a great forum. Whatever happened to helping each other with problems rather than arguing?
  43. Fugger, I may not be the smartest guy in the forum, but I do try to help and I do actually read what others write. If a guy can't afford to upgrade to a P4, then he certainly can't afford to go out and buy a room full of SGIs and make a render farm. We have SGI where I work (two stacks of workstations Cray linked with hot swap SCSI ar ar ar), but you will not catch me buying SGI for home use.

    Fugger, you are a smart guy; how about helping with some real data on P4 vs Athlon in C++ applications with and without SSE2?
  44. Here's how I see it.
    T'Bird/Duron have a strong FPU. The P4 has a weaker FPU but becomes *very* efficient when using SSE2 software. T'Birds/Durons offer tremendous value for money.

    The thing is that *now*, there isn't really much SSE2 software around... like MMX took quite a while to sink in. The difference is that AMD will implement SSE2 as well. So by the time the P4 really shines, there will be an AMD solution, which will probably be cheaper. Please note that while MMX made very little difference, SSE2 will.

    On general performance, will most users notice much change? The average computer user almost never utilises the full capabilities of their system. Ok, if you do 3D stuff or weather simulations or whatever then yeah, but most people do this at work. For general internet etc. a very modest system will suffice (trying not to mention the iMac :). Gamers often push their systems, but then generally the graphics card becomes a bottleneck first.

    Please note that I am not particularly pro-AMD (especially after they ditched their socket 7 users), but as a student, what matters for me is the price/performance ratio, and this is definitely what AMD is achieving right now.

    Sorry if this is off at a tangent slightly


    Black holes really suck...
  45. Howard,

    I am in a similar position to you. I use my machine to run simulation code at work. Mostly Fortran code I've hacked together - pretty much all repetitive floating point operations (modeling of optical wavefront propagation if you're curious). Since this machine is for work, a few hundred bucks price difference doesn't matter and the "bang per buck" arguments of AMD's advantages are irrelevant. I just need fast execution times.

    Anyway, rather than rely on the various benchmarks I've seen on the web I ran a very simple one of my own. I took
    a chunk of Fortran code I'd written on an old compiler (only PII optimizations) and ran it on various current machines. The required execution times were:

    PIII (1 GHz) 58 sec
    TBird (1.1 GHz) 48 sec
    P4 (1.5 GHz) 73 sec

    This difference surprised me, but is consistent with the performance described on various hardware review sites. I subsequently got a copy of Intel's new SSE2 optimizing Fortran compiler (they also have a C version) but it appeared that I'd have to rewrite significant portions of my code to use it. In addition, it wasn't clear to me how much the SSE2 optimizations would improve simple floating point operation. It looked to me like the big advantages were in 3-D graphic operations which are irrelevant to me.

    In any case I just got my company to buy me a 1.2G TBird + KT133a machine. YMMV, but for my application this seems to be the best choice.

    Fred Vachss
  46. damn, you are one lazy programmer! ;-)

    you should give it a shot anyway, just to see what it's like.
  47. Geez, Fugger, you're a really smart guy. I said even if the renderer was SSE2 optimized it wouldn't run any faster. Do you know anything about computer animation besides seeing it in Quake 3? Of course hardware renders 3D fast, that is why we have 3D video cards, but they can't do [-peep-] compared to what needs to be done. No video card can render like software can. You can OpenGL render some things, but the quality is pretty bad and not really worth it since it pretty much can only help in some parts. No video card in the world can handle the hundreds of high-res textures in scenes. If you want realism you need software rendering and you need a solid FPU… You need AMD.
  48. Sorry, Fugger, but you are confusing apples and oranges. Hardware rendering is used for low-resolution, high-speed rendering: previews, basically. Hardware rendering is analogous to a screen capture. Even on the biggest systems, hardware rendering cannot take the place of software rendering. The Wildcat line of graphics boards has a hardware overlay that allows large files to be manipulated in wireframe or simple shaded form on the screen. The graphics card does nothing to help actual rendering speed. All but the simplest mapping (i.e. bump, incandescence, specular, and so on) is done in software rendering. Systems like the SGI Reality Center do hardware rendering in OpenGL, but again the resolution is still much lower than software rendering. Fan belts, tread and leaves show up, but at a coarse resolution.
    I am writing this from an SGI Octane and I use Renderman on a daily basis.