Well, remember that almost all current software is poorly optimized for the P4, because its architecture is still new to developers. So the P4 isn't performing the way it should. It's like an alien spaceship landing here: however advanced they are, they'd have a hard time doing things our way!
Kind of a weird example, but you get what I mean. SSE2 support will arrive one day, but by the time it does, Athlons will have it too, and the P4 itself will have changed, so we can't judge the future from where we stand now!
I can't speak for software in general, but I use LightWave, which was optimized for the AMD platform in version 7.0. Then Intel stepped in and helped the programmers at NewTek optimize it for the Pentium 4 as well (mainly by adding SSE2 support) for the 7.0b release.
Now benchmarks that AMD platforms once ruled are being won by the Pentium 4 by 20% or so.
Still, that is probably the only reason to get a P4. Hell, the XP 1900+ is very close to it in LightWave; in fact, AnandTech's review of the XP 1900+ shows it being beaten by a mere point! And it is still cheaper, and you can pair it with tons of RAM sticks for the price of 256MB of RDRAM. So SSE2 isn't going to be adopted widely enough to make the P4 worthwhile until Northwood, hopefully. Who knows what AMD will unexpectedly add to Thoroughbred?
However, the current XPs are still quite fast, and high-end XPs will still outperform several of the P4s even with some SSE2 enabled. How far-reaching it will be, I'm not sure, because not every app can gain from it.
However, the XPs also stand to gain as SSE apps get streamlined. They have come out ahead in apps that were SSE-enabled for the P4 but not for the XP, so it looks like we'll have to judge implementations one code set at a time.
Chestnuts roasting on an open CPU
Bill Gates nipping at your wallet
Very true; no question the price/performance crown has to go to AMD. I would love to see that link to the LightWave benchmarks. I didn't think AnandTech used it, and I couldn't find it by searching their site.
To simplify things a bit, you can say that SSE2 doesn't really add any performance over SSE(1), just precision: SSE works on single-precision floats, and SSE2 adds double precision.
Not in floating-point math, anyway. The integer side (a.k.a. MMX) is wider in SSE2: it works on 128-bit registers instead of MMX's 64-bit ones.
When people talk about gaining performance by SSE2-optimizing software, they mean performance relative to the regular 80-bit x87 FPU. For applications where high (well, medium anyway) precision is important you can gain a lot, but your favourite first-person shooter is not going to run any faster.
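For anyone who wants to see the width-versus-precision point in numbers, here is a rough sketch in plain Python. It does not execute any SSE instructions; it only counts how many elements fit in a register and uses the standard struct module to round values to single precision. The helper names (lanes, to_single) are made up for this illustration.

```python
import struct

XMM_BITS = 128  # width of one SSE/SSE2 register
MMX_BITS = 64   # width of one MMX register


def lanes(element_bits, register_bits=XMM_BITS):
    """Number of elements of the given width that fit in one register."""
    return register_bits // element_bits


def to_single(x):
    """Round a Python float (a double) to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # Floating point: SSE packs 4 singles per register, SSE2 packs 2 doubles.
    print("SSE  single-precision lanes:", lanes(32))  # 4
    print("SSE2 double-precision lanes:", lanes(64))  # 2

    # Integer: 16-bit elements per register, MMX versus SSE2.
    print("MMX  16-bit integer lanes:", lanes(16, MMX_BITS))  # 4
    print("SSE2 16-bit integer lanes:", lanes(16, XMM_BITS))  # 8

    # Precision: single has a 24-bit significand, so 2**24 + 1 cannot be
    # stored exactly, while a double holds it without trouble.
    value = 16777217.0  # 2**24 + 1
    print("as double:", value)
    print("as single:", to_single(value))  # 16777216.0
```

So per instruction, double-precision SSE2 actually handles fewer floating-point values than single-precision SSE; what you gain is precision, plus the chance to get off the x87 FPU for double-precision work. On the integer side, the register really is twice as wide as MMX.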