2.2 GHz P4 slower than P3!

cgreene

Distinguished
Feb 21, 2002
4
0
18,510
I use some statistical software heavy in floating point. Here is a puzzle the software firm has not been able to solve.
The software runs with nearly equivalent speed on a P3 under win98 and win2000.
But a 2.2 GHz P4 machine under win2000 runs SLOWER than a 1 GHz P3 under win2000 or win98. (Unfortunately, none of us has a P4 running under win98.)

For instance, calculations which take my win98 P3 144 seconds take 173 seconds on my win2000 P4. This is a Dell 2.2 GHz with 512 KB cache and 512 MB PC800 ECC RDRAM, which is not on a network and has no other software running (no virus software etc). The software company (QMS, or Eviews) has found the same problem on all the P4's they have tried, but no idea what the source of the problem is. The software is written in C and then compiled and is not DOS-based.

Any ideas on how this is possible?
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
Crippling the Pentium 4's memory bus, by either equipping it with poor performing PC-100 or PC-133 SDRAM, or even worse, giving it ridiculously slow PC-66 CL3 SDRAM.

"When there's a will, there's a way."

Forgot to mention: there is almost no way that a Pentium 4 with RDRAM can lose to a Pentium 3. My best guess is that the system was advertised incorrectly and actually uses SDRAM.

Edited by Quetzacoatl on 07/10/02 11:58 AM.
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
Are you positive?...That just doesn't seem right. Still, it is ECC, which does hurt the performance a bit...let me take a look at those specs again.

"When there's a will, there's a way."
 

Lucol

Distinguished
Dec 31, 2007
177
0
18,680
Since the architectures of the PIII and P4 are very, very different, I would suggest you talk to the software company and ask them to compile with optimizations for the P4 (using Intel's C/C++ compiler). Code optimized for the P6 architecture (PPro, PII, PIII) usually runs pretty poorly on a P4, especially with regard to FPU operations.
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
*sighs* This is a longshot, but there have to be some loopholes here. Might be software optimization, although that's doubtful at best. If the Pentium 4 has poor cooling, it will throttle down, lowering the speed, which could explain its poorer performance...hmmm....Might have to do with the operating system, Windows 98 is a little dated...Could also be Windows 98 first edition, which had numerous problems and was very unstable. A Pentium 3 on Windows 2000, on the other hand, is very stable, and a much wiser idea. Still, have to keep in mind, the Pentium 3 could be a Tualatin, which has 512 KB cache on some of the workstation models and operates well with PC-133 SDRAM.

"When there's a will, there's a way."
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
That's what I was originally thinking, but it doesn't seem likely, considering a few facts. Both of them execute x86 instructions, and the Pentium 4 supports more extensions as well (SSE2, for that matter). Even if the software is optimized for the Pentium 3, the Pentium 4 has a 1.2 GHz lead, which should have easily made up for the loss in optimization.

"When there's a will, there's a way."
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
*shakes head in disagreement* The Willamette wasn't ready to be released. It lacked enough cache, and SSE2 did not make any impact until software was optimized for it. The poor performance was made up for by the die shrink and the improvements in the Northwood.

"When there's a will, there's a way."
 

Lucol

Distinguished
Dec 31, 2007
177
0
18,680
*shakes head in agreement*
Exactly, "...software was optimized for it". That's what I'm saying, that his stat program is not optimized for P4 in any way, so it doesn't run very well on the P4.
Here's another link of POVRay optimized and non-optimized scores.

http://students.washington.edu/sschmitt/pov/
 

flamethrower205

Illustrious
Jun 26, 2001
13,105
0
40,780
Ok, your software is heavy floating point, and the P3 is good at that (almost as much as an athlon). P4 (excuse me, I don't mean to start a flame war or anything, just helping him out) is inferior when it comes to PURE floating point (compiling it w/ SSE2 optimizations though makes it pretty nice). That's why; although it's running at 2.2Ghz, pure FPU isn't as good. This is why my system (1.2 Athlon) rapes my dad's dual 1.8Ghz P4 sys when he does pure fpu w/o sse2 optimizations.

My frog asked me for a straw...dunno what happened his ass all over the place :eek:
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
Lol, you put that a little blunt, but alright. I didn't know CPU's could rape each other, but I guess you have more experience than me. Yes, the Pentium 4 will probably never catch up in FPU performance...shame, that's all it needs to be truly dangerous...well, that and a better memory bus to feed it. So much untapped power...

"When there's a will, there's a way."
 

slickstaa

Distinguished
Apr 7, 2002
406
0
18,780
heres a link on how to make em clickable
:smile:
http://forumz.tomshardware.com/modules.php?name=Forums&file=faq

"If you dont have it, that's why you need it!"
 

AmdMELTDOWN

Distinguished
Dec 31, 2007
2,000
0
19,780
The software company (QMS, or Eviews) has found the same problem on all the P4's they have tried, but no idea what the source of the problem is. The software is written in C and then compiled and is not DOS-based.
well I have no idea how lazy those programmers are but here's a great place to find out about coding for the P4.

maybe this company needs an Intel rep to come over and teach them some new tricks!

http://www.intel.com/software/products/


"AMD/VIA!...you are still the weakest link, good bye!"
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
I would seriously look at the code of this particular software. It is likely that it is just filled with FXCH (FP exchange) instructions, which take 0 time on the P3 but can take up 1 or 2 cycles on the P4. If this software is filled with such instructions, then it may indeed run much better on the P3 than the P4, even with a 2x+ clock speed gain. Sounds like horrid programming to me. Who the hell would fill their software teeming with FXCH instructions?
 

Kelledin

Distinguished
Mar 1, 2001
2,183
0
19,780
Who the hell would fill their software teeming with FXCH instructions?
Someone who wanted to get the most out of a P5/P6 FPU. The x87 FPU architecture is stack-based, which means that from a programmer's view, software can only access one or two FPU registers at a time (but I'm sure you know all this already). This doesn't lend itself well to the instruction reordering involved in parallel execution. Intel's first solution to the problem was an FXCH instruction that was "free" in many cases, which allowed software to take advantage of the P5/P6 FPU's parallel processing. So any code that was seriously optimized for the P5/P6 FPU would have to use a lot of FXCH instructions.

I can love my fellow man...but I'm damned if I'll love yours.
 

Guest

Guest
Because of your ECC RAM?
Moreover, isn't this program written with SSE but not SSE2 optimizations?
(SSE for the P3, SSE2 for the P4)

sign linguage - SSL/HAL.
 

juin

Distinguished
May 19, 2001
3,323
0
20,780
I have a hard time believing that a T-bird beats a DUAL P4 1.8 on FPU, with or without SSE/SSE2.

cheap, cheap. Think cheap, and you'll always be cheap.AMD version of semi conducteur industrie
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
Still, the sheer number of FXCH instructions needed to tie a P4 2.2 down that much must be tremendous. Certainly more than was ever needed. Some guy prolly felt he needed to realign the entire FP register stack during every loop or something. That's just horrible programming practice.
 

bront

Distinguished
Oct 16, 2001
2,122
0
19,780
The FPU on the P4 isn't that thrilling, and if the code relies almost exclusively on that, it could be correct.

"When there's a will, there's a way."
You've got it wrong. It's "When there's a will, there's relitives"

English is phun.
 

Quetzacoatl

Distinguished
Jan 30, 2002
1,790
0
19,780
Hehe, well, let's say I did a phrase mod

"Where there's a will, there's a way."

Got that from some old video game...pretty obvious

"When there's a will, there's a way."

That's my phrase. And yes, if there is a will, there's relatives, plus food and money from inheritance.

"When there's a will, there's a way."
 

flamethrower205

Illustrious
Jun 26, 2001
13,105
0
40,780
Hehe, odd isn't it, but in SiSoft pure FPU a P4 2 GHz scores much lower than my 1.2. Well, here's something to prove it, 3D Studio Max R3 rendering (which is pure FPU w/ R3): http://www.tomshardware.com/mainboard/01q2/010605/760mp-07.html

My frog asked me for a straw...dunno what happened his ass all over the place :eek: