The "Stress Test" flaw

It's basically testing the OS scheduler, not the hardware capacity. It's much easier for the OS to schedule 4 tasks to 4 logical CPUs than to schedule 4 tasks to 2 logical CPUs. Also, the context-switch rate of the Intel system is MUCH lower than that of the AMD system in this test, so the AMD system has to spend much, much more processing power on context switching than the Intel system does.

Tom's test shows the most ideal scenario for the Intel system, while it shows the real-world performance of the AMD system.

  1. Well duh? And what else would be the point of Intel paying Tom's so many advertising $?
  2. The only data that can be ascertained from this stress test is that THG is run by a complete bunch of incompetent morons.

    All was not rosy with AMD, however: our tests showed that it lagged behind Intel with respect to Divx compression. We still don't have a good answer for the cause of this difference, however. But if you do a lot of videos, stick with Intel for the time being.

    This should actually read: "We at THG are a complete bunch of morons. We have absolutely no clue how the operating system works in regards to the Windows scheduler. Furthermore, if you're going to do a lot of video work while playing a game and running WinRAR in a continuous loop, along with ripping a CD with LAME, you're better off with Intel, if the game you're playing is Far Cry and you don't mind wild fluctuations in frame rate, rendering it unplayable anyway."

    What are the reasons for the AMD dual core CPU being apparently unable to ideally distribute the four applications on the two cores, even though the load on the single cores is at maximum? Is this an issue of the integrated memory controller and its memory allocation/controller logic? Or perhaps the integrated memory controller of the X2 produces more overhead, resulting in lower performance?

    As current tests are now clearly showing, what everyone else has been saying, except the morons at THG, is becoming obvious: the Windows scheduler is and was the reason for the low DivX scores on the AMD machine. Now, some of you might automatically assume this is an issue with the OS (i.e. Windows), but in all fairness it's not even that. Truth of the matter is, it was performing exactly as it should have been. The reason the Intel system with four logical cores was able to outpace the AMD system is actually a glitch in the Intel processor, albeit a helpful one to some extent for the purpose of this test. It does appear that this glitch, as I call it, negatively affected the other three tests on the Intel system. If the Intel system with HT enabled chooses to run DivX encoding threads at the cost of performance on the other three apps, which have been given a higher thread priority, it's not doing what it is supposed to do.

    Furthermore, nobody, but nobody, would ever run applications like this test on their home machine. The original purpose of this test was merely to max out the CPUs and check for stability, not some sort of misguided performance test. After several motherboards, new RAM and a new power supply, and having to pull out one video card, guess which system won the stability test? Granted, I am not going to totally dismiss Intel as not being able to be used in a stable system; I am, however, appalled at the incompetence of the system builders at THG.

    On a side note, can someone tell me if the AMD system is still running with the PC Power and Cooling supply, or did it get swapped out at the same time the Intel system's did? If not, this totally skews any power usage data as well.

    In summary, Tom's has once again reached an all-time low and is really becoming the laughing stock of the hardware community. For crying out loud, these guys can't even keep the network working!

    It's not what they tell you, it’s what they don't tell you!
  3. After reading this article for the first time, I was so amazed that the testers would even think that a 10X performance increase due to HT could be because of hardware that I HAD to go to the forums to make sure there was someone out there who knew how software works! What, does HT suddenly unlock those extra 10 CPU cores that Intel has hiding in there?! Come on!!! It seems pretty obvious that the Windows scheduler is playing havoc with these DivX processes and their threads, so that the context switching is taking up more CPU time than the actual video encoding. I hope people out there don't actually read this article and think that HT is some magical piece of circuitry that's going to speed up their media encoding for ALL applications.

    I just lost a lot of respect for THG.
  4. THG just doesn't have any sense of how the OS works and how the OS schedules processes. I wonder if they even have any idea what "context switches" means. They just know about overclocking... well... maybe they are right, they are a "hardware guide", they just don't care about how the OS works.
  5. >I wonder they even have no idea what "context switches" means

    I have to wonder if *you* do, though, as I have seen not a shred of evidence the Athlon would be faster in this regard, let alone that this would have a measurable performance impact. So what makes you think this? I know AMD added some instructions in their AMD64 extensions to enable faster context switches, but since purely 32-bit binaries are used in this test, that won't show. Not that I expect an AMD64 OS and apps to show any measurable real-world effect of this either (not to mention, I'm not quite certain Intel didn't implement the very same instructions in EM64T).

    So what makes you think the Athlon switches contexts faster?

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  6. I didn't say AMD does context switches "FASTER"; I said the AMD system does MUCH MORE context switching than the Intel system because of the scheduler effect.
  7. I agree with each and every word there, well said.

    >On a side note can someone tell me if the AMD system is still
    > running with the PC Power and Cooling supply or did it get
    >swapped out at the same time the Intel system did? If not,
    >this totally skews and power usage data as well.

    AFAIK, the AMD rig is still using the 850W PSU, so indeed the power measurement results tell you nothing. We already know the real figures though: the 840EE consumes well over twice what an X2 pulls, and happily exceeds 150W.

    With so much incompetence, there is still one thing that stands out: how the hell can they NOT test whether the P4 is throttling? It's constantly running at the maximum specified Tcase.

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  8. Ah, ok. To be fair though, you can't blame the OS for this; it's simply a feature of the 840EE that it can run more threads simultaneously. So if that gives you a performance boost (actually, a throughput boost) in certain "out of this world" scenarios, then great for the 840. If you have 4 DVD writers and regularly encode 4 DVDs at once, it's possible the 840EE is the faster chip, even though it will be slower at encoding a single, and maybe 2, DVDs simultaneously, and it will be a much worse game performer while encoding.

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  9. I'm not blaming the OS for the result. Actually, ANY OS can schedule 4 tasks to 4 CPUs much more easily than 4 tasks to 2 CPUs.

    To simplify: 4 tasks -> 4 CPUs means that the scheduler doesn't need to keep switching the tasks among the CPUs. 4 tasks -> 2 CPUs means that the scheduler needs to keep switching the tasks at a very high rate.

    I'm just trying to explain the abnormal performance result from Tom's test. I'm sure that if Tom ran 5, 6 or even more tasks together, the result would be totally different.

    Testing true multitasking performance should involve something like a web server, SQL server, etc., which create a large number of threads; however, I understand most users just don't care about web server or SQL server performance in their daily life.

    As I said, Tom's test shows the most ideal scenario for the Intel system: 4 tasks, 4 CPUs.
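    The scheduling effect I'm describing can be sketched with a toy round-robin scheduler (a deliberate simplification, not how the Windows scheduler actually works: no priorities, no affinity), just counting how often a CPU has to swap to a different task:

```python
def count_switches(num_tasks, num_cpus, ticks):
    """Toy round-robin scheduler; assumes num_tasks >= num_cpus."""
    running = [None] * num_cpus          # task currently on each CPU
    queue = list(range(num_tasks))       # runnable tasks, FIFO run queue
    switches = 0
    for _ in range(ticks):
        for cpu in range(num_cpus):
            if running[cpu] is not None:
                queue.append(running[cpu])   # preempt: back of the queue
            nxt = queue.pop(0)               # dispatch next runnable task
            if running[cpu] is not None and running[cpu] != nxt:
                switches += 1                # CPU changed task: a switch
            running[cpu] = nxt
    return switches

print(count_switches(4, 4, 100))  # 4 tasks on 4 logical CPUs
print(count_switches(4, 2, 100))  # 4 tasks on 2 logical CPUs
```

    With 4 tasks on 4 CPUs, every task keeps its CPU and the switch count stays at zero; with 4 tasks on 2 CPUs, the CPUs have to swap tasks on nearly every tick.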
  10. What's a context switch?

    <font color=green>NED AND MOZZARTUSM - REAL (P)RESIDENTIAL CANDIDATES</font>
  11. From what I can make out, it is the event of switching out a context, i.e. writing out the information (data/code) of the current process to memory (or cache if it will fit), and then loading the information (data/code) of the next process into cache and internal buffers/registers.

    Although on paper HT means you should have fewer context switches, the problem arises that you don't actually have 4 separate cores, only two that can be shared.
    If all the data and code of all four threads can fit into cache at once, you could get some good performance improvements; however, if not, some context switching will still be required, thus dropping performance.
  12. More or less what you said, but I don't think the 'data' includes the working set of a thread; all that is 'switched' is register values, the program counter and maybe some other minor flags/stuff. And yes, that will always fit into the cache :) but I'm not sure it's written there. Since you don't need fast access to it, and you need the cache for your next process, it would be better to put it somewhere in RAM I think, but I don't know for sure how this is handled.

    As for HT, there you are wrong. HT-enabled CPUs actually have 2 independent contexts per core, so each virtual core has indeed its own program counter and registers (well, not quite, but almost... at least it looks like each virtual core has). AFAIK, context switches are no different on an HT-enabled CPU than on any other, at least if you compare a context switch on one (virtual) core with a normal core; nor would they occur less or more frequently for any other reason than that the OS scheduler thinks it's a good idea to time-slice or switch threads. Cache size has nothing to do with it at all, other than that the cache is shared by both virtual cores, and therefore it's possible one thread running on one virtual core 'pollutes' the cache for the other thread, which can result in performance degradation (both cores fighting over the cache).
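    To make the "what actually gets switched" point concrete, here's a hypothetical sketch (the structure and field names are made up for illustration, not any real kernel's) of a context switch that saves and restores only the architectural state, leaving the thread's working set untouched in RAM:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Per-thread architectural state -- the only thing a switch moves."""
    program_counter: int = 0
    stack_pointer: int = 0
    registers: list = field(default_factory=lambda: [0] * 16)  # e.g. 16 GPRs
    flags: int = 0

def context_switch(cpu: Context, old: Context, new: Context) -> Context:
    """Save the running thread's state into `old`, then load `new`."""
    old.program_counter = cpu.program_counter
    old.stack_pointer = cpu.stack_pointer
    old.registers = cpu.registers[:]
    old.flags = cpu.flags
    # Note: nothing here touches the thread's data in RAM; the cache
    # simply refills on demand after the switch.
    return Context(new.program_counter, new.stack_pointer,
                   new.registers[:], new.flags)
```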

    Hope that clears it up a bit.

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  13. I stand corrected.
    That clears a few things up, yet probably opens up a whole new can of worms that I'd need to think about.
    Given the small trace cache, is it not more likely that the two cores will fight over the cache, especially if 4 threads are running concurrently?
    >Given the small trace cache, is it not more likely that the two
    > cores will fight over the cache, especially if 4 threads are
    > running concurrently?

    Whether it's 4 threads on an 840EE or 2 threads on a regular P4 with HT, it's the same thing. The trace cache, just like the L1 data and L2 caches, is shared between virtual cores on the P4 (although tagged; AFAIK virtual cores cannot use each other's data), so the effective size is cut in half. But I am not sure what you are comparing against when you say 'more likely'... did you mean to compare to a single-core P4? Then, no, there is no difference, since the dual core will also have 2 trace caches :) The only potential pitfall is if the OS doesn't know a virtual from a physical core and, when running just 2 intensive threads, would schedule them on the same core, thereby rendering your dual-core machine into just a regular P4. But Windows (XP/2003) and Linux know about hyperthreading, so they won't. If memory serves, this was/is an issue with Windows 2000 though, hurting the performance of dual Xeons or dual-core P4s with hyperthreading enabled.
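    You can see the "keep busy threads off sibling logical CPUs" idea at user level with processor affinity. A minimal Linux-only sketch (`os.sched_setaffinity` isn't available on every OS, and the assumption that logical CPUs 0/1 and 2/3 are HT siblings is a common enumeration, not a guarantee):

```python
import os

def pin_to_cpus(pid, cpus):
    """Restrict `pid` (0 = calling process) to the given logical CPUs."""
    available = os.sched_getaffinity(pid)
    wanted = set(cpus) & available   # drop CPUs this machine doesn't have
    if wanted:                       # never pin to an empty set
        os.sched_setaffinity(pid, wanted)
    return os.sched_getaffinity(pid)

# e.g. pin two heavy encoder processes to logical CPUs 0 and 2 so they
# land on different physical cores instead of sharing one via HT:
# pin_to_cpus(encoder_a_pid, {0})
# pin_to_cpus(encoder_b_pid, {2})
```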

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  15. FYI, if anyone wants to read a good article on the subject of context switching, hyperthreading and P4 CPU design, go here:

    My original point, I guess, is that it's not possible for HT to make the hardware 10X as efficient. HT is designed to fill empty execution slots in the CPU that would otherwise go wasted. Its maximum theoretical performance increase would only be 100% (which in reality is far less because of shared resources and other factors). This benchmark is showing a 1000% increase, therefore it's not the HT that's directly causing the performance increase.

    The other thing that HT does besides increase efficiency (or possibly decrease it, depending on the situation) is to expose 2 CPUs to the Windows scheduler for each core. This affects the time-slice quantum that Windows gives to each thread, which in turn changes the number of context switches that take place. It's impossible to tell for sure what's happening without a debugger, but this seems the logical reason as to why there's such a drastic performance change.
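    The arithmetic behind that first point, spelled out (the 2x ceiling is simply the number of hardware threads per core; real-world HT gains are far smaller):

```python
# HT adds a second hardware thread per core, so the hard upper bound on
# throughput gain from HT alone is 2x; measured gains are usually much less.
ht_ceiling = 2.0
observed_gain = 10.0   # the ~10x DivX jump in the stress test

# Even the impossible best case leaves a large gap unexplained, which
# must come from how the scheduler slices time between the tasks:
unexplained = observed_gain / ht_ceiling
print(unexplained)
```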

    The bottom line is that this problem is most likely caused by software, not hardware, and I would expect computer "experts" like THG to realize this.
  16. Thanks for the link (even though I read it already :).

    Mostly agreeing with what you say, but nevertheless:

    >This benchmark is showing a 1000% increase,

    No, it doesn't, as throughput isn't increased by a factor of 10. You would have to weigh the lost frames in Far Cry (and the lost performance on the other apps) against the increased DivX frames to determine the increase (or, who knows, decrease) in overall performance. Since it's pretty hard (if not impossible) to do that, you can't draw any conclusions at all. There could be a substantial performance increase or decrease; we just don't know. All we know is that performance on the other 3 apps was traded for more performance on the DivX app.

    Now if they had used 4 identical benchmarks, like 4x DivX encoding, we could have seen what HT brings for that situation, as you could simply add up all the rendered frames in both scenarios. But now you'd have to determine whether e.g. 100 rendered frames + 10 Far Cry runs + 50 CDs + something else is more than 500 rendered frames + 5 Far Cry runs + 25 CDs. Obviously I chose the numbers randomly, but you get the point.

    Now, somewhere else in this forum I made an attempt to do just that anyhow, by (incorrectly) assuming the 840EE spends an equal amount of processing time on each benchmark, so each benchmark result is weighed equally. Interestingly, the 840EE and X2 then ran neck and neck (within a few percent or so).

    Now, I said 'incorrectly' because the DivX encoding runs at a lower thread priority, and Far Cry (being the foreground app) at a higher priority; therefore the CPUs should spend more time on Far Cry and less on DivX, which means the DivX encoding result represents a smaller part of the overall throughput than the Far Cry result. When you factor that in, ironically or not, the X2 outperforms the 840EE. I haven't done this exercise yet with the 840EE-sans-HT results, but I will later and post it; I assume the non-HT 840 would perform even (much) worse in this regard.
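    A sketch of that weighting exercise with made-up numbers (NOT the real THG results; the figures below are purely illustrative, just shaped like them: the 840EE far ahead on DivX, behind on the foreground game) shows how the choice of weights can flip the winner:

```python
# Hypothetical per-benchmark throughput numbers -- illustration only:
x2    = {"divx": 100, "farcry": 40, "lame": 30, "winrar": 20}
ee840 = {"divx": 300, "farcry": 20, "lame": 25, "winrar": 18}

def weighted_score(results, baseline, weights):
    """Sum of per-benchmark ratios (results vs. baseline), weighted."""
    return sum(w * (r / b) for r, b, w in zip(results, baseline, weights))

keys  = list(x2)                       # benchmark order
equal = [0.25, 0.25, 0.25, 0.25]       # every app weighed the same
prio  = [0.10, 0.45, 0.25, 0.20]       # foreground Far Cry up, DivX down

eq_score = weighted_score([ee840[k] for k in keys],
                          [x2[k] for k in keys], equal)
pr_score = weighted_score([ee840[k] for k in keys],
                          [x2[k] for k in keys], prio)
print(eq_score, pr_score)   # > 1.0 means the 840EE wins, < 1.0 the X2
```

    With equal weights the 840EE's huge DivX score dominates the total; once the foreground app is weighted up and the low-priority DivX job is weighted down, the X2 comes out ahead.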

    >The bottom line is that this problem is most likey caused by
    > software, not hardware

    While I agree with your reasoning, I don't agree with your conclusion. First, I don't see a "problem" at all. The X2 does exactly what is being asked of it, that is: do not waste CPU cycles on DivX (low priority), and instead focus on Far Cry (high priority). So it's no problem IMO. Secondly, the differing results are indeed a result of differing hardware + software. If you want to summarize it nevertheless, I would say that HT can (and often/usually does) improve overall throughput, but possibly at the expense of foreground-app performance. Neither the advantage nor the disadvantage strikes me as particularly huge though; it's no different from what a differently tweaked OS scheduler could produce...

    = The views stated herein are my personal views, and not necessarily the views of my wife. =
  17. What would be interesting is to run the same tests THG did, but with another video-encoding program, such as XviD or WMV, instead. Maybe they could also run the same tests on a Linux setup (but with a different 3D game instead :) ). Anyway, I think we need some more testing, such as running similar tests with different software, in order to draw more definitive conclusions about the whole DivX and HT affair.

    And for all we know, the DivX encoder could have been programmed in a way that it either requests more CPU time or is specially tuned for HT-enabled processors. That's why I think THG should have run other video-encoding programs too.

    Oh, and running the test on Linux would help us understand the Windows task scheduler a little better.

    I know that I'm offering this idea after the test, but it would have really been a good test for THG to do. The whole stress test posed a lot of questions, and IMO that calls for more tests to better understand the Intel and AMD systems.
  18. WOW, there was a flaw in the "Stress Test" :wink: :wink:
  19. Lol.

    <font color=green>NED AND MOZZARTUSM - REAL (P)RESIDENTIAL CANDIDATES</font>