
Clovertown?? ... I don't get it!

December 6, 2006 8:52:18 PM

Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can someone please explain to me how this is possible? Someone must have screwed up big time, but that doesn't quite say it.

Even worse, the negative performance increase is achieved using Intel's best architecture, Core 2.

At this point I'm quite sure Intel's upcoming 45 nm chips won't make much difference in 2007; the FSB is killing them.

And I thought AMD 4x4 launch was poor ... well actually 4x4 managed to bring positive performance increase ...

Share your thoughts on this with me, please!


December 6, 2006 8:59:51 PM

Quote:
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can please someone explain to me how is this possible ??? Someone must have screwed up big time, but that just doesn't quite say it.

Most of the benchmarks in the review are workstation tests, that don't really take advantage of more than a few threads. Plus the comparison is with 3GHz Woodcrests vs 2.67GHz Clovertowns. Even for good scaling tests,

Quote:

And I thought AMD 4x4 launch was poor ... well actually 4x4 managed to bring positive performance increase ...

The FX-74 gets beat by the FX-62 in most single-thread applications, especially in games.
December 6, 2006 9:10:22 PM

Quote:

The FX-74 gets beat by the FX-62 in most single-thread applications, especially in games.

Yeah, but what did you expect?
A quad core is of no use for single-threaded applications.
Sure, the FX-74 has a 200MHz advantage over the FX-62, but with a non-NUMA-aware OS it has far worse memory latency (and even with a NUMA-aware OS it can probably still have some memory latency problems).
December 6, 2006 9:21:23 PM

At worst, one would expect the 3.0 to at least be on par with the 2.8. It's not like the 2.66 Kentsfield performs like a 2.13 E6400.
December 6, 2006 9:28:18 PM

Quote:
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can please someone explain to me how is this possible ??? Someone must have screwed up big time, but that just doesn't quite say it.

Most of the benchmarks in the review are workstation tests, that don't really take advantage of more than a few threads. Plus the comparison is with 3GHz Woodcrests vs 2.67GHz Clovertowns. Even for good scaling tests,

Quote:

And I thought AMD 4x4 launch was poor ... well actually 4x4 managed to bring positive performance increase ...

The FX-74 gets beat by the FX-62 in most single-thread applications, especially in games.

A few threads??? yahoo messenger has around 30 threads, the winlogon process can have 10, 20 or more ... FireFox with 5 open pages has 17 ... I'm quite sure any one of the multi threaded benchmarks spawns (can spawn) more than 8 threads. The system process in windows can have hundreds of threads ... A typical instance of SQL Server running on my dev. machine has close to 300 threads ...

As a software developer that works with databases every day I sure would like to see some test on database performance.

As far as I know the best threaded software and most scalable is encoding software so I'm expecting to see even worse performance for database tests ... also databases don't even require that much number crunching power they just need fast memory and disk access to scale well.
December 6, 2006 9:33:56 PM

Quote:
At worst, one would expect the 3.0 to at least be on par with the 2.8..It's not like the 2.66 Kentsfield performs like a 2.13 E6400.

Uhh, first off:
Intel: 2.66 / 2.13 = 1.25 = 25% clock advantage
AMD: 3.00 / 2.80 = 1.07 = 7% clock advantage
So as you see, the gap between the FX-74 and the FX-62 is much narrower than the gap between the QX6700 and the E6400.
Second, Clovertown uses the traditional Northbridge / FSB architecture.
This means that if one CPU is idle (e.g. with a single-threaded app), the other incurs no additional latency when accessing memory and can use the full bandwidth of the FSB, so it can run at its full potential.
With the HT / Direct Connect architecture used by AMD, the 2 CPUs do have a theoretical aggregate bandwidth twice as high as that of a single-CPU / single-socket system.
However, since the RAM is split between the 2 CPUs, if one CPU needs data stored in the memory of the other CPU, it has to fetch it through the HT link, with much higher latency than if it had accessed it through its own Integrated Memory Controller.
With a non-NUMA-aware OS, single-threaded (or poorly threaded) applications can get their memory space allocated across both CPUs, with much higher overall latency and hence lower performance.
A NUMA-aware OS would instead try to allocate each thread's / process's memory in the portion closest to the CPU that is going to run it.
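
For anyone who wants to check the clock arithmetic in this post, here is a minimal sketch. The clock speeds are the ones quoted above; the dictionary and the function name `clock_advantage` are just illustrative choices, not anything from the reviews.

```python
# Clock speeds in GHz, as quoted in the post above.
CLOCKS_GHZ = {
    "QX6700": 2.66,
    "E6400": 2.13,
    "FX-74": 3.00,
    "FX-62": 2.80,
}


def clock_advantage(fast: str, slow: str) -> float:
    """Return the clock advantage of `fast` over `slow` as a fraction (0.25 = 25%)."""
    return CLOCKS_GHZ[fast] / CLOCKS_GHZ[slow] - 1.0


intel_gap = clock_advantage("QX6700", "E6400")
amd_gap = clock_advantage("FX-74", "FX-62")

print(f"Intel: {intel_gap:.0%} clock advantage")
print(f"AMD:   {amd_gap:.0%} clock advantage")
```

This only compares clock speeds; it says nothing about memory latency, which is the second half of the argument.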
December 6, 2006 9:37:19 PM

Look at the 3D studio max 8 and cinebench 9.5 benchmarks you moron.
December 6, 2006 9:39:53 PM

Quote:

A few threads??? yahoo messenger has around 30 threads, the winlogon process can have 10, 20 or more ... FireFox with 5 open pages has 17 ... I'm quite sure any one of the multi threaded benchmarks spawns (can spawn) more than 8 threads. The system process in windows can have hundreds of threads ... A typical instance of SQL Server running on my dev. machine has close to 300 threads ...

Yes, but nearly all those threads aren't doing any work. They just sleep and occasionally wake up. It's rare that you have applications capable of splitting data across 8 threads, and even a little bit of serial code means that scaling will suffer at 8 cores. Cinebench runs 1.8X faster with a second core, but with 8 cores it is less than 5X faster.
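
The Cinebench figures above are roughly what Amdahl's law would predict. The sketch below is only an illustration under that assumption (the post does not name Amdahl's law, and real scaling also depends on memory and FSB effects): it backs out the parallel fraction implied by a 1.8X speedup on 2 cores and extrapolates to 8 cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_from_speedup(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: S = 1 / ((1 - p) + p / n)  =>  p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# 1.8X on 2 cores is the figure quoted in the post.
p = parallel_fraction_from_speedup(1.8, 2)
print(f"implied parallel fraction: {p:.3f}")

for n in (1, 2, 4, 8):
    print(f"{n} cores -> {amdahl_speedup(p, n):.2f}X")
```

With about 11% serial work, the model gives 4.5X on 8 cores, in line with the "less than 5X" observation.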

Quote:

As a software developer that works with databases every day I sure would like to see some test on database performance.

As far as I know the best threaded software and most scalable is encoding software so I'm expecting to see even worse performance for database tests ... also databases don't even require that much number crunching power they just need fast memory and disk access to scale well.

Enterprise databases like SQL Server are very scalable and should respond well to extra cores (assuming you have enough memory and a fast enough storage system). An HP ML370 scores 42% higher in TPC-C just by replacing the 3GHz Woodcrests with 2.67GHz Clovertowns.
December 6, 2006 10:07:34 PM

Quote:
Look at the 3D studio max 8 and cinebench 9.5 benchmarks you moron.

Was it necessary to call him a moron?
This thread had been pretty civil and peaceful... until now. :roll:
December 6, 2006 10:10:25 PM

Yes, this thread is stupid and trollish and he has a history.
December 6, 2006 10:15:20 PM

Quote:
Look at the 3D studio max 8 and cinebench 9.5 benchmarks you moron.

Was it necessary to call him a moron?
This thread had been pretty civil and peaceful... until now. :roll:

He did start off by saying that AMD's 4x4 platform has increased performance, while refusing to acknowledge Intel's Core 2 platform.

For the record to "Cryogenic", the Intel Core 2 Extreme QX6700 performs better than the FX74 pair while also consuming a much smaller amount of energy.
December 6, 2006 10:51:34 PM

So if i say something wrong about C2D, it's not like i'm personally insulting you, is it? :?

Quote:
For the record to "Cryogenic", the Intel Core 2 Extreme QX6700 performs better than the FX74 pair while also consuming a much smaller amount of energy.

Then maybe I should call you a moron for saying this.
The point of this thread, which you completely missed, even if the poster's conclusions were wrong, was the somewhat disappointing scaling of the Core quad-core architecture in a dual-socket solution.
The fact that it consumes much less power than the FX-74 and outperforms it has everything to do with the microarchitecture of both CPUs, and little to do with the platform itself.
In the referenced article from Tom's HW, the quad gets beaten by the dual-core (albeit higher clocked) in most of the tests.
The exceptions are, as Action_man said, Cinebench and 3DSMax.
However, it's interesting to note that Cinebench is the only test where the FX-74 outperforms C2Q, and in 3DSMax it is also extremely close in performance (under Vista, according to this link that Action_Man posted himself).
Clovertown also clearly wins in 2 other tests, Linpack and Sungard.
Now, Cryogenic's conclusions were questionable and fanboyish, and I've already said in another thread that I find this testing methodology quite questionable for a server/workstation CPU, but the thread itself raised some interesting topics for discussion.
December 6, 2006 10:59:48 PM

We're at that junction again where we're waiting for software to catch up with the hardware.
December 6, 2006 11:03:16 PM

Sigh sigh sigh.

Quote:
So if i say something wrong about C2D, it's not like i'm personally insulting you, is it?


It makes you WRONG and when you post stupid threads like this one it makes you an idiot.

Quote:
Then maybe i should call you moron for saying this.


Why because hes right?

Lock this stupid thread now.
December 7, 2006 5:19:52 AM

Quote:
Yes, this thread is stupid and trollish and he has a history.


A history of what?

You too have a history hero you know, a history of insulting everyone you don't share an opinion with!

Sorry man if this thread offends you but it's not stupid or pointless!
December 7, 2006 6:04:31 AM

I haven't been here so long, but action_man always insults people, that's at least what I have seen. I guess the kangaroos in his yard piss him off
December 7, 2006 6:16:07 AM

Quote:
A history of what?


Trolling, being a moron, etc etc.

Quote:
You too have a history hero you know, a history of insulting everyone you don't share an opinion with!


Or people who are too stupid to see that they're wrong.
December 7, 2006 6:17:17 AM

Another stupid American with lame kangaroo jokes, whats the deal with that? That lame ass crap doesn't offend or do anything except make you look like more of an idiot.
December 7, 2006 6:49:29 AM

Quote:
Intel just managed to get a negative performance increase by doubling the number of cores from 4 to 8 on 'multi threaded software'. Can please someone explain to me how is this possible ??? Someone must have screwed up big time, but that just doesn't quite say it.

Even worse the negative performance increase is achieved by using Intel's best architecture, Core 2.

At this point I'm quite sure Intel's 45 nm upcoming chips won't make much difference in 2007, the FSB it's killing them.

And I thought AMD 4x4 launch was poor ... well actually 4x4 managed to bring positive performance increase ...

Share you thoughts on this with me, please!



No you AMD fanboy. That's why a FX-62 outpaces a FX-74 in single threaded applications :roll:
December 7, 2006 6:50:12 AM

Quote:
Another stupid American with lame kangaroo jokes, whats the deal with that? That lame ass crap doesn't offend or do anything except make you look like more of an idiot.



I agree.
December 7, 2006 6:59:09 AM

Quote:
No you AMD fanboy. That's why a FX-62 outpaces a FX-74 in single threaded applications :roll:

This is not a case of single-threaded applications running on a multi-core CPU, but of multi-threaded software that actually benefits from multiple cores getting a negative performance increase when the number of cores is doubled, which is an entirely different issue.

And don't call me a fanboy!!
December 7, 2006 7:01:28 AM

Quote:
This is not a case of single-threaded applications running on a multi-core CPU, but of multi-threaded software that actually benefits from multiple cores getting a negative performance increase when the number of cores is doubled, which is an entirely different issue.

And don't call me a fanboy!!

Good Luck with that. :)
December 7, 2006 7:13:51 AM

Quote:
And don't call me a fanboy!!

Good Luck with that. :)

Hey, I'm not an AMD fanboy. I've owned several Intel PCs and only one AMD, a K6-III, and even now my rig is on Intel ... no matter how much this upsets you true Intel fanboys, there's something wrong in the Clovertown picture, and I was hoping to get some smart answers, not this.

I agree servers should be tested using server software, but I don't see any reason why a perfectly good threaded application that takes full advantage of multicore should not receive any kind of advantage from doubling the number of cores....
December 7, 2006 7:18:59 AM

If the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* its going to be slower!



Notice how with 1 cpu the clovertown is slower but with more threads *GASP* its faster! OMG NO WAI!
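
To put rough numbers on that point, here is a deliberately idealized sketch. It assumes throughput is simply clock speed times the number of cores the application keeps busy, with identical per-clock performance and no FSB or memory contention, which is an assumption and not something the review measured. The 3.0GHz and 2.67GHz clocks are the ones mentioned earlier in the thread; all names are hypothetical.

```python
def relative_throughput(clock_ghz: float, cores: int, busy_threads: int) -> float:
    """Idealized throughput: each busy thread gets one core, extra cores sit idle."""
    return clock_ghz * min(cores, busy_threads)


# Dual-socket systems as compared in the thread.
WOODCREST_2P = {"clock_ghz": 3.0, "cores": 4}    # 2 x dual core
CLOVERTOWN_2P = {"clock_ghz": 2.67, "cores": 8}  # 2 x quad core


def clovertown_vs_woodcrest(busy_threads: int) -> float:
    """Ratio > 1 means the 8-core Clovertown box wins in this toy model."""
    return (relative_throughput(busy_threads=busy_threads, **CLOVERTOWN_2P)
            / relative_throughput(busy_threads=busy_threads, **WOODCREST_2P))


for threads in (1, 2, 4, 8):
    print(f"{threads} busy threads -> Clovertown/Woodcrest = "
          f"{clovertown_vs_woodcrest(threads):.2f}")
```

In this model the 8-core system is about 11% slower whenever the application keeps 4 or fewer threads busy, and only pulls ahead once more than 4 threads do real work.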
December 7, 2006 7:27:05 AM

I'm Scandinavian, which is in Europe :roll: take it easy, dundee, and go catch some crocs instead of insulting people's topics and calling them idiots
December 7, 2006 7:28:18 AM

Quote:
Hey, I'm not an AMD fanboy. I've owned several Intel PCs and only one AMD, a K6-III, and even now my rig is on Intel ... no matter how much this upsets you true Intel fanboys, there's something wrong in the Clovertown picture, and I was hoping to get some smart answers, not this.

I agree servers should be tested using server software, but I don't see any reason why a perfectly good threaded application that takes full advantage of multicore should not receive any kind of advantage from doubling the number of cores....

I didn't mean to imply that you are. What I meant was good luck with getting AM to stop calling you a fanboy. Telling him to stop calling you a fanboy won't work. AM can be quite headstrong at times. :)
December 7, 2006 7:41:30 AM

Quote:
If the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* its going to be slower!



Notice how with 1 cpu the clovertown is slower but with more threads *GASP* its faster! OMG NO WAI!








etc .... I'm not going to show them all, but if this isn't a negative perf increase I don't know what is.

BTW the results make Cinebench the most flawed benchmarking application, as it doesn't realistically show real-world application scaling ...

So don't throw that Cinebench crap in my eyes, surely you could do better ....
December 7, 2006 8:19:47 AM

Once again moron if the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* its going to be slower! OMFG NO WAI!

Quote:
etc .... I'm not going to show them all but if this isn't negative perf increase I don't know wich is.


What a moron.

Quote:
BTW the results make Cinebench the most flawed benchmarking application as it's doesn't realistically show real world application scaling ...


If the app uses more then 4 threads then it is moron.

Quote:
So don't trow that Cinebenh crap in my eyes, surely you could do better ....


Surely you can't be this stupid?
December 7, 2006 8:21:03 AM

Quote:
im scandinavian which is in europe :roll: take it easy dundee and go catch some crocs insted of sulting peoples topics and call em idiots


Wow, ignorance ftw!
December 7, 2006 8:35:14 AM

Quote:
Once again moron if the slower clocked cpu with more cores isn't using all those cores because the app doesn't have that many threads then *GASP* its going to be slower! OMFG NO WAI!

etc .... I'm not going to show them all but if this isn't negative perf increase I don't know wich is.


What a moron.

Quote:
BTW the results make Cinebench the most flawed benchmarking application as it's doesn't realistically show real world application scaling ...


If the app uses more then 4 threads then it is moron.

Quote:
So don't trow that Cinebenh crap in my eyes, surely you could do better ....


Surely you can't be this stupid?

Well I don't mind you insulting me if it actually shows your class for the world to see ...... fanboy
December 7, 2006 8:38:36 AM

The scalability of an application is not based merely on the number of threads it spawns but the workload balancing among those threads and the spread of data transfer to avoid creating a bottleneck. Encoding applications are notorious for uneven workloads because the output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.
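
A toy simulation can show why balancing matters more than thread count. This is only a sketch with made-up block costs (not measurements from any encoder): it compares giving each worker a fixed contiguous slice of blocks against letting idle workers pull the next block from a shared list.

```python
import heapq


def static_makespan(block_costs, workers):
    """Each worker gets a fixed contiguous slice; the slowest slice sets total time."""
    chunk = -(-len(block_costs) // workers)  # ceiling division
    return max(sum(block_costs[i:i + chunk])
               for i in range(0, len(block_costs), chunk))


def dynamic_makespan(block_costs, workers):
    """Whichever worker becomes free first takes the next block in order."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in block_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical, uneven block costs (arbitrary time units).
costs = [8, 8, 8, 8, 1, 1, 1, 1]
total = sum(costs)

for name, fn in (("static", static_makespan), ("dynamic", dynamic_makespan)):
    span = fn(costs, 4)
    print(f"{name:7s}: finishes in {span:g} units, speedup {total / span:.2f}X on 4 workers")
```

Both versions run 4 workers, yet the static split manages only 2.25X here because two workers sit idle while the others grind through the heavy blocks.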

So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.
December 7, 2006 8:38:47 AM

Good comeback assclown, completely wrong and you've got nothing. Remember when the dual cores came out and people were crying because they didn't run games faster? Games used one thread while there were two cores, same thing here but theres more cores and threads.

Come back when you know something.
December 7, 2006 8:39:51 AM

Quote:
The scalability of an application is not based merely on the number of threads it spawns but the workload balancing among those threads and the spread of data transfer to avoid creating a bottleneck. Encoding applications are notorious for uneven workloads because the output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.

So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.


Yay someone else out there gets it! Its a christmas miracle!
December 7, 2006 8:52:25 AM

Quote:
Games used one thread while there were two cores, same thing here but theres more cores and threads.

Wrong! Most (almost all) multi-threaded applications are not dual-threaded or quad-threaded!!! They simply spawn worker threads to do work. The number of worker threads is either equal to the core count, configurable, or higher!!!

This is where the difference lies! Single-threaded apps don't take advantage of multicore; multi-threaded ones should take advantage of all the cores, and they do[!], but not on the Clovertown architecture!
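
The worker-thread pattern described here looks roughly like the sketch below. It is an illustration only, not code from any of the benchmarked applications, and the helper names are hypothetical. One caveat: in standard CPython the global interpreter lock keeps pure-Python CPU-bound threads from running in parallel, so this shows the sizing pattern rather than a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(configured=None):
    """Use the configured value if given, otherwise one worker per logical core."""
    if configured is not None:
        return max(1, configured)
    return os.cpu_count() or 1


def split_into_chunks(items, parts):
    """Split `items` into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(items, workers=None):
    """Hand one chunk to each worker thread and combine the partial results."""
    n = pick_worker_count(workers)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = pool.map(lambda chunk: sum(x * x for x in chunk),
                            split_into_chunks(items, n))
        return sum(partials)


data = list(range(1000))
print(pick_worker_count(), "workers by default")
print(parallel_sum_of_squares(data, workers=8))
```

Whether a given application sizes its pool this way, or hard-codes a cap such as 4, is exactly what the benchmarks alone cannot tell you.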
December 7, 2006 8:53:37 AM

haha really, ur pathetic. :lol: 
December 7, 2006 8:56:36 AM

Great comeback moron.
December 7, 2006 9:04:23 AM

Quote:
output is a stream or consists of blocks requiring varying amounts of CPU time, and switching to a new block requires burst data transfer to/from HD or RAM.


Well, it is a fact that threads need to exchange some data, and I don't see how one can build software without it. Should we make compression software that never joins the pieces together, just so that Clovertown performs well? No, this is a limitation of the architecture: it is unable to exchange data well without a critical performance drop!


Quote:
So far I don't see a lot of evidence of 4x4 scaling much better than Clovertown in any one application. I was expecting a much smoother NUMA setup for 4x4 to help with memory intensive applications and hope AMD finds its way around the high cHT latency.


Doesn't scale much better? It actually manages to close the gap between Core 2 and K8 ... which is infinitely better than negative scaling.
December 7, 2006 9:35:40 AM

seriously i wonder if your dumb or joking me cuz u really dont sound very intelligent

dont call other people morons or stupid if u dont have anything 2 have it in! it really shows ur lack of intelligence.
December 7, 2006 10:14:34 AM

Uh-oh, is this going to become a fanboy flame war? :roll:
December 7, 2006 10:42:33 AM

Quote:
Well this is a fact that treads need to exchange some data and I don't see how one can build software without it. Should we make compresion software that never joins the pieces together or what? so that clovertown performs well? No, this is a limitation of the architecture, being unable to exchange data well without critical performance drop!


I was not explicit enough and perhaps contributed to a misunderstanding. The lack of scalability of many benchmarks from four to eight cores in the 2P Clovertown review probably has nothing to do with bandwidth constraints but program limitations.

When I mentioned encoding threads encountering a bottleneck at the RAM/HD when trying to move on to the next block, I was speaking theoretically of bad software design to illustrate the difficulty of multithreaded programming. Such a design would hurt 4x4, too, not just Clovertown - though I see no clear evidence of such a problem in any of the published benchmarks. In reality, any decent encoder or archival application employs buffers and keeps track of work finished for multiple worker threads... but only up to a limit.

Two years ago all the popular consumer-level encoders/archivers didn't scale from one to two cores. Today, they pretty much all scale to two cores, but only some continue onto four cores. And in THG's review of 2P Clovertown, it seemed apparent that none of the tested encoders or archival tools scaled to eight-core, although proof of that would involve comparative benchmarks on eight-core Opteron systems, which I have yet to find. Typically, programming for massively parallel workloads is reserved for professional level applications. You don't hear anyone today seriously complaining that DivX or WME can't keep track of more than four cores because most people are still running on one or two.

Quote:
Does't scale much better? It actualy manages to close the gap between Core 2 and K8 ... wich is infinatly better than negative scaling.


This is what I fail to see - K8 closing the gap with C2D in scaling. I'm looking at this pretty comprehensive benchmark review at Xbitlabs which includes identical clock comparisons of the FX-62 (2x 2.8GHz), FX-72 (4x 2.8GHz), E6700 (2x 2.66GHz), and QX6700 (4x 2.66GHz): http://xbitlabs.com/articles/cpu/display/amd-quad-fx_9.....

That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?

So far, the only thing 4x4 has helped AMD accomplish is to spread out the heat dissipation of the four K8 cores such that they can keep the voltages and clock speeds higher than they could under a single socket. This is not bandwidth scaling but rather processor frequency scaling. But the K8 core is far enough behind the C2D that this doesn't make 4x4 faster overall.
December 7, 2006 10:49:31 AM

Quote:

A few threads??? yahoo messenger has around 30 threads, the winlogon process can have 10, 20 or more ... FireFox with 5 open pages has 17 ... I'm quite sure any one of the multi threaded benchmarks spawns (can spawn) more than 8 threads. The system process in windows can have hundreds of threads ... A typical instance of SQL Server running on my dev. machine has close to 300 threads ...

As a software developer that works with databases every day I sure would like to see some test on database performance.

As far as I know the best threaded software and most scalable is encoding software so I'm expecting to see even worse performance for database tests ... also databases don't even require that much number crunching power they just need fast memory and disk access to scale well.


Sure, they all run heaps of 'threads' for overlapped workloads, etc., but how many are actually multi-threaded to the point where they'll use more than 1 CPU core's worth of processing power?

The perfect example is game server software from 2000: it runs with 6-12 threads, but will only use the equivalent of 1 CPU core (e.g. 80% of one core, 5% of another, 10% of another, 2.5% on another two).

The thread count in Task Manager does not indicate how many threads can be scaled over multiple processor cores.
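
A quick way to see this for yourself is the sketch below, which is just an illustration (the count of 30 echoes the Yahoo Messenger figure quoted above, nothing more). It starts 30 threads that do nothing but wait on an event: the process's thread count goes up by 30, yet they burn essentially no CPU time.

```python
import threading
import time


def spawn_sleeping_threads(count, wake_event):
    """Start `count` threads that block on `wake_event` and do no work."""
    threads = [threading.Thread(target=wake_event.wait, daemon=True)
               for _ in range(count)]
    for t in threads:
        t.start()
    return threads


wake = threading.Event()
cpu_before = time.process_time()

idle_threads = spawn_sleeping_threads(30, wake)
time.sleep(0.2)  # let them sit there for a moment

alive_while_blocked = sum(t.is_alive() for t in idle_threads)
cpu_used = time.process_time() - cpu_before
print(f"{alive_while_blocked} extra threads alive, "
      f"about {cpu_used:.3f}s of CPU time used")

# Release the threads and clean up.
wake.set()
for t in idle_threads:
    t.join()
```

A monitoring tool would report all 30 threads, but none of them could make use of a second core, let alone eight.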
December 7, 2006 11:16:37 AM

Ugh, this name calling is elementary school material.

Do people really believe that if you call someone names (and more names = more proof) that you will actually come across as believable and you will prove your point?

I cannot see why people cannot have a civil discussion and explain their points of view without having to resort to name calling. Not everyone will agree, and if you don't like someone in general, then just avoid discussion with them altogether. All it takes is a little self-control.
December 7, 2006 11:17:01 AM

Quote:
That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?


Kentsfield doesn't scale worse than 4x4; Clovertown scales far worse than Kentsfield or 4x4. I'm not expecting performance to double when the cores double, but at least some gain like 50%, although with a good architecture the gains should be around 80% if bandwidth scales accordingly and latency doesn't increase very much!

If the same encoding benchmarks showed a perf increase of sometimes 80% by going from dual to quad core although they were never recompiled (which means they are properly threaded), why aren't they gaining at least a mere 25% with Clovertown?

I highly doubt that all of the encoders are poorly written and limited to 4 cores, because even before the Clovertown launch, computers with more than 4 processor cores were available, and video encoding is also done on high-end computers, sometimes with 8 or 16 procs, so there was absolutely no reason in the world not to do proper threading for encoding applications.

TH would do all of us a great favor if they published the load percentage of all the cores during the tests, to see if it's a software or hardware problem, but the fact is that they never mentioned that only half of the cores were used!
December 7, 2006 11:53:55 AM

Quote:
Kentsfield doesn't scale worse than 4x4 , Clovertown scales far worse than Kentsfield or 4x4. I'm not expectiong doubling in perf by doubling cores but at least some gains like 50% although with a good architecture the gains should be around 80% if badwidth scales acordingly and latency doesn't increase very much!


Kentsfield and Clovertown are basically the same chip connected to different packages to fit their respective sockets/motherboards. But I guess what you mean is that Clovertown doesn't appear to be scaling from 4->8 cores in many applications (meanwhile, there is no 8-core setup with Kentsfield or 4x4, so we're speaking here of 2->4 scaling) - you're right that the benchmarks are suggesting this.

But the lack of any scaling at all on a CPU-based benchmark is a strong indicator of a software limitation and not of full FSB saturation.

Quote:
If the same encoding benchmarks showed a perf increase of sometimes 80% by going from dual to quad core although they were never recompiled (wich means the are properly threaded) why aren't they gaining at least a mere 25% with Clowertown?


It means that when the encoding software was updated from single-core to multi-core, the programmers planned to support up to 4 cores, not just 2. Some developers saw a little ahead and thought correctly that dual core would be followed by quad core on the desktop.

Quote:
I highly doubt the fact that all of the encoders are poorly writen and limited to 4 cores because even before Clovertown launch, computers with more than 4 procesors cores were available and video encoding is also done on high end computers, sometimes with 8 or 16 procs, so there was absolutely no reason in the world not to do proper threading for encoding appplications.


Those were server and workstation computers, and they normally run professional software, which tends to support large core counts. Encoding companies do not use Windows Media Encoder, Xvid, AutoGK, or DivX to make their HD-DVDs. End-users run this stuff. :) 

Quote:
TH would do all of us a great favor if they published all the cores load percentage during the tests, to see if it's a software or harware problem, but the fact is that they never mentioned that only half of the cores were used!


I completely agree - THG didn't explore why there was no scaling at all, nor did they comment on it. The readership needs to know that 0% scaling is probably caused by software limitations.
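For what it's worth, the check being asked for is easy to sketch once per-core load figures exist. The snippet below assumes you already have one average load percentage per core from whatever monitoring tool you use; the function name and the sample numbers are hypothetical, not data from the review.

```python
def equivalent_busy_cores(per_core_load):
    """Sum per-core load percentages into a 'fully busy cores' figure.

    Summing (rather than counting busy cores) still works when the OS
    scheduler migrates threads between cores during the run.
    """
    return sum(per_core_load) / 100.0

# Hypothetical per-core averages for an 8-core run, NOT measured data.
sample = [97.0, 95.5, 96.2, 94.8, 3.1, 2.4, 1.9, 2.7]

busy = equivalent_busy_cores(sample)
print(f"{busy:.2f} of {len(sample)} cores' worth of work")
```

If a benchmark on an 8-core machine reports about 4 cores' worth of work, the program is most likely not creating enough threads. If all 8 cores are pegged and performance still doesn't move, then it's time to look at the hardware.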
December 7, 2006 11:54:39 AM

Quote:
That page contains the encoding benchmarks, and I've also looked at the other pages except for the purely synthetic SysMark/PCMark page. I fail to see a single instance where Kentsfield scales noticeably worse than 4x4. Can you find one and write back with the name of the benchmark?


Kentsfield doesn't scale worse than 4x4; Clovertown scales far worse than Kentsfield or 4x4. I'm not expecting performance to double when the cores double, but at least some gain, like 50%, although with a good architecture the gain should be around 80% if bandwidth scales accordingly and latency doesn't increase very much!

If the same encoding benchmarks showed a performance increase of sometimes 80% going from dual to quad core although they were never recompiled (which means they are properly threaded), why aren't they gaining at least a mere 25% with Clovertown?

I highly doubt that all of the encoders are poorly written and limited to 4 cores, because even before the Clovertown launch, computers with more than 4 processor cores were available, and video encoding is also done on high-end computers, sometimes with 8 or 16 processors, so there was absolutely no reason in the world not to do proper threading for encoding applications.

TH would do all of us a great favor if they published the load percentage of every core during the tests, to show whether it's a software or hardware problem, but the fact is that they never mentioned that only half of the cores were used!

Ha, you must have missed the part where Kentsfield and Clovertown are the exact same cpu except one is for socket LGA775 and one is for LGA771...

And I'll just add a reminder: if a program does not use all 4 cores of a 4-core CPU, it will lose every time to a CPU that is clocked faster, regardless of core count. But since it is shown against a dual core, you are like "omg, it's slower than dual cores" when that is not the case at all. What shows the truth is where 4 cores clocked slower (2.66 GHz) are faster than a dual core clocked at 2.9 GHz.

Your thinking is flawed and all this bs is software limited, not hardware limited. With the correct code, doubling the cores would theoretically double performance, minus about 3% due to memory or other random bottlenecks.
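As a rough sanity check on what properly threaded code should gain, here is a small sketch using Amdahl's law. That is a standard textbook model brought in for illustration, not something the review measured. The parallel fraction is picked so that going from 2 to 4 cores gives the roughly 80% gain mentioned earlier in the thread; the function names are made up for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def gain_percent(parallel_fraction, cores_before, cores_after):
    """Percentage gain from moving between two core counts."""
    before = amdahl_speedup(parallel_fraction, cores_before)
    after = amdahl_speedup(parallel_fraction, cores_after)
    return (after / before - 1.0) * 100.0

# Parallel fraction chosen so that 2 -> 4 cores gives the ~80% gain
# mentioned in the thread; it is derived from that figure, not measured.
P = 16 / 17

print(f"2 -> 4 cores: {gain_percent(P, 2, 4):.0f}%")
print(f"4 -> 8 cores: {gain_percent(P, 4, 8):.0f}%")
```

With that assumption, code that really spreads over 8 threads would still be expected to gain somewhere around two thirds going from 4 to 8 cores at equal clocks. Ordinary diminishing returns don't get you anywhere near 0%, which is why a flat result points at the thread count.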

Also, the FSB is not limiting any of Intel's current CPUs when they are clocked at stock speed. It's an old technology, but it gets the job done for now until they come up with something better.
December 7, 2006 12:15:43 PM

Not actually in reply to: IcY18

Man, people...

8-cores, over 2 sockets, and a heap of L2 cache, with 2 independent FSBs at 1333 MHz. (Screw ccNUMA for now if this provides 21.33 GB/sec peak without the NUMA complexity or 'stuttering' issue).
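The 21.33 GB/sec figure is just bus arithmetic, sketched below. The 64-bit bus width is an assumption not stated in this post (it is the usual width of Intel's FSB), and "1333 MHz" is treated as 1333.33 million transfers per second; the function name is made up for illustration.

```python
def fsb_peak_gb_per_s(transfers_mt_per_s, bus_width_bits=64, buses=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_per_s * 1e6 * bytes_per_transfer * buses / 1e9

# Two independent buses at 1333.33 MT/s, each assumed to be 64 bits wide.
peak = fsb_peak_gb_per_s(4000 / 3, buses=2)
print(f"peak: {peak:.2f} GB/s")
```

This is a theoretical ceiling; sustained bandwidth in real workloads will be lower.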

It is made for consolidation of servers as virtual machines to a single platform that is only 4.5 cm tall, and 19" wide. 8)

Next, people will be complaining that no software takes advantage of 'Sh' or 'C+' style code on GPUs, so they are not seeing a 20x to 200x increase in floating-point stream processing over huge arrays (e.g. encoding HD video in minutes, not hours or days, using ATI's next card with Shader Model 4.0, or nVidia GeForce 8800 series cards or better).

Seriously - :roll: :lol:  :?

8)

You want software to scale over 8 cores, or be 20x to 200x faster by offloading some calcs to an SM4.0-capable GPU (even if only DX 9.0c is installed, btw)? How badly?...

Badly enough to write it yourself?

- No I didn't bloody think so - :p 
December 7, 2006 12:17:23 PM

I think we should have a Xmas appeal to ban annoying little tits from Toms. Reported.
December 7, 2006 1:10:06 PM

He calls everyone with fewer than 1000 posts names, I believe.

I haven't been here that long, but I have seen him sometimes just burst into a topic, call people stupid and insult them because they have a different opinion from his.
December 7, 2006 8:23:31 PM

Quote:
Ugh, this name calling is elementary school material.

Do people really believe that if you call someone names (and more names = more proof) that you will actually come across as believable and you will prove your point?

I cannot see why people cannot have a civil discussion and explain their points of view without having to resort to name calling. Not everyone will agree, and if you don't like someone in general, then just avoid discussion with them altogether. All it takes is a little self-control.


Completely agreed. Civil and informed discourse is the only way to coherently exchange information. And if you weren't so ugly and your mother didn't dress you so funny and if you didn't have cooties you'd know that.

:lol: 
December 7, 2006 9:40:21 PM

Quote:
But the lack of any scaling at all on a CPU-based benchmark is a strong indicator of a software limitation and not of full FSB saturation.


Quote:
It means that when the encoding software was updated from single-core to multi-core, the programmers planned to support up to 4 cores, not just 2. Some developers saw a little ahead and thought correctly that dual core would be followed by quad core on the desktop.


Quote:
Those were server and workstation computers, and they normally run professional software, which tends to support large core counts. Encoding companies do not use Windows Media Encoder, Xvid, AutoGK, or DivX to make their HD-DVDs. End-users run this stuff.


Hurrah for people who get it, someone buy this guy a drink!