In its article "A Chip Too Far" (a play on the WWII movie title A Bridge Too Far), Fortune explains some of the difficulties of multithreading.
Hmmm......where have I heard this before?
Michael Copeland, senior writer, Fortune
August 14, 2008: 5:47 AM EDT
| Quote : The change was set in motion four years ago when Intel (INTC, Fortune 500) and others reached a point where they could no longer make single processors go faster. So they began placing multiple processors (or "cores") on a single chip instead.
|
| Quote : Even the creators of multicore chips admit they're causing trouble. "I accept that parallel computing is a big problem," says Sean Maloney, executive vice president for sales and marketing at Intel. He says the company is hiring "a lot" of software people to tackle the challenge. |
So much for the kiddies' 'all new games will be quad-core optimized' theories.

The most interesting thing is that some games are multithreaded but do not take full advantage of what they have available.
A good example is Valve's Source engine. It has a console command (MQM) that lets it take advantage of more than just one core, and possibly of everything a dual or quad has. It boosts the FPS in the games a lot but is not 100% stable.
Valve has stated they are working on it for their next game, Left 4 Dead, but that remains to be seen.

This is what got me laughing when all those reports came out saying Intel was investing $20 million in it. Admirable as it is, it isn't enough for the upcoming needs. Good to see the words "a lot" in the quote.
Well, I hope this will stem some of the 'unicorn wishes' that every new app or game coming out will be quad optimized.
Honestly, you would think the fact that dual core has been around for years yet most programs are still single threaded would get that point across, but it appears the 'quantity is better than quality' mentality can overwhelm logic.

Similar to GHz, regardless of IPC. People generally either don't have a clue, or don't actually look into it deeply enough.
You think that's unnerving? I read an interview with the president of Intel a year or so ago. He said that their next step after multiple cores is to apply a magnetic field to the computer. This forces all the electrons to the outer edges of the mobo and the other chips (graphics card etc.). By pushing them to the edges you shorten all the paths of the electrons, making them straight. He said that this could produce a desktop-size PC that ran at 10 GHz or more! This is no bull, I read it in Discover. Your discussion reminded me of it. I haven't tried researching it in a while, but come on, this was out of the mouth of the guy who started Intel. It also showed a graph of the increase in processor speed up to then and what was coming up (dual core). I wish I could find that issue to see if that graph's predictions of the future came true. Anyway, thought you peeps might find that interesting. Peace.
| jaydeejohn wrote : Similar to GHz, regardless of IPC. People generally either don't have a clue, or don't actually look into it deeply enough. |
True that.
Though, as Endyen likes to point out, and correctly so, IPC is an antiquated term. It's just that there's no easier way to describe the differences in work accomplished per cycle, and "IPC" is so easy to use to get a point across, even if it's no longer technically accurate.

Carmack said, give me 1 CPU that delivers 10 GHz over a quad that does 4 GHz, or something similar. It's true, the need for speed can't be optimized very easily in an OoO world. That's why graphics needs to take a spot in this solution, as gfx cards run in parallel. There are apps out there, and plenty, where these cards will help. The CPU and the GPU need to find a cozy coexistence until something like the above post comes true. Making the electrons' paths straight would increase speed, reduce friction, and allow for better throughput. Making chips for this would be ideal. HKMG might even be overkill if such a thing were practical today.
Theory is great. Applying theory is something else. Reminds me of the 10GHz P4s we never saw.

Hmmm..
Question.
Then why does my system (XP home & Vista 64 & Linux) support up to 32 cores on a single socket?
I mean.. so I have a quad system that will never shine as bright as my old P4 Northwood and my dad's Athlon XP 2100+?
As an old SMP dude we've been hearing about the next killer parallel apps being 'right around the corner' for about ... 10 or more years now? - lol
It started heating up when NT popped and you could get 2P Slot 1 mobos relatively cheap (compared to the Pentium Pro). Started getting really hot with Win2k and smokin' with XP Pro. Caught fire with dual-core CPUs and then quads ...
But in consumer land it's pretty much been a big yawn. There really isn't any demand for parallel multithreading on Mom & Pop's Desktop. I can't blame the software developers for it, really. And the game developers really have to thread lightly into the brave new multicore world.
I know it's frustrating for the gamers but I wouldn't count on any quantum leaps in the ever-expanding short term. Most likely any perceivable gains will be seen more from 'load balancing' across cores as opposed to concurrent multiple threads.
Let's face a little reality check here. A game developer would be committing market suicide if their latest version would only run on dual- and/or quad-core CPUs, or ran at substantially reduced capacity on a single core. There is still a stout single-core market segment that's willing to pay $40-$50 for a PC game but is not about to pay $1,000+ for a new computer just to play games.
They will buy a $300-$400 game console before they do that ...
OK, so if multithreading is so far away for the average programmer, then why aren't the development tools, i.e. Havok, shipping with multithreaded capabilities thrown in?
For example, 1 core working on physics, another on collision, another on user interaction and the other on game variables.
Surely this would make the game run faster as the processors work on the task together, or am I missing the point of multithreaded applications?
There are already multithreaded games out that severely underperform with 1 core. Gaming is the most cutting edge, so multi isn't a question of whether to use it for sales, more of the cost in doing so.
| Grimmy wrote : Hmmm..
|
In short, that's not necessarily the case, not because of the core count, but because of the processor generation. While you might pump the P4 to a higher frequency than the quad (depending on which P4 you have), the odds that you can get it fast enough to 'do more work' than a single core on the quad are low.
The long answer:
Interesting territory that delves into, of all things, licensing and packaging as well as logical processors. Remember the old debates about the costs of XP and Vista in relation to multicore because of the socket issue? And the one about XP Home supporting only a single core?
First, how does MS define the number of CPUs? By core count, socket or logical count?
Gotta look at the M$ legal stuff for that:
http://www.microsoft.com/licensing [...] icore.mspx
| Quote : Licensing on a per-processor rather than a per-core basis ensures that customers will not face additional software licensing requirements or incur additional licensing fees when they choose to adopt multicore processor technology. Customers who use software from vendors that license by individual core, as other software vendors currently do, may face increased software costs when they upgrade to multicore processor systems. Multicore processor systems licensed on a per-processor basis will also help make this new enterprise computing technology affordable to midsize and small business customers. |
Now that we know how M$ 'counts' CPUs (by socket), how do they support multicore?
Here is a good chart, taken from Paul Thurrott's "SuperSite for Windows":
http://www.winsupersite.com/showca [...] _final.asp
Now, according to this, there is no limit on the number of cores Vista will support per socket. Unfortunately, it doesn't address logical CPUs. The number of physical CPUs has nothing to do (not really true) with the number of logical devices ('CPUs'). If I recall correctly, Vista is limited to 32 logical devices. I don't have a link to prove that, and I very well may be wrong.
The big thing is, just because the OS supports multicore doesn't mean it makes use of all those cores. It's like a gas tank. You may have a 30 gallon tank in your car, but that doesn't mean it has 30 gallons of gas in it at the moment, only that it can hold up to 30.

Game dev is costly, that's the main thing holding it back. Most devs struggle using a multi approach, and even for those that are good/acceptable at it, it still runs into cost/overhead scenarios. Hell, in gaming, they've ruined many a game by diminishing the story line while keeping the eye candy in, only because of cost. In an FPS game, all they need is explosions and blood, guts and glory to sell. Multi, story line, and even eye candy are many times not even a consideration, let alone a must. Most games are ported from console, which is a multi solution to begin with, so doing it for gaming comes down mainly to costs.
As a programmer, I can tell you multithreading is hard; it's just not worth it. Don't forget that two threads can run on one core or two; they are just 2 lines of code execution which never meet.
The problem is memory addressing and debugging: one thread can't safely touch memory another thread is using without synchronizing first. Having said that, for games you only really need to multithread the engine once, and just reuse it.
On top of that, multicore/multithreaded systems are still great as you can run many different apps at the same time.
Multi-threading an application seems easy: make one core do this and the other do that ... In fact it's pretty damn complex, since the threads all have to be synchronized; you cannot render a particle until you know where in space it should be, and things like that. You also have to make sure two threads never end up each holding a resource the other needs (called a deadlock). Finally, multi-threaded applications are hell to debug, since you have to deal with what are called "race conditions": some bugs will only show up in very weird circumstances (e.g. when you open an application while performing video encoding), just because some threads become slower and do not answer in time ... looping back to my first challenge: synchronization.
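To put the two hazards from the post above into something concrete, here is a minimal Python sketch. The function and lock names (`count_with_lock`, `particles_lock`, `renderer_lock`, `update_and_draw`) are made up for illustration and are not from any game engine. The first part guards a shared counter with a lock so no update is lost to a race; the second shows the usual deadlock-avoidance habit of always taking locks in the same order.

```python
import threading

def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # "counter += 1" is a read-modify-write. The lock stops two
            # threads from interleaving inside it and losing an update.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

# Hypothetical resources shared by several threads.
particles_lock = threading.Lock()
renderer_lock = threading.Lock()

def update_and_draw(log, name):
    """Take both locks in one fixed order so two callers cannot deadlock."""
    with particles_lock:      # always acquired first
        with renderer_lock:   # always acquired second
            log.append(name)
```

If one thread took `renderer_lock` first while another took `particles_lock` first, each could end up waiting on the other forever, which is the deadlock described above.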

What I meant by shine as bright... usually the single-core CPU usage at 100% is easier to obtain, not that it can perform better.
Also, I've mentioned it before, my Vista seems to treat the multi-cores differently, well at least the quad. Kinda like there's a different threshold on the usage. Even Linux acts the same way, in that 1 core has to be fully loaded before another core takes on another task. Never did take the time to figure out if there's a way to assign a core to a specific task on Linux.
Like for example, Super PI on XP, when run, will only load one core to 100%. My Vista acts differently: the usage varies across cores 0-3. The only way to make it act like XP is to set Super PI's affinity to 1 core.
But it's a shame if they can't figure out a way to make multi-core CPUs better, game wise. Although if you have a 10 GHz CPU, can't imagine how much copper you'd need to keep it cool.
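On the Linux question above: the kernel does expose CPU affinity, and Python wraps it as `os.sched_setaffinity` / `os.sched_getaffinity` (available on Linux only). A minimal sketch, where `pin_to_core` is a made-up helper name and not part of any tool mentioned in this thread:

```python
import os

def pin_to_core(core, pid=0):
    """Restrict process `pid` (0 means the current process) to one core.

    Returns the resulting affinity set, or None on platforms that do not
    provide the Linux-only sched_* calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)
```

From a shell, `taskset -c 0 ./some_program` does the same job for a whole program; that is roughly the Linux counterpart of the Task Manager affinity setting described above.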
| Hellboy wrote : OK, so if multithreading is so far away for the average programmer, then why aren't the development tools, i.e. Havok, shipping with multithreaded capabilities thrown in?
|
Which is interesting in how it affects the dynamics of marketing or 'marketeering'.
Take for example the Ageia Physics processor
http://www.tomshardware.com/news/ageia-physx,2490.html
Sounds great, right? But wait!!! Why not just use the other CPU cores for that (as you suggest), rather than spending extra money on the Ageia product?
Well, in this later Ageia physics processor article by THG, they showed that the requirement to have an Ageia physics processor to run CellFactor could be bypassed, and CellFactor could be run without the Ageia physics processor.
http://www.tomshardware.com/review [...] 285-5.html
| Quote : We took several runs of each test to come to the averaged frames per second results shown in the chart below. The raw scores were all within a frame or two of each other, with each card. |
| Quote : It is clear from the reported scores that there isn't much difference in the performance with and without the card. Of course this is better than having the scores with the card enabled lower than those with the card disabled, as was the case in the Ghost Recon Advanced Warfighter results in the last article. But still, this is not good news for Ageia. The Shader Model 2.0 Radeon X850XT was obviously rendering the scenes using a lower code path, but the results were the same. No matter what card we put into that test platform, we came up with the same 2-3 frames per second difference. |
So, if THG's work is to be trusted, the Ageia physics processor is basically a waste of money. How much more of a waste would it be if the functions it performs were threaded into software to be run on the CPU, or a GPU? Then the Ageia physics processor would be incontestably worthless, putting that company out of business.
The point of this? I'm with you. Thread what you can to the CPU cores. Unfortunately, some stuff just can't be put on the CPU cores.....but it could be put on the GPU, which complicates the problems of multithreaded development even further, and it's hard enough as it is...
| Quote : 2.1.....Multithreading
|
....not to mention that if you thread everything you can to the CPU, you deny the possibility of selling extra products like the Ageia physics processor.
It all comes down to time and money. Minimize the time to get the app/game to market, maximize the profit.

| Grimmy wrote : What I meant by shine as bright... usually the single core CPU usage at 100% is easier to obtain, not that it can perform better. |
Well, if Super PI runs similarly to Prime, you have to run 1 instance for every core. The embedded 'tasking' apps in TAT are the same. I played with that stuff about 3 years ago on an E6600 (yes, E, not Q). They wouldn't share the load of an instance, but they would spread the individual instances themselves across the cores.

| turpit wrote : Well, if Super PI runs similarly to Prime, you have to run 1 instance for every core. The embedded 'tasking' apps in TAT are the same. I played with that stuff about 3 years ago on an E6600 (yes, E, not Q). They wouldn't share the load of an instance, but they would spread the individual instances themselves across the cores. |
I just brought my XP system up. Brought up prime to run 1 worker thread on my dual core. One core remained at 100%.
Super PI is a single-threaded app, so again, on XP, one core is loaded, even though the affinity is set to use any core.
Now on Vista, using prime to run 1 worker thread, it spread the load over all 4 cores at around 30-40% each. It's almost the same deal with Super PI, since it's set to use any core. So to me, the threshold on Vista doesn't matter as much as on my XP or Linux system.
So the OS does have an effect on the loads for the cores.
| Zenthar wrote : Multi-threading an application seems easy: make one core do that and the other do that ... In fact it's pretty damn complex since they all have to be synchronized; you cannot render a particle until you know where in space it should be and things like that. |
I've been writing multi-threaded applications since the late 80s; while it can be complex, there are various methods of dramatically reducing that complexity.
The simplest is to use message queues between threads, with one thread producing messages while the other thread processes them (e.g. on one side might be your physics thread(s) determining where objects are, the other side might be the render thread(s) actually drawing those objects on the screen). That eliminates the synchronisation problems since the only place where you have to synchronise is in the queue code; the downside is that if the system is poorly designed the overhead of the message passing can be greater than the benefit of the extra thread.
So yes, if you keep trying to write traditional single-threaded code and then stick multiple threads into it, that's a huge pain in the ass; but that's like complaining that coding in assembler is a huge pain in the ass compared to C++... you're using the wrong tools for the job.
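The message-queue approach from the post above can be sketched in a few lines of Python. This is only an illustration under assumptions: `physics_thread`, `render_thread` and `run` are made-up names, the "physics" is a stand-in formula, and a real engine would pass richer messages. The point is that the only shared object is the queue, which does its own locking.

```python
import queue
import threading

SENTINEL = None  # tells the consumer that no more messages are coming

def physics_thread(out_q, steps):
    """Producer: work out where an object is and post it to the queue."""
    for step in range(steps):
        position = step * 0.5          # stand-in for a real physics update
        out_q.put((step, position))
    out_q.put(SENTINEL)

def render_thread(in_q, drawn):
    """Consumer: 'draw' each position in the order it arrives."""
    while True:
        msg = in_q.get()
        if msg is SENTINEL:
            break
        drawn.append(msg)              # stand-in for an actual draw call

def run(steps=5):
    q = queue.Queue(maxsize=8)         # bounded, so physics cannot run far ahead
    drawn = []
    producer = threading.Thread(target=physics_thread, args=(q, steps))
    consumer = threading.Thread(target=render_thread, args=(q, drawn))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return drawn
```

As the post warns, every `put`/`get` has a cost, so if each message carries only a tiny amount of work the queue overhead can outweigh the gain from the second thread.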
That doesn't sound right, as I wouldn't have thought you could just spread a thread about like that. Spreading the load of multiple apps I thought Windows could do, but not that.
| turpit wrote : Which is interesting in how it affects the dynamics of marketing or 'marketeering'.
|
Medal of Honor: Airborne installs an Ageia physics card driver whether or not an Ageia card is fitted...
It's a fun game btw..
Now I recommended this game to a mate of mine who is into wargames and such and had a P4 3.2 GHz and a 7800 GS AGP card, and it ran like an absolute dog... It wasn't even playable, unlike BF2 and BF2142, which run like a dream on his system...
I put this down to the fact that the physics will not run on a single-core system... The 7800 GS, although slow compared to some, still cuts some mustard...
Anyway, he's got himself a PS3 and the game for it, but the game's shabby compared to my PC version. I guess that's down to the physics "emulation", as I am sure it's not programmed into the PS3 version..
On my C2D 6700 with an 8800 GTX it runs unbelievably smooth with no slowdown, so to me it would seem that the physics is either running on the second core or on the guts of the 8800 GTX..
So this goes further to prove that a dedicated physics card like Ageia's is unnecessary if the system is high-spec enough.. I mean, how many THG frequenters boast about having an Ageia card...
It's about as much use as a 200 dollar network card..
There was talk of an Indiana Jones game with physics built into the game's engine, but in light of LucasArts going the way of many of the others, the titles will be passed on to either EA or Activision (I can't remember who), so it might not even see the light of day..
What would do well for Ageia is to set the standard and license the code in a game creation tool for multicore processors.. "Ageia Physics Compliant", for example, like Nvidia did with TWIMTBP. This may be the reinvention of the wheel we are looking for, whilst the rest of the game publishing community is working out how to get the most out of a Q6600, let alone a Skulltrail system (who in their right mind would buy one of these when Nehalem (oops, i7) will outrun it?), to give us an enhanced experience over the consoles which are dominating our release schedules..
Unfortunately, I hate to say it, but there is not that much separating consoles from PC nowadays. Yeah, the graphics are better on PC, but the game plays almost the same..... Some games, like car racing, soccer and other sports games, are better on console than on PC due to the control pads and steering wheels, which never caught on in the PC market as they should have done..
On a second note, I do honestly believe that THG work very hard at giving us benchmarks that are honest, unbiased (some, who I would like to mention, I'm sure would disagree), true and believable compared to some other unreliable sources which I am sure we all could mention...
Although I'm not a chip or even an electronics engineer, I have been in this game for nearly 24 years now as a profession, diagnosing faults etc. and building PCs, and have seen things come and go. But the PC is at a bit of a standstill right now, which I never thought I would see, and it needs some clever tools and software to move to the next level and take advantage of what we've got; otherwise it's at a stalemate...
Unfortunately Vista, which promised everything, has failed miserably at becoming the next big thing. It's more like a little jump: still doing the same thing but looking prettier, with a few complications thrown in (UAC, anyone?)
Just because your new girlfriend is better looking doesn't mean her cooking's any better!
x264 scales near-linearly with # of cores and MHz.
| Grimmy wrote : I just brought my XP system up. Brought up prime to run 1 worker thread on my dual core. One core remained at 100%.
|
Well, I dunno what to tell you. Vista may handle things differently, though I doubt it. I suspect you're doing something errr....off key, or you're looking at the CPU usage which is the summation of both cores and not the individual loading. You have to look at the CPU usage history to get individual core loading....at least in XP.
I'm not home, but I just ran prime on my LT, and without further chatter, here it is:
Note there was no notable load sharing. Core 1 was loaded, core 2 was running everything else, until I started running the second instance.

I took the time to do my screen caps:
XP Prime 1 worker thread:

Vista Prime 1 worker thread:

I also used Task Manager to show that all cores are selected.
Edit:
So in order for me to make Vista run like XP, I have to un-select cores 1-2-3, to only have core 0 at 100%.
Does that make it more clear what I'm saying?
Well, on the XP run, only core one is loaded; in the Vista run it does look like it's distributing the load amongst the cores, but it shouldn't. Try running 3 and 4 instances in Vista and see what it does to the loading. Make sure you're running the FPU-intensive test and not loading the memory.

Heh.. that was the FPU stress test.
I don't know why you won't take my word for it.
If I run 4 worker threads.. all 4 cores will be at 100%
Running 2-3 worker threads, it will act the same, it will spread the load through the other cores. I'll do another screen cap with a history.
| Grimmy wrote : Heh.. that was the FPU stress test.
|
Oh come on, you know I rarely take anyone's word for anything unless they have the proof to back it up.
But your results are indicative that Vista does distribute the load....which I find difficult to believe because it is Vista and it is M$. But the thing is, and why I singled out FPU, the memory games Vista plays......I dunno...I just can't see Vista having some form of inherent fine-grained multithreading....it's M$.

| BaronMatrix wrote : The obvious issue with multithreading is not exactly the difficulty. The major problem with multithreading is that you can't charge more for the same SW. So if you sell games for $49 and it costs X dollars to make it singlethreaded but 2X dollars to make it multithreaded, the game makers make half the money. |
Why would a game cost 2x as much to write with multithreading? Maybe if you have a gang of low-skilled code monkeys who don't know what they're doing, but there's no good reason why a multithreaded game engine should add tens of millions of dollars to your development costs.
| Quote : Games are really the only thing that will need multi-threading on the desktop, but people want a 10GHz CPU for $100 (talk about devaluing something) so they're not going to pick up the extra costs for the more complex development cycle. |
Video compression, 3D rendering, video playback, etc, etc, etc. Anything CPU-intensive will benefit from multi-threading; it's just that most software most people run these days isn't CPU-intensive.
| Grimmy wrote :
|
LOL. In order to make XP run like Vista you'd have to add enough bloatware to consume 5% more processor resources and 25% more memory resources, incur a 5~10% performance hit and take up 10 more gig of HDD space.

| turpit wrote : LOL. In order to make XP run like Vista you'd have to add enough bloatware to consume 5% more processor resources and 25% more memory resources, incur a 5~10% performance hit and take up 10 more gig of HDD space. |
Had a feeling you'd say that. I did have the DreamScene wallpaper running in the first one I showed, and that does take up CPU resources.
Okay, now I took the time to make sure my idle was low (it jumps around 0-4%). Also I used another program to draw a history. It's small, but you can still see how its histogram works:
Vista - Idle - before 1 worker thread set:
Vista - load - 1 worker thread:
Vista - Idle - 2 worker threads set:
Vista - load - 2 worker threads:
Does that draw a better picture of how the load balance is different?
Edit:
Whelp... believe what you want. I'm not trying to pull anyone's leg on this. Just trying to show what I see between my 2 systems, which don't contain bloatware.
Whelp.. I need to get some sleep.
For some applications multithreading is nearly impossible, and some just eat it up.
It's always going to be that way.
Multithreading in software seems to me like it would require at least 3 threads: 2 threads doing work and 1 thread doing traffic control or memory management.
Although I still often wonder why cores cannot be organized more like a bucket brigade for software, where each core carries the bucket a little way. I can see this being great where the source and destination are each in only one place, but when the target/source moves around, one bucket would work best....
turpit, it may be hard to believe, but Vista does seem to load all the cores very well when doing tasks. Heck, I play TF2 and I see core 0 at about 30-50% load, and while playing in intense battles the other 3 cores tend to jump between 10-20% load.
Maybe Vista does have the stuff M$ says it does. You never know....

| Grimmy wrote :
|
np.. based on what you showed, I'm going to have to dig around for a little something.. we'll see what I find. See, it still doesn't add up. Prime should load all of your cores to 100%. That it's only loading to 1/4 total usage still indicates that it's behaving normally, but the question is...how would it "know" what the capacity is...it's confusing.
EDIT--------
BTW, I don't think you're trying to pull anyone's leg...

| jimmysmitty wrote : turpit, it may be hard to believe, but Vista does seem to load all the cores very well when doing tasks. Heck, I play TF2 and I see core 0 at about 30-50% load, and while playing in intense battles the other 3 cores tend to jump between 10-20% load.
|
Loading cores per app is easy. Trying to externally 'share' the load of a single app is something different.

| BaronMatrix wrote : That would be based on a HW scheduling technique across cores. It would make missed branches even worse for some things. That would only work in cases where there's no shared data, but that rarely happens. Though I do remember reading about an initiative that allowed shared data without locks which would solve the deadlock problem. |
Yeah, I think transcoding video is about the only app I can think of that can make use of almost unlimited cores (you can break a movie up into frames and assign them to cores for processing without corrupting what another core is working on).
Edit: Hmm, maybe mirroring the data in RAM so each CPU has its own complete data set for some programs?
This is an interesting topic, to say the least.
Edit 2: I think different techniques would be needed for different problems (reinventing the wheel here, I am sure, lol, but it's a good mental exercise and you never know when you may stumble onto something new).
Branch prediction across CPUs is a cool idea, to tell the truth. It might have problems with some applications, but others would work really well (easily predicted things like video).
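The "break the movie into frames and hand them out" idea above looks roughly like this in Python. It is a sketch only: `encode_frame` is a stand-in that just inverts 8-bit pixel values, and real codecs have dependencies between frames, so actual encoders divide the work differently. A thread pool is used here to keep the example small; in CPython, passing `ProcessPoolExecutor` from the same `concurrent.futures` module as `executor_cls` is what would let CPU-bound work actually use several cores.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame):
    """Stand-in for per-frame work: invert every 8-bit pixel value."""
    return bytes(255 - b for b in frame)

def encode_all(frames, workers=4, executor_cls=ThreadPoolExecutor):
    """Process frames independently; no frame touches another frame's data."""
    with executor_cls(max_workers=workers) as pool:
        # map() hands frames to idle workers and returns results in input order
        return list(pool.map(encode_frame, frames))
```

Because each call only reads its own frame and returns a new object, there is nothing to lock, which is why this kind of job scales so easily compared with a game loop.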
| turpit wrote : Loading cores per app is easy. Trying to externally 'share' the load of a single app is something different. |
That's what I mean. Source itself is not fully multi-threaded, even though certain things are, such as the physics and particle systems. When I play TF2 or any Source-based game I have nothing else running in the background and all of my cores are active. Yes, the main core, core 0, has the most load, but the rest are still getting a bit of a load, and it confuses me when there is nothing else running.
I think a great example of this is the low FPS people get in TF2 even with a C2D @ 3GHz+ and an 8800+ GPU. It's a weird occurrence but it does happen to some people. Yet I can take my Q6600 @ 3GHz and my HD2900 Pro 1GB and play at 1920x1080p on my 40" TV with everything maxed, including 16x AA and 8x AF, and still get a smooth 60FPS.
IDK. I am just saying it could be a possibility, but we won't know unless we do some intensive research.

| Grimmy wrote :
|
----NEW ENTRY-----
Grimmy,
I've been doing some digging, but haven't found anything yet.
Here's the thing, and it's difficult to explain textually....
If Vista is doing what you believe it's doing, if you run 1 instance of prime95, single threaded, you should be seeing 100% load on all four cores.
If Vista is load distributing the actual single thread routine, then immediately after you start prime, as the load on the primary core builds, some of it will be shunted to the other cores, leaving some of the primary core's resources free. But because prime will consume all available resources, those resources 'freed' in the primary core by distributing the load to the other cores should still be consumed, and as the load builds it should continue to be distributed, until the resources of all 4 cores are completely consumed.
In short, given the way you think Vista is handling apps, what you should see is a near-instantaneous load cascade with all four cores sequentially building to full capacity. UNLESS there is a limit to the resources prime95 will consume. Now, I have never heard of a limit to the FPU calcs prime will run, but I have heard of memory limitations, which is why I noted to run FPU only. So, if there is an FPU limit, then what you have shown should prove what you believe to be true. But then, if there was an FPU limit, we should have known about it before now. That 'limit' would have shown, for example, as a 50% distribution for an E6600 running prime in Vista. It should also show as something less than ~25% for, say, a Q9550...maybe 17-19% across all four cores (a loose guesstimate).
Now, assuming there is no limit to the FPU calcs on prime, then what you've shown indicates to me that something fishy is going on somewhere, either with prime or with Vista, and this is the confusing part, because if Vista is load sharing as you believe, it shouldn't be limiting prime to 25%/core.
Guess where I think the fish stink is coming from.
And no, it's not you, you know better than that. I don't think you're pulling anyone's leg, but I think M$ may be pulling all our legs. It wouldn't be the first time and it certainly won't be the last.

| turpit wrote : np.. based on what you showed, I'm going to have to dig around for a little something.. we'll see what I find. See, it still doesn't add up. Prime should load all of your cores to 100%. That it's only loading to 1/4 total usage still indicates that it's behaving normally, but the question is...how would it "know" what the capacity is...it's confusing. EDIT-------- BTW, I don't think you're trying to pull anyone's leg... ----NEW ENTRY----- Grimmy, I've been doing some digging, but haven't found anything yet. Here's the thing, and it's difficult to explain textually.... If Vista is doing what you believe it's doing, if you run 1 instance of prime95, single threaded, you should be seeing 100% load on all four cores. If Vista is load distributing the actual single thread routine, then immediately after you start prime, as the load on the primary core builds, some of it will be shunted to the other cores, leaving some of the primary core's resources free. But because prime will consume all available resources, those resources 'freed' in the primary core by distributing the load to the other cores should still be consumed, and as the load builds it should continue to be distributed, until the resources of all 4 cores are completely consumed. In short, given the way you think Vista is handling apps, what you should see is a near-instantaneous load cascade with all four cores sequentially building to full capacity. UNLESS there is a limit to the resources prime95 will consume. Now, I have never heard of a limit to the FPU calcs prime will run, but I have heard of memory limitations, which is why I noted to run FPU only. So, if there is an FPU limit, then what you have shown should prove what you believe to be true. But then, if there was an FPU limit, we should have known about it before now. That 'limit' would have shown, for example, as a 50% distribution for an E6600 running prime in Vista.
It should also show as something less than ~25% for, say, a Q9550...maybe 17-19% across all four cores (a loose guesstimate). Now, assuming there is no limit to the FPU calcs on prime, then what you've shown indicates to me that something fishy is going on somewhere, either with prime or with Vista, and this is the confusing part, because if Vista is load sharing as you believe, it shouldn't be limiting prime to 25%/core. Guess where I think the fish stink is coming from. |
Well.. first off, I think my MB died. After last night, it simply won't post anymore. I think the old NV 650i just gave out on me.
Now about the prime95. It's not the older version, it's the updated version "25.6.1.0" which you can find from the OC guide on the quad.
In order to load all cores, you assign worker threads as I stated. On my first screen cap on XP, you should notice only one core loaded, since I used 1 worker thread. Now if I use 2 worker threads, the dual core will be at full CPU usage. The same goes for my quad: when I assign 4 worker threads, all 4 cores will be fully loaded.
So... when I use 1 worker thread on Vista, it will not load just one core, UNLESS I assign one core using the affinity setting, which I could assign to any core (0/1/2/3). Super PI is a single threaded app. On XP, like I explained (running Super PI 1MB), it loads only one core, but on Vista it doesn't do that, unless again I use the affinity setting.
I can tell you that when I run the 1MB iteration on Super PI without setting the affinity, it takes longer (.500 ms). So it is switching between different cores, kinda like 'hot potato': the process is being pushed around by different cores in turn, or keeping the potato in the air so to speak.
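To put rough numbers on the 'hot potato' idea, here is a minimal sketch in Python. It is a toy model, not a measurement of Vista: `per_core_usage`, `total_usage` and the even round-robin migration are assumptions made purely for illustration. It shows that one fully busy thread bounced evenly between four cores would read about 25% on each core, while the same thread pinned by affinity reads 100% on a single core.

```python
def per_core_usage(cores, slices, pinned_to=None):
    """Percent busy per core for ONE always-busy thread over `slices` time slices.

    Assumption: without affinity the thread moves to the next core on every
    slice (round-robin). A real scheduler is far less regular than this.
    """
    busy = [0] * cores
    for i in range(slices):
        core = pinned_to if pinned_to is not None else i % cores
        busy[core] += 1
    return [100.0 * b / slices for b in busy]


def total_usage(usage):
    """Overall CPU usage figure: the average of the per-core percentages."""
    return sum(usage) / len(usage)


migrating = per_core_usage(4, 1000)            # thread hops between cores
pinned = per_core_usage(4, 1000, pinned_to=2)  # affinity set to core 2

print(migrating, total_usage(migrating))
print(pinned, total_usage(pinned))
```

Either way the total comes out the same, because a single thread can only ever be running on one core at any given instant.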
But... atm I can't do any more tests, unless someone here with a quad can do some tests for Turpit.
Looks like I'll be looking around for a P35 or P45 chipset this week.
Edit:
Forgot about my OC testing.. here's Prime95 loading all 4 cores with 4 worker threads:
| Grimmy wrote : Well.. first off, I think my MB died. After last night, it simply won't post anymore. I think the old NV 650i just gave out on me.
|
Grimmy,
I understand how prime works. I used the same version to run the demos on my laptop the other day.
What I'm trying to explain is that the program for the prime calculation is linear. You can run multiple threads, but those are only individual instances of the same calculation.
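A rough sketch of that point in Python, assuming each worker runs a Lucas-Lehmer style test (the test GIMPS uses on Mersenne numbers). The function names and the exponents are only illustrative, and this is not Prime95's actual code. Inside one test, every step needs the result of the step before it, so a single worker cannot be split up; more workers just means more independent tests running side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. Valid for odd prime exponents p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # Each iteration depends on the previous value of s:
        # the chain is strictly sequential.
        s = (s * s - 2) % m
    return s == 0


def run_workers(exponents, workers):
    """Run one independent test per exponent, `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(exponents, pool.map(lucas_lehmer, exponents)))


print(run_workers([3, 5, 7, 11, 13], workers=4))
```

One caveat on the sketch itself: CPython threads share an interpreter lock, so to really load separate cores you would swap in `ProcessPoolExecutor`. The structure is the same either way.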
How you seem to be seeing this is that the prime FPU calculation is finite, that is, that there is a limit to the number of loops it will generate, like a game. Again, if this was so, we would have seen that limit a long time ago, on any CPU, single, dual or quad core. But prime consumes all the resources it can if you set it to.
Vista can't 'load share' a program of its own accord... it can only do what the program tells it to do. If the program is written in such a manner that it will allow the OS to distribute the load (multithreaded) AND the OS is capable of such tasking, then that can be accomplished. Regardless, Vista itself can't 'break' apart a program to distribute operations among cores; it can only distribute what the program 'tells' it can be distributed. In the case of prime, were Vista to run like you think it is, shunting calculation loops of its own accord, then all 4 cores should load to 100% on a single thread. We both know this is not happening.... that's neither the question nor what's confusing.
What's confusing is how Vista is presenting the information on the loading... essentially, it looks like Vista is lying, which frankly would not be surprising.
Here's the math behind the prime calcs, from the mersenne site itself.
http://www.mersenne.org/math.htm
From the Great Internet Mersenne Prime Search
| Quote : The next step is to eliminate exponents by finding a small factor. There are very efficient algorithms for determining if a number divides 2^P-1. For example, let's see if 47 divides 2^23-1. Convert the exponent 23 to binary, you get 10111. Starting with 1, repeatedly square, remove the top bit of the exponent and if 1 multiply squared value by 2, then compute the remainder upon division by 47.
|
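The quoted procedure can be sketched in a few lines of Python. The name `divides_mersenne` is made up for this example, and the real GIMPS code is heavily optimized rather than written like this. The idea is to compute 2^P modulo the candidate factor by walking the exponent's bits from the top; the candidate divides 2^P-1 exactly when the result comes out as 1.

```python
def divides_mersenne(q, p):
    """Return True if q divides 2**p - 1 (square-and-multiply, top bit first)."""
    value = 1
    for bit in bin(p)[2:]:      # e.g. p = 23 -> "10111"
        value = value * value   # repeatedly square
        if bit == "1":
            value *= 2          # bit removed from the top was 1: multiply by 2
        value %= q              # keep only the remainder
    return value == 1 % q       # 2**p leaves remainder 1, so q divides 2**p - 1


print(divides_mersenne(47, 23))
```

For the quote's own example the running remainders are 2, 4, 32, 27, 1, so 47 does divide 2^23-1.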
If Vista were to load share of its own accord, 1 thread's worth of the above should load every core to max.

I believe the way to deal with parallel programming is a much more modular design, not just in width, but depth - where a common information dataset is frequently updated for use in all threads.
Basically, a larger number of shallower threads, that allow greater crossflow of information.
For instance, a racing game.
Your car is affected by a number of different parameters, such as: your tyres, your suspension, steering angle, throttle setting/gear, brake setting, lateral g, longitudinal g, roll centre, centre of gravity, aerodynamic centre.
Fundamentally, the user controls just three of these: steering, throttle and brake. Thus, a control variable thread can be set up that updates these properties dependent on controller input.
There are also fixed variables, like centre of gravity.
After which, separate threads could be run for each independent variable, using the other independent variable values from the previous clock cycle (over a very short time cycle, the step variation tends to zero, so it is valid to assume the previous values carry over, and there is a natural inertia anyway).
With the threads concentrating on one aspect of the car only, they are shallow, allowing the frequent updates of information.
That is the physics aspect broken up into several threads.
Then for AI, each car can be run on separate threads, again, with crossflow of info so they are racing the other AI cars.
Parallel programming is not easy, but in reality, much of what happens in games is parallel in nature - one thing ripples out to affect the rest.
Shortening the lag from initial perturbation to the change cascading down to affect everything is the key. I think shallower threads allow that.
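As a minimal sketch of the structure described above, in Python: every subsystem reads the same snapshot of the previous step's values and writes only its own variable, so the updates can be handed to separate threads without stepping on each other. All names and the toy physics formulas here are invented for illustration and are not taken from any real game engine.

```python
from concurrent.futures import ThreadPoolExecutor

DT = 0.01  # assumed length of one simulation step, in seconds


def update_speed(prev):
    # Toy model: throttle accelerates, brake and drag slow the car down.
    accel = 8.0 * prev["throttle"] - 12.0 * prev["brake"] - 0.02 * prev["speed"]
    return "speed", max(0.0, prev["speed"] + accel * DT)


def update_heading(prev):
    # Uses the PREVIOUS step's speed, not the one being computed right now.
    return "heading", prev["heading"] + prev["steering"] * prev["speed"] * 0.05 * DT


def update_lateral_g(prev):
    return "lateral_g", prev["steering"] * prev["speed"] ** 2 * 0.001


SUBSYSTEMS = [update_speed, update_heading, update_lateral_g]


def step(prev_state, controls, pool):
    """Advance one step: shallow, independent updates over a shared snapshot."""
    snapshot = {**prev_state, **controls}   # common dataset, treated as read-only
    results = pool.map(lambda update: update(snapshot), SUBSYSTEMS)
    new_state = dict(snapshot)
    new_state.update(results)               # each subsystem wrote one key
    return new_state


state = {"speed": 10.0, "heading": 0.0, "lateral_g": 0.0}
controls = {"throttle": 1.0, "brake": 0.0, "steering": 0.1}
with ThreadPoolExecutor(max_workers=len(SUBSYSTEMS)) as pool:
    state = step(state, controls, pool)
print(state)
```

Because nothing reads a value that another thread is writing in the same step, no locks are needed; the price is that each subsystem sees its neighbours' values one step late. In CPython the threads will not actually run on separate cores for work like this, so take it as a picture of the layout rather than a speedup.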
| turpit wrote : Grimmy, I understand how prime works. I used the same version to run the demos on my laptop the other day. If Vista were to load share of its own accord, 1 thread's worth of the above should load every core to max. Edited to make the quote shorter. |
Heh.. I'm okay at math, but that's quite a bit for me to take in. So... the only thing I understand is that you think Vista is lying. I suppose that could be true, but when I watch the temps, the corresponding load does cause the same core's temp to go up. And using the affinity can put the entire load on whichever core is chosen.
Just wish I had my quad system up and running. I'll be getting my MB replacement this week, and it's my vacation. Perhaps I could do some more tests once I get it back up and running.
If you can think of any other test to try, I'll try to do it with a screen cap.
I'm surprised I'm the only one seeing this. Can anyone else with a quad on Vista contribute some info or your experience?
| turpit wrote : Well, I hope this will stem some of the 'unicorn wishes' that every new app or game coming out will be quad optimized.
|
You're right. Because something is difficult to do means it will never happen, so we should all just throw in the towel and forget about multicore. And move back into caves while we're at it.
| Grimmy wrote : Heh.. I'm okay at math, but that's quite a bit for me to take in. So... the only thing I understand is that you think Vista is lying. I suppose that could be true, but when I watch the temps, the corresponding load does cause the same core's temp to go up. And using the affinity can put the entire load on whichever core is chosen.
|
Grimmy,
If the primary prime (no pun intended) calc is fine threaded, it would allow Vista to shunt those loops, which would explain the distribution, but Vista should not be able to 'cap' the number to 1/4 usage per core..... something weird is going on.... I think I have to ask someone to run some tests on this, though I doubt they will.

I get my MB today, which will prolly take me 3 to 6 hours to at least get up and running.
I might try to do a mini movie to show ya what the histogram on task manager is doing.
I know what you're saying, that is weird. That was my first thought when I discovered it. I did have a mini movie uploaded, but the site limited it since it was a.. umm.. lil too big?
Edit:
<--needs to go to work. I'll talk to ya guys later.
| snarfies1 wrote : You're right. Because something is difficult to do means it will never happen, so we should all just throw in the towel and forget about multicore. And move back into caves while we're at it. |
That wasn't my intent, and I apologize if it came off that way.
My meaning was this: Every day, here in the forum, anytime someone asks the question "quad or dual core?", invariably there are several people who will answer "get the quad, all new games are going to be written for quads".
This is simply untrue, and will always be untrue. It is the result of the "more is better" mentality... making statements without knowing the facts. Eventually many games will be optimized for multicore, but not all. The simple little games will never need multicore. The more in depth games will see benefits from multicore. But as so many others stated in the thread, writing multithreaded code is more difficult, time consuming and costly. If, for a given game, a single thread will provide the same results as multiple threads, there is no point in expending the extra resources.
What this means for the time being is that rather than just running out and buying as many cores as you can, you should look at the apps/games you want to run and see what they will benefit the most from. If someone wants to play FSX, or render heavy 3D, or transcode, a slower quad will benefit them more (depending how slow) than a fast dual. Conversely, if they want to play Doom 3, RTCW, solitaire, surf the web or run a word processor, they will get more benefit from the faster dual. Contrary to all the "everything is going to be quad" kiddies' 'unicorn wishes', by the time there is sufficient tri or quad threaded software to undeniably offset the advantages of clock speed, every CPU now on the market will be obsolete.... thus negating the 'future proofing' argument.
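For a back-of-the-envelope feel for that trade-off, here is a small Python sketch using Amdahl's law. The clock speeds (a 3.0 GHz dual against a 2.4 GHz quad) and the function name are hypothetical, and the model ignores IPC, cache, memory and everything else that matters in practice, so treat it as a rough illustration only.

```python
def relative_throughput(clock_ghz, cores, parallel_fraction):
    """Clock speed scaled by the Amdahl's law speedup for the given core count.

    parallel_fraction is the share of the work that can be spread across
    cores (0.0 = purely single threaded, 1.0 = perfectly parallel).
    """
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return clock_ghz * speedup


for fraction in (0.0, 0.5, 0.9):
    dual = relative_throughput(3.0, 2, fraction)   # hypothetical fast dual
    quad = relative_throughput(2.4, 4, fraction)   # hypothetical slower quad
    winner = "dual" if dual > quad else "quad"
    print(f"parallel {fraction:.0%}: dual {dual:.2f}, quad {quad:.2f} -> {winner}")
```

With these made-up numbers the faster dual stays ahead until a bit more than half of the work can be spread across cores, which is the same point in numbers: it depends on what you run.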


