Will DX11 help end MicroStuttering?

a b U Graphics card
June 17, 2009 3:58:16 AM

Micro stuttering (MS) is seen on multi-card setups in a lot of cases: at lower fps, one frame is drawn quickly while the next is often waiting or lagging behind, causing the MS effect.
It's my understanding that DX11, not the cards themselves, will improve CPU utilization, make a real difference at minimum fps, and thus possibly also reduce MS, through the use of DX11's multithreading (MT).
Thoughts?
a c 271 U Graphics card
June 17, 2009 9:26:04 AM

Here is a perfect example of why 'microstuttering' is a non-issue: http://www.overclockers.com/index.php?option=com_content&view=article&id=4420:microstutter&catid=60:videocards&Itemid=4266 How much onboard cache does that CPU have? Overclocking the crap out of it is not going to compensate. There's an old adage from the drag racing world that sort of applies here: "you can't beat cubes", or in this case onboard cache. If that rig were running an E8400, E8600, a Q9550 or suchlike with those cards, there would be no 'microstutter'.
a b U Graphics card
June 17, 2009 9:42:08 AM

That's my point. Speed can be obtained two ways: faster or wider. If the MT available in DX11 helps the CPU get more data, sooner, it could theoretically help eliminate MS.
Looking at a lot of examples using AFR, there are all those micro gaps in time from one frame to the next, causing the MS.
I don't want to be attacked, but Anand said the Phenom II seemed smoother when tested in games, similar to the i7. To me this sounds like quicker communication between the cores on a native quad versus running through the FSB, as seen on C2Ds. Multiply the input by using two cards, with mainly (currently) just one thread available, and you'll still see MS, or something not as smooth as a native quad, given that extra time to go through the FSB.
DX11, for the first time, allows for more than one thread out and more than one thread in through the GPU (though this is now standard), but by having more going out, the CPU will have its data faster, even on slower CPUs, and if those CPUs are multi-cored, they will be able to push/use the data faster as well.
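To make that AFR timing argument concrete, here's a rough C++ sketch (the timings are made up for illustration, not measured from any real card): two GPUs in AFR deliver frames in a bunched short-gap/long-gap pattern, and the average FPS counter still looks healthy even though every other frame-to-frame gap is roughly three times longer than its neighbour.

```cpp
// Toy model (numbers invented for illustration, not measured): two GPUs in AFR.
// Even frames come from GPU 0 and land 8 ms after the previous present; odd frames
// come from GPU 1 and land 25 ms after. The average looks fine, the gaps do not.
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> presentMs;
    double t = 0.0;
    for (int frame = 0; frame < 10; ++frame) {
        t += (frame % 2 == 0) ? 8.0 : 25.0;   // bunched AFR delivery pattern
        presentMs.push_back(t);
    }
    double spanMs = presentMs.back() - presentMs.front();
    double avgFps = 1000.0 * (presentMs.size() - 1) / spanMs;
    std::printf("average fps over the run: %.1f\n", avgFps);   // a healthy-looking number
    for (std::size_t i = 1; i < presentMs.size(); ++i)          // but the gaps alternate 25/8 ms
        std::printf("gap %2zu: %4.1f ms\n", i, presentMs[i] - presentMs[i - 1]);
    return 0;
}
```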
a c 271 U Graphics card
June 17, 2009 10:02:48 AM

I don't think there will ever be a software compensation for a lack of onboard cache. It's been noted by many over a long period of time that the more onboard cache, the better, especially when it comes to gaming. Things might be a little different with the i7s and the new 'Lynnfields', but as long as onboard cache is part of the CPU, I reckon it will always be the decider. Those who choose to buy the lowest CPU in the range thinking that they can simply overclock to make up the difference will be the ones who then complain of 'microstutter', whereas those who understand about matching components to each other will get the better-suited CPUs and not have any issues, other than having to put up with the ceaseless whining from the uneducated who have no idea how to put together a gaming rig.
a b U Graphics card
June 17, 2009 10:18:09 AM

At a certain point, though, cache becomes redundant, as most CPUs today have enough, at least the upper tiers do. Some games demand a lot from the CPU, and using two cards as well is usually when/where MS is seen, as the flow of data is bottlenecked, causing the MS even with a decent CPU. Of course, as you've said, it's determined by many things; resolution can play a part as well. I'm just hoping for better throughput using DX11, which should help alleviate things like MS, and it'll certainly raise the minimum fps in games. That's the good news, and it's where MS shows its ugliness.
a c 271 U Graphics card
June 17, 2009 10:39:40 AM

Only time will tell, as it's going to take the game devs a while to start using DX11 exclusively. The hardware to run it on has yet to appear and mature, and as long as people are still running older kit, the previous incarnations of DX will still have to be around to cause coding issues for the devs. Assassin's Creed's DX10.1 support being removed by its latest patch is a case in point (if true).
a c 130 U Graphics card
June 17, 2009 10:55:17 AM

I think MS is intrinsic to the whole setup personally, by which I mean I agree with mousemonkey that matching components is by far the most important aspect of trying to stop these types of issues.
As far as I know there are a few theories about why MS happens, but nothing proven to be the cause. Please correct me if I'm wrong.
While I understand where JDJ is coming from with the theory, I believe it's equally possible for the increase in CPU usage to actually cause the issue rather than alleviate it.
Personally, and I have no proof here, I think the issue is something to do with the internal timings. That includes things like the lag you would get from running a CPU with low cache, right through to the timing of the refresh interval and the whole host of issues that can occur with frame buffers etc., which would cause tearing or general lagging in a system with a single card. I believe MS is just an extension of these issues when they occur in a dual-card setup.
As I said, I have no proof; it's just my theory based on what I know of the issues that can blight the whole graphics process.
When you start trying to compensate for any issue in any process, be it computing, engineering, etc., you end up adding variables that have the possibility of causing further issues later. Much better to fix the problem at source than put a sticking plaster over it.

Mactronix
a c 271 U Graphics card
June 17, 2009 11:10:51 AM

This MS thing is the visual representation of a mismatched system, IMHO, in the same way that a flaming fireball would be the visual representation of using the brakes from a Nissan Micra on an F1 car. It would be cheaper and allow more cash to be spent on making the engine more powerful, and in the pit garage the car would stop when you pressed the brake pedal, but at the first corner of the race you would soon be regretting your spending strategy.
a b U Graphics card
June 17, 2009 4:35:06 PM

Part of my point is that even with an underpowered CPU, DX11 will help alleviate these problems. The i5 is said to be a screamer, with its fast CPU-to-PCIe links. That, plus having the MT, will/should make MS a thing of the past.
a b U Graphics card
June 17, 2009 8:38:07 PM

I'm still waiting to hear what the exact cause is. I have my own theory...

Let's assume you are getting 60 FPS. That means that every second, 60 frames are drawn and displayed to the screen. The assumption we always make, however, is that a new frame will be ready every 1/60th of a second.

I believe microstutter is the result of frames being delayed in the short term, causing skipping of frames without affecting the actual FPS count. For instance, instead of one new frame being created every 1/60th of a second for a full second (like we always assume), I think we get something more along the lines of (using a 10-frame sample):

Frame:   Frames drawn:   Frameskip (from last drawn frame):
  1          1               --
  2          1               0
  3          0               --
  4          2               1
  5          1               0
  6          2               1   (a forward skip)
  7          0               --
  8          1               1
  9          0               --
 10          2               1

Hence, we get 10 frames as we expect, but over the course of the run, we skip a total of 4, or 40%. Yet, if this pattern were to hold, we'd be getting a constant 60 FPS on the screen. Hence: microstutter. And considering the latencies involved, it would be no shock that dual GPUs suffer from this type of behavior.

As such, I see the only solution to microstutter as faster, lower-latency GPUs. That's my theory anyway, and it makes more sense than most of the other ones out there.
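If it helps, here is a tiny C++ sketch of that idea. The per-slot numbers are just the hypothetical 10-frame sample from the table above, not measured data: the FPS counter only sees the total frame count, while the slots that delivered nothing are where the eye sees the stutter.

```cpp
// Toy version of the table above (the per-slot counts are the hypothetical sample,
// not a measurement). The FPS counter only sees the total; the empty slots are
// what you actually notice.
#include <cstdio>

int main() {
    const int framesPerSlot[10] = {1, 1, 0, 2, 1, 2, 0, 1, 0, 2};
    int totalFrames = 0, staleSlots = 0;
    for (int slot = 0; slot < 10; ++slot) {
        totalFrames += framesPerSlot[slot];
        if (framesPerSlot[slot] == 0)
            ++staleSlots;      // nothing new arrived: the previous image is shown again
    }
    std::printf("frames delivered over 10 slots: %d (counter still reads 'smooth')\n", totalFrames);
    std::printf("slots showing a stale frame:    %d\n", staleSlots);
    return 0;
}
```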
a c 271 U Graphics card
June 17, 2009 8:51:34 PM

gamerk316 said:
As such, I see the only solution to microstutter as faster, lower-latency GPUs. That's my theory anyway, and it makes more sense than most of the other ones out there.

To you perhaps.
a b U Graphics card
June 18, 2009 1:03:05 AM

gamerk316 said:
I believe microstutter is the result of frames being delayed in the short term, causing skipping of frames without affecting the actual FPS count. [...] As such, I see the only solution to microstutter as faster, lower-latency GPUs. That's my theory anyway, and it makes more sense than most of the other ones out there.


If this were the case, then we'd see it more on single GPUs, and not just on CF/SLI, where it'd be even worse than it is now.
D3D allows the CPU to render three frames ahead, which, with an even faster GPU, would put even more strain on the CPU, just like having CF/SLI does. Having a higher amount of cache on the CPU allows for this, and it's what mm is referring to: the CPU needs enough room to render three frames ahead without lag.
My understanding is that the input from two or more cards pushes the CPU beyond its limits, whether it's cache or just plain old speed, and we lose that constant three-frames-ahead. When this happens, the frames jump in lag time. It's not that they're not drawn at all, just that they're not drawn with sequential timing but with varied timing, and that is detectable by the human eye.
I don't know all the intricacies of DX11, but getting MT out to the CPU should help in some instances, preventing the lag, or at least that's what I hope it'll do. That, and create better minimum fps.
a b U Graphics card
June 18, 2009 12:00:20 PM

True, based on my theory we probably should see more MS on single GPUs; then again, I never hear of microstutter cases where FPS is high... I think there might be a threshold where the GPU simply can't keep the buffers filled, leading to my theoretical behavior. I'd love to hear alternative theories on MS though.

I also doubt that the CPU will have any positive impact on rendering. Rendering is by nature a highly mathematical function that requires a massive amount of precise data, and I for one would expect the low bus width/register size that CPUs use to bottleneck everything if the CPU were to get involved. Heck, we already know the GPU is faster than the CPU for PhysX, another heavily mathematical function, so why would the CPU suddenly help with rendering, let alone MS?

EDIT

I just realized I missed a major point: yes, D3D does allow frames to be rendered ahead. Of course, if you're only getting 45 FPS, I'm willing to take a guess at how often there is spare time to build a frame ahead of time...
a c 130 U Graphics card
June 18, 2009 4:30:00 PM

gamerk316 said:
True, based on my theory we probably should see more MS on single GPUs; then again, I never hear of microstutter cases where FPS is high... I think there might be a threshold where the GPU simply can't keep the buffers filled, leading to my theoretical behavior. I'd love to hear alternative theories on MS though.

I also doubt that the CPU will have any positive impact on rendering. Rendering is by nature a highly mathematical function that requires a massive amount of precise data, and I for one would expect the low bus width/register size that CPUs use to bottleneck everything if the CPU were to get involved. Heck, we already know the GPU is faster than the CPU for PhysX, another heavily mathematical function, so why would the CPU suddenly help with rendering, let alone MS?



Um, sorry, what??? The CPU has no impact on rendering??? The whole second part of your post just doesn't make sense. I'm assuming you actually know how the computer gets the image onto the screen and as such are aware that the CPU is already heavily involved in the process.

Mactronix
a b U Graphics card
June 18, 2009 5:06:32 PM

For rendering, the only purpose the CPU serves is to give the renderer (the GPU) the data needed to create the image. This includes all the other mathematical calculations (status, AI, physics, etc.) that are needed to determine the placement of objects in 3D space. In that regard, a slow CPU can effectively bottleneck the process, but a faster one may not speed it up. Basically, for gaming purposes, the CPU exists to CREATE the picture that will be sent to the GPU to be rendered (through rasterization).

My point is a simple one: the CPU is already overloaded as it is. Think about it: the CPU already needs to track everything that goes on in the game, every status, AI routine, physics calculation, etc., and now you want it to RENDER as well? I believe all this would accomplish is delaying the execution of all the other functions to the point where the CPU would be delayed in creating the next image the GPU needs to render (as a result of delayed AI/physics/status updates), creating a situation where adding CPU horsepower could theoretically slow performance due to delays elsewhere.

Dedicating one core to the cause might be worthwhile, but I don't believe there would be any significant performance gain, and with the Windows OS itself becoming more optimized, it might cost some overall system performance.
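As a rough illustration of what "dedicating one core to the cause" could look like, here's a toy C++ sketch; the FrameData struct and the thread layout are invented for the example, not taken from any engine. One thread runs the game logic and queues finished frame descriptions, and a second thread does nothing but drain the queue and submit the draw work.

```cpp
// Toy sketch: one "logic" thread produces frame descriptions, a dedicated thread
// consumes them and submits the rendering work. FrameData is a made-up placeholder.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

struct FrameData { int frameIndex; /* camera, draw lists, etc. */ };

std::queue<FrameData> pending;
std::mutex m;
std::condition_variable cv;
bool done = false;

void renderThread() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !pending.empty() || done; });
        if (pending.empty() && done) return;
        FrameData f = pending.front();
        pending.pop();
        lock.unlock();
        // Submit the draw calls for f here (D3D, OpenGL, ...).
        std::printf("submitted frame %d\n", f.frameIndex);
    }
}

int main() {
    std::thread renderer(renderThread);
    for (int i = 0; i < 5; ++i) {                     // "game logic" producing frames
        { std::lock_guard<std::mutex> lock(m); pending.push({i}); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_one();
    renderer.join();
    return 0;
}
```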
June 18, 2009 5:34:28 PM

PCIe 3.0 might fix this issue. DX11 is going to be best on a PCIe 3.0 interface; the early DX11 cards on PCIe 2.0 will be severely limited.
a b U Graphics card
June 18, 2009 7:20:26 PM

And how do you come to that conclusion? The data that passes over a PCIe lane is determined by the card, not the API. The API has nothing to do with the PCIe interface whatsoever.

Again, I'd love to hear alternate theories for MS, other than the usual "It exists".
a b U Graphics card
June 19, 2009 12:58:18 AM

Rendering three frames ahead: if at any time the GPU has to wait, I believe that is when we see MS. I believe a decent CPU that gets its info quicker, or more widely/wisely dispersed, will/should keep better pace, thus my question.
Since most games are single-threaded, and there's no getting around that in many instances, as there are just too many things being done serially, having MT to the CPU should be a boon.
a c 130 U Graphics card
June 19, 2009 7:26:07 AM

Um, am I being thick here, or are JDJ and gamerk316 talking at cross purposes?
First, JDJ, you are saying that DX11 will make more use of the CPU, as in multithreading capabilities, not just putting more load on it?
It seems to me that gamerk316 thinks you are talking about putting more load on the CPU as it is now, i.e. single-threaded.

That, I hope, is what you mean; otherwise gamerk316 is definitely right in saying you are asking for trouble by trying to get the CPU to do more.

@ maximiza

I too would be interested to see some links supporting your statement about PCIe 3.0 being needed for DX11 to be seen at its best.
I mean, if there is going to be that much more traffic over the bus then we all need i11s now.

Mactronix
a b U Graphics card
June 19, 2009 8:04:09 AM

Yes, that's what I mean. Instead of the work coming in single-threaded, with the CPU having x amount of work to do in y amount of time, it'll be MT'd, and should be processed quicker, allowing the timing to be smoother.
a b U Graphics card
June 19, 2009 8:55:37 AM

mousemonkey said:
in the same way that a flaming fireball would be the visual representation of using the brakes from a Nissan Micra on an F1 car


rofl! priceless..
a b U Graphics card
June 19, 2009 11:55:25 AM

JAYDEEJOHN said:
Yes, that's what I mean. Instead of the work coming in single-threaded, with the CPU having x amount of work to do in y amount of time, it'll be MT'd, and should be processed quicker, allowing the timing to be smoother.


But that's where I disagree. More and more, we are seeing games taking advantage of more cores, yet even in quad-optimized games (let's use GTA IV) we see MS. Also remember, even in non-threaded apps (there are very few truly "single-threaded" apps), the second core will be used simply due to the way the OS handles the data.

I do not think MS is a timing issue with the CPU. If that were the case, we could just plop in an old P4 and see the MS.
a b U Graphics card
June 19, 2009 12:25:02 PM

DX11 changes all that. A game may be MT-optimized, but there's still only a single thread coming through for rendering, and that is what is now changing. In other words, currently you can send out one string from multiple sources, or multiple strings from one source. Now we will have both.
So even if today's games are MT'd, it's not the same. It's better, and it does help, but the source is still lacking. That's the way I understand it's going to be, and not what we have now.
a b U Graphics card
June 19, 2009 12:43:17 PM

"Multi-threaded rendering? "But," you’re saying, "we’ve had multi-core CPUs for several years now and developers have learned to use them. So multi-threading their rendering engines is nothing new with Direct3D 11." Well, this may come as a surprise to you, but current engines still use only a single thread for rendering. The other threads are used for sound, decompression of resources, physics, etc. But rendering is a heavy user of CPU time, so why not thread it, too? There are a several reasons, some of them related to the way GPUs operate and others to the 3D API. So Microsoft set about solving the latter and working around the former."

http://www.tomshardware.com/reviews/opengl-directx,2019...

Read this; it'll explain it better than I can.

"But as you can see, a large share of the workload was still on the main thread, which was already overloaded. That doesn’t ensure good balance, needed for good execution times. So, Microsoft has introduced a new interface with Direct3D 11: a programmer can create one Device object per thread, which will be used to load resources. Synchronization within the functions of a Device is more finely managed than in Direct3D 10 and is much more economical with CPU time."



a b U Graphics card
June 19, 2009 3:32:01 PM

Just my humble opinion, but no, I don't think DX11 will end the dreaded MS issue. It's a hardware bug. SLI is over 10 years old and it's still problematic.

For the race car buffs: why would you put a second engine in a car hoping it'll mesh and synchronize perfectly with the transmission?
The car will go faster, of course, but you introduce a new set of performance issues.

In most cases, isn't it more economical to get a single faster GPU that usually costs less than the two cards it outperforms? If SLI went away, went extinct like the dodo bird, I wouldn't miss it.


a b U Graphics card
June 19, 2009 3:50:53 PM

hundredislandsboy said:
Just my humble opinion, but no, I don't think DX11 will end the dreaded MS issue. It's a hardware bug. SLI is over 10 years old and it's still problematic.

For the race car buffs: why would you put a second engine in a car hoping it'll mesh and synchronize perfectly with the transmission?
The car will go faster, of course, but you introduce a new set of performance issues.

In most cases, isn't it more economical to get a single faster GPU that usually costs less than the two cards it outperforms? If SLI went away, went extinct like the dodo bird, I wouldn't miss it.


Unfortunately or not, SLI/Crossfire are here to stay.

I can't be sure, I suppose no one can, but I don't think they push SLI/Crossfire in order to sell more GPUs (at least not exactly). They push it because some people need/want that power, and Nvidia/ATI are unable to cost-effectively produce a single GPU that performs as fast as top-tier SLI/Crossfire, so to still have products for that market they offer SLI or Crossfire.

While most people won't care, there are a few of us who want to 'improve on the best', and Crossfire/SLI is the only way to do that. I mean, consider how ridiculously expensive a single GPU based on current tech would have to be in order to be as powerful as the X2 cards. It just doesn't make sense to produce such high-end tech that only a few will buy when they can produce mid-to-high-end GPUs and sell two of them to the high end, instead of some super GPU that is useless for everyone but the enthusiasts.

They are in the business of making money, after all. I mean, I'd love to have a single card as powerful as my double 4890s, but it just isn't worth the R&D to make a card that massive only to sell it to a fraction of the market at a price even fewer would want to pay. Even now we see lower GPUs in pairs outperforming higher-end GPUs for less money. If such a mythical super single-GPU card were around, say a 4990, and it was indeed as fast as 4890 Crossfire, it would easily cost 2.5 times as much as the Crossfire alternative. (Two 4850s are a good example of this: they easily outpace a 4890 but cost significantly less as a pair than a 4890 does on its own.)

It is all about the bottom line. If you don't want an SLI/Crossfire rig then there is no need to buy into one. But I don't think we will ever see a single-GPU card aimed at the enthusiast sector again; it just costs too much for the user and for the development. For years to come, we are going to live in a world where the most powerful GPUs are equivalent to the 275/4890, and enthusiasts will just have to buy 2, 3 or 4 of them to get the most out of their rigs.

Now, back on topic: I have had three SLI or Crossfire setups (1 SLI, 2 Crossfire). In my time with these rigs I have never seen anything like the microstuttering you see in the videos on the net. I don't know if I am lucky, or if it is indeed all about a balanced system, but I don't even consider it when I make a purchase; it simply is a non-issue to me, while it may be for others.
a b U Graphics card
June 19, 2009 4:11:38 PM

I'm of the opinion that MS is seen only by certain people. Of course, first it has to exist on those people's rigs, but also, just as some can't stand even a 60 Hz refresh rate, so too do these people see MS. I'd guess you'd put it down to an unfortunate ability, heheh.
Now, having a poorly balanced system will obviously make MS occur at a much higher level, and also be detectable by most people as well. That's just MHO, though.
a b U Graphics card
June 19, 2009 4:52:03 PM

Here's the argument though, jaydee: I don't believe the CPU is the cause of MS, and thus even if extra CPU utilization and threading could improve performance, I don't think it will end MS.
a c 130 U Graphics card
June 19, 2009 5:39:47 PM

Got to say, what gamerk316 is saying does make more sense to me now that I have had some time to think it over.
If it were hardware-based, involving the CPU, then in a lot of cases simply turning on V-sync would cure it. Also, from a CPU point of view you are just standing still to go forwards anyway: if DX11 allows more threads to be run on the hardware and enables MT with the CPU, then you are getting more threads fed from more sources, which is the same difference, isn't it?
It's much more likely, from what I know of it, that it's something to do with the rendering timing/process, like gamerk316 is saying.
It could all come down to poorly coded software at the end of the day. A console port that hasn't been tuned to allow for the difference in Hz from the set plodding-along of the console would cause juddering graphical anomalies, which could easily be compounded by throwing dual cards at it.
That's just one of several coding issues that would cause problems similar to MS, and that's before we even get to the hardware.

Mactronix
a b U Graphics card
June 19, 2009 5:59:33 PM

It's been shown that the timing is thrown off. If it's PCI, then we'll know when the i5s come out, as the response time will be cut quite a bit. But again, reread what I posted from the Tom's article: the string is loaded and the GPUs are idling, or every once in a while one of them is, anyway. Tom's states this. It wasn't Tom's article that got me thinking this either; I quoted it to show what I meant, and was trying to say, in a better form.

As to plopping in an old P4, this is where it changes, even for them. In essence, it is the OS that's going to change things, or more precisely the DX, with Vista/W7. If it were simply the data flowing through the PCI lanes, then we'd see it all the time, wouldn't we, if the PCI lanes couldn't keep up?

Some accesses take longer than others, and if the string is full, there are going to be lulls on the other end, and choppiness. Change that scenario, and the CPU handles everything in a much more timely manner. It's the way the data stream is currently constructed and delivered versus what's to come.

You yourself are of the notion that MT in games exists, but while that may be true for some things, here's where I disagree with you:

You said: "More and more, we are seeing games taking advantage of more cores, yet even in quad-optimized games (let's use GTA IV) we see MS."

Whereas Tom's article says what I was trying to say, much more clearly:

"Multi-threaded rendering? "But," you’re saying, "we’ve had multi-core CPUs for several years now and developers have learned to use them. So multi-threading their rendering engines is nothing new with Direct3D 11." Well, this may come as a surprise to you, but current engines still use only a single thread for rendering."

You also said:

"Also remember, even in non-threaded apps (there are very few truly "single-threaded" apps), the second core will be used simply due to the way the OS handles the data."

While Tom's says:

"But as you can see, a large share of the workload was still on the main thread, which was already overloaded. That doesn’t ensure good balance, needed for good execution times."

So, where are these very few single-threaded apps?
a b U Graphics card
June 19, 2009 6:10:15 PM

Mac, V-sync doesn't alter the data stream. And show me where you'll see MS at 60 fps?
a b U Graphics card
June 19, 2009 6:17:12 PM

Everything has to be sorted and presented. If there's one long line being moved very quickly, there are bound to be anomalies in that line, or stream, as the items in it aren't all the same and don't all require the same resources.

Add more lines moving at the same speed, and a much more even flow occurs. CF/SLI cards are each only doing half as much work for the same workload. I'm not saying that in the end it couldn't be drivers; I'm just trying to present a case here that to me is very plausible.
a b U Graphics card
June 19, 2009 6:31:43 PM

I don't believe in the Loch Ness Monster or Sasquatch, but I believe MS exists. I've seen it on my rig in Crysis, Far Cry, and CoD4. After tweaking endless settings and failing to rid myself of MS, it must have been ghosts in the machine. That, or my RAM wasn't SLI certified, lol.
a c 130 U Graphics card
June 19, 2009 6:36:19 PM

V-sync doesn't alter the data stream, granted, but it alters the way it's presented. You can just as easily get a jittery effect with all the buffers saturated and V-sync enabled. If it were a hardware timing issue, as in CPU-buffers-GPU, then V-sync would iron it out in a lot of cases.
However, it's obviously a deeper issue than that, and for my money it's much more likely to be based around timings and synchronizations within the rendering process than around a lack of CPU power.
You may be correct, and adding the MT effect may help to smooth the problem out.
Guess we are at wait-and-see time. When DX11 gets properly introduced and we get some systems with DX11 software and hardware, we will be able to tell whether your assessment is correct or not.

Where did I say anything about MS at 60 fps?

Mactronix
a b U Graphics card
June 19, 2009 7:13:58 PM

OK, first, MS doesn't occur until fps drops lower. V-sync is usually synced to the monitor, i.e. 60 Hz, and that's what I meant.
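For reference, with DXGI (the presentation layer under D3D10/11) V-sync is just the sync-interval argument to Present. A minimal sketch, assuming a valid swap chain created elsewhere:

```cpp
// Sketch: "V-sync on" is just the SyncInterval passed to IDXGISwapChain::Present.
// 'swapChain' is assumed to be a valid IDXGISwapChain* created elsewhere.
#include <dxgi.h>

void presentFrame(IDXGISwapChain* swapChain, bool vsync) {
    // SyncInterval 1 = wait for the next vertical blank; 0 = present immediately (tearing possible).
    swapChain->Present(vsync ? 1 : 0, 0);
}
```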
Here's where I see my point not getting through. It's not that I'm saying the CPU isn't fast or powerful enough, but the data required for its reaction is piped in one long stream, one that is, as Tom's put it, overloaded. I think that's where you may be misunderstanding me. Thanks to the MT and the texture compression available in DX11, the calls will be dispersed and it will be easier for the CPU to get its hands on the data; it's not that the CPU is inadequate. It's just going to be a much more even flow of data TO the CPU.
But evidence of CPU strain can be glimpsed from the fact that a quad will often do better than a dual, because even if the game is MT-optimized, those particular optimizations are for other cores anyway, and with a dual there is nowhere to go but the two cores available, so the overloading can be seen there. That's our current scenario, which will be radically changed with the onset of DX11, where we'll see AI on one core as we have now, plus a multi-sourced flow of data from the primary thread, which can be dispersed.

Like I said, in CF/SLI the cards have only half the work to do, so the buffers aren't overloaded on the GPU side. Since CPUs are serial in nature and use cache, it'd be hard for me to believe they'd be overloaded either. It's the flow!