
AMD's RV770 to have 800 stream processors?

March 14, 2008 2:45:54 PM

http://techreport.com/discussions.x/14344

Wow, would this be pwnage or what?


March 14, 2008 3:10:52 PM

If game developers code for them, or AMD makes good drivers to utilise them.
March 14, 2008 3:20:40 PM

Look at the first comment; it sums up the counterargument quite well:

' I think it's disingenuous of AMD to say "scalar" processors; they are vector processors. Why does it matter? Because each group of 5 "scalar" processors can only process one vector at a time, even if that vector doesn't have as many as five dimensions.

So in reality, the RV670 has 64 vector processors, and the G80 has 128 scalar processors. Since those scalar processors also work at 1.35GHz instead of 775MHz, the RV670 would need the vectors to average about 3.5 dimensions to have equal resources (not counting memory).

I think having 160 vector processors would be impressive in and of itself, without needing to call them "800 scalar processors." Having said that, I seriously doubt they can do that with only about 30% more die space on the same process node. It would be much more plausible for the R700 to have 160 vector processors spread over two RV770 chips.'
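The break-even figure in that comment can be checked with a quick back-of-envelope calculation. This is a minimal sketch that uses only the unit counts and clocks given in the comment (64 five-wide units at 775MHz vs 128 scalar units at 1.35GHz), assumes one operation per lane per clock, and ignores memory, co-issue rules and everything else; the constant and function names are made up for illustration.

```python
# Back-of-envelope check of the quoted comment: how many of the 5 lanes an
# RV670 vector unit must fill, on average, to match G80's scalar throughput.
# Assumes one operation per lane per clock; memory and scheduling are ignored.

RV670_VECTOR_UNITS = 64       # 320 "stream processors" / 5 lanes each
RV670_CLOCK_GHZ = 0.775
G80_SCALAR_UNITS = 128
G80_SHADER_CLOCK_GHZ = 1.35


def ops_per_ns(units: int, clock_ghz: float, lanes_used: float = 1.0) -> float:
    """Operations per nanosecond if every unit fills `lanes_used` lanes each clock."""
    return units * clock_ghz * lanes_used


def breakeven_lanes() -> float:
    """Average vector width RV670 needs for its throughput to equal G80's."""
    g80 = ops_per_ns(G80_SCALAR_UNITS, G80_SHADER_CLOCK_GHZ)
    rv670_one_lane = ops_per_ns(RV670_VECTOR_UNITS, RV670_CLOCK_GHZ)
    return g80 / rv670_one_lane


if __name__ == "__main__":
    print(f"break-even vector width: {breakeven_lanes():.2f} of 5 lanes")
```

The result comes out just under 3.5, matching the comment's "about 3.5 dimensions".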
March 14, 2008 7:03:20 PM


That would be all well and good; however, I believe that Veculous is wrong in what he is saying. I remember reading an architecture review that said the 5 scalar units can indeed each process a thread per clock. It was Beyond3D, I think, and they are pretty clued up. I will see if I can find it and post it if you want?
Mactronix
March 14, 2008 7:52:07 PM

Wow, quality over quantity; go ATI
March 14, 2008 8:31:51 PM

I don't know the particulars, but in order for the fifth one to actually work in the same cycle, the code has to be written in a certain way; some games have a lot of that, most don't. So, in essence, the 5 scalar units are sometimes great, oftentimes so-so. Maybe someone with wider knowledge could explain this in more depth, as I only grasped a little of it.
March 14, 2008 9:07:50 PM

Mactronix: I'd like to see that. I have never really read a good article specifically contrasting ATi's and nVidia's approaches to stream processors or scalar ALUs.

From what I understand, nVidia's ALUs (G80 onward) can process any type of math input (vertex, geometry, pixel...) each cycle, whereas ATi's ALUs (R600 onward) are dedicated to one type of input. As a result of this less elegant approach, each ATi 'stream processing' unit includes 5 ALUs (vertex shader data often has five components and pixel shader data four). Thus most people roughly compare nVidia's SPs to ATi's by simply dividing ATi's number by 5.

I remember something on The Tech Report discussing R600 vs G80 and shader performance. I think the moral of that story was R600 had a lot of potential, a lot of pure computational power, but needed a lot of help from its compiler (drivers).
March 14, 2008 9:38:32 PM

mactronix said:
That would be all well and good; however, I believe that Veculous is wrong in what he is saying. I remember reading an architecture review that said the 5 scalar units can indeed each process a thread per clock. It was Beyond3D, I think, and they are pretty clued up. I will see if I can find it and post it if you want?
Mactronix


Yes, how many scalar operations can be done depends on the type of math.

One only needs to look at GPGPU processing in F@H to see that the RV670 outpaces its rivals despite this supposed lack of computational power.

It all depends on the operations. It's like comparing a group of 5 kids with calculators, 4 of which can only do +, -, × and /, and one that can do more functions including roots and squares/cubes, versus 2 kids whose calculators can all do squares/cubes and who also work twice as fast. If they're all just adding n ± 1 20 times, then the 5 kids will consistently outperform the 2 kids; if they're doing power/root equations, then the 2 kids will outperform. However, if you combine the operations, then it depends on where the focus of the operations is.
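The calculator analogy can be turned into a toy throughput model. This is a hypothetical sketch, not a model of either GPU: it assumes every operation is independent and can be split perfectly among the kids (no rounding, no dependencies), and the team sizes and speeds are read off the analogy itself.

```python
from dataclasses import dataclass


@dataclass
class Team:
    basic_only: int   # calculators limited to + - x /
    full: int         # calculators that can also do powers/roots
    speed: float      # operations per kid per minute


# The two teams from the analogy (hypothetical units: ops per minute).
FIVE_KIDS = Team(basic_only=4, full=1, speed=1.0)
TWO_KIDS = Team(basic_only=0, full=2, speed=2.0)


def finish_time(team: Team, basic_ops: int, power_ops: int) -> float:
    """Idealized minutes to finish, assuming independent, perfectly split work."""
    total_rate = (team.basic_only + team.full) * team.speed
    power_rate = team.full * team.speed
    # Two bottlenecks: all ops share every calculator, but power/root ops
    # can only run on the full-function ones. The slower limit wins.
    return max((basic_ops + power_ops) / total_rate, power_ops / power_rate)


if __name__ == "__main__":
    workloads = {"20 adds": (20, 0), "20 powers": (0, 20), "16 adds + 4 powers": (16, 4)}
    for name, (basic, power) in workloads.items():
        a = finish_time(FIVE_KIDS, basic, power)
        b = finish_time(TWO_KIDS, basic, power)
        print(f"{name}: 5 kids {a:.1f} min, 2 kids {b:.1f} min")
```

With only simple ops the 5 kids finish first (4 vs 5 minutes); with only powers/roots the single full-function calculator becomes the bottleneck and the 2 kids win (5 vs 20 minutes); a mix lands in between depending on the ratio.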

I think this is the segment you were thinking of:
http://www.beyond3d.com/content/reviews/16/8
March 14, 2008 9:40:53 PM

To see their different computational strengths, look at The Tech Report's original R600 review; they run tests covering a few different operational load situations to show the variability:

http://techreport.com/articles.x/12458/3
March 14, 2008 9:41:58 PM

From The Tech Report: Let's stop and run some numbers so we can address the stream processor count claimed by AMD. Each SIMD on the R600 has 16 of these five-ALU-wide superscalar execution blocks. That's a total of 80 ALUs per SIMD, and the R600 has four of those. Four times 80 is 320, and that's where you get the "320 stream processors" number. Only it's not quite that simple.

The superscalar VLIW design of the R600's stream processor units presents some classic challenges. AMD's compiler—a real-time compiler built into its graphics drivers—will have to work overtime to keep all five of those ALUs busy with work every cycle, if at all possible. That will be a challenge, especially because the chip cannot co-issue instructions when one is dependent on the results of the other. When executing shaders with few components and lots of dependencies, the R600 may operate at much less than its peak capacity. (Cue sounds of crashing metal and human screams alongside images of other VLIW designs like GeForce FX, Itanium, and Crusoe.)
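To see why dependencies hurt a five-wide VLIW design, here is a toy bundle packer. It is a simplified greedy list scheduler written for illustration, not AMD's real-time compiler: it only enforces the rule from the quote that an instruction cannot be co-issued with one whose result it needs, and it ignores real constraints such as which of the five ALUs can run transcendental functions. The instruction names are hypothetical.

```python
from typing import Dict, List, Sequence, Set

SLOTS_PER_BUNDLE = 5      # five-ALU-wide execution block
BLOCKS_PER_SIMD = 16
SIMDS = 4
PEAK_ALUS = SIMDS * BLOCKS_PER_SIMD * SLOTS_PER_BUNDLE   # the "320 stream processors"


def pack_bundles(deps: Dict[str, Sequence[str]], order: List[str]) -> List[List[str]]:
    """Greedy list scheduling: each cycle, issue up to 5 instructions whose
    inputs were all produced in *earlier* bundles (no dependent co-issue)."""
    done: Set[str] = set()
    pending = list(order)
    bundles: List[List[str]] = []
    while pending:
        ready = [op for op in pending if all(d in done for d in deps.get(op, ()))]
        bundle = ready[:SLOTS_PER_BUNDLE]
        if not bundle:
            raise ValueError("unsatisfiable dependencies (cycle or missing op)")
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in done]
    return bundles


def utilization(bundles: List[List[str]]) -> float:
    """Fraction of ALU slots doing useful work across all issued bundles."""
    issued = sum(len(b) for b in bundles)
    return issued / (len(bundles) * SLOTS_PER_BUNDLE)


if __name__ == "__main__":
    independent = {op: () for op in "abcde"}
    chain = {"a": (), "b": ("a",), "c": ("b",), "d": ("c",), "e": ("d",)}
    for name, deps in (("independent", independent), ("dependent chain", chain)):
        b = pack_bundles(deps, list("abcde"))
        print(f"{name}: {len(b)} bundle(s), {utilization(b):.0%} of slots busy")
```

Five independent operations fill one bundle completely, while a chain of five dependent ones needs five cycles with four slots idle each time, which is the "much less than its peak capacity" case the review describes.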

What I'm wondering is this: if the 800 figure is to be believed, then, as was said earlier, there needs to be more room on the GPU, or simply put, enough transistors. If ATI dropped their tessellation unit, wouldn't that free up the room required for this in their new architecture?
March 14, 2008 10:04:20 PM


Reading on, though, they do say that the compiler has the possibility of maturing and giving better yields. It also says they have done tests that prove, as some on the forum have suggested in the past, that if it's programmed properly it can get very close to its potential.
As you say, though, it's a case of IF these numbers are correct or not. Hopefully they have improved the throughput from the compiler for these cards?
I am far from an expert when it gets this in-depth; I just wanted to point out where the quoted post was incorrect.
Mactronix
March 14, 2008 10:14:34 PM

Is this 800 on a dual card, though? So it's 400 × 2? I think that is more reasonable.
March 14, 2008 10:21:29 PM

^ No, rumor has it that RV770 is the GPU; we do not know how many cores are in the die/package. The 4870x2 will incorporate 2 RV770s.
March 14, 2008 10:28:34 PM

Well, despite this, I still think it is likely they are referring to the dual card for the 800 SPs. I can't see 800 going onto one card.
March 14, 2008 11:04:08 PM

But since 800 is really only 160 vector units (800 ÷ 5), with 128 already being done, plus a die shrink, plus a new architecture (with possible reductions in other areas like tessellation), it could very well be true.
March 14, 2008 11:25:32 PM

More likely, this '800 SPs' rumor hints that RV770 combines 2 cores with 400 SPs each in one package. That way it is quite possible for ATi to raise their SP count by 150% (from RV670's 320 to 800) and still achieve the reported die sizes.
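For reference, the SP arithmetic from the last few posts as a tiny script. It assumes RV670's 320 SPs as the baseline and ATi's five SPs per vector unit; the 800 figure is the rumor itself, not a confirmed spec, and the two-core split is only the scenario being discussed. The names are made up for illustration.

```python
RV670_SPS = 320            # RV670: 64 vector units x 5 SPs
RUMORED_RV770_SPS = 800    # the rumored figure, unconfirmed
SPS_PER_VECTOR_UNIT = 5


def percent_increase(old: int, new: int) -> float:
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100


def summarize(total_sps: int, cores: int = 1) -> dict:
    """Break a total SP count into vector units, per-core SPs and growth vs RV670."""
    return {
        "vector_units": total_sps // SPS_PER_VECTOR_UNIT,
        "sps_per_core": total_sps // cores,
        "increase_vs_rv670_pct": percent_increase(RV670_SPS, total_sps),
    }


if __name__ == "__main__":
    print(summarize(RUMORED_RV770_SPS, cores=2))
```

Under these assumptions 800 SPs works out to 160 five-wide units, 400 SPs per core in the dual-core scenario, and a 150% increase over RV670.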
March 14, 2008 11:29:43 PM

That's quite possible, as I'm buying into this rumor!