Stream processors [ATI/Nvidia question]

Last response: in Graphics & Displays
March 30, 2009 5:05:34 AM

I was looking at some Radeon cards a second ago.
Specifically the HD 4670.
And it says it has 320 "stream processing units".
But the GTX 295 'only' has 480.
So what gives?

I'm probably completely misunderstanding the whole idea of stream processors and stuff, but I felt compelled to ask.
March 30, 2009 5:41:35 AM

Simple answer: "Stream processor" is a general term, not a specific one. If you were comparing the value of a group of 3 cars to a group of 2 cars without knowing what the cars are, you wouldn't have enough information to make a decision. In the same way, you can't compare ATI SPs to Nvidia SPs by quantity.

Slightly more complicated answer:
Nvidia's SPs are designed to be able to do all types of calculations. ATI's SPs are each designed to do specific calculations, so you need more of them, because not all of them are in use all the time.

Even more complicated answer: (taken from http://forums.bit-tech.net/showthread.php?t=158055)

"The stream processors in Nvidia's and AMD's respective architectures are slightly different, but there's a reason for that. Nvidia's architecture is scalar, while AMD's architecture is superscalar. Basically, with Nvidia's architecture, you can throw any piece of code at it and it will run on as many of the stream processors as it needs - they all have the same functionality (bar the special function unit - there's one of those per eight stream processors but it's not part of the count) and are pretty generalised. It's a brute force method, although Nvidia would crucify me for saying that - you just throw code at it and it works everything out itself.

On the other hand, AMD's architecture has blocks of five (well, technically six) stream processors that have differing functionality. Four of them can handle FP MAD, FP MUL, FP/INT ADD and dot product calculations, while the fifth unit can't handle dot products, INT ADD or double precision calculations, but can handle INT MUL, INT DIV, bit shifting and transcendental calculations (SIN, COS, LOG, etc). It's a bit more complex, but if the code is optimised well, it can deliver much higher performance - that's why the FLOPS throughputs on the AMD chips are quite a bit higher, too, because they only take FP MAD and FP MUL into account and all of the units in the AMD chips can do those calculations (they're the most widely used).

I'd say the AMD architecture is a lot cleverer in many respects, but it does require a bit more work from the developer to achieve peak performance."
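
To put rough numbers on that, here's a quick Python sketch of the "not all are being used all the time" idea. It's only a toy model: the SP counts (320 and 480) and the block size of five come from this thread, but the function name `effective_units` and the `avg_slots_filled` values are made up for illustration, and it ignores clock speeds, memory bandwidth and everything else that matters for real performance.

```python
# Toy model: how many advertised stream processors (SPs) are busy per clock.
# Assumption: a block of SPs only stays fully busy if the code supplies
# enough independent operations to fill every slot in the block.

def effective_units(total_sps, block_width, avg_slots_filled):
    """Estimate the number of SPs doing useful work each clock.

    total_sps        -- advertised stream processor count
    block_width      -- SPs grouped into one block (1 for a scalar design)
    avg_slots_filled -- hypothetical average number of slots per block
                        that the code manages to keep busy
    """
    if not 1 <= avg_slots_filled <= block_width:
        raise ValueError("avg_slots_filled must be between 1 and block_width")
    blocks = total_sps // block_width
    return blocks * avg_slots_filled


# HD 4670: 320 SPs arranged in blocks of five, so 64 blocks.
hd4670_worst = effective_units(320, 5, 1)      # poorly optimised code
hd4670_typical = effective_units(320, 5, 3.5)  # 3.5 is an invented average
hd4670_best = effective_units(320, 5, 5)       # perfectly optimised code

# GTX 295: 480 general-purpose SPs, modelled here as blocks of one.
gtx295 = effective_units(480, 1, 1)

print("HD 4670 busy SPs (worst/typical/best):",
      hd4670_worst, hd4670_typical, hd4670_best)
print("GTX 295 busy SPs:", gtx295)
```

In this model the 320 advertised SPs on the HD 4670 can behave like anything from 64 to 320 units depending on how well the code fills each block, while the Nvidia count stays put. That's why the raw counts can't be compared directly.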