The Architecture in Detail
Like Nvidia, AMD has chosen to build on its previous architecture rather than starting from scratch. It’s very much the same as that of the R600, which had already been re-used for the RV670.
The architecture, first introduced with the Xenos (the GPU used in the Xbox 360), is based on a group of SIMD arrays. The Xenos had three SIMD arrays, and the R600 and RV670 have four. The RV770 goes much further, with ten.
As you may have deduced, each SIMD array contains 80 ALUs, since the GPU has 800 in total. That’s true, but it’s a slightly simplified view of reality. In practice, the 80 ALUs are not independent of each other: they’re grouped into five-way VLIW units, 16 per SIMD array. This organization places restrictions on the instructions executed: each of the five instructions in a VLIW bundle has to be independent of the others. It’s up to the compiler to find enough independent instructions to saturate the ALUs, unlike the G80, which relies on a more "hardware" solution.
Here’s an example to illustrate what we just described:
- I1 FADD R1, R1, 3.14
- I2 FMUL R2, R1, 1.41
- I3 FMAD R3, R0, 0.5, 0.5
In this case, Instructions 1 and 3 can share the same bundle, but Instruction 2 cannot, because it depends on the result of Instruction 1. If the compiler can’t find enough independent operations in its window of instructions, it has to fill the bundle with NOP instructions that do nothing, which reduces the chip’s performance. The upshot is that Nvidia’s ALUs will hit their peak performance more often because they’re less dependent on the underlying code; the downside is that they’re much more costly in terms of transistors. AMD’s units depend strongly on the quality of the compiler (the one “internal” to the driver, which reorganizes the assembly instructions generated by the HLSL compiler), but AMD can afford to include a much larger number of them on a die that’s still significantly smaller.
The VLIW units themselves haven’t been heavily reworked: there are four units capable of executing an FMAD or an integer addition, and a special unit capable of executing an FMAD, an integer multiplication, or a transcendental function (sine, cosine, log, exp, etc.). The only real improvement is to integer bit-shift operations, which can now be handled by any of the five units, whereas on the 2900/3800 only the special unit could perform them. Rather than make the units more powerful, AMD has concentrated on reducing their size on the die so that more of them fit on the device.
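To make the bundling constraint concrete, here is a minimal Python sketch of a greedy packer applied to the three instructions above. It is not AMD's compiler: the `Instr` type, the `pack` function and the conservative rule that any register hazard (read-after-write, write-after-write or write-after-read) pushes an instruction into a later bundle are all assumptions made for illustration.

```python
from dataclasses import dataclass

BUNDLE_WIDTH = 5  # five-way VLIW, as described in the text


@dataclass(frozen=True)
class Instr:
    name: str
    op: str
    dest: str     # destination register
    srcs: tuple   # source registers only (constants are omitted)


def pack(instrs, width=BUNDLE_WIDTH):
    """Place each instruction in the earliest bundle that respects its hazards."""
    bundles = []   # list of bundles, each a list of Instr
    slot_of = {}   # instruction name -> index of the bundle holding it
    for i, ins in enumerate(instrs):
        earliest = 0
        for prev in instrs[:i]:
            hazard = (
                prev.dest in ins.srcs      # read-after-write
                or prev.dest == ins.dest   # write-after-write
                or ins.dest in prev.srcs   # write-after-read (conservative)
            )
            if hazard:
                earliest = max(earliest, slot_of[prev.name] + 1)
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1  # skip bundles that are already full
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(ins)
        slot_of[ins.name] = b
    return bundles


PROGRAM = [
    Instr("I1", "FADD", "R1", ("R1",)),
    Instr("I2", "FMUL", "R2", ("R1",)),
    Instr("I3", "FMAD", "R3", ("R0",)),
]

bundles = pack(PROGRAM)
layout = [[ins.name for ins in b] for b in bundles]
nops = sum(BUNDLE_WIDTH - len(b) for b in bundles)
print(layout)  # [['I1', 'I3'], ['I2']]
print(nops)    # 7
```

With only three instructions in the window, two bundles are issued and seven of the ten slots are padded with NOPs, which is the under-utilization the text warns about when the compiler cannot find enough independent work.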
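The division of labor inside a VLIW unit can be summarized in a few lines of Python. This is only a model of what the paragraph above lists: the operation labels and function names are made up for illustration, and any capability the text does not mention is simply left out rather than guessed.

```python
# Operation classes named in the text; the string labels are illustrative.
GENERAL_OPS = {"fmad", "int_add"}
SPECIAL_OPS = {"fmad", "int_mul", "transcendental"}


def vliw_unit(shift_on_all_lanes):
    """Return the five lanes of one VLIW unit as sets of supported operations."""
    general = set(GENERAL_OPS)
    if shift_on_all_lanes:
        general.add("int_shift")
    special = set(SPECIAL_OPS) | {"int_shift"}
    return [set(general) for _ in range(4)] + [special]


def lanes_supporting(op, unit):
    """Count the lanes of a unit that can execute the given operation."""
    return sum(op in lane for lane in unit)


rv770_unit = vliw_unit(shift_on_all_lanes=True)    # Radeon HD 4800
older_unit = vliw_unit(shift_on_all_lanes=False)   # 2900/3800

print(lanes_supporting("int_shift", older_unit))   # 1
print(lanes_supporting("int_shift", rv770_unit))   # 5
print(lanes_supporting("fmad", rv770_unit))        # 5

# Scaling up with the figures given earlier: 10 SIMD arrays x 16 units x 5 lanes.
total_alus = 10 * 16 * len(rv770_unit)
print(total_alus)                                  # 800
```

In this model the only difference between the two generations is the number of lanes that accept a shift, which matches the text's point that the units were shrunk rather than made more powerful.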
$450 in Best Buy for a GTX 260.
And the 4850 is pretty close to the 280.
Ooh, the 4870 is going to give Nvidia a run for their money
for the first time in a while.
P.S. +1000 -> 2222
MaxSmoothedFrameRate=62 in the Engine.GameEngine section
"it was unavailable due to the sloppy handling of this launch"
Seriously? AMD can't control whether their retail partners screwed the pooch on the release date because they were so anxious to get people this great product. AMD made sure the product was readily available well before the launch date.
They should be praised for not having a paper launch, not told that it was a sloppy launch. Very poor form saying that.
Hell, I went to Best Buy and bought two 4850s on Sunday, when the cards weren't even supposed to be available yet. The guy told me, "They have been in stock for over a month in the back. They aren't supposed to be available yet, but I can get two for you." Were the AMD police supposed to come and smack Best Buy on its hand and keep me from giving them profits?
Sorry if I'm ranting, just put the blame where it belongs.
It's in French, but the graphs speak for themselves. Oh, and if you want a short translation: impressive, and incredibly more efficient than Nvidia (if you compare the size of the GPU, yes, it's A LOT more efficient).