
Reworked Streaming Multiprocessors

Nvidia GeForce GTX 260/280 Review
By Florian Charpentier

Aside from their increased number, each multiprocessor has undergone several optimizations. The first is an increase in the number of active threads per multiprocessor – from 768 to 1,024 (that is, from 24 32-thread warps to 32). A larger number of threads is especially useful for masking the latency of texturing operations. Across the GPU as a whole, the count rises from 12,288 active threads to 30,720.
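As a quick back-of-the-envelope check of those figures, here is a small Python sketch. It assumes the multiprocessor counts implied elsewhere in this article – 16 SMs on the G80 (8 TPCs x 2 SMs) and 30 on the GT200 (10 TPCs x 3 SMs) – so treat it as illustrative arithmetic only:

    # Back-of-the-envelope check of the active-thread figures quoted above.
    # Assumed SM counts: 16 on the G80 (8 TPCs x 2 SMs), 30 on the GT200 (10 TPCs x 3 SMs).
    WARP_SIZE = 32

    for name, sm_count, threads_per_sm in [("G80", 16, 768), ("GT200", 30, 1024)]:
        warps_per_sm = threads_per_sm // WARP_SIZE
        total_threads = sm_count * threads_per_sm
        print(f"{name}: {warps_per_sm} warps/SM, {total_threads} active threads GPU-wide")

    # G80:   24 warps/SM, 12288 active threads GPU-wide
    # GT200: 32 warps/SM, 30720 active threads GPU-wide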

The number of registers per multiprocessor has doubled – from 8,192 to 16,384. Given the concomitant increase in the number of threads, the number of registers usable simultaneously by a single thread rises from 10 to 16. On the G8x/G9x, our test algorithm used 67% of the processing units; on a GT200 that figure would be 100%. Combined with the two texture units, performance should be substantially higher than with the G80 we used for our test. Unfortunately, CUDA 2.0 requires a driver that is still in beta and doesn't recognize the GeForce GTX 200 cards. As soon as the main branch of the drivers adds support, we'll redo the test.
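For illustration, the per-thread register budget at full occupancy works out as follows. This is rough arithmetic only – real allocation is done per block and has granularity constraints – but it shows where the 10-to-16 figure comes from:

    # Rough per-thread register budget at maximum occupancy.
    # Real allocation is done per block with granularity constraints,
    # so this is only an upper bound.
    def regs_per_thread(registers_per_sm, max_threads_per_sm):
        return registers_per_sm // max_threads_per_sm

    print("G8x/G9x:", regs_per_thread(8192, 768))     # -> 10
    print("GT200:  ", regs_per_thread(16384, 1024))   # -> 16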


That's not the only improvement made to the multiprocessors: Nvidia says it has also optimized dual-issue mode. You'll recall that since the G80, the multiprocessors are supposed to be able to execute two instructions per cycle: one MAD and one floating-point MUL. We say "supposed to" because at the time we weren't able to observe this behavior in our synthetic tests – without knowing whether the limitation was in the hardware or the drivers. Several months and several driver versions later, we now know that the MUL isn't always easy to isolate on the G80, which led us to believe the problem was at the hardware level.

But how does dual-issue mode operate? At the time of the G80, Nvidia provided no details, but since then, by studying a patent, we've learned a little more about the way instructions are executed by the multiprocessors. First of all, the patent clearly specifies that a multiprocessor can only launch execution of a single instruction per GPU cycle (the "slow" frequency). So where is this famous dual-issue mode? In fact it's a peculiarity of the hardware: an instruction takes two GPU cycles (four ALU cycles) to execute on a warp (32 threads processed by 8-wide SIMD units), but the front end of the multiprocessor can launch execution of a new instruction at each cycle, provided the two instructions are of different types: a MAD in one case, an SFU instruction in the other.

In addition to transcendental operations and interpolation of per-vertex attributes, the SFU is also capable of executing a floating-point multiplication. By alternating MAD and MUL instructions, the execution of the two overlaps, so each GPU cycle produces the result of a MAD or a MUL on a warp – that is, 32 scalar values – whereas from Nvidia's description you might have expected the result of a MAD and a MUL every two GPU cycles. In practice the result is the same, but from a hardware point of view this greatly simplifies the front end, which handles launching instructions, since only one starts at each cycle.
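To make the overlap concrete, here is a toy Python model of that issue pattern under the assumptions described above: one instruction launched per GPU cycle, each instruction keeping its unit (MAD or SFU) busy for two GPU cycles to cover a 32-thread warp. The structure is purely illustrative and is not a description of the actual scheduler:

    # Toy model of the alternating MAD/SFU issue pattern described above.
    # Assumptions (illustrative only): one instruction launched per GPU cycle,
    # each instruction keeps its unit busy for 2 GPU cycles to process a
    # 32-thread warp on 8-wide SIMD hardware.
    WARP_SIZE = 32
    CYCLES_PER_WARP_INSTRUCTION = 2

    def warp_results(n_cycles):
        mad_free_at = 0   # cycle at which the MAD unit becomes free again
        sfu_free_at = 0   # cycle at which the SFU becomes free again
        issued = 0        # warp-wide instructions launched
        for cycle in range(n_cycles):
            # Alternate instruction types: even cycles launch a MAD,
            # odd cycles launch a MUL on the SFU.
            if cycle % 2 == 0 and cycle >= mad_free_at:
                mad_free_at = cycle + CYCLES_PER_WARP_INSTRUCTION
                issued += 1
            elif cycle % 2 == 1 and cycle >= sfu_free_at:
                sfu_free_at = cycle + CYCLES_PER_WARP_INSTRUCTION
                issued += 1
        return issued * WARP_SIZE

    # In steady state one warp-wide instruction is launched every GPU cycle,
    # i.e. 32 scalar MAD or MUL results per cycle.
    print(warp_results(1000))   # -> 32000 scalar results over 1,000 cycles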


What limited this ability on the G8x/G9x, and what has been corrected in the GT200? Nvidia, unfortunately, isn't specific about that. It simply says it has worked on such points as register allocation and the scheduling and launching of instructions. But you can rely on us to pursue our investigation. Now let's see whether the changes Nvidia has made are useful in practice in a synthetic test – GPUBench. For purposes of comparison, we've added the 9800 GTX's scores to the graph. This time it's clear: you can see the higher rate for MUL instructions compared to MAD instructions. But we're still a long way from doubled values, with a gain of approximately 32% over the MAD rate. That will do for now. We should mention that the results for DP3 and DP4 instructions shouldn't be taken into account, since the scores aren't consistent. The same goes for the results for POW instructions, which are probably skewed by a driver problem.

The last change made to the Streaming Multiprocessors is support for double precision (64-bit floating-point numbers instead of 32-bit). Let's be clear – the additional precision is only moderately useful in graphics algorithms. But as we know, GPGPU is taking on more and more importance for Nvidia, and in certain scientific applications double precision is a non-negotiable requirement.

Nvidia is not the first company to take note of that: IBM recently modified its Cell processor to increase the performance of the SPUs on this type of data. In terms of performance, the GT200 implementation leaves something to be desired – double-precision calculations are handled by a single dedicated unit in each Streaming Multiprocessor. With that unit able to execute one double-precision MAD per cycle, we get a peak of 1.296 GHz x 10 (TPCs) x 3 (SMs) x 2 (multiply + add) = 77.76 Gflops, or between 1/8th and 1/12th of the single-precision performance. AMD has introduced support by running the same processing units over several cycles, with noticeably better results – only two to four times slower than single-precision calculations.
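For reference, the same peak-rate arithmetic can be written out alongside the single-precision peaks it is being compared with. This sketch assumes, as above, a 1.296 GHz shader clock, 8 scalar processors per SM and one double-precision MAD unit per SM – a rough reconstruction of the figures, not official specifications:

    # Peak-rate arithmetic for the figures quoted above (assumed values).
    SHADER_CLOCK_GHZ = 1.296          # shader/ALU clock assumed in the text
    TPCS, SMS_PER_TPC, SPS_PER_SM = 10, 3, 8

    sm_count = TPCS * SMS_PER_TPC     # 30 SMs
    sp_count = sm_count * SPS_PER_SM  # 240 scalar processors

    dp_peak = SHADER_CLOCK_GHZ * sm_count * 2          # one DP MAD (2 flops) per SM per cycle
    sp_peak_mad = SHADER_CLOCK_GHZ * sp_count * 2      # SP MAD only
    sp_peak_mad_mul = SHADER_CLOCK_GHZ * sp_count * 3  # SP MAD + SFU MUL

    print(f"DP peak:           {dp_peak:.2f} Gflops")          # ~77.76
    print(f"SP peak (MAD):     {sp_peak_mad:.2f} Gflops")      # ~622
    print(f"SP peak (MAD+MUL): {sp_peak_mad_mul:.2f} Gflops")  # ~933
    print(f"DP is 1/{sp_peak_mad / dp_peak:.0f} to 1/{sp_peak_mad_mul / dp_peak:.0f} of SP peak")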

Comments
  • -8
    BadMannerKorea , June 16, 2008 1:19 PM
    FIRST OMFG NVIDIA pwns!
  • 8
    Lunarion , June 16, 2008 1:40 PM
    what a POS, the 9800gx2 is $150+ cheaper and performs just about the same. Let's hope the new ATI cards coming actually make a difference
  • 7
    foxhound009 , June 16, 2008 1:56 PM
    woow,.... that's the new "high end" gpu????
    lolz.. 3870 x2 will get cheaper... and nvidia gtx200 lies on the shelves providing space for dust........
    (I really expected more from this one... :/ )
  • 6
    thatguy2001 , June 16, 2008 2:02 PM
    Pretty disappointing. And here I was thinking that the gtx 280 was supposed to put the 9800gx2 to shame. Not too good.
  • 4
    cappster , June 16, 2008 2:06 PM
    Both cards are priced out of my price range. Mainstream decently priced cards sell better than the extreme high priced cards. I think Nvidia is going to lose this round of "next gen" cards and price to performance ratio to ATI. I am a fan of whichever company will provide a nice performing card at a decent price (sub 300 dollars).
  • 6
    njalterio , June 16, 2008 2:07 PM
    Very disappointing, and I had to laugh when they compared the prices for the GTX 260 and the GTX 280, $450 and $600, calling the GTX 260 "nearly half the price" of the GTX 280. Way to fail at math. lol.
  • 6
    NarwhaleAu , June 16, 2008 2:09 PM
    It is going to get owned by the 4870x2. In some cases the 3870x2 was quicker - not many, but we are talking 640 shaders total vs. 1600 total for the 4870x2.
  • 5
    MooseMuffin , June 16, 2008 2:11 PM
    Loud, power hungry, expensive and not a huge performance improvement. Nice job nvidia.
  • 6
    compy386 , June 16, 2008 2:17 PM
    This should be great news for AMD. The 4870 is rumored to come in at 40% above the 9800GTX so that would put it at about the 260GTX range. At $300 it would be a much better value. Plus AMD was expecting to price it in the $200s so even if it hits low, AMD can lower the price and make some money.
  • 0
    vochtige , June 16, 2008 2:23 PM
    i think i'll get a 8800ultra. i'll be safe for the next 5 generations of nvidia! try harder nv crew
  • 3
    cah027 , June 16, 2008 2:24 PM
    Looks like ATi might have a fighting chance of catching up to Nvidia. Hopefully this will help AMD out as a company.
  • 4
    Anonymous , June 16, 2008 2:25 PM
    I am fairly disappointed. I thought Nvidia would go for the high-end market with great performance and a lot of money, but it's only a lot of money.
  • -5
    baracubra , June 16, 2008 2:28 PM
    Finally!!! woot
  • 4
    sailormonz , June 16, 2008 2:31 PM
    I don't believe quad sli with the 9800GX2 works too well, therefore these cards may be best suited for a person with tons of money to waste and wanting a SLI system with top of the line cards. These results were rather disappointing however.
  • 3
    Annisman , June 16, 2008 2:54 PM
    My 9800GX2 looks like it's gonna be staying in my case for at least another month or two. (4870x2 anyone?)
  • -5
    RaZZ3R , June 16, 2008 2:57 PM
    what a piece of s***, nVidia, what happened to you, are you going down the same path as AMD??? And about the review: where is the info about the integrated Ageia PhysX in the GTX280 and 260... more info, goddamn it, and don't tell me about CUDA because that is software. I want hardware info and capabilities and some screenshots for god's sake.
  • 2
    neodude007 , June 16, 2008 2:59 PM
    Boooo I want more power.
  • 8
    mr roboto , June 16, 2008 3:10 PM
    I've had my 8800GTX for almost a year and a half and it still owns. My card was $499 when I bought it and was the most expensive card I'll ever buy. However, it's looking more and more like a great investment. This GTX280 is disappointing to say the least. I would love to see ATI jump back in the game!

    What's funny is they might actually compete with this card without even meaning to.
  • 5
    zarksentinel , June 16, 2008 3:12 PM
    this card is not worth the money.
  • 9
    dragoncyber , June 16, 2008 3:12 PM
    Dear Nvidia,

    I am obviously not the first one to say that I am entirely angered by the results of the recent GTX280 benchmarks. I have been an Nvidia customer since the original GeForce series, always trusting the reliable green team to be at the forefront of the graphics race.

    Instead this time I am completely thrown for a loop as Nvidia expects us to pay 600.00-650.00 for a (New Generation) graphics card release that in most cases performs worse than a 9800GX2, which at the time of this report is actually 150-200 dollars cheaper, and once released will most likely drop even further.

    Not to mention the 9800GTX, which is basically on par with an 8800 Ultra, is down below 300.00!! Tri-SLI anyone?? Three 9800GTXs would outperform "Quad SLI 9800GX2s" and this has already been proven on several sites (AnandTech.. Hello???) WTF is wrong with Nvidia?? Tri-SLI has already been posting great numbers!!

    So obviously, knowing this going into a release, you would want to put something on the table that would blow the doors off your current best configurations. Instead we are handed mere 15-20% gains in some situations, and the card actually gets beaten in others. And of course they didn't post SLI tests here for this report.

    The next thing that really gets my goat is selling a GTX260 card for 200-250 dollars less, and it has only an 18-20% decrease in performance compared to your shiny new top-of-the-line card. With a little overclocking and some good cooling it's the same card in performance tests. What gives??

    Now the biggest embarrassment for Nvidia is that they are pushing CUDA technology and Folding at Home client CRAP!! These cards are designed for gaming!! Who gives a crap about how fast they can cure cancer or render 3D medical images and scans of the human body. Maybe 10% of the cards produced will be used for this purpose.. the rest will be for GAMING!! What in the hell are we even talking about that junk for in this article?? Does that even matter when really what everyone cares about is will it beat the crap out of Crysis!!?? Will it provide me solid gaming for the next 2 years?? Is it worth my hard earned money??

    So far I am in AWE... yes.. but not the good AWE, the bad one, the I-can't-believe-this-is-happening, here comes the BIG RED DRAGON (ATI) breathing fire and brimstone, I'm a scared little peasant in a village AWE!!
    I am this close to throwing away my 790i board, putting my SLI'd 8800GTs on eBay, and switching over to a Crossfire platform.

    Again, I cannot state enough how utterly disappointed I am at this turn of events, and the worst part is that there is no way they could do a GTX280 X2 card, because the manufacturing process and die size are too huge, and too hot, to combine them. They would literally have to redesign the entire thing, basically doing a new release all over again.

    I'm afraid to say it but I think this will be like one of those boxing matches where the underdog is in the corner getting punched on left and right... Then something clicks, and he decides enough is enough, and he throws a BIG left that hits the other guy right in the jaw. He doesn't see it coming, and as he hits the mat he thinks.. "This guy still has some fight in him." Personally I hope ATI comes out on top with the 4870 X2. That will make Nvidia realize they can't sit around in their offices all year playing Nerf Hoops, and living off the fat of their successes over the last 4 years. They have gotten fat and lazy, and this card shows it all over the place.

    I am however going to wait, and see if the driver improvements make any difference at all in the upcoming weeks to a couple months. I have a 8800GT Sli system, and I'm sitting pretty as far as I am concerned. I Still consider the 8800 GT 512mb to be one of the best graphics cards ever made, and a combo of them is hard to beat. So I will wait and see what happens.

    Best of luck to any who choose to buy it upon release; I think soon you will wish you had waited just a bit longer. The price will drop drastically once all the tech sites get their reviews out.