ATI Stream: Finally, CUDA Has Competition

The “Balanced Platform”

Although the phrase “balanced platform” reeks of something that emerged from a marketing round table, it’s worth pausing here to explain some good thinking on AMD’s part. If you think back to our CUDA coverage, you’ll recall that one of our greatest praises for Badaboom was that it exhibited some of the lowest CPU usage in the field of competing CUDA encoders (“low” meaning utilization in the 60% range rather than the 95% range). Transcoding is incredibly compute-intensive, and the prevailing reasoning among software vendors seems to be that users want their jobs done as quickly as possible. If that means saturating the CPU and GPU concurrently to the exclusion of practically all other tasks, so be it. We have yet to see a Stream or CUDA application with a resource utilization slider.

But that doesn’t mean hardware vendors have to take the same approach. AMD’s “balanced platform” concept tries to leverage several major components of the system and spread the load as evenly as possible, doing the same work in the same time but leaving enough headroom for other applications to still function normally. That sounds pretty, but how well does it pan out in real life?

I hit on an excellent example of the balanced platform at work in our very first Espresso test, which involved taking a YouTube HD video (MPEG-4, 1280x720) and transcoding it into the iPhone 640x360 profile, which is also H.264 MPEG-4. The two images below show performance with the HD 4890. On the left, you see the test run without GPU acceleration, and on the right you have Stream enabled. You can see that with CPU-only encoding, all four cores of our Phenom II are practically maxed and GPU-Z registers the GPU load at a fairly steady and minimal 6%, which probably represents some element of the UVD pipeline being exercised during transcoding. In the GPU-enabled capture, you see a typical pattern emerge. Core 2 stays hammered (we don’t know why so many encoder utilities lean on this particular core) but cores 1, 3, and 4 drop down to sub-50% levels while the GPU load jumps.

The obvious next question is whether Nvidia, without a so-called balanced platform at its back, delivers similar results under CUDA. The question is a bit tricky to answer given our test setup because GPU-Z fails to display the GPU Load readout with the Nvidia card installed. Still, we can infer some things from the CPU results and final performance. You’ll note in the left shot that CPU-only utilization looks topped out, very similar to AMD. When we add CUDA into the mix (right image), CPU utilization remains almost unchanged.

So is CUDA doing anything? You bet. There’s a 35% drop in output time with GPU acceleration enabled, so CUDA is plainly throwing everything it can at the job. But here’s the interesting part: AMD and Nvidia show essentially the same encode time in CPU-only mode, but Stream yields a 108% performance gain, easily trouncing the CUDA result while averaging 40% less CPU utilization than CUDA.
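Because the CUDA figure above is quoted as a drop in output time and the Stream figure as a performance gain, it can help to put both on the same scale. The short sketch below does that conversion. It assumes that "performance gain" means the increase in throughput relative to the CPU-only run (so a 100% gain would halve the encode time); the article does not spell out the definition, and the function names are ours, purely for illustration.

```python
def gain_from_time_drop(drop_pct):
    """Convert a percentage reduction in encode time into a percentage throughput gain."""
    remaining = 1.0 - drop_pct / 100.0       # fraction of the original time still needed
    return (1.0 / remaining - 1.0) * 100.0


def time_drop_from_gain(gain_pct):
    """Convert a percentage throughput gain into a percentage reduction in encode time."""
    speedup = 1.0 + gain_pct / 100.0          # e.g. a 108% gain is a 2.08x speedup
    return (1.0 - 1.0 / speedup) * 100.0


# Figures quoted in the text; the definition of "gain" is our assumption.
cuda_gain = gain_from_time_drop(35.0)         # CUDA: 35% less time
stream_drop = time_drop_from_gain(108.0)      # Stream: 108% performance gain

print(f"CUDA:   35% less time is roughly a {cuda_gain:.1f}% gain")
print(f"Stream: 108% gain is roughly {stream_drop:.1f}% less time")
```

Under that assumption, CUDA's 35% time reduction works out to roughly a 54% gain, and Stream's 108% gain to roughly a 52% time reduction. Since both cards post essentially the same CPU-only encode time, the two results share a baseline and can be compared directly.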

As we’ll see, this is not a universal result. There are times when CUDA shines brighter. But tests like this show that AMD’s balanced platform statements are based on real benefits, not empty marketing meant to sell processors.

  • radiowars
    So..... TBH they both work pretty well, I hope that we don't start a whole competition over this.
  • falchard
    Did someone necro an old topic? I think ATI has been talking about ATI Stream for a while. I know at least a year since FireStream.
  • cl_spdhax1
    arcsoft simhd plugin is currently only enabled for n- cuda graphic cards.
  • Andraxxus
    They're good but hopefully they will manage to improve them more. Competition is good for business.
  • DjEaZy
    ... why just now talk about it? I've used it since Catalyst 8.12...
  • IzzyCraft
    Stream is old but not nearly as old and compatible as CUDA I'd get it a year or two more when more capable cards circulate the market and trickle down to the people before i would call it competition.

    Well it's good to see more than just 1 app that supports it.
  • ThisIsMe
    Just for the sake of it, and the fact that many pros would like to know the result, it would be nice to see comparisons like this using nVidia's Quadro cards vs. ATI's FirePro cards.
  • ohim
    why use 185.85 since those drivers are a total wreck

    13 pages with ppl having different problems with that driver
  • I think the second graph on the "Mixed Messages" page isn't the right graph.

    It's the same graph from the following "Heavier Lifting" page instead of a graph for the 298MB VOB file that should be shown?
  • Spanky Deluxe
    Stream and CUDA are likely to go the way of the dodo soon though. OpenCL's where it's at. Unfortunately it's a tad hard to get programming with it right now since you need to be a registered developer on nVidia's Early Access Program or you have to be a registered developer with Apple's developer program with access to pre-release copies of Snow Leopard.
    Virtually no one will bother using CUDA or Stream after OpenCL's out - why limit yourself to one hardware base after all? It'd be like writing Windows software that only ran on AMD processors and not Intel. Developers will not bother writing for both when they can just use one language that can run on both hardware platforms.