Can OpenGL And OpenCL Overhaul Your Photo Editing Experience?

Q&A: Under The Hood With Adobe, Cont.

Tom's Hardware: Within Photoshop, what limits exist in terms of what you can do with these APIs?

Russell Williams: With some things, we can look at them and know they're not suited for OCL or the GPU in general. In other cases, it's only after we expend some effort, by implementing it, that we discover we're not going to get the speed-up that will justify the effort. It's well known that the GPU is completely suited for certain kinds of things and completely unsuited for others. I believe it was AMD that told us that while a GPU can speed up the things it is suited to by several hundred times compared to the CPU, a problem that is ill-suited to GPUs, something inherently sequential, might run 10 times slower.

Some people think, “If the GPU is so much faster, then why not do everything on the GPU?” But the GPU is only suited for certain operations, and every operation you want to run there has to be re-implemented. For instance, we accelerated the Liquify filter with OGL, not OCL, and that makes a tremendous difference. For large brushes, it goes from 1 to 2 FPS to being completely fluid, responsive, and tracking with your pen. That kind of responsiveness for modifying that much data could only be done on a GPU. But it took one engineer most of the product development cycle for CS6 to re-implement the whole thing.

Tom's Hardware: Which gets us back to why only one feature was implemented in OCL this time around. You don't have an infinite number of developers, and you have only one year between versions.

Russell Williams: That's right. And, of course, we have an even more limited supply of developers that already know OCL.

Tom's Hardware: Did graphics vendors play a role in your OpenCL adoption? Education, tools, and so forth?

Russell Williams: We didn't have much input on creating the tools they had. But they gave us a tremendous amount of help in both learning OpenCL and in using the tools they have given us. Both Nvidia and AMD gave us support in prototyping algorithms, because it is in both of their interests to make more use of the GPU. For us, the big issue is where the performance is. We can't count on a particular level of GPU being in a system. Many systems have Intel integrated graphics, which have more limited GPU and OpenGL support, and no OpenCL support. A traditional C-based implementation has to be there, and it’s the only thing we can count on being there. On the other hand, if something is performance-critical, the GPU is really where most of the compute power is in the box.

Beyond that, AMD had their QA/engineering teams constantly available to us. We had weekly calls, access to hardware for testing, and so on. Nvidia and Intel helped, too, but AMD definitely stepped up.
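
The fallback requirement Williams describes, where a plain CPU implementation must always exist and the GPU path is an optional fast path, can be sketched roughly as follows. This is an illustrative Python sketch, not Adobe's code: `invert_cpu`, `pick_implementation`, and the `gpu_available` flag are hypothetical names, and a real application would query the driver for OpenGL or OpenCL support instead of taking a boolean.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical filter signature: take pixel values in [0, 1], return new values.
Filter = Callable[[Sequence[float]], List[float]]


def invert_cpu(pixels: Sequence[float]) -> List[float]:
    """Reference CPU implementation; this path must always exist."""
    return [1.0 - p for p in pixels]


def pick_implementation(
    cpu_impl: Filter,
    gpu_impl: Optional[Filter] = None,
    gpu_available: bool = False,
) -> Filter:
    """Use the GPU path only when one was written AND the system supports it."""
    if gpu_impl is not None and gpu_available:
        return gpu_impl
    return cpu_impl


# No GPU implementation is registered here, so the CPU path is chosen
# even though the (assumed) system reports a usable GPU.
invert = pick_implementation(invert_cpu, gpu_impl=None, gpu_available=True)
result = invert([0.0, 0.25, 1.0])
```

The design choice is that the CPU function is the required argument and the GPU function is optional, which mirrors the interview's point that each accelerated feature is extra engineering work layered on top of a baseline that has to ship regardless.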

Tom's Hardware: So which company has the better products, AMD or NVIDIA? [laughs]

Russell Williams: You're Tom's Hardware. You know that depends on what you're running and which week you ask the question—and how much money you have in your pocket. That's why it is so critical for us to support both vendors. Well, three if you include Intel integrated graphics, which is starting to become viable.

Tom's Hardware: At some point, performance bottlenecks are inevitable. But how far out do you look when trying to avoid them? Do you say, “Well, we’re already getting five times better performance—that’s good enough!” Or do you push as far as possible until you hit a wall?

Russell Williams: We do think about that, but it’s very hard. It’s impossible to know quantitatively, ahead of time, what those bottlenecks will be. We know qualitatively that we should spend a lot of time on bandwidth issues. Photoshop is munging pixels, so the number of times that pixels have to be moved from here to there is a huge issue, and we pay attention to every point in the processing pipeline where that happens. Quite often, that’s more of a limiting factor than the computation being done on the pixels. In particular, when you have discrete graphics, there is an expensive step of moving the pixels to the graphics card and back.

Tom's Hardware: So the bus is usually your bottleneck?

Russell Williams: Yes, the PCI bus. I really expect in the future that APUs will require us to rethink which features can be accelerated. Particularly once APUs start using what some people call zero-copy. When you have an APU, you don't have to go across that long-latency PCI bus to get to the GPU. Right now, you still have to go through the driver, and it still copies from one place in main memory to another: from the space reserved for the CPU to the space reserved for the GPU. They're working on eliminating that step. And as they make that path more efficient, it becomes more and more profitable to do smaller and smaller operations on the APU, because the overhead on each one is smaller.

On the other hand, APUs are not as fast as discrete GPUs. In some ways, it comes down to on-card memory bandwidth versus main memory. But you can also think of it as power budgets with discrete cards that are sucking down several hundred watts by themselves. You have to keep copying large things across this small pipe to this hairdryer of a compute device. It just depends. There has to be enough computation involved to pay for copying it out across the PCI bus and bringing it back.
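
The trade-off in this answer, that there has to be enough computation to pay for the copy out and back, can be written as a simple break-even check. The sketch below is a back-of-the-envelope model in Python, not a measurement: the function names, the 64 MiB image size, the 8 GB/s bus figure, and the compute times are all assumptions chosen for illustration.

```python
def gpu_total_seconds(n_bytes, gpu_compute_s, bus_bytes_per_s, per_call_overhead_s=0.0):
    """Time to send pixels to the card, process them, and bring them back."""
    round_trip_s = 2 * n_bytes / bus_bytes_per_s  # copy out + copy back
    return per_call_overhead_s + round_trip_s + gpu_compute_s


def offload_pays_off(n_bytes, cpu_compute_s, gpu_compute_s, bus_bytes_per_s,
                     per_call_overhead_s=0.0):
    """True when the GPU path, copies included, beats staying on the CPU."""
    total = gpu_total_seconds(n_bytes, gpu_compute_s, bus_bytes_per_s,
                              per_call_overhead_s)
    return total < cpu_compute_s


IMAGE_BYTES = 64 * 2**20   # assumed 64 MiB of pixel data
BUS_BYTES_PER_S = 8e9      # assumed effective bus bandwidth, not a measured figure

# A compute-heavy filter: the copy is small next to the CPU time saved.
heavy_filter = offload_pays_off(IMAGE_BYTES, cpu_compute_s=0.5,
                                gpu_compute_s=0.005,
                                bus_bytes_per_s=BUS_BYTES_PER_S)

# A light operation: the round trip alone costs more than doing it on the CPU.
light_filter = offload_pays_off(IMAGE_BYTES, cpu_compute_s=0.01,
                                gpu_compute_s=0.0001,
                                bus_bytes_per_s=BUS_BYTES_PER_S)

# The same light operation with no copy at all (an idealized zero-copy APU).
light_zero_copy = offload_pays_off(0, cpu_compute_s=0.01,
                                   gpu_compute_s=0.0001,
                                   bus_bytes_per_s=BUS_BYTES_PER_S)
```

Under these assumed numbers, the heavy filter is worth offloading, the light one is not when it has to cross the bus, and the same light operation becomes worthwhile once the copy cost drops to zero. That is the sense in which smaller and smaller operations become profitable as the copy overhead shrinks.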

53 comments
  • ilysaml
    Now Adobe uses both CUDA and OpenCL that's superb.
  • alphaalphaalpha1
    Tahiti is pretty darned fast for compute, especially for the price of the 7900 cards, and if too many applications get proper OpenCL support, then Nvidia might be left behind for a lot of professional GPGPU work if they don't offer similar performance at a similar price point or some other incentive.

    With the 7970 meeting or beating much of the far more expensive Quadro line, Nvidia will have to step up. Maybe a GK114 or a cut-down GK110 will be put into use to counter 7900. I've already seen several forum threads talking about the 7970 being incredible in Maya and some other programs, but since I'm not a GPGPU compute expert, I guess I'm not in the best position to consider this topic on a very advanced level. Would anyone care to comment (or correct me if I made a mistake) about this?
  • A Bad Day
    How many CPUs would it take to match the tested GPUs?
  • blazorthon
    A Bad Day: How many CPUs would it take to match the tested GPUs?


    That would depend on the CPU.
  • esrever
    Would be interesting to compare the i7 ivybridge against trinity in openCL
  • mayankleoboy1
    why no nvidia cards here?
  • mayankleoboy1
    any CUDA vs OpenCL benchmarks?
  • de5_Roy
    can you test like these combos:
    core i5 + 7970
    core i5 hd4000
    trinity + 7970
    trinity apu
    core i7 + 7970
    and core i7 hd 4000, and compare against fx8150 (or piledriver) + 7970.
    it seemed to me as if the apu bottlenecks the 7970 and the 7970 could work better with an intel i5/i7 cpu on the graphical processing workloads.
  • vitornob
    Nvidia cards test please. People need to know if it's better/faster to go OpenCL or CUDA.
  • bgaimur
    vitornob: Nvidia cards test please. People need to know if it's better/faster to go OpenCL or CUDA.


    http://www.streamcomputing.eu/blog/2011-06-22/opencl-vs-cuda-misconceptions/

    CUDA is a dying breed.
  • nousername
    no intel or nvidia because for professional editing you need hardware capable of more than gaming...
  • A Bad Day
    blazorthon: That would depend on the CPU.


    2687W: 2P server CPU, 8 core (16 threads), 3.1 GHz (3.8 GHz turbo), and 20 MB of L3 cache.

    Cost per CPU: $1885
  • blazorthon
    nousername: no intel or nvidia because for professional editing you need hardware capable of more than gaming...


    Quadro, Tesla... These are graphics cards that are also capable of more than gaming, even if like alpha said above, many of them aren't always the very fastest such cards for compute performance anymore and most definitely aren't the fastest compute cards for the money.

    A Bad Day: 2687W: 2P server CPU, 8 core (16 threads), 3.1 GHz (3.8 GHz turbo), and 20 MB of L3 cache. Cost per CPU: $1885


    I'll have a look and see if I can find benchmarks to compare with those done in this article.
  • annymmo
    I'm hoping that OpenCL will make it possible to implement high demanding video codecs for smartphone GPU's.

    This would allow software vendors to implement their video format of choice everywhere while making it able to play fluently everywhere where it matters!
  • annymmo
    And being able to play video's fluently on computers with weak CPU's.
  • blazorthon
    annymmo: And being able to play video's fluently on computers with weak CPU's.


    What semi-modern computer has a CPU so weak that it can't play video? Even a single core Atom CPU can play video without trouble. I'd be more worried about old GPUs (such as older Atom netbook GPUs and other weak GPUs) not always being able to play modern video very well, not CPUs. Heck, even my almost ten year old laptop with an old P4 is GPU limited in video, not CPU limited.
  • Yuka
    blazorthon: What semi-modern computer has a CPU so weak that it can't play video? Even a single core Atom CPU can play video without trouble. I'd be more worried about old GPUs (such as older Atom netbook GPUs and other weak GPUs) not always being able to play modern video very well, not CPUs. Heck, even my almost ten year old laptop with an old P4 is GPU limited in video, not CPU limited.


    Prior to the HD3k, Intel wasn't able to play videos decently; only blocky and badly rendered pictures of something moving on the screen. Period.

    And no, unless the Atoms are on the ION platform, they can't play any video in more than SD format. Let alone apply filters for re-size.

    And to directly answer your question. Core2 Duos on laptops were not able to play videos decently and nothing before that was able to, where any iGPU from nVidia or AMD was able to prior to the C2D's in notebooks. I'm pretty sure in desktop was not that much different.

    Cheers!
  • blazorthon
    Yuka: Prior to the HD3k, Intel wasn't able to play videos decently; only blocky and badly rendered pictures of something moving on the screen. Period. And no, unless the Atoms are on the ION platform, they can't play any video in more than SD format. Let alone apply filters for re-size. And to directly answer your question. Core2 Duos on laptops were not able to play videos decently and nothing before that was able to, where any iGPU from nVidia or AMD was able to prior to the C2D's in notebooks. I'm pretty sure in desktop was not that much different. Cheers!


    My GMA 950 IGP of my 2GHz Pentium-Dual Core computer (on-board IGP) from 2007 or so would disagree with you. It handles 720p excellently and 1080p well and even my Pentium 4 630 from my 2004 desktop can handle 1080p excellently once I gave it a Radeon 5450. Its CPU is only a 3GHz P4. My old Dell 2.4GHz P4 laptop with an Intel IGP (I'd have to check to make sure which one it is) can't handle 720p very well, but the CPU has no trouble with it, just the GPU. Heck, my Atom netbook (1.6GHz single core from around two years ago, I'd have to check the model to be sure of its GPU and CPU model number) can play 480p just fine and 720p/1080p also don't tax the CPU much, just the GPU.

    My whole point is that weak CPUs have no trouble with video, only weak GPUs have trouble with video. You'd have to find an extremely slow CPU to be unable to watch video on it so long as the rest of the computer, such as the graphics, are good enough. Even low-end GPUs like my GMA 950 can handle video playback decently, so having a GPU should not be much of a problem except with extremely weak systems such as some Intel netbooks or a very old notebook/desktop without a decent video card.
  • wiyosaya
    bgaimur: http://www.streamcomputing.eu/blog [...] nceptions/ CUDA is a dying breed.

    Maybe so, however, nVidia is supporting OpenCL with 301.42 drivers. IMHO, having nVidia cards benchmarked would be of interest to those of us who own nVidia cards.
  • nebun
    bgaimur: http://www.streamcomputing.eu/blog [...] nceptions/ CUDA is a dying breed.

    that's why there are more CUDA apps out there....you are very wrong my friend....CUDA is and will be the better engine
  • blazorthon
    nebun: that's why there are more CUDA apps out there....you are very wrong my friend....CUDA is and will be the better engine


    That's why CUDA is being replaced with OpenCL and such more and more, isn't it? Whether or not it is better isn't even what bgaimur said, only that OpenCL and such are taking over CUDA's market. The industry can't have only one company's graphics cards compatible with their software, especially with how AMD offers the Radeon 7950 and 7970, consumer cards that rival and even exceed most Quadro cards, at much lower prices than Nvidia has been asking for their Quadros. Heck, 7900 can probably beat some even more expensive Tesla cards.

    Many of the modern CUDA apps are moving towards OpenCL, even if not restricted to OpenCL, to at least being capable of being completely run on OpenCL. Besides, do you have proof for your claims about CUDA's superiority? CUDA's main advantage, as I recall, was that it was easier to use than largely undocumented or poorly documented OpenCL. That's been improving (among other improvements) and so has the incentive to use a language that is compatible with more than just Nvidia GPUs.
  • teddymines
    Rather than pack all this power into a single machine, why not upload work units to the cloud, and let several hundred "idle" computers do the work? I'd like to acquire points for sharing my CPU, and then sell those points to others for cash. I'm sure a lot of people in the video and photo business would pay to have access to banks of computers for their rendering.
  • A Bad Day
    teddymines: Rather than pack all this power into a single machine, why not upload work units to the cloud, and let several hundred "idle" computers do the work? I'd like to acquire points for sharing my CPU, and then sell those points to others for cash. I'm sure a lot of people in the video and photo business would pay to have access to banks of computers for their rendering.


    Easier said than done. It would require extensive campaigning to let the general public know, and they're still going to be very skeptical of participating in the cloud program.
  • assafbt
    De5_roy: can you test like these combos: core i5 + 7970, core i5 hd4000, trinity + 7970, trinity apu, core i7 + 7970, and core i7 hd 4000, and compare against fx8150 (or piledriver) + 7970. it seemed to me as if the apu bottlenecks the 7970 and the 7970 could work better with an intel i5/i7 cpu on the graphical processing workloads.


    I second that, especially since the 4000 is OpenCL compliant, and since you did encounter CPU bottlenecks in this review's benchmarks.

    Thanks