Can OpenGL And OpenCL Overhaul Your Photo Editing Experience?

Q&A: Under The Hood With Adobe, Cont.

Tom's Hardware: Within Photoshop, what limits exist in terms of what you can do with these APIs?

Russell Williams: With some things, we can look at them and know they're not suited for OCL or the GPU in general. In other cases, it's only after we expend some effort implementing them that we discover we're not going to get a speed-up that justifies the work. It's well known that the GPU is completely suited to certain kinds of things and completely unsuited to others. I believe it was AMD that told us that while a GPU can speed up the things it is suited to by several hundred times compared to the CPU, a problem that is ill-suited to GPUs, something inherently sequential, might end up 10 times slower.

Some people think, “If the GPU is so much faster, then why not do everything on the GPU?” But the GPU is only suited to certain operations. And for every operation you want to run on it, you have to re-implement it. For instance, we accelerated the Liquify filter with OGL, not OCL, and that makes a tremendous difference. For large brushes, it goes from 1 to 2 FPS to being completely fluid, responsive, and tracking with your pen. That kind of responsiveness while modifying that much data could only be done on a GPU. But it took one engineer most of the CS6 product development cycle to re-implement the whole thing.

Tom's Hardware: Which gets us back to why only one feature was implemented in OCL this time around. You don't have an infinite number of developers and only one year between versions.

Russell Williams: That's right. And, of course, we have an even more limited supply of developers who already know OCL.

Tom's Hardware: Did graphics vendors play a role in your OpenCL adoption? Education, tools, and so forth?

Russell Williams: We didn't have much input on creating the tools they had, but they gave us a tremendous amount of help both in learning OpenCL and in using those tools. Both Nvidia and AMD gave us support in prototyping algorithms, because it's in both of their interests to drive more use of the GPU. For us, the big issue is where the performance is. We can't count on a particular level of GPU being in a system. Many systems have Intel integrated graphics, which has more limited GPU and OpenGL support, and no OpenCL support. A traditional C-based implementation has to be there, and it's the only thing we can count on being there. On the other hand, if something is performance-critical, the GPU is really where most of the compute power in the box is.
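That fallback story is easy to see in code. Below is a minimal, hypothetical sketch (not Adobe's implementation) of how an application can probe for an OpenCL-capable GPU and drop back to its always-available C path when none is found; the run_filter_* helpers are invented placeholders.

```c
#include <stdio.h>
#include <CL/cl.h>

/* Hypothetical stand-ins for the two implementations of one filter. */
static void run_filter_on_cpu(void)           { printf("plain C path\n"); }
static void run_filter_on_gpu(cl_device_id d) { (void)d; printf("OpenCL path\n"); }

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint platform_count = 0;

    /* No OpenCL runtime installed at all: use the C implementation. */
    if (clGetPlatformIDs(8, platforms, &platform_count) != CL_SUCCESS ||
        platform_count == 0) {
        run_filter_on_cpu();
        return 0;
    }

    /* Look for a GPU device on any platform. */
    for (cl_uint i = 0; i < platform_count; ++i) {
        cl_device_id device;
        cl_uint device_count = 0;
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 1,
                           &device, &device_count) == CL_SUCCESS &&
            device_count > 0) {
            run_filter_on_gpu(device);   /* optional accelerated path */
            return 0;
        }
    }

    run_filter_on_cpu();   /* the only path we can count on being there */
    return 0;
}
```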

Beyond that, AMD had their QA/engineering teams constantly available to us. We had weekly calls, access to hardware for testing, and so on. Nvidia and Intel helped, too, but AMD definitely stepped up.

Tom's Hardware: So which company has the better products, AMD or Nvidia? [laughs]

Russell Williams: You're Tom's Hardware. You know that depends on what you're running and which week you ask the question—and how much money you have in your pocket. That's why it is so critical for us to support both vendors. Well, three if you include Intel integrated graphics, which is starting to become viable.

Tom's Hardware: At some point, performance bottlenecks are inevitable. But how far out do you look when trying to avoid them? Do you say, “Well, we’re already getting five times better performance—that’s good enough!” Or do you push as far as possible until you hit a wall?

Russell Williams: We do think about that, but it’s very hard. It’s impossible to know quantitatively, ahead of time, what those bottlenecks will be. We know qualitatively that we should spend a lot of time on bandwidth issues. Photoshop is munging pixels, so the number of times pixels have to be moved from here to there is a huge issue, and we pay attention to every place in the processing pipeline where that happens. Quite often, that’s more of a limiting factor than the computation being done on the pixels. In particular, when you have discrete graphics, there is an expensive step of moving the pixels to the graphics card and back.
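A toy example makes the point about moving pixels. In this hypothetical sketch, the two-pass version reads and writes every pixel twice, while the fused version makes one trip through memory for the same arithmetic; the operations themselves are arbitrary placeholders, not Photoshop code.

```c
#include <stddef.h>
#include <stdint.h>

/* Two separate passes: every pixel is read and written twice. */
void adjust_then_invert(uint8_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; ++i)          /* pass 1 over all pixels */
        pixels[i] = (uint8_t)(pixels[i] * 3 / 4 + 32);
    for (size_t i = 0; i < count; ++i)          /* pass 2 over all pixels */
        pixels[i] = (uint8_t)(255 - pixels[i]);
}

/* Fused pass: same math, but only one trip through memory. */
void adjust_and_invert(uint8_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        pixels[i] = (uint8_t)(255 - (pixels[i] * 3 / 4 + 32));
}
```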

Tom's Hardware: So the bus is usually your bottleneck?

Russell Williams: Yes, the PCI bus. I really expect that in the future, APUs will require us to rethink which features can be accelerated, particularly once APUs start using what some people call zero-copy. When you have an APU, you don't have to go across that long-latency PCI bus to get to the GPU. Right now, you still have to go through the driver, and it still copies from one place in main memory to another: from the space reserved for the CPU to the space reserved for the GPU. They're working on eliminating that step. And as they make that path more efficient, it becomes more and more profitable to do smaller and smaller operations on the APU, because the overhead on each one is smaller.
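The copy he describes maps directly onto how an OpenCL buffer gets created. The sketch below contrasts the two strategies using standard OpenCL calls; the function names and flag choices are illustrative assumptions, and the context and queue are presumed to be set up elsewhere.

```c
#include <stddef.h>
#include <CL/cl.h>

/* Discrete-GPU style: allocate device memory, then explicitly push the
 * pixels across the bus with clEnqueueWriteBuffer (and later copy back). */
cl_mem upload_with_copy(cl_context ctx, cl_command_queue queue,
                        void *host_pixels, size_t bytes)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, host_pixels,
                         0, NULL, NULL);
    return buf;
}

/* Zero-copy style: hand the runtime the existing host allocation
 * (CL_MEM_USE_HOST_PTR). On shared-memory APUs the GPU can read it in
 * place, so the extra staging copy the driver makes today can go away. */
cl_mem wrap_host_memory(cl_context ctx, void *host_pixels, size_t bytes)
{
    cl_int err;
    return clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                          bytes, host_pixels, &err);
}
```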

On the other hand, APUs are not as fast as discrete GPUs. In some ways, it comes down to on-card memory bandwidth versus main memory bandwidth. But you can also think of it in terms of power budgets, with discrete cards sucking down several hundred watts by themselves. You have to keep copying large things across this small pipe to this hairdryer of a compute device. It just depends. There has to be enough computation involved to pay for copying the data out across the PCI bus and bringing it back.
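That "pay for the copy" test is really just arithmetic. Here is a hypothetical back-of-envelope version; the bus bandwidth, speed-up factor, and image size are assumed example numbers, not measurements from Adobe or from this article.

```c
#include <stdio.h>

/* Rough estimate: is offloading worth it once the round-trip copy over
 * the bus is included? All constants are illustrative assumptions. */
int worth_offloading(double image_bytes, double cpu_seconds)
{
    const double bus_bytes_per_sec = 8e9;   /* assume ~8 GB/s effective PCIe */
    const double gpu_speedup       = 20.0;  /* assume a 20x kernel speed-up  */

    double transfer = 2.0 * image_bytes / bus_bytes_per_sec;  /* out and back */
    double gpu_time = cpu_seconds / gpu_speedup + transfer;

    return gpu_time < cpu_seconds;  /* offload only if the copy is paid for */
}

int main(void)
{
    /* e.g. a 100-megapixel, 16-byte-per-pixel image and a 0.05 s CPU filter:
     * the 0.4 s round trip dwarfs the compute saving, so it stays on the CPU. */
    printf("%s\n", worth_offloading(100e6 * 16.0, 0.05)
                       ? "offload to GPU" : "stay on CPU");
    return 0;
}
```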

  • ilysaml
Now Adobe uses both CUDA and OpenCL; that's superb.
  • alphaalphaalpha1
    Tahiti is pretty darned fast for compute, especially for the price of the 7900 cards, and if too many applications get proper OpenCL support, then Nvidia might be left behind for a lot of professional GPGPU work if they don't offer similar performance at a similar price point or some other incentive.

    With the 7970 meeting or beating much of the far more expensive Quadro line, Nvidia will have to step up. Maybe a GK114 or a cut-down GK110 will be put into use to counter 7900. I've already seen several forum threads talking about the 7970 being incredible in Maya and some other programs, but since I'm not a GPGPU compute expert, I guess I'm not in the best position to consider this topic on a very advanced level. Would anyone care to comment (or correct me if I made a mistake) about this?
  • A Bad Day
    How many CPUs would it take to match the tested GPUs?
  • blazorthon
A Bad Day: How many CPUs would it take to match the tested GPUs?
    That would depend on the CPU.
  • esrever
    Would be interesting to compare the i7 ivybridge against trinity in openCL
  • mayankleoboy1
    why no nvidia cards here?
  • mayankleoboy1
    any CUDA vs OpenCL benchmarks?
  • de5_Roy
    can you test like these combos:
    core i5 + 7970
    core i5 hd4000
    trinity + 7970
    trinity apu
    core i7 + 7970
    and core i7 hd 4000, and compare against fx8150 (or piledriver) + 7970.
    it seemed to me as if the apu bottlenecks the 7970 and the 7970 could work better with an intel i5/i7 cpu on the graphical processing workloads.
  • vitornob
    Nvidia cards test please. People needs to know if it's better/faster to go OpenCL or CUDA.
  • bgaimur
vitornob: Nvidia cards test please. People needs to know if it's better/faster to go OpenCL or CUDA.
    http://www.streamcomputing.eu/blog/2011-06-22/opencl-vs-cuda-misconceptions/

    CUDA is a dying breed.