AMD APU13: AMD Talks DirectCompute in Gaming

Why do game programmers need Microsoft's DirectCompute? That was one of many topics covered during AMD's developer summit last week, where the company made the case for the API, which supports general-purpose computing on GPUs in Windows Vista, Windows 7 and Windows 8. DirectCompute was released as part of DirectX 11, but it also works on DirectX 10-class GPUs.

"If you have things that can run on a GPU – you have a really powerful GPU that's massively parallel, and you're bottlenecked on the CPU, a lot of times you want to move things over to the GPU from the CPU for better performance," explained AMD's Bill Bilodeau. "And there are some algorithms that are just made more for a GPU: they're data parallel, they can take advantage of all the parallelism on the GPU. Things like post-processing techniques, in particular, where you can do the same thing for every pixel. Also things like physics are really well adapted to compute."

One example is AMD's TressFX, which uses DirectCompute to unlock "the massively-parallel processing capabilities of the Graphics Core Next architecture." Building on Order Independent Transparency (OIT), TressFX "makes use of Per-Pixel Linked-List (PPLL) data structures to manage rendering complexity and memory usage." The end result is in-game hair of a quality previously seen only in pre-rendered images.
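
To make the per-pixel linked-list idea more concrete, here is a minimal CPU-side sketch in Python. It is not AMD's TressFX code: the names `PixelLists`, `add_fragment` and `resolve` are hypothetical, color is reduced to a single grey channel, and a plain Python list stands in for the GPU node buffer (where a real compute or pixel shader would claim node slots with an atomic counter). It only illustrates why the structure gives order-independent results: fragments can arrive in any order, and each pixel sorts and blends its own list afterwards.

```python
from dataclasses import dataclass


@dataclass
class Node:
    depth: float   # distance from the camera
    color: float   # single grey channel, for brevity (assumption)
    alpha: float   # opacity in [0, 1]
    next: int      # index of the next node for this pixel, -1 ends the list


class PixelLists:
    """Hypothetical CPU stand-in for a per-pixel linked-list buffer."""

    def __init__(self, width, height):
        self.width = width
        self.heads = [-1] * (width * height)  # one head pointer per pixel
        self.nodes = []                       # shared node pool

    def add_fragment(self, x, y, depth, color, alpha):
        """Push a transparent fragment onto the front of its pixel's list."""
        pixel = y * self.width + x
        self.nodes.append(Node(depth, color, alpha, self.heads[pixel]))
        self.heads[pixel] = len(self.nodes) - 1

    def resolve(self, x, y, background=0.0):
        """Walk one pixel's list, sort by depth, and blend back to front."""
        frags = []
        i = self.heads[y * self.width + x]
        while i != -1:
            frags.append(self.nodes[i])
            i = self.nodes[i].next
        frags.sort(key=lambda n: n.depth, reverse=True)  # farthest first
        result = background
        for n in frags:
            result = n.alpha * n.color + (1.0 - n.alpha) * result
        return result
```

Because the sort happens per pixel at resolve time, submitting the same hair fragments in a different order yields the same blended value, which is the property OIT is after.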

"Besides looking good in terms of the rendering, it also reacts really well too," Bilodeau said during his presentation, "because with every individual hair strand, the physics is simulated and that's what really gives it its natural look when it's moving. It also supports things like wind and other forces, collision with the body, and it is artist friendly. There's a lot of little tweaks you can do to it, modern constraints to allow for different effects like water, being wet. A great part of the entire simulation is done on a GPU with compute shaders so there's no going back and forth between the CPU and GPU."

Back in February, AMD updated its blog with news that TressFX treats each strand of hair as a chain with dozens of links, allowing forces like gravity, wind and movement of the head to move and curl Lara's hair in a realistic fashion in Tomb Raider, which was released earlier this year. The blog also said that collision detection is performed to ensure that strands do not pass through one another, or through other solid surfaces such as Lara's head, clothing and body. Hair styles, according to the blog, are simulated by gradually pulling the strands back towards their original shape after they have moved in response to an external force.

Bilodeau explained that the raw vertex data starts in CPU memory, is copied over to the GPU and is then stored in an unordered access view (UAV), so all of that data is available on the GPU when the physics simulation runs. The first step is integration (calculating movement and the reaction to gravity) together with global shape constraints (maintaining the hairstyle). Next come local shape constraints, which calculate finer aspects such as the curliness or straightness of the hair, followed by length constraints, which keep the strands from lengthening or shortening improperly; the force of wind is also calculated in this step. The final step is collision.
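
The order of those stages can be sketched in a few lines of Python for a single 2D strand. This is an illustration under stated assumptions, not TressFX's shader code: the function name `simulate_step`, the stiffness and damping parameters, the use of Verlet integration, and the circular collider are all hypothetical choices made for the example. On a GPU, each strand or vertex would be handled by its own compute-shader thread rather than a Python loop.

```python
import math

GRAVITY = (0.0, -9.8)  # assumed units: metres per second squared


def _lerp(a, b, t):
    """Move point a a fraction t of the way towards point b."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def simulate_step(pos, prev, rest, dt=1 / 60, wind=(0.0, 0.0),
                  k_global=0.05, k_local=0.2, damping=0.98,
                  collider=((0.0, -100.0), 1.0)):
    """Advance one strand by one frame; vertex 0 is the pinned root."""
    n = len(pos)
    new = [pos[0]]
    # 1a. Integration: Verlet step with gravity (assumed integrator).
    for i in range(1, n):
        vx = (pos[i][0] - prev[i][0]) * damping
        vy = (pos[i][1] - prev[i][1]) * damping
        new.append((pos[i][0] + vx + GRAVITY[0] * dt * dt,
                    pos[i][1] + vy + GRAVITY[1] * dt * dt))
    # 1b. Global shape constraint: pull towards the rest pose (hairstyle).
    for i in range(1, n):
        new[i] = _lerp(new[i], rest[i], k_global)
    # 2. Local shape constraint: keep each segment's rest direction.
    for i in range(1, n):
        target = (new[i - 1][0] + rest[i][0] - rest[i - 1][0],
                  new[i - 1][1] + rest[i][1] - rest[i - 1][1])
        new[i] = _lerp(new[i], target, k_local)
    # 3. Wind, then length constraint: restore each segment's rest length.
    for i in range(1, n):
        p = (new[i][0] + wind[0] * dt * dt, new[i][1] + wind[1] * dt * dt)
        rest_len = math.dist(rest[i], rest[i - 1])
        scale = rest_len / max(math.dist(p, new[i - 1]), 1e-9)
        new[i] = (new[i - 1][0] + (p[0] - new[i - 1][0]) * scale,
                  new[i - 1][1] + (p[1] - new[i - 1][1]) * scale)
    # 4. Collision: push vertices out of a circle standing in for the head.
    (cx, cy), r = collider
    for i in range(1, n):
        d = math.dist(new[i], (cx, cy))
        if d < r:
            if d < 1e-9:
                new[i] = (cx, cy + r)
            else:
                new[i] = (cx + (new[i][0] - cx) * r / d,
                          cy + (new[i][1] - cy) * r / d)
    return new, pos  # new positions, and the old ones as next frame's prev
```

A caller would keep `pos` and `prev` between frames and call `pos, prev = simulate_step(pos, prev, rest)` once per frame. Note that running collision last, as described above, can leave a segment slightly off its rest length until the next frame corrects it.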

"All this stuff gets calculated and put back into the UAV," he said. "And then the next stage is rendering, so for our hair rendering, we just get it from that same place. So the great thing is this vertex data only needs to be copied over once the entire life of the program. So that slow connection between the CPU and GPU going over the PCI bus that you usually associate with physics is not a problem anymore because all of this is moved over to the GPU. And then we can use that data later for rendering so it stays on the GPU. This isn't done once per frame so it's really one of the good ways of doing physics.”

In a nutshell, DirectCompute gives developers access to the massive parallelism of high-performance GPUs. Results computed with DirectCompute can later be used in the graphics pipeline. DirectCompute also has good interoperability, he said. Existing algorithms that can be used in games, such as hair physics, are available directly from AMD.