More Inklings: Video Transcoding
Let’s be honest—smooth video playback probably isn’t what springs to mind when you picture the melding of CPU and GPU resources onto a single die. You, like us, are looking for a more compelling application able to show off the strengths of both previously-separate worlds working together on a parallelized task. Video transcoding would be the perfect fit.
Unfortunately, Intel took a lot of the wind out of AMD’s sails with its introduction of Quick Sync, explicitly designed to accelerate decode and encode. Found on ultra-low voltage Sandy Bridge processors that dip down to 17 W, this functionality doesn’t come cheap: the least-expensive SKU (Intel’s Core i5-2537M) costs $250 for the processor alone. But its performance in this sort of workload is compelling. In contrast, Brazos-based platforms should sell for less than $500 for the entire netbook/nettop, and motherboards with the E-350 soldered on should be available for less than $100. Consequently, comparing Zacate to Sandy Bridge really isn’t fair.
The compromise you make in stepping toward a more budget-friendly design is less aggressive transcode acceleration. The transcode pipeline involves reading a file in, decoding it, encoding it, and outputting it. AMD’s Zacate APU is able to accelerate the decode stage, naturally. From there, it’s able to take advantage of the fact that the GPU and CPU are on the same die to speed up the process of copying data from graphics memory to the processor. AMD markets this capability as Fast Copy Optimization.
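To make the four stages concrete, here is a minimal sketch of a transcode pipeline in Python. It is purely illustrative: the function names (read_packets, decode, encode, write_out, transcode) are our own, and the "codec" is a stand-in that turns bytes into text and back, not anything a real transcoding app or AMD's driver does. The point is only the shape of the flow: read, decode, encode, output.

```python
from typing import Iterable, Iterator, List


def read_packets(source: Iterable[bytes]) -> Iterator[bytes]:
    """Stage 1: read compressed packets from the input (hypothetical source)."""
    for packet in source:
        yield packet


def decode(packets: Iterable[bytes]) -> Iterator[str]:
    """Stage 2: decode each packet into a 'frame' (stand-in: bytes -> text)."""
    for packet in packets:
        yield packet.decode("ascii")


def encode(frames: Iterable[str]) -> Iterator[bytes]:
    """Stage 3: encode frames into the target format (stand-in: uppercase)."""
    for frame in frames:
        yield frame.upper().encode("ascii")


def write_out(packets: Iterable[bytes], sink: List[bytes]) -> int:
    """Stage 4: write encoded packets to the output and count them."""
    count = 0
    for packet in packets:
        sink.append(packet)
        count += 1
    return count


def transcode(source: Iterable[bytes], sink: List[bytes]) -> int:
    """Chain the four stages; frames stream through one at a time."""
    return write_out(encode(decode(read_packets(source))), sink)
```

In this picture, Zacate's hardware assist applies to the decode stage and to the hand-off between decode and encode, while the encode stage stays on the CPU cores.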
The way it works is simple. Previously, transcoding apps used CPU instructions to copy decoded video data from a graphics card on one end of the PCI Express bus to the processor, where post-processing and encoding took place. This interaction between dissimilar memory spaces burned CPU cycles. On a modern desktop processor, that probably wasn’t a debilitating bottleneck. But in a more mobile implementation, burnt cycles not only hold back performance more noticeably, but also have an adverse impact on power consumption. Fast Copy facilitates DMA to copy the same data without using CPU cycles, freeing the two Bobcat cores to work on the encode.
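A back-of-the-envelope model shows why taking the copy off the CPU matters. The sketch below is an assumption-laden illustration, not a measurement: the function name encode_fps and all of the per-frame costs in the usage note are hypothetical placeholders we picked to make the arithmetic easy to follow.

```python
def encode_fps(cpu_budget_ms: float,
               copy_ms_per_frame: float,
               encode_ms_per_frame: float,
               copy_on_cpu: bool) -> float:
    """Frames per second the CPU can sustain under a simple cost model.

    cpu_budget_ms: CPU time available per wall-clock second, summed over cores.
    copy_on_cpu:   True if the CPU performs the copy itself, False if a DMA
                   engine moves the data and the CPU only pays for the encode.
    """
    per_frame_ms = encode_ms_per_frame
    if copy_on_cpu:
        per_frame_ms += copy_ms_per_frame
    return cpu_budget_ms / per_frame_ms
```

With made-up costs of 40 ms to encode and 10 ms to copy each frame, and two cores offering 2000 ms of CPU time per second, encode_fps(2000.0, 10.0, 40.0, True) works out to 40 frames per second, while the DMA case, encode_fps(2000.0, 10.0, 40.0, False), works out to 50. The real gain depends entirely on the actual ratio of copy time to encode time, which we did not measure here.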
Wait—encode is happening on the processor? We have 80 stream processors in a pair of SIMD engines—why not offload to those in much the same way that Intel involved its EUs in encode acceleration on Sandy Bridge? AMD does have encode acceleration available on its discrete graphics products. But the two SIMD engines on Zacate simply aren’t powerful enough to demonstrate an appreciable benefit. This functionality will be available through the Sabine platform’s Llano APU, so we’ll have to wait until later this year to see how well it works.
In the meantime, one of CyberLink’s competitors, ArcSoft, is working on its own OpenCL-based encoder that may or may not change the Brazos performance story in the near term. CyberLink is going the OpenCL route later this year as well. But again, both companies are more likely focused on Llano, which has the GPU muscle to make encoding worth offloading to graphics.
Chris ... did you manage to overclock it at all?
Give it your best shot ... call crashman in with the liquid nitrogen if you need to mate !!
Really impressive stats for such a small piece of silicon.
Didn't get a chance to mess with overclocking. If this is something you guys want to see, I might try to push it a little harder over the weekend.
Yeah that would be much appreciated, these little chips are so much faster than Atom, let's see if you can get them to perform similarly to a Dual-Core CPU at 1.8GHz
also what happens with the integrated graphics core when you plug in a discrete GPU ? you gave so much detail about this in the sandy bridge review but totally skip it for Fusion ...
the board got me interested. I am trying to buy a small "workstation terminal" ... something to code OpenGL/OpenCL on a budget. Seems this is what I am looking for.