GPU Performance: More is Better
Apple's A5 is effectively a three-part upgrade over the A4. Beyond its processor and memory changes, the iPad 2 now sports Imagination Technologies’ dual-core PowerVR SGX 543MP2. In comparison, the original iPad employs a single-core PowerVR SGX 535, the same GPU Intel used in its GMA 500, which is built into the Poulsbo series of System Controller Hubs for Atom.
|GPU System-on-Chip||PowerVR SGX 535 (Apple A4)||PowerVR SGX 543 (Apple A5)|
|Bus Width (in bits)||64||64|
|Triangle rate @ 200 MHz||14 MTriangles/s||35 MTriangles/s|
The SGX 543 includes four USSE2 (Universal Scalable Shader Engine 2.0) pipes, while the SGX 535 only has two USSE pipes. This unified shader design is similar to what we've seen from competing graphics vendors for years, as it allows vertex and pixel shader code to share the same hardware. The idea is to keep the hardware busy and performance high, even when a workload leans heavily on one type of shader. It’s not clear what makes these second-generation shaders aside from their name, but Imagination Technologies states that the pipeline effectively delivers “twice the peak floating point and instruction throughput of the Series5 USSE.”
This isn't just about a revised SIMD architecture though. Apple also doubles the number of rendering pipelines again by placing two SGX 543 GPU cores on the A5 (the MP2 in its name represents that pair of GPU cores). This helps account for the iPad 2’s quadrupling of available GPU resources.
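As a rough illustration of where the "quadrupling" comes from, the pipe counts quoted above can be multiplied out. This is only a back-of-the-envelope sketch: it counts shader pipes, and it deliberately ignores clock rate (which is unknown) and any per-pipe throughput difference between USSE and USSE2. The constant and function names are made up for this example.

```python
# Shader-pipe counts as quoted in the text; names are illustrative only.
SGX535_PIPES_PER_CORE = 2  # USSE pipes in the original iPad's SGX 535
SGX543_PIPES_PER_CORE = 4  # USSE2 pipes in each SGX 543 core
IPAD_GPU_CORES = 1
IPAD2_GPU_CORES = 2        # the "MP2" suffix


def total_pipes(pipes_per_core: int, cores: int) -> int:
    """Total shader pipes across all GPU cores."""
    return pipes_per_core * cores


ipad_pipes = total_pipes(SGX535_PIPES_PER_CORE, IPAD_GPU_CORES)
ipad2_pipes = total_pipes(SGX543_PIPES_PER_CORE, IPAD2_GPU_CORES)

# Ratio of pipe counts only; not a performance prediction.
pipe_ratio = ipad2_pipes / ipad_pipes
print(f"iPad: {ipad_pipes} pipes, iPad 2: {ipad2_pipes} pipes, ratio: {pipe_ratio:.0f}x")
```

Doubling the pipes per core and then doubling the cores gives four times the pipe count, which is the resource figure the text refers to. Measured performance can land above or below that, depending on clocks and memory.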
The GPU clock rate remains an unknown, so benchmarks remain the best way to determine effective performance. Unfortunately, it's difficult to measure real-world graphics capabilities in a meaningful way. There's no real equivalent to Fraps, and we're still a ways away from game developers including frame rate counters in their code. That’s why we're turning to GLBenchmark 2.0, a synthetic OpenGL ES 2.0 metric that emphasizes texture performance. Think of it as the 3DMark of mobile devices.
|GLBenchmark 2.0||Apple iPad||Apple iPad 2||Motorola Xoom|
|Triangle Test (textured), Mtriangles/s||8.6||29.0||15.3|
|Triangle Test (textured, fragment lit), Mtriangles/s||4.2||19.9||8.6|
As on a desktop, graphics rendering on a tablet begins with an application sending the GPU an array of vertices, vertex shaders, fragment shaders, and a bunch of other control information. The sum of this information is used to draw millions (or billions) of triangles that are assembled into larger 3D objects. It's important to know the number of triangles that a GPU is capable of rendering because more triangles translate into greater graphics detail. GLBenchmark offers a glimpse into real-world triangle performance because it measures the triangle rate for an actual gaming scene. The results aren't much of a surprise. At 29.0 Mtriangles/s, the second-generation iPad delivers more than 3x the performance of its predecessor. This means that game developers can conceivably increase geometric detail threefold on the iPad 2 and still get the same performance they see on the original iPad.
The fragment lit test taxes texturing performance, with an additional focus on lighting. Thus, it's a more stressful benchmark. As the workload becomes more complex, the iPad 2 demonstrates its improved handling of more detailed graphics. It actually delivers nearly five times the performance of the original iPad.
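The multipliers quoted for the two triangle tests follow directly from the table. A minimal sketch of that arithmetic, using only the Mtriangles/s figures listed above (the dictionary layout and the `speedup` helper are just conveniences for this example):

```python
# Mtriangles/s results copied from the GLBenchmark 2.0 triangle-test table.
triangle_rates = {
    "textured": {"iPad": 8.6, "iPad 2": 29.0, "Xoom": 15.3},
    "textured, fragment lit": {"iPad": 4.2, "iPad 2": 19.9, "Xoom": 8.6},
}


def speedup(results: dict, device: str, baseline: str = "iPad") -> float:
    """How many times faster `device` is than `baseline` (higher is better)."""
    return results[device] / results[baseline]


for test_name, results in triangle_rates.items():
    for device in ("iPad 2", "Xoom"):
        print(f"{test_name}: {device} is {speedup(results, device):.1f}x the original iPad")
```

Expressing each result relative to the original iPad makes it easier to see that the iPad 2's lead grows in the fragment lit test.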
|GLBenchmark 2.0: Frames per set duration||Apple iPad||Apple iPad 2||Motorola Xoom|
|Egypt frames (frames)||575||5075||1371|
The performance of an actual graphics scene is easier to understand. When you look at this in terms of frames rendered in a set period of time, you're getting a lot more performance with the iPad 2. Conservatively, you're looking at at least 3x more frames rendered according to the Pro test, and more than 8x according to the Egypt test.
Comparing GPU Performance: Words of Caution
If you really want to go to the trouble of researching tablet-based graphics performance (there may be a few of you out there), bear in mind that potential won't always match up to the numbers you see in the real world. The form factor's constraints prevent vendors from pairing graphics hardware with the memory that'd best demonstrate its peak specifications, for example. Instead, you end up with the configuration that hits the performance profile needed, and nothing more.
On the desktop, a graphics card manufacturer has the freedom to balance performance between a GPU and its memory subsystem, altering data rate, memory bus width, and capacity to best exploit the processor's capabilities, whether they're bleeding-edge or decidedly mainstream. When you're dealing with smartphones and tablets, that's no longer the case. In order to cut back on power, minimize heat, or avoid monopolizing too much space on the PCB, engineers might tolerate a memory bottleneck in order to achieve other design goals. So, forget comparing individual and theoretical pieces of the graphics puzzle. Rather, focus on the end product's measurable performance.
|GLBenchmark 2.0||Apple iPad||Apple iPad 2||Motorola Xoom|
|Egypt with FSAA (frames)||436||5057||-|
|Pro with FSAA (frames)||672||2851||-|
|Egypt with FSAA Fixed Time (sec)||825.6||65.0||-|
|Pro with FSAA Fixed Time (sec)||123.3||22.6||-|
|Swap Buffer Test (frames)||600||599||603|
|Fill Test (texture fetch) ktexel/s||170980||918551||129897|
|Trigonometric Test (vertex weighted) kvertex/s||1039||3326||2632|
|Trigonometric Test (fragment weighted) kfragment/s||1191||3512||4452|
|Trigonometric Test (balanced) kshader/s||1259||3158||2543|
|Exponential Test (vertex weighted) kvertex/s||3130||3535||2628|
|Exponential Test (fragment weighted) kfragment/s||3774||11165||3003|
|Exponential Test (balanced) kshader/s||2043||11735||1656|
|Common Test (vertex weighted) kvertex/s||1524||3727||1973|
|Common Test (fragment weighted) kfragment/s||1634||3699||4451|
|Common Test (balanced) kshader/s||1065||4114||2530|
|Geometric Test (vertex weighted) kvertex/s||1949||3776||1316|
|Geometric Test (fragment weighted) kfragment/s||2081||6388||2888|
|Geometric Test (balanced) kshader/s||1281||6181||1628|
|For Loop Test (vertex weighted) kvertex/s||1671||3860||1315|
|For Loop Test (fragment weighted) kfragment/s||1842||6237||7271|
|For Loop Test (balanced) kshader/s||1275||3718||3583|
|Branching Test (vertex weighted) kvertex/s||3906||3778||2633|
|Branching Test (fragment weighted) kfragment/s||6045||22557||3211|
|Branching Test (balanced) kshader/s||2106||11193||1493|
|Array Test (uniform array access) kvertex/s||2918||3658||3946|
|Triangle Test (white) ktriangle/s||9548||29957||12595|
|Triangle Test (textured, vertex lit) ktriangle/s||7058||21129||10520|
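For anyone who wants to work through the full table rather than individual rows, a short script can turn each row into an iPad 2 versus iPad ratio. This is a sketch under a few assumptions: rows follow the `|label||iPad||iPad 2||Xoom|` layout used above, a lone `-` means no result was recorded, and rows reported in seconds are lower-is-better while everything else is higher-is-better. The function names are invented for this example.

```python
from typing import List, Optional, Tuple


def parse_row(line: str) -> Tuple[str, List[Optional[float]]]:
    """Split one '|label||a||b||c|' row into a label and numeric cells."""
    cells = line.strip().strip("|").split("||")
    label, raw_values = cells[0], cells[1:]
    # A lone "-" marks a missing result in the table.
    values = [None if v.strip() == "-" else float(v) for v in raw_values]
    return label, values


def ipad2_speedup(line: str) -> float:
    """Return the iPad 2's advantage over the original iPad for one row."""
    label, (ipad, ipad2, _xoom) = parse_row(line)
    # Assumption: rows measured in seconds are lower-is-better.
    if "(sec)" in label:
        return ipad / ipad2
    return ipad2 / ipad


# A few rows copied verbatim from the table above.
rows = [
    "|Egypt with FSAA Fixed Time (sec)||825.6||65.0||-|",
    "|Fill Test (texture fetch) ktexel/s||170980||918551||129897|",
    "|Branching Test (vertex weighted) kvertex/s||3906||3778||2633|",
]

for row in rows:
    label, _ = parse_row(row)
    print(f"{label}: {ipad2_speedup(row):.2f}x")
```

Ratios like these show how uneven the gains are: some tests improve by an order of magnitude, while others barely move or even regress slightly, which is another reason to weigh the end product's measured results over any single theoretical figure.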