GPU And Gaming Performance
Apple’s A9 SoC includes an Imagination Technologies PowerVR GT7600 GPU, which has six cores arranged in three pairs around shared cache and logic. While not a significant departure from the Series6XT Rogue architecture used in last year’s iPhone 6, Series7XT does include a number of small tweaks that improve performance and reduce power consumption.
One of the changes is native support for FP16 (16-bit floating-point) operations in the Special Function Units (SFU), which were FP32 only in Series6 GPUs. Native FP16 support gives developers the option to save power when the extra precision of FP32 is not required. Additionally, SFU and ALU operations can now be co-issued, increasing instruction throughput in some situations.
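To make the precision trade-off concrete, here is a minimal Python sketch (ours, not taken from Imagination or Apple) that rounds a value to IEEE 754 half and single precision using the standard library's `struct` module. The helper names `to_fp16` and `to_fp32` are illustrative only.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # Compare how much each format distorts the same input values.
    for value in (0.1, 2049.0):
        fp16, fp32 = to_fp16(value), to_fp32(value)
        print(f"{value!r}: fp16={fp16!r} (err {abs(fp16 - value):.2e}), "
              f"fp32={fp32!r} (err {abs(fp32 - value):.2e})")
```

FP16 carries only 11 significant bits, so an integer such as 2049 cannot be stored exactly and rounds to 2048, while FP32 keeps it intact. That kind of error is often tolerable for color math but not for large coordinates, which is why the choice is left to the developer.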
Speaking of increasing throughput, improvements to the Vertex Data Master should help alleviate geometry setup bottlenecks, and the Compute Data Master, which performs front-end duties for setting up GPU compute wavefronts, sees up to a 300 percent gain. The Coarse Grain Scheduler also manages USC (Unified Shading Cluster) resources better, keeping pipelines full and reducing stalls caused by inter-tile dependencies.
iPhone GPU Comparison
GPU | PowerVR G6430 | PowerVR GX6450 | PowerVR GT7600 |
---|---|---|---|
Used In | iPhone 5s | iPhone 6 & 6 Plus | iPhone 6s & 6s Plus |
# of USCs | 4 | 4 | 6 |
# of Pipelines per USC | 16 | 16 | 16 |
FP32 ALUs per Pipeline | 2 | 2 | 2 |
FP16 ALUs per Pipeline | 2 | 4 | 4 |
Total FP32 FLOPS/cycle | 256 | 256 | 384 |
Total FP16 FLOPS/cycle | 384 | 512 | 768 |
Pixels/cycle | 8 | 8 | 12 |
Texels/cycle | 8 | 8 | 12 |
While the ALU resources on a per-core basis are the same for Series6XT and Series7XT, the GT7600’s two additional USCs mean that the total FP32 and FP16 FLOPS both increase by 50 percent. Combining these gains with the aforementioned improvements to the front-end blocks should yield significant performance gains. Indeed, Apple is claiming up to a 90 percent advantage over the iPhone 6, which also seems to imply an increase in max GPU clock frequency, a consequence of the more power-efficient FinFET process.
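The per-cycle totals for the two GPUs compared above can be reproduced with simple arithmetic. The sketch below assumes each ALU retires one fused multiply-add (two FLOPs) per cycle, an assumption that matches the GX6450 and GT7600 columns of the table; the class and function names are ours.

```python
from dataclasses import dataclass

FLOPS_PER_ALU_PER_CYCLE = 2  # assumption: one fused multiply-add = 2 FLOPs


@dataclass(frozen=True)
class GpuConfig:
    name: str
    uscs: int
    pipelines_per_usc: int
    fp32_alus_per_pipeline: int
    fp16_alus_per_pipeline: int


def flops_per_cycle(gpu: GpuConfig, alus_per_pipeline: int) -> int:
    """Total FLOPs per clock: USCs x pipelines x ALUs x FLOPs per ALU."""
    return (gpu.uscs * gpu.pipelines_per_usc
            * alus_per_pipeline * FLOPS_PER_ALU_PER_CYCLE)


def percent_gain(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


# Values copied from the comparison table above.
GX6450 = GpuConfig("PowerVR GX6450", 4, 16, 2, 4)
GT7600 = GpuConfig("PowerVR GT7600", 6, 16, 2, 4)

if __name__ == "__main__":
    for gpu in (GX6450, GT7600):
        fp32 = flops_per_cycle(gpu, gpu.fp32_alus_per_pipeline)
        fp16 = flops_per_cycle(gpu, gpu.fp16_alus_per_pipeline)
        print(f"{gpu.name}: FP32 {fp32}/cycle, FP16 {fp16}/cycle")
```

Because only the USC count changes between the two parts, both the FP32 and FP16 gains work out to the same 50 percent at equal clocks. Anything beyond that has to come from clock speed or the front-end improvements.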
Keeping this bigger GPU fed is likely the motivation for the move from LPDDR3-933 RAM in the iPhone 6 to the newer LPDDR4-1600 RAM in the 6s, resulting in ~41 percent more bandwidth according to the Geekbench memory test.
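For context, the theoretical side of this comparison can be sketched as follows. The sketch assumes that the number in each memory name is the I/O clock in MHz, that data moves on both clock edges, and that both phones use a 64-bit memory bus. The bus width in particular is our assumption and is not stated above.

```python
def peak_bandwidth_gb_s(clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s for double-data-rate memory.

    bus_width_bits defaults to 64, which is an assumption for illustration.
    """
    transfers_per_second = clock_mhz * 1e6 * 2  # two transfers per clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


def percent_gain(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    lpddr3 = peak_bandwidth_gb_s(933)    # iPhone 6
    lpddr4 = peak_bandwidth_gb_s(1600)   # iPhone 6s
    print(f"LPDDR3-933:  {lpddr3:.1f} GB/s")
    print(f"LPDDR4-1600: {lpddr4:.1f} GB/s")
    print(f"Theoretical gain: {percent_gain(lpddr4, lpddr3):.0f} percent")
```

Under these assumptions the theoretical uplift is roughly 71 percent regardless of bus width, since the width cancels out of the ratio. The ~41 percent measured in Geekbench would then capture only part of the available headroom.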
Below we explore how the PowerVR GT7600 GPU in the new iPhones performs by running several synthetic and real-world game engine tests. To learn more about how these benchmarks work, what versions we use, or our testing methodology, please read our article about how we test mobile device GPU performance.
The new iPhones see a significant overall graphics boost in 3DMark Ice Storm Unlimited compared to the previous generation; the iPhone 6s is 83 percent faster than the iPhone 6, and the 6s Plus is 71 percent faster than the 6 Plus, which has the same GPU as the 6 but uses a higher max clock frequency. In this test at least, we see scaling beyond what the 50 percent increase in ALUs can account for, and pretty close to Apple’s 90 percent claim.
The iPhone 6s also outpaces Samsung’s Galaxy S6 and its Mali-T760MP8 GPU by 80 percent. Imagination’s GT7600 even takes the top spot on the chart away from Qualcomm’s Adreno 430 GPU used in both the HTC One M9 and OnePlus 2. The margin of victory is smaller, though, just 30 percent faster than the M9 and 22 percent faster than the OnePlus 2, whose GPU runs at 630MHz versus the M9’s 600MHz.
GPU Performance Comparison (3DMark: Ice Storm Unlimited)
Device | iPhone 6s | iPhone 6 Plus | iPhone 6 | OnePlus 2 | Galaxy S6 |
---|---|---|---|---|---|
Graphics Test 1 | 100% | 55% | 52% | 70% | 51% |
Graphics Test 2 | 100% | 60% | 56% | 89% | 59% |
Breaking down the graphics results shows the architectural changes in Series7XT are fairly well balanced between the front-end and back-end, with similar performance gains between the iPhone 6s and iPhone 6 in each graphics test. We do see slightly better gains across the board in the first graphics test, which focuses on vertex operations (front-end) and minimal pixel processing, suggesting that the Vertex Data Master was the bigger bottleneck in Series6XT. The second graphics test, which focuses heavily on pixel operations by including particles and several post-processing effects, plays to the Adreno 430’s strength in ALU performance; the OnePlus 2 is not far behind the iPhone 6s here.
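The table reports each phone's score as a percentage of the iPhone 6s result, whereas the text quotes "percent faster" figures. A small sketch of the conversion, using the table values and a helper name of our own choosing:

```python
# Scores normalized to the iPhone 6s (= 100), copied from the table above.
RELATIVE_SCORES = {
    "Graphics Test 1": {"iPhone 6 Plus": 55, "iPhone 6": 52,
                        "OnePlus 2": 70, "Galaxy S6": 51},
    "Graphics Test 2": {"iPhone 6 Plus": 60, "iPhone 6": 56,
                        "OnePlus 2": 89, "Galaxy S6": 59},
}


def percent_faster(reference: float, other: float) -> float:
    """How much faster the reference device is than the other, in percent."""
    return (reference / other - 1) * 100


if __name__ == "__main__":
    for test, scores in RELATIVE_SCORES.items():
        for device, score in scores.items():
            lead = percent_faster(100, score)
            print(f"{test}: iPhone 6s is {lead:.0f} percent faster than {device}")
```

Run against the iPhone 6 column, this gives a lead of about 92 percent in the first test and about 79 percent in the second, which is the "slightly better gains" in the vertex-heavy test described above.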
Apple’s SoCs have struggled in the Physics test since the A7. Focusing on CPU performance, the Physics test uses “non-sequential data structures with memory dependencies,” according to Futuremark, the test’s developer. In the previous section, we discussed how Apple’s memory controller in the A7 onwards is optimized for sequential access patterns. This ends up being a disadvantage here, one which Qualcomm’s Snapdragon 808 and 810 SoCs also share.
GFXBench Manhattan uses an OpenGL ES 3.0 based game engine with several lighting and pixel-shader effects. Looking at the offscreen results, we see the new iPhones getting around twice the performance of the previous generation, once again scaling beyond what’s achievable by simply adding two additional GPU cores. The iPhone 6s Plus outpaces the Galaxy S6 edge+ by 55 percent, similar to what we saw in 3DMark Ice Storm Unlimited. Curiously, the larger S6 edge+ performs better than the standard S6 in the game engine tests, but the same in the GFXBench synthetic tests.
Despite Adreno’s ALU advantage, the 430 falls behind the Mali-T760MP8 in the Galaxy S6 family of phones, leading to a larger 67 percent margin of victory for the 6s Plus over the higher-clocked 430 in the OnePlus 2. Relative to the scaled-back Adreno 418 GPU in the LG G4 and Moto X Pure Edition, the iPhone 6s Plus is about 2.7x faster.
All of the iPhones as well as HTC’s M9 move to the top of the chart when rendering onscreen because of their lower resolution displays. The iPhone 6 and 6s, with their 1334x750 native resolutions, naturally see the largest performance increase. While the iPhone 6 Plus and 6s Plus render the UI at a higher 2208x1242 resolution and then downscale to 1080p, GFXBench renders the onscreen tests directly at 1080p, which is why the onscreen and offscreen results are the same.
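The size of the onscreen advantage follows from pixel counts. The sketch below compares the resolutions mentioned above; treating per-frame work as proportional to pixel count is a simplifying assumption that only holds for pixel-bound workloads.

```python
# Resolutions quoted in the text, as (width, height).
RESOLUTIONS = {
    "iPhone 6 / 6s native": (1334, 750),
    "1080p (offscreen and Plus onscreen)": (1920, 1080),
    "Plus UI render target": (2208, 1242),
}


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


if __name__ == "__main__":
    baseline = pixel_count(*RESOLUTIONS["1080p (offscreen and Plus onscreen)"])
    for label, (width, height) in RESOLUTIONS.items():
        pixels = pixel_count(width, height)
        print(f"{label}: {pixels:,} pixels "
              f"({pixels / baseline:.2f}x the 1080p workload)")
```

The 1334x750 panel has a little under half the pixels of a 1080p frame, so the iPhone 6 and 6s have roughly half as many pixels to shade per frame when rendering onscreen.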
We also ran GFXBench Metal, which uses the same tests as GFXBench 3.0, rewritten to leverage Apple’s Metal graphics API, first introduced in iOS 8. Much like Vulkan, the Metal API is meant to give game developers more direct access to the hardware, improving performance by cutting out software overhead. In Manhattan, however, we see little to no benefit from the move to Metal.
In the OpenGL ES 2.0 based T-Rex game simulation, the iPhone 6s performs 82 percent better than last year’s model, once again very close to Apple’s 90 percent claim. The 6s Plus improves upon the 6 Plus’ score by 73 percent.
The HTC One M9 heats up quickly and throttles heavily in this test, which is why it performs no better than last year’s iPhone 6. The newer revision of the Snapdragon 810 SoC in the OnePlus 2 throttles less, boosting performance and closing the gap with the 6s Plus to 55 percent.
The move to higher bandwidth LPDDR4 memory helps give the iPhone 6s a 54 percent boost in alpha blending over the iPhone 6. The Galaxy S6, Galaxy S6 edge+, OnePlus 2, and HTC One M9 also use LPDDR4 RAM, but fail to match the new iPhones’ throughput, mirroring what we saw in the Geekbench memory bandwidth test.
The two additional USCs in the GT7600 yield an even more impressive increase in ALU and texturing performance: the iPhone 6s at least doubles the iPhone 6’s results in both the ALU and Fill tests. Unlike last year, where the iPhone 6 Plus’ GPU was clocked roughly 10 percent higher than the smaller-screened version, it’s clear from these tests that the iPhone 6s Plus and 6s share the same max GPU frequency.
Up to this point we have not seen any appreciable performance gains from using the Metal API. In the Driver Overhead test, which measures the graphics driver’s CPU overhead by issuing a large number of draw calls—the exact scenario targeted by the new lower-level APIs—we finally see Metal’s true potential. The latest iPhones see a 3x improvement in draw call performance, allowing for many more objects to be rendered onscreen at a time. Metal is even more helpful for older, lower-performing devices: the iPhone 6, 6 Plus, and 5s all see a 4x improvement.
With the iPhone 6s and 6s Plus, we see GPU performance nearly double in a single generation—an impressive feat. The front-end optimizations in Series7XT, especially the improved throughput in the Vertex Data Master, produce a more balanced GPU with no glaring weak spots. What’s really driving this performance increase, however, is the move away from the 20nm planar process used for the A8 SoC to 14/16nm FinFET. The additional die space affords Apple the room to include the GT7600 GPU, which has two more cores than the GPU in the A8, and the improved electrical characteristics allow Apple to ramp up clock frequency. Pairing a potent GPU with LPDDR4 memory and sensible screen pixel densities makes the iPhone 6s and 6s Plus two of the best gaming smartphones you can buy.