
By all accounts, Qualcomm’s Snapdragon processors are incredibly successful, powering over 1350 announced or available devices and, according to Strategy Analytics, garnering more than half of all smartphone SoC revenue in Q3 2013. The auspicious Snapdragon lineage started in 2008 when the Snapdragon S1 emerged from its lair with Adreno 200 graphics (based on technology acquired from AMD) and a Scorpion-based CPU core leveraging the ARMv7 instruction set. Manufactured on a 65 nm process and operating at clock rates as high as 1 GHz, Scorpion was a custom design based on the ARM Cortex-A8 reference architecture. In the years that followed, the Snapdragon S2, S3, and S4 were added to the family, each new SoC pushing the performance envelope further.
Then, in 2013, Qualcomm introduced a new CPU architecture named Krait and paired it with Adreno 3xx-series graphics to create the Snapdragon 400, 600, and 800 SoCs.
The top-performing Snapdragon 800 employs an Adreno 330 engine (450 MHz) and Krait 400 quad-core CPU at up to 2.26 GHz. Earlier this year, though, Qualcomm pushed frequencies even higher with its Snapdragon 801: 578 MHz for the GPU, up to 2.45 GHz for the CPU, and 465 MHz for the ISP (up from 320 MHz in Snapdragon 800).
Mobile is a highly competitive market, and for Qualcomm to remain a premier provider, it had to keep evolving. The newer Snapdragon 805 sports a faster Krait 450 CPU, a new Gobi 9x35 modem with support for up to 300 Mb/s LTE Advanced CAT6, and an improved ISP for purportedly better imaging. Company reps call this the first commercially available mobile SoC supporting a 4K viewing experience. To enable that, the 805 boasts an updated Adreno 420 GPU architecture along with significantly increased memory bandwidth.
CPU and Memory
The Krait 450 CPU in Snapdragon 805 is the final addition to the Krait family, and the last high-end 32-bit SoC on Qualcomm’s roadmap. We’ll have to wait for the Snapdragon 810's introduction in 2015 before we see a new 64-bit CPU architecture supporting the ARMv8 instruction set. So, there aren't any significant changes baked into Krait 450. It has four cores, each with a 4 KB + 4 KB L0 cache and a 16 KB + 16 KB L1 cache, complemented by a shared 2 MB L2 cache. The CPU is a custom design by Qualcomm (which, like Apple, has an architecture license for creating ARM ISA-compatible products) with some similarities to the ARM Cortex-A15.
But it’s interesting to note how Qualcomm’s approach to CPU design differs from both ARM and Apple (Nvidia and Samsung are ARM processor licensees; they haven’t introduced anything unique yet on the architecture front). With Krait, Qualcomm implements a narrower, simplified architecture with high clock rates. On the other hand, Apple’s Cyclone-based host processor in the A7 SoC appears more desktop-like in terms of execution width and complexity, but operates at much lower frequencies. Both IPC (instructions per cycle) throughput and frequency for ARM’s Cortex-A15 fall between Krait and Cyclone.
Krait features an out-of-order, speculative-issue superscalar execution pipeline with 11 integer stages, compared to the Cortex-A15's 15 stages. (Speculative issue is an optimization technique whereby the CPU uses branch prediction to guess the path a program may take and executes instructions before they’re needed.) While the execution pipeline is out-of-order, the initial fetch/decode stages are in-order, and instructions must also be retired in-order. Krait 450 can fetch and decode three instructions per clock cycle, and execute up to four instructions in parallel. For comparison, Apple’s Cyclone design offers twice the peak IPC of Krait, fetching/decoding, executing, and retiring six instructions per cycle. Cyclone also has a larger instruction reorder buffer, holding up to 192 micro-ops versus 128 for the Cortex-A15 and only 40 for Krait, which, in theory, should let Cyclone stall less often and use its pipeline more efficiently. Even though the Krait 450's architecture and IPC remain unchanged, it should still offer a small performance increase over Krait 400 thanks to a boost in maximum frequency from 2.45 to 2.65 GHz (Qualcomm’s advertised frequencies are rounded up to the nearest 100 MHz).
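To put the width-versus-frequency trade-off in rough numbers, here is a minimal sketch that multiplies each core's decode width by its peak clock rate. The function name is ours, the widths come from the text above, and Cyclone's 1.3 GHz is taken from the spec table later in this article. Treat the result as a theoretical ceiling only; sustained IPC depends on stalls, cache misses, and reorder buffer depth.

```python
def peak_decode_rate(width_per_cycle: int, freq_ghz: float) -> float:
    """Upper bound on decoded instructions per second, in billions.

    This is simply decode width multiplied by clock rate. It ignores
    stalls, branch mispredictions, and memory latency.
    """
    return width_per_cycle * freq_ghz


# Decode widths from the text; clock rates are the advertised peaks.
krait_450 = peak_decode_rate(3, 2.65)  # 3-wide decode at 2.65 GHz
cyclone = peak_decode_rate(6, 1.3)     # 6-wide decode at 1.3 GHz

print(f"Krait 450 ceiling: {krait_450:.2f} billion instructions/s")
print(f"Cyclone ceiling:   {cyclone:.2f} billion instructions/s")
```

On paper the two ceilings land within a few percent of each other (roughly 7.95 versus 7.8 billion instructions per second), which is why the depth of the reorder buffer and the ability to keep the pipeline fed matter so much in practice.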
Right about now you're probably thinking that this situation sounds a lot like the transition from Snapdragon 800 to 801. So, why is the Krait brand incrementing for 805 when it retained its Krait 400 designation for 801? The answer begins with another question. Since Krait 450 reuses the same underlying architecture and 28 nm HPm process as Krait 400, how did Qualcomm boost its peak frequency again? Although we are unable to confirm this with Qualcomm, it’s likely that the circuit traces on the die have been optimized. With a new die comes a new designation.
While Snapdragon 805 doesn’t offer any alluring CPU performance gains, it does provide a substantial memory bandwidth increase. The 805 SoC pairs a 32-bit quad-channel bus with LPDDR3-800 memory for a peak throughput of 25.6 GB/s. That's twice the maximum bandwidth of Snapdragon 800 (12.8 GB/s) and about 70% more than Snapdragon 801 (14.9 GB/s). It’s also more than Apple’s A7. It actually approaches what a fairly modern desktop CPU's integrated memory controller can do. All of this extra memory bandwidth isn’t for the CPU, though. It's reserved for Qualcomm’s new Adreno 420 GPU.
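The bandwidth figures can be checked with simple arithmetic. The sketch below assumes that "LPDDR3-800" refers to an 800 MHz memory clock, so double data rate signaling yields 1600 million transfers per second; the function and variable names are ours.

```python
def peak_bandwidth_gbs(channels: int, channel_width_bits: int,
                       megatransfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = channels * channel_width_bits / 8
    return bytes_per_transfer * megatransfers_per_sec / 1000


# Assumption: 800 MHz clock with double data rate = 1600 MT/s.
snapdragon_805 = peak_bandwidth_gbs(channels=4, channel_width_bits=32,
                                    megatransfers_per_sec=2 * 800)

# Peak figures for the older SoCs, as quoted in the text.
snapdragon_800 = 12.8
snapdragon_801 = 14.9

print(f"Snapdragon 805: {snapdragon_805:.1f} GB/s")
print(f"vs. 800: {snapdragon_805 / snapdragon_800:.2f}x")
print(f"vs. 801: {(snapdragon_805 / snapdragon_801 - 1) * 100:.0f}% more")
```

Four 32-bit channels move 16 bytes per transfer, and 16 bytes at 1600 MT/s gives the quoted 25.6 GB/s.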
Adreno 420 GPU
The new GPU now has direct access to main memory, while the controller in Snapdragon 805 uses quality of service (QoS) to ensure each processing engine (GPU, CPU, ISP) receives the bandwidth and latency it requires for peak performance. Along with the bump in memory bandwidth, the texture and L2 caches are also larger. Adreno 420's rendering pipeline benefits from an enhanced early z-buffer test for faster depth rejection and improvements to the ROPs on the back-end.
Qualcomm doesn’t provide any low-level details about its graphics architecture beyond those general enhancements. However, looking at the large increase in memory bandwidth and texture cache, I think it’s safe to assume that Adreno 420 wields more texture units. Qualcomm doesn’t mention if any changes were made to the design or quantity of shader units, or even GPU frequency, but based on our benchmark results, it’s likely that one or both of these saw increases as well. According to Qualcomm, all of its improvements add up to 40% higher performance and 20% lower power consumption than Snapdragon 800 running GFXBench 2.7's T-Rex test at 1920x1080. We'll see if our benchmarks corroborate the company's claim, though we're forced to wait for an 805-based product to test the SoC's impact on battery life.
Adreno 420 does more than just raise the performance bar; it also improves rendering quality with support for OpenGL ES 3.1 and DirectX 11 feature level 11_2 (up from 9_3 in Adreno 3xx). It also adds support for geometry shaders and dynamic hardware tessellation, significantly reducing memory bandwidth requirements and power consumption, while simultaneously increasing scene detail. Rather than storing additional geometry mesh data in main memory and pulling it into the GPU, hardware tessellation generates the additional geometry detail programmatically on-chip without ever touching main memory.
The image below shows the visual advantage of tessellation, and according to Qualcomm, for “this simple hornet graphics scene, hardware tessellation delivers a bandwidth savings of ~360 MB/s, and a memory footprint savings of ~20 MB. For larger games, the savings on memory footprint could be in GBs.”
Another addition to Adreno 420 that can both reduce memory usage/bandwidth and improve visual quality is support for Adaptive Scalable Texture Compression (ASTC), the next-generation, lossy, block-based texture compression format introduced in OpenGL ES 3.0 (support is currently optional). ASTC offers developers more flexibility in choosing the appropriate texture size and quality than the ETC2 format used in the previous Adreno generation.
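The flexibility comes from ASTC's variable block footprints. Every ASTC block occupies 128 bits, so choosing a larger footprint lowers the bit rate. The sketch below is illustrative only: it covers a handful of the 2D footprints, and compares against ETC2 RGB at its fixed 64 bits per 4x4 block. The helper names are ours.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    """Bit rate for a given ASTC block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width: int, height: int,
                       block_w: int, block_h: int) -> int:
    """Storage for one mip level; partial blocks round up to whole blocks."""
    blocks = math.ceil(width / block_w) * math.ceil(height / block_h)
    return blocks * ASTC_BLOCK_BITS // 8


# ETC2 RGB uses one 64-bit block per 4x4 texels: a fixed 4 bits per pixel.
etc2_rgb_bytes = (1024 // 4) * (1024 // 4) * 64 // 8

for footprint in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    bpp = astc_bits_per_pixel(*footprint)
    size = astc_texture_bytes(1024, 1024, *footprint)
    print(f"ASTC {footprint[0]}x{footprint[1]}: {bpp:.2f} bpp, {size} bytes")

print(f"ETC2 RGB: 4.00 bpp, {etc2_rgb_bytes} bytes")
```

For a 1024x1024 texture, the options range from 8 bits per pixel at 4x4 down to under 1 bit per pixel at 12x12, letting a developer trade quality for footprint on a per-texture basis.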
The 420 continues the Adreno tradition of using Qualcomm’s FlexRender technology to dynamically choose between two different rendering methods: immediate-mode rendering and tile-based deferred rendering (Adreno uses a different technique than Imagination Technologies). The goal of FlexRender is to select the most efficient rendering technique for a given workload.
Another efficiency feature is Dynamic Clock and Voltage Scaling (DCVS), which dynamically varies frequencies and voltages for each processing engine in the SoC. While this isn’t a new feature, the Adreno 420 GPU adds additional power levels for more granular control, reducing power usage.
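A toy example shows why more power levels help. The frequency tables below are hypothetical, not Adreno 420's actual operating points, and the selection policy (pick the lowest level that meets demand) is a deliberately simple stand-in for a real DCVS governor, which also scales voltage and reacts to load over time.

```python
from bisect import bisect_left


def pick_level(levels_mhz, demand_mhz):
    """Return the lowest frequency level that meets demand.

    Falls back to the highest level when demand exceeds every option.
    """
    levels = sorted(levels_mhz)
    index = bisect_left(levels, demand_mhz)
    return levels[min(index, len(levels) - 1)]


# Hypothetical operating points, for illustration only.
COARSE_LEVELS = [200, 400, 600]
FINE_LEVELS = [200, 300, 400, 500, 600]

for demand in (150, 250, 420, 580):
    coarse = pick_level(COARSE_LEVELS, demand)
    fine = pick_level(FINE_LEVELS, demand)
    print(f"demand {demand} MHz -> coarse {coarse} MHz, fine {fine} MHz")
```

With a 420 MHz demand, the coarse table has to jump to 600 MHz while the finer table settles at 500 MHz. Less overshoot means the engine can run at a lower frequency (and, in a real design, a lower voltage) for the same workload.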
4K Video
Ultra-high-definition television (UHDTV), with a 4K resolution of 3840x2160 for the consumer version, is the latest video standard looking to replace high-definition television (HDTV), with its well-known 1920x1080 resolution. Living room adoption has been slow, however, due to the high cost of televisions and general lack of content. The situation is improving, though. Some 4K TVs sell for less than $1000, while Netflix and YouTube are currently streaming limited content in 4K. Amazon and Comcast are preparing to stream 4K video later this year, too.
For Qualcomm, big-screen TVs aren't driving 4K adoption. Rather, the company has its eye on the smaller, more mobile screens on our smartphones and tablets, as well as their 4K-capable cameras. With Snapdragon 805, Qualcomm hopes to push 4K harder. The new 805 is capable of concurrently driving its native panel at 4K (presumably at 60 Hz) and an external monitor at 4K/24 Hz.
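To get a sense of the pixel rates involved, the following sketch counts pixels per second for each display. It assumes the native panel runs at 60 Hz, which the text above only presumes, and the names are ours.

```python
def pixels_per_second(width: int, height: int, refresh_hz: int) -> int:
    """Pixels the display pipeline must deliver each second."""
    return width * height * refresh_hz


uhd_pixels = 3840 * 2160   # consumer 4K (UHD) frame
fhd_pixels = 1920 * 1080   # 1080p frame

native_panel = pixels_per_second(3840, 2160, 60)      # assumed 60 Hz
external_monitor = pixels_per_second(3840, 2160, 24)  # 4K at 24 Hz
combined = native_panel + external_monitor

print(f"UHD has {uhd_pixels // fhd_pixels}x the pixels of 1080p")
print(f"Native panel:     {native_panel / 1e6:.1f} Mpixels/s")
print(f"External monitor: {external_monitor / 1e6:.1f} Mpixels/s")
print(f"Combined:         {combined / 1e6:.1f} Mpixels/s")
```

Driving both displays concurrently means scanning out close to 700 million pixels every second, which helps explain why the 805's extra memory bandwidth matters.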
While Snapdragon 800/801 can encode/decode Ultra HD H.264 video in hardware, H.265 is handled in software. The 805 improves upon this by decoding 4K H.265 video in hardware. We'll have to wait for the Snapdragon 810 in 2015 for hardware-based encode, though. For now, the 805 can capture/encode Ultra HD video at 30 Hz and 1080p content at up to 120 Hz.
In the slide below, Qualcomm suggests up to a 75% power savings from the 805's hardware-based decode functionality.
Snapdragon 805 also includes Qualcomm’s Hollywood Quality Video (HQV) engine, a technology purchased from Integrated Device Technology in 2011. The HQV engine is supposed to improve image quality by reducing noise and optimizing image formatting and conversion from various formats. There are also image enhancement algorithms for low-resolution images.
ISP
The Snapdragon 805 retains the dual ISP (Image Signal Processor) design used previously, but gets a performance boost. It’s now capable of processing 1.2 gigapixels per second and capturing images of up to 55 MP across a combination of four camera inputs (up from two inputs in Snapdragon 800). The additional ISP inputs enable stereo and depth camera support.
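As a rough sanity check on those numbers, the sketch below treats the 1.2 gigapixel/s figure as a simple aggregate budget and asks how many frames per second it could sustain at a given resolution. That is our simplifying assumption; Qualcomm doesn't say how throughput is split across the two ISPs or four inputs.

```python
ISP_THROUGHPUT = 1.2e9  # pixels per second, as quoted above


def max_frame_rate(megapixels: float) -> float:
    """Frames per second the pixel budget could sustain at this resolution."""
    return ISP_THROUGHPUT / (megapixels * 1e6)


def budget_fraction(width: int, height: int, fps: int) -> float:
    """Share of the ISP pixel budget consumed by a video stream."""
    return width * height * fps / ISP_THROUGHPUT


print(f"55 MP stills: {max_frame_rate(55):.1f} frames/s")
print(f"4K at 30 Hz uses {budget_fraction(3840, 2160, 30):.0%} of the budget")
```

Under that assumption, the ISP could process full 55 MP captures at just under 22 frames per second, while a 4K/30 video stream uses only about a fifth of the budget.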
Also included in the 805 are gyro-based image stabilization, enhanced noise reduction, and auto-focus acceleration.
Test System Specs
The table below contains all the pertinent technical specifications for today’s comparison units:
| Products | | | | | | |
|---|---|---|---|---|---|---|
| Pricing | | | | | | |
| Operating System | Google Android 4.4 | Google Android 4.4 w/Samsung TouchWiz UI | Google Android 4.4.2 w/Samsung TouchWiz UI | Google Android 4.4.2 w/Samsung TouchWiz UI | Google Android 4.2.2 | Apple iOS 7.1 |
| SoC | Qualcomm Snapdragon 805 | Qualcomm Snapdragon 801 | Qualcomm Snapdragon 800 (MSM8974AA) | Samsung Exynos 5 Octa (5420) | Nvidia Tegra 4 (T114) | Apple A7 |
| CPU Core | Qualcomm Krait 450 (4 Core) @ 2.7 GHz | Qualcomm Krait 400 (4 Core) @ 2.45 GHz | Qualcomm Krait 400 (4 Core) @ 2.26 GHz | ARM Cortex-A15 (4 Core) @ 1.9 GHz + ARM Cortex-A7 (4 Core) @ 1.3 GHz | ARM Cortex-A15 (4 Core) @ 1.8 GHz | Apple Cyclone (2 Core) @ 1.3 GHz |
| GPU Core | Qualcomm Adreno 420 | Qualcomm Adreno 330 (32 ALU) @ 578 MHz | Qualcomm Adreno 330 (32 ALU) @ 450 MHz | ARM Mali T628MP6 (6 Core) @ 600 MHz | Nvidia GeForce ULP (72 Core) @ 672 MHz | Imagination PowerVR G6430 (4 Cluster) @ 200 MHz |
| Memory | 3 GB LPDDR3 | 2 GB LPDDR3 | 3 GB LPDDR3 | 3 GB LPDDR3 | 1 GB LPDDR3 | 1 GB LPDDR3 |
| Display | 10.06-inch IPS @ 2560x1440 | 5.1-inch SAMOLED @ 1920x1080 (432 PPI) | 10.1-inch TFT @ 2560x1600 (299 PPI) | 10.1-inch TFT @ 2560x1600 (299 PPI) | 7-inch IPS @ 1280x720 (441 PPI) | 9.7-inch IPS @ 2048x1536 (264 PPI) |
| Battery | 3400 mAh (Removable) | 2800 mAh (Removable) | 8220 mAh (Non-removable) | 8220 mAh (Non-removable) | 4100 mAh (Non-removable) | 8820 mAh (Non-removable) |
Benchmark Suite
We tested the Snapdragon 805 SoC in four key sections: CPU, GPU, GPGPU, and Web.
| CPU Core Benchmarks | AnTuTu X (Anti-Detection), Basemark OS II Full (Anti-Detection), Geekbench 3 Pro (Anti-Detection) |
|---|---|
| GPU Core Benchmarks | 3DMark (Anti-Detection), Basemark X 1.1 Full (Anti-Detection), GFXBench 3.0 Corporate |
| GPGPU Benchmarks | CompuBenchRS |
| Web Benchmarks | JSBench, Peacekeeper 2.0, WebXPRT 2013 |
Methodology
All devices are benchmarked on a fully updated copy of the device's stock software. The table below lists other common device settings that we standardize before testing.
| Bluetooth | Off |
|---|---|
| Cellular | SIM card removed |
| Display Mode | Device Default (nonadaptive) |
| Location Services | Off |
| Sleep | Never (or longest available interval) |
| Volume | Muted |
| Wi-Fi | On |
Furthermore, for browser-based testing on Android, we're employing a static version of the Chromium-based Opera in order to keep the browser version even across all devices. Due to platform restrictions, Safari is the best choice for iOS-based devices, while Internet Explorer is the only game in town on Windows RT.
Since the Krait 450 complex in Snapdragon 805 is architecturally the same as Snapdragon 800/801's Krait 400, their relative CPU performance should scale based on clock rate.
For reference, the 805 has an 8% advantage over 801 (2.65 versus 2.45 GHz) and 17% over 800 (2.65 versus 2.26 GHz). While Snapdragon 805 has a significant memory bandwidth advantage, the CPU isn’t starved for data in the 800/801. Therefore, this shouldn’t have a significant influence on the CPU benchmarks.
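Those percentages follow directly from the frequency ratios. The short sketch below reproduces them and adds a helper (our own naming) for comparing a measured gain against the clock-rate expectation, using the Geekbench 3 single-core deltas quoted in this article as sample inputs.

```python
def expected_gain_pct(new_ghz: float, old_ghz: float) -> float:
    """Performance gain predicted by clock rate alone, in percent."""
    return (new_ghz / old_ghz - 1) * 100


def scaling_ratio(measured_pct: float, expected_pct: float) -> float:
    """Measured gain as a fraction of the clock-rate expectation."""
    return measured_pct / expected_pct


vs_801 = expected_gain_pct(2.65, 2.45)
vs_800 = expected_gain_pct(2.65, 2.26)

print(f"Expected gain over 801: {vs_801:.1f}%")
print(f"Expected gain over 800: {vs_800:.1f}%")

# Geekbench 3 single-core gains reported in this article: 8.6% and 11%.
print(f"801 scaling ratio: {scaling_ratio(8.6, vs_801):.2f}")
print(f"800 scaling ratio: {scaling_ratio(11.0, vs_800):.2f}")
```

A ratio near 1.0 means the result tracks clock rate; a ratio well below 1.0 points to some other bottleneck or to the cores not sustaining their peak frequency.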
AnTuTu X
AnTuTu is an Android system benchmark designed to test the performance capabilities of four major aspects of mobile devices: Graphics (encompassing 2D, UI, and basic 3D), CPU (fixed, floating-point, and threading), RAM (read and write), and I/O (read and write).
Our first CPU benchmark's RAM test highlights the 805’s improved memory bandwidth, achieving its theoretical advantage over the 801. The 805, however, doesn’t fare as well in the other sub-tests. Qualcomm's Adreno 420 engine turns in the lowest GPU test score for both 2D and 3D graphics. Its 3D result is actually 15% slower than Nvidia’s GeForce ULP in the Tegra 4 (the 3D graphics winner in this benchmark). The 805's Krait 450 CPU also finishes last in the CPU test, where its score is 35% lower than the category-winning 801.
Basemark OS II
Basemark OS II is an all-in-one tool designed for measuring the overall performance of mobile devices. It scores each one in four main categories: System, Memory, Graphics, and Web. The System score reflects CPU and memory performance, specifically testing integer and floating-point math, along with single- and multi-core CPU image processing using a 2048x2048, 32-bit image. Measuring the transfer rate of the internal NAND storage (Memory) is done by reading and writing files with a fixed size, files varying from 65 KB to 16 MB, and files in a fragmented memory scenario. Calculating the Graphics score involves mixing 2D/3D graphics inside the same scene, applying several pixel shader effects, and displaying 100 particles with a single draw call to test GPU vertex operations. The benchmark is rendered at 1920x1080 off-screen 100 times before being displayed on-screen. Finally, the Web score stresses the CPU by performing 3D transformations and object resizing with CSS, and also includes an HTML5 Canvas particle physics test.
The Basemark OS II Graphics test shows a complete reversal. Qualcomm's Adreno 420 posts the highest score as Nvidia's Tegra 4 finishes last. Isolating Snapdragon processors, the 805 is 22% faster than the 801 and 28% faster than the 800.
Similarly, the 805 outpaces the 801 and 800 in this benchmark's Web test by 3% and 23%, respectively, which is pretty close to their clock rate differences. In the CPU-oriented System metric, the 805 turns in a score 11% lower than the 801 and is only 6% faster than the 800. It’s curious that the 805 performs as expected in the CPU-bound Web test, but underperforms in the System test. So far, the theme for the 805’s CPU performance is inconsistency.
Geekbench 3 Pro
Primate Labs' Geekbench offers broad cross-platform compatibility, with apps available for Windows, OS X, Linux, iOS, and Android. This simple system benchmark produces two sets of scores: single- and multi-threaded. For each, it runs a series of tests in three categories: Integer, Floating Point, and Memory. The individual results are used to calculate category scores, which, in turn, generate overall Geekbench scores.
In the Single-Core benchmark, we see Snapdragon 805 finish 8.6% higher than the 801 and 11% higher than the 800. The scores don’t exactly scale according to frequency, suggesting the software is encountering some other bottleneck. Looking at the margin of victory for Apple’s A7 SoC, it appears that Geekbench 3 Pro prefers the A7’s higher IPC and larger caches to Snapdragon’s higher clock rate.
Once again, we record inconsistent performance from Qualcomm's Snapdragon 805. Even with half as many cores, Apple’s A7 outscores it, as do both previous-generation Snapdragons. While the 805 cruises to victory in the Multi-Core Memory test (as expected), it’s 9% slower than the 801 in Multi-Core Floating Point and a surprising 21% slower in Multi-Core Integer.
Since our time with the 805 was limited during a hands-on benchmarking event with Qualcomm, we didn’t have an opportunity to dig into these results. However, because the 805’s score is greater than half that of the 801, it’s safe to assume that all four cores were active during the tests. The most reasonable explanation, then, is that the 805 isn’t achieving its peak frequency when all four cores are active. This might explain the inconsistencies in the other CPU benchmarks too, as some focus more on single-core performance (like the benchmarks based on Web browsing) and some stress multiple cores (AnTuTu CPU and Basemark OS II System).
The following tests are JavaScript- and HTML5-heavy selections from our Web Browser Grand Prix series. They're extremely meaningful to mobile devices because so much in-app content is served via the platform's native Web browser. These tests not only offer a view of each device's Web browsing performance, but since they're traditionally so CPU-dependent, we also get a great way to compare SoC performance between platforms running the same browser software.
JSBench
Unlike most JavaScript performance benchmarks, JSBench could almost be considered real-world, since it utilizes actual snippets of JavaScript from Amazon, Google, Facebook, Twitter, and Yahoo!.
The results from this benchmark align perfectly by CPU architecture, with Krait trailing both Apple’s A7 and the two Cortex-A15-based CPUs. JSBench appears indifferent to the number of cores (the A7 is dual-core and the others are quad-core) and frequency. These results highlight the complexities of designing a modern CPU. Sometimes taking the “easy” approach of adding more cores or bumping clock frequency just doesn’t produce any tangible benefits if the design is bottlenecked elsewhere.
Peacekeeper
Peacekeeper is a synthetic JavaScript performance benchmark from Futuremark.
Once again, the results organize themselves based on CPU architecture, with Krait producing the lowest scores. Snapdragon 805 does manage an 8% improvement over the 801, which matches its clock rate advantage, though its higher memory bandwidth may also contribute.
WebXPRT 2013
Principled Technologies' WebXPRT 2013 is an HTML5-based benchmark that simulates common productivity tasks that are traditionally handled by locally-installed applications, including photo editing, financial charting, and offline note-taking.
Another benchmark yields another inconsistent result from Qualcomm's newest Snapdragon. In this HTML5-based test, Snapdragon 800 scores 16% higher than the 801 and 19% higher than the 805, again suggesting that the higher-clocked Krait cores aren't spending much time at their peak clock rates.
For our GPU-based benchmarks, we are only looking at off-screen rendering performance to remove the influence of screen resolution on the results. This allows us to directly compare SoC performance, rather than overall product performance.
3DMark (Anti-Detection)
Futuremark has become a name synonymous with benchmarking, and the company's latest iteration of 3DMark offers three main graphical benchmarks: Ice Storm, Cloud Gate, and Fire Strike. Currently, the DirectX 9-level Ice Storm tests are cross-platform for Windows, Windows RT, Android, and iOS.
Ice Storm simulates the demands of OpenGL ES 2.0 games using shaders, particles, and physics via the company's in-house engine. Although it was just released in May of last year, the on-screen portions of Ice Storm have already been outpaced by modern mobile chipsets. Nvidia's Tegra 4 and Qualcomm's Snapdragon 800 both easily max out the Extreme version (1080p with high-quality textures). Ice Storm Unlimited, however, renders the scene off-screen at 720p and is still a good gauge of GPU-to-GPU performance.
The Adreno 420 in Snapdragon 805 demonstrates a mere 6% advantage over the 801's Adreno 330 and a more impressive 29% advantage over the 800's lower-clocked Adreno 330.
Compared to Apple’s dual-core A7 SoC, Qualcomm's quad-core Krait 450 extends the company's already-impressive lead in the threaded Physics test. Isolating rendering performance in the two graphics sub-tests, however, reveals a much smaller margin of victory. It's only 3% in Graphics Test 1 and 7% in Graphics Test 2.
Basemark X 1.1
Based on the Unity 4.0 game engine, Rightware’s Basemark X is a cross-platform graphics benchmark for Android, iOS, and Windows Phone 8. This test utilizes Unity’s modern features via the OpenGL ES 2.0 render path. Features like high poly count models, shaders with normal maps, complex LoD algorithms, and extensive per-pixel lighting (including directional and point light), along with a comprehensive set of post-process, particle system, and physics effects, test how a modern game might look and run. Basemark X is aggressive in that it still hasn't been maxed out by the latest mobile SoCs.
The Adreno 420 engine shows off a more significant victory over its predecessor in this benchmark. Snapdragon 805 manages a 31% improvement over the 801 and a staggering 64% over the 800. These are the kind of gains we like to see from a new architecture. With future driver optimization, we could potentially see the 805 pull even further ahead.
In the High Quality test, the 805 outpaces both the 800 and 801 by about 60%. It also establishes a 37% advantage over the A7.
GFXBench 3.0
Kishonti GFXBench 3.0 is a cross-platform GPU benchmark supporting both the OpenGL ES 2.0 and OpenGL ES 3.0 APIs. It comprises both “high-level” game-like scenarios, along with more “low-level” tests designed to measure specific subsystems.
Among the high-level tests are Manhattan and T-Rex. Manhattan is a modern, complex OpenGL ES 3.0-based scenario, while the OpenGL ES 2.0-level T-Rex is a holdover from GFXBench v2.7.
The low-level workloads include Fill, which measures fill rate by rendering four layers of compressed textures; Alpha Blending, a test that renders layers of semi-transparent quads using high-resolution, uncompressed textures; ALU, for measuring shader compute performance; and Driver Overhead, which measures the CPU overhead of the graphics driver and API by making a lot of draw calls and state changes.
See GFXBench 3.0: A Fresh Look At Mobile Benchmarking for a complete test-by-test breakdown of this benchmark.
It’s exciting to see Snapdragon 805 blow past both the 801 and the A7. Qualcomm's Adreno 420 also earns the honor of being the first mobile GPU able to run T-Rex at 1080p with playable frame rates.
In the more complex OpenGL ES 3.0-based Manhattan test, Snapdragon 805 yields an impressive 42% advantage over the 801 and 31% lead over the A7. This result suggests that modern gaming titles should enjoy a significant performance boost from the Adreno 420.
The Fill test conveys the full effect of the 805’s extra memory bandwidth, pulling textures from main memory and then writing finished pixels back to the video buffer. Qualcomm's Snapdragon 805 achieves more than twice the performance of the 801! Back in the GPU section, I speculated that the company added TMUs to better utilize its more potent memory controller and larger texture cache; I believe this result supports that.
In the Alpha Blending test, Snapdragon 805 again shows its texture mapping prowess, with an almost-40% gain in throughput over the 801.
Snapdragon 805’s advantage over the 801 is less than 10% in the ALU test, which measures shader compute performance.
The 805 does no worse than the other Snapdragons when we isolate Driver Overhead, but it’s still the lowest-performing SoC family in this benchmark.
CompuBenchRS
CompuBenchRS tests the compute performance of multi-core systems supporting the RenderScript API (a component of the Android operating system), which is similar to CUDA or OpenCL, and can distribute parallel tasks across all compute cores. As of Android 4.2, RenderScript is expanded to run on the GPU, in addition to the CPU of supported systems.
On compute-capable GPUs, the benchmark runs on the graphics engine. Otherwise, the tests stress CPU cores. CompuBenchRS sub-tests cover the following categories: Computer Vision (Face Detection), 3D Graphics (Provence - ray tracing), Image Processing (Gaussian Blur, Histogram), Physics (Particle Simulation – 4K), and Throughput (Julia Set, Ambient Occlusion).
Snapdragon 805 sees a small regression on the Face Detection test, although driver maturity may be holding Adreno 420 back. Since RenderScript requires software support to run on the GPU, CompuBenchRS results vary widely between devices depending on what driver version they’re using.
The ray tracing test shows a similar result, with the 805 trailing its predecessors. It's clear from the previous GPU benchmarks that the higher-clocked Adreno 420 packs more compute power than Adreno 330, so I'm not overly concerned with these results at this point. Hopefully, driver maturity will improve once devices with the 805 SoC start shipping.
The Gaussian Blur results are certainly more promising, with the 805 showing almost a 2x advantage over the 801. Snapdragon 800 suffers a significant deficit to the 801 that can't be accounted for based on clock rate differences alone. It's possible that the 800 (and possibly the Tegra 4) are running this test on the CPU cores instead of the GPU.
Snapdragon 805 and 801 trade places in the second Image Processing test. It's difficult to draw conclusions based on such inconsistent results.
Snapdragon 805 wins two out of the three Image Processing tests, demonstrating a respectable lead over the 801 in the Histogram metric. It appears the Snapdragon 800 and Tegra 4 devices aren't running on the GPU again in this benchmark, likely due to older drivers.
The 805 finishes ahead of the 801, but interestingly falls behind both of the Cortex-A15-powered SoCs. This finishing order seems more plausible if the Physics Simulation is running on the CPUs, rather than the GPUs.
The Ambient Occlusion test shows Snapdragon 805 with a significant advantage over Qualcomm's 801. However, it falls short of the Exynos 5 Octa. I'm not sure if we're seeing a driver or hardware limitation.
Wrapping up our GPGPU benchmarks on a positive note, we see the 805 trounce the other SoCs by establishing an almost-6x advantage over the 801. This result is tempered somewhat, however; I suspect the other SoCs are only using their CPU cores and not benefiting from GPU acceleration like the 805.
It's difficult to draw definitive conclusions about Snapdragon 805’s compute performance based on still-spotty industry support for running RenderScript on competing GPUs. As a result, these tests say more about the state of feature support than actual hardware potential. Hopefully, RenderScript support continues to improve within the Android ecosystem. Having a GPU-accelerated compute API that's hardware vendor-agnostic will not only make benchmarking easier, but also lead to the development of more apps leveraging graphics horsepower.
Snapdragon 805 turns out to be much more interesting than the 801, which we tested in Qualcomm Snapdragon 801: Performance Previewed. While Snapdragon 801 was a mere clock rate bump over the 800 and didn't offer any architectural changes, Snapdragon 805 introduces us to Qualcomm's next-generation Adreno GPU.
The Adreno 420 sees improvements along the entire length of the rendering pipeline, from an improved z-buffer to tuned ROPs. Qualcomm won't say for sure, but there are likely additional TMUs fed by copious amounts of memory bandwidth, and larger texture and L2 caches. All of those improvements lead to impressive performance gains over the Adreno 330 in Snapdragon 800/801, placing the 805 firmly ahead of the PowerVR G6430.
While Adreno 420’s benchmark performance is impressive, does it push the envelope far enough to outperform the PowerVR Series6XT GPU due to arrive later this year? And, equally important, does it compete with Nvidia’s Kepler-powered Tegra K1? If initial performance figures from Nvidia are to be believed, the Adreno 420’s benchmark dominance may be short-lived.
We'll have to wait until 2015 and Snapdragon 810 to see any significant changes to the CPU complex. For now, Krait 450's tuned circuit layout delivers a higher maximum frequency, at least on paper. While our single-core CPU benchmarks confirm performance gains commensurate with a clock rate increase, Snapdragon 805 struggles to achieve its peak frequency with all four cores active. We can’t blame thermal throttling, since the 805 we tested was housed in a large tablet with a cool-running chassis. Also, we spread the benchmarks over several of these reference platforms, which helped keep heat build-up at bay. Keep in mind, though, that these were development tablets, not shipping units. So, Qualcomm’s software stack may not be fully optimized, or perhaps the company is using a conservative frequency scaling algorithm to keep SoC temperatures under control. This is a topic to revisit once retail devices start shipping.
With the Krait family of CPUs, Qualcomm opted for clock frequency over pipeline width and complexity. This strategy still works, but I don't see it being viable long-term. We’ve already seen a similar strategy fail on the desktop. Remember Intel's Pentium 4? The CPU/SoC either runs into a power/thermal wall or the weakest link in the pipeline becomes a bottleneck that prevents further scaling. Getting more work done in the same amount of time through IPC improvements can have a detrimental impact on power consumption. However, racing to get as much of the SoC back to sleep as possible, along with clever power gating, helps mitigate some of that. I suspect that Snapdragon 810’s new 64-bit architecture will look more like Apple’s Cyclone CPU than Krait.
With its more powerful GPU, Snapdragon 805 seems best suited to high-resolution tablets and smartphones with large screens. Larger form factors also provide more thermal headroom for Krait 450’s higher frequencies. It’s likely we’ll see Snapdragon 801 remain the more popular option for smartphones, while the 805 powers a new generation of tablets.