
Tegra 2: Nvidia Goes Mobile

Motorola Xoom: The First Android 3.1 (Honeycomb) Tablet

As we’ve mentioned in the past, mobile devices like smartphones and tablets use what’s known as a system-on-chip (SoC). This integrates the processor, GPU, and RAM, along with several other subsystems, onto a single device. Since all of those components sit next to each other on the same chip, data transfers are more efficient and less space is consumed on the PCB.


| | Apple A4 (iPad) | Apple A5 (iPad 2) | Tegra 2 (Xoom) |
|---|---|---|---|
| Processor | 1 GHz ARM Cortex-A8 (single-core) | 1 GHz ARM Cortex-A9 (dual-core) | 1 GHz ARM Cortex-A9 (dual-core) |
| Memory | 256 MB 333 MHz LP-DDR (single-channel) | 512 MB 800 MHz LP-DDR2 (dual-channel) | 1 GB 667 MHz LP-DDR2 (single-channel) |
| Graphics | PowerVR SGX535 | PowerVR SGX543MP2 | ULP GeForce |
| L1 Cache (Instruction/Data) | 32 KB / 32 KB | 32 KB / 32 KB | 32 KB / 32 KB |
| L2 Cache | 640 KB | 1 MB | 1 MB |


Tegra is Nvidia’s SoC brand, and it symbolizes the company’s effort to tap into the mobile market beyond its desktop-derived GeForce graphics processors. A lot of engineering is tied up in this initiative, and what we see today in tablets like the Xoom represents the company's second incarnation of Tegra.

You may be asking "What happened to the first Tegra?" Flatly, it was far less impressive, even when it hit the market in 2009. Compared to Apple’s A4, it was a much more conservative design. Nvidia chose the older ARM11 processor, which probably explains the lack of design wins. Microsoft’s Zune HD was the only major product that employed the original Tegra.

Tegra 2

Tegra 2 is an entirely different beast. It’s based on the Cortex-A9, which is a generation ahead of the older ARM11. This is the same CPU seen in Apple’s A5 (iPad 2). Read Apple's iPad 2 Review: Tom's Goes Down The Tablet Rabbit Hole for a full discussion of Cortex-A9 performance.

Tegra 2: Graphics Processing Pipeline

The ultra-low power GeForce isn't just a physically smaller GPU than the A5’s SGX 543MP2. Unlike Nvidia's desktop GPUs, Tegra 2's graphics core is based on an architecture that pre-dates the company's unified shader design. So, you’re looking at four pixel shader cores and four vertex shader cores. This means Tegra 2 operates most efficiently when it's presented with an even mix of vertex and pixel shader code. We expect Nvidia to address that constraint in Tegra 3 (code-named Kal-El).
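To see why a split 4/4 design favors an even workload mix, consider the toy throughput model below. It is only a sketch under simplifying assumptions of our own: every core does one unit of work per time step, work divides perfectly across cores, and an idealized eight-core unified design serves as the reference. The function names and workload numbers are hypothetical and do not come from Nvidia.

```python
def split_time(vertex_work, pixel_work, vertex_units=4, pixel_units=4):
    """Time steps for a split design: each pool can only run its own kind of work."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, units=8):
    """Time steps for an idealized unified design: any core runs any work."""
    return (vertex_work + pixel_work) / units


def split_utilization(vertex_work, pixel_work):
    """Share of the eight split cores kept busy, relative to the unified ideal."""
    return unified_time(vertex_work, pixel_work) / split_time(vertex_work, pixel_work)


# Hypothetical workload mixes (vertex units, pixel units), 100 units in total.
for vertex, pixel in [(50, 50), (25, 75), (10, 90)]:
    share = split_utilization(vertex, pixel)
    print(f"vertex={vertex:3d} pixel={pixel:3d} utilization={share:.0%}")
```

In this model, only the 50/50 mix keeps all eight cores busy. As the mix skews toward pixel work, the vertex cores sit idle while the pixel cores become the bottleneck.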

| GPU (System-on-Chip) | PowerVR SGX 535 (Apple A4) | PowerVR SGX 543 (Apple A5) | ULP GeForce (Tegra 2) |
|---|---|---|---|
| SIMD | USSE | USSE2 | Core |
| Pipelines | 2 (unified) | 4 (unified) | 8 (4 pixel / 4 vertex) |
| TMUs | 2 | 2 | 2 |
| Bus Width (bit) | 64 | 64 | 32 |
| Triangle rate @ 200 MHz | 14 MTriangles/s | 35 MTriangles/s | ? |


The ULP GeForce has a maximum operating frequency of 300 MHz, but device vendors can tweak this setting to save on power. Nvidia provides less information on the Tegra 2 than it does for its desktop GPUs, so it’s best to move on to benchmarks. As in our iPad 2 review, we're turning to GLBenchmark 2.0.

In terms of frames rendered in a set period of time, the Xoom offers more performance than the original iPad, but it still falls short of the iPad 2. Conservatively, Google's first Honeycomb-based tablet renders roughly 50% fewer frames than the iPad 2 in the Pro test, while the iPad 2 renders about 3.7x as many frames as the Xoom in the Egypt test.
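For readers who want to check those ratios, here is a minimal sketch that recomputes them. The frame counts are the Egypt and Pro results (without FSAA) from the GLBenchmark 2.0 table in this article; the dictionary layout and the helper name `relative` are our own illustration.

```python
# Frames rendered in GLBenchmark 2.0 (no FSAA), as reported in this article.
frames = {
    "Egypt": {"iPad": 575, "iPad 2": 5075, "Xoom": 1371},
    "Pro": {"iPad": 880, "iPad 2": 2897, "Xoom": 1347},
}


def relative(test, device, baseline):
    """Frames rendered by `device` as a multiple of `baseline` in one test."""
    return frames[test][device] / frames[test][baseline]


for test in frames:
    vs_ipad = relative(test, "Xoom", "iPad")
    vs_ipad2 = relative(test, "Xoom", "iPad 2")
    print(f"{test}: Xoom renders {vs_ipad:.2f}x the original iPad's frames "
          f"and {1 - vs_ipad2:.0%} fewer frames than the iPad 2")

# How many times more frames the iPad 2 renders than the Xoom in Egypt.
print(f"Egypt: iPad 2 / Xoom = {relative('Egypt', 'iPad 2', 'Xoom'):.1f}x")
```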

| GPU (System-on-Chip) | PowerVR SGX 535 (Apple A4) | PowerVR SGX 543 (Apple A5) | ULP GeForce (Tegra 2) |
|---|---|---|---|
| SIMD | USSE | USSE2 | Core |
| Channels | Single | Dual | Single |
| Memory Bandwidth | 2.6 GB/s | 17.0 GB/s | 2.6 GB/s |


You can't use fill or triangle rates to draw a direct comparison of how well Tegra 2 utilizes its memory bandwidth, even though it's a quick-and-dirty way of sizing up other mobile GPUs.

According to Intel, the SGX 535 (GMA 500) requires 4.2 GB/s of memory bandwidth to reach a 14 Mtriangles/s triangle rate, but that's not the result that we get in GLBenchmark's triangle test. The iPad's A4 uses 333 MHz LP-DDR, which offers up to 2.6 GB/s of throughput. If you do the math, the ratio of available to required memory bandwidth (2.6/4.2 = 63%) closely matches the ratio of achieved to rated triangle rate (8.6/14 = 61%).
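The arithmetic behind that comparison can be sketched as follows. We assume here that "333 MHz" is the effective transfer rate and that the memory bus is 64 bits wide (the bus width listed for the SGX 535 earlier on this page); under those assumptions the peak works out to 2.664 GB/s, which the text rounds down to 2.6 GB/s. The variable names are ours.

```python
# Figures quoted in the article.
required_bw_gbs = 4.2   # bandwidth Intel cites for the SGX 535 at its rated triangle rate
rated_mtri = 14.0       # rated triangle rate, Mtriangles/s
measured_mtri = 8.6     # triangle rate used in the article's comparison, Mtriangles/s

# Assumption: 333 million transfers per second over a 64-bit (8-byte) bus.
available_bw_gbs = 333e6 * (64 / 8) / 1e9

bw_ratio = available_bw_gbs / required_bw_gbs
tri_ratio = measured_mtri / rated_mtri

print(f"available bandwidth: {available_bw_gbs:.3f} GB/s")
print(f"bandwidth ratio:     {bw_ratio:.0%}")
print(f"triangle ratio:      {tri_ratio:.0%}")
```

The two ratios land within a few percentage points of each other, which is the basis for treating triangle rate as a rough proxy for memory bandwidth on these PowerVR parts.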

In comparison, the iPad 2 uses 800 MHz LP-DDR2 in a dual-channel configuration. This adds up to about 17.0 GB/s of memory bandwidth. GLBenchmark suggests that this isn't enough, though, because a single-core SGX 543 should reach 35 Mtriangles/s. And yet, we only achieve about 30 Mtriangles/s with our dual-core SGX 543. Adding another core doesn't exactly double performance because scaling isn't linear. However, given our previous experience with desktop GPUs, we suspect that another 30-40% could be squeezed out of the iPad 2's GPU if Apple used higher-performance memory.

We can make this assertion because there is a direct relationship between memory bandwidth and triangle rates in the A4's and A5's PowerVR GPUs, due to their tile-based deferred rendering architecture. Those GPUs operate differently than what we're used to seeing on the desktop. Tegra 2, however, is an entirely different beast. It employs a more traditional z-buffered rendering architecture, like desktop GPUs. That's why it's pointless to compare triangle and fill rates. It's more important to look at the Egypt and Pro benchmarks.

Interestingly, Tegra 2 only employs a single-channel 32-bit LP-DDR2 memory controller. This could be a bottleneck restricting performance, but there is no benchmark we can use to determine that for sure. Then again, we do know that the version of Tegra 2 in the Xoom is somewhat restricted. Motorola wanted to emphasize better battery life, so it capped the Tegra 2's memory clock at 600 MHz. This effectively limits bandwidth to 2.4 GB/s. Nvidia specs the Tegra 2 for up to 667 MHz operation, which means there could be other tablets that offer better performance through a higher data rate.
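The 2.4 GB/s figure follows from the usual peak-bandwidth formula: transfer rate times bus width in bytes times channel count. Here is a minimal sketch, assuming the quoted "MHz" values are effective transfer rates; the helper name `peak_bandwidth_gbs` is our own.

```python
def peak_bandwidth_gbs(data_rate_mhz, bus_width_bits, channels=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mhz * 1e6 * bytes_per_transfer * channels / 1e9


# Tegra 2: single-channel, 32-bit LP-DDR2 controller.
xoom_bw = peak_bandwidth_gbs(600, 32)   # Xoom, capped at 600 MHz
spec_bw = peak_bandwidth_gbs(667, 32)   # Nvidia's maximum rated speed

print(f"Xoom (600 MHz):    {xoom_bw:.2f} GB/s")
print(f"Tegra 2 (667 MHz): {spec_bw:.2f} GB/s")
print(f"Headroom left by the cap: {spec_bw / xoom_bw - 1:.0%}")
```

By this estimate, a Tegra 2 tablet running its memory at the full 667 MHz would have about 11% more peak bandwidth than the Xoom. Peak bandwidth is an upper bound, so any real-world gain would likely be smaller.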

| GLBenchmark 2.0 | Apple iPad | Apple iPad 2 | Motorola Xoom |
|---|---|---|---|
| Egypt (frames) | 575 | 5075 | 1371 |
| Egypt with FSAA (frames) | 436 | 5057 | - |
| Pro (frames) | 880 | 2897 | 1347 |
| Pro with FSAA (frames) | 672 | 2851 | - |
| Egypt with FSAA Fixed Time (sec) | 825.6 | 65.0 | - |
| Pro with FSAA Fixed Time (sec) | 123.3 | 22.6 | - |
| Swap Buffer Test (frames) | 600 | 599 | 603 |
| Fill Test (texture fetch) ktexel/s | 170 980 | 918 551 | 129 897 |
| Trigonometric Test (vertex weighted) kvertex/s | 1039 | 3326 | 2632 |
| Trigonometric Test (fragment weighted) kfragment/s | 1191 | 3512 | 4452 |
| Trigonometric Test (balanced) kshader/s | 1259 | 3158 | 2543 |
| Exponential Test (vertex weighted) kvertex/s | 3130 | 3535 | 2628 |
| Exponential Test (fragment weighted) kfragment/s | 3774 | 11 165 | 3003 |
| Exponential Test (balanced) kshader/s | 2043 | 11 735 | 1656 |
| Common Test (vertex weighted) kvertex/s | 1524 | 3727 | 1973 |
| Common Test (fragment weighted) kfragment/s | 1634 | 3699 | 4451 |
| Common Test (balanced) kshader/s | 1065 | 4114 | 2530 |
| Geometric Test (vertex weighted) kvertex/s | 1949 | 3776 | 1316 |
| Geometric Test (fragment weighted) kfragment/s | 2081 | 6388 | 2888 |
| Geometric Test (balanced) kshader/s | 1281 | 6181 | 1628 |
| For Loop Test (vertex weighted) kvertex/s | 1671 | 3860 | 1315 |
| For Loop Test (fragment weighted) kfragment/s | 1842 | 6237 | 7271 |
| For Loop Test (balanced) kshader/s | 1275 | 3718 | 3583 |
| Branching Test (vertex weighted) kvertex/s | 3906 | 3778 | 2633 |
| Branching Test (fragment weighted) kfragment/s | 6045 | 22 557 | 3211 |
| Branching Test (balanced) kshader/s | 2106 | 11 193 | 1493 |
| Array Test (uniform array access) kvertex/s | 2918 | 3658 | 3946 |
| Triangle Test (white) ktriangle/s | 9548 | 29 957 | 12 595 |
| Triangle Test (textured, vertex lit) ktriangle/s | 7058 | 21 129 | 10 520 |