Nvidia's Tegra 4 GPU: Doubling Down On Efficiency

Inside Tegra 4’s GPU

From the announcement of its first Tegra SoC back in 2008, Nvidia’s greatest advantages in the mobile segment were its background with GPUs and platforms. The company’s close relationships with game developers were destined to be a boon too, since most mobile titles are easily characterized as mainstream. Increasingly, though, ISVs are using powerful architectures to rival current-gen consoles, benefiting from more mature tools to fully exploit potent graphics engines.

Although we saw companies like Intel go after the power consumption of Nvidia's Cortex-A9-based Tegra 3, the company is eager to show its Tegra 4 as a solution designed with performance per square millimeter and, in turn, performance per watt in mind. In fact, Nvidia already has a reference phone design based on its Tegra 4i SoC called Phoenix. The two boards below fit into the 5" device and host different implementations of Tegra 4.

We already know that Tegra 4's GPU isn’t a unified shader design. Nvidia claims the time is simply not yet right to make that transition. And so, we still have separate programmable pixel and vertex shaders. The company also isn’t able to declare OpenGL ES 3.0 compatibility, though it’s emphatic this doesn’t adversely affect what developers are able to do with Tegra 4.

And so, the GPU in its newest SoC looks a lot like an evolution of Tegra 3, plus a number of improvements.

|  | Tegra 4 | Tegra 4i | Tegra 3 |
| --- | --- | --- | --- |
| Vertex Processing Engines | 6 | 3 | 1 |
| Pixel Pipes | 4 | 2 | 2 |
| MADs | 72 | 60 | 12 |
| Clock Rate | 672 MHz | 660 MHz | 416/520 MHz |
| Fill Rate | 2.68 Gpix/s | 1.32 Gpix/s | 1.04 Gpix/s |
| Memory Interface | 2 x 32-bit | 1 x 32-bit | 1 x 32-bit |
| Memory Support | DDR3L-1866, LPDDR3-1866 | DDR3L-1866, LPDDR3-2133 | DDR3-1600, LPDDR2-1066 |
| Manufacturing | 28 nm | 28 nm | 40 nm |
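The fill-rate row appears consistent with a simple model in which each pixel pipe writes one pixel per clock. That model is our assumption rather than something Nvidia states here, and the function and dictionary names in this minimal sketch are purely illustrative. For Tegra 3 we use the higher 520 MHz clock.

```python
# Sketch: reproduce the table's fill-rate row.
# Assumption (not stated by Nvidia): one pixel per pipe per clock.

def fill_rate_gpix(pixel_pipes: int, clock_mhz: float) -> float:
    """Peak fill rate in Gpix/s for a given pipe count and GPU clock."""
    return pixel_pipes * clock_mhz / 1000.0

# (pixel pipes, GPU clock in MHz), taken from the table
SOCS = {
    "Tegra 4": (4, 672),
    "Tegra 4i": (2, 660),
    "Tegra 3": (2, 520),  # higher of the two listed clocks
}

for name, (pipes, mhz) in SOCS.items():
    print(f"{name}: {fill_rate_gpix(pipes, mhz):.3f} Gpix/s")
```

Under that assumption, Tegra 4 works out to 2.688 Gpix/s, which the table lists as 2.68, and the other two columns match the table directly.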

Tegra 3 employs a single vertex shading unit with four FP32-capable cores. It also includes two fragment pipes, each with four cores capable of FP20 precision. Those four vertex and eight pixel shaders are how we come to call Tegra 3’s GPU a 12-core design.

In contrast, Tegra 4 has six vertex processing engines with four “cores” each. Factor in clock rate differences (using 672 MHz for Tegra 4 and 520 MHz for Tegra 3), and that adds up to about 7.75x more vertex shading performance this generation.

Its four pixel pipes contain 12 shader “cores” each (that’s three ALUs per pipe, and four multiply-add units per ALU), adding up to 48, compared to Tegra 3’s eight. Assuming the same clock rates as above (672 MHz versus 520 MHz), you’re again looking at 7.75x more fragment shader performance.
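The arithmetic behind both 7.75x figures can be sketched as follows. This assumes, as the estimates above do, that shader throughput scales linearly with core count times clock rate, which is a simplification. The constant and function names are ours, chosen for illustration.

```python
# Sketch: theoretical shader scaling from Tegra 3 to Tegra 4.
# Assumption: throughput scales linearly with (cores x clock).

T3_VERTEX = 1 * 4        # one vertex unit with four cores
T3_PIXEL = 2 * 4         # two fragment pipes, four cores each
T4_VERTEX = 6 * 4        # six vertex engines, four cores each
T4_PIXEL = 4 * 3 * 4     # four pipes, three ALUs each, four MADs per ALU

T3_MHZ = 520
T4_MHZ = 672

def speedup(new_cores: int, old_cores: int) -> float:
    """Ratio of (cores x clock) between Tegra 4 and Tegra 3."""
    return (new_cores / old_cores) * (T4_MHZ / T3_MHZ)

print("Tegra 3 cores:", T3_VERTEX + T3_PIXEL)
print("Tegra 4 cores:", T4_VERTEX + T4_PIXEL)
print(f"Vertex speedup: {speedup(T4_VERTEX, T3_VERTEX):.2f}x")
print(f"Pixel speedup:  {speedup(T4_PIXEL, T3_PIXEL):.2f}x")
```

Both ratios come out identical because the vertex and pixel core counts each grow sixfold, and the clock ratio of 672/520 supplies the rest. The core totals also line up with the MADs row of the table (72 and 12).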

Chris Angelini
Chris Angelini is an Editor Emeritus at Tom's Hardware US. He edits hardware reviews and covers high-profile CPU and GPU launches.
  • s3anister
    I'm always amazed with the progress made in strides in this ultra-competitive sector so it's nice to see Nvidia finally hit 28 nm with Tegra 4. I'm sure some of their performance gains can be attributed to this.
  • levin70
    Charlie at semiaccurate is correct. The Tegra 4 is DOA. Almost no one will be using it. Everyone else is already ahead of where the T4 is today, and it hasn't even launched. How many design wins were noted? 1?

    Yeah, says it all.
  • Memnarchon
    A Sunday article? :O
  • deedee2die4
    Nvidia staying on top, the best of the best!
  • blazorthon
    deedee2die4 wrote: "Nvidia staying on top, the best of the best!"
    Uhh, no... T4 isn't supposed to be out for like six months, yet it's already not as fast as some of Qualcomm's latest. Nvidia is improving, but as usual, they're staying a little behind in technology.
  • aicom
    levin70 wrote: "Charlie at semiaccurate is correct. The Tegra 4 is DOA. Almost no one will be using it. Everyone else is already ahead of where the T4 is today, and it hasn't even launched. How many design wins were noted? 1? Yeah, says it all."
    Nobody is ahead of Tegra's four Cortex A15 cores. Krait is at less performance than A15 (until the refresh at least). Samsung's got Exynos 5 Octa, but that's not out yet either and T4 will probably still top it in the GPU performance department. Speaking of which, Tegra 4 has the most powerful GPU in floating-point of anyone (including the iPad 4) with 74.8 TFLOPS @ 672 MHz. It only takes an 825 MHz Cortex A15 to match a 1.6 GHz A9, and Tegra 4 is supposed to ship at 1.9 GHz. Unfortunately, TDP does go up in the process.

    You also have to look at where these parts are targeted. Krait is really gunning for phone design wins and they have many. It's a very power efficient chip that found its way into some very nice phones. Tegra 4 is not aimed at that market; Tegra 4i is. Tegra 4 will have a much higher TDP than 4i (and Krait) and will get substantially higher performance as a result.
  • tjosborne
    Hey guys, I am considering getting an Asus Transformer Prime tablet with the Tegra 3. Would it be best to wait till this processor ends up in a tablet to get one?
  • So at 1.3 Gpix/s, Nvidia has just admitted to 10x overdraw...per second? So we're looking at 9~10 frames per second on high-res displays. Lag lives on.
  • PreferLinux
    aicom wrote: "Nobody is ahead of Tegra's four Cortex A15 cores. Krait is at less performance than A15 (until the refresh at least). Samsung's got Exynos 5 Octa, but that's not out yet either and T4 will probably still top it in the GPU performance department. Speaking of which, Tegra 4 has the most powerful GPU in floating-point of anyone (including the iPad 4) with 74.8 TFLOPS @ 672 MHz. It only takes an 825 MHz Cortex A15 to match a 1.6 GHz A9, and Tegra 4 is supposed to ship at 1.9 GHz. Unfortunately, TDP does go up in the process. You also have to look at where these parts are targeted. Krait is really gunning for phone design wins and they have many. It's a very power efficient chip that found its way into some very nice phones. Tegra 4 is not aimed at that market; Tegra 4i is. Tegra 4 will have a much higher TDP than 4i (and Krait) and will get substantially higher performance as a result."
    You mean Gigaflops, not Teraflops.
  • blazorthon
    aicom wrote: "Nobody is ahead of Tegra's four Cortex A15 cores. Krait is at less performance than A15 (until the refresh at least). Samsung's got Exynos 5 Octa, but that's not out yet either and T4 will probably still top it in the GPU performance department. Speaking of which, Tegra 4 has the most powerful GPU in floating-point of anyone (including the iPad 4) with 74.8 TFLOPS @ 672 MHz. It only takes an 825 MHz Cortex A15 to match a 1.6 GHz A9, and Tegra 4 is supposed to ship at 1.9 GHz. Unfortunately, TDP does go up in the process. You also have to look at where these parts are targeted. Krait is really gunning for phone design wins and they have many. It's a very power efficient chip that found its way into some very nice phones. Tegra 4 is not aimed at that market; Tegra 4i is. Tegra 4 will have a much higher TDP than 4i (and Krait) and will get substantially higher performance as a result."
    S4 Pro is a faster CPU IIRC. IDK about how the graphics compares and won't comment about it.

    Nvidia, like I said, is getting better, but they're still going to be a little behind. They're making up a lot of ground here, especially with how they're making Tegra 4 and Tegra 4i instead of a single SoC to take both places, but they seem like they'll still have a little room to make up, at least in CPU performance, to be the best. Like I said before (at least in other articles about it), they'll still be near the top either way.