
Tegra K1’s CPU: An Updated 4+1 Cortex-A15 Design

Nvidia Tegra K1 In-Depth: The Power Of An Xbox In A Mobile SoC?

Nvidia and Samsung both utilize cores designed by ARM, while Qualcomm and Apple build their own cores using ARM’s instruction set. Tegra 4 was Nvidia’s first effort based on Cortex-A15, and it ran at clock rates up to 1.9 GHz. The company sticks with Cortex-A15 in its 32-bit Tegra K1, but makes some improvements that it claims facilitate up to 40% more performance at the same power level or allow the SoC to use 45% of the power at a specific performance point in SPECint2000.
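To put those two claims on a common footing, here is a minimal sketch that converts each into a performance-per-watt ratio relative to Tegra 4. The helper name `perf_per_watt_gain` is made up for illustration, and the inputs are simply Nvidia's quoted best-case SPECint2000 figures, not measurements.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt of the new chip versus the old one.

    perf_ratio  -- new performance divided by old performance
    power_ratio -- new power divided by old power
    """
    return perf_ratio / power_ratio


# Claim 1: up to 40% more performance at the same power level.
iso_power_gain = perf_per_watt_gain(perf_ratio=1.40, power_ratio=1.00)

# Claim 2: 45% of the power at the same performance point.
iso_perf_gain = perf_per_watt_gain(perf_ratio=1.00, power_ratio=0.45)

print(f"Same power, more performance: {iso_power_gain:.2f}x perf/watt")
print(f"Same performance, less power: {iso_perf_gain:.2f}x perf/watt")
```

The two ratios come out at roughly 1.4x and 2.2x. They need not match, since they describe different operating points, and power generally climbs faster than performance as clock rate and voltage go up.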

Nvidia's four Cortex-A15 cores share a 16-way set associative L2 cache

Those are some fairly substantial claims given the same processor generation. Yet, Nvidia says that they’re the product of three factors. First, its engineers have all of that time building Tegra 4 under their belts; although this is Cortex-A15, there are purportedly some layout-related optimizations unique to Nvidia’s implementation. Second, the shift from TSMC’s 28 nm HPL to HPM process brings dynamic power down a bit as well. Finally, ARM is on its third revision of Cortex-A15: Tegra K1 employs r3p3, whereas Tegra 4 was based on r2. According to ARM’s technical documentation, most of the changes between the two revisions are related to regional clock gating and a couple of other configurable power-saving options. Estimates put the performance/watt improvements between 5 and 10%. As a result, Nvidia reaches clock rates as high as 2.3 GHz with the quad-core Tegra K1.

The flexibility to choose between higher performance or lower consumption means that Nvidia can allocate its power budget more freely than in the prior generation, scaling back on the CPU in favor of its graphics engine, for example. In graphics-bound applications, this is precisely what you’d want to see. Doubly so given Tegra K1’s broader API support, which at least makes it technically feasible for developers to port higher-end titles down to Android-based tablets.

Far less was said about the dual-core Denver-based model, except that its wide superscalar execution pipeline should facilitate notably better threaded performance and fast single-core speed at clock rates of up to 2.5 GHz.
