On MediaTek's Deca-Core Mobile Chip Strategy

MediaTek has been making a bit of noise in the mobile market of late, particularly with its recently announced deca-core Helio X20 chip. Sure, there's a sort of "core race" in the mobile market (I see your quad-core and raise you octa-core!), but on the surface, putting ten cores on one chip seems almost childlike in its one-upmanship. "More cores equals a better chip, a better chip equals a faster phone, and we win," seems to be the message.

Indeed, MediaTek has felt the blowback from naysayers who see its core-happy chip design that way. Even the more midrange P10, which the company announced at Computex, and the Helio X10 are brimming with eight cores each.

But Mohit Bhushan, MediaTek's VP and GM of Marketing and Business Development, told us in an interview that those detractors don't understand the point of the design.

Think Three Clusters, Not Ten Cores

Fundamentally, Bhushan said, the X20 is not about ten cores; it's about three clusters of cores that each serve a different purpose.

"Really, you're not running 10 processors at the same time. That's a very important part. Essentially, what we did is take the concept of big.LITTLE, and we stretched it a bit. You have the performance processors, and the power-conserving processors, and you pick which breed you want to turn on at any given time."

He explained that this is a tri-frequency architecture with three frequency planes -- 2.5 GHz, 2.0 GHz, and 1.4 GHz -- which are spaced out by about half a GHz. MediaTek populated each of those planes with processors.

The 2.5 GHz plane has two ARM Cortex-A72 cores, and the 2.0 GHz and 1.4 GHz planes each have four Cortex-A53 cores. (That's 2+4+4, for 10 total cores.) "Now, could we have put two A72s, two A53s, and two A53s [clocked lower] and done six [cores]?" he asked. "Sure, we could have done that. Or could we have done 2+3+3, and done 8 [cores]? Sure, we could have done that."
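
The core-count arithmetic Bhushan walks through can be sketched in a few lines of Python. The `Cluster` type and the helper names are our own illustration, not anything MediaTek publishes; only the core types, counts, and clocks come from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One frequency plane: a core type, a core count, and a clock in GHz."""
    core: str
    count: int
    ghz: float


# The X20 layout as described in the interview (2+4+4).
X20 = (
    Cluster("Cortex-A72", 2, 2.5),
    Cluster("Cortex-A53", 4, 2.0),
    Cluster("Cortex-A53", 4, 1.4),
)


def total_cores(clusters):
    """Sum the core counts across all clusters."""
    return sum(c.count for c in clusters)


def with_small_cluster_size(clusters, n):
    """Return the same layout with every A53 cluster resized to n cores."""
    return tuple(
        Cluster(c.core, n, c.ghz) if c.core == "Cortex-A53" else c
        for c in clusters
    )


print(total_cores(X20))                              # 2 + 4 + 4
print(total_cores(with_small_cluster_size(X20, 3)))  # 2 + 3 + 3
print(total_cores(with_small_cluster_size(X20, 2)))  # 2 + 2 + 2
```

The three planes stay the same in every variant; only the population of the two A53 planes changes, which is the trade-off Bhushan describes.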

But, he said, the secondary idea here is that each cluster can offer the full processing experience for different tiers. For example, quad-core SoCs are the new norm; that's why the mid- and low-tier clusters are effectively each quad-core A53s. (MediaTek would have done the same with the top cluster, but the physical space required by the A72s pushed it to stick with a dual-core option for now.)

"Now let's talk use case," Bhusan continued. "You use Facebook all the time, you touch the icon on the desktop, it launches the app, and it shows you the feed, and you start scrolling through posts. All that stuff kicks off on the middle [frequency plane]. It's neither hot nor cold. And then if you pause, and start reading/liking/commenting, it uses the lowest one, 1.4 GHz. And then you turn on a video or a game in [Facebook], it goes to the high [frequency plane]."

"In just one app, what we did is optimize Facebook to run on three frequencies. Why do we do that? Power. Power is the main reason why." He said that with the three-frequency plane versus a two-frequency configuration, "We are getting 30 percent extra power benefit for doing the same app, same process, same everything."

Coherent Cache And Fixed-Point Hardware

Bhushan continued, "Part of this scheme is that you have to keep the caches coherent. There was a lot of work that went into, 'How do you keep cache coherency across three clusters?' And then once you get the caches coherent, you have to have the ability to turn on/off the cluster as you deem fit and also share the clusters with other components of the chip, like the GPU."

"That's where CorePilot comes in," he added. "CorePilot is like a scheduler, which is looking at the underlying hardware, looking at the input queue of tasks that keep coming, because users keep touching the smartphone, kicking off the processes. Those are the things that went into designing this chip."

When we asked, Bhushan also noted that much of the work done by the X20 is accomplished by fixed-point hardware, which handles items such as audio, video decoding, and more.

"Let's say you're gaming or watching an intense 1080p video," said Bhushan. "The processor is where the app is running, but when it has to do the media decoding, it's being done in the hardware. So it's not just all CPU-driven. The mobile industry has been relying on dedicated hardware for multimedia for quite some time now. And it keeps getting better."

Efficiency Is The Key

It may seem counter-intuitive considering all the cores MediaTek crammed onto the X20, but its main goal is efficiency. Again, hearkening back to the Tri-Cluster approach, that makes some sense: If you're engaging in a low-demand activity, the chip will use the lowest-necessary cluster. If you need more oomph, you can get it with the most powerful cluster. This way, you get the computing power you need from, ostensibly at least, the most efficient cluster for the job.

Bhushan likens this paradigm to gears on a car. It's silly to drive 10 MPH in third gear, just as it's not feasible to accelerate onto a roadway stuck in first. You use the gear that's most effective for the speed you're driving.

That all sounds well and good, but there must be an inherent inefficiency in all that switching. Bhushan admitted that's certainly the case, but said there's still a net gain that makes this paradigm practical. He said that MediaTek has profiled several apps -- Facebook, Gmail, Skype, and a few Chinese apps -- and compared the performance between chips using the Tri-Cluster setup and those using a dual-cluster approach. At an overall system level, said Bhushan, "We are finding -- from kickoff to usage to going on and doing something else -- we're finding 20-25 percent power savings, despite the switching cost."
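
To see how a switching penalty can eat into a saving without wiping it out, here is a back-of-the-envelope model in Python. Everything in it is our assumption rather than MediaTek's data: the function names, the arbitrary energy units, and the sample inputs, which we picked only so that the result lands inside the range Bhushan quoted.

```python
def net_savings(dual_energy, tri_run_energy, switches, energy_per_switch):
    """Fraction of energy saved versus a dual-cluster baseline.

    The tri-cluster total is its run energy plus a fixed cost per
    cluster switch. All quantities use the same arbitrary energy unit.
    """
    if dual_energy <= 0:
        raise ValueError("dual_energy must be positive")
    tri_total = tri_run_energy + switches * energy_per_switch
    return 1.0 - tri_total / dual_energy


def break_even_switches(dual_energy, tri_run_energy, energy_per_switch):
    """Number of switches at which the net saving drops to zero."""
    if energy_per_switch <= 0:
        raise ValueError("energy_per_switch must be positive")
    return (dual_energy - tri_run_energy) / energy_per_switch


# Illustrative inputs only (not measurements): the baseline uses 100 units,
# the tri-cluster run uses 70, and each of 40 switches costs 0.125 units.
saving = net_savings(100.0, 70.0, switches=40, energy_per_switch=0.125)
print(f"net saving: {saving:.0%}")
print(f"break-even switches: {break_even_switches(100.0, 70.0, 0.125):.0f}")
```

In this toy model the saving shrinks linearly with the number of switches, so the scheme only pays off if the scheduler keeps transitions well below the break-even count.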

We asked where MediaTek saw itself penetrating deeper into the smartphone space -- on high-end flagships, low-end new-market devices, or somewhere in the middle -- but Bhushan turned the question around, pointing away from that high/middle/low conversation and aiming at what MediaTek is really concerned with, which is the user experience.

(We infer that MediaTek believes its three gears can satisfy users at all performance levels.)

Bhushan talked first about battery issues. "The battery has to last longer. You simply cannot afford to have your battery run out in the middle of the day. Batteries need to now last at least two or three days," he said.

"We had to innovate on how to conserve power, with Tri-Cluster as an example. There are more things being done in future chips, like obviously going to the next node, which is always the right thing to do. We're also looking at interesting technologies on components that suck less power, circuit technologies right on the board, which really improve line losses and how you design boards. So there's a strong R&D effort underway," he added.

MediaTek expects the Helio X20 to ship in consumer devices by the end of the year. When that happens, we'll look forward to putting the company's claims to the test.

Seth Colaner is the News Director at Tom's Hardware.

  • Achoo22
    No, you nailed it when you labeled it one-upmanship. They just wanna' stuff this into a phone and call it a 9g experience.
  • canadianvice
    That's nice, but until MediaTek starts being honest and honouring their licensing obligations instead of parasitising the work of others, I have no interest in that company and I would urge people to boycott them.

    Edit: For those of you who may not know, MediaTek does not observe GPL compliance and has a very poor record with the Android development community. Condoning this practice encourages it in others and makes it harder to contribute to a better running platform for all of us. That's why I say this.
  • TechyInAZ
    Reminds me of the NVidia tegra CPUs.

    Sure it's a 10 core cpu, but having extra cores that aren't being used seems unnecessary. I'd prefer just using 4 powerful cores, and clocking them for the appropriate usage.
  • de5_Roy
    what's the fabrication process for helio x20 ? afaik the a72 core was optimized for finfet process.

    it's a nice idea on paper. but there's gotta be some delay in task switching when the processors are stalled yet sipping power. and if the scheduler (both OS and the CPU) can't keep up, it'll turn the whole event into a power drain instead. e.g. the os sticks a certain process on the high cluster (dual a72 running at 2.2 GHz).
  • sam1275tom
    Can't they just adjust the frequency instead of putting so many unused cores in???
  • ZolaIII
    Big.LITTLE with two clusters is stupid, but this is super stupid. Even if they made a shared coherent L2 cache (which I doubt) between the 3 clusters, thread migration would eat any possible benefits on small, short-running tasks. Then again, having 4 more cores, even running in active idle, will certainly consume more power, not to mention all of them shifting through the voltage table. And again, 1.4GHz for the "slower" small cluster is not really the green power limit; 1GHz is. So all of this is super stupid.
    Just to mention that a Cortex A72 @ 1GHz would use approximately the same amount of power while delivering the same amount of performance as an A53 @ 1.4GHz.
    Now, the proper way to do it for now would be with two clusters & adding a time-based condition loop to the scheduler to steady frequency transitions.
    Don't get me wrong, I do think there is space for, let's say, two more general-purpose cores, but microcontroller-class ones that would do the light tasks while the device is inactive & the rest are in a deep sleep state, & when switched on they can be used for offloading the main ones from peripheral tasks, for instance acting as a storage controller. Problem is ARM still hasn't done any V8-compatible (64-bit) microcontroller design...
  • Quixit
    There is a reason that all last generation phones picked Qualcomm's Snapdragon 800 over all the big.little configurations. The idea has marginal benefits over a dynamic clock speed, a lot of restrictions and wastes die space. Adding an extra, middle tier is very stupid; when are you even using a middling load long-term? Low performance with spikes of high performance at least makes some theoretical sense.

    P.S. Relating the power efficiency of CPU cores to automobiles is fallacious and I can't believe you didn't call them out on that.
  • jaber2
    Sure more memory would be nice but who would use more than 256k?
  • Tibeardius
    This is the simple approach to building a better core. Use 3 groups of cores that are only good for one thing and then switch between them. I'm not saying it won't/doesn't work, but it isn't a very elegant solution. Creating a more flexible core that knows when to power up, power down, idle etc, is what we need. I guess they don't have the money for R&D.
  • PaulBags
    How about putting bigger batteries in phones instead of "innovating" work arounds?