ARM Cortex-A72 Architecture Deep Dive

ARM's Cortex-A72 CPU adds power and performance optimizations to the previous A57 design. Here's an in-depth look at the changes to each stage in the pipeline, from better branch prediction to next-gen execution units.

Introduction

ARM announced the Cortex-A72, the high-end successor to the Cortex-A57, near the beginning of 2015. For more than a year now, SoC vendors have been working on integrating the new CPU core into their products. Now that mobile devices using the A72 are imminent, it’s a good time to discuss what makes ARM’s flagship CPU tick.

With the A57, ARM looked to expand the market for its CPUs beyond mobile devices and into the low-power server market. Using a single CPU architecture for both smartphones and servers sounds unreasonable, but according to ARM’s Mike Filippo, lead architect for the A72, high-end mobile workloads put a lot of pressure on caches, branch prediction, and the translation lookaside buffer (TLB), which are also important for server workloads. Where the A57 seemed skewed towards server applications based on its power consumption, the A72 takes a more balanced approach and looks to be a better fit for mobile.

The Cortex-A72 is an evolution of the Cortex-A57; the baseline architecture is very similar. However, ARM tweaked the entire pipeline for better power and performance. Perhaps the A57’s biggest weakness was its relatively high power consumption, especially on the 20nm node, which severely limited sustained performance in mobile devices, relegating it to short, bursty workloads and forcing SoCs to use the lower-performing Cortex-A53 cores for extended use.

ARM looks to correct this issue with the A72, going back and optimizing nearly every one of the A57's logical blocks to reduce power consumption. For example, ARM realized a 35-40% reduction in dynamic power for the decoder stage, and by using an early IC tag lookup, the A72's 3-way L1 instruction and 2-way L1 data caches consume power comparable to direct-mapped caches. According to ARM, all of these changes add up to roughly a 15% reduction in energy use relative to the A57 when both cores run the same workload at the same frequency on the same 28nm process. The reduction is even more significant on a modern FinFET process such as TSMC's 16nm FinFET+, where ARM says an A72 core stays within a 750mW power envelope at 2.5GHz.

[Image Source: Hiroshige Goto PC Watch]

MORE: Best Smartphones
MORE: How We Test Smartphones & Tablets
MORE: All Smartphone Content
MORE: All Tablet Content

Architecture Overview

Instruction Fetch

The A72 sees improvements to performance too, starting with a much improved branch prediction algorithm. ARM’s performance modeling group is continuously updating the simulated workloads fed to the processor, and these affect the design of the branch predictor as well as the rest of the CPU. For example, based on these workloads, ARM found that instruction offsets between branches are often close together in memory. This allowed it to make certain optimizations to its dynamic predictor, like enabling the Branch Target Buffer (BTB) to hold anywhere from 2000 large branches to 4000 small branches.
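As a rough sketch of the idea (not ARM's actual hardware), a branch target buffer can be modeled as a fixed storage budget in which a branch with a nearby target needs only a short offset while a distant one needs a full target, so the same structure holds roughly twice as many small branches as large ones. All sizes below are invented for illustration:

```python
# Toy branch target buffer (BTB): a fixed slot budget holds either many
# "small" branches (short target offsets, 1 slot) or fewer "large" ones
# (full targets, 2 slots). Illustrative only; not ARM's design.

class ToyBTB:
    def __init__(self, slots=4096):
        self.slots = slots
        self.table = {}          # pc -> (target, slots used)
        self.used = 0

    def insert(self, pc, target):
        # A target within a 16-bit offset is "small"; otherwise "large".
        size = 1 if abs(target - pc) < (1 << 15) else 2
        if self.used + size > self.slots:
            return False         # capacity reached
        self.table[pc] = (target, size)
        self.used += size
        return True

    def predict(self, pc):
        entry = self.table.get(pc)
        return entry[0] if entry else None

btb = ToyBTB(slots=8)
btb.insert(0x1000, 0x1040)       # nearby target -> small entry (1 slot)
btb.insert(0x2000, 0x900000)     # distant target -> large entry (2 slots)
print(btb.predict(0x1000))       # hit: 0x1040
print(btb.predict(0x3000))       # miss: None
```

The same budget trade-off is why ARM can quote a range (2,000 large to 4,000 small branches) rather than a single capacity.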

Because real-world code tends to include many branch instructions, branch prediction and speculative execution can greatly improve performance, something that's usually not exercised by synthetic benchmarks. Better branch prediction usually costs more power, however, at least in the front-end. This is at least partially offset by fewer mispredictions, which save power on the back-end by avoiding pipeline flushes and wasted clock cycles. ARM also saves power by shutting off the branch predictor in situations where it's unnecessary. In many code blocks, for instance, the distance between branches exceeds the 16-byte instruction window the predictor examines, so it makes sense to shut it down: the predictor cannot hit a branch in that section of code.
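The gating decision described above can be sketched in a few lines; the 16-byte window comes from the text, while the addresses are hypothetical:

```python
# Sketch: decide whether the branch predictor can be gated off for a
# fetch window. If no known branch lies inside the 16-byte window,
# predicting is pointless and power can be saved. Illustrative only.

WINDOW = 16  # bytes the predictor examines per fetch

def predictor_needed(window_start, branch_addresses):
    """True if any known branch falls inside this fetch window."""
    return any(window_start <= pc < window_start + WINDOW
               for pc in branch_addresses)

branches = [0x100, 0x180]                 # hypothetical branch locations
print(predictor_needed(0x100, branches))  # True: branch at window start
print(predictor_needed(0x140, branches))  # False: gate the predictor off
```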

Decode / Rename

The rest of the front-end is still in-order with a 3-way decoder just like the A57. However, unlike the A57’s decoder, which decodes instructions into micro-ops, the A72 decodes instructions into macro-ops that can contain multiple micro-ops. It also supports AArch64 instruction-fusion within the decoder. These macro-ops get “late-cracked” into multiple micro-ops at the dispatch level, which is now capable of issuing up to five micro-ops, up from three on the A57. ARM quotes a 1.08 micro-ops per instruction ratio on average when executing common code. Improving the A72’s front-end bandwidth helps keep the new lower-latency execution units fed and also reduces the number of cases where the front-end bottlenecks performance.
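A toy model of late cracking, with invented macro-op contents, shows how a 3-wide decoder feeding a 5-wide dispatch stage keeps up even when some macro-ops expand to two micro-ops:

```python
# Sketch of "late cracking": the decoder emits macro-ops, and dispatch
# expands each into its micro-ops, up to a per-cycle dispatch limit.
# The widths come from the text; the macro-op contents are invented.

DECODE_WIDTH = 3    # macro-ops per cycle
DISPATCH_WIDTH = 5  # micro-ops per cycle

def dispatch(macro_ops):
    """Cycles needed to crack and dispatch all macro-ops."""
    queue = list(macro_ops)  # macro-ops waiting to decode
    pending, cycles = [], 0  # micro-ops waiting to dispatch
    while queue or pending:
        for _ in range(DECODE_WIDTH):
            if queue:
                pending.extend(queue.pop(0))  # crack macro-op -> micro-ops
        pending = pending[DISPATCH_WIDTH:]    # dispatch up to 5 micro-ops
        cycles += 1
    return cycles

# Six macro-ops, two of which crack into two micro-ops (8 micro-ops total).
program = [["add"], ["mul"], ["ldr", "add"], ["sub"], ["ldr", "add"], ["str"]]
print(dispatch(program))  # 2 cycles
```

At ARM's quoted 1.08 micro-ops per instruction, a full 3-wide decode produces about 3.24 micro-ops per cycle on average, comfortably inside the 5-wide dispatch limit.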

Dispatch / Retire

As explained above, the A72 is now able to dispatch five micro-ops into the issue queues that feed the execution units. The issue queues still hold a total of 66 micro-ops like they did on the A57—eight entries for each pipeline except the branch execution unit queue that holds ten entries. While queue depth is the same, the A72 does have an improved issue-queue load-balancing algorithm that eliminates some additional cases of mismatched utilization of the FP/Advanced SIMD units.
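The quoted queue sizes are self-consistent if one assumes eight pipelines, which matches the A72's eight-wide issue: seven eight-entry queues plus the ten-entry branch queue give exactly 66 entries:

```python
# Consistency check of the issue-queue sizes quoted in the text:
# eight entries per pipeline, except a ten-entry branch queue,
# across an assumed eight pipelines.

PIPELINES = 8
BRANCH_QUEUE = 10
OTHER_QUEUE = 8

total = (PIPELINES - 1) * OTHER_QUEUE + BRANCH_QUEUE
print(total)  # 66, matching the A57/A72 figure
```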

The A72's dispatch unit also sees significant area and power reductions from a reorganization of the architectural and speculative register files. ARM also reduced the number of register read ports through port sharing. This matters because the read ports are very sensitive to timing and require larger gates to enable higher core frequencies, which has a second-order effect on area and power by pushing other components further away. In total, the dispatch unit sees a 10% reduction in area with no adverse effect on performance.

Execution Units

Moving to the out-of-order back-end, the A72 can still issue eight micro-ops per cycle like the A57, but execution latency is significantly reduced because of the next-generation FP/Advanced SIMD units. Floating-point instructions see up to a 40% latency reduction over the A57 (5-cycle to 3-cycle FMUL), and the new Radix-16 FP divider doubles bandwidth. The changes to the integer units are less extensive, but they also make the change to a Radix-16 integer divider (doubling bandwidth over A57) and a 1-cycle CRC unit.
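The arithmetic behind these claims is straightforward. The FMUL figures come from the text, while the radix-4 baseline for the previous divider is an assumption (a radix-16 divider retires four quotient bits per cycle versus two for radix-4):

```python
# Arithmetic behind the execution-unit claims. The FMUL latencies are
# from the text; the radix-4 baseline for the old divider is an
# assumption used to show where "doubled bandwidth" comes from.

fmul_a57, fmul_a72 = 5, 3                     # cycles
latency_reduction = (fmul_a57 - fmul_a72) / fmul_a57
print(f"FMUL latency reduction: {latency_reduction:.0%}")  # 40%

mantissa_bits = 53                            # double precision
radix4_cycles = mantissa_bits / 2             # 2 quotient bits per cycle
radix16_cycles = mantissa_bits / 4            # 4 quotient bits per cycle
print(radix4_cycles / radix16_cycles)         # 2.0 -> bandwidth doubled
```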

Reducing pipeline length improves performance directly by reducing latency and indirectly by easing pressure on the out-of-order window. So even though the instruction reorder buffer remains at 128 entries like the A57, it now has more opportunity to extract instruction-level parallelism (ILP).

Expanded zero-cycle forwarding (all integer and some floating-point instructions) further reduces execution latency. This technique allows dependent operations, where the output of one instruction is the input to the next, to follow each other directly in the pipeline without a wait of one or more cycles between them. This is an important optimization for cryptography.
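A quick back-of-the-envelope model shows why forwarding matters for serial code such as a cryptographic round chain; the cycle counts are illustrative, not ARM's:

```python
# Sketch: effect of result forwarding on a dependent chain, where each
# instruction consumes the previous one's output. Cycle counts are
# illustrative, not taken from ARM documentation.

def chain_cycles(n_ops, exec_latency=1, forward_penalty=0):
    """Total cycles for n back-to-back dependent operations."""
    return n_ops * (exec_latency + forward_penalty)

deps = 8  # e.g. a serial hash/crypto round chain
print(chain_cycles(deps, forward_penalty=0))  # zero-cycle forwarding: 8
print(chain_cycles(deps, forward_penalty=1))  # 1-cycle bypass stall: 16
```

Because every operation in such a chain depends on its predecessor, there is no parallelism to hide the bypass delay, so removing it cuts the chain's runtime directly.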

Load / Store

The load/store unit also sees improvements to both performance and power. One of the big changes from A57 is a move away from separate L1 and L2 prefetchers to a more sophisticated combined prefetcher that fetches from both the L1 and L2 caches. This improves bandwidth while also reducing power. There are additional power optimizations to the L1-hit pipeline and forwarding network too.

The L2 cache sees significant optimizations for higher bandwidth workloads. Memory streaming tasks see the biggest benefit, but ARM says general floating-point and integer workloads also see an increase in performance. A new cache replacement policy increases hit rates in the L2, which again improves performance and reduces power overall.
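The effect of a replacement policy alone can be seen even in a toy cache model: on the same access stream, keeping a hot line resident (LRU-style recency) beats evicting purely by age (FIFO). The policies and the stream below are invented and are not ARM's actual L2 policy:

```python
# Toy cache simulator comparing two replacement policies on the same
# access stream. Invented for illustration; not ARM's L2 policy.

from collections import OrderedDict

def hits(accesses, capacity, policy="lru"):
    cache, hit = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hit += 1
            if policy == "lru":
                cache.move_to_end(addr)    # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the oldest entry
            cache[addr] = True
    return hit

# A hot line (0) interleaved with a streaming scan.
stream = [0, 1, 2, 0, 3, 4, 0, 5, 6, 0]
print(hits(stream, capacity=3, policy="lru"))   # 3 hits
print(hits(stream, capacity=3, policy="fifo"))  # 2 hits
```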

Another optimization increases parallelism in the table-walker hardware for the memory management unit (MMU), responsible for translating virtual memory addresses to physical addresses among other things. Together with a lower-latency L2 TLB, performance improves for programs that spread data across several data pages such as Web browsers.
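A sketch of why TLB misses are expensive: each miss triggers a multi-level table walk of dependent memory accesses, so parallel walkers and a faster L2 TLB pay off when a workload hops across many pages. The two-level, 32-bit layout below is a simplifying assumption:

```python
# Sketch of a two-level page-table walk: each TLB miss costs one
# dependent lookup per level, which is why walker parallelism and a
# faster L2 TLB matter for page-hopping workloads like browsers.
# The 4KB page / two-level 32-bit layout is a simplifying assumption.

PAGE = 4096
L1_SHIFT, L2_SHIFT = 22, 12       # toy two-level split of a 32-bit VA

def walk(page_tables, vaddr):
    """Return (physical address, memory accesses performed)."""
    l1_idx = vaddr >> L1_SHIFT
    l2_idx = (vaddr >> L2_SHIFT) & 0x3FF
    l2_table = page_tables[l1_idx]          # 1st dependent access
    frame = l2_table[l2_idx]                # 2nd dependent access
    return frame * PAGE + (vaddr & (PAGE - 1)), 2

tables = {0: {1: 7}}                        # VA page 1 -> frame 7
paddr, accesses = walk(tables, 0x1ABC)
print(hex(paddr), accesses)                 # 0x7abc 2
```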

Final Thoughts

At a high level, the A72 looks nearly identical to the A57, but at a lower level there are a significant number of changes throughout the entire pipeline that make the A72 a worthwhile upgrade. The most notable changes affecting performance are the improved branch prediction, increased dispatch bandwidth, lower-latency execution units, and higher-bandwidth L2 cache. All of these enhancements, and many more we did not discuss here, lead to better performance: between 16% and 50% across a range of synthetic benchmarks, according to ARM. Real-world gains will be smaller, of course, but the A72 is definitely an improvement over the A57, especially for floating-point workloads.

The A72 is not a pure performance play, however. ARM is targeting a much higher power efficiency with this architecture than with any previous high-end CPU core. It’s clear from the lengthy list of optimizations discussed above that reducing power consumption was paramount; many of the changes are purely focused on power with no net performance gain.

Reducing power and area—the A72 achieves a 10% core area reduction overall—obviously has a positive effect on battery life and cost, but it has a secondary effect on performance too. Normally, reducing latency in the execution units puts pressure on the max attainable core frequency due to increased circuit complexity and tighter timing windows; however, the A72’s power and area optimizations elsewhere, not to mention the move to FinFET, actually allow the A72 to reach a slightly higher frequency. Reducing power also reduces thermal load, allowing for higher sustained performance, something the A57 struggles with at 20nm.

The Cortex-A72 may not be a revolutionary design that catapults it above Apple’s Twister CPU in the A9 SoC for single-core performance or undercuts the A53 in power consumption, but it’s a significant update nonetheless, addressing the A57’s issues by enabling higher peak and sustained performance while using less power.


Update, 1/12/16, 10:55am PT: Clarified how the Decode/Rename/Dispatch pipeline works and added some information about the issue queues.

Matt Humrick is a Staff Editor at Tom's Hardware, covering Smartphones and Tablets. Follow him on Twitter.


  • Aspiring techie
    I wonder what would happen if these guys dipped their toe into the desktop CPU market.
    1
  • utroz
    Quote:
    I wonder what would happen if these guys dipped their toe into the desktop CPU market.
Well, no current Windows support, so you would need to run Linux or another ARMv8-compatible OS. For people that just watch netflix, check facebook and other web pages, and type up a few papers for school or work, an ARM cpu would have plenty of CPU performance.
    0
  • InvalidError
    Anonymous said:
    I wonder what would happen if these guys dipped their toe into the desktop CPU market.

    Not much.

    Changing instruction set does not magically free the architecture from process limitations nor remove bottlenecks from software architecture. If ARM designed a CPU core specifically for desktop, it would hit most of the same performance scaling bottlenecks x86 has. You would likely end up with a 50W ARM chip being roughly even with a 50W Intel chip, the main difference between the two - aside from the ISA - being that the ARM chip is $100 while the Intel chip is $400.
    1
  • somebodyspecial
    And I think that's his point...The PRICE and how much of the public at large such a machine could get. Myself I can't wait until they put out a full desktop chip with heatsink/fan big psu, HD or SSD, 16-32GB mem etc (hopefully with an optional slot for discrete gpu when desired). As games amp up on ARM you'd most likely only be missing WINDOWS/x86 if you use pro apps stuff. Unreal4/Unity5 etc will provide nice graphics for games on the ARM side (most engines port easily today and get even better with Vulkan coming), so only pro apps would be left off for years and some of the big ones (adobe etc) might put out full apps soon anyway. Take off $200-300 for cpu and $100 for windows and I'm guessing an ARM desktop could do quite a bit of damage to WINTEL.

We see people already opting for chromebooks, tablets etc as PC's. Knock a chunk off desktop prices and you'll gain some users and push devs past mobile on arm. I'm hoping NV builds such a box at some point (just a much bigger Shield TV box really), but with multiple OS's (steamos, linux, and android) or at least a way to do it yourself. That would be a pretty versatile box ;) It isn't so much about if they BEAT intel, as it is about dropping the price of PC's everywhere. If ARM's side wants to grow much more they have to go to desktops. Surely everyone wants a chunk of Intel's ~13B a year.
    0
  • viewtyjoe
    Quote:
    If ARM's side wants to grow much more they have to go to desktops. Surely everyone wants a chunk of Intel's ~13B a year.


    ARM makes their money by licensing out their designs to other companies which manufacture the actual chips. The only company I'm aware of with the resources and ARM license to theoretically make something like this happen is AMD, and their use of ARM is more directed towards the server sector.
    0
  • TechyInAZ
    Quote:
    I wonder what would happen if these guys dipped their toe into the desktop CPU market.


I doubt that would happen, having ARM AND X86 on the desktop platform is just going to cause frustration.
    0
  • pug_s
    Quote:
    Quote:
    If ARM's side wants to grow much more they have to go to desktops. Surely everyone wants a chunk of Intel's ~13B a year.


    ARM makes their money by licensing out their designs to other companies which manufacture the actual chips. The only company I'm aware of with the resources and ARM license to theoretically make something like this happen is AMD, and their use of ARM is more directed towards the server sector.

    Quote:
    And I think that's his point...The PRICE and how much of the public at large such a machine could get. Myself I can't wait until they put out a full desktop chip with heatsink/fan big psu, HD or SSD, 16-32GB mem etc (hopefully with an optional slot for discrete gpu when desired). As games amp up on ARM you'd most likely only be missing WINDOWS/x86 if you use pro apps stuff. Unreal4/Unity5 etc will provide nice graphics for games on the ARM side (most engines port easily today and get even better with Vulkan coming), so only pro apps would be left off for years and some of the big ones (adobe etc) might put out full apps soon anyway. Take off $200-300 for cpu and $100 for windows and I'm guessing an ARM desktop could do quite a bit of damage to WINTEL.

    We see people already opting for chromebooks, tablets etc as PC's. Knock a chunk off desktop prices and you'll gains some users and push devs past mobile on arm. I'm hoping NV builds such a box at some point (just a much bigger Shield TV box really), but with multiple OS's (steamos, linux, and android) or at least a way to do it yourself. That would be a pretty versatile box ;) It isn't so much about if they BEAT intel, as it is about dropping the price of PC's everywhere. If ARM's side wants to grow much more they have to go to desktops. Surely everyone wants a chunk of Intel's ~13B a year.


It is possible to make a hardware ARM chip that rivals Intel. The only problem is software as there is no software that would take advantage of it. Maybe in the distant future the Android OS and its apps would morph into a desktop OS that would rival Microsoft, but not yet. Microsoft's closest adaptation of ARM soc's is windows 10 developer edition running on the Raspberry Pi 2. All of the software and games have to be written to be ARM compatible. Even so, Intel's main focus now is on low-powered and mobile chips that compete with ARM itself. Who knows, maybe by then AMD would get its act together and compete with Intel in the low-powered space when they have access to 14-16nm technologies.
    0
  • MichaelWest
    With regard to the discussion of when someone is going to provide ARM type cpu's for desktops for ARM gaming etc we already pretty much have the beginning of this with all the many Media streaming TV boxes on the market. They mainly run Android which is not ideal for desktop applications yet and this could take many years to see this improve. Hardware wise they are more than fast enough for all the current ARM android games and there is room for them to get even faster. They don't face the same power usage limitations mobile devices face. They may be targeted for use on TV's with remotes but they can work just as well with HDMI monitors, keyboards, mice and game controllers. If all you wanted was a simple cheap desktop for email/internet and ARM gaming then they already fit the bill.
    0
  • cbxbiker61
    Quote:

    It is possible to make a hardware ARM chip that rivals Intel. The only problem is software as there is no software that would take advantage of it. Maybe in the distant future that Android OS and its apps would morph to a desktop OS that would rival to Microsoft, but not yet. Microsoft's closest adaptation of ARM soc's are windows 10 developer edition running in the Raspberry Pi 2. All of the software and games have to be written to ARM compatible. Even so, Intel's main focus now are now on low powered and mobile chips that is competing with ARM itself. Who knows, maybe by then AMD would get its act together and compete with Intel on the low powered space when they have access to 14-16nm technologies.


    That is only true when you define "software" being "Windows binaries".

    More and more people every day are waking up to the fact that "software" is really source code, which can be compiled on any architecture for which there is a compiler. You just have to use an open platform with an open compiler, i.e. Linux/BSD.
    0
  • bit_user
    Anonymous said:
    Changing instruction set does not magically free the architecture from process limitations nor remove bottlenecks from software architecture. If ARM designed a CPU core specifically for desktop, it would hit most of the same performance scaling bottlenecks x86 has. You would likely end up with a 50W ARM chip being roughly even with a 50W Intel chip
    If that were true, Intel could've made an Atom that's at least as efficient as competing ARM cores. But ISA does actually count for something. x86 is significantly harder to decode, and occupies more space in ICaches.

    It would be interesting if someone designed an ARM v8 core for optimal single-thread performance. I'm pretty sure it could provide superior performance at the same power, and use less power at the same performance as Skylake (assuming similar design resources & process node as Intel). But this is a tall order, and there's not yet a big enough market. Maybe in 5 years, once ARM has grabbed a significant chunk of server market share, there'll be enough interest in building workstation-oriented ARM cores.
    0
  • bit_user
    Anonymous said:
    ... Now that mobile devices using the A72 are imminent ...

Imminent? Huawei Mate 8 launched back in November, using the A72-based HiSilicon Kirin 950.

    It'll be really interesting to see how Qualcomm's Kryo and NVidia's next Tegra (Parker) compare. Especially if Kryo has been outmatched before any Snapdragon 820-based phones have seen the light of day.
    0
  • bit_user
    Anonymous said:
    That is only true when you define "software" being "Windows binaries".

    More and more people every day are waking up to the fact that "software" is really source code, which can be compiled on any architecture for which there is a compiler. You just have to use an open platform with an open compiler, i.e. Linux/BSD.
    Recompile from sources? That's so 1980's, dude.

    Android and even .NET both support CPU-independent binaries that utilize just-in-time compilation. So, Google can ship Chromebooks based on x86, ARM, MIPS, Power, or whatever, and nearly all apps will work just fine. Only a few games & such that use the NDK might have problems.

    In theory, Microsoft could do similar (and Windows 10 does have some level of ARM support), but Windows has more native apps, as you imply.
    0
  • cbxbiker61
    Quote:

    Recompile from sources? That's so 1980's, dude.

    Android and even .NET both support CPU-independent binaries that utilize just-in-time compilation. So, Google can ship Chromebooks based on x86, ARM, MIPS, Power, or whatever, and nearly all apps will work just fine. Only a few games & such that use the NDK might have problems.

    In theory, Microsoft could do similar (and Windows 10 does have some level of ARM support), but Windows has more native apps, as you imply.


    Actually I didn't say "you" would have to compile from source. It's simply that the source code already exists, therefore a maintainer compiles for the new architecture.

    Android wouldn't exist today if it wasn't able to start from a working "open source" code base.

    Open source is 2000+, Windows is 1990's.
    0
  • bit_user
    Anonymous said:
    Android wouldn't exist today if it wasn't able to start from a working "open source" code base.

    Open source is 2000+, Windows is 1990's.
    In terms of mainstream, maybe, but there was quite a bit of code sharing, dating back to the birth of computers. It wasn't really until the 80's that most academics and hobbyists even started copyrighting their code.

    I'm all for open source, though. I'm a bit surprised there's no mainstream CPU that's opensource, by now. But I guess that's probably due to the economics of semiconductor fabrication.
    0
  • jonmasters
    To respond to a few misconceptions in other comments:

    Open Source is an attempt to replace the term "Free Software" with something more business friendly. It dates back to the 1980s, not the 90s. And most software is still compiled code, except on modern mobile platforms, where it is heavily binary compiled code (Apple) or JVM (but not Java, Android). There are open source CPUs, but they're not going to hit prime time any time soon because the costs of building an actual high performance design without treading on patents, and also guaranteeing that the expense of silicon is going to work right within a couple of spins, coupled with the incredible cost of building a new ISA and doing all the work just to be roughly where you started, doesn't justify the effort.
    0
  • jimmysmitty
    Anonymous said:
    Anonymous said:
    Changing instruction set does not magically free the architecture from process limitations nor remove bottlenecks from software architecture. If ARM designed a CPU core specifically for desktop, it would hit most of the same performance scaling bottlenecks x86 has. You would likely end up with a 50W ARM chip being roughly even with a 50W Intel chip
    If that were true, Intel could've made an Atom that's at least as efficient as competing ARM cores. But ISA does actually count for something. x86 is significantly harder to decode, and occupies more space in ICaches.

    It would be interesting if someone designed an ARM v8 core for optimal single-thread performance. I'm pretty sure it could provide superior performance at the same power, and use less power at the same performance as Skylake (assuming similar design resources & process node as Intel). But this is a tall order, and there's not yet a big enough market. Maybe in 5 years, once ARM has grabbed a significant chunk of server market share, there'll be enough interest in building workstation-oriented ARM cores.


    The only reason x86 is harder to decode is because of all the features it has that ARM lacks. If Intel stripped most of the features from x86 that make it so much more powerful than ARM they could easily hit the power numbers. They have gotten pretty damn close in some cases with CPUs that are still more powerful.

    Either way, ARM would have a very hard time taking the desktop market due to software support. Unless they can convince the software companies to give out free copies of the software for ARM systems to customers, which they can't, they will be where most others OS and ISA stand, unable to convert the masses who hold onto everything.
    0
  • bit_user
    Anonymous said:
    The only reason x86 is harder to decode is because of all the features it has that ARM lacks.
    x86 has over 35 years worth of cruft. Each time they wanted to add instructions, they got longer and longer, because the short opcode space was already consumed. And there are multiple constants, offsets, different addressing modes, and other complexities floating around in the stew, as well.

    It's pretty hard to argue that a modern ISA with all the worthwhile features of x86 would look anything like that. I think ARMv8 probably isn't too far off, feature-wise, but perhaps look at Power, if you prefer.

    Probably the best acknowledgement of x86's deficiencies is IA64 - the mere fact that Intel risked a departure from the cash cow that was x86. Itanium failed for a number of reasons, but not because x86 is actually good.

    And I still think it's incredibly telling that Intel hasn't resoundingly beaten ARM, given their process lead and the massive resources they've been throwing at the problem. I wonder if they'll either risk developing another proprietary ISA or possibly even designing an ARM core of their own.
    0
  • jimmysmitty
    Anonymous said:
    Anonymous said:
    The only reason x86 is harder to decode is because of all the features it has that ARM lacks.
    x86 has over 35 years worth of cruft. Each time they wanted to add instructions, they got longer and longer, because the short opcode space was already consumed. And there are multiple constants, offsets, different addressing modes, and other complexities floating around in the stew, as well.

    It's pretty hard to argue that a modern ISA with all the worthwhile features of x86 would look anything like that. I think ARMv8 probably isn't too far off, feature-wise, but perhaps look at Power, if you prefer.

    Probably the best acknowledgement of x86's deficiencies is IA64 - the mere fact that Intel risked a departure from the cash cow that was x86. Itanium failed for a number of reasons, but not because x86 is actually good.

    And I still think it's incredibly telling that Intel hasn't resoundingly beaten ARM, given their process lead and the massive resources they've been throwing at the problem. I wonder if they'll either risk developing another proprietary ISA or possibly even designing an ARM core of their own.


    IA64 failed because it had to emulate x86 code and transitioning from x86 to IA64, no one was going to take that drop in performance when at the same time they could have the same x86 performance and 64bit with x86-64.

And I am not dissing ARM, but ARM itself is a low power uArch. The more features you add the more power it takes. The way ARM stays so low power is by not having a lot of the features. When ARM first came out for use in cell phones it did not have OoOE. If it did have OoOE it would have put it above the TDP spec for phone use. After time and more advanced processes came out they were able to add OoOE into ARM CPUs so they get better performance.

    I just personally think x86 is given too much crap and people always compare it to very different applications of CPU uArchs. ARM is specifically low power. I could even compare it to the SPARC T4 which is HPC specific and has a 240W TDP but outperforms x86 in HPC applications.

As for why Intel has not beaten them, I think it is because Intel is trying to deliver a desktop experience on a phone. Atom, while much smaller than normal x86, still has most features of a desktop CPU. As well, ARM has a very strong hold and as history shows, once a technology has a hold it tends to stay for a very long time, hence why we all use x86.
    1
  • ivyanev
Why would anyone want an ARM-based Windows PC? Both AMD and Intel have some relatively low-power and relatively low-price CPUs (APUs). Are they fast enough for browsing and watching youtube? Sure. An ARM chip would undoubtedly be too. But how will this change the user experience at all? By using 10 watts less? I doubt it.
    0
  • bit_user
    Anonymous said:
    Why would anyone want an arm based Windows PC
I have an ARM-based microserver (currently Raspberry Pi, but looking to upgrade). The low power consumption & passive cooling means I can leave it running 24/7 without concerns about heat, noise, dust buildup, or electricity costs. I use it mainly for media streaming, but also various automated tasks.

    And I know people that use ARM-based devices as a HTPC.

    That doesn't exactly answer your question, but I think some of the interest in competitive ARM-based desktop offerings is to put more pressure on Intel. If nothing else, their CPUs could be cheaper. Another benefit would be to give people more options. Since there would likely be a range of different manufacturers, perhaps there'd be some useful variation. Maybe it would push Intel to adopt new technologies faster, like PCIe v4.
    0