Chip Fights: Nvidia Takes Issue With Intel's Deep Learning Benchmarks

Intel recently published some Xeon Phi benchmarks claiming that its “Many Integrated Core” Phi architecture, built from small Atom-class CPU cores rather than a GPU, is significantly more efficient and faster than GPUs for deep learning. Nvidia has taken issue with this claim and published a post detailing the many reasons it believes Intel’s results are deeply flawed.

GPUs Vs. Everything Else

Whether or not they are the absolute best tool for the task, GPUs are clearly the mainstream way to train deep learning neural networks right now. That’s because neural networks rely on huge amounts of parallel, lower-precision arithmetic (training commonly uses 32- or 16-bit floating point, and inference can go as low as 8-bit) rather than the high-precision computation CPUs are generally built for. Whether GPUs will one day be replaced by more efficient alternatives for most customers remains to be seen.

Over the past few years, Nvidia has not only kept optimizing its GPUs for machine learning but has also invested heavily in the software that makes it easy for developers to train their neural networks. That’s also one of the main reasons why researchers usually choose Nvidia over AMD for machine learning. Nvidia said that its software performance has improved by an order of magnitude from the Kepler era to the Pascal era.

However, GPUs are not the only game in town when it comes to training deep neural networks. With the field booming, all sorts of companies, old and new, are trying to grab a share of the market for deep learning-optimized chips.

Some companies focus on FPGAs for machine learning, while others, such as Google, CEVA, and Movidius, design custom deep learning chips. Then there’s Intel, which wants to compete against GPUs under its Xeon Phi brand by using dozens of small Atom-class (Silvermont-derived) cores instead.

Intel’s Claims

In its paper, Intel claimed that four Knights Landing Xeon Phi chips were 2.3x faster than “four GPUs.” Intel also claimed that Xeon Phi chips could scale 38 percent better across multiple nodes (up to 128 nodes, a scale that, according to Intel, GPUs can’t reach). Intel said a system built from 128 Xeon Phi servers is 50x faster than a single Xeon Phi server, which it presented as evidence that Xeon Phi scales well.
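For context, multi-node scaling is usually judged by parallel efficiency: the measured speedup divided by the number of nodes, where 1.0 means perfectly linear scaling. The minimal Python sketch below applies that standard definition to Intel's quoted figures (50x on 128 servers); the function name parallel_efficiency is illustrative and does not come from Intel's paper.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes

# Intel's quoted result: 128 Xeon Phi servers ran 50x faster than one server.
eff = parallel_efficiency(50, 128)
print(f"{eff:.0%} of linear scaling")  # 39% of linear scaling
```

By this yardstick, a 50x speedup on 128 nodes works out to roughly 39 percent of linear scaling, which is a useful reference point when comparing against vendors' claims of "almost linear" scaling.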

Intel also said in its paper that when running an Intel-optimized version of the Caffe deep learning framework, its Xeon Phi chips are 30x faster than with the standard Caffe implementation.

Nvidia’s Rebuttals

Nvidia’s main argument seems to be that Intel used old data in its benchmarks, which can be misleading when comparing against GPUs, especially because Nvidia’s GPUs gained drastically in performance and efficiency when they moved from a 28nm planar process to a 16nm FinFET one. In addition, Nvidia has optimized various software frameworks for its GPUs over the past few years.

That’s why Nvidia now claims that, had Intel used a more recent implementation of the Caffe AlexNet test, it would have seen four of Nvidia’s previous-generation Maxwell GPUs run 30% faster than four of Intel’s Xeon Phi servers.

Regarding Xeon Phi’s “38% better scaling,” Nvidia said that Intel’s comparison pitted its latest Xeon Phi servers, with the latest interconnect technology, against the Titan supercomputer and its four-year-old Kepler-based GPUs. Nvidia also pointed out that Baidu has already shown speech training workloads scaling almost linearly across 128 Maxwell GPUs.

Nvidia also believes that, for deep learning, it’s better to have fewer, stronger nodes than many weaker ones anyway. It added that a single one of its latest DGX-1 “supercomputer in a box” is slightly faster than 21 Xeon Phi servers, and 5.3x faster than four Xeon Phi servers.
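Nvidia's two DGX-1 figures can be cross-checked with simple arithmetic. If one DGX-1 is roughly as fast as 21 Xeon Phi servers and 5.3x faster than four of them, then going from 4 to 21 servers must scale almost linearly (21 / 4 = 5.25). The short Python sketch below takes Nvidia's numbers at face value and assumes "slightly faster than 21 servers" means roughly equal; implied_scaling_efficiency is a hypothetical helper written for this illustration.

```python
def implied_scaling_efficiency(ratio_vs_small: float, small: int, large: int) -> float:
    """Scaling efficiency from `small` to `large` nodes implied when one
    reference system is `ratio_vs_small` times faster than `small` nodes
    and roughly as fast as `large` nodes (1.0 = perfectly linear)."""
    ideal_ratio = large / small  # speedup that perfectly linear scaling would give
    return ratio_vs_small / ideal_ratio

# Nvidia's claims: DGX-1 is 5.3x four servers and roughly equal to 21 servers.
eff = implied_scaling_efficiency(5.3, small=4, large=21)
print(f"{eff:.2f}")  # 1.01
```

A result this close to 1.0 means that, taken together, Nvidia's two figures imply Xeon Phi throughput growing almost linearly from 4 to 21 servers; because the DGX-1 is described as slightly faster than 21 servers, the true value would sit a little below 1.01.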

Considering that the non-profit OpenAI only just became the first customer of a DGX-1 system, it’s understandable that Intel couldn’t test its Xeon Phi chips against one. However, Maxwell is already a generation old, so it’s unclear why Intel chose to test its latest Xeon Phi chips against previous-generation GPUs running software from 18 months ago.

AI Chip Competition Heating Up (In A Good Way)

It’s likely that Xeon Phi is still quite behind GPU systems when it comes to deep learning, in both the performance and software support dimensions. However, if Nvidia’s DGX-1 can only barely beat 21 Xeon Phi servers, that also suggests Xeon Phi chips could be quite competitive on price.

A DGX-1 currently costs $129,000, whereas a single Xeon Phi chip costs anywhere from $2,000 to $6,000. Even with 21 of Intel’s highest-end Xeon Phi chips, the chips alone cost roughly as much as a DGX-1, although that comparison leaves out the rest of each server.
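To make the price comparison concrete, here is the arithmetic as a short Python sketch. It uses only the figures quoted above ($129,000 per DGX-1, $2,000 to $6,000 per Xeon Phi chip, 21 chips) and counts chip prices only; the motherboards, memory, chassis, and networking for 21 servers are deliberately left out, so a real Xeon Phi cluster would cost more. chip_cost_range is a hypothetical helper for this illustration.

```python
DGX1_PRICE = 129_000                   # quoted DGX-1 price (USD)
PHI_CHIP_PRICE_RANGE = (2_000, 6_000)  # quoted per-chip Xeon Phi price range (USD)
NUM_CHIPS = 21                         # chips needed to roughly match one DGX-1

def chip_cost_range(n_chips: int, price_range: tuple[int, int]) -> tuple[int, int]:
    """Total chip cost at the low and high ends of the price range (chips only)."""
    low, high = price_range
    return n_chips * low, n_chips * high

low, high = chip_cost_range(NUM_CHIPS, PHI_CHIP_PRICE_RANGE)
print(f"21 Xeon Phi chips: ${low:,} to ${high:,}")  # $42,000 to $126,000
print(f"DGX-1: ${DGX1_PRICE:,}")                    # $129,000
print(f"High end as share of DGX-1: {high / DGX1_PRICE:.0%}")  # 98%
```

Even at the top of the quoted range, the 21 chips come to about 98 percent of a DGX-1's price, which is why the comparison looks roughly even before the cost of the surrounding server hardware is added.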

Although the fight between Nvidia and Intel is likely to ramp up significantly over the next few years, what will be even more interesting is whether ASICs such as Google’s TPU end up winning the day.

Intel already uses more “general-purpose” cores for its Phi coprocessor, and Nvidia still has to keep optimizing its GPUs for gaming. That means neither company may be able to follow the more extreme optimization paths open to custom deep learning chips. However, software support will also play a big role in the adoption of deep learning chips, and Nvidia arguably has the strongest software support right now.

  • bit_user
    Thanks for the article. I appreciate your coverage of Deep Learning, Lucian, and generally find your articles to be both well-written and accessible. A couple minor points, though...

    First, please try to clarify which Xeon Phi product you mean. I wish Intel had called the new chip Xeon Theta, or something, but they didn't. Xeon Phi names a product line, which now has 2 generations. The old generation is code-named Knights Corner, and the new one is Knights Landing. Intel is referring to the new one, while Nvidia is probably referring to the old one, as Knights Landing isn't yet publicly available.

    four-year-old Kepler-based Titan X
    Titan X is neither Kepler-based nor four years old.

    Intel's paper references a "Titan" supercomputer, containing 32 K20's, which are Kepler-based. The paper also mentions K80 GPUs (also Kepler-based). They don't appear to compare themselves to a Titan-series graphics card at any point (not least because it's a consumer product, and much cheaper than Nvidia's Tesla GPUs).

    It added that a single one of its latest DGX-1 “supercomputer in a box” is slightly faster than 21 Xeon Phi
    Which generation? It might be the same thing Intel did - comparing their latest against 4-year-old hardware. Given that the new Xeon Phi generation hasn't yet launched, this seems likely.

    It’s likely that Xeon Phi is still quite behind GPU systems when it comes to deep learning, in both the performance and software support dimensions.
    The new Xeon Phi did take one significant step backward, which is dropping fast fp16 support - something they even added to their Gen9 HD graphics GPU. Knights Corner had it, but it doesn't appear to exist in AVX-512 (what Knights Landing uses).

    Anyway, one way to see through the smokescreen of each company's PR is to simply look at the specs. Both Knights Landing and the GP100 have 16 GB of HBM2-class memory (although Knights Landing has an additional 6-channel DDR-4 interface). The rated floating-point performance is 3/6 and 4.7/9.3 TFLOPS (double/single-precision), respectively. So, I'm expecting something on the order of a 2-3x advantage for Nvidia (partly due to their superior fp16). But, maybe Intel can close that gap, if they can harness their strong integer performance on the problem.

    One thing is for sure: this beef isn't new, and it's not going away anytime soon. Intel has been comparing Xeon Phi to Tesla GPUs since Knights Corner launched, and Nvidia has been making counter-claims ever since.
  • turkey3_scratch
    May I ask, what is "deep learning"?
  • I love it when the mudslinging starts. Means lower prices and higher performance. Ahhh, the beauty of capitalism. Why can't more people see it? Competition is good, folks.