At its first AI Developer Conference, Intel announced the Nervana NNP-L1000, which is the first neural network processor (NNP) to come out of the Nervana acquisition. The chip will prioritize memory bandwidth and compute utilization over theoretical peak performance.
Intel’s New Attempt To Succeed In The ML Market
Initially, Intel started competing with Nvidia in the machine learning (ML) chip market with its Xeon Phi architecture, which used tens of Atom cores to “accelerate” ML tasks. However, Intel must have realized that Phi alone wasn’t going to allow it to catch up to Nvidia, which seems to make significant leaps in performance every year.
As such, the company began looking for other options, which led it to buy Altera for its field-programmable gate arrays (FPGAs), Movidius for its embedded vision processor, Mobileye for its self-driving chip, and Nervana for its specialized neural network processor. Intel has also started working on its own dedicated GPU, as well as on neuromorphic and quantum computing chips.
Intel calls all of these options a “holistic approach” to artificial intelligence. However, the company may also want to avoid betting everything on a single architecture again, as it did with Phi, and then falling even farther behind Nvidia in the ML chip market. On the other hand, this scattered strategy for ML may also confuse developers, because they won’t know which technology Intel will back the most in the long term (and Intel may not know yet, either).
Intel’s Nervana NNP-L1000
For now, Intel seems to be focusing on pushing its Nervana chips to ML researchers, possibly because they are the products most likely to compete directly with Nvidia (at least until Intel’s dedicated GPUs arrive).
The Nervana NNP-L1000, code-named Spring Crest, appears to emphasize high memory bandwidth and low latency rather than peak trillion operations per second (TOPS).
Intel showed the following performance numbers for its Lake Crest prototype, which is currently being demoed to some partners:
General Matrix to Matrix Multiplication (GEMM) operations using A(1536, 2048) and B(2048, 1536) matrix sizes have achieved more than 96.4 percent compute utilization on a single chip. This represents around 38 TOP/s of actual (not theoretical) performance on a single chip.
Multichip distributed GEMM operations that support model parallel training are realizing nearly linear scaling and 96.2 percent scaling efficiency for A(6144, 2048) and B(2048, 1536) matrix sizes – enabling multiple NNPs to be connected together and freeing us from memory constraints of other architectures.
We are measuring 89.4 percent of unidirectional chip-to-chip efficiency of theoretical bandwidth at less than 790ns (nanoseconds) of latency and are excited to apply this to the 2.4Tb/s (terabits per second) of high bandwidth, low-latency interconnects.
All of this is happening within a single chip total power envelope of under 210 watts. And this is just the prototype of our Intel Nervana NNP (Lake Crest) from which we are gathering feedback from our early partners.
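To put the single-chip GEMM figures quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the usual convention of counting one multiply and one add per term of the matrix product, and it treats Intel's "more than 96.4 percent" and "around 38 TOP/s" as exact values, so the results are only approximate. The function names are illustrative and not part of any Intel tool.

```python
# Rough arithmetic on the single-chip GEMM numbers Intel quoted.
# Assumption: one multiply plus one add per term, i.e. 2 * m * k * n operations.

def gemm_ops(m: int, k: int, n: int) -> int:
    """Operation count for an (m x k) times (k x n) matrix product."""
    return 2 * m * k * n

def implied_peak_tops(achieved_tops: float, utilization: float) -> float:
    """Theoretical peak implied by an achieved rate and a utilization fraction."""
    return achieved_tops / utilization

# A(1536, 2048) times B(2048, 1536), as in the quoted benchmark.
ops = gemm_ops(1536, 2048, 1536)

# Treat "around 38 TOP/s" and "96.4 percent" as exact for this estimate.
peak_tops = implied_peak_tops(38.0, 0.964)

# Time for one such GEMM at the achieved rate (38e12 operations per second).
seconds_per_gemm = ops / 38.0e12

print(f"operations per GEMM: {ops:,}")
print(f"implied theoretical peak: {peak_tops:.1f} TOP/s")
print(f"time per GEMM at 38 TOP/s: {seconds_per_gemm * 1e6:.0f} microseconds")
```

Under these assumptions, the quoted numbers imply a theoretical peak of a little under 40 TOP/s for the Lake Crest prototype, a figure Intel did not state directly in the quoted material.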
The Nervana NNP-L1000, which will be the first Nervana product to ship to customers, promises 3-4x the neural network training performance of Lake Crest, according to Intel.
Intel said that the NNP-L1000 would also support bfloat16, a numerical format that’s being adopted widely across the ML industry for neural networks. The company will also support bfloat16 in its FPGAs, Xeons, and other ML products. The Nervana NNP-L1000 is scheduled for release in 2019.
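For readers unfamiliar with bfloat16: it keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it is effectively the top half of a float32. The Python sketch below illustrates the idea by simple truncation. This is a simplification for illustration only; real hardware typically rounds rather than truncates, and nothing here reflects how Intel's chips implement the conversion. The function names are hypothetical.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding.

    Assumption: plain truncation (round toward zero), not round-to-nearest.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits

def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a float by zero-filling the low bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value

for x in (1.0, 3.14159, 65504.0, 1e-20):
    bits = float32_to_bfloat16_bits(x)
    print(f"{x!r:>10} -> 0x{bits:04X} -> {bfloat16_bits_to_float(bits)!r}")
```

The round trip shows the trade-off: the value range matches float32 because the exponent is untouched, while precision drops to roughly two to three decimal digits.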