Intel Xeon Phi Architecture
Intel has a vast portfolio of technology its engineers have developed, and Xeon Phi unquestionably taps some of it. However, the Many Integrated Core architecture is far more than a bunch of modified Pentium processors manufactured at 22 nm. Its notable attributes include:
- An in-order, dual-issue x86 design with 64-bit support
- Four threads per core, and up to 61 cores per coprocessor
- 512-bit SIMD for wider vectors
- 512 KB of L2 cache per core (up to 30.5 MB per Xeon Phi)
- 22 nm tri-gate transistors
- Red Hat Enterprise Linux 6.x or SuSE Linux 12+ support
- 6 or 8 GB of GDDR5 per card
You'll notice that even the highest-end Xeon Phi wields far fewer cores than a typical GPU. However, you cannot compare an MIC core to a CUDA core, for example, on a 1:1 basis. Just one Phi core runs four hardware threads and feeds a 512-bit SIMD unit, wide enough to operate on 16 single-precision values per instruction. A fair comparison requires getting past marketing's definition of a "core."
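To put that SIMD width in perspective, here's a minimal sketch (our illustration, not Intel's code) of the kind of simple, unit-stride loop a compiler can map onto those 512-bit vectors; at 32 bits per float, each instruction covers 16 elements at a time, and the four threads per core help keep that unit busy.

```c
/* Illustrative only: a loop shaped so the compiler can vectorize it.
 * On Xeon Phi, 512-bit SIMD means 16 single-precision floats per
 * instruction; on a host Xeon it would fall back to narrower SSE/AVX. */
#include <stdio.h>

#define N 1024  /* multiple of 16, so it maps cleanly onto 512-bit vectors */

int main(void)
{
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* A straightforward, unit-stride loop like this is exactly what
     * wide SIMD hardware is built to chew through. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %f\n", c[10]);
    return 0;
}
```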
It's also interesting that the card runs Linux. This probably isn't a platform you'd want to host a LAMP stack on, though I'd guess someone will try anyway. More practically, you can SSH into the Xeon Phi card to find out more about the hardware. We were advised that the following screenshot came from a pre-production board.
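As a hypothetical example of what you might do once you're logged in (assuming you've built a trivial program natively for the coprocessor), a few lines of C are enough to ask Linux how many logical CPUs it sees; on a 61-core part with four threads per core, that works out to 244.

```c
/* Quick sketch: report the logical CPU count the card's OS exposes.
 * On a fully enabled Xeon Phi, 61 cores x 4 threads = 244. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    printf("Logical CPUs online: %ld\n", cpus);
    return 0;
}
```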
In the following diagram of a MIC architecture core, Intel claims that less than two percent of the core and cache die area is x86-specific logic. That's worth keeping in perspective: the Xeon E5-2680 CPUs also found in the Stampede supercomputer are made up of 2.27 billion transistors each, while x86 traces its lineage back to the 8086, a processor built from 20,000 to 30,000 transistors.
Of course, even today's desktop CPUs are incredibly complex, which underscores how important it is to move data to and from where it needs to go as expediently as possible. Like the Sandy Bridge- and Ivy Bridge-based CPUs, the prototype product code-named Knights Corner employs a ring bus interconnect to make the best use of throughput and available die area. And by giving each core a generous slice of cache, the processor avoids the performance hit it'd take if every core instead needed to be fed constantly from the GDDR5 memory controllers on that ring.
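To make the cache point concrete, here's a hedged sketch (the sizes and pass count are illustrative, not Intel's figures) of cache blocking: by sweeping repeatedly over tiles small enough to stay resident in a core's 512 KB L2, the hot data is reused from cache rather than re-fetched from GDDR5 on every pass.

```c
/* Illustration of why per-core cache matters: process a large array in
 * tiles that fit comfortably inside a 512 KB L2, so repeated passes over
 * each tile hit cache instead of going back out to GDDR5 every time. */
#include <stdio.h>
#include <stdlib.h>

#define N      (1 << 24)     /* 16M floats, far larger than any cache     */
#define TILE   (64 * 1024)   /* 64K floats = 256 KB, fits in a 512 KB L2  */
#define PASSES 8             /* multiple sweeps over each tile            */

int main(void)
{
    float *data = malloc(N * sizeof(float));
    if (!data)
        return 1;

    for (long i = 0; i < N; i++)
        data[i] = 1.0f;

    double sum = 0.0;

    /* Tile by tile: all PASSES sweeps over a tile reuse data already in
     * L2, rather than streaming the entire array from memory PASSES times. */
    for (long t = 0; t < N; t += TILE)
        for (int p = 0; p < PASSES; p++)
            for (long i = t; i < t + TILE; i++)
                sum += data[i] * 0.5f;

    printf("sum = %f\n", sum);
    free(data);
    return 0;
}
```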