Page 1: Introducing Intel Xeon Phi
Page 2: Back To Larrabee: Starting The Many Core Revolution
Page 3: Intel Xeon Phi Architecture
Page 4: Intel Xeon Phi Hardware
Page 5: Intel Xeon Phi Performance
Page 6: The Value Proposition Of Xeon Phi: Optimization
Page 7: TACC's Stampede Supercomputer: Xeon Phi In The Field
Page 8: TACC's Stampede Supercomputer: Xeon Phi In The Field, Continued
Page 9: A Look Into The Competition
The Value Proposition Of Xeon Phi: Optimization
Intel came at us with a dual-pronged message during its presentation at TACC, the Texas Advanced Computing Center:
- With Xeon Phi, Intel is enabling a 2x or more performance-per-watt improvement in HPC apps compared to the Xeon E5 family.
- Xeon Phi-ready code is created through an approach similar to the one developers already use to exploit a Xeon's multiple cores.
The value in the first statement is both simple and powerful. A greater-than 2x performance boost in optimized applications for somewhere between 225 and 300 W is not bad, particularly when you consider that a pair of Xeon E5s typically has a 190 to 300 W thermal ceiling, too. But because Xeon Phi is a PCI Express-based card, you can also use more than one per pair of Xeon E5s. So long as the scaling is there, you can put two, three, or even four in a server alongside its Xeon CPUs. The resulting combination can give you a lot more performance in a given amount of rack space, more performance under a defined power budget, or comparable performance using far less space, power, and cooling.
Then again, almost any step forward in manufacturing or architecture should yield gains in those same metrics. Xeon Phi's real advantage is tied to the development effort needed to harness the hardware's potential. Intel is betting that CUDA and OpenCL programming is still in its infancy, and that the idea of using familiar languages and tools is a compelling advantage able to generate support from the software community. Intel's position is a strong one, provided it can maintain competitive performance. A unified programming model lets developers exploit host processors and co-processors through the same x86 code, minimizing the time it takes to produce optimized software.
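To make that unified model concrete, here is a minimal sketch of what it looks like in practice. This is ordinary OpenMP code, not an official Intel sample; the function name is illustrative. The point is that the same annotated loop compiles for a multi-core Xeon host and, with Intel's compiler, can be built natively for Xeon Phi or marked for offload, with no CUDA- or OpenCL-style rewrite.

```c
#include <stddef.h>

/* Sum of squares over a vector. The OpenMP-annotated loop below runs
 * on any multi-core x86 CPU. With Intel's compiler, the same kernel
 * can also target Xeon Phi (for example, via an offload pragma such
 * as "#pragma offload target(mic)") without changing the loop body. */
double sum_squares(const double *x, size_t n)
{
    double s = 0.0;
    /* A compiler without OpenMP support simply ignores this pragma
     * (with a warning), and the loop runs serially but correctly. */
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)n; ++i)
        s += x[i] * x[i];
    return s;
}
```

The reduction clause is doing the heavy lifting here: each thread accumulates a private partial sum, and the runtime combines them at the end, which is exactly the kind of parallel decomposition Intel wants developers practicing regardless of the target chip.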
One of the questions we put to Intel during the Xeon Phi Q&A was, "If you sell these accelerators for $2000 or more, how will the next generation of college students learn to write code able to exploit this hardware?" We received two responses. First, Xeon Phi will not be available at retail, so Intel is working to arm universities with hardware resources. The second answer was more intriguing: a budding programmer with a Core i3 or better can already learn the programming model. Remember, most of the gains you get from a many-core architecture come simply from optimizing code for parallelism.
Intel doesn't care if you program for a Core i3, a Xeon E5, or a Xeon Phi. The company simply wants code written for multi-core x86 architectures. Back in the 1990s, when I was studying computer science, we didn't have multi-core CPUs in our desktops. Today, multi-core is the norm, and it's the model that will proliferate going forward.
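That portability claim is easy to illustrate. Assuming nothing Phi-specific at all, the classic SAXPY kernel below is the kind of code a student can write and test on a dual-core Core i3; the identical source scales to a pair of Xeon E5s or a Xeon Phi simply because the OpenMP runtime hands it more threads.

```c
/* y = a*x + y: the classic SAXPY kernel. Written once for multi-core
 * x86, it uses however many hardware threads the runtime provides --
 * a couple on a Core i3, dozens across two Xeon E5s, hundreds on a
 * Xeon Phi. Without OpenMP enabled, the pragma is ignored and the
 * loop still produces the same result serially. */
void saxpy(float a, const float *x, float *y, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

No reduction is needed here because each iteration writes a distinct element of y, so the loop parallelizes with a bare `parallel for`. That independence between iterations is precisely the habit of mind Intel wants the next generation of programmers to build, whatever x86 chip they start on.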
I would also encourage Intel to get Xeon Phi chips with manufacturing defects (resulting in lower usable core counts) into the hands of students. The name of the game is big data analytics, and that field is open to innovation from today's generation. Some students at the University of Texas at Austin can already look forward to access to TACC and its Xeon Phi-equipped Stampede supercomputer.