Xeon Phi: Intel's Larrabee-Derived Card In TACC's Supercomputer

Introducing Intel Xeon Phi

Intel has its sights set on supercomputers capable of exaFLOP-class performance by 2020. To put that in perspective, the first teraFLOPS-capable systems appeared in the mid-1990s, and today's fastest supercomputers operate in the tens of petaFLOPS. Achieving one exaFLOPS requires a 1,000x speed-up over a one-petaFLOPS machine. That's a stunningly large number.

Getting there is unquestionably going to require accelerators, as Intel calls them. AMD and Nvidia are happy to credit their GPUs for the recent surge in floating-point performance wielded by today's fastest supercomputers. But all parties agree that the future of this space doesn't belong exclusively to Xeons or Opterons. Most analysts instead expect a mix of compute resources, ranging from big general-purpose CPUs to smaller, more specialized cores.

Today, in an effort to counter the head start both GPU vendors already enjoy in this space, and to address surging demand for compute performance, Intel is introducing its Xeon Phi Coprocessor 5110P and announcing the Xeon Phi Coprocessor 3100 series, which will be released in 2013.

In essence, Xeon Phi takes 60 x86 cores (at least in the announced 5110P SKU), each with a wide 512-bit vector unit, runs them at clock rates in excess of 1 GHz, and delivers more than 1 teraFLOPS of double-precision performance from a dual-slot PCI Express card running a custom Linux distribution. That same Xeon Phi 5110P includes 8 GB of GDDR5 memory, although Intel plans to arm the 3100-series cards with 6 GB. To be sure, these cores are not designed for the general-purpose workloads you'd tackle with a third-gen Core or even an Atom processor. Rather, they excel at highly parallelized tasks able to put all of those cores to work at once.
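That headline teraFLOPS figure falls out of simple arithmetic. As a back-of-the-envelope sketch (assuming the 5110P's published 1.053 GHz clock and one 8-wide double-precision fused multiply-add per core per cycle):

```python
# Rough peak double-precision throughput for the Xeon Phi 5110P.
# Assumptions: 60 cores, a 1.053 GHz clock, and 512-bit vector units
# retiring one fused multiply-add per cycle (8 doubles x 2 FLOPs).

cores = 60
clock_ghz = 1.053
doubles_per_vector = 512 // 64             # 8 double-precision lanes
flops_per_cycle = doubles_per_vector * 2   # multiply + add via FMA

peak_gflops = cores * clock_ghz * flops_per_cycle
print(f"{peak_gflops:.0f} GFLOPS")  # just over 1 teraFLOPS
```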

Why might you need an accelerator card like the Phi? Weather modeling, medical imaging, energy exploration, simulation, financial analysis, content creation, and manufacturing are all fields currently leveraging hardware from AMD and Nvidia for their compute power. Intel is simply trying to do the same thing with a product that doesn't require coding in CUDA or OpenCL. Instead, ISVs can optimize for Phi using C, C++, and Fortran, with specific additions to the code that accommodate and utilize the accelerator.

Of course, getting here was no easy task, and many enthusiasts will recognize the Larrabee name, which dates back to 2005. In 2004, Intel embarked on a multi-year project after concluding that clock rates could not scale indefinitely due to material (process) and power constraints. Larrabee was years in the making, feeding us a stream of both promising and embarrassing headlines over the course of its development.

The milestones on Intel's timeline were all met with great interest as the company evangelized a many-integrated-core concept unlike anything its competition was doing. Of course, when it became clear that Larrabee would underperform existing graphics processors from AMD and Nvidia, Intel canceled plans to introduce a graphics card of its own and focused instead on the architecture's high-performance computing potential. As we'll see, pre-production examples of the hardware are already part of the Top500 project.

As part of its Xeon Phi launch, Intel flew members of the press into the Texas Advanced Computing Center to see the Stampede supercomputer, which employs Xeon Phi. Of course, we were able to sneak in a few photos of one of the world's fastest computing systems during the trip. But before we're able to understand Intel's approach to HPC, we need to understand Larrabee. So, let's take a brief step back in time.

  • tacoslave
    i wonder if they can mod this to run games...
  • mocchan
    Articles like these are what make me more and more interested in servers and supercomputers... Time to read up and learn more!
  • wannabepro
    Highly interesting.
    Great article.

    I do hope they get these into the hands of students like myself though.
  • ddpruitt
    Intriguing idea....

    These x86 cores have the oomph to run something a little more complex than what a GPGPU can. But is it worth it, and what kind of effort does it require? I'd have to disagree with Intel's assertion that you can get used to it by programming for an "i3". Anyone with a relatively modern graphics card can learn to program OpenCL or CUDA on their own system. But jumping from 8 cores (optimistically) to programming 60 or more efficiently doesn't seem reasonable. And how much is one of these cards going to run? You might get more by stringing a few GPUs together for the same cost.

    I wonder if this is going to turn into the same type of niche product that Intel's old math coprocessors did.
  • CaedenV
    man, I love these articles! Just the sheer amount of stuff that goes into them... measuring RAM in hundreds of TBs... HDD space in PBs... it is hard to wrap one's brain around!

    I wonder what AMD is going to do... on the CPU side they have the cheaper (much cheaper) compute power for servers, but it is not slowing Intel sales down any. Then on the compute side Intel is making a big name for themselves with their new (but pricy) cards, and nVidia already has a handle on the 'budget' compute cards, while AMD does not have a product out yet to compete with PHI or Tesla.
    On the processor side AMD really needs to look out for nVidia and their ARM chip prowess, which if focused on could very well eat into AMD's server chip market for the 'affordable' end of this professional market... It just seems like all the cards are stacked against AMD... rough times.

    And then there is IBM. The company that has so much data center IP that they could stay comfortably afloat without having to make a single product. But the fact is that they have their own compelling products for this market, and when they get a client that needs intel or nvidia parts, they do not hesitate to build it for them. In some ways it amazes me that they are still around because you never hear about them... but they really are still the 'big boy' of the server world.
  • A Bad Day
    *Looks at the current selection of desktops, laptops and tablets, including custom built PCs*

    *Looks at the major data server or massively complex physics tasks that need to be accomplished*

    *Runs such tasks on baby computers, including ones with an i7 clocked to 6 GHz and quad SLI/CF, then watches them crash or lock up*


    tacoslave: i wonder if they can mod this to run games...
    A four-core game that mainly relies on one or two cores, running on a thousand-core server. What are you thinking?
  • ThatsMyNameDude
    Holy shit. Someone tell me if this will work. Maybe, if we pair this thing up with enough xeons and enough quadros and teslas, we can connect it with a gaming system and we could use the xeons to render low load games like cod mw3 and tf2 and feed it to the gaming system.
  • mayankleoboy1
    Main advantage of LRB over Tesla and AMD FirePro S10000:

    A simple recompile is all that's needed to use Phi. Tesla/AMD needs a complete code rewrite, which is very, very expensive.
    I see LRB being highly successful.
  • PudgyChicken
    It'd be pretty neat to use a supercomputer like this to play a game like Metro 2033 at 4K, fully ray-traced.

    I'm having nerdgasms just thinking about it.