Xeon Phi: Intel's Larrabee-Derived Card In TACC's Supercomputer

A Look Into The Competition

Intel's Xeon Phi launch needs to be put into context. Many trends are converging on the big data analytics space.

Nvidia

Nvidia is going in a couple of interesting directions pertinent to our discussion today. First up is its Tegra 3 (and, by extension, upcoming versions of Tegra). One of the platform's biggest attractions is its four (plus one) ARM cores and GPU in a low-power package. That's minimal power consumption, even on a no-longer-current 40 nm process. When we look at examples like the Barcelona Supercomputing Center's Mont-Blanc Project, it becomes clear that there are opportunities to create powerful clusters using power-optimized hardware. Nvidia will, however, need to enable cores with ECC support.

Of course, as we see from ORNL's Titan supercomputer, Nvidia's push for CUDA adoption yields significant results, particularly when any extraneous development costs are outweighed by performance gains. Should Nvidia continue to demonstrate the benefits of its architecture and platform, it'll have an easier time convincing ISVs to jump on board.

AMD

AMD's portfolio of HPC-capable components is perhaps the most interesting, if only because of its diversity. 

To begin, it has Socket G34, and is well known for low-cost quad-processor configurations. Cray uses Socket G34 heavily, and in fact deployed it in the Titan supercomputer. Of course, Titan employs Opteron 6200-series CPUs, so there's yet another upgrade path should Opteron 6300-series chips show enough promise.

There's also a compelling GPU-oriented compute story. On the desktop, Fusion-based APUs already demonstrate the potential of x86 cores and graphics processing resources on the same die. Given the suggested modularity of AMD's Bulldozer module, it's not hard to imagine the company replacing its shared floating-point unit with shader cores for OpenCL-optimized applications.

Moving in the other direction, AMD countered some of Intel's cloud computing-specific advantages with the acquisition of SeaMicro, and will be embedding its Freedom Fabric into an upcoming generation of server processors with 64-bit ARM cores.

HP/Dell

HP and Dell are very large players in the HPC space. Stampede is largely made up of Dell servers, for instance. Both HP and Dell have relationships with many vendors, and therefore many technologies. Both are exploring ARM for cloud computing (and HPC), much like AMD.

In the case of Stampede, Dell's design was very much based on Intel's hardware. With that said, both Dell and HP have AMD and ARM in parts of their portfolios, and enjoy huge sales networks through which to sell whichever company's technologies they want.

IBM

No discussion of supercomputing is complete without mentioning IBM. After all, Sequoia was given the top spot on the Top500 list, achieving 16.32 petaFLOPS, using IBM's PowerPC architecture.

In developing the Cell architecture, which was made famous for its use in the PlayStation 3 but also helped score another number-one finish on the Top500 list back in 2008, IBM implemented the idea of using smaller specialized cores to increase compute efficiency.

Facing PowerPC, 64-bit ARM, and graphics-oriented hardware, Intel's advocacy of x86 makes sense (at least for its own business purposes). Xeon Phi's value is that it allows developers to spend less time learning new languages and tools. Instead, there's a many-core x86-based solution that works in much the same way as Intel's CPUs, at least from a programming standpoint.

  • esrever
    meh
  • tacoslave
    i wonder if they can mod this to run games...
  • mocchan
    Articles like these is what makes me more and more interested in servers and super computers...Time to read up and learn more!
  • wannabepro
    Highly interesting.
    Great article.

    I do hope they get these into the hands of students like myself though.
  • ddpruitt
    Intriguing idea....

    These x86 cores have the oomph to run something a little more complex than what a GPGPU can. But is it worth it, and what kind of effort does it require? I'd have to disagree with Intel's assertion that you can get used to it by programming for an "i3". Anyone with a relatively modern graphics card can learn to program OpenCL or CUDA on their own system. But learning how to program 60 cores efficiently (or more) from an 8 core (optimistically) doesn't seem reasonable. And how much is one of these cards going to run? You might get more by stringing a few GPUs together for the same cost.

    I wonder if this is going to turn into the same type of niche product that Intel's old math-coprocessors did.
  • CaedenV
    man, I love these articles! Just the sheer amounts of stuffs that go into them... measuring ram in hundreds of TBs... HDD space in PBs... it is hard to wrap one's brain around!

    I wonder what AMD is going to do... on the CPU side they have the cheaper (much cheaper) compute power for servers, but it is not slowing Intel sales down any. Then on the compute side Intel is making a big name for themselves with their new (but pricy) cards, and nVidia already has a handle on the 'budget' compute cards, while AMD does not have a product out yet to compete with PHI or Tesla.
    On the processor side AMD really needs to look out for nVidia and their ARM chip prowess, which if focused on could very well eat into AMD's server chip market for the 'affordable' end of this professional market... It just seems like all the cards are stacked against AMD... rough times.

    And then there is IBM. The company that has so much data center IP that they could stay comfortably afloat without having to make a single product. But the fact is that they have their own compelling products for this market, and when they get a client that needs intel or nvidia parts, they do not hesitate to build it for them. In some ways it amazes me that they are still around because you never hear about them... but they really are still the 'big boy' of the server world.
  • A Bad Day
    esrever wrote: meh
    *Looks at the current selection of desktops, laptops and tablets, including custom built PCs*

    *Looks at the major data server or massively complex physics tasks that need to be accomplished*

    *Runs such tasks on baby computers, including ones with an i7 clocked to 6 GHz and quad SLI/CF, then watches them crash or lock up*

    ENTIRE SELECTION IS BABIES!

    tacoslave wrote: i wonder if they can mod this to run games...
    A four-core game that mainly relies on one or two cores, running on a thousand-core server. What are you thinking?
  • ThatsMyNameDude
    Holy shit. Someone tell me if this will work. Maybe, if we pair this thing up with enough xeons and enough quadros and teslas, we can connect it with a gaming system and we could use the xeons to render low load games like cod mw3 and tf2 and feed it to the gaming system.
  • mayankleoboy1
    Main advantage of LRB over Tesla and AMD firepro S10000 :

    A simple recompile is all that's needed to use PHI. Tesla/AMD needs a complete code rewrite, which is very, very expensive.
    I see LRB being highly successful.
  • PudgyChicken
    It'd be pretty neat to use a supercomputer like this to play a game like Metro 2033 at 4K, fully ray-traced.

    I'm having nerdgasms just thinking about it.