
Sponsored Article: Hyper Performance Computing with Intel Parallel Studio 2011 XE

Taking Care of the Cluster

Last month, at the SC10 trade show, Intel conducted its first public demonstration of “Knights Ferry,” the company’s debut HPC accelerator platform, featuring a 1.2 GHz, 32-core x86 processor with four-way Hyper-Threading. Knights Ferry grew out of Larrabee, the graphics architecture that Intel scrapped as a consumer GPU product. Larrabee was expected to be a powerhouse for general-purpose GPU (GPGPU) computing, made even more attractive by its industry-standard x86 architecture. While a consumer incarnation didn’t pan out, the effort that went into Larrabee may still pay big dividends in the HPC world. Clearly, Knights Ferry should be making NVIDIA’s Tesla group a bit nervous.

To illustrate Knights Ferry in action, Intel took massive financial derivative problems and showed that the new Many Integrated Core (MIC) platform could deliver twice the performance of Intel’s prior-generation HPC platform. The application was written in standard C++ and built with a version of Intel Parallel Studio XE 2011 tweaked for MIC. The point was to show how conventional applications could be improved and greatly scaled via Parallel Studio XE for ultra-demanding future applications.
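To give a feel for the kind of workload involved, here is a minimal sketch of a derivative-pricing loop spread across threads in standard C++. It is not Intel’s demo code: the article does not say which pricing method was used, so the Monte Carlo European call, the `OptionParams` struct, the function names, and the fixed seeds are all assumptions made for illustration, and plain `std::thread` stands in for whatever the Parallel Studio toolchain would generate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical inputs for a European call option (illustrative only).
struct OptionParams {
    double spot;        // current price of the underlying
    double strike;      // strike price
    double rate;        // annualized risk-free rate
    double volatility;  // annualized volatility
    double maturity;    // time to expiry, in years
};

// Sums the call payoff over `paths` simulated terminal prices.
// Each caller passes its own seed, so workers share no generator state.
double payoff_sum(const OptionParams& p, std::uint64_t paths, std::uint32_t seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    const double drift = (p.rate - 0.5 * p.volatility * p.volatility) * p.maturity;
    const double diffusion = p.volatility * std::sqrt(p.maturity);
    double sum = 0.0;
    for (std::uint64_t i = 0; i < paths; ++i) {
        const double terminal = p.spot * std::exp(drift + diffusion * gauss(rng));
        sum += std::max(terminal - p.strike, 0.0);
    }
    return sum;
}

// Splits the paths evenly across `workers` threads, then combines the
// partial sums and discounts the mean payoff to present value.
double price_call_parallel(const OptionParams& p, std::uint64_t total_paths,
                           unsigned workers) {
    assert(workers > 0 && total_paths >= workers);
    const std::uint64_t per_worker = total_paths / workers;  // remainder dropped for simplicity
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        // Each thread writes only to its own slot, so no locking is needed.
        pool.emplace_back([&partial, &p, per_worker, w] {
            partial[w] = payoff_sum(p, per_worker, 1234u + w);
        });
    }
    for (auto& t : pool) t.join();
    double total = 0.0;
    for (double s : partial) total += s;
    const double mean_payoff = total / static_cast<double>(per_worker * workers);
    return std::exp(-p.rate * p.maturity) * mean_payoff;
}
```

Calling `price_call_parallel(OptionParams{100.0, 100.0, 0.05, 0.2, 1.0}, 400000, 4)` prices an at-the-money call with four worker threads. Because every path is independent, the work divides cleanly, which is what makes this class of problem a natural showcase for many-core hardware.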

Of course, most supercomputing doesn’t happen on a stand-alone system. Loads are distributed across server clusters, elevating the role of parallelism to a whole new level. Predictably, the software tools needed to code for clusters vary somewhat from the tools used on single systems. Many of the primary elements in Intel Cluster Studio 2011 carry over from Studio XE 2011, but there are some notable differences.

First off, there’s Intel’s Message Passing Interface (MPI) library. This helps programmers integrate forward-looking MPI-2 support while also freeing apps to run on multiple types of cluster fabric interconnects. All told, Cluster Studio can help applications scale to as many as 50,000 cores. In place of Intel Inspector XE and VTune Amplifier XE, Cluster Studio 2011 employs the Intel Trace Analyzer and Collector. This tool pinpoints application bottlenecks and communication hotspots while also tracking and reporting performance data across all threads. The Analyzer helps users work through debugging and features a rich, multi-layered GUI.

The bottom line is that Intel now offers the tools needed for creating and updating multi-threaded software at all levels. The industry has big needs in every space from consumer entertainment to quantum modeling and every business application in between. To meet these needs, developers need mature, well-supported ways to manage their development workflow. Whether that workflow applies to individual systems or clusters, Intel now has the right Studio suite for the job. Give the trial a spin. There’s nothing to lose, and anywhere from 10% to well over 50% in multi-threaded performance to gain.

  • Is this thing only beneficial for Intel, or are AMD CPUs going to get a free ride out of this as well? And also, if Intel is so interested in improving multi-core performance in games, wouldn't it have helped if they'd let Nvidia or AMD (ATI) in on the development of this software? Just wondering. Cheers.
  • dEAne
    This is very helpful to me. Thank you so much, people from Tom's Hardware.