FSB Limits Exposed: Intel CPUs Don't Scale Very Well In UC Berkeley Test


Berkeley (CA) - Researchers from the Computer Science Division at UC Berkeley and Lawrence Berkeley National Laboratory (CRD/NERSC) recently submitted a paper to the IEEE examining how an optimized Lattice Boltzmann simulation scales on popular supercomputer architectures. TG Daily was told the paper was good enough to prompt the IEEE to issue an award. However, Intel may not be completely happy with the findings: at least in this very specific environment, the Xeon and Itanium 2 processors did not scale very well, while the Sony-Toshiba-IBM Cell BE came out on top.

The paper, published as "Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms", tries to shed some light on a specific area of socket-per-socket scaling in HPC (High-Performance Computing) environments. The scientists evaluated AMD’s Opteron (Santa Rosa), Intel’s Itanium 2 and Xeon (Clovertown), the Sony-Toshiba-IBM Cell BE, and Sun’s Niagara 2 processors. The researchers apparently spent considerable time optimizing the application itself rather than the hardware; this optimization was claimed to deliver a 14x improvement over the original LBMHD (Lattice Boltzmann magneto-hydrodynamics) code.

According to the paper, the best scaling was delivered by the STI Cell BE system, followed by Sun’s Niagara 2, AMD’s Opteron, Intel’s Xeon and Itanium 2.

We contacted Intel to discuss UC Berkeley’s findings, but Intel declined to comment as the company said it wasn’t familiar with the content in the paper.

However, Lattice Boltzmann applications are known to place heavy demands on system memory bandwidth, and this fact may have put Intel’s systems at a disadvantage in this specific test: Intel uses FB-DIMM 667, AMD DDR2-667, Niagara 2 FB-DIMM 667, and Cell the ultra-fast (and Rambus-based) XDR technology. Until Nehalem (Bloomfield core) and its integrated triple-channel DDR3 memory controller come along, Intel is likely to trail the pack in such tests. Regardless of the name of your Xeon processor, whether it is Clovertown, Harpertown or Tigerton, any bandwidth-intensive application will scale poorly on an FSB-burdened platform. In this UC Berkeley test, Intel’s Xeon and Itanium 2 trailed the pack by a substantial distance.

It is interesting to note that AMD’s Opteron processors scaled almost linearly as additional CPUs were added, while the Xeons scaled by only 43% on a socket-per-socket basis.

The lesson learned? Obviously, there are different benchmarks out there, most of them stressing a particular discipline. This specific test indicates that you should not run a memory bandwidth-intensive application on a Xeon or Itanium 2 system if you have the luxury of an Opteron, Niagara 2 or Cell system available as well. But does that mean Xeons and Itaniums generally scale worse than other architectures? No. There is more to supercomputers than memory bandwidth, and Intel certainly has the edge in pure processing horsepower at this time.

  • TripGun
    If the test is so dependent on memory bandwidth then it seems they should have submitted a few GPUs in their test, as they are so well known for their throughput. These synthetic benchmarks are optimized for certain core architectures and should be taken with a grain of salt. What are they teaching our kids in school?
  • Kari
    I wouldn't call LBMHD a 'synthetic benchmark', they actually use it in their research and stuff.. :P
  • StenLi
    "..AMD's Opteron (Santa Rosa).." r u sure? :)
  • traviso
    But did they test the old 65nm Xeons or the new 45nm ones, which have a much higher FSB (800MHz vs 1066MHz and 1600MHz)? I'm suspecting they probably didn't.

    Also, Intel has a new chipset coming out at the end of 2008 that totally gets rid of the northbridge/southbridge concept and promises (no benchmarks yet, but over the past 1.5 years Intel has been delivering on its promises, unlike in the past) big improvements in bus limitations for PCs.