Berkeley (CA) - Researchers from the Computer Science Division at UC Berkeley and Lawrence Berkeley National Laboratory (CRD/NERSC) recently submitted a paper to the IEEE on scaling an optimized Lattice Boltzmann simulation across popular supercomputer architectures. TG Daily was told that the paper was strong enough to earn an IEEE award. However, Intel may not be entirely happy with the findings: at least in this very specific environment, the Xeon and Itanium 2 processors did not scale well, while Sony’s Cell BE came out on top.
The paper, titled "Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms," examines one specific aspect of high-performance computing (HPC): how performance scales socket by socket in supercomputer environments. The scientists evaluated AMD’s Opteron (Santa Rosa), Intel’s Itanium 2 and Xeon (Clovertown), the Sony-Toshiba-IBM (STI) Cell BE and Sun’s Niagara 2. The researchers apparently spent considerable time optimizing the application itself rather than tuning the hardware, and they claim this work produced a 14x improvement over the original LBMHD (Lattice Boltzmann magnetohydrodynamics) code.
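For readers unfamiliar with the method, here is the basic collide-and-stream loop of a lattice Boltzmann solver. It is a hypothetical, minimal sketch: it uses the simple two-dimensional D2Q9 BGK fluid model rather than the three-dimensional LBMHD code the researchers optimized, and the function names (`equilibrium`, `step`, `bytes_per_site_update`) are our own. What it illustrates is structural: every grid site reads and writes all of its distribution values on every time step while doing only a few arithmetic operations per value, which is why such codes tend to be limited by memory bandwidth rather than raw processing power.

```python
"""Minimal D2Q9 BGK lattice Boltzmann step (illustrative only, not LBMHD)."""

# Lattice velocities (cx, cy) and weights of the standard D2Q9 model.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions for one lattice site."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def step(f, nx, ny, tau):
    """One collide-and-stream step on a periodic nx-by-ny grid.

    f[y][x] holds the 9 distributions of a site. Each site reads all 9
    values and writes all 9 values per step, with few flops per value.
    """
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            fi = f[y][x]
            # Macroscopic density and velocity from the distributions.
            rho = sum(fi)
            ux = sum(v * c[0] for v, c in zip(fi, C)) / rho
            uy = sum(v * c[1] for v, c in zip(fi, C)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(C):
                # BGK collision: relax toward equilibrium with time constant tau.
                post = fi[i] - (fi[i] - feq[i]) / tau
                # Streaming: move the value to the neighbor along c_i (periodic wrap).
                new[(y + cy) % ny][(x + cx) % nx][i] = post
    return new


def bytes_per_site_update(q=9, word_bytes=8):
    """Minimum memory traffic per site update: read q values, write q values."""
    return 2 * q * word_bytes


if __name__ == "__main__":
    # Uniform fluid at rest with a small density bump at one site.
    grid = [[equilibrium(1.0, 0.0, 0.0) for _ in range(4)] for _ in range(4)]
    grid[1][2][1] += 0.01
    before = sum(sum(site) for row in grid for site in row)
    after = sum(sum(site) for row in step(grid, 4, 4, 0.8) for site in row)
    print(before, after, bytes_per_site_update())
```

A step should conserve total mass up to rounding. The 144-byte figure from `bytes_per_site_update()` is only a lower bound for this toy model with 8-byte doubles; real codes such as LBMHD track more values per site and move correspondingly more data.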
According to the paper, the STI Cell BE system delivered the best scaling, followed by Sun’s Niagara 2, AMD’s Opteron, and Intel’s Xeon and Itanium 2, in that order.
We contacted Intel to discuss UC Berkeley’s findings, but the company declined to comment, saying it was not familiar with the content of the paper.
However, Lattice Boltzmann applications are known to demand high memory bandwidth, which may have put Intel’s systems at a disadvantage in this specific test. Intel’s platform uses FB-DIMM 667, AMD’s uses DDR2-667, Niagara 2 uses FB-DIMM 667, and the Cell uses Rambus’s very fast XDR memory. Until Nehalem (Bloomfield core) and its integrated triple-channel DDR3 memory controller arrive, Intel is likely to lag in such tests. Whether your Xeon is a Clovertown, Harpertown or Tigerton, any bandwidth-intensive application will scale poorly on a platform constrained by its front-side bus (FSB). In this UC Berkeley test, Intel’s Xeon and Itanium 2 trailed the pack by a substantial margin.
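A toy performance model makes the bandwidth argument concrete. This sketch assumes throughput is simply the lesser of a compute limit and a memory limit (a roofline-style bound) and contrasts two hypothetical memory designs: a shared path whose bandwidth stays fixed no matter how many sockets are added, loosely in the spirit of an FSB platform, and per-socket integrated controllers whose bandwidth grows with the socket count. All numbers (12 million lattice updates per second of compute per socket, 10 GB/s of bandwidth, 1,000 bytes of traffic per update) and all function names are invented for illustration and are not taken from the paper.

```python
"""Toy roofline-style scaling model. All parameter values are hypothetical."""


def attainable_mlups(sockets, compute_per_socket, bandwidth_gbs, bytes_per_update):
    """Million lattice updates per second: the lesser of compute and memory limits."""
    compute_limit = sockets * compute_per_socket
    memory_limit = bandwidth_gbs * 1e9 / bytes_per_update / 1e6
    return min(compute_limit, memory_limit)


def shared_bus(sockets, bw_gbs):
    """Shared memory path: total bandwidth does not grow with the socket count."""
    return bw_gbs


def per_socket_controller(sockets, bw_gbs):
    """Integrated controllers: each socket adds its own memory bandwidth."""
    return sockets * bw_gbs


def speedup(bandwidth_model, sockets, compute=12.0, bw=10.0, bytes_per_update=1000):
    """Throughput on `sockets` sockets relative to a single socket."""
    base = attainable_mlups(1, compute, bandwidth_model(1, bw), bytes_per_update)
    scaled = attainable_mlups(sockets, compute, bandwidth_model(sockets, bw), bytes_per_update)
    return scaled / base


if __name__ == "__main__":
    for model in (shared_bus, per_socket_controller):
        print(model.__name__, [round(speedup(model, n), 2) for n in (1, 2, 4)])
```

With these made-up parameters, adding sockets yields no gain on the shared path, while per-socket controllers scale linearly. The model is deliberately extreme; real systems land somewhere between the two curves.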
Notably, AMD’s Opteron processors scaled almost linearly as CPUs were added, whereas the Xeons scaled by only 43% on a socket-by-socket basis.
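As a rough worked example, assume that "43% on a socket-by-socket basis" means moving from one socket to two raised throughput by 43% (our interpretation; the paper's exact metric may differ). With P_N denoting throughput on N sockets, the speedup S_N and parallel efficiency E_N are then:

```latex
S_2 = \frac{P_2}{P_1} = 1.43, \qquad E_2 = \frac{S_2}{2} = \frac{1.43}{2} \approx 0.72
\text{near-linear scaling (Opteron): } S_2 \approx 2, \qquad E_2 \approx 1
```

In other words, under this reading each additional Xeon socket delivered roughly 72% of the ideal gain.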
The lesson? There are many different benchmarks, and most of them stress one particular discipline. This specific test indicates that you should not run a memory bandwidth-intensive application on a Xeon or Itanium 2 system if you have an Opteron, Niagara 2 or Cell system available. But does this mean that Xeons and Itanium 2 generally scale worse than other architectures? No. There is more to supercomputers than memory bandwidth, and Intel certainly has the edge in raw processing horsepower at this time.