
IBM Sequoia to Blaze Past Other Supercomputers

Source: Tom's Hardware US | 19 comments

There is a new speed demon in the supercomputer circuit.

IBM has revealed that its next supercomputer will be delivered to the U.S. government sometime in 2012. Dubbed "Sequoia," the new machine will be based on IBM's 45-nanometer PowerPC processors, with each processor containing 16 cores. The Sequoia will have over 4000 processors per rack (4096), and up to 1.6 million cores total. 
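
For the spec-sheet crowd, here is the arithmetic those numbers imply (a quick illustrative Python sketch; the chip and rack figures come from the paragraph above, and the rack count is simply derived from them):

    # Figures quoted above: 16-core chips, 4096 chips per rack, ~1.6M cores total.
    cores_per_chip = 16
    chips_per_rack = 4096
    total_cores = 1_600_000

    cores_per_rack = cores_per_chip * chips_per_rack   # 65,536 cores in a single rack
    racks_needed = total_cores / cores_per_rack        # ~24.4 racks for the full system

    print(f"{cores_per_rack:,} cores per rack, ~{racks_needed:.1f} racks")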

According to EETimes, the Sequoia deal with the government is twofold. First, IBM will deliver a BlueGene/P supercomputer to the Lawrence Livermore National Laboratory in Livermore, California. The BlueGene/P is a 65nm PowerPC-based system that can perform at over one Petaflops (or one quadrillion floating point operations per second). While that is certainly nothing to sneeze at, the Sequoia can do much, much more. "The Sequoia system will be 15 times faster than BlueGene/P," said IBM's Herb Schultz, "with roughly the same footprint and a modest increase in power consumption."

While the BlueGene/P will be delivered sometime this April, the Sequoia will not arrive at the government facility until sometime in 2012. The 20 Petaflops speedster will have 1.6 Petabytes of memory (note: 1 Petabyte is over one million gigabytes) connected to its 1.6 million cores. Beyond that, the details are scarce.
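
Those totals work out to some tidy per-core numbers (another illustrative Python sketch, using only figures quoted above):

    # Per-core figures derived from the article's totals.
    peak_flops = 20e15        # 20 Petaflops
    total_cores = 1.6e6       # 1.6 million cores
    memory_bytes = 1.6e15     # 1.6 Petabytes

    print(f"{peak_flops / total_cores / 1e9:.1f} GFLOPS per core")    # 12.5
    print(f"{memory_bytes / total_cores / 1e9:.1f} GB per core")      # exactly 1.0
    # "15 times faster than BlueGene/P" implies BG/P at ~1.33 Pflops,
    # consistent with "over one Petaflops" above.
    print(f"{peak_flops / 15 / 1e15:.2f} Pflops implied for BlueGene/P")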

So what will the Sequoia be doing? Plotting world domination? Tracking down alien life forms? Its primary purpose will be to simulate nuclear explosions and analyze the entire U.S. nuclear stockpile. The Sequoia was one of four bids considered by the National Nuclear Security Administration and the U.S. Dept. of Energy. "These powerful machines will provide NNSA with the capabilities needed to resolve time-urgent and complex scientific problems, ensuring the viability of the nation's nuclear deterrent into the future," said NNSA administrator Thomas D'Agostino. "This endeavor will also help maintain U.S. leadership in high performance computing and promote scientific discovery."

The Petaflop barrier was originally broken in June of 2008, when IBM announced that its Roadrunner supercomputer could consistently calculate at that level. With the Sequoia topping out at around 20 Pflops, the bar has been raised twentyfold in the span of only seven months.

Analyzing nuclear weapons is all well and good, but we're hoping the scientists in Livermore try running Crysis on this bad boy during their downtime.

This thread is closed for comments
  • StupidRabbit, February 5, 2009 9:14 AM (+2)
    Looks like the NNSA will be playing "Global Thermonuclear War"...
  • leo2kp, February 5, 2009 10:12 AM (+1)
    I don't think Crysis will work on that thing. It only uses 2 or 4 cores. It wouldn't run any better than a high-end gaming rig, if not worse, IMO.
  • Shadow703793, February 5, 2009 10:33 AM (0)
    1. Why not use Cell and GPUs for more acceleration?

    2. Why PowerPC CPUs? Because of the already-written code for PPC?
  • curnel_D, February 5, 2009 10:48 AM (0)
    Quote:
    Shadow703793: 1. Why not use Cell and GPUs for more acceleration? 2. Why PowerPC CPUs? Because of the already-written code for PPC?

    For the people writing the code, they're probably more experienced with writing Unix-based code for PPC instead of x86. Just a guess.
  • daBliggah, February 5, 2009 11:02 AM (+5)
    Quote:
    The 20 Teraflops speedster will ...


    Shouldn't that read "The 20 Petaflops speedster"? Otherwise it would be 1000 times slower than the BlueGene/P at LLNL.

  • daskrabbe, February 5, 2009 12:21 PM (0)
    Quote:
    leo2kp: I don't think Crysis will work on that thing. It only uses 2 or 4 cores. It wouldn't run any better than a high-end gaming rig, if not worse, IMO.


    I'm pretty sure the graphics use more than 4 pipelines.
  • Pei-chen, February 5, 2009 1:03 PM (+3)
    Hmm, 2012. Maybe the existence of this super fast PC is what's going to cause the mass extinction on Earth.
  • FlayerSlayer, February 5, 2009 1:20 PM (+3)
    "With the Sequoia topping out at around 20Tflops, the bar has been raised tremendously in the span of only seven months."

    Don't you mean 20 Pflops? And that's 7 months from the CREATION of a 1 PFLOPS machine to the DESIGN of the 20 PFLOPS machine; it will be about 4 years between their creations.
  • Tindytim, February 5, 2009 2:40 PM (0)
    Quote:
    leo2kp: I don't think Crysis will work on that thing. It only uses 2 or 4 cores. It wouldn't run any better than a high-end gaming rig, if not worse, IMO.

    It wouldn't work, but it has nothing to do with cores. It's PowerPC, not x86. Unless they got their hands on the source, or Crytek decided to release it specifically for that system, it's not going to work.

    Quote:
    Shadow703793: 1. Why not use Cell and GPUs for more acceleration? 2. Why PowerPC CPUs? Because of the already-written code for PPC?

    You do know that the Cell is a PowerPC-based processor? And they don't need GPUs; this isn't going to do graphical processing. It's for number crunching, not HD video or gaming.
  • AbhiNambiar, February 5, 2009 4:27 PM (0)
    Maybe the Buddhists in Tibet would like to borrow it for a couple days, perhaps? (reference - Arthur C. Clarke, The Nine Billion Names of God)
  • Mr_Man, February 5, 2009 4:35 PM (0)
    I'm cool with this just as long as they stick with the name Sequoia. If they ponder naming it Skynet... I'll be moving to Mars.
  • resonance451, February 5, 2009 4:56 PM (-1)
    No, I think Crysis will slow it down. That miserable waste of resources can take a 20 lumaflop super super computer and put it to waste.
  • TheFace, February 5, 2009 5:10 PM (0)
    Kilo
  • nottheking, February 5, 2009 7:32 PM (+1)
    Quote:
    Shadow703793: 1. Why not use Cell... ...for more acceleration.

    Contrary to what most people believe, the Cell actually sucks for general-purpose computing. Each of the SPEs is utterly "dumb" in that it cannot handle complex operations, and really only works well for streaming SIMD work.

    Additionally, the performance numbers claimed by Sony are utterly bogus, just like their claims that the PS2 would've been restricted and impossible to export from Japan for being "too technologically advanced." Basically, it's NOT a 1-2 TFlop chip; in the PS3 it can only handle up to 185.6 GigaFLOPs as its theoretical peak, and only if those are 32-bit operations!

    FP32 is not very useful for heavy-math applications, including gaming, physics, and anything a supercomputer would be handling; those usually deal with double-precision FP64, which is what the well-known LINPACK benchmark (the standard for testing supercomputers) uses, as well as popular distributed computing applications such as all those @home programs. On such a test, the Cell's design would cripple it to a peak theoretical performance of around 51.2 GigaFLOPs, a number comparable to a readily affordable PC gaming CPU like the Core 2 Duo E8400, with the C2D having the advantage of being vastly more programmable and flexible, meaning it'd most likely score much higher on LINPACK.

    In other words, the peak for the Cell assumes that the 1/8th of its 64-bit power that lies in the PPE isn't being sapped by giving commands to the SPEs, and that the 7/8ths that are the SPEs aren't being held up waiting for commands from the PPE, a situation that'd be very hard to reach in the first place and an impossible one to sustain outside of streaming SIMD for longer than a couple of clock cycles, let alone the 3.2 billion cycles of a full second. And once you consider the extreme power consumption of the Cell, it doesn't make sense for supercomputing applications; it winds up LESS efficient than other designs. [Ed. note: the peak-rate arithmetic is sketched after the thread.]

    Basically, the Cell was designed to be really good at one thing: handling streaming media. Encoding/decoding is an application that's very intense on low-precision math, and not very intense at all when it comes to instructions. Hence, the Cell is a perfect fit; it is the only chip out there (CPU, GPU, whatever) that's capable of handling SEVERAL fully high-definition video streams, complete with HDCP and even in exotic formats like DivX/XviD, all simultaneously, an application that is largely impossible for most PC CPUs without the aid of a graphics card alongside it.

    Quote:
    Shadow703793: Why not use... ...GPUs for more acceleration.

    GPUs don't suffer from the same problem as the Cell, but there are nonetheless technical reasons that keep them out of supercomputers as well. There are actually three that I can think of.

    The first is memory latency: GPUs operate with a very high degree of memory latency. Since they're handling relatively linear tasks, and when dealing with textures and shaders they always call up very large, sequential blocks of memory at a time, having a CAS latency of 30+ or 40+ clock cycles doesn't really matter; the GPU will know much farther in advance what it'll be needing next around 99% of the time. The same benefit applies to decoding media; being a streaming application, latency doesn't hurt it. When it comes to scientific applications, however, that really can be harmful, as there the predominant bottleneck invariably winds up being data and instruction latency, something that's also hurt heavily by how GPUs have an extremely skewed processing-unit-to-cache ratio, vastly different from what's found in general-purpose CPUs.

    The second reason that occurred to me is the lack of a standard multi-GPU architecture able to support a large quantity of GPUs even just for mathematical operations; the current limit for ANY design appears to be 4 GPUs, from either nVidia or ATi/AMD. So while in theory you could produce the same floating-point capacity using only 1/7.5th the number of RV770s compared to what Sequoia uses (i.e., 13.3% the number), as of yet there is no way to actually build that assembly, so in practice it's a moot point.

    The final reason is power and heat: GPUs may have a very high degree of performance-per-watt efficiency when it comes to math, but they STILL have a very high TDP per chip. The cost of the actual chips is usually one of the minor parts of a supercomputer; a lot more care has to be given to providing enough power to stably run thousands upon thousands of nodes, with not just multiple CPUs per node but all the other components as well, all of which must be powered and cooled. With GPUs, your heat production is focused on a far smaller number of chips, so you'll need more intensive cooling, and likely greater spacing between GPUs, since you can't just blow hot air out the back of the case when there are more nodes in every direction. There's a good chance one would actually have to construct a LARGER facility to house an equally powerful supercomputer built from GPUs than one built from multi-core general-purpose CPUs.
  • Shadow703793, February 5, 2009 11:04 PM (0)
    Quote:
    You do know that the Cell is a PowerPC-based processor? And they don't need GPUs; this isn't going to do graphical processing. It's for number crunching, not HD video or gaming.

    Yes, Cell is partly based off of Power. BUT the reason I mentioned GPUs: GPGPU.
  • Shadow703793, February 5, 2009 11:22 PM (0)
    @ Nottheking: Thanks for the info.
  • Anonymous, February 6, 2009 6:52 PM (0)
    I think they won't need GPGPU.
    There are about 4000 cores in that baby; probably enough to fully simulate each thread per CPU, play it with 100 viewpoints, graphics to the max, and limitless viewing distance @ 500fps or higher!
    It can be rerouted to select one processor by itself, or a cluster of processors, to calculate only the stencil buffer, while another calculates light paths and shadows.
    Unlike a GPGPU, which needs to run everything through one core and benefits from smaller threads than a CPU does.
    In this case it probably wouldn't matter, even if it only had 5 CPUs, each being a quad-core, with no graphics card.

    As far as naming goes, they'd probably rather call it Mammoth than Baby.
  • DeadlyPredator, February 26, 2009 7:14 PM (+1)
    What could a supercomputer with 400,000 RV770 GPUs and 196 TB of GDDR5 RAM do in theory? GPU = uber power at stream computing.
  • salem80, March 26, 2009 3:01 PM (0)
    That's stupid. Why does he keep saying 20 Tflops? It's 20 Petaflops, not Teraflops. See IBM's web site:
    http://www.ibm.com/developerworks/blogs/page/woolf?entry=ibm_sequoia_supercomputer
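
[Ed. note: for the curious, here is the peak-rate arithmetic behind nottheking's Cell and RV770 figures, as a minimal Python sketch. The per-cycle throughputs are assumptions reverse-engineered from the numbers in his comment, not IBM- or AMD-published specs.]

    # Cell peak FLOPS under the assumptions nottheking's figures imply:
    # 3.2 GHz clock, 8 SPEs, 8 single-precision flops/cycle per SPE
    # (4-wide SIMD with fused multiply-add), 2 double-precision flops/cycle per SPE.
    clock_hz = 3.2e9
    spes = 8
    fp32_gflops_per_spe = clock_hz * 8 / 1e9   # 25.6 GFLOPS each
    fp64_gflops_per_spe = clock_hz * 2 / 1e9   # 6.4 GFLOPS each

    print(spes * fp64_gflops_per_spe)   # 51.2 GFLOPS FP64 peak, matching the comment
    print(7 * fp32_gflops_per_spe)      # 179.2 GFLOPS FP32 over 7 active SPEs,
                                        # close to the 185.6 quoted once the PPE chips in

    # The RV770 comparison: 1/7.5th of Sequoia's ~100,000 chips, assuming each GPU
    # sustained ~1.5 TFLOPS single-precision, would also reach ~20 PFLOPS.
    sequoia_chips = 1_600_000 / 16      # 100,000 sixteen-core chips
    gpus = sequoia_chips / 7.5          # ~13,333 GPUs
    print(gpus * 1.5e12 / 1e15)         # ~20 PFLOPS (FP32, not the FP64 LINPACK metric)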