Larrabee Versus Cell

Larrabee: Intel's New GPU

It’s very tempting to compare Larrabee and Cell. Both use a multitude of simple in-order cores, emphasize vector calculation, give each core 256 KB of dedicated memory, and connect everything with a ring bus. The similarities are numerous at first glance. Yet the differences are also substantial: the Cell is first and foremost a CPU. Although it’s oriented toward streaming-type applications, it is not intended for rendering, and consequently it has no texture units.

Another major difference is in the way memory is managed. On the Cell, the PPE is the only part of the processor that has a global view of the memory space; each SPU’s memory accesses are limited to its 256 KB of local store. Access to main memory must therefore be done explicitly via direct memory access (DMA) operations. Conversely, as we saw earlier, all of Larrabee’s cores have access to the entire memory space through a cache whose management is transparent to the programmer, even if the programmer does retain a certain degree of control over it. Intel’s choice greatly simplifies programming and avoids having to include a more general-purpose core like the PPE. The Cell’s heterogeneous design is one of its handicaps, since it complicates things for the programmer. In addition to managing memory explicitly, he or she must also build two executables using two different instruction sets, which means using two different compilers.
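To make the contrast concrete, here is a minimal sketch of the two programming models, written as a plain Python simulation rather than real Cell or Larrabee code. The helper `dma_get`, the function names, and the chunk size are hypothetical and chosen only for illustration; the sketch also assumes 4-byte elements and ignores the fact that a real local store has to hold code as well as data.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPU local store size, as stated in the article
ELEMENT_BYTES = 4               # assumption: single-precision elements


def dma_get(main_memory, start, count):
    """Hypothetical stand-in for a DMA transfer: copy a slice into local store."""
    return list(main_memory[start:start + count])


def sum_cell_style(main_memory, chunk_elems):
    """SPU-like model: the compute loop only ever touches the local buffer."""
    if chunk_elems * ELEMENT_BYTES > LOCAL_STORE_BYTES:
        raise ValueError("chunk does not fit in the local store")
    total = 0.0
    for start in range(0, len(main_memory), chunk_elems):
        local = dma_get(main_memory, start, chunk_elems)  # explicit transfer
        total += sum(local)                               # work on local data only
    return total


def sum_larrabee_style(main_memory):
    """Cache-based model: the code addresses main memory directly and
    leaves data movement to the (here invisible) cache hierarchy."""
    total = 0.0
    for value in main_memory:
        total += value
    return total


data = [float(i) for i in range(100_000)]
cell_total = sum_cell_style(data, chunk_elems=16 * 1024)
larrabee_total = sum_larrabee_style(data)
```

Both functions compute the same result; the difference is that in the first one, the programmer has to decide how the data is cut up and when each piece is moved, which is the burden the article describes.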

So Larrabee’s cores are much more complete than the Cell’s SPUs, since they support all the x86 instructions. Their performance is also better in terms of vector calculation. That’s because they operate on 512-bit vectors instead of the SPUs’ 128 bits, and while the Cell should have the advantage in clock frequency (Larrabee is expected to clock at 2 to 2.5 GHz, but that’s still very hypothetical), that doesn’t compensate for a fourfold disadvantage in vector width. So does that mean the Cell has nothing going for it? Not quite. The Cell, with 234 million transistors (a number that was impressive three years ago but is far from earth-shattering today), will be significantly less costly to manufacture than Larrabee, which will be much larger and very expensive to produce.
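A rough back-of-the-envelope calculation shows why the clock advantage doesn’t close the gap. The sketch below makes several assumptions that are not in the article: 32-bit elements, one vector instruction issued per cycle on both designs, and a 3.2 GHz clock for the Cell (the PlayStation 3 figure). It compares a single core with a single SPU and says nothing about core counts or real-world throughput.

```python
ELEMENT_BITS = 32  # assumption: single-precision elements


def lanes(vector_bits, element_bits=ELEMENT_BITS):
    """Number of elements processed by one vector instruction."""
    return vector_bits // element_bits


def peak_lane_ops_per_second(vector_bits, clock_ghz):
    """Idealized peak: one vector instruction per cycle (assumption)."""
    return lanes(vector_bits) * clock_ghz * 1e9


SPU_VECTOR_BITS = 128               # from the article
LARRABEE_VECTOR_BITS = 512          # from the article
CELL_CLOCK_GHZ = 3.2                # assumption: PS3 clock, not stated in the article
LARRABEE_CLOCKS_GHZ = (2.0, 2.5)    # expected range quoted in the article

spu_peak = peak_lane_ops_per_second(SPU_VECTOR_BITS, CELL_CLOCK_GHZ)
ratios = [
    peak_lane_ops_per_second(LARRABEE_VECTOR_BITS, clock) / spu_peak
    for clock in LARRABEE_CLOCKS_GHZ
]

for clock, ratio in zip(LARRABEE_CLOCKS_GHZ, ratios):
    print(f"Larrabee core at {clock} GHz vs. SPU: {ratio:.2f}x peak lane throughput")
```

Under these assumptions, a Larrabee core handles 16 elements per instruction against 4 for an SPU, so even at the low end of the expected clock range it keeps a per-core advantage of roughly 2.5x.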

Finally, although it hasn’t met with the success some were expecting, the Cell is still built into more than 20 million PlayStation 3s, and the many programmers who have spent three years working hard on applications for this platform have undeniable expertise in getting the most out of it. For the time being, Larrabee exists only on paper, and even once it becomes available, few programmers are likely to have the courage to “write to the metal.” Most will simply use the APIs (OpenGL/Direct3D for 3D and OpenCL/Compute Shader for GPGPU).

However, from a hardware point of view, it’s undeniable that Larrabee is much more interesting. The Cell prefigured several important concepts that are now showing up in Larrabee. But with hindsight, it may have been too ambitious for its time, and IBM had to make serious compromises to make its vision compatible with the technology available then.
