IBM releases complete Cell v. 1.0 specifications

Hopewell Junction (NY) - With a minimum of fanfare, IBM today released the complete version 1.0 specification for its forthcoming Cell processor, or as the company now calls it, the "Cell Broadband Engine." IBM will be producing this graphics-oriented CPU in cooperation with Sony and Toshiba.

What Intel calls "cores" in a processing engine, IBM calls "elements," and IBM's counterpart to what Intel called "NetBurst" in Pentium 4 is referred to as "CBEA" (Cell Broadband Engine Architecture). So it may, at some point, become politically incorrect to refer to Cell as a "multicore" processor. Nonetheless, as previously anticipated, the specification mandates that a CBEA-compliant processor contain at least one each of two types of processing elements: a PowerPC processor and a secondary type called the synergistic processor element (SPE). Like the co-processor of ancient days, an SPE is subordinate to the PowerPC element and performs no system management functions whatsoever. Instead, it can be delegated user-specific tasks, especially graphics processing, which can take advantage of the SPE's Single Instruction/Multiple Data (SIMD) architecture.

As the Cell 1.0 spec confirms, the SPEs use a "true SIMD" architecture. Here, a single control unit can direct a multitude of slave arithmetic units to execute the same category of instruction on multiple data elements simultaneously. The "true" part of Cell's SIMD implementation lies in the simultaneous execution. In graphics cards, multiplexing of instructions takes place through pipelines, which address different elements of memory in sequence, often by offsetting an index with each iteration. In Cell's true SIMD, all the arithmetic units are given their marching orders at once, and they march in unison rather than single-file.
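The contrast between single-file and in-unison execution can be illustrated with a small Python model. This is only a sketch: the lane count of four, and the names `scalar_add` and `simd_apply`, are assumptions made for illustration and do not come from IBM's specification.

```python
from typing import Callable, List

LANES = 4  # assumed lane count for illustration; the article gives no width


def scalar_add(a: List[float], b: List[float]) -> List[float]:
    """Single-file: one element per step, the index advancing each iteration."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def simd_apply(op: Callable[[float, float], float],
               a: List[float], b: List[float]) -> List[float]:
    """In unison: one 'instruction' (op) issued once per group of LANES elements.

    Python still evaluates the lanes one after another; the grouping only
    models the idea that a single control step covers every lane at once.
    """
    if len(a) != len(b) or len(a) % LANES != 0:
        raise ValueError("inputs must have equal length, a multiple of LANES")
    out: List[float] = []
    for base in range(0, len(a), LANES):
        # One control step: the same op applied across all lanes of the group.
        out.extend(op(a[base + lane], b[base + lane]) for lane in range(LANES))
    return out
```

Both functions produce the same results; what differs is the number of control steps, which in the lane-wise version is the input length divided by `LANES`.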

But here is where Cell's architecture becomes unique: No SPE has a view into system memory. In Intel's multicore technology, for instance, all processing cores operate as fully capable CPUs unto themselves, with equivalent access to system memory whose arbiter is a memory controller looking over the front-side bus. In Cell architecture, only the PowerPC element (PPE) has a view of system memory, and there can be as few as one of these elements within a processor. The PPE is the only conventional processing element, with complete access to system functions (or sharing that access with another PPE, when present). Those system functions include directing another processing element - which hasn't been discussed in detail until today - with the more familiar-sounding name of Memory Interface Controller (MIC). The MIC fetches swatches of memory for the SPEs, providing them with a shared, collective "sandbox."

Here is where cache organization plays a critical role. Each PPE has its own L1 cache, as you might expect, which is not shared with other PPEs. Performance is boosted - as with the Power processors we've seen to date - by an L2 cache, the size of which appears not to be limited by the spec. For the SPEs, there is a separate and new type of cache called the SL1. All SPEs within a group share a single SL1 cache. This cache is the only world they know. In conventional caching, the processor addresses data by its absolute address in system memory, and the cache transparently supplies that data as though it came directly from system RAM. But SPEs are little computers, and the SL1 cache is their system RAM. The memory controller acquires the products of their work like a teacher picking up after her students at the end of class.
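The division of labor described above, in which an SPE sees only its local memory while a controller moves data in and out on its behalf, can be sketched in Python. Everything here is a hypothetical model rather than IBM's interface: the class names `SPE` and `MemoryController`, the function `ppe_offload`, and the four-entry local memory are all assumptions, and a single SPE stands in for a group sharing one SL1.

```python
from typing import Callable, List


class SPE:
    """Hypothetical SPE: it can address only its own local buffer."""

    def __init__(self, local_size: int) -> None:
        self.local = [0] * local_size  # stands in for the article's SL1

    def run(self, kernel: Callable[[int], int], count: int) -> None:
        # Indices are offsets into local memory, never system addresses.
        for i in range(count):
            self.local[i] = kernel(self.local[i])


class MemoryController:
    """Hypothetical stand-in for the MIC, acting on the PPE's direction."""

    def __init__(self, system_memory: List[int]) -> None:
        self.system_memory = system_memory

    def _check(self, spe: SPE, start: int, count: int) -> None:
        if count > len(spe.local):
            raise ValueError("block does not fit in local memory")
        if start < 0 or start + count > len(self.system_memory):
            raise ValueError("block lies outside system memory")

    def copy_in(self, spe: SPE, start: int, count: int) -> None:
        self._check(spe, start, count)
        spe.local[:count] = self.system_memory[start:start + count]

    def copy_out(self, spe: SPE, start: int, count: int) -> None:
        self._check(spe, start, count)
        self.system_memory[start:start + count] = spe.local[:count]


def ppe_offload(system_memory: List[int], kernel: Callable[[int], int],
                local_size: int = 4) -> None:
    """The PPE's role: hand the work to an SPE one block at a time."""
    mic = MemoryController(system_memory)
    spe = SPE(local_size)
    for start in range(0, len(system_memory), local_size):
        count = min(local_size, len(system_memory) - start)
        mic.copy_in(spe, start, count)   # fetch a swatch of memory
        spe.run(kernel, count)           # the SPE works only on local data
        mic.copy_out(spe, start, count)  # collect the results
```

In this model the kernel never receives a system address, which is the point of the analogy: the local buffer is the only memory the SPE code can name.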

Another revelation of the Cell 1.0 specification is an apparent second order of element grouping - or, translated into an Intel context, a "multi-multicore" possibility. A CBEA-compliant processor package can contain groups of PPEs and separate groups of SPEs. Judging from the algebra IBM uses to describe the interaction between elements, there need not necessarily be as many PPE groups as SPE groups. This is important because it indicates that grouping isn't necessarily the product of simply sandwiching multiple Cell processors together, although the specification deals entirely with logic and not packaging. It's therefore conceivable that a Cell processor vendor could create multiple performance tiers by integrating any number of SPEs (probably in multiples of two) with one, two, or three PPEs.
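To make the range of possible tiers concrete, the speculation above can be enumerated in a few lines of Python. This is a sketch of the article's guess, not of anything the specification lists: the `CellConfig` type, the `candidate_tiers` function, and the upper bound on SPEs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CellConfig:
    """Hypothetical description of one processor tier."""
    ppes: int
    spes: int

    def is_valid(self) -> bool:
        # Reading the article's requirement as at least one element of each type.
        return self.ppes >= 1 and self.spes >= 1


def candidate_tiers(max_spes: int) -> List[CellConfig]:
    """Enumerate tiers under the article's guess: even SPE counts, 1 to 3 PPEs."""
    return [
        CellConfig(ppes, spes)
        for ppes in (1, 2, 3)
        for spes in range(2, max_spes + 1, 2)
    ]
```

Calling `candidate_tiers(8)`, for example, pairs each of the three PPE counts with two, four, six, or eight SPEs; the PPE and SPE counts vary independently, mirroring the observation that the two kinds of groups need not match in number.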

The Cell processor is slated to premiere as the principal engine of Sony's PlayStation 3 game console, due for release in spring 2006. For the inquisitive reader, the other two companies involved in jointly producing the Cell processor are the same Sony and Toshiba that are currently locked in a market war over high-definition DVD standards.