The EIB on Cell?

Guest
Archived from groups: comp.sys.ibm.pc.hardware.chips

Greetings!

http://arstechnica.com/articles/paedia/cpu/cell-2.ars

has a nice, if puzzling, layout of the Cell processor architecture.
What puzzles me is the "EIB," which, wideband though it may be, and
operating at a modest frequency, has *eleven* connections to what is
diagrammed as a single shared bus.

<quote>

The individual SPEs can use this bus to communicate with each other,
and this includes the transfer of data in between SPEs acting as peers
on the network. The SPEs also communicate with the L2 cache, with main
memory (via the MIC), and with the rest of the system (via the BIC).
The onboard memory interface controller (MIC) supports the new Rambus
XDR memory standard, and the BIC (which I think stands for "bus
interface controller" but I'm not 100% sure) has a coherent interface
for SMP and a non-coherent interface for I/O.

</quote>

Seems like that's a great deal of traffic and a great many drops for one
bus. Any thoughts?

RM
 
Guest

Robert Myers <rmyers1400@comcast.net> wrote:
> Greetings!

> http://arstechnica.com/articles/paedia/cpu/cell-2.ars

> has a nice, if puzzling, layout of the Cell processor architecture.
> What puzzles me is the "EIB," which, wideband though it may be, and
> operating at a modest frequency, has *eleven* connections to what is
> diagrammed as a single shared bus.

> <quote>

> The individual SPEs can use this bus to communicate with each other,
> and this includes the transfer of data in between SPEs acting as peers
> on the network. The SPEs also communicate with the L2 cache, with main
> memory (via the MIC), and with the rest of the system (via the BIC).
> The onboard memory interface controller (MIC) supports the new Rambus
> XDR memory standard, and the BIC (which I think stands for "bus
> interface controller" but I'm not 100% sure) has a coherent interface
> for SMP and a non-coherent interface for I/O.

> </quote>

> Seems like that's a great deal of traffic and a great many drops for one
> bus. Any thoughts?

Not a multi-drop bus.

It's a repeater ring.

http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318&p=9

The more SPEs you add, the more cycles it takes for data to hop across the
ring, clockwise or counter-clockwise.
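To put rough numbers on that hop count, here is a minimal sketch. It assumes a ring of `stops` evenly spaced stations, one hop per link, and that data may travel in either direction. The names `ring_hops` and `worst_case_hops` and the station numbering are made up for illustration; none of this comes from IBM documentation.

```python
def ring_hops(src: int, dst: int, stops: int) -> tuple[int, str]:
    """Return (hops, direction) for the shorter way around a bidirectional ring.

    Stations are numbered 0..stops-1; this numbering is hypothetical.
    """
    if not (0 <= src < stops and 0 <= dst < stops):
        raise ValueError("station index out of range")
    clockwise = (dst - src) % stops   # hops when going "up" the numbering
    counter = (src - dst) % stops     # hops when going the other way
    if clockwise <= counter:
        return clockwise, "clockwise"
    return counter, "counter-clockwise"


def worst_case_hops(stops: int) -> int:
    """Largest hop count any transfer needs when the shorter direction is used."""
    return stops // 2
```

With the eleven connections counted in the original post, `ring_hops(0, 9, 11)` picks the counter-clockwise path at 2 hops, and `worst_case_hops(11)` is 5, so the worst case grows with the number of stations on the ring.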

The data rings operate at 4 GHz. The control part operates at half
of that freq.

I am guessing that the control is a multidrop bus, although I do not
know for certain.

--
davewang202(at)yahoo(dot)com
 
Guest

On Tue, 8 Mar 2005 15:07:50 +0000 (UTC), David Wang <foo@bar.invalid>
wrote:

>Robert Myers <rmyers1400@comcast.net> wrote:

<snip>

>
>> Seems like that's a great deal of traffic and a great many drops for one
>> bus. Any thoughts?
>
>Not a multi-drop bus.
>
>It's a repeater ring.
>
>http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318&p=9
>
>The more SPEs you add, the more cycles it takes for data to hop across the
>ring, clockwise or counter-clockwise.
>
>The data rings operate at 4 GHz. The control part operates at half
>of that freq.
>
>I am guessing that the control is a multidrop bus, although I do not
>know for certain.

Thanks. I had actually looked at your writeup on realworldtech. For
some reason, the (really very clear) diagram of the EIB interconnect
didn't register.

The one-hop interconnect is consistent with the streaming architecture
I had been expecting.

RM
 
Guest

On Tue, 8 Mar 2005 15:07:50 +0000 (UTC), David Wang <foo@bar.invalid> wrote:

>Robert Myers <rmyers1400@comcast.net> wrote:
>> Greetings!
>
>> http://arstechnica.com/articles/paedia/cpu/cell-2.ars
>
>> has a nice, if puzzling, layout of the Cell processor architecture.
>> What puzzles me is the "EIB," which, wideband though it may be, and
>> operating at a modest frequency, has *eleven* connections to what is
>> diagrammed as a single shared bus.
>
>> <quote>
>
>> The individual SPEs can use this bus to communicate with each other,
>> and this includes the transfer of data in between SPEs acting as peers
>> on the network. The SPEs also communicate with the L2 cache, with main
>> memory (via the MIC), and with the rest of the system (via the BIC).
>> The onboard memory interface controller (MIC) supports the new Rambus
>> XDR memory standard, and the BIC (which I think stands for "bus
>> interface controller" but I'm not 100% sure) has a coherent interface
>> for SMP and a non-coherent interface for I/O.
>
>> </quote>
>
>> Seems like that's a great deal of traffic and a great many drops for one
>> bus. Any thoughts?
>
>Not a multi-drop bus.
>
>It's a repeater ring.
>
>http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318&p=9
>
>The more SPEs you add, the more cycles it takes for data to hop across the
>ring, clockwise or counter-clockwise.
>
>The data rings operate at 4 GHz. The control part operates at half
>of that freq.
>
>I am guessing that the control is a multidrop bus, although I do not
>know for certain.

That'd pretty much clobber any point in running the ring at 4 GHz, wouldn't it?
Hopefully, all control is in-band...

/daytripper
 
Guest

On Fri, 11 Mar 2005 04:50:26 +0000 (UTC), David Wang <foo@bar.invalid>
wrote:

>Larry R. Moore <larry.r.moore@att.net> wrote:
>> Some additional thoughts on the EIB logic:
>

<snip>

>
>> A "tag" as mentioned in the first article is something that usually
>> travels with a "package", in this case, the data. I don't understand
>> how a single tag can travel with data moving in two directions. Is it
>> possible that the tag is communicated somehow to the bus interface
>> controller (BIC)? The BIC must be told the destination of the data in
>> order to schedule its path. It would seem to me that there must be a
>> tag generated at each element outbound interface for each transmission
>> of data.
>
>My understanding of the EIB is that the rings are controlled by the
>switching network actually labelled as EIB in the center of the chip.
>The rings themselves, as physical wires, run over parts of the SPEs, and
>the EIB reaches into the SPEs to direct on/off/repeat buffer
>operations. The scheduling for the EIB is coordinated with the little
>block labelled as MBL (Master Bus Logic? I'm not sure). The BIC
>controls the FlexIO, not the EIB. The FlexIO block is a special
>circuit that Rambus developed for IBM's 90nm SOI process, and the BIC
>is the logic that drives the FlexIO circuits.
>
>Anyways, the reason why I think the "tag" runs to/from different places
>is this: Data doesn't have to travel to all SPEs, it just has to travel
>from source to destination, but the tags have to be broadcast to all
>SPEs because the SPEs do have to snoop the tag (bus?)
>for coherency of addresses in the host processor's address space.
>
>That's why I think that the tag part is a different sort of animal
>that lets you broadcast things and put address requests on it; the EIB
>controller then sets up the switching fabric that directs the
>on/off/pass-through operations on the data rings.
>
How do you envision that working, in practice? The on/off/pass
through setting of the interface at each SPE is programmed on time?
Pass through so many clock ticks, consume data for so many clock
ticks? How does the consuming SPE know what data it is getting?

RM
 
Guest

Robert Myers <rmyers1400@comcast.net> wrote:
> On Fri, 11 Mar 2005 04:50:26 +0000 (UTC), David Wang <foo@bar.invalid>
> wrote:

> >My understanding of the EIB is that the rings are controlled by the
> >switching network actually labelled as EIB in the center of the chip.
> >The rings themselves, as physical wires, run over parts of the SPEs, and
> >the EIB reaches into the SPEs to direct on/off/repeat buffer
> >operations. The scheduling for the EIB is coordinated with the little
> >block labelled as MBL (Master Bus Logic? I'm not sure). The BIC
> >controls the FlexIO, not the EIB. The FlexIO block is a special
> >circuit that Rambus developed for IBM's 90nm SOI process, and the BIC
> >is the logic that drives the FlexIO circuits.
> >
> >Anyways, the reason why I think the "tag" runs to/from different places
> >is this: Data doesn't have to travel to all SPEs, it just has to travel
> >from source to destination, but the tags have to be broadcast to all
> >SPEs because the SPEs do have to snoop the tag (bus?)
> >for coherency of addresses in the host processor's address space.
> >
> >That's why I think that the tag part is a different sort of animal
> >that lets you broadcast things and put address requests on it; the EIB
> >controller then sets up the switching fabric that directs the
> >on/off/pass-through operations on the data rings.

> How do you envision that working, in practice? The on/off/pass
> through setting of the interface at each SPE is programmed on time?

Yes.

The way I imagine the EIB working is based on the notion that
"the user controls all data movement explicitly" via software-managed
threads. So all the "tags" (i.e. requests) are going
to be initiated by the PPE. Each request is sent to the MBL, which
programs the EIB control for the on/off/pass-through operations
as the data streams are moved from point A to point B. That
tag has to be sent from the PPE/MBL through the tag structure
to all SPEs in the processor, so they have a chance to
snoop it and intervene if necessary. The "intervention" mechanism in
turn means that the SPEs must be able to respond in some way
via the same tag structures back to the PPE/MBL.

The EIB controller knows how long to hold each repeater element,
so when it's done, it just releases the switch and the switching
element can then be used for the construction of another set of
pipes to direct dataflow.
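As a back-of-the-napkin illustration of "knows how long to hold each repeater element", here is a sketch of the arithmetic under some loudly stated assumptions: the data ring is 128 bits (16 bytes) wide, the width floated elsewhere in this thread; one beat moves one hop per tick; and beats are issued back to back. The function name `hold_ticks` and the timing model are mine, not a description of the real controller.

```python
def hold_ticks(transfer_bytes: int, hops: int, width_bytes: int = 16) -> int:
    """Ticks a path must stay reserved for one transfer.

    Model (an assumption): beat i enters the ring at tick i and reaches
    the destination at tick i + hops, so the path is busy from tick 0
    through tick (beats - 1 + hops), which is beats + hops ticks in all.
    """
    if transfer_bytes <= 0 or hops < 0 or width_bytes <= 0:
        raise ValueError("sizes must be positive and hops non-negative")
    beats = (transfer_bytes + width_bytes - 1) // width_bytes  # ceiling division
    return beats + hops
```

Under this model a 1 KB transfer over 3 hops holds its path for 64 + 3 = 67 ticks, after which the switches can be released for another transfer.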

> Pass through so many clock ticks, consume data for so many clock
> ticks? How does the consuming SPE know what data it is getting?

The EIB controller will just have to hold the switches for as many
ticks as required. This jibes nicely with the description of
"reserving channel capacity deterministically".

The SPE knows what data it is getting because the tag structure
interface reaches into the SPE and touches the DMA engine. The
DMA engine knows where the data is coming from and where in LS to
put that data. Or where in LS it should grab the data from and
how much of it to put onto the on ramp of the EIB.
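To make the DMA side concrete, here is a toy sketch of what "where in LS to put that data" and "where in LS it should grab the data from" could look like in software terms. The local store and main memory are modeled as plain byte arrays; `dma_get`, `dma_put`, and the 256 KB `LS_SIZE` are assumptions for illustration (the thread does not state the LS size), and real hardware certainly does more than a bounds-checked copy.

```python
LS_SIZE = 256 * 1024  # assumption: local store size, not stated in this thread


def _check(region_len: int, offset: int, size: int) -> None:
    """Reject transfers that fall outside a memory region."""
    if size <= 0 or offset < 0 or offset + size > region_len:
        raise ValueError("transfer does not fit in the target region")


def dma_get(local_store: bytearray, ls_offset: int,
            main_memory: bytearray, address: int, size: int) -> None:
    """Copy `size` bytes from main memory into the local store (the off ramp)."""
    _check(len(local_store), ls_offset, size)
    _check(len(main_memory), address, size)
    local_store[ls_offset:ls_offset + size] = main_memory[address:address + size]


def dma_put(local_store: bytearray, ls_offset: int,
            main_memory: bytearray, address: int, size: int) -> None:
    """Copy `size` bytes from the local store out to main memory (the on ramp)."""
    _check(len(local_store), ls_offset, size)
    _check(len(main_memory), address, size)
    main_memory[address:address + size] = local_store[ls_offset:ls_offset + size]
```

The point of the sketch is only that the request carries everything the receiving side needs (source address, LS offset, size), so the data itself can travel without any header.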

<disclaimer>

Based on my understanding of the EIB control flow mechanism,
obtained from a 20 minute chat with the DE who designed the
EIB, with diagrams drawn literally on the back of a napkin.
It may contain inaccuracies due to faulty memory or incorrect
interpretation of statements.

</disclaimer>

--
davewang202(at)yahoo(dot)com
 
Guest

On Fri, 11 Mar 2005 16:20:09 +0000 (UTC), David Wang <foo@bar.invalid>
wrote:

>Robert Myers <rmyers1400@comcast.net> wrote:

<snip>

>
>> How do you envision that working, in practice? The on/off/pass
>> through setting of the interface at each SPE is programmed on time?
>
>Yes.
>
>The way I imagine the EIB working is based on the notion that
>"the user controls all data movement explicitly" via software-managed
>threads. So all the "tags" (i.e. requests) are going
>to be initiated by the PPE. Each request is sent to the MBL, which
>programs the EIB control for the on/off/pass-through operations
>as the data streams are moved from point A to point B. That
>tag has to be sent from the PPE/MBL through the tag structure
>to all SPEs in the processor, so they have a chance to
>snoop it and intervene if necessary. The "intervention" mechanism in
>turn means that the SPEs must be able to respond in some way
>via the same tag structures back to the PPE/MBL.
>
>The EIB controller knows how long to hold each repeater element,
>so when it's done, it just releases the switch and the switching
>element can then be used for the construction of another set of
>pipes to direct dataflow.
>

That makes the network like a circuit-switched telephone network.
Producer and consumer have a reserved connection until the
communication is complete, with the MBL giving a fast busy when no
circuit is available.
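The telephone analogy can be sketched in a few lines. This is a toy model only: one ring, clockwise paths only, and a path is granted only if every link along it is idle. The class `RingCircuits`, its methods, and the station numbering are hypothetical, and returning `None` stands in for the "fast busy".

```python
class RingCircuits:
    """Toy circuit-switched ring: link i joins station i to station (i + 1) % stops."""

    def __init__(self, stops: int) -> None:
        self.stops = stops
        self.busy = [False] * stops  # one flag per link

    def _links(self, src: int, dst: int) -> list[int]:
        """Links used going clockwise from src to dst."""
        links = []
        i = src
        while i != dst:
            links.append(i)
            i = (i + 1) % self.stops
        return links

    def connect(self, src: int, dst: int):
        """Reserve a path and return its links, or None (fast busy) if any link is taken."""
        if src == dst or not (0 <= src < self.stops and 0 <= dst < self.stops):
            raise ValueError("need two distinct stations on the ring")
        links = self._links(src, dst)
        if any(self.busy[link] for link in links):
            return None
        for link in links:
            self.busy[link] = True
        return links

    def release(self, links: list[int]) -> None:
        """Hang up: free the links so another circuit can use them."""
        for link in links:
            self.busy[link] = False
```

Note that two circuits that share no links (say 0 to 3 and 3 to 6) can be up at the same time, which is one way a single ring could carry several transfers concurrently.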

>> Pass through so many clock ticks, consume data for so many clock
>> ticks? How does the consuming SPE know what data it is getting?
>
>The EIB controller will just have to hold the switches for as many
>ticks as required. This jibes nicely with the description of
>"reserving channel capacity deterministically".
>
>The SPE knows what data it is getting because the tag structure
>interface reaches into the SPE and touches the DMA engine. The
>DMA engine knows where the data is coming from and where in LS to
>put that data. Or where in LS it should grab the data from and
>how much of it to put onto the on ramp of the EIB.
>
Presumably there is a protocol that we have yet to learn about,
although I suppose the only thing a software type needs to know about
is the interface to the protocol.

><disclaimer>
>
>Based on my understanding of the EIB control flow mechanism,
>obtained from a 20 minute chat with the DE who designed the
>EIB, with diagrams drawn literally on the back of a napkin.
>It may contain inaccuracies due to faulty memory or incorrect
>interpretation of statements.
>
></disclaimer>

What? Like you're going to get sued over a usenet post? ;-).

RM
 
Guest

Larry R. Moore <larry.r.moore@att.net> wrote:
> David Wang wrote:
> >
> > My understanding of the EIB is that the rings are controlled by the
> > switching network actually labelled as EIB in the center of the chip.
> > The rings themselves, as physical wires, run over parts of the SPEs, and
> > the EIB reaches into the SPEs to direct on/off/repeat buffer
> > operations. The scheduling for the EIB is coordinated with the little
> > block labelled as MBL (Master Bus Logic? I'm not sure)

> But let me back up a little. When you refer to an "on/off/repeat
> buffer", is this a buffer of 128 bits that can be written (on), read
> (off), or transmitted to the next buffer on the ring (repeat)? Does
> this mean that a word outbound from SPE#1 is moved to the SPE#2 buffer
> in one bus cycle and then repeated to the SPE#3 inbound buffer in a
> second bus cycle? Is it accurate to say that the data rings are
> comprised of 128 twelve-stage shift registers, with additional logic at
> each stage to support read/write functions? This is, of course, an
> oversimplification. It must be a little more complicated than this
> because the Hofstee paper suggests that there are both inbound and
> outbound interfaces that can be used simultaneously to effect twelve
> transfers in one bus cycle.

I haven't thought about what happens when you have one buffer dumping
data off at the "off ramp" while the "on ramp" is driving data onto the
next stage. Perhaps that's where you'd lose half of your efficiency
@ 4 GHz, and get the 96 bytes per cycle concurrency.

I'll think about it and draw myself a structure to illustrate the
dataflow later. Right now I'm supposed to be doing something more
productive.

> > Anyways, the reason why I think the "tag" runs to/from different places
> > is this: Data doesn't have to travel to all SPEs, it just has to travel
> > from source to destination, but the tags have to be broadcast to all
> > SPEs because the SPEs do have to snoop the tag (bus?)
> > for coherency of addresses in the host processor's address space.
> >
> Oh! You believe that "tag" is the name of a bus? Perhaps it was meant
> in the sense of "playing tag" with the data. They could be simple
> signals controlling ring selection, on, off and repeat logic at each
> interface. I don't know if 64 bits would be enough, though.

I think of it as the "tag" of a block of memory. Sort of like a
tag for a cacheline, but this is a tag for a block of memory
explicitly passed to and from the LS. That's what contains the
request in terms of PPE address pointer/size/source/destination.

> > That's why I think that the tag part is a different sort of animal
> > that lets you broadcast things and put address requests on it; the EIB
> > controller then sets up the switching fabric that directs the
> > on/off/pass-through operations on the data rings.

> You may be right. In order to avoid bus contention, the MBL could be
> bus master, polling each of the interfaces for data transfer requests
> and writing data transfer schedules. It must be an interesting
> algorithm.

I don't think the MBL needs to "poll" in the sense of asking each guy
what its status is. I'd imagine that a lookup table would exist
somewhere near the MBL that tells the MBL which transfer(s)
are occurring on the EIB, how long each transfer is for, and when
"resource X" will be free. The PPE can then tell the MBL to
schedule the next transfer based on resource availability.
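A lookup table of that kind could be as simple as "resource name -> tick at which it becomes free". The sketch below is a guess at the idea, not at the hardware: `TransferTable`, `schedule`, and the resource names are all invented for illustration, and time is counted in abstract ticks.

```python
class TransferTable:
    """Toy bookkeeping table: when does each resource become free?"""

    def __init__(self) -> None:
        self.free_at: dict[str, int] = {}  # resource name -> first free tick

    def schedule(self, resources: list[str], now: int, duration: int) -> tuple[int, int]:
        """Book all `resources` for `duration` ticks at the earliest common start.

        Returns (start_tick, end_tick). No polling is needed: the table
        already records when every resource in use will be released.
        """
        if duration <= 0:
            raise ValueError("duration must be positive")
        start = max([now] + [self.free_at.get(r, 0) for r in resources])
        end = start + duration
        for r in resources:
            self.free_at[r] = end
        return start, end
```

For example, if links `seg0` and `seg1` are booked for ticks 0 to 64, a later request that needs `seg1` and `seg2` gets pushed to start at tick 64, while a request that only needs `seg5` can start right away.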

> > I don't think it's a packet network. I think the data rings just carry
> > data, and the request is encapsulated in the tag ring/bus/blah, that
> > request goes through the EIB/MBL, sets up the transfer, and tells the
> > destination guy that something is coming.

> Yes, it doesn't make sense to shove the tag onto a data ring that may
> already be in use and blocked. A fabric switch would have packet
> buffering and that capability has to be shoved back into the computing
> elements. Better to send the tag to the scheduler, the MBL, right away.
> I think it is safe to assume, however, that the data will be sent in
> measured packets to minimize the number of tag requests.

I think part of the request is "size". There doesn't seem to be a need
for fixed-size packets. Since the LS is limited in size and
the user (through the use of the PPE) has explicit control over when
the data migration occurs and when the data processing occurs, he/she
should be able to trade off packet sizes for specific applications.
Some threads may want to deal with 1 KB data chunks, while others may
be better with 64 KB data chunks. (Just a WAG)
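To illustrate the trade-off, here is a sketch that splits one buffer into requests of a chosen chunk size. The `TransferRequest` fields mirror the address/size/source/destination description of the tag given earlier in this post; the field names, the `split_transfer` helper, and the endpoint labels are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """One hypothetical tag: what to move, how much, from where, to where."""
    address: int
    size: int
    source: str
    destination: str


def split_transfer(address: int, total_bytes: int, chunk_bytes: int,
                   source: str, destination: str) -> list[TransferRequest]:
    """Break a buffer into requests of at most `chunk_bytes` each."""
    if total_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    requests = []
    offset = 0
    while offset < total_bytes:
        size = min(chunk_bytes, total_bytes - offset)  # last chunk may be short
        requests.append(TransferRequest(address + offset, size, source, destination))
        offset += size
    return requests
```

Moving 128 KB in 1 KB chunks takes 128 requests, while 64 KB chunks take only 2, so bigger chunks mean fewer tags to schedule at the cost of tying up more of the LS per transfer.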

--
davewang202(at)yahoo(dot)com