Is HT faster than InfiniBand?

Archived from groups: comp.sys.ibm.pc.hardware.chips

Which is faster, HyperTransport or InfiniBand?
  1.

    On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:

    >Is Hypertransport faster or Infiniband?

    http://139.95.253.214/SRVS/CGI-BIN/WEBCGI.EXE/,/?St=51,E=0000000000220593058,K=9241,Sxi=1,Case=obj(4650)

    How much faster is HyperTransport™ than other technologies like PCI,
    PCI-X or InfiniBand?

    Traditional PCI transfers data at 133 MB/sec, PCI-X at 1 GB/sec, and
    InfiniBand at about 4 GB/sec in the 12-channel implementation and
    1.25 GB/sec in the more popular 4-channel one. HyperTransport transfers
    data at 6.4 GB/sec. It is about 50 times faster than PCI, 6 times faster
    than PCI-X and 5 times faster than 4-channel InfiniBand. It is important
    to remember that InfiniBand is not an alternative to HyperTransport
    technology. Each HyperTransport I/O bus consists of two point-to-point
    unidirectional links. Each link can be from two bits to 32 bits wide;
    standard widths of 2, 4, 8, 16, and 32 bits are supported. Asymmetric
    HyperTransport I/O buses are permitted in situations requiring different
    upstream and downstream bandwidths. Commands, addresses, and data (CAD)
    all use the same bits. So a simple, low-cost HyperTransport I/O
    implementation using two CAD bits in each direction is designed to
    provide a raw bandwidth of up to 400 MB/sec in each direction (at the
    highest specified signaling rate of 1.6 Gbit/sec per bit). The two
    directions combined give about 6 times the peak bandwidth of PCI 32/33.
    A larger implementation using 16 CAD bits in each direction is designed
    to provide up to 3.2 GB/sec in each direction - combined, 48 times the
    peak bandwidth of 32-bit PCI running at 33 MHz.
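Those ratios can be sanity-checked directly from the quoted peak figures. A minimal sketch (the numbers are the marketing-style decimal peaks from the FAQ above, not measured throughput):

```python
# Peak figures as quoted above (decimal units: MB = 1e6, GB = 1e9 bytes).
MB, GB = 1e6, 1e9

pci_33   = 133 * MB      # 32-bit PCI at 33 MHz
pci_x    = 1.0 * GB
ib_4x    = 1.25 * GB     # 4-channel InfiniBand
ht_total = 6.4 * GB      # 32-bit HyperTransport, both directions combined

print(ht_total / pci_33)   # ~48   ("about 50 times faster than PCI")
print(ht_total / pci_x)    # ~6.4
print(ht_total / ib_4x)    # ~5.1

# Per-link raw bandwidth: width_bits * 1.6 Gbit/s per bit, per direction.
def ht_link_bw(width_bits, bit_rate=1.6e9):
    return width_bits * bit_rate / 8   # bytes/s per direction

print(ht_link_bw(2))    # 400 MB/s each way (the 2-bit CAD example)
print(ht_link_bw(16))   # 3.2 GB/s each way (the 16-bit example)
```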
  2.

    On Thu, 28 Jul 2005 19:53:05 -0500, Ed <spam@hotmail.com> wrote:

    >On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
    >
    >>Is Hypertransport faster or Infiniband?
    >
    ><snip>

    Your HT info is a little out of date: clock speeds on mbrds have been at
    1GHz since Fall 2004, giving a peak bandwidth of 4GB/s in each direction on
    a 16/16 link, minus any packetization overhead of course. The next jump, to
    1.4GHz, is in the works but I don't know where they go when they reach the
    original design target of 1.6GHz.

    --
    Rgds, George Macdonald
  3.

    On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:

    >Is Hypertransport faster or Infiniband?

    It doesn't matter - they are not targeted at the same "problem". I must
    say InfiniBand's proponents have not helped here by announcing it as a
    "do-all" "solution" for on-board as well as off-board links... so bloody
    confusing. The way I see it, InfiniBand, despite claims, is an off-board
    wired or back-plane transport which possibly has better error
    detection/recovery.

    On that last point, I keep reading that Hypertransport suffers from lack of
    error detection/recovery but CRC checking and packet retries are clearly in
    the specs so I don't know what the full story is there yet. To put things
    in perspective, Hypertransport links on currently available mbrds run at
    approximately the same speed as current PCI Express, which on a x16 link
    has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
    bandwidth of 4GB/s in each direction.
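The rough arithmetic behind that comparison (assuming PCIe 1.x at 2.5 GT/s per lane with 8b/10b encoding, and HT at a 1 GHz double-data-rate clock on a 16-bit link; this simple math gives 4.0 GB/s per direction for both, so the 4.1 figure presumably comes from a slightly different rounding):

```python
# PCIe 1.x x16: 16 lanes * 2.5 Gbit/s, with 8b/10b leaving 80% for payload.
pcie_x16 = 16 * 2.5e9 * (8 / 10) / 8   # bytes/s per direction

# HT 16/16 link: 1 GHz clock, double data rate, 16 bits per direction.
ht_16_16 = 16 * (2 * 1e9) / 8          # bytes/s per direction

print(pcie_x16, ht_16_16)   # both come out to 4.0e9 bytes/s per direction
```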

    --
    Rgds, George Macdonald
  4.

    George Macdonald wrote:
    > <snip>
    > The way I see it, InfiniBand, despite claims, is an off-board
    > wired or back-plane transport which possibly has better error
    > detection/recovery.
    >
    > On that last point, I keep reading that Hypertransport suffers from lack of
    > error detection/recovery but CRC checking and packet retries are clearly in
    > the specs so I don't know what the full story is there yet.
    >

    It may be in the spec, but the retry is recent. So what do you do in a
    pc when you get an error and don't find out about it for 512 bytes? Do
    you retry the last 512 bytes worth of transactions? If you would check
    into it you would find that the actual result is a crash of some sort.

    I don't know anyone that has been pushing IB as a "do all" solution.
    Clearly it is not a FSB. And IB does for sure have better recovery and
    detection. HT is trying to catch up, but it has an installed base
    problem with the networking extensions.

    --
    Del Cecchi
    "This post is my own and doesn't necessarily represent IBM's positions,
    strategies or opinions."
  5.

    On Wed, 03 Aug 2005 13:37:28 -0500, Del Cecchi <cecchinospam@us.ibm.com>
    wrote:

    >George Macdonald wrote:
    >><snip>
    >
    >It may be in the spec, but the retry is recent. So what do you do in a
    >pc when you get an error and don't find out about it for 512 bytes? Do
    >you retry the last 512 bytes worth of transactions? If you would check
    >into it you would find that the actual result is a crash of some sort.

    So the packet length is 512 bit-times and the CRC comes embedded at 64
    bit-times into the next packet. I guess the only solution would be to hold
    a packet-sized buffer, which would kill the latency advantage. Is that
    unusual? Does PCI Express use a much smaller packet size, thus giving it a
    faster retry cycle? Then again HT has separate channels for the
    up/down-links so you don't have to turn around a bi-directional channel.
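For a sense of scale, buffering a whole packet until its CRC can be checked would cost roughly the following per hop (a sketch using the 512 and 64 bit-time figures above and the 1.6 GT/s per-bit rate quoted earlier):

```python
# Store-and-forward delay: hold a 512-bit-time packet window plus the
# 64 bit-times until its CRC arrives, at 1.6e9 bit-times per second.
bit_time = 1 / 1.6e9
hold = (512 + 64) * bit_time        # seconds of added latency per hop
print(hold * 1e9)                   # ~360 ns
```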

    >I don't know anyone that has been pushing IB as a "do all" solution.
    >Clearly it is not a FSB. And IB does for sure have better recovery and
    >detection. Although HT is trying, but they have an installed base
    >problem with the networking extensions.

    The initial hype on Infiniband was very waffly IMO - it clearly wanted to
    pose as a do-all on-board/off-board link, not necessarily a FSB. From my
    POV they were not clear enough in what it was to target, i.e. back planes
    and wires.

    --
    Rgds, George Macdonald
  6.

    Del Cecchi wrote:
    > George Macdonald wrote:
    > ><snip>
    >
    > It may be in the spec, but the retry is recent. So what do you do in a
    > pc when you get an error and don't find out about it for 512 bytes? Do
    > you retry the last 512 bytes worth of transactions? If you would check
    > into it you would find that the actual result is a crash of some sort.
    >

    So how does that work in practice? One gathers from Stone and
    Partridge's work on ethernet checksum vs CRC errors that undetected
    errors are probably much more common than anyone would have cared to
    think. Does anybody know about HT-type traffic? If someone bothers to
    do a study, will we find out that computers have turned into random
    number generators?

    As it is, the Stone and Partridge stuff doesn't seem to have created
    much more than some interesting exchanges on comp.arch. Does anybody
    care anymore? I'm sure that IBM does, but can it afford to?

    RM
  7.

    On 4 Aug 2005 08:22:53 -0700, "Robert Myers" <rbmyersusa@gmail.com> wrote:

    ><snip>
    >
    >So how does that work in practice? One gathers from Stone and
    >Partridge's work on ethernet checksum vs CRC errors that undetected
    >errors are probably much more common than anyone would have cared to
    >think. Does anybody know about HT-type traffic? If someone bothers to
    >do a study, will we find out that computers have turned into random
    >number generators?

    Turned into? PCs have always been random number generators. Why else did
    we need a Ctl/Alt/Del key combo?:-)

    I guess it's true to say that, as PCs have migrated up to "important"
    tasks, the need for confidence in data integrity has increased.

    --
    Rgds, George Macdonald
  8.

    Robert Myers wrote:
    > <snip>
    >
    > So how does that work in practice? One gathers from Stone and
    > Partridge's work on ethernet checksum vs CRC errors that undetected
    > errors are probably much more common than anyone would have cared to
    > think. Does anybody know about HT-type traffic? If someone bothers to
    > do a study, will we find out that computers have turned into random
    > number generators?
    >
    > As it is, the Stone and Partridge stuff doesn't seem to have created
    > much more than some interesting exchanges on comp.arch. Does anybody
    > care anymore? I'm sure that IBM does, but can it afford to?
    >
    > RM
    >

    I don't know about their work, and I know little about ethernet. I do
    know from experience in the lab that a 32 bit crc, properly chosen, with
    retry can cope with quite high error rates without any problem with the
    system. And I would believe that the systems in question would not
    tolerate very many undetected errors, because the disks for virtual
    memory, and the coherence traffic if any, were carried over the network in
    question along with all the other I/O traffic.


    --
    Del Cecchi
    "This post is my own and doesn't necessarily represent IBM's positions,
    strategies or opinions."
  9.

    Del Cecchi wrote:
    > Robert Myers wrote:

    > >
    > > As it is, the Stone and Partridge stuff doesn't seem to have created
    > > much more than some interesting exchanges on comp.arch. Does anybody
    > > care anymore? I'm sure that IBM does, but can it afford to?
    > >
    >
    > I don't know about their work, and I know little about ethernet. I do
    > know from experience in the lab that a 32 bit crc, properly chosen, with
    > retry can cope with quite high error rates without any problem with the
    > system. And I would believe that the systems in question would not
    > tolerate very many undetected errors because the disks for the virtual
    > memory, and the coherence traffic if any was carried over the network in
    > question along with all the other I/O traffic.
    >

    I think you did participate in the discussion of this subject on
    comp.arch:

    Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
    Proceedings of the ACM conference on Applications, Technologies,
    Architectures, and Protocols for Computer Communication (SIGCOMM'00),
    Stockholm, Sweden, August/September 2000, pp. 309-319

    Abstract

    "Traces of Internet packets from the past two years show that between 1
    packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
    links where link-level CRCs should catch all but 1 in 4 billion errors.
    For certain situations, the rate of checksum failures can be even
    higher: in one hour-long test we observed a checksum failure of 1
    packet in 400. We investigate why so many errors are observed, when
    link-level CRCs should catch nearly all of them. We have collected
    nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
    dataset shows the Internet has a wide variety of error sources which
    can not be detected by link-level checks. We describe analysis tools
    that have identified nearly 100 different error patterns. Categorizing
    packet errors, we can infer likely causes which explain roughly half
    the observed errors. The causes span the entire spectrum of a network
    stack, from memory errors to bugs in TCP. After an analysis we conclude
    that the checksum will fail to detect errors for roughly 1 in 16
    million to 10 billion packets. From our analysis of the cause of
    errors, we propose simple changes to several protocols which will
    decrease the rate of undetected error. Even so, the highly non-random
    distribution of errors strongly suggests some applications should
    employ application-level checksums or equivalents."

    It may not be a good model for the possibility of other link-level
    errors, but it does make you wonder.

    In overclocking tests, I've found that PC's will tolerate significant
    memory errors without giving any immediate indication of a problem.
    Short of a crash, I don't know how you'd know anything was wrong
    without application-level checking.


    RM
  10.

    On 5 Aug 2005 04:10:47 -0700, "Robert Myers" <rbmyersusa@gmail.com> wrote:

    ><snip>
    >
    >It may not be a good model for the possibility of other link-level
    >errors, but it does make you wonder.

    I know of two NICs which are reputed to have a bug in their checksum
    offloading. What amazes me is that the only software where I've seen a
    hiccup from this is Eudora, where it reports error 10053 or 10054 when you
    try to send a longish e-mail msg. Turning off "Checksum Offload" fixes the
    problem.

    >In overclocking tests, I've found that PC's will tolerate significant
    >memory errors without giving any immediate indication of a problem.
    >Short of a crash, I don't know how you'd know anything was wrong
    >without application-level checking.

    I suspect that many overclockers are running slightly over the ragged
    edge... and that stable really means very low error rate.

    --
    Rgds, George Macdonald
  11.

    "Robert Myers" <rbmyersusa@gmail.com> wrote in message
    news:1123240247.383142.203260@g14g2000cwa.googlegroups.com...
    > <snip>
    >
    > In overclocking tests, I've found that PC's will tolerate significant
    > memory errors without giving any immediate indication of a problem.
    > Short of a crash, I don't know how you'd know anything was wrong
    > without application-level checking.
    >
    > RM

    OK, thanks for the reminder. As I recall, claiming that the checksums
    were missing the errors was a mild distortion. The errors were
    transpositions of data blocks being fetched to the adapter so the data
    was bad when it got there. Is that not the case?
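The transposition explanation is easy to demonstrate. The Internet checksum (RFC 1071) is an order-independent ones'-complement sum of 16-bit words, so swapping two aligned words leaves it unchanged, while a CRC catches the same error. A minimal sketch (illustrative, not taken from the paper):

```python
import zlib

def internet_checksum(data: bytes) -> int:
    # RFC 1071: ones'-complement sum of big-endian 16-bit words.
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold end-around carries
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

good = bytes(range(64))
bad = good[2:4] + good[0:2] + good[4:]       # transpose two 16-bit words

assert internet_checksum(good) == internet_checksum(bad)  # checksum blind
assert zlib.crc32(good) != zlib.crc32(bad)                # CRC-32 sees it
```

Because ones'-complement addition is commutative, any reordering of whole 16-bit words is invisible to the checksum even though the data is corrupt.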

    As for PCs not being affected by memory errors, how many do you estimate
    it took to crash the system? The lab system I was referring to was
    seeing many errors per second.

    del
    >
  12.

    Bitstring <1123240247.383142.203260@g14g2000cwa.googlegroups.com>, from
    the wonderful person Robert Myers <rbmyersusa@gmail.com> said
    <snip>
    >In overclocking tests, I've found that PC's will tolerate significant
    >memory errors without giving any immediate indication of a problem.
    >Short of a crash, I don't know how you'd know anything was wrong
    >without application-level checking.

    If it'll (successfully) run Memtest86 overnight, and it'll pass the
    Prime95 torture tests, then it's working right (IME). If it won't, then
    it might run WinXP and applications anyway, but it'll do strange things
    from time to time ...

    --
    GSV Three Minds in a Can
    Contact recommends the use of Firefox; SC recommends it at gunpoint.
  13.

    Del Cecchi wrote:

    >
    > OK, thanks for the reminder. As I recall, claiming that the checksums
    > were missing the errors was a mild distortion. The errors were
    > transpositions of data blocks being fetched to the adapter so the data
    > was bad when it got there. Is that not the case?
    >

    That was one of the explanations. I wasn't convinced there was any one
    single explanation that would have dominated.

    I concluded that, if you really had to know that your data are
    reliable, you should probably do your own end-to-end error checking.

    > As for PCs not being affected by memory errors, how many do you estimate
    > it took to crash the system? The lab system I was referring to was
    > seeing many errors per second.
    >

    Oh, a few errors per hour will generally let a system run, IIRC. The
    difference in speed between running on the ragged edge like that and
    not running at all is so small that it isn't worth running on the
    ragged edge.

    RM
  14.

    "Robert Myers" <rbmyersusa@gmail.com> wrote in message
    news:1123349547.132583.294650@f14g2000cwb.googlegroups.com...
    > <snip>
    >
    > Oh, a few errors per hour will generally let a system run, IIRC. The
    > difference in speed between running on the ragged edge like that and
    > not running at all is so small that it isn't worth running on the
    > ragged edge.
    >
    > RM
    >
    The lab system was in the range of 10**5 errors/sec and still ran
    perfectly with 32 bit crc and retry. A bad cable can really prove your
    recovery mechanism.

    So it sounds like the protocol or the software or something for the
    ethernet systems in question was broken.

    del
  15.

    Del Cecchi <dcecchi.nospam@att.net> wrote:
    > The lab system was in the range of 10**5 error/sec and
    > still ran perfectly with 32 bit crc and retry.

    Yes, this is barely possible on a 100 Mbit/s system.
    A 64 byte packet has a 60% chance of arriving error free.
    Unfortunately, a 1500 byte packet has only a 0.0006% chance,
    assuming random error distribution. So acks get through,
    but data in will be bad.
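Those percentages follow from treating 10**5 errors/sec on a 100 Mbit/s link as a per-bit error probability of about 1e-3 (assuming independent, uniformly random bit errors):

```python
# 1e5 errors/sec on a 100 Mbit/s link ~ 1e-3 chance any given bit flips.
p_bit = 1e5 / 100e6

def p_ok(nbytes):
    """Probability a packet of nbytes arrives with no bit errors."""
    return (1 - p_bit) ** (8 * nbytes)

print(p_ok(64))     # ~0.60   small packets (and ACKs) mostly survive
print(p_ok(1500))   # ~6e-6   full-size frames almost never do
```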

    > A bad cable can really prove your recovery mechanism.

    Yep! Beware of newbies with crimpers! RJ45s are hard to do,
    and not just because the correct pattern is counter-intuitive.
    All the intuitive patterns split a pair which often gives
    some connectivity but poor performance. There are 40,320 ways
    of wiring the 8 conductor cable straight-thru. All but 1,152
    split at least one pair necessary for 10baseT or 100baseTX.
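
    The 40,320 and 1,152 counts can be reproduced by brute force. This is
    a small sketch (pair labels and pin positions are illustrative): the
    10baseT/100baseTX signal pairs sit on RJ45 pins 1-2 and 3-6.

```python
from itertools import permutations

# Label the 8 conductors by their twisted pair (4 pairs, 2 wires each).
PAIR_OF = [0, 0, 1, 1, 2, 2, 3, 3]

total = good = 0
for wiring in permutations(range(8)):    # one conductor assigned to each pin
    total += 1
    pins = [PAIR_OF[c] for c in wiring]  # which pair lands on each pin
    # 10baseT/100baseTX needs intact pairs on pins 1&2 and 3&6
    # (0-based indices 0,1 and 2,5).
    if pins[0] == pins[1] and pins[2] == pins[5]:
        good += 1

print(total, good)  # 40320 straight-thru wirings, 1152 keep both pairs intact
```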

    > So it sounds like the protocol or the software or something
    > for the ethernet systems in question were broken.

    I would hope any system with anything near 0.1% error rates
    was using ECC, not just CRC.

    -- Robert
  16. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    On Mon, 08 Aug 2005 13:12:06 GMT, Robert Redelmeier
    <redelm@ev1.net.invalid> wrote:

    >Del Cecchi <dcecchi.nospam@att.net> wrote:
    >> The lab system was in the range of 10**5 errors/sec and
    >> still ran perfectly with 32-bit CRC and retry.
    >
    >Yes, this is barely possible on a 100 Mbit/s system.
    >A 64 byte packet has a 60% chance of arriving error free.
    >Unfortunately, a 1500 byte packet has only a 0.0006% chance,
    >assuming random error distribution. So acks get through,
    >but data in will be bad.
    >
    >> A bad cable can really prove your recovery mechanism.
    >
    >Yep! Beware of newbies with crimpers! RJ45s are hard to do,
    >and not just because the correct pattern is counter-intuitive.
    >All the intuitive patterns split a pair which often gives
    >some connectivity but poor performance. There are 40,320 ways
    >of wiring the 8 conductor cable straight-thru. All but 1,152
    >split at least one pair necessary for 10baseT or 100baseTX.

    Switches with just a Web based interface, which allow you to collect error
    rates and mirror ports, are cheap now. All the Cat5 that I put in is now
    running 1Gb/s Full Duplex, with maybe 5-6 errors/week/port due, I believe,
    to speed ramping at PC power on.

    While there is undoubtedly bad cable around, much of it was done by
    "professionals" or taken off the shelf... or even just caused by physical
    abuse or misrouting in the wall or ceiling/floor cavity. I also tend to
    think much of "bad cable" is due to legacy "telephone" mentality, equipment
    practices and personnel. To me the punch-down block is a scary and
    dangerous place.:-)

    --
    Rgds, George Macdonald
  17. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    "Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
    news:GSIJe.390$Ub1.125@newssvr29.news.prodigy.net...
    > Del Cecchi <dcecchi.nospam@att.net> wrote:
    >> The lab system was in the range of 10**5 errors/sec and
    >> still ran perfectly with 32-bit CRC and retry.
    >
    > Yes, this is barely possible on a 100 Mbit/s system.
    > A 64 byte packet has a 60% chance of arriving error free.
    > Unfortunately, a 1500 byte packet has only a 0.0006% chance,
    > assuming random error distribution. So acks get through,
    > but data in will be bad.
    >
    >> A bad cable can really prove your recovery mechanism.
    >
    > Yep! Beware of newbies with crimpers! RJ45s are hard to do,
    > and not just because the correct pattern is counter-intuitive.
    > All the intuitive patterns split a pair which often gives
    > some connectivity but poor performance. There are 40,320 ways
    > of wiring the 8 conductor cable straight-thru. All but 1,152
    > split at least one pair necessary for 10baseT or 100baseTX.
    >
    >> So it sounds like the protocol or the software or something
    >> for the ethernet systems in question were broken.
    >
    > I would hope any system with anything near 0.1% error rates
    > was using ECC, not just CRC.
    >
    > -- Robert
    >
    This was a parallel source synchronous link (RIO) running at a GB/sec.
    And the error rate was packet errors. I didn't have any way to collect
    statistics on bit errors. CRC with retry is the moral equivalent of ECC.
  18. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
    > Switches with just a Web based interface, which allow you
    > to collect error rates and mirror ports, are cheap now.

    Any particular brands/models you'd recommend?

    > All the Cat5 that I put in is now running 1Gb/s Full Duplex,
    > with maybe 5-6 errors/week/port due, I believe, to speed
    > ramping at PC power on.

    Could be. Also could be interference from noisemakers
    like motor starts. But sounds like good cable.

    > While there is undoubtedly bad cable around, much of it
    > was done by "professionals" or taken off the shelf... or
    > even just caused by physical abuse or misrouting in the
    > wall or ceiling/floor cavity. I also tend to think much
    > of "bad cable" is due to legacy "telephone" mentality,

    Yes, a lot of that. But crimpers still aren't easy
    even after you know T-568A from T-568B.

    > equipment practices and personnel. To me the punch-down
    > block is a scary and dangerous place.:-)

    Hey, jacks have punchdowns too! And if you _really_ like
    retro, Siemon makes a Cat5e rated 66 block :)

    -- Robert
  19. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    Del Cecchi <dcecchi.nospam@att.net> wrote:
    > This was a parallel source synchronous link (RIO) running
    > at a GB/sec. And the error rate was packet errors.
    > I didn't have any way to collect statistics on bit errors.

    Probably the same 1e5/s -- that link was running around
    1e10 bit/s. Unless errors are non-random, or packets extremely large,
    the chances of having two+ errors in one packet are very small.

    > CRC with retry is the moral equivalent of ECC.

    Well, that's an odd sense of morality :)

    CRC with retry has low "clean" overhead, but throws away lots
    of good? bits. ECC has much higher overhead but seldom throws
    anything away. There is an error-rate breakpoint below which
    CRC/retry is best, above ECC is better.
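
    That breakpoint can be illustrated with a toy model. The overhead
    figures below are illustrative assumptions, not real protocol numbers:
    CRC/retry pays a full retransmission for every dirty packet, while ECC
    pays a fixed bandwidth tax regardless of the error rate.

```python
# Toy model of the CRC/retry vs. ECC trade-off (illustrative numbers only).
def crc_goodput(ber, packet_bits=12000, crc_bits=32):
    """Retry until clean: efficiency = payload fraction * P(packet clean)."""
    p_clean = (1.0 - ber) ** packet_bits
    return (1.0 - crc_bits / packet_bits) * p_clean

def ecc_goodput(ber, overhead=0.125):
    """ECC burns a fixed fraction of bandwidth but almost never retries."""
    return 1.0 - overhead  # assume the code corrects essentially all errors

# Sweep the bit-error rate to find where ECC overtakes CRC/retry.
ber = 1e-7
while crc_goodput(ber) > ecc_goodput(ber):
    ber *= 1.1
print(f"crossover near BER {ber:.1e}")
```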

    -- Robert
  20. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    "Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
    news:YkTJe.775$FV1.685@newssvr33.news.prodigy.com...
    > Del Cecchi <dcecchi.nospam@att.net> wrote:
    >> This was a parallel source synchronous link (RIO) running
    >> at a GB/sec. And the error rate was packet errors.
    >> I didn't have any way to collect statistics on bit errors.
    >
    > Probably the same 1e5/s -- that link was running around
    > 1e10 bit/s. Unless non random, or extremely large packets,
    > the chances of having two+ errors in one packet are very small.
    >
    >> CRC with retry is the moral equivalent of ECC.
    >
    > Well, that's an odd sense of morality :)
    >
    > CRC with retry has low "clean" overhead, but throws away lots
    > of good? bits. ECC has much higher overhead but seldom throws
    > anything away. There is an error-rate breakpoint below which
    > CRC/retry is best, above ECC is better.
    >
    > -- Robert

    Well, the error rate was *supposed* to be low, and latency is not an
    issue generally with this interface.

    Del
  21. Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

    On Mon, 08 Aug 2005 22:42:23 GMT, Robert Redelmeier
    <redelm@ev1.net.invalid> wrote:

    >George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
    >> Switches with just a Web based interface, which allow you
    >> to collect error rates and mirror ports, are cheap now.
    >
    >Any particular brands/models you'd recommend?

    Well, when we needed more 100Mb ports in the office and I figured I might
    look at a switch with 24x10/100Mb w. 2x1Gb ports to replace the old hubs, I
    couldn't see not splurging for a 24x10/100/1Gb for the small difference in
    price. I got a D-Link DGS-1224T - this was partly based on the fact that
    D-Link is apparently one of the few real mfrs of this stuff and my
    experience with the old D-Link hubs which worked reliably for nearly 10
    years. Newegg has the DGS-1224T for $355 now - less than I paid - and it
    has a Web-based management interface.

    It's been working fine for several months now; my only complaint: noise.
    The 2 fans are a bit loud and I'm going to have a word with them about it -
    they sell it as a rack-mountable or desktop switch and for the latter, it's
    too loud.

    It's possible that Linksys has something good but I'm prejudiced against
    them because of some bad NICs I got a few years ago which would hang the
    network/hubs when the system they were in was powered down. It's possible
    that they've improved quality since being taken over by Cisco but I'm still
    wary - those damned NICs tore out a lot of my hair till it dawned on me
    what was going on.

    >> All the Cat5 that I put in is now running 1Gb/s Full Duplex,
    >> with maybe 5-6 errors/week/port due, I believe, to speed
    >> ramping at PC power on.
    >
    >Could be. Also could be interference from noisemakers
    >like motor starts. But sounds like good cable.

    Yeah I guess it could be A/C sags/surges but I'd think we would see more if
    that was the case.

    >> While there is undoubtedly bad cable around, much of it
    >> was done by "professionals" or taken off the shelf... or
    >> even just caused by physical abuse or misrouting in the
    >> wall or ceiling/floor cavity. I also tend to think much
    >> of "bad cable" is due to legacy "telephone" mentality,
    >
    >Yes, a lot of that. But crimpers still aren't easy
    >even after you know T-568A from T-568B

    I do OK with ours but you have to buy a quality crimper - Paladin IIRC? To
    trim wires square, I find a sturdy pair of scissors is best.

    >> equipment practices and personnel. To me the punch-down
    >> block is a scary and dangerous place.:-)
    >
    >Hey, jacks have punchdowns too! And if you _really_ like
    >retro, Siemon makes a Cat5e rated 66 block :)

    We don't have a wiring closet/cabinet for our small (24xGb + 8x10BaseT)
    network - all wires are, err, straight through from PC to switch/hub -
    breaks all the rules, I know, but it works. I looked at getting a
    wall-mount cabinet but it's too much aggro - just buying the damned
    things is an exercise.

    --
    Rgds, George Macdonald