
Is HT faster than InfiniBand?

July 29, 2005 4:50:51 AM

Archived from groups: comp.sys.ibm.pc.hardware.chips (More info?)

Is Hypertransport faster or Infiniband?

July 29, 2005 4:50:52 AM

On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:

>Is Hypertransport faster or Infiniband?

http://139.95.253.214/SRVS/CGI-BIN/WEBCGI.EXE/,/?St=51,...(4650)

How much faster is HyperTransport™ than other technologies like PCI,
PCI-X or Infiniband?

Traditional PCI transfers data at 133 MB/sec, PCI-X at 1 GB/sec,
InfiniBand at about 4 GB/sec in the 12-channel implementation and 1.25
GB/sec in the more popular 4-channel one. HyperTransport transfers data at
6.4 GB/sec. It is about 50 times faster than PCI, 6 times faster than
PCI-X and 5 times faster than 4-channel InfiniBand. It is important to
remember that
InfiniBand is not an alternative to HyperTransport technology. Each
HyperTransport I/O bus consists of two point-to-point unidirectional
links. Each link can be from two bits to 32 bits wide. Standard bus
widths of 2, 4, 8, 16, and 32 bits are supported. Asymmetric
HyperTransport I/O buses are designed to be permitted in situations
requiring different upstream and downstream bandwidths. Commands,
addresses, and data (CAD) all use the same bits. So, a simple, low-cost
HyperTransport I/O implementation using two CAD bits in each direction
is designed to provide a raw bandwidth of up to 400 Megabytes per second
in each direction (at the highest possible speed of 1.6 Gbit/sec per pin). The two
directions combined give about 6 times the peak bandwidth of PCI 32/33.
A larger implementation using 16 CAD bits in each direction is designed
to provide bandwidth up to 3.2 Gigabytes per second both ways - 48 times
the peak bandwidth of 32-bit PCI running at 33MHz.
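
The FAQ's arithmetic is easy to check. Here is a minimal sketch in Python, assuming decimal units (1 GB = 1000 MB) and taking the 1.6 Gbit/sec per-pin rate and the 133 MB/sec PCI figure from the text above. The function name `ht_peak_mb_per_s` is made up for illustration.

```python
# Peak HyperTransport bandwidth from link width and per-pin signalling rate.
# The rates are the ones quoted in the FAQ text; the helper is hypothetical.

PCI_32_33_MB_S = 133.0  # peak of 32-bit, 33 MHz PCI, as quoted above


def ht_peak_mb_per_s(cad_bits: int, gbit_per_pin: float = 1.6) -> float:
    """Peak MB/s in ONE direction for a link with `cad_bits` CAD lines."""
    return cad_bits * gbit_per_pin * 1000 / 8  # Gbit/s -> MB/s, decimal units


for width in (2, 4, 8, 16, 32):
    one_way = ht_peak_mb_per_s(width)
    both = 2 * one_way
    print(f"{width:2d}-bit: {one_way:7.0f} MB/s each way, "
          f"{both:7.0f} MB/s combined, {both / PCI_32_33_MB_S:5.1f}x PCI 32/33")
```

With these inputs a 2-bit link comes out at 400 MB/sec each way (about 6 times PCI 32/33 for both directions combined) and a 16-bit link at 3.2 GB/sec each way (about 48 times PCI 32/33 combined).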

August 3, 2005 1:49:59 PM

On Thu, 28 Jul 2005 19:53:05 -0500, Ed <spam@hotmail.com> wrote:

>On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
>
>>Is Hypertransport faster or Infiniband?
>
>http://139.95.253.214/SRVS/CGI-BIN/WEBCGI.EXE/,/?St=51,...(4650)
>
>How much faster is HyperTransport™ than other technologies like PCI,
>PCI-X or Infiniband?
>
>Traditional PCI transfers data at 133 MB/sec, PCI-X at 1 GB/sec,
>InfiniBand at about 4GB/sec in the 12 channel implementation and 1.25 in
>the more popular 4 channels. HyperTransport transfers data at 6.4 GB. It
>is about 50 times faster than PCI, 6 times faster than PCI-X and 5 times
>faster than InfiniBand 4 channels. It is important to remember that
>InfiniBand is not an alternative to HyperTransport technology. Each
>HyperTransport I/O bus consists of two point-to-point unidirectional
>links. Each link can be from two bits to 32 bits wide. Standard bus
>widths of 2, 4, 8, 16, and 32 bits are supported. Asymmetric
>HyperTransport I/O buses are designed to be permitted in situations
>requiring different upstream and downstream bandwidths. Commands,
>addresses, and data (CAD) all use the same bits. So, a simple, low-cost
>HyperTransport I/O implementation using two CAD bits in each direction
>is designed to provide a raw bandwidth of up to 400 Megabytes per second
>in each direction (at the highest possible speed of 1.6 Gbit/sec). Two
>directions combined give almost 8 times the peak bandwidth of PCI 32/33.
>A larger implementation using 16 CAD bits in each direction is designed
>to provide bandwidth up to 3.2 Gigabytes per second both ways - 48 times
>the peak bandwidth of 32-bit PCI running at 33MHz.

Your HT info is a little out of date: clock speeds on motherboards have been at
1GHz since Fall 2004, giving a peak bandwidth of 4GB/s in each direction on
a 16/16 link, minus any packetization overhead of course. The next jump to
1.4GHz is in the works but I don't know where they go when they reach the
original design target of 1.6GHz.

--
Rgds, George Macdonald

August 3, 2005 1:50:11 PM

On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:

>Is Hypertransport faster or Infiniband?

It doesn't matter - they are not targeted at the same "problem". I must
say InfiniBand's proponents have not helped here by announcing it as a
"do-all" "solution" for on-board as well as off-board links... so bloody
confusing. The way I see it, InfiniBand, despite claims, is an off-board
wired or back-plane transport which possibly has better error
detection/recovery.

On that last point, I keep reading that Hypertransport suffers from lack of
error detection/recovery but CRC checking and packet retries are clearly in
the specs so I don't know what the full story is there yet. To put things
in perspective, Hypertransport links on currently available mbrds run at
approximately the same speed as current PCI Express, which on a x16 link
has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
bandwidth of 4GB/s in each direction.
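
For anyone wondering where the two roughly 4 GB/s figures come from, here is a rough sketch. It assumes (this is not stated in the thread) that HT moves data on both clock edges, and that first-generation PCI Express runs each lane at 2.5 Gbit/s with 8b/10b encoding. Both function names are hypothetical.

```python
# Back-of-envelope peak bandwidth, per direction, in decimal GB/s.
# Assumptions: HT is double data rate; PCIe gen 1 is 2.5 Gbit/s per lane, 8b/10b coded.

def ht_gb_per_s(clock_ghz: float, width_bits: int) -> float:
    """HT link: two transfers per clock, `width_bits` bits per transfer."""
    gigatransfers = clock_ghz * 2
    return gigatransfers * width_bits / 8


def pcie_gen1_gb_per_s(lanes: int) -> float:
    """PCIe gen 1: 10 bits on the wire carry 8 bits of data."""
    raw_gbit = 2.5 * lanes
    return raw_gbit * 8 / 10 / 8


print(f"HT 16-bit @ 1.0 GHz: {ht_gb_per_s(1.0, 16):.1f} GB/s each way")
print(f"PCIe x16 (gen 1):    {pcie_gen1_gb_per_s(16):.1f} GB/s each way")
```

Under these assumptions both links land at about 4 GB/s per direction, before any packet overhead.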

--
Rgds, George Macdonald

August 3, 2005 5:37:28 PM

George Macdonald wrote:
> On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
>
>
>>Is Hypertransport faster or Infiniband?
>
>
> It doesn't matter - they are not targeted at the same "problem". I must
> say Inifiniband's proponents have not helped here by announcing it as a
> "do-all" "solution" for on-board as well as off-board links... so bloody
> confusing. The way I see it, Inifiniband, despite claims, is an off-board
> wired or back-plane transport which possibly has better error
> detection/recovery.
>
> On that last point, I keep reading that Hypertransport suffers from lack of
> error detection/recovery but CRC checking and packet retries are clearly in
> the specs so I don't know what the full story is there yet. To put things
> in perspective, Hypertransport links on currently available mbrds run at
> approximately the same speed as current PCI Express, which on a x16 link
> has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
> bandwidth of 4GB/s in each direction.
>

It may be in the spec, but the retry is recent. So what do you do in a
pc when you get an error and don't find out about it for 512 bytes? Do
you retry the last 512 bytes worth of transactions? If you would check
into it you would find that the actual result is a crash of some sort.

I don't know anyone that has been pushing IB as a "do all" solution.
Clearly it is not a FSB. And IB does for sure have better recovery and
detection. HT is trying, but it has an installed-base
problem with the networking extensions.

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

August 3, 2005 8:33:08 PM

On Wed, 03 Aug 2005 13:37:28 -0500, Del Cecchi <cecchinospam@us.ibm.com>
wrote:

>George Macdonald wrote:
>> On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
>>
>>
>>>Is Hypertransport faster or Infiniband?
>>
>>
>> It doesn't matter - they are not targeted at the same "problem". I must
>> say Inifiniband's proponents have not helped here by announcing it as a
>> "do-all" "solution" for on-board as well as off-board links... so bloody
>> confusing. The way I see it, Inifiniband, despite claims, is an off-board
>> wired or back-plane transport which possibly has better error
>> detection/recovery.
>>
>> On that last point, I keep reading that Hypertransport suffers from lack of
>> error detection/recovery but CRC checking and packet retries are clearly in
>> the specs so I don't know what the full story is there yet. To put things
>> in perspective, Hypertransport links on currently available mbrds run at
>> approximately the same speed as current PCI Express, which on a x16 link
>> has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
>> bandwidth of 4GB/s in each direction.
>>
>
>It may be in the spec, but the retry is recent. So what do you do in a
>pc when you get an error and don't find out about it for 512 bytes? Do
>you retry the last 512 bytes worth of transactions? If you would check
>into it you would find that the actual result is a crash of some sort.

So the packet length is 512 bit-times and the CRC comes embedded at 64-bit
times into the next packet. I guess the only solution would be to hold a
packet-sized buffer, which would kill the latency advantage. Is that
unusual? Does PCI Express use a much smaller packet-size, thus giving it a
faster retry cycle? Then again, HT has separate channels for the
up and down links, so you don't have to turn around a bidirectional channel.

>I don't know anyone that has been pushing IB as a "do all" solution.
>Clearly it is not a FSB. And IB does for sure have better recovery and
>detection. Although HT is trying, but they have an installed base
>problem with the networking extensions.

The initial hype on Infiniband was very waffly IMO - it clearly wanted to
pose as a do-all on-board/off-board link, not necessarily a FSB. From my
POV they were not clear enough in what it was to target, i.e. back planes
and wires.

--
Rgds, George Macdonald

August 4, 2005 12:22:53 PM

Del Cecchi wrote:
> George Macdonald wrote:
> > On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
> >
> >
> >>Is Hypertransport faster or Infiniband?
> >
> >
> > It doesn't matter - they are not targeted at the same "problem". I must
> > say Inifiniband's proponents have not helped here by announcing it as a
> > "do-all" "solution" for on-board as well as off-board links... so bloody
> > confusing. The way I see it, Inifiniband, despite claims, is an off-board
> > wired or back-plane transport which possibly has better error
> > detection/recovery.
> >
> > On that last point, I keep reading that Hypertransport suffers from lack of
> > error detection/recovery but CRC checking and packet retries are clearly in
> > the specs so I don't know what the full story is there yet. To put things
> > in perspective, Hypertransport links on currently available mbrds run at
> > approximately the same speed as current PCI Express, which on a x16 link
> > has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
> > bandwidth of 4GB/s in each direction.
> >
>
> It may be in the spec, but the retry is recent. So what do you do in a
> pc when you get an error and don't find out about it for 512 bytes? Do
> you retry the last 512 bytes worth of transactions? If you would check
> into it you would find that the actual result is a crash of some sort.
>

So how does that work in practice? One gathers from Stone and
Partridge work on ethernet checksum vs CRC errors that undetected
errors are probably much more common than anyone would have cared to
think. Does anybody know about HT-type traffic? If someone bothers to
do a study, we'll find out that computers have turned into random
number generators?

As it is, the Stone and Partridge stuff doesn't seem to have created
much more than some interesting exchanges on comp.arch. Does anybody
care anymore? I'm sure that IBM does, but can it afford to?

RM

August 4, 2005 8:43:18 PM

On 4 Aug 2005 08:22:53 -0700, "Robert Myers" <rbmyersusa@gmail.com> wrote:

>Del Cecchi wrote:
>> George Macdonald wrote:
>> > On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
>> >
>> >
>> >>Is Hypertransport faster or Infiniband?
>> >
>> >
>> > It doesn't matter - they are not targeted at the same "problem". I must
>> > say Inifiniband's proponents have not helped here by announcing it as a
>> > "do-all" "solution" for on-board as well as off-board links... so bloody
>> > confusing. The way I see it, Inifiniband, despite claims, is an off-board
>> > wired or back-plane transport which possibly has better error
>> > detection/recovery.
>> >
>> > On that last point, I keep reading that Hypertransport suffers from lack of
>> > error detection/recovery but CRC checking and packet retries are clearly in
>> > the specs so I don't know what the full story is there yet. To put things
>> > in perspective, Hypertransport links on currently available mbrds run at
>> > approximately the same speed as current PCI Express, which on a x16 link
>> > has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
>> > bandwidth of 4GB/s in each direction.
>> >
>>
>> It may be in the spec, but the retry is recent. So what do you do in a
>> pc when you get an error and don't find out about it for 512 bytes? Do
>> you retry the last 512 bytes worth of transactions? If you would check
>> into it you would find that the actual result is a crash of some sort.
>>
>
>So how does that work in practice? One gathers from Stone and
>Partridge work on ethernet checksum vs CRC errors that undetected
>errors are probably much more common than anyone would have cared to
>think. Does anybody know about HT-type traffic? If someone bothers to
>do a study, we'll find out that computers have turned into random
>number generators?

Turned into? PCs have always been random number generators. Why else did
we need a Ctl/Alt/Del key combo?:-)

I guess it's true to say that, as PCs have migrated up to "important"
tasks, the need for confidence in data integrity has increased.

--
Rgds, George Macdonald

August 4, 2005 9:06:15 PM

Robert Myers wrote:
> Del Cecchi wrote:
>
>>George Macdonald wrote:
>>
>>>On Fri, 29 Jul 2005 00:50:51 GMT, Student <spam@intel.com> wrote:
>>>
>>>
>>>
>>>>Is Hypertransport faster or Infiniband?
>>>
>>>
>>>It doesn't matter - they are not targeted at the same "problem". I must
>>>say Inifiniband's proponents have not helped here by announcing it as a
>>>"do-all" "solution" for on-board as well as off-board links... so bloody
>>>confusing. The way I see it, Inifiniband, despite claims, is an off-board
>>>wired or back-plane transport which possibly has better error
>>>detection/recovery.
>>>
>>>On that last point, I keep reading that Hypertransport suffers from lack of
>>>error detection/recovery but CRC checking and packet retries are clearly in
>>>the specs so I don't know what the full story is there yet. To put things
>>>in perspective, Hypertransport links on currently available mbrds run at
>>>approximately the same speed as current PCI Express, which on a x16 link
>>>has a peak bandwidth of 4.1GB/s (B=Byte); a 16/16 HT link has a peak
>>>bandwidth of 4GB/s in each direction.
>>>
>>
>>It may be in the spec, but the retry is recent. So what do you do in a
>>pc when you get an error and don't find out about it for 512 bytes? Do
>>you retry the last 512 bytes worth of transactions? If you would check
>>into it you would find that the actual result is a crash of some sort.
>>
>
>
> So how does that work in practice? One gathers from Stone and
> Partridge work on ethernet checksum vs CRC errors that undetected
> errors are probably much more common than anyone would have cared to
> think. Does anybody know about HT-type traffic? If someone bothers to
> do a study, we'll find out that computers have turned into random
> number generators?
>
> As it is, the Stone and Partridge stuff doesn't seem to have created
> much more than some interesting exchanges on comp.arch. Does anybody
> care anymore? I'm sure that IBM does, but can it afford to?
>
> RM
>

I don't know about their work, and I know little about ethernet. I do
know from experience in the lab that a 32 bit crc, properly chosen, with
retry can cope with quite high error rates without any problem with the
system. And I would believe that the systems in question would not
tolerate very many undetected errors because the disks for the virtual
memory and the coherence traffic, if any, were carried over the network in
question along with all the other I/O traffic.
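
As a toy illustration of CRC plus retry (not the lab system described above): the sketch below pushes a frame through a simulated link that flips bits at random and resends until the CRC-32 matches. The channel model, the 64-byte payload, the error rate and the function names are all assumptions chosen for the example. `zlib.crc32` is the Ethernet-style CRC-32, which may not be the polynomial a given link actually uses.

```python
import random
import zlib
from typing import Tuple


def send_over_noisy_link(frame: bytes, bit_error_rate: float, rng: random.Random) -> bytes:
    """Toy channel: flip each bit independently with probability `bit_error_rate`."""
    out = bytearray(frame)
    for i in range(len(out) * 8):
        if rng.random() < bit_error_rate:
            out[i // 8] ^= 1 << (i % 8)
    return bytes(out)


def transfer_with_retry(payload: bytes, bit_error_rate: float, rng: random.Random,
                        max_tries: int = 1000) -> Tuple[bytes, int]:
    """Resend payload + CRC-32 until the receiver's CRC check passes."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "big")
    for attempt in range(1, max_tries + 1):
        received = send_over_noisy_link(frame, bit_error_rate, rng)
        body, crc = received[:-4], int.from_bytes(received[-4:], "big")
        if zlib.crc32(body) == crc:
            return body, attempt
    raise RuntimeError("link too noisy: gave up after max_tries")


rng = random.Random(2005)  # fixed seed so the run is repeatable
data = bytes(range(64))
delivered, tries = transfer_with_retry(data, bit_error_rate=1e-3, rng=rng)
print(f"delivered intact: {delivered == data}, attempts: {tries}")
```

The point of the sketch is only that a detected error costs a resend rather than corrupt data; it says nothing about how a real link buffers frames for replay.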



--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

August 5, 2005 8:10:47 AM

Del Cecchi wrote:
> Robert Myers wrote:

> >
> > As it is, the Stone and Partridge stuff doesn't seem to have created
> > much more than some interesting exchanges on comp.arch. Does anybody
> > care anymore? I'm sure that IBM does, but can it afford to?
> >
>
> I don't know about their work, and I know little about ethernet. I do
> know from experience in the lab that a 32 bit crc, properly chosen, with
> retry can cope with quite high error rates without any problem with the
> system. And I would believe that the systems in question would not
> tolerate very many undetected errors because the disks for the virtual
> memory, and the coherence traffic if any was carried over the network in
> question along with all the other I/O traffic.
>

I think you did participate in the discussion of this subject on
comp.arch:

Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
Proceedings of the ACM conference on Applications, Technologies,
Architectures, and Protocols for Computer Communication (SIGCOMM'00),
Stockholm, Sweden, August/September 2000, pp. 309-319

Abstract

"Traces of Internet packets from the past two years show that between 1
packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
links where link-level CRCs should catch all but 1 in 4 billion errors.
For certain situations, the rate of checksum failures can be even
higher: in one hour-long test we observed a checksum failure of 1
packet in 400. We investigate why so many errors are observed, when
link-level CRCs should catch nearly all of them. We have collected
nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
dataset shows the Internet has a wide variety of error sources which
can not be detected by link-level checks. We describe analysis tools
that have identified nearly 100 different error patterns. Categorizing
packet errors, we can infer likely causes which explain roughly half
the observed errors. The causes span the entire spectrum of a network
stack, from memory errors to bugs in TCP. After an analysis we conclude
that the checksum will fail to detect errors for roughly 1 in 16
million to 10 billion packets. From our analysis of the cause of
errors, we propose simple changes to several protocols which will
decrease the rate of undetected error. Even so, the highly non-random
distribution of errors strongly suggests some applications should
employ application-level checksums or equivalents."

It may not be a good model for the possibility of other link-level
errors, but it does make you wonder.
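
One well-known blind spot of the 16-bit ones'-complement Internet checksum (the RFC 1071 style sum used by TCP) is that it does not depend on the order of the 16-bit words. The sketch below is only an illustration of that property, not a reproduction of the paper's analysis; the sample data and the function name are made up.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum of 16-bit words, complemented at the end."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"ABCDEFGH"
swapped = b"CDABEFGH"  # the first two 16-bit words transposed

# True: the sum is the same whatever order the words arrive in.
print(internet_checksum(original) == internet_checksum(swapped))
# False: the CRC-32 of the two byte strings differs.
print(zlib.crc32(original) == zlib.crc32(swapped))
```

A link-level CRC only helps, of course, if the data was still intact when the CRC was computed.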

In overclocking tests, I've found that PCs will tolerate significant
memory errors without giving any immediate indication of a problem.
Short of a crash, I don't know how you'd know anything was wrong
without application-level checking.


RM

August 5, 2005 7:42:05 PM

On 5 Aug 2005 04:10:47 -0700, "Robert Myers" <rbmyersusa@gmail.com> wrote:

>Del Cecchi wrote:
>> Robert Myers wrote:
>
>> >
>> > As it is, the Stone and Partridge stuff doesn't seem to have created
>> > much more than some interesting exchanges on comp.arch. Does anybody
>> > care anymore? I'm sure that IBM does, but can it afford to?
>> >
>>
>> I don't know about their work, and I know little about ethernet. I do
>> know from experience in the lab that a 32 bit crc, properly chosen, with
>> retry can cope with quite high error rates without any problem with the
>> system. And I would believe that the systems in question would not
>> tolerate very many undetected errors because the disks for the virtual
>> memory, and the coherence traffic if any was carried over the network in
>> question along with all the other I/O traffic.
>>
>
>I think you did participate in the discussion of this subject on
>comp.arch:
>
>Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
>Proceedings of the ACM conference on Applications, Technologies,
>Architectures, and Protocols for Computer Communication (SIGCOMM'00),
>Stockholm, Sweden, August/September 2000, pp. 309-319
>
>Abstract
>
>"Traces of Internet packets from the past two years show that between 1
>packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
>links where link-level CRCs should catch all but 1 in 4 billion errors.
>For certain situations, the rate of checksum failures can be even
>higher: in one hour-long test we observed a checksum failure of 1
>packet in 400. We investigate why so many errors are observed, when
>link-level CRCs should catch nearly all of them.We have collected
>nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
>dataset shows the Internet has a wide variety of error sources which
>can not be detected by link-level checks. We describe analysis tools
>that have identified nearly 100 different error patterns. Categorizing
>packet errors, we can infer likely causes which explain roughly half
>the observed errors. The causes span the entire spectrum of a network
>stack, from memory errors to bugs in TCP.After an analysis we conclude
>that the checksum will fail to detect errors for roughly 1 in 16
>million to 10 billion packets. From our analysis of the cause of
>errors, we propose simple changes to several protocols which will
>decrease the rate of undetected error. Even so, the highly non-random
>distribution of errors strongly suggests some applications should
>employ application-level checksums or equivalents."
>
>It may not be a good model for the possiblity of other link-level
>errors, but it does make you wonder.

I know of two NICs which are reputed to have a bug in their checksum
offloading. What amazes me is that the only software where I've seen a
hiccup from this is Eudora, where it reports error 10053 or 10054 when you
try to send a longish e-mail msg. Turning off "Checksum Offload" fixes the
problem.

>In overclocking tests, I've found that PC's will tolerate significant
>memory errors without giving any immediate indication of a problem.
>Short of a crash, I don't know how you'd know anything was wrong
>without application-level checking.

I suspect that many overclockers are running slightly over the ragged
edge... and that stable really means very low error rate.

--
Rgds, George Macdonald

August 5, 2005 9:48:08 PM

"Robert Myers" <rbmyersusa@gmail.com> wrote in message
news:1123240247.383142.203260@g14g2000cwa.googlegroups.com...
> Del Cecchi wrote:
>> Robert Myers wrote:
>
>> >
>> > As it is, the Stone and Partridge stuff doesn't seem to have created
>> > much more than some interesting exchanges on comp.arch. Does
>> > anybody
>> > care anymore? I'm sure that IBM does, but can it afford to?
>> >
>>
>> I don't know about their work, and I know little about ethernet. I do
>> know from experience in the lab that a 32 bit crc, properly chosen,
>> with
>> retry can cope with quite high error rates without any problem with
>> the
>> system. And I would believe that the systems in question would not
>> tolerate very many undetected errors because the disks for the virtual
>> memory, and the coherence traffic if any was carried over the network
>> in
>> question along with all the other I/O traffic.
>>
>
> I think you did participate in the discussion of this subject on
> comp.arch:
>
> Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
> Proceedings of the ACM conference on Applications, Technologies,
> Architectures, and Protocols for Computer Communication (SIGCOMM'00),
> Stockholm, Sweden, August/September 2000, pp. 309-319
>
> Abstract
>
> "Traces of Internet packets from the past two years show that between 1
> packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
> links where link-level CRCs should catch all but 1 in 4 billion errors.
> For certain situations, the rate of checksum failures can be even
> higher: in one hour-long test we observed a checksum failure of 1
> packet in 400. We investigate why so many errors are observed, when
> link-level CRCs should catch nearly all of them.We have collected
> nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
> dataset shows the Internet has a wide variety of error sources which
> can not be detected by link-level checks. We describe analysis tools
> that have identified nearly 100 different error patterns. Categorizing
> packet errors, we can infer likely causes which explain roughly half
> the observed errors. The causes span the entire spectrum of a network
> stack, from memory errors to bugs in TCP.After an analysis we conclude
> that the checksum will fail to detect errors for roughly 1 in 16
> million to 10 billion packets. From our analysis of the cause of
> errors, we propose simple changes to several protocols which will
> decrease the rate of undetected error. Even so, the highly non-random
> distribution of errors strongly suggests some applications should
> employ application-level checksums or equivalents."
>
> It may not be a good model for the possiblity of other link-level
> errors, but it does make you wonder.
>
> In overclocking tests, I've found that PC's will tolerate significant
> memory errors without giving any immediate indication of a problem.
> Short of a crash, I don't know how you'd know anything was wrong
> without application-level checking.
>
>
> RM

OK, thanks for the reminder. As I recall, claiming that the checksums
were missing the errors was a mild distortion. The errors were
transpositions of data blocks being fetched to the adapter so the data
was bad when it got there. Is that not the case?

As for PCs not being affected by memory errors, how many do you estimate
it took to crash the system? The lab system I was referring to was
seeing many errors per second.

del
>

August 5, 2005 10:02:36 PM

Bitstring <1123240247.383142.203260@g14g2000cwa.googlegroups.com>, from
the wonderful person Robert Myers <rbmyersusa@gmail.com> said
<snip>
>In overclocking tests, I've found that PC's will tolerate significant
>memory errors without giving any immediate indication of a problem.
>Short of a crash, I don't know how you'd know anything was wrong
>without application-level checking.

If it'll (successfully) run Memtest86 overnight, and it'll pass the
Prime95 torture tests, then it's working right (IME). If it won't, then
it might run WinXP and applications anyway, but it'll do strange things
from time to time ...

--
GSV Three Minds in a Can
Contact recommends the use of Firefox; SC recommends it at gunpoint.

August 6, 2005 2:32:27 PM

Del Cecchi wrote:

>
> OK, thanks for the reminder. As I recall, claiming that the checksums
> were missing the errors was a mild distortion. The errors were
> transpositions of data blocks being fetched to the adapter so the data
> was bad when it got there. Is that not the case?
>

That was one of the explanations. I wasn't convinced there was any one
single explanation that would have dominated.

I concluded that, if you really had to know that your data are
reliable, you should probably do your own end-to-end error checking.
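
A minimal sketch of what such end-to-end checking could look like, assuming the application is free to choose its own framing: the sender prepends a SHA-256 digest and the receiver refuses anything that does not match. The `wrap` and `unwrap` names and the digest-first layout are inventions for this example.

```python
import hashlib

DIGEST_LEN = 32  # SHA-256 digests are 32 bytes long


def wrap(payload: bytes) -> bytes:
    """Sender side: prepend a SHA-256 digest of the payload."""
    return hashlib.sha256(payload).digest() + payload


def unwrap(message: bytes) -> bytes:
    """Receiver side: verify the digest; raise if the payload changed en route."""
    digest, payload = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("end-to-end check failed: payload corrupted in transit")
    return payload


message = wrap(b"results of a long computation")
assert unwrap(message) == b"results of a long computation"

damaged = bytearray(message)
damaged[40] ^= 0x01  # one flipped bit somewhere between sender and receiver
try:
    unwrap(bytes(damaged))
except ValueError as err:
    print(err)
```

Because the check is done by the application at both ends, it covers memory, bus, adapter and network alike, whichever of them introduced the error.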

> As for PCs not being affected by memory errors, how many do you estimate
> it took to crash the system? The lab system I was referring to was
> seeing many errors per second.
>

Oh, a few errors per hour will generally let a system run, IIRC. The
difference in speed between running on the ragged edge like that and
not running at all is so small that it isn't worth running on the
ragged edge.

RM

August 6, 2005 5:48:39 PM

"Robert Myers" <rbmyersusa@gmail.com> wrote in message
news:1123349547.132583.294650@f14g2000cwb.googlegroups.com...
> Del Cecchi wrote:
>
>>
>> OK, thanks for the reminder. As I recall, claiming that the checksums
>> were missing the errors was a mild distortion. The errors were
>> transpositions of data blocks being fetched to the adapter so the data
>> was bad when it got there. Is that not the case?
>>
>
> That was one of the explanations. I wasn't convinced there was any one
> single explanation that would have dominated.
>
> I concluded that, if you really had to know that your data are
> reliable, you should probably do your own end-to-end error checking.
>
>> As for PCs not being affected by memory errors, how many do you
>> estimate
>> it took to crash the system? The lab system I was referring to was
>> seeing many errors per second.
>>
>
> Oh, a few errors per hour will generally let a system run, IIRC. The
> difference in speed between running on the ragged edge like that and
> not running at all is so small that it isn't worth running on the
> ragged edge.
>
> RM
>
The lab system was in the range of 10**5 errors/sec and still ran
perfectly with 32 bit crc and retry. A bad cable can really prove your
recovery mechanism.

So it sounds like the protocol or the software or something for the
ethernet systems in question was broken.

del

August 8, 2005 5:12:06 PM

Del Cecchi <dcecchi.nospam@att.net> wrote:
> The lab system was in the range of 10**5 error/sec and
> still ran perfectly with 32 bit crc and retry.

Yes, this is barely possible on a 100 Mbit/s system.
A 64 byte packet has a 60% chance of arriving error free.
Unfortunately, a 1500 byte packet has only a 0.0006% chance,
assuming random error distribution. So acks get through,
but data in will be bad.
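
These percentages can be reproduced with a few lines of Python. The sketch assumes, as stated above, a 100 Mbit/s link and independent, randomly distributed bit errors, so that 10**5 errors/sec is a bit error rate of 1e-3. The function name is made up.

```python
def p_error_free(packet_bytes: int, bit_error_rate: float) -> float:
    """Chance that every bit of a packet arrives intact, given independent bit errors."""
    return (1.0 - bit_error_rate) ** (packet_bytes * 8)


ber = 1e5 / 100e6  # 10**5 errors/sec on a 100 Mbit/s link

for size in (64, 1500):
    print(f"{size:5d} bytes: {p_error_free(size, ber):.4%} chance of arriving intact")
```

The 64-byte case comes out near 60% and the 1500-byte case near 0.0006%, matching the figures above.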

> A bad cable can really prove your recovery mechanism.

Yep! Beware of newbies with crimpers! RJ45s are hard to do,
and not just because the correct pattern is counter-intuitive.
All the intuitive patterns split a pair which often gives
some connectivity but poor performance. There are 40,320 ways
of wiring the 8 conductor cable straight-thru. All but 1,152
split at least one pair necessary for 10baseT or 100baseTX.
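
The two counts can be checked by brute force. The sketch below assumes "straight-thru" means the same wire order at both ends, labels the eight wires 0..7 with (0,1), (2,3), (4,5), (6,7) as the four twisted pairs, and uses the fact that 10baseT and 100baseTX signal on pins 1-2 and 3-6. The labelling and function names are arbitrary.

```python
from itertools import permutations
from math import factorial


def pair_of(wire: int) -> int:
    """Wires 0..7 are grouped into twisted pairs (0,1), (2,3), (4,5), (6,7)."""
    return wire // 2


def keeps_pairs_intact(assignment) -> bool:
    """assignment[i] is the wire crimped into pin i+1, same order at both ends."""
    return (pair_of(assignment[0]) == pair_of(assignment[1])       # pins 1 and 2
            and pair_of(assignment[2]) == pair_of(assignment[5]))  # pins 3 and 6


total = factorial(8)
good = sum(keeps_pairs_intact(p) for p in permutations(range(8)))
print(total, good)  # 40320 1152
```

The same number falls out by hand: 8 ways to put one pair on pins 1-2, 6 ways to put another on pins 3-6, and 4! = 24 ways to place the remaining four wires, giving 8 * 6 * 24 = 1,152.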

> So it sounds like the protocol or the software or something
> for the ethernet systems in question were broken.

I would hope any system with anything near 0.1% error rates
was using ECC, not just CRC.

-- Robert

August 8, 2005 8:56:12 PM

On Mon, 08 Aug 2005 13:12:06 GMT, Robert Redelmeier
<redelm@ev1.net.invalid> wrote:

>Del Cecchi <dcecchi.nospam@att.net> wrote:
>> The lab system was in the range of 10**5 error/sec and
>> still ran perfectly with 32 bit crc and retry.
>
>Yes, this is barely possible on a 100 Mbit/s system.
>A 64 byte packet has a 60% chance of arriving error free.
>Unfortunately, a 1500 byte packet has only a 0.0006% chance,
>assuming random error distribution. So acks get through,
>but data in will be bad.
>
>> A bad cable can really prove your recovery mechanism.
>
>Yep! Beware of newbies with crimpers! RJ45s are hard to do,
>and not just because the correct pattern is counter-intuitive.
>All the intuitive patterns split a pair which often gives
>some connectivity but poor performance. There are 40,320 ways
>of wiring the 8 conductor cable straight-thru. All but 1,152
>split at least one pair necessary for 10baseT or 100baseTX.

Switches with just a Web based interface, which allow you to collect error
rates and mirror ports, are cheap now. All the Cat5 that I put in is now
running 1Gb/s Full Duplex, with maybe 5-6 errors/week/port due, I believe,
to speed ramping at PC power on.

While there is undoubtedly bad cable around, much of it was done by
"professionals" or taken off the shelf... or even just caused by physical
abuse or misrouting in the wall or ceiling/floor cavity. I also tend to
think much of "bad cable" is due to legacy "telephone" mentality, equipment
practices and personnel. To me the punch-down block is a scary and
dangerous place.:-)

--
Rgds, George Macdonald
Anonymous
August 8, 2005 11:09:09 PM

"Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
news:GSIJe.390$Ub1.125@newssvr29.news.prodigy.net...
> Del Cecchi <dcecchi.nospam@att.net> wrote:
>> The lab system was in the range of 10**5 error/sec and
>> still ran perfectly with 32 bit crc and retry.
>
> Yes, this is barely possible on a 100 Mbit/s system.
> A 64 byte packet has a 60% chance of arriving error free.
> Unfortunately, a 1500 byte packet has only a 0.0006% chance,
> assuming random error distribution. So acks get through,
> but data in will be bad.
>
>> A bad cable can really prove your recovery mechanism.
>
> Yep! Beware of newbies with crimpers! RJ45s are hard to do,
> and not just because the correct pattern is counter-intuitive.
> All the intuitive patterns split a pair which often gives
> some connectivity but poor performance. There are 40,320 ways
> of wiring the 8 conductor cable straight-thru. All but 1,152
> split at least one pair necessary for 10baseT or 100baseTX.
>
>> So it sounds like the protocol or the software or something
>> for the ethernet systems in question were broken.
>
> I would hope any system with anything near 0.1% error rates
> was using ECC, not just CRC.
>
> -- Robert
>
This was a parallel source synchronous link (RIO) running at a GB/sec.
And the error rate was packet errors. I didn't have any way to collect
statistics on bit errors. CRC with retry is the moral equivalent of ECC.
Anonymous
August 9, 2005 2:42:23 AM

George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
> Switches with just a Web based interface, which allow you
> to collect error rates and mirror ports, are cheap now.

Any particular brands/models you'd recommend?

> All the Cat5 that I put in is now running 1Gb/s Full Duplex,
> with maybe 5-6 errors/week/port due, I believe, to speed
> ramping at PC power on.

Could be. Also could be interference from noisemakers
like motor starts. But sounds like good cable.

> While there is undoubtedly bad cable around, much of it
> was done by "professionals" or taken off the shelf... or
> even just caused by physical abuse or misrouting in the
> wall or ceiling/floor cavity. I also tend to think much
> of "bad cable" is due to legacy "telephone" mentality,

Yes, a lot of that. But crimpers still aren't easy
even after you know T-568A from T-568B

> equipment practices and personnel. To me the punch-down
> block is a scary and dangerous place.:-)

Hey, jacks have punchdowns too! And if you _really_ like
retro, Siemon makes a Cat5e rated 66 block :) 

-- Robert
Anonymous
August 9, 2005 5:07:04 AM

Del Cecchi <dcecchi.nospam@att.net> wrote:
> This was a parallel source synchronous link (RIO) running
> at a GB/sec. And the error rate was packet errors.
> I didn't have any way to collect statistics on bit errors.

Probably the same 1e5/s -- that link was running around
1e10 bit/s. Unless non random, or extremely large packets,
the chances of having two+ errors in one packet are very small.
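
As a rough check of that claim, the binomial odds of two or more bit errors in one packet can be sketched as below. The 256 byte packet size is purely an assumed figure for illustration (the thread does not give the RIO packet size), and bit errors are assumed independent.

```python
def p_two_or_more(packet_bits, ber):
    """Binomial probability of at least two bit errors in one packet."""
    p_none = (1.0 - ber) ** packet_bits
    p_one = packet_bits * ber * (1.0 - ber) ** (packet_bits - 1)
    return 1.0 - p_none - p_one

BER = 1e5 / 1e10          # 1e5 errors/s on a roughly 1e10 bit/s link
PACKET_BITS = 256 * 8     # hypothetical packet size, not from the thread

p_any = 1.0 - (1.0 - BER) ** PACKET_BITS
p_multi = p_two_or_more(PACKET_BITS, BER)
print(f"P(any error)  = {p_any:.3e}")
print(f"P(2+ errors)  = {p_multi:.3e}")
print(f"share of bad packets with 2+ errors = {p_multi / p_any:.2%}")
```

With numbers in this range only a small share of damaged packets carry more than one error, so packet errors per second and bit errors per second end up close.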

> CRC with retry is the moral equivalent of ECC.

Well, that's an odd sense of morality :) 

CRC with retry has low "clean" overhead, but throws away lots
of good? bits. ECC has much higher overhead but seldom throws
anything away. There is an error-rate breakpoint below which
CRC/retry is best, above ECC is better.
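
A toy model shows where such a breakpoint might sit. Everything in it is an assumption for illustration: 1500 byte payloads with a 32 bit CRC, ideal retry that resends only the damaged packet, independent bit errors, and an idealised ECC with the overhead of a (72,64) code that is taken to correct every error. Real links will differ.

```python
import math

def crc_retry_efficiency(ber, payload_bits=12000, crc_bits=32):
    """Useful fraction of link bits with CRC plus ideal retry."""
    n = payload_bits + crc_bits
    p_ok = (1.0 - ber) ** n            # packet survives untouched
    return payload_bits / n * p_ok

def ecc_efficiency(code_rate=64 / 72):
    """Idealised ECC: fixed overhead, nothing ever thrown away (assumption)."""
    return code_rate

def breakpoint_ber(lo=1e-12, hi=1e-2, iterations=100):
    """Bisect (in log space) for the BER where the two efficiencies cross."""
    target = ecc_efficiency()
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if crc_retry_efficiency(mid) > target:
            lo = mid                   # CRC/retry still ahead, look higher
        else:
            hi = mid
    return math.sqrt(lo * hi)

print(f"breakpoint BER is about {breakpoint_ber():.2e}")
```

In this model the crossover lands around 1e-5 errors per bit; shorter packets push it higher because each retry throws away fewer good bits.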

-- Robert
Anonymous
August 9, 2005 5:07:05 AM

"Robert Redelmeier" <redelm@ev1.net.invalid> wrote in message
news:YkTJe.775$FV1.685@newssvr33.news.prodigy.com...
> Del Cecchi <dcecchi.nospam@att.net> wrote:
>> This was a parallel source synchronous link (RIO) running
>> at a GB/sec. And the error rate was packet errors.
>> I didn't have any way to collect statistics on bit errors.
>
> Probably the same 1e5/s -- that link was running around
> 1e10 bit/s. Unless non random, or extremely large packets,
> the chances of having two+ errors in one packet are very small.
>
>> CRC with retry is the moral equivalent of ECC.
>
> Well, that's an odd sense of morality :) 
>
> CRC with retry has low "clean" overhead, but throws away lots
> of good? bits. ECC has much higher overhead but seldom throws
> anything away. There is an error-rate breakpoint below which
> CRC/retry is best, above ECC is better.
>
> -- Robert

Well, the error rate was *supposed* to be low, and latency is not an
issue generally with this interface.

Del
Anonymous
August 9, 2005 8:08:00 PM

On Mon, 08 Aug 2005 22:42:23 GMT, Robert Redelmeier
<redelm@ev1.net.invalid> wrote:

>George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
>> Switches with just a Web based interface, which allow you
>> to collect error rates and mirror ports, are cheap now.
>
>Any particular brands/models you'd recommend?

Well, when we needed more 100Mb ports in the office and I figured I might
look at a switch with 24x10/100Mb w. 2x1Gb ports to replace the old hubs, I
couldn't see not splurging for a 24x10/100/1Gb for the small difference in
price. I got a D-Link DGS-1224T - this was partly based on the fact that
D-Link is apparently one of the few real mfrs of this stuff and my
experience with the old D-Link hubs which worked reliably for nearly 10
years. Newegg has the DGS-1224T for $355 now - less than I paid - and it
has a Web-based management interface.

It's been working fine for several months now; my only complaint: noise.
The 2 fans are a bit loud and I'm going to have a word with them about it -
they sell it as a rack-mountable or desktop switch and for the latter, it's
too loud.

It's possible that Linksys has something good but I'm prejudiced against
them because of some bad NICs I got a few years ago which would hang the
network/hubs when the system they were in was powered down. It's possible
that they've improved quality since being taken over by Cisco but I'm still
wary - those damned NICs tore out a lot of my hair till it dawned on me
what was going on.

>> All the Cat5 that I put in is now running 1Gb/s Full Duplex,
>> with maybe 5-6 errors/week/port due, I believe, to speed
>> ramping at PC power on.
>
>Could be. Also could be interference from noisemakers
>like motor starts. But sounds like good cable.

Yeah I guess it could be A/C sags/surges but I'd think we would see more if
that was the case.

>> While there is undoubtedly bad cable around, much of it
>> was done by "professionals" or taken off the shelf... or
>> even just caused by physical abuse or misrouting in the
>> wall or ceiling/floor cavity. I also tend to think much
>> of "bad cable" is due to legacy "telephone" mentality,
>
>Yes, a lot of that. But crimpers still aren't easy
>even after you know T-568A from T-568B

I do OK with ours but you have to buy a quality crimper - Paladin IIRC? To
trim wires square, I find a sturdy pair of scissors is best.

>> equipment practices and personnel. To me the punch-down
>> block is a scary and dangerous place.:-)
>
>Hey, jacks have punchdowns too! And if you _really_ like
>retro, Siemon makes a Cat5e rated 66 block :) 

We don't have a wiring closet/cabinet for our small (24xGb + 8x10BaseT)
network - all wires are, err, straight through from PC to switch/hub -
breaks all the rules, I know, but it works. I looked at getting a
wall-mount cabinet but too much aggro - just buying the damned things is an
exercise.

--
Rgds, George Macdonald