Archived from groups: comp.sys.ibm.pc.hardware.chips
"Robert Myers" <rbmyersusa@gmail.com> wrote in message
news:1123240247.383142.203260@g14g2000cwa.googlegroups.com...
> Del Cecchi wrote:
>> Robert Myers wrote:
>
>> >
>> > As it is, the Stone and Partridge stuff doesn't seem to have created
>> > much more than some interesting exchanges on comp.arch. Does
>> > anybody
>> > care anymore? I'm sure that IBM does, but can it afford to?
>> >
>>
>> I don't know about their work, and I know little about Ethernet. I do
>> know from experience in the lab that a properly chosen 32-bit CRC with
>> retry can cope with quite high error rates without any problem for the
>> system. And I would believe that the systems in question would not
>> tolerate very many undetected errors, because the disks for the virtual
>> memory, and the coherence traffic if any, were carried over the network
>> in question along with all the other I/O traffic.
>>
>
> I think you did participate in the discussion of this subject on
> comp.arch:
>
> Stone, J., Partridge, C.: "When The CRC and TCP Checksum Disagree",
> Proceedings of the ACM conference on Applications, Technologies,
> Architectures, and Protocols for Computer Communication (SIGCOMM'00),
> Stockholm, Sweden, August/September 2000, pp. 309-319
>
> Abstract
>
> "Traces of Internet packets from the past two years show that between 1
> packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
> links where link-level CRCs should catch all but 1 in 4 billion errors.
> For certain situations, the rate of checksum failures can be even
> higher: in one hour-long test we observed a checksum failure of 1
> packet in 400. We investigate why so many errors are observed, when
> link-level CRCs should catch nearly all of them. We have collected
> nearly 500,000 packets which failed the TCP or UDP or IP checksum. This
> dataset shows the Internet has a wide variety of error sources which
> cannot be detected by link-level checks. We describe analysis tools
> that have identified nearly 100 different error patterns. Categorizing
> packet errors, we can infer likely causes which explain roughly half
> the observed errors. The causes span the entire spectrum of a network
> stack, from memory errors to bugs in TCP. After an analysis we conclude
> that the checksum will fail to detect errors for roughly 1 in 16
> million to 10 billion packets. From our analysis of the cause of
> errors, we propose simple changes to several protocols which will
> decrease the rate of undetected error. Even so, the highly non-random
> distribution of errors strongly suggests some applications should
> employ application-level checksums or equivalents."
>
> It may not be a good model for the possibility of other link-level
> errors, but it does make you wonder.
>
> In overclocking tests, I've found that PCs will tolerate significant
> memory errors without giving any immediate indication of a problem.
> Short of a crash, I don't know how you'd know anything was wrong
> without application-level checking.
>
>
> RM
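A back-of-envelope check of the abstract's numbers (my own idealization, not the paper's analysis): if a 16-bit checksum misses roughly 1 in 2^16 random corruptions, the observed failure rates imply undetected-error rates in the same range the abstract quotes:

```python
# Rough scale check of the abstract's figures. Assumption (mine, not the
# paper's): the 16-bit TCP checksum misses about 1 in 2**16 of random
# corruptions -- an idealization, since the paper stresses that the
# observed errors are highly non-random.
CHECKSUM_MISS = 1 / 2**16

# Observed TCP checksum failure rates quoted in the abstract.
for failures_per_packet in (1 / 1_100, 1 / 32_000):
    undetected = failures_per_packet * CHECKSUM_MISS
    print(f"~1 undetected error per {1 / undetected:,.0f} packets")
```

This gives roughly 1 in 72 million and 1 in 2.1 billion packets, which sits inside the abstract's "1 in 16 million to 10 billion" range.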
OK, thanks for the reminder. As I recall, claiming that the checksums
were missing the errors was a mild distortion. The errors were
transpositions of data blocks being fetched to the adapter so the data
was bad when it got there. Is that not the case?
As for PCs not being affected by memory errors, how many do you estimate
it took to crash the system? The lab system I was referring to was
seeing many errors per second.
del
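One way to see why a transposition can slip past even the end-to-end check: the Internet checksum (RFC 1071) is a one's-complement sum of 16-bit words, so it is unchanged by any reordering of those words, while a CRC-32 is position-sensitive. A minimal sketch (mine, not from the thread):

```python
# Sketch: the one's-complement 16-bit Internet checksum is invariant
# under reordering of its 16-bit words, so a transposed data block
# passes unchanged; a CRC-32 over the same bytes catches the swap.
import zlib

def internet_checksum(data: bytes) -> int:
    """One's-complement 16-bit checksum as used by IP/TCP/UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold end-around carry
    return ~total & 0xFFFF

good = b"ABCDEFGH"
swapped = good[2:4] + good[0:2] + good[4:]  # transpose first two 16-bit words

assert internet_checksum(good) == internet_checksum(swapped)  # undetected
assert zlib.crc32(good) != zlib.crc32(swapped)                # detected
```

Of course, if the blocks were transposed before the adapter computed the CRC, as Del describes, the CRC would be computed over the already-bad data and pass as well, which is the paper's point about errors arising where link-level checks never see them.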