The case for ECC.

Archived from groups: comp.sys.ibm.pc.hardware.chips

Hi,

I know this issue has pretty much been run into the ground, but the
last year has changed my mind on it.

Two cases.

1) A very old Pentium Pro system with ECC memory, used as a
   firewall/bridge. A memory parity error immediately showed on the
   screen and the system halted. I checked the memory with memtest and,
   sure enough, it was bad. I replaced the memory and everything was
   back to normal. It took a day to correct the problem, and the data
   was intact. The memory was no-name with no warranty.

2) A 2-year-old AMD Athlon system with non-ECC memory started
   locking up, and one of the disks was corrupted. I started trying to
   track the problem down and continued to have random system lockups.
   It got so bad the system would not boot. I removed all cards but the
   video card and still had lockups. Finally I checked the memory with
   memtest, and sure enough the memory was bad. The system was never
   overclocked and did not have any heat-related problems. So after data
   corruption and 4-5 days of pulling my hair out, I figured it out. The
   memory was name brand with a lifetime warranty, so I requested an RMA
   on it.

Long story short, I prefer case #1 over case #2. At least it is easier
to diagnose the problem with ECC memory. I thought memory was so good
now that home users did not need ECC, and that is what many regular
posters in this newsgroup have said over and over.

The next system I purchase will have ECC memory. My time is well worth
the minor difference in price. Since I don't overclock, overclocking
headroom is not an issue for me. Heck, that fancy overclocking memory
costs way more than ECC memory.

Whatever,

Alan
  1. On Sun, 12 Dec 2004 10:50:55 -0600, Alan Walpool wrote:

    > ....snip...
    >
    > The long story is I prefer case #1 over case #2. At least it is easier
    > to diagnosis the problem with ECC memory. I thought that memory was so
    > good now that home users did not need ECC memory, and that is what
    > many regular posters in this newsgroup have said over and over.

    I'm not sure which regulars have said that here. I have ECC memory
    even on my K6-III system. Memory has never been "so good" that it never
    fails. My only "issue" with ECC is that I can't test whether it's really
    working (why have I never seen an error?). How do I know that any errors
    are actually getting reported somewhere so I can take corrective action?

    > The next system I purchase will have ECC memory. My time is well worth
    > the minor difference in price. Since I don't overclock it is not an
    > issue. Heck, that fancy overclocking memory costs way more than ECC
    > memory.

    ECC memory prices dropped to roughly the 11% overhead of the extra
    check bits (8 of every 72 bits stored) a long time ago.
    Memory for the K6-III was cheap enough in '99 that I figured, "why not?"
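
    Keith's 11% figure can be checked against how ECC DIMMs are built: they
    are typically 72 bits wide, carrying 64 data bits plus 8 check bits. The
    short Python sketch below, assuming a standard SECDED (extended Hamming)
    code, derives that check-bit count and the resulting overhead; the
    function names are illustrative only.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a SECDED (extended Hamming) code over data_bits."""
    r = 0
    # Hamming bound: r parity bits must name every codeword position plus "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall-parity bit adds double-error detection


def ecc_overhead(data_bits: int = 64) -> tuple:
    """Return (check bits, check-bit share of stored bits, extra bits relative to data)."""
    check = secded_check_bits(data_bits)
    return check, check / (data_bits + check), check / data_bits


check, share, extra = ecc_overhead(64)
print(f"{check} check bits, {share:.1%} of stored bits, {extra:.1%} extra DRAM")
# -> 8 check bits, 11.1% of stored bits, 12.5% extra DRAM
```

    For 64-bit words this gives 8 check bits, about 11.1% of the stored bits
    (or 12.5% more DRAM than the data alone), which is presumably the
    overhead Keith had in mind.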

    --
    Keith
  2. Alan Walpool wrote:
    > 1) Very old pentium pro system with ecc memory used for a
    > firewall/bridge. Immediately had a memory parity error show on the
    > screen and the system halted. Checked the memory with memtest and
    > sure enough it was bad. Replaced the memory and everything was back
    > to normal. It took a day to correct the problem, and data was
    > intact. Memory was noname and no warranty.

    Did the motherboard/BIOS support ECC RAM? The thing about ECC RAM is
    that it should transparently fix 1-bit memory errors.

    Or do you think that the RAM has been going bad and the system has been
    fixing 1-bit errors and then finally got to the point where it
    encountered a 2-bit error?
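
    Will's second guess matches how the usual SECDED (single-error-correct,
    double-error-detect) codes behave: a single flipped bit is fixed
    silently, while two flipped bits in one word can only be detected, not
    fixed. The toy Python sketch below assumes a textbook extended Hamming
    code over a 64-bit word; real memory controllers may use different
    check-bit layouts, and encode/decode are illustrative names.

```python
from typing import List, Tuple


def _parity_bits(data_len: int) -> int:
    r = 0
    while 2 ** r < data_len + r + 1:
        r += 1
    return r


def encode(data: List[int]) -> List[int]:
    """Toy extended Hamming (SECDED) encoder; index 0 holds the overall parity bit."""
    r = _parity_bits(len(data))
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Parity bit p covers every position whose index has bit i set.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2  # overall parity over the whole word
    return code


def decode(code: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is "ok", "corrected" or "uncorrectable"."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd = sum(code) % 2  # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= n:
        code[syndrome] ^= 1  # single-bit error (syndrome 0 = overall parity bit)
        status = "corrected"
    else:
        return "uncorrectable", []  # even number of flips, e.g. a 2-bit error
    return status, [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]


data = [(0xDEADBEEFCAFEF00D >> i) & 1 for i in range(64)]
word = encode(data)  # 64 data bits -> 72-bit codeword
one_bit = word.copy()
one_bit[40] ^= 1
two_bit = word.copy()
two_bit[10] ^= 1
two_bit[50] ^= 1
print(len(word), decode(one_bit)[0], decode(two_bit)[0])
# -> 72 corrected uncorrectable
```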

    Either way, I do agree that ECC is nice to have in a system. My current
    system has it, and I think it was a nice investment. The only thing I
    think would be nice is if my motherboard had some sort of DMI logging
    mechanism for memory errors. That way I'd be able to see if the ECC
    has done its job at any point during the time I've owned it.


    --
    -WD
  3. >>>>> "keith" == keith <krw@att.bizzzz> writes:

    keith> I'm not sure what regulars have said such here. I have ECC
    keith> memory even on my K6-III system. Memory has never been "so
    keith> good" that it never fails. My only "issue" with ECC is that I
    keith> can't test whether it's really working (why have I never seen
    keith> an error?). How do I know that any errors are actually getting
    keith> reported somewhere so I can take corrective action?

    My old Pentium Pro motherboard has a memory error count in the BIOS.
    At least on the BIOS I have, you can monitor ECC corrections there. If
    it gets really bad, it will cause a parity error and shut down the
    system.

    It depends on the BIOS and motherboard.

    Interesting.

    Alan
  4. >>>>> "Will" == Will Dormann <wdormann@yahoo.com.invalid> writes:

    Will> Alan Walpool wrote:
    >> 1) Very old pentium pro system with ecc memory used for a
    >> firewall/bridge. Immediately had a memory parity error show on the
    >> screen and the system halted. Checked the memory with memtest and
    >> sure enough it was bad. Replaced the memory and everything was
    >> back to normal. It took a day to correct the problem, and data was
    >> intact. Memory was noname and no warranty.

    Will> Did the motherboard/BIOS support ECC RAM? The thing about ECC
    Will> ram is that it should transparently fix 1-bit memory errors.

    Will> Or do you think that the RAM has been going bad and the system
    Will> has been fixing 1-bit errors and then finally got to the point
    Will> where it encountered a 2-bit error?

    Will> Either way, I do agree that ECC is nice to have in a system. My
    Will> current system has it, and I think it was a nice investment.
    Will> The only thing I think would be nice is if my motherboard had
    Will> some sort of DMI logging mechanism for memory errors. That way
    Will> I'd be able to see if the ECC has done its job at any point
    Will> during the time I've owned it.

    The motherboard BIOS detected and reported the ECC memory fine. The
    BIOS in that old Pentium Pro motherboard logs ECC errors. It was
    reporting some errors at first, but nothing it could not handle. I
    guess it got so bad that it gave up trying to correct the memory
    errors and halted the system completely with a memory error message.
    I didn't write down the exact error message.

    I guess it all depends on the BIOS whether it reports errors or not.

    I have not checked lately, but I seriously doubt desktop PCs have any
    logging for ECC errors. That old Pentium Pro system I have was really
    a server motherboard at one time.

    Interesting.

    Alan
  5. Alan Walpool wrote:
    > I have not checked lately but I seriously doubt desktop PC's have any
    > logging for ECC errors. Really that old pentium pro system I have was
    > really a server motherboard at one time.

    Yes, that seems to be the case. The only machines I've used that log
    ECC errors are SGI workstations and Dell servers. Nothing desktop-wise,
    which is a shame.

    There is a Linux kernel module that supposedly monitors and reports ECC
    errors, but I haven't been able to get it to compile on my Gentoo (2.6
    kernel) system.

    http://www.anime.net/~goemon/linux-ecc/

    Would be nice if Windows had some sort of equivalent functionality...
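
    Later Linux kernels gained a mainline EDAC (Error Detection And
    Correction) subsystem that does roughly this: it exposes per-memory-
    controller counters of corrected (ce_count) and uncorrected (ue_count)
    errors under /sys/devices/system/edac/mc/. A minimal Python sketch,
    assuming an EDAC driver for the board's chipset is loaded (this is not
    the linux-ecc module linked above, and read_edac_counts is a
    hypothetical helper name), reads those counters:

```python
import glob
import os


def read_edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs counters.

    Assumes the Linux EDAC layout mcN/ce_count and mcN/ue_count under `root`.
    """
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            continue  # controller without readable counters: skip it
        counts[os.path.basename(mc)] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded, or no ECC support).")
    for mc, (ce, ue) in counts.items():
        print(f"{mc}: {ce} corrected, {ue} uncorrected")
```

    An empty result only means no EDAC driver is reporting; it does not
    prove that ECC is active.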


    --
    -WD
  6. Will Dormann <wdormann@yahoo.com.invalid> wrote:
    > Alan Walpool wrote:
    > > I have not checked lately but I seriously doubt desktop PC's have any
    > > logging for ECC errors. Really that old pentium pro system I have was
    > > really a server motherboard at one time.

    > Yes, that seems to be the case. The only machines I've used that log
    > ECC errors are SGI workstations and Dell servers. Nothing desktop-wise,
    > which is a shame.

    The hardware hooks are there for the 925X series chipset.
    I haven't looked very hard, but IIRC, they're in pretty much
    all the older "high end" desktop chipsets as well, such
    as the 875P.

    Whether software uses those hooks to log (correctable) 1-bit ECC
    errors or not is another story.


    --
    davewang202(at)yahoo(dot)com
  7. On Sun, 12 Dec 2004 10:50:55 -0600, Alan Walpool
    <awalpool@onzedge.net> wrote:

    >....snip...
    >
    >The next system I purchase will have ECC memory. My time is well worth
    >the minor difference in price. Since I don't overclock it is not an
    >issue. Heck, that fancy overclocking memory costs way more than ECC
    >memory.
    >
    >Whatever,
    >
    >Alan

    Man, you just made a case for Socket 940: the board is cheaper than
    939, the CPU (Opteron) goes for roughly the same price as the equivalent
    939 part (A64FX), and the only usual complaint is that the registered
    ECC RAM it uses is somewhat slower and more expensive. But you want ECC,
    so 940 is the way to go, unless you are willing to pay an arm and a leg
    for a slower Xeon.
  8. Alan Walpool wrote:
    >
    > 2) Had a 2 year old amd athlon system with non-ecc memory and the
    > system started locking up. One of the disks was corrupted. I
    > started trying to track the problem down, and continued to have
    > random system lockups. It got so bad the system was not booting.
    > Removed all cards but the video card, and still lockups. Finally
    > checked the memory with memtest, and sure enough the memory was
    > bad. System was never overclocked and did not have any heat
    > related problems. Well after data corruption, and 4-5 days of
    > pulling my hair out, I figured it out. Memory was name brand with a
    > lifetime warranty, I sent for a RMA on the memory.


    I feel your pain. It's wise to consider ECC.

    When a system has disk corruption, crashes, or blue screens, I reach for
    MEMTEST first (Disk Doctor second). You can screw up memory by fiddling
    with hardware; it can fail on its own or due to a power spike; it can
    even glitch when a cosmic ray hits it (at least that used to be a
    worry); it can be running under marginal and deteriorating conditions;
    etc.
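
    Tools like MEMTEST find bad RAM by writing known patterns and reading
    them back. The toy Python sketch below only illustrates that idea and is
    not memtest86's actual algorithm (which runs many more tests, such as
    moving inversions, against physical memory): FaultyRAM is a hypothetical
    simulated module with one bit stuck at 0, and walking_pattern_test is an
    illustrative name for a walking-ones/walking-zeros pass.

```python
from typing import List


class FaultyRAM:
    """Toy byte-addressable memory with one bit stuck at 0 (hypothetical fault model)."""

    def __init__(self, size: int, stuck_addr: int, stuck_bit: int) -> None:
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = 1 << stuck_bit

    def write(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value &= ~self.stuck_mask & 0xFF  # the bad cell never stores a 1 in this bit
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def walking_pattern_test(ram: FaultyRAM, size: int) -> List[int]:
    """Write walking-ones and walking-zeros patterns, read back, return failing addresses."""
    patterns = [1 << b for b in range(8)] + [~(1 << b) & 0xFF for b in range(8)]
    bad = set()
    for pattern in patterns:
        for addr in range(size):  # fill pass
            ram.write(addr, pattern)
        for addr in range(size):  # verify pass
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)


ram = FaultyRAM(size=4096, stuck_addr=1234, stuck_bit=5)
print(walking_pattern_test(ram, 4096))  # -> [1234]
```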

    However, for non-critical use, if you buy "reasonable quality" parts
    (PSU, motherboard, memory, cooling), operate within the manufacturer's
    parameters, and perform burn-in testing, you'll be OK. Keep MEMTEST
    handy. In my experience, memory failure hasn't been an issue for years
    and years. If in doubt, ask your local hardware shop what they think of
    current configurations. I respect the expertise of good local shops,
    especially if they warrant what they sell.
  9. nobody@nowhere.net wrote:

    > On Sun, 12 Dec 2004 10:50:55 -0600, Alan Walpool wrote:
    >
    >>....snip...
    >
    > Man, you just made a case for socket 940 - the board is cheaper than
    > 939, the CPU (Opteron) goes for roughly the same price as equivalent
    > 939 (A64FX), the only complaint usually is that registered ECC RAM it
    > uses is somewhat slower and more expensive. But you want ECC, so 940
    > is the way to go, unless you are willing to pay an arm and a leg for
    > slower Xeon.

    Registered (also known as buffered) and ECC are two separate features.
    You can buy unbuffered ECC DDR SDRAM DIMMs, for example from Kingston:

    http://www.ec.kingston.com/ecom/configurator/PartsInfo.asp?ktcpartno=KVR400X72C3A/512

    Click on search to see a list of compatible motherboards.

    There are Socket 754 and Socket 939 motherboards which support ECC
    memory modules. For example:

    http://www.asus.com/prog/spec.asp?m=K8N-E%20Deluxe

    --
    Regards, Grumble
  10. On Mon, 13 Dec 2004 04:15:31 GMT, "nobody@nowhere.net"
    <mygarbage2000@hotmail.com> wrote:

    >On Sun, 12 Dec 2004 10:50:55 -0600, Alan Walpool
    ><awalpool@onzedge.net> wrote:
    >
    >>The next system I purchase will have ECC memory. My time is well worth
    >>the minor difference in price. Since I don't overclock it is not an
    >>issue. Heck, that fancy overclocking memory costs way more than ECC
    >>memory.
    >>
    >Man, you just made a case for socket 940 - the board is cheaper than
    >939, the CPU (Opteron) goes for roughly the same price as equivalent
    >939 (A64FX), the only complaint usually is that registered ECC RAM it
    >uses is somewhat slower and more expensive. But you want ECC, so 940
    >is the way to go, unless you are willing to pay an arm and a leg for
    >slower Xeon.

    Perhaps more to the point, he just made a case for integrating an
    ECC-capable memory controller onto the CPU, as is done in BOTH the
    Opteron and the Athlon64.

    You don't need Socket 940 at all to use ECC; ALL Athlon64 boards
    support it (unless the BIOS goes out of its way to disable this
    feature). It's all built into the processor, and all Athlon64/Opteron
    chips, whether Socket 754, Socket 939 or Socket 940, support it.

    Unregistered (aka unbuffered) ECC modules actually add only a small cost
    over standard unregistered non-ECC memory; they do not carry as large
    a price premium as registered (buffered) memory. For example, if you
    check Crucial's prices:

    http://www.crucial.com/store/listmodule.asp?module=DDR+PC3200&Attrib=Package&cat=RAM

    For 512MB, unbuffered non-ECC memory costs $81, unbuffered ECC costs
    $106, and buffered ECC costs $123. The only problem here is that
    Crucial doesn't sell unbuffered ECC memory in all sizes (and doesn't
    sell buffered non-ECC in any size, though demand for such a setup is
    pretty small); i.e., for 1GB modules they only sell unbuffered non-ECC
    and buffered ECC.
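
    To put these prices in proportion, a few lines of Python compute each
    module's premium over the non-ECC price. The prices are the 512MB
    PC3200 figures quoted above; premium is an illustrative helper name.

```python
# Crucial 512MB PC3200 prices as quoted in the post (USD, December 2004).
PRICES = {
    "unbuffered non-ECC": 81,
    "unbuffered ECC": 106,
    "buffered (registered) ECC": 123,
}


def premium(base: float, price: float) -> float:
    """Fractional price premium of `price` over `base`."""
    return price / base - 1


base = PRICES["unbuffered non-ECC"]
for name, price in PRICES.items():
    print(f"{name:26s} ${price:4d}  {premium(base, price):+.1%}")
```

    This works out to roughly 31% for unbuffered ECC and 52% for registered
    ECC at these retail prices, compared with the 12.5% extra DRAM that
    8 check bits per 64 data bits require.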

    -------------
    Tony Hill
    hilla <underscore> 20 <at> yahoo <dot> ca
  11. On Sun, 12 Dec 2004 15:25:43 -0600, Alan Walpool wrote:

    >>>>>> "keith" == keith <krw@att.bizzzz> writes:
    >
    > keith> I'm not sure what regulars have said such here. I have ECC
    > keith> memory even on my K6-III system. Memory has never been "so
    > keith> good" that it never fails. My only "issue" with ECC is that I
    > keith> can't test whether it's really working (why have I never seen
    > keith> an error?). How do I know that any errors are actually getting
    > keith> reported somewhere so I can take corrective action?
    >
    > My old pentium pro motherboard has a memory error count in the bios.
    > At least on the bios I have you can monitor ECC corrections there. If
    > it gets really bad it will cause a parity error and shutdown the
    > system.

    That's a server system. I'm *quite* sure IBM's Z-Series logs memory
    errors and reports them to the mothership too. That doesn't give me a
    wonderful feeling with my commodity desktop system.
    >
    > Depends on the bios and motherboard.

    Obviously. That's the point! How does one *know*? BTW, IMO BIOS
    reporting isn't good enough. I want something that I can query (indeed
    be prompted with) from the OS, perhaps as root if security demands it.
    >
    > Interesting.

    Long ago, in a galaxy far, far away, I proposed testing ECC function on
    motherboards that claimed to support it. I couldn't figure out a
    reliable way of doing it, so that idea went west.

    --
    Keith
  12. On Mon, 13 Dec 2004 16:31:01 -0500, Tony Hill
    <hilla_nospam_20@yahoo.ca> wrote:

    >On Mon, 13 Dec 2004 04:15:31 GMT, "nobody@nowhere.net"
    ><mygarbage2000@hotmail.com> wrote:
    ....snip...
    >You don't need Socket 940 at all to use ECC, ALL Athlon64 boards
    >support it (unless the BIOS goes to lengths to intentionally disable
    >this feature). It's all built into the processor, and all
    >Athlon64/Opteron chips, whether they be Socket 754, Socket 939 or
    >Socket 940, support it.
    >
    >Unregistered (aka unbuffered) ECC chips actually only add a small cost
    >over standard unregistered/non-ECC memory, they do not carry as large
    >of a price premium as registered (buffered) memory. For example, if
    >you check Crucial's prices:
    >
    >http://www.crucial.com/store/listmodule.asp?module=DDR+PC3200&Attrib=Package&cat=RAM
    >
    >For 512MB, the unbuffered/non-ECC memory costs $81, unbuffered ECC
    >costs $106 and buffered ECC memory costs $123. The only problem here
    >is that Crucial doesn't sell unbuffered ECC memory at all sizes (and
    >they don't sell buffered non-ECC at any size, though the demand for
    >such a setup is pretty small), ie for 1GB modules they only sell
    >unbuffered non-ECC and buffered ECC.

    Maybe it was luck, but when I was building my current system, I
    found PC3200 registered ECC 512MB modules for just over $100. The
    cheapest unbuffered ECC modules at that moment were priced even
    higher, as was buffered non-ECC. Yes, I bought them not from the
    likes of Crucial but from one of the Pricewatch bottom-feeders, and
    these vendors probably don't stock the less common varieties. As
    you mentioned, the choice was between unbuffered non-ECC and buffered
    ECC. Since I wanted SMP, the choice was between Opteron and Xeon, not
    between 940 and 939. Obviously Xeon looked like a loser in both the
    performance and price departments ;-) but that is a whole different
    topic. Back to the memory: if you are already set on ECC, and it will
    quite likely be registered as a side effect, 940 simply makes sense,
    because both the CPU and motherboard would likely come a tad cheaper
    (though I haven't checked prices for a few months; things could have
    changed since).
  13. On 13 Dec 2004 12:34:14 -0800, "John" <twohandsfree@hotmail.com>

    >When a system has disk corruption, crashes, or blue screens I reach for
    >MEMTEST first (Disk Doctor second).

    http://cquirke.mvps.org/9x/bthink.htm :-)

    >In my experience memory failure hasn't been an issue for years

    Oh, it's common in the context of PCs that just don't work properly.

    I see more bad HDs than bad RAM, but it's close, and more bad RAM than
    bad motherboards or SVGA cards. Bad PSUs are common too, but they
    usually present in less ambiguous ways.


    -------------------- ----- ---- --- -- - - - -
    Running Windows-based av to kill active malware is like striking
    a match to see if what you are standing in is water or petrol.
    -------------------- ----- ---- --- -- - - - -