Revamped MemTest86 Can Highlight Bad ICs on Your DIMMs

Memory testing
(Image credit: Crucial)

PassMark Software is teasing an updated version of one of the most versatile tools in the PC memory troubleshooter's kit. With an upcoming release, MemTest86 will usher in a “new era of memory testing,” according to the software vendor. The headline feature of the new memory stress-testing software is the ability to pinpoint which ICs on a DIMM are producing errors.

When troubleshooting a PC fault, memory hardware issues are among the most devilish a DIYer will face. Some hardware is more prone to developing problems than others, or shows a tell-tale sign that raises a red flag. Memory issues, however, manifest in many different and unpredictable ways. Moreover, DRAM seems particularly sensitive to problems and compatibility issues with other hardware. For years, MemTest86 has helped people thoroughly check and stress test memory.

Depending on the severity of your PC problem, you might want to test each memory module individually, working by process of elimination. However, if your system can run and complete tasks with all its memory modules seated, MemTest86 can check them all in place, sparing you the tedium of unplugging modules and trying them in different slots.

(Image credit: PassMark)

The image above shows the headlining feature brought to the forefront. A user put a quartet of Crucial 16GB DDR5-4800 DIMMs through the MemTest86 tool, and you can see an issue with DIMM_B1. It failed the test, throwing up 99 errors.

The new MemTest86 is showing more granular test results than ever in the screenshot. About halfway down the ‘image’ of the DIMM, ‘U6’ is highlighted in red, with a column beside it indicating all 99 errors were due to this DRAM chip or IC.

With the error shown, most users would swap out the affected DIMM with a new one with the same specs. Hopefully, it would be covered by a warranty so that you could get a new module a few days after return. Some PC and electronics repair enthusiasts/workers might keep the module with errors. The knowledge of the faulty chip could mean a repair is feasible, or the DIMM could be held in stock with several donor DRAM chips / ICs available for fixing subsequent PCs.

PassMark commented on its image a few hours after posting to highlight that the IC error highlighting feature is only working on DDR5 platforms running in a dual-channel mode right now. Also, at this time, this is an Alder Lake and Z690 exclusive feature.

That is all we have for now, and as with many teasers, we have more questions. For example, when will this feature arrive, will it be present in the free version of the tool, and will we get the IC-specific error highlighting with DDR4 memory?

Mark Tyson
Freelance News Writer

Mark Tyson is a Freelance News Writer at Tom's Hardware US. He enjoys covering the full breadth of PC tech, from business and semiconductor design to products approaching the edge of reason.

  • tennis2
    Back from the dead eh?
    When was the last version release? Somewhere around 2011 - 2013?

    I gave up on Memtest cuz it couldn't reliably/efficiently test high-GB sticks (aka >2GB) that are common in modern PCs.
  • InvalidError
    Figuring out which chips are bad isn't rocket science: DRAM chips have a strict bit grouping due to per-byte strobes. Look at the bad bits pattern, divide each error bit position number by eight rounded down, that is the chip number counting from zero.
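
The arithmetic described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 64-bit data word, x8 DRAM chips (eight data bits per chip), and data bits wired to chips in ascending order. Real modules may route bit lanes differently, and the resulting index does not necessarily match a silkscreen label such as U6. The function name `failing_chips` is hypothetical.

```python
def failing_chips(expected: int, actual: int, bits_per_chip: int = 8) -> dict:
    """Group the bit positions that differ between two data words by chip index.

    Assumes bit N of the word is served by chip N // bits_per_chip, counting from zero.
    """
    diff = expected ^ actual  # 1-bits mark the positions that read back wrong
    chips = {}
    bit = 0
    while diff:
        if diff & 1:
            chips.setdefault(bit // bits_per_chip, []).append(bit)
        diff >>= 1
        bit += 1
    return chips


# Hypothetical failure: bits 50 and 53 flipped in a 64-bit test pattern.
expected = 0xFFFF_FFFF_FFFF_FFFF
actual = expected ^ (1 << 50) ^ (1 << 53)
print(failing_chips(expected, actual))  # {6: [50, 53]}
```

Both bad bits fall in the range 48 to 55, so under these assumptions they point at the same chip, number 6 counting from zero.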

    tennis2 said:
    Back from the dead eh?
    When was the last version release? Somewhere around 2011 - 2013?

    I gave up on Memtest cuz it couldn't reliably/efficiently test high-GB sticks (aka >2GB) that are common in modern PCs.
    The open-source Memtest86+ came back to active development life a year or two ago.

    The commercial PassMark MemTest86 is a whole different thing.
  • Darkbreeze
    tennis2 said:
    Back from the dead eh?
    When was the last version release? Somewhere around 2011 - 2013?

    I gave up on Memtest cuz it couldn't reliably/efficiently test high-GB sticks (aka >2GB) that are common in modern PCs.
    Yeah, I think you're confusing Memtest86+ with Passmark's Memtest86. Totally different products and Memtest86 has no such problems with testing any size or memory configuration that I've seen.
  • JWNoctis
    I wonder what kind of assumptions would have to be made with the layout of double-sided modules, or modules with multiple rows of chips aka SO-DIMM. Are those really the same across manufacturers?

    But yes, translating failed address range to specific module is already much better than nothing.
  • Darkbreeze
    Does it really matter which side of a module the failure is on? It's not like anybody, including the manufacturer, is going to bother (or be able) to fix it, so whether one side or the other is bad the result is the same: you return it for replacement under the lifetime warranty that most memory manufacturers offer. And if we're really being honest, for most users it probably doesn't matter which module is bad either, unless you are using DIMMs from multiple kits, which you often CAN do but which really isn't recommended as a "best practice". The reason it doesn't matter which DIMM is bad when all DIMMs came in one kit is that you'd want to return the WHOLE KIT for replacement, not just a single DIMM. Kits are tested for compatibility at the factory, and you really don't want them sending you a single replacement DIMM to add back into a kit that it wasn't tested and validated with.

    At lower speeds, within the JEDEC defaults, probably not as big of a deal as compatibility is pretty good there across the board when mixing DIMMs, but at higher speeds, not so much, so I'd recommend sending the entire kit for replacement and insisting on it, rather than settling for only having a single DIMM replaced.
  • bit_user
    FYI, if you have a DIMM with only a handful of errors that are consistently at the same addresses, you can have Linux exclude them from use. I'm not sure if Windows has a similar capability, but I wouldn't be surprised if it did.

    It's not a bad option for a DIMM that's out of warranty, but once some errors start cropping up, they're likely to be followed by others. Therefore, it should be seen as a stop-gap solution, while procuring replacement hardware.

    BTW, I wish Intel supported in-band ECC on all their CPUs. So far, not even all of their Elkhart Lake models support it. It comes at a performance cost, but the upside is that it could work with any DIMM or motherboard (since no extra traces or chips are needed).
  • bit_user
    Also, I'm a fan of MemTest86 and bought a personal copy to help fund its development.

    It's saved me on a couple occasions. I always do an overnight run (or at least 2 full passes) whenever I build a machine or change its RAM.
  • salgado18
    It would be great if the RAM itself could exclude the faulty chip, lowering the total capacity but keeping its working status. Why exchange everything when only a small part is not working properly? And, if you don't want the lower capacity, there's always an RMA or purchase another stick.
  • InvalidError
    salgado18 said:
    It would be great if the RAM itself could exclude the faulty chip
    The entire memory system and cache structure is designed around DIMMs transferring 128 bytes per burst. If you lose 1/8th of that by writing off a chip altogether, there would be a significant performance penalty from most memory accesses being oddly aligned.

    If you want to keep using bad memory, Linux lets you flag memory pages with bad bits. That way, the only thing you lose is a 4kB page.
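
One way to do this on Linux is the `memmap=nn$ss` kernel boot parameter, which marks `nn` bytes starting at physical address `ss` as reserved so the kernel never allocates them. The sketch below turns a list of failing addresses into one such argument per 4 KiB page. The helper name `memmap_args` and the sample addresses are hypothetical, and the addresses are assumed to be physical addresses as reported by a memory tester.

```python
PAGE_SIZE = 4096  # 4 KiB pages, the usual base page size on x86-64 Linux


def memmap_args(bad_addresses: list) -> list:
    """Return one memmap= kernel argument per distinct page holding a bad address."""
    # Clear the low 12 bits to get the start of the page containing each address.
    pages = sorted({addr & ~(PAGE_SIZE - 1) for addr in bad_addresses})
    return [f"memmap=4K${page:#x}" for page in pages]


# Hypothetical physical addresses reported by a memory tester.
print(" ".join(memmap_args([0x12345678, 0x12345FF0, 0x2000A010])))
# memmap=4K$0x12345000 memmap=4K$0x2000a000
```

The first two addresses share a page, so only two pages are reserved. Note that the `$` character typically needs escaping when the argument is placed in a boot loader configuration file.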
  • bit_user
    salgado18 said:
    It would be great if the RAM itself could exclude the faulty chip, lowering the total capacity but keeping its working status.
    Mainframes let you do things like that, although it turns out Chipkill is actually an ECC scheme rather than directly disabling a specific IC.

    Such schemes usually come at the expense of complexity and additional cost (i.e. for the extra IC, motherboard traces, etc.). There should also be some hit on energy efficiency.

    salgado18 said:
    Why exchange everything when only a small part is not working properly?
    Well, at least using removable DIMMs gives you the option to replace only a stick, rather than the CPU (if memory is in-package) or the entire board (if soldered down).

    However, we could return to the point about ECC: conventional ECC can correct single-bit errors while detecting double-bit errors. At work, we had a server which had a low rate of single-bit errors for years. We tried replacing the DIMM, but it seems the problem was either in the motherboard or the CPU. So, in the end, it kept running like that for quite a while and remained stable the entire time.

    BTW, it was a compute server - not a fileserver or database server - mostly running regression tests. So, the potential memory errors didn't put any data at risk.
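
The "correct single-bit errors, detect double-bit errors" behaviour mentioned above can be illustrated with a toy extended Hamming code. This is a scaled-down sketch, not what any particular memory controller implements: ECC DIMMs typically protect 64 data bits with 8 check bits, whereas this example protects 4 data bits with 4 check bits. The function names `encode` and `decode` are hypothetical.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the parity of all 8 bits even
    return code


def decode(received: list) -> tuple:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions of all set bits
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # One bit flipped: the syndrome is its position (0 means the parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                               # inject a single-bit error
print(decode(word))                        # ('corrected', 11)
word[2] ^= 1                               # inject a second error
print(decode(word))                        # ('uncorrectable', None)
```

A single flipped bit is located and repaired, while a second flip in the same word is flagged but cannot be fixed, which matches the server anecdote: single-bit errors were logged and corrected without affecting stability.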