Trouble w/ Athlon X2 5000+ System! What happened?

altazi

Distinguished
Jan 23, 2007
264
0
18,810
Hello All,

My system:
- Athlon X2 5000+ Black Edition (Brisbane 65nm) overclocked to 3.0GHz
- MSI K9A2 Platinum motherboard (790FX chipset)
- Crucial Ballistix DDR2 1066, 1GB x 2 (running at DDR2-800)
- PC Power & Cooling Silencer 750 Quad PSU
- GeCube HD3870 video card
- Seagate Barracuda 750GB HDD
- Samsung SH-S203N SATA DVD burner
- Kingwin RVT-9225 92mm CPU cooler, installed with Arctic Silver 5
- Sony 1.44MB floppy
- Windows XP Pro (32-bit) w/SP2

System was built mid-February. I began by running Memtest86 v3.4 for 24+ hours with no errors. I then installed the OS, and ran stability tests using Prime95 on both cores, and encountered no errors for 10+ hours. I then set the CPU clock from the initial 2.6GHz to 3.0GHz, and repeated the tests - again, finding no troubles. Although I can't read the core temperatures due to the Brisbane problem, the motherboard reports the CPU temperature in the mid-20s for idle, and low 30s for high loading. The CPU HSF is cool to the touch. The system has been 100% rock stable ever since - until now.

I finished a game of Oblivion last night, and left the system to go into power-saving standby mode. This morning, the screen was blank (as it should have been), but the system wouldn't wake up on keyboard or mouse activity. I reset the system, but the screen remained utterly blank. I repeated this several times, and finally got the BIOS message saying that my overclock had failed. I went into BIOS setup, verified the settings (but didn't change anything) and started Windows. Everything seemed fine.

I started Prime95 in the mixed mode and it failed almost immediately. I then tried the small FFT test, which runs primarily from CPU cache and uses little external memory. The system ran a little longer, but it still failed the test after a minute or so. I then restarted the system and dropped the CPU clock to 2.8GHz, and tried Prime95 again. Still had failures.

I put the Memtest86 floppy into the drive and restarted the system. After running for one minute and 45 seconds, Memtest86 halted with an "unexpected interrupt" error. This is not what I would expect to see for a memory error. I have run this program on other systems with bad RAM, and have seen what those kinds of error messages look like. I restarted Memtest86, and shortly afterwards, the program crashed!

This is where I am now. I don't believe it's a RAM problem, but it might be. It could also be a motherboard or CPU problem. My question to all of you is this: How can I figure out what is bad? I don't have any other AM2+ motherboards or suitable CPUs to swap out, and I don't want to buy more just for the purpose of testing.

I would like (helpful) suggestions as to what I should do next.

Thanks in advance for your help!

Altazi
 

Andrius

Distinguished
Aug 9, 2004
1,354
0
19,280
Clearing the CMOS might help. If not, unplug the computer from the wall socket for a few hours (sometimes that helps with freaky stuff like this). That unexpected interrupt could be due to overheating, so check whether the cooler has come loose.


What kind of heatsink are you using?
I guess you're using AutoVoltage for the CPU?
Have you tried it at stock settings?
 

johnnyq1233

Distinguished
Aug 15, 2007
1,233
0
19,460
Hopefully you have a friend who runs this socket, so you can try your CPU on his/her board. You didn't mention whether you did the one-stick-at-a-time thing.
The only other way I know of to test the CPU is to take it to a reputable shop.
I'd hate to think your OC damaged your on-die cache... Hope this helps.
 

altazi

I have cleared the CMOS, restarted the computer, and am running Memtest86 again. So far, it has been running for 58 minutes, and is 76% of the way through the first pass.
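As a rough back-of-the-envelope check on how long a full pass should take, here is a minimal sketch. It assumes Memtest86's progress percentage advances roughly linearly within a pass, which may not hold since the individual tests take different amounts of time. The function name is just illustrative.

```python
def estimated_pass_minutes(elapsed_minutes, fraction_done):
    """Extrapolate total pass time from elapsed time and progress.

    Assumes progress is linear in time, which is only an approximation.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


# Figures from this post: 58 minutes elapsed, 76% through the first pass.
total = estimated_pass_minutes(58, 0.76)
print(f"estimated full pass: {total:.0f} min, about {total - 58:.0f} min remaining")
```

With those figures the estimate comes out at roughly 76 minutes for the pass, so about 18 minutes to go.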

If clearing the CMOS fixed the problem, all I can say is "WTF???" What caused this? A stray cosmic ray? The system is connected to a good UPS on a properly-grounded circuit. I use this system for work, and I need it to be reliable.

I'll let you all know how this shakes out. Thanks for all of your great suggestions.

Altazi

 

Andrius

With a system that has some 10^12 "components" (software and hardware) we tend to underestimate the power of a single logic value. If the right value gets "altered", just about everything can go wrong.

That's why my very knowledgeable logic designer friend says: "He who said computers are deterministic machines lied". ;)
 

Clearing the CMOS may have solved it, so the problem could be that the BIOS is not retaining its settings because the battery voltage is too low. I had a similar problem with a P5K a few months back.
 

altazi

Memtest86 ran for one hour and 22 minutes before crashing. It looks like the program itself crashed, because it is totally unresponsive. Hitting "ESC" should exit the program and force a system restart, but the system is utterly frozen. I have never seen Memtest86 respond like this before.

I will try running on one RAM stick at a time, but I doubt this is a memory issue. . .
 

altazi

Doesn't the lithium coin cell only provide power for the CMOS RAM when there is no system power present? Surely the designers wouldn't do anything stupid here, would they?
 

Andrius

Not all batteries are the same. The shelf life is around 5 years. Various events shorten the lifespan (e.g. faulty BIOS settings, extreme temperatures during storage, ...).

You say your system is on a UPS. Can you test on a surge protector socket only? I've had my issues with UPSes and a CRT screen. And sometimes my board would ignore my BIOS settings entirely.
 

altazi

Reflashing the BIOS? Now that's a conundrum. The warning shown before BIOS flash programming says that the system must be stable before you attempt the re-flash.

At this point, I had five successful passes of Memtest86 on one stick of the RAM (several hours' worth of testing). I switched sticks and am now testing the other, and it's 38% of the way through the first pass - no errors so far.

Just thinking ahead - what if both sticks pass individually, but fail as a pair? I am not overclocking the memory, and the system has been absolutely stable up until this morning. I haven't had to reset the system due to a problem even once until today.

I measured the coin cell voltage with my DMM and it measures just above 3.0V, so I don't think it's the cell. . .

Keep those suggestions coming! Thanks!
 

altazi

Good points, MrsB. These tests are being done at 2.8GHz, a 200MHz overclock. The CPU is barely warm. I suppose I should drop the frequency to 2.6GHz, but why would the system be stable for almost 3 months solid, and then go to crap all at once?
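To put those clock figures in proportion, here is a minimal sketch of the arithmetic, taking 2.6GHz as the stock clock and the 2.8GHz and 3.0GHz settings mentioned in this thread. The function name is only illustrative.

```python
STOCK_GHZ = 2.6  # stock clock of the X2 5000+ as reported in this thread


def overclock_percent(clock_ghz, stock_ghz=STOCK_GHZ):
    """Return how far clock_ghz sits above stock, as a percentage."""
    return (clock_ghz - stock_ghz) / stock_ghz * 100.0


for clock in (2.8, 3.0):
    print(f"{clock:.1f} GHz is {overclock_percent(clock):.1f}% over stock")
```

That works out to about 7.7% over stock at 2.8GHz and about 15.4% at 3.0GHz.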
 

snajper69

Distinguished
Apr 16, 2008
222
0
18,680
Exactly. It totally blows me away that the system was stable for so many months and all of a sudden it turns to crap. When I said reflash the BIOS, I meant once you get the system stable enough.
 

altazi

OK, now for the final and total weirdness. I ran five passes of Memtest86 on the other RAM stick (6 hours of testing) with no failures. Next, I ran Prime95 in the RAM test modes, again receiving no failures after several hours of testing. Both sticks of RAM individually test OK, and everything seems to be rock solid when the system is running with either stick of RAM.

Why would the system be challenged when both sticks of RAM are installed?
 

altazi

None of the timings or voltages have been changed. I am wondering if the bus drivers are damaged, and therefore the extra load of two sticks of RAM causes the instability. I think I am going to pick up some extra RAM - Fry's is having a special on Crucial Ballistix DDR2-800 4-4-4-12 2 x 1GB for $20 after $30 MIR. I'll swap these in and continue testing. If I still get faults, then I guess it points to either the CPU or the motherboard. Wish I knew how to tell. I don't have an extra AM2 CPU sitting around, and I certainly don't have another MSI K9A2 Platinum mobo.
 

Andrius

Did you test in all 4 slots?

Run the "in cache" test of Prime95 with 1 stick. If it fails, you'll know it's the CPU. If it doesn't, it's a good bet one of the slots went to heaven.


 

altazi

Hi Andrius,

I have run Prime95 on both cores (simultaneously) using the small footprint (data fits in cache). I have not seen a failure in this mode. However, when I run the mixed test that uses lots of RAM, I get failures in both test windows almost instantly.

I have tested the RAM sticks individually using slot 1, and the test is successful. Using both of the sticks, either in slots 1&2 or 3&4 causes a failure.

I still don't quite get how the system could have been rock solid for about three months, and then just go gunnybag all at once. I went and got some Crucial Ballistix DDR2 800 4-4-4-12 and I suppose I will try that next.

Is this looking like a bad motherboard?
 

Andrius

The small footprint test means no bugs in the cache I guess.

Can you try in slot 1&3 or 2&4?

The pattern would indicate that the IMC has a bug. Now if that is true the same thing should happen with slots 1&3 or 2&4. That would then indicate something is wrong with the dual channel memory controller part.

You could buy a cheap dual-core X2 and test, or you could just downclock your chip to, say, 2.0GHz and repeat.

Hardware fails with time (capacitor aging, electromigration, ...) but 3 months is a very short time for it, unless you run your chip at an insane voltage of, say, 25% or higher over stock. It's a bad sign anyway. :/
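The slot-by-slot elimination discussed in this thread can be sketched roughly as below. This is only an illustration of the rule of thumb, not a definitive diagnostic: the function name, the argument names and the exact wording of each verdict are made up for the example, and the last branch (some pairs failing while others pass) is an assumption about what that pattern would suggest.

```python
def suspect(in_cache_ok, single_stick_ok, pair_results):
    """Guess the likely culprit from stress-test outcomes.

    in_cache_ok:     Prime95 small-FFT (in-cache) test passed
    single_stick_ok: each stick passed when tested alone
    pair_results:    maps slot pairs, e.g. (1, 2), to True (pass) / False (fail)
    """
    if not in_cache_ok:
        return "CPU cores/cache"
    if not single_stick_ok:
        return "RAM stick or the slot it sits in"
    if not pair_results:
        return "not enough pair tests yet"
    failed = [pair for pair, ok in pair_results.items() if not ok]
    if not failed:
        return "no fault reproduced"
    if len(failed) == len(pair_results):
        # Failure is not tied to particular slots.
        return "memory controller (on the CPU for an Athlon)"
    # Assumption: a slot-dependent failure points away from the controller.
    return "specific slots or motherboard traces"


# Results reported so far: in-cache test passes, single sticks pass,
# both sticks fail in slots 1&2 and in slots 3&4.
print(suspect(True, True, {(1, 2): False, (3, 4): False}))
```

Adding the 1&3 and 2&4 results to the dictionary once they are known would show whether the failure really is independent of the slots.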
 

altazi

Prime95 blend test fails with RAM in slots 1 & 3. Now trying slots 2 & 4. I will be out for a day or so, but will continue to log in here; I just won't know the results of the test until later tomorrow.

Since this is an Athlon, with the integrated memory controller, wouldn't that point to a bad CPU?
 

Andrius

It would, as it is not limited to a certain RAM slot. I think it is time to RMA the chip and play dumb when asked about overclocking (or maybe not since it's a Black Edition). :)
 

altazi

So far, the system is passing the test with RAM in DIMM slots 2 & 4. Core 1 is running the Prime95 blend (lots of RAM tested) and core 2 is running the small footprint. The system has been stable for almost 10 hours now.

I had only overclocked the chip to 3.0GHz; I did all of the Prime95 stability testing, and it passed with flying colors. It was just fine for almost three months.

For the current testing, I am running the CPU at 2.8GHz, a mere 200MHz overclock.

I checked on Newegg, and they don't seem to have the X2 5000+ BE any more. . . What to get now?