mrkorb

Distinguished
Oct 14, 2004
47
0
18,530
I put together a fileserver for my work a few months ago. Specs:

* AMD Athlon 64 X2 4200+
* ABIT KN8 Ultra Motherboard
* Crucial 512MB PC 3200 BL6464Z402 x 2
* Maxtor DiamondMax Plus 8 6E040L0 40GB IDE Ultra ATA133 for general programs and OS
* Western Digital Caviar SE WD800JD 80GB Serial ATA150 x 2 in RAID mirror for critical data
* SAPPHIRE 1024-2C50-04-SA Radeon X300SE 128MB DDR PCI Express x16 Low Profile Video Card
* XP Pro

Full list with links here.

Things ran great, fine, awesome until about mid-December, when it began crashing. I did the usual review of my recent changes to the system and found none. The crashes started getting more and more frequent, and they seemed to go hand in hand with OpenGL-related programs, specifically Google Earth, an OpenGL-based 3D screensaver, and Celestia. All programs that up until then had been running 100% fine. So I tried updating drivers and the programs themselves, no fix. It seemed to me that it was the video card going bad, and I would just not use those programs anymore.

Then it started happening with other programs that I couldn't just avoid using like the others, and it would result in constant STOP error blue screens on boot up, naming every system driver file you can imagine. A few days ago it was 10 minutes of constant rebooting, blue screens, and hand wringing. Today I spent 30 minutes in such a cycle before deciding it was going nowhere, and used safe mode to back up and move what we needed so we could use another, slower computer as the server.

This has turned my attention to the RAM being at fault. I haven't yet tested that, though I plan to do that tomorrow, but I wanted to see if you guys agree with my thinking on this. It's causing me all kinds of headaches at work since this computer is at the core of our customer management system, and when it goes down people are unable to do their jobs.

So, ideas?
 

mpjesse

Splendid
Can't say for sure, but it sounds like it's the memory that's giving you all those BSODs. All those programs you listed use a TON of RAM. Use this program to test your RAM:

http://www.memtest86.com/#download0

Download the ISO image and use it to boot your PC. Warning: it's going to take a few hours, so make sure you run it before you go to bed.

-mpjesse
 

mrkorb

Did that. I let each stick run individually for 4 hours, letting the latest memtest86+ do 20 passes on them, resulting in no errors on either of them. I can't help feeling that I'm back to square one now. Although I should mention that I let Windows boot with just one stick installed, and though it ran slow as hell, it successfully ran everything I threw at it, whereas before it would promptly blue screen on me. I almost want to think that they had gotten loose in their slots and that's why it had been acting up before and once I unplugged and moved them around it made a better pin connection.
 

mrkorb

Holy crap!

Ok, so I had no errors when I tested the two DIMMs individually, nor were there any errors when I tested them together. So I decided to dig around in the configuration on Memtest86+. On the memory sizing menu, I selected the 2nd option, BIOS - All, instead of what I assume is the default option of BIOS - Std, right? In less than 5 seconds over 5000 errors filled the screen and the program locked up.

I think I found a sore spot.
 

TBlaar

Distinguished
Dec 10, 2005
84
0
18,630
Dude, I know what your problem is....


You're using AMD!!!!!!!!!



lol


Are all the chips exactly similar (even chip types)?
Are they running dual-channel?
I've had it before that a PC kept BSODing because I ran 2 Kingston 512s in dual channel that had different chips on the RAM stick.

Pardon the confusion, but I have just forgotten the term for the little chip on the chip!?!?! Goldfish....
 

mrkorb

Dude, I know what your problem is....


You're using AMD!!!!!!!!!



lol

Oh yes, that answers everything. Thanks for being so helpful. I'll be sure to send you a fruit basket for that wonderful bit of insight.

Are all the chips exactly similar (even chip types)?
Are they running dual-channel?
I've had it before that a PC kept BSODing because I ran 2 Kingston 512s in dual channel that had different chips on the RAM stick.

Pardon the confusion, but I have just forgotten the term for the little chip on the chip!?!?! Goldfish....

Well, I can't see the chips, as the DIMMs are covered by heat spreaders. However, they are the same brand of DIMM with identical batch and part numbers, so I would assume that they have similar chips. They are running dual channel. As I said in my first post, two Crucial 512MB PC 3200. (link with pic)
 
Pardon the confusion, but I have just forgotten the term for the little chip on the chip!?!?! Goldfish....

SPD (Serial Presence Detect) - tells the BIOS what timings to use at what speeds.

Think Logically - THE SYSTEM RAN FOR MONTHS WITHOUT AN ISSUE so something's gone south, perhaps the motherboard?
 

mrkorb

Think Logically - THE SYSTEM RAN FOR MONTHS WITHOUT AN ISSUE so something's gone south, perhaps the motherboard?

Yeah, that's what I'm thinking now too. Almost as though it's lost the ability to handle dual channel RAM all of a sudden. Each stick runs Google Earth and allows Remote Desktop Connection without any problems at all when it's in there by itself, but together it's nothing but constant blue screens.

Also, I read on the memtest86+ forums that choosing the BIOS-All option like I did will give pretty much anybody a load of errors nowadays, so I guess that observation is pretty meaningless, heh.

Oh, and both sticks are running 2-2-2-8, btw, and nothing is overclocked, so this isn't a case of burnout from anything like that.
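
For anyone reading along, here is a rough sketch of what 2-2-2-8 works out to in actual time. It assumes the usual CL-tRCD-tRP-tRAS ordering of the four numbers and that DDR400 runs a 200 MHz clock (two transfers per cycle). The function name is made up for illustration.

```python
def timings_to_ns(data_rate_mts, timings):
    """Convert DDR timings given in clock cycles to nanoseconds.

    data_rate_mts: rated transfer rate, e.g. 400 for DDR400.
    timings: cycle counts, assumed here to be (CL, tRCD, tRP, tRAS).
    """
    clock_mhz = data_rate_mts / 2   # DDR moves data twice per clock
    cycle_ns = 1000 / clock_mhz     # length of one clock cycle in ns
    return [t * cycle_ns for t in timings]


# The 2-2-2-8 setting mentioned above, at DDR400
print(timings_to_ns(400, (2, 2, 2, 8)))
```

At DDR400 each cycle is 5 ns, so the three 2-cycle timings come to 10 ns each and the 8-cycle one to 40 ns.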
 

mrkorb

Well, whatever happened, it's definitely related to the RAM running dual channel. I stuck the DIMMs into slots 1 and 3 and they performed perfectly running Google Earth and Remote Desktop Connection. At this point, I'm not sure if I want to RMA the board with Abit or not, because the system works like this. Sure, the RAM being single channel makes it slower, but it's working, which is really the important thing as far as I'm concerned, and waiting for a new board is just more days of downtime.

EDIT: Or at the very least it's a problem with slots 1 and 2 running dual channel. I just put the RAM into 3 and 4 and it ran dual channel fine, with the exception of it running at DDR333 instead of DDR400. The memory timings changed to 2-2-2-7 too.
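
A back-of-the-envelope sketch of the theoretical peak numbers behind that trade-off, assuming a standard 64-bit (8-byte) DDR channel. Real-world throughput will be lower, and the helper name is just illustrative.

```python
def peak_bandwidth_mb_s(data_rate_mts, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in MB/s.

    data_rate_mts: transfers per second in millions (400 for DDR400).
    channels: 1 for single channel, 2 for dual channel.
    bus_bytes: width of one channel in bytes (assumed 8, i.e. 64-bit).
    """
    return data_rate_mts * bus_bytes * channels


# The three configurations that came up in this thread
configs = {
    "DDR400 single channel (slots 1 and 3)": (400, 1),
    "DDR333 dual channel (slots 3 and 4)": (333, 2),
    "DDR400 dual channel (the original setup)": (400, 2),
}
for label, (rate, channels) in configs.items():
    print(f"{label}: {peak_bandwidth_mb_s(rate, channels)} MB/s")
```

On paper, dual channel at DDR333 still comes out ahead of single channel at DDR400, so slots 3 and 4 look like the better stopgap of the two working layouts.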
 

jammydodger

Distinguished
Sep 12, 2001
2,416
0
19,780
Make sure you test each of your RAM modules separately with memtest. I had a very similar problem recently. Computer would work fine most of the time, but when I played certain memory-hungry games it would crash. I eventually came to the conclusion that the motherboard was to blame and hastily went out and bought a brand spanking new MSI neo2 platinum. Only to discover, to my horror, that the same problems were occurring. After testing all my memory modules I found that one of them was crashing in memtest in the same place every time!

I'm now running the computer fine with 2 of my four memory modules, and am gonna RMA the other 2 modules to Geil and hopefully get them replaced. The moral of this story is test everything thoroughly before you jump to conclusions.
 

mrkorb

Make sure you test each of your RAM modules separately with memtest

I did that already. Ran each DIMM individually through memtest86+ for 5 hours (about 23 passes) and received 0 errors on each of them. Those results, plus playing musical chairs with the DIMMs is what leads me to believe that there's something screwy with slot 2 on my board.

You might try updating your BIOS to see if that fixes your dual channel problem. Can't hurt to try...

You're right, it can't hurt to try that, but like I said in my first post, everything was just fine for nearly 4 months before something went bad in mid-December.
 

blue68f100

Distinguished
Dec 25, 2005
1,803
0
19,780
I haven't found memtest86 good for testing anything. Use Prime95 instead. It will stress things better.

Use CPU-Z to make sure the BIOS is reading the memory timings and voltages correctly.
 

mrkorb

Ran Prime95 for 14 hours with the RAM in slots 3 and 4 where things seemed to work fine. No errors.

I put the RAM back into slots 1 and 2 (where the crashes seemed to be coming from), ran Prime95 for 14 hours there. No errors. Ran Google Earth and it crashed.
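
To lay out the "musical chairs" logic in one place, here is a small sketch that takes the two-DIMM results reported in this thread and looks for slots that only ever appear in a failing layout. The data structure and function name are made up for illustration, and this only narrows down suspects, it doesn't prove anything.

```python
# (slots populated, did Google Earth run without crashing?)
# Taken from the posts above: 1+3 worked, 3+4 worked, 1+2 crashed.
results = [
    ({1, 3}, True),
    ({3, 4}, True),
    ({1, 2}, False),
]


def suspect_slots(results):
    """Return slots seen in a failing layout but never in a passing one."""
    passing = set().union(*(slots for slots, ok in results if ok))
    failing = set().union(*(slots for slots, ok in results if not ok))
    return failing - passing


print(suspect_slots(results))
```

With these three results the only slot left standing is slot 2. It can't rule out that the problem is specific to slots 1 and 2 working together as a dual channel pair, but either way it points at the board rather than the DIMMs.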
 

mrkorb

I hope you all will forgive this little bit of thread necromancy, but I just wanted to wrap this up.

I RMA'd the motherboard with Abit and mailed it to them on Feb 7th. About 2 weeks later, they e-mailed me offering a KN8 SLI board instead of a replacement KN8 Ultra. I wasn't sure if it would have caused any problems with the already installed OS and software or if it would be just fine, but I refused the SLI board in favor of getting a KN8 Ultra. Personally, I didn't see much use for an SLI board at an insurance agency!

A few days ago I got a surprise in the mail. It was the replacement board. I can now happily say that everything works just as it should. The RAM is running smoothly at DDR400 with no crashes from either Google Earth or Remote Desktop Connection, which were always crashing with the previous MB. The only thing that really gives me pause at the moment is wondering if I applied the Arctic Silver compound to the processor properly when I popped it in the new board, but I'd say the system temp sitting at 70F while idle is a good sign that I did it right.

Much thanks to all that chimed in on this and gave suggestions. After almost 2 months of being on a slower, crappier backup server at work, this appears to be finally over.