Yet another BSOD

linkyone

Distinguished
Apr 9, 2010
6
0
18,510
I have had this PC (specs below) for 3 years. It was stable with 2 GB of RAM and XP x86 right up until now. I recently installed Windows 7 and swapped the 2 GB of RAM for 4 GB of the exact same brand. In fact, the new RAM is on the QVL for the motherboard while the old was not. I have contacted Corsair and tweaked the RAM like crazy. I am still getting BSODs. I have run Memtest for hours and over 8 passes - no errors. I have upgraded the PSU - no change. I have reformatted and reinstalled everything - no change. I have tried all kinds of drivers and it is still not stable. I can render video for 8 hours straight and the system doesn't hiccup, but when I open IE8 and open a new tab, BAM... BSOD.

I have disabled both onboard NICs and installed a Realtek gigabit PCI NIC. Still have issues. The system is on an APC active battery backup. The voltages on the PSU are all OK (+12 V sits at around 11.7 V).
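For reference, the ATX spec allows ±5% on the +12 V rail, so 11.7 V is inside the 11.4 to 12.6 V window. A quick sketch of that check follows; the function and table names are made up for illustration, and software-reported sensor readings can be off, so a multimeter reading is more trustworthy.

```python
# Nominal ATX positive rail voltages and their allowed tolerance
# (fraction of nominal). Names here are illustrative only.
ATX_TOLERANCE = {12.0: 0.05, 5.0: 0.05, 3.3: 0.05}


def rail_in_spec(nominal, measured):
    """Return True if a measured rail voltage is within ATX tolerance."""
    allowed = nominal * ATX_TOLERANCE[nominal]
    # Small epsilon so readings exactly on the limit are not rejected
    # because of floating-point rounding.
    return abs(measured - nominal) <= allowed + 1e-9


print(rail_in_spec(12.0, 11.7))  # the +12 V reading quoted above
```

A rail that is in spec at idle can still sag under load, so this only rules out a grossly bad supply.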

Just today, I left the computer on for over 12 hours and it was fine until I opened a folder of pictures. I have had BAD_POOL_HEADER, IRQL_NOT_LESS_OR_EQUAL, MEMORY_MANAGEMENT and SYSTEM_SERVICE_EXCEPTION errors, plus the occasional BSOD with no description. I have searched the computer for memory dumps and I cannot find them. I double-checked that the full dump option is on, and it is. Where are they hiding?

What am I forgetting? Why is this so random?


Specs:
Asus P5N32-E SLI Plus
Intel Core 2 Quad Q6600 @ 2.4 GHz
4G Corsair DDR2 6400C4DHZ 4-4-4-12-2T (2.1v)
Dual nVidia 8600GTS (non-sli)
Quad LCDs: 2x Sony SDM-S204, Samsung SyncMaster 204B, HP w19
4x 500GB Samsung in Raid 0+1
Audigy 2 ZS Platinum
Corsair 650 PSU 80+
Coolermaster HAF 932
Win 7 Ult x64
Logitech diNovo Bluetooth set
 
Minidumps will be located under C:\Windows\Minidump. With the full dump option selected, the dump is written to C:\Windows\MEMORY.DMP by default instead.

Based on your description above, I'm going to hazard a guess and say perhaps you have a dying hard drive, since you've tested just about everything else in the system.

Also check your BIOS to make sure your RAM timings and voltages are set correctly. My server system has RAM that requires 2.0V, but my BIOS defaults to 1.8V. When the server was my main system, I regularly made that change, but since it became my server, I had completely forgotten about it.
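To check the default dump locations in one go, here is a small sketch. It assumes the default dump settings have not been changed, and the function name is hypothetical. Note that Windows stages the dump through the page file, so a missing or undersized page file on the system drive can also leave you with no dump at all.

```python
import os
from pathlib import Path


def find_dumps(windows_dir):
    """List crash dump files in the default locations under a Windows directory."""
    root = Path(windows_dir)
    found = []

    # Kernel/complete dumps go to a single file by default.
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        found.append(full_dump)

    # Small memory dumps (minidumps) are written one file per crash.
    mini_dir = root / "Minidump"
    if mini_dir.is_dir():
        found.extend(sorted(mini_dir.glob("*.dmp")))

    return found


if __name__ == "__main__":
    # SystemRoot is normally set on Windows; fall back to the usual path.
    for path in find_dumps(os.environ.get("SystemRoot", r"C:\Windows")):
        print(path)
```

If this prints nothing after a crash, the dump was never written, which points at the dump settings, the page file or the storage path rather than at a hidden folder.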
 

linkyone

Yeah, I looked there already. I don't have that directory, nor do I see an event log entry. I just get an "unexpected shutdown at ..."

I manually created the directory so we will see if it now creates a dump file.

How do you suggest I go about checking the HDDs? They are in a RAID 1+0. I do not have any SMART errors. I have rebuilt the RAID, formatted and reinstalled a few times already. The controller is onboard and only provides re-syncing options. I manually ran a SMART test in the nVidia storage manager and it found nothing. I have had drives fail in another PC (same RAID config), and the drive would throw up an error in the event log or just disconnect altogether. This event log is squeaky clean!

My RAM is sitting at 2.1V, NB @ 1.45, SB @ 1.55. It doesn't matter if I set the timings at 4-4-4-12-2T or 5-5-5-15-2T, I get the same result.

Edit: Drives are Samsung, not Seagate. I am downloading Samsung's diag tool. I'll see what info that produces.
 

linkyone

I tried OCCT earlier - nothing. I also tried the Prime95 (I think that's the name) number stress tester - nothing. The HDDs are 2 years old and the PC is on only maybe once a week, if that. I have the same drives in my servers, which run 24/7, and I have not had a problem with them at all. I ran the HDD diag tool from Samsung and it did not find any issues, but it did degrade my RAID. I let it rebuild, then tested the next drive, and so on...

I pulled 1 stick of RAM to see if it will run stable. I don't think that will help, because I can run Memtest for 10 passes without any issues, but I am running out of ideas. Will pulling 1 stick "disable" DDR? Does this mean it will run at 400MHz? I know what DDR is but I don't know how the speed part works. If the RAM is DDR2-800, is that the speed with a paired stick or by itself? Do I reach speeds of 1600 with 2x DDR2-800 modules (not overclocked)?
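On the DDR question, for anyone reading later: "double data rate" is a property of each module, not of pairing. DDR2-800 has a 400MHz I/O clock with two transfers per clock, so it does 800 MT/s whether one or two sticks are installed. Two matched sticks enable dual-channel, which doubles peak bandwidth rather than the rated speed; pulling one stick just drops the board to single-channel. A rough sketch of the arithmetic (theoretical peak figures only, function name made up for illustration):

```python
def ddr_bandwidth_mb_s(transfers_mt_s, channels=1, bus_bytes=8):
    """Peak theoretical bandwidth in MB/s for 64-bit (8-byte) memory channels."""
    return transfers_mt_s * bus_bytes * channels


# DDR transfers data on both clock edges, so the I/O clock is half the rating.
io_clock_mhz = 800 // 2

print(io_clock_mhz)                          # 400
print(ddr_bandwidth_mb_s(800))               # 6400  (hence "PC2-6400")
print(ddr_bandwidth_mb_s(800, channels=2))   # 12800 (dual-channel)
```

So two DDR2-800 sticks never become "1600"; each still runs at 800 MT/s, and the pair simply moves twice as much data per second at best.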
 

beamj

Distinguished
Apr 7, 2010
36
0
18,540
I was having a similar issue in WinXP, and it turned out my memory (Patriot DDR2-800) required 2.1V to run. Even at that setting the machine was not stable for more than a week at a time. I swapped it for some Mushkin memory that runs the same latencies at a lower voltage (1.8V) and have not had any kind of crash or stall since. It could be that your motherboard is not FULLY supplying the voltage required.
 

linkyone

Sleep and hibernate are turned off.

I added a fan to the northbridge heatsink. The system ran for about 6 hours (watching an HD movie in VLC, running Adobe Bridge, Photoshop, the Everest system monitor and 2 folders of pictures), then BSOD. I even (don't tell Corsair) bumped my RAM up to 2.2V to see if maybe my motherboard wasn't supplying enough voltage, like beamj said.

Then I pulled one video card and swapped the RAM for some generic stuff that only runs on 1.8V, and let the RAM's SPD profile set everything else. It ran for about 5 hours (listening to audio in VLC, 2 folders of pictures, IE8) and crashed when I opened a picture with Windows preview. Tonight I will swap in the second video card and move it to PCIe slot #2. Keep your fingers crossed!

All my temps are low: CPU = 19C, MB = 30C, MCP = 53C, SPP = 52C. I don't think it's heat. I have 3 240mm case fans on my Coolermaster HAF 932 case and they are all at 100%.

I am waiting on some HDDs from Tiger. When those show up, I'll test the system on a new drive and let you all know what happens.

Thank you all again for the suggestions! Keep 'em coming!

If I had a bad CPU, would it cause these random errors? Besides running a stress test (Everest, OCCT or Prime95) on the CPU, what could I do to test that?
 

linkyone

The new HDDs arrived. I unhooked all the old ones, attached one new 1.5TB Seagate drive and installed Win 7 x64. Everything looked good. I let it run idle for about 12 hours and all was OK. Once I started using the PC, all the problems were still there. I called Asus just to see what they would say. They needed my serial number and I could not see it. I removed the board from the case and discovered that the board was warped around the CPU. It looks like the heatsink was pulling too hard, I guess. The MOSFETs at the top of the board were not even touching the small heatsink above the CPU. I called them back and they issued me an RMA. Once I get the new board I'll give you all an update. I also have access to another quad core (it's a Q6700), so if the board swap doesn't fix it, I'll try that next.

Oh, BTW, don't even remotely mention that you are using an unsupported OS on one of Asus' boards. The guy on the phone was extremely helpful until he found out that I was using Win 7. After that, his answer to every question I had was, "Install a supported OS and give us a call back and we will go from there." He didn't even care that nVidia had drivers for the 650i chipset.
 

linkyone

Well, after months of phone calls and rejected RMAs from Asus, I finally got a working board. So I can assume the board was the issue. The first board they sent me was also warped and had heatsink compound all over it. It was also extremely dusty! I rejected that one. A few weeks later I received the second board... it was the SAME BOARD! I yelled and screamed, and the RMA was immediately escalated to level 3 tech support. I got to talk to one of the techs and he said he didn't have any more boards to send me. He talked to his boss and ended up sending me a Striker board. It has an identical layout, just an upgraded chipset. I have had it up and running for a few days now and all seems OK.

Thanks for everyone's help. This was a nightmare! I am glad it is over. :bounce:

Oh, BTW, this board is slightly warped too. Is it common for a board to be warped around the CPU socket?