Folks, I need help. I have been having a recurring problem (I believe linked to the video card). Before I do something drastic I would appreciate some outside input.
My computer is:
3.15 GHz Intel Core2 Duo Processor
ASUSTeK P5N-D Motherboard
6 GB RAM (2x2GB + 2x1GB)
OS is Windows 7
My monitor resolution is 1900 x 1200
ATI Radeon HD 4870 video card
PSU is a 650W ANTEC Earthwatts
This problem seems to occur exclusively while playing games (such as EVE, APB: Reloaded, Portal 2, Bioshock 2, Counter Strike Beta, SecondLife, AION, etc.). My computer does not crash when surfing the internet, working on documents, editing photos, listening to music or watching movies. In fact, I can comfortably leave my computer on all night doing things like defragmentation, deep virus scans, or file sharing with no problems. Games will run fine at times, but will frequently trigger a crash, which can happen anywhere from right after the game boots up to an hour into it.
When the machine crashes, the screen will either go black or freeze. A moment later the fans will rev up and spin at a high RPM. If the screen has frozen it will soon go black and the light on the monitor turns amber (meaning it is getting no signal). When the crash starts, whatever sound was playing at the time will play for a few seconds and then begin looping. The looping sound will become shorter and shorter until it is a constant buzz. Sometimes the computer reboots entirely, but mostly I have to manually turn it off (both the power button and the reboot button consistently work).
I first noticed that EVE would crash my machine at more or less predictable points (when using scanner probes, or when I warped). SecondLife wrecked out whenever I ran it fullscreen. APB will tank if I am very close to an explosion or if I maneuver the camera into a cloud of spraypaint. Portal 2 seems to have trouble with the gels, especially when I get it on me and the screen blurs from the splatter. Bioshock2 crashes at random times.
This is not the first forum I have sought help on, nor is it the first time I have gone seeking an answer. I have tried the following:
* checked inside the case for dirt and dust, used a couple of cans of compressed air cleaning off what mild dust there was on fans and heatsink gills.
* checked, flashed and cleanly installed the latest BIOS for my motherboard (1401)
* installed the most current chipset drivers for my motherboard.
* updated the video card drivers through AMD's latest Catalyst Control Center
* completely uninstalled and then reinstalled the video card drivers manually
* ran with and without Catalyst installed.
* unseated, inspected and then re-seated the video card.
* removed the video card and replaced it with a different, but identical, video card
* created a boot CD for Memtest 86 for Windows 7 64 bit and let it run for 10 hours - no errors found
* slightly underclocked the video card
* uninstalled and re-installed the offending programs, also checked for latest patches.
* run EVE, SL, AION and APB with their lowest possible video settings.
* run the programs in windowed mode.
* downloaded and installed GPU-Z, CPU-Z and FurMark
* crashed the computer while running FurMark, logging the crash with GPU-Z (that information to follow shortly).
I am putting below three different log entries from GPU-Z from three different crash events (separated by a comma). These crashes were caused while using different applications.
first thing is the video card. you did everything but remove the shroud and clean the fins and you didn't say you removed the heat-sink and cleaned and reapplied thermal compound.
you should also manually set the fan to run at least 37% from the start instead of waiting for the fan to spin up automatically. the 4870's were always hot and this was one way to help keep it "reasonably" cool.
second thing, the processor......... way too hot. remove the heat-sink and fan, clean off old and reapply new thermal paste. make sure the heat-sink goes back on squarely.
Quas, since we've talked I've pulled and re-set my RAM, inspecting each for damage and dust (cleaning where needed) and then re-set them on the mobo. I have not run RAMtest, but I will do this next.
I ordered a 4g tube of Arctic MX-4 thermal compound. Today I pulled the CPU heat sink, cleaned off the old thermal paste (which wasn't very gooey, but was not flakey) using 70% isopropyl alcohol. I cleaned off the fins and the fan blades, and got the sink looking nice and clean before applying the fresh compound and re-setting everything.
I also pulled out the video card, removed the plastic shroud and sink, cleaning everything before removing the thermal paste on the video card GPU. The paste was gummy, and took a bit of time to clean out carefully, but I got it all off (again using the isopropyl alcohol). I re-applied thermal paste and re-set the sink and shroud.
I tried one other fix. Furmark lists a failing PSU as a possible source of video card problems. As I noted above, I have been running a 650W ANTEC Earthwatts PSU. I replaced this PSU with an older 450W Kingwin ABT-450mm PSU. This older PSU was fully functional when I replaced it with the 650W some years ago. My reasoning was that it would be unlikely to have 2 failing PSUs, and that if the 450W didn't have the same issue, it would be strong evidence that the 650W was the problem.
The results were identical to the issues that I had with the 650W. Upon booting up, I turned on Furmark and tried the burn in test. It almost instantly repeated the aforementioned problem (blank screen, loss of signal, etc). This happened after the thermal paste was applied.
I believe that this proves that the PSU is not the issue, although it is possible that both could be bad. Since the problem seems to be the same with either PSU, I put the newer 650W PSU back in.
Running Furmark with the new thermal paste and with the 650W PSU installed produces the familiar problem.
Swiftly_morgan, I see in the Catalyst Control Center where there is an option to manually control fan speed, but it is greyed out to me. I will try to find out how to get access to this to set it to your recommendations.
try the second pci-e slot. you may or may not need to make that your first slot in the BIOS. ( some of the old boards were like that ).
uninstall the amd drivers. 1. find an old set to try or 2. do a fresh custom or advanced install and only install the graphics driver and CCC. nothing else.
remove the NB heat sink. clean and reapply thermal paste ( these get very, very hot )...... I used to install 40mm fans on them. something you should think about, or at least get air to blow directly on it in some way.
One last thing that I tried (at the suggestion of a friend) was to set up windows to create a small memory dump file. His thinking was that there might be an error code in there that could be helpful in diagnosing what was going wrong.
I went to Start -> Control Panel -> System and Security -> System -> Advanced System Settings -> Advanced -> Settings. I made sure that the boxes on system failure were checked to write an event log and to automatically restart. At the suggestion of my friend I selected Small Memory dump (256 KB) instead of Kernel Memory Dump, but when I did this I got an error that said that my Virtual Memory paging file was set to 0 MB, and to correct this for the small memory dump to work right.
Again at the recommendation of my friend, I set the Virtual Memory to a minimum of 9213 MB and a maximum of 12384 MB.
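For anyone who wants to confirm these settings without clicking through the dialogs, here is a minimal Python sketch. It is not something that was run as part of the troubleshooting above. It assumes the crash-dump options are stored under the CrashControl registry key as the values CrashDumpEnabled and AutoReboot; the function names are made up for illustration, and off Windows the lookup simply returns None.

```python
import sys

# Assumed meaning of CrashDumpEnabled under
# HKLM\SYSTEM\CurrentControlSet\Control\CrashControl on Windows 7.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_crash_control():
    """Return (dump type, auto-restart flag), or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        dump_value, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        reboot_value, _ = winreg.QueryValueEx(key, "AutoReboot")
    return describe_dump_setting(dump_value), bool(reboot_value)


if __name__ == "__main__":
    print(read_crash_control())
```

If the first item comes back as anything other than "small memory dump", the change made in the dialog probably did not stick.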
Since I did this, I have played all of the games that I usually have problems with, and the computer has not crashed a single time. Furmark did not trigger the issue. I don't want to be overly optimistic; but this is good news.
Could a 0 MB Virtual Memory setting have caused all of my problems? Does that even make sense? I have 8 GB of RAM, and 512 MB of VRAM. Is this all from my machine being starved of RAM?
I will test this for a bit, but I am optimistic at the stability that I am feeling right now. More info will follow, but thank you to everyone who gave me input up till now.
It's been a couple of days since I beefed up the virtual memory, and I get the impression that things are different, but not resolved. I have been playing a lot of Deus Ex: Human Revolution with very little problem. However, I have experienced a couple of glitches.
So, after a few crashes, I'm back to give you some more info. The .dmp files from the previous post seem to reference different offending drivers (serial.sys, dxgkrnl.sys and blbdrive.sys) and different addresses. What sticks out to me is that all of the .dmp files have the same bug check code (0x00000116), which tracks back to a VIDEO_TDR_ERROR. According to Microsoft, "The VIDEO_TDR_ERROR bug check has a value of 0x00000116. This indicates that an attempt to reset the display driver and recover from a timeout failed."
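To make that pattern easier to see as more dumps come in, here is a rough Python sketch that groups crashes by bug check code and lists the drivers blamed under each. The three records are just the drivers named above, assuming one dump per driver (the real count may differ); the function names are hypothetical, and the values would be typed in by hand from whatever tool is used to read the .dmp files.

```python
from collections import defaultdict

# (blamed driver, bug check code), one entry per dump.
# Assumption: one dump per driver named in this post.
crashes = [
    ("serial.sys", 0x116),
    ("dxgkrnl.sys", 0x116),
    ("blbdrive.sys", 0x116),
]


def group_by_bugcheck(records):
    """Map each bug check code to the list of drivers blamed for it."""
    groups = defaultdict(list)
    for driver, code in records:
        groups[code].append(driver)
    return dict(groups)


def summarize(records):
    """Return one summary line per bug check code, sorted by code."""
    lines = []
    for code, drivers in sorted(group_by_bugcheck(records).items()):
        names = ", ".join(sorted(set(drivers)))
        lines.append(f"0x{code:08X}: {len(drivers)} crash(es), drivers: {names}")
    return lines


if __name__ == "__main__":
    for line in summarize(crashes):
        print(line)
```

If a single code keeps collecting unrelated drivers, as it does here, that suggests the named drivers are bystanders and the common cause sits in the display path.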
It's still pretty early on since I've changed the Virtual memory, and there is an improvement in the performance. I'm going to continue as is for a bit longer. No doubt I will get more .dmp files and maybe a pattern will be more visible.