CPU: Core 2 Duo E6850
RAM: Corsair Dominator 2 GB kit (2x 1GB sticks)
Video: 2x BFG Nvidia 8800 GTX
MB: Asus Striker Extreme
HDs: 4x Seagate Barracuda 250 GB in RAID 0+1
PS: Real Power Pro 1000 W
Case: Coolermaster Stacker evolution 830
I've already had to RMA this MB once due to issues with the HD controller (my RAID array reported healthy, but kept getting file system problems... ran chkdsk /r multiple times in a row, and errors were found and corrected each time... it froze on the 3rd run). I also simultaneously started having problems with my USB ports (USB mouse and keyboard would start acting like someone was banging keys). They repaired it instead of sending me a new one...
Anyway, I got it back about 2 weeks ago, and everything was fine for about a week and a half. Now I am having problems with my graphics. I am monitoring temp levels, and they weren't going over 73C or so under full load. I am getting either "There is a problem with your display driver, please save your work and restart your computer" (and the display changes to 800x600 and has many artifacts), or a BSOD with nv4_whatever.dll as the culprit. I took out one of my cards and ran in single-card mode, and everything was fine for a day or so. I left my machine on overnight, and when I got home from work the room it is in was about 80 degrees. The card reported its temp at 53, and my CPU & MB temps were about 34 each (Celsius). Soon after I fire up a game (I tested with a variety of games, and the problem occurs across them), things start stuttering and the screen flashes black and then back to the display, though no artifacts occur. After a few minutes of this, if I don't reboot, I get either a BSOD or the error message I wrote above.
I tried each card in the first PCIE slot, and after a swap it would work for about a day (even with 6-7 hours of heavy gaming) without a problem. But after a day or so each card experienced the same issue. I now have one of the cards in the 3rd PCIE slot, and so far so good, but I just did this; I will leave it running overnight and throughout the day tomorrow and see what happens when I get home from work.
I am thinking there is a problem on the motherboard, whether it is with overheating (or regular heating) bringing out problems with a faulty slot, or whether it is somewhere else (which I guess I'll find out if the problems continue in this other PCIE slot).
I just wanted to post to see if anyone had any other ideas. I don't currently have another system working that I can test these on, but I may be able to sometime soon if necessary.
That's exactly why I went with Gigabyte on the last few computers I have built. Their boards have been great lately, and Asus couldn't care less about helping. I have no solution to suggest other than getting a new board from them.
If they won't give me a new board, I will go buy one from another manufacturer... I've been building with mostly Asus boards for 10 years now. I can understand not sending a new board for the first problem (I don't like it, but I can understand just wanting to repair or send a refurb), but when I start having problems with the refurb a couple of weeks after I get it... they should send me a new board, in my opinion.
@OP: Here are some things you could try if you haven't already tried them:
1. Increase PCIe voltage by +0.1 to +0.2 V (if you have that setting)
2. Update to the latest GPU drivers, or if you already have the latest, try the beta drivers.
3. Open up the side panel and try running it (to keep temps as low as possible)
4. Run Memtest86+ and other tests (just to be sure everything else is working)
5. Reinstall Windows, if possible, and see whether the problem persists.
6. Try a different, GOOD quality PSU; the CoolerMaster is not the best in quality (I would say it's in about Tier 3)
My CPU was at around 40 C and the MoBo around 40 C. I don't have a case temp probe right now, and I'm not sure about the HD temps. I am unable to get SMART info because I'm using NVRAID, so I just get "Couldn't read HD information" or something like that when I try. Is there an HDD temp utility out there that doesn't use SMART?
I'm pretty damn sure it's something to do with the PCIe bus or something around there... when I swapped the card out for my other 8800GTX, the issue stopped until about 20 hours later. I tried this with both cards... it's a little weird.
i.e.: Leave it on for a while, and it starts crashing. Swap the installed card for the one not currently installed. No more crashes for 20 hours or so. Repeat, no more crashes. The cards are NOT overheating according to temp readings (they didn't go over 70)
As for bumping the PCIe voltage, I don't see how that would help, since I don't have any problems until it's been running for a long while... if it was underpowered I'd expect problems to occur at more spread-out times. I may try that in a bit if Asus won't advance-replace this thing (I tried, and they said I'd have to speak to a supervisor)