Stability issue with SLI rig – probably power supply.

eightdrunkengods

Distinguished
Apr 28, 2011
424
0
18,860
Hello, enthusiasts,

tl;dr I would like a PSU recommendation to “future proof” my system. I would like something that can comfortably handle an SLI system with two very high-end GPUs (560 Ti or similar), three hard drives, two optical drives, a Phenom X4 processor (with overclocking headroom), 4x 2GB DDR3, and a few USB devices. I would really appreciate recommendations from those with similar systems.

Current System:

Phenom II 555BE @3.2 GHz (unlocked)
ASUS M4N98TD EVO AM3
OCZ ModXStream Pro 600 Watt
2x 320 GB HD (RAID 0)
Windows 7 64-bit
8GB DDR3 @ 1600
2x GeForce 9800 GT 512 MB (SLI)
“Monitor” is a Sony Bravia, 1080p, 60 Hz

Long version:

The issue I have is that during games that hit my entire system really hard (currently: Crysis and Hydrophobia) I get bluescreens after about 20 minutes. The cause has been somewhat difficult to track down because both games are notoriously buggy and will very occasionally BSOD a stable system. After a few weeks of testing I figured out that if I take either GPU out, the system is very stable, and as soon as I put both of them in, I get the BSODs after a few minutes of gaming. I think my current 600 W PSU is *almost* sufficient for the current system, and only when the entire system is stressed does it become unstable. Portal 2 and Half-Life 2 play flawlessly (max settings, AA, 1080p, Vsync), which, I think, is because they do not actually stress my system very much. I didn’t have this instability at all until I installed two more sticks of RAM and started playing Crysis and Hydrophobia.

Tests I have done to confirm that the PSU is the culprit:

-Gamed for some time with each video card in (but not both). I tested both PCIe slots this way. The cards and PCIe slots are fine.
-Prime 95 run for 8-10 hours at a time with and without the CPU cores unlocked.
-Memtest86+ for 27 hours <--- HEY! This only means the DIMMs are fine. RAM configuration could still be unstable!
-CheckDisk to make sure the HDDs are ok.
-Temps monitored. My CPU runs quite cool. The GPUs can get up around 75 C.
-Reinstalled Windows and removed a lot of non-essential programs (rivatuner, punkbuster, GPU OC software)
-Some testing with OCCT (Linpack and GPU tests), but I found that program to be buggy and decided that actually playing Crysis was a better indication of stability
-Tried several different driver versions, using Driver Sweeper to make sure driver installs are clean.
-Tried running with only 2 (of 4) fans running and without the optical drive hooked up.
-I suppose the problem could be the motherboard. I’m not sure how to figure that out other than to replace the PSU and see if the problem persists. Any suggestions for more tests I could run would be welcome. (A sketch of one such check - reading the recent bugcheck codes out of the Event Log - follows this list.)
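A minimal sketch of that Event Log check, assuming Windows 7 with wevtutil available: Event ID 1001 is the "BugCheck" summary written to the System log after a crash, and the code-to-cause hints in the comments are rough rules of thumb, not a diagnosis.

```python
# Sketch: list recent BugCheck (Event ID 1001) entries from the Windows System log.
# Assumes Windows 7+ with wevtutil on the PATH; run from an elevated prompt if needed.
import re
import subprocess

def recent_bugchecks(max_events=50):
    # Query the System log for Event ID 1001 (the bugcheck summary logged after a BSOD),
    # newest first, as plain text.
    out = subprocess.run(
        ["wevtutil", "qe", "System",
         "/q:*[System[(EventID=1001)]]",
         "/f:text", "/rd:true", f"/c:{max_events}"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each entry contains a line like: "The bugcheck was: 0x0000001a (0x..., ...)"
    return re.findall(r"The bugcheck was:\s*(0x[0-9a-fA-F]+)", out)

if __name__ == "__main__":
    for code in recent_bugchecks():
        print(code)
    # Rough rule of thumb (an assumption, not a diagnosis): 0x1A/0x50 often point at
    # memory, 0x116/0x117 at the display driver, 0x124 at a hardware (WHEA) error.
```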

UPDATE: It's the RAM/motherboard combo. I had my RAM set at its recommended speeds and timings, which put it at 1600 MHz. At that speed any combination of two sticks was stable in any dual-channel configuration. Adding the other two sticks introduced instability, and raising the DRAM voltage past 1.6 V made it even less stable. Slowing the RAM down to 1333 MHz lets me run all four sticks. So there you have it. Since it was kind of stable at 1600, I guess I can play with the NB voltage and try to get the RAM back up to 1600, but right now I'm just enjoying a rock-solid system. In hindsight, I recall that AM3 boards aren't rated for 1600 and getting RAM to run that fast was just a bonus. It should have been one of the first things I checked, but I assumed my RAM was in the clear when Memtest86+ didn't error. TIL Memtest doesn't test the memory controller - just the RAM.

UPDATE #2: I can't get the RAM stable at 1600. Raising the RAM voltage AT ALL leads to instability, and the CPU/NB voltage doesn't seem to help. However, I've managed to tighten the timings a lot, which makes me happy. I'm at 7-7-7-20 with 1T. So that's cool.
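For anyone wondering what dropping from 1600 to 1333 actually costs: first-word latency is just the CAS count times the clock period, so tighter timings at the lower speed can come out ahead. A quick back-of-the-envelope sketch - the 7-7-7-20 @ 1333 numbers are from the update above, while the 9-9-9-24 profile assumed for 1600 MHz is only an illustrative guess:

```python
# Sketch: compare DDR3 CAS latency (in ns) and peak bandwidth at two speeds.
# DDR transfers twice per clock, so the I/O clock period in ns is 2000 / (MT/s).

def cas_latency_ns(transfer_rate_mts, cas_clocks):
    """CAS latency in nanoseconds for a given DDR transfer rate (MT/s) and CAS in clocks."""
    clock_period_ns = 2000.0 / transfer_rate_mts  # clock runs at half the transfer rate
    return cas_clocks * clock_period_ns

def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits=64, channels=2):
    """Theoretical peak bandwidth in GB/s for a dual-channel 64-bit bus."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) * channels / 1e9

# 7-7-7-20 @ 1333 MT/s (from the update) vs. an assumed 9-9-9-24 @ 1600 MT/s profile.
for rate, cas in [(1333, 7), (1600, 9)]:
    print(f"DDR3-{rate} CL{cas}: {cas_latency_ns(rate, cas):.2f} ns CAS, "
          f"{peak_bandwidth_gbs(rate):.1f} GB/s peak (dual channel)")
# DDR3-1333 CL7: 10.50 ns CAS, 21.3 GB/s peak (dual channel)
# DDR3-1600 CL9: 11.25 ns CAS, 25.6 GB/s peak (dual channel)
```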

Thanks for all the input.
 

eightdrunkengods

Distinguished
Apr 28, 2011
424
0
18,860


There were only one or two AMD/SLI boards when I bought this one. If it's a driver issue then it's an issue with my cards being no longer supported... which would bum me out.
 

eightdrunkengods

Distinguished
Apr 28, 2011
424
0
18,860
I'm not terribly budget-constrained, but I would rather not spend extra money on something flashy or unnecessary. I've looked at a bunch of PSUs, and I don't really think spending more than $300 is necessary for the system I want to build. Obviously, all else being equal, a lower price is preferable.
 

His motherboard uses a 980a nVidia chipset.

eight:
A pair of 9800 GTs pulls about 240 watts. A pair of 560 Tis pulls about 400 watts.

Your present PSU is adequate for the 9800 GTs. For a pair of 560 Tis, I think a good 650 W unit would be barely adequate, but I'd be happier with a 750 W PSU.
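A back-of-the-envelope tally of how those figures stack up against the PSU sizes in question - the GPU-pair number is the one quoted above, while the other per-component draws are rough assumptions, not measurements:

```python
# Sketch: rough DC power budget for the proposed 560 Ti SLI upgrade.
# The GPU-pair figure is from the post above; everything else is a rough assumption.
loads_watts = {
    "GPU pair (2x GTX 560 Ti)":  400,  # figure quoted above
    "Phenom X4, overclocked":    125,  # assumption: full load with OC headroom
    "Motherboard + 4x DDR3":      40,  # assumption
    "3 HDDs + 2 optical + fans":  40,  # assumption
    "USB devices":                10,  # assumption
}

total = sum(loads_watts.values())  # ~615 W under a worst-case combined load
for psu in (600, 650, 750):
    headroom = psu - total
    print(f"{psu} W PSU: estimated load {total} W, "
          f"headroom {headroom} W ({headroom / psu:.0%} of capacity)")
# Under these assumptions a 600 W unit is oversubscribed, 650 W is marginal,
# and 750 W leaves roughly 18% headroom.
```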

I do not think that your BSODs are being caused by your PSU. Inadequate power generally causes seemingly random reset/reboot cycles, not BSODs.

Does unlocking/locking CPU cores affect the BSOD's?

Do you have current video drivers?
 
Solution

_Pez_

Distinguished
Aug 20, 2010
415
0
18,810
Yes, you are right, I am sure that your problem resides in your PSU. Also, these PSUs would do the job without any problem and with enough headroom for future hardware and high-end GPUs (except for tri- and quad-GPU configurations):
http://www.amazon.com/HALE90-Power-Supply-Modular-HALE90-750-M/dp/B003YFIUEG/ref=sr_1_14?s=electronics&ie=UTF8&qid=1306361741&sr=1-14
http://www.amazon.com/Corsair-Professional-Performance-750-Watt-CMPSU-750AX/dp/B003PJ6QWE/ref=sr_1_17?s=electronics&ie=UTF8&qid=1306361741&sr=1-17
http://www.amazon.com/Cooler-Master-Modular-Certified-RS850-AMBAJ3-US/dp/B002RWJGC2/ref=sr_1_20?s=electronics&ie=UTF8&qid=1306361741&sr=1-20
http://www.amazon.com/Thermaltake-Toughpower-Management-FanDelayCool-TPG-750M/dp/B003WOL4VA/ref=sr_1_3?ie=UTF8&s=electronics&qid=1306362041&sr=1-3
 

eightdrunkengods

Distinguished
Apr 28, 2011
424
0
18,860


No, I still got errors even with the CPU set back to stock. I do have the current drivers. I also tried rolling the drivers back (to two different versions), doing clean installs every time. I'm pretty sure that, if it's a driver problem, it has to do with how they are written. (Not to bash Nvidia or whatever. It's just that the drivers are installed as cleanly as I think it's possible to install them.)

Yeah, I read that PSU problems typically cause straight-up shutdowns, which is why the PSU was the last thing I suspected. I did find some examples claiming that a too-weak PSU can lead to BSODs. I figure that, since you can get BSODs from having your CPU or DRAM voltage set too low, you can also get them if your PSU fails to deliver the voltage those components require. My theory is that once my PSU gets warm (about 20 minutes of work), it becomes less efficient; then the GPU, CPU, or RAM gets starved for current, errors out, and causes a BSOD.
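For a sense of scale on that theory: most of the GPU and CPU load comes off the +12 V rail, and the ATX spec only allows that rail to droop about 5% (to roughly 11.4 V) before components are out of tolerance. A rough sketch of the currents involved - the load figures fed in are the GPU-pair numbers discussed in this thread plus a guess at the CPU share, not measurements:

```python
# Sketch: current drawn from the +12 V rail and the ATX droop limit.
NOMINAL_12V = 12.0
ATX_TOLERANCE = 0.05                                  # ATX spec allows +/-5% on +12 V
MIN_ACCEPTABLE_V = NOMINAL_12V * (1 - ATX_TOLERANCE)  # ~11.4 V

def rail_current(load_watts, rail_volts=NOMINAL_12V):
    """Current in amps a given DC load pulls from the rail."""
    return load_watts / rail_volts

# GPU-pair figures from the thread, plus an assumed combined GPU+CPU load.
for load in (240, 300, 400):
    print(f"{load} W on +12 V -> {rail_current(load):.1f} A "
          f"(rail must stay above {MIN_ACCEPTABLE_V:.1f} V under this load)")
# If the rail sags below ~11.4 V once the supply is hot, voltage-starved components
# can throw errors that surface as bluescreens rather than clean shutdowns.
```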