I am having random freezes which seem to occur primarily when the GPU is under heavy load. The freezes are hard and take down the entire system: mouse, keyboard, and all. The only solution is a reboot. I usually cannot go more than 24 hours without a lockup while under 100% GPU load. Despite my suspicion that the GPU is involved, the freezes also occasionally occur when simply surfing the web or doing other tasks that are not GPU-intensive. It even froze once while in the BIOS.
I have made absolutely sure that my BIOS settings reflect the ratings for my RAM, and I was able to run memtest86 for 24 hours without errors.
I have updated my BIOS, SSD firmware, and NVIDIA drivers all to the latest versions (2.10, 000F, and 304.37 respectively), which at first seemed to decrease the frequency of the freezes.
I have monitored my voltages, and the 12V rail drops from 12.35 V at idle to 12.24 V under heavy GPU load, which seems fine.
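To put numbers on "seems fine", here is a rough sketch that compares the two readings against the ±5% tolerance the ATX spec allows on the +12 V rail. The function name `rail_report` is just something I made up for illustration, and the readings are software-reported values, so short transients would not show up here.

```python
# Rough check of the 12V readings against the ATX +/-5% tolerance window.
# rail_report is a hypothetical helper name; the inputs are sensor readings in volts.
NOMINAL_V = 12.0
TOLERANCE = 0.05  # +/-5% allowed on the +12 V rail

def rail_report(idle_v, load_v, nominal=NOMINAL_V, tol=TOLERANCE):
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    droop = idle_v - load_v  # how far the rail sags under load
    return {
        "droop_v": round(droop, 3),
        "droop_pct": round(100 * droop / idle_v, 2),
        "in_spec": low <= load_v <= high and low <= idle_v <= high,
        "window": (round(low, 2), round(high, 2)),
    }
```

With my readings, `rail_report(12.35, 12.24)` gives a droop of 0.11 V (about 0.89%), and both values sit inside the 11.4 V to 12.6 V window.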
I can run Prime95 on all 6 cores for at least 24 hours without any errors, CPU temp staying below 64C.
It is mainly when I use something like cuda_memtest, or perform lengthy CUDA computations that directly put the GPU under heavy load that I will get consistent freezes within 12-24 hours (often much sooner).
I have monitored the GPU temps and they never go above 72C, which should be reasonable, yes?
I should also add that there is no helpful information pertaining to any errors at the time of freeze in any system or kernel logs.
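Since nothing useful survives in the logs, one thing I could try is a small logger that forces every GPU temperature sample to disk, so the last reading before a freeze is not lost in a write cache. This is only a sketch: it assumes the installed `nvidia-smi` supports the `--query-gpu` option with the `timestamp` and `temperature.gpu` fields (I have not confirmed that for every driver version), and `gpu_log.csv` is just a placeholder file name.

```python
import os
import subprocess
import time

# Assumption: this nvidia-smi build supports --query-gpu with these field names.
QUERY = ["nvidia-smi",
         "--query-gpu=timestamp,temperature.gpu",
         "--format=csv,noheader,nounits"]

def parse_sample(line):
    """Split one CSV line from nvidia-smi into (timestamp, temperature in C)."""
    stamp, temp = [field.strip() for field in line.split(",")]
    return stamp, int(temp)

def log_forever(path="gpu_log.csv", interval_s=5):  # placeholder file name
    with open(path, "a") as log:
        while True:
            out = subprocess.check_output(QUERY, universal_newlines=True)
            for line in out.strip().splitlines():  # one line per GPU
                stamp, temp = parse_sample(line)
                log.write("%s,%d\n" % (stamp, temp))
            log.flush()
            os.fsync(log.fileno())  # push the sample to disk before a possible freeze
            time.sleep(interval_s)
```

Running `log_forever()` in a terminal during a cuda_memtest run and reading the tail of the file after the reboot should show the last temperature recorded before the lockup.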
One thing that might be important: I have a quite compact mid-tower case, and the cabling inside is messy, with the 5V power cable hanging right in front of the GPU fans and fan cables dangling everywhere.
Could this be a power or heat issue? If so, would a water block for the GPU be in order? I would like to know if there is something else to try before spending $$.