New Computer; BSOD, Prime95 fails and various errors. (Win7, i5 3570K)

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510

Hi,

I'm having trouble with a custom build I just created. All parts are new, fresh from the box.
The setup is:

OS:
Windows 7 Home Premium 64 bit

CPU:
Intel Core i5-3570K - 4 thread / 3,4GHz (3,8GHz Turbo) / 6MB / Socket 1155 (77w)

CPU Cooler:
Cooler Master HYPER 212 EVO

MoBo:
ASUS P8Z77-V PRO - ATX / Z77

RAM:
DDR3-DIMM1600 Corsair XMS3 Vengeance DDR3 PC12800/1600MHz CL10 8GB

Graphics:
ASUS GeForce GTX 660 Ti 2GB DirectCU II (GTX660 TI-DC2-2GD5)

PSU:
Corsair PowerSupply (PSU) GS 600W Gaming Series 80 Plus

HDD:
Samsung Intern SSD 128GB 830 SATA6G Basic Kit (SSD) (MZ-7PC128B/WW)


After building the computer I installed Windows and added drivers from the CDs bundled with the MoBo and graphics card, then updated the graphics drivers to the latest version.
Then I started getting BSODs at random intervals while just using internet browsers and looking through Windows. Of the BSODs I saved, the errors were MEMORY_MANAGEMENT, KMODE_EXCEPTION_NOT_HANDLED and PAGE_FAULT_IN_NONPAGED_AREA.
I tried running Guild Wars 2 with High graphics settings, and it worked well for about 30 min, but then it started crashing frequently (5 min -> 2 min -> 30 sec) with "Memory at address xxxxxxx could not be read" errors.

I then updated my BIOS to the latest version, pulled out the RAM and put it back in the same slot (the slot the MoBo recommends), making sure it connected correctly. Then I ran Memtest86+ for 9 hours with 8 passes and no errors.

After that I tried running Guild Wars 2 again, which worked well during the 2-3 hrs I played, and the computer was stable that whole time. But the day after, the BSOD crashes appeared again.

I reinstalled Windows and installed the newest drivers for all hardware.

Then I decided to run Prime95 which stops working on the first test, on all cores, while running Blended Torture tests:
[Sep 16 12:27] Worker starting
[Sep 16 12:27] Setting affinity to run worker on logical CPU #1
[Sep 16 12:27] Beginning a continuous self-test to check your computer.
[Sep 16 12:27] Please read stress.txt. Choose Test/Stop to end this test.
[Sep 16 12:27] Test 1, 6500 Lucas-Lehmer iterations of M12451841 using Pentium4 type-2 FFT length 640K, Pass1=640, Pass2=1K.
[Sep 16 12:27] FATAL ERROR: Rounding was 0.4999993295, expected less than 0.4
[Sep 16 12:27] Hardware failure detected, consult stress.txt file.
[Sep 16 12:27] Torture Test completed 0 tests in 0 minutes - 1 errors, 0 warnings.
[Sep 16 12:27] Worker stopped.

The same occurs for In-place large FFTs, but for FFTs it sometimes starts randomly (haven't found a connection to when it works or not).

I also tried running 3dMark but it doesn't start, or crashes before any visuals start (didn't save error message if there were any).
Tried running Furmark_1.10.2, and got an error message, but now that I wanted to paste it here it starts just fine (will start running it as soon as I post this (if it will start again :p)).
I tried running MSI Kombustor, but get error "Could not start the program because D3DCompiler_42.dll is missing on the computer" at start up.

As a final act I went into the BIOS to see if I could correct the timing and frequency of the RAM, but the timing was correct at 10-10-10-27 and the frequency seemed correct as well. I tried setting it to auto to see if there were any changes, but everything from Prime95 and down still fails. No BSOD for 2 hrs at least, though...


To me this all seems very random, and I would really appreciate it if anyone could help me find what is causing all this.

Thanks
/Nik
 

phyco126

Distinguished
Nov 6, 2011
1,014
0
19,460
Download and run Memtest86+ on each stick of RAM for several hours. Any errors, you have bad memory.

Check to see if Samsung has a diagnostic test for the SSD. Perhaps there is an issue that is being caused with that (if you have a spare hard drive, you can simply install windows on that and give it a shot).
 


Seems like a CPU issue. Have you checked the temps using Real Temp or HWMonitor?

Might be an SSD issue, but then you successfully installed and ran games from it, and I would not expect that.

However, when you FULLY stress your CPU it seems to crash.
 

aicom

Honorable
Mar 29, 2012
923
1
11,160
This does sound like a CPU issue if your RAM is fine. Try running Prime95 with small FFTs which should focus mostly on the CPU. If you get an error during that test, it's likely that the CPU is faulty.
 
Make sure that in the ASUS EFI BIOS you're running in stock mode, not overclock mode. In overclock mode the ASUS BIOS will change your RAM and CPU bus from a 1 to 1 setting to a 1 to 5 setting. Also, in the AI settings make sure the DRAM speed is set to XMP, and make sure the 24-pin and 4/8-pin ATX power plugs are seated tightly. Another thing to check is how tight the CPU cooler is: if it's on too tight it can bend the CPU/MB, causing issues. Also check the MB guide and the RAM vendor to see whether the RAM has been tested with the Z77 chipset. If you think it's the RAM, see if someone has a spare stick. The last possibility would be a power supply with bad ripple, or one that isn't holding the correct voltage under load (ask a friend if they have a spare).
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
I'm currently at work with Memtest86+ running on my computer at home, will check status when I get home. Checked before I left and no errors after 7 passes so far.

I have been running Prime95 with small FFTs and it has failed on 1 or 2 cores quite quickly (within 15 min), but I haven't really checked what went wrong since Prime95 failed all the time during the blend test. I will run it again and jot down my results as soon as I get home.

I've been monitoring the temp and voltage to the CPU and nothing seems really odd, voltage seems fine and the temperature is around 20-30 C at normal usage and 65-70 at 100% (Prime95).

Regarding smorizio's post, I will have to check all of that when I get home. If the CPU/MoBo is bent, will it be fixed by loosening the cooler, or is it permanently damaged?
 
65-70C max core temps in an Ivy Bridge? That is not good, and not to be expected. With a 212 EVO it is VERY BAD. At stock speeds and normal room temps your cores should be at about 60C max under Prime95.
Perhaps your CPU is not throttling down, but I bet the memory controller is getting hot enough to spit out errors that look like memory errors.

That would explain why you are getting memory errors only when the CPU is being stressed.
 

phyco126

Distinguished
Nov 6, 2011
1,014
0
19,460
*sigh* I'm such an idiot - read the entire post and managed to miss where you mentioned you already ran Memtest, so forgive me for offering that suggestion. Guess that's what happens when you are falling asleep at your desk... you miss things.

At this point I would suggest either the CPU is shoddy (rare, but it happens) or it's your SSD. It might be prudent to send off the CPU for an RMA right away, then report back here with the replacement.
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
I'm not sure about the temperature. Looking through the forum, there are a lot of people who have temperatures above 60 C at full stress, and a lot of people saying that is normal for Ivy Bridge since they run hotter than Sandy Bridge. (But that doesn't mean it's right, though.)

A friend of mine suggested it could be because I'm only running a single stick of memory (8 GB), so it could be something to do with dual channel. But is that enough to produce a blue screen of death?
 
I suggest trying a BIOS update; if there is no change, RMA the CPU. There is a chance it could be the motherboard, though, so if you got them from the same place you could ask whether you can return both. If your temps were not already high I would suggest trying to tweak the CPU voltage, but it's unlikely to be too low if the CPU is running hot. Also, if you can't run Prime95 for 15 mins, the temp you have is not likely the full load temp.
 
With an after market air cooler, scroll down for stock temps
http://www.hitechlegion.com/reviews/processors/18324-intel-core-i5-3570k-ivybridge?start=22

A quick google for temps with your cooler
http://www.techpowerup.com/forums/showthread.php?t=168869
~60C at 30C room temp

However I agree that is within the variance I see.

What about Tcase? HWMonitor reports that as "package" under CPU temps, and Intel specifies the max as 67.4C
http://ark.intel.com/products/65520/Intel-Core-i5-3570K-Processor-6M-Cache-up-to-3_80-GHz
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
Memtest86+ has run for 19 hours and 16 passes, no errors...

I also ran Seagate's hard drive diagnostics tool, and all tests passed (except for the extreme one that warned it might delete content on the drive; I didn't run that).

One thing that I haven't mentioned is that one of the errors was that Windows Update didn't fully work. I started with the SP1 disc and got some installations done. Finally I got all mandatory updates installed and the Prime95 blend tests started working. I still can't install the optional Windows Update items, though.
Will run Prime95 blend overnight to see results...
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
Since the last time I wrote I have had two blue screen crashes, both with new error codes: SYSTEM_SERVICE_EXCEPTION and INTERRUPT_EXCEPTION_NOT_HANDLED.

I ran Prime95 with blend tests twice, and the first time all cores failed within 13 minutes of each other, 23-26 minutes after launch.

[Sep 18 09:35] Test 2, 5300 Lucas-Lehmer iterations of M14155775 using Core2 type-3 FFT length 720K, Pass1=320, Pass2=2304.
[Sep 18 09:36] FATAL ERROR: Final result was 00000000, expected: B4C8B09B.
[Sep 18 09:36] Hardware failure detected, consult stress.txt file.
[Sep 18 09:36] Torture Test completed 33 tests in 31 minutes - 1 errors, 0 warnings.
[Sep 18 09:36] Worker stopped.

---

[Sep 18 09:40] Test 7, 5300 Lucas-Lehmer iterations of M13069345 using Core2 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Sep 18 09:40] FATAL ERROR: Resulting sum was 2811126942775229, expected: 2769248237297130
[Sep 18 09:40] Hardware failure detected, consult stress.txt file.
[Sep 18 09:40] Torture Test completed 38 tests in 35 minutes - 1 errors, 0 warnings.

---

[Sep 18 09:27] Test 10, 800000 Lucas-Lehmer iterations of M135169 using FFT length 8K.
[Sep 18 09:27] ERROR: ILLEGAL SUMOUT
[Sep 18 09:27] Possible hardware failure, consult readme.txt file, restarting test.
[Sep 18 09:27] ERROR: ILLEGAL SUMOUT
[Sep 18 09:27] Possible hardware failure, consult readme.txt file, restarting test.

... this message repeats itself about 50 times (cut out) ...

[Sep 18 09:27] ERROR: ILLEGAL SUMOUT
[Sep 18 09:27] Possible hardware failure, consult readme.txt file, restarting test.
[Sep 18 09:27] ERROR: ILLEGAL SUMOUT
[Sep 18 09:27] Possible hardware failure, consult readme.txt file, restarting test.
[Sep 18 09:27] ERROR: ILLEGAL SUMOUT
[Sep 18 09:27] Maximum number of warnings exceeded.
[Sep 18 09:27] Torture Test completed 21 tests in 22 minutes - 0 errors, 100 warnings.
[Sep 18 09:27] Worker stopped.

---

[Sep 18 09:40] Test 7, 5300 Lucas-Lehmer iterations of M13069345 using Core2 type-2 FFT length 720K, Pass1=320, Pass2=2304.
[Sep 18 09:40] ERROR: ILLEGAL SUMOUT
[Sep 18 09:40] Possible hardware failure, consult readme.txt file, restarting test.
[Sep 18 09:40] ERROR: ILLEGAL SUMOUT
[Sep 18 09:40] Possible hardware failure, consult readme.txt file, restarting test.

... this message also repeats itself about 50 times ...

[Sep 18 09:40] ERROR: ILLEGAL SUMOUT
[Sep 18 09:40] Possible hardware failure, consult readme.txt file, restarting test.
[Sep 18 09:40] ERROR: ILLEGAL SUMOUT
[Sep 18 09:40] Maximum number of warnings exceeded.
[Sep 18 09:40] Torture Test completed 38 tests in 35 minutes - 0 errors, 100 warnings.
[Sep 18 09:40] Worker stopped.


The second time I ran Prime95, all tests on all cores failed within 24 minutes of each other, between 1 h 10 min and 1 h 34 min after launch.


[Sep 19 01:20] Test 11, 5300 Lucas-Lehmer iterations of M13069345 using Pentium4 type-2 FFT length 800K, Pass1=640, Pass2=1280.
[Sep 19 01:21] FATAL ERROR: Rounding was 0.4915771484, expected less than 0.4
[Sep 19 01:21] Hardware failure detected, consult stress.txt file.
[Sep 19 01:21] Torture Test completed 72 tests in 1 hour, 13 minutes - 1 errors, 0 warnings.
[Sep 19 01:21] Worker stopped.

---

[Sep 19 01:17] Test 8, 5300 Lucas-Lehmer iterations of M13669345 using Pentium4 type-3 FFT length 800K, Pass1=640, Pass2=1280.
[Sep 19 01:18] FATAL ERROR: Rounding was 0.4827575684, expected less than 0.4
[Sep 19 01:18] Hardware failure detected, consult stress.txt file.
[Sep 19 01:18] Torture Test completed 70 tests in 1 hour, 10 minutes - 1 errors, 0 warnings.
[Sep 19 01:18] Worker stopped.

---

[Sep 19 01:24] Test 2, 340000 Lucas-Lehmer iterations of M335393 using Core2 type-1 FFT length 16K, Pass1=64, Pass2=256.
[Sep 19 01:24] FATAL ERROR: Resulting sum was 1067310186520861, expected: 1155906287823293
[Sep 19 01:24] Hardware failure detected, consult stress.txt file.
[Sep 19 01:24] Torture Test completed 77 tests in 1 hour, 17 minutes - 1 errors, 0 warnings.
[Sep 19 01:24] Worker stopped.

---

[Sep 19 01:41] Test 3, 4000 Lucas-Lehmer iterations of M18274369 using Core2 type-2 FFT length 960K, Pass1=256, Pass2=3840.
[Sep 19 01:42] FATAL ERROR: Final result was 11E80A74, expected: B2FD8175.
[Sep 19 01:42] Hardware failure detected, consult stress.txt file.
[Sep 19 01:42] Torture Test completed 100 tests in 1 hour, 34 minutes - 1 errors, 0 warnings.
[Sep 19 01:42] Worker stopped.
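
For anyone who wants to compare runs without reading every worker window by hand, here is a rough Python sketch that pulls the "Torture Test completed" summary lines out of pasted log text and works out how long each worker lasted. The regular expression is only an assumption based on the line format shown in the logs above, and the `summarize` helper is a made-up name, not part of Prime95.

```python
import re

# Pattern assumed from the summary lines pasted in this thread, e.g.
# "Torture Test completed 72 tests in 1 hour, 13 minutes - 1 errors, 0 warnings."
SUMMARY = re.compile(
    r"Torture Test completed (?P<tests>\d+) tests in "
    r"(?:(?P<hours>\d+) hours?, )?(?P<minutes>\d+) minutes? - "
    r"(?P<errors>\d+) errors, (?P<warnings>\d+) warnings"
)


def summarize(log_text):
    """Return one dict per worker summary line found in log_text."""
    results = []
    for match in SUMMARY.finditer(log_text):
        hours = int(match.group("hours") or 0)  # hours part is optional
        results.append({
            "tests": int(match.group("tests")),
            "minutes": hours * 60 + int(match.group("minutes")),
            "errors": int(match.group("errors")),
            "warnings": int(match.group("warnings")),
        })
    return results


# The four summary lines from the second blend run above.
sample = """\
[Sep 19 01:21] Torture Test completed 72 tests in 1 hour, 13 minutes - 1 errors, 0 warnings.
[Sep 19 01:18] Torture Test completed 70 tests in 1 hour, 10 minutes - 1 errors, 0 warnings.
[Sep 19 01:24] Torture Test completed 77 tests in 1 hour, 17 minutes - 1 errors, 0 warnings.
[Sep 19 01:42] Torture Test completed 100 tests in 1 hour, 34 minutes - 1 errors, 0 warnings.
"""

runs = summarize(sample)
minutes = [run["minutes"] for run in runs]
# First failure, last failure, and the spread between them, in minutes.
print(min(minutes), max(minutes), max(minutes) - min(minutes))
```

Fed the four lines from the second run, it reports the first failure at 70 minutes, the last at 94, and a spread of 24 minutes.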


On the plus side, the temperatures no longer go over 59 C since I found out the motherboard had automatically overclocked the CPU base clock to 103 MHz, and I changed it back down to 100 MHz.
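
To put the 103 MHz vs 100 MHz difference in numbers, here is a small Python sketch of the arithmetic. The multipliers are assumptions derived from the spec listed in the first post (3.4 GHz base and 3.8 GHz turbo at a 100 MHz base clock, so 34x and 38x), and it also assumes the memory data rate scales with the base clock (DDR3-1600 taken as 16 x 100 MHz). The `effective_mhz` helper is just an illustrative name.

```python
def effective_mhz(bclk_mhz, multiplier):
    """Effective clock = base clock x multiplier."""
    return bclk_mhz * multiplier


# Assumed multipliers: 34x base and 38x turbo for the i5-3570K at 100 MHz,
# and 16x for DDR3-1600 (assumes the memory clock follows the base clock).
BASE_MULT, TURBO_MULT, DRAM_MULT = 34, 38, 16

for bclk in (100, 103):
    print(
        f"BCLK {bclk} MHz: "
        f"base {effective_mhz(bclk, BASE_MULT)} MHz, "
        f"turbo {effective_mhz(bclk, TURBO_MULT)} MHz, "
        f"DRAM {effective_mhz(bclk, DRAM_MULT)} MT/s"
    )
```

Under those assumptions, a 103 MHz base clock is a 3% overclock on the CPU and the memory at the same time, which is small but not nothing on a system that is already unstable.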


I'm thinking of handing the CPU in for testing, but I also suspect the PSU to be the culprit, so I will test how the computer will run if I detach the Graphics card, which will ease the strain.
 


That's pretty much standard on Asus boards, but it's a good call to set it to 100.

Lowering the load on the PSU might help, but then again it might hurt or do nothing, while still being a PSU problem. Say for instance your 12V rail was sitting at 13.5 volts at 7 amps, and you lowered the load to 4 amps and the voltage actually rose to 13.8V. Or perhaps you had very bad ripple at any load.
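
As a rough way to judge readings like the ones in that example, here is a Python sketch that checks a measured rail voltage against a tolerance window. It assumes the usual +/-5% ATX tolerance on the +12V, +5V and +3.3V rails. The 12.1 V reading is a made-up example, the other two are the figures from the paragraph above, and `rail_limits` and `in_spec` are hypothetical helper names.

```python
# Assumption: +/-5% tolerance on the positive ATX rails.
TOLERANCE = 0.05


def rail_limits(nominal):
    """Return the (low, high) allowed voltages for a nominal rail voltage."""
    return nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)


def in_spec(nominal, measured):
    """True if the measured voltage lies inside the tolerance window."""
    low, high = rail_limits(nominal)
    return low <= measured <= high


# 12.1 V is a hypothetical healthy reading; 13.5 V and 13.8 V are the
# example figures from the post above.
for reading in (12.1, 13.5, 13.8):
    low, high = rail_limits(12.0)
    status = "OK" if in_spec(12.0, reading) else "OUT OF SPEC"
    print(f"12V rail at {reading:.2f} V (allowed {low:.2f}-{high:.2f} V): {status}")
```

With that window the 12V rail should sit between roughly 11.4 V and 12.6 V, so both example readings would be well out of spec. This only covers the steady voltage; ripple needs an oscilloscope to see.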
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
What is the best way to monitor the voltage and amps on all parts? I use HWMonitor, but it only shows so much...

I used paste on the cooler/CPU. I actually had to wipe off some that was spilling over after using a little too much (nothing spilled onto anything, it was just hanging over the CPU edge).

I have two 120mm case fans, and I'm getting max 59 C now that I set it down to 100.

---

I tried swapping the PSU for the old 580W unit from my old computer to see if I got any change. I ran Prime95 and on the first run all cores had failed within 2 minutes. On the second try (after a reboot and whatnot) it ran longer, so I decided to leave it running during the night.
Two cores failed after 1h40m and 1h48m. The third failed after 3h30m and the fourth was still running after 8h when I shut it down...

Can't really conclude anything from that...

I also got a new BSOD error; NTFS_FILE_SYSTEM.

With all the various BSODs and CPU crashes it seems like the culprit might be something central, like the motherboard (in the dining room with a lead pipe (Clue joke. See, I'm fine. Not totally insane yet.))
 
You can't really measure amps easily. You can use a multimeter to measure voltages directly, and that would be the most accurate means available. The volt measurements you get from your board aren't too accurate.

You can measure the draw from the wall of course, with a kill-a-watt.

You used too much paste then. As popatim says, about the size of a BB.

I don't know, the likelihood of both PSUs failing in the same way and producing similar results is slim.

Maybe so, but I can't recall any instance of a MB failing only under load.

Just RMA the board and CPU at the same time.
 

Nik_Yawn

Honorable
Sep 16, 2012
8
0
10,510
I will use less paste when I have to put everything back together then :)

The failures (BSOD) aren't occurring only under stress; they have happened while only using an internet browser, or even at idle this last time (after running IntelBurn the computer was idle during the night when it crashed).