Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
At least 10 WUs have crashed within the last 16 hours. Here's part of the log file.

<pre>[06:47:00] - Connecting to assignment server
[06:47:03] - Successful: assigned to (171.65.103.156).
[06:47:03] + News From Folding@Home: Welcome to Folding@Home
[06:47:03] Loaded queue successfully.
[06:59:18] + Closed connections
[06:59:18]
[06:59:18] + Processing work unit
[06:59:18] Core required: FahCore_78.exe
[06:59:18] Core found.
[06:59:18] Working on Unit 08 [August 20 06:59:18]
[06:59:18] + Working ...
[06:59:18]
[06:59:18] *------------------------------*
[06:59:18] Folding@Home Gromacs Core
[06:59:18] Version 1.80 (March 16, 2005)
[06:59:18]
[06:59:18] Preparing to commence simulation
[06:59:18] - Looking at optimizations...
[06:59:18] - Created dyn
[06:59:18] - Files status OK
[06:59:26] - Expanded 2966937 -> 16166417 (decompressed 544.8 percent)
[06:59:27] - Starting from initial work packet
[06:59:27]
[06:59:27] Project: 1140 (Run 98, Clone 14, Gen 17)
[06:59:27]
[06:59:27] Assembly optimizations on if available.
[06:59:27] Entering M.D.
[06:59:36] Protein: p1140_RIBO_FSpeptide_EXT_nospring
[06:59:36]
[06:59:36] Writing local files
[06:59:44] Extra SSE boost OK.
[06:59:46] Writing local files
[06:59:46] Completed 0 out of 250000 steps (0)
[07:55:10] Writing local files
[07:55:10] Completed 2500 out of 250000 steps (1)
[08:45:02] Writing local files
[08:45:02] Completed 5000 out of 250000 steps (2)
[09:16:09] Quit 101 - Fatal error: NaN detected: (ener[12])
[09:16:09]
[09:16:09] Simulation instability has been encountered. The run has entered a
[09:16:09] state from which no further progress can be made.
[09:16:09] This may be the correct result of the simulation, however if you
[09:16:09] often see other project units terminating early like this
[09:16:09] too, you may wish to check the stability of your computer (issues
[09:16:09] such as high temperature, overclocking, etc.).
[09:16:09] Going to send back what have done.
[09:16:09] logfile size: 8012
[09:16:09] - Writing 8575 bytes of core data to disk...
[09:16:09] ... Done.
[09:16:09]
[09:16:09] Folding@home Core Shutdown: EARLY_UNIT_END
[09:16:13] CoreStatus = 72 (114)
[09:16:13] Sending work to server


[09:16:13] + Attempting to send results
[09:16:19] + Results successfully sent
[09:16:19] Thank you for your contribution to Folding@Home.


[09:16:23] + Attempting to send results
[09:16:36] + Results successfully sent
[09:16:36] Thank you for your contribution to Folding@Home.
[09:16:36] - Preparing to get new work unit...
[09:16:36] + Attempting to get work packet
[09:16:36] - Connecting to assignment server
[09:16:44] - Successful: assigned to (171.65.103.158).
[09:16:44] + News From Folding@Home: Welcome to Folding@Home
[09:16:44] Loaded queue successfully.
[09:18:41] + Closed connections
[09:18:46]
[09:18:46] + Processing work unit
[09:18:46] Core required: FahCore_82.exe
[09:18:46] Core found.
[09:18:46] Working on Unit 09 [August 20 09:18:46]
[09:18:46] + Working ...
[09:18:47]
[09:18:47] *------------------------------*
[09:18:47] Folding@Home PMD Core
[09:18:47] Version 1.01 (Oct 15, 2004)
[09:18:47]
[09:18:47] Preparing to commence simulation
[09:18:47] - Looking at optimizations...
[09:18:47] - Created dyn
[09:18:47] - Files status OK
[09:18:47] - Expanded 82038 -> 558743 (decompressed 681.0 percent)
[09:18:47]
[09:18:47] Project: 1805 (Run 12, Clone 756, Gen 8)
[09:18:47]
[09:18:47] Assembly optimizations on if available.
[09:18:47] Entering M.D.
[09:18:53] Protein: p1805_Collagen_POG10_refolding_gamma
[09:18:53]
[09:18:53] Completed 0 out of 500000 steps (0)
[09:33:51] Writing checkpoint files
[09:42:51] Writing local files
[09:42:51] Completed 5000 out of 500000 steps (1)
[09:48:52] Writing checkpoint files
[10:01:47] Writing local files
[10:01:47] Completed 10000 out of 500000 steps (2)
[10:03:53] Writing checkpoint files
[10:18:55] Writing checkpoint files
[10:19:16] Writing local files
[10:19:16] Completed 15000 out of 500000 steps (3)
[10:33:55] Writing checkpoint files
[10:36:18] Writing local files
[10:36:18] Completed 20000 out of 500000 steps (4)
[10:40:36] NaN/Inf detected e[0]
[10:40:36] Going to send back what have done.
[10:40:36] logfile size: 5120
[10:40:36] - Writing 5640 bytes of core data to disk...
[10:40:36] ... Done.
[10:40:36]
[10:40:36] Folding@home Core Shutdown: EARLY_UNIT_END
[10:40:39] CoreStatus = 72 (114)
[10:40:39] Sending work to server


[10:40:39] + Attempting to send results
[10:40:47] + Results successfully sent
[10:40:47] Thank you for your contribution to Folding@Home.
[10:40:51] - Preparing to get new work unit...
[10:40:51] + Attempting to get work packet
[10:40:51] - Connecting to assignment server
[10:41:07] + Could not connect to Assignment Server
[10:41:20] - Successful: assigned to (171.65.103.158).
[10:41:20] + News From Folding@Home: Welcome to Folding@Home
[10:41:20] Loaded queue successfully.
[10:42:12] + Closed connections
[10:42:17]
[10:42:17] + Processing work unit
[10:42:17] Core required: FahCore_82.exe
[10:42:17] Core found.
[10:42:17] Working on Unit 00 [August 20 10:42:17]
[10:42:17] + Working ...
[10:42:17]
[10:42:17] *------------------------------*
[10:42:17] Folding@Home PMD Core
[10:42:17] Version 1.01 (Oct 15, 2004)
[10:42:17]
[10:42:17] Preparing to commence simulation
[10:42:17] - Looking at optimizations...
[10:42:17] - Created dyn
[10:42:17] - Files status OK
[10:42:18] - Expanded 81057 -> 557800 (decompressed 688.1 percent)
[10:42:18]
[10:42:18] Project: 1807 (Run 14, Clone 41, Gen 0)
[10:42:18]
[10:42:18] Assembly optimizations on if available.
[10:42:18] Entering M.D.
[10:42:26] Protein: p1807_Collagen_POG10new_restrained_unfolding
[10:42:26]
[10:42:26] Completed 0 out of 500000 steps (0)
[10:44:44] Writing local files
[10:44:44] Completed 5000 out of 500000 steps (1)
[10:46:59] Writing local files
[10:46:59] Completed 10000 out of 500000 steps (2)
[10:49:10] Writing local files
[10:49:10] Completed 15000 out of 500000 steps (3)
[10:51:32] Writing local files
[10:51:32] Completed 20000 out of 500000 steps (4)
[10:53:45] Writing local files
[10:53:45] Completed 25000 out of 500000 steps (5)
[10:56:05] Writing local files
[10:56:05] Completed 30000 out of 500000 steps (6)
[10:57:18] Writing checkpoint files
[10:57:24] NaN/Inf detected e[0]
[10:57:24] Going to send back what have done.
[10:57:24] logfile size: 13312
[10:57:24] - Writing 13832 bytes of core data to disk...
[10:57:24] ... Done.
[10:57:24]
[10:57:24] Folding@home Core Shutdown: EARLY_UNIT_END
[10:57:27] CoreStatus = 72 (114)
[10:57:27] Sending work to server


[10:57:27] + Attempting to send results
[10:57:39] + Results successfully sent
[10:57:39] Thank you for your contribution to Folding@Home.
[10:57:43] - Preparing to get new work unit...
[10:57:43] + Attempting to get work packet
[10:57:43] - Connecting to assignment server
[10:57:49] - Successful: assigned to (171.65.103.156).
[10:57:49] + News From Folding@Home: Welcome to Folding@Home
[10:57:49] Loaded queue successfully.
[11:25:56] + Closed connections
[11:26:01]
[11:26:01] + Processing work unit
[11:26:01] Core required: FahCore_78.exe
[11:26:01] Core found.
[11:26:01] Working on Unit 01 [August 20 11:26:01]
[11:26:01] + Working ...
[11:26:02]
[11:26:02] *------------------------------*
[11:26:02] Folding@Home Gromacs Core
[11:26:02] Version 1.80 (March 16, 2005)
[11:26:02]
[11:26:02] Preparing to commence simulation
[11:26:02] - Looking at optimizations...
[11:26:02] - Created dyn
[11:26:02] - Files status OK
[11:26:10] - Expanded 3035945 -> 16546233 (decompressed 545.0 percent)
[11:26:10] - Starting from initial work packet
[11:26:10]
[11:26:10] Project: 1144 (Run 115, Clone 12, Gen 5)
[11:26:10]
[11:26:11] Assembly optimizations on if available.
[11:26:11] Entering M.D.
[11:26:19] Protein: p1144_RIBO_nopeptide
[11:26:19]
[11:26:20] Writing local files
[11:26:27] Extra SSE boost OK.
[11:26:28] Writing local files
[11:26:29] Completed 0 out of 250000 steps (0)
</pre><p>
My CPU has always been undervolted (1.55v instead of the default 1.65v), and my PC2700 CL2.5 DDR has been running at 2.0-2-2-5 @ PC2100. CPU temp is around 50-52C.
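To see at a glance whether the early terminations cluster on one core or one project, the log can be summarized with a short script. This is only a minimal sketch: the function name `summarize_early_ends` and the record layout are made up for illustration, and the patterns are based solely on the message formats visible in the log excerpt above, so other client or core versions may word things differently.

```python
import re
from collections import Counter

# Patterns taken from the log excerpt above; other versions may differ.
CORE_RE = re.compile(r"Core required: (FahCore_\w+\.exe)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
NAN_RE = re.compile(r"NaN(?:/Inf)? detected")


def summarize_early_ends(log_text):
    """Return one record per work unit that ended with EARLY_UNIT_END."""
    records = []
    core = project = None
    saw_nan = False
    for line in log_text.splitlines():
        m = CORE_RE.search(line)
        if m:
            # A new work unit starts: reset the per-unit state.
            core, project, saw_nan = m.group(1), None, False
            continue
        m = PROJECT_RE.search(line)
        if m:
            project = int(m.group(1))
            continue
        if NAN_RE.search(line):
            saw_nan = True
        elif "EARLY_UNIT_END" in line:
            records.append({"core": core, "project": project, "nan": saw_nan})
    return records


# A few lines copied from the log above, used as sample input.
sample = """\
[06:59:18] Core required: FahCore_78.exe
[06:59:27] Project: 1140 (Run 98, Clone 14, Gen 17)
[09:16:09] Quit 101 - Fatal error: NaN detected: (ener[12])
[09:16:09] Folding@home Core Shutdown: EARLY_UNIT_END
[09:18:46] Core required: FahCore_82.exe
[09:18:47] Project: 1805 (Run 12, Clone 756, Gen 8)
[10:40:36] NaN/Inf detected e[0]
[10:40:36] Folding@home Core Shutdown: EARLY_UNIT_END
"""

records = summarize_early_ends(sample)
by_core = Counter(r["core"] for r in records)
for r in records:
    print(r)
```

In the excerpt, the early ends hit both the Gromacs core (FahCore_78) and the PMD core (FahCore_82) on different projects, which fits the log's own hint that the machine, rather than one particular project, may be worth checking.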

------------
<font color=orange><b><A HREF="http://www.mozilla.org/products/firefox" target="_new">Rediscover the web</A></b></font color=orange>
 

mozzartusm

Splendid
Sep 17, 2004
4,693
0
22,780
The last time I added up the uncompleted WUs that I lost with no credit, it came to around 5000 points. I hardly ever fold anymore because of this.

ASUS P5WD2 Premium
Intel 3.73 EE @ 5.6GHz
XMS2 DDR2 @ 1180MHz

<A HREF="http://valid.x86-secret.com/records.php?PHPSESSID=792e8f49d5d9b8a4d1ad6f40ca029756" target="_new">#2 CPUZ</A>
SuperPI 25secs
 

mozzartusm

Splendid
Sep 17, 2004
4,693
0
22,780
At one time there was an e-mail address for contacting Folding@Home, but I haven't been able to find it lately. Do you know of one?

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
I haven't tried memtest86 yet, but increasing the voltage to 1.6v (still lower than the default 1.65v) didn't solve the problem. Now I'm back to 1.55v and trying 2.0-3-3-6 with the RAM. If this doesn't work, I'll run memtest86.

My voltages are fine, or at least not abnormal compared to the 100% stable days. Although F@H keeps generating errors, everything else still works fine.

I'm really worried about this problem. It's threatening to shatter all of my landmark dreams. So far I've managed 84 points from 20 broken WUs (4 of these 20 were 600-pointers :frown: )

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
This kind of overclocked system is highly unlikely to be stable enough for F@H :wink:

 

mozzartusm

Splendid
Sep 17, 2004
4,693
0
22,780
I don't run it very often on that system. I ran it on my mom's, my grandfather's and 2 friends' systems.

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
F@H WUs crashed again, even after using RAM at 2.0-3-3-6.

I went back to 2.0-2-2-5 and ran memtest86. It passed memtest 3 times without errors before I stopped it.

Edit: For memtest86, I ran the CPU at the default 1.65v

 

mozzartusm

Splendid
Sep 17, 2004
4,693
0
22,780
Earlier I tried to get one started and it crashed 8 times before it finally started one.

 

endyen

Splendid
When was the last time you shut down? If I don't shut down once a week, some of my systems go a little funny.
Your mobo is getting a little long in the tooth, so maybe it's time to give it a good clean and a look over. Check the northbridge HSF well.
How old is your xp install?
 

mozzartusm

Splendid
Sep 17, 2004
4,693
0
22,780
The three systems that I usually fold on are older P3s. They started doing this about 2 months ago; up until then they worked fine.

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
After setting the voltage to the default 1.65v, the problem seems to be gone. I said before that it passed memtest three times, and now it has finished a WU without crashing.

FYI, my WinXP installation is 8 months old, and the case is quite dust-free (I cleaned it a couple of weeks ago). The northbridge HSF is working fine. I can usually never run the PC for more than 48 hours, not because of stability issues but because of an unstable internet connection and load shedding.

Has my mobo become unable to handle an undervolted CPU? The problem with 1.65v is that my CPU reaches up to 56-57C while folding (though it comes down to 43C when idle).

My PSU probably doesn't like the CPU at 1.65v under full load: +12v swings between 11.74v and 11.80v. But when the system is idle (at 1.65v) or under full load at 1.55v, +12v usually reads 11.90v to 11.94v.
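For a sense of scale, the +12v readings quoted above can be turned into percentage deviations from nominal. This is a rough sketch only: it assumes the commonly cited ATX tolerance of ±5% on the +12 V rail, the helper names are invented for illustration, and motherboard sensor readings are themselves not very precise.

```python
NOMINAL_12V = 12.0
# Assumption: the commonly cited ATX tolerance of +/-5% on the +12 V rail.
TOLERANCE = 0.05


def rail_deviation_percent(measured, nominal=NOMINAL_12V):
    """Signed deviation of a rail reading from nominal, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(measured, nominal=NOMINAL_12V, tolerance=TOLERANCE):
    """True if the reading lies inside nominal +/- tolerance."""
    return abs(measured - nominal) <= nominal * tolerance


# Readings quoted in the post above.
readings = {
    "full load, CPU at 1.65v (low)": 11.74,
    "full load, CPU at 1.65v (high)": 11.80,
    "idle or 1.55v load (low)": 11.90,
    "idle or 1.55v load (high)": 11.94,
}

for label, volts in readings.items():
    pct = rail_deviation_percent(volts)
    status = "ok" if within_tolerance(volts) else "out of range"
    print(f"{label}: {volts:.2f} V ({pct:+.2f}%), {status}")
```

Under that assumption all four readings sit inside the allowed band, so the sag under load may be a symptom to watch rather than proof that the PSU is the culprit.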

 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
I've noticed that every so often F@H crashes on my system too. (Though more like one crash every two months with the PC running 24/7.) But my system is a completely stock P4C 2.6 with a sturdy i865 mobo, good CAS2 DDR400 at reasonable (ie. not as tight as can be) timings and slightly volted up for ensured stability, no PAT (I could enable it if I wanted, I just don't want), good voltages from a reliable PSU, good airflow, and good cooling. I even load WHQL-certified drivers over non-certified whenever I have the choice. It's got an intake filter. I clean dust out with compressed air fairly regularly. Heck, it's even got a UPS. (More for voltage regulation than power outages.) And it's on RAID1 in case of a hard drive failure.

So the only thing that I <i>didn't</i> do to make it the most stable system on earth was give it registered/ECC RAM and maybe give it ten foot thick concrete, water, and lead shielding. **ROFL**

And yet F@H crashes every so often, even though no other apps give me any trouble. Memtests run fine. Prime95 runs fine. Sandra runs fine. Etc. So frankly, I think at least some of the problems people see with F@H crashing come from the work units themselves. I mean, heaven forbid suggesting that the code actually contains a bug. :O

I mean sure, obviously a PC tweaked enough may stress out under the extreme load of constantly running F@H 24/7. So if there's any fault at all in your system, running F@H seriously is probably going to bring it out fairly quickly. But when even a majorly stable system can crash from F@H (but nothing else) it begins to suggest that F@H itself may be the culprit, at least from time to time.

I love F@H, but if you look at, for example, the oddities of the Windows GUI version of F@H closely enough, you'll begin to wonder about the sanity (if not skill) of some of the developers. Besides, no one is perfect. We all make mistakes.

:evil: یί∫υєг ρђœŋίχ :evil:
<font color=red><i>The Devil himself is good, when he is pleased.</i></font color=red>
@ 195K of 200K!
 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
F@H GUI or core wasn't crashing, the WUs were crashing!

My system was undervolted and running with aggressive memory timings, so I can't complain much about the problem. My PSU and cooling are at best just good enough. It's been back in a stable state since I stopped undervolting. But until now it had been F@H-stable (except for the odd WU crash, like in your case) with the same settings and the same power/cooling equipment. Why would it suddenly start to dislike undervolting?

 

stefan

Distinguished
Apr 14, 2004
334
0
18,780
maybe that's the wear & tear that comes with running F@H constantly. I'm not much of a hardware guy myself; all of my systems are running at stock speeds, and of course I see a workunit crash from time to time. (I think most of the time it's the WUs that crash, but sometimes it's the app that's processing the workunits. I have more trust in the console version: I have both versions running on various systems, and the console version not only keeps the logs smaller but also gives me the impression of being more stable, judged over a longer period [i.e. months].)
the wear and tear of hardware running constantly at full load can be seen most easily with laptops, in my opinion: the fan gets louder (probably dust, but tricky to clean!). so why might there not also be other symptoms, such as a mobo needing more voltage? cheers
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
Spitfire_x86 wrote: "F@H GUI or core wasn't crashing, the WUs were crashing!"
I know. I'm saying that if the same developer(s) that made the F@H GUI program also make WU exes then it's no wonder that some WUs crash! **ROFL**

Spitfire_x86 wrote: "Why would it suddenly start to dislike undervolting?"
That is weird, but probably just has to do with parts aging or a change in ambient temps?

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
You mean we download "fahcore_78.exe" or "fahcore_65.exe" every time we get a WU?

Anyway, this has been a very hot summer. System temp has been hitting 45 for almost the whole summer; maybe that had some effect on my mobo. It has gone through 2½ summers so far.

 

stefan

Distinguished
Apr 14, 2004
334
0
18,780
you only download the core files when the client program does not find any. it may even be wise to delete the core files from time to time, as the client is then forced to download what is probably a newer version of the core.
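The clean-up described above could be scripted roughly as follows. This is a hedged sketch, not an official procedure: it assumes the core executables sit directly in the client's folder and follow the `FahCore_*.exe` naming seen in the log earlier in the thread, the function names are hypothetical, and it would be sensible to stop the client before deleting anything. By default it only reports what it would remove.

```python
from pathlib import Path


def find_core_files(client_dir):
    """Return FahCore_*.exe files found directly in client_dir, sorted by name."""
    return sorted(Path(client_dir).glob("FahCore_*.exe"))


def remove_core_files(client_dir, dry_run=True):
    """Delete the core files (or only list them when dry_run is True).

    Returns the names of the files that were, or would be, removed.
    """
    removed = []
    for path in find_core_files(client_dir):
        if not dry_run:
            path.unlink()
        removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Hypothetical location: replace "." with the folder the client runs from.
    for name in remove_core_files(".", dry_run=True):
        print("would remove", name)
```

Other files in the folder, such as the work queue, are left untouched because only names matching the core pattern are selected.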
 

endyen

Splendid
They do a lot of beta testing before release, but they make changes on the fly. It's probably the changes that cause the odd crash.
XP is the best OS M$ has ever put out, but having said that, it still needs rebooting and/or reloading from time to time.
 

endyen

Splendid
The filter circuit for v-core on the NF7 boards is their weakest link. It's what makes them undervolt so often. If you can get a little more air on the coils, it may help.
 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
yeah, NF7 always undervolted the CPUs by 0.2-0.3v, it hasn't started to undervolt more than this range recently. But I was intentionally undervolting, everything used to be fine with (my undervolting + auto undervolting)

 

Spitfire_x86

Splendid
Jun 26, 2002
7,248
0
25,780
How can northbridge overvolting help me to undervolt my CPU? I'm running FSB at only 133 MHz and my RAM is doing fine at 2.0-2-2-5

 
