Multiple File_IO_Errors

Post by DrBB1 » Fri May 18, 2012 12:32 am

My PC was folding unattended for a few days. When I returned, I found that three WUs had run almost to completion (possibly just short of 100%) and then crashed with I/O errors. Excerpts from the log file covering those days are below:

Code:
[20:54:06] - Autosending finished units... [May 15 20:54:06 UTC]
[20:54:06] Trying to send all finished work units
[20:54:06] + No unsent completed units remaining.
[20:54:06] - Autosend completed
[20:54:06] + Working...
[20:55:33] Completed 250000 out of 250000 steps  (100%)
[20:55:34] DynamicWrapper: Finished Work Unit: sleep=10000
[20:55:44]
[20:55:44] Finished Work Unit:
[20:55:44] - Reading up to 1348944 from "work/wudata_03.trr": Read 1348944
[20:55:44] trr file hash check passed.
[20:55:44] - Reading up to 810612 from "work/wudata_03.xtc": Read 810612
[20:55:44] xtc file hash check passed.
[20:55:44] edr file hash check passed.
[20:55:44] logfile size: 22242
[20:55:44] Leaving Run
[20:55:49] - Writing 2187290 bytes of core data to disk...
[20:55:49] Done: 2186778 -> 2092919 (compressed to 95.7 percent)
[20:55:50]   ... Done.
[20:55:51] - Shutting down core
[20:55:51]
[20:55:51] Folding@home Core Shutdown: FINISHED_UNIT
[20:55:54] CoreStatus = 64 (100)
[20:55:54] Unit 3 finished with 90 percent of time to deadline remaining.
[20:55:54] Updated performance fraction: 0.878462
[20:55:54] Sending work to server
[20:55:54] Project: 8042 (Run 1, Clone 3999, Gen 21)


[20:55:54] + Attempting to send results [May 15 20:55:54 UTC]
[20:55:54] - Reading file work/wuresults_03.dat from core
[20:55:54]   (Read 2093431 bytes from disk)
[20:55:54] Connecting to http://171.67.108.59:8080/
[20:56:04] Posted data.
[20:56:04] Initial: 0000; - Uploaded at ~204 kB/s
[20:56:04] - Averaged speed for that direction ~198 kB/s
[20:56:04] + Results successfully sent
[20:56:04] Thank you for your contribution to Folding@Home.
[20:56:04] + Number of Units Completed: 990

[20:56:09] Trying to send all finished work units
[20:56:09] + No unsent completed units remaining.
[20:56:09] - Preparing to get new work unit...
[20:56:09] + Attempting to get work packet
[20:56:09] - Will indicate memory of 3999 MB
[20:56:09] - Connecting to assignment server
[20:56:09] Connecting to http://assign.stanford.edu:8080/
[20:56:10] Posted data.
[20:56:10] Initial: 43AB; - Successful: assigned to (171.67.108.59).
[20:56:10] + News From Folding@Home: Welcome to Folding@Home
[20:56:10] Loaded queue successfully.
[20:56:10] Connecting to http://171.67.108.59:8080/
[20:56:11] Posted data.
[20:56:11] Initial: 0000; - Receiving payload (expected size: 1131312)
[20:56:12] - Downloaded at ~1104 kB/s
[20:56:12] - Averaged speed for that direction ~678 kB/s
[20:56:12] + Received work.
[20:56:12] Trying to send all finished work units
[20:56:12] + No unsent completed units remaining.
[20:56:12] + Closed connections
[20:56:12]
[20:56:12] + Processing work unit
[20:56:12] Core required: FahCore_a4.exe
[20:56:12] Core found.
[20:56:12] Working on queue slot 04 [May 15 20:56:12 UTC]
[20:56:12] + Working ...
[20:56:12] - Calling '.\FahCore_a4.exe -dir work/ -suffix 04 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 5928 -version 623'

[20:56:13]
[20:56:13] *------------------------------*
[20:56:13] Folding@Home Gromacs GB Core
[20:56:13] Version 2.27 (Dec. 15, 2010)
[20:56:13]
[20:56:13] Preparing to commence simulation
[20:56:13] - Looking at optimizations...
[20:56:13] - Created dyn
[20:56:13] - Files status OK
[20:56:13] - Expanded 1130800 -> 3058560 (decompressed 270.4 percent)
[20:56:13] Called DecompressByteArray: compressed_data_size=1130800 data_size=3058560, decompressed_data_size=3058560 diff=0
[20:56:13] - Digital signature verified
[20:56:13]
[20:56:13] Project: 8042 (Run 0, Clone 2148, Gen 14)
[20:56:13]
[20:56:13] Assembly optimizations on if available.
[20:56:13] Entering M.D.
[20:56:19] Mapping NT from 1 to 1
[20:56:20] Completed 0 out of 250000 steps  (0%)
[21:10:41] Completed 2500 out of 250000 steps  (1%)
[21:24:58] Completed 5000 out of 250000 steps  (2%)
[21:39:13] Completed 7500 out of 250000 steps  (3%)
[21:53:28] Completed 10000 out of 250000 steps  (4%)
[22:09:58] Completed 12500 out of 250000 steps  (5%)
[22:26:15] Completed 15000 out of 250000 steps  (6%)
[22:40:50] Completed 17500 out of 250000 steps  (7%)
[22:55:09] Completed 20000 out of 250000 steps  (8%)
[23:09:32] Completed 22500 out of 250000 steps  (9%)
[23:25:08] Completed 25000 out of 250000 steps  (10%)
.
.
.

[22:27:06] Completed 242500 out of 250000 steps  (97%)
[22:42:42] Completed 245000 out of 250000 steps  (98%)
[22:55:40] 564 from "work/wudata_04.trr": Read 1350564
[22:55:40] trr file hash check passed.
[22:55:40] - Reading up to 811660 from "work/wudata_04.xtc": Read 811660
[22:55:40] - Checksum of file (work/wudata_04.xtc) read from disk doesn't match
[22:55:40]
[22:55:40] Folding@home Core Shutdown: FILE_IO_ERROR
[23:14:11] ed.
[23:14:11] - Reading up to 811660 from "work/wudata_04.xtc": Read 811660
[23:14:11] - Checksum of file (work/wudata_04.xtc) read from disk doesn't match
[23:14:11]
[23:14:11] Folding@home Core Shutdown: FILE_IO_ERROR
[23:14:13] CoreStatus = 75 (117)
[23:14:13] Error opening or reading from a file.
[23:14:13] Deleting current work unit & continuing...
[23:14:18] Trying to send all finished work units
[23:14:18] + No unsent completed units remaining.
[23:14:18] - Preparing to get new work unit...
[23:14:18] + Attempting to get work packet
[23:14:18] - Will indicate memory of 3999 MB
[23:14:18] - Connecting to assignment server
[23:14:18] Connecting to http://assign.stanford.edu:8080/
[23:14:18] Posted data.
[23:14:18] Initial: 43AB; - Successful: assigned to (171.67.108.58).
[23:14:18] + News From Folding@Home: Welcome to Folding@Home
[23:14:18] Loaded queue successfully.
[23:14:18] Connecting to http://171.67.108.58:8080/
[23:14:19] Posted data.
[23:14:19] Initial: 0000; - Receiving payload (expected size: 547129)
[23:14:21] - Downloaded at ~267 kB/s
[23:14:21] - Averaged speed for that direction ~595 kB/s
[23:14:21] + Received work.
[23:14:21] + Closed connections
[23:14:26]
[23:14:26] + Processing work unit
[23:14:26] Core required: FahCore_a4.exe
[23:14:26] Core found.
[23:14:26] Working on queue slot 05 [May 16 23:14:26 UTC]
[23:14:26] + Working ...
[23:14:26] - Calling '.\FahCore_a4.exe -dir work/ -suffix 05 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 5928 -version 623'

[23:14:26]
[23:14:26] *------------------------------*
[23:14:26] Folding@Home Gromacs GB Core
[23:14:26] Version 2.27 (Dec. 15, 2010)
[23:14:26]
[23:14:26] Preparing to commence simulation
[23:14:26] - Looking at optimizations...
[23:14:26] - Created dyn
[23:14:26] - Files status OK
[23:14:26] - Expanded 546879 -> 1326096 (decompressed 242.4 percent)
[23:14:26] Called DecompressByteArray: compressed_data_size=546879 data_size=1326096, decompressed_data_size=1326096 diff=0
[23:14:26] - Digital signature verified
[23:14:26]
[23:14:26] Project: 8014 (Run 7, Clone 212, Gen 10)
[23:14:26]
[23:14:26] Assembly optimizations on if available.
[23:14:26] Entering M.D.
[23:14:26] Mapping NT from 1 to 1
[23:14:26] Completed 0 out of 250000 steps  (0%)
[23:14:26] Completed 2500 out of 250000 steps  (1%)
[23:14:26] Completed 5000 out of 250000 steps  (2%)
[23:18:37] Completed 7500 out of 250000 steps  (3%)
[23:26:10] Completed 10000 out of 250000 steps  (4%)
[23:33:44] Completed 12500 out of 250000 steps  (5%)
[23:41:06] Completed 15000 out of 250000 steps  (6%)
[23:48:35] Completed 17500 out of 250000 steps  (7%)
[23:56:15] Completed 20000 out of 250000 steps  (8%)
[00:03:41] Completed 22500 out of 250000 steps  (9%)
.
.
.
[10:48:54] Completed 237500 out of 250000 steps  (95%)
[10:56:15] Completed 240000 out of 250000 steps  (96%)
[11:03:36] Completed 242500 out of 250000 steps  (97%)
[11:10:57] Completed 245000 out of 250000 steps  (98%)
[11:18:13] Completed 247500 out of 250000 steps  (99%)
[11:25:34] Completed 250000 out of 250000 steps  (100%)
[11:25:35] DynamicWrapper: Finished Work Unit: sleep=10000
[11:25:45]
[11:25:45] Finished Work Unit:
[11:25:45] - Reading up to 769164 from "work/wudata_05.trr": Read 769164
[11:25:45] - Checksum of file (work/wudata_05.trr) read from disk doesn't match
[11:25:45]
[11:25:45] Folding@home Core Shutdown: FILE_IO_ERROR
[11:41:00] _05.xtc) read from disk doesn't match
[11:41:00]
[11:41:00] Folding@home Core Shutdown: FILE_IO_ERROR
[11:41:04] CoreStatus = 75 (117)
[11:41:04] Error opening or reading from a file.
[11:41:04] Deleting current work unit & continuing...
[11:41:08] Trying to send all finished work units
[11:41:08] + No unsent completed units remaining.
[11:41:08] - Preparing to get new work unit...
[11:41:08] + Attempting to get work packet
[11:41:08] - Will indicate memory of 3999 MB
[11:41:08] - Connecting to assignment server
[11:41:08] Connecting to http://assign.stanford.edu:8080/
[11:41:09] Posted data.
[11:41:09] Initial: 43AB; - Successful: assigned to (171.67.108.58).
[11:41:09] + News From Folding@Home: Welcome to Folding@Home
[11:41:09] Loaded queue successfully.
[11:41:09] Connecting to http://171.67.108.58:8080/
[11:41:10] Posted data.
[11:41:10] Initial: 0000; - Receiving payload (expected size: 544958)
[11:41:11] - Downloaded at ~532 kB/s
[11:41:11] - Averaged speed for that direction ~583 kB/s
[11:41:11] + Received work.
[11:41:11] + Closed connections
[11:41:16]
[11:41:16] + Processing work unit
[11:41:16] Core required: FahCore_a4.exe
[11:41:16] Core found.
[11:41:16] Working on queue slot 06 [May 17 11:41:16 UTC]
[11:41:16] + Working ...
[11:41:16] - Calling '.\FahCore_a4.exe -dir work/ -suffix 06 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 5928 -version 623'

[11:41:16]
[11:41:16] *------------------------------*
[11:41:16] Folding@Home Gromacs GB Core
[11:41:16] Version 2.27 (Dec. 15, 2010)
[11:41:16]
[11:41:16] Preparing to commence simulation
[11:41:16] - Looking at optimizations...
[11:41:16] - Created dyn
[11:41:16] - Files status OK
[11:41:16] - Expanded 546968 -> 1326096 (decompressed 242.4 percent)
[11:41:16] Called DecompressByteArray: compressed_data_size=546968 data_size=1326096, decompressed_data_size=1326096 diff=0
[11:41:16] - Digital signature verified
[11:41:16]
[11:41:16] Project: 8014 (Run 0, Clone 996, Gen 8)
[11:41:16]
[11:41:16] Assembly optimizations on if available.
[11:41:16] Entering M.D.
[11:41:16] Mapping NT from 1 to 1
[11:41:16] Completed 0 out of 250000 steps  (0%)
[11:41:16] Completed 2500 out of 250000 steps  (1%)
[11:41:16] Completed 5000 out of 250000 steps  (2%)
[11:48:11] Completed 7500 out of 250000 steps  (3%)
[11:55:33] Completed 10000 out of 250000 steps  (4%)
[12:02:50] Completed 12500 out of 250000 steps  (5%)
[12:10:02] Completed 15000 out of 250000 steps  (6%)
[12:17:20] Completed 17500 out of 250000 steps  (7%)
[12:24:42] Completed 20000 out of 250000 steps  (8%)
[12:31:56] Completed 22500 out of 250000 steps  (9%)
[12:39:10] Completed 25000 out of 250000 steps  (10%)
.
.
.
[22:48:52] Completed 240000 out of 250000 steps  (96%)
[22:55:42] Completed 242500 out of 250000 steps  (97%)
[23:02:41] Completed 245000 out of 250000 steps  (98%)
[23:09:46] Completed 247500 out of 250000 steps  (99%)
[23:16:37] Completed 250000 out of 250000 steps  (100%)
[23:16:38] DynamicWrapper: Finished Work Unit: sleep=10000
[23:16:48]
[23:16:48] Finished Work Unit:
[23:16:48] - Reading up to 769992 from "work/wudata_06.trr": Read 769992
[23:16:48] trr file hash check passed.
[23:16:48] - Reading up to 461424 from "work/wudata_06.xtc": Read 461424
[23:16:48] - Checksum of file (work/wudata_06.xtc) read from disk doesn't match
[23:16:48]
[23:16:48] Folding@home Core Shutdown: FILE_IO_ERROR
[23:16:51] CoreStatus = 75 (117)
[23:16:51] Error opening or reading from a file.
[23:16:51] Deleting current work unit & continuing...
[23:16:55] Trying to send all finished work units
[23:16:55] + No unsent completed units remaining.
[23:16:55] - Preparing to get new work unit...
[23:16:55] + Attempting to get work packet
[23:16:55] - Will indicate memory of 3999 MB
[23:16:55] - Connecting to assignment server
[23:16:55] Connecting to http://assign.stanford.edu:8080/
[23:16:56] Posted data.
[23:16:56] Initial: 43AB; - Successful: assigned to (171.67.108.58).
[23:16:56] + News From Folding@Home: Welcome to Folding@Home
[23:16:56] Loaded queue successfully.
[23:16:56] Connecting to http://171.67.108.58:8080/
[23:16:57] Posted data.
[23:16:58] Initial: 0000; - Receiving payload (expected size: 547205)
[23:16:58] Conversation time very short, giving reduced weight in bandwidth avg
[23:16:58] - Downloaded at ~1068 kB/s
[23:16:58] - Averaged speed for that direction ~637 kB/s
[23:16:58] + Received work.
[23:16:58] + Closed connections
[23:17:03]
[23:17:03] + Processing work unit
[23:17:03] Core required: FahCore_a4.exe
[23:17:03] Core found.
[23:17:03] Working on queue slot 07 [May 17 23:17:03 UTC]
[23:17:03] + Working ...
[23:17:03] - Calling '.\FahCore_a4.exe -dir work/ -suffix 07 -priority 96 -nocpulock -checkpoint 30 -verbose -lifeline 5928 -version 623'

[23:17:05]
[23:17:05] *------------------------------*
[23:17:05] Folding@Home Gromacs GB Core
[23:17:05] Version 2.27 (Dec. 15, 2010)
[23:17:05]
[23:17:05] Preparing to commence simulation
[23:17:05] - Looking at optimizations...
[23:17:05] - Created dyn
[23:17:05] - Files status OK
[23:17:05] - Expanded 546693 -> 1326096 (decompressed 242.5 percent)
[23:17:05] Called DecompressByteArray: compressed_data_size=546693 data_size=1326096, decompressed_data_size=1326096 diff=0
[23:17:05] - Digital signature verified
[23:17:05]
[23:17:05] Project: 8014 (Run 3, Clone 773, Gen 3)
[23:17:05]
[23:17:05] Assembly optimizations on if available.
[23:17:05] Entering M.D.
[23:17:11] Mapping NT from 1 to 1
[23:17:11] Completed 0 out of 250000 steps  (0%)
[23:21:48] ***** Got a SIGTERM signal (2)
[23:21:48] Killing all core threads

Folding@Home Client Shutdown.


I shut down FAH and rebooted my computer, which hopefully will take care of whatever the problem was. In that context, I have two questions:

1. Can anyone tell from the log what happened and why? Note that the three WUs represented two different projects.
2. I believe I did not earn credit for any of these WUs. Can that be confirmed?

That'll teach me to go away for a few days....

Re: Multiple File_IO_Errors

Post by bruce » Fri May 18, 2012 1:01 am

DrBB1 wrote: 1. Can anyone tell from the log what happened and why? Note that the three WUs represented two different projects.

Not with any degree of certainty, but I do have a guess that works a lot of the time. Every time the client writes a checkpoint, your antivirus program probably checks the data for a sequence of bits that resembles a known virus. If you write enough random sequences, sooner or later the AV may find something it doesn't like -- especially if it doesn't have code that's good enough to eliminate false positives. If your AV program messes with FAH's data, it will corrupt the results, and one possibility is a file that's the wrong length. There are other possibilities (such as an irrecoverable disk error), but a poor-quality AV program is more likely. Configuring your AV program so that it doesn't scan FAH's WORK folder is a reasonable approach.

Search this forum for the name of your AV program and you may learn more.
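
One rough way to test the corruption theory is to hash the files in the work folder twice, a few seconds apart, while the client is stopped: if nothing else is touching them, the two digests should be identical. The sketch below assumes Python 3 is installed; SHA-256 and the function names are illustrative choices, not the hash check that FahCore_a4 itself performs.

```python
import hashlib
import time
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unstable_files(folder, pause_seconds=5.0):
    """Hash every file in `folder` twice; return names that changed or vanished."""
    folder = Path(folder)
    first = {p.name: file_digest(p) for p in folder.iterdir() if p.is_file()}
    time.sleep(pause_seconds)
    changed = []
    for name, digest in first.items():
        p = folder / name
        # A missing file or a different digest both count as "unstable".
        if not p.is_file() or file_digest(p) != digest:
            changed.append(name)
    return sorted(changed)
```

Calling `unstable_files("work")` from the FAH folder with the client stopped should return an empty list. Any name it does return changed between the two reads, which would point at another program or at the disk.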
DrBB1 wrote: 2. I believe I did not earn credit for any of these WUs. Can that be confirmed?

That's what this message is telling you: Deleting current work unit & continuing...
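
To list exactly which WUs were dumped, the log can be scanned for that message together with the most recent Project line before it. A minimal sketch in Python, assuming the log lines look like the excerpt in the first post; `lost_work_units` is a hypothetical helper name, not part of the client.

```python
import re

# Matches lines such as "Project: 8042 (Run 0, Clone 2148, Gen 14)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def lost_work_units(log_lines):
    """Return (project, run, clone, gen) for each WU the client deleted."""
    current = None  # most recent Project line seen so far
    lost = []
    for line in log_lines:
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
        elif "Deleting current work unit" in line and current is not None:
            lost.append(current)
    return lost
```

Fed the excerpt from the first post, it would report Project 8042 (Run 0, Clone 2148, Gen 14), Project 8014 (Run 7, Clone 212, Gen 10) and Project 8014 (Run 0, Clone 996, Gen 8).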

Re: Multiple File_IO_Errors

Post by DrBB1 » Fri May 18, 2012 3:37 am

bruce-- Thanks for the advice. I checked my AV log (Avast 7 Free); no mention of any unusual activity. That said, I'll fold one more WU; if it errs again, I'll exclude the FAH Work folder from AV scanning.

Re: Multiple File_IO_Errors

Post by P5-133XL » Fri May 18, 2012 5:53 am

If it's not the AV, then I would check the system log for file errors and run chkdsk to make sure the file structure on the disk is OK. As dumb as it seems, make sure you actually have room on the disk. If you are storing the folding data on an SSD, make sure that its firmware is up to date, because many SSDs had disconnection problems that have been addressed in recent firmware updates.
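
The free-space check is quick to script. A minimal sketch using Python's standard `shutil.disk_usage`; the 100 MB threshold and the helper name are arbitrary assumptions, not figures from the FAH documentation.

```python
import shutil


def has_room(path, needed_bytes=100 * 1024 * 1024):
    """Return True if the volume holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

Run from the FAH folder, `has_room(".")` reports whether the drive holding the work files still has that much space free.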

Re: Multiple File_IO_Errors

Post by DrBB1 » Fri May 18, 2012 1:12 pm

Well, the latest WU completed and was sent with no problem, so I suspect whatever the problem was, it was fixed by rebooting. Time will tell....

P5-133XL-- I appreciate all your thoughtful suggestions. Will check system log and do a chkdsk; under the circumstances, seems wise just to be on the safe side in case there is still an underlying issue. Disk space no problem at all and not using SSD.

