Project: 2669 - 2 Units any credit?


Joe_H
Site Admin
Posts: 7878
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA


Post by Joe_H »

At the end of last week I ran two units from Project 2669, and no credit showed up for either. The first ran to 49% and aborted with a fatal error. The console message was "Fatal error in MPI_Sendrecv: Error message texts are not available", which was not included in the log file. The second ran to 100%, then hit an error while transmitting the completed unit. I ran qfix and then did a -send all to see if any partial results would go out. Are there any records on these? In any case, qfix has left a clean queue, and I have a new unit running on that machine.

Thanks for any info

1st: Project: 2669 (Run 8, Clone 150, Gen 122)

Code:

--- Opening Log file [April 24 05:05:37 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.24R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/jheimann/Library/Folding@home
Executable: ./fah6
Arguments: -oneunit 

[05:05:37] - Ask before connecting: No
[05:05:37] - User name: Joe_H (Team 38910)
[05:05:37] - User ID: 74A191A36AA6AED3
[05:05:37] - Machine ID: 1
[05:05:37] 
[05:05:37] Loaded queue successfully.
[05:05:37] - Preparing to get new work unit...
[05:05:37] Cleaning up work directory
[05:05:37] + Attempting to get work packet
[05:05:37] - Connecting to assignment server
[05:05:37] - Successful: assigned to (171.64.65.56).
[05:05:37] + News From Folding@Home: Welcome to Folding@Home
[05:05:37] Loaded queue successfully.
[05:06:03] + Closed connections
[05:06:03] 
[05:06:03] + Processing work unit
[05:06:03] At least 4 processors must be requested; read 1.
[05:06:03] Core required: FahCore_a2.exe
[05:06:03] Core found.
[05:06:03] Working on queue slot 06 [April 24 05:06:03 UTC]
[05:06:03] + Working ...
[05:06:03] 
[05:06:03] *------------------------------*
[05:06:03] Folding@Home Gromacs SMP Core
[05:06:03] Version 2.06 (Mon Mar 30 18:46:18 PDT 2009)
[05:06:03] 
[05:06:03] Preparing to commence simulation
[05:06:03] - Ensuring status. Please wait.
[05:06:03] Files status OK
[05:06:03] Need version 207
[05:06:03] Error: Work unit read from disk is invalid
[05:06:03] 
[05:06:03] Folding@home Core Shutdown: CORE_OUTDATED
[05:06:08] CoreStatus = 6E (110)
[05:06:08] + Core out of date. Auto updating...
[05:06:08] - Attempting to download new core...
[05:06:08] + Downloading new core: FahCore_a2.exe
[05:06:09] + 10240 bytes downloaded
[05:06:09] + 20480 bytes downloaded

...

[05:06:14] + 1495040 bytes downloaded
[05:06:14] + 1505280 bytes downloaded
[05:06:14] + 1512001 bytes downloaded
[05:06:14] Verifying core Core_a2.fah...
[05:06:14] Signature is VALID
[05:06:14] 
[05:06:14] Trying to unzip core FahCore_a2.exe
[05:06:15] Decompressed FahCore_a2.exe (4631828 bytes) successfully
[05:06:15] + Core successfully engaged
[05:06:20] 
[05:06:20] + Processing work unit
[05:06:20] At least 4 processors must be requested; read 1.
[05:06:20] Core required: FahCore_a2.exe
[05:06:20] Core found.
[05:06:20] Working on queue slot 06 [April 24 05:06:20 UTC]
[05:06:20] + Working ...
[05:06:20] 
[05:06:20] *------------------------------*
[05:06:20] Folding@Home Gromacs SMP Core
[05:06:20] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[05:06:20] 
[05:06:20] Preparing to commence simulation
[05:06:20] - Ensuring status. Please wait.
[05:06:29] - Looking at optimizations...
[05:06:29] - Working with standard loops on this execution.
[05:06:29] - Files status OK
[05:06:32] - Expanded 4829574 -> 23976217 (decompressed 496.4 percent)
[05:06:32] Called DecompressByteArray: compressed_data_size=4829574 data_size=23976217, decompressed_data_size=23976217 diff=0
[05:06:32] - Digital signature verified
[05:06:32] 
[05:06:32] Project: 2669 (Run 8, Clone 150, Gen 122)
[05:06:32] 
[05:06:32] Entering M.D.
[05:06:43] Completed 0 out of 250000 steps  (0%)
[05:21:16] Completed 2500 out of 250000 steps  (1%)
[05:35:47] Completed 5000 out of 250000 steps  (2%)
[05:50:16] Completed 7500 out of 250000 steps  (3%)

...

[16:43:17] Completed 120000 out of 250000 steps  (48%)
[16:57:49] Completed 122500 out of 250000 steps  (49%)
[17:08:58] CoreStatus = 1 (1)
[17:08:58] Sending work to server
[17:08:58] Project: 2669 (Run 8, Clone 150, Gen 122)
[17:08:58] - Error: Could not get length of results file work/wuresults_06.dat
[17:08:58] - Error: Could not read unit 06 file. Removing from queue.
[17:08:58] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[17:08:58] Cleaning up work directory

Folding@Home Client Shutdown.
2nd: Project: 2669 (Run 9, Clone 109, Gen 75)

Code:

--- Opening Log file [April 24 17:45:07 UTC] 


# Mac OS X SMP Console Edition ################################################
###############################################################################

                       Folding@Home Client Version 6.24R1

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /Users/jheimann/Library/Folding@home
Executable: ./fah6
Arguments: -queue 

[17:45:07] - Ask before connecting: No
[17:45:07] - User name: Joe_H (Team 38910)
[17:45:07] - User ID: 74A191A36AA6AED3
[17:45:07] - Machine ID: 1
[17:45:07] 
[17:45:07] Loaded queue successfully.
[17:45:07] 
[17:45:07] + Processing work unit
[17:45:07] At least 4 processors must be requested; read 1.
[17:45:07] Core required: FahCore_a2.exe
[17:45:07] Core found.
[17:45:07] Working on queue slot 07 [April 24 17:45:07 UTC]
[17:45:07] + Working ...
[17:45:07] 
[17:45:07] *------------------------------*
[17:45:07] Folding@Home Gromacs SMP Core
[17:45:07] Version 2.07 (Sun Apr 19 14:29:51 PDT 2009)
[17:45:07] 
[17:45:07] Preparing to commence simulation
[17:45:07] - Ensuring status. Please wait.
[17:45:17] - Looking at optimizations...
[17:45:17] - Working with standard loops on this execution.
[17:45:17] - Files status OK
[17:45:20] - Expanded 4839370 -> 23982265 (decompressed 495.5 percent)
[17:45:20] Called DecompressByteArray: compressed_data_size=4839370 data_size=23982265, decompressed_data_size=23982265 diff=0
[17:45:21] - Digital signature verified
[17:45:21] 
[17:45:21] Project: 2669 (Run 9, Clone 109, Gen 75)
[17:45:21] 
[17:45:22] Entering M.D.
[17:45:28] Using Gromacs checkpoints
[17:45:35] Resuming from checkpoint
[17:45:36] Verified work/wudata_07.log
[17:45:36] Verified work/wudata_07.trr
[17:45:36] Verified work/wudata_07.xtc
[17:45:36] Verified work/wudata_07.edr
[17:46:10] Completed 2500 out of 250000 steps  (1%)
[18:00:39] Completed 5000 out of 250000 steps  (2%)
[18:15:11] Completed 7500 out of 250000 steps  (3%)

...

[17:12:59] Completed 245000 out of 250000 steps  (98%)
[17:27:28] Completed 247500 out of 250000 steps  (99%)
[17:41:58] Completed 250000 out of 250000 steps  (100%)
[17:42:00] DynamicWrapper: Finished Work Unit: sleep=10000
[17:42:10] 
[17:42:10] Finished Work Unit:
[17:42:10] - Reading up to 21122496 from "work/wudata_07.trr": Read 21122496
[17:42:10] trr file hash check passed.
[17:42:10] - Reading up to 4392612 from "work/wudata_07.xtc": Read 4392612
[17:42:10] xtc file hash check passed.
[17:42:10] - Checksum of file (work/wudata_07.edr) read from disk doesn't match
[17:42:10] 
[17:42:10] Folding@home Core Shutdown: FILE_IO_ERROR
[17:45:35] CoreStatus = 64 (100)
[17:45:35] Sending work to server
[17:45:35] Project: 2669 (Run 9, Clone 109, Gen 75)
[17:45:35] - Error: Could not get length of results file work/wuresults_07.dat
[17:45:35] - Error: Could not read unit 07 file. Removing from queue.
[17:45:35] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[17:45:35] Cleaning up work directory

Folding@Home Client Shutdown.
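For anyone comparing logs like the two above, the lines that matter are the core shutdown reason, the `CoreStatus` code, and the `Error:` lines that follow. A minimal sketch for pulling just those out of a client log, assuming the line formats seen in this thread (the function name `scan_log` and the regular expressions are illustrative guesses based only on these excerpts, not on any client documentation):

```python
import re

# Patterns inferred from the log excerpts in this thread (an assumption,
# not an official description of the client's log format).
CORE_STATUS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")
SHUTDOWN = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Folding@home Core Shutdown: (\w+)")
ERROR_LINE = re.compile(r"^\[(\d\d:\d\d:\d\d)\] (?:- )?Error: (.*)$")


def scan_log(lines):
    """Return (timestamp, kind, detail) tuples for status, shutdown and error lines."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = CORE_STATUS.match(line)
        if m:
            # Keep both the hex code and the decimal value as printed.
            events.append((m.group(1), "status", f"{m.group(2)} ({m.group(3)})"))
            continue
        m = SHUTDOWN.match(line)
        if m:
            events.append((m.group(1), "shutdown", m.group(2)))
            continue
        m = ERROR_LINE.match(line)
        if m:
            events.append((m.group(1), "error", m.group(2)))
    return events
```

Feeding it an open log file, for example `scan_log(open(path))` with `path` pointing at the client log, gives a short list of events that is easier to post than the full log. It does not interpret the status codes; it only collects them.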

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
toTOW
Site Moderator
Posts: 6312
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 2669 - 2 Units any credit?

Post by toTOW »

Project: 2669 (Run 8, Clone 150, Gen 122)
Project: 2669 (Run 9, Clone 109, Gen 75)

Both WUs have been completed successfully by other donors.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Joe_H

Re: Project: 2669 - 2 Units any credit?

Post by Joe_H »

Oh well, that is what I thought might happen. That kills my point accumulation for last week, but hopefully the queue is now clean on that machine. I will see in another 12 hours or so.

bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 2669 - 2 Units any credit?

Post by bruce »

Is your machine unstable? Have you looked into any procedures which might avoid errors like this in the future? It's hard to see a connection between the two errors unless something other than FAH is manipulating the work files. Perhaps you have an antivirus program that is quarantining some of FAH's files. NOD32 has been reported guilty of that. (It might be fixed in the latest version.) It's also pretty easy to exclude the FAH work directory from virus scans, and that applies to any AV program.
Joe_H wrote:
[16:43:17] Completed 120000 out of 250000 steps (48%)
[16:57:49] Completed 122500 out of 250000 steps (49%)
[17:08:58] CoreStatus = 1 (1)
[17:08:58] Sending work to server
[17:08:58] Project: 2669 (Run 8, Clone 150, Gen 122)
[17:08:58] - Error: Could not get length of results file work/wuresults_06.dat
[17:08:58] - Error: Could not read unit 06 file. Removing from queue.
[17:08:58] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit...
[17:08:58] Cleaning up work directory

Folding@Home Client Shutdown.
[17:42:10] - Checksum of file (work/wudata_07.edr) read from disk doesn't match
[17:42:10]
[17:42:10] Folding@home Core Shutdown: FILE_IO_ERROR
[17:45:35] CoreStatus = 64 (100)
[17:45:35] Sending work to server
[17:45:35] Project: 2669 (Run 9, Clone 109, Gen 75)
[17:45:35] - Error: Could not get length of results file work/wuresults_07.dat
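One rough way to test the "something other than FAH is touching the work files" idea is to hash the work directory while the client is stopped, wait a while, and hash it again. The sketch below is only an illustration under that assumption: the helper names `snapshot` and `diff_snapshots` are made up here, and the comparison is meaningless while the client is running, since the core rewrites its own files.

```python
import hashlib
from pathlib import Path


def snapshot(work_dir):
    """Map each file under work_dir (relative name) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(Path(work_dir).rglob("*")):
        if path.is_file():
            name = str(path.relative_to(work_dir))
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before, after):
    """Report files that disappeared, appeared, or changed between two snapshots."""
    return {
        "removed": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

With the client shut down, take `before = snapshot("work")`, leave the machine alone for a few hours, then compare against a second snapshot. Any entry under "removed" or "changed" would point at another process modifying the files; an empty result would make that explanation less likely.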
Joe_H

Re: Project: 2669 - 2 Units any credit?

Post by Joe_H »

This machine has been stable up until now, running the 6.20 client for dozens of units, but I recently uninstalled that and installed the 6.24 client. NOD32 is not an issue; this is a Mac running OS X. No AV software is currently running, and I have done some other checks to see that the system has not been compromised by any malware. I suspect that the first unit failing the way it did may have corrupted the work queue and caused problems when the second unit was started 12 hours later and finished in 24 hours.

I will know more in another 30-45 minutes; the current unit is at 98% and should finish then. Running qfix does appear to have cleaned up the queue on this machine. As for the rest, I am still learning all the ins and outs of the version 6 console client. Up until about 6 months ago I was just running version 5 on a couple of older PowerPC Macs. Some of the postings in the forums here have been very helpful in getting things fixed and running. While I have been learning, I have been limiting the client on this Intel Mac to runs with the "-oneunit" flag.
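The finish-time guess above can be read straight off the progress lines in the log. A small sketch, assuming the "Completed N out of M steps" format shown in the logs in this thread (the function name `estimate_remaining` is illustrative, and the pace is taken from the last two progress lines only):

```python
import re
from datetime import datetime, timedelta

# Progress line format as it appears in the logs posted in this thread.
PROGRESS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")


def estimate_remaining(lines):
    """Estimate the time left from the last two progress lines, or None if unknown."""
    points = []
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2)), int(m.group(3))))
    if len(points) < 2:
        return None
    (t0, s0, _), (t1, s1, total) = points[-2], points[-1]
    if s1 <= s0:
        return None  # no forward progress between the two lines
    elapsed = t1 - t0
    if elapsed < timedelta(0):
        # Log timestamps carry no date; assume one midnight rollover.
        elapsed += timedelta(days=1)
    per_step = elapsed / (s1 - s0)
    return per_step * (total - s1)
```

In the second log above, each 1% (2500 steps) took roughly 14.5 minutes, so 2% remaining works out to about half an hour at that pace, in line with the 30-45 minute estimate.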
