What really happens when you delete a WU?

Frogging101
Posts: 85
Joined: Wed Mar 25, 2020 2:39 am
Location: Canada

What really happens when you delete a WU?

Post by Frogging101 »

On March 26th, I "dumped" a WU by pausing the slot and deleting the directory /var/lib/fahclient/work/NN where NN was the WU number.

This is what happened when I unpaused:

04:03:34:FS00:Unpaused
04:03:34:WU00:FS00:Starting
04:03:34:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 26801 -checkpoint 15 -np 8
04:03:34:WU00:FS00:Started FahCore on PID 1047
04:03:34:WU00:FS00:Core PID:1051
04:03:34:WU00:FS00:FahCore 0xa7 started
04:03:35:WU00:FS00:0xa7:*********************** Log Started 2020-03-26T04:03:34Z ***********************
04:03:35:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
04:03:35:WU00:FS00:0xa7:       Type: 0xa7
04:03:35:WU00:FS00:0xa7:       Core: Gromacs
04:03:35:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 1047 -checkpoint 15 -np 8
04:03:35:WU00:FS00:0xa7:************************************ CBang *************************************
04:03:35:WU00:FS00:0xa7:       Date: Nov 5 2019
04:03:35:WU00:FS00:0xa7:       Time: 06:06:57
04:03:35:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
04:03:35:WU00:FS00:0xa7:     Branch: master
04:03:35:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
04:03:35:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
04:03:35:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
04:03:35:WU00:FS00:0xa7:       Bits: 64
04:03:35:WU00:FS00:0xa7:       Mode: Release
04:03:35:WU00:FS00:0xa7:************************************ System ************************************
04:03:35:WU00:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
04:03:35:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
04:03:35:WU00:FS00:0xa7:       CPUs: 8
04:03:35:WU00:FS00:0xa7:     Memory: 19.48GiB
04:03:35:WU00:FS00:0xa7:Free Memory: 1.24GiB
04:03:35:WU00:FS00:0xa7:    Threads: POSIX_THREADS
04:03:35:WU00:FS00:0xa7: OS Version: 5.5
04:03:35:WU00:FS00:0xa7:Has Battery: false
04:03:35:WU00:FS00:0xa7: On Battery: false
04:03:35:WU00:FS00:0xa7: UTC Offset: -4
04:03:35:WU00:FS00:0xa7:        PID: 1051
04:03:35:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
04:03:35:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
04:03:35:WU00:FS00:0xa7:    Version: 0.0.18
04:03:35:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:03:35:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
04:03:35:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
04:03:35:WU00:FS00:0xa7:       Date: Nov 5 2019
04:03:35:WU00:FS00:0xa7:       Time: 06:13:26
04:03:35:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
04:03:35:WU00:FS00:0xa7:     Branch: master
04:03:35:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
04:03:35:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
04:03:35:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
04:03:35:WU00:FS00:0xa7:       Bits: 64
04:03:35:WU00:FS00:0xa7:       Mode: Release
04:03:35:WU00:FS00:0xa7:************************************ Build *************************************
04:03:35:WU00:FS00:0xa7:       SIMD: avx_256
04:03:35:WU00:FS00:0xa7:********************************************************************************
04:03:35:WU00:FS00:0xa7:ERROR:Failed to open '../wudata_01.dat': Failed to open '../wudata_01.dat': No such file or directory: iostream error: No such file or directory
04:03:35:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
04:03:35:WU00:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
04:03:35:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:03:35:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14311 run:4 clone:14 gen:23 core:0xa7 unit:0x0000001c0002894b5df2b6f7900cebb5
04:03:35:WU00:FS00:Uploading 4.00KiB to 155.247.166.219
04:03:35:WU00:FS00:Connecting to 155.247.166.219:8080
04:03:35:WU00:FS00:Upload complete
04:03:35:WU00:FS00:Server responded WORK_QUIT (404)
04:03:35:WARNING:WU00:FS00:Server did not like results, dumping
04:03:35:WU00:FS00:Cleaning up
According to this, the client successfully uploaded a FAULTY result to the server, which I think should allow the server to reassign the WU immediately. But the server responded with WORK_QUIT instead of the usual WORK_ACK that I see for other errors, and the WU Status page does not list any result from my client: https://apps.foldingathome.org/wu#proje ... =14&gen=23

So this would lead me to believe that the server rejected my result, my client turfed the WU, and the WU is dead until it times out. Is that correct? And why did the server not accept the FAULTY result caused by the core error? Is it because the data in the work folder is necessary to authenticate the result?
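
For comparing this case against other failed WUs, the three log lines that matter (the core's exit status, the result being sent, and the server's reply) can be pulled out with a few regular expressions. This is only a rough sketch: the patterns are inferred from the log lines quoted above, not from any documented log format, and the `summarize` function and its dictionary keys are made-up names.

```python
import re

# Patterns inferred from the quoted log lines; the real format may vary between client versions.
CORE_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")
SEND_RE = re.compile(r"Sending unit results: (.+)$")
SERVER_RE = re.compile(r"Server responded (\w+) \((\d+)\)")


def summarize(log_text):
    """Collect the core exit status, the reported result, and the server reply."""
    summary = {}
    for line in log_text.splitlines():
        if m := CORE_RE.search(line):
            summary["core_status"] = m.group(1)
            summary["core_code"] = int(m.group(2))
        elif m := SEND_RE.search(line):
            # The payload is a run of space-separated key:value pairs.
            pairs = (p.split(":", 1) for p in m.group(1).split() if ":" in p)
            fields = dict(pairs)
            summary["result_error"] = fields.get("error")
            summary["project"] = fields.get("project")
        elif m := SERVER_RE.search(line):
            summary["server_response"] = m.group(1)
            summary["server_code"] = int(m.group(2))
    return summary


# Three lines copied from the log above.
SAMPLE = """\
04:03:35:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:03:35:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14311 run:4 clone:14 gen:23 core:0xa7 unit:0x0000001c0002894b5df2b6f7900cebb5
04:03:35:WU00:FS00:Server responded WORK_QUIT (404)
"""

print(summarize(SAMPLE))
```

Running the same function over a log where the server answered WORK_ACK would make the difference between the two cases easy to spot side by side.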

I know that the official way to abandon a WU is to use the --dump argument to FAHClient. However, this requires any running FAHClient process to be stopped first; otherwise you get "Exception: Error executing: 'PRAGMA synchronous=NORMAL': database is locked". It is also not available through --send-command or telnet.
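
The PRAGMA in that error message is SQLite syntax, so presumably the client keeps its state in an SQLite database that the running process holds open. That is an assumption on my part, not something confirmed about FAHClient's internals. Under that assumption, the same kind of error can be reproduced with Python's sqlite3 module: one connection stands in for the running client and holds a lock, and a second connection stands in for the separate --dump process. The file name and table below are made up for the demonstration, and the second connection attempts a write transaction rather than the exact PRAGMA from the error.

```python
import os
import sqlite3
import tempfile


def try_second_writer():
    """Return the error a second connection sees while the first holds a lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "client.db")  # hypothetical file name

        # First connection: stands in for the running client process.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE units (id INTEGER PRIMARY KEY)")
        holder.execute("BEGIN EXCLUSIVE")  # take the lock and keep it

        # Second connection: stands in for the separate dump process.
        # timeout=0 makes it fail immediately instead of waiting for the lock.
        intruder = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            intruder.execute("BEGIN IMMEDIATE")
            intruder.execute("ROLLBACK")
            message = "no error"
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            intruder.close()
            holder.execute("ROLLBACK")
            holder.close()
        return message


print(try_second_writer())
```

If the assumption holds, this would explain why --dump only works once every other FAHClient process has released the database.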

So dumping a WU "properly" with --dump pretty much requires the client (and any other slots running on it) to be stopped. One can avoid disrupting other slots by pausing the offending slot, deleting the WU's work directory, and unpausing it, but this apparently delays the WU until it times out (bad for the project).

I don't really have a point to make here; I'm just curious about the internals, and I could not find much previous discussion of this.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: What really happens when you delete a WU?

Post by davidcoton »

AIUI WORK_QUIT will trigger immediate reassignment, but will not credit any points (i.e., the server knows you dumped it). Returning a faulty WU will get you partial credit.
Frogging101

Re: What really happens when you delete a WU?

Post by Frogging101 »

davidcoton wrote: AIUI WORK_QUIT will trigger immediate reassignment, but will not credit any points (i.e., the server knows you dumped it). Returning a faulty WU will get you partial credit.
That's the bit I'm not sure about. I think when you dump a WU the official way (with --dump), you still get WORK_ACK and it shows up on the WU status page as "Dumped". It will be for 0 points, but it will still show up. Same with FAULTY results (but not this one, apparently).

After WORK_QUIT the client says "Server did not like results, dumping". To me this implies that the server rejected the result. I don't mean that the server knows there was an error and the WU was not completed; I mean that it rejected and discarded the entire submission, error report included. As far as the server is concerned, that upload was invalid and never happened, and my client still has the WU.
davidcoton

Re: What really happens when you delete a WU?

Post by davidcoton »

You may be right. I don't know exactly how WORK_QUIT is handled by the server. Dumping by any means is discouraged, but sometimes it is necessary.
Frogging101

Re: What really happens when you delete a WU?

Post by Frogging101 »

It would be interesting to hear some insight from someone with knowledge of the internals.
davidcoton

Re: What really happens when you delete a WU?

Post by davidcoton »

You might be waiting a long time. The message is (AFAICT) a recent addition; only one guy knows how it works, and he is rather busy (understatement).