Client suspended work for another task?

Moderators: Site Moderators, FAHC Science Team

mcr42
Posts: 3
Joined: Fri Mar 27, 2020 6:40 am

Client suspended work for another task?

Post by mcr42 »

Hi all,

I have had a client running for a few days now.
All is working well so far, but for a few days I have had an unfinished task (14 minutes to go) that the client stopped working on in favor of a new work unit.
As far as I remember, that was before the Timeout was reached, but I'm not certain.


Update:

The log reads:

Code: Select all

******************************* Date: 2020-03-26 *******************************
11:57:05:WU02:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
12:11:40:WU02:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
12:26:10:WU02:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
12:41:02:WU02:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
12:41:02:WU01:FS01:Connecting to 65.254.110.245:8080
12:41:03:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
...
12:42:04:WU01:FS01:Connecting to 18.218.241.186:80
12:42:04:WU01:FS01:Assigned to work server 140.163.4.231
12:42:04:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK106 [GeForce GTX 765M] from 140.163.4.231
12:42:04:WU01:FS01:Connecting to 140.163.4.231:8080
12:42:25:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:42:25:WU01:FS01:Connecting to 140.163.4.231:80
12:44:46:WU01:FS01:Downloading 7.85MiB
12:44:52:WU01:FS01:Download 30.26%
12:44:58:WU01:FS01:Download 66.10%
12:45:03:WU01:FS01:Download complete
12:45:03:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11750 run:0 clone:1142 gen:10 core:0x22 unit:0x000000198ca304e75e6a801f85923962

12:54:10:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
12:54:10:WARNING:WU02:FS01:FahCore returned: WU_STALLED (127 = 0x7f)

12:54:10:WU01:FS01:Starting
12:54:10:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\mcr\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 2152 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
12:54:10:WU01:FS01:Started FahCore on PID 8692
12:54:10:WU01:FS01:Core PID:6080
12:54:10:WU01:FS01:FahCore 0x22 started
12:54:11:WU01:FS01:0x22:*********************** Log Started 2020-03-26T12:54:10Z ***********************

Funny thing is, the client seems to have asked for more work before finishing the job, and the core crashed after that.
It then started working on the new task, so a BSOD is not the cause. (Thus, I'm not updating the title as proposed.)

Meanwhile, however, the client has finished the newer task and seems to have returned to the old one.
Update: Yep, confirmed, it finished the old task and published the results.
Last edited by mcr42 on Fri Mar 27, 2020 10:09 am, edited 4 times in total.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Client suspended work for another task?

Post by bruce »

Could this be due to a blue screen reboot? Sure, it might be. I'd go to Log and click Warnings and Errors.

You need to fix whatever is causing the BSOD crashes. Is your system getting too hot? ... is it overclocked? ... is it time to blow the dust out of your cooling air path? ... or is it simply time to upgrade your machine?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Client suspended work for another task?

Post by bruce »

Your title isn't very descriptive. I'd change it to something like: :shock: BSOD Corrupted a WorkUnit.
mcr42
Posts: 3
Joined: Fri Mar 27, 2020 6:40 am

Re: Client suspended work for another task?

Post by mcr42 »

No no, the BSOD was just an occurrence. I have had this laptop running for years now without a BSOD (except for the BT/Wifi driver crashing on Suspend/Resume, which I have not triggered since I started FAH).
FAH has been running on it for more than a week now, without any BSOD or other problems. So the BSOD is not the problem.
The Warnings and Errors view does show a lot of download errors, and the core crashed only 2 or 3 times.
I have now reduced Folding Power from Medium to Light and will watch closely.
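
For anyone who wants to tally those warnings outside the client, here is a minimal Python sketch. It assumes the warning line layout seen in the log posted earlier in this thread; that ERROR lines follow the same layout is my assumption, and the helper names `classify` and `summarize` are made up for illustration, not part of FAHClient.

```python
import re
from collections import Counter

# Sample lines copied from the log earlier in the thread.
SAMPLE = """\
12:41:02:WU02:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
12:41:03:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
12:42:25:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
12:54:10:WARNING:WU02:FS01:FahCore returned an unknown error code which probably indicates that it crashed
12:54:10:WARNING:WU02:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
"""

# HH:MM:SS:WARNING:WUnn:FSnn:message (the ERROR variant is an assumption).
WARNING = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WARNING|ERROR):(?:WU\d+:)?(?:FS\d+:)?(.*)$"
)

def classify(message):
    """Sort a warning message into a rough, hand-picked bucket."""
    if "Failed to get assignment" in message or "connection failed" in message:
        return "download/connection"
    if "FahCore returned" in message:
        return "core"
    return "other"

def summarize(log_text):
    """Count warning/error lines per bucket; other lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING.match(line.strip())
        if match:
            counts[classify(match.group(2))] += 1
    return counts

if __name__ == "__main__":
    for bucket, count in summarize(SAMPLE).items():
        print(bucket, count)
```

Note that a single core crash produces two "FahCore returned" lines in this log, so the "core" count is a count of lines, not of crashes.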

The question remains: should the client fetch new work while there is an unfinished task (which might time out in the meantime)?
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Client suspended work for another task?

Post by Neil-B »

IIRC the default setting is that a new WU is requested and downloaded when progress reaches 99%. This allows minimal overlap but has the WU ready for when the previous one finishes, which keeps the CPU/GPU nicely loaded with only a slight, short dip on the meters. For my kit that usually leaves a window of less than 60 seconds for a crash to occur; on slower kit it may be a few minutes or so (look at your TPF figure). I'm not sure whether it is possible, or even advisable, to change the setting to wait until 100%, but I have seen it set the other way (to, say, 90%) to cover slow download speeds, so it might be doable.

As to your question: I never have stability issues or crashes, so I prefer having a WU ready and waiting. That way my CPUs don't cool off between WUs, which minimises wear and tear from heat cycling. One TPF (1% of a WU) is easily long enough for me to download the next WU while minimising the risk of a failure once it has been received. So I am happy to say yes, it should happen, though from your perspective I can see why you might question it.
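
To put a number on that window for the machine in question, the TPF can be estimated from the progress lines in the log posted earlier in this thread. The following is a rough Python sketch, not anything FAHClient provides; the function names are made up, and it assumes all timestamps fall on the same day (no midnight rollover).

```python
import re
from datetime import datetime, timedelta

# Progress lines copied from the log earlier in the thread.
LOG = """\
11:57:05:WU02:FS01:0x22:Completed 1920000 out of 2000000 steps (96%)
12:11:40:WU02:FS01:0x22:Completed 1940000 out of 2000000 steps (97%)
12:26:10:WU02:FS01:0x22:Completed 1960000 out of 2000000 steps (98%)
12:41:02:WU02:FS01:0x22:Completed 1980000 out of 2000000 steps (99%)
"""

PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x\w+:"
    r"Completed \d+ out of \d+ steps \((\d+)%\)"
)

def frame_times(log_text):
    """Return (percent, timestamp) pairs for every progress line."""
    frames = []
    for line in log_text.splitlines():
        match = PROGRESS.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            frames.append((int(match.group(2)), stamp))
    return frames

def average_tpf(log_text):
    """Average time per 1% frame across consecutive progress lines."""
    frames = frame_times(log_text)
    deltas = [
        (later[1] - earlier[1]) / (later[0] - earlier[0])
        for earlier, later in zip(frames, frames[1:])
    ]
    return sum(deltas, timedelta()) / len(deltas)

if __name__ == "__main__":
    print("Average TPF:", average_tpf(LOG))
```

On these four lines the frames took 14:35, 14:30 and 14:52, so the last 1% on this GPU is a window of roughly a quarter of an hour, which fits the "14 minutes to go" in the first post.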
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

mcr42
Posts: 3
Joined: Fri Mar 27, 2020 6:40 am

Re: Client suspended work for another task?

Post by mcr42 »

Thanks, that's a good explanation, I didn't think of that.
I didn't intend to touch any of the controls (didn't know there are any); I was just curious why this might have happened.
Thanks.

Accepted answer, Problem solved, Thread can be closed.
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Client suspended work for another task?

Post by Joe_H »

Yes, that is a good explanation. The download option defaults to 99% and can be set between 90% and 100%. I would have to look up the exact name, as I have not used it recently. Being on DSL, having an upload overlap with a download as a result of setting it to 100% became a problem when the WU download sizes increased.
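
As a toy illustration of that rule (not the client's actual code), the decision can be sketched in Python like this. The names `should_request_next` and `validate_threshold` are hypothetical; only the 99% default and the 90-100% range come from the description above.

```python
# Values taken from the description above; everything else is illustrative.
MIN_PCT, MAX_PCT, DEFAULT_PCT = 90, 100, 99

def validate_threshold(pct):
    """Reject thresholds outside the 90-100% range described above."""
    if not MIN_PCT <= pct <= MAX_PCT:
        raise ValueError(
            f"threshold must be between {MIN_PCT} and {MAX_PCT}, got {pct}"
        )
    return pct

def should_request_next(progress_pct, threshold=DEFAULT_PCT):
    """True once the running WU has reached the download threshold."""
    return progress_pct >= validate_threshold(threshold)

if __name__ == "__main__":
    for progress in (97, 98, 99, 100):
        print(progress, should_request_next(progress))
```

With the default, the request fires as soon as 99% is reached, which matches the log at the top of the thread: the 99% line and the first "Connecting to" line both carry the timestamp 12:41:02.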

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3