16434 (run:232 clone:3 gen:13) WORK_QUIT

legalalien
Posts: 4
Joined: Fri May 15, 2020 3:43 am

16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by legalalien »

Just received "Server responded WORK_QUIT (404)" followed by "Server did not like results, dumping" ... there goes several days of GPU work :( The GPU is an EVGA Nvidia GT 710 2GB, no overclocking.

I suspect it may have something to do with CORE_OUTDATED received in the middle of uploading the results:

***************************** Date: 2020-05-15 *******************************
00:52:48:WU02:FS01:0x22:Completed 2475000 out of 2500000 steps (99%)
00:52:49:WU01:FS01:Connecting to 65.254.110.245:80
00:52:49:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
00:52:49:WU01:FS01:Connecting to 18.218.241.186:80
00:52:50:WU01:FS01:Assigned to work server 206.223.170.146
00:52:50:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK208 [GeForce GT 710 LP] from 206.223.170.146
00:52:50:WU01:FS01:Connecting to 206.223.170.146:8080
00:52:59:WU01:FS01:Downloading 35.98MiB
00:53:05:WU01:FS01:Download 44.64%
00:53:11:WU01:FS01:Download 79.73%
00:53:13:WU01:FS01:Download complete
00:53:13:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14251 run:114 clone:0 gen:11 core:0x22 unit:0x00000013cedfaa925eab0323e1367802
02:18:53:WU02:FS01:0x22:Completed 2500000 out of 2500000 steps (100%)
02:19:26:WU02:FS01:0x22:Saving result file ../logfile_01.txt
02:19:26:WU02:FS01:0x22:Saving result file checkpointState.xml
02:19:26:WU02:FS01:0x22:Saving result file checkpt.crc
02:19:26:WU02:FS01:0x22:Saving result file positions.xtc
02:19:27:WU02:FS01:0x22:Saving result file science.log
02:19:27:WU02:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
02:19:27:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:19:27:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16434 run:232 clone:3 gen:13 core:0x22 unit:0x0000001503854c135e9cbacc8fa1f2ea
02:19:27:WU02:FS01:Uploading 104.38MiB to 3.133.76.19
02:19:27:WU02:FS01:Connecting to 3.133.76.19:8080
02:19:27:WU01:FS01:Starting
02:19:27:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 982 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -tmax=80 -twait=900
02:19:27:WU01:FS01:Started FahCore on PID 18857
02:19:27:WU01:FS01:Core PID:18861
02:19:27:WU01:FS01:FahCore 0x22 started
02:19:28:WARNING:WU01:FS01:FahCore returned: CORE_OUTDATED (110 = 0x6e)
02:19:28:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
02:19:28:WU01:FS01:Connecting to cores.foldingathome.org:80
02:19:29:WU01:FS01:FahCore 22: Downloading 3.59MiB
02:19:29:WU01:FS01:FahCore 22: Download complete
02:19:29:WU01:FS01:Valid core signature
02:19:29:WU01:FS01:Unpacked 9.32MiB to cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22
02:19:30:WU01:FS01:Starting
02:19:30:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 982 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -tmax=80 -twait=900
02:19:30:WU01:FS01:Started FahCore on PID 18864
02:19:30:WU01:FS01:Core PID:18868
02:19:30:WU01:FS01:FahCore 0x22 started
02:19:30:WU01:FS01:0x22:*********************** Log Started 2020-05-15T02:19:30Z ***********************
02:19:30:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:19:30:WU01:FS01:0x22:       Type: 0x22
02:19:30:WU01:FS01:0x22:       Core: Core22
02:19:30:WU01:FS01:0x22:    Website: https://foldingathome.org/
02:19:30:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:19:30:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:19:30:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
02:19:30:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 18864 -checkpoint 15
02:19:30:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
02:19:30:WU01:FS01:0x22:             0 -gpu 0 -tmax=80 -twait=900
02:19:30:WU01:FS01:0x22:     Config: <none>
02:19:30:WU01:FS01:0x22:************************************ Build *************************************
02:19:30:WU01:FS01:0x22:    Version: 0.0.5
02:19:30:WU01:FS01:0x22:       Date: Apr 22 2020
02:19:30:WU01:FS01:0x22:       Time: 03:57:11
02:19:30:WU01:FS01:0x22: Repository: Git
02:19:30:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:19:30:WU01:FS01:0x22:     Branch: HEAD
02:19:30:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
02:19:30:WU01:FS01:0x22:    Options: -std=c++11 -O3 -funroll-loops
02:19:30:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
02:19:30:WU01:FS01:0x22:       Bits: 64
02:19:30:WU01:FS01:0x22:       Mode: Release
02:19:30:WU01:FS01:0x22:************************************ System ************************************
02:19:30:WU01:FS01:0x22:        CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
02:19:30:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 15 Model 67 Stepping 3
02:19:30:WU01:FS01:0x22:       CPUs: 2
02:19:30:WU01:FS01:0x22:     Memory: 7.77GiB
02:19:30:WU01:FS01:0x22:Free Memory: 3.36GiB
02:19:30:WU01:FS01:0x22:    Threads: POSIX_THREADS
02:19:30:WU01:FS01:0x22: OS Version: 5.4
02:19:30:WU01:FS01:0x22:Has Battery: false
02:19:30:WU01:FS01:0x22: On Battery: false
02:19:30:WU01:FS01:0x22: UTC Offset: -5
02:19:30:WU01:FS01:0x22:        PID: 18868
02:19:30:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
02:19:30:WU01:FS01:0x22:         OS: Linux 5.4.0-29-generic x86_64
02:19:30:WU01:FS01:0x22:    OS Arch: AMD64
02:19:30:WU01:FS01:0x22:********************************************************************************
02:19:30:WU01:FS01:0x22:Project: 14251 (Run 114, Clone 0, Gen 11)
02:19:30:WU01:FS01:0x22:Unit: 0x00000013cedfaa925eab0323e1367802
02:19:30:WU01:FS01:0x22:Reading tar file core.xml
02:19:30:WU01:FS01:0x22:Reading tar file integrator.xml
02:19:30:WU01:FS01:0x22:Reading tar file state.xml
02:19:32:WU01:FS01:0x22:Reading tar file system.xml
02:19:33:WU01:FS01:0x22:Digital signatures verified
02:19:33:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:19:33:WU01:FS01:0x22:Version 0.0.5
02:21:38:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
02:21:38:WU02:FS01:Connecting to 3.133.76.19:80
02:21:45:WU02:FS01:Upload 0.06%
02:22:35:WU01:FS01:0x22:Completed 0 out of 500000 steps (0%)
02:22:36:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
02:22:43:WU02:FS01:Upload 0.12%
02:22:44:WARNING:WU02:FS01:Exception: Failed to send results to work server: Transfer failed
02:22:44:WU02:FS01:Trying to send results to collection server
02:22:44:WU02:FS01:Uploading 104.38MiB to 3.21.157.11
02:22:44:WU02:FS01:Connecting to 3.21.157.11:8080
02:22:50:WU02:FS01:Upload 11.50%
02:22:56:WU02:FS01:Upload 22.33%
02:23:02:WU02:FS01:Upload 32.33%
02:23:08:WU02:FS01:Upload 42.21%
02:23:14:WU02:FS01:Upload 52.57%
02:23:20:WU02:FS01:Upload 63.41%
02:23:26:WU02:FS01:Upload 75.14%
02:23:32:WU02:FS01:Upload 86.40%
02:23:38:WU02:FS01:Upload 98.97%
02:23:39:WU02:FS01:Upload complete
02:23:39:WU02:FS01:Server responded WORK_QUIT (404)
02:23:39:WARNING:WU02:FS01:Server did not like results, dumping
02:23:39:WU02:FS01:Cleaning up
Thoughts?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by bruce »

No. The CORE_OUTDATED event had nothing to do with the server's rejection of the WU.

You're talking about project:16434 run:232 clone:3 gen:13 which was also identified as WU02:FS01. It was rejected by the Collection Server.
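For anyone wanting to check this on their own log, here is a minimal sketch that separates the interleaved lines by their WU tag. The regular expression and the helper name `group_by_wu` are assumptions inferred only from the log lines quoted in this thread, not from any official description of the log format.

```python
import re
from collections import defaultdict

# Prefix shape seen in the excerpt above: "HH:MM:SS:[WARNING:]WUnn:FSnn:message".
# This pattern is a guess based on this thread only.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(?:(WARNING):)?(WU\d+):(FS\d+):(.*)$")

def group_by_wu(lines):
    """Return {WU tag: [(time, level, message), ...]} for lines that carry a WU tag."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            time, level, wu, _slot, msg = m.groups()
            groups[wu].append((time, level or "INFO", msg))
    return groups

# Four lines copied from the log posted above.
sample = [
    "02:19:27:WU02:FS01:Connecting to 3.133.76.19:8080",
    "02:19:28:WARNING:WU01:FS01:FahCore returned: CORE_OUTDATED (110 = 0x6e)",
    "02:23:39:WU02:FS01:Server responded WORK_QUIT (404)",
    "02:23:39:WARNING:WU02:FS01:Server did not like results, dumping",
]

groups = group_by_wu(sample)
for wu in sorted(groups):
    for time, level, msg in groups[wu]:
        print(wu, time, level, msg)
```

Grouped this way, the CORE_OUTDATED line lands under WU01 (the newly downloaded unit), while the upload and the WORK_QUIT response both sit under WU02.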

Perhaps it had expired. You would have to look back in the previous logs to find when that WU was assigned and downloaded.

I happen to have personal experience with a GT710, which is a very slow GPU. It is very difficult to complete assignments before they expire.
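Looking back through older logs for the assignment can be sketched roughly as below. It assumes each log contains a "Date:" banner line before the timestamps it applies to and a "Received Unit:" line per download, as in the excerpts in this thread; the function name `find_assignment` and both patterns are illustrative assumptions, not part of the client.

```python
import re
from datetime import datetime

# Banner line such as "***** Date: 2020-05-08 *****" (shape taken from this thread).
DATE_RE = re.compile(r"\*+ Date: (\d{4}-\d{2}-\d{2}) \*+")
# "Received Unit" line carrying project/run/clone/gen.
RECEIVED_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Received Unit:.*"
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)

def find_assignment(lines, prcg):
    """Return the datetime at which the WU identified by prcg was received, or None."""
    current_date = None
    for line in lines:
        line = line.strip()
        banner = DATE_RE.search(line)
        if banner:
            current_date = banner.group(1)
            continue
        m = RECEIVED_RE.match(line)
        if m and tuple(int(x) for x in m.groups()[1:]) == prcg:
            if current_date is None:
                return None  # no date banner seen yet, so the day is unknown
            return datetime.strptime(f"{current_date} {m.group(1)}", "%Y-%m-%d %H:%M:%S")
    return None

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python find_wu.py log-20200508.txt
    with open(sys.argv[1], errors="replace") as fh:
        print(find_assignment(fh, (16434, 232, 3, 13)))
```

If the client has rotated its logs, the banner for the relevant day may be in a different file than the one being scanned, so the function returns None rather than guessing a date.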
legalalien
Posts: 4
Joined: Fri May 15, 2020 3:43 am

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by legalalien »

The WU was definitely completed before the deadline; I've been watching it very carefully.

ETA: Yes, I realize that the core was updated when the next WU started; I was wondering if core restarting in the middle of the completed unit submission had some effect on the submission itself.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by bruce »

When a WU is not returned quickly, duplicates are issued. Assuming your result was error-free, it should still have been accepted, even if a duplicate was completed before you uploaded your result. I'll have to check to see if that happened.

According to the public records, two people returned it (not counting you). One had an error and the other completed it successfully.

Your WU was rejected at 02:23:39, presumably on 2020-05-15 (i.e. 2020-05-15T02:23:39).
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by bruce »

The successful WU was issued at 2020-05-09T06:28:32, returned at 2020-05-09T18:18:46, and credited at 2020-05-09T18:25:42.
Your WU was rejected at 2020-05-15T02:23:39.
Unfortunately I can't tell when your WU was assigned to you. Can you find that date/time?
legalalien
Posts: 4
Joined: Fri May 15, 2020 3:43 am

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by legalalien »

bruce wrote:The successful WU was issued at 2020-05-09T06:28:32, returned at 2020-05-09T18:18:46, and credited at 2020-05-09T18:25:42.
Your WU was rejected at 2020-05-15T02:23:39.
Unfortunately I can't tell when your WU was assigned to you. Can you find that date/time?
Back on May 8, it seems:

******************************* Date: 2020-05-08 *******************************
18:37:35:WU01:FS01:0x22:Completed 960000 out of 1000000 steps (96%)
19:27:11:WU01:FS01:0x22:Completed 970000 out of 1000000 steps (97%)
20:16:56:WU01:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
20:18:44:WU00:FS00:0xa7:Completed 975000 out of 1250000 steps (78%)
21:05:50:WU01:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
21:05:50:WU02:FS01:Connecting to 65.254.110.245:80
21:05:50:WARNING:WU02:FS01:Failed to get assignment from '65.254.110.245:80': No WUs available for this configuration
21:05:50:WU02:FS01:Connecting to 18.218.241.186:80
21:05:51:WU02:FS01:Assigned to work server 3.133.76.19
21:05:51:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK208 [GeForce GT 710 LP] from 3.133.76.19
21:05:51:WU02:FS01:Connecting to 3.133.76.19:8080
21:07:08:WU02:FS01:Downloading 67.19MiB
21:07:14:WU02:FS01:Download 60.00%
21:07:17:WU02:FS01:Download complete
21:07:17:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:16434 run:232 clone:3 gen:13 core:0x22 unit:0x0000001503854c135e9cbacc8fa1f2ea
Like you said earlier, GT 710 is a slow card by today's standards, but I was watching the progress very closely and the unit finished about 20 hours ahead of the expiration deadline (taking into account that the times are in UTC). It usually doesn't cut it *that* close, but indeed struggles to complete units before the "timeout".
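The timestamps quoted in this thread can be turned into elapsed times with a few lines of Python. This is only a sketch of the arithmetic: the four timestamps come from the posts above, the helper name `elapsed` is made up for illustration, and nothing here establishes the project's actual timeout or deadline, which is not stated in the thread.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"

def elapsed(start, end):
    """Time between two ISO-style timestamps, both taken to be in the same timezone."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

# Received here at 2020-05-08T21:07:17, rejected at 2020-05-15T02:23:39.
mine = elapsed("2020-05-08T21:07:17", "2020-05-15T02:23:39")
# The credited copy: issued 2020-05-09T06:28:32, returned 2020-05-09T18:18:46.
other = elapsed("2020-05-09T06:28:32", "2020-05-09T18:18:46")

print(mine)   # 6 days, 5:16:22
print(other)  # 11:50:14
```

Note that the first figure runs to the moment of rejection; the log shows the unit itself finishing a few minutes earlier, at 02:19:27.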

<off-topic>
bruce wrote:According to the public records, two people returned it (not counting you). One had an error and the other completed it successfully.
This is an old Linux box that I had resurrected to fold, with the idea of it sitting quietly in a corner and not being used for anything else other than looking for a possible cure. Based on the above, even if my client returns complete WUs, it is likely that someone else has already finished the same WU. I'll have to rethink whether it's a rational use of electricity, as much as I would like to help.

I'm sure this question has been asked before .. will go do some soul-googling with the morning coffee.
</off-topic>
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by _r2w_ben »

legalalien wrote:This is an old Linux box that I had resurrected to fold, with the idea of it sitting quietly in a corner and not being used for anything else other than looking for a possible cure. Based on the above, even if my client returns complete WUs, it is likely that someone else has already finished the same WU. I'll have to rethink whether it's a rational use of electricity, as much as I would like to help.
A lot of COVID19 projects run on the CPU so you can still contribute. Removing the GPU slot would let the CPU slot use both cores.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by bruce »

I have a variety of GPUs, including one GT710. (It's the only GPU that will fit in that slot.) FAH is doing some WU/GPU optimization studies and they are distributing some small proteins to slow GPUs. (They haven't said anything about the proposed deadlines, though.) I hope they do manage to send my GT710 WUs where it can do some good and avoid burdening it with big WUs where it can't help. I'll be very happy to have the GPU idle a lot of the time, just displaying my desktop the rest of the time.
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by JohnChodera »

Oh no! I'm so sorry this happened.

> I was wondering if core restarting in the middle of the completed unit submission had some effect on the submission itself.

I don't think this should impact things.

We're about to roll out a bunch of COVID Moonshot projects that work well for older GPUs and run in just a few hours. Hopefully this will help---we very much value these contributions!

~ John Chodera // MSKCC
legalalien
Posts: 4
Joined: Fri May 15, 2020 3:43 am

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by legalalien »

JohnChodera wrote:Oh no! I'm so sorry this happened.

We're about to roll out a bunch of COVID Moonshot projects that work well for older GPUs and run in just a few hours. Hopefully this will help---we very much value these contributions!

~ John Chodera // MSKCC
Revisiting the old subject, in case someone runs into this thread...not sure if the new (smaller) projects have been released, or if the new '13' core with support for CUDA is making a difference, but now my old GT710 is cranking through WUs in 3 hours or less, well before the timeout deadline. :D
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: 16434 (run:232 clone:3 gen:13) WORK_QUIT

Post by PantherX »

The CUDA optimizations on Nvidia GPUs do provide a decent speed-up, anywhere from ~15% to 100% depending on the simulation type. Traditional simulations are towards the ~15% end, while the free energy calculations (primarily used in Moonshot projects) are towards the 100% end. For more details, you can read the blog post: https://foldingathome.org/2020/09/28/fo ... a-support/
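As a rough illustration of what those percentages mean for run time, here is a small sketch. It assumes "speed-up of X%" means throughput rises by X%, so run time is divided by (1 + X/100); the 6-hour baseline is a made-up number for illustration, not a measurement from this thread.

```python
def runtime_after_speedup(baseline_hours, speedup_fraction):
    """Run time after a throughput increase of speedup_fraction (0.15 means 15%)."""
    return baseline_hours / (1.0 + speedup_fraction)

# Hypothetical 6-hour WU at both ends of the quoted range.
for s in (0.15, 1.00):
    print(f"{s:.0%} speed-up: {runtime_after_speedup(6.0, s):.2f} h")
# 15% speed-up: 5.22 h
# 100% speed-up: 3.00 h
```

Under that reading, a 100% speed-up halves the run time, while a 15% speed-up trims it by about 13%.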