ETA 10 days on RX550: Project 13416 run 1140 clone 293 gen 1

Posted: Mon Jul 20, 2020 1:12 pm
by GregC
This WU is estimated to finish in 10 days.

https://apps.foldingathome.org/wu#proje ... =293&gen=1

I have stable hardware, but I saved off the WU and gave up on it after 6%.
TPF is a reliable 2 hours and 22 minutes for each percent marker.
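For reference, the 10-day figure follows directly from that TPF. Here is a minimal sketch of the arithmetic, assuming every percent takes the same time; the helper name `eta_from_tpf` is made up for illustration and is not part of FAHClient.

```python
from datetime import timedelta

def eta_from_tpf(tpf: timedelta, percent_done: float = 0.0) -> timedelta:
    """Time still needed if every remaining percent takes `tpf` (assumption)."""
    remaining_percent = 100.0 - percent_done
    return tpf * remaining_percent

# TPF reported above: 2 hours 22 minutes per percent marker
tpf = timedelta(hours=2, minutes=22)
print(eta_from_tpf(tpf))  # 9 days, 20:40:00
```

So a constant 2 h 22 min per percent works out to just under 10 days for the whole WU.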

I'm re-posting from Discord; the extended discussion, with screen captures and thoughts, is here:
https://discordapp.com/channels/5738706 ... 4667077644

There's a more serious issue: I'm able to drop WUs with the server being none the wiser. That is also discussed in that thread.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Mon Jul 20, 2020 1:31 pm
by Neil-B
The Work Server will queue any WU not returned by the Timeout for reassignment. People dump WUs for a variety of reasons, some more valid than others. Reissuing at Timeout is the approach FAH has used to date; if dumping becomes a major issue, that may change.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Tue Jul 21, 2020 2:21 pm
by bruce
Dumped WU should also be reported automatically, depending on how you dump it.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Tue Jul 21, 2020 11:22 pm
by GregC
So how should I dump this WU then?

Code:

10:52:42:WU00:FS01:Starting
10:52:42:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\GregC\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7664 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
10:52:42:WU00:FS01:Started FahCore on PID 5604
10:52:42:WU00:FS01:Core PID:4304
10:52:42:WU00:FS01:FahCore 0x22 started
10:52:43:WU00:FS01:0x22:*********************** Log Started 2020-07-21T10:52:42Z ***********************
10:52:43:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
10:52:43:WU00:FS01:0x22:       Core: Core22
10:52:43:WU00:FS01:0x22:       Type: 0x22
10:52:43:WU00:FS01:0x22:    Version: 0.0.11
10:52:43:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
10:52:43:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
10:52:43:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:49:16
10:52:43:WU00:FS01:0x22:   Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
10:52:43:WU00:FS01:0x22:     Branch: core22-0.0.11
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
10:52:43:WU00:FS01:0x22:             <peastman@stanford.edu>
10:52:43:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 5604 -checkpoint 15
10:52:43:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
10:52:43:WU00:FS01:0x22:************************************ libFAH ************************************
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:47:12
10:52:43:WU00:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
10:52:43:WU00:FS01:0x22:     Branch: HEAD
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:************************************ CBang *************************************
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:46:11
10:52:43:WU00:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
10:52:43:WU00:FS01:0x22:     Branch: master
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:************************************ System ************************************
10:52:43:WU00:FS01:0x22:        CPU: AMD Ryzen 3 3100 4-Core Processor
10:52:43:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
10:52:43:WU00:FS01:0x22:       CPUs: 8
10:52:43:WU00:FS01:0x22:     Memory: 15.93GiB
10:52:43:WU00:FS01:0x22:Free Memory: 13.80GiB
10:52:43:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
10:52:43:WU00:FS01:0x22: OS Version: 6.2
10:52:43:WU00:FS01:0x22:Has Battery: false
10:52:43:WU00:FS01:0x22: On Battery: false
10:52:43:WU00:FS01:0x22: UTC Offset: -5
10:52:43:WU00:FS01:0x22:        PID: 4304
10:52:43:WU00:FS01:0x22:        CWD: C:\Users\GregC\AppData\Roaming\FAHClient\work
10:52:43:WU00:FS01:0x22:********************************************************************************
10:52:43:WU00:FS01:0x22:Project: 13418 (Run 382, Clone 17, Gen 3)
10:52:43:WU00:FS01:0x22:Unit: 0x0000000512bc7d9a5f128297cf05ec49
10:52:43:WU00:FS01:0x22:Digital signatures verified
10:52:43:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:52:43:WU00:FS01:0x22:Version 0.0.11
10:52:43:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
10:52:43:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
10:52:43:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
10:52:43:WU00:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
10:53:07:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
******************************* Date: 2020-07-21 *******************************
18:27:35:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
21:36:54:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
******************************* Date: 2020-07-21 *******************************

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 1:37 am
by bruce
You're running Project: 13418 (Run 382, Clone 17, Gen 3), not Project 13416 run 1140 clone 293 gen 1. Which WU do you need to dump?

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 2:44 am
by GregC
I need to dump 13418 (Run 382, Clone 17, Gen 3). The 13416/1140/293/1 was dumped a little while back.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 2:48 am
by bruce
How much progress have you made while FAHClient is estimating 10 days? That estimate is VERY poor until you've completed the first 5% or more.
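One way to check the estimate independently is to measure the time between percent markers in the log. The following is a rough sketch, not how FAHClient computes its ETA: the function names are hypothetical, the sample lines are the progress lines posted earlier in this thread, and it assumes the slot was not paused and that at most one midnight passes between two markers.

```python
import re
from datetime import timedelta

# Matches progress lines such as
# "15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)"
PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):.*Completed \d+ out of \d+ steps \((\d+)%\)"
)

def frame_times(log_lines):
    """Return (percent, time since previous marker) for each progress line."""
    frames = []
    previous = None
    for line in log_lines:
        match = PROGRESS.match(line)
        if not match:
            continue
        hours, minutes, seconds, percent = map(int, match.groups())
        now = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        if previous is not None:
            delta = now - previous
            if delta < timedelta(0):
                delta += timedelta(days=1)  # assumption: a single midnight wrap
            frames.append((percent, delta))
        previous = now
    return frames

def naive_eta(log_lines):
    """Mean observed frame time multiplied by the percent still to do."""
    frames = frame_times(log_lines)
    if not frames:
        return None
    mean = sum((delta for _, delta in frames), timedelta(0)) / len(frames)
    return mean * (100 - frames[-1][0])

sample = [
    "10:53:07:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)",
    "15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)",
    "18:27:35:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)",
    "21:36:54:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)",
]
for percent, delta in frame_times(sample):
    print(percent, delta)
print(naive_eta(sample))
```

In that sample the first percent took noticeably longer than the next two, which is the kind of early skew that makes an estimate taken after only a few percent unreliable.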

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 2:53 am
by GregC
Here's what it looks like from GPU-Z's GPU load perspective.
The nearly-no-load section on the left represents the WU I just dumped.
The all-the-way-load section on the right represents the newly downloaded WU.
http://gpuz.techpowerup.com/20/07/22/qdp.png

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 12:20 pm
by GregC
As I've mentioned, I've retained the WU folders and am willing to assist in debugging the issue.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 3:56 pm
by bruce
The new WU seems to be working well. I'm not sure you need help, at this point ... it looks like you solved it.

If WU00 is still present on your system, pause folding. Open FAHData and inside of /work delete 00. Resume folding and it should take care of itself.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Wed Jul 22, 2020 5:17 pm
by GregC
The point of this posting is to highlight the simple fact that there are many WUs that leave me no choice but to drop them. This gives no opportunity for developers to address the issue, as my dropping the WUs doesn't register on the WU Status page. I now have a few such WUs sitting on my desktop, awaiting action from devs. And no, I didn't delete them.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Thu Jul 23, 2020 1:12 am
by _r2w_ben
Are you running CPU folding as well? Some of the slower 1341x work units use more CPU time on AMD GPUs. Increasing the priority of FahCore_22.exe or reducing the number of cores allocated to a CPU slot may improve performance.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Thu Jul 23, 2020 2:06 am
by GregC
I have FAHCore_A22 at below normal priority, and FAHCore_A07 at idle. Only 6 of 8 threads are assigned.

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Thu Jul 23, 2020 2:11 am
by GregC
This GPU slot can and does complete WUs. It just finished a WU.

Code:

22:56:39:WU00:FS01:0x22:Completed 830000 out of 1000000 steps (83%)
******************************* Date: 2020-07-22 *******************************
etc.etc.
01:46:22:WU00:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
01:57:38:WU00:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
01:57:39:WU02:FS01:Connecting to assign1.foldingathome.org:80
01:57:39:WU02:FS01:Assigned to work server 18.188.125.154
01:57:39:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:Baffin [Polaris11] from 18.188.125.154
01:57:39:WU02:FS01:Connecting to 18.188.125.154:8080
01:57:39:WU02:FS01:Downloading 7.04MiB
01:57:41:WU02:FS01:Download complete
01:57:41:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13416 run:618 clone:285 gen:1 core:0x22 unit:0x0000000212bc7d9a5f0f8f5cb2de1d31
02:08:53:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
02:08:53:WU00:FS01:0x22:Average performance: 25.6 ns/day
02:08:59:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
02:08:59:WU00:FS01:0x22:Saving result file checkpointState.xml.bz2
02:08:59:WU00:FS01:0x22:Saving result file globals.csv
02:08:59:WU00:FS01:0x22:Saving result file positions.xtc
02:08:59:WU00:FS01:0x22:Saving result file science.log
02:08:59:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
02:09:00:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:09:00:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13418 run:730 clone:75 gen:3 core:0x22 unit:0x0000000812bc7d9a5f12828ba73e2d0a
02:09:00:WU00:FS01:Uploading 5.69MiB to 18.188.125.154
02:09:00:WU00:FS01:Connecting to 18.188.125.154:8080
02:09:00:WU02:FS01:Starting
02:09:00:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\GregC\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 706 -lifeline 6984 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:09:00:WU02:FS01:Started FahCore on PID 1408
02:09:00:WU02:FS01:Core PID:5164
02:09:00:WU02:FS01:FahCore 0x22 started
02:09:00:WU02:FS01:0x22:*********************** Log Started 2020-07-23T02:09:00Z ***********************
02:09:00:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:09:00:WU02:FS01:0x22:       Core: Core22
02:09:00:WU02:FS01:0x22:       Type: 0x22
02:09:00:WU02:FS01:0x22:    Version: 0.0.11
02:09:00:WU02:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:09:00:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
02:09:00:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:49:16
02:09:00:WU02:FS01:0x22:   Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
02:09:00:WU02:FS01:0x22:     Branch: core22-0.0.11
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
02:09:00:WU02:FS01:0x22:             <peastman@stanford.edu>
02:09:00:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1408 -checkpoint 15
02:09:00:WU02:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:09:00:WU02:FS01:0x22:************************************ libFAH ************************************
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:47:12
02:09:00:WU02:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
02:09:00:WU02:FS01:0x22:     Branch: HEAD
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:************************************ CBang *************************************
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:46:11
02:09:00:WU02:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
02:09:00:WU02:FS01:0x22:     Branch: master
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:************************************ System ************************************
02:09:00:WU02:FS01:0x22:        CPU: AMD Ryzen 3 3100 4-Core Processor
02:09:00:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
02:09:00:WU02:FS01:0x22:       CPUs: 8
02:09:00:WU02:FS01:0x22:     Memory: 15.93GiB
02:09:00:WU02:FS01:0x22:Free Memory: 13.73GiB
02:09:00:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
02:09:00:WU02:FS01:0x22: OS Version: 6.2
02:09:00:WU02:FS01:0x22:Has Battery: false
02:09:00:WU02:FS01:0x22: On Battery: false
02:09:00:WU02:FS01:0x22: UTC Offset: -5
02:09:00:WU02:FS01:0x22:        PID: 5164
02:09:00:WU02:FS01:0x22:        CWD: C:\Users\GregC\AppData\Roaming\FAHClient\work
02:09:00:WU02:FS01:0x22:********************************************************************************
02:09:00:WU02:FS01:0x22:Project: 13416 (Run 618, Clone 285, Gen 1)
02:09:00:WU02:FS01:0x22:Unit: 0x0000000212bc7d9a5f0f8f5cb2de1d31
02:09:00:WU02:FS01:0x22:Reading tar file core.xml
02:09:00:WU02:FS01:0x22:Reading tar file integrator.xml
02:09:00:WU02:FS01:0x22:Reading tar file state.xml.bz2
02:09:00:WU02:FS01:0x22:Reading tar file system.xml.bz2
02:09:00:WU02:FS01:0x22:Digital signatures verified
02:09:00:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:09:00:WU02:FS01:0x22:Version 0.0.11
02:09:00:WU02:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
02:09:00:WU02:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
02:09:00:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
02:09:00:WU02:FS01:0x22:  Global context and integrator variables write interval: 2500 steps (0.25%) [400 total]
02:09:06:WU00:FS01:Upload 62.59%
02:09:09:WU00:FS01:Upload complete
02:09:09:WU00:FS01:Server responded WORK_ACK (400)
02:09:09:WU00:FS01:Final credit estimate, 68278.00 points
02:09:09:WU00:FS01:Cleaning up
02:09:24:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Posted: Thu Jul 23, 2020 3:58 am
by bruce
GregC wrote: This GPU slot can and does complete WUs. It just finished a WU.

So what do you want from us? Should we call this "case closed"?