ETA 10 days on RX550: Project 13416 run 1140 clone 293 gen 1

GregC
Posts: 24
Joined: Wed May 20, 2020 12:36 am
Location: Houston, TX

Post by GregC »

This WU is estimated to finish in 10 days.

https://apps.foldingathome.org/wu#proje ... =293&gen=1

My hardware is stable, but I saved a copy of the WU and gave up on it after 6%.
TPF (time per frame) held steady at 2 hours 22 minutes per percent marker.
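
As a rough check on the 10-day figure, the arithmetic can be sketched as follows. This assumes 100 frames per WU (one per percent, as the log below reports) and that every remaining frame takes the same 2 h 22 min; `eta_from_tpf` is a name made up for this illustration, not part of FAHClient.

```python
from datetime import timedelta

def eta_from_tpf(tpf: timedelta, percent_done: int = 0, frames: int = 100) -> timedelta:
    """Remaining run time, assuming each remaining 1% frame takes `tpf`."""
    return tpf * (frames - percent_done)

tpf = timedelta(hours=2, minutes=22)          # TPF quoted in this post
whole_wu = eta_from_tpf(tpf)                  # time for all 100 frames
remaining = eta_from_tpf(tpf, percent_done=6) # time left at the 6% mark

print(whole_wu)   # 9 days, 20:40:00
print(remaining)  # 9 days, 6:28:00
```

So a steady 2 h 22 min TPF works out to just under 10 days for the whole WU, which matches the client's estimate.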

I'm re-posting from Discord, where there is an extended discussion with screen captures and thoughts:
https://discordapp.com/channels/5738706 ... 4667077644

There's a more serious issue: I'm able to drop WUs without the server being any the wiser. That is discussed in the Discord thread as well.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

The Work Server will queue any WU not returned by the Timeout for reassignment. People dump WUs for a variety of reasons, some more valid than others. Reissuing at Timeout is the approach FAH has used to date; if dumping becomes a major issue, that may change.
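
To make the reissue-at-Timeout rule concrete, here is a minimal sketch. It is not the Work Server's actual code: the function name, the fields, and the one-day timeout are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def needs_reassignment(assigned_at: datetime, timeout: timedelta,
                       now: datetime, returned: bool = False) -> bool:
    """True if a WU was not returned and its Timeout has passed (hypothetical rule)."""
    return (not returned) and now >= assigned_at + timeout

# Hypothetical values: a WU assigned at 10:52 with an assumed 1-day timeout.
assigned = datetime(2020, 7, 21, 10, 52)
timeout = timedelta(days=1)

print(needs_reassignment(assigned, timeout, now=datetime(2020, 7, 21, 20, 0)))  # False
print(needs_reassignment(assigned, timeout, now=datetime(2020, 7, 22, 11, 0)))  # True
```

Under this rule a silently dumped WU looks the same as a slow one until the Timeout passes, which is why the server does not notice the dump any earlier.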
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

A dumped WU should also be reported automatically, depending on how you dump it.

Post by GregC »

So how should I dump this WU then?

Code:

10:52:42:WU00:FS01:Starting
10:52:42:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\GregC\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 7664 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
10:52:42:WU00:FS01:Started FahCore on PID 5604
10:52:42:WU00:FS01:Core PID:4304
10:52:42:WU00:FS01:FahCore 0x22 started
10:52:43:WU00:FS01:0x22:*********************** Log Started 2020-07-21T10:52:42Z ***********************
10:52:43:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
10:52:43:WU00:FS01:0x22:       Core: Core22
10:52:43:WU00:FS01:0x22:       Type: 0x22
10:52:43:WU00:FS01:0x22:    Version: 0.0.11
10:52:43:WU00:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
10:52:43:WU00:FS01:0x22:  Copyright: 2020 foldingathome.org
10:52:43:WU00:FS01:0x22:   Homepage: https://foldingathome.org/
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:49:16
10:52:43:WU00:FS01:0x22:   Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
10:52:43:WU00:FS01:0x22:     Branch: core22-0.0.11
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
10:52:43:WU00:FS01:0x22:             <peastman@stanford.edu>
10:52:43:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 5604 -checkpoint 15
10:52:43:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
10:52:43:WU00:FS01:0x22:************************************ libFAH ************************************
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:47:12
10:52:43:WU00:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
10:52:43:WU00:FS01:0x22:     Branch: HEAD
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:************************************ CBang *************************************
10:52:43:WU00:FS01:0x22:       Date: Jun 26 2020
10:52:43:WU00:FS01:0x22:       Time: 19:46:11
10:52:43:WU00:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
10:52:43:WU00:FS01:0x22:     Branch: master
10:52:43:WU00:FS01:0x22:   Compiler: Visual C++ 2015
10:52:43:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
10:52:43:WU00:FS01:0x22:   Platform: win32 10
10:52:43:WU00:FS01:0x22:       Bits: 64
10:52:43:WU00:FS01:0x22:       Mode: Release
10:52:43:WU00:FS01:0x22:************************************ System ************************************
10:52:43:WU00:FS01:0x22:        CPU: AMD Ryzen 3 3100 4-Core Processor
10:52:43:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
10:52:43:WU00:FS01:0x22:       CPUs: 8
10:52:43:WU00:FS01:0x22:     Memory: 15.93GiB
10:52:43:WU00:FS01:0x22:Free Memory: 13.80GiB
10:52:43:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
10:52:43:WU00:FS01:0x22: OS Version: 6.2
10:52:43:WU00:FS01:0x22:Has Battery: false
10:52:43:WU00:FS01:0x22: On Battery: false
10:52:43:WU00:FS01:0x22: UTC Offset: -5
10:52:43:WU00:FS01:0x22:        PID: 4304
10:52:43:WU00:FS01:0x22:        CWD: C:\Users\GregC\AppData\Roaming\FAHClient\work
10:52:43:WU00:FS01:0x22:********************************************************************************
10:52:43:WU00:FS01:0x22:Project: 13418 (Run 382, Clone 17, Gen 3)
10:52:43:WU00:FS01:0x22:Unit: 0x0000000512bc7d9a5f128297cf05ec49
10:52:43:WU00:FS01:0x22:Digital signatures verified
10:52:43:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:52:43:WU00:FS01:0x22:Version 0.0.11
10:52:43:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
10:52:43:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
10:52:43:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
10:52:43:WU00:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
10:53:07:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)
15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
******************************* Date: 2020-07-21 *******************************
18:27:35:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
21:36:54:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)
******************************* Date: 2020-07-21 *******************************

Post by bruce »

You're running Project: 13418 (Run 382, Clone 17, Gen 3), not Project 13416 run 1140 clone 293 gen 1. Which WU do you need to dump?
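
A mismatch like this can be spotted mechanically by pulling the Project/Run/Clone/Gen line out of the core log and comparing it with the WU being reported. The sketch below assumes the line format seen in the log above; `PRCG` and `prcg_from_log_line` are names made up for this example.

```python
import re
from typing import NamedTuple, Optional

class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int

# Format assumed from the log above, e.g. "Project: 13418 (Run 382, Clone 17, Gen 3)"
_PRCG_LINE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)

def prcg_from_log_line(line: str) -> Optional[PRCG]:
    """Extract the PRCG tuple from a core log line, or None if the line has none."""
    m = _PRCG_LINE.search(line)
    return PRCG(*map(int, m.groups())) if m else None

log_line = "10:52:43:WU00:FS01:0x22:Project: 13418 (Run 382, Clone 17, Gen 3)"
reported = PRCG(13416, 1140, 293, 1)   # the WU named in the thread title
running = prcg_from_log_line(log_line)

print(running)              # PRCG(project=13418, run=382, clone=17, gen=3)
print(running == reported)  # False
```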

Post by GregC »

I need to dump 13418 (Run 382, Clone 17, Gen 3). The 13416/1140/293/1 was dumped a little while back.

Post by bruce »

How much progress have you made while FAHClient is estimating 10 days? That estimate is VERY poor until you've completed the first 5% or more.
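
One way to answer that from the log itself is to measure the time between consecutive "Completed ... steps" markers. The sketch below is only an illustration: it assumes the line format shown in the log posted above, and that less than 24 hours pass between markers, since the timestamps carry no date. `frame_times` is a made-up helper name.

```python
import re
from datetime import timedelta

# Matches progress lines such as
# "15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)"
_PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed \d+ out of \d+ steps \((\d+)%\)"
)

def frame_times(lines):
    """Return [(percent, time since the previous progress marker), ...]."""
    out, prev = [], None
    for line in lines:
        m = _PROGRESS.match(line)
        if not m:
            continue  # skip date separators and other log lines
        h, mi, s, pct = map(int, m.groups())
        t = timedelta(hours=h, minutes=mi, seconds=s)
        if prev is not None:
            delta = t - prev
            if delta < timedelta(0):      # clock wrapped past midnight
                delta += timedelta(days=1)
            out.append((pct, delta))
        prev = t
    return out

log = [
    "10:53:07:WU00:FS01:0x22:Completed 0 out of 1000000 steps (0%)",
    "15:17:21:WU00:FS01:0x22:Completed 10000 out of 1000000 steps (1%)",
    "******************************* Date: 2020-07-21 *******************************",
    "18:27:35:WU00:FS01:0x22:Completed 20000 out of 1000000 steps (2%)",
    "21:36:54:WU00:FS01:0x22:Completed 30000 out of 1000000 steps (3%)",
]

frames = frame_times(log)
for pct, took in frames:
    print(f"{pct}%: {took}")

# Rough ETA: mean frame time so far, times the frames still to do.
mean_tpf = sum((took for _, took in frames), timedelta()) / len(frames)
print("rough ETA:", mean_tpf * (100 - frames[-1][0]))
```

On the log above, the first frame took 4:24:14 and the next two took 3:10:14 and 3:09:19, so an estimate based on the first frame alone would overstate the remaining time.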

Post by GregC »

Here's what it looks like in GPU-Z, from a GPU load perspective.
The near-zero load on the left represents the WU I just dumped.
The full load on the right represents the newly downloaded WU.
http://gpuz.techpowerup.com/20/07/22/qdp.png

Post by GregC »

As I mentioned, I've kept the WU folders and am willing to help debug the issue.

Post by bruce »

The new WU seems to be working well. I'm not sure you need help, at this point ... it looks like you solved it.

If WU00 is still present on your system, pause folding. Open FAHData and, inside /work, delete the 00 folder. Resume folding and it should take care of itself.
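
The delete step could be scripted roughly as below. This is a sketch only: `remove_wu_dir` is a made-up helper, it defaults to a dry run, and it does not pause or resume the client, so folding still has to be paused by hand first. The work folder path must be supplied by the caller; the log earlier in the thread shows it as the core's CWD.

```python
import shutil
from pathlib import Path

def remove_wu_dir(work_dir: Path, wu_id: str = "00", dry_run: bool = True) -> bool:
    """Delete work_dir/wu_id. Returns False if that folder does not exist.

    With dry_run=True (the default) nothing is deleted; the target is only printed.
    """
    target = Path(work_dir) / wu_id
    if not target.is_dir():
        return False
    if dry_run:
        print(f"would delete {target}")
        return True
    shutil.rmtree(target)
    return True

# Example call (path taken from the CWD line in the log; adjust for your system):
# remove_wu_dir(Path(r"C:\Users\GregC\AppData\Roaming\FAHClient\work"), "00", dry_run=False)
```

Keeping the dry run as the default is a deliberate choice here, since a deleted WU folder cannot be recovered or handed to the developers afterwards.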

Post by GregC »

The point of this posting is to highlight the simple fact that there are many WUs that leave me no choice but to drop them. This gives no opportunity for developers to address the issue, as my dropping the WUs doesn't register on the WU Status page. I now have a few such WUs sitting on my desktop, awaiting action from devs. And no, I didn't delete them.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Post by _r2w_ben »

Are you running CPU folding as well? Some of the slower 1341x work units use more CPU time on AMD GPUs. Increasing the priority of FahCore_22.exe or reducing the number of cores allocated to a CPU slot may improve performance.

Post by GregC »

I have FahCore_22 at below-normal priority and FahCore_a7 at idle. Only 6 of 8 threads are assigned.

Post by GregC »

This GPU slot can and does complete WUs. It just finished a WU.

Code:

22:56:39:WU00:FS01:0x22:Completed 830000 out of 1000000 steps (83%)
******************************* Date: 2020-07-22 *******************************
etc.etc.
01:46:22:WU00:FS01:0x22:Completed 980000 out of 1000000 steps (98%)
01:57:38:WU00:FS01:0x22:Completed 990000 out of 1000000 steps (99%)
01:57:39:WU02:FS01:Connecting to assign1.foldingathome.org:80
01:57:39:WU02:FS01:Assigned to work server 18.188.125.154
01:57:39:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:Baffin [Polaris11] from 18.188.125.154
01:57:39:WU02:FS01:Connecting to 18.188.125.154:8080
01:57:39:WU02:FS01:Downloading 7.04MiB
01:57:41:WU02:FS01:Download complete
01:57:41:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13416 run:618 clone:285 gen:1 core:0x22 unit:0x0000000212bc7d9a5f0f8f5cb2de1d31
02:08:53:WU00:FS01:0x22:Completed 1000000 out of 1000000 steps (100%)
02:08:53:WU00:FS01:0x22:Average performance: 25.6 ns/day
02:08:59:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
02:08:59:WU00:FS01:0x22:Saving result file checkpointState.xml.bz2
02:08:59:WU00:FS01:0x22:Saving result file globals.csv
02:08:59:WU00:FS01:0x22:Saving result file positions.xtc
02:08:59:WU00:FS01:0x22:Saving result file science.log
02:08:59:WU00:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
02:09:00:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:09:00:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:13418 run:730 clone:75 gen:3 core:0x22 unit:0x0000000812bc7d9a5f12828ba73e2d0a
02:09:00:WU00:FS01:Uploading 5.69MiB to 18.188.125.154
02:09:00:WU00:FS01:Connecting to 18.188.125.154:8080
02:09:00:WU02:FS01:Starting
02:09:00:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\GregC\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 706 -lifeline 6984 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:09:00:WU02:FS01:Started FahCore on PID 1408
02:09:00:WU02:FS01:Core PID:5164
02:09:00:WU02:FS01:FahCore 0x22 started
02:09:00:WU02:FS01:0x22:*********************** Log Started 2020-07-23T02:09:00Z ***********************
02:09:00:WU02:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:09:00:WU02:FS01:0x22:       Core: Core22
02:09:00:WU02:FS01:0x22:       Type: 0x22
02:09:00:WU02:FS01:0x22:    Version: 0.0.11
02:09:00:WU02:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:09:00:WU02:FS01:0x22:  Copyright: 2020 foldingathome.org
02:09:00:WU02:FS01:0x22:   Homepage: https://foldingathome.org/
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:49:16
02:09:00:WU02:FS01:0x22:   Revision: 22010df8a4db48db1b35d33e666b64d8ce48689d
02:09:00:WU02:FS01:0x22:     Branch: core22-0.0.11
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
02:09:00:WU02:FS01:0x22:             <peastman@stanford.edu>
02:09:00:WU02:FS01:0x22:       Args: -dir 02 -suffix 01 -version 706 -lifeline 1408 -checkpoint 15
02:09:00:WU02:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:09:00:WU02:FS01:0x22:************************************ libFAH ************************************
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:47:12
02:09:00:WU02:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
02:09:00:WU02:FS01:0x22:     Branch: HEAD
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:************************************ CBang *************************************
02:09:00:WU02:FS01:0x22:       Date: Jun 26 2020
02:09:00:WU02:FS01:0x22:       Time: 19:46:11
02:09:00:WU02:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
02:09:00:WU02:FS01:0x22:     Branch: master
02:09:00:WU02:FS01:0x22:   Compiler: Visual C++ 2015
02:09:00:WU02:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
02:09:00:WU02:FS01:0x22:   Platform: win32 10
02:09:00:WU02:FS01:0x22:       Bits: 64
02:09:00:WU02:FS01:0x22:       Mode: Release
02:09:00:WU02:FS01:0x22:************************************ System ************************************
02:09:00:WU02:FS01:0x22:        CPU: AMD Ryzen 3 3100 4-Core Processor
02:09:00:WU02:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
02:09:00:WU02:FS01:0x22:       CPUs: 8
02:09:00:WU02:FS01:0x22:     Memory: 15.93GiB
02:09:00:WU02:FS01:0x22:Free Memory: 13.73GiB
02:09:00:WU02:FS01:0x22:    Threads: WINDOWS_THREADS
02:09:00:WU02:FS01:0x22: OS Version: 6.2
02:09:00:WU02:FS01:0x22:Has Battery: false
02:09:00:WU02:FS01:0x22: On Battery: false
02:09:00:WU02:FS01:0x22: UTC Offset: -5
02:09:00:WU02:FS01:0x22:        PID: 5164
02:09:00:WU02:FS01:0x22:        CWD: C:\Users\GregC\AppData\Roaming\FAHClient\work
02:09:00:WU02:FS01:0x22:********************************************************************************
02:09:00:WU02:FS01:0x22:Project: 13416 (Run 618, Clone 285, Gen 1)
02:09:00:WU02:FS01:0x22:Unit: 0x0000000212bc7d9a5f0f8f5cb2de1d31
02:09:00:WU02:FS01:0x22:Reading tar file core.xml
02:09:00:WU02:FS01:0x22:Reading tar file integrator.xml
02:09:00:WU02:FS01:0x22:Reading tar file state.xml.bz2
02:09:00:WU02:FS01:0x22:Reading tar file system.xml.bz2
02:09:00:WU02:FS01:0x22:Digital signatures verified
02:09:00:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:09:00:WU02:FS01:0x22:Version 0.0.11
02:09:00:WU02:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
02:09:00:WU02:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
02:09:00:WU02:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
02:09:00:WU02:FS01:0x22:  Global context and integrator variables write interval: 2500 steps (0.25%) [400 total]
02:09:06:WU00:FS01:Upload 62.59%
02:09:09:WU00:FS01:Upload complete
02:09:09:WU00:FS01:Server responded WORK_ACK (400)
02:09:09:WU00:FS01:Final credit estimate, 68278.00 points
02:09:09:WU00:FS01:Cleaning up
02:09:24:WU02:FS01:0x22:Completed 0 out of 1000000 steps (0%)

Post by bruce »

GregC wrote:This GPU slot can and does complete WUs. It just finished a WU.

So what do you want from us? Should we call this "case closed"?