How can I stop getting 13420 WUs on my GPU?

Moderators: Site Moderators, FAHC Science Team

Re: How can I stop getting 13420 WUs on my GPU?

Postby themartymonster » Wed Aug 12, 2020 12:14 am

I set it to HighPriority.
Current WU is a 13420, 4 1/2 hours
Est PPD 969000
Est Credit 173500
Est TPF 2 min 35 secs

GPU load 65%
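As a rough sanity check on the figures above, the estimated run time can be derived from TPF and PPD from the credit. This is a minimal sketch that assumes one frame is 1% of a WU (100 frames per WU) and that PPD is simply the WU's credit scaled to 24 hours of continuous folding; the helper names are hypothetical, not part of FAHControl.

```python
# Rough FAHControl-style estimates; helper names are hypothetical.
FRAMES_PER_WU = 100      # assumption: one frame = 1% of the work unit
SECONDS_PER_DAY = 86_400


def wu_seconds(tpf_seconds: float) -> float:
    """Estimated total run time of one WU from its time per frame (TPF)."""
    return tpf_seconds * FRAMES_PER_WU


def points_per_day(credit: float, tpf_seconds: float) -> float:
    """Credit for one WU scaled up to 24 hours of continuous folding."""
    return credit * SECONDS_PER_DAY / wu_seconds(tpf_seconds)


tpf = 2 * 60 + 35        # "Est TPF 2 min 35 secs" = 155 s
print(f"WU time: {wu_seconds(tpf) / 3600:.2f} h")      # WU time: 4.31 h
print(f"PPD: {points_per_day(173_500, tpf):,.0f}")     # PPD: 967,123
```

That gives about 4.3 hours and roughly 967,000 PPD, in line with the "4 1/2 hours" and 969000 shown by the client; the small gap is plausibly rounding in the displayed TPF.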
themartymonster
Posts: 9
Joined: Mon Apr 20, 2020 2:36 am

Re: How can I stop getting 13420 WUs on my GPU?

Postby HaloJones » Wed Aug 12, 2020 5:43 am

I had a system with two 1070s powered by a Kolink 100W PSU, with some cheap Chinese cable extenders so it would look all pretty inside a computer I never looked at. The cards wouldn't boost over 1900 MHz despite being custom watercooled.

Removed the cable extenders and got to 1925 MHz.
Swapped out the Kolink for an EVGA G2 and got 2075 MHz.

Keep your power runs as short as possible and as high quality as possible.
1x Titan X, 5x 1070, 1x 970, 1x Ryzen 3600

HaloJones
Posts: 869
Joined: Thu Jul 24, 2008 11:16 am

Re: How can I stop getting 13420 WUs on my GPU?

Postby PantherX » Wed Aug 12, 2020 9:54 am

Please note that there's no way to exclude a Project from your system. You will be allocated WUs that best match your hardware and client configuration.

I am aware that Project 134XX WUs are highly experimental and also time-sensitive. There have been a few iterations of it, and in each one optimizations were made to ensure the best possible use of your hardware along with better science. We do appreciate your patience and continued support during this global pandemic :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
Posts: 6733
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: How can I stop getting 13420 WUs on my GPU?

Postby Kebast » Wed Aug 12, 2020 12:29 pm

I got this one with an error last night:

09:03:47:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13420 run:7739 clone:4 gen:0 core:0x22 unit:0x0000000312bc7d9a5f2249bd67921a1e
09:07:35:WU00:FS02:0x22:Completed 1000000 out of 1000000 steps (100%)
09:07:35:WU00:FS02:0x22:Average performance: 74.87 ns/day
09:07:41:WU00:FS02:0x22:Saving result file ../logfile_01.txt
09:07:41:WU00:FS02:0x22:Saving result file checkpointState.xml.bz2
09:07:41:WU00:FS02:0x22:Saving result file globals.csv
09:07:41:WU00:FS02:0x22:Saving result file positions.xtc
09:07:41:WU00:FS02:0x22:Saving result file science.log
09:07:41:WU00:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
09:07:42:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:07:42:WU00:FS02:Sending unit results: id:00 state:SEND error:NO_ERROR project:13420 run:4932 clone:10 gen:1 core:0x22 unit:0x0000000212bc7d9a5f22494fb94181a5
09:07:42:WU00:FS02:Uploading 5.71MiB to 18.188.125.154
09:07:42:WU00:FS02:Connecting to 18.188.125.154:8080
09:07:42:WU01:FS02:Starting
09:07:42:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 705 -lifeline 1462 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
09:07:42:WU01:FS02:Started FahCore on PID 23520
09:07:42:WU01:FS02:Core PID:23524
09:07:42:WU01:FS02:FahCore 0x22 started
09:07:43:WU01:FS02:0x22:*********************** Log Started 2020-08-12T09:07:42Z ***********************
09:07:43:WU01:FS02:0x22:*************************** Core22 Folding@home Core ***************************
09:07:43:WU01:FS02:0x22:       Core: Core22
09:07:43:WU01:FS02:0x22:       Type: 0x22
09:07:43:WU01:FS02:0x22:    Version: 0.0.11
09:07:43:WU01:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:07:43:WU01:FS02:0x22:  Copyright: 2020 foldingathome.org
09:07:43:WU01:FS02:0x22:   Homepage: https://foldingathome.org/
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:50:00
09:07:43:WU01:FS02:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
09:07:43:WU01:FS02:0x22:     Branch: core22-0.0.11
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
09:07:43:WU01:FS02:0x22:             <peastman@stanford.edu>
09:07:43:WU01:FS02:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 23520 -checkpoint 20
09:07:43:WU01:FS02:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
09:07:43:WU01:FS02:0x22:             1 -gpu 1
09:07:43:WU01:FS02:0x22:************************************ libFAH ************************************
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:11:04
09:07:43:WU01:FS02:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
09:07:43:WU01:FS02:0x22:     Branch: HEAD
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:************************************ CBang *************************************
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:10:11
09:07:43:WU01:FS02:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
09:07:43:WU01:FS02:0x22:     Branch: HEAD
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops -fPIC
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:************************************ System ************************************
09:07:43:WU01:FS02:0x22:        CPU: AMD FX(tm)-6300 Six-Core Processor
09:07:43:WU01:FS02:0x22:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
09:07:43:WU01:FS02:0x22:       CPUs: 6
09:07:43:WU01:FS02:0x22:     Memory: 15.63GiB
09:07:43:WU01:FS02:0x22:Free Memory: 10.50GiB
09:07:43:WU01:FS02:0x22:    Threads: POSIX_THREADS
09:07:43:WU01:FS02:0x22: OS Version: 4.15
09:07:43:WU01:FS02:0x22:Has Battery: false
09:07:43:WU01:FS02:0x22: On Battery: false
09:07:43:WU01:FS02:0x22: UTC Offset: -4
09:07:43:WU01:FS02:0x22:        PID: 23524
09:07:43:WU01:FS02:0x22:        CWD: /var/lib/fahclient/work
09:07:43:WU01:FS02:0x22:********************************************************************************
09:07:43:WU01:FS02:0x22:Project: 13420 (Run 7739, Clone 4, Gen 0)
09:07:43:WU01:FS02:0x22:Unit: 0x0000000312bc7d9a5f2249bd67921a1e
09:07:43:WU01:FS02:0x22:Reading tar file core.xml
09:07:43:WU01:FS02:0x22:Reading tar file integrator.xml
09:07:43:WU01:FS02:0x22:Reading tar file state.xml.bz2
09:07:43:WU01:FS02:0x22:Reading tar file system.xml.bz2
09:07:43:WU01:FS02:0x22:Digital signatures verified
09:07:43:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
09:07:43:WU01:FS02:0x22:Version 0.0.11
09:07:43:WU01:FS02:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
09:07:43:WU01:FS02:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
09:07:43:WU01:FS02:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
09:07:43:WU01:FS02:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
09:07:48:WU00:FS02:Upload 55.83%
09:07:52:WU00:FS02:Upload complete
09:07:52:WU00:FS02:Server responded WORK_ACK (400)
09:07:52:WU00:FS02:Final credit estimate, 142584.00 points
09:07:52:WU00:FS02:Cleaning up
09:07:58:WU01:FS02:0x22:Completed 0 out of 1000000 steps (0%)
09:08:19:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:19:WU01:FS02:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
09:08:19:WU01:FS02:0x22:Folding@home Core Shutdown: CORE_RESTART
09:08:19:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
09:08:19:WU01:FS02:Starting
09:08:19:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 705 -lifeline 1462 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
09:08:19:WU01:FS02:Started FahCore on PID 23560
09:08:19:WU01:FS02:Core PID:23564
09:08:19:WU01:FS02:FahCore 0x22 started
09:08:20:WU01:FS02:0x22:*********************** Log Started 2020-08-12T09:08:19Z ***********************
09:08:20:WU01:FS02:0x22:*************************** Core22 Folding@home Core ***************************
09:08:20:WU01:FS02:0x22:       Core: Core22
09:08:20:WU01:FS02:0x22:       Type: 0x22
09:08:20:WU01:FS02:0x22:    Version: 0.0.11
09:08:20:WU01:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:08:20:WU01:FS02:0x22:  Copyright: 2020 foldingathome.org
09:08:20:WU01:FS02:0x22:   Homepage: https://foldingathome.org/
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:50:00
09:08:20:WU01:FS02:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
09:08:20:WU01:FS02:0x22:     Branch: core22-0.0.11
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
09:08:20:WU01:FS02:0x22:             <peastman@stanford.edu>
09:08:20:WU01:FS02:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 23560 -checkpoint 20
09:08:20:WU01:FS02:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
09:08:20:WU01:FS02:0x22:             1 -gpu 1
09:08:20:WU01:FS02:0x22:************************************ libFAH ************************************
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:11:04
09:08:20:WU01:FS02:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
09:08:20:WU01:FS02:0x22:     Branch: HEAD
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:************************************ CBang *************************************
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:10:11
09:08:20:WU01:FS02:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
09:08:20:WU01:FS02:0x22:     Branch: HEAD
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops -fPIC
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:************************************ System ************************************
09:08:20:WU01:FS02:0x22:        CPU: AMD FX(tm)-6300 Six-Core Processor
09:08:20:WU01:FS02:0x22:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
09:08:20:WU01:FS02:0x22:       CPUs: 6
09:08:20:WU01:FS02:0x22:     Memory: 15.63GiB
09:08:20:WU01:FS02:0x22:Free Memory: 10.50GiB
09:08:20:WU01:FS02:0x22:    Threads: POSIX_THREADS
09:08:20:WU01:FS02:0x22: OS Version: 4.15
09:08:20:WU01:FS02:0x22:Has Battery: false
09:08:20:WU01:FS02:0x22: On Battery: false
09:08:20:WU01:FS02:0x22: UTC Offset: -4
09:08:20:WU01:FS02:0x22:        PID: 23564
09:08:20:WU01:FS02:0x22:        CWD: /var/lib/fahclient/work
09:08:20:WU01:FS02:0x22:********************************************************************************
09:08:20:WU01:FS02:0x22:Project: 13420 (Run 7739, Clone 4, Gen 0)
09:08:20:WU01:FS02:0x22:Unit: 0x0000000312bc7d9a5f2249bd67921a1e
09:08:20:WU01:FS02:0x22:Digital signatures verified
09:08:20:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
09:08:20:WU01:FS02:0x22:Version 0.0.11
09:08:20:WU01:FS02:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
09:08:20:WU01:FS02:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
09:08:20:WU01:FS02:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
09:08:20:WU01:FS02:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
09:08:33:WU01:FS02:0x22:Completed 0 out of 1000000 steps (0%)
09:08:52:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:52:WU01:FS02:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
09:08:52:WU01:FS02:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
09:08:52:WU01:FS02:0x22:Saving result file ../logfile_01.txt
09:08:52:WU01:FS02:0x22:Saving result file globals.csv
09:08:52:WU01:FS02:0x22:Saving result file science.log
09:08:52:WU01:FS02:0x22:Saving result file state.xml.bz2
09:08:52:WU01:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:08:52:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:08:52:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13420 run:7739 clone:4 gen:0 core:0x22 unit:0x0000000312bc7d9a5f2249bd67921a1e
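When reading a log like this, the informative lines are the "FahCore returned" entries and the exception message. A minimal sketch that pulls them out with regular expressions; the patterns are written against the lines in this particular log and are an assumption, not a documented FAHClient log format.

```python
import re

# Matches e.g. "09:08:19:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)"
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-f]+\)"
)
# Matches e.g. "...:0x22:An exception occurred at step 250: Particle coordinate is nan"
EXCEPTION_RE = re.compile(r"An exception occurred at step (?P<step>\d+): (?P<msg>.+)$")


def summarize(lines):
    """Return (core return events, exception events) found in log lines."""
    returns, exceptions = [], []
    for line in lines:
        if m := RETURN_RE.search(line):
            returns.append((m["time"], m["wu"], m["status"], int(m["code"])))
        elif m := EXCEPTION_RE.search(line):
            exceptions.append((int(m["step"]), m["msg"]))
    return returns, exceptions


sample = """\
09:07:42:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:08:19:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:19:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
09:08:52:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:52:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
""".splitlines()

returns, exceptions = summarize(sample)
for event in returns:
    print(event)
print(exceptions)
```

Applied to the full log, this shows WU01 hitting the same NaN at step 250 twice, once before and once after the CORE_RESTART, before the core gave up with BAD_WORK_UNIT.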
Ryzen 3800x 12T - 3xGTX970
Kebast
Posts: 361
Joined: Thu Aug 06, 2015 6:21 pm

Re: How can I stop getting 13420 WUs on my GPU?

Postby bruce » Wed Aug 12, 2020 5:51 pm

NaN errors can be caused by overclocking, by a characteristic of the WU itself, or by a driver issue. The uploaded error reports for p134xx are being carefully monitored, so it's not necessary to report that particular error here.

The ~30 seconds of GPU time that it took to generate that error and make the report was not wasted.
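The log posted above shows how the core handled the NaN: it restarted from the last good checkpoint, and after a fixed number of failed attempts it gave up and sent the WU back marked FAULTY. Because checkpoints are written every 50,000 steps, a NaN at step 250 sends the core back to step 0. The toy model below mirrors that sequence; the two constants come from the log, but the function names and structure are hypothetical and this is not Core22's actual implementation.

```python
from typing import Callable, List

CHECKPOINT_INTERVAL = 50_000  # from the log: "Checkpoint write interval: 50000 steps"
MAX_ATTEMPTS = 2              # from the log: "Max number of attempts ... (2)"


def last_checkpoint(step: int) -> int:
    """Most recent checkpoint at or before `step`; step 0 is the initial state."""
    return (step // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL


def run_wu(total_steps: int, nan_at: Callable[[int], bool]) -> List[str]:
    """Toy model of restart-from-checkpoint handling (not Core22's real code).

    `nan_at(step)` is a hypothetical stand-in for the integrator producing a
    NaN particle coordinate at that step. Returns the statuses reported.
    """
    statuses: List[str] = []
    failures, step = 0, 0
    while step < total_steps:
        step += 1
        if nan_at(step):
            failures += 1
            if failures >= MAX_ATTEMPTS:
                statuses.append("BAD_WORK_UNIT")  # give up, report the WU as faulty
                return statuses
            statuses.append("CORE_RESTART")
            step = last_checkpoint(step)          # resume from the last good state
    statuses.append("FINISHED_UNIT")
    return statuses


# A NaN that reproduces at step 250, as in the log:
print(last_checkpoint(250))                       # 0
print(run_wu(1_000_000, lambda s: s == 250))      # ['CORE_RESTART', 'BAD_WORK_UNIT']
```

A NaN that does not recur after the restart would instead yield `['CORE_RESTART', 'FINISHED_UNIT']`. Seeing the same step fail twice fits a problem that reproduces from the same starting state, though on its own it does not say which of the causes above is responsible.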
bruce
Posts: 19990
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: How can I stop getting 13420 WUs on my GPU?

Postby JohnChodera » Thu Aug 13, 2020 4:58 am

Yikes! Sorry for the issues here! These WUs should only take ~1-2 h to run on fast modern GPUs! Glad you got the issue sorted, and huge thanks for helping us out!

~ John Chodera // MSKCC
JohnChodera
Pande Group Member
Posts: 406
Joined: Fri Feb 22, 2013 10:59 pm
