How can I stop getting 13420 WUs on my GPU?

themartymonster
Posts: 9
Joined: Mon Apr 20, 2020 1:36 am

Post by themartymonster »

I set it to HighPriority.
Current WU is a 13420, 4 1/2 hours
Est PPD 969000
Est Credit 173500
Est TPF 2 min 35 secs

GPU load 65%
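
For reference, the figures above are roughly consistent with each other. A minimal sketch of the arithmetic, assuming a WU is reported in 100 frames (1% each, which the post does not state) and using hypothetical helper names:

```python
# Assumption: the client reports progress in 100 frames per WU (1% each).
FRAMES_PER_WU = 100


def wu_duration_hours(tpf_seconds: float, frames: int = FRAMES_PER_WU) -> float:
    """Total run time of one WU in hours, given time per frame (TPF)."""
    return tpf_seconds * frames / 3600.0


def points_per_day(credit: float, tpf_seconds: float, frames: int = FRAMES_PER_WU) -> float:
    """Points per day if every WU earned `credit` and ran at the same TPF."""
    seconds_per_wu = tpf_seconds * frames
    return credit * 86400.0 / seconds_per_wu


tpf = 2 * 60 + 35  # "2 min 35 secs" from the post, in seconds
hours = wu_duration_hours(tpf)
ppd = points_per_day(173500, tpf)
print(f"{hours:.2f} h per WU, about {ppd:,.0f} PPD")
```

This gives about 4.3 hours and roughly 967,000 PPD, close to the 969000 shown by the client; the small gap is presumably because the displayed TPF is rounded.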
HaloJones
Posts: 920
Joined: Thu Jul 24, 2008 10:16 am

Post by HaloJones »

I had a system with two 1070s powered by a Kolink 100W PSU with some cheap Chinese cable extenders so it would look all pretty inside a computer I never looked at. Cards wouldn't boost over 1900 despite being custom watercooled.

Removed the cable extenders and got to 1925.
Swapped out the Kolink for an EVGA G2 and got 2075.

Keep your power runs as short as possible and as high quality as possible.
single 1070

PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Post by PantherX »

Please note that there's no way to exclude a Project from your system. You will be allocated WUs that best match your hardware and client configuration.

I am aware that Project 134XX WUs are highly experimental and also time-sensitive. There have been a few iterations of it, and in each one optimizations have been made to ensure the best possible use of your hardware along with better science. We do appreciate your patience and continued contributions during this global pandemic :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Kebast
Posts: 386
Joined: Thu Aug 06, 2015 5:21 pm

Post by Kebast »

I got this one with an error last night:

09:03:47:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13420 run:7739 clone:4 gen:0 core:0x22 unit:0x0000000312bc7d9a5f2249bd67921a1e
09:07:35:WU00:FS02:0x22:Completed 1000000 out of 1000000 steps (100%)
09:07:35:WU00:FS02:0x22:Average performance: 74.87 ns/day
09:07:41:WU00:FS02:0x22:Saving result file ../logfile_01.txt
09:07:41:WU00:FS02:0x22:Saving result file checkpointState.xml.bz2
09:07:41:WU00:FS02:0x22:Saving result file globals.csv
09:07:41:WU00:FS02:0x22:Saving result file positions.xtc
09:07:41:WU00:FS02:0x22:Saving result file science.log
09:07:41:WU00:FS02:0x22:Folding@home Core Shutdown: FINISHED_UNIT
09:07:42:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:07:42:WU00:FS02:Sending unit results: id:00 state:SEND error:NO_ERROR project:13420 run:4932 clone:10 gen:1 core:0x22 unit:0x0000000212bc7d9a5f22494fb94181a5
09:07:42:WU00:FS02:Uploading 5.71MiB to 18.188.125.154
09:07:42:WU00:FS02:Connecting to 18.188.125.154:8080
09:07:42:WU01:FS02:Starting
09:07:42:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 705 -lifeline 1462 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
09:07:42:WU01:FS02:Started FahCore on PID 23520
09:07:42:WU01:FS02:Core PID:23524
09:07:42:WU01:FS02:FahCore 0x22 started
09:07:43:WU01:FS02:0x22:*********************** Log Started 2020-08-12T09:07:42Z ***********************
09:07:43:WU01:FS02:0x22:*************************** Core22 Folding@home Core ***************************
09:07:43:WU01:FS02:0x22:       Core: Core22
09:07:43:WU01:FS02:0x22:       Type: 0x22
09:07:43:WU01:FS02:0x22:    Version: 0.0.11
09:07:43:WU01:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:07:43:WU01:FS02:0x22:  Copyright: 2020 foldingathome.org
09:07:43:WU01:FS02:0x22:   Homepage: https://foldingathome.org/
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:50:00
09:07:43:WU01:FS02:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
09:07:43:WU01:FS02:0x22:     Branch: core22-0.0.11
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
09:07:43:WU01:FS02:0x22:             <peastman@stanford.edu>
09:07:43:WU01:FS02:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 23520 -checkpoint 20
09:07:43:WU01:FS02:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
09:07:43:WU01:FS02:0x22:             1 -gpu 1
09:07:43:WU01:FS02:0x22:************************************ libFAH ************************************
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:11:04
09:07:43:WU01:FS02:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
09:07:43:WU01:FS02:0x22:     Branch: HEAD
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:************************************ CBang *************************************
09:07:43:WU01:FS02:0x22:       Date: Jun 27 2020
09:07:43:WU01:FS02:0x22:       Time: 22:10:11
09:07:43:WU01:FS02:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
09:07:43:WU01:FS02:0x22:     Branch: HEAD
09:07:43:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:07:43:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:07:43:WU01:FS02:0x22:             -funroll-loops -fPIC
09:07:43:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:07:43:WU01:FS02:0x22:       Bits: 64
09:07:43:WU01:FS02:0x22:       Mode: Release
09:07:43:WU01:FS02:0x22:************************************ System ************************************
09:07:43:WU01:FS02:0x22:        CPU: AMD FX(tm)-6300 Six-Core Processor
09:07:43:WU01:FS02:0x22:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
09:07:43:WU01:FS02:0x22:       CPUs: 6
09:07:43:WU01:FS02:0x22:     Memory: 15.63GiB
09:07:43:WU01:FS02:0x22:Free Memory: 10.50GiB
09:07:43:WU01:FS02:0x22:    Threads: POSIX_THREADS
09:07:43:WU01:FS02:0x22: OS Version: 4.15
09:07:43:WU01:FS02:0x22:Has Battery: false
09:07:43:WU01:FS02:0x22: On Battery: false
09:07:43:WU01:FS02:0x22: UTC Offset: -4
09:07:43:WU01:FS02:0x22:        PID: 23524
09:07:43:WU01:FS02:0x22:        CWD: /var/lib/fahclient/work
09:07:43:WU01:FS02:0x22:********************************************************************************
09:07:43:WU01:FS02:0x22:Project: 13420 (Run 7739, Clone 4, Gen 0)
09:07:43:WU01:FS02:0x22:Unit: 0x0000000312bc7d9a5f2249bd67921a1e
09:07:43:WU01:FS02:0x22:Reading tar file core.xml
09:07:43:WU01:FS02:0x22:Reading tar file integrator.xml
09:07:43:WU01:FS02:0x22:Reading tar file state.xml.bz2
09:07:43:WU01:FS02:0x22:Reading tar file system.xml.bz2
09:07:43:WU01:FS02:0x22:Digital signatures verified
09:07:43:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
09:07:43:WU01:FS02:0x22:Version 0.0.11
09:07:43:WU01:FS02:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
09:07:43:WU01:FS02:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
09:07:43:WU01:FS02:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
09:07:43:WU01:FS02:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
09:07:48:WU00:FS02:Upload 55.83%
09:07:52:WU00:FS02:Upload complete
09:07:52:WU00:FS02:Server responded WORK_ACK (400)
09:07:52:WU00:FS02:Final credit estimate, 142584.00 points
09:07:52:WU00:FS02:Cleaning up
09:07:58:WU01:FS02:0x22:Completed 0 out of 1000000 steps (0%)
09:08:19:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:19:WU01:FS02:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
09:08:19:WU01:FS02:0x22:Folding@home Core Shutdown: CORE_RESTART
09:08:19:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
09:08:19:WU01:FS02:Starting
09:08:19:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 705 -lifeline 1462 -checkpoint 20 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
09:08:19:WU01:FS02:Started FahCore on PID 23560
09:08:19:WU01:FS02:Core PID:23564
09:08:19:WU01:FS02:FahCore 0x22 started
09:08:20:WU01:FS02:0x22:*********************** Log Started 2020-08-12T09:08:19Z ***********************
09:08:20:WU01:FS02:0x22:*************************** Core22 Folding@home Core ***************************
09:08:20:WU01:FS02:0x22:       Core: Core22
09:08:20:WU01:FS02:0x22:       Type: 0x22
09:08:20:WU01:FS02:0x22:    Version: 0.0.11
09:08:20:WU01:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:08:20:WU01:FS02:0x22:  Copyright: 2020 foldingathome.org
09:08:20:WU01:FS02:0x22:   Homepage: https://foldingathome.org/
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:50:00
09:08:20:WU01:FS02:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
09:08:20:WU01:FS02:0x22:     Branch: core22-0.0.11
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
09:08:20:WU01:FS02:0x22:             <peastman@stanford.edu>
09:08:20:WU01:FS02:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 23560 -checkpoint 20
09:08:20:WU01:FS02:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device
09:08:20:WU01:FS02:0x22:             1 -gpu 1
09:08:20:WU01:FS02:0x22:************************************ libFAH ************************************
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:11:04
09:08:20:WU01:FS02:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
09:08:20:WU01:FS02:0x22:     Branch: HEAD
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:************************************ CBang *************************************
09:08:20:WU01:FS02:0x22:       Date: Jun 27 2020
09:08:20:WU01:FS02:0x22:       Time: 22:10:11
09:08:20:WU01:FS02:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
09:08:20:WU01:FS02:0x22:     Branch: HEAD
09:08:20:WU01:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
09:08:20:WU01:FS02:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
09:08:20:WU01:FS02:0x22:             -funroll-loops -fPIC
09:08:20:WU01:FS02:0x22:   Platform: linux2 4.19.76-linuxkit
09:08:20:WU01:FS02:0x22:       Bits: 64
09:08:20:WU01:FS02:0x22:       Mode: Release
09:08:20:WU01:FS02:0x22:************************************ System ************************************
09:08:20:WU01:FS02:0x22:        CPU: AMD FX(tm)-6300 Six-Core Processor
09:08:20:WU01:FS02:0x22:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
09:08:20:WU01:FS02:0x22:       CPUs: 6
09:08:20:WU01:FS02:0x22:     Memory: 15.63GiB
09:08:20:WU01:FS02:0x22:Free Memory: 10.50GiB
09:08:20:WU01:FS02:0x22:    Threads: POSIX_THREADS
09:08:20:WU01:FS02:0x22: OS Version: 4.15
09:08:20:WU01:FS02:0x22:Has Battery: false
09:08:20:WU01:FS02:0x22: On Battery: false
09:08:20:WU01:FS02:0x22: UTC Offset: -4
09:08:20:WU01:FS02:0x22:        PID: 23564
09:08:20:WU01:FS02:0x22:        CWD: /var/lib/fahclient/work
09:08:20:WU01:FS02:0x22:********************************************************************************
09:08:20:WU01:FS02:0x22:Project: 13420 (Run 7739, Clone 4, Gen 0)
09:08:20:WU01:FS02:0x22:Unit: 0x0000000312bc7d9a5f2249bd67921a1e
09:08:20:WU01:FS02:0x22:Digital signatures verified
09:08:20:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
09:08:20:WU01:FS02:0x22:Version 0.0.11
09:08:20:WU01:FS02:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
09:08:20:WU01:FS02:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
09:08:20:WU01:FS02:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
09:08:20:WU01:FS02:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
09:08:33:WU01:FS02:0x22:Completed 0 out of 1000000 steps (0%)
09:08:52:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan
09:08:52:WU01:FS02:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
09:08:52:WU01:FS02:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
09:08:52:WU01:FS02:0x22:Saving result file ../logfile_01.txt
09:08:52:WU01:FS02:0x22:Saving result file globals.csv
09:08:52:WU01:FS02:0x22:Saving result file science.log
09:08:52:WU01:FS02:0x22:Saving result file state.xml.bz2
09:08:52:WU01:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:08:52:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:08:52:WU01:FS02:Sending unit results: id:01 state:SEND error:FAULTY project:13420 run:7739 clone:4 gen:0 core:0x22 unit:0x0000000312bc7d9a5f2249bd67921a1e
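
The failure sequence in a log like this (a NaN exception, a CORE_RESTART, then BAD_WORK_UNIT) can be picked out mechanically. The following is only a rough sketch: the regular expressions are my own guess based on the line shapes in this one log, not an official client format, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns inferred from the log lines above (assumption, not a documented format).
RETURN_RE = re.compile(
    r"(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)
EXCEPTION_RE = re.compile(
    r"(WU\d+):(FS\d+):0x[0-9a-fA-F]+:An exception occurred at step (\d+): (.+)"
)


def summarize(log_lines):
    """Return (work unit, slot, kind, detail, number) tuples in log order."""
    events = []
    for line in log_lines:
        m = RETURN_RE.search(line)
        if m:
            # number is the core's return code
            events.append((m.group(1), m.group(2), "returned", m.group(3), int(m.group(4))))
            continue
        m = EXCEPTION_RE.search(line)
        if m:
            # number is the simulation step at which the exception was raised
            events.append((m.group(1), m.group(2), "exception", m.group(4).strip(), int(m.group(3))))
    return events


sample = [
    "09:07:42:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)",
    "09:08:19:WU01:FS02:0x22:An exception occurred at step 250: Particle coordinate is nan",
    "09:08:19:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)",
    "09:08:52:WARNING:WU01:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]
events = summarize(sample)
for event in events:
    print(event)
```

Run against a full log file, this would show at a glance whether one WU keeps failing at the same step or whether every WU on a slot is failing.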
Ryzen 5900x 12T - RTX 4070 TI
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

NaN errors are sometimes a result of overclocking, sometimes a characteristic of the WU, and sometimes a driver issue. The uploaded error reports for p134xx are being carefully monitored, so it's not necessary to report that particular error here.

The ~30 seconds of GPU time that it took to generate that error and make the report was not wasted.
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Post by JohnChodera »

Yikes! Sorry for the issues here! These WUs should only take ~1-2 h to run on fast modern GPUs! Glad you got the issue sorted, and huge thanks for helping us out!

~ John Chodera // MSKCC