Folding is not fun right now - lots of trouble, no result

Posted: Wed Aug 05, 2020 6:24 pm
by ThWuensche
Some feedback of frustration:

Recently I built 3 PCs to support the project, running a total of 8 Radeon VIIs. Just after I added the last PC, with a Ryzen 3900X and 4 Radeon VIIs, my daily PPD rose above 10 million, hopefully providing some useful scientific value. After trouble with a lot of failing 13416 WUs, the situation had improved with the 13418 series.

But now with the 13420 series the trouble is back: almost all of them fail right at the beginning. Instead I get a lot of 16918 WUs, which sit at about 100k PPD (or even less) and run for about 16 to 24 hours on that last box with 4 Radeon VII GPUs, blocking other WUs for a long time. Right now the CPU alone (running on 12 of its 24 virtual cores) is producing the same scientific value (in PPD) as the 4 Radeon VIIs together. The daily scientific value (measured in PPD) of all the boxes together has dropped from above 10M PPD to about 4M PPD. I wonder what I spent the money for to support project Moonshot, if no results come out of that investment in hardware. I hope the team can use the improved diagnostics in a new core to find a fix for these troubles. There is no reason to spend electricity on idling boxes.

My setup on all the boxes is Debian 10 on a Ryzen 3700X or 3900X, with the stock Linux kernel graphics driver and the AMD ROCm OpenCL stack.

Looking forward to a more productive future! If I can help, please let me know.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Wed Aug 05, 2020 6:58 pm
by JimboPalmer
As ever, we can't really help without the first 200 lines of your logs. This will show us your PC's Hardware, OS, Client, and configuration.

viewtopic.php?f=24&t=26036

should give help.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Wed Aug 05, 2020 8:27 pm
by ThWuensche
Yes, of course, but the information on the 16918 WUs (from science.log) was already in this message:

viewtopic.php?f=19&t=35909#p340577

More information on the 134xx series WUs had actually been provided to JohnChodera; sorry, I forgot that it was not public in the forum. So here it goes:

Code:

*********************** Log Started 2020-08-04T19:03:57Z ***********************
19:03:57:Trying to access database...
19:03:57:Successfully acquired database lock
19:03:57:Read GPUs.txt
19:03:57:Enabled folding slot 00: PAUSED cpu:12 (by user)
19:03:57:Enabled folding slot 01: READY gpu:0:Vega 20 [Radeon VII]
19:03:57:Enabled folding slot 02: READY gpu:1:Vega 20 [Radeon VII]
19:03:57:Enabled folding slot 03: READY gpu:2:Vega 20 [Radeon VII]
19:03:57:Enabled folding slot 04: READY gpu:3:Vega 20 [Radeon VII]
19:03:57:****************************** FAHClient ******************************
19:03:57:        Version: 7.6.13
19:03:57:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:03:57:      Copyright: 2020 foldingathome.org
19:03:57:       Homepage: https://foldingathome.org/
19:03:57:           Date: Apr 28 2020
19:03:57:           Time: 04:20:16
19:03:57:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
19:03:57:         Branch: master
19:03:57:       Compiler: GNU 8.3.0
19:03:57:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
19:03:57:                 -funroll-loops -fno-pie
19:03:57:       Platform: linux2 4.19.0-5-amd64
19:03:57:           Bits: 64
19:03:57:           Mode: Release
19:03:57:           Args: --child /etc/fahclient/config.xml
19:03:57:                 --pid-file=/run/fahclient/fahclient.pid --daemon
19:03:57:         Config: /etc/fahclient/config.xml
19:03:57:******************************** CBang ********************************
19:03:57:           Date: Apr 25 2020
19:03:57:           Time: 00:07:53
19:03:57:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
19:03:57:         Branch: master
19:03:57:       Compiler: GNU 8.3.0
19:03:57:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
19:03:57:                 -funroll-loops -fno-pie -fPIC
19:03:57:       Platform: linux2 4.19.0-5-amd64
19:03:57:           Bits: 64
19:03:57:           Mode: Release
19:03:57:******************************* System ********************************
19:03:57:            CPU: AMD Ryzen 9 3900X 12-Core Processor
19:03:57:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
19:03:57:           CPUs: 24
19:03:57:         Memory: 15.58GiB
19:03:57:    Free Memory: 5.98GiB
19:03:57:        Threads: POSIX_THREADS
19:03:57:     OS Version: 5.6
19:03:57:    Has Battery: false
19:03:57:     On Battery: false
19:03:57:     UTC Offset: 2
19:03:57:            PID: 7921
19:03:57:            CWD: /var/lib/fahclient
19:03:57:             OS: Linux 5.6.0-0.bpo.2-amd64 x86_64
19:03:57:        OS Arch: AMD64
19:03:57:           GPUs: 4
19:03:57:          GPU 0: Bus:3 Slot:0 Func:0 AMD:6 Vega 20 [Radeon VII]
19:03:57:          GPU 1: Bus:9 Slot:0 Func:0 AMD:6 Vega 20 [Radeon VII]
19:03:57:          GPU 2: Bus:17 Slot:0 Func:0 AMD:6 Vega 20 [Radeon VII]
19:03:57:          GPU 3: Bus:20 Slot:0 Func:0 AMD:6 Vega 20 [Radeon VII]
19:03:57:           CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
19:03:57:                 libcuda.so: cannot open shared object file: No such file or
19:03:57:                 directory
19:03:57:OpenCL Device 0: Platform:0 Device:0 Bus:3 Slot:0 Compute:2.0 Driver:3137.0
19:03:57:OpenCL Device 1: Platform:0 Device:1 Bus:9 Slot:0 Compute:2.0 Driver:3137.0
19:03:57:OpenCL Device 2: Platform:0 Device:2 Bus:17 Slot:0 Compute:2.0 Driver:3137.0
19:03:57:OpenCL Device 3: Platform:0 Device:3 Bus:20 Slot:0 Compute:2.0 Driver:3137.0
19:03:57:******************************* libFAH ********************************
19:03:57:           Date: Apr 15 2020
19:03:57:           Time: 21:43:24
19:03:57:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
19:03:57:         Branch: master
19:03:57:       Compiler: GNU 8.3.0
19:03:57:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
19:03:57:                 -funroll-loops -fno-pie
19:03:57:       Platform: linux2 4.19.0-5-amd64
19:03:57:           Bits: 64
19:03:57:           Mode: Release
19:03:57:***********************************************************************
19:03:57:<config>
19:03:57:  <!-- Client Control -->
19:03:57:  <fold-anon v='true'/>
19:03:57:
19:03:57:  <!-- User Information -->
19:03:57:  <passkey v='*****'/>
19:03:57:  <team v='265730'/>
19:03:57:  <user v='tw@ems-wuensche'/>
19:03:57:
19:03:57:  <!-- Folding Slots -->
19:03:57:  <slot id='0' type='CPU'>
19:03:57:    <cpus v='12'/>
19:03:57:    <paused v='true'/>
19:03:57:  </slot>
19:03:57:  <slot id='1' type='GPU'/>
19:03:57:  <slot id='2' type='GPU'/>
19:03:57:  <slot id='3' type='GPU'/>
19:03:57:  <slot id='4' type='GPU'/>
19:03:57:</config>
And here is the output of failing 13420 WUs, filtered to a single slot:

Code:

19:05:50:WU01:FS01:Starting
19:05:50:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 7921 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:05:50:WU01:FS01:Started FahCore on PID 8704
19:05:50:WU01:FS01:Core PID:8708
19:05:50:WU01:FS01:FahCore 0x22 started
19:05:51:WU01:FS01:0x22:*********************** Log Started 2020-08-04T19:05:50Z ***********************
19:05:51:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:05:51:WU01:FS01:0x22:       Core: Core22
19:05:51:WU01:FS01:0x22:       Type: 0x22
19:05:51:WU01:FS01:0x22:    Version: 0.0.11
19:05:51:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:05:51:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
19:05:51:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
19:05:51:WU01:FS01:0x22:       Date: Jun 27 2020
19:05:51:WU01:FS01:0x22:       Time: 22:50:00
19:05:51:WU01:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
19:05:51:WU01:FS01:0x22:     Branch: core22-0.0.11
19:05:51:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:05:51:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:05:51:WU01:FS01:0x22:             -funroll-loops
19:05:51:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:05:51:WU01:FS01:0x22:       Bits: 64
19:05:51:WU01:FS01:0x22:       Mode: Release
19:05:51:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:05:51:WU01:FS01:0x22:             <peastman@stanford.edu>
19:05:51:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 8704 -checkpoint 15
19:05:51:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:05:51:WU01:FS01:0x22:************************************ libFAH ************************************
19:05:51:WU01:FS01:0x22:       Date: Jun 27 2020
19:05:51:WU01:FS01:0x22:       Time: 22:11:04
19:05:51:WU01:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
19:05:51:WU01:FS01:0x22:     Branch: HEAD
19:05:51:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:05:51:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:05:51:WU01:FS01:0x22:             -funroll-loops
19:05:51:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:05:51:WU01:FS01:0x22:       Bits: 64
19:05:51:WU01:FS01:0x22:       Mode: Release
19:05:51:WU01:FS01:0x22:************************************ CBang *************************************
19:05:51:WU01:FS01:0x22:       Date: Jun 27 2020
19:05:51:WU01:FS01:0x22:       Time: 22:10:11
19:05:51:WU01:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
19:05:51:WU01:FS01:0x22:     Branch: HEAD
19:05:51:WU01:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:05:51:WU01:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:05:51:WU01:FS01:0x22:             -funroll-loops -fPIC
19:05:51:WU01:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:05:51:WU01:FS01:0x22:       Bits: 64
19:05:51:WU01:FS01:0x22:       Mode: Release
19:05:51:WU01:FS01:0x22:************************************ System ************************************
19:05:51:WU01:FS01:0x22:        CPU: AMD Ryzen 9 3900X 12-Core Processor
19:05:51:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
19:05:51:WU01:FS01:0x22:       CPUs: 24
19:05:51:WU01:FS01:0x22:     Memory: 15.58GiB
19:05:51:WU01:FS01:0x22:Free Memory: 4.47GiB
19:05:51:WU01:FS01:0x22:    Threads: POSIX_THREADS
19:05:51:WU01:FS01:0x22: OS Version: 5.6
19:05:51:WU01:FS01:0x22:Has Battery: false
19:05:51:WU01:FS01:0x22: On Battery: false
19:05:51:WU01:FS01:0x22: UTC Offset: 2
19:05:51:WU01:FS01:0x22:        PID: 8708
19:05:51:WU01:FS01:0x22:        CWD: /var/lib/fahclient/work
19:05:51:WU01:FS01:0x22:********************************************************************************
19:05:51:WU01:FS01:0x22:Project: 13420 (Run 2033, Clone 28, Gen 1)
19:05:51:WU01:FS01:0x22:Unit: 0x0000000412bc7d9a5f1f47e2d7501c56
19:05:51:WU01:FS01:0x22:Digital signatures verified
19:05:51:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:05:51:WU01:FS01:0x22:Version 0.0.11
19:05:51:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:05:51:WU01:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:05:51:WU01:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:05:51:WU01:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
19:06:06:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
19:06:39:WU01:FS01:0x22:An exception occurred at step 501: Particle coordinate is nan
19:06:39:WU01:FS01:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
19:06:39:WU01:FS01:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
19:06:39:WU01:FS01:0x22:Saving result file ../logfile_01.txt
19:06:39:WU01:FS01:0x22:Saving result file globals.csv
19:06:39:WU01:FS01:0x22:Saving result file science.log
19:06:39:WU01:FS01:0x22:Saving result file state.xml.bz2
19:06:40:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
19:06:40:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:06:40:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13420 run:2033 clone:28 gen:1 core:0x22 unit:0x0000000412bc7d9a5f1f47e2d7501c56
19:06:40:WU01:FS01:Uploading 5.62MiB to 18.188.125.154
19:06:40:WU01:FS01:Connecting to 18.188.125.154:8080
19:06:40:WU03:FS01:Connecting to assign1.foldingathome.org:80
19:06:41:WU03:FS01:Assigned to work server 18.188.125.154
19:06:41:WU03:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 20 [Radeon VII] from 18.188.125.154
19:06:41:WU03:FS01:Connecting to 18.188.125.154:8080
19:06:41:WU03:FS01:Downloading 7.04MiB
19:06:46:WU03:FS01:Download complete
19:06:46:WU03:FS01:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:13420 run:4211 clone:28 gen:1 core:0x22 unit:0x0000000412bc7d9a5f20705faa4c8246
19:06:46:WU03:FS01:Starting
19:06:46:WU03:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 03 -suffix 01 -version 706 -lifeline 7921 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:06:46:WU03:FS01:Started FahCore on PID 8800
19:06:46:WU03:FS01:Core PID:8804
19:06:46:WU03:FS01:FahCore 0x22 started
19:06:46:WU01:FS01:Upload 47.80%
19:06:46:WU03:FS01:0x22:*********************** Log Started 2020-08-04T19:06:46Z ***********************
19:06:46:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:06:46:WU03:FS01:0x22:       Core: Core22
19:06:46:WU03:FS01:0x22:       Type: 0x22
19:06:46:WU03:FS01:0x22:    Version: 0.0.11
19:06:46:WU03:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:06:46:WU03:FS01:0x22:  Copyright: 2020 foldingathome.org
19:06:46:WU03:FS01:0x22:   Homepage: https://foldingathome.org/
19:06:46:WU03:FS01:0x22:       Date: Jun 27 2020
19:06:46:WU03:FS01:0x22:       Time: 22:50:00
19:06:46:WU03:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
19:06:46:WU03:FS01:0x22:     Branch: core22-0.0.11
19:06:46:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:06:46:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:06:46:WU03:FS01:0x22:             -funroll-loops
19:06:46:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:06:46:WU03:FS01:0x22:       Bits: 64
19:06:46:WU03:FS01:0x22:       Mode: Release
19:06:46:WU03:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:06:46:WU03:FS01:0x22:             <peastman@stanford.edu>
19:06:46:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 8800 -checkpoint 15
19:06:46:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:06:46:WU03:FS01:0x22:************************************ libFAH ************************************
19:06:46:WU03:FS01:0x22:       Date: Jun 27 2020
19:06:46:WU03:FS01:0x22:       Time: 22:11:04
19:06:46:WU03:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
19:06:46:WU03:FS01:0x22:     Branch: HEAD
19:06:46:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:06:46:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:06:46:WU03:FS01:0x22:             -funroll-loops
19:06:46:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:06:46:WU03:FS01:0x22:       Bits: 64
19:06:46:WU03:FS01:0x22:       Mode: Release
19:06:46:WU03:FS01:0x22:************************************ CBang *************************************
19:06:46:WU03:FS01:0x22:       Date: Jun 27 2020
19:06:46:WU03:FS01:0x22:       Time: 22:10:11
19:06:46:WU03:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
19:06:46:WU03:FS01:0x22:     Branch: HEAD
19:06:46:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:06:46:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:06:46:WU03:FS01:0x22:             -funroll-loops -fPIC
19:06:46:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:06:46:WU03:FS01:0x22:       Bits: 64
19:06:46:WU03:FS01:0x22:       Mode: Release
19:06:46:WU03:FS01:0x22:************************************ System ************************************
19:06:46:WU03:FS01:0x22:        CPU: AMD Ryzen 9 3900X 12-Core Processor
19:06:46:WU03:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
19:06:46:WU03:FS01:0x22:       CPUs: 24
19:06:46:WU03:FS01:0x22:     Memory: 15.58GiB
19:06:46:WU03:FS01:0x22:Free Memory: 4.48GiB
19:06:46:WU03:FS01:0x22:    Threads: POSIX_THREADS
19:06:46:WU03:FS01:0x22: OS Version: 5.6
19:06:46:WU03:FS01:0x22:Has Battery: false
19:06:46:WU03:FS01:0x22: On Battery: false
19:06:46:WU03:FS01:0x22: UTC Offset: 2
19:06:46:WU03:FS01:0x22:        PID: 8804
19:06:46:WU03:FS01:0x22:        CWD: /var/lib/fahclient/work
19:06:46:WU03:FS01:0x22:********************************************************************************
19:06:46:WU03:FS01:0x22:Project: 13420 (Run 4211, Clone 28, Gen 1)
19:06:46:WU03:FS01:0x22:Unit: 0x0000000412bc7d9a5f20705faa4c8246
19:06:46:WU03:FS01:0x22:Reading tar file core.xml
19:06:46:WU03:FS01:0x22:Reading tar file integrator.xml
19:06:46:WU03:FS01:0x22:Reading tar file state.xml.bz2
19:06:46:WU03:FS01:0x22:Reading tar file system.xml.bz2
19:06:46:WU03:FS01:0x22:Digital signatures verified
19:06:46:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:06:46:WU03:FS01:0x22:Version 0.0.11
19:06:46:WU03:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:06:46:WU03:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:06:46:WU03:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:06:46:WU03:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
19:06:52:WU01:FS01:Upload 92.26%
19:06:54:WU01:FS01:Upload complete
19:06:54:WU01:FS01:Server responded WORK_ACK (400)
19:06:54:WU01:FS01:Cleaning up
19:07:01:WU03:FS01:0x22:Completed 0 out of 1000000 steps (0%)
19:07:47:WU03:FS01:0x22:An exception occurred at step 250: Particle coordinate is nan
19:07:47:WU03:FS01:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
19:07:47:WU03:FS01:0x22:Folding@home Core Shutdown: CORE_RESTART
19:07:48:WARNING:WU03:FS01:FahCore returned: CORE_RESTART (98 = 0x62)
19:07:48:WU03:FS01:Starting
19:07:48:WU03:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit/22-0.0.11/Core_22.fah/FahCore_22 -dir 03 -suffix 01 -version 706 -lifeline 7921 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:07:48:WU03:FS01:Started FahCore on PID 8896
19:07:48:WU03:FS01:Core PID:8900
19:07:48:WU03:FS01:FahCore 0x22 started
19:07:48:WU03:FS01:0x22:*********************** Log Started 2020-08-04T19:07:48Z ***********************
19:07:48:WU03:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:07:48:WU03:FS01:0x22:       Core: Core22
19:07:48:WU03:FS01:0x22:       Type: 0x22
19:07:48:WU03:FS01:0x22:    Version: 0.0.11
19:07:48:WU03:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:07:48:WU03:FS01:0x22:  Copyright: 2020 foldingathome.org
19:07:48:WU03:FS01:0x22:   Homepage: https://foldingathome.org/
19:07:48:WU03:FS01:0x22:       Date: Jun 27 2020
19:07:48:WU03:FS01:0x22:       Time: 22:50:00
19:07:48:WU03:FS01:0x22:   Revision: cfc2940c5dd1aa80f60daa6e28d4a2a417f74edb
19:07:48:WU03:FS01:0x22:     Branch: core22-0.0.11
19:07:48:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:07:48:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:07:48:WU03:FS01:0x22:             -funroll-loops
19:07:48:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:07:48:WU03:FS01:0x22:       Bits: 64
19:07:48:WU03:FS01:0x22:       Mode: Release
19:07:48:WU03:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:07:48:WU03:FS01:0x22:             <peastman@stanford.edu>
19:07:48:WU03:FS01:0x22:       Args: -dir 03 -suffix 01 -version 706 -lifeline 8896 -checkpoint 15
19:07:48:WU03:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:07:48:WU03:FS01:0x22:************************************ libFAH ************************************
19:07:48:WU03:FS01:0x22:       Date: Jun 27 2020
19:07:48:WU03:FS01:0x22:       Time: 22:11:04
19:07:48:WU03:FS01:0x22:   Revision: 2b383f4f04f38511dff592885d7c0400e72bdf43
19:07:48:WU03:FS01:0x22:     Branch: HEAD
19:07:48:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:07:48:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:07:48:WU03:FS01:0x22:             -funroll-loops
19:07:48:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:07:48:WU03:FS01:0x22:       Bits: 64
19:07:48:WU03:FS01:0x22:       Mode: Release
19:07:48:WU03:FS01:0x22:************************************ CBang *************************************
19:07:48:WU03:FS01:0x22:       Date: Jun 27 2020
19:07:48:WU03:FS01:0x22:       Time: 22:10:11
19:07:48:WU03:FS01:0x22:   Revision: f8529962055b0e7bde23e429f5072ff758089dee
19:07:48:WU03:FS01:0x22:     Branch: HEAD
19:07:48:WU03:FS01:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
19:07:48:WU03:FS01:0x22:    Options: -std=c++11 -fsigned-char -ffunction-sections -fdata-sections -O3
19:07:48:WU03:FS01:0x22:             -funroll-loops -fPIC
19:07:48:WU03:FS01:0x22:   Platform: linux2 4.19.76-linuxkit
19:07:48:WU03:FS01:0x22:       Bits: 64
19:07:48:WU03:FS01:0x22:       Mode: Release
19:07:48:WU03:FS01:0x22:************************************ System ************************************
19:07:48:WU03:FS01:0x22:        CPU: AMD Ryzen 9 3900X 12-Core Processor
19:07:48:WU03:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
19:07:48:WU03:FS01:0x22:       CPUs: 24
19:07:48:WU03:FS01:0x22:     Memory: 15.58GiB
19:07:48:WU03:FS01:0x22:Free Memory: 4.46GiB
19:07:48:WU03:FS01:0x22:    Threads: POSIX_THREADS
19:07:48:WU03:FS01:0x22: OS Version: 5.6
19:07:48:WU03:FS01:0x22:Has Battery: false
19:07:48:WU03:FS01:0x22: On Battery: false
19:07:48:WU03:FS01:0x22: UTC Offset: 2
19:07:48:WU03:FS01:0x22:        PID: 8900
19:07:48:WU03:FS01:0x22:        CWD: /var/lib/fahclient/work
19:07:48:WU03:FS01:0x22:********************************************************************************
19:07:48:WU03:FS01:0x22:Project: 13420 (Run 4211, Clone 28, Gen 1)
19:07:48:WU03:FS01:0x22:Unit: 0x0000000412bc7d9a5f20705faa4c8246
19:07:48:WU03:FS01:0x22:Digital signatures verified
19:07:48:WU03:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:07:48:WU03:FS01:0x22:Version 0.0.11
19:07:48:WU03:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
19:07:48:WU03:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
19:07:48:WU03:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
19:07:48:WU03:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
19:08:03:WU03:FS01:0x22:Completed 0 out of 1000000 steps (0%)
19:08:49:WU03:FS01:0x22:An exception occurred at step 250: Particle coordinate is nan
19:08:49:WU03:FS01:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
19:08:49:WU03:FS01:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
19:08:49:WU03:FS01:0x22:Saving result file ../logfile_01.txt
19:08:49:WU03:FS01:0x22:Saving result file globals.csv
19:08:49:WU03:FS01:0x22:Saving result file science.log
19:08:49:WU03:FS01:0x22:Saving result file state.xml.bz2
19:08:49:WU03:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
19:08:49:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
I hope that is sufficient information.
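
For anyone who wants to tally such failures per project instead of reading the log by eye, here is a minimal Python sketch. It assumes the line shapes seen in the log above ("Project: ... (Run ..., Clone ..., Gen ...)" and "FahCore returned: ..."); other client or core versions may print these differently. The function name count_core_results and the SAMPLE list are made up for illustration and are not part of FAHClient.

```python
import re
from collections import Counter

# Patterns modeled on the log lines quoted in this thread (assumption:
# other client/core versions may format these lines differently).
PROJECT_RE = re.compile(
    r"(WU\d+):(FS\d+):0x[0-9a-fA-F]+:Project: (\d+) "
    r"\(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
RESULT_RE = re.compile(r"(WU\d+):(FS\d+):FahCore returned: (\w+)")


def count_core_results(lines):
    """Count FahCore return codes, keyed by (project number, return code)."""
    current_project = {}  # (work unit, slot) -> project number last seen
    results = Counter()
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current_project[(m.group(1), m.group(2))] = int(m.group(3))
            continue
        m = RESULT_RE.search(line)
        if m:
            # project is None if no "Project:" line was seen for this WU/slot
            project = current_project.get((m.group(1), m.group(2)))
            results[(project, m.group(3))] += 1
    return results


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "19:05:51:WU01:FS01:0x22:Project: 13420 (Run 2033, Clone 28, Gen 1)",
    "19:06:40:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "19:06:46:WU03:FS01:0x22:Project: 13420 (Run 4211, Clone 28, Gen 1)",
    "19:07:48:WARNING:WU03:FS01:FahCore returned: CORE_RESTART (98 = 0x62)",
    "19:08:49:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

for (project, code), count in sorted(count_core_results(SAMPLE).items()):
    print(project, code, count)
```

To run it over a whole log, pass an open file object instead of SAMPLE; the location of the client log depends on the installation.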

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Wed Aug 05, 2020 8:45 pm
by JimboPalmer
Drat. Neither Linux nor AMD are products I have any familiarity with. Perhaps someone who knows more will help.

I will say it looks like a valid configuration.

NaN is "Not a Number": https://en.wikipedia.org/wiki/NaN
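
A small Python illustration of why a single NaN is fatal for a simulation: once one coordinate becomes NaN, every later calculation that touches it is NaN too, so the run cannot continue. This is only a generic floating-point sketch; the step and has_nan helpers are hypothetical and say nothing about how Core22 actually performs its check.

```python
import math


def step(position, velocity, dt):
    """One explicit Euler update for a single coordinate."""
    return position + velocity * dt


def has_nan(coords):
    """Return True if any coordinate in the sequence is NaN."""
    return any(math.isnan(c) for c in coords)


# An invalid operation such as inf - inf produces NaN.
bad_velocity = float("inf") - float("inf")

x = step(1.0, bad_velocity, 0.002)  # NaN propagates into the position
print(math.isnan(x))                # True
print(x == x)                       # False: NaN never compares equal, even to itself
print(has_nan([0.0, x, 2.0]))       # True
```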

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 1:04 am
by _r2w_ben
To avoid 13420 until the next sprint with new project numbers, you could block connections to 18.188.125.154. This would avoid repeatedly thrashing work units but it might take longer to get work.

Can you run rocm-smi and post the output while one of your GPUs is working on a 16918?

You could try running two slots per GPU by manually setting gpu-index and opencl-index to see if that makes better use of your high-end hardware until 16918 is finished.

Code:

<slot id='1' type='GPU'>
  <gpu-index v='0'/>
  <opencl-index v='0'/>
</slot>
<slot id='2' type='GPU'>
  <gpu-index v='0'/>
  <opencl-index v='0'/>
</slot>
<slot id='3' type='GPU'>
  <gpu-index v='1'/>
  <opencl-index v='1'/>
</slot>
<slot id='4' type='GPU'>
  <gpu-index v='1'/>
  <opencl-index v='1'/>
</slot>
<slot id='5' type='GPU'>
  <gpu-index v='2'/>
  <opencl-index v='2'/>
</slot>
<slot id='6' type='GPU'>
  <gpu-index v='2'/>
  <opencl-index v='2'/>
</slot>
<slot id='7' type='GPU'>
  <gpu-index v='3'/>
  <opencl-index v='3'/>
</slot>
<slot id='8' type='GPU'>
  <gpu-index v='3'/>
  <opencl-index v='3'/>
</slot>
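
Writing out that many slot entries by hand is error-prone, so here is a rough Python sketch that prints the same kind of block. It assumes, as in the log earlier in this thread, that each card's gpu-index equals its opencl-index; the function name gpu_slots_xml and its parameters are made up for illustration.

```python
def gpu_slots_xml(num_gpus, slots_per_gpu=2, first_slot_id=1):
    """Build <slot> entries with slots_per_gpu slots for each GPU.

    Assumption: gpu-index and opencl-index are the same number for every
    card, which matches the GPU and OpenCL device lists in the log above.
    """
    lines = []
    slot_id = first_slot_id
    for gpu in range(num_gpus):
        for _ in range(slots_per_gpu):
            lines.append(f"<slot id='{slot_id}' type='GPU'>")
            lines.append(f"  <gpu-index v='{gpu}'/>")
            lines.append(f"  <opencl-index v='{gpu}'/>")
            lines.append("</slot>")
            slot_id += 1
    return "\n".join(lines)


# Four cards, two slots each, numbered from 1 (slot 0 is the CPU slot).
print(gpu_slots_xml(4))
```

The output would replace the existing GPU slot entries in config.xml, with the client stopped while editing.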

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 1:29 am
by anandhanju
Hi ThWuensche, thanks for these reports. So far, you're the only one who's been issued the two 13420 WUs in your log that failed at 0%. While it's true that the number of failures has dropped since the earlier 134XX projects, a few of these can still fail because they are bad WUs -- we'll wait and see how other users do with these two WUs once they are reassigned.

https://apps.foldingathome.org/wu#proje ... e=28&gen=1
https://apps.foldingathome.org/wu#proje ... e=28&gen=1

Do you see failures for *all* 13420s? If so, that'd definitely be something for the researchers to look into.

As for the PPD, 13420 is about 1.5 to 2x above average to account for the fact that some of these WUs run very slowly (about 2-3x longer than "normal" ones). To account for this variability, the base points are boosted so that it averages out. However, this ends up inflating the PPD estimates, especially if you haven't been at the receiving end of the slower WUs.
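
To make the averaging argument concrete, here is a back-of-the-envelope Python sketch. All numbers are invented for illustration (they are not real 13420 figures), the quick-return bonus is ignored, and the function name average_ppd is hypothetical.

```python
def average_ppd(base_points, fast_days, slow_factor, slow_fraction):
    """Long-run points per day for a mix of fast and slow WUs.

    Every WU earns the same base_points; a slow WU takes slow_factor
    times as long as a fast one. Bonus points are ignored.
    """
    mean_days = ((1 - slow_fraction) * fast_days
                 + slow_fraction * slow_factor * fast_days)
    return base_points / mean_days


# Hypothetical values: 100,000 points per WU, 0.1 days for a fast WU,
# slow WUs take 2.5x as long, and 40% of the WUs are slow.
ppd_fast_only = average_ppd(100_000, 0.1, 2.5, 0.0)
ppd_slow_only = average_ppd(100_000, 0.1, 2.5, 1.0)
ppd_mixed = average_ppd(100_000, 0.1, 2.5, 0.4)
print(round(ppd_fast_only), round(ppd_slow_only), round(ppd_mixed))
```

With these made-up inputs, a run of only fast WUs shows a much higher PPD than the long-run mix, which is the inflation effect described above.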

The 16918 WUs (as you've reported earlier) also have variability, which results in more pronounced PPD drops on certain GPUs like yours. Unfortunately, with AMD cards, the scaling of atom counts and shaders isn't consistent, and the grouping of card types makes it impossible (AFAIK) to exclude these from being assigned to cards like yours at the moment. The net effect is an inflated PPD from 134XX and a worse than normal result from 16918, which when seen back to back can be maddening. I'm not an official spokesperson for the project, but I commend your commitment in getting a setup like yours working for FAH, and from what I've seen, these things improve over time and do get addressed. As bruce mentioned in the other thread, there's someone looking to sort out the variability and assignments, and it can take some time to get it nailed down.

Not plugging LTT, but there was an extension posted in another thread which also reports PPDs and these are the ones for your card.
https://folding.lar.systems/folding_dat ... ga%2020%20[Radeon%20VII] Quite a bit of variability even with the very limited dataset they have there. I know seeing this doesn't make it any better that you're getting 100K :lol: Hang in there, bud

Edit: See ben's comment above for meaningful and actionable suggestions. I went on a bit of a ramble.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 10:29 am
by toTOW
Is there something wrong with the Radeon VII? This guy is also having a lot of crashes on his: viewtopic.php?f=19&t=35933 :(

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 10:49 am
by NormalDiffusion
toTOW wrote:Is there something wrong with the Radeon VII? This guy is also having a lot of crashes on his: viewtopic.php?f=19&t=35933 :(
But running the 13420 WUs flawlessly...

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 11:57 am
by muziqaz
@toTOW, I have my theory about 16448, we might need to look into it a bit more.
Now to the main issue:
First things first: whoever advised you to buy all those Radeon VIIs for folding and install them in Linux machines needs to be fired and banned for good. This should never have gotten past wallet security, ever. This is the worst combination for folding in the history of folding.
1 AMD card + Linux = Linus-level rage-inducing experience; 4 cards in the same Linux system is just pure masochism. Drivers from AMD were never designed to work with AMD cards, and it is beyond the AMD software team's wildest dreams that someone would put 4 of their cards in one Linux system. A 2080 (non S) would have been way better PPD-wise, with fewer issues.
I understand what's done is done, but beyond asking AMD to fix their Linux drivers and optimise for 4 cards in the same system, there is nothing one can do.
I take it that you have at least a 1600W Superflower/Seasonic PSU, because anything less than that, even from a good brand, will cause GPUs to fail left and right.
I honestly believe Windows would fix this issue (ironically).
My own Radeon VII on Windows is folding flawlessly, and I even rely on it to see if one project or the other is unstable.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 6:43 pm
by ThWuensche
@muziqaz

The cards will probably be running FAH most of the time, but the design decision, and the justification for the expense, is training DNNs for plant detection, so image recognition in some form. This runs on a lot of pictures in parallel, and there the 16 GB of memory and the memory bandwidth were the deciding factors. Furthermore, I will not use NVidia cards due to their closed-source driver policy. And I have been running Linux since 1993, from somewhere around version 0.12, so I have no intention of using Windows at all, apart from a very limited set of tasks for which no software is available on Linux.

The power supply is 1500W, but things do not improve if only one or two cards are active.

Thanks for your comment, and good to know that there are no such problems on Windows. For me, too, most projects run without trouble; it is just that in the last few days I have received almost nothing but those two projects, which behave unusually on my systems.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Thu Aug 06, 2020 6:48 pm
by muziqaz
ThWuensche wrote:@muziqaz

The cards will probably be running FAH most of the time, but the design decision, and the justification for the expense, is training DNNs for plant detection, so image recognition in some form. This runs on a lot of pictures in parallel, and there the 16 GB of memory and the memory bandwidth were the deciding factors. Furthermore, I will not use NVidia cards due to their closed-source driver policy. And I have been running Linux since 1993, from somewhere around version 0.12, so I have no intention of using Windows at all, apart from a very limited set of tasks for which no software is available on Linux.

The power supply is 1500W, but things do not improve if only one or two cards are active.

Thanks for your comment, and good to know that there are no such problems on Windows. For me, too, most projects run without trouble; it is just that in the last few days I have received almost nothing but those two projects, which behave unusually on my systems.
I have sent you a private message, but it is still sitting in my outbox for some reason. Is your inbox on this forum full up by any chance? :)

I understand the reasoning for going AMD now. Thanks. I was under the impression you bought those cards solely for folding :)
Unfortunately, as I mentioned earlier, AMD's Linux support is subpar when it comes to GPUs, especially multiple ones in a single system :(
Also, one of the fellow folders mentioned something about the latest kernel messing up multi-GPU support on Linux. As I do not use Linux and do not follow kernel development, I cannot comment on whether that is correct, and unfortunately cannot help much with debugging issues in the Linux ecosystem :)

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Fri Aug 07, 2020 3:25 pm
by gunnarre
If you're pausing FAH completely when you're running the DNN, then virtualization might be an option - at least for three of the cards. I'm folding on an old Radeon 7770 HD card which doesn't have recent Linux drivers, using IOMMU to export the PCIe device to a Libvirt QEMU hosted guest OS. It works with both Windows and Linux as the guest OS. Linux has the advantage that it's easier to install as a guest, and uses less memory. Windows is more work to install, but driver support is easier to deal with. I ran Windows 10 as the guest for a while until I found the exact Linux kernel and AMD driver which would work for my card, and then I spun down the Windows 10 guest and used the Linux guest instead. You might even consider running one guest OS per graphics card - except the one you're using as a display.

Edit: Here's a guide for setting up GPU passthrough when your Linux distro has vfio as an optional kernel module: Using GPUs in KVM Virtual Machines. Note that these instructions need modification if the kernel already has vfio compiled in - in that case, you need to set the vfio settings in GRUB rather than in the modprobe.d settings. For example (replace the PCIe device identifiers with those you need to export from your system; here both the audio and video functions of the Radeon 7770HD):

Code:

GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=pt vfio-pci.ids=1002:683d,1002:aab0"
You can even run the host machine in headless mode and export every single graphics card to a guest, but you then have to add

Code:

video=efifb:off
to the GRUB boot options to prevent the host OS from grabbing the graphics card on boot.
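
Before binding devices to vfio-pci it can help to check how the host groups them, since passthrough is generally handled per IOMMU group. Below is a minimal Python sketch for that, not part of the guide mentioned above. It assumes the usual sysfs layout (/sys/kernel/iommu_groups and /sys/bus/pci/devices) and that the IOMMU is enabled; the function names iommu_groups and pci_id are made up for illustration.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted list of PCI addresses} read from sysfs.

    Returns an empty dict if the directory is missing (IOMMU disabled or
    a different layout than assumed here).
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def pci_id(address, root="/sys/bus/pci/devices"):
    """Return 'vendor:device' (e.g. '1002:683d') for a PCI address, or None."""
    dev = Path(root) / address
    try:
        vendor = (dev / "vendor").read_text().strip()
        device = (dev / "device").read_text().strip()
    except OSError:
        return None
    # sysfs stores the IDs as e.g. "0x1002"; drop the "0x" prefix.
    return f"{vendor.replace('0x', '', 1)}:{device.replace('0x', '', 1)}"


if __name__ == "__main__":
    for number, addresses in sorted(iommu_groups().items()):
        for address in addresses:
            print(f"group {number:3d}  {address}  {pci_id(address) or '?'}")
```

The vendor:device pairs it prints have the same form as the vfio-pci.ids= values in the GRUB line above, so they can be compared directly. If a graphics card shares a group with unrelated devices, exporting it cleanly is likely to be harder.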

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Fri Aug 07, 2020 7:35 pm
by ThWuensche
@_r2w_ben

Here is the output of rocm-smi -a. GPU 0 and GPU 2 have been paused; GPU 1 and GPU 3 are each running 16918 WUs at a little over 110,000 PPD.

Code:

========================ROCm System Management Interface========================
Driver version: 5.6.0-0.bpo.2-amd64
================================================================================
GPU[0] 		: GPU ID: 0x66af
GPU[1] 		: GPU ID: 0x66af
GPU[2] 		: GPU ID: 0x66af
GPU[3] 		: GPU ID: 0x66af
================================================================================
================================================================================
GPU[0] 		: VBIOS version: 113-D3600200-106
GPU[1] 		: VBIOS version: 113-D3600200-106
GPU[2] 		: VBIOS version: 113-D3600200-106
GPU[3] 		: VBIOS version: 113-D3600200-106
================================================================================
================================================================================
GPU[0] 		: Temperature (Sensor edge) (C): 29.0
GPU[0] 		: Temperature (Sensor junction) (C): 30.0
GPU[0] 		: Temperature (Sensor mem) (C): 31.0
GPU[1] 		: Temperature (Sensor edge) (C): 38.0
GPU[1] 		: Temperature (Sensor junction) (C): 42.0
GPU[1] 		: Temperature (Sensor mem) (C): 41.0
GPU[2] 		: Temperature (Sensor edge) (C): 35.0
GPU[2] 		: Temperature (Sensor junction) (C): 35.0
GPU[2] 		: Temperature (Sensor mem) (C): 33.0
GPU[3] 		: Temperature (Sensor edge) (C): 41.0
GPU[3] 		: Temperature (Sensor junction) (C): 42.0
GPU[3] 		: Temperature (Sensor mem) (C): 42.0
================================================================================
================================================================================
GPU[0] 		: dcefclk clock level: 0 (358Mhz)
GPU[0] 		: fclk clock level: 0 (551Mhz)
GPU[0] 		: mclk clock level: 0 (351Mhz)
GPU[0] 		: sclk clock level: 1 (809Mhz)
GPU[0] 		: socclk clock level: 0 (310Mhz)
================================================================================
GPU[1] 		: dcefclk clock level: 0 (358Mhz)
GPU[1] 		: fclk clock level: 0 (551Mhz)
GPU[1] 		: mclk clock level: 0 (351Mhz)
GPU[1] 		: pcie clock level: 1 (8.0GT/s, x4 308Mhz)
GPU[1] 		: sclk clock level: 0 (701Mhz)
GPU[1] 		: socclk clock level: 0 (310Mhz)
================================================================================
GPU[2] 		: dcefclk clock level: 0 (358Mhz)
GPU[2] 		: fclk clock level: 7 (1226Mhz)
GPU[2] 		: mclk clock level: 0 (351Mhz)
GPU[2] 		: sclk clock level: 1 (809Mhz)
GPU[2] 		: socclk clock level: 7 (972Mhz)
================================================================================
GPU[3] 		: dcefclk clock level: 0 (358Mhz)
GPU[3] 		: fclk clock level: 6 (1081Mhz)
GPU[3] 		: mclk clock level: 1 (801Mhz)
GPU[3] 		: pcie clock level: 1 (8.0GT/s, x8 308Mhz)
GPU[3] 		: sclk clock level: 4 (1547Mhz)
GPU[3] 		: socclk clock level: 6 (850Mhz)
================================================================================
================================================================================
GPU[0] 		: Fan Level: 53 (20%)
GPU[1] 		: Fan Level: 51 (20%)
GPU[2] 		: Fan Level: 53 (20%)
GPU[3] 		: Fan Level: 53 (20%)
================================================================================
================================================================================
GPU[0] 		: Performance Level: auto
GPU[1] 		: Performance Level: auto
GPU[2] 		: Performance Level: auto
GPU[3] 		: Performance Level: auto
================================================================================
================================================================================
GPU[0] 		: GPU OverDrive value (%): 0
GPU[1] 		: GPU OverDrive value (%): 0
GPU[2] 		: GPU OverDrive value (%): 0
GPU[3] 		: GPU OverDrive value (%): 0
================================================================================
================================================================================
GPU[0] 		: GPU Memory OverDrive value (%): 0
GPU[1] 		: GPU Memory OverDrive value (%): 0
GPU[2] 		: GPU Memory OverDrive value (%): 0
GPU[3] 		: GPU Memory OverDrive value (%): 0
================================================================================
================================================================================
GPU[0] 		: Max Graphics Package Power (W): 250.0
GPU[1] 		: Max Graphics Package Power (W): 250.0
GPU[2] 		: Max Graphics Package Power (W): 250.0
GPU[3] 		: Max Graphics Package Power (W): 250.0
================================================================================
================================================================================
GPU[0] 		: 
GPU[0] 		: PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS UseRlcBusy MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
GPU[0] 		:  0 BOOTUP_DEFAULT*:
GPU[0] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[0] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[0] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[0] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[0] 		:  1 3D_FULL_SCREEN :
GPU[0] 		:                     0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
GPU[0] 		:                     1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[0] 		:                     2(         UCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[0] 		:                     3(         FCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[0] 		:  2   POWER_SAVING :
GPU[0] 		:                     0(       GFXCLK)       0       0       1       0       3       0 5898240  -65536       0
GPU[0] 		:                     1(       SOCCLK)       0       0       1       0       3       0 1310720   -6553       0
GPU[0] 		:                     2(         UCLK)       0       0       1       0       3       0 1966080  -65536       0
GPU[0] 		:                     3(         FCLK)       0       0       0       0       3     800 1966080   -6553       0
GPU[0] 		:  3          VIDEO :
GPU[0] 		:                     0(       GFXCLK)       0       1       1       0       4     500 4587520   -6553       0
GPU[0] 		:                     1(       SOCCLK)       0       0       1       0       4     500 1310720   -6553       0
GPU[0] 		:                     2(         UCLK)       0       0       1       0       4     500 1966080  -65536       0
GPU[0] 		:                     3(         FCLK)       0       0       3       0       4     500 1966080   -6553       0
GPU[0] 		:  4             VR :
GPU[0] 		:                     0(       GFXCLK)       0       1       0    1540       4     800 5898240   -6553   65536
GPU[0] 		:                     1(       SOCCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[0] 		:                     2(         UCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[0] 		:                     3(         FCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[0] 		:  5        COMPUTE :
GPU[0] 		:                     0(       GFXCLK)       0       1       0    1600       3       0 3932160  -65536  -65536
GPU[0] 		:                     1(       SOCCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[0] 		:                     2(         UCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[0] 		:                     3(         FCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[0] 		:  6         CUSTOM :
GPU[0] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[0] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[0] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[0] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[1] 		: 
GPU[1] 		: PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS UseRlcBusy MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
GPU[1] 		:  0 BOOTUP_DEFAULT :
GPU[1] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[1] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[1] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[1] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[1] 		:  1 3D_FULL_SCREEN :
GPU[1] 		:                     0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
GPU[1] 		:                     1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[1] 		:                     2(         UCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[1] 		:                     3(         FCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[1] 		:  2   POWER_SAVING :
GPU[1] 		:                     0(       GFXCLK)       0       0       1       0       3       0 5898240  -65536       0
GPU[1] 		:                     1(       SOCCLK)       0       0       1       0       3       0 1310720   -6553       0
GPU[1] 		:                     2(         UCLK)       0       0       1       0       3       0 1966080  -65536       0
GPU[1] 		:                     3(         FCLK)       0       0       0       0       3     800 1966080   -6553       0
GPU[1] 		:  3          VIDEO :
GPU[1] 		:                     0(       GFXCLK)       0       1       1       0       4     500 4587520   -6553       0
GPU[1] 		:                     1(       SOCCLK)       0       0       1       0       4     500 1310720   -6553       0
GPU[1] 		:                     2(         UCLK)       0       0       1       0       4     500 1966080  -65536       0
GPU[1] 		:                     3(         FCLK)       0       0       3       0       4     500 1966080   -6553       0
GPU[1] 		:  4             VR :
GPU[1] 		:                     0(       GFXCLK)       0       1       0    1540       4     800 5898240   -6553   65536
GPU[1] 		:                     1(       SOCCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[1] 		:                     2(         UCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[1] 		:                     3(         FCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[1] 		:  5        COMPUTE*:
GPU[1] 		:                     0(       GFXCLK)       0       1       0    1600       3       0 3932160  -65536  -65536
GPU[1] 		:                     1(       SOCCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[1] 		:                     2(         UCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[1] 		:                     3(         FCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[1] 		:  6         CUSTOM :
GPU[1] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[1] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[1] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[1] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[2] 		: 
GPU[2] 		: PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS UseRlcBusy MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
GPU[2] 		:  0 BOOTUP_DEFAULT*:
GPU[2] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[2] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[2] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[2] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[2] 		:  1 3D_FULL_SCREEN :
GPU[2] 		:                     0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
GPU[2] 		:                     1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[2] 		:                     2(         UCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[2] 		:                     3(         FCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[2] 		:  2   POWER_SAVING :
GPU[2] 		:                     0(       GFXCLK)       0       0       1       0       3       0 5898240  -65536       0
GPU[2] 		:                     1(       SOCCLK)       0       0       1       0       3       0 1310720   -6553       0
GPU[2] 		:                     2(         UCLK)       0       0       1       0       3       0 1966080  -65536       0
GPU[2] 		:                     3(         FCLK)       0       0       0       0       3     800 1966080   -6553       0
GPU[2] 		:  3          VIDEO :
GPU[2] 		:                     0(       GFXCLK)       0       1       1       0       4     500 4587520   -6553       0
GPU[2] 		:                     1(       SOCCLK)       0       0       1       0       4     500 1310720   -6553       0
GPU[2] 		:                     2(         UCLK)       0       0       1       0       4     500 1966080  -65536       0
GPU[2] 		:                     3(         FCLK)       0       0       3       0       4     500 1966080   -6553       0
GPU[2] 		:  4             VR :
GPU[2] 		:                     0(       GFXCLK)       0       1       0    1540       4     800 5898240   -6553   65536
GPU[2] 		:                     1(       SOCCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[2] 		:                     2(         UCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[2] 		:                     3(         FCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[2] 		:  5        COMPUTE :
GPU[2] 		:                     0(       GFXCLK)       0       1       0    1600       3       0 3932160  -65536  -65536
GPU[2] 		:                     1(       SOCCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[2] 		:                     2(         UCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[2] 		:                     3(         FCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[2] 		:  6         CUSTOM :
GPU[2] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[2] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[2] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[2] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[3] 		: 
GPU[3] 		: PROFILE_INDEX(NAME) CLOCK_TYPE(NAME) FPS UseRlcBusy MinActiveFreqType MinActiveFreq BoosterFreqType BoosterFreq PD_Data_limit_c PD_Data_error_coeff PD_Data_error_rate_coeff
GPU[3] 		:  0 BOOTUP_DEFAULT :
GPU[3] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[3] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[3] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[3] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
GPU[3] 		:  1 3D_FULL_SCREEN :
GPU[3] 		:                     0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
GPU[3] 		:                     1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[3] 		:                     2(         UCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[3] 		:                     3(         FCLK)       0       1       4     850       4     800  327680  -65536       0
GPU[3] 		:  2   POWER_SAVING :
GPU[3] 		:                     0(       GFXCLK)       0       0       1       0       3       0 5898240  -65536       0
GPU[3] 		:                     1(       SOCCLK)       0       0       1       0       3       0 1310720   -6553       0
GPU[3] 		:                     2(         UCLK)       0       0       1       0       3       0 1966080  -65536       0
GPU[3] 		:                     3(         FCLK)       0       0       0       0       3     800 1966080   -6553       0
GPU[3] 		:  3          VIDEO :
GPU[3] 		:                     0(       GFXCLK)       0       1       1       0       4     500 4587520   -6553       0
GPU[3] 		:                     1(       SOCCLK)       0       0       1       0       4     500 1310720   -6553       0
GPU[3] 		:                     2(         UCLK)       0       0       1       0       4     500 1966080  -65536       0
GPU[3] 		:                     3(         FCLK)       0       0       3       0       4     500 1966080   -6553       0
GPU[3] 		:  4             VR :
GPU[3] 		:                     0(       GFXCLK)       0       1       0    1540       4     800 5898240   -6553   65536
GPU[3] 		:                     1(       SOCCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[3] 		:                     2(         UCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[3] 		:                     3(         FCLK)       0       1       2       0       4     800  327680  -32768  -65536
GPU[3] 		:  5        COMPUTE*:
GPU[3] 		:                     0(       GFXCLK)       0       1       0    1600       3       0 3932160  -65536  -65536
GPU[3] 		:                     1(       SOCCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[3] 		:                     2(         UCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[3] 		:                     3(         FCLK)       0       0       4     850       3       0  327680  -65536  -32768
GPU[3] 		:  6         CUSTOM :
GPU[3] 		:                     0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
GPU[3] 		:                     1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
GPU[3] 		:                     2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
GPU[3] 		:                     3(         FCLK)       0       0       0       0       4     800  327680   -6553       0
================================================================================
================================================================================
GPU[0] 		: Average Graphics Package Power (W): 17.0
GPU[1] 		: Average Graphics Package Power (W): 23.0
GPU[2] 		: Average Graphics Package Power (W): 21.0
GPU[3] 		: Average Graphics Package Power (W): 20.0
================================================================================
================================================================================
GPU[0] 		: Supported dcefclk frequencies on GPU0
GPU[0] 		: 0: 358Mhz *
GPU[0] 		: 1: 454Mhz 
GPU[0] 		: 2: 567Mhz 
GPU[0] 		: 3: 680Mhz 
GPU[0] 		: 4: 756Mhz 
GPU[0] 		: 5: 850Mhz 
GPU[0] 		: 6: 972Mhz 
GPU[0] 		: 7: 1134Mhz 
GPU[0] 		: 
GPU[0] 		: Supported fclk frequencies on GPU0
GPU[0] 		: 0: 551Mhz *
GPU[0] 		: 1: 611Mhz 
GPU[0] 		: 2: 691Mhz 
GPU[0] 		: 3: 761Mhz 
GPU[0] 		: 4: 871Mhz 
GPU[0] 		: 5: 961Mhz 
GPU[0] 		: 6: 1081Mhz 
GPU[0] 		: 7: 1226Mhz 
GPU[0] 		: 
GPU[0] 		: Supported mclk frequencies on GPU0
GPU[0] 		: 0: 351Mhz *
GPU[0] 		: 1: 801Mhz 
GPU[0] 		: 2: 1001Mhz 
GPU[0] 		: 
GPU[0] 		: Supported pcie frequencies on GPU0
GPU[0] 		: 0: 2.5GT/s, x16 80Mhz 
GPU[0] 		: 1: 8.0GT/s, x2 308Mhz 
GPU[0] 		: 
GPU[0] 		: Supported sclk frequencies on GPU0
GPU[0] 		: 0: 701Mhz 
GPU[0] 		: 1: 809Mhz *
GPU[0] 		: 2: 1135Mhz 
GPU[0] 		: 3: 1373Mhz 
GPU[0] 		: 4: 1547Mhz 
GPU[0] 		: 5: 1684Mhz 
GPU[0] 		: 6: 1750Mhz 
GPU[0] 		: 7: 1774Mhz 
GPU[0] 		: 8: 1802Mhz 
GPU[0] 		: 
GPU[0] 		: Supported socclk frequencies on GPU0
GPU[0] 		: 0: 310Mhz *
GPU[0] 		: 1: 524Mhz 
GPU[0] 		: 2: 567Mhz 
GPU[0] 		: 3: 619Mhz 
GPU[0] 		: 4: 680Mhz 
GPU[0] 		: 5: 756Mhz 
GPU[0] 		: 6: 850Mhz 
GPU[0] 		: 7: 972Mhz 
GPU[0] 		: 
GPU[1] 		: Supported dcefclk frequencies on GPU1
GPU[1] 		: 0: 358Mhz *
GPU[1] 		: 1: 454Mhz 
GPU[1] 		: 2: 567Mhz 
GPU[1] 		: 3: 680Mhz 
GPU[1] 		: 4: 756Mhz 
GPU[1] 		: 5: 850Mhz 
GPU[1] 		: 6: 972Mhz 
GPU[1] 		: 7: 1134Mhz 
GPU[1] 		: 
GPU[1] 		: Supported fclk frequencies on GPU1
GPU[1] 		: 0: 551Mhz *
GPU[1] 		: 1: 611Mhz 
GPU[1] 		: 2: 691Mhz 
GPU[1] 		: 3: 761Mhz 
GPU[1] 		: 4: 871Mhz 
GPU[1] 		: 5: 961Mhz 
GPU[1] 		: 6: 1081Mhz 
GPU[1] 		: 7: 1226Mhz 
GPU[1] 		: 
GPU[1] 		: Supported mclk frequencies on GPU1
GPU[1] 		: 0: 351Mhz *
GPU[1] 		: 1: 801Mhz 
GPU[1] 		: 2: 1001Mhz 
GPU[1] 		: 
GPU[1] 		: Supported pcie frequencies on GPU1
GPU[1] 		: 0: 2.5GT/s, x16 80Mhz 
GPU[1] 		: 1: 8.0GT/s, x4 308Mhz *
GPU[1] 		: 
GPU[1] 		: Supported sclk frequencies on GPU1
GPU[1] 		: 0: 701Mhz *
GPU[1] 		: 1: 809Mhz 
GPU[1] 		: 2: 1135Mhz 
GPU[1] 		: 3: 1373Mhz 
GPU[1] 		: 4: 1547Mhz 
GPU[1] 		: 5: 1684Mhz 
GPU[1] 		: 6: 1750Mhz 
GPU[1] 		: 7: 1774Mhz 
GPU[1] 		: 8: 1802Mhz 
GPU[1] 		: 
GPU[1] 		: Supported socclk frequencies on GPU1
GPU[1] 		: 0: 310Mhz *
GPU[1] 		: 1: 524Mhz 
GPU[1] 		: 2: 567Mhz 
GPU[1] 		: 3: 619Mhz 
GPU[1] 		: 4: 680Mhz 
GPU[1] 		: 5: 756Mhz 
GPU[1] 		: 6: 850Mhz 
GPU[1] 		: 7: 972Mhz 
GPU[1] 		: 
GPU[2] 		: Supported dcefclk frequencies on GPU2
GPU[2] 		: 0: 358Mhz *
GPU[2] 		: 1: 454Mhz 
GPU[2] 		: 2: 567Mhz 
GPU[2] 		: 3: 680Mhz 
GPU[2] 		: 4: 756Mhz 
GPU[2] 		: 5: 850Mhz 
GPU[2] 		: 6: 972Mhz 
GPU[2] 		: 7: 1134Mhz 
GPU[2] 		: 
GPU[2] 		: Supported fclk frequencies on GPU2
GPU[2] 		: 0: 551Mhz 
GPU[2] 		: 1: 611Mhz 
GPU[2] 		: 2: 691Mhz 
GPU[2] 		: 3: 761Mhz 
GPU[2] 		: 4: 871Mhz 
GPU[2] 		: 5: 961Mhz 
GPU[2] 		: 6: 1081Mhz 
GPU[2] 		: 7: 1226Mhz *
GPU[2] 		: 
GPU[2] 		: Supported mclk frequencies on GPU2
GPU[2] 		: 0: 351Mhz *
GPU[2] 		: 1: 801Mhz 
GPU[2] 		: 2: 1001Mhz 
GPU[2] 		: 
GPU[2] 		: Supported pcie frequencies on GPU2
GPU[2] 		: 0: 2.5GT/s, x16 80Mhz 
GPU[2] 		: 1: 8.0GT/s, x8 308Mhz 
GPU[2] 		: 
GPU[2] 		: Supported sclk frequencies on GPU2
GPU[2] 		: 0: 701Mhz 
GPU[2] 		: 1: 809Mhz *
GPU[2] 		: 2: 1135Mhz 
GPU[2] 		: 3: 1373Mhz 
GPU[2] 		: 4: 1547Mhz 
GPU[2] 		: 5: 1684Mhz 
GPU[2] 		: 6: 1750Mhz 
GPU[2] 		: 7: 1774Mhz 
GPU[2] 		: 8: 1802Mhz 
GPU[2] 		: 
GPU[2] 		: Supported socclk frequencies on GPU2
GPU[2] 		: 0: 310Mhz 
GPU[2] 		: 1: 524Mhz 
GPU[2] 		: 2: 567Mhz 
GPU[2] 		: 3: 619Mhz 
GPU[2] 		: 4: 680Mhz 
GPU[2] 		: 5: 756Mhz 
GPU[2] 		: 6: 850Mhz 
GPU[2] 		: 7: 972Mhz *
GPU[2] 		: 
GPU[3] 		: Supported dcefclk frequencies on GPU3
GPU[3] 		: 0: 358Mhz *
GPU[3] 		: 1: 454Mhz 
GPU[3] 		: 2: 567Mhz 
GPU[3] 		: 3: 680Mhz 
GPU[3] 		: 4: 756Mhz 
GPU[3] 		: 5: 850Mhz 
GPU[3] 		: 6: 972Mhz 
GPU[3] 		: 7: 1134Mhz 
GPU[3] 		: 
GPU[3] 		: Supported fclk frequencies on GPU3
GPU[3] 		: 0: 551Mhz 
GPU[3] 		: 1: 611Mhz 
GPU[3] 		: 2: 691Mhz 
GPU[3] 		: 3: 761Mhz 
GPU[3] 		: 4: 871Mhz 
GPU[3] 		: 5: 961Mhz 
GPU[3] 		: 6: 1081Mhz 
GPU[3] 		: 7: 1226Mhz *
GPU[3] 		: 
GPU[3] 		: Supported mclk frequencies on GPU3
GPU[3] 		: 0: 351Mhz 
GPU[3] 		: 1: 801Mhz 
GPU[3] 		: 2: 1001Mhz *
GPU[3] 		: 
GPU[3] 		: Supported pcie frequencies on GPU3
GPU[3] 		: 0: 2.5GT/s, x16 80Mhz 
GPU[3] 		: 1: 8.0GT/s, x8 308Mhz *
GPU[3] 		: 
GPU[3] 		: Supported sclk frequencies on GPU3
GPU[3] 		: 0: 701Mhz 
GPU[3] 		: 1: 809Mhz 
GPU[3] 		: 2: 1135Mhz 
GPU[3] 		: 3: 1373Mhz 
GPU[3] 		: 4: 1547Mhz 
GPU[3] 		: 5: 1684Mhz 
GPU[3] 		: 6: 1750Mhz 
GPU[3] 		: 7: 1774Mhz 
GPU[3] 		: 8: 1802Mhz *
GPU[3] 		: 
GPU[3] 		: Supported socclk frequencies on GPU3
GPU[3] 		: 0: 310Mhz 
GPU[3] 		: 1: 524Mhz 
GPU[3] 		: 2: 567Mhz 
GPU[3] 		: 3: 619Mhz 
GPU[3] 		: 4: 680Mhz 
GPU[3] 		: 5: 756Mhz 
GPU[3] 		: 6: 850Mhz 
GPU[3] 		: 7: 972Mhz *
GPU[3] 		: 
================================================================================
================================================================================
GPU[0] 		: GPU use (%): 0
GPU[1] 		: GPU use (%): 0
GPU[2] 		: GPU use (%): 0
GPU[3] 		: GPU use (%): 0
================================================================================
================================================================================
GPU[0] 		: GPU memory use (%): 0
GPU[1] 		: GPU memory use (%): 0
GPU[2] 		: GPU memory use (%): 0
GPU[3] 		: GPU memory use (%): 0
================================================================================
================================================================================
GPU[0] 		: GPU memory vendor: hynix
GPU[1] 		: GPU memory vendor: hynix
GPU[2] 		: GPU memory vendor: samsung
GPU[3] 		: GPU memory vendor: hynix
================================================================================
================================================================================
GPU[0] 		: PCIe Replay Count: 0
GPU[1] 		: PCIe Replay Count: 0
GPU[2] 		: PCIe Replay Count: 0
GPU[3] 		: PCIe Replay Count: 0
================================================================================
================================================================================
GPU[0] 		: Unique ID: 1d20492172dc76bd
GPU[1] 		: Unique ID: c5d6812172dc76e9
GPU[2] 		: Unique ID: a5de212172dc768b
GPU[3] 		: Unique ID: d08e516172fc1a8a
================================================================================
================================================================================
GPU[0] 		: Serial Number: N/A
GPU[1] 		: Serial Number: N/A
GPU[2] 		: Serial Number: N/A
GPU[3] 		: Serial Number: N/A
================================================================================
PIDs for KFD processes:
1773 1015 4193
================================================================================
================================================================================
================================================================================
GPU[0] 		: Voltage (mV): 737
GPU[1] 		: Voltage (mV): 743
GPU[2] 		: Voltage (mV): 737
GPU[3] 		: Voltage (mV): 1106
================================================================================
================================================================================
GPU[0] 		: PCI Bus: 0000:03:00.0
GPU[1] 		: PCI Bus: 0000:09:00.0
GPU[2] 		: PCI Bus: 0000:11:00.0
GPU[3] 		: PCI Bus: 0000:14:00.0
================================================================================
================================================================================
GPU[0] 		: ASD firmware version:  	1757293
GPU[0] 		: CE firmware version:  	78
GPU[0] 		: DMCU firmware version:  	0
GPU[0] 		: MC firmware version:  	0
GPU[0] 		: ME firmware version:  	160
GPU[0] 		: MEC firmware version:  	421
GPU[0] 		: MEC2 firmware version:  	421
GPU[0] 		: PFP firmware version:  	183
GPU[0] 		: RLC firmware version:  	50
GPU[0] 		: RLC SRLC firmware version:  	0
GPU[0] 		: RLC SRLG firmware version:  	0
GPU[0] 		: RLC SRLS firmware version:  	0
GPU[0] 		: SDMA firmware version:  	141
GPU[0] 		: SDMA2 firmware version:  	141
GPU[0] 		: SMC firmware version:  	00.40.43.00
GPU[0] 		: SOS firmware version:  	0x0008005f
GPU[0] 		: TA RAS firmware version:  	00.00.00.00
GPU[0] 		: TA XGMI firmware version:  	00.00.00.00
GPU[0] 		: UVD firmware version:  	0x41001713
GPU[0] 		: VCE firmware version:  	0x37050400
GPU[0] 		: VCN firmware version:  	0x00000000
GPU[1] 		: ASD firmware version:  	1757293
GPU[1] 		: CE firmware version:  	78
GPU[1] 		: DMCU firmware version:  	0
GPU[1] 		: MC firmware version:  	0
GPU[1] 		: ME firmware version:  	160
GPU[1] 		: MEC firmware version:  	421
GPU[1] 		: MEC2 firmware version:  	421
GPU[1] 		: PFP firmware version:  	183
GPU[1] 		: RLC firmware version:  	50
GPU[1] 		: RLC SRLC firmware version:  	0
GPU[1] 		: RLC SRLG firmware version:  	0
GPU[1] 		: RLC SRLS firmware version:  	0
GPU[1] 		: SDMA firmware version:  	141
GPU[1] 		: SDMA2 firmware version:  	141
GPU[1] 		: SMC firmware version:  	00.40.43.00
GPU[1] 		: SOS firmware version:  	0x0008005f
GPU[1] 		: TA RAS firmware version:  	00.00.00.00
GPU[1] 		: TA XGMI firmware version:  	00.00.00.00
GPU[1] 		: UVD firmware version:  	0x41001713
GPU[1] 		: VCE firmware version:  	0x37050400
GPU[1] 		: VCN firmware version:  	0x00000000
GPU[2] 		: ASD firmware version:  	1757293
GPU[2] 		: CE firmware version:  	78
GPU[2] 		: DMCU firmware version:  	0
GPU[2] 		: MC firmware version:  	0
GPU[2] 		: ME firmware version:  	160
GPU[2] 		: MEC firmware version:  	421
GPU[2] 		: MEC2 firmware version:  	421
GPU[2] 		: PFP firmware version:  	183
GPU[2] 		: RLC firmware version:  	50
GPU[2] 		: RLC SRLC firmware version:  	0
GPU[2] 		: RLC SRLG firmware version:  	0
GPU[2] 		: RLC SRLS firmware version:  	0
GPU[2] 		: SDMA firmware version:  	141
GPU[2] 		: SDMA2 firmware version:  	141
GPU[2] 		: SMC firmware version:  	00.40.43.00
GPU[2] 		: SOS firmware version:  	0x0008005f
GPU[2] 		: TA RAS firmware version:  	00.00.00.00
GPU[2] 		: TA XGMI firmware version:  	00.00.00.00
GPU[2] 		: UVD firmware version:  	0x41001713
GPU[2] 		: VCE firmware version:  	0x37050400
GPU[2] 		: VCN firmware version:  	0x00000000
GPU[3] 		: ASD firmware version:  	1757293
GPU[3] 		: CE firmware version:  	78
GPU[3] 		: DMCU firmware version:  	0
GPU[3] 		: MC firmware version:  	0
GPU[3] 		: ME firmware version:  	160
GPU[3] 		: MEC firmware version:  	421
GPU[3] 		: MEC2 firmware version:  	421
GPU[3] 		: PFP firmware version:  	183
GPU[3] 		: RLC firmware version:  	50
GPU[3] 		: RLC SRLC firmware version:  	0
GPU[3] 		: RLC SRLG firmware version:  	0
GPU[3] 		: RLC SRLS firmware version:  	0
GPU[3] 		: SDMA firmware version:  	141
GPU[3] 		: SDMA2 firmware version:  	141
GPU[3] 		: SMC firmware version:  	00.40.43.00
GPU[3] 		: SOS firmware version:  	0x0008005f
GPU[3] 		: TA RAS firmware version:  	00.00.00.00
GPU[3] 		: TA XGMI firmware version:  	00.00.00.00
GPU[3] 		: UVD firmware version:  	0x41001713
GPU[3] 		: VCE firmware version:  	0x37050400
GPU[3] 		: VCN firmware version:  	0x00000000
================================================================================
================================================================================
GPU[0] 		: Card series:		Vega 20 [Radeon VII]
GPU[0] 		: Card vendor:		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] 		: Card SKU:		D36002
GPU[1] 		: Card series:		Vega 20 [Radeon VII]
GPU[1] 		: Card vendor:		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1] 		: Card SKU:		D36002
GPU[2] 		: Card series:		Vega 20 [Radeon VII]
GPU[2] 		: Card vendor:		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2] 		: Card SKU:		D36002
GPU[3] 		: Card series:		Vega 20 [Radeon VII]
GPU[3] 		: Card vendor:		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3] 		: Card SKU:		D36002
================================================================================
================================================================================
================================================================================
GPU[0] 		: Unable to display sclk range
GPU[1] 		: Unable to display sclk range
GPU[2] 		: Unable to display sclk range
GPU[3] 		: Unable to display sclk range
GPU[0] 		: Unable to display mclk range
GPU[1] 		: Unable to display mclk range
GPU[2] 		: Unable to display mclk range
GPU[3] 		: Unable to display mclk range
GPU[0] 		: Unable to display voltage range
GPU[1] 		: Unable to display voltage range
GPU[2] 		: Unable to display voltage range
GPU[3] 		: Unable to display voltage range
GPU[0] 		: Unable to get voltage curve
GPU[1] 		: Unable to get voltage curve
GPU[2] 		: Unable to get voltage curve
GPU[3] 		: Unable to get voltage curve
==============================End of ROCm SMI Log ==============================
Observing further, the low PPD for these WUs seems to occur only on the box with 4 Radeon VII. The boxes with two GPUs run between 600kPPD and 1mPPD on these WUs, so it might be a problem with some limited resource. However, without source code and a way to debug, it is hard to say which one.
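
For anyone who wants to compare the cards in a log like the one above without reading it line by line, here is a minimal Python sketch. It assumes only the `GPU[n] : field: value` line layout visible in the log; the function names `parse_smi_log` and `differing_fields` are made up for illustration and are not part of ROCm or FAHClient. It groups the values per GPU and reports the fields that are not identical on every card.

```python
import re
from collections import defaultdict

# Matches lines such as "GPU[3]    : Voltage (mV): 1106".
# Lines without a "field: value" part (e.g. "Unable to display sclk range")
# do not match and are skipped.
LINE_RE = re.compile(r"^GPU\[(\d+)\]\s*:\s*(.+?):\s*(.*)$")


def parse_smi_log(text):
    """Return {gpu_index: {field: value}} for every 'GPU[n] : field: value' line."""
    gpus = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index, field, value = match.groups()
            gpus[int(index)][field.strip()] = value.strip()
    return dict(gpus)


def differing_fields(gpus):
    """Return {field: {gpu_index: value}} for fields that differ between GPUs."""
    fields = set().union(*(g.keys() for g in gpus.values()))
    result = {}
    for field in sorted(fields):
        values = {index: g.get(field) for index, g in gpus.items()}
        if len(set(values.values())) > 1:
            result[field] = values
    return result


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python compare_gpus.py < saved_smi_log.txt
    for field, values in differing_fields(parse_smi_log(sys.stdin.read())).items():
        print(field, values)
```

Fields that are expected to differ, such as the PCI bus address, show up as well, so the output is only a starting point for spotting differences (for example in the voltage readings), not a diagnosis.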

Almost all of the 13420 WUs fail close to the beginning. However, some run further: I now have 13420,8372,13,1 running at 16.8% completion with a forecast of 1.8mPPD, but that is an absolute exception. Some of the GPU threads fail to get new WUs and sit idle until I pause and resume the client; my guess is that this is related to the number of failed 13420 WUs.

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Fri Aug 07, 2020 7:45 pm
by muziqaz
16918 is not well suited to Radeon VIIs because of the small number of atoms in the simulation. There are a lot of such projects around.
1.8mPPD for 13420 is about right :)

Re: Folding is not fun right now - lots of trouble, no resul

Posted: Fri Aug 07, 2020 8:42 pm
by ThWuensche
@gunnarre: Thank you for the hint, I will definitely try that at some point. However, at the moment I'm 1250 km away from where the computers run, so I will probably wait with that experiment until I'm back.