Asus Arez Vega 56 - Particle coordinate is nan

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

saschahi
Posts: 3
Joined: Thu Mar 26, 2020 2:10 pm

Asus Arez Vega 56 - Particle coordinate is nan

Post by saschahi »

Hello,

Every time, within the first 5% of a work unit, FAH stops working with the following:

Code:

08:19:48:WU01:FS01:FahCore 0x22 started
08:19:49:WU01:FS01:0x22:*********************** Log Started 2020-03-26T08:19:48Z ***********************
08:19:49:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
08:19:49:WU01:FS01:0x22:       Type: 0x22
08:19:49:WU01:FS01:0x22:       Core: Core22
08:19:49:WU01:FS01:0x22:    Website: https://foldingathome.org/
08:19:49:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
08:19:49:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
08:19:49:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
08:19:49:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 2284 -checkpoint 15
08:19:49:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
08:19:49:WU01:FS01:0x22:     Config: <none>
08:19:49:WU01:FS01:0x22:************************************ Build *************************************
08:19:49:WU01:FS01:0x22:    Version: 0.0.2
08:19:49:WU01:FS01:0x22:       Date: Dec 6 2019
08:19:49:WU01:FS01:0x22:       Time: 21:30:31
08:19:49:WU01:FS01:0x22: Repository: Git
08:19:49:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
08:19:49:WU01:FS01:0x22:     Branch: HEAD
08:19:49:WU01:FS01:0x22:   Compiler: Visual C++ 2008
08:19:49:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
08:19:49:WU01:FS01:0x22:   Platform: win32 10
08:19:49:WU01:FS01:0x22:       Bits: 64
08:19:49:WU01:FS01:0x22:       Mode: Release
08:19:49:WU01:FS01:0x22:************************************ System ************************************
08:19:49:WU01:FS01:0x22:        CPU: AMD Ryzen 5 3600 6-Core Processor
08:19:49:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
08:19:49:WU01:FS01:0x22:       CPUs: 12
08:19:49:WU01:FS01:0x22:     Memory: 31.95GiB
08:19:49:WU01:FS01:0x22:Free Memory: 25.81GiB
08:19:49:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
08:19:49:WU01:FS01:0x22: OS Version: 6.2
08:19:49:WU01:FS01:0x22:Has Battery: false
08:19:49:WU01:FS01:0x22: On Battery: false
08:19:49:WU01:FS01:0x22: UTC Offset: 1
08:19:49:WU01:FS01:0x22:        PID: 9616
08:19:49:WU01:FS01:0x22:        CWD: C:\Users\Saschahi\AppData\Roaming\FAHClient\work
08:19:49:WU01:FS01:0x22:         OS: Windows 10 Education
08:19:49:WU01:FS01:0x22:    OS Arch: AMD64
08:19:49:WU01:FS01:0x22:********************************************************************************
08:19:49:WU01:FS01:0x22:Project: 11762 (Run 0, Clone 6560, Gen 13)
08:19:49:WU01:FS01:0x22:Unit: 0x0000001480fccb0a5e7113df24f72b19
08:19:49:WU01:FS01:0x22:Reading tar file core.xml
08:19:49:WU01:FS01:0x22:Reading tar file integrator.xml
08:19:49:WU01:FS01:0x22:Reading tar file state.xml
08:19:49:WU01:FS01:0x22:Reading tar file system.xml
08:19:49:WU01:FS01:0x22:Digital signatures verified
08:19:49:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
08:19:49:WU01:FS01:0x22:Version 0.0.2
08:20:03:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
08:20:03:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:20:09:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:09:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
08:20:13:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:13:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
08:20:17:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:17:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
08:20:17:WU01:FS01:0x22:ERROR:114: Max Retries Reached
08:20:17:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
08:20:17:WU01:FS01:0x22:Saving result file badstate-0.xml
08:20:17:WU01:FS01:0x22:Saving result file badstate-1.xml
08:20:17:WU01:FS01:0x22:Saving result file badstate-2.xml
08:20:17:WU01:FS01:0x22:Saving result file checkpt.crc
08:20:17:WU01:FS01:0x22:Saving result file science.log
08:20:17:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
08:20:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
Sometimes that happens immediately, like here, and sometimes it runs for about 3-5% and then errors out with the same exception.
From what I've noticed, it errors out faster if I'm working at the machine while FAH is running (even if I'm just watching YouTube).
I've tried:
- using every available default setting in GPU Tweak 2
- undervolting
- underclocking
- googling for someone with the same GPU and the same error

My CPU folding with my overclocked Ryzen 5 3600 works fine.

pls help

- Saschahi
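
To put a number on how often the GPU slot fails, a log like the one above can be tallied with a short script. This is only a minimal sketch, assuming the line layout seen in the excerpt (`HH:MM:SS:WUnn:FSnn:...`); the function name `summarize_bad_states` and the `SAMPLE` text are illustrative and not part of FAHClient.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   08:20:09:WU01:FS01:0x22:Bad State detected... attempting to resume ...
#   08:20:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
LINE_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:WARNING:|ERROR:)?(WU\d+):(FS\d+):(.*)$"
)


def summarize_bad_states(log_text):
    """Count bad-state retries and BAD_WORK_UNIT returns per folding slot."""
    retries = Counter()
    bad_units = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # client banner, config dump, etc.
        _time, _wu, slot, message = match.groups()
        if "Bad State detected" in message:
            retries[slot] += 1
        elif "FahCore returned: BAD_WORK_UNIT" in message:
            bad_units[slot] += 1
    return retries, bad_units


# Lines copied from the log in this post (shortened to the relevant ones).
SAMPLE = """\
08:20:09:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:09:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
08:20:13:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:17:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
08:20:17:WU01:FS01:0x22:ERROR:114: Max Retries Reached
08:20:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
"""

retries, bad_units = summarize_bad_states(SAMPLE)
print(dict(retries), dict(bad_units))
```

Run against a full log file, the same function would show whether only the GPU slot (FS01 here) accumulates retries while the CPU slot stays clean.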
MrFrizzy
Posts: 123
Joined: Fri Feb 14, 2020 4:48 am

Re: Asus Arez Vega 56 - Particle coordinate is nan

Post by MrFrizzy »

  • What other projects have been giving you this error?
  • Have you been able to fold any projects successfully, either recently or in the past?
  • Have you done a clean install of the most recent drivers?
  • How far have you tried to undervolt/underclock?
  • What are your thermals looking like?
S1: AMD R5 3600 & Sapphire RX 5700 XT Reference @2.1GHz under water
S2: Intel Xeon E5-2620v3 & MSI GTX 1650

RX 5700 XT Project & PPD Tracking Spreadsheet

saschahi
Posts: 3
Joined: Thu Mar 26, 2020 2:10 pm

Re: Asus Arez Vega 56 - Particle coordinate is nan

Post by saschahi »

What other projects have been giving you this error?
- This is the first time I've seen this error anywhere. If you mean within FAH: it came up on ~12-15 consecutive WUs.
Additionally:

Code:

07:19:02:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
This started popping up recently too.
Have you been able to fold any projects successfully, either recently or in the past?
- No, only CPU projects. I only tried folding for the first time about 3 days ago.
Have you done a clean install of the most recent drivers?
- I updated my drivers yesterday to 20.3.1; it didn't help.
How far have you tried to undervolt/underclock?
- undervolt: from (stock) 1200 to 1100
- underclock: from (stock) 1590 to 1500
What are your thermals looking like?
- I only reach the thermal limit I set, 76 degrees Celsius, after ~3 minutes of a FurMark burn test. I've never seen it go above 80.
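
For reference, the `(-5)` in the `clEnqueueNDRangeKernel` line is an OpenCL status code; in the standard OpenCL headers, -5 is `CL_OUT_OF_RESOURCES`. The snippet below is a small sketch that pulls the kernel name, API call, and code out of such a log line. The table lists only a few codes, and `decode_kernel_error` is an illustrative helper, not something FAHCore provides.

```python
import re

# A few status codes from the OpenCL headers (cl.h); deliberately incomplete.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Assumed message shape, taken from the log line quoted above.
KERNEL_ERR_RE = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def decode_kernel_error(line):
    """Return (kernel, api_call, status_name) or None if the line does not match."""
    match = KERNEL_ERR_RE.search(line)
    if match is None:
        return None
    kernel, api_call, code = match.group(1), match.group(2), int(match.group(3))
    return kernel, api_call, CL_ERRORS.get(code, "unknown (%d)" % code)


line = ("07:19:02:WU01:FS01:0x22:ERROR:exception: "
        "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)")
decoded = decode_kernel_error(line)
print(decoded)
```

The status name alone does not say why the launch failed; it only narrows the failure to the kernel launch itself rather than, say, a missing device.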
saschahi
Posts: 3
Joined: Thu Mar 26, 2020 2:10 pm

Re: Asus Arez Vega 56 - Particle coordinate is nan

Post by saschahi »

Just a quick update with my last failed WU

Code:

*********************** Log Started 2020-03-27T09:48:33Z ***********************
09:48:33:************************* Folding@home Client *************************
09:48:33:        Website: https://foldingathome.org/
09:48:33:      Copyright: (c) 2009-2018 foldingathome.org
09:48:33:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:48:33:           Args: 
09:48:33:         Config: C:\Users\Saschahi\AppData\Roaming\FAHClient\config.xml
09:48:33:******************************** Build ********************************
09:48:33:        Version: 7.5.1
09:48:33:           Date: May 11 2018
09:48:33:           Time: 13:06:32
09:48:33:     Repository: Git
09:48:33:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
09:48:33:         Branch: master
09:48:33:       Compiler: Visual C++ 2008
09:48:33:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:48:33:       Platform: win32 10
09:48:33:           Bits: 32
09:48:33:           Mode: Release
09:48:33:******************************* System ********************************
09:48:33:            CPU: AMD Ryzen 5 3600 6-Core Processor
09:48:33:         CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
09:48:33:           CPUs: 12
09:48:33:         Memory: 31.95GiB
09:48:33:    Free Memory: 28.45GiB
09:48:33:        Threads: WINDOWS_THREADS
09:48:33:     OS Version: 6.2
09:48:33:    Has Battery: false
09:48:33:     On Battery: false
09:48:33:     UTC Offset: 1
09:48:33:            PID: 120
09:48:33:            CWD: C:\Users\Saschahi\AppData\Roaming\FAHClient
09:48:33:             OS: Windows 10 Enterprise
09:48:33:        OS Arch: AMD64
09:48:33:           GPUs: 1
09:48:33:          GPU 0: Bus:43 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
09:48:33:           CUDA: Not detected: cuInit() returned 999
09:48:33:OpenCL Device 0: Platform:0 Device:0 Bus:43 Slot:0 Compute:1.2 Driver:3004.8
09:48:33:  Win32 Service: false
09:48:33:***********************************************************************
09:48:33:<config>
09:48:33:  <!-- Network -->
09:48:33:  <proxy v=':8080'/>
09:48:33:
09:48:33:  <!-- Slot Control -->
09:48:33:  <pause-on-battery v='false'/>
09:48:33:
09:48:33:  <!-- User Information -->
09:48:33:  <passkey v='********************************'/>
09:48:33:  <team v='223518'/>
09:48:33:  <user v='saschahi'/>
09:48:33:
09:48:33:  <!-- Folding Slots -->
09:48:33:  <slot id='0' type='CPU'/>
09:48:33:  <slot id='1' type='GPU'/>
09:48:33:</config>
09:48:33:Trying to access database...
09:48:33:Successfully acquired database lock
09:48:33:Enabled folding slot 00: READY cpu:10
09:48:33:Enabled folding slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64]
09:48:34:WU00:FS00:Connecting to 65.254.110.245:8080
09:48:34:WU01:FS01:Connecting to 65.254.110.245:8080
09:48:35:WU00:FS00:Assigned to work server 13.90.152.57
09:48:35:WU00:FS00:Requesting new work unit for slot 00: READY cpu:10 from 13.90.152.57
09:48:35:WU01:FS01:Assigned to work server 13.90.152.57
09:48:35:WU00:FS00:Connecting to 13.90.152.57:8080
09:48:35:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] from 13.90.152.57
09:48:35:WU01:FS01:Connecting to 13.90.152.57:8080
09:48:56:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
09:48:56:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
09:48:56:WU01:FS01:Connecting to 13.90.152.57:80
09:48:56:WU00:FS00:Connecting to 13.90.152.57:80
09:49:23:WU01:FS01:Downloading 51.19MiB
09:49:29:WU01:FS01:Download 20.39%
09:49:35:WU01:FS01:Download 40.53%
09:49:41:WU01:FS01:Download 56.40%
09:49:47:WU01:FS01:Download 77.16%
09:49:55:WU01:FS01:Download 81.06%
09:50:00:WU01:FS01:Download complete
09:50:00:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11780 run:0 clone:6346 gen:6 core:0x22 unit:0x0000000a0d5a98395e7589448a4ffc01
09:50:00:WU01:FS01:Starting
09:50:00:WU01:FS01:Running FahCore: D:\FAH\FAHClient/FAHCoreWrapper.exe C:\Users\Saschahi\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 120 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
09:50:00:WU01:FS01:Started FahCore on PID 10028
09:50:03:WU01:FS01:Core PID:7028
09:50:03:WU01:FS01:FahCore 0x22 started
09:50:04:WU01:FS01:0x22:*********************** Log Started 2020-03-27T09:50:03Z ***********************
09:50:04:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
09:50:04:WU01:FS01:0x22:       Type: 0x22
09:50:04:WU01:FS01:0x22:       Core: Core22
09:50:04:WU01:FS01:0x22:    Website: https://foldingathome.org/
09:50:04:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
09:50:04:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
09:50:04:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
09:50:04:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 10028 -checkpoint 15
09:50:04:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
09:50:04:WU01:FS01:0x22:     Config: <none>
09:50:04:WU01:FS01:0x22:************************************ Build *************************************
09:50:04:WU01:FS01:0x22:    Version: 0.0.2
09:50:04:WU01:FS01:0x22:       Date: Dec 6 2019
09:50:04:WU01:FS01:0x22:       Time: 21:30:31
09:50:04:WU01:FS01:0x22: Repository: Git
09:50:04:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
09:50:04:WU01:FS01:0x22:     Branch: HEAD
09:50:04:WU01:FS01:0x22:   Compiler: Visual C++ 2008
09:50:04:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:50:04:WU01:FS01:0x22:   Platform: win32 10
09:50:04:WU01:FS01:0x22:       Bits: 64
09:50:04:WU01:FS01:0x22:       Mode: Release
09:50:04:WU01:FS01:0x22:************************************ System ************************************
09:50:04:WU01:FS01:0x22:        CPU: AMD Ryzen 5 3600 6-Core Processor
09:50:04:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
09:50:04:WU01:FS01:0x22:       CPUs: 12
09:50:04:WU01:FS01:0x22:     Memory: 31.95GiB
09:50:04:WU01:FS01:0x22:Free Memory: 27.73GiB
09:50:04:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
09:50:04:WU01:FS01:0x22: OS Version: 6.2
09:50:04:WU01:FS01:0x22:Has Battery: false
09:50:04:WU01:FS01:0x22: On Battery: false
09:50:04:WU01:FS01:0x22: UTC Offset: 1
09:50:04:WU01:FS01:0x22:        PID: 7028
09:50:04:WU01:FS01:0x22:        CWD: C:\Users\Saschahi\AppData\Roaming\FAHClient\work
09:50:04:WU01:FS01:0x22:         OS: Windows 10 Education
09:50:04:WU01:FS01:0x22:    OS Arch: AMD64
09:50:04:WU01:FS01:0x22:********************************************************************************
09:50:04:WU01:FS01:0x22:Project: 11780 (Run 0, Clone 6346, Gen 6)
09:50:04:WU01:FS01:0x22:Unit: 0x0000000a0d5a98395e7589448a4ffc01
09:50:04:WU01:FS01:0x22:Reading tar file core.xml
09:50:04:WU01:FS01:0x22:Reading tar file integrator.xml
09:50:04:WU01:FS01:0x22:Reading tar file state.xml
09:50:04:WU01:FS01:0x22:Reading tar file system.xml
09:50:04:WU01:FS01:0x22:Digital signatures verified
09:50:04:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:50:04:WU01:FS01:0x22:Version 0.0.2
09:50:08:WU00:FS00:Downloading 4.19MiB
09:50:09:WU00:FS00:Download complete
09:50:09:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13861 run:0 clone:8265 gen:15 core:0xa7 unit:0x000000120d5a98395e730e14f0e4f245
09:50:10:WU00:FS00:Starting
09:50:10:WU00:FS00:Running FahCore: D:\FAH\FAHClient/FAHCoreWrapper.exe C:\Users\Saschahi\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 00 -suffix 01 -version 705 -lifeline 120 -checkpoint 15 -np 10
09:50:10:WU00:FS00:Started FahCore on PID 9444
09:50:10:WU00:FS00:Core PID:1164
09:50:10:WU00:FS00:FahCore 0xa7 started
09:50:12:WU00:FS00:0xa7:*********************** Log Started 2020-03-27T09:50:11Z ***********************
09:50:12:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
09:50:12:WU00:FS00:0xa7:       Type: 0xa7
09:50:12:WU00:FS00:0xa7:       Core: Gromacs
09:50:12:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 9444 -checkpoint 15 -np
09:50:12:WU00:FS00:0xa7:             10
09:50:12:WU00:FS00:0xa7:************************************ CBang *************************************
09:50:12:WU00:FS00:0xa7:       Date: Oct 26 2019
09:50:12:WU00:FS00:0xa7:       Time: 01:38:25
09:50:12:WU00:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
09:50:12:WU00:FS00:0xa7:     Branch: master
09:50:12:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
09:50:12:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:50:12:WU00:FS00:0xa7:   Platform: win32 10
09:50:12:WU00:FS00:0xa7:       Bits: 64
09:50:12:WU00:FS00:0xa7:       Mode: Release
09:50:12:WU00:FS00:0xa7:************************************ System ************************************
09:50:12:WU00:FS00:0xa7:        CPU: AMD Ryzen 5 3600 6-Core Processor
09:50:12:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
09:50:12:WU00:FS00:0xa7:       CPUs: 12
09:50:12:WU00:FS00:0xa7:     Memory: 31.95GiB
09:50:12:WU00:FS00:0xa7:Free Memory: 27.38GiB
09:50:12:WU00:FS00:0xa7:    Threads: WINDOWS_THREADS
09:50:12:WU00:FS00:0xa7: OS Version: 6.2
09:50:12:WU00:FS00:0xa7:Has Battery: false
09:50:12:WU00:FS00:0xa7: On Battery: false
09:50:12:WU00:FS00:0xa7: UTC Offset: 1
09:50:12:WU00:FS00:0xa7:        PID: 1164
09:50:12:WU00:FS00:0xa7:        CWD: C:\Users\Saschahi\AppData\Roaming\FAHClient\work
09:50:12:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
09:50:12:WU00:FS00:0xa7:    Version: 0.0.18
09:50:12:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:50:12:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
09:50:12:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
09:50:12:WU00:FS00:0xa7:       Date: Oct 26 2019
09:50:12:WU00:FS00:0xa7:       Time: 01:52:30
09:50:12:WU00:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
09:50:12:WU00:FS00:0xa7:     Branch: master
09:50:12:WU00:FS00:0xa7:   Compiler: Visual C++ 2008
09:50:12:WU00:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
09:50:12:WU00:FS00:0xa7:   Platform: win32 10
09:50:12:WU00:FS00:0xa7:       Bits: 64
09:50:12:WU00:FS00:0xa7:       Mode: Release
09:50:12:WU00:FS00:0xa7:************************************ Build *************************************
09:50:12:WU00:FS00:0xa7:       SIMD: avx_256
09:50:12:WU00:FS00:0xa7:********************************************************************************
09:50:12:WU00:FS00:0xa7:Project: 13861 (Run 0, Clone 8265, Gen 15)
09:50:12:WU00:FS00:0xa7:Unit: 0x000000120d5a98395e730e14f0e4f245
09:50:12:WU00:FS00:0xa7:Reading tar file core.xml
09:50:12:WU00:FS00:0xa7:Reading tar file frame15.tpr
09:50:12:WU00:FS00:0xa7:Digital signatures verified
09:50:12:WU00:FS00:0xa7:Calling: mdrun -s frame15.tpr -o frame15.trr -x frame15.xtc -e frame15.edr -cpt 15 -nt 10
09:50:12:WU00:FS00:0xa7:Steps: first=1875000 total=125000
09:50:13:WU00:FS00:0xa7:Completed 1 out of 125000 steps (0%)
09:50:24:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
09:50:24:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:50:43:WU00:FS00:0xa7:Completed 1250 out of 125000 steps (1%)
09:50:46:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:50:46:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:51:07:WU00:FS00:0xa7:Completed 2500 out of 125000 steps (2%)
09:51:13:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:51:13:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:51:31:WU00:FS00:0xa7:Completed 3750 out of 125000 steps (3%)
09:51:39:WU01:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
09:51:39:WU01:FS01:0x22:Following exception occured: Particle coordinate is nan
09:51:40:WU01:FS01:0x22:ERROR:114: Max Retries Reached
09:51:40:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
09:51:40:WU01:FS01:0x22:Saving result file badstate-0.xml
09:51:40:WU01:FS01:0x22:Saving result file badstate-1.xml
09:51:40:WU01:FS01:0x22:Saving result file badstate-2.xml
09:51:40:WU01:FS01:0x22:Saving result file checkpt.crc
09:51:40:WU01:FS01:0x22:Saving result file science.log
09:51:40:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
09:51:41:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
09:51:41:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11780 run:0 clone:6346 gen:6 core:0x22 unit:0x0000000a0d5a98395e7589448a4ffc01
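
The `Received Unit` and `Sending unit results` lines in a log like this carry `key:value` fields (project, run, clone, gen, error), so faulty returns can be listed per project. A rough sketch, assuming the field layout shown in the log above; `parse_unit_line` is a hypothetical helper name.

```python
import re

UNIT_RE = re.compile(r"(Received Unit|Sending unit results): (.*)$")


def parse_unit_line(line):
    """Split the key:value fields of a unit line into a dict, or return None."""
    match = UNIT_RE.search(line)
    if match is None:
        return None
    # Each whitespace-separated token looks like "project:11780".
    fields = dict(pair.split(":", 1) for pair in match.group(2).split())
    fields["event"] = match.group(1)
    return fields


line = ("09:51:41:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY "
        "project:11780 run:0 clone:6346 gen:6 core:0x22 "
        "unit:0x0000000a0d5a98395e7589448a4ffc01")
record = parse_unit_line(line)
if record is not None and record["error"] == "FAULTY":
    print("faulty:", record["project"], record["run"], record["clone"], record["gen"])
```

Collecting these records over several days would show whether the failures cluster on particular projects or hit every GPU work unit.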
NuovaApe
Posts: 54
Joined: Mon Jun 17, 2019 12:49 pm

Re: Asus Arez Vega 56 - Particle coordinate is nan

Post by NuovaApe »

I see you're running CPU + GPU.

You could try pausing/stopping the CPU slot to divide the problem in half. Check that your motherboard drivers are up to date, specifically anything PCI related.

Two things are in play here: increased heat and a PCI system that is busier than normal.

I had spurious issues when my tower got hot (CPU + GPU FAH slots). I needed to beef up my case fans + CPU fan and get a decent PSU to provide more than the recommended juice (by the GPU vendor). I ended up removing the CPU slot altogether, as the added heat made my PC sound like a hovercraft with all the fans maxed out. Now just the GPU is chomping along, quietly.

Your '56 recommends 750W PSU. Your CPU burns 65W.

With a decent GPU like yours the CPU will add little (comparatively) to your daily tally. I've a Vega 56 + i7 6700k and from my own experience recommend one or the other, not both. I did get my system stable but the fan noise made me opt for the quiet life.

I saw that same "Particle coordinate is nan" error during my heat troubles.

Good luck!
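
Pausing the CPU slot is usually done from FAHControl, but it can also be scripted. The sketch below assumes the v7 client's local command interface is listening on its default port 36330 without a password, and that it accepts `pause <slot>` / `unpause <slot>`; treat the host, port, and helper names as assumptions to check against your own setup. Slot 0 is the CPU slot in the config posted above.

```python
import socket

FAH_HOST = "127.0.0.1"  # assumption: the client runs on this machine
FAH_PORT = 36330        # assumption: default command port of the v7 client


def slot_command(action, slot_id):
    """Build the text command for pausing or unpausing one folding slot."""
    if action not in ("pause", "unpause"):
        raise ValueError("unsupported action: %r" % (action,))
    return "%s %d\n" % (action, slot_id)


def send_command(command, host=FAH_HOST, port=FAH_PORT, timeout=5.0):
    """Send one command to the client's command socket, then close the session."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        conn.sendall(b"exit\n")


# Example (not executed here): pause the CPU slot while testing the GPU alone.
#   send_command(slot_command("pause", 0))
#   ... let the GPU slot run for a while ...
#   send_command(slot_command("unpause", 0))
```

If the GPU slot still hits "Particle coordinate is nan" with the CPU slot paused, the extra CPU load and heat can be ruled out as the main cause.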
MrFrizzy
Posts: 123
Joined: Fri Feb 14, 2020 4:48 am

Re: Asus Arez Vega 56 - Particle coordinate is nan

Post by MrFrizzy »

saschahi wrote:
What other projects have been giving you this error?
- This is the first time I've seen this error anywhere. If you mean within FAH: it came up on ~12-15 consecutive WUs.
Additionally:

Code:

07:19:02:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
This started popping up recently too.
Have you been able to fold any projects successfully, either recently or in the past?
- No, only CPU projects. I only tried folding for the first time about 3 days ago.
Have you done a clean install of the most recent drivers?
- I updated my drivers yesterday to 20.3.1; it didn't help.
How far have you tried to undervolt/underclock?
- undervolt: from (stock) 1200 to 1100
- underclock: from (stock) 1590 to 1500
What are your thermals looking like?
- I only reach the thermal limit I set, 76 degrees Celsius, after ~3 minutes of a FurMark burn test. I've never seen it go above 80.

The "Error invoking kernel sortShortList" message is a common error that has been plaguing many GCN-based AMD cards lately. The researchers are aware, and many of the projects have begun blocking the affected cards until a permanent fix can be put into place.

Considering how many WUs have errored out in a row, this is 100% a stability issue. While undervolting and underclocking are a good start, you should also try leaving the card on stock settings and setting an aggressive power limit of around -20% to see how much that helps. Your thermals look good, and lowering the power limit will only improve them. Also, if you have touched the memory clocks at all, put those back to stock and test some more.

Lastly, what is the brand and model of your power supply?
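
To compare the settings already tried with a -20% power limit, the reductions can be worked out directly. This is a back-of-the-envelope sketch: it assumes the quoted values are millivolts and MHz, and it uses the common rule of thumb that dynamic power scales roughly with voltage squared times frequency. It ignores static power and whatever clocks the card actually sustains, so treat the result as an estimate only.

```python
def relative_change(stock, tuned):
    """Fractional change from stock to tuned (negative means a reduction)."""
    return (tuned - stock) / stock


def dynamic_power_ratio(v_stock, v_tuned, f_stock, f_tuned):
    """Rule-of-thumb estimate: dynamic power ~ V^2 * f (an approximation)."""
    return (v_tuned / v_stock) ** 2 * (f_tuned / f_stock)


# Values quoted in the thread; units (mV, MHz) are assumed.
voltage_change = relative_change(1200, 1100)
clock_change = relative_change(1590, 1500)
power_ratio = dynamic_power_ratio(1200, 1100, 1590, 1500)

print("voltage change: %+.1f%%" % (voltage_change * 100))
print("clock change:   %+.1f%%" % (clock_change * 100))
print("estimated dynamic power change: %+.1f%%" % ((power_ratio - 1) * 100))
```

Under these assumptions the undervolt and underclock together are in the same ballpark as a -20% power limit, which is one reason to test the power limit on otherwise stock settings as a separate experiment.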