Bad work units/folding slot failed RX580

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

Demandzm
Posts: 13
Joined: Sat Mar 28, 2020 9:45 pm

Re: Bad work units/folding slot failed RX580

Post by Demandzm »

sortShortList errors should probably be reported to the original thread just to keep them in the same place.

Is there a way I can run a test work unit on the GPU I am having an issue with? Right now I sometimes have to wait hours before I get a work unit, and then, if I am lucky, I have a window of a few minutes to experiment.
bignerdguy
Posts: 14
Joined: Mon Mar 23, 2020 1:39 pm

Re: Bad work units/folding slot failed RX580

Post by bignerdguy »

If it helps for comparison's sake in troubleshooting, I got a WU that seems to be working fine now. You can use project 11777 as a reference if needed, as it seems to be having no issues. Also, FYI, I set the clock on my GPU to the chipset default of 1469 MHz instead of the factory OC default of 1545 MHz and am going to let this one run to completion. On the next WU I get that seems to be going well, I will bump it up a bit and see what happens, just to be sure it isn't an OC type of issue. That should help you and AMD troubleshoot the issue.
Jan
Posts: 80
Joined: Tue Mar 31, 2020 6:46 pm

Re: Bad work units/folding slot failed RX580

Post by Jan »

Demandzm wrote:sortShortList errors should probably be reported to the original thread just to keep them in the same place.
Sorry, that's true. I mixed up the threads.
Demandzm
Posts: 13
Joined: Sat Mar 28, 2020 9:45 pm

Re: Bad work units/folding slot failed RX580

Post by Demandzm »

bignerdguy wrote:If it helps for comparison's sake in troubleshooting, I got a WU that seems to be working fine now. You can use project 11777 as a reference if needed, as it seems to be having no issues. Also, FYI, I set the clock on my GPU to the chipset default of 1469 MHz instead of the factory OC default of 1545 MHz and am going to let this one run to completion. On the next WU I get that seems to be going well, I will bump it up a bit and see what happens, just to be sure it isn't an OC type of issue. That should help you and AMD troubleshoot the issue.
Unfortunately that won't help me. The GPU I'm having an issue with won't even ramp up above 700 MHz. It seems to be failing before any work can be started.
ipkh
Posts: 175
Joined: Thu Jul 16, 2015 2:03 pm

Re: Bad work units/folding slot failed RX580

Post by ipkh »

I've seen a similar problem with my Nvidia GPUs. Your log shows 2 GPUs with OpenCL.
GPU 0 needs to be set to OpenCL 1
GPU 1 needs to be set to OpenCL 0
You will need to reconfigure the slots to these values to fix this. It's a glitch in the automatic slot configuration.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Bad work units/folding slot failed RX580

Post by Joe_H »

You left out the log entries that would have the "sortShortList" error. This project is already on a list of ones to be restricted from being assigned to most AMD GPUs. They should not be assigned in the future, but I am not certain how far the project researchers are in getting through the list.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Flyordie209
Posts: 1
Joined: Wed Apr 01, 2020 9:14 pm

Re: Bad work units/folding slot failed RX580

Post by Flyordie209 »

I'm getting the same errors on an RX Vega 64 Liquid. Just an FYI: I'd lob those off the list. Some WUs complete, though.
GPU 0: Bus:67 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
Project: 11762 (Run 0, Clone 10150, Gen 16)
FS01:0x22:Unit: 0x0000001e80fccb0a5e7113b78be40897
That one completed fine.

Then this one failed...
00:08:51:WU02:FS01:0x22:Project: 11781 (Run 0, Clone 111, Gen 21)
00:08:51:WU02:FS01:0x22:Unit: 0x0000002b0d5a98395e73c53f7acb9190
00:08:51:WU02:FS01:0x22:Reading tar file core.xml
00:08:51:WU02:FS01:0x22:Reading tar file integrator.xml
00:08:51:WU02:FS01:0x22:Reading tar file state.xml
00:08:52:WU02:FS01:0x22:Reading tar file system.xml
00:08:53:WU02:FS01:0x22:Digital signatures verified
00:08:53:WU02:FS01:0x22:Folding@home GPU Core22 Folding@home Core
00:08:53:WU02:FS01:0x22:Version 0.0.2
00:09:09:WU02:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
I am running WHQL drivers. November 26th, 2019 version.
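
For anyone reading the error line above: the number in parentheses is the OpenCL status code returned by the call. In the standard OpenCL headers, -5 is CL_OUT_OF_RESOURCES. The following is only a rough sketch for decoding such lines; the function name `describe_kernel_error` and the regular expression are my own assumptions and are not part of FAHClient, and the table covers only a few codes.

```python
import re

# A small subset of the status codes defined in the OpenCL headers (cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches text such as
# "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)".
# The pattern is inferred from the one log line quoted in this post.
KERNEL_ERROR = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def describe_kernel_error(line):
    """Return (kernel, OpenCL call, status name), or None if no match."""
    match = KERNEL_ERROR.search(line)
    if match is None:
        return None
    kernel, call, code = match.group(1), match.group(2), int(match.group(3))
    return kernel, call, OPENCL_ERRORS.get(code, "unknown code %d" % code)


line = ("00:09:09:WU02:FS01:0x22:ERROR:exception: Error invoking kernel "
        "sortShortList: clEnqueueNDRangeKernel (-5)")
print(describe_kernel_error(line))
```

The status name only says the driver could not get the resources it needed to launch the kernel; it does not by itself show whether the driver, the project, or the card is at fault.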
Demandzm
Posts: 13
Joined: Sat Mar 28, 2020 9:45 pm

Re: Bad work units/folding slot failed RX580

Post by Demandzm »

ipkh wrote:I've seen a similar problem with my Nvidia GPUs. Your log shows 2 GPUs with OpenCL.
GPU 0 needs to be set to OpenCL 1
GPU 1 needs to be set to OpenCL 0
You will need to reconfigure the slots to these values to fix this. It's a glitch in the automatic slot configuration.
I get the same errors when I remove the working card (with folding at home reinstalled).


Joe_H wrote:You left out the log entries that would have the "sortShortList" error. This project is already on a list of ones to be restricted from being assigned to most AMD GPUs. They should not be assigned in the future, but I am not certain how far the project researchers are in getting through the list.
I have no idea if you were replying to me. Here is a full log entry showing no sortShortList error, but still failing.

04:05:46:WU03:FS02:Starting
04:05:46:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Theater\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 03 -suffix 01 -version 705 -lifeline 6160 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
04:05:46:WU03:FS02:Started FahCore on PID 4124
04:05:46:WU03:FS02:Core PID:12000
04:05:46:WU03:FS02:FahCore 0x22 started
04:05:47:WU03:FS02:0x22:*********************** Log Started 2020-04-01T04:05:46Z ***********************
04:05:47:WU03:FS02:0x22:*************************** Core22 Folding@home Core ***************************
04:05:47:WU03:FS02:0x22:       Type: 0x22
04:05:47:WU03:FS02:0x22:       Core: Core22
04:05:47:WU03:FS02:0x22:    Website: https://foldingathome.org/
04:05:47:WU03:FS02:0x22:  Copyright: (c) 2009-2018 foldingathome.org
04:05:47:WU03:FS02:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
04:05:47:WU03:FS02:0x22:             <rafal.wiewiora@choderalab.org>
04:05:47:WU03:FS02:0x22:       Args: -dir 03 -suffix 01 -version 705 -lifeline 4124 -checkpoint 15
04:05:47:WU03:FS02:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
04:05:47:WU03:FS02:0x22:     Config: <none>
04:05:47:WU03:FS02:0x22:************************************ Build *************************************
04:05:47:WU03:FS02:0x22:    Version: 0.0.2
04:05:47:WU03:FS02:0x22:       Date: Dec 6 2019
04:05:47:WU03:FS02:0x22:       Time: 21:30:31
04:05:47:WU03:FS02:0x22: Repository: Git
04:05:47:WU03:FS02:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
04:05:47:WU03:FS02:0x22:     Branch: HEAD
04:05:47:WU03:FS02:0x22:   Compiler: Visual C++ 2008
04:05:47:WU03:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
04:05:47:WU03:FS02:0x22:   Platform: win32 10
04:05:47:WU03:FS02:0x22:       Bits: 64
04:05:47:WU03:FS02:0x22:       Mode: Release
04:05:47:WU03:FS02:0x22:************************************ System ************************************
04:05:47:WU03:FS02:0x22:        CPU: AMD Ryzen 5 1600 Six-Core Processor
04:05:47:WU03:FS02:0x22:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
04:05:47:WU03:FS02:0x22:       CPUs: 12
04:05:47:WU03:FS02:0x22:     Memory: 15.93GiB
04:05:47:WU03:FS02:0x22:Free Memory: 11.62GiB
04:05:47:WU03:FS02:0x22:    Threads: WINDOWS_THREADS
04:05:47:WU03:FS02:0x22: OS Version: 6.2
04:05:47:WU03:FS02:0x22:Has Battery: false
04:05:47:WU03:FS02:0x22: On Battery: false
04:05:47:WU03:FS02:0x22: UTC Offset: -4
04:05:47:WU03:FS02:0x22:        PID: 12000
04:05:47:WU03:FS02:0x22:        CWD: C:\Users\Theater\AppData\Roaming\FAHClient\work
04:05:47:WU03:FS02:0x22:         OS: Windows 10 Pro
04:05:47:WU03:FS02:0x22:    OS Arch: AMD64
04:05:47:WU03:FS02:0x22:********************************************************************************
04:05:47:WU03:FS02:0x22:Project: 11779 (Run 0, Clone 9659, Gen 15)
04:05:47:WU03:FS02:0x22:Unit: 0x000000160d5a98395e75895c8cbb8e0e
04:05:47:WU03:FS02:0x22:Reading tar file core.xml
04:05:47:WU03:FS02:0x22:Reading tar file integrator.xml
04:05:47:WU03:FS02:0x22:Reading tar file state.xml
04:05:47:WU03:FS02:0x22:Reading tar file system.xml
04:05:48:WU03:FS02:0x22:Digital signatures verified
04:05:48:WU03:FS02:0x22:Folding@home GPU Core22 Folding@home Core
04:05:48:WU03:FS02:0x22:Version 0.0.2
04:06:03:WU03:FS02:0x22:Completed 0 out of 1000000 steps (0%)
04:06:03:WU03:FS02:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
04:07:13:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:07:13:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
04:07:19:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:07:19:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
04:07:24:WU03:FS02:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
04:07:24:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan
04:07:24:WU03:FS02:0x22:ERROR:114: Max Retries Reached
04:07:24:WU03:FS02:0x22:Saving result file ..\logfile_01.txt
04:07:24:WU03:FS02:0x22:Saving result file badstate-0.xml
04:07:25:WU03:FS02:0x22:Saving result file badstate-1.xml
04:07:25:WU03:FS02:0x22:Saving result file badstate-2.xml
04:07:25:WU03:FS02:0x22:Saving result file checkpt.crc
04:07:25:WU03:FS02:0x22:Saving result file science.log
04:07:25:WU03:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
04:07:26:WARNING:WU03:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:07:26:WU03:FS02:Sending unit results: id:03 state:SEND error:FAULTY project:11779 run:0 clone:9659 gen:15 core:0x22 unit:0x000000160d5a98395e75895c8cbb8e0e
04:07:26:WU03:FS02:Uploading 75.88MiB to 13.90.152.57
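
Since two different failure types are being discussed in this thread (the sortShortList kernel error and the "Particle coordinate is nan" bad state), a quick tally per slot and project can make it easier to see which one a given card is hitting. This is a minimal sketch that assumes the log line layout shown above. The category labels and the `summarize` function are hypothetical names chosen for illustration, not anything FAHClient provides.

```python
import re
from collections import Counter

# Substrings taken from log lines quoted in this thread.
# The labels on the left are made up for this example.
SIGNATURES = {
    "kernel_error": "Error invoking kernel",
    "nan_coordinates": "Particle coordinate is nan",
    "bad_work_unit": "FahCore returned: BAD_WORK_UNIT",
}

PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SLOT = re.compile(r":(FS\d+):")


def summarize(log_lines):
    """Count failure signatures, keyed by (slot, project, label)."""
    counts = Counter()
    project = None  # most recent project number seen in the log
    for line in log_lines:
        found = PROJECT.search(line)
        if found:
            project = int(found.group(1))
        slot = SLOT.search(line)
        slot_name = slot.group(1) if slot else None
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[(slot_name, project, label)] += 1
    return counts


sample = [
    "04:05:47:WU03:FS02:0x22:Project: 11779 (Run 0, Clone 9659, Gen 15)",
    "04:07:13:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan",
    "04:07:19:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan",
    "04:07:24:WU03:FS02:0x22:Following exception occured: Particle coordinate is nan",
    "04:07:26:WARNING:WU03:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]
for key, count in sorted(summarize(sample).items()):
    print(key, count)
```

The sketch tracks only the most recent project line, so a log with several slots interleaved would need the project stored per work unit instead.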
sam6861
Posts: 8
Joined: Sat Mar 21, 2020 10:04 am

Re: Bad work units/folding slot failed RX580

Post by sam6861 »

With two of the same RX 580 cards, only folding slot 2 (FS02) is throwing a different set of errors: "Is your system overclocked?" and "Particle coordinate is nan". My guess is one of the following: overclock instability (the clock is too fast for the voltage), overheating, or a dying or defective graphics card.

On Windows, the performance tuning section of AMD Radeon Software, or some other overclocking utility, might fix this by lowering the core clock speed. Check which graphics card you are changing the settings for. The second RX 580 may be the one producing the errors, so try lowering its core clock.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bad work units/folding slot failed RX580

Post by bruce »

You've posted some of the applicable lines of FAH's log, but when discussing GPU configuration issues, the essential information is about a page from the very beginning of FAH's log where it says

*****SYSTEM******
CPU: .....
CPU ID: .....
CPUs: 4
Memory: ....GiB
...
... up to ...
...
OS Arch: AMD64
GPUs: 0
CUDA: Not detected:
OpenCL Device Platform:0 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:10.18
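
If the log is long, that section can also be pulled out with a few lines of script. This is a rough sketch that assumes the client log layout with an optional HH:MM:SS: prefix, a "System" banner line, and a closing line made only of asterisks. The function name `system_section` is made up for this example.

```python
import re

# Banner such as "00:21:11:******* System *******" (timestamp optional).
BANNER = re.compile(r"^(?:\d\d:\d\d:\d\d:)?\*+ System \*+$")
# Closing rule: an optional timestamp followed only by asterisks.
RULE = re.compile(r"^(?:\d\d:\d\d:\d\d:)?\*+$")


def system_section(log_text):
    """Return the lines between the System banner and the closing rule."""
    section = []
    inside = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if not inside:
            if BANNER.match(stripped):
                inside = True
            continue  # skip everything up to and including the banner
        if RULE.match(stripped):
            break
        section.append(line)
    return section


sample = """00:21:11:           Mode: Release
00:21:11:******************************* System ********************************
00:21:11:            CPU: AMD Ryzen 5 1600 Six-Core Processor
00:21:11:           GPUs: 2
00:21:11:  Win32 Service: false
00:21:11:***********************************************************************
"""
for line in system_section(sample):
    print(line)
```

It only handles the client's own startup banner; the per-core banner with the WUxx:FSxx:0x22: prefix would need a looser pattern.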
Demandzm
Posts: 13
Joined: Sat Mar 28, 2020 9:45 pm

Re: Bad work units/folding slot failed RX580

Post by Demandzm »

bruce wrote:You've posted some of the applicable lines of FAH's log, but when discussing GPU configuration issues, the essential information is about a page from the very beginning of FAH's log where it says

*****SYSTEM******
CPU: .....
CPU ID: .....
CPUs: 4
Memory: ....GiB
...
... up to ...
...
OS Arch: AMD64
GPUs: 0
CUDA: Not detected:
OpenCL Device Platform:0 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:10.18

It should be in my first log, but here it is.

*********************** Log Started 2020-03-30T00:21:11Z ***********************
00:21:11:************************* Folding@home Client *************************
00:21:11:        Website: https://foldingathome.org/
00:21:11:      Copyright: (c) 2009-2018 foldingathome.org
00:21:11:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:21:11:           Args: 
00:21:11:         Config: C:\Users\Theater\AppData\Roaming\FAHClient\config.xml
00:21:11:******************************** Build ********************************
00:21:11:        Version: 7.5.1
00:21:11:           Date: May 11 2018
00:21:11:           Time: 13:06:32
00:21:11:     Repository: Git
00:21:11:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
00:21:11:         Branch: master
00:21:11:       Compiler: Visual C++ 2008
00:21:11:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:21:11:       Platform: win32 10
00:21:11:           Bits: 32
00:21:11:           Mode: Release
00:21:11:******************************* System ********************************
00:21:11:            CPU: AMD Ryzen 5 1600 Six-Core Processor
00:21:11:         CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
00:21:11:           CPUs: 12
00:21:11:         Memory: 15.93GiB
00:21:11:    Free Memory: 14.06GiB
00:21:11:        Threads: WINDOWS_THREADS
00:21:11:     OS Version: 6.2
00:21:11:    Has Battery: false
00:21:11:     On Battery: false
00:21:11:     UTC Offset: -4
00:21:11:            PID: 7764
00:21:11:            CWD: C:\Users\Theater\AppData\Roaming\FAHClient
00:21:11:             OS: Windows 10 Enterprise
00:21:11:        OS Arch: AMD64
00:21:11:           GPUs: 2
00:21:11:          GPU 0: Bus:39 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
00:21:11:                 470/480/570/580/590]
00:21:11:          GPU 1: Bus:38 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
00:21:11:                 470/480/570/580/590]
00:21:11:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
00:21:11:                 specified module could not be found.
00:21:11:
00:21:11:OpenCL Device 0: Platform:0 Device:0 Bus:38 Slot:0 Compute:1.2 Driver:3004.8
00:21:11:OpenCL Device 1: Platform:0 Device:1 Bus:39 Slot:0 Compute:1.2 Driver:3004.8
00:21:11:  Win32 Service: false
00:21:11:***********************************************************************
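
The index mismatch ipkh described can be checked against this log by matching each GPU entry to the OpenCL device on the same PCI bus and slot. Below is a minimal sketch, assuming the line layout shown above. The function name `opencl_index_by_gpu` is hypothetical, and using the "Device:" field as the OpenCL index is my assumption.

```python
import re

GPU_LINE = re.compile(r"GPU (\d+): Bus:(\d+) Slot:(\d+)")
OPENCL_LINE = re.compile(
    r"OpenCL Device (\d+): Platform:(\d+) Device:(\d+) Bus:(\d+) Slot:(\d+)"
)


def opencl_index_by_gpu(log_text):
    """Map each GPU index to the OpenCL device index on the same bus/slot.

    Entries reported as Bus:NA or Slot:NA do not match the patterns and
    are skipped. GPUs without a matching OpenCL device map to None.
    """
    gpus = {}    # gpu index -> (bus, slot)
    opencl = {}  # (bus, slot) -> OpenCL device index
    for line in log_text.splitlines():
        found = OPENCL_LINE.search(line)
        if found:
            location = (int(found.group(4)), int(found.group(5)))
            opencl[location] = int(found.group(3))
            continue
        found = GPU_LINE.search(line)
        if found:
            gpus[int(found.group(1))] = (int(found.group(2)), int(found.group(3)))
    return {gpu: opencl.get(location) for gpu, location in gpus.items()}


sample = """\
00:21:11:          GPU 0: Bus:39 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
00:21:11:                 470/480/570/580/590]
00:21:11:          GPU 1: Bus:38 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX
00:21:11:                 470/480/570/580/590]
00:21:11:OpenCL Device 0: Platform:0 Device:0 Bus:38 Slot:0 Compute:1.2 Driver:3004.8
00:21:11:OpenCL Device 1: Platform:0 Device:1 Bus:39 Slot:0 Compute:1.2 Driver:3004.8
"""
print(opencl_index_by_gpu(sample))
```

For this log the bus numbers pair GPU 0 with OpenCL device 1 and GPU 1 with OpenCL device 0, which agrees with the values ipkh suggested. Whether that ordering is what causes the failures here is a separate question.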
Demandzm
Posts: 13
Joined: Sat Mar 28, 2020 9:45 pm

Re: Bad work units/folding slot failed RX580

Post by Demandzm »

sam6861 wrote:With two of the same RX 580 cards, only folding slot 2 (FS02) is throwing a different set of errors: "Is your system overclocked?" and "Particle coordinate is nan". My guess is one of the following: overclock instability (the clock is too fast for the voltage), overheating, or a dying or defective graphics card.

On Windows, the performance tuning section of AMD Radeon Software, or some other overclocking utility, might fix this by lowering the core clock speed. Check which graphics card you are changing the settings for. The second RX 580 may be the one producing the errors, so try lowering its core clock.

I just ran the Heaven benchmark for 2 hours on the GPU giving me issues. HWiNFO shows a 1380 MHz core clock and a 2000 MHz memory clock, a max temperature of 81 degrees C, and 0 memory errors. I also ran 3DMark and ran a cryptominer for over a day with the same results. I have no overclock set. I guess the card could still be failing, but I would expect failures with other software if it were.