Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 2:36 am
by kwthom
Mods - please move if this is in the incorrect subforum...

In another thread, it seems that multiple AMD GPUs may be having issues with certain WUs.

Windows 10 Pro
Intel Core i5-9400F @ 2.9 GHz - 16GB RAM
Radeon RX 580, 8GB RAM, Radeon Software version 19.9.2
Not overclocked (1300 MHz...) and temps hover around 80C; local ambient temperatures are 26-28C these days.

I haven't yet found a way to track which WUs have succeeded and which have failed lately.

Should I kill the GPU for now?

Code:

******************************* Date: 2020-04-03 *******************************
22:18:56:WU00:FS01:Starting
22:18:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 10348 -checkpoint 20 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
22:18:56:WU00:FS01:Started FahCore on PID 904
22:18:56:WU00:FS01:Core PID:10432
22:18:56:WU00:FS01:FahCore 0x22 started
22:18:56:WU00:FS01:0x22:*********************** Log Started 2020-04-03T22:18:56Z ***********************
22:18:56:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
22:18:56:WU00:FS01:0x22:       Type: 0x22
22:18:56:WU00:FS01:0x22:       Core: Core22
22:18:56:WU00:FS01:0x22:    Website: https://foldingathome.org/
22:18:56:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
22:18:56:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
22:18:56:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
22:18:56:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 904 -checkpoint 20
22:18:56:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
22:18:56:WU00:FS01:0x22:     Config: <none>
22:18:56:WU00:FS01:0x22:************************************ Build *************************************
22:18:56:WU00:FS01:0x22:    Version: 0.0.2
22:18:56:WU00:FS01:0x22:       Date: Dec 6 2019
22:18:56:WU00:FS01:0x22:       Time: 21:30:31
22:18:56:WU00:FS01:0x22: Repository: Git
22:18:56:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
22:18:56:WU00:FS01:0x22:     Branch: HEAD
22:18:56:WU00:FS01:0x22:   Compiler: Visual C++ 2008
22:18:56:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
22:18:56:WU00:FS01:0x22:   Platform: win32 10
22:18:56:WU00:FS01:0x22:       Bits: 64
22:18:56:WU00:FS01:0x22:       Mode: Release
22:18:56:WU00:FS01:0x22:************************************ System ************************************
22:18:56:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:18:56:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:18:56:WU00:FS01:0x22:       CPUs: 6
22:18:56:WU00:FS01:0x22:     Memory: 15.93GiB
22:18:56:WU00:FS01:0x22:Free Memory: 12.63GiB
22:18:56:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
22:18:56:WU00:FS01:0x22: OS Version: 6.2
22:18:56:WU00:FS01:0x22:Has Battery: false
22:18:56:WU00:FS01:0x22: On Battery: false
22:18:56:WU00:FS01:0x22: UTC Offset: -7
22:18:56:WU00:FS01:0x22:        PID: 10432
22:18:56:WU00:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
22:18:56:WU00:FS01:0x22:         OS: Windows 10 Pro
22:18:56:WU00:FS01:0x22:    OS Arch: AMD64
22:18:56:WU00:FS01:0x22:********************************************************************************
22:18:56:WU00:FS01:0x22:Project: 11781 (Run 0, Clone 9734, Gen 17)
22:18:56:WU00:FS01:0x22:Unit: 0x0000001e0d5a98395e7588d271197e8d
22:18:56:WU00:FS01:0x22:Reading tar file core.xml
22:18:56:WU00:FS01:0x22:Reading tar file integrator.xml
22:18:56:WU00:FS01:0x22:Reading tar file state.xml
22:18:57:WU00:FS01:0x22:Reading tar file system.xml
22:18:58:WU00:FS01:0x22:Digital signatures verified
22:18:58:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
22:18:58:WU00:FS01:0x22:Version 0.0.2
22:19:10:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
22:19:10:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
22:19:10:WU00:FS01:0x22:Saving result file science.log
22:19:10:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
22:19:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:19:11:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11781 run:0 clone:9734 gen:17 core:0x22 unit:0x0000001e0d5a98395e7588d271197e8d
22:19:11:WU00:FS01:Uploading 8.00KiB to 13.90.152.57
22:19:11:WU00:FS01:Connecting to 13.90.152.57:8080
22:19:11:WU00:FS01:Upload complete
22:19:11:WU00:FS01:Server responded WORK_ACK (400)
22:19:11:WU00:FS01:Cleaning up
22:30:21:WU00:FS01:Connecting to 65.254.110.245:8080
22:30:22:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:30:22:WU00:FS01:Connecting to 18.218.241.186:80
Mod Edit: Moved Thread To Correct Forum - PantherX

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 2:45 am
by JimboPalmer
Welcome to Folding@Home!

https://www.amd.com/en/support/graphics ... eon-rx-580
should be the latest driver.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 3:26 am
by Joe_H
The drivers may or may not help. A few days ago a problem was found where many AMD GPUs fail with that "sortShortList" error when processing a WU from a project whose atom count falls within a certain range of sizes. Projects in that range are in the process of having their assignments restricted from most AMD cards.

This project was one of those. I will see about checking into why it was assigned to your system.

P.S. The stock clock for the reference RX 580 is 1257 MHz with a boost of 1340 MHz, so it appears your model has a minor factory overclock. Not a factor in this case, but some projects can load the card to the point where that may make a difference.
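For anyone reading the log above, the "(-5)" at the end of the sortShortList line is an OpenCL status code. Below is a minimal sketch that pulls the kernel name, API call, and status out of such a line. The function name and the regular expression are illustrative assumptions based only on the message format shown in this thread; the code-to-name table is a small subset of the constants in the Khronos CL/cl.h header, where -5 is CL_OUT_OF_RESOURCES.

```python
import re

# Subset of OpenCL status codes as defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Assumption: the message format is the one seen in the log in this thread.
KERNEL_ERROR = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def decode_kernel_error(line):
    """Return (kernel, api_call, status_name), or None if the line does not match."""
    match = KERNEL_ERROR.search(line)
    if match is None:
        return None
    kernel, api_call, code = match.group(1), match.group(2), int(match.group(3))
    # Codes outside the small table above are reported as unknown.
    return kernel, api_call, OPENCL_ERRORS.get(code, f"unknown ({code})")


# Log line copied from the first post in this thread.
line = ("22:19:10:WU00:FS01:0x22:ERROR:exception: Error invoking kernel "
        "sortShortList: clEnqueueNDRangeKernel (-5)")
print(decode_kernel_error(line))
```

The status name only says the device ran out of some resource while launching the kernel; it does not by itself say whether the cause is the driver, the core, or the size of the WU.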

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:35 am
by kwthom
This isn't the first GPU error my system has generated.

I appreciate your response and look forward to providing error-free results.

EDIT: Updated GPU drivers (20.4.1) & restarted machine. Will monitor system for improvements, if any.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 7:17 pm
by kwthom
Update...nope.

Disabling GPU until further notice.

Code:

18:15:32:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11776 run:0 clone:1531 gen:16 core:0x22 unit:0x00000022287234c95e73c47d6139add9
18:15:32:WU01:FS01:Starting
18:15:32:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 1028 -checkpoint 20 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
18:15:32:WU01:FS01:Started FahCore on PID 11460
18:15:32:WU01:FS01:Core PID:1336
18:15:32:WU01:FS01:FahCore 0x22 started
18:15:32:WU01:FS01:0x22:*********************** Log Started 2020-04-04T18:15:32Z ***********************
18:15:32:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
18:15:32:WU01:FS01:0x22:       Type: 0x22
18:15:32:WU01:FS01:0x22:       Core: Core22
18:15:32:WU01:FS01:0x22:    Website: https://foldingathome.org/
18:15:32:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
18:15:32:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
18:15:32:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
18:15:32:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 11460 -checkpoint 20
18:15:32:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
18:15:32:WU01:FS01:0x22:     Config: <none>
18:15:32:WU01:FS01:0x22:************************************ Build *************************************
18:15:32:WU01:FS01:0x22:    Version: 0.0.2
18:15:32:WU01:FS01:0x22:       Date: Dec 6 2019
18:15:32:WU01:FS01:0x22:       Time: 21:30:31
18:15:33:WU01:FS01:0x22: Repository: Git
18:15:33:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
18:15:33:WU01:FS01:0x22:     Branch: HEAD
18:15:33:WU01:FS01:0x22:   Compiler: Visual C++ 2008
18:15:33:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:15:33:WU01:FS01:0x22:   Platform: win32 10
18:15:33:WU01:FS01:0x22:       Bits: 64
18:15:33:WU01:FS01:0x22:       Mode: Release
18:15:33:WU01:FS01:0x22:************************************ System ************************************
18:15:33:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
18:15:33:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:15:33:WU01:FS01:0x22:       CPUs: 6
18:15:33:WU01:FS01:0x22:     Memory: 15.93GiB
18:15:33:WU01:FS01:0x22:Free Memory: 12.60GiB
18:15:33:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
18:15:33:WU01:FS01:0x22: OS Version: 6.2
18:15:33:WU01:FS01:0x22:Has Battery: false
18:15:33:WU01:FS01:0x22: On Battery: false
18:15:33:WU01:FS01:0x22: UTC Offset: -7
18:15:33:WU01:FS01:0x22:        PID: 1336
18:15:33:WU01:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
18:15:33:WU01:FS01:0x22:         OS: Windows 10 Pro
18:15:33:WU01:FS01:0x22:    OS Arch: AMD64
18:15:33:WU01:FS01:0x22:********************************************************************************
18:15:33:WU01:FS01:0x22:Project: 11776 (Run 0, Clone 1531, Gen 16)
18:15:33:WU01:FS01:0x22:Unit: 0x00000022287234c95e73c47d6139add9
18:15:33:WU01:FS01:0x22:Reading tar file core.xml
18:15:33:WU01:FS01:0x22:Reading tar file integrator.xml
18:15:33:WU01:FS01:0x22:Reading tar file state.xml
18:15:33:WU01:FS01:0x22:Reading tar file system.xml
18:15:34:WU01:FS01:0x22:Digital signatures verified
18:15:34:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
18:15:34:WU01:FS01:0x22:Version 0.0.2
18:15:46:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
18:15:46:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
18:15:46:WU01:FS01:0x22:Saving result file science.log
18:15:46:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:15:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:15:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1531 gen:16 core:0x22 unit:0x00000022287234c95e73c47d6139add9
18:15:47:WU01:FS01:Uploading 8.00KiB to 40.114.52.201
18:15:47:WU01:FS01:Connecting to 40.114.52.201:8080
18:15:47:WU03:FS01:Connecting to 65.254.110.245:8080
18:15:47:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:15:47:WU03:FS01:Connecting to 18.218.241.186:80
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:15:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:15:48:WU03:FS01:Connecting to 65.254.110.245:8080
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:15:48:WU03:FS01:Connecting to 18.218.241.186:80
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:15:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:16:08:WU01:FS01:Upload complete
18:16:08:WU01:FS01:Server responded WORK_ACK (400)
18:16:08:WU01:FS01:Cleaning up
18:16:48:WU03:FS01:Connecting to 65.254.110.245:8080
18:16:48:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:16:48:WU03:FS01:Connecting to 18.218.241.186:80
18:16:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:16:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:18:25:WU03:FS01:Connecting to 65.254.110.245:8080
18:18:25:WU03:FS01:Assigned to work server 128.252.203.10
18:18:25:WU03:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580] from 128.252.203.10
18:18:25:WU03:FS01:Connecting to 128.252.203.10:8080

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:19 pm
by mwroggenbuck
I just purchased an RX 570 (yes, I know it is not the fastest, but it was a good price). I can't seem to run it against any Folding at Home GPU tasks. The GPU will fail, fall back to a checkpoint, then eventually give up. It crashes my Radeon control software (which will start a new instance). One time it locked up my computer (I had to do a power cycle).

I ran FurMark for over an hour with no problems. The temperature stayed below 75 and no crashes. I have been running Einstein at home (and other AMD GPU BOINC projects) with no issue.

Unless someone can think of something else, I have to believe that some of the Folding at Home GPU cores are not stable.

Anyone have thoughts?

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:27 pm
by PantherX
mwroggenbuck wrote:...Unless someone can think of something else, I have to believe that some of the Folding at Home GPU cores are not stable.

Anyone have thoughts?
As posted by Joe_H above:
Joe_H wrote:The drivers may or may not help. A few days ago a problem was found where many AMD GPUs fail with that "sortShortList" error when processing a WU from a project whose atom count falls within a certain range of sizes. Projects in that range are in the process of having their assignments restricted from most AMD cards.

This project was one of those...
Every now and then, a vendor publishes a driver that breaks F@H GPU folding because of a bug, a missing feature, etc. How long it takes to resolve depends on the vendor. However, you can help out by reporting it to the vendor and asking them to fix it.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:33 pm
by mwroggenbuck
This happened with three different sets of AMD drivers (I am currently using 20.4.1).

I may be dense, but if this is a driver problem, why do my other tasks (and games) work? This seems to be specific to Folding At Home.

Just a thought. I am just trying to problem solve and not point fingers. I do not want to hurt anyone's feelings.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:40 pm
by PantherX
mwroggenbuck wrote:...if this is a driver problem, why do my other tasks (and games) work? This seems to be specific to Folding At Home...
AFAIK, games use graphics APIs such as OpenGL or DirectX, while F@H uses OpenCL. While the names OpenGL and OpenCL look similar, the two are very different: OpenGL is for rendering frames (games) while OpenCL is for compute (F@H and other software).

As Joe_H mentioned, once the Project assignment is fixed on F@H Servers, you will not be assigned WUs from that project but will be assigned WUs from other projects that your GPU can process :)

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 9:51 pm
by Joe_H
It depends on whether those other tasks use OpenCL, what parts of OpenCL they use, and so on. The GPU folding core uses OpenCL heavily.

What they do know is that the error occurs for WUs from projects whose size is about 170k atoms, give or take. Both the developers of the core and AMD are aware of the issue and are looking into it.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 10:14 pm
by mwroggenbuck
I never thought about games not using OpenCL. Like you said, they probably use OpenGL or DirectX. I know that the benchmark program I used had OpenGL. However, I am sure that the other BOINC projects use OpenCL, but they may not use double precision, or the functions that Folding at Home uses. That all makes sense to me now. I appreciate the responses.

I will keep an eye on this thread. If anyone thinks this gets fixed, please post here and I will try again.

Thanks.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 10:59 pm
by kwthom
I'm just wondering...the ratio of AMD to Nvidia GPUs as recorded previously leans heavily toward Nvidia.

How many *other* AMD GPU owners have no clue that they may not be fruitful processing CV19-related WUs?

I just re-enabled my GPU - Project: 11778 (Run 0, Clone 4164, Gen 23) WU seems to be processing at this time.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 11:06 pm
by uyaem
kwthom wrote:I'm just wondering...the ratio of AMD to Nvidia GPUs as recorded previously leans heavily toward Nvidia.

How many *other* AMD GPU owners have no clue that they may not be fruitful processing CV19-related WUs?

I just re-enabled my GPU - Project: 11778 (Run 0, Clone 4164, Gen 23) WU seems to be processing at this time.
I think the stats are based on requests for WUs, not on successful returns.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat Apr 04, 2020 11:09 pm
by JimboPalmer
In the past, there were years when the Nvidia cards did much more work than the AMD cards. You can see 'habit' at work, we are still buying Nvidia. (My GTX 1050 Ti LP just lost a fan and I eBayed a GTX 1650 LP without ever looking at how strong an AMD card I could get in Low Profile with no power connectors. Just a knee-jerk reaction toward Nvidia.)

Re: Multiple Issues with AMD GPU Processing?

Posted: Sun Apr 05, 2020 2:31 am
by kwthom
JimboPalmer wrote:You can see 'habit' at work, we are still buying Nvidia.<...>
This is the first 'big time' GPU I've purchased in... a really long time. Yes, I recognized this wasn't exactly bleeding edge tech when I built this current machine about a year ago, but it was decent horsepower for the money.

It crunched a *lot* of Seti@Home data, with no issues that I'm aware of. I'm fully cognizant that this is apples vs. oranges.

That earlier WU has completed with no errors, uploaded successfully.

Thus, some WUs are acceptable to the GPU (myself and ~70,000 others have...), and some will fail. What is that successful vs. unsuccessful ratio?

Probably beyond my pay grade.
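For a single machine, at least, that ratio can be estimated from the client log. Below is a minimal sketch that tallies the "Sending unit results" lines by project and error field. The function names and the regular expression are illustrative assumptions based only on the log lines posted in this thread. Treating error:NO_ERROR as a successful return is also an assumption, mirrored from the "Received Unit" line above, and the third sample line is hypothetical since no log for the completed WU was posted.

```python
import re
from collections import Counter

# Assumption: result lines look like the "Sending unit results" lines in this thread.
RESULT = re.compile(r"Sending unit results: .*?error:(\w+) project:(\d+)")


def tally_results(log_lines):
    """Count returned WUs per (project, error) pair."""
    counts = Counter()
    for line in log_lines:
        match = RESULT.search(line)
        if match:
            error, project = match.group(1), int(match.group(2))
            counts[(project, error)] += 1
    return counts


def success_ratio(counts):
    """Fraction of returned WUs whose error field is NO_ERROR, or None if empty."""
    total = sum(counts.values())
    ok = sum(n for (_, error), n in counts.items() if error == "NO_ERROR")
    return ok / total if total else None


sample = [
    # The first two lines are abbreviated from the logs posted in this thread.
    "22:19:11:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11781 run:0 clone:9734 gen:17 core:0x22",
    "18:15:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1531 gen:16 core:0x22",
    # Hypothetical line for the WU that completed; its real log was not posted.
    "00:00:00:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:11778 run:0 clone:4164 gen:23 core:0x22",
]

counts = tally_results(sample)
print(counts)
print(success_ratio(counts))
```

Feeding it the lines of a real log file instead of the sample list would give a per-project breakdown, which also shows whether the failures cluster on particular projects.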

The education is appreciated - thanks!