Multiple Issues with AMD GPU Processing?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

Postby kwthom » Sat Apr 04, 2020 3:36 am

Mods - please move if this is in the incorrect subforum...

Judging by another thread, multiple AMD GPUs may be having issues with certain WUs?

Windows 10 Pro
Intel Core i5-9400F @ 2.9 GHz - 16 GB RAM
Radeon RX 580, 8 GB RAM, software version 19.9.2
Not overclocked (1300 MHz...) and temps hover around 80 °C; local temperatures are 26-28 °C these days...

I haven't yet found a way to track which WUs have succeeded and which haven't lately.

Should I kill the GPU for now?

Code:
******************************* Date: 2020-04-03 *******************************
22:18:56:WU00:FS01:Starting
22:18:56:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 705 -lifeline 10348 -checkpoint 20 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
22:18:56:WU00:FS01:Started FahCore on PID 904
22:18:56:WU00:FS01:Core PID:10432
22:18:56:WU00:FS01:FahCore 0x22 started
22:18:56:WU00:FS01:0x22:*********************** Log Started 2020-04-03T22:18:56Z ***********************
22:18:56:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
22:18:56:WU00:FS01:0x22:       Type: 0x22
22:18:56:WU00:FS01:0x22:       Core: Core22
22:18:56:WU00:FS01:0x22:    Website: https://foldingathome.org/
22:18:56:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
22:18:56:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
22:18:56:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
22:18:56:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 904 -checkpoint 20
22:18:56:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
22:18:56:WU00:FS01:0x22:     Config: <none>
22:18:56:WU00:FS01:0x22:************************************ Build *************************************
22:18:56:WU00:FS01:0x22:    Version: 0.0.2
22:18:56:WU00:FS01:0x22:       Date: Dec 6 2019
22:18:56:WU00:FS01:0x22:       Time: 21:30:31
22:18:56:WU00:FS01:0x22: Repository: Git
22:18:56:WU00:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
22:18:56:WU00:FS01:0x22:     Branch: HEAD
22:18:56:WU00:FS01:0x22:   Compiler: Visual C++ 2008
22:18:56:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
22:18:56:WU00:FS01:0x22:   Platform: win32 10
22:18:56:WU00:FS01:0x22:       Bits: 64
22:18:56:WU00:FS01:0x22:       Mode: Release
22:18:56:WU00:FS01:0x22:************************************ System ************************************
22:18:56:WU00:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
22:18:56:WU00:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
22:18:56:WU00:FS01:0x22:       CPUs: 6
22:18:56:WU00:FS01:0x22:     Memory: 15.93GiB
22:18:56:WU00:FS01:0x22:Free Memory: 12.63GiB
22:18:56:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
22:18:56:WU00:FS01:0x22: OS Version: 6.2
22:18:56:WU00:FS01:0x22:Has Battery: false
22:18:56:WU00:FS01:0x22: On Battery: false
22:18:56:WU00:FS01:0x22: UTC Offset: -7
22:18:56:WU00:FS01:0x22:        PID: 10432
22:18:56:WU00:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
22:18:56:WU00:FS01:0x22:         OS: Windows 10 Pro
22:18:56:WU00:FS01:0x22:    OS Arch: AMD64
22:18:56:WU00:FS01:0x22:********************************************************************************
22:18:56:WU00:FS01:0x22:Project: 11781 (Run 0, Clone 9734, Gen 17)
22:18:56:WU00:FS01:0x22:Unit: 0x0000001e0d5a98395e7588d271197e8d
22:18:56:WU00:FS01:0x22:Reading tar file core.xml
22:18:56:WU00:FS01:0x22:Reading tar file integrator.xml
22:18:56:WU00:FS01:0x22:Reading tar file state.xml
22:18:57:WU00:FS01:0x22:Reading tar file system.xml
22:18:58:WU00:FS01:0x22:Digital signatures verified
22:18:58:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
22:18:58:WU00:FS01:0x22:Version 0.0.2
22:19:10:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
22:19:10:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
22:19:10:WU00:FS01:0x22:Saving result file science.log
22:19:10:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
22:19:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:19:11:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:11781 run:0 clone:9734 gen:17 core:0x22 unit:0x0000001e0d5a98395e7588d271197e8d
22:19:11:WU00:FS01:Uploading 8.00KiB to 13.90.152.57
22:19:11:WU00:FS01:Connecting to 13.90.152.57:8080
22:19:11:WU00:FS01:Upload complete
22:19:11:WU00:FS01:Server responded WORK_ACK (400)
22:19:11:WU00:FS01:Cleaning up
22:30:21:WU00:FS01:Connecting to 65.254.110.245:8080
22:30:22:WARNING:WU00:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:30:22:WU00:FS01:Connecting to 18.218.241.186:80
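
On the question of tracking which WUs succeeded and which did not: one option is to tally the "FahCore returned:" lines in the client log by project. The following is a minimal sketch, not an official tool. It assumes the log keeps the line shapes shown in the paste above and that successful units log a "FahCore returned:" line of the same shape. The function name tally_results is made up for illustration.

```python
import re
from collections import Counter

# Matches the core's project line, e.g.
# "22:18:56:WU00:FS01:0x22:Project: 11781 (Run 0, Clone 9734, Gen 17)"
PROJECT_RE = re.compile(r":(WU\d+):(FS\d+):0x[0-9a-fA-F]+:Project: (\d+) ")

# Matches the client's verdict line, e.g.
# "22:19:11:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RESULT_RE = re.compile(r":(WU\d+):(FS\d+):FahCore returned: (\w+)")


def tally_results(lines):
    """Count FahCore return codes per project (hypothetical helper)."""
    current = {}       # (WU id, slot id) -> project number last seen
    tally = Counter()  # (project, return code) -> number of occurrences
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            current[(match.group(1), match.group(2))] = match.group(3)
            continue
        match = RESULT_RE.search(line)
        if match:
            key = (match.group(1), match.group(2))
            project = current.get(key, "unknown")
            tally[(project, match.group(3))] += 1
    return tally
```

To use it, open the client's log file, pass the file object to tally_results, and print the resulting counter. Each key is a (project, return code) pair, so a project that keeps returning BAD_WORK_UNIT stands out quickly.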


Mod Edit: Moved Thread To Correct Forum - PantherX
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby JimboPalmer » Sat Apr 04, 2020 3:45 am

Welcome to Folding@Home!

https://www.amd.com/en/support/graphics ... eon-rx-580
should be the latest driver.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 1955
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Multiple Issues with AMD GPU Processing?

Postby Joe_H » Sat Apr 04, 2020 4:26 am

The drivers may or may not help. A few days ago a problem was found where many AMD GPUs fail with that "sortShortList" error when processing a WU from a project whose atom count falls within a certain range of sizes. Assignments from the projects in that range are in the process of being restricted from most AMD cards.

This project was one of those. I will see about checking into why it was assigned to your system.

P.S. The stock clock for the reference RX 580 is 1257 MHz with a boost of 1340 MHz, so it appears your model has a minor factory overclock. Not a factor in this case, but some projects can load the card to the point where that may make a difference.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6441
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Sat Apr 04, 2020 10:35 am

This isn't the first GPU error my system has generated.

I appreciate your response and look forward to providing error-free results.

EDIT: Updated GPU drivers (20.4.1) & restarted machine. Will monitor system for improvements, if any.
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Sat Apr 04, 2020 8:17 pm

Update...nope.

Disabling GPU until further notice.

Code:
18:15:32:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11776 run:0 clone:1531 gen:16 core:0x22 unit:0x00000022287234c95e73c47d6139add9
18:15:32:WU01:FS01:Starting
18:15:32:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\kwtho\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 1028 -checkpoint 20 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
18:15:32:WU01:FS01:Started FahCore on PID 11460
18:15:32:WU01:FS01:Core PID:1336
18:15:32:WU01:FS01:FahCore 0x22 started
18:15:32:WU01:FS01:0x22:*********************** Log Started 2020-04-04T18:15:32Z ***********************
18:15:32:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
18:15:32:WU01:FS01:0x22:       Type: 0x22
18:15:32:WU01:FS01:0x22:       Core: Core22
18:15:32:WU01:FS01:0x22:    Website: https://foldingathome.org/
18:15:32:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
18:15:32:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
18:15:32:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
18:15:32:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 705 -lifeline 11460 -checkpoint 20
18:15:32:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
18:15:32:WU01:FS01:0x22:     Config: <none>
18:15:32:WU01:FS01:0x22:************************************ Build *************************************
18:15:32:WU01:FS01:0x22:    Version: 0.0.2
18:15:32:WU01:FS01:0x22:       Date: Dec 6 2019
18:15:32:WU01:FS01:0x22:       Time: 21:30:31
18:15:33:WU01:FS01:0x22: Repository: Git
18:15:33:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
18:15:33:WU01:FS01:0x22:     Branch: HEAD
18:15:33:WU01:FS01:0x22:   Compiler: Visual C++ 2008
18:15:33:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
18:15:33:WU01:FS01:0x22:   Platform: win32 10
18:15:33:WU01:FS01:0x22:       Bits: 64
18:15:33:WU01:FS01:0x22:       Mode: Release
18:15:33:WU01:FS01:0x22:************************************ System ************************************
18:15:33:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
18:15:33:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
18:15:33:WU01:FS01:0x22:       CPUs: 6
18:15:33:WU01:FS01:0x22:     Memory: 15.93GiB
18:15:33:WU01:FS01:0x22:Free Memory: 12.60GiB
18:15:33:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
18:15:33:WU01:FS01:0x22: OS Version: 6.2
18:15:33:WU01:FS01:0x22:Has Battery: false
18:15:33:WU01:FS01:0x22: On Battery: false
18:15:33:WU01:FS01:0x22: UTC Offset: -7
18:15:33:WU01:FS01:0x22:        PID: 1336
18:15:33:WU01:FS01:0x22:        CWD: C:\Users\kwtho\AppData\Roaming\FAHClient\work
18:15:33:WU01:FS01:0x22:         OS: Windows 10 Pro
18:15:33:WU01:FS01:0x22:    OS Arch: AMD64
18:15:33:WU01:FS01:0x22:********************************************************************************
18:15:33:WU01:FS01:0x22:Project: 11776 (Run 0, Clone 1531, Gen 16)
18:15:33:WU01:FS01:0x22:Unit: 0x00000022287234c95e73c47d6139add9
18:15:33:WU01:FS01:0x22:Reading tar file core.xml
18:15:33:WU01:FS01:0x22:Reading tar file integrator.xml
18:15:33:WU01:FS01:0x22:Reading tar file state.xml
18:15:33:WU01:FS01:0x22:Reading tar file system.xml
18:15:34:WU01:FS01:0x22:Digital signatures verified
18:15:34:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
18:15:34:WU01:FS01:0x22:Version 0.0.2
18:15:46:WU01:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
18:15:46:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
18:15:46:WU01:FS01:0x22:Saving result file science.log
18:15:46:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
18:15:47:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:15:47:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11776 run:0 clone:1531 gen:16 core:0x22 unit:0x00000022287234c95e73c47d6139add9
18:15:47:WU01:FS01:Uploading 8.00KiB to 40.114.52.201
18:15:47:WU01:FS01:Connecting to 40.114.52.201:8080
18:15:47:WU03:FS01:Connecting to 65.254.110.245:8080
18:15:47:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:15:47:WU03:FS01:Connecting to 18.218.241.186:80
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:15:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:15:48:WU03:FS01:Connecting to 65.254.110.245:8080
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:15:48:WU03:FS01:Connecting to 18.218.241.186:80
18:15:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:15:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:16:08:WU01:FS01:Upload complete
18:16:08:WU01:FS01:Server responded WORK_ACK (400)
18:16:08:WU01:FS01:Cleaning up
18:16:48:WU03:FS01:Connecting to 65.254.110.245:8080
18:16:48:WARNING:WU03:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
18:16:48:WU03:FS01:Connecting to 18.218.241.186:80
18:16:48:WARNING:WU03:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
18:16:48:ERROR:WU03:FS01:Exception: Could not get an assignment
18:18:25:WU03:FS01:Connecting to 65.254.110.245:8080
18:18:25:WU03:FS01:Assigned to work server 128.252.203.10
18:18:25:WU03:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580] from 128.252.203.10
18:18:25:WU03:FS01:Connecting to 128.252.203.10:8080
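
For anyone reading the ERROR line in these logs: the number in parentheses is an OpenCL status code, and -5 is CL_OUT_OF_RESOURCES in the Khronos cl.h header. That code is generic and does not by itself say whether the driver, the core, or the WU is at fault. Below is a small sketch that pulls the kernel name, the failing call, and the code out of such a line. The helper name decode_kernel_error is hypothetical, and the table covers only a few runtime codes.

```python
import re

# A small subset of OpenCL runtime status codes from the Khronos CL/cl.h header.
CL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches e.g. "Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)"
ERROR_RE = re.compile(r"Error invoking kernel (\w+): (\w+) \((-?\d+)\)")


def decode_kernel_error(line):
    """Return (kernel, OpenCL call, status name), or None if the line has no match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(3))
    # Codes outside the subset above are reported numerically rather than guessed.
    name = CL_ERRORS.get(code, "unknown ({})".format(code))
    return match.group(1), match.group(2), name
```

Feeding it the ERROR line from the log above yields the kernel sortShortList, the call clEnqueueNDRangeKernel, and the status name CL_OUT_OF_RESOURCES.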
Last edited by kwthom on Sat Apr 04, 2020 10:31 pm, edited 1 time in total.
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Sat Apr 04, 2020 10:19 pm

I just purchased an RX 570 (yes, I know it is not the fastest, but it was a good price). I can't seem to complete any Folding@home GPU tasks on it. The GPU will fail, fall back to a checkpoint, then eventually give up. It crashes my Radeon control software (which will start a new instance). One time it locked up my computer (I had to do a power cycle).

I ran FurMark for over an hour with no problems. The temperature stayed below 75 and there were no crashes. I have been running Einstein@Home (and other AMD GPU BOINC projects) with no issues.

Unless someone can think of something else, I have to believe that some of the Folding@home GPU cores are not stable.

Anyone have thoughts?
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby PantherX » Sat Apr 04, 2020 10:27 pm

mwroggenbuck wrote:...Unless someone can think of something else, I have to believe that some of the Folding at Home GPU cores are not stable.

Anyone have thoughts?

As posted by Joe_H above:
Joe_H wrote: The drivers may or may not help. A few days ago a problem was found where many AMD GPUs fail with that "sortShortList" error when processing a WU from a project whose atom count falls within a certain range of sizes. Assignments from the projects in that range are in the process of being restricted from most AMD cards.

This project was one of those...


Every now and then, a vendor publishes a driver that breaks F@H GPU folding because of a bug, a missing feature, or the like. How long it takes to resolve depends on the vendor. However, you can help out by reporting it to the vendor and asking them to fix it.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

PantherX
Site Moderator
 
Posts: 6327
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Sat Apr 04, 2020 10:33 pm

This happened with three different sets of AMD drivers (I am currently using 20.4.1).

I may be dense, but if this is a driver problem, why do my other tasks (and games) work? This seems to be specific to Folding At Home.

Just a thought. I am just trying to problem solve and not point fingers. I do not want to hurt anyone's feelings.
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby PantherX » Sat Apr 04, 2020 10:40 pm

mwroggenbuck wrote:...if this is a driver problem, why do my other tasks (and games) work? This seems to be specific to Folding At Home...

AFAIK, games mostly use graphics APIs such as DirectX, Vulkan, or OpenGL, while F@H uses OpenCL. The names OpenGL and OpenCL look similar, but the two are very different: OpenGL is for rendering frames (games), while OpenCL is for general-purpose compute (F@H and other software).

As Joe_H mentioned, once the Project assignment is fixed on F@H Servers, you will not be assigned WUs from that project but will be assigned WUs from other projects that your GPU can process :)
PantherX
Site Moderator
 
Posts: 6327
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Multiple Issues with AMD GPU Processing?

Postby Joe_H » Sat Apr 04, 2020 10:51 pm

It depends on whether those other tasks use OpenCL, which parts of OpenCL they use, and so on. The GPU folding core uses OpenCL, and uses it heavily.

What they do know is that the error occurs for WUs from projects whose size is about 170k atoms, give or take. Both the developers of the core and AMD are aware of the issue and are looking into it.
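
To illustrate what restricting assignments by atom count could look like, here is a rough sketch. It is not the actual assignment-server logic, which this thread does not show. The 170,000 center comes from the figure quoted above, but the tolerance of 10,000 is purely an assumed placeholder, since the real bounds are not given here. The function names and the sample project labels are hypothetical.

```python
def in_affected_range(atom_count, center=170_000, tolerance=10_000):
    """Return True if atom_count lies within center +/- tolerance.

    The default tolerance is an assumed placeholder, not a known bound.
    """
    return abs(atom_count - center) <= tolerance


def split_projects(atom_counts, center=170_000, tolerance=10_000):
    """Split a {project: atom_count} mapping into (restricted, allowed) lists."""
    restricted = sorted(
        project for project, count in atom_counts.items()
        if in_affected_range(count, center, tolerance)
    )
    allowed = sorted(
        project for project, count in atom_counts.items()
        if not in_affected_range(count, center, tolerance)
    )
    return restricted, allowed
```

With made-up inputs such as {"A": 165_000, "B": 60_000, "C": 181_000} and the default window, project "A" lands in the restricted list while "B" and "C" stay assignable.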
Joe_H
Site Admin
 
Posts: 6441
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Sat Apr 04, 2020 11:14 pm

I never thought about games not using OpenCL. Like you said, they probably use OpenGL or DirectX. I know that the benchmark program I used had OpenGL. However, I am sure that the other BOINC projects use OpenCL, but they may not use double precision, or the functions that Folding@home uses. That all makes sense to me now. I appreciate the responses.

I will keep an eye on this thread. If anyone thinks this has been fixed, please post here and I will try again.

Thanks.
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Sat Apr 04, 2020 11:59 pm

I'm just wondering... the ratio of AMD to Nvidia GPUs, as recorded previously, leans heavily toward Nvidia.

How many *other* AMD GPU owners have no clue that they may not be fruitful processing CV19-related WUs?

I just re-enabled my GPU - Project: 11778 (Run 0, Clone 4164, Gen 23) WU seems to be processing at this time.
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby uyaem » Sun Apr 05, 2020 12:06 am

kwthom wrote: I'm just wondering... the ratio of AMD to Nvidia GPUs, as recorded previously, leans heavily toward Nvidia.

How many *other* AMD GPU owners have no clue that they may not be fruitful processing CV19-related WUs?

I just re-enabled my GPU - Project: 11778 (Run 0, Clone 4164, Gen 23) WU seems to be processing at this time.

I don't think the stats are based on successful returns, but on the requests of WUs.
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
uyaem
 
Posts: 216
Joined: Sat Mar 21, 2020 8:35 pm
Location: Esslingen, Germany

Re: Multiple Issues with AMD GPU Processing?

Postby JimboPalmer » Sun Apr 05, 2020 12:09 am

In the past, there were years when Nvidia cards did much more work than AMD cards. You can see 'habit' at work: we are still buying Nvidia. (My GTX 1050 Ti LP just lost a fan, and I eBayed a GTX 1650 LP without ever looking at how strong an AMD card I could get in low profile with no power connectors. Just a knee-jerk reaction toward Nvidia.)
JimboPalmer
 
Posts: 1955
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Sun Apr 05, 2020 3:31 am

JimboPalmer wrote:You can see 'habit' at work, we are still buying Nvidia.<...>

This is the first 'big time' GPU I've purchased in... a really long time. Yes, I recognized this wasn't exactly bleeding edge tech when I built this current machine about a year ago, but it was decent horsepower for the money.

It crunched a *lot* of Seti@Home data, with no issues that I'm aware of. I'm fully cognizant that this is apples vs. oranges.

That earlier WU has completed with no errors, uploaded successfully.

Thus, some WUs are acceptable to the GPU (myself and ~70,000 others have...), and some will fail. What is the ratio of successful to unsuccessful WUs?

Probably beyond my pay grade.

The education is appreciated - thanks!
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ
