AMD GPU Error sortShortList on some projects

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, PandeGroup

Re: AMD GPU Error sortShortList on some projects

Postby MrFrizzy » Tue Mar 24, 2020 11:56 pm

muziqaz wrote:Researchers are looking into disabling those projects on AMD GPUs until fix has been found.

Just to add to this discussion, I have a 5700 XT and have had no failures on any of the projects mentioned in this thread. Perhaps the source of the error isn't present on Navi cards?

Successful projects (tracked in the spreadsheet in my sig): 11741-11752, 11755, 11759, 11762-11764, 11776-11778, 11780, 11781

Driver: 20.2.2

Similar post here: https://foldingforum.org/viewtopic.php?f=81&t=32771
S1: AMD R5 3600 & Sapphire RX 5700XT Reference @2.1GHz under water
S2: Old Xeon Workstation with BIOS modded Gigabyte G1 Gaming GTX 980 Ti

RX 5700 XT Project & PPD Tracking Spreadsheet


Postby muziqaz » Wed Mar 25, 2020 12:27 am

Thank you for the information. It seems that GCN-based cards are the ones affected.
Big Navi can't come quick enough :D

Postby alxbelu » Wed Mar 25, 2020 12:33 am

muziqaz wrote:
Thank you for information. Seems that GCN based cards are influenced.
Big Navi can't come quick enough :D


Yep, and yep! (I was planning on upgrading my desktop this year; my 290X just turned 6 and deserves retirement. But I guess we'll see whether launches actually happen as planned this year.)
Official F@H Twitter (frequently updated): https://twitter.com/foldingathome
Official F@H Facebook: https://www.facebook.com/Foldinghome-136059519794607/

(I'm not affiliated with the F@H Team, just promoting these channels for official updates)

Postby _r2w_ben » Thu Mar 26, 2020 2:13 am

muziqaz wrote:Researchers are looking into disabling those projects on AMD GPUs until fix has been found.
Thank you for understanding

The restriction needs to be added to p14533. One was assigned at 2020-03-25T23:29:18Z.

Postby muziqaz » Thu Mar 26, 2020 8:37 am

Thanks for the info. It was passed to researchers.

Postby MrFrizzy » Thu Mar 26, 2020 9:14 pm

muziqaz wrote:
_r2w_ben wrote:The restriction needs to be added to p14533. One was assigned at 2020-03-25T23:29:18Z.

Thanks for the info. It was passed to researchers.

On the 5700 XT, I processed the only 14533 work unit I received to 100% and sent the results to the server, only to have the server dump them. This is a different outcome from the kernel error discussed earlier, so I think the distinction needs to be pointed out. Whatever the kernel error is about, it does not affect all AMD cards.

As pointed out in an earlier post, I can process all of the COVID-19 core22 related projects just fine; not one has errored out for any reason other than me messing with my overclock. See the spreadsheet in my sig: I have tracked 95 successful COVID-19 core22 projects (85 are shown). If any of the devs/researchers need more information, I can provide PRCG numbers for all projects with timestamps, or even the full logs (I archive all of them before the client can clean them out).

I would suggest not blocking all AMD cards on these projects, and allowing species 6 to continue folding.
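For anyone who wants to pull PRCG numbers and timestamps out of archived logs as offered above, here is a minimal sketch. It assumes the client writes one "Sending unit results" line per work unit in the format seen in logs posted in this thread; the function names `collect_results` and `tally` are made up for illustration. Note that these log lines carry only a time of day, so the date has to come from the log file itself.

```python
import re
from collections import Counter

# Matches the client's "Sending unit results" line, as seen in logs from this thread.
SEND = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Sending unit results: .*?"
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)

def collect_results(lines):
    """Return (time, project, run, clone, gen, error) for each reported work unit."""
    rows = []
    for line in lines:
        m = SEND.match(line)
        if m:
            rows.append((
                m["time"],
                int(m["project"]), int(m["run"]), int(m["clone"]), int(m["gen"]),
                m["error"],
            ))
    return rows

def tally(rows):
    """Count work units per (project, error state) pair."""
    return Counter((row[1], row[5]) for row in rows)

# A line copied from a log posted in this thread.
sample = [
    "20:42:45:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:11776 run:0 clone:1781 gen:6 core:0x22 "
    "unit:0x0000000f287234c95e73c47b56c80b8a",
]

rows = collect_results(sample)
print(rows)
print(tally(rows))
```

Feeding it every archived log file would give a per-project count of each error state, which is probably easier for the researchers to scan than full logs.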

Postby muziqaz » Thu Mar 26, 2020 9:30 pm

At the moment, projects which are known to fail on AMD are being blocked. The rest of them are freely available (relatively speaking).

Postby _r2w_ben » Sat Mar 28, 2020 9:13 pm

The restriction needs to be added to p11781. One was assigned at 2020-03-28T20:10:11Z.

Postby IkkeDus » Sat Mar 28, 2020 9:59 pm

I also see this problem.
AMD R9 280X 3GB (ID: 6798 SUB: 3001)

Project: 11776

It often seems to get stuck after the "...0x22:Version 0.0.2" log line. If I leave it alone, it stays there for hours. If I pause and unpause it, it either gets stuck there again or finishes with the error. When it errors out, it will at least try to fetch another WU.


Code:
20:42:10:WU02:FS02:Starting
20:42:10:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Ray\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 02 -suffix 01 -version 705 -lifeline 8668 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
20:42:10:WU02:FS02:Started FahCore on PID 8908
20:42:10:WU02:FS02:Core PID:8932
20:42:10:WU02:FS02:FahCore 0x22 started
20:42:11:WU02:FS02:0x22:*********************** Log Started 2020-03-28T20:42:10Z ***********************
20:42:11:WU02:FS02:0x22:*************************** Core22 Folding@home Core ***************************
20:42:11:WU02:FS02:0x22:       Type: 0x22
20:42:11:WU02:FS02:0x22:       Core: Core22
20:42:11:WU02:FS02:0x22:    Website: https://foldingathome.org/
20:42:11:WU02:FS02:0x22:  Copyright: (c) 2009-2018 foldingathome.org
20:42:11:WU02:FS02:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
20:42:11:WU02:FS02:0x22:             <rafal.wiewiora@choderalab.org>
20:42:11:WU02:FS02:0x22:       Args: -dir 02 -suffix 01 -version 705 -lifeline 8908 -checkpoint 15
20:42:11:WU02:FS02:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
20:42:11:WU02:FS02:0x22:     Config: <none>
20:42:11:WU02:FS02:0x22:************************************ Build *************************************
20:42:11:WU02:FS02:0x22:    Version: 0.0.2
20:42:11:WU02:FS02:0x22:       Date: Dec 6 2019
20:42:11:WU02:FS02:0x22:       Time: 21:30:31
20:42:11:WU02:FS02:0x22: Repository: Git
20:42:11:WU02:FS02:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
20:42:11:WU02:FS02:0x22:     Branch: HEAD
20:42:11:WU02:FS02:0x22:   Compiler: Visual C++ 2008
20:42:11:WU02:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
20:42:11:WU02:FS02:0x22:   Platform: win32 10
20:42:11:WU02:FS02:0x22:       Bits: 64
20:42:11:WU02:FS02:0x22:       Mode: Release
20:42:11:WU02:FS02:0x22:************************************ System ************************************
20:42:11:WU02:FS02:0x22:        CPU: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
20:42:11:WU02:FS02:0x22:     CPU ID: GenuineIntel Family 6 Model 23 Stepping 10
20:42:11:WU02:FS02:0x22:       CPUs: 4
20:42:11:WU02:FS02:0x22:     Memory: 4.00GiB
20:42:11:WU02:FS02:0x22:Free Memory: 2.13GiB
20:42:11:WU02:FS02:0x22:    Threads: WINDOWS_THREADS
20:42:11:WU02:FS02:0x22: OS Version: 6.2
20:42:11:WU02:FS02:0x22:Has Battery: false
20:42:11:WU02:FS02:0x22: On Battery: false
20:42:11:WU02:FS02:0x22: UTC Offset: 1
20:42:11:WU02:FS02:0x22:        PID: 8932
20:42:11:WU02:FS02:0x22:        CWD: C:\Users\\AppData\Roaming\FAHClient\work
20:42:11:WU02:FS02:0x22:         OS: Windows 10 Pro
20:42:11:WU02:FS02:0x22:    OS Arch: AMD64
20:42:11:WU02:FS02:0x22:********************************************************************************
20:42:11:WU02:FS02:0x22:Project: 11776 (Run 0, Clone 1781, Gen 6)
20:42:11:WU02:FS02:0x22:Unit: 0x0000000f287234c95e73c47b56c80b8a
20:42:11:WU02:FS02:0x22:Reading tar file core.xml
20:42:11:WU02:FS02:0x22:Reading tar file integrator.xml
20:42:11:WU02:FS02:0x22:Reading tar file state.xml
20:42:12:WU02:FS02:0x22:Reading tar file system.xml
20:42:14:WU02:FS02:0x22:Digital signatures verified
20:42:14:WU02:FS02:0x22:Folding@home GPU Core22 Folding@home Core
20:42:14:WU02:FS02:0x22:Version 0.0.2
20:42:45:WU02:FS02:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
20:42:45:WU02:FS02:0x22:Saving result file ..\logfile_01.txt
20:42:45:WU02:FS02:0x22:Saving result file science.log
20:42:45:WU02:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
20:42:45:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:42:45:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:11776 run:0 clone:1781 gen:6 core:0x22 unit:0x0000000f287234c95e73c47b56c80b8a
20:42:45:WU02:FS02:Uploading 15.00KiB to 40.114.52.201
20:42:45:WU02:FS02:Connecting to 40.114.52.201:8080
20:42:46:WU03:FS02:Connecting to 65.254.110.245:8080
20:42:46:WARNING:WU03:FS02:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:42:46:WU03:FS02:Connecting to 18.218.241.186:80
20:42:47:WARNING:WU03:FS02:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:42:47:ERROR:WU03:FS02:Exception: Could not get an assignment
20:42:47:WU03:FS02:Connecting to 65.254.110.245:8080
20:42:47:WARNING:WU03:FS02:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
20:42:47:WU03:FS02:Connecting to 18.218.241.186:80
20:42:48:WARNING:WU03:FS02:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
20:42:48:ERROR:WU03:FS02:Exception: Could not get an assignment
20:43:06:WARNING:WU02:FS02:WorkServer connection failed on port 8080 trying 80
20:43:06:WU02:FS02:Connecting to 40.114.52.201:80
20:43:14:WU02:FS02:Upload 100.00%
20:43:30:WU02:FS02:Upload complete
20:43:30:WU02:FS02:Server responded WORK_ACK (400)
20:43:30:WU02:FS02:Cleaning up
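
For reference, the -5 in the error line above is the OpenCL status code CL_OUT_OF_RESOURCES, as defined in the Khronos cl.h header. That names the code only; it does not say why the sortShortList kernel ran out of resources on these cards. Below is a minimal sketch for pulling the failing PRCG and decoding the code from a log excerpt. The function name `summarize` and the small lookup table are illustrative assumptions, not part of the FAH client.

```python
import re

# Small subset of OpenCL status codes from the Khronos cl.h header.
CL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

KERNEL_ERROR = re.compile(
    r"Error invoking kernel (?P<kernel>\w+): (?P<call>\w+) \((?P<code>-?\d+)\)"
)
PROJECT = re.compile(
    r"Project: (?P<project>\d+) \(Run (?P<run>\d+), "
    r"Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)

def summarize(log_text):
    """Return the PRCG and decoded kernel error in a log excerpt, or None."""
    error = KERNEL_ERROR.search(log_text)
    if error is None:
        return None
    project = PROJECT.search(log_text)
    prcg = None
    if project is not None:
        prcg = tuple(int(project[k]) for k in ("project", "run", "clone", "gen"))
    code = int(error["code"])
    return {
        "prcg": prcg,
        "kernel": error["kernel"],
        "call": error["call"],
        "code": code,
        "name": CL_ERRORS.get(code, "unknown"),
    }

# Two lines copied from the log above.
excerpt = (
    "20:42:11:WU02:FS02:0x22:Project: 11776 (Run 0, Clone 1781, Gen 6)\n"
    "20:42:45:WU02:FS02:0x22:ERROR:exception: Error invoking kernel "
    "sortShortList: clEnqueueNDRangeKernel (-5)\n"
)
print(summarize(excerpt))
```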
Q9550 @ 2.8 GHz | 2x R9 280X-3GB | HD 7950-3GB | Win10 x64


Postby muziqaz » Sat Mar 28, 2020 11:48 pm

Just an update, some people at AMD are aware of this issue and are looking into it :)
Hopefully we will have it solved sooner rather than later :)
Thank you for your patience

Postby bruce » Sun Mar 29, 2020 1:11 am

First, a temporary solution from FAH: those projects will not be assigned to that group of GPUs.
Second, a permanent solution: new AMD drivers or a new FAHCore from FAH that fixes the original problem will be prepared. (The temporary solution will then be removed.)

Postby alxbelu » Sun Mar 29, 2020 11:41 am

That's great news! Thanks for the update!

Postby _r2w_ben » Sat Apr 04, 2020 10:41 pm

The restriction needs to be added to p11759. One was assigned at 2020-04-04T20:59:25Z.

Postby bruce » Sat Apr 04, 2020 10:48 pm

MrFrizzy wrote:
muziqaz wrote:Researchers are looking into disabling those projects on AMD GPUs until fix has been found.

Just to add to this discussion, I have a 5700 XT and have had no failures on any of the projects mentioned in this thread. Perhaps the source of the error isn't present on Navi cards?



Right. Navi is the one exception.