16448 Bad Work Units, sortShortList errors

Posted: Sun Jun 14, 2020 1:18 pm
by BobWilliams757


02:09:48:WU00:FS01:Downloading 22.42MiB
02:09:54:WU00:FS01:Download 84.21%
02:09:55:WU00:FS01:Download complete
02:09:55:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:16448 run:0 clone:2704 gen:71 core:0x22 unit:0x0000006e42aa6f325ed18ce24fb35eef
02:10:59:WU01:FS01:0x21:Completed 10000000 out of 10000000 steps (100%)
02:11:01:WU01:FS01:0x21:Saving result file logfile_01.txt
02:11:01:WU01:FS01:0x21:Saving result file checkpointState.xml
02:11:01:WU01:FS01:0x21:Saving result file checkpt.crc
02:11:01:WU01:FS01:0x21:Saving result file log.txt
02:11:01:WU01:FS01:0x21:Saving result file positions.xtc
02:11:01:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
02:11:01:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
02:11:01:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:16906 run:6 clone:18 gen:23 core:0x21 unit:0x0000002a0002894c5ea45b8aab163c17
02:11:01:WU01:FS01:Uploading 11.15MiB to 155.247.166.220
02:11:01:WU01:FS01:Connecting to 155.247.166.220:8080
02:11:01:WU00:FS01:Starting
02:11:01:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\rober\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 9644 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:11:01:WU00:FS01:Started FahCore on PID 10208
02:11:01:WU00:FS01:Core PID:13476
02:11:01:WU00:FS01:FahCore 0x22 started
02:11:02:WU00:FS01:0x22:*********************** Log Started 2020-06-14T02:11:01Z ***********************
02:11:02:WU00:FS01:0x22:*************************** Core22 Folding@home Core ***************************
02:11:02:WU00:FS01:0x22:       Type: 0x22
02:11:02:WU00:FS01:0x22:       Core: Core22
02:11:02:WU00:FS01:0x22:    Website: https://foldingathome.org/
02:11:02:WU00:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
02:11:02:WU00:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
02:11:02:WU00:FS01:0x22:             <rafal.wiewiora@choderalab.org>
02:11:02:WU00:FS01:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 10208 -checkpoint 15
02:11:02:WU00:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
02:11:02:WU00:FS01:0x22:     Config: <none>
02:11:02:WU00:FS01:0x22:************************************ Build *************************************
02:11:02:WU00:FS01:0x22:    Version: 0.0.5
02:11:02:WU00:FS01:0x22:       Date: Apr 22 2020
02:11:02:WU00:FS01:0x22:       Time: 04:42:59
02:11:02:WU00:FS01:0x22: Repository: Git
02:11:02:WU00:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
02:11:02:WU00:FS01:0x22:     Branch: HEAD
02:11:02:WU00:FS01:0x22:   Compiler: Visual C++ 2008
02:11:02:WU00:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
02:11:02:WU00:FS01:0x22:   Platform: win32 10
02:11:02:WU00:FS01:0x22:       Bits: 64
02:11:02:WU00:FS01:0x22:       Mode: Release
02:11:02:WU00:FS01:0x22:************************************ System ************************************
02:11:02:WU00:FS01:0x22:        CPU: AMD Ryzen 5 2400G with Radeon Vega Graphics
02:11:02:WU00:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 17 Stepping 0
02:11:02:WU00:FS01:0x22:       CPUs: 8
02:11:02:WU00:FS01:0x22:     Memory: 14.93GiB
02:11:02:WU00:FS01:0x22:Free Memory: 11.18GiB
02:11:02:WU00:FS01:0x22:    Threads: WINDOWS_THREADS
02:11:02:WU00:FS01:0x22: OS Version: 6.2
02:11:02:WU00:FS01:0x22:Has Battery: false
02:11:02:WU00:FS01:0x22: On Battery: false
02:11:02:WU00:FS01:0x22: UTC Offset: -4
02:11:02:WU00:FS01:0x22:        PID: 13476
02:11:02:WU00:FS01:0x22:        CWD: C:\Users\rober\AppData\Roaming\FAHClient\work
02:11:02:WU00:FS01:0x22:         OS: Windows 10 Home
02:11:02:WU00:FS01:0x22:    OS Arch: AMD64
02:11:02:WU00:FS01:0x22:********************************************************************************
02:11:02:WU00:FS01:0x22:Project: 16448 (Run 0, Clone 2704, Gen 71)
02:11:02:WU00:FS01:0x22:Unit: 0x0000006e42aa6f325ed18ce24fb35eef
02:11:02:WU00:FS01:0x22:Reading tar file core.xml
02:11:02:WU00:FS01:0x22:Reading tar file integrator.xml
02:11:02:WU00:FS01:0x22:Reading tar file state.xml
02:11:03:WU00:FS01:0x22:Reading tar file system.xml
02:11:05:WU00:FS01:0x22:Digital signatures verified
02:11:05:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
02:11:05:WU00:FS01:0x22:Version 0.0.5
02:11:06:WU01:FS01:Upload complete
02:11:06:WU01:FS01:Server responded WORK_ACK (400)
02:11:06:WU01:FS01:Final credit estimate, 48201.00 points
02:11:06:WU01:FS01:Cleaning up
02:11:36:WU00:FS01:0x22:ERROR:exception: Error invoking kernel sortShortList: clEnqueueNDRangeKernel (-5)
02:11:36:WU00:FS01:0x22:Saving result file ..\logfile_01.txt
02:11:36:WU00:FS01:0x22:Saving result file science.log
02:11:36:WU00:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
02:11:37:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
02:11:37:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:16448 run:0 clone:2704 gen:71 core:0x22 unit:0x0000006e42aa6f325ed18ce24fb35eef
02:11:37:WU00:FS01:Uploading 2.32KiB to 66.170.111.50
02:11:37:WU00:FS01:Connecting to 66.170.111.50:8080
02:11:37:WU00:FS01:Upload complete
02:11:37:WU00:FS01:Server responded WORK_ACK (400)
02:11:37:WU00:FS01:Cleaning up

I'll spare you further logs, but I got the exact same error on another run of this same project earlier (16448, 0, 2571, 35).
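For anyone who wants to check their own log for the same pattern without reading it line by line, here is a minimal sketch in Python. It assumes the client log keeps the line layout shown in the log above (`WUxx:FSxx:` prefixes, a `Sending unit results:` line carrying `error:` and the PRCG fields, and core errors tagged `0x22:ERROR:`). The function name `find_faulty_units` is made up for illustration and is not part of any Folding@home tool.

```python
import re

# "Sending unit results" line: carries the error state and the PRCG fields.
RESULT_RE = re.compile(
    r"WU(?P<wu>\d+):FS(?P<fs>\d+):Sending unit results: .*?"
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)
# Core error line, e.g. "...:0x22:ERROR:exception: Error invoking kernel ..."
ERROR_RE = re.compile(
    r"WU(?P<wu>\d+):FS(?P<fs>\d+):0x[0-9a-fA-F]+:ERROR:(?P<msg>.*)"
)


def find_faulty_units(lines):
    """Return one dict per work unit that was sent back with an error state."""
    last_error = {}  # (wu, fs) -> most recent core error message
    faulty = []
    for line in lines:
        err = ERROR_RE.search(line)
        if err:
            last_error[(err["wu"], err["fs"])] = err["msg"].strip()
            continue
        res = RESULT_RE.search(line)
        if not res:
            continue
        # Take (and clear) any error message remembered for this WU/slot.
        message = last_error.pop((res["wu"], res["fs"]), None)
        if res["error"] != "NO_ERROR":
            faulty.append({
                "prcg": tuple(
                    int(res[k]) for k in ("project", "run", "clone", "gen")
                ),
                "state": res["error"],
                "message": message,
            })
    return faulty
```

To use it, pass the lines of a saved log, for example `find_faulty_units(open(path, errors="replace"))`, where `path` is wherever your client writes its log. Each entry reports the PRCG tuple, the error state the client sent (such as `FAULTY`), and the last core error message seen for that work unit, if any.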

Earlier threads about this error seemed to point to AMD driver bugs. I've only had two work units from this project, but both were completed by Nvidia GPUs after they failed on my system.


All drivers up to date, system rock solid stable otherwise, no overclocks during the failures, etc.

Earlier threads seemed to indicate that the fix involved waiting on AMD to fix driver bugs, and that in the meantime some projects were restricted from certain AMD GPUs. I wasn't sure if there was an easy way to see whether this project generates a lot of failures on AMD GPUs, so I'm reporting it just in case.


ETA: From some random spot checks, counting down through PRCG numbers, it does appear that AMD GPUs in general might be having an issue with this project. But I only looked at a few, as searching through WU status is a slow way of figuring it out. :lol:

Re: 16448 Bad Work Units, sortShortList errors

Posted: Sun Jun 14, 2020 2:00 pm
by jchang6
I have had several errors on an RX570 card. I do not think errors have occurred on my RX560, 5500, and 5600 cards, nor on my two RTX 2080s.

Re: 16448 Bad Work Units, sortShortList errors

Posted: Mon Jun 15, 2020 12:48 am
by PantherX
Can you please post the first 100 lines of the log file so we can see what hardware the client has detected?

Re: 16448 Bad Work Units, sortShortList errors

Posted: Mon Jun 15, 2020 1:25 am
by muziqaz
16448 is for Navi only. We are still a couple of weeks away from a fix for pre-Navi cards on certain GPU projects. The project owner has been informed and will exclude the project for anything other than Navi GPUs.
The 2400G should not be getting this project.

Re: 16448 Bad Work Units, sortShortList errors

Posted: Wed Jun 17, 2020 1:04 am
by BobWilliams757
muziqaz wrote: 16448 is for Navi only. We are still a couple of weeks away from a fix for pre-Navi cards on certain GPU projects. The project owner has been informed and will exclude the project for anything other than Navi GPUs.
The 2400G should not be getting this project.
Thanks for the scoop. Since these were my first WUs that failed to complete, it's good to know it's a known issue with a fix possibly in the works, and that server resources won't be tied up issuing them to those of us who can't fold them.

Re: 16448 Bad Work Units, sortShortList errors

Posted: Wed Jun 17, 2020 5:43 am
by JohnChodera
sortShortList errors should be fixed in core22 0.0.6 and later!

Let us know if you notice this again for newer core releases.

~ John Chodera // MSKCC
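A rough way to tell which core22 build produced a given failure, assuming the core banner keeps printing a `Version` line in the form seen in the log earlier in this thread (`0x22:    Version: 0.0.5` and `0x22:Version 0.0.5`). This is only a sketch: the helper names `core22_versions` and `has_fix` are made up for illustration, and the 0.0.6 threshold is taken from the post above.

```python
import re

# Matches both banner forms: "0x22:    Version: 0.0.5" and "0x22:Version 0.0.5".
# The client's own "-version 706" argument is not matched (lowercase, mid-line).
VERSION_RE = re.compile(r":0x22:\s*Version:?\s+(\d+(?:\.\d+)*)\s*$")

# First core22 release reported in this thread to contain the fix.
FIXED_IN = (0, 0, 6)


def core22_versions(lines):
    """Collect every core22 version that appears in the log, as int tuples."""
    versions = set()
    for line in lines:
        m = VERSION_RE.search(line)
        if m:
            versions.add(tuple(int(part) for part in m.group(1).split(".")))
    return versions


def has_fix(version, fixed_in=FIXED_IN):
    """True if the version tuple is at or above the release with the fix."""
    return version >= fixed_in
```

Comparing versions as integer tuples rather than strings keeps a later release such as 0.0.10 ordered after 0.0.6.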

Re: 16448 Bad Work Units, sortShortList errors

Posted: Sat Jun 20, 2020 12:30 pm
by BobWilliams757
I did have another PRCG fail on the older core, but haven't picked one up on the newer core. I'll make sure to update if I do pick another up.... with a slow GPU it could be a while.

Thanks for the further info. It's good to know that a system that already works well is always being improved.