Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Moderators: Site Moderators, FAHC Science Team

gordonbb
Posts: 510
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Post by gordonbb »

Hello,

I noticed my points were down a bit today and discovered a failed WU on one of my otherwise stable systems.

Code:

04:48:15:WU00:FS00:Connecting to 65.254.110.245:8080
04:48:15:WU00:FS00:Assigned to work server 155.247.166.220
04:48:15:WU00:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:GP106 [GeForce GTX 1060 6GB] 4372 from 155.247.166.220
04:48:15:WU00:FS00:Connecting to 155.247.166.220:8080
04:48:17:WU00:FS00:Downloading 1.66MiB
04:48:18:WU01:FS00:0x21:Saving result file logfile_01.txt
04:48:18:WU01:FS00:0x21:Saving result file checkpointState.xml
04:48:19:WU01:FS00:0x21:Saving result file checkpt.crc
04:48:19:WU01:FS00:0x21:Saving result file log.txt
04:48:19:WU01:FS00:0x21:Saving result file positions.xtc
04:48:19:WU01:FS00:0x21:Folding@home Core Shutdown: FINISHED_UNIT
04:48:19:WU00:FS00:Download complete
04:48:19:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:48:19:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:11720 run:0 clone:2332 gen:162 core:0x21 unit:0x000000d08ca304e75bbce9b9efddc4d7
04:48:19:WU01:FS00:Uploading 7.01MiB to 140.163.4.231
04:48:19:WU01:FS00:Connecting to 140.163.4.231:8080
04:48:19:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14157 run:43 clone:9 gen:46 core:0x21 unit:0x000000400002894c5c281bbcecd82acd
04:48:19:WU00:FS00:Starting
04:48:19:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 705 -lifeline 1041 -checkpoint 5 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
04:48:19:WU00:FS00:Started FahCore on PID 24436
04:48:19:WU00:FS00:Core PID:24440
04:48:19:WU00:FS00:FahCore 0x21 started
04:48:19:WU00:FS00:0x21:*********************** Log Started 2019-02-21T04:48:19Z ***********************
04:48:19:WU00:FS00:0x21:Project: 14157 (Run 43, Clone 9, Gen 46)
04:48:19:WU00:FS00:0x21:Unit: 0x000000400002894c5c281bbcecd82acd
04:48:19:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
04:48:19:WU00:FS00:0x21:Machine: 0
04:48:19:WU00:FS00:0x21:Reading tar file core.xml
04:48:19:WU00:FS00:0x21:Reading tar file integrator.xml
04:48:19:WU00:FS00:0x21:Reading tar file state.xml
04:48:19:WU00:FS00:0x21:Reading tar file system.xml
04:48:20:WU00:FS00:0x21:Digital signatures verified
04:48:20:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
04:48:20:WU00:FS00:0x21:Version 0.0.18
04:48:25:WU01:FS00:Upload complete
04:48:25:WU01:FS00:Server responded WORK_ACK (400)
04:48:25:WU01:FS00:Final credit estimate, 43009.00 points
04:48:25:WU01:FS00:Cleaning up
04:48:28:WU00:FS00:0x21:Completed 0 out of 12500000 steps (0%)
04:48:28:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
04:51:36:WU00:FS00:0x21:Completed 125000 out of 12500000 steps (1%)
04:54:42:WU00:FS00:0x21:Completed 250000 out of 12500000 steps (2%)
04:57:51:WU00:FS00:0x21:Completed 375000 out of 12500000 steps (3%)
05:00:56:WU00:FS00:0x21:Completed 500000 out of 12500000 steps (4%)
05:04:05:WU00:FS00:0x21:Completed 625000 out of 12500000 steps (5%)
05:07:10:WU00:FS00:0x21:Completed 750000 out of 12500000 steps (6%)
05:10:20:WU00:FS00:0x21:Completed 875000 out of 12500000 steps (7%)
05:13:25:WU00:FS00:0x21:Completed 1000000 out of 12500000 steps (8%)
05:16:34:WU00:FS00:0x21:Completed 1125000 out of 12500000 steps (9%)
05:19:40:WU00:FS00:0x21:Completed 1250000 out of 12500000 steps (10%)
05:22:49:WU00:FS00:0x21:Completed 1375000 out of 12500000 steps (11%)
05:25:54:WU00:FS00:0x21:Completed 1500000 out of 12500000 steps (12%)
05:29:03:WU00:FS00:0x21:Completed 1625000 out of 12500000 steps (13%)
05:32:09:WU00:FS00:0x21:Completed 1750000 out of 12500000 steps (14%)
05:35:18:WU00:FS00:0x21:Completed 1875000 out of 12500000 steps (15%)
05:38:24:WU00:FS00:0x21:Completed 2000000 out of 12500000 steps (16%)
05:41:33:WU00:FS00:0x21:Completed 2125000 out of 12500000 steps (17%)
05:44:38:WU00:FS00:0x21:Completed 2250000 out of 12500000 steps (18%)
05:47:47:WU00:FS00:0x21:Completed 2375000 out of 12500000 steps (19%)
05:50:53:WU00:FS00:0x21:Completed 2500000 out of 12500000 steps (20%)
05:54:02:WU00:FS00:0x21:Completed 2625000 out of 12500000 steps (21%)
05:57:08:WU00:FS00:0x21:Completed 2750000 out of 12500000 steps (22%)
06:00:17:WU00:FS00:0x21:Completed 2875000 out of 12500000 steps (23%)
06:03:22:WU00:FS00:0x21:Completed 3000000 out of 12500000 steps (24%)
06:06:32:WU00:FS00:0x21:Completed 3125000 out of 12500000 steps (25%)
06:09:37:WU00:FS00:0x21:Completed 3250000 out of 12500000 steps (26%)
06:12:46:WU00:FS00:0x21:Completed 3375000 out of 12500000 steps (27%)
06:15:52:WU00:FS00:0x21:Completed 3500000 out of 12500000 steps (28%)
06:19:01:WU00:FS00:0x21:Completed 3625000 out of 12500000 steps (29%)
06:22:06:WU00:FS00:0x21:Completed 3750000 out of 12500000 steps (30%)
06:25:15:WU00:FS00:0x21:Completed 3875000 out of 12500000 steps (31%)
06:28:21:WU00:FS00:0x21:Completed 4000000 out of 12500000 steps (32%)
06:31:30:WU00:FS00:0x21:Completed 4125000 out of 12500000 steps (33%)
06:34:35:WU00:FS00:0x21:Completed 4250000 out of 12500000 steps (34%)
06:37:45:WU00:FS00:0x21:Completed 4375000 out of 12500000 steps (35%)
06:40:50:WU00:FS00:0x21:Completed 4500000 out of 12500000 steps (36%)
06:43:59:WU00:FS00:0x21:Completed 4625000 out of 12500000 steps (37%)
******************************* Date: 2019-02-21 *******************************
06:47:05:WU00:FS00:0x21:Completed 4750000 out of 12500000 steps (38%)
06:50:14:WU00:FS00:0x21:Completed 4875000 out of 12500000 steps (39%)
06:53:19:WU00:FS00:0x21:Completed 5000000 out of 12500000 steps (40%)
06:56:28:WU00:FS00:0x21:Completed 5125000 out of 12500000 steps (41%)
06:59:34:WU00:FS00:0x21:Completed 5250000 out of 12500000 steps (42%)
07:02:43:WU00:FS00:0x21:Completed 5375000 out of 12500000 steps (43%)
07:05:49:WU00:FS00:0x21:Completed 5500000 out of 12500000 steps (44%)
07:08:58:WU00:FS00:0x21:Completed 5625000 out of 12500000 steps (45%)
07:11:39:WU00:FS00:0x21:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-5)
07:11:39:WU00:FS00:0x21:Saving result file logfile_01.txt
07:11:39:WU00:FS00:0x21:Saving result file log.txt
07:11:39:WU00:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
07:11:39:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
07:11:39:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:14157 run:43 clone:9 gen:46 core:0x21 unit:0x000000400002894c5c281bbcecd82acd
07:11:40:WU00:FS00:Uploading 3.18KiB to 155.247.166.220
07:11:40:WU00:FS00:Connecting to 155.247.166.220:8080
07:11:40:WU00:FS00:Upload complete
07:11:40:WU00:FS00:Server responded WORK_ACK (400)
07:11:40:WU00:FS00:Cleaning up
07:11:40:WU01:FS00:Connecting to 65.254.110.245:8080
07:11:40:WU01:FS00:Assigned to work server 155.247.166.220
07:11:40:WU01:FS00:Requesting new work unit for slot 00: READY gpu:0:GP106 [GeForce GTX 1060 6GB] 4372 from 155.247.166.220
07:11:40:WU01:FS00:Connecting to 155.247.166.220:8080
07:11:43:WU01:FS00:Downloading 1.63MiB
07:11:45:WU01:FS00:Download complete
07:11:45:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:14131 run:14 clone:2 gen:91 core:0x21 unit:0x000000750002894c5c2822c03e55d9cc
07:11:45:WU01:FS00:Starting
07:11:45:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 705 -lifeline 1041 -checkpoint 5 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
07:11:45:WU01:FS00:Started FahCore on PID 6516
07:11:45:WU01:FS00:Core PID:6520
07:11:45:WU01:FS00:FahCore 0x21 started
07:11:46:WU01:FS00:0x21:*********************** Log Started 2019-02-21T07:11:45Z ***********************
07:11:46:WU01:FS00:0x21:Project: 14131 (Run 14, Clone 2, Gen 91)
07:11:46:WU01:FS00:0x21:Unit: 0x000000750002894c5c2822c03e55d9cc
07:11:46:WU01:FS00:0x21:CPU: 0x00000000000000000000000000000000
07:11:46:WU01:FS00:0x21:Machine: 0
07:11:46:WU01:FS00:0x21:Reading tar file core.xml
07:11:46:WU01:FS00:0x21:Reading tar file integrator.xml
07:11:46:WU01:FS00:0x21:Reading tar file state.xml
07:11:46:WU01:FS00:0x21:Reading tar file system.xml
07:11:46:WU01:FS00:0x21:Digital signatures verified
07:11:46:WU01:FS00:0x21:Folding@home GPU Core21 Folding@home Core
07:11:46:WU01:FS00:0x21:Version 0.0.18
07:11:54:WU01:FS00:0x21:Completed 0 out of 12500000 steps (0%)
07:11:54:WU01:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
07:15:01:WU01:FS00:0x21:Completed 125000 out of 12500000 steps (1%)
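In a log this long, the failure is easy to miss. One way to spot it is to scan for the client's `FahCore returned:` lines and keep any that are not `FINISHED_UNIT`. The sketch below is a minimal example of that approach. It assumes the v7 line format seen in the log above, including the optional `WARNING:` level prefix. The names `find_core_results` and `failed_units` are hypothetical helpers, not part of FAHClient.

```python
import re

# Matches v7 client lines such as (format assumed from the log above):
#   04:48:19:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
#   07:11:39:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
CORE_RESULT = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:[A-Z]+:)?"      # timestamp, optional level
    r"(?P<wu>WU\d+):(?P<fs>FS\d+):FahCore returned: "  # work unit and slot
    r"(?P<status>[A-Z_]+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)


def find_core_results(lines):
    """Return (time, wu, slot, status, code) for every FahCore exit found."""
    results = []
    for line in lines:
        m = CORE_RESULT.match(line.strip())
        if m:
            results.append(
                (m["time"], m["wu"], m["fs"], m["status"], int(m["code"]))
            )
    return results


def failed_units(lines):
    """Keep only core exits that did not finish the work unit normally."""
    return [r for r in find_core_results(lines) if r[3] != "FINISHED_UNIT"]


if __name__ == "__main__":
    sample = [
        "04:48:19:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)",
        "07:11:39:WU00:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT",
        "07:11:39:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    ]
    for result in failed_units(sample):
        print(result)
```

To use it on a real install, pass the lines of the client's log file (its location varies by install) to `failed_units` instead of the hard-coded sample.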

Code:

*********@fold1:~$ FAHClient --info
***************************** Folding@home Client ******************************
        Website: https://foldingathome.org/
      Copyright: (c) 2009-2018 foldingathome.org
         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
           Args: --info
************************************ Build *************************************
        Version: 7.5.1
           Date: May 11 2018
           Time: 19:59:04
     Repository: Git
       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
         Branch: master
       Compiler: GNU 6.3.0 20170516
        Options: -std=gnu++98 -O3 -funroll-loops
       Platform: linux2 4.14.0-3-amd64
           Bits: 64
           Mode: Release
************************************ System ************************************
            CPU: AMD Athlon(tm) II X2 220 Processor
         CPU ID: AuthenticAMD Family 16 Model 6 Stepping 3
           CPUs: 2
         Memory: 2.93GiB
    Free Memory: 1.08GiB
        Threads: POSIX_THREADS
     OS Version: 4.15
    Has Battery: false
     On Battery: false
     UTC Offset: -5
            PID: 22629
            CWD: /home/*******
             OS: Linux 4.15.0-45-generic x86_64
        OS Arch: AMD64
           GPUs: 0
  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:6.1 Driver:10.0
OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:415.27
********************************************************************************
The hardware is an EVGA GTX 1060 6GB "Shortboard" running in a PCIe 2.0 x16 slot on an Acer ITX motherboard with 3GB of DDR3 and a dedicated AMD Athlon II X2 220 2.8GHz processor.

The OS is Ubuntu 18.04.1 LTS Server. The Nvidia driver is 415.27 from the PPA repository.

At the time, the GPU was at 67°C with the shader clock at 1999MHz, the power limit set to 110W and a +78MHz shader clock offset (overclock) that has been stable for months.

I had rebuilt the OS from scratch a few weeks ago.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Post by Joe_H »

The WU was successfully processed by the next folder to be assigned it.

Whether that means your card was the problem or something else is not easy to determine.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
gordonbb
Posts: 510
Joined: Mon May 21, 2018 4:12 pm
Hardware configuration: Ubuntu 22.04.2 LTS; NVidia 525.60.11; 2 x 4070ti; 4070; 4060ti; 3x 3080; 3070ti; 3070
Location: Great White North

Re: Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Post by gordonbb »

Joe_H wrote: The WU was successfully processed by the next folder to be assigned it.

Whether that means your card was the problem or something else, not easy to determine.
Thanks, Joe. This thread about “clEnqueueReadBuffer” referenced an issue with the 375 drivers, but in those cases the GPU consistently failed to fold. In my case, the system immediately picked up another WU, completed it and has kept folding since, completing 5 WUs so far without issue.

A later addition to the same post suggested it might be a hardware issue and mentioned spurious “Bad State” errors. I’m not seeing those, but I do see the occasional Bad Topology warning.

The motherboard and CPU are 10 years old but have folded fine for the past nine months, and the GPU is less than a year old.

The system is on a UPS, but it is an APC single-conversion Back-UPS 1000VA rather than an Eaton dual-conversion UPS like my two other rigs. So it is possible that the UPS has a battery issue that leaves the rig effectively bypassing it, with transients passing through unfiltered. That said, this UPS also powers my FreeNAS server and my ESXi server, and both are operating without issue.

I’ve dialled back the GPU shader clock offset by one bin to +65MHz and will keep monitoring it. The card has a little more than two years of warranty left, so EVGA can replace it if necessary.
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Post by toTOW »

On Windows, this is usually caused by a driver reset after something went wrong on the GPU. Reducing the overclock usually solves the issue.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Proj. 14157 WU Failed: clEnqueueReadBuffer (-5)

Post by bruce »

toTOW wrote: On Windows, this is usually caused by a driver reset after something went wrong on the GPU ... usually, reducing overclocking solves the issue ...
FAH's official position is that "we don't support overclocking." Unfortunately, not all WUs are equally tolerant of overclocking, so I'd bet you just hit one that your previous benchmarking didn't catch. The (-5) is an out-of-resources error (CL_OUT_OF_RESOURCES), which may simply mean the GPU was briefly offline while it was being reset.
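Decoding the number in an error like "clEnqueueReadBuffer (-5)" just means looking it up in the standard OpenCL error table. The sketch below shows that lookup. The code values are a subset of the standard constants from the Khronos OpenCL headers. The helper name `explain_opencl_error` and the assumption that the code appears as "callName (code)" at the end of the line are illustrative only, and are not anything the folding core provides. The name alone does not say *why* the call failed. As noted above, a GPU that was briefly reset could produce it.

```python
import re

# Subset of the standard OpenCL runtime error codes (Khronos CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def explain_opencl_error(message):
    """Extract a trailing 'callName (code)' from a log line and name the code.

    Returns (call, code, name), or None if no such pattern is present.
    """
    m = re.search(r"(\w+) \((-?\d+)\)\s*$", message)
    if not m:
        return None
    call, code = m.group(1), int(m.group(2))
    return call, code, OPENCL_ERRORS.get(code, "unknown code")


if __name__ == "__main__":
    line = ("07:11:39:WU00:FS00:0x21:ERROR:exception: Error downloading array "
            "energyBuffer: clEnqueueReadBuffer (-5)")
    print(explain_opencl_error(line))
```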