11750 (0, 1937, 0) & 11760 (0, 2011, 6), clWaitForEvents


Karia
Posts: 2
Joined: Mon Mar 16, 2020 11:51 pm


Post by Karia »

I've gotten "exception: clWaitForEvents" on two WUs in a row. Another WU failed with the same error last night, but I no longer have the log for it (several WUs completed successfully after that failure). Error logs are below. I'm still waiting on a new WU and will update to confirm whether this continues.

Card: GeForce GTX 1070 Mobile, driver 442.19 (latest stable Studio version)


22:03:45:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104BM [GeForce GTX 1070 Mobile] 6463 from 140.163.4.231
22:03:45:WU01:FS01:Connecting to 140.163.4.231:8080
22:05:28:WU01:FS01:Downloading 8.47MiB
22:05:34:WU01:FS01:Download 11.06%
22:05:40:WU01:FS01:Download 26.55%
22:05:46:WU01:FS01:Download 43.51%
22:05:54:WU01:FS01:Download 92.18%
22:05:55:WU01:FS01:Download complete
22:05:55:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11750 run:0 clone:1937 gen:0 core:0x22 unit:0x000000028ca304e75e6a8024951c1f40
22:05:55:WU01:FS01:Starting
22:05:55:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Karia\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 15804 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
...
23:27:07:WU01:FS01:0x22:Completed 590000 out of 1000000 steps (59%)
23:28:27:WU01:FS01:0x22:Completed 600000 out of 1000000 steps (60%)
23:28:47:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
23:28:47:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
23:28:47:WU01:FS01:0x22:Saving result file checkpointState.xml
23:28:50:WU01:FS01:0x22:Saving result file checkpt.crc
23:28:50:WU01:FS01:0x22:Saving result file positions.xtc
23:28:50:WU01:FS01:0x22:Saving result file science.log
23:28:51:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
23:28:51:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:28:51:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11750 run:0 clone:1937 gen:0 core:0x22 unit:0x000000028ca304e75e6a8024951c1f40
23:28:51:WU01:FS01:Uploading 11.41MiB to 140.163.4.231


18:47:29:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104BM [GeForce GTX 1070 Mobile] 6463 from 128.252.203.10
18:47:29:WU02:FS01:Connecting to 128.252.203.10:8080
18:47:33:WU01:FS01:Upload complete
18:47:34:WU01:FS01:Server responded WORK_ACK (400)
18:47:34:WU01:FS01:Final credit estimate, 60767.00 points
18:47:34:WU01:FS01:Cleaning up
18:47:50:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
18:47:50:WU02:FS01:Connecting to 128.252.203.10:80
18:49:11:WU02:FS01:Downloading 29.70MiB
18:49:17:WU02:FS01:Download 64.61%
18:49:18:WU02:FS01:Download complete
18:49:18:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:11760 run:0 clone:2011 gen:6 core:0x22 unit:0x0000000880fccb0a5e6d7ce37094cb2c
...
21:05:25:WU02:FS01:0x22:ERROR:exception: clWaitForEvents
21:05:25:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
21:05:25:WU02:FS01:0x22:Saving result file checkpointState.xml
21:05:25:WU02:FS01:0x22:Saving result file checkpt.crc
21:05:25:WU02:FS01:0x22:Saving result file positions.xtc
21:05:25:WU02:FS01:0x22:Saving result file science.log
21:05:25:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
21:05:26:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:05:26:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:11760 run:0 clone:2011 gen:6 core:0x22 unit:0x0000000880fccb0a5e6d7ce37094cb2c

EDIT: Forgot to include my driver version.


EDIT 2: The next WU I got, 11760 (0, 14862, 0), completed successfully. So the issue is probably not on my end, unless it's intermittent. Will continue monitoring.
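
For anyone who wants to keep track of these failures over a longer log, here is a minimal sketch of a script that pulls out every unit reported as FAULTY, together with the last progress percentage logged before it failed. It assumes the log lines look exactly like the excerpts above; `find_faulty_units` and the three patterns are hypothetical helper names of my own, not anything that ships with FAHClient.

```python
import re

# Assumed line shape, based only on the log excerpts in this thread:
#   HH:MM:SS:[WARNING:]WUnn:FSnn:<message>
LINE = re.compile(r"^\d\d:\d\d:\d\d:(?:WARNING:)?(WU\d+):FS\d+:(.*)$")
PROGRESS = re.compile(r"Completed \d+ out of \d+ steps \((\d+)%\)")
FAULTY = re.compile(
    r"Sending unit results: .*error:FAULTY "
    r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def find_faulty_units(lines):
    """Return [((project, run, clone, gen), last_percent_or_None), ...]."""
    last_percent = {}  # WU queue id -> last "Completed ... (N%)" value seen
    failures = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip "..." and any line that is not a WU log line
        wu, message = m.groups()
        if message.startswith("Received Unit:"):
            # A new unit reuses the queue id, so forget the old progress.
            last_percent.pop(wu, None)
            continue
        progress = PROGRESS.search(message)
        if progress:
            last_percent[wu] = int(progress.group(1))
            continue
        faulty = FAULTY.search(message)
        if faulty:
            prcg = tuple(int(x) for x in faulty.groups())
            failures.append((prcg, last_percent.pop(wu, None)))
    return failures


if __name__ == "__main__":
    sample = [
        "23:28:27:WU01:FS01:0x22:Completed 600000 out of 1000000 steps (60%)",
        "23:28:47:WU01:FS01:0x22:ERROR:exception: clWaitForEvents",
        "23:28:51:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY "
        "project:11750 run:0 clone:1937 gen:0 core:0x22 "
        "unit:0x000000028ca304e75e6a8024951c1f40",
    ]
    print(find_faulty_units(sample))  # [((11750, 0, 1937, 0), 60)]
```

To run it on a real log, pass the open file instead of the sample list, for example `find_faulty_units(open("log.txt"))`, where the file name is whatever your client writes to. A percentage of `None` means no progress line was seen for that unit before it was returned.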
Karia

Re: 11750 (0, 1937, 0) & 11760 (0, 2011, 6), clWaitForEvents

Post by Karia »

Another WU failed: 11743 (0, 1877, 7).


17:29:26:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:11743 run:0 clone:1877 gen:7 core:0x22 unit:0x0000000c8ca304f15e67e325920e4f56
...
17:30:43:WU01:FS01:0x22:Completed 0 out of 2000000 steps (0%)
17:30:43:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
17:32:17:WU01:FS01:0x22:Completed 20000 out of 2000000 steps (1%)
17:33:50:WU01:FS01:0x22:Completed 40000 out of 2000000 steps (2%)
17:35:26:WU01:FS01:0x22:Completed 60000 out of 2000000 steps (3%)
17:36:58:WU01:FS01:0x22:ERROR:exception: clWaitForEvents
17:36:58:WU01:FS01:0x22:Saving result file ..\logfile_01.txt
17:36:58:WU01:FS01:0x22:Saving result file checkpointState.xml
17:37:00:WU01:FS01:0x22:Saving result file checkpt.crc
17:37:00:WU01:FS01:0x22:Saving result file positions.xtc
17:37:00:WU01:FS01:0x22:Saving result file science.log
17:37:00:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
17:37:01:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:37:01:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:11743 run:0 clone:1877 gen:7 core:0x22 unit:0x0000000c8ca304f15e67e325920e4f56