Failed GPU work unit issue.

Re: Failed GPU work unit issue.

Postby bruce » Tue Apr 03, 2018 4:39 pm

What appears here near the top of the latest log?
Code: Select all
16:17:24:      OS Arch: AMD64
16:17:24:         GPUs: 1
16:17:24:        GPU 0: NVIDIA:3 GK106 [GeForce GTX 660]
16:17:24:         CUDA: Not detected
16:17:24:Win32 Service: false
bruce
 
Posts: 21402
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Failed GPU work unit issue.

Postby vince_lewis » Mon Apr 09, 2018 8:04 am

Hi, I have the same issue. I contacted Joseph Coffland at Cauldron and he gave me this reply
(his reply first, then my log after).

Note: I am running FAH client 7.4.4 on Ubuntu.

Vince,

I'm not quite sure what's going on there. The 0x21 core is one of the few
parts of F@H I'm not responsible for. This error appears to only occur
for some clients. If it is a problem with your GPU then we should either
not be assigning these WUs to that GPU or the core should log more
information. BAD_WORK_UNIT is most likely the wrong return code. You
might want to take this to foldingforum.org to get more help.

Regards,

Joseph


My log:

Code: Select all
> 08:50:24:Adding folding slot 01: READY gpu:0:BeaverCreek [Mobility Radeon HD 6620G]
> 08:50:24:Removing old file 'configs/config-20180329-170337.xml'
> 08:50:24:Saving configuration to /etc/fahclient/config.xml
> 08:50:24:<config>
> 08:50:24:  <!-- Client Control -->
> 08:50:24:  <fold-anon v='true'/>
> 08:50:24:
> 08:50:24:  <!-- Folding Slot Configuration -->
> 08:50:24:  <cause v='CANCER'/>
> 08:50:24:  <gpu v='false'/>
> 08:50:24:
> 08:50:24:  <!-- Network -->
> 08:50:24:  <proxy v=':8080'/>
> 08:50:24:
> 08:50:24:  <!-- Slot Control -->
> 08:50:24:  <power v='full'/>
> 08:50:24:
> 08:50:24:  <!-- User Information -->
> 08:50:24:  <passkey v='********************************'/>
> 08:50:24:  <team v='234171'/>
> 08:50:24:  <user v='Vince_Lewis'/>
> 08:50:24:
> 08:50:24:  <!-- Folding Slots -->
> 08:50:24:  <slot id='0' type='CPU'/>
> 08:50:24:  <slot id='1' type='GPU'/>
> 08:50:24:</config>
> 08:50:24:FS00:Shutting core down
> 08:50:25:WU01:FS01:Connecting to 171.67.108.45:80
> 08:50:25:WU01:FS01:Assigned to work server 171.67.108.157
> 08:50:25:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:BeaverCreek [Mobility Radeon HD 6620G] from 171.67.108.157
> 08:50:25:WU01:FS01:Connecting to 171.67.108.157:8080
> 08:50:26:WU01:FS01:Downloading 8.84MiB
> 08:50:27:WU00:FS00:0xa4:Client no longer detected. Shutting down core.
> 08:50:27:WU00:FS00:0xa4:
> 08:50:27:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
> 08:50:28:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
> 08:50:28:WU00:FS00:Starting
> 08:50:28:WARNING:WU00:FS00:Changed SMP threads from 4 to 3 this can cause some work units to fail
> 08:50:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 2238 -checkpoint 15 -np 3
> 08:50:28:WU00:FS00:Started FahCore on PID 6066
> 08:50:28:WU00:FS00:Core PID:6070
> 08:50:28:WU00:FS00:FahCore 0xa4 started
> 08:50:28:WU00:FS00:0xa4:
> 08:50:28:WU00:FS00:0xa4:*------------------------------*
> 08:50:28:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
> 08:50:28:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
> 08:50:28:WU00:FS00:0xa4:
> 08:50:28:WU00:FS00:0xa4:Preparing to commence simulation
> 08:50:28:WU00:FS00:0xa4:- Looking at optimizations...
> 08:50:28:WU00:FS00:0xa4:- Files status OK
> 08:50:28:WU00:FS00:0xa4:- Expanded 825913 -> 1402860 (decompressed 169.8 percent)
> 08:50:28:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=825913 data_size=1402860, decompressed_data_size=1402860 diff=0
> 08:50:28:WU00:FS00:0xa4:- Digital signature verified
> 08:50:28:WU00:FS00:0xa4:
> 08:50:28:WU00:FS00:0xa4:Project: 9038 (Run 393, Clone 0, Gen 1598)
> 08:50:28:WU00:FS00:0xa4:
> 08:50:28:WU00:FS00:0xa4:Assembly optimizations on if available.
> 08:50:28:WU00:FS00:0xa4:Entering M.D.
> 08:50:31:Removing old file 'configs/config-20180329-170344.xml'
> 08:50:31:Saving configuration to /etc/fahclient/config.xml
> 08:50:31:<config>
> 08:50:31:  <!-- Client Control -->
> 08:50:31:  <fold-anon v='true'/>
> 08:50:31:
> 08:50:31:  <!-- Folding Slot Configuration -->
> 08:50:31:  <cause v='CANCER'/>
> 08:50:31:  <gpu v='false'/>
> 08:50:31:
> 08:50:31:  <!-- Network -->
> 08:50:31:  <proxy v=':8080'/>
> 08:50:31:
> 08:50:31:  <!-- Slot Control -->
> 08:50:31:  <power v='full'/>
> 08:50:31:
> 08:50:31:  <!-- User Information -->
> 08:50:31:  <passkey v='********************************'/>
> 08:50:31:  <team v='234171'/>
> 08:50:31:  <user v='Vince_Lewis'/>
> 08:50:31:
> 08:50:31:  <!-- Folding Slots -->
> 08:50:31:  <slot id='0' type='CPU'/>
> 08:50:31:  <slot id='1' type='GPU'/>
> 08:50:31:</config>
> 08:50:32:WU01:FS01:Download 41.69%
> 08:50:34:WU00:FS00:0xa4:Using Gromacs checkpoints
> 08:50:35:WU00:FS00:0xa4:Resuming from checkpoint
> 08:50:35:WU00:FS00:0xa4:Verified 00/wudata_01.log
> 08:50:35:WU00:FS00:0xa4:Verified 00/wudata_01.trr
> 08:50:35:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
> 08:50:35:WU00:FS00:0xa4:Verified 00/wudata_01.edr
> 08:50:35:WU00:FS00:0xa4:Completed 22720 out of 250000 steps  (9%)
> 08:50:38:WU01:FS01:Download 87.63%
> 08:50:39:WU01:FS01:Download complete
> 08:50:39:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9431 run:915 clone:1 gen:660 core:0x21 unit:0x000002faab436c9d586fdd3bf0d699d2
> 08:50:39:WU01:FS01:Starting
> 08:50:39:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 2238 -checkpoint 15 -gpu 0 -gpu-vendor ati
> 08:50:39:WU01:FS01:Started FahCore on PID 6076
> 08:50:39:WU01:FS01:Core PID:6080
> 08:50:39:WU01:FS01:FahCore 0x21 started
> 08:50:40:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
> 08:50:40:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9431 run:915 clone:1 gen:660 core:0x21 unit:0x000002faab436c9d586fdd3bf0d699d2
> 08:50:40:WU01:FS01:Uploading 6.00KiB to 171.67.108.157
> 08:50:40:WU01:FS01:Connecting to 171.67.108.157:8080
> 08:50:40:WU02:FS01:Connecting to 171.67.108.45:80
> 08:50:40:WU01:FS01:Upload complete
> 08:50:40:WU01:FS01:Server responded WORK_ACK (400)
> 08:50:40:WU01:FS01:Cleaning up
> 08:50:41:WU02:FS01:Assigned to work server 171.67.108.157
> 08:50:41:WU02:FS01:Requesting new work unit for slot 01: READY gpu:0:BeaverCreek [Mobility Radeon HD 6620G] from 171.67.108.157
> 08:50:41:WU02:FS01:Connecting to 171.67.108.157:8080
> 08:50:42:WU02:FS01:Downloading 5.17MiB
> 08:50:48:WU02:FS01:Download 70.08%
> 08:50:50:WU02:FS01:Download complete
> 08:50:50:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:9415 run:2128 clone:3 gen:657 core:0x21 unit:0x0000030dab436c9d585e06dcf5a418b7
> 08:50:50:WU02:FS01:Starting
> 08:50:50:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/ATI/R600/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 2238 -checkpoint 15 -gpu 0 -gpu-vendor ati
> 08:50:50:WU02:FS01:Started FahCore on PID 6086
> 08:50:50:WU02:FS01:Core PID:6090
> 08:50:50:WU02:FS01:FahCore 0x21 started
> 08:50:50:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
> 08:50:50:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9415 run:2128 clone:3 gen:657 core:0x21 unit:0x0000030dab436c9d585e06dcf5a418b7
> 08:50:50:WU02:FS01:Uploading 6.00KiB to 171.67.108.157
> 08:50:50:WU02:FS01:Connecting to 171.67.108.157:8080
> 08:50:51:WU01:FS01:Connecting to 171.67.108.45:80
> 08:50:51:WU02:FS01:Upload complete
> 08:50:51:WU01:FS01:Assigned to work server 171.67.108.157
> 08:50:51:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:BeaverCreek [Mobility Radeon HD 6620G] from 171.67.108.157
> 08:50:51:WU01:FS01:Connecting to 171.67.108.157:8080
> 08:50:51:WU02:FS01:Server responded WORK_ACK (400)
> 08:50:51:WU02:FS01:Cleaning up
> 08:50:52:WU01:FS01:Downloading 5.15MiB
> 08:50:58:WU01:FS01:Download 80.04%
> 08:50:59:WU01:FS01:Download complete
> 08:50:59:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9414 run:1615 clone:1 gen:765 core:0x21 unit:0x000003a2ab436c9d585e069c3fd9de7b
> 08:50:59:WU01:FS01:Starting


Mod edit: added Code tags to log - j
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm

Re: Failed GPU work unit issue.

Postby bruce » Mon Apr 09, 2018 5:20 pm

Apparently the WU that's failing is p9431 R915 C1 G660. That WU was reassigned and successfully completed by someone else, so there's nothing inherently wrong with the WU itself.

That seems to imply that the error happened on your system.
The most likely possibilities are:
* Your GPU is marginally unstable, causing calculation errors.
* Your system had a forced shutdown without sufficient time to finish writing a checkpoint.
* There was a communications error that corrupted the downloaded WU.

Inasmuch as it was a fresh download, the most probable explanation is an unstable GPU.
bruce
 
Posts: 21402
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Failed GPU work unit issue.

Postby vince_lewis » Thu Apr 19, 2018 4:00 pm

So can I do anything about that, or should I just leave the GPU switched off?
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm

Re: Failed GPU work unit issue.

Postby foldy » Thu Apr 19, 2018 5:25 pm

The work units fail immediately after the download completes and the core starts.

Code: Select all
08:50:50:WU02:FS01:Download complete
08:50:50:WU02:FS01:FahCore 0x21 started
08:50:50:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)

To me that looks more like an OpenCL or driver issue.

It looks like the AMD Mobility Radeon HD 6620G is an iGPU built into the CPU (an APU).

Maybe you cannot fold on that and need to delete the GPU slot.
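
The "fails immediately" observation can be checked against a whole log rather than by eye. The following is a minimal Python sketch, not part of the FAH client: it pairs each "FahCore ... started" line with the next "FahCore returned" line for the same work unit and slot and reports the elapsed seconds. The function name core_runtimes and both regular expressions are assumptions made for illustration; they only match the line shapes quoted in this thread.

```python
import re
from datetime import datetime

# Matches e.g. "08:50:50:WU02:FS01:FahCore 0x21 started",
# with an optional "> " quote prefix as in the log posted earlier.
STARTED = re.compile(
    r"^(?:> )?(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):FahCore 0x[0-9a-f]+ started"
)
# Matches e.g. "08:50:50:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)".
RETURNED = re.compile(
    r"^(?:> )?(\d\d:\d\d:\d\d):(?:WARNING:)?(WU\d+):(FS\d+):FahCore returned: (\w+)"
)


def core_runtimes(lines):
    """Return (wu, slot, result, seconds) for each core start/return pair.

    Timestamps carry no date, so a run that crosses midnight is not handled.
    """
    started = {}
    results = []
    for line in lines:
        m = STARTED.match(line)
        if m:
            started[(m.group(2), m.group(3))] = datetime.strptime(m.group(1), "%H:%M:%S")
            continue
        m = RETURNED.match(line)
        if m:
            key = (m.group(2), m.group(3))
            if key in started:
                end = datetime.strptime(m.group(1), "%H:%M:%S")
                seconds = (end - started.pop(key)).total_seconds()
                results.append((key[0], key[1], m.group(4), seconds))
    return results


# Lines copied from the log in this thread.
sample = [
    "08:50:39:WU01:FS01:FahCore 0x21 started",
    "08:50:40:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "08:50:50:WU02:FS01:FahCore 0x21 started",
    "08:50:50:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]
for wu, slot, result, seconds in core_runtimes(sample):
    print(wu, slot, result, seconds)
```

A core that returns within a second or so of starting has probably not run any simulation steps, which fits a setup problem better than a calculation error.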
foldy
 
Posts: 1130
Joined: Sat Dec 01, 2012 3:43 pm

Re: Failed GPU work unit issue.

Postby toTOW » Sun Apr 22, 2018 10:21 am

You can still try to run the FahCore from a terminal and see what error it prints. It might give more details than the client is able to capture.
toTOW
Site Moderator
 
Posts: 8433
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Failed GPU work unit issue.

Postby vince_lewis » Sun Apr 22, 2018 2:06 pm

How do I do that in Ubuntu? Sorry, newbie question, I know.
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm

Re: Failed GPU work unit issue.

Postby bollix47 » Sun Apr 22, 2018 9:56 pm

I'm not sure you'll get anything useful but here's how:

Open a Terminal (ctrl-alt-t)

Select the following command (SELECT ALL may not work as it should; if it doesn't, just triple-click inside the box), then right-click Copy and right-click Paste it into the terminal:
Code: Select all
cd "$(dirname "$(find / -type f -name FahCore_21 2>/dev/null | head -1)")"

If FahCore_21 is not the core you're looking for then change that portion of the command (e.g. change 21 to a4 or a7).

Press Enter (this could take a while, depending on your setup).
You should now be in the folder of the core you want to execute (you can use the ls command to verify).

Type the following (again I'm using FahCore_21 so change it according to your needs):

Code: Select all
./FahCore_21
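
Since a search from / can be slow, a narrower search may be quicker. Here is a minimal Python sketch, assuming the cores live under /var/lib/fahclient/cores as in the log posted earlier in this thread. find_core is a hypothetical helper for illustration, not something shipped with FAH.

```python
import os


def find_core(name, root="/var/lib/fahclient/cores"):
    """Return the directory holding the first file called `name`, or None.

    The default root is an assumption taken from the log in this thread;
    pass a different root if your client stores its cores elsewhere.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            return dirpath
    return None


if __name__ == "__main__":
    core_dir = find_core("FahCore_21")
    print(core_dir if core_dir else "FahCore_21 not found under the default root")
```

You can then cd into the printed directory and run the core from there as described above.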
bollix47
 
Posts: 3413
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Failed GPU work unit issue.

Postby vince_lewis » Mon Apr 23, 2018 8:18 am

Okay, so I'm using core a4.
I found the directory manually (the cd command seemed to hang)

I ran the script ./FahCore_a4

Output is as follows:
Folding@Home Gromacs GB Core Version 2.27

That's all I get.
Any next steps?
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm

Re: Failed GPU work unit issue.

Postby bruce » Mon Apr 23, 2018 4:14 pm

Please post the output of ldd ./FahCore_a4
bruce
 
Posts: 21402
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Failed GPU work unit issue.

Postby vince_lewis » Tue Apr 24, 2018 5:47 am

I get the following:

not a dynamic executable
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm

Re: Failed GPU work unit issue.

Postby vince_lewis » Tue Apr 24, 2018 6:20 am

OK, scratch that, was misreading the log - very sorry. The GPU is indeed trying to use FahCore_21.

Output from script ./FahCore_21 is:

*************************** Core21 Folding@home Core ***************************
Type: 33
Core: Core21
Website: http://folding.stanford.edu/
Copyright: (c) 2009-2014 Stanford University
Author: Yutong Zhao <yutong.zhao@stanford.edu>
Args:
Config: <none>
************************************ Build *************************************
Version: 0.0.18
Date: Jan 20 2017
Time: 03:42:31
Repository: Git
Revision: 2745fc8067662d2e7b9e455232edb5ebd8790640
Branch: HEAD
Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-17)
Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math -fno-unsafe-math-optimizations -msse2
Platform: linux2 4.4.39-moby
Bits: 64
Mode: Release
************************************ System ************************************
CPU: AMD A8-3500M APU with Radeon(tm) HD Graphics
CPU ID: AuthenticAMD Family 18 Model 1 Stepping 0
CPUs: 4
Memory: 5.31GiB
Free Memory: 2.30GiB
Threads: POSIX_THREADS
OS Version: 4.4
Has Battery: true
On Battery: false
UTC Offset: 1
PID: 10509
CWD: /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/ATI/R600/Core_21.fah
OS: Linux 4.4.0-121-lowlatency x86_64
OS Arch: AMD64
GPUs: 1
GPU 0: Bus:0 Slot:1 ATI:4 BeaverCreek [Mobility Radeon HD 6620G]
CUDA: Not detected
OpenCL: Not detected
********************************************************************************



Output from ldd is:
linux-vdso.so.1 => (0x00007ffdfb0e6000)
libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f88e3bcb000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f88e39ae000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f88e37aa000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f88e35a2000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f88e3220000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f88e2f17000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f88e2d01000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f88e2937000)
/lib64/ld-linux-x86-64.so.2 (0x00007f88e3dd6000)
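
The ldd output shows that libOpenCL.so.1 resolves, yet the core reports "OpenCL: Not detected". One possible reading is that the OpenCL loader library is installed but no vendor driver is registered with it. On Linux the loader conventionally reads vendor entries from *.icd files in /etc/OpenCL/vendors. Here is a minimal Python sketch that lists those entries. list_opencl_icds is a hypothetical helper, and the default directory is only the usual convention, so it may differ on some installs.

```python
import glob
import os


def list_opencl_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name in vendor_dir to the library it names.

    vendor_dir defaults to the conventional location (an assumption here).
    An empty result means no vendor driver is registered in that directory.
    """
    icds = {}
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path) as handle:
            icds[os.path.basename(path)] = handle.read().strip()
    return icds


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print("No OpenCL vendor ICD files found in the default directory.")
    for name, library in found.items():
        print(name, "->", library)
```

If the list comes back empty, that would be consistent with the missing proprietary driver rather than with a faulty work unit.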



I'm running Ubuntu 16.04. I see AMD has proprietary drivers for 12.04 and 14.04 but not 16.04. Should I try installing the drivers for 14.04?
vince_lewis
 
Posts: 16
Joined: Fri Dec 09, 2016 11:43 pm
