Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

weirddan455
Posts: 12
Joined: Mon Oct 19, 2015 12:53 pm

Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by weirddan455 »

I've been having problems folding Core 21 WUs and started a topic here:

viewtopic.php?f=74&t=28207

But this time the WU actually ended early instead of just stalling. (It did stall at ~74%, but I restarted it; it got to 87% and then ended early.) This is the first time I've had a WU end early like this, so I'm wondering: is this a bad WU or a problem on my end? Also, did I get any points for this?

*********************** Log Started 2015-10-23T13:03:52Z ***********************
13:03:52:************************* Folding@home Client *************************
13:03:52:    Website: http://folding.stanford.edu/
13:03:52:  Copyright: (c) 2009-2014 Stanford University
13:03:52:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:03:52:       Args: --config /var/opt/fah/config.xml --exec-directory=/opt/fah
13:03:52:             --data-directory=/var/opt/fah
13:03:52:     Config: /var/opt/fah/config.xml
13:03:52:******************************** Build ********************************
13:03:52:    Version: 7.4.4
13:03:52:       Date: Mar 4 2014
13:03:52:       Time: 12:02:38
13:03:52:    SVN Rev: 4130
13:03:52:     Branch: fah/trunk/client
13:03:52:   Compiler: GNU 4.4.7
13:03:52:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
13:03:52:             -fno-unsafe-math-optimizations -msse2
13:03:52:   Platform: linux2 3.2.0-1-amd64
13:03:52:       Bits: 64
13:03:52:       Mode: Release
13:03:52:******************************* System ********************************
13:03:52:        CPU: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
13:03:52:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
13:03:52:       CPUs: 4
13:03:52:     Memory: 15.63GiB
13:03:52:Free Memory: 14.26GiB
13:03:52:    Threads: POSIX_THREADS
13:03:52: OS Version: 4.2
13:03:52:Has Battery: false
13:03:52: On Battery: false
13:03:52: UTC Offset: -5
13:03:52:        PID: 2028
13:03:52:        CWD: /var/opt/fah
13:03:52:         OS: Linux 4.2.3-1-ARCH x86_64
13:03:52:    OS Arch: AMD64
13:03:52:       GPUs: 1
13:03:52:      GPU 0: NVIDIA:3 GK104 [GeForce GTX 770]
13:03:52:       CUDA: 3.0
13:03:52:CUDA Driver: 7050
13:03:52:***********************************************************************
13:03:52:<config>
13:03:52:  <!-- Slot Control -->
13:03:52:  <power v='full'/>
13:03:52:
13:03:52:  <!-- User Information -->
13:03:52:  <passkey v='********************************'/>
13:03:52:  <team v='45032'/>
13:03:52:  <user v='weirddan455'/>
13:03:52:
13:03:52:  <!-- Folding Slots -->
13:03:52:  <slot id='0' type='CPU'/>
13:03:52:  <slot id='1' type='GPU'/>
13:03:52:</config>
13:03:52:Trying to access database...
13:03:52:Successfully acquired database lock
13:03:52:Enabled folding slot 00: READY cpu:3
13:03:52:Enabled folding slot 01: READY gpu:0:GK104 [GeForce GTX 770]
13:03:52:WU01:FS01:Starting
13:03:52:WU01:FS01:Running FahCore: /opt/fah/FAHCoreWrapper /var/opt/fah/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 2028 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
13:03:52:WU01:FS01:Started FahCore on PID 2037
13:03:52:WU01:FS01:Core PID:2041
13:03:52:WU01:FS01:FahCore 0x21 started
13:03:52:WU01:FS01:0x21:*********************** Log Started 2015-10-23T13:03:52Z ***********************
13:03:52:WU01:FS01:0x21:Project: 9704 (Run 35, Clone 16, Gen 99)
13:03:52:WU01:FS01:0x21:Unit: 0x0000008dab404162553ec14652ca66d1
13:03:52:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
13:03:52:WU01:FS01:0x21:Machine: 1
13:03:52:WU01:FS01:0x21:Digital signatures verified
13:03:52:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
13:03:52:WU01:FS01:0x21:Version 0.0.12
13:03:52:WU01:FS01:0x21:  Found a checkpoint file
13:05:23:WU01:FS01:0x21:Completed 400000 out of 640000 steps (62%)
13:05:23:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:07:36:WU01:FS01:0x21:Completed 403200 out of 640000 steps (63%)
13:11:33:WU01:FS01:0x21:Completed 409600 out of 640000 steps (64%)
13:15:28:WU01:FS01:0x21:Completed 416000 out of 640000 steps (65%)
13:19:24:WU01:FS01:0x21:Completed 422400 out of 640000 steps (66%)
13:23:19:WU01:FS01:0x21:Completed 428800 out of 640000 steps (67%)
13:27:18:WU01:FS01:0x21:Completed 435200 out of 640000 steps (68%)
13:31:14:WU01:FS01:0x21:Completed 441600 out of 640000 steps (69%)
13:35:09:WU01:FS01:0x21:Completed 448000 out of 640000 steps (70%)
13:39:05:WU01:FS01:0x21:Completed 454400 out of 640000 steps (71%)
13:43:01:WU01:FS01:0x21:Completed 460800 out of 640000 steps (72%)
13:46:54:WU01:FS01:0x21:Completed 467200 out of 640000 steps (73%)
13:50:49:WU01:FS01:0x21:Completed 473600 out of 640000 steps (74%)
13:54:44:WU01:FS01:0x21:Completed 480000 out of 640000 steps (75%)
13:58:56:WU01:FS01:0x21:Completed 486400 out of 640000 steps (76%)
14:02:51:WU01:FS01:0x21:Completed 492800 out of 640000 steps (77%)
14:06:45:WU01:FS01:0x21:Completed 499200 out of 640000 steps (78%)
14:10:39:WU01:FS01:0x21:Completed 505600 out of 640000 steps (79%)
14:14:33:WU01:FS01:0x21:Completed 512000 out of 640000 steps (80%)
14:18:27:WU01:FS01:0x21:Completed 518400 out of 640000 steps (81%)
14:22:21:WU01:FS01:0x21:Completed 524800 out of 640000 steps (82%)
14:26:15:WU01:FS01:0x21:Completed 531200 out of 640000 steps (83%)
14:30:09:WU01:FS01:0x21:Completed 537600 out of 640000 steps (84%)
14:34:04:WU01:FS01:0x21:Completed 544000 out of 640000 steps (85%)
14:37:58:WU01:FS01:0x21:Completed 550400 out of 640000 steps (86%)
14:41:52:WU01:FS01:0x21:Completed 556800 out of 640000 steps (87%)
14:43:09:WU01:FS01:0x21:ERROR:exception: Error invoking kernel findBlocksWithInteractions: clEnqueueNDRangeKernel (-4)
14:43:09:WU01:FS01:0x21:Saving result file logfile_01.txt
14:43:09:WU01:FS01:0x21:Saving result file log.txt
14:43:09:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
14:43:09:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:43:09:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9704 run:35 clone:16 gen:99 core:0x21 unit:0x0000008dab404162553ec14652ca66d1
14:43:09:WU01:FS01:Uploading 3.62KiB to 171.64.65.98
14:43:09:WU01:FS01:Connecting to 171.64.65.98:8080
14:43:10:WU00:FS01:Connecting to 171.67.108.45:80
14:43:10:WU01:FS01:Upload complete
14:43:10:WU01:FS01:Server responded WORK_ACK (400)
14:43:10:WU01:FS01:Cleaning up
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by davidcoton »

weirddan455 wrote: This is the first time I've had a WU end early like this so I'm wondering is this a bad WU or a problem on my end? Also did I get any points for this or no?
Can't tell easily. It will be reassigned; if someone else completes it, then it's (nominally) a problem on your end, though if it's a one-off there's not much you can do. If it fails for others too, it's a bad WU. A mod should be able to find out whether anyone else completes it. You should get partial credit, but it's not guaranteed.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by bruce »

You received partial credit for returning an incomplete WU. It was reassigned and completed successfully by the next person. It's not a bad WU.

Hi weirddan455 (team 45032),
Your WU (P9704 R35 C16 G99) was added to the stats database on 2015-10-23 08:07:59 for 7830 points of credit.

Hi ***** (team ****),
Your WU (P9704 R35 C16 G99) was added to the stats database on 2015-10-24 11:08:22 for 16668 points of credit.
weirddan455
Posts: 12
Joined: Mon Oct 19, 2015 12:53 pm

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by weirddan455 »

Hmm... so what does that error mean then?

14:43:09:WU01:FS01:0x21:ERROR:exception: Error invoking kernel findBlocksWithInteractions: clEnqueueNDRangeKernel (-4)
I gotta figure out what's going on with these Core 21 WUs and see if there's anything I can do on my end to fix things.
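
One way to see whether the failures follow a pattern is to pull every "FahCore returned" warning out of the client log. The following is a minimal sketch, assuming the warning lines keep the format shown in the log above; `find_core_failures` is a hypothetical helper, not part of the Folding@home client.

```python
import re

# Matches client warning lines such as:
# 14:43:09:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
# The format is assumed from the log in this thread.
RETURNED = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+)"
)


def find_core_failures(lines):
    """Return one dict per 'FahCore returned' warning found in the log lines."""
    failures = []
    for line in lines:
        match = RETURNED.match(line.strip())
        if match:
            time, wu, slot, status, code = match.groups()
            failures.append(
                {"time": time, "wu": wu, "slot": slot,
                 "status": status, "code": int(code)}
            )
    return failures


sample = [
    "14:41:52:WU01:FS01:0x21:Completed 556800 out of 640000 steps (87%)",
    "14:43:09:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT",
    "14:43:09:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]
for failure in find_core_failures(sample):
    print(failure["time"], failure["slot"], failure["status"], failure["code"])
```

Running it over the lines of a full log instead of `sample` would show whether the failures cluster on one slot or one return code.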
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by Grandpa_01 »

It could be caused by any one of a number of things. Is the card overclocked? If so, you may want to tone it down a little. I see it is a GTX 770; if it is not OCed and you are comfortable with adding voltage to it, try giving it a little voltage bump and see if that helps. Those are just some basic troubleshooting steps you can try.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
weirddan455
Posts: 12
Joined: Mon Oct 19, 2015 12:53 pm

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by weirddan455 »

I have this card:

http://www.gigabyte.com/products/produc ... id=4629#ov

It's factory overclocked by ~100 MHz. I haven't done any overclocking myself or messed with the voltages, though, and it's stable in everything else I've done, including folding Core 17 and 18 WUs. I think I'll just wait and see what happens when I get my next Core 21 WU.
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project: 9704 (Run 35, Clone 16, Gen 99) - Bad WU?

Post by toTOW »

Errors of this kind (clEnqueueNDRangeKernel (-4)) are OpenCL errors (error code -4 is CL_MEM_OBJECT_ALLOCATION_FAILURE). If you were running Windows, I would say it was a driver reset (most likely a TDR issue), but I don't know if Linux has a similar mechanism.

OpenCL errors are usually driver- or OS-related: an inability to allocate resources, either because the device is not reachable (during a driver reset caused by TDR on Windows) or because something else is trying to allocate resources at the same time.
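
To illustrate the lookup, here is a minimal sketch that pulls the OpenCL call name and status code out of a FahCore error line and maps the code to its symbolic name. The table covers only the first few codes from the Khronos `cl.h` header, and `decode_opencl_error` is a hypothetical helper, not something the client provides.

```python
import re

# Subset of OpenCL status codes as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches the tail of the error line, e.g. "clEnqueueNDRangeKernel (-4)".
CALL_AND_CODE = re.compile(r"(cl[A-Za-z]+) \((-?\d+)\)")


def decode_opencl_error(log_line):
    """Return (call, code, name) for an OpenCL error line, or None if absent."""
    match = CALL_AND_CODE.search(log_line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    # Codes outside the small table above are reported as UNKNOWN.
    return call, code, OPENCL_ERRORS.get(code, "UNKNOWN")


line = ("14:43:09:WU01:FS01:0x21:ERROR:exception: Error invoking kernel "
        "findBlocksWithInteractions: clEnqueueNDRangeKernel (-4)")
print(decode_opencl_error(line))
```

The name only says which OpenCL call failed and how; it does not by itself tell you whether the driver, the OS, or another process competing for the GPU was the cause.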

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.