BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28)

Postby fangfufu » Mon Jul 28, 2014 8:44 am

Well guys, I have no idea what's going on... Please help me out or point me in the right direction.

*********************** Log Started 2014-07-28T01:12:26Z ***********************
01:12:26:************************* Folding@home Client *************************
01:12:26:    Website: http://folding.stanford.edu/
01:12:26:  Copyright: (c) 2009-2014 Stanford University
01:12:26:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
01:12:26:       Args: --child --lifeline 1895 /etc/fahclient/config.xml --run-as
01:12:26:             fahclient --pid-file=/var/run/fahclient.pid --daemon
01:12:26:     Config: /etc/fahclient/config.xml
01:12:26:******************************** Build ********************************
01:12:26:    Version: 7.4.4
01:12:26:       Date: Mar 4 2014
01:12:26:       Time: 12:02:38
01:12:26:    SVN Rev: 4130
01:12:26:     Branch: fah/trunk/client
01:12:26:   Compiler: GNU 4.4.7
01:12:26:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
01:12:26:             -fno-unsafe-math-optimizations -msse2
01:12:26:   Platform: linux2 3.2.0-1-amd64
01:12:26:       Bits: 64
01:12:26:       Mode: Release
01:12:26:******************************* System ********************************
01:12:26:        CPU: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
01:12:26:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
01:12:26:       CPUs: 4
01:12:26:     Memory: 15.61GiB
01:12:26:Free Memory: 15.13GiB
01:12:26:    Threads: POSIX_THREADS
01:12:26: OS Version: 3.14
01:12:26:Has Battery: true
01:12:26: On Battery: false
01:12:26: UTC Offset: 1
01:12:26:        PID: 1897
01:12:26:        CWD: /var/lib/fahclient
01:12:26:         OS: Linux 3.14-1-amd64 x86_64
01:12:26:    OS Arch: AMD64
01:12:26:       GPUs: 1
01:12:26:      GPU 0: NVIDIA:3 GK208 [GeForce GT 730M]
01:12:26:       CUDA: 3.5
01:12:26:CUDA Driver: 6050
01:12:26:***********************************************************************
01:12:26:<config>
01:12:26:  <!-- Client Control -->
01:12:26:  <client-threads v='6'/>
01:12:26:  <cycle-rate v='4'/>
01:12:26:  <cycles v='-1'/>
01:12:26:  <data-directory v='.'/>
01:12:26:  <disable-sleep-when-active v='true'/>
01:12:26:  <exec-directory v='/usr/bin'/>
01:12:26:  <exit-when-done v='false'/>
01:12:26:  <fold-anon v='false'/>
01:12:26:  <idle-seconds v='300'/>
01:12:26:  <open-web-control v='false'/>
01:12:26:
01:12:26:  <!-- Configuration -->
01:12:26:  <config-rotate v='true'/>
01:12:26:  <config-rotate-dir v='configs'/>
01:12:26:  <config-rotate-max v='16'/>
01:12:26:
01:12:26:  <!-- Debugging -->
01:12:26:  <assignment-servers>
01:12:26:    assign3.stanford.edu:8080 assign4.stanford.edu:80
01:12:26:  </assignment-servers>
01:12:26:  <auth-as v='true'/>
01:12:26:  <capture-directory v='capture'/>
01:12:26:  <capture-on-error v='false'/>
01:12:26:  <capture-packets v='false'/>
01:12:26:  <capture-requests v='false'/>
01:12:26:  <capture-responses v='false'/>
01:12:26:  <capture-sockets v='false'/>
01:12:26:  <core-exec v='FahCore_$type'/>
01:12:26:  <core-wrapper-exec v='FAHCoreWrapper'/>
01:12:26:  <debug-sockets v='false'/>
01:12:26:  <exception-locations v='true'/>
01:12:26:  <gpu-assignment-servers>
01:12:26:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
01:12:26:  </gpu-assignment-servers>
01:12:26:  <stack-traces v='false'/>
01:12:26:
01:12:26:  <!-- Error Handling -->
01:12:26:  <max-slot-errors v='10'/>
01:12:26:  <max-unit-errors v='5'/>
01:12:26:
01:12:26:  <!-- Folding Core -->
01:12:26:  <checkpoint v='30'/>
01:12:26:  <core-dir v='cores'/>
01:12:26:  <core-priority v='idle'/>
01:12:26:  <cpu-affinity v='false'/>
01:12:26:  <cpu-usage v='100'/>
01:12:26:  <gpu-usage v='100'/>
01:12:26:  <no-assembly v='false'/>
01:12:26:
01:12:26:  <!-- Folding Slot Configuration -->
01:12:26:  <cause v='ANY'/>
01:12:26:  <client-subtype v='LINUX'/>
01:12:26:  <client-type v='normal'/>
01:12:26:  <cpu-species v='X86_PENTIUM_II'/>
01:12:26:  <cpu-type v='AMD64'/>
01:12:26:  <cpus v='-1'/>
01:12:26:  <gpu v='true'/>
01:12:26:  <max-packet-size v='normal'/>
01:12:26:  <os-species v='UNKNOWN'/>
01:12:26:  <os-type v='LINUX'/>
01:12:26:  <project-key v='0'/>
01:12:26:  <smp v='true'/>
01:12:26:
01:12:26:  <!-- GUI -->
01:12:26:  <gui-enabled v='true'/>
01:12:26:
01:12:26:  <!-- HTTP Server -->
01:12:26:  <allow v='127.0.0.1'/>
01:12:26:  <connection-timeout v='60'/>
01:12:26:  <deny v='0/0'/>
01:12:26:  <http-addresses v='0:7396'/>
01:12:26:  <https-addresses v=''/>
01:12:26:  <max-connect-time v='900'/>
01:12:26:  <max-connections v='800'/>
01:12:26:  <max-request-length v='52428800'/>
01:12:26:  <min-connect-time v='300'/>
01:12:26:  <threads v='4'/>
01:12:26:
01:12:26:  <!-- Logging -->
01:12:26:  <log v='log.txt'/>
01:12:26:  <log-color v='true'/>
01:12:26:  <log-crlf v='false'/>
01:12:26:  <log-date v='false'/>
01:12:26:  <log-date-periodically v='21600'/>
01:12:26:  <log-debug v='true'/>
01:12:26:  <log-domain v='false'/>
01:12:26:  <log-header v='true'/>
01:12:26:  <log-level v='true'/>
01:12:26:  <log-no-info-header v='true'/>
01:12:26:  <log-redirect v='false'/>
01:12:26:  <log-rotate v='true'/>
01:12:26:  <log-rotate-dir v='logs'/>
01:12:26:  <log-rotate-max v='16'/>
01:12:26:  <log-short-level v='false'/>
01:12:26:  <log-simple-domains v='true'/>
01:12:26:  <log-thread-id v='false'/>
01:12:26:  <log-thread-prefix v='true'/>
01:12:26:  <log-time v='true'/>
01:12:26:  <log-to-screen v='true'/>
01:12:26:  <log-truncate v='false'/>
01:12:26:  <verbosity v='4'/>
01:12:26:
01:12:26:  <!-- Network -->
01:12:26:  <proxy v=':8080'/>
01:12:26:  <proxy-enable v='false'/>
01:12:26:  <proxy-pass v=''/>
01:12:26:  <proxy-user v=''/>
01:12:26:
01:12:26:  <!-- Process Control -->
01:12:26:  <child v='true'/>
01:12:26:  <daemon v='true'/>
01:12:26:  <fork v='false'/>
01:12:26:  <pid v='false'/>
01:12:26:  <pid-file v='/var/run/fahclient.pid'/>
01:12:26:  <respawn v='false'/>
01:12:26:  <service v='false'/>
01:12:26:
01:12:26:  <!-- Remote Command Server -->
01:12:26:  <command-address v='0.0.0.0'/>
01:12:26:  <command-allow-no-pass v='127.0.0.1'/>
01:12:26:  <command-deny-no-pass v='0/0'/>
01:12:26:  <command-enable v='true'/>
01:12:26:  <command-port v='36330'/>
01:12:26:
01:12:26:  <!-- Slot Control -->
01:12:26:  <idle v='false'/>
01:12:26:  <max-shutdown-wait v='60'/>
01:12:26:  <pause-on-battery v='true'/>
01:12:26:  <pause-on-start v='false'/>
01:12:26:  <paused v='false'/>
01:12:26:  <power v='medium'/>
01:12:26:
01:12:26:  <!-- User Information -->
01:12:26:  <machine-id v='0'/>
01:12:26:  <passkey v='********************************'/>
01:12:26:  <team v='50959'/>
01:12:26:  <user v='fangfufu'/>
01:12:26:
01:12:26:  <!-- Web Server -->
01:12:26:  <web-allow v='127.0.0.1'/>
01:12:26:  <web-deny v='0/0'/>
01:12:26:  <web-enable v='true'/>
01:12:26:
01:12:26:  <!-- Web Server Sessions -->
01:12:26:  <session-cookie v='sid'/>
01:12:26:  <session-lifetime v='86400'/>
01:12:26:  <session-timeout v='3600'/>
01:12:26:
01:12:26:  <!-- Work Unit Control -->
01:12:26:  <dump-after-deadline v='true'/>
01:12:26:  <max-queue v='16'/>
01:12:26:  <max-units v='0'/>
01:12:26:  <next-unit-percentage v='99'/>
01:12:26:  <stall-detection-enabled v='false'/>
01:12:26:  <stall-percent v='5'/>
01:12:26:  <stall-timeout v='1800'/>
01:12:26:
01:12:26:  <!-- Folding Slots -->
01:12:26:  <slot id='1' type='GPU'/>
01:12:26:</config>
01:12:26:Switching to user fahclient
01:12:26:Trying to access database...
01:12:26:Successfully acquired database lock
01:12:26:Enabled folding slot 01: READY gpu:0:GK208 [GeForce GT 730M]
01:12:26:WU00:FS01:Starting
01:12:26:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
01:12:26:WU00:FS01:Started FahCore on PID 1908
01:12:26:WU00:FS01:Core PID:1912
01:12:26:WU00:FS01:FahCore 0x17 started
01:12:27:WU00:FS01:0x17:*********************** Log Started 2014-07-28T01:12:26Z ***********************
01:12:27:WU00:FS01:0x17:Project: 13000 (Run 1046, Clone 0, Gen 28)
01:12:27:WU00:FS01:0x17:Unit: 0x00000045538b3db75310c310fa55cbd7
01:12:27:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
01:12:27:WU00:FS01:0x17:Machine: 1
01:12:27:WU00:FS01:0x17:Digital signatures verified
01:12:27:WU00:FS01:0x17:  Found a checkpoint file
01:14:28:WU00:FS01:0x17:Completed 750000 out of 5000000 steps (15%)
02:03:02:WU00:FS01:0x17:Completed 800000 out of 5000000 steps (16%)
02:51:29:WU00:FS01:0x17:Completed 850000 out of 5000000 steps (17%)
03:40:05:WU00:FS01:0x17:Completed 900000 out of 5000000 steps (18%)
04:00:12:FS01:Paused
04:00:12:FS01:Shutting core down
04:00:12:WU00:FS01:0x17:Caught signal SIGINT(2) on PID 1912
04:00:12:WU00:FS01:0x17:Exiting, please wait. . .
04:00:12:WU00:FS01:0x17:Folding@home Core Shutdown: INTERRUPTED
04:00:13:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
04:01:12:Saving configuration to /etc/fahclient/config.xml
04:01:12:<config>
04:01:12:  <!-- Folding Core -->
04:01:12:  <checkpoint v='30'/>
04:01:12:
04:01:12:  <!-- Logging -->
04:01:12:  <verbosity v='4'/>
04:01:12:
04:01:12:  <!-- Network -->
04:01:12:  <proxy v=':8080'/>
04:01:12:
04:01:12:  <!-- Slot Control -->
04:01:12:  <power v='medium'/>
04:01:12:
04:01:12:  <!-- User Information -->
04:01:12:  <passkey v='********************************'/>
04:01:12:  <team v='50959'/>
04:01:12:  <user v='fangfufu'/>
04:01:12:
04:01:12:  <!-- Folding Slots -->
04:01:12:  <slot id='1' type='GPU'>
04:01:12:    <paused v='true'/>
04:01:12:  </slot>
04:01:12:</config>
04:07:44:FS01:Unpaused
04:07:44:WU00:FS01:Starting
04:07:44:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
04:07:44:WU00:FS01:Started FahCore on PID 10618
04:07:44:WU00:FS01:Core PID:10622
04:07:44:WU00:FS01:FahCore 0x17 started
04:07:45:WU00:FS01:0x17:*********************** Log Started 2014-07-28T04:07:44Z ***********************
04:07:45:WU00:FS01:0x17:Project: 13000 (Run 1046, Clone 0, Gen 28)
04:07:45:WU00:FS01:0x17:Unit: 0x00000045538b3db75310c310fa55cbd7
04:07:45:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
04:07:45:WU00:FS01:0x17:Machine: 1
04:07:45:WU00:FS01:0x17:Digital signatures verified
04:07:45:WU00:FS01:0x17:  Found a checkpoint file
04:08:19:Removing old file 'configs/config-20140726-215003.xml'
04:08:19:Saving configuration to /etc/fahclient/config.xml
04:08:19:<config>
04:08:19:  <!-- Folding Core -->
04:08:19:  <checkpoint v='30'/>
04:08:19:
04:08:19:  <!-- Logging -->
04:08:19:  <verbosity v='4'/>
04:08:19:
04:08:19:  <!-- Network -->
04:08:19:  <proxy v=':8080'/>
04:08:19:
04:08:19:  <!-- Slot Control -->
04:08:19:  <power v='medium'/>
04:08:19:
04:08:19:  <!-- User Information -->
04:08:19:  <passkey v='********************************'/>
04:08:19:  <team v='50959'/>
04:08:19:  <user v='fangfufu'/>
04:08:19:
04:08:19:  <!-- Folding Slots -->
04:08:19:  <slot id='1' type='GPU'/>
04:08:19:</config>
04:09:43:WU00:FS01:0x17:Completed 875000 out of 5000000 steps (17%)
04:11:27:FS01:Paused
04:11:27:FS01:Shutting core down
04:11:27:WU00:FS01:0x17:Caught signal SIGINT(2) on PID 10622
04:11:27:WU00:FS01:0x17:Exiting, please wait. . .
04:11:29:WU00:FS01:0x17:Lost lifeline PID 10618, exiting
04:11:29:WU00:FS01:0x17:ERROR:103: Lost client lifeline
04:11:29:WU00:FS01:0x17:Folding@home Core Shutdown: CLIENT_DIED
04:11:29:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
04:12:23:Removing old file 'configs/config-20140726-215037.xml'
04:12:23:Saving configuration to /etc/fahclient/config.xml
04:12:23:<config>
04:12:23:  <!-- Folding Core -->
04:12:23:  <checkpoint v='30'/>
04:12:23:
04:12:23:  <!-- Logging -->
04:12:23:  <verbosity v='4'/>
04:12:23:
04:12:23:  <!-- Network -->
04:12:23:  <proxy v=':8080'/>
04:12:23:
04:12:23:  <!-- Slot Control -->
04:12:23:  <power v='medium'/>
04:12:23:
04:12:23:  <!-- User Information -->
04:12:23:  <passkey v='********************************'/>
04:12:23:  <team v='50959'/>
04:12:23:  <user v='fangfufu'/>
04:12:23:
04:12:23:  <!-- Folding Slots -->
04:12:23:  <slot id='1' type='GPU'>
04:12:23:    <paused v='true'/>
04:12:23:  </slot>
04:12:23:</config>
04:12:29:FS01:Unpaused
04:12:29:WU00:FS01:Starting
04:12:29:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
04:12:29:WU00:FS01:Started FahCore on PID 12730
04:12:29:WU00:FS01:Core PID:12734
04:12:29:WU00:FS01:FahCore 0x17 started
04:12:30:WU00:FS01:0x17:*********************** Log Started 2014-07-28T04:12:29Z ***********************
04:12:30:WU00:FS01:0x17:Project: 13000 (Run 1046, Clone 0, Gen 28)
04:12:30:WU00:FS01:0x17:Unit: 0x00000045538b3db75310c310fa55cbd7
04:12:30:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
04:12:30:WU00:FS01:0x17:Machine: 1
04:12:30:WU00:FS01:0x17:Digital signatures verified
04:12:30:WU00:FS01:0x17:  Found a checkpoint file
04:13:24:Removing old file 'configs/config-20140726-215105.xml'
04:13:24:Saving configuration to /etc/fahclient/config.xml
04:13:24:<config>
04:13:24:  <!-- Folding Core -->
04:13:24:  <checkpoint v='30'/>
04:13:24:
04:13:24:  <!-- Logging -->
04:13:24:  <verbosity v='4'/>
04:13:24:
04:13:24:  <!-- Network -->
04:13:24:  <proxy v=':8080'/>
04:13:24:
04:13:24:  <!-- Slot Control -->
04:13:24:  <power v='medium'/>
04:13:24:
04:13:24:  <!-- User Information -->
04:13:24:  <passkey v='********************************'/>
04:13:24:  <team v='50959'/>
04:13:24:  <user v='fangfufu'/>
04:13:24:
04:13:24:  <!-- Folding Slots -->
04:13:24:  <slot id='1' type='GPU'/>
04:13:24:</config>
04:14:26:WU00:FS01:0x17:ERROR:exception: Error compiling kernel:
04:14:26:WU00:FS01:0x17:Saving result file logfile_01.txt
04:14:26:WU00:FS01:0x17:Saving result file log.txt
04:14:26:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
04:14:27:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
04:14:27:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:1046 clone:0 gen:28 core:0x17 unit:0x00000045538b3db75310c310fa55cbd7
04:14:27:WU00:FS01:Uploading 4.04KiB to 140.163.4.231
04:14:27:WU00:FS01:Connecting to 140.163.4.231:8080
04:14:27:WU01:FS01:Connecting to 171.67.108.201:80
04:14:27:WU00:FS01:Upload complete
04:14:27:WU00:FS01:Server responded WORK_ACK (400)
04:14:27:WU00:FS01:Cleaning up
04:14:28:WU01:FS01:Assigned to work server 140.163.4.231
04:14:28:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GK208 [GeForce GT 730M] from 140.163.4.231
04:14:28:WU01:FS01:Connecting to 140.163.4.231:8080
04:14:28:WU01:FS01:Downloading 4.84MiB
04:14:34:WU01:FS01:Download 68.46%
04:14:39:WU01:FS01:Download complete
04:14:39:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13000 run:1079 clone:0 gen:40 core:0x17 unit:0x00000052538b3db75310cc6641f59df9
04:14:39:WU01:FS01:Starting
04:14:39:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 01 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
04:14:39:WU01:FS01:Started FahCore on PID 13177
04:14:39:WU01:FS01:Core PID:13181
04:14:39:WU01:FS01:FahCore 0x17 started
04:14:40:WU01:FS01:0x17:*********************** Log Started 2014-07-28T04:14:39Z ***********************
04:14:40:WU01:FS01:0x17:Project: 13000 (Run 1079, Clone 0, Gen 40)
04:14:40:WU01:FS01:0x17:Unit: 0x00000052538b3db75310cc6641f59df9
04:14:40:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
04:14:40:WU01:FS01:0x17:Machine: 1
04:14:40:WU01:FS01:0x17:Reading tar file state.xml
04:14:40:WU01:FS01:0x17:Reading tar file system.xml
04:14:40:WU01:FS01:0x17:Reading tar file integrator.xml
04:14:40:WU01:FS01:0x17:Reading tar file core.xml
04:14:40:WU01:FS01:0x17:Digital signatures verified
04:16:31:WU01:FS01:0x17:Completed 0 out of 5000000 steps (0%)
******************************* Date: 2014-07-28 *******************************
08:24:34:WARNING:WU01:FS01:Detected clock skew (3 hours 48 mins), adjusting time estimates
08:24:35:WU01:FS01:0x17:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-36)
08:24:35:WU01:FS01:0x17:Saving result file logfile_01.txt
08:24:35:WU01:FS01:0x17:Saving result file log.txt
08:24:35:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
08:24:35:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
08:24:35:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:1079 clone:0 gen:40 core:0x17 unit:0x00000052538b3db75310cc6641f59df9
08:24:35:WU01:FS01:Uploading 2.35KiB to 140.163.4.231
08:24:35:WU01:FS01:Connecting to 140.163.4.231:8080
08:24:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
08:24:35:WU01:FS01:Connecting to 140.163.4.231:80
08:24:35:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: Network is unreachable
08:24:35:WU01:FS01:Trying to send results to collection server
08:24:35:WU01:FS01:Uploading 2.35KiB to 140.163.4.241
08:24:35:WU01:FS01:Connecting to 140.163.4.241:8080
08:24:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
08:24:35:WU01:FS01:Connecting to 140.163.4.241:80
08:24:35:ERROR:WU01:FS01:Exception: Failed to connect to 140.163.4.241:80: Network is unreachable
08:24:35:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:1079 clone:0 gen:40 core:0x17 unit:0x00000052538b3db75310cc6641f59df9
08:24:35:WU01:FS01:Uploading 2.35KiB to 140.163.4.231
08:24:35:WU01:FS01:Connecting to 140.163.4.231:8080
08:24:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
08:24:35:WU01:FS01:Connecting to 140.163.4.231:80
08:24:35:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 140.163.4.231:80: Network is unreachable
08:24:35:WU01:FS01:Trying to send results to collection server
08:24:35:WU01:FS01:Uploading 2.35KiB to 140.163.4.241
08:24:35:WU01:FS01:Connecting to 140.163.4.241:8080
08:24:35:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
08:24:35:WU01:FS01:Connecting to 140.163.4.241:80
08:24:35:ERROR:WU01:FS01:Exception: Failed to connect to 140.163.4.241:80: Network is unreachable
08:24:35:WARNING:WU00:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
08:24:35:ERROR:WU00:FS01:Exception: Could not get an assignment
08:24:35:WARNING:WU00:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
08:24:35:ERROR:WU00:FS01:Exception: Could not get an assignment
08:25:35:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13000 run:1079 clone:0 gen:40 core:0x17 unit:0x00000052538b3db75310cc6641f59df9
08:25:35:WU01:FS01:Uploading 2.35KiB to 140.163.4.231
08:25:35:WU01:FS01:Connecting to 140.163.4.231:8080
08:25:36:WU00:FS01:Connecting to 171.67.108.201:80
08:25:36:WU01:FS01:Upload complete
08:25:36:WU01:FS01:Server responded WORK_ACK (400)
08:25:36:WU01:FS01:Cleaning up
08:25:36:WU00:FS01:Assigned to work server 171.67.108.52
08:25:36:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GK208 [GeForce GT 730M] from 171.67.108.52
08:25:36:WU00:FS01:Connecting to 171.67.108.52:8080
08:25:37:WU00:FS01:Downloading 1.52MiB
08:25:41:WU00:FS01:Download complete
08:25:41:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9201 run:574 clone:0 gen:111 core:0x17 unit:0x0000007e6652edc45399ec95d99af607
08:25:41:WU00:FS01:Starting
08:25:41:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 1897 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
08:25:41:WU00:FS01:Started FahCore on PID 13543
08:25:41:WU00:FS01:Core PID:13547
08:25:41:WU00:FS01:FahCore 0x17 started
08:25:42:WU00:FS01:0x17:*********************** Log Started 2014-07-28T08:25:41Z ***********************
08:25:42:WU00:FS01:0x17:Project: 9201 (Run 574, Clone 0, Gen 111)
08:25:42:WU00:FS01:0x17:Unit: 0x0000007e6652edc45399ec95d99af607
08:25:42:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
08:25:42:WU00:FS01:0x17:Machine: 1
08:25:42:WU00:FS01:0x17:Reading tar file state.xml
08:25:42:WU00:FS01:0x17:Reading tar file system.xml
08:25:42:WU00:FS01:0x17:Reading tar file integrator.xml
08:25:42:WU00:FS01:0x17:Reading tar file core.xml
08:25:42:WU00:FS01:0x17:Digital signatures verified
08:26:00:WU00:FS01:0x17:Completed 0 out of 5000000 steps (0%)
Folder: Intel(R) Core(TM) i7-4900MQ (running on two threads to prevent thermal throttling...)

I first started folding back in the Google Compute days!
fangfufu
 
Posts: 97
Joined: Thu Jan 01, 2009 3:26 am
Location: Norwich, United Kingdom

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby Rel25917 » Mon Jul 28, 2014 2:36 pm

The constant pausing and unpausing before the core even gets going makes me think the checkpoint is corrupt.
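To check the pause/unpause theory against a log like the one in the first post, a short script can pull out the slot's pause/unpause events and the core's exit status in order. This is a minimal sketch: the regular expressions are inferred from the FAHClient 7.4.4 log lines quoted above (e.g. `04:11:27:FS01:Paused`), not from any documented log format, and `timeline` and `pauses_before_failure` are hypothetical helper names.

```python
import re

# Patterns inferred from the log lines in this thread (an assumption, not a spec):
#   "04:11:27:FS01:Paused"
#   "04:14:27:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
PAUSE_RE = re.compile(r"^(\d\d:\d\d:\d\d):FS(\d\d):(Paused|Unpaused)$")
RESULT_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:WARNING:)?WU(\d\d):FS(\d\d):FahCore returned: (\w+)"
)


def timeline(lines):
    """Return (time, event) tuples for slot pause/unpause and core exit codes."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        m = PAUSE_RE.match(line)
        if m:
            events.append((m.group(1), m.group(3).upper()))  # PAUSED / UNPAUSED
            continue
        m = RESULT_RE.match(line)
        if m:
            events.append((m.group(1), "WU%s %s" % (m.group(2), m.group(4))))
    return events


def pauses_before_failure(events):
    """Count PAUSED events before the first BAD_WORK_UNIT; None if no failure."""
    count = 0
    for _, event in events:
        if event == "PAUSED":
            count += 1
        elif event.endswith("BAD_WORK_UNIT"):
            return count
    return None
```

Run over the log in the first post, this reports two pauses (04:00:12 and 04:11:27) before WU00 returned BAD_WORK_UNIT. On the machine in this thread, the live log would presumably be `/var/lib/fahclient/log.txt`, combining the client's CWD with `<log v='log.txt'/>` from the config dump.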
Rel25917
 
Posts: 160
Joined: Wed Aug 15, 2012 2:31 am

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby fangfufu » Mon Jul 28, 2014 2:47 pm

Okay, I will be more careful next time. It was late at night, and I just wanted to quickly check something on the laptop before suspending it. It makes too much noise for me to sleep.

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby Joe_H » Mon Jul 28, 2014 3:15 pm

First, please return the logging verbosity to the default of 3. Higher values are rarely useful, and in the present case get in the way of troubleshooting.
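The verbosity change can be made by editing `/etc/fahclient/config.xml` (the config path shown in the log header). Below is a minimal Python sketch, assuming FAHClient is stopped first, since the log shows the client re-saving this file on its own, and assuming Python 3.8+ for `insert_comments`. `with_verbosity` and `fix_config_file` are hypothetical helpers, not part of FAHClient.

```python
import xml.etree.ElementTree as ET


def with_verbosity(config_xml: str, level: int = 3) -> str:
    """Return config_xml with <verbosity v='...'/> set to `level`.

    Comments such as <!-- Logging --> inside <config> are kept (Python 3.8+).
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)  # the <config> element
    node = root.find("verbosity")
    if node is None:
        # No explicit setting yet: add one as a direct child of <config>.
        node = ET.SubElement(root, "verbosity")
    node.set("v", str(level))
    return ET.tostring(root, encoding="unicode")


def fix_config_file(path: str = "/etc/fahclient/config.xml", level: int = 3) -> None:
    """Rewrite the config file in place; run only while FAHClient is stopped."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(with_verbosity(text, level))
```

ElementTree writes attributes with double quotes rather than the single quotes the client uses. Both forms are valid XML.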

As for suspending processing, it appears from the log that you simply put your laptop to sleep. This is known to corrupt GPU processing, because the OS does not save the contents of the GPU's RAM during sleep. CPU processing usually resumes just fine after sleep, because the OS does save and, if necessary, restore the contents of system RAM. Later versions of Core_17 recover from this better by detecting the loss of GPU memory contents and resuming from a prior checkpoint. However, a Linux build of that newer code has not been released yet.

To avoid this, pause the folding client before putting the system to sleep. That stops processing on the WU, and when you resume, the core will restart from the last checkpoint.
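The pause-before-sleep step can be scripted against the client's remote command server. According to the config dump in the first post, it listens on port 36330 and accepts password-less connections from 127.0.0.1. The sketch below assumes the v7 command server accepts plain-text, newline-terminated `pause` and `unpause` commands with an optional slot argument. Those command names, the slot format, and the helper names `build_command` and `send_command` are assumptions to verify against your client version, not an API quoted in this thread.

```python
import socket
from typing import Optional

# Values taken from the config dump in the first post:
#   <command-port v='36330'/> and <command-allow-no-pass v='127.0.0.1'/>
FAH_HOST = "127.0.0.1"
FAH_PORT = 36330


def build_command(action: str, slot: Optional[str] = None) -> bytes:
    """Encode a pause/unpause request as one newline-terminated line.

    ASSUMPTION: the command server accepts 'pause' / 'unpause' with an
    optional slot id; with no slot given, all slots are meant.
    """
    if action not in ("pause", "unpause"):
        raise ValueError("unsupported action: %r" % action)
    line = action if slot is None else "%s %s" % (action, slot)
    return (line + "\n").encode("ascii")


def send_command(action: str, slot: Optional[str] = None,
                 host: str = FAH_HOST, port: int = FAH_PORT,
                 timeout: float = 5.0) -> None:
    """Connect to the local command server and send a single request."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.recv(4096)  # read and ignore any welcome banner
        except socket.timeout:
            pass
        sock.sendall(build_command(action, slot))
```

Calling `send_command("pause")` before suspending and `send_command("unpause")` after resuming mirrors the manual steps above. In the log, the core needed about a second to shut down after a pause (04:00:12 to 04:00:13), so anything that triggers sleep should probably wait briefly after pausing. Where to hook these calls into suspend/resume depends on the distribution and is not covered here.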

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4594
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby fangfufu » Mon Jul 28, 2014 3:57 pm

Joe_H wrote:
To avoid this happening, first pause the folding client before sleeping a system. That will stop processing on the WU, and when resumed the core will start at the last checkpoint.


Okay, I will do this for now. Thanks.

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby fangfufu » Mon Jul 28, 2014 5:31 pm

Okay, people: by the time I wrote the post above, I had already suspended my laptop. I just got home, and the new work unit got corrupted again... As Joe_H said, suspension corrupts GPU work units.

Is there an ETA for the new version of Core_17?

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby Joe_H » Mon Jul 28, 2014 5:47 pm

For Linux, none has been given. My understanding is that the Linux Core_17 currently in release is version 0.46; for Windows systems, the version in full release is 0.52.

Re: BAD_WORK_UNIT, Project: 13000 (Run 1046, Clone 0, Gen 28

Postby 7im » Mon Jul 28, 2014 6:16 pm

fangfufu wrote:
[snip]

Is there an ETA for the new version of Core_17?


Please note that a new core version is not likely to fix the suspension issue. As Joe noted, this is an OS and hardware issue. Folding@home might be able to work around the problem eventually, but it's not high on the priority list.

They also just released Core_18, which is something like an updated Core_17. Make of that what you will.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

