This system runs 24x7 on an i7-4790K CPU, using only four cores, and has seen a few errors of this type. It belongs to a fellow teammate.
The WUs before and after this one ran to completion with no problems.
14:02:00:WU01:FS01:Connecting to assign-GPU.stanford.edu:80
14:02:00:WU01:FS01:News:
14:02:00:WU01:FS01:Assigned to work server 171.64.65.84
14:02:00:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM206 [GeForce GTX 960] from 171.64.65.84
14:02:00:WU01:FS01:Connecting to 171.64.65.84:8080
14:02:01:WU01:FS01:Downloading 3.28MiB
14:02:01:WU01:FS01:Download complete
14:02:02:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9121 run:11 clone:2 gen:161 core:0x18 unit:0x000000c00a3b1e78553ea21d10e428ee
14:02:04:WU01:FS01:Starting
14:02:04:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 01 -suffix 01 -version 703 -lifeline 7777 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
14:02:04:WU01:FS01:Started FahCore on PID 12813
14:02:04:WU01:FS01:Core PID:12817
14:02:04:WU01:FS01:FahCore 0x18 started
14:02:05:WU01:FS01:0x18:*********************** Log Started 2015-10-20T14:02:04Z ***********************
14:02:05:WU01:FS01:0x18:Project: 9121 (Run 11, Clone 2, Gen 161)
14:02:05:WU01:FS01:0x18:Unit: 0x000000c00a3b1e78553ea21d10e428ee
14:02:05:WU01:FS01:0x18:CPU: 0x00000000000000000000000000000000
14:02:05:WU01:FS01:0x18:Machine: 1
14:02:05:WU01:FS01:0x18:Reading tar file state.xml
14:02:05:WU01:FS01:0x18:Reading tar file system.xml
14:02:05:WU01:FS01:0x18:Reading tar file integrator.xml
14:02:05:WU01:FS01:0x18:Reading tar file core.xml
14:02:05:WU01:FS01:0x18:Digital signatures verified
14:02:05:WU01:FS01:0x18:Folding@home GPU core18
14:02:05:WU01:FS01:0x18:Version 0.0.4
14:02:12:WU01:FS01:0x18:Completed 0 out of 2500000 steps (0%)
14:02:12:WU01:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
14:05:01:WU01:FS01:0x18:Completed 25000 out of 2500000 steps (1%)
14:07:49:WU01:FS01:0x18:Completed 50000 out of 2500000 steps (2%)
14:10:36:WU01:FS01:0x18:Completed 75000 out of 2500000 steps (3%)
14:13:23:WU01:FS01:0x18:Completed 100000 out of 2500000 steps (4%)
14:16:14:WU01:FS01:0x18:Completed 125000 out of 2500000 steps (5%)
14:19:01:WU01:FS01:0x18:Completed 150000 out of 2500000 steps (6%)
14:21:49:WU01:FS01:0x18:Completed 175000 out of 2500000 steps (7%)
14:24:36:WU01:FS01:0x18:Completed 200000 out of 2500000 steps (8%)
14:27:27:WU01:FS01:0x18:Completed 225000 out of 2500000 steps (9%)
14:30:14:WU01:FS01:0x18:Completed 250000 out of 2500000 steps (10%)
14:33:02:WU01:FS01:0x18:Completed 275000 out of 2500000 steps (11%)
14:35:49:WU01:FS01:0x18:Completed 300000 out of 2500000 steps (12%)
14:38:39:WU01:FS01:0x18:Completed 325000 out of 2500000 steps (13%)
14:41:26:WU01:FS01:0x18:Completed 350000 out of 2500000 steps (14%)
14:44:14:WU01:FS01:0x18:Completed 375000 out of 2500000 steps (15%)
14:47:01:WU01:FS01:0x18:Completed 400000 out of 2500000 steps (16%)
14:47:03:WU01:FS01:0x18:Bad State detected... attempting to resume from last good checkpoint
14:49:50:WU01:FS01:0x18:Completed 325000 out of 2500000 steps (13%)
14:52:37:WU01:FS01:0x18:Completed 350000 out of 2500000 steps (14%)
14:55:25:WU01:FS01:0x18:Completed 375000 out of 2500000 steps (15%)
14:58:12:WU01:FS01:0x18:Completed 400000 out of 2500000 steps (16%)
14:58:14:WU01:FS01:0x18:Bad State detected... attempting to resume from last good checkpoint
15:01:01:WU01:FS01:0x18:Completed 325000 out of 2500000 steps (13%)
15:03:49:WU01:FS01:0x18:Completed 350000 out of 2500000 steps (14%)
15:06:36:WU01:FS01:0x18:Completed 375000 out of 2500000 steps (15%)
15:09:23:WU01:FS01:0x18:Completed 400000 out of 2500000 steps (16%)
15:12:13:WU01:FS01:0x18:Completed 425000 out of 2500000 steps (17%)
15:15:01:WU01:FS01:0x18:Completed 450000 out of 2500000 steps (18%)
15:17:48:WU01:FS01:0x18:Completed 475000 out of 2500000 steps (19%)
15:20:35:WU01:FS01:0x18:Completed 500000 out of 2500000 steps (20%)
15:23:25:WU01:FS01:0x18:Completed 525000 out of 2500000 steps (21%)
15:26:12:WU01:FS01:0x18:Completed 550000 out of 2500000 steps (22%)
15:29:00:WU01:FS01:0x18:Completed 575000 out of 2500000 steps (23%)
15:31:47:WU01:FS01:0x18:Completed 600000 out of 2500000 steps (24%)
15:31:49:WU01:FS01:0x18:Bad State detected... attempting to resume from last good checkpoint
15:31:49:WU01:FS01:0x18:Max number of retries reached. Aborting.
15:31:49:WU01:FS01:0x18:ERROR:exception: Max Retries Reached
15:31:49:WU01:FS01:0x18:Saving result file logfile_01.txt
15:31:49:WU01:FS01:0x18:Saving result file log.txt
15:31:49:WU01:FS01:0x18:Folding@home Core Shutdown: BAD_WORK_UNIT
15:31:49:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:31:49:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9121 run:11 clone:2 gen:161 core:0x18 unit:0x000000c00a3b1e78553ea21d10e428ee
15:31:49:WU01:FS01:Uploading 2.78KiB to 171.64.65.84
15:31:49:WU01:FS01:Connecting to 171.64.65.84:8080
15:31:50:WU01:FS01:Upload complete
15:31:50:WU01:FS01:Server responded WORK_ACK (400)
15:31:50:WU01:FS01:Cleaning up
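
For anyone comparing several of these failures, here is a rough Python sketch that pulls the "Bad State detected" events out of a pasted log and pairs each one with the last step count reported before it. The function name `summarize_bad_states` and the regular expressions are my own, based only on the line layout visible in the log above, so treat them as assumptions rather than anything the client itself provides.

```python
import re

# Patterns are assumptions derived from the log lines shown above, e.g.
# "14:47:01:WU01:FS01:0x18:Completed 400000 out of 2500000 steps (16%)"
PREFIX = r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
STEP_RE = re.compile(PREFIX + r"Completed (\d+) out of (\d+) steps")
BAD_STATE_RE = re.compile(PREFIX + r"Bad State detected")


def summarize_bad_states(log_text):
    """Return (timestamp, last_reported_step) for each Bad State line.

    last_reported_step is None if no progress line preceded the event.
    """
    events = []
    last_step = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        step_match = STEP_RE.match(line)
        if step_match:
            # Remember the most recent progress report.
            last_step = int(step_match.group(2))
            continue
        bad_match = BAD_STATE_RE.match(line)
        if bad_match:
            events.append((bad_match.group(1), last_step))
    return events


if __name__ == "__main__":
    sample = "\n".join([
        "14:47:01:WU01:FS01:0x18:Completed 400000 out of 2500000 steps (16%)",
        "14:47:03:WU01:FS01:0x18:Bad State detected... attempting to resume from last good checkpoint",
    ])
    print(summarize_bad_states(sample))  # prints [('14:47:03', 400000)]
```

Run against the full log above, this should report three events: two right after step 400000 and one right after step 600000, the last of which is followed by the "Max number of retries reached" abort.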