10467 (Run 0, Clone 135, Gen 276) - Bad State Max Retries

parkut
Posts: 364
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Post by parkut »

I have not had one of these in a while.

Ubuntu 14.04 LTS Linux
Model Name: NVIDIA:3 GK110 [GeForce GTX 780 Ti]
Driver Version: 331.20
Gpu temp: 70C
Client Version: 7.4.4

Bad State - Max Retries Reached - BAD_WORK_UNIT (114 = 0x72)

04:05:52:WU02:FS01:Connecting to 171.67.108.45:80
04:05:53:WU02:FS01:Assigned to work server 140.163.4.233
04:05:53:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GK110 [GeForce GTX 780 Ti] from 140.163.4.233
04:05:53:WU02:FS01:Connecting to 140.163.4.233:8080
04:05:53:WU02:FS01:Downloading 4.28MiB
04:05:59:WU02:FS01:Download 45.27%
04:06:05:WU02:FS01:Download 93.47%
04:06:05:WU02:FS01:Download complete
04:06:05:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:10467 run:0 clone:135 gen:276 core:0x17 unit:0x000001bf538b3db9538bbd479616a096
04:13:17:WU02:FS01:Starting
-lifeline 26145 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
04:13:17:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 02 -suffix 01 -version 704 
04:13:17:WU02:FS01:Started FahCore on PID 28057
04:13:17:WU02:FS01:Core PID:28061
04:13:17:WU02:FS01:FahCore 0x17 started
04:13:17:WU02:FS01:0x17:*********************** Log Started 2016-01-10T04:13:17Z ***********************
04:13:17:WU02:FS01:0x17:Project: 10467 (Run 0, Clone 135, Gen 276)
04:13:17:WU02:FS01:0x17:Unit: 0x000001bf538b3db9538bbd479616a096
04:13:17:WU02:FS01:0x17:CPU: 0x00000000000000000000000000000000
04:13:17:WU02:FS01:0x17:Machine: 1
04:13:17:WU02:FS01:0x17:Reading tar file state.xml
04:13:18:WU02:FS01:0x17:Reading tar file system.xml
04:13:19:WU02:FS01:0x17:Reading tar file integrator.xml
04:13:19:WU02:FS01:0x17:Reading tar file core.xml
04:13:19:WU02:FS01:0x17:Digital signatures verified
04:17:20:WU02:FS01:0x17:Completed 0 out of 5000000 steps (0%)
04:21:44:WU02:FS01:0x17:Completed 50000 out of 5000000 steps (1%)
04:25:58:WU02:FS01:0x17:Completed 100000 out of 5000000 steps (2%)
04:30:27:WU02:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
04:40:11:WU02:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
05:03:47:WU02:FS01:0x17:Completed 250000 out of 5000000 steps (5%)
05:04:05:WU02:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
05:06:12:WU02:FS01:0x17:Completed 150000 out of 5000000 steps (3%)
05:10:26:WU02:FS01:0x17:Completed 200000 out of 5000000 steps (4%)
05:14:39:WU02:FS01:0x17:Completed 250000 out of 5000000 steps (5%)
05:19:08:WU02:FS01:0x17:Completed 300000 out of 5000000 steps (6%)
... snip
07:11:34:WU02:FS01:0x17:Completed 1600000 out of 5000000 steps (32%)
07:20:28:WU02:FS01:0x17:Completed 1650000 out of 5000000 steps (33%)
07:44:04:WU02:FS01:0x17:Completed 1700000 out of 5000000 steps (34%)
08:07:39:WU02:FS01:0x17:Completed 1750000 out of 5000000 steps (35%)
08:07:52:WU02:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
08:09:59:WU02:FS01:0x17:Completed 1650000 out of 5000000 steps (33%)
08:14:13:WU02:FS01:0x17:Completed 1700000 out of 5000000 steps (34%)
08:18:27:WU02:FS01:0x17:Completed 1750000 out of 5000000 steps (35%)
... snip
12:07:53:WU02:FS01:0x17:Completed 4400000 out of 5000000 steps (88%)
12:12:07:WU02:FS01:0x17:Completed 4450000 out of 5000000 steps (89%)
12:27:55:WU02:FS01:0x17:Completed 4500000 out of 5000000 steps (90%)
12:28:08:WU02:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint
12:28:08:WU02:FS01:0x17:Max number of retries reached. Aborting.
12:28:08:WU02:FS01:0x17:ERROR:exception: Max Retries Reached
12:28:08:WU02:FS01:0x17:Saving result file logfile_01.txt
12:28:08:WU02:FS01:0x17:Saving result file badStateCheckpoint_1283209767
12:28:12:WU02:FS01:0x17:Saving result file badStateCheckpoint_476171140
12:28:16:WU02:FS01:0x17:Saving result file badStateCheckpoint_804361509
12:28:21:WU02:FS01:0x17:Saving result file badStateForceGroup0_1283209767Core.xml
12:28:26:WU02:FS01:0x17:Saving result file badStateForceGroup0_1283209767Ref.xml
12:28:32:WU02:FS01:0x17:Saving result file badStateForceGroup0_476171140Core.xml
12:28:38:WU02:FS01:0x17:Saving result file badStateForceGroup0_476171140Ref.xml
12:28:44:WU02:FS01:0x17:Saving result file badStateForceGroup0_804361509Core.xml
12:28:49:WU02:FS01:0x17:Saving result file badStateForceGroup0_804361509Ref.xml
12:28:53:WU02:FS01:0x17:Saving result file badStateForceGroup1_1283209767Core.xml
12:28:58:WU02:FS01:0x17:Saving result file badStateForceGroup1_1283209767Ref.xml
12:29:02:WU02:FS01:0x17:Saving result file badStateForceGroup1_476171140Core.xml
12:29:07:WU02:FS01:0x17:Saving result file badStateForceGroup1_476171140Ref.xml
12:29:11:WU02:FS01:0x17:Saving result file badStateForceGroup1_804361509Core.xml
12:29:15:WU02:FS01:0x17:Saving result file badStateForceGroup1_804361509Ref.xml
12:29:19:WU02:FS01:0x17:Saving result file badStateForceGroup2_1283209767Core.xml
12:29:25:WU02:FS01:0x17:Saving result file badStateForceGroup2_1283209767Ref.xml
12:29:32:WU02:FS01:0x17:Saving result file badStateForceGroup2_476171140Core.xml
12:29:38:WU02:FS01:0x17:Saving result file badStateForceGroup2_476171140Ref.xml
12:29:44:WU02:FS01:0x17:Saving result file badStateForceGroup2_804361509Core.xml
12:29:50:WU02:FS01:0x17:Saving result file badStateForceGroup2_804361509Ref.xml
12:29:56:WU02:FS01:0x17:Saving result file log.txt
12:29:57:WU02:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:29:57:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:29:57:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:10467 run:0 clone:135 gen:276 core:0x17 unit:0x000001bf538b3db9538bbd479616a096
12:29:57:WU02:FS01:Uploading 10.63MiB to 140.163.4.233
12:29:57:WU02:FS01:Connecting to 140.163.4.233:8080
12:30:03:WU02:FS01:Upload 17.64%
12:30:09:WU02:FS01:Upload 36.46%
12:30:15:WU02:FS01:Upload 57.04%
12:30:21:WU02:FS01:Upload 75.27%
12:30:27:WU02:FS01:Upload 96.44%
12:30:31:WU02:FS01:Upload complete
12:30:31:WU02:FS01:Server responded WORK_ACK (400)
12:30:31:WU02:FS01:Cleaning up
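
For anyone who wants to pull the Bad State events out of a long log like the one above, here is a minimal sketch in Python. It assumes the line layout seen in this log (a timestamp, then "Completed N out of M steps" or "Bad State detected"); the function name `bad_state_events` and the `SAMPLE` list are made up for illustration and are not part of FAHClient.

```python
import re

# Patterns based only on the line layout visible in the log above, e.g.
# "05:03:47:WU02:FS01:0x17:Completed 250000 out of 5000000 steps (5%)"
PROGRESS = re.compile(r"^(\d\d:\d\d:\d\d):.*Completed (\d+) out of (\d+) steps")
BAD_STATE = re.compile(r"^(\d\d:\d\d:\d\d):.*Bad State detected")


def bad_state_events(lines):
    """Return a list of (timestamp, last_completed_step) per Bad State line.

    last_completed_step is the most recent "Completed N" value seen before
    the Bad State line, or None if no progress line came first.
    """
    events = []
    last_step = None
    for line in lines:
        progress = PROGRESS.match(line)
        if progress:
            last_step = int(progress.group(2))
            continue
        bad = BAD_STATE.match(line)
        if bad:
            events.append((bad.group(1), last_step))
    return events


# A few lines copied from the log above (hypothetical variable name).
SAMPLE = [
    "04:40:11:WU02:FS01:0x17:Completed 200000 out of 5000000 steps (4%)",
    "05:03:47:WU02:FS01:0x17:Completed 250000 out of 5000000 steps (5%)",
    "05:04:05:WU02:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint",
    "05:06:12:WU02:FS01:0x17:Completed 150000 out of 5000000 steps (3%)",
    "08:07:39:WU02:FS01:0x17:Completed 1750000 out of 5000000 steps (35%)",
    "08:07:52:WU02:FS01:0x17:Bad State detected... attempting to resume from last good checkpoint",
]

events = bad_state_events(SAMPLE)
for timestamp, step in events:
    print(f"Bad State at {timestamp}, last completed step: {step}")
```

To run it against a real log, pass the lines of the file instead of `SAMPLE`, for example `bad_state_events(open(path))`, where `path` is wherever your client writes its log.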

toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 10467 (Run 0, Clone 135, Gen 276) - Bad State Max Retries

Post by toTOW »

This WU has been completed successfully by someone else.

Your drivers are pretty old; they might be the cause of the instabilities you seem to report often.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.