13201 (Run 41, Clone 14, Gen 8) Bad

Moderators: Site Moderators, FAHC Science Team

Adam_Ford
Posts: 18
Joined: Sun Nov 18, 2012 3:09 am

13201 (Run 41, Clone 14, Gen 8) Bad

Post by Adam_Ford »

Code: Select all

06:14:44:WU01:FS01:Connecting to 171.67.108.45:80
06:14:44:WU01:FS01:Assigned to work server 171.67.108.102
06:14:44:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:Tahiti XT [Radeon R9 200/HD 7900/8970] from 171.67.108.102
06:14:44:WU01:FS01:Connecting to 171.67.108.102:8080
06:14:54:WU01:FS01:Downloading 6.88MiB
06:14:55:WU01:FS01:Download complete
06:14:55:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13201 run:41 clone:14 gen:8 core:0x21 unit:0x00000008ab436c66577fedfec6a2a3b9
06:15:15:WU01:FS01:Starting
06:15:15:WU01:FS01:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" C:\Users\Adam\AppData\Roaming\FAHClient\cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 4500 -checkpoint 10 -opencl-platform 0 -gpu-vendor ati -gpu 0
06:15:15:WU01:FS01:Started FahCore on PID 5828
06:15:15:WU01:FS01:Core PID:3508
06:15:15:WU01:FS01:FahCore 0x21 started
06:15:16:WU01:FS01:0x21:*********************** Log Started 2016-11-11T06:15:16Z ***********************
06:15:16:WU01:FS01:0x21:Project: 13201 (Run 41, Clone 14, Gen 8)
06:15:16:WU01:FS01:0x21:Unit: 0x00000008ab436c66577fedfec6a2a3b9
06:15:16:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
06:15:16:WU01:FS01:0x21:Machine: 1
06:15:16:WU01:FS01:0x21:Reading tar file core.xml
06:15:16:WU01:FS01:0x21:Reading tar file integrator.xml
06:15:16:WU01:FS01:0x21:Reading tar file state.xml
06:15:17:WU01:FS01:0x21:Reading tar file system.xml
06:15:19:WU01:FS01:0x21:Digital signatures verified
06:15:19:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
06:15:19:WU01:FS01:0x21:Version 0.0.17
06:18:12:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
06:18:12:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
06:21:54:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
06:24:59:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
06:28:03:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
06:29:46:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:32:51:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
06:35:55:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
06:39:00:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
06:42:04:WU01:FS01:0x21:Completed 80000 out of 2000000 steps (4%)
06:45:08:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
06:46:39:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:49:44:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
06:52:51:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:52:51:WU01:FS01:0x21:ERROR:Max Retries Reached
06:52:51:WU01:FS01:0x21:Saving result file logfile_01.txt
06:52:51:WU01:FS01:0x21:Saving result file badstate-0.xml
06:52:56:WU01:FS01:0x21:Saving result file badstate-1.xml
06:53:01:WU01:FS01:0x21:Saving result file badstate-2.xml
06:53:06:WU01:FS01:0x21:Saving result file log.txt
06:53:07:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:53:08:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:53:08:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13201 run:41 clone:14 gen:8 core:0x21 unit:0x00000008ab436c66577fedfec6a2a3b9
06:53:08:WU01:FS01:Uploading 7.88KiB to 171.67.108.102
06:53:08:WU01:FS01:Connecting to 171.67.108.102:8080
06:53:08:WU01:FS01:Upload complete
06:53:08:WU01:FS01:Server responded WORK_ACK (400)
06:53:08:WU01:FS01:Cleaning up
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by toTOW »

Very likely bad indeed, there are two reports of early failures in the DB ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
chaosdsm
Posts: 36
Joined: Tue Apr 10, 2012 4:32 pm
Hardware configuration: Current System folding on client 7.4.4 on GPU only
ASRock FM2A88X Extreme 6+ motherboard
AMD Athlon X4 860K CPU @ 4GHz
Watercooled EVGA GeForce GTX 970 FTW ACX 2.0 - typically folds @ 1454MHz Boost Core / 1502Mhz Memory
Windows 10 64bit
Samsung 850 EVO 500GB SSD - OS
2TB Western Digital SATA III HDD - storage
16GB (2x8GB) DDR3-1866 G.SKILL Ripjaws Z Series RAM
Corsair Air540 case
Antec EarthWatts 650 Watt PSU

Average folding temps on GPU ~60C (+/- 7C depending on WU being folded)

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by chaosdsm »

FAULTY project:13201 run:16 clone:6 gen:12
Failed 3 times, just seconds after reaching 20% each time.

Code: Select all

05:13:38:WU01:FS01:0x21:Project: 13201 (Run 16, Clone 6, Gen 12)
05:13:38:WU01:FS01:0x21:Unit: 0x00000010ab436c66577fedfea7da5cd8
05:13:38:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
05:13:38:WU01:FS01:0x21:Machine: 1
05:13:38:WU01:FS01:0x21:Reading tar file core.xml
05:13:38:WU01:FS01:0x21:Reading tar file integrator.xml
05:13:38:WU01:FS01:0x21:Reading tar file state.xml
05:13:39:WU01:FS01:0x21:Reading tar file system.xml
05:13:41:WU01:FS01:0x21:Digital signatures verified
05:13:41:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
05:13:41:WU01:FS01:0x21:Version 0.0.17
05:13:44:WU00:FS01:Upload 27.81%
05:13:50:WU00:FS01:Upload 34.64%
05:13:56:WU00:FS01:Upload 40.98%
05:14:02:WU00:FS01:Upload 47.33%
05:14:08:WU00:FS01:Upload 53.67%
05:14:14:WU00:FS01:Upload 60.01%
05:14:20:WU00:FS01:Upload 66.84%
05:14:26:WU00:FS01:Upload 73.18%
05:14:32:WU00:FS01:Upload 79.53%
05:14:38:WU00:FS01:Upload 85.87%
05:14:44:WU00:FS01:Upload 92.70%
05:14:50:WU00:FS01:Upload 99.04%
05:14:53:WU00:FS01:Upload complete
05:14:53:WU00:FS01:Server responded WORK_ACK (400)
05:14:53:WU00:FS01:Final credit estimate, 37842.00 points
05:14:53:WU00:FS01:Cleaning up
05:15:03:WU01:FS01:0x21:Completed 0 out of 2000000 steps (0%)
05:15:03:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
05:17:34:WU01:FS01:0x21:Completed 20000 out of 2000000 steps (1%)
05:20:01:WU01:FS01:0x21:Completed 40000 out of 2000000 steps (2%)
05:22:29:WU01:FS01:0x21:Completed 60000 out of 2000000 steps (3%)
05:24:57:WU01:FS01:0x21:Completed 80000 out of 2000000 steps (4%)
05:27:25:WU01:FS01:0x21:Completed 100000 out of 2000000 steps (5%)
05:29:53:WU01:FS01:0x21:Completed 120000 out of 2000000 steps (6%)
05:32:21:WU01:FS01:0x21:Completed 140000 out of 2000000 steps (7%)
05:34:49:WU01:FS01:0x21:Completed 160000 out of 2000000 steps (8%)
05:37:17:WU01:FS01:0x21:Completed 180000 out of 2000000 steps (9%)
05:39:44:WU01:FS01:0x21:Completed 200000 out of 2000000 steps (10%)
05:42:19:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
05:44:47:WU01:FS01:0x21:Completed 240000 out of 2000000 steps (12%)
05:47:15:WU01:FS01:0x21:Completed 260000 out of 2000000 steps (13%)
05:49:42:WU01:FS01:0x21:Completed 280000 out of 2000000 steps (14%)
05:52:10:WU01:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
05:54:39:WU01:FS01:0x21:Completed 320000 out of 2000000 steps (16%)
05:57:07:WU01:FS01:0x21:Completed 340000 out of 2000000 steps (17%)
05:59:35:WU01:FS01:0x21:Completed 360000 out of 2000000 steps (18%)
06:02:03:WU01:FS01:0x21:Completed 380000 out of 2000000 steps (19%)
06:04:31:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)
06:04:37:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:07:05:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
06:09:33:WU01:FS01:0x21:Completed 240000 out of 2000000 steps (12%)
06:12:01:WU01:FS01:0x21:Completed 260000 out of 2000000 steps (13%)
06:14:29:WU01:FS01:0x21:Completed 280000 out of 2000000 steps (14%)
06:16:57:WU01:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
06:19:25:WU01:FS01:0x21:Completed 320000 out of 2000000 steps (16%)
06:21:53:WU01:FS01:0x21:Completed 340000 out of 2000000 steps (17%)
06:24:21:WU01:FS01:0x21:Completed 360000 out of 2000000 steps (18%)
06:26:49:WU01:FS01:0x21:Completed 380000 out of 2000000 steps (19%)
06:29:17:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)
06:29:23:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:31:51:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)
06:34:18:WU01:FS01:0x21:Completed 240000 out of 2000000 steps (12%)
06:36:46:WU01:FS01:0x21:Completed 260000 out of 2000000 steps (13%)
06:39:15:WU01:FS01:0x21:Completed 280000 out of 2000000 steps (14%)
06:41:43:WU01:FS01:0x21:Completed 300000 out of 2000000 steps (15%)
06:44:11:WU01:FS01:0x21:Completed 320000 out of 2000000 steps (16%)
06:46:39:WU01:FS01:0x21:Completed 340000 out of 2000000 steps (17%)
06:49:07:WU01:FS01:0x21:Completed 360000 out of 2000000 steps (18%)
06:51:35:WU01:FS01:0x21:Completed 380000 out of 2000000 steps (19%)
06:54:03:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)
06:54:09:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:54:09:WU01:FS01:0x21:ERROR:Max Retries Reached
06:54:09:WU01:FS01:0x21:Saving result file logfile_01.txt
06:54:09:WU01:FS01:0x21:Saving result file badstate-0.xml
06:54:11:WU01:FS01:0x21:Saving result file badstate-1.xml
06:54:13:WU01:FS01:0x21:Saving result file badstate-2.xml
06:54:15:WU01:FS01:0x21:Saving result file log.txt
06:54:15:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:54:16:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:54:16:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13201 run:16 clone:6 gen:12
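To check whether the failures really land at the same spot, the log can be scanned for the last reported percentage before each "Bad State detected" line. This is only a rough sketch: the function name `bad_state_points` and the sample list are made up for illustration, and the regular expression assumes the progress-line format seen in the logs in this thread.

```python
import re

# Matches progress lines such as "...Completed 400000 out of 2000000 steps (20%)".
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps \((\d+)%\)")
BAD_STATE = "Bad State detected"


def bad_state_points(log_lines):
    """Return the last reported percent before each 'Bad State detected' line."""
    last_percent = None
    points = []
    for line in log_lines:
        match = PROGRESS.search(line)
        if match:
            last_percent = int(match.group(3))
        elif BAD_STATE in line:
            points.append(last_percent)
    return points


# A few lines copied from the log above (hypothetical variable name).
sample = [
    "06:04:31:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)",
    "06:04:37:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?",
    "06:07:05:WU01:FS01:0x21:Completed 220000 out of 2000000 steps (11%)",
    "06:29:17:WU01:FS01:0x21:Completed 400000 out of 2000000 steps (20%)",
    "06:29:23:WU01:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?",
]

print(bad_state_points(sample))
```

Run against a full log.txt (for example `bad_state_points(open("log.txt"))`), the same function would show whether the failure point moves between retries.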
Folding rig: EVGA Z370 Classified K w/i7-8700 & Hyper 212 EVO - WIN7 PRO 64bit - EVGA 1660 Ti XC Gaming (soon to be water cooled) - Corsair Vengeance 16GB DDR4-2666 dual channel memory - Samsung 970 Pro 512GB M.2 SSD - EVGA SuperNova 850 Platinum PSU
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by Joe_H »

Not a bad WU, it has been completed by another folder:
Hi ******** (team ******),
Your WU (P13201 R16 C6 G12) was added to the stats database on 2016-11-16 02:08:33 for 74881.7 points of credit.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
chaosdsm
Posts: 36
Joined: Tue Apr 10, 2012 4:32 pm

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by chaosdsm »

Joe_H wrote:Not a bad WU, it has been completed by another folder:
Hi ******** (team ******),
Your WU (P13201 R16 C6 G12) was added to the stats database on 2016-11-16 02:08:33 for 74881.7 points of credit.
Interesting that it failed 3 times at virtually the same spot. Could it be a driver issue? I read that this core has issues with newer drivers (375.xx), but I'm using driver version 372.70.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by bruce »

I doubt it's a driver issue, but I can't rule it out.

All FAH projects periodically run a "sanity check" designed to terminate projects containing errors (the idea being that if the simulation is going to explode soon or be discarded anyway, it's a good idea to abort the run before it wastes any more of your time). What "periodically" means varies depending on the project, but 20% sounds like a plausible interval for the sanity check.

Since the work unit was completed by somebody else, that does suggest that your hardware made an error that was detected at 20%. If that's true, the first places to look are (A) drivers, (B) overclocking and (C) overheating.

Some errors are recoverable and are worth a retry. When FAH suspects that a retry is worth it, it will retry 3 times before giving up.
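The retry behaviour described above can be sketched roughly as follows. This is not the actual FahCore code: `run_with_retries` and `run_segment` are hypothetical names, and the limit of 3 is taken from the logs in this thread, where "Max Retries Reached" appears on the third bad state.

```python
MAX_RETRIES = 3  # assumed from the logs: the third bad state ends the run


def run_with_retries(run_segment, max_retries=MAX_RETRIES):
    """Call run_segment() until it succeeds or has failed max_retries times.

    run_segment is a hypothetical callable that resumes from the last good
    checkpoint and returns True on success or False on a bad state.
    """
    failures = 0
    while failures < max_retries:
        if run_segment():
            return "FINISHED_UNIT"
        failures += 1
    return "BAD_WORK_UNIT"


# A work unit that hits a bad state on every attempt.
print(run_with_retries(lambda: False))

# A transient error: one bad state, then the resumed run succeeds.
attempts = iter([False, True])
print(run_with_retries(lambda: next(attempts)))
```

A one-off hardware glitch would look like the second case, while a failure that repeats at the same point on every attempt looks like the first.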
chaosdsm
Posts: 36
Joined: Tue Apr 10, 2012 4:32 pm

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by chaosdsm »

bruce wrote:I doubt it's a driver issue, but I can't rule it out.

All FAH projects periodically run a "sanity check" designed to terminate projects containing errors (the idea being that if the simulation is going to explode soon or be discarded anyway, it's a good idea to abort the run before it wastes any more of your time). What "periodically" means varies depending on the project, but 20% sounds like a plausible interval for the sanity check.

Since the work unit was completed by somebody else, that does suggest that your hardware made an error that was detected at 20%. If that's true, the first places to look are (A) drivers, (B) overclocking and (C) overheating.

Some errors are recoverable and are worth a retry. When FAH suspects that a retry is worth it, it will retry 3 times before giving up.
Temps are awesome running core 21 units: the highest recorded temp on any core 21 WU is 52C, while the GPU reaches up to 68C on work units with other cores. So far, this one WU is the only one to fail with this GPU. There may be more failures showing on the Stanford side, but those were due to me testing overclocks on this card far past factory speed. Boost clock when folding core 21 work units is 1215MHz. Yes, this is technically overclocked vs. reference speed, but on all other cores this card runs at a 1390MHz - 1454MHz boost clock.

Seems like core 21 may still have issues... the core itself seems to have crashed twice; each time it restarted and finished the WU it was working on before the crash:

Code: Select all

00:31:31:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
00:31:31:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
00:31:31:WU01:FS01:Starting
00:31:31:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/chaosdsm/Documents/Folding/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 8528 -checkpoint 7 -gpu 0 -gpu-vendor nvidia
00:31:31:WU01:FS01:Started FahCore on PID 5420
00:31:31:WU01:FS01:Core PID:6564
00:31:31:WU01:FS01:FahCore 0x21 started
00:31:32:WU01:FS01:0x21:*********************** Log Started 2016-11-22T00:31:31Z ***********************
00:31:32:WU01:FS01:0x21:Project: 11708 (Run 0, Clone 194, Gen 8)
00:31:32:WU01:FS01:0x21:Unit: 0x000000118ca304e75814df28f1225fe3
00:31:32:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
00:31:32:WU01:FS01:0x21:Machine: 1
00:31:32:WU01:FS01:0x21:Digital signatures verified
00:31:32:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
00:31:32:WU01:FS01:0x21:Version 0.0.17
00:31:32:WU01:FS01:0x21:  Found a checkpoint file
00:31:40:WU01:FS01:0x21:Completed 6250000 out of 7500000 steps (83%)
00:31:40:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:34:29:WU01:FS01:0x21:Completed 6300000 out of 7500000 steps (84%)
00:38:39:WU01:FS01:0x21:Completed 6375000 out of 7500000 steps (85%)
00:42:50:WU01:FS01:0x21:Completed 6450000 out of 7500000 steps (86%)
00:47:02:WU01:FS01:0x21:Completed 6525000 out of 7500000 steps (87%)
00:51:13:WU01:FS01:0x21:Completed 6600000 out of 7500000 steps (88%)
00:55:23:WU01:FS01:0x21:Completed 6675000 out of 7500000 steps (89%)
00:59:31:WU01:FS01:0x21:Completed 6750000 out of 7500000 steps (90%)
01:03:32:WU01:FS01:0x21:Completed 6825000 out of 7500000 steps (91%)
01:07:36:WU01:FS01:0x21:Completed 6900000 out of 7500000 steps (92%)
01:11:41:WU01:FS01:0x21:Completed 6975000 out of 7500000 steps (93%)
01:15:45:WU01:FS01:0x21:Completed 7050000 out of 7500000 steps (94%)
01:19:46:WU01:FS01:0x21:Completed 7125000 out of 7500000 steps (95%)
01:23:50:WU01:FS01:0x21:Completed 7200000 out of 7500000 steps (96%)
01:27:54:WU01:FS01:0x21:Completed 7275000 out of 7500000 steps (97%)
01:31:56:WU01:FS01:0x21:Completed 7350000 out of 7500000 steps (98%)
01:35:57:WU01:FS01:0x21:Completed 7425000 out of 7500000 steps (99%)
01:40:00:WU01:FS01:0x21:Completed 7500000 out of 7500000 steps (100%)
01:40:01:WU00:FS01:Connecting to 171.67.108.45:80
01:40:01:WU00:FS01:Assigned to work server 171.67.108.159
01:40:01:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.67.108.159
01:40:01:WU00:FS01:Connecting to 171.67.108.159:8080
01:40:03:WU01:FS01:0x21:Saving result file logfile_01.txt
01:40:03:WU01:FS01:0x21:Saving result file checkpointState.xml
01:40:03:WU00:FS01:Downloading 22.85MiB
01:40:03:WU01:FS01:0x21:Saving result file checkpt.crc
01:40:03:WU01:FS01:0x21:Saving result file log.txt
01:40:03:WU01:FS01:0x21:Saving result file positions.xtc
01:40:03:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
01:40:04:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
01:40:04:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11708 run:0 clone:194 gen:8 core:0x21

Code: Select all

22:01:53:WU01:FS01:0x21:Completed 4500000 out of 7500000 steps (60%)
22:02:22:WARNING:WU01:FS01:FahCore returned an unknown error code which probably indicates that it crashed
22:02:22:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
22:02:22:WU01:FS01:Starting
22:02:22:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/chaosdsm/Documents/Folding/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 8528 -checkpoint 7 -gpu 0 -gpu-vendor nvidia
22:02:22:WU01:FS01:Started FahCore on PID 9000
22:02:22:WU01:FS01:Core PID:2208
22:02:22:WU01:FS01:FahCore 0x21 started
22:02:23:WU01:FS01:0x21:*********************** Log Started 2016-11-22T22:02:23Z ***********************
22:02:23:WU01:FS01:0x21:Project: 11709 (Run 3, Clone 31, Gen 48)
22:02:23:WU01:FS01:0x21:Unit: 0x000000458ca304f357f594b8cb203d05
22:02:23:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
22:02:23:WU01:FS01:0x21:Machine: 1
22:02:23:WU01:FS01:0x21:Digital signatures verified
22:02:23:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
22:02:23:WU01:FS01:0x21:Version 0.0.17
22:02:23:WU01:FS01:0x21:  Found a checkpoint file
22:02:31:WU01:FS01:0x21:Completed 4500000 out of 7500000 steps (60%)
22:02:31:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
22:06:51:WU01:FS01:0x21:Completed 4575000 out of 7500000 steps (61%)
22:11:13:WU01:FS01:0x21:Completed 4650000 out of 7500000 steps (62%)
22:15:34:WU01:FS01:0x21:Completed 4725000 out of 7500000 steps (63%)
22:19:57:WU01:FS01:0x21:Completed 4800000 out of 7500000 steps (64%)
22:24:21:WU01:FS01:0x21:Completed 4875000 out of 7500000 steps (65%)
22:28:46:WU01:FS01:0x21:Completed 4950000 out of 7500000 steps (66%)
22:33:11:WU01:FS01:0x21:Completed 5025000 out of 7500000 steps (67%)
22:37:36:WU01:FS01:0x21:Completed 5100000 out of 7500000 steps (68%)
******************************* Date: 2016-11-22 *******************************
22:41:56:WU01:FS01:0x21:Completed 5175000 out of 7500000 steps (69%)
22:46:18:WU01:FS01:0x21:Completed 5250000 out of 7500000 steps (70%)
22:50:45:WU01:FS01:0x21:Completed 5325000 out of 7500000 steps (71%)
22:55:11:WU01:FS01:0x21:Completed 5400000 out of 7500000 steps (72%)
22:59:36:WU01:FS01:0x21:Completed 5475000 out of 7500000 steps (73%)
23:04:05:WU01:FS01:0x21:Completed 5550000 out of 7500000 steps (74%)
23:08:30:WU01:FS01:0x21:Completed 5625000 out of 7500000 steps (75%)
23:12:54:WU01:FS01:0x21:Completed 5700000 out of 7500000 steps (76%)
23:17:19:WU01:FS01:0x21:Completed 5775000 out of 7500000 steps (77%)
23:21:39:WU01:FS01:0x21:Completed 5850000 out of 7500000 steps (78%)
23:26:05:WU01:FS01:0x21:Completed 5925000 out of 7500000 steps (79%)
23:30:27:WU01:FS01:0x21:Completed 6000000 out of 7500000 steps (80%)
23:34:52:WU01:FS01:0x21:Completed 6075000 out of 7500000 steps (81%)
23:39:14:WU01:FS01:0x21:Completed 6150000 out of 7500000 steps (82%)
23:43:33:WU01:FS01:0x21:Completed 6225000 out of 7500000 steps (83%)
23:47:46:WU01:FS01:0x21:Completed 6300000 out of 7500000 steps (84%)
23:51:59:WU01:FS01:0x21:Completed 6375000 out of 7500000 steps (85%)
23:56:16:WU01:FS01:0x21:Completed 6450000 out of 7500000 steps (86%)
00:00:40:WU01:FS01:0x21:Completed 6525000 out of 7500000 steps (87%)
00:04:58:WU01:FS01:0x21:Completed 6600000 out of 7500000 steps (88%)
00:09:16:WU01:FS01:0x21:Completed 6675000 out of 7500000 steps (89%)
00:13:35:WU01:FS01:0x21:Completed 6750000 out of 7500000 steps (90%)
00:17:56:WU01:FS01:0x21:Completed 6825000 out of 7500000 steps (91%)
00:22:14:WU01:FS01:0x21:Completed 6900000 out of 7500000 steps (92%)
00:26:32:WU01:FS01:0x21:Completed 6975000 out of 7500000 steps (93%)
00:30:52:WU01:FS01:0x21:Completed 7050000 out of 7500000 steps (94%)
00:35:12:WU01:FS01:0x21:Completed 7125000 out of 7500000 steps (95%)
00:39:30:WU01:FS01:0x21:Completed 7200000 out of 7500000 steps (96%)
00:43:51:WU01:FS01:0x21:Completed 7275000 out of 7500000 steps (97%)
00:48:08:WU01:FS01:0x21:Completed 7350000 out of 7500000 steps (98%)
00:52:26:WU01:FS01:0x21:Completed 7425000 out of 7500000 steps (99%)
00:56:43:WU01:FS01:0x21:Completed 7500000 out of 7500000 steps (100%)
00:56:44:WU00:FS01:Connecting to 171.67.108.45:80
00:56:45:WU00:FS01:Assigned to work server 171.64.65.84
00:56:45:WU00:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.64.65.84
00:56:45:WU00:FS01:Connecting to 171.64.65.84:8080
00:56:46:WU00:FS01:Downloading 2.99MiB
00:56:46:WU01:FS01:0x21:Saving result file logfile_01.txt
00:56:46:WU01:FS01:0x21:Saving result file checkpointState.xml
00:56:49:WU01:FS01:0x21:Saving result file checkpt.crc
00:56:49:WU01:FS01:0x21:Saving result file log.txt
00:56:49:WU01:FS01:0x21:Saving result file positions.xtc
00:56:51:WU01:FS01:0x21:Folding@home Core Shutdown: FINISHED_UNIT
00:56:52:WU00:FS01:Download 68.91%
00:56:52:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
00:56:52:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11709 run:3 clone:31 gen:48 core:0x21
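For anyone tallying these results across a long log, the "FahCore returned" lines can be pulled out with a small script. This is an illustrative sketch only: `parse_return` is a made-up helper, and the pattern assumes the `NAME (decimal = 0xhex)` layout seen in the logs in this thread.

```python
import re

# Matches e.g. "FahCore returned: BAD_WORK_UNIT (114 = 0x72)".
RETURNED = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x([0-9a-fA-F]+)\)")


def parse_return(line):
    """Return (name, code) from a 'FahCore returned:' line, or None."""
    match = RETURNED.search(line)
    if match is None:
        return None
    name = match.group(1)
    decimal = int(match.group(2))
    from_hex = int(match.group(3), 16)
    # The log prints the same code twice; check that both forms agree.
    if decimal != from_hex:
        raise ValueError(f"mismatched codes in line: {line!r}")
    return name, decimal


lines = [
    "06:54:16:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "22:02:22:WARNING:WU01:FS01:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)",
    "00:56:52:WU01:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)",
]

for line in lines:
    print(parse_return(line))
```

In the logs above, UNKNOWN_ENUM (127) is followed by a restart from a checkpoint and a FINISHED_UNIT (100), whereas BAD_WORK_UNIT (114) ends the work unit.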
Plus core 21 only shows about 50 - 60% TDP utilization vs 90 to 100% for other cores.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13201 (Run 41, Clone 14, Gen 8) Bad

Post by bruce »

I really doubt that "Core_21 has some issues".

Both software and computers are supposed to be extremely consistent, producing identical results if you re-run the same program with the same data. Driver problems can also produce unexpected results, but the drivers are very good at detecting internal problems and issuing error messages. Nevertheless, when you overclock hardware, let it get too hot, or otherwise run it outside its intended set of conditions, it will produce errors.

Core_21 does a validity check periodically to see if there have been calculation errors. I suspect that Project: 13201 only runs that validity check every 20% so any error during the first 20% might not be detected until it reaches that point. Unfortunately there's no indication if all three failures were the same ... only that after retrying 3x, the likelihood of a successful completion on the fourth try is pretty slim.
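To illustrate why an early error might only surface at 20%, here is a small sketch of a periodic check. The 20% interval is only the guess made above, not a documented Core_21 setting, and `detection_step` is a hypothetical helper; the total of 2000000 steps comes from the Project 13201 logs.

```python
def detection_step(error_step, check_interval):
    """Return the step of the first periodic check at or after error_step."""
    # Integer ceiling division, then scale back up to a step count.
    checks_needed = -(-error_step // check_interval)
    return checks_needed * check_interval


TOTAL_STEPS = 2_000_000            # from the Project 13201 logs
CHECK_INTERVAL = TOTAL_STEPS // 5  # assumed: one validity check every 20%

for error_step in (30_000, 250_000, 400_001):
    found_at = detection_step(error_step, CHECK_INTERVAL)
    percent = 100 * found_at // TOTAL_STEPS
    print(f"error at step {error_step} -> detected at step {found_at} ({percent}%)")
```

Under that assumption, any error in the first 400000 steps would be reported at the 20% mark, which would match three failures at the same displayed percentage without telling us whether the underlying error was the same each time.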