p14201 77 0 29 dumped after a few interruptions


Post by Knish » Wed Jun 17, 2020 12:14 pm

This is the first time I have seen an error on project 14201.

System information:
Code:
*********************** Log Started 2020-06-16T21:15:45Z ***********************
21:15:45:Trying to access database...
21:15:45:Successfully acquired database lock
21:15:46:Read GPUs.txt
21:15:48:Enabled folding slot 01: READY gpu:0:GP100GL [Tesla P100 16GB] 9526
21:15:48:****************************** FAHClient ******************************
21:15:48:        Version: 7.6.13
21:15:48:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:15:48:      Copyright: 2020 foldingathome.org
21:15:48:       Homepage: https://foldingathome.org/
21:15:48:           Date: Apr 28 2020
21:15:48:           Time: 04:20:16
21:15:48:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
21:15:48:         Branch: master
21:15:48:       Compiler: GNU 8.3.0
21:15:48:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
21:15:48:                 -funroll-loops -fno-pie
21:15:48:       Platform: linux2 4.19.0-5-amd64
21:15:48:           Bits: 64
21:15:48:           Mode: Release
21:15:48:           Args: --child /etc/fahclient/config.xml --run-as fahclient
21:15:48:                 --pid-file=/var/run/fahclient.pid --daemon
21:15:48:         Config: /etc/fahclient/config.xml
21:15:48:******************************** CBang ********************************
21:15:48:           Date: Apr 25 2020
21:15:48:           Time: 00:07:53
21:15:48:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
21:15:48:         Branch: master
21:15:48:       Compiler: GNU 8.3.0
21:15:48:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
21:15:48:                 -funroll-loops -fno-pie -fPIC
21:15:48:       Platform: linux2 4.19.0-5-amd64
21:15:48:           Bits: 64
21:15:48:           Mode: Release
21:15:48:******************************* System ********************************
21:15:48:            CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
21:15:48:         CPU ID: GenuineIntel Family 6 Model 79 Stepping 0
21:15:48:           CPUs: 1
21:15:48:         Memory: 2.44GiB
21:15:48:    Free Memory: 2.19GiB
21:15:48:        Threads: POSIX_THREADS
21:15:48:     OS Version: 4.19
21:15:48:    Has Battery: false
21:15:48:     On Battery: false
21:15:48:     UTC Offset: 0
21:15:48:            PID: 449
21:15:48:            CWD: /var/lib/fahclient
21:15:48:             OS: Linux 4.19.0-9-cloud-amd64 x86_64
21:15:48:        OS Arch: AMD64
21:15:48:           GPUs: 1
21:15:48:          GPU 0: Bus:0 Slot:4 Func:0 NVIDIA:5 GP100GL [Tesla P100 16GB] 9526
21:15:48:  CUDA Device 0: Platform:0 Device:0 Bus:0 Slot:4 Compute:6.0 Driver:10.0
21:15:48:OpenCL Device 0: Platform:0 Device:0 Bus:0 Slot:4 Compute:1.2 Driver:410.104
21:15:48:******************************* libFAH ********************************
21:15:48:           Date: Apr 15 2020
21:15:48:           Time: 21:43:24
21:15:48:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
21:15:48:         Branch: master
21:15:48:       Compiler: GNU 8.3.0
21:15:48:        Options: -std=c++11 -ffunction-sections -fdata-sections -O3
21:15:48:                 -funroll-loops -fno-pie
21:15:48:       Platform: linux2 4.19.0-5-amd64
21:15:48:           Bits: 64
21:15:48:           Mode: Release
21:15:48:***********************************************************************
21:15:48:<config>
21:15:48:  <!-- Client Control -->
21:15:48:  <fold-anon v='true'/>
21:15:48:
21:15:48:  <!-- HTTP Server -->
21:15:48:  <allow v='127.0.0.1 ******'/>
21:15:48:
21:15:48:  <!-- Logging -->
21:15:48:  <log-rotate-max v='99'/>
21:15:48:
21:15:48:  <!-- Network -->
21:15:48:  <proxy v=':8080'/>
21:15:48:
21:15:48:  <!-- Remote Command Server -->
21:15:48:  <password v='*****'/>
21:15:48:
21:15:48:  <!-- Slot Control -->
21:15:48:  <power v='full'/>
21:15:48:
21:15:48:  <!-- User Information -->
21:15:48:  <passkey v='*****'/>
21:15:48:  <team v='*'/>
21:15:48:  <user v='*******'/>
21:15:48:
21:15:48:  <!-- Folding Slots -->
21:15:48:  <slot id='1' type='GPU'/>
21:15:48:</config>
21:15:48:WU01:FS01:Starting
21:15:48:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
21:15:48:WU01:FS01:Started FahCore on PID 519
21:15:48:WU01:FS01:Core PID:523
21:15:48:WU01:FS01:FahCore 0x22 started


Progress and errors:
Code:
08:53:27:WU00:FS01:Starting
08:53:27:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
08:53:27:WU00:FS01:Started FahCore on PID 2137
08:53:27:WU00:FS01:Core PID:2141
08:53:27:WU00:FS01:FahCore 0x22 started
08:53:28:WU00:FS01:0x22:*********************** Log Started 2020-06-17T08:53:27Z ***********************

08:53:28:WU00:FS01:0x22:        PID: 2141
08:53:28:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
08:53:28:WU00:FS01:0x22:         OS: Linux 4.19.0-9-cloud-amd64 x86_64
08:53:28:WU00:FS01:0x22:    OS Arch: AMD64
08:53:28:WU00:FS01:0x22:********************************************************************************
08:53:28:WU00:FS01:0x22:Project: 14201 (Run 77, Clone 0, Gen 29)
08:53:28:WU00:FS01:0x22:Unit: 0x0000002bcedfaa925eb99c63fee9f330
08:53:28:WU00:FS01:0x22:Reading tar file core.xml
08:53:28:WU00:FS01:0x22:Reading tar file integrator.xml
08:53:28:WU00:FS01:0x22:Reading tar file state.xml
08:53:29:WU00:FS01:0x22:Reading tar file system.xml
08:53:30:WU00:FS01:0x22:Digital signatures verified
08:53:30:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
08:53:30:WU00:FS01:0x22:Version 0.0.5

08:54:10:WU00:FS01:0x22:Completed 0 out of 500000 steps (0%)
08:54:10:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:55:49:WU00:FS01:0x22:Completed 5000 out of 500000 steps (1%)
08:57:25:WU00:FS01:0x22:Completed 10000 out of 500000 steps (2%)
08:59:00:WU00:FS01:0x22:Completed 15000 out of 500000 steps (3%)

09:26:52:WU00:FS01:0x22:Completed 100000 out of 500000 steps (20%)
09:27:17:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
09:27:17:WU00:FS01:Starting
09:27:17:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
09:27:17:WU00:FS01:Started FahCore on PID 2214
09:27:17:WU00:FS01:Core PID:2218
09:27:17:WU00:FS01:FahCore 0x22 started
09:27:18:WU00:FS01:0x22:*********************** Log Started 2020-06-17T09:27:17Z ***********************
09:27:18:WU00:FS01:0x22:         OS: Linux 4.19.0-9-cloud-amd64 x86_64
09:27:18:WU00:FS01:0x22:    OS Arch: AMD64
09:27:18:WU00:FS01:0x22:********************************************************************************
09:27:18:WU00:FS01:0x22:Project: 14201 (Run 77, Clone 0, Gen 29)
09:27:18:WU00:FS01:0x22:Unit: 0x0000002bcedfaa925eb99c63fee9f330
09:27:18:WU00:FS01:0x22:Digital signatures verified
09:27:18:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:27:18:WU00:FS01:0x22:Version 0.0.5
09:27:18:WU00:FS01:0x22:  Found a checkpoint file
09:27:59:WU00:FS01:0x22:Completed 100000 out of 500000 steps (20%)
09:28:00:WU00:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:29:38:WU00:FS01:0x22:Completed 105000 out of 500000 steps (21%)

09:47:43:WU00:FS01:0x22:Completed 160000 out of 500000 steps (32%)
09:47:48:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
09:47:49:WU00:FS01:Starting
09:47:49:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
09:47:49:WU00:FS01:Started FahCore on PID 2283
09:47:49:WU00:FS01:Core PID:2287
09:47:49:WU00:FS01:FahCore 0x22 started
09:47:50:WU00:FS01:0x22:*********************** Log Started 2020-06-17T09:47:49Z ***********************
09:47:50:WU00:FS01:0x22:        CWD: /var/lib/fahclient/work
09:47:50:WU00:FS01:0x22:         OS: Linux 4.19.0-9-cloud-amd64 x86_64
09:47:50:WU00:FS01:0x22:    OS Arch: AMD64
09:47:50:WU00:FS01:0x22:********************************************************************************
09:47:50:WU00:FS01:0x22:Project: 14201 (Run 77, Clone 0, Gen 29)
09:47:50:WU00:FS01:0x22:Unit: 0x0000002bcedfaa925eb99c63fee9f330
09:47:50:WU00:FS01:0x22:Digital signatures verified
09:47:50:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
09:47:50:WU00:FS01:0x22:Version 0.0.5
09:47:50:WU00:FS01:0x22:  Found a checkpoint file
09:48:32:WU00:FS01:0x22:Completed 140000 out of 500000 steps (28%)

10:01:39:WU00:FS01:0x22:Completed 180000 out of 500000 steps (36%)
10:01:45:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
10:01:45:WU00:FS01:Starting
10:01:45:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
10:01:45:WU00:FS01:Started FahCore on PID 2301
10:01:45:WU00:FS01:Core PID:2305
10:01:45:WU00:FS01:FahCore 0x22 started
10:01:46:WU00:FS01:0x22:*********************** Log Started 2020-06-17T10:01:46Z ***********************
10:01:46:WU00:FS01:0x22:         OS: Linux 4.19.0-9-cloud-amd64 x86_64
10:01:46:WU00:FS01:0x22:    OS Arch: AMD64
10:01:46:WU00:FS01:0x22:********************************************************************************
10:01:46:WU00:FS01:0x22:Project: 14201 (Run 77, Clone 0, Gen 29)
10:01:46:WU00:FS01:0x22:Unit: 0x0000002bcedfaa925eb99c63fee9f330
10:01:46:WU00:FS01:0x22:Digital signatures verified
10:01:46:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:01:46:WU00:FS01:0x22:Version 0.0.5
10:01:47:WU00:FS01:0x22:  Found a checkpoint file
10:02:30:WU00:FS01:0x22:Completed 160000 out of 500000 steps (32%)

10:08:56:WU00:FS01:0x22:Completed 180000 out of 500000 steps (36%)
10:09:08:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
10:09:08:WU00:FS01:Starting
10:09:08:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 449 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
10:09:08:WU00:FS01:Started FahCore on PID 2319
10:09:08:WU00:FS01:Core PID:2323
10:09:08:WU00:FS01:FahCore 0x22 started
10:09:09:WU00:FS01:0x22:*********************** Log Started 2020-06-17T10:09:09Z ***********************
10:09:09:WU00:FS01:0x22:         OS: Linux 4.19.0-9-cloud-amd64 x86_64
10:09:09:WU00:FS01:0x22:    OS Arch: AMD64
10:09:09:WU00:FS01:0x22:********************************************************************************
10:09:09:WU00:FS01:0x22:Project: 14201 (Run 77, Clone 0, Gen 29)
10:09:09:WU00:FS01:0x22:Unit: 0x0000002bcedfaa925eb99c63fee9f330
10:09:09:WU00:FS01:0x22:Digital signatures verified
10:09:09:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:09:09:WU00:FS01:0x22:Version 0.0.5
10:09:10:WU00:FS01:0x22:  Found a checkpoint file
10:09:23:WARNING:WU00:FS01:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
10:09:23:WARNING:WU00:FS01:Fatal error, dumping
10:09:23:WU00:FS01:Sending unit results: id:00 state:SEND error:DUMPED project:14201 run:77 clone:0 gen:29 core:0x22 unit:0x0000002bcedfaa925eb99c63fee9f330
10:09:23:WU00:FS01:Connecting to 206.223.170.146:8080
10:09:24:WU00:FS01:Server responded WORK_ACK (400)
10:09:24:WU00:FS01:Cleaning up
10:09:24:WU01:FS01:Connecting to assign1.foldingathome.org:80

Re: p14201 77 0 29 dumped after a few interruptions

Post by Joe_H » Wed Jun 17, 2020 4:18 pm

The WU has since been completed by someone else, so it does not appear to be a bad one. The restarts went fine until the last one, which hit this error while reading the checkpoint file:
Code:
10:09:23:WARNING:WU00:FS01:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)

It appears the checkpoint file was corrupted somehow.
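To see the restart pattern at a glance, the log can be reduced to one entry per core exit. The following is only a rough sketch: `summarize_restarts` and `SAMPLE_LOG` are names made up for this example, and the two regular expressions assume the log line formats quoted in this thread.

```python
import re

# Patterns assume the FAHClient log line formats quoted in this thread.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
EXIT_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")

# A few lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
09:47:43:WU00:FS01:0x22:Completed 160000 out of 500000 steps (32%)
09:47:48:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
09:48:32:WU00:FS01:0x22:Completed 140000 out of 500000 steps (28%)
10:08:56:WU00:FS01:0x22:Completed 180000 out of 500000 steps (36%)
10:09:08:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
10:09:23:WARNING:WU00:FS01:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
"""


def summarize_restarts(log_text):
    """Return (exit_status, last_step_before_exit, first_step_after_restart).

    The third field is None when no progress line follows the exit,
    i.e. the core never got going again after that exit.
    """
    events = []
    last_step = None  # most recent step count seen before an exit
    pending = None    # exit still waiting for the first step line after it
    for line in log_text.splitlines():
        step = STEP_RE.search(line)
        if step:
            current = int(step.group(1))
            if pending is not None:
                events.append((pending[0], pending[1], current))
                pending = None
            last_step = current
            continue
        exit_match = EXIT_RE.search(line)
        if exit_match:
            if pending is not None:
                # Previous exit was never followed by a progress line.
                events.append((pending[0], pending[1], None))
            pending = (exit_match.group(1), last_step)
    if pending is not None:
        events.append((pending[0], pending[1], None))
    return events


if __name__ == "__main__":
    for status, before, after in summarize_restarts(SAMPLE_LOG):
        print(status, before, after)
```

Run against the full log, this makes it easy to spot that the interruption logged at step 160000 resumed from 140000, and that the final restart never reported progress before BAD_FRAME_CHECKSUM.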

Re: p14201 77 0 29 dumped after a few interruptions

Post by bruce » Wed Jun 17, 2020 4:24 pm

Is your GPU overheating? I would check its fans and cooling, and if nothing turns up, a little underclocking might help. The WU, Project: 14201 (Run 77, Clone 0, Gen 29), has been successfully completed by someone else.
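One way to check for overheating on an NVIDIA card is to sample temperature and clock with `nvidia-smi` while a WU is running. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names are invented for this example, and only the `--query-gpu` fields `temperature.gpu` and `clocks.sm` come from the tool itself.

```python
import subprocess

# nvidia-smi query fields: GPU temperature (C) and SM clock (MHz).
QUERY = "temperature.gpu,clocks.sm"


def parse_gpu_sample(csv_line):
    """Parse one 'temp, clock' line of noheader,nounits CSV output."""
    temp, clock = (field.strip() for field in csv_line.split(","))
    return {"temp_c": int(temp), "sm_clock_mhz": int(clock)}


def read_gpu_samples():
    """Query every GPU once; needs the NVIDIA driver and nvidia-smi installed."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_sample(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, sample in enumerate(read_gpu_samples()):
        print(index, sample)
```

Sampling every few seconds during folding should show whether the temperature climbs steadily or the clock drops before a failure.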