BAD_FRAME_CHECKSUM

Moderators: Site Moderators, FAHC Science Team

SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

BAD_FRAME_CHECKSUM

Post by SteveWillis »

Just a few days ago, and now on a different machine today, right after a reboot on both, I got a BAD_FRAME_CHECKSUM error.

00:45:09:WARNING:WU01:FS00:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
00:45:09:WARNING:WU01:FS00:Fatal error, dumping

I don't think I've ever gotten that before, definitely not within the scope of my available log files.
I've been using the same Nvidia driver, 370.28, on both machines for quite a while. I did determine that the two failures were on different projects. I was just wondering if it was something I had any control over. I couldn't find anything searching the forum.
Machine 2's CPU is admittedly overclocked, but machine 1's is not.

Machine 1:

Code:

steve@linux01 /var/lib/fahclient $ head -500 log.txt
*********************** Log Started 2017-04-28T00:41:05Z ***********************
00:41:05:************************* Folding@home Client *************************
00:41:05:    Website: http://folding.stanford.edu/
00:41:05:  Copyright: (c) 2009-2014 Stanford University
00:41:05:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:41:05:       Args: --child --lifeline 1830 /etc/fahclient/config.xml --run-as
00:41:05:             fahclient --pid-file=/var/run/fahclient.pid --daemon
00:41:05:     Config: /etc/fahclient/config.xml
00:41:05:******************************** Build ********************************
00:41:05:    Version: 7.4.4
00:41:05:       Date: Mar 4 2014
00:41:05:       Time: 12:02:38
00:41:05:    SVN Rev: 4130
00:41:05:     Branch: fah/trunk/client
00:41:05:   Compiler: GNU 4.4.7
00:41:05:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
00:41:05:             -fno-unsafe-math-optimizations -msse2
00:41:05:   Platform: linux2 3.2.0-1-amd64
00:41:05:       Bits: 64
00:41:05:       Mode: Release
00:41:05:******************************* System ********************************
00:41:05:        CPU: AMD FX(tm)-8320 Eight-Core Processor
00:41:05:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
00:41:05:       CPUs: 8
00:41:05:     Memory: 31.32GiB
00:41:05:Free Memory: 30.64GiB
00:41:05:    Threads: POSIX_THREADS
00:41:05: OS Version: 3.19
00:41:05:Has Battery: false
00:41:05: On Battery: false
00:41:05: UTC Offset: -5
00:41:05:        PID: 1832
00:41:05:        CWD: /var/lib/fahclient
00:41:05:         OS: Linux 3.19.0-32-generic x86_64
00:41:05:    OS Arch: AMD64
00:41:05:       GPUs: 6
00:41:05:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]
00:41:05:      GPU 1: UNSUPPORTED: NV3 [PCI]
00:41:05:      GPU 2: NVIDIA:5 GP104 [GeForce GTX 1080]
00:41:05:      GPU 3: UNSUPPORTED: NV3 [PCI]
00:41:05:      GPU 4: NVIDIA:5 GP104 [GeForce GTX 1080]
00:41:05:      GPU 5: UNSUPPORTED: NV3 [PCI]
00:41:05:       CUDA: 6.1
00:41:05:CUDA Driver: 8000
00:41:05:***********************************************************************
00:41:05:<config>
00:41:05:  <!-- Client Control -->
00:41:05:  <fold-anon v='true'/>
00:41:05:
00:41:05:  <!-- Folding Core -->
00:41:05:  <checkpoint v='30'/>
00:41:05:
00:41:05:  <!-- Folding Slot Configuration -->
00:41:05:  <cause v='HUNTINGTONS'/>
00:41:05:
00:41:05:  <!-- Network -->
00:41:05:  <proxy v=':8080'/>
00:41:05:
00:41:05:  <!-- Slot Control -->
00:41:05:  <power v='full'/>
00:41:05:
00:41:05:  <!-- User Information -->
00:41:05:  <passkey v='********************************'/>
00:41:05:  <team v='224497'/>
00:41:05:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
00:41:05:
00:41:05:  <!-- Work Unit Control -->
00:41:05:  <next-unit-percentage v='100'/>
00:41:05:
00:41:05:  <!-- Folding Slots -->
00:41:05:  <slot id='0' type='GPU'/>
00:41:05:  <slot id='1' type='GPU'/>
00:41:05:  <slot id='2' type='GPU'/>
00:41:05:</config>
00:41:05:Switching to user fahclient
00:41:05:Trying to access database...
00:41:05:Successfully acquired database lock
00:41:05:Enabled folding slot 00: READY gpu:0:GP104 [GeForce GTX 1080]
00:41:05:Enabled folding slot 01: READY gpu:2:GP104 [GeForce GTX 1080]
00:41:05:Enabled folding slot 02: READY gpu:4:GP104 [GeForce GTX 1080]
00:41:05:WU02:FS01:Starting
00:41:05:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1832 -checkpoint 30 -gpu 1 -gpu-vendor nvidia
00:41:05:WU02:FS01:Started FahCore on PID 1849
00:41:05:WU02:FS01:Core PID:1853
00:41:05:WU02:FS01:FahCore 0x21 started
00:41:06:WU01:FS00:Starting
00:41:06:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1832 -checkpoint 30 -gpu 0 -gpu-vendor nvidia
00:41:06:WU01:FS00:Started FahCore on PID 1875
00:41:06:WU01:FS00:Core PID:1879
00:41:06:WU01:FS00:FahCore 0x21 started
00:41:07:WU00:FS02:Starting
00:41:07:WU00:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1832 -checkpoint 30 -gpu 2 -gpu-vendor nvidia
00:41:07:WU00:FS02:Started FahCore on PID 1892
00:41:07:WU00:FS02:Core PID:1898
00:41:07:WU00:FS02:FahCore 0x21 started
00:41:08:WU01:FS00:0x21:*********************** Log Started 2017-04-28T00:41:07Z ***********************
00:41:08:WU01:FS00:0x21:Project: 9177 (Run 1, Clone 9, Gen 159)
00:41:08:WU01:FS00:0x21:Unit: 0x000000cfab436c6957b24c299c937b2a
00:41:08:WU01:FS00:0x21:CPU: 0x00000000000000000000000000000000
00:41:08:WU01:FS00:0x21:Machine: 0
00:41:08:WU01:FS00:0x21:Digital signatures verified
00:41:08:WU01:FS00:0x21:Folding@home GPU Core21 Folding@home Core
00:41:08:WU01:FS00:0x21:Version 0.0.18
00:41:08:WU01:FS00:0x21:  Found a checkpoint file
00:41:08:WU00:FS02:0x21:*********************** Log Started 2017-04-28T00:41:07Z ***********************
00:41:08:WU00:FS02:0x21:Project: 11407 (Run 3, Clone 5, Gen 459)
00:41:08:WU00:FS02:0x21:Unit: 0x000002758ca304f25686b25de63acd8e
00:41:08:WU00:FS02:0x21:CPU: 0x00000000000000000000000000000000
00:41:08:WU00:FS02:0x21:Machine: 2
00:41:08:WU00:FS02:0x21:Digital signatures verified
00:41:08:WU00:FS02:0x21:Folding@home GPU Core21 Folding@home Core
00:41:08:WU00:FS02:0x21:Version 0.0.18
00:41:08:WU00:FS02:0x21:  Found a checkpoint file
00:41:08:WU02:FS01:0x21:*********************** Log Started 2017-04-28T00:41:07Z ***********************
00:41:08:WU02:FS01:0x21:Project: 11403 (Run 13, Clone 28, Gen 59)
00:41:08:WU02:FS01:0x21:Unit: 0x000000488ca304f255ed4fc71b35c204
00:41:08:WU02:FS01:0x21:CPU: 0x00000000000000000000000000000000
00:41:08:WU02:FS01:0x21:Machine: 1
00:41:08:WU02:FS01:0x21:Digital signatures verified
00:41:08:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
00:41:08:WU02:FS01:0x21:Version 0.0.18
00:41:08:WU02:FS01:0x21:  Found a checkpoint file
00:41:28:WU00:FS02:0x21:Completed 3500000 out of 5000000 steps (70%)
00:41:28:WU00:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:41:29:WU01:FS00:0x21:Completed 1200000 out of 2500000 steps (48%)
00:41:29:WU01:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:41:29:WU02:FS01:0x21:Completed 2625000 out of 5000000 steps (52%)
00:41:29:WU02:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:42:22:WU01:FS00:0x21:Completed 1225000 out of 2500000 steps (49%)
00:42:30:WU02:FS01:0x21:Completed 2650000 out of 5000000 steps (53%)
00:43:04:WU00:FS02:0x21:Completed 3550000 out of 5000000 steps (71%)
00:43:16:WU01:FS00:0x21:Completed 1250000 out of 2500000 steps (50%)
00:44:12:WU01:FS00:0x21:Completed 1275000 out of 2500000 steps (51%)
00:44:29:WU02:FS01:0x21:Completed 2700000 out of 5000000 steps (54%)
00:44:39:WU00:FS02:0x21:Completed 3600000 out of 5000000 steps (72%)
00:45:07:WU01:FS00:0x21:Completed 1300000 out of 2500000 steps (52%)
00:45:09:WU01:FS00:0x21:ERROR:Guru Meditation #b31b3bf6c9b44fab.956c859f86c75a85 (1806336.1967308) '01/01/positions.xtc'
00:45:09:WU01:FS00:0x21:WARNING:Unexpected exit() call
00:45:09:WU01:FS00:0x21:WARNING:Unexpected exit from science code
00:45:09:WU01:FS00:0x21:Saving result file logfile_01.txt
00:45:09:WU01:FS00:0x21:Saving result file checkpointState.xml
00:45:09:WU01:FS00:0x21:Saving result file checkpt.crc
00:45:09:WU01:FS00:0x21:Saving result file log.txt
00:45:09:WU01:FS00:0x21:ERROR:Guru Meditation #b31b3bf6c9b44fab.956c859f86c75a85 (1806336.1967308) '01/01/positions.xtc'
00:45:09:WARNING:WU01:FS00:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
00:45:09:WARNING:WU01:FS00:Fatal error, dumping
00:45:09:WU01:FS00:Sending unit results: id:01 state:SEND error:DUMPED project:9177 run:1 clone:9 gen:159 core:0x21 unit:0x000000cfab436c6957b24c299c937b2a
00:45:10:WU01:FS00:Uploading 10.41MiB to 171.67.108.105

Machine 2:

Code:

steve@fah01 /var/lib/fahclient $ head -400 log.txt |grep -v topology
*********************** Log Started 2017-04-30T21:00:44Z ***********************
21:00:44:************************* Folding@home Client *************************
21:00:44:        Website: http://folding.stanford.edu/
21:00:44:      Copyright: (c) 2009-2016 Stanford University
21:00:44:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:00:44:           Args: --child --lifeline 1836 /etc/fahclient/config.xml --run-as
21:00:44:                 fahclient --pid-file=/var/run/fahclient.pid --daemon
21:00:44:         Config: /etc/fahclient/config.xml
21:00:44:******************************** Build ********************************
21:00:44:        Version: 7.4.16
21:00:44:           Date: Jan 6 2017
21:00:44:           Time: 08:08:33
21:00:44:     Repository: Git
21:00:44:       Revision: e12187cbb0bd6937c067b9749af011374563b7b9
21:00:44:         Branch: master
21:00:44:       Compiler: GNU 4.9.2
21:00:44:        Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
21:00:44:                 -fno-unsafe-math-optimizations -msse2
21:00:44:       Platform: linux2 4.8.0-2-amd64
21:00:44:           Bits: 64
21:00:44:           Mode: Release
21:00:44:******************************* System ********************************
21:00:44:            CPU: AMD FX(tm)-6300 Six-Core Processor
21:00:44:         CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
21:00:44:           CPUs: 6
21:00:44:         Memory: 7.70GiB
21:00:44:    Free Memory: 7.20GiB
21:00:44:        Threads: POSIX_THREADS
21:00:44:     OS Version: 4.4
21:00:44:    Has Battery: false
21:00:44:     On Battery: false
21:00:44:     UTC Offset: -5
21:00:44:            PID: 1838
21:00:44:            CWD: /var/lib/fahclient
21:00:44:             OS: Linux 4.4.0-21-generic x86_64
21:00:44:        OS Arch: AMD64
21:00:44:           GPUs: 4
21:00:44:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
21:00:44:          GPU 1: Bus:8 Slot:0 Func:0 NVIDIA:5 GM206 [GeForce GTX 960]
21:00:44:          GPU 2: Bus:9 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
21:00:44:          GPU 3: Bus:10 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
21:00:44:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:8.0
21:00:44:  CUDA Device 1: Platform:0 Device:1 Bus:8 Slot:0 Compute:5.2 Driver:8.0
21:00:44:  CUDA Device 2: Platform:0 Device:2 Bus:9 Slot:0 Compute:6.1 Driver:8.0
21:00:44:  CUDA Device 3: Platform:0 Device:3 Bus:10 Slot:0 Compute:6.1 Driver:8.0
21:00:44:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:370.28
21:00:44:OpenCL Device 1: Platform:0 Device:1 Bus:8 Slot:0 Compute:1.2 Driver:370.28
21:00:44:OpenCL Device 2: Platform:0 Device:2 Bus:9 Slot:0 Compute:1.2 Driver:370.28
21:00:44:OpenCL Device 3: Platform:0 Device:3 Bus:10 Slot:0 Compute:1.2 Driver:370.28
21:00:44:***********************************************************************
21:00:44:<config>
21:00:44:  <!-- Client Control -->
21:00:44:  <fold-anon v='true'/>
21:00:44:
21:00:44:  <!-- Folding Core -->
21:00:44:  <core-priority v='low'/>
21:00:44:
21:00:44:  <!-- Folding Slot Configuration -->
21:00:44:  <gpu v='false'/>
21:00:44:
21:00:44:  <!-- Network -->
21:00:44:  <proxy v=':8080'/>
21:00:44:
21:00:44:  <!-- Slot Control -->
21:00:44:  <power v='full'/>
21:00:44:
21:00:44:  <!-- User Information -->
21:00:44:  <passkey v='********************************'/>
21:00:44:  <team v='224497'/>
21:00:44:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
21:00:44:
21:00:44:  <!-- Work Unit Control -->
21:00:44:  <next-unit-percentage v='100'/>
21:00:44:
21:00:44:  <!-- Folding Slots -->
21:00:44:  <slot id='1' type='GPU'>
21:00:44:    <gpu-index v='0'/>
21:00:44:  </slot>
21:00:44:  <slot id='0' type='GPU'>
21:00:44:    <gpu-index v='1'/>
21:00:44:  </slot>
21:00:44:  <slot id='2' type='GPU'>
21:00:44:    <gpu-index v='2'/>
21:00:44:  </slot>
21:00:44:  <slot id='3' type='GPU'>
21:00:44:    <gpu-index v='3'/>
21:00:44:  </slot>
21:00:44:</config>
21:00:44:Switching to user fahclient
21:00:44:Trying to access database...
21:00:44:Successfully acquired database lock
21:00:44:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1080]
21:00:44:Enabled folding slot 00: READY gpu:1:GM206 [GeForce GTX 960]
21:00:44:Enabled folding slot 02: READY gpu:2:GP104 [GeForce GTX 1080]
21:00:44:Enabled folding slot 03: READY gpu:3:GP104 [GeForce GTX 1080]
21:00:44:WU02:FS00:Starting
21:00:44:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
21:00:44:WU02:FS00:Started FahCore on PID 1848
21:00:44:WU02:FS00:Core PID:1852
21:00:44:WU02:FS00:FahCore 0x21 started
21:00:45:WU00:FS01:Starting
21:00:45:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
21:00:45:WU00:FS01:Started FahCore on PID 1860
21:00:45:WU00:FS01:Core PID:1864
21:00:45:WU00:FS01:FahCore 0x21 started
21:00:45:WU04:FS03:Starting
21:00:45:WU04:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 04 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu 3
21:00:45:WU04:FS03:Started FahCore on PID 1886
21:00:45:WU04:FS03:Core PID:1890
21:00:45:WU04:FS03:FahCore 0x21 started
21:00:45:WU01:FS02:Starting
21:00:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 2 -cuda-device 2 -gpu 2
21:00:45:WU01:FS02:Started FahCore on PID 1894
21:00:45:WU01:FS02:Core PID:1898
21:00:45:WU01:FS02:FahCore 0x21 started
21:00:46:WU01:FS02:0x21:*********************** Log Started 2017-04-30T21:00:46Z ***********************
21:00:46:WU01:FS02:0x21:Project: 11431 (Run 10, Clone 12, Gen 12)
21:00:46:WU01:FS02:0x21:Unit: 0x0000000e8ca304e858e137b9bad6957b
21:00:46:WU01:FS02:0x21:CPU: 0x00000000000000000000000000000000
21:00:46:WU01:FS02:0x21:Machine: 2
21:00:46:WU01:FS02:0x21:Digital signatures verified
21:00:46:WU01:FS02:0x21:Folding@home GPU Core21 Folding@home Core
21:00:46:WU01:FS02:0x21:Version 0.0.18
21:00:46:WU01:FS02:0x21:  Found a checkpoint file
21:00:46:WU02:FS00:0x21:*********************** Log Started 2017-04-30T21:00:46Z ***********************
21:00:46:WU02:FS00:0x21:Project: 11431 (Run 3, Clone 15, Gen 10)
21:00:46:WU02:FS00:0x21:Unit: 0x0000000e8ca304e858e137b8327395ce
21:00:46:WU02:FS00:0x21:CPU: 0x00000000000000000000000000000000
21:00:46:WU02:FS00:0x21:Machine: 0
21:00:46:WU02:FS00:0x21:Digital signatures verified
21:00:46:WU02:FS00:0x21:Folding@home GPU Core21 Folding@home Core
21:00:46:WU02:FS00:0x21:Version 0.0.18
21:00:46:WU02:FS00:0x21:  Found a checkpoint file
21:00:46:WU00:FS01:0x21:*********************** Log Started 2017-04-30T21:00:46Z ***********************
21:00:46:WU00:FS01:0x21:Project: 11403 (Run 3, Clone 17, Gen 527)
21:00:46:WU00:FS01:0x21:Unit: 0x000002d88ca304f255ed4f44677b0309
21:00:46:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:00:46:WU00:FS01:0x21:Machine: 1
21:00:46:WU00:FS01:0x21:Digital signatures verified
21:00:46:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:00:46:WU00:FS01:0x21:Version 0.0.18
21:00:46:WU00:FS01:0x21:  Found a checkpoint file
21:00:46:WU04:FS03:0x21:*********************** Log Started 2017-04-30T21:00:46Z ***********************
21:00:46:WU04:FS03:0x21:Project: 9176 (Run 27, Clone 8, Gen 273)
21:00:46:WU04:FS03:0x21:Unit: 0x00000189ab436c6957b24c2969eb517c
21:00:46:WU04:FS03:0x21:CPU: 0x00000000000000000000000000000000
21:00:46:WU04:FS03:0x21:Machine: 3
21:00:46:WU04:FS03:0x21:Digital signatures verified
21:00:46:WU04:FS03:0x21:Folding@home GPU Core21 Folding@home Core
21:00:46:WU04:FS03:0x21:Version 0.0.18
21:00:46:WU04:FS03:0x21:  Found a checkpoint file
21:01:08:WU04:FS03:0x21:Completed 2300000 out of 2500000 steps (92%)
21:01:08:WU04:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:01:09:WU01:FS02:0x21:Completed 3500000 out of 5000000 steps (70%)
21:01:09:WU01:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:01:09:WU00:FS01:0x21:Completed 1000000 out of 5000000 steps (20%)
21:01:09:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:01:10:WU02:FS00:0x21:Completed 250000 out of 5000000 steps (5%)
21:01:10:WU02:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:01:58:WU04:FS03:0x21:Completed 2325000 out of 2500000 steps (93%)
21:02:47:WU04:FS03:0x21:Completed 2350000 out of 2500000 steps (94%)
21:03:14:WU00:FS01:0x21:Completed 1050000 out of 5000000 steps (21%)
21:03:25:WU01:FS02:0x21:Completed 3550000 out of 5000000 steps (71%)
21:03:35:WU04:FS03:0x21:Completed 2375000 out of 2500000 steps (95%)
21:04:24:WU04:FS03:0x21:Completed 2400000 out of 2500000 steps (96%)
21:05:15:WU04:FS03:0x21:Completed 2425000 out of 2500000 steps (97%)
21:05:19:WU00:FS01:0x21:Completed 1100000 out of 5000000 steps (22%)
21:05:38:WU01:FS02:0x21:Completed 3600000 out of 5000000 steps (72%)
21:06:03:WU04:FS03:0x21:Completed 2450000 out of 2500000 steps (98%)
21:06:51:WU04:FS03:0x21:Completed 2475000 out of 2500000 steps (99%)
21:07:27:WU00:FS01:0x21:Completed 1150000 out of 5000000 steps (23%)
21:07:40:WU04:FS03:0x21:Completed 2500000 out of 2500000 steps (100%)
21:07:41:WU03:FS03:Connecting to 171.67.108.45:80
21:07:41:WU03:FS03:Assigned to work server 171.67.108.105
21:07:41:WU03:FS03:Requesting new work unit for slot 03: RUNNING gpu:3:GP104 [GeForce GTX 1080] from 171.67.108.105
21:07:41:WU03:FS03:Connecting to 171.67.108.105:8080
21:07:42:WU03:FS03:Downloading 20.01MiB
21:07:42:WU04:FS03:0x21:Saving result file logfile_01.txt
21:07:42:WU04:FS03:0x21:Saving result file checkpointState.xml
21:07:42:WU04:FS03:0x21:Saving result file checkpt.crc
21:07:42:WU04:FS03:0x21:Saving result file log.txt
21:07:42:WU04:FS03:0x21:Saving result file positions.xtc
21:07:42:WU04:FS03:0x21:Folding@home Core Shutdown: FINISHED_UNIT
21:07:42:WU04:FS03:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:07:42:WU04:FS03:Sending unit results: id:04 state:SEND error:NO_ERROR project:9176 run:27 clone:8 gen:273 core:0x21 unit:0x00000189ab436c6957b24c2969eb517c
21:07:42:WU04:FS03:Uploading 13.16MiB to 171.67.108.105
21:07:42:WU04:FS03:Connecting to 171.67.108.105:8080
21:07:43:WU02:FS00:0x21:Completed 300000 out of 5000000 steps (6%)
21:07:44:WU03:FS03:Download complete
21:07:44:WU03:FS03:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:9178 run:2 clone:18 gen:327 core:0x21 unit:0x000001baab436c6957b24c29526c766c
21:07:44:WU03:FS03:Starting
21:07:44:WU03:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 3 -cuda-device 3 -gpu 3
21:07:44:WU03:FS03:Started FahCore on PID 2599
21:07:44:WU03:FS03:Core PID:2603
21:07:44:WU03:FS03:FahCore 0x21 started
21:07:45:WU03:FS03:0x21:*********************** Log Started 2017-04-30T21:07:44Z ***********************
21:07:45:WU03:FS03:0x21:Project: 9178 (Run 2, Clone 18, Gen 327)
21:07:45:WU03:FS03:0x21:Unit: 0x000001baab436c6957b24c29526c766c
21:07:45:WU03:FS03:0x21:CPU: 0x00000000000000000000000000000000
21:07:45:WU03:FS03:0x21:Machine: 3
21:07:45:WU03:FS03:0x21:Reading tar file core.xml
21:07:45:WU03:FS03:0x21:Reading tar file integrator.xml
21:07:45:WU03:FS03:0x21:Reading tar file state.xml
21:07:45:WU03:FS03:0x21:Reading tar file system.xml
21:07:45:WU03:FS03:0x21:Digital signatures verified
21:07:45:WU03:FS03:0x21:Folding@home GPU Core21 Folding@home Core
21:07:45:WU03:FS03:0x21:Version 0.0.18
21:07:48:WU04:FS03:Upload 58.90%
21:07:52:WU04:FS03:Upload complete
21:07:52:WU04:FS03:Server responded WORK_ACK (400)
21:07:52:WU04:FS03:Final credit estimate, 41843.00 points
21:07:52:WU04:FS03:Cleaning up
21:07:52:WU01:FS02:0x21:Completed 3650000 out of 5000000 steps (73%)
21:07:53:WU03:FS03:0x21:Completed 0 out of 2500000 steps (0%)
21:07:53:WU03:FS03:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:08:37:WU03:FS03:0x21:Completed 25000 out of 2500000 steps (1%)
21:09:21:WU03:FS03:0x21:Completed 50000 out of 2500000 steps (2%)
21:09:33:WU00:FS01:0x21:Completed 1200000 out of 5000000 steps (24%)
21:10:06:WU03:FS03:0x21:Completed 75000 out of 2500000 steps (3%)
21:10:07:WU01:FS02:0x21:Completed 3700000 out of 5000000 steps (74%)
21:10:50:WU03:FS03:0x21:Completed 100000 out of 2500000 steps (4%)
21:11:36:WU03:FS03:0x21:Completed 125000 out of 2500000 steps (5%)
21:11:38:WU00:FS01:0x21:Completed 1250000 out of 5000000 steps (25%)
21:12:20:WU03:FS03:0x21:Completed 150000 out of 2500000 steps (6%)
21:12:20:WU01:FS02:0x21:Completed 3750000 out of 5000000 steps (75%)
21:12:23:WU01:FS02:0x21:ERROR:Guru Meditation #3f4c86e420183ff6.87e1451446317e75 (2867200.3087004) '01/01/positions.xtc'
21:12:23:WU01:FS02:0x21:WARNING:Unexpected exit() call
21:12:23:WU01:FS02:0x21:WARNING:Unexpected exit from science code
21:12:23:WU01:FS02:0x21:Saving result file logfile_01.txt
21:12:23:WU01:FS02:0x21:Saving result file checkpointState.xml
21:12:23:WU01:FS02:0x21:Saving result file checkpt.crc
21:12:23:WU01:FS02:0x21:Saving result file log.txt
21:12:23:WU01:FS02:0x21:ERROR:Guru Meditation #3f4c86e420183ff6.87e1451446317e75 (2867200.3087004) '01/01/positions.xtc'
21:12:24:WARNING:WU01:FS02:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
21:12:24:WARNING:WU01:FS02:Fatal error, dumping
21:12:24:WU01:FS02:Sending unit results: id:01 state:SEND error:DUMPED project:11431 run:10 clone:12 gen:12 core:0x21 unit:0x0000000e8ca304e858e137b9bad6957b
21:12:24:WU01:FS02:Uploading 13.91MiB to 140.163.4.232
21:12:24:WU01:FS02:Connecting to 140.163.4.232:8080
21:12:24:WU04:FS02:Connecting to 171.67.108.45:80
21:12:25:WU04:FS02:Assigned to work server 171.67.108.159
21:12:25:WU04:FS02:Requesting new work unit for slot 02: READY gpu:2:GP104 [GeForce GTX 1080] from 171.67.108.159
21:12:25:WU04:FS02:Connecting to 171.67.108.159:8080
21:12:25:WU04:FS02:Downloading 22.85MiB
21:12:28:WU04:FS02:Download complete
21:12:28:WU04:FS02:Received Unit: id:04 state:DOWNLOAD error:NO_ERROR project:9180 run:24 clone:9 gen:338 core:0x21 unit:0x000001ebab436c9f57bdce052b1c7e88
21:12:28:WU04:FS02:Starting
21:12:28:WU04:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 04 -suffix 01 -version 704 -lifeline 1838 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 2 -cuda-device 2 -gpu 2
21:12:28:WU04:FS02:Started FahCore on PID 2646
21:12:28:WU04:FS02:Core PID:2650
21:12:28:WU04:FS02:FahCore 0x21 started
21:12:28:WU04:FS02:0x21:*********************** Log Started 2017-04-30T21:12:28Z ***********************
21:12:28:WU04:FS02:0x21:Project: 9180 (Run 24, Clone 9, Gen 338)
21:12:28:WU04:FS02:0x21:Unit: 0x000001ebab436c9f57bdce052b1c7e88
21:12:28:WU04:FS02:0x21:CPU: 0x00000000000000000000000000000000
21:12:28:WU04:FS02:0x21:Machine: 2
21:12:28:WU04:FS02:0x21:Reading tar file core.xml
21:12:28:WU04:FS02:0x21:Reading tar file integrator.xml
21:12:28:WU04:FS02:0x21:Reading tar file state.xml
21:12:28:WU04:FS02:0x21:Reading tar file system.xml
21:12:28:WU04:FS02:0x21:Digital signatures verified
21:12:28:WU04:FS02:0x21:Folding@home GPU Core21 Folding@home Core
21:12:28:WU04:FS02:0x21:Version 0.0.18
21:12:30:WU01:FS02:Upload 62.90%
21:12:34:WU01:FS02:Upload complete
21:12:35:WU01:FS02:Server responded WORK_QUIT (404)
21:12:35:WARNING:WU01:FS02:Server did not like results, dumping
21:12:35:WU01:FS02:Cleaning up

1080 and 1080TI GPUs on Linux Mint
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: BAD_FRAME_CHECKSUM

Post by bruce »

SteveWillis wrote: Just a few days ago, and now on a different machine today, right after a reboot on both, I got a BAD_FRAME_CHECKSUM error.
This is indeed a rare occurrence, which suggests that it isn't a general problem with the configuration of systems around the world.

Strictly a guess, but I'll bet your reboot was too fast. From the time you tell the OS to shut down, it can take as much as a minute or two for the FAHCore to finish whatever it's doing (especially if it's in the midst of writing a checkpoint), followed by a certain amount of time for the OS to sync its cache to disk, followed by an orderly shutdown and power-off. If you tried to hurry that process, it's likely to have corrupted the checkpoint or its checksum.

If that's not what happened, somebody else can offer a guess.
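
One way to check this guess against a log is to look for core failures that hit a work unit resumed from a checkpoint shortly after the client started. Below is a minimal Python sketch that does this, assuming the log format matches the lines quoted in this thread. The function name `find_failures` and the result keys are made up for illustration and are not part of any FAH tool.

```python
import re
from datetime import datetime, timedelta

# Patterns are modeled only on the log lines quoted in this thread.
LOG_START = re.compile(r"^\*+ Log Started (\S+?)Z \*+$")
RESUMED = re.compile(
    r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x\w+:\s*Found a checkpoint file"
)
RETURNED = re.compile(
    r"^(\d\d:\d\d:\d\d):(?:WARNING:)?(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def find_failures(lines):
    """Return one dict per core exit that was not FINISHED_UNIT."""
    started = None   # time of the most recent client start
    resumed = set()  # (WU, slot) pairs that resumed from a checkpoint
    failures = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LOG_START.match(line)
        if m:
            started = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            resumed.clear()
            continue
        m = RESUMED.match(line)
        if m:
            resumed.add(m.groups())
            continue
        m = RETURNED.match(line)
        if not m:
            continue
        clock, wu, slot, name, code = m.groups()
        was_resumed = (wu, slot) in resumed
        resumed.discard((wu, slot))  # WU ids get reused by later units
        if name == "FINISHED_UNIT":
            continue
        minutes = None
        if started is not None:
            # Log lines carry only a clock time, so this is only
            # meaningful within the first day after the client start.
            h, mi, s = map(int, clock.split(":"))
            at = started.replace(hour=h, minute=mi, second=s)
            if at < started:  # clock wrapped past midnight
                at += timedelta(days=1)
            minutes = (at - started).total_seconds() / 60
        failures.append({
            "wu": wu,
            "slot": slot,
            "error": name,
            "code": int(code),
            "resumed_from_checkpoint": was_resumed,
            "minutes_after_start": minutes,
        })
    return failures


# Three lines taken from the machine 1 log above.
SAMPLE = [
    "*********************** Log Started 2017-04-28T00:41:05Z ***********************",
    "00:41:08:WU01:FS00:0x21:  Found a checkpoint file",
    "00:45:09:WARNING:WU01:FS00:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)",
]

for failure in find_failures(SAMPLE):
    print(failure)
```

To scan a real log, pass an open file instead of `SAMPLE`, for example the `/var/lib/fahclient/log.txt` shown in the first post. A failure that was resumed from a checkpoint and occurred only a few minutes after the client started would be consistent with a checkpoint damaged during the previous shutdown, though it does not prove it.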
SteveWillis

Re: BAD_FRAME_CHECKSUM

Post by SteveWillis »

Thanks, bruce. I would agree that's a good guess. I think in both cases I had system problems requiring a hard reset. It never occurred to me that that was the problem.