project:17435 run:0 clone:1286 gen:36 failing on v7.4.4

Moderators: Site Moderators, FAHC Science Team

wuffy68
Posts: 169
Joined: Wed Jun 04, 2014 11:06 pm
Hardware configuration: 1x nVidia 1080Ti, 1x nVidia 1070, 1x nVidia 1060, 1x nVidia 750Ti, AMD Radeon R7 M460
Location: Roxborough, Colorado USA
Contact:

project:17435 run:0 clone:1286 gen:36 failing on v7.4.4

Post by wuffy68 »

Ubuntu 16.04.1 SMP x86_64
Intel Core i7 2.8 GHz
nVidia 960 GTX [GM206]
Driver 384.130
Folding Client 7.4.4

project:17435 run:0 clone:1286 gen:36 causes the FAH service to crash after reaching ~0.04% complete. The same thing happened with another work unit on Saturday, which forced me to dump the WU. After that the client ran well for a couple of days, but now the same problem is back.

syslog:

Code:

Feb 15 17:33:05 curecoinproject1 kernel: [    0.194735] acpi PNP0A08:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
Feb 15 17:33:05 curecoinproject1 kernel: [    1.316545] nvidia: module verification failed: signature and/or required key missing - tainting kernel
Feb 15 17:33:07 curecoinproject1 thermald[947]: THD engine start failed
Feb 15 17:33:15 curecoinproject1 NetworkManager[1116]: nm_device_get_device_type: assertion 'NM_IS_DEVICE (self)' failed
Feb 15 17:33:15 curecoinproject1 NetworkManager[1116]: <warn>  [1613435595.8848] failed to enumerate oFono devices: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.ofono was not provided by any .service files
Feb 15 17:33:19 curecoinproject1 nm-dispatcher: req:2 'up' [docker0], "/etc/NetworkManager/dispatcher.d/01ifupdown": complete: failed with Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
Feb 15 17:33:19 curecoinproject1 NetworkManager[1116]: <warn>  [1613435599.9162] dispatcher: (3) 01ifupdown failed (failed): Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
Feb 15 17:33:24 curecoinproject1 nm-dispatcher: req:3 'up' [enp2s0], "/etc/NetworkManager/dispatcher.d/01ifupdown": complete: failed with Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
Feb 15 17:33:24 curecoinproject1 NetworkManager[1116]: <warn>  [1613435604.1199] dispatcher: (5) 01ifupdown failed (failed): Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
Feb 15 17:33:34 curecoinproject1 fwupd[2735]: (fwupd:2735): Fu-WARNING **: FuMain: failed to load AppStream data: Failed to parse /var/cache/app-info/xmls/fwupd.xml file: Error on line 2672: Entity did not end with a semicolon; most likely you used an ampersand character without intending to start an entity - escape ampersand as &amp;
Feb 15 17:33:34 curecoinproject1 fwupd[2735]: (fwupd:2735): Fu-WARNING **: disabling plugin because: failed to coldplug uefi: UEFI firmware updating not supported
Feb 15 17:33:34 curecoinproject1 fwupd[2735]: (fwupd:2735): Fu-WARNING **: disabling plugin because: failed to coldplug raspberrypi: Raspberry PI firmware updating not supported, no /boot/start.elf
Feb 15 17:33:52 curecoinproject1 pulseaudio[1964]: [pulseaudio] bluez5-util.c: GetManagedObjects() failed: org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.bluez': timed out
Feb 15 17:34:49 curecoinproject1 pulseaudio[1964]: [pulseaudio] module-x11-bell.c: XkbQueryExtension() failed
Feb 15 17:34:49 curecoinproject1 pulseaudio[1964]: [pulseaudio] module.c: Failed to load module "module-x11-bell" (argument: "display=:10.0 sample=bell.ogg"): initialization failed.

FAH log:

Code:

N/A for that time period - appears to have rolled

I realize this is an old build, old GPU and old driver ... but I figured it's worth reporting.

Thank you,

wuffy68
1x nVidia 1070, 1x nVidia 1060 3g,
1x nVidia 970, 2x nVidia 960,
1x nVidia 555, 1x AMD R7, 2x AMD 295,
6x i5 CPU-only rigs
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: project:17435 run:0 clone:1286 gen:36 failing on v7.4.4

Post by PantherX »

Version 7.6.21 is strongly recommended, since FahCore_22 uses some new arguments that older clients do not support. The simplest fix is to update the client. Since you are on Ubuntu 16, I think it can handle Python 2 without issues, which should make the upgrade easier.

BTW, you can have up to 16 previous logs in the logs folder by default, so you can check the files in there if needed :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
wuffy68

Re: project:17435 run:0 clone:1286 gen:36 failing on v7.4.4

Post by wuffy68 »

PantherX wrote: BTW, you can have up to 16 previous logs in the logs folder by default so you can check the file in there if needed :)

Thanks - yeah, I haven't looked at the Linux logs for a while (found them in /var/lib/fahclient/logs) ... looks like a "BAD_FRAME_CHECKSUM" upon restart, and the work unit was auto-dumped in this case. Both failures came from project 17435.

Code:

01:02:03:WU01:FS01:0x22:Project: 17435 (Run 0, Clone 1286, Gen 36)
01:02:03:WU01:FS01:0x22:Unit: 0x00000000000000000000000000000000
01:02:03:WU01:FS01:0x22:Digital signatures verified
01:02:03:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
01:02:03:WU01:FS01:0x22:Version 0.0.13
01:02:03:WU01:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
01:02:03:WU01:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
01:02:03:WU01:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
01:02:03:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
01:02:03:WU01:FS01:0x22:No -opencl-device specified; using deprecated -gpu argument as an alias for -opencl-device.
01:02:03:WU01:FS01:0x22:Please consider upgrading your client version.
01:02:03:WU01:FS01:0x22:There are 3 platforms available.
01:02:03:WU01:FS01:0x22:Platform 0: Reference
01:02:03:WU01:FS01:0x22:Platform 1: CPU
01:02:03:WU01:FS01:0x22:Platform 2: OpenCL
01:02:03:WU01:FS01:0x22:  opencl-device 0 specified
01:02:05:WU00:FS00:0xa7:Completed 366522 out of 500000 steps (73%)
01:02:33:WU01:FS01:0x22:Attempting to create OpenCL context:
01:02:33:WU01:FS01:0x22:  Configuring platform OpenCL
01:02:41:Removing old file 'configs/config-20200713-054312.xml'
01:02:41:Saving configuration to /etc/fahclient/config.xml
01:02:41:<config>
01:02:41:  <!-- Network -->
01:02:41:  <proxy v=':8080'/>
01:02:41:
01:02:41:  <!-- Slot Control -->
01:02:41:  <pause-on-battery v='false'/>
01:02:41:  <power v='full'/>
01:02:41:
01:02:41:  <!-- User Information -->
01:02:41:  <passkey v='********************************'/>
01:02:41:  <team v='43573'/>
01:02:41:  <user v='Ivan_Tuma'/>
01:02:41:
01:02:41:  <!-- Folding Slots -->
01:02:41:  <slot id='0' type='CPU'/>
01:02:41:  <slot id='1' type='GPU'/>
01:02:41:</config>
01:02:55:WU01:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0
01:02:55:WU01:FS01:0x22:ERROR:Guru Meditation #0.baef2504129c7209 (0.42744404) '01/01/checkpoint'
01:02:55:WARNING:WU01:FS01:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)
01:02:55:WARNING:WU01:FS01:Fatal error, dumping
01:02:55:WU01:FS01:Sending unit results: id:01 state:SEND error:DUMPED project:17435 run:0 clone:1286 gen:36 core:0x22 unit:0x00000506000000240000441b00000000
01:02:55:WU01:FS01:Connecting to 206.223.170.146:8080
01:02:56:WU01:FS01:Server responded WORK_ACK (400)
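
Since up to 16 older logs are kept, it can be easier to scan the whole logs folder for core failures than to open each file by hand. Below is a rough Python sketch of that idea. The function names (find_core_failures, scan_log_dir) are made up for illustration, the pattern is modeled only on the "FahCore returned" warning line shown above, and it assumes every regular file in /var/lib/fahclient/logs is a plain-text client log. Other client versions may format the line differently.

```python
import re
from pathlib import Path

# Modeled on the warning line seen in the log above, e.g.
# "01:02:55:WARNING:WU01:FS01:FahCore returned: BAD_FRAME_CHECKSUM (112 = 0x70)"
RETURN_RE = re.compile(
    r"(?P<time>\d\d:\d\d:\d\d):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)

# ANSI color codes, either as a raw ESC byte or in caret notation ("^[")
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")


def find_core_failures(lines):
    """Yield (time, wu, slot, status, code) for each 'FahCore returned' warning."""
    for line in lines:
        clean = ANSI_RE.sub("", line)
        match = RETURN_RE.search(clean)
        if match:
            yield (
                match["time"],
                match["wu"],
                match["slot"],
                match["status"],
                int(match["code"]),
            )


def scan_log_dir(log_dir="/var/lib/fahclient/logs"):
    """Return (file name, time, wu, slot, status, code) for every hit in log_dir.

    Assumption: each regular file in log_dir is a text log.
    """
    results = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for hit in find_core_failures(handle):
                results.append((path.name, *hit))
    return results
```

Calling scan_log_dir() and printing each tuple gives one line per failed unit. Reading the folder may need the same permissions as the fahclient service user.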
PantherX
Site Moderator

Re: project:17435 run:0 clone:1286 gen:36 failing on v7.4.4

Post by PantherX »

Generally speaking, one possible cause is a faulty disk drive: the core reads the checkpoint data to resume, and if that data fails the checksum, it is a strong indication that something is off. Check that your filesystem is healthy and repair any issues that are found. Also check that your drive (HDD/SSD) is operating within normal parameters.
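
To illustrate why a bad checksum on resume points at damaged data, here is a small Python sketch of the general technique: store a digest alongside the checkpoint when writing, then recompute and compare it when loading. This is not FahCore_22's actual checkpoint format or checksum algorithm, which is not described in this thread. SHA-256 and the helper names (pack_checkpoint, load_checkpoint, flip_bit) are stand-ins chosen for the example.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def pack_checkpoint(payload: bytes) -> bytes:
    """Prefix the payload with its SHA-256 digest (illustrative format only)."""
    return hashlib.sha256(payload).digest() + payload


def load_checkpoint(blob: bytes) -> bytes:
    """Return the payload if the stored digest matches, else raise ValueError."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("checkpoint truncated")
    stored, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != stored:
        raise ValueError("checkpoint checksum mismatch")
    return payload


def flip_bit(blob: bytes, index: int) -> bytes:
    """Simulate storage corruption by flipping one bit of one byte."""
    corrupted = bytearray(blob)
    corrupted[index] ^= 0x01
    return bytes(corrupted)
```

A single flipped bit anywhere in the stored blob makes load_checkpoint raise instead of resuming from bad data, which is the same kind of refusal the client log shows. The check cannot tell whether the damage came from the disk, RAM, or an interrupted write, so a mismatch is a reason to test the hardware rather than proof of which part failed.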