Zombie Download Process

Postby Nert » Wed Aug 30, 2017 9:52 pm

I'm running Linux Mint with a GTX 970 and GTX 1080. Everything was fine until today.
I noticed that I had a download that was "stuck". The log showed a download roughly 50% complete and then... nothing. As a result, the slot wasn't folding. I tried pausing and unpausing both slots, which didn't do anything. Then I deleted the slots and added them back. This resulted in 3 slots, with the two new ones folding and the zombie showing "downloading". Later, after completing a couple of units, all of the slots displayed the same stuck-download condition. I then rebooted the system, with no success.

I know there's a 20-page post about stuck downloads, but there's so much there that I couldn't pick out any specifics about how to fix this. I'm hoping I can just delete something left over or kill some process rather than reinstalling. Right now I'm folding, but I expect that I'll have problems at some point in the future because of this zombie download.

Here's the most recent log. It probably has some pauses and restarts in it. I can try to find previous logs if that will help.

Code:
*********************** Log Started 2017-08-30T21:07:22Z ***********************
21:07:22:************************* Folding@home Client *************************
21:07:22:    Website: http://folding.stanford.edu/
21:07:22:  Copyright: (c) 2009-2014 Stanford University
21:07:22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:07:22:       Args: --child --lifeline 1661 /etc/fahclient/config.xml --run-as
21:07:22:             fahclient --pid-file=/var/run/fahclient.pid --daemon
21:07:22:     Config: /etc/fahclient/config.xml
21:07:22:******************************** Build ********************************
21:07:22:    Version: 7.4.4
21:07:22:       Date: Mar 4 2014
21:07:22:       Time: 12:02:38
21:07:22:    SVN Rev: 4130
21:07:22:     Branch: fah/trunk/client
21:07:22:   Compiler: GNU 4.4.7
21:07:22:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:07:22:             -fno-unsafe-math-optimizations -msse2
21:07:22:   Platform: linux2 3.2.0-1-amd64
21:07:22:       Bits: 64
21:07:22:       Mode: Release
21:07:22:******************************* System ********************************
21:07:22:        CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
21:07:22:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
21:07:22:       CPUs: 4
21:07:22:     Memory: 15.61GiB
21:07:22:Free Memory: 15.16GiB
21:07:22:    Threads: POSIX_THREADS
21:07:22: OS Version: 3.19
21:07:22:Has Battery: false
21:07:22: On Battery: false
21:07:22: UTC Offset: -5
21:07:22:        PID: 1663
21:07:22:        CWD: /var/lib/fahclient
21:07:22:         OS: Linux 3.19.0-32-generic x86_64
21:07:22:    OS Arch: AMD64
21:07:22:       GPUs: 3
21:07:22:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]
21:07:22:      GPU 1: UNSUPPORTED: NV3 [PCI]
21:07:22:      GPU 2: NVIDIA:5 GM204 [GeForce GTX 970]
21:07:22:       CUDA: 6.1
21:07:22:CUDA Driver: 8000
21:07:22:***********************************************************************
21:07:22:<config>
21:07:22:  <!-- Client Control -->
21:07:22:  <fold-anon v='true'/>
21:07:22:
21:07:22:  <!-- Folding Slot Configuration -->
21:07:22:  <cause v='PARKINSONS'/>
21:07:22:  <gpu v='false'/>
21:07:22:
21:07:22:  <!-- Network -->
21:07:22:  <proxy v=':8080'/>
21:07:22:
21:07:22:  <!-- Slot Control -->
21:07:22:  <power v='full'/>
21:07:22:
21:07:22:  <!-- User Information -->
21:07:22:  <passkey v='********************************'/>
21:07:22:  <team v='224497'/>
21:07:22:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:07:22:
21:07:22:  <!-- Folding Slots -->
21:07:22:  <slot id='1' type='GPU'/>
21:07:22:  <slot id='0' type='GPU'/>
21:07:22:</config>
21:07:22:Switching to user fahclient
21:07:22:Trying to access database...
21:07:22:Successfully acquired database lock
21:07:22:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1080]
21:07:22:Enabled folding slot 00: READY gpu:2:GM204 [GeForce GTX 970]
21:07:23:WARNING:WU00:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:07:23:ERROR:WU00:FS01:Exception: Could not get an assignment
21:07:23:WARNING:WU01:FS00:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:07:23:ERROR:WU01:FS00:Exception: Could not get an assignment
21:07:23:WARNING:WU00:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:07:23:ERROR:WU00:FS01:Exception: Could not get an assignment
21:07:23:WARNING:WU01:FS00:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:07:23:ERROR:WU01:FS00:Exception: Could not get an assignment
21:08:23:WU00:FS01:Connecting to 171.67.108.45:80
21:08:23:WU01:FS00:Connecting to 171.67.108.45:80
21:08:23:WU00:FS01:Assigned to work server 171.67.108.157
21:08:23:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1080] from 171.67.108.157
21:08:23:WU00:FS01:Connecting to 171.67.108.157:8080
21:08:24:WU01:FS00:Assigned to work server 140.163.4.233
21:08:24:WU01:FS00:Requesting new work unit for slot 00: READY gpu:2:GM204 [GeForce GTX 970] from 140.163.4.233
21:08:24:WU01:FS00:Connecting to 140.163.4.233:8080
21:08:24:WU00:FS01:Downloading 8.87MiB
21:08:24:WU01:FS00:Downloading 25.55MiB
21:08:28:WU00:FS01:Download complete
21:08:28:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9431 run:646 clone:1 gen:325 core:0x21 unit:0x00000180ab436c9d586fdd3964a36de7
21:08:30:WU01:FS00:Download 0.98%
21:08:30:WU00:FS01:Starting
21:08:30:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1663 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:08:30:WU00:FS01:Started FahCore on PID 2875
21:08:30:WU00:FS01:Core PID:2879
21:08:30:WU00:FS01:FahCore 0x21 started
21:08:39:WU00:FS01:0x21:*********************** Log Started 2017-08-30T21:08:39Z ***********************
21:08:39:WU00:FS01:0x21:Project: 9431 (Run 646, Clone 1, Gen 325)
21:08:39:WU00:FS01:0x21:Unit: 0x00000180ab436c9d586fdd3964a36de7
21:08:39:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:08:39:WU00:FS01:0x21:Machine: 1
21:08:39:WU00:FS01:0x21:Reading tar file core.xml
21:08:39:WU00:FS01:0x21:Reading tar file integrator.xml
21:08:39:WU00:FS01:0x21:Reading tar file state.xml
21:08:39:WU00:FS01:0x21:Reading tar file system.xml
21:08:39:WU00:FS01:0x21:Digital signatures verified
21:08:39:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:08:39:WU00:FS01:0x21:Version 0.0.18
21:08:47:WU01:FS00:Download 1.71%
21:08:47:WU00:FS01:0x21:Completed 0 out of 6250000 steps (0%)
21:08:47:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:09:00:WU01:FS00:Download 2.45%
21:09:41:WU00:FS01:0x21:Completed 62500 out of 6250000 steps (1%)
21:10:34:WU00:FS01:0x21:Completed 125000 out of 6250000 steps (2%)
21:10:49:FS00:Paused
21:10:53:FS00:Unpaused
21:11:12:WU01:FS00:Download 2.94%
21:11:22:WU01:FS00:Download 3.67%
21:11:27:WU00:FS01:0x21:Completed 187500 out of 6250000 steps (3%)
21:11:28:WU01:FS00:Download 4.40%
21:11:35:WU01:FS00:Download 4.65%
21:11:58:WU01:FS00:Download 5.14%
21:11:58:Removing old file 'configs/config-20170816-191327.xml'
21:11:58:Saving configuration to /etc/fahclient/config.xml
21:11:58:<config>
21:11:58:  <!-- Client Control -->
21:11:58:  <fold-anon v='true'/>
21:11:58:
21:11:58:  <!-- Folding Slot Configuration -->
21:11:58:  <cause v='PARKINSONS'/>
21:11:58:  <gpu v='false'/>
21:11:58:
21:11:58:  <!-- Network -->
21:11:58:  <proxy v=':8080'/>
21:11:58:
21:11:58:  <!-- Slot Control -->
21:11:58:  <power v='full'/>
21:11:58:
21:11:58:  <!-- User Information -->
21:11:58:  <passkey v='********************************'/>
21:11:58:  <team v='224497'/>
21:11:58:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:11:58:
21:11:58:  <!-- Folding Slots -->
21:11:58:  <slot id='0' type='GPU'/>
21:11:58:</config>
21:11:58:FS01:Shutting core down
21:11:58:WU00:FS01:0x21:Caught signal SIGINT(2) on PID 2879
21:11:58:WU00:FS01:0x21:Exiting, please wait. . .
21:11:58:WU00:FS01:0x21:Folding@home Core Shutdown: INTERRUPTED
21:11:58:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
21:11:58:WARNING:WU00:Slot ID 1 no longer exists and there are no other matching slots, dumping
21:11:59:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:9431 run:646 clone:1 gen:325 core:0x21 unit:0x00000180ab436c9d586fdd3964a36de7
21:11:59:WU00:Connecting to 171.67.108.157:8080
21:11:59:WU00:Server responded WORK_ACK (400)
21:11:59:WU00:Cleaning up
21:12:09:WU01:FS00:Download 5.63%
21:12:11:Removing old file 'configs/config-20170816-191407.xml'
21:12:11:Saving configuration to /etc/fahclient/config.xml
21:12:11:<config>
21:12:11:  <!-- Client Control -->
21:12:11:  <fold-anon v='true'/>
21:12:11:
21:12:11:  <!-- Folding Slot Configuration -->
21:12:11:  <cause v='PARKINSONS'/>
21:12:11:  <gpu v='false'/>
21:12:11:
21:12:11:  <!-- Network -->
21:12:11:  <proxy v=':8080'/>
21:12:11:
21:12:11:  <!-- Slot Control -->
21:12:11:  <power v='full'/>
21:12:11:
21:12:11:  <!-- User Information -->
21:12:11:  <passkey v='********************************'/>
21:12:11:  <team v='224497'/>
21:12:11:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:12:11:
21:12:11:  <!-- Folding Slots -->
21:12:11:</config>
21:12:27:Removing old file 'configs/config-20170816-205345.xml'
21:12:27:Saving configuration to /etc/fahclient/config.xml
21:12:27:<config>
21:12:27:  <!-- Client Control -->
21:12:27:  <fold-anon v='true'/>
21:12:27:
21:12:27:  <!-- Folding Slot Configuration -->
21:12:27:  <cause v='PARKINSONS'/>
21:12:27:  <gpu v='false'/>
21:12:27:
21:12:27:  <!-- Network -->
21:12:27:  <proxy v=':8080'/>
21:12:27:
21:12:27:  <!-- Slot Control -->
21:12:27:  <power v='full'/>
21:12:27:
21:12:27:  <!-- User Information -->
21:12:27:  <passkey v='********************************'/>
21:12:27:  <team v='224497'/>
21:12:27:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:12:27:
21:12:27:  <!-- Folding Slots -->
21:12:27:</config>
21:13:17:WU01:FS00:Download 6.12%
21:13:24:WU01:FS00:Download 6.61%
21:13:31:WU01:FS00:Download 7.34%
21:13:40:WU01:FS00:Download 8.07%
21:13:52:WU01:FS00:Download 8.81%
21:14:02:Adding folding slot 00: READY gpu:0:GP104 [GeForce GTX 1080]
21:14:02:Removing old file 'configs/config-20170816-214334.xml'
21:14:02:Saving configuration to /etc/fahclient/config.xml
21:14:02:<config>
21:14:02:  <!-- Client Control -->
21:14:02:  <fold-anon v='true'/>
21:14:02:
21:14:02:  <!-- Folding Slot Configuration -->
21:14:02:  <cause v='PARKINSONS'/>
21:14:02:  <gpu v='false'/>
21:14:02:
21:14:02:  <!-- Network -->
21:14:02:  <proxy v=':8080'/>
21:14:02:
21:14:02:  <!-- Slot Control -->
21:14:02:  <power v='full'/>
21:14:02:
21:14:02:  <!-- User Information -->
21:14:02:  <passkey v='********************************'/>
21:14:02:  <team v='224497'/>
21:14:02:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:14:02:
21:14:02:  <!-- Folding Slots -->
21:14:02:  <slot id='0' type='GPU'/>
21:14:02:</config>
21:14:02:WU00:FS00:Connecting to 171.67.108.45:80
21:14:03:WU00:FS00:Assigned to work server 171.67.108.157
21:14:03:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GP104 [GeForce GTX 1080] from 171.67.108.157
21:14:03:WU00:FS00:Connecting to 171.67.108.157:8080
21:14:03:WU00:FS00:Downloading 5.13MiB
21:14:06:WU00:FS00:Download complete
21:14:06:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9414 run:1157 clone:0 gen:74 core:0x21 unit:0x00000058ab436c9d585e0698e5671a2f
21:14:06:WU00:FS00:Starting
21:14:06:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1663 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:14:06:WU00:FS00:Started FahCore on PID 3016
21:14:06:WU00:FS00:Core PID:3020
21:14:06:WU00:FS00:FahCore 0x21 started
21:14:06:WU00:FS00:0x21:*********************** Log Started 2017-08-30T21:14:06Z ***********************
21:14:06:WU00:FS00:0x21:Project: 9414 (Run 1157, Clone 0, Gen 74)
21:14:06:WU00:FS00:0x21:Unit: 0x00000058ab436c9d585e0698e5671a2f
21:14:06:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
21:14:06:WU00:FS00:0x21:Machine: 0
21:14:06:WU00:FS00:0x21:Reading tar file core.xml
21:14:06:WU00:FS00:0x21:Reading tar file integrator.xml
21:14:06:WU00:FS00:0x21:Reading tar file state.xml
21:14:06:WU00:FS00:0x21:Reading tar file system.xml
21:14:06:WU00:FS00:0x21:Digital signatures verified
21:14:06:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
21:14:06:WU00:FS00:0x21:Version 0.0.18
21:14:07:WU00:FS00:0x21:Completed 0 out of 6250000 steps (0%)
21:14:07:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:14:21:Adding folding slot 01: READY gpu:2:GM204 [GeForce GTX 970]
21:14:21:Removing old file 'configs/config-20170817-011405.xml'
21:14:21:Saving configuration to /etc/fahclient/config.xml
21:14:21:<config>
21:14:21:  <!-- Client Control -->
21:14:21:  <fold-anon v='true'/>
21:14:21:
21:14:21:  <!-- Folding Slot Configuration -->
21:14:21:  <cause v='PARKINSONS'/>
21:14:21:  <gpu v='false'/>
21:14:21:
21:14:21:  <!-- Network -->
21:14:21:  <proxy v=':8080'/>
21:14:21:
21:14:21:  <!-- Slot Control -->
21:14:21:  <power v='full'/>
21:14:21:
21:14:21:  <!-- User Information -->
21:14:21:  <passkey v='********************************'/>
21:14:21:  <team v='224497'/>
21:14:21:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:14:21:
21:14:21:  <!-- Folding Slots -->
21:14:21:  <slot id='0' type='GPU'/>
21:14:21:  <slot id='1' type='GPU'/>
21:14:21:</config>
21:14:21:WU02:FS01:Connecting to 171.67.108.45:80
21:14:22:WU02:FS01:Assigned to work server 171.67.108.160
21:14:22:WU02:FS01:Requesting new work unit for slot 01: READY gpu:2:GM204 [GeForce GTX 970] from 171.67.108.160
21:14:22:WU02:FS01:Connecting to 171.67.108.160:8080
21:14:24:WU02:FS01:Downloading 2.02MiB
21:14:25:WU02:FS01:Download complete
21:14:25:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:9841 run:3 clone:1 gen:272 core:0x21 unit:0x00000123ab436ca059568b61f18d0efa
21:14:25:WU02:FS01:Starting
21:14:25:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1663 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
21:14:25:WU02:FS01:Started FahCore on PID 3037
21:14:25:WU02:FS01:Core PID:3041
21:14:25:WU02:FS01:FahCore 0x21 started
21:14:26:WU02:FS01:0x21:*********************** Log Started 2017-08-30T21:14:25Z ***********************
21:14:26:WU02:FS01:0x21:Project: 9841 (Run 3, Clone 1, Gen 272)
21:14:26:WU02:FS01:0x21:Unit: 0x00000123ab436ca059568b61f18d0efa
21:14:26:WU02:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:14:26:WU02:FS01:0x21:Machine: 1
21:14:26:WU02:FS01:0x21:Reading tar file core.xml
21:14:26:WU02:FS01:0x21:Reading tar file integrator.xml
21:14:26:WU02:FS01:0x21:Reading tar file state.xml
21:14:26:WU02:FS01:0x21:Reading tar file system.xml
21:14:26:WU02:FS01:0x21:Digital signatures verified
21:14:26:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:14:26:WU02:FS01:0x21:Version 0.0.18
21:14:29:Removing old file 'configs/config-20170817-015041.xml'
21:14:29:Saving configuration to /etc/fahclient/config.xml
21:14:29:<config>
21:14:29:  <!-- Client Control -->
21:14:29:  <fold-anon v='true'/>
21:14:29:
21:14:29:  <!-- Folding Slot Configuration -->
21:14:29:  <cause v='PARKINSONS'/>
21:14:29:  <gpu v='false'/>
21:14:29:
21:14:29:  <!-- Network -->
21:14:29:  <proxy v=':8080'/>
21:14:29:
21:14:29:  <!-- Slot Control -->
21:14:29:  <power v='full'/>
21:14:29:
21:14:29:  <!-- User Information -->
21:14:29:  <passkey v='********************************'/>
21:14:29:  <team v='224497'/>
21:14:29:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:14:29:
21:14:29:  <!-- Folding Slots -->
21:14:29:  <slot id='0' type='GPU'/>
21:14:29:  <slot id='1' type='GPU'/>
21:14:29:</config>
21:14:30:WU02:FS01:0x21:Completed 0 out of 2400000 steps (0%)
21:14:30:WU02:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:14:50:WU00:FS00:0x21:Completed 62500 out of 6250000 steps (1%)
21:15:01:WU01:FS00:Download 9.30%
21:15:30:WU02:FS01:0x21:Completed 24000 out of 2400000 steps (1%)
21:15:32:WU00:FS00:0x21:Completed 125000 out of 6250000 steps (2%)
21:16:15:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)
21:16:31:WU02:FS01:0x21:Completed 48000 out of 2400000 steps (2%)
21:16:57:WU00:FS00:0x21:Completed 250000 out of 6250000 steps (4%)
21:17:32:WU02:FS01:0x21:Completed 72000 out of 2400000 steps (3%)
21:17:40:WU00:FS00:0x21:Completed 312500 out of 6250000 steps (5%)
21:18:22:WU00:FS00:0x21:Completed 375000 out of 6250000 steps (6%)
21:18:33:WU02:FS01:0x21:Completed 96000 out of 2400000 steps (4%)
21:19:05:WU00:FS00:0x21:Completed 437500 out of 6250000 steps (7%)
21:19:35:WU02:FS01:0x21:Completed 120000 out of 2400000 steps (5%)
21:19:48:WU00:FS00:0x21:Completed 500000 out of 6250000 steps (8%)
21:20:30:WU00:FS00:0x21:Completed 562500 out of 6250000 steps (9%)
21:20:36:WU02:FS01:0x21:Completed 144000 out of 2400000 steps (6%)
21:21:13:WU00:FS00:0x21:Completed 625000 out of 6250000 steps (10%)
21:21:37:WU02:FS01:0x21:Completed 168000 out of 2400000 steps (7%)
21:21:56:WU00:FS00:0x21:Completed 687500 out of 6250000 steps (11%)
21:22:38:WU02:FS01:0x21:Completed 192000 out of 2400000 steps (8%)
21:22:40:WU00:FS00:0x21:Completed 750000 out of 6250000 steps (12%)
21:23:15:FS00:Paused
21:23:15:FS01:Paused
21:23:15:FS00:Shutting core down
21:23:15:FS01:Shutting core down
21:23:15:WU02:FS01:0x21:Caught signal SIGINT(2) on PID 3041
21:23:15:WU02:FS01:0x21:Exiting, please wait. . .
21:23:15:WU00:FS00:0x21:Caught signal SIGINT(2) on PID 3020
21:23:15:WU00:FS00:0x21:Exiting, please wait. . .
21:23:16:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
21:23:16:WU02:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
21:23:19:FS00:Unpaused
21:23:19:FS01:Unpaused
21:23:19:WU00:FS00:Starting
21:23:19:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1663 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:23:19:WU00:FS00:Started FahCore on PID 3859
21:23:19:WU00:FS00:Core PID:3863
21:23:19:WU00:FS00:FahCore 0x21 started
21:23:19:WU02:FS01:Starting
21:23:19:WU02:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1663 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
21:23:19:WU02:FS01:Started FahCore on PID 3866
21:23:19:WU02:FS01:Core PID:3870
21:23:19:WU02:FS01:FahCore 0x21 started
21:23:19:WU00:FS00:0x21:*********************** Log Started 2017-08-30T21:23:19Z ***********************
21:23:19:WU00:FS00:0x21:Project: 9414 (Run 1157, Clone 0, Gen 74)
21:23:19:WU00:FS00:0x21:Unit: 0x00000058ab436c9d585e0698e5671a2f
21:23:19:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
21:23:19:WU00:FS00:0x21:Machine: 0
21:23:19:WU00:FS00:0x21:Digital signatures verified
21:23:19:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
21:23:19:WU00:FS00:0x21:Version 0.0.18
21:23:19:WU00:FS00:0x21:  Found a checkpoint file
21:23:19:WU02:FS01:0x21:*********************** Log Started 2017-08-30T21:23:19Z ***********************
21:23:19:WU02:FS01:0x21:Project: 9841 (Run 3, Clone 1, Gen 272)
21:23:19:WU02:FS01:0x21:Unit: 0x00000123ab436ca059568b61f18d0efa
21:23:19:WU02:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:23:19:WU02:FS01:0x21:Machine: 1
21:23:19:WU02:FS01:0x21:Digital signatures verified
21:23:19:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:23:19:WU02:FS01:0x21:Version 0.0.18
21:23:19:WU02:FS01:0x21:  Found a checkpoint file
21:23:20:WU00:FS00:0x21:Completed 800000 out of 6250000 steps (12%)
21:23:20:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:23:21:WU02:FS01:0x21:Completed 200000 out of 2400000 steps (8%)
21:23:21:WU02:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:23:28:WU00:FS00:0x21:Completed 812500 out of 6250000 steps (13%)
21:24:02:WU02:FS01:0x21:Completed 216000 out of 2400000 steps (9%)
21:24:11:WU00:FS00:0x21:Completed 875000 out of 6250000 steps (14%)
21:24:54:WU00:FS00:0x21:Completed 937500 out of 6250000 steps (15%)
21:25:03:WU02:FS01:0x21:Completed 240000 out of 2400000 steps (10%)
21:25:37:WU00:FS00:0x21:Completed 1000000 out of 6250000 steps (16%)
21:26:04:WU02:FS01:0x21:Completed 264000 out of 2400000 steps (11%)
21:26:20:WU00:FS00:0x21:Completed 1062500 out of 6250000 steps (17%)
21:27:03:WU00:FS00:0x21:Completed 1125000 out of 6250000 steps (18%)
21:27:05:WU02:FS01:0x21:Completed 288000 out of 2400000 steps (12%)
21:27:47:WU00:FS00:0x21:Completed 1187500 out of 6250000 steps (19%)
21:28:06:WU02:FS01:0x21:Completed 312000 out of 2400000 steps (13%)
21:28:30:WU00:FS00:0x21:Completed 1250000 out of 6250000 steps (20%)
21:29:07:WU02:FS01:0x21:Completed 336000 out of 2400000 steps (14%)
21:29:13:WU00:FS00:0x21:Completed 1312500 out of 6250000 steps (21%)
21:29:56:WU00:FS00:0x21:Completed 1375000 out of 6250000 steps (22%)
21:30:08:WU02:FS01:0x21:Completed 360000 out of 2400000 steps (15%)
21:30:39:WU00:FS00:0x21:Completed 1437500 out of 6250000 steps (23%)
21:31:09:WU02:FS01:0x21:Completed 384000 out of 2400000 steps (16%)
21:31:22:WU00:FS00:0x21:Completed 1500000 out of 6250000 steps (24%)
21:32:05:WU00:FS00:0x21:Completed 1562500 out of 6250000 steps (25%)
21:32:11:WU02:FS01:0x21:Completed 408000 out of 2400000 steps (17%)
21:32:48:WU00:FS00:0x21:Completed 1625000 out of 6250000 steps (26%)
21:33:12:WU02:FS01:0x21:Completed 432000 out of 2400000 steps (18%)
21:33:31:WU00:FS00:0x21:Completed 1687500 out of 6250000 steps (27%)
21:34:13:WU02:FS01:0x21:Completed 456000 out of 2400000 steps (19%)
21:34:14:WU00:FS00:0x21:Completed 1750000 out of 6250000 steps (28%)
21:34:57:WU00:FS00:0x21:Completed 1812500 out of 6250000 steps (29%)
21:35:14:WU02:FS01:0x21:Completed 480000 out of 2400000 steps (20%)
21:35:41:WU00:FS00:0x21:Completed 1875000 out of 6250000 steps (30%)
21:36:15:WU02:FS01:0x21:Completed 504000 out of 2400000 steps (21%)
21:36:24:WU00:FS00:0x21:Completed 1937500 out of 6250000 steps (31%)
21:37:07:WU00:FS00:0x21:Completed 2000000 out of 6250000 steps (32%)
21:37:16:WU02:FS01:0x21:Completed 528000 out of 2400000 steps (22%)
21:37:51:WU00:FS00:0x21:Completed 2062500 out of 6250000 steps (33%)
21:38:17:WU02:FS01:0x21:Completed 552000 out of 2400000 steps (23%)
21:38:34:WU00:FS00:0x21:Completed 2125000 out of 6250000 steps (34%)
21:39:17:WU00:FS00:0x21:Completed 2187500 out of 6250000 steps (35%)
21:39:18:WU02:FS01:0x21:Completed 576000 out of 2400000 steps (24%)
21:40:00:WU00:FS00:0x21:Completed 2250000 out of 6250000 steps (36%)
21:40:19:WU02:FS01:0x21:Completed 600000 out of 2400000 steps (25%)
21:40:44:WU00:FS00:0x21:Completed 2312500 out of 6250000 steps (37%)
21:41:21:WU02:FS01:0x21:Completed 624000 out of 2400000 steps (26%)
21:41:27:WU00:FS00:0x21:Completed 2375000 out of 6250000 steps (38%)
21:42:10:WU00:FS00:0x21:Completed 2437500 out of 6250000 steps (39%)
21:42:22:WU02:FS01:0x21:Completed 648000 out of 2400000 steps (27%)
21:42:54:WU00:FS00:0x21:Completed 2500000 out of 6250000 steps (40%)
21:43:23:WU02:FS01:0x21:Completed 672000 out of 2400000 steps (28%)
21:43:37:WU00:FS00:0x21:Completed 2562500 out of 6250000 steps (41%)
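
A stuck download like the WU01 one in this log can be picked out mechanically by comparing each work unit's last "Download N%" line with the last timestamp in the log. The following is only a minimal Python sketch, assuming the line layout shown above (HH:MM:SS:WUnn:FSnn:message). The function name, the 10-minute threshold, and the trimmed sample are illustrative and not part of FAHClient:

```python
import re

# "21:15:01:WU01:FS00:Download 9.30%" -> progress line for one work unit / slot
PROGRESS = re.compile(r"^(\d\d):(\d\d):(\d\d):(WU\d+):(FS\d+):Download (\d+(?:\.\d+)?)%")
# "21:08:28:WU00:FS01:Download complete" -> that download finished
COMPLETE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):Download complete")
# Leading HH:MM:SS stamp carried by the client log lines
STAMP = re.compile(r"^(\d\d):(\d\d):(\d\d):")


def to_seconds(h, m, s):
    """Convert HH, MM, SS strings to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def find_stalled_downloads(lines, threshold_s=600):
    """Return {(wu, slot): (last_percent, idle_seconds)} for unfinished
    downloads whose last progress line is at least threshold_s older than
    the end of the log. Assumes the log does not cross midnight."""
    last_seen = {}  # (wu, slot) -> (percent, seconds since midnight)
    log_end = 0
    for line in lines:
        stamp = STAMP.match(line)
        if stamp is None:
            continue  # banner lines without a leading timestamp
        log_end = to_seconds(*stamp.groups())
        done = COMPLETE.match(line)
        if done:
            last_seen.pop(done.groups(), None)  # finished, so not stalled
            continue
        prog = PROGRESS.match(line)
        if prog:
            h, m, s, wu, slot, pct = prog.groups()
            last_seen[(wu, slot)] = (float(pct), to_seconds(h, m, s))
    return {
        key: (pct, log_end - seen)
        for key, (pct, seen) in last_seen.items()
        if log_end - seen >= threshold_s
    }


# A few lines copied from the log above.
sample = [
    "21:08:24:WU01:FS00:Downloading 25.55MiB",
    "21:08:28:WU00:FS01:Download complete",
    "21:15:01:WU01:FS00:Download 9.30%",
    "21:43:37:WU00:FS00:0x21:Completed 2562500 out of 6250000 steps (41%)",
]
print(find_stalled_downloads(sample))
```

On the full log this would flag WU01:FS00, whose last progress line (9.30% at 21:15:01) is followed by nearly half an hour of log with no further download output.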


I appreciate any help.
Nert
 
Posts: 142
Joined: Wed Mar 26, 2014 7:46 pm

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 1:28 am

Another reboot solved the problem. I'm not sure why the first one didn't clean things up, or what the root cause is, but I'm up and running for now.
Nert
 

Re: Zombie Download Process

Postby bruce » Thu Aug 31, 2017 1:33 am

Restarting FAHClient should have cleared the zombie connection.
> sudo /etc/init.d/FAHClient restart
(at least in Ubuntu)
bruce
 
Posts: 21599
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 2:55 pm

Bruce,

Thanks for the info. I'm going to save that for future reference (along with some other stuff that I just discovered). This most recent experience showed me how little I actually know about FAH on Linux.
Nert
 

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 3:03 pm

Right after I posted that message, I noticed that I seem to have something going on again with the download process.

Code:
14:39:26:WU02:FS00:Connecting to 171.67.108.45:80
14:39:26:WU02:FS00:Assigned to work server 140.163.4.232
14:39:26:WU02:FS00:Requesting new work unit for slot 00: RUNNING gpu:0:GP104 [GeForce GTX 1080] from 140.163.4.232
14:39:26:WU02:FS00:Connecting to 140.163.4.232:8080
14:39:27:WU02:FS00:Downloading 36.32MiB
14:39:33:WU02:FS00:Download 0.69%
14:39:40:WU02:FS00:Download 1.20%
14:39:50:WU02:FS00:Download 1.72%
14:39:59:WU02:FS00:Download 1.89%
14:40:08:WU00:FS00:0x21:Completed 6250000 out of 6250000 steps (100%)
14:40:08:WU00:FS00:0x21:Saving result file logfile_01.txt
14:40:08:WU00:FS00:0x21:Saving result file checkpointState.xml
14:40:08:WU00:FS00:0x21:Saving result file checkpt.crc
14:40:08:WU00:FS00:0x21:Saving result file log.txt
14:40:08:WU00:FS00:0x21:Saving result file positions.xtc
14:40:08:WU00:FS00:0x21:Folding@home Core Shutdown: FINISHED_UNIT
14:40:08:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
14:40:08:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:9415 run:2600 clone:0 gen:24 core:0x21 unit:0x0000001eab436c9d585e06e132c1f606
14:40:09:WU00:FS00:Uploading 7.72MiB to 171.67.108.157
14:40:09:WU00:FS00:Connecting to 171.67.108.157:8080
14:40:15:WU00:FS00:Upload 23.47%
14:40:17:WU02:FS00:Download 2.07%
14:40:21:WU00:FS00:Upload 42.89%
14:40:27:WU00:FS00:Upload 61.50%
14:40:33:WU00:FS00:Upload 80.92%
14:40:39:WU00:FS00:Upload 100.00%
14:40:41:WU00:FS00:Upload complete
14:40:41:WU00:FS00:Server responded WORK_ACK (400)
14:40:41:WU00:FS00:Final credit estimate, 51574.00 points
14:40:41:WU00:FS00:Cleaning up
14:40:41:WU02:FS00:Download 2.24%
14:41:02:WU02:FS00:Download 2.75%
14:41:08:WU02:FS00:Download 3.27%
14:41:24:WU01:FS01:0x21:Completed 1200000 out of 5000000 steps (24%)
14:41:29:WU02:FS00:Download 3.79%
14:41:35:WU02:FS00:Download 4.30%
14:42:10:WU02:FS00:Download 4.65%
14:43:53:WU02:FS00:Download 4.82%
14:44:55:WU02:FS00:Download 4.99%
14:45:03:WU02:FS00:Download 5.34%
14:45:09:WU02:FS00:Download 5.51%
14:45:27:WU02:FS00:Download 5.68%
14:45:45:WU01:FS01:0x21:Completed 1250000 out of 5000000 steps (25%)
14:46:33:WU02:FS00:Download 5.85%
14:46:39:WU02:FS00:Download 6.20%
14:46:45:WU02:FS00:Download 6.54%
14:46:57:WU02:FS00:Download 7.06%
14:47:07:WU02:FS00:Download 7.57%
14:47:40:WU02:FS00:Download 7.74%
14:47:47:WU02:FS00:Download 8.09%
14:47:54:WU02:FS00:Download 8.61%
14:50:06:WU02:FS00:Download 8.78%
14:50:06:WU01:FS01:0x21:Completed 1300000 out of 5000000 steps (26%)
14:50:46:WU02:FS00:Download 8.95%
14:54:27:WU01:FS01:0x21:Completed 1350000 out of 5000000 steps (27%)
14:58:48:WU01:FS01:0x21:Completed 1400000 out of 5000000 steps (28%)
14:58:53:WU02:FS00:Download 9.12%
14:59:00:WU02:FS00:Download 9.47%
14:59:07:WU02:FS00:Download 9.81%
14:59:13:WU02:FS00:Download 10.15%
14:59:49:WU02:FS00:Download 10.50%


Since it's happening right now I could gather additional information if you tell me what to do.
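
One piece of information that can be pulled straight from the log is the effective download rate. The sketch below is a rough Python illustration, assuming the "Downloading X MiB" and "Download N%" line formats shown above. The function name is made up for this example and is not a FAHClient feature:

```python
import re

# "14:39:27:WU02:FS00:Downloading 36.32MiB" -> start of a download and its size
SIZE = re.compile(r"^(\d\d):(\d\d):(\d\d):(WU\d+):(FS\d+):Downloading (\d+(?:\.\d+)?)MiB")
# "14:59:49:WU02:FS00:Download 10.50%" -> progress report
PROGRESS = re.compile(r"^(\d\d):(\d\d):(\d\d):(WU\d+):(FS\d+):Download (\d+(?:\.\d+)?)%")


def to_seconds(h, m, s):
    """Convert HH, MM, SS strings to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def average_rate_kib_s(lines, wu, slot):
    """Average rate in KiB/s between the 'Downloading' line and the latest
    'Download N%' line for one work unit and slot. Returns None when the log
    does not contain both. Assumes the log does not cross midnight."""
    start = None
    size_mib = None
    latest = None  # (seconds since midnight, percent)
    for line in lines:
        m = SIZE.match(line)
        if m and m.group(4) == wu and m.group(5) == slot:
            start = to_seconds(m.group(1), m.group(2), m.group(3))
            size_mib = float(m.group(6))
            latest = None  # a new download began, forget older progress
            continue
        m = PROGRESS.match(line)
        if m and m.group(4) == wu and m.group(5) == slot and start is not None:
            latest = (to_seconds(m.group(1), m.group(2), m.group(3)), float(m.group(6)))
    if start is None or latest is None or latest[0] <= start:
        return None
    downloaded_kib = size_mib * 1024 * latest[1] / 100
    return downloaded_kib / (latest[0] - start)


# First and last download lines for WU02 from the log above.
sample = [
    "14:39:27:WU02:FS00:Downloading 36.32MiB",
    "14:59:49:WU02:FS00:Download 10.50%",
]
print(average_rate_kib_s(sample, "WU02", "FS00"))
```

For the excerpt above this works out to a little over 3 KiB/s (10.50% of 36.32 MiB in about 20 minutes), which points to a very slow transfer rather than one that has stopped completely.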
Nert
 

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 3:21 pm

Here is ping and traceroute info. Maybe some network guru can diagnose from this:

Code:
roger@mintz97 ~ $ ping 171.67.108.4
PING 171.67.108.4 (171.67.108.4) 56(84) bytes of data.

^C
--- 171.67.108.4 ping statistics ---
213 packets transmitted, 0 received, 100% packet loss, time 211999ms

roger@mintz97 ~ $ traceroute 171.67.108.4
traceroute to 171.67.108.4 (171.67.108.4), 64 hops max
  1   192.168.0.1  0.519ms  0.356ms  0.301ms
  2   207.109.2.15  12.268ms  12.257ms  11.900ms
  3   207.109.3.113  12.868ms  11.948ms  12.485ms
  4   216.160.19.2  12.710ms  12.785ms  13.108ms
  5   *  *  4.69.214.150  53.273ms
  6   4.15.122.46  58.016ms  57.136ms  58.295ms
  7   137.164.23.145  56.672ms  56.669ms  57.323ms
  8   *  *  *
  9   *  *  *
 10   *  *  *
 11   *  *  *
 12   *  *  *
 13   *  *  *
 14   *  *  *
 15   *  *  *
 16   *  *  *
 17   *  *  *
 18   *  *  *
 19   *  *  *
 20   *  *  *
 21   *  *  *
 22   *  *  *
 23   *  *  *
 24   *  *  *
 25   *  *  *
 26   *  *  *
 27   *  *  *
 28   *  *  *
 29   *  *  *
 30   *  *  *
 31   *  *  *
 32   *  *  *
 33   *  *  *
 34   *  *  *
 35   *  *  *
 36   *  *  *
 37   *  *  *
 38   *  *  *
 39   *  *  *
 40   *  *  *
 41   *  *  *
 42   *  *  *
 43   *  *  *
 44   *  *  *
 45   *  *  *
 46   *  *  *
 47   *  *  *
 48   *  *  *
 49   *  *  *
 50   *  *  *
 51   *  *  *
 52   *  *  *
 53   *  *  *
 54   *  *  *
 55   *  *  *
 56   *  *  *
 57   *  *  *
 58   *  *  *
 59   *  *  *
 60   *  *  *
 61   *  *  *
 62   *  *  *
 63   *  *  *
 64   *  *  *
roger@mintz97 ~ $
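
With this many "* * *" rows, the useful detail in the traceroute is the last hop that actually answered. A small Python sketch for pulling that out, assuming the output layout shown above (hop number, then addresses and timings). The helper name is hypothetical:

```python
import re

# A hop row starts with the hop number, e.g. "  7   137.164.23.145  56.672ms ..."
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")
# A bare dotted-quad token; timings such as "56.672ms" do not match
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def last_responding_hop(lines):
    """Return (hop_number, address) for the last traceroute row that contains
    an address, or None if no hop responded."""
    last = None
    for line in lines:
        m = HOP.match(line)
        if m is None:
            continue  # header or shell prompt
        addresses = [tok for tok in m.group(2).split() if IPV4.match(tok)]
        if addresses:
            last = (int(m.group(1)), addresses[-1])
    return last


# Rows copied from the traceroute above.
sample = [
    "traceroute to 171.67.108.4 (171.67.108.4), 64 hops max",
    "  4   216.160.19.2  12.710ms  12.785ms  13.108ms",
    "  5   *  *  4.69.214.150  53.273ms",
    "  6   4.15.122.46  58.016ms  57.136ms  58.295ms",
    "  7   137.164.23.145  56.672ms  56.669ms  57.323ms",
    "  8   *  *  *",
    "  9   *  *  *",
]
print(last_responding_hop(sample))
```

Here the last responding hop is hop 7 (137.164.23.145). Silence after that hop does not by itself show where the problem is, since routers and servers are often configured not to answer these probes.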


btw ... I'm more than a little embarrassed about having started a thread, then posting multiple responses before anyone has a chance to take a breath.... kinda like talking to yourself in public. :oops:

I'll wait for a hang and then restart the client as described by Bruce.
Nert
 

Re: Zombie Download Process

Postby rwh202 » Thu Aug 31, 2017 5:06 pm

Sorry, I can only be of limited help with the slow download (maybe a server admin can check the logs at their end), but what does a ping to another server look like? Ideally try another Stanford one and something else like google.com. That might help diagnose which end is having the biggest problem.
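
A failed ping does not necessarily mean a server is unreachable, because many hosts drop ICMP, while the client itself connects over TCP (ports 80 and 8080 in the logs posted earlier). A minimal Python sketch of a TCP-level check that could complement ping; the function name and the timeout are assumptions for illustration:

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout
    seconds, False if it is refused, unreachable, or times out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts, and name-resolution failures
        return False
```

For example, `tcp_reachable("171.67.108.157", 8080)` would test one of the work servers named in the first log. A successful connection only shows that the port accepts connections; it says nothing about transfer speed.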

Also, I'm interested in whether the client restart works.
I ask because, running Linux Mint, I've never been able to restart the client and have FAHControl reconnect; it has needed a machine reboot for me.
rwh202
 
Posts: 333
Joined: Mon Nov 15, 2010 8:51 pm
Location: South Coast, UK

Re: Zombie Download Process

Postby bruce » Thu Aug 31, 2017 5:13 pm

rwh202 wrote: Also, I'm interested in whether the client restart works.
I ask because, running Linux Mint, I've never been able to restart the client and have FAHControl reconnect; it has needed a machine reboot for me.


Just a guess, but are you restarting FAHClient using the script found in /etc/init.d (or an adaptation of it)? Restarting FAHClient needs more than just a simple restart.
bruce
 

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 7:03 pm

rwh202 - Here are a couple of pings and traceroutes for two addresses that are from the original log I posted. I'm assuming they're for Stanford, but don't know for sure:

Code:
roger@mintz97 ~ $ ping 171.67.108.157
PING 171.67.108.157 (171.67.108.157) 56(84) bytes of data.
64 bytes from 171.67.108.157: icmp_seq=1 ttl=57 time=57.2 ms
64 bytes from 171.67.108.157: icmp_seq=2 ttl=57 time=58.3 ms
64 bytes from 171.67.108.157: icmp_seq=3 ttl=57 time=57.3 ms
64 bytes from 171.67.108.157: icmp_seq=4 ttl=57 time=58.5 ms
64 bytes from 171.67.108.157: icmp_seq=5 ttl=57 time=57.9 ms
64 bytes from 171.67.108.157: icmp_seq=6 ttl=57 time=57.9 ms
^C
--- 171.67.108.157 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 57.229/57.905/58.551/0.469 ms
roger@mintz97 ~ $ traceroute 171.67.108.157
traceroute to 171.67.108.157 (171.67.108.157), 64 hops max
  1   192.168.0.1  0.456ms  0.318ms  0.298ms
  2   207.109.2.15  12.252ms  12.575ms  12.094ms
  3   207.109.3.113  12.659ms  12.065ms  12.287ms
  4   216.160.19.2  12.213ms  15.028ms  12.613ms
  5   *  *  *
  6   4.15.122.46  58.242ms  57.083ms  58.316ms
  7   137.164.23.145  60.974ms  58.478ms  57.131ms
  8   171.64.255.193  57.796ms  58.012ms  60.127ms
  9   171.67.108.157  57.834ms !*  58.270ms !*  57.599ms !*
roger@mintz97 ~ $ ping 140.163.4.233
PING 140.163.4.233 (140.163.4.233) 56(84) bytes of data.
^C
--- 140.163.4.233 ping statistics ---
20 packets transmitted, 0 received, 100% packet loss, time 19150ms

roger@mintz97 ~ $ traceroute 140.163.4.233
traceroute to 140.163.4.233 (140.163.4.233), 64 hops max
  1   192.168.0.1  0.502ms  0.323ms  0.305ms
  2   207.109.2.15  46.188ms  12.084ms  12.269ms
  3   207.109.3.113  12.191ms  12.553ms  11.699ms
  4   67.14.8.194  21.594ms  21.080ms  21.825ms
  5   173.205.63.229  21.936ms  21.560ms  21.641ms
  6   213.254.214.158  47.584ms  48.671ms  47.056ms
  7   173.205.47.226  46.934ms  47.010ms  46.937ms
  8   40.128.10.173  47.536ms  59.392ms  47.287ms
  9   *  40.128.248.7  51.435ms  51.630ms
 10   40.128.249.134  50.545ms  50.103ms  50.578ms
 11   74.8.57.6  50.432ms  51.004ms  50.183ms
 12   *  *  *
 13   *  *  *
 14   *  *  *
 15   *  *  *
 16   *  *  *
 17   *  *  *
 18   *  *  *
 19   *  *  *
 20   *  *  *
 21   *  *  *
 22   *  *  *
 23   *  *  *
 24   *  *  *
 25   *  *  *
 26   *  *  *
 27   *  *  *
 28   *  *  *
 29   *  *  *
 30   *  *  *
 31   *  *  *
 32   *  *  *
 33   *  *  *
 34   *  *  *
 35   *  *  *
 36   *  *  *
 37   *  *  *
 38   *  *  *
 39   *  *  *
 40   *  *  *
 41   *  *  *
 42   *  *  *
 43   *  *  *
 44   *  *  *
 45   *  *  *
 46   *  *  *
 47   *  *  *
 48   *  *  *
 49   *  *  *
 50   *  *  *
 51   *  *  *
 52   *  *  *
 53   *  *  *
 54   *  *  *
 55   *  *  *
 56   *  *  *
 57   *  *  *
 58   *  *  *
 59   *  *  *
 60   *  *  *
 61   *  *  *
 62   *  *  *
 63   *  *  *
 64   *  *  *
roger@mintz97 ~ $


I tried the command that Bruce gave me, "sudo /etc/init.d/FAHClient restart". Before I issued it, I stopped FAHControl. After the restart of FAHClient, I restarted FAHControl. Right now FAHControl is sitting there with everything on the screen grayed out. It says "Client:local Updating Inactive."

I went to /var/lib/fahclient and log.txt contains the following:

Code: Select all
roger@mintz97 ~ $ sudo /etc/init.d/FAHClient restart
[sudo] password for roger:
Stopping fahclient ... OK
Starting fahclient ... OK
roger@mintz97 ~ $ cd /var/lib/fah*
roger@mintz97 /var/lib/fahclient $ ls
configs  cores  GPUs.txt  logs  log.txt  work
roger@mintz97 /var/lib/fahclient $ cat log.txt
*********************** Log Started 2017-08-31T18:51:09Z ***********************
18:51:09:************************* Folding@home Client *************************
18:51:09:    Website: http://folding.stanford.edu/
18:51:09:  Copyright: (c) 2009-2014 Stanford University
18:51:09:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:51:09:       Args: --child --lifeline 9286 /etc/fahclient/config.xml --run-as
18:51:09:             fahclient --pid-file=/var/run/fahclient.pid --daemon
18:51:09:     Config: /etc/fahclient/config.xml
18:51:09:******************************** Build ********************************
18:51:09:    Version: 7.4.4
18:51:09:       Date: Mar 4 2014
18:51:09:       Time: 12:02:38
18:51:09:    SVN Rev: 4130
18:51:09:     Branch: fah/trunk/client
18:51:09:   Compiler: GNU 4.4.7
18:51:09:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
18:51:09:             -fno-unsafe-math-optimizations -msse2
18:51:09:   Platform: linux2 3.2.0-1-amd64
18:51:09:       Bits: 64
18:51:09:       Mode: Release
18:51:09:******************************* System ********************************
18:51:09:        CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
18:51:09:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
18:51:09:       CPUs: 4
18:51:09:     Memory: 15.61GiB
18:51:09:Free Memory: 10.88GiB
18:51:09:    Threads: POSIX_THREADS
18:51:09: OS Version: 3.19
18:51:09:Has Battery: false
18:51:09: On Battery: false
18:51:09: UTC Offset: -5
18:51:09:        PID: 9315
18:51:09:        CWD: /var/lib/fahclient
18:51:09:         OS: Linux 3.19.0-32-generic x86_64
18:51:09:    OS Arch: AMD64
18:51:09:       GPUs: 3
18:51:09:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]
18:51:09:      GPU 1: UNSUPPORTED: NV3 [PCI]
18:51:09:      GPU 2: NVIDIA:5 GM204 [GeForce GTX 970]
18:51:09:       CUDA: 6.1
18:51:09:CUDA Driver: 8000
18:51:09:***********************************************************************
18:51:09:<config>
18:51:09:  <!-- Client Control -->
18:51:09:  <fold-anon v='true'/>
18:51:09:
18:51:09:  <!-- Folding Slot Configuration -->
18:51:09:  <cause v='PARKINSONS'/>
18:51:09:  <gpu v='false'/>
18:51:09:
18:51:09:  <!-- Network -->
18:51:09:  <proxy v=':8080'/>
18:51:09:
18:51:09:  <!-- Slot Control -->
18:51:09:  <power v='full'/>
18:51:09:
18:51:09:  <!-- User Information -->
18:51:09:  <passkey v='********************************'/>
18:51:09:  <team v='224497'/>
18:51:09:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
18:51:09:
18:51:09:  <!-- Folding Slots -->
18:51:09:  <slot id='0' type='GPU'/>
18:51:09:  <slot id='1' type='GPU'/>
18:51:09:</config>
18:51:09:Switching to user fahclient
18:51:09:Trying to access database...
18:51:39:ERROR:Exception: Error executing: 'PRAGMA synchronous=NORMAL': database is locked
roger@mintz97 /var/lib/fahclient $


So, it looks like the database (wherever or whatever that is) is locked. No bueno. Looks like a reboot is in the cards. I will, however, leave the system as is in case other diagnostics would help pin down this problem. Please let me know if there is any other info I can provide.
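
For reference, "PRAGMA synchronous" is SQLite syntax, and "database is locked" is the message SQLite gives when another connection holds a lock on the file. One plausible reading, which is an assumption since nothing in the log confirms it, is that something still had the client's database open when the new FAHClient started; the 30 seconds between "Trying to access database..." and the error would fit a lock wait timing out. The sketch below reproduces the same message with Python's sqlite3 module on a throwaway file. The file and function names are made up, and it touches nothing under /var/lib/fahclient.

```python
import os
import sqlite3
import tempfile


def demo_locked_database():
    """Reproduce SQLite's 'database is locked' error with two connections."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.db")  # throwaway, hypothetical name

        # First connection stands in for a process that still has the
        # database open and is holding a lock on it.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE t (x INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")  # take and keep an exclusive lock

        # Second connection stands in for a freshly started client.
        # timeout=0 makes it give up at once instead of waiting.
        newcomer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            newcomer.execute("BEGIN IMMEDIATE")
            message = "no error"
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            newcomer.close()
            holder.execute("ROLLBACK")  # release the lock
            holder.close()
        return message
```

Calling demo_locked_database() returns the error text from the second connection. If the same thing is happening to the client, the fix would be to find whatever still has the database open rather than to reboot, but that remains a guess until someone confirms which process holds it.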
Nert
 
Posts: 142
Joined: Wed Mar 26, 2014 7:46 pm

Re: Zombie Download Process

Postby Nert » Thu Aug 31, 2017 9:52 pm

Just rebooted the system. FAHControl shows 2 slots folding and 1 "ready". Not sure what the ready one is about; it probably means trouble down the road. I'm done for now .... just wanted to get back to contributing to the science effort. Here's the log:

Code: Select all
*********************** Log Started 2017-08-31T21:44:09Z ***********************
21:44:09:************************* Folding@home Client *************************
21:44:09:    Website: http://folding.stanford.edu/
21:44:09:  Copyright: (c) 2009-2014 Stanford University
21:44:09:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:44:09:       Args: --child --lifeline 1679 /etc/fahclient/config.xml --run-as
21:44:09:             fahclient --pid-file=/var/run/fahclient.pid --daemon
21:44:09:     Config: /etc/fahclient/config.xml
21:44:09:******************************** Build ********************************
21:44:09:    Version: 7.4.4
21:44:09:       Date: Mar 4 2014
21:44:09:       Time: 12:02:38
21:44:09:    SVN Rev: 4130
21:44:09:     Branch: fah/trunk/client
21:44:09:   Compiler: GNU 4.4.7
21:44:09:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
21:44:09:             -fno-unsafe-math-optimizations -msse2
21:44:09:   Platform: linux2 3.2.0-1-amd64
21:44:09:       Bits: 64
21:44:09:       Mode: Release
21:44:09:******************************* System ********************************
21:44:09:        CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
21:44:09:     CPU ID: GenuineIntel Family 6 Model 60 Stepping 3
21:44:09:       CPUs: 4
21:44:09:     Memory: 15.61GiB
21:44:09:Free Memory: 15.16GiB
21:44:09:    Threads: POSIX_THREADS
21:44:09: OS Version: 3.19
21:44:09:Has Battery: false
21:44:09: On Battery: false
21:44:09: UTC Offset: -5
21:44:09:        PID: 1681
21:44:09:        CWD: /var/lib/fahclient
21:44:09:         OS: Linux 3.19.0-32-generic x86_64
21:44:09:    OS Arch: AMD64
21:44:09:       GPUs: 3
21:44:09:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]
21:44:09:      GPU 1: UNSUPPORTED: NV3 [PCI]
21:44:09:      GPU 2: NVIDIA:5 GM204 [GeForce GTX 970]
21:44:09:       CUDA: 6.1
21:44:09:CUDA Driver: 8000
21:44:09:***********************************************************************
21:44:09:<config>
21:44:09:  <!-- Client Control -->
21:44:09:  <fold-anon v='true'/>
21:44:09:
21:44:09:  <!-- Folding Slot Configuration -->
21:44:09:  <cause v='PARKINSONS'/>
21:44:09:  <gpu v='false'/>
21:44:09:
21:44:09:  <!-- Network -->
21:44:09:  <proxy v=':8080'/>
21:44:09:
21:44:09:  <!-- Slot Control -->
21:44:09:  <power v='full'/>
21:44:09:
21:44:09:  <!-- User Information -->
21:44:09:  <passkey v='********************************'/>
21:44:09:  <team v='224497'/>
21:44:09:  <user v='nert_ALL_1KqFJ6gDgARrEvTDsJFE9dXX3B4ttLsv1g'/>
21:44:09:
21:44:09:  <!-- Folding Slots -->
21:44:09:  <slot id='0' type='GPU'/>
21:44:09:  <slot id='1' type='GPU'/>
21:44:09:</config>
21:44:09:Switching to user fahclient
21:44:09:Trying to access database...
21:44:09:Successfully acquired database lock
21:44:10:Enabled folding slot 00: READY gpu:0:GP104 [GeForce GTX 1080]
21:44:10:Enabled folding slot 01: READY gpu:2:GM204 [GeForce GTX 970]
21:44:10:WU01:FS01:Starting
21:44:10:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1681 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
21:44:10:WU01:FS01:Started FahCore on PID 1707
21:44:10:WU01:FS01:Core PID:1721
21:44:10:WU01:FS01:FahCore 0x21 started
21:44:12:WARNING:WU00:FS00:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:44:12:ERROR:WU00:FS00:Exception: Could not get an assignment
21:44:12:WARNING:WU02:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:44:12:ERROR:WU02:FS01:Exception: Could not get an assignment
21:44:13:WARNING:WU00:FS00:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:44:13:ERROR:WU00:FS00:Exception: Could not get an assignment
21:44:13:WARNING:WU02:FS01:Exception: Could not get IP address for assign-GPU.stanford.edu: Temporary failure in name resolution
21:44:13:ERROR:WU02:FS01:Exception: Could not get an assignment
21:44:18:WU01:FS01:0x21:*********************** Log Started 2017-08-31T21:44:18Z ***********************
21:44:18:WU01:FS01:0x21:Project: 11431 (Run 2, Clone 13, Gen 143)
21:44:18:WU01:FS01:0x21:Unit: 0x000000b28ca304e858e137b836bf39bd
21:44:18:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
21:44:18:WU01:FS01:0x21:Machine: 1
21:44:18:WU01:FS01:0x21:Reading tar file core.xml
21:44:18:WU01:FS01:0x21:Reading tar file integrator.xml
21:44:18:WU01:FS01:0x21:Reading tar file state.xml
21:44:21:WU01:FS01:0x21:Reading tar file system.xml
21:44:24:WU01:FS01:0x21:Digital signatures verified
21:44:24:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
21:44:24:WU01:FS01:0x21:Version 0.0.18
21:44:35:WU01:FS01:0x21:Completed 0 out of 5000000 steps (0%)
21:44:35:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:45:13:WU00:FS00:Connecting to 171.67.108.45:80
21:45:13:WU02:FS01:Connecting to 171.67.108.45:80
21:45:13:WU00:FS00:Assigned to work server 171.67.108.157
21:45:13:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:GP104 [GeForce GTX 1080] from 171.67.108.157
21:45:13:WU00:FS00:Connecting to 171.67.108.157:8080
21:45:13:WU02:FS01:Assigned to work server 171.67.108.160
21:45:14:WU02:FS01:Requesting new work unit for slot 01: RUNNING gpu:2:GM204 [GeForce GTX 970] from 171.67.108.160
21:45:14:WU02:FS01:Connecting to 171.67.108.160:8080
21:45:14:WU00:FS00:Downloading 5.16MiB
21:45:17:WU02:FS01:Downloading 2.02MiB
21:45:18:WU00:FS00:Download complete
21:45:18:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9414 run:1968 clone:0 gen:64 core:0x21 unit:0x0000004bab436c9d585e069f630fa758
21:45:18:WU00:FS00:Starting
21:45:18:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 704 -lifeline 1681 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
21:45:18:WU00:FS00:Started FahCore on PID 2936
21:45:18:WU00:FS00:Core PID:2940
21:45:18:WU00:FS00:FahCore 0x21 started
21:45:18:WU02:FS01:Download complete
21:45:19:WU00:FS00:0x21:*********************** Log Started 2017-08-31T21:45:18Z ***********************
21:45:19:WU00:FS00:0x21:Project: 9414 (Run 1968, Clone 0, Gen 64)
21:45:19:WU00:FS00:0x21:Unit: 0x0000004bab436c9d585e069f630fa758
21:45:19:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
21:45:19:WU00:FS00:0x21:Machine: 0
21:45:19:WU00:FS00:0x21:Reading tar file core.xml
21:45:19:WU00:FS00:0x21:Reading tar file integrator.xml
21:45:19:WU00:FS00:0x21:Reading tar file state.xml
21:45:19:WU00:FS00:0x21:Reading tar file system.xml
21:45:19:WU00:FS00:0x21:Digital signatures verified
21:45:19:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
21:45:19:WU00:FS00:0x21:Version 0.0.18
21:45:19:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:9841 run:5 clone:4 gen:302 core:0x21 unit:0x00000156ab436ca059568b611be5979a
21:45:21:WU00:FS00:0x21:Completed 0 out of 6250000 steps (0%)
21:45:21:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
21:46:03:WU00:FS00:0x21:Completed 62500 out of 6250000 steps (1%)
21:46:46:WU00:FS00:0x21:Completed 125000 out of 6250000 steps (2%)
21:47:28:WU00:FS00:0x21:Completed 187500 out of 6250000 steps (3%)
21:48:11:WU00:FS00:0x21:Completed 250000 out of 6250000 steps (4%)
21:48:54:WU00:FS00:0x21:Completed 312500 out of 6250000 steps (5%)
21:48:57:WU01:FS01:0x21:Completed 50000 out of 5000000 steps (1%)
21:49:37:WU00:FS00:0x21:Completed 375000 out of 6250000 steps (6%)


Kinda disappointed to have to reboot a Linux system to get something running. I expect that from Windows, but not Linux.
Nert
 
Posts: 142
Joined: Wed Mar 26, 2014 7:46 pm

Re: Zombie Download Process

Postby bruce » Thu Aug 31, 2017 11:31 pm

Nert wrote: Just rebooted the system. FAHControl shows 2 slots folding and 1 "ready". Not sure what the ready one is about; it probably means trouble down the road.


False. Note that you have 3 GPUs which are managed by way of 2 slots. The UNSUPPORTED GPU won't cause you trouble down the road because it doesn't get a slot allocated to it.

18:51:09: GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080] gets assigned to slot 00
18:51:09: GPU 1: UNSUPPORTED: NV3 [PCI]
18:51:09: GPU 2: NVIDIA:5 GM204 [GeForce GTX 970] gets assigned to slot 01.
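
That mapping can be sketched in a few lines of Python. This only illustrates the pattern visible in this log (supported GPUs get slots in enumeration order and the UNSUPPORTED entry is skipped); how FAHClient actually decides is not shown in this thread, and the function name is made up.

```python
import re

# Matches system-section lines such as
# "18:51:09:      GPU 0: NVIDIA:5 GP104 [GeForce GTX 1080]"
# but not the "GPUs: 3" count line.
GPU_LINE = re.compile(r"GPU (\d+): (.+)$")


def guess_slot_mapping(log_lines):
    """Map slot number -> (GPU index, description), skipping unsupported GPUs.

    Assumption for illustration only: slots are handed out in GPU
    enumeration order, which matches this log but is not documented here.
    """
    mapping = {}
    next_slot = 0
    for line in log_lines:
        m = GPU_LINE.search(line.rstrip())
        if not m:
            continue
        index = int(m.group(1))
        description = m.group(2).strip()
        if description.startswith("UNSUPPORTED"):
            continue  # no slot is allocated for this device
        mapping[next_slot] = (index, description)
        next_slot += 1
    return mapping
```

Run over the three GPU lines above, it pairs slot 0 with GPU 0 and slot 1 with GPU 2, which agrees with the "Enabled folding slot" lines in the log.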
bruce
 
Posts: 21599
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Zombie Download Process

Postby Nert » Fri Sep 01, 2017 1:07 am

I should have said "Work Queue" has three entries ... I saw three items on the bottom half of the display.
Nert
 
Posts: 142
Joined: Wed Mar 26, 2014 7:46 pm

Re: Zombie Download Process

Postby bruce » Fri Sep 01, 2017 5:01 am

Nope. Work queue is something different.

When a WU is downloaded, it gets a work queue number, even before it starts processing, and it keeps that number even after it finishes processing (until the upload is successful), so there are often 3 WUs when you only have two slots. (It's possible to have more than three, but two should be the minimum, given that you have two slots.)

Your GPU 1 isn't assigned a number in either the slot enumeration or the WU enumeration since it is unsupported as far as FAH is concerned.
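
A quick way to see the difference in a log is to collect the WU and FS tags separately. A minimal Python sketch, assuming the "WUnn:FSnn" prefix format seen in the logs in this thread; the function name is invented for illustration.

```python
import re

# Log lines carry a "WUnn:FSnn" tag:
# WU = work queue entry, FS = folding slot.
TAG = re.compile(r"WU(\d+):FS(\d+)")


def queue_entries_per_slot(log_lines):
    """Return {slot id: sorted list of work queue ids seen for that slot}."""
    per_slot = {}
    for line in log_lines:
        m = TAG.search(line)
        if m:
            wu, slot = int(m.group(1)), int(m.group(2))
            per_slot.setdefault(slot, set()).add(wu)
    return {slot: sorted(wus) for slot, wus in per_slot.items()}
```

On the log Nert posted after the reboot, this gives work queue entry 00 for slot 00 and entries 01 and 02 for slot 01: two slots, three queue entries.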
bruce
 
Posts: 21599
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Zombie Download Process

Postby rwh202 » Fri Sep 01, 2017 6:16 am

Nert wrote:I should have said "Work Queue" has three entries ... I saw three items on the bottom half of the display.

I think that's the behaviour I have experienced too when attempting a restart on Mint using the script - it is able to download a WU but not record the fact in the 'database'.
When you restarted the computer, it then downloaded two new WUs and probably then registered the previous download. It should hopefully process it after one of the others completes.
rwh202
 
Posts: 333
Joined: Mon Nov 15, 2010 8:51 pm
Location: South Coast, UK

Re: Zombie Download Process

Postby Nert » Fri Sep 01, 2017 1:35 pm

rwh202,

I'm back to normal now. I did have additional problems following my last post. I resolved them by stopping the client and then deleting everything in /var/lib/fahclient/work/01 and /var/lib/fahclient/work/02 and then removing the directories 01 and 02 and rebooting. I noticed that there's a .lock file in those directories. That file probably caused the "database is locked" condition following restart that I reported. If I encounter this specific problem in the future I'll try this procedure without the reboot and see what happens. As nearly as I can tell, there are two problems that are as yet unresolved:

1) What was the network or server issue that caused the slow download that started the cascade of problems?
2) Why doesn't sudo /etc/init.d/FAHClient restart resolve the condition on Linux Mint?
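
For anyone repeating the cleanup step, it can be sketched in Python with a dry-run default. Assumptions: the client has already been stopped, the work directory is /var/lib/fahclient/work as in this thread, and the function name is made up. Anything in the removed directories is discarded for good, so list first and delete second.

```python
import shutil
from pathlib import Path


def clear_work_dirs(work_root, names, dry_run=True):
    """Remove the named subdirectories of work_root; return affected paths.

    With dry_run=True (the default) nothing is deleted; the paths are only
    reported, including any .lock files found inside them.
    """
    affected = []
    for name in names:
        target = Path(work_root) / name
        if not target.is_dir():
            continue  # nothing to do for a directory that is not there
        affected.extend(sorted(target.rglob(".lock")))
        affected.append(target)
        if not dry_run:
            shutil.rmtree(target)
    return affected
```

Calling clear_work_dirs("/var/lib/fahclient/work", ["01", "02"]) only lists what would go; passing dry_run=False performs the deletion and needs to run as a user with write access to that directory.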
Nert
 
Posts: 142
Joined: Wed Mar 26, 2014 7:46 pm
