Folding stopped for no apparent reason.


Post by SKeptical_Thinker » Fri Jan 31, 2014 2:24 am

Folding stopped at 19:33:14 and it got very, very quiet in the log.

I tried to restart it with the advanced control at 01:59 by pausing and unpausing the slot, but nothing useful happened.

Code:
17:23:32:WU00:FS00:Upload 93.43%
17:23:38:WU00:FS00:Upload 96.75%
17:23:46:WU00:FS00:Upload complete
17:23:46:WU00:FS00:Server responded WORK_ACK (400)
17:23:46:WU00:FS00:Final credit estimate, 6890.00 points
17:23:47:WU00:FS00:Cleaning up
17:23:49:WU01:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
17:25:06:WU01:FS00:0xa4:Completed 7500 out of 250000 steps  (3%)
17:26:24:WU01:FS00:0xa4:Completed 10000 out of 250000 steps  (4%)
17:27:41:WU01:FS00:0xa4:Completed 12500 out of 250000 steps  (5%)
17:28:59:WU01:FS00:0xa4:Completed 15000 out of 250000 steps  (6%)
17:30:17:WU01:FS00:0xa4:Completed 17500 out of 250000 steps  (7%)
17:31:34:WU01:FS00:0xa4:Completed 20000 out of 250000 steps  (8%)
17:32:52:WU01:FS00:0xa4:Completed 22500 out of 250000 steps  (9%)
17:34:10:WU01:FS00:0xa4:Completed 25000 out of 250000 steps  (10%)
17:35:28:WU01:FS00:0xa4:Completed 27500 out of 250000 steps  (11%)
17:36:45:WU01:FS00:0xa4:Completed 30000 out of 250000 steps  (12%)
17:38:03:WU01:FS00:0xa4:Completed 32500 out of 250000 steps  (13%)
17:39:21:WU01:FS00:0xa4:Completed 35000 out of 250000 steps  (14%)
17:40:39:WU01:FS00:0xa4:Completed 37500 out of 250000 steps  (15%)
17:41:57:WU01:FS00:0xa4:Completed 40000 out of 250000 steps  (16%)
17:43:15:WU01:FS00:0xa4:Completed 42500 out of 250000 steps  (17%)
17:44:33:WU01:FS00:0xa4:Completed 45000 out of 250000 steps  (18%)
17:45:52:WU01:FS00:0xa4:Completed 47500 out of 250000 steps  (19%)
17:47:10:WU01:FS00:0xa4:Completed 50000 out of 250000 steps  (20%)
17:48:29:WU01:FS00:0xa4:Completed 52500 out of 250000 steps  (21%)
17:49:48:WU01:FS00:0xa4:Completed 55000 out of 250000 steps  (22%)
17:51:06:WU01:FS00:0xa4:Completed 57500 out of 250000 steps  (23%)
17:52:25:WU01:FS00:0xa4:Completed 60000 out of 250000 steps  (24%)
17:53:44:WU01:FS00:0xa4:Completed 62500 out of 250000 steps  (25%)
17:55:02:WU01:FS00:0xa4:Completed 65000 out of 250000 steps  (26%)
17:56:20:WU01:FS00:0xa4:Completed 67500 out of 250000 steps  (27%)
17:57:39:WU01:FS00:0xa4:Completed 70000 out of 250000 steps  (28%)
17:58:58:WU01:FS00:0xa4:Completed 72500 out of 250000 steps  (29%)
18:00:16:WU01:FS00:0xa4:Completed 75000 out of 250000 steps  (30%)
18:01:35:WU01:FS00:0xa4:Completed 77500 out of 250000 steps  (31%)
18:02:53:WU01:FS00:0xa4:Completed 80000 out of 250000 steps  (32%)
18:04:11:WU01:FS00:0xa4:Completed 82500 out of 250000 steps  (33%)
18:05:30:WU01:FS00:0xa4:Completed 85000 out of 250000 steps  (34%)
18:06:48:WU01:FS00:0xa4:Completed 87500 out of 250000 steps  (35%)
18:08:06:WU01:FS00:0xa4:Completed 90000 out of 250000 steps  (36%)
18:09:24:WU01:FS00:0xa4:Completed 92500 out of 250000 steps  (37%)
18:10:42:WU01:FS00:0xa4:Completed 95000 out of 250000 steps  (38%)
18:12:01:WU01:FS00:0xa4:Completed 97500 out of 250000 steps  (39%)
18:13:18:WU01:FS00:0xa4:Completed 100000 out of 250000 steps  (40%)
18:14:37:WU01:FS00:0xa4:Completed 102500 out of 250000 steps  (41%)
18:15:55:WU01:FS00:0xa4:Completed 105000 out of 250000 steps  (42%)
18:17:13:WU01:FS00:0xa4:Completed 107500 out of 250000 steps  (43%)
18:18:31:WU01:FS00:0xa4:Completed 110000 out of 250000 steps  (44%)
18:19:49:WU01:FS00:0xa4:Completed 112500 out of 250000 steps  (45%)
18:21:07:WU01:FS00:0xa4:Completed 115000 out of 250000 steps  (46%)
18:22:25:WU01:FS00:0xa4:Completed 117500 out of 250000 steps  (47%)
18:23:43:WU01:FS00:0xa4:Completed 120000 out of 250000 steps  (48%)
18:25:01:WU01:FS00:0xa4:Completed 122500 out of 250000 steps  (49%)
18:26:19:WU01:FS00:0xa4:Completed 125000 out of 250000 steps  (50%)
18:27:36:WU01:FS00:0xa4:Completed 127500 out of 250000 steps  (51%)
18:28:54:WU01:FS00:0xa4:Completed 130000 out of 250000 steps  (52%)
18:30:13:WU01:FS00:0xa4:Completed 132500 out of 250000 steps  (53%)
18:31:32:WU01:FS00:0xa4:Completed 135000 out of 250000 steps  (54%)
18:32:51:WU01:FS00:0xa4:Completed 137500 out of 250000 steps  (55%)
18:34:10:WU01:FS00:0xa4:Completed 140000 out of 250000 steps  (56%)
18:35:28:WU01:FS00:0xa4:Completed 142500 out of 250000 steps  (57%)
18:36:48:WU01:FS00:0xa4:Completed 145000 out of 250000 steps  (58%)
18:38:06:WU01:FS00:0xa4:Completed 147500 out of 250000 steps  (59%)
18:39:26:WU01:FS00:0xa4:Completed 150000 out of 250000 steps  (60%)
18:40:44:WU01:FS00:0xa4:Completed 152500 out of 250000 steps  (61%)
18:42:02:WU01:FS00:0xa4:Completed 155000 out of 250000 steps  (62%)
18:43:21:WU01:FS00:0xa4:Completed 157500 out of 250000 steps  (63%)
18:44:38:WU01:FS00:0xa4:Completed 160000 out of 250000 steps  (64%)
18:45:56:WU01:FS00:0xa4:Completed 162500 out of 250000 steps  (65%)
18:47:14:WU01:FS00:0xa4:Completed 165000 out of 250000 steps  (66%)
18:48:32:WU01:FS00:0xa4:Completed 167500 out of 250000 steps  (67%)
******************************* Date: 2014-01-30 *******************************
18:49:50:WU01:FS00:0xa4:Completed 170000 out of 250000 steps  (68%)
18:51:08:WU01:FS00:0xa4:Completed 172500 out of 250000 steps  (69%)
18:52:26:WU01:FS00:0xa4:Completed 175000 out of 250000 steps  (70%)
18:53:44:WU01:FS00:0xa4:Completed 177500 out of 250000 steps  (71%)
18:55:03:WU01:FS00:0xa4:Completed 180000 out of 250000 steps  (72%)
18:56:23:WU01:FS00:0xa4:Completed 182500 out of 250000 steps  (73%)
18:57:42:WU01:FS00:0xa4:Completed 185000 out of 250000 steps  (74%)
18:59:00:WU01:FS00:0xa4:Completed 187500 out of 250000 steps  (75%)
19:00:19:WU01:FS00:0xa4:Completed 190000 out of 250000 steps  (76%)
19:01:38:WU01:FS00:0xa4:Completed 192500 out of 250000 steps  (77%)
19:02:56:WU01:FS00:0xa4:Completed 195000 out of 250000 steps  (78%)
19:04:15:WU01:FS00:0xa4:Completed 197500 out of 250000 steps  (79%)
19:05:33:WU01:FS00:0xa4:Completed 200000 out of 250000 steps  (80%)
19:06:51:WU01:FS00:0xa4:Completed 202500 out of 250000 steps  (81%)
19:08:08:WU01:FS00:0xa4:Completed 205000 out of 250000 steps  (82%)
19:09:26:WU01:FS00:0xa4:Completed 207500 out of 250000 steps  (83%)
19:10:43:WU01:FS00:0xa4:Completed 210000 out of 250000 steps  (84%)
19:12:01:WU01:FS00:0xa4:Completed 212500 out of 250000 steps  (85%)
19:13:19:WU01:FS00:0xa4:Completed 215000 out of 250000 steps  (86%)
19:14:38:WU01:FS00:0xa4:Completed 217500 out of 250000 steps  (87%)
19:15:56:WU01:FS00:0xa4:Completed 220000 out of 250000 steps  (88%)
19:17:15:WU01:FS00:0xa4:Completed 222500 out of 250000 steps  (89%)
19:18:34:WU01:FS00:0xa4:Completed 225000 out of 250000 steps  (90%)
19:19:53:WU01:FS00:0xa4:Completed 227500 out of 250000 steps  (91%)
19:21:12:WU01:FS00:0xa4:Completed 230000 out of 250000 steps  (92%)
19:22:32:WU01:FS00:0xa4:Completed 232500 out of 250000 steps  (93%)
19:23:51:WU01:FS00:0xa4:Completed 235000 out of 250000 steps  (94%)
19:25:09:WU01:FS00:0xa4:Completed 237500 out of 250000 steps  (95%)
19:26:28:WU01:FS00:0xa4:Completed 240000 out of 250000 steps  (96%)
19:27:47:WU01:FS00:0xa4:Completed 242500 out of 250000 steps  (97%)
19:29:05:WU01:FS00:0xa4:Completed 245000 out of 250000 steps  (98%)
19:30:24:WU01:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
19:31:43:WU01:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
19:31:43:WU01:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
19:31:44:WU00:FS00:Connecting to assign3.stanford.edu:8080
19:31:51:WU00:FS00:News: Welcome to Folding@Home
19:31:51:WU00:FS00:Assigned to work server 128.143.199.97
19:31:51:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:8 from 128.143.199.97
19:31:51:WU00:FS00:Connecting to 128.143.199.97:8080
19:31:52:WU00:FS00:Downloading 2.22MiB
19:31:53:WU01:FS00:0xa4:
19:31:53:WU01:FS00:0xa4:Finished Work Unit:
19:31:53:WU01:FS00:0xa4:- Reading up to 811800 from "01/wudata_01.trr": Read 811800
19:31:53:WU01:FS00:0xa4:trr file hash check passed.
19:31:53:WU01:FS00:0xa4:- Reading up to 746284 from "01/wudata_01.xtc": Read 746284
19:31:53:WU01:FS00:0xa4:xtc file hash check passed.
19:31:53:WU01:FS00:0xa4:edr file hash check passed.
19:31:53:WU01:FS00:0xa4:logfile size: 22940
19:31:53:WU01:FS00:0xa4:Leaving Run
19:31:56:WU01:FS00:0xa4:- Writing 1583512 bytes of core data to disk...
19:31:56:WU01:FS00:0xa4:Done: 1583000 -> 1538251 (compressed to 97.1 percent)
19:31:56:WU01:FS00:0xa4:  ... Done.
19:32:47:WU01:FS00:0xa4:- Shutting down core
19:32:47:WU01:FS00:0xa4:
19:32:47:WU01:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
19:32:54:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:32:54:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9007 run:2 clone:24 gen:13 core:0xa4 unit:0x00000010664f2de452ba296b5646cae7
19:32:54:WU01:FS00:Uploading 1.47MiB to 171.64.65.124
19:32:54:WU01:FS00:Connecting to 171.64.65.124:8080
19:33:00:WU01:FS00:Upload 25.55%
19:33:06:WU01:FS00:Upload 63.89%
19:33:12:WU01:FS00:Upload 97.96%
19:33:14:WU01:FS00:Upload complete
19:33:14:WU01:FS00:Server responded WORK_ACK (400)
19:33:14:WU01:FS00:Final credit estimate, 1652.00 points
19:33:14:WU01:FS00:Cleaning up
******************************* Date: 2014-01-31 *******************************
01:59:21:FS00:Paused
01:59:30:FS00:Unpaused
02:00:15:Removing old file 'configs/config-20130316-162922.xml'
02:00:15:Saving configuration to /etc/fahclient/config.xml
02:00:15:<config>
02:00:15:  <!-- Folding Slot Configuration -->
02:00:15:  <client-type v='advanced'/>
02:00:15:  <power v='full'/>
02:00:15:
02:00:15:  <!-- HTTP Server -->
02:00:15:  <allow v='127.0.0.1 192.168.0.0/16'/>
02:00:15:
02:00:15:  <!-- Network -->
02:00:15:  <proxy v=':8080'/>
02:00:15:
02:00:15:  <!-- Remote Command Server -->
02:00:15:  <command-allow-no-pass v='127.0.0.1 192.168.0.0/16'/>
02:00:15:
02:00:15:  <!-- User Information -->
02:00:15:  <passkey v='********************************'/>
02:00:15:  <team v='31574'/>
02:00:15:  <user v='SKeptical_Thinker'/>
02:00:15:
02:00:15:  <!-- Work Unit Control -->
02:00:15:  <next-unit-percentage v='100'/>
02:00:15:
02:00:15:  <!-- Folding Slots -->
02:00:15:  <slot id='0' type='CPU'>
02:00:15:    <cpus v='8'/>
02:00:15:  </slot>
02:00:15:</config>


I got onto a console and did this:

~$ service FAHClient restart
Stopping fahclient ... FAILED
Starting fahclient ... FAILED
fahclient seems to be already running with PID 1461
~$ ps alx | grep FAH
4 115 1461 1 20 0 114144 4896 hrtime Sl ? 3:49 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
0 115 1585 1461 20 0 719436 53700 hrtime Sl ? 16:41 /usr/bin/FAHClient --child --lifeline 1461 /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
0 1000 14571 14462 20 0 13596 920 pipe_w S+ pts/0 0:00 grep --color=auto FAH
$ sudo killall FAHClient
$ ps alx | grep FAH
4 115 1461 1 20 0 114144 4896 hrtime Sl ? 3:50 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
0 115 1585 1461 20 0 719436 53532 futex_ Sl ? 16:42 /usr/bin/FAHClient --child --lifeline 1461 /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
0 1000 14626 14462 20 0 13596 924 pipe_w S+ pts/0 0:00 grep --color=auto FAH
$ sudo killall FAHClient
$ sudo service FAHClient start
Starting fahclient ... OK

Here is the log file after the restart:

Code:
*********************** Log Started 2014-01-31T02:20:21Z ***********************
02:20:21:************************* Folding@home Client *************************
02:20:21:    Website: http://folding.stanford.edu/
02:20:21:  Copyright: (c) 2009-2013 Stanford University
02:20:21:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:20:21:       Args: --child --lifeline 14668 /etc/fahclient/config.xml --run-as
02:20:21:             fahclient --pid-file=/var/run/fahclient.pid --daemon
02:20:21:     Config: /etc/fahclient/config.xml
02:20:21:******************************** Build ********************************
02:20:21:    Version: 7.3.6
02:20:21:       Date: Feb 18 2013
02:20:21:       Time: 07:24:08
02:20:21:    SVN Rev: 3923
02:20:21:     Branch: fah/trunk/client
02:20:21:   Compiler: GNU 4.4.7
02:20:21:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
02:20:21:             -fno-unsafe-math-optimizations -msse2
02:20:21:   Platform: linux2 3.2.0-1-amd64
02:20:21:       Bits: 64
02:20:21:       Mode: Release
02:20:21:******************************* System ********************************
02:20:21:        CPU: AMD FX(tm)-8120 Eight-Core Processor
02:20:21:     CPU ID: AuthenticAMD Family 21 Model 1 Stepping 2
02:20:21:       CPUs: 8
02:20:21:     Memory: 7.75GiB
02:20:21:Free Memory: 6.27GiB
02:20:21:    Threads: POSIX_THREADS
02:20:21:Has Battery: false
02:20:21: On Battery: false
02:20:21: UTC offset: -5
02:20:21:        PID: 14676
02:20:21:        CWD: /var/lib/fahclient
02:20:21:         OS: Linux 3.8.0-35-generic x86_64
02:20:21:    OS Arch: AMD64
02:20:21:       GPUs: 0
02:20:21:       CUDA: Not detected
02:20:21:***********************************************************************
02:20:21:<config>
02:20:21:  <!-- Folding Slot Configuration -->
02:20:21:  <client-type v='advanced'/>
02:20:21:  <power v='full'/>
02:20:21:
02:20:21:  <!-- HTTP Server -->
02:20:21:  <allow v='127.0.0.1 192.168.0.0/16'/>
02:20:21:
02:20:21:  <!-- Network -->
02:20:21:  <proxy v=':8080'/>
02:20:21:
02:20:21:  <!-- Remote Command Server -->
02:20:21:  <command-allow-no-pass v='127.0.0.1 192.168.0.0/16'/>
02:20:21:
02:20:21:  <!-- User Information -->
02:20:21:  <passkey v='********************************'/>
02:20:21:  <team v='31574'/>
02:20:21:  <user v='SKeptical_Thinker'/>
02:20:21:
02:20:21:  <!-- Work Unit Control -->
02:20:21:  <next-unit-percentage v='100'/>
02:20:21:
02:20:21:  <!-- Folding Slots -->
02:20:21:  <slot id='0' type='CPU'>
02:20:21:    <cpus v='8'/>
02:20:21:  </slot>
02:20:21:</config>
02:20:21:Switching to user fahclient
02:20:21:Trying to access database...
02:20:21:Successfully acquired database lock
02:20:21:Enabled folding slot 00: READY cpu:8
02:20:22:WU00:FS00:Connecting to assign3.stanford.edu:8080
02:20:23:WU00:FS00:News: Welcome to Folding@Home
02:20:23:WU00:FS00:Assigned to work server 171.64.65.124
02:20:23:WU00:FS00:Requesting new work unit for slot 00: READY cpu:8 from 171.64.65.124
02:20:23:WU00:FS00:Connecting to 171.64.65.124:8080
02:20:24:WU00:FS00:Downloading 862.06KiB
02:20:30:WU00:FS00:Download 81.66%
02:20:31:WU00:FS00:Download complete
02:20:31:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9005 run:26 clone:35 gen:3 core:0xa4 unit:0x00000008664f2de452b8018e580026aa
02:20:31:WU00:FS00:Starting
02:20:31:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 703 -lifeline 14676 -checkpoint 15 -np 8
02:20:31:WU00:FS00:Started FahCore on PID 14686
02:20:31:WU00:FS00:Core PID:14690
02:20:31:WU00:FS00:FahCore 0xa4 started
02:20:32:WU00:FS00:0xa4:
02:20:32:WU00:FS00:0xa4:*------------------------------*
02:20:32:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
02:20:32:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
02:20:32:WU00:FS00:0xa4:
02:20:32:WU00:FS00:0xa4:Preparing to commence simulation
02:20:32:WU00:FS00:0xa4:- Looking at optimizations...
02:20:32:WU00:FS00:0xa4:- Created dyn
02:20:32:WU00:FS00:0xa4:- Files status OK
02:20:32:WU00:FS00:0xa4:- Expanded 882237 -> 1469104 (decompressed 166.5 percent)
02:20:32:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=882237 data_size=1469104, decompressed_data_size=1469104 diff=0
02:20:32:WU00:FS00:0xa4:- Digital signature verified
02:20:32:WU00:FS00:0xa4:
02:20:32:WU00:FS00:0xa4:Project: 9005 (Run 26, Clone 35, Gen 3)
02:20:32:WU00:FS00:0xa4:
02:20:32:WU00:FS00:0xa4:Assembly optimizations on if available.
02:20:32:WU00:FS00:0xa4:Entering M.D.
02:20:38:WU00:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
02:21:57:WU00:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
SKeptical_Thinker
 
Posts: 254
Joined: Tue Apr 29, 2008 11:02 pm

Re: Folding stopped for no apparent reason.

Post by Joe_H » Fri Jan 31, 2014 2:44 am

It appears you ran into a known network "bug" in the folding client: a download started but never finished, and the client did not reset the connection to retry it. You can see the download begin at:
Code:
19:31:52:WU00:FS00:Downloading 2.22MiB

Nothing later in the first log reports progress for that download. Compare the second log, where "Download 81.66%" and "Download complete" follow the "Downloading" line within seconds.
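
For anyone who wants to check a log for this signature without reading it line by line, here is a rough Python sketch. It is not part of FAHClient, and the name `find_stalled_downloads` is made up for illustration. It assumes the v7 log line layout seen in this thread (`HH:MM:SS:WUnn:FSnn:message`) and flags any "Downloading" line that is never followed by "Download complete" for the same work unit and slot.

```python
import re

# Matches client lines such as "19:31:52:WU00:FS00:Downloading 2.22MiB".
# Lines without a WUnn field (date separators, "FS00:Paused") do not match.
LINE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")


def find_stalled_downloads(log_lines):
    """Return (time, wu, slot) for each download that never reports completion."""
    pending = {}  # (wu, slot) -> time the download started
    for raw in log_lines:
        m = LINE.match(raw.strip())
        if not m:
            continue
        time, wu, slot, msg = m.groups()
        if msg.startswith("Downloading "):
            pending[(wu, slot)] = time
        elif msg.startswith("Download complete"):
            pending.pop((wu, slot), None)
    return [(t, wu, slot) for (wu, slot), t in pending.items()]


# Excerpt of the first log in this thread: the download starts and goes silent.
stalled_log = [
    "19:31:51:WU00:FS00:Connecting to 128.143.199.97:8080",
    "19:31:52:WU00:FS00:Downloading 2.22MiB",
    "19:33:14:WU01:FS00:Upload complete",
    "19:33:14:WU01:FS00:Cleaning up",
    "01:59:21:FS00:Paused",
]

# Excerpt of the log after the restart: the download finishes normally.
healthy_log = [
    "02:20:24:WU00:FS00:Downloading 862.06KiB",
    "02:20:30:WU00:FS00:Download 81.66%",
    "02:20:31:WU00:FS00:Download complete",
]

print(find_stalled_downloads(stalled_log))
print(find_stalled_downloads(healthy_log))
```

Note that a download still in progress at the end of a live log would be flagged too, so this only means something when the "Downloading" line is already old.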

There is a ticket for this; as best I recall, it is still open. In the meantime, killing and restarting the FAHClient process is the only way to get folding to resume. Sometimes a system reboot is needed, and it can be quicker in some cases. It is not a common problem, but it does occur from time to time on all three OSes.
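
Since a restart is the only workaround, it helps to notice the stall early. The sketch below is one possible way to measure how long a log went quiet; it is an illustration only, and `longest_silence` is a hypothetical helper, not a client feature. It assumes every entry starts with `HH:MM:SS:` and that a timestamp going backwards means exactly one midnight rollover.

```python
import re
from datetime import timedelta

# Every client log entry in this thread starts with "HH:MM:SS:".
STAMP = re.compile(r"^(\d\d):(\d\d):(\d\d):")


def longest_silence(log_lines):
    """Return (gap, line_before, line_after) for the longest pause between entries.

    Entries carry no date, so a time that goes backwards is treated as a
    single midnight rollover (an assumption of this sketch).
    """
    day = timedelta(0)
    prev_time = None
    prev_line = None
    best = (timedelta(0), None, None)
    for raw in log_lines:
        line = raw.strip()
        m = STAMP.match(line)
        if not m:
            continue  # date separators and other unstamped lines
        h, mi, s = map(int, m.groups())
        t = timedelta(hours=h, minutes=mi, seconds=s) + day
        if prev_time is not None and t < prev_time:
            day += timedelta(days=1)
            t += timedelta(days=1)
        if prev_time is not None and t - prev_time > best[0]:
            best = (t - prev_time, prev_line, line)
        prev_time, prev_line = t, line
    return best


# Excerpt of the first log in this thread, around the point where it went quiet.
sample = [
    "19:33:14:WU01:FS00:Upload complete",
    "19:33:14:WU01:FS00:Cleaning up",
    "******************************* Date: 2014-01-31 *******************************",
    "01:59:21:FS00:Paused",
    "01:59:30:FS00:Unpaused",
]

gap, before, after = longest_silence(sample)
print(gap)  # 6:26:07
```

What counts as "too quiet" is a judgment call. In the healthy part of the log above, progress lines arrive a little over a minute apart, so a gap of many minutes while a slot is supposed to be running is worth a look.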

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4174
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Folding stopped for no apparent reason.

Post by bruce » Fri Jan 31, 2014 3:57 am

Joe_H wrote: There is a ticket for this, as best as I recall it is still open.


You're probably thinking of https://fah-web.stanford.edu/projects/F ... ticket/983
bruce
 
Posts: 21534
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Folding stopped for no apparent reason.

Post by SKeptical_Thinker » Fri Jan 31, 2014 11:52 am

Thanks. Now that I know the signature, I won't report this again. But ---

Re: Folding stopped for no apparent reason.

Post by codysluder » Fri Jan 31, 2014 9:52 pm

Yes, "But ---"
codysluder
 
Posts: 2128
Joined: Sun Dec 02, 2007 12:43 pm

