server 128.252.203.9 WU download failures

Moderators: Site Moderators, FAHC Science Team

goodyca
Posts: 187
Joined: Sun Dec 02, 2007 12:36 pm

Post by goodyca »

On 2019-04-10 at 22:48:37 UTC a request for a new WU (WU00) was made to the subject server. The log file then shows an 8.14MiB download starting. The download never proceeded, and the client was paused after 1 hour and 48 minutes.

19:48:35:WU01:FS00:0xa7:*********************** Log Started 2019-04-10T19:48:34Z ***********************
19:48:35:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:48:35:WU01:FS00:0xa7:       Type: 0xa7
19:48:35:WU01:FS00:0xa7:       Core: Gromacs
19:48:35:WU01:FS00:0xa7:    Website: https://foldingathome.org/
19:48:35:WU01:FS00:0xa7:  Copyright: (c) 2009-2018 foldingathome.org
19:48:35:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:48:35:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 28682 -checkpoint 15 -np
19:48:35:WU01:FS00:0xa7:             8
19:48:35:WU01:FS00:0xa7:     Config: <none>
19:48:35:WU01:FS00:0xa7:************************************ Build *************************************
19:48:35:WU01:FS00:0xa7:    Version: 0.0.17
19:48:35:WU01:FS00:0xa7:       Date: Apr 27 2018
19:48:35:WU01:FS00:0xa7:       Time: 19:09:21
19:48:35:WU01:FS00:0xa7: Repository: Git
19:48:35:WU01:FS00:0xa7:   Revision: 21359963583d09ec2063ef946399441c4df4ccd7
19:48:35:WU01:FS00:0xa7:     Branch: master
19:48:35:WU01:FS00:0xa7:   Compiler: GNU 6.3.0 20170516
19:48:35:WU01:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops
19:48:35:WU01:FS00:0xa7:   Platform: linux2 4.14.0-3-amd64
19:48:35:WU01:FS00:0xa7:       Bits: 64
19:48:35:WU01:FS00:0xa7:       Mode: Release
19:48:35:WU01:FS00:0xa7:       SIMD: avx_256
19:48:35:WU01:FS00:0xa7:************************************ System ************************************
19:48:35:WU01:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
19:48:35:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 45 Stepping 7
19:48:35:WU01:FS00:0xa7:       CPUs: 8
19:48:35:WU01:FS00:0xa7:     Memory: 7.71GiB
19:48:35:WU01:FS00:0xa7:Free Memory: 3.44GiB
19:48:35:WU01:FS00:0xa7:    Threads: POSIX_THREADS
19:48:35:WU01:FS00:0xa7: OS Version: 5.0
19:48:35:WU01:FS00:0xa7:Has Battery: false
19:48:35:WU01:FS00:0xa7: On Battery: false
19:48:35:WU01:FS00:0xa7: UTC Offset: -5
19:48:35:WU01:FS00:0xa7:        PID: 28686
19:48:35:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
19:48:35:WU01:FS00:0xa7:         OS: Linux 5.0.6-200.fc29.x86_64 x86_64
19:48:35:WU01:FS00:0xa7:    OS Arch: AMD64
19:48:35:WU01:FS00:0xa7:********************************************************************************
19:48:35:WU01:FS00:0xa7:Project: 13825 (Run 263, Clone 2, Gen 0)
19:48:35:WU01:FS00:0xa7:Unit: 0x0000000080fccb095c9565d0b410513d
19:48:35:WU01:FS00:0xa7:Reading tar file core.xml
19:48:35:WU01:FS00:0xa7:Reading tar file frame0.tpr
19:48:35:WU01:FS00:0xa7:Digital signatures verified
19:48:35:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -x frame0.xtc -cpt 15 -nt 8
19:48:35:WU01:FS00:0xa7:Steps: first=0 total=125000
19:48:40:WU01:FS00:0xa7:Completed 1 out of 125000 steps (0%)
19:48:40:WU00:FS00:Upload 23.31%
19:48:47:WU00:FS00:Upload 56.62%
19:48:55:WU00:FS00:Upload 84.92%
19:49:05:WU00:FS00:Upload complete
19:49:05:WU00:FS00:Server responded WORK_ACK (400)
19:49:05:WU00:FS00:Final credit estimate, 5842.00 points
19:49:05:WU00:FS00:Cleaning up
19:50:30:WU01:FS00:0xa7:Completed 1250 out of 125000 steps (1%)
19:52:20:WU01:FS00:0xa7:Completed 2500 out of 125000 steps (2%)
19:54:07:WU01:FS00:0xa7:Completed 3750 out of 125000 steps (3%)
...
22:45:02:WU01:FS00:0xa7:Completed 122500 out of 125000 steps (98%)
22:46:49:WU01:FS00:0xa7:Completed 123750 out of 125000 steps (99%)
22:48:36:WU01:FS00:0xa7:Completed 125000 out of 125000 steps (100%)
22:48:37:WU00:FS00:Connecting to 65.254.110.245:8080
22:48:37:WU00:FS00:Assigned to work server 128.252.203.9
22:48:37:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:8 from 128.252.203.9
22:48:37:WU00:FS00:Connecting to 128.252.203.9:8080
22:48:38:WU00:FS00:Downloading 8.14MiB
22:48:40:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
22:48:40:WU01:FS00:0xa7:Saving result file frame0.trr
22:48:40:WU01:FS00:0xa7:Saving result file frame0.xtc
22:48:40:WU01:FS00:0xa7:Saving result file md.log
22:48:40:WU01:FS00:0xa7:Saving result file science.log
22:48:40:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
22:48:41:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
22:48:41:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:13825 run:263 clone:2 gen:0 core:0xa7 unit:0x0000000080fccb095c9565d0b410513d
22:48:41:WU01:FS00:Uploading 11.26MiB to 128.252.203.9
22:48:41:WU01:FS00:Connecting to 128.252.203.9:8080
22:48:47:WU01:FS00:Upload 27.19%
22:48:53:WU01:FS00:Upload 55.50%
22:49:01:WU01:FS00:Upload 82.13%
22:49:11:WU01:FS00:Upload complete
22:49:11:WU01:FS00:Server responded WORK_ACK (400)
22:49:11:WU01:FS00:Final credit estimate, 5821.00 points
22:49:11:WU01:FS00:Cleaning up
******************************* Date: 2019-04-11 *******************************
00:36:06:FS00:Paused

The same thing happened on 2019-04-14 at 23:19:29 UTC with the subject server. The client was shut down after hanging on the download for 1 hour and 33 minutes.

18:38:55:WU01:FS00:0xa7:*********************** Log Started 2019-04-14T18:38:55Z ***********************
18:38:55:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
18:38:55:WU01:FS00:0xa7:       Type: 0xa7
18:38:55:WU01:FS00:0xa7:       Core: Gromacs
18:38:55:WU01:FS00:0xa7:    Website: https://foldingathome.org/
18:38:55:WU01:FS00:0xa7:  Copyright: (c) 2009-2018 foldingathome.org
18:38:55:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
18:38:55:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 1323 -checkpoint 15 -np 8
18:38:55:WU01:FS00:0xa7:     Config: <none>
18:38:55:WU01:FS00:0xa7:************************************ Build *************************************
18:38:55:WU01:FS00:0xa7:    Version: 0.0.17
18:38:55:WU01:FS00:0xa7:       Date: Apr 27 2018
18:38:55:WU01:FS00:0xa7:       Time: 19:09:21
18:38:55:WU01:FS00:0xa7: Repository: Git
18:38:55:WU01:FS00:0xa7:   Revision: 21359963583d09ec2063ef946399441c4df4ccd7
18:38:55:WU01:FS00:0xa7:     Branch: master
18:38:55:WU01:FS00:0xa7:   Compiler: GNU 6.3.0 20170516
18:38:55:WU01:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops
18:38:55:WU01:FS00:0xa7:   Platform: linux2 4.14.0-3-amd64
18:38:55:WU01:FS00:0xa7:       Bits: 64
18:38:55:WU01:FS00:0xa7:       Mode: Release
18:38:55:WU01:FS00:0xa7:       SIMD: avx_256
18:38:55:WU01:FS00:0xa7:************************************ System ************************************
18:38:55:WU01:FS00:0xa7:        CPU: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
18:38:55:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 45 Stepping 7
18:38:55:WU01:FS00:0xa7:       CPUs: 8
18:38:55:WU01:FS00:0xa7:     Memory: 7.71GiB
18:38:55:WU01:FS00:0xa7:Free Memory: 2.59GiB
18:38:55:WU01:FS00:0xa7:    Threads: POSIX_THREADS
18:38:55:WU01:FS00:0xa7: OS Version: 5.0
18:38:55:WU01:FS00:0xa7:Has Battery: false
18:38:55:WU01:FS00:0xa7: On Battery: false
18:38:55:WU01:FS00:0xa7: UTC Offset: -5
18:38:55:WU01:FS00:0xa7:        PID: 1327
18:38:55:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
18:38:55:WU01:FS00:0xa7:         OS: Linux 5.0.6-200.fc29.x86_64 x86_64
18:38:55:WU01:FS00:0xa7:    OS Arch: AMD64
18:38:55:WU01:FS00:0xa7:********************************************************************************
18:38:55:WU01:FS00:0xa7:Project: 14103 (Run 13, Clone 11, Gen 4)
18:38:55:WU01:FS00:0xa7:Unit: 0x000000060002894b5c929b28b30073f3
18:38:55:WU01:FS00:0xa7:Reading tar file core.xml
18:38:55:WU01:FS00:0xa7:Reading tar file frame4.tpr
18:38:55:WU01:FS00:0xa7:Digital signatures verified
18:38:55:WU01:FS00:0xa7:Calling: mdrun -s frame4.tpr -o frame4.trr -cpt 15 -nt 8
18:38:55:WU01:FS00:0xa7:Steps: first=5000000 total=1250000
18:38:57:WU01:FS00:0xa7:Completed 1 out of 1250000 steps (0%)
18:39:01:WU00:FS00:Upload 33.73%
18:39:07:WU00:FS00:Upload 62.44%
18:39:14:WU00:FS00:Upload 89.71%
18:39:19:WU00:FS00:Upload complete
18:39:19:WU00:FS00:Server responded WORK_ACK (400)
18:39:19:WU00:FS00:Final credit estimate, 8506.00 points
18:39:19:WU00:FS00:Cleaning up
18:41:45:WU01:FS00:0xa7:Completed 12500 out of 1250000 steps (1%)
18:44:33:WU01:FS00:0xa7:Completed 25000 out of 1250000 steps (2%)
18:47:21:WU01:FS00:0xa7:Completed 37500 out of 1250000 steps (3%)
...
23:13:52:WU01:FS00:0xa7:Completed 1225000 out of 1250000 steps (98%)
23:16:40:WU01:FS00:0xa7:Completed 1237500 out of 1250000 steps (99%)
23:19:28:WU01:FS00:0xa7:Completed 1250000 out of 1250000 steps (100%)
23:19:29:WU00:FS00:Connecting to 65.254.110.245:8080
23:19:29:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
23:19:29:WU01:FS00:0xa7:Saving result file frame4.trr
23:19:29:WU00:FS00:Assigned to work server 128.252.203.9
23:19:29:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:8 from 128.252.203.9
23:19:29:WU00:FS00:Connecting to 128.252.203.9:8080
23:19:30:WU00:FS00:Downloading 8.14MiB
23:19:31:WU01:FS00:0xa7:Saving result file md.log
23:19:31:WU01:FS00:0xa7:Saving result file pullf.xvg
23:19:31:WU01:FS00:0xa7:Saving result file pullx.xvg
23:19:31:WU01:FS00:0xa7:Saving result file science.log
23:19:31:WU01:FS00:0xa7:Saving result file traj_comp.xtc
23:19:31:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
23:19:31:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
23:19:31:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14103 run:13 clone:11 gen:4 core:0xa7 unit:0x000000060002894b5c929b28b30073f3
23:19:32:WU01:FS00:Uploading 9.87MiB to 155.247.166.219
23:19:32:WU01:FS00:Connecting to 155.247.166.219:8080
23:19:38:WU01:FS00:Upload 25.97%
23:19:44:WU01:FS00:Upload 48.14%
23:19:50:WU01:FS00:Upload 82.35%
23:20:00:WU01:FS00:Upload complete
23:20:00:WU01:FS00:Server responded WORK_ACK (400)
23:20:00:WU01:FS00:Final credit estimate, 9638.00 points
23:20:00:WU01:FS00:Cleaning up
00:52:23:Lost lifeline PID 1508, exiting
00:52:23:Lost lifeline PID 1508, exiting
00:52:29:Caught signal SIGTERM(15) on PID 1510
00:52:29:Exiting, please wait. . .
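
The stall times can be checked directly from the log timestamps. Below is a minimal Python sketch that takes the `Downloading 8.14MiB` line and the last relevant line from each of the two logs above and computes the elapsed time. The helper names `log_time` and `stall_duration` are made up for illustration, and the code assumes each stall lasted less than 24 hours, since the log lines carry only a time of day.

```python
from datetime import datetime, timedelta


def log_time(line: str) -> datetime:
    """Parse the leading HH:MM:SS timestamp of a client log line."""
    return datetime.strptime(line[:8], "%H:%M:%S")


def stall_duration(download_line: str, last_line: str) -> timedelta:
    """Elapsed time between two log lines, allowing one midnight rollover.

    Assumption: the two lines are less than 24 hours apart.
    """
    delta = log_time(last_line) - log_time(download_line)
    if delta < timedelta(0):
        # The log crossed midnight, so the later line has a smaller time of day.
        delta += timedelta(days=1)
    return delta


# Lines copied from the first log (2019-04-10 into 2019-04-11).
first = stall_duration(
    "22:48:38:WU00:FS00:Downloading 8.14MiB",
    "00:36:06:FS00:Paused",
)

# Lines copied from the second log (2019-04-14 into 2019-04-15).
second = stall_duration(
    "23:19:30:WU00:FS00:Downloading 8.14MiB",
    "00:52:23:Lost lifeline PID 1508, exiting",
)

print(first)   # 1:47:28
print(second)  # 1:32:53
```

In both cases no further line for WU00 appears in the log between the `Downloading` line and the pause or shutdown.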
goodyca
Posts: 187
Joined: Sun Dec 02, 2007 12:36 pm

Re: server 128.252.203.9 WU download failures

Post by goodyca »

Any word from the Pande group about this problem?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: server 128.252.203.9 WU download failures

Post by bruce »

I asked, but have not heard back yet. This week is often a holiday at various universities, so staffing may be reduced.