171.67.108.158 again


Postby Blue_Bubble » Wed Aug 14, 2019 7:13 am

I now have several systems at various locations stuck in "Downloading" over the last couple of days.

The common feature appears to be the work server at 171.67.108.158 again:

Code: Select all
19:16:25:WU01:FS00:0xa7:Completed 4900000 out of 5000000 steps (98%)
19:24:55:WU01:FS00:0xa7:Completed 4950000 out of 5000000 steps (99%)
19:24:56:WU00:FS00:Connecting to 65.254.110.245:8080
19:24:56:WU00:FS00:Assigned to work server 171.67.108.158
19:24:56:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.67.108.158
19:24:56:WU00:FS00:Connecting to 171.67.108.158:8080
19:33:20:WU01:FS00:0xa7:Completed 5000000 out of 5000000 steps (100%)
19:33:20:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
19:33:20:WU01:FS00:0xa7:Saving result file ener.edr
19:33:20:WU01:FS00:0xa7:Saving result file frame77.trr
19:33:21:WU01:FS00:0xa7:Saving result file md.log
19:33:21:WU01:FS00:0xa7:Saving result file science.log
19:33:21:WU01:FS00:0xa7:Saving result file traj_comp.xtc
19:33:21:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
19:33:21:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:33:21:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14153 run:8 clone:67 gen:77 core:0xa7 unit:0x0000005c0002894b5c546d39bf0634fb
19:33:21:WU01:FS00:Uploading 3.12MiB to 155.247.166.219
19:33:21:WU01:FS00:Connecting to 155.247.166.219:8080
19:33:25:WU01:FS00:Upload complete
19:33:25:WU01:FS00:Server responded WORK_ACK (400)
19:33:25:WU01:FS00:Final credit estimate, 6468.00 points
19:33:25:WU01:FS00:Cleaning up
******************************* Date: 2019-08-14 *******************************
07:11:12:FS00:Paused
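
For anyone who wants to spot this pattern in their own log.txt, here is a minimal sketch in Python. It looks for a work unit whose last activity was a "Requesting new work unit" line followed only by "Connecting to" lines, as with WU00 above. The function name pending_downloads and the rule "any other message for that WU counts as progress" are assumptions made for illustration, not something FAHClient itself provides.

```python
import re

# Matches client log lines such as "19:24:56:WU00:FS00:Connecting to ..."
LINE = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")
REQUEST = re.compile(r"Requesting new work unit .* from (\S+)")


def pending_downloads(log_text):
    """Return {WU id: work server} for requests with no later progress.

    Assumption: after a request, any message for the same WU other than
    "Connecting to ..." means the client got past the stuck point.
    """
    pending = {}
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # date banners, slot-only lines such as "FS00:Paused"
        _, wu, _, msg = m.groups()
        req = REQUEST.search(msg)
        if req:
            pending[wu] = req.group(1)
        elif wu in pending and not msg.startswith("Connecting to"):
            del pending[wu]
    return pending
```

Run against the excerpt above, this would report WU00 as still waiting on 171.67.108.158. It only reads the log, so it cannot tell whether the server or the network is at fault.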
Blue_Bubble
 
Posts: 10
Joined: Wed Aug 23, 2017 2:29 pm
Location: Duxford, Cambridgeshire, UK

Re: 171.67.108.158 again

Postby toTOW » Wed Aug 14, 2019 11:12 am

The v7 client has always had trouble recovering from network events ... it could be caused by the work server, or not ...

When a slot is stuck at downloading, the only way to get rid of it is to restart (kill) the client, or to reboot the whole machine.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Postby toTOW » Wed Aug 14, 2019 5:22 pm

I can confirm that there is something wrong here. I have a client stuck at the same point ... :(

I notified the server owner.
toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Postby bruce » Wed Aug 14, 2019 6:10 pm

Can you ping 171.67.108.158?
Does FAH recover if you restart it?

FAHClient generally does not recover from a network interruption if it happens while the client is talking to a server.
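
A plain ping only shows that the host answers ICMP. As a rough complement, the sketch below checks whether a TCP connection to the port seen in the log (8080) can be opened at all. The helper name can_connect and the 5 second timeout are assumptions chosen for illustration. A successful connect does not prove that the server is actually handing out work.

```python
import socket


def can_connect(host, port=8080, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (needs network access):
#   can_connect("171.67.108.158")
```

If this returns False while other sites are reachable, the problem is more likely on the server side than in the local network.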
bruce
 
Posts: 22616
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Postby toTOW » Wed Aug 14, 2019 7:02 pm

It's a dedicated server, so I don't think the network is to blame ... looking at the server status page, 171.67.108.158 is getting many clients assigned to it. I guess it's overloaded and can't handle them all.

[Image: screenshot of the server status page]

But yes, I rebooted after installing kernel updates, and it got work from another FAH server.
toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Postby toTOW » Wed Aug 14, 2019 8:54 pm

Joseph acknowledged the issue.
jcoffland wrote: Yes, something is wrong with vspd4. I'm looking into it now.
toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Postby cmhbob » Thu Aug 15, 2019 5:35 pm

I'm also affected. Any update?
cmhbob
 
Posts: 5
Joined: Sat Jan 31, 2009 1:31 am

Re: 171.67.108.158 again

Postby bruce » Thu Aug 15, 2019 6:45 pm

Earlier, 171.67.108.158 was effectively down (not responding to HTTP connections) but still receiving a lot of attempted connections from clients, as you've shown in the screen-grab above. Apparently "assign rate" is the frequency at which clients are directed to that WS, not the number of successful assignments.

As far as I can tell, it has been restarted and has been functioning correctly since about 7:40 PM Stanford time yesterday. HTTP connections are being accepted, so hopefully WUs are being assigned.
bruce
 
Posts: 22616
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Postby toTOW » Thu Aug 15, 2019 10:00 pm

Joseph put the server in Accept only mode, so no clients should be assigned to it.

cmhbob wrote: I'm also affected. Any update?

Restart the client or reboot your machine.
toTOW
Site Moderator
 
Posts: 8766
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Postby bruce » Fri Aug 16, 2019 1:06 am

Notice that the projects from 171.67.108.158 do not appear on https://apps.foldingathome.org/psummary
bruce
 
Posts: 22616
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Postby cmhbob » Fri Aug 16, 2019 2:12 am

I rebooted earlier and things are running fine now.
cmhbob
 
Posts: 5
Joined: Sat Jan 31, 2009 1:31 am

