171.67.108.158 again

Moderators: Site Moderators, FAHC Science Team

Blue_Bubble
Posts: 12
Joined: Wed Aug 23, 2017 2:29 pm
Location: Duxford, Cambridgeshire, UK

171.67.108.158 again

Post by Blue_Bubble »

I now have several systems at various locations stuck in "Downloading" over the last couple of days.

Common feature appears to be the Work Server at 171.67.108.158 again:

19:16:25:WU01:FS00:0xa7:Completed 4900000 out of 5000000 steps (98%)
19:24:55:WU01:FS00:0xa7:Completed 4950000 out of 5000000 steps (99%)
19:24:56:WU00:FS00:Connecting to 65.254.110.245:8080
19:24:56:WU00:FS00:Assigned to work server 171.67.108.158
19:24:56:WU00:FS00:Requesting new work unit for slot 00: RUNNING cpu:4 from 171.67.108.158
19:24:56:WU00:FS00:Connecting to 171.67.108.158:8080
19:33:20:WU01:FS00:0xa7:Completed 5000000 out of 5000000 steps (100%)
19:33:20:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
19:33:20:WU01:FS00:0xa7:Saving result file ener.edr
19:33:20:WU01:FS00:0xa7:Saving result file frame77.trr
19:33:21:WU01:FS00:0xa7:Saving result file md.log
19:33:21:WU01:FS00:0xa7:Saving result file science.log
19:33:21:WU01:FS00:0xa7:Saving result file traj_comp.xtc
19:33:21:WU01:FS00:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
19:33:21:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
19:33:21:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:14153 run:8 clone:67 gen:77 core:0xa7 unit:0x0000005c0002894b5c546d39bf0634fb
19:33:21:WU01:FS00:Uploading 3.12MiB to 155.247.166.219
19:33:21:WU01:FS00:Connecting to 155.247.166.219:8080
19:33:25:WU01:FS00:Upload complete
19:33:25:WU01:FS00:Server responded WORK_ACK (400)
19:33:25:WU01:FS00:Final credit estimate, 6468.00 points
19:33:25:WU01:FS00:Cleaning up
******************************* Date: 2019-08-14 *******************************
07:11:12:FS00:Paused
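
For anyone checking several machines, the pattern in the log above (a work unit whose last entry is a "Connecting to" line with nothing after it) can be spotted with a short script. This is only a rough sketch: the function name `find_stuck_downloads` is made up for illustration, the line format is inferred from the excerpt above rather than from any FAHClient specification, and a trailing "Connecting to" line is a hint of a stuck download, not proof.

```python
import re

# Log line shape inferred from the excerpt above: HH:MM:SS:WUnn:FSnn:message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(.*)$")
CONNECT_RE = re.compile(r"^Connecting to (\d+\.\d+\.\d+\.\d+):(\d+)$")


def find_stuck_downloads(log_text):
    """Return {(wu, slot): server_ip} for work units whose last logged
    message is a 'Connecting to <ip>:<port>' line (heuristic only)."""
    last_message = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            _, wu, slot, message = match.groups()
            # Later lines overwrite earlier ones, so only the last survives.
            last_message[(wu, slot)] = message

    stuck = {}
    for key, message in last_message.items():
        connect = CONNECT_RE.match(message)
        if connect:
            stuck[key] = connect.group(1)
    return stuck
```

Run against the excerpt above, this would flag WU00 on slot FS00 as waiting on 171.67.108.158, while WU01 (last entry "Cleaning up") is not flagged.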
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Post by toTOW »

The v7 client has always had trouble recovering from network events ... it could be caused by the work server, or not ...

When a slot is stuck at downloading, the only way to get rid of it is to restart (kill) the client, or restart the whole machine with a system reboot.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Post by toTOW »

I can confirm that there is something wrong here. I have a client stuck at the same point ... :(

I have notified the server owner.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Post by bruce »

Can you ping 171.67.108.158?
Does FAH recover if you restart it?

FAHClient generally does not recover from a network interruption if it happens while the client is talking to a server.
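
If ping is blocked or inconclusive, a plain TCP connection attempt is another quick check. The sketch below is an illustration only: the helper name `can_connect` is made up, and port 8080 is simply the port shown in the log earlier in this thread. A successful connect shows that the port accepts connections, not that the work server is actually handing out work.

```python
import socket


def can_connect(host, port=8080, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(can_connect("171.67.108.158"))
```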
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Post by toTOW »

It's a dedicated server, so I don't think the network is to blame ... Looking at the server status page, 171.67.108.158 is getting many clients assigned to it. I guess it's overloaded and can't handle them all.

Image

But yes, I rebooted after installing kernel updates, and it got work from another FAH server.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Post by toTOW »

Joseph acknowledged the issue.
jcoffland wrote: Yes, something is wrong with vspd4. I'm looking into it now.
cmhbob
Posts: 4
Joined: Sat Jan 31, 2009 1:31 am

Re: 171.67.108.158 again

Post by cmhbob »

I'm also affected. Any update?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Post by bruce »

Earlier, 171.67.108.158 was effectively down (not responding to HTTP connections) but still receiving a lot of attempted connections from clients, as shown in the screen-grab above. Apparently "assign rate" is the frequency at which clients are directed to that WS, not the number of successful assignments.

As far as I can tell, it has been restarted and has been functioning correctly since about 7:40 PM Stanford time yesterday. HTTP connections are being accepted, so hopefully WUs are being assigned.
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 171.67.108.158 again

Post by toTOW »

Joseph put the server in accept-only mode, so no clients should be assigned to it.
cmhbob wrote: I'm also affected. Any update?
Restart the client or reboot your machine.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 171.67.108.158 again

Post by bruce »

Note that the projects from 171.67.108.158 do not appear on https://apps.foldingathome.org/psummary
cmhbob
Posts: 4
Joined: Sat Jan 31, 2009 1:31 am

Re: 171.67.108.158 again

Post by cmhbob »

I rebooted earlier and things are running fine now.