Multiple WU's Fail downld/upld to 155.247.166.*


Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby JimF » Sat Nov 09, 2019 12:57 am

My last two machines are now down. If the work units have to be run in order to clear them out, but running them takes everyone down in the process, it appears to be a classic Catch-22.
Let me know when they figure it out.

(I was wondering about Stanford - thanks for the update. I did not know they had turned over the load entirely).
JimF
 
Posts: 497
Joined: Thu Jan 21, 2010 2:03 pm

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby Paragon » Sat Nov 09, 2019 3:00 am

I can confirm that 219 is still down...took out two of my four machines today. I just rebooted one machine 5 times and it got stuck each time...although the last attempt actually threw an error and then it switched servers.

Code:
02:48:54:WU00:FS01:Assigned to work server 155.247.166.219
02:48:54:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580] from 155.247.166.219
02:48:54:WU00:FS01:Connecting to 155.247.166.219:8080
02:48:55:WU00:FS01:Downloading 27.46MiB
02:49:04:WU00:FS01:Download 0.46%
02:50:58:WU00:FS01:Download 0.64%
02:50:58:ERROR:WU00:FS01:Exception: Transfer failed
02:50:58:WU00:FS01:Connecting to 65.254.110.245:8080
02:50:58:WU00:FS01:Assigned to work server 155.247.166.220
02:50:58:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:Ellesmere XT [Radeon RX 470/480/570/580] from 155.247.166.220
02:50:58:WU00:FS01:Connecting to 155.247.166.220:8080
02:50:59:WU00:FS01:Downloading 15.63MiB
02:51:04:WU00:FS01:Download complete
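For anyone scanning their own logs for the same symptom, here is a minimal sketch in Python (not anything from the actual FAHClient; the match strings are based only on the log excerpt above, not on any official log specification):

Code:
import re

# Minimal sketch: walk a FAHClient-style log, remember the last work
# server a slot connected to, and report any transfer failure against
# it. The embedded lines are taken from the excerpt above.
log = """\
02:48:54:WU00:FS01:Connecting to 155.247.166.219:8080
02:50:58:ERROR:WU00:FS01:Exception: Transfer failed
02:50:58:WU00:FS01:Connecting to 155.247.166.220:8080
02:51:04:WU00:FS01:Download complete
"""

server = None
for line in log.splitlines():
    m = re.search(r"Connecting to ([\d.]+):\d+", line)
    if m:
        server = m.group(1)
    elif "Exception: Transfer failed" in line and server:
        print(f"transfer failed while talking to {server}")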
Paragon
 
Posts: 32
Joined: Fri Oct 21, 2011 3:24 am
Location: United States

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby HaloJones » Sat Nov 09, 2019 8:06 am

Strange in this day and age that this isn't all virtualised and running off AWS or Azure.
1x Titan X, 1x 1070ti, 4x 1070
HaloJones
 
Posts: 352
Joined: Thu Jul 24, 2008 10:16 am

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby bruce » Sat Nov 09, 2019 2:00 pm

That download was from *.220, not *.219 ... but the activity levels look normal on serverstat based on recent updates.
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby Joe_H » Sat Nov 09, 2019 2:07 pm

HaloJones wrote:Strange in this day and age that this isn't all virtualised and running off AWS or Azure.

Some parts of F@h have already been moved to cloud services, and as I understand it, more will be in the future. But that takes programming, time and money. As an example of one that is already in the cloud: if you look at the server stats page for assign2, the IP address listed is in a private address range; its actual hosting is on one of Amazon's servers.
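If anyone wants to check that themselves, here is a minimal sketch using Python's standard ipaddress module (the 10.x address is a hypothetical stand-in for whatever serverstat lists for assign2):

Code:
import ipaddress

# Hypothetical stand-in for an assign2-style listing, plus the two
# public work servers discussed in this thread.
for addr in ["10.0.0.12", "155.247.166.219", "155.247.166.220"]:
    ip = ipaddress.ip_address(addr)
    # is_private is True for RFC 1918 ranges such as 10.0.0.0/8,
    # which is what you would see for a cloud-hosted internal address.
    print(f"{addr}: private={ip.is_private}")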

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4574
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby MeeLee » Sun Nov 10, 2019 1:54 pm

They could reduce the size of WUs from 155.247.* to give it a lower load.

I think it's valuable to have this server send as few WUs as possible to consistent clients (clients or users that process many WUs quickly).
A slow WU would make little difference for someone who folds intermittently, but it is a huge pain for someone with a server to maintain every time one of his GPUs is down.
MeeLee
 
Posts: 370
Joined: Tue Feb 19, 2019 10:16 pm

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby Joe_H » Sun Nov 10, 2019 3:03 pm

The size of a WU from an existing project cannot be changed. Ultimately it will take more servers to spread the load, and for the WUs that are being processed on the older A7 core to finish being returned.
Joe_H
Site Admin
 
Posts: 4574
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby bruce » Sun Nov 10, 2019 5:21 pm

MeeLee wrote:They could reduce the size of WUs from 155.247.*, make it get a lower load.

No.

We could reduce the load by having the servers tell clients "No WUs for your client's configuration" but that's not an acceptable solution.

Running a single WU that takes (say) 4 hours will download and upload the same amount of data as two WUs that take 2 hours each. Then, too, the two WUs have to make an extra upload connection and an extra download connection. Nothing is gained by making smaller WUs.
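A back-of-the-envelope way to see it (all byte counts here are invented for illustration, except the 27.46 MiB download, which matches the log earlier in the thread):

Code:
# One 4-hour WU vs. two 2-hour WUs covering the same simulation work.
# The 30 MiB upload size is made up; the download size matches the
# log excerpt earlier in this thread.
big = {"down_mib": 27.46, "up_mib": 30.0, "connections": 2}
small = {"down_mib": 27.46 / 2, "up_mib": 30.0 / 2, "connections": 2}

two_small = {k: 2 * v for k, v in small.items()}
print("one big WU:   ", big)
print("two small WUs:", two_small)
# Same total MiB moved either way, but twice the connections to a
# server that is already struggling to accept them.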

Studying half of a protein is useless -- you have to study the whole protein.
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby MeeLee » Sun Nov 10, 2019 7:01 pm

Sure, but it prevents you from having GPUs sit idle.
The size of WUs can't be changed, but larger WUs are assigned to certain GPUs, and smaller WUs are assigned to older GPUs or CPUs.
You just want to prevent your biggest contributors (fast GPUs) from being idle.
MeeLee
 
Posts: 370
Joined: Tue Feb 19, 2019 10:16 pm

Re: Multiple WU's Fail downld/upld to 155.247.166.*

Postby bruce » Mon Nov 11, 2019 1:41 am

A project can have fewer atoms or more atoms. A project can be processed for N steps, or for 2N steps, or for 0.5N steps. Changing the atom count is not possible. Changing the number of steps is possible, but only when the project is first constructed.

Changes in atom counts AND changes in steps are both commonly called larger/smaller WUs, because changing either one changes the PPD.

Specific projects can be permitted for certain GPU species and restricted from other GPU species. Those restrictions are rigid rules in the assignment process. If your GPU needs an assignment, it will be assigned something from the pool of active projects permitted for your species, or it won't. There are no second choices that can be assigned only when the list of first choices happens to be empty.
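As a rough sketch of that rigid rule (the species names and project numbers below are invented for illustration, not real F@h data):

Code:
# Hypothetical illustration of a rigid species -> permitted-projects
# rule: a GPU either draws from its permitted pool or gets nothing.
# There is no fallback list, per the description above.
PERMITTED = {
    "Ellesmere XT": {14100, 14101},   # made-up project numbers
    "GP102":        {14200},
}
ACTIVE = {14101, 14200}  # projects with WUs available right now

def assign(species):
    pool = PERMITTED.get(species, set()) & ACTIVE
    # No second choices: an empty pool means no assignment at all.
    return min(pool) if pool else None

print(assign("Ellesmere XT"))  # -> 14101
print(assign("GP102"))         # -> 14200
print(assign("Unknown"))       # -> None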
bruce
 
Posts: 22818
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.
