WU's not being send to work server 3.*

Moderators: Site Moderators, FAHC Science Team

Re: WU's not being send to work server 3.*

Postby Neil-B » Thu Apr 30, 2020 6:50 pm

Whilst I know how you must feel, and uploading straight away or before Timeout is obviously preferable, the WU is only down the drain once it reaches expiration.
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WU's not being send to work server

Postby CaptainHalon » Thu Apr 30, 2020 9:11 pm

Neil-B wrote:but if you think it helps then feel free to state your opinions just as others might feel free to state contradictory ones :)


I think a lot of the frustration on the donor side could have been mitigated by putting simple controls in the client a long time ago. A white list/black list feature for project numbers would alleviate much frustration, and I think it should be something that's allowed. If you consider how much a research team would have to pay for AWS or Azure resources to accomplish the same tasks that the donors allow them to accomplish for free, then it should be a donor's right to omit problem projects that are wasting their hardware resources and electricity.

More often than not when thumbing through the forums, I just see donors getting pushback for complaints and being told they can quit FAH if they don't like it. It's akin to giving a homeless man $100, watching him spend it all on liquor, and then being told "hey buddy, if ya don't like it, don't give me any money." I suppose that's his right, but it's still rather tasteless.
CaptainHalon
 
Posts: 18
Joined: Mon Apr 13, 2020 12:47 pm

Re: WU's not being send to work server 3.*

Postby HenrikJolsen » Thu Apr 30, 2020 10:06 pm

Same problem here

19:52:35:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:16435 run:2891 clone:0 gen:0 core:0x22 unit:0x0000000203854c135e9a4ef77d34b1df
19:52:35:WU00:FS01:Uploading 133.16MiB to 3.133.76.19
19:52:35:WU00:FS01:Connecting to 3.133.76.19:8080
19:52:36:WARNING:WU00:FS01:Exception: Failed to send results to work server: Transfer failed
19:52:36:WU00:FS01:Trying to send results to collection server
19:52:36:WU00:FS01:Uploading 133.16MiB to 3.21.157.11
19:52:36:WU00:FS01:Connecting to 3.21.157.11:8080
19:52:37:ERROR:WU00:FS01:Exception: Transfer failed

13 attempts already...
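
For anyone who wants to tally failures like these from their own log rather than counting by eye, here is a minimal Python sketch. It assumes the line layout seen in the excerpt above (`HH:MM:SS:[WARNING|ERROR:]WUxx:FSxx:message`); the name `count_failed_sends` is made up for illustration and is not part of the FAH client.

```python
import re
from collections import Counter

# Layout assumed from the log excerpt: time, optional level, WU id, slot id, message.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)
CONNECT_RE = re.compile(r"^Connecting to (?P<target>\S+)$")


def count_failed_sends(log_text):
    """Count 'Exception:' lines per server.

    Each failure is attributed to the most recent 'Connecting to host:port'
    line seen for the same work unit and slot.
    """
    failures = Counter()
    last_target = {}  # (wu, slot) -> most recent "host:port" the client tried
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # not a WU/slot line (for example a date banner)
        key = (m["wu"], m["slot"])
        connect = CONNECT_RE.match(m["msg"])
        if connect:
            last_target[key] = connect["target"]
        elif m["level"] and m["msg"].startswith("Exception:"):
            failures[last_target.get(key, "unknown")] += 1
    return failures
```

Feeding it the whole log file as one string gives a per-server failure count. Lines such as "WorkServer connection failed on port 8080 trying 80" are deliberately not counted here, since they carry no "Exception:" text.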
HenrikJolsen
 
Posts: 5
Joined: Sun Apr 12, 2020 4:44 pm

Re: WU's not being send to work server 3.*

Postby HaloJones » Thu Apr 30, 2020 10:12 pm

Allowing the donors to decide which units to do leads to cherry-picking and impacts the science. This project isn't run for the sake of the donors and for getting points. When it started there were no points, only units and the only statistics were how many units had been done. Many of the changes in the points system have been to try to prevent the donors from having to do anything other than accept work and do work. Donate the hardware available and let them fold.

Until the start of this CV-19 work, the number of donors and the amount of server hardware and work was pretty equitable. It worked most of the time and the only real recurring problem was the stats server constantly stopping.

Now with a new project, a twentyfold ramp-up in donors, and huge publicity from Nvidia, Intel and a bunch of influential YouTubers, the project is getting constant repeat questions on a forum where there are no staffers, only other donors who try to answer and help.

Has this project been a victim of its own sudden success? Yes, of course it has. But arguing with 20/20 hindsight that it should have done x or y without stopping to perhaps ASK why it is the way it is, is perhaps not overly helpful.

It's been going for over a decade and the method of allocating work automatically via the priorities of the scientists has always worked fine. The new CV-19 units were put at the top of the priority list as soon as they were ready to be worked on and everyone would have got that work but oh no, the new donors complained that they couldn't specify that they would only do CV-19. Is there something wrong with curing cancer while waiting for the CV-19 work? Is there something wrong with being allocated the work that the researchers need doing?

We get enough questions about simply adding a GPU slot without having to deal with users asking which units they should blacklist and which ones get the best points. Who are you suggesting will maintain this white/black list? Are you volunteering?
1x Titan X, 5x 1070, 1x 970, 1 x Ryzen 3600

HaloJones
 
Posts: 816
Joined: Thu Jul 24, 2008 11:16 am

Re: WU's not being send to work server 3.*

Postby bruce » Fri May 01, 2020 12:28 am

I understand this has just been corrected.
bruce
 
Posts: 19701
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: WU's not being send to work server 3.*

Postby Epsilon_Process » Fri May 01, 2020 2:02 am

bruce wrote:I understand this has just been corrected.


Yes, looks like it. Both my stuck work units over 130MiB did finally upload and receive points, although it took many tries before a server connected. I can only imagine there must be a considerable backlog.

Thanks for getting it all sorted out.
Epsilon_Process
 
Posts: 6
Joined: Fri Apr 10, 2020 6:52 am

Re: WU's not being send to work server 3.*

Postby schertt » Fri May 01, 2020 2:42 am

ChrisD5710 wrote:Maybe You should consider supporting work servers with less storage?


The science itself necessitates a large amount of storage space for the data. Fragmenting the infrastructure into even more servers that would then require even more attention than before is a troublesome way to approach the issue. Those types of servers aren't meant to be hosted by the average user; hosting one requires a degree of understanding of both hardware and networking and a level of financial resources that typically comes at the institution level.
schertt
 
Posts: 25
Joined: Wed Apr 25, 2012 11:24 am

Re: WU's not being send to work server 3.*

Postby lazyacevw » Fri May 01, 2020 3:12 am

I've been failing with a large result over the last several days as well. No other issues with any of my slots over the past week:

Waiting on: Send Results
Attempts: 25
Assigned: 2020-04-28T21:46:14Z
Expiration: 2020-05-05T21:46:14Z
Bonus: 0 8-)
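
Since a WU is only lost once it passes its expiration, it can be handy to see how much time is actually left. A small Python sketch, assuming the timestamps are UTC in the format shown above; `parse_fah_time` and `time_left` are illustrative names, not anything the client provides.

```python
from datetime import datetime, timezone

# Timestamp layout as shown in the work-unit details (UTC, trailing "Z").
FAH_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_fah_time(text):
    """Parse a timestamp such as '2020-05-05T21:46:14Z' into an aware UTC datetime."""
    return datetime.strptime(text, FAH_TIME_FORMAT).replace(tzinfo=timezone.utc)


def time_left(expiration, now=None):
    """Return the timedelta until expiration; a negative value means the WU has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_fah_time(expiration) - now
```

Calling `time_left("2020-05-05T21:46:14Z")` returns a `timedelta` measured from the current time. For the values above, the gap between Assigned and Expiration works out to exactly seven days.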

3.133.76.19 was restarted 30 minutes ago.
3.21.157.11 was restarted 3 hours ago.
We will see....

https://apps.foldingathome.org/serverstats

01:12:32:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
01:12:32:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
01:12:32:WU02:FS01:Connecting to 3.133.76.19:8080
01:14:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:14:42:WU02:FS01:Connecting to 3.133.76.19:80
01:16:53:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
01:16:53:WU02:FS01:Trying to send results to collection server
01:16:53:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
01:16:53:WU02:FS01:Connecting to 3.21.157.11:8080
01:19:05:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:19:05:WU02:FS01:Connecting to 3.21.157.11:80
01:21:16:ERROR:WU02:FS01:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
******************************* Date: 2020-05-01 *******************************
lazyacevw
 
Posts: 36
Joined: Tue Mar 17, 2020 9:12 pm

Re: WU's not being send to work server 3.*

Postby anandhanju » Fri May 01, 2020 4:49 am

@lazyacevw, Can you try stopping your client entirely and restarting the program to ensure it retries again on startup? Please post an update if it fails.
anandhanju
 
Posts: 508
Joined: Mon Dec 03, 2007 5:33 am
Location: Australia

Re: WU's not being send to work server 3.*

Postby lazyacevw » Fri May 01, 2020 6:49 am

anandhanju wrote:@lazyacevw, Can you try stopping your client entirely and restarting the program to ensure it retries again on startup? Please post an update if it fails.


Thanks! I restarted the computer yesterday but it didn't work. I went to try a service restart but as I was about to do so, the upload cleared right before my eyes! Must've taken 30 or so attempts.


Code:
01:12:32:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
01:12:32:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
01:12:32:WU02:FS01:Connecting to 3.133.76.19:8080
01:14:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:14:42:WU02:FS01:Connecting to 3.133.76.19:80
01:16:53:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
01:16:53:WU02:FS01:Trying to send results to collection server
01:16:53:WU02:FS01:Uploading 141.53MiB to 3.21.157.11
01:16:53:WU02:FS01:Connecting to 3.21.157.11:8080
01:19:05:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
01:19:05:WU02:FS01:Connecting to 3.21.157.11:80
01:21:16:ERROR:WU02:FS01:Exception: Failed to connect to 3.21.157.11:80: Connection timed out
...
04:31:33:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:16435 run:1479 clone:4 gen:0 core:0x22 unit:0x0000000003854c135e9a4ef9e4ff1a84
04:31:33:WU02:FS01:Uploading 141.53MiB to 3.133.76.19
04:31:33:WU02:FS01:Connecting to 3.133.76.19:8080
04:33:42:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
04:33:42:WU02:FS01:Connecting to 3.133.76.19:80
04:33:46:WU02:FS01:Upload 0.04%
04:35:33:WU02:FS01:Upload 0.13%
04:35:39:WU02:FS01:Upload 0.97%
...
04:45:27:WU02:FS01:Upload 97.59%
04:45:33:WU02:FS01:Upload 98.61%
04:45:39:WU02:FS01:Upload 99.67%
04:45:43:WU02:FS01:Upload complete
04:45:43:WU02:FS01:Server responded WORK_ACK (400)
04:45:43:WU02:FS01:Final credit estimate, 43291.00 points
04:45:43:WU02:FS01:Cleaning up


Weird. The servers must still not be in a happy place.
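
The retry order visible in the log (work server on port 8080, then port 80, then the collection server the same way) can be sketched roughly as below. This is not the client's actual code, only an illustration of the pattern; `tcp_connect` and `first_reachable` are hypothetical names, the 10-second timeout is an arbitrary choice, and the check only opens a TCP connection without uploading anything.

```python
import socket


def tcp_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


def first_reachable(servers, ports=(8080, 80), connect=tcp_connect):
    """Try every server on every port, in order.

    Returns the first (host, port) pair that accepts a connection,
    or None if every attempt fails.
    """
    for host in servers:
        for port in ports:
            if connect(host, port):
                return host, port
    return None
```

For example, `first_reachable(["3.133.76.19", "3.21.157.11"])` walks through the same four attempts as the log. The `connect` parameter is there so the logic can be exercised with a stand-in function instead of real network calls.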
lazyacevw
 
Posts: 36
Joined: Tue Mar 17, 2020 9:12 pm

Re: WU's not being send to work server 3.*

Postby hnapel » Fri May 01, 2020 12:37 pm

The WU for which I started this topic eventually got uploaded within the deadline. I'm not sure whether it was simply overload or the server had some other issue, but either way, patience helps.
hnapel
 
Posts: 11
Joined: Sat Sep 13, 2008 9:41 pm

Re: WU's not being send to work server 3.*

Postby Neil-B » Fri May 01, 2020 1:06 pm

From a post in another thread - and reading between the lines - the cause of the issues was identified and a solution put in place.

Really glad they got the server accepting WUs before yours expired :)
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WU's not being send to work server 3.*

Postby jrweiss » Fri May 01, 2020 3:39 pm

Finally uploading after restarting client this morning. Very slow start, though, then accepting at ~4-5Mbps:

Code:
14:29:01:WARNING:WU02:FS01:WorkServer connection failed on port 8080 trying 80
14:29:01:WU02:FS01:Connecting to 3.133.76.19:80
14:29:22:WARNING:WU02:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
14:29:22:WU02:FS01:Trying to send results to collection server
14:29:22:WU02:FS01:Uploading 140.41MiB to 3.21.157.11
14:29:22:WU02:FS01:Connecting to 3.21.157.11:8080
14:29:32:WU02:FS01:Upload 0.09%
14:33:08:WU02:FS01:Upload 0.13%
14:33:14:WU02:FS01:Upload 2.23%
14:33:20:WU02:FS01:Upload 3.96%
14:33:26:WU02:FS01:Upload 6.37%
14:33:32:WU02:FS01:Upload 8.59%
14:33:38:WU02:FS01:Upload 10.51%
14:33:44:WU02:FS01:Upload 12.06%
14:33:50:WU02:FS01:Upload 14.16%
14:33:56:WU02:FS01:Upload 15.22%
14:34:02:WU02:FS01:Upload 17.23%
14:34:08:WU02:FS01:Upload 19.23%
14:34:14:WU02:FS01:Upload 21.59%
14:34:20:WU02:FS01:Upload 23.81%
14:34:26:WU02:FS01:Upload 25.64%
14:34:32:WU02:FS01:Upload 27.42%
14:34:38:WU02:FS01:Upload 29.69%
14:34:44:WU02:FS01:Upload 31.52%
14:34:50:WU02:FS01:Upload 33.30%
14:34:56:WU02:FS01:Upload 35.12%
14:35:02:WU02:FS01:Upload 36.68%
14:35:08:WU02:FS01:Upload 38.73%
14:35:14:WU02:FS01:Upload 40.55%
14:35:20:WU02:FS01:Upload 42.06%
14:35:26:WU02:FS01:Upload 44.34%
14:35:32:WU03:FS00:0xa7:Completed 172500 out of 250000 steps (69%)
14:35:32:WU02:FS01:Upload 45.76%
14:35:38:WU02:FS01:Upload 47.81%
14:35:44:WU02:FS01:Upload 50.17%
14:35:50:WU02:FS01:Upload 51.55%
14:35:56:WU02:FS01:Upload 53.68%
14:36:02:WU02:FS01:Upload 56.00%
14:36:08:WU02:FS01:Upload 58.13%
14:36:14:WU02:FS01:Upload 60.00%
14:36:20:WU02:FS01:Upload 62.18%
14:36:26:WU02:FS01:Upload 64.32%
14:36:32:WU02:FS01:Upload 66.59%
14:36:38:WU02:FS01:Upload 67.66%
14:36:44:WU02:FS01:Upload 69.62%
14:36:50:WU02:FS01:Upload 72.16%
14:36:56:WU02:FS01:Upload 74.34%
14:37:02:WU02:FS01:Upload 76.52%
14:37:08:WU02:FS01:Upload 79.01%
14:37:14:WU02:FS01:Upload 80.93%
14:37:20:WU02:FS01:Upload 82.53%
14:37:26:WU02:FS01:Upload 84.40%
14:37:32:WU02:FS01:Upload 86.36%
14:37:38:WU02:FS01:Upload 88.45%
14:37:44:WU02:FS01:Upload 89.65%
14:37:50:WU02:FS01:Upload 91.96%
14:37:56:WU02:FS01:Upload 93.74%
14:38:02:WU02:FS01:Upload 95.84%
14:38:08:WU02:FS01:Upload 97.57%
14:38:14:WU02:FS01:Upload 99.98%
14:38:16:WU02:FS01:Upload complete
14:38:16:WU02:FS01:Server responded WORK_ACK (400)
14:38:16:WU02:FS01:Final credit estimate, 51387.00 points
14:38:16:WU02:FS01:Cleaning up
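
To put a number on the upload rate, the "Upload N%" lines can be turned into an average throughput. A rough Python sketch, assuming the log layout above and that the size is taken by hand from the "Uploading 140.41MiB" line; `average_mbps` is a made-up helper, and it does not handle a log that crosses midnight.

```python
import re
from datetime import datetime

# Matches progress lines such as "14:33:14:WU02:FS01:Upload 2.23%".
UPLOAD_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:Upload (\d+(?:\.\d+)?)%$")

MBIT_PER_MIB = 8 * 1.048576  # 1 MiB = 1,048,576 bytes = 8.388608 megabits


def average_mbps(log_text, size_mib):
    """Average rate in Mbit/s between the first and last 'Upload N%' lines.

    Returns None if fewer than two progress lines are found or no time elapsed.
    """
    samples = []
    for raw in log_text.splitlines():
        m = UPLOAD_RE.match(raw.strip())
        if m:
            when = datetime.strptime(m.group(1), "%H:%M:%S")
            samples.append((when, float(m.group(2))))
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        return None
    transferred_mib = size_mib * (p1 - p0) / 100.0
    return transferred_mib * MBIT_PER_MIB / seconds
```

Passing only the lines from 14:33:14 onward, so the long stall at the start is excluded, together with `size_mib=140.41` gives an average just under 4 Mbit/s for this upload.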
Ryzen 7 3700X; MSI GTX 1050ti, 451.48 driver
i7-4770K; MSI GTX 1050ti, 451.48 driver
jrweiss
 
Posts: 697
Joined: Tue Dec 04, 2007 7:56 am
Location: @Home

3.133.76.19 (aws1.foldingathome.org)

Postby tbonse » Sun May 03, 2020 5:16 am

This server's uptime is less than 40 minutes and it is already having problems again.

3.133.76.19 aws1.foldingathome.org WS 9.6.8 joseph 10,764.00/hr 0 0 Yes Assign 18,116 7,896 OPENMM_22, GRO_A7 6.72TiB 37 minutes 2020-05-03T04:09:54Z

Code:
04:07:02:WU01:FS01:Uploading 78.05MiB to 3.133.76.19
04:07:02:WU01:FS01:Connecting to 3.133.76.19:8080
04:08:42:WARNING:WU00:FS01:WorkServer connection failed on port 8080 trying 80
04:08:42:WU00:FS01:Connecting to 3.133.76.19:80
04:09:09:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
04:09:09:WU01:FS01:Connecting to 3.133.76.19:80
04:10:49:ERROR:WU00:FS01:Exception: Failed to connect to 3.133.76.19:80: Connection timed out
04:10:49:WU00:FS01:Connecting to 65.254.110.245:80
04:10:50:WU00:FS01:Assigned to work server 128.252.203.10
04:10:50:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP107GL [Quadro P1000] from 128.252.203.10
04:10:50:WU00:FS01:Connecting to 128.252.203.10:8080
04:11:16:WARNING:WU01:FS01:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: Connection timed out
04:11:16:WU01:FS01:Trying to send results to collection server
tbonse
 
Posts: 25
Joined: Wed Apr 15, 2020 11:32 am

Re: 3.133.76.19 (aws1.foldingathome.org)

Postby foldy » Sun May 03, 2020 8:22 am

Stats say it is online but I also cannot connect to http://aws1.foldingathome.org/
foldy
 
Posts: 1942
Joined: Sat Dec 01, 2012 4:43 pm
