Problems sending unit results and receiving credits

Moderators: Site Moderators, FAHC Science Team

Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

Problems sending unit results and receiving credits

Post by Manfred.Knick »

In multiple cases,
-) sending unit results fails
-) credits are being estimated, but not assigned / received

Installation background:

"F@H on Gentoo"
described in Forum "Q&A about unsupported distros of Linux"
viewtopic.php?f=89&t=33333

Projects affected:
(taken from one of my XEON-GP104-boxes)


# cd /opt/foldingathome

# ll -AR log*
-rw-r--r-- 1 root root  81K 26. Mär 18:21 log.txt
logs:
-rw-r--r-- 1 foldingathome users 7,7K 19. Mär 22:31 log-20200323-165827.txt
-rw-r--r-- 1 root          root   20K 23. Mär 20:29 log-20200323-192956.txt
-rw-r--r-- 1 root          root   20K 23. Mär 20:32 log-20200323-193404.txt
-rw-r--r-- 1 root          root   60K 24. Mär 00:04 log-20200324-075427.txt
-rw-r--r-- 1 root          root   55K 24. Mär 22:06 log-20200325-072121.txt
-rw-r--r-- 1 root          root   64K 25. Mär 21:43 log-20200326-082502.txt

# grep -R -i '\(Sending unit results\|Final credit estimate\)' * 

logs/log-20200325-072121.txt:09:47:05:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11762 run:0 clone:431 gen:6 core:0x22 unit:0x0000001080fccb0a5e6d80c4edb7251d
logs/log-20200325-072121.txt:09:48:35:WU02:FS01:Final credit estimate, 28118.00 points
logs/log-20200325-072121.txt:11:20:03:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:13850 run:0 clone:691 gen:8 core:0xa7 unit:0x00000009287234c95e72596d5ee2159d
logs/log-20200325-072121.txt:11:22:01:WU00:FS00:Final credit estimate, 3188.00 points
logs/log-20200325-072121.txt:17:46:30:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:13850 run:0 clone:13926 gen:8 core:0xa7 unit:0x00000009287234c95e7303a905136e35
logs/log-20200325-072121.txt:17:48:09:WU02:FS00:Final credit estimate, 6266.00 points
logs/log-20200324-075427.txt:21:47:04:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11777 run:0 clone:18821 gen:3 core:0x22 unit:0x00000003287234c95e774a630cc9b47e
logs/log-20200324-075427.txt:21:51:37:WU01:FS01:Final credit estimate, 77471.00 points
logs/log-20200324-075427.txt:21:52:09:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14328 run:9 clone:622 gen:17 core:0xa7 unit:0x000000149bf7a4d65e6d0ba63ed8b34f
logs/log-20200324-075427.txt:21:52:21:WU00:FS00:Final credit estimate, 2309.00 points
logs/log-20200324-075427.txt:22:56:31:WU03:FS00:Sending unit results: id:03 state:SEND error:NO_ERROR project:14303 run:9 clone:700 gen:16 core:0xa7 unit:0x000000139bf7a4d55e66cb49c0b01a8c
logs/log-20200324-075427.txt:22:57:19:WU03:FS00:Final credit estimate, 2057.00 points
logs/log-20200323-192956.txt:17:07:57:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:09:11:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:10:11:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:12:31:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:15:39:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:19:54:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37
logs/log-20200323-192956.txt:17:26:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAILED project:11743 run:0 clone:1516 gen:14 core:0x22 unit:0x0000001b8ca304f15e67e19d5384ca7c
logs/log-20200326-082502.txt:10:27:07:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:10:35:52:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:10:42:46:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:10:49:17:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:10:54:23:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:10:58:56:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:10870 gen:12 core:0x22 unit:0x0000001680fccb025e72f2adc5bead07
logs/log-20200326-082502.txt:11:06:11:WU01:FS01:Final credit estimate, 97586.00 points
logs/log-20200326-082502.txt:13:53:20:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:13851 run:0 clone:16007 gen:2 core:0xa7 unit:0x00000006287234c95e7301801162f879
logs/log-20200326-082502.txt:13:57:51:WU00:FS00:Final credit estimate, 1000.00 points
logs/log-20200326-082502.txt:15:18:46:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:13861 run:0 clone:19516 gen:5 core:0xa7 unit:0x000000060d5a98395e730ad6ec5683f4
logs/log-20200326-082502.txt:15:19:03:WU01:FS00:Final credit estimate, 2390.00 points
log.txt:11:29:30:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:14533 run:0 clone:391 gen:6 core:0x22 unit:0x0000000a80fccb025e72f1eaa4d57f65
log.txt:11:32:06:WU01:FS01:Final credit estimate, 106216.00 points
log.txt:14:34:39:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:14533 run:0 clone:12426 gen:7 core:0x22 unit:0x0000000d80fccb025e72f2d47cf449dc
log.txt:14:37:20:WU02:FS01:Final credit estimate, 105199.00 points
log.txt:14:38:30:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:14334 run:0 clone:362 gen:14 core:0xa7 unit:0x0000000f0d5a98395e7525a2ca358921
log.txt:14:48:56:WU00:FS00:Final credit estimate, 880.00 points
log.txt:16:29:19:WU01:FS01:Sending unit results: id:01 state:SEND error:NO_ERROR project:11779 run:0 clone:4695 gen:15 core:0x22 unit:0x000000120d5a98395e73c5985859172e
log.txt:16:31:02:WU01:FS01:Final credit estimate, 66594.00 points
Mod Edit: Added Code Tags - PantherX
Jorgeminator
Posts: 49
Joined: Tue Mar 24, 2020 11:24 am
Location: Finland

Re: Problems sending unit results and receiving credits

Post by Jorgeminator »

I see only two projects that failed to be sent:

project:11779 run:0 clone:547 gen:1
project:11743 run:0 clone:1516 gen:14

Everything else looks like normal output to me.
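A minimal sketch of how the units with `error:FAILED` could be picked out of the grep output automatically. The regular expression is an assumption derived only from the lines quoted in this thread, and `failed_units` is a hypothetical helper name, not part of the F@H client:

```python
import re
from collections import Counter

# Assumed line layout, taken from the "Sending unit results" lines quoted above.
SEND_RE = re.compile(
    r"Sending unit results: id:\d+ state:SEND error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def failed_units(lines):
    """Count lines carrying error:FAILED per (project, run, clone, gen)."""
    counts = Counter()
    for line in lines:
        match = SEND_RE.search(line)
        if match and match.group("error") == "FAILED":
            key = tuple(
                int(match.group(name))
                for name in ("project", "run", "clone", "gen")
            )
            counts[key] += 1
    return counts


# Three lines copied from the grep output in the first post.
sample = [
    "logs/log-20200325-072121.txt:09:47:05:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11762 run:0 clone:431 gen:6 core:0x22 unit:0x0000001080fccb0a5e6d80c4edb7251d",
    "logs/log-20200323-192956.txt:17:07:57:WU01:FS01:Sending unit results: id:01 state:SEND error:FAILED project:11779 run:0 clone:547 gen:1 core:0x22 unit:0x000000010d5a98395e73c5dcbe5bdb37",
    "logs/log-20200323-192956.txt:17:26:08:WU02:FS01:Sending unit results: id:02 state:SEND error:FAILED project:11743 run:0 clone:1516 gen:14 core:0x22 unit:0x0000001b8ca304f15e67e19d5384ca7c",
]

for unit, attempts in failed_units(sample).items():
    print(unit, attempts)
```

Fed with the complete grep output instead of the three sample lines, the counter would also show how often each unit was retried.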
Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

Re: Problems sending unit results and receiving credits

Post by Manfred.Knick »

Jorgeminator wrote: I see only two projects that failed to be sent:
"Only" two on this workstation, right.
Jorgeminator wrote: Everything else looks like normal output to me.
"Looks": right -
except that the estimated credits add up to a multiple of the value actually assigned -
delay at the time of the OP: approx. two days.

As other members of these forums with similar observations already stated:
-) the missing points are not that fatal -
-) but the computational results earned should definitely not get lost.
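For comparison against the stats page, the estimates in the log can be added up with a few lines of Python. This is only a sketch: the pattern is assumed from the "Final credit estimate" lines quoted in this thread, and `total_estimated_credit` is a hypothetical helper name.

```python
import re

# Assumed format, taken from the "Final credit estimate" lines in the logs above.
CREDIT_RE = re.compile(r"Final credit estimate, (?P<points>\d+(?:\.\d+)?) points")


def total_estimated_credit(lines):
    """Sum every 'Final credit estimate' value found in the given log lines."""
    total = 0.0
    for line in lines:
        match = CREDIT_RE.search(line)
        if match:
            total += float(match.group("points"))
    return total


# The four estimates from log.txt in the first post.
sample = [
    "log.txt:11:32:06:WU01:FS01:Final credit estimate, 106216.00 points",
    "log.txt:14:37:20:WU02:FS01:Final credit estimate, 105199.00 points",
    "log.txt:14:48:56:WU00:FS00:Final credit estimate, 880.00 points",
    "log.txt:16:31:02:WU01:FS01:Final credit estimate, 66594.00 points",
]

print(total_estimated_credit(sample))  # 278889.0
```

The result is the client's own estimate only; the value credited on the server side has to be looked up separately.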
Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

Re: Problems sending unit results and receiving credits

Post by Manfred.Knick »

Right now, the main workstation in my set is computing a WU that has no description attached to it;
searching, I found that its results have been returned again and again for four (!) days already:
I've had this one fail 13 times now...
That is a "normal" set of errors resulting from server overload.
The team are aware and are trying to ramp up to cope with the 20x increase in F@H clients.
Sorry, but it makes no sense at all to repeatedly burn energy into heat and noise for nothing!

viewtopic.php?f=19&t=33255&p=318092&hilit=11776#p317647

Update:
The upload took more than 20 minutes,
often using less than 30 kbit/s where more than 40 Mbit/s of upload bandwidth would be available -
but finally succeeded.
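To put these numbers in proportion, here is a rough back-of-the-envelope calculation. It assumes a constant 30 kbit/s over exactly 20 minutes, which is a simplification of the "less than" / "more than" figures above, so the payload size is only a ballpark figure:

```python
def transfer_seconds(size_bits, rate_bits_per_s):
    """Time needed to move size_bits at a constant rate."""
    return size_bits / rate_bits_per_s


observed_rate = 30_000        # 30 kbit/s, the rate seen during the upload
available_rate = 40_000_000   # 40 Mbit/s, the uplink that would be available
duration_s = 20 * 60          # 20 minutes

# Rough payload size implied by the observed rate and duration.
payload_bits = observed_rate * duration_s
payload_megabytes = payload_bits / 8 / 1_000_000

# Time the same payload would need at the full uplink rate.
ideal_seconds = transfer_seconds(payload_bits, available_rate)

print(f"payload: about {payload_megabytes:.1f} MB")
print(f"at 40 Mbit/s: about {ideal_seconds:.1f} s instead of {duration_s} s")
```

Under these assumptions the result is roughly 4.5 MB, which the local uplink could deliver in under a second, so the bottleneck is clearly not on the sending side.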
Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

Re: Problems sending unit results and receiving credits

Post by Manfred.Knick »

I do understand that F@H is overrun by a 20x increase in client offerings.
But there is also an overwhelming increase in offers from very capable colleagues / big firms
to help on the server side - just two threads as an example:

"Please don't let this go to waste" (5 pages)
viewtopic.php?f=61&t=32751

"Do you need help?" (5 pages)
viewtopic.php?f=16&t=32735
including e.g.
"Please think big, ask for what you could put to use in a massive surge scenario, not what you think you could get. We really can help.
With an authoritative description of a big request, we can also save you the trouble of assembling many small offers.
You have an unparalleled case for support and a big ask at this a solid strategy."
viewtopic.php?f=16&t=32735&p=314789&hil ... ig#p314789
and
Post by bruce » Tue Mar 17, 2020 11:01 pm:
"I did say (elsewhere) that things are getting better at the server level. There's hope."
"We really have been hit by a tidal wave of unexpected computer resources without the resources to deal with it easily."
viewtopic.php?f=16&t=32735&p=314789&hil ... ig#p314651

My very humble proposal:
Perhaps it might be helpful to extend "Server outages and limitations"
viewtopic.php?f=106&t=33193#p317298
beyond only stating "We know about the work unit shortage"
with a few updated lines about
- which offers of help have been accepted already
- progress on the server side / infrastructure
to give people a hint of what to expect.

"overwhelming":
With thousands of people dying every day, F@H has become a bearer of hope: a way for people to actively contribute something.

I am pretty sure that even a very short follow-up to Greg Bowman's neat News article of March 15, 2020
could ease the pressure upon F@H project members a lot.
I know that all of them are working above their limits already: Thank you so much!
Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

Re: Problems sending unit results and receiving credits

Post by Manfred.Knick »

Some problems uploading results persist - but such cases have become rarer:
...
... ... :WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
...
... ... :WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to ... : Connection refused
...
and follow-up retries typically succeed -
sometimes only on the third attempt, but at least the results seem to arrive@home!
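To see how the failure causes are distributed, the warnings can be tallied with a short script. This is a sketch only: the pattern is assumed from the two warning lines quoted above (with their elisions left as they are), and `upload_failure_reasons` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format, taken from the two WARNING lines quoted above.
FAIL_RE = re.compile(r"Failed to send results to work server: (?P<reason>.+)$")


def upload_failure_reasons(lines):
    """Tally the reason text that follows 'Failed to send results to work server:'."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line.rstrip())
        if not match:
            continue
        reason = match.group("reason")
        # Group refusals from different servers under one label.
        if "Connection refused" in reason:
            reason = "Connection refused"
        counts[reason] += 1
    return counts


# The two warnings quoted above, elisions kept as posted.
sample = [
    "... ... :WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed",
    "... ... :WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to ... : Connection refused",
]

for reason, count in upload_failure_reasons(sample).most_common():
    print(count, reason)
```

Run over the complete log files, this would show whether refused connections or aborted transfers dominate.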

Thanks!