Do you need help?

Moderators: Site Moderators, FAHC Science Team

UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

About three days ago I used the link above and filled out the Google Doc to "Sign up Folding@Home Fireside dev chat 9th April". I haven't received any feedback or invitation email yet; is that to be expected?

Also, what is the best set of documents to learn more about the technical implementation and limitations of the current WU generation/assignment/distribution/collection process? I learned a good deal from reading forum posts; some info is scattered across the website and FAQs, and analyzing the client's network activity and tracking several publicly available stats was also helpful, especially the server status page. But I still have a lot of open questions that it would be great to have at least a superficial answer to before joining the dev chat.

For example, like many other folders, if I leave the clients unattended they are often idle for many hours a day even though WUs are available. I understand some of the reasons why WUs are not assigned right away even if they might be available on the server, but there must be other underlying issues, most likely caused by the interactions/assumptions involving the AS, WS and client. One indication of client behavior is that a manual PAUSE->FOLD succeeds significantly more often (statistically speaking, and after correcting for the elapsed time between queries, etc.) than just waiting for the next automatic timeout. And since the client behavior can't be changed (new clients can't and should not be rolled out in this phase), I wonder whether these interactions are well understood and whether minor adjustments to the AS/WS algorithms could improve the situation significantly.

Also, is there a good way to estimate how "efficiently" the available WUs are assigned? The aforementioned behavior is not a concern for overall F@H performance IF all available WUs are assigned swiftly, at the maximum possible rate, but there are indications that this is not the case. Is it fully understood what limits the theoretical as well as the practically achieved assign rates?

And then there are many more questions about WS data storage/caching etc. For example, while an assign rate of 10k/hour is impressive, it's "only" 3 WUs/second, and at least the CPU-based ones are only a handful of MB for download and upload. So take, say, 10MB times 4 (upload WU to WS, distribute WU to client, client uploads result, result is forwarded to the CS); a very rough estimate is 10MB x 4 x 3/s = 120 MB/s. Of course only SSDs can give you this performance in continuous random access - but that also points to an interesting question. Could relatively simple tweaks on the server side - like making sure larger batches of WUs (per type) are read sequentially and cached before distribution, using many smaller HDD-backed storage pools in a round-robin fashion and recreating them often and automatically to improve serial data access, etc. - make hundreds of TB of storage usable for a fraction of the price of a 100TB all-SSD solution?
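A minimal sketch of that back-of-the-envelope estimate (the WU size, transfer count and assign rate are just the round numbers assumed above, not measured values):

Code:

# Back-of-the-envelope WS throughput estimate; all constants are the
# round numbers from the paragraph above, not measured values.
WU_SIZE_MB = 10             # assumed average CPU WU size
TRANSFERS_PER_WU = 4        # upload to WS, download to client, result upload, forward to CS
ASSIGN_RATE_PER_HOUR = 10_800

wu_per_second = ASSIGN_RATE_PER_HOUR / 3600             # ~3 WU/s
throughput_mb_s = WU_SIZE_MB * TRANSFERS_PER_WU * wu_per_second
print(f"~{wu_per_second:.1f} WU/s -> ~{throughput_mb_s:.0f} MB/s sustained I/O")   # ~120 MB/s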

Many more thoughts like that, based also on my experience with a 1000+-core cluster I use for my own circuit simulations and large-scale radiation transport simulations. In my experience I was usually able to get roughly a tenfold performance improvement with some relatively simple problem-specific tweaks: file-system and storage organization (dedicated pools for reading, writing, gathering...), ensuring serial data storage, pre-load caching, smart filesystem metadata handling, etc. For every problem these solutions looked very different, though - a single (set of) file system(s)/storage pools was never a good fit.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Do you need help?

Post by PantherX »

UofM.MartinK wrote:...there are many more questions about WS data storage/caching etc. For example, while an assign rate of 10k/hour is impressive, it's "only" 3 WUs/second, and at least the CPU-based ones are only a handful of MB for download and upload. So take, say, 10MB times 4 (upload WU to WS, distribute WU to client, client uploads result, result is forwarded to the CS); a very rough estimate is 10MB x 4 x 3/s = 120 MB/s...
Welcome to the F@H Forum UofM.MartinK,

When it comes to distributing WUs, there are two aspects:
1) Initial Project generation -> This is when the initial WUs are generated by the Server for distribution. It mostly happens when new projects come online.
2) Sequential WU generation -> This is when a completed WU A is returned to the WS, which accepts the data, verifies it and then generates a new WU B based on the results of the completed WU A.

From the above, we see that the faster a completed WU is returned to the Server, the quicker the science progresses. I don't know how intensive Sequential WU generation is, but I am aware that Initial Project generation can take several hours, depending on how many initial WUs get generated.
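A rough sketch of how that sequential-generation loop might look (all function names are hypothetical placeholders, not actual WS code):

Code:

# Hypothetical sketch of the "Sequential WU generation" loop described above.
# None of these helpers exist in the real WS code; they only illustrate the flow.
def verify(result):
    return True                            # placeholder: integrity/sanity checks on the returned data

def generate_next_wu(result):
    return {"parent": result["wu_id"]}     # placeholder: WU B continues where WU A ended

def handle_returned_wu(result, assignment_queue):
    if not verify(result):
        return                             # bad result: WU A stays eligible for reassignment
    assignment_queue.append(generate_next_wu(result))   # WU B is now ready to be assigned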
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Do you need help?

Post by bruce »

alxbelu wrote:While I definitely agree, and at the same time understand that the team is still putting out fires, I feel that with limited effort it should be possible to add some simple load-balancing logic (to replace the round-robin strategy) to the assignment servers? For example, on the AS, restricting the number of WS X designations to Y clients over period Z. The limits could differ for each WS depending on its capabilities, so that for example older/more restricted servers would only get say 500 requests per 2 minutes, where higher performance ones might be able to handle 1000 requests per 2 minutes. (Obviously I don't know actual reasonable numbers here, but the team should have a fair grasp on the avg. time needed to serve each client and thus be able to calculate the optimal utilization)
The servers are currently bandwidth limited (the bandwidth limiter is part of the assignment server code). It doesn't know the size of a transaction in advance; it thinks in round numbers by gating individual transactions and leaving them alone until they finish.
fah1.eastus.cloudapp.azure.com 28,728.00/hr
fah2.eastus.cloudapp.azure.com 14,388.00/hr
oracle1.foldingathome.org 10,800.00/hr
linus1.foldingathome.org 10,776.00/hr
vav16.ocis.temple.edu 10,764.00/hr
orkney.seas.wustl.edu 10,692.00/hr
vav15.ocis.temple.edu 10,680.00/hr
fah3.eastus.cloudapp.azure.com 10,476.00/hr
islay.seas.wustl.edu 9,000.00/hr
plfah1-1.mskcc.org 3,600.00/hr
plfah2-1.mskcc.org 3,552.00/hr
fah4.eastus.cloudapp.azure.com 2,904.00/hr
vav4.ocis.temple.edu 2,220.00/hr
sazerac.seas.wustl.edu 2,160.00/hr
fah5.eastus.cloudapp.azure.com 2,148.00/hr
vav3.ocis.temple.edu 1,620.00/hr
ns338286.ip-37-187-12.eu 492.00/hr

It's easier to read in tabular form: https://apps.foldingathome.org/serverstats
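For what it's worth, a minimal sketch of the per-WS rate cap alxbelu is proposing (illustrative only - this is not the actual assignment server code, and the rates are just some of the configured values from the list above):

Code:

# Illustrative round-robin assignment with a per-WS cap per accounting window.
# NOT the real AS code; rates are configured values from the list above.
import time

WS_RATE_PER_HOUR = {
    "vav16.ocis.temple.edu": 10_764,
    "orkney.seas.wustl.edu": 10_692,
    "islay.seas.wustl.edu": 9_000,
}
WINDOW_S = 120                                      # "per 2 minutes" in the example above

caps = {ws: r * WINDOW_S / 3600 for ws, r in WS_RATE_PER_HOUR.items()}
assigned = dict.fromkeys(WS_RATE_PER_HOUR, 0)
order = list(WS_RATE_PER_HOUR)
window_start = time.monotonic()
cursor = 0

def pick_work_server():
    """Return the next WS with headroom in this window, or None (-> "No WUs")."""
    global window_start, cursor
    if time.monotonic() - window_start > WINDOW_S:  # new window: reset all counters
        window_start = time.monotonic()
        for ws in assigned:
            assigned[ws] = 0
    for i in range(len(order)):
        ws = order[(cursor + i) % len(order)]
        if assigned[ws] < caps[ws]:
            assigned[ws] += 1
            cursor = (cursor + i + 1) % len(order)
            return ws
    return None                                     # every WS is in its deadtime lockout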
UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

bruce wrote: The servers are currently bandwidth limited (the bandwidth limiter is part of the assignment server code). It doesn't know the size of a transaction in advance; it thinks in round numbers by gating individual transactions and leaving them alone until they finish.
Q1: Are the WSs _network_ bandwidth limited, or limited by a different bandwidth, e.g. IO or storage?

Q2: The assign rate, e.g. 10,800.00/hr, is a WS setting, the WSs report their set rate to the AS, and the AS makes sure its round-robin does not exceed each WS's rate?

Q3: If all WSs are currently in the "deadtime" lockout given by their rate, the AS reports "no WUs"?

Q4: Are there stats available on how many requests the ASs actually delegated to WSs, and is that number close to the sum of the advertised WS assign rates?
alxbelu
Posts: 109
Joined: Sat Mar 14, 2020 6:28 pm

Re: Do you need help?

Post by alxbelu »

UofM.MartinK wrote:
bruce wrote: The servers are currently bandwidth limited (the bandwidth limiter is part of the assignment server code). It doesn't know the size of a transaction in advance; it thinks in round numbers by gating individual transactions and leaving them alone until they finish.
Q1: Are the WSs _network_ bandwidth limited, or limited by a different bandwidth, e.g. IO or storage?
I just looked at LTT's video on setting up a WS; it at least implies that they have plenty of bandwidth available at any given time (generally they only see something like 0.2-0.3 Gbps utilization). There are some other bits that may also be of interest to you, e.g. regarding their IO/storage solution: https://www.youtube.com/watch?v=HaMjPs66cTs

(granted, LTT also only seems to host CPU WUs, which I guess are generally around 1/10th or so the size of GPU WUs)
Official F@H Twitter (frequently updated): https://twitter.com/foldingathome
Official F@H Facebook: https://www.facebook.com/Foldinghome-136059519794607/

(I'm not affiliated with the F@H Team, just promoting these channels for official updates)
UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

alxbelu wrote:
UofM.MartinK wrote: Q1: Are the WSs _network_ bandwidth limited, or limited by a different bandwidth, e.g. IO or storage?
I just looked at LTT's video on setting up a WS; it at least implies that they have plenty of bandwidth available at any given time (generally they only see something like 0.2-0.3 Gbps utilization). There are some other bits that may also be of interest to you, e.g. regarding their IO/storage solution: https://www.youtube.com/watch?v=HaMjPs66cTs
Yes, this server is a decent box, comparable to some of the NAS storage systems attached to our cluster to store simulation data. And the specs, from BW to storage (roughly 60TB of RAIDZ2 on classic HDDs, enhanced with an Optane cache that seems not to be heavily utilized due to the 64GB of RAM dedicated to caching), are also a lot more down-to-earth than the "100TB SSD" originally floated in some threads. This class of WS - if the number of WSs is really the limit - should be much more readily available; even I could organize two or three similar boxes, although with 1Gbit uplinks due to a weird fibre uplink situation at our location. So I gather the hardware specs and availability are not a limitation at this point at all?

And if a "simple" 8x12TB RAIDZ2 with some L2ARC is sufficient, there seems to be no need to optimize serial data storage, access patterns etc. - an area where I would have some decent expertise.

Still hard to gauge whether WU generation or the AS round-robin is the most critical bottleneck - does somebody know if all the WUs are distributed efficiently (i.e., assigned at the maximum rate at which useful WUs can currently be generated and collected)?
sukritsingh
Scientist
Posts: 135
Joined: Sat Mar 14, 2020 11:53 pm

Re: Do you need help?

Post by sukritsingh »

Hi all! Thanks so much, everyone, for your interest! I'm going to post in the forum about this as well, but if you are interested in helping out with FAH we have a "Developer Fireside Chat" happening tomorrow at 4-5pm EDT. Sign up at this link to receive a Discord URL: https://tinyurl.com/firesidedev

We have been lucky so far in terms of getting in contact with some larger corporate partners (I don't want to say who out of respect for their PR teams), but we are still working to scale up server power. This is all because of our amazing community, like you, speaking up for us - so thank you so much!
alxbelu
Posts: 109
Joined: Sat Mar 14, 2020 6:28 pm

Re: Do you need help?

Post by alxbelu »

UofM.MartinK wrote:So I gather the hardware specs and availability are not a limitation at this point at all?

And if a "simple" 8x12TB RAIDZ2 with some L2ARC is sufficient, there seems to be no need to optimize serial data storage, access patterns etc. - an area where I would have some decent expertise.

Still hard to gauge whether WU generation or the AS round-robin is the most critical bottleneck - does somebody know if all the WUs are distributed efficiently (i.e., assigned at the maximum rate at which useful WUs can currently be generated and collected)?
If I were to guess, the "100TB SSD" spec might have been thrown around when only 1-2 more WSs were being added (rather than the current number of servers), and perhaps also to let end users know that even if they have a nice "rig" or personal server at home, it wouldn't do - the hardware needs to be enterprise grade.

As for the current bottleneck, my guess is really that it's a lack of WU generation on the science end, i.e. not really about hardware infrastructure at this point, but a lack of researcher hours to put together new WUs. I base this on two observations: 1. there seem to be new teams/researchers getting involved (e.g. judging by the project summary usernames/presentations), and 2. the LTT server seems to have a fair amount of excess capacity available (based on the stats shown in the video) at the seemingly fixed assignment rate of 10,800 WU/hr.

That said, I have had a few instances, around 3-4 out of 10 attempts, where my clients haven't gotten a WU from the LTT server, even around the day shown in the video (April 3) when it didn't seem to be overloaded in any sense, which I guess could implicate some other code issue:

Code:

******************************* Date: 2020-04-03 *******************************
12:10:04:WU02:FS00:Connecting to 65.254.110.245:8080
12:10:12:WARNING:WU02:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
12:10:12:WU02:FS00:Connecting to 18.218.241.186:80
12:10:14:WU02:FS00:Assigned to work server 168.245.198.125
12:10:14:WU02:FS00:Requesting new work unit for slot 00: RUNNING cpu:8 from 168.245.198.125
12:10:14:WU02:FS00:Connecting to 168.245.198.125:8080
12:10:14:ERROR:WU02:FS00:Exception: Server did not assign work unit
Lots of speculation here obviously. I hope you get to join the fireside chat; I'm not applying as I'm just a CS undergrad/hobbyist, but I'd like to think I've got a fair grasp of most high-level concepts.
Official F@H Twitter (frequently updated): https://twitter.com/foldingathome
Official F@H Facebook: https://www.facebook.com/Foldinghome-136059519794607/

(I'm not affiliated with the F@H Team, just promoting these channels for official updates)
UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

PantherX wrote: When it comes to distributing WUs, there are two aspects:
1) Initial Project generation -> This is when the initial WUs are generated by the Server for distribution. It mostly happens when new projects come online.
2) Sequential WU generation -> This is when a completed WU A is returned to the WS, which accepts the data, verifies it and then generates a new WU B based on the results of the completed WU A.

From the above, we see that the faster a completed WU is returned to the Server, the quicker the science progresses. I don't know how intensive Sequential WU generation is, but I am aware that Initial Project generation can take several hours, depending on how many initial WUs get generated.
OK, so the scientists upload the code or parameters necessary to generate the initial and sequential WUs to the server, not the actual WUs. That's good to know.

sukritsingh wrote: We have been lucky so far in terms of getting in contact with some larger corporate partners (Don't want to say who out of respect for their PR teams), but we are still working to scale up server power, and this is all because of our amazing community like you speaking up for us, so thank you so much!
This is good news! Once this is all scaled up and it's certain that more nodes asking for work is not actually hurting overall progress, I am looking forward to asking my superiors for permission to bring 100 or more FX-8350s to join the fray in their "idle" time.

alxbelu wrote: As for the current bottleneck, my guess is really that it's a lack of WU generation on the science end, i.e. not really about hardware infrastructure at this point, but a lack of researcher hours to put together new WUs. I base this on two observations: 1. there seem to be new teams/researchers getting involved (e.g. judging by the project summary usernames/presentations), and 2. the LTT server seems to have a fair amount of excess capacity available (based on the stats shown in the video) at the seemingly fixed assignment rate of 10,800 WU/hr.
My guess is similar; I assume most WSs are not very busy and could already handle more - but perhaps there is still a technical limit buried somewhere, since so many of the WSs are limited to 10,800/hr...
alxbelu wrote: That said, I have had a few instances, around 3-4 out of 10 attempts, where my clients haven't gotten a WU from the LTT server, even around the day shown in the video (April 3) when it didn't seem to be overloaded in any sense, which I guess could implicate some other code issue
It could be a "code issue" - although with everything learned from these threads, I would guess that this is an occasional artifact of the AS guessing who has work based on the manually configured assign rate, and not at all on real-time stats of how many WUs are actually available on the WS. The WS itself might not even know the exact number in real time, and it doesn't need to - since there are more clients asking for work than there is actual work, occasionally not serving a client at the WS level might be better than wasting valuable time that could go into distributing a WU which is ready to go.

But is this really the case? If so, my concerns are unfounded. But I still see almost all "No WUs" from the assign servers (albeit it got a lot better in the last 72 hours!), and I still suspect a lot of WUs are ready and waiting with BW available, while clients are also ready and waiting for work - but once a client gets the "No WUs" signal, it rapidly increases its timeout before asking again and becomes less and less likely to get an assignment. I suspect that the intended "request throttling", achieved by the combination of hard(?) deadtime lockouts at the WS and the quickly increasing client timeouts, might be overachieving and cause an unnecessary decrease in the effective assignment rate once a certain threshold of clients asking for work is reached.

If my guess is right, the overall system is artificially and increasingly "starved" (and performs fewer computations in absolute terms) with every new client above that threshold asking for work, while all resources are underutilized, far away from "real" limitations (CPU, network BW in bytes/s or packets/s, etc.) kicking in. Of course it is paramount to prevent any real limitation from being reached, because that could have very detrimental consequences, so any tweak to the algorithm (on the AS, perhaps the WS) would have to be a very careful one - provided there is a problem in the first place. It's all just an educated guess, because I still haven't found a way to figure out the _actual_ number of WUs assigned globally - or at least delegated by the AS - in a given time interval.
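A toy illustration of the client-side half of that interaction (the back-off constants here are invented, not the real client's; the point is only how quickly the retry interval grows after consecutive "No WUs" replies):

Code:

# Toy model of a client backing off after consecutive "No WUs" replies.
# The constants are invented for illustration; the real client's timing differs.
BASE_WAIT_MIN = 1
GROWTH = 2
CAP_MIN = 60

wait, idle = BASE_WAIT_MIN, 0
for attempt in range(1, 8):
    idle += wait
    print(f"attempt {attempt}: waited {wait:>2} min (idle so far: {idle} min)")
    wait = min(wait * GROWTH, CAP_MIN)
# Even if WUs become available right after a failed attempt, the client may now
# sit idle for up to CAP_MIN minutes before asking again - capacity goes unused.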

Looking forward to finally understanding some of this by participating in the Fireside chat later today :)
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Do you need help?

Post by foldinghomealone2 »

UofM.MartinK wrote:
alxbelu wrote: As for the current bottleneck, my guess is really that it's a lack of WU generation on the science end, i.e. not really about hardware infrastructure at this point, but a lack of researcher hours to put together new WUs. ...
My guess is similar; I assume most WSs are not very busy and could already handle more - but perhaps there is still a technical limit buried somewhere, since so many of the WSs are limited to 10,800/hr...
Your guess is wrong.
As you can see in the server stats https://apps.foldingathome.org/serverstats, currently ~500.000 WUs are available (initially set-up WUs + newly generated WUs), whereas only ~130.000 WUs/h can be assigned.
--> Assignment is the bottleneck, not the generation

I would assume that the limitation of 10.800 WU/h comes from the minimum required 1Gbit internet access (and that the limitation includes a certain safety margin).
UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

foldinghomealone2 wrote: As you can see in the server stats https://apps.foldingathome.org/serverstats, currently ~500.000 WUs are available (initially set-up WUs + newly generated WUs), whereas only ~130.000 WUs/h can be assigned.
--> Assignment is the bottleneck, not the generation

I would assume that the limitation of 10.800 WU/h comes from the minimum required 1Gbit internet access (and that the limitation includes a certain safety margin).
Assignment is definitely the bottleneck, but I doubt that any WS actually distributes anywhere close to the assign rate it's configured for. If they do indeed (an F@H app showing actual F@H performance would be great!), my concern is unfounded.

But the actual BW limit, whether it's 1Gb/s or 10Gb/s, is not being hit yet either, at least for CPU WUs:

10.800 WU/h is 3 WU/s.
Each WU, for round numbers' sake, is 10 MiB in size (CPU WU sizes observed up/down: 1-40 MiB, average 4.8 MiB).
Each WU needs to be transferred, worst case, 4 times per assignment
(or 3 times if generated on the WS, or only 2 times if only the last generation needs to be transmitted to a different system for analysis/archiving).
Worst-case BW: 4 x 10 MiB x 3/s = 120 MiB/s - that's right at the Gbit speed you mention.

BUT most WSs are actually connected with 5-10 Gbit/s; at least 1 GBYTE/s was mentioned in the forums as the minimum BW for a WS. Since 10 Gbit is commonplace in the server world nowadays and many data centers have multiples of that for their uplink, I took that as an actual requirement most WSs meet - might have been as wrong as the 100TB SSD storage, though.

But not even a single Gbit/s seems to be utilized, for example on the LTT server mentioned above, configured for 10,800 WU/h. It sounds more like 300 Mbit/s average with some 1 Gbit/s peaks, even though they provisioned 5 Gbit/s.

If this is all just safety margin, then my concerns are unfounded. But again, I suspect some hard-to-track-down inefficiencies in the _actual_ number of assigned WUs (in contrast to the "advertised"/configured ones), and that if another surge of new volunteers joins the cause (installs and starts clients), overall F@H performance will go down in absolute terms - not just the efficiency.
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Do you need help?

Post by foldinghomealone2 »

Just attended the FAH Fireside dev chat.
The developer(s) calculate with 50MB per WU and 1000 clients connected to a WS at the same time.
They are trying to improve the performance of the WS to be able to handle 200.000 assignments/hour. More than that would not be necessary, because then the generation of WUs would become the bottleneck.

Edit:
Currently the uploaded WUs are cached in RAM, which amounts to 50GB of RAM usage with the given numbers.
And as the upload is much slower than the download, a WU has to stay in RAM "quite long". And since it can't be predicted how many clients will want to upload WUs at a given time, they have to limit the outgoing WU assignment to avoid overflow when the results are returned.
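A quick sanity check of those numbers (a sketch only; 50 MB per WU and 1000 concurrent clients are the figures quoted from the chat, while the assign rate and residency times are assumptions for illustration):

Code:

# RAM needed to buffer in-flight WUs on a WS (50 MB and 1000 clients from the chat;
# assign rate and residency times are assumptions for illustration).
WU_SIZE_MB = 50
CONCURRENT_CLIENTS = 1_000
print(f"~{WU_SIZE_MB * CONCURRENT_CLIENTS / 1000:.0f} GB to buffer one WU per connected client")

ASSIGN_RATE_PER_S = 3
for residency_min in (1, 10):                        # slow uploads keep results buffered longer
    in_flight = ASSIGN_RATE_PER_S * residency_min * 60
    print(f"residency {residency_min:>2} min -> ~{in_flight * WU_SIZE_MB / 1000:.0f} GB buffered")
# Longer residency multiplies the buffer, which is why outgoing assignments are capped.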
Last edited by foldinghomealone2 on Thu Apr 09, 2020 11:48 pm, edited 1 time in total.
UofM.MartinK
Posts: 59
Joined: Tue Apr 07, 2020 8:53 pm

Re: Do you need help?

Post by UofM.MartinK »

Yeah, that was some good background info. The most striking takeaway for me, confirming my suspicion so far: they seem to get back about 80k work units per hour, while "theoretically" assigning 120-140k. This is not just because of faulty clients and clients never returning; that rate is much better. So this is the inefficiency or scaling problem I am trying to hunt down...
foldinghomealone2
Posts: 148
Joined: Sun Jul 30, 2017 8:40 pm

Re: Do you need help?

Post by foldinghomealone2 »

I also wondered about the 'low' number of returned valid WUs.
He also mentioned that there are duplicated WUs processed for validation purposes. I assume those WUs will not be counted twice.
However, I don't know how many WUs are duplicated.

I understood that the 80.000 is the 'net' assignment rate of returned WUs they can use for research.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Do you need help?

Post by PantherX »

foldinghomealone2 wrote:...He also mentioned that there are duplicated WUs processed for validation purposes. I assume those WUs will not be counted twice.
However, I don't know how many WUs are duplicated...
Just to clarify the duplication of WUs, it happens under these circumstances:
1) An assigned WU didn't return before the Timeout period. It will then be reassigned once more.
2) A WU returned an error; it will then be assigned to 3 other donors for validation, i.e. is that WU really bad or was it due to unstable hardware? Once it reaches a certain number (I think 5) of reported bad results, it automatically gets blacklisted, i.e. not assigned anymore, and the researcher can investigate what happened to it.
3) Validation and verification of a new FahCore. This is rather rare and is done to ensure that the science produced by the new FahCore is actually valid and meaningful.

Apart from the above circumstances, there's no other duplication of WUs going on AFAIK.

F@H has a unique selling point in that it doesn't assign duplicate work (with the exceptions above), so all WUs you complete are individual to you and unique. Verification of the data is done on the F@H servers. Other distributed systems work by sending out multiple copies of the work, and verification is done by the donors themselves, i.e. all/most copies should return the same result.
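A hypothetical sketch of the bad-WU handling in point 2 above (not actual server code; the helper names are placeholders and the threshold of 5 is the "I think 5" mentioned earlier):

Code:

# Hypothetical sketch of the bad-WU reassignment/blacklist logic described above.
# Not actual WS code; helper names are placeholders.
BAD_REPORTS_TO_BLACKLIST = 5      # "I think 5" per the post above
RETRY_DONORS = 3

bad_reports = {}                  # wu_id -> number of error reports
blacklist = set()

def reassign_to_other_donors(wu_id, n):
    pass                          # placeholder: hand the WU to n more donors for validation

def report_error(wu_id):
    bad_reports[wu_id] = bad_reports.get(wu_id, 0) + 1
    if bad_reports[wu_id] >= BAD_REPORTS_TO_BLACKLIST:
        blacklist.add(wu_id)      # stop assigning; the researcher can investigate
    elif bad_reports[wu_id] == 1:
        reassign_to_other_donors(wu_id, RETRY_DONORS)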
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues