new release: extra-large work units

Any announcements about FAH policy, servers and new projects will be made here.

Moderators: Site Moderators, FAHC Science Team

Locked
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

new release: extra-large work units

Post by kasson »

We are preparing for the public release of a new work unit category: extra-large advanced methods work units. Some background information is provided below; keep watching this thread for more details.

Why a new work unit category?
We have some specific projects where we 1) have large simulation systems and 2) want to get results fast. As multi-core processors get more powerful, we can perform calculations on Folding@Home that previously required supercomputing clusters.

What's different about these work units from the donor perspective?
These work units are special SMP work units that have larger upload and download sizes, shorter deadlines, and require more memory and CPU resources. That's why we've created a new category.

Is there any points incentive for running these work units?
The base value of these work units corresponds roughly to what an SMP work unit using the A2 core would yield on an equivalent calculation. However, because fast completion is a scientific priority for these work units, we are doing a **trial** of a new bonus scheme where faster WU completion yields a points bonus.

What systems can run these work units?
Right now, only Linux and OS/X systems can run these work units, and they require 8 or more cores. We prefer 8+ *physical* cores, although fast Core i7 machines that are dedicated folders have proven sufficient during the testing process. The points incentives are designed to steer appropriate resources to each work unit type; if your machine is marginal for the extra-large work units, you're probably better off running standard SMP.

IMPORTANT:
Although it may be obvious to most, in addition to setting the command-line flag, you must also configure your client to accept big WUs.

Does this have any relation to the large point-value work units and recent high-scoring users?
Yes. The initial projects are 2681 and 2682, valued at ~25K points base. Although these point values seem high, the work units are correspondingly larger, so the base PPD (points per day) value is roughly comparable to standard SMP.

A collaborator has donated a large amount of compute time to this project; those clients were initially running under username Anonymous/team 1. To give proper credit for the donation, we have changed the username to PDC, team 1. During the period of this donation, there are at any time between 100 and 400 8-core clients running under this username (800-3200 cores total).

Please stay tuned for further details regarding the upcoming release.

[mods, if there are important points of clarification, feel free to add a post]
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: upcoming release: extra-large work units

Post by kasson »

Here is some information about the bonus scheme being tested for these extra-large work units.

Important: The bonus scheme is based on the time at which returned work units are received by our servers. We make every effort to keep these servers available to receive work, but there will inevitably be congestion or downtime. We do not guarantee server availability. If for some reason you do not receive the expected bonus, please do let us know, but unlike base points, we will generally not give re-credits for bonuses. Bonuses are not guaranteed. Similar policies apply to unexpected loss of work units, etc. The bonus program has some "slack" calculated in to allow for such unexpected events.

***Trial*** of special bonus scheme for extra-large work units

We have decided to run a short-term trial of a different bonus scheme for the extra-large work units currently under testing.

Background on these work units:
We have a number of scientific projects that involve larger molecular systems than are typically tractable on Folding@Home. These systems have many more atoms, resulting in larger download and (usually) upload packets, and they also involve substantially more computation. We have typically run such systems on traditional supercomputing clusters, but we are testing an extension that allows Folding@Home to contribute to solving these problems.

For the current test project, we have a particular need to obtain a moderate number of long trajectories as quickly as possible. We have additional opportunities related to the current project lined up, as well as a number of other large systems, so if this test is successful (both in terms of the scientific results and a positive impact on the Folding@Home project), there is room to continue and expand.

Q: Why a different bonus scheme?
A: For this project, speed is particularly critical, more so than on many other projects. Having 5x as many clients, each working at 1/5 the speed, is not as helpful here. We are targeting SMP systems with 8+ cores on the Linux and OS/X platforms, where we have a relatively stable A2 core available. Our scientific priority is to compute the current set of work units quickly, and we wish to align the points incentive more closely with this priority.

Q: Why not just ordinary short work units and short deadlines?
A: Since these work units involve substantial downloads and uploads, this would place an undue load on both clients and our servers. We also wish to provide an additional incentive for people with many-core machines to process these work units quickly, again in line with our scientific priorities in this case.

Q: How are you awarding bonuses?
A: Bonuses are awarded according to the following formula:
Total points = base points * bonus factor

The bonus factor is computed based on the time from when our server issues the work unit to when it receives the work unit (WU_time), the time from work issuance to when the deadline would expire (deadline_time), the time from issuance to when the work unit times out and is marked for reissuance (timeout_time), and a constant factor k.
If WU_time > timeout_time, bonus factor = 1.
If WU_time <= timeout_time, bonus factor = sqrt(deadline_time * k / WU_time).

IMPORTANT:
Bonuses are only given *if* a client has a passkey and *after* a client has completed 10 Core A2 work units. Also, to qualify for a bonus, a client must have returned >80% of its work units within the deadline. Otherwise bonus factor = 1.

Example:
For project 2681, we will initially set k=2. We may adjust k as necessary. Again for project 2681, the current deadline time is 6 days and the current timeout time is 4 days. Most users' 8-core machines, clocked at 2.8 GHz or higher, complete these work units in slightly under 3 days, so they would receive a 100% bonus.
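
For those who like to see the arithmetic spelled out, here is a minimal sketch of the bonus calculation described above. This is not the actual server code; the function name, default handling, and the qualification inputs (completed_a2_wus, on_time_fraction) are illustrative only, and the 10x cap is taken from the answer further down this post.

import math

def bonus_factor(wu_time, deadline_time, timeout_time, k,
                 has_passkey, completed_a2_wus, on_time_fraction):
    # All times in the same unit (e.g. days).
    # Qualification: a passkey, 10 completed Core A2 WUs, >80% returned on time.
    qualified = has_passkey and completed_a2_wus >= 10 and on_time_fraction > 0.80
    if not qualified or wu_time > timeout_time:
        return 1.0
    # Capped at a maximum factor of 10x (see the answer below).
    return min(10.0, math.sqrt(deadline_time * k / wu_time))

# Project 2681 example from above: k = 2, deadline 6 days, timeout 4 days,
# return in about 3 days -> sqrt(6 * 2 / 3) = 2.0, i.e. a 100% bonus.
print(bonus_factor(wu_time=3.0, deadline_time=6.0, timeout_time=4.0, k=2,
                   has_passkey=True, completed_a2_wus=12, on_time_fraction=0.95))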

Q: I'd like to try running these units to burn in our new ultra-secret 80-core chip at work - with permission, of course. Is there any upper limit to the bonus points you will award?
A: At the current time, the maximum bonus factor is 10x (completion of a 2681 work unit in 1/50th the deadline = approx 2 hrs 50 min). If you're in a position to exceed that, talk to us.
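(To see where the 1/50th figure comes from: with k = 2, setting sqrt(deadline_time * 2 / WU_time) = 10 gives WU_time = deadline_time / 50, which for the 6-day 2681 deadline works out to just under 3 hours.)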

Q: What about cluster computing?
A: We're experimenting with ways to support FAH on clusters but don't have solutions we're happy with at the moment. We'll make an announcement if and when we have something ready for release.

Q: I have a couple of GPUs in my 8-way box. Can I keep running them in parallel with these special WU?
A: It depends on how the GPU work units affect the extra-large work unit speed. We don't recommend running extra-large work units if your system will take longer than 3 days to complete a 2681; the bonus system is designed to encourage fast completion, and faster is better. If you can meet this target and also run one or more GPU work units, feel free.

Q: How long will this bonus program last?
A: It depends. We will continually be evaluating this bonus program and may alter or remove it at any time. We anticipate maintaining it at least through the initial cohort of 2681 work units, but everything depends on how the bonus trial goes.

Q: Do these units require anything else besides 8 or more hardware cores?
A: A hardware speed of 2.4 GHz or higher, and plenty of RAM. For the initial units in this trial, your machine should have at least 0.5 GB of RAM available per folding core; 0.75 GB is better, and 1 GB per core is more than enough.
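For example, a machine folding on 8 cores should have at least 4 GB of RAM free for the client; 6 GB is better, and 8 GB is more than enough.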

Q: I have all of that, but for some reason I'm still being assigned regular WUs.
A1: When you configured the client, perhaps you didn't accept big WUs. If that's not set properly, several different errors might occur.

A2: If you're using -advmethods or any other flag intended to designate a specific class of WU assignment, remove that flag because it can override -bigadv.

Q: What about Core i7 chips with 8 virtual cores?
A: In our testing, these have been found somewhat marginal for completing work within the target deadlines (ideally 3 days per WU). If you have a fast Core i7 that is a dedicated folder, feel free to give it a try. If you're doing a lot of other things with that system, standard SMP may be a better bet.

Q: What if internet problems - or Stanford server problems - delay the return of my work unit?
A: Your bonus points would be reduced accordingly, as would your reliability factor if the delay pushed the WU past the timeout. These are risks you must accept when electing to fold these units. We strive to maintain a server environment that is as robust as possible, but the 80% cutoff for reliability factor is intended to allow leeway for network-connection, Stanford server, and work unit problems.

Q: What if an extra-large work unit fails, is deleted, and restarts?
A: As with ordinary WU, no points will be awarded for a deleted WU, and the clock will be reset to the start of the current attempt at the WU. Please be sure to report any failures to the forum.

Q: What if the servers run short of extra-large WU?
A: Your machine will be assigned ordinary SMP WU which may be core_a1 or core_a2 WU according to availability. Ordinary A2 work units will count toward your bonus qualification target of 10 on-time work units, and your reliability factor of 80% on-time units, but they will not earn bonuses.

Q: Do I have to fold 10 of these work units within the deadline before qualifying for bonuses, or does the bonus system take prior performance into account?
A: The bonus system is linked to passkeys. If you have been folding A2 work units under a passkey, you should already have a prior performance record.

Q: Can I try running these units on my super-overclocked, liquid-cooled quad-core system?
A: No. In our experience, fast quad-core systems tend to come in over the 4-day timeout and would thus 1) not contribute to the scientific goals of finishing these projects quickly and 2) not receive bonuses. Quad-core systems can make important contributions to standard SMP projects, and we'd encourage you to apply them there.

Q: What happens to the bonus program at the end of this trial?
A: We will evaluate several types of results from the trial: distribution of return times for the WU, fraction of WU requiring reassignment, number of machines assigned by their users to fold these WU, reported problems with the WU and bonus system, and donor feedback posted on the forum. We will also consult with the forum mods and admins and with our beta testers. At that point we will decide whether to continue, revise, or shut down the bonus program. As in any trial, however, we reserve the right to stop it or put it on temporary hold if serious problems are noted.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: upcoming release: extra-large work units

Post by kasson »

We have posted an updated version of the Folding@Home client for OS/X and Linux that enables the extra-large work units.
Drop-in replacements for the fah6 SMP binary are available at:
http://www.stanford.edu/~kasson/folding/linux/fah6
statically linked, MD5 sum 29d2ee580f6f79ac996ad29f22554ec0

and for the console OSX/Intel client:
http://www.stanford.edu/~kasson/folding/osx/fah6
MD5 sum 44eeb86e2a4f19a7266591deba504474
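If you want to verify a download before running it, you can compute its checksum in the directory where you saved the binary (for example, md5sum fah6 on Linux or md5 fah6 on OS/X) and compare the result against the sums above.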

As a reminder, you need at least 8 processor cores to run the extra-large work units.

How to do it:
Run the fah6 binary as usual, but add the following command-line flag: -bigadv
If you were running with -adv, remove that flag. If no bigadv work units are available for your configuration, the assignment server will try to assign advanced methods work units and then regular FAH work units.
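For example, a Linux machine that was previously started with something like ./fah6 -smp (your exact set of flags may differ) would now be started with ./fah6 -smp -bigadv; only the -bigadv flag is new, and -adv should be dropped if it was there.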

Project 2681 is the initial project in this series. It is benchmarked at 25403 points, preferred deadline 4 days, final deadline 6 days.
We recommend running on systems that can complete this work unit in <= 3 days.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: new release: extra-large work units

Post by bruce »

A new forum has been created to discuss this specific topic. Please use it rather than either the Linux forum or the MacOS forum.
viewforum.php?f=55
Locked