What do PRCG numbers mean?

Posted: Fri Feb 05, 2010 11:25 pm
by PantherX
Hi, I searched the forum for the definition of PRCG but didn't come up with any answer. I understand that P (Project) is determined by the Pande Group and refers to the protein, but what about the rest of the letters: R (Run), C (Clone), G (Gen)?
Any input is appreciated.

Re: What do PRCG numbers mean?

Posted: Fri Feb 05, 2010 11:58 pm
by BuddhaChu
"P - Project, R - Run, C - Clone, G - Gen." refer to the specific protein your client is working on and it appears in the client's log. Each item is a specific sub-item of the one to the left. Each Project has multiple Runs. Each Run has multiple Clones. Etc.

Examples:

[22:11:25] Project: 5766 (Run 12, Clone 103, Gen 274)
[16:19:58] Project: 2653 (Run 1, Clone 160, Gen 139)
[22:46:35] Project: 5766 (Run 10, Clone 99, Gen 1421)
When you post a problem and someone asks you for the PRCG, that's the line in the client log they're looking for.
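
For anyone who wants to pull the PRCG out of a log automatically, here is a minimal sketch. It assumes the log lines look exactly like the three examples above; the names `PRCG`, `parse_prcg` and `group_by_hierarchy` are made up for illustration and are not part of any F@h tool. The grouping function just mirrors the "each item is a sub-item of the one to the left" idea.

```python
import re
from typing import NamedTuple, Optional


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Matches the log format shown in the examples above (an assumption about
# the client log; other client versions may format this line differently).
_PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s+(\d+),\s*Clone\s+(\d+),\s*Gen\s+(\d+)\)"
)


def parse_prcg(line: str) -> Optional[PRCG]:
    """Return the PRCG found in a log line, or None if the line has none."""
    match = _PRCG_RE.search(line)
    if match is None:
        return None
    return PRCG(*(int(group) for group in match.groups()))


def group_by_hierarchy(lines):
    """Nest parsed entries as project -> run -> clone -> [gens]."""
    tree = {}
    for line in lines:
        prcg = parse_prcg(line)
        if prcg is None:
            continue  # skip log lines that carry no PRCG
        clones = tree.setdefault(prcg.project, {}).setdefault(prcg.run, {})
        clones.setdefault(prcg.clone, []).append(prcg.gen)
    return tree


log = [
    "[22:11:25] Project: 5766 (Run 12, Clone 103, Gen 274)",
    "[16:19:58] Project: 2653 (Run 1, Clone 160, Gen 139)",
    "[22:46:35] Project: 5766 (Run 10, Clone 99, Gen 1421)",
]

print(parse_prcg(log[0]))
print(group_by_hierarchy(log))
```

A named tuple keeps the four numbers together and in order, which is handy when pasting a PRCG into a problem report.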

BTW: If you're into a scientific parallel to PRCG, it would be the "biological classification" taxonomy. PRCG is the taxonomy of Folding at Home.

If you want an analogy: "PRCG" is to "Folding at Home" as "LDKP" is to "Biological Classification".

Re: What do PRCG numbers mean?

Posted: Sat Feb 06, 2010 12:24 am
by BuddhaChu
BTW: PRCG is defined in an unofficial Folding at Home dictionary:

http://www.maximumpc.com/forums/viewtopic.php?t=80226

Re: What do PRCG numbers mean?

Posted: Sat Feb 06, 2010 1:18 am
by 7im
PRCG is defined in the FAH WIKI (link above) as well:

http://fahwiki.net/index.php/Glossary

and Pande Group member Dan Ensign has some pretty interesting commentary there also...

http://fahwiki.net/index.php/Runs,_Clones_and_Gens

Re: What do PRCG numbers mean?

Posted: Sat Feb 06, 2010 4:17 am
by BuddhaChu
Thanx 7im. I didn't know about that F@H Wiki entry and I went a little off the deep end on my answer. ;)

Re: What do PRCG numbers mean?

Posted: Sat Feb 06, 2010 4:59 am
by 7im
Multiple points of view are always good, as are different writing styles. Everyone picks up different bits in different ways. :)

Re: What do PRCG numbers mean?

Posted: Sat Feb 06, 2010 1:25 pm
by PantherX
Thanks guys, read the URLs and that is pretty interesting stuff.

What do PRCG numbers mean?

Posted: Thu Nov 24, 2011 5:55 am
by Jesse_V
Hi. I was hoping someone could clarify what exactly Projects, Runs, Clones, and Generations are. I've been around F@h for a while now, I've been doing research into how it works, and I'm enjoying sharing my knowledge about the projects with others (such as on the Wikipedia page). Oddly, I have yet to run across accurate information about what exactly the PRCG numbers are scientifically. I've been searching around, with limited success, and I was hoping someone could explain. It seems odd that something as fundamental as those numbers (they basically distinguish a WU from other WUs, so I think the information behind that is notable) would not be explained on some page buried in folding.stanford.edu. To my knowledge, servers treat all WUs basically the same way, so there must be a uniform definition. Thus my guess is that it is a rather technical matter, which may vary depending on the calculation techniques being used.

In my search, I ran across this old topic: viewtopic.php?f=17&t=240 which didn't say much, but it did link to this page: http://fahwiki.net/index.php/Runs%2C_Clones_and_Gens which I had seen before. It states that it's written by Dan Ensign, who I'm assuming is the same Dan Ensign mentioned in Dr. Pande's 2007 blog post: http://folding.typepad.com/news/2007/09 ... eam-2.html and if that's the case then he knows what he's talking about, although it could be an oversimplification since it doesn't seem scientifically written. The information on that wiki page was last modified in 2008, so regardless of whether it was accurate at that time, I'm not sure I currently trust it. There have probably been some technical advances since then, so perhaps his description is obsolete to some degree. But it seems to describe the following:

Project - not explained outright but seems to be the protein under study. To me, this implies a particular amino acid sequence.
Run - the arrangement of the atoms in the protein in three dimensional space (orientation irrelevant) so in other words what configuration they're in.
Clone - describes the set of forces given to each atom
Gen - time steps in the simulation process, they are very serial so the n+1th is generated after the nth finishes

He seems to be talking about explicit solvation, which is relatively easy for me to visualize. In another topic (viewtopic.php?f=44&t=20008&p=198882#p198882) I used the analogy of billiards balls to try to explain this as I understood it. The Project was the set of balls you picked up, the Run was how you arranged them in the air, the Clone described how all your friends would set them in motion, and the Generations were frames in a recording of what happened. Now, I realize that my description is a simplification of a simplification, and that's why I'd like to clear things up here. My analogy would work for a small number of long simulations (like the Anton computer uses) but F@h's approaches are much more complicated and efficient, so the analogy likely fails there.

In continuing my search, I tried to access the Reference Links at the bottom of Dan Ensign's explanation. Almost all of them were dead, but I was able to find one using waybackmachine.org: http://web.archive.org/web/200712151740 ... clone.html which was written by an unknown author, who states that he's confused as well. Whoever wrote it did quote from the prominent F@h paper titled "Atomistic protein folding simulations on the hundreds of microsecond timescale using worldwide distributed computing", which I've read through. It describes how proteins spend much of their time "waiting" in various states before quickly transitioning to the next configuration. F@h takes advantage of this by simulating only the quick transitions and then using algorithms to statistically stitch the entire simulation together. The unknown author indicated that his best guess is:

Run: Different parameters to the protein fold, such as different temperature, different force cutoff distance, etc..
Clone: One of many simulations being performed, each one has randomized initial velocities, the first one to cross the free-energy barrier wins
Gen: When one WU is finished, another is assigned to continue from where it left off. That one would be one generation further..

This seems to correlate a bit to what Dan Ensign was describing, although the unknown author noted that he could have Run and Gen mixed up, and Gens start when a protein leaves an energy minimum.

So basically, I'm not sure what they are. Do the same definitions apply to both implicit and explicit solvation models? How does this work when free-energy perturbation or simulated tempering techniques are used? Compounding this issue is that "Project" describes not just the set of amino acids, but in some cases (see project 5749) also specifies a specific simulation temperature. So I'm a bit confused. Any description would be helpful, and any kind of reference to some reliable material would be really great as well. Thank you.

Re: What do PRCG numbers mean?

Posted: Thu Nov 24, 2011 9:27 am
by MtM
Dan Ensign -> http://folding.stanford.edu/Pande/People

From what I understood from Dan's texts and PMs at the time (he wrote those articles in layman's terms so we donors could try to understand them), F@H does simulate the waiting time between transitions. That was the whole point of F@H: enter starting values into a simulation model and let it run, basically a coin flip where the chance of seeing an actual transition is very, very slim. I am not sure if F@H has already found enough transitional states to only research variations on those starting values, but I can imagine this is only partially true?

I don't understand what you mean by 'the first to cross the free-energy barrier wins'. Clones are made because F@H does not know what forces and start positions are needed to get a protein to fold, so clones are made to simulate a larger range within a single Run (a group with the same atom placement, temperature, etc. but with different velocities, as mentioned in the wiki article). Clones are almost identical but differ in their initial molecular movement. Tbh I asked Dan why velocities and temperature were different, as the speed at which molecules move depends on their temperature from my understanding, but it was too hard to explain to someone without the background needed, so I couldn't tell you/anyone this either as I don't know for sure. It might have to do with what you mentioned, explicit and implicit models. In one model, there are no forces to be considered except the forces between the molecules of the protein. In the other, there is a solvent which interacts with them as well. Maybe the article, since it was made before implicit solvent models were available to F@H, mentioned clones this way because they couldn't simulate the effects of a solvent's interaction with the molecules other than by giving them different initial directions of movement, but I'm guessing here.

The generations are based on simulation length, or steps. If x steps have been reached the work unit is finished and sent in, and a new work unit can be given out continuing from the last state of that previous work unit.
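
The "continue from the last state" idea can be shown with a toy example. This is only a sketch under big assumptions: a single particle on a spring stands in for the protein, and `State`, `run_steps` and `run_trajectory` are hypothetical names, not anything from the F@H cores. The point is just that chaining fixed-length pieces end to end gives the same trajectory as one long run, which is why Gens have to be done serially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float  # position of a toy 1-D particle
    v: float  # velocity of that particle


def run_steps(state: State, n_steps: int, dt: float = 0.01) -> State:
    """Advance a toy harmonic oscillator (force = -x) by n_steps."""
    x, v = state.x, state.v
    for _ in range(n_steps):
        v -= x * dt  # update velocity from the force
        x += v * dt  # update position from the new velocity
    return State(x, v)


def run_trajectory(start: State, n_gens: int, steps_per_gen: int):
    """Run gens serially: gen n+1 starts from the final state of gen n."""
    checkpoints = [start]
    for _ in range(n_gens):
        checkpoints.append(run_steps(checkpoints[-1], steps_per_gen))
    return checkpoints


start = State(x=1.0, v=0.0)
checkpoints = run_trajectory(start, n_gens=5, steps_per_gen=100)
one_long_run = run_steps(start, 500)

# Five "gens" of 100 steps end in the same state as a single 500-step run.
print(checkpoints[-1] == one_long_run)
```

In this picture each checkpoint plays the role of what a finished work unit would send back, and the next piece cannot start until it exists.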

As I just said, I'm not sure whether clones in implicit solvent models are the same; maybe they still need to simulate variances in the initial directions and speeds of molecules, and it still applies.

Not a very satisfactory answer maybe, and I'm certainly hoping someone who is a molecular dynamics expert will answer your questions, but I thought I'd share what came to mind reading your post.

Re: What do PRCG numbers mean?

Posted: Thu Nov 24, 2011 5:31 pm
by Jesse_V
Well, I guess it is the real Dan Ensign, and it makes sense he'd try to simplify things for the rest of us. Still, I would have appreciated an explanation that was about halfway between his style and one of the scientific publications. Oh well.

I gather that they figure out the related conformation states first, then the WUs figure out the rates of transition between them. And I believe I got the "first one to cross the free-energy barrier wins" phrase from that unknown author, and he was saying that the first WU of each Clone basically is used to identify the fastest rate of transition between these states. It's my understanding that in explicit solvation, everything is simulated atom-by-atom, which may include surrounding atoms of the solvent such as water. In implicit solvation, things are more simplified; solvents are treated as a continuum using mathematical models (kind of like simulating a single ocean wave using sin(x)), so there are a lot fewer atoms being worked on and the simulation runs faster overall. So I'm not sure how these different techniques affect what PRCG numbers mean, but I suspect there may be some sort of complicated difference.

Re: What do PRCG numbers mean?

Posted: Thu Nov 24, 2011 6:14 pm
by MtM
I am not an expert, but this sounds really wrong to me.
Jesse_V wrote:I gather that they figure out the related conformation states first, then the WUs figure out the rates of transition between them.
To me it reads like you're saying that the simulation is done from a folded to denatured state, when to my knowledge it is the other way around?

In layman's terms (sorry), I believe that clones are needed to cover a range of possible initial movement (direction/speed) for each individual atom, and that any of them could potentially lead to a transitional state and then a folded state. I don't know, for the reasons given, why this direction/speed isn't specified by temperature. Well, I have a hunch: temperature might relate more directly to speed of movement but can't predict direction, as directions will vary due to interactions with other forces (a 'random start direction' followed by movement of atoms directed by attraction or repulsion to other molecules or solvents based on electron count(?) and to some extent maybe gravity(?), etc.).

I think it's best if I let someone qualified answer your question though, I don't know the exact answers.

Re: What do PRCG numbers mean?

Posted: Fri Nov 25, 2011 5:21 am
by gwildperson
I remember reading another layman's explanation that bruce wrote quite some time back. I don't know if it was on this website or the older version that died. It made sense at the time and the two explanations together were better than either one.

I did some thinking about this earlier today, trying to explain FAH to my brother over our Thanksgiving dinner so it's fresher in my mind than it would otherwise be.

Your explanation of Gens is exactly what I understand. It takes a whole series of Gens to create a single trajectory.

The explanation of Projects mostly agrees with what I understand. There might be several projects for the exact same protein with the exact same set of force equations but including other differences. For example, the question of implicit solvent vs. explicit solvent simulations of the same protein would probably be different Projects. I doubt that this has anything to do with run or clone, though I'm really not exactly clear about the differences between Runs and Clones.

Every project starts out with the atoms in particular positions. The protein may be completely denatured or in some intermediate state. I think it can run "backwards" depending on the initial energy, either folding or unfolding.

Not only must positions be assumed, initial velocity and direction must be assumed. Starting from different positions or with different velocities will probably create different runs and/or clones.
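
On the temperature vs. velocity question that keeps coming up in this thread: in molecular dynamics in general, temperature fixes only the statistical spread of the atomic velocities (the Maxwell-Boltzmann distribution), not the individual values, so many different sets of velocities are consistent with one temperature. Below is a rough sketch of that idea. Treating each random seed as a "clone" is an assumption made purely for illustration, not a statement of how F@H actually assigns clone numbers, and `sample_velocities` and `kinetic_temperature` are made-up names.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sample_velocities(n_atoms, mass_kg, temperature_k, seed):
    """Draw 3-D velocities (m/s) from the Maxwell-Boltzmann distribution."""
    rng = random.Random(seed)
    # Each velocity component is Gaussian with this standard deviation.
    sigma = math.sqrt(K_B * temperature_k / mass_kg)
    return [
        tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in range(n_atoms)
    ]


def kinetic_temperature(velocities, mass_kg):
    """Temperature implied by the velocities: mean KE = (3/2) k_B T per atom."""
    kinetic_energy = sum(
        0.5 * mass_kg * (vx * vx + vy * vy + vz * vz)
        for vx, vy, vz in velocities
    )
    return 2.0 * kinetic_energy / (3.0 * len(velocities) * K_B)


MASS = 12.0 * 1.66053906660e-27  # roughly one carbon atom, in kg

# Same atoms, same temperature, different seeds (hypothetical "clones").
clone_a = sample_velocities(20000, MASS, 300.0, seed=0)
clone_b = sample_velocities(20000, MASS, 300.0, seed=1)

print(clone_a[0] != clone_b[0])  # individual velocities differ
print(kinetic_temperature(clone_a, MASS), kinetic_temperature(clone_b, MASS))
```

Both sets should come out close to 300 K even though no atom has the same velocity in the two sets, which is one way to see how "same temperature" and "different initial velocities" can both be true.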

Re: What do PRCG numbers mean?

Posted: Wed Nov 30, 2011 5:48 am
by Jesse_V
MtM wrote:I am not an expert, but this sounds really wrong to me.
I gather that they figure out the related conformation states first, then the WUs figure out the rates of transition between them.
To me it reads like you're saying that the simulation is done from a folded to denatured state, when to my knowledge it is the other way around?

In layman's terms (sorry), I believe that clones are needed to cover a range of possible initial movement (direction/speed) for each individual atom, and that any of them could potentially lead to a transitional state and then a folded state. I don't know, for the reasons given, why this direction/speed isn't specified by temperature. Well, I have a hunch: temperature might relate more directly to speed of movement but can't predict direction, as directions will vary due to interactions with other forces (a 'random start direction' followed by movement of atoms directed by attraction or repulsion to other molecules or solvents based on electron count(?) and to some extent maybe gravity(?), etc.).
Well I don't really know. That's why I'm asking. Sorry for the confusion, but by "related conformation" I mean a somewhat-folded state, in which the protein waits for a bit before jumping to the next conformation (which since they're adjacent are related I guess). If you watch this protein folding video you can see a bit of what I mean. In an interview, Dr. Pande also described the behavior of this protein as similar to a car parallel parking, it gets in the wrong way and has to back up and try again. But I'd be happy to try to wade through a technical explanation, because I want to get this right. I don't know myself, and as you know I'm working on the F@h article on Wikipedia, and I want to get it right there too. Drawing conclusions from oversimplifications is never a good idea.
gwildperson wrote:I remember reading another layman's explanation that bruce wrote quite some time back. I don't know if it was on this website or the older version that died. It made sense at the time and the two explanations together were better than either one.
If you could find this for me, I'd really, really appreciate it. I'm hoping either Bruce or a PG member finds this thread and answers these questions fully, and like I said above I don't mind a technical and thorough explanation, since I can always follow up with further research and perhaps some clarification questions. I know you said the description was in layman's terms, and as long as it's correct that's all I'm asking.
gwildperson wrote:I did some thinking about this earlier today, trying to explain FAH to my brother over our Thanksgiving dinner so it's fresher in my mind than it would otherwise be.
I always make sure I use the simplest terms possible, otherwise it's easy to completely overwhelm people with the technical workings and overall complexity. I guess it depends on your audience though. The simplest explanation I've ever seen is here. The first leading paragraphs of this page may also be helpful.
gwildperson wrote:Every project starts out with the atoms in particular positions. It may be a completely denatured or at some intermediate state. I think it can run "backwards" depending on the initial energy, either folding or unfolding.

Not only must positions be assumed, initial velocity and direction must be assumed. Starting from different positions or with different velocities will probably create different runs and/or clones.
Right, but I thought the PG group only studied unfolded (either partially or fully) => folded. There are a lot of variables that come into play I'm sure, such as initial atomic positioning, temperature, initial velocity, the surrounding solvent, and the forces applied to the protein, as well as a whole bunch of other more subtle factors. My question can be boiled down to "How do the PRCG numbers separate out these factors?"

Re: What do PRCG numbers mean?

Posted: Wed Nov 30, 2011 12:37 pm
by Napoleon
Jesse_V wrote:Right, but I thought the PG group only studied unfolded (either partially or fully) => folded.
I don't think so, check out projects 5787 - 5798 description, for example. I assume those GROGPU2 projects are still active.

Re: What do PRCG numbers mean?

Posted: Fri Dec 09, 2011 10:53 pm
by Jesse_V
According to an email from Dr. Pande, "There is a lot of flexibility in how PRCG is used, so the answer isn't simple. The main idea is that FAH works by statistical sampling, so having many related "replicas" of a given calculation working allows us to get a lot of information."

From this, I am concluding that there is not one universal definition. Without details from a knowledgeable source, at this point I'm under the assumption that the PRCG numbers mean whatever the scientists who start the projects want them to mean, and there is no definitive explanation across the board. Dr. Pande's explanation does make sense though, since F@h uses many different simulation techniques and I expected that one definition wouldn't be applicable in a totally different core. And I understand in general how their Markov State Models work. Well, at least four small numbers are a bit easier than Rosetta@home's WUs, which appear to be named like "jsr_decoys_nat_frags_2nr7_abrelax_34262_774_0". Wow.
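
To make the "statistical sampling with many related replicas" idea a bit more concrete, here is a very small sketch of the counting step behind a Markov State Model in general: take many short trajectories, label each frame with a state, count the jumps between states, and normalize. The trajectories below are invented numbers, the state labels 0/1/2 are arbitrary, and `transition_matrix` is a hypothetical helper; real MSM construction involves much more (clustering, lag times, validation) and this is not the Pande Group's actual pipeline.

```python
from collections import Counter


def transition_matrix(trajectories, n_states):
    """Estimate row-normalized transition probabilities from short runs."""
    counts = Counter()
    for traj in trajectories:
        # Count every observed jump from one frame to the next.
        for current, following in zip(traj, traj[1:]):
            counts[(current, following)] += 1

    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            matrix.append([0.0] * n_states)  # state i was never left or seen
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Invented example data: five short "replica" trajectories over three states.
trajectories = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [1, 1, 1, 2],
    [2, 2, 2, 2],
]

matrix = transition_matrix(trajectories, n_states=3)
for row in matrix:
    print(row)
```

No single short trajectory goes all the way from state 0 to state 2, yet the combined counts still describe the whole 0 -> 1 -> 2 route, which is roughly why lots of related replicas are useful.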

Mod Edit: Merged Two Topics - PantherX