genetic algorithms

Post by **stevedking** » Thu Dec 20, 2012 3:04 am

Has anyone considered genetic algorithms (GAs) rather than Markov state models (MSMs)?

Post by **bruce** » Thu Dec 20, 2012 6:05 am

I'm not sure genetic algorithms would contribute to the field of knowledge that FAH is studying. Their objective is to understand WHY proteins misfold, not to predict the end-state solution to the folding of a specific protein. As such, MSMs deal with the intermediate states between an initial and a final state and with the probabilities of transitions from one intermediate state to another. How would GAs contribute to that knowledge base?
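For anyone who hasn't seen one, the core of an MSM can be sketched in a few lines. This is a minimal illustration, not F@h's actual pipeline: it assumes the simulation frames have already been clustered into integer state labels (the hypothetical `dtraj` below is made up), counts transitions at a fixed lag time, and row-normalizes the counts into transition probabilities. Real MSM tools use more careful estimators, for example ones that enforce detailed balance.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition-count estimate of an MSM transition matrix.

    dtraj: sequence of integer state labels, one per saved frame (assumed to
           come from a clustering step that is not shown here).
    lag:   number of frames between the start and end of a counted transition
           (must be >= 1).
    """
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that were never left get an all-zero row instead of a division by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical discretized trajectory over 3 states.
dtraj = [0, 0, 1, 1, 2, 2, 2, 1, 0, 0, 1, 2]
T = estimate_transition_matrix(dtraj, n_states=3)
print(T.round(2))
```

Each row of `T` is the estimated probability of jumping from one state to each other state in one lag time.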

Post by **stevedking** » Thu Dec 20, 2012 6:34 am

Well put; I see your point and the information you're trying to gain. The only part of a GA that might help with intermediate fold states is the collection of state data using "GA similarity templates", aka schemata. I would run a GA from start to finish, collect a database of intermediate-fold string states, classify these string states as intermediate-fold schemata (templates), and store them in a hash lookup (each template will be unique). Your "objective function(s) output" and your "GA transition rule(s) states" can be tagged or associated with your hash entries. The transition-rule information, tied to the objective-function output for each intermediate-fold schema, might be able to identify or isolate specific reasons for misfolds... as there are many misfolds and your database will be large. Just a thought.
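A rough sketch of the schema lookup described above, under several assumptions: each intermediate fold is encoded as an equal-length string (the per-residue H/E/C letters used here are a hypothetical encoding), a schema is formed by keeping the positions where a group of strings agree and writing `*` elsewhere, and each hash entry accumulates (objective value, transition rule) tags. The function names, the encoding, and the tag values are all illustrative.

```python
from collections import defaultdict

def schema_of(states):
    """Collapse equal-length state strings into one schema: positions where
    every string agrees keep that character, all others become '*'."""
    return "".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*states))

def matches(schema, state):
    """True if a state string is an instance of the schema."""
    return len(schema) == len(state) and all(s in ("*", c) for s, c in zip(schema, state))

# Hash lookup: schema string -> list of (objective value, transition rule) tags.
schema_table = defaultdict(list)

def record(states, objective, rule):
    """Tag the schema covering these states with an objective value and rule name."""
    schema_table[schema_of(states)].append((objective, rule))

# Hypothetical intermediate-fold strings and tags.
record(["HHCCEE", "HHCEEE"], objective=-12.3, rule="crossover")
print(dict(schema_table))
print(matches("HHC*EE", "HHCEEE"))
```

Because the table is keyed by the schema string, each template appears once and collects every tag recorded against it.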

Post by **Jesse_V** » Thu Dec 20, 2012 7:33 am

It's an interesting idea. Are you familiar with how MSMs work and how F@h uses them?
See http://folding.stanford.edu/English/FAQ-Simulation and http://en.wikipedia.org/wiki/Folding@ho ... gnificance

MSMs already collect the intermediate states and a lot of statistics and other data about the relationships between the states. I know very little about genetic algorithms, but it seems to me that MSMs already do what you described in your last post.

P.S. Welcome to the forum.

Post by **bruce** » Thu Dec 20, 2012 9:49 pm

Yes, welcome to foldingforum.org, stevedking.

Some people are "visual" people (I am) and others may not be, but there are a number of diagrams in the research papers that I find helpful.

First, the folding process is not deterministic: thermally driven random motions are an important component of the process. As such, the simulated protein must be folded many, many times so that the statistical patterns emerge. Then those statistics are analyzed.
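To illustrate why repeated runs are needed, here is a toy Monte Carlo sketch. The states and transition probabilities are made up; "n" stands for the native end state and "n1" for a misfolded one. No single run tells you much, but the fraction of many runs that end in each state settles down to a stable statistic.

```python
import random
from collections import Counter

# Hypothetical toy model: each state maps to (next_state, probability) pairs.
# "n" (native) and "n1" (misfolded) are absorbing: a run stops when it reaches one.
TRANSITIONS = {
    "a": [("b", 0.7), ("c", 0.3)],
    "b": [("n", 0.8), ("c", 0.2)],
    "c": [("n1", 0.5), ("b", 0.5)],
}
ABSORBING = {"n", "n1"}

def run_once(rng, start="a"):
    """Follow random transitions from `start` until an end state is reached."""
    state = start
    while state not in ABSORBING:
        targets, weights = zip(*TRANSITIONS[state])
        state = rng.choices(targets, weights=weights)[0]
    return state

def end_state_fractions(n_runs, seed=0):
    """Fraction of independent runs that end in each absorbing state."""
    rng = random.Random(seed)
    counts = Counter(run_once(rng) for _ in range(n_runs))
    return {s: counts[s] / n_runs for s in sorted(ABSORBING)}

print(end_state_fractions(10_000))
```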

Take a look at Figure 3 in this paper. For this simple protein, the MSM process has isolated some 2000 intermediate states, but for the purpose of the diagram they've shown only a few, together with the highly probable and some less probable transitions. If state "a" is a starting shape and state "n" is the most probable end state, there's also some other misfolded state "n1" which is unlike state "n" but which is associated with a disease. Mentally expand the diagram to some 2000 states with all their associated transitions, and then try to figure out WHY some percentage of the protein ends up at "n1" rather than at "n". The MSM process can extract those 2000 shapes for additional scrutiny.
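For a model small enough to write down, the question "what fraction ends up at n1 rather than n?" has an exact answer from standard absorbing Markov chain theory: with Q holding transitions among intermediate states and R holding transitions into end states, the absorption probabilities are B = (I - Q)^-1 R. The toy numbers below are invented for illustration and are not taken from the paper's figure; with 2000 states the matrices are simply larger.

```python
import numpy as np

# Hypothetical toy MSM: intermediate states a, b, c; end states n (native), n1 (misfolded).
transient = ["a", "b", "c"]
absorbing = ["n", "n1"]

# Q[i, j]: probability of moving from intermediate state i to intermediate state j.
Q = np.array([
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 0.2],
    [0.0, 0.5, 0.0],
])
# R[i, k]: probability of moving from intermediate state i directly into end state k.
R = np.array([
    [0.0, 0.0],
    [0.8, 0.0],
    [0.0, 0.5],
])

# Absorption probabilities B = (I - Q)^-1 R, computed with a linear solve
# rather than an explicit matrix inverse.
B = np.linalg.solve(np.eye(len(transient)) - Q, R)
for name, row in zip(transient, B):
    print(name, {k: round(float(p), 3) for k, p in zip(absorbing, row)})
```

Starting from "a", this toy model ends at "n" about 76% of the time and at "n1" about 24% of the time. The interesting question in the real analysis is which intermediate states and transitions drive that split.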

Moreover, just because the protein randomly spends more time in those 2000 shapes than in other shapes, there's no guarantee that the critical information fits within the 2000 state cutoff that they used.

At what point in this process would you apply GA and what information would it extract? Aren't these 2000 shapes what you are calling "similarity templates"? There's no doubt that the number of shapes can be reduced by classifying them as similar to others, but how would anyone know if the dissimilarity between similar states, itself, is not the hidden factor that leads to a mis-fold?

I'm not implying that there's anything wrong with your suggestion; I'm just asking why you think it might be an improvement. (I'm not a FAH scientist. My understanding of what I've explained here comes from reading the papers.)

Post by **stevedking** » Fri Dec 21, 2012 4:42 am

Now that I think more about a GA... I suppose it might not be a good idea. (1) A GA usually runs for a given number of GA loops (generations) and then stops. Where it stops may not be a proper misfold; it may simply stop at an intermediate fold. (2) The other problem with a GA is that it starts with a sample set drawn from a large population, and in this case the GA would need to start with a set of initial folds rather than a single fold.
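Both points can be seen in a bare-bones GA loop. This is a generic sketch over bit strings, not a protein model: the `fitness` function and `TARGET` string are hypothetical stand-ins for an objective function. The loop starts from a random population and stops after `n_generations` regardless of whether the best individual is optimal.

```python
import random

def run_ga(fitness, length, pop_size=20, n_generations=30, mutation_rate=0.02, seed=0):
    """Minimal generational GA over bit strings (lists of 0/1)."""
    rng = random.Random(seed)
    # Point (2): the GA starts from a whole population, not a single candidate.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tournament(population):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # Point (1): run a fixed number of generations, then stop.
    for _ in range(n_generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    # Whatever is best at the stopping point is returned, optimal or not.
    return max(pop, key=fitness)

# Hypothetical objective: count positions matching a target bit string.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

def fitness(bits):
    return sum(b == t for b, t in zip(bits, TARGET))

best = run_ga(fitness, len(TARGET))
print(fitness(best), "of", len(TARGET), "positions matched")
```

Nothing in the loop checks whether `best` is a true optimum; it is just the best individual alive when the generation budget runs out, which is the stopping problem described in point (1).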

I'd guess that a GA could be modified or tuned to process strings of amino acids, but that may require some trickery. Well, I thought I'd ask and it appears the proper tool right now is the MSM.

Post by **Jesse_V** » Fri Dec 21, 2012 5:33 am

stevedking wrote: I'd guess that a GA could be modified or tuned to process strings of amino acids, but that may require some trickery. Well, I thought I'd ask and it appears the proper tool right now is the MSM.

Well, it's always a good idea to think about new ideas. After all, that's what got F@h launched in the first place. AFAIK most scientists didn't think it was possible to parallelize protein folding. From the papers I've read, it seems that everyone thought it was an intrinsically serial process, and previous attempts to parallelize it required a lot of inter-process communication that only a really expensive and powerful supercomputer could handle (ANTON, BlueGene, etc.). MSMs and adaptive sampling are essential for distributed computing to power F@h, but I'm sure they're not the ONLY solution.

Folding Forum
