Artificial Intelligence for F@H

moritzgedig · Post by **moritzgedig** » Fri Nov 02, 2018 1:02 am

I wonder whether F@H could profit from the current surge in AI?
I imagine its use on the server side in the production of WUs. I think a neural-net AI would be good at predicting the short-term direction the folding will take. By now you should have enough data to train one. You could then produce WUs around, but not at, that prediction and see whether they converge.
Obviously I have little insight into either field, but that is what I imagine could reduce the number of runs you need to find the most likely paths without having to be on the lookout for divergence all the time. The AI might "guess" that the current state is divergent (leads to multiple semi-stable states), so that would be in the output as well.

Post by **bruce** » Fri Nov 02, 2018 6:13 am

FAH is not about predicting the most probable direction that a folding protein will take ... it's about finding many reasonable possibilities. The classic example is Alzheimer's disease, where the final shape is well-known and does not cause the disease; the goal is to find how and why the protein sometimes misfolds, producing the disease. I really like the second figure in the Wikipedia article about FAH showing how Markov State Models (MSMs) help to find many alternative paths of varying probability. Suppose just one of those paths leads to a stable alternate shape.

I'm not suggesting that AI can't be productive, but probably in some other way than you suggest. FAH's researchers do spend a lot of time and energy seeking to improve future research methodology. (Early FAH research didn't use MSMs.)

moritzgedig · Post by **moritzgedig** » Sat Nov 03, 2018 2:40 pm

bruce wrote:FAH is not about predicting the most probable direction that a folding protein will take

And I did not think or suggest so.
One has to make the distinction between long term and short term.
I assume that most of the compute time is used to "wiggle around" in a quasi-stable state (call it a configuration cluster, attractor, or set of similar numerical values) that then later becomes a Markov State.
During the transition from one state to another (you might not know it at that time) the short-term path is likely predictable. At that point an AI could point the WU generator in a direction, biasing new WUs.
Any such attractor or configuration cluster has too many dimensions to enclose in a hyper-sphere and search systematically for exit paths. By predicting the short-term development that the molecule is most likely to take from a known outlier, you might save some compute time.
You would obviously not train the AI to always point back to the local "optimum", but train it on known transitions to generate a plausible direction that need not be the most likely.

Suppose just one of those paths leads to a stable alternate shape.

Prion disease, I know.

probably in some other way than you suggest.

Just as good and interesting.

moritzgedig · Post by **moritzgedig** » Thu Nov 08, 2018 6:30 pm

Vijay Pande calls what I am talking about "shooting trajectories". It is the process of searching the state space. Each simulation (WU) only does a short trajectory, and most of them do not progress towards a final folded state.
The AI's job would be to minimize the number of short trajectories that do not add to the project and to hint at those that will progress into the direction of the transition towards the next local optimum, Markov State.
After years of F@H you now have the data to train an AI to be an expert beyond human capacity at predicting what will NOT happen.
The most fruitful, yielding way to accelerate computation is always to find a way to need less of it.
The danger is to miss a transition because the AI was wrong and you were too restrictive in that direction, but nothing keeps you from spending more resources later to explore the state space further. It would be dangerous to train the AI on data that was generated relying on it too heavily.
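
To make the trade-off concrete, here is a minimal sketch of biased seeding with a safety floor. Everything in it is an assumption for illustration: the state labels, the `scores` (standing in for whatever a trained predictor would output) and the `floor` parameter are hypothetical, and this is not how the F@H work servers generate WUs. The point is only that mixing the AI's guess with a uniform share keeps every state reachable, so a wrong prediction delays a discovery rather than preventing it.

```python
import random

def seed_weights(scores, floor=0.2):
    """Mix predictor scores with a uniform floor so no state is starved.

    scores: dict mapping a state label to a non-negative score
            (hypothetical output of a trained predictor).
    floor:  fraction of the sampling budget spread evenly over all states.
    """
    n = len(scores)
    total = sum(scores.values())
    weights = {}
    for state, s in scores.items():
        # Fall back to uniform guidance if the predictor scores are all zero.
        guided = s / total if total > 0 else 1.0 / n
        weights[state] = (1.0 - floor) * guided + floor / n
    return weights

def pick_seeds(scores, k, floor=0.2, rng=None):
    """Draw k starting states for new WUs according to the mixed weights."""
    rng = rng or random.Random()
    w = seed_weights(scores, floor)
    states = list(w)
    return rng.choices(states, weights=[w[s] for s in states], k=k)

# Hypothetical scores: the predictor strongly favours state "B"
# and believes nothing will happen from state "D".
scores = {"A": 0.05, "B": 0.90, "C": 0.05, "D": 0.0}
weights = seed_weights(scores)
seeds = pick_seeds(scores, k=10, rng=random.Random(0))
```

With `floor=0.2` and four states, even state "D" keeps a 5% share of new WUs, which is the "spend more resources later" safety net in its simplest form.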

Post by **bruce** » Fri Nov 09, 2018 3:58 am

moritzgedig wrote:... and to hint at those that will progress into the direction of the transition towards the next local optimum, Markov State.

I'd like you to explain this in more detail.

To my layman's understanding, I always understood that the Markov States are not known until you "shoot some trajectories" and locate a suitable number of trajectories that congregate somewhere -- at a local minimum. When you're starting those trajectories, you have started with a blank N-dimensional slate where one or two endpoints are known and random starting points are chosen.

Then you pick out the places where those paths congregate and call them Markov states (worthy of additional study). Then in a new, more productive study, you concentrate on paths that start at those interesting locations and end at another interesting location, ignoring the other places which were encountered randomly but where nothing interesting happened.

Are you proposing that AI can do something useful in the first type of study (random) or in the second type of study (after Markov states have been identified)?
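
For readers who have not seen it, the counting step behind a Markov State Model can be sketched in a few lines. This assumes the hard part is already done, i.e. every frame of every short trajectory has been assigned to a discrete state; the state labels "U", "I", "F" and the toy trajectories are made up for illustration, and real MSM software does considerably more (clustering, lag-time selection, reversibility constraints).

```python
from collections import defaultdict

def transition_matrix(trajectories, lag=1):
    """Estimate transition probabilities by counting state-to-state jumps.

    trajectories: list of short trajectories, each a list of state labels.
    lag:          number of frames between the "from" and "to" state.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for i in range(len(traj) - lag):
            counts[traj[i]][traj[i + lag]] += 1
    # Normalise each row of counts into probabilities.
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: c / total for dst, c in row.items()}
    return matrix

# Toy data: three short trajectories over hypothetical states
# U (unfolded), I (intermediate), F (folded).
trajs = [
    ["U", "U", "U", "I", "I"],
    ["I", "I", "F", "F", "F"],
    ["U", "U", "I", "U", "U"],
]
T = transition_matrix(trajs)
```

Note that no single trajectory goes from "U" to "F", yet the matrix connects them through "I". That stitching of many short runs into long-range paths is what makes the short-trajectory approach workable.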

moritzgedig · Post by **moritzgedig** » Fri Nov 09, 2018 6:49 pm

bruce wrote:To my layman's understanding, I always understood that the Markov States are not known until you "shoot some trajectories" and locate a suitable number of trajectories that congregate somewhere -- at a local minimum.

That is my understanding as well.

When you're starting those trajectories, you have started with a blank N-dimensional slate where one or two endpoints are known and random starting points are chosen.

This is in conflict with my understanding. There is no blank slate but many known short trajectories. They "fill" the space, if you could draw a surface on that volume your area of interest would be on it.
Obviously clustering known states is not that easy because the distance between points in that space is undefined. But apparently there is a similarity metric defined and agreed upon among experts. After all, any "short" run should not jump across large distances in a well-defined state space. Some form of dimension reduction is clearly possible and would be done by any human, as one would naturally wish to represent the protein structure efficiently. Distances between bonded atoms will not change much, thus the structure storage and state vector will not be a linked list of numbered atoms with xyz coordinates.

Then you pick out the places where those paths congregate and call them Markov states

Yes, to my understanding.

worthy of additional study

Not AFAIK. By then you know it is either a final state or just an intermediate.
F@H is after the transitions "dynamic" between them.

Then in a new, more productive study, you concentrate on paths that start at those interesting locations and end at another interesting location, ignoring the other places which were encountered randomly but where nothing interesting happened.

At first you at best know whether you are on the surface (interesting) or in the volume (not interesting). The protein will not just move downhill all the time; it is not an annealing process.

Are you proposing that AI can do something useful in the first type of study (random) or in the second type of study (after Markov states have been identified)?

After Markov States have been identified you are done. Obviously the point of using an AI is not to search randomly. I repeatedly stated that the AI's job is to make educated expert guesses. I used the words "bias", "hint", "prediction".
Think of the aforementioned state space surface, instead of searching all of it, you would ask the AI where it thinks the next state transition is most likely to be found, but I guess that asking where not to focus is even more valuable.

Us talking about this is interesting but not productive. Whoever is responsible for the method and code that generates new WUs (trajectories) is the only one who can say: "That is a dumb idea." or "That is an idea worth exploring."
Even if this idea is workable it might turn out to be too computationally expensive compared with the current method. Or maybe the entire idea is in conflict with the goal of F@H because they want the runs that they would be avoiding to form a solid, dense wall all around their Markov States.
But even if they do want the no-transition data, they can always get it later, after they have found all the states the AI did not rule out.

Post by **bruce** » Sat Nov 10, 2018 1:34 am

moritzgedig wrote:This is in conflict with my understanding. There is no blank slate but many known short trajectories. They "fill" the space, if you could draw a surface on that volume your area of interest would be on it.

What do you mean by "surface" and "volume" in an N-dimensional space? Each atom has 6 degrees of freedom and attractions/repulsions are not restricted to moving on any kind of reserved 2D surface. If it's being attracted toward another atom, the force will be toward wherever that atom happens to be ... With multiple nearby atoms, the sum of the forces can be in any direction with respect to whatever coordinate system you're using to define "surface".

This is in conflict with my understanding. There is no blank slate but many known short trajectories.

I think that's a matter of terminology. First, somebody has to shoot lots of short trajectories. Finding those short trajectories is what I called the first type of analysis. What I called the second type of analysis starts with a lot of short trajectories and after the Markov states have been determined. So the answer to my question (above) is that you're proposing to use AI on the second type of analysis -- interconnecting the Markov states that have been identified.

Years ago, FAH focused on creating longer "short trajectories" (before MSM methodology was adopted) -- and analysis was a lot less productive.

moritzgedig · Post by **moritzgedig** » Sat Nov 10, 2018 7:58 am

What do you mean by "surface" and "volume" in an N-dimensional space?

I am speaking in the most abstract meaning. For any volume in N-dimensional configuration/state hyperspace there is a (N-1)D surface that surrounds it.
You could find a linear transformation that puts all states of interest into a unit n-sphere.
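
As a minimal sketch of what such a mapping could look like: shift the points so their mean sits at the origin, then scale uniformly by the largest distance from it. Strictly speaking the shift makes this an affine rather than a purely linear transformation, and the four sample points below are made up; real configurations would have thousands of coordinates.

```python
import math

def to_unit_ball(points):
    """Shift points to their mean and scale so the farthest one has norm 1."""
    n = len(points)
    dim = len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dim)]
    shifted = [[p[d] - center[d] for d in range(dim)] for p in points]
    radius = max(math.sqrt(sum(x * x for x in p)) for p in shifted)
    if radius == 0.0:
        # All points coincide; nothing to scale.
        return shifted
    return [[x / radius for x in p] for p in shifted]

# Hypothetical 3-dimensional "states" standing in for much larger vectors.
points = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]
scaled = to_unit_ball(points)
norms = [math.sqrt(sum(x * x for x in p)) for p in scaled]
```

After the mapping every state lies inside or on the unit sphere, and the states with norm close to 1 are the outliers on the "surface" in the sense used above.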

With multiple nearby atoms, the sum of the forces can be in any direction with respect to whatever coordinate system you're using to define "surface"

Whatever surface you are talking about, I was not.

What I called the second type of analysis starts with a lot of short trajectories and after the Markov states have been determined. So the answer to my question (above) is that you're proposing to use AI on the second type of analysis -- interconnecting the Markov states that have been identified.

Our understandings of the use of Markov States differ.

JimF · Post by **JimF** » Wed Nov 28, 2018 4:40 pm

moritzgedig wrote:I wonder whether F@H could profit from the current surge in AI?

I am glad you asked.

A recent publication refers to "learning".
https://foldingathome.org/2018/11/27/ht ... tabstract/

Did our work at FAH contribute to that?
Might it in the future?

Post by **bruce** » Wed Nov 28, 2018 5:00 pm

JimF wrote:
Did our work at FAH contribute to that?
Might it in the future?

The authors of that paper are the same folks who use FAH in their research so certainly they're thinking of other types of research that might be accomplished using computational biology -- either in a distributed computer format or using the computer labs at their universities.

The FAH software is designed to solve a different type of problem, so it's fair to assume FAH has not contributed directly to the AI study -- but I can't predict what follow-on software might be added in future enhancements or as an independent project in parallel to FAH.

AI learns based on experience so it requires a continuous stream of things that have been tried previously -- both successes and failures. i.e. a lot of input data needs to be digested. FAH, on the other hand, only needs to download a snapshot of the current state of the protein plus the parameters of what to do next, and then it can do a bunch of computing without further input. The requirements for data for FAH are much more consistent with the structure of distributed computing whereas AI doesn't fit well.

JimF · Post by **JimF** » Thu Nov 29, 2018 12:17 am

Thanks. On GPUGrid, they are using the "QC" CPU app to train their machines, which I believe then perform the final calculations in their own lab (on GPUs).
I know that is not the way it is done here thus far, but it would be interesting to see if it is applicable at some time.

I normally use GPUs here, due to their much higher output, but that could change with AI, and it might then be more worthwhile to contribute more CPU power. It all figures into what hardware we eventually buy.

JimboPalmer · Post by **JimboPalmer** » Thu Nov 29, 2018 8:42 pm

Only kinda off topic.

What the AI cores are doing is a very fast 4 by 4 matrix multiply of 16 bit floating-point numbers, optionally with a 4 by 4 add, accumulated into 32 bit floating-point numbers.
F@H currently uses 32 bit floating-point numbers, with a very few 64 bit floating-point numbers where needed. If some parts could be done in 16 bit precision (called half precision) there would be a LARGE speed up. (8 times for the part that could use 16 bits)
So if F@H could do any of its math using 16 bit numbers, even if it did not involve AI, it could use the AI cores to good advantage.
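
To show what "16 bit in, 32 bit out" means in practice, here is a rough emulation in plain Python. It only mimics the number widths by rounding values through the standard library's `struct` formats `"e"` (half) and `"f"` (single); the function names are made up, and the exact rounding order inside the real hardware is not something this sketch claims to reproduce.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest 16 bit (half precision) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_single(x):
    """Round a Python float to the nearest 32 bit (single precision) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mma_4x4(a, b, c):
    """Emulate D = A*B + C with half precision inputs and single accumulation."""
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = to_single(c[i][j])
            for k in range(4):
                # Inputs are squeezed to 16 bits; the running sum keeps 32 bits.
                prod = to_half(a[i][k]) * to_half(b[k][j])
                acc = to_single(acc + prod)
            d[i][j] = acc
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
values = [[0.1 * (4 * i + j + 1) for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
result = mma_4x4(identity, values, zeros)

# Why the accumulator width matters: half precision cannot represent 4097.
half_sum = to_half(to_half(4096.0) + to_half(1.0))
single_sum = to_single(4096.0 + 1.0)
```

Multiplying by the identity returns the inputs as rounded to half precision, so `result[0][0]` is roughly 0.09998 rather than 0.1. The last two lines show the other side: adding 1 to 4096 is lost entirely in half precision but survives in single precision, which is why any part of the math moved to 16 bits would have to tolerate about three significant decimal digits.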

Only the Pande Group can decide that. I know they once tried doing all calculations with 64 bit real numbers and found the slowdown unacceptable.

foldy · Post by **foldy** » Mon Dec 03, 2018 4:21 pm

It is still possible that the RTX AI cores will be dropped from gaming GPUs again if they do not deliver enough bang for the buck. Then only the Quadro and Titan cards would have them in the future.

Duckers · Post by **Duckers** » Mon Dec 03, 2018 4:24 pm

As the topic states, I probably ain't gonna use the A.I machine learning cores in my RTX 2080 for gaming anytime soon, and was wondering if FaH supports, or will get support or an algorithm for, using the A.I cores for folding? That way I can use the A.I cores for folding to help science and the rest of the GPU for gaming.

JimboPalmer · Post by **JimboPalmer** » Mon Dec 03, 2018 8:24 pm

Not as you plan it, no.

There may be parts of the program where using the half precision AI cores could speed up some calculations, but most will need to be full precision or double precision in the CUDA cores.
