
Project 11715

PostPosted: Thu Sep 06, 2018 9:36 am
by boboviz
This project is generating a test set for what we hope will increase the power of F@h by orders of magnitude


This seems to be an ambitious project.
Could this project be "transversal" (cross-cutting) to all simulations/projects? Will all simulations see advancements?

Re: Project 11715

PostPosted: Fri Sep 07, 2018 1:27 am
by rafwiewiora
Well, ambitious maybe in the results, but it shouldn't be anything particularly hard to implement - the theory is all there - see e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3637129/

I anticipate we'll use it for most projects, but not all. The adaptive schemes need a goal function - you have to choose what you want to explore most - and there will still be cases when we just want to go completely unbiased.
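To make the "goal function" idea concrete, here is a minimal toy sketch of the trade-off being described: an adaptive scheme weights where new trajectories restart, while an unbiased run treats every state equally. All names, counts, and the inverse-count weighting are invented for illustration; this is not FAH's actual code or goal function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up visit counts for four coarse-grained states of a protein.
visit_counts = np.array([50, 3, 20, 1])

def exploration_goal(counts):
    """Goal function favoring rarely-visited states (one possible choice)."""
    weights = 1.0 / (counts + 1)
    return weights / weights.sum()

def unbiased_goal(counts):
    """No goal: every state is an equally likely restart point."""
    return np.full(len(counts), 1.0 / len(counts))

# An adaptive run draws restart states from the chosen goal distribution;
# a completely unbiased run would use unbiased_goal instead.
p = exploration_goal(visit_counts)
restarts = rng.choice(len(visit_counts), size=8, p=p)
```

With the exploration goal, the least-visited state (here index 3) gets the highest restart probability; swapping in `unbiased_goal` recovers the "go completely unbiased" behavior mentioned above.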

Re: Project 11715

PostPosted: Fri Sep 07, 2018 5:43 pm
by Nert
I hope you will be able to keep us informed about the progress of this project. The prospect of "orders of magnitude" improvements in processing efficiency is exciting.

Re: Project 11715

PostPosted: Fri Sep 07, 2018 7:52 pm
by rafwiewiora
Will do for sure, should get to first trials of this in Oct, Nov maybe we'll have something on BETA.

Re: Project 11715

PostPosted: Mon Sep 10, 2018 1:31 pm
by boboviz
rafwiewiora wrote:Will do for sure, should get to first trials of this in Oct, Nov maybe we'll have something on BETA.


Great!! Thank you

Re: Project 11715

PostPosted: Tue Sep 11, 2018 5:07 pm
by JimF
These are all good questions, but I am wondering about the hardware implications. That is, will you do more complex simulations with the present ratio of GPU to CPU work units, or will you shift over to CPU? I would think that you could have a hard time keeping all the GPUs busy. It is a pleasant problem to have, but crunchers will need to adjust accordingly.

Re: Project 11715

PostPosted: Wed Sep 12, 2018 5:11 am
by bruce
FAH has repeatedly stated that the quantity of research will continue to grow faster than the hardware base can grow - sort of a Moore's law for protein research.

Those who predict that this trend cannot continue have been repeatedly proven wrong.

Re: Project 11715

PostPosted: Wed Sep 12, 2018 4:39 pm
by JimF
That is the past. We are dealing with orders of magnitude here, not the usual 50% improvement each year. I expect they will face some policy decisions on how to handle it.
Or, to put it another way: can they increase the complexity of the projects to match their new capability? I am sure they are wondering that themselves.

Re: Project 11715

PostPosted: Wed Sep 12, 2018 9:57 pm
by rafwiewiora
I have been wondering about that a bit indeed - no good answers until we see how this works in practice, though. The crucial thing here is that we're not only increasing the computational power this way; we will also drastically lower the amount of data we have to collect to answer the same questions (in fact, this is the more important motivation for me). The computational power is already far beyond our still-not-automated data analysis speed (we're pushing on that front too), and that's a bottleneck. So my hope is that a single researcher will also be able to address an order of magnitude more scientific questions (say, a protein family vs. a single protein), and the computational power will be fully used. We'll know more after the first experiments!

Re: Project 11715

PostPosted: Sun Sep 16, 2018 11:12 am
by tcaud
rafwiewiora wrote:I have been wondering about that a bit indeed - no good answers until we see how this works in practice though. [...]


I don't think I understand. You mean there is a surplus of computation being done relative to the ability to sort/make sense of the data? Similar to how the planet is warming because more CO2 is being produced than the plant life can absorb.

Re: Project 11715

PostPosted: Sun Sep 16, 2018 9:39 pm
by bruce
I look at it this way:
1) The limitations of available computation are profound, compared to the quantity of proteins that need more study.
2) High priority projects that can be run on specific hardware configurations SHOULD always be available, but sometimes WUs from lower priority projects are distributed because of temporary server or project downtime. Those lower priority projects do get used to broaden the understanding of projects which have reached some initial minimum number of WUs.
3) From time to time, project suspensions do happen for a number of potential reasons.
_ (a) The minimum number of WUs has been reached for a project, but the scientist needs to reduce/study those data before deciding how to proceed -- often involving coordination and review by others. **
_ (b) An error has been discovered and must be fixed before wasting resources that are needed by other projects.
_ (c) While FAH uses data-center quality server hardware and server management methodology, issues do come up and we try to fix them as quickly as possible.
_ (d) etc.

** Note: Scientists, too, do have a life and must balance personal dedication to each specific project with other demands on their time. I'm sure that figuring out more efficient methods of data reduction would make them very, very happy.

(Personal note: The annual CO2 production associated with my participation in FAH is essentially zero. Solar panels on my home produce enough power that my annual electric bill is zero, give or take some small amount.)

Re: Project 11715

PostPosted: Sun Sep 16, 2018 11:13 pm
by tcaud
I'm a bit surprised. It sounds like the researchers are doing this almost haphazardly, like the pharma scientists who literally just create drug inhibitor chemicals and run trials with them to see if they work and lead to less resistance (this is what a neurologist explained to a class I was in). I would expect the planning to be a little more... meticulous? Are they producing a large series of slightly varied proteins based on hunches/speculations and trying them out in simulated cells?

Re: Project 11715

PostPosted: Mon Sep 17, 2018 6:26 pm
by JimF
tcaud wrote:I'm a bit surprised. It sounds like the researchers are doing this almost haphazardly,

Not at all, insofar as I can see. It is just research. If it were production, it would be done in-house by a pharmaceutical company for proprietary reasons.

Re: Project 11715

PostPosted: Mon Sep 17, 2018 7:16 pm
by Joe_H
The Markov state modeling being used is a statistical process, so there is a certain amount of randomness involved. Various starting states are selected and their trajectories modeled over time to find the minimum energy states and to give statistics on how frequently those states are reached across all of the different trajectories. Not all possible starting states are used: as I understand it, there are some heuristics for choosing starting states, and some candidates are excluded because they correspond to physically impossible conformations.

From what I have read about the approach being used here, the goal is to find methods that can look at a short run of a single trajectory and determine its likelihood of producing scientifically "interesting" results if continued further along the timeline. By concentrating on those higher-likelihood trajectories, hopefully more useful data is created with less total computational time. But that carries some risk of missing minimum energy states that might have been detected on trajectories not selected for continuation.
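The mechanics behind this description - count transitions between discretized states, build a transition matrix, then preferentially restart from under-sampled states - can be sketched in a few lines. This is a toy illustration with invented trajectories and a simple "least counts" restart rule, not the actual pipeline FAH uses:

```python
import numpy as np

# Toy discretized trajectories: each entry is the MSM state visited at a step.
trajs = [
    np.array([0, 0, 1, 1, 2, 1, 0]),
    np.array([2, 2, 3, 3, 3, 2, 1]),
]

# Tally observed transitions between the n_states coarse states.
n_states = 4
counts = np.zeros((n_states, n_states))
for t in trajs:
    for i, j in zip(t[:-1], t[1:]):
        counts[i, j] += 1

# Row-normalize into a transition probability matrix
# (a tiny pseudocount keeps unvisited rows well-defined).
T = (counts + 1e-8) / (counts + 1e-8).sum(axis=1, keepdims=True)

# "Least counts" adaptive sampling: restart new trajectories from the
# states we have observed least, instead of sampling uniformly.
visits = counts.sum(axis=1)
restart_states = np.argsort(visits)[:2]
```

The trade-off Joe_H describes is visible even here: restarting only from low-count states focuses effort on poorly-explored regions, at the risk of never revisiting states the short runs happened to skip.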

Re: Project 11715

PostPosted: Mon Sep 17, 2018 8:33 pm
by bruce
tcaud wrote:I'm a bit surprised. It sounds like the researchers are doing this almost haphazardly,

Not at all. The calculations required to model every possible state of even a single protein and identify how it can misfold would take more than a lifetime. The key words in the paper mentioned above are that "modeling of most biologically relevant systems and timescales [are] intractable."

The MSM methodology being discussed allows scientists to concentrate the significant events into multiple short (i.e. practical) studies while avoiding years and years of non-productive modeling. This mathematical study focuses on enhancing that concentration without missing events that would only be encountered in intractably long studies. Mathematical research is enhancing biological research.

Astronomers have a similar problem if they want to observe supernovas that only happen once in millions and millions of years.

At the molecular level, motion in a fluid is random - see https://en.wikipedia.org/wiki/Brownian_motion
Within a single protein being studied by FAH (i.e. at the atomic level), nothing can be studied without considering statistics, and a whole series of events must happen before a disease can develop. Would we happen to be watching when those events happened? Not likely. We would spend a lot of time observing healthy biological processes before we actually saw the bad events happen.