
Re: Project 11715

Posted: Tue Sep 18, 2018 7:49 pm
by rafwiewiora
Spot on answers from bruce and Joe_H - just to clarify about these:
The computational power already is far beyond our still not automated (we're pushing on that front too) data analysis speed and it's a bottleneck.

"You mean there is a surplus of computation being done relative to the ability to sort/make sense of the data?
- what I mean is that for a single researcher it takes much longer to analyze the data they've collected than it took to collect the data. It works out at the global level - with increasing computational power we've increased the number of proteins we study, from an increasing number of researchers (e.g. I'm collecting data now for 2 other people to analyze). But it is somewhat frustrating to me that, if it didn't take me such a long time to analyze my data, I could collect much more data for myself and study more proteins in my PhD. Of course, if everyone did this too, we'd get short on computational power. And here comes adaptive sampling - it gives us both shorter analysis time and shorter data collection time by reducing the amount of data required to reach the same conclusions. Overall the amount of science increases, and the work satisfaction of the researchers increases because they do more science and more exciting stuff (e.g. spend more time experimentally testing simulation predictions rather than analyzing data to make those predictions).
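
(For anyone wondering what "adaptive sampling" actually does in practice, here's a minimal sketch of one common flavour, least-counts seeding - the function and the toy numbers are just my illustration, not our production pipeline: after each round of analysis, new trajectories are started from the MSM states we've visited least, so the same conclusions need less total simulation.)

import numpy as np

def pick_adaptive_seeds(state_assignments, n_seeds, n_states):
    """Least-counts adaptive sampling: seed new trajectories from the
    states that have been visited least often so far.

    state_assignments : list of 1-D int arrays, one per finished trajectory,
                        giving the MSM microstate visited at each frame.
    Returns the indices of the states to restart simulations from.
    """
    counts = np.zeros(n_states, dtype=int)
    for traj in state_assignments:
        counts += np.bincount(traj, minlength=n_states)

    # Ignore states never seen at all (nothing to restart from there).
    visited = np.where(counts > 0)[0]
    # Restart from the least-visited states first.
    order = visited[np.argsort(counts[visited])]
    return order[:n_seeds]

# Example: two short trajectories over a 6-state model, launch 3 new runs.
trajs = [np.array([0, 0, 1, 2, 2, 2]), np.array([2, 3, 3, 2, 2, 0])]
print(pick_adaptive_seeds(trajs, n_seeds=3, n_states=6))  # [1 3 0]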

Re: Project 11715

Posted: Wed Sep 19, 2018 6:21 am
by bruce
I presume that if a project will get reliable results with N frames (or ns, or whatever), then gathering 2N samples will give even better results. I also presume that 2N samples will certainly take more data-reduction time than N, but probably somewhat less than 2N times as much researcher time. In other words, the quality of the science goes up when the researcher spends more time on it, but it's not really limited by us Donors or our hardware. (This is in addition to being able to study more proteins.)

Adaptive sampling does, however, improve the quality of the science by reducing the number of samples that need to be examined.

First of all, I don't think MSMBuilder can be distributed, so you're stuck with that process.

Second: Can AI be harnessed to accomplish a similar extraction of local semi-stable states? If so, you're probably still stuck with doing that process, since I doubt that AI adapts well to distributed computing. If that's true, maybe next year's hardware budget should go toward adapting your cluster to do better AI.

Re: Project 11715

Posted: Wed Sep 19, 2018 12:21 pm
by SteveWillis
bruce wrote:(Personal note: The annual CO2 production associated with my participation in FAH is essentially zero. Solar panels on my home produce enough power that my annual electric bill is zero, give or take some small amount.)
Off topic: We don't have solar panels but we do contribute to a TVA program that reduces our carbon footprint since folding about triples our household electricity usage.
https://www.tva.com/Energy/Valley-Renew ... wer-Switch

Re: Project 11715

Posted: Wed Sep 19, 2018 2:37 pm
by rafwiewiora
bruce - great points. On the N vs 2N question, I was thinking about that a bit - it seems to me that the data analysis time is roughly a step function of the data amount - i.e. I can probably go up to 5N, maybe even 10N, and still spend the same amount of time analyzing. The CPU time needed to build the models increases, but those are big jobs anyway and I'd already run them overnight - it's mostly about 'the approach' (e.g. the extensive hyperparameter validation we started doing recently for the multiple ms datasets we have now).
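
(If you're curious what "hyperparameter validation" involves: roughly, choosing things like the number of tICA components and microstates by cross-validated model scores rather than by eye. A rough sketch of the idea using MSMBuilder's scikit-learn-style estimators - the pipeline, parameter values and random toy data are my illustration, not the actual setup for this project:)

import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.pipeline import Pipeline
from msmbuilder.decomposition import tICA
from msmbuilder.cluster import MiniBatchKMeans
from msmbuilder.msm import MarkovStateModel

def cross_validated_score(featurized_trajs, n_tics, n_states, lag_time, n_splits=5):
    # Fit tICA -> clustering -> MSM on training trajectories and score
    # (GMRQ) on held-out trajectories, averaged over random splits.
    scores = []
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.5, random_state=0)
    for train_idx, test_idx in cv.split(featurized_trajs):
        pipeline = Pipeline([
            ('tica', tICA(n_components=n_tics, lag_time=lag_time)),
            ('cluster', MiniBatchKMeans(n_clusters=n_states, random_state=0)),
            ('msm', MarkovStateModel(lag_time=lag_time, n_timescales=5, verbose=False)),
        ])
        pipeline.fit([featurized_trajs[i] for i in train_idx])
        scores.append(pipeline.score([featurized_trajs[i] for i in test_idx]))
    return np.mean(scores)

# Toy stand-in for real featurized trajectories: 10 "trajectories" of
# 500 frames x 20 features each, so the sketch runs end to end.
rng = np.random.RandomState(0)
featurized_trajs = [rng.randn(500, 20) for _ in range(10)]

# Scan a small grid and keep the best-scoring combination.
grid = [(t, s) for t in (2, 5) for s in (20, 50)]
best = max(grid, key=lambda p: cross_validated_score(featurized_trajs, p[0], p[1], lag_time=10))
print('best (n_tics, n_states):', best)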

So, like you say, we have to decide how far to take a project - and that's also not easy. I have some heuristics, but you have to build a model to really know. Some projects we then have to run longer; for some we'd already have collected more than we needed to make a satisfactory-quality model (though more data always means better models). This is also where adaptive sampling helps - to do it, you have to have automated analysis in place which decides whether you've collected enough and collects more if need be - this should let us manage the computational effort better across projects.
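
(The "decides if you've collected enough" step can be as simple as checking that the model's slowest implied timescale has stopped moving as data is added - a toy sketch, with a tolerance I've picked arbitrarily for illustration:)

import numpy as np

def enough_data(timescale_estimates, rel_tol=0.1):
    """Crude convergence test for an adaptive-sampling loop.

    timescale_estimates : slowest implied timescale of the MSM re-estimated
                          after each round of data collection (e.g. in ns).
    Returns True when the last two rounds agree to within rel_tol,
    i.e. adding more data is no longer changing the answer much.
    """
    if len(timescale_estimates) < 2:
        return False          # can't judge convergence from a single round
    prev, last = timescale_estimates[-2], timescale_estimates[-1]
    return abs(last - prev) / last < rel_tol

# Example: estimates after successive rounds of sampling.
history = [850.0, 1200.0, 1350.0, 1390.0]
print(enough_data(history))   # True: the last two rounds differ by ~3%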

I'm not sure the analysis ever has to be distributed, actually - these are CPU-only jobs, they take no longer than a few days max. on tens of CPUs on the cluster, and this is with ms datasets that we will shrink down with adaptive sampling. As for the AI - we're nowhere near the stage of worrying about how to practically do it - development is still at the theoretical level (e.g. https://www.nature.com/articles/s41467-017-02388-1). My hope is we'll run on a fully automated classical MSM setup for a couple of years, then move to a neural net version when it's robust. I guess at that stage we could start using GPUs, so perhaps a distributed analysis opportunity sits there. But again - we have a very strong cluster already.
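
(For a flavour of what the neural-net version in that paper learns: it maximizes the VAMP-2 score of its output coordinates. Here's a bare-bones numpy sketch of how that score is computed from time-lagged pairs of transformed coordinates - just an illustration, not code from our pipeline:)

import numpy as np

def vamp2_score(chi_t, chi_tau):
    """VAMP-2 score of a set of transformed coordinates (the quantity a
    VAMPnet-style network maximizes). chi_t and chi_tau are (n_pairs, k)
    arrays: the transformed coordinates at times t and t + lag."""
    # Mean-free both halves of each time-lagged pair.
    chi_t = chi_t - chi_t.mean(axis=0)
    chi_tau = chi_tau - chi_tau.mean(axis=0)
    n = chi_t.shape[0]

    c00 = chi_t.T @ chi_t / n        # instantaneous covariance at time t
    ctt = chi_tau.T @ chi_tau / n    # instantaneous covariance at time t + lag
    c0t = chi_t.T @ chi_tau / n      # time-lagged cross-covariance

    def inv_sqrt(c, eps=1e-10):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(c)
        vals = np.maximum(vals, eps)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    k = inv_sqrt(c00) @ c0t @ inv_sqrt(ctt)   # Koopman matrix in whitened coordinates
    return np.sum(k ** 2)                     # squared Frobenius norm

# Example with random data: 3 output dimensions, 1000 time-lagged pairs.
rng = np.random.RandomState(0)
a, b = rng.randn(1000, 3), rng.randn(1000, 3)
print(vamp2_score(a, b))   # close to 0 for uncorrelated data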

Re: Project 11715

Posted: Wed Sep 19, 2018 4:45 pm
by bruce
Interesting paper.... That's sort of what I was suggesting.

I probably should have used "neural network" rather than "AI" above, but in common parlance they're interchangeable. My point is that there are a lot of steps in VAMP that are not amenable to Distributed Computing, so we can't help with that process, even if the applicable processing time introduces a noticeable lag in the data reduction. Overnight on the cluster isn't important compared to the cost of managing all the input and output data surrounding that step. Having the right tools in place is critical, as is looking globally for an optimal set of tools and sequence of steps.

Many years ago, my company spent a lot of money on optimizing our factory's assembly shop floor: eliminating individual tool-boxes, developing dedicated work cells by process (including optimizing the placement of hand-tools within each work cell), and then figuring out optimal job routing into and out of those cells based on the type of processing needed for each job. (We build a large variety of short-production-run assemblies [hundreds of lots of dozens of parts] which are not amenable to the kind of automation you commonly see in an automotive assembly line -- and, of course, we still have a "model shop" where highly skilled mechanics can build really small quantities of parts that need not be productionized.)

Re: Project 11715

Posted: Tue Oct 09, 2018 2:21 pm
by boboviz
I no longer receive work for 11715. Is the beta test finished? Are you ready to start the production phase?

Re: Project 11715

Posted: Tue Oct 09, 2018 3:07 pm
by bollix47
P11715 went to full FAH back in early Jul '18:
viewtopic.php?f=24&t=30909&p=302121#p302121
I've asked PG for its current status.

Re: Project 11715

Posted: Tue Oct 09, 2018 5:07 pm
by artoar_11
According to HFM.NET, the last time I received a WU from p11715 was on October 6, 2018, with the "beta" key.

Re: Project 11715

Posted: Tue Oct 09, 2018 5:41 pm
by bollix47
Thanks Arto, that confirms what I was told by PG ... i.e. the project is still active but getting close to the finish line.

@boboviz fyi I've completed at least 35 WUs from this project in the first 8 days of October, so I too can confirm it's still active but, as trajectories end, there may be fewer & fewer WUs available.

Re: Project 11715

Posted: Wed Oct 10, 2018 10:46 pm
by bruce
When a project is in beta, it's competing with fairly few other projects. When it moves to Full FAH, there is probably a lot more competition for resources.

Within each client-type grouping, there is also a project priority, and when a project is nearing completion, it makes sense to be nice and to reduce the priority if the final due-date is still a distance in the future. (That doesn't alter the points, only the number of WUs distributed per day.)

I'm going to guess that you're asking about this particular project because it awards you a more generous PPD than other projects you might be getting. Is that true?