
Decode of trajectory

Posted: Mon Jan 20, 2014 4:04 am
by ChristianVirtual
I'm starting to understand the data provided by the trajectory at a high level and to use it for visualization; right now in a simple way (atoms with and without bonds).

I wonder if there is a documented, available algorithm for decoding the atoms and bonds and figuring out where the backbone(s) of the protein are and which side chains are connected to them.

I have a draft algorithm for the backbone, but it's too rough: it mainly focuses on atomic numbers and surrounding atoms, looking for N and C as potential backbone candidates. It's not very precise, as I can see small orphans in my model. Some side chains, when checked locally with my beginner algorithm, appear as backbone.

I'm sure there is something better around... Any hint would be welcome.

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 12:20 pm
by ChristianVirtual
The more I read, the more complex it gets... :oops:

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 2:35 pm
by calxalot
Suggest you see what FAHViewer is doing.
https://fah.stanford.edu/svn/pub/trunk/viewer/

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 3:51 pm
by Jesse_V
See, I would be more interested in identifying carbon rings and the like, rather than the backbone. Carbon rings could be detected recursively by following neighboring carbon atoms and checking whether you meet the original atom again. You've seen what my viewer can do, but I too would like to identify backbones and whatnot like the PS3 viewer could.
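
The recursive ring check described above could be sketched roughly as follows. This is only an illustration: the bond list format (pairs of atom indices), the function names, and the `max_size` cut-off are assumptions, not anything taken from the trajectory format. To follow carbons only, pass in just the carbon-carbon bonds.

```python
from collections import defaultdict

def build_adjacency(bonds):
    """Turn a list of (i, j) atom-index pairs into an adjacency map."""
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def find_ring(adj, start, max_size=8):
    """Return one ring (list of atom indices) through `start`, or None.

    Depth-first walk along bonds that never steps straight back to the
    atom it just came from and gives up once a path reaches `max_size`
    atoms (hypothetical limit to keep the search cheap).
    """
    def walk(path):
        current = path[-1]
        previous = path[-2] if len(path) > 1 else None
        for nxt in sorted(adj[current]):
            if nxt == previous:
                continue
            if nxt == start and len(path) >= 3:
                return path  # met the original atom again: ring closed
            if nxt not in path and len(path) < max_size:
                found = walk(path + [nxt])
                if found:
                    return found
        return None

    return walk([start])

# Six-membered ring 0..5 with a two-atom tail (6, 7) hanging off atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
adj = build_adjacency(bonds)
ring_at_0 = find_ring(adj, 0)
ring_at_6 = find_ring(adj, 6)
```

Here `ring_at_0` holds the six ring atoms, while `ring_at_6` is `None` because atom 6 only sits on the tail.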

You're either going to have to identify structures via molecular connections or via the positions of atoms relative to one another, or some combination thereof. There are programs out there for identifying alpha helices and beta sheets, but I don't know how they work. Topology and molecules are both very complicated topics.

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 3:54 pm
by ChristianVirtual
From what I saw in the viewer code, it would not help me visualize a (simplified) backbone conformation (similar to my avatar) or a surface model with residue types. But I will go through it once more in case I missed something.

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 3:57 pm
by Jesse_V
ChristianVirtual wrote: From what I saw in the viewer code, it would not help me visualize a (simplified) backbone conformation (similar to my avatar) or a surface model with residue types. But I will go through it once more in case I missed something.
I don't think you're going to find anything. FAHViewer just has different render modes but doesn't have that topology analysis, as far as I can tell. My viewer doesn't do backbone visualizations either.

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 4:01 pm
by ChristianVirtual
Here is what I have so far: quite a reduction of side chains and a focus on the backbone, but not with a high level of confidence :roll:
Image

And yeah, it seems to get very complex to do it right :?
Like those little mistakes at "four o'clock"...

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 4:14 pm
by Jesse_V
You know, I wonder if you could simplify the problem by eliminating atoms that have only one edge or are just connected in a single train. You could do this by iterating through all the atoms and counting how many bonds they have, or you could recursively navigate through the atoms in the protein and, if one of your neighboring atoms isn't connected to anything other than yourself, eliminate it. There are other techniques, such as building a tree data structure from the bonded atoms and then eliminating all the leaf nodes. Generally, though, you might find a backbone if you drop any trains of atoms sticking out from the main paths. For example,

Code:

0
|
0--0 <--this is a one-node train of atoms sticking out of the main branch.
|
0
Likewise:

Code:

0
|
0--0--0--0--0 <--this is a train of four atoms
|
0
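
One pass of the train trimming described above might look like the sketch below. The bond list format (pairs of atom indices) and the function name `prune_trains` are assumptions made for illustration. Note that the two ends of the main chain are trains as well, so they are trimmed too, and every further pass would cut the chain shorter.

```python
from collections import defaultdict

def prune_trains(bonds):
    """Remove every unbranched train that ends in a leaf atom (one pass).

    `bonds` is a list of (i, j) atom-index pairs. Returns the set of
    atom indices that survive the pass.
    """
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    # Bond counts come from the untouched graph, so a single call only
    # removes the trains that are visible right now.
    degree = {atom: len(nbrs) for atom, nbrs in adj.items()}
    removed = set()
    for leaf in [a for a, d in degree.items() if d == 1]:
        current, previous = leaf, None
        # Walk inward from the leaf until a branch point is reached.
        while (current is not None and degree[current] <= 2
               and current not in removed):
            removed.add(current)
            onward = [n for n in adj[current] if n != previous]
            previous, current = current, (onward[0] if onward else None)
    return set(adj) - removed

# Main chain 0-1-2-3-4-5 with trains of length 1, 2, 1 and 3 sticking out.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
         (1, 6), (2, 7), (7, 8), (3, 9), (4, 10), (10, 11), (11, 12)]
survivors = prune_trains(bonds)
```

In this toy graph only the branch points 1 to 4 survive; atoms 0 and 5 at the chain ends are removed along with the side trains, which is why real backbone ends would need some protection.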

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 4:32 pm
by ChristianVirtual
I was thinking something like that too: first get rid of all those H's and many of the C's at the edge. I would still need to identify rings, as you mentioned. While reading the trajectory I already build some doubly linked lists to make traveling through the protein easier. I need to build the ring detector to go further. The N-C-C-N chain should be protected against this cutting.
What do both ends look like? NH2 and COOH?

Re: Decode of trajectory

Posted: Mon Jan 20, 2014 4:46 pm
by Jesse_V
Yeah, but I think you could do the same job without worrying about what type they are. If you find a train that doesn't have any branches (even if it's only one node sticking out), delete that train. Do this once and you have a basic outline of the protein. Do this again and you'd trim down your backbone even more.

You might look into a topological sort or building some kind of tree data structure and then trimming the leaves of the tree.

Re: Decode of trajectory

Posted: Tue Jan 21, 2014 2:41 am
by NookieBandit
Now that Sony no longer supports the PS3 for folding, perhaps someone at the Pande Lab could ask Sony if they would allow the algorithm used to identify backbones and the visualization code, more generally, to be put in the public domain? The code would be unlikely to be directly useful, but reading through it could yield some helpful hints on the way they solved the problem you guys are discussing...

Re: Decode of trajectory

Posted: Wed Jan 22, 2014 12:12 pm
by ChristianVirtual
I'm 99.99999% sure that "CA" stands for alpha carbon, the central atom of an amino acid.

Question to those with more knowledge and insight (meaning: to everyone :?: ): Can I trust that the sequence in the data is always like this (of course with different side chains): [NH, CA, CO]?

If that is the case, I don't need all those fancy ideas I had about backtracking or neighbour analysis; I just need to check the bonds list.

Code:

    ["N", -0.4157, 1.7063, 14.01, 7],
    ["H", 0.2719, 1.15, 1.008, 1],
    ["CA", 0.0337, 1.9, 12.01, 6],
    ["HA", 0.0823, 1.25, 1.008, 1],
        ["CB", -0.1825, 1.9, 12.01, 6],
        ["HB1", 0.0603, 1.25, 1.008, 1],
        ["HB2", 0.0603, 1.25, 1.008, 1],
        ["HB3", 0.0603, 1.25, 1.008, 1],
    ["C", 0.5973, 1.875, 12.01, 6],
    ["O", -0.5679, 1.48, 16, 8],
    
    ["N", -0.4157, 1.7063, 14.01, 7],
    ["H", 0.2719, 1.15, 1.008, 1],
    ["CA", -0.0024, 1.9, 12.01, 6],
    ["HA", 0.0978, 1.25, 1.008, 1],
        ["CB", -0.0343, 1.9, 12.01, 6],
        ["HB1", 0.0295, 1.25, 1.008, 1],
        ["HB2", 0.0295, 1.25, 1.008, 1],
        ["CG", 0.0118, 1.875, 12.01, 6],
        ["CD1", -0.1256, 1.875, 12.01, 6],
        ["HD1", 0.133, 1.25, 1.008, 1],
        ["CE1", -0.1704, 1.875, 12.01, 6],
        ["HE1", 0.143, 1.25, 1.008, 1],
        ["CZ", -0.1072, 1.875, 12.01, 6],
        ["HZ", 0.1297, 1.25, 1.008, 1],
        ["CE2", -0.1704, 1.875, 12.01, 6],
        ["HE2", 0.143, 1.25, 1.008, 1],
        ["CD2", -0.1256, 1.875, 12.01, 6],
        ["HD2", 0.133, 1.25, 1.008, 1],
    ["C", 0.5973, 1.875, 12.01, 6],
    ["O", -0.5679, 1.48, 16, 8],
    
    ["N", -0.4157, 1.7063, 14.01, 7],
    ["H", 0.2719, 1.15, 1.008, 1],
    ["CA", -0.0237, 1.9, 12.01, 6],
    ["HA", 0.088, 1.25, 1.008, 1],
        ["CB", 0.0342, 1.9, 12.01, 6],
        ["HB1", 0.0241, 1.25, 1.008, 1],
        ["HB2", 0.0241, 1.25, 1.008, 1],
        ["CG", 0.0018, 1.9, 12.01, 6],
        ["HG1", 0.044, 1.25, 1.008, 1],
        ["HG2", 0.044, 1.25, 1.008, 1],
        ["SD", -0.2737, 1.775, 32.060001, 16],
        ["CE", -0.0536, 1.9, 12.01, 6],
        ["HE1", 0.0684, 1.25, 1.008, 1],
        ["HE2", 0.0684, 1.25, 1.008, 1],
        ["HE3", 0.0684, 1.25, 1.008, 1],
    ["C", 0.5973, 1.875, 12.01, 6],
    ["O", -0.5679, 1.48, 16, 8],
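
If the [N, CA, C] ordering in the listing above can be trusted, picking out the backbone might be as simple as the following sketch. It uses only the first field of each record (the atom name); the function name `backbone_indices` is hypothetical. It also assumes that exactly "N", "CA" and "C" name the backbone atoms, which matches the three residues shown here but is not verified for caps, ligands or other force-field naming schemes.

```python
def backbone_indices(atoms):
    """Group atoms into residues and return (N, CA, C) index triples.

    `atoms` is a list of records whose first field is the atom name.
    A new residue is assumed to start at every atom named exactly "N".
    """
    residues = []
    current = None
    for index, atom in enumerate(atoms):
        name = atom[0]
        if name == "N":
            current = {"N": index}
            residues.append(current)
        elif name in ("CA", "C") and current is not None:
            # Keep the first CA / C seen in this residue.
            current.setdefault(name, index)
    # Drop incomplete residues (for example caps or truncated data).
    return [(r["N"], r["CA"], r["C"]) for r in residues
            if "CA" in r and "C" in r]

# First residue from the listing, plus a shortened second one.
atoms = [
    ["N", -0.4157, 1.7063, 14.01, 7], ["H", 0.2719, 1.15, 1.008, 1],
    ["CA", 0.0337, 1.9, 12.01, 6], ["HA", 0.0823, 1.25, 1.008, 1],
    ["CB", -0.1825, 1.9, 12.01, 6], ["HB1", 0.0603, 1.25, 1.008, 1],
    ["HB2", 0.0603, 1.25, 1.008, 1], ["HB3", 0.0603, 1.25, 1.008, 1],
    ["C", 0.5973, 1.875, 12.01, 6], ["O", -0.5679, 1.48, 16, 8],
    ["N", -0.4157, 1.7063, 14.01, 7], ["H", 0.2719, 1.15, 1.008, 1],
    ["CA", -0.0024, 1.9, 12.01, 6], ["HA", 0.0978, 1.25, 1.008, 1],
    ["CB", -0.0343, 1.9, 12.01, 6], ["CG", 0.0118, 1.875, 12.01, 6],
    ["C", 0.5973, 1.875, 12.01, 6], ["O", -0.5679, 1.48, 16, 8],
]
triples = backbone_indices(atoms)
# Flattening the triples gives the N-CA-C-N-CA-C... trace to draw.
trace = [index for triple in triples for index in triple]
```

The bonds list would still be worth checking afterwards, for example to confirm that each C is actually bonded to the N of the next residue before drawing a segment between them.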

Update: with that assumption and some neighbor checking I get a much better result (without local misinterpretation) that I can build my next ideas on:

Image

To be honest (and I guess you guessed already): I have no idea how that works chemically and biologically, but it looks nice and gives me a closer "emotional bond" to a WU if I can see more such details (instead of only P1234 R1 C2 G3).

Re: Decode of trajectory

Posted: Thu Jan 23, 2014 1:58 pm
by ChristianVirtual
Just saw this news: http://folding.stanford.edu/home/watch- ... in-action/

Along with this: https://github.com/SimTk. Just copying it here for reference in case the news gets purged later. Guess I have some more reading to do over there.
Some wise Jedi knight once said: "May the source be with you" and "Use the source, Luke" :mrgreen:

Re: Decode of trajectory

Posted: Thu Jan 23, 2014 5:33 pm
by Jesse_V
Linux users be like "The source will be with you, always." :D

Re: Decode of trajectory

Posted: Mon Jun 30, 2014 1:14 pm
by ChristianVirtual
And sometimes I feel like Crocodile Dundee in the Big City: "That's not a bond ... that's a bond"

Image

:lol:


I really wish the trajectories didn't exist in unknown boxes, or that we had enough info to stitch them back together properly. :egeek:
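
For what it's worth, if the box dimensions were known, stitching could follow the usual minimum-image idea: walk the bond graph and move each atom to the periodic copy closest to its already-placed neighbor. The sketch below is purely hypothetical for this data, since it assumes a rectangular box with known edge lengths, which is exactly the information that is missing here; the function name `unwrap` and the input layout are also assumptions.

```python
from collections import deque

def unwrap(positions, bonds, box):
    """Shift atoms by whole box lengths so bonded atoms end up adjacent.

    positions: list of (x, y, z); bonds: list of (i, j) index pairs;
    box: (lx, ly, lz) edge lengths of an assumed rectangular box.
    """
    adj = {i: [] for i in range(len(positions))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    fixed = [None] * len(positions)
    for root in range(len(positions)):
        if fixed[root] is not None:
            continue
        # Each molecule is anchored at the first of its atoms we meet.
        fixed[root] = tuple(positions[root])
        queue = deque([root])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if fixed[b] is not None:
                    continue
                # Minimum image: pick the copy of b nearest to a.
                fixed[b] = tuple(
                    pb - length * round((pb - pa) / length)
                    for pa, pb, length in zip(fixed[a], positions[b], box)
                )
                queue.append(b)
    return fixed

# Two bonded atoms sitting on opposite faces of a 10 x 10 x 10 box.
stitched = unwrap([(0.1, 0.0, 0.0), (9.9, 0.0, 0.0)], [(0, 1)], (10.0, 10.0, 10.0))
```

After the call the second atom sits at roughly x = -0.1, next to its partner, instead of on the far side of the box.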