Protein folding: you'll never solve it without the Glycome

Post by **fibonacci** » Sun Nov 16, 2014 12:34 am

Why is it that the vast majority of thinking about the protein folding problem seems to be overly reductionist in its approach? What seems to be massively overlooked is the fact that many proteins are not just proteins, but rather they're GLYCOproteins. We now know that glycosylation of proteins has an absolutely profound influence on protein folding, so why is it so often overlooked for its impacts on the protein folding problem? The sum total of glycan structures that can be added to glycoproteins--aka the glycome--is extraordinarily massive and insanely complex and is often quoted as being orders of magnitude more complex than the entire proteome itself, as well as being described as one of the most complex entities in nature. The problem, when it comes to glycosylation, is that no code exists for predicting which glycan structures may get added to proteins and thereby influence the way that they fold. Additionally, even if you were able to solve the immense problem of correctly predicting glycan addition to proteins, you still cannot predict with 100% accuracy at which sites on a protein glycosylation will occur, since no sequence motif exists for certain types of glycosylation. For just one example of how glycans can alter the biophysics of protein folding, see:

Effect of glycosylation on protein folding: A close look at thermodynamic stabilization

Ok, I know what you are saying: Anfinsen showed that he could unfold, then refold a small enzyme in a test tube. The fact that he could refold the purified enzyme in the absence of any other cellular factors demonstrated that the amino acid sequence contains all the information required to fold the protein. However, there is one extremely important caveat---this protein was intracellular. While intracellular proteins are devoid of the complex types of glycosylation above (which is probably why Anfinsen's experiment worked), cell membrane proteins and secreted proteins will absolutely be modified by the complex entities of the glycome, which affects their folding, cell surface half-lives, physiological functions, and trafficking. So even if you were able to model an intracellular protein folding correctly by completely ignoring glycosylation, the same approach probably wouldn't work for the enormous number of proteins that are on the cell surface or are secreted.

However, there is still one other catch. The ultimate goal of protein folding is probably, I'm guessing, the ability to predict function from form, and this is where things are not straightforward at all. Yes, an intracellular enzyme may fold correctly based on its amino acid sequence without being N- or O-linked glycosylated; however, there is one other class of glycosylation that is extremely special--and that's the addition of a single sugar called N-acetylglucosamine (O-GlcNAc) to Ser/Thr sites, which can act in a yin-yang relationship with phosphorylation (and O-GlcNAc plus N- and O-linked glycans are the reason people now say >90% of proteins are modified by sugars in one way or another). Phosphorylation, as many scientists know, acts as an on/off switch for proteins, so if you correctly predicted the folding for an intracellular protein you could probably still guess its function if all you thought about was phosphorylation, since phosphorylation only serves as an on/off mechanism. However, the addition of O-GlcNAc is much different. Not only can it serve as an on/off regulatory mechanism just like phosphorylation, O-GlcNAc can endow the same protein with completely different capabilities. For examples of what O-GlcNAc can do, see these papers:

O-GlcNAc regulates pluripotency and reprogramming by directly acting on core components of the pluripotency network.
Modulation of transcription factor function by O-GlcNAc modification

Ok, let's assume you were even able to predict the correct structure of an *intracellular* protein, then what? Function from form does. not. follow. Any one of the 400 types of post-translational modifications (for example, O-GlcNAc) that could occur on your protein could still radically alter protein function in ways that are completely unpredictable and completely outside the realm of what could be predicted from amino acid sequence. Why hasn't the protein folding problem been solved? For starters, I see too much reductionism (complete ignorance of the vast and enormous influences of post-translational modifications on the protein folding problem). In order to predict proper protein folding, I have no doubt that you'd probably need to 1.) also be able to predict which exact glycan structures get added to certain types of proteins (impossible), 2.) predict where and when those glycans will be added (impossible), and 3.) also include in your model all of the possible conformations of massively complex glycans that are able to exist in solution (i.e. glycans move just like proteins do in solution), which is insanely difficult by itself. Protein folding....you have a glycosylation/glycomics problem.

Post by **bruce** » Sun Nov 16, 2014 1:23 am

fibonacci wrote: The ultimate goal of protein folding is probably, I'm guessing, the ability to predict function from form, and this is where things are not straightforward at all.

Bad guess. Have you read any of the technical papers that have been published based on FAH's research? See http://folding.stanford.edu/home/papers

I notice you didn't use the word prion anywhere in your post. Yes, proteins generally fold into a particular shape but not always, and disease is often associated with those (shall we call them "mis-folded") proteins. The goal of FAH is NOT to determine form or to predict function, but rather to understand the folding process in such detail that one can understand WHY an otherwise healthy protein might mis-fold into a prion.

Predicting function is unnecessary when you start with the knowledge of something called a prion, which is already known to have a nefarious function (as in the case of diseases: BSE (Mad Cow), Alzheimer’s, Huntington’s, …) but which also seems to sometimes turn up when we don't want it to. FAH started with a goal to advance the understanding of these issues and has since branched out into other areas which are tractable using the analytical tools developed along the way.

See https://folding.stanford.edu/home/the-science#ntoc3

I'm not disputing that knowledge of the entire glycome is important (and membrane and rest of the cell and ...), but that's a few orders of magnitude more complex, and there are plenty of unknown details that warrant additional research when looking strictly at proteins folding in a bath of water or saline. Then, too, understanding processes that happen on the femto- or pico- or nano-second timeframe while also understanding processes that happen on the micro- or milli- or whole second timeframe adds another major challenge to understanding the HOW and WHY of protein folding.

Post by **Jesse_V** » Sun Nov 16, 2014 7:14 am

Folding@home has also produced several papers on proteins folding in a crowded and complex environment. I agree with Bruce that there's plenty to do with molecules/proteins in solution, but things get even more complex if you bring other molecules into the picture. This is something that has been studied by F@h, but part of the problem is that even with massive computational resources, these problems are extremely complex and demanding. I fully anticipate that with more donors and faster speeds we'd be tackling molecules where the full complexity of the surrounding environment is considered in detail. It would be up to the scientists to do the cost-benefit analysis and decide what projects are worth it, and whether or not they need that much detail.

Post by **fibonacci** » Sun Nov 16, 2014 4:26 pm

bruce wrote:
fibonacci wrote: The ultimate goal of protein folding is probably, I'm guessing, the ability to predict function from form, and this is where things are not straightforward at all.
Bad guess. Have you read any of the technical papers that have been published based on FAH's research? See http://folding.stanford.edu/home/papers

I notice you didn't use the word prion anywhere in your post. Yes, proteins generally fold into a particular shape but not always, and disease is often associated with those (shall we call them "mis-folded") proteins. The goal of FAH is NOT to determine form or to predict function, but rather to understand the folding process in such detail that one can understand WHY an otherwise healthy protein might mis-fold into a prion.

Precisely! You cannot fully understand WHY an otherwise healthy protein will mis-fold without considering how the glycome has been altered, since it is well known that protein glycosylation is a critical control mechanism regulating protein folding. Alterations to the glycosylation features of glycoproteins are likely a significant factor contributing to the mis-folding of certain proteins like prions:

Inhibition of complex glycosylation increases the formation of PrPsc. Traffic. 2003 May;4(5):313-22.
Altered prion protein glycosylation in the aging mouse brain. J Neurochem. 2007 Feb;100(3):841-54. Epub 2006 Nov 27.
Glycosylation influences cross-species formation of protease-resistant prion protein. EMBO J. Dec 3, 2001; 20(23): 6692–6699.

Another example is tau proteins or amyloid plaques:

Glycosylation of microtubule-associated protein tau: an abnormal posttranslational modification in Alzheimer's disease. Nat Med. 1996 Aug;2(8):871-5.
Post-translational modifications of tau protein: Implications for Alzheimer's disease. Neurochem Int. 2011 Mar;58(4)
How post-translational modifications influence amyloid formation: a systematic study of phosphorylation and glycosylation in model peptides. Chemistry. 2010 Jul 12;16(26):7881-8

bruce wrote: I'm not disputing that knowledge of the entire glycome is important (and membrane and rest of the cell and ...), but that's a few orders of magnitude more complex, and there are plenty of unknown details that warrant additional research when looking strictly at proteins folding in a bath of water or saline. Then, too, understanding processes that happen on the femto- or pico- or nano-second timeframe while also understanding processes that happen on the micro- or milli- or whole second timeframe adds another major challenge to understanding the HOW and WHY of protein folding.

Glycans are added immediately on the nascently folding protein, and as the first reference I provided in my first post discusses, glycans significantly alter the thermodynamics of protein folding--which is going to affect how the entire protein is going to fold on the femto-, pico-, or nano- second scale. This reference as well discusses how glycosylation will profoundly alter folding kinetics:

The effect of glycosylation on the folding kinetics of erythropoietin. J Mol Biol. 2011 Sep 23;412(3):536-50

I'm just curious as to how we plan on solving the protein folding problem if we're going to completely ignore a massive set of regulatory mechanisms that alter both the kinetics and thermodynamics of protein folding. Even if we don't ignore the effects of glycosylation, we still can't predict how, when, and where proteins will be glycosylated, since there is no known code for making such predictions the way we can use the DNA template to determine protein sequence. Protein folding and glycomics cannot be separated.

Post by **fibonacci** » Sun Nov 16, 2014 4:44 pm

Jesse_V wrote: Folding@home has also produced several papers on proteins folding in a crowded and complex environment. I agree with Bruce that there's plenty to do with molecules/proteins in solution, but things get even more complex if you bring other molecules into the picture. This is something that has been studied by F@h, but part of the problem is that even with massive computational resources, these problems are extremely complex and demanding. I fully anticipate that with more donors and faster speeds we'd be tackling molecules where the full complexity of the surrounding environment is considered in detail. It would be up to the scientists to do the cost-benefit analysis and decide what projects are worth it, and whether or not they need that much detail.

But that's my hangup, even if we had very powerful computational resources at our disposal, how can we predict which sets of glycans get added to proteins that can regulate their folding and also predict where on a protein those glycans will get added? No solution exists to this problem since no one has ever solved the glycocode. Just for an idea of the complexity:

- Amino acids can only be added together linearly (that's the way the chemistry works for the amide bonds of AAs).
- Carbohydrates can be added together in BOTH alpha/beta configurations AND also at 5 different choices of -OH groups that are arranged in 3D space (and this is where their massive complexity is derived from).
- 3 different amino acids can only give you 6 different possible combinations; however, if you take 3 different sugars that could get added to proteins as they fold, the number of possible combinations is over 25,000 because of their alpha/beta configurations and the number of possible -OH groups in 3D space that they could be added to.
- If you move to just 6 sugars that could be added to proteins to regulate their folding, the number of possible combinations now jumps to over 1,000,000,000,000. Most glycans that get added to proteins are on average somewhere around 6-13 sugars in size, so the 1,000,000,000,000 glycoforms of proteins you'd have to consider for folding may be a bit on the low side for some proteins.
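
For anyone who wants to play with the counting argument in the list above, here is a rough Python sketch. It is a deliberately simplified model and not the calculation behind the quoted figures: it only counts unbranched chains of distinct sugars, and it assumes every glycosidic bond independently picks one of 2 anomers (alpha/beta) and one of 5 acceptor -OH groups (the 5 is taken from the list above). The function names and default parameters are made up for illustration. Because branching and ring forms are left out, the glycan numbers it prints are smaller than the 25,000 and 1,000,000,000,000 quoted above, which presumably count more kinds of variation.

```python
from math import factorial


def linear_peptides(n_residues: int) -> int:
    """Number of orderings of n distinct amino acids joined head to tail."""
    if n_residues < 1:
        raise ValueError("need at least one residue")
    return factorial(n_residues)


def linear_glycans(n_sugars: int, hydroxyls: int = 5, anomers: int = 2) -> int:
    """Number of unbranched chains built from n distinct sugars.

    Simplifying assumptions (illustrative only): no branching, no ring-form
    variation, and each of the n-1 glycosidic bonds independently chooses
    an anomer and an acceptor hydroxyl.
    """
    if n_sugars < 1:
        raise ValueError("need at least one sugar")
    choices_per_bond = anomers * hydroxyls
    return factorial(n_sugars) * choices_per_bond ** (n_sugars - 1)


if __name__ == "__main__":
    for n in (3, 6):
        print(f"{n} units: {linear_peptides(n):,} peptides, "
              f"{linear_glycans(n):,} linear glycans")
```

With these assumptions, 3 units give 6 peptides versus 600 linear glycans, and 6 units give 720 peptides versus 72,000,000 linear glycans. So even the stripped-down model shows the gap widening quickly with chain length, before any branching is considered.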

Again, how can you solve the protein folding problem if you ignore fundamental science that regulates the biophysics of folding? Unfortunately, glycomics has proven almost impenetrable for prediction because of its astronomical complexity.

Post by **VijayPande** » Sun Nov 16, 2014 8:57 pm

The Glycome is clearly important, but one can come up with drugs for cancer, Alzheimer's, infectious disease, and many other areas w/o having to address it. So why get more complex than you have to in order to solve these key problems? I'm sure it's something that we'll get involved with sooner or later, when we have a problem that requires it.

Post by **fibonacci** » Sun Nov 16, 2014 11:53 pm

VijayPande wrote: The Glycome is clearly important, but one can come up with drugs for cancer, Alzheimer's, infectious disease, and many other areas w/o having to address it. So why get more complex than you have to in order to solve these key problems? I'm sure it's something that we'll get involved with sooner or later, when we have a problem that requires it.

I agree it is possible to come up with drugs to treat disease without directly addressing what the glycome is doing, but let's not forget how many, many, many drugs were discovered and how they are still discovered to this day, and that's through phenotypic screening. Phenotypic screening assumes far less about your target or system, and only looks for a desired effect, which is in complete contrast to targeted drug discovery.

How were new medicines discovered? Nature Reviews Drug Discovery 10, 507-519 (July 2011)
The why and how of phenotypic small-molecule screens. Nat Chem Biol. 2013 Apr;9(4):206-9

In fact, phenotypic screening has proven to be more effective at creating new drugs than targeted discovery even today, and that's in an age when we probably spend billions of dollars more on targeted discovery than on phenotypic approaches. A big reason why in silico screening of targets fails and why we still cannot model ligand-protein binding accurately is probably that we ignore how things like protein glycosylation will radically alter the conformational states and shapes of cell surface proteins. Almost all of the crystal structures people study are only obtained *after their glycans have been removed*. Should we really wonder, then, why small molecules that models predict will bind nicely to a protein target fail miserably in real life? Maybe it's because your crystal structures and models have failed to account for the numerous effects of glycoforms on the real protein conformational states that exist in vivo. Thus I beg to differ: in my opinion we already *do* have problems that require accounting for the effects of the glycome, and they entirely overlap with the protein folding problem, drug discovery, and simply understanding how many diseases such as prion diseases and tauopathies work. The reason phenotypic screening was wildly successful in the past and still is to this day could be that it took glycoforms of proteins into account (in an indirect way, of course) and made no assumptions in order to predict protein conformations (such as ignoring all effects of glycans on cell surface receptors) and small molecule binding--you simply threw your small molecules into a batch of cells and looked for a desired effect.

Sometimes you can find drugs with targeted discovery and without addressing the glycome, of course, but pharma has been in a well-publicized rut for almost 2 decades now, with declining returns on investment through targeted drug discovery. Again, I'll also point out that a treatment like EPO couldn't be recombinantly made very well until the entire glycosylation pathways needed to create the correct glycoforms of EPO and to assist with its folding were engineered into a plant. Additionally, there are also a ton of publications out there on the need for properly glycosylated forms of monoclonal antibodies for both efficacy and properly folded mAbs.

Emerging Principles for the Therapeutic Exploitation of Glycosylation. Science. 2014 Jan 3;343(6166):1235681
