INVESTIGATING STRUCTURE-FUNCTION RELATIONSHIPS IN FAMILY 7
CELLULASES BY MOLECULAR SIMULATION
Courtney Barnett Taylor
Submitted to the Faculty of the
Graduate School of Vanderbilt University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Clare McCabe
Peter T. Cummings
Eugene LeBoeuf
Kenneth A. Debelak

To Mom, Dad, John, and Mal, unwavering in their support, and to Trent, for 13 years of patience
ACKNOWLEDGEMENTS
This work was made possible by financial support from the DOE Office of the Biomass Program. Computational time for this research was provided under the NSF TeraGrid on the TACC Ranger cluster and the NICS Athena and Kraken clusters, and through resources at NERSC.
I am grateful for the support and guidance I have received during my research from my advisor, Clare McCabe. I am especially grateful for the leap of faith she took in agreeing to let me work from Baton Rouge for nearly my entire graduate career. I would like to thank the members of my Dissertation Committee for the time they contributed to this work. Thanks to Faiz Talib for his hard work as an undergraduate. No acknowledgements section of mine would be complete without the group at NREL, especially Dr. Gregg Beckham, who was in essence my co-advisor and was instrumental to this work. I thank the rest of the group, Drs. Christy Payne, Mike Crowley, Yannick Bomble, Deanne Sammonds, Lintao Bu, James Matthews, and Mike Himmel, for their immense patience, their assistance, and their willingness to answer every rambling email sent their way. They make pretty good friends, too!
With limited space, I thank my family and friends for keeping me sane. To all those I met at Vanderbilt, you are great people. To my Nashville crew (you know who you are), I feel like Dorothy leaving Oz; you all were the best part of this experience, and thanks for walking the road with me. To my closest friends and family that I wish I could name individually, I cannot thank you all enough for your support.
And finally to Trent, bless you bubs, you deserve sainthood at this point; thank you, I love you.
DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
1.2 Trichoderma reesei Cellulases
Carbohydrate-Binding Module (CBM)
Connective Linker Peptide
1.3 Role of Glycosylation in Cellulase Action
1.4 Summary and Outline of Thesis
2. BACKGROUND AND THEORY
2.1 General Methodology: Molecular Dynamics
2.2 Thermodynamic Integration and Relative Binding Free Energy
3. COMPUTATIONAL INVESTIGATION OF AMINO ACID MUTATION AND GLYCOSYLATION EFFECTS ON THE CEL7A CBM
3.2 Computational Methods
TI Simulations of Amino Acid Mutations
TI Simulations of Glycosylated CBMs
MD Simulations to Examine the Impact of Glycosylation on CBM Stability
Relative Binding Free Energy from TI
Interaction and H-bonding Between Residues of Interest and Cellulose
Protein Backbone Fluctuations With and Without Glycosylation
4. IMPACTS OF O-GLYCOSYLATION ON THE STRUCTURE AND FUNCTION OF CBMS IN FUNGI AND YEASTS
4.2 Computational Approach
All-atom MD Simulations of Glycosylated CBMs
Relative Binding Free Energy from TI Simulations
CBM Glycoprotein-Cellulose Interaction Energy
CBM Protein Backbone Root-mean Square Deviation (RMSD) and Root-mean Square Fluctuation (RMSF)
CBM Glycoprotein-Cellulose Hydrogen Bond Analysis
Binding Affinity Differences in Dimer Glycoforms
Binding Affinity Differences in Branched and Linear Trimer Glycoforms
Impact of Glycosylation Location on Protein Structure and Affinity
Glycosylation As a Natural Means to Improve Binding Affinity
5. BINDING SITE DYNAMICS AND AROMATIC-CARBOHYDRATE INTERACTIONS IN PROCESSIVE AND NON-PROCESSIVE FAMILY 7 GLYCOSIDE HYDROLASES
5.2 Computational Procedures
All-Atom MD Simulations of the Cel7A and Cel7B Catalytic Domains
Thermodynamic Integration and Relative Cellodextrin Binding Free Energy
Molecular-level Comparison of the Cel7A and Cel7B Wild Type Catalytic Domains
Molecular-level Comparison of the Bound Cellodextrin in the Cel7A and Cel7B Wild Type
Relative Binding Free Energy from Aromatic Acid Mutation in the Cel7A and Cel7B from TI Simulation
Molecular-Level Comparison of Aromatic Acid Mutations
5.4 Discussion and Conclusions
6. CONCLUSIONS AND FUTURE WORK
6.2 Future Work
Validation of Findings on Glycosylation Impacts to Binding Affinity
Processive and Non-processive Catalytic Domains in Other GH Families
LIST OF TABLES
Table 1.1: Sequence alignment of various CBMs for exoglucanases (Cel7A and Cel6A) and endoglucanases (Cel7B) in example fungi.
The flat-face residues of interest are highlighted in yellow. The native, or wild type, Cel7A amino acid sequence is shown on the first line.
Table 1.2: Linder et al. calculated partition coefficients and binding free energies for Cel7B over the Cel7A wild type CBM and for mutations in the Cel7A wild type CBM planar face [28,29].
Amino acid lettering: Y, tyrosine; W, tryptophan; N, asparagine; A, alanine.
Table 3.1: Relative binding free energy (ΔΔG, kcal/mol) from amino acid TI calculations, including the two containing native glycosylation (Y5A-G and Y5W-G), and associated change in partition coefficient (KMut/KWT-NG).
The partition coefficient change was not calculated for the intermediate steps, Y5A-G and Y5W-G. The S3M1+Y5A-G and S3M1+Y5W-G entries are the sums of the Y5A-G and Y5W-G entries, respectively, with the S3M1 entry (see Table 3.4).
Table 3.2: Relative binding free energy (ΔΔG, kcal/mol) of Y5A, F5A, and Y5F calculated (Equation 3.2), and Y5F actual, from 11 electrostatic windows and 13 van der Waals windows.
Table 3.3: Relative binding free energy (ΔΔG, kcal/mol) of Y5W, S3M1, and Y5W-G calculated (Equation 3.3), and Y5W-G actual.
Table 3.4: Relative binding free energy (ΔΔG, kcal/mol) from the native glycosylation TI calculations, for a single O-mannose residue at Ser-3 (S3M1) and a disaccharide mannan at Ser-3 (S3M2), and associated change in partition coefficient (KMut/KWT-NG).
The S3M1+S3M2 entry is the sum of the previous two entries.
Table 3.5: Relative binding free energy (ΔΔG, kcal/mol) from the engineered glycosylation TI calculations, for a single O-mannose residue at Ser-14 with no other glycosylation present (S14M1-NG) and a single O-mannose residue at Ser-14 with the native glycans present (S14M1), and associated change in partition coefficient (KMut/KWT-NG).
The S3M1+S14M1 entry is the sum of the S14M1 and S3M1 (see Table 3.4) entries.
Table 3.6: Number of unique hydrogen bonds formed between the mannose residue of interest and the cellulose surface during the 100 ns MD simulations, using a hydrogen bond distance cutoff of 3.0 Å and an angle criterion of 60° from linear.
The typical duration of each bond was between 10 and 15% of the total run.
Table 4.1: Cumulative relative binding free energies resulting from changing glycan patterns, compared to the non-glycosylated wild type CBM.
The entire glycoform is the mutation from the non-glycosylated wild type, with green and blue representing mannose and glucose residues, respectively.
Table 4.2: Individual TI simulation results corresponding to the structures in Figure 4.2.
The electrostatic and VDW values are calculated using Equation S1, i.e., $\Delta\Delta G_{\mathrm{Elec}} = \Delta G_{\mathrm{Bound}} - \Delta G_{\mathrm{Free}}$. Partition coefficient ratios were not calculated for individual simulations.
Table 4.3: H-bond duration between the glycans and cellulose surface.
H-bonds are calculated using a distance cutoff of 3.0 Å and an angle criterion of 60° from linear. The duration (% of run) is calculated by counting the number of bonds and dividing by the total number of observations. Refer to Figures 4.7 and 4.8 for graphical representations of the number of bonds.
Table 5.11: Count of residues involved in H-bonding with a cellodextrin, as predicted by structural studies [8,15] and by the current study's simulation results.
Table 5.12: Hydrogen bonding patterns and occupancies in Cel7A and Cel7B wild type CD.
Where applicable, active site residues are listed first in red (GluGlu-196, Asp-214/Asp-198, and Glu-201/Glu-217), followed by key aromatic tunnel residues in blue (Trp-376/Trp-329, Trp-367/Trp-320, TrpTyr-38, and Trp-40/Trp-40) for Cel7A/B. Other than the catalytic and aromatic residues, only occupancies 5% of the total 250 ns run are displayed
Table 5.13: Cel7A wild type native contact residues and fraction by binding site.
VDW and electrostatic interaction energies shown with errors calculated using block averaging
Table 5.14: Cel7A wild type native contact residues and fraction by binding site.
VDW and electrostatic interaction energies shown with errors calculated using block averaging
Table 5.15: Relative binding free energy changes for Cel7A and Cel7B per binding site as a result of aromatic acid mutation.
The partition coefficients were estimated using Eqn 5.1 for experimental binding affinities and then inverted to show the improvement of the wild type over the mutated states.
Table 5.16: Detailed relative binding free energies and associated binding affinity (KWT/KMut) calculated from TI for the T. reesei Cel7A system.
Table 5.17: Detailed relative binding free energies and associated binding affinity (KWT/KMut) calculated from TI for the T. reesei Cel7B system.
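Several of the tables listed above pair a relative binding free energy (ΔΔG, kcal/mol) with a partition coefficient ratio, and Table 4.2 gives ΔΔG as the difference between the bound and free legs of the alchemical cycle. As a minimal sketch of that bookkeeping, the Python below assumes the standard relation ΔG = −RT ln K for each state (the dissertation's own Eqn 5.1 is not reproduced here and may differ in form) and a hypothetical temperature of 300 K. The function names are illustrative only and do not come from the source.

```python
import math

# Molar gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) divided by 4184 J/kcal.
R_KCAL = 8.314462618 / 4184.0


def relative_binding_free_energy(dg_bound: float, dg_free: float) -> float:
    """Close the alchemical cycle: ddG = dG_bound - dG_free (kcal/mol).

    dg_bound and dg_free are the free energies of the same WT -> Mut
    transformation, performed with the CBM bound to cellulose and free in solution.
    """
    return dg_bound - dg_free


def partition_ratio(ddg: float, temperature: float = 300.0) -> float:
    """K_Mut/K_WT implied by ddG, assuming dG = -RT ln K for each state.

    The 300 K default is a hypothetical choice, not a value taken from this work.
    """
    return math.exp(-ddg / (R_KCAL * temperature))


def wild_type_over_mutant(ddg: float, temperature: float = 300.0) -> float:
    """Inverted ratio K_WT/K_Mut, the form listed for Tables 5.15 to 5.17."""
    return 1.0 / partition_ratio(ddg, temperature)


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from this dissertation.
    ddg_step1 = relative_binding_free_energy(dg_bound=-3.0, dg_free=-2.0)
    ddg_step2 = relative_binding_free_energy(dg_bound=1.5, dg_free=1.0)
    # Free energies of sequential modifications add, so their ratios multiply.
    total = ddg_step1 + ddg_step2
    print(f"ddG total = {total:.2f} kcal/mol")
    print(f"K_Mut/K_WT = {partition_ratio(total):.3f}")
    print(f"K_WT/K_Mut = {wild_type_over_mutant(total):.3f}")
```

Under this sign convention, ΔΔG = ΔG_bind(Mut) − ΔG_bind(WT), so a negative ΔΔG corresponds to a mutant that binds more tightly (K_Mut/K_WT > 1). Summing ΔΔG values, as in the S3M1+Y5A-G style entries, is equivalent to multiplying the corresponding partition coefficient ratios.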
Figure 1.1: Energy consumption in the United States in 2010 by sector (quadrillion BTUs)
Figure 1.2: Cellobiose units (a) form cellulose chains (b) containing hydrogen bonding networks, shown as red dotted lines.
The chains and sheets combine to form 36-chain microfibril structures (c).
Figure 1.3: Simplified schematic of synergistic cellulase enzyme action on crystalline and amorphous cellulose, as suggested by Lynd et al.
Nonreducing ends (white box) are drawn to the left and reducing ends (black box) to the right. In Trichoderma reesei, Cel7A is a reducing end exoglucanase, Cel6A is a nonreducing end exoglucanase, and Cel7B is an endoglucanase.
Figure 1.4: Rendering of the Cellobiohydrolase-I (Cel7A) enzyme in complex with a cellulose fibril.
A cellodextrin chain (red) is threaded into the CD, where hydrolysis will take place. The connective linker is heavily glycosylated with mannose sugars (light blue). Rendering taken from Himmel et al., 2007.
Figure 1.5: Proposed catalytic cycle of Cel7A on crystalline cellulose from Chundawat et al., 2011.
The yellow and blue space-filling representations are O-glycosylation on the linker and CBM and N-glycosylation on the CD, respectively. The light blue molecule is the Cel7A enzyme, and the green substrate is a cellulose microfibril. The cellobiose product is expelled in (f) and shown in pink.
Figure 1.6: Cel7A Family 1 CBM with flat-face residues shown.
Aromatic residues (Tyr-5, Tyr-31, and Tyr-32) are shown in yellow and polar residues (Asn-29 and Gln-7) are shown in orange.
Figure 1.7: O-glycosylation sites of the Cel7A linker peptide as proposed by Harrison and Nevalainen, with the first three residues of the CBM included.
Mannose residues are represented by green circles. All of the Thr and Ser residues on the Cel7A linker peptide are glycosylated with 1 to 3 mannose sugars, with evidence of sulfation and phosphorylation also occurring in some strains [19,44,48].
Figure 1.8: Comparison of Cel7A (A and B) and Cel7B (C and D) catalytic structures [25,59].
The entrance to the Cel7A tunnel is shown in panel B and the entrance to the Cel7B cleft is shown in panel D. The protein loop structures that form the tunnel in Cel7A and the shorter loop structures present in the Cel7B cleft are both shown in blue. The cellodextrin is shown in green.
Figure 2.1: Graphical representation of the thermodynamic paths used to calculate ΔΔG in experiment (top to bottom) and alchemically via computation (right to left). WT represents the wild type and Mut represents the mutation.
Figure 2.2: Example Lennard-Jones (van der Waals) potential curves generated from equation 2.