Doctoral Thesis

Improving de novo model quality and its application in ab initio phasing


Rojan Shrestha

A Dissertation Presented

By

Rojan Shrestha

Submitted to

The Graduate School of Frontier Sciences of the

University of Tokyo in partial fulfillment of the requirements

for the degree of


August 2014

Department of Computational Biology


De novo models are computationally predicted three-dimensional models of proteins built from amino acid sequence information alone. The key components of de novo modeling are a method for searching the conformational space and an energy function for evaluating each conformation accurately. The conformational space is astronomically large because of the degrees of freedom associated with each residue, which makes developing an efficient search method a challenge. A second challenge is to devise an energy function accurate enough to evaluate the conformers.

Despite these challenges, de novo modeling has succeeded in generating accurate models for small, single-domain proteins. Fragment assembly is an effective and efficient method for de novo modeling: it assembles fragments from known structures under the guidance of an energy function. This concept was implemented in Rosetta, which has achieved a number of breakthrough successes.

Rosetta generates the final model from the input sequence in two major stages, termed coarse-grained sampling and all-atom refinement. In the first stage, three-residue and nine-residue fragments obtained from known structures are assembled into full-length coarse-grained models. These models contain only the backbone atoms and the centroid of the side-chain atoms. In all-atom refinement, side-chain atoms are then packed to construct all-atom models, followed by energy minimization. However, many challenges remain in predicting models accurate enough for practical uses such as solving the crystallographic phase problem.

To address these issues, I have focused on method development, specifically biased conformational sampling and fragment quality improvement, to enhance the quality of predicted models. Furthermore, I have developed a method that uses de novo fragments for phasing and assembles these fragments after phasing, for cases where a full-length model is difficult to predict accurately enough for phasing.

First, I developed a method that improves the conformational space search in order to improve model accuracy. The method first generated coarse-grained models using Rosetta. Next, an ensemble of the lowest-energy coarse-grained models was selected, and the deviation of each model from the other models in the ensemble was calculated. The deviation was also computed for each residue; this score was called the average pairwise residue distance score. The score correlated with the accuracy of the predicted residues in the model: residues with larger scores were considered less accurate, and vice versa. Lastly, the conformational search was biased by the score, so that residues with larger scores were sampled more frequently. This procedure rebuilt the selected coarse-grained models, then packed the side-chain atoms and performed energy minimization. Molecular replacement was run on these all-atom models, and the entire simulation was terminated once a few correct solutions were obtained. The method was tested on 10 difficult targets that had failed in previous studies using other methods (Rosetta and RosettaX). The rebuilding procedure improved the accuracy of coarse-grained models from 4.93 Å to 4.06 Å on average. Seven of the ten protein targets yielded successful molecular replacement solutions using the rebuilt models.
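The per-residue score and the biased sampling described above can be illustrated with a small sketch. This is not the thesis implementation: it assumes the ensemble has already been superposed, that each model is a plain list of C-alpha coordinates, and that the score for a residue is the mean distance between that residue's positions over all model pairs. The function names `pairwise_residue_scores` and `sampling_weights` are hypothetical, and mapping scores to sampling frequencies by simple normalization is likewise an assumption.

```python
import math
from itertools import combinations


def pairwise_residue_scores(models):
    """Mean pairwise distance per residue across an ensemble.

    models: list of models, each a list of (x, y, z) C-alpha coordinates.
    Assumption: all models have the same length and are already superposed.
    """
    n_residues = len(models[0])
    pairs = list(combinations(range(len(models)), 2))
    scores = []
    for i in range(n_residues):
        total = sum(math.dist(models[a][i], models[b][i]) for a, b in pairs)
        scores.append(total / len(pairs))
    return scores


def sampling_weights(scores):
    """Turn scores into sampling probabilities (assumed: proportional to score)."""
    total = sum(scores)
    if total == 0:
        # All residues agree perfectly: fall back to uniform sampling.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

Under these assumptions, a residue whose position varies widely across the ensemble receives a large score and is therefore chosen more often for rebuilding, while a residue on which the ensemble agrees is rarely touched.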

The second method focused on improving fragment quality in order to generate better-quality models. In this study, a method was developed to generate new fragment libraries through a resampling process. First, the lowest-energy all-atom models were selected from the models generated using Rosetta. These models were broken into overlapping three-residue and nine-residue fragments. The average pairwise residue deviation score was computed for the three-residue and nine-residue fragments to remove distant fragments. The remaining fragments were clustered, and twenty-five fragments were randomly selected from the top five clusters. These new fragments were used for a second round of prediction. The performance of the method was tested on a benchmark set of 30 different proteins, and the accuracy of both the new fragments and the predicted models was evaluated. The results showed that the new fragment library contained better fragments and was enriched with high-quality fragments. To evaluate performance, the lowest-energy model and the best of the top five models were each taken as the best prediction, and their C-alpha root mean square deviation (CA-RMSD), template modeling score (TM-score), and global distance test total score (GDT-TS) relative to the native structures were computed. By all of these assessment criteria, the method performed significantly better than Rosetta for both the lowest-energy models and the best-of-top-five models. On average, the method improved CA-RMSD from 5.99 Å to 5.03 Å when the lowest-energy models were selected as the best predicted models. Similarly, it improved both TM-score and GDT-TS by 7%.
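The fragmentation and selection steps can be sketched as follows. This is an illustration only: the helper names are hypothetical, clustering itself is assumed to have been done elsewhere, "top" clusters are assumed to mean the largest ones, and the twenty-five fragments are assumed to be drawn from the pooled members of those clusters, since the text does not say how the draw is split between clusters.

```python
import random


def overlapping_fragments(residues, length):
    """Return (start_index, fragment) for every window of `length` residues."""
    return [
        (start, residues[start:start + length])
        for start in range(len(residues) - length + 1)
    ]


def pick_from_top_clusters(clusters, n_clusters=5, n_pick=25, seed=0):
    """Randomly pick fragments from the largest clusters.

    clusters: list of clusters, each a list of fragments.
    Assumption: clusters are ranked by size and their members are pooled
    before sampling without replacement.
    """
    top = sorted(clusters, key=len, reverse=True)[:n_clusters]
    pool = [fragment for cluster in top for fragment in cluster]
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    return rng.sample(pool, min(n_pick, len(pool)))
```

For a model of N residues, `overlapping_fragments` yields N - 2 three-residue windows and N - 8 nine-residue windows, one starting at each residue position where a full window fits.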

Lastly, a new method was developed to tackle the phase problem using a fragmentation and fragment reassembly approach, for cases where the full-length model was too inaccurate to use as the template model in molecular replacement. In this method, de novo models were fragmented, independently phased, and reassembled. The lowest-energy all-atom models produced using Rosetta were chosen for fragmentation. For each residue position, constant-length overlapping fragments were constructed. These fragments were clustered, and two hundred candidate fragments were randomly selected for each residue position. The selected fragments were used independently as search models in molecular replacement and were assembled afterwards. For reassembly, one fragment was selected as the seed fragment and one low-energy de novo model was taken as the reference model. The reference model was superposed onto the seed fragment. Using the seed fragment and the reference model, the positions and orientations of the other fragments in the crystallographic unit cell were determined, yielding a partial model. To locate the other fragments, the combinations of permissible origins and space-group symmetry operators with unit cell translations were computed. The combination that gave the smallest distance between the reference model and the candidate fragment was taken as the correct location. In this way, all the fragments were reassembled in the asymmetric unit. The method was tested on ten difficult proteins with three fragment lengths: thirteen, seventeen, and twenty-one residues. The ten targets were considered difficult because their best predicted full-length models, with an average CA-RMSD of 3.97 Å, were unable to provide phase angles by molecular replacement. Using seventeen-residue fragments, the crystal structures of eight of the ten protein targets were solved, with an average CA-RMSD of 1.25 Å.
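The search over origins, symmetry operators, and unit cell translations can be sketched as below. Several simplifications are assumptions made for illustration and are not taken from the thesis: each fragment and its matching region of the reference model are reduced to a single centroid in fractional coordinates, the unit cell is orthogonal with edge lengths `cell`, and symmetry operators are given as (rotation matrix, translation) pairs. The function names are hypothetical.

```python
import math
from itertools import product


def apply_operator(rotation, translation, frac):
    """Apply a symmetry operator x' = R x + t in fractional coordinates."""
    return tuple(
        sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def best_placement(fragment_frac, reference_frac, sym_ops, origins, cell):
    """Find the operator, origin shift, and cell translation that bring the
    fragment centroid closest to the reference centroid.

    Assumption: orthogonal cell, so the nearest lattice translation can be
    found by rounding each fractional difference independently.
    Returns (distance, operator_index, origin, cell_shift, placed_position).
    """
    best = None
    for (op_index, (rotation, translation)), origin in product(
        enumerate(sym_ops), origins
    ):
        moved = apply_operator(rotation, translation, fragment_frac)
        moved = tuple(m + o for m, o in zip(moved, origin))
        # Whole-cell translation that moves the fragment nearest the reference.
        shift = tuple(round(r - m) for r, m in zip(reference_frac, moved))
        placed = tuple(m + s for m, s in zip(moved, shift))
        distance = math.sqrt(sum(
            ((p - r) * edge) ** 2
            for p, r, edge in zip(placed, reference_frac, cell)
        ))
        if best is None or distance < best[0]:
            best = (distance, op_index, origin, shift, placed)
    return best
```

A real implementation would compare full atomic coordinates rather than centroids and would use the orthogonalization matrix of the actual cell, but the structure of the search, enumerating every combination and keeping the one with the smallest distance to the reference, is the same as in the description above.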

Acknowledgements

First and foremost, I would like to thank my supervisor, Professor Kam Y. J. Zhang. It has been an honor to be his first Ph.D. student at The University of Tokyo and RIKEN. He has provided me with a fantastic academic and research environment in which I have had the chance to develop logical thinking, creativity, and research skills, and to become an independent and collaborative research professional. I appreciate all his efforts to make my PhD study productive and enjoyable.

I would also like to thank Professor Masahiro Kasahara for stimulating discussions about programming, which were very fruitful for this study. I am also grateful to Professor Kasahara for serving as a member of the thesis evaluation committee. I would also like to thank Professor Yutaka Suzuki and Professor Koji Tsuda from The University of Tokyo for serving as judges on the thesis evaluation committee. Similarly, I would like to thank Professor Min Yao from Hokkaido University for serving as the external referee on my PhD thesis committee.

I would like to thank all of my co-workers, Dr. Asuhtosh Kumar, Dr. Arnout Voet, Dr. David Simoncini, Dr. Muhammad Muddassar, Dr. Kamlesh Sahu, Dr. Taeho Jo, Dr. Yong Zhou, Dr. Ryo Takahashi, Mr. Francois Berenger, and Ms. Xiao Yin Lee, for their professional and personal support. Their support has made the PhD study enjoyable and interesting. I would also like to thank the secretary, Ms. Hiroko Kani, for her support in many aspects of life in Japan.

I gratefully acknowledge RIKEN, Japan, for many things. First, RIKEN funded my PhD study for three years. The financial support from RIKEN, the International Program Associate (IPA), was tremendous and allowed me to live well in Japan during the PhD study. Without support from RIKEN, I would not have reached the point of writing this PhD dissertation. I would also like to thank the Graduate School of Frontier Sciences, The University of Tokyo, for various research grants. RIKEN has also provided the highly sophisticated facilities required for the research, from workstations to supercomputers. I appreciate the supercomputing power provided by the RIKEN Integrated Cluster of Clusters and acknowledge the Advance Center of Computing and Communication, RIKEN. All experiments presented in this thesis were carried out on the RIKEN Integrated Cluster of Clusters.

I appreciate the open source community, which has freely provided source code written in different programming languages and saved me tremendous time and effort. In particular, I would like to thank the researchers and developers of the Rosetta software team from the University of Washington, the Phaser program group from the University of Cambridge, and Kevin Cowtan, developer of clipper, from the University of York.

Lastly, I sincerely thank my family for their constant support, love, and encouragement. I am grateful to my parents (Min Bahadur Shrestha and Dropati Shrestha), who raised me with a love of science and supported me in all my pursuits. I thank my sister, Roj, and brother, Ujjan, for all their support. Finally, I thank my wife, Shalu, for her love, support, and encouragement during the period of this PhD.

Thank you!

Table of Contents

Abstract


List of Figures

List of Tables

Chapter 1. Introduction

1.1. Protein and its structure

1.2. Computational methods for protein structure prediction

1.3. X-ray crystallography for protein structure determination

1.4. Phase problem

1.5. Ab initio phasing with de novo models

Chapter 2. Objective of the study

Chapter 3. MORPHEUS – error-estimation-guided rebuilding of de novo models increases the success rate of ab initio phasing

3.1. Objective

3.2. Methods

3.2.1. Benchmark dataset and initial model generation

3.2.2. Determine incorrectly predicted residues or regions

3.2.3. Rebuilt inaccurately predicted residues

3.2.4. Molecular replacement with rebuilt models

3.3. Results

3.3.1. Model accuracy correlated with their divergence

3.3.2. Accuracy improvement after rebuilding

3.3.3. Ab initio phasing with rebuilt de novo models

3.3.4. Performance measurement

3.4. Discussion

–  –  –

3.4.2. Biased conformational space searching

3.4.3. Molecular replacement with rebuilt models

3.5. Conclusion

Chapter 4. NEFILIM – improving fragment quality for de novo structure prediction

4.1. Objective

4.2. Methods

4.2.1. Benchmark data set and initial model generation

4.2.2. Improved fragment library generation

4.2.3. Resampling with new fragments

4.3. Results

4.3.1. New fragments from the de novo models

4.3.2. Model accuracy improvement

4.3.3. Improved performance in resampling

4.4. Discussion

4.5. Conclusion

Chapter 5. FRAP – ab initio phasing with de novo fragments for difficult targets

5.1. Objective

5.2. Methods

5.2.1. Benchmark data selection

5.2.2. De novo fragments generation for molecular replacement

5.2.3. Fragment assembly after molecular replacement

5.3. Result and Discussion

5.3.1. Seed fragment and reference model

5.3.2. De novo fragments and molecular replacement

5.3.3. Fragment assembly

–  –  –

5.4. Conclusion

Chapter 6. Summary

Chapter 7. Reference

List of Figures

Figure 1.1 Different levels of protein structure

Figure 1.2 Bond length, bond angle, and dihedral angle

Figure 3.1 Schematic diagram of MORPHEUS program

Figure 3.2 Scatter plot between coarse-grained energy and accuracy of the models

Figure 3.3 Correlation between APMDS and model accuracy

Figure 3.4 Correlation between APRDS and CA-RMSD of the residue in the sequence

Figure 3.5 Comparison of accuracy of models before and after rebuilding

Figure 3.6 Comparison of accuracy of residues before and after rebuilding

Figure 3.7 Comparison of average improvement in models before and after rebuilding

Figure 3.8 Distribution of APRDS of model before and after rebuilding with their accuracy

Figure 3.9 Superposition of models after rebuilding to the native structures

Figure 3.10 Total elapsed time spent by Rosetta3.


Figure 4.1 An overview of NEFILIM

Figure 4.2 Quality of best fragment in structure-derived and sequence-derived fragment library

Figure 4.3 Enrichment of good quality in sequence-derived and structure-derived fragments

Figure 4.4 Best fragment for each residue position (nine-residue)

Figure 4.5 Average accuracy of fragments at each residue position (nine-residue)
