Abstract of “Genome-wide algorithms for haplotype assembly, haplotype phasing, and IBD inference” by Derek Aguiar, Ph.D., Brown University, May 2014.
Determining the sequences of alleles co-inherited on a single chromosome, or haplotypes, is fundamentally important in genomics, molecular biology, and genomic medicine. Experimental methods for determining haplotypes are currently labor intensive and expensive, and they do not scale. The computation of haplotypes from genome sequencing, or haplotype assembly, employs graph-theoretic and combinatorial algorithms intertwined with statistical models of DNA. The related problem of haplotype reconstruction from a population sample, or haplotype phasing, uses the statistical linkage between neighboring alleles and identical-by-descent (IBD) evolutionary relationships to reconstruct the haplotype sequences. This dissertation introduces graph-theoretic, combinatorial, and statistical algorithms for genome-wide haplotype reconstruction and IBD haplotype tract inference. Specifically, we present:
DELISHUS, a mathematical model and exact polynomial-time algorithm for computing deletion haplotypes in SNP array data.
The HapCompass algorithm for diploid genomes (e.g., humans), which models haplotype reconstruction as local optimizations on the cycle basis of a graph-theoretic representation of variant alleles captured by sequence reads. This framework provides an algorithmic design strategy for a range of haplotype reconstruction problems and incorporates population genetics and identity-by-descent theory into the haplotype reconstruction model.
The first model and algorithm for haplotype assembly of polyploid genomes, that is, organisms with more than two sets of homologous chromosomes (common in plant and tumor genomes).
Tractatus, the first theoretically guaranteed exact and linear-time algorithm for identical-by-descent multi-tract inference.
We compare our approaches with a variety of competing algorithms and investigate the feasibility of genome-wide haplotype reconstruction from computational and experimental perspectives.
Genome-wide algorithms for haplotype assembly, haplotype phasing, and IBD inference
by Derek Aguiar
B.Sc., Computer Science and Computer Engineering, University of Rhode Island, 2007
M.Sc., Computer Science, Brown University, 2010
A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University
Providence, Rhode Island
May 2014
© Copyright 2014 by Derek Aguiar
This dissertation by Derek Aguiar is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement
Research Interest Keywords
Computational Molecular Biology: haplotype phasing, haplotype assembly, autism genomics, population genomics, genomic regulatory network visualization and analysis, immunogenomics, RNA-Seq pipelines, deletion inference
Design and analysis of algorithms: graph theory, graph algorithms, combinatorial optimization, approximation algorithms
Sept. 2009 Master of Science in Computer Science, Brown University Class of 2010
Sept. 2008 Concentration: Computational Molecular Biology
Advisor: Professor Sorin Istrail
Publications
Derek Aguiar, Eric Morrow, Sorin Istrail, Tractatus: an exact and subquadratic algorithm for inferring identity-by-descent multi-shared haplotype tracts. In RECOMB, vol. 8394, pp. 1-17, 2014.
Derek Aguiar, Wendy S.W. Wong, Sorin Istrail, Tumor haplotype assembly algorithms for cancer genomics. In Pac Symp Biocomput., vol. 19, pp. 3-14, 2014.
Sarah Tulin1, Derek Aguiar1, Sorin Istrail, Joel Smith, A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems. EvoDevo, vol. 4, no 16, 2013.
1 denotes co-first authorship when ambiguous
Derek Aguiar, Bjarni V. Halldorsson, Eric M. Morrow, Sorin Istrail, DELISHUS: an efficient and exact algorithm for genome-wide detection of deletion polymorphism in autism, In Proceedings of ISMB 2012 and Bioinformatics, vol. 28, no. 12, pp. i154-i162, 2012.
Derek Aguiar, Sorin Istrail, HAPCOMPASS: A fast cycle basis algorithm for accurate haplotype assembly of sequence data, In Journal of Computational Biology, vol. 19, no. 6, pp. 577-590, 2012.
Bjarni V. Halldorsson1, Derek Aguiar1, and Sorin Istrail. Haplotype phasing by multi-assembly of shared haplotypes: phase-dependent interactions between rare variants. In Pac Symp Biocomput., pages 88-99, 2011.
Bjarni V. Halldorsson1, Derek Aguiar1, Ryan Tarpine1, and Sorin Istrail. The Clark phaseable sample size problem: Long-Range phasing and loss of heterozygosity in GWAS. Journal of Computational Biology, 18(3):323-333, March 2011.
Bjarni V. Halldorsson1, Derek Aguiar1, Ryan Tarpine1, and Sorin Istrail. The Clark phase-able sample size problem: Long-Range phasing and loss of heterozygosity in GWAS. In Bonnie Berger, editor, RECOMB, volume 6044, pages 158-173, 2010.
Sorin Istrail, Ryan Tarpine, Kyle Schutter, and Derek Aguiar. Practical computational methods for regulatory genomics: A cisGRN-lexicon and cisGRN-browser for gene regulatory networks. In Istvan Ladunga, editor, Computational Biology of Transcription Factor Binding, volume 674 of Methods in Molecular Biology, pages 369-399. Humana Press, 2010.
Posters
D. Aguiar, A. Huang, R. Kantor, E. Morrow and S. Istrail, “Haplotype assembly in the presence of hemizygosity, haplotype sharing, polyploidy, and viral quasispecies,” HiT-Seq workshop in the 21st Annual International Conference on Intelligent Systems for Molecular Biology, July 2013, Berlin, Germany. (selected for oral presentation)
S. Tulin, D. Aguiar, S. Istrail, and J. Smith, “Nematostella reference transcriptome and high throughput gene regulatory network construction,” SDB 71st Annual Meeting, July 2012, Montreal, Canada.
D. Aguiar and S. Istrail, “HAPCOMPASS: A fast cycle basis algorithm for accurate haplotype assembly of next-generation sequence data,” 20th Annual Intelligent Systems for Molecular Biology, July 2012, Long Beach, CA.
D. Aguiar, R. Tarpine, F. Lam, B. Halldorsson, E. Morrow, and S. Istrail, “Long-Range Haplotype Phasing by Multi-Assembly of Shared Haplotypes: Phase-Dependent Interactions Between Rare Variants,” Genomics of Common Disease, Wellcome Trust Sanger Genome Center, Cambridge UK, December 2011.
D. Aguiar, R. Tarpine, F. Lam, B. Halldorsson, E. Morrow, and S. Istrail, “Long-Range Haplotype Phasing by Multi-Assembly of Shared Haplotypes: Phase-Dependent Interactions Between
R. Tarpine, J. Hart, T. Johnstone, D. Aguiar, S. Istrail, “Report on the Cyrene Project: A cisLexicon Containing the Regulatory Architecture of 557 Regulatory Genes Experimentally Validated Using the Davidson Criteria,” The Developmental Biology of the Sea Urchin Meeting, April 27-30, 2011, Woods Hole, MA.
D. Aguiar, R. Tarpine, E. Ruggieri, J. Nadel, D. Moskowitz, S. Istrail, “Beyond GWAS: Robust Computational Analysis of the Multiple Sclerosis Genetic Consortium Data,” Fourth Annual Center for Computational Biology Poster Session, April 28, 2010, Brown University, RI.
Invited Talks
18th Annual International Conference on Research in Computational Molecular Biology “Tractatus: an exact and subquadratic algorithm for inferring identical-by-descent multi-shared haplotype tracts.” April 2014
19th Pacific Symposium on Biocomputing “Tumor Haplotype Assembly Algorithms for Cancer Genomics.” January 2014
IPP Symposium: Putting Big Data to Work. “Ome sweet ome: the genome as a model for big data.” April 25, 2013
21st Annual International Conference on Intelligent Systems for Molecular Biology “Haplotype assembly in polyploid genomes and identical by descent shared tracts.” July 2013
HiT-Seq workshop in the 21st Annual International Conference on Intelligent Systems for Molecular Biology “Haplotype assembly in the presence of hemizygosity, haplotype sharing, polyploidy, and viral quasispecies.” July 2013
20th Annual International Conference on Intelligent Systems for Molecular Biology “DELISHUS: An efficient and exact algorithm for Genome-Wide detection of deletion polymorphism in autism.” July 2012
Brown Computational Biology Open House “SNPs and haplotypes and GWAS oh my!” Feb. 27, 2012
Second Annual IEEE ICCABS CANGS Workshop “Robust algorithms for inferring haplotype phase and deletion polymorphism from high throughput whole genome sequence data” Feb. 24, 2012
Brown Computer Science - Research Exchange Seminars with Tea (REST) “Computational Challenges in Genome-wide Association Studies” Nov. 30, 2010
Fourteenth International Conference on Research in Computational Molecular Biology “The Clark Phase-able Sample Size Problem: Long-range Phasing and Loss of Heterozygosity in GWAS” August 12, 2010
Scholarships & Honors
2014 RECOMB Student Travel Fellowship
2013 ISMB Student Travel Fellowship
2013 NSF EPSCoR Academy Travel Award
2012 ISMB Student Travel Fellowship
2012 IEEE ICCABS CANGS Workshop Student Travel Award
2010 RECOMB Student Travel Award
May 2007 President’s Award for Excellence (Computer Science), URI
May 2007 President’s Award for Excellence (Computer Engineering), URI
May 2007 Summa Cum Laude, URI
May 2007 Outstanding Graduating Senior in Computer Engineering, URI
May 2006 Outstanding Junior in Computer Engineering, URI
Memberships & Activities
2012-present ISCB Membership
2006-present IEEE Membership
2006-present ACM Membership
2006-present Phi Eta Sigma Honor Society
2006-present Tau Beta Pi Honor Society
2007 Six Sigma Specialist
Industry Experience
May 2008 Software Engineer, Raytheon IDS, Portsmouth, RI
Oct. 2007 Zumwalt Total Ship Computing Environment Infrastructure
The majority of my work was spent developing and testing the data control and management software of a next-generation U.S. naval ship.
Oct. 2007 Software Engineer, Raytheon IDS, Portsmouth, RI
June 2006 Joint Rapid Integrated Planning Service
During my junior year I began working as a Software Engineer in Raytheon’s Mission Innovation (MI) group, a small (20-30 people) division of Raytheon IDS that partners with universities to apply company technology and resources to world-threatening issues (e.g., climate change, biological diversity protection, and civil defense). While in MI, I experienced the intellectual excitement of working in a small dedicated team of diverse individuals, in this case on civil-defense-related technology; we used Google Earth, KML, SQL, and .NET to develop a web-based collaborative disaster-planning tool (JRIPS). I am listed as co-inventor on a patent prepared by Raytheon.
The work presented in this dissertation was performed in the laboratory of Sorin Istrail, PhD and portions of this work have been published in Aguiar and Istrail (2012, 2013), Aguiar, Morrow,
(2010, 2011). With the guidance of Sorin, I developed the theory, models, algorithms, and software
presented herein, with the following exceptions:
Development of the haplotype phasing algorithm described in Chapter 2 was a product of close collaboration with Ryan Tarpine and Bjarni Halldorsson. We collaborated with Bjarni Halldorsson to define the initial modeling for DELISHUS in Chapter 3. The application of DELISHUS to autism was a product of close collaboration with Dr. Eric Morrow. Wendy SW Wong simulated cancer data and provided valuable input for Chapter 5. Eric Morrow provided guidance and data for the experiments on autism GWAS data in Chapter 6.
little research experience a chance to work in his lab. Sorin is the single most important reason I am writing this dissertation. From attending my first conference at RECOMB 2009 to presenting in my last conference while a graduate student at RECOMB 2014, Sorin has been the ideal mentor. As a researcher, Sorin has instilled in me several axioms: to build mathematically rigorous foundations for algorithms that balance complexity and practicality, to aim for challenging and open problems, and to collaborate closely with medical doctors and researchers who use our work in a non-abstract manner. As a teacher, Sorin has taught me to teach difficult problems and not be satisfied with the status quo. As a person, Sorin’s journey from Romania to becoming a professor at Brown University is an inspiration. He has been integral in every aspect of my success as a graduate student, from writing papers and delivering presentations to being supportive and understanding when I tore ligaments in my knee. Thank you, Sorin.
Over the course of many collaborations, I have had the privilege of working with superb research scientists who are experts in their respective fields. Bjarni Halldorsson, Eric Morrow, Wendy Wong, Joel Smith, Marta Gomez-Chirarri, Russell Turner, and Sarah Tulin have all been extremely helpful throughout my graduate studies. I am also grateful to Eli Upfal and Franco Preparata for helping me throughout this process and providing valuable feedback as part of my thesis committee, and also for the occasional political debates at Sorin’s dining room table.
The Istrail Lab has also provided a constant stream of talented students and post-docs who have helped me throughout the years. Ryan Tarpine, Austin Huang, and Alper Uzun are truly singular talents from whom I have learned a great deal. I am also fortunate to have worked with great undergraduates including David Moskowitz, Ning Hou, Kyle Schutter, Allan Stewart, Tim
Several people were instrumental in my professional development prior to arriving at Brown.
Thank you, Ed Davis, for giving me opportunities in high school to be creative. Edmund Lamagna taught the most difficult and, probably not coincidentally, the most interesting course at URI.
Thank you for challenging me and igniting my interest in algorithms. Rob Lawrence, Dave Stuart, and Lucia Falcon were instrumental in my professional development in industry and provided sincere encouragement to pursue my doctoral degree.