Thuy Phuong Dao

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland
June 2014

© Thuy Phuong Dao 2014
All rights reserved

Abstract

Thermodynamic and kinetic studies of linear repeat proteins have provided unique insights into cooperativity, detailed maps of local stabilities, and sequence determinants of folding pathways. Most of these studies have focused on α-helical repeat proteins. Additional work on repeat proteins that feature other types of secondary structures, such as β-strands, is essential for understanding different kinds of interactions found in much more complicated globular proteins.

Here, we investigated the folding properties of a naturally occurring β-strand-containing leucine-rich repeat (LRR) protein PP32 as well as designed consensus bacterial LRR constructs.

PP32 contains five tandem LRRs flanked by α-helical and β-strand capping motifs on the N- and C-termini, respectively. Terminal caps are often observed in LRR proteins, but not in helical repeat proteins. Without the C-cap, PP32 is unfolded. Without the N-cap, PP32 is less stable, but retains its secondary structure. However, solution studies by NMR and mutational analysis show that removing the N-cap causes the first two repeats to exist in a molten-globule-like state, where secondary structures are formed but rigid tertiary packing is disrupted. Therefore, both caps are essential for structure formation and stability of PP32, though to different extents.

Although PP32 undergoes an equilibrium two-state unfolding transition, its kinetic folding mechanism is more complicated, with the formation of an on-pathway intermediate as the rate-limiting step. Φ-value analysis reveals a highly polarized transition state involving repeat 5 and part of repeat 4. Hydrogen exchange monitored by NMR spectroscopy shows that PP32 is most stable towards the C-terminus. Therefore, the folding pathway for PP32 is dictated by local stability, as observed for α-helical repeat proteins.
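Φ-value analysis compares how much a mutation destabilizes the folding transition state with how much it destabilizes the native state. The sketch below computes this standard ratio for a single mutation, assuming folding rate constants and equilibrium stabilities measured (or extrapolated) under identical conditions, a temperature of 25 °C, and stabilities expressed as positive unfolding free energies. The function name `phi_value` and the input values are hypothetical, and this is not necessarily the exact procedure used in Chapter III.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol·K)

def phi_value(kf_wt, kf_mut, dG_wt, dG_mut, T=298.15):
    """Folding Phi-value for one mutation (hypothetical helper).

    kf_wt, kf_mut: folding rate constants (s^-1) under identical conditions.
    dG_wt, dG_mut: equilibrium unfolding free energies (kcal/mol, positive = stable).
    """
    # Destabilization of the transition state relative to the unfolded state
    ddG_ts = R * T * math.log(kf_wt / kf_mut)
    # Destabilization of the native state relative to the unfolded state
    ddG_eq = dG_wt - dG_mut
    return ddG_ts / ddG_eq

# Hypothetical mutant: folds 10-fold more slowly and is 1.5 kcal/mol less stable
print(round(phi_value(kf_wt=50.0, kf_mut=5.0, dG_wt=6.0, dG_mut=4.5), 2))  # prints 0.91
```

A Φ-value near 1 suggests that the mutated site is about as structured in the transition state as in the native state, whereas a value near 0 suggests that it is unstructured in the transition state. Because ΔΔG_eq appears in the denominator, mutations with very small equilibrium effects give unreliable Φ-values.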

Whereas studies of naturally occurring repeat proteins allow the effects of sequence variation among repeats to be investigated, studies of designed consensus repeat proteins, in which the repeats are identical (or nearly identical), allow us to resolve the intrinsic energy of individual repeats and the interfacial energy between neighboring repeats. We have been able to create consensus bacterial LRR constructs that are well behaved, stable, and unfold via cooperative transitions. By fitting a nearest-neighbor Ising model to the unfolding transitions, we have determined that folding of individual repeats is unfavorable but that interactions between adjacent repeats are highly favorable, consistent with the high cooperativity observed for LRR proteins.
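A nearest-neighbor (one-dimensional Ising) model assigns each repeat an intrinsic folding free energy ΔG_i and each pair of adjacent folded repeats an interfacial free energy ΔG_ij. The sketch below is a minimal homopolymer version, assuming identical repeats, folding free energies (negative = favorable), 25 °C, and a transfer-matrix evaluation of the partition function. The function name and parameter values are hypothetical, and the denaturant dependence and the fitting procedure itself are omitted.

```python
import math

RT = 0.5924  # kcal/mol at 25 °C (assumed temperature)

def ising_populations(n, dG_i, dG_ij):
    """Populations of an n-repeat homopolymer nearest-neighbor (Ising) model.

    dG_i:  intrinsic folding free energy of one repeat (kcal/mol)
    dG_ij: free energy of one interface between adjacent folded repeats (kcal/mol)
    Returns (f_native, f_unfolded, f_partial).
    """
    kappa = math.exp(-dG_i / RT)  # statistical weight of a folded repeat
    tau = math.exp(-dG_ij / RT)   # statistical weight of a folded-folded interface
    # Transfer matrix T[prev][curr] with states ordered (folded, unfolded)
    T = [[kappa * tau, 1.0],
         [kappa, 1.0]]
    v = [kappa, 1.0]  # the first repeat has no left-hand neighbor
    for _ in range(n - 1):
        v = [v[0] * T[0][0] + v[1] * T[1][0],
             v[0] * T[0][1] + v[1] * T[1][1]]
    Z = v[0] + v[1]  # partition function summed over all 2**n microstates
    f_native = kappa ** n * tau ** (n - 1) / Z
    f_unfolded = 1.0 / Z
    return f_native, f_unfolded, 1.0 - f_native - f_unfolded

# Hypothetical parameters: unfavorable repeats, strongly favorable interfaces
for dG_i in (0.0, 2.0, 4.0, 6.0):
    print(dG_i, ising_populations(5, dG_i, -6.0))
```

Scanning ΔG_i (for example, to mimic increasing denaturant) while holding ΔG_ij fixed shows how strongly favorable interfaces suppress partially folded states near the midpoint of the transition, which is the sense in which unfavorable repeats and favorable interfaces produce cooperative unfolding.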

I would like to thank my thesis advisor, Dr. Doug Barrick, for all of his wonderful guidance, encouragement and patience over the years. His love for science, constant excitement for data, any data, and thoughtful analysis of results inspire me to work hard to become a better scientist. Aside from repeat proteins, I will miss our morning chats about soccer, football, our tomatoes, Doug’s Japanese rice bowls, and much more!!

I would like to thank my thesis committee members, Drs. Bertrand Garcia-Moreno E., Vincent Hilser, Juliette Lecomte and Joel Tolman, for the challenging questions and the very useful comments and suggestions, especially early in my graduate school years. They helped me tremendously in thinking critically and presenting my work logically.

I would like to thank Dr. Ananya Majumdar for all of his help with NMR, from training to experimental setups, and his friendship. It is always fun to hear his many interesting life stories.

I would like to thank all of the past and present members of the Barrick lab for being a wonderful support system and for putting up with me. Christine Hatem has been my bay mate for as long as I can remember. She is very kind, understanding and generous. She listens to all of my problems and is always there for me. I really appreciate having her in my life. Dr. Ellen Kloss took me on when I was a recruit and helped me find my way around the lab, the instruments, and the proteins. I am very glad to have known her as a scientist and a friend. Dr.

from her and very much enjoy our random soccer texts. Dr. Scott Johnson is always funny in his own sarcastic way, which makes great early-morning lab conversations. I also love his long emails about the police academy and patrols.

Jacob Marold was my first rotation student and I could not have asked for a better one. He is devoted, meticulous, thoughtful and I have learned much more from him than he has from me. Jake is also extremely considerate and caring, so I am very glad he is also my friend! Even though I have known Dr. Katie Tripp for less than two years, she has been my sounding board for most of that time, making the thesis-writing process much more bearable.

I would like to thank my classmates, especially Matt Preimesberger (and his sweet girlfriend Carla), Helen Jun, Jackson Buss, Gustavo Afanador-Gonzalez, and the pseudo classmate Matt Pond (and his wonderful wife Monique). We have had many food adventures and long, fun nights that might or might not involve crazy, violent Thuy.

I would like to thank Ranice Crosby for taking care of us at all times. I would have missed so many deadlines and been very lost without her help.

I would like to thank the office staff, especially Jerry Levin, Jess (soon-to-be Appel) Bailey, Ken Rutledge and Lexie Ebert, for making sure research can be carried out smoothly and for entertaining me when I bug them. A special shout-out to Jess, who quickly became one of my dearest friends. I will always

to be celebrating Jess and her sweetheart Ben on their special day!!

I would like to thank my Whitman professors, Drs. Daniel Vernon and Doug Juers, and Nancy Forsthoefel for providing me the opportunities to discover how fun research is. Doug’s passion for teaching and biophysics inspired me to pursue the subject for my graduate work.

I would like to thank my family for believing in me and always being supportive of my decisions. I would especially like to thank my uncle Thuyết and my aunt Ngọc Lan who have been raising me as their own child. My uncle is wise, stern but fair. My aunt is gentle and kind-hearted. Together, they have taught me to be caring, considerate, hard-working and persistent. I hope to eventually make a difference in someone’s life as they have in mine.

Last but not least, I would like to thank Carlos for being there for me during the last few years. He is always patient, encouraging, and willing to help me with whatever I need, from NMR, to house work, to chauffeuring, regardless of how busy he is. I can’t wait for our many journeys ahead (including him being my boss…yikes!!)

Chapter II. Capping motifs stabilize the LRR protein PP32 and rigidify adjacent repeats (This chapter is reprinted with permission from the authors. T.P. Dao, A. Majumdar and D. Barrick (2014). Protein Science 23, 801-11.)

Chapter III. The highly polarized C-terminal transition state of the leucine-rich repeat domain of PP32 is governed by local stability

Figure 1.2 Diversity in structures of LRR proteins in different subfamilies
Figure 1.3 Hidden Markov models of representative LRR subfamilies

1.1 Repeat proteins are ideal for exploring folding pathways and cooperativity

The folding of a polypeptide into a well-defined, functional “native” state is arguably the most fundamental biological process in the cell. In most cases, the native conformation of a protein is determined solely by its amino acid sequence, as first described for ribonuclease A (Anfinsen, 1973). Folding normally involves the spontaneous organization of a chain of a hundred or more amino acids into a unique structure defined by hundreds of narrowly constrained backbone and side-chain dihedral angles. Given the vast number of conformations accessible to an unfolded polypeptide, it is remarkable that proteins fold at all, let alone so quickly: many small proteins fold on sub-millisecond timescales (Kubelka et al., 2004; Yang and Gruebele, 2003). Moreover, the folding reaction is often cooperative, populating only unfolded and folded species and not partially folded species. Although an extensive amount of work has been done, the origin of cooperativity is still not fully understood (Sosnick and Barrick, 2011).
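A rough, Levinthal-style estimate makes the size of this conformational search concrete. The sketch assumes three backbone conformations per residue and a sampling rate of 10^13 conformations per second; both numbers are illustrative assumptions rather than measured values, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 3.156e7

def random_search_time(n_residues, states_per_residue=3, sampling_rate=1e13):
    """Seconds needed to visit every conformation once by unbiased random search.

    states_per_residue and sampling_rate are illustrative assumptions.
    """
    n_conformations = states_per_residue ** n_residues  # exact integer count
    return n_conformations / sampling_rate

t = random_search_time(100)
print(f"{t:.1e} s (~{t / SECONDS_PER_YEAR:.1e} years)")
```

For a 100-residue chain this gives roughly 5×10^34 s, far longer than the age of the universe (about 4×10^17 s), so folding on millisecond or faster timescales cannot proceed by an unbiased search of all conformations.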

Cooperativity can be quantified by the distribution of stabilizing energy within a protein (Aksel and Barrick, 2009). Dissecting the contribution of a given region to the folding process typically requires studying the protein with that region removed. For globular proteins, which contain many contacts between regions that are distant in sequence, removing a region is likely to disrupt the overall fold, making any results difficult to interpret. In favorable cases, this problem can be bypassed by using hydrogen exchange (HX) methods, which measure the stability of the native protein at single-residue resolution (Hvidt and Linderstrøm-Lang, 1954; Krishna et al., 2004). For a complete and detailed stability map, HX experiments must be carried out over a range of mildly destabilizing conditions. This process is not only labor-intensive and time-consuming, but also limited to proteins of high global stability. Moreover, the irregular tertiary structures of globular proteins make it hard to compare the stabilities of different regions.
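Under EX2 conditions, the observed exchange rate of an amide proton reports on the equilibrium constant for the opening reaction that exposes it, so a residue-level stability can be computed as ΔG_HX = RT ln(k_int/k_obs). The sketch below assumes EX2 exchange, 25 °C, and intrinsic rate constants k_int already computed elsewhere; the helper names are hypothetical. Repeating the measurement at several denaturant concentrations and fitting a line gives a denaturant sensitivity (m-value) for each site, which is one reason a full stability map requires a series of conditions.

```python
import math

RT = 0.5924  # kcal/mol at 25 °C (assumed temperature)

def hx_stability(k_obs, k_int):
    """Apparent opening free energy (kcal/mol) for one amide under EX2 exchange.

    k_obs: measured exchange rate constant; k_int: intrinsic (unprotected) rate
    constant for the same amide under the same conditions (same units).
    """
    protection_factor = k_int / k_obs
    return RT * math.log(protection_factor)

def fit_denaturant_dependence(denaturant, dG_hx):
    """Least-squares line dG_hx = dG_h2o - m * [D]; returns (dG_h2o, m)."""
    n = len(denaturant)
    mean_x = sum(denaturant) / n
    mean_y = sum(dG_hx) / n
    sxx = sum((x - mean_x) ** 2 for x in denaturant)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(denaturant, dG_hx))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope
```

Sites whose m-values approach the global unfolding m-value are commonly interpreted as exchanging through global unfolding, whereas sites with small m-values are interpreted as exchanging through local fluctuations.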

In contrast to globular proteins, repeat proteins have a simpler, more regular architecture and lack sequence-distant contacts. These proteins contain repeated units of highly similar secondary structure that stack together in a linear array. The regular and modular nature of repeat proteins allows the contributions of individual repeats to be dissected and compared by adding and deleting repeats (Kloss and Barrick, 2009; Mello and Barrick, 2004; Tripp and Barrick, 2004; Tsytlonok et al., 2013a; Vieux and Barrick, 2011). Moreover, the simple, repetitive organization of secondary and tertiary structure can be extended to primary structure through consensus design, further simplifying analysis and comparison (Aksel and Barrick, 2009; Aksel et al., 2011; Main et al., 2003; Parker et al., 2014; Stumpp et al., 2003; Tripp and Barrick, 2007; Wetzel et al., 2008). Though simple in architecture, different types of repeat proteins are highly diverse in structure, with repeats containing mainly helices, only coils, or mixtures of β-strands and other secondary structure elements (Figures 1.1 and 1.2). These different types of repeats support a diversity of cellular functions, with scaffolding serving as a common role (Andrade et al., 2001). From the perspective of folding studies, this diversity in secondary structure makes it possible to study the folding of several different simple structural units, and the resulting insights can then be applied to much more complicated globular proteins.
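As a simple illustration of how adding and deleting repeats can separate intrinsic from interfacial contributions, assume a homopolymer nearest-neighbor model in which each construct unfolds in a two-state manner. The measured folding free energy of an n-repeat construct is then ΔG(n) = nΔG_i + (n - 1)ΔG_ij = n(ΔG_i + ΔG_ij) - ΔG_ij, so a linear fit of ΔG against n gives ΔG_i + ΔG_ij as the slope and -ΔG_ij as the intercept. This is a cruder approach than fitting an Ising model directly to the unfolding transitions; the function name and the noise-free input values are hypothetical, chosen only to show that the fit recovers the parameters used to generate them.

```python
def nearest_neighbor_from_series(n_repeats, dG_fold):
    """Estimate (dG_i, dG_ij) from stabilities of constructs with n repeats.

    dG_fold: folding free energies (kcal/mol, negative = stable) of each construct,
    assumed two-state so that dG(n) = n*dG_i + (n - 1)*dG_ij.
    """
    n = len(n_repeats)
    mean_x = sum(n_repeats) / n
    mean_y = sum(dG_fold) / n
    sxx = sum((x - mean_x) ** 2 for x in n_repeats)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_repeats, dG_fold))
    slope = sxy / sxx                     # equals dG_i + dG_ij
    intercept = mean_y - slope * mean_x   # equals -dG_ij
    dG_ij = -intercept
    dG_i = slope - dG_ij
    return dG_i, dG_ij

# Hypothetical, noise-free stabilities generated with dG_i = 2.0, dG_ij = -6.0 kcal/mol
repeats = [3, 4, 5, 6]
stabilities = [n * 2.0 + (n - 1) * -6.0 for n in repeats]
print(nearest_neighbor_from_series(repeats, stabilities))  # (2.0, -6.0)
```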

Folding studies of repeat proteins have focused primarily on α-helical ankyrin repeat and tetratricopeptide repeat (TPR) proteins and, to a lesser extent, on β-strand-containing leucine-rich repeat (LRR) proteins. Owing to the structural differences between these two types of repeats, the average change in surface area upon folding of an individual α-helical repeat is about twice that of a β-strand-containing repeat (Kloss et al., 2008). Although the inter-repeat interactions differ between the two types of repeat proteins (helix packing vs. β-sheet formation), the average surface area buried between adjacent repeats is similar. How do the folding processes of α-helical and β-strand-containing repeat proteins differ?

What are the consequences of these structural differences for folding? Even though much is known about the folding properties of these proteins, especially α-helical repeat proteins, more work is needed, as described below.

1.2 Most repeat proteins unfold through equilibrium two-state transitions

Because they lack contacts between regions far apart in sequence, elongated, modular repeat proteins might be expected to unfold via multiple transitions. However, most repeat proteins studied to date fold highly cooperatively, including α-helical repeat proteins (Aksel et al., 2011; Lowe and Itzhaki, 2007; Main et al., 2003; Mosavi et al., 2002; Tang et al., 1999; Zweifel and Barrick, 2001), LRR proteins (Courtemanche and Barrick, 2008a; Kelly et al., 2014; Kloss and Barrick, 2008), and the coiled pentapeptide repeat protein HetL (Dao and Barrick, unpublished data). Moreover, the limit of cooperativity appears to be higher in repeat proteins than in globular proteins, which tend to unfold via multiple transitions at longer chain lengths. For LRR proteins, the observed m-values, which reflect the size of the cooperative unit, are higher than predicted from empirical relationships derived from studies of globular proteins (Myers et al., 1995), further supporting high cooperativity in the folding of repeat proteins.
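The m-value enters through the linear extrapolation model commonly used to analyze equilibrium two-state denaturation, in which the unfolding free energy decreases linearly with denaturant concentration, ΔG([D]) = ΔG_H2O - m[D]. The sketch below, assuming 25 °C and hypothetical function names and parameter values, shows why a larger m-value (a larger cooperative unit) produces a sharper transition: the denaturant range over which the fraction unfolded rises from 0.1 to 0.9 is 2RT ln 9 / m.

```python
import math

RT = 0.5924  # kcal/mol at 25 °C (assumed temperature)

def fraction_unfolded(denaturant, dG_h2o, m):
    """Two-state fraction unfolded; dG_h2o in kcal/mol, m in kcal/(mol·M)."""
    dG_unfold = dG_h2o - m * denaturant  # linear extrapolation model
    return 1.0 / (1.0 + math.exp(dG_unfold / RT))

def transition_width(m):
    """Denaturant range (M) over which the fraction unfolded rises from 0.1 to 0.9."""
    return 2.0 * RT * math.log(9.0) / m
```

The midpoint, where half of the molecules are unfolded, falls at ΔG_H2O/m, and doubling m halves the transition width.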

However, multiple-state equilibrium unfolding has also been observed for some repeat proteins (Junker et al., 2006; Kamen et al., 2000; Tsytlonok et al., 2013b; Werbeck and Itzhaki, 2007; Zeeb et al., 2002). Some of these proteins are very large, containing up to 20 repeats and over 500 residues.

For cooperativity to be observed, thermodynamic coupling must extend from one end of the protein to the other, which may be difficult to achieve in very long modular proteins. It has also been shown that strongly skewing the stability distribution across a repeat domain without enhancing inter-repeat stability can disrupt two-state folding (Bradley and Barrick, 2002; Street et al., 2007; Tripp and Barrick, 2007).
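The effect of skewing the stability distribution can be illustrated with a heteropolymer nearest-neighbor model evaluated by enumerating all 2^n microstates, which is practical only for small n. The sketch assumes folding free energies (negative = favorable), a single interfacial energy shared by all adjacent pairs, 25 °C, and hypothetical function names and parameters. It compares a uniform set of intrinsic energies with a skewed set having the same total, so the fully folded and fully unfolded states have identical free energies in both cases.

```python
import itertools
import math

RT = 0.5924  # kcal/mol at 25 °C (assumed temperature)

def state_free_energy(state, dG_i, dG_ij):
    """Folding free energy (kcal/mol) of one microstate; 1 = folded repeat."""
    intrinsic = sum(g for s, g in zip(state, dG_i) if s)
    interfaces = sum(a * b for a, b in zip(state, state[1:]))  # folded neighbor pairs
    return intrinsic + dG_ij * interfaces

def populations(dG_i, dG_ij):
    """Return (f_native, f_unfolded, f_partial) by summing over all 2**n states."""
    n = len(dG_i)
    weights = {
        state: math.exp(-state_free_energy(state, dG_i, dG_ij) / RT)
        for state in itertools.product((0, 1), repeat=n)
    }
    Z = sum(weights.values())
    f_native = weights[(1,) * n] / Z
    f_unfolded = weights[(0,) * n] / Z
    return f_native, f_unfolded, 1.0 - f_native - f_unfolded

# Hypothetical parameters: same total intrinsic energy, same interfaces
uniform = populations([2.0, 2.0, 2.0, 2.0, 2.0], -6.0)
skewed = populations([4.0, 4.0, 1.0, 0.5, 0.5], -6.0)
print("uniform:", uniform)
print("skewed: ", skewed)
```

In the skewed case, partially folded states in which only the more stable C-terminal repeats are folded gain free energy, so the partially folded population rises even though the interfacial energy is unchanged.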

