
Flexible Turn-Taking

for Spoken Dialog Systems

Antoine Raux


December 2008

School of Computer Science

Carnegie Mellon University

Pittsburgh, PA 15213

Thesis Committee:

Maxine Eskenazi, Chair

Alan W Black

Reid Simmons

Diane J. Litman, U. of Pittsburgh

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy.

Copyright © 2008 Antoine Raux

This research was sponsored by the U.S. National Science Foundation under grant number IIS-0208835. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

Keywords: Turn-Taking, Dialog, Dialog Systems, Speech, Computers, Science

To my wife Miyako,

Abstract

Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient.

The goal of this thesis is to provide a framework and models to endow spoken dialog systems with robust and flexible turn-taking abilities. To this end, we designed a new dialog system architecture that combines a high-level Dialog Manager (DM) with a low-level Interaction Manager (IM). While the DM operates on user and system turns, the IM operates at the sub-turn level, acting as the interface between the real-time information of sensors and actuators, and the symbolic information of the DM. In addition, the IM controls reactive behavior, such as interrupting a system prompt when the user barges in. We propose two approaches to control turn-taking in the IM.

First, we designed an optimization method to dynamically set the pause duration threshold used to detect the end of user turns. Using a wide range of dialog features, this algorithm allowed us to reduce average system latency by as much as 22% over a fixed-threshold baseline, while keeping the detection error rate constant.
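The trade-off behind a dynamically set pause threshold can be illustrated with a minimal sketch. It is not the optimization algorithm of the thesis: the context labels, pause durations, and threshold values below are invented for illustration, and the cut-in rate is assumed here to be the fraction of turn-internal pauses that trigger an endpoint. The sketch only shows why context-dependent thresholds can lower average latency without raising the cut-in rate.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pause:
    context: str        # hypothetical dialog-context label
    duration_ms: float  # math.inf for a turn-final pause (the user is done)
    turn_final: bool


def evaluate(pauses: List[Pause],
             threshold_for: Callable[[str], float]) -> Tuple[float, float]:
    """Return (cut_in_rate, mean_latency_ms) for a thresholding policy.

    The system endpoints as soon as a pause lasts as long as the threshold.
    A turn-internal pause that reaches the threshold is a cut-in; at a
    turn-final pause the system waits exactly the threshold before replying.
    """
    internal = [p for p in pauses if not p.turn_final]
    final = [p for p in pauses if p.turn_final]
    cut_ins = sum(1 for p in internal
                  if p.duration_ms >= threshold_for(p.context))
    cut_in_rate = cut_ins / len(internal) if internal else 0.0
    latency = (sum(threshold_for(p.context) for p in final) / len(final)
               if final else 0.0)
    return cut_in_rate, latency


# Toy data (assumed): users pause briefly when answering yes/no questions
# and longer when answering open questions.
PAUSES = [
    Pause("yes_no_question", 150, False),
    Pause("yes_no_question", 250, False),
    Pause("yes_no_question", math.inf, True),
    Pause("yes_no_question", math.inf, True),
    Pause("open_question", 400, False),
    Pause("open_question", 650, False),
    Pause("open_question", math.inf, True),
    Pause("open_question", math.inf, True),
]

# Fixed-threshold baseline versus a context-dependent policy.
fixed = lambda context: 700.0
dynamic = lambda context: {"yes_no_question": 300.0,
                           "open_question": 700.0}[context]

fixed_result = evaluate(PAUSES, fixed)
dynamic_result = evaluate(PAUSES, dynamic)
print("fixed:", fixed_result)
print("dynamic:", dynamic_result)
```

On this toy data both policies avoid every cut-in, but the context-dependent policy responds sooner after yes/no answers, which lowers its mean latency.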

Second, we proposed a general, flexible model to control the turn-taking behavior of conversational agents. This model, the Finite-State Turn-Taking Machine (FSTTM), builds on previous work on 6-state representations of the conversational floor and extends them in two ways. First, it incorporates the notion of turn-taking action (such as grabbing or releasing the floor) and of state-dependent action cost. Second, it models the uncertainty that comes from imperfect recognition of the user’s turn-taking intentions. Experimental results show that this approach performs significantly better than the threshold optimization method for end-of-turn detection, with latencies up to 40% shorter than a fixed-threshold baseline. We also applied the FSTTM model to the problem of interruption detection, which reduced detection latency by 11% over a strong heuristic baseline.
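The combination of action costs and uncertainty about the floor state can be pictured with a small expected-cost sketch. This is an illustration under stated assumptions, not the FSTTM as defined in the thesis: the two actions, the cost functions, and the constant `cut_in_cost` are hypothetical, and the probability that the user has released the floor is taken as given rather than estimated from dialog features.

```python
from typing import Dict


def expected_costs(p_floor_free: float, pause_ms: float,
                   cut_in_cost: float = 1000.0) -> Dict[str, float]:
    """Expected cost of each action during a pause in user speech.

    Assumptions (hypothetical): waiting costs the elapsed pause time if the
    user has in fact released the floor; grabbing the floor costs a fixed
    penalty if the user still holds it.
    """
    return {
        "wait": p_floor_free * pause_ms,
        "grab": (1.0 - p_floor_free) * cut_in_cost,
    }


def choose_action(p_floor_free: float, pause_ms: float,
                  cut_in_cost: float = 1000.0) -> str:
    """Pick the action with the lowest expected cost (ties favor waiting)."""
    costs = expected_costs(p_floor_free, pause_ms, cut_in_cost)
    return min(costs, key=costs.get)


# The user has very likely finished and the pause is already long.
print(choose_action(p_floor_free=0.9, pause_ms=500))
# The user is probably still holding the floor.
print(choose_action(p_floor_free=0.2, pause_ms=300))
```

Under these assumptions, the agent takes the floor once the expected latency cost of waiting exceeds the expected cost of cutting in, so the decision depends on both the estimated probability and how long the pause has lasted.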

The architecture as well as all the models proposed in this thesis were evaluated on the CMU Let’s Go bus information system, a publicly available telephone-based dialog system that provides bus schedule information to the Pittsburgh population.

Acknowledgments

I would like to thank my advisor, Maxine Eskenazi, for giving me guidance when I needed it while leaving me the freedom to pursue my own interests during the past 6 years. In addition to mentoring my research, Maxine has always been concerned about other aspects of my life, in particular my family, and I thank her for that. Alan W Black has been a kind of informal co-advisor for me during my whole PhD, and I am deeply grateful for the numerous hours he has spent helping me shape up and strengthen my research. I would also like to thank Diane Litman and Reid Simmons for agreeing to be on my thesis committee and providing many helpful comments and suggestions.

During the course of my PhD, I made some very good friends in Pittsburgh. More often than not, they offered both academic and emotional support. In particular, I would like to thank Satanjeev “Bano” Banerjee, Dan Bohus, Thomas Harris, Brian Langner, Mihai Rotaru, and Jahanzeb Sherwani for the many discussions we had, whether it be on the pros and cons of reinforcement learning for dialog management, or on the latest Steelers game.

I would also like to thank all the members of the Sphinx Group at CMU and of the Dialogs on Dialogs student reading group, as well as the participants of the Young Researchers Roundtables on Spoken Dialog Systems, for the many discussions that helped me both define my research and broaden my horizons. Many thanks to Hua Ai, Tina Bennett, Ananlada “Moss” Chotimongkol, Heriberto Cuayahuitl, Matthias Denecke, Bob Frederking, Hartwig Holzapfel, Matthew Marge, Jack Mostow, Ravi Mosur, Verena Reiser, Alex Rudnicky, Jost Schatzmann, Rich Stern, Svetlana Stoyanchev, and Jason Williams.

I would like to thank my family: my parents, my sister Cecile and her family, my brother Xavier, and my grandmother Jeannette, for always trusting me to go my own way and supporting me, even if that meant going thousands of miles away from home, and then thousands of miles in the other direction.

Finally, my deepest gratitude goes to my children Yuma and Manon, and my wife Miyako. Thank you for bearing with me when deadlines meant more stress and less time at home. Thank you Manon for brightening my days with your smiles and laughter and singing. Thank you Yuma for reminding me what really matters, with all the wisdom of your 4 years, and for building me a “super special pen” that writes theses faster when I needed it. Last but, oh so not least, thank you Miyako for your love, patience, and strength in carrying the whole family during the past 6 years. I could not have done it without you.

List of Figures

3.1 Excerpt from a dialog with the system. (U: user turns, S: system turns)

3.2 Relationship between word error rate and task success in the Let’s Go system

3.3 Effect of language and acoustic model retraining on word error rate. As explained in section 3.3.1, two gender-specific engines were used. The dark bars represent WER obtained when selecting for each user utterance the recognizer with the highest (automatically computed) recognition score, while the light bars represent WER obtained when selecting the recognizer with the lowest WER (oracle selection). The latter is a lower bound of the performance obtainable by selecting one recognizer for each utterance. At runtime, the selection is based on Helios confidence (see text), which, in addition to recognition score, uses information from the parser and dialog state. Therefore, runtime performance typically lies between the two bounds given here.

3.4 The RavenClaw task tree for the Let’s Go spoken dialog system (March 2008). The tree is rotated 90 degrees, with the root on the left and leaves to the right. Left-to-right traversal becomes top-to-bottom in this layout.

3.5 Evolution of call volume and system performance between March 2005 and September 2008. The acoustic and language models of the speech recognizer were retrained in the summer of 2006.

3.6 Distribution of the number of user turns per dialog

3.7 Histograms of the duration of the switching pauses preceding utterances by one of the participants in the HH and HC2 corpora

–  –  –

5.1 Relationship between endpointing threshold and cut-in rate

5.2 Relationship between endpointing threshold and cut-in rate (semi-logarithmic scale)

5.3 Relationship between endpointing threshold and non-understanding rate

5.4 False Alarm / Latency Trade-off in the Winter Corpus

5.5 Illustration of the proof by contradiction of Theorem 1

5.6 Performance of the proposed approach compared to a fixed-threshold baseline, a state-specific threshold baseline and the approach of Ferrer et al. [2003]

5.7 Performance of the proposed approach using different feature sets

5.8 Example endpointing threshold decision tree learned by the proposed algorithm. Each internal node represents a test on dialog features. Cases for which the test is true follow the top branch while those for which it is not follow the bottom branch. Leaf nodes contain the thresholds obtained for a 3% overall cut-in rate.

5.9 Performance and tree size with increasing training set size for a 4% cut-in rate

5.10 Live evaluation results

6.1 Our six-state model of turn-taking, inspired by Jaffe and Feldstein [1970] and Brady [1969]

6.2 Cut-in / Latency Trade-off for Pause-based Endpointing in the FSTTM, compared with a fixed-threshold baseline, the threshold optimization approach described in Chapter 5, and the approach of Ferrer et al. [2003]

List of Tables

5.1 Performance of the cut-in labeling heuristic

5.2 Performance of the cut-in labeling heuristic on actual speech boundaries

5.3 Effect of Dialog Features on Pause Finality. * indicates that the results are not statistically significant at the 0.01 level.

5.4 Effect of Dialog Features on Turn-Internal Pause Duration. * indicates that the results are not statistically significant at the 0.01 level.

–  –  –

6.4 Performance of state-specific logistic regression for estimating P(F|O) during speech segments

6.5 Features selected by stepwise logistic regression to estimate P(F|O) during speech segments and their coefficients (all non-null coefficients are non null with p < 0.01)

6.6 Co-occurrence of Matches/Non-understandings and Manually Annotated Barge-ins/False interruptions

6.7 Barge-in keywords for the dialog act “explicit confirm”. * indicates words that signal self interruptions, all other words signal barge-ins.

6.8 Barge-in keywords for the dialog act “request next query”. * indicates words that signal self interruptions, all other words signal barge-ins.

–  –  –


1.1 Introduction

After several decades of research and development effort in the realm of practical spoken dialog systems, the technologies have matured enough to allow widespread use of such systems. Still, the approaches that have permitted the creation of working systems have left many issues unsolved, and spoken conversation with artificial agents often remains unsatisfactory. Perhaps the most prominent issue is the quality of automatic speech recognition (ASR), which often results in misunderstandings that, in turn, lead to dialog breakdowns.

To overcome these issues, system designers either constrain the interaction in some way, as is the case in system-directed dialogs [Farfán et al., 2003], or endow systems with error handling capabilities to smoothly recover from misrecognitions [Edlund et al., 2004, Bohus and Rudnicky, 2005]. These two strategies provide a way to cope with imperfect ASR, but they both come with a cost: they make dialogs longer, either by only letting the user provide small amounts of information at a time (as in strongly system-directed dialogs), or by generating confirmation prompts (as in systems with error handling strategies). This would not be an issue if, in addition to issues in spoken language understanding, current spoken dialog systems did not also have poor turn-taking capabilities. Indeed, the cost of an additional turn for artificial conversational agents, in time spent and/or disruption of the flow of the conversation, is much higher than in human-human conversation. As pointed out in recent publications [Porzel and Baudis, 2004, Ward et al., 2005], this weakness comes from the fact that low-level interaction has to a large extent been neglected by researchers in the field. Instead, they have concentrated on higher-level concerns such as natural language understanding and dialog planning.


1.2 The Conversational Floor

Before going into any more detail on turn-taking and spoken dialog systems, it is important to define the key concepts at play: conversational floor and turn-taking. Consider the following example of a dialog between a spoken dialog system (S) and a human user (U):

–  –  –
