
Enhancing Relevance Models with Adaptive Passage Retrieval

Xiaoyan Li¹ and Zhigang Zhu²

¹ Department of Computer Science, Mount Holyoke College, South Hadley, MA 01075, USA
xli@mtholyoke.edu

² Department of Computer Science, CUNY City College, New York, NY 10031, USA
zzhu@ccny.cuny.edu

Abstract. Passage retrieval and pseudo relevance feedback/query expansion have been reported as two effective means for improving document retrieval in the literature. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this paper, we study passage retrieval with relevance models in the language-modeling framework for document retrieval. An adaptive passage retrieval approach is proposed for document ranking based on the best passage of a document given a query. The proposed passage ranking method is applied to two relevance-based language models: the Lavrenko-Croft relevance model and our robust relevance model. Experiments are carried out with three query sets on three different collections from TREC. Our experimental results show that combining adaptive passage retrieval with relevance models (particularly the robust relevance model) consistently outperforms solely applying relevance models to full-length document retrieval.

Keywords: Relevance models, passage retrieval, language modeling.

1 Introduction

The language modeling approach is a successful alternative to traditional retrieval models for text retrieval. The language modeling framework was first introduced by Ponte and Croft [19], and many research activities have followed within this framework since then [1, 3, 4, 8, 10-12, 14-18, 20, 21, 23]. For example, query expansion techniques [3,11,12,17,18,21,23], pseudo-relevance feedback [4,11,12,17,18,21,23], parameter estimation methods [10], multi-word features [20], passage segmentation [16] and time constraints [14] have been proposed to improve the language modeling framework. Among them, query expansion with pseudo feedback can increase retrieval performance significantly [11,18,23]. It assumes a few top-ranked documents retrieved with the original query to be relevant and uses them to generate a richer query model.

However, two major problems remain unsolved in query expansion techniques. First, the performance of a significant number of queries decreases when query expansion techniques are applied on some collections. Second, existing query expansion techniques are very sensitive to the number of documents used for pseudo feedback. Most approaches usually achieve the best performance when about 30 documents are used for pseudo feedback; as the number of feedback documents increases beyond 30, retrieval performance drops quickly. In our recent work [15], a robust relevance model was proposed based on a study of features that affect retrieval performance. These features include key words from the original queries, the relevance ranks of documents from first-round retrieval, and common words in the background data collection. The robust relevance model seamlessly incorporates these features into the relevance-based language model in [11], further improving the performance and robustness of the model. The same three features were also used in recent work by Tao and Zhai [22] with regularized mixture models.

The robust relevance model and the regularized mixture model greatly ease the second problem, i.e., the sensitivity of retrieval performance to the number of documents used for pseudo feedback. However, the first problem is only partially solved. As we reported in [15], the robust relevance model outperformed the Lavrenko-Croft relevance model and the simple query likelihood model on four test query sets, but it underperformed the simple query likelihood model on a query set against a subset of the TREC terabyte collection.

Passage retrieval is another effective means to improve document retrieval [5,6,7,16]. In particular, in [16] it was incorporated into the language modeling framework via various approaches. However, a major concern with passage retrieval in the language modeling framework is that it hurts retrieval performance on some collections, although it can provide comparable results, and sometimes significant improvements over full-length document retrieval, on collections with long and multi-topic documents. Therefore, one important research issue for both relevance models and passage retrieval is when and how to apply them for better retrieval performance.

In this paper, an adaptive passage retrieval approach is proposed for document ranking based on the best passage of a document given a query. The best passage of a document is the passage with the highest relevance score with respect to the query. The size of the best passage varies from document to document and from query to query.

The best passage of a document can be a passage of the smallest window size considered or the document itself, depending on which has the highest relevance score among all available passages. This adaptive passage selection is applied to two relevance-based language models: the Lavrenko-Croft relevance model [11] and our robust relevance model [15]. Experiments are carried out with three query sets on three different collections from TREC, including the ones that caused under-performance for the robust relevance model [15] and the fixed-size passage retrieval approach [16]. Our experimental results show that combining adaptive passage retrieval with relevance models consistently outperforms solely applying relevance models to full-length document retrieval. This indicates that passage-level evidence, if used appropriately, can be incorporated into relevance models to achieve better performance in terms of mean average precision, especially in the case of the robust relevance model.





The rest of the paper is structured as follows. In Section 2, we give a brief overview of the two relevance-based language models used in this paper. Section 3 describes our approach to combining the adaptive passage retrieval with the relevance models. Section 4 provides experimental results, compared to baseline results of full-length document retrieval. Section 5 summarizes the paper with conclusions and some future work.

2 Relevance Models

2.1 The Lavrenko-Croft Relevance Model

$P_o(w \mid R) \approx \dfrac{P(w, q_1 \ldots q_k)}{P(q_1 \ldots q_k)}$   (1)

$P(w, q_1 \ldots q_k) = \sum_{D \in M} P(D)\, P(w \mid D) \prod_{i=1}^{k} P(q_i \mid D)$   (2)

where $P_o(w \mid R)$ stands for the relevance model of the query and its relevant documents, and $P(w, q_1 \ldots q_k)$ stands for the total probability of observing the word w together with the query words q1…qk. A number of top-ranked documents (say N) returned by a query likelihood language model are used to estimate the relevance model. In Equation (2), M is the set of the N top-ranked documents used for estimating the relevance model for a query (together with its relevant documents).

P(D) is the prior probability of selecting the corresponding document language model D when computing the total probability in Equation (2). In the original relevance model approach, a uniform distribution was used for this prior.
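As a rough illustration of how Equations (1) and (2) can be computed, the sketch below estimates a relevance model from the top-ranked documents. The document objects (with `id`, `vocabulary`) and the smoothed unigram function `smooth(w, doc)` returning P(w|D) are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

def estimate_relevance_model(query_terms, top_docs, doc_priors, smooth):
    """Estimate the relevance model of Equations (1)-(2): P(w, q_1..q_k) is a
    prior-weighted sum over the top-ranked documents of P(w|D) times the query
    likelihood prod_i P(q_i|D); normalizing over the vocabulary gives P_o(w|R)."""
    joint = defaultdict(float)                  # unnormalized P(w, q_1..q_k)
    for doc in top_docs:
        prior = doc_priors[doc.id]              # P(D); uniform in [11]
        query_likelihood = 1.0
        for q in query_terms:
            query_likelihood *= smooth(q, doc)  # prod_i P(q_i|D)
        for w in doc.vocabulary:
            joint[w] += prior * smooth(w, doc) * query_likelihood
    normalizer = sum(joint.values())            # P(q_1..q_k) = sum_w P(w, q_1..q_k)
    return {w: p / normalizer for w, p in joint.items()}
```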


2.2 Our Robust Relevance Model

Based on the Lavrenko-Croft relevance model approach, we have proposed a robust relevance model to further improve retrieval performance and robustness [15].

$P(w, q_1 \ldots q_k) = \sum_{D \in S} P(D)\, P(w \mid D) \prod_{i=1}^{k} P(q_i \mid D)$   (3)

Unlike the set M in Equation (2), which includes only the top N documents' models for the Lavrenko-Croft relevance model, the robust relevance model treats the original query as a special document: the set S in Equation (3) includes both the query model and the document models of the top N documents.

The robust relevance model also introduces a rank-related prior. In Equation (4), |D| denotes the length of document D, or the length of the query in the case of the special document. Rank(D) denotes the rank of document D in the ranked list of documents returned by the basic query likelihood language model. The rank of the query is set to 0 so that it has the highest rank among all the documents used for relevance model approximation. Z1 is the normalization factor that makes the priors sum to 1. Parameters α and β control how much a document's length and its rank affect the prior of the document, respectively.

Finally, the robust relevance model discounts common words in the background data collection. In Equation (5), P_new(w | R) denotes the probability of word w in the new relevance model, and P(w | B) denotes the probability of word w in the background language model B. γ is the parameter for discounting the probability of a word in the new relevance model by its probability in the background language model, and Z2 is the normalization factor that makes the probabilities of words in the new relevance model sum to 1. The best values of parameters α, β and γ reported in [15] are used in our experiments in this paper.
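The exact functional forms of Equations (4) and (5) are not reproduced in this excerpt, so the sketch below only mirrors the verbal description above with assumed forms: a prior that grows with document length and decays with rank (normalized by Z1), and a linear background discount controlled by γ (normalized by Z2). The forms actually used in [15] may differ.

```python
def rank_related_prior(docs, alpha, beta):
    """Assumed prior in the spirit of Equation (4): larger for longer documents,
    smaller for lower-ranked ones; the query is a special document with rank 0.
    Z1 normalizes the priors so they sum to 1."""
    raw = {d.id: (len(d.tokens) ** alpha) / ((d.rank + 1) ** beta) for d in docs}
    z1 = sum(raw.values())
    return {doc_id: value / z1 for doc_id, value in raw.items()}

def discount_background(p_relevance, p_background, gamma):
    """Assumed discounting in the spirit of Equation (5): subtract gamma times
    the background probability P(w|B) from each word's probability, clip at
    zero, and renormalize with Z2 so the new model sums to 1."""
    discounted = {w: max(p - gamma * p_background.get(w, 0.0), 0.0)
                  for w, p in p_relevance.items()}
    z2 = sum(discounted.values())
    return {w: p / z2 for w, p in discounted.items()}
```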

3 Combining Passage Retrieval with Relevance Models

Passage retrieval can be applied in the language modeling framework. Various approaches were proposed in [16] to implement passage retrieval in the language modeling environment. In the context of relevance models, three different methods R1, R2 and R3 were developed in [16], and method R1 was reported as the best candidate. Different types of passages, including half-overlapped windows and arbitrary passages with fixed or variable lengths, were also tried for passage retrieval.

In this paper, as baselines, we use method R1 with fixed-size half-overlapped windows to retrieve relevant documents. Half-overlapped windows of 150, 350 and 500 words are considered in our experiments.

Given a window size, documents are first broken into half-overlapped passages. A language model is then built for each passage. At query time, a simple query likelihood language model implemented in LEMUR [13] is used to retrieve the top passages. In the case of the Lavrenko-Croft relevance model, the top retrieved passages are assumed relevant and used to build a relevance model for the query. In the case of our robust relevance model, both the top passages and the query itself are used to build a relevance model. Once the relevance model is built, a KL divergence score is computed between each passage model and the relevance model. The KL divergence score is then used for ranking passages. Documents are finally ranked based on the score of their best passage.
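A minimal sketch of this passage-based ranking step, assuming the relevance model is a dict of word probabilities and `build_passage_model` returns a smoothed (non-zero) function P(w|passage); the rank-equivalent cross-entropy form of the KL score is a common simplification, not necessarily the exact scoring implemented in LEMUR.

```python
import math

def half_overlapped_passages(tokens, window):
    """Split a document's token list into half-overlapped windows."""
    step = max(window // 2, 1)
    return [tokens[i:i + window]
            for i in range(0, max(len(tokens) - step, 1), step)]

def kl_score(relevance_model, passage_model):
    """Rank-equivalent (negative cross-entropy) form of the KL divergence score:
    sum_w P(w|R) * log P(w|passage).  Higher is better."""
    return sum(p_r * math.log(passage_model(w))
               for w, p_r in relevance_model.items())

def rank_by_best_passage(docs, relevance_model, window, build_passage_model):
    """Score every half-overlapped passage of every document against the
    relevance model and rank documents by their best passage score."""
    best = {}
    for doc in docs:
        passages = half_overlapped_passages(doc.tokens, window)
        best[doc.id] = max(kl_score(relevance_model, build_passage_model(p))
                           for p in passages)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```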

However, the problem with a fixed-size window approach is that the best performance is achieved with different window sizes on different collections. Therefore, it is not clear how to preset a window size at query time on a previously unseen collection. To solve this problem, we propose an adaptive passage retrieval approach in this paper. Documents are ranked based on their best passage. The best passage in this context is the passage that represents a document better than any other passage with respect to a query. We observe that the size of the best passage can vary from document to document and from query to query, because documents may discuss multiple topics, have different foci, and be written by authors with different writing styles. There are various approaches that could be used to locate the best passage of a document. In this paper, we choose a very simple but efficient way to find the best passage of a document to improve retrieval performance for a given query.

We simply take the retrieval results of relevance models on full-length document retrieval, together with the results of fixed-size passage retrieval with relevance models at several preset window sizes (150, 350 and 500 words in this paper). With these four result files, each document has four scores: the full-length document score, the highest score among all 150-word passages, the highest score among all 350-word passages, and the highest score among all 500-word passages. We take the highest of the four scores as the score of the best passage of the document. Documents are then ranked according to the scores of their best passages. Note that the size of the best passage varies from document to document. The best passage of a document can be a passage of the smallest window size considered (150 words in this paper) or the full-length document itself, depending on which has the highest relevance score among all available passages of variable size. Therefore, the adaptive passage retrieval approach combines the best of the full-document and fixed-size passage retrieval results.
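A minimal sketch of this adaptive combination step, assuming each retrieval run is available as a dict mapping document ids to that run's best score for the document:

```python
def adaptive_best_passage_scores(score_runs):
    """Fuse several runs (full-document plus 150/350/500-word passage runs),
    each a dict {doc_id: score}.  A document's final score is its maximum
    score across the runs, i.e. the score of its best passage of any size."""
    fused = {}
    for run in score_runs:
        for doc_id, score in run.items():
            fused[doc_id] = max(score, fused.get(doc_id, float("-inf")))
    # Rank documents by the score of their best passage (highest first).
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Example usage with hypothetical run dictionaries:
# ranking = adaptive_best_passage_scores([full_doc_run, run_150, run_350, run_500])
```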

4 Experiments and Results

We have carried out experiments with three TREC query sets on three data collections. All experiments were performed with the Lemur toolkit [13]. The Krovetz stemmer [9] was used for stemming, and the standard stopword list of LEMUR was used to remove about 420 common terms. The top 30 documents or passages are used to estimate a relevance model for a query when relevance model approaches are applied.

Parameters α, β and γ in Equations (3)-(5) are set to the same values as used in [15].

4.1 Data

We used three query sets on three document collections in our experiments. (1).


