Enhancing Relevance Models with Adaptive Passage Retrieval
Xiaoyan Li1 and Zhigang Zhu2
1 Department of Computer Science, Mount Holyoke College, South Hadley, MA 01075, USA
2 Department of Computer Science, CUNY City College, New York, NY 10031, USA
Abstract. Passage retrieval and pseudo relevance feedback/query expansion have been reported in the literature as two effective means for improving document retrieval. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this paper, we study passage retrieval with relevance models in the language-modeling framework for document retrieval. An adaptive passage retrieval approach to document ranking is proposed, based on the best passage of a document given a query. The proposed passage ranking method is applied to two relevance-based language models: the Lavrenko-Croft relevance model and our robust relevance model. Experiments are carried out with three query sets on three different collections from TREC. Our experimental results show that combining adaptive passage retrieval with relevance models (particularly the robust relevance model) consistently outperforms solely applying relevance models on full-length document retrieval.
Keywords: Relevance models, passage retrieval, language modeling.
1 Introduction
The language modeling approach is a successful alternative to traditional retrieval models for text retrieval. The language modeling framework was first introduced by Ponte and Croft, and has been followed by many research activities since then [1, 3, 4, 8, 10-12, 14-18, 20, 21, 23]. For example, query expansion techniques [3,11,12,17,18,21,23], pseudo-relevance feedback [4,11,12,17,18,21,23], parameter estimation methods, multi-word features, passage segmentation and time constraints have been proposed to improve the language modeling framework. Among them, query expansion with pseudo feedback can increase retrieval performance significantly [11,18,23]. It assumes that a few top-ranked documents retrieved with the original query are relevant and uses them to generate a richer query model.
However, two major problems remain unsolved in the query expansion techniques.
First, the performance of a significant number of queries decreases when query expansion techniques are applied on some collections. Second, existing query expansion techniques are very sensitive to the number of documents used for pseudo feedback. Most approaches usually achieve the best performance when about 30 documents are used for pseudo feedback. As the number of feedback documents increases beyond 30, retrieval performance drops quickly. In our recent work, a robust relevance model was proposed based on a study of features that affect retrieval performance. These features include key words from the original queries, relevance ranks of documents from the first-round retrieval, and common words in the background data collection. The robust relevance model seamlessly incorporates these features into the relevance-based language model and further improves the performance and robustness of the model. The three features were also used in recent work by Tao and Zhai with regularized mixture models.
The robust relevance model and the regularized mixture model greatly ease the second problem, i.e., the sensitivity of retrieval performance to the number of documents used for pseudo feedback. However, the first problem is only partially solved. As we have reported, the robust relevance model outperformed the Lavrenko-Croft relevance model and the simple query likelihood model on four test query sets, but it underperformed the simple query likelihood model on a query set against a subset of the TREC terabyte collection.
Passage retrieval is another effective means to improve document retrieval [5,6,7,16]. In particular, it has been incorporated into the language modeling framework via various approaches. However, a major concern with passage retrieval in the language modeling framework is that it hurts retrieval performance on some collections, although it can provide comparable results, and sometimes significant improvements over full-length document retrieval, on collections with long and multi-topic documents. Therefore, one important research issue for both relevance models and passage retrieval is when and how to apply them for better retrieval performance.
In this paper, an adaptive passage retrieval approach to document ranking is proposed, based on the best passage of a document given a query. The best passage of a document is the passage with the highest relevance score with respect to the query. The size of the best passage varies from document to document and from query to query. The best passage of a document can be a passage of the smallest window size considered, or the document itself, depending on whether it has the highest relevance score among all available passages. This adaptive passage selection is applied to two relevance-based language models: the Lavrenko-Croft relevance model and our robust relevance model. Experiments are carried out with three query sets on three different collections from TREC, including the ones that caused under-performance in the robust relevance model and in the fixed-size passage retrieval approach. Our experimental results show that combining adaptive passage retrieval with relevance models consistently outperforms solely applying relevance models on full-length document retrieval. This indicates that passage-level evidence, if used appropriately, can be incorporated into relevance models to achieve better performance in terms of mean average precision, especially in the case of the robust relevance model.
The rest of the paper is structured as follows. In Section 2, we give a brief overview of the two relevance-based language models used in this paper. Section 3 describes our approach to combining adaptive passage retrieval with the relevance models. Section 4 provides experimental results, compared to baseline results of full-length document retrieval. Section 5 summarizes the paper with conclusions and some future work.
2 Relevance Models
2.1 The Lavrenko-Croft Relevance Model
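In the Lavrenko-Croft approach, the relevance model is estimated from the query words and a set of top-ranked documents. A sketch of the two estimation equations, written here in the standard Lavrenko-Croft form (the notation follows the discussion below; the exact typesetting in the original may differ slightly), is:

$$ P_o(w \mid R) \;\approx\; P(w \mid q_1 \ldots q_k) \;=\; \frac{P(w, q_1 \ldots q_k)}{P(q_1 \ldots q_k)} \quad (1) $$

$$ P(w, q_1 \ldots q_k) \;=\; \sum_{D \in M} P(D)\, P(w \mid D) \prod_{i=1}^{k} P(q_i \mid D) \quad (2) $$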
where Po(w | R) stands for the relevance model of the query and its relevant documents, and P(w, q1…qk) stands for the total probability of observing the word w together with the query words q1…qk. A number of top-ranked documents (say N) returned by a query likelihood language model are used to estimate the relevance model. In Equation (2), M is the set of the N top-ranked documents used for estimating the relevance model for a query (together with its relevant documents). P(D) is the prior probability of selecting the corresponding document language model D for generating the total probability in Equation (2). In the original relevance model approach, a uniform distribution was used for the prior.
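As an illustration, the following sketch estimates the relevance model of Equation (2) with a uniform document prior. The function and variable names are hypothetical, and the smoothed unigram document models are assumed to be given as dictionaries mapping words to probabilities.

```python
from collections import defaultdict

def estimate_relevance_model(query_terms, doc_models, vocabulary):
    """Sketch of Equation (2) with a uniform prior P(D).

    doc_models: list of dicts mapping word -> smoothed P(w | D), one per
    top-ranked document; vocabulary: iterable of candidate expansion words.
    """
    prior = 1.0 / len(doc_models)          # uniform P(D)
    joint = defaultdict(float)             # P(w, q1..qk)
    for p_w_given_d in doc_models:
        # likelihood of observing all query words under this document model
        q_likelihood = 1.0
        for q in query_terms:
            q_likelihood *= p_w_given_d.get(q, 1e-10)
        for w in vocabulary:
            joint[w] += prior * p_w_given_d.get(w, 1e-10) * q_likelihood
    # normalizing over the vocabulary yields P(w | R), as in Equation (1)
    z = sum(joint.values())
    return {w: p / z for w, p in joint.items()}
```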
2.2 Our Robust Relevance Model
Based on the Lavrenko-Croft relevance model approach, we have proposed a robust relevance model to further improve retrieval performance and robustness. In the robust relevance model, the original query, the ranks of the feedback documents, and the background collection statistics are incorporated into the estimation, as described below.
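In outline, the three estimation steps can be written as follows. This is a sketch of one plausible instantiation of Equations (3)-(5) that is consistent with the descriptions below; the exact functional forms, in particular the way α and β combine length and rank in the prior, are those of the original robust relevance model paper.

$$ P_o(w, q_1 \ldots q_k) \;=\; \sum_{D \in S} P(D)\, P(w \mid D) \prod_{i=1}^{k} P(q_i \mid D) \quad (3) $$

$$ P(D) \;=\; \frac{1}{Z_1} \cdot \frac{|D|^{\alpha}}{\big(\mathrm{Rank}(D)+1\big)^{\beta}} \quad (4) $$

$$ P_{new}(w \mid R) \;=\; \frac{1}{Z_2}\,\max\!\big(P_o(w \mid R) - \gamma\, P(w \mid B),\; 0\big) \quad (5) $$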
Unlike the set M in Equation (2) for the Lavrenko-Croft relevance model, which includes only the models of the top N documents, the robust relevance model treats the original query as a special document: the set S in Equation (3) includes both the query model and the document models of the top N documents.
The robust relevance model also introduces a rank-related prior. In Equation (4), |D| denotes the length of document D, or the length of the query in the case of the special document. Rank(D) denotes the rank of document D in the ranked list of documents returned by the basic query likelihood language model. The rank of the query is set to 0 so that it has the highest rank among all the documents used for relevance model approximation. Z1 is the normalization factor that makes the priors sum to 1.
Parameters α and β are used to control how much a document’s length and its rank affect the prior of the document, respectively.
Finally, the robust relevance model discounts common words in the background data collection. In Equation (5), Pnew(w | R) denotes the probability of word w in the new relevance model, and P(w | B) denotes the probability of word w in the background language model B. γ is the parameter for discounting the probability of a word in the new relevance model by its probability in the background language model. Z2 is the normalization factor that makes the probabilities of words in the new relevance model sum to 1. The best values for parameters α, β and γ reported previously are used in our experiments in this paper.
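A compact sketch of the prior and the discounting step is shown below. The prior uses the same plausible form sketched above for Equation (4), the discount is assumed to be subtractive, and all names are hypothetical.

```python
def rank_length_prior(doc_length, rank, alpha, beta):
    """Unnormalized rank- and length-sensitive prior (plausible form of
    Equation (4)); divide by Z1, the sum over all D in S, to normalize.
    The query is treated as a special document with rank 0."""
    return (doc_length ** alpha) / ((rank + 1) ** beta)

def discount_background(p_rel, p_bg, gamma):
    """Equation (5), assuming a subtractive discount: lower each word's
    relevance-model probability by gamma times its background probability,
    clip at zero, and renormalize (Z2)."""
    discounted = {w: max(p - gamma * p_bg.get(w, 0.0), 0.0) for w, p in p_rel.items()}
    z2 = sum(discounted.values())
    return {w: p / z2 for w, p in discounted.items()}
```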
3 Combining Passage Retrieval with Relevance Models
Passage retrieval can be applied in the language modeling framework. Various approaches have been proposed to implement passage retrieval in the language modeling environment. In the context of relevance models, three different methods R1, R2 and R3 were developed, and the method R1 was reported as the best candidate. Different types of passages, including half-overlapped windows and arbitrary passages with fixed or variable lengths, have also been tried for passage retrieval.
In our paper, as baselines, we use the method R1 with fixed-size half-overlapped windows to retrieve relevant documents. Half-overlapped windows of 150, 350 and 500 words are considered in our experiments.
Given a window size, documents are first broken into half-overlapped passages. A language model is then built for each passage. At query time, a simple query likelihood language model implemented in LEMUR is used to retrieve the top passages. In the case of the Lavrenko-Croft relevance model, the top retrieved passages are assumed relevant and used to build a relevance model for the query. In the case of our robust relevance model, both the top passages and the query itself are used to build a relevance model. Once the relevance model is built, a KL divergence score is computed between each passage model and the relevance model. The KL divergence score is then used for ranking passages. Documents are finally ranked based on the score of their best passage.
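The sketch below illustrates the two document-side steps of this pipeline: splitting a document into half-overlapped windows and ranking documents by their best passage. The function names and the 350-word default window are hypothetical, and score_passage stands for any passage scoring function, e.g., the negative KL divergence between the passage model and the relevance model.

```python
def half_overlapped_passages(tokens, window):
    """Split a tokenized document into half-overlapped windows of `window` words."""
    step = max(window // 2, 1)
    passages = [tokens[i:i + window] for i in range(0, max(len(tokens) - step, 1), step)]
    return passages or [tokens]

def rank_documents_by_best_passage(docs, score_passage, window=350):
    """docs: dict mapping doc_id -> token list.
    Each document receives the score of its highest-scoring passage,
    and documents are ranked by that best-passage score."""
    doc_scores = {
        doc_id: max(score_passage(p) for p in half_overlapped_passages(tokens, window))
        for doc_id, tokens in docs.items()
    }
    return sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
```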
However, the problem with a fixed-size window approach is that the best performance is achieved with different window sizes on different collections. Therefore, it is not clear how to preset a window size at query time on a new, previously unseen collection. To solve this problem, we propose an adaptive passage retrieval approach in this paper. Documents are ranked based on their best passage. The best passage in this context is the passage that represents a document better than any other passage with respect to a query. We observe that the size of a best passage can vary from document to document and from query to query, because documents may discuss multiple topics, have different foci, and be written by authors with different writing styles. There are various approaches that can be used to locate the best passage of a document. In this paper, we choose a very simple but efficient way to find the best passage of a document to improve retrieval performance for a given query.
We simply take the retrieval results from relevance models on full-length document retrieval and from relevance models on fixed-size passages with several preset window sizes, for example 150, 350 and 500 words. With the four result files, each document has four scores: the full-length document score, the highest score among all 150-word passages, the highest score among all 350-word passages, and the highest score among all 500-word passages. We take the highest of the four scores as the score of the best passage of the document. Documents are then ranked according to the score of their best passages. Note that the size of the best passage varies from document to document. The best passage of a document can be a passage of the smallest window size considered (150 words in this paper) or the full-length document itself, depending on whether it has the highest relevance score among all available passages of variable size. Therefore, the adaptive passage retrieval approach combines the best of the full-document and fixed-size passage retrieval results.
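A minimal sketch of this max-score fusion is shown below, assuming the four runs assign scores on a comparable scale (they are all KL divergence-based scores against the same relevance model); the names are hypothetical.

```python
def fuse_best_passage_runs(runs):
    """runs: list of dicts mapping doc_id -> score, one per run
    (full-length documents and 150-, 350- and 500-word passage runs).
    A document's adaptive best-passage score is its maximum over all runs."""
    fused = {}
    for run in runs:
        for doc_id, score in run.items():
            fused[doc_id] = max(fused.get(doc_id, float("-inf")), score)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```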
4 Experiments and Results
We have carried out experiments with three TREC query sets on three data collections. All experiments were performed with the Lemur toolkit. The Krovetz stemmer was used for stemming, and the standard stopword list of LEMUR was used to remove about 420 common terms. The top 30 documents or passages are used to estimate a relevance model for a query when using relevance model approaches.
Parameters α, β and γ in Equations (3)-(5) are set to the same values as used in .
We used three query sets on three document collections in our experiments. (1).