### I. Introduction

### II. Related Work

### III. Methods

### 1. OHSUMED Test Collection

### 2. Test-Bed Information Retrieval System

$q$ and a document $d$, as follows:

Here, $t$ is a term of query $q$, $n$ is the number of documents containing the term $t$ across a document collection that contains $N$ documents, and $f_{d,t}$ is the frequency of the term $t$ in document $d$. $K$ is $k_1((1 - b) + b \times dl / avdl)$. The parameters $k_1$, $b$, and $k_3$ are set by default to 1.2, 0.75, and 1,000, respectively. The parameters $dl$ and $avdl$ are the document length and the average document length, respectively, measured in some suitable unit (in this study, we used the byte length).
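
The definitions above can be illustrated with a short sketch. It assumes the widely used Robertson form of Okapi BM25 (a log collection weight multiplied by saturating document and query term-frequency components). The function name `bm25_score`, its dictionary arguments, and the query term frequency `f_qt` are hypothetical and are not taken from the test-bed system.

```python
import math


def bm25_score(query_tf, doc_tf, doc_freq, n_docs, dl, avdl,
               k1=1.2, b=0.75, k3=1000.0):
    """Score one document against one query (assumed standard BM25 form).

    query_tf: term -> frequency of the term in the query (assumed input)
    doc_tf:   term -> f_{d,t}, frequency of the term in the document
    doc_freq: term -> n, number of documents containing the term
    n_docs:   N, number of documents in the collection
    dl, avdl: document length and average document length (same unit)
    """
    # Length normalization factor K = k1 * ((1 - b) + b * dl / avdl).
    K = k1 * ((1 - b) + b * dl / avdl)
    score = 0.0
    for t, f_qt in query_tf.items():
        f_dt = doc_tf.get(t, 0)
        n = doc_freq.get(t, 0)
        if f_dt == 0 or n == 0:
            continue  # a term absent from the document contributes nothing
        collection_weight = math.log((n_docs - n + 0.5) / (n + 0.5))
        doc_part = (k1 + 1) * f_dt / (K + f_dt)
        query_part = (k3 + 1) * f_qt / (k3 + f_qt)
        score += collection_weight * doc_part * query_part
    return score
```

With $k_3$ = 1,000 the query component stays close to the raw query term frequency, so for short queries the score is driven almost entirely by the collection weight and the document component.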

### 3. Term Ranking Algorithms

$t$ in which algorithms are grouped according to the categories of their underlying common features.

### 4. Experimental Design

$w'_{q,t}$ of a query term $t$ are described, in which $w_{q,t}$ is the weight of the term $t$ in the unexpanded query, and the tuning parameters are set to a default value (i.e., α = β = 1).

#### 1) Standard Rocchio formula
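
As a rough illustration of Rocchio-style reweighting under pseudo-relevance feedback, the sketch below assumes the common positive-feedback form in which the new weight is α times the original query weight plus β times the average weight of the term over the pseudo-relevant documents. The averaging over the feedback set, the function name `rocchio_reweight`, and the dictionary inputs are assumptions made for this example.

```python
def rocchio_reweight(query_weights, feedback_doc_weights, alpha=1.0, beta=1.0):
    """Return reweighted query term weights (assumed Rocchio form).

    query_weights:        term -> weight in the unexpanded query
    feedback_doc_weights: one dict per pseudo-relevant document,
                          mapping term -> weight of the term in that document
    """
    # Start from the original query weights scaled by alpha.
    new_weights = {t: alpha * w for t, w in query_weights.items()}
    n_feedback = len(feedback_doc_weights)
    if n_feedback == 0:
        return new_weights  # no feedback documents: only alpha scaling applies
    # Add beta times the mean document weight of each term.
    for doc in feedback_doc_weights:
        for t, w_kt in doc.items():
            new_weights[t] = new_weights.get(t, 0.0) + beta * w_kt / n_feedback
    return new_weights
```

Terms that occur only in the feedback documents enter the query with the feedback component alone, which is how expansion terms receive a weight in this sketch.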

$w_{k,t}$ is the weight of the term $t$ in a pseudo-relevant document $k$ (which equals the $w_{d,t}$ component of the Okapi BM25 formula).

#### 3) Variant 1

$\mathit{max\_norm\_score}_t$ is the normalized value assigned to term $t$, obtained by dividing the score of term $t$ by the maximum score produced by the term ranking algorithm used. We call this *max_norm* term reweighting.

#### 4) Variant 2

*max_norm* term reweighting does not accurately reflect the importance of the relative ranking order of the terms sorted by the term ranking algorithm. To capture the importance of this order and to provide a comparison with *max_norm* term reweighting, we devised a formula that emphasizes how well terms were ordered by the term ranking method used, as follows:

$\mathit{rank\_norm\_score}_t$ is the evenly decreasing, normalized value assigned to term $t$ according to its rank position in the sorted term list, calculated as $1 - (\mathit{rank}_t - 1) / |\mathit{term\_list}|$, where $\mathit{rank}_t$ is the rank position of term $t$ and $|\mathit{term\_list}|$ is the number of terms in the expanded query. We call this *rank_norm* term reweighting.
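
The two normalized scores defined above can be compared with a minimal sketch. It assumes that every candidate term already has a positive score from some term ranking algorithm and that |term_list| equals the number of ranked terms passed in. The function names and the example scores are hypothetical.

```python
def max_norm_scores(term_scores):
    """Divide each term score by the maximum score (max_norm)."""
    if not term_scores:
        return {}
    max_score = max(term_scores.values())
    if max_score <= 0:
        raise ValueError("max_norm assumes a positive maximum score")
    return {t: s / max_score for t, s in term_scores.items()}


def rank_norm_scores(term_scores):
    """Assign 1 - (rank - 1) / |term_list| by rank position (rank_norm)."""
    # Sort terms from highest to lowest score; rank positions start at 1.
    ranked = sorted(term_scores, key=term_scores.get, reverse=True)
    size = len(ranked)
    return {t: 1 - (rank - 1) / size for rank, t in enumerate(ranked, start=1)}


# Hypothetical scores with one dominant term.
scores = {"a": 8.0, "b": 4.0, "c": 1.0, "d": 0.5}
print(max_norm_scores(scores))
print(rank_norm_scores(scores))
```

In this example the *max_norm* values fall off as sharply as the raw scores do, whereas the *rank_norm* values decrease in equal steps of 1/|term_list|, so only the order of the terms matters.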

#### 5) Probabilistic term reweighting

$\log((N - n + 0.5)/(n + 0.5))$ of the Okapi BM25 formula is replaced with the new weight.

### 5. Evaluation Measurements

*t*-test, which is one of the recommended methods for evaluating retrieval experiments [33].
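
For illustration, the statistic behind such a comparison can be sketched as follows, assuming the test is paired by query over per-query average precision values, which is a common setup in retrieval experiments. The function name and inputs are hypothetical, and turning the statistic into a *p*-value additionally requires the Student *t* distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(baseline_ap, system_ap):
    """Return (t, degrees_of_freedom) for per-query paired differences."""
    if len(baseline_ap) != len(system_ap):
        raise ValueError("both runs must cover the same queries")
    diffs = [s - b for b, s in zip(baseline_ap, system_ap)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two queries are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

A positive statistic means the second run has the higher mean average precision over the shared query set.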

### IV. Results

### 1. Performance at the Default Parameters

*t*-test are indicated by *p* < 0.01 and *p* < 0.05. As shown, with the default parameter settings, none of the term ranking methods improved noticeably on the OHSUMED test collection. Interestingly, however, only the CHI1, CHI2, and F4MODIFIED term ranking algorithms, which favor infrequent terms, showed a statistically significant improvement when used with Rocchio term reweighting. We analyzed the overlapping ratio of expansion terms between pairs of term ranking algorithms. Figure 1 shows the top 15 overlapping ratios, with term ranking algorithms linked according to their ratio of overlapping terms. As the figure shows, although CHI1, CHI2, F4MODIFIED, and IDF selected similar terms, IDF did not show a significant improvement. It appears that a few unique terms, selected by a specific term ranking algorithm, can have a significant effect on performance. In addition, because the performance of the term ranking algorithms varied with the term reweighting algorithm applied, both term ranking and term reweighting methods should be taken into account when evaluating the performance of PRF algorithms. With the default parameter settings, it is likely that the term ranking algorithms that favor infrequent terms perform better than the other term ranking algorithms when combined with Rocchio term reweighting.
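
A pairwise overlap analysis of this kind can be sketched as below. The exact definition of the overlapping ratio is not given in this section, so the sketch assumes a Jaccard-style ratio (shared terms divided by the union of both term sets). The function names and the shape of the input are hypothetical.

```python
from itertools import combinations


def overlap_ratio(terms_a, terms_b):
    """Jaccard-style overlap between two expansion term lists (assumed)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_overlaps(expansions, top_k=15):
    """Rank all algorithm pairs by the overlap of their expansion terms.

    expansions: algorithm name -> list of expansion terms it selected
    """
    pairs = [
        ((x, y), overlap_ratio(expansions[x], expansions[y]))
        for x, y in combinations(sorted(expansions), 2)
    ]
    # Highest overlap first; keep only the top_k pairs.
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs[:top_k]
```

Calling `top_overlaps` with the expansion terms of each term ranking algorithm would give a ranked list of pairs comparable in spirit to the links drawn in Figure 1, under the stated assumption about the ratio.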

*max_norm* and *rank_norm* term reweighting can explain the MAP differences in both the scores and the ranks of terms produced by different term ranking algorithms, as well as in the distinct terms selected by these algorithms. Comparing the *max_norm* and *rank_norm* term reweighting methods in Table 2, we can see that rank-based normalization is generally better than score-based normalization. It appears that the ranked order of terms is more important than their actual scores. Therefore, we performed a *t*-test comparison between pairs of term ranking algorithms using *rank_norm* term reweighting. The *p*-values are given in Table 3; only *p*-values lower than 0.01 and 0.05 are reported. As shown, RSV and LCA performed best with the *rank_norm* reweighting method, significantly outperforming total_freq, r_lohi, and KLD. This indicates that, with the default parameter settings, RSV and LCA were better than the other algorithms at ranking the most useful terms near the top of the list. Although none of the term ranking methods yielded a statistically significant improvement when applied in conjunction with *rank_norm* term reweighting (Table 2), it is worth noting that most of the term ranking methods performed better with *rank_norm* term reweighting than with Rocchio term reweighting.

### 2. Performance at the Maximum Performance Parameters

*t*-test comparisons between pairs of term ranking algorithms for the results of *rank_norm* term reweighting.

*rank_norm*. Furthermore, for *rank_norm* term reweighting, LCA significantly outperformed all of the other methods (Table 5). This suggests that LCA can select more useful terms when the set of pseudo-relevant documents is large enough to infer term co-occurrence. Throughout our experiments, LCA consistently performed best at automatic query expansion from the set of retrieved OHSUMED documents.

*max_norm* and *rank_norm*, showed better improvements with fewer retrieved documents.

### V. Discussion

*max_norm*, and *rank_norm*.