INTRODUCTION
Example-based searching, where the user provides an example publication to locate
the set of publications most similar to it, is becoming commonplace in literature
digital libraries (Bani-Ahmad et al., 2005).
Text-based approaches for computing similarity between publications consider
shared terms between publications to be an indicator of similarity
(Bani-Ahmad et al., 2005; Troy and Zhang, 2007; Hawking and Thistlewaite, 1996).
Besides enabling example-based searching, publication-to-publication similarity
computation can help digital libraries classify publications
into contexts as well (Bani-Ahmad and Ozsoyoglu, 2010).
This task is important because it can significantly enhance the sorting and grouping
of the search output of digital collections (Bani-Ahmad et al.,
2005; Bani-Ahmad and Ozsoyoglu, 2010).
Studies show that accurately ranking terms can significantly enhance the precision
of search results in digital libraries (Troy and Zhang, 2007), making the search
process more effective and efficient. Studies also show that augmenting the
importance of terms may improve similarity scores in keyword-based search
(Troy and Zhang, 2007).
Term frequency was the first indicator of term importance to be used (Bennett
et al., 2008); the use of term frequencies to estimate term
significance was first introduced in Luhn (1958). Since
then, term-frequency-based methods have become the reference point against which
new research in document-relevance computation is evaluated. Two more techniques
for ranking terms can be found in the literature, namely: (i) the Chronological Term
Rank (CTR) proposed by Troy and Zhang (2007) and (ii)
the well-known Term Proximity (TP) (Troy and Zhang,
2007; Voorhees, 2005).
The CTR value of a term in a document is the position where the
term is first observed in the given document. In Term Proximity, documents
where search terms are observed physically close to each other are considered
more relevant to the user query (Troy and Zhang, 2007;
Voorhees, 2005). In Bani-Ahmad and
Al-Dweik (2010), the authors utilize TextRank to compute the importance
of terms and use it to improve the precision of example-based search in digital
collections.
What motivated applying variable rather than fixed window sizes is that we experimentally observed a performance discrepancy when using TextRank with different window sizes. One possible cause of that discrepancy is that the best window size to use probably differs from text to text.
In the TextRank algorithm, two terms are linked to each other if they are observed within a given window, even if they come from two different sentences or even paragraphs. This happens to terms located toward the end of one sentence and those observed at the beginning of the very next statement; consequently, those terms will be linked in the final constructed graph. Thus, two terms from two different sentences or even paragraphs can still recommend each other in TextRank. This raises an important question: why would a term give a recommendation to another term if the two come from two different sentences or even paragraphs that are probably topically different to some degree?
We believe that these cases of probably improper term-to-term recommendation produced the observed discrepancy of the TextRank term-scoring approach across window sizes.
The above observations led us to consider the length of sentences as a
way to identify edges between terms. The new term-ranking algorithm is a slight
modification of the TextRank algorithm, a text-summarization tool
that involves ranking terms (Mihalcea and Tarau, 2004).
We refer to the new, modified version of TextRank as Punctuation-Aware TextRank,
or PATeR for short. PATeR is similar to TextRank except that it considers a variable
window size when forming text graphs; that is, the window size is determined
by the length of sentences rather than being fixed. For that,
PATeR takes the punctuation of the text into consideration to capture the beginnings
and ends of statements.
In this study, we introduce enhancing example-based searching through term ranking computed by PATeR. The PATeR algorithm is superior to CTR in that it recursively computes term importance from the entire textual content of the publication in which the term is observed, rather than relying only on local information.
The major finding of the paper is that PATeR works well when paired with Okapi BM25. We evaluate the performance of PATeR and compare it against that of the CTR and TP approaches. Up to 9% improvement in precision over CTR is observed.
THE OKAPI BM25 MODEL
As mentioned before, in example-based searching in Literature Digital Libraries (LDL), the user provides a publication P and asks for the set of top-K publications most similar to P from the repository S being searched.
In keyword-based search, the relevance of a document D to a given set of search keywords
W is computed as the similarity level between W and D. In the context of example-based
searching, W is the set of words appearing in the abstract of the example publication
(Bani-Ahmad et al., 2005). One well-known document-relevance
estimation measure is Okapi BM25 (Troy and Zhang, 2007).
Okapi BM25 is a term-frequency/inverse-document-frequency (tf.idf) based
group of relevance estimation measures (Schenkel et al.,
2007). This means that it uses (i) the number of times a term is observed
in a document and (ii) the number of documents in which that term is observed,
in addition to (iii) a set of other statistical measures, to compute document
relevance based on some formula (Troy and Zhang, 2007).
This famous and widely used group of formulae was introduced in Robertson et al. (1992).
One famous formula of the Okapi BM25 model is (Troy and Zhang,
2007):
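A widely cited form of the BM25 scoring function, consistent with the variable definitions that follow (the exact variant used by Troy and Zhang may differ in its constants), is:

```latex
\mathrm{Score}_{BM25}(d,q) \;=\; \sum_{t \,\in\, d \cap q}
  \ln\frac{N - df + 0.5}{df + 0.5}
  \cdot \frac{(k_1 + 1)\, tf}{K + tf}
  \cdot \frac{(k_3 + 1)\, qtf}{k_3 + qtf},
\qquad
K = k_1 \left( (1 - b) + b \, \frac{dl}{avdl} \right)
```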
where tf, the term frequency, is the number of occurrences of the term in the document at hand d; qtf is the number of occurrences of term t in the keywords of the query q; df, the document frequency, is the number of documents in the collection containing the term of interest; idf is the inverse document frequency, the form in which df is most commonly used in term weights; dl is the document length; N is the number of documents in the collection; avdl is the average length of all documents; q is the set of query terms, i.e., the terms that we are searching for and d ∩ q is the set of terms observed in both q and the document at hand d.
In the context of example-based searching, q is the set of words observed in the example publication.
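As a concrete illustration, a minimal BM25 scorer built from these definitions might look like the following sketch. It is not the exact variant evaluated in the paper; the parameter values k1, b and k3 are illustrative defaults.

```python
import math

def bm25_score(doc_terms, query_terms, doc_freq, num_docs, avdl,
               k1=1.2, b=0.75, k3=1.2):
    """Score one document against a query with a common BM25 variant.

    doc_terms/query_terms: token lists; doc_freq: term -> df;
    num_docs: N; avdl: average document length. Parameters are
    illustrative, not the paper's tuned values.
    """
    dl = len(doc_terms)
    tf, qtf = {}, {}
    for t in doc_terms:
        tf[t] = tf.get(t, 0) + 1
    for t in query_terms:
        qtf[t] = qtf.get(t, 0) + 1
    # Length-normalization factor K.
    K = k1 * ((1 - b) + b * dl / avdl)
    score = 0.0
    for t in set(tf) & set(qtf):  # terms in both d and q
        df = doc_freq.get(t, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5))
        score += idf * ((k1 + 1) * tf[t]) / (K + tf[t]) \
                     * ((k3 + 1) * qtf[t]) / (k3 + qtf[t])
    return score
```

A document sharing no terms with the query scores zero, while documents containing rarer query terms more often score higher.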
OKAPI BM25 MODEL PAIRED WITH IMPORTANCE OF TERMS
Okapi BM25 is based on simple statistics. However, with the growing difficulty of achieving further retrieval improvements (higher precision and recall) using only term frequencies as in Eq. 1 above, there has been increasing interest in information derived from document structure. An example of such information is the relative importance of the terms that appear in documents. Next, we present multiple possible approaches to estimating term importance and augmenting it into Eq. 1.
The chronological term rank: In Troy and Zhang (2007),
the Chronological Term Rank (CTR), which captures the positions of terms as they
occur in the sequence of words of a document, was used to enhance the relevance
score between search terms and the documents to be searched. The CTR model is
defined as follows: let D = (t_{1}, . . . , t_{n}) be a document
where the t_{i} are terms (words) ordered according to their sequence of appearance
in D. The term rank t_{r} of term t is the location i where t first appears
in D.
The CTR token-importance measure is augmented into Eq. 1 in multiple
ways (Troy and Zhang, 2007), one of which is:
where R_{CTR} is the CTR-based term-importance score added to the term's contribution in Eq. 1.
In the formula above, R_{CTR} is computed as follows:
where C and D are constants, found to give best results at C = 0.6 and D = 0.6
(Troy and Zhang, 2007), tr is the chronological term rank of
the term and dl is the length of the document.
TextRank: In Mihalcea and Tarau (2004), TextRank
is introduced and used for text summarization. The TextRank algorithm involves
two main steps. In the first, the importance of the words observed
in the text to be summarized is computed; for that, the PageRank algorithm (Page
and Brin, 1998) is used.
TextRank applies PageRank on a special graph built from the text as follows. The vertices
of this graph are the content-bearing words (i.e., all words observed in the text
excluding stopwords or noise words such as “the”, “an” and
“who” (Wikipedia, n.d.)). A link is established between two words
(vertices) if they are observed together within a pre-given window size, W.
In Mihalcea and Tarau (2004), the best choice of W is found
to be 20 for text-summarizing purposes. PATeR is similar to TextRank except
that the window size is the length of statements, which may vary.
Term proximity: Term proximity refers to the lexical distance between
search-query terms and is calculated as the number of words (including or excluding
stopwords) separating query terms in a document (Vechtomova
and Karamuftuoglu, 2008; Hawking and Thistlewaite, 1996;
Troy and Zhang, 2007).
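The distance underlying this definition can be sketched as the minimum number of positions separating any occurrence of one query term from any occurrence of another; the function name and its exact form are illustrative, not the cited papers' definition.

```python
def min_term_distance(doc_terms, t1, t2):
    """Smallest positional gap between an occurrence of t1 and an
    occurrence of t2 in the token sequence (None if either is absent)."""
    pos1 = [i for i, t in enumerate(doc_terms) if t == t1]
    pos2 = [i for i, t in enumerate(doc_terms) if t == t2]
    if not pos1 or not pos2:
        return None
    return min(abs(i - j) for i in pos1 for j in pos2)
```

Smaller distances indicate query terms appearing physically close, which proximity-aware scoring rewards.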
One way to augment term-proximity information into Okapi BM25 is presented in
Buttcher et al. (2006). In Schenkel
et al. (2007), an efficient evaluation framework, including a proximity
scoring function integrated within a query engine for text retrieval, is presented.
The following equation is used in this study:
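In the form given by Buttcher et al. (2006) and Schenkel et al. (2007), the proximity-augmented score has the following shape; this is a sketch reconstructed from the definitions below rather than a verbatim reproduction of the cited papers' equation:

```latex
\mathrm{Score}(d,q) \;=\; \mathrm{Score}_{BM25}(d,q)
  \;+\; \sum_{t \,\in\, q} \min\{1, idf(t)\}
  \cdot \frac{acc_d(t)\,(k_1 + 1)}{acc_d(t) + K}
```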
where Score_{BM25}(d, q) is defined in Eq. 1 above and acc_d(t) is the accumulated interim score (acc) for the query term t, which depends on the distance of this term's occurrences to other, adjacent query-term occurrences.
The value K is computed as:
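Consistent with the parameters defined below, the standard BM25 length-normalization factor is:

```latex
K \;=\; k_1 \left( (1 - b) + b \, \frac{dl}{avdl} \right)
```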
where b, k1 and k are configurable parameters, set to b = 0.5 and k = k1 = 1.2
(Schenkel et al., 2007), and avdl is the average document length.
Next, in our experiments, we propose and evaluate replacing the acc_d score of
terms with the PATeR scores. The configurable parameters are assigned the same
values used in Schenkel et al. (2007) and Buttcher
et al. (2006).
THE PATeR TERM RANKING APPROACH
TextRank applies the PageRank algorithm on semantic graphs extracted from natural-language
text. In such a graph, terms represent vertices and links represent
semantic relationships between those terms. The PageRank scores obtained,
which are referred to as TextRank scores, are found to give a good approximation
of the relative importance of the terms within the document in which they are observed
(Mihalcea and Tarau, 2004).
Formally, let G = (V, E) be a directed graph with the set of vertices V and set
of edges E. For a given vertex v_i, let In(v_i) be the set of vertices that point
to it and Out(v_i) be the set of vertices that v_i points to. The score
of v_i is recursively computed as follows (Page and Brin,
1998):
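In the form used by TextRank (Mihalcea and Tarau, 2004), the recurrence is:

```latex
S(v_i) \;=\; (1 - d) \;+\; d \sum_{v_j \,\in\, In(v_i)} \frac{S(v_j)}{\lvert Out(v_j) \rvert}
```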
where d is a damping factor that can be set between 0 and 1 (Page
and Brin, 1998). In the original Web setting, Out(v) is the set of pages cited by page
v and In(v) is the set of pages citing page v; S(v) is the PageRank score of
vertex v.
In the context of Web surfing, the PageRank algorithm implements the “random
surfer model”, where a Web user is assumed to randomly click on links with
probability d and jump to a completely new page with probability
(1 - d) (Page and Brin, 1998). Choosing d to be 0.85 has been found
to give accurate ranking results (Page and Brin, 1998).
Starting from arbitrary PageRank scores assigned to each node of the graph at hand, the computation iterates until convergence. The converged PageRank score of a vertex v represents the “importance” of v within the graph.
To apply graph-based ranking algorithms to text, we have
to build a graph that represents the text and interconnects its words.
The application of graph-based ranking algorithms
to natural-language texts consists of the following main steps: (i) identify
text units (words) and add them as vertices in the graph; (ii) identify relations
that connect the text units identified in step (i) and use these relations to
draw edges between vertices in the graph (Mihalcea and Tarau,
2004).
Any relation that can be defined between two text units is useful and can be
added between their vertices (Mihalcea and Tarau, 2004).
For our experiments and implementation of TextRank, we use a co-occurrence
relation controlled by the distance between word occurrences, as in Mihalcea
and Tarau (2004), Bani-Ahmad et al. (2005) and
Bani-Ahmad (2009). Two vertices are linked if their
corresponding text units co-occur within a window of at most W words, where
W is the window size and can be set anywhere from 2 to 10 words (or even more).
In our experiments, we consider W to cover the length of the sentence in
which the terms are observed within the abstract of the publication at hand.
The TextRank keyword-extraction algorithm proceeds as follows: (i) the text
is tokenized, (ii) stopwords are removed, (iii) tokens are stemmed using the
Porter stemmer (Salton, 1989; Kowalski,
1997), as in Mihalcea and Tarau (2004), and
(iv) all remaining text units are added to the graph as vertices and an edge is
added between every two units that co-occur within the window. After the graph
is constructed, the score associated with each vertex
is set to an initial value of 1 and PageRank is run on the graph for several
iterations until the PageRank scores converge, with a maximum threshold of 0.0001,
as in Bani-Ahmad (2009).
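The steps above, with PATeR's sentence-delimited windows in place of a fixed window, can be sketched as follows. The tokenizer, stopword list and omission of Porter stemming are simplifications for illustration; the damping factor and convergence threshold follow the values stated in the text.

```python
import re

# Illustrative stopword list; the real implementation uses a full list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are", "who"}

def pater_scores(text, d=0.85, tol=1e-4, max_iter=100):
    """Sketch of PATeR: link terms that co-occur within the same
    punctuation-delimited sentence, then run PageRank to convergence."""
    # (i)-(ii) Split on sentence-ending punctuation, tokenize, drop stopwords.
    sentences = re.split(r"[.!?;]+", text.lower())
    units = [[w for w in re.findall(r"[a-z]+", s) if w not in STOPWORDS]
             for s in sentences]
    # (iv) Undirected graph: every pair of terms in one sentence is linked,
    # so the window size equals the sentence length.
    neighbors = {}
    for sent in units:
        for i, u in enumerate(sent):
            for v in sent[i + 1:]:
                if u != v:
                    neighbors.setdefault(u, set()).add(v)
                    neighbors.setdefault(v, set()).add(u)
    # PageRank iteration from initial scores of 1, until the largest
    # score change falls below the threshold tol.
    scores = {v: 1.0 for v in neighbors}
    for _ in range(max_iter):
        delta = 0.0
        for v in neighbors:
            rank = (1 - d) + d * sum(scores[u] / len(neighbors[u])
                                     for u in neighbors[v])
            delta = max(delta, abs(rank - scores[v]))
            scores[v] = rank
        if delta < tol:
            break
    return scores
```

Terms that co-occur with many important terms inside sentences accumulate higher scores; terms never linked across sentence boundaries cannot recommend each other.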
EXPERIMENTAL SETUP
In this section, we compare the performance of using the graph-based TextRank term
rank against the Chronological Term Rank (Troy and Zhang,
2007) and the Term Proximity (Buttcher et al.,
2006; Schenkel et al., 2007) approaches.
In order to show the effectiveness of ranking terms via the PATeR approach, information-retrieval experiments were conducted on a collection of 90 publications manually selected by domain experts of the research topics of those publications. To implement example-based search, the abstract of a publication from the collection is used to search for similar publications within the complete set. The top-K relevant documents (similar publications) are retrieved using an Okapi-based information-retrieval experimental system developed for evaluation purposes. The Okapi BM25-based relevance scores of the result set are then modified by augmenting the TextRank scores of terms, as described earlier in the paper. Next, we present the performance metrics used to comparatively evaluate the proposed approach against the CTR and TP approaches.
The Precision at top-K (P@K) metric is used to measure the quality of a retrieval result: P@K is computed as the percentage of relevant publications (i.e., publications similar to the example publication at hand) among the top-scored K retrieved publications. We computed the precision for multiple values of K; namely, K = 5, 10 and 15.
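The P@K computation reduces to counting relevant items among the first K results:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved publications that are relevant.

    ranked_ids: result list ordered by score; relevant_ids: ground-truth set.
    """
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k
```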
A set of 90 publication abstracts is used as a testbed. The publications are divided into three groups of 30 each. The three groups were carefully selected by domain experts from three different research areas, all in computer science. Each group is further subdivided into three subgroups of 10 publications each, where each subgroup contains publications that are highly topically related. This means that if you single out one publication from one of the subgroups, the other nine publications in that subgroup are good similar examples of the singled-out one.
For our experiments, we used Eq. 2 above to compute Score_{BM25_CTR} and Eq. 4 to compute the Term Proximity (TP) score, Score_{BM25_TP}. We augmented the PATeR score of terms into Okapi BM25 using the formula of Eq. 6 shown next.
The PATeR scores of terms are also used with the formula of Eq. 4. The new formula is:
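Following the substitution proposed earlier, i.e., replacing the acc_d(t) accumulator of Eq. 4 with the PATeR score of the term (written here as R_{PATeR}(t), a hypothetical notation), the formula would take the shape:

```latex
\mathrm{Score}(d,q) \;=\; \mathrm{Score}_{BM25}(d,q)
  \;+\; \sum_{t \,\in\, q} \min\{1, idf(t)\}
  \cdot \frac{R_{PATeR}(t)\,(k_1 + 1)}{R_{PATeR}(t) + K}
```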
EXPERIMENTAL RESULTS AND EVALUATION
The effect of window size on TextRank performance: What motivated applying variable rather than fixed window sizes is that we experimentally observed a discrepancy in the TextRank term-scoring approach for different window sizes. Figure 1 shows the correlation between the applied window size and the performance of example-based search using the modified BM25 formula. The peak improvement was observed at window sizes of 1 and 3. However, we also observed that, in rare cases, other window sizes were superior to those two. One possible cause of this discrepancy in TextRank performance is that the best window size to use probably differs from text to text.

Fig. 1: Percentage precision enhancement of TextRank over CTR (y-axis) vs. the applied window size (x-axis)
Table 1: Precision values (P@K) of the different approaches for different selections of K

Table 2: Enhancements of PATeR over CTR

Notice that terms located toward the end of one sentence and those observed at the beginning of the very next statement are considered co-located by TextRank. Consequently, those terms will be linked in the final constructed graph. Thus, two terms from two different sentences or even paragraphs can still recommend each other in TextRank. This raises an important question: why would a term give a recommendation to another term if the two come from two different sentences or even paragraphs that are probably topically different to some degree?
We believe that these cases of probably improper term-to-term recommendation produced the observed discrepancy of the TextRank term-scoring approach for different window sizes.
The above observations led us to consider the length of sentences as a way to identify edges between terms.
BM25 with PATeR (equation 7): In this section, we compare the performance of PATeR (Eq. 7) against the Chronological Term Rank (Eq. 2) and the Term Proximity (Eq. 4).
Observation (Table 1): PATeR performed better
than CTR in terms of top-K precision for all considered values of K.
Table 2 clearly shows that, when the PATeR scores of terms are augmented into the BM25 formula, a maximum enhancement of 8.42% in precision values over CTR is achieved.
Observation (Table 1): Compared to the TP results,
PATeR showed similar or comparable improvement in precision.
This observation generally applies to all considered levels of precision; namely,
precision at top-5 (P@5), top-10 (P@10) and top-15 (P@15). If we consider
the time and space complexity required by the TP approach, however, the PATeR approach
provides a less time- and space-complex alternative to TP
at a close or slightly better level of precision.
Table 3: Precision when applying PATeR

Table 4: Precision enhancement of PATeR over CTR

BM25 with PATeR (equation 6): In this section, we compare the performance of PATeR (Eq. 6) against the Chronological Term Rank (Eq. 2) and the Term Proximity (Eq. 4).
Observation (Tables 3 and 4):
Compared to CTR, PATeR showed comparable, and in some cases higher, precision
values.
We also observed that PATeR has relatively higher top-K precision values for K = 5. This saves the user time, as s/he finds what s/he is looking for at the top of the search-results list.
To summarize, we observed that, in general, PATeR outperforms or is comparable to TP and outperforms CTR.
CONCLUSION
In this research, we introduced enhancing relevance scores with the Punctuation-Aware TextRank term-ranking algorithm (PATeR). PATeR differs from TextRank in that the edges between terms are added considering variable rather than fixed window sizes; the window size in PATeR is the length of the statement. The present experiments show that PATeR yields a significant improvement over CTR when augmented into the well-known BM25 formulae. PATeR shows a slight improvement over TP, but with less computational overhead.
ACKNOWLEDGMENTS
We would like to thank the reviewers for their very valuable comments, which have substantially improved the paper in both substance and presentation.
THE CASE EXPLORER PROJECT
The work conducted in this paper is part of the CASE EXPLORER project (2003-2008). The project is resumed by Sulieman Bani-Ahmad at Al-Balqa Applied University in Jordan. CASE EXPLORER is a score-guided searching and querying prototype portal for the ACM SIGMOD Anthology, a digital library for the database-systems research community containing about 15,000 papers.
As an extension of the CASE EXPLORER project, Bani-Ahmad resumed the project
in Jordan and worked on enhancing example-based searching in literature digital
libraries through term ranking. Part of that work is presented in this
article; the other part can be found in Bani-Ahmad and Al-Dweik
(2010).