Asian Science Citation Index is committed to provide an authoritative, trusted and significant information by the coverage of the most important and influential journals to meet the needs of the global scientific community.  
ASCI Database
308-Lasani Town,
Sargodha Road,
Faisalabad, Pakistan
Fax: +92-41-8815544
Contact Via Web
Suggest a Journal
Journal of Computer Science
Year: 2009  |  Volume: 5  |  Issue: 3  |  Page No.: 184 - 190

Selective Flooding Based on Relevant Nearest-Neighbor using Query Feedback and Similarity across Unstructured Peer-to-Peer Networks

Iskandar Ishak and Naomie Salim    

Abstract: Problem statement: Efficient searching is a fundamental problem for unstructured peer to peer networks. Flooding requires a lot of resources in the network and thus will increase the search cost. Searching approach that utilizes minimum network resources is required to produce efficient searching in the robust and dynamic peer-to-peer network. Approach: This study addressed the need for efficient flood-based searching in unstructured peer-to-peer network by considering the content of query and only selecting peers that were most related to the query given. We used minimum information to perform efficient peer selection by utilizing the past queries data and the query message. We exploited the nearest-neighbor concept on our query similarity and query hits space metrics for selecting the most relevant peers for efficient searching. Results: As demonstrated by extensive simulations, our searching scheme achieved better retrieval and low messages consumption. Conclusion: This study suggested that, in an unstructured peer-to-peer network, flooding that was based on the selection of relevant peers, can improve searching efficiency.

Table 1 is the query ID, peer ID and the query keywords that have been answered and also the query hit. Only the latest peer that answered the query will be kept into a table. Routing is based on the similarity values of the query word with the keyword from the past queries stored in the profile. Peers that have high similarity with the query will be selected for routing.

Ant Colony optimization is also used in unstructured peer-to-peer search in[13]. The approach is called SemAnt where it emulates the nature of ants cooperating between themselves to find food based on the pheromone. The peers are the ones who act like an ant and cooperated between them in creating pheromone trails. The pheromone trails is the probabilistic overlay networks and also indicates the most promising path for a given query. As a result, the more popular a query becomes, the better the trail. The experiments shown that the search algorithm is stable, robust and converges fast whole its performance is pretty much acceptable.


Table 1: Profile table

MATERIALS AND METHODS

Relevance-Nearest Neighbor based search (RNN) using query feedback and similarity: Our search approach consist of 3 components; profile table, relevance peers estimation and nearest neighbor selection. Profile table is used to store past query message and query hits from other peers who have answered previous queries. The Relevance Nearest Neighbor based Search or RNN is based on the concept of giving the same weight both on the query hits and query content similarity with the incoming query and selecting only the nearest relevant peers. We also include the nearest neighbor approach to minimize the search cost but at the same time able to retrieve high query hits or recall. The search method is basically a flooding based search but is based on selective flooding. The search algorithm is shown in Fig. 1. The objective of this searching approach is to have an efficient search and high recall.

For a peer p∈P, we use T(p) to denote set of past query maintained by p. Each item in T(p) has two attributes; query term, q and number of hits n, so we denote each data item in T(p) as a pair of q(p), n(p). By using each q and n on T(p), we determine the relevance of a peer by using the similarity of all q(p) in T(p) to calculate the similarity between them. On the other hand, we use n(p) to calculate the stability of each peer in T(p). We define a reference point, which is the highest or optimum value of query similarity and also the highest query hits denoted as d point.


Fig. 1: Relevance Nearest Neighbor Search algorithm that uses the Selective flooding concept

Relevance peer estimation: The relevance-based component is based on the work in[14]. This component uses the peer similarity-hits graph model. Our peer-similarity graph model captures both the peer query hits and peer similarity with corresponding incoming query (Fig. 2). We used both information gathered in the profile table which is based on the work done in[9]. We incorporated both, query content and connection stability information to determine relevant peer to route query. Each peer stores information about past queries and the query hits in a table. There will be no global knowledge shared between all the peers but each peer will also have a list of data collected from the answered query and store it in neighbor profile table (Table 1).

The profile table contains the ID of the answering peer, connection ID, the query words that have been answered by other peers and a timestamp of the returned query. These query words are the words that matched the query sent by this peer and the words are contained in the peer are only answered query words. The list will keep the last M queries and a Least Recently Used (LRU) policy will keep the most recent queries in the table.

The relevance value will be based on two parameters, query hits and the similarity value between the query to be routed and the stored past queries. Query hits determine peer connection stability with the processing peers. The more query hits, the more stable the peer is connected and thus giving the impression of the particular peers connection reliability. Similarity value is based on cosine similarity (1). q is the incoming query while qi is the past queries stored in the profile table in each peer. As an example, let peer A has a list of past queries; d.

Query q is an incoming query and is waiting to be routed q will be compared with all the queries, qi, in d. The similarity between them will determine the relation between the content that the particular peer has in its storage with the query terms. The most relevance of peers is peers that have among the most query hits and it has a content related to the incoming query.


Fig. 2: Similarity and Query hits metric space

Fig. 3: Point of reference estimation

We calculate the relevance value using formula described in (3). In this formula, we are actually calculating the distance of relevance value of the peers inside the profile table with the most optimum relevance point in the similarity-query hits graph. M is the maximum cosine value, in which for the purpose of easy calculation, we decided to define M = 1. hi is the returned hits values for a particular query, while Hp is the maximum hits retrieved from all h that have been recorded. The formula to define the maximum hits (2), involved the use of nearest-neighbor concept, which will be explained later. Np is the total number of query hits of all peers stored in the neighbor profile table:

(1)

(2)

(3)

Nearest neighbor selection: We determine our group of peers within the relevance value by using the nearest neighbor principal. It is based on the fast algorithm for nearest neighbor search proposed in[15]. The purpose of the application of nearest neighbor method is to avoid comparing all the peers inside the table because table with size of N will require N times comparison and relevance calculation process. As we can see in Fig. 3, "nearest neighbors" in our context are the peers that resides within the area of radius r. However, instead of selecting a static value[14], we decided to make the value r dynamic. We determined the dynamic r value by exploiting the inequality of triangle (Fig. 4). The inequalities will determine a bound for peer selections and therefore, less relevance values will be compared to the distance r (Fig. 5).


Fig. 4: Doubling the normal search radius
Fig. 5: Nearest-neighbor search space

RESULTS

We evaluate the performance of our searching approach by extending a peer-to-peer simulator called Peerware. We generate 1020 peers with a total of 95676 documents. Each node holds random number of documents between 5-1486 documents. The document collection used is the Reuters-21578 document collection which appeared on the Reuters newswire in 1987. Three different number of query set are used; q100 that contain 100 random query terms; q75 contains 75 queries and q50 with 50 query terms. Each query terms contain between two to five words. Each peer is country-based and each peer holds news about that particular country. One country could have more than one peer representatives in the network.

We compare our approach with the Most-Query Hits (MQH) and Intelligent Search Mechanism (ISM) approaches. We compare the search approaches based on query recalls, number of messages used and search efficiency. Recalls are the number of query matches with the content of each peer, while number of messages used is the number of messages used to answer a query. Search efficiency is the performance evaluation parameter that is calculated by dividing recalls with number of messages used:

(4)

DISCUSSION

Our experiments showed that our approach (RNN) is efficient in terms of network usage compared to the other two searching techniques. The experiments that employed q50 showed that our searching approach recorded the highest recall when compared to the ISM and MQH techniques. The recall is 4.87% higher compared to the MQH approach and slightly over than ISM approach with 1.67% more than ISM recall (Fig. 6). Figure 7 shows the message usage for all three approaches for query set q50 in which our approach recorded the highest number of message usage than MQH and ISM (4.79 and 2.1% respectively). However, we recorded highest efficiency with 38.1% higher than MQH and 28.8% efficiency higher than ISM (Fig. 8).

RNN also recorded highest recall when employing the q75 query set in the experiment, our approach still managed to record highest recall; as shown in Fig. 9. The RNN search recorded 4.31% higher recall than MQH and 0.85% higher recall than ISM. The RNN recorded 9.4% messages higher than MQH but it yield better search efficiency with 16.5%.


Fig. 6: Recall recorded for 50 queries searching

Fig. 7: Messages used for 50 queries searching

RNN also recorded higher number of messages used than the ISM with 5.12% higher but the RNN approach registered 20.62% higher search efficiency than ISM. Messages used and search efficiency graph for experiment using the query set q75 are shown in Fig. 10 and 11 respectively.

Our experiment using q100 query set also showed that RNN approach recorded high recall than other search technique. As shown in Fig. 12, RNN recorded 1.67% higher recall than MQH approach and 0.22% higher recall than ISM. In terms of message usage, we can show in Fig. 13 that RNN approach recorded 3.01% less message than MQH and 8.5% messages less than ISM approach. In the same experiment setting, we found out that RNN search is the most efficient with 13.51% more efficient than MQH and 17.95% more efficient than ISM (Fig. 14).


Fig. 8: Search Efficiency for 50 queries searching

Fig. 9: Recall recorded for 75 queries searching

Fig. 10: Messages used for 75 queries searching

Fig. 11: Search efficiency for 75 queries searching

Fig. 12: Recall recorded for 100 queries searching

Fig. 13: Messages used for 100 queries searching

Fig. 14: Search Efficiency for 100 queries searching

CONCLUSION

This study exploits the query content and query feedback data for developing flood-based search in unstructured peer-to-peer that is minimal in cost but giving high retrieval. RNN exploits very minimal data and also nearest-neighbor concept to reduce the cost of searching in unstructured peer-to-peer networks. Our simulation tests showed that our searching approach performs better than the other two flood-based searching approaches that also use minimal data and local indices. We showed that by using minimal information of query hits and similarity, efficient search in unstructured peer-to-peer can be achieved

ACKNOWLEDGEMENT

The researchers would like to express their cordial thanks to Demetris Zeinalipour for giving the permission to use the full data set of Reuters-21578 document collection and the Peerware simulator.

" target="_blank">View Fulltext    |   Related Articles   |   Back
 
 
   
 
 
 
  Related Articles

No Article Found
 
 
 
Copyright   |   Desclaimer   |    Privacy Policy   |   Browsers   |   Accessibility