Journal of Software Engineering
  Year: 2011 | Volume: 5 | Issue: 4 | Page No.: 136-144
DOI: 10.3923/jse.2011.136.144
Text Document Clustering Using Semantic Neighbors
Malihe Danesh and Hossein Shirgahi

Abstract:
Data clustering is a powerful technique for discovering knowledge from textual documents. In this field, K-means family algorithms are widely used because of their simplicity and speed in clustering large-scale data. In these algorithms, the cosine similarity criterion measures only the pairwise similarity of documents, so it performs poorly when the clusters are not well separated. In contrast, the concepts of Neighbors and Link perform better because, in addition to the pairwise similarity between two documents, they take global information into account when calculating how close the documents are. In this model, however, semantic relations between words are ignored, and only documents that share the same terms are clustered together. This study uses the WordNet ontology to build a new document representation model in which semantic relations between words are used to reweight word frequencies in the vector space model of documents; the Neighbors and Link concepts are then applied to this model. Results of applying the proposed method (Semantic Neighbors) to real-world text data show that it performs better than previous methods and is more efficient for text document clustering.
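The Neighbors and Link concepts mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: two documents are neighbors when their cosine similarity is at least a threshold, and the link between two documents is the number of neighbors they share. The function names, the threshold `theta`, and the toy term-frequency vectors are hypothetical, and the WordNet-based reweighting step is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def neighbor_matrix(vectors, theta):
    """m[i][j] is 1 when documents i and j are neighbors (similarity >= theta).

    Assumption: a non-empty document counts as its own neighbor.
    """
    n = len(vectors)
    return [
        [1 if cosine(vectors[i], vectors[j]) >= theta else 0 for j in range(n)]
        for i in range(n)
    ]

def link(m, i, j):
    """Number of common neighbors of documents i and j."""
    return sum(m[i][k] * m[k][j] for k in range(len(m)))

# Toy term-frequency vectors over four terms (illustrative data only).
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [1, 1, 1, 0],
]
m = neighbor_matrix(docs, theta=0.5)
print(link(m, 0, 1), link(m, 0, 2))
```

Unlike the pairwise cosine score, the link value depends on how the rest of the collection relates to both documents, which is the "global information" the abstract refers to.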
How to cite this article:

Malihe Danesh and Hossein Shirgahi, 2011. Text Document Clustering Using Semantic Neighbors. Journal of Software Engineering, 5: 136-144.

DOI: 10.3923/jse.2011.136.144

URL: https://scialert.net/abstract/?doi=jse.2011.136.144

 