Research Article

Text Document Clustering Using Semantic Neighbors

Malihe Danesh
Young Researchers Club, Jouybar Branch, Islamic Azad University, Jouybar, Iran

Hossein Shirgahi
Young Researchers Club, Jouybar Branch, Islamic Azad University, Jouybar, Iran

Data clustering is a powerful technique for discovering knowledge from textual documents. K-means family algorithms are widely used in this field because they are simple and fast when clustering large-scale data. In these algorithms, the cosine similarity criterion measures only the pairwise similarity of documents, so it performs poorly when the clusters are not well separated. By contrast, the Neighbors and Link concepts bring global information into the calculation of how close two documents are, in addition to the pairwise similarity between them, and therefore perform better. That model, however, ignores semantic relations between words and clusters together only documents that share the same terms. This study uses the WordNet ontology to build a new document representation model in which semantic relations between words are used to reweight term frequencies in the document vector space model; the Neighbors and Link concepts are then applied to this model. Results of applying the proposed method (Semantic Neighbors) to real-world text data show that it performs better than previous methods and is more efficient in text document clustering.
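The ideas summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no formulas, so the definitions below are assumptions. Two documents are treated as neighbors when their cosine similarity reaches a threshold `theta`, the link between two documents is taken as their number of common neighbors, and the semantic reweighting step is a hypothetical rule that adds to each term's frequency a share of the frequency of related terms. The `related` dictionary stands in for WordNet-derived relatedness scores, and all names and values are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_matrix(docs, theta):
    """m[i][j] is 1 when documents i and j are neighbors (similarity >= theta).

    Assumption: every non-empty document counts as its own neighbor.
    """
    n = len(docs)
    return [[1 if cosine(docs[i], docs[j]) >= theta else 0 for j in range(n)]
            for i in range(n)]


def link(m, i, j):
    """Number of common neighbors of documents i and j."""
    return sum(m[i][k] * m[k][j] for k in range(len(m)))


def reweight(doc, related):
    """Hypothetical semantic reweighting of one term-frequency vector.

    related maps a pair of term indices (s, t) to a relatedness score in [0, 1];
    each term receives that share of its related term's original frequency.
    """
    out = [float(x) for x in doc]
    for (s, t), score in related.items():
        out[s] += score * doc[t]
        out[t] += score * doc[s]
    return out


# Toy vocabulary (illustrative): car, automobile, engine, bank
docs = [
    [2, 0, 1, 0],  # mentions "car" and "engine"
    [0, 2, 1, 0],  # mentions "automobile" and "engine"
    [0, 0, 0, 3],  # mentions "bank" only
]
related = {(0, 1): 0.8}  # assumed relatedness of "car" and "automobile"
theta = 0.5              # assumed neighbor threshold

plain = neighbor_matrix(docs, theta)
semantic_docs = [reweight(d, related) for d in docs]
semantic = neighbor_matrix(semantic_docs, theta)
```

In this toy data the first two documents share only the term "engine", so they fall below the threshold in the plain model; after reweighting, the relatedness of "car" and "automobile" raises their similarity enough for them to become neighbors and gain a nonzero link.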


How to cite this article

Malihe Danesh and Hossein Shirgahi, 2011. Text Document Clustering Using Semantic Neighbors. Journal of Software Engineering, 5: 136-144.

DOI: 10.3923/jse.2011.136.144

URL: https://scialert.net/abstract/?doi=jse.2011.136.144

Journal of Software Engineering
