Research Journal of Information Technology

Year: 2017 | Volume: 9 | Issue: 1 | Page No.: 7-17
DOI: 10.17311/rjit.2017.7.17
Model of Textual Data Linking and Clustering in Relational Databases
Wael M.S. Yafooz

Abstract: Background: Heavy reliance on computers in everyday life has led to a continuous increase in large-scale data applications that handle textual data. These data are stored so that meaningful information can be produced from them. Databases have therefore become the backbone of most application software for organizing data into a structured form, and the structured information provides users with comprehensible knowledge. However, dealing with a large amount of textual data raises two basic issues: insufficient query processing performance and inaccurate information retrieval. Attempts have been made to resolve both issues through database clustering techniques and textual document clustering. Nevertheless, most of these attempts require several stages of tedious scripting to construct software applications that are external to the database. Materials and Methods: This study proposes a Textual Virtual Schema Model (TVSM) that structures extracted textual data while performing automatic column-based information clustering within the internal structure of a relational database. Furthermore, a similarity measurement method is introduced to obtain highly accurate data clusters. An experiment was conducted on the textual Reuters corpus and the WAP and Classic datasets. The clustering results were then validated by measuring F-measure, entropy and purity. Results: The results show linkages between structured textual data and unstructured information, high query processing performance and reduced document clustering time with accurate clusters. Conclusion: This model offers a useful approach for various domains that involve large amounts of textual data, such as document clustering, topic detection and tracking, document summarization, personal data management and information retrieval.
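For readers unfamiliar with the three validation measures named in the abstract, the following is a minimal sketch of how purity, entropy and F-measure are commonly computed for a clustering against known class labels. It uses the standard textbook definitions and is not taken from the article; the exact formulas used in the study may differ (for example, entropy is sometimes normalized by the logarithm of the number of classes). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def contingency(clusters, labels):
    """Count documents per (cluster, class) pair."""
    table = {}
    for cluster_id, label in zip(clusters, labels):
        table.setdefault(cluster_id, Counter())[label] += 1
    return table


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    table = contingency(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy inside each cluster (lower is better)."""
    table = contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((v / size) * log2(v / size) for v in counts.values())
        total += (size / n) * h
    return total


def f_measure(clusters, labels):
    """For each class, take the best F1 over all clusters, then weight by class size."""
    table = contingency(clusters, labels)
    class_sizes = Counter(labels)
    n = len(labels)
    score = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for counts in table.values():
            overlap = counts.get(cls, 0)
            if overlap == 0:
                continue
            precision = overlap / sum(counts.values())
            recall = overlap / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (cls_size / n) * best
    return score


# Hypothetical toy example: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), entropy(clusters, labels), f_measure(clusters, labels))
```

Higher purity and F-measure and lower entropy indicate clusters that agree more closely with the reference classes.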

How to cite this article
Wael M.S. Yafooz, 2017. Model of Textual Data Linking and Clustering in Relational Databases. Research Journal of Information Technology, 9: 7-17.

© Science Alert. All Rights Reserved