Information Technology Journal

Year: 2008 | Volume: 7 | Issue: 7 | Page No.: 1009-1015
DOI: 10.3923/itj.2008.1009.1015
Semantic-Based Segmentation of Arabic Texts
Ameur A. Touir, Hassan Mathkour and Waleed Al-Sanea

Abstract: In this study, we present an automatic technique for segmenting Arabic texts while preserving their semantics. The technique is based on an empirical study of the connectors that link sentences and clauses. It evolved from a painstaking analysis of various Arabic texts and from observations collected over a long period of time. The analysis made it possible to determine the function of each connector in separating standalone segments of Arabic text. This has led to a categorization of connectors as active or passive. We used this notion of active and passive connectors to develop an algorithm that respects the semantics of the text when identifying the segments of a given Arabic text. The algorithm has been implemented and tested. Various Arabic essays were segmented using the algorithm, and the results were compared with manual segmentations performed by linguistic experts. The performance of the algorithm was in line with those manual segmentations.
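
The abstract does not list the connectors or spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: an "active" connector is taken to open a new standalone segment, while a "passive" connector stays inside the current one. The function name `segment`, the whitespace tokenization, and the example connector sets are all hypothetical illustrations and are not the categorization from the paper.

```python
# Hypothetical connector sets, for illustration only. They are NOT the
# active/passive categorization established in the paper.
ACTIVE_CONNECTORS = {"ثم", "لكن"}   # assumed to open a new segment
PASSIVE_CONNECTORS = {"أو"}         # assumed to stay inside a segment


def segment(text, active=ACTIVE_CONNECTORS):
    """Split text into segments, starting a new one at each active connector.

    Tokens are separated by whitespace (a simplifying assumption: connectors
    written as prefixes attached to the next word are not handled). Any token
    not in `active`, including passive connectors, is kept in the current
    segment.
    """
    segments = []
    current = []
    for token in text.split():
        # An active connector closes the running segment, if there is one,
        # and becomes the first token of the next segment.
        if token in active and current:
            segments.append(" ".join(current))
            current = []
        current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments
```

Passing a custom `active` set makes it possible to try other connector inventories without changing the function. A realistic implementation would also need morphological handling of attached connectors, which this sketch leaves out.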

How to cite this article
Ameur A. Touir, Hassan Mathkour and Waleed Al-Sanea, 2008. Semantic-Based Segmentation of Arabic Texts. Information Technology Journal, 7: 1009-1015.

© Science Alert. All Rights Reserved