Journal of Artificial Intelligence
  Year: 2015 | Volume: 8 | Issue: 1 | Page No.: 1-9
DOI: 10.3923/jai.2015.1.9

Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction

Hamzah Noori Fejer and Nazlia Omar

Automatic text summarization has become important because of the rapid growth of textual information: it is very difficult for human beings to summarize large collections of documents manually. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is difficult or impossible for computers. Therefore, the most common technique in automated text summarization is to select important sentences from the original text and present them as the summary. Arabic natural language processing lacks the tools and resources that are essential to advance research in Arabic text summarization. In addition to the limited resources, this field has received little research attention. Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from each group of similar sentences while ignoring the others. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for the evaluation. The DUC2002 corpus was used as the summarization dataset. The model achieved an accuracy of 43.4%. The experiments showed that the proposed model performs better than other work.
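The redundancy step described in the abstract (keeping one sentence from each group of similar sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the similarity algorithms, so the bag-of-words cosine similarity, the greedy selection order, the 0.7 threshold, and the function names `tokenize`, `cosine`, and `drop_redundant` are all assumptions made for illustration. The tokenizer is deliberately naive and performs no Arabic normalization or stemming.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Naive tokenizer: \w is Unicode-aware in Python 3, so Arabic letters match.
    # No normalization or stemming is applied (an illustrative simplification).
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant(sentences, threshold=0.7):
    # Greedy pass: keep a sentence only if it is not too similar to any
    # sentence already kept. The threshold is a hypothetical value.
    kept, kept_vectors = [], []
    for sentence in sentences:
        vector = Counter(tokenize(sentence))
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(sentence)
            kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    candidates = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stock prices rose sharply today",
    ]
    print(drop_redundant(candidates))
```

In this sketch the input is assumed to be ordered by importance (for example, by keyphrase score), so the first sentence of each similar group is the one that survives.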
How to cite this article:

Hamzah Noori Fejer and Nazlia Omar, 2015. Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction. Journal of Artificial Intelligence, 8: 1-9.

DOI: 10.3923/jai.2015.1.9





