Asian Science Citation Index is committed to providing authoritative, trusted and significant information by covering the most important and influential journals to meet the needs of the global scientific community.
ASCI Database
308-Lasani Town,
Sargodha Road,
Faisalabad, Pakistan
Fax: +92-41-8815544
 
Articles by Ghassan Khazal
Total Records ( 1 ) for Ghassan Khazal
  Ghassan Khazal and Alexander Zamyatin
  Arabic is one of the most complex languages: it has a rich vocabulary and a structure that is difficult and different from that of other languages. Arabic poses many challenges for text mining, one of which is how to achieve the highest classification accuracy. In this research, we propose a feature engineering approach that identifies the best combination of preprocessing procedures and feature representation, a choice that directly affects the classification accuracy of Arabic text. Preprocessing and feature representation are the main steps in any text classification framework, and this phase is very important when designing a text classifier for such a sophisticated language. In this study, we used four classifiers: Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB) and K-Nearest Neighbor (KNN). Analysis and experimental results on Arabic text data reveal that preprocessing techniques, feature representation and feature weighting have an important influence on classification accuracy. Choosing a suitable combination of preprocessing tasks with an appropriate feature representation and classification technique also provides a good improvement in accuracy. The study shows that SVM (82.6%) and KNN (78.33%) perform better on average than DT (57.49%) and NB (76.21%). SVM achieved 88.67% accuracy with the combination of tokenization, filtering, normalization and light stemming, with TFIDF as the feature representation; the KNN classifier achieved 88.00% using tokenization and filtering as preprocessing, TFIDF as the feature representation and information gain as feature selection.
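  The preprocessing and feature representation steps named in the abstract (normalization, tokenization, filtering, TFIDF) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the normalization rules and the TF-IDF variant (relative term frequency times log(N/df)) are all assumptions chosen for illustration, and light stemming, feature selection and the classifiers themselves are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}

# Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(text):
    """Assumed normalization: strip diacritics, unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)  # alef variants -> bare alef
    return text.replace("ى", "ي").replace("ة", "ه")


def preprocess(text):
    """Normalize, tokenize on runs of Arabic letters, then filter stop words."""
    tokens = re.findall(r"[\u0621-\u064A]+", normalize(text))
    stops = {normalize(word) for word in STOP_WORDS}
    return [tok for tok in tokens if tok not in stops]


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

  Calling `tfidf` on a list of raw Arabic strings yields sparse weight dictionaries that could be fed to any of the four classifiers. Terms that occur in every document receive weight zero under this variant, which is one reason library implementations often use a smoothed formula instead.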
 
 
 