Research Article
 

Classical Arabic English Machine Translation Using Rule-based Approach



Huda Alhusain Hebresha and Mohd Juzaiddin Ab Aziz
 
ABSTRACT

Arabic has featured in many machine translation projects over the last decade, and many of these projects have sought to improve the quality of translation from and into Arabic. This research concentrates on the translation of Classical Arabic (CA) rather than Modern Standard Arabic. The main challenges are conveying the appropriate meaning of CA terms in the target language (English), the different sentence structures of the two languages, and word agreement and ordering. To address these issues, the research aimed to build an automatic system that translates CA text into English using a rule-based approach. The developed system, Classical Arabic Machine Translation (CAMT), has three phases: analysis, transfer and generation. In the analysis phase, the CA input text is analysed morphologically and syntactically. The transfer phase constructs a rule describing the structure of the CA input text together with its equivalent rule in the target language (English). In the final phase, the English target text is generated from the transferred Arabic source. The system output was evaluated against the original human translation using the iBLEU metric. CAMT achieved an average iBLEU unigram score of 0.894, reported as an accuracy of 89.4%, indicating that a rule-based approach gives good results when translating CA into English.


 
  How to cite this article:

Huda Alhusain Hebresha and Mohd Juzaiddin Ab Aziz, 2013. Classical Arabic English Machine Translation Using Rule-based Approach. Journal of Applied Sciences, 13: 79-86.

DOI: 10.3923/jas.2013.79.86

URL: https://scialert.net/abstract/?doi=jas.2013.79.86
 
Received: October 04, 2012; Accepted: December 18, 2012; Published: February 01, 2013



INTRODUCTION

Using computers to translate text or speech from one human language (the source language, SL) into another (the target language, TL) is known as machine translation (MT). MT has been defined as “an automatic process that translates from one human language to another language by using context information”. In other words, machine translation is the translation of natural languages by machine (Daniel and James, 2009).

MT has several advantages that can make it preferable even to accurate human translation. First, MT systems produce output quickly, which makes them a powerful tool. Even low-quality MT output is valuable where any translation is better than none, or where a rough translation of a large number of documents delivered in minutes is more useful than an optimal translation delivered in a few weeks. Furthermore, MT systems translate without the personal bias that can affect human translators. Finally, MT is significantly cheaper: unlike human translation, it is essentially a one-time cost, i.e., the cost of the tool itself (Salem, 2009).

Written Arabic has two main varieties: Classical Arabic and Modern Standard Arabic (MSA). The term Classical Arabic (CA) refers to the standard written and spoken form of the language used by the people of Mecca from the 4th century AD, during the pre-Islamic period. CA is known worldwide as the language of the Qur’an, the Holy Book of Islam (Amna, 2011). Modern Standard Arabic, by contrast, is the official language throughout the Arab world and is understood by all Arabic speakers. MSA is the approved language of official and formal written material, and is also used in formal TV programmes, formal speeches, newspapers, literature, etc. Today, Classical Arabic is used mainly for religious purposes and is formally taught in schools, particularly in preparation for the study of religion or of Arabic language and literature (Abdelmeneim, 2008). Nevertheless, the syntactic and grammatical norms of Classical Arabic remain an essential reference for Arabic writers (Amna, 2011).

Most previous research on Arabic MT has dealt with translation from and into Modern Standard Arabic (MSA). NLP work on Classical Arabic has usually concentrated on evaluating manual human translations of Classical Arabic and on information retrieval systems, rather than on building MT systems (Amna, 2011; Shenassa and Khalvandi, 2008; Fauzan and Othman, 2006). For instance, Shenassa and Khalvandi (2008) designed a system to evaluate different English translations of the Qur’an using tools and concepts such as POS tagging, natural language processing, computational linguistics and machine learning. Another example is the information retrieval system designed by Fauzan and Othman (2006) for retrieving Quranic texts from websites that provide access to such texts within their structure and links.

One work that does address the translation of Classical Arabic is a rule-based machine translation system that translates Hadith from Classical Arabic into English. Its aim was to extract the logical structure of Arabic and English sentences in order to design and implement a rule-based machine translation system for Hadith (Amna, 2011).

SENTENCE STRUCTURE IN CLASSICAL ARABIC

One of the biggest challenges in translating Arabic is mastering sentence structure. Unlike that of many European languages, Arabic sentence structure is very different from English (Hatem and Omar, 2010). For instance, the default word order of an Arabic sentence is VSO (Owens, 1984), whereas English follows SVO (Chen, 1983); an Arabic sentence in which the verb (“visited”) precedes the subject (“Yusuf”) and the object (“Abdullah”) is therefore translated as “Yusuf visited Abdullah”. Arabic word order, parts of speech and grammar also differ from English. Classical Arabic sentence structure is more complicated than that of Modern Standard Arabic (MSA): in MSA, OSV and SOV are not acceptable word orders. Arabic sentences are classically divided into nominal sentences, which contain no verb, and verbal sentences, which contain a verb. For verbal sentences, the word orders Subject-Verb-Object (SVO), Verb-Subject-Object (VSO), Verb-Object-Subject (VOS) and Object-Verb-Subject (OVS) are all acceptable in Arabic (Attia, 2004; Alla et al., 2005).

Nominal sentence structure: In a nominal clause, the default word order is subject-predicate (Perlmann, 1961). However, this order may change when special logical emphasis is placed on the predicate (Attia, 2004).

Nominal sentences mainly have the following structure (Shirko, 2010):

S→NP {AP/NP/PP}

The following examples present nominal sentences based on their structure:

Nominal sentences of an NP followed by an AP:
 
  Al-samaa safia
  The sky is clear

Nominal sentences of an NP followed by an NP:
 
  Hada modars jaied
  This is a good teacher

Nominal sentences of an NP followed by a PP:
 
  Al-tifl fi al-hadika
  The child is in the garden

Nominal sentences of an AP followed by a NP:
 
  Salamun hiya
  It is peace

In this sentence, the predicate precedes the subject, which is a pronoun referring to the Night of Power.
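The nominal rule S→NP {AP/NP/PP}, together with the predicate-first order of the last example, can be sketched as a small pattern check. This is a minimal illustration, not CAMT's implementation: it assumes the phrases have already been chunked, tagged and glossed in English, and the names `nominal_roles` and `nominal_to_english` are hypothetical. English needs an explicit copula, so the sketch inserts "is" between subject and predicate; it ignores number agreement ("are") and tense.

```python
from typing import Optional

# Predicate tags allowed after a subject NP: S -> NP {AP/NP/PP}.
PREDICATE_TAGS = {"AP", "NP", "PP"}

def nominal_roles(tags: list[str]) -> Optional[tuple[int, int]]:
    """Return (subject_index, predicate_index) for a two-phrase nominal
    sentence, or None if the tags fit neither supported order."""
    if len(tags) != 2:
        return None
    if tags[0] == "NP" and tags[1] in PREDICATE_TAGS:
        return (0, 1)      # default order: subject, then predicate
    if tags[0] == "AP" and tags[1] == "NP":
        return (1, 0)      # fronted predicate, as in "Salamun hiya"
    return None

def nominal_to_english(phrases: list[tuple[str, str]]) -> Optional[str]:
    """Build an English clause from (tag, English gloss) phrases by putting the
    subject first and inserting the copula "is" before the predicate."""
    roles = nominal_roles([tag for tag, _ in phrases])
    if roles is None:
        return None
    subj, pred = roles
    return f"{phrases[subj][1]} is {phrases[pred][1]}"

print(nominal_to_english([("AP", "peace"), ("NP", "It")]))  # It is peace
```

Keying the output on the subject and predicate positions, rather than on the surface order, lets both the default and the fronted-predicate orders produce the same English pattern.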

Verbal sentence structure: In an Arabic verbal clause, the normal word order is predicate (verb)-subject-object. In some cases, the subject may precede the predicate (verb) (Perlmann, 1961).

The possible word orders for Classical Arabic verbal sentences are SVO, VSO, VOS, OVS, OSV and SOV (Attia, 2004). An example of each sentence structure is provided as follows:

An example of SVO:
 
  Al-rasoul adda ala’manah

An example of VSO:
 
  Adda Al-rasoul ala’manah

An example of VOS:
 
  Adda ala’manah Al-rasoul

An example of OVS:
 
  Ala’manah adda Al-rasoul

An example of OSV:
 
  E’yaak nahno na’bud

An example of SOV:
 
  Nahno e’yaak na’bud

In brief, unlike most languages, Classical Arabic admits all six orders: VSO, SVO, VOS, OVS, OSV and SOV. Languages usually allow one or a few of these orders; a language that allows all of them is very rare.
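Because a verbal clause has three main constituents, there are exactly 3! = 6 possible orders. The sketch below labels a clause's order and checks it against the sets described above. It assumes each constituent has already been assigned its grammatical role; the names `word_order`, `CA_ORDERS` and `MSA_ORDERS` are hypothetical, and the MSA set simply encodes the restriction stated earlier that OSV and SOV are not acceptable in MSA.

```python
from itertools import permutations

# Every ordering of Subject (S), Verb (V) and Object (O): 3! = 6 labels.
ALL_ORDERS = {"".join(p) for p in permutations("SVO")}

# Classical Arabic admits all six orders; MSA excludes OSV and SOV.
CA_ORDERS = set(ALL_ORDERS)
MSA_ORDERS = ALL_ORDERS - {"OSV", "SOV"}

def word_order(roles: list[str]) -> str:
    """Turn a role sequence such as ["V", "S", "O"] into a label like "VSO"."""
    label = "".join(roles)
    if label not in ALL_ORDERS:
        raise ValueError(f"not a simple S/V/O clause: {roles}")
    return label

# "E'yaak nahno na'bud": object pronoun, subject pronoun, verb.
order = word_order(["O", "S", "V"])
print(order, order in CA_ORDERS, order in MSA_ORDERS)  # OSV True False
```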

COMPARISON BETWEEN ARABIC AND ENGLISH SENTENCE STRUCTURE

As mentioned previously, Arabic allows many different word orders, whereas English sentences place the subject first, the verb second and the object last. The following examples present Arabic sentences and compare them with their English translations.

Example 1: Arabic sentence:

English translation: The Prophet fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

As the numbering shows, the Arabic and English sentences have the same word order, SVO.

The transfer structure of the sentence is as follows:

AS [Det-Noun[] Verb[] Det-Noun []]
ES [Det-Noun[The Prophet] Verb[fulfil] Det-Noun [The commitment]]

Example 2: Arabic sentence:

English translation: The Prophet fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

As a result, the lexical transfer of the sentence is: (fulfil the Prophet the commitment).

In this example, the Arabic sentence is VSO, whereas the English sentence must be SVO. The translation therefore has to be reordered to obtain a correct English sentence: the subject, which is second in the source language (Arabic), moves to first position in the target language (English), and the verb moves from first position in the Arabic sentence to second position in the English sentence. The resulting transfer structure is:

AS [Verb[] Det-Noun[] Det-Noun[]]
ES [Det-Noun [The Prophet] Verb[fulfil] Det-Noun[The commitment]]

Example 3: Arabic sentence:

English translation: The Prophet fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

According to this numbering, the lexical transfer is: (fulfil the commitment the Prophet).

Here the Arabic sentence follows a VOS structure, so all of its components (subject, verb and object) must be reordered to obtain the correct English translation. First, the subject moves from last position in the source sentence to first position in the target sentence. Then the verb moves to second position, instead of its first position in the Arabic sentence. Finally, the object moves to the end of the English sentence. After reordering, the transfer structure of the sentence is as follows:

AS [Verb[] Det-Noun[] Det-Noun[]]
ES [Det-Noun[The Prophet] Verb[fulfil] Det-Noun[The commitment]]

Example 4: Arabic sentence:

English translation: The Prophet fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

Thus, the lexical transfer is as follows:

The commitment fulfil the Prophet

This Arabic sentence places the object first, then the verb and finally the subject (OVS). Swapping the positions of the object and the subject gives the correct English word order. The transfer structure of the sentence is:

AS [Det-Noun[] Verb[] Det-Noun[]]
ES [Det-Noun[The Prophet] Verb[fulfil] Det-Noun[The commitment]]

Example 5: Arabic sentence:

English translation: He fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

The lexical transfer of this sentence is:

The commitment He fulfil

In this case, the Arabic sentence has an OSV word order, so we only need to move the object from first to last position to obtain a proper English translation. The transfer structure of the sentence is as follows:

AS [Det-Noun[] Pronoun[] Verb[]]
ES [Pronoun[He] Verb[fulfil] Det-Noun[The commitment]]

Example 6: Arabic sentence:

English translation: He fulfilled the commitment.

The position of each word in both sentences is numbered as follows:

The lexical transfer of this example is: (He the commitment fulfil).

This example follows an SOV structure, so the verb and the object must be reordered to obtain a correct English sentence: the verb moves to second position and the object to last position. The transfer structure of the sentence is as follows:

AS [Pronoun[] Det-Noun[] Verb[]]
ES [Pronoun[He] Verb[fulfil] Det-Noun[The commitment]]
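All six examples reduce to one operation: put the constituents in subject, verb, object order. The sketch below assumes that analysis has already given each constituent a role (S, V or O) and an English gloss; `reorder_to_svo` is a hypothetical name. Roles, rather than POS patterns, drive the reordering here because Examples 2 and 3 share the Arabic pattern [Verb Det-Noun Det-Noun] yet need different rearrangements; in fully vowelled Classical Arabic that distinction is typically signalled by case endings (nominative subject, accusative object).

```python
def reorder_to_svo(constituents: list[tuple[str, str]]) -> list[str]:
    """constituents: (role, English gloss) pairs in Arabic order, with
    role in {"S", "V", "O"}. Returns the glosses in English SVO order."""
    by_role: dict[str, str] = {}
    for role, gloss in constituents:
        if role in by_role:
            raise ValueError(f"role {role!r} appears twice")
        by_role[role] = gloss
    if set(by_role) != {"S", "V", "O"}:
        raise ValueError(f"expected exactly S, V and O, got {sorted(by_role)}")
    return [by_role[r] for r in ("S", "V", "O")]

# Example 3 (VOS): lexical transfer "fulfil the commitment the Prophet".
vos = [("V", "fulfil"), ("O", "the commitment"), ("S", "the Prophet")]
print(" ".join(reorder_to_svo(vos)))  # the Prophet fulfil the commitment
```

The verb stays in its base form at this point; inflecting it is the job of the generation stage.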

SYSTEM DESIGN AND ARCHITECTURE

The overall process of translating Classical Arabic text into English with the rule-based system is presented in Fig. 1. Broadly, the system can be divided into three stages: analysis of the source language (CA), transfer between the two languages and generation of the target language (English) (Shaalan, 2010). In more detail, the system involves the following steps.

Tokenisation: Tokenisation is a preprocessing task that divides the CA input sentence into individual words in preparation for the following steps.

For example, the tokenisation output of the sentence is as follows:
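A minimal tokeniser in the spirit of this step can be sketched as follows. It assumes words are separated by whitespace and that punctuation (Latin marks plus the Arabic comma, semicolon and question mark) should become separate tokens; `tokenise` is a hypothetical name. It does not split attached clitics such as the conjunction wa- ("and"), which a fuller Arabic tokeniser would usually need to handle.

```python
import re

# A token is either a run of non-space, non-punctuation characters or a single
# punctuation mark. \u060C, \u061B, \u061F are the Arabic comma, semicolon
# and question mark.
TOKEN_RE = re.compile(r"[^\s.,;:!?\u060C\u061B\u061F]+|[.,;:!?\u060C\u061B\u061F]")

def tokenise(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)

print(tokenise("Adda Al-rasoul ala'manah."))
# ['Adda', 'Al-rasoul', "ala'manah", '.']
```

The same pattern works unchanged on Arabic script, since it only distinguishes whitespace and punctuation from everything else.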

Morphological analysis: In this step, the morphological analyser analyses each word of the CA input sentence morphologically and applies certain rules before the derivation rules are applied (Habash, 2006).

Morphological rules rely on features of the input words, such as the gender and person of the subject or noun, the POS, and the category of the verb or adjective.

Fig. 1: General overview of the CAMT process

All of these features must be taken into account to obtain the correct derivation rules (Abu Shquier and Sembok, 2008).

Example 1:

In this example, five verbs relate to the subject; their translations are as follows:

Here, the task of the morphological analyser is to provide the gender, number and person of the subject before the derivation rules are applied. This information is given in Table 1.

Based on this information, the system can apply the correct derivation rules. For instance, the subject is male and singular, so whenever the English translation refers to this subject with a pronoun, the correct pronoun is “he”.
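The pronoun choice described here can be sketched as a lookup keyed on the features the morphological analyser supplies. This is an illustrative sketch with hypothetical names and feature encodings (person as 1 to 3, number as "sg", "du" or "pl", gender as "m" or "f"). English marks gender only in the third-person singular and has no dual, so those distinctions are collapsed; non-human referents ("it") are not handled.

```python
# (person, number) -> English subject pronoun; gender matters only for 3sg.
PRONOUNS = {
    (1, "sg"): "I", (1, "pl"): "we",
    (2, "sg"): "you", (2, "pl"): "you",
    (3, "pl"): "they",
}
THIRD_SINGULAR = {"m": "he", "f": "she"}

def english_pronoun(person: int, number: str, gender: str) -> str:
    """Map Arabic subject features to an English subject pronoun."""
    if number == "du":            # English has no dual: fall back to plural
        number = "pl"
    if person == 3 and number == "sg":
        return THIRD_SINGULAR[gender]
    return PRONOUNS[(person, number)]

# The male, singular, third-person subject of this example:
print(english_pronoun(3, "sg", "m"))  # he
```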

Table 1: Subject details provided by morphological analyser

Similarly, because the sentence is in the present simple tense, all verbs related to the subject must be converted to the appropriate present simple form. Since the subject is singular, the correct verbs to use in this example are:

Syntactic analysis: The syntactic analyser uses the Arabic dictionary and grammar rules to check the spelling and grammar of the CA input text, then uses this information to produce an analysis of the text structure (parsing). It first assigns all possible POS tags to each word of the CA input sentence, then applies the rules to choose the tags that allow all the words of the sentence to combine correctly. Finally, it converts the CA input sentence into a parse-tree data structure.

Example 2 of syntactic analyser output:


Table 2: Syntactic analyser POS output

Fig. 2: Parse tree of example 2

The POS output produced by the syntactic analyser for this example is shown in Table 2.

The final structure produced by the syntactic analyser for this example is represented in Fig. 2.
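The two-step procedure described for the syntactic analyser (assign every possible POS, then keep the combination the rules accept) can be sketched by enumerating candidate tag sequences. The lexicon, the rules and the transliterated words are hypothetical; `ktb` stands for the unvocalised form كتب, which can be read as a verb (kataba, "he wrote") or a noun (kutub, "books"). Enumeration grows exponentially with sentence length, so a practical parser would prune candidates (for example with chart parsing); this sketch only shows the idea and produces a flat tree.

```python
from itertools import product

# Hypothetical lexicon: every POS each word may take.
LEXICON = {
    "ktb": {"Verb", "Noun"},      # kataba "wrote" or kutub "books"
    "al-walad": {"Det-Noun"},     # "the boy"
    "al-risalah": {"Det-Noun"},   # "the letter"
}

# Hypothetical sentence rules, written as POS sequences.
RULES = {
    ("Verb", "Det-Noun", "Det-Noun"),   # verbal sentence, VSO
    ("Det-Noun", "Verb", "Det-Noun"),   # verbal sentence, SVO
}

def parse(words: list[str]) -> list[tuple[str, list[tuple[str, str]]]]:
    """Try every POS combination; each one a rule accepts becomes a flat
    tree ("S", [(tag, word), ...])."""
    options = [sorted(LEXICON[w]) for w in words]
    return [("S", list(zip(tags, words)))
            for tags in product(*options) if tags in RULES]

print(parse(["ktb", "al-walad", "al-risalah"]))
# [('S', [('Verb', 'ktb'), ('Det-Noun', 'al-walad'), ('Det-Noun', 'al-risalah')])]
```

The candidate (Noun, Det-Noun, Det-Noun) is generated too, but no rule accepts it, so only the verbal reading survives.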

Lexical transfer: This step performs dictionary translation. It uses the Arabic-English bilingual dictionary to look up the English meaning of each word in the CA phrase. The lookup is done word by word, keeping the same order as the CA source phrase. The output of this step is a list of CA words and their English equivalents.

Example 3:

The output of lexical transfer for this example is shown in Table 3.
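Word-by-word lookup that preserves the source order can be sketched as follows. The dictionary entries are hypothetical transliterations chosen to match the running "The Prophet fulfilled the commitment" example; `lexical_transfer` is a hypothetical name. Unknown words are passed through in angle brackets so that gaps in the dictionary stay visible rather than being silently dropped, which is a design choice of this sketch, not necessarily of CAMT.

```python
# Hypothetical Arabic-English bilingual dictionary (transliterated keys).
BILINGUAL = {
    "adda": "fulfil",               # base form; generation inflects it later
    "al-rasoul": "the Prophet",
    "ala'manah": "the commitment",
}

def lexical_transfer(words: list[str]) -> list[tuple[str, str]]:
    """Return (source word, English equivalent) pairs in source order."""
    return [(w, BILINGUAL.get(w.lower(), f"<{w}>")) for w in words]

pairs = lexical_transfer(["Adda", "Al-rasoul", "ala'manah"])
print(" ".join(english for _, english in pairs))  # fulfil the Prophet the commitment
```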

Structure transfer: This stage deals with the structure and patterns of the target sentence. Its task is to arrange the words of the target English sentence according to English grammar rules.

In example 3, the Arabic sentence has two noun phrases, NP1 and NP2. To obtain the correct English structure, structure transfer swaps the two noun phrases: NP1 of the Arabic sentence becomes NP2 of the English sentence and, conversely, NP2 of the Arabic sentence becomes NP1 of the English sentence. The structure transfer for example 3 is shown in Fig. 3.
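The NP swap can be sketched on a list of labelled phrase chunks. Because the Arabic sentence of example 3 is not reproduced here, the chunks below use placeholder words; `swap_noun_phrases` and the (label, words) chunk format are hypothetical. Only the first two NPs are exchanged, and any other phrases keep their positions.

```python
Chunk = tuple[str, list[str]]   # (phrase label, words)

def swap_noun_phrases(chunks: list[Chunk]) -> list[Chunk]:
    """Exchange the first two NP chunks (NP1 <-> NP2); leave the rest in place."""
    np_positions = [i for i, (label, _) in enumerate(chunks) if label == "NP"]
    out = list(chunks)
    if len(np_positions) >= 2:
        i, j = np_positions[0], np_positions[1]
        out[i], out[j] = out[j], out[i]
    return out

print(swap_noun_phrases([("NP", ["np1-word"]), ("NP", ["np2-word"])]))
# [('NP', ['np2-word']), ('NP', ['np1-word'])]
```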

Table 3: Lexical translation for example 3

Fig. 3: The structure transfer for example 3

Fig. 4: The structure transfer of example 4, AS: Arabic sentence, ES: English sentence

Generation: In this stage, the target language sentence reaches its final version, which should be correct in both grammatical structure and meaning. Generation involves two steps: morphological generation and syntactic generation. The morphological generator uses English grammar rules to construct the correct forms of inflected English words, while syntactic generation produces the English sentence in its final structure.

Example 4:

The output from the structure transfer stage for this example is:

The structure transfer stage always outputs the English verb in its default (base) form, which in example 4 is “visit”, as shown in Fig. 4; this form does not agree with the subject of the sentence. Because the aim of generation is to produce the final English sentence according to English grammar rules, the verb form must be corrected at this stage.

Fig. 5: The morphological generation of example 4, AS: Arabic sentence, ES: English sentence

All words of the sentence must appear in their correct forms, so the verb “visit” is re-inflected according to the rules of the English present simple tense. Since the subject of this sentence is singular, the correct verb form is “visits”. The English sentence generated for example 4, equivalent to the input CA sentence, is shown in Fig. 5.
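The present simple step of morphological generation can be sketched with the regular English spelling rules for the third-person singular. This is a hypothetical helper, not CAMT's rule set: it covers "be" and "have" plus the common -s, -es and -ies patterns, and would need an exception list for other irregular verbs.

```python
VOWELS = "aeiou"

def present_simple(verb: str, person: int, plural: bool) -> str:
    """Inflect a base-form English verb for the present simple tense."""
    if verb == "be":
        return "are" if plural else {1: "am", 2: "are", 3: "is"}[person]
    if plural or person != 3:
        return verb                         # only 3rd person singular changes
    if verb == "have":
        return "has"
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"                  # watch -> watches, go -> goes
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"            # carry -> carries
    return verb + "s"                       # visit -> visits

print(present_simple("visit", person=3, plural=False))  # visits
```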

RESULTS

The purpose of the experiments is to determine whether machine translation systems, specifically Google and CAMT, are accurate enough to translate Classical Arabic into English. The evaluation compares each system's output with the original human translation of the input text. We selected the iBLEU metric to evaluate CAMT's performance. Briefly, iBLEU is a visual and interactive automatic scoring tool for machine translation (Madnani, 2011).

Automatic evaluation of MT using the iBLEU metric: BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the quality of text that has been machine-translated from one natural language into another. Quality is taken to be the correspondence between a machine translation's output and that of a human translator: “the closer a machine translation is to a professional human translation, the better it is” (Papineni et al., 2002).
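For reference, the standard BLEU definition of Papineni et al. (2002) for a candidate translation of length c and a single reference of length r is given below. Here G_n is the set of distinct n-grams in the candidate, p_n is the clipped (modified) n-gram precision, BP is the brevity penalty and N is the maximum n-gram order (1, 2 or 3 in the experiments reported here). iBLEU's exact settings, such as tokenisation and smoothing, may differ.

```latex
\[
p_n = \frac{\sum_{g \in G_n} \min\bigl(\mathrm{count}_{\mathrm{cand}}(g),\ \mathrm{count}_{\mathrm{ref}}(g)\bigr)}
           {\sum_{g \in G_n} \mathrm{count}_{\mathrm{cand}}(g)}
\]
\[
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
\qquad
\mathrm{BLEU}_N = \mathrm{BP} \cdot \exp\Bigl(\frac{1}{N}\sum_{n=1}^{N} \log p_n\Bigr)
\]
```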

In the first experiment, we evaluated sample Google and CAMT translation outputs using the iBLEU algorithm. The evaluation was carried out sentence by sentence over the test case. After computing the n-gram score for every sentence of each MT output, we averaged the scores to measure the accuracy of each system. Table 4 presents the BLEU scores of Google and CAMT for 1-, 2- and 3-grams.

According to the iBLEU results, CAMT performs better than Google in translating Classical Arabic. As reported in Table 4, the average 1-gram score is 0.66176 for Google and 0.8941 for CAMT, and the average 2-gram score is 0.318855 for Google and 0.762805 for CAMT.

Table 4: iBLEU score for Google and CAMT systems

The average 3-gram score is 0.230153 for Google and 0.67686 for CAMT. On all three n-gram measures, therefore, CAMT generates better translations than Google when translating CA into English.
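The per-sentence scoring and averaging described above can be sketched in plain Python as an unsmoothed implementation of the BLEU definition of Papineni et al. (2002). It assumes whitespace-tokenised, lower-cased text and a single human reference per sentence; the function names are hypothetical, and since iBLEU's tokenisation and smoothing settings are not given here, the sketch is not expected to reproduce the Table 4 figures exactly.

```python
import math
from collections import Counter

def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(cand: list[str], ref: list[str], n: int) -> float:
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it occurs in the reference."""
    cand_counts = ngram_counts(cand, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    ref_counts = ngram_counts(ref, n)
    clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    return clipped / total

def sentence_bleu(cand: list[str], ref: list[str], max_n: int) -> float:
    """Unsmoothed BLEU-N for one sentence against one reference."""
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0                      # log(0) is undefined; no smoothing here
    c, r = len(cand), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)

def average_bleu(pairs: list[tuple[list[str], list[str]]], max_n: int) -> float:
    """Mean sentence-level score over (candidate, reference) pairs."""
    return sum(sentence_bleu(c, r, max_n) for c, r in pairs) / len(pairs)

cand = "the prophet fulfil the commitment".split()
ref = "the prophet fulfilled the commitment".split()
print([round(sentence_bleu(cand, ref, n), 4) for n in (1, 2, 3)])  # [0.8, 0.6325, 0.0]
```

The example shows why the generation stage matters for the score: leaving the verb in its base form costs one unigram, two bigrams and every trigram.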

CONCLUSION

This research was carried out with the aim of translating Classical Arabic (CA) text. Its main objective was to design and implement a machine translation system that translates text from Classical Arabic into English using a rule-based approach. To this end, we designed a set of rules based on Arabic and English grammar. The rules address the different sentence structures of CA and English, word agreement and word ordering. We used automatic evaluation (iBLEU) to assess the correctness of the system's output, comparing the MT output with the manual human translation of each test sentence. The results in Table 4 show that CAMT achieves good accuracy and outperforms Google in translating Classical Arabic. We attribute this to two factors: the dictionary used by CAMT contains words specific to Classical Arabic, and, unlike other MT systems (e.g., Google) that concentrate on Modern Standard Arabic, CAMT's rules were designed for Classical Arabic sentence structure. As future work, more rules need to be developed and implemented so that the system can achieve the best possible results with as few unsolved problems as possible.

REFERENCES
Abdelmeneim, S., 2008. The changing role of Arabic in religious discourse. Ph.D. Thesis, Indiana University of Pennsylvania, Indiana, USA.

Abu Shquier, M.M. and T.M.T. Sembok, 2008. Word agreement and ordering in English-Arabic machine translation. Proceedings of the International Symposium on Information Technology Volume 1, August 26-28, 2008, IEEE Xplore Press, Kuala Lumpur, Malaysia, pp: 1-10.

Alla, R., S. Richard and B. Elabbas, 2005. The challenge of Arabic for NLP/MT. Challenges Processing Colloquial Arabic, University of Illinois at Urbana-Champaign, USA.

Amna, M.H., 2011. Translation of Hadith from Arabic to English using rule-based approach. M.Sc. Thesis, Universiti Kebangsaan Malaysia, Bangi.

Attia, M.A., 2004. Report on the introduction of Arabic to ParGram. Proceedings of the ParGram Fall Meeting, Dublin, Ireland.

Chen, P.P.S., 1983. English sentence structure and entity-relationship diagrams. Inform. Sci., 29: 127-149.

Daniel, J. and H. James, 2009. Speech and Language Processing. Pearson Education, New Jersey.

Fauzan, M. and N. Othman, 2006. An Information Retrieval System for Quranic Texts: A Proposed System Design. Faculty of ICT, International Islamic University, Malaysia.

Habash, N., 2006. Arabic Morphological Representations for Machine Translation. Center for Computational Learning Systems, Columbia University, New York, USA.

Hatem, A. and N. Omar, 2010. Syntactic reordering for Arabic-English phrase-based machine translation. Commun. Comput. Inform. Sci., 118: 198-206.

Madnani, N., 2011. iBLEU: Interactively debugging and scoring statistical machine translation systems. Proceedings of the 5th International Conference on Semantic Computing, September 18-21, 2011, Palo Alto, CA., pp: 213-214.

Owens, J., 1984. Structure, class and dependency: Modern linguistic theory and the Arabic grammatical tradition. Lingua, 64: 25-62.

Papineni, K., S. Roukos, T. Ward and W.J. Zhu, 2002. BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, July 07-12, 2002, Philadelphia, Pennsylvania, pp: 311-318.

Perlmann, M., 1961. The Structure of the Arabic Language. Center for Applied Linguistics of the Modern Language Association of America, Washington, DC., USA.

Salem, Y., 2009. A generic framework for Arabic to English machine translation of simplex sentence using the role and reference grammar linguistic model. M.Sc. Thesis, Computing in the School of Information and Engineering, The Institute of Technology Blanchardstown, Dublin, Ireland.

Shaalan, K., 2010. Rule-based approach in Arabic natural language processing. Int. J. Inform. Commun. Technol., 3: 11-19.

Shenassa, M.E. and M.J. Khalvandi, 2008. Evaluation of Different English Translations of Holy Koran in Scope of Verb Process Type. Islamic Azad University Tehran, Iran.

Shirko, O., 2010. Machine translation of noun phrase: Rule based approach. Master Thesis, Universiti Kebangsaan Malaysia, Bangi.

©  2020 Science Alert. All Rights Reserved