When a computer translates an entire document automatically and after that
presents it to a human, the process is called Machine Translation (MT). MT sometimes
called Automatic Translation (AT) (Abu-Shquier, 2009).
MT is formally defined as the use of a computer to translate a message, typically
text or speech, from one natural language (human language) into another natural
language (Salem and Nolan, 2009). These definitions
involve accounting some process for the grammatical structure of each language
and using rules and grammar to transfer the grammatical of Source Language (SL)
into Target Language(TL). MT attempts to automate part or all of the process
of translating from one human language to another such as English and Arabic
with or without human assistance (Arnold et al., 1994;
Zughol and Abu-Alshaar, 2005). Depending on challenging
which the work in field of Machine Translation (MT) faced it, whereas requiring
accuracy and consistency at the same time. Among these challenges in MT which
involve ambiguity multiword units such as collocations, idioms and structural
differences units between languages (Nalluri and Kommaluri,
2011) for example word orderings that differ between languages. Another
structural difference problem is tense generation. Tenses may exist in one language
but in another may not. The lexical gaps also present challenges for translation
where the target language has to represent a word in the source language. As
well, the translation from Arabic to Malay is very challenged due to different
structures in both sources and target language. Because of this, the translation
accuracy requires understanding of context, besides understanding of the structure
and rules of both languages.
To shift from previous part, researchers has recently access to many systems, both commercial and research of varying levels of performance.
In case of Malay language, there has been some work on Malay to or from a few
languages like English and Japanese. Ogura et al.
(1999) introduced a prototype Japanese-to-Malay translation system based
on a semantic transfer method. This work address The different structure, where
Japanese and Malay have quite free word orders but the orders are very different,
where Japanese sentences are typically SOV, with the subject followed by the
object and the verb, whereas Malay is SVO, like English. In addition, the noun
phrase, in Japanese, modifiers come before the head noun (like in English),
whereas in Malay they typically follow the head. Abdual Rahman
and Abdul Aziz (2004) develop Example-Based MT system for English-Malay
translation. The system depends on example translations kept in a Bilingual
Knowledge .The work addressing the problem of many English words to be
represented in one Malay word, which then improves the linguistic quality
of translation by English to Malay EBMT.
In case of Arabic language, much research has been conducted, during the last
decade, in MT where the Arabic language was the focus. Most of them are on English
-Arabic MT. Also, there are many researches on Arabic MT to or from other languages
such as Chines. Shaalan (2000) addressed the translation
of the Arabic interrogative sentence into English for automating the translation
of user interfaces. This includes also the (imperative) form of the verbal sentence
(Shquier and Sembok, 2008). This work presents English
to Arabic approach for translating well-structured English sentences into well-structured
Arabic sentences, using a Grammar-based technique to handle the problems of
ordering and agreement. Shirko et al. (2010)
this system present dealing with word order which Arabic exhibits. This poses
a significant challenge to MT. The order in this study focus on noun phrase
only, usually adjective order. Salem (2009) developed
rule based approach using role and reference grammar based on Interlingua approach
to translate from Arabic to English; they used the representation of the logical
structure of an Arabic sentence in the proposed system, their desire to show
how characteristics of Arabic language will affect the progress of MT tool.
Newly works in Arabic (Habash and Hu, 2009). In this
study, they explore different ways of pivoting through English to translate
Arabic to Chinese. They use a standard phrase-based MT approach that is in the
same spirit of most statistical MT to handle different linguistic phenomena
in which Arabic, English and Chinese such as Subject-verb order, pro drop subject
and nominal modification. The word order of Prepositional Phrases (PP) which
differ between Arabic and chines.
Mat (2010) mention that the translation between Arabic
and Malay was begun early in the fifteenth century. In fact, there was a book
which had been translated on common belief (aqidah), entitled Mother of
the Evidence in 1575. The translation then was named The Beginning
of Guidance (Bidayah al-Hidayah). In spite of this, there is no one work
on Arabic -Malay machine translation so far. So this study aim to combine Arabic
and Malay in Machine translation for dialogue by using transfer based approach
at first time.
It is very important to classify the language text to separate templates, then,
develop machine translation of which of them. There are some templates according
||Letter templates, the structure of this template as the following:
letter date, letter header, letter, connectors, letter tail and sender name.
This includes many special forms which need special grammar
||Traditional template: This template used to satisfy the needs of writing
in so many fields like industry, sports, agriculture, economics and software
||The dialogues expression template is text that represents the dialogues
between two people or more. The dialogue is not traditional sentence but
it represents an interactive sentence like question phrase and incomplete
or complete sentence, such as what is your name? and My
name is Ahmed which represent question phrase and complete sentence,
Sentences types of dialogue are likely to be one of the following types:
||Fragment (word or phrase)
Statement type: This type may be a noun phrase as example 1, 2, 3 and 4 in Table 1, or verb phrase as 5,6 and 7 in Table 1.
|| Type of sentences in dialogue
Arabic is morpho-syntactically complex with many differences from Malay. We
describe here the prominent syntactic issues in which Arabic, Malay vary widely:
order verb phrase and noun phrase.
First, as for the word order of noun phrases (example 1, 2, 3 and 4) in (Table
1), Arabic and Malay are similar in that modifiers always precede the modified
such as example 2 in Table 1
kad matrik baru(noun +Adjective). However, if the noun attached with possessive
pronoun in Arabic structure, that lead to different order in the structure of
Malay sentence such as example 3 in Table 1
(noun+pronoun+adjective) where the possessive pronoun
connected with noun ,
in Malay kad matrik baru saya the possessive pronoun follow the
In Malay, if it has the structure (noun+adjective+ Adverb). The adverb comes
in front of adjective (Karim, 1995). In contrast, the
adverb in Arabic followed by the adjective (noun+adjective+adverb) such as example
which translate to Malay Perpustakaan sangat terlalu.
Secondly, Arabic verb subjects may be:
||Pro dropped (verb conjugated ) as example 5 in Table
the subject verb ()
is already connected to the verb ()
which indicates to a third person ()
(kami) the whole word meaning is kami pergi
||SVO the Similarities between Malay and Arabic languages both of them have
been treated as SVO language for example 6 or
||VSO the default word order in Arabic is to begin VSO. By contrast, Malay
language generally subject-verb languages as example 7 in Table
the subject ()
appears after initial verb in Arabic sentence but at the beginning of the
sentence in Malay language "Suami saya bekerja di universiti.
The morphology of the Arabic verb varies in the three cases. By contrast
Malay is generally subject-verb language
Interrogative (question) type: Interrogative sentences used in the dialogue is to ask a question, or search for information using intonation and words between questioning and explanation of the type of head start in two different sentences interrogative, which are:
||With interrogative pronoun: There are some interrogative
pronouns in both languages Arabic and Malay for example ,
"apa nama awak?", where the interrogative
gives the question meaning for the sentence. The following Table
2 realized the other interrogative pronoun that uses in dialogue
|| Interrogative with pronoun
|| Interrogative with prepositional phrase
|| Imperative type
In Table 2 the interrogative
has two meanings "apa and siapa", for "siapa" it used when asking about person
name only, the word apa is rendered by siapa. It is impolite to use interrogative
apawhen ask about name for person (Sulaiman, 2000).
In contrast, when asking about location name we use the word "apa".
||Interrogative with prepositional phrase: The second
type in interrogative sentence can also formed with interrogative prepositional
phrase as example
where the interrogative
consists of a preposition
and a interrogative
so called interrogative with prepositional Phrase. Among the interrogative
prepositional phrase are as following Table 3
Imperative type: An imperative sentence gives an order or makes a request.
To identify the imperative sentence by using imperative verb type at beginning
of sentence. as example 1 in table 4 or it may contain of
the function word ,
sila as second example in Table 4.
Fragment type: Fragment type is an incomplete sentences that are either
a word as
, or phrases such as .
DESIGN ARCHITECTURE AND ALGORITHM
Figure 1 represents the whole process. Various methods of analysis and transformation will be used before obtaining the final result. The methods which are chosen and the emphasis depends largely on the design of the system, however, the system includes the following stages.
After input the any type of dialogue sentences, the tokenization process is
tokenizing each sentence of dialogue for many tokens (words).
|| AMmtDS architecture
The processing translation is based on the transfer approach with three main
components: an analyzer, a transfer component and a generation component as
Analysis components: The analysis step comprises Arabic morphological analyzer and syntactic analysis. The most important steps of this phase as follows:
Morphological analysis of Arabic sentence: Morphological analysis is
the identification of a stem-form from a full (inflected) word-form (and sometimes
also the identification of the syntactic category of the stem) (Shaalan,
2000). Arabic language is considering rich inflectional language, affixes
can come anywhere in the word it can come at the first of the word (prefixes)
or at the end of the word (suffixes).
The process of Morphological analysis as follow:
||Scan each enters word to break it down into beginning additions
(prefix and suffix). What remain form breaking words is stem. Each word
will be passing through the following multi-level
||First level: The input words without prefixes and suffixes, which
means by making comparison the input word with the word stored in the database,
then tokenizing it. If the word is not in the database then go to the second
||Second level: The input word without prefixes and with suffix,
which means to break down all the possible suffixes and then go to the database.
If the word is not in the database then go to the third level.
||Third level: The input words without suffixes and with prefix,
which mean to isolate all the possible prefixes and then go to the database.
If the word is not in the database then we go to the forth level
||Fourth level: The input words with prefixes and suffixes, which
means that we have to isolate all the possible prefixes and suffixes and
then compare it with word stored in database. If the word is not in the
database then we consider it no word
After these process of morphological analysis the assumed example
it gives the result as the Table 5, where the each word analysis
to prefix , stem and suffix.
has morphological analysis the stem
and suffix .
The other words
the morphological has stem only because they do not attach with prefix or suffix.
Syntactic analysis: After morphological analysis the syntactical analysis is the process of analyzing a sequence of tokens (words) to determine its linguistic parts with respect to a given formal grammar. Syntactical analysis start with detecting the basic grammatical categories of V (Verb), N (Noun), Adj (Adjective), Adv (Adverb), Prep (Preposition), Conj (Conjunction), PN (Proper noun), Interrogative (Interr) Pronoun (Pron), function word (funcw) and their features sg (singular), pl (plural).
In Syntactical analysis, will be divided the sentence into smaller grouping
according to their syntactic function. For example in Table 5
can be representation this sentence in next Table 6, where
has grammatical categories verb and the grammatical categories of suffix is
pronoun and has features plural (pl), where the analysis of word
has grammatical categories as preposition (Prep) and Malaysia is proper noun
The transfer components: This second stage in translation, the transfer
component is a system of rules that relate words and structures in one language
to words and structures of another language (Arnold et
al., 1994). Transfer starts with the output of the analysis phase and
ends where the phase of generation starts. The transfer process is carried out
by the help of a database. There two process of transfer: lexical transfer,
Lexical transfer: Lexical transfer applied at the word level. This maps Arabic lexical units and morphological features to their Malay equivalent.
Structural transfer: Structural transfer is applied transfer at the
sentence level. This maps the Arabic syntactic structure result from lexical
transfer to the equivalent Malay syntactic structure. The transfer of structural
rules assigns restructuring of the linguistic parts of the source language and
reordering the words according to linguistic parts of the target language. It
is provide the rules for converting source language into equivalent target language
For the example in Fig. 2 it's carry statement type in dialogue
and has the pattern: V PRON PERP N which is VSO should be transferred to the
pattern: PRON V PERP N, to SVO when translated to Malay should be reorder to
equivalent Malay language structure.
|| Morphological analysis
|| Lexical transfer
For the example
its carry interrogative type in dialogue and has the pattern: INTERR NEG PRON
V N should be transferred to the pattern: INTERR PRON NEG V N ? Translated to
Malay should be reorder to equivalent Malay language.
Another example of transferring rule, for imperative sentence its verb carries
command meaning and it has the pattern: V PRON N should be transferred into
the same pattern: V PRON N. This procedure illustrated in the sentence .
Generation components: The last step in translation process is generation of the target language input. The morphological generation module is responsible for synthesizing the target words in its right form. The input to this module is the origin of the target words which is passed by transfer phase with some information about each word such as number, tense, of noun.
This module synthesizes gives the rights form tense of verbs and the right number of nouns. But the tense of verb not marked in Malay language. However, to plural of noun in Malay by reduplicated the noun.
RESULTS AND DISCUSSION
In general, the aim of this experiment is to investigate whether AMmtDS machine translation system, sufficiently robust to be translated from Arabic dialogue to Malay. The experiment was based on the compare the output AMmtDS system with human translation. The experiment was applied on the set of test data include 56 sentences from university campus dialogues.
There are problems that appeared in machine translation from Arabic to Malay,
implementation by AMmtDS system then classify the problems and assigns suitable
scores for them. The score is given by human expert in translation and it measures
the differences between the human translations depending on amounts of the magnitude
of error mismatch in structure or meaning according effect of problem in sentence
which expressed in a hypothetical translation:
The Error mismatch: There are some mismatches of error test examples which arise some problems in the output target sentence. The following specifies the problems that appeared in the generation target sentence:
||Synonyms problem. This problem shown because different synonyms
of words are involved, such as the verb
could be translated to datang, or berasal
||Ambiguity. This problem appeared because the translation some words has
more than one meaning. For example the meaning of subject pronoun .
Refer to subject pronoun saya or refer to pronoun awak. The structure of
sentence is correct grammatically but it isnt give correct meaning
in dialogue according who speaker and listener. Also interrogative maybe
translate to siapa or as preposition dari or daripada
||Addition and deletion. This problem appeared because the original translation
contains extra words that have no corresponding words in
||Target language translation or target language has no corresponding words
in original translation
This problem appeared because in some situations an exception is made to
to separate two or more nouns. But in Malay used between the last noun and
noun which preceded it only
||Order of demonstrative: This problem appeared because the order
of demonstrative has different order in target language according using
it in dialogue as pronoun demonstrative or as adjective demonstrative such
it translate to buku ini or ini buku.
Experiential results: Table 7 shows the sample for sentences of dialogue. In addition, it shows the output of sentences in AMmtDS system compared with original translation with number of problem that caused to mismatch the translation, then the score for each sentence.
We show the first row in AMmtDS output is match the original translation so it have score 10. In the fifth row the output is most match the original translation because according the problem 2 so it have score 8.
The result of translation evaluation as showing in Table 8 has performed by counting out the total score of sentences. The total score of whole sentences which is 483 divided by the whole number of sentences which (56*10) and finding their overall percentage which is 89.4.
The Table 8 showed that AMmtDS has scored the highest percentage. The significant improvement is attributed to the use of specific rules of dialogue sentences. We have already developed transfer-based framework for machine translation from Arabic dialogue to Malay. The translation form Arabic to Malay was newly developed well in this study. Various rules were discovered and developed in order to be able to cover more problems for Arabic dialogue sentences to Malay.
|| Syntactic analysis
|| Test suite
|| Result of AMmtDS system
This study has been concentrated on issues in the implementation of a transfer-based MT system, which translates the dialogue into Malay. The dialogue is not traditional language but it is interactive which contain interrogative, imperative, statement or fragment sentences. We showed that the transfer based approach is promising and used to automate the translation of set of dialogues of 56 sentences from university campus and that system get 89.4% of correct translation. In future works, solve the ambiguity and the meaning problems by create specialized lexicons, study the more structure and design that developed to execute more complex sentences format which used in dialogue because this study focused on the most structure that used but not all. These improvements will raise the correctness of the translations from 89.4-100%.