
English-Arabic Hybrid Machine Translation System using EBMT and Translation Memory

January 2019
International Journal of Advanced Computer Science and Applications 10(1)

DOI:10.14569/IJACSA.2019.0100126

License
CC BY 4.0

Authors:

Rana Ehab

Modern Academy

Eslam Amer

Queen's University Belfast

Mahmoud E. A. Gadallah

Modern Academy


(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 10, No. 1, 2019

195 | P a g e

www.ijacsa.thesai.org

English-Arabic Hybrid Machine Translation System

using EBMT and Translation Memory

Rana Ehab1, Mahmoud Gadallah3

Computer Science Department, Modern Academy for

Computer Science and Management Technology

Cairo, Egypt

Eslam Amer2

Computer Science Department

Misr International University

Cairo, Egypt

Abstract—Highly accurate English-to-Arabic machine translation is not yet available because of the difficult morphology of the Arabic language. This paper introduces a hybrid machine translation system that combines the Example-Based Machine Translation (EBMT) technique with a Translation Memory. Two datasets, constructed from internal medicine publications and the Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic, were used in the experiments. To examine the accuracy of the system, four experiments were run: the first used an Example-Based Machine Translation system, the second used Google Translate, the third used Example-Based translation with Google Translate, and the fourth used the proposed system, which combines Example-Based translation with a Translation Memory. Measured by BLEU, the proposed system achieved the highest scores: 77.17 on the first dataset and 63.85 on the second.

Keywords—Hybrid machine translation system; translation memory; internal medicine publications; Google Translate; BLEU

I. INTRODUCTION

The first conference on machine translation (MT) was held in 1952. The first demonstration of a translation system followed in January 1954; it attracted a great deal of attention, and progress has not stopped since [1]. Because language technologies are now very successful, machine translation has been applied to the medical domain [2], and the quality of these technologies is improving rapidly [2]. If machine translation researchers manage to develop a perfect multilingual machine translation system, people who speak different languages will be able to share ideas and information worldwide on every topic, such as business, economics, education, politics and socio-cultural matters [3]. A machine translation system able to translate any text in any domain at the required quality is expected in the not-too-distant future [2]. Machine translation must offer a reasonable approach to translating terms in order to meet commercial needs [4]. Generally users are



it means [2]. However, some applications require much more than this [2]. For example, in the medical field the beauty and correctness of the text may not be important, but the precision and efficiency of the translated message are very important [2]. Machine translation systems can be used to translate medical records [2].

The most important task in providing high-quality medical services is communication between physicians and patients [5]. If physicians and patients do not share a common language, diagnosis and treatment become more difficult because the language barrier prevents effective communication [5]. Another case is that of people who travel to receive high-quality or affordable medical treatment that is not available in their home country [5]. Translating medical information and making it understandable benefits both physicians and patients [6]. For example, Healthcare Technologies for the World Traveller confirms that a foreign patient may need a description of their diagnosis together with a full set of related information [2].

The world has become a small village because of the rapid changes in information and communication technology: through the internet, people from all over the world can connect with each other in dialogue and communication [7]. The internet also drove the development of computer-based translation tools, namely translation databases and translator workstations such as Google Translate (GT), Bing Translate, Yahoo, Babel Fish and Systran [8].

The use of websites for translation has become widespread. However, translating a medical text is not as easy as translating other English text because of the complex information it contains, so translating a medical text with existing systems produces output with some problems. Because the languages belong to different categories, current methods are far from being of practical use, especially in English-to-Arabic medical translation.

In the medical domain most institutional and research information is available as English text [9]. People who do not know the English language well cannot make use of this information without help [10]. The task, therefore, is to help everyone use the web, which can be achieved with automatic language translators [10]. Because so much information flows through the web in foreign languages, the use of machine translation technology is a must [11].

Most research in Arabic machine translation concentrates on translation between English and Arabic because English is a universal language [12]. Such translation helps to simplify communication between the Arab world and other countries [12]. This is why translation from English to Arabic was chosen for this work.

The field of machine translation research is now largely dominated by corpus-based, or data-driven, approaches [13]. Although Example-Based Machine Translation (EBMT) and Statistical Machine Translation (SMT) are both corpus-based models, each has its own advantages and disadvantages [14]. EBMT can work well with limited training and testing datasets, unlike SMT, which needs a large dataset to produce a useful translation [15]. EBMT also works well when the training and test data are similar in nature, and its output improves when segments of a test sentence that occur in the source side of the example base are reused. The idea behind EBMT is to retrieve translation examples of similar sentences.

Example-Based Machine Translation is often linked with Translation Memory [15]. A Translation Memory (TM) is a database, widely used today as a Computer-Assisted Translation (CAT) tool, that stores already translated texts for future reuse [16]. The two are similar in that both reuse examples from existing translations. The main difference is that EBMT is an automated translation technique, whereas a Translation Memory is an interactive tool for the human translator [15].

Besides the technique used to build the translation system, the dataset used to train and test it is important. Some machine translation systems were evaluated poorly because of the dataset used, so the quality of the dataset must be considered carefully when building a machine translation system. Two datasets were therefore used in the experiments in this paper. The first dataset was constructed from internal medicine publications in [17]. The second was constructed from internal medicine publications and the Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic, an English-Arabic medical dictionary that is described later.

This paper describes an attempt to use the Example-Based Machine Translation technique together with a Translation Memory to translate English medical text into Arabic. Because the constructed datasets are not large, EBMT, which works well with limited training and testing datasets, was the best choice. The resulting translation also benefits from the advantages of Translation Memory, which are consistency, speed and cost saving [16]. Further reasons for using a Translation Memory, which in its simplest form is a database, are the need for consistent use of terminology, the sharing of common resources, and the reuse of already translated and revised text [16].

The rest of the paper is organized as follows: the second section describes recent related work on machine translation in the medical and non-medical domains, the third section describes the issues that face the medical domain, the fourth section describes the datasets and the English-to-Arabic hybrid translation system that was built, the fifth section describes the experiments used to evaluate the system, the sixth section shows the evaluation of the experiments using the BLEU metric, and the last section gives a brief conclusion.

II. RELATED WORK

Many machine translation systems for the medical domain have been developed for various languages using different machine translation approaches. Machine translation systems have also been developed to translate English text into Arabic, but not in the medical domain.

Dandapat, et al. [15] used Example-Based Machine Translation and Translation Memory to translate medical text from English to Bangla. They translated medical receptionist dialogues, primarily about appointment scheduling. They first collected their data and then built a Translation Memory automatically from a corpus of patient dialogues using the Moses toolkit. They created two Translation Memories: the first contains aligned phrase pairs and the second contains the word-aligned file [15].

They ran five experiments to measure the accuracy of their system. The fourth experiment, in which they used their system with both Translation Memories, achieved the highest accuracy, 57.56 [15]. Although this was the highest accuracy, some errors appeared. The first was wrong source-target equivalents in both Translation Memories [15]. The second arose in the recombination step, where some words were translated separately [15].

Névéol, et al. [18] built a statistical machine translation system to translate systematic reviews from English to French. They used three different datasets and built five systems. In the evaluation, the last system, which used the Cochrane translation table and an integrated translation table between EMEA and WMT, achieved the best accuracy (40.00 BLEU) [18]. Subalalitha, et al. [19] also used statistical machine translation, to translate from English to Hindi, and achieved an accuracy of 73.43.

Renato, et al. [20] discussed translating clinical term descriptions from Spanish to Brazilian Portuguese. The HIBA dictionary was used as the Spanish dictionary, and Portuguese medical terms were collected from several sources. They ran and evaluated two experiments, in each of which they translated with Bing, Google Translate and their own system, M-SMT. Their system achieved the highest BLEU score in both experiments: 58.9 in the first and 86.7 in the second [20].

All translation systems achieved higher scores in the second experiment. However, despite the high scores, some errors remained [20]: out-of-vocabulary (OOV) words were usually translated into English or left in Spanish; part of the corpus contained words with European Portuguese spelling; and compound medical terms, especially drug names with a hyphen, were possibly misaligned in training.

Li, et al. [21] developed a hybrid translation system that combines the dictionary-based machine translation technique with the statistical machine translation technique. They translated query terms in the medical domain from English to German and vice versa [21]. Their corpus was a mix of several corpora. They ran two experiments to evaluate their system: the first used a phrase-based machine translation system and the second used their own system [21]. Their system scored 15.3 for English to German and 24.5 for German to English, which was higher than the phrase-based system.

The authors of [22] aimed to translate medical data from English to Polish and vice versa, so they developed an SMT system for this purpose. Their dataset was the European Medicines Agency (EMEA) data. They ran 13 experiments to evaluate their system. The results showed that translation from Polish to English scores better than translation from English to Polish [22]. The fifth experiment achieved the highest BLEU scores of all the experiments: 76.34 for Polish to English and 73.32 for English to Polish [22]. Johanna Johnsi Rani G, et al. [23] also used SMT, to translate medical reports from English to Tamil. They compared their system with Google Translate and achieved better accuracy.

The authors of [2] built a machine translation system that relies on a neural network. Their corpus was derived from the European Medicines Agency (EMEA) parallel corpus. The system translates Polish medical text into English medical text and vice versa [2]. They ran three experiments to evaluate their system. It scored 24.32 for Polish to English and 17.50 for English to Polish [2], which was lower than the other two experiments. Artetxe, et al. [24] also used a neural network, to translate from French and German to English, but achieved low accuracy.

Amer, et al. [25] built Wiki transpose, a query translation system for cross-lingual information retrieval (CLIR) that relies on Wikipedia as a source of translations. They used the system to check how reliable Wikipedia is for obtaining corresponding translations of English-to-Portuguese and Portuguese-to-English queries [25]. They ran two experiments for their evaluation. The first used the English Open Access, Collaborative Consumer Health Vocabulary Initiative dataset [25]. The second used a collection of Portuguese terms that medical experts had rated as medical terms. Wikipedia coverage reached about 81% for single English terms and about 80% for single Portuguese terms [25].

Rana Ehab, et al. [17] built a machine translation system that uses the Example-Based Machine Translation technique to translate English medical sentences into Arabic medical sentences. They constructed their parallel corpus from internal medicine publications for internal diseases only [17]. Only the matching stage of the example-based technique was used, to find the closest example in the parallel corpus, which served as the example base for the system. In a second experiment the same data were translated with Google Translate to assess the accuracy of their system, but Google Translate achieved the higher BLEU score: 53.56 against 48.86 for their system [17].

Shaalan, et al. [12] built a system to translate English noun phrases into Arabic using the transfer machine translation approach [12]. They used 50 titles from the computer science domain as the training dataset and another 66 real thesis titles from the same domain for testing. Their evaluation showed that the system translated 47 noun phrases correctly and that the remaining 109 noun phrases had problems [12].

Shaalan, et al. [26] built a translation system using the rule-based transfer machine translation technique to translate expert systems in the agriculture domain from English to Arabic and vice versa. The translation process covers the knowledge base, in particular prompts, responses, explanation text and advice. These expert systems were built at CLAES [26].

They used a set of 100 real parallel phrases and sentences from the English and Arabic versions of agricultural expert systems at CLAES as gold-standard reference test data [26]. They carried out the evaluation through two experiments. The second experiment achieved higher accuracy than the first: 0.6427 for the English-to-Arabic direction and 0.8122 for the Arabic-to-English direction [26]. Kouremenos, et al. [27] also used a rule-based technique, to translate Greek into Greek Sign Language.

Al-Taani, et al. [28] translated well-structured English sentences into well-structured Arabic sentences using a rule-based approach. They used 184 English proverbs from Al-Mawrid, an English-Arabic dictionary [28], and 125 well-structured English sentences from various textbooks. In the evaluation, 57.3% of the first dataset and 84.6% of the second dataset were translated correctly [28]. These results were lower than expected for several reasons, among them that proverbs have no specific structure and are closely tied to the culture of particular nations [28]. Mouiad Alawneh, et al. [3] also translated well-structured English sentences into well-structured Arabic sentences, but used a grammar parser and the example-based machine translation technique.

As the approaches above show, most machine translation work in the medical domain used the statistical machine translation technique or the example-based machine translation technique. There was also an attempt to use a neural network, but SMT achieved the higher score in that comparison. In addition, one approach [15] used Translation Memory with the example-based technique and achieved higher scores than the example-based technique with SMT. For this reason, the proposed system is built using Example-Based Machine Translation and Translation Memory.

Most English-to-Arabic machine translation systems outside the medical domain used the rule-based machine translation approach because they need to analyze the English text in terms of morphology, syntax and semantics, which is not important for English text in the medical domain. Hybrid machine translation merges the strengths of the rationalist and empiricist methods [29].

CLAES stands for Central Laboratory of Agricultural Expert Systems, Agricultural Research Centre (ARC), Egypt, http://www.claes.sci.eg

III. ISSUES WITH MEDICAL DOMAIN

According to [17], there are two main issues in constructing an efficient machine translation system for the medical domain: parallel corpus collection, and the size and type of the corpus. A third issue is building a Translation Memory [15]. Medical terms differ from other English terms, so building an efficient medical corpus is not an easy task. Two datasets were used to evaluate the system. The first is the dataset of [17], which was built from internal medicine publications. The second was constructed from internal medicine publications and the Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic, an English-Arabic medical dictionary, to form an English-Arabic parallel corpus. The first corpus consists of 259 medical sentences with 8 words per sentence on average [17]. The second corpus consists of 509 medical sentences.

The proposed system uses Example-Based Machine Translation, a data-driven machine translation technique [15, 17] that needs a machine-readable parallel corpus. When building such a system, the number of examples needed must therefore be known. Compared with similar systems, the first corpus is very small, whereas the second corpus is larger than the other corpora listed in Table 1. The first corpus is small because it was built only from medical data on internal diseases, whereas the second corpus covers more diseases and also uses the Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic. As Table 1 shows, many systems have been built using a small corpus.

Because no existing Translation Memory was accessible, a Translation Memory for the proposed system was built automatically using the Moses toolkit. It was created from the word-aligned file produced by the Moses word alignment (GIZA++) [15]. Because each source word has multiple target equivalents, all of the equivalent words were kept in sorted order. This Translation Memory helps in the second stage of the system, which finds the alignment between the example retrieved from the database and its translation.
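The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be a list of (source word, target word) pairs already extracted from a word-alignment file, "sorted order" is assumed to mean most frequent equivalent first (the paper does not say which order is used), and the function name and the Arabic sample words are hypothetical rather than taken from the paper's corpus.

```python
from collections import Counter, defaultdict


def build_translation_memory(aligned_pairs):
    """Map each source word to all of its target equivalents.

    Assumption: equivalents are ordered by descending alignment count,
    with alphabetical order as a deterministic tie-breaker.
    """
    counts = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        counts[source_word.lower()][target_word] += 1
    return {
        source: [target for target, _ in
                 sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for source, c in counts.items()
    }


# Hypothetical alignment output; the word pairs are illustrative only.
pairs = [("liver", "الكبد"), ("liver", "كبد"), ("liver", "الكبد"),
         ("kidneys", "الكلى")]
tm = build_translation_memory(pairs)
print(tm["liver"])  # the equivalent seen twice comes first
```

Keeping every equivalent, instead of only the best one, leaves later stages free to choose among candidates.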

TABLE I. SOURCE OF MEDICAL TERMS OF PORTUGUESE LANGUAGE

System     Language Pair          Size
TTL        English -> Turkish     488
TDMT       English -> Japanese    350
EDGAR      German -> English      303
ReVerb     English -> German      214
ReVerb     Irish -> English       120
METLA-1    English -> French
METLA      English -> Urdu

Moses (http://www.statmt.org/moses/) is a SMT system that

automatically trains a translation model for any language pair.

IV. OUR APPROACH

A. Data Preparation

Words have different meanings in each domain, so their translation has to fit the expected representation in that domain. It is therefore important to identify them correctly to ensure that they are treated consistently throughout the technical text [30].

As mentioned in the previous section, two datasets were used. The first was constructed in [17] from the indications and side effects in internal medicine publications, in both English and Arabic, for internal diseases only.

The second dataset was constructed from the indications and side effects in internal medicine publications for multiple diseases and from the Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic. The English data were then processed by tokenization, lowercasing and final cleaning. Pre-processing Arabic sentences could change their meaning because of the morphology of the language, and meaning is very sensitive in the medical domain, so no pre-processing was applied to the Arabic data.
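A minimal Python sketch of the English-side pre-processing is given below. The paper does not specify its tokenizer, its cleaning rules or its stop-word list, so the regular expression and the `STOP_WORDS` set here are assumptions chosen only to make the example concrete. Stop-word removal is exposed as an option because it is mentioned for the input sentence in the system description, not for dataset preparation.

```python
import re

# Assumed stop-word list; the paper does not list the words it removes.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "to"}


def preprocess(sentence, remove_stop_words=False):
    """Lowercase, tokenize and clean an English sentence.

    Tokens are runs of letters or digits, optionally joined by a hyphen
    or apostrophe, so punctuation is dropped as the cleaning step.
    """
    tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", sentence.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(preprocess("Impaired function of the Liver."))
# ['impaired', 'function', 'of', 'the', 'liver']
```

The Arabic side is deliberately left untouched, in line with the decision described above.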

B. Translation System

In example-based translation, the system contains a set of source-language sentences and their corresponding target-language sentences. At run time, example-based translation uses this bilingual corpus as its database, which is stored in the translation memory. With a translation memory, the user translates text and the translations are added to a database; when the same sentence occurs again during translation, the previous translation is inserted into the translated document. The advantage is that the translation memory saves the user the effort of re-translating the sentence, which saves both processor time and user time. EBMT can help to overcome some of the weaknesses of other approaches [31].

Building on the advantages of the Example-Based Machine Translation approach and the Translation Memory, a hybrid system that uses both to translate English medical sentences into Arabic medical sentences was developed. Arabic was chosen as the target language because there are many possible ways to express the same sentence in Arabic, which poses a significant challenge to MT [3]. The accents of modern Arabic are well known for agreement asymmetries that are sensitive to word-order effects. Like all Example-Based Machine Translation systems, the proposed approach consists of three stages: matching, adaptation and recombination [15].

1) The proposed hybrid machine translation system: The

hybrid machine translation system in Fig. 1 is used to translate

medical sentences from English to Arabic using Example

Based machine translation and Translation memory.

The user first enters the English input sentence, which then passes through the pre-processing steps: tokenization, lowercasing and stop-word removal. The sentence is then sent to the example base, which is the parallel corpus, to find the closest example by computing the edit distance between the input sentence and each example, as discussed later. The example with the highest score is the closest example. Its translation is then retrieved from the parallel corpus.

Next, the alignment between the input sentence and the example is determined in order to find the unmatched portions; this is done while computing the edit distance. The alignment between the example and its translation is also determined, using the Translation Memory, to find their unmatched portions, as described in Section 4.2.3. The unmatched portions of the translated sentence are then replaced with the unmatched portions of the input sentence, with segments added or substituted as needed, as discussed later.

Finally, the untranslated segments that were inserted are translated using the Translation Memory and added to the translated sentence, which yields the final translated sentence.

2) Matching stage: In this stage the task is to find the source examples in the database that most closely match the input sentence. This is done using the word-based edit distance metric (1) (Levenshtein, 1965; Wagner and Fischer, 1974) [16].

$$\mathrm{Score}(S_i, S_e) = 1 - \frac{ED(S_i, S_e)}{\max(|S_i|, |S_e|)} \qquad (1)$$

Here Si denotes the input sentence and Se denotes an example sentence from the database, |Si| and |Se| denote the lengths of the input sentence and of the example sentence extracted from the database, and ED(Si, Se) is the word-based edit distance between Si and Se.

Based on this scoring technique, the examples in (2) were retrieved from the database for the input sentences in (1).

(1) a- impaired function of the liver

b- arthrosis

c- nasal congestion

(2) a- impaired function of the kidneys

b- arthritis

c- lung congestion
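The matching step can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: it assumes sentences are split on whitespace, that the score is one minus the word-level edit distance divided by the length of the longer sentence, and that ties are broken by database order. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance (Wagner-Fischer dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score(s_i, s_e):
    """Similarity in [0, 1]; 1.0 means the token lists are identical."""
    longest = max(len(s_i), len(s_e))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s_i, s_e) / longest


def closest_example(input_sentence, examples):
    """Return the example with the highest score against the input."""
    tokens = input_sentence.lower().split()
    return max(examples, key=lambda e: score(tokens, e.lower().split()))


examples = ["impaired function of the kidneys", "arthritis", "lung congestion"]
print(closest_example("impaired function of the liver", examples))
# impaired function of the kidneys
print(score("nasal congestion".split(), "lung congestion".split()))
# 0.5
```

Under a strictly word-level distance, a one-word input such as "arthrosis" scores 0 against every non-identical example, so this sketch cannot by itself single out "arthritis" for pair (b); how such ties are resolved is not stated in the text.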

The associated translations St in (3) were then retrieved from the database for the sentences in (2). These translations are used in the following subsections to produce new translations.

(3) a-







3) Adaptation stage: In this stage the unsuitable fragments are extracted from the translation retrieved in the previous stage. For this purpose the three sentences obtained from the previous stage are aligned: the input sentence Si, the closest source example Se and its translation St.

Aligning the input sentence Si with the closest example Se is done while computing the edit distance in equation (1). This is shown in examples (4), (5) and (6), where (4a1) is aligned with (4a2), (5a1) with (5a2) and (6a1) with (6a2). The closest example Se is then aligned with its translation St using the Translation Memory that was built, so that (4a2) is aligned with (4a3), (5a2) with (5a3) and (6a2) with (6a3). In the next stage the unmatched fragments are replaced and the matched fragments are kept unchanged.

Fig. 1. Hybrid Machine Translation System. (Flowchart boxes, in order: Input Sentence; Pre-processing; Example Based; Compute ED; Highest Score; Example Based Translated Sentence; Un-matched portions of Input and Example; Translation Memory; Align; Un-matched portions of Example and translated sentence; Replace & Add or Substitute; New Sentence with Un-translated segments; Translation Memory; Translate un-translated segments; Translated Sentence.)

(4) a- 1- impaired function of the [1: liver]
2- impaired function of the [1: kidneys]
3- [1: ]

(5) a- 1- [1: arthrosis]
2- [1: arthritis]
3- [1: ]

(6) a- 1- [1: nasal] congestion
2- [1: lung] congestion
3- [1: ]
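The source-side alignment in examples (4) to (6) can be illustrated with a short Python sketch. Note the substitution: the paper obtains the alignment as a by-product of the edit distance computation, whereas this sketch uses the standard library's `difflib.SequenceMatcher`, which finds matching blocks by a different algorithm but yields the same unmatched spans for these examples. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def unmatched_portions(input_tokens, example_tokens):
    """Return (input fragment, example fragment) pairs that do not match."""
    matcher = SequenceMatcher(None, input_tokens, example_tokens,
                              autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            spans.append((input_tokens[i1:i2], example_tokens[j1:j2]))
    return spans


print(unmatched_portions("impaired function of the liver".split(),
                         "impaired function of the kidneys".split()))
# [(['liver'], ['kidneys'])]
```

An empty list on either side of a pair corresponds to a segment that exists in only one of the two sentences, which is the case the recombination stage handles by adding or deleting.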



4) Recombination stage: After the unsuitable fragments have been extracted in the previous stage, the next step is to adjust the retrieved translation. This is done by adding fragments from the input sentence (Si) to the translation equivalent sentence (St) or substituting them into it [16]. In example (4) the fragment { } in (4a3) needs to be replaced with {liver} from (4a1); in example (5) the fragment { } in (5a3) needs to be replaced with {arthrosis} from (5a1); and in example (6) the fragment { } in (6a3) needs to be replaced with {nasal} from (6a1). The results are the sentences in (7), (8) and (9).

(7) liver



(8) arthrosis

(9) nasal



The alignment is not always one-to-one. If the input sentence (Si) has extra segments that are not aligned with anything in the translation equivalent sentence (St), these segments are added to the final sentence; if the translation equivalent sentence (St) has extra segments, they are deleted from the final sentence. After this step, the untranslated segments are translated using one of two methods. The first uses the Translation Memory to obtain their translation. The second uses Google Translate, as a statistical machine translation system, to obtain their translation. The final translations obtained with the Translation Memory are shown in (10), (11) and (12).

(10)



(11)



(12)
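The substitution and translation steps of the recombination stage can be sketched together in Python. This is a simplified illustration, not the authors' code: it handles one-to-one substitutions only (no added or deleted segments), takes the first Translation Memory candidate as the translation, and leaves the source word in place when the memory has no entry. The function name and the Arabic strings are illustrative assumptions, not examples from the paper.

```python
def recombine(example_translation, replacements, translation_memory):
    """Swap unmatched target fragments for translated input fragments.

    replacements: list of (input fragment, target fragment to replace).
    translation_memory: dict mapping a source word to ranked candidates.
    """
    result = example_translation
    for source_fragment, target_fragment in replacements:
        candidates = translation_memory.get(source_fragment)
        # Assumption: fall back to the untranslated source word.
        new_fragment = candidates[0] if candidates else source_fragment
        result = result.replace(target_fragment, new_fragment, 1)
    return result


# Hypothetical data: a stored translation for "impaired function of the
# kidneys" and a memory entry for "liver".
stored = "اختلال وظائف الكلى"
memory = {"liver": ["الكبد"]}
print(recombine(stored, [("liver", "الكلى")], memory))
```

With an empty memory the same call returns the Arabic sentence with the English word "liver" in place of the replaced fragment, which corresponds to the intermediate forms in (7) to (9).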



V. RESULTS AND DISCUSSION

As stated earlier, two datasets were used in the experiments. For each dataset, four experiments were carried out to measure the accuracy of the proposed system using the bilingual evaluation understudy (BLEU) metric. The datasets were divided into one-word sentences, two-word sentences and multi-word sentences, and the experiments were repeated for each group. In the first experiment, Google Translate was used to translate the input sentences, as it is a statistical machine translation system [32] that is widely known for its robustness, good performance, and the fact that it does not require manually crafted rules [33]. In the second experiment, only the matching stage of EBMT was used: the closest translation was retrieved and taken as the translation of the input sentence. In the third experiment, the translation memory was used in the recombination stage to translate the unmatched portions. In the fourth experiment, Google Translate was used in the recombination stage to translate the unmatched portions. The BLEU score was used to evaluate the proposed system automatically. BLEU score captures the fluency of the translation.

Table 2 and Table 3 show the accuracy of the four experiments over the whole of each dataset. As shown, the proposed system, which uses both Example Based Machine Translation and Translation Memory, gave the best results of all the techniques.
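To make the evaluation measure concrete, the sketch below computes a sentence-level BLEU score against a single reference, using clipped n-gram precision and the brevity penalty. It is a minimal illustration, not the evaluation script used in the experiments: the paper does not state its BLEU configuration, so the absence of smoothing, the 0 to 100 scale, and the capping of the n-gram order for inputs shorter than four words are all assumptions made here. Reported BLEU figures are also usually computed over a whole test set rather than per sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on a 0-100 scale, single reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Assumption: cap the n-gram order so one- and two-word inputs can score.
    n_max = min(max_n, len(cand), len(ref))
    log_precisions = []
    for n in range(1, n_max + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * brevity * math.exp(sum(log_precisions) / n_max)


print(bleu("a b c d", "a b c d"))  # 100.0
```

Under this definition a one-word output scores either 0 or 100 for a given sentence, which is one reason scores for very short inputs can vary strongly with the choice of BLEU settings.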

The results in Table 4 and Fig. 2, and in Table 5 and Fig. 3, show that for one-word, two-word and multi-word translation the proposed approach achieved the highest score over the four experiments, whereas using Google Translate to translate the unmatched portions gave a very low score. The scores also increased when the input sentence consisted of multiple words.

As shown in Table 6, most machine translation systems in the medical domain used the statistical machine translation technique, which would give low accuracy on the dataset used here because of Arabic morphology and the size of the corpus. Their datasets consisted of systematic reviews, clinical term descriptions and queries taken from hospital data, whereas the core of the dataset used here came from internal medicine publications that are used daily by patients and may contain complex data that need translation.

TABLE II. SYSTEMS ACCURACIES FOR THE FIRST DATASET

| System | BLEU |
|---|---|
| Google Translate | 53.56 |
| EBMT | 48.86 |
| EBMT+ Translation Memory | 77.17 |
| EBMT+ Google | 73.07 |

TABLE III. SYSTEMS ACCURACIES FOR THE SECOND DATASET

| System | BLEU |
|---|---|
| Google Translate | 51.06 |
| EBMT | 50.82 |
| EBMT+ Translation Memory | 63.85 |
| EBMT+ Google | 61.43 |


TABLE IV. SYSTEMS ACCURACIES FOR THE FIRST DATASET FOR DIFFERENT INPUT SIZE

| System accuracy | Google Translate | EBMT | EBMT+ Translation Memory | EBMT+ Google |
|---|---|---|---|---|
| 1 word translation | 51.42 | 41.52 | 66.02 | 48.41 |
| 2 words translation | 51.32 | 47.36 | 59.21 | 19.89 |
| Multi words translation | 54.23 | 52.99 | 80.93 | 74.47 |

Fig. 2. Comparison between Systems Accuracies for Different Input Size in the First Dataset.

TABLE V. COMPARISON WITH OTHER SYSTEMS

| Reference number | Technique | Dataset type | The proposed system |
|---|---|---|---|
| | Statistical machine translation system | 609 systematic reviews from English to French | EBMT+ Translation Memory (translation system). The dataset used is built from internal medicine publications and Worldwide Arabic Medical Translation Guide Common Medical Terms sorted by Arabic, which is an English-Arabic medical dictionary. Translating from English to Arabic. |
| | Statistical machine translation system | English to Hindi | |
| | Statistical machine translation system | Clinical term descriptions from Spanish to Brazilian Portuguese | |
| | Dictionary based machine translation technique and Statistical machine translation technique | Query terms in medical domain from English to German and vice versa | |
| | Statistical machine translation system | Medical data from English to Polish and vice versa | |
| | Statistical machine translation system | Medical reports from English to Tamil | |
| | Neural networks | Medical data from English to Polish and vice versa | |
| | Neural networks | French and German to English | |
| | Example based machine translation technique matching stage | The internal medicine publications for internal diseases | |
| | Transfer approach | 50 titles from the computer science domain for training; 66 real thesis titles from the computer science domain for testing | Dataset is medical text. Using Example Based technique with Translation Memory. |
| | Rule-based transfer machine translation technique | 100 phrases and sentences from both English and Arabic versions of agricultural expert systems at CLAES | |
| | Rule-based | Translate Greek to Greek Sign Language | |
| | Rule based approach | Well-structured English sentences into well-structured Arabic sentences | |
| | Grammar parser and example based machine translation technique | Well-structured English sentences into well-structured Arabic sentences | |

[Figure legend: Google Translate, EBMT, EBMT+Translation Memory, EBMT+Google. Axis: BLEU Accuracy.]


TABLE VII. SYSTEMS ACCURACIES FOR THE SECOND DATASET FOR DIFFERENT INPUT SIZE

| System accuracy | Google Translate | EBMT | EBMT+ Translation Memory | EBMT+ Google |
|---|---|---|---|---|
| 1 word translation | 35.33 | 50.82 | .20 | 14.31 |
| 2 words translation | 44.71 | 50.70 | 28.46 | 16.79 |
| Multi words translation | 54.09 | 52.99 | 72.49 | 53.56 |

Fig. 3. Comparison between Systems Accuracies for Different Input Size in the Second Dataset.

The comparison also shows that most systems translating from English to Arabic outside the medical domain used the rule-based technique, in which the English input is analyzed morphologically, syntactically and semantically; this analysis is not necessary in the medical domain.

VI. CONCLUSION AND FUTURE WORK

A hybrid machine translation system that combines the Example Based machine translation technique with a Translation Memory was introduced in this paper to translate English medical terms into Arabic medical terms. It was compared with three alternatives: Google Translate alone, an Example Based machine translation system using the matching stage only, and a hybrid system using the Example Based machine translation technique with Google Translate.

The system that used the Example Based machine translation technique with a Translation Memory achieved the highest score of the four experiments. The reason is that the Translation Memory stores the translation of each medical term. In the recombination stage, the unmatched portions of the input sentence (Si) are added to the translated text (St) of the closest sentence (Se) from the database, and translating them with the Translation Memory ensures that they are rendered as the correct Arabic medical terms. The proposed system achieved 77.17% for the first dataset and 63.85% for the second. Google Translate translates some medical terms according to their general English meaning rather than their medical meaning. In addition, the matching stage alone produces sentences in which some words of the input sentence do not match the closest sentence from the database. Using Google Translate together with Example Based machine translation likewise translates some of the unmatched portions according to their general English meaning rather than their medical meaning.

The one-word, two-word and multi-word translation datasets all achieved high scores with our system, but the multi-word translation dataset achieved the highest accuracy: 80.93% for the first dataset and 72.49% for the second. The reason is that the training dataset contains more multi-word sentences than one-word or two-word sentences.

Adjusting the final result according to the morphology of the Arabic language could make the resulting translation more accurate.

REFERENCES

[1]           

       

International Journal of Computer Applications 121, no. 23 ,2015.

[2]      -based machine translation for

medical text domain. based on european medicines agency leaflet

 Procedia Computer Science, 64, pp.2-9, 2015.

[3] Alawneh, Mouiad, Nazlia Omar, T. Sembok, H. Almuhtaseb, and C.

Mellish., “Machine  International

Conference on Biomedical Engineering and Technology, 2011.

[4]      -specific expressions

.

[5] Neubig, Graham, et al., -reliability speech translation in

   The First Workshop on Natural Language

Processing for Medical and Healthcare Fields. 2013.

[6]            

      

    , In: Ninth Workshop on Statistical

Machine Translation, Baltimore, MD, USA Association for

Computational Linguistics, p.221-228, 2014 .

[7] Alsohybe, Nabeel T., Neama Abdulaziz Dahan, and Fadl Mutaher Ba-

  -translation history and evolution: survey for Arabic-

 arXiv preprint arXiv:1709.04685 ,2017.

[8]        

devices improving system of translating languages: what about their

        

Brazilian journal of cardiovascular surgery, 30(6), pp.664-667, 2015.

[9] Yepes, Antonio Jimeno, Elise Prieur-Gaston, and Aurélie Névéol,

 data to create parallel corpora for

 1, no.

1: 146, , 2013.

[10]  

International Journal of Science, Engineering and Technology

Research, 2(3), pp.pp-716, 2013.

[11] 

[12] Shaalan, Khaled, Ahmed Rafea, Azza Abdel Moneim, and Hoda

Baraka, "Machine translation of English noun phrases into

Arabic.", International Journal of Computer Processing of

Oriental Languages 17, no. 02, pp: 121-134, 2004.

[13] Gupta, Somya. "A survey of data driven machine Translation." Diss,

Indian Institute of ,2010.

[14] Costa-  

and comparison of rule-based and statistical Catalan-Spanish machine

translation systems Computing and informatics, 31(2), pp.245-270,

2012.


[15] Dandapat, S., Morrissey, S., Kumar Naskar, S. and Somers, H.,

  -based machine translation using

010.

[16] Seljan, Sanja, and Damir Pavuna. "Translation memory database in the

translation process." In Proceedings of the 17th International Conference

on Information and Intelligent Systems IIS 2006, pp. 327-332. Croatia,



[17] Rana Ehab   Example-based machine

translation: matching stage using internal medicine p 

International Conference on Software and Information Engineering

ICSIE, pp. 131-135, 2018.

[18] Névéol, A., Zweigenbaum, P., Max, A., Yvon, F., Ivanishcheva, Y. and

     ystematic reviews into

French, t 15(526), ,p.366K , 2013.

[19]     

      , International Journal of

Pure and Applied Mathematics, vol 118, no. 20, pp. 1649-1655, 2018.

[20] Renato, A., Castaño, J., Williams, M.D.P.A., Berinsky, H., Gambarte,

M.L., Park, H.J., Pérez-   A machine

translation approach for medical t HEALTHINF , pp. 369-378,

2018.

[21]  system description for

medical t    

on Statistical Machine Translation, pp. 229-232, 2014.

[22]      “Polish-English statistical machine

           

Internet Systems, Springer,Cham, pp. 169-179, 2015.

[23]            -

sensitive Machine Translation of Medical reports from English to

        

119, no. 16, pp. 297-304, 2018.

[24] Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho, "Unsupervised neural machine translation", In International Conference on Learning Representations (ICLR), 2018.

[25] Amer, E. and Abd- wikipedia be a reliable source for

translation? testing wikipedia cross lingual coverage of medical

d ng (IOSR-JCE), Volume

18, Issue 3, PP 16-22, 2016.

[26] Shaalan, Khaled, Ashraf Hendam, and Ahmed Rafea, "An English-

Arabic bi-directional machine translation tool in the agriculture

domain.", In International Conference on Intelligent Information

Processing, pp. 281-290. Springer, Berlin, Heidelberg, 2010.

[27] Kouremenos, Dimitrios, Klimis Ntalianis, and Stefanos Kollias, "A

novel rule based machine translation scheme from Greek to Greek Sign

Language: Production of different types of large corpora and Language

Models evaluation", Computer Speech & Language 51, pp.110-135,

2018.

[28] Al-Taani, Ahmad T., and Zeyad M. Hailat, "A direct English-Arabic machine translation system.", Information Technology Journal 4, no. 3, pp: 256-261, 2005.

[29]           

  , Procedia Engineering, 29, pp.3017-3022,

2012.

[30]        machine translation: a

survey, artificial intelligence r 42(4), pp.549-572, 2014.

[31] Artetxe, Mikel, Gorka Labaka, and Kepa Sarasola, "Building hybrid

machine translation systems by using an EBMT preprocessor to create

partial translations", In Proceedings of the 18th Annual Conference of

the European Association for Machine Translation, 2015.

[32] Costa-     Machine translation in medicine. a

quality analysis of statistical machine translation in the medical

  Conference on Advanced Research in Scientific Areas

(ARSA-2012), 2012.

[33]          

machine translation for biomedical tex      AMIA

Annual Symposium Proceedings, vol. 2011, p. 1290. American Medical

Informatics Association, 2011.
