GEMA Online® Journal of Language Studies, Volume 24(1), February 2024. http://doi.org/10.17576/gema-2024-2401-13 (eISSN: 2550-2131; ISSN: 1675-8021)
Evaluation of Instagram's Neural Machine Translation for Literary Texts:
An MQM-Based Analysis
Altaf Fakih a1
a.afakih97@gmail.com
School of Languages, Literacies and Translation
Universiti Sains Malaysia, Malaysia
Mozhgan Ghassemiazghandi b2
mozhgan@usm.my
School of Languages, Literacies and Translation
Universiti Sains Malaysia, Malaysia
Abdul-Hafeed Fakih
a.hafeed1@gmail.com
Najran University, Saudi Arabia
&
Ibb University, Yemen
Manjet K. M. Singh
manjeet@usm.my
School of Languages, Literacies and Translation
Universiti Sains Malaysia, Malaysia
ABSTRACT
Addressing the global increase in social media users, platforms such as Instagram introduced
automatic translation to broaden information dissemination and improve cross-cultural
communication. Yet, the accuracy of these platforms' machine translation systems is still a
concern. Therefore, this paper aims to explore the potential of Neural Machine Translation utilized
by Instagram in producing high-quality translations. In doing so, this study attempts to scrutinize
the reliability of Instagram's "See Translation" feature in the translation of literary texts from
Arabic to English. A selection of auto-translated Instagram captions is analyzed through the
identification, classification, and assignment of error types and penalty points, utilizing the MQM
core typology. Subsequently, the Overall Quality Score of the error-based analysis is calculated
automatically using the ContentQuo platform. Furthermore, the study investigates whether
Instagram Neural Machine Translation can effectively convey the intended message within literary
texts. From 30 purposively selected Instagram captions with literary content, the study found
Instagram's machine translation lacking in 90% of cases, particularly in accuracy, fluency, and
style. Among these, 61 errors were identified: 26 in fluency, 25 in accuracy, and 10 in style,
adversely affecting the quality and failing to convey the original message. The findings suggest a
need for enhanced algorithms and linguistic architecture in Neural Machine Translation systems
to better recognize linguistic variants and text genres for more accurate and fluent translations.
Keywords: Literary Text Translation; Multidimensional Quality Metrics; Neural Machine
Translation; Translation Quality Assessment
a Main author
b Corresponding author
INTRODUCTION
Instagram is a popular social networking platform where users share updates in the form of photo
and/or audio-visual elements as posts. These posts are often accompanied by textual details,
referred to as captions. According to Dixon's (2023) report on Statista, Instagram is currently the
fourth most popular social media platform, with 1.28 billion active users as of January 2022, and
is projected to reach 1.44 billion monthly active users by 2025. To address the growing global use
of Instagram, Meta (formerly Facebook), the parent company of Instagram, is developing new
Machine Translation (MT) innovations and MT systems to facilitate cross-lingual communication
among users from different countries, aiming at improving the effectiveness of interactions among
global users regardless of their language backgrounds. In this context, Meta has recently
introduced an innovative MT system, called No Language Left Behind (NLLB-200), built upon a single artificial intelligence-based model that uses neural networks and was claimed to match human performance. Given the continual developments in the MT field, regular evaluations of MT systems are needed to monitor their quality and identify areas for improvement. To this aim, the current study closely examines the quality of the recent NMT system implemented in Instagram. A further gap this study addresses is the lack of studies on the quality of Instagram's "See Translation" feature in translating literary texts in the Arabic context. As each language has its own unique characteristics, evaluating various languages and contexts is worthwhile because it reveals the language-specific strengths and weaknesses of each development, thereby indicating where improvement is needed.
The study's significance lies in its evaluation of machine translation, an essential area of
research that helps improve the performance of existing MT systems and understand how they
function (Dorr et al., 2011). Furthermore, Trigueros (2021) stressed that there is a need for more
standardization for MT quality assessment and error analysis. Therefore, this study's findings can
serve as a valuable reference for the translation technology field in general and for MT evaluation
development, translation error-analysis methodology, and computational linguistics in particular.
Additionally, this study will offer practical benefits by providing insights for Meta developers to
improve the algorithms and linguistic architecture of their MT models, as well as raise awareness
among Instagram users of the reliability level of the instant translations provided by the platform.
Given the latest development of MT implemented in Instagram, this study aims to closely inspect the potential of the "See Translation" feature in auto-generating adequate translations, thereby addressing concerns regarding the quality of this feature and highlighting where improvement is needed for Meta developers. To this end, this study seeks to achieve the following two objectives:
a) To assess the quality of Instagram's Neural Machine Translation (NMT) system in translating
literary texts by using an analytical error-based approach, which utilizes the MQM system that
includes structured translation specifications, an error typology, and a scoring system integrated
with the ContentQuo platform; b) To examine the NMT system’s ability to convey the expressive
function of literary texts, as defined by Nord's translation function theory.
LITERATURE REVIEW
MACHINE TRANSLATION OVERVIEW
Machine Translation is an interdisciplinary paradigm that involves different fields, including
Natural Language Processing (NLP), which focuses on developing and optimizing computer-
based translation systems (Ameur et al., 2020). It is also considered a branch of Computational
Linguistics, which investigates the use of computer software to translate text from one natural
language to another (Arnold et al., 1994; Sipayung et al., 2021). Due to its multidimensional
nature, MT presents complexities different from those in Human Translation and is continuously
evolving with the development of technology. MT is the process of translating text from one
language into one or more other languages utilizing computer-based systems and tools, and may
involve varying degrees of human intervention.
The increasing demand for translation, driven by economic globalization, exceeded human
capacity to handle all translation tasks, leading to the introduction of automatic translation systems
and resulting in significant changes in related fields. MT systems, according to their computational
architecture (Chéragui, 2012), are classified into four approaches: Rule-based MT (RBMT)
approach, Corpus-based MT (CBMT) approach, Hybrid MT approach, and Neural MT approach
(Trigueros, 2022). The first approach was Rule-based MT, which used two linguistic sub-
approaches: transfer and interlingua (Chéragui, 2012) that relied on monolingual and bilingual
dictionaries, grammar and transfer rules for generating translations (Espana-Bonet & Costa-jussa,
2016; Castilho et al., 2017; Trigueros, 2022). Later on, Corpus-based MT was introduced as an
alternative approach for MT in order to overcome the shortcomings of RBMT (Chéragui, 2012),
and was the first approach of data-driven methods that used sophisticated algorithms and
mathematical models to automatically learn the translation process from data (Ameur et al., 2020).
CBMT used monolingual and bilingual corpora of parallel texts in the translating process
(Hutchins, 1995). This approach was divided into two systems: Statistical MT system (SMT) and
Example-based MT system (EBMT). The advantage of this system is that it requires less human
effort for automatic training, along with its solid performance in terms of selection (Hutchins,
2007; Koehn, 2009; Trigueros, 2022). However, it sometimes outputs poor-quality translations that are ill-structured or grammatically incorrect, owing to the difficulty of obtaining corpora for
specific domains or language pairs (Habash et al., 2009; Espana-Bonet & Costa-jussa, 2016;
Trigueros, 2022). Nevertheless, corpus-based systems dominated the field for a while as many MT
developers adopted the approach to their MT systems, including Google Translate, Facebook, and
Instagram. The hybrid MT approach combines both Rule-based MT and statistical MT systems,
resulting in a solution that overcomes the deficiencies of each system and produces high-quality
translations with a high level of precision (Thurmair, 2009; Hunsicker et al., 2012; Tambouratzis
et al., 2014; Trigueros, 2022).
Moreover, most recently, a new data-driven MT approach, called Neural Machine
Translation (NMT) has been developed with a different mechanism. NMT is the latest technology
in Artificial Intelligence (AI); it builds and trains a single large neural network that reads a sentence and outputs a correct translation (Bahdanau et al., 2014; Trigueros, 2022). This system is based on the encoder-decoder
model in which the encoder reads the input and encodes it into a fixed length vector while the
decoder produces the translation output from the encoder vector (Cho et al., 2014; Bahdanau et
al., 2014; Trigueros, 2022). NMT represents the latest development of MT systems, which has
become the dominant paradigm that is currently applied in the machine translation field (Ragni &
Vieira, 2021; Trigueros, 2022). Moreover, Trigueros (2022) pointed out that the architecture of
NMT is characterized by advantageous properties that prior MT systems lack. For
instance, it uses fewer components and processing steps, and it requires less memory than SMT.
Moreover, it allows the use of human and data resources more efficiently than RBMT (Cho et al.,
2014; Bentivogli et al., 2016; Trigueros, 2022). Furthermore, the findings revealed that NMT
output contained fewer overall errors compared to SMT at the accuracy and fluency levels (Wu et
al., 2016; Castilho et al., 2017; Moorkens, 2018; Ragni & Vieira, 2021) since the neural networks
can be trained to recognize patterns in data and deal with massive amounts of language data with
much ease, hence making NMT output more accurate (Das, 2018). Such characteristics have
pushed Meta, along with many other major companies, such as Google, Systran, and Microsoft
(Ameur et al., 2020; Trigueros, 2022) to shift from SMT and RBMT approaches to Neural MT
approach.
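To make the encoder-decoder mechanism described above more concrete, the following is a minimal illustrative sketch in Python (PyTorch). The layer sizes, the use of GRU units, and the class structure are assumptions made purely for illustration; they do not describe Meta's actual NMT architecture.

# A minimal encoder-decoder sketch (assumed hyperparameters; not Meta's actual model).
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids):
        # Reads the source sentence and encodes it into a fixed-length context vector.
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden

class Decoder(nn.Module):
    def __init__(self, tgt_vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab_size)

    def forward(self, tgt_ids, context):
        # Produces target-language token scores conditioned on the encoder's context vector.
        output, _ = self.rnn(self.embed(tgt_ids), context)
        return self.out(output)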
NMT OF INSTAGRAM
In 2017, Meta, Instagram’s parent company, announced its shift from phrase-based statistical
machine translation to neural machine translation (Mannes, 2017), resulting in more accurate and
fluent translations (Pino et al., 2017). In 2020, Meta introduced a new neural machine translation
model, the multilingual machine translation (M2M-100), which automatically translates between
any pair of 100 languages, including translation across 2,200 language pairs, without relying on
English as an intermediary source. The M2M-100 model aims to improve translation quality for
low-resource languages (Bhattacharyya, 2022). Additionally, Meta developed a single artificial
intelligence-based model, the No Language Left Behind (NLLB-200), which translates 200
languages, including those not adequately addressed by machine translation tools in Instagram.
The NLLB-200 model aims to improve the quality of machine translations and facilitate
communication worldwide. Meta evaluated the NLLB-200 model using an automatic evaluation metric, the BLEU algorithm, which measures how closely machine translations match human
translations and reported that it achieved BLEU scores that were 44% higher than any previous
record (Meta, 2022). Therefore, this study investigates whether the new advanced model, NLLB-
200, of Instagram MT can make any improvements in this respect.
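As a rough illustration of how BLEU works (not of Meta's actual evaluation pipeline), the following Python sketch scores a single hypothesis against a single reference using the sacrebleu library; the sentence pair is borrowed from Table 3 below, and in practice BLEU is computed over whole test sets, usually with multiple references.

# Illustrative single-sentence BLEU computation (pip install sacrebleu).
import sacrebleu

# One machine translation (hypothesis) and one human translation (reference),
# taken from Table 3 below purely for illustration.
hypotheses = ["Security is so beautiful, I think it's the only feeling worth the effort to research."]
references = [["Security is so beautiful. I think it is the only feeling worth seeking out."]]

score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {score.score:.1f}")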
MT QUALITY EVALUATION
Various studies have evaluated the quality of Instagram machine translation (MT) since its
introduction. Fadilah (2017) identified three types of semantic errors in its output: referential,
grammatical, and contextual. Grammatical and contextual errors were the most frequent, while the
translation of dictionary meaning performed better. Mawarni et al. (2017) focused on culture-
specific terms (CSTs) and found a loss of meaning in the translations, which failed to transfer the
expressive meaning to the target culture. In line with previous findings, MT succeeded in
translating referential meaning but failed in translating pragmatic meaning.
Furthermore, Meilasari (2019) evaluated the accuracy of Instagram MT translations related
to ecology and environment vocabulary. The study found that the MT was unreliable, with 40%
of the translations being inaccurate and only 24% being accurate. Susanti (2018) analyzed
Instagram MT translations and identified incorrect and missing words as the most frequent lexical
errors. The study also found that the MT tended to use a word-for-word translation method,
resulting in a lack of recognition of the text's context and failing to represent the authentic
language. Other researchers have compared the quality of Instagram machine translation with
human translation. Arvianti (2018) compared the performance of Instagram MT and human
translators in translating formal and informal language. The study found that while Instagram MT
produced good translations for formal language, it failed to translate texts written in an informal
language. Human translators were better able to recognize particular language varieties and to understand context, owing to their more extensive vocabulary and contextual knowledge. In addition,
Instagram MT translations have been compared to output from other MT systems. Larassati et al.
(2019) evaluated the output of neural machine translation utilized in Google Translate and
Instagram and found that both systems had translation errors, with Instagram MT having more
errors. The most frequent error types were terminology errors, syntax errors, and literalness, which
were interrelated. Similarly, Pujakesuma (2022) found that Google Translate and Instagram MT
made similar errors, such as mistranslation, and applied the same translation strategies, such as
literal translation.
Moreover, some researchers have evaluated Instagram MT by exploring its translation
strategies. Purwaningsih (2019) investigated the translation strategies employed by Instagram MT
in translating culturally specific Indonesian items, particularly Banyumas Batik motifs. The study
found that Instagram MT used three techniques, including literal translation, borrowing, and
particularization, with borrowing being the dominant technique for translating cultural items.
However, this led to a loss of the cultural sense. Purwaningsih (2019) recommended that Instagram
developers enrich the MT with a more extensive contextual linguistics database to improve the
quality of translation results.
The current study focuses on evaluating the output of the new model, NLLB-200, which was recently implemented and is claimed to produce more accurate translations than the prior MT models examined in previous studies. Furthermore, the existing literature on
Instagram NMT evaluation has apparently focused on the Indonesian-English language pair,
leaving a research gap for Arabic language translation. Ameur et al. (2020) note that there are still
many linguistic problems related to Arabic that require further investigation as they pose
significant challenges to current Arabic MT systems. Therefore, this study aims to fill this research gap by evaluating the translation quality of the new AI-powered MT system for Arabic captions.
TEXT TYPE
Nord (2005), building on Bühler's (1934) tripartite model of the functions of linguistic signs, distinguishes four basic functions of communication in language: referential, expressive, operative, and phatic. The referential function focuses on the meaning or content
referred to and represented in informative texts, such as scientific articles and news. The expressive
function refers to the emotions and attitudes of the sender towards the referred object, thought, or
idea, as often found in texts of high aesthetic value, such as literary works. The operative or
appellative function is concerned with the direction of the text toward the addressee. The phatic
function, attributed to Roman Jakobson, focuses on establishing communication between sender
and receiver and attracting the attention of the receiver regarding certain things. The expressive
function implied in literary texts is the focus of this study; it seeks to explore how Instagram NMT
can deal with the unique sentence structures, cultural elements, and aesthetic features present in
literary language that have fewer counterparts stored in the MT database. Additionally, the study
aims to investigate the extent to which this system can convey the expressive function inherent in
literary texts.
METHOD
RESEARCH DESIGN
The present study utilized a qualitative descriptive method and analyzed the written content
(captions) taken from the @cairo_mockingbird Instagram account, a virtual platform for visual
arts and literary writings. The account contains over 12,000 posts, each featuring a photo and a
caption written in Arabic (as of last access on 24/3/2023). It is a community platform designed for
displaying visual arts and literary writings. To achieve the objectives of the study, the selected data
were analyzed by using a non-DEJ-based analytical evaluation method called the
Multidimensional Quality Metrics (MQM) core typology.
The latest version of MQM, from October 2021, was developed to allow a harmonization
with the TAUS DQF (Dynamic Quality Framework) error typology, which resulted in the creation of a flexible subset of MQM. The MQM error typology contains eight high-level dimensions; seven
dimensions are the core and the eighth is additional to provide a wide range of more detailed error
types that can be used where implementers require greater granularity. The tree view format
illustrated in Figure 1 below depicts the MQM-Core error typology. Each dimension consists of
more specific error subtypes:
Accuracy contains seven subtypes: mistranslation, over-translation, under-translation,
addition, omission, Do not translate (DNT), and untranslated. It involves the errors occurring
when the target text does not accurately represent the propositional content of the source text,
either by distorting, mistranslating, omitting, or adding to the original message.
Fluency (or Linguistic Conventions) comprises four subcategories: grammar, punctuation,
unintelligible, and character coding. It focuses on the errors related to the linguistic form of
the text, including problems with grammaticality, orthography, and other mechanical
correctness.
The Terminology category includes three subcategories: inconsistent with terminology resource, inconsistent use of terminology, and wrong term. It covers incorrect terms in the target text that are not equivalents of the corresponding terms in the source text.
Style includes organizational style, third-party style, inconsistent with external reference,
register, awkward style, unidiomatic style, and inconsistent style. Style refers to errors that are grammatically acceptable but inappropriate because they deviate from the expected language style or from organizational style guides.
Audience appropriateness contains only one subcategory: cultural-specific reference. In this
category, the errors arising from the use of content in the translation product that is invalid or
inappropriate for the target audience are addressed.
Locale conventions are the issues related to the locale-specific content (e.g., date/name
format, calendar type, postal code, locale-specific punctuation, or national language standard)
or formatting requirements for data elements.
Design and markup include the issues related to the physical design (e.g., graphics and tables)
or the layout of a translation product.
Custom: Any other issue observed or suggested by the evaluator(s) can be added to this
category.
This study employed a non-DEJ-based evaluation method, in which the judge (annotator)
assesses the translation quality indirectly. Such evaluation methods are commonly used to evaluate
the accuracy and fluency of both human and machine translation results and involve comparing
either the source text with the target text or the target text with the translation reference
(Chatzikoumi, 2020). The rationale behind choosing the MQM core typology over other existing analytical approaches rests on two reasons. Firstly, this approach is based on a function-oriented perspective formulated in Melby's (2002) work, which parallels Skopos theory and the translation brief, and in Nord's (1997) extension of Skopos theory, known as Functionalism in translation theory and practice. This mainly serves the fulfillment of the study's second objective, which involves investigating the expressive function conveyed in the MT translations. Secondly, the MQM-core typology is characterized by its flexibility and usability (Lommel et al., 2013). That is, the framework can be adjusted so that it serves the purpose of the analysis and accounts for specific needs. Besides, it should be noted that the MQM is applicable to professional translations as well as to MT output, i.e., the metric is designed to evaluate the translation product regardless of how the target text is generated.
The MQM-core typology quality assessment metric involves three stages of evaluation. The first stage is the Preliminary Stage, conducted before the evaluation process, which includes three phases: Translation Specifications Evaluation, Evaluation Metric Design, and Data Collection. The second stage, Error Annotation, is where annotation and error analysis of the data take place, and lastly, the third stage, Automatic Calculation, covers the calculation of the Overall Quality Score of the analysis. The second and third stages were conducted on the ContentQuo platform to ensure more accurate results. Each evaluation stage is elaborated in detail in the next section.
FIGURE 1. The MQM-Core Error Typology (http://www.themqm.info/ (Last access 7/3/2023)).
TRANSLATION QUALITY EVALUATION (TQE) STAGES
STAGE 1: PRELIMINARY STAGE
TRANSLATION SPECIFICATIONS
In this phase, we determined the translation parameters that should be met, adopted from the structured translation specification framework of the 2006 ASTM Standard Guide for Quality Assurance in Translation (ASTM F2575-06). They include metadata about the text under evaluation and its original. This step is a prerequisite, as it serves as a guideline for the evaluators or annotators to determine the translation quality parameters that the translated text should meet and against which it should be evaluated. The translation parameters adopted in this paper, shown in Tables 1 and 2, were selected to align with the objectives of this study.
TABLE 1. Source Content Information

Source language: Arabic (Modern Standard Arabic and Egyptian Arabic)
Text type: Literary texts
Audience: Instagram users who are familiar with Arabic language and culture
Purpose: Expressive function; the text is intended to convey a particular message in the mind of an author in an artistic form
Specialized language (Subject field): The captions consist of sayings and texts quoted from novels and other literary sources
Specialized language (Terminology): The texts do not include specialized or complicated terminology, but rather everyday vocabulary; a specialized term base is therefore not required
Volume: 30 captions (376 words)
Complexity: Some captions are written in a straightforward form, while others are written in an artistic style
Origin: The source texts are captions posted on the @cairo_mockingbird Instagram account
TABLE 2. Target Text Requirements

Target language: English
Audience: Instagram users who can understand English
Purpose: Expressive function
Content correspondence: The ST should be translated accurately and fluently
Register: Texts written in Modern Standard Arabic should be translated into formal English, while texts written in Egyptian Arabic should be translated into informal, colloquial English
Format: Captions underneath a photo on Instagram
Style: Stylistics should be taken into consideration in translating the ST
EVALUATION METRIC DESIGN
A metric is a measurement with a specific purpose (Lommel & Melby, 2018). Due to the scope of this study, the researchers did not include all the dimensions that appear in Figure 1 and instead designed a specific evaluation metric that served the aim and objectives of the study. For this metric, three dimensions were selected: accuracy, fluency, and style, as per the MQM
core typology and its subsets, as shown in Figure 2. The main goal of an MT system is to
automatically translate text while preserving its meaning and style, ensuring that the output is as
linguistically fluent as possible (Ameur et al., 2020). Accordingly, the evaluation focused on three
aspects: accuracy (adequacy), which considers the semantic and pragmatic equivalence of lexis
between the source and target texts; fluency, which refers to the linguistic conventions of the target
language and naturalness (Chatzikoumi, 2020); and style, which measures the extent to which the
translated text uses appropriate language to convey the message effectively. Thus, the evaluation
encompassed the lexical, syntactic, semantic, pragmatic, and stylistic aspects of the translated
texts. The errors extracted from the TT were measured according to the following Error Severity
Levels:
1. Minor errors, which do not affect the comprehension of meaning but affect the fluency
(Weight: 1)
2. Major errors, which make TT difficult to understand, yet the general message is conveyed.
(Weight: 5)
3. Critical errors, which change the meaning of ST and make it incomprehensible or distorted
(Weight: 10)
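For concreteness, the designed metric can be pictured as a simple data structure pairing the three selected dimensions with the severity weights listed above. This is only an illustrative sketch: the subtype lists follow the MQM-core categories described earlier, and the exact configuration registered in ContentQuo may differ.

# Hypothetical in-code representation of the evaluation metric used in this study.
EVALUATION_METRIC = {
    "dimensions": {
        "accuracy": ["mistranslation", "over-translation", "under-translation",
                     "addition", "omission", "do not translate", "untranslated"],
        "fluency": ["grammar", "punctuation", "unintelligible", "character coding"],
        "style": ["register", "awkward style", "unidiomatic style", "inconsistent style"],
    },
    # Penalty points assigned per error according to its severity level.
    "severity_weights": {"minor": 1, "major": 5, "critical": 10},
}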
FIGURE 2. A Metric Designed Based on the MQM Framework for the Evaluation in the Present Study
DATA COLLECTION
The data collection process included three phases: (i) selecting the source of the data, namely,
an Instagram account, (ii) selecting the data (captions), and (iii) collecting the selected data.
Data source selection phase is determined by the following criteria:
The source material should contain the data that are necessary to answer the stated
research objectives.
The data included in the source should be within the scope of the study.
The source should be a verified account with a substantial number of followers.
Data included in the account should be in form of captions (texts) and not audio-visual
elements.
The researchers selected the @cairo_mockingbird Instagram account as it met the data
source selection criteria. The account shares a variety of literary writings daily, providing ample
samples for the evaluation and contributing to answering the research questions. All captions are
written in Arabic, ensuring the data remains within the study's scope. Additionally, the account is
verified with 826 thousand followers (as of last access on 24/3/2023). Finally, the account often
uploads literary writings illustrated in a photo with the text replicated in a caption below the photo,
making the data easily accessible.
In the study, there are two types of data: first, the original captions written in Arabic,
referred to as "Source Text" (ST), and secondly, the English machine translations, referred to as "Instagram Machine Translation" (IMT). The researchers added a further data set of human translations of the captions, referred to as "HT", as a reference for the reader. The two types of data were collected manually by the researchers using purposive sampling. Firstly,
the researchers read intently all the captions posted on the @cairo_mockingbird Instagram
account. Secondly, 30 captions were selected purposively, ranging from short to medium-length
sentences (total of 376 words) written in Modern Standard Arabic (MSA) and Egyptian Dialect in
the form of a poetic language. Thirdly, the researchers collected the translated results manually
after tapping on the "See Translation" feature set beneath the selected captions, which
instantly translates the captions into English, the language set in their personal Instagram
application. Finally, source texts (the captions) and target texts (their translations) were collected
and divided into segments. Each segment pair, containing corresponding content (a source text and
target text), is termed a translation unit (TU).
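As an illustration of how each translation unit can be organized for annotation, a TU pairs one ST segment with its IMT output and the HT reference. The field names in the sketch below are hypothetical, not those used by ContentQuo, and the ST field is left elided.

# Hypothetical structure of one translation unit (TU); field names are illustrative only.
translation_unit = {
    "tu_id": 1,
    "source_text": "...",  # the Arabic caption segment (ST), not reproduced here
    "instagram_mt": "Security is so beautiful, I think it's the only feeling worth the effort to research.",  # IMT
    "human_translation": "Security is so beautiful. I think it is the only feeling worth seeking out.",       # HT reference
}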
STAGE 2: ERROR ANNOTATION
In this stage, the annotation was conducted semi-automatically using the harmonized MQM-Core
Typology and DQF error typology, integrated with the ContentQuo platform. The annotators (two
experienced translators along with a skilled linguist) examined the translated text against the
source text based on the agreed translation specifications, and analytically annotated errors, which
involved identifying, classifying, and assigning error type and penalty points, in accordance with
the designed metric.
STAGE 3: AUTOMATIC CALCULATION
At this stage, the Overall Quality Score was calculated automatically by ContentQuo according to
the selected scoring model using the following formula: QualityScore = 100 - 100 * (ErrorPoints
/ Wordcount), then compared to the Threshold Value (100%) to assign a pass/fail rating.
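To illustrate the scoring formula with a worked example, the calculation in Python would look as follows. The per-severity error counts below are assumed for illustration only (the paper reports error totals, not penalty points), while 376 is the actual word count of the sample.

# Worked example of the ContentQuo scoring formula with assumed error counts.
def quality_score(error_points, word_count):
    return 100 - 100 * (error_points / word_count)

severity_weights = {"minor": 1, "major": 5, "critical": 10}
assumed_errors = {"minor": 5, "major": 3, "critical": 2}

error_points = sum(severity_weights[s] * n for s, n in assumed_errors.items())  # 5 + 15 + 20 = 40
print(round(quality_score(error_points, 376), 1))  # 89.4, which falls below the 100% threshold (fail)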
ANALYSIS AND RESULTS
This section shows the results of the analytical evaluation conducted on the Instagram NMT
translation of 30 captions selected from @cairo_mockingbird account using ContentQuo platform.
As illustrated in Figure 3, Instagram NMT failed in translating 90% of the data with respect to three aspects: accuracy, fluency, and style. A total of 61 errors were found in the selected data, classified as follows: 26 errors in Fluency, 25 errors in Accuracy, and 10 errors in Style. The severity of these errors ranged from minor to critical, as depicted in Figures 4 and 5. In accuracy, the Quality Score was the lowest because the errors found in this category were critical and seriously affected the content of the message: only 39.9% of the content was translated correctly. The Fluency category had the
second lowest Quality Score, as the fluency-related errors were less severe for the original content message than the accuracy-related errors. Lastly, the style issues only slightly affected the meaning of the sentences in which they were found. Each category is explained in more detail below.
FIGURE 3. The Overall Quality Score at ContentQuo
FIGURE 4. Issue Severity Levels
FIGURE 5. Quality Score of Each Category
ACCURACY
The accuracy category concerns how the MT system recognized the meaning of the source text and reproduced it in the target text. Based on the results of the TQE, Instagram MT produced numerous errors under this category. For instance, the Instagram MT system was unable to recognize the exact meaning of an ST term within its context, thereby failing to find an appropriate equivalent. As shown in Table 3, the MT was unable to recognize the exact meaning of the polysemous word (بحث, which can mean both 'seeking' and 'research'), so it incorrectly chose 'research' as the equivalent in the TL.
TABLE 3. Example 1 of Inaccuracy

TU 1
ST: ان ﯿ ا ً، أظ ار اﯿ اﻟ ي ﻨﺎء اﻟ.
IMT: Security is so beautiful, I think it's the only feeling worth the effort to research.
HT: Security is so beautiful. I think it is the only feeling worth seeking out.
Moreover, in literary texts, authors sometimes represent the message they want to express
as a figure of speech, such as a metaphor. Instagram MT struggled with understanding and
translating the metaphors in the source texts. In Table 4, the MT system’s word-for-word
translation of the caption distorted the intended meaning of the metaphor. The vehicles (سماء, sky) and (أرض, earth), which carry the meanings of the topics (God) and (people), indicate that God is above in the sky and people are down on the earth. The MT translation failed to convey the implied relationship between them, distorting not only the meaning but also the aesthetic value of the literary device, i.e., the metaphor.
TABLE 4. Example 2 of Inaccuracy

TU 2
IMT: We touch heaven what the earth refuses to give us.
HT: We ask God what people refuse to grant us.
Furthermore, within the same caption, it was found that the Instagram MT system tended to misread the captions even though they were written in a direct sentence structure and partially or fully vocalized. Unlike English, the Arabic language has no letters representing the short vowel sounds. Instead, the Arabic writing system uses small signs added above or below the letters, called diacritics, and the presence of such diacritical signs is known as "vocalization" (Ameur et al., 2020). Vocalization clarifies how words are read and what they mean exactly, which helps in resolving lexical, semantic, and pragmatic ambiguities in translation. The MT system in the present study failed to read these signs correctly, hence reproducing wrong equivalents. In Table 4, the verb meaning (ask for) was mistranslated as (touch). It can be concluded that the MT system still cannot decide the exact equivalent for a word, with or without vocalization.
TABLE 5. Example 3 of Inaccuracy

TU 3
ST: نجيب محفوظ
IMT: Najeeb is safe
HT: - Naguib Mahfouz

TU 4
ST: د. جاسم المطوع
IMT: Jasim the volunteer - د
HT: - Dr. Jasem Al-Mutawa
Additionally, one of the most frequent translation errors that Instagram MT produced was the
mistranslation of proper names. As demonstrated in Table 5, the MT system transliterated the first
names while translating the surnames literally. This type of proper noun falls under the "Adjective Constituent" noun-compound class; it consists of a noun and an adjective connected with each other (Bounhas & Slimani, 2009; Omar & Al-Tashi, 2018). This is a common issue when translating from Arabic into English because the Arabic language lacks a unified system or rules for writing named entities, such as capitalization. Additionally, the rich lexical variations and
highly inflected nature of Arabic further complicate this issue.
FLUENCY
Fluency error categories include errors related to the linguistic well-formedness of the translated
text, including morphology, syntax, orthography, and sentence readability. The evaluation results
showed that fluency errors were the most frequent errors produced by Instagram MT. These errors
range from minor ones affecting only the TT’s fluency, to major errors that make the text hard to
understand but convey the general message, and critical errors that distort the meaning and make
the TT unintelligible.
One of the root causes of the fluency errors was the flexible word order of the Arabic language. Unlike English, which has a single rigid SVO word order, Arabic has a flexible sentence structure that can occur in multiple orders, such as SVO, VSO, OVS, etc. This flexibility poses several problems when translating from Arabic into English. MT systems, built on fixed encoding and decoding mechanisms and algorithms, often get confused by the multiple sentence structures (i.e., word orders) that the Arabic language can take, particularly in literary texts. Therefore, these MT systems fail to render the Arabic text properly in the TT. For example, as shown in Table 6,
Instagram MT mistranslated the Arabic sentence that has an OVS word order, resulting in an
unintelligible output.
TABLE 6. Example 1 of Non-Fluent Translation

TU 5
ST: ﻋﻠﻰ دف ء ا ا ﯿﻮت.
IMT: Warmth of family leaning homes.
HT: A home rests on the warmth of a family.
Another problem that was observed during the TQE of Instagram MT translations was that
they lacked pronoun-antecedent agreement. In English, the pronoun and its antecedent (the word
to which a pronoun refers) must agree in number, person, and gender. The MT system translated
each caption segment separately. The pronoun (i.e., it) in the target text in Table 7, for instance,
contradicts its antecedent (years) in number. The MT system read and translated the two sentences independently, out of context, resulting in incohesive translations.
TABLE 7. Example 2 of Non-Fluent Translation

TU 6
ST: ﻷﻋﻮ ا م ﯿ ا ﯿ . . أ ل ﺗﻀﺎ ر اﻟﺒﺎ ل، ﯿ ل ﯿ؟ - أﺣﻤﺪ ﺧﺎ ﻮﻓ ﯿ
IMT: The years changed a lot… It changes the mountains, how can it not change your character? - Ahmed Khaled Tawfiq
HT: Years make a lot of changes. They change the terrains of mountains, let alone your character? - Ahmed Khaled Tawfik
-Ahmed Khaled Tawfik
Furthermore, another frequent issue was errors related to orthography as shown in some
samples, which involve the target language’s conventions of writing, such as norms of spelling,
hyphenation, capitalization, word breaks, emphasis, and punctuation. These errors might not be
critical, but they negatively affect the readability of the translations. It was noticed that Instagram
MT tended to imitate the ST writing conventions, which resulted in poorly written translations. This strategy might be workable between languages that share similar writing norms, but in our case, where the source language and the target language have completely different orthographic systems, it led to considerable issues, such as lowercase letters at the beginning of a sentence, capital letters in the middle of a sentence, and a lack of proper punctuation marks, among other things.
STYLE
Literary writings, as expressive texts, highly value the form of the text. Stylistics therefore played a significant role in the evaluation. Several stylistic errors were found in the Instagram MT output. The MT system used basic translation strategies, such as literal and word-for-word translation, with all types of texts, be they informative, expressive, or persuasive. While literal translation may work for informative texts that focus only on the content, in literary texts that also value the form it was a root cause of translations that lacked aesthetic value and had awkward sentence
structures, as illustrated in Table 8.
TABLE 8. Example 1 of Stylistic Errors

TU 7
ST: أ ُ ا ِ إ أن ا ُ وح اھت إ ء ِ اﯿ إن ٍ ﯿ. ـ اﻟ ا
IMT: I do not understand the meaning of love, except that the soul has been guided to something of the secret of humanity in a beautiful human being. - Mostafa Al-Rafay
HT: The only thing I can understand about the meaning of love is that the soul has found a secret of humanity in a beautiful human being. - Mostafa Al-Rafe'ie
The translation of idiomatic expressions can be problematic, especially in machine translation, where such expressions most often end up being translated literally. This typically occurs when there are linguistic or cultural gaps between the SL and TL. However, Instagram MT even failed to translate expressions that have a direct one-to-one equivalent, producing an unidiomatic style in the TT. This is clearly demonstrated in Table 9, where Instagram MT translated the ST literally despite the existence of a direct equivalent in English.
TABLE 9. Example 2 of Stylistic Errors

TU 8
ST: ﻣﺎ ﺗﺰر اﻟﯿم ه ُ ا ً ..
IMT: What you plant today you will harvest tomorrow.
HT: You reap what you sow.
Another issue commonly found in the output of Instagram MT was a lack of conformity to the register of the ST. The translations tended toward an informal style, using colloquial terms and contractions, instead of reproducing the formality of the ST signalled by its use of Modern Standard Arabic. This issue is demonstrated in Table 10.
TABLE 10. Example 3 of Stylistic Error

TU 9
ST: ﻣﺶ إ ن ا ﯿ ﻛﻮ ا ﻟﺸﯿ ﯿ !
IMT: It doesn't mean that someone is carrying the burden well Then the burden is not heavy!
HT: Just because someone else is carrying the burden well, it doesn't mean the burden is not heavy!
Despite the above-mentioned weaknesses of the Instagram MT system, the system has shown improvement in some other aspects. It was able to properly translate texts written in the Egyptian dialect. As shown in Table 10, the MT system managed to recognize the colloquial words (شايل), (الشيلة), and (كويس), and translated them into their proper equivalent terms in English (is carrying), (the burden), and (well).
DISCUSSION
This small-scale exploration questioned whether Instagram MT is capable of producing accurate and fluent translations that maintain, in the target language, the intended message implied within literary texts. The results of the evaluation revealed that the MT system produced numerous translation errors, covering different linguistic aspects including lexis, syntax, semantics, pragmatics, orthography, and stylistics, which hindered the transfer of the accurate meaning of the source texts in fluent, well-structured translations. These errors clearly go against the translation specifications (Tables 1 and 2) that were set by the researchers before the evaluation.
These results run counter to the concept of translation quality as defined by Koby et al. (2014): producing accurate and fluent translations for the target
audience that can serve the original purpose and comply with all other specifications negotiated
between the requester and provider, while considering the needs of the end-users. Consequently,
Instagram’s NMT system is not capable of producing translations that are well-structured, properly
convey the intended message, and preserve the aesthetic value of the literary texts. Despite
significant improvements made to Instagram's NMT, e.g., the ability to recognize dialectal
elements as shown in Table 10, linguistic aspects such as accuracy, fluency, and adherence to stylistic conventions remain challenging.
Issues related to accuracy included producing wrong equivalents in the TL, either because the MT could not recognize the exact meaning of the term in context, as in Tables 3 and 5, or because it misread vocalization, as in Table 4. This is in line with Susanti (2018), who questioned the
reliability of Instagram MT translations by exploring lexical errors and found that MT is prone to generating mistranslated words, incorrect translations, and unknown words. Likewise, Cahyani et al. (2021) stressed that Instagram MT produced inappropriate translations because it used improper procedures, choosing the target-language lexis literally from among several synonyms that carry different meanings in different uses, without considering the overall context of the caption. Additionally, literary texts usually include literary devices, such as metaphor, that constitute a unique structure of language carrying contextual and cultural nuances. As can be seen in Table 4, Instagram MT apparently does not have the flexibility to deal with unusual forms of text that do not have direct parallel structures in its database, and it fails to recognize the contextual and cultural knowledge implied in the inputs, as it only uses literal translation to produce the dictionary meaning of the linguistic units (Purwaningsih, 2019; Omar & Gomaa, 2020). In a similar vein to the findings of Purwaningsih (2019) and Meilasari (2019),
Instagram NMT is still unable to recognize proper nouns, especially those that come under the "Adjective Constituent" noun-compound class, as illustrated in Table 5. Owing to its morphological complexity, Arabic possesses numerous types of noun compounds, and the extraction of Arabic noun compounds is one of the challenging tasks in machine translation. Arabic words have no capital or small letters, which causes semantic ambiguity, exactly as happened when Instagram NMT attempted to identify and translate the names of well-known Arab writers (Naguib Mahfouz and Jasem Al-Mutawa) in Table 5. Though such proper nouns are well known and frequently occur together, the MT system fails to recognize the context in which they occur and to identify them as names rather than mere nouns or adjectives. This issue can be attributed to two root causes: the lack of capitalization and the limited resources of the Arabic noun-compound lexicon (Omar & Al-Tashi, 2018). To overcome this limitation, compound nouns need to be extracted and processed further, and improvement should also include the Named Entity Recognition (NER) and Part-of-Speech (POS) tagging tasks implemented in the MT. NER and POS tagging identify, determine, and classify proper names in a text, which can help compensate for the absence of capital letters in Arabic, a major difficulty in achieving high performance in automated translation (Alkhatib & Shaalan, 2018).
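As a hedged sketch of what such an improvement might look like in practice, the snippet below runs a named-entity tagger over a caption before translation so that detected person names can be handled as units (e.g., transliterated) rather than translated word by word. The model identifier is a placeholder assumption, not a recommendation, and this is not a description of Instagram's pipeline.

# Hypothetical pre-translation NER step; the model name is a placeholder assumption.
from transformers import pipeline

ner = pipeline("token-classification",
               model="some-arabic-ner-model",  # placeholder; substitute any real Arabic NER model
               aggregation_strategy="simple")

caption = "نجيب محفوظ"  # e.g., the author name mistranslated in Table 5
for entity in ner(caption):
    # Spans tagged as person names could be protected from literal translation
    # and transliterated as a whole instead.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))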
Issues under the Fluency category, including unintelligible output, incohesive translations, and orthography-related errors, can be attributed to the significant linguistic differences between the two languages. Arabic and English belong to different language families and have distinct grammar rules, morphology, semantics, pragmatics, and writing conventions. These differences make the task difficult for MT systems, resulting in inadequate translations. Morphologically rich languages like Arabic pose even more significant challenges for MT systems in accurately recognizing and effectively bridging these linguistic and cultural gaps. The flexibility of word order in Arabic, for example, makes it difficult for translation systems to make accurate choices, negatively impacting the quality of translations (Ameur et al., 2020; Omar & Gomaa, 2020).
Literary texts pose a greater challenge for machine translation systems due to their unique
style and use of figurative language, special diction, and language enhancers that carry implied
meanings beyond words and sentences. Arvianti (2018) pointed out that compared to human
translators, MT systems have a limited vocabulary and struggle with context understanding,
making it difficult for them to recognize specialized language. Omar and Gomaa (2020) explored
the challenges of applying MT systems to literary translation and concluded that despite the
occurrence of errors, the usefulness of MT systems should not be underestimated. It is clear from
the data that some of the resultant problems arise from translating the texts literally using basic translation strategies, such as word-for-word translation, as shown in Tables 8 and 9, without considering the contextual, cultural, and aesthetic references that are essential in literary texts. This results in a loss of the expressive sense, and in some cases the whole meaning is distorted.
Meilasari (2019) concluded that Instagram's translation machine is not a reliable feature for target-language readers who want to understand certain language- or culture-related terms in the source language, because the MT only produces a literal translation based on what is provided by the source text and has no ability to analyze and restructure the text being translated. MT technology has been introduced and implemented in social networking platforms in response to the rising demand for multilingual content. It can be considered a vehicle for accessibility, as it provides a means for cross-national communication to take place by bridging languages and cultures in which users lack proficiency. However, translation errors generated by MT systems can negatively affect the end-users' experience because, as the findings of this study show, such errors affect readability by distorting the fluency, accuracy, or style of the original content.
CONCLUSION
This paper evaluates the output of Instagram’s automatic translation of Arabic literary writings.
The evaluation involves an analysis of translated texts, the identification, classification, and
assignment of error types, along with the allocation of penalty points using the MQM core
typology. The study explores the ability of Instagram MT to convey the implied message in literary
texts. The findings indicate that Instagram MT fails to successfully translate 90% of the data at
three levels: accuracy, fluency, and style. Specifically, the selected data exhibited 61 errors,
comprising 26 in fluency, 25 in accuracy, and 10 in style. These errors significantly affect the
quality of translations, thereby impeding the transfer of the intended message embedded within
the source texts.
LIMITATIONS AND FURTHER RESEARCH
This study examines the quality of Instagram's AI-based machine translation for translating literary
Arabic texts into English. Further research can investigate other aspects of the Arabic context and
compare the linguistic needs of Arabic with those of other languages in MT systems. As AI continues to evolve, further evaluations are necessary to assess MT applications and text genres across different language pairs, in order to further explore this promising technology and enhance its output, ensuring a more sustainable circulation of information worldwide and a richer end-user experience.
ACKNOWLEDGEMENT
We sincerely thank ContentQuo, a translation quality management SaaS, for their indispensable
support in providing localization services and CAT tools to evaluate raw MT and PEMT output
using Error Annotation and Rating Scale.
REFERENCES
Alkhatib, M., & Shaalan, K. (2018). The key challenges for Arabic machine
translation. Intelligent Natural Language Processing: Trends and Applications, 139-156.
Ameur, M. S. H., Meziane, F., & Guessoum, A. (2020). Arabic machine translation: A survey of
the latest trends and challenges. Computer Science Review, 38, 100305.
https://doi.org/10.1016/j.cosrev.2020.100305.
Arnold, D., Balkan, L., Meijer, S., Humphreys, R., & Sadler, L. (1994). Machine translation: An
introductory guide. London: Blackwell.
Arvianti, G. F. (2018). Human translation versus machine translation of Instagram’s captions: Who
is the best? In English Language and Literature International Conference (ELLiC)
Proceedings (Vol.2, pp.531-536).
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to
align and translate. arXiv preprint arXiv:1409.0473.
Bentivogli, L., Bisazza, A., Cettolo, M., & Federico, M. (2016). Neural versus phrase-based
machine translation quality: a case study. In Proceedings of the 2016 Conference on
Empirical Methods in Natural Language Processing (pp. 257–267). arXiv preprint
arXiv:1608.04631.
Bhattacharyya, S. (2022). Meta's machine translation journey. Analytics India Magazine.
Retrieved March 22, 2023, from https://analyticsindiamag.com/metas-machine-translation-journey/
Bounhas, I., & Slimani, Y. (2009). A hybrid approach for Arabic multi-word term extraction.
In 2009 International Conference on Natural Language Processing and Knowledge
Engineering (pp. 1-8). IEEE.
Cahyani, N. L. D. (2022). Derivational affixes found in the caption of selected posts of
@bawabali_official account on Instagram (Doctoral dissertation, Universitas
Mahasaraswati Denpasar).
Castilho, S., Moorkens, J., Gaspari, F., Calixto, I., Tinsley, J., & Way, A. (2017a). Is neural
machine translation the new state of the art? The Prague Bulletin of Mathematical
Linguistics, 108, 109–120.
Castilho, S., Moorkens, J., Gaspari, F., Sennrich, R., Sosoni, V., Georgakopoulou, P., ... &
Gialama, M. (2017b). A comparative quality evaluation of PBSMT and NMT using
professional translators. In Proceedings of Machine Translation Summit XVI: Research
Track (pp. 116-131).
Chatzikoumi, E. (2020). How to evaluate machine translation: A review of automated and human metrics. Natural Language Engineering, 26(2), 137-161.
Chéragui, M. A. (2012). Theoretical overview of machine translation. ICWIT, 160-169.
Das, A. K. (2018). Translation and artificial intelligence: Where are we heading? International Journal of Translation, 30(1), 72-101.
Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural
machine translation: Encoder-decoder approaches. In Eighth Workshop on Syntax,
Semantics and Structure in Statistical Translation (SSST-8). arXiv preprint
arXiv:1409.1259.
Dorr, B., Snover, M., & Madnani, N. (2011). Chapter 5.1 introduction. In Olive, J., McCary, J., & Christianson, C. (Eds.), Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation (pp. 801–803). New York: Springer.
España-Bonet, C., & Costa-jussà, M. R. (2016). Hybrid machine translation overview. Hybrid
Approaches to Machine Translation, 1-24.
Fadilah, E. (2017). Semantic Errors Analysis of Instagram Machine Translation from Indonesian
to English. Published thesis, Syarif Hidayatullah State Islamic University of Jakarta,
Indonesia.
Habash, N., Dorr, B., & Monz, C. (2009). Symbolic-to-statistical hybridization: extending
generation-heavy machine translation. Machine Translation, 23, 23-63.
Hunsicker, S., Chen, Y., & Federmann, C. (2012, June). Machine learning for hybrid machine
translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation,
312-316.
Hutchins, W. J. (1995). Machine translation: A brief history. In Concise History of the Language
Sciences (pp. 431-445). Pergamon.
Hutchins, J. (2007). Machine translation: A concise history. Computer Aided Translation: Theory
and Practice, 13(29-70), 11.
Koby, G. S., Fields, P., Hague, D., Lommel, A., & Melby, A. (2014). Defining translation quality. Tradumàtica, 12, 413–420.
Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
Larassati, A., Setyaningsih, N., Nugroho, R. A., Suryaningtyas, V. W., Cahyono, S. P., &
Pamelasari, S. D. (2019). Google vs. Instagram machine translation: multilingual
application program interface errors in translating procedure text genre. In 2019
International Seminar on Application for Technology of Information and Communication
(iSemantic) (pp. 554-558). IEEE.
Lommel, A., & Melby, A. K. (2018). Tutorial: MQM-DQF: A good marriage (Translation quality
for the 21st Century). In Proceedings of the 13th Conference of the Association for
Machine Translation in the Americas (Volume 2: User Track).
Mawarni, B., Pambudi, B. D., & Ghasani, B. I. (2017). The problem of cultural untranslatability
found in the English translation of Jokowi’s Instagram posts. In UNNES International
Conference on ELTLT (pp. 104-108).
Meilasari, P. (2019). When Instagram translation machine translates ecology terms: Accurate or
not? In The 7th Library Studied Conference (p. 129).
Melby, A. K. (2002). The mentions of equivalence in translation. Meta, 35(1), 207-213.
Meta AI. (2022). 200 languages within a single AI model: A breakthrough in high-quality machine
translation. Meta AI. Retrieved March 22, 2023, from https://ai.facebook.com/blog/nllb-
200-high-quality-machine-translation/
Mannes, J. (2017, August 3). Facebook finishes its move to neural machine translation. TechCrunch. Retrieved March 22, 2023, from https://techcrunch.com/2017/08/03/facebook-finishes-its-move-to-neural-machine-translation/?guccounter=1
Moorkens, J., Toral, A., Castilho, S., & Way, A. (2018). Translators’ perceptions of literary post-editing using statistical and neural machine translation. Translation Spaces, 7(2), 240-262.
Nord, C. (2005). Text analysis in translation: Theory, methodology, and didactic application of a
model for translation-oriented text analysis. New York: Rodopi.
Nord, C. (1997). Functionalist approaches explained. Manchester, UK: St. Jerome Publishing.
Omar, N., & Al-Tashi, Q. (2018). Arabic nested noun compound extraction based on linguistic
features and statistical measures. GEMA Online® Journal of Language Studies, 18(2).
Omar, A., & Gomaa, Y. (2020). The machine translation of literature: Implications for translation pedagogy. International Journal of Emerging Technologies in Learning (IJET), 15(11), 228-235.
Pino, J. M., Sidorov, A., & Ayan, N. F. (2017). Transitioning to neural machine translation. Tech
at Meta. Retrieved March 22, 2023, from https://tech.facebook.com/artificial-
intelligence/2017/8/transitioning-entirely-to-neural-machine-translation
Pujakesuma, G. A. (2022). The performance of Instagram's auto-translate and Google Translate in translating House of Highlight's Instagram captions [Thesis, Yogyakarta: Sanata Dharma University]. http://repository.usd.ac.id/id/eprint/42510
Purwaningsih, D. R., Sholikhah, I. M., & Wardani, E. (2019). Revealing translation techniques
applied in the translation of Batik Motif names in see Instagram. Celt: A Journal of Culture,
English Language Teaching & Literature, 19(2), 287-301.
Ragni, V., & Nunes Vieira, L. (2022). What has changed with neural machine translation? A critical review of human factors. Perspectives, 30(1), 137-158.
Sabtan, Y. M. N., Hussein, M. S. M., Ethelb, H., & Omar, A. (2021). An evaluation of the accuracy
of the machine translation systems of social media language. International Journal of
Advanced Computer Science and Applications, 12(7), 406-415. DOI:
10.14569/IJACSA.2021.0120746
Dixon, S. (2023). Instagram users worldwide 2025. Statista. Retrieved March 22, 2023, from https://www.statista.com/statistics/183585/instagram-number-of-global-users/
Sipayung, K. T., Sianturi, N. M., Arta, I. M. D., Rohayati, Y., & Indah, D. (2021). Comparison of Translation Techniques by Google Translate and U-Dictionary: How Differently Does Both Machine Translation Tools Perform in Translating? Elsya: Journal of English Language Studies, 3(3), 236-245.
Susanti, E. (2018). Lexical errors produced by Instagram machine translation [Doctoral
dissertation, Universitas Islam Negeri Maulana Malik Ibrahim].
Stymne, S., & Ahrenberg, L. (2012, May). On the practice of error analysis for machine translation
evaluation. In Proceedings of the Eighth International Conference on Language Resources
and Evaluation (LREC'12) (pp. 1785-1790).
Tambouratzis, G. (2014, April). Comparing CRF and template-matching in phrasing tasks within
a Hybrid MT system. In Proceedings of the 3rd Workshop on Hybrid Approaches to
Machine Translation (HyTra) (pp. 7-14).
Thurmair, G. (2009). Comparing different architectures of hybrid machine translation systems.
In Proceedings of Machine Translation Summit XII: Posters.
Widiastuti, N. M. A. (2021). Translation procedures of English phrasal verbs into Indonesian on Instagram captions. In International Seminar on Austronesian Languages and Literature IX (pp. 399-408). Denpasar: Udayana University.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., & Dean, J. (2016). Google's
neural machine translation system: Bridging the gap between human and machine
translation. arXiv preprint arXiv:1609.08144.
ABOUT THE AUTHORS
Altaf A. Fakih is a Ph.D. candidate at Universiti Sains Malaysia (USM), where she also obtained her M.A. in Translation and Interpreting Studies. Her research areas of interest include Machine Translation, Artificial Intelligence, Audiovisual Translation, and Translation Quality Assessment. She is also a certified English-Arabic translator.
Mozhgan Ghassemiazghandi is a Senior Lecturer at the School of Languages, Literacies, and Translation at Universiti Sains Malaysia. She holds a Ph.D. in Translation Studies. Her research interests are in translation technology, machine translation, and audiovisual translation. Additionally, Mozhgan is a freelance translator and subtitler with more than a decade of experience in the field.
Abdul-Hafeed Fakih is a Professor of Linguistics at the Department of English, Najran University, Saudi Arabia (and formerly at Ibb University, Yemen). He has published several papers in indexed journals and serves on the editorial boards of several indexed journals.
Manjet K. M. Singh is an Associate Professor at Universiti Sains Malaysia. She holds a Ph.D. in language studies and has been attached to the School of Languages, Literacies and Translation for the past 27 years. Manjet’s interests are broad-ranging and include sociolinguistics, language teaching and learning, academic literacy(ies), discourse, and multilingualism.