Latest trends in hybrid machine translation and its applications

November 2014
Computer Speech & Language 32(1)

DOI:10.1016/j.csl.2014.11.001

License
CC BY-NC-ND 3.0

Authors:

Marta Ruiz Costa-jussa

Universitat Politècnica de Catalunya

José A. R. Fonollosa

Universitat Politècnica de Catalunya

This survey on hybrid machine translation (MT) is motivated by the fact that hybridization techniques have become popular as they attempt to combine the best characteristics of highly advanced pure rule or corpus-based MT approaches. Existing research typically covers either simple or more complex architectures guided by either rule or corpus-based approaches. The goal is to combine the best properties of each type. This survey provides a detailed overview of the modification of the standard rule-based architecture to include statistical knowledge, the introduction of rules in corpus-based approaches, and the hybridization of approaches within this last single category. The principal aim here is to cover the leading research and progress in this field of MT and in several related applications.

Figures:

Classification of hybrid MT architectures.

Schema of hybridization guided by RBMT.

Schema of hybridization guided by corpus-based MT.

Computer Speech and Language 32 (2015) 3–10

Latest trends in hybrid machine translation and its applications☆

Marta R. Costa-jussà a,∗, José A.R. Fonollosa b

a Institute for Infocomm Research, Fusionopolis Way, Singapore 138632, Singapore

b Universitat Politècnica de Catalunya, Jordi Girona, Barcelona 08034, Spain

Received October 2014; accepted November 2014

Available online November 2014


© 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

MSC: 00-01; 99-00

Keywords: Hybridization; Machine translation; Corpus; Rules; Applications

1. Introduction

Machine translation (MT) is the area of natural language processing (NLP) that focuses on obtaining a target language text from a source language text by means of automatic techniques. It is a multidisciplinary field and the challenge has been approached from various points of view including linguistics and statistics. The existence of different perspectives has made possible the proliferation of hybrid methodologies. Hybrid methods focus on combining the best properties of two or more MT approaches. Nowadays, it has become very popular to include rules in statistical MT (SMT) approaches. However, there are also relevant works on enhancing standard rule-based MT (RBMT) by adding statistical knowledge. Recent initiatives such as the three editions of the HyTra workshop¹ show that linguists, engineers and computer scientists actively interact in the interests of building successful hybrid architectures, formulating proposals and conducting experiments.

This survey paper reviews recent methods that combine and hybridize MT approaches in single architectures, and thus, two closely related lines of research fall outside our scope. First, the methodologies of multi-engine combination, which have been widely studied in MT,² as well as in other areas (e.g. speech recognition). These approaches assemble outputs, not architectures. And second, the integration of linguistic knowledge into SMT when studies do not consider different MT paradigms. For a survey on this specific topic see Costa-jussà and Farrús (2014).

The rest of the paper is organized as follows. Section 2 explains two classifications of MT approaches. Section 3 reports the main hybridization methods within and across MT paradigms. Section 4 describes several MT applications with hybrid components. Finally, Section 5 summarizes the main findings of this survey.

☆ This paper has been recommended for acceptance by Roger Moore.

∗ Corresponding author. Current address: Instituto Politécnico Nacional, Mexico. Tel.: +51 5525298370.

¹ http://parles.upf.edu/llocs/plambert/hytra/hytra2014/.

2. Classification of machine translation

Basically, MT approaches can be classified into different paradigms using two criteria: either the level of representation or the sources of information.

2.1. Level of representation

When classifying by level of representation, we can think of the Vauquois pyramid that basically contains: direct, transfer and interlingua approaches.

Direct. Approaches at the bottom of the Vauquois pyramid require one single step of transformation between source and target, without analysis of the source language and without generation of the target language. Within this category, we might find simple dictionary-based translations.

Transfer. Approaches in the middle of the Vauquois pyramid consist of three steps: analysis, transfer and generation. This category includes RBMT, EBMT and SMT approaches.

Interlingua. Approaches at the top of the Vauquois pyramid consist of two steps: analysis and generation. The analysis transforms the source language into the interlingua representation and the generation transforms this interlingua representation into the target language. Interlingua is a universal representation of all languages, needing no transfer stage.

… (2005) offers observations on whether a system can be considered direct or transfer depending largely on how much or how little language-specific monolingual analysis is carried out and also how close the intermediate representations are to the source and target texts themselves. Essentially, most of the approaches (other than interlingua) mentioned in this article could be classified as transfer-based engines, with varying degrees of complexity in their transfer, analysis and generation stages.
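The difference between the direct and transfer levels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-word Spanish-English lexicon, the part-of-speech table and the single noun-adjective swap rule are hypothetical and do not come from any system cited in this survey.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (Spanish -> English), for illustration only.
LEXICON: Dict[str, str] = {"el": "the", "gato": "cat", "negro": "black"}
POS: Dict[str, str] = {"el": "DET", "gato": "NOUN", "negro": "ADJ"}


def direct_translate(tokens: List[str]) -> List[str]:
    """Direct level: one word-for-word step, no analysis and no generation."""
    return [LEXICON.get(tok, tok) for tok in tokens]


def analyse(tokens: List[str]) -> List[Tuple[str, str]]:
    """Analysis: attach a part-of-speech tag to each source token."""
    return [(tok, POS.get(tok, "UNK")) for tok in tokens]


def transfer(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Transfer: translate words and swap NOUN ADJ into ADJ NOUN."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(tagged):
        is_noun_adj = (
            i + 1 < len(tagged)
            and tagged[i][1] == "NOUN"
            and tagged[i + 1][1] == "ADJ"
        )
        if is_noun_adj:
            noun, adj = tagged[i][0], tagged[i + 1][0]
            out.append((LEXICON.get(adj, adj), "ADJ"))
            out.append((LEXICON.get(noun, noun), "NOUN"))
            i += 2
        else:
            tok, tag = tagged[i]
            out.append((LEXICON.get(tok, tok), tag))
            i += 1
    return out


def generate(tagged: List[Tuple[str, str]]) -> List[str]:
    """Generation: emit the target surface forms (trivial in this toy)."""
    return [tok for tok, _ in tagged]


def transfer_translate(tokens: List[str]) -> List[str]:
    """Transfer level: analysis, then transfer, then generation."""
    return generate(transfer(analyse(tokens)))
```

With the input tokens "el gato negro", the direct path keeps the source word order, while the transfer path applies the reordering rule between analysis and generation.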

2.2. Sources of information

MT sources of information can be rules or data. The former is linguistically motivated, and the latter is statistically motivated.

Rules. MT approaches based on rules (i.e. RBMT) use linguistic information such as monolingual and bilingual dictionaries combined with human linguistic knowledge. Rules are developed manually to transfer text in the source language into target language text. Most popular RBMT approaches apply three different phases: analysis, transfer and generation.

Data. Data-driven MT approaches use information from data and complex algorithms which together are capable of modeling translation. Data-driven MT includes example-based (EBMT) and statistical-based (SMT) approaches. By definition, EBMT approaches perform direct translation by analogy and can be seen as a pattern matching problem. Unlike these, SMT systems try to find the most probable translation given the source sentence, by reference to the models built using data such as the translation and language model (Brown et al., 1993). SMT can be classified into phrase, syntax and hierarchical. The main difference among these models is the structure of the bilingual units which can be built from: (1) plain text in the case of phrase models; (2) complex data including grammars and dependency trees in syntax models; and (3) plain text but allowing hierarchical units in hierarchical systems.

Given that hybridization is the focus of this study, we will consider this latter criterion (sources of information) in order to distinguish MT paradigms. Within this category, we detail a wide variety of hybridization approaches.
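The SMT idea of finding the most probable translation from a translation model and a language model is commonly written in noisy-channel form as choosing the target sentence e that maximizes P(f | e) · P(e) for a source sentence f. The sketch below illustrates that selection over a fixed candidate list. The probability tables, the candidate strings and the function names are hypothetical toy values, not data from the cited work, and a real decoder searches a much larger hypothesis space.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical toy models, for illustration only.
# TRANSLATION_MODEL[(f, e)] stands for P(f | e); LANGUAGE_MODEL[e] for P(e).
TRANSLATION_MODEL: Dict[Tuple[str, str], float] = {
    ("casa blanca", "white house"): 0.4,
    ("casa blanca", "house white"): 0.5,
}
LANGUAGE_MODEL: Dict[str, float] = {
    "white house": 0.09,
    "house white": 0.001,
}
FLOOR = 1e-9  # assumed back-off probability for unseen events


def score(source: str, candidate: str) -> float:
    """Return log P(f | e) + log P(e) for one candidate translation."""
    tm = TRANSLATION_MODEL.get((source, candidate), FLOOR)
    lm = LANGUAGE_MODEL.get(candidate, FLOOR)
    return math.log(tm) + math.log(lm)


def decode(source: str, candidates: Iterable[str]) -> str:
    """Pick the candidate with the highest combined model score."""
    return max(candidates, key=lambda e: score(source, e))
```

In this toy setting the translation model alone slightly prefers the word-for-word order, and the language model is what tips the decision towards the fluent candidate.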

² See references in http://www.statmt.org/survey/Topic/SystemCombination.

Fig. 1. Classification of hybrid MT architectures.

Fig. 2. Schema of hybridization guided by RBMT.

3. Hybridization of machine translation architectures

Several different methodologies have been used to hybridize MT within and across paradigms. As shown in Fig. 1, hybridization of RBMT and corpus-based MT can be classified into those guided by RBMT or guided by corpus-based MT. The former integrates data information into a rule-based architecture; the latter integrates linguistic rules into a corpus-based architecture.

3.1. Hybridization guided by RBMT

There are several kinds of strategy within this category: introducing a corpus to build the RBMT system, introducing corpus-based tools to weight the RBMT output and carrying out statistical post-editing of the RBMT output.

Using a corpus to build the RBMT system. The main reason for using data when building an RBMT system is to reduce its cost and the time and effort required. A quite straightforward approach is to enhance dictionaries with phrases (Habash et al., 2009) or examples (Sánchez-Martínez et al., 2009; Antonova and Misyurev, 2014) extracted from parallel corpora, and to extract new entries from BabelNet and Wiktionary (Göhring, 2014). More complex approaches extract transfer rules (Sánchez-Martínez and Forcada, 2009), build lexical selection modules using parallel corpora with finite-state transducers (Tyers et al., 2012) or Maximum Entropy Markov Models (Rudnick and Gasser, 2013), and combine several of these techniques (Costa-jussà and Centelles, 2015).

Corpus-based tools for weighting the RBMT output. There is work that focuses on improving the RBMT output by integrating tools such as language models (Dove et al., 2012) or stochastic parsers (Federmann and Hunsicker, 2011). Papers like Labaka et al. (2014) show a hybrid translation system guided by the RBMT engine in which, before transference, a set of partial candidate translations provided by SMT subsystems is used to enrich the tree-based representation. The final hybrid translation is created by choosing the most probable combination among the available fragments with a statistical decoder in a monotonic way (see Fig. 2). In addition, there are RBMT systems that introduce machine learning techniques such as classifiers in order to identify the set of appropriate translation candidates (Hunsicker et al., 2012). Recent experiments at Systran build a statistical inference module to replace the RBMT transfer module (Crego, 2014) and experiments at Lingenio show that RBMT systems can learn morphological classification, semantic and syntactic information from corpus data (Eberle, 2014).

Statistical post-editing of RBMT outputs. There are studies that carry out statistical post-editing for RBMT systems (Simard et al., 2007; Lagarda et al., 2009) and it is even a commercial reality³ as pointed out in Béchara et al. (2012). Generally speaking, these approaches consider RBMT outputs as source sentences and post-edited results as target sentences. In other cases, as in Suzuki (2011), confidence estimation measures are used instead of manually post-edited results. The statistical module tends to be implemented with Moses (Koehn et al., 2007). In this case, RBMT and SMT paradigms are concatenated but not integrated at the architecture level.

³ http://www.systran.co.uk/translation-products/server/systran-enterprise-server.

Fig. 3. Schema of hybridization guided by corpus-based MT.
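The data flow of statistical post-editing described in Section 3.1 (RBMT output as the source side, post-edited text as the target side) can be sketched as follows. This is only an illustration under loud assumptions: the function names are hypothetical, and the word-substitution learner is a deliberately naive stand-in for the statistical module, which the studies above typically implement with a full SMT toolkit such as Moses.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def build_spe_corpus(
    sources: Sequence[str],
    rbmt_translate: Callable[[str], str],
    post_edits: Sequence[str],
) -> List[Tuple[str, str]]:
    """Pair each RBMT output (new 'source') with its post-edit ('target')."""
    return [(rbmt_translate(src), pe) for src, pe in zip(sources, post_edits)]


def learn_substitutions(corpus: List[Tuple[str, str]]) -> Dict[str, str]:
    """Toy stand-in for SMT training: most frequent word-level correction.

    Only sentence pairs of equal length are used, so that tokens can be
    aligned by position; a real system would learn phrase alignments.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for mt, pe in corpus:
        mt_toks, pe_toks = mt.split(), pe.split()
        if len(mt_toks) != len(pe_toks):
            continue
        for m, p in zip(mt_toks, pe_toks):
            counts[m][p] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def post_edit(mt_output: str, table: Dict[str, str]) -> str:
    """Apply the learned corrections to a new RBMT output."""
    return " ".join(table.get(tok, tok) for tok in mt_output.split())
```

The two stages stay separate: the RBMT engine is used as a black box and the learned module only sees its output, which mirrors the concatenation (rather than integration) noted above.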

3.2. Hybridization guided by corpus-based MT

A hybrid system guided by corpus-based MT may incorporate rules or just combine various corpus-based approaches. There are basically two main ways of integrating rules into corpus-based approaches: using rules in pre/post-processing, and integrating dictionaries/rules into the core model.

Rules in pre/post-processing. Pre-processing rules have been used to reorder the source sentence into a form that better matches the target language (Xia and McCord, 2004; Collins et al., 2005; Wang et al., 2007; Patel et al., 2013). The schema for this type of strategy is shown in Fig. 3. Post-processing rules for morphology generation have been introduced by means of a combination of machine learning and the introduction of dictionaries (Formiga et al., 2012). Finally, a set of both pre-processing and post-processing rules has been compiled ad hoc for the Spanish-Catalan translation pair in Farrús et al. (2011), in order to solve the normalization problems typically found in noisy corpora.
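A pre-processing reordering rule of the kind mentioned above can be sketched as a function applied to the tagged source sentence before it reaches the statistical decoder. The rule here (move verbs to the end of the clause, as one might try for a verb-final target language) and the tag names are hypothetical simplifications, not the rules of any cited paper, and the decoder is passed in as an arbitrary callable.

```python
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, part-of-speech tag)


def move_verbs_to_end(tagged: Tagged) -> Tagged:
    """Hypothetical rule: keep relative order, but place verbs last."""
    verbs = [item for item in tagged if item[1] == "VERB"]
    others = [item for item in tagged if item[1] != "VERB"]
    return others + verbs


def translate_with_prereordering(
    tagged: Tagged,
    decode: Callable[[List[str]], List[str]],
    rule: Callable[[Tagged], Tagged] = move_verbs_to_end,
) -> List[str]:
    """Apply the linguistic rule first, then hand the tokens to the decoder."""
    reordered = rule(tagged)
    return decode([tok for tok, _ in reordered])
```

Because the source already follows target-like order after the rule, the decoder passed as `decode` can be run with little or no reordering of its own.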

Incorporating dictionaries/rules into the core model. Rules may be integrated into the core model of corpus-based approaches. Early work such as Carl et al. (2000) integrates morphology and syntax knowledge from the RBMT system dynamically into an EBMT system. In other cases, RBMT systems have been integrated into the phrase-based SMT modules. For example, Hua and Haifeng (2004) use RBMT information to improve statistical word alignment. Then, Eisele et al. (2008) augment the standard phrase table with entries obtained after translating the data with several RBMT systems. The resulting phrase table thus combines statistically gathered phrase pairs with phrase pairs generated by linguistic rules. Similarly, Sánchez-Cartagena et al. (2011) enrich the phrase table with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system, and carry out a comparison with the earlier paper (Eisele et al., 2008). Further work by these latter authors (Chen and Eisele, 2010) integrates a commercial RBMT system with a hierarchical SMT system by extracting rules from RBMT translations. The hybrid system inherits the lexicons from both sub-systems as well as local syntactic constructions defined in RBMT. From a different perspective, Ahsan et al. (2010) focus on integrating local and long reorderings as well as the generation module from an RBMT system, into the core translation model of a standard statistical system. Furthermore, Enache et al. (2012) introduce rules from a grammar formalism into the phrase table, and Okuma et al. (2008) introduce dictionaries into the phrase table to reduce the number of unknown words.
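The general idea of enriching a phrase table with rule-generated entries can be sketched as a merge of two sources of phrase pairs. The data structure, the `from_rules` flag and the default probability assigned to rule-generated pairs are all assumptions made for this illustration; the cited systems each define their own scoring and feature functions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class PhraseEntry:
    source: str
    target: str
    prob: float
    from_rules: bool  # hypothetical feature marking the entry's origin


def augment_phrase_table(
    statistical: Dict[Tuple[str, str], float],
    rule_pairs: Iterable[Tuple[str, str]],
    default_prob: float = 0.1,  # assumed score for rule-generated pairs
) -> List[PhraseEntry]:
    """Combine corpus-extracted phrase pairs with rule-generated ones."""
    entries = [
        PhraseEntry(src, tgt, prob, False)
        for (src, tgt), prob in statistical.items()
    ]
    seen: Set[Tuple[str, str]] = set(statistical)
    for pair in rule_pairs:
        if pair in seen:
            continue  # keep the corpus-estimated score when both sources agree
        seen.add(pair)
        entries.append(PhraseEntry(pair[0], pair[1], default_prob, True))
    return entries
```

Keeping an origin flag on each entry lets a decoder weight rule-generated pairs separately from statistically gathered ones, which is one plausible way to balance the two sources.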

Hybridization within corpus-based approaches. When combining corpus-based approaches, Groves and Way (2005) mix sub-sentential alignments from phrase-based SMT and EBMT systems, proposing to build a hybrid 'example-based' SMT system incorporating marker chunks and SMT sub-sentential alignments. There is an extensive body of work on incorporating translation memories (TM) into phrase-based SMT systems. TMs are simply large databases of translated words and sequences of words, generally created by human translators. One of the most recent studies (Wang et al., 2013) proposes integrated models to make maximum use of TM information during decoding. The aim is to keep all possible corresponding TM target phrases for each source phrase. The integrated models then consider all corresponding TM target phrases and SMT preferences during decoding. Therefore, the proposed integrated models combine SMT and TM at a deep level. A traditional way that cannot be neglected is the use of templates (Och and Ney, 2004), which themselves can be considered to be stochastically-extracted transduction type rules. There are also approaches that combine n-gram and phrase SMT in series (Costa-jussà and Fonollosa, 2010). The former pre-reorders the source sentences and offers a reordering graph that the latter translates using monotonic decoding.

Table 1
Hybrid MT architectures, added information and the corresponding most representative references.

Guided by | Information | References
RBMT | Corpus to build | Habash et al. (2009), Sánchez-Martínez et al. (2009), Antonova and Misyurev (2014), Göhring (2014), Sánchez-Martínez and Forcada (2009), Tyers et al. (2012), Rudnick and Gasser (2013), Costa-jussà and Centelles (2015)
RBMT | Corpus to weight outputs | Federmann and Hunsicker (2011), Dove et al. (2012), Hunsicker et al. (2012), Labaka et al. (2014), Crego (2014), Eberle (2014)
RBMT | Statistical post-editing | Simard et al. (2007), Lagarda et al. (2009), Suzuki (2011), Béchara et al. (2012)
Corpus-based MT | Rules in pre/post-processing | Xia and McCord (2004), Collins et al. (2005), Wang et al. (2007), Patel et al. (2013), Farrús et al. (2011), Formiga et al. (2012)
Corpus-based MT | Dictionaries/rules into the core model | Carl et al. (2000), Hua and Haifeng (2004), Okuma et al. (2008), Eisele et al. (2008), Sánchez-Cartagena et al. (2011), Chen and Eisele (2010), Ahsan et al. (2010), Enache et al. (2012)
Corpus-based MT | Only corpus | Och and Ney (2004), Groves and Way (2005), Wang et al. (2013), Carbonell et al. (2006), Carl et al. (2008), Costa-jussà and Fonollosa (2010), Tambouratzis et al. (2013)

Finally, there are approaches that are exempt from the requirement for parallel corpora or resources in general. There is a method that needs no parallel text and relies on a translation model built from a bilingual dictionary, and a decoder for long-range context (Carbonell et al., 2006). In the same direction, other systems use low resources (Carl et al., 2008) and a methodology designed to facilitate rapid creation of the MT system for unconstrained language pairs (Tambouratzis et al., 2013) (Table 1).

4. Machine translation applications with hybrid components

Among the variety of MT applications, we can name popular ones such as speech translation, cross-lingual information retrieval and computer-aided translation. Hybridization within these applications has been used in different ways and we offer comments on some of them without aiming to be exhaustive. See Table 2 for a short summary of references.

Speech translation. Frequently, speech translation is addressed as a concatenation of a speech recognizer, a machine translator and a speech synthesizer. Hybridization in this application can be placed in any of the three systems. In speech recognition, hybridization has been done by incorporating neural network approaches into state-of-the-art continuous speech recognition systems based on hidden Markov models (HMMs) (Bourlard and Morgan, 1993). There is also the combination of HMMs and learning vector quantization (LVQ) (Katagiri and Lee, 1993), or the use of Support Vector Machines (SVMs) for classification by integrating this method into an HMM-based speech recognition system (Ganapathiraju, 2002). In text synthesis, the hybridization has been done by combining concatenative synthesis and statistical synthesis (Tiomkin et al., 2011).
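The concatenation of recognizer, translator and synthesizer described above can be sketched as a composition of three interchangeable components. The type aliases and the function name are hypothetical; the point is only that each stage is a black box, so a hybrid component can be swapped into any of the three positions without touching the other two.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]   # audio in, source-language text out
Translator = Callable[[str], str]     # source text in, target text out
Synthesizer = Callable[[str], bytes]  # target text in, audio out


def make_speech_translator(
    recognize: Recognizer,
    translate: Translator,
    synthesize: Synthesizer,
) -> Callable[[bytes], bytes]:
    """Chain the three stages into a single speech-to-speech function."""

    def pipeline(audio: bytes) -> bytes:
        text = recognize(audio)
        translated = translate(text)
        return synthesize(translated)

    return pipeline
```

In a test harness the three callables can be simple stubs, which makes it easy to evaluate one hybrid stage in isolation.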

Cross-lingual information retrieval. Normally, the application of cross-lingual information retrieval is done by concatenating MT and information retrieval. For example, Mittal et al. (2010) present a hybrid information system combining: (1) an ontology for the retrieval of the user's context; (2) a user profile that is temporarily updated according to the user's browsing behavior; and (3) collaborative filtering for considering recommendations of similar users. Elsewhere, Rose and Belew (1989) use a combination of symbolic and connectionist artificial intelligence techniques.

Table 2
Hybrid MT applications.

MT applications | References
Speech translation | Bourlard and Morgan (1993), Katagiri and Lee (1993), Ganapathiraju (2002), Tiomkin et al. (2011)
Cross-lingual information retrieval | Mittal et al. (2010), Rose and Belew (1989)
Computer-aided translation | Wong et al. (2012), Yamabana et al. (1997), Federico et al. (2014)

Computer-aided translation. Finally, computer-aided translation is by definition a combination of the roles of both man and machine. Recent work (Wong et al., 2012) uses a machine-aided translation system, which is a hybrid system that applies not only … technology but also … methodologies, including the annotation schema of Translation Corresponding Tree (TCT) in the representation of bilingual examples, and the language formalism of Constraint-based Synchronous Grammar (CSG) in analyzing the syntactic structure between the languages. Yamabana et al. (1997) also propose a hybrid interactive method that combines rule and example-based approaches with an interactive man-machine interface. Advanced work in the field, such as Federico et al. (2014), includes approaches of incremental training or active learning which are representative of live human-machine hybridization where the system learns and improves based on human interaction.

5. Conclusions

This survey reported an overview of several relevant works on hybrid MT which combine different MT architectures to provide better translation quality. Combinations aim at extracting the best features of each paradigm and solving the problems of pure architectures. That is why hybrid MT has helped to advance the field and is a promising line of research.

This paper provides a structured classification that can cover the majority of research in hybrid MT. The classification is based on the fact that combinations of approaches are normally guided by a core system which can be either rule or corpus-based. Most of the research combines sources of information (rules and data), but there are also projects combining various corpus-based approaches. It is difficult to assess which is the most relevant or promising hybrid type of architecture, but it would seem reasonable to use the best-performing system as a guide, and the others for additional information.

The good results produced by hybridization have led to a corresponding spread of MT applications such as speech translation, cross-language information retrieval, computer-aided and post-edited systems. Work with hybrid strategies both in MT and its applications brings significant improvement because they allow the simultaneous exploitation of a variety of systems.

Acknowledgements

The authors would like to express particular thanks to Declan Groves for his valuable comments while reviewing this paper. This paper has been partially supported by the Spanish Ministerio de Economía y Competitividad, contract TEC2012-38939-C03-02, as well as the European Regional Development Fund (ERDF/FEDER), the Seventh Framework Program of the European Commission through the International Outgoing Fellowship, Marie Curie Action (IMTraP-2011-29951), and the AGAUR under the MOOCs 2013 contract.

References

Ahsan, A., Kolachina, P., Kolachina, S., Sharma, D.M., Sangal, R., 2010. Coupling statistical machine translation with rule-based transfer and generation. In: Proceedings of the 9th Conference of the Association for Machine Translation in the Americas.

Antonova, A., Misyurev, A., 2014. Improving the precision of automatically constructed human-oriented translation dictionaries. In: Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), pp. 58–66.

Béchara, H., Rubino, R., He, Y., Ma, Y., Genabith, J., 2012. An evaluation of statistical post-editing systems applied to RBMT and SMT systems. In: Proceedings of the International Conference on Computational Linguistics (COLING), pp. 215–230.

Bourlard, H.A., Morgan, N., 1993. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell, MA, USA.

Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L., 1993. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. (June (2)), 263–311.

Carbonell, J.G., Klein, S., Miller, D., Steinbaum, M., Grassiany, T., Frey, J., 2006. Context-based machine translation. In: Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).

Carl, M., Melero, M., Badia, T., Vandeghinste, V., Dirix, P., Schuurman, I., Markantonatou, S., Sofianopoulos, S., Vassiliou, M., Yannoutsou, O., 2008. METIS-II: low resource machine translation. Mach. Transl. (1–2), 67–99.

Carl, M., Pease, C., Iomdin, L., Streiter, O., 2000. Towards a dynamic linkage of example-based and rule-based machine translation. Mach. Transl. (3), 223–257.

Chen, Y., Eisele, A., 2010. Integrating a rule-based with a hierarchical translation system. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC).

Collins, M., Koehn, P., Kučerová, I., 2005. Clause restructuring for statistical machine translation. In: Proceedings of the ACL, Ann Arbor, pp. 531–540.

Costa-jussà, M.R., Centelles, J., 2015. Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques. ACM Trans. Asian Lang. Inf. Process. (submitted for publication).

Costa-jussà, M.R., Farrús, M., 2014. Statistical machine translation enhancements through linguistic levels: a survey. ACM Comput. Surv. (3), 42.

Costa-jussà, M.R., Fonollosa, J.A.R., 2010. Using linear interpolation and weighted reordering hypotheses in the Moses system. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC), European Languages Resources Association (ELRA).

Crego, J.M., 2014 April. SYSTRAN RBMT engine: hybridization experiments. In: Talk at 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), Gothenburg, Sweden.

Dove, C., Loskutova, O., Fuente, R., 2012. What's your pick: RbMT, SMT or hybrid? In: Proceedings of the 11th Conference of the Association for Machine Translation in the Americas (AMTA), San Diego.

Eberle, K., 2014. Hybrid Strategies for better products and shorter time-to-market. In: Talk at 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), April, Gothenburg, Sweden.

Eisele, A., Federmann, C., Saint-Amand, H., Jellinghaus, M., Herrmann, T., Chen, Y., 2008. Using Moses to integrate multiple rule-based machine translation engines into a hybrid system. In: Proceedings of the 3rd Workshop on Statistical Machine Translation (WMT), pp. 179–182.

Enache, R., España-Bonet, C., Ranta, A., Màrquez, L., 2012. A hybrid system for patent translation. In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT), May, Trento, Italy, pp. 269–276.

Farrús, M., Costa-jussà, M.R., Mariño, J., Poch, M., Hernández, A., Henríquez, C., Fonollosa, J., 2011. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan-Spanish language pair. Lang. Resour. Eval. (2), 181–208.

Federico, M., Bertoldi, N., Cettolo, M., Negri, M., Turchi, M., Trombetti, M., Cattelan, A., Farina, A., Lupinetti, D., Martines, A., Massidda, A., Schwenk, H., Barrault, L., Blain, F., Koehn, P., Buck, C., Germann, U., 2014. The MateCat tool. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations, August, Dublin, Ireland, pp. 129–132.

Federmann, C., Hunsicker, S., 2011. Stochastic parse tree selection for an existing RBMT system. In: Proceedings of the 6th Workshop on Statistical Machine Translation (WMT), pp. 351–357.

Formiga, L., Hernández, A., Mariño, J.B., Monte, E., 2012. Improving English to Spanish out-of-domain translations by morphology generalization and generation. In: AMTA Workshop on Monolingual Machine Translation.

Ganapathiraju, A., 2002. Support Vector Machines for Speech Recognition (PhD thesis). Mississippi State, MS, USA.

Göhring, A., 2014. Building a Spanish-German dictionary for hybrid MT. In: Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), pp. 30–35.

Groves, D., Way, A., 2005 Dec. Hybrid data-driven models of machine translation. Mach. Transl. (3–4), 301–323.

Habash, N., Dorr, B., Monz, C., 2009. Symbolic-to-statistical hybridization: extending generation-heavy machine translation. Mach. Transl. (1), 23–63.

Hua, W., Haifeng, W., 2004. Improving statistical word alignment with a rule-based machine translation system. In: Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, p. 29.

Hunsicker, S., Yu, C., Federmann, C., 2012. Machine learning for hybrid machine translation. In: Proceedings of the 7th Workshop on Statistical Machine Translation (WMT), June, pp. 312–316.

Katagiri, S., Lee, C.-H., 1993. A new hybrid algorithm for speech recognition based on HMM segmentation and learning vector quantization. IEEE Trans. Speech Audio Process. (October (4)), 421–430.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E., 2007. Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 177–180.

Labaka, G., España-Bonet, C., Màrquez, L., Sarasola, K., 2014. A hybrid machine translation architecture guided by Syntax. Mach. Transl. 28, 1–35.

Lagarda, A.-L., Alabau, V., Casacuberta, F., Silva, R., Diaz-de Liano, E., 2009. Statistical post-editing of a rule-based machine translation system. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pp. 217–220.

Mittal, N., Nayak, R., Govil, M.C., Jain, K.C., 2010. Evaluation of a hybrid approach of personalized web information retrieval using the FIRE data set. In: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, New York, NY, USA, pp. 52:1–52:6.

Och, F.J., Ney, H., 2004. The alignment template approach to statistical machine translation. Comput. Linguist. (December (4)), 417–449.

Okuma, H., Yamamoto, H., Sumita, E., 2008. Introducing a translation dictionary into phrase-based SMT. IEICE Trans. Inf. Syst. E91-D (July (7)), 2051–2057.

Patel, R.N., Gupta, R., Pimpale, Prakash, B., M.S., 2013. Reordering rules for English-Hindi SMT. In: Proceedings of the 2nd Workshop on Hybrid Approaches to Translation (HyTra), August, Sofia, Bulgaria, pp. 34–41.

Rose, D.E., Belew, R.K., 1989. Legal information retrieval: a hybrid approach. In: Proceedings of the 2nd International Conference on Artificial Intelligence and Law, ICAIL'89, ACM, New York, NY, USA, pp. 138–146.

Rudnick, A., Gasser, M., 2013. Lexical selection for hybrid MT with sequence labeling. In: Proceedings of the 2nd Workshop on Hybrid Approaches to Translation (HyTra), August, Sofia, Bulgaria, pp. 102–108.

Sánchez-Cartagena, V.M., Sánchez-Martínez, F., Pérez-Ortiz, J.A., et al., 2011. Integrating shallow-transfer rules into phrase-based statistical machine translation. In: Machine Translation Summit.

Sánchez-Martínez,

F.,

Forcada,

M.L.,

2009.

Inferring

shallow-transfer

machine

translation

rules

from

small

parallel

corpora.

Artif.

Intell.

Res.

34,

605–635.

Sánchez-Martí

nez,

F.,

Forcada,

M.L.,

Way,

A.,

2009.

Hybrid

rule-based-example-based

MT:

feeding

Apertium

with

sub-sentential

translation

units.

In:

Proceedings

the

3rd

Workshop

Example-Based

Machine

Translation,

Dublin,

pp.

11–18.

Simard,

M.,

Uefﬁng,

N.,

Isabelle,

P.,

Kuhn,

R.,

2007.

Rule-based

translation

with

statistical

phrase-based

post-editing.

In:

Proceedings

the

2nd

Workshop

Statistical

Machine

Translation

(WMT),

pp.

203–206.

Suzuki,

H.,

2011.

Automatic

post-editing

based

SMT

and

its

selective

application

sentence-level

automatic

quality

evaluation.

In:

Proceedings

the

13th

Machine

Translation

Summit

(MT

Summit

XIII),

International

Association

for

Machine

Translation,

pp.

156–163.

Tambouratzis,

G.,

Soﬁanopoulos,

S.,

Vassiliou,

M.,

2013.

Language-independent

hybrid

with

PRESEMT.

In:

Proceedings

the

2nd

Workshop

Hybrid

Approaches

Translation

(HyTra),

August,

Soﬁa,

Bulgaria,

pp.

123–130.

Tiomkin,

S.,

Malah,

D.,

Shechtman,

S.,

Kons,

Z.,

2011.

hybrid

text-to-speech

system

that

combines

concatenative

and

statistical

synthesis

units.

IEEE

Trans.

Audio

Speech

Lang.

Process.

(July

(5)),

1278–1288.

Tyers,

F.M.,

Sánchez-Martínez,

F.,

Forcada,

M.L.,

2012.

Flexible

ﬁnite-state

lexical

selection

for

rule-based

machine

translation.

In:

Proceedings

the

16th

Conference

the

European

Association

for

Machine

Translation

(EAMT),

May,

Trento,

Italy,

pp.

213–220.

Wang,

C.,

Collins,

M.,

Koehn,

P.,

2007.

Chinese

syntactic

reordering

for

statistical

machine

translation.

In:

Proceedings

the

Conference

Empirical

Methods

Natural

Language

Processing

and

Computational

Natural

Language

Learning,

pp.

737–745.

Wang,

K.,

Zong,

C.,

Su,

K.-Y.,

2013.

Integrating

translation

memory

into

phrase-based

machine

translation

during

decoding.

In:

Proceedings

the

51st

Annual

Meeting

the

Association

for

Computational

Linguistics,

August,

Soﬁa,

Bulgaria,

pp.

11–21.

Wong,

F.,

Oliveira,

F.,

Li,

Y. ,

2012.

Hybrid

machine

aided

translation

system

based

constraint

synchronous

grammar

and

translation

corresponding

tree.

Comput.

(February

(2)),

309–316.

Wu,

D.,

2005.

model

space:

statistical

versus

compositional

versus

example-based

machine

translation.

Mach.

Transl.

(December

(3–4)),

213–227.

Xia,

F.,

McCord,

M.,

2004.

Improving

statistical

system

with

automatically

learned

rewrite

patterns.

In:

Proceedings

the

20th

International

Conference

Computational

Linguistics

(COLING).

Yamabana,

K.,

Kamei,

S.-i.,

Muraki,

K.,

Doi,

S.,

Tamura,

S.,

Satoh,

K.,

1997.

hybrid

approach

interactive

machine

translation:

integrating

rule-

based,

corpus-based,

and

example-based

method.

In:

Proceedings

the

15th

International

Joint

Conference

Artiﬁcal

Intelligence,

IJCAI’97,

vol.

San

Francisco,

CA,

USA,

pp.

977–982.
