2020 IEEE Region 10 Symposium (TENSYMP), 5-7 June 2020, Dhaka, Bangladesh
An Empirical Framework of Idioms Translator
From Bengali to English: Rule Based Approach
Ayesha Khatun, Md Gulzar Hussain, Md Jahidul Islam, Sumaiya Kabir§, Md Mahin
Department of Computer Science & Engineering,
Green University of Bangladesh, Dhaka, Bangladesh.
ayeshankhatun@gmail.com, gulzar.ace@gmail.com, jahidul.jnucse@gmail.com,
summa.cse@gmail.com§, mahin@cse.green.edu.bd
Abstract—Idioms play a vital role in effective communication and are a crucial part of cultural heritage. An idiom is a group of words whose combined meaning differs from the meanings of its individual words. Because of this metaphorical behavior, idioms cause difficulties for general machine translation systems. In this paper, we propose a framework for translating Bengali sentences containing idioms into English. Context-sensitive grammar rules are created for parsing, and a top-down algorithm is used to parse the sentences. We also propose an algorithm for translating the idioms within sentences. The proposed system is implemented and tested with about 15,000 sentences. The performance analysis shows an accuracy of 85.33%, which is quite satisfactory.
Keywords—Bangla Machine Translator; Idioms; Bangla Language Processing (BLP); Left corner parsing algorithm.
I. INTRODUCTION
An idiom is a commonly used word or phrase that means something other than its literal sense. Idioms convey a specific feeling and tone in a language, and because they are so commonly used, native speakers readily recognize them. Machine Translation (MT) is the use of computers to translate a source language into a target language, generally without human intervention.
A typical MT model has three main phases: parsing, transfer, and generation. Our idiom translator, however, has four stages: idiom translation, parsing, transfer, and generation. The idiom translator detects the idiomatic part of the sentence and translates it. The parser gathers the syntactic information of the sentence using Context Free Grammars (CFG). In the transfer stage, the grammar rules of the source language are transformed into those of the target language. Finally, the target sentence is produced in the generation stage. Because idioms do not carry the literal meaning of their words, translating them from the source language to the target language is hard.
Native Bangla speakers grow up speaking and hearing idioms, and the same is true of native English speakers. Idioms play a vital role in the culture of speakers of every language. In this modern age, sharing knowledge and culture between regions is very important, but the language barrier keeps Bangladeshis from fully benefiting from other cultures. Bangla Language Processing can play an important role in overcoming this barrier. Our idiom translator can help Bengali people understand idioms in the English language, which will help them learn about English-speaking culture and break the cultural barrier.
The rest of the paper is organized as follows. Section II discusses related work. Section III describes the methodology and illustrates it with a sample sentence. Section IV presents the results and discussion, and Section V concludes the paper.
II. RELATED WORK
Research on natural language processing started in the 1950s, and the first statistical machine translation systems were developed in the late 1980s [1]. Since then, much work has been done on English. The authors of [2] developed a Japanese-English machine translation system supported by the Japanese government's Science and Technology Agency. During the transfer and generation phases, the system applies many structural transformations to reduce the structural differences between Japanese and English expressions of the same content and to avoid ellipsis problems.
Machine translation from Bangla to other languages is still at an early stage, although much recent work addresses Bangla to English translation and vice versa. A phrase-based Statistical Machine Translation (SMT) approach that also handles Out-of-Vocabulary (OOV) words is proposed in [3]. The authors of [4] proposed a rule-based transfer approach with an algorithm for searching words in the lexicon, and made the lexicon search efficient through an intelligent integer-based lexicon system. In [5], NLP techniques are used to translate English sentences into Bangla: a context-free grammar validates the syntactic structure of a sentence and a bottom-up approach parses it, with 50 sentences used for every tense. In [6], a verb-based machine translation approach from English to Bangla is proposed: the main verb is identified, the English sentence is reduced to a simple form, and that form is then easily translated into Bangla. The authors of [7] also proposed context-sensitive grammar rules, for parsing Bangla assertive, interrogative and imperative sentences. In [8], a new technique with a set of context-sensitive grammar rules is proposed to parse Bangla imperative, optative and exclamatory sentences, giving more importance to the mood of a sentence than to its structure. The authors of [9] worked on finding the appropriate verb form according to tense and subject; they proposed a procedure for finding a semantically valid verb, working with verb roots and several algorithms.
978-1-7281-7366-5/20/$31.00 ©2020 IEEE
Most MT systems translate Bangla sentences into corresponding English sentences, but we found only one work that handles idioms [10]. That paper presents a multilingual parallel idiom dataset for English and seven Indian languages and shows its relevance for two NLP applications. In contrast, we propose a set of CSG rules for our MT system to translate Bangla sentences containing idioms into their corresponding English sentences. Most existing work neither describes an architecture for translating idioms nor uses much data, so in this paper we propose such an architecture for translating sentences.
III. PROPOSED METHODOLOGY
The proposed system has ten modules: idioms checker, idioms translator, tokenizer, rule generator, database, parser, target language rules, source language rules, machine translator, and generator. As a running example, we take the Bengali sentence এই সমাজে লোকেরা অচল পয়সা (literally "people in this society are invalid coins", meaning they are worthless) as input to the system. The step-by-step procedure is given in Fig. 1.
Fig. 1. Workflow of proposed system
A. Tokenizer
The main task of the tokenizer module is to split a sentence into unit strings (tokens). It works together with a database of words and their corresponding Parts of Speech (POS) tags. For the input sentence “এই সমাজে লোকেরা অচল পয়সা”, the output is "এই", "সমাজে", "ব", "লোকেরা", "অচল", "পয়সা". After tokenization, the tokens are passed to the idioms checker.
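A minimal sketch of the tokenizer in Python. It assumes tokens are separated by whitespace and that the Bengali full stop (danda, "।") and common punctuation become separate tokens; the paper does not say how punctuation is handled, and `tokenize` is a hypothetical name.

```python
import re

# Assumption: a token is a run of characters that are neither whitespace nor
# punctuation, or a single punctuation mark (danda "।", "?", "!" or ",").
TOKEN_PATTERN = re.compile(r"[^\s।?!,]+|[।?!,]")


def tokenize(sentence):
    """Split a Bangla sentence into unit strings (hypothetical helper)."""
    return TOKEN_PATTERN.findall(sentence)


if __name__ == "__main__":
    print(tokenize("এই সমাজে লোকেরা অচল পয়সা।"))
```

Matching with a character class instead of `\w` keeps Bengali vowel signs and other combining marks attached to their words.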
B. Idioms Checker
The idioms checker detects idioms in the sentence using the idioms checker algorithm (Algorithm 1). When a word w_i (here অচল) also occurs in an idiom d_i of the idioms dataset, the checker takes the next word w_{i+1} = পয়সা and concatenates the two: k = stringConcat(অচল, পয়সা). Because k equals the dataset idiom d_i, the sentence goes to the next step, the idioms translator. If no match is found, the checker keeps concatenating words up to i = 5 and then passes the sentence to the parser. For example, in "এই সমাজে লোকেরা অচল পয়সা", "অচল পয়সা" is an idiom, so the sentence goes to the idioms translator module.
Algorithm 1: Algorithm for Idioms Checker
1. If w_i is equal to a word in the split of d_i;
2. Find w_{i+1};
3. Function mPairWord(w_1, w_2, ..., w_n);
4. k = function stringConcat(w_1, w_2, ..., w_n);
5. for j = 0 to idioms dataset length − 1 do
      if k == d_j then
         go to 6;
      end
   end
   go to 7;   (reached only when no idiom in the dataset matches k)
6. go to Idioms Translator module;
7. go to Parser module;
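A sketch of Algorithm 1 in Python, assuming idioms are stored as space-joined strings (as in Table I) and that the longest match wins when several candidates start at the same word. The function names and the longest-match rule are assumptions, not the authors' code. Using a set for the dataset replaces the linear loop in step 5 with a constant-time lookup.

```python
def find_idioms(tokens, idioms, max_len=5):
    """Return (start, end, idiom) triples for the idioms found in `tokens`.

    From each word w_i, up to `max_len` consecutive words are concatenated
    (the "up to i = 5" limit in the text) and the concatenation k is looked
    up in the idioms dataset. Longer candidates are tried first.
    """
    matches = []
    i = 0
    while i < len(tokens):
        match = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            k = " ".join(tokens[i:i + n])  # stringConcat(w_i, ..., w_{i+n-1})
            if k in idioms:
                match = (i, i + n, k)
                break
        if match:
            matches.append(match)
            i = match[1]  # resume scanning after the idiom
        else:
            i += 1
    return matches


def next_module(tokens, idioms):
    """Steps 6 and 7: choose the module that receives the sentence."""
    return "idioms translator" if find_idioms(tokens, idioms) else "parser"


# Hypothetical in-memory idioms dataset taken from Table I.
IDIOMS = {"অচল পয়সা", "অকালকুষ্মাণ্ড", "ইঁচড়ে পাকা", "উত্তম-মধ্যম", "এলাহি কাণ্ড"}
SAMPLE = ["এই", "সমাজে", "লোকেরা", "অচল", "পয়সা"]

if __name__ == "__main__":
    print(find_idioms(SAMPLE, IDIOMS))
    print(next_module(SAMPLE, IDIOMS))
```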
C. Idioms Translator
This module translates an idiom into its actual meaning. Since the idioms checker found that the sample input sentence contains the idiom “অচল পয়সা”, this module replaces it with its corresponding meaning “মূল্যহীন” (worthless). After the idiom is translated, the sentence goes to the parser module, as shown in Fig. 2.
Fig. 2. Module of Idioms Translator
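The replacement step can be sketched as follows, assuming the idioms table is loaded into a Python dictionary mapping each idiom to its Bangla meaning (the entries are copied from Table I; `IDIOM_TABLE` and `translate_idioms` are hypothetical names). Each idiom's meaning is kept as a single token, so the parser sees one word where the idiom used to be.

```python
# Hypothetical in-memory copy of Table I: idiom -> Bangla meaning.
IDIOM_TABLE = {
    "অচল পয়সা": "মূল্যহীন",
    "অকালকুষ্মাণ্ড": "অপদার্থ",
    "ইঁচড়ে পাকা": "অকালপক্ব",
    "উত্তম-মধ্যম": "প্রহার",
    "এলাহি কাণ্ড": "বিরাট ব্যাপার",
}


def translate_idioms(tokens, table):
    """Replace every idiom in `tokens` with its meaning from `table`."""
    # Try longer idioms first so a long idiom is not pre-empted by a shorter one.
    entries = sorted(((idiom.split(), meaning) for idiom, meaning in table.items()),
                     key=lambda entry: -len(entry[0]))
    out, i = [], 0
    while i < len(tokens):
        for words, meaning in entries:
            if tokens[i:i + len(words)] == words:
                out.append(meaning)  # the meaning becomes one token
                i += len(words)
                break
        else:  # no idiom starts at position i
            out.append(tokens[i])
            i += 1
    return out


SAMPLE = ["এই", "সমাজে", "লোকেরা", "অচল", "পয়সা"]

if __name__ == "__main__":
    print(translate_idioms(SAMPLE, IDIOM_TABLE))
```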
D. Database
The database module works like a dictionary: it contains the lexicon (the tokens of a sentence) and the related POS tags. For example, the POS tags of the words in the sample sentence are “এই” PN, “সমাজে” N, “ব” Adj, “লোকেরা” N, and “মূল্যহীন” Adj. The database also has a second table containing a set of Bangla idioms and their meanings. Table I shows the Idioms Table.
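TABLE I
BANGLA IDIOMS TABLE
Idiom (di) | Meaning (mi) | English gloss
অচল পয়সা | মূল্যহীন | worthless
অকালকুষ্মাণ্ড | অপদার্থ | good-for-nothing
ইঁচড়ে পাকা | অকালপক্ব | precocious
উত্তম-মধ্যম | প্রহার | a beating
এলাহি কাণ্ড | বিরাট ব্যাপার | a grand affair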
E. Rule Generator
The rule generator module generates the grammatical rules of Bangla sentences. For translation, it generates Context-Sensitive Grammar (CSG) rules. The parse tree of the sample input sentence is built with the rules NP → N (Biv) (Adj), S → NP VP, and NP → (Qnt) (PP) N PN. Sample CSG rules for simple Bangla sentences are listed in Table II.
TABLE II
BANGLA CSG RULES
Rule No | Bangla CSG Rule
1 | S → NP VP
2 | NP → N (Biv) (Adj)
3 | NP → N (Aux) (PP)
4 | NP → NP NP
5 | NP → (PN) N (Biv) (Adj)
6 | NP → (Adj) N (Biv)
7 | NP → (Qnt) (PP) N PN
8 | NP → N
9 | PP → Null
10 | V → Null
11 | VP → V
12 | VP → (Adj)
13 | VP → (NP) VP
14 | VP → V (Aux)
15 | Adj → ভাল, অমূল্য, খারাপ, ...
16 | PN → এই, আমি, আপনি, তুমি, ...
17 | N → চোর, প্রহার, লোক, সমাজ, ...
18 | V → হয়, ছাড়ল, পড়া, খাওয়া, ...
19 | Biv → টাকে, এরা, এ, ...
20 | Aux → দিয়ে, পরে, করে, ...
21 | Qnt → একটি, পাঁচটি, ...
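Table II writes optional constituents in parentheses. Assuming that each "(X)" may be either present or absent, a rule can be expanded mechanically into plain productions before parsing. A minimal sketch (`expand` is a hypothetical helper, not part of the described system):

```python
from itertools import product


def expand(lhs, rhs):
    """Expand one Table II rule, e.g. 'N (Biv) (Adj)', into plain productions.

    Each optional "(X)" is either kept or omitted, so a rule with k optional
    symbols yields up to 2**k productions. An empty right-hand side
    corresponds to a "Null" rule.
    """
    slots = [[sym[1:-1], None] if sym.startswith("(") else [sym]
             for sym in rhs.split()]
    productions = []
    for choice in product(*slots):
        body = tuple(sym for sym in choice if sym is not None)
        if (lhs, body) not in productions:  # drop duplicates, keep order
            productions.append((lhs, body))
    return productions


if __name__ == "__main__":
    for production in expand("NP", "N (Biv) (Adj)"):
        print(production)
```

Rule 2, NP → N (Biv) (Adj), expands to four productions: NP → N Biv Adj, NP → N Biv, NP → N Adj, and NP → N.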
F. Parser
A parse tree is a graphical view of the grammatical structure of a sentence. The parser module generates the parse tree of a sentence using the CSG rules and the lexicon. We used the left-corner parsing algorithm to parse the sentence. The parse tree this module generates for the sentence “এই সমাজে লোকেরা মূল্যহীন” (the sample sentence after idiom translation) is shown in Fig. 3.
Fig. 3. Representation of Bangla parse tree
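A left-corner parser can be sketched in a few dozen lines of Python. It works over POS tags such as those from the database lookup and uses a small, hand-picked subset of Table II with the optional symbols already resolved. The grammar subset and all function names are assumptions. The empty rules (PP → Null, V → Null) are left out because this simple backtracking recognizer assumes no empty or cyclic unit rules. Each word is recognized bottom-up as the left corner of a rule, and the rest of that rule is predicted top-down.

```python
# Hypothetical subset of Table II, optional symbols already resolved.
PRODUCTIONS = [
    ("S", ("NP", "VP")),   # rule 1
    ("NP", ("NP", "NP")),  # rule 4
    ("NP", ("PN", "N")),   # rule 5 without (Biv) and (Adj)
    ("NP", ("N",)),        # rule 8
    ("VP", ("Adj",)),      # rule 12
    ("VP", ("NP", "VP")),  # rule 13
]

# Index each production by its left corner (first right-hand-side symbol).
BY_LEFT_CORNER = {}
for lhs, rhs in PRODUCTIONS:
    BY_LEFT_CORNER.setdefault(rhs[0], []).append((lhs, rhs[1:]))


def parse(goal, tags, i):
    """Yield (end, tree) for every way `goal` can span tags[i:end]."""
    if i < len(tags):
        # Shift one POS tag: it is the left corner to build upwards from.
        yield from project(tags[i], tags[i], goal, tags, i + 1)


def project(cat, tree, goal, tags, i):
    """`cat` (with `tree`) ends at i; climb rules whose left corner is `cat`."""
    if cat == goal:
        yield i, tree
    for lhs, rest in BY_LEFT_CORNER.get(cat, []):
        # Predict the remaining right-hand-side symbols top-down.
        for end, kids in parse_sequence(rest, tags, i):
            yield from project(lhs, (lhs, [tree] + kids), goal, tags, end)


def parse_sequence(symbols, tags, i):
    """Yield (end, trees) for `symbols` parsed one after another from i."""
    if not symbols:
        yield i, []
        return
    for mid, first in parse(symbols[0], tags, i):
        for end, others in parse_sequence(symbols[1:], tags, mid):
            yield end, [first] + others


def parses(tags):
    """All complete S trees covering the whole tag sequence."""
    return [tree for end, tree in parse("S", tags, 0) if end == len(tags)]


if __name__ == "__main__":
    # POS tags of "এই সমাজে লোকেরা মূল্যহীন": PN N N Adj.
    for tree in parses(["PN", "N", "N", "Adj"]):
        print(tree)
```

Trees are (label, children) tuples with POS tags as leaves. Even this small grammar yields two parses for PN N N Adj, one grouping "এই সমাজে লোকেরা" as a single NP and one attaching "লোকেরা" inside the VP. A full system would need additional constraints to choose between them.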
G. Transfer
The transfer module converts the Bangla sentence into English. The English grammar rules used for this transformation are listed in Table III. Using these grammar rules and the transformation algorithm, we obtain the parse tree of the English sentence, shown in Fig. 4.
The transformation process has two parts: rule transfer and lexicon transfer. The transformation of grammar rules from the source to the target language, or from the target to the source language, is shown in Table IV.
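Rule transfer and lexicon transfer can be sketched as one recursive pass over the Bangla parse tree. This sketch assumes a hypothetical rule table that pairs a Bangla rule (Table II) with an English rule (Table III) and a hypothetical bilingual lexicon. It is an illustration only and does not reproduce the paper's Table IV. The example sentence "লোকেরা চোরটাকে ছাড়ল" ("the people released the thief") uses words from the Table II lexicon, with চোরটাকে split into the noun চোর and the Biv টাকে.

```python
# Hypothetical bilingual lexicon; inflection would normally be handled
# in the generation stage.
LEXICON = {"লোকেরা": "people", "চোর": "thief", "ছাড়ল": "released",
           "মূল্যহীন": "worthless"}

# Rule transfer: (Bangla label, child labels) -> English child order.
# A string names an existing child; a tuple is a new English-only leaf.
# Children left out of a plan (e.g. the Biv case marker) are dropped.
RULE_TRANSFER = {
    ("VP", ("NP", "VP")): ["VP", "NP"],           # Table II r13 -> Table III r14
    ("VP", ("Adj",)): [("V", "are"), "Adj"],      # Table II r12 -> Table III r13
    ("NP", ("N", "Biv")): [("Det", "the"), "N"],  # Table II r2 -> Table III r4
}


def transfer(node):
    """Transfer a tree whose leaves are (tag, word) and nodes (label, [kids])."""
    label, body = node
    if isinstance(body, str):  # leaf: lexicon transfer
        return (label, LEXICON.get(body, body))
    kids = [transfer(kid) for kid in body]
    plan = RULE_TRANSFER.get((label, tuple(kid[0] for kid in kids)))
    if plan is None:  # same child order in both languages
        return (label, kids)
    pool = {}
    for kid in kids:
        pool.setdefault(kid[0], []).append(kid)
    return (label, [item if isinstance(item, tuple) else pool[item].pop(0)
                    for item in plan])


def words(node):
    """Read the leaves of a tree from left to right."""
    label, body = node
    if isinstance(body, str):
        return [body]
    return [w for kid in body for w in words(kid)]


RELEASED = ("S", [("NP", [("N", "লোকেরা")]),
                  ("VP", [("NP", [("N", "চোর"), ("Biv", "টাকে")]),
                          ("VP", [("V", "ছাড়ল")])])])
WORTHLESS = ("S", [("NP", [("N", "লোকেরা")]), ("VP", [("Adj", "মূল্যহীন")])])

if __name__ == "__main__":
    print(" ".join(words(transfer(RELEASED))))
    print(" ".join(words(transfer(WORTHLESS))))
```

The verb-final Bangla VP is reordered into the English verb-object order, the case marker টাকে is realized as the article "the", and the copula "are", which Bangla omits, is inserted for a predicate adjective.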
TABLE III
ENGLISH CSG RULES
Rule No | English CSG Rule
1 | S → NP VP
2 | S → VP NP
3 | NP → NP NP
4 | NP → Det N
5 | NP → (PP) N (Adv)
6 | NP → (PP) (PN) (Det) N
7 | NP → Adj N
8 | NP → Qnt N
9 | NP → N
10 | NP → PN
11 | NP → (Aux) N
12 | VP → V
13 | VP → V (Adj)
14 | VP → VP NP
15 | VP → V (Gr) (N) (Adj)
16 | VP → Aux V
17 | N → thief, beating, society, person, ...
18 | PN → this, that, I, she, ...
19 | V → release, are, like, eat, go, ...
20 | Adj → old, priceless, bad, good, ...
21 | Aux → do, are, is, ...
22 | PP → in, on, to, ...
23 | Det → the, a, an
24 | Gr → ing
Fig. 4. Representation of English parse tree
TABLE IV
TRANSFORMATION OF TARGET TO SOURCE OR VICE VERSA
IV. EXPERIMENTAL RESULT
To assess the effectiveness of the proposed system, we evaluated it with about 15,000 sentences of different types and lengths, collected from various books, websites, Bangla grammar books, Bangla textbooks, etc.
A. Implementation
We implemented the system in Java on Windows 10, using Java Swing for the user interface and NetBeans 8.2 as the IDE. Fig. 5 shows a snapshot of the implemented MT system translating the sentence “এই সমাজে লোকেরা অচল পয়সা”, which contains an idiom. As Fig. 6 shows, Google Translate does not produce an appropriate translation of the same sentence.
Fig. 5. Translation of the sentence “এই সমাজে লোকেরা অচল পয়সা”
Fig. 6. Translation of the sentence “এই সমাজে লোকেরা অচল পয়সা” in Google Translate
B. Accuracy Rate
Of the 15,000 sentences, 12,800 were translated correctly by the proposed model. The accuracy rate is the ratio of correctly translated sentences to the total number of sentences, so the overall accuracy is 12,800 / 15,000 = 85.33%. Table V shows the accuracy rate for sentences of different lengths, and Fig. 7 plots the system's accuracy rate against sentence length. The graph shows that accuracy decreases as sentence length increases.
TABLE V
ACCURACY RATE OF SENTENCES WITH DIFFERENT LENGTHS
Sentence length (words) | No. of input sentences | Correctly translated sentences | Overall accuracy (%)
3 | 3500 | 3300 | 94.29
4 | 3250 | 2850 | 87.69
5 | 3100 | 2650 | 85.48
6 | 2750 | 2150 | 78.18
7 | 2400 | 1850 | 77.08
Total | 15000 | 12800 | 85.33
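The percentages in Table V follow from the raw counts. This short Python check recomputes them, rounded to two decimals. For three-word sentences, 3300 / 3500 works out to about 94.29%. The names are illustrative.

```python
# Counts from Table V: sentence length -> (input sentences, correctly translated).
COUNTS = {3: (3500, 3300), 4: (3250, 2850), 5: (3100, 2650),
          6: (2750, 2150), 7: (2400, 1850)}


def accuracy(correct, total):
    """Accuracy rate in percent: correctly translated / total sentences."""
    return round(100 * correct / total, 2)


PER_LENGTH = {n: accuracy(correct, total) for n, (total, correct) in COUNTS.items()}
OVERALL = accuracy(sum(c for _, c in COUNTS.values()),
                   sum(t for t, _ in COUNTS.values()))

if __name__ == "__main__":
    print(PER_LENGTH)
    print(OVERALL)
```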
Fig. 7. Accuracy vs. sentence length (in words)
C. Comparison Analysis
Table VI compares our proposed method with [10] on parameters such as application, emphasis, feature, accuracy, and dataset size. As Table VI shows, [10] relies on XML markup, whereas our system uses a rule-based approach, which we consider more suitable than XML markup for translating idioms.
TABLE VI
COMPARISON BETWEEN PAPER [10] AND OUR PROPOSED SYSTEM
Parameter | Paper [10] [R. Agrawal, 2018] | Our proposed system
Application | MT, Sentiment Analysis | MT
Emphasis | Indian languages | Only Bangla language
Feature | XML markup | Rule based
Accuracy | 2.69% BLEU score | 85.33% on a corpus of 0.015 million (15,000) sentences
Dataset (idioms) | 2208 for 7 languages | 986 for Bangla language
V. CONCLUSION
The aim of this paper is to translate Bangla sentences containing idioms into their corresponding English sentences. The idea was to design a proper parsing technique for Bangla sentences with idioms. Our proposed algorithm detects idioms and translates them into their corresponding English meanings, and the experimental results show an accuracy of 85.33%. However, the system may not produce the exact parse tree for some sentences, and we evaluated the implemented parsing model only on very simple and short Bangla sentences. In future work, a stronger parser for Bangla sentences could be designed by updating the CSG rules, for example with semantic features.
ACKNOWLEDGEMENT
This work has been financially supported by Green Univer-
sity of Bangladesh Research Fund.
REFERENCES
[1] Wikipedia. (2019) Natural language processing. [Online]. Available: https://en.wikipedia.org/wiki/Natural_language_processing
[2] M. Nagao, J. Tsujii, and J. Nakamura, “Machine translation from Japanese into English,” Proceedings of the IEEE, vol. 74, no. 7, pp. 993–1012, July 1986.
[3] M. Z. Islam, J. Tiedemann, and A. Eisele, “English to Bangla phrase-based machine translation,” in Proceedings of the 14th Annual Conference of the European Association for Machine Translation, 2010.
[4] M. G. R. Alam, M. M. Islam, and N. Islam, “A new approach to develop an English to Bangla machine translation system,” Daffodil International University Journal of Science and Technology, vol. 6, no. 1, pp. 36–42, 2011.
[5] K. Muntarina, M. G. Moazzam, and M. A.-A. Bhuiyan, “Tense based English to Bangla translation using MT system,” International Journal of Engineering Science Invention, vol. 2, no. 10, pp. 30–38, 2013.
[6] M. Rabbani, K. M. R. Alam, and M. Islam, “A new verb based approach for English to Bangla machine translation,” in 2014 International Conference on Informatics, Electronics & Vision (ICIEV). IEEE, 2014, pp. 1–6.
[7] M. S. Arefin, L. Alam, S. Sharmin, and M. M. Hoque, “An empirical framework for parsing Bangla assertive, interrogative and imperative sentences,” in 2015 International Conference on Computer and Information Engineering (ICCIE). IEEE, 2015, pp. 122–125.
[8] T. Alamgir and M. S. Arefin, “An empirical framework for parsing Bangla imperative, optative and exclamatory sentences,” in 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, 2017, pp. 164–169.
[9] M. Haque and M. Hasan, “English to Bengali machine translation: An analysis of semantically appropriate verbs,” in 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET). IEEE, 2018, pp. 217–221.
[10] R. Agrawal, V. C. Kumar, V. Muralidharan, and D. M. Sharma, “No more beating about the bush: A step towards idiom handling for Indian language NLP,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.