Contexts in source publication

Context 1
... The Mizo language belongs to the Kukish branch of the Sino-Tibetan language family. Two groups of languages derive from the Sino-Tibetan language family, and the resulting language tree is shown in Fig. 1. Tibeto-Burman (TB) comprises hundreds of languages besides Tibetan and Burmese, spread over a vast geographical area (China, India, the Himalayan region, peninsular SE Asia). Of these two sub-groups of Sino-Tibetan, the Kamarupan branch of TB is spoken by people of NE India (the Kuki-Chin-Naga, Abor-Miri-Dafla, and Bodo-Garo sub-groups) and adjacent regions of Burma. These languages constitute the centre of diversification of the whole TB family. Nagaland alone, with an area of only 6,350 sq. mi., is home to some 90 Tibeto-Burman languages and dialects. With a few exceptions, e.g. Lushai (Lorrain 1940), Tangkhul Naga (Pettigrew 1918, Bhat 1969), Garo (Burling 1961), Tiddim Chin (Henderson 1965), and Bawm (Schwerli 1979), these Indospheric TB languages were poorly recorded until recently, and many are still hardly known at all. In the literature, Mizo is the newer name of the Lusai language 3, which falls under the category of TB languages ...

Citations

... Previous Natural Language Processing (NLP) studies on the Mizo language include an analysis of the post-editing effort required to build an English-Mizo parallel dataset [19], identification of Multi-Word Expressions (MWEs) for the Mizo language [20], criteria for recognizing Named Entity classes in the Mizo language [21], and resource building and POS tagging for the Mizo language [22]. A preliminary study of POS tagging in the Mizo language [23] addressed the distinctive characteristics of the Mizo language and the limitations of the Mizo tagging system. ...
Article
Full-text available
Machine Translation (MT) is the process of automatically converting text or speech in one natural language to another with the help of a machine. This work presents a bidirectional Statistical Machine Translation (SMT) system for the extremely low-resource language pair Mizo-English, built in a low-resource setting. A total of 30,800 sentences are collected from the English Bible dataset and manually translated into Mizo by a native linguistic expert to generate the English-Mizo parallel dataset. After various pre-processing steps, the parallel dataset is used to build our MT system with the MOSES toolkit. Our framework uses different tools, such as GIZA++ for creating the Translation Model (TM) and IRSTLM for building the Language Model (LM) of the target language. The quality of our MT system is evaluated using two automatic evaluation metrics: BLEU and METEOR. Our MT systems are also manually evaluated using two parameters: adequacy and fluency.
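As a rough illustration of the automatic evaluation step described above, the following sketch scores a toy hypothesis/reference pair with BLEU and METEOR using NLTK. The sentences are invented placeholders, not taken from the English-Mizo Bible corpus, and the smoothing choice is an assumption rather than the paper's configuration.

```python
# Minimal sketch of BLEU and METEOR scoring for an MT system's output.
# Requires: pip install nltk (plus the 'wordnet' corpus for METEOR).
# The example sentences are invented placeholders.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

# One hypothesis (MT output) and one reference per sentence, pre-tokenized.
hypotheses = [["ka", "lo", "kal", "ang"]]             # hypothetical MT output
references = [[["ka", "lo", "kal", "dawn", "ang"]]]   # hypothetical reference

smooth = SmoothingFunction().method1
bleu = corpus_bleu(references, hypotheses, smoothing_function=smooth)

# METEOR in NLTK is computed per sentence; average it over the test set.
meteor = sum(
    meteor_score(refs, hyp) for refs, hyp in zip(references, hypotheses)
) / len(hypotheses)

print(f"BLEU:   {bleu:.4f}")
print(f"METEOR: {meteor:.4f}")
```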
... Besides, MT-related works include recognizing named entity classes [2]. With monolingual data used to address the low-resource language problem, a filtering approach for the pseudo-parallel corpus is proposed to enlarge the parallel training corpus. ...
Article
Full-text available
Machine translation is one of the most powerful natural language processing applications for preserving and upgrading low-resource languages. Mizo is considered low-resource since the availability of resources is limited. Translation for the English-Mizo language pair is therefore a challenging task. Moreover, Mizo is a tonal language, where a word can express different meanings depending on its tone. There are four tones, namely high, low, rising, and falling. A tone marker, added to the vowels, is used to represent each of the tones and indicate tone variation. Addressing tonal words in machine translation for such a low-resource pair is another challenging issue. In this paper, an English-Mizo corpus is developed in which parallel sentences containing tonal words are incorporated. Different machine translation models, based on statistical machine translation and neural machine translation, are explored as baseline systems. Furthermore, the proposed approach augments the training data by expanding parallel data containing tonal words and achieves state-of-the-art results for both forward and backward translation of tonal words.
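The abstract above describes selecting and expanding parallel sentences that contain tonal words. The sketch below shows one possible filtering step under the assumption that tone is written with diacritic vowels; the specific marker set, the function names, and the toy corpus are illustrative assumptions, not the paper's actual scheme.

```python
# Minimal sketch of selecting parallel sentence pairs that contain tonal
# words, in the spirit of the augmentation described above. The set of
# tone-marked vowels below is an assumption for illustration only; the
# cited work defines its own markers for the four Mizo tones.
import unicodedata

# Hypothetical tone-marked vowels (high, low, rising, falling).
TONE_MARKED_VOWELS = set("áàâǎéèêěíìîǐóòôǒúùûǔ")

def has_tonal_word(mizo_sentence: str) -> bool:
    """Return True if any character carries one of the assumed tone marks."""
    normalized = unicodedata.normalize("NFC", mizo_sentence)
    return any(ch.lower() in TONE_MARKED_VOWELS for ch in normalized)

def select_tonal_pairs(parallel_pairs):
    """Keep (english, mizo) pairs whose Mizo side contains tonal words,
    so they can be duplicated/expanded in the training data."""
    return [(en, mz) for en, mz in parallel_pairs if has_tonal_word(mz)]

# Hypothetical toy corpus, not from the actual English-Mizo dataset.
corpus = [
    ("I will go", "ka kal ang"),
    ("the water is deep", "tui a thûk"),
]
print(select_tonal_pairs(corpus))
```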
... The results were approved by Mizo linguistics experts. The study [14] presented the benefits of identifying Named Entity Recognition (NER) classes for various Indian languages. It also points out rules for recognizing the NER classes for the Mizo language. ...
Conference Paper
Full-text available
Communication between different regional languages is inevitable, and making it meaningful through technology requires considerable effort. Machine translation methods have been developed to bridge this gap. Neural Machine Translation is one such effort that has recently achieved remarkable gains in human-judged quality over conventional approaches such as phrase-based and statistical machine translation. Various online translation and mobile application tools, such as Google Translate, DeepL, and SYSTRAN, have emerged to tackle this issue, but such convenient translation tools are not available for the English-Mizo language pair. In this paper, we attempt to enhance English-to-Mizo translation by leveraging the capability of Neural Machine Translation, with quality measured using the Bilingual Evaluation Understudy (BLEU) metric.
... The results were approved by Mizo linguistics experts. The study [16] presented the benefits of identifying Named Entity Recognition (NER) classes for various Indian languages. It also points out rules for recognizing the NER classes for the Mizo language. ...
Chapter
The volume of data is increasing continuously due to daily transactions, social networks, data automation, etc. Managing large amounts of data locally is a big challenge, and therefore people plan to outsource data to a third-party server, which decreases the cost of storage and local maintenance. Recently, researchers have been working hard to outsource data securely to the cloud server. The computation cost increases when secure access control is defined from the data owner's side to the third-party server. Outsourcing the computation to a third-party server, much like data outsourcing in the cloud, is also more effective for lightweight devices. In this paper, we propose a novel process for outsourcing computation to a third-party server. Specifically, we work on the bilinear pairing, a costly computational operation widely used in recent security schemes, and outsource those pairing computations to the cloud.
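Since the chapter above centres on outsourcing bilinear pairing computations, the sketch below only illustrates the pairing primitive itself, namely its bilinearity property, using the py_ecc library on the BN128 curve. It is not the chapter's outsourcing protocol, and the scalars are arbitrary placeholders.

```python
# Minimal sketch of the bilinear pairing operation that the chapter above
# proposes to outsource to a cloud server. Uses py_ecc on the BN128 curve
# purely to illustrate bilinearity: e(aP, bQ) = e(P, Q)^(ab).
# This is not the chapter's outsourcing scheme, only the costly primitive.
from py_ecc.bn128 import G1, G2, pairing, multiply, curve_order

a, b = 6, 7  # hypothetical secret scalars held by the data owner

lhs = pairing(multiply(G2, b), multiply(G1, a))   # e(aP, bQ)
rhs = pairing(G2, G1) ** ((a * b) % curve_order)  # e(P, Q)^(ab)

print(lhs == rhs)  # True: the pairing is bilinear (and expensive to compute)
```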
... The target language, Mizo (also known as Lushai), is a low-resource Indian language belonging to the Sino-Tibetan family of languages. 1 It is actively spoken by the Mizo people of the Mizoram state of India, who are the native speakers of the language. Although the declarative word order of the language is Object-Subject-Verb (OSV), it does follow Subject-Verb-Object (SVO) order, similar to English, in certain situations. ...
... English-to-Mizo translation lacks background work. However, the literature mentions identifying Multiword Expressions (MWEs) for the Mizo language [28], identifying rules for recognizing named entity classes in the Mizo language [1], and resource building and POS tagging for the Mizo language [32]. Since work specifically on English-to-Mizo machine translation is still in its infancy, we have examined some closely related works, with a special focus on low-resource scenarios. ...
Article
Full-text available
Machine translation helps resolve language incomprehensibility issues and eases interaction among people from varying linguistic backgrounds. Although corpus-based approaches (statistical and neural) offer reasonable translation accuracy for large-sized corpora, the robustness of such approaches lies in their ability to adapt to low-resource languages, which face the unavailability of large-sized corpora. In this paper, the prediction aptness of the two approaches has been meticulously explored in the context of Mizo, a low-resource Indian language. Translations predicted by the two approaches have been comparatively and adequately analyzed on a number of grounds to infer their strengths and weaknesses, particularly in low-resource scenarios.
Article
This research investigates the utilization of pre-trained BERT transformers within the context of the Mizo language. BERT, an abbreviation for Bidirectional Encoder Representations from Transformers, symbolizes Google’s forefront neural network approach to Natural Language Processing (NLP), renowned for its remarkable performance across various NLP tasks. However, its efficacy in handling low-resource languages such as Mizo remains largely unexplored. In this study, we introduce MizBERT, a specialized Mizo language model. Through extensive pre-training on a corpus collected from diverse online platforms, MizBERT has been tailored to accommodate the nuances of the Mizo language. Evaluation of MizBERT’s capabilities is conducted using two primary metrics: Masked Language Modeling (MLM) and Perplexity, yielding scores of 76.12% and 3.2565, respectively. Additionally, its performance in a text classification task is examined. Results indicate that MizBERT outperforms both the multilingual BERT (mBERT) model and the Support Vector Machine (SVM) algorithm, achieving an accuracy of 98.92%. This underscores MizBERT’s proficiency in understanding and processing the intricacies inherent in the Mizo language.
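A minimal sketch of the Masked Language Modeling evaluation mentioned above, using the Hugging Face transformers API. The model identifier is a placeholder assumption (substitute the actual published MizBERT checkpoint), and the Mizo sentence is an invented example.

```python
# Minimal sketch of querying a pre-trained Mizo BERT model for a masked
# token, i.e. the Masked Language Modeling (MLM) task used to evaluate
# MizBERT above. Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "path/or/hub-id-of-MizBERT"  # placeholder, not a confirmed hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

# Hypothetical Mizo sentence with one token masked out.
text = f"Mizoram chu India rama state {tokenizer.mask_token} a ni."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and list the five most likely fillers.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```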
Article
The vast majority of languages in the world at present, are considered to be low-resource languages. Since the availability of large parallel data is crucial for the success of most modern machine translation approaches, improving machine translation for low-resource languages is a key challenge. Most unsupervised techniques for translation benefit closely related languages with monolingual data of substantial quantity. To facilitate research in this direction for the extremely low resource language pair English (en) and Mizo (lus), we have developed a parallel and monolingual corpus for the Mizo language from various news websites. We explore Unsupervised Neural Machine Translation (UNMT) based on the developed monolingual data. We observe that cross-lingual embedding (CLWE) initializations on subword segmented data during pre-training, based on both masked language modelling and sequence-to-sequence generation tasks, improve translation performance. We experiment with cross-lingual alignment, and combined alignment and joint training for learning the cross-lingual embedding representations. We also report baseline performances and the impact of CLWE initialization using semi-supervised and supervised neural machine translation. Empirical results show that both CLWE initializations work well for the distant pair English-Mizo compared to the baselines.
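The article above applies subword segmentation to the monolingual data before cross-lingual embedding pre-training. The sketch below shows how such a segmentation step could look with the SentencePiece library; the file name, vocabulary size, and model type are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a subword segmentation step on monolingual Mizo text,
# of the kind applied before the CLWE pre-training described above.
# Requires: pip install sentencepiece. File names and settings are
# illustrative assumptions.
import sentencepiece as spm

# Train a unigram subword model on raw monolingual Mizo text (one sentence
# per line in 'mizo_mono.txt', a hypothetical file).
spm.SentencePieceTrainer.train(
    input="mizo_mono.txt",
    model_prefix="lus_subword",
    vocab_size=8000,
    model_type="unigram",
)

# Load the trained model and segment a sample sentence into subword units.
sp = spm.SentencePieceProcessor(model_file="lus_subword.model")
print(sp.encode("Mizoram chu ram mawi tak a ni.", out_type=str))
```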
Article
Stress is the property of a language to exhibit prominence or distinction in one or more syllables in a given domain. The existence of word stress has not been suitably explored in previous acoustic studies of the Mizo language, a tonal language of the Kuki-Chin sub-category of the Tibeto-Burman language family. In this study, we attempt to analyze word stress on disyllabic target words in three lexical categories: adjectives, nouns, and verbs. Utterances of the target words are recorded in an isolated setting (out of focus) and in sentence frames (in focus). First, averages of features, namely duration, intensity, F0, formants, and spectral tilt, are extracted and investigated for identification of stressed and unstressed syllables on a total of 2,880 samples. Next, the interaction of word stress with the four tones of Mizo is investigated. While it is found that the H-tone is generally stressed, inferences are made that stressed syllables are not unique to a specific tone. Third, the significance of the selected features is validated using a two-tailed paired sample t-test. Our analysis indicates that the mean differences in duration, intensity, and F0 of the stressed and unstressed syllables are significant across the lexical categories at p < 0.05. Next, validations of the significance of the mean differences are carried out using Cohen's d effect size and Pearson's Correlation Coefficient (r). Finally, three machine learning models, namely Support Vector Machines (SVM), Naive Bayes, and Ensemble learning methods (AdaBoost and Boosted Aggregation), are used to identify stressed and unstressed syllables associated with tones in Mizo. Discriminating differences, especially in disyllabic verbs, are observed between stressed and unstressed syllables. Conclusions are drawn that duration is a strong and robust cue for stress, while intensity is a medium cue and F0 a weak cue.
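A minimal sketch of the statistical validation described above: a two-tailed paired t-test on a feature (e.g. syllable duration) for stressed versus unstressed syllables, together with Cohen's d and Pearson's r, using SciPy. The feature values are synthetic placeholders, not measurements from the 2,880-sample study.

```python
# Minimal sketch of the paired statistical tests used to validate stress
# correlates above. The duration values are synthetic placeholders.
import numpy as np
from scipy.stats import ttest_rel, pearsonr

rng = np.random.default_rng(0)
stressed = rng.normal(220, 25, 50)            # hypothetical durations (ms)
unstressed = stressed - rng.normal(30, 10, 50)

# Two-tailed paired-sample t-test (ttest_rel is two-sided by default).
t_stat, p_value = ttest_rel(stressed, unstressed)

# Cohen's d for paired samples: mean of the differences / SD of the differences.
diff = stressed - unstressed
cohens_d = diff.mean() / diff.std(ddof=1)

# Pearson's correlation between the paired measurements.
r, _ = pearsonr(stressed, unstressed)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}, r = {r:.2f}")
```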
Article
Machine Translation is an effort to bridge language barriers and misinterpretations, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of using various machine translation techniques for a low-resource language such as Mizo. In this article, we have implemented and compared statistical-based approaches with modern neural-based approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions based on variations in sentence length and compared the model performance with the existing baselines.
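To make the sentence-length analysis mentioned above concrete, the sketch below buckets a test set by reference length and scores each bucket with sacreBLEU. The bucket size, helper name, and toy sentences are illustrative assumptions, not the article's actual setup.

```python
# Minimal sketch of analyzing translation quality by sentence length,
# in the spirit of the comparison above. Requires: pip install sacrebleu.
from collections import defaultdict
import sacrebleu

def bleu_by_length(hypotheses, references, bucket_size=10):
    """Group test sentences into reference-length buckets and score each."""
    buckets = defaultdict(lambda: ([], []))
    for hyp, ref in zip(hypotheses, references):
        bucket = len(ref.split()) // bucket_size
        buckets[bucket][0].append(hyp)
        buckets[bucket][1].append(ref)
    return {
        f"{b * bucket_size}-{(b + 1) * bucket_size - 1} words":
            sacrebleu.corpus_bleu(hyps, [refs]).score
        for b, (hyps, refs) in sorted(buckets.items())
    }

# Hypothetical toy data, not from the actual English-Mizo test set.
hyps = ["ka kal ang", "tui chu a thuk hle a ni"]
refs = ["ka kal dawn ang", "tui chu a thuk hle a ni"]
print(bleu_by_length(hyps, refs))
```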