Table 2. Size data about the BootCaT subcorpora.

Source publication
Chapter
Full-text available
Use of corpora by language service providers and language professionals remains limited due to the existence of competing resources that are likely to be perceived as less demanding in terms of time and effort required to obtain and (learn to) use them (e.g. translation memory software, term bases and so forth). These resources however have limitat...

Contexts in source publication

Context 1
... "the side effects" "inform your doctor" "you need to" "if you take" "solution for infusion" "doctor or pharmacist" "and what it" "effects not listed" "to your doctor" English-T mg mixture or ingredients your influenza pain symptoms please doctor bowl syringe capsules use kg Default parameters were used (10 tuples, 10 URLs per query) and no manual filtering of results was performed. Size information about the resulting corpus are provided in Table 2. Notice that the Italian-G subcorpus is much smaller than the rest. ...
Context 2
... the results we obtained adopting a relatively straightforward pipeline are encouraging both for English and for Italian: starting from the same manual corpus, and using a genre-driven procedure of seed selection alongside the traditional topic-driven one, the size of the resulting corpus is doubled for English and substantially increased for Italian (cf. Table 2), with comparable levels of perceived relevance. Furthermore, using n-grams as seeds makes seed selection more straightforward, since no reference corpus is required (unlike in the topic-driven pipeline). ...
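The genre-driven seed selection mentioned here can be illustrated with a short sketch: frequent word n-grams are extracted directly from the small manual corpus and used as seeds, so that, unlike in topic-driven keyword extraction, no reference corpus is needed. The toy corpus, n-gram length and cut-off are assumptions for illustration only.

```python
# Sketch of n-gram seed extraction from a small manually built corpus.
from collections import Counter

def ngram_seeds(texts, n=3, top_k=10):
    """Return the top_k most frequent word n-grams across the texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [" ".join(gram) for gram, _ in counts.most_common(top_k)]

manual_corpus = [
    "ask your doctor or pharmacist before you take this medicine",
    "tell your doctor if you take any other medicines",
]
print(ngram_seeds(manual_corpus, n=3, top_k=5))
```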

Citations

... Within the corpora it is also possible to run a keyword search, i.e. to generate a list of words typical of the domain by comparing the frequency list of the specialized corpus against that of a reference corpus. Moreover, AntConc also allows full-text browsing and offers the rich contextual information needed for the term selection process (Bernardini, Ferraresi 2013). If all these operations are carried out in parallel in the source and target languages, creating cross-linguistic matches along the way, compiling an accurate bilingual glossary becomes very simple. ...
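As a rough illustration of the keyword procedure just described, the sketch below compares the frequency list of a specialized corpus against that of a reference corpus and ranks words by a smoothed relative-frequency ratio. This is only one of several common keyness measures (log-likelihood is another frequent choice), and the function and variable names are hypothetical.

```python
# Sketch of keyword extraction by frequency-list comparison.
from collections import Counter

def keywords(spec_tokens, ref_tokens, top_k=10):
    """Rank words of the specialized corpus by keyness against the reference."""
    spec, ref = Counter(spec_tokens), Counter(ref_tokens)
    n_spec, n_ref = sum(spec.values()), sum(ref.values())

    def keyness(word):
        # Smoothed ratio of relative frequencies (add 0.5 to avoid zero counts).
        return ((spec[word] + 0.5) / n_spec) / ((ref[word] + 0.5) / n_ref)

    return sorted(spec, key=keyness, reverse=True)[:top_k]

spec = "the syringe dose mg dose syringe infusion mg".split()
ref = "the cat sat on the mat and the dog barked".split()
print(keywords(spec, ref, top_k=3))
```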
Chapter
Full-text available
This chapter focuses on the cognitive processes of interpreting, with the intention of providing the student with a knowledge base to become aware of the cognitive dynamics of this activity. In the first part of the chapter, basic notions of brain structures and the functions of language are provided; psycholinguistic models of the simultaneous interpreting (SI) process are illustrated, with particular reference to the difference between experienced and novice interpreters; and the results of functional magnetic resonance imaging studies are described to highlight the effects of constant SI practice on the brain areas involved in language processing. The second part of the chapter delves into the executive functions that are essential for performing a cognitively complex multitasking activity such as SI, namely working memory (WM), inhibition and cognitive flexibility. After providing basic knowledge on memory and illustrating the role of WM, the concepts of selective attention, attentional inhibition and cognitive flexibility are explored. Subsequently, exercises are suggested to enhance these functions in order to develop specific skills. Finally, for those who would like to learn more about the methodology of cognitive research in the field of conference interpreting, a brief review of the cognitive tests most commonly used in empirical research on interpreting is included.
... The K-factor method is designed for the automatic extraction of multi-word terms and is implemented in the BootCaT system [26]. BootCaT serves the automatic construction of topic-specific corpora from the web. ...
Article
The article examines five methods for the automatic extraction of terms from texts and presents a comparative analysis of them. The general purpose of extracting terms from texts is to identify the core vocabulary of a specialized domain. Unlike traditional manual term extraction, automatic extraction is a computerized means of simplifying this time-consuming work, aimed at automating the preliminary identification of candidate terms. The growing volume of information to be processed in many fields (lexicography, terminology studies, information retrieval, etc.) makes the automatic selection of terms and keywords particularly topical today. Many different approaches and systems for automatic term extraction from texts have been developed in the field of natural language processing. The various subtasks of automatic term extraction are presented: corpus collection, unit identification, recognition of terms and their variants, and the quality evaluation procedure. An applied approach to automatic term extraction for a specific subject domain is given. An experiment was carried out on a corpus of articles from the journals "İnformasiya texnologiyaları problemləri" (Problems of Information Technology) and "İnformasiya cəmiyyəti problemləri" (Problems of Information Society). An expert and formal evaluation methodology is proposed, and the results of a comparative evaluation of the automatic term extraction methods are reported.
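As a minimal illustration of the candidate-identification stage the article discusses, the sketch below collects frequent bigrams and trigrams that neither start nor end with a stopword. Real term extractors add part-of-speech patterns and statistical unithood/termhood measures (the K-factor and C-value families, for example); the stoplist and thresholds here are illustrative assumptions.

```python
# Baseline sketch of multi-word candidate-term extraction.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to", "is", "for"}

def candidate_terms(tokens, min_freq=2, max_len=3):
    """Collect frequent 2..max_len word grams not bounded by stopwords."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]

text = ("automatic term extraction simplifies manual work and "
        "automatic term extraction supports terminology research")
print(candidate_terms(text.split()))
```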
... [6] says that the world wide web is not only a tool for information retrieval and exchange but also a massive repository of authentic data, "a self-renewing linguistic resource" offering "a freshness and topicality unmatched by fixed corpora." However, despite all the research that has been done, [7] notes that the use of corpora by language service providers and language professionals remains limited due to the existence of competing resources that are likely to be perceived as less demanding in terms of the time and effort required to obtain them. ...
Article
Full-text available
There has been a great effort to collect corpora of different languages all over the world in the past years, and the development of online corpora abroad has brought new possibilities to the Philippines. However, resources for the Ilokano language are limited. This paper introduces the Corpus of Spoken Ilokano Language, an online repository of spoken Ilokano in the Philippines, specifically in Region 1. The main component of this study is spoken Ilokano, and the corpus has been built specifically for natural language processing. It captures the variation in Ilokano as spoken by Ilokanos across the region. The database consists of 160 speakers, 40 from each province of the region, each producing about 74 statements. The spoken Ilokano was audio-recorded and transcribed. A web application was developed to make the dataset available online. The corpus was validated to provide a useful data resource that can be used for automatic speech recognition models.
... Step 3 consists of one or two translation exercises in which students need to translate sentences using the information collected from the observation of both the comparable corpus and the parallel corpus. The general aim is to help students write natural-sounding, idiomatic translated texts by having them use both "manufactured" and "do-it-yourself" (DIY) corpora (Bernardini & Ferraresi 2013). ...
Conference Paper
Full-text available
Book of abstracts, TALC 2018: 13th Teaching and Language Corpora Conference, University of Cambridge.
... Texts contained in a corpus, however, can also be consulted in a non-linear way: thanks to the query and display features of corpus analysis tools, the user can approach the text in a bottom-up manner, moving from the terminology/phraseology of the domain to the creation of its conceptual structure. Furthermore, most tools allow full-text browsing and offer the rich contextual information required for decision-making (Bernardini/Ferraresi 2013). This allows interpreters to explore the textual material of a specialized subject in a dynamic, interactive and explorative way (Fantinuoli 2006, 2017b), as they no longer need to browse through different texts and pages. ...
... No manual filtering of results was performed. The size of the two corpora is provided in … The evaluation methodology applied to assess the output of the corpus-building procedure is similar to the one described in Bernardini/Ferraresi (2013), with 30 students of interpreting (both graduate and undergraduate) acting as informants. They were asked to evaluate a randomly extracted list of 10 texts for each corpus (corresponding to 11.36% and 12.66% of the total URLs successfully downloaded, converted and inserted in the corpus). ...
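A quick back-of-envelope check of the sampling figures in this excerpt: if 10 randomly extracted texts correspond to 11.36% and 12.66% of the two corpora, the corpora must contain roughly 88 and 79 usable texts respectively.

```python
# Recover approximate corpus sizes from the quoted sampling percentages.
for share in (0.1136, 0.1266):
    print(round(10 / share))  # prints 88, then 79
```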
... See also Bernardini/Ferraresi (2013) for similar observations on tests conducted with BootCaT. 12 In Tribble's sense, "quick-and-dirty" corpora are corpora informally produced for immediate use, without elaborate deliberation about their composition (Tribble 1997). ...
Article
Full-text available
Terminology research and domain knowledge acquisition constitute a substantial part of the preparation activity performed daily by professional interpreters. Corpora have been suggested to be an effective resource for enhancing the quality of interpretation. Corpus-based preparation can assist interpreters in investigating subject-related terminology and phraseology as well as help them acquire subject-specific domain knowledge. This is particularly important in light of the fact that interpreters often do not have the same level of linguistic and domain expertise as their clients. It is therefore reasonable to suggest that tools for corpus analysis should be an integral part of a modern interpreter's workstation. In this paper I will introduce how corpora can be used in interpreting practice and teaching in the context of deliberate practice. I will also describe the results of an empirical test of the resources created by a tool designed for this purpose, in terms of their adequacy to satisfy the needs of interpreters.
... Moreover, it is conceivable that the extracted texts can be used for other practical applications as well, such as computer-assisted language learning and translation (Delpech, 2014), cross-linguistic translation studies (Bernardini and Ferraresi, 2013) or terminology extraction (Morin et al., 2013). The latter research direction has received particular attention in the past twenty years, as it offers an effort-saving alternative to the manual compilation of dictionaries.¹ Moreover, language professionals show increased interest in automatic technologies, which have the potential to minimize their workload. ...
... Today it is only available for download¹. More recent developments are reported in Bernardini & Ferraresi (2013). ...
Article
Full-text available
This article presents the compilation of a specialized corpus in the area of Foreign Affairs and how it was used to define the syllabus and support the preparation of teaching materials for candidates taking an English proficiency exam as one of the requirements to fill a federal government position.
Chapter
This section concerns applications of comparable corpora beyond pure machine translation. It has been argued [1, 2] that downstream applications such as cross-lingual document classification, information retrieval or natural language inference, apart from proving the practical utility of NLP methods …
Chapter
In a parallel corpus we know by design which document is a translation of which. If the link between documents in different languages is not known, it needs to be established. In this chapter we will discuss methods for measuring document similarity across languages and how to evaluate the results. Then we will proceed to discuss methods for building comparable corpora of different degrees of comparability and for different tasks.
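A minimal sketch of the alignment step the chapter describes: score candidate document pairs across languages with cosine similarity over a shared vector representation, then keep the best-scoring match per source document. The embed() placeholder and the threshold are assumptions; any cross-lingual representation (dictionary-translated bags of words, multilingual embeddings, etc.) could fill that slot.

```python
# Sketch of cross-lingual document alignment by cosine similarity.
import math

def cosine(u, v):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def embed(doc):
    """Placeholder: map a document into a vector space shared by both languages."""
    raise NotImplementedError("plug in a cross-lingual representation here")

def align(src_docs, tgt_docs, threshold=0.5):
    """Pair each source document with its most similar target document."""
    tgt_vecs = [embed(d) for d in tgt_docs]
    pairs = []
    for src in src_docs:
        vec = embed(src)
        scores = [cosine(vec, tv) for tv in tgt_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((src, tgt_docs[best], scores[best]))
    return pairs
```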
Chapter
Full-text available
This collection of studies focuses on the translation of the language of art and cultural heritage in a wide variety of text types from the Renaissance to the present, following different theoretical and methodological approaches ranging from corpus linguistics to lexicography, terminology, and translation studies. This book is meant for a wide audience including scholars and students of languages for special purposes, as well as professional translators and experts in the international communication of cultural heritage. These studies have been carried out as part of the Multilingual Cultural Heritage Lexicon research project (Lessico plurilingue dei Beni Culturali). An initiative which first originated at the University of Florence, now involving multiple Italian and international universities, this project is dedicated to compiling textual databases and plurilingual dictionaries through comparable and parallel corpora.