Table 2. Size data about the BootCaT subcorpora.

Source publication
Chapter
Full-text available
Use of corpora by language service providers and language professionals remains limited due to the existence of competing resources that are likely to be perceived as less demanding in terms of time and effort required to obtain and (learn to) use them (e.g. translation memory software, term bases and so forth). These resources however have limitat...

Contexts in source publication

Context 1
... "the side effects" "inform your doctor" "you need to" "if you take" "solution for infusion" "doctor or pharmacist" "and what it" "effects not listed" "to your doctor" English-T mg mixture or ingredients your influenza pain symptoms please doctor bowl syringe capsules use kg Default parameters were used (10 tuples, 10 URLs per query) and no manual filtering of results was performed. Size information about the resulting corpus are provided in Table 2. Notice that the Italian-G subcorpus is much smaller than the rest. ...
Context 2
... the results we obtained adopting a relatively straightforward pipeline are encouraging both for English and for Italian: starting from the same manual corpus, and using a genre-driven procedure of seed selection alongside the traditional topic-driven one, the size of the resulting corpus is doubled for English and substantially increased for Italian (cf. Table 2), with comparable levels of perceived relevance. Furthermore, using n-grams as seeds makes seed selection more straightforward, since no reference corpus is required (unlike in the topic-driven pipeline). ...
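The genre-driven seed selection mentioned here can be illustrated with a short sketch: frequent word n-grams are extracted directly from the small manual corpus and used as seeds, so that, unlike in topic-driven keyword extraction, no reference corpus is needed. The toy corpus, n-gram length and cut-off are assumptions for illustration only.

```python
# Sketch of n-gram seed extraction from a small manually built corpus.
from collections import Counter

def ngram_seeds(texts, n=3, top_k=10):
    """Return the top_k most frequent word n-grams across the texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [" ".join(gram) for gram, _ in counts.most_common(top_k)]

manual_corpus = [
    "ask your doctor or pharmacist before you take this medicine",
    "tell your doctor if you take any other medicines",
]
print(ngram_seeds(manual_corpus, n=3, top_k=5))
```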

Citations

... Within the corpora it is also possible to run a keyword search, i.e. to generate a list of words typical of the domain by comparing the frequency list of the specialized corpus against that of a reference corpus. Moreover, AntConc also allows full-text browsing and offers the rich contextual information needed for the term selection process (Bernardini, Ferraresi 2013). If all these operations are carried out in parallel in the source and target languages, creating cross-linguistic matches along the way, compiling an accurate bilingual glossary becomes very simple. ...
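As a rough illustration of the keyword procedure just described, the sketch below compares the frequency list of a specialized corpus against that of a reference corpus and ranks words by a smoothed relative-frequency ratio. This is only one of several common keyness measures (log-likelihood is another frequent choice), and the function and variable names are hypothetical.

```python
# Sketch of keyword extraction by frequency-list comparison.
from collections import Counter

def keywords(spec_tokens, ref_tokens, top_k=10):
    """Rank words of the specialized corpus by keyness against the reference."""
    spec, ref = Counter(spec_tokens), Counter(ref_tokens)
    n_spec, n_ref = sum(spec.values()), sum(ref.values())

    def keyness(word):
        # Smoothed ratio of relative frequencies (add 0.5 to avoid zero counts).
        return ((spec[word] + 0.5) / n_spec) / ((ref[word] + 0.5) / n_ref)

    return sorted(spec, key=keyness, reverse=True)[:top_k]

spec = "the syringe dose mg dose syringe infusion mg".split()
ref = "the cat sat on the mat and the dog barked".split()
print(keywords(spec, ref, top_k=3))
```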
Chapter
Full-text available
This chapter focuses on the cognitive processes of interpreting, with the intention of providing the student with a knowledge base to become aware of the cognitive dynamics of this activity. In the first part of the chapter, basic notions of brain structures and the functions of language are provided; psycholinguistic models of the simultaneous interpreting (SI) process are illustrated, with particular reference to the difference between experienced and novice interpreters; and the results of functional magnetic resonance imaging studies are described to highlight the effects of constant SI practice on the brain areas involved in language processing. The second part of the chapter delves into the executive functions that are essential for performing a cognitively complex multitasking activity such as SI, namely working memory (WM), inhibition and cognitive flexibility. After providing basic knowledge on memory and illustrating the role of WM, the concepts of selective attention, attentional inhibition and cognitive flexibility are explored. Subsequently, exercises are suggested to enhance these functions in order to develop specific skills. Finally, for those who would like to learn more about the methodology of cognitive research in the field of conference interpreting, a brief review of the cognitive tests most commonly used in empirical research on interpreting is included.
... The K-factor method is designed for the automatic extraction of multi-word terms and is implemented in the BootCaT system [26]. BootCaT serves the automatic construction of topic-specific corpora from the web. ...
Article
The article examines five methods for the automatic extraction of terms from texts and presents a comparative analysis of them. The general purpose of extracting terms from texts is to identify the core vocabulary of a specialized domain. Unlike traditional manual term extraction, automatic extraction is a computerized means of simplifying this time-consuming work, aimed at automating the preliminary identification of candidate terms. The growing volume of information to be processed in many fields (lexicography, terminology studies, information retrieval, etc.) makes the automatic selection of terms and keywords particularly topical today. Many different approaches and systems for automatic term extraction from texts have been developed in the field of natural language processing. The various subtasks of automatic term extraction are presented: corpus collection, unit identification, recognition of terms and their variants, and the quality evaluation procedure. An applied approach to automatic term extraction for a specific subject domain is given. An experiment was carried out on a corpus of articles from the journals "İnformasiya texnologiyaları problemləri" (Problems of Information Technology) and "İnformasiya cəmiyyəti problemləri" (Problems of Information Society). An expert and formal evaluation methodology is proposed, and the results of a comparative evaluation of the automatic term extraction methods are reported.
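As a minimal illustration of the candidate-identification stage the article discusses, the sketch below collects frequent bigrams and trigrams that neither start nor end with a stopword. Real term extractors add part-of-speech patterns and statistical unithood/termhood measures (the K-factor and C-value families, for example); the stoplist and thresholds here are illustrative assumptions.

```python
# Baseline sketch of multi-word candidate-term extraction.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to", "is", "for"}

def candidate_terms(tokens, min_freq=2, max_len=3):
    """Collect frequent 2..max_len word grams not bounded by stopwords."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]

text = ("automatic term extraction simplifies manual work and "
        "automatic term extraction supports terminology research")
print(candidate_terms(text.split()))
```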
... [6] says that the world wide web is not only a tool for information retrieval and exchange but also a massive repository of authentic data, "a self-renewing linguistic resource" offering "a freshness and topicality unmatched by fixed corpora." However, despite all the research that has been done, [7] notes that the use of corpora by language service providers and language professionals remains limited due to the existence of competing resources that are likely to be perceived as less demanding in terms of the time and effort required to obtain them. ...
Article
Full-text available
There has been a great effort to collect corpora of different languages all over the world in the past years, and the development of online corpora abroad has brought new possibilities to the Philippines. However, resources for the Ilokano language are limited. This paper introduces the Corpus of Spoken Ilokano Language, an online repository of spoken Ilokano in the Philippines, specifically in Region 1. The main component of this study is spoken Ilokano, and the corpus has been built specifically for natural language processing. It captures the variation in Ilokano as spoken by Ilokanos across the region. The database consists of 160 speakers, 40 from each province of the region, each producing about 74 statements. The spoken Ilokano was audio-recorded and transcribed. A web application was developed to make the dataset available online. The corpus was validated to provide a useful data resource that can be used for automatic speech recognition models.
... Step 3 consists of one or two translation exercises in which students need to translate sentences using the information collected from the observation of both the comparable corpus and the parallel corpus. The general aim is to help students write natural-sounding, idiomatic translated texts by having them use both "manufactured" and "do-it-yourself" (DIY) corpora (Bernardini & Ferraresi 2013). ...
Conference Paper
Full-text available
Book of abstracts, TALC 2018: 13th Teaching and Language Corpora Conference, University of Cambridge.
... Texts contained in a corpus, however, can also be consulted in a non-linear way: thanks to the query and display features of corpus analysis tools, the user can approach the text in a bottom-up manner, moving from the terminology/phraseology of the domain to the creation of its conceptual structure. Furthermore, most tools allow full-text browsing and offer the rich contextual information required for decision-making (Bernardini/Ferraresi 2013). This allows interpreters to explore the textual material of a specialized subject in a dynamic, interactive and explorative way (Fantinuoli 2006, 2017b), as they no longer need to browse through different texts and pages. ...
... No manual filtering of results was performed. The size of the two corpora is provided in … The evaluation methodology applied to assess the output of the corpus-building procedure is similar to the one described in Bernardini/Ferraresi (2013), with 30 students of interpreting (both graduate and undergraduate) acting as informants. They were asked to evaluate a randomly extracted list of 10 texts for each corpus (corresponding to 11.36% and 12.66% of the total URLs successfully downloaded, converted and inserted in the corpus). ...
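A quick back-of-envelope check of the sampling figures in this excerpt: if 10 randomly extracted texts correspond to 11.36% and 12.66% of the two corpora, the corpora must contain roughly 88 and 79 usable texts respectively.

```python
# Recover approximate corpus sizes from the quoted sampling percentages.
for share in (0.1136, 0.1266):
    print(round(10 / share))  # prints 88, then 79
```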
... See also Bernardini/Ferraresi (2013) for similar observations on tests conducted with BootCaT. 12 In Tribble's sense, "quick-and-dirty" corpora are corpora informally produced for immediate use, without elaborate deliberation about their composition (Tribble 1997). ...
Article
Full-text available
Terminology research and domain knowledge acquisition constitute a substantial part of the preparation activity performed daily by professional interpreters. Corpora have been suggested to be an effective resource for enhancing the quality of interpretation. Corpus-based preparation can assist interpreters in investigating subject-related terminology and phraseology as well as help them acquire subject-specific domain knowledge. This is particularly important in light of the fact that interpreters often do not have the same level of linguistic and domain expertise as their clients. It is therefore reasonable to suggest that tools for corpus analysis should be an integral part of a modern interpreter's workstation. In this paper I will introduce how corpora can be used in interpreting practice and teaching in the context of deliberate practice. I will also describe the results of an empirical test of the resources created by a tool designed for this purpose, in terms of their adequacy to satisfy the needs of interpreters.
... Moreover, it is conceivable that the extracted texts can be used for other practical applications as well, such as computer-assisted language learning and translation (Delpech, 2014), cross-linguistic translation studies (Bernardini and Ferraresi, 2013) or terminology extraction (Morin et al., 2013). The latter research direction has received particular attention in the past twenty years, as it offers an effort-saving alternative to the manual compilation of dictionaries.¹ Moreover, language professionals show increased interest in automatic technologies, which have the potential to minimize their workload. ...
... Today it is only available for download¹. More recent developments are reported in Bernardini & Ferraresi (2013). ...
Article
Full-text available
This article presents the compilation of a specialized corpus in the area of Foreign Affairs and how it was used to define the syllabus and support the preparation of teaching materials for candidates taking an English proficiency exam as one of the requirements to fill a federal government position.
Chapter
This section concerns applications of comparable corpora beyond pure machine translation. It has been argued [1, 2] that downstream applications such as cross-lingual document classification, information retrieval or natural language inference, apart from proving the practical utility of NLP methods …
Chapter
In a parallel corpus we know by design which document is a translation of which. If the link between documents in different languages is not known, it needs to be established. In this chapter we will discuss methods for measuring document similarity across languages and how to evaluate the results. Then we will proceed to discuss methods for building comparable corpora of different degrees of comparability and for different tasks.
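A minimal sketch of the alignment step the chapter describes: score candidate document pairs across languages with cosine similarity over a shared vector representation, then keep the best-scoring match per source document. The embed() placeholder and the threshold are assumptions; any cross-lingual representation (dictionary-translated bags of words, multilingual embeddings, etc.) could fill that slot.

```python
# Sketch of cross-lingual document alignment by cosine similarity.
import math

def cosine(u, v):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def embed(doc):
    """Placeholder: map a document into a vector space shared by both languages."""
    raise NotImplementedError("plug in a cross-lingual representation here")

def align(src_docs, tgt_docs, threshold=0.5):
    """Pair each source document with its most similar target document."""
    tgt_vecs = [embed(d) for d in tgt_docs]
    pairs = []
    for src in src_docs:
        vec = embed(src)
        scores = [cosine(vec, tv) for tv in tgt_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((src, tgt_docs[best], scores[best]))
    return pairs
```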
Chapter
Full-text available
This collection of studies focuses on the translation of the language of art and cultural heritage in a wide variety of text types from the Renaissance to the present, following different theoretical and methodological approaches ranging from corpus linguistics to lexicography, terminology, and translation studies. This book is meant for a wide audience including scholars and students of languages for special purposes, as well as professional translators and experts in the international communication of cultural heritage. These studies have been carried out as part of the Multilingual Cultural Heritage Lexicon research project (Lessico plurilingue dei Beni Culturali). An initiative which first originated at the University of Florence, now involving multiple Italian and international universities, this project is dedicated to compiling textual databases and plurilingual dictionaries through comparable and parallel corpora.