Veselin Stoyanov's scientific contributions

Publications (24)

Preprint
Full-text available
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handc...
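To make the setup concrete, the sketch below shows the kind of hand-engineered prompt and verbalizer that PERFECT is designed to remove: an example is rewritten as a cloze template and a masked PLM scores the verbalizer tokens at the mask position. The checkpoint, template, and verbalizer words are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch of prompt-and-verbalizer few-shot scoring with a masked PLM.
# Model name, template, and verbalizers are assumptions for demonstration only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

text = "The movie was surprisingly good."
template = f"{text} It was {tokenizer.mask_token}."           # hand-crafted prompt
verbalizers = {"positive": " great", "negative": " terrible"}  # hand-crafted verbalizer

inputs = tokenizer(template, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Score each label by the logit of its verbalizer token at the mask position.
scores = {label: logits[tokenizer.encode(tok, add_special_tokens=False)[0]].item()
          for label, tok in verbalizers.items()}
print(max(scores, key=scores.get))
```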
Preprint
Full-text available
Retrieving relevant contexts from a large corpus is a crucial step for tasks such as open-domain question answering and fact checking. Although neural retrieval outperforms traditional methods like tf-idf and BM25, its performance degrades considerably when applied to out-of-domain data. Driven by the question of whether a neural retrieval model ca...
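The contrast with lexical methods can be illustrated with a minimal dense-retrieval sketch: queries and passages are embedded by an off-the-shelf bi-encoder and ranked by inner product, rather than by term overlap as in tf-idf or BM25. The encoder checkpoint and toy corpus are assumptions, not the paper's setup.

```python
# Minimal dense-retrieval sketch: rank passages by embedding similarity to the query.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf bi-encoder

passages = [
    "The Eiffel Tower is located in Paris.",
    "BM25 is a classical lexical ranking function.",
    "Neural retrievers embed text into dense vectors.",
]
query = "Where is the Eiffel Tower?"

p_emb = encoder.encode(passages, normalize_embeddings=True)
q_emb = encoder.encode([query], normalize_embeddings=True)[0]

scores = p_emb @ q_emb          # cosine similarity via normalized dot product
print(passages[int(np.argmax(scores))])
```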
Preprint
The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text. In that case, some of the computational cost during inference can be amortized over the different tasks using...
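A toy sketch of the amortization idea, under an assumed architecture and assumed dimensions: the expensive encoder runs once per text, and several lightweight task heads reuse the cached representation.

```python
# Toy sketch of amortizing encoder cost across multiple predictions on one text.
# Architecture, dimensions, and task heads are illustrative assumptions.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=4)
heads = {"sentiment": nn.Linear(256, 2),
         "topic": nn.Linear(256, 10),
         "toxicity": nn.Linear(256, 2)}

tokens = torch.randn(1, 128, 256)            # stand-in for the embedded input text

with torch.no_grad():
    shared = encoder(tokens).mean(dim=1)     # expensive encoding, done once

    # Cheap per-task predictions reuse the cached representation.
    predictions = {task: head(shared).argmax(-1) for task, head in heads.items()}

print(predictions)
```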
Conference Paper
Full-text available
We study the problem of multilingual masked language modeling, i.e. the training of a single model on concatenated text from multiple languages, and present a detailed study of several factors that influence why these models are so effective for cross-lingual transfer. We show, contrary to what was previously hypothesized, that transfer is possible...
Preprint
Recent breakthroughs of pretrained language models have shown the effectiveness of self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale langu...
Preprint
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms m...
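The released XLM-R checkpoints can be loaded through Hugging Face transformers for multilingual masked-token prediction; the snippet below is an assumption about tooling for trying the model, not part of the paper itself.

```python
# Quick sketch: multilingual fill-mask with the public xlm-roberta-base checkpoint.
from transformers import pipeline

fill = pipeline("fill-mask", model="xlm-roberta-base")

# The same model handles masked prediction across languages.
print(fill("The capital of France is <mask>.")[0]["token_str"])
print(fill("La capitale de la France est <mask>.")[0]["token_str"])
```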
Preprint
Full-text available
We study the problem of multilingual masked language modeling, i.e. the training of a single model on concatenated text from multiple languages, and present a detailed study of several factors that influence why these models are so effective for cross-lingual transfer. We show, contrary to what was previously hypothesized, that transfer is possible...
Preprint
The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area, trying to bridge the language barrier using language-universal representations. However, even if the language problem were resolved, models traine...
Preprint
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of...
Preprint
Traditional language models are unable to efficiently model entity names observed in text. All but the most popular named entities appear infrequently in text, providing insufficient context. Recent efforts have recognized that context can be generalized between entity names that share the same type (e.g., person or location) and have...
Preprint
State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing inte...
Preprint
Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training: We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To ach...
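One way to read the described combination is as a log-linear fusion of the translation model's and the fixed language model's next-token distributions; the weighted sum below is a common variant and an assumption about the exact formulation, not necessarily the paper's.

```python
# Sketch of fusing a fixed LM with a translation model at next-token prediction time.
# The weighting scheme (lam) is an assumption, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def fused_next_token_logprobs(tm_logits, lm_logits, lam=0.5):
    """Combine translation-model and fixed language-model scores over the vocabulary."""
    tm_logp = F.log_softmax(tm_logits, dim=-1)
    lm_logp = F.log_softmax(lm_logits, dim=-1)
    return F.log_softmax(tm_logp + lam * lm_logp, dim=-1)

vocab = 32000
fused = fused_next_token_logprobs(torch.randn(1, vocab), torch.randn(1, vocab))
print(fused.argmax(-1))   # most likely next target token under the fused score
```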

Citations

... If irrelevant images and texts are retrieved, the accuracy of the model-generated answers will decrease significantly. Recent research has shown that the parameters of large pre-trained models carry world knowledge [7,8], which can be applied to downstream tasks through appropriate prompt learning [9,10]. Mining question-related background knowledge from the information stored in pre-trained model parameters can mitigate the issue of answer generation being heavily dependent on the quality of the retrieved information sources. ...
... Dense passage retriever. The dense retrieval sub-system is a neural network that learns from Wikipedia data to encode the citation context into a dense query vector [22][23][24][25][26] . This vector is then matched against the vector encodings of all passages in Sphere and the closest ones are returned. ...
... The noising-based methods add discrete or continuous noise to texts, which has little effect on semantics (Wei and Zou 2019; Coulombe 2018). The sampling-based methods sample novel data under the current data distributions (Kang et al. 2018; Du et al. 2021). ...
... Previous efforts attempt to map language inputs into intermediate distributed features, such as word embeddings (Mikolov et al., 2013; Kiros et al., 2015; Pennington et al., 2014; Peters et al., 2018), sentence embeddings (Conneau et al., 2017; Reimers and Gurevych, 2019; Gao et al., 2021), and document embeddings (Dai et al., 2015; Wu et al., 2018), which are further used as inputs to downstream task-specific models that generate the final task-specific document representations. Furthermore, some researchers have made preliminary explorations into decoupling document encoding from tasks by freezing part of the layers of document encoders (Du et al., 2020; Saad-Falcon et al., 2022). However, these works achieve only a semi-decoupling of document encoding from tasks and can only be used in the plugging-during-tuning setting. ...
... The fused embeddings are then passed through a series of transformer encoder layers to pool the embedding, denoted emb_pool. Finally, the pooled embedding is passed to an autoregressive pointer-generator decoder, which produces tokens that are either 'pointers' to input tokens or newly 'generated' tokens (Aghajanyan et al., 2020; See et al., 2017). ...
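For reference, a compact sketch of the pointer-generator output distribution (See et al., 2017) that the excerpt refers to: a generation distribution over the vocabulary is mixed with a copy distribution over input positions, weighted by a learned gate. Shapes and values below are illustrative assumptions.

```python
# Pointer-generator mixture: p_final = p_gen * P_vocab + (1 - p_gen) * copy(attention).
import torch

vocab_size, src_len = 100, 6
p_vocab = torch.softmax(torch.randn(vocab_size), dim=-1)   # "generate" distribution
attention = torch.softmax(torch.randn(src_len), dim=-1)    # "point/copy" distribution
src_token_ids = torch.randint(0, vocab_size, (src_len,))   # vocab IDs of input tokens
p_gen = torch.sigmoid(torch.randn(()))                     # learned mixing gate

p_final = p_gen * p_vocab
p_final = p_final.index_add(0, src_token_ids, (1 - p_gen) * attention)
print(p_final.argmax())   # either a generated token or a copied input token
```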
... To explore the potential of using transformers for NER, we adapted an existing NER system (Luoma et al., 2023). Specifically, we built upon the RoBERTa-large-PM-M3-Voc model (RoBERTa-bio hereafter), which has demonstrated the best performance in several NER tasks (Lewis et al., 2020;Miranda-Escalada et al., 2023). We trained the model for multiclass classification of the nine categories of LSFs using LSF200 without OOC annotations. ...
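A minimal sketch of that token-classification setup, with the generic roberta-large checkpoint standing in for RoBERTa-large-PM-M3-Voc (whose hub identifier is not given in the excerpt) and an assumed label scheme of nine LSF categories plus an outside tag.

```python
# Sketch of multiclass token classification for NER with a RoBERTa encoder.
# Checkpoint, label count, and example sentence are assumptions.
from transformers import AutoTokenizer, AutoModelForTokenClassification

num_labels = 10  # assumed: nine LSF categories + outside tag
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForTokenClassification.from_pretrained("roberta-large",
                                                        num_labels=num_labels)

inputs = tokenizer("Patient reports difficulty walking and social isolation.",
                   return_tensors="pt")
pred = model(**inputs).logits.argmax(-1)   # per-token label IDs (head untrained here)
print(pred)
```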
... Widely adopted voice assistive technologies such as Siri and Alexa exemplify the widespread acceptance of language-based interaction. Recent advancements in natural language processing have significantly expanded the capabilities of speech interfaces, encompassing understanding spoken text [2], [3], [4], processing natural language [5], [6], [7], generating text [8], [9], and producing spoken words [10], [11], [12]. Within the realm of virtual reality (VR), speech technologies are gaining traction, being employed in conjunction with gestures for 3D scene navigation [13], [14], [15], [16], multimodal data exploration [17], [18], and as a control feature in various systems [19], [20]. ...
... We used the OSM Nominatim API (Clemens 2015) to geocode the extracted toponyms. • For sentiments: these were derived using the XLM-RoBERTa language model (Conneau et al. 2019). More precisely, we used the version that was fine-tuned for sentiment analysis (Barbieri et al. 2022). ...
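A short sketch of that sentiment step; the specific checkpoint identifier below (cardiffnlp/twitter-xlm-roberta-base-sentiment) is an assumption about which Barbieri et al. (2022) model was used.

```python
# Sketch of multilingual sentiment analysis with an XLM-RoBERTa model fine-tuned
# for sentiment; the exact model ID is an assumption.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-xlm-roberta-base-sentiment")

print(sentiment("I love this place!"))         # e.g. positive
print(sentiment("Ce quartier est horrible."))  # multilingual input, e.g. negative
```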
... Recently, pre-trained models such as BERT Devlin et al. [2019] have changed the landscape of cross-lingual representation research. These models have enabled the generation of sentence encoders on multilingual unlabeled corpora without the need for parallel data Conneau et al. [2020], Feng et al. [2022], Goswami et al. [2021], Litschko et al. [2022]. Concurrently, certain studies have leveraged pre-trained multilingual transformers for cross-lingual information retrieval (IR). ...
... In a similar manner, Ref. [25] suggests utilizing knowledge bases to provide distant labels, which can then be employed to enhance the training of supervised Named Entity Recognition (NER) models. Ref. [26] utilizes a knowledge base to train a NER model called KALM, which distinguishes whether a term in a sentence is derived from the knowledge base or a conventional dictionary. ...