2-D visualizations of contextualized representations for different occurrences of spring. A fine-grained distinction can be observed for the season meaning of spring, with a distinct cluster (on the right) denoting the spring of a specific year. PCA is used for dimensionality reduction.
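For illustration, the kind of plot in this figure can be approximated with a short script that extracts the hidden state of the target token from a pre-trained BERT model for each occurrence of spring and projects the vectors to two dimensions with PCA. The sentences, model, and pooling choices below are illustrative assumptions, not the exact setup of the source article.

```python
# Sketch: project contextual embeddings of "spring" to 2-D with PCA.
# Requires the transformers and scikit-learn packages; sentences, model,
# and pooling choices are illustrative assumptions only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA

sentences = [
    "The flowers bloom in spring.",              # season
    "In the spring of 2019 we moved abroad.",    # season of a specific year
    "The mattress has a broken spring.",         # coil
    "Water flows from a natural spring.",        # water source
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

vectors = []
for text in sentences:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (seq_len, 768)
    # locate the token(s) of "spring" and average their hidden states
    ids = enc["input_ids"][0].tolist()
    spring_ids = set(tokenizer.encode("spring", add_special_tokens=False))
    positions = [i for i, t in enumerate(ids) if t in spring_ids]
    vectors.append(hidden[positions].mean(dim=0).numpy())

points = PCA(n_components=2).fit_transform(vectors)
for sent, (x, y) in zip(sentences, points):
    print(f"({x:+.2f}, {y:+.2f})  {sent}")
```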


Source publication
Article
Full-text available
Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and po...

Similar publications

Preprint
Full-text available
We present two supervised (pre-)training methods to incorporate gloss definitions from lexical resources into neural language models (LMs). The training not only improves our models' performance for Word Sense Disambiguation (WSD) but also benefits general language understanding tasks, while adding almost no parameters. We evaluate our techniques with seven...
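The abstract above does not specify how gloss definitions are fed to the model; a common way to set this up, and only an assumption here rather than the authors' exact method, is to pair a context sentence with candidate sense glosses and train the LM as a cross-encoder that classifies whether a gloss matches the usage:

```python
# Sketch of gloss-informed fine-tuning as a binary (context, gloss) classifier.
# This is a generic GlossBERT-style setup assumed for illustration; it is not
# necessarily the objective used in the cited preprint.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# (context sentence, candidate gloss, label); label 1 if the gloss matches
examples = [
    ("She sat on the chair by the window.",
     "a seat for one person, typically with a back and four legs", 1),
    ("She sat on the chair by the window.",
     "the person in charge of a meeting or organization", 0),
]

contexts = [c for c, _, _ in examples]
glosses = [g for _, g, _ in examples]
labels = torch.tensor([y for _, _, y in examples])

batch = tokenizer(contexts, glosses, padding=True, truncation=True,
                  return_tensors="pt")
outputs = model(**batch, labels=labels)   # cross-entropy over match/no-match
outputs.loss.backward()                   # one illustrative training step
print(float(outputs.loss))
```

At inference time, the gloss with the highest match probability would be taken as the predicted sense.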

Citations

... More systematic approaches have been published [12][13][14][15], and this treatise aspires to contribute to spreading awareness of this critical agenda. ...
Preprint
Full-text available
This treatise advocates for using innovative and clear communication of knowledge, particularly through English as a global academic lingua franca, to improve knowledge transfer and clarity across disciplines. While recognising efforts to define transdisciplinary knowledge via specialised languages, it points out the challenge of lacking a universal nomenclature. The treatise urges global collaboration to establish and adopt clear, consistent nomenclature, enhancing transdisciplinary creation, communication, and application of knowledge. Examples of ambiguities are presented along with the proposed solutions.
... For example, we can distinguish the meanings of 43227 and 38789, while we can't distinguish the meanings of two instances of "chair" without context. We conduct more statistical experiments on the CoarseWSD-20 dataset (Loureiro et al. 2021) and present the results in Table 6, Table 7, and Table 8. We find that the universal words for "chair" and "apple" correlate well with concepts, while the universal words for "club" are the same in most cases. ...
Article
There are two primary approaches to addressing cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of various languages, and translate-test, which explicitly translates different languages into an intermediate language, such as English. Translate-test offers better interpretability compared to multilingual pre-training. However, it has lower performance than multilingual pre-training and struggles with word-level tasks due to translation altering word order. As a result, we propose a new Machine-created Universal Language (MUL) as an alternative intermediate language. MUL comprises a set of discrete symbols forming a universal vocabulary and a natural language to MUL translator for converting multiple natural languages to MUL. MUL unifies shared concepts from various languages into a single universal word, enhancing cross-language transfer. Additionally, MUL retains language-specific words and word order, allowing the model to be easily applied to word-level tasks. Our experiments demonstrate that translating into MUL yields improved performance compared to multilingual pre-training, and our analysis indicates that MUL possesses strong interpretability. The code is at: https://github.com/microsoft/Unicoder/tree/master/MCUL.
... (Lamsiyah et al., 2021). Referring to the application of the BERT method in the study of summary findings, this approach is frequently used to examine summary results based on sentence features (Gupta and Patel, 2021b), semantic aspects, and lexical aspects (Loureiro et al., 2021). ...
... In this study [10], the BERT model was the subject of a quantitative and qualitative analysis of lexical ambiguity. According to the authors, one of the relevant findings is that BERT can accurately pick up distinctions between word meanings, even when only a restricted number of examples is available for each word sense. ...
Article
Full-text available
Natural language processing (NLP) and artificial intelligence (AI) have advanced significantly in recent years, enabling advances in various tasks, such as machine translation, text summarization, sentiment analysis, and speech analysis. However, there are still challenges to overcome, such as natural language ambiguity. One of the problems caused by ambiguity is the difficulty of determining the proper meaning of a word in a specific context. For example, the word “mouse” can mean a computer peripheral or an animal, depending on the context. This limitation can lead to an incorrect semantic interpretation of the processed sentence. In recent years, language models (LMs) have provided a new impetus to NLP and AI, including in the task of word sense disambiguation (WSD). LMs are capable of learning and generating texts because they are trained on large amounts of data. However, in the Portuguese language, there are still few studies on WSD using LMs. Given this scenario, this article presents a method for WSD in Portuguese. To do this, it uses the BERTimbau language model, which is specific to Portuguese. The results are evaluated using metrics established in the literature.
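As a rough illustration of how a BERTimbau-based disambiguation step can look (an assumed nearest-sense strategy, not necessarily the method of the article above), one can embed the ambiguous word in context and compare it against embeddings of example sentences for each sense:

```python
# Sketch: sense selection for a Portuguese word using BERTimbau embeddings.
# The example sentences and the nearest-sense strategy are illustrative
# assumptions, not the exact method of the cited article.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "neuralmind/bert-base-portuguese-cased"  # BERTimbau base
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled contextual embedding of a sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden.mean(dim=0)

# One example sentence per sense of "manga" (fruit vs. sleeve)
sense_examples = {
    "fruta": embed("Comi uma manga madura no café da manhã."),
    "parte da camisa": embed("A manga da camisa ficou suja de tinta."),
}

query = embed("Rasguei a manga do casaco em um prego.")
scores = {sense: torch.cosine_similarity(query, vec, dim=0).item()
          for sense, vec in sense_examples.items()}
print(max(scores, key=scores.get), scores)
```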
... While word2vec and other traditional word embedding models are context-insensitive (i.e. a word embedding is fixed irrespective of the context in which it is mentioned), contextualised embedding models such as BERT produce word embeddings dynamically; in other words, different contexts trigger different embeddings for the same word. This ultimately enhances the learning of different meanings and senses (Loureiro et al., 2021) in language modelling tasks, such as those covered in this work. We additionally observe that DRKA(DPR) models perform poorly in comparison to the other models, and we attribute this to error propagation resulting from decoupling retrieval from the alignment process, i.e. errors that originate from the retrieval step performed by DPR automatically affect the alignment process. ...
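The contrast drawn in the snippet above can be checked directly: with a contextualized model the same surface word receives different vectors in different contexts, whereas a static embedding would not change. The sketch below uses bert-base-uncased as an assumed stand-in for the models discussed.

```python
# Sketch: the same word gets different contextual embeddings in different contexts.
# bert-base-uncased stands in for the contextualised models named in the snippet.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of the first occurrence of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    pos = enc["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[pos]

animal_1 = word_vector("The cat chased the mouse across the barn.", "mouse")
animal_2 = word_vector("A small mouse was hiding under the floorboards.", "mouse")
device = word_vector("I clicked the mouse to open the file.", "mouse")

cos = torch.nn.functional.cosine_similarity
print("animal vs animal:", cos(animal_1, animal_2, dim=0).item())
print("animal vs device:", cos(animal_1, device, dim=0).item())
# A static (word2vec-style) embedding would give 1.0 in both comparisons.
```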
Article
Full-text available
Injecting textual information into knowledge graph (KG) entity representations has been a worthwhile expedition in terms of improving performance in KG-oriented tasks within the NLP community. External knowledge often adopted to enhance KG embeddings ranges from semantically rich dependency-parsed lexical features, to sets of relevant keywords, to entire text descriptions supplied from an external corpus such as Wikipedia. Despite the gains this innovation (text-enhanced KG embeddings) has made, this work proposes that it can be improved even further. Instead of using a single text description (which would not sufficiently represent an entity because of the inherent lexical ambiguity of text), we propose a multi-task framework that jointly selects a set of text descriptions relevant to KG entities and aligns or augments KG embeddings with those descriptions. Different from prior work that plugs in formal entity descriptions declared in knowledge bases, this framework leverages a retriever model to selectively identify richer or highly relevant text descriptions for augmenting entities. Furthermore, the framework treats the number of descriptions to use in the augmentation process as a parameter, which allows the flexibility of enumerating over several values before identifying an appropriate number. Experimental results for link prediction demonstrate a 5.5% and 3.5% increase in Mean Reciprocal Rank (MRR) and Hits@10 scores, respectively, in comparison to text-enhanced knowledge graph augmentation methods using traditional CNNs.
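Since the abstract reports Mean Reciprocal Rank (MRR) and Hits@10, a minimal sketch of how these link-prediction metrics are computed from the rank assigned to each gold entity may be useful; the ranks below are made-up illustrative values.

```python
# Sketch: computing MRR and Hits@10 from the rank of the gold entity
# in each test triple's candidate ranking (illustrative ranks only).
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of queries whose gold entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 40]                  # hypothetical ranks for five triples
print(f"MRR     = {mrr(ranks):.3f}")       # 0.388
print(f"Hits@10 = {hits_at_k(ranks):.3f}") # 0.600
```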
... WSD has a long history in NLP as an essential issue in several topics such as machine translation [67], information retrieval [3], etc. WSD is also known as a knowledge-intensive task that requires substantial resources such as sense inventories and annotated corpora [18]. Over the years, the WSD task has accumulated a large body of literature, with approaches belonging to the traditional NLP families: linguistic, knowledge-based, unsupervised, and supervised methods [18][50][51][60][76][61]. More recently, and unsurprisingly, with the great success of deep learning and neural language models, a shift towards those approaches can be observed in recent WSD works: ...
Article
Word sense disambiguation is the task of automatically determining the meaning of a polysemous word in a specific context. Word sense induction is the unsupervised clustering of word usages in different contexts to distinguish senses and perform unsupervised WSD. Most studies consider function words as stop words and delete them in the pre-processing step. However, function words can encode meaningful information that can help to improve the performance of WSD approaches. In this work, we propose a novel approach to Arabic verb sense disambiguation, based on a preposition-based classification used in an automatic WSI step to build sense inventories for disambiguating Arabic verbs. On the other hand, in the wake of the success of neural language models, recent works have obtained encouraging results using pre-trained BERT models for English-language WSD. Hence, we use contextualized word embeddings for an unsupervised Arabic WSD that is based on linguistic markers and uses SBERT pre-trained models, which yields encouraging results that outperform other existing unsupervised neural AWSD approaches.
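A minimal sketch of the word sense induction step described above, assuming SBERT sentence embeddings clustered with k-means (the model name, the Arabic example contexts, and the number of clusters are illustrative assumptions, not the article's exact pipeline):

```python
# Sketch: word sense induction by clustering SBERT embeddings of usage contexts.
# Model choice, example sentences, and k are illustrative assumptions only.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Usage contexts for the Arabic verb "ضرب" (to hit / to give an example)
contexts = [
    "ضرب الولد الكرة بقوة.",              # the boy hit the ball hard
    "ضرب المعلم مثالا واضحا للطلاب.",      # the teacher gave a clear example
    "ضرب اللاعب الكرة نحو المرمى.",        # the player struck the ball at the goal
    "ضرب الكاتب مثلا من التاريخ.",         # the writer cited an example from history
]

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(contexts)

# Each cluster of contexts is treated as one induced sense
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for sentence, label in zip(contexts, kmeans.labels_):
    print(label, sentence)
```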
... This work was done for the English language. They also concluded that feature extraction, combined with fine-tuning the BERT model, gave the best WSD results [8]. ...
... distinguish the meanings of two instances of "chair" without context. We conduct more statistical experiments on the CoarseWSD-20 dataset (Loureiro et al., 2021) and present the results in Table 6, Table 7, and Table 8. We find that the universal words for "chair" and "apple" correlate well with concepts, while the universal words for "club" are the same in most cases. ...
Preprint
There are two types of approaches to solving cross-lingual transfer: multilingual pre-training implicitly aligns the hidden representations of different languages, while translate-test explicitly translates different languages to an intermediate language, such as English. Translate-test has better interpretability compared to multilingual pre-training. However, translate-test has lower performance than multilingual pre-training (Conneau and Lample, 2019; Conneau et al., 2020) and cannot solve word-level tasks because translation rearranges the word order. Therefore, we propose a new Machine-created Universal Language (MUL) as a new intermediate language. MUL consists of a set of discrete symbols as a universal vocabulary and an NL-MUL translator for translating from multiple natural languages to MUL. MUL unifies common concepts from different languages into the same universal word for better cross-language transfer. MUL also preserves language-specific words as well as word order, so the model can be easily applied to word-level tasks. Our experiments show that translating into MUL achieves better performance compared to multilingual pre-training, and our analyses show that MUL has good interpretability.
... In this work, we use the CoarseWSD-20 dataset (Loureiro et al., 2021), a dataset for determining the intended meaning of words that can have several meanings in practice. The dataset is built from Wikipedia and includes only nouns. ...
Preprint
Full-text available
The issue of word sense ambiguity poses a significant challenge in natural language processing due to the scarcity of annotated data for training machine learning models to address it. Therefore, unsupervised word sense disambiguation methods have been developed to overcome that challenge without relying on annotated data. This research proposes a new context-aware approach to unsupervised word sense disambiguation, which provides a flexible mechanism for incorporating contextual information into the similarity measurement process. We experiment with a popular benchmark dataset to evaluate the proposed strategy and compare its performance with state-of-the-art unsupervised word sense disambiguation techniques. The experimental results indicate that our approach substantially enhances disambiguation accuracy and surpasses the performance of several existing techniques. Our findings underscore the significance of integrating contextual information into semantic similarity measurements to manage word sense ambiguity effectively in unsupervised scenarios.
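A minimal sketch of similarity-based unsupervised WSD in the spirit of the approach described above (the sense glosses, the sentence-embedding model, and the scoring are illustrative assumptions, not the preprint's actual mechanism):

```python
# Sketch: unsupervised WSD by comparing the target word's context against sense
# glosses with sentence-embedding similarity. Glosses and model choice are
# illustrative assumptions, not the cited preprint's exact mechanism.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical sense inventory for the noun "bank"
glosses = {
    "financial_institution": "an organization that keeps and lends money",
    "river_side": "the land alongside a river or lake",
}

context = "She deposited her paycheck at the bank on Friday."
context_vec = model.encode(context, convert_to_tensor=True)

scores = {
    sense: util.cos_sim(context_vec,
                        model.encode(gloss, convert_to_tensor=True)).item()
    for sense, gloss in glosses.items()
}
print(max(scores, key=scores.get), scores)
```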