Similar publications

Conference Paper
Full-text available
We propose a novel semantic tagging task, semtagging, tailored for the purpose of multilingual semantic parsing, and present the first tagger using deep residual networks (ResNets). Our tagger uses both word and character representations and includes a novel residual bypass architecture. We evaluate the tagset both intrinsically on the new task of...
Conference Paper
Full-text available
Text summarization is considered as a challenging task in the NLP community. The availability of datasets for the task of multilingual text summarization is rare, and such datasets are difficult to construct. In this work, we build an abstract text summarizer for the German language text using the state-of-the-art "Transformer" model. We propose an...
Article
Full-text available
We propose the use of the pitch rate of free-form speech recorded by smartphones as an index of voice disability. This research compares the effectiveness of pitch rate, jitter, shimmer, and harmonic-to-noise ratio (HNR) as indices of voice disability in English, German, and Japanese. Normally, the evaluation of these indices is performed using lon...
Article
Full-text available
Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Curre...

Citations

... Each concept or entity is identified by a Uniform Resource Identifier (URI), which describes an entity in the knowledge graph. In their functionality, URIs clearly differ from labels(Montiel-Ponsoda et al., 2011): while labels are a way for humans to interact with the data in natural language, URIs are supposed to be the unambiguous identifiers of entities, i.e., one URI refers to only one entity. Labels can be changed and exist in multiple languages. ...
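The URI-vs-label distinction described in this excerpt can be sketched in a few lines of plain Python (the Wikidata URI and the label values below are illustrative assumptions, not data from the cited work):

```python
# A URI identifies exactly one entity; labels are mutable,
# language-tagged strings that let humans read the data.
knowledge_graph = {
    # URI -> {language tag: label}
    "http://www.wikidata.org/entity/Q64": {
        "en": "Berlin",
        "de": "Berlin",
        "es": "Berlín",
    },
}

def label(uri, lang):
    """Return the human-readable label of an entity in a given language."""
    return knowledge_graph[uri].get(lang)

# The URI stays fixed; labels can be added or edited per language.
print(label("http://www.wikidata.org/entity/Q64", "es"))  # Berlín
```

Note how one key (the URI) unambiguously identifies the entity, while any number of labels can coexist or change without affecting identity.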
Thesis
Full-text available
Content on the web is predominantly in English, which makes it inaccessible to individuals who exclusively speak other languages. Knowledge graphs can store multilingual information, facilitate the creation of multilingual applications, and make these accessible to more language communities. In this thesis, we present studies to assess and improve the state of labels and languages in knowledge graphs and apply multilingual information. We propose ways to use multilingual knowledge graphs to reduce gaps in coverage between languages. We explore the current state of language distribution in knowledge graphs by developing a framework - based on existing standards, frameworks, and guidelines - to measure label and language distribution in knowledge graphs. We apply this framework to a dataset representing the web of data, and to Wikidata. We find that there is a lack of labelling on the web of data, and a bias towards a small set of languages. Due to its multilingual editors, Wikidata has a better distribution of languages in labels. We explore how this knowledge about labels and languages can be used in the domain of question answering. We show that we can apply our framework to the task of ranking and selecting knowledge graphs for a set of user questions. A way of overcoming the lack of multilingual information in knowledge graphs is to transliterate and translate knowledge graph labels and aliases. We propose the automatic classification of labels into transliteration or translation in order to train a model for each task. Classification before generation improves results compared to using either a translation- or transliteration-based model on its own. A use case of multilingual labels is the generation of article placeholders for Wikipedia using neural text generation in lower-resourced languages.
On the basis of surveys and semi-structured interviews, we show that Wikipedia community members find the placeholder pages, and especially the generated summaries, helpful, and are highly likely to accept and reuse the generated text.
... [14] investigates how multilingual ontologies can facilitate question answering. Similarly, [3,9] look at how to enable multilinguality for ontologies. ...
Conference Paper
Full-text available
Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way for humans to interact with the data. In this paper, we explore the current state of languages in Wikidata, especially with regard to its ontology, and its relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent in the ontology, and promising results for future improvements.
... The only way to extract the languages is by using the lexicons containing the terms. Montiel-Ponsoda et al. (2011) introduce an extension module to lemon for modeling translations explicitly. The main part of this module is the Translation class, which represents the relation between lexical senses. ...
... Figure 6 shows an overview of this model. Gracia et al. (2014) propose a modification of the work described in Montiel-Ponsoda et al. (2011). Figure 7 shows an overview of the model. ...
... The model in Montiel-Ponsoda et al. (2011) allows the specification of provenance information associated with translations by using the translationOrigin property for pointing to external resources. In Gracia et al. (2014), provenance is also taken into account through the use of DCMI metadata terms. ...
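The translation model discussed in these excerpts can be sketched as a small data structure: a Translation relates two lexical senses, carries a category (descriptive vs. cultural, as the cited work distinguishes), and a translationOrigin pointing to an external resource for provenance. The class layout and the "ex:" identifiers below are hypothetical illustrations, not the module's actual vocabulary:

```python
from dataclasses import dataclass

@dataclass
class LexicalSense:
    sense_uri: str  # the sense of a term in one language
    language: str

@dataclass
class Translation:
    # Relates a source sense to a target sense; translation_origin
    # points to an external resource documenting the translation's provenance.
    source: LexicalSense
    target: LexicalSense
    category: str            # e.g. a descriptive or cultural translation
    translation_origin: str  # e.g. a URI of a terminology database

en = LexicalSense("ex:bank_sense_en", "en")
es = LexicalSense("ex:banco_sense_es", "es")
t = Translation(en, es, category="descriptiveTranslation",
                translation_origin="ex:someTermBase")
print(t.category)  # descriptiveTranslation
```

Reifying the translation as its own object (rather than a bare link between senses) is what makes it possible to attach the category and provenance metadata the excerpts describe.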
Article
Full-text available
Wiktionary is an online collaborative project based on the same principles as Wikipedia, where users can create, edit and delete entries containing lexical information. While the open nature of Wiktionary is the reason for its fast growth, it has also brought a problem: how reliable is the lexical information contained in each article? If we are planning to use Wiktionary translations as source content for a certain use case, we need to be able to answer this question and extract measures of their confidence. In this paper we present our work on assessing the quality of Wiktionary translations by introducing confidence metrics. Additionally, we describe our effort to share Wiktionary translations and the associated confidence values as linked data.
... With respect to Natural Language Processing, adopting linked data principles for the distribution of linguistic resources can bring many benefits, including: resource interoperability, both at a structural and conceptual level; resource integration (via interlinking); and resource maintenance (via a rich ecosystem of technologies allowing, among other things, continuous updating) [14]. Based on such insights, members of the NLP and SW communities - in particular the Open Linguistics Working Group and the W3C Ontology-Lexica Community Group - joined efforts for the definition of best practices [27] and the design of principled models for the representation of linguistic information [46,42]. This laid the foundation for the development of a Linguistic Linked Open Data cloud (LLOD) and provided a real impetus for the publication and the use of linguistic data collections on the Web. ...
... Variant types, however, are not specified in JRC-Names. As a consequence, even if lemon offers the possibility to represent term variation at the level of surface form, word or sense [42,43], name variants are all lemon:LexicalEntry (i.e. words), although some could be conceived as different lemon:Forms of a variant. ...
Article
Full-text available
Since 2004 the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Netanyahu/Netanjahu/Nétanyahou/Netahny/). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using the lexicon model for ontologies lemon. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as date ranges when the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.
... Besides that, other important studies have targeted the "cultural" translation issue, like the recent European project MONNET, which has proposed a model named LEMON to deal with this issue. The authors of this work [9] developed a translation module intended to link concepts coming from various language/culture ontologies. However, their solution does not consider the absence of an equivalent term representing the concept in the target culture and still gives a literal translation of the concept even if it may be incorrect in the target culture. ...
... Besides that, other important studies have targeted the "cultural" translation issue, like the recent European project MONNET, which has proposed a model named LEMON to deal with this issue. The authors of this work (Montiel-Ponsoda et al., 2011) developed a translation module intended to link concepts coming from various language/culture ontologies. First, they provide a literal translation of the concept in the target language. ...
... In order to define translation strategies that successfully address these problems, all the previous senses of context must be considered and interrelated. According to Montiel-Ponsoda et al. (2011), when there are several terms in each language, it is desirable to unambiguously express which term variant in language A is the translation of which term variant in language B. At this point, translation relations acquire significance. Nevertheless, even when all possible contextual constraints of both source and target terms and concepts are defined, this still does not establish 1:1 correspondences. ...
... As previously stated, Montiel-Ponsoda et al. (2011) propose representing in lemon two translation relations (i.e. descriptive and cultural translations). ...
Chapter
Full-text available
One of the main challenges of the Multilingual Semantic Web (MSW) is ontology localization. This first needs a representation framework that allows for the inclusion of different syntactic, lexical, conceptual and semantic features, but it also needs to account for dynamism and context from both a monolingual and multilingual perspective. We understand dynamism as the changing nature of both concepts and terms due to contextual constraints, whereas context is defined by the different pragmatic factors that modulate such dynamism (e.g. specialized domains, cultures, communicative situations). Context is thus an important construct when describing the concepts and terms of any domain in monolingual resources. However, in multilingual resources, context also affects interlingual correspondences. When dealing with multilingual ontologies, context features must be extended to include translation relations and degrees of equivalence.
... However, we claim that some of these crosslingual equivalences need to be analyzed carefully within the multilingual dimension, since we may want to establish cross-lingual and cross-cultural equivalences that may not admit the strong ontological commitments that current links make. For more on this, see [Montiel-Ponsoda et al., 2011b]. See also León-Araúz and Faber (this volume) for an extensive discussion on cross-linguistic problems. ...
Chapter
Full-text available
Linked Data technologies and methods are enabling the creation of a data network where pieces of data are interconnected on the Web using machine-readable formats such as Resource Description Framework (RDF). This paradigm offers great opportunities to connect and make available knowledge in different languages. However, in order to make this vision a reality, there is a need for guidelines, techniques, and methods that allow publishers of data to overcome language and technological barriers. In this chapter, we review existing methodologies from the point of view of multilingualism and propose a series of guidelines to help publishers when publishing Linked Data in several languages.
... The translation module we propose builds on a previous attempt to extend lemon to support translations (Montiel-Ponsoda et al. 2011). Our current approach, though, has some remarkable differences, the main one being that translation categories have been clearly separated from the model, as we will describe later in this section. ...
... In order to define translation strategies that successfully address these problems, all the previous senses of context must be considered and interrelated. According to Montiel-Ponsoda et al. (2011), when there are several terms in each language, it is desirable to unambiguously express which term variant in language A is the translation of which term variant in language B. At this point, translation relations acquire significance. Nevertheless, even when all possible contextual constraints of both source and target terms and concepts are defined, this still does not establish 1:1 correspondences. ...