Table 2 - uploaded by Mohamed Sherif
10-fold cross-validation F-Measure results.

Source publication
Conference Paper
Full-text available
A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating WOMBAT, a nove...

Context in source publication

Context 1
... grid size was set to 5 and 100 iterations were carried out as in [16]. The results of the evaluation are presented in Table 2. The simple version of WOMBAT was able to outperform the state-of-the-art approaches on 4 of the 8 datasets and came second on 2 datasets. ...
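The evaluation protocol behind results like those in Table 2 can be sketched in a few lines. This is a generic illustration with scikit-learn on a synthetic dataset, not the actual WOMBAT experiment: each of 10 folds is held out in turn and the F-measure is computed on it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy binary classification data standing in for a link-discovery benchmark.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# 10 folds, one F1 (F-measure) score per held-out fold -- the same protocol
# as "10-fold cross-validation F-Measure results"; the model is illustrative.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="f1")
print(len(scores), round(scores.mean(), 3))
```

The per-fold scores are usually averaged into the single figure reported per dataset.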

Similar publications

Article
Full-text available
In recent years, relationship prediction in heterogeneous information networks (HINs) has become an active topic. The most essential part of this task is how to effectively represent and utilize the important three kinds of information hidden in connections of the network, namely local structure information (Local-info), global structure informatio...
Preprint
Full-text available
We present a meta-learning based generative model for zero-shot learning (ZSL) towards a challenging setting when the number of training examples from each \emph{seen} class is very few. This setup contrasts with the conventional ZSL approaches, where training typically assumes the availability of a sufficiently large number of training examples fr...
Article
Full-text available
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction....
Preprint
Full-text available
Video learning is an important task in computer vision and has experienced increasing interest over the recent years. Since even a small amount of videos easily comprises several million frames, methods that do not rely on a frame-level annotation are of special importance. In this work, we propose a novel learning algorithm with a Viterbi-based lo...
Preprint
Full-text available
The journey of reducing noise in distant supervision (DS) generated training data began when DS was first introduced into the relation extraction (RE) task. For the past decade, researchers have applied the multi-instance learning (MIL) framework to find the most reliable features from a bag of sentences. Although the pattern of MIL bags...

Citations

... State-of-the-art entity linking approaches such as LIMES (Ngomo and Auer 2011) and WOMBAT (Sherif et al. 2017) assume that entities are represented through the same number of properties with a 1:1 property mapping. In the case of linking OSM nodes with the knowledge graph entities, these assumptions do not hold. ...
Chapter
Full-text available
Knowledge graphs provide standardized machine-readable representations of real-world entities and their relations. However, the coverage of geographic entities in popular general-purpose knowledge graphs, such as Wikidata and DBpedia, is limited. An essential source of the openly available information regarding geographic entities is OpenStreetMap (OSM). In contrast to knowledge graphs, OSM lacks a clear semantic representation of the rich geographic information it contains. The generation of semantic representations of OSM entities and their interlinking with knowledge graphs are inherently challenging due to OSM’s large, heterogeneous, ambiguous, and flat schema and annotation sparsity. This chapter discusses recent knowledge graph completion methods for geographic data, comprising entity linking and schema inference for geographic entities, to provide semantic geographic information in knowledge graphs. Furthermore, we present the WorldKG knowledge graph, lifting OSM entities into a semantic representation.
... In the past, approaches often relied on geographic distance and linguistic similarity between the labels of the entities [1,13]. LIMES [20] relies on rules to rate the similarity between entities and uses these rules in a supervised model to predict the links. Tempelmeier et al. [21] proposed the OSM2KG algorithm, a machine-learning model that learns a latent representation of OSM nodes and aligns them with knowledge graphs. ...
Chapter
Full-text available
Aligning schemas and entities of community-created geographic data sources with ontologies and knowledge graphs is a promising research direction for making this data widely accessible and reusable for semantic applications. However, such alignment is challenging due to the substantial differences in entity representations and sparse interlinking across sources, as well as the high heterogeneity of schema elements and sparse entity annotations in community-created geographic data. To address these challenges, we propose a novel cross-attention-based iterative alignment approach called IGEA in this paper. IGEA adopts cross-attention to align heterogeneous context representations across geographic data sources and knowledge graphs. Moreover, IGEA employs an iterative approach for schema and entity alignment to overcome annotation and interlinking sparsity. Experiments on real-world datasets from several countries demonstrate that our proposed approach increases entity alignment performance compared to baseline methods by up to 18 percentage points in F1-score. By employing the iterative method, IGEA increases the performance of entity and tag-to-class alignment by 7 and 8 percentage points in F1-score, respectively.
... A specification is called an atomic LS when it consists of exactly one filtering function. A complex specification (complex LS) can be obtained by merging two specifications L1 and L2 through an operator ω that combines the results of L1 and L2; here we use the operators ⊓, ⊔ and \ as they are complete and frequently used to define LS [20]. A graphical representation of a complex LS is given in Figure 1. ...
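The atomic/complex LS structure described in this snippet can be sketched as a tiny expression tree. This is an illustrative model with our own names, not WOMBAT's or LIMES's API: an atomic LS is one similarity measure plus a threshold, and a complex LS combines two LSs with AND (⊓), OR (⊔) or MINUS (\).

```python
from dataclasses import dataclass

@dataclass
class AtomicLS:
    measure: str       # e.g. "jaccard" over some property pair (illustrative)
    threshold: float

    def holds(self, sims: dict) -> bool:
        # a pair is accepted if the measured similarity reaches the threshold
        return sims[self.measure] >= self.threshold

@dataclass
class ComplexLS:
    op: str            # "AND" (⊓), "OR" (⊔), "MINUS" (\)
    left: object
    right: object

    def holds(self, sims: dict) -> bool:
        l, r = self.left.holds(sims), self.right.holds(sims)
        return {"AND": l and r, "OR": l or r, "MINUS": l and not r}[self.op]

# e.g. link a pair if jaccard >= 0.8 AND trigram >= 0.6
ls = ComplexLS("AND", AtomicLS("jaccard", 0.8), AtomicLS("trigram", 0.6))
print(ls.holds({"jaccard": 0.9, "trigram": 0.7}))  # True
```

Evaluating an LS on a pair then reduces to a recursive walk over this tree.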
... The rule-based verbalizer in [1] is based on the Reiter & Dale NLG architecture [19]. In [1], real datasets (knowledge graphs) are used to generate LSs using Wombat [20]. Since the number of properties used in [1] is limited, the resulting LSs are less diverse. ...
Chapter
Full-text available
Linked knowledge graphs build the backbone of many data-driven applications such as search engines, conversational agents and e-commerce solutions. Declarative link discovery frameworks use complex link specifications to express the conditions under which a link between two resources can be deemed to exist. However, understanding such complex link specifications is a challenging task for non-expert users of link discovery frameworks. In this paper, we address this drawback by devising NMV-LS, a language model-based verbalization approach for translating complex link specifications into natural language. NMV-LS relies on the results of rule-based link specification verbalization to apply continuous training on T5, a large language model based on the Transformer architecture. We evaluated NMV-LS on English and German datasets using well-known machine translation metrics such as BLEU, METEOR, ChrF++ and TER. Our results suggest that our approach achieves a verbalization performance close to that of humans and outperforms state-of-the-art approaches. Our source code and datasets are publicly available at https://github.com/dice-group/NMV-LS.
Keywords: KG Integration, Neural Machine Verbalization, Explainable AI, Semantic Web, Machine Learning Applications, Large Language Models
... • In the ontology matching stage, we implemented the content-based class matching ourselves and integrated two state-of-the-art systems. • For the instance matching stage, we base our implementation on the state-of-the-art link discovery framework LIMES [6], where we modified the way WOMBAT [12] is trained to generate link specifications. We then integrated LIMES into NELLIE as listed in Algorithm 1 in the paper. ...
... A complex specification (complex LS) can be obtained by gluing two specifications L1 and L2 through an operator op that combines the results of the two LSs. Here, we use the operators ⊓, ⊔ and \ as they are complete and frequently used to define LS [12]. An LS is also called a linkage rule in the literature [7]. ...
... Note that an LS can be generated manually or automatically. In NELLIE, we use the state-of-the-art algorithm WOMBAT [12] to automatically generate LSs. WOMBAT learns link specifications based on the concept of generalisation in quasi-ordered spaces. ...
Article
Full-text available
Knowledge graphs (KGs) that follow the Linked Data principles are created daily. However, there are no holistic models for the Linked Open Data (LOD). Building such models (i.e., engineering a pipeline system) remains a major challenge in making the LOD vision come true. In this paper, we address this challenge by presenting NELLIE, a pipeline architecture that builds a chain of modules, in which each module addresses one data augmentation challenge. The ultimate goal of the proposed architecture is to build a single fused knowledge graph out of the LOD. NELLIE starts by crawling the available knowledge graphs in the LOD cloud. It then finds a set of matching KG pairs. NELLIE uses a two-phase linking approach for each pair (first an ontology matching phase, then an instance matching phase). Based on the ontology and instance matching, NELLIE fuses each pair of knowledge graphs into a single knowledge graph. The resulting fused KG is then an ideal data source for knowledge-driven applications such as search engines, question answering, digital assistants and drug discovery. Our evaluation shows an improved Hits@1 score on the link prediction task over the resulting fused knowledge graph in up to 94.44% of the cases. It also shows a runtime improvement of several orders of magnitude when comparing our two-phase linking approach with the estimated runtime of a naïve linking approach.
... Meanwhile, entity resolution has been applied in various domains, such as finance and biology. Entity resolution goes by different names in different studies, including deduplication [65], link discovery [66], and record linkage [67-69]. Traditional methods of entity resolution were based on distance or similarity. ...
Preprint
Full-text available
Knowledge Graphs (KGs), which can encode structural relations connecting two objects with one or multiple related attributes, have become an increasingly popular research direction. Given the strength of deep learning at representing complex data in continuous space, it is well suited to representing KG data, thus promoting KG construction, representation, and application. This survey article provides a comprehensive overview of deep learning technologies and KGs by exploring research topics from diverse phases of the KG lifecycle, such as construction, representation, and knowledge-aware application. We propose new taxonomies on these research topics to motivate cross-understanding between deep learning and KGs. Based on the above three phases, we classify the different tasks of KGs and task-related methods. Afterwards, we explain the principles of combining deep learning with various KG steps, such as KG embedding. We further discuss the contribution and advantages of deep learning applied to different application scenarios. Finally, we summarize some critical challenges and open issues that deep learning approaches face in KGs.
... Moreover, rather than training models to match instances, MDedup [40] trains models to discover matching dependencies (MDs) for selecting matched instances, where an MD is one of the relaxed forms [41,42] of functional dependency [43] in data mining. Semi-supervised learning methods [44,45], unsupervised learning methods [46,47] and self-supervised learning models [48] have also been introduced into the field of instance matching. Besides, works on representation learning for matching instances are gradually emerging [49-51]. ...
Article
Full-text available
Instance matching is a key task in knowledge graph fusion, and improving its efficiency is critical given the increasing scale of knowledge graphs. Blocking algorithms, which select candidate instance pairs for comparison, are an effective way to achieve this goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects' features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regular-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM participated in OAEI 2019 and achieved the best matching quality with significantly higher efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experiment results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.
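The core idea of blocking that this abstract relies on can be shown generically. This is not the MultiObJ index itself, just the standard scheme: instances sharing a blocking key land in the same bucket, and only within-bucket pairs become comparison candidates, avoiding the quadratic all-pairs comparison.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(instances, key_fn):
    """Yield candidate pairs: only instances with the same blocking key."""
    buckets = defaultdict(list)
    for inst in instances:
        buckets[key_fn(inst)].append(inst)
    for bucket in buckets.values():
        yield from combinations(bucket, 2)

# Illustrative key: first character of the label.
names = ["Alice Ltd", "Alicia Ltd", "Bob GmbH"]
pairs = list(candidate_pairs(names, key_fn=lambda s: s[0]))
print(pairs)  # [('Alice Ltd', 'Alicia Ltd')]
```

Real blocking keys combine several object features, as MultiObJ's ordered joint of features does; the key function is where such schemes differ.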
... Declarative Link Discovery frameworks rely on complex Link Specifications to express the conditions necessary for linking resources within these datasets. For instance, state-of-the-art LD frameworks such as Limes [2] and Silk [3] adopt a property-based computation of links between entities. For configuring LD frameworks, the user can either (1) manually enter a LS or (2) use machine learning for automatic generation of LS. ...
... For example, the Eagle algorithm [4] is a supervised machine-learning algorithm able to learn LS using genetic programming. In newer work, the Wombat algorithm [5] implements a positive-only learning algorithm for automatic LS finding based on generalization via an upward refinement operator. While LD experts can easily understand the generated LS from such algorithms, and even modify them if necessary, most lay users lack the expertise to proficiently interpret those LSs. ...
... In Section 4 we introduce our neural-based LS verbalization approach. Subsequently, we introduce our summarization approach in Section 5. We then evaluate our approach with respect to the adequacy and fluency [7] of the natural language representations it generates in Section 6. After a brief review of related work in Section 7, we conclude our work with some final remarks in Section 8. Throughout the rest of the paper, we use the LS shown in Listing 1 as our running example. It is generated by the Wombat [5] algorithm to link the ABT-BUY benchmark dataset from [8], where the source resource x will be linked to the target resource y if our running example's LS holds. ...
Article
The number and size of datasets abiding by the Linked Data paradigm increase every day. Discovering links between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks rely on complex Link Specifications (LS) to express the conditions under which two resources should be linked. Understanding such LS is not a trivial task for non-expert users, particularly when such users are interested in generating LS to match their needs. Even if the user applies a machine learning algorithm for the automatic generation of the required LS, the challenge of explaining the resulting LS persists. Hence, providing explainable LS is the key to enabling users who are unfamiliar with the underlying LS technologies to use them effectively and efficiently. In this paper, we extend our previous work (Ahmed et al., 2019) by proposing a generic multilingual approach that verbalizes LS in many languages, i.e., converts LS into understandable natural language text. In this work, we ported our LS verbalization framework to German and Spanish, in addition to English. Our adequacy and fluency evaluations show that our approach can generate complete and easily understandable natural language descriptions even for lay users. Moreover, we devised an experimental neural approach for improving the quality of the generated texts. Our neural approach achieves promising results in terms of BLEU, METEOR and chrF++.
... Entity resolution has attracted a significant amount of research, sometimes under different names such as record linkage [4], [5], link discovery [6], [7] or deduplication [8]. In the following we can only present some relevant ER approaches. ...
... Traditional ER approaches rely on learning distance- or similarity-based measures and then use a threshold or classifier to decide whether two entities are the same. These classifiers can be unsupervised [12], [13], supervised [7], [14] or employ active learning [8], [15]. For example, the Magellan framework [2] provides supervised ML classifiers and extensive guides for the entire ER process. ...
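The traditional similarity-plus-threshold baseline mentioned in this snippet can be sketched in a few lines. The similarity measure (Python's built-in SequenceMatcher ratio) and the 0.85 threshold are illustrative choices of ours, not taken from any cited tool.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # normalized string similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    # average attribute similarity over shared keys, then apply the threshold
    keys = rec_a.keys() & rec_b.keys()
    score = sum(similarity(str(rec_a[k]), str(rec_b[k])) for k in keys) / len(keys)
    return score >= threshold

print(same_entity({"name": "Jon Smith"}, {"name": "John Smith"}))  # True
print(same_entity({"name": "Berlin"}, {"name": "Paris"}))          # False
```

Supervised variants replace the fixed threshold with a classifier trained on labeled pairs, which is exactly the step the approaches cited above automate.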
Preprint
Full-text available
Entity Resolution (ER) is an essential part of integrating different knowledge graphs, identifying entities that refer to the same real-world object. A promising approach is the use of graph embeddings for ER, determining the similarity of entities based on the similarity of their graph neighborhoods. The similarity computation for such embeddings translates to calculating the distance between them in the embedding space, which is comparatively simple. However, previous work has shown that the use of graph embeddings alone is not sufficient to achieve high ER quality. We therefore propose a more comprehensive ER approach for knowledge graphs called EAGER (Embedding-Assisted Knowledge Graph Entity Resolution) that flexibly utilizes both the similarity of graph embeddings and attribute values within a supervised machine learning approach. We evaluate our approach on 23 benchmark datasets with differently sized and structured knowledge graphs and use hypothesis tests to ensure statistical significance of our results. Furthermore, we compare our approach with state-of-the-art ER solutions, where our approach yields competitive results for table-oriented ER problems and shallow knowledge graphs but much better results for deeper knowledge graphs.
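The feature scheme this abstract describes, combining embedding similarity with attribute similarity in one supervised classifier, can be sketched as follows. The synthetic embeddings, the feature construction, and the random-forest model are our illustrative choices, not EAGER's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def pair_features(emb_a, emb_b, attr_sim):
    # element-wise embedding distance concatenated with a scalar attribute similarity
    return np.concatenate([np.abs(emb_a - emb_b), [attr_sim]])

# Synthetic training pairs: matches have close embeddings and high attribute
# similarity; non-matches have unrelated embeddings and low similarity.
X, y = [], []
for _ in range(200):
    e = rng.normal(size=8)
    X.append(pair_features(e, e + rng.normal(scale=0.05, size=8), 0.9)); y.append(1)
    X.append(pair_features(e, rng.normal(size=8), 0.2)); y.append(0)

clf = RandomForestClassifier(random_state=0).fit(X, y)
e = rng.normal(size=8)
print(clf.predict([pair_features(e, e, 0.95)])[0])  # predicts a match (1)
```

The point of the combination is that either signal alone can mislead; the classifier learns how to weight embedding distance against attribute evidence.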
... They also specify the similarity measures to be used for comparing datatype property values, the aggregation functions for combining similarity values, and the similarity thresholds beyond which two values are considered equal. Link specifications may be directly set by users or they may be built (semi-)automatically, for example, using machine learning techniques [27,29]. ...
Article
Full-text available
Both keys and their generalisation, link keys, may be used to perform data interlinking, i.e. finding identical resources in different RDF datasets. However, the precise relationship between keys and link keys has not been fully determined yet. A common formal framework encompassing both keys and link keys is necessary to ensure the correctness of data interlinking tools based on them, and to determine their scope and possible overlapping. In this paper, we provide a semantics for keys and link keys within description logics. We determine under which conditions they are legitimate to generate links. We provide conditions under which link keys are logically equivalent to keys. In particular, we show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.
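To make the notion of a link key concrete, here is a small check under one common semantics (for every property pair in the key, the two resources share at least one value). The property names and data are hypothetical, and other link-key semantics exist.

```python
def link_key_holds(x: dict, y: dict, key_pairs) -> bool:
    # x is from dataset D1, y from dataset D2; key_pairs = [(p, q), ...]
    # Link x and y if, for every pair (p, q), x's p-values and y's q-values
    # have at least one value in common.
    return all(set(x.get(p, [])) & set(y.get(q, [])) for p, q in key_pairs)

# Hypothetical resources from a French and an English RDF dataset.
x = {"nom": ["Paris"], "pays": ["France"]}
y = {"name": ["Paris"], "country": ["France"]}
print(link_key_holds(x, y, [("nom", "name"), ("pays", "country")]))  # True
```

A plain key is the special case where both datasets use the same properties, i.e. every pair is of the form (p, p), which matches the reduction the paper discusses.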
... EAGLE [14] and AGP [5] are some of the approaches that have used active learning in order to maintain a high level of accuracy while requesting fewer labeled examples for training. WOMBAT [20] uses only positive examples to find accurate LS. Additionally, unsupervised methods and tools such as PARIS [17] require no training data but are based on specific assumptions about the characteristics of the matching pairs. ...
Conference Paper
Full-text available
Modern data-driven frameworks often have to process large amounts of data periodically. Hence, they often operate under time or space constraints. This also holds for Linked Data-driven frameworks when processing RDF data, in particular, when they perform link discovery tasks. In this work, we present a novel approach for link discovery under constraints pertaining to the expected recall of a link discovery task. Given a link specification, the approach aims to find a subsumed link specification that achieves a lower run time than the input specification while abiding by a predefined constraint on the expected recall it has to achieve. Our approach, dubbed LIGER, combines downward refinement operators with monotonicity assumptions to detect such specifications. We evaluate our approach on seven datasets. Our results suggest that the different implementations of LIGER can detect subsumed specifications that abide by expected recall constraints efficiently, thus leading to significantly shorter overall run times than our baseline.