Table 2 - uploaded by Mohamed Sherif
10-fold cross-validation F-Measure results.

Source publication
Conference Paper
Full-text available
A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is the fact that usually only positive examples for the links exist. We address this challenge by presenting and evaluating WOMBAT, a nove...

Context in source publication

Context 1
... grid size was set to 5 and 100 iterations were carried out as in [16]. The results of the evaluation are presented in Table 2. The simple version of WOMBAT was able to outperform the state-of-the-art approaches on 4 of the 8 datasets and came second on 2 datasets. ...
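The evaluation protocol behind results like those in Table 2 can be sketched in a few lines. This is a generic illustration with scikit-learn on a synthetic dataset, not the actual WOMBAT experiment: each of 10 folds is held out in turn and the F-measure is computed on it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy binary classification data standing in for a link-discovery benchmark.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# 10 folds, one F1 (F-measure) score per held-out fold -- the same protocol
# as "10-fold cross-validation F-Measure results"; the model is illustrative.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="f1")
print(len(scores), round(scores.mean(), 3))
```

The per-fold scores are usually averaged into the single figure reported per dataset.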

Similar publications

Article
Full-text available
In recent years, relationship prediction in heterogeneous information networks (HINs) has become an active topic. The most essential part of this task is how to effectively represent and utilize the important three kinds of information hidden in connections of the network, namely local structure information (Local-info), global structure informatio...
Preprint
Full-text available
We present a meta-learning based generative model for zero-shot learning (ZSL) towards a challenging setting when the number of training examples from each \emph{seen} class is very few. This setup contrasts with the conventional ZSL approaches, where training typically assumes the availability of a sufficiently large number of training examples fr...
Article
Full-text available
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction....
Preprint
Full-text available
Video learning is an important task in computer vision and has experienced increasing interest over the recent years. Since even a small amount of videos easily comprises several million frames, methods that do not rely on a frame-level annotation are of special importance. In this work, we propose a novel learning algorithm with a Viterbi-based lo...
Preprint
Full-text available
The journey of reducing noise in distant supervision (DS) generated training data began when DS was first introduced into the relation extraction (RE) task. For the past decade, researchers have applied the multi-instance learning (MIL) framework to find the most reliable features from a bag of sentences. Although the pattern of MIL bags...

Citations

... State-of-the-art entity linking approaches such as LIMES (Ngomo and Auer 2011) and WOMBAT (Sherif et al. 2017) assume that entities are represented through the same number of properties with a 1:1 property mapping. In the case of linking OSM nodes with the knowledge graph entities, these assumptions do not hold. ...
Chapter
Full-text available
Knowledge graphs provide standardized machine-readable representations of real-world entities and their relations. However, the coverage of geographic entities in popular general-purpose knowledge graphs, such as Wikidata and DBpedia, is limited. An essential source of the openly available information regarding geographic entities is OpenStreetMap (OSM). In contrast to knowledge graphs, OSM lacks a clear semantic representation of the rich geographic information it contains. The generation of semantic representations of OSM entities and their interlinking with knowledge graphs are inherently challenging due to OSM’s large, heterogeneous, ambiguous, and flat schema and annotation sparsity. This chapter discusses recent knowledge graph completion methods for geographic data, comprising entity linking and schema inference for geographic entities, to provide semantic geographic information in knowledge graphs. Furthermore, we present the WorldKG knowledge graph, lifting OSM entities into a semantic representation.
... In the past, approaches often relied on geographic distance and linguistic similarity between the labels of the entities [1,13]. LIMES [20] relies on rules to rate the similarity between entities and uses these rules in a supervised model to predict the links. Tempelmeier et al. [21] proposed the OSM2KG algorithm, a machine-learning model that learns a latent representation of OSM nodes and aligns them with knowledge graphs. ...
Chapter
Full-text available
Aligning schemas and entities of community-created geographic data sources with ontologies and knowledge graphs is a promising research direction for making this data widely accessible and reusable for semantic applications. However, such alignment is challenging due to the substantial differences in entity representations and sparse interlinking across sources, as well as the high heterogeneity of schema elements and sparse entity annotations in community-created geographic data. To address these challenges, we propose a novel cross-attention-based iterative alignment approach called IGEA in this paper. IGEA adopts cross-attention to align heterogeneous context representations across geographic data sources and knowledge graphs. Moreover, IGEA employs an iterative approach for schema and entity alignment to overcome annotation and interlinking sparsity. Experiments on real-world datasets from several countries demonstrate that our proposed approach increases entity alignment performance compared to baseline methods by up to 18 percentage points in F1-score. By employing the iterative method, IGEA increases the performance of entity and tag-to-class alignment by 7 and 8 percentage points in F1-score, respectively.
... A specification is called an atomic LS when it consists of exactly one filtering function. A complex specification (complex LS) can be obtained by merging two specifications L1 and L2 through an operator ω that combines the results of L1 and L2; here we use the operators ⊓, ⊔ and \ as they are complete and frequently used to define LS [20]. A graphical representation of a complex LS is given in Figure 1. ...
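The atomic/complex LS structure described in this snippet can be sketched as a tiny expression tree. This is an illustrative model with our own names, not WOMBAT's or LIMES's API: an atomic LS is one similarity measure plus a threshold, and a complex LS combines two LSs with AND (⊓), OR (⊔) or MINUS (\).

```python
from dataclasses import dataclass

@dataclass
class AtomicLS:
    measure: str       # e.g. "jaccard" over some property pair (illustrative)
    threshold: float

    def holds(self, sims: dict) -> bool:
        # a pair is accepted if the measured similarity reaches the threshold
        return sims[self.measure] >= self.threshold

@dataclass
class ComplexLS:
    op: str            # "AND" (⊓), "OR" (⊔), "MINUS" (\)
    left: object
    right: object

    def holds(self, sims: dict) -> bool:
        l, r = self.left.holds(sims), self.right.holds(sims)
        return {"AND": l and r, "OR": l or r, "MINUS": l and not r}[self.op]

# e.g. link a pair if jaccard >= 0.8 AND trigram >= 0.6
ls = ComplexLS("AND", AtomicLS("jaccard", 0.8), AtomicLS("trigram", 0.6))
print(ls.holds({"jaccard": 0.9, "trigram": 0.7}))  # True
```

Evaluating an LS on a pair then reduces to a recursive walk over this tree.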
... The rule-based verbalizer in [1] is based on the Reiter & Dale NLG architecture [19]. In [1], real datasets (knowledge graphs) are used to generate LSs using Wombat [20]. Since the number of properties used in [1] is limited, the resulting LSs are less diverse. ...
Chapter
Full-text available
Linked knowledge graphs build the backbone of many data-driven applications such as search engines, conversational agents and e-commerce solutions. Declarative link discovery frameworks use complex link specifications to express the conditions under which a link between two resources can be deemed to exist. However, understanding such complex link specifications is a challenging task for non-expert users of link discovery frameworks. In this paper, we address this drawback by devising NMV-LS, a language model-based verbalization approach for translating complex link specifications into natural language. NMV-LS relies on the results of rule-based link specification verbalization to apply continuous training on T5, a large language model based on the Transformer architecture. We evaluated NMV-LS on English and German datasets using well-known machine translation metrics such as BLEU, METEOR, ChrF++ and TER. Our results suggest that our approach achieves a verbalization performance close to that of humans and outperforms state-of-the-art approaches. Our source code and datasets are publicly available at https://github.com/dice-group/NMV-LS.
Keywords: KG Integration, Neural Machine Verbalization, Explainable AI, Semantic Web, Machine Learning Applications, Large Language Models
... • In the ontology matching stage, we implemented the content-based class matching ourselves and integrated two state-of-the-art systems. • For the instance matching stage, we base our implementation on the state-of-the-art link discovery framework LIMES [6], where we modified the way WOMBAT [12] is trained to generate link specifications. We then integrated LIMES into NELLIE as listed in Algorithm 1 in the paper. ...
... A complex specification (complex LS) can be obtained by gluing two specifications L1 and L2 through an operator op that combines the results of the two LSs. Here, we use the operators ⊓, ⊔ and \ as they are complete and frequently used to define LS [12]. An LS is also called a linkage rule in the literature [7]. ...
... Note that an LS can be generated manually or automatically. In NELLIE, we use the state-of-the-art algorithm WOMBAT [12] to automatically generate LSs. WOMBAT learns link specifications based on the concept of generalisation in quasi-ordered spaces. ...
Article
Full-text available
Knowledge graphs (KGs) that follow the Linked Data principles are created daily. However, there are no holistic models for the Linked Open Data (LOD). Building such models (i.e., engineering a pipeline system) remains a major challenge in making the LOD vision come true. In this paper, we address this challenge by presenting NELLIE, a pipeline architecture that builds a chain of modules, in which each module addresses one data augmentation challenge. The ultimate goal of the proposed architecture is to build a single fused knowledge graph out of the LOD. NELLIE starts by crawling the available knowledge graphs in the LOD cloud. It then finds a set of matching KG pairs. NELLIE uses a two-phase linking approach for each pair (first an ontology matching phase, then an instance matching phase). Based on the ontology and instance matching, NELLIE fuses each pair of knowledge graphs into a single knowledge graph. The resulting fused KG is then an ideal data source for knowledge-driven applications such as search engines, question answering, digital assistants and drug discovery. Our evaluation shows an improved Hits@1 score on the link prediction task over the resulting fused knowledge graph in up to 94.44% of the cases. It also shows a runtime improvement of several orders of magnitude when comparing our two-phase linking approach with the estimated runtime of a naïve linking approach.
... Meanwhile, entity resolution has been applied in various domains, such as finance and biology. Entity resolution goes by different names in different studies, including deduplication [65], link discovery [66], and record linkage [67-69]. Traditional methods of entity resolution were based on distance or similarity. ...
Preprint
Full-text available
Knowledge Graphs (KGs), which can encode structural relations connecting two objects with one or multiple related attributes, have become an increasingly popular research direction. Given the strength of deep learning at representing complex data in continuous space, it is well suited to representing KG data, thus promoting KG construction, representation, and application. This survey article provides a comprehensive overview of deep learning technologies and KGs by exploring research topics from diverse phases of the KG lifecycle, such as construction, representation, and knowledge-aware application. We propose new taxonomies on these research topics to motivate cross-understanding between deep learning and KGs. Based on the above three phases, we classify the different tasks of KGs and task-related methods. Afterwards, we explain the principles of combining deep learning with various KG steps, such as KG embedding. We further discuss the contribution and advantages of deep learning applied to different application scenarios. Finally, we summarize some critical challenges and open issues that deep learning approaches face in KGs.
... Moreover, rather than training models to match instances, MDedup [40] trains models to discover matching dependencies (MDs) for selecting matched instances, where an MD is one of the relaxed forms [41,42] of functional dependency [43] in data mining. Semi-supervised learning methods [44,45], unsupervised learning methods [46,47] and self-supervised learning models [48] have also been introduced into the field of instance matching. Besides, works on representation learning for matching instances are gradually emerging [49-51]. ...
Article
Full-text available
Instance matching is a key task in knowledge graph fusion, and improving its efficiency is critical given the increasing scale of knowledge graphs. Blocking algorithms, which select candidate instance pairs for comparison, are an effective way to achieve this goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects' features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regular-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM participated in OAEI 2019 and achieved the best matching quality with significantly higher efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experiment results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.
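The core idea of blocking that this abstract relies on can be shown generically. This is not the MultiObJ index itself, just the standard scheme: instances sharing a blocking key land in the same bucket, and only within-bucket pairs become comparison candidates, avoiding the quadratic all-pairs comparison.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(instances, key_fn):
    """Yield candidate pairs: only instances with the same blocking key."""
    buckets = defaultdict(list)
    for inst in instances:
        buckets[key_fn(inst)].append(inst)
    for bucket in buckets.values():
        yield from combinations(bucket, 2)

# Illustrative key: first character of the label.
names = ["Alice Ltd", "Alicia Ltd", "Bob GmbH"]
pairs = list(candidate_pairs(names, key_fn=lambda s: s[0]))
print(pairs)  # [('Alice Ltd', 'Alicia Ltd')]
```

Real blocking keys combine several object features, as MultiObJ's ordered joint of features does; the key function is where such schemes differ.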
... Declarative Link Discovery frameworks rely on complex Link Specifications to express the conditions necessary for linking resources within these datasets. For instance, state-of-the-art LD frameworks such as Limes [2] and Silk [3] adopt a property-based computation of links between entities. For configuring LD frameworks, the user can either (1) manually enter a LS or (2) use machine learning for automatic generation of LS. ...
... For example, the Eagle algorithm [4] is a supervised machine-learning algorithm able to learn LS using genetic programming. In newer work, the Wombat algorithm [5] implements a positive-only learning algorithm for automatic LS finding based on generalization via an upward refinement operator. While LD experts can easily understand the generated LS from such algorithms, and even modify them if necessary, most lay users lack the expertise to proficiently interpret those LSs. ...
... In Section 4 we introduce our neural-based LS verbalization approach. Subsequently, we introduce our summarization approach in Section 5. We then evaluate our approach with respect to the adequacy and fluency [7] of the natural language representations it generates in Section 6. After a brief review of related work in Section 7, we conclude our work with some final remarks in Section 8. Throughout the rest of the paper, we use the LS shown in Listing 1 as our running example. It is generated by the Wombat [5] algorithm to link the ABT-BUY benchmark dataset from [8], where the source resource x will be linked to the target resource y if our running example's LS holds. ...
Article
The number and size of datasets abiding by the Linked Data paradigm increase every day. Discovering links between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks rely on complex Link Specifications (LS) to express the conditions under which two resources should be linked. Understanding such LS is not a trivial task for non-expert users, particularly when such users are interested in generating LS to match their needs. Even if the user applies a machine learning algorithm for the automatic generation of the required LS, the challenge of explaining the resulting LS persists. Hence, providing explainable LS is the key to enabling users who are unfamiliar with the underlying LS technologies to use them effectively and efficiently. In this paper, we extend our previous work (Ahmed et al., 2019) by proposing a generic multilingual approach that verbalizes LS in many languages, i.e., converts LS into understandable natural language text. In this work, we ported our LS verbalization framework to German and Spanish, in addition to English. Our adequacy and fluency evaluations show that our approach can generate complete and easily understandable natural language descriptions even for lay users. Moreover, we devised an experimental neural approach for improving the quality of the generated texts. Our neural approach achieves promising results in terms of BLEU, METEOR and chrF++.
... Entity resolution has attracted a significant amount of research, sometimes under different names such as record linkage [4], [5], link discovery [6], [7] or deduplication [8]. In the following we can only present some relevant ER approaches. ...
... Traditional ER approaches rely on learning distance- or similarity-based measures and then use a threshold or classifier to decide whether two entities are the same. These classifiers can be unsupervised [12], [13], supervised [7], [14] or employ active learning [8], [15]. For example, the Magellan framework [2] provides supervised ML classifiers and extensive guides for the entire ER process. ...
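The traditional similarity-plus-threshold baseline mentioned in this snippet can be sketched in a few lines. The similarity measure (Python's built-in SequenceMatcher ratio) and the 0.85 threshold are illustrative choices of ours, not taken from any cited tool.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # normalized string similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    # average attribute similarity over shared keys, then apply the threshold
    keys = rec_a.keys() & rec_b.keys()
    score = sum(similarity(str(rec_a[k]), str(rec_b[k])) for k in keys) / len(keys)
    return score >= threshold

print(same_entity({"name": "Jon Smith"}, {"name": "John Smith"}))  # True
print(same_entity({"name": "Berlin"}, {"name": "Paris"}))          # False
```

Supervised variants replace the fixed threshold with a classifier trained on labeled pairs, which is exactly the step the approaches cited above automate.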
Preprint
Full-text available
Entity Resolution (ER) is an essential part of integrating different knowledge graphs, identifying entities that refer to the same real-world object. A promising approach is the use of graph embeddings for ER, determining the similarity of entities based on the similarity of their graph neighborhoods. The similarity computation for such embeddings translates to calculating the distance between them in the embedding space, which is comparatively simple. However, previous work has shown that the use of graph embeddings alone is not sufficient to achieve high ER quality. We therefore propose a more comprehensive ER approach for knowledge graphs called EAGER (Embedding-Assisted Knowledge Graph Entity Resolution) that flexibly utilizes both the similarity of graph embeddings and attribute values within a supervised machine learning approach. We evaluate our approach on 23 benchmark datasets with differently sized and structured knowledge graphs and use hypothesis tests to ensure statistical significance of our results. Furthermore, we compare our approach with state-of-the-art ER solutions, where our approach yields competitive results for table-oriented ER problems and shallow knowledge graphs but much better results for deeper knowledge graphs.
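The feature scheme this abstract describes, combining embedding similarity with attribute similarity in one supervised classifier, can be sketched as follows. The synthetic embeddings, the feature construction, and the random-forest model are our illustrative choices, not EAGER's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def pair_features(emb_a, emb_b, attr_sim):
    # element-wise embedding distance concatenated with a scalar attribute similarity
    return np.concatenate([np.abs(emb_a - emb_b), [attr_sim]])

# Synthetic training pairs: matches have close embeddings and high attribute
# similarity; non-matches have unrelated embeddings and low similarity.
X, y = [], []
for _ in range(200):
    e = rng.normal(size=8)
    X.append(pair_features(e, e + rng.normal(scale=0.05, size=8), 0.9)); y.append(1)
    X.append(pair_features(e, rng.normal(size=8), 0.2)); y.append(0)

clf = RandomForestClassifier(random_state=0).fit(X, y)
e = rng.normal(size=8)
print(clf.predict([pair_features(e, e, 0.95)])[0])  # predicts a match (1)
```

The point of the combination is that either signal alone can mislead; the classifier learns how to weight embedding distance against attribute evidence.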
... They also specify the similarity measures to be used for comparing datatype property values, the aggregation functions for combining similarity values, and the similarity thresholds beyond which two values are considered equal. Link specifications may be directly set by users or they may be built (semi-)automatically, for example, using machine learning techniques [27,29]. ...
Article
Full-text available
Both keys and their generalisation, link keys, may be used to perform data interlinking, i.e. finding identical resources in different RDF datasets. However, the precise relationship between keys and link keys has not been fully determined yet. A common formal framework encompassing both keys and link keys is necessary to ensure the correctness of data interlinking tools based on them, and to determine their scope and possible overlapping. In this paper, we provide a semantics for keys and link keys within description logics. We determine under which conditions they are legitimate to generate links. We provide conditions under which link keys are logically equivalent to keys. In particular, we show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.
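To make the notion of a link key concrete, here is a small check under one common semantics (for every property pair in the key, the two resources share at least one value). The property names and data are hypothetical, and other link-key semantics exist.

```python
def link_key_holds(x: dict, y: dict, key_pairs) -> bool:
    # x is from dataset D1, y from dataset D2; key_pairs = [(p, q), ...]
    # Link x and y if, for every pair (p, q), x's p-values and y's q-values
    # have at least one value in common.
    return all(set(x.get(p, [])) & set(y.get(q, [])) for p, q in key_pairs)

# Hypothetical resources from a French and an English RDF dataset.
x = {"nom": ["Paris"], "pays": ["France"]}
y = {"name": ["Paris"], "country": ["France"]}
print(link_key_holds(x, y, [("nom", "name"), ("pays", "country")]))  # True
```

A plain key is the special case where both datasets use the same properties, i.e. every pair is of the form (p, p), which matches the reduction the paper discusses.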
... EAGLE [14] and AGP [5] are some of the approaches that have used active learning in order to maintain a high level of accuracy while requesting fewer labeled examples for training. WOMBAT [20] uses only positive examples to find accurate LS. Additionally, unsupervised methods and tools such as PARIS [17] require no training data but are based on specific assumptions about the characteristics of the matching pairs. ...
Conference Paper
Full-text available
Modern data-driven frameworks often have to process large amounts of data periodically. Hence, they often operate under time or space constraints. This also holds for Linked Data-driven frameworks when processing RDF data, in particular, when they perform link discovery tasks. In this work, we present a novel approach for link discovery under constraints pertaining to the expected recall of a link discovery task. Given a link specification, the approach aims to find a subsumed link specification that achieves a lower run time than the input specification while abiding by a predefined constraint on the expected recall it has to achieve. Our approach, dubbed LIGER, combines downward refinement operators with monotonicity assumptions to detect such specifications. We evaluate our approach on seven datasets. Our results suggest that the different implementations of LIGER can detect subsumed specifications that abide by expected recall constraints efficiently, thus leading to significantly shorter overall run times than our baseline.