Fig 7. Entity types with 400 topics
Source publication
Conference Paper
Full-text available
Topic models are widely used to thematically describe a collection of text documents and have become an important technique for systems that measure document similarity for classification, clustering, segmentation, entity linking and more. While they have been applied to some non-text domains, their use for semi-structured graph data, such as RDF,...

Context in source publication

Context 1
... our test set was relatively small, we were able to see how precision changed based on data variations. As can be seen in Figure 6, we saw the highest precision using predicates and objects, and in Figure 7 we saw the highest precision using predicates and objects that included the WordNet synsets and definitions. Though it was clear that objects alone did not perform as well as including the predicate, future work will further explore the relationship between supplemental data and the number of topics chosen for the model. ...
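To make the document-construction step described above concrete, the following is a minimal sketch (not the authors' code) of how an entity's predicates and objects might be turned into a pseudo-document and supplemented with WordNet synsets and definitions; the tokenization choices and the two-synset limit are illustrative assumptions.

```python
# A minimal sketch, assuming NLTK's WordNet interface, of building a pseudo-document
# for one entity from its predicates and objects, supplemented with WordNet synsets
# and definitions as described in the context above. Tokenization and the two-synset
# limit are illustrative assumptions, not the authors' settings.
from nltk.corpus import wordnet as wn  # requires a prior nltk.download('wordnet')

def entity_document(triples, add_wordnet=True):
    """triples: iterable of (subject, predicate, object) strings for one entity."""
    tokens = []
    for _, predicate, obj in triples:
        # keep the local name of the predicate and the object's surface form
        tokens.append(predicate.split('/')[-1].split('#')[-1].lower())
        tokens.extend(str(obj).lower().split())
    if add_wordnet:
        for tok in list(tokens):
            for syn in wn.synsets(tok)[:2]:  # a couple of synsets per token
                tokens.extend(lemma.name().lower() for lemma in syn.lemmas())
                tokens.extend(syn.definition().lower().split())
    return tokens

# toy example: one triple about a hypothetical entity
print(entity_document([("ex:Rome", "http://dbpedia.org/ontology/country", "Italy")]))
```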

Similar publications

Conference Paper
Full-text available
This work presents a novel ontology-based approach for the complementation of technical specifications of cyber-physical system components using ontological classification and reasoning. We build on the AutomationML standard and outline how data represented with it can be transformed into an RDF instance graph. We exemplarily show how complementary...

Citations

... In NELLIE, we adopted KG matching techniques based on document similarity, where we generated one document per KG. Sleeman et al. [27] developed a method for using topic modeling with RDF data. This method produces a separate document for each entity described in a KG, whereas our method generates documents that describe an entire KG. ...
Article
Full-text available
Knowledge graphs (KGs) that follow the Linked Data principles are created daily. However, there are no holistic models for the Linked Open Data (LOD). Building these models (i.e., engineering a pipeline system) is still a big challenge in making the LOD vision come true. In this paper, we address this challenge by presenting NELLIE, a pipeline architecture to build a chain of modules, in which each of our modules addresses one data augmentation challenge. The ultimate goal of the proposed architecture is to build a single fused knowledge graph out of the LOD. NELLIE starts by crawling the available knowledge graphs in the LOD cloud. It then finds a set of matching KG pairs. NELLIE uses a two-phase linking approach for each pair (first an ontology matching phase, then an instance matching phase). Based on the ontology and instance matching, NELLIE fuses each pair of knowledge graphs into a single knowledge graph. The resulting fused KG is then an ideal data source for knowledge-driven applications such as search engines, question answering, digital assistants and drug discovery. Our evaluation shows an improved Hit@1 score of the link prediction task on the resulting fused knowledge graph by NELLIE in up to 94.44% of the cases. Our evaluation also shows a runtime improvement by several orders of magnitude when comparing our two-phase linking approach with the estimated runtime of linking using a naïve approach.
... In this direction, e.g., Nickel et al. [26] have focused on incorporating ontological knowledge into a sparse factorization model for learning relations of the YAGO Knowledge Base. Other solutions (Paulheim and Bizer [29], Sleeman and Finin [30], Sleeman et al. [39], [40]) used statistical and machine learning approaches, such as probabilistic and topic modeling, focusing on enhancing the DBpedia knowledge base. ...
Article
Full-text available
Knowledge bases allow data organization and exploration, making it easier for machines to semantically understand and use the data. Traditional strategies for knowledge base construction and augmentation have mostly relied on manual effort or automatic extraction of content from structured and semi-structured sources. In this work, we present DeepEx, a system that autonomously extracts missing attributes of entities in knowledge bases from unstructured text. We use Wikipedia as a data source. Given entities on Wikipedia represented by their articles (text and infobox), DeepEx uses a classifier to detect sentences in the articles mentioning the possible missing attributes of the entities and then employs a deep-learning extraction model on those sentences to identify the attributes. The sentence classifier and attribute extractor are built with labels automatically produced by a weak supervision approach using infobox structured information as the supervision source. We have compared our strategy with previous approaches to this problem on 29 different attributes from 4 domains. The results showed that our extraction pipeline achieved statistically superior performance in comparison with some baselines and variations of our approach.
... However, few techniques have been proposed for applying topic modeling to data other than unstructured text. [26] proposed a framework for applying topic modeling to RDF graph data based on LDA; they highlighted some of the major challenges in using topic modeling over RDF data. These challenges relate to the sparseness and the unnatural language of RDF graphs, and they gave some methods to tackle them. ...
... Topic models were first introduced for text documents and are easily adaptable to the case of RDF graphs. In [26] an approach to using topic modeling with RDF data was proposed based on Latent Dirichlet Allocation (LDA), a commonly used model for identifying the topics of documents. LDA aims to extract thematic information from collections of documents and is based on a bag-of-words vocabulary extracted from these documents. ...
... In our use case, the documents are the RDF graphs and the words are those extracted from the graphs' triples. [26] introduced several limitations and challenges of using topic modeling for RDF graphs. First, RDF data is sparse, which means that even with large datasets, preprocessing may result in a restricted set of words that can be used as a bag of words. Second, there is a lack of context, since the words used can have several meanings. ...
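As a concrete illustration of the setup just described (documents = RDF graphs, words = tokens extracted from their triples), here is a hedged sketch of fitting LDA with gensim; the toy documents and parameter values are assumptions, not taken from [26].

```python
# Hedged sketch: each RDF graph is one document whose "words" come from its triples;
# LDA is then fit with gensim. Toy documents and parameters are assumptions.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["country", "italy", "capital", "rome", "population"],   # words from graph 1
    ["genre", "rock", "band", "album", "guitar", "artist"],  # words from graph 2
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for bow in corpus:
    # per-document topic distribution, including zero-probability topics
    print(lda.get_document_topics(bow, minimum_probability=0.0))
```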
... Given the importance of the missing type assertion task, approaches based on reasoning, probabilistic methods [49], hierarchical classification [40] [41] [60], association rule mining [32] [46], topic modelling [52], support vector machines (SVM) [51], k-nearest neighbors classifiers (KNN) [45], and tensor factorization [44] have been proposed and applied to different KGs, as summarized in Table 1. All approaches have merits and demerits in performance for the missing type assertion task in KGs. ...
... In [52], the use of topic modeling for type prediction is proposed. Entities in a KG are represented as documents, on which Latent Dirichlet Allocation (LDA) [15] is applied for finding topics. ...
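One plausible way to turn such per-entity topic distributions into type predictions is sketched below, using a logistic-regression classifier over dense topic vectors; this is an assumption for illustration and may differ from the exact procedure in [52].

```python
# Hedged sketch: per-entity LDA topic distributions as features for type prediction.
# Assumes an `lda` model and `dictionary` like those in the sketch above;
# `labeled_docs`, `labels`, and `new_entity_tokens` are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def topic_vector(lda, dictionary, tokens, num_topics):
    """Return the dense topic distribution for one entity's token list."""
    bow = dictionary.doc2bow(tokens)
    vec = np.zeros(num_topics)
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        vec[topic_id] = prob
    return vec

# X = np.vstack([topic_vector(lda, dictionary, toks, 2) for toks in labeled_docs])
# clf = LogisticRegression(max_iter=1000).fit(X, labels)
# clf.predict(topic_vector(lda, dictionary, new_entity_tokens, 2).reshape(1, -1))
```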
Conference Paper
Knowledge Graphs (KGs) have been proven to be incredibly useful for enriching semantic Web search results and allowing queries with a well-defined result set. In recent years much attention has been given to the task of inferring missing facts based on existing facts in a KG. Approaches have also been proposed for inferring types of entities; however, these are successful for common types such as 'Person', 'Movie', or 'Actor'. There is still a large gap in the inference of fine-grained types, which are highly important for exploring specific lists and collections within web search. Generally, there are also relatively few observed instances of fine-grained types present in KGs to train on, and this poses challenges for the development of effective approaches. In order to address the issue, this paper proposes a new approach to the fine-grained type inference problem. This new approach is explicitly modeled to leverage domain knowledge and utilize additional data outside the KG, which improves performance in fine-grained type inference. Further improvements in efficiency are achieved by extending the model to probabilistic inference based on entity similarity and typed class classification. We conduct extensive experiments on type triple classification and entity prediction tasks on the Freebase FB15K benchmark dataset. The experiment results show that the proposed model outperforms the state-of-the-art approaches for type inference in KGs, and achieves high performance on many-to-one relations when predicting the tail for the KG completion task.
... The Semantic Web community has long investigated methods to address the data linking problem, by identifying linked dataset quality assessment methodologies [30] and by proposing manual, semi-automatic or automatic tools to implement refinement operations [11,12]. The large majority of refinement approaches, especially on knowledge graphs, where scalable solutions are needed, are based on different statistical and machine learning techniques [17,8,23,24]. ...
Chapter
Full-text available
With the rise of linked data and knowledge graphs, the need becomes compelling to find suitable solutions to increase the coverage and correctness of datasets, to add missing knowledge and to identify and remove errors. Several approaches - mostly relying on machine learning and NLP techniques - have been proposed to address this refinement goal; they usually need a partial gold standard, i.e. some "ground truth" to train automatic models. Gold standards are manually constructed, either by involving domain experts or by adopting crowdsourcing and human computation solutions. In this paper, we present an open source software framework to build Games with a Purpose for linked data refinement, i.e. web applications to crowdsource partial ground truth, by motivating user participation through fun incentive. We detail the impact of this new resource by explaining the specific data linking "purposes" supported by the framework (creation, ranking and validation of links) and by defining the respective crowdsourcing tasks to achieve those goals. To show this resource's versatility, we describe a set of diverse applications that we built on top of it; to demonstrate its reusability and extensibility potential, we provide references to detailed documentation, including an entire tutorial which in a few hours guides new adopters to customize and adapt the framework to a new use case.
... RDF data is usually categorized as short text [13]. In addition, its sparsity, unnatural language, and lack of context [14] make RDF data a unique data repository that requires flexible techniques to extract useful information for text mining tasks, including tagging, summarization, and others. To overcome these short-text-related problems in different tasks, two main approaches have been utilized, based on supplementing the RDF data and on adjusting the application to be able to handle short text properly. ...
... Topic models were originally introduced and used to discover latent topics in text corpora; however, they have been applied to other types of data, such as images [28], and recently Sleeman et al. [14] applied topic modeling to RDF graphs and showed its application for tasks such as predicting entity types, entity disambiguation, and community detection. Defining documents and word-like elements is the key step in applying topic models for various applications. ...
... Similarly, we also performed preprocessing on the RDF data and filtered out the schema- and dataset-dependent predicates, such as sameAs, wikiPageExternalLink, subject, and wikiPageWikiLink. Since RDF data are recognized as short text [13], we need to address several challenges, including sparseness, unnatural language, and the lack of context [14], to be able to run topic models on RDF data properly. Sparseness in RDF data is very common, resulting from a limited number of words or phrases that play the role of subject, predicate, or object in triples. ...
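A minimal sketch of this kind of predicate filtering is given below; the blacklist and the full IRIs are assumptions inferred from the predicate names mentioned above, not the authors' actual list.

```python
# Hedged sketch: drop schema- and dataset-dependent predicates before building
# documents. The blacklisted IRIs are assumptions, not the authors' exact list.
SKIP_PREDICATES = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://dbpedia.org/ontology/wikiPageExternalLink",
    "http://purl.org/dc/terms/subject",
    "http://dbpedia.org/ontology/wikiPageWikiLink",
}

def filtered_triples(triples):
    """Keep only triples whose predicate is not schema- or dataset-dependent."""
    return [(s, p, o) for s, p, o in triples if str(p) not in SKIP_PREDICATES]
```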
Conference Paper
Word embedding is becoming more popular in the Semantic Web community as an effective approach for capturing semantics in various contexts. In this paper, we combine word embedding and topic modeling to model RDF data for the entity summarization task. In our model, ES-LDA_ext, which is the extended version of our previous model, we utilize the word embedding to supplement the RDF data before applying entity summarization. In addition, in the model presented here, we use RDF literals as a very good source of information to create more reliable and representative summaries for entities. To do that, we use the Named Entity Recognition approach to extract entities within literals before feeding them into the word embedding model to enrich the RDF data. Experimental results demonstrate the effectiveness of the proposed model.
... Topic models were originally introduced for text documents; however, they have been applied to other types of data, such as images, and recently Sleeman et al. (2015) used topic modeling for RDF graphs. The first step in applying topic models is to define documents and word-like elements as the basic building blocks of documents. ...
... We also performed preprocessing on the RDF data and filtered out the schema- and dataset-dependent predicates, such as sameAs, wikiPageExternalLink, subject, and wikiPageWikiLink, in addition to literals. Since we work with RDF graphs, which differ from typical text documents in that RDF data are represented as triples, we need to address several challenges mentioned in (Sleeman et al., 2015) to be able to run topic models on RDF data. These challenges include sparseness, use of unnatural language, and the lack of context. ...
... As topic modeling is based on statistics of the co-occurrence of terms (Sleeman et al., 2015), when dealing with short texts with a very limited number of repetitions, as is the case with RDF data, we need to find a way to supplement the data to improve the performance of the topic modeling approach. We augment the documents using two different methods. ...
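As one hedged illustration of such augmentation (not necessarily either of the paper's two methods), each short document could be expanded with the nearest neighbours of its tokens in a pre-trained word-embedding model:

```python
# Hedged sketch: expand each short document with embedding nearest neighbours.
# The embeddings path is a placeholder; any word2vec-format vectors would do.
from gensim.models import KeyedVectors

# vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def augment(tokens, vectors, topn=3):
    expanded = list(tokens)
    for tok in tokens:
        if tok in vectors:  # skip out-of-vocabulary tokens
            expanded.extend(w for w, _ in vectors.most_similar(tok, topn=topn))
    return expanded
```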
Conference Paper
Full-text available
With the advent of the Internet, the amount of Semantic Web documents that describe real-world entities and their inter-links as a set of statements have grown considerably. These descriptions are usually lengthy, which makes the utilization of the underlying entities a difficult task. Entity summarization, which aims to create summaries for real world entities, has gained increasing attention in recent years. In this paper, we propose a probabilistic topic model, ES-LDA , that combines prior knowledge with statistical learning techniques within a single framework to create more reliable and representative summaries for entities. We demonstrate the effectiveness of our approach by conducting extensive experiments and show that our model outperforms the state-of-the-art techniques and enhances the quality of the entity summaries.
... On the other hand, semantic similarity on knowledge graphs using ontology matching, ontology alignment, schema matching, instance matching, similarity search, etc. remains a challenge [15], [16], [17]. Sleeman et al. [18] used vectors based on topic models to compare the similarity of nodes in RDF graphs. In this paper we propose the VKG structure, in which we link the knowledge graph nodes to their embeddings in a vector space (see Section III). ...
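In that spirit, node similarity can be computed as the cosine similarity of the nodes' topic vectors; the sketch below assumes dense topic distributions like those produced by the earlier LDA sketches and is only an illustration of the general idea.

```python
# Hedged sketch: compare two RDF nodes by the cosine similarity of their topic vectors.
import numpy as np

def cosine(u, v):
    """Cosine similarity of two dense vectors, guarded against zero norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# node_a and node_b would be dense topic distributions for two entities:
# similarity = cosine(node_a, node_b)
```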
Article
Full-text available
Knowledge graphs and vector space models are both robust knowledge representation techniques with their individual strengths and weaknesses. Vector space models excel at determining similarity between concepts, but they are severely constrained when evaluating complex dependency relations and other logic based operations that are a forte of knowledge graphs. In this paper, we propose the V-KG structure that helps us unify knowledge graphs and vector representation of entities, and allows us to develop powerful inference methods and search capabilities that combine their complementary strengths. We analogize this to thinking `fast' in vector space along with thinking `deeply' and `slowly' by reasoning over the knowledge graph. We have also created a query processing engine that takes complex queries and decomposes them into subqueries optimized to run on the respective knowledge graph part or the vector part of V-KG. We show that the V-KG structure can process specific queries that are not efficiently handled by vector spaces or knowledge graphs alone. We also demonstrate and evaluate the V-KG structure and the query processing engine by developing a system called Cyber-All-Intel for knowledge extraction, representation and querying in an end-to-end pipeline grounded in the cybersecurity informatics domain.
... Recently, Sleeman et al. [13] proposed an approach to using topic modelling with RDF data. While their work has a similar basis, it differs in many ways, since it aims at other use cases. ...
... We could identify different parts of a dataset's metadata and show that the properties are most important for determining the dataset's topic. Additionally, we created a gold standard for this task that can be downloaded from the project's web page 13. ...
Conference Paper
The Web of data is growing continuously with respect to both the size and number of the datasets published. Porting a dataset to five-star Linked Data, however, requires the publisher of this dataset to link it with the already available linked datasets. Given the size and growth of the Linked Data Cloud, the current, mostly manual approach used for detecting relevant datasets for linking is obsolete. We study the use of topic modelling for dataset search experimentally and present Tapioca, a linked dataset search engine that provides data publishers with similar existing datasets automatically. Our search engine uses a novel approach for determining the topical similarity of datasets. This approach relies on probabilistic topic modelling to determine related datasets by relying solely on the metadata of datasets. We evaluate our approach on a manually created gold standard and with a user study. Our evaluation shows that our algorithm outperforms a set of comparable baseline algorithms, including standard search engines, significantly by 6% F1-score. Moreover, we show that it can be used on a large real world dataset with comparable performance.
Article
Knowledge graph (KG) refinement refers to the process of filling in missing information, removing redundancies, and resolving inconsistencies in knowledge graphs. With the growing popularity of KG in various domains, many techniques involving machine learning have been applied, but there is no survey dedicated to machine learning-based KG refinement yet. Based on a novel framework following the KG refinement process, this paper presents a survey of machine learning approaches to KG refinement according to the kind of operations in KG refinement, the training datasets, mode of learning, and process multiplicity. Furthermore, the survey aims to provide broad practical insights into the development of fully automated KG refinement.