Kostas Stefanidis
Tampere University | UTA · School of Information Sciences

PhD in Computer Science

About

136 Publications

29,397 Reads

2,022 Citations

October 2011 - November 2012

Norwegian University of Science and Technology

Department of Computer and Information Science
Trondheim, Norway

Position

PostDoc Position

February 2010 - March 2011

The Chinese University of Hong Kong

Department of Computer Science and Engineering
Hong Kong, Hong Kong

Position

PostDoc Position

Publications

EMiGRe: Unveiling Why Your Recommendations are Not What You Expect

Chapter

Jun 2024

DIAERESIS: Knowledge Graph Partitioning for Efficient Query Answering

Conference Paper

May 2024

Fig. 4. Constructing query partitioning information structure.

Fig. 6. Query execution for LUBM datasets and systems.

Fig. 8. Query execution for (a) LUBM 100, (b) LUBM 1300, (c) LUBM 2300,...

DIAERESIS: RDF data partitioning and query processing on SPARK

Article

Full-text available

Mar 2024

The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data adopt p...

Text quality of human-written summaries (Mean ± Standard Deviation).

Linguistic summarisation of multiple entities in RDF graphs

Article

Full-text available

Jan 2024

Methods for producing summaries from structured data have gained interest due to the huge volume of available data in the Web. Simultaneously, there have been advances in natural language generation from Resource Description Framework (RDF) data. However, no efforts have been made to generate natural language summaries for groups of multiple RDF en...

Treats: Fairness-Aware Entity Resolution Over Streaming Data

Preprint

Jan 2024

Figure 1. Overview of the process for generating weekly meal planning.

Question 2 Results (violated positions).

Healthy Personalized Recipe Recommendations for Weekly Meal Planning

Article

Full-text available

Dec 2023

Nowadays, in the pursuit of personalized health and well-being, dietary choices are critical. This paper introduces a novel recommendation system designed to provide users with personalized meal plans, consisting of breakfast, lunch, snack, and dinner, in alignment with their health history and preferences from other similar users. More specificall...

Multi-Objective Fairness in Team Assembly

Chapter

Aug 2023

Team assembly is a problem that demands trade-offs between multiple fairness criteria and computational optimization. We focus on four criteria: (i) fair distribution of workloads within the team, (ii) fair distribution of skills and expertise regarding project requirements, (iii) fair distribution of protected classes in the team, and (iv) fair di...

Fig. 1. The impact of N in cost, workload, expertise, representation...

Fig. 2. The impact of M in cost, workload, expertise, representation...

Example of a team T formed according to project requirements P.

Computational Team Assembly with Fairness Constraints

Preprint

Full-text available

Jun 2023

Structural Bias in Knowledge Graphs for the Entity Alignment Task

Chapter

Full-text available

May 2023

Knowledge Graphs (KGs) have recently gained attention for representing knowledge about a particular domain and play a central role in a multitude of AI tasks like recommendations and query answering. Recent works have revealed that KG embedding methods used to implement these tasks often exhibit direct forms of bias (e.g., related to gender, nation...

Special issue on Semantic Web Meets Health Data Management

Article

Full-text available

May 2023

Figure 2: Regions most likely to be unfair according to different...

Figure 3: LAR: Results for a high-resolution partitioning of 100 × 50.

Figure 4: Crime: Results for a partitioning of 20 × 20.

Figure 10: LAR: The centers of the square regions scanned, and their...

Auditing for Spatial Fairness

Preprint

Full-text available

Feb 2023

This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymanderi...

Figure 1. ER workflow considering the streaming data and incremental...

Figure 2. Parallel architecture for blocking streaming data.

Figure 3. Workflow for the streaming blocking technique.

Figure 4. Time-window strategy for incremental blocking.

Figure 5. Effectiveness results of the techniques for the data sources:...

Incremental Entity Blocking over Heterogeneous Streaming Data

Article

Full-text available

Dec 2022

Web systems have become a valuable source of semi-structured and streaming data. In this sense, Entity Resolution (ER) has become a key solution for integrating multiple data sources or identifying similarities between data items, namely entities. To avoid the quadratic costs of the ER task and improve efficiency, blocking techniques are usually ap...

SQUIRREL: A framework for sequential group recommendations through reinforcement learning

Article

Sep 2022

Nowadays, sequential recommendations are becoming more prevalent. A user expects the system to remember past interactions and not conduct each recommendation round as a stand-alone process. Additionally, group recommendation systems are more prominent since more and more people are able to form groups for activities. Subsequently, the data that a g...

Example rankings...

The general distinction of the methods for ensuring fair ranked outputs

Job applications with positive class probability

Fairness in rankings and recommendations: an overview

Article

Full-text available

May 2022

We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, among others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important con...

F2VAE: a framework for mitigating user unfairness in recommendation systems

Conference Paper

Apr 2022

Group Satisfaction and Disagreement for all aggregation methods for 15...

Sequential group recommendations based on satisfaction and disagreement scores

Article

Full-text available

Apr 2022

Recently, group recommendations have gained much attention. Nevertheless, most approaches consider only one round of recommendations. However, in a real-life scenario, it is expected that the history of previous recommendations is exploited to tailor the recommendations towards meeting the needs of the group members. Such history should include not...

Position (left) and popularity (right) bias in recommender systems....

All results measured during the training processes for Nowplaying,...

Top 10 scores calculated for a random user before (Left) and after...

Feature-blind fairness in collaborative filtering recommender systems

Article

Full-text available

Apr 2022

Recommender systems were originally proposed for suggesting potentially relevant items to users, with the unique objective of providing accurate suggestions. These recommenders started being adopted in several domains, and were identified as generating biased results that could harm the data items being recommended. The exposure in generated rankin...

Report on the Third International Workshop on Semantic Web Meets Health Data Management (SWH 2020)

Article

Dec 2021

Creating a holistic view of patient data comes with many challenges but also brings many benefits for disease prediction, prevention, diagnosis, and treatment. Especially in the COVID-19 era, this is more important than ever before. The third International Workshop on Semantic Web Meets Health Data Management (SWH) was aimed at bringing together an...

FairER: Entity Resolution With Fairness Constraints

Conference Paper

Full-text available

Sep 2021

There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we are focusing on controlling the data bias in entity resolution (ER) tasks aiming to discover and unify records/descriptions from different data sources that refer to the...

Coverage-Based Summaries for RDF KBs

Chapter

Jul 2021

As more and more data become available as linked data, the need for efficient and effective methods for their exploration becomes apparent. Semantic summaries try to extract meaning from data, while reducing its size. State-of-the-art structural semantic summaries focus primarily on the graph structure of the data, trying to maximize the summary’s...

Interactivity, Fairness and Explanations in Recommendations

Conference Paper

Jun 2021

Fairness-aware Methods in Rankings and Recommenders

Conference Paper

Jun 2021

Fig. 1 Fairness Definitions in Classification.

Fig. 3 (left) An example ranking (center) reranking with p = 1, (right)...

Fairness definitions taxonomy in Rankings, Recommenders and...

Fairness in Rankings and Recommendations: An Overview

Preprint

Full-text available

Apr 2021

We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, amongst others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important c...

Fairness in Rankings and Recommenders: Models, Methods and Research Directions

Conference Paper

Apr 2021

On mitigating popularity bias in recommendations via variational autoencoders

Conference Paper

Mar 2021

A Data-Driven Approach for Video Game Playability Analysis Based on Players' Reviews

Article

Full-text available

Mar 2021

Playability is a key concept in game studies defining the overall quality of video games. Although its definition and frameworks are widely studied, methods to analyze and evaluate the playability of video games are still limited. Using heuristics for playability evaluation has long been the mainstream with its usefulness in detecting playability i...

Report on the Second International Workshop on Semantic Web Meets Health Data Management (SWH 2019)

Article

Dec 2020

The advancements in health-care have brought to the foreground the need for flexible access to health-related information and created an ever-growing demand for efficient data management infrastructures. To this end, many challenges must first be overcome, enabling seamless, effective and efficient access to several health data sets and novel...

An Overview of End-to-End Entity Resolution for Big Data

Article

Full-text available

Dec 2020

One of the most critical tasks for improving data quality and increasing the reliability of data analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to the same real-world entity. Despite several decades of research, ER remains a challenging problem. In this survey, we highlight the novel aspects of resolvi...

Why-Not Questions & Explanations for Collaborative Filtering

Chapter

Oct 2020

Throughout our digital lives, we are getting recommendations for almost everything we do, buy or consume. However, it is often the case that recommenders cannot locate the best data items to suggest. To deal with this shortcoming, they provide explanations for the reasons specific items are suggested. In this work, we focus on explanations fo...

Fig. 2. Token blocking example. Descriptions having a common token are...

Fig. 3. Attribute clustering blocking example. Pairs of most similar...

Fig. 4. Prefix-infix(-suffix) blocking example. A set of descriptions...

Figure 4(c) shows the blocks produced after applying...

Benchmarking Blocking Algorithms for Web Entities

Preprint

Full-text available

May 2020

An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-...

Schema-agnostic blocking for streaming data

Conference Paper

Mar 2020

Fair sequential group recommendations

Conference Paper

Mar 2020

A Hybrid Recommender System for Steam Games

Chapter

Mar 2020

A recommender system can be considered as an information filtering system that seeks to predict the preference a user would have for a data item. It is commonly utilized in digital stores to recommend products to their users according to the users’ previous purchases. This applies to Steam as well, a widely used digital distribution platform for ga...

A Sentiment-Statistical Approach for Identifying Problematic Mobile App Updates Based on User Reviews

Article

Full-text available

Mar 2020

Mobile applications (apps) on iOS and Android devices are mostly maintained and updated via the Apple App Store and Google Play, respectively, where users are allowed to provide reviews regarding their satisfaction with particular apps. Despite the importance of user reviews for mobile app maintenance and evolution, it is time-consuming and i...

Popularity evolution of Alexis Tsipras in 2015

Popularity evolution of Donald Trump, Hillary Clinton, and Barack Obama...

Evolution of attitude (left), sentimentality (middle), and...

Direct (Formula 13) and indirect (Formula 14) connectedness of “Alexis...

Tracking the history and evolution of entities: entity-centric temporal analysis of large social media archives

Article

Full-text available

Mar 2020

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as...

Multidimensional Group Recommendations in the Health Domain

Article

Full-text available

Feb 2020

Providing useful resources to patients is essential in achieving the vision of participatory medicine. However, the problem of identifying pertinent content for a group of patients is even more difficult than identifying information for just one. Nevertheless, studies suggest that the group dynamics-based principles of behavior change have a positi...

Report on the First International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018)

Article

Dec 2019

Better information management is the key to a more intelligent health and social system. To this end, many challenges must first be overcome, enabling seamless, effective and efficient access to various health data sets and novel methods for exploiting the available information. The First International Workshop on Semantic Web Technologies fo...

Enhancing Long Term Fairness in Recommendations with Variational Autoencoders

Conference Paper

Nov 2019

Incremental Blocking for Entity Resolution over Web Streaming Data

Conference Paper

Oct 2019

The widespread use of information systems has become a valuable source of semi-structured data. In this context, Entity Resolution (ER) emerges as a fundamental task to integrate multiple knowledge bases or identify similarities between data items (i.e., entities). Since ER is an inherently quadratic task, blocking techniques are often used to impr...

Ratings vs. Reviews in Recommender Systems: A Case study on the Amazon Movies Dataset

Conference Paper

Full-text available

Sep 2019

Together with the prevalence of e-commerce and online shopping, recommender systems have been playing an increasingly important role in people's daily lives in terms of discovering their potential preferences. Therein, users' preferences are mostly reflected by their online behaviors, especially their evaluation of particular items, e.g., nume...

GameRecs: Video Games Group Recommendations

Chapter

Sep 2019

Video games are a relatively new form of entertainment that has been rapidly gaining popularity in recent years. The number of video games available to users is huge and constantly growing, and thus it can be a daunting task to search for new ones to play. Given that some games are designed to be played together as a group, finding games suitable f...

FIFARecs: A Recommender System for FIFA18

Chapter

Sep 2019

One of the most popular features in the FIFA18 game is the career mode, where the target of the users is to improve their teams and win as many competitions as possible. Usually, it is hard for the users to decide which players to buy to maximally improve their team by taking into account all different players’ attributes. In this paper,...

Ratings vs. Reviews in Recommender Systems: A Case Study on the Amazon Movies Dataset

Chapter

Full-text available

Sep 2019

Together with the prevalence of e-commerce and online shopping, recommender systems have been playing an increasingly important role in people’s daily lives in terms of discovering their potential preferences. Therein, users’ preferences are mostly reflected by their online behaviors, especially their evaluation of particular items, e.g., numer...

Figure 1: Parts of entity graphs, representing the Wikidata (left) and...

Figure 2: Value and neighbor similarity distribution of matches in 4...

Figure 3: (a) Parts of the disjunctive blocking graph corresponding to...

Figure 4: The architecture of MinoanER in Spark.

Figure 5: Sensitivity analysis of the four configuration parameters of...

MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

Preprint

Full-text available

May 2019

Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly he...

End-to-End Entity Resolution for Big Data: A Survey

Preprint

May 2019

One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in...

A noise tolerant and schema-agnostic blocking technique for entity resolution

Conference Paper

Apr 2019

The increasing use of Web systems has become a valuable source of semi-structured data. In this context, the Entity Resolution (ER) task emerges as a fundamental step to integrate multiple knowledge bases or identify similarities between the data items (i.e., entities). Usually, blocking techniques are widely applied as an initial step of ER approa...

MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

Conference Paper

Full-text available

Apr 2019

RDFDigest+: A Summary-driven System for KBs Exploration

Conference Paper

Full-text available

Jan 2019

In this paper, we present RDFDigest+, a novel tool that enables effective and efficient RDF/S Knowledge Base (KB) exploration using summaries. The tool employs a diverse set of algorithms for identifying the most important nodes, offering a wide range of possibilities to capture importance. The selected nodes can be combined using multiple state of...

Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

Preprint

Oct 2018

Sentiment-aware Analysis of Mobile Apps User Reviews Regarding Particular Updates

Conference Paper

Full-text available

Oct 2018

The contemporary online mobile application (app) market enables users to review the apps they use. These reviews are important assets reflecting the users' needs and complaints regarding the particular apps, covering multiple aspects of the mobile apps' quality. By investigating the content of such reviews, the app developers can acquire useful infor...

Exploring RDFS KBs Using Summaries: 17th International Semantic Web Conference, Monterey, CA, USA, October 8–12, 2018, Proceedings, Part I

Chapter

Full-text available

Sep 2018

Open Source Software Recommendations Using Github: 22nd International Conference on Theory and Practice of Digital Libraries, TPDL 2018, Porto, Portugal, September 10–13, 2018, Proceedings

Chapter

Sep 2018

The focus of this work is on providing open source software recommendations using the Github API. Specifically, we propose a hybrid method that considers the programming languages, topics and README documents that appear in the users’ repositories. To demonstrate our approach, we implement a proof of concept that provides recommendations.

Mobile App Evolution Analysis based on User Reviews

Conference Paper

Full-text available

Sep 2018

The user reviews of mobile apps are important assets that reflect the users' needs and complaints about particular apps regarding features, usability, and designs. From investigating the content of such reviews, the app developers can acquire useful information guiding the future maintenance and evolution work. Previous studies on opinion mining in...

FairGRecs: Fair Group Recommendations by Exploiting Personal Health Information: 29th International Conference, DEXA 2018, Regensburg, Germany, September 3–6, 2018, Proceedings, Part II

Chapter

Full-text available

Aug 2018

Incremental Data Partitioning of RDF Data in SPARK: ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018, Revised Selected Papers

Chapter

Aug 2018

Significant efforts have been dedicated recently to the development of architectures for storing and querying RDF data in distributed environments. Several approaches focus on data partitioning and are able to answer queries efficiently by using a small number of computational nodes. However, such approaches provide static data partitions. Give...

Incremental Data Partitioning of RDF Data in SPARK

Conference Paper

Full-text available

Jul 2018

RDF Query Answering Using Apache Spark: Review and Assessment

Conference Paper

Full-text available

Apr 2018

The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management and querying. More specifically, the ever-increasing size and number of RDF data collections raise the need for efficient query answering, and dictate the usage of distributed data management systems for effectively partiti...

Simplifying Entity Resolution on Web Data with Schema-Agnostic, Non-Iterative Matching

Conference Paper

Full-text available

Feb 2018

Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of descriptions published in the Web of Data. To address them, we propose the MinoanER framework that fulfills full automation and support of highly heterogeneous entitie...

Multi-aspect Entity-Centric Analysis of Big Social Media Archives

Conference Paper

Sep 2017

Social media archives serve as important historical information sources, and thus meaningful analysis and exploration methods are of immense value for historians, sociologists and other interested parties. In this paper, we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities ar...

Blocking for Entity Resolution in the Web of Data: Challenges and Algorithms

Chapter

Jun 2017

Kostas Stefanidis

In the Web of data, entities are described by interlinked data rather than documents on the Web. In this talk, we focus on entity resolution in the Web of data, i.e., on the problem of identifying descriptions that refer to the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise...

Group Recommendations in MapReduce

Chapter

Jun 2017

Recommender systems have received significant attention, with most of the proposed methods focusing on recommendations for single users. However, there are contexts in which the items to be suggested are not intended for a user but for a group of people. For example, assume a group of friends or a family that is planning to watch a movie or visit a...

On Achieving Diversity in Recommender Systems

Conference Paper

May 2017

Throughout our digital lives, we are getting recommendations for almost everything we do, buy or consume. In that way, the field of recommender systems has been evolving vastly to match the increasing user needs. News, products, ideas and people are only a few of the things that can be recommended to us daily. However, even with...

Fairness in Group Recommendations in the Health Domain

Conference Paper

Apr 2017

During the last decade, the number of users who look for health-related information has impressively increased. On the other hand, health professionals have less and less time to recommend useful sources of such information online to their patients. To this end, we aim at streamlining the process of providing useful online information to p...

On Recommending Evolution Measures: A Human-Aware Approach

Conference Paper

Apr 2017

As knowledge bases are constantly evolving, there is a clear need for monitoring and analyzing the changes that occur on them. Traditional approaches for studying the evolution of data focus on providing humans with deltas that include loads of information. In this work, we envision a processing model that recommends evolution measures taking into...

Parallel Meta-blocking for Scaling Entity Resolution over Big Heterogeneous Data

Article

Apr 2017

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further i...

Web-Scale Blocking, Iterative and Progressive Entity Resolution

Conference Paper

Apr 2017

Social-Based Collaborative Filtering

Chapter

Jan 2017

Social Search and Querying

Chapter

Jan 2017

Report on the Third International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2016)

Article

Dec 2016

Group Recommendations in MapReduce

Conference Paper

Full-text available

Sep 2016

Proceedings of the Third International Workshop on Exploratory Search in Databases and the Web

Article

Jun 2016

The purpose of the ExploreDB workshop is to bring together researchers and practitioners that approach data exploration from different angles, ranging from data management and information retrieval to data visualization and human-computer interaction, in order to study the emerging needs and objectives for data exploration, as well as the challenges a...

Benchmarking Blocking Algorithms for Web Entities

Article

Full-text available

Jun 2016

Recommendations beyond the ratings matrix

Conference Paper

May 2016

Recommender systems have become indispensable for several Web sites, such as Amazon, Netflix and Google News, helping users navigate through the abundance of available choices. Although the field has advanced impressively in the last years with respect to models, usage of heterogeneous information, such as ratings and text reviews, and recommendati...

Report on the Second International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2015)

Article

May 2016

D2V – Understanding the Dynamics of Evolving Data: A Case Study in the Life Sciences

Article

Full-text available

Apr 2016

D2V, a research prototype for analysing the dynamics of Linked Open Data, has been used to study the evolution of biomedical datasets, such as the Experimental Factor Ontology (EFO) and the Gene Ontology (GO). Datasets are continuously evolving over time as our knowledge increases. Biomedical datasets in particular have undergone rapid changes in r...

Understanding Ontology Evolution Beyond Deltas

Conference Paper

Full-text available

Mar 2016

The dynamic nature of the data on the Web gives rise to a multitude of problems related to the description and analysis of the evolution of such data. Traditional approaches for identifying and analyzing changes are descriptive, focusing on the provision of a "delta" that describes the changes and often overwhelming the user with loads of inform...

Proceedings of the 19th International Conference on Extending Database Technology, EDBT

Book

Mar 2016

Demo Video for D2V: Tool for Defining, Detecting and Visualizing Changes on the Data Web

Data

Nov 2015

Parallel Meta-blocking: Realizing Scalable Entity Resolution over Large, Heterogeneous Data

Conference Paper

Full-text available

Oct 2015

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overl...

Big Data Entity Resolution: From Highly to Somehow Similar Entity Descriptions in the Web

Conference Paper

Full-text available

Oct 2015

In the Web of data, entities are described by inter-linked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-proc...

Top-k Computations in MapReduce: A Case Study on Recommendations

Conference Paper

Full-text available

Oct 2015

Top-k is a well-studied problem in the literature, due to its wide spectrum of applications, like information retrieval, database querying, Web search and data mining. In the big data era, the volume of the data and their velocity call for efficient parallel solutions that overcome the restricted resources of a single machine. Our motivating appli...

D2V: A Tool for Defining, Detecting and Visualizing Changes on the Data Web

Poster

Full-text available

Oct 2015

The dynamic nature of Web data gives rise to the need to understand and analyze the dynamics of individual datasets. As a matter of fact, the value of a dynamic dataset lies not only in its content, but also in its evolution history, which in some applications (e.g., trend analysis and identification) is more important than the data itself. I...

A Flexible Framework for Understanding the Dynamics of Evolving RDF Datasets

Conference Paper

Full-text available

Oct 2015

The dynamic nature of Web data gives rise to a multitude of problems related to the description and analysis of the evolution of RDF datasets, which are important to a large number of users and domains, such as the curators of biological information, where changes are constant and interrelated. In this paper, we propose a framework that enables ide...

Entity Resolution in the Web of Data

Book

Aug 2015

In recent years, several knowledge bases have been built to enable large-scale knowledge sharing, but also an entity-centric Web search, mixing both structured data and text querying. These knowledge bases offer machine-readable descriptions of real-world entities, e.g., persons, places, published on the Web as Linked Data. However, due to the diff...

Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web

Article

May 2015

Information Hunting: The Many Faces of Recommendations for Data Exploration

Article

Mar 2015

A Flexible Framework for Defining, Representing and Detecting Changes on the Data Web [arXiv]

Article

Full-text available

Jan 2015

The dynamic nature of Web data gives rise to a multitude of problems related to the identification, computation and management of the evolving versions and the related changes. In this paper, we consider the problem of change recognition in RDF datasets, i.e., the problem of identifying, and when possible giving semantics to, the changes that led fro...

A Flexible Framework for Understanding the Dynamics of Evolving RDF Datasets: Extended Version

Technical Report

Full-text available

Jan 2015

Keyword Search on RDF Graphs: It Is More Than Just Searching for Keywords

Conference Paper

Jan 2015

In this paper, we propose a model for enabling users to search RDF data via keywords, thus allowing them to discover relevant information without using complicated queries or knowing the underlying ontology or vocabulary. We aim at exploiting the characteristics of the RDF data to increase the quality of the ranked query results. We consider diffe...

Experimental Evaluation of Blocking Algorithms

Chapter

Jan 2015

In this chapter, we present the experimental framework we have designed for a critical assessment of blocking algorithms. In particular, we describe the datasets and the measures we employed to study the behavior of the blocking algorithms under different semantic and structural characteristics of entity descriptions in the Linked Open Data (LOD) c...

Entity Resolution in the Web of Data

Book

Jan 2015

Web of Data: Describing and Linking Entities

Chapter

Jan 2015

The Web bears the potential of being a universal source of knowledge used to answer questions, retrieve facts, solve problems, or create new knowledge. Many major scientific discoveries have been made possible by recognizing the connections across domains or by integrating insights from several sources [Gruber, 2008]. This process requires accessin...

Blocking

Chapter

Jan 2015

As we have seen in Chapter 2.1, grouping entity descriptions in blocks before comparing them for matching is an important pre-processing step for pruning the quadratic number of comparisons required to resolve a collection of entity descriptions. The main objective of algorithms for entity blocking, formally defined in Section 3.1, is to achieve a...

Matching and Resolving Entities

Chapter

Jan 2015

As we have seen in Chapter 1, an increasing number of real-world entities are described by a multitude of Knowledge Bases (KBs) published in the Web of data. These descriptions may provide partial, overlapping, and sometimes contradicting information for the same entities. Understanding how two descriptions are related is an essential task to a num...

Iterative Entity Resolution

Chapter

Jan 2015

As we have seen in Chapter 2.1, to minimize the number of missed matches, an iterative entity resolution (ER) process can progressively exploit any intermediate results of blocking and matching, discovering new candidate description pairs for resolution, even if this process entails additional processing cost. The main objective of the algorithms f...

NaviSoc: A Socially Enhanced Real-time Navigator

Conference Paper

Full-text available

Dec 2014

As the usage of social networks becomes more and more ubiquitous and people commute more often today, social streams have become a valuable source for many kinds of applications. For example, the various social streams could be exploited for choosing the optimal path (e.g., the shortest and/or the fastest) to reach a desired destination. To this di...

Report on the First International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2014)

Article

Dec 2014

Databases are well-organized collections of data. Structured query languages, such as SQL, XQuery, and SPARQL, enable users to formulate precise queries over the data stored in a database. To be successful, users need to be familiar with the query language and the underlying data organization. Visual analytics naturally integrate the human in the d...

Enabling Social Search in Time through Graphs

Article

Nov 2014

Recently, social networks have attracted considerable attention. The huge volume of information contained in them, as well as their dynamic nature, makes the problem of searching social data challenging. In this work, we envision the design of a complete framework for social search by exploiting both the underlying social graph and the temporal info...

"Strength Lies in Differences"

Conference Paper

Nov 2014

Nowadays, the WWW brings an overwhelming variety of choices to consumers. Recommendation systems facilitate the selection by issuing recommendations to them. Recommendations for users, or groups, are determined by considering users similar to the users in question. Scanning the whole database for locating similar users, though, is expensive. Existing appr...

On Designing Archiving Policies for Evolving RDF Datasets on the Web

Conference Paper

Full-text available

Oct 2014

When dealing with dynamically evolving datasets, users are often interested in the state of affairs on previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is especially true for datasets stored in the Web, where the interl...

Network

Carlos Castillo
University Pompeu Fabra
Christian Bizer
Universität Mannheim
Wolfgang Nejdl
Forschungszentrum L3S
Haridimos Kondylakis
University of Crete
Bamshad Mobasher
DePaul University

Dimitrios G Katehakis
Foundation for Research and Technology - Hellas
Vassilis Christophides
University of Crete
Allel Hadjali
Laboratoire d'Informatique et d'Automatique pour les Systèmes (LIAS)
Sonia Bergamaschi
Università degli Studi di Modena e Reggio Emilia
Letizia Tanca
Politecnico di Milano