Kostas Stefanidis

Tampere University | UTA · School of Information Sciences

PhD in Computer Science

About

136
Publications
29,397
Reads
2,022
Citations
Additional affiliations
October 2011 - November 2012
Norwegian University of Science and Technology
Position
  • PostDoc Position
February 2010 - March 2011
The Chinese University of Hong Kong
Position
  • PostDoc Position

Publications (136)
Article
Full-text available
The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data adopt p...
Article
Full-text available
Methods for producing summaries from structured data have gained interest due to the huge volume of available data in the Web. Simultaneously, there have been advances in natural language generation from Resource Description Framework (RDF) data. However, no efforts have been made to generate natural language summaries for groups of multiple RDF en...
Article
Full-text available
Nowadays, in the pursuit of personalized health and well-being, dietary choices are critical. This paper introduces a novel recommendation system designed to provide users with personalized meal plans, consisting of breakfast, lunch, snack, and dinner, in alignment with their health history and preferences from other similar users. More specificall...
Chapter
Team assembly is a problem that demands trade-offs between multiple fairness criteria and computational optimization. We focus on four criteria: (i) fair distribution of workloads within the team, (ii) fair distribution of skills and expertise regarding project requirements, (iii) fair distribution of protected classes in the team, and (iv) fair di...
Preprint
Full-text available
Team assembly is a problem that demands trade-offs between multiple fairness criteria and computational optimization. We focus on four criteria: (i) fair distribution of workloads within the team, (ii) fair distribution of skills and expertise regarding project requirements, (iii) fair distribution of protected classes in the team, and (iv) fair di...
Chapter
Full-text available
Knowledge Graphs (KGs) have recently gained attention for representing knowledge about a particular domain and play a central role in a multitude of AI tasks like recommendations and query answering. Recent works have revealed that KG embedding methods used to implement these tasks often exhibit direct forms of bias (e.g., related to gender, nation...
Preprint
Full-text available
This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymanderi...
Article
Full-text available
Web systems have become a valuable source of semi-structured and streaming data. In this sense, Entity Resolution (ER) has become a key solution for integrating multiple data sources or identifying similarities between data items, namely entities. To avoid the quadratic costs of the ER task and improve efficiency, blocking techniques are usually ap...
Article
Nowadays, sequential recommendations are becoming more prevalent. A user expects the system to remember past interactions and not conduct each recommendation round as a stand-alone process. Additionally, group recommendation systems are more prominent since more and more people are able to form groups for activities. Subsequently, the data that a g...
Article
Full-text available
We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, among others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important con...
Article
Full-text available
Recently, group recommendations have gained much attention. Nevertheless, most approaches consider only one round of recommendations. However, in a real-life scenario, it is expected that the history of previous recommendations is exploited to tailor the recommendations towards meeting the needs of the group members. Such history should include not...
Article
Full-text available
Recommender systems were originally proposed for suggesting potentially relevant items to users, with the unique objective of providing accurate suggestions. These recommenders started being adopted in several domains, and were identified as generating biased results that could harm the data items being recommended. The exposure in generated rankin...
Article
Creating a holistic view of patient data comes with many challenges but also brings many benefits for disease prediction, prevention, diagnosis, and treatment. Especially in the COVID-19 era, this is more important than ever before. The third International Workshop on Semantic Web Meets Health Data Management (SWH) was aimed at bringing together an...
Conference Paper
Full-text available
There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we are focusing on controlling the data bias in entity resolution (ER) tasks aiming to discover and unify records/descriptions from different data sources that refer to the...
Chapter
As more and more data become available as linked data, the need for efficient and effective methods for their exploration becomes apparent. Semantic summaries try to extract meaning from data while reducing its size. State-of-the-art structural semantic summaries focus primarily on the graph structure of the data, trying to maximize the summary’s...
Preprint
Full-text available
We increasingly depend on a variety of data-driven algorithmic systems to assist us in many aspects of life. Search engines and recommender systems, amongst others, are used as sources of information and to help us in making all sorts of decisions, from selecting restaurants and books to choosing friends and careers. This has given rise to important c...
Article
Full-text available
Playability is a key concept in game studies defining the overall quality of video games. Although its definition and frameworks are widely studied, methods to analyze and evaluate the playability of video games are still limited. Using heuristics for playability evaluation has long been the mainstream with its usefulness in detecting playability i...
Article
The advancements in health care have brought to the foreground the need for flexible access to health-related information and created an ever-growing demand for efficient data management infrastructures. To this end, many challenges must first be overcome, enabling seamless, effective and efficient access to several health data sets and novel...
Article
Full-text available
One of the most critical tasks for improving data quality and increasing the reliability of data analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to the same real-world entity. Despite several decades of research, ER remains a challenging problem. In this survey, we highlight the novel aspects of resolvi...
Chapter
Throughout our digital lives, we are getting recommendations for almost everything we do, buy or consume. However, it is often the case that recommenders cannot locate the best data items to suggest. To deal with this shortcoming, they provide explanations for the reasons specific items are suggested. In this work, we focus on explanations fo...
Preprint
Full-text available
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-...
Chapter
A recommender system can be considered as an information filtering system that seeks to predict the preference a user would have for a data item. It is commonly utilized in digital stores to recommend products to their users according to the users’ previous purchases. This applies to Steam as well, a widely used digital distribution platform for ga...
Article
Full-text available
Mobile applications (apps) on iOS and Android devices are mostly maintained and updated via the Apple App Store and Google Play, respectively, where the users are allowed to provide reviews regarding their satisfaction with particular apps. Despite the importance of user reviews for mobile app maintenance and evolution, it is time-consuming and i...
Article
Full-text available
How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as...
Article
Full-text available
Providing useful resources to patients is essential in achieving the vision of participatory medicine. However, the problem of identifying pertinent content for a group of patients is even more difficult than identifying information for just one. Nevertheless, studies suggest that the group dynamics-based principles of behavior change have a positi...
Article
Better information management is the key to a more intelligent health and social system. To this end, many challenges must first be overcome, enabling seamless, effective and efficient access to various health data sets and novel methods for exploiting the available information. The First International Workshop on Semantic Web Technologies fo...
Conference Paper
The widespread use of information systems has become a valuable source of semi-structured data. In this context, Entity Resolution (ER) emerges as a fundamental task to integrate multiple knowledge bases or identify similarities between data items (i.e., entities). Since ER is an inherently quadratic task, blocking techniques are often used to impr...
Conference Paper
Full-text available
Together with the prevalence of e-commerce and online shopping, recommender systems have been playing an increasingly important role in people's daily lives in terms of discovering their potential preferences. Therein, users' preferences are mostly reflected by their online behaviors, especially their evaluation of particular items, e.g., nume...
Chapter
Video games are a relatively new form of entertainment that has been rapidly gaining popularity in recent years. The number of video games available to users is huge and constantly growing, and thus it can be a daunting task to search for new ones to play. Given that some games are designed to be played together as a group, finding games suitable f...
Chapter
One of the most popular features in the FIFA18 game is the career mode, where the target of the users is to improve their teams and win as many competitions as possible. Usually, it is hard for the users to decide which players to buy to maximally improve their team by taking into account all the different players’ attributes. In this paper,...
Chapter
Full-text available
Together with the prevalence of e-commerce and online shopping, recommender systems have been playing an increasingly important role in people’s daily lives in terms of discovering their potential preferences. Therein, users’ preferences are mostly reflected by their online behaviors, especially their evaluation of particular items, e.g., numer...
Preprint
Full-text available
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly he...
Preprint
One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in...
Conference Paper
The increasing use of Web systems has become a valuable source of semi-structured data. In this context, the Entity Resolution (ER) task emerges as a fundamental step to integrate multiple knowledge bases or identify similarities between the data items (i.e., entities). Usually, blocking techniques are widely applied as an initial step of ER approa...
Conference Paper
Full-text available
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly he...
Conference Paper
Full-text available
In this paper, we present RDFDigest+, a novel tool that enables effective and efficient RDF/S Knowledge Base (KB) exploration using summaries. The tool employs a diverse set of algorithms for identifying the most important nodes, offering a wide range of possibilities to capture importance. The selected nodes can be combined using multiple state of...
Preprint
How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as...
Conference Paper
Full-text available
The contemporary online mobile application (app) market enables users to review the apps they use. These reviews are important assets reflecting the users' needs and complaints regarding the particular apps, covering multiple aspects of the mobile apps' quality. By investigating the content of such reviews, the app developers can acquire useful infor...
Chapter
The focus of this work is on providing open source software recommendations using the GitHub API. Specifically, we propose a hybrid method that considers the programming languages, topics and README documents that appear in the users’ repositories. To demonstrate our approach, we implement a proof of concept that provides recommendations.
Conference Paper
Full-text available
The user reviews of mobile apps are important assets that reflect the users' needs and complaints about particular apps regarding features, usability, and designs. From investigating the content of such reviews, the app developers can acquire useful information guiding the future maintenance and evolution work. Previous studies on opinion mining in...
Chapter
Significant efforts have been dedicated recently to the development of architectures for storing and querying RDF data in distributed environments. Several approaches focus on data partitioning and are able to answer queries efficiently by using a small number of computational nodes. However, such approaches provide static data partitions. Give...
Conference Paper
Full-text available
Significant efforts have been dedicated recently to the development of architectures for storing and querying RDF data in distributed environments. Several approaches focus on data partitioning and are able to answer queries efficiently by using a small number of computational nodes. However, such approaches provide static data partitions. Give...
Conference Paper
Full-text available
The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management and querying. More specifically, the ever-increasing size and number of RDF data collections raise the need for efficient query answering, and dictate the usage of distributed data management systems for effectively partiti...
Conference Paper
Full-text available
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of descriptions published in the Web of Data. To address them, we propose the MinoanER framework that fulfills full automation and support of highly heterogeneous entitie...
Conference Paper
Social media archives serve as important historical information sources, and thus meaningful analysis and exploration methods are of immense value for historians, sociologists and other interested parties. In this paper, we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities ar...
Chapter
In the Web of data, entities are described by interlinked data rather than documents on the Web. In this talk, we focus on entity resolution in the Web of data, i.e., on the problem of identifying descriptions that refer to the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise...
Chapter
Recommender systems have received significant attention, with most of the proposed methods focusing on recommendations for single users. However, there are contexts in which the items to be suggested are not intended for a user but for a group of people. For example, assume a group of friends or a family that is planning to watch a movie or visit a...
Conference Paper
Throughout our digital lives, we are getting recommendations for almost everything we do, buy or consume. In that way, the field of recommender systems has been evolving vastly to match the increasing user needs accordingly. News, products, ideas and people are only a few of the things that can be recommended to us daily. However, even with...
Conference Paper
During the last decade, the number of users who look for health-related information has increased impressively. On the other hand, health professionals have less and less time to recommend useful sources of such information online to their patients. To this end, we aim at streamlining the process of providing useful online information to p...
Conference Paper
As knowledge bases are constantly evolving, there is a clear need for monitoring and analyzing the changes that occur on them. Traditional approaches for studying the evolution of data focus on providing humans with deltas that include loads of information. In this work, we envision a processing model that recommends evolution measures taking into...
Article
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further i...
Conference Paper
Full-text available
Recommender systems have received significant attention, with most of the proposed methods focusing on recommendations for single users. However, there are contexts in which the items to be suggested are not intended for a user but for a group of people. For example, assume a group of friends or a family that is planning to watch a movie or visit a...
Article
The purpose of the ExploreDB workshop is to bring together researchers and practitioners that approach data exploration from different angles, ranging from data management, information retrieval to data visualization and human computer interaction, in order to study the emerging needs and objectives for data exploration, as well as the challenges a...
Article
Full-text available
An increasing number of entities are described by interlinked data rather than documents on the Web. Entity Resolution (ER) aims to identify descriptions of the same real-world entity within one or across knowledge bases in the Web of data. To reduce the required number of pairwise comparisons among descriptions, ER methods typically perform a pre-...
Conference Paper
Recommender systems have become indispensable for several Web sites, such as Amazon, Netflix and Google News, helping users navigate through the abundance of available choices. Although the field has advanced impressively in the last years with respect to models, usage of heterogeneous information, such as ratings and text reviews, and recommendati...
Article
Full-text available
D2V, a research prototype for analysing the dynamics of Linked Open Data, has been used to study the evolution of biomedical datasets, such as the Experimental Factor Ontology (EFO) and the Gene Ontology (GO). Datasets are continuously evolving over time as our knowledge increases. Biomedical datasets in particular have undergone rapid changes in r...
Conference Paper
Full-text available
The dynamic nature of the data on the Web gives rise to a multitude of problems related to the description and analysis of the evolution of such data. Traditional approaches for identifying and analyzing changes are descriptive, focusing on the provision of a "delta" that describes the changes and often overwhelming the user with loads of inform...
Conference Paper
Full-text available
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overl...
Conference Paper
Full-text available
In the Web of data, entities are described by inter-linked data rather than documents on the Web. In this work, we focus on entity resolution in the Web of data, i.e., identifying descriptions that refer to the same real-world entity. To reduce the required number of pairwise comparisons, methods for entity resolution perform blocking as a pre-proc...
Conference Paper
Full-text available
Top-k is a well-studied problem in the literature, due to its wide spectrum of applications, like information retrieval, database querying, Web search and data mining. In the big data era, the volume of the data and their velocity call for efficient parallel solutions that overcome the restricted resources of a single machine. Our motivating appli...
Poster
Full-text available
The dynamic nature of Web data gives rise to the need of understanding and analyzing the dynamics of individual datasets. As a matter of fact, the value of a dynamic dataset lies not only in its content, but also in its evolution history, which in some applications (e.g., trend analysis and identification), is more important than the data itself. I...
Conference Paper
Full-text available
The dynamic nature of Web data gives rise to a multitude of problems related to the description and analysis of the evolution of RDF datasets, which are important to a large number of users and domains, such as the curators of biological information, where changes are constant and interrelated. In this paper, we propose a framework that enables ide...
Book
In recent years, several knowledge bases have been built to enable large-scale knowledge sharing, but also an entity-centric Web search, mixing both structured data and text querying. These knowledge bases offer machine-readable descriptions of real-world entities, e.g., persons, places, published on the Web as Linked Data. However, due to the diff...
Article
The purpose of the ExploreDB workshop is to bring together researchers and practitioners that approach data exploration from different angles, ranging from data management, information retrieval to data visualization and human computer interaction, in order to study the emerging needs and objectives for data exploration, as well as the challenges a...
Article
Full-text available
The dynamic nature of Web data gives rise to a multitude of problems related to the identification, computation and management of the evolving versions and the related changes. In this paper, we consider the problem of change recognition in RDF datasets, i.e., the problem of identifying, and when possible giving semantics to, the changes that led fro...
Technical Report
Full-text available
The dynamic nature of Web data gives rise to a multitude of problems related to the description and analysis of the evolution of RDF datasets, which are important to a large number of users and domains, such as the curators of biological information, where changes are constant and interrelated. In this paper, we propose a framework that enables ide...
Conference Paper
In this paper, we propose a model for enabling users to search RDF data via keywords, thus, allowing them to discover relevant information without using complicated queries or knowing the underlying ontology or vocabulary. We aim at exploiting the characteristics of the RDF data to increase the quality of the ranked query results. We consider diffe...
Chapter
In this chapter, we present the experimental framework we have designed for a critical assessment of blocking algorithms. In particular, we describe the datasets and the measures we employed to study the behavior of the blocking algorithms under different semantic and structural characteristics of entity descriptions in the Linked Open Data (LOD) c...
Chapter
The Web bears the potential of being a universal source of knowledge used to answer questions, retrieve facts, solve problems, or create new knowledge. Many major scientific discoveries have been made possible by recognizing the connections across domains or by integrating insights from several sources [Gruber, 2008]. This process requires accessin...
Chapter
As we have seen in Chapter 2.1, grouping entity descriptions in blocks before comparing them for matching is an important pre-processing step for pruning the quadratic number of comparisons required to resolve a collection of entity descriptions. The main objective of algorithms for entity blocking, formally defined in Section 3.1, is to achieve a...
Chapter
As we have seen in Chapter 1, an increasing number of real-world entities are described by a multitude of Knowledge Bases (KBs) published in the Web of data. These descriptions may provide partial, overlapping, and sometimes contradicting information for the same entities. Understanding how two descriptions are related is an essential task to a num...
Chapter
As we have seen in Chapter 2.1, to minimize the number of missed matches, an iterative entity resolution (ER) process can progressively exploit any intermediate results of blocking and matching, discovering new candidate description pairs for resolution, even if this process entails additional processing cost. The main objective of the algorithms f...
Conference Paper
Full-text available
As the usage of social networks becomes more and more ubiquitous and people commute more often today, social streams have become a valuable source for many kinds of applications. For example, the various social streams could be exploited for choosing the optimal path (e.g., the shortest and/or the fastest) to reach a desired destination. To this di...
Article
Databases are well-organized collections of data. Structured query languages, such as SQL, XQuery, and SPARQL, enable users to formulate precise queries over the data stored in a database. To be successful, users need to be familiar with the query language and the underlying data organization. Visual analytics naturally integrate the human in the d...
Article
Recently, social networks have attracted considerable attention. The huge volume of information contained in them, as well as their dynamic nature, make the problem of searching social data challenging. In this work, we envision the design of a complete framework for social search by exploiting both the underlying social graph and the temporal info...
Conference Paper
Nowadays, the WWW brings an overwhelming variety of choices to consumers. Recommendation systems facilitate the selection by issuing recommendations to them. Recommendations for users, or groups, are determined by considering users similar to the users in question. Scanning the whole database for locating similar users, though, is expensive. Existing appr...
Conference Paper
Full-text available
When dealing with dynamically evolving datasets, users are often interested in the state of affairs on previous versions of the dataset, and would like to execute queries on such previous versions, as well as queries that compare the state of affairs across different versions. This is especially true for datasets stored in the Web, where the interl...
