Figure 2
Schematic depictions of the user interfaces for the assignment (left, k = 4) and triplet tasks (right) in the image clustering system used in our experiments. In both cases the worker simply clicks on one of the photographs, upon which the system immediately retrieves a new task.

Source publication
Conference Paper
Full-text available
The power of human computation is founded on the capabilities of humans to process qualitative information in a manner that is hard to reproduce with a computer. However, all machine learning algorithms rely on mathematical operations, such as sums, averages, and least squares, that are less suitable for human computation. This paper is an effort t...
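
The abstract above is truncated, but the citing works below describe the paper's core contribution as an approximate median (medoid) computed from crowd-answered triplet queries. A minimal sketch of that general idea, assuming each worker marks the triplet's outlier and that items rarely judged outliers are central; the simulated worker and sampling scheme here are illustrative, not the paper's exact estimator:

```python
import random
from collections import Counter

def simulated_worker(x, y, z, dist):
    """Stand-in for a crowd worker: returns the item of the triplet
    that is least similar to the other two (the 'outlier')."""
    triplet = [x, y, z]
    return max(triplet, key=lambda a: sum(dist(a, b) for b in triplet if b != a))

def approximate_median(items, dist, n_queries=1000, seed=0):
    """Estimate a medoid from triplet 'outlier' answers: the item chosen
    as outlier least often, relative to how often it appears."""
    rng = random.Random(seed)
    outlier_counts = Counter()
    appearances = Counter()
    for _ in range(n_queries):
        x, y, z = rng.sample(items, 3)
        for a in (x, y, z):
            appearances[a] += 1
        outlier_counts[simulated_worker(x, y, z, dist)] += 1
    return min(items, key=lambda a: outlier_counts[a] / max(appearances[a], 1))

# Usage: 1-D toy data with absolute-difference distance.
points = [1.0, 2.0, 2.5, 3.0, 3.2, 10.0]
print(approximate_median(points, lambda a, b: abs(a - b)))
```

On this toy data the isolated point 10.0 is flagged as the outlier most often, so the estimate lands in the bulk of the data.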

Context in source publication

Context 1
... task is to click on the image in the bottom row that is most similar to the image above. The assignment phase consists of |D| such HITs, one for every x ∈ D. Schematic depictions of the user interfaces for the assignment and triplet tasks are shown in Figure 2. ...
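
For concreteness, a hypothetical sketch of how such assignment HITs could be generated and recorded, mirroring the Figure 2 layout (one query image above a row of k candidate representatives); all names here are illustrative, not the authors' implementation:

```python
def make_assignment_hits(D, representatives):
    """One HIT per item x in D: the worker sees x above a row of k cluster
    representatives and clicks the most similar one (k = 4 in Figure 2)."""
    return [{"query_image": x, "choices": list(representatives)} for x in D]

def record_answer(assignments, hit, clicked):
    """Store the worker's click as the cluster assignment for the query item."""
    assignments[hit["query_image"]] = clicked

# Usage sketch
hits = make_assignment_hits(D=["img_01.jpg", "img_02.jpg"],
                            representatives=["rep_a.jpg", "rep_b.jpg",
                                             "rep_c.jpg", "rep_d.jpg"])
assignments = {}
record_answer(assignments, hits[0], clicked="rep_b.jpg")
```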

Citations

... The middle operator helps to get the center of a set of objects. Since the sorting operator is very costly, it is better to avoid high costs by using the middle operator [49]. ...
Article
Full-text available
This study aimed to improve the labeling of objects inside images in the crowdsourcing process. Images are one of the most widely used types of data on the internet. Controlling and refining images through the crowdsourcing process is time-consuming and tedious because of their quality and nature. Gamification of data collection was presented as a solution to review and categorize images and to motivate people. Because participant motivation might influence the quality of the output data, we used gamification elements to manage user interaction in this study. The proposed method has a great effect on improving the quality of output data by considering various challenges such as motivation, financial costs, and delays. The proposed algorithm calculates the average of the points specified by each user and then compares it with the average of the total correct answers. In the end, the proposed algorithm uses this comparison to decide whether to accept or reject the answer. In this research, the LabelMe, Flickr, and VOC2012 datasets were used. Implementing the proposed method in a real context showed that the proposed design improved image labeling accuracy by 11.3% compared to previous methods. In this experiment, the people who interacted the most generated the most accurate data.
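
The accept/reject rule described at the end of the abstract (compare a user's average points against the average over known correct answers) can be sketched as follows; the tolerance threshold is an assumed parameter, since the abstract does not specify one:

```python
def accept_answer(user_points, correct_answer_points, tolerance=0.1):
    """Hedged reading of the rule: accept a user's answer if their average
    points are close enough to the average over known correct answers.
    'tolerance' is an assumption, not taken from the paper."""
    user_avg = sum(user_points) / len(user_points)
    correct_avg = sum(correct_answer_points) / len(correct_answer_points)
    return abs(user_avg - correct_avg) <= tolerance * correct_avg

# Usage
print(accept_answer([8, 9, 7], [9, 9, 8]))  # True: within 10% of the correct average
```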
... max{d(x, z), d(y, z)} as in [11]; and outlier queries d(x, y) <? min{d(x, z), d(y, z)} as in [8]. Queries of the form d(x, y) <? ...
... Some related work in this setting includes a method for learning a distance metric [14] and a kernel function [10,15]; obtaining a low-dimensional Euclidean embedding [18]; measuring centrality and data depth [8,11,13]; determining near neighbors [7]; and performing hierarchical [6] and correlation clustering [17]. Many of the above are motivated in part by human-aided computation in which query responses are crowdsourced. ...
Preprint
Full-text available
Data cohesion, a recently introduced measure inspired by social interactions, uses distance comparisons to assess relative proximity. In this work, we provide a collection of results which can guide the development of cohesion-based methods in exploratory data analysis and human-aided computation. Here, we observe the important role of highly clustered "point-like" sets and the ways in which cohesion allows such sets to take on qualities of a single weighted point. In doing so, we see how cohesion complements metric-adjacent measures of dissimilarity and responds to local density. We conclude by proving that cohesion is the unique function with (i) average value equal to one-half and (ii) the property that the influence of an outlier is proportional to its mass. Properties of cohesion are illustrated with examples throughout.
... For example, in [24] each worker views a small set of images as a HIT, where they are asked to provide a partial clustering of the set. Set queries have also been used for tasks such as crowd-sourced median finding [25], crowd-sourced planning [31], etc. ...
Preprint
Full-text available
Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attribute values. Using the notion of data coverage for detecting a lack of representation, we develop multiple crowdsourcing approaches. Our core approach, at a high level, is a divide and conquer algorithm that applies a search space pruning strategy to efficiently identify if a dataset misses proper coverage for a given group. We provide a different theoretical analysis of our algorithm, including a tight upper bound on its performance which guarantees its near-optimality. Using this algorithm as the core, we propose multiple heuristics to reduce the coverage detection cost across different cases with multiple intersectional/non-intersectional groups. We demonstrate how the pre-trained predictors are not reliable and hence not sufficient for detecting representation bias in the data. Finally, we adjust our core algorithm to utilize existing models for predicting image group(s) to minimize the coverage identification cost. We conduct extensive experiments, including live experiments on Amazon Mechanical Turk to validate our problem and evaluate our algorithms' performance.
... Pairwise and triplet similarity comparisons are two representative types of similarity comparison tasks [7,12,13,23]. Pairwise similarity comparison, also known as absolute similarity comparison, asks crowd workers to answer the following question, i.e., "Are objects A and B similar?". ...
Preprint
Full-text available
Crowdsourcing has been used to collect data at scale in numerous fields. Triplet similarity comparison is a type of crowdsourcing task, in which crowd workers are asked the question "among three given objects, which two are more similar?", which is relatively easy for humans to answer. However, the comparison can sometimes be based on multiple views, i.e., different independent attributes such as color and shape. Each view may lead to different results for the same three objects. Although an algorithm was proposed in prior work to produce multiview embeddings, it involves at least two problems: (1) the existing algorithm cannot independently predict multiview embeddings for a new sample, and (2) different people may prefer different views. In this study, we propose an end-to-end inductive deep learning framework to solve the multiview representation learning problem. The results show that our proposed method can obtain multiview embeddings of any object, in which each view corresponds to an independent attribute of the object. We collected two datasets from a crowdsourcing platform to experimentally investigate the performance of our proposed approach compared to conventional baseline methods.
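
To make the triplet task concrete: an answer to "among three given objects, which two are more similar?" can be read as a pair of ordinal distance constraints. A small illustrative sketch; the dictionary-based answer format is assumed, not taken from the paper:

```python
def triplet_constraints(answer, a, b, c):
    """Translate a crowd answer into ordinal distance constraints.
    If the pair (a, b) is chosen as most similar, then
    d(a, b) < d(a, c) and d(a, b) < d(b, c)."""
    pair = {a, b, c} - {answer["odd_one_out"]}
    i, j = sorted(pair)
    k = answer["odd_one_out"]
    return [((i, j), (i, k)), ((i, j), (j, k))]  # (closer pair, farther pair)

# Usage
print(triplet_constraints({"odd_one_out": "c"}, "a", "b", "c"))
# [(('a', 'b'), ('a', 'c')), (('a', 'b'), ('b', 'c'))]
```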
... Some recent works perform data analytics directly on triplets without computing the kernels. An approximate median of the dataset using triplets is proposed in [16]. Algorithms to estimate density using relative similarity are provided in [34]. ...
... We perform data analytics tasks like median computation, finding approximate nearest neighbors, classification, and clustering. We compare the median results of our approach with CROWD-MEDIAN [16] and LENSDEPTH [21]. Clustering results are compared with LENSDEPTH and we compare classification results with LENSDEPTH and TRIPLETBOOST [27]. ...
... So, we make comparisons with these approaches in error-free settings only, and we show the impact of the error on our methods only. We have implemented CROWD-MEDIAN [16] and LENSDEPTH [21] algorithms while for TRIPLETBOOST, we use the results reported in their paper [27]. ...
... It led to a surge of popularity of comparisons in the context of crowdsourcing information about objects that cannot be easily represented by Euclidean features, such as food (Wilber et al., 2014) or musical artists (Ellis et al., 2002), or objects for which humans cannot robustly estimate a pairwise similarity, for instance cars (Kleindessner and von Luxburg, 2017) or natural scenes (Heikinheimo and Ukkonen, 2013). The purpose of collecting comparisons is often to learn patterns in the objects, such as latent clusters, or use them for prediction, as in classification. ...
... The purpose of collecting comparisons is often to learn patterns in the objects, such as latent clusters, or use them for prediction, as in classification. Hence, there has been significant development of algorithms for comparison-based learning (Agarwal et al., 2007; Heikinheimo and Ukkonen, 2013; Haghiri et al., 2017; Kazemi et al., 2018; ...). ...
Preprint
Full-text available
Comparison-based learning addresses the problem of learning when, instead of explicit features or pairwise similarities, one only has access to comparisons of the form: "Object A is more similar to B than to C." Recently, it has been shown that, in Hierarchical Clustering, single and complete linkage can be directly implemented using only such comparisons, while several algorithms have been proposed to emulate the behaviour of average linkage. Hence, finding hierarchies (or dendrograms) using only comparisons is a well understood problem. However, evaluating their meaningfulness when neither ground truth nor explicit similarities are available remains an open question. In this paper, we bridge this gap by proposing a new revenue function that allows one to measure the goodness of dendrograms using only comparisons. We show that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities. On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons. On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue, and we empirically compare them with existing methods.
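
Since the proposed revenue is stated to be closely related to Dasgupta's cost, a minimal sketch of that cost may help: each pair of leaves contributes its similarity weighted by the number of leaves under their lowest common ancestor, so a good dendrogram merges similar items low in the tree. This computes the classical similarity-based cost, not the paper's comparison-only revenue:

```python
from itertools import combinations

def leaves(tree):
    """Leaves of a dendrogram given as nested tuples, e.g. ((0, 1), (2, 3))."""
    if isinstance(tree, tuple):
        return [x for child in tree for x in leaves(child)]
    return [tree]

def dasgupta_cost(tree, sim):
    """Dasgupta's cost: sum over leaf pairs (i, j) of sim[i][j] times the
    number of leaves under their lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    n = len(leaves(tree))
    child_leaves = [leaves(c) for c in tree]
    cost = 0.0
    # Pairs split at this node have this node as their LCA.
    for a, b in combinations(range(len(tree)), 2):
        for i in child_leaves[a]:
            for j in child_leaves[b]:
                cost += sim[i][j] * n
    return cost + sum(dasgupta_cost(c, sim) for c in tree)

# Usage: 4 items where {0, 1} and {2, 3} are similar.
sim = [[0, .9, .1, .1], [.9, 0, .1, .1], [.1, .1, 0, .9], [.1, .1, .9, 0]]
good = ((0, 1), (2, 3))
bad = ((0, 2), (1, 3))
print(dasgupta_cost(good, sim) < dasgupta_cost(bad, sim))  # True
```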
... The middle operator helps to get the center of a set of objects. Since the sorting operator is very costly, it is better to avoid high costs by using the middle operator [53]. ...
Article
This study aimed to improve the labeling of objects inside images in the crowdsourcing process. One of the most widely used data in the web world is images. Controlling and refining images through the crowdsourcing process is timeconsuming and tedious because of their quality and nature. For this purpose, a solution was proposed by combining gamification with the data collection process. The suggested solution is to review, categorize images, and motivate people. The crowdsourcing process has many challenges, one of which is the motivation of the participants. Because the participants' motivation can affect the quality of the output data, we applied gamification elements to control the interaction between users in this study. Users reduce the deceptive behavior by reviewing each other's answers. On the other hand, three important data collection parameters are time, cost, and quality of the obtained data. The proposed method has a great effect on improving the quality of output data by considering various challenges such as motivation, financial costs, and delays. The proposed algorithm calculates the average of the points specified by each user and then compares it with the average of the total correct answers. In the end, the proposed algorithm uses this comparison to decide whether to accept or reject the answer. In this research, the LabelMe, Flickr, and VOC2012 datasets were used. Implementing the proposed method in a real context showed that the proposed design improved the image labeling accuracy, which was increased by 11.3% compared to the previous methods. In this experiment, the people who interacted the most generated the most accurate data.
... Although there are experimental investigations on triplet-based annotation for images, see e.g. [14], there are (to the best of our knowledge) no investigations on how crowdworkers would perform in the triplet-based annotation task in the medical context. This implies that models of crowdworkers, as proposed e.g. in [15], cannot be used, since there is no a priori knowledge on the task complexity 'from a purely objective standpoint', i.e. from 'the characteristics of the task alone' (quoting from [15], preamble of section 3.1). ...
... We also found that EDA is not predictive by itself, but we still found associations between correctness and uncertainty indicators. The task of identifying the most similar pair inside a triplet of images has been investigated in [14] and in [33], whereby Ahonen et al. [33] have also measured EDA in their experiment. Ahonen et al. [33] did not come to conclusive results on the predictiveness of EDA. ...
Article
Full-text available
Background: As healthcare-related data proliferate, there is a need to annotate them expertly for the purposes of personalized medicine. Crowdworking is an alternative to expensive expert labour. Annotation corresponds to diagnosis, so comparing unlabeled records to labeled ones seems more appropriate for crowdworkers without medical expertise. We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, uncertainty of the annotators, and agreement among the annotators could predict annotation correctness. Materials and methods: We conducted an annotation experiment on health data from a population-based study. The triplet annotation task was to decide whether an individual was more similar to a healthy one or to one with a given disorder. We used hepatic steatosis as the example disorder and described the individuals with 10 pre-selected characteristics related to this disorder. We recorded task duration, electro-dermal activity as a stress indicator, and uncertainty as stated by the experiment participants (n = 29 non-experts and three experts) for 30 triplets. We built an Artificial Similarity-Based Annotator (ASBA) and compared its correctness and uncertainty to those of the experiment participants. Results: We found no correlation between correctness and any of stated uncertainty, stress, and task duration. Annotator agreement was not predictive either. Notably, for some tasks, annotators agreed unanimously on an incorrect annotation. When controlling for triplet ID, we identified significant correlations, indicating that correctness, stress levels, and annotation duration depend on the task itself. Average correctness among the experiment participants was slightly lower than that achieved by ASBA. Triplet annotation turned out to be similarly difficult for experts as for non-experts. Conclusion: Our lab experiment indicates that the task of triplet annotation must be prepared cautiously if delegated to crowdworkers. Neither certainty nor agreement among annotators should be assumed to imply correct annotation, because annotators may misjudge difficult tasks as easy and agree on incorrect annotations. Further research is needed to improve visualizations for complex tasks and to judiciously decide how much information to provide. Out-of-the-lab experiments in a crowdworker setting are needed to identify appropriate designs of a human-annotation task and to assess under what circumstances non-human annotation should be preferred.
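
The ASBA baseline is described only at a high level; a hedged sketch of a similarity-based annotator in that spirit follows (Euclidean distance over the pre-selected characteristics is an assumption, and the actual ASBA may differ):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_based_annotation(record, healthy_ref, disorder_ref, dist=euclidean):
    """Annotate a record by whichever reference (healthy vs. disorder) it is
    closer to in the space of pre-selected characteristics. Illustrative
    only; not the paper's exact ASBA."""
    return "healthy" if dist(record, healthy_ref) < dist(record, disorder_ref) else "disorder"

# Usage with 3 of the 10 characteristics as a toy example
print(similarity_based_annotation((1.0, 0.2, 3.1), (0.9, 0.1, 3.0), (2.5, 1.4, 5.0)))
```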
... The main purpose of those methods is to facilitate data visualization of similarity inferred from human assessments. However, other tasks employing similarity triplets have been studied, such as medoid estimation [23], density estimation [24], or clustering [25]. Closely related to our approach is [19], which employed deep learning to scale the ordinal problem to large datasets. ...
Article
Full-text available
Ordinal embedding is the task of computing a meaningful multidimensional representation of objects, for which only qualitative constraints on their distance functions are known. In particular, we consider comparisons of the form “Which object from the pair (j, k) is more similar to object i?”. In this paper, we generalize this framework to the case where the ordinal constraints are not given at the level of individual points, but at the level of sets, and propose a distributional triplet embedding approach in a scalable learning framework. We show that the query complexity of our approach is on par with the single-item approach. Without having access to features of the items to be embedded, we show the applicability of our model on toy datasets for the task of reconstruction and demonstrate the validity of the obtained embeddings in experiments on synthetic and real-world datasets.
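
As a concrete reference point for the single-item setting that the paper generalizes, a minimal ordinal-embedding sketch that fits point vectors to triplet constraints with a hinge loss and plain gradient descent; this is a generic baseline, not the paper's distributional method:

```python
import numpy as np

def ordinal_embedding(n_items, triplets, dim=2, margin=1.0, lr=0.05, epochs=200, seed=0):
    """Each triplet (i, j, k) encodes the answer 'j is more similar to i
    than k is', enforced via a hinge loss on squared distances."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for i, j, k in triplets:
            d_ij = X[i] - X[j]
            d_ik = X[i] - X[k]
            # Violated (or within the margin) if d(i,j)^2 + margin > d(i,k)^2.
            if d_ij @ d_ij + margin > d_ik @ d_ik:
                X[i] -= lr * (2 * d_ij - 2 * d_ik)
                X[j] -= lr * (-2 * d_ij)
                X[k] -= lr * (2 * d_ik)
    return X

# Usage: items 0 and 1 should end up closer to each other than to item 2.
X = ordinal_embedding(3, [(0, 1, 2), (1, 0, 2)])
print(np.linalg.norm(X[0] - X[1]) < np.linalg.norm(X[0] - X[2]))  # True (typically)
```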
... The main purpose of the above methods is to facilitate data visualization of similarity inferred from human assessments. However, other tasks employing similarity triplets have been studied, such as medoid estimation Heikinheimo and Ukkonen [2013], density estimation Ukkonen et al. [2015], or clustering Ukkonen [2017]. Closely related to our approach are Haghiri et al. [2019] and Anderton and Aslam [2019], which employ deep learning to scale the ordinal problem to large datasets. ...
Preprint
Full-text available
Ordinal embedding aims at finding a low dimensional representation of objects from a set of constraints of the form "item j is closer to item i than item k". Typically, each object is mapped onto a point vector in a low dimensional metric space. We argue that mapping to a density instead of a point vector provides some interesting advantages, including an inherent reflection of the uncertainty about the representation itself and its relative location in the space. Indeed, in this paper, we propose to embed each object as a Gaussian distribution. We investigate the ability of these embeddings to capture the underlying structure of the data while satisfying the constraints, and explore properties of the representation. Experiments on synthetic and real-world datasets showcase the advantages of our approach. In addition, we illustrate the merit of modelling uncertainty, which enriches the visual perception of the mapped objects in the space.