Fig 3 - uploaded by Michael Behrisch
Four different sets are distinguished in the approach: (1) The exploration set ES contains all possible views. (2) A sampled version of the

Source publication
Article
Full-text available
The extraction of relevant and meaningful information from multivariate or high-dimensional data is a challenging problem. One reason for this is that the number of possible representations, which might contain relevant information, grows exponentially with the amount of data dimensions. Also, not all views from a possibly large view space, are pot...

Context in source publication

Context 1
... Figure 3 depicts, from a potentially very large exploration set, denoted as ES, only a limited amount of visualizations can be presented initially to the user. We will denote this subset as the representation set RS. ...
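The relationship between the exploration set ES and the representation set RS can be illustrated with a minimal sketch. It assumes that a "view" is a 2-D scatter plot over a pair of dimensions and that RS is drawn by uniform random sampling; both are assumptions made for illustration, and the source publication may define views and the sampling step differently. The dimension names and the function names `build_exploration_set` and `sample_representation_set` are hypothetical.

```python
import itertools
import random


def build_exploration_set(dimensions):
    """Return every unordered pair of dimensions, i.e. every 2-D view (assumed view type)."""
    return list(itertools.combinations(dimensions, 2))


def sample_representation_set(exploration_set, k, seed=0):
    """Draw at most k views uniformly at random, without replacement (assumed sampling rule)."""
    rng = random.Random(seed)
    k = min(k, len(exploration_set))
    return rng.sample(exploration_set, k)


# Hypothetical dimension names, used only to make the sketch concrete.
dims = ["age", "income", "height", "weight", "score"]
ES = build_exploration_set(dims)          # all candidate views
RS = sample_representation_set(ES, k=4)   # the few views shown to the user first
print(len(ES), len(RS))  # 10 4
```

Even with five dimensions there are already ten pairwise views, which is why only a small subset can be presented initially.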

Citations

... Third, to better guide user exploration and insight generation, researchers have proposed interactive ML for visual data exploration, which learns what visual concepts are important to users from user feedback [5,10,15,32,61]. For example, Behrisch et al. [5] trained a classifier to interactively capture users' notion of interestingness when exploring many scatter plots. This classifier is then used to recommend potentially interesting plots and guide the exploration of large multidimensional data. ...
Conference Paper
Full-text available
Latent vectors extracted by machine learning (ML) are widely used in data exploration (e.g., t-SNE) but suffer from a lack of interpretability. While previous studies employed disentangled representation learning (DRL) to enable more interpretable exploration, they often overlooked the potential mismatches between the concepts of humans and the semantic dimensions learned by DRL. To address this issue, we propose Drava, a visual analytics system that supports users in 1) relating the concepts of humans with the semantic dimensions of DRL and identifying mismatches, 2) providing feedback to minimize the mismatches, and 3) obtaining data insights from concept-driven exploration. Drava provides a set of visualizations and interactions based on visual piles to help users understand and refine concepts and conduct concept-driven exploration. Meanwhile, Drava employs a concept adaptor model to fine-tune the semantic dimensions of DRL based on user refinement. The usefulness of Drava is demonstrated through application scenarios and experimental validation.
... Visual data analysis design considering user's preferences and interest is a research topic that receives increasing attention in recent years. Current research mainly focuses on using either explicit user feedback [5,10,38] provided in the form of ratings (representing user's visual preferences) or tags (representing user's topic of needs), or gaze movements [48,50,51], to help identify visualizations that best address the users' preferences, expertise, and tasks. ...
Chapter
Advances in sensor and data acquisition technology and in methods of data analysis pose many research challenges but also promising application opportunities in many domains. The need to cope with and leverage large sensor data streams is particularly urgent for industrial applications due to strong business competition and innovation pressure. In maintenance, for example, sensor readings of machinery or products may allow to predict at which point in time maintenance will be required and allow to schedule service operations respectively. Another application is the discovery of the relationships between production input parameters on the quality of the output products. Analysis of respective industrial data typically cannot be done in an out-of-the-box manner but requires to incorporate background knowledge from fields such as engineering, operation research, and business to be effective. Hence, approaches for interactive and visual data analysis can be particularly useful for analyzing complex industrial data, combining the advantages of modern automatic data analysis with domain knowledge and hypothesis generation capabilities of domain experts. In this chapter, we introduce some of the main principles of visual data analysis. We discuss how techniques for data visualization, data analysis, and user interaction can be combined to analyze data, generate and verify hypotheses about patterns in data, and present the findings. We discuss this in the light of important requirements and applications in the analysis of industrial data and based on current research in the area. We provide examples for visual data analysis approaches, including condition monitoring, quality control, and production planning.
... Talbot et al. [34] introduced an interactive visualization system called EnsembleMatrix, which provided an excellent human-computer interaction interface to allow users to quickly combine multiple classifiers operating on numerous feature sets to generate an integrated classifier. Behrisch M et al. [35] designed a framework for feedback-driven view exploration, allowing users to iteratively express the concepts they were interested in and optimize the recommendation algorithm. Paiva et al. [36] proposed a visual data classification method to support users in completing classification-related tasks, such as model creation and classifier adjustment and optimization, to achieve the goal of automatic data classification. ...
Article
Full-text available
Wind power ramp events (WPREs) are a common phenomenon in wind power generation. This unavoidable phenomenon poses a great harm to the balance of active power and the stability of frequency in the power supply system, which seriously threatens the safety, stability, and economic operation of the power grid. In order to deal with the impact of ramp events, accurate and rapid detection of ramp events is of great significance for the formulation of response measures. However, some attribute information is ignored in previous studies, and the laws and characteristics of ramp events are difficult to present intuitively. In this paper, we propose a visualization-based ramp event detection model for wind power generation. Firstly, a ramp event detection model is designed considering the multidimensional attributes of ramp events. Then, an uncertainty analysis scheme of ramp events based on the confidence is proposed, enabling users to analyze and judge the detection results of ramp events from different dimensions. In addition, an interactive optimization model is designed, supporting users to update samples interactively, to realize iterative optimization of the detection model. Finally, a set of visual designs and user-friendly interactions are implemented, enabling users to explore WPREs, judge the identification results, and interactively optimize the model. Case studies and expert interviews based on real-world datasets further demonstrate the effectiveness of our system in the WPREs identification, the exploration of the accuracy of identification results, and interactive optimization.
... For similar reasons, the paper by Wenskovitch et al. [WCR*18], that tries to connect and aggregate benefits from clustering and DR methods, was excluded. Moreover, papers on high-dimensional data clustering or exploratory data analysis are not included (e.g., Behrisch et al. [BKSS14], Lehmann et al. [LKZ*15], Nam et al. [NHM*07], and Wu et al. [WCH*15]). Finally, there are related works that provide important contributions to the visualization community, but do not study trust explicitly, and thus were not included: ...
Preprint
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.
... Thus, they are often combined with other visualization techniques. For instance, Bremm et al. [BLH*11] combine node-link diagrams with distance matrices, Behrisch et al. [BKSS15] with scatterplot views and Elzen et al. [EW11] with streamgraphs. As introduced by Shneiderman [Shn92], Treemaps are rectangular shapes that recursively subdivide rectangular spaces according to an underlying hierarchy. ...
... In today's highly automated manufacturing processes, the extraction of relevant and meaningful information from high-dimensional data remains a challenging problem [BKSS15,EBJ*21]. In this regard, the cooperation between human experts and ML techniques has often proved to be a promising solution by combining the strengths of both worlds [SMF*20, JFSK15]. ...
Article
Full-text available
Random Forests (RFs) are a machine learning (ML) technique widely used across industries. The interpretation of a given RF usually relies on the analysis of statistical values and is often only possible for data analytics experts. To make RFs accessible to experts with no data analytics background, we present RfX, a Visual Analytics (VA) system for the analysis of a RF's decision‐making process. RfX allows to interactively analyse the properties of a forest and to explore and compare multiple trees in a RF. Thus, its users can identify relationships within a RF's feature subspace and detect hidden patterns in the model's underlying data. We contribute a design study in collaboration with an automotive company. A formative evaluation of RfX was carried out with two domain experts and a summative evaluation in the form of a field study with five domain experts. In this context, new hidden patterns such as increased eccentricities in an engine's rotor by observing secondary excitations of its bearings were detected using analyses made with RfX. Rules derived from analyses with the system led to a change in the company's testing procedures for electrical engines, which resulted in 80% reduced testing time for over 30% of all components.
... A second important type of labels is continuous labels, which are often applied when a more fine-grained degree of interestingness is required. Here, systems exist for candidate rating and evaluation [56] or to choose between irrelevant and relevant views [6]. Yet other labeling approaches allow the comparison of pairs of objects [10] or groups of objects [16] in combination with algorithmic models using this implicit feedback to adjust the attribute weightings or the feature space. ...
Article
Full-text available
In this design study, we present IRVINE, a Visual Analytics (VA) system, which facilitates the analysis of acoustic data to detect and understand previously unknown errors in the manufacturing of electrical engines. In serial manufacturing processes, signatures from acoustic data provide valuable information on how the relationship between multiple produced engines serves to detect and understand previously unknown errors. To analyze such signatures, IRVINE leverages interactive clustering and data labeling techniques, allowing users to analyze clusters of engines with similar signatures, drill down to groups of engines, and select an engine of interest. Furthermore, IRVINE allows to assign labels to engines and clusters and annotate the cause of an error in the acoustic raw measurement of an engine. Since labels and annotations represent valuable knowledge, they are conserved in a knowledge database to be available for other stakeholders. We contribute a design study, where we developed IRVINE in four main iterations with engineers from a company in the automotive sector. To validate IRVINE, we conducted a field study with six domain experts. Our results suggest a high usability and usefulness of IRVINE as part of the improvement of a real-world manufacturing process. Specifically, with IRVINE domain experts were able to label and annotate produced electrical engines more than 30% faster.
... We propose PSEUDo, a scalable (visual) pattern analysis technique based on the conceptual idea of relevance feedback-driven exploration [9], allowing us to tackle all four challenges. Our method focuses on a human-in-the-loop approach in which we combine a query-aware adaption of Locality-Sensitive Hashing (LSH) with a novel feedback-driven active learning algorithm. ...
... On the one hand, users need a mechanism to define a visual search pattern, for instance, through query-by-sketch or query-by-example [21], and on the other hand, a way to steer the retrieval process. One opportunity here is active learning-based approaches that rely on the user's explicit relevance feedback, such as in [9,23,50]. ...
... We derive our conceptual model, depicted in Fig. 2, from FDive, the feedback-driven preference model learning approach proposed by Behrisch et al. [9]. In light of this conceptual model, we assume that the randomly initialized parameters in LSH functions are trainable, creating room for coping with the ambiguity of time series similarity. ...
Preprint
Full-text available
We present PSEUDo, an adaptive feature learning technique for exploring visual patterns in multi-track sequential data. Our approach is designed with the primary focus to overcome the uneconomic retraining requirements and inflexible representation learning in current deep learning-based systems. Multi-track time series data are generated on an unprecedented scale due to increased sensors and data storage. These datasets hold valuable patterns, like in neuromarketing, where researchers try to link patterns in multivariate sequential data from physiological sensors to the purchase behavior of products and services. But a lack of ground truth and high variance make automatic pattern detection unreliable. Our advancements are based on a novel query-aware locality-sensitive hashing technique to create a feature-based representation of multivariate time series windows. Most importantly, our algorithm features sub-linear training and inference time. We can even accomplish both the modeling and comparison of 10,000 different 64-track time series, each with 100 time steps (a typical EEG dataset) under 0.8 seconds. This performance gain allows for a rapid relevance feedback-driven adaption of the underlying pattern similarity model and enables the user to modify the speed-vs-accuracy trade-off gradually. We demonstrate superiority of PSEUDo in terms of efficiency, accuracy, and steerability through a quantitative performance comparison and a qualitative visual quality comparison to the state-of-the-art algorithms in the field. Moreover, we showcase the usability of PSEUDo through a case study demonstrating our visual pattern retrieval concepts in a large meteorological dataset. We find that our adaptive models can accurately capture the user's notion of similarity and allow for an understandable exploratory visual pattern retrieval in large multivariate time series datasets.
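To make the idea of hashing multivariate time series windows more concrete, the following is a minimal sketch of classic random-hyperplane locality-sensitive hashing. It is not the query-aware, trainable variant described in the abstract: the hyperplanes here are fixed random Gaussian vectors, and no relevance feedback is used. The function names and the toy window are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Create n_bits random Gaussian hyperplanes of length dim (fixed, not trained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(window, hyperplanes):
    """Hash a window (list of tracks, each a list of floats) to a tuple of bits.

    The window is flattened; each bit records on which side of a hyperplane it lies.
    """
    flat = [value for track in window for value in track]
    return tuple(
        1 if sum(w * x for w, x in zip(plane, flat)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of differing bits; a small distance suggests a small angle between windows."""
    return sum(x != y for x, y in zip(a, b))


# Toy window: 2 tracks with 4 time steps each (hypothetical data).
window = [[0.1, 0.4, 0.3, 0.9], [1.0, 0.8, 0.2, 0.5]]
scaled = [[2.0 * v for v in track] for track in window]

planes = make_hyperplanes(dim=8, n_bits=16)
sig_a = lsh_signature(window, planes)
sig_b = lsh_signature(scaled, planes)
print(hamming(sig_a, sig_b))  # 0
```

Scaling a window by a positive factor leaves its signature unchanged, because each bit depends only on the sign of a dot product. That is one reason this family of hashes approximates angular similarity rather than Euclidean distance.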
... Several works focused on high-level aspects, modeling and characterizing the problem in Information Visualization, see, e.g., Schulz et al. [19] and Visual Analytics, see, e.g., Ceneda et al. [5]. Other proposals focused on low-level aspects, like suggesting areas of interaction in the visualization, see, e.g., Boy et al., in [4] and/or best data representation, see, e.g., Behrisch et al. [3]. Sarvghad et al. [18] and Xia et al. [12] propose a way to assess which dimensions are the most used in a multidimensional dataset analysis, and which are the relationships among them. ...
Conference Paper
Full-text available
Filtering is one of the basic interaction techniques in Information Visualization, with the main objective of limiting the amount of displayed information using constraints on attribute values. Research focused on direct manipulation selection means or on simple interactors like sliders or check-boxes: while the interaction with a single attribute is, in principle, straightforward, getting an understanding of the relationship between multiple attribute constraints and the actual selection might be a complex task. To cope with this problem, usually referred as cross-filtering, the paper provides a general definition of the structure of a filter, based on domain values and data distribution, the identification of visual feedbacks on the relationship between filters status and the current selection, and guidance means to help in fulfilling the requested selection. Then, leveraging on the definition of these design elements, the paper proposes CrossWidgets, modular attribute selectors that provide the user with feedback and guidance during complex interaction with multiple attributes. An initial controlled experiment demonstrates the benefits that CrossWidgets provide to cross-filtering activities.
... Scherer et al. [48] introduced regressional feature vectors to enable visual sketch-based queries and to explore interesting scatter plots. Behrisch et al. [6] propose a feedback-driven approach that iteratively learns the preferences of the user to make valuable suggestions for further explorations. Turkay et al. [56] present a method to interactively generate representative factors that combine several data points across dimensions in order to reduce the number of dimensions. ...
Preprint
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. More specifically, it is often valuable to understand which ranges of which input variables lead to particular values of a given target variable. Unfortunately, with an increasing number of independent variables, this process may become cumbersome and time-consuming due to the many possible combinations that have to be explored. In this paper, we propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables. We developed a visual model based on neural networks that can be explored in a guided way to help analysts find and understand such correlations. First, we train a neural network to predict the target from the input variables. Then, we visualize the inner workings of the resulting model to help understand relations within the data set. We further introduce a new regularization term for the backpropagation algorithm that encourages the neural network to learn representations that are easier to interpret visually. We apply our method to artificial and real-world data sets to show its utility.
... For similar reasons, the paper by Wenskovitch et al. [WCR*18], that tries to connect and aggregate benefits from clustering and DR methods, was excluded. Moreover, papers on high-dimensional data clustering or exploratory data analysis are not included (e.g., Behrisch et al. [BKSS14], Lehmann et al. [LKZ*15], Nam et al. [NHM*07], and Wu et al. [WCH*15]). Finally, there are related works that provide important contributions to the visualization community, but do not study trust explicitly, and thus were not included: ...
Article
Full-text available
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State‐of‐the‐Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web‐based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.