Figure - available from: User Modeling and User-Adapted Interaction
Confusion matrix of the classification results with the proposed batch balancing for an initial training with four examples per class. The classes are the three gestures (fist (1), hand (2), and thumbs-up (3)) and the two movement sequences (turning the monitor on/off (4) and putting on the headphones (5)), in addition to the "do nothing" class (0).

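To make the batch balancing described in the caption concrete, the following is a minimal sketch (illustrative only, not the paper's implementation; all names are hypothetical) of drawing the same number of examples from each class for every training batch, so that the rare gesture and movement-sequence classes are not drowned out by "do nothing" examples:

```python
# Minimal sketch of class-balanced batch construction (illustrative only,
# not the paper's implementation). Each batch draws the same number of
# samples from every class.
import numpy as np

def balanced_batch(features, labels, per_class, rng=np.random.default_rng()):
    """Sample `per_class` examples from each class, with replacement."""
    classes = np.unique(labels)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in classes
    ])
    rng.shuffle(idx)
    return features[idx], labels[idx]

# Toy usage: 6 classes (0 = do nothing, 1-3 gestures, 4-5 movement sequences)
X = np.random.randn(600, 16)
y = np.random.randint(0, 6, size=600)
Xb, yb = balanced_batch(X, y, per_class=4)
print(np.bincount(yb))  # -> four examples per class, as in the figure
```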

Source publication
Article
Full-text available
Pervasive computing environments deliver a multitude of possibilities for human–computer interactions. Modern technologies, such as gesture control or speech recognition, allow different devices to be controlled without additional hardware. A drawback of these concepts is that gestures and commands need to be learned. We propose a system that is ab...

Citations

... In addition, image-based eye tracking is very accurate but still has its limitations under challenging settings, such as near-infrared reflections on glasses or inadequate lighting conditions [17]. In the past years, multiple open-source tools have been published [19,38,15,24,31] that can be used freely for eye tracking. Additionally, multiple algorithms for pupil [9,10,11], eyelid [19], and iris [13,12,14,19] extraction have been published, which also led to improved eye trackers from industry [24,43,20]. ...
Preprint
In this paper, we present a new deep neural network (DNN) that can be used to determine gaze position directly from EEG data. EEG-based eye tracking is a new and difficult research topic in the field of eye tracking, but it provides an alternative to image-based eye tracking with an input data set comparable to conventional image processing. The presented DNN exploits spatial dependencies of the EEG signal and uses convolutions similar to the spatial filtering that is used for preprocessing EEG signals. In this way, we improve direct gaze determination from the EEG signal by 3.5 cm MAE (mean absolute error) compared to the state of the art, but we still do not achieve a directly applicable system, since the inaccuracy remains significantly higher than that of image-based eye trackers. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FEEGGaze&mode=list
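As a rough illustration of the kind of architecture described (layer sizes and input shapes here are assumptions, not the paper's exact network), a first convolution spanning all electrodes can act like a spatial filter before a temporal convolution, with the 2D gaze position regressed under an MAE loss:

```python
# Rough sketch (assumed layer sizes, not the paper's exact network) of a
# DNN that regresses 2D gaze position from an EEG window.
import torch
import torch.nn as nn

class EEGGazeNet(nn.Module):
    def __init__(self, n_channels=64, n_samples=250):
        super().__init__()
        self.net = nn.Sequential(
            # spatial filtering: mix all electrodes per time step
            nn.Conv2d(1, 16, kernel_size=(n_channels, 1)),
            nn.ReLU(),
            # temporal convolution along the sample axis
            nn.Conv2d(16, 32, kernel_size=(1, 15), stride=(1, 3)),
            nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(2),   # (x, y) gaze position on the screen
        )

    def forward(self, x):       # x: (batch, 1, channels, samples)
        return self.net(x)

model = EEGGazeNet()
pred = model(torch.randn(8, 1, 64, 250))
loss = nn.L1Loss()(pred, torch.randn(8, 2))  # MAE, as reported in the paper
```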
... In the study of Fuhl [41], the computer learns to execute an action by receiving visual input and the status of the action. Users are visually observed and paired with their actions (on or off decisions), enabling them to perform gestures in front of the camera and execute an action on the computer (opening an application, pressing a key). ...
Article
Full-text available
People with speech and motor impairments may experience difficulties in interaction and learning, among other situations that can lead to emotional, social, and cognitive problems. Augmentative and alternative communication (AAC) is a research area that involves using non-oral modes as a complement to or substitute for spoken language. AAC supported by computer vision (CV) systems can benefit from recognizing the user's remaining functional movements as an alternative design approach to interaction. This article presents the complete MyPGI, a Methodology to yield Personalized Gestural Interaction. MyPGI guides the design of AAC systems for people with motor and speech difficulties, using CV techniques and machine learning to enable personalized and noninvasive gestural interaction. The MyPGI methodology was used to develop a low-cost AAC system named PGCA (Personal Gesture Communication Assistant), which was used in experiments conducted with volunteers, including students with motor and speech difficulties. Experiments, interviews, and a usability evaluation were conducted to assess the feasibility of the methodology and the system developed. The results suggest that the methodology is promising for supporting the design of AAC systems capable of enabling personalized gestural interaction, and they also show the benefits of this approach, its technical challenges, and means to overcome them. The results additionally add knowledge about specific challenges and needs of the target audience. The MyPGI methodology, developed over several iterations and evaluations, is capable of supporting the design of AAC systems that enable personalized gestural interaction. This article presents an overview of the methodological steps performed, the results obtained, and future perspectives for the methodology.
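As a loose illustration of the idea behind personalized gestural interaction (this is not PGCA's actual pipeline; the feature extraction from video is assumed and omitted here), a per-user classifier can be trained on a handful of that user's own functional movements and its predictions mapped to AAC actions:

```python
# Hedged sketch (not PGCA's actual pipeline) of a per-user gesture
# classifier trained on a few examples of the user's own movements,
# represented as feature vectors extracted from video (extraction assumed).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X_user = rng.random((12, 20))              # 12 examples, 20 motion features
y_user = np.repeat([0, 1, 2], 4)           # 3 personalized gestures
clf = KNeighborsClassifier(n_neighbors=3).fit(X_user, y_user)
print(clf.predict(rng.random((1, 20))))    # map prediction to an AAC action
```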
... DNNs [30] have found their way into a variety of fields [1,27]. In eye tracking [31,49], they are already used for scanpath analysis [32,33,4], as well as in other machine-learning-based approaches [24,2,3,12], feature extraction [49] such as pupil [28,18,25,48,47,46,11,26,16,10,8,45,6], iris [9,19,5], and eyelid [22,21,23] detection, eyeball estimation [7], and also for eye movement classification [41,42,14]. ...
Preprint
In this report, we combine the idea of Wide ResNets and transfer learning to optimize the architecture of deep neural networks. The first improvement of the architecture is the use of all layers as an information source for the last layer. This idea comes from transfer learning, which uses networks pre-trained on other data and extracts different levels of the network as input for the new task. The second improvement is the use of wider layers instead of deeper sequences of blocks. This idea comes from Wide ResNets. With both optimizations, better results can be achieved for different models under both heavy and standard data augmentation. Link: https://github.com/wolfgangfuhl/PublicationStuff/tree/master/TechnicalReport1/Supp
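A minimal sketch of the first improvement, under assumed block widths (this is not the report's code): every block's output is globally pooled and concatenated, so the last layer sees all levels of the network, as in multi-level transfer learning:

```python
# Illustrative sketch (assumed dimensions) of "all layers as information
# source for the last layer": each block's output is pooled and
# concatenated before the classifier.
import torch
import torch.nn as nn

class AllLayersHead(nn.Module):
    def __init__(self, widths=(64, 128, 256), n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList()
        c_in = 3
        for c_out in widths:            # wider blocks instead of more blocks
            self.blocks.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out), nn.ReLU()))
            c_in = c_out
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(sum(widths), n_classes)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(self.pool(x).flatten(1))  # one vector per level
        return self.fc(torch.cat(feats, dim=1))    # last layer sees all levels

logits = AllLayersHead()(torch.randn(2, 3, 32, 32))
```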
... Thus, the focus is on visualizations for classification. Of course, there are many further visualizations that are suited specifically to behavior [51], regression [52,53,54,55], validation [56,57], or large amounts of data [58,59]. This, however, exceeds the scope of this work, so we concentrate here only on the visualization most important for this work. ...
Thesis
Full-text available
In this work, a study on the automatic classification of mouse data by task and user was conducted. First, the software Matlab, which was used to prepare and evaluate the data, was introduced. Subsequently, the classification methods that achieved the best results were explained. The confusion matrix, which was used to present the classification results, was also introduced, along with several accuracy metrics used in the evaluation. Related work was then discussed, its results briefly explained, and the connection between gaze and mouse signals was pointed out. For the study, the common mouse signals were recorded while the participants performed various tasks in the browser. The signals consisted of the cursor positions on the screen, left, right, and mouse-wheel clicks, as well as scrolling up and down. From the resulting TXT files, the cursor trajectories and the other mouse signals could be plotted as diagrams on the one hand, and converted into various Matlab data vectors on the other. With these data vectors, machine learning methods were trained and tested for the classifications. Good classification results were obtained both by task and by participant. With further optimizations in the data preprocessing as well as in the machine learning model, considerably better results can surely be achieved, for example by re-partitioning the task categories or by using larger data sets. While the raw mouse data can be used to evaluate websites, the Matlab data vectors together with the classification methods enable applications in market analysis and in identifying the person at the computer.
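As an illustration of the described pipeline (the feature choices below are my own, illustrative stand-ins, not the thesis'), a raw cursor trace together with click and scroll counts can be reduced to a fixed-length vector and fed to a standard classifier:

```python
# Hedged sketch (feature choices are illustrative, not the thesis') of
# turning a raw cursor trace into a fixed-length feature vector and
# classifying it by task category.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mouse_features(xy, clicks, scrolls):
    """xy: (n, 2) cursor positions; clicks/scrolls: event counts."""
    v = np.linalg.norm(np.diff(xy, axis=0), axis=1)   # per-step speed
    return np.array([v.mean(), v.std(), v.max(),
                     len(xy), clicks, scrolls])

rng = np.random.default_rng(0)
X = np.stack([mouse_features(rng.random((200, 2)), rng.integers(0, 20),
                             rng.integers(0, 10)) for _ in range(100)])
y = rng.integers(0, 4, size=100)   # e.g. four task categories
clf = RandomForestClassifier().fit(X, y)
print(clf.predict(X[:5]))
```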
... Besides the study already described, in which the web presences of news agencies were likewise analyzed [64], the analysis of gaze behavior on websites by means of eye tracking is widespread [59,66,87,91,92]. Eye tracking is thus considered an elementary component of current human-computer interaction research [37] and is highly relevant for the analysis and, ultimately, the improvement of websites. ...
Thesis
Full-text available
For most people today, daily interaction with websites has become indispensable. The intention of these websites is to convey their content and to establish a user-friendly atmosphere. Eye trackers, with which gaze behavior on websites can be traced, are well suited for assessing the structure of websites. The goal of this thesis is to use eye-tracking data to examine the structures of Bild.de, Der Spiegel, and Tagesschau for distinguishability. To this end, the gaze data, collected in the form of coordinates, are first grouped into saccades (short, jerky eye movements) and fixations (the eyes fixating an object) and then visualized, evaluated, and assessed by means of various diagrams. The stated hypotheses are tested with machine learning methods.
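One common way to perform the grouping into saccades and fixations mentioned above is a velocity threshold (I-VT). The sketch below is a generic illustration with assumed parameters, not necessarily the method used in the thesis:

```python
# Minimal sketch of velocity-threshold (I-VT) event detection: samples
# faster than the threshold are saccades, the rest fixations. Sampling
# rate and threshold are assumptions.
import numpy as np

def ivt_labels(xy, hz=60, thresh=30.0):
    """Label each gaze sample as fixation (0) or saccade (1)."""
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1) * hz
    labels = (speed > thresh).astype(int)
    return np.append(labels, labels[-1])   # pad to original length

gaze = np.cumsum(np.random.randn(500, 2) * 0.1, axis=0)
print(np.bincount(ivt_labels(gaze)))       # samples per event type
```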
... The application fields are image classification [23,25,47,48,33,8,9,18,20,21], semantic segmentation [15,27,11], landmark regression [45,19,49], object detection [51,17,34,24,16,14,12,36], and many more. In the real world, this concerns autonomous driving, human-machine interaction [38,10,35], eye tracking [7,30,29,31,26,32,53,52,13,50], robot control, facial recognition, medical diagnostic systems, and many other areas [39,28,46]. In all these areas, the accuracy, reliability, and provability of the networks is very important and thus a focus of current research in machine learning [22,40,42,43,41,37]. ...
Preprint
In this work, we introduce pixel-wise tensor normalization, which is inserted after rectified linear units and, together with batch normalization, provides a significant improvement in the accuracy of modern deep neural networks. In addition, this work deals with the robustness of networks. We show that the factorized superposition of images from the training set and the reformulation of the multi-class problem into a multi-label problem yield significantly more robust networks. The reformulation and the adjustment of the multi-class log loss also improve the results compared to overlaying with only one class as the label. https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FTNandFDT&mode=list
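The exact formula of the proposed normalization is given in the preprint; as a heavily hedged illustration, one plausible reading of a pixel-wise normalization inserted after the ReLU is to rescale each spatial position's channel vector, for example to unit norm:

```python
# A hedged guess at what a pixel-wise tensor normalization could look like
# (the exact formula is the paper's; here each pixel's channel vector is
# scaled to unit L2 norm after the ReLU). Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelwiseNorm(nn.Module):
    def forward(self, x):                  # x: (batch, C, H, W)
        return F.normalize(x, p=2, dim=1)  # normalize over channels per pixel

layer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.BatchNorm2d(16), nn.ReLU(), PixelwiseNorm())
out = layer(torch.randn(4, 3, 32, 32))
```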
... Eye trackers can be applied in a wide range of fields, including expertise determination [Chandrika et al. 2019], human-computer interaction [Fuhl 2020], human-robot interaction [Chadalavada et al. 2020], improved remote assistance [Špakov et al. 2019], facilitating the work of surgeons [Di Stasi et al. 2014; Fuhl et al. 2017], and much more. Due to this variety of possible applications, eye trackers must perform reliably under a multitude of conditions, which creates many challenges in image processing [Fuhl 2019] as well as in eye movement classification [Fuhl et al. 2020c]. ...
... These advancements, especially through the introduction of convolutions [68], residual blocks [58], and memory functionality [60], have led to deep neural networks becoming the de facto standard approach in many areas of algorithm development today. This has led to major advances in security [39,29], computer vision [56,27,33,52,51,50,18,35,25,17,15,13,37,31,30,32,8,16,28,12], speech recognition [71], pattern recognition [2,36,38,11], validation [20,48], human-computer interaction [5,14,49,46,47,34,9,10,19,21,22], perception understanding [45,24,26], and big data processing [1,44]. Modern application areas of deep neural networks include autonomous driving [61], gaze estimation [84], collision detection [7], industrial algorithm development [69], tumor detection [76], person identification [83], text translation [74], image generation [55], quality enhancement of images, and many more. ...
Preprint
In this work, we present an alternative to conventional residual connections that is inspired by maxout nets. Instead of the addition in residual connections, our approach propagates only the maximum value or, in the leaky formulation, a percentage of both values. In our evaluation, we show on different public data sets that the presented approaches are comparable to residual connections and have other interesting properties, such as better generalization with a constant batch normalization, faster learning, and the possibility to generalize without additional activation functions. In addition, the proposed approaches work very well when ensembles are formed together with residual networks.
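A small sketch of the described connection, where the element-wise maximum replaces the residual addition; the leaky weighting below is one plausible reading of "propagates a percentage of both", not the paper's confirmed formulation:

```python
# Sketch of a maxout-style alternative to residual addition: propagate the
# element-wise maximum of identity and branch, or, in a "leaky" form, a
# weighted mix of maximum and minimum. The leak factor is an assumption.
import torch

def max_residual(identity, branch, leak=0.0):
    hi = torch.maximum(identity, branch)
    lo = torch.minimum(identity, branch)
    return (1 - leak) * hi + leak * lo     # leak=0 -> pure maximum

x = torch.randn(2, 8)
fx = torch.randn(2, 8)                     # output of the residual branch
y = max_residual(x, fx, leak=0.1)          # instead of y = x + fx
```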
... Eye tracking refers to the recording and analysis [FKB + 18, FKS + 15a, FBH + 19, FCK + 19] of eye movements [FRE20a, FRE20b, FSK + 18] and can be applied in any area where the eyes are used [CY13]; it is therefore an interdisciplinary research field. Eye-tracking studies are used, among others, in the fields of psychology [NS04], user identification [FBK20, FSK21], cognitive studies [LTY + 13], advertisement and marketing [WP08a, WP08b], neuroscience [WSP + 10], and human-computer interaction [JK03, MB14, Fuh20]. This research is located in the field of human-computer interaction; it is about the classification of attention based on eye tracking [FKSK18]. ...
Thesis
Full-text available
This bachelor thesis is about improving attention classification in the context of naturalistic driving. To classify the driver's attention, gaze and head orientation, based on the angles to the x- and y-axis, were used. The dataset was recorded with appearance-based eye tracking and provided by the Human-Computer Interaction department of the Eberhard Karls University Tübingen. To investigate possible improvements in the classification, two methods were used. The first method, named Angles, is based on linking the angles of the gaze and head orientation and then sorting these angles by size in ascending order for each segment. In addition, each segment receives an attention label and thus becomes an object to be classified. The second method examines whether the use of heatmaps improves the classification of the segments. Before the heatmaps are calculated, the angles of the gaze and head orientation are likewise linked and additionally normalized to π or π/2 in order to limit the area for which the heatmaps, and thus the frequency of different pairs of angles within a segment, are calculated. Furthermore, histograms were created to study the mean frequency of individual angles in attentive and inattentive segments. Applying the methods to the raw dataset and subsequently training the classifiers showed in the evaluation that the Angles method led to an improvement in the detection of inattentive segments for two of the four classifiers used and to an improved training time for all classifiers. The use of heatmaps also led to an improved training time, but to a deterioration in classification accuracy: the result was the almost constant classification of all segments as attentive.
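An illustrative sketch of the Angles feature described above (array shapes and angle ranges are assumptions): the gaze and head angles of a segment are linked and sorted in ascending order, yielding one fixed-length object per segment:

```python
# Illustrative sketch of the "Angles" feature: gaze and head angles of a
# segment are concatenated and sorted ascending, producing one fixed-length
# feature vector per segment. Shapes are assumptions.
import numpy as np

def angles_feature(gaze_xy_angles, head_xy_angles):
    """Each input: (n_frames, 2) angles to the x- and y-axis."""
    linked = np.concatenate([gaze_xy_angles, head_xy_angles], axis=1)
    return np.sort(linked.ravel())         # sort all angles ascending

segment = angles_feature(np.random.rand(30, 2), np.random.rand(30, 2))
print(segment.shape)                       # one feature vector per segment
```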
... Since our actions and intentions can be recognized and, to a certain degree, anticipated from the way we move our eyes, eye movement analysis can enable completely new applications, especially when coupled with modern display technologies like VR or AR. For example, the gaze signal, together with the associated possibility of human-machine interaction [22], enables people with disabilities to interact with their environment through special devices tailored to the patient's disability [1]. In the case of surgical microscopes, where the surgeon has to operate a multitude of controls, the gaze signal can be used for automatic focusing [55,60]. ...
Preprint
Full-text available
We present TEyeD, the world's largest unified public data set of eye images taken with head-mounted devices. TEyeD was acquired with seven different head-mounted eye trackers, two of which were integrated into virtual reality (VR) or augmented reality (AR) devices. The images in TEyeD were obtained during various tasks, including car rides, simulator rides, outdoor sports activities, and daily indoor activities. The data set includes 2D & 3D landmarks, semantic segmentation, 3D eyeball annotations, gaze vectors, and eye movement types for all images. Landmarks and semantic segmentation are provided for the pupil, iris, and eyelids. Video lengths vary from a few minutes to several hours. With more than 20 million carefully annotated images, TEyeD provides a unique, coherent resource and a valuable foundation for advancing research in computer vision, eye tracking, and gaze estimation in modern VR and AR applications. Data and code at https://unitc-my.sharepoint.com/:f:/g/personal/iitfu01_cloud_uni-tuebingen_de/EvrNPdtigFVHtCMeFKSyLlUBepOcbX0nEkamweeZa0s9SQ?e=fWEvPp