Figure - available from: User Modeling and User-Adapted Interaction
Confusion matrix of the classification results with the proposed batch balancing for an initial training with four examples per class. The classes are the three gestures (fist (1), hand (2), and thumbs-up (3)) and the two movement sequences (turning the monitor on/off (4) and putting on the headphones (5)), in addition to the "do nothing" class (0).

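To make the batch balancing described in the caption concrete, the following is a minimal sketch (illustrative only, not the paper's implementation; all names are hypothetical) of drawing the same number of examples from each class for every training batch, so that the rare gesture and movement-sequence classes are not drowned out by "do nothing" examples:

```python
# Minimal sketch of class-balanced batch construction (illustrative only,
# not the paper's implementation). Each batch draws the same number of
# samples from every class.
import numpy as np

def balanced_batch(features, labels, per_class, rng=np.random.default_rng()):
    """Sample `per_class` examples from each class, with replacement."""
    classes = np.unique(labels)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
        for c in classes
    ])
    rng.shuffle(idx)
    return features[idx], labels[idx]

# Toy usage: 6 classes (0 = do nothing, 1-3 gestures, 4-5 movement sequences)
X = np.random.randn(600, 16)
y = np.random.randint(0, 6, size=600)
Xb, yb = balanced_batch(X, y, per_class=4)
print(np.bincount(yb))  # -> four examples per class, as in the figure
```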

Source publication
Article
Full-text available
Pervasive computing environments deliver a multitude of possibilities for human–computer interactions. Modern technologies, such as gesture control or speech recognition, allow different devices to be controlled without additional hardware. A drawback of these concepts is that gestures and commands need to be learned. We propose a system that is ab...

Citations

... In addition, image-based eye tracking is very accurate but still has its limitations under challenging settings, such as near-infrared reflections on glasses or inadequate lighting conditions [17]. In the past years, multiple open-source tools have been published [19,38,15,24,31] that can be used freely for eye tracking. Additionally, multiple algorithms for pupil [9,10,11], eyelid [19], and iris [13,12,14,19] extraction have been published, which also led to improved eye trackers from industry [24,43,20]. ...
Preprint
In this paper, we present a new deep neural network (DNN) that can be used to determine gaze position directly from EEG data. EEG-based eye tracking is a new and difficult research topic in the field of eye tracking, but it provides an alternative to image-based eye tracking with an input data set comparable to conventional image processing. The presented DNN exploits spatial dependencies of the EEG signal and uses convolutions similar to the spatial filtering that is used for preprocessing EEG signals. In this way, we improve direct gaze determination from the EEG signal by 3.5 cm MAE (mean absolute error) compared to the state of the art, but we still do not achieve a directly applicable system, since the inaccuracy remains significantly higher than that of image-based eye trackers. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FEEGGaze&mode=list
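As a rough illustration of the kind of architecture described (layer sizes and input shapes here are assumptions, not the paper's exact network), a first convolution spanning all electrodes can act like a spatial filter before a temporal convolution, with the 2D gaze position regressed under an MAE loss:

```python
# Rough sketch (assumed layer sizes, not the paper's exact network) of a
# DNN that regresses 2D gaze position from an EEG window.
import torch
import torch.nn as nn

class EEGGazeNet(nn.Module):
    def __init__(self, n_channels=64, n_samples=250):
        super().__init__()
        self.net = nn.Sequential(
            # spatial filtering: mix all electrodes per time step
            nn.Conv2d(1, 16, kernel_size=(n_channels, 1)),
            nn.ReLU(),
            # temporal convolution along the sample axis
            nn.Conv2d(16, 32, kernel_size=(1, 15), stride=(1, 3)),
            nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(2),   # (x, y) gaze position on the screen
        )

    def forward(self, x):       # x: (batch, 1, channels, samples)
        return self.net(x)

model = EEGGazeNet()
pred = model(torch.randn(8, 1, 64, 250))
loss = nn.L1Loss()(pred, torch.randn(8, 2))  # MAE, as reported in the paper
```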
... In the study of Fuhl [41], the computer learns to execute an action by receiving visual input and the status of the action. Users are visually observed and paired with their actions (on or off decisions), enabling them to perform gestures in front of the camera and execute an action on the computer (opening an application, pressing a key). ...
Article
Full-text available
People with speech and motor impairments may experience difficulties in interaction and learning, among other situations that can lead to emotional, social, and cognitive problems. Augmentative and alternative communication (AAC) is a research area that involves using non-oral modes as a complement to or substitute for spoken language. AAC supported by computer vision (CV) systems can benefit from recognizing the user's remaining functional movements as an alternative design approach to interaction. This article presents the complete MyPGI, a Methodology to yield Personalized Gestural Interaction. MyPGI guides the design of AAC systems for people with motor and speech difficulties, using CV techniques and machine learning to enable personalized and noninvasive gestural interaction. The MyPGI methodology was used to develop a low-cost AAC system named PGCA (Personal Gesture Communication Assistant), which was used in experiments conducted with volunteers, including students with motor and speech difficulties. Experiments, interviews, and a usability evaluation were conducted to assess the feasibility of the methodology and the system developed. The results suggest that the methodology is promising for supporting the design of AAC systems capable of enabling personalized gestural interaction, and they also show the benefits of this approach, its technical challenges, and means to overcome them. The results additionally add knowledge about specific challenges and needs of the target audience. The MyPGI methodology, developed over several iterations and evaluations, is capable of supporting the design of AAC systems that enable personalized gestural interaction. This article presents an overview of the methodological steps performed, the results obtained, and future perspectives for the methodology.
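As a loose illustration of the idea behind personalized gestural interaction (this is not PGCA's actual pipeline; the feature extraction from video is assumed and omitted here), a per-user classifier can be trained on a handful of that user's own functional movements and its predictions mapped to AAC actions:

```python
# Hedged sketch (not PGCA's actual pipeline) of a per-user gesture
# classifier trained on a few examples of the user's own movements,
# represented as feature vectors extracted from video (extraction assumed).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X_user = rng.random((12, 20))              # 12 examples, 20 motion features
y_user = np.repeat([0, 1, 2], 4)           # 3 personalized gestures
clf = KNeighborsClassifier(n_neighbors=3).fit(X_user, y_user)
print(clf.predict(rng.random((1, 20))))    # map prediction to an AAC action
```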
... DNNs [30] have found their way into a variety of fields [1,27]. In eye tracking [31,49], they are already used for scanpath analysis [32,33,4], as well as in other machine-learning-based approaches [24,2,3,12], feature extraction [49] such as pupil [28,18,25,48,47,46,11,26,16,10,8,45,6], iris [9,19,5], and eyelid [22,21,23] detection, eyeball estimation [7], and also for eye movement classification [41,42,14]. ...
Preprint
In this report, we combine the idea of Wide ResNets and transfer learning to optimize the architecture of deep neural networks. The first improvement of the architecture is the use of all layers as an information source for the last layer. This idea comes from transfer learning, which uses networks pre-trained on other data and extracts different levels of the network as input for the new task. The second improvement is the use of wider layers instead of deeper sequences of blocks. This idea comes from Wide ResNets. With both optimizations, better results can be achieved for different models under both heavy and standard data augmentation. Link: https://github.com/wolfgangfuhl/PublicationStuff/tree/master/TechnicalReport1/Supp
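A minimal sketch of the first improvement, under assumed block widths (this is not the report's code): every block's output is globally pooled and concatenated, so the last layer sees all levels of the network, as in multi-level transfer learning:

```python
# Illustrative sketch (assumed dimensions) of "all layers as information
# source for the last layer": each block's output is pooled and
# concatenated before the classifier.
import torch
import torch.nn as nn

class AllLayersHead(nn.Module):
    def __init__(self, widths=(64, 128, 256), n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList()
        c_in = 3
        for c_out in widths:            # wider blocks instead of more blocks
            self.blocks.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out), nn.ReLU()))
            c_in = c_out
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(sum(widths), n_classes)

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(self.pool(x).flatten(1))  # one vector per level
        return self.fc(torch.cat(feats, dim=1))    # last layer sees all levels

logits = AllLayersHead()(torch.randn(2, 3, 32, 32))
```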
... Thus, the focus is on visualizations for classification. Of course, there are many further visualizations that are suited specifically to behavior [51], regression [52,53,54,55], validation [56,57], or large amounts of data [58,59]. This, however, exceeds the scope of this work, so we concentrate here only on the visualization most important for this work. ...
Thesis
Full-text available
In this work, a study on the automatic classification of mouse data by task and user was conducted. First, the software Matlab, which was used to prepare and evaluate the data, was introduced. Subsequently, the classification methods that achieved the best results were explained. The confusion matrix, which was used to present the classification results, was also introduced, along with several accuracy metrics used in the evaluation. Related work was then discussed, its results briefly explained, and the connection between gaze and mouse signals was pointed out. For the study, the common mouse signals were recorded while the participants performed various tasks in the browser. The signals consisted of the cursor positions on the screen, left, right, and mouse-wheel clicks, as well as scrolling up and down. From the resulting TXT files, the cursor trajectories and the other mouse signals could be plotted as diagrams on the one hand, and converted into various Matlab data vectors on the other. With these data vectors, machine learning methods were trained and tested for the classifications. Good classification results were obtained both by task and by participant. With further optimizations in the data preprocessing as well as in the machine learning model, considerably better results can surely be achieved, for example by re-partitioning the task categories or by using larger data sets. While the raw mouse data can be used to evaluate websites, the Matlab data vectors together with the classification methods enable applications in market analysis and in identifying the person at the computer.
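As an illustration of the described pipeline (the feature choices below are my own, illustrative stand-ins, not the thesis'), a raw cursor trace together with click and scroll counts can be reduced to a fixed-length vector and fed to a standard classifier:

```python
# Hedged sketch (feature choices are illustrative, not the thesis') of
# turning a raw cursor trace into a fixed-length feature vector and
# classifying it by task category.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mouse_features(xy, clicks, scrolls):
    """xy: (n, 2) cursor positions; clicks/scrolls: event counts."""
    v = np.linalg.norm(np.diff(xy, axis=0), axis=1)   # per-step speed
    return np.array([v.mean(), v.std(), v.max(),
                     len(xy), clicks, scrolls])

rng = np.random.default_rng(0)
X = np.stack([mouse_features(rng.random((200, 2)), rng.integers(0, 20),
                             rng.integers(0, 10)) for _ in range(100)])
y = rng.integers(0, 4, size=100)   # e.g. four task categories
clf = RandomForestClassifier().fit(X, y)
print(clf.predict(X[:5]))
```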
... Besides the study already described, in which the web presences of news agencies were likewise analyzed [64], the analysis of gaze behavior on websites by means of eye tracking is widespread [59,66,87,91,92]. Eye tracking is thus considered an elementary component of current human-computer interaction research [37] and is highly relevant for the analysis and, ultimately, the improvement of websites. ...
Thesis
Full-text available
For most people today, daily interaction with websites has become indispensable. The intention of these websites is to convey their content and to establish a user-friendly atmosphere. Eye trackers, with which gaze behavior on websites can be traced, are well suited for assessing the structure of websites. The goal of this thesis is to use eye-tracking data to examine the structures of Bild.de, Der Spiegel, and Tagesschau for distinguishability. To this end, the gaze data, collected in the form of coordinates, are first grouped into saccades (short, jerky eye movements) and fixations (the eyes fixating an object) and then visualized, evaluated, and assessed by means of various diagrams. The stated hypotheses are tested with machine learning methods.
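One common way to perform the grouping into saccades and fixations mentioned above is a velocity threshold (I-VT). The sketch below is a generic illustration with assumed parameters, not necessarily the method used in the thesis:

```python
# Minimal sketch of velocity-threshold (I-VT) event detection: samples
# faster than the threshold are saccades, the rest fixations. Sampling
# rate and threshold are assumptions.
import numpy as np

def ivt_labels(xy, hz=60, thresh=30.0):
    """Label each gaze sample as fixation (0) or saccade (1)."""
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1) * hz
    labels = (speed > thresh).astype(int)
    return np.append(labels, labels[-1])   # pad to original length

gaze = np.cumsum(np.random.randn(500, 2) * 0.1, axis=0)
print(np.bincount(ivt_labels(gaze)))       # samples per event type
```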
... The application fields are image classification [23,25,47,48,33,8,9,18,20,21], semantic segmentation [15,27,11], landmark regression [45,19,49], object detection [51,17,34,24,16,14,12,36], and many more. In the real world, this concerns autonomous driving, human-machine interaction [38,10,35], eye tracking [7,30,29,31,26,32,53,52,13,50], robot control, facial recognition, medical diagnostic systems, and many other areas [39,28,46]. In all these areas, the accuracy, reliability, and provability of the networks is very important and thus a focus of current research in machine learning [22,40,42,43,41,37]. ...
Preprint
In this work, we introduce pixel-wise tensor normalization, which is inserted after rectified linear units and, together with batch normalization, provides a significant improvement in the accuracy of modern deep neural networks. In addition, this work deals with the robustness of networks. We show that the factorized superposition of images from the training set and the reformulation of the multi-class problem into a multi-label problem yield significantly more robust networks. The reformulation and the adjustment of the multi-class log loss also improve the results compared to overlaying with only one class as the label. https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FTNandFDT&mode=list
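The exact formula of the proposed normalization is given in the preprint; as a heavily hedged illustration, one plausible reading of a pixel-wise normalization inserted after the ReLU is to rescale each spatial position's channel vector, for example to unit norm:

```python
# A hedged guess at what a pixel-wise tensor normalization could look like
# (the exact formula is the paper's; here each pixel's channel vector is
# scaled to unit L2 norm after the ReLU). Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelwiseNorm(nn.Module):
    def forward(self, x):                  # x: (batch, C, H, W)
        return F.normalize(x, p=2, dim=1)  # normalize over channels per pixel

layer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.BatchNorm2d(16), nn.ReLU(), PixelwiseNorm())
out = layer(torch.randn(4, 3, 32, 32))
```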
... Eye trackers can be applied in a wide range of fields, including expertise determination [Chandrika et al. 2019], human-computer interaction [Fuhl 2020], human-robot interaction [Chadalavada et al. 2020], improved remote assistance [Špakov et al. 2019], facilitating the work of surgeons [Di Stasi et al. 2014; Fuhl et al. 2017], and much more. Due to this variety of possible applications, eye trackers must perform reliably under a multitude of conditions, which creates many challenges in image processing [Fuhl 2019] as well as in eye movement classification [Fuhl et al. 2020c]. ...
... These advancements, especially through the introduction of convolutions [68], residual blocks [58], and memory functionality [60], have led to deep neural networks becoming the de facto standard approach in many areas of algorithm development today. This has led to major advances in security [39,29], computer vision [56,27,33,52,51,50,18,35,25,17,15,13,37,31,30,32,8,16,28,12], speech recognition [71], pattern recognition [2,36,38,11], validation [20,48], human-computer interaction [5,14,49,46,47,34,9,10,19,21,22], perception understanding [45,24,26], and big data processing [1,44]. Modern application areas of deep neural networks include autonomous driving [61], gaze estimation [84], collision detection [7], industrial algorithm development [69], tumor detection [76], person identification [83], text translation [74], image generation [55], quality enhancement of images, and many more. ...
Preprint
In this work, we present an alternative to conventional residual connections that is inspired by maxout nets. Instead of the addition in residual connections, our approach propagates only the maximum value or, in the leaky formulation, a percentage of both values. In our evaluation, we show on different public data sets that the presented approaches are comparable to residual connections and have other interesting properties, such as better generalization with a constant batch normalization, faster learning, and the possibility to generalize without additional activation functions. In addition, the proposed approaches work very well when ensembles are formed together with residual networks.
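A small sketch of the described connection, where the element-wise maximum replaces the residual addition; the leaky weighting below is one plausible reading of "propagates a percentage of both", not the paper's confirmed formulation:

```python
# Sketch of a maxout-style alternative to residual addition: propagate the
# element-wise maximum of identity and branch, or, in a "leaky" form, a
# weighted mix of maximum and minimum. The leak factor is an assumption.
import torch

def max_residual(identity, branch, leak=0.0):
    hi = torch.maximum(identity, branch)
    lo = torch.minimum(identity, branch)
    return (1 - leak) * hi + leak * lo     # leak=0 -> pure maximum

x = torch.randn(2, 8)
fx = torch.randn(2, 8)                     # output of the residual branch
y = max_residual(x, fx, leak=0.1)          # instead of y = x + fx
```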
... Eye tracking refers to the recording and analysis [FKB + 18, FKS + 15a, FBH + 19, FCK + 19] of eye movements [FRE20a, FRE20b, FSK + 18] and can be applied in any area where the eyes are used [CY13]; it is therefore an interdisciplinary research field. Eye-tracking studies are used, among others, in the fields of psychology [NS04], user identification [FBK20, FSK21], cognitive studies [LTY + 13], advertisement and marketing [WP08a, WP08b], neuroscience [WSP + 10], and human-computer interaction [JK03, MB14, Fuh20]. This research is located in the field of human-computer interaction; it is about the classification of attention based on eye tracking [FKSK18]. ...
Thesis
Full-text available
This bachelor thesis is about improving attention classification in the context of naturalistic driving. To classify the driver's attention, gaze and head orientation, based on the angles to the x- and y-axis, were used. The dataset was recorded with appearance-based eye tracking and provided by the Human-Computer Interaction department of the Eberhard Karls University Tübingen. To investigate possible improvements in the classification, two methods were used. The first method, named Angles, is based on linking the angles of the gaze and head orientation and then sorting these angles by size in ascending order for each segment. In addition, each segment receives an attention label and thus becomes an object to be classified. The second method examines whether the use of heatmaps improves the classification of the segments. Before the heatmaps are calculated, the angles of the gaze and head orientation are likewise linked and additionally normalized to π or π/2 in order to limit the area for which the heatmaps, and thus the frequency of different pairs of angles within a segment, are calculated. Furthermore, histograms were created to study the mean frequency of individual angles in attentive and inattentive segments. Applying the methods to the raw dataset and subsequently training the classifiers showed in the evaluation that the Angles method led to an improvement in the detection of inattentive segments for two of the four classifiers used and to an improved training time for all classifiers. The use of heatmaps also led to an improved training time, but to a deterioration in classification accuracy: the result was the almost constant classification of all segments as attentive.
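An illustrative sketch of the Angles feature described above (array shapes and angle ranges are assumptions): the gaze and head angles of a segment are linked and sorted in ascending order, yielding one fixed-length object per segment:

```python
# Illustrative sketch of the "Angles" feature: gaze and head angles of a
# segment are concatenated and sorted ascending, producing one fixed-length
# feature vector per segment. Shapes are assumptions.
import numpy as np

def angles_feature(gaze_xy_angles, head_xy_angles):
    """Each input: (n_frames, 2) angles to the x- and y-axis."""
    linked = np.concatenate([gaze_xy_angles, head_xy_angles], axis=1)
    return np.sort(linked.ravel())         # sort all angles ascending

segment = angles_feature(np.random.rand(30, 2), np.random.rand(30, 2))
print(segment.shape)                       # one feature vector per segment
```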
... Since our actions and intentions can be recognized and, to a certain degree, anticipated from the way we move our eyes, eye movement analysis can enable completely new applications, especially when coupled with modern display technologies like VR or AR. For example, the gaze signal, together with the associated possibility of human-machine interaction [22], enables people with disabilities to interact with their environment through special devices tailored to the patient's disability [1]. In the case of surgical microscopes, where the surgeon has to operate a multitude of controls, the gaze signal can be used for automatic focusing [55,60]. ...
Preprint
Full-text available
We present TEyeD, the world's largest unified public data set of eye images taken with head-mounted devices. TEyeD was acquired with seven different head-mounted eye trackers, two of which were integrated into virtual reality (VR) or augmented reality (AR) devices. The images in TEyeD were obtained during various tasks, including car rides, simulator rides, outdoor sports activities, and daily indoor activities. The data set includes 2D & 3D landmarks, semantic segmentation, 3D eyeball annotations, gaze vectors, and eye movement types for all images. Landmarks and semantic segmentation are provided for the pupil, iris, and eyelids. Video lengths vary from a few minutes to several hours. With more than 20 million carefully annotated images, TEyeD provides a unique, coherent resource and a valuable foundation for advancing research in computer vision, eye tracking, and gaze estimation in modern VR and AR applications. Data and code at https://unitc-my.sharepoint.com/:f:/g/personal/iitfu01_cloud_uni-tuebingen_de/EvrNPdtigFVHtCMeFKSyLlUBepOcbX0nEkamweeZa0s9SQ?e=fWEvPp