Figure - available from: Frontiers in Human Neuroscience
Facial landmarks and pose variants. (A) The red dots denote the landmarks tracked by DLC. (B) Schematic of the three different poses used to train and evaluate GazeNet (human head model design obtained from Thingiverse.com, #758647, designed by lehthanis, license: CC BY-SA 3.0, modified neck, added textures).

Source publication
Article
Full-text available
Real-time gaze tracking provides crucial input to psychophysics studies and neuromarketing applications. Many modern eye-tracking solutions are expensive, mainly due to high-end hardware specialized for processing infrared-camera images. Here, we introduce a deep learning-based approach which uses the video frames of low-cost...

Citations

... We have recently shown that an adaptation of current deep-learning methodologies for eye tracking 48,49, together with a dual-camera system, can determine gaze to discriminate specific facial areas and quantify movement synchronization 42. The camera system has been developed with neurodevelopmental disorders in mind; it is minimal, does not include any wearables, such as eye-tracking glasses, and is located on a table between two individuals without covering the face of the person in front of them; see Fig. 1. ...
... Future research could consider recruiting dyads, which would assist in controlling this potential influence but would also introduce other sources of variability. In terms of strengths, as previously discussed, the camera system used in our study 42 was developed using a deep-learning methodology which is relatively robust to movement 42,49. It was also set up to explore patterns of mutual eye gaze in people with varying levels of neurodevelopmental traits, meaning that it was intended to be minimally distracting, and did not include any wearable components or additional cameras in the room 3,69. ...
Article
Full-text available
Eye contact is a central component in face-to-face interactions. It is important in structuring communicative exchanges and offers critical insights into others' interests and intentions. To better understand eye contact in face-to-face interactions, we applied a novel, non-intrusive deep-learning-based dual-camera system and investigated associations between eye contact and autistic traits as well as self-reported eye contact discomfort during a referential communication task, where participants and the experimenter had to guess, in turn, a word known by the other individual. Corroborating previous research, we found that participants’ eye gaze and mutual eye contact were inversely related to autistic traits. In addition, our findings revealed different behaviors depending on the role in the dyad: listening and guessing were associated with increased eye contact compared with describing words. In the listening and guessing condition, only a subgroup who reported eye contact discomfort had a lower amount of eye gaze and eye contact. When describing words, higher autistic traits were associated with reduced eye gaze and eye contact. Our data indicate that eye contact is inversely associated with autistic traits when describing words, and that eye gaze is modulated by the communicative role in a conversation.
... Zdarsky et al. [13] introduced a deep learning-based approach that uses the video frames of low-cost web cameras. Using DeepLabCut (DLC), an open-source toolbox for extracting points of interest from videos, they obtained facial landmarks critical to gaze location and estimated the point of gaze on a computer screen via a shallow neural network. ...
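The pipeline in the excerpt above — DLC landmark coordinates in, on-screen gaze coordinates out of a shallow network — can be sketched as a single-hidden-layer regressor. This is an illustrative sketch only: the layer sizes and the (untrained) weights below are placeholders, not the architecture or parameters from the cited work.

```python
import math
import random

def shallow_gaze_net(landmarks, W1, b1, W2, b2):
    """Map a flat list of landmark coordinates to (x, y) screen gaze.

    One tanh hidden layer followed by a linear output layer -- the
    general shape of a shallow gaze regressor; weights here are
    illustrative and untrained.
    """
    hidden = [math.tanh(sum(w * v for w, v in zip(row, landmarks)) + b)
              for row, b in zip(W1, b1)]
    return tuple(sum(w * h for w, h in zip(row, hidden)) + b
                 for row, b in zip(W2, b2))

# Assumed dimensions: 10 landmarks -> 20 inputs, 8 hidden units, 2 outputs.
random.seed(0)
n_in, n_hidden = 20, 8
W1 = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(2)]
b2 = [0.0, 0.0]

gaze_xy = shallow_gaze_net([0.5] * n_in, W1, b1, W2, b2)
```

In practice the weights would be fitted on calibration data (known points the subject fixates on), which is what makes such a shallow mapping viable with ordinary webcam input.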
Article
Full-text available
The Efficient Convolution Operators for Tracking (ECO) algorithm has garnered considerable attention in both academic research and practical applications due to its remarkable tracking efficacy, yielding exceptional accuracy and success rates in various challenging contexts. However, the ECO algorithm heavily relies on the deep learning Visual Geometry Group (VGG) network model, which entails complexity and substantial computational resources. Moreover, its performance tends to deteriorate in scenarios involving target occlusion, background clutter, and similar challenges. To tackle these issues, this study introduces a novel enhancement to the pedestrian tracking algorithm. Specifically, the VGG network is substituted with a lightweight MobileNet v2 model, thereby reducing computational demands. Additionally, a Double Attention Networks (A2-Net) module is incorporated to augment the extraction of crucial information, while pre-training techniques are integrated to expedite model convergence. Experimental results demonstrate that the C-ECO algorithm achieves comparable accuracy and success rates to the conventional ECO algorithm, despite reducing the model size by 27.96% and increasing the tracking frame rate by 46.11%. Notably, when compared to other prevalent tracking algorithms, the C-ECO algorithm exhibits an accuracy of 82.20% and a success rate of 64.72%. These findings underscore the enhanced adaptability of the C-ECO algorithm in complex environments, offering a more lightweight model while delivering superior tracking capabilities.
... The process was based on a convolutional neural network (CNN) learning method using eye and face images to predict the point of gaze on the screen. Similarly, [15], [42], [43] also used methods that predict the point of gaze on the screen from annotated screen coordinates. Huang et al. [44] proposed a regression-based learning method that used hand-crafted features from eye images. ...
... Arvin et al. [47] reported an approach that uses a contour detector to fit ellipsoids on 300 × 300 eye images for pupil center estimation. DeepLabCut [42], [48] has been used to annotate and detect pupil centers using deep learning models such as ResNet50. George et al. [49] implemented a geometrical eye center localization method based on fast convolution and ellipse fitting. ...
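The detectors cited above localize the pupil by ellipse fitting or deep models. A deliberately simpler stand-in for the same idea — the pupil is usually the darkest blob in an eye patch — is an intensity-threshold centroid; the threshold value and the toy patch below are assumptions for illustration, not a method from the cited papers.

```python
def pupil_center(gray_patch, threshold=60):
    """Estimate the pupil center as the centroid of dark pixels.

    gray_patch is a list of rows of grayscale values (0-255). Pixels
    below the threshold are treated as pupil; their mean (x, y) is a
    crude first estimate of the pupil center.
    """
    xs = ys = n = 0
    for y, row in enumerate(gray_patch):
        for x, value in enumerate(row):
            if value < threshold:
                xs += x
                ys += y
                n += 1
    if n == 0:
        return None  # no dark pixels: eye likely closed or patch too bright
    return xs / n, ys / n

# Synthetic 5x5 patch with a dark 2x2 "pupil" centered at (2.5, 2.5).
patch = [[200] * 5 for _ in range(5)]
for y in (2, 3):
    for x in (2, 3):
        patch[y][x] = 10
center = pupil_center(patch)
```

Ellipse fitting improves on this by recovering the pupil boundary (and hence a sub-pixel center that is robust to partial occlusion by the eyelid), which is why the cited methods prefer it.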
Article
The ability to perform quantitative and automated neurological assessment could enhance diagnosis and treatment in the pre-hospital setting, such as during telemedicine or emergency medical services (EMS) encounters. Such a tool could be developed by adapting clinically significant information such as symmetry of eye movement or conjugate eye movement. Here we describe a digital camera-based eye tracking method, “NeuroGaze”, to capture the symmetry of eye movement while performing a neurological eye examination. The proposed method was developed based on detecting the center of the pupil for both eyes from a given video and measuring eye conjugacy by transforming the pupil center coordinates to relative gaze. The method was tested on healthy volunteers while performing three neurological eye examinations (the NeuroEye dataset is made available at https://www.kaggle.com/datasets/mahassan8/neuroeye ). We also compared our proposed approach to state-of-the-art digital camera-based eye-tracking methods and commercial off-the-shelf (COTS) eye trackers. NeuroGaze outperformed digital camera-based eye tracking methods, reporting a mean Spearman rank-order correlation coefficient of 0.86 for the H-test, 0.87 for the Dot-test, and 0.56 for the OKN-Test, and shows similar trends for the relative gaze trajectories, with a noticeable offset in the scale of the relative gaze angle compared to the COTS eye tracker (see Fig. 1). The study demonstrates that by using a pupil-center-based eye-tracking method, a digital camera can measure clinically relevant information regarding eye movement.
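The abstract above validates gaze trajectories with Spearman rank-order correlations. A minimal pure-Python Spearman (Pearson correlation of average ranks, ties handled like standard rank functions) makes the metric concrete; the sample trajectories are invented for illustration.

```python
def _ranks(values):
    # Average ranks (1-based); ties within a run share their mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Two monotonically related gaze traces correlate perfectly in rank.
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.5, 3.9])
```

A rank correlation is a sensible choice here because it is insensitive to the scale offset between the camera-based and COTS gaze angles that the abstract mentions.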
... They acquired facial cues essential to gaze placement using DeepLabCut (DLC), an open-source toolset for collecting points of interest from videos, then used a shallow neural network to predict the point of gaze on a computer screen. The design in [15] achieved a median error of around one degree of visual angle when tested for three extreme positions. The findings lay the groundwork for further research by scientists studying psychophysics or neuromarketing applying deep learning techniques to eye tracking. ...
Article
We might have encountered people who suffer from paralysis, have just undergone surgery, or live with conditions such as locked-in syndrome or quadriplegia. They might not be able to speak or move their body. People in these circumstances may find it difficult to communicate, which could have an impact on their health. Consequently, a system that can inform the caretaker of their basic needs is necessary. We can therefore develop a method by which the patient can readily communicate his or her needs through eye gazes. To link these movements with an alphabet that is predefined for each purpose, we can apply the idea of Morse code. This alphabet is then used to find the need in the dataset and create a visual and auditory alert for the nurse/caretaker.
... Kalman filtering was applied to all neural network outputs to make facial landmarks and pose estimations closer to reality (Diaz Barros, Mirbach, Garcia, Varanasi, & Stricker, 2019). The bounding boxes were detected using YOLOv4 (Bochkovskiy, Wang, & Liao, 2020), and enhanced by a convolutional autoencoder for center estimation, an approach that has shown excellent accuracy for pupil segmentation (Zdarsky, Treue, & Esghaei, 2021). ...
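The Kalman filtering of network outputs mentioned above can be illustrated with a scalar random-walk (constant-position) filter on a single noisy landmark coordinate. This is a pared-down sketch: the process and measurement variances below are assumed values, not tuned parameters from the cited work.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=0.05):
    """Smooth a noisy 1-D landmark trajectory with a scalar Kalman filter.

    State model: the true coordinate follows a random walk; each frame
    contributes one noisy measurement z.
    """
    x, p = measurements[0], 1.0  # state estimate and its variance
    smoothed = [x]
    for z in measurements[1:]:
        p += process_var           # predict: uncertainty grows over a frame
        k = p / (p + meas_var)     # Kalman gain: trust in the measurement
        x += k * (z - x)           # update estimate toward the measurement
        p *= (1 - k)               # update: uncertainty shrinks
        smoothed.append(x)
    return smoothed

# A jittery landmark x-coordinate alternating between 0 and 1.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = kalman_1d(noisy)
```

The filtered trace stays between the raw extremes and its swings shrink frame by frame, which is the "closer to reality" effect the excerpt describes for landmark and pose estimates.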
... The video that was not used in neural network training served for the unbiased evaluation of the final model. Data points were classified as eyeblinks or outliers, obtained by a likelihood threshold in line with the procedures of deep learning eye tracking (Zdarsky et al., 2021), together represented 7.95% of the data. ...
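The likelihood-threshold procedure described above — dropping frames whose tracker confidence falls below a cut-off, as happens during eyeblinks — can be sketched in a few lines. The 0.9 cut-off and the sample frames are illustrative assumptions, not values from the cited study.

```python
def filter_by_likelihood(points, threshold=0.9):
    """Split tracked points into kept points and a rejected fraction.

    DLC-style trackers emit (x, y, likelihood) per landmark per frame;
    low-likelihood frames (eyeblinks, occlusions) are treated as
    outliers and excluded from analysis.
    """
    kept = [(x, y) for x, y, p in points if p >= threshold]
    rejected_fraction = 1 - len(kept) / len(points)
    return kept, rejected_fraction

frames = [(10.0, 5.0, 0.99), (10.2, 5.1, 0.98),
          (9.1, 4.0, 0.31),  # likely an eyeblink: low confidence
          (10.1, 5.0, 0.97)]
kept, dropped = filter_by_likelihood(frames)
```

Reporting the rejected fraction (7.95% in the excerpt) alongside the threshold keeps the exclusion criterion transparent and reproducible.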
... Our results demonstrate that i+i, our novel dual-camera system, can estimate gaze to an angular accuracy of approximately 2 degrees from both partners in face-to-face interaction while addressing some of the drawbacks of previous solutions. This is comparable with current deep learning eye tracking using one camera (Rakhmatulin & Duchowski, 2020; Zdarsky et al., 2021), and seems sufficient to accurately differentiate gaze directed towards different parts of the face of each interlocutor during face-to-face interaction. ...
Article
Full-text available
Quantification of face-to-face interaction can provide highly relevant information in cognitive and psychological science research. Current commercial glint-dependent solutions suffer from several disadvantages and limitations when applied in face-to-face interaction, including data loss, parallax errors, the inconvenience and distracting effect of wearables, and/or the need for several cameras to capture each person. Here we present a novel eye-tracking solution, consisting of a dual-camera system used in conjunction with an individually optimized deep learning approach that aims to overcome some of these limitations. Our data show that this system can accurately classify gaze location within different areas of the face of two interlocutors, and capture subtle differences in interpersonal gaze synchrony between two individuals during a (semi-)naturalistic face-to-face interaction.
... Literature review shows that ML methods are extensively used for forecasting problems; however, CO2 solubility based on developed ML techniques has been rarely studied. The modeling techniques based on Bayesian NNs are investigated in [12,13], and the performance of Bayesian NNs is examined in driving problems. ...
Article
Full-text available
In many problems, to analyze the process/metabolism behavior, a model of the system is identified. The main gap is the weakness of current methods vs. noisy environments. The primary objective of this study is to present a more robust method against uncertainties. This paper proposes a new deep learning scheme for modeling and identification applications. The suggested approach is based on non-singleton type-3 fuzzy logic systems (NT3-FLSs) that can support measurement errors and high-level uncertainties. Besides the rule optimization, the antecedent parameters and the level of secondary memberships are also adjusted by the suggested square root cubature Kalman filter (SCKF). In the learning algorithm, the presented NT3-FLSs are deeply learned, and their nonlinear structure is preserved. The designed scheme is applied for modeling carbon capture and sequestration problem using real-world data sets. Through various analyses and comparisons, the better efficiency of the proposed fuzzy modeling scheme is verified. The main advantages of the suggested approach include better resistance against uncertainties, deep learning, and good convergence.
... Several algorithms have been proposed in literature to automatically extract eye blinking from video, once the facial landmarks related to the eyes have been identified [88][89][90]. In this work, the algorithm based on the Eye Aspect Ratio (EAR) proposed in [91] was employed to capture blinking from the facial landmarks tracked by GMFM. ...
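The Eye Aspect Ratio mentioned above has a simple closed form over the six standard eye landmarks: EAR = (|p2−p6| + |p3−p5|) / (2·|p1−p4|), where p1/p4 are the eye corners and p2/p3 (upper lid) pair with p6/p5 (lower lid). The landmark coordinates below are synthetic, chosen only to show the open/closed contrast.

```python
import math

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    The vertical lid distances shrink toward zero as the eye closes,
    while the corner-to-corner distance stays fixed, so EAR drops
    sharply during a blink.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

# Open eye: lids well apart.
ear_open = eye_aspect_ratio((0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2))
# Nearly closed eye: same corners, lids almost touching.
ear_closed = eye_aspect_ratio((0, 0), (2, 0.2), (4, 0.2), (6, 0),
                              (4, -0.2), (2, -0.2))
```

Blink detectors typically flag frames where EAR stays below a fixed threshold (commonly around 0.2, an assumed value here) for a few consecutive frames, which suppresses single-frame tracking glitches.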
Article
Full-text available
Physical and cognitive rehabilitation is deemed crucial to attenuate symptoms and to improve the quality of life in people with neurodegenerative disorders, such as Parkinson’s Disease. Among rehabilitation strategies, a novel and popular approach relies on exergaming: the patient performs a motor or cognitive task within an interactive videogame in a virtual environment. These strategies may widely benefit from being tailored to the patient’s needs and engagement patterns. In this pilot study, we investigated the ability of a low-cost BCI based on single-channel EEG to measure the user’s engagement during an exergame. As a first step, healthy subjects were recruited to assess the system’s capability to distinguish between (1) rest and gaming conditions and (2) gaming at different complexity levels, through Machine Learning supervised models. Both EEG and eye-blink features were employed. The results indicate the ability of the exergame to stimulate engagement and the capability of the supervised classification models to distinguish resting stage from game-play (accuracy > 95%). Finally, different clusters of subject responses throughout the game were identified, which could help define models of engagement trends. This result is a starting point in developing an effectively subject-tailored exergaming system.
... The reason behind the lower number of mobile eye tracking devices is the lack of affordable pupil localization algorithms. Since mobile devices are space-restricted and have no multiple cameras or infrared light sources, an affordable pupil localization algorithm is a must to launch new pupil tracking applications on mobile devices [75]. ...
... Machine Learning (ML) lets machines improve over time. Deep learning [75] uses multi-layered artificial neural networks to analyze and learn from data. These methods mimic the brain's analysis and produce more complex predictions and conclusions as layers are added to the neural network. ...
... High accuracy, flexibility in using different types of video recordings, cost-effectiveness, a user-friendly interface, open-source availability, and scalability make DeepLabCut prevalent across different research areas. With the help of DeepLabCut and low-cost web cameras, pupils can be extracted [75]. This eliminates the need for high-end processing devices to handle infrared camera pictures, since low-cost web cameras are used for the videos. ...
Article
Full-text available
Pupil localization extracts pupil center coordinates from images and videos of the human eye along with the pupillary boundary. Pupil localization essentially plays a major role in identity verification, disease recognition, visual focus of attention (VFOA) tracking, dementia cognitive assessment, and human fatigue detection. However, the process of pupil localization still remains challenging due to various factors, such as poor-quality images, eye makeup, contact lenses, eyelashes, hair strips, eyebrows, closed eyes, and eye saccades. The pupil localization strategies are essentially divided into learning-based and non-learning-based approaches and discussed in detail with the relevant techniques used. This article aims to deliver the essence of current trends in pupil localization and critically discusses the advantages and disadvantages of each method. Hence, this article can be useful to a broad spectrum of readers as a guide to analyze the latest trends in pupil localization.
... Nikhlas et al. worked on deep learning-based video gaze tracking to capture crucial eyelid positions before the eye-blinking event occurs. The authors established a correlation between the events before and after blinking [25]. A similar approach was taken by Ildar et al., where neural networks captured the relative face and pupil positions while the subjects were asked to follow the mouse pointer [26]. ...
Article
Full-text available
Communication in modern days has developed a lot, including wireless networks, Artificial Intelligence (AI) interaction, and human-computer interfaces. People with paralysis and immobility disorders face daily difficulties communicating with others and with gadgets. Eye tracking has proven to promote accessible and accurate interaction compared to other complex automatic interactions. The project aims to develop an electronic eye blinker that integrates with the experimental setup to determine clinical pupil redundancy. The proposed solution provides an eye-tracking tool using an inbuilt laptop webcam that tracks the eye’s pupil in the given screen dimensions and generates heat maps at the tracked locations. These heat maps can denote a letter (in the case of eye writing), an indication to click on that location (in the case of gadget communication), or input for blinking analysis. The proposed method achieves a near-perfect F-measure score of 0.998 to 1.000, which is comparatively more accurate and efficient than existing technologies. The solution also provides an effective method to determine the eye's refractive error, which can replace complex refractometers. Further, the spatially tracked coordinates obtained during the experiment can be used to analyze the patient’s blinking pattern, which, in turn, can detect retinal disorders and their progress during medication. One application of the project is to integrate the derived model with a brain-computer interface system to allow fast communication for the disabled.
... Motion tracking is also critical for developmental research (van Schaik and Dominici, 2020), and DLC is capable of tracking infants (Pérez et al., 2021). DLC can also be used to track gaze (Zdarsky et al., 2021) as well as facial expression (Argyle et al., 2021; Namba et al., 2021), making it a useful tool for the assessment of various neuropathies and for cognitive psychology research. Given the promise and utility of DLC, we predict that it and other markerless motion tracking technologies will see widespread adoption in clinical applications. ...
Article
Full-text available
Clinical assessments of movement disorders currently rely on the administration of rating scales, which, while clinimetrically validated and reliable, rely on clinicians’ subjective analyses, resulting in interrater differences. Intraoperative microelectrode recording for deep brain stimulation targeting similarly relies on clinicians’ subjective evaluations of movement-related neural activity. Digital motion tracking can improve the diagnosis, assessment, and treatment of movement disorders by generating objective, standardized measures of patients’ kinematics. Motion tracking with concurrent neural recording also enables motor neuroscience studies to elucidate the neurophysiology underlying movements. Despite these promises, motion tracking has seen limited adoption in clinical settings due to the drawbacks of conventional motion tracking systems and practical limitations associated with clinical settings. However, recent advances in deep learning based computer vision algorithms have made accurate, robust markerless motion tracking viable in any setting where digital video can be captured. Here, we review and discuss the potential clinical applications and technical limitations of deep learning based markerless motion tracking methods with a focus on DeepLabCut (DLC), an open-source software package that has been extensively applied in animal neuroscience research. We first provide a general overview of DLC, discuss its present usage, and describe the advantages that DLC confers over other motion tracking methods for clinical use. We then present our preliminary results from three ongoing studies that demonstrate the use of DLC for 1) movement disorder patient assessment and diagnosis, 2) intraoperative motor mapping for deep brain stimulation targeting, and 3) intraoperative neural and kinematic recording for basic human motor neuroscience.