Smarter Eyewear: Using Commercial EOG Glasses for Activity Recognition
Shoya Ishimaru
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
ishimaru@m.cs.osakafu-u.ac.jp
Yuji Uema
Graduate School of Media Design
Keio University
Yokohama, 223-8526 Japan
uema@kmd.keio.ac.jp
Kai Kunze
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
firstname.lastname@gmail.com
Koichi Kise
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
kise@cs.osakafu-u.ac.jp
Katsuma Tanaka
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
ishimaru@m.cs.osakafu-u.ac.jp
Masahiko Inami
Graduate School of Media Design
Keio University
Yokohama, 223-8526 Japan
inami@kmd.keio.ac.jp
Permission to make digital or hard copies of part or all of this work for personal
or classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice
and the full citation on the first page. Copyrights for third-party components
of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
UbiComp’14 Adjunct, September 13-17, 2014, Seattle, WA, USA
ACM 978-1-4503-3047-3/14/09.
http://dx.doi.org/10.1145/2638728.2638795
Abstract
Smart eyewear computing is a relatively new subcategory
in ubiquitous computing research, which has enormous
potential. In this paper we present a first evaluation of
soon commercially available Electrooculography (EOG)
glasses (J!NS MEME) for the use in activity recognition.
We discuss the potential of EOG glasses and other smart
eye-wear. Afterwards, we show a first signal level
assessment of MEME, and present a classification task
using the glasses. We are able to distinguish 4 activities
for 2 users (typing, reading, eating and talking) using the
sensor data (EOG and acceleration) from the glasses with
an accuracy of 70 % for 6 sec. windows and up to 100 %
for a 1 minute majority decision. The classification is done
user-independently.
The results encourage us to further explore the EOG
glasses as a platform for more complex, real-life activity
recognition systems.
Author Keywords
Smart Glasses, Electrooculography, Activity Recognition,
Eye Movement Analysis
ACM Classification Keywords
I.5.4 [Pattern Recognition]: Applications - Signal processing
Introduction
With wearable computing receiving increasing interest
from industry, we believe that especially smart eyewear is
a fascinating research area. In this paper, we show that
the sensor data quality obtained by the EOG glasses
seems good enough for activity recognition tasks.

Figure 1: The EOG glasses used for the experiments. The
second picture shows the 3 electrodes touching each side of
the nose and the area between the eyes. The last picture
shows a user wearing J!NS MEME.
The contributions are twofold. First, we want to motivate
that smart eyewear is interesting for ubiquitous computing
applications, as it enables tracking activities that are hard
to observe otherwise, especially with regard to cognitive
tasks.
Second, we evaluate specific smart glasses, a prototype of
J!NS MEME (available to consumers next year), for their
use in activity recognition tasks. We show a signal level
evaluation and a simple classification task of 4 activities
(2 users, 2 x 5 min. per activity). Both indicate that the
device can be used for more complex scenarios.
Finally, we discuss application scenarios and limitations
of smart eyewear.
Toward Using Smarter Eyewear
Since the release of Google Glass, smart eyewear has gained
more and more traction for a wide range of applications
(e.g., the Oculus Rift for virtual reality). This new class of
devices proves to be an interesting platform for ubiquitous
computing, especially for activity recognition. As we
humans perceive most of our environment with senses on
our head (hearing, smell, taste and, most dominantly, our
vision), the head is a very valuable position for sensors.
Tracking eye movements can give us great insights into
the context of the user, from recognizing which documents
a user is reading, to recognizing memory recall, to
assessing expertise level [8, 6, 1, 4, 7].
Hardware
To evaluate the potential of smart eyewear for activity
sensing, we are using an early prototype from J!NS
MEME. The glasses are not a general computing
platform; they are a sensing device. They can stream
sensor data to a computer (e.g. smartphone, laptop,
desktop) using Bluetooth LE. The sensor data includes
vertical and horizontal EOG channels as well as
accelerometer and gyroscope data. The runtime of the
device is 8 hours, enabling long-term recording and, more
importantly, long-term real-time streaming of eye and
head movement. The glasses are unobtrusive and look
mostly like normal eyewear (see Figure 1).
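To make the streamed data concrete, the following sketch shows one possible way to represent a single sensor sample on the receiving computer. It is written in Python; the field names, units and the sampling-rate constant are illustrative assumptions for this setting, not the actual J!NS MEME data format or API.

```python
from dataclasses import dataclass
from typing import Tuple

SAMPLING_RATE_HZ = 100  # assumed value for illustration only


@dataclass
class EyewearSample:
    """One hypothetical sample streamed from the glasses over Bluetooth LE."""
    timestamp: float                   # seconds since the start of the recording
    eog_h: float                       # horizontal EOG channel (raw sensor units)
    eog_v: float                       # vertical EOG channel (raw sensor units)
    acc: Tuple[float, float, float]    # accelerometer (x, y, z)
    gyro: Tuple[float, float, float]   # gyroscope (x, y, z)
```

A recording session is then simply a time-ordered sequence of such samples, which the signal level evaluation and the classification pipeline below operate on.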
Before recording with the device for the first time, the
electrodes should be adjusted slightly to the user's nose
and eyes to obtain an optimal EOG signal. This is a
one-time adjustment owing to the early prototype stage.
Initial Signal Level EOG Evaluation
A manual inspection of the data recorded by the
prototype reveals that detecting blinks and reading
activity seems feasible. A signal example from a user
blinking 7 times is given in Figure 2. We depict the raw
vertical EOG component before any filtering; even then,
the blinks are easily recognizable. Another signal example,
from the horizontal EOG component, is shown in Figure 3.
In this case, the user was reading. Again we depict the
"raw" horizontal component of the EOG signal.
Simple Blink Detection – We apply a very simple peak
detection algorithm to data from 2 users. We consider a
point to be a peak if it has the maximal value and was
preceded (to the left) by a value lower by a constant.
The 2 users wore the smart glasses while sitting in front of
a stationary eye tracker and blinked 30 times each,
naturally. We can detect 58 of the 60 blinks with this very
simple algorithm applied to the "raw" vertical EOG
component signal.
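A minimal sketch of such a delta-based peak detector is given below in Python. It uses the common "commit the maximum once the signal has dropped far enough again" formulation; the exact rule and the threshold used in the experiments are not specified in the paper, so both are assumptions here.

```python
def detect_blinks(eog_v, delta):
    """Detect candidate blink peaks in a raw vertical EOG trace.

    A candidate maximum is committed as a peak once the signal has both risen
    at least `delta` above the preceding trough and fallen `delta` below the
    maximum again, which suppresses small noise bumps. Returns peak indices.
    """
    peaks = []
    cur_max, cur_max_idx = float("-inf"), -1
    cur_min = float("inf")
    looking_for_max = True

    for i, v in enumerate(eog_v):
        if v > cur_max:
            cur_max, cur_max_idx = v, i
        if v < cur_min:
            cur_min = v

        if looking_for_max:
            if v < cur_max - delta:
                # the signal dropped far enough below the candidate maximum:
                # commit it as a blink peak and start tracking the next trough
                peaks.append(cur_max_idx)
                cur_min, looking_for_max = v, False
        else:
            if v > cur_min + delta:
                # the signal rose far enough above the trough:
                # start tracking a new candidate maximum
                cur_max, cur_max_idx, looking_for_max = v, i, True

    return peaks
```

Counting the peaks returned for a recording then gives the blink count reported above.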
Figure 2: The vertical EOG
component (raw signal), while
the user blinks seven times.
Figure 3: The horizontal EOG
component (raw signal), while
the user reads.
Classification Task
For a first impression, we evaluate whether the sensor
data from the glasses can distinguish between more
complex activities. We assume that modes of locomotion
etc. can easily be recognized by the motion sensors alone.
Therefore, we concentrate on tasks performed while sitting
in a common office scenario. We include 4 activities:
typing a text in a word processor, eating a noodle dish,
reading a book and talking to another person.
Method
We use a simple classification method: windowed feature
extraction with a k-nearest neighbor classifier (k = 5)
and a majority decision. 7 features are calculated over a 6
sec. sliding window (2 sec. overlap): the median and
variance of the vertical and horizontal EOG signals and the
variance of each of the 3 accelerometer axes. The
features are used to train the nearest neighbor classifier.
On top of the classification we apply a 1 minute majority
decision for smoothing.
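As a rough illustration of this pipeline, the sketch below computes the seven window features and feeds them to a k-nearest-neighbor classifier followed by a 1-minute majority vote. It is written in Python with numpy and scikit-learn; the array layout, sampling-rate parameter and helper names are our assumptions, not the authors' implementation.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier


def window_features(eog_v, eog_h, acc, fs, win_s=6.0, step_s=4.0):
    """Slide a 6 sec. window (2 sec. overlap, i.e. a 4 sec. step) over the
    synchronized signals and compute the 7 features described above: median
    and variance of the vertical and horizontal EOG, and the variance of
    each accelerometer axis.

    eog_v, eog_h: 1-D arrays; acc: array of shape (n_samples, 3); fs: Hz.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    feats = []
    for start in range(0, len(eog_v) - win + 1, step):
        s = slice(start, start + win)
        feats.append([
            np.median(eog_v[s]), np.var(eog_v[s]),
            np.median(eog_h[s]), np.var(eog_h[s]),
            np.var(acc[s, 0]), np.var(acc[s, 1]), np.var(acc[s, 2]),
        ])
    return np.array(feats)


def classify_frames(train_feats, train_labels, test_feats):
    """Frame-by-frame classification with a k-NN classifier (k = 5)."""
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(train_feats, train_labels)
    return clf.predict(test_feats)


def majority_smooth(frame_labels, frames_per_minute=15):
    """1 minute majority decision: with a 4 sec. step, 15 frames span a minute."""
    return [Counter(frame_labels[i:i + frames_per_minute]).most_common(1)[0][0]
            for i in range(0, len(frame_labels), frames_per_minute)]
```

Training on the windows of one participant and predicting on the windows of the other reproduces the user-independent evaluation reported below.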
Experimental Setup
For the experimental setup, we record data using the J!NS
MEME prototype connected over Bluetooth to a Windows
laptop for 2 participants and 4 activities, each activity for
2 x 5 min. We asked the participants to perform the
activities naturally while sitting at a desk.
Before starting to record with a participant, we need to
adjust the electrodes of the current prototype to the
facial features of the user to make sure we capture a clean
EOG signal. This initial setup step needs to be done only
once per user.
Initial Results and Discussion
We apply the windowed feature extraction and
classification method to the data, performing a
user-independent classification: training with the data of
one user and evaluating with the other user.
For the frame-by-frame classification we reach a correct
classification rate of 71 % on average for a 6 sec. window
(2 sec. overlap). The confusion matrix is given in
Figure 4. Applying the 1 min. majority decision window,
we reach 100 % discrimination between the classes.
Encouraged by the good performance in distinguishing the
4 activities for 2 users in a user-independent way, we will
evaluate the platform to see whether the detection of
specific activities is possible in real-life situations during
long-term deployment. Being able to detect food intake
behavior or learning tasks (e.g. reading) is of particular
interest to us.
Related Work
We follow the early pioneering work of Bulling et al.
and Manabe et al. in using EOG for activity
recognition [9, 3]. Bulling et al. described an approach to
recognize different visual activities, including reading,
solely from gaze behavior recorded with EOG prototypes,
using machine learning techniques in stationary and
mobile settings [3, 2].
There is some work using Google Glass as an activity
recognition platform. This work is complementary to the
approach in this paper, as Google Glass is a very different
device (a full-fledged wearable computer) with different
sensing modalities [5]. Most of the related work uses
dedicated research prototypes, often attaching electrodes
directly to the skin above or below the eye.
Conclusion
We presented an initial evaluation of a smart glasses
prototype for activity recognition. Both the signal level
analysis and the 4-activity classification task show
favorable results. 58 of 60 blinks for 2 users can be
detected by straightforward peak detection. The 4
activities (typing, eating, reading and talking) can be
distinguished perfectly over a 1 minute window.
Smart glasses like J!NS MEME are very unobtrusive and
can easily be confused with "normal" glasses. Yet the
question is whether this type of device can produce a high
enough signal quality to be used for complex activity
recognition systems. The verdict is still out; nevertheless,
our initial results are very positive, indicating the potential
of smart glasses for ubiquitous computing applications.
Figure 4: The confusion matrix in percent for the
frame-by-frame classification using a 6 sec. sliding
window (accuracy 70 %).
Acknowledgements
We would like to thank the research department of J!NS
for supplying us with prototypes. This work is partly
supported by the CREST project.
References
[1] Bulling, A., and Roggen, D. Recognition of visual
memory recall processes using eye movement analysis.
In Proceedings of the 13th international conference on
Ubiquitous computing, ACM (2011), 455–464.
[2] Bulling, A., Ward, J. A., and Gellersen, H. Multimodal
Recognition of Reading Activity in Transit Using
Body-Worn Sensors. ACM Trans. on Applied
Perception 9, 1 (2012), 2:1–2:21.
[3] Bulling, A., Ward, J. A., Gellersen, H., and Tröster, G.
Eye Movement Analysis for Activity Recognition Using
Electrooculography. IEEE Trans. on Pattern Analysis
and Machine Intelligence 33, 4 (Apr. 2011), 741–753.
[4] Eivazi, S., Bednarik, R., Tukiainen, M., von und zu
Fraunberg, M., Leinonen, V., and Jääskeläinen, J. E.
Gaze behaviour of expert and novice
microneurosurgeons differs during observations of
tumor removal recordings. In ETRA, ACM (New York,
NY, USA, 2012), 377–380.
[5] Ishimaru, S., Kunze, K., Kise, K., Weppner, J.,
Dengel, A., Lukowicz, P., and Bulling, A. In the blink
of an eye: combining head motion and eye blink
frequency for activity recognition with google glass. In
Proceedings of the 5th Augmented Human
International Conference, ACM (2014), 15.
[6] Kunze, K., Iwamura, M., Kise, K., Uchida, S., and
Omachi, S. Activity recognition for the mind: Toward
a cognitive quantified self. Computer 46, 10 (2013),
105–108.
[7] Kunze, K., Kawaichi, H., Yoshimura, K., and Kise, K.
The Wordometer: estimating the number of words read
using document image retrieval and mobile eye
tracking. In Proc. ICDAR 2013 (2013).
[8] Kunze, K., Utsumi, Y., Shiga, Y., Kise, K., and
Bulling, A. I know what you are reading: recognition
of document types using mobile eye tracking. In
Proceedings of the 17th annual international
symposium on International symposium on wearable
computers, ACM (2013), 113–116.
[9] Manabe, H., and Fukumoto, M. Full-time wearable
headphone-type gaze detector. In CHI’06 Extended
Abstracts on Human Factors in Computing Systems,
ACM (2006), 1073–1078.