Smarter Eyewear: Using Commercial EOG Glasses for Activity Recognition
Shoya Ishimaru
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
ishimaru@m.cs.osakafu-u.ac.jp
Yuji Uema
Graduate School of Media Design
Keio University
Yokohama, 223-8526 Japan
uema@kmd.keio.ac.jp
Kai Kunze
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
firstname.lastname@gmail.com
Koichi Kise
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
kise@cs.osakafu-u.ac.jp
Katsuma Tanaka
Graduate School of Engineering
Osaka Prefecture University
Sakai, Osaka, Japan
ishimaru@m.cs.osakafu-u.ac.jp
Masahiko Inami
Graduate School of Media Design
Keio University
Yokohama, 223-8526 Japan
inami@kmd.keio.ac.jp
Permission to make digital or hard copies of part or all of this work for personal
or classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice
and the full citation on the first page. Copyrights for third-party components
of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
UbiComp’14 Adjunct, September 13-17, 2014, Seattle, WA, USA
ACM 978-1-4503-3047-3/14/09.
http://dx.doi.org/10.1145/2638728.2638795
Abstract
Smart eyewear computing is a relatively new subcategory
in ubiquitous computing research, which has enormous
potential. In this paper we present a first evaluation of
soon commercially available Electrooculography (EOG)
glasses (J!NS MEME) for the use in activity recognition.
We discuss the potential of EOG glasses and other smart
eye-wear. Afterwards, we show a first signal level
assessment of MEME, and present a classification task
using the glasses. We are able to distinguish 4 activities
for 2 users (typing, reading, eating and talking) using the
sensor data (EOG and acceleration) from the glasses with
an accuracy of 70 % for 6 sec. windows and up to 100 %
for a 1 minute majority decision. The classification is done
user-independently.
The results encourage us to further explore the EOG
glasses as a platform for more complex, real-life activity
recognition systems.
Author Keywords
Smart Glasses, Electrooculography, Activity Recognition,
Eye Movement Analysis
ACM Classification Keywords
I.5.4 [Pattern Recognition]: Applications - Signal processing
Introduction
With wearable computing receiving increasing interest
from industry, we believe that especially smart eyewear is
a fascinating research area. In this paper, we show that
the sensor data quality obtained by the EOG glasses
seems good enough for activity recognition tasks.

Figure 1: The EOG glasses used for the experiments. The
second picture shows the 3 electrodes touching each side of
the nose and the area between the eyes. The last picture
shows a user wearing J!NS MEME.
The contributions are twofold. First, we want to motivate
that smart eyewear is interesting for ubiquitous computing
applications, as it enables tracking activities that are hard
to observe otherwise, especially with regard to cognitive
tasks.
Second, we evaluate specific smart glasses, a prototype of
J!NS MEME (available to consumers next year), for their
use in activity recognition tasks. We show a signal level
evaluation and a simple classification task of 4 activities
(2 users, 2 x 5 min. per activity). Both indicate that the
device can be used for more complex scenarios.
Finally, we discuss application scenarios and limitations
of smart eyewear.
Toward Using Smarter Eyewear
Since the release of Google Glass, smart eyewear has gained
more and more traction for a wide range of applications
(e.g., the Oculus Rift for virtual reality). This new class of
devices proves to be an interesting platform for ubiquitous
computing, especially for activity recognition. As we
humans perceive most of our environment with senses on
our head (hearing, smell, taste and, most dominantly, our
vision), the head is a very valuable position for sensors.
Tracking eye movements can give us great insights into
the context of the user, from recognizing which documents
a user is reading, to recognizing memory recall, to
assessing expertise level [8, 6, 1, 4, 7].
Hardware
To evaluate the potential of smart eyewear for activity
sensing, we are using an early prototype from J!NS
MEME. The glasses are not a general computing
platform; they are a sensing device. They can stream
sensor data to a computer (e.g. smartphone, laptop,
desktop) using Bluetooth LE. The sensor data includes
vertical and horizontal EOG channels as well as
accelerometer and gyroscope data. The runtime of the
device is 8 hours, enabling long-term recording and, more
importantly, long-term real-time streaming of eye and
head movement. The glasses are unobtrusive and look
mostly like normal eyewear (see Figure 1).
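To make the streamed data concrete, the following sketch shows one possible way to represent a single sensor sample on the receiving computer. It is written in Python; the field names, units and the sampling-rate constant are illustrative assumptions for this setting, not the actual J!NS MEME data format or API.

```python
from dataclasses import dataclass
from typing import Tuple

SAMPLING_RATE_HZ = 100  # assumed value for illustration only


@dataclass
class EyewearSample:
    """One hypothetical sample streamed from the glasses over Bluetooth LE."""
    timestamp: float                   # seconds since the start of the recording
    eog_h: float                       # horizontal EOG channel (raw sensor units)
    eog_v: float                       # vertical EOG channel (raw sensor units)
    acc: Tuple[float, float, float]    # accelerometer (x, y, z)
    gyro: Tuple[float, float, float]   # gyroscope (x, y, z)
```

A recording session is then simply a time-ordered sequence of such samples, which the signal level evaluation and the classification pipeline below operate on.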
Before recording with the device for the first time, the
electrodes should be adjusted slightly to the user's nose
and eyes to obtain an optimal EOG signal. This is a
one-time adjustment owing to the early prototype stage.
Initial Signal Level EOG Evaluation
A manual inspection of the data recorded by the
prototype reveals that detecting blinks and reading
activity seems feasible. A signal example from a user
blinking 7 times is given in Figure 2. We depict the raw
vertical EOG component before any filtering; even then,
the blinks are easily recognizable. Another signal example,
from the horizontal EOG component, is shown in Figure 3.
In this case, the user was reading. Again we depict the
"raw" horizontal component of the EOG signal.
Simple Blink Detection – We apply a very simple peak
detection algorithm to data from 2 users. We consider a
point to be a peak if it has the maximal value and was
preceded (to the left) by a value lower by a constant.
The 2 users wore the smart glasses while sitting in front of
a stationary eye tracker and blinked 30 times each,
naturally. We can detect 58 of the 60 blinks with this very
simple algorithm applied to the "raw" vertical EOG
component signal.
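A minimal sketch of such a delta-based peak detector is given below in Python. It uses the common "commit the maximum once the signal has dropped far enough again" formulation; the exact rule and the threshold used in the experiments are not specified in the paper, so both are assumptions here.

```python
def detect_blinks(eog_v, delta):
    """Detect candidate blink peaks in a raw vertical EOG trace.

    A candidate maximum is committed as a peak once the signal has both risen
    at least `delta` above the preceding trough and fallen `delta` below the
    maximum again, which suppresses small noise bumps. Returns peak indices.
    """
    peaks = []
    cur_max, cur_max_idx = float("-inf"), -1
    cur_min = float("inf")
    looking_for_max = True

    for i, v in enumerate(eog_v):
        if v > cur_max:
            cur_max, cur_max_idx = v, i
        if v < cur_min:
            cur_min = v

        if looking_for_max:
            if v < cur_max - delta:
                # the signal dropped far enough below the candidate maximum:
                # commit it as a blink peak and start tracking the next trough
                peaks.append(cur_max_idx)
                cur_min, looking_for_max = v, False
        else:
            if v > cur_min + delta:
                # the signal rose far enough above the trough:
                # start tracking a new candidate maximum
                cur_max, cur_max_idx, looking_for_max = v, i, True

    return peaks
```

Counting the peaks returned for a recording then gives the blink count reported above.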
Figure 2: The vertical EOG
component (raw signal), while
the user blinks seven times.
Figure 3: The horizontal EOG
component (raw signal), while
the user reads.
Classification Task
For a first impression, we evaluate whether the sensor
data from the glasses can distinguish between more
complex activities. We assume that modes of locomotion
etc. can easily be recognized by the motion sensors alone.
Therefore, we concentrate on tasks performed while sitting
in a common office scenario. We include 4 activities:
typing a text in a word processor, eating a noodle dish,
reading a book and talking to another person.
Method
We use a simple classification method: windowed feature
extraction with a k-nearest neighbor classifier (k = 5)
and a majority decision. 7 features are calculated over a 6
sec. sliding window (2 sec. overlap): the median and
variance of the vertical and horizontal EOG signals and the
variance of each of the 3 accelerometer axes. The
features are used to train the nearest neighbor classifier.
On top of the classification we apply a 1 minute majority
decision for smoothing.
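As a rough illustration of this pipeline, the sketch below computes the seven window features and feeds them to a k-nearest-neighbor classifier followed by a 1-minute majority vote. It is written in Python with numpy and scikit-learn; the array layout, sampling-rate parameter and helper names are our assumptions, not the authors' implementation.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier


def window_features(eog_v, eog_h, acc, fs, win_s=6.0, step_s=4.0):
    """Slide a 6 sec. window (2 sec. overlap, i.e. a 4 sec. step) over the
    synchronized signals and compute the 7 features described above: median
    and variance of the vertical and horizontal EOG, and the variance of
    each accelerometer axis.

    eog_v, eog_h: 1-D arrays; acc: array of shape (n_samples, 3); fs: Hz.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    feats = []
    for start in range(0, len(eog_v) - win + 1, step):
        s = slice(start, start + win)
        feats.append([
            np.median(eog_v[s]), np.var(eog_v[s]),
            np.median(eog_h[s]), np.var(eog_h[s]),
            np.var(acc[s, 0]), np.var(acc[s, 1]), np.var(acc[s, 2]),
        ])
    return np.array(feats)


def classify_frames(train_feats, train_labels, test_feats):
    """Frame-by-frame classification with a k-NN classifier (k = 5)."""
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(train_feats, train_labels)
    return clf.predict(test_feats)


def majority_smooth(frame_labels, frames_per_minute=15):
    """1 minute majority decision: with a 4 sec. step, 15 frames span a minute."""
    return [Counter(frame_labels[i:i + frames_per_minute]).most_common(1)[0][0]
            for i in range(0, len(frame_labels), frames_per_minute)]
```

Training on the windows of one participant and predicting on the windows of the other reproduces the user-independent evaluation reported below.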
Experimental Setup
For the experimental setup, we record data using the J!NS
MEME prototype connected over Bluetooth to a Windows
laptop for 2 participants and 4 activities, each activity for
2 x 5 min. We asked the participants to perform the
activities naturally while sitting at a desk.
Before starting to record with a participant, we need to
adjust the electrodes of the current prototype to the
facial features of the user to make sure we capture a clean
EOG signal. This initial setup step needs to be done only
once per user.
Initial Results and Discussion
We apply the windowed feature extraction and
classification method to the data, performing a
user-independent classification: training with the data of
one user and evaluating with the other user.
For the frame-by-frame classification we reach a correct
classification rate of 71 % on average for a 6 sec. window
(2 sec. overlap). The confusion matrix is given in
Figure 4. Applying the 1 min. majority decision window,
we reach 100 % discrimination between the classes.
Encouraged by the good performance in distinguishing the
4 activities for 2 users in a user-independent way, we will
evaluate the platform to see whether the detection of
specific activities is possible in real-life situations during
long-term deployment. Being able to detect food intake
behavior or learning tasks (e.g. reading) is of particular
interest to us.
Related Work
We follow the early pioneering work of Bulling et al.
and Manabe et al. in using EOG for activity
recognition [9, 3]. Bulling et al. described an approach to
recognize different visual activities, including reading,
solely from gaze behavior recorded with EOG prototypes,
using machine learning techniques in stationary and
mobile settings [3, 2].
There is some work using Google Glass as an activity
recognition platform. This work is complementary to the
approach in this paper, as Google Glass is a very different
device (a full-fledged wearable computer) with different
sensing modalities [5]. Most of the related work uses
dedicated research prototypes, often attaching electrodes
directly to the skin above or below the eye.
Conclusion
We presented an initial evaluation of a smart glasses
prototype for activity recognition. Both the signal level
analysis and the 4-activity classification task show
favorable results. 58 of 60 blinks for 2 users can be
detected by straightforward peak detection. The 4
activities (typing, eating, reading and talking) can be
distinguished perfectly over a 1 minute window.
Smart glasses like J!NS MEME are very unobtrusive and
can easily be confused with "normal" glasses. Yet the
question is whether this type of device can produce a high
enough signal quality to be used for complex activity
recognition systems. The verdict is still out; nevertheless,
our initial results are very positive, indicating the potential
of smart glasses for ubiquitous computing applications.
Figure 4: The confusion matrix in percent for the
frame-by-frame classification using a 6 sec. sliding
window (accuracy 70 %).
Acknowledgements
We would like to thank the research department of J!NS
for supplying us with prototypes. This work is partly
supported by the CREST project.
References
[1] Bulling, A., and Roggen, D. Recognition of visual
memory recall processes using eye movement analysis.
In Proceedings of the 13th international conference on
Ubiquitous computing, ACM (2011), 455–464.
[2] Bulling, A., Ward, J. A., and Gellersen, H. Multimodal
Recognition of Reading Activity in Transit Using
Body-Worn Sensors. ACM Trans. on Applied
Perception 9, 1 (2012), 2:1–2:21.
[3] Bulling, A., Ward, J. A., Gellersen, H., and Tröster, G.
Eye Movement Analysis for Activity Recognition Using
Electrooculography. IEEE Trans. on Pattern Analysis
and Machine Intelligence 33, 4 (Apr. 2011), 741–753.
[4] Eivazi, S., Bednarik, R., Tukiainen, M., von und zu
Fraunberg, M., Leinonen, V., and Jääskeläinen, J. E.
Gaze behaviour of expert and novice
microneurosurgeons differs during observations of
tumor removal recordings. In ETRA, ACM (New York,
NY, USA, 2012), 377–380.
[5] Ishimaru, S., Kunze, K., Kise, K., Weppner, J.,
Dengel, A., Lukowicz, P., and Bulling, A. In the blink
of an eye: combining head motion and eye blink
frequency for activity recognition with google glass. In
Proceedings of the 5th Augmented Human
International Conference, ACM (2014), 15.
[6] Kunze, K., Iwamura, M., Kise, K., Uchida, S., and
Omachi, S. Activity recognition for the mind: Toward
a cognitive quantified self. Computer 46, 10 (2013),
105–108.
[7] Kunze, K., Kawaichi, H., Yoshimura, K., and Kise, K.
The Wordometer: estimating the number of words read
using document image retrieval and mobile eye
tracking. In Proc. ICDAR 2013 (2013).
[8] Kunze, K., Utsumi, Y., Shiga, Y., Kise, K., and
Bulling, A. I know what you are reading: recognition
of document types using mobile eye tracking. In
Proceedings of the 17th annual international
symposium on International symposium on wearable
computers, ACM (2013), 113–116.
[9] Manabe, H., and Fukumoto, M. Full-time wearable
headphone-type gaze detector. In CHI’06 Extended
Abstracts on Human Factors in Computing Systems,
ACM (2006), 1073–1078.