Citation: Sedik, A.; Marey, M.; Mostafa, H. An Adaptive Fatigue Detection System Based on 3D CNNs and Ensemble Models. Symmetry 2023, 15, 1274. https://doi.org/10.3390/sym15061274

Academic Editors: Lorentz Jäntschi and Sergei D. Odintsov

Received: 8 April 2023; Revised: 25 May 2023; Accepted: 5 June 2023; Published: 16 June 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article
An Adaptive Fatigue Detection System Based on 3D CNNs and Ensemble Models

Ahmed Sedik 1,2,*, Mohamed Marey 1 and Hala Mostafa 3,*

1 Smart Systems Engineering Laboratory, College of Engineering, Prince Sultan University, Riyadh 11586, Saudi Arabia; mfmmarey@psu.edu.sa
2 Department of Robotics and Intelligent Machines, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
* Correspondence: ahmed.seddiq@ai.kfs.edu.eg (A.S.); hfmostafa@pnu.edu.sa (H.M.)
Abstract:
Due to the widespread issue of road accidents, researchers have been drawn to investigate
strategies to prevent them. One major contributing factor to these accidents is driver fatigue resulting
from exhaustion. Various approaches have been explored to address this issue, with machine and deep
learning proving to be effective in processing images and videos to detect asymmetric signs of fatigue,
such as yawning, facial characteristics, and eye closure. This study proposes a multistage system
utilizing machine and deep learning techniques. The first stage is designed to detect asymmetric
states, including tiredness and non-vigilance as well as yawning. The second stage is focused on
detecting eye closure. The machine learning approach employs several algorithms, including Support
Vector Machine (SVM), k-Nearest Neighbor (KNN), Multi-layer Perceptron (MLP), Decision Tree
(DT), Logistic Regression (LR), and Random Forest (RF). Meanwhile, the deep learning approach
utilizes 2D and 3D Convolutional Neural Networks (CNNs). The architectures of the proposed deep
learning models are designed after several trials, and their parameters have been selected to achieve
optimal performance. The effectiveness of the proposed methods is evaluated using video and image
datasets, where the video dataset is classified into three states: alert, tired, and non-vigilant, while
the image dataset is classified based on four facial symptoms, including open or closed eyes and
yawning. A more robust system is achieved by combining the image and video datasets, resulting
in multiple classes for detection. Simulation results demonstrate that the 3D CNN proposed in
this study outperforms the other methods, with detection accuracies of 99 percent, 99 percent, and
98 percent for the image, video, and mixed datasets, respectively. Notably, this achievement surpasses
the highest accuracy of 97 percent found in the literature, suggesting that the proposed methods for
detecting drowsiness are indeed effective solutions.
Keywords: fatigue detection; drowsiness detection; deep learning; image processing; machine learning; video processing; yawning detection
1. Introduction
The issue of drivers falling asleep while operating a vehicle has received considerable
attention from numerous researchers in the automotive field, who have dedicated their
efforts toward developing a range of drowsiness detection systems. This is an active area
of research that involves incorporating various components of the Internet of Things (IoT)
and application technology [1], such as sensors, cloud computing, facilities, smartphones,
and distributed data processing. To develop a reliable and effective fatigue detection
system, researchers typically employ three primary methodologies: behavior-based,
vehicle-based, and physical-based approaches [2]. Figure 1 presents an overview of the
distinct characteristics of each of these methodologies.
Figure 1. Overview of the principal methodologies employed in fatigue detection systems.
Behavioral-based methods utilize computer vision and image processing techniques
to evaluate images and videos of the operator, with the objective of assessing their level of
alertness. This strategy is based on analyzing various essential physiological indicators,
such as eye blinking, facial expressions such as lip movements, yawning, eye closure, nod-
ding, and head posture, to ascertain whether the operator is awake, drowsy, or asleep [3].
A different approach, known as vehicle-based systems, involves incorporating a driver
fatigue detection system into the steering wheel of the vehicle using embedded sensors
and devices. This integrated system measures various indicators, including steering wheel
velocity, steering wheel angle, steering wheel movement, hand position, lane departure,
and hand absence [4].
The physical-based fatigue detection methods employ human bio-signals such as
Electroencephalography (EEG), Electrooculography (EOG), and Electrocardiography (ECG)
to monitor the driver behind the steering wheel. In addition, these methods involve other
signs, such as breathing patterns and respiratory and pulse rates [5].
1.1. Related Work
The endeavors toward developing a system for detecting fatigue can be categorized
into two primary groups: conventional algorithms and machine learning algorithms [6].
Among the machine learning algorithms, CNN and SVM are the most commonly employed
and efficient classifiers [7]. Although SVM is quick and precise in analyzing small datasets,
its accuracy and speed decrease when utilized for larger datasets. Conversely, CNN
exhibits high accuracy and stability for both small and large datasets, but its training can
be time-consuming when utilizing CPUs and may incur high processing costs when using
GPUs [7].
The development of a fatigue detection system for driving is a crucial component in
improving driving safety measures. Previous efforts to employ behavioral-based techniques
entailed utilizing software to observe driving behavior by capturing real-time images of
the driver using infrared illumination [8]. This approach considers multiple parameters,
including PERCLOS, face position, blink frequency, nodding frequency, and eye closure
duration, to monitor the driver’s conduct. A classifier evaluates these parameters to
determine the driver’s level of alertness. Currently, this system surpasses other algorithms
as it has the capability to observe and analyze a wide range of factors and collect data in
both daytime and nighttime conditions.
Abtahi et al. [9] devised a simple method using image processing to detect signs
of driver fatigue. This approach involves capturing facial features, such as eye and lip
movements, to identify yawning and ocular languor and then monitoring the driver's face
in the image. The concept identifies changes in the geometric properties of the driver's
face to recognize fatigue. Flores et al. [10] put forth an Advanced Driver Assistance
System (ADAS) in their investigation, which employs a technique for detecting tiredness
by scrutinizing the driver's face and eyes to evaluate their facial expressions and eye
movements. The authors conducted real-time testing of the system under varying
lighting conditions.
Several techniques, such as those described in references [11-16], strive to improve the
precision of fatigue detection by identifying the same facial characteristics as described in
reference [9]. To this end, Sigari et al. [17] developed a method that compares the driver's
present head orientation to a pre-existing facial template and projects the top half of the
driver's facial image horizontally to detect alterations in eye closure and eyelid distance.
A fuzzy-based approach that integrates both parameters was used to automatically activate
the algorithm, and it was found to be effective. Nonetheless, it faces difficulties during
daylight hours and is incapable of detecting fatigue when the driver is wearing glasses.
In a prior investigation [18], a deep neural network architecture was proposed to
address the challenge of identifying drowsiness. The approach involved analyzing the
driver's facial characteristics from RGB footage using a feature fusion architecture developed
with three separate convolutional neural network models: VGG16, InceptionV3, and
ResNet50. However, the accuracy of this approach was found to be limited, with a score of
78%. On the other hand, Galarza et al. [19] presented an interactive system for detecting
drowsiness that incorporated behavioral data, such as eye position, head posture, and
yawning frequency, utilizing an Android smartphone. This method offered several benefits,
including consistent performance across different settings (e.g., lighting conditions and
driver accessories, such as glasses, caps, or hearing aids) and an accurate detection rate of
drowsiness, achieving a detection accuracy of 93.37%.
Bassi et al. [20] conducted a study wherein they developed a fatigue detection system
that employed machine learning techniques, including local binary pattern, SVM, and
Principal Component Analysis (PCA). The primary goal of the system was to enhance the
performance of SVM by selecting the optimal linear, polynomial, and quadratic kernels
and assessing their effectiveness. The SVM model’s accuracy differed for different kernels,
with the polynomial kernel yielding the highest accuracy of 99%. However, this approach
was deemed computationally intensive, and its testing necessitated a considerable amount
of time, despite its efficacy.
In their study, Ouabida et al. [21] proposed a method for detecting driver fatigue using
an optical correlator for driver-eye tracking. Specifically, they employed the Vander Lugt
Correlator (VLC) to estimate the position of the eye center and filter out visual noise in
challenging settings. Their approach yielded an impressive 95% accuracy rate. However, a
notable disadvantage of this technique is its susceptibility to light reflections from external
sources, such as other vehicles or streetlights.
An alternative approach to detecting driver fatigue was introduced by Maior et al. [22],
who utilized computer vision and machine learning techniques to extract eye patterns and
monitor blink movements from video streams. This method employed SVM, RF, and MLP
algorithms and yielded a 94% accuracy rate. Nonetheless, this technique is associated with
relatively lengthy processing times.
In their study, Saurav et al. [23] presented a system that utilizes video streaming
technology to identify occurrences of yawning. To enhance the accuracy of fatigue detection,
advanced deep learning models, specifically Bi-directional Long Short-Term Memory
(Bi-LSTM) and CNN, were employed. The system also leverages a camera feed to capture data
from the mouth area and distinguish between typical mouth movements and indications of
fatigue. The effectiveness of the system was assessed by evaluating it against two datasets,
the Yawning Detection Dataset (YawDD) [24] and the National Tsing Hua University
Yawning Detection Dataset (NTHUDDD) [25]. The outcome of the evaluation showed a
high accuracy rate of 96%.
Biswal et al. [26] have devised an intelligent monitoring system that can detect and
caution against driver fatigue. The system relies on video streaming and blink analysis
techniques to estimate the distance between the eye and the face, as well as the Eye Aspect
Ratio (EAR). An advantage of this system is its ability to integrate with IoT modules for
traffic incident alerts. Another approach proposed by Jeon et al. [27] combines vehicle-based
and behavioral methods. This method captures data from the steering wheel and pedal
pressure sensors and employs Convolutional Neural Networks (CNNs) for classification,
with a reported success rate of 94%. However, this technique's accuracy is susceptible to
fluctuation due to alterations in the road environment, which is its primary limitation.
By contrast, a number of algorithms have incorporated machine learning and deep
learning models to create physical techniques that rely on input from EEG [28,29], ECG [30],
and EOG. These techniques represent a fusion of physical and behavioral strategies. For
example, Ko et al. [31] proposed a system that extracts Differential Entropy (DE) from EEG
signals and applies CNN for classification. This process generates hierarchical features
and class-discriminative information, enabling the detection of sleepiness via a densely
connected layer. Similarly, Zhu et al. [32] employed CNN to gather and analyze data
from wearable EEG sensors. They employed a pre-trained AlexNet model with CNN to
classify the collected EEG signals, resulting in a 94% accuracy rate. However, the primary
difficulty associated with this approach lies in the time delay between acquiring EEG data
and processing it with CNN.
1.2. Novelty and Contributions
Upon examination of the reviewed literature, it becomes evident that a multitude of
feature extraction techniques were employed to extract the essential features from the input
data. Additionally, various strategies were employed in the classification task to attain
optimal detection accuracy. Many researchers investigated fatigue detection methods to
prevent drivers from drowsiness [33-35], and others detected sleepiness from the eye closure
rate [36]. Nevertheless, despite using robust feature extraction and classification methods,
the highest level of detection accuracy attained was 97%, on a dataset collected using a
driving simulator with an ensemble machine learning method. The objective of this article
is to develop an enhanced system that can achieve a higher level of detection accuracy than
the existing systems presented in previous literature. This objective is achieved using a
cascaded decision system which comprises two stages. The first stage is designed to detect
yawning and tiredness, while the second one detects the eye closure state.
In addition, we deployed machine learning and deep learning, which have been employed
in various applications, including emotion recognition [37], speech recognition [38],
image reconstruction [39], and medical diagnosis [40-43]. The present study suggests a
deep learning approach based on 2D and 3D CNNs for identifying fatigue in images and
videos. The proposed study's contributions are as follows:
1. The process of feature extraction from images and videos is accomplished by the utilization of the Haar Cascaded Classifier (HCC).
2. To investigate a cascaded system that detects both tiredness and eye closure. To the best of the authors' knowledge, this is the first such system proposed on this topic.
3. To explore an improved approach to detect fatigue from images based on machine learning methods utilizing SVM, RF, DT, KNN, QDA, MLP, and LR.
4. To design some deep learning models based on 2D and 3D CNNs to handle the input data in RGB modality with specific hyper-parameters.
5. A comparison is carried out among the proposed models, which presents the optimal one based on the accuracy of detection and testing time.
The remaining sections of this paper can be divided into four parts. Section 2 outlines
the materials and proposed methods utilized in this study. Section 3 presents a comprehensive
analysis of the results obtained from these methods. In Section 4, a brief comparison
between the proposed methods and previous works in the literature is discussed. Lastly,
Section 5 serves as the conclusion of this paper.
2. Materials and Methods
This introduces a method for detecting drowsiness, due to fatigue, among drivers
by monitoring their conduct, specifically utilizing videos and images. The system is
composed of four principal phases, including feature extraction, preprocessing, scaling,
and classification. In the feature extraction phase, the Haar Cascaded Classifier is utilized
to extract the driver’s facial and ocular features from the captured videos or images.
Furthermore, the preprocessed facial images are enhanced by utilizing data augmentation
techniques to increase the amount of input data for the classification procedure. Moreover,
the enhanced data is standardized and scaled to be incorporated into the classification
models. The overall architecture of the suggested system is depicted in Figure 2. The
concept of the proposed system is to detect fatigue (tiredness), which is a primary cause
of drowsiness. The algorithm of the proposed system is shown in Algorithm 1. First, the
input video, which is recorded at a rate of 30 frames per second, is fed into the system.
Then, the face is extracted from the video frame in the “Face extraction” step. The extracted
faces are enrolled in to “Face detection” step for face and eye recognition. The classification
process to detect drowsiness from facial symptoms and yawning is performed in Step 3,
while the detection of drowsiness from the detected eyes is performed in Steps 4 and 5. We
deployed HCC to detect them to be fed into the classifiers to detect their state, whether
open or not. The steps of the proposed algorithm are as follows:
Algorithm 1: Steps of Drowsiness Detection in the Proposed System
Input Data
Step 1: Face extraction
Step 2: Face detection
Step 3: Drowsiness detection from face
    if yawn: Alert;
    else if tired: Alert;
    else: go to Step 4;
    end if;
Step 4: Eye detection
Step 5: Drowsiness detection from eye
    if eyes closed: Alert;
    else: go to Step 1;
    end if;
Figure 2. Proposed fatigue detection system (with a sample from database in [24]).
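To make the pipeline concrete, the following is a minimal sketch of the face and eye extraction steps (Steps 1, 2, and 4) using OpenCV's Haar Cascaded Classifier. The cascade files shipped with opencv-python, the detector parameters, and the video file name are illustrative assumptions rather than the paper's exact configuration; only the 162 × 162 input size is taken from Section 2.2.

```python
import cv2

# Load OpenCV's pre-trained Haar cascades for faces and eyes. These ship with
# the opencv-python package; the paper does not name its exact cascade files.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_face_and_eyes(frame):
    """Crop the face region and eye regions from one BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        face = frame[y:y + h, x:x + w]
        # Eyes are searched for only inside the detected face region.
        eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
        eye_crops = [face[ey:ey + eh, ex:ex + ew] for (ex, ey, ew, eh) in eyes]
        # Resize the face crop to the model input size used in Section 2.2.
        results.append((cv2.resize(face, (162, 162)), eye_crops))
    return results

cap = cv2.VideoCapture("driver.mp4")  # hypothetical 30 fps input video
ok, frame = cap.read()
if ok:
    for face, eyes in extract_face_and_eyes(frame):
        print("face:", face.shape, "eyes detected:", len(eyes))
cap.release()
```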
2.1. Image Augmentation
This paper presents a data augmentation technique based on a Generative Adversarial
Network (GAN), which has been shown to be effective in several applications. In this study,
we apply the Convolutional GAN (CGAN) method to augment the input images. Unlike
the standard use of GANs, our study uses them solely for data augmentation and not for
classification purposes.
Specifically, our CGAN comprises a generator network and a discriminator network,
as shown in Figure 3. The generator consists of five convolutional transpose layers and
a denoising fully connected layer to generate feature maps from input images. The dis-
criminator comprises five convolutional layers and a denoising fully connected layer to
reconstruct the original image. The generated images are used to augment the available
dataset, improving the performance of the DLMs.
Figure 3. Proposed data augmentation technique.
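As a rough illustration, the Keras sketch below mirrors the described generator/discriminator pair: a dense layer followed by five Conv2DTranspose layers in the generator, and a Conv2D stack with a dense output in the discriminator. The 4 × 4 kernels, strides, and filter multiples of 64 follow the layer labels in Figure 3, but the latent size, activations, and the 64 × 64 single-channel output resolution are assumptions, not the exact training configuration.

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

LATENT_DIM = 100  # assumed length of the generator's input noise vector

def build_generator():
    # Dense 8*8*64 -> five Conv2DTranspose layers (64*8 ... 1), as in Figure 3.
    return Sequential([
        layers.Dense(8 * 8 * 64, input_dim=LATENT_DIM),
        layers.Reshape((8, 8, 64)),
        layers.Conv2DTranspose(64 * 8, 4, strides=1, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(64 * 4, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(64 * 2, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(1, 4, strides=1, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Conv2D stack (64 ... 64*8) followed by a single-unit dense classifier.
    return Sequential([
        layers.Conv2D(64, 4, strides=1, padding="same", input_shape=(64, 64, 1)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(64 * 2, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.LeakyReLU(0.2),
        layers.Conv2D(64 * 4, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.LeakyReLU(0.2),
        layers.Conv2D(64 * 8, 4, strides=1, padding="same"),
        layers.BatchNormalization(), layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])

g, d = build_generator(), build_discriminator()
print(g.output_shape, d.output_shape)  # (None, 64, 64, 1), (None, 1)
```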
2.2. Classification
Classifiers such as DT, KNN, SVM, RF, MLP with backpropagation, QDA, and
LR are used in the machine learning approach. In addition, the hyperparameters of the
proposed algorithms are chosen automatically using the grid search method [44,45].
Additionally, the deep learning methodology consists of two deep learning models.
The first model, namely the 3D CNN, comprises 16 layers. Its structure encompasses several
tasks, including feature extraction, feature reduction, full connectivity, and classification.
The feature extraction stage employs four 3D convolutional layers with 32, 64, 64, and
128 filters, respectively. Furthermore, the feature reduction task is accomplished through
four 3D max pooling layers, using a window size of 2. Both these tasks operate on the 3D
modality to handle the depth of the input images, where the input video frames are fed
into the proposed model with an input shape of (162, 162, 3). Consequently, this model
intends to account for any color changes that may occur within the frame's color channels.
Another objective is the fully connected task, which is managed by utilizing 3D
global average pooling (3D GAP) to process the output 3D feature map generated by the
sequence of convolutional and pooling layers. The GAP layer is subsequently followed by a
series of dense layers that produce an eigenvector, which is then fed into the classification
layer. This classification layer consists of a dense layer equipped with a softmax activation
function. The architectures of the proposed 2D and 3D models are shown in Figures 4 and 5.
Figure 4. Architecture of the proposed 3D CNN model.
Figure 5. Architecture of the proposed 2D CNN model.
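For concreteness, a minimal Keras sketch of the 3D CNN just described is given below: four Conv3D blocks with 32, 64, 64, and 128 filters, each followed by 3D max pooling with a window size of 2, then 3D GAP, dense layers, and a softmax classifier. The (162, 162, 3) frame shape, the Adam optimizer, and the 0.01 learning rate follow the text and Table 2; the 3 × 3 × 3 kernels, the intermediate dense width, and the number of stacked frames per sample are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

DEPTH = 16        # assumed number of stacked frames per sample
NUM_CLASSES = 3   # alert / tired / non-vigilant in the video scenario

def build_3d_cnn():
    model = models.Sequential([
        layers.Input(shape=(DEPTH, 162, 162, 3)),
        # Feature extraction: four Conv3D layers with 32/64/64/128 filters,
        # each paired with a feature-reduction MaxPooling3D (window size 2).
        layers.Conv3D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        # Fully connected stage: 3D GAP, dense layers, softmax classifier.
        layers.GlobalAveragePooling3D(),
        layers.Dense(128, activation="relu"),  # assumed width
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

build_3d_cnn().summary()
```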
3. Results
This section offers a thorough assessment of the proposed fatigue detection techniques.
Firstly, a detailed depiction of the datasets employed in this research is provided. Secondly,
the evaluation metrics employed to measure the effectiveness of the proposed methods
are presented. Thirdly, the hyperparameters utilized in this study are illustrated.
Moreover, the results obtained from the experiments are outlined, accompanied by discussions
and comments on these findings. Finally, a comparative analysis of the endeavors is con-
ducted to provide a comprehensive comprehension of the strengths and limitations of the
proposed approaches.
The proposed techniques were assessed on a personal computer containing an Intel
Core i7 CPU, an 8 GB NVIDIA GPU, and 32 GB of RAM, running the Windows
11 operating system. These specifications were sufficient to handle the processing of the
videos at a frame rate of 30 frames per second (a time interval of 33.33 ms), as shown in the
simulation results. The programming codes for the proposed approaches were developed
utilizing Python 3.8 with the Keras and TensorFlow toolkits, which were used for the design
of the proposed models, including their layers and learning parameters.
3.1. Datasets
The proposed techniques are executed on the “ULg Multimodality Drowsiness Database,”
commonly abbreviated as DROZY [46]. This database comprises two segments. The first
segment entails collecting data from 14 young, healthy individuals, comprising three
males and eleven females, utilizing video streaming monitoring. The data in this segment
was gathered utilizing Kinect technology and video sensors that are equipped with
Near-Infrared (NIR) sensitivity, resulting in a resolution of 512 × 424 pixels in MP4 format.
Illustrations of NIR intensity scenes generated from video frames are displayed in Figure 6.
This dataset is collected at a rate of 30 frames per second (a time interval of 33.33 ms), with
approximately 17,000 frames per person. Table 1 illustrates the number of frames for each
person involved in this publicly available dataset.
Figure 6. Examples of intensity scenes produced from DROZY video frames [46].
Table 1. Summary of the number of frames in the DROZY dataset [46].
Person Alert Non-Vigilant Tired
1 17,865 15,195 14,185
2 17,899 14,156 13,033
3 17,882 13,540 14,198
6 17,789 13,079 14,272
7 17,898 14,163 13,167
8 17,913 14,198 14,331
10 17,863 17,886 14,204
11 17,866 17,900 14,339
12 17,914 17,861 17,972
13 17,889 17,908 17,889
14 17,902 17,198 17,875
The second dataset is the drowsiness dataset [24]. This dataset comprises images of the
drivers with different eye and face symptoms, including eyes closed or open and yawning
or not. The objective is to distinguish among these states using the proposed deep learning
models. In addition, data augmentation is employed to increase the number of images fed
into the deep-learning models. Figure 7 shows samples of this dataset.
Figure 7. Sample of Drowsiness Dataset [24].
The datasets are shuffled and split into training and testing subsets with an 80/20 ratio.
In addition, the training process is performed using k-fold cross-validation with a k value
of 10.
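The following sketch shows this 80/20 shuffled split and 10-fold cross-validation using scikit-learn; the synthetic feature matrix and the RF classifier stand in for the extracted, scaled features and whichever model is being trained, which are assumptions here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the extracted, scaled feature vectors and labels.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Shuffle and split into 80% training / 20% testing subsets, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

# 10-fold cross-validation (k = 10) on the training subset.
scores = cross_val_score(RandomForestClassifier(), X_train, y_train, cv=10)
print("10-fold CV accuracy: %.3f" % scores.mean())
```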
3.2. Evaluation Metrics
To evaluate the proposed approaches, various evaluation metrics are used, such as
accuracy, recall, precision, F1 score, and Matthews Correlation Coefficient (MCC). These
metrics are defined by Equations (1)-(5) [47]. The term False Negative (FN) refers
to the number of instances in which drowsy states are mistakenly identified as normal.
True Positive (TP) denotes the number of drowsy states correctly identified as such. True
Negative (TN) pertains to the number of normal states accurately identified as normal.
False Positive (FP) refers to the number of normal states inaccurately identified as drowsy.
$$\mathrm{Accuracy} = \frac{\text{No. of correctly detected images}}{\text{Total No. of images}} \times 100 = \frac{\mathrm{TN} + \mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN}} \times 100 \tag{1}$$

$$\mathrm{Recall} = \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = (1 - \mathrm{FNR}) \tag{2}$$

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \tag{3}$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

$$\mathrm{MCC} = \frac{(\mathrm{TP} \times \mathrm{TN}) - (\mathrm{FP} \times \mathrm{FN})}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}} \times 100 \tag{5}$$
3.3. Hyperparameter Setting
This study conducts a grid search algorithm to perform hyperparameter selection for
both machine learning and deep learning approaches. The objective of this process is to
identify the optimal values of hyperparameters that result in maximum accuracy. Table 2
lists the hyperparameters utilized in the proposed methods, which are obtained through
100 iterations for each model. The model training process is carried out iteratively with
various hyperparameter values for the optimizer, learning rate, and activation function
of the deep learning layers to select the optimal hyperparameters for the deep learning
approach. Figure 8 presents the learning curve for accuracy during the hyperparameter
optimization, demonstrating that the model performance improves with each run due
to the variations in the hyperparameter values. Moreover, Table 3 displays some of the
iterations conducted for hyperparameter optimization for the deep learning model, while
Table 2 illustrates the selected hyperparameters for the proposed methods.
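A minimal sketch of the grid-search selection for one classifier (SVM) is shown below. The candidate grid is an illustrative assumption built around the values reported in Table 2 (C = 275, RBF kernel), and the synthetic data stands in for the extracted features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the extracted, scaled features and labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Candidate values bracketing those reported in Table 2.
param_grid = {
    "C": [1, 10, 100, 275, 500],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf", "poly"],
}
search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```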
Table 2. Hyperparameters of The Proposed Methods.
Model Hyperparameters
SVM
C|275
Gamma|‘scale’
Kernel|‘rbf’
RF Number of estimators|79
Criterion|‘entropy’
DT
Criterion|‘gini’
Minimum samples leaf|1
Minimum samples split|2
CCP Alpha|0
KNN
Number of neighbors|1
Leaf size|30
Metric|‘minkowski’
P|2
Weights Distribution|‘uniform’
QDA Tol|0.0001
MLP
Number of hidden layers|2
Hidden layer_sizes|[44,45]
Activation|‘relu’
Maximum number of iterations|200
Optimizer|‘adam’
LR
Optimizer|‘lbfgs’
C|1.0
Fit intercept|True
2D CNN
Optimizer|‘adam’
Epochs|Automatic (using Early_Stopping technique)
Batch size|20
Activation function|‘relu’
Learning rate|0.01
3D CNN
Optimizer|‘adam’
Epochs|Automatic (using Early_Stopping technique)
Batch size|20
Activation function|‘relu’
Learning rate|0.01
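Table 2 determines the number of training epochs automatically via early stopping. A minimal Keras sketch of that mechanism is shown below; the monitored quantity, patience, and epoch ceiling are assumptions, while the batch size of 20 matches Table 2.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stalls, keeping the best weights seen.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

# model, x_train, and y_train come from the surrounding pipeline, e.g.:
# model.fit(x_train, y_train, validation_split=0.2,
#           batch_size=20, epochs=200, callbacks=[early_stop])
```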
Table 3. Sample of iterations for hyperparameter optimization.
Learning Rate Optimizer Accuracy
0.01 adam 1
0.001 rmsprop 1
0.01 adam 1
0.001 rmsprop 0.9990
0.01 rmsprop 0.9990
0.01 rmsprop 0.9990
0.01 adam 0.9990
0.001 adam 0.99904
0.001 rmsprop 0.9910
0.001 adam 0.9900
0.01 adam 0.9900
0.01 adam 0.9890
0.001 adam 0.9880
0.001 adam 0.9871
Figure 8. Visualization of the hyperparameter optimization process (each color represents an iteration).
3.4. Simulation Results
This section comprises the simulation results of the proposed models for both images
and video datasets discussed previously. In addition, the proposed models have been
compared to identify the optimal method. The proposed methods are carried out in
three main scenarios. The first scenario includes three states, alert, tired, and non-vigilant.
The second one comprises four categories: eye open, eye closed, yawn, and no yawn. The
last scenario is a combination of the two scenarios, which comprises seven categories,
including those in the first and second scenarios. The following subsections discuss the
simulation results of each proposed scenario.
3.4.1. Simulation Results of DROZY Video Dataset
This research paper employs a fatigue detection technique that analyzes video frames
to determine the driver’s level of awareness, specifically identifying whether the driver
is alert, tired, or non-vigilant. The objective of this study is to develop an accurate model
with low testing time, utilizing both machine and deep learning models. The machine
learning approach implemented in this study includes SVM, RF, DT, KNN, QDA, MLP,
and LR, while the proposed deep learning approach involves 2D and 3D CNNs. Figures 9
and 10 present the learning curves of the proposed models, demonstrating that the model
performance improves during the training process. Table 4 lists the evaluation metrics of the
proposed models, including precision, recall, accuracy, and F1-score. The simulation results
indicate that the SVM, RF, and KNN machine learning models have superior performance
for fatigue detection from videos, achieving an accuracy of 99%. Furthermore, the proposed
3D CNN deep learning model achieves an accuracy of 99%. Thus, the proposed models
offer effective solutions for video fatigue detection.
Figure 9. Example of machine learning and performance curves (SVM) for DROZY dataset.
Figure 10. Proposed Deep Learning Models for Multiclass Scenario Learning and Performance Curves for DROZY Dataset.
Table 4. Brief Comparison among The Proposed Machine Learning Models for DROZY Dataset.
Model Precision Recall F1-Score Accuracy Testing Time (ms)
SVM 100 99 99 99 187
RF 100 99 100 99 31
DT 92 92 92 92 20
KNN 100 99 100 99 16
QDA 68 68 68 68 40
MLP 95 95 95 95 30
LR 95 95 95 95 10
2D CNN 97 97 97 97 120
3D CNN 100 100 100 100 124
3.4.2. Simulation Results of Drowsiness Image Dataset
Another scenario is proposed in this paper, based on image capturing of the driver.
The proposed models are carried out on an image dataset comprising the symptoms of the
face. This scenario includes eye closure and opening categories, as well as yawning or not.
As in the previous scenario, the proposed machine learning and deep learning models
are carried out. The learning curves of the proposed models are shown in Figures 11 and 12
for machine learning and deep learning, respectively. Table 5 illustrates the evaluation
metrics of the proposed models. This scenario is more challenging than the previous one,
as can be noticed from the performance of the proposed models. The proposed RF and LR
machine learning models achieved an accuracy of 93%. In addition, the proposed 2D and
3D CNNs achieved 95% and 98% accuracy, respectively. Therefore, in this scenario, the
proposed 3D CNN outperforms the other proposed models for detecting facial symptoms.
Figure 11. Example of machine learning and performance curves (SVM) for images dataset.
Figure 12. Proposed Deep Learning Models for Multiclass Scenario Learning and Performance Curves for Images Dataset.
Table 5. Brief Comparison among The Proposed Machine Learning Models for Images Dataset.
Model Precision Recall F1-Score Accuracy Testing Time (ms)
SVM 86 85 85 85 194
RF 93 93 93 93 39
DT 81 82 82 82 25
KNN 86 85 85 85 20
QDA 47 47 47 47 54
MLP 90 90 90 90 32
LR 93 93 93 93 15
2D CNN 95 94 95 95 12.4
3D CNN 98 98 98 98 16.9
3.4.3. Simulation Results of The Combined Dataset
To provide a general and robust scenario, we combined the image and video datasets
and fed them into the proposed models. This scenario provides seven classification cat-
egories, including the driver’s status and face symptoms. These categories can be sum-
marized as follows: alert, non-vigilant, tired, eye open, eye closed, yawn, and no yawn.
The proposed machine and deep learning models are carried out on the combined dataset
to be evaluated. Figures 13 and 14 show the learning curves of the proposed machine
learning and deep learning models, respectively. Furthermore, the simulation results of the
proposed models are illustrated in Table 6. The simulation results reveal that the proposed
RF model outperforms the other machine learning models, while the 3D CNN outperforms
the other deep learning models. The superior models achieved 90% and 98% accuracy for
RF and 3D CNN, respectively. Therefore, they can be considered efficient solutions for
robust conditions.
Figure 13. Example of machine learning and performance curves (SVM) for combined dataset.
Figure 14. Proposed Deep Learning Models for Multiclass Scenario Learning and Performance Curves for Combined Dataset.
Table 6. Brief Comparison among The Proposed Machine Learning Models for Combined Dataset.
Model Precision Recall F1-Score Accuracy Testing Time (ms)
SVM 83 82 82 82 214
RF 90 90 90 90 65
DT 82 82 82 82 53
KNN 87 86 86 86 33
QDA 42 41 41 41 27
MLP 89 88 88 88 41
LR 89 89 89 89 20
2D CNN 91 89 90 89 19
3D CNN 98 98 98 98 25
4. Discussion
4.1. Explainability and Features Impact
In this section, SHAP summary plots were employed to exhibit the ranking of the
features. The SHAP summary plot, as demonstrated in Figure 15, displays the features as
lines, with the dot denoting the impact of these features in a specific instance. The colors on
the plot denote feature correlation, with blue indicating low correlation and red indicating
high correlation. Analysis of the summary plot reveals several key observations: (1) Feature
“2865” exerts a significant influence on the overall decision; (2) an increase in this feature
has a positive effect on the overall score; (3) conversely, a decrease in the value of features
“3255”, “4810”, “6724” has a positive impact on the overall performance of the calculated
score. Features with long tails in the right direction are likely to have a positive effect on
the total decision.
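A minimal sketch of producing a summary plot like Figure 15 is shown below, using the shap package with a tree-based classifier. The synthetic data is a stand-in, treating the numbered features (e.g., "2865") as flattened pixel positions is our assumption, and the per-class list indexing reflects shap's behavior for tree ensembles in versions prior to 0.45.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in features; RF settings follow Table 2 (79 trees, entropy).
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
model = RandomForestClassifier(n_estimators=79, criterion="entropy").fit(X, y)

# TreeExplainer computes per-feature SHAP values for every instance.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one array per class for binary RF

# One dot per instance per feature; color encodes the feature's value.
shap.summary_plot(shap_values[1], X)  # class-1 ("drowsy") attributions
```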
Figure 15. Analysis of The Impact of The Input Features on The Decision.
4.2. Results Discussion and Comparison
This research paper presents a system intended for detecting drowsiness based on
video and image monitoring. The proposed system encompasses three key tasks: feature
extraction, preprocessing, and classification. The Haar Cascaded Classifier (HCC) is uti-
lized to extract the required features, identifying the face and eyes in both images and
video frames. Following this, classification is carried out using both machine learning
and deep learning algorithms. The effectiveness of the proposed system is evaluated
in diverse conditions, including three-class, four-class, and seven-class scenarios. Addi-
tionally, the proposed models are compared to identify the optimal method among the
proposed alternatives. Figure 16 illustrates a comparative analysis of the proposed models
in different scenarios.
Figure 16. Brief Comparison of The Proposed System in different scenarios.
Additionally, to demonstrate the effectiveness of the proposed methods, their performance is compared to existing works in the literature. The strategy of this comparison is to include the works that deployed the same methods proposed in this work and those carried out on the same datasets. Specifically, the deep learning approach introduced in this study is compared to similar methods presented by Maior et al. [22], Biswal et al. [26], Jeon et al. [27], Gwak et al. [48], and Bakheet and colleagues [49]. The algorithm proposed by Gwak et al. [48] is included in this comparison since its experiments also cover video streaming scenarios. A comparison between the proposed deep learning approach and other video streaming-based algorithms is presented in Table 7. The simulation results indicate that the proposed methods exhibit superior performance and outperform previous efforts in this field.
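For orientation, the following is a minimal Keras sketch of a 3D CNN of the general kind compared here; the clip length (16 frames), frame size (64 × 64, grayscale), and layer widths are assumptions for illustration, not the tuned architecture whose results appear in Table 7.

```python
# Hypothetical Keras sketch of a small 3D CNN for video-clip classification.
# Clip shape and layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_3d_cnn(num_classes=3, clip_shape=(16, 64, 64, 1)):
    model = models.Sequential([
        layers.Input(shape=clip_shape),                      # frames x H x W x channels
        layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling3D(pool_size=2),
        layers.Conv3D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling3D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Three-class scenario: alert, tired, non-vigilant.
model = build_3d_cnn(num_classes=3)
model.summary()
```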
Table 7. Brief Comparison of The Proposed Models and Works in The Literature.

Work                     Dataset          Method        Precision  Recall  F1-Score  Accuracy
Maior et al. [22]        DROZY            SVM           -          -       -         94
Biswal et al. [26]       Collected        CNN           97.07      97.13   97.65     97.1
Jeon et al. [27]         ETS2             CNN           93.9       94.74   94.18     94.2
Gwak et al. [48]         Collected        Ensemble ML   97.1       93.5    94.9      95.4
Bakheet et al. [49]      NTHU-DDD         Naïve Bayes   -          -       87.84     85.62
Knapik et al. [34]       Thermal Images   CNN           -          -       87        -
Hemantkumar et al. [33]  Mouth images     Optimization  -          -       -         84.66
Liu [35]                 FDDB             CNN           -          -       -         96.7
Proposed                 DROZY            SVM           100        99      99        99
Proposed                 DROZY            RF            100        99      100       99
Proposed                 DROZY            KNN           100        99      100       99
Proposed                 DROZY            2D CNN        97         97      97        97
Proposed                 DROZY            3D CNN        100        100     100       100
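The precision, recall, F1-score, and accuracy values of the kind reported in Table 7 can be computed from model predictions as in the short scikit-learn sketch below; the macro averaging and the toy label vectors are assumptions for illustration.

```python
# Sketch of computing the Table 7 metrics with scikit-learn;
# y_true / y_pred are placeholder labels, not the paper's data.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 0, 1, 1, 2, 2, 2, 1]   # ground-truth classes (placeholder)
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]   # model outputs (placeholder)

print("Accuracy :", accuracy_score(y_true, y_pred))
# Macro averaging weights every class equally, which suits the multi-class
# scenarios (three-, four-, and seven-class) evaluated in this work.
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```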
5. Conclusions
This paper has addressed the issue of fatigue detection and proposed an artificial
intelligence-based solution to tackle this problem. The suggested system comprises two
main tasks: feature extraction and classification. The HCC algorithm has been employed
to extract the relevant features, and several datasets have been used to evaluate the per-
formance of the proposed classifiers, including SVM, RF, DT, KNN, QDA, MLP, LR, 2D CNN, and 3D CNN. The results indicate that the proposed 3D CNN classifier outperforms
the other models and achieves superior performance. Additionally, the proposed models
exhibit high performance compared with those reported in the literature, making them a
promising and effective solution for fatigue detection.
Furthermore, the authors intend to expand the scope of their research in the future by
pursuing various ideas. Firstly, the proposed models can be implemented to offer a practical
solution to the market. This implementation can be performed using an NVIDIA Jetson Nano.
Secondly, the proposed models can be subjected to validation using additional datasets
that include more categories. Finally, the authors aim to explore the possibility of fatigue
detection using fused features obtained from both visual and medical signal modalities.
Author Contributions:
Investigation, A.S.; Methodology, A.S.; Resources, H.M.; Validation, M.M. All
authors have read and agreed to the published version of the manuscript.
Funding:
Princess Nourah bint Abdulrahman University Researchers Supporting Project number
(PNURSP2023R137), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Data Availability Statement: Data is available on demand.
Acknowledgments:
The authors would like to acknowledge the support of Prince Sultan University
for paying the Article Processing Charges (APC) of this publication.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Abbas, Q.; Alsheddy, A. Driver Fatigue Detection Systems Using Multi-Sensors, Smartphone, and Cloud-Based Computing Platforms: A Comparative Analysis. Sensors 2020, 21, 56. [CrossRef] [PubMed]
2. Ramzan, M.; Khan, H.U.; Awan, S.M.; Ismail, A.; Ilyas, M.; Mahmood, A. A Survey on State-of-the-Art Drowsiness Detection Techniques. IEEE Access 2019, 7, 61904–61919. [CrossRef]
3. Niloy, A.R.; Chowdhury, A.I.; Sharmin, N. A Brief Review on Different Driver's Drowsiness Detection Techniques. Int. J. Image Graph. Signal Process. 2020, 10, 41.
4. Choudhary, P.; Sharma, R.; Singh, G.; Das, S. A Survey Paper on Drowsiness Detection & Alarm System for Drivers. Int. Res. J. Eng. Technol. 2016, 3, 1433–1437.
5. Khan, M.Q.; Lee, S. A Comprehensive Survey of Driving Monitoring and Assistance Systems. Sensors 2019, 19, 2574. [CrossRef] [PubMed]
6. Chen, L.; Zhi, X.; Wang, H.; Wang, G.; Zhou, Z.; Yazdani, A.; Zheng, X. Driver Fatigue Detection via Differential Evolution Extreme Learning Machine Technique. Electronics 2020, 9, 1850. [CrossRef]
7. Fuletra, J.D.; Bosamiya, D. A Survey on Drivers Drowsiness Detection Techniques. Int. J. Recent Innov. Trends Comput. Commun. 2013, 1, 816–819.
8. Bergasa, L.M.; Nuevo, J.; Sotelo, M.A.; Barea, R.; Lopez, M.E. Real-Time System for Monitoring Driver Vigilance. IEEE Trans. Intell. Transp. Syst. 2006, 7, 63–77. [CrossRef]
9. Abtahi, S.; Hariri, B.; Shirmohammadi, S. Driver Drowsiness Monitoring Based on Yawning Detection. In Proceedings of the 2011 IEEE International Instrumentation and Measurement Technology Conference, Hangzhou, China, 10–12 May 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–4.
10. Flores, M.J.; Armingol, J.M.; de la Escalera, A. Real-Time Warning System for Driver Drowsiness Detection Using Visual Information. J. Intell. Robot. Syst. 2010, 59, 103–125. [CrossRef]
11. Lenskiy, A.A.; Lee, J.-S. Driver's Eye Blinking Detection Using Novel Color and Texture Segmentation Algorithms. Int. J. Control. Autom. Syst. 2012, 10, 317–327. [CrossRef]
12. Jo, J.; Lee, S.J.; Kim, J.; Jung, H.G.; Park, K.R. Vision-Based Method for Detecting Driver Drowsiness and Distraction in Driver Monitoring System. Opt. Eng. 2011, 50, 127202. [CrossRef]
13. Malla, A.M.; Davidson, P.R.; Bones, P.J.; Green, R.; Jones, R.D. Automated Video-Based Measurement of Eye Closure for Detecting Behavioral Microsleep. In Proceedings of the 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Buenos Aires, Argentina, 31 August–4 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 6741–6744.
14. Fu, Y.; Fu, H.; Zhang, S. A Novel Safe Life Extension Method for Aircraft Main Landing Gear Based on Statistical Inference of Test Life Data and Outfield Life Data. Symmetry 2023, 15, 880. [CrossRef]
15. Yang, G.; Tang, C.; Liu, X. DualAC2NN: Revisiting and Alleviating Alert Fatigue from the Detection Perspective. Symmetry 2022, 14, 2138. [CrossRef]
16. Xiao, C.; Han, L.; Chen, S. Automobile Driver Fatigue Detection Method Based on Facial Image Recognition under Single Sample Condition. Symmetry 2021, 13, 1195. [CrossRef]
17. Sigari, M.-H.; Fathy, M.; Soryani, M. A Driver Face Monitoring System for Fatigue and Distraction Detection. Int. J. Veh. Technol. 2013, 2013, 263983. [CrossRef]
18. Vijayan, V.; Sherly, E. Real Time Detection System of Driver Drowsiness Based on Representation Learning Using Deep Neural Networks. J. Intell. Fuzzy Syst. 2019, 36, 1977–1985. [CrossRef]
19. Galarza, E.E.; Egas, F.D.; Silva, F.M.; Velasco, P.M.; Galarza, E.D. Real Time Driver Drowsiness Detection Based on Driver's Face Image Behavior Using a System of Human Computer Interaction Implemented in a Smartphone. In Proceedings of the International Conference on Information Technology & Systems, San Francisco, CA, USA, 13–16 December 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 563–572.
20. Arceda, V.E.M.; Nina, J.P.C.; Fabian, K.M.F. A Survey on Drowsiness Detection Techniques. In Proceedings of the Iberoamerican Conference of Computer Human Interaction, Arequipa, Perú, 16–18 September 2020; Volume 15, p. 2021.
21. Ouabida, E.; Essadike, A.; Bouzid, A. Optical Correlator Based Algorithm for Driver Drowsiness Detection. Optik 2020, 204, 164102. [CrossRef]
22. Maior, C.B.S.; das Chagas Moura, M.J.; Santana, J.M.M.; Lins, I.D. Real-Time Classification for Autonomous Drowsiness Detection Using Eye Aspect Ratio. Expert Syst. Appl. 2020, 158, 113505. [CrossRef]
23. Saurav, S.; Mathur, S.; Sang, I.; Prasad, S.S.; Singh, S. Yawn Detection for Driver's Drowsiness Prediction Using Bi-Directional LSTM with CNN Features. In Proceedings of the International Conference on Intelligent Human Computer Interaction, Copenhagen, Denmark, 19–24 July 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 189–200.
24. Abtahi, S.; Omidyeganeh, M.; Shirmohammadi, S.; Hariri, B. YawDD: A Yawning Detection Dataset. In Proceedings of the 5th ACM Multimedia Systems Conference, Singapore, 19–21 March 2014; pp. 24–28.
25. Weng, C.-H.; Lai, Y.-H.; Lai, S.-H. Driver Drowsiness Detection via a Hierarchical Temporal Deep Belief Network. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Berlin/Heidelberg, Germany, 2017; pp. 117–133.
26. Biswal, A.K.; Singh, D.; Pattanayak, B.K.; Samanta, D.; Yang, M.-H. IoT-Based Smart Alert System for Drowsy Driver Detection. Wirel. Commun. Mob. Comput. 2021, 2021, 6627217. [CrossRef]
27. Jeon, Y.; Kim, B.; Baek, Y. Ensemble CNN to Detect Drowsy Driving with In-Vehicle Sensor Data. Sensors 2021, 21, 2372. [CrossRef]
28. Sedik, A.; Marey, M.; Mostafa, H. WFT-Fati-Dec: Enhanced Fatigue Detection AI System Based on Wavelet Denoising and Fourier Transform. Appl. Sci. 2023, 13, 2785. [CrossRef]
29. Kamaruzzaman, M.A.; Othman, M.; Hassan, R.; Rahman, A.W.A.; Mahri, N. EEG Features for Driver's Mental Fatigue Detection: A Preliminary Work. Int. J. Perceptive Cogn. Comput. 2023, 9, 88–94.
30. Feng, W.; Zeng, K.; Zeng, X.; Chen, J.; Peng, H.; Hu, B.; Liu, G. Predicting Physical Fatigue in Athletes in Rope Skipping Training Using ECG Signals. Biomed. Signal Process. Control 2023, 83, 104663. [CrossRef]
31. Alharbey, R.; Dessouky, M.M.; Sedik, A.; Siam, A.I.; Elaskily, M.A. Fatigue State Detection for Tired Persons in Presence of Driving Periods. IEEE Access 2022, 10, 79403–79418. [CrossRef]
32. Zhu, M.; Chen, J.; Li, H.; Liang, F.; Han, L.; Zhang, Z. Vehicle Driver Drowsiness Detection Method Using Wearable EEG Based on Convolution Neural Network. Neural Comput. Appl. 2021, 33, 13965–13980. [CrossRef]
33. Hemantkumar, B.; Shashikant, D. Non-Intrusive Detection and Prediction of Driver's Fatigue Using Optimized Yawning Technique. Mater. Today Proc. 2017, 4, 7859–7866. [CrossRef]
34. Knapik, M.; Cyganek, B. Driver's Fatigue Recognition Based on Yawn Detection in Thermal Images. Neurocomputing 2019, 338, 274–292. [CrossRef]
35. Liu, Z.; Peng, Y.; Hu, W. Driver Fatigue Detection Based on Deeply-Learned Facial Expression Representation. J. Vis. Commun. Image Represent. 2020, 71, 102723. [CrossRef]
36. Devos, H.; Alissa, N.; Lynch, S.; Sadeghi, M.; Akinwuntan, A.E.; Siengsukon, C. Real-Time Assessment of Daytime Sleepiness in Drivers with Multiple Sclerosis. Mult. Scler. Relat. Disord. 2021, 47, 102607. [CrossRef]
37. Siam, A.I.; Soliman, N.F.; Algarni, A.D.; Abd El-Samie, F.E.; Sedik, A. Deploying Machine Learning Techniques for Human Emotion Detection. Comput. Intell. Neurosci. 2022, 2022, 8032673. [CrossRef]
38. El-Moneim, S.A.; Sedik, A.; Nassar, M.A.; El-Fishawy, A.S.; Sharshar, A.M.; Hassan, S.E.A.; Mahmoud, A.Z.; Dessouky, M.I.; El-Banby, G.M.; El-Samie, F.E.A.; et al. Text-Dependent and Text-Independent Speaker Recognition of Reverberant Speech Based on CNN. Int. J. Speech Technol. 2021, 24, 993–1006. [CrossRef]
39. Ali, A.M.; Benjdira, B.; Koubaa, A.; El-Shafai, W.; Khan, Z.; Boulila, W. Vision Transformers in Image Restoration: A Survey. Sensors 2023, 23, 2385. [CrossRef]
40. Hammad, M.; Abd El-Latif, A.A.; Hussain, A.; Abd El-Samie, F.E.; Gupta, B.B.; Ugail, H.; Sedik, A. Deep Learning Models for Arrhythmia Detection in IoT Healthcare Applications. Comput. Electr. Eng. 2022, 100, 108011. [CrossRef]
41. Ibrahim, F.E.; Emara, H.M.; El-Shafai, W.; Elwekeil, M.; Rihan, M.; Eldokany, I.M.; Taha, T.E.; El-Fishawy, A.S.; El-Rabaie, E.M.; Abdellatef, E. Deep Learning-based Seizure Detection and Prediction from EEG Signals. Int. J. Numer. Method. Biomed. Eng. 2022, 38, e3573. [CrossRef]
42. Shoaib, M.R.; Emara, H.M.; Elwekeil, M.; El-Shafai, W.; Taha, T.E.; El-Fishawy, A.S.; El-Rabaie, E.-S.M.; El-Samie, F.E.A. Hybrid Classification Structures for Automatic COVID-19 Detection. J. Ambient Intell. Humaniz. Comput. 2022, 13, 4477–4492. [CrossRef]
43. Daoui, A.; Yamni, M.; Karmouni, H.; Sayyouri, M.; Qjidaa, H.; Motahhir, S.; Jamil, O.; El-Shafai, W.; Algarni, A.D.; Soliman, N.F. Efficient Biomedical Signal Security Algorithm for Smart Internet of Medical Things (IoMTs) Applications. Electronics 2022, 11, 3867. [CrossRef]
44. Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. J. Mach. Learn. Res. 2017, 18, 6765–6816.
45. Crammer, K.; Singer, Y. On the Algorithmic Implementation of Multiclass Kernel-Based Vector Machines. J. Mach. Learn. Res. 2001, 2, 265–292.
46. Massoz, Q.; Langohr, T.; François, C.; Verly, J.G. The ULg Multimodality Drowsiness Database (Called DROZY) and Examples of Use. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–7.
47. Bolboacă, S.D.; Jäntschi, L. Sensitivity, Specificity, and Accuracy of Predictive Models on Phenols Toxicity. J. Comput. Sci. 2014, 5, 345–350. [CrossRef]
48. Gwak, J.; Hirao, A.; Shino, M. An Investigation of Early Detection of Driver Drowsiness Using Ensemble Machine Learning Based on Hybrid Sensing. Appl. Sci. 2020, 10, 2890. [CrossRef]
49. Bakheet, S.; Al-Hamadi, A. A Framework for Instantaneous Driver Drowsiness Detection Based on Improved HOG Features and Naïve Bayesian Classification. Brain Sci. 2021, 11, 240. [CrossRef] [PubMed]
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.