Safety Science 142 (2021) 105356
Available online 10 June 2021
0925-7535/© 2021 Elsevier Ltd. All rights reserved.
An automated deep learning based anomaly detection in pedestrian walkways for vulnerable road users safety
Irina V. Pustokhina (a), Denis A. Pustokhin (b), Thavavel Vaiyapuri (c), Deepak Gupta (d), Sachin Kumar (e), K. Shankar (f,*)
(a) Department of Entrepreneurship and Logistics, Plekhanov Russian University of Economics, 117997 Moscow, Russia
(b) Department of Logistics, State University of Management, 109542 Moscow, Russia
(c) College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Saudi Arabia
(d) Department of Computer Science & Engineering, Maharaja Agrasen Institute of Technology, Delhi, India
(e) Department of Computer Science, South Ural State University, Chelyabinsk, Russian Federation
(f) Department of Computer Applications, Alagappa University, Karaikudi, India
ARTICLE INFO
Keywords:
Anomaly detection
Pedestrian walkways
Deep learning
Safety
Mask RCNN
ABSTRACT
Anomaly detection in pedestrian walkways is an important research topic, commonly used to improve pedestrian safety. Owing to the wide use of video surveillance systems and the growing quantity of captured video, manually examining and labeling abnormal events is a tiresome task. An automated surveillance system that detects anomalies has therefore become essential and attracts the attention of computer vision researchers. Deep learning (DL) models have recently gained significant interest in computer vision tasks such as object classification and object detection, and these applications depend on supervised learning, which requires labels. This paper develops an automated deep learning based anomaly detection technique in pedestrian walkways (DLADT-PW) for the safety of vulnerable road users. The goal of the DLADT-PW model is to detect and classify the various anomalies that occur in pedestrian walkways, such as cars, skaters, and jeeps. The DLADT-PW model applies preprocessing as its first step to remove noise and raise image quality. Then, a mask region convolutional neural network (Mask-RCNN) with a densely connected network (DenseNet) model is employed for the detection process. To verify the anomaly detection performance of the DLADT-PW technique, an extensive set of simulations was performed and the outcomes were investigated under distinct aspects. The experimental values confirmed the superior performance of the DLADT-PW technique, which achieved the highest detection accuracy among the compared methods.
1. Introduction
Annually, more than 270 000 pedestrians lose their lives on the world's roads. The capacity to respond to pedestrian safety is an important component of efforts to prevent road traffic injuries. Pedestrian collisions, like other road traffic crashes, should not be accepted as inevitable because they are, in fact, both predictable and preventable. Recent technological advances such as computer vision (CV) and surveillance cameras (CCTV) can be used to protect pedestrians and promote safe walking, but doing so requires an understanding of the nature of the risk factors for pedestrian crashes. This study aims to ensure the safety of pedestrians using computer vision techniques. The extensive deployment of surveillance cameras (CCTV) in public places has made CV-centric models popular among CV researchers. The captured visual data contain rich details that are more accurate than those of alternative data sources such as GPS, mobile communication, and radar signals. Visual data also play a major role in forecasting congestion, accidents, and other abnormal activities by gathering details about road traffic conditions. CCTV has proven helpful in several real-time applications (Zhang et al., 2018; Cocca et al., 2016; Wester and Giesecke, 2019; Tsai, 2014; Rahouti et al., 2020), such as grade-crossing trespassing, industrial safety management, and accidental fall detection.
* Corresponding author.
E-mail addresses: ivpustokhina@yandex.ru (I.V. Pustokhina), dpustokhin@yandex.ru (D.A. Pustokhin), t.thangam@psau.edu.sa (T. Vaiyapuri), deepakgupta@mait.ac.in (D. Gupta), sachinagnihotri16@gmail.com (S. Kumar), drkshankar@ieee.org (K. Shankar).
Contents lists available at ScienceDirect
Safety Science
journal homepage: www.elsevier.com/locate/safety
https://doi.org/10.1016/j.ssci.2021.105356
Received 22 January 2021; Received in revised form 2 May 2021; Accepted 24 May 2021
Numerous computer vision based works have been proposed that concentrate on operations such as data acquisition, feature extraction, scene learning, activity learning, and behavioral learning. These studies address tasks such as scene detection, video processing, anomaly prediction, vehicle prediction and monitoring, multi-camera systems and their challenges, activity analysis, traffic monitoring, and human behavior learning. Here, anomaly prediction is considered a sub-domain of behavior learning from captured visual scenes. The availability of video from public places has stimulated research on video analysis and anomaly prediction (Rahouti et al., 2020). Anomaly prediction approaches learn normal behavior during training, and any significant deviation from normal behavior is considered anomalous. Common examples of anomalies include vehicles on pedestrian pathways, the sudden dispersion of people from a crowd, a person fainting while walking, jaywalking, signal bypassing at a traffic junction, and U-turns of vehicles at red signals.
In general, anomaly prediction approaches apply unsupervised or semi-supervised learning. An important aim of this line of work is to identify the anomaly prediction schemes applied in road traffic settings, concentrating on entities such as vehicles, trespassers, the environment, and communication. It has been pointed out that the scope of such a study should cover the nature of the input data and their representations, the availability of supervision, the classes of anomalies, the capability of the systems in the application context, the anomaly prediction results, and the termination criteria. An anomaly prediction mechanism operates by learning the common data patterns to build a profile of normal behavior. Once the normal patterns are defined, anomalies can be predicted as deviations from them. Hence, the output of the model is a label indicating whether the data are abnormal or normal.
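The profile-then-deviation scheme described above can be illustrated with a deliberately minimal sketch. It assumes that each sample has already been summarized as a small numeric feature vector (the two features used here, speed and blob size, are hypothetical), learns a per-feature mean and standard deviation from normal data only, and labels a test sample abnormal when any feature lies more than an assumed 3 standard deviations from the normal profile. It is a generic illustration of the unsupervised idea, not part of the DLADT-PW model.

```python
import numpy as np


def fit_normal_profile(normal_features: np.ndarray):
    """Learn a 'normal profile': per-feature mean and standard deviation of normal data."""
    mean = normal_features.mean(axis=0)
    std = normal_features.std(axis=0) + 1e-8  # guard against zero variance
    return mean, std


def predict_labels(features: np.ndarray, mean, std, threshold: float = 3.0) -> np.ndarray:
    """Return 1 (abnormal) if any feature deviates more than `threshold` std devs, else 0 (normal)."""
    z = np.abs((features - mean) / std)
    return (z.max(axis=1) > threshold).astype(int)


rng = np.random.default_rng(0)
# Hypothetical per-object features [speed, blob size]; training data contain pedestrians only.
normal_train = rng.normal(loc=[1.0, 0.5], scale=[0.1, 0.05], size=(500, 2))
mean, std = fit_normal_profile(normal_train)

test = np.array([[1.02, 0.51],   # pedestrian-like sample
                 [3.00, 0.50]])  # much faster object, e.g. a biker
print(predict_labels(test, mean, std))  # [0 1]
```

The threshold trades false alarms against missed anomalies; in practice it would be tuned on validation data rather than fixed at 3.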
Recently, diverse models have been deployed for pedestrian detection, which fits bounding boxes to the pedestrians present in an image. Pedestrian detection has gained considerable attention from computer vision researchers and is a significant element of diverse human-centric applications such as driverless cars, automated traffic signaling, and person analysis. However, existing models are unfit to resolve the scale problem: pedestrians appear at very different scales, which remains a key factor limiting the performance of pedestrian detection approaches. Traditional approaches have tackled the scale problem from two directions. First, brute-force data augmentation is used to improve the scale invariance of the model. Second, a single model with multi-scale filters is applied to all samples of diverse sizes. However, the large intra-class variance between large-scale and small-scale samples, whose feature responses differ significantly, is difficult to overcome with a single model. To exploit these drastically different features at different scales, the divide-and-conquer paradigm can be applied (Gong et al., 2014) to resolve the complicated scale-variance problem.
Subsequently, Deep Learning (DL) based anomaly prediction methods were deployed. Initially, Convolutional Neural Networks (CNNs) were employed to classify the presence of objects. However, objects can appear at a huge number of spatial locations and with various aspect ratios in an image, so a large number of regions must be examined, which increases processing complexity. Thus, region-based CNN (R-CNN) and YOLO were developed to find object occurrences efficiently. R-CNN overcomes the problem of selecting a huge number of regions by using a selective search mechanism that extracts a limited, fixed maximum number of regions from each image, called region proposals.
To enhance the safety of pedestrians, this paper designs a novel DL based anomaly detection technique in pedestrian walkways (DLADT-PW). The DLADT-PW model aims to recognize and categorize the different anomalies present in pedestrian walkways, such as cars, skaters, and jeeps. The DLADT-PW model applies preprocessing as its first step to eliminate noise and increase image quality. Then, a mask region convolutional neural network (Mask-RCNN) with a densely connected network (DenseNet) model is applied for the detection process. To verify the anomaly detection performance of the DLADT-PW technique, a wide set of simulations was carried out and the results were inspected under distinct aspects.
The rest of the paper is organized as follows. Section 2 reviews existing works, and Section 3 discusses the proposed model. Then, Section 4 validates the performance of the proposed model, and Section 5 concludes the paper.
2. Literature review
This section surveys existing hand-crafted feature based and deep learning based anomaly detection techniques.
2.1. Hand-Crafted features based method
In general, 3 components could be ltered from hand-engineered
features relied on the anomalous prediction approach. In case of a
feature extraction system, diverse feature descriptions have been
developed (Yang et al., 2020). At this point, low-level trajectory attri-
butes from series of images have been applied to dene the normal
movement patterns. But, the above-mentioned approaches are concen-
trated on anomaly affected by a crowd rather than a single object is
considered as a basic element. Hence, the trajectory features are
depending upon crowd monitoring and these approaches are unt in
handling single object anomaly prediction (Alqaralleh et al., 2020).
Additionally, the trajectory features have minimum-level spatio-tem-
poral features like the histogram of oriented ows (HOF) as well as the
histogram of oriented gradients (HOG). Kratz and Nishino (Kratz and
Nishino, 2009) utilized the dispersion of spatiotemporal gradients to
demonstrate the appropriate motion details in local spatiotemporal
motion. In (Xu et al., 2014), the motion feature depicted by the histo-
gram of optical ow is employed as a low-level feature for motion-
pattern denition.
Kim and Grauman (2009) used a mixture of probabilistic principal component analyzers (MPPCA) to model local activity patterns, with optical flow as the low-level feature. Mahadevan et al. (2010) modeled normal crowd behavior with mixtures of dynamic textures (MDT), and Li et al. (2014) employed a Conditional Random Field (CRF) to combine the results according to the given application. Feng et al. (2017) built a deep Gaussian mixture model (GMM) to model the appearance and motion features extracted with PCA. Moreover, several sparse coding approaches have been employed to encode normal patterns: a dictionary of normal patterns is learned as an over-complete basis set, and the sparse reconstruction cost is used to measure the normality of a test sample. Lu et al. (2013) accelerated the training and testing processes by using several dictionaries to encode normal scale-invariant blocks from multiscale frames. Yu et al. (2017) exploited the low-rank property of the bases in the dictionary learning stage and then applied a weighted sparse reconstruction scheme to measure the abnormality of samples.
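The sparse reconstruction cost used by these dictionary-based methods can be sketched as follows. This is a minimal sketch, not the specific algorithm of Lu et al. (2013) or Yu et al. (2017): it assumes the dictionary D of normal patterns has already been learned (dictionary learning is omitted), computes the sparse code with the generic ISTA (iterative soft-thresholding) solver, and uses an assumed sparsity weight lam = 0.1. The toy dictionary is hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_reconstruction_cost(x, D, lam=0.1, n_iter=300):
    """Sparse-code x over D (columns = normal atoms) with ISTA and return
    0.5*||x - D a||^2 + lam*||a||_1; a high cost means x is poorly explained by normal patterns."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the quadratic term
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, lam * step)
    residual = x - D @ a
    return 0.5 * residual @ residual + lam * np.abs(a).sum()


# Hypothetical toy dictionary: normal patterns only span the first two dimensions.
D = np.array([[1.0, 0.0, 0.7071],
              [0.0, 1.0, 0.7071],
              [0.0, 0.0, 0.0]])
print(sparse_reconstruction_cost(np.array([1.0, 0.0, 0.0]), D))  # small: well explained
print(sparse_reconstruction_cost(np.array([0.0, 0.0, 1.0]), D))  # 0.5: not explained at all
```

A test sample is then flagged as abnormal when its cost exceeds a threshold chosen on normal data.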
2.2. Deep learning based method
Recently, DL frameworks have been employed effectively in many computer vision tasks, including anomaly detection (Alqaralleh et al., 2020). Most of these methods use convolutional autoencoders (AEs) or fully convolutional networks to reconstruct a group of frames or predict new ones. Liu et al. (2018) trained a Fully Convolutional Network (FCN) resembling U-Net on video sequences without anomalies to predict the next frame; the deviation between the predicted frame and the corresponding ground-truth frame is then used to detect anomalies at test time. Ribeiro et al. (2018) used a convolutional AE trained to reconstruct input frame sequences. Since the AE is trained only on normal video sequences, the reconstruction error is used as the anomaly score: owing to the generalization ability of Deep Neural Networks (DNNs), normal events are reconstructed well, whereas anomalous events are expected to produce large reconstruction errors. In such models, the AE extracts features, and anomalies are predicted by estimating the probability of these features.
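The reconstruction-error principle behind these autoencoder-based detectors can be illustrated without a deep network. The sketch below swaps in a linear autoencoder implemented with PCA (the closed-form counterpart of a linear AE under squared error) in place of the convolutional AEs cited above, fits it on normal data only, and flags samples whose reconstruction error exceeds an assumed threshold, here the 99th percentile of the training errors. The class name and the synthetic data are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """PCA stand-in for an autoencoder: encode = project onto the top-k principal
    directions of normal training data, decode = map the code back to input space."""

    def __init__(self, n_components: int):
        self.k = n_components

    def fit(self, x_normal: np.ndarray) -> "LinearAutoencoder":
        self.mean_ = x_normal.mean(axis=0)
        _, _, vt = np.linalg.svd(x_normal - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]                # (k, d) orthonormal rows
        return self

    def reconstruction_error(self, x: np.ndarray) -> np.ndarray:
        centered = x - self.mean_
        codes = centered @ self.components_.T          # encoder
        recon = codes @ self.components_               # decoder
        return ((centered - recon) ** 2).mean(axis=1)  # per-sample MSE used as anomaly score


rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))                        # normal data lie near a 2-D subspace of R^5
train = rng.normal(size=(300, 2)) @ basis + 0.01 * rng.normal(size=(300, 5))
ae = LinearAutoencoder(n_components=2).fit(train)
threshold = np.percentile(ae.reconstruction_error(train), 99)  # assumed decision rule

normal = rng.normal(size=(1, 2)) @ basis
off = rng.normal(size=5)
off -= (off @ ae.components_.T) @ ae.components_       # direction the model cannot reconstruct
anomaly = normal + 3.0 * off / np.linalg.norm(off)
print(ae.reconstruction_error(normal) > threshold, ae.reconstruction_error(anomaly) > threshold)
```

A convolutional AE replaces the linear projection with learned non-linear encoders and decoders, but the scoring and thresholding logic stays the same.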
Sabokrou et al. (2017) developed a deep convolutional neural network (DCNN) whose kernels were learned with a sparse AE (SAE). Taking cubic patches extracted from the original frames as inputs, the feature maps from three layers (intermediate and final) were fed to Gaussian classifiers. Xu et al. (2017) proposed three stacked denoising AEs to learn spatial features, temporal features, and the combination of the two; three one-class SVMs were then used to model the learned features and estimate the anomaly scores. In addition, Sabokrou et al. (2018) embedded a pre-trained CNN as a feature extractor in an FCN, which can extract features for each receptive field without cropping the input frames into patches.
3. The proposed DLADT-PW technique
The workow involved in the presented DLADT-PW technique is
given here. As depicted, the surveillance video is primarily converted
into a set of frames, and anomalies are detected in each frame. Next, the
preprocessing is performed to improve the quality of the image. Fol-
lowed by, the Mask RCNN model is applied for the detection of anom-
alies and DenseNet 169 model is utilized as the baseline network for
Mask RCNN. At last, the anomalies that exist in the frame are success-
fully identied and classied.
3.1. Preprocessing
In general, the collected data are complex and include inappropriate, blurred, and noisy images. The data are therefore cleaned, smoothed, and labeled to enhance the quality of the dataset. First, noisy images are eliminated. Next, median filtering is used to smooth the image noise caused by unwanted objects on the target objects. The image histogram is a typical tool for data cleaning: each image is transformed into a histogram, and a correlation coefficient is used to assess image homogeneity. The correlation coefficient is computed as follows:
$$ r(x,y) = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}} \tag{1} $$
where x and y denote the histograms of the two images, Var[x] and Var[y] denote the variances of x and y, and Cov(x,y) denotes the covariance of x and y. The coefficient ranges from -1 to 1; the higher the value, the more similar the two images. Here, the histogram model is used to remove unsuitable images from the collected set. Before applying the histogram, irregular images, which are often irrelevant, are eliminated. Then, an appropriate reference image is selected for each class, and the correlation coefficient between the reference image and each remaining image is estimated.
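The histogram-based cleaning step can be sketched with Eq. (1) applied to gray-level histograms. This is a minimal sketch under assumptions: 8-bit grayscale frames, 256-bin histograms, and a hypothetical acceptance threshold min_r = 0.5 (the paper does not report a value); `np.corrcoef` computes exactly the Pearson coefficient of Eq. (1). The synthetic frames are illustrative only.

```python
import numpy as np


def gray_histogram(img: np.ndarray, bins: int = 256) -> np.ndarray:
    """Gray-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist.astype(float)


def histogram_correlation(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Eq. (1): r = Cov(x, y) / sqrt(Var[x] Var[y]) for the two images' histograms x and y."""
    return float(np.corrcoef(gray_histogram(img_a), gray_histogram(img_b))[0, 1])


def keep_consistent_images(reference, candidates, min_r=0.5):
    """Indices of candidates whose histogram correlates with the class reference image.
    min_r = 0.5 is a hypothetical threshold, not a value reported in the paper."""
    return [i for i, img in enumerate(candidates)
            if histogram_correlation(reference, img) >= min_r]


rng = np.random.default_rng(0)
reference = rng.integers(0, 128, size=(64, 64))   # dark scene
similar = rng.integers(0, 128, size=(64, 64))     # another dark frame
outlier = rng.integers(128, 256, size=(64, 64))   # much brighter, e.g. an over-exposed frame
print(keep_consistent_images(reference, [similar, outlier]))  # [0]
```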
A major difference between image content and noise lies in the range of gray levels: the visual disturbance in a noisy image is caused by a drastic difference between the gray level of the noise and that of the surrounding pixels (Li et al., 2020). Thus, an image smoothing mechanism that exploits these grayscale variations is applied to remove the noise.
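The median filtering used in this preprocessing step can be sketched as follows. The 3 × 3 window and edge-replicating border handling are assumptions, since the paper does not state the filter size; libraries such as SciPy (`scipy.ndimage.median_filter`) or OpenCV (`cv2.medianBlur`) provide optimized equivalents.

```python
import numpy as np


def median_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel with the median of its size x size neighbourhood (edge-replicated borders)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # Each shifted copy of the image corresponds to one position inside the window.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)


frame = np.full((5, 5), 100, dtype=np.uint8)
frame[2, 2] = 255                    # isolated "salt" noise pixel
print(median_filter(frame)[2, 2])    # 100
```

Unlike mean filtering, the median discards an isolated extreme gray level entirely while leaving sharp step edges unchanged, which is why it suits the impulsive noise described above.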
3.2. Mask R-CNN
In general, Mask RCNN is an elegant, exible, and common approach
used for object prediction, and instance segmentation which is capable
of predicting the objects available in an image at the time of generating a
high-dimension segmentation mask. Feature Pyramid Networks (FPNs)
are employed for object prediction and the rst block architecture of
Mask R-CNN is applied for feature extraction. Hence, Regional Proposal
Network (RPN), is considered to be the second block of Mask RCNN
which distributes the convolutional features in conjunction with the
prediction system and enables the cost-free RP. The RPN is also
employed to Mask RCNN rather than using selective search and the RPN
distribution of convolution feature in full map with the detection sys-
tem. It is also capable of predicting boundary location and object values
in every position and FCN.
To enhance prediction accuracy, Mask RCNN replaces the region of interest (ROI) pooling of Faster RCNN with a layer based on bilinear interpolation, named ROI Align. The ROI Align layer eliminates the harsh quantization of ROI pooling and properly aligns the extracted features with the input (Xu et al., 2020). It computes the exact values of the input features by bilinear interpolation at regularly sampled positions in each ROI bin and then aggregates the results. Mask R-CNN performs three tasks: target detection, bounding-box prediction, and segmentation. When an image is passed through the FPN, five sets of feature maps of different sizes are produced, and candidate regions are generated by the RPN. The classification and detection branch of Mask RCNN runs alongside the mask branch, which captures the spatial structure of objects through the pixel-to-pixel correspondence provided by convolutional layers. Because ROI Pooling causes misalignment between the input and the feature maps, Mask RCNN applies ROI Align, based on bilinear interpolation, to improve model accuracy.
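The bilinear sampling at the heart of ROI Align can be sketched on a single-channel feature map. This is a simplified illustration, not the Mask R-CNN implementation: it assumes 2 × 2 regularly spaced sampling points per bin averaged together, treats integer indices as pixel centers, and ignores channels, batching, and the scaling of ROI coordinates from the image to the feature map. The function names are hypothetical.

```python
import numpy as np


def bilinear_sample(feature_map: np.ndarray, y: float, x: float) -> float:
    """Value of a 2-D feature map at a real-valued location (y, x) by bilinear interpolation."""
    h, w = feature_map.shape
    y = min(max(y, 0.0), h - 1.0)                 # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return float((1 - dy) * top + dy * bottom)


def roi_align_bin(feature_map, y_start, x_start, bin_h, bin_w, samples=2):
    """Average of samples x samples regularly spaced bilinear samples inside one ROI bin.
    No coordinate is rounded, which is what removes ROI pooling's quantization."""
    vals = [bilinear_sample(feature_map,
                            y_start + (i + 0.5) * bin_h / samples,
                            x_start + (j + 0.5) * bin_w / samples)
            for i in range(samples) for j in range(samples)]
    return float(np.mean(vals))


fm = np.arange(16, dtype=float).reshape(4, 4)   # fm[y, x] = 4y + x
print(bilinear_sample(fm, 1.5, 2.25))           # 8.25
print(roi_align_bin(fm, 1.0, 1.0, 1.0, 1.0))    # 7.5
```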
3.2.1. RPN
The RPN takes the feature maps from the convolutional layers and performs a convolution with a 3 × 3 sliding window. Each point of the feature map yields a feature code for the corresponding window region, i.e., a low-dimensional feature vector. Next, the initial regression boxes are ranked by their classification scores, and the corresponding coordinate values are decoded into accurate coordinates according to Eqs. (2) and (3):
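$$ t_x = \frac{x - x_a}{w_a}, \qquad t_y = \frac{y - y_a}{h_a} \tag{2} $$
$$ t_w = \log\left(\frac{w}{w_a}\right), \qquad t_h = \log\left(\frac{h}{h_a}\right) \tag{3} $$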
where $(x_a, y_a)$ denotes the coordinates of the anchor center and $(w_a, h_a)$ the width and height of the anchor; $(x, y)$ denotes the center of the predicted ROI in the original image and $(w, h)$ the width and height of the ROI; $(t_x, t_y)$ are the regression scores of the center coordinates and $(t_w, t_h)$ the regression scores of the width and height on the feature map. In particular, when the intersection-over-union (IoU) between the bounding box of an ROI and the ground truth exceeds the defined threshold, the target in the ROI is labeled as foreground; otherwise, it is labeled as background.
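Eqs. (2) and (3) and the IoU-based foreground/background rule can be written out directly. In this sketch, boxes are (center x, center y, width, height) tuples, `decode` is the standard inverse of Eqs. (2) and (3), and the 0.5 IoU threshold is an assumption, since the text only refers to "the defined threshold". The helper names are hypothetical.

```python
import numpy as np


def encode(box, anchor):
    """Eqs. (2)-(3): regression targets (t_x, t_y, t_w, t_h) of box (x, y, w, h) w.r.t. an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha)])


def decode(t, anchor):
    """Inverse of Eqs. (2)-(3): turn predicted offsets back into image coordinates."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return np.array([xa + tx * wa, ya + ty * ha, wa * np.exp(tw), ha * np.exp(th)])


def iou(a, b):
    """Intersection-over-union of two (center x, center y, width, height) boxes."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def label_roi(roi, gt, fg_threshold=0.5):
    """1 = foreground if IoU with the ground truth reaches the (assumed) threshold, else 0 = background."""
    return int(iou(roi, gt) >= fg_threshold)


anchor = (50.0, 50.0, 20.0, 40.0)
box = (55.0, 45.0, 30.0, 20.0)
print(np.allclose(decode(encode(box, anchor), anchor), box))  # True
print(label_roi((50, 50, 20, 20), (52, 50, 20, 20)))          # 1  (IoU = 360/440, about 0.82)
```

Normalizing the offsets by the anchor size and using logarithms for width and height makes the regression targets comparable across anchors of different scales.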
3.2.2. Loss function
A multi-task loss function with three parts, namely the classification loss of the bounding box, the location regression loss of the bounding box, and the mask loss, is used to train Mask RCNN:
$$ L = L_{cls} + L_{box} + L_{mask} \tag{4} $$
$$ L_{cls} = -\log\left[p_i^{*} p_i + (1 - p_i^{*})(1 - p_i)\right] \tag{5} $$
$$ L_{box} = r\left(t_i - t_i^{*}\right) \tag{6} $$
$$ L_{mask} = \mathrm{Sigmoid}\left(\mathrm{Cls}_k\right) \tag{7} $$
where $p_i$ denotes the predicted probability of the ROI in the classification loss $L_{cls}$, and $p_i^{*}$ is the ground-truth label, equal to 1 when the ROI is foreground and 0 otherwise. $t_i$ denotes the vector of predicted bounding-box coordinates and $t_i^{*}$ the corresponding ground-truth vector in the position regression loss (Eq. (6)), where $r$ is a robust loss function used to estimate the regression error (Xu et al., 2020). For every ROI, the mask branch outputs $K \cdot m^2$ values, encoding $K$ binary masks with a resolution of $m \times m$. The mask loss $L_{mask}$ is defined as the average binary cross-entropy loss, obtained by applying the sigmoid function to every pixel of the ROI; for class $k$ ($\mathrm{Cls}_k$), the mask loss is given in Eq. (7).
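The three loss terms can be sketched for a single ROI as follows. This is an illustrative sketch under assumptions: the robust loss r in Eq. (6) is implemented as smooth L1, a common choice in the Fast/Faster R-CNN family; the mask term implements the average per-pixel binary cross-entropy after a sigmoid described in the text; and the gating by which Mask R-CNN computes the box and mask terms only for foreground ROIs is omitted. The function names are hypothetical.

```python
import numpy as np


def cls_loss(p, p_star):
    """Eq. (5): -log[p* p + (1 - p*)(1 - p)], i.e. binary log loss for one ROI."""
    return float(-np.log(p_star * p + (1 - p_star) * (1 - p)))


def smooth_l1(t, t_star, beta=1.0):
    """Smooth L1 as the robust loss r of Eq. (6), summed over the box coordinates."""
    d = np.abs(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float))
    return float(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).sum())


def mask_loss(logits_k, target_mask, eps=1e-7):
    """Average per-pixel binary cross-entropy after a sigmoid on the m x m mask logits of class k."""
    prob = np.clip(1.0 / (1.0 + np.exp(-logits_k)), eps, 1 - eps)
    bce = -(target_mask * np.log(prob) + (1 - target_mask) * np.log(1 - prob))
    return float(bce.mean())


def total_loss(p, p_star, t, t_star, logits_k, target_mask):
    """Eq. (4): L = L_cls + L_box + L_mask."""
    return cls_loss(p, p_star) + smooth_l1(t, t_star) + mask_loss(logits_k, target_mask)


print(round(cls_loss(0.9, 1), 4))                               # 0.1054
print(smooth_l1([0.0, 0.0], [0.5, 2.0]))                        # 1.625
print(round(mask_loss(np.zeros((2, 2)), np.ones((2, 2))), 4))   # 0.6931 (log 2)
```

Smooth L1 behaves quadratically for small errors and linearly for large ones, so a few badly localized boxes do not dominate the gradient.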
3.3. DenseNet 169 model
The backbone of the Mask RCNN is the DenseNet-169 model. DenseNet builds on ResNet, whose building blocks combine each block's output with the output of the preceding layer, so that the stacked layers learn residual errors. Instead of this summation, DenseNet concatenates the outputs obtained from all previous layers. Consider a single image $x_0$ passed through a CNN. The network comprises $L$ layers, each of which implements a non-linear transformation $H_l(\cdot)$, where $l$ is the layer index. $H_l(\cdot)$ is a composite function of operations such as Batch Normalization (BN), Rectified Linear Units (ReLU), pooling, or convolution (Conv). The output of the $l$th layer is denoted $x_l$. A traditional feed-forward network (FFNN) feeds the output of the $l$th layer as input to the $(l+1)$th layer, giving the layer transition $x_l = H_l(x_{l-1})$. ResNets add a skip connection that bypasses the non-linear transformation with an identity function:
$$ x_l = H_l(x_{l-1}) + x_{l-1} \tag{8} $$
An advantage of ResNets is that the gradient can flow directly from later layers to earlier layers through the identity function. However, the identity function and the output of $H_l$ are combined by summation, which may impede the information flow in the network. DenseNet improves the information flow with a different connectivity pattern (Huang et al., 2017). Fig. 1 shows the outline of the DenseNet architecture. The $l$th layer receives the feature maps of all preceding layers, $x_0, \ldots, x_{l-1}$, as input:
$$ x_l = H_l\left([x_0, x_1, \ldots, x_{l-1}]\right) \tag{9} $$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced in layers $0, \ldots, l-1$. To implement this dense connectivity, the multiple inputs of $H_l(\cdot)$ in Eq. (9) are concatenated into a single tensor. The concatenation in Eq. (9) is not feasible when the size of the feature maps changes. Therefore, down-sampling layers are used, and the network is divided into multiple densely connected blocks. The transition layers between blocks used in this study consist of a BN layer, a Conv layer, and an average pooling layer.
If each function $H_l$ produces $k$ feature maps, the $l$th layer has $k_0 + k \times (l-1)$ input feature maps, where $k_0$ is the number of channels in the input layer. An important difference between DenseNet and earlier networks is that DenseNet can have very narrow layers, e.g., $k = 12$. The hyperparameter $k$ is called the growth rate; each layer adds its $k$ feature maps to the global state of the network, which every subsequent layer can access. A 1 × 1 Conv is used as a bottleneck layer before each 3 × 3 Conv to reduce the number of input feature maps and thus improve computational efficiency; DenseNet is particularly effective with such bottleneck layers. Fig. 2 shows the layers in DenseNet-169.
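The dense connectivity of Eq. (9) and the $k_0 + k \times (l-1)$ channel arithmetic can be checked with a toy dense block. This is a sketch under loud simplifications: each $H_l$ is reduced to a randomly initialized 1 × 1 convolution followed by ReLU (the real DenseNet uses BN-ReLU-Conv with a 1 × 1 bottleneck and a 3 × 3 convolution), there is no training, and the function name and weights are hypothetical.

```python
import numpy as np


def dense_block(x0: np.ndarray, num_layers: int, growth_rate: int, seed: int = 0) -> np.ndarray:
    """Toy dense block on a (channels, H, W) array, following Eq. (9).

    Each layer H_l is simplified to a random 1x1 convolution followed by ReLU.
    """
    rng = np.random.default_rng(seed)
    features = [x0]
    for l in range(1, num_layers + 1):
        inp = np.concatenate(features, axis=0)                         # [x_0, ..., x_{l-1}]
        assert inp.shape[0] == x0.shape[0] + growth_rate * (l - 1)     # k0 + k*(l-1) input maps
        w = rng.normal(scale=0.1, size=(growth_rate, inp.shape[0]))   # hypothetical 1x1 conv weights
        x_l = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)       # k new feature maps
        features.append(x_l)
    return np.concatenate(features, axis=0)


x0 = np.random.default_rng(1).normal(size=(64, 8, 8))   # k0 = 64 input channels
out = dense_block(x0, num_layers=6, growth_rate=32)
print(out.shape)  # (256, 8, 8)
```

For reference, the DenseNet-169 configuration in torchvision uses a growth rate of 32, an initial width of 64 channels, and dense blocks of 6, 12, 32, and 32 layers, so the first block above mirrors its channel count (64 + 6 × 32 = 256) before the transition layer reduces it.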
4. Experimental validation
The proposed model is simulated using Python 3.6.5. For validation, the UCSD Anomaly Detection Dataset (Murugan et al., 2019) is used to train and test the proposed model. The UCSD Anomaly Detection Dataset consists of images captured by a static camera mounted at an elevation, overlooking pedestrian walkways. The crowd density in the walkways is not fixed and ranges from sparse to very crowded. In the normal setting, the videos contain only pedestrians, whereas abnormal events (anomalies) involve the movement of non-pedestrian entities in the walkways. Anomalies in the videos include bikers, skaters, vehicles, and small carts, as well as people walking across the walkways or on the surrounding grass. Details of the dataset are provided in Table 1. The parameter settings are as follows: batch size 64, Adam optimizer, 100 epochs, learning rate 0.001, and ReLU activation function.
Fig. 3 shows a sample set of images from the UCSD dataset; the images contain pedestrians along with some anomalies.
Fig. 4 visualizes the results of the presented DLADT-PW technique on the UCSD dataset. Fig. 4a shows a test image containing a set of pedestrians and some anomalies, and Fig. 4b shows the detection of the two anomalies present in this input frame. The figure shows that the DLADT-PW technique effectively identifies the anomalies.
Table 2 reports the anomaly detection accuracy of the proposed DLADT-PW model on the test004 video sequence. The values indicate proficient anomaly detection performance. For instance, anomaly 1 in frames 078, 091, 092, and 110 is detected with accuracies of 0.95, 0.96, 0.97, and 0.98, respectively, and in the remaining frames (113, 115, 125, 142, 146, 147, 148, 150, 178, 179, and 180) with an identical accuracy of 0.99. Similarly, anomaly 2 in frames 125, 142, and 146 is detected with accuracies of 0.97, 0.98, and 0.97, respectively, and in the remaining frames (147, 148, 150, 178, 179, and 180) with an identical accuracy of 0.99.
Fig. 1. DenseNet Architecture.
Table 3 and Fig. 5 compare the DLADT-PW technique with existing models on the Test004 sequence. The values in the table show that the SF model yields the lowest detection performance of all the methods. The MDT and MPPCA models achieve slightly better outcomes than the SF model, the Fast R-CNN model delivers a moderate outcome, and the RS-CNN model attains a near-optimal detection rate. However, the presented DLADT-PW model achieves the highest detection performance among all the compared methods, tying with RS-CNN on some frames. For instance, on frame 040, the DLADT-PW model obtained an accuracy of 0.950, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved lower accuracies of 0.940, 0.819, 0.768, 0.744, and 0.524, respectively. On frame 046, the DLADT-PW method attained an accuracy of 0.970, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.950, 0.853, 0.752, 0.768, and 0.536, respectively. On frame 106, the DLADT-PW model achieved an accuracy of 0.990, matched by the RS-CNN model (0.990), while the Fast R-CNN, MDT, MPPCA, and SF methods achieved lower accuracies of 0.912, 0.834, 0.723, and 0.513, respectively. Furthermore, on frame 136, the DLADT-PW model obtained an accuracy of 0.980, equal to that of RS-CNN (0.980), whereas the Fast R-CNN, MDT, MPPCA, and SF models achieved 0.946, 0.839, 0.713, and 0.632, respectively.
Similarly, on frame 158, the DLADT-PW model obtained an accuracy of 0.990, the same as RS-CNN (0.990), while the Fast R-CNN, MDT, MPPCA, and SF methods achieved 0.771, 0.783, 0.716, and 0.544, respectively. Moreover, on frame 180, the DLADT-PW and RS-CNN models both reached an accuracy of 0.990, whereas the Fast R-CNN, MDT, MPPCA, and SF models achieved 0.853, 0.852, 0.704, and 0.605, respectively.
Table 4 reports the anomaly detection accuracy of the DLADT-PW model on the test007 video sequence. For instance, anomaly 1 in frames 078, 091, 092, 110, 113, 115, 125, 142, 146, 147, 148, 150, 178, 179, and 180 is detected with accuracies of 0.95, 0.97, 0.99, 0.95, 0.99, 0.99, 0.98, 0.99, 0.98, 0.99, 0.99, 0.96, 0.89, 0.88, and 0.97, respectively. Likewise, anomaly 2 in the same frames is detected with accuracies of 0.96, 0.99, 0.97, 0.95, 0.83, 0.60, 0.98, 0.95, 0.70, 0.80, 0.60, 0.90, 0.86, 0.78, and 0.88, respectively. Anomaly 3 in frames 110, 113, 115, 125, 142, 147, 148, 178, 179, and 180 is detected with accuracies of 0.99, 0.99, 0.99, 0.99, 0.97, 0.80, 0.60, 0.65, 0.70, and 0.75, respectively.
Table 5 and Fig. 6 compare the DLADT-PW technique with existing techniques on the Test007 sequence. The values in the table show that the SF model yields the lowest detection performance of all the techniques. The MDT and MPPCA models achieve somewhat better results than the SF model, the Fast R-CNN model delivers a moderate outcome, and the RS-CNN model attains a near-optimal detection rate. However, the proposed DLADT-PW model achieves higher detection performance than all the other compared models. For instance, on frame 040, the DLADT-PW model attained an accuracy of 0.955, while the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved lower accuracies of 0.940, 0.892, 0.842, 0.758, and 0.636, respectively. Likewise, on frame 046, the DLADT-PW model obtained an accuracy of 0.980, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.975, 0.928, 0.860, 0.658, and 0.709, respectively. On frame 106, the DLADT-PW model obtained an accuracy of 0.860, while the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.833, 0.829, 0.824, 0.704, and 0.652, respectively. Moreover, on frame 136, the DLADT-PW model obtained an accuracy of 0.840, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF methods achieved 0.835, 0.798, 0.748, 0.788, and 0.687, respectively.
Fig. 2. Layers in DenseNet-169.
Table 1
Description of dataset.
Dataset     Testbed    Frames    Time (sec)
UCSDped2    Test007    360       12
            Test004
Also, on frame 158, the DLADT-PW model achieves an accuracy of 0.930, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF approaches reach 0.870, 0.825, 0.811, 0.709, and 0.699, respectively. Finally, on frame 180, the DLADT-PW model reaches 0.867, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models reach 0.830, 0.808, 0.799, 0.769, and 0.705, respectively.
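The frame-by-frame comparison on Test007 can be checked mechanically against Table 5. The sketch below, an illustration rather than the authors' evaluation code, finds the strongest competing method in each frame and the margin by which DLADT-PW exceeds it. `margin_over_best_baseline` is a hypothetical helper, and margins are rounded to three decimals to match the precision of the table.

```python
# Table 5 (Test007): per-frame accuracy, columns in METHODS order.
METHODS = ["DLADT-PW", "RS-CNN", "Fast R-CNN", "MDT", "MPPCA", "Social Force"]
TABLE5 = {
    "040": (0.955, 0.940, 0.892, 0.842, 0.758, 0.636),
    "042": (0.980, 0.955, 0.918, 0.850, 0.743, 0.723),
    "046": (0.980, 0.975, 0.928, 0.860, 0.658, 0.709),
    "051": (0.963, 0.947, 0.917, 0.827, 0.710, 0.640),
    "075": (0.937, 0.923, 0.877, 0.847, 0.737, 0.665),
    "106": (0.860, 0.833, 0.829, 0.824, 0.704, 0.652),
    "123": (0.983, 0.980, 0.913, 0.885, 0.686, 0.699),
    "135": (0.970, 0.960, 0.942, 0.817, 0.680, 0.724),
    "136": (0.840, 0.835, 0.798, 0.748, 0.788, 0.687),
    "137": (0.863, 0.827, 0.810, 0.801, 0.673, 0.741),
    "149": (0.730, 0.680, 0.546, 0.527, 0.672, 0.633),
    "158": (0.930, 0.870, 0.825, 0.811, 0.709, 0.699),
    "177": (0.800, 0.733, 0.689, 0.623, 0.756, 0.747),
    "178": (0.787, 0.723, 0.629, 0.610, 0.724, 0.689),
    "180": (0.867, 0.830, 0.808, 0.799, 0.769, 0.705),
}


def margin_over_best_baseline(table):
    """Hypothetical helper: per frame, (strongest baseline, DLADT-PW minus its accuracy)."""
    result = {}
    for frame, row in table.items():
        ours, baselines = row[0], row[1:]
        best = max(range(len(baselines)), key=lambda i: baselines[i])
        result[frame] = (METHODS[best + 1], round(ours - baselines[best], 3))
    return result


margins = margin_over_best_baseline(TABLE5)
for frame, (rival, gap) in margins.items():
    print(frame, rival, gap)
```

On the transcribed values every margin is positive, the smallest being 0.003 on frame 123 (against RS-CNN); on frames 177 and 178 the closest competitor is MPPCA rather than RS-CNN.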
Table 6 summarizes the average accuracy of the DLADT-PW and existing models on the applied dataset (Shankar and Perumal, 2020). Fig. 7 shows the average accuracy of the DLADT-PW and existing models on the Test004 sequence. The figure shows that the MPPCA and SF models obtain the lowest average accuracies, 0.746 and 0.564, respectively. The Fast R-CNN and MDT models achieve higher average accuracies of 0.851 and 0.811, respectively. Although the RS-CNN approach reaches a competitive average accuracy of 0.975, the presented DLADT-PW model achieves the highest average accuracy of 0.982.
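The Test004 row of Table 6 appears to be the plain mean of each column of Table 3 over its fifteen frames. A short sketch, assuming exactly that averaging rule (the paper does not spell it out), reproduces the reported averages from the transcribed Table 3 values; the `column_means` helper is illustrative.

```python
from statistics import mean

METHODS = ["DLADT-PW", "RS-CNN", "Fast R-CNN", "MDT", "MPPCA", "Social Force"]
# Table 3 (Test004): one row per frame, columns in METHODS order.
TABLE3 = [
    (0.950, 0.940, 0.819, 0.768, 0.744, 0.524),  # 040
    (0.960, 0.940, 0.824, 0.766, 0.782, 0.639),  # 042
    (0.970, 0.950, 0.853, 0.752, 0.768, 0.536),  # 046
    (0.980, 0.960, 0.793, 0.898, 0.759, 0.601),  # 051
    (0.990, 0.990, 0.783, 0.827, 0.752, 0.524),  # 075
    (0.990, 0.990, 0.912, 0.834, 0.723, 0.513),  # 106
    (0.980, 0.970, 0.913, 0.879, 0.714, 0.575),  # 123
    (0.985, 0.975, 0.924, 0.806, 0.775, 0.536),  # 135
    (0.980, 0.980, 0.946, 0.839, 0.713, 0.632),  # 136
    (0.990, 0.985, 0.917, 0.856, 0.754, 0.579),  # 137
    (0.990, 0.990, 0.839, 0.786, 0.709, 0.613),  # 149
    (0.990, 0.990, 0.771, 0.783, 0.716, 0.544),  # 158
    (0.990, 0.990, 0.793, 0.753, 0.770, 0.522),  # 177
    (0.990, 0.985, 0.824, 0.765, 0.802, 0.514),  # 178
    (0.990, 0.990, 0.853, 0.852, 0.704, 0.605),  # 180
]


def column_means(rows, digits=3):
    """Average each method's column over all frames, rounded like Table 6."""
    return {m: round(mean(col), digits) for m, col in zip(METHODS, zip(*rows))}


print(column_means(TABLE3))
```

If the averaging rule were different (for example, weighted by the number of anomalies per frame), the numbers would not match this exactly, so the agreement is evidence for, not proof of, a simple per-frame mean.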
Fig. 3. Sample images.
Fig. 4. (a) Test Image (b) Anomaly Detected Image.
Fig. 8 shows the average accuracy of the DLADT-PW and existing methods on the Test007 sequence. The figure shows that the MPPCA and SF models reach the lowest average accuracies, 0.718 and 0.690, respectively. The Fast R-CNN and MDT models attain somewhat higher average accuracies of 0.821 and 0.778, respectively. Although the RS-CNN method reaches a competitive average accuracy of 0.867, the proposed DLADT-PW method achieves a higher average accuracy of 0.896. These tables and figures show that the presented DLADT-PW technique is an effective tool for detecting anomalies in pedestrian walkways.
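The two average-accuracy discussions reduce to a ranking of the Table 6 rows. A minimal sketch, using only the averages reported in Table 6, sorts the methods per sequence and computes the lead of the top method over the runner-up by simple subtraction; the `ranking` helper is hypothetical.

```python
# Table 6: average accuracy per method on each UCSD Ped2 test sequence.
TABLE6 = {
    "Test004": {"DLADT-PW": 0.982, "RS-CNN": 0.975, "Fast R-CNN": 0.851,
                "MDT": 0.811, "MPPCA": 0.746, "Social Force": 0.564},
    "Test007": {"DLADT-PW": 0.896, "RS-CNN": 0.867, "Fast R-CNN": 0.821,
                "MDT": 0.778, "MPPCA": 0.718, "Social Force": 0.690},
}


def ranking(scores):
    """Method names sorted from highest to lowest average accuracy."""
    return sorted(scores, key=scores.get, reverse=True)


for seq, scores in TABLE6.items():
    order = ranking(scores)
    lead = round(scores[order[0]] - scores[order[1]], 3)  # top method minus runner-up
    print(f"{seq}: {' > '.join(order)} (lead over runner-up: {lead})")
```

Both sequences give the same ordering; the lead of DLADT-PW over RS-CNN is 0.007 on Test004 and 0.029 on Test007.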
Table 2
Accuracy of anomalies in the Test004 sequence.
Frame Number  Anomaly 1  Anomaly 2
078           0.95
091           0.96
092           0.97
110           0.98
113           0.99
115           0.99
125           0.99       0.97
142           0.99       0.98
146           0.99       0.97
147           0.99       0.99
148           0.99       0.99
150           0.99       0.99
178           0.99       0.99
179           0.99       0.99
180           0.99       0.99
Table 3
Result analysis of existing models and the DLADT-PW model for the test case Test004 in terms of accuracy.
Frames  DLADT-PW  RS-CNN  Fast R-CNN  MDT    MPPCA  Social Force
040     0.950     0.940   0.819       0.768  0.744  0.524
042     0.960     0.940   0.824       0.766  0.782  0.639
046     0.970     0.950   0.853       0.752  0.768  0.536
051     0.980     0.960   0.793       0.898  0.759  0.601
075     0.990     0.990   0.783       0.827  0.752  0.524
106     0.990     0.990   0.912       0.834  0.723  0.513
123     0.980     0.970   0.913       0.879  0.714  0.575
135     0.985     0.975   0.924       0.806  0.775  0.536
136     0.980     0.980   0.946       0.839  0.713  0.632
137     0.990     0.985   0.917       0.856  0.754  0.579
149     0.990     0.990   0.839       0.786  0.709  0.613
158     0.990     0.990   0.771       0.783  0.716  0.544
177     0.990     0.990   0.793       0.753  0.770  0.522
178     0.990     0.985   0.824       0.765  0.802  0.514
180     0.990     0.990   0.853       0.852  0.704  0.605
Fig. 5. Result analysis of DLADT-PW model on Test004 dataset.
Table 4
Accuracy of anomalies in the Test007 sequence.
Frame Number  Anomaly 1  Anomaly 2  Anomaly 3
078           0.95       0.96
091           0.97       0.99
092           0.99       0.97
110           0.95       0.95       0.99
113           0.99       0.83       0.99
115           0.99       0.60       0.99
125           0.98       0.98       0.99
142           0.99       0.95       0.97
146           0.98       0.70
147           0.99       0.80       0.80
148           0.99       0.60       0.60
150           0.96       0.90
178           0.89       0.86       0.65
179           0.88       0.78       0.70
180           0.97       0.88       0.75
Table 5
Result analysis of existing models and the DLADT-PW model for the test case Test007 in terms of accuracy.
Frames  DLADT-PW  RS-CNN  Fast R-CNN  MDT    MPPCA  Social Force
040     0.955     0.940   0.892       0.842  0.758  0.636
042     0.980     0.955   0.918       0.850  0.743  0.723
046     0.980     0.975   0.928       0.860  0.658  0.709
051     0.963     0.947   0.917       0.827  0.710  0.640
075     0.937     0.923   0.877       0.847  0.737  0.665
106     0.860     0.833   0.829       0.824  0.704  0.652
123     0.983     0.980   0.913       0.885  0.686  0.699
135     0.970     0.960   0.942       0.817  0.680  0.724
136     0.840     0.835   0.798       0.748  0.788  0.687
137     0.863     0.827   0.810       0.801  0.673  0.741
149     0.730     0.680   0.546       0.527  0.672  0.633
158     0.930     0.870   0.825       0.811  0.709  0.699
177     0.800     0.733   0.689       0.623  0.756  0.747
178     0.787     0.723   0.629       0.610  0.724  0.689
180     0.867     0.830   0.808       0.799  0.769  0.705
5. Conclusion
This paper has designed an automated DLADT-PW model to improve the safety of pedestrians. The DLADT-PW model aims to recognize and categorize the different anomalies present in pedestrian walkways, such as cars, skating, jeeps, etc. First, the surveillance video is converted into a set of frames so that anomalies can be detected in each frame. Next, preprocessing is performed to improve image quality. Then, the Mask R-CNN model is applied to detect anomalies, with the DenseNet-169 model serving as the baseline network for Mask R-CNN. Finally, the anomalies present in each frame are identified and classified. To verify the anomaly detection performance of the DLADT-PW technique, a wide set of simulations was carried out and the results were examined from distinct aspects. The experimental values confirmed the superior characteristics of the DLADT-PW technique, which achieved the highest detection accuracy among the compared methods. In future work, the DLADT-PW technique can be extended to detect anomalies under poor weather conditions. The work can also be implemented in various real-time scenarios, such as the detection of vehicles in pedestrian walkways. In addition, the model can be employed to detect crime scenes such as robbery and quarreling from surveillance cameras.
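The conclusion names DenseNet-169 as the backbone of Mask R-CNN. In a DenseNet (Huang et al., 2017), each layer in a dense block receives the concatenation of all preceding feature maps within that block, so the channel count grows linearly with depth and is compressed only at the transition layers between blocks. The sketch below traces that width under the standard DenseNet-169 configuration from the DenseNet paper (growth rate 32, dense blocks of 6, 12, 32, and 32 layers, a 64-channel stem, and transition compression 0.5); this article does not report these hyperparameters, and `densenet_channels` is a hypothetical helper, not part of the DLADT-PW implementation.

```python
def densenet_channels(block_config=(6, 12, 32, 32), growth_rate=32,
                      stem_channels=64, compression=0.5):
    """Trace the channel width through a DenseNet-BC backbone.

    Each dense layer concatenates `growth_rate` new feature maps onto its input;
    each transition layer between blocks multiplies the width by `compression`.
    Returns the width at the output of every dense block.
    """
    channels = stem_channels
    widths = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate   # dense connectivity: widths accumulate
        widths.append(channels)
        if i != len(block_config) - 1:         # no transition after the last block
            channels = int(channels * compression)
    return widths


print(densenet_channels())  # DenseNet-169 defaults
```

Under these assumptions the final block outputs 1664 channels, which is the feature width a detection head built on a DenseNet-169 backbone would receive; passing `(6, 12, 24, 16)` gives the 1024-channel output of DenseNet-121 instead.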
Fig. 6. Result analysis of DLADT-PW model on Test007 dataset.
Table 6
Average accuracy on the applied dataset.
Sequence  DLADT-PW  RS-CNN  Fast R-CNN  MDT    MPPCA  Social Force
Test004   0.982     0.975   0.851       0.811  0.746  0.564
Test007   0.896     0.867   0.821       0.778  0.718  0.690
Fig. 7. Average accuracy analysis of DLADT-PW method on Test004 dataset.
Fig. 8. Average accuracy analysis of DLADT-PW method on Test007 dataset.
Data Availability Statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Declaration of Competing Interest
The authors declare that they have no conflict of interest. The manuscript was written through the contributions of all authors. All authors have approved the final version of the manuscript.
References
Alqaralleh, B.A.Y., Mohanty, S.N., Gupta, D., Khanna, A., Shankar, K., Vaiyapuri, T., 2020. Reliable Multi-Object Tracking Model Using Deep Learning and Energy Efficient Wireless Multimedia Sensor Networks. IEEE Access 8, 213426-213436. https://doi.org/10.1109/ACCESS.2020.3039695.
Cocca, P., Marciano, F., Alberti, M., 2016. Video surveillance systems to enhance occupational safety: A case study. Saf. Sci. 84, 140-148.
Feng, Y., Yuan, Y., Lu, X., 2017. Learning deep event models for crowd anomaly detection. Neurocomputing 219, 548-556.
Gong, Y., Wang, L., Guo, R., Lazebnik, S., 2014. Multiscale orderless pooling of deep convolutional activation features. ECCV, 392-407.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700-4708.
http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm.
Kim, J., Grauman, K., 2009. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20-25 June 2009, pp. 2921-2928.
Kratz, L., Nishino, K., 2009. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20-25 June 2009, pp. 1446-1453.
Li, Y., Xu, X., Yuan, C., 2020. Enhanced Mask R-CNN for Chinese Food Image Detection. Mathematical Problems in Engineering 2020.
Li, W., Mahadevan, V., Vasconcelos, N., 2014. Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 18-32.
Liu, W., Luo, W., Lian, D., Gao, S., 2018. Future frame prediction for anomaly detection - A new baseline. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18-22 June 2018, pp. 6536-6545.
Lu, C., Shi, J., Jia, J., 2013. Abnormal event detection at 150 FPS in MATLAB. In: Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1-8 December 2013, pp. 2720-2727.
Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N., 2010. Anomaly detection in crowded scenes. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13-18 June 2010, pp. 1975-1981.
Murugan, B.S., Elhoseny, M., Shankar, K., Uthayakumar, J., 2019. Region-based scalable smart system for anomaly detection in pedestrian walkways. Comput. Electr. Eng. 75, 146-160.
Rahouti, A., Lovreglio, R., Gwynne, S., Jackson, P., Datoussaïd, S., Hunt, A., 2020. Human behaviour during a healthcare facility evacuation drills: Investigation of pre-evacuation and travel phases. Saf. Sci. 129, 104754.
Ribeiro, M., Lazzaretti, A.E., Lopes, H.S., 2018. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recognit. Lett. 105, 13-22.
Sabokrou, M., Fayyaz, M., Fathy, M., Klette, R., 2017. Deep-cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Trans. Image Process. 26, 1992-2004.
Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., Klette, R., 2018. Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Comput. Vis. Image Underst. 172, 88-97.
Shankar, K., Perumal, E., 2020. A novel hand-crafted with deep learning features based fusion model for COVID-19 diagnosis and classification using chest X-ray images. Complex Intell. Syst. https://doi.org/10.1007/s40747-020-00216-6.
Tsai, M.K., 2014. Automatically determining accidental falls in field surveying: A case study of integrating accelerometer determination and image recognition. Saf. Sci. 66, 19-26.
Wester, M., Giesecke, J., 2019. Accepting surveillance - An increased sense of security after terror strikes? Saf. Sci. 120, 383-387.
Xu, C., Wang, G., Yan, S., Yu, J., Zhang, B., Dai, S., Li, Y., Xu, L., 2020. Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN. Mathematical Problems in Engineering 2020.
Xu, D., Song, R., Wu, X., Li, N., Feng, W., Qian, H., 2014. Video anomaly detection based on a hierarchical activity discovery within spatio-temporal contexts. Neurocomputing 143, 144-152.
Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Chen, G., Tait, A., Schneider, D., 2020. Automated cattle counting using Mask R-CNN in quadcopter vision system. Comput. Electron. Agric. 171, 105300.
Xu, D., Yan, Y., Ricci, E., Sebe, N., 2017. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput. Vis. Image Underst. 156, 117-127.
Yang, E., Parvathy, V.S., Selvi, P.P., Shankar, K., Seo, C., Joshi, G.P., Yi, O., 2020. Privacy Preservation in Edge Consumer Electronics by Combining Anomaly Detection with Dynamic Attribute-Based Re-Encryption. Mathematics 8 (11), 1871.
Yu, B., Liu, Y., Sun, Q., 2017. A content-adaptively sparse reconstruction method for abnormal events detection with low-rank property. IEEE Trans. Syst. Man Cybern. Syst. 47, 704-716.
Zhang, Z., Trivedi, C., Liu, X., 2018. Automated detection of grade-crossing-trespassing near misses based on computer vision analysis of surveillance video data. Saf. Sci. 110, 276-285.
I.V. Pustokhina et al.
... For decades, highly parameterized complex deep learningbased approaches have achieved significant interest in diverse applications such as action recognition [1,2], image classification [3,4], object detection [5,6], semantic segmentation [7][8][9], and so on [10][11][12][13][14] due to having great performances, i.e., richer representational capacity. This higher representational expertise comes at the cost of training millions of parameters through massive computations that limit the effectiveness of the deep learning-based approaches in low-computational resource-based IoT, edge, microservices, and mobile devices [15][16][17][18][19][20][21][22][23][24][25] in various real-time-based applications, for instance, autonomous driving [21,26,27], surveillance systems [22,23,28,29], real-time computing in edge and IoT devices [5,30] and so on [31,32]. To address this challenge, numerous lightweight networks have been proposed that reduce the required computational resources and execution time during inference by sacrificing performance in terms of accuracy [33][34][35][36][37]. ...
Article
Full-text available
The deployment of deep learning architectures on low-computational resource devices is challenging due to their high number of parameters and computational complexity. These heavy and complex architectures result in increased latency in real-time applications. However, splitting the deep architecture in a way that parallelizes the forward propagation into different subnets deploying into multiple low-computational resource devices, and then, aggregating the predictions may reduce the latency while preserving the performance. In this paper, we propose a novel deep learning architecture called Ensembled Parallel Networks (EnParaNets) that leverage network dissection, knowledge distillation, and ensemble learning strategies to reduce inference time while maintaining, even in some cases, outperforming the baseline accuracy in real-time applications. The methodology involves splitting the original network into N equal-sized blocks, forming N Sub-ParaNets for each block, and enhancing their representations using (A) contrastive knowledge distillation along with reducing Kullback–Leibler divergence between logits distributions of the teacher and student networks, and (B) L2 loss between intermediate representations of the original network and corresponding Sub-ParaNets. Predictive distributions from each Sub-ParaNet are assembled to form the final EnParaNet. The proposed EnParaNet outperforms the baseline models of seven diverse architectures: ResNet56, VGG_13, WRN_40_2, DenseNet, ResNeXt50, MobileNetv2, and ShuffleNetv2 in terms of accuracy while reducing inference time significantly using training methods A and B, respectively. Our proposed EnParaNet enhances ResNet56, VGG_13, WRN_40_2, MobileNetv2, DenseNet, ResNeXt50, and ShuffleNetv2 by 2.69%, 0.24%, 1.95%, 7.69%, 0.33%, 2.13%, and 3.12%, respectively, using training method A, where the inference time is reduced by 45%, 24%, 47%, 31%, 33%, 32%, and 44%, respectively. 
With training method B, EnParaNet achieves improvements of 1.75%, 2.90%, 1.09%, 3.91%, and 1.66%, with inference time reductions of 50%, 42%, 49%, 48%, and 49%, respectively. Moreover, a comprehensive ablation study analyzes the performance of the proposed technique and highlights its effectiveness and challenges. Furthermore, we also evaluate the performance of EnParaNet in transferability and adversarial robustness tasks.
... Pustokhina et al. [13] devised an automated DL-based AD technique in PW (DLADT-PW) for exposed road consumer security. In the first stage, the DLADT-PW method contains pre-processing, which is then used to extract the noise and increase image quality. ...
Article
Full-text available
Anomaly Detection (AD) in Pedestrian Walkways (PWs) is critical to urban security and safety systems. It is widely used to detect abnormal or unusual behaviours, situations, or events in areas dedicated to pedestrian traffic, namely crosswalks, sidewalks, or pedestrian bridges. The main objective is to improve efficiency, safety, and security in the urban environment by identifying deviations and monitoring pedestrian activities from established norms. This kind of AD typically includes surveillance cameras, sensors, and advanced software algorithms. Using advanced machine learning (ML) and computer vision (CV) approaches, this technique continuously monitors the pedestrian area to detect potential threats and irregularities. Deep Learning Assisted AD in Pedestrian Walkways presents a novel and very efficient method to enhance security and safety in urban environments. Therefore, this study designs an Intelligent Multi-Group Marine Predator Algorithm with Deep Learning Assisted Anomaly Detection (MMPADL-AD) in Pedestrian Walkways. The MMPADL-AD system aims to ensure security in PWs via the AD process. The MMPADL-AD technique incorporates a NASNet feature extractor that proficiently extracts high-level features from surveillance data, allowing a deep understanding of pedestrian behaviours. Besides, the MMPADL-AD technique applies convolutional long short-term memory (ConvLSTM), inheriting the benefits of convolutional neural networks) and LSTM for the AD process. Finally, the MMPA has been used for the hyperparameter tuning mechanism, which optimizes the model’s performance, assuring accuracy and adaptability. Benchmark data accompanied an extensive set of experiments to ensure the higher effectiveness of the MMPADL-AD approach. The experimental values highlighted the supremacy of the MMPADL-AD approach over other DL methods.
... The average results of the HHAODL-ODC technique on two datasets are given in Table 3; Fig. 8 [10,[22][23][24]. The results identified that the HHAODL-ODC technique properly categorized the class labels. ...
Article
Full-text available
Video surveillance has played a pivotal role in ensuring the safety and security of the public across different sectors, including retail, transportation, and urban environments. Object detection (OD) in surveillance videos includes identifying and localizing certain objects, namely vehicles, persons, or suspicious items. Conventional approaches often struggle with variations in complex backgrounds, lighting conditions, and occlusions. Deep learning (DL)-based techniques, particularly convolutional neural network (CNN), have shown outstanding performance in attaining accurate OD and handling these challenges. Despite the success of DL-based OD in surveillance videos, various challenges exist such as dealing with variations in camera viewpoints, recognizing small objects, ensuring robustness to adverse weather conditions, and handling occlusions. This study presents a Hybrid Harris Hawk-Arithmetic Optimization with Deep Learning-Driven Object Detection and Classification (HHAODL-ODC) method for Surveillance Video Analysis. The purpose of the study is to develop a HHAODL-ODC technique for object detection and classification. To accomplish this, the HHAODL-ODC technique follows two major components namely object detector and object classifier. Primarily, the HHAODL-ODC technique employs a YOLO-v5 object detector with EfficientNet as a backbone network. Next, the classification of objects takes place using the Spatial Angular-Stacked Sparse Autoencoder (SA-SSAE) model. The HHAO algorithm has been applied for the hyperparameter tuning process to improve the object classification results of the EfficientNet model. The stimulation validation of the HHAODL-ODC method is tested using a benchmark surveillance video dataset. The experimental outcomes highlighted the superior performance of the HHAODL-ODC algorithm over other DL techniques under various measures.
... Due to the better performance of CNN, the model has been used in many studies with different data and hand-made feature models [11][12][13][14]. Other neural network concepts like mask region CNN and dense network [15] are also used to achieve high accuracy. Still, weight optimization is complex due to the increased number of neurons. ...
Article
Full-text available
In surveillance video, crowd anomaly detection uses humans' position and orientation deviation. Encoding these positions is complicated and uses manual or handcrafted features for anomaly detection. It leads to high computation time, higher false positives and inaccurate detection. To resolve this issue, the novel deep learning approach progressive attention-based anomaly detection network (PA2DNet) is proposed for crowd anomaly detection. Keyframes are selected from the video, and optimal features are extracted with the VGG16 feature extraction technique. Then, the features are integrated with a discriminative enhancement module and classified with the progressive structure of a squeeze and concatenate module. The PA2DNet model hybridizes the progressive structure and the Pyramid Squeeze Attention module of the Pyramid Neural Network. The model's accuracy is improved by representing feature information at each input image dimension. The employed ensemble model of machine learning and deep learning achieves the goal of a highly accurate and faster anomaly detection method. The proposed approach is implemented with the UCSD dataset, and the performance is evaluated with precision, recall, f-measure, accuracy, etc. Using the proposed anomaly detection, an accuracy rate of 99% is achieved with a training time of 85.34 s.
... Eventually, S-137 sequence had an increased detection accuracy of 96.80%, while the DLADT-PW, RS-CNN, Fast R-CNN, and MDT algorithms had decreased detection accuracy of 88.96, 85.52, 84.48, and 83%, respectively. Finally, S-180 sequence had an increased detection accuracy of 96.47%, while the DLADT-PW, RS-CNN, Fast R-CNN, and MDT systems had decreased detection accuracy of 90.05, 86.15, 83.47, and 83.78%, correspondingly.The overall anomaly detection results of the SCADL-ADPW technique have been compared with other DL models inTable 3andFigure 6(Pustokhina et al., 2021). The outcomes notify the improved average accuracy results over other models. ...
Article
Full-text available
Anomaly detection in pedestrian walkways of visually impaired people (VIP) is a vital research area that utilizes remote sensing and aids to optimize pedestrian traffic and improve flow. Researchers and engineers can formulate effective tools and methods with the power of machine learning (ML) and computer vision (CV) to identifying anomalies (i.e. vehicles) and mitigate potential safety hazards in pedestrian walkways. With recent advancements in ML and deep learning (DL) areas, authors have found that the image recognition problem ought to be devised as a two-class classification problem. Therefore, this manuscript presents a new sine cosine algorithm with deep learning-based anomaly detection in pedestrian walkways (SCADL-ADPW) algorithm. The proposed SCADL-ADPW technique identifies the presence of anomalies in the pedestrian walkways on remote sensing images. The SCADL-ADPW techniques focus on the identification and classification of anomalies, i.e. vehicles in the pedestrian walkways of VIP. To accomplish this, the SCADL-ADPW technique uses the VGG-16 model for feature vector generation. In addition, the SCA approach is designed for the optimal hyperparameter tuning process. For anomaly detection, the long short-term memory (LSTM) method can be exploited. The experimental results of the SCADL-ADPW technique are studied on the UCSD anomaly detection dataset. The comparative outcomes stated the improved anomaly detection results of the SCADL-ADPW technique.
... A comparison study of the CBODL-RPD model in terms of TPR on the test-004 sequence is displayed in Table 4 and A brief analysis of the CBODL-RPD approach in terms of TPR on the test-007 sequence is displayed in Table 5 and of CBODL-RPD model on Test-007 sequence. Table 6 provides a comprehensive comparative study of the CBODL-RPD approach with recent approaches [28,29,30]. The figure revealed that the CBODL-RPD model had shown maximum performance with an increased of 95.54% and a reduced computation time of 2.10s. ...
Article
Full-text available
Pedestrian detection is a significant research topic in the computer vision (CV) domain for a longer period. Recently, deep learning (DL) and specifically convolutional neural network (CNN) exhibit significant improvement in the computer vision tasks such as object detection, segmentation, image classification, etc. With this motivation, this study develops a novel Colliding Bodies Optimization with Deep Learning based Robust Pedestrian Detection (CBODL-RPD) model. The goal of the CBODL-RPD approach is to identify the occurrence of pedestrians and non-pedestrians via object detection process. For object detection process, YOLO v4 with Adagrad optimizer is applied. In addition, the CBODL-RPD technique employs SqueezeNet model to generate feature vectors, and the hyperparameter tuning process is performed via the CBO algorithm. At last, deep belief network (DBN) model is applied for accurate pedestrian detection. A comprehensive experimental analysis is made to demonstrate the significant pedestrian detection results of the CBODL-RPD technique. The comparative outcome study reported the improved outcomes of the CBODL-RPD method over other recent methods.
... On the other hand, both normal and abnormal activities are used to train deep learning models in multi-model learning setting [17]. Several studies took advantage of the supervised deep learning to detect anomaly activities in videos [18]- [25]. Many deep learning models including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long-Short Term Memory (LSTM), Gated Recurrent Units (GRUs), and Generative Adversarial Networks (GANs) are used for anomaly detection and prevention [26]- [28]. ...
Chapter
Road traffic safety discusses the procedures and measures utilized for preventing road users from being dead or critically injured. Archetypal road users contain horse riders, cyclists, pedestrians, vehicle passengers, motorists, and passengers of on-road public transport (mostly buses and trams). Anomaly detection in pedestrian pathways is a crucial investigation topic, generally utilized for improving pedestrian safety. Because of the varied consumption of video surveillance methods and the improved quantity of captured videos, the typical manual analysis of labeling abnormal proceedings is a tiresome task, thus, an automated surveillance method in which anomaly detection develops important betwixt computer vision researchers. At present, the progress of deep learning (DL) algorithms has obtained important interest in distinct computer vision procedures. Therefore, this article introduces a new Golden Jackal Optimization with Deep Learning-based Anomaly Detection in Pedestrian Walkways (GJODL-ADPW) for road traffic safety. The presented GJODL-ADPW technique aims to effectively recognize the presence of anomalies (such as vehicles, skaters) on pedestrian walkways. In the presented GJODL-ADPW technique, Xception methodology was exploited for effective extraction feature process. For optimal hyperparameter selection, the GJO algorithm is utilized in this study. Finally, bidirectional long short-term memory (BiLSTM) approach was employed for anomaly detection purposes. A widespread experimental analysis is performed to examine the enhanced performance of the GJODL-ADPW system. A detailed comparative analysis demonstrated the enhancements of the GJODL-ADPW technique over other recent approaches.KeywordsPedestrian walkwaysRoad safetySurveillance systemAnomaly detectionDeep learning
Article
Full-text available
A pedestrian’s assertiveness when crossing an intersection measures his or her willingness to cross under given conditions, and this level of assertiveness affects pedestrian crossing behavior and safety. Crossing assertiveness at an unsignalized intersection is a complex psychological decision affected by many features, such as the speeds and trajectories of oncoming vehicles, eye contact, facial expression, and hand gesture communications between pedestrians and drivers. To provide a comprehensive understanding of crossing assertiveness of pedestrians at unsignalized intersections, this study applied a pattern recognition method—association rules mining to uncover the patterns for different levels of crossing assertiveness, including assertive, neutral, and passive, using a unique naturalistic driving dataset. An elaborated feature engineering with the decision tree, gradient-boosting decision tree, and XGBoost with SHAP were utilized to select a distinct feature set as input of the Apriori algorithm to recognize the patterns. The results revealed that the driver’s facial expression, the driver’s initiative and passive yield, and the presence of the “yield-to-pedestrian” traffic sign were highly associated with assertive crossing. Features such as the absence of pedestrians on the crosswalk, the presence of incoming speeding vehicles, and the absence of traffic control signs were strongly related to passive crossing. Meanwhile, the number and position of pedestrians at the crosswalk or near the curbside, the communication between pedestrians and drivers, and who actively seeks eye contact were the three major features to convert crossing from neutral to assertive or passive. The results provided a unique and meaningful understanding of pedestrian crossing assertiveness at unsignalized intersections.
Article
Full-text available
Presently, sensor-cloud based environment becomes highly beneficial due to its applicability in several domains. Wireless multimedia sensor network (WMSN) is one among them, which involves a set of multimedia sensors to collect data about the deployed region. Compared to traditional object tracking models, animal tracking in WMSN is a tedious process owing to the harsh, dynamic, and energy limited sensors. This article introduces a new Reliable Multi-Object Tracking Model using Deep Learning (DL) and Energy Efficient WMSN. Initially, the fuzzy logic technique is employed to determine the cluster heads (CHs) to attain energy efficiency. Next, in the second stage, a novel tracking algorithm by the use of Recurrent Neural Network (RNN) with a tumbling effect called RNN-T is developed. The proposed RNN-T model gets executed by every sensor node and the CHs execute the tracking algorithm to track the animals. Finally, the tracking results are transmitted to the cloud server for investigation purposes. In order to assess the performance of the presented model, an extensive experimental analysis is carried out by the use of a real-time wildlife video. The obtained results ensured that the RNN-T model has achieved better performance over the compared methods in different aspects.
Article
Full-text available
COVID-19 pandemic is increasing in an exponential rate, with restricted accessibility of rapid test kits. So, the design and implementation of COVID-19 testing kits remain an open research problem. Several findings attained using radio-imaging approaches recommend that the images comprise important data related to coronaviruses. The application of recently developed artificial intelligence (AI) techniques, integrated with radiological imaging, is helpful in the precise diagnosis and classification of the disease. In this view, the current research paper presents a novel fusion model hand-crafted with deep learning features called FM-HCF-DLF model for diagnosis and classification of COVID-19. The proposed FM-HCF-DLF model comprises three major processes, namely Gaussian filtering-based preprocessing, FM for feature extraction and classification. FM model incorporates the fusion of handcrafted features with the help of local binary patterns (LBP) and deep learning (DL) features and it also utilizes convolutional neural network (CNN)-based Inception v3 technique. To further improve the performance of Inception v3 model, the learning rate scheduler using Adam optimizer is applied. At last, multilayer perceptron (MLP) is employed to carry out the classification process. The proposed FM-HCF-DLF model was experimentally validated using chest X-ray dataset. The experimental outcomes inferred that the proposed model yielded superior performance with maximum sensitivity of 93.61%, specificity of 94.56%, precision of 94.85%, accuracy of 94.08%, F score of 93.2% and kappa value of 93.5%.
Article
The growing use of edge consumer electronic (ECE) components and other innovations allows medical devices to communicate with one another and share sensitive clinical information. This information is used by healthcare authorities, specialists, and emergency clinics to offer better medication and assistance. The security of user data is a major concern, since modification of the data by hackers can be life-threatening. We have therefore developed a privacy-preservation approach that protects the wearable sensor data gathered from wearable medical devices by combining an artificial-intelligence-based anomaly detection strategy with a novel dynamic attribute-based re-encryption (DABRE) method. Anomaly detection is performed by a modified artificial neural network (MANN) based on the gray wolf optimization (GWO) technique, which improves training speed and classification accuracy. Once the anomalous data are removed, the remaining data are stored in the cloud and secured with the proposed DABRE approach for later use by doctors. In the DABRE method, dynamically chosen biometric attributes are used for encryption. Moreover, if the user wishes, the data can be made unrecoverable by re-encrypting them with the true attributes in the cloud. A detailed experimental analysis verifies the superior performance of the proposed method: the GWO–MANN model attained a maximum average detection rate (DR) of 95.818% and an accuracy of 95.092%. In addition, the DABRE method required a minimum average encryption time of 95.63 s and a decryption time of 108.7 s.
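The abstract states only that the MANN is based on gray wolf optimization. The sketch below shows the standard GWO update (alpha, beta, and delta leaders; coefficient `a` decreasing linearly from 2 to 0) minimizing a generic objective `f`. The function name `gwo_minimize`, the population size, the iteration count, and the sphere test objective are illustrative assumptions, not the authors' settings:

```python
import numpy as np


def gwo_minimize(f, dim, lb, ub, n_wolves=20, n_iter=200, seed=0):
    """Grey wolf optimizer (standard formulation) minimizing f over [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))
    best_x, best_f = None, np.inf
    for t in range(n_iter):
        fitness = np.array([f(x) for x in X])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_f:
            best_f, best_x = fitness[order[0]], X[order[0]].copy()
        # The three best wolves (alpha, beta, delta) lead the pack
        leaders = [X[order[i]].copy() for i in range(3)]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        new_X = np.zeros_like(X)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a  # |A| > 1 favours exploration, |A| < 1 exploitation
            C = 2.0 * r2
            D = np.abs(C * leader - X)
            new_X += leader - A * D
        # Each wolf moves to the average of the three leader-guided positions
        X = np.clip(new_X / 3.0, lb, ub)
    fitness = np.array([f(x) for x in X])
    i = int(np.argmin(fitness))
    if fitness[i] < best_f:
        best_f, best_x = fitness[i], X[i].copy()
    return best_x, float(best_f)


def sphere(x):
    """Simple convex test objective with its minimum 0 at the origin."""
    return float(np.sum(x ** 2))


x_best, f_best = gwo_minimize(sphere, dim=5, lb=-10.0, ub=10.0)
print(f_best)
```

To tune a network in the way the abstract describes, `f` would plausibly be the training loss as a function of the flattened weights or of selected hyperparameters; the abstract does not say which.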
Article
Food image detection plays an essential role in visual object detection, given its applicability to solutions that improve people’s nutritional status and thus their health. At present, most food detection technologies target Western and Japanese food, and few target Chinese food. In this work, we establish a Chinese food image dataset, called CF-108, that can serve as an essential data basis for Chinese food image detection. The CF-108 dataset contains most Chinese dishes and covers large variations in the presentation of dishes within the same category. In addition, we introduce a training architecture, Mask R-DSCNN, that replaces the traditional convolution in the mask region convolutional neural network (Mask R-CNN) with depthwise separable convolution to reduce the computational cost. Experiments demonstrate that Mask R-DSCNN significantly reduces resource consumption and improves detection efficiency on Chinese food images without sacrificing much accuracy.
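The abstract does not give Mask R-DSCNN's layer configuration, so the sketch below only illustrates depthwise separable convolution itself: a per-channel k×k (depthwise) filter followed by a 1×1 (pointwise) channel-mixing step. "Valid" padding, stride 1, and no bias are assumed for simplicity; the parameter-count helpers show where the savings come from.

```python
import numpy as np


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out


def depthwise_separable_conv(x, dw, pw):
    """x: (H, W, C_in); dw: (k, k, C_in); pw: (C_in, C_out). Valid padding, stride 1."""
    k = dw.shape[0]
    h, w, c_in = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    depth = np.zeros((out_h, out_w, c_in))
    # Depthwise step: each channel is filtered only by its own k x k kernel
    for i in range(k):
        for j in range(k):
            depth += x[i:i + out_h, j:j + out_w, :] * dw[i, j, :]
    # Pointwise step: a 1 x 1 convolution is a matrix product over channels
    return depth @ pw


print(standard_conv_params(3, 256, 256), depthwise_separable_params(3, 256, 256))  # prints 589824 67840
```

For k = 3 with 256 input and output channels, the separable layer needs 67,840 weights instead of 589,824, a ratio of 1/C_out + 1/k² ≈ 0.115, which is the kind of reduction the abstract attributes to Mask R-DSCNN.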
Article
This study presents a simple and effective Mask R-CNN-based algorithm for faster detection of vehicles and pedestrians. The method is of practical value for anticollision warning systems in intelligent driving. Deep neural networks with more layers have greater capacity but also require more computation. To overcome this disadvantage, this study replaces the Resnet-101 backbone of the Mask R-CNN algorithm with a Resnet-86 network suited to practical conditions. The results show that the Resnet-86 network reduces operation time and greatly improves accuracy. Vehicle and pedestrian images are also screened from the Microsoft COCO dataset; the new dataset, formed by screening and supplementing COCO, makes training of the algorithm more efficient. Perhaps the most important contribution of our research is a new algorithm, Side Fusion FPN, which adds no parameters, increases the amount of calculation by less than 0.000001, and increases the mean average precision (mAP) by 2.00 points. Compared with Mask R-CNN, our algorithm decreased the weight memory size by 9.43%, improved the training speed by 26.98%, improved the testing speed by 7.94%, decreased the loss by 0.26, and increased the mAP by 17.53 points.
Article
Evacuation models are key tools for assessing the fire safety of complex buildings. These tools and their results rely on the input values that users select from existing datasets, and on the way those inputs are transformed into outputs. Several evacuation studies have been carried out to provide input values for evacuation models; however, studies of healthcare facility evacuation are still rare in the literature. In this paper, we present a new evacuation dataset for healthcare facility evacuation simulation. The data were collected in an outpatients’ area of a public hospital in Auckland (New Zealand) during two unannounced fire drills. The video images were analysed to generate new evacuation model inputs for healthcare facility evacuation scenarios. The drills involved both staff and patients. Pre-evacuation times, evacuee horizontal travel speeds, exit selection, and total evacuation times were collected and analysed. Moreover, we investigated evacuee reactions and actions to study the interaction between staff and patients during the evacuation process. The results showed that the pre-evacuation times of patients ranged from 8 to 63 s, whereas those of staff ranged from 8 to 141 s. In addition, during the movement phase, staff who were not assisting patients and patients with no impairments travelled at similar average walking speeds (1.06 m/s for staff members and 0.93 m/s for patients with no impairments). Finally, the results indicated that the average travel speed of patients with walking impairments and the staff assisting them (0.52 m/s) was almost half that of the first two groups.
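The reported speeds translate directly into travel times. A small worked example, assuming a hypothetical 30 m unobstructed route and a simplified model in which total evacuation time is the pre-evacuation delay plus walking time (no queuing or congestion, which real evacuation models do account for):

```python
# Average horizontal travel speeds reported in the abstract (m/s).
SPEEDS = {
    "staff not assisting": 1.06,
    "patients without impairments": 0.93,
    "impaired patients with assisting staff": 0.52,
}

ROUTE_M = 30.0  # hypothetical route length, not from the study


def travel_time(distance_m: float, speed_m_s: float) -> float:
    """Walking time in seconds over an unobstructed route."""
    return distance_m / speed_m_s


def evacuation_time(pre_evac_s: float, distance_m: float, speed_m_s: float) -> float:
    """Simplified estimate: pre-evacuation delay plus walking time (no queuing)."""
    return pre_evac_s + travel_time(distance_m, speed_m_s)


for group, v in SPEEDS.items():
    print(f"{group}: {travel_time(ROUTE_M, v):.1f} s over {ROUTE_M:.0f} m")

# Ratio behind the "almost half" finding
print(round(SPEEDS["impaired patients with assisting staff"] / SPEEDS["staff not assisting"], 2))
```

With the longest reported patient pre-evacuation time (63 s), the assisted group would need about 63 + 57.7 ≈ 121 s under these assumptions.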
Article
Accurate and reliable counting of animals in quadcopter-acquired imagery is one of the most promising but challenging tasks in future intelligent livestock management. In this paper, we demonstrate the application of a cutting-edge instance segmentation framework, Mask R-CNN, to cattle counting in different situations, such as extensive production pastures and intensive housing such as feedlots. The optimal intersection-over-union (IoU) threshold (0.5) and full-appearance detection for the algorithm are verified through performance evaluation. Experimental results show the framework’s potential to perform reliably in offline quadcopter vision systems, with an accuracy of 94% in counting cattle on pastures and 92% in feedlots. Compared with typical existing competing algorithms, Mask R-CNN performs better in both counting accuracy and average precision, especially on datasets with occlusion and overlapping. Our research shows promising steps toward using artificial intelligence with quadcopters for enhanced animal management.
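The 0.5 IoU threshold decides when a predicted box counts as a correct detection. A minimal sketch of IoU for axis-aligned boxes and greedy one-to-one matching of predictions to ground truth at that threshold; the box format `(x1, y1, x2, y2)` and the helper names are assumptions, and the abstract does not say how the authors matched detections:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2), x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, thr=0.5):
    """Greedy one-to-one matching; preds are assumed sorted by descending confidence.

    Returns (true positives, false positives, false negatives).
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:  # an IoU of exactly thr still counts as a match
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(match_detections(preds, gts))  # prints (1, 1, 1)
```

The first prediction overlaps the first animal with IoU 81/119 ≈ 0.68 and is accepted; the second overlaps nothing, so it is a false positive and the second animal is missed.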
Article
Over the last two decades, several large terror attacks have led to increased discussion of the use of different surveillance technologies. The use of novel technologies for pre-emptive security and surveillance has been discussed and criticized academically, but few studies have addressed the public. Studies that do target the public tend to assume an oversimplified trade-off between privacy and security, i.e. how much privacy a person is willing to yield to attain greater security. We used three large surveys of random samples of the Swedish population to study public attitudes to a number of surveillance technologies. The last survey took place shortly after a terror attack in Stockholm, and one aim was to see how the attack affected acceptance. The main differences between 2009 and 2017 were that the demand for transparency (i.e. public scrutiny) had increased dramatically and that the perceived risk posed by the new technologies had diminished. Beyond this, changes in attitudes were small. Technologies were perceived as contributing to making society safer, albeit not decisively. Also, acceptance was influenced not only by what data were collected but also by who collected and owned them. In public discussions about security, two things are often assumed: that an increase in hard security measures will increase societal security, and that citizens are willing to make a trade-off between privacy and security. We find that this is not the case. Instead, citizens weigh the pros and cons of surveillance and also distinguish between different forms of surveillance.
Article
Detecting anomalies of different sizes that occur within short time periods has long been an open research issue. To detect anomalies of different sizes, especially in pedestrian pathways, within a short time period, this article introduces a Region based Scalable Convolution Neural Network (RS-CNN). The proposed method uses region-based proposals for faster identification and copes well with scalability issues. The RS-CNN model was validated on different video sequences from the UCSD anomaly detection dataset. Compared with state-of-the-art detection techniques such as Fast R-CNN, Mixtures of Dynamic Textures (MDT), Mixtures of Probabilistic Principal Component Analyzers (MPPCA), and Social Force (SF), the RS-CNN model was found to be faster and more efficient, even in the presence of anomalies of various sizes.
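The abstract attributes RS-CNN's handling of different-sized anomalies to region-based proposals but does not describe how the proposals are generated. One common mechanism in region-proposal detectors (the anchor boxes of Faster R-CNN's region proposal network) places boxes of several scales and aspect ratios at every grid position. The sketch below illustrates that idea under this assumption only; the function name and the stride, scale, and ratio values are hypothetical:

```python
from itertools import product


def generate_anchors(img_w, img_h, stride, scales, ratios):
    """Return (x1, y1, x2, y2) anchor boxes centred on a regular grid.

    scales: anchor side lengths in pixels (square root of the box area).
    ratios: height / width aspect ratios.
    """
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            for s, r in product(scales, ratios):
                # Keep the area at s * s while setting height / width = r
                w = s / r ** 0.5
                h = s * r ** 0.5
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Hypothetical settings: a small and a large scale to cover objects of different sizes
boxes = generate_anchors(img_w=32, img_h=32, stride=16, scales=[16, 64], ratios=[0.5, 1.0, 2.0])
print(len(boxes))  # prints 24
```

Mixing small and large scales lets a single proposal stage cover both small and large objects; suitable values would depend on the frame resolution and the network's feature stride.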