
An automated deep learning based anomaly detection in pedestrian walkways for vulnerable road users safety

June 2021
Safety Science 142:105356


Authors:

Irina Pustokhina

State University of Management

Denis Pustokhin

Financial University under the Government of the Russian Federation

Thavavel Vaiyapuri

Prince Sattam bin Abdulaziz University

Deepak Gupta

Maharaja Agrasen Institute of Technology



Figures: DenseNet Architecture; Layers in DenseNet-169; Sample images; (a) Test Image, (b) Anomaly Detected Image; Result analysis of DLADT-PW model on Test007 dataset.


Safety Science 142 (2021) 105356

Available online 10 June 2021

An automated deep learning based anomaly detection in pedestrian walkways for vulnerable road users safety

Irina V. Pustokhina, Denis A. Pustokhin, Thavavel Vaiyapuri, Deepak Gupta, Sachin Kumar, K. Shankar

Department of Entrepreneurship and Logistics, Plekhanov Russian University of Economics, 117997 Moscow, Russia

Department of Logistics, State University of Management, 109542 Moscow, Russia

College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Saudi Arabia

Department of Computer Science & Engineering, Maharaja Agrasen Institute of Technology, Delhi, India

Department of Computer Science, South Ural State University, Chelyabinsk, Russian Federation

Department of Computer Applications, Alagappa University, Karaikudi, India

ARTICLE INFO

Keywords:

Anomaly detection

Pedestrian walkways

Deep learning

Safety

Mask RCNN

ABSTRACT

Anomaly detection in pedestrian walkways is an important research topic, commonly used to improve the safety of pedestrians. Due to the wide use of video surveillance systems and the growing volume of captured video, manually examining and labeling abnormal events is a tedious task. An automated surveillance system that detects anomalies has therefore become essential, and it attracts considerable attention from computer vision researchers. Deep learning (DL) models have gained significant interest in computer vision tasks such as object classification and object detection, and these applications depend on supervised learning, which requires labeled data. In this context, this paper develops an automated deep learning based anomaly detection technique in pedestrian walkways (DLADT-PW) for the safety of vulnerable road users. The goal of the DLADT-PW model is to detect and classify the various anomalies that occur in pedestrian walkways, such as cars, skaters, and jeeps. The DLADT-PW model applies preprocessing as its first step to remove noise and improve image quality. A mask region convolutional neural network (Mask-RCNN) with a densely connected network (DenseNet) backbone is then employed for the detection process. To validate the anomaly detection performance of the DLADT-PW technique, an extensive set of simulations was performed and the outcomes were investigated from several aspects. The experimental results confirmed the superior performance of the DLADT-PW technique, which achieved the highest detection accuracy among the compared methods.

1. Introduction

Annually, more than 270 000 pedestrians lose their lives on the world's roads. The capacity to respond to pedestrian safety is an important component of efforts to prevent road traffic injuries. Pedestrian collisions, like other road traffic crashes, should not be accepted as inevitable because they are, in fact, both predictable and preventable. Recent technological advances such as computer vision (CV) and surveillance cameras (CCTV) can be used to protect pedestrians and promote safe walking, but doing so requires an understanding of the risk factors for pedestrian crashes. This study aims to improve the safety of pedestrians using computer vision techniques. The extensive deployment of surveillance cameras (CCTV) in public places has made CV-centric models popular within the CV research community. The captured visual data carries richer and more accurate details than alternative data sources such as GPS, mobile communication, and radar signals. It also plays a major role in forecasting congestion, accidents, and other abnormal activities by gathering details about road traffic conditions. CCTV has proven helpful in several real-time applications (Zhang et al., 2018; Cocca et al., 2016; Wester and Giesecke, 2019; Tsai, 2014; Rahouti et al., 2020), such as grade-crossing trespassing, industrial safety management, and accidental fall detection.

Numerous computer vision based works have been proposed, concentrating on operations such as data acquisition, feature extraction, scene learning, activity learning, and behavioral learning. The basic aim of these studies is to address tasks such as scene detection, video processing, anomaly detection, vehicle detection and monitoring, multi-camera schemes and their challenges, activity analysis, traffic monitoring, and human behavior learning. Here, anomaly detection is considered a sub-domain of behavior learning from captured visual scenes. The availability of video from public places has stimulated research on video analysis and anomaly detection (Rahouti et al., 2020). Anomaly detection approaches learn normal behavior during training, and any significant deviation from normal behavior is considered anomalous. Common examples of anomalies include vehicles on pathways, sudden dispersion of people from a crowd, a person fainting while walking, jaywalking, signal bypassing at a traffic junction, and vehicles making U-turns at red signals.

* Corresponding author.

E-mail addresses: ivpustokhina@yandex.ru (I.V. Pustokhina), dpustokhin@yandex.ru (D.A. Pustokhin), t.thangam@psau.edu.sa (T. Vaiyapuri), deepakgupta@mait.ac.in (D. Gupta), sachinagnihotri16@gmail.com (S. Kumar), drkshankar@ieee.org (K. Shankar).

Contents lists available at ScienceDirect

Safety Science

journal homepage: www.elsevier.com/locate/safety

https://doi.org/10.1016/j.ssci.2021.105356

Received 22 January 2021; Received in revised form 2 May 2021; Accepted 24 May 2021

In general, anomaly detection approaches rely on unsupervised or semi-supervised learning. An important aim in this area is to find anomaly detection schemes applicable to road traffic scenarios, focusing on entities such as vehicles, trespassers, the environment, and communication. It has been pointed out that the scope of such schemes must cover the nature of the input data and its representation, the availability of supervision, the class of anomalies, the capability of the system in its application context, the anomaly detection output, and the termination criteria. An anomaly detection mechanism operates by learning the common data patterns to build a profile of normal behavior. Once the normal patterns are defined, anomalies can be detected as deviations from them. The output of the model is therefore a label indicating whether the data is normal or abnormal.

Recently, diverse models have been deployed for pedestrian detection, which fits bounding boxes around the pedestrians in an image. Pedestrian detection has gained considerable attention from computer vision developers and is a significant element of diverse human-centered domains such as driverless cars, automated traffic signaling, and person monitoring. However, existing models struggle with the scale problem: pedestrians appear at widely varying sizes, which degrades the performance of pedestrian detection approaches. Traditional approaches have attempted to address the scale problem in two ways. First, brute-force data augmentation is used to improve the scale invariance of the model. Second, a single model with multiple-scale filters is applied to all samples of different sizes. However, the large intra-class variance between large-scale and small-scale instances produces significantly different feature responses, which are difficult for a single model to handle. To exploit the drastically different attributes of instances at different scales, the divide-and-conquer paradigm can be applied (Gong et al., 2014) to resolve the complicated scale variance problem.
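The brute-force augmentation route mentioned above can be illustrated with a short sketch. The Python code below is not part of DLADT-PW; it assumes NumPy, images stored as arrays, boxes given as (x1, y1, x2, y2) pixel coordinates, and nearest-neighbour resampling chosen only to keep the example dependency-free. The helper names `resize_nearest` and `multiscale_augment` and the default scales are hypothetical.

```python
import numpy as np


def resize_nearest(image: np.ndarray, scale: float) -> np.ndarray:
    """Rescale a 2-D (H, W) or 3-D (H, W, C) image by nearest-neighbour sampling."""
    h, w = image.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Map every output row/column back to its nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows[:, None], cols]


def multiscale_augment(image, boxes, scales=(0.5, 1.0, 1.5)):
    """Yield (rescaled image, rescaled boxes) pairs; boxes are (N, 4) as x1, y1, x2, y2."""
    boxes = np.asarray(boxes, dtype=float)
    for s in scales:
        # Box coordinates scale linearly with the image.
        yield resize_nearest(image, s), boxes * s
```

A production pipeline would normally use an interpolating resize (for example from OpenCV or Pillow); the point here is only that each training image is replicated at several scales with its boxes scaled accordingly.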

More recently, deep learning (DL) based detection methods have been deployed. Initially, a convolutional neural network (CNN) was employed to classify the objects present in an image. This faced issues because objects can appear at a vast number of spatial locations and with different aspect ratios, so a large number of regions has to be examined, which increases processing complexity. Region-based CNN (R-CNN) and YOLO were therefore developed to find object occurrences efficiently. R-CNN overcomes the problem of evaluating a huge number of regions by using a selective search mechanism to extract a limited set of candidate regions, called region proposals, from the image.
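To make the cost argument concrete, the following sketch counts the windows that an exhaustive multi-scale, multi-aspect-ratio search would have to classify. The window sizes, aspect ratios, stride, and the function name `count_sliding_windows` are illustrative assumptions, not values from the paper.

```python
def count_sliding_windows(img_w, img_h, sizes, aspect_ratios, stride):
    """Count every window an exhaustive multi-scale, multi-aspect search would score."""
    total = 0
    for size in sizes:
        for ratio in aspect_ratios:  # ratio = height / width, area kept close to size**2
            w = round(size / ratio ** 0.5)
            h = round(size * ratio ** 0.5)
            if w > img_w or h > img_h:
                continue  # window does not fit inside the image
            # Number of horizontal positions times number of vertical positions.
            total += ((img_w - w) // stride + 1) * ((img_h - h) // stride + 1)
    return total


n = count_sliding_windows(640, 480, sizes=(32, 64, 128, 256),
                          aspect_ratios=(0.5, 1.0, 2.0), stride=8)
```

For a 640 × 480 frame, a single 32 × 32 window at stride 8 already yields 77 × 57 = 4,389 positions, and each additional size or aspect ratio adds a comparable number; region proposal methods such as selective search avoid scoring all of them.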

In order to enhance the safety of pedestrians, this paper designs a novel DL based anomaly detection technique in pedestrian walkways (DLADT-PW). The DLADT-PW model aims to recognize and categorize the various anomalies present in pedestrian walkways, such as cars, skaters, and jeeps. The DLADT-PW model includes preprocessing as its first step, which removes noise and improves image quality. A mask region convolutional neural network (Mask-RCNN) with a densely connected network (DenseNet) is then applied for the detection process. To verify the anomaly detection performance of the DLADT-PW technique, a wide set of simulations was carried out and the results were examined from several aspects.

The rest of the paper is organized as follows. Section 2 reviews existing work, Section 3 describes the proposed model, Section 4 validates its performance, and Section 5 concludes the paper.

2. Literature review

This section surveys existing hand-crafted feature based and deep learning based anomaly detection techniques.

2.1. Hand-Crafted features based method

In general, anomaly detection approaches based on hand-engineered features comprise three components. For feature extraction, diverse feature descriptors have been developed (Yang et al., 2020). Some approaches apply low-level trajectory attributes extracted from image sequences to define normal movement patterns. However, these approaches concentrate on anomalies caused by crowds rather than treating a single object as the basic element. Because trajectory features depend on crowd monitoring, such approaches are unfit for single-object anomaly detection (Alqaralleh et al., 2020). In addition to trajectory features, low-level spatio-temporal features such as the histogram of oriented flows (HOF) and the histogram of oriented gradients (HOG) have been used. Kratz and Nishino (Kratz and Nishino, 2009) used the distribution of spatiotemporal gradients to capture the motion details in local spatiotemporal regions. In (Xu et al., 2014), the histogram of optical flow is employed as a low-level motion feature for defining motion patterns.

Kim and Grauman (Kim and Grauman, 2009) used a mixture of probabilistic principal component analyzers (MPPCA) to model local activity patterns, with optical flow as the low-level feature. Mahadevan et al. (Mahadevan et al., 2010) modeled normal crowd behavior with mixtures of dynamic textures (MDT), and Li et al. (Li et al., 2014) employed a Conditional Random Field (CRF) to combine the results according to the given application. Feng et al. (Feng et al., 2017) built a deep Gaussian mixture model (GMM) to model appearance and motion features extracted with PCA. Several sparse coding approaches have also been used to encode normal patterns: a dictionary is learned from an over-complete basis set of normal patterns, and the sparse reconstruction cost measures how normal a test sample is. Lu et al. (2013) sped up training and testing by using several dictionaries to encode normal, scale-invariant blocks from multiscale frames. Yu et al. (2017) considered the low-rank property of the bases during dictionary learning and then applied a weighted sparse reconstruction scheme to measure the abnormality of samples.
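The sparse reconstruction cost used by the dictionary-based methods can be sketched as follows. This assumes NumPy, a dictionary `D` whose columns are normal patterns (learned beforehand; dictionary learning itself is not shown), and the iterative shrinkage-thresholding algorithm (ISTA) as a stand-in solver; the cited works may use other solvers and weighting schemes. The regularization weight `lam` is a hypothetical default.

```python
import numpy as np


def sparse_code_ista(D, y, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # largest eigenvalue of D^T D (gradient Lipschitz constant)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a + D.T @ (y - D @ a) / L      # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding (L1 prox)
    return a


def sparse_reconstruction_cost(D, y, lam=0.1):
    """Objective value at the sparse code: high cost means the normal dictionary fits y poorly."""
    a = sparse_code_ista(D, y, lam)
    return 0.5 * np.sum((y - D @ a) ** 2) + lam * np.sum(np.abs(a))
```

A test sample that the normal dictionary represents well yields a low cost, while one lying outside its span keeps a large residual and is flagged as abnormal.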

2.2. Deep learning based method

Recently, DL frameworks have been applied effectively to many computer vision tasks, including anomaly detection (Alqaralleh et al., 2020). Most of these use convolutional autoencoders (AEs) or fully convolutional networks to reconstruct or predict frames. Liu et al. (2018) trained a Fully Convolutional Network (FCN) resembling U-Net on video sequences without abnormalities to predict the next frame; at test time, the deviation between the predicted frame and the corresponding ground-truth frame is used to detect anomalies. Ribeiro et al. (2018) used a convolutional AE trained to reconstruct input frame sequences. Since the AE is trained only on normal video sequences, its reconstruction error serves as the anomaly score. However, owing to the strong generalization capability of deep neural networks (DNNs), anomalous events do not necessarily produce large reconstruction errors. Hence, other models extract features with an AE and detect anomalies by estimating the probability of those features.
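For the frame-prediction approach of Liu et al. (2018), the anomaly score is derived from how well each frame is predicted. The sketch below computes the peak signal-to-noise ratio (PSNR) between each ground-truth frame and its prediction and min-max normalizes it over the clip into a regularity score, so lower scores indicate likely anomalies. It assumes NumPy, frames scaled to [0, 1] (hence `max_val=1.0`), and a small floor on the mean squared error to avoid division by zero; the predictor network itself is out of scope.

```python
import numpy as np


def psnr(actual: np.ndarray, predicted: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a ground-truth frame and its prediction."""
    mse = max(float(np.mean((actual.astype(float) - predicted.astype(float)) ** 2)), 1e-10)
    return 10.0 * np.log10(max_val ** 2 / mse)


def regularity_scores(actual_frames, predicted_frames, max_val=1.0):
    """Min-max normalise per-frame PSNR over a clip; low scores suggest anomalies."""
    p = np.array([psnr(a, b, max_val) for a, b in zip(actual_frames, predicted_frames)])
    span = p.max() - p.min()
    return np.zeros_like(p) if span == 0 else (p - p.min()) / span
```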

Sabokrou et al. (Sabokrou et al., 2017) developed a deep convolutional neural network (DCNN) whose kernels were learned with a sparse AE (SAE). Taking cubic patches extracted from the original frames as input, feature maps from three of its middle and final layers are modeled with Gaussian classifiers. In Xu et al. (Xu et al., 2017), three stacked denoising AEs are used to learn spatial features, temporal features, and their joint combination. Three one-class SVMs are then applied to the learned features to compute anomaly scores. In addition, Sabokrou et al. (Sabokrou et al., 2018) adopted a pre-trained CNN as a feature extractor within an FCN, which extracts features for each receptive field without cropping the input frames into patches.
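The idea of fitting a Gaussian model to features of normal data and flagging samples that lie far from it can be sketched with a single multivariate Gaussian scored by Mahalanobis distance. This is a simplified stand-in assuming NumPy and precomputed feature vectors (for example, pooled CNN activations of normal patches); the class name, the ridge term `reg`, and the threshold are hypothetical, and the cited methods use their own cascades and thresholds.

```python
import numpy as np


class GaussianNormalityModel:
    """Fit a multivariate Gaussian to features of normal patches; score by Mahalanobis distance."""

    def __init__(self, reg: float = 1e-6):
        self.reg = reg  # small ridge term keeps the covariance invertible

    def fit(self, features: np.ndarray) -> "GaussianNormalityModel":
        self.mean = features.mean(axis=0)
        cov = np.cov(features, rowvar=False) + self.reg * np.eye(features.shape[1])
        self.precision = np.linalg.inv(cov)
        return self

    def score(self, features: np.ndarray) -> np.ndarray:
        d = np.atleast_2d(features) - self.mean
        # sqrt(d^T P d) for every row of d.
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.precision, d))

    def predict(self, features, threshold):
        return self.score(features) > threshold  # True = anomalous
```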

3. The proposed DLADT-PW technique

The workow involved in the presented DLADT-PW technique is

given here. As depicted, the surveillance video is primarily converted

into a set of frames, and anomalies are detected in each frame. Next, the

preprocessing is performed to improve the quality of the image. Fol-

lowed by, the Mask RCNN model is applied for the detection of anom-

alies and DenseNet 169 model is utilized as the baseline network for

Mask RCNN. At last, the anomalies that exist in the frame are success-

fully identied and classied.

3.1. Preprocessing

In general, the collected data is heterogeneous and contains inappropriate, blurred, and noisy images. The data is therefore cleaned, smoothed, and labeled to improve dataset quality. First, noisy images are eliminated. Next, median filtering is used to smooth the image noise caused by unwanted objects on the target objects. The image histogram is a typical tool for data cleaning: each image is transformed into a histogram, and a correlation coefficient is computed to measure the homogeneity of images. The correlation coefficient is estimated as follows:

$$r(x,y) = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}} \tag{1}$$

where $x$ and $y$ denote the histograms of the two images, $\mathrm{Var}[x]$ is the variance of $x$, $\mathrm{Var}[y]$ is the variance of $y$, and $\mathrm{Cov}(x,y)$ is the covariance of $x$ and $y$. The value of $r$ lies in $[-1, 1]$, and the larger the value, the more similar the two images are. The histogram method is thus used to filter the collected images. Before the histogram is applied, irregular images, which are often irrelevant, are removed. An appropriate reference image is then selected for each class, and the correlation coefficient between the reference image and each remaining image is estimated.
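The histogram-based cleaning step and Eq. (1) can be sketched as follows. The code assumes NumPy, 8-bit grayscale images, 256-bin histograms, and a hypothetical acceptance threshold of 0.5; the paper does not state the bin count or threshold, so these are placeholders to be tuned.

```python
import numpy as np


def gray_histogram(image: np.ndarray, bins: int = 256) -> np.ndarray:
    """Histogram of an 8-bit grayscale image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist.astype(float)


def histogram_correlation(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Eq. (1): r = Cov(x, y) / sqrt(Var[x] Var[y]) computed over the two histograms."""
    x, y = gray_histogram(img_a), gray_histogram(img_b)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / np.sqrt(x.var() * y.var()))


def filter_by_reference(reference, candidates, threshold=0.5):
    """Keep candidate images whose histogram correlates with the reference at or above a threshold."""
    return [img for img in candidates if histogram_correlation(reference, img) >= threshold]
```

Because Eq. (1) is computed on histograms rather than pixels, two images with similar gray-level distributions correlate strongly even if their content is arranged differently.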

A major difference between an image and noise lies in their gray levels: the visual disturbance caused by noise comes from the sharp difference between the gray level of the noise and that of the surrounding pixels (Li et al., 2020). An image smoothing mechanism therefore removes noise by suppressing these abrupt grayscale variations.
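Median filtering, mentioned above as the smoothing step, replaces each pixel by the median of its neighbourhood, which suppresses isolated impulse noise while preserving edges better than averaging. The sketch below is a plain NumPy version for grayscale images with a 3 × 3 window by default and replicated borders; the window size used in the paper is not stated, so `size=3` is an assumption.

```python
import numpy as np


def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Median filter for a 2-D grayscale image, with edge pixels replicated at the borders."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    # Stack every shifted copy of the image covered by the window, then take the median per pixel.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)])
    return np.median(windows, axis=0).astype(image.dtype)
```

In practice, `scipy.ndimage.median_filter` or OpenCV's `cv2.medianBlur` provide optimized equivalents.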

3.2. Mask R-CNN

In general, Mask RCNN is an elegant, flexible, and general approach for object detection and instance segmentation, which detects the objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The first block of the Mask R-CNN architecture performs feature extraction and employs Feature Pyramid Networks (FPNs). The Region Proposal Network (RPN) is the second block of Mask RCNN; it shares the convolutional features with the detection network, which makes region proposals nearly cost-free. Mask RCNN uses the RPN instead of selective search, and the RPN shares full-image convolutional features with the detection network. The RPN is a fully convolutional network (FCN) that predicts object boundaries and objectness scores at every position.

To improve prediction accuracy, Mask RCNN builds on Faster RCNN but replaces region of interest (ROI) Pooling with an ROI Align layer based on bilinear interpolation. ROI Pooling quantizes ROI coordinates harshly, which can misalign the input and the feature maps; the ROI Align layer eliminates this quantization and properly aligns the extracted features with the input (Xu et al., 2020). ROI Align computes the exact values of the input features at regularly sampled positions in each ROI bin using bilinear interpolation and then aggregates the results. Mask R-CNN handles three tasks: object classification, bounding-box detection, and segmentation. When an image is passed through the FPN, five sets of feature maps of different sizes are produced, and the RPN generates candidate regions from them. Alongside classification and detection, Mask RCNN has a mask branch that captures the spatial structure of each object through the pixel-to-pixel correspondence provided by its convolutional layers.
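The bilinear sampling at the heart of ROI Align can be sketched for a single-channel feature map. The code assumes NumPy, an ROI given as fractional (y1, x1, y2, x2) coordinates on the feature map, and a small output grid with 2 × 2 sample points per bin; these sizes are illustrative, and frameworks such as torchvision ship an optimized `torchvision.ops.roi_align`.

```python
import numpy as np


def bilinear_sample(feat: np.ndarray, y: float, x: float) -> float:
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x) location."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    # Weighted sum of the four surrounding grid values.
    return float((1 - ly) * (1 - lx) * feat[y0, x0] + (1 - ly) * lx * feat[y0, x1]
                 + ly * (1 - lx) * feat[y1, x0] + ly * lx * feat[y1, x1])


def roi_align(feat, roi, out_size=2, samples=2):
    """ROI Align on one channel: the ROI is never rounded; each bin averages samples**2 points.

    roi = (y1, x1, y2, x2) in feature-map coordinates (may be fractional).
    """
    y1, x1, y2, x2 = roi
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            pts = [bilinear_sample(feat,
                                   y1 + (i + (a + 0.5) / samples) * bin_h,
                                   x1 + (j + (b + 0.5) / samples) * bin_w)
                   for a in range(samples) for b in range(samples)]
            out[i, j] = np.mean(pts)
    return out
```

Unlike ROI Pooling, neither the ROI boundaries nor the bin edges are rounded to integers, which is what removes the misalignment described above.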

3.2.1. RPN

The RPN slides a 3 × 3 convolutional window over the feature maps produced by the convolutional layers. Each position in the feature map yields a low-dimensional feature code for the corresponding window region. The initial regressed boxes are then ranked by their classification scores, and the corresponding coordinate offsets are decoded into exact coordinates using Eqs. (2) and (3):

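$$t_x = \frac{x - x_a}{w_a}, \qquad t_y = \frac{y - y_a}{h_a} \tag{2}$$

$$t_w = \log\left(\frac{w}{w_a}\right), \qquad t_h = \log\left(\frac{h}{h_a}\right) \tag{3}$$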

where $(x_a, y_a)$ are the center coordinates of the anchor and $(w_a, h_a)$ are the width and height of the anchor. $(x, y)$ are the center coordinates of the predicted ROI in the original image, and $(w, h)$ are the width and height of the ROI. $(t_x, t_y)$ are the regression scores of the coordinates, and $(t_w, t_h)$ are the regression scores of the width and height on the feature map. In particular, when the intersection-over-union (IoU) between the bounding box of an ROI and a ground-truth box exceeds a defined threshold, the ROI is considered foreground; otherwise, it is considered background.
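Eqs. (2) and (3), together with the IoU rule for labelling ROIs, can be sketched as follows. The code assumes NumPy and boxes in (center x, center y, width, height) form; the IoU threshold of 0.5 is the value used in the original Mask R-CNN work and is an assumption here, since the paper only refers to a defined threshold.

```python
import numpy as np


def encode(boxes, anchors):
    """Eqs. (2)-(3): offsets (tx, ty, tw, th) of boxes relative to anchors, both as (x, y, w, h)."""
    x, y, w, h = boxes.T
    xa, ya, wa, ha = anchors.T
    return np.stack([(x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha)], axis=1)


def decode(deltas, anchors):
    """Inverse of encode: recover (x, y, w, h) from predicted offsets."""
    tx, ty, tw, th = deltas.T
    xa, ya, wa, ha = anchors.T
    return np.stack([tx * wa + xa, ty * ha + ya, wa * np.exp(tw), ha * np.exp(th)], axis=1)


def iou(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x, y, w, h) centre format."""
    a1, a2 = a[:, None, :2] - a[:, None, 2:] / 2, a[:, None, :2] + a[:, None, 2:] / 2
    b1, b2 = b[None, :, :2] - b[None, :, 2:] / 2, b[None, :, :2] + b[None, :, 2:] / 2
    inter = np.prod(np.clip(np.minimum(a2, b2) - np.maximum(a1, b1), 0, None), axis=2)
    area_a = np.prod(a[:, 2:], axis=1)[:, None]
    area_b = np.prod(b[:, 2:], axis=1)[None, :]
    return inter / (area_a + area_b - inter)


def label_rois(rois, gt_boxes, threshold=0.5):
    """1 = foreground if the best IoU with any ground-truth box reaches the threshold, else 0."""
    return (iou(rois, gt_boxes).max(axis=1) >= threshold).astype(int)
```

`decode` turns the regression outputs back into boxes at inference time; during training, `encode` produces the regression targets $t^*$ used in the box loss.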

3.2.2. Loss function

A multi-task loss function with three parts is used to train Mask RCNN: the classification loss of the bounding box, the location regression loss of the bounding box, and the mask loss, as given below.

$$L = L_{cls} + L_{box} + L_{mask} \tag{4}$$

$$L_{cls} = -\log\left[p_i^{*} p_i + (1 - p_i^{*})(1 - p_i)\right] \tag{5}$$

$$L_{box} = r\left(t_i - t_i^{*}\right) \tag{6}$$

$$L_{mask} = \mathrm{Sigmoid}(Cls_k) \tag{7}$$

where $p_i$ is the predicted probability of the ROI in the classification loss $L_{cls}$, and the ground-truth label $p_i^{*}$ is 1 if the ROI is foreground and 0 otherwise. $t_i$ is the vector of predicted coordinates of the detected bounding box (Eq. (6)), $t_i^{*}$ is the corresponding ground truth in the position regression loss, and $r$ is a robust loss function used to estimate the regression error (Xu et al., 2020). For every ROI, the mask branch outputs a $K m^2$-dimensional result, encoding $K$ binary masks of resolution $m \times m$. The mask loss $L_{mask}$ is the average binary cross-entropy obtained by applying the sigmoid function to every pixel of the ROI; for class $k$ ($Cls_k$), it is given by Eq. (7).
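The three loss terms can be sketched in NumPy as follows. This is illustrative only: it assumes the smooth L1 function as the robust loss $r$ (the choice made in Faster R-CNN; the paper does not name one), averages the classification loss over ROIs, counts the box loss only for foreground ROIs, and reads Eq. (7) as the per-pixel binary cross-entropy on the sigmoid of the mask predicted for the ROI's class $k$, as described in the text. Function names and array layouts are hypothetical.

```python
import numpy as np


def cls_loss(p, p_star):
    """Eq. (5): -log[p* p + (1 - p*)(1 - p)], i.e. binary log loss, averaged over ROIs."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(np.mean(-np.log(p_star * p + (1 - p_star) * (1 - p))))


def smooth_l1(x):
    """Assumed robust function r: quadratic near zero, linear for large errors."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)


def box_loss(t, t_star, p_star):
    """Eq. (6): r(t - t*) summed over the 4 offsets, counted only for foreground ROIs."""
    per_roi = smooth_l1(t - t_star).sum(axis=1)
    return float((per_roi * p_star).sum() / max(p_star.sum(), 1))


def mask_loss(mask_logits, gt_masks, labels):
    """Eq. (7) as described: mean per-pixel BCE on sigmoid(mask of the ROI's class k).

    mask_logits: (R, K, m, m); gt_masks: (R, m, m) in {0, 1}; labels: (R,) integer class ids.
    """
    idx = np.arange(len(labels))
    logits = mask_logits[idx, labels]              # (R, m, m): pick the k-th of the K masks
    prob = np.clip(1 / (1 + np.exp(-logits)), 1e-7, 1 - 1e-7)
    bce = -(gt_masks * np.log(prob) + (1 - gt_masks) * np.log(1 - prob))
    return float(bce.mean())
```

The total loss of Eq. (4) is then the sum `cls_loss(...) + box_loss(...) + mask_loss(...)`. Selecting only the $k$-th of the $K$ masks means that masks predicted for other classes do not contribute to the loss.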

3.3. DenseNet 169 model

The backbone of Mask RCNN is the DenseNet-169 model. The DenseNet structure builds on ResNet, whose building blocks merge their output with that of the preceding layer; these additive shortcut connections let the network learn residual mappings. Instead of summing, DenseNet concatenates the outputs obtained from all previous layers. Consider a single image $x_0$ passed through a CNN composed of $L$ layers, each implementing a non-linear transformation $H_l(\cdot)$, where $l$ is the layer index. $H_l(\cdot)$ is a composite function of operations such as Batch Normalization (BN), Rectified Linear Units (ReLU), pooling, or convolution. The output of the $l$th layer is denoted $x_l$. A traditional feed-forward network feeds the output of each layer as input to the next, giving the layer transition $x_l = H_l(x_{l-1})$.

ResNets add a skip connection that bypasses the non-linear transformation through an identity function:

$$x_l = H_l(x_{l-1}) + x_{l-1} \tag{8}$$

An advantage of ResNets is that the gradient can flow directly from later layers to earlier layers through the identity function. However, combining the identity function and the output of $H_l$ by summation may impede the information flow in the network. DenseNet improves the information flow with a different connectivity pattern (Huang et al., 2017). Fig. 1 outlines the resulting DenseNet structure. The $l$th layer receives the feature maps of all preceding layers, $x_0, \dots, x_{l-1}$, as input:

$$x_l = H_l([x_0, x_1, \dots, x_{l-1}]) \tag{9}$$

where $[x_0, x_1, \dots, x_{l-1}]$ denotes the concatenation of the feature maps produced in layers $0, \dots, l-1$. DenseNet relies on this dense connectivity, which is implemented by concatenating the multiple inputs of $H_l(\cdot)$ in Eq. (9) into a single tensor. The concatenation in Eq. (9) is not feasible when the feature-map size changes. Down-sampling is therefore performed by dividing the network into multiple densely connected blocks, separated by transition layers. The transition layers used in this study consist of a BN layer, a convolutional layer, and an average pooling layer.
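The difference between Eq. (8) and Eq. (9) is easiest to see in the tensor shapes. In the toy sketch below, each $H_l$ is a random 1 × 1 channel-mixing layer followed by ReLU, with batch normalization omitted; these layers, the module-level random generator, and the function names are assumptions made only to expose how channels evolve, not a trainable DenseNet.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_layer(in_ch, out_ch):
    """Toy H_l: a 1x1 'convolution' (channel mixing) followed by ReLU; BN omitted for brevity."""
    w = rng.normal(scale=0.1, size=(out_ch, in_ch))
    return lambda x: np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)


def residual_block(x, n_layers):
    """Eq. (8): x_l = H_l(x_{l-1}) + x_{l-1}; the channel count stays fixed."""
    c = x.shape[0]
    for _ in range(n_layers):
        x = make_layer(c, c)(x) + x
    return x


def dense_block(x0, n_layers, k):
    """Eq. (9): x_l = H_l([x_0, ..., x_{l-1}]); every layer adds k new feature maps."""
    features = [x0]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=0)  # channel-wise concatenation
        features.append(make_layer(inp.shape[0], k)(inp))
    return np.concatenate(features, axis=0)
```

The residual block keeps its channel count fixed because outputs are summed, whereas the dense block grows by $k$ channels per layer and carries $x_0$ unchanged in its output; transition layers (not shown) would then compress the channels and down-sample the spatial size.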

If each function $H_l$ produces $k$ feature maps, the $l$th layer has $k_0 + k \times (l-1)$ input feature maps, where $k_0$ is the number of channels in the input layer. An important difference between DenseNet and earlier networks is that DenseNet can have very narrow layers, for example $k = 12$. Each layer adds its $k$ feature maps to the collective state of the network, so the growth rate $k$ controls how much new information each layer contributes to this global state. A 1 × 1 convolution is used as a bottleneck layer before each 3 × 3 convolution to reduce the number of input feature maps and improve computational efficiency. DenseNet is particularly effective with this bottleneck layer. Fig. 2 shows the layers in DenseNet-169.
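The channel bookkeeping implied by $k_0 + k \times (l-1)$ can be traced with a few lines of Python. The default configuration below (64 initial channels, growth rate $k = 32$, blocks of 6, 12, 32, and 32 layers, and transition layers that halve the channels) is the standard DenseNet-169 setting used in common implementations such as torchvision; it is an assumption here, as the paper does not list these values, and it differs from the $k = 12$ example above.

```python
def layer_input_channels(k0: int, k: int, l: int) -> int:
    """Input feature maps of the l-th layer in a dense block: k0 + k * (l - 1)."""
    return k0 + k * (l - 1)


def densenet_channels(num_init=64, k=32, blocks=(6, 12, 32, 32), compression=0.5):
    """Trace output channels of each dense block, applying transition layers in between."""
    c, trace = num_init, []
    for i, n in enumerate(blocks):
        c = c + n * k                   # a block of n layers adds n * k channels
        trace.append(c)
        if i < len(blocks) - 1:
            c = int(c * compression)    # transition layer compresses the channels
    return trace
```

With these settings, the four dense blocks output 256, 512, 1280, and 1664 channels.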

4. Experimental validation

The proposed model is simulated using Python 3.6.5. For validation, the UCSD Anomaly Detection Dataset (Murugan et al., 2019) is used to train and test the proposed model. The UCSD Anomaly Detection Dataset consists of images captured by a stationary camera mounted at an elevation, overlooking pedestrian walkways. The crowd density in the walkways varies from sparse to very crowded. In the normal setting, the video contains only pedestrians, while abnormal events (anomalies) involve the movement of non-pedestrian entities in the walkways. Anomalies in the videos include bikers, skaters, vehicles, small carts, and people walking across a walkway or in the grass that surrounds it. Details of the dataset are provided in Table 1. The parameter settings are as follows: batch size 64, Adam optimizer, 100 epochs, learning rate 0.001, and ReLU activation function.
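The reported training settings can be recorded as a small configuration object, shown below together with one step of the standard Adam update to make the role of the learning rate explicit. The dataclass name and the Adam hyperparameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ (the usual defaults) are assumptions; the paper states only the optimizer, learning rate, batch size, number of epochs, and activation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TrainConfig:
    # Values reported in the text.
    batch_size: int = 64
    optimizer: str = "adam"
    epochs: int = 100
    learning_rate: float = 1e-3
    activation: str = "relu"


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (1-based); beta1, beta2 and eps are assumed defaults."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step, the bias-corrected update moves each parameter by roughly the learning rate (0.001) against the sign of its gradient, regardless of the gradient's magnitude.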

Fig. 3 shows a sample set of images from the UCSD dataset, containing pedestrians together with some anomalies.

Fig. 4 visualizes the results of the presented DLADT-PW technique on the UCSD dataset. Fig. 4a shows a test image containing a set of pedestrians with some anomalies, and Fig. 4b shows the detection of the two anomalies present in that input frame. The figure shows that the DLADT-PW technique effectively identified the anomalies.

Table 2 reports the anomaly detection accuracy of the proposed DLADT-PW model on the test004 video sequence. The values indicate proficient anomaly detection performance. For instance, anomaly 1 in frames 078, 091, 092, and 110 is detected with accuracies of 0.95, 0.96, 0.97, and 0.98, respectively, and in the other frames, such as 113, 115, 125, 142, 146, 147, 148, 150, 178, 179, and 180, with an identical accuracy of 0.99. Similarly, anomaly 2 in frames 125, 142, and 146 is detected with accuracies of 0.97, 0.98, and 0.97, respectively, and in the other frames, such as 147, 148, 150, 178, 179, and 180, with an identical accuracy of 0.99.

Fig. 1. DenseNet Architecture.

Table 3 and Fig. 5 compare the results of the DLADT-PW technique with existing models on the Test004 sequence. The SF model shows the weakest detection performance of all the methods. The MDT and MPPCA models give slightly better outcomes than the SF model, the Fast R-CNN model gives moderate outcomes, and the RS-CNN model achieves a near-optimal detection rate. The presented DLADT-PW model achieves the best detection performance, outperforming or matching every compared method on each frame. For instance, on frame 040, the DLADT-PW model obtains an accuracy of 0.950, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models obtain lower accuracies of 0.940, 0.819, 0.768, 0.744, and 0.524, respectively. On frame 046, the DLADT-PW method attains an accuracy of 0.970, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models obtain 0.950, 0.853, 0.752, 0.768, and 0.536, respectively. On frame 106, the DLADT-PW model achieves an accuracy of 0.990, which the RS-CNN model matches, while the Fast R-CNN, MDT, MPPCA, and SF methods obtain lower accuracies of 0.912, 0.834, 0.723, and 0.513, respectively. Furthermore, on frame 136, the DLADT-PW and RS-CNN models both obtain an accuracy of 0.980, whereas the Fast R-CNN, MDT, MPPCA, and SF models obtain 0.946, 0.839, 0.713, and 0.632, respectively.

Similarly, on frame 158, the DLADT-PW and RS-CNN models both obtain an accuracy of 0.990, whereas the Fast R-CNN, MDT, MPPCA, and SF models obtain 0.771, 0.783, 0.716, and 0.544, respectively. Moreover, on frame 180, the DLADT-PW and RS-CNN models both reach 0.990, whereas the Fast R-CNN, MDT, MPPCA, and SF models obtain 0.853, 0.852, 0.704, and 0.605, respectively.
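The frame-level accuracies quoted above for Test004 can be summarized by averaging them per method. The sketch uses only the six frames reported in this discussion (not the full Table 3), so the averages summarize the quoted values rather than constituting new results; the function name is hypothetical.

```python
import numpy as np

# Frame-level accuracies quoted for Test004 (frames 040, 046, 106, 136, 158, 180).
test004 = {
    "DLADT-PW":   [0.950, 0.970, 0.990, 0.980, 0.990, 0.990],
    "RS-CNN":     [0.940, 0.950, 0.990, 0.980, 0.990, 0.990],
    "Fast R-CNN": [0.819, 0.853, 0.912, 0.946, 0.771, 0.853],
    "MDT":        [0.768, 0.752, 0.834, 0.839, 0.783, 0.852],
    "MPPCA":      [0.744, 0.768, 0.723, 0.713, 0.716, 0.704],
    "SF":         [0.524, 0.536, 0.513, 0.632, 0.544, 0.605],
}


def summarize(results):
    """Mean accuracy per method over the quoted frames, sorted best first."""
    means = {name: float(np.mean(acc)) for name, acc in results.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


for name, mean_acc in summarize(test004):
    print(f"{name:<11} {mean_acc:.3f}")
```

Over these six frames, DLADT-PW averages about 0.978 against about 0.973 for RS-CNN, so its margin over RS-CNN comes from frames 040 and 046; on the remaining four frames the two models tie.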

Table 4 shows the per-anomaly detection accuracy of the DLADT-PW model on the Test007 video sequence. The table values indicate proficient anomaly detection performance, although accuracy varies more across frames for anomalies 2 and 3 than for anomaly 1. For instance, anomaly 1 in frames 078, 091, 092, 110, 113, 115, 125, 142, 146, 147, 148, 150, 178, 179, and 180 is detected with accuracies of 0.95, 0.97, 0.99, 0.95, 0.99, 0.99, 0.98, 0.99, 0.98, 0.99, 0.99, 0.96, 0.89, 0.88, and 0.97, respectively. Likewise, anomaly 2 in the same frames is detected with accuracies of 0.96, 0.99, 0.97, 0.95, 0.83, 0.60, 0.98, 0.95, 0.70, 0.80, 0.60, 0.90, 0.86, 0.78, and 0.88, respectively. Anomaly 3, which appears in frames 110, 113, 115, 125, 142, 147, 148, 178, 179, and 180, is detected with accuracies of 0.99, 0.99, 0.99, 0.99, 0.97, 0.80, 0.60, 0.65, 0.70, and 0.75, respectively.
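The per-frame values in Table 4 can be condensed into a mean and a worst case per anomaly. The sketch below is illustrative only: it assumes that a "–" entry means the anomaly is absent from that frame (encoded here as `None`) and that an unweighted mean over the frames where the anomaly is present is a meaningful summary.

```python
from statistics import mean

# Frames and per-anomaly accuracies from Table 4 (Test007); None marks a "–" entry.
FRAMES = ["078", "091", "092", "110", "113", "115", "125", "142",
          "146", "147", "148", "150", "178", "179", "180"]
TABLE4 = {
    "anomaly 1": [0.95, 0.97, 0.99, 0.95, 0.99, 0.99, 0.98, 0.99,
                  0.98, 0.99, 0.99, 0.96, 0.89, 0.88, 0.97],
    "anomaly 2": [0.96, 0.99, 0.97, 0.95, 0.83, 0.60, 0.98, 0.95,
                  0.70, 0.80, 0.60, 0.90, 0.86, 0.78, 0.88],
    "anomaly 3": [None, None, None, 0.99, 0.99, 0.99, 0.99, 0.97,
                  None, 0.80, 0.60, None, 0.65, 0.70, 0.75],
}


def summarize(values, frames=FRAMES):
    """Return (mean, minimum, frames at the minimum) over frames where the anomaly is present."""
    present = [(f, v) for f, v in zip(frames, values) if v is not None]
    accs = [v for _, v in present]
    worst = min(accs)
    return round(mean(accs), 3), worst, [f for f, v in present if v == worst]


for name, values in TABLE4.items():
    print(name, summarize(values))
```

For the Table 4 values this gives mean accuracies of about 0.965, 0.850, and 0.843 for anomalies 1, 2, and 3, with anomaly 2 dropping to 0.60 in frames 115 and 148.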

Table 5 and Fig. 6 present a comparative analysis of the DLADT-PW technique and existing techniques on the Test007 sequence. The table values show that the SF model performs worst overall, with the lowest average accuracy of the compared techniques. The MDT and MPPCA models achieve somewhat better results than the SF model, the Fast R-CNN model achieves moderate results, and the RS-CNN model reaches a near-optimal detection rate. However, the proposed DLADT-PW model achieves the highest accuracy of all compared models on every listed frame. For instance, on frame 040, the DLADT-PW model attained an accuracy of 0.955, while the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved lower accuracies of 0.940, 0.892, 0.842, 0.758, and 0.636, respectively. Likewise, on frame 046, the DLADT-PW model obtained an accuracy of 0.980, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.975, 0.928, 0.860, 0.658, and 0.709, respectively. On frame 106, the DLADT-PW model obtained an accuracy of 0.860, while the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.833, 0.829, 0.824, 0.704, and 0.652, respectively. Moreover, on frame 136, the DLADT-PW model obtained an accuracy of 0.840, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF methods achieved 0.835, 0.798, 0.748, 0.788, and 0.687, respectively.

Fig. 2. Layers in DenseNet-169.

Table 1
Description of dataset.

Dataset    Testbed   Frames   Time (sec)
UCSDped2   Test007   360      12
           Test004

I.V. Pustokhina et al.

Safety Science 142 (2021) 105356

Also, on frame 158, the DLADT-PW model achieved an accuracy of 0.930, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF approaches achieved lower accuracies of 0.870, 0.825, 0.811, 0.709, and 0.699, respectively. Finally, on frame 180, the DLADT-PW model reached an accuracy of 0.867, whereas the RS-CNN, Fast R-CNN, MDT, MPPCA, and SF models achieved 0.830, 0.808, 0.799, 0.769, and 0.705, respectively.
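The claim that DLADT-PW leads on every frame of Table 5 can be checked by comparing it against the strongest baseline on each frame rather than against RS-CNN alone. The following sketch is an illustration written for this text, not the authors' code; it simply copies Table 5 and reports, per frame, which baseline is strongest and by how much DLADT-PW exceeds it.

```python
# Table 5 (Test007): frame -> accuracies in MODELS order.
MODELS = ["DLADT-PW", "RS-CNN", "Fast R-CNN", "MDT", "MPPCA", "Social Force"]
TABLE5 = {
    "040": [0.955, 0.940, 0.892, 0.842, 0.758, 0.636],
    "042": [0.980, 0.955, 0.918, 0.850, 0.743, 0.723],
    "046": [0.980, 0.975, 0.928, 0.860, 0.658, 0.709],
    "051": [0.963, 0.947, 0.917, 0.827, 0.710, 0.640],
    "075": [0.937, 0.923, 0.877, 0.847, 0.737, 0.665],
    "106": [0.860, 0.833, 0.829, 0.824, 0.704, 0.652],
    "123": [0.983, 0.980, 0.913, 0.885, 0.686, 0.699],
    "135": [0.970, 0.960, 0.942, 0.817, 0.680, 0.724],
    "136": [0.840, 0.835, 0.798, 0.748, 0.788, 0.687],
    "137": [0.863, 0.827, 0.810, 0.801, 0.673, 0.741],
    "149": [0.730, 0.680, 0.546, 0.527, 0.672, 0.633],
    "158": [0.930, 0.870, 0.825, 0.811, 0.709, 0.699],
    "177": [0.800, 0.733, 0.689, 0.623, 0.756, 0.747],
    "178": [0.787, 0.723, 0.629, 0.610, 0.724, 0.689],
    "180": [0.867, 0.830, 0.808, 0.799, 0.769, 0.705],
}


def margins(table):
    """For each frame, return (strongest baseline, DLADT-PW accuracy minus its accuracy)."""
    out = {}
    for frame, accs in table.items():
        ours, baselines = accs[0], accs[1:]
        best = max(range(len(baselines)), key=lambda i: baselines[i])
        out[frame] = (MODELS[best + 1], round(ours - baselines[best], 3))
    return out


result = margins(TABLE5)
print(min(result.items(), key=lambda kv: kv[1][1]))
```

On these values every margin is positive; the smallest is 0.003 (frame 123, against RS-CNN), and on frames 177 and 178 the strongest baseline is MPPCA rather than RS-CNN.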

Table 6 summarizes the average accuracy of the DLADT-PW model and existing models on the applied dataset (Shankar and Perumal, 2020). Fig. 7 plots the average accuracy of the DLADT-PW and existing models on the Test004 sequence. The figure shows that the MPPCA and SF models obtained low average accuracies of 0.746 and 0.564, respectively. The Fast R-CNN and MDT models obtained somewhat higher average accuracies of 0.851 and 0.811, respectively. Although the RS-CNN approach achieved a competitive average accuracy of 0.975, the presented DLADT-PW model achieved the highest average accuracy of 0.982.
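The averages in Table 6 appear to be unweighted means over the 15 frames listed in Tables 3 and 5. Under that assumption, the Test004 row can be recomputed directly from Table 3 with a short sketch like the one below (an illustrative check, not the authors' evaluation script).

```python
from statistics import mean

MODELS = ["DLADT-PW", "RS-CNN", "Fast R-CNN", "MDT", "MPPCA", "Social Force"]

# Table 3 (Test004): one row per listed frame, columns in MODELS order.
TABLE3 = [
    [0.950, 0.940, 0.819, 0.768, 0.744, 0.524],  # 040
    [0.960, 0.940, 0.824, 0.766, 0.782, 0.639],  # 042
    [0.970, 0.950, 0.853, 0.752, 0.768, 0.536],  # 046
    [0.980, 0.960, 0.793, 0.898, 0.759, 0.601],  # 051
    [0.990, 0.990, 0.783, 0.827, 0.752, 0.524],  # 075
    [0.990, 0.990, 0.912, 0.834, 0.723, 0.513],  # 106
    [0.980, 0.970, 0.913, 0.879, 0.714, 0.575],  # 123
    [0.985, 0.975, 0.924, 0.806, 0.775, 0.536],  # 135
    [0.980, 0.980, 0.946, 0.839, 0.713, 0.632],  # 136
    [0.990, 0.985, 0.917, 0.856, 0.754, 0.579],  # 137
    [0.990, 0.990, 0.839, 0.786, 0.709, 0.613],  # 149
    [0.990, 0.990, 0.771, 0.783, 0.716, 0.544],  # 158
    [0.990, 0.990, 0.793, 0.753, 0.770, 0.522],  # 177
    [0.990, 0.985, 0.824, 0.765, 0.802, 0.514],  # 178
    [0.990, 0.990, 0.853, 0.852, 0.704, 0.605],  # 180
]

# Test004 row of Table 6, as reported.
REPORTED = [0.982, 0.975, 0.851, 0.811, 0.746, 0.564]


def column_means(rows, ndigits=3):
    """Unweighted mean of each column, rounded to the precision used in Table 6."""
    return [round(mean(col), ndigits) for col in zip(*rows)]


means = column_means(TABLE3)
for name, got, want in zip(MODELS, means, REPORTED):
    print(f"{name:<12} computed={got:.3f} reported={want:.3f}")
```

With the Table 3 values, every computed mean matches the Test004 row of Table 6 to three decimals; applying the same function to the rows of Table 5 likewise reproduces the Test007 row (0.896, 0.867, 0.821, 0.778, 0.718, 0.690).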

Fig. 8 plots the average accuracy of the DLADT-PW and existing methods on the Test007 sequence. The MPPCA and SF models reached low average accuracies of 0.718 and 0.690, respectively, while the Fast R-CNN and MDT models attained somewhat higher average accuracies of 0.821 and 0.778. Although the RS-CNN method achieved a competitive average accuracy of 0.867, the proposed DLADT-PW method achieved a higher average accuracy of 0.896. These tables and figures indicate that the presented DLADT-PW technique is an effective tool for detecting anomalies in pedestrian walkways.

Fig. 3. Sample images.

Fig. 4. (a) Test image; (b) anomaly-detected image.

Table 2
Accuracy of anomalies in Test004 sequences.

Frame   Anomaly 1   Anomaly 2
078     0.95        –
091     0.96        –
092     0.97        –
110     0.98        –
113     0.99        –
115     0.99        –
125     0.99        0.97
142     0.99        0.98
146     0.99        0.97
147     0.99        0.99
148     0.99        0.99
150     0.99        0.99
178     0.99        0.99
179     0.99        0.99
180     0.99        0.99

Table 3
Accuracy of the DLADT-PW model and existing models for test case Test004.

Frame   DLADT-PW   RS-CNN   Fast R-CNN   MDT     MPPCA   Social Force
040     0.950      0.940    0.819        0.768   0.744   0.524
042     0.960      0.940    0.824        0.766   0.782   0.639
046     0.970      0.950    0.853        0.752   0.768   0.536
051     0.980      0.960    0.793        0.898   0.759   0.601
075     0.990      0.990    0.783        0.827   0.752   0.524
106     0.990      0.990    0.912        0.834   0.723   0.513
123     0.980      0.970    0.913        0.879   0.714   0.575
135     0.985      0.975    0.924        0.806   0.775   0.536
136     0.980      0.980    0.946        0.839   0.713   0.632
137     0.990      0.985    0.917        0.856   0.754   0.579
149     0.990      0.990    0.839        0.786   0.709   0.613
158     0.990      0.990    0.771        0.783   0.716   0.544
177     0.990      0.990    0.793        0.753   0.770   0.522
178     0.990      0.985    0.824        0.765   0.802   0.514
180     0.990      0.990    0.853        0.852   0.704   0.605

Fig. 5. Result analysis of DLADT-PW model on Test004 dataset.

Table 4
Accuracy of anomalies in Test007 sequences.

Frame   Anomaly 1   Anomaly 2   Anomaly 3
078     0.95        0.96        –
091     0.97        0.99        –
092     0.99        0.97        –
110     0.95        0.95        0.99
113     0.99        0.83        0.99
115     0.99        0.60        0.99
125     0.98        0.98        0.99
142     0.99        0.95        0.97
146     0.98        0.70        –
147     0.99        0.80        0.80
148     0.99        0.60        0.60
150     0.96        0.90        –
178     0.89        0.86        0.65
179     0.88        0.78        0.70
180     0.97        0.88        0.75

Table 5
Accuracy of the DLADT-PW model and existing models for test case Test007.

Frame   DLADT-PW   RS-CNN   Fast R-CNN   MDT     MPPCA   Social Force
040     0.955      0.940    0.892        0.842   0.758   0.636
042     0.980      0.955    0.918        0.850   0.743   0.723
046     0.980      0.975    0.928        0.860   0.658   0.709
051     0.963      0.947    0.917        0.827   0.710   0.640
075     0.937      0.923    0.877        0.847   0.737   0.665
106     0.860      0.833    0.829        0.824   0.704   0.652
123     0.983      0.980    0.913        0.885   0.686   0.699
135     0.970      0.960    0.942        0.817   0.680   0.724
136     0.840      0.835    0.798        0.748   0.788   0.687
137     0.863      0.827    0.810        0.801   0.673   0.741
149     0.730      0.680    0.546        0.527   0.672   0.633
158     0.930      0.870    0.825        0.811   0.709   0.699
177     0.800      0.733    0.689        0.623   0.756   0.747
178     0.787      0.723    0.629        0.610   0.724   0.689
180     0.867      0.830    0.808        0.799   0.769   0.705

5. Conclusion

This paper has designed an automated DLADT-PW model to improve the safety of pedestrians. The DLADT-PW model aims to recognize and classify the various anomalies present in pedestrian walkways, such as cars, skaters, and jeeps. First, the surveillance video is converted into a set of frames, and anomalies are detected in each frame. Next, preprocessing is performed to improve image quality. Then, the Mask R-CNN model is applied to detect anomalies, with the DenseNet-169 model serving as its backbone network. Finally, the anomalies present in each frame are identified and classified. To verify the anomaly detection performance of the DLADT-PW technique, an extensive set of simulations was carried out and the results were examined from different aspects. The experimental values confirmed the superior performance of the DLADT-PW technique, which achieved the highest average accuracy among the compared methods (0.982 on Test004 and 0.896 on Test007). In future work, the DLADT-PW technique can be extended to detect anomalies under poor weather conditions. The presented work can also be implemented in various real-time scenarios, such as detecting vehicles in pedestrian walkways. In addition, the model can be employed to detect crime scenes such as robbery and quarreling from surveillance cameras.
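The frame-wise pipeline summarized in the conclusion (split video into frames, preprocess each frame, detect and classify anomalies) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: the 3x3 median filter is an assumed stand-in for the unspecified noise-removal step, `detector` is a hypothetical callable standing in for the trained Mask R-CNN with a DenseNet-169 backbone, and `ANOMALY_CLASSES` and the 0.5 `threshold` are illustrative choices.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[List[int]]        # grayscale frame: rows of pixel intensities
Detection = Tuple[str, float]  # (class label, confidence score)

# Illustrative anomaly classes, based on the examples named in the text (assumption).
ANOMALY_CLASSES = {"car", "skater", "jeep"}


def median_denoise(frame: Frame) -> Frame:
    """Assumed preprocessing: 3x3 median filter with windows clipped at the borders."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [frame[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            # Even-sized border windows average the two middle values; int() truncates.
            out[y][x] = int(median(window))
    return out


def detect_anomalies(frames: Sequence[Frame],
                     detector: Callable[[Frame], List[Detection]],
                     threshold: float = 0.5) -> Dict[int, List[Detection]]:
    """Preprocess each frame, run the detector, and keep confident anomaly-class detections."""
    results: Dict[int, List[Detection]] = {}
    for idx, frame in enumerate(frames):
        hits = [(label, score) for label, score in detector(median_denoise(frame))
                if label in ANOMALY_CLASSES and score >= threshold]
        if hits:
            results[idx] = hits
    return results


# Usage with a hypothetical stand-in for the trained detector.
def stub_detector(frame: Frame) -> List[Detection]:
    """Always reports a pedestrian; reports a 'car' if at least 4 pixels are bright."""
    bright = sum(v > 200 for row in frame for v in row)
    return [("pedestrian", 0.9)] + ([("car", 0.8)] if bright >= 4 else [])


flat = [[10] * 5 for _ in range(5)]
blob = [[255 if 1 <= y <= 3 and 1 <= x <= 3 else 10 for x in range(5)] for y in range(5)]
salt = [[255 if (y, x) in {(0, 0), (0, 4), (4, 0), (4, 4)} else 10 for x in range(5)]
        for y in range(5)]

print(detect_anomalies([flat, blob, salt], stub_detector))
```

In this toy example, the four isolated bright pixels in `salt` would trigger the stub detector if the frame were passed in raw, but the median filter removes them, so only the solid blob in frame 1 is reported as a car. Normal classes such as "pedestrian" are filtered out because they are not in `ANOMALY_CLASSES`. A real system would replace `stub_detector` with a trained model's inference call.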

Fig. 6. Result analysis of DLADT-PW model on Test007 dataset.

Table 6
Average accuracy on the applied dataset.

Testbed   DLADT-PW   RS-CNN   Fast R-CNN   MDT     MPPCA   Social Force
Test004   0.982      0.975    0.851        0.811   0.746   0.564
Test007   0.896      0.867    0.821        0.778   0.718   0.690

Fig. 7. Average accuracy analysis of DLADT-PW method on Test004 dataset.

Fig. 8. Average accuracy analysis of DLADT-PW method on Test007 dataset.

Data Availability Statement

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Declaration of Competing Interest

The authors declare that they have no conflict of interest. The manuscript was written through the contributions of all authors. All authors have approved the final version of the manuscript.

References

Alqaralleh, B.A.Y., Mohanty, S.N., Gupta, D., Khanna, A., Shankar, K., Vaiyapuri, T., 2020. Reliable multi-object tracking model using deep learning and energy efficient wireless multimedia sensor networks. IEEE Access 8, 213426–213436. https://doi.org/10.1109/ACCESS.2020.3039695.
Cocca, P., Marciano, F., Alberti, M., 2016. Video surveillance systems to enhance occupational safety: A case study. Saf. Sci. 84, 140–148.
Feng, Y., Yuan, Y., Lu, X., 2017. Learning deep event models for crowd anomaly detection. Neurocomputing 219, 548–556.
Gong, Y., Wang, L., Guo, R., Lazebnik, S., 2014. Multiscale orderless pooling of deep convolutional activation features. In: ECCV, pp. 392–407.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
UCSD Anomaly Detection Dataset. http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm.
Kim, J., Grauman, K., 2009. Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009, pp. 2921–2928.
Kratz, L., Nishino, K., 2009. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009, pp. 1446–1453.
Li, Y., Xu, X., Yuan, C., 2020. Enhanced Mask R-CNN for Chinese food image detection. Math. Probl. Eng. 2020.
Li, W., Mahadevan, V., Vasconcelos, N., 2014. Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 18–32.
Liu, W., Luo, W., Lian, D., Gao, S., 2018. Future frame prediction for anomaly detection – A new baseline. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018, pp. 6536–6545.
Lu, C., Shi, J., Jia, J., 2013. Abnormal event detection at 150 FPS in MATLAB. In: Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013, pp. 2720–2727.
Mahadevan, V., Li, W., Bhalodia, V., Vasconcelos, N., 2010. Anomaly detection in crowded scenes. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010, pp. 1975–1981.
Murugan, B.S., Elhoseny, M., Shankar, K., Uthayakumar, J., 2019. Region-based scalable smart system for anomaly detection in pedestrian walkways. Comput. Electr. Eng. 75, 146–160.
Rahouti, A., Lovreglio, R., Gwynne, S., Jackson, P., Datoussaïd, S., Hunt, A., 2020. Human behaviour during a healthcare facility evacuation drills: Investigation of pre-evacuation and travel phases. Saf. Sci. 129, 104754.
Ribeiro, M., Lazzaretti, A.E., Lopes, H.S., 2018. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recognit. Lett. 105, 13–22.
Sabokrou, M., Fayyaz, M., Fathy, M., Klette, R., 2017. Deep-cascade: Cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Trans. Image Process. 26, 1992–2004.
Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., Klette, R., 2018. Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Comput. Vis. Image Underst. 172, 88–97.
Shankar, K., Perumal, E., 2020. A novel hand-crafted with deep learning features based fusion model for COVID-19 diagnosis and classification using chest X-ray images. Complex Intell. Syst. https://doi.org/10.1007/s40747-020-00216-6.
Tsai, M.K., 2014. Automatically determining accidental falls in field surveying: A case study of integrating accelerometer determination and image recognition. Saf. Sci. 66, 19–26.
Wester, M., Giesecke, J., 2019. Accepting surveillance – An increased sense of security after terror strikes? Saf. Sci. 120, 383–387.
Xu, C., Wang, G., Yan, S., Yu, J., Zhang, B., Dai, S., Li, Y., Xu, L., 2020. Fast vehicle and pedestrian detection using improved Mask R-CNN. Math. Probl. Eng. 2020.
Xu, D., Song, R., Wu, X., Li, N., Feng, W., Qian, H., 2014. Video anomaly detection based on a hierarchical activity discovery within spatio-temporal contexts. Neurocomputing 143, 144–152.
Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Chen, G., Tait, A., Schneider, D., 2020. Automated cattle counting using Mask R-CNN in quadcopter vision system. Comput. Electron. Agric. 171, 105300.
Xu, D., Yan, Y., Ricci, E., Sebe, N., 2017. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput. Vis. Image Underst. 156, 117–127.
Yang, E., Parvathy, V.S., Selvi, P.P., Shankar, K., Seo, C., Joshi, G.P., Yi, O., 2020. Privacy preservation in edge consumer electronics by combining anomaly detection with dynamic attribute-based re-encryption. Mathematics 8 (11), 1871.
Yu, B., Liu, Y., Sun, Q., 2017. A content-adaptively sparse reconstruction method for abnormal events detection with low-rank property. IEEE Trans. Syst. Man Cybern. Syst. 47, 704–716.
Zhang, Z., Trivedi, C., Liu, X., 2018. Automated detection of grade-crossing-trespassing near misses based on computer vision analysis of surveillance video data. Saf. Sci. 110, 276–285.

EnParaNet: a novel deep learning architecture for faster prediction using low-computational resource devices

Article

Full-text available

Jun 2024
NEURAL COMPUT APPL

The deployment of deep learning architectures on low-computational resource devices is challenging due to their high number of parameters and computational complexity. These heavy and complex architectures result in increased latency in real-time applications. However, splitting the deep architecture in a way that parallelizes the forward propagation into different subnets deploying into multiple low-computational resource devices, and then, aggregating the predictions may reduce the latency while preserving the performance. In this paper, we propose a novel deep learning architecture called Ensembled Parallel Networks (EnParaNets) that leverage network dissection, knowledge distillation, and ensemble learning strategies to reduce inference time while maintaining, even in some cases, outperforming the baseline accuracy in real-time applications. The methodology involves splitting the original network into N equal-sized blocks, forming N Sub-ParaNets for each block, and enhancing their representations using (A) contrastive knowledge distillation along with reducing Kullback–Leibler divergence between logits distributions of the teacher and student networks, and (B) L2 loss between intermediate representations of the original network and corresponding Sub-ParaNets. Predictive distributions from each Sub-ParaNet are assembled to form the final EnParaNet. The proposed EnParaNet outperforms the baseline models of seven diverse architectures: ResNet56, VGG_13, WRN_40_2, DenseNet, ResNeXt50, MobileNetv2, and ShuffleNetv2 in terms of accuracy while reducing inference time significantly using training methods A and B, respectively. Our proposed EnParaNet enhances ResNet56, VGG_13, WRN_40_2, MobileNetv2, DenseNet, ResNeXt50, and ShuffleNetv2 by 2.69%, 0.24%, 1.95%, 7.69%, 0.33%, 2.13%, and 3.12%, respectively, using training method A, where the inference time is reduced by 45%, 24%, 47%, 31%, 33%, 32%, and 44%, respectively. 
With training method B, EnParaNet achieves improvements of 1.75%, 2.90%, 1.09%, 3.91%, and 1.66%, with inference time reductions of 50%, 42%, 49%, 48%, and 49%, respectively. Moreover, a comprehensive ablation study analyzes the performance of the proposed technique and highlights its effectiveness and challenges. Furthermore, we also evaluate the performance of EnParaNet in transferability and adversarial robustness tasks.

Intelligent Multi-Group Marine Predator Algorithm With Deep Learning Assisted Anomaly Detection in Pedestrian Walkways

Article

Full-text available

Jan 2024

Anomaly Detection (AD) in Pedestrian Walkways (PWs) is critical to urban security and safety systems. It is widely used to detect abnormal or unusual behaviours, situations, or events in areas dedicated to pedestrian traffic, namely crosswalks, sidewalks, or pedestrian bridges. The main objective is to improve efficiency, safety, and security in the urban environment by identifying deviations and monitoring pedestrian activities from established norms. This kind of AD typically includes surveillance cameras, sensors, and advanced software algorithms. Using advanced machine learning (ML) and computer vision (CV) approaches, this technique continuously monitors the pedestrian area to detect potential threats and irregularities. Deep Learning Assisted AD in Pedestrian Walkways presents a novel and very efficient method to enhance security and safety in urban environments. Therefore, this study designs an Intelligent Multi-Group Marine Predator Algorithm with Deep Learning Assisted Anomaly Detection (MMPADL-AD) in Pedestrian Walkways. The MMPADL-AD system aims to ensure security in PWs via the AD process. The MMPADL-AD technique incorporates a NASNet feature extractor that proficiently extracts high-level features from surveillance data, allowing a deep understanding of pedestrian behaviours. Besides, the MMPADL-AD technique applies convolutional long short-term memory (ConvLSTM), inheriting the benefits of convolutional neural networks) and LSTM for the AD process. Finally, the MMPA has been used for the hyperparameter tuning mechanism, which optimizes the model’s performance, assuring accuracy and adaptability. Benchmark data accompanied an extensive set of experiments to ensure the higher effectiveness of the MMPADL-AD approach. The experimental values highlighted the supremacy of the MMPADL-AD approach over other DL methods.

Hybrid harris hawk-arithmetic optimization with deep learning-driven object detection and classification for surveillance video analysis

Article

Full-text available

May 2024
MULTIMED TOOLS APPL

Video surveillance has played a pivotal role in ensuring the safety and security of the public across different sectors, including retail, transportation, and urban environments. Object detection (OD) in surveillance videos includes identifying and localizing certain objects, namely vehicles, persons, or suspicious items. Conventional approaches often struggle with variations in complex backgrounds, lighting conditions, and occlusions. Deep learning (DL)-based techniques, particularly convolutional neural network (CNN), have shown outstanding performance in attaining accurate OD and handling these challenges. Despite the success of DL-based OD in surveillance videos, various challenges exist such as dealing with variations in camera viewpoints, recognizing small objects, ensuring robustness to adverse weather conditions, and handling occlusions. This study presents a Hybrid Harris Hawk-Arithmetic Optimization with Deep Learning-Driven Object Detection and Classification (HHAODL-ODC) method for Surveillance Video Analysis. The purpose of the study is to develop a HHAODL-ODC technique for object detection and classification. To accomplish this, the HHAODL-ODC technique follows two major components namely object detector and object classifier. Primarily, the HHAODL-ODC technique employs a YOLO-v5 object detector with EfficientNet as a backbone network. Next, the classification of objects takes place using the Spatial Angular-Stacked Sparse Autoencoder (SA-SSAE) model. The HHAO algorithm has been applied for the hyperparameter tuning process to improve the object classification results of the EfficientNet model. The stimulation validation of the HHAODL-ODC method is tested using a benchmark surveillance video dataset. The experimental outcomes highlighted the superior performance of the HHAODL-ODC algorithm over other DL techniques under various measures.

PA2Dnet based ensemble classifier for the detection of crowd anomaly detection

Article

Full-text available

Nov 2023
MULTIMED TOOLS APPL

In surveillance video, crowd anomaly detection uses humans' position and orientation deviation. Encoding these positions is complicated and uses manual or handcrafted features for anomaly detection. It leads to high computation time, higher false positives and inaccurate detection. To resolve this issue, the novel deep learning approach progressive attention-based anomaly detection network (PA2DNet) is proposed for crowd anomaly detection. Keyframes are selected from the video, and optimal features are extracted with the VGG16 feature extraction technique. Then, the features are integrated with a discriminative enhancement module and classified with the progressive structure of a squeeze and concatenate module. The PA2DNet model hybridizes the progressive structure and the Pyramid Squeeze Attention module of the Pyramid Neural Network. The model's accuracy is improved by representing feature information at each input image dimension. The employed ensemble model of machine learning and deep learning achieves the goal of a highly accurate and faster anomaly detection method. The proposed approach is implemented with the UCSD dataset, and the performance is evaluated with precision, recall, f-measure, accuracy, etc. Using the proposed anomaly detection, an accuracy rate of 99% is achieved with a training time of 85.34 s.

Assisting Visually Impaired People Using Deep Learning-based Anomaly Detection in Pedestrian Walkways for Intelligent Transportation Systems on Remote Sensing Images

Article

Full-text available

Aug 2023

Anomaly detection in pedestrian walkways of visually impaired people (VIP) is a vital research area that utilizes remote sensing and aids to optimize pedestrian traffic and improve flow. Researchers and engineers can formulate effective tools and methods with the power of machine learning (ML) and computer vision (CV) to identifying anomalies (i.e. vehicles) and mitigate potential safety hazards in pedestrian walkways. With recent advancements in ML and deep learning (DL) areas, authors have found that the image recognition problem ought to be devised as a two-class classification problem. Therefore, this manuscript presents a new sine cosine algorithm with deep learning-based anomaly detection in pedestrian walkways (SCADL-ADPW) algorithm. The proposed SCADL-ADPW technique identifies the presence of anomalies in the pedestrian walkways on remote sensing images. The SCADL-ADPW techniques focus on the identification and classification of anomalies, i.e. vehicles in the pedestrian walkways of VIP. To accomplish this, the SCADL-ADPW technique uses the VGG-16 model for feature vector generation. In addition, the SCA approach is designed for the optimal hyperparameter tuning process. For anomaly detection, the long short-term memory (LSTM) method can be exploited. The experimental results of the SCADL-ADPW technique are studied on the UCSD anomaly detection dataset. The comparative outcomes stated the improved anomaly detection results of the SCADL-ADPW technique.

Colliding Bodies Optimization With Deep Belief Network Based Robust Pedestrian Detection

Article

Full-text available

Jan 2023

Pedestrian detection is a significant research topic in the computer vision (CV) domain for a longer period. Recently, deep learning (DL) and specifically convolutional neural network (CNN) exhibit significant improvement in the computer vision tasks such as object detection, segmentation, image classification, etc. With this motivation, this study develops a novel Colliding Bodies Optimization with Deep Learning based Robust Pedestrian Detection (CBODL-RPD) model. The goal of the CBODL-RPD approach is to identify the occurrence of pedestrians and non-pedestrians via object detection process. For object detection process, YOLO v4 with Adagrad optimizer is applied. In addition, the CBODL-RPD technique employs SqueezeNet model to generate feature vectors, and the hyperparameter tuning process is performed via the CBO algorithm. At last, deep belief network (DBN) model is applied for accurate pedestrian detection. A comprehensive experimental analysis is made to demonstrate the significant pedestrian detection results of the CBODL-RPD technique. The comparative outcome study reported the improved outcomes of the CBODL-RPD method over other recent methods.

ConvGRU-CNN: Spatiotemporal Deep Learning for Real-World Anomaly Detection in Video Surveillance System

Article

Jan 2023

Golden Jackal Optimization with Deep Learning-Based Anomaly Detection in Pedestrian Walkways for Road Traffic Safety

Chapter

Aug 2023

Saleh Al Sulaie

Road traffic safety discusses the procedures and measures utilized for preventing road users from being dead or critically injured. Archetypal road users contain horse riders, cyclists, pedestrians, vehicle passengers, motorists, and passengers of on-road public transport (mostly buses and trams). Anomaly detection in pedestrian pathways is a crucial investigation topic, generally utilized for improving pedestrian safety. Because of the varied consumption of video surveillance methods and the improved quantity of captured videos, the typical manual analysis of labeling abnormal proceedings is a tiresome task, thus, an automated surveillance method in which anomaly detection develops important betwixt computer vision researchers. At present, the progress of deep learning (DL) algorithms has obtained important interest in distinct computer vision procedures. Therefore, this article introduces a new Golden Jackal Optimization with Deep Learning-based Anomaly Detection in Pedestrian Walkways (GJODL-ADPW) for road traffic safety. The presented GJODL-ADPW technique aims to effectively recognize the presence of anomalies (such as vehicles, skaters) on pedestrian walkways. In the presented GJODL-ADPW technique, Xception methodology was exploited for effective extraction feature process. For optimal hyperparameter selection, the GJO algorithm is utilized in this study. Finally, bidirectional long short-term memory (BiLSTM) approach was employed for anomaly detection purposes. A widespread experimental analysis is performed to examine the enhanced performance of the GJODL-ADPW system. A detailed comparative analysis demonstrated the enhancements of the GJODL-ADPW technique over other recent approaches.KeywordsPedestrian walkwaysRoad safetySurveillance systemAnomaly detectionDeep learning

Exploring Factors Associated With Crossing Assertiveness of Pedestrians at Unsignalized Intersections

Article

Full-text available

Feb 2023

A pedestrian’s assertiveness when crossing an intersection measures his or her willingness to cross under given conditions, and this level of assertiveness affects pedestrian crossing behavior and safety. Crossing assertiveness at an unsignalized intersection is a complex psychological decision affected by many features, such as the speeds and trajectories of oncoming vehicles, eye contact, facial expression, and hand gesture communications between pedestrians and drivers. To provide a comprehensive understanding of crossing assertiveness of pedestrians at unsignalized intersections, this study applied a pattern recognition method—association rules mining to uncover the patterns for different levels of crossing assertiveness, including assertive, neutral, and passive, using a unique naturalistic driving dataset. An elaborated feature engineering with the decision tree, gradient-boosting decision tree, and XGBoost with SHAP were utilized to select a distinct feature set as input of the Apriori algorithm to recognize the patterns. The results revealed that the driver’s facial expression, the driver’s initiative and passive yield, and the presence of the “yield-to-pedestrian” traffic sign were highly associated with assertive crossing. Features such as the absence of pedestrians on the crosswalk, the presence of incoming speeding vehicles, and the absence of traffic control signs were strongly related to passive crossing. Meanwhile, the number and position of pedestrians at the crosswalk or near the curbside, the communication between pedestrians and drivers, and who actively seeks eye contact were the three major features to convert crossing from neutral to assertive or passive. The results provided a unique and meaningful understanding of pedestrian crossing assertiveness at unsignalized intersections.

Self-Calibrating Anomaly and Change Detection for Autonomous Inspection Robots

Conference Paper

Dec 2022

Reliable Multi-Object Tracking Model Using Deep Learning and Energy Efficient Wireless Multimedia Sensor Networks

Article

Full-text available

Jan 2020

Presently, sensor-cloud based environment becomes highly beneficial due to its applicability in several domains. Wireless multimedia sensor network (WMSN) is one among them, which involves a set of multimedia sensors to collect data about the deployed region. Compared to traditional object tracking models, animal tracking in WMSN is a tedious process owing to the harsh, dynamic, and energy limited sensors. This article introduces a new Reliable Multi-Object Tracking Model using Deep Learning (DL) and Energy Efficient WMSN. Initially, the fuzzy logic technique is employed to determine the cluster heads (CHs) to attain energy efficiency. Next, in the second stage, a novel tracking algorithm by the use of Recurrent Neural Network (RNN) with a tumbling effect called RNN-T is developed. The proposed RNN-T model gets executed by every sensor node and the CHs execute the tracking algorithm to track the animals. Finally, the tracking results are transmitted to the cloud server for investigation purposes. In order to assess the performance of the presented model, an extensive experimental analysis is carried out by the use of a real-time wildlife video. The obtained results ensured that the RNN-T model has achieved better performance over the compared methods in different aspects.

A novel hand-crafted with deep learning features based fusion model for COVID-19 diagnosis and classification using chest X-ray images

Article

Full-text available

Nov 2020

COVID-19 pandemic is increasing in an exponential rate, with restricted accessibility of rapid test kits. So, the design and implementation of COVID-19 testing kits remain an open research problem. Several findings attained using radio-imaging approaches recommend that the images comprise important data related to coronaviruses. The application of recently developed artificial intelligence (AI) techniques, integrated with radiological imaging, is helpful in the precise diagnosis and classification of the disease. In this view, the current research paper presents a novel fusion model hand-crafted with deep learning features called FM-HCF-DLF model for diagnosis and classification of COVID-19. The proposed FM-HCF-DLF model comprises three major processes, namely Gaussian filtering-based preprocessing, FM for feature extraction and classification. FM model incorporates the fusion of handcrafted features with the help of local binary patterns (LBP) and deep learning (DL) features and it also utilizes convolutional neural network (CNN)-based Inception v3 technique. To further improve the performance of Inception v3 model, the learning rate scheduler using Adam optimizer is applied. At last, multilayer perceptron (MLP) is employed to carry out the classification process. The proposed FM-HCF-DLF model was experimentally validated using chest X-ray dataset. The experimental outcomes inferred that the proposed model yielded superior performance with maximum sensitivity of 93.61%, specificity of 94.56%, precision of 94.85%, accuracy of 94.08%, F score of 93.2% and kappa value of 93.5%.

Privacy Preservation in Edge Consumer Electronics by Combining Anomaly Detection with Dynamic Attribute-Based Re-Encryption

Article

Oct 2020

The expanding use of edge consumer electronic (ECE) components and other innovations allows medical devices to communicate with one another and distribute sensitive clinical information. This information is used by health care authorities, specialists and emergency clinics to offer enhanced treatment and assistance. The security of client data is a major concern, since modification of data by hackers can be life-threatening. Therefore, we have developed a privacy preservation approach that protects the sensor data gathered from wearable medical devices by combining an artificial intelligence-based anomaly detection strategy with a novel dynamic attribute-based re-encryption (DABRE) method. Anomaly detection is accomplished through a modified artificial neural network (MANN) based on a gray wolf optimization (GWO) technique, which improves the training speed and classification accuracy. Once the anomalous data are removed, the remaining data are stored in the cloud and secured through the proposed DABRE approach for future use by doctors. In the DABRE method, dynamically chosen biometric attributes are used for encryption. Moreover, if the user wishes, the data can be made unrecoverable by re-encryption with the true attributes in the cloud. A detailed experimental analysis verifies the superior performance of the proposed method: the GWO–MANN model attained a maximum average detection rate (DR) of 95.818% and an accuracy of 95.092%. In addition, the DABRE method required minimum average encryption and decryption times of 95.63 s and 108.7 s, respectively.
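The abstract does not say how GWO is coupled to the MANN. For example, it does not state whether GWO searches the network weights or its hyperparameters. The sketch below therefore shows only the standard GWO search loop on a generic objective. The function name gwo_minimize and all parameter defaults are hypothetical. Unlike some reference implementations, this version keeps the three best positions seen so far as the alpha, beta and delta wolves, which is an elitist bookkeeping choice.

```python
import numpy as np


def gwo_minimize(f, dim, lo, hi, n_wolves=20, n_iter=200, seed=0):
    """Grey wolf optimizer: each wolf moves to the average of three positions
    guided by the alpha, beta and delta wolves (the best three found so far)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_wolves, dim))
    fit = np.array([f(x) for x in X])
    order = np.argsort(fit)[:3]
    leaders, leader_fit = X[order], fit[order]  # fancy indexing copies
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        new_X = np.zeros_like(X)
        for leader in leaders:
            A = 2.0 * a * rng.random((n_wolves, dim)) - a
            C = 2.0 * rng.random((n_wolves, dim))
            D = np.abs(C * leader - X)  # distance from each wolf to this leader
            new_X += leader - A * D     # position suggested by this leader
        X = np.clip(new_X / 3.0, lo, hi)
        fit = np.array([f(x) for x in X])
        # Elitist bookkeeping: keep the best three of old leaders + new pack
        pool = np.vstack([leaders, X])
        pool_fit = np.concatenate([leader_fit, fit])
        order = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[order], pool_fit[order]
    return leaders[0], float(leader_fit[0])
```

For example, gwo_minimize(lambda x: float(np.sum(x ** 2)), dim=5, lo=-10.0, hi=10.0) drives the sphere function, a stand-in objective, towards its minimum at the origin. Tuning a neural network would replace that objective with a validation loss.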

Enhanced Mask R-CNN for Chinese Food Image Detection

Article

Jul 2020
MATH PROBL ENG

Food image detection plays an essential role in visual object detection, given its applicability in solutions that improve people’s nutritional status and thus their health. At present, most food detection technologies target Western and Japanese food, and few target Chinese food. In this work, we establish a Chinese food image dataset called CF-108 that can serve as an essential data basis for Chinese food image detection. The CF-108 dataset contains most Chinese dishes and covers large variations in the presentation of the same category. In addition, we introduce a training architecture, Mask R-DSCNN, that replaces the traditional convolution in the mask region convolutional neural network (Mask R-CNN) with depthwise separable convolution to reduce the expensive computation cost. Experiments demonstrate that Mask R-DSCNN can significantly reduce resource consumption and improve detection efficiency on Chinese food images without a large loss of accuracy.
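The saving from depthwise separable convolution can be seen in the weight count of a single layer. The following is a minimal sketch that ignores biases, for a k x k kernel with c_in input channels and c_out output channels. The layer size is illustrative and is not taken from the paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (biases ignored)."""
    # one k x k x c_in filter for each of the c_out output channels
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256  # illustrative layer size, not from the paper
    std = standard_conv_params(k, c_in, c_out)
    dsc = depthwise_separable_params(k, c_in, c_out)
    print(std, dsc, round(std / dsc, 2))  # prints: 589824 67840 8.69
```

The reduction factor is 1 / (1/c_out + 1/k^2). For 3x3 kernels it therefore approaches 9 as c_out grows, and the multiply-add count per output position shrinks by the same factor.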

Fast Vehicle and Pedestrian Detection Using Improved Mask R-CNN

Article

May 2020
MATH PROBL ENG

This study presents a simple and effective Mask R-CNN algorithm for more rapid detection of vehicles and pedestrians. The method is of practical value for anticollision warning systems in intelligent driving. Deep neural networks with more layers have greater capacity but also require more computation. To overcome this disadvantage, this study replaces the ResNet-101 backbone of the Mask R-CNN algorithm with a ResNet-86 backbone suited to practical conditions. The results show that the ResNet-86 network reduces the operation time and greatly improves accuracy. Vehicles and pedestrians are also screened out of the Microsoft COCO dataset, and the new dataset, formed by screening and supplementing the COCO dataset, makes the training of the algorithm more efficient. Perhaps the most important part of this research is a new algorithm, Side Fusion FPN. It adds no parameters, increases the amount of calculation by less than 0.000001, and increases the mean average precision (mAP) by 2.00 points. Compared with the original Mask R-CNN, the proposed algorithm decreased the weight memory size by 9.43%, improved the training speed by 26.98%, improved the testing speed by 7.94%, decreased the loss by 0.26, and increased the mAP by 17.53 points.

Human behaviour during a healthcare facility evacuation drills: Investigation of pre-evacuation and travel phases

Article

Sep 2020
SAFETY SCI

Evacuation models are key tools to assess the fire safety of complex buildings. Those tools and their results rely on the input values selected by users based on existing datasets, and on the way those inputs are transformed into output. Several evacuation studies have been carried out to provide input values for evacuation models. However, studies of healthcare facility evacuation are still rare in the literature. In this paper, we present a new evacuation dataset for healthcare facility evacuation simulation. The data were collected from an outpatients’ area of a public hospital in Auckland (New Zealand) during two unannounced fire drills. The video images were analysed to generate new evacuation model inputs for healthcare facility evacuation scenarios. The drills involved both staff and patients. Pre-evacuation times, evacuee horizontal travel speeds, exit selection and total evacuation times were collected and analysed. Moreover, we investigated evacuee reactions and actions to study the interaction between staff and patients during the evacuation process. The results showed that patients' pre-evacuation times ranged from 8 to 63 s, while staff pre-evacuation times ranged from 8 to 141 s. In addition, during the movement phase, staff who were not assisting patients and patients with no impairments travelled at similar average walking speeds (1.06 m/s for staff members and 0.93 m/s for patients with no impairments). Finally, the results indicated that the average travel speed of patients with walking impairments and of the staff assisting them (0.52 m/s) was almost half that of the first two groups.

Automated cattle counting using Mask R-CNN in quadcopter vision system

Article

Apr 2020
COMPUT ELECTRON AGR

The accurate and reliable counting of animals in quadcopter-acquired imagery is one of the most promising but challenging tasks for future intelligent livestock management. In this paper, we demonstrate the application of the cutting-edge instance segmentation framework, Mask R-CNN, to cattle counting in different situations, such as extensive production pastures and intensive housing such as feedlots. The optimal IoU threshold (0.5) and the full-appearance detection for the algorithm are verified through performance evaluation. Experimental results show the framework’s potential to perform reliably in offline quadcopter vision systems, with an accuracy of 94% in counting cattle on pastures and 92% in feedlots. Compared with the existing typical competing algorithms, Mask R-CNN performs better in both counting accuracy and average precision, especially on datasets with occlusion and overlapping. Our research shows promising steps towards incorporating artificial intelligence into quadcopters for enhanced animal management.
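The abstract reports 0.5 as the optimal IoU threshold. The following is a minimal sketch of box IoU and a greedy one-to-one matching step that counts detections as true positives only when IoU >= 0.5. It assumes axis-aligned boxes given as (x1, y1, x2, y2). The abstract does not say whether box IoU or mask IoU was used, or how predictions were matched. Both helpers are therefore hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds: List[Box], truths: List[Box], threshold: float = 0.5) -> int:
    """Greedy one-to-one matching: each ground-truth box takes the unused
    prediction with the highest IoU, provided that IoU >= threshold."""
    used = set()
    tp = 0
    for t in truths:
        best_j, best_iou = -1, threshold
        for j, p in enumerate(preds):
            if j in used:
                continue
            overlap = iou(p, t)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            tp += 1
    return tp
```

For example, two boxes of side 2 offset by 1 in each direction overlap with IoU 1/7. At the 0.5 threshold they would not count as a match.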

Accepting surveillance – An increased sense of security after terror strikes?

Article

Dec 2019
SAFETY SCI

Over the last two decades, several large terror attacks have led to increased discussion of the use of different surveillance technologies. The use of novel technologies for pre-emptive security and surveillance has been discussed and criticized academically, but few studies have addressed the public. Studies that target the public tend to assume an oversimplified trade-off between privacy and security, i.e. how much privacy a person is willing to yield to attain greater security. We used three large surveys of random samples from the Swedish population to study public attitudes to a number of surveillance technologies. The last survey took place shortly after a terror attack in Stockholm, and one aim was to see how the attack affected acceptance. The main differences between 2009 and 2017 were that the demand for transparency (i.e. public scrutiny) had increased dramatically, and that the perceived risk posed by the new technologies had diminished. Beyond this, changes in attitudes were small. Technologies were perceived as contributing to making society safer, albeit not decisively. Also, acceptance was influenced not only by what data was collected, but also by who was collecting and owning it. In public discussions about security, two things are often assumed: that an increase in hard security measures will increase societal security, and that citizens are willing to make a trade-off between privacy and security. We find that this is not the case. Instead, citizens weigh the pros and cons of surveillance and also distinguish between different forms of surveillance.

Region-based scalable smart system for anomaly detection in pedestrian walkways

Article

May 2019
COMPUT ELECTR ENG

Detecting anomalies of different sizes that occur within a short period has long been an open research issue. To detect such anomalies, especially in pedestrian pathways, this article introduces a Region based Scalable Convolution Neural Network (RS-CNN). The method uses region-based proposals for faster identification and handles scalability well. The RS-CNN model was validated using different video sequences from the UCSD anomaly detection dataset. It was compared with state-of-the-art detection techniques such as Fast R-CNN, Mixture of Dynamic Textures (MDT), Mixtures of Probabilistic Principal Component Analyzers (MPPCA) and Social Force (SF). The RS-CNN model was found to be faster and more efficient, even in the presence of anomalies of various sizes.

Future Frame Prediction for Anomaly Detection - A New Baseline

Conference Paper

Jun 2018
