Detecting Criminal Activities and Promoting
Safety Using Deep Learning
Rohan Mathur, Tejas Chintala and Rajeswari D*
Department of Data Science and Business Systems, School of Computing, SRM Institute of Science and Technology,
Kattankulathur, Tamil Nadu– 603203, India.
E-mail : rm3686@srmist.edu.in, tc5739@srmist.edu.in, rajeswad@srmist.edu.in
Abstract- Automation and autonomous systems are among
the few powerhouses of innovation that drive entire domains
towards advancing further in leaps and bounds. Great
technological innovations can be attributed to tasks that are
made easier and more perceptible by automation, and
artificial intelligence is here to make these automated
systems smart enough to perform their tasks with the power
of decision-making, thereby greatly reducing human
intervention in redundant processes. Our project follows the
aforementioned ideals: building a product to minimize
manual labor (both physical and mental) for tasks that can
be seamlessly automated and processed while solving the
main problem statement at hand. Currently,
surveillance cameras play a vital role in ensuring
the safety of people, yet they are plain
video-providing entities with no smart
decision-making mechanisms of their own. With the
growth of data collected from surveillance cameras,
automated analysis of video streams has become a
requisite for detecting abnormal events. The main
aim of the project is to promote safety on campus by
employing deep learning techniques to automate the
task of monitoring and reporting crimes from the
physical Closed-Circuit Television (CCTV), assigning
the responsibility of detecting criminal activity to
a framework that can identify patterns that
differentiate crimes for smarter monitoring. The
model proposed in this paper was able to distinguish
between certain crimes, achieving precisions of 0.94
and 0.95 for Assault and Abuse, respectively.
Keywords—ResNet, Computer Vision, Deep Learning, Triplet
Loss Function, Single Shot Detector, UCF Crime
I. INTRODUCTION
Video classification is a computer vision problem
motivated by the ability to solve and automate
classification tasks concerning real-time live
video. Since the problem is recent, there still
exist solutions left to be tested. Even so, the
applications constitute a wide spectrum, from
detecting important aspects of sports actions [1] or
daily pursuits taking place in a scene, to various
security and healthcare use cases. Our project aims
at promoting safety on campus by automating the task
of monitoring and reporting crimes, assigning the
responsibility of detecting criminal or abnormal
activity to a system that is well-versed in deducing
patterns that distinguish criminal activity from
normal activity. Traditional surveillance systems
deal with a plethora of shortcomings that do not
sync well with the facilities available in today's
day and age.
One of the most notable flaws in vanilla surveillance
systems is the heavy dependence on an attentive
supervisor monitoring footage and ensuring that any
abnormal activity is duly noted and taken care of.
CCTV footage requires human intervention, which may
lead to errors [8]. In a bid to reduce manual
intervention and labor, the project not only
eliminates the need for additional supervision, but
also immediately detects the occurrence of a crime,
takes note of the people involved, recognizes the
type of crime occurring, and accordingly triggers
actions that initiate mitigation measures at the
crime scene in real time.
There have been different implementations for the
automatic detection of crimes like localizing the gun
and victim present in the video footage [4] using
deep learning approaches, especially CNN
architectures like Residual models [5]. With the
availability of vast databases like the UCF-Crime
dataset [6] or the RWF-2000 database [7], it becomes
easier to create systems of automatic activity
detection for the enhancement of safety in public
places.
II. LITERATURE REVIEW
Many papers on this application, along with their
implementations, are discussed in this section.
Umadevi V. Navalgund and Priyadharshini K [8]
proposed a crime detection system, creating a
pipeline that detects weapons from images and
training pretrained models to classify them.
2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI) | 978-1-6654-9529-5/22/$31.00 ©2022 IEEE | DOI: 10.1109/ACCAI53970.2022.9752619
Authorized licensed use limited to: SRM University. Downloaded on May 11,2022 at 06:34:43 UTC from IEEE Xplore. Restrictions apply.
They used VGGNet 19 as their pre-trained model for detection
and achieved 69% accuracy and 75% recall. R. Olmos,
S. Tabik, and F. Herrera [4] described a system for
detecting guns in surveillance videos and
classifying whether or not a crime is taking place.
The results of that paper were obtained using
Region-based Convolutional Neural Network (RCNN) and
Faster Region-based Convolutional Neural Network
(FRCNN) models, which the authors trained on their
own data. The system gave satisfactory predictions
even on low-quality videos.
A. R. Zamir [1] presents an overview of prominent
action localization and recognition methods for
sports videos. Using the Sports dataset provided by
UCF as the benchmark for evaluating the discussed
techniques, they organize action recognition into a
three-stage pipeline: feature extraction,
representation of videos with dictionary learning,
and finally classification of the data (in this
use case, the sport). A. Karpathy et al. [9] study
the performance of convolutional neural networks
(CNNs) in video classification, finding that CNN
architectures [10] are capable of learning powerful
features from weakly-labeled data and surpass other
existing methods in performance. Their transfer
learning experiments suggest that the learned
features are generic and generalize to other
classification tasks. Tahani Almanie, Rsha Mirza and
Elizabeth Lor [11] pursue a similar objective using
machine learning techniques like Decision Trees and
Naïve Bayes Classifiers for classifying and
predicting crimes on spatial datasets. They achieved
51% and 54% accuracy on datasets for two cities,
Denver and Los Angeles, respectively.
III. PROPOSED METHOD
The main aim of this paper is to detect crimes
happening in video streams such as CCTV camera
feeds. In addition, a module for recognizing faces
in these streams is implemented using the previously
established Triplet Loss method. This paper proposes
two modules to address the given problem. The first
takes a CCTV stream and recognizes the faces in it
using the methods explained below. The second
tackles crime detection using deep learning methods. Figure
1 gives a brief overview of both the modules
discussed in this paper. The Face Recognition
module uses OpenCV's built-in DNN (Deep Neural
Networks) module to train a model that is primarily
built to differentiate and recognize faces. This
is done by identifying faces in the given data,
performing data preprocessing, and then training on
embeddings using the Triplet Loss function. After
these embeddings are calculated, the model is used
to recognize such faces. This can easily be done by
loading the model and using a webcam; identified
faces are surrounded with bounding boxes along with
a confidence value. This paper was able to
achieve significant results when evaluating this
module. The Crime Detection module first selects
input videos that contain criminal activity as
well as videos with normal (no criminal) activity.
After basic preprocessing steps, which include data
augmentation and converting the videos into image
frames, the data is trained on a pre-existing
ResNet architecture, from which accuracy and other
evaluation metrics are observed.
Finally, using the model that is trained, one can
detect crimes by playing simulated webcam streams
through a mobile phone. Accuracies over 90% were
achieved when evaluating for a total of six classes in
this paper. Overall, this paper suggests an
end-to-end pipeline that ties both modules together
and uses them to simultaneously detect a crime and
identify who is committing it.
A. Face Recognition
Face Detection is the technique of detecting and
returning the location of a face inside the frame from
a given photo or video stream. Face verification
takes it a step further and checks if the given face
holds any resemblance with another face stored in
memory. This is done by gauging the similarity
between the two faces using distance metrics like the
L2 norm or the cosine similarity. Finally, face
recognition cumulatively involves both the above
techniques to extract prominent features from the
face and identify who the face belongs to from a set
of labels obtained from the dataset on which the
model is being trained. The proposed solution that
this paper offers for face recognition involves
detecting faces, computing the embeddings of each
face, training a Support Vector Machine (SVM) on the
given embeddings, and finally recognizing faces in
images or simulated video streams. Figure 2 explains
the pipeline discussed in this paper. A Caffe model
is used for face detection and the OpenFace model
for feature extraction.
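The verification step described above can be sketched with plain NumPy. The 0.6 distance threshold and the 128-dimensional vectors below are illustrative stand-ins for a tuned threshold and real OpenFace embeddings:

```python
import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two face embeddings."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the alternative metric mentioned above."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_face(a: np.ndarray, b: np.ndarray, threshold: float = 0.6) -> bool:
    """Verify two embeddings as the same person if their L2 distance
    falls below a threshold (0.6 here is purely illustrative)."""
    return l2_distance(a, b) < threshold

# Toy 128-d embeddings standing in for OpenFace outputs.
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
close = anchor + rng.normal(scale=0.001, size=128)  # near-duplicate embedding
far = -anchor                                       # maximally dissimilar embedding
print(is_same_face(anchor, close), is_same_face(anchor, far))  # True False
```

In practice the threshold is chosen by validating distances between known same-person and different-person pairs.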
Fig. 1. Overview of Face Recognition and Crime
Detection Module
Fig. 2. Overall Face Recognition Pipeline with
OpenCV
OpenCV's deep learning face detection is based on
the Single Shot Detector (SSD) architecture with a
ResNet backbone. Single-shot detection refers to a
technique wherein the model requires only a single
forward pass to detect multiple objects within the
image. It discretizes the given image into bounding
boxes around regions whose feature maps have high
confidence and generates multiple boxes around such
regions. The confidence for each of these boxes is
calculated and the box dimensions are adjusted to
obtain the best fit for detection. Figure 3 depicts
how the final bounding boxes look once a face is
recognized.
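A minimal sketch of how an SSD output tensor is typically parsed into pixel-space bounding boxes. The model file names in the comments are illustrative, and the synthetic tensor stands in for a real forward-pass result:

```python
import numpy as np

def parse_detections(detections, frame_w, frame_h, conf_threshold=0.5):
    """Convert a raw SSD output tensor (shape 1x1xNx7, rows of
    [batch, class, confidence, x1, y1, x2, y2] with normalized
    coordinates) into pixel-space boxes above a confidence cutoff."""
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        scale = np.array([frame_w, frame_h, frame_w, frame_h])
        x1, y1, x2, y2 = det[3:7] * scale
        boxes.append((int(x1), int(y1), int(x2), int(y2), confidence))
    return boxes

# With OpenCV and the Caffe model files, this would be driven by:
#   net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "face_ssd.caffemodel")
#   blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
#                                (300, 300), (104.0, 177.0, 123.0))
#   net.setInput(blob); detections = net.forward()
# (file names here are illustrative placeholders).

# Synthetic output tensor: one confident face, one low-confidence region.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.98, 0.25, 0.25, 0.50, 0.50]
fake[0, 0, 1] = [0, 1, 0.10, 0.70, 0.70, 0.80, 0.80]
print(parse_detections(fake, 640, 480))  # only the confident face is kept
```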
Fig. 3. SSD – Multiple Bounding Boxes with Localization and Confidence for a Given Face
Additionally, we can compute facial landmarks
(mouth, right/left eyebrows, eyes, nose, jawline)
using the dlib library, which further enables us to
preprocess the images and perform face alignment
on datasets for better results. After cropping and
performing face alignment, we pass the given face
through the proposed neural network. For training a
face recognition model, an input batch must contain
an Anchor image (current image of person "A"), a
Positive image (another image of person "A"), and
a Negative image (any image that is not of person
"A"). From these, the neural network calculates the
face embeddings and adjusts its weights using a
method called triplet loss. In this way, the
embeddings of the "Anchor" and the "Positive"
image end up close to each other, whereas the
embedding of the "Negative" image is farther away.
A CNN (Caffe) model computes the embeddings for all
input images, and these embeddings are sufficiently
distinctive to train a classifier such as an SVM,
SGD Classifier, Random Forest, etc. on top of the
calculated face embeddings, which completes the
facial recognition pipeline.
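The triplet loss just described can be written out in a few lines of NumPy. The margin of 0.2 is a common illustrative choice (e.g. in FaceNet-style training), not a value stated in this paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: push the anchor-positive distance below the
    anchor-negative distance by at least `margin`. Zero loss means
    the triplet is already correctly separated."""
    pos_dist = np.sum((anchor - positive) ** 2)  # squared L2 to the positive
    neg_dist = np.sum((anchor - negative) ** 2)  # squared L2 to the negative
    return float(max(pos_dist - neg_dist + margin, 0.0))

anchor = np.array([0.0, 1.0])
positive = np.array([0.0, 0.9])   # same person: close to the anchor
negative = np.array([1.0, 0.0])   # different person: far away
print(triplet_loss(anchor, positive, negative))  # 0.0 -> constraint satisfied
print(triplet_loss(anchor, negative, positive))  # positive loss -> weights must adjust
```

During training, the network's weights are updated to drive this loss toward zero over many such triplets.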
1) Data Augmentation: To make the most of a limited
amount of data, this paper applies several data
augmentation methods that multiply the effective
size of the dataset. Flipping, rotating, zooming,
translating along the x-axis and y-axis, scaling,
cropping, adding Gaussian noise, shearing, skewing,
applying black-and-white filters, and blurring
images have all been implemented. The augmented
version of our dataset is displayed in Figure 4.
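A few of the listed augmentations can be sketched directly as NumPy array operations; rotation, shearing, and the other geometric transforms would normally be delegated to an image-processing library:

```python
import numpy as np

def augment(img, rng):
    """Yield simple augmented variants of an HxWxC image array:
    horizontal/vertical flips, Gaussian noise, and a grayscale filter."""
    yield img[:, ::-1]                                    # horizontal flip
    yield img[::-1, :]                                    # vertical flip
    noisy = img + rng.normal(0, 10, img.shape)            # additive Gaussian noise
    yield np.clip(noisy, 0, 255).astype(img.dtype)
    gray = img.mean(axis=2, keepdims=True)                # black-and-white filter
    yield np.repeat(gray, img.shape[2], axis=2).astype(img.dtype)

rng = np.random.default_rng(42)
frame = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # toy frame
variants = list(augment(frame, rng))
print(len(variants), [v.shape for v in variants])
```

Each variant keeps the original label, so one labeled frame yields several training samples.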
B. Video Classification for Crime Detection
This module aims at detecting anomalies in video
footage of criminal and normal activity. The dataset
used for training is the UCF Crime dataset, which is
the only dataset that contains videos of such
diverse classes of crimes, each replete with
valuable and distinctive features. The dataset
contains 13 classes in total: Accidents, Fighting,
Burglary, Shoplifting, Robbery, Shooting, Abuse,
Arrest, Arson, Assault, Explosion, Stealing, and
Vandalism. In total, it consists of approximately
1,900 real-world videos, taken in different places,
each video showing a realistic crime. This amounts
to 128 hours of video, cumulatively occupying 95 GB
of storage. Figure 5 depicts the pipeline followed
for this module. Figure 6 showcases videos of a few
of the classes.
Fig. 4. Different Types Of Augmented Images
Fig. 5. Crime Detection Module
1) Dataset Preparation and Pre-Processing: Three
techniques are identified and used on each scene,
which involve converting, enhancing, and finally
augmenting the data. This paper uses six classes:
"Abuse", "Assault", "Fighting", "Normal",
"Robbery" and "Vandalism". Because of the sheer
size of the given videos, the authors had to reduce
the number of dataset classes. Video editing and
trimming helped reduce redundant and misleading
features: videos that were 5 minutes long were
trimmed to 45 seconds, focusing on the duration in
which the crime occurred, with irrelevant portions
discarded. Low-resolution videos were sharpened and
cropped to highlight portions of the crime scene.
The parts of each video containing the crime are
manually labeled, and the remaining stream of the
video can then be labeled under the normal class.
Finally, data augmentation was performed to enlarge
the variety of data available to train the model.
Figure 7 gives an example of the data augmentation
that was done.
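The conversion of trimmed clips into image frames can be sketched as follows. The sampling helper is pure Python, while the OpenCV portion assumes a video file is available and uses illustrative paths:

```python
def frame_indices(total_frames, fps, step_seconds=1.0):
    """Indices of the frames to keep when sampling one frame every
    `step_seconds` from a video with `total_frames` frames at `fps`."""
    step = max(1, int(round(fps * step_seconds)))
    return list(range(0, total_frames, step))

def extract_frames(video_path, out_dir, step_seconds=1.0):
    """Dump sampled frames of a (trimmed) clip as JPEGs.
    Requires OpenCV; the paths passed in are illustrative."""
    import os
    import cv2  # imported lazily so the sampling helper stays dependency-free
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unreported
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices(total, fps, step_seconds))
    os.makedirs(out_dir, exist_ok=True)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keep:
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()

# A 45-second trimmed clip at 30 fps sampled once per second yields 45 frames.
print(len(frame_indices(45 * 30, 30)))  # 45
```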
Fig. 6. Video Classes being turned into photos for preparation
Fig. 7. Normal Image, Rotated Left, Rotated Right
2) Residual Network (ResNet): ResNet layers are
formulated to learn residual functions with
reference to the layer inputs, instead of learning
unreferenced functions. The architecture, with a
depth of between 18 and 152 convolutional layers,
bypasses the signal from one layer to another by
introducing a shortcut connection. These connections
allow gradients to flow between layers of the
network, from the early ones to the later ones,
thereby easing the training of very deep networks.
The Residual Block illustrated in Figure 8 shows how
the connection bypasses the signal from the top to
the tail of the block. By introducing residual
connections, the architecture was able to solve the
vanishing gradient issue: with skip connections, the
network is able to skip training for a few layers
and make a direct connection to the output
[12], [13].
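The skip connection can be illustrated with a toy fully-connected residual block in NumPy (a simplification of the convolutional blocks in the actual architecture). When the learned residual function F(x) is zero, the block reduces to a ReLU of the identity, which is what lets very deep stacks train:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: output = relu(F(x) + x), where
    F(x) = w2 @ relu(w1 @ x) is the learned residual function and
    the identity skip connection carries x straight through."""
    fx = w2 @ relu(w1 @ x)
    return relu(fx + x)

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# With zero weights, F(x) = 0 and the block passes x through (up to
# the final ReLU) -- gradients can bypass the residual path entirely.
w_zero = np.zeros((4, 4))
print(np.allclose(residual_block(x, w_zero, w_zero), relu(x)))  # True
```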
3) Proposed Video Classification Technique: The
proposed pipeline for crime classification loops
over all frames in an input video. Each frame is
passed through a CNN and classified individually,
independent of the others. The model then chooses
the label with the largest probability for the
frame, labels it, and finally writes the output
frame. Since crime detection is a sequential
problem, this single-frame method alone will not
work; the correlation between subsequent frames of a
single video input must be preserved.
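One common remedy, and the rolling-average scheme this paper adopts, is to average the last N per-frame predictions before choosing a label. A minimal sketch, where the window size of 3 and the probability vectors are illustrative:

```python
from collections import deque
import numpy as np

CLASSES = ["Abuse", "Assault", "Fighting", "Normal", "Robbery", "Vandalism"]

def make_smoother(n=16):
    """Return a function that averages the last `n` per-frame softmax
    vectors and emits the label with the largest mean probability
    (n=16 is an illustrative window size, not the paper's value)."""
    window = deque(maxlen=n)
    def smooth(frame_probs):
        window.append(np.asarray(frame_probs))
        mean = np.mean(window, axis=0)   # rolling average of the last n frames
        return CLASSES[int(np.argmax(mean))]
    return smooth

smooth = make_smoother(n=3)
print(smooth([0.1, 0.0, 0.8, 0.1, 0.0, 0.0]))  # Fighting
print(smooth([0.0, 0.0, 0.3, 0.7, 0.0, 0.0]))  # one noisy frame is outvoted: Fighting
print(smooth([0.1, 0.0, 0.8, 0.1, 0.0, 0.0]))  # Fighting
```

Averaging suppresses single-frame flicker, so a momentary misclassification does not change the reported label.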
Fig. 8. Residual Block
This is achieved by computing the prediction for a
given frame while maintaining a list of the last N
predictions. The pipeline then computes the average
of these last N predictions, chooses the label with
the largest probability, and returns the final
output.
4) Training the ResNet on the UCF-Crimes Dataset:
The training and testing steps through ResNet are
discussed below.
First, the file directories of the training and
validation images are located, and training
parameters such as batch size, number of epochs,
image height and width, and learning rate are
specified. Training and validation data are
generated using the ImageDataGenerator module in
TensorFlow. As training runs, the behaviour of each
epoch is saved and plotted at the end of training.
Finally, the model, along with its weights, is
stored. For training, the stated hyper-parameters
such as learning rate, number of epochs, and batch
size are specified for the model; these
hyper-parameters must be taken into account while
training a CNN to improve performance. Figure 9
shows the final training epochs.
The following parameters were provided to the ResNet module:
Splitting the dataset: 25% kept as test data, the remainder for model training.
Number of epochs = 50
Loss = categorical cross-entropy
Optimizer = stochastic gradient descent with a learning rate of 0.0001
Metric = accuracy
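The stated loss and optimizer can be illustrated in isolation; the class probabilities, gradient, and weight values below are hypothetical and chosen only to show the arithmetic:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """The training loss: -sum over classes of y_true * log(y_pred)."""
    return float(-np.sum(y_true * np.log(y_pred + eps)))

# One-hot target for a 6-class problem (here, the second class).
y_true = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
confident = np.array([0.02, 0.90, 0.02, 0.02, 0.02, 0.02])
uniform = np.full(6, 1 / 6)
# A confident correct prediction incurs a much smaller loss.
assert categorical_cross_entropy(y_true, confident) < categorical_cross_entropy(y_true, uniform)

# A vanilla SGD step with the stated learning rate of 0.0001:
# each weight moves a small distance against its gradient.
lr = 1e-4
w = np.array([0.5, -0.3])
grad = np.array([2.0, -1.0])
w = w - lr * grad
print(w)  # [ 0.4998 -0.2999]
```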
The evaluation of the final model is based on the
following terms:
False Positive: no crime occurs in the scene, but
the model considers otherwise.
False Negative: the model fails to detect a crime
that occurred.
True Positive: the model correctly detects a crime
that occurred.
True Negative: no crime occurs and the model
correctly reports none.
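From these counts, precision and recall follow directly. The counts below are hypothetical, chosen only to show the arithmetic; they are not taken from the paper's confusion matrix:

```python
def precision(tp, fp):
    """Of everything flagged as a crime, the fraction that really was one."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all crimes that occurred, the fraction the model caught."""
    return tp / (tp + fn)

# Hypothetical per-class counts for illustration only.
tp, fp, fn = 94, 6, 10
print(round(precision(tp, fp), 2), round(recall(tp, fn), 3))  # 0.94 0.904
```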
Fig. 9. Training the classifier (Epochs = 50)
The final epoch displays the following metrics:
Training Loss: 0.158
Training Accuracy: 0.953
Validation Loss: 0.1168
Validation Accuracy: 0.966
IV. RESULTS
A. Face Recognition
Training for face recognition was done on three
different faces, with approximately 15 photos of
each person. Having limited resources and
computational power, the authors chose to keep the
data for this module minimal. The results, using a
webcam as a simulated CCTV stream, are shown below
in Figure 10 and Figure 11.
As seen, the model was able to correctly classify
the faces as well as provide bounding boxes and
confidence levels for them.
Fig. 10. Face Recognition for person A
(simulating a CCTV)
Fig. 11. Face Recognition for person B
(simulating a CCTV)
B. Crime Detection
After training for 50 epochs, the model was
evaluated and gave the results shown in Table I and
Table II. The training loss and accuracy, along with
the validation loss and accuracy, were recorded and
are shown in Figure 12. Using the trained model, the
authors ran evaluations on rolling streams of
YouTube videos of the criminal activities
"Fighting", "Vandalism" and "Abuse". The model was
able to correctly predict the happenings in each
stream, as shown in Figure 13, Figure 14 and
Figure 15.
V. CONCLUSION AND FUTURE WORKS
The face recognition module developed in this
project was able to demonstrate a simulated version
of the final prototype and successfully replicate a
facial recognition model. The Crime Detection module
gave satisfactory results with the ResNet model,
having been trained on numerous classes after the
final data preprocessing. Future enhancements would
include using more facial classes for training the
face recognition model. It would also be interesting
to create a database of these faces so that a given
victim can easily be identified when deploying the
implementation with a live CCTV camera stream.
Trying different facial recognition models and
comparing the results would also be worthwhile
experiments. Owing to constraints on computational
power, this paper uses a limited number of classes;
with more data, the project could aim to classify
additional types of crimes.
Table I Results for the Different Classes
Table II Accuracy Metrics
Fig. 12. Loss and Accuracy as a Function of Epochs
Fig. 13. Detecting and Classifying Crimes from Test
Footage
Fig. 14. Detecting and Classifying Movement with
Weapons
Fig. 15. Detecting and Classifying Crimes
REFERENCES
[1] A. R. Zamir, “Action recognition in realistic sports videos,” in
Computer vision in sports. Springer, 2014, pp. 181–208.
[2] H. Agarwal, A. Singh, and D. Rajeswari, “Deepfake Detection
using SVM,” in Proceedings of the 2nd International Conference on
Electronics and Sustainable Communication Systems (ICESC), 2021,
pp. 1245–1249.
[3] H. Soni, P. Arora, and D. Rajeswari, “Malicious Application
Detection in Android using Machine Learning,” in Proceedings of
the 2020 IEEE International Conference on Communication and
Signal Processing (ICCSP), 2020, pp. 846–848.
[4] R. Olmos, S. Tabik, and F. Herrera, Soft Computing and
Intelligent Information Systems research group, “Automatic
Handgun Detection Alarm in Videos Using Deep Learning,”
February 20, 2017.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep
Residual Learning for Image Recognition”, 2015.
[6] UCF-Crime dataset (real-world anomaly detection in videos),
Jun 2019.
[7] M. Cheng, K. Cai, and M. Li, “RWF-2000: An Open Large Scale
Video Database for Violence Detection,” ICPR 2020,
https://arxiv.org/pdf/1911.05913v3.pdf.
[8] Umadevi V. Navalgund; Priyadharshini K, “Crime Intention
Detection System Using Deep Learning”, International
Conference on Circuits and Systems in Digital Enterprise
Technology (ICCSDET), December 2018.
[9] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar,
and L. Fei-Fei, “Large-scale video classification with
convolutional neural networks,” in The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), June 2014.
[10] K. O’Shea and R. Nash, “An Introduction to Convolutional
Neural Networks,” ArXiv e-prints, 2015.
[11] T. Almanie, R. Mirza, and E. Lor, “Crime Prediction Based on
Crime Types and Using Spatial and Temporal Criminal Hotspots,”
International Journal of Data Mining & Knowledge Management
Process (IJDKP), vol. 5, no. 4, July 2015.
[12] D. Sarwinda, R. H. Paradisa, A. Bustamam, and P. Anggia,
“Deep Learning in Image Classification using Residual Network
(ResNet) Variants for Detection of Colorectal Cancer,” Procedia
Computer Science, vol. 179, pp. 423–431, 2021.
[13] P. Stalidis and T. Semertzidis, “Examining Deep Learning
Architectures for Crime Classification and Prediction: Improved
Forecasting through Artificial Intelligence,” October 2021.