Disparity-Based Multiscale Fusion Network
for Transportation Detection
Jing Chen, Qichao Wang, Weiming Peng, Haitao Xu, Xiaodong Li, and Wenqiang Xu
Abstract—The detection of long-distance small objects in transportation scenes suffers from low accuracy. In this work, we propose DMF, a disparity-based multiscale fusion network. We map disparity regions at different depths to 2D candidate regions according to distance to address the small-object detection problem. The method clusters disparity maps of different depths, and image features are extracted from the projected mapping regions. On the one hand, it uses a multicluster method to upsample the 2D mapping regions; on the other hand, feature fusion at different scales is performed on each cluster region. Experimental results on two datasets show that DMF improves the detection accuracy of small objects.
Index Terms—Disparity depths, long distance, small objects,
multicluster, multiscale.
I. INTRODUCTION
DETECTION methods based on 2D image convolutional
neural networks (CNNs) have shown impressive results
on various tasks such as object detection, classification,
instance segmentation, and tracking [1]–[3]. They can provide
object positions and category confidence in 2D images. How-
ever, such methods lack depth information. These methods
cannot accurately quantify the distance between an object and
the camera [4]. Therefore, 3D object detection methods are
required to obtain the 3D size and rotation angle of an object
for the fields of automatic driving and traffic monitoring.
Thus, further information can be obtained, such as the speed
and direction of a detected object. In addition, 3D detection
methods have natural advantages to handle occluded and
overlapping objects.
The goal of 3D object detection is to obtain 3D bounding
boxes. In addition to an RGB image, the required information
contains the corresponding depth. In [5], a depth map was
added to the 2D object detection framework. By detecting the
contours in the image, 2.5D proposals were generated. These
proposals included the disparity, height, and tilt angle of each
pixel of the target. On this basis, CNN was used for feature
extraction, and a support vector machine (SVM) was used for
object classification. In [6], a 3D object proposal (3DOP)
method for autonomous driving scenarios was proposed. This
method used a tuple to represent each 3D bounding box. The
tuple elements included the center coordinates, direction angle,
object category and the corresponding 3D box template set.
In [7], a 3D object detection method called deep sliding shapes
was proposed, which focused on indoor scenes. The paper
proposed a multiscale 3D region proposal network (RPN). For
each sliding window, 19 types of anchor boxes were defined.
A 3D CNN method with deep sliding shapes, which outputs
3D bounding boxes by inputting the stereo scene, was also
designed.
Some problems remain with the current 3D object detection
methods: (1) To detect small objects, few effective strategies
have been adopted. Because small objects carry less infor-
mation, the abilities of the current models to express small
object features are weaker. The most common method to
solve this problem is to use upsampling to resize the input
image in the training network. Due to the low efficiency of
image pyramids, some methods that consider features, such as
feature pyramid networks (FPNs) [8] and scale normalization for image pyramids (SNIP) [9], have been proposed. (2) Hand-
crafted features must be added to most of these 3D methods
to compensate for the missing depth information. The use
of these specific hand-crafted features and a single RGB image hinders the extension to new application scenes.
These problems also restrict neural networks from effectively
learning 3D spatial information. (3) Since 3D object detection
adds additional depth information, it takes longer to acquire
these features.
General detection methods are only trained to detect objects
within a specific scale. The remaining range is ignored during
the back propagation process, which makes it difficult to detect
large objects in high-resolution images and small objects
in low-resolution images. The commonly used approach for
small-object detection is to upsample an image and apply
a pretrained network on the high-resolution image [10].
Detection with high-resolution images can enhance the small-
object information, which is conducive to better classification.
However, these methods miss the detection of medium- and
large-sized objects [9]. In addition, the method of dividing the
scale according to the target size in an image can improve
the detection accuracy of different scale targets. However, this
type of method is not suitable for road scenes because, in a traffic scene, a small target in an image may be a small-size object near the viewpoint or a large-size object far from the viewpoint in the real world. Therefore, the multiscale
detection of road objects based only on the size of an image
target may result in unsatisfactory results. We use an adaptive
scale based on disparity clustering to obtain more accurate
scale information for objects at different distances in an image.
We summarize our main contributions as follows:
• We propose a multicluster region mapping method based on disparity segmentation to detect objects of different sizes. The feature loss caused by semantic abstraction in the shallow network can be reduced.
• We use feature extraction at different scales and feature fusion strategies at different levels. Multiscale fusion can be applied to the upsampled features and the bottom-layer features based on distance information. The recognition accuracy of extreme-size objects can be improved.
• Most datasets that serve the field of autonomous driving differ from urban traffic scenes in both sampling angle and range. We therefore constructed a stereo vision dataset covering urban traffic roads in Hangzhou, which makes the detection method more suitable for traffic monitoring scenarios.
This paper is organized as follows. Section II discusses
the related works. Section III introduces the multiscale fusion
model based on disparity, including the generation method
of multiscale regions, mapping of disparity regions and mul-
tiscale feature fusion network. Section IV uses the KITTI
dataset to verify the proposed method. The effectiveness of this
method is verified by quantitative evaluations and comparative
experiments. Section V presents the conclusions.
II. RELATED WORK
3D object detection methods take as input images or image channels related to the distances of objects in a scene. According
to the feature dimension of the detection method, 3D detection
methods can be divided into three categories: image-based
detection, point cloud-based detection, and image and point
cloud fusion detection.
Image-based detection methods are mainly based on stereo
vision or monocular depth estimations. Qian et al. [11] used
a depth estimation method to generate a dense depth map.
By introducing a change of representation (CoR) layer, the
depth map can be directly converted to a pseudolidar point
cloud for 3D object detection. However, pseudolidar methods
are also required to estimate the background depth, which
wastes computational resources. In [12], DSGN was proposed
to extract 2D features from stereo image pairs with a Siamese
network. Then, these features were used to construct a plane-
sweep volume, which was transformed into a 3D geometric
volume by reversing the 3D projection for object detection.
When constructing a 3D volume, this method considers both the image depth information and the semantic information, so it is not efficient. In [13], Stereo R-CNN was proposed to create 2D
anchors and associated objects in left and right images. Then,
keypoints and viewpoints were combined with 2D anchors
to estimate the rough 3D object box. Finally, photometric
alignment was used to refine the 3D bounding boxes. However,
this method was inferior to 3D object detection methods based
on depth maps. CG-Stereo [14] proposed a confidence-guided
stereo 3D object detection pipeline, which used separate
decoders for foreground and background pixels to focus more on the accurate points and boost the 3D object detection accuracy.
Affected by the receptive field, traditional 3D CNNs cannot
learn the local features of different scales well [15], [16].
Therefore, new network structures for point cloud features, such as PointNet [17] and PointCNN [18], were proposed.
PointNet [17] was the first deep learning framework that
directly addressed point clouds. After the original data and
features were aligned, the features that used feature points
as units were extracted first. Then, the global feature vec-
tor of the point cloud was extracted in the feature space
dimension. Afterward, a series of 3D detection methods
based on point clouds appeared, such as PV-RCNN [19] and
Frustum-PointNet [20]. In [18], an X-transform method called
PointCNN was proposed, which learned a set of weights X
from the input point. This set of weights can be used to
reweight and arrange the features associated with each point.
These methods focus on how to learn more effective spatial
geometric representations from point clouds. However, 3D
CNN on a large-scale sparse point cloud requires convolution
in 3 dimensions, so the entire detection process is extremely
time consuming. Another type of point cloud-based detection
method maps the point cloud to a certain plane, such as a
front view (FV) or a bird’s-eye view (BEV). This method
can maintain the depth information of a space and geometric
shape of an object. In addition, the occlusion problem can
be easily solved. A BEV method [21] with point cloud data
was used to achieve real-time 3D detection. It projects the
input point cloud to a 3-channel BEV plane. Then, 2D object
features and ground estimations are combined to realize offline
3D object detection. Because of the information loss in a
certain dimension during the mapping process, this type of
method is only suitable for scenarios where objects are located
on the same plane (such as in autonomous driving). In this
case, a partial loss of information will not greatly impact
the detection results. Recently, Point-GNN [22] was used to
encode a point cloud scene into a graph structure, and a graph
neural network was proposed for 3D object detection. 3DSSD
[23] is a one-stage anchor-free method. It uses a new fusion
sampling strategy to reduce the amount of calculation.
Image and point cloud fusion detection methods usually
map point clouds to a certain plane or several planes. A point
cloud can be fused with a feature map by selectively adding
handcrafted features. In [24], a deep fusion network for
lidar and image data was proposed, which is called MV3D.
MMF [25] was used to project a point cloud into an image
coordinate system to obtain a sparse depth image to learn
better feature representations. In addition, ground estimation
and depth completion can be used to improve the 3D object
detection precision. In [26], PointPainting was proposed to
Fig. 1. Disparity-based detection network. The network structure can be divided into two parts: multicluster embedding and a multiscale network.
project semantic image information onto a point cloud. With
PointPainting, the semantic features of a point cloud can be
increased, and the deficiency of point cloud features can be
compensated. In [27], the fusion of BEV and FV proved that
fusion before RPN achieved better results. The sparse nonho-
mogeneous pooling layer in the network mainly implemented
the perspective transformation and fusion process between
the RGB image and the BEV map. ContFuse [28] used ResNet [29]
to extract features in both image and point cloud streams.
Multiscale fusion was performed on the image features; then,
the features were projected onto the BEV map. The fusion of
image features, location information, and point cloud stream
features was used to achieve 3D detection. This fusion method,
which simultaneously processes multiple data, can obtain more
comprehensive spatial and texture information. However, due
to the fusion operation and handcrafted feature acquisition, the
computation speed of the entire system decreases.
III. DISPARITY-BASED MULTISCALE
DETECTION METHOD
In this paper, we utilize the depth information of the dis-
parity map to branch the RGB image. The multiscale network
is obtained by clustering the depth feature. Then, the image
can be sent to a multiscale network for detection to balance
the recall ability and complexity of the model.
Our method inputs regions with different depths into a
detection network with different levels of feature fusion.
Thus, the recognition rate of small objects can be improved.
As shown in Figure 1, the network structure can be divided
into two parts: multicluster embedding and a multiscale net-
work. In the RPN part, the left image generates proposals
of various scales through multicluster embedding operations.
After RPN, through the operation of RoIAlign, the region of
interest (RoI) feature in each scale is obtained. Finally, after
the non-maximum suppression (NMS), the final classification
and regression results are obtained.
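The paper does not spell out which NMS variant is used; for reference, here is a minimal NumPy sketch of standard greedy, IoU-based suppression as typically applied after RoI classification (the threshold value and helper name are ours):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over axis-aligned 2D boxes.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the boxes that survive suppression.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                 # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]       # drop heavily overlapping boxes
    return keep
```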
A. Disparity-Based Multicluster Embedding
In the multicluster embedding part, DMF inputs the map-
ping regions of different depths into different clusters to
replace the traditional network method of training and testing
on the entire image. Apart from the FPN processing at different
scales, each cluster uses the same network structure. Therefore,
each cluster can share weights to reduce the complexity and
training time.
The image pair correlation method [13] does not require
additional matching calculations, but the data association is
based on the premise that the left and right image pairs
have identical detection results. It is difficult to satisfy this
correlation in an actual detection environment. For example,
if we adjust the sampling parameters of an image in the
image pair, the data correlation method will most likely fail.
In actual sampling, inconsistent detection results are common
due to differences in camera parameters, synchronization time
and other factors. Considering the robustness and real-time
requirements, we use the Fourier fast matching algorithm
proposed in [30] to calculate the disparity in the image pair.
The RGB image and disparity map are normalized first.
Then, we perform threshold segmentation on the disparity
image according to the depth information. The relationships
between object sizes and their distances in different image
regions are not linear. Therefore, we use the Fibonacci function
to segment the pixel region. Each cluster of the disparity image
can finally be obtained. The image segmentation algorithm
based on the disparity depth can be described as Algorithm 1.

Algorithm 1: Disparity-Based Cluster Algorithm
Input: disparity samples $x_1, x_2, \ldots, x_m$;
Output: $k$ clusters and the average distance of each;
divide the samples into $k$ initial clusters by distance;
while not converged do
    $c^{(i)} := \arg\min_k \|x_i - \mu_k\|^2$;    // assign each sample to its nearest center
    for each cluster $k$ do
        $\mu'_k := \frac{1}{|c^{(i)}|}\sum_{x \in c^{(i)}} x$;    // recompute the center
        if $\mu'_k \neq \mu_k$ then $\mu_k = \mu'_k$;
        update distance;
    end
    $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_k) = \frac{1}{m}\frac{1}{k}\sum_{i=1}^{m}\sum_{j=1}^{k}\|x^{(i)} - \mu_j\|^2$;
end
$\mu_1, \ldots, \mu_k$ represent the centers of the $k$ depth ranges;
return $\mu_1, \ldots, \mu_k$;

where $\{x_1, x_2, \ldots, x_m\}$ are the sample points of the disparity map at different distances; $k$ is the number of cluster branches; $c^{(i)}$ is the assignment of sample point $i$ to its nearest cluster center; $\mu'_k$ is the updated cluster center; and $J$ is the error function of the algorithm.
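A minimal NumPy sketch of Algorithm 1 as we read it: plain k-means over 1D disparity samples, with the error function J tracked each round. Quantile seeding stands in for the Fibonacci-based initial segmentation, and all names are ours:

```python
import numpy as np

def disparity_kmeans(x, k, rounds=50):
    """Cluster 1-D disparity samples x into k depth ranges (cf. Algorithm 1).

    Assign each sample to its nearest center, recompute the centers, and
    track the error J until the centers stop moving.
    """
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))                # initial centers
    for _ in range(rounds):
        c = np.argmin(np.abs(x[:, None] - mu[None, :]), axis=1)  # assignment step
        new_mu = np.array([x[c == j].mean() if np.any(c == j) else mu[j]
                           for j in range(k)])                   # center update
        J = np.mean((x - new_mu[c]) ** 2)                        # error function J
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, c, J

# e.g. mu, c, J = disparity_kmeans(np.random.randint(0, 256, 10000).astype(float), k=3)
```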
To reduce the loss of edge features in disparity clustering,
we added an overlap between two regions adjacent to the
distance scale, i.e., we retain at both scales the region where
the disparity value is in the overlap range. Assuming that the
number of clusters is N, we define the overlapping area of the
corresponding segmentation disparity $o_i$ as

$$o_i = o_{i-1}\delta, \quad i = 0, 1, \ldots, N-1 \qquad (1)$$

where $\delta$ is the partition coefficient and $o_0$ is the overlapping area of the first layer. The parameters must be selected to
ensure that there is no significant difference in resolution
between the effective region of interest and the training
network after rescaling. Through the disparity segmentation
of the depth range, target regions at different distances can
be separated. Branches can be easily established in the sub-
sequent detection process. Objects with different gray values
in different distance ranges are separately detected. The same
scale is applied for objects within the corresponding depth
range.
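To make Eq. (1) concrete, the sketch below shows one plausible way to expand cluster cut points into overlapping disparity intervals; the cut points, o0, and δ are illustrative values chosen so the output resembles the intervals reported in Section IV-B, not values taken from the paper:

```python
def overlapping_intervals(bounds, o0, delta):
    """Expand sorted cluster cut points into overlapping disparity intervals.

    Per Eq. (1), the overlap of layer i is o_i = o_{i-1} * delta, so the
    shared band between adjacent intervals shrinks layer by layer.
    """
    intervals, o = [], float(o0)
    for i in range(len(bounds) - 1):
        lo = bounds[i] - (o / 2 if i > 0 else 0)                    # reach into the previous layer
        hi = bounds[i + 1] + (o / 2 if i < len(bounds) - 2 else 0)  # and into the next one
        intervals.append((max(bounds[0], lo), min(bounds[-1], hi)))
        o *= delta
    return intervals

# overlapping_intervals([0, 164, 221, 255], o0=10, delta=0.6)
# -> [(0, 169.0), (161.0, 224.0), (219.2, 255)]
```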
B. Clustering Based Multiscale Network
In the multiscale part, a multiscale pretraining mechanism
and a multiscale feature fusion strategy are utilized. Compared
with the feature fusion of an entire image, multiscale feature
fusion can play the role of a shallow network to improve the
detection effect of small object regions. Small-object detec-
tion requires higher-resolution feature maps to focus on the
information in the corresponding area. The fusion of shallow
high-resolution features and high-level semantic features can
be used to detect objects at different scales.
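The paper does not give the exact fusion block; one FPN-style reading [8] is sketched below in PyTorch: the deeper semantic map is upsampled to the shallow map's resolution, and the two are summed through lateral 1 × 1 convolutions (channel widths are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFusion(nn.Module):
    """Fuse a shallow high-resolution map with a deeper semantic map.

    Both maps are projected to a common width by 1x1 lateral convolutions;
    the deep map is upsampled to the shallow resolution and added.
    """
    def __init__(self, shallow_ch, deep_ch, out_ch=256):
        super().__init__()
        self.lat_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.lat_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        deep_up = F.interpolate(self.lat_deep(deep),
                                size=shallow.shape[-2:], mode="nearest")
        return self.smooth(self.lat_shallow(shallow) + deep_up)

# e.g. ScaleFusion(256, 512)(torch.rand(1, 256, 64, 64), torch.rand(1, 512, 32, 32))
```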
Based on disparity clustering, the disparity segmentation
regions are mapped back to the 2D image according to
the coordinate points. By using a branch extraction strategy,
feature extraction of different scales can be performed on
regions of different depths. The mapping result is branched in
different clusters. Then, pretrained networks of different scales
can be used in each cluster. It is assumed that the distance and
scale satisfy the following relationship:

$$N = ae^{(d+b)} \qquad (2)$$

where $N$ is the scale factor of the network; the number of upsampling steps and network layers are set according to this factor. $d$ is the distance scale between the target in the image region and the optical center of the left camera, and $a, b$ are scale coefficients whose values are obtained from experiments.
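Under this reading of Eq. (2), the per-cluster scale factor is a direct computation; a and b below are placeholders for the experimentally fitted constants, which the paper does not report:

```python
import math

def scale_factor(d, a, b):
    """Scale factor N = a * exp(d + b) from Eq. (2).

    d: distance scale between the region's objects and the left camera's
    optical center; a, b: experimentally fitted constants (placeholders
    here). N then governs how many upsampling steps and network layers
    the cluster's branch uses: farther clusters get larger N, i.e. more
    upsampling for their smaller objects.
    """
    return a * math.exp(d + b)
```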
The first nlayers are shared convolutional layers in the net-
work. Features of different convolutional layers are extracted
after the n+1 th layer. The network is shown in Figure 1. For
clusters with larger distances, a shallower network is used to
fuse the high-resolution and upper-layer semantic features. The
gradient of pixels that are not in the cluster is set to 0 during
the training process. Finally, the detection result is obtained
by classifier and bounding box regression.
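The gradient masking described above can be sketched generically: multiply the per-pixel loss by the cluster's binary mask before reduction, so pixels outside the cluster contribute nothing to backpropagation (a sketch, not the paper's exact implementation):

```python
import torch

def masked_loss(per_pixel_loss, cluster_mask):
    """Zero the gradient contribution of pixels outside the active cluster.

    per_pixel_loss: (H, W) tensor carrying gradients; cluster_mask: (H, W)
    binary tensor. Masked-out pixels neither add to the loss nor receive
    gradients during training.
    """
    return (per_pixel_loss * cluster_mask).sum() / cluster_mask.sum().clamp(min=1)
```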
C. Loss Function
We use multitask loss, including classification, regression,
and depth value. Joint training has faster speed and a smaller
memory footprint. In addition, we adopted a multiscale train-
ing strategy. In the loss function, the loss of each scale task
is calculated according to the range of the clustering branch.
The loss function of the network structure is defined as:

$$L = \sum_{i=1}^{n} L^{cls}_i + \sum_{i=1}^{n} L^{reg}_i + \sum_{i=1}^{n} L^{depth}_i \qquad (3)$$

where $L^{cls}$ is the classification loss, $L^{reg}$ is the regression loss, $L^{depth}$ is the depth distance loss, and $n$ is the number of cluster branches.
$$L^{cls}_i = -\log\left[p^{*}_i p_i + (1 - p^{*}_i)(1 - p_i)\right] \qquad (4)$$

$$p^{*}_i = \begin{cases} 0, & \text{negative label} \\ 1, & \text{positive label} \end{cases} \qquad (5)$$

where $p_i$ is the predicted probability that the object is present, and $p^{*}_i$ is the ground-truth label.
The regression loss is calculated using the smooth $L_1$ function based on the target distance, where $x_i$, $y_i$, $w_i$, and $h_i$ are the ground-truth bounding box center coordinates, width, and height, and $x'_i$, $y'_i$, $w'_i$, and $h'_i$ are the predicted bounding box center coordinates, width, and height:

$$L^{reg}_i = \frac{1}{N_{reg}} \sum_{i}^{N} \text{smooth}_{L_1}\!\left[(x_i - x'_i)^2 + (y_i - y'_i)^2 + (w_i - w'_i)^2 + (h_i - h'_i)^2\right] \qquad (6)$$
The depth loss uses the smooth $L_1$ function:

$$L^{depth}_i = \frac{1}{N_D} \sum_{i=1}^{N_D} \text{smooth}_{L_1}(d_i - \hat{d}_i) \qquad (7)$$

where $N_D$ is the number of pixels in the ground-truth depth.
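A compact PyTorch sketch of Eqs. (3)-(7), summing the classification, regression, and depth terms over the cluster branches; the branch bookkeeping and normalization constants are simplified, and F.smooth_l1_loss stands in for the smooth-L1 terms:

```python
import torch
import torch.nn.functional as F

def branch_loss(p, labels, box_pred, box_gt, depth_pred, depth_gt):
    """Loss of one cluster branch, Eqs. (4)-(7).

    p: predicted object probabilities; labels: 0/1 ground truth;
    box_*: (N, 4) tensors of (x, y, w, h); depth_*: per-pixel depths.
    """
    # Eq. (4): cross-entropy written as -log[p* p + (1 - p*)(1 - p)].
    l_cls = -torch.log(labels * p + (1 - labels) * (1 - p) + 1e-8).mean()
    # Eq. (6): smooth-L1 of the summed squared coordinate differences.
    sq = ((box_pred - box_gt) ** 2).sum(dim=1)
    l_reg = F.smooth_l1_loss(sq, torch.zeros_like(sq))
    # Eq. (7): smooth-L1 between predicted and ground-truth depths.
    l_depth = F.smooth_l1_loss(depth_pred, depth_gt)
    return l_cls + l_reg + l_depth

def total_loss(branches):
    """Eq. (3): sum the three terms over all n cluster branches."""
    return sum(branch_loss(*b) for b in branches)
```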
By projection transformation, the BEV is considered a rotation along the $y$-axis of the camera coordinate system. In other words, the elevation data are projected onto the $xoz$ plane of the stereo vision coordinate system, whose origin is the position of the optical center of the left camera. The projection transformation satisfies:

$$\begin{bmatrix} x_p & y_p & z_p \end{bmatrix}^{T} = R_{cy} \begin{bmatrix} x_c & y_c & z_c \end{bmatrix}^{T} \qquad (8)$$

$$R_{cy} = \begin{bmatrix} \cos\theta_c & 0 & \sin\theta_c \\ 0 & 1 & 0 \\ -\sin\theta_c & 0 & \cos\theta_c \end{bmatrix} \qquad (9)$$
where $(x_p, y_p, z_p)$ is the projection coordinate, $(x_c, y_c, z_c)$ is the visual coordinate, and $R_{cy}$ is the rotation matrix of the road plane about the $y$-axis of the visual coordinate system. When the projection coordinates and visual coordinates are known, the rotation angle $\theta_c$ of the object about the $y$-axis can be obtained, which is the vehicle orientation.
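A NumPy sketch of Eqs. (8)-(9): the rotation about the camera y-axis, the projection, and one way to read the orientation angle θc back out when both coordinates are known (helper names are ours):

```python
import numpy as np

def rotate_y(theta_c):
    """Rotation matrix R_cy about the camera y-axis, Eq. (9)."""
    c, s = np.cos(theta_c), np.sin(theta_c)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(p_cam, theta_c):
    """Eq. (8): map a visual coordinate (xc, yc, zc) to (xp, yp, zp)."""
    return rotate_y(theta_c) @ np.asarray(p_cam)

def orientation_from_pair(p_cam, p_proj):
    """Recover theta_c as the y-axis angle between the two x-z vectors."""
    return (np.arctan2(p_cam[2], p_cam[0])
            - np.arctan2(p_proj[2], p_proj[0]))
```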
IV. EXPERIMENT
A. KITTI Quantitative Analysis
We use the KITTI dataset [39] for a quantitative comparison.
The dataset contains the 3D border annotations of objects. The
label of each object is composed of its category and 3D size.
For vehicle detection, the intersection over union (IoU) overlap
threshold is 0.7. For the detection of pedestrians and bicycles,
the IoU overlap threshold is 0.5. The detections of regions
that are not of interest or of objects that are smaller than the
minimum size are not considered false positives. The dataset
defines three modes of easy, moderate and hard based on
the minimum border height, maximum occlusion degree, and
maximum truncation. The evaluation website uses a moderate
mode to sort the algorithms. We use 3D detection and BEV
detection metrics for contrasting experiments.
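For reference, the overlap criterion behind these thresholds, sketched for axis-aligned 2D boxes; KITTI's official 3D and BEV IoU additionally handle rotated boxes, which this sketch omits:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    A detection counts as correct when IoU >= 0.7 (cars) or
    IoU >= 0.5 (pedestrians and cyclists).
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```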
Since different methods have different input images and
network structures, we directly compare the accuracy of the
final detection results. Table I compares the average precision of the BEV (AP_BEV) and the 3D box (AP_3D) evaluated on the KITTI validation set. Table II compares AP_BEV and AP_3D evaluated on the KITTI test set. Table III compares AP_BEV and AP_3D evaluated for pedestrians and cyclists on the KITTI test set.

TABLE I
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOXES (AP_3D) COMPARISON EVALUATED ON THE KITTI VALIDATION SET CONTAINING CARS

TABLE II
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOX (AP_3D) COMPARISON EVALUATED ON THE KITTI TEST SET CONTAINING CARS

TABLE III
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOX (AP_3D) COMPARISON EVALUATED ON THE KITTI TEST SET CONTAINING PEDESTRIANS AND CYCLISTS
As can be seen from the tables, our algorithm has a clear advantage among the stereo detection methods. Specifically, the depth-based multiscale method improves the accuracy in the moderate and hard modes. We did not use point cloud data, but in the moderate and hard modes, the detection results are still better than those of the LIGA-Stereo method [38], which uses point cloud and stereo data. The greater the resolution difference between the validation data and the training data, the worse the performance, which shows that the robustness of CNNs to scale changes in the data is not good enough [9]. Since the CNN method has no scale invariance, it can only adapt to objects of different scales by setting many parameters. Common upsampling strategies can improve the detection of small objects. However, these strategies make objects of originally normal size appear larger, which reduces detection performance. The multiscale fusion in this paper extracts scale features from images of different resolutions, thereby reducing the impact of inconsistent training-data resolution on the model. The experimental results
demonstrate that our method does not affect the accuracy of
detection in the easy mode. Therefore, the disparity-based
multiscale detection method is effective in improving small-
object detection.
B. Urban Road Validation
For most datasets that serve the field of autonomous driving,
the camera installation is not fixed, and the sampling angle
between camera and road is small. Therefore, these datasets
are not suitable for the detection and analysis of traffic
scenes. To compensate for this limitation, we created an urban
road stereo vision dataset. We collected stereo vision video
and image data of the Hangzhou city road network. The
labeled information includes vehicles, nonmotorized vehicles,
pedestrians, and lanes. The dataset includes 5890 sets of left
and right training images and 3902 sets of left and right
test images. It contains 8126 labeled objects in total. The
detection defines four different ranges of 50 m, 100 m, 150 m, and 200 m according to the distance between the target and
the sampling camera. We specifically annotated long-distance
objects in the dataset. The number of marked objects at the
maximum distance is approximately 1642. The training data
are divided into a training set and a validation set. The ratio of
the training set to the validation set was 4:1. Figure 2 shows
the result of 3D object detection using our dataset. The mul-
tiscale training strategy improves the effectiveness of feature
extraction.

Fig. 2. Validation of Hangzhou urban road transportation detection.

TABLE IV
DETECTION RESULTS WITH DIFFERENT NUMBERS OF CLUSTER BRANCHES
ResNet [29] is the backbone network. For different cluster-
ing regions, pretraining is performed at different resolutions.
According to the depth information, the region ranges for
different disparity values in the same image are clustered.
A multiscale strategy is used for training and testing. We use
AP3D and AP2D to evaluate the different methods.
The number of clusters k in Algorithm 1 is the number of
divisions of the disparity map. This is an empirical value.
Table IV shows the detection results of training with different
numbers of branches. The result in Table IV demonstrates
that when k<4, more branches correspond to higher detection
accuracy. However, when the k value exceeds 3, the accuracy
is not significantly improved, which shows that the multiscale
network enhances the expression ability of target features at
different distances and resolutions. However, when the scale
division exceeds 3, the object features in a certain range are
not so obviously affected by the scale. Considering the pretraining and network training costs incurred by the scale branches, the number of clusters is set to 3.

TABLE V
ADDED DEPTH AND MULTICLUSTER ANALYSES OF DMF FOR ABLATION STUDIES
We analyze the importance of the multicluster and multi-
scale models in DMF. The baseline (Table V(a)) used ResNet
[29]. We used three different image resolutions: (1400 × 2000), (800 × 1200), and (480 × 800). The disparity parameters of
the overlap zone are set to 6 and 10. The corresponding
disparity intervals of the object are [0,168], [159,224] and
[219,255]. Table V(b) shows that the multicluster method
improves the long-distance detection. Table V(c) shows that
the multiscale method significantly improves the detection
accuracy. Small target objects carry less information, so their feature expression capability is weaker. For objects in different depth ranges, using training strategies at different scales alleviates this problem.
V. CONCLUSION
One of the reasons for the low accuracy of small object
detection is that the resolution of small objects is lower
than that of normal-sized targets. As a result, the ability to
express small object features is weak, which often makes the
extraction of small object features insufficient. The commonly
used upsampling method cannot detect targets of all scales
in an image [9]. Although the method of dividing the scale
according to the image size can improve the detection accuracy
of different scale targets, this kind of method is not suitable
for road scenes.
To solve these problems, we propose a multiscale 3D
detection method based on disparity clustering. This method
uses disparity clustering information to map the area to
be inspected. According to the disparity depth information,
a multiscale feature fusion detection model is constructed. The
fusion of 2D image features and 3D depth features can be
adapted to the detection of long-distance, small-size, partially
occluded or truncated objects.
REFERENCES
[1] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149,
Jun. 2017.
[2] C. Chen, B. Liu, S. Wan, P. Qiao, and Q. Pei, “An edge traffic flow detection scheme based on deep learning in an intelligent transportation system,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 3, pp. 1840–1852, Mar. 2021.
[3] Y. Cui et al., “Deep learning for image and point cloud fusion in autonomous driving: A review,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 2, pp. 722–739, Feb. 2021.
[4] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and
K. Q. Weinberger, “Pseudo-LiDAR from visual depth estimation: Bridg-
ing the gap in 3D object detection for autonomous driving,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 8445–8453.
[5] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, “Learning rich
features from RGB-D images for object detection and segmentation,”
in Proc. Eur. Conf. Comput. Vis. Zürich, Switzerland: Springer, 2014,
pp. 345–360.
[6] X. Z. Chen, K. Kundu, Y. Zhu, S. Fidle, R. Urtasun, and H. Ma,
“3D object proposals using stereo imagery for accurate object class
detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 5,
pp. 1259–1272, May 2018.
[7] S. Song and J. Xiao, “Deep sliding shapes for amodal 3D object
detection in RGB-D images,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2016, pp. 808–816.
[8] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and
S. Belongie, “Feature pyramid networks for object detection,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 2117–2125.
[9] B. Singh and L. S. Davis, “An analysis of scale invariance in object
Detection–SNIP,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog-
nit., Jun. 2018, pp. 3578–3587.
[10] M. Haris, G. Shakhnarovich, and N. Ukita, “Task-driven super resolution: Object detection in low-resolution images,” 2018, arXiv:1803.11316.
[11] R. Qian et al., “End-to-End pseudo-LiDAR for image-based 3D object
detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2020, pp. 5881–5890.
[12] Y. Chen, S. Liu, X. Shen, and J. Jia, “DSGN: Deep stereo geometry
network for 3D object detection,” in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12536–12545.
[13] P. Li, X. Chen, and S. Shen, “Stereo R-CNN based 3D object detection
for autonomous driving,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2019, pp. 7644–7652.
[14] C. Li, J. Ku, and S. L. Waslander, “Confidence guided stereo 3D object
detection with split depth estimation,” in Proc. IEEE/RSJ Int. Conf.
Intell. Robots Syst. (IROS), Oct. 2020, pp. 5776–5783.
[15] B. Li, “3D fully convolutional network for vehicle detection in point
cloud,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS),
Sep. 2017, pp. 1513–1518.
[16] M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner,
“Vote3Deep: Fast object detection in 3D point clouds using efficient
convolutional neural networks,” in Proc. IEEE Int. Conf. Robot. Autom.
(ICRA), May 2017, pp. 1355–1361.
[17] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, “PointNet:
Deep learning on point sets for 3D classification and segmentation,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 652–660.
[18] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen, “PointCNN:
Convolution on X-transformed points,” in Proc. Adv. Neural Inf. Process.
Syst., vol. 31, 2018, pp. 820–830.
[19] S. Shi et al., “PV-RCNN: Point-voxel feature set abstraction for 3D
object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog-
nit. (CVPR), Jun. 2020, pp. 10529–10538.
[20] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, “Frustum pointnets
for 3D object detection from RGB-D data,” in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit., Jun. 2018, pp. 918–927.
[21] J. Beltran, C. Guindel, F. M. Moreno, D. Cruzado, F. Garcia, and
A. De La Escalera, “BirdNet: A 3D object detection framework from
LiDAR information,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC),
Nov. 2018, pp. 3517–3523.
[22] W. Shi and R. Rajkumar, “Point-GNN: Graph neural network for 3D
object detection in a point cloud,” in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1711–1719.
[23] Z. Yang, Y. Sun, S. Liu, and J. Jia, “3DSSD: Point-based 3D single
stage object detector,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2020, pp. 11040–11048.
[24] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3D object detec-
tion network for autonomous driving,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jul. 2017, pp. 1907–1915.
[25] M. Liang, B. Yang, Y. Chen, R. Hu, and R. Urtasun, “Multi-
task multi-sensor fusion for 3D object detection,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 7345–7353.
[26] S. Vora, A. H. Lang, B. Helou, and O. Beijbom, “Point-
Painting: Sequential fusion for 3D object detection,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 4604–4612.
[27] Z. Wang, W. Zhan, and M. Tomizuka, “Fusing bird’s eye view LIDAR point cloud and front view camera image for 3D object detection,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 1–6.
[28] M. Liang, B. Yang, S. Wang, and R. Urtasun, “Deep continuous fusion
for multi-sensor 3D object detection,” in Proc. Eur. Conf. Comput. Vis.
(ECCV), 2018, pp. 641–656.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[30] J. Chen, W. Xu, H. Xu, F. Lin, Y. Sun, and X. Shi, “Fast vehicle detection using a disparity projection method,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 9, pp. 2801–2813, Sep. 2018.
[31] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3D bound-
ing box estimation using deep learning and geometry,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 7074–7082.
[32] A. D. Pon, J. Ku, C. Li, and S. L. Waslander, “Object-centric stereo
matching for 3D object detection,” in Proc. IEEE Int. Conf. Robot.
Autom. (ICRA), May 2020, pp. 8383–8389.
[33] W. Bao, B. Xu, and Z. Chen, “MonoFENet: Monocular 3D object
detection with feature enhancement networks,” IEEE Trans. Image
Process., vol. 29, pp. 2753–2765, 2019.
[34] H. Konigshof, N. O. Salscheider, and C. Stiller, “Realtime 3D object
detection for automated driving using stereo vision and semantic infor-
mation,” in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Oct. 2019,
pp. 1405–1410.
[35] Z. Xu et al., “ZoomNet: Part-aware adaptive zooming neural network
for 3D object detection,” in Proc. AAAI Conf. Artifi. Intel., vol. 34, 2020,
pp. 12557–12564.
[36] J. Sun et al., “Disp R-CNN: Stereo 3D object detection via
shape prior guided instance disparity estimation, in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 10548–10557.
[37] D. Garg, Y. Wang, B. Hariharan, M. Campbell, K. Q. Weinberger,
and W.-L. Chao, “Wasserstein distances for stereo disparity estimation,”
2020, arXiv:2007.03085.
[38] X. Guo, S. Shi, X. Wang, and H. Li, “LIGA-Stereo: Learning LiDAR geometry aware representations for stereo-based 3D detector,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3153–3163.
[39] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1231–1237, 2013.
Jing Chen received the Ph.D. degree in mechan-
ical engineering from Zhejiang University, China,
in 2010. She is currently an Associate Professor with
the School of Computer Science and Technology,
Hangzhou Dianzi University, China. Her research
interests include computer vision, machine learning,
and urban transportation.
Qichao Wang received the B.S. degree from
the Qingdao University of Technology, Qingdao,
China, in 2015. He is currently pursuing the M.D.
degree in information technology with the School
of Computer Science, Hangzhou Dianzi University,
Zhejiang, China. His research interests include intel-
ligent transportation systems, pattern recognition,
and computer vision.
Weiming Peng received the Ph.D. degree in
computer application technology from the South
China University of Technology, Guangzhou, China,
in 2013. He is currently a Lecturer with Hangzhou
Dianzi University, Hangzhou, China. His current
research interests include quantum computation and
data fusion.
Haitao Xu is currently an Associate Professor with
Hangzhou Dianzi University. His research interests
include machine learning and data mining. As a
member, he won the Second Prize of Zhejiang
Science and Technology Award and the First Prize
of Zhejiang Higher Scientific Research Achievement
Award.
Xiaodong Li received the Ph.D. degree from
the College of Control Science and Engineering,
Zhejiang University, China, in 2014. He is cur-
rently a Lecturer with Hangzhou Dianzi Univer-
sity, Hangzhou, China. His current research interests
include machine learning, image processing, and
pattern recognition.
Wenqiang Xu received the Ph.D. degree in eco-
nomics management from Wuhan University, China,
in 2015. He is currently a Lecturer with the Col-
lege of Economics and Management, China Jiliang
University. His research interests include computer
vision, machine learning, and urban transportation.
... The major purpose was to furnish the table with datasets in a way which was not sporadic, as these datasets were not highly focused in other studies. The Vehicle Reidentification (VeRi) dataset [59,60] was built for vehicle re-identification from an original surveillance scene, which was labeled with various attributes and which contains 50 k images captured from different camcorders covering a distance of 1 km. The sample of labeled parameters with spatiotemporal factors includes boxes of plates, strings of plates, and timestamps of different vehicles. ...
... The major purpose was to furnish the table with datasets in a way which was not sporadic, as these datasets were not highly focused in other studies. The Vehicle Re-identification (VeRi) dataset [59,60] was built for vehicle re-identification from an original surveillance scene, which was labeled with various attributes and which contains 50 k images captured from different camcorders covering a distance of 1 km. The sample of labeled parameters with spatiotemporal factors includes boxes of plates, strings of plates, and timestamps of different vehicles. ...
Article
Full-text available
Recent advancements in image processing and machine-learning technologies have significantly improved vehicle monitoring and identification in road transportation systems. Vehicle classification (VC) is essential for effective monitoring and identification within large datasets. Detecting and classifying vehicles from surveillance videos into various categories is a complex challenge in current information acquisition and self-processing technology. In this paper, we implement a dual-phase procedure for vehicle selection by merging eXtreme Gradient Boosting (XGBoost) and the Multi-Objective Optimization Genetic Algorithm (Mob-GA) for VC in vehicle image datasets. In the initial phase, vehicle images are aligned using XGBoost to effectively eliminate insignificant images. In the final phase, the hybrid form of XGBoost and Mob-GA provides optimal vehicle classification with a pioneering attribute-selection technique applied by a prominent classifier on 10 publicly accessible vehicle datasets. Extensive experiments on publicly available large vehicle datasets have been conducted to demonstrate and compare the proposed approach. The experimental analysis was carried out using a myRIO FPGA board and HUSKY Lens for real-time measurements, achieving a faster execution time of 0.16 ns. The investigation results show that this hybrid algorithm offers improved evaluation measures compared to using XGBoost and Mob-GA individually for vehicle classification.
... Edge servers in the context of edge computing can encounter resource insufficiency issues. When an edge server is burdened with serving a large number of tasks, the combined computational requirements of these tasks may surpass the available computing resources of the server [5]. This situation leads to overload and a deterioration in the Quality of Service (QoS) for the tasks being processed. ...
Article
Full-text available
The study focuses on utilizing the computational resources present in vehicles to enhance the performance of multi-access edge computing (MEC) systems. While vehicles are typically equipped with computational services for vehicle-centric Internet of Vehicles (IoV) applications, their resources can also be leveraged to reduce the workload on edge servers and improve task processing speed in MEC scenarios. Previous research efforts have overlooked the potential resource utilization of passing vehicles, which can be a valuable addition to MEC systems alongside parked cars. This study introduces an assisted MEC scenario where a base station (BS) with an edge server serves various devices, parked cars, and vehicular traffic. A cooperative approach using the Deep Deterministic Policy Gradient (DDPG) based Federated Learning method is proposed to optimize resource allocation and job offloading. This method enables the transfer of device operations from devices to the BS or from the BS to vehicles based on specific requirements. The proposed system also considers the duration for which a vehicle can provide job offloading services within the range of the BS before leaving. The objective of the DDPG-FL method is to minimize the overall priority-weighted task computation time. Through simulation results and a comparison with three other schemes, the study demonstrates the superiority of their proposed method in seven different scenarios. The findings highlight the potential of incorporating vehicular resources in MEC systems, showcasing improved task processing efficiency and overall system performance.
... Multiple initiatives have been undertaken to decrease the energy consumption of traditional deep learning models. Several methods have been devised to identify smaller networks that maintain equal performance, but require fewer parameters and exhibit less complexity than the original network [24,25]. Quantization, pruning, and knowledge distillation are the key elements of these methodologies. ...
Article
Full-text available
In an era dominated by network connectivity, the reliance on robust and secure networks has become paramount. With the advent of 5G and the Internet of Things, networks are expanding in both scale and complexity, rendering them susceptible to a myriad of cyber threats. This escalating risk encompasses potential breaches of user privacy, unauthorized access to transmitted data, and targeted attacks on the underlying network infrastructure. To safeguard the integrity and security of modern networked societies, the deployment of Network Intrusion Detection Systems is imperative. This paper presents a novel lightweight detection model that seamlessly integrates Spiking Neural Networks and Convolutional Neural Networks with advanced algorithmic frameworks. Leveraging this hybrid approach, the proposed model achieves superior detection accuracy while maintaining efficiency in terms of power consumption and computational resources. This paper presents a new style recognition model that seamlessly integrates spiking neural networks and convolutional neural networks with advanced algorithmic frameworks. We call this combined method Spiking-HCCN. Using this hybrid approach, Spiking-HCCN achieves superior detection accuracy while maintaining efficiency in terms of power consumption and computational resources. Comparative evaluations against state-of-the-art models, including Spiking GCN and Spike-DHS, demonstrate significant performance advantages. Spiking-HCCN outperforms these benchmarks by 24% in detection accuracy, 21% in delay, and 29% in energy efficiency, underscoring its efficacy in fortifying network security in the face of evolving cyber threats.
... The fog computing paradigm is extremely suitable for latency-sensitive IoT applications since it not only reduces latency but also minimizes the quantity of data transmitted to the cloud for processing [18,19]. Stated differently, fog computing technology presents a viable means of tackling the issues brought about by the exponential expansion of IoT devices [20][21][22]. Fog computing is extremely susceptible to breaches in user privacy and information security because of its capabilities and flexible deployment options [23]. ...
Article
Full-text available
Compliance with security requirements in the fog computing environment is known as an important phenomenon in maintaining the quality of service due to the dynamic topology. Security and privacy breaches can occur in fog computing because of its properties and the adaptability of its deployment method. These characteristics render current systems inappropriate for fog computing, including support for high mobility, a dynamic environment, geographic distribution, awareness of location, closeness to end users, and absence of redundancy. Although efficient secure routing protocols have been developed by researchers in recent years, it is challenging to ensure security, reliability, and quality of service at the same time to overcome the limitations of cloud-fog computing. In light of the fact that trust management is an effective means of protecting sensitive information, this study proposes a two-way trust management system (TMS) that would enable both the service requester and the service provider to verify each other's reliability and safety. The trustworthiness of the service seeker can also be verified in this way. So that fog clients can confirm that fog nodes can deliver suitable, dependable, and secure services, trust in a fog computing environment should ideally be two-way. The ability to verify the authenticity of fog clients is an important capability for fog nodes to have. A distributed, event-based, multi-trust trust system is presented by the suggested approach to trust computation, which makes use of social relationships (nodes and clients) and service quality criteria. Hence, the trust score is computed using a number of characteristics. Here, the weight of direct and indirect ratings is emphasized, and the final trust score is computed by dynamically merging the information gained from self-observation and the suggestions of nearby nodes. An extensive evaluation of the proposed method shows that it is resistant to a large number of badly behaved nodes and can successfully neutralize trust-based attacks.
... The transportation sector is incorporating quantum IoT devices into logistics with the aim of improving supply chain visibility, fleet management, and route optimization [94]. The utilization of real-time analysis for traffic, weather, and product conditions enables the optimization of delivery routes, reduction in delays, and lowering of transportation expenses [95][96][97][98][99][100]. Quantum Internet (QI) uses quantum bits for secure communication, based on quantum mechanics laws. ...
Article
Full-text available
With the advent of internet-enabled and hybrid technologies, information is becoming increasingly accessible to the general public. Smartphones and other gadgets are used extensively by people to share and promote ideas, in a variety of ways. Human interaction and communication has become more reliable and effective through advanced computing technologies. Quantum computing is an emerging paradigm that will change the lives of individuals and the operations of organizations. Quantum computers solve problems at high speed by operating in a superposition state in which the state can be either zero or one at the same instant. Quantum sensors can be used efficiently in technological research to make accurate measurements and collect data that provide new insights into the behavior of nanomaterials. The use of quantum computing could also speed up the manufacturing process of devices with remarkable properties such as superconductivity, high strength or improved signal performance. Quantum computing has the ability to dramatically speed up the development process of various organizations and increase their efficiency and effectiveness. The security and reliability of data and communication is improved by quantum computing techniques such as key generation and entanglement dispersion. Companies use cryptographic algorithms to protect their data. However, with the advent of quantum computing, cryptographic methods that rely on numerical aspects are no longer sufficient to protect data. Quantum computing is an emerging field that is being applied to various problems that previously could not be solved using conventional methods. Quantum computing plays an important role in the field of information processing, where information is precisely analyzed. Various quantum technologies and algorithms are used to secure company data. This paper provides a systematic review of the literature on the principles of quantum computing. The SLR focuses on achieving four aims “identifying a variety of quantum IoT devices, analyzing their importance in different industries, highlighting the challenges of quantum technology, and presenting various techniques used by researchers to overcome different problems”. Quantum cryptography is identified as a key strategy for improving the security of IoT systems and ensuring the security and consistency of information.
... This understanding allows it to effectively differentiate between benign and harmful activities [9]. Recent research indicates that IDS which employ ML algorithms are capable of achieving superior accuracy rates, surpassing those of conventional approaches [10,11]. However, a common drawback of IDS is its inability to detect previously unseen attacks, particularly those classified as zero-day attacks [12,13]. ...
Article
Full-text available
Computer networks face vulnerability to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process is based on the following steps: (1) training the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and greatly differs from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
... More precisely, it is believed that various SC functions-Logistics 4.0, Purchasing 4.0 [13], Storage 4.0, and Production 4.0-will become 4.0 if they are digitised utilising Industry 4.0 technology. Scientists also refer to smart transportation, smart buying, smart storage, intelligent production, and smart factories as "smart supplier chains" [14]. The gaining of robotics, information, and communication via implementing technologies associated with Industry 4.0 is called "smart" technology. ...
Article
Full-text available
Intelligent network administration and oversight are key components of the 6G future of networks, even though the cloudification of networking with a micro-services-oriented architecture is an established component of 5G. Therefore, a significant role for deep learning (DL), machine learning (ML), and artificial intelligence (AI) can be found in the envisaged 6G model. Upcoming end-to-end automated network operation necessitates the early identification of threats, using resourceful prevention techniques, and the assurance that 6G systems will be self-sufficient. The present piece investigates how AI can be used in 6G data communication and supply chain role 6G networks. In this work, the 6G-based Automotive Supply Chain network is used to evaluate the supply chain using the Deep Learning method. The proposed method integrates an automotive supply chain and deep learning method to improve operational efficiency, improve decision-making and minimise the risks present in the data. Initially, the dataset is collected with the help of a 6G network; next, the dataset is pre-processed. Finally, the dataset is trained by using Deep Q networks. The Guangzhou Automobile Toyota Company dataset is used for evaluation in this work. The proposed work evaluates the enterprise’s and suppliers’ demands based on the product category, and then it also detects the errors found during the transactions between the enterprise and suppliers. This technique makes it possible for businesses and suppliers to communicate clearly and work collaboratively to pursue additional promotion. Managers in enterprises can use theoretical data to support their research while making judgments.
... Then, probabilistic Bayesian networks were used to detect cyberattacks within the network. For 6G-IIoT scenarios, the authors of [10] described a lightweight, privacy-preserving, machine-learning-based maintenance planning approach that can evaluate the likelihood of failure for industrial assets or processes at any given moment. Binary neural networks (CNNs) were created to construct a predictive maintenance model that can be utilised with homomorphic encryption circuits to guarantee participant privacy. ...
Article
Full-text available
Our study investigates a unified framework for indoor scene creation in industrial settings that combines cyber-physical systems (CPS) with 6G technologies. This research aims to improve automation and real-time interaction in intricate industrial environments by exploiting the superior capabilities of 6G and CPS. We provide a case study of a manufacturing facility to demonstrate how our method facilitates space optimisation, boosts operational effectiveness, and strengthens safety protocols; the case study illustrates the potential and real-world advantages of applying these cutting-edge technologies in industrial settings. The study then addresses the challenge of failure prediction in process sectors that use intelligent, autonomous CPS in a 6G setting, in line with the latest developments in Industry 4.0 and the IIoT. Specifically, we developed a full-stack deep learning approach using massive amounts of real-time sensory data collected from wireless sensors in a chemical plant. First, to handle unbalanced time-series data, a recursive architecture is proposed that uses several lookback inputs to make an initial forecast via autoregression. Within this method, a new learning algorithm called Recursive Gradient Descent (RGD) is developed for the proposed architecture to reduce cumulative prediction uncertainty. Afterwards, a multi-class classification model using temporal convolutions across many channels with a decay effect is proposed to detect and localise the root causes of failure. Because of its ability to reduce prediction uncertainties accumulated across numerous prediction stages, the entire network is termed the Cumulative Uncertainty Reduction Network with Bayesian Neural Network (CURN-BNN). Results show that CURN-BNN outperforms state-of-the-art approaches, especially in recall for fault prediction and in fault type categorisation accuracy.
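The recursive, multi-lookback autoregression described above can be illustrated with a simple rollout: each one-step forecast is appended to the input window before the next step is predicted. The AR(3)-style toy model below stands in for the paper's network; RGD itself is not reproduced here.

```python
import numpy as np

def recursive_forecast(history, model, lookback=24, horizon=6):
    """Roll a one-step model forward, feeding predictions back as inputs."""
    window = list(history[-lookback:])
    preds = []
    for _ in range(horizon):
        y_hat = model(np.asarray(window[-lookback:]))  # one-step prediction
        preds.append(y_hat)
        window.append(y_hat)                           # recursion: reuse output
    return np.asarray(preds)

# toy one-step "model": an AR(3)-style weighted sum of the last three values
model = lambda w: 0.5 * w[-1] + 0.3 * w[-2] + 0.2 * w[-3]
print(recursive_forecast(np.sin(np.linspace(0, 10, 200)), model))
```

Feeding predictions back in this way is what lets uncertainty accumulate across prediction stages, which is the accumulation CURN-BNN is designed to suppress.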
... In electronic toll collection (ETC), the toll fee is collected by deducting an amount from a prepaid radio-frequency identification (RFID) passive tag. The tag, installed on the windshield of a vehicle, usually carries basic information such as driver data, vehicle ID, and the available balance in the user account [5,6]. A tag reader installed on the toll gate reads the tag and routes its data to a host computer. ...
Article
Full-text available
Road toll tax contributes significantly to the economic development of any nation. In developing countries, toll tax collection is carried out either manually or electronically; however, both approaches suffer from various challenges, including prolonged waiting times, lack of transparency, high operational costs, and concerns regarding data security and privacy. This research addresses these challenges with a blockchain-based system. The proposed system employs advanced image processing techniques, specifically "You Only Look Once" version 5 (YOLOv5), to accurately capture and store vehicles' registration numbers on a local server situated at each toll plaza. The vehicle identification, along with the driver's credentials, is then transmitted to an application server, where an Ethereum smart contract verifies the information and automatically deducts the toll charges from the driver's account. The results indicate that the proposed system effectively reduces vehicle waiting time and facilitates uninterrupted vehicular movement. Additionally, the system ensures transaction transparency, safeguards the security and privacy of vehicle details, and facilitates non-stop payments, eliminating the need for cash payments or radio-frequency identification scanning at toll booths, while its decentralized architecture enhances security and mitigates potential system failures.
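A minimal sketch of the toll-plaza side of this pipeline might look as follows, assuming a pretrained YOLOv5 model loaded from the ultralytics hub. The `read_plate` OCR helper and the application-server URL are hypothetical placeholders; the Ethereum smart contract lives on the server side and is not shown.

```python
import torch
import requests

# pretrained detector from the public ultralytics/yolov5 hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def process_frame(frame, toll_gate_id):
    """Detect vehicles in one camera frame and forward plate reads."""
    results = model(frame)                       # run detection on one image
    for *box, conf, cls in results.xyxy[0].tolist():
        plate_text = read_plate(frame, box)      # hypothetical OCR step
        # hand off to the application server, which triggers the
        # smart contract that verifies credentials and deducts the toll
        requests.post('https://toll-app.example/api/charge',  # placeholder URL
                      json={'gate': toll_gate_id,
                            'plate': plate_text,
                            'confidence': conf})
```

Keeping detection and OCR on the local server and only transmitting the plate text matches the paper's goal of reducing waiting time while limiting what leaves the plaza.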
Article
Full-text available
Vehicle classification (VC) is a prominent research domain within image processing and machine learning (ML) for identifying vehicle volumes and traffic rule violations. In developed countries, nearly 40% of daily accidents are fatal, while in developing countries the figure rises to 70%. Traditionally, vehicle detection and classification have been performed manually by experts, which is difficult, time-consuming, and error-prone, and incorrect detection and classification can result in hazardous situations. This highlights the need for more reliable techniques to identify and classify vehicles accurately and practically. Numerous automated methods have been proposed, but deep learning and machine learning algorithms applied to complex vehicle-image datasets have struggled to remain accurate across varied climate conditions and have been time-consuming. This paper presents an accurate, robust, real-time system to classify vehicles on real roads. The proposed system uses a random wavelet transform for preprocessing, edge- and region-based segmentation for feature extraction, an embedded method for feature selection, and the XGBoost algorithm for VC. The system, implemented on an FPGA platform, classifies vehicles under complex weather, illumination, color, and occlusion conditions across 10 datasets, including a novel dataset named SRM2KTR containing 75,436 vehicle images. The results show 98.81% accuracy, outperforming the state of the art (98%). Demonstrated with four different classifiers, the system classifies images in 0.16 ns with an average accuracy of 97.79%, exhibiting high accuracy, rapid identification time, and robustness in practical use.
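The final stage of this pipeline (feature vectors into XGBoost) can be sketched as below. The random features stand in for the wavelet, segmentation, and embedded-selection steps, and the hyperparameters are illustrative rather than the paper's.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 32)             # placeholder feature vectors
y = np.random.randint(0, 5, size=1000)   # e.g. 5 vehicle classes (assumed)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
clf.fit(X_tr, y_tr)                      # gradient-boosted trees on features
print('accuracy:', (clf.predict(X_te) == y_te).mean())
```

Boosted trees on compact hand-crafted features are also a natural fit for FPGA deployment, since inference reduces to threshold comparisons rather than large matrix multiplies.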
Article
Full-text available
Autonomous vehicles have undergone rapid development in the past few years. However, achieving full autonomy is not a trivial task, owing to the complex and dynamic driving environment. Autonomous vehicles are therefore equipped with a suite of different sensors to ensure robust, accurate environmental perception, and camera-LiDAR fusion in particular is becoming an emerging research theme. So far, however, there has been no critical review focused on deep-learning-based camera-LiDAR fusion methods. To bridge this gap and motivate future research, this article reviews recent deep-learning-based data fusion approaches that leverage both images and point clouds. It gives a brief overview of deep learning for image and point cloud processing, followed by in-depth reviews of camera-LiDAR fusion methods in depth completion, object detection, semantic segmentation, tracking, and online cross-sensor calibration, organized by their respective fusion levels. Furthermore, we compare these methods on publicly available datasets. Finally, we identify gaps and overlooked challenges between current academic research and real-world applications, and based on these observations we provide our insights and point out promising research directions.
Chapter
We consider how image super-resolution (SR) can contribute to an object detection task in low-resolution images. Intuitively, SR has a positive impact on object detection, and several previous works have demonstrated that this intuition is correct; however, in those works the SR model and the detector are optimized independently. This chapter analyzes a framework for training a deep neural network in which the SR sub-network explicitly incorporates a detection loss in its training objective, via a tradeoff with a traditional SR (reconstruction) loss. This end-to-end training procedure allows SR preprocessing to be trained for any differentiable detector. We present extensive experiments showing that our task-driven SR consistently and significantly improves the accuracy of an object detector on low-resolution images from the COCO and PASCAL VOC datasets for a variety of conditions and scaling factors.
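The tradeoff objective described here, a detection loss combined with a conventional reconstruction loss on the SR output, could be written as the sketch below. `sr_net`, `detector`, and the weighting `lam` are placeholders, not the chapter's exact formulation.

```python
import torch
import torch.nn.functional as F

def task_driven_sr_loss(sr_net, detector, lr_img, hr_img, targets, lam=0.1):
    """End-to-end loss: detection quality plus pixel fidelity on the SR image."""
    sr_img = sr_net(lr_img)                 # upscale the low-resolution input
    rec_loss = F.l1_loss(sr_img, hr_img)    # traditional SR reconstruction loss
    det_loss = detector(sr_img, targets)    # loss from a differentiable detector
    return det_loss + lam * rec_loss        # the tradeoff the chapter describes
```

Because gradients from `det_loss` flow back into `sr_net`, the SR network learns to restore exactly the details the detector relies on, which is the point of training the two jointly rather than independently.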
Article
An intelligent transportation system (ITS) plays an important role in public transport management, security, and related issues, and traffic flow detection is an important part of an ITS. Based on the real-time acquisition of urban road traffic flow information, an ITS provides intelligent guidance for relieving traffic jams and reducing environmental pollution. Traffic flow detection in an ITS usually adopts a cloud computing model in which the network edge transmits all captured video to a cloud computing center. However, the growth of traffic monitoring has brought great challenges to the storage, communication, and processing capacities of traditional cloud-based transportation systems. To address this issue, this article proposes a traffic flow detection scheme based on deep learning at the edge node. First, we propose a vehicle detection algorithm based on the YOLOv3 (You Only Look Once) model trained on a large volume of traffic data, pruning the model to ensure its efficiency on edge equipment. Next, the DeepSORT (Deep Simple Online and Realtime Tracking) algorithm is optimized by retraining the feature extractor for multi-object vehicle tracking. We then propose a real-time vehicle counter that combines the detection and tracking algorithms to measure traffic flow. Finally, the vehicle detection and multiple-object tracking networks are deployed on the Jetson TX2 edge device, and we verify the correctness and efficiency of our framework. Test results indicate that the model can efficiently detect traffic flow with an average processing speed of 37.9 FPS (frames per second) and an average accuracy of 92.0% on the edge device.
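The counting stage that links detection and tracking can be illustrated with a simple virtual-line counter over track centroids. The track-ID dictionary interface and the line position are assumptions; DeepSORT's own data structures differ.

```python
def update_counter(tracks, prev_y, counted, line_y=300):
    """Count vehicles whose centroid crosses a virtual line.

    tracks : {track_id: (cx, cy)} centroids for the current frame.
    """
    flow = 0
    for tid, (cx, cy) in tracks.items():
        # count a track once, on the frame where it first crosses the line
        if tid in prev_y and prev_y[tid] < line_y <= cy and tid not in counted:
            counted.add(tid)
            flow += 1
        prev_y[tid] = cy
    return flow

prev_y, counted = {}, set()
update_counter({1: (100, 290)}, prev_y, counted)         # still above the line
print(update_counter({1: (100, 310)}, prev_y, counted))  # crossed -> prints 1
```

Keying the count on persistent track IDs rather than raw detections is what prevents a slow or stopped vehicle from being counted repeatedly.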