Disparity-Based Multiscale Fusion Network
for Transportation Detection
Jing Chen, Qichao Wang, Weiming Peng, Haitao Xu, Xiaodong Li, and Wenqiang Xu
Abstract—The detection of long-distance small objects in transportation scenes suffers from low accuracy. In this work, we propose DMF, a disparity-based multiscale fusion network. We map disparity regions at different depths to 2D candidate regions according to distance to address the small-object detection problem. The method clusters disparity maps of different depths, and image features are extracted from the projected mapping regions. On the one hand, it uses a multicluster method to upsample the 2D mapping regions; on the other hand, feature fusion at different scales is performed on each cluster region. Experimental results on two datasets show that DMF improves the detection accuracy of small objects.
Index Terms—Disparity depths, long distance, small objects,
multicluster, multiscale.
I. INTRODUCTION
DETECTION methods based on 2D image convolutional
neural networks (CNNs) have shown impressive results
on various tasks such as object detection, classification,
instance segmentation, and tracking [1]–[3]. They can provide
object positions and category confidence in 2D images. How-
ever, such methods lack depth information. These methods
cannot accurately quantify the distance between an object and
the camera [4]. Therefore, 3D object detection methods are
required to obtain the 3D size and rotation angle of an object
for the fields of automatic driving and traffic monitoring.
Thus, further information can be obtained, such as the speed
and direction of a detected object. In addition, 3D detection
methods have natural advantages to handle occluded and
overlapping objects.
The goal of 3D object detection is to obtain 3D bounding
boxes. In addition to an RGB image, the required information
contains the corresponding depth. In [5], a depth map was
added to the 2D object detection framework. By detecting the
contours in the image, 2.5D proposals were generated. These
proposals included the disparity, height, and tilt angle of each
pixel of the target. On this basis, CNN was used for feature
extraction, and a support vector machine (SVM) was used for
object classification. In [6], a 3D object proposal (3DOP)
method for autonomous driving scenarios was proposed. This
method used a tuple to represent each 3D bounding box. The
tuple elements included the center coordinates, direction angle,
object category and the corresponding 3D box template set.
In [7], a 3D object detection method called deep sliding shapes
was proposed, which focused on indoor scenes. The paper
proposed a multiscale 3D region proposal network (RPN). For
each sliding window, 19 types of anchor boxes were defined.
A 3D CNN method with deep sliding shapes, which outputs
3D bounding boxes by inputting the stereo scene, was also
designed.
Some problems remain with the current 3D object detection
methods: (1) To detect small objects, few effective strategies
have been adopted. Because small objects carry less infor-
mation, the abilities of the current models to express small
object features are weaker. The most common method to
solve this problem is to use upsampling to resize the input
image in the training network. Due to the low efficiency of
image pyramids, some methods that consider features, such as
feature pyramid networks (FPNs) [8] and scale normalization for image pyramids (SNIP) [9], have been proposed. (2) Hand-
crafted features must be added to most of these 3D methods
to compensate for the missing depth information. The use
of these specific hand-crafted features and a single RGB image hinders the extension to new application scenes.
These problems also restrict neural networks from effectively
learning 3D spatial information. (3) Since 3D object detection
adds additional depth information, it takes longer to acquire
these features.
General detection methods are only trained to detect objects
within a specific scale. The remaining range is ignored during
the back propagation process, which makes it difficult to detect
large objects in high-resolution images and small objects
in low-resolution images. The commonly used approach for
small-object detection is to upsample an image and apply
a pretrained network on the high-resolution image [10].
Detection with high-resolution images can enhance the small-
object information, which is conducive to better classification.
However, these methods miss the detection of medium- and
large-sized objects [9]. In addition, the method of dividing the
scale according to the target size in an image can improve
the detection accuracy of different scale targets. However, this
type of method is not suitable for road scenes because, in a traffic scene, a small target in an image may be a small-size object near the viewpoint or a large-size object far from the viewpoint in the real world. Therefore, the multiscale
detection of road objects based only on the size of an image
target may result in unsatisfactory results. We use an adaptive
scale based on disparity clustering to obtain more accurate
scale information for objects at different distances in an image.
We summarize our main contributions as follows:
• We propose a multicluster region mapping method based on disparity segmentation to detect objects of different sizes. The feature loss caused by semantic abstraction in the shallow network can be reduced.
• We use feature extraction at different scales and feature fusion strategies at different levels. Multiscale fusion can be applied to the upsampled features and the bottom-layer features based on distance information. The recognition accuracy of extreme-size objects can be improved.
• Most datasets that serve the field of autonomous driving differ from urban traffic scenes in both sampling angle and range. We therefore constructed a stereo vision dataset covering urban traffic roads in Hangzhou, which makes the detection method more suitable for traffic monitoring scenarios.
This paper is organized as follows. Section II discusses
the related works. Section III introduces the multiscale fusion
model based on disparity, including the generation method
of multiscale regions, mapping of disparity regions and mul-
tiscale feature fusion network. Section IV uses the KITTI
dataset to verify the proposed method. The effectiveness of this
method is verified by quantitative evaluations and comparative
experiments. Section V presents the conclusions.
II. RELATED WORK
3D object detection methods take as input images or image channels related to the distances of objects in a scene. According
to the feature dimension of the detection method, 3D detection
methods can be divided into three categories: image-based
detection, point cloud-based detection, and image and point
cloud fusion detection.
Image-based detection methods are mainly based on stereo
vision or monocular depth estimations. Qian et al. [11] used
a depth estimation method to generate a dense depth map.
By introducing a change of representation (CoR) layer, the
depth map can be directly converted to a pseudolidar point
cloud for 3D object detection. However, pseudolidar methods
are also required to estimate the background depth, which
wastes computational resources. In [12], DSGN was proposed
to extract 2D features from stereo image pairs with a Siamese
network. Then, these features were used to construct a plane-
sweep volume, which was transformed into a 3D geometric
volume by reversing the 3D projection for object detection.
When constructing a 3D volume, this method considers both the image depth information and the semantic information, so it is not efficient. In [13], Stereo R-CNN was proposed to create 2D
anchors and associated objects in left and right images. Then,
keypoints and viewpoints were combined with 2D anchors
to estimate the rough 3D object box. Finally, photometric
alignment was used to refine the 3D bounding boxes. However,
this method was inferior to 3D object detection methods based
on depth maps. CG-Stereo [14] proposed a confidence-guided
stereo 3D object detection pipeline, which used separate
decoders for foreground and background pixels to focus more on the accurate points and boost the 3D object detection accuracy.
Affected by the receptive field, traditional 3D CNNs cannot
learn the local features of different scales well [15], [16].
Therefore, new network structures for point cloud features, such as PointNet [17] and PointCNN [18], were proposed.
PointNet [17] was the first deep learning framework that
directly addressed point clouds. After the original data and
features were aligned, the features that used feature points
as units were extracted first. Then, the global feature vec-
tor of the point cloud was extracted in the feature space
dimension. Afterward, a series of 3D detection methods
based on point clouds appeared, such as PV-RCNN [19] and
Frustum-PointNet [20]. In [18], an X-transform method called
PointCNN was proposed, which learned a set of weights X
from the input point. This set of weights can be used to
reweight and arrange the features associated with each point.
These methods focus on how to learn more effective spatial
geometric representations from point clouds. However, 3D
CNN on a large-scale sparse point cloud requires convolution
in 3 dimensions, so the entire detection process is extremely
time consuming. Another type of point cloud-based detection
method maps the point cloud to a certain plane, such as a
front view (FV) or a bird’s-eye view (BEV). This method
can maintain the depth information of a space and geometric
shape of an object. In addition, the occlusion problem can
be easily solved. A BEV method [21] with point cloud data
was used to achieve real-time 3D detection. It projects the
input point cloud to a 3-channel BEV plane. Then, 2D object
features and ground estimations are combined to realize offline
3D object detection. Because of the information loss in a
certain dimension during the mapping process, this type of
method is only suitable for scenarios where objects are located
on the same plane (such as in autonomous driving). In this
case, a partial loss of information will not greatly impact
the detection results. Recently, Point-GNN [22] was used to
encode a point cloud scene into a graph structure, and a graph
neural network was proposed for 3D object detection. 3DSSD
[23] is a one-stage anchor-free method. It uses a new fusion
sampling strategy to reduce the amount of calculation.
Image and point cloud fusion detection methods usually
map point clouds to a certain plane or several planes. A point
cloud can be fused with a feature map by selectively adding
handcrafted features. In [24], a deep fusion network for
lidar and image data was proposed, which is called MV3D.
MMF [25] was used to project a point cloud into an image
coordinate system to obtain a sparse depth image to learn
better feature representations. In addition, ground estimation
and depth completion can be used to improve the 3D object
detection precision. In [26], PointPainting was proposed to
Fig. 1. Disparity-based detection network. The network structure can be divided into two parts: multicluster embedding and a multiscale network.
project semantic image information onto a point cloud. With
PointPainting, the semantic features of a point cloud can be
increased, and the deficiency of point cloud features can be
compensated. In [27], the fusion of BEV and FV proved that
fusion before RPN achieved better results. The sparse nonho-
mogeneous pooling layer in the network mainly implemented
the perspective transformation and fusion process between
the RGB image and the BEV map. ContFuse [28] used ResNet [29]
to extract features in both image and point cloud streams.
Multiscale fusion was performed on the image features; then,
the features were projected onto the BEV map. The fusion of
image features, location information, and point cloud stream
features was used to achieve 3D detection. This fusion method,
which simultaneously processes multiple data, can obtain more
comprehensive spatial and texture information. However, due
to the fusion operation and handcrafted feature acquisition, the
computation speed of the entire system decreases.
III. DISPARITY-BASED MULTISCALE
DETECTION METHOD
In this paper, we utilize the depth information of the dis-
parity map to branch the RGB image. The multiscale network
is obtained by clustering the depth feature. Then, the image
can be sent to a multiscale network for detection to balance
the recall ability and complexity of the model.
Our method inputs regions with different depths into a
detection network with different levels of feature fusion.
Thus, the recognition rate of small objects can be improved.
As shown in Figure 1, the network structure can be divided
into two parts: multicluster embedding and a multiscale net-
work. In the RPN part, the left image generates proposals
of various scales through multicluster embedding operations.
After RPN, through the operation of RoIAlign, the region of
interest (RoI) feature in each scale is obtained. Finally, after
the non-maximum suppression (NMS), the final classification
and regression results are obtained.
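The paper does not spell out which NMS variant is used; for reference, here is a minimal NumPy sketch of standard greedy, IoU-based suppression as typically applied after RoI classification (the threshold value and helper name are ours):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over axis-aligned 2D boxes.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the boxes that survive suppression.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                 # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]       # drop heavily overlapping boxes
    return keep
```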
A. Disparity-Based Multicluster Embedding
In the multicluster embedding part, DMF inputs the map-
ping regions of different depths into different clusters to
replace the traditional network method of training and testing
on the entire image. Apart from the FPN processing at different
scales, each cluster uses the same network structure. Therefore,
each cluster can share weights to reduce the complexity and
training time.
The image pair correlation method [13] does not require
additional matching calculations, but the data association is
based on the premise that the left and right image pairs
have identical detection results. It is difficult to satisfy this
correlation in an actual detection environment. For example,
if we adjust the sampling parameters of an image in the
image pair, the data correlation method will most likely fail.
In actual sampling, inconsistent detection results are common
due to differences in camera parameters, synchronization time
and other factors. Considering the robustness and real-time
requirements, we use the Fourier fast matching algorithm
proposed in [30] to calculate the disparity in the image pair.
The RGB image and disparity map are normalized first.
Then, we perform threshold segmentation on the disparity
image according to the depth information. The relationships
between object sizes and their distances in different image
regions are not linear. Therefore, we use the Fibonacci function
to segment the pixel region. Each cluster of the disparity image
can finally be obtained. The image segmentation algorithm
based on the disparity depth can be described as Algorithm 1.

Algorithm 1: Disparity-Based Cluster Algorithm
Input: disparity samples $x_1, x_2, \ldots, x_m$;
Output: $k$ clusters and the average distance of each;
divide the samples into $k$ initial clusters by distance;
while not converged do
    $c^{(i)} := \arg\min_k \|x_i - \mu_k\|^2$;    // assign each sample to its nearest center
    for each cluster $k$ do
        $\mu'_k := \frac{1}{|c^{(i)}|}\sum_{x \in c^{(i)}} x$;    // recompute the center
        if $\mu'_k \neq \mu_k$ then $\mu_k = \mu'_k$;
        update distance;
    end
    $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_k) = \frac{1}{m}\frac{1}{k}\sum_{i=1}^{m}\sum_{j=1}^{k}\|x^{(i)} - \mu_j\|^2$;
end
$\mu_1, \ldots, \mu_k$ represent the centers of the $k$ depth ranges;
return $\mu_1, \ldots, \mu_k$;

where $\{x_1, x_2, \ldots, x_m\}$ are the sample points of the disparity map at different distances; $k$ is the number of cluster branches; $c^{(i)}$ is the assignment of sample point $i$ to its nearest cluster center; $\mu'_k$ is the updated cluster center; and $J$ is the error function of the algorithm.
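A minimal NumPy sketch of Algorithm 1 as we read it: plain k-means over 1D disparity samples, with the error function J tracked each round. Quantile seeding stands in for the Fibonacci-based initial segmentation, and all names are ours:

```python
import numpy as np

def disparity_kmeans(x, k, rounds=50):
    """Cluster 1-D disparity samples x into k depth ranges (cf. Algorithm 1).

    Assign each sample to its nearest center, recompute the centers, and
    track the error J until the centers stop moving.
    """
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))                # initial centers
    for _ in range(rounds):
        c = np.argmin(np.abs(x[:, None] - mu[None, :]), axis=1)  # assignment step
        new_mu = np.array([x[c == j].mean() if np.any(c == j) else mu[j]
                           for j in range(k)])                   # center update
        J = np.mean((x - new_mu[c]) ** 2)                        # error function J
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, c, J

# e.g. mu, c, J = disparity_kmeans(np.random.randint(0, 256, 10000).astype(float), k=3)
```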
To reduce the loss of edge features in disparity clustering,
we added an overlap between two regions adjacent to the
distance scale, i.e., we retain at both scales the region where
the disparity value is in the overlap range. Assuming that the
number of clusters is N, we define the overlapping area of the
corresponding segmentation disparity $o_i$ as

$$o_i = o_{i-1}\delta, \quad i = 0, 1, \ldots, N-1 \qquad (1)$$

where $\delta$ is the partition coefficient and $o_0$ is the overlapping area of the first layer. The parameters must be selected to
ensure that there is no significant difference in resolution
between the effective region of interest and the training
network after rescaling. Through the disparity segmentation
of the depth range, target regions at different distances can
be separated. Branches can be easily established in the sub-
sequent detection process. Objects with different gray values
in different distance ranges are separately detected. The same
scale is applied for objects within the corresponding depth
range.
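To make Eq. (1) concrete, the sketch below shows one plausible way to expand cluster cut points into overlapping disparity intervals; the cut points, o0, and δ are illustrative values chosen so the output resembles the intervals reported in Section IV-B, not values taken from the paper:

```python
def overlapping_intervals(bounds, o0, delta):
    """Expand sorted cluster cut points into overlapping disparity intervals.

    Per Eq. (1), the overlap of layer i is o_i = o_{i-1} * delta, so the
    shared band between adjacent intervals shrinks layer by layer.
    """
    intervals, o = [], float(o0)
    for i in range(len(bounds) - 1):
        lo = bounds[i] - (o / 2 if i > 0 else 0)                    # reach into the previous layer
        hi = bounds[i + 1] + (o / 2 if i < len(bounds) - 2 else 0)  # and into the next one
        intervals.append((max(bounds[0], lo), min(bounds[-1], hi)))
        o *= delta
    return intervals

# overlapping_intervals([0, 164, 221, 255], o0=10, delta=0.6)
# -> [(0, 169.0), (161.0, 224.0), (219.2, 255)]
```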
B. Clustering Based Multiscale Network
In the multiscale part, a multiscale pretraining mechanism
and a multiscale feature fusion strategy are utilized. Compared
with the feature fusion of an entire image, multiscale feature
fusion can play the role of a shallow network to improve the
detection effect of small object regions. Small-object detec-
tion requires higher-resolution feature maps to focus on the
information in the corresponding area. The fusion of shallow
high-resolution features and high-level semantic features can
be used to detect objects at different scales.
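The paper does not give the exact fusion block; one FPN-style reading [8] is sketched below in PyTorch: the deeper semantic map is upsampled to the shallow map's resolution, and the two are summed through lateral 1 × 1 convolutions (channel widths are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFusion(nn.Module):
    """Fuse a shallow high-resolution map with a deeper semantic map.

    Both maps are projected to a common width by 1x1 lateral convolutions;
    the deep map is upsampled to the shallow resolution and added.
    """
    def __init__(self, shallow_ch, deep_ch, out_ch=256):
        super().__init__()
        self.lat_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.lat_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        deep_up = F.interpolate(self.lat_deep(deep),
                                size=shallow.shape[-2:], mode="nearest")
        return self.smooth(self.lat_shallow(shallow) + deep_up)

# e.g. ScaleFusion(256, 512)(torch.rand(1, 256, 64, 64), torch.rand(1, 512, 32, 32))
```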
Based on disparity clustering, the disparity segmentation
regions are mapped back to the 2D image according to
the coordinate points. By using a branch extraction strategy,
feature extraction of different scales can be performed on
regions of different depths. The mapping result is branched in
different clusters. Then, pretrained networks of different scales
can be used in each cluster. It is assumed that the distance and
scale satisfy the following relationship:

$$N = ae^{(d+b)} \qquad (2)$$

where $N$ is the scale factor of the network; the number of upsampling steps and network layers are set according to this factor. $d$ is the distance scale between the target in the image region and the optical center of the left camera, and $a, b$ are scale coefficients whose values are obtained from experiments.
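Under this reading of Eq. (2), the per-cluster scale factor is a direct computation; a and b below are placeholders for the experimentally fitted constants, which the paper does not report:

```python
import math

def scale_factor(d, a, b):
    """Scale factor N = a * exp(d + b) from Eq. (2).

    d: distance scale between the region's objects and the left camera's
    optical center; a, b: experimentally fitted constants (placeholders
    here). N then governs how many upsampling steps and network layers
    the cluster's branch uses: farther clusters get larger N, i.e. more
    upsampling for their smaller objects.
    """
    return a * math.exp(d + b)
```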
The first nlayers are shared convolutional layers in the net-
work. Features of different convolutional layers are extracted
after the n+1 th layer. The network is shown in Figure 1. For
clusters with larger distances, a shallower network is used to
fuse the high-resolution and upper-layer semantic features. The
gradient of pixels that are not in the cluster is set to 0 during
the training process. Finally, the detection result is obtained
by classifier and bounding box regression.
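The gradient masking described above can be sketched generically: multiply the per-pixel loss by the cluster's binary mask before reduction, so pixels outside the cluster contribute nothing to backpropagation (a sketch, not the paper's exact implementation):

```python
import torch

def masked_loss(per_pixel_loss, cluster_mask):
    """Zero the gradient contribution of pixels outside the active cluster.

    per_pixel_loss: (H, W) tensor carrying gradients; cluster_mask: (H, W)
    binary tensor. Masked-out pixels neither add to the loss nor receive
    gradients during training.
    """
    return (per_pixel_loss * cluster_mask).sum() / cluster_mask.sum().clamp(min=1)
```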
C. Loss Function
We use multitask loss, including classification, regression,
and depth value. Joint training has faster speed and a smaller
memory footprint. In addition, we adopted a multiscale train-
ing strategy. In the loss function, the loss of each scale task
is calculated according to the range of the clustering branch.
The loss function of the network structure is defined as:

$$L = \sum_{i=1}^{n} L^{cls}_i + \sum_{i=1}^{n} L^{reg}_i + \sum_{i=1}^{n} L^{depth}_i \qquad (3)$$

where $L^{cls}$ is the classification loss, $L^{reg}$ is the regression loss, $L^{depth}$ is the depth distance loss, and $n$ is the number of cluster branches.
$$L^{cls}_i = -\log\left[p^{*}_i p_i + (1 - p^{*}_i)(1 - p_i)\right] \qquad (4)$$

$$p^{*}_i = \begin{cases} 0, & \text{negative label} \\ 1, & \text{positive label} \end{cases} \qquad (5)$$

where $p_i$ is the predicted probability that the object is present, and $p^{*}_i$ is the ground-truth label.
The regression loss is calculated using the smooth $L_1$ function based on the target distance, where $x_i$, $y_i$, $w_i$, and $h_i$ are the ground-truth bounding box center coordinates, width, and height, and $x'_i$, $y'_i$, $w'_i$, and $h'_i$ are the predicted bounding box center coordinates, width, and height:

$$L^{reg}_i = \frac{1}{N_{reg}} \sum_{i}^{N} \text{smooth}_{L_1}\!\left[(x_i - x'_i)^2 + (y_i - y'_i)^2 + (w_i - w'_i)^2 + (h_i - h'_i)^2\right] \qquad (6)$$
The depth loss uses the smooth $L_1$ function:

$$L^{depth}_i = \frac{1}{N_D} \sum_{i=1}^{N_D} \text{smooth}_{L_1}(d_i - \hat{d}_i) \qquad (7)$$

where $N_D$ is the number of pixels in the ground-truth depth.
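A compact PyTorch sketch of Eqs. (3)-(7), summing the classification, regression, and depth terms over the cluster branches; the branch bookkeeping and normalization constants are simplified, and F.smooth_l1_loss stands in for the smooth-L1 terms:

```python
import torch
import torch.nn.functional as F

def branch_loss(p, labels, box_pred, box_gt, depth_pred, depth_gt):
    """Loss of one cluster branch, Eqs. (4)-(7).

    p: predicted object probabilities; labels: 0/1 ground truth;
    box_*: (N, 4) tensors of (x, y, w, h); depth_*: per-pixel depths.
    """
    # Eq. (4): cross-entropy written as -log[p* p + (1 - p*)(1 - p)].
    l_cls = -torch.log(labels * p + (1 - labels) * (1 - p) + 1e-8).mean()
    # Eq. (6): smooth-L1 of the summed squared coordinate differences.
    sq = ((box_pred - box_gt) ** 2).sum(dim=1)
    l_reg = F.smooth_l1_loss(sq, torch.zeros_like(sq))
    # Eq. (7): smooth-L1 between predicted and ground-truth depths.
    l_depth = F.smooth_l1_loss(depth_pred, depth_gt)
    return l_cls + l_reg + l_depth

def total_loss(branches):
    """Eq. (3): sum the three terms over all n cluster branches."""
    return sum(branch_loss(*b) for b in branches)
```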
By projection transformation, the BEV is considered a rotation along the $y$-axis of the camera coordinate system. In other words, the elevation data are projected onto the $xoz$ plane of the stereo vision coordinate system, whose origin is the position of the optical center of the left camera. The projection transformation satisfies:

$$\begin{bmatrix} x_p & y_p & z_p \end{bmatrix}^{T} = R_{cy} \begin{bmatrix} x_c & y_c & z_c \end{bmatrix}^{T} \qquad (8)$$

$$R_{cy} = \begin{bmatrix} \cos\theta_c & 0 & \sin\theta_c \\ 0 & 1 & 0 \\ -\sin\theta_c & 0 & \cos\theta_c \end{bmatrix} \qquad (9)$$
where $(x_p, y_p, z_p)$ is the projection coordinate, $(x_c, y_c, z_c)$ is the visual coordinate, and $R_{cy}$ is the rotation matrix of the road plane about the $y$-axis of the visual coordinate system. When the projection coordinates and visual coordinates are known, the rotation angle $\theta_c$ of the object about the $y$-axis can be obtained, which is the vehicle orientation.
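A NumPy sketch of Eqs. (8)-(9): the rotation about the camera y-axis, the projection, and one way to read the orientation angle θc back out when both coordinates are known (helper names are ours):

```python
import numpy as np

def rotate_y(theta_c):
    """Rotation matrix R_cy about the camera y-axis, Eq. (9)."""
    c, s = np.cos(theta_c), np.sin(theta_c)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(p_cam, theta_c):
    """Eq. (8): map a visual coordinate (xc, yc, zc) to (xp, yp, zp)."""
    return rotate_y(theta_c) @ np.asarray(p_cam)

def orientation_from_pair(p_cam, p_proj):
    """Recover theta_c as the y-axis angle between the two x-z vectors."""
    return (np.arctan2(p_cam[2], p_cam[0])
            - np.arctan2(p_proj[2], p_proj[0]))
```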
IV. EXPERIMENT
A. KITTI Quantitative Analysis
We use the KITTI dataset [39] for a quantitative comparison.
The dataset contains the 3D border annotations of objects. The
label of each object is composed of its category and 3D size.
For vehicle detection, the intersection over union (IoU) overlap
threshold is 0.7. For the detection of pedestrians and bicycles,
the IoU overlap threshold is 0.5. The detections of regions
that are not of interest or of objects that are smaller than the
minimum size are not considered false positives. The dataset
defines three modes of easy, moderate and hard based on
the minimum border height, maximum occlusion degree, and
maximum truncation. The evaluation website uses a moderate
mode to sort the algorithms. We use 3D detection and BEV
detection metrics for contrasting experiments.
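For reference, the overlap criterion behind these thresholds, sketched for axis-aligned 2D boxes; KITTI's official 3D and BEV IoU additionally handle rotated boxes, which this sketch omits:

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    A detection counts as correct when IoU >= 0.7 (cars) or
    IoU >= 0.5 (pedestrians and cyclists).
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```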
Since different methods have different input images and
network structures, we directly compare the accuracy of the
final detection results. Table I compares the average precision of the BEV (AP_BEV) and the 3D box (AP_3D) evaluated on the KITTI validation set. Table II compares AP_BEV and AP_3D evaluated on the KITTI test set. Table III compares AP_BEV and AP_3D evaluated for pedestrians and cyclists on the KITTI test set.

TABLE I
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOXES (AP_3D) COMPARISON EVALUATED ON THE KITTI VALIDATION SET CONTAINING CARS

TABLE II
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOX (AP_3D) COMPARISON EVALUATED ON THE KITTI TEST SET CONTAINING CARS

TABLE III
AVERAGE PRECISION OF THE BIRD'S-EYE VIEW (AP_BEV) AND 3D BOX (AP_3D) COMPARISON EVALUATED ON THE KITTI TEST SET CONTAINING PEDESTRIANS AND CYCLISTS
As can be seen from the tables, our algorithm has a clear advantage among the stereo detection methods. Specifically, the depth-based multiscale method improves the accuracy in the moderate and hard modes. We did not use point cloud data, but in the moderate and hard modes, the detection results are still better than those of the LIGA-Stereo method [38], which uses point cloud and stereo data. The greater the resolution difference between the validation data and the training data, the worse the performance, which shows that the robustness of CNNs to scale changes in the data is not good enough [9]. Since the CNN method has no scale invariance, it can only adapt to objects of different scales by setting many parameters. Common upsampling strategies can improve the detection of small objects. However, these strategies make objects of originally normal size appear larger, which reduces detection performance. The multiscale fusion in this paper extracts scale features from images of different resolutions, thereby reducing the impact of inconsistent training-data resolution on the model. The experimental results
demonstrate that our method does not affect the accuracy of
detection in the easy mode. Therefore, the disparity-based
multiscale detection method is effective in improving small-
object detection.
B. Urban Road Validation
For most datasets that serve the field of autonomous driving,
the camera installation is not fixed, and the sampling angle
between camera and road is small. Therefore, these datasets
are not suitable for the detection and analysis of traffic
scenes. To compensate for this limitation, we created an urban
road stereo vision dataset. We collected stereo vision video
and image data of the Hangzhou city road network. The
labeled information includes vehicles, nonmotorized vehicles,
pedestrians, and lanes. The dataset includes 5890 sets of left
and right training images and 3902 sets of left and right
test images. It contains 8126 labeled objects in total. The
detection defines four different ranges of 50 m, 100 m, 150 m, and 200 m according to the distance between the target and
the sampling camera. We specifically annotated long-distance
objects in the dataset. The number of marked objects at the
maximum distance is approximately 1642. The training data
are divided into a training set and a validation set. The ratio of
the training set to the validation set was 4:1. Figure 2 shows
the result of 3D object detection using our dataset. The mul-
tiscale training strategy improves the effectiveness of feature
extraction.

Fig. 2. Validation of Hangzhou urban road transportation detection.

TABLE IV
DETECTION RESULTS WITH DIFFERENT NUMBERS OF CLUSTER BRANCHES
ResNet [29] is the backbone network. For different cluster-
ing regions, pretraining is performed at different resolutions.
According to the depth information, the region ranges for
different disparity values in the same image are clustered.
A multiscale strategy is used for training and testing. We use
AP3D and AP2D to evaluate the different methods.
The number of clusters k in Algorithm 1 is the number of
divisions of the disparity map. This is an empirical value.
Table IV shows the detection results of training with different
numbers of branches. The result in Table IV demonstrates
that when k<4, more branches correspond to higher detection
accuracy. However, when the k value exceeds 3, the accuracy
is not significantly improved, which shows that the multiscale
network enhances the expression ability of target features at
different distances and resolutions. However, when the scale
division exceeds 3, the object features in a certain range are
not so obviously affected by the scale. Considering the pretraining and network training costs incurred by the scale branches, the number of clusters is set to 3.

TABLE V
ADDED DEPTH AND MULTICLUSTER ANALYSES OF DMF FOR ABLATION STUDIES
We analyze the importance of the multicluster and multi-
scale models in DMF. The baseline (Table V(a)) used ResNet
[29]. We used three different image resolutions: (1400 × 2000), (800 × 1200), and (480 × 800). The disparity parameters of
the overlap zone are set to 6 and 10. The corresponding
disparity intervals of the object are [0,168], [159,224] and
[219,255]. Table V(b) shows that the multicluster method
improves the long-distance detection. Table V(c) shows that
the multiscale method significantly improves the detection
accuracy. Small target objects carry less information, so their feature expression capability is weaker. For objects in different depth ranges, using training strategies at different scales alleviates this problem.
V. CONCLUSION
One of the reasons for the low accuracy of small object
detection is that the resolution of small objects is lower
than that of normal-sized targets. As a result, the ability to
express small object features is weak, which often makes the
extraction of small object features insufficient. The commonly
used upsampling method cannot detect targets of all scales
in an image [9]. Although the method of dividing the scale
according to the image size can improve the detection accuracy
of different scale targets, this kind of method is not suitable
for road scenes.
To solve these problems, we propose a multiscale 3D
detection method based on disparity clustering. This method
uses disparity clustering information to map the area to
be inspected. According to the disparity depth information,
a multiscale feature fusion detection model is constructed. The
fusion of 2D image features and 3D depth features can be
adapted to the detection of long-distance, small-size, partially
occluded or truncated objects.
REFERENCES
[1] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149,
Jun. 2017.
[2] C. Chen, B. Liu, S. Wan, P. Qiao, and Q. Pei, “An edge traffic flow detection scheme based on deep learning in an intelligent transportation system,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 3, pp. 1840–1852, Mar. 2021.
[3] Y. Cui et al., “Deep learning for image and point cloud fusion in autonomous driving: A review,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 2, pp. 722–739, Feb. 2021.
[4] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and
K. Q. Weinberger, “Pseudo-LiDAR from visual depth estimation: Bridg-
ing the gap in 3D object detection for autonomous driving,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 8445–8453.
[5] S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, “Learning rich
features from RGB-D images for object detection and segmentation,”
in Proc. Eur. Conf. Comput. Vis. Zürich, Switzerland: Springer, 2014,
pp. 345–360.
[6] X. Z. Chen, K. Kundu, Y. Zhu, S. Fidle, R. Urtasun, and H. Ma,
“3D object proposals using stereo imagery for accurate object class
detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 5,
pp. 1259–1272, May 2018.
[7] S. Song and J. Xiao, “Deep sliding shapes for amodal 3D object
detection in RGB-D images,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2016, pp. 808–816.
[8] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and
S. Belongie, “Feature pyramid networks for object detection,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 2117–2125.
[9] B. Singh and L. S. Davis, “An analysis of scale invariance in object
Detection–SNIP,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog-
nit., Jun. 2018, pp. 3578–3587.
[10] M. Haris, G. Shakhnarovich, and N. Ukita, “Task-driven super resolution: Object detection in low-resolution images,” 2018, arXiv:1803.11316.
[11] R. Qian et al., “End-to-End pseudo-LiDAR for image-based 3D object
detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2020, pp. 5881–5890.
[12] Y. Chen, S. Liu, X. Shen, and J. Jia, “DSGN: Deep stereo geometry
network for 3D object detection,” in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12536–12545.
[13] P. Li, X. Chen, and S. Shen, “Stereo R-CNN based 3D object detection
for autonomous driving,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2019, pp. 7644–7652.
[14] C. Li, J. Ku, and S. L. Waslander, “Confidence guided stereo 3D object
detection with split depth estimation,” in Proc. IEEE/RSJ Int. Conf.
Intell. Robots Syst. (IROS), Oct. 2020, pp. 5776–5783.
[15] B. Li, “3D fully convolutional network for vehicle detection in point
cloud,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS),
Sep. 2017, pp. 1513–1518.
[16] M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner,
“Vote3Deep: Fast object detection in 3D point clouds using efficient
convolutional neural networks,” in Proc. IEEE Int. Conf. Robot. Autom.
(ICRA), May 2017, pp. 1355–1361.
[17] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, “PointNet:
Deep learning on point sets for 3D classification and segmentation,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 652–660.
[18] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen, “PointCNN:
Convolution on X-transformed points,” in Proc. Adv. Neural Inf. Process.
Syst., vol. 31, 2018, pp. 820–830.
[19] S. Shi et al., “PV-RCNN: Point-voxel feature set abstraction for 3D
object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog-
nit. (CVPR), Jun. 2020, pp. 10529–10538.
[20] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, “Frustum pointnets
for 3D object detection from RGB-D data,” in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit., Jun. 2018, pp. 918–927.
[21] J. Beltran, C. Guindel, F. M. Moreno, D. Cruzado, F. Garcia, and
A. De La Escalera, “BirdNet: A 3D object detection framework from
LiDAR information,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC),
Nov. 2018, pp. 3517–3523.
[22] W. Shi and R. Rajkumar, “Point-GNN: Graph neural network for 3D
object detection in a point cloud,” in Proc. IEEE/CVF Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1711–1719.
[23] Z. Yang, Y. Sun, S. Liu, and J. Jia, “3DSSD: Point-based 3D single
stage object detector,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2020, pp. 11040–11048.
[24] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3D object detec-
tion network for autonomous driving,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jul. 2017, pp. 1907–1915.
[25] M. Liang, B. Yang, Y. Chen, R. Hu, and R. Urtasun, “Multi-
task multi-sensor fusion for 3D object detection,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 7345–7353.
[26] S. Vora, A. H. Lang, B. Helou, and O. Beijbom, “Point-
Painting: Sequential fusion for 3D object detection,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 4604–4612.
[27] Z. Wang, W. Zhan, and M. Tomizuka, “Fusing bird’s eye view LIDAR point cloud and front view camera image for 3D object detection,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 1–6.
[28] M. Liang, B. Yang, S. Wang, and R. Urtasun, “Deep continuous fusion
for multi-sensor 3D object detection,” in Proc. Eur. Conf. Comput. Vis.
(ECCV), 2018, pp. 641–656.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[30] J. Chen, W. Xu, H. Xu, F. Lin, Y. Sun, and X. Shi, “Fast vehicle detection using a disparity projection method,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 9, pp. 2801–2813, Sep. 2018.
[31] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3D bound-
ing box estimation using deep learning and geometry,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017,
pp. 7074–7082.
[32] A. D. Pon, J. Ku, C. Li, and S. L. Waslander, “Object-centric stereo
matching for 3D object detection,” in Proc. IEEE Int. Conf. Robot.
Autom. (ICRA), May 2020, pp. 8383–8389.
[33] W. Bao, B. Xu, and Z. Chen, “MonoFENet: Monocular 3D object
detection with feature enhancement networks,” IEEE Trans. Image
Process., vol. 29, pp. 2753–2765, 2019.
[34] H. Konigshof, N. O. Salscheider, and C. Stiller, “Realtime 3D object
detection for automated driving using stereo vision and semantic infor-
mation,” in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Oct. 2019,
pp. 1405–1410.
[35] Z. Xu et al., “ZoomNet: Part-aware adaptive zooming neural network
for 3D object detection,” in Proc. AAAI Conf. Artifi. Intel., vol. 34, 2020,
pp. 12557–12564.
[36] J. Sun et al., “Disp R-CNN: Stereo 3D object detection via
shape prior guided instance disparity estimation, in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 10548–10557.
[37] D. Garg, Y. Wang, B. Hariharan, M. Campbell, K. Q. Weinberger,
and W.-L. Chao, “Wasserstein distances for stereo disparity estimation,”
2020, arXiv:2007.03085.
[38] X. Guo, S. Shi, X. Wang, and H. Li, “LIGA-Stereo: Learning LiDAR geometry aware representations for stereo-based 3D detector,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3153–3163.
[39] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1231–1237, 2013.
Jing Chen received the Ph.D. degree in mechan-
ical engineering from Zhejiang University, China,
in 2010. She is currently an Associate Professor with
the School of Computer Science and Technology,
Hangzhou Dianzi University, China. Her research
interests include computer vision, machine learning,
and urban transportation.
Qichao Wang received the B.S. degree from
the Qingdao University of Technology, Qingdao,
China, in 2015. He is currently pursuing the M.D.
degree in information technology with the School
of Computer Science, Hangzhou Dianzi University,
Zhejiang, China. His research interests include intel-
ligent transportation systems, pattern recognition,
and computer vision.
Weiming Peng received the Ph.D. degree in
computer application technology from the South
China University of Technology, Guangzhou, China,
in 2013. He is currently a Lecturer with Hangzhou
Dianzi University, Hangzhou, China. His current
research interests include quantum computation and
data fusion.
Haitao Xu is currently an Associate Professor with
Hangzhou Dianzi University. His research interests
include machine learning and data mining. As a
member, he won the Second Prize of Zhejiang
Science and Technology Award and the First Prize
of Zhejiang Higher Scientific Research Achievement
Award.
Xiaodong Li received the Ph.D. degree from
the College of Control Science and Engineering,
Zhejiang University, China, in 2014. He is cur-
rently a Lecturer with Hangzhou Dianzi Univer-
sity, Hangzhou, China. His current research interests
include machine learning, image processing, and
pattern recognition.
Wenqiang Xu received the Ph.D. degree in eco-
nomics management from Wuhan University, China,
in 2015. He is currently a Lecturer with the Col-
lege of Economics and Management, China Jiliang
University. His research interests include computer
vision, machine learning, and urban transportation.
... The major purpose was to furnish the table with datasets in a way which was not sporadic, as these datasets were not highly focused in other studies. The Vehicle Reidentification (VeRi) dataset [59,60] was built for vehicle re-identification from an original surveillance scene, which was labeled with various attributes and which contains 50 k images captured from different camcorders covering a distance of 1 km. The sample of labeled parameters with spatiotemporal factors includes boxes of plates, strings of plates, and timestamps of different vehicles. ...
... The major purpose was to furnish the table with datasets in a way which was not sporadic, as these datasets were not highly focused in other studies. The Vehicle Re-identification (VeRi) dataset [59,60] was built for vehicle re-identification from an original surveillance scene, which was labeled with various attributes and which contains 50 k images captured from different camcorders covering a distance of 1 km. The sample of labeled parameters with spatiotemporal factors includes boxes of plates, strings of plates, and timestamps of different vehicles. ...
Article
Full-text available
Recent advancements in image processing and machine-learning technologies have significantly improved vehicle monitoring and identification in road transportation systems. Vehicle classification (VC) is essential for effective monitoring and identification within large datasets. Detecting and classifying vehicles from surveillance videos into various categories is a complex challenge in current information acquisition and self-processing technology. In this paper, we implement a dual-phase procedure for vehicle selection by merging eXtreme Gradient Boosting (XGBoost) and the Multi-Objective Optimization Genetic Algorithm (Mob-GA) for VC in vehicle image datasets. In the initial phase, vehicle images are aligned using XGBoost to effectively eliminate insignificant images. In the final phase, the hybrid form of XGBoost and Mob-GA provides optimal vehicle classification with a pioneering attribute-selection technique applied by a prominent classifier on 10 publicly accessible vehicle datasets. Extensive experiments on publicly available large vehicle datasets have been conducted to demonstrate and compare the proposed approach. The experimental analysis was carried out using a myRIO FPGA board and HUSKY Lens for real-time measurements, achieving a faster execution time of 0.16 ns. The investigation results show that this hybrid algorithm offers improved evaluation measures compared to using XGBoost and Mob-GA individually for vehicle classification.
... Edge servers in the context of edge computing can encounter resource insufficiency issues. When an edge server is burdened with serving a large number of tasks, the combined computational requirements of these tasks may surpass the available computing resources of the server [5]. This situation leads to overload and a deterioration in the Quality of Service (QoS) for the tasks being processed. ...
Article
Full-text available
The study focuses on utilizing the computational resources present in vehicles to enhance the performance of multi-access edge computing (MEC) systems. While vehicles are typically equipped with computational services for vehicle-centric Internet of Vehicles (IoV) applications, their resources can also be leveraged to reduce the workload on edge servers and improve task processing speed in MEC scenarios. Previous research efforts have overlooked the potential resource utilization of passing vehicles, which can be a valuable addition to MEC systems alongside parked cars. This study introduces an assisted MEC scenario where a base station (BS) with an edge server serves various devices, parked cars, and vehicular traffic. A cooperative approach using the Deep Deterministic Policy Gradient (DDPG) based Federated Learning method is proposed to optimize resource allocation and job offloading. This method enables the transfer of device operations from devices to the BS or from the BS to vehicles based on specific requirements. The proposed system also considers the duration for which a vehicle can provide job offloading services within the range of the BS before leaving. The objective of the DDPG-FL method is to minimize the overall priority-weighted task computation time. Through simulation results and a comparison with three other schemes, the study demonstrates the superiority of their proposed method in seven different scenarios. The findings highlight the potential of incorporating vehicular resources in MEC systems, showcasing improved task processing efficiency and overall system performance.
... Multiple initiatives have been undertaken to decrease the energy consumption of traditional deep learning models. Several methods have been devised to identify smaller networks that maintain equal performance, but require fewer parameters and exhibit less complexity than the original network [24,25]. Quantization, pruning, and knowledge distillation are the key elements of these methodologies. ...
Article
Full-text available
In an era dominated by network connectivity, the reliance on robust and secure networks has become paramount. With the advent of 5G and the Internet of Things, networks are expanding in both scale and complexity, rendering them susceptible to a myriad of cyber threats. This escalating risk encompasses potential breaches of user privacy, unauthorized access to transmitted data, and targeted attacks on the underlying network infrastructure. To safeguard the integrity and security of modern networked societies, the deployment of Network Intrusion Detection Systems is imperative. This paper presents a novel lightweight detection model that seamlessly integrates Spiking Neural Networks and Convolutional Neural Networks with advanced algorithmic frameworks. Leveraging this hybrid approach, the proposed model achieves superior detection accuracy while maintaining efficiency in terms of power consumption and computational resources. This paper presents a new style recognition model that seamlessly integrates spiking neural networks and convolutional neural networks with advanced algorithmic frameworks. We call this combined method Spiking-HCCN. Using this hybrid approach, Spiking-HCCN achieves superior detection accuracy while maintaining efficiency in terms of power consumption and computational resources. Comparative evaluations against state-of-the-art models, including Spiking GCN and Spike-DHS, demonstrate significant performance advantages. Spiking-HCCN outperforms these benchmarks by 24% in detection accuracy, 21% in delay, and 29% in energy efficiency, underscoring its efficacy in fortifying network security in the face of evolving cyber threats.
... The fog computing paradigm is extremely suitable for latency-sensitive IoT applications since it not only reduces latency but also minimizes the quantity of data transmitted to the cloud for processing [18,19]. Stated differently, fog computing technology presents a viable means of tackling the issues brought about by the exponential expansion of IoT devices [20][21][22]. Fog computing is extremely susceptible to breaches in user privacy and information security because of its capabilities and flexible deployment options [23]. ...
Article
Full-text available
Compliance with security requirements in the fog computing environment is known as an important phenomenon in maintaining the quality of service due to the dynamic topology. Security and privacy breaches can occur in fog computing because of its properties and the adaptability of its deployment method. These characteristics render current systems inappropriate for fog computing, including support for high mobility, a dynamic environment, geographic distribution, awareness of location, closeness to end users, and absence of redundancy. Although efficient secure routing protocols have been developed by researchers in recent years, it is challenging to ensure security, reliability, and quality of service at the same time to overcome the limitations of cloud-fog computing. In light of the fact that trust management is an effective means of protecting sensitive information, this study proposes a two-way trust management system (TMS) that would enable both the service requester and the service provider to verify each other's reliability and safety. The trustworthiness of the service seeker can also be verified in this way. So that fog clients can confirm that fog nodes can deliver suitable, dependable, and secure services, trust in a fog computing environment should ideally be two-way. The ability to verify the authenticity of fog clients is an important capability for fog nodes to have. A distributed, event-based, multi-trust trust system is presented by the suggested approach to trust computation, which makes use of social relationships (nodes and clients) and service quality criteria. Hence, the trust score is computed using a number of characteristics. Here, the weight of direct and indirect ratings is emphasized, and the final trust score is computed by dynamically merging the information gained from self-observation and the suggestions of nearby nodes. An extensive evaluation of the proposed method shows that it is resistant to a large number of badly behaved nodes and can successfully neutralize trust-based attacks.
... The transportation sector is incorporating quantum IoT devices into logistics with the aim of improving supply chain visibility, fleet management, and route optimization [94]. The utilization of real-time analysis for traffic, weather, and product conditions enables the optimization of delivery routes, reduction in delays, and lowering of transportation expenses [95][96][97][98][99][100]. Quantum Internet (QI) uses quantum bits for secure communication, based on quantum mechanics laws. ...
Article
Full-text available
With the advent of internet-enabled and hybrid technologies, information is becoming increasingly accessible to the general public. Smartphones and other gadgets are used extensively by people to share and promote ideas, in a variety of ways. Human interaction and communication has become more reliable and effective through advanced computing technologies. Quantum computing is an emerging paradigm that will change the lives of individuals and the operations of organizations. Quantum computers solve problems at high speed by operating in a superposition state in which the state can be either zero or one at the same instant. Quantum sensors can be used efficiently in technological research to make accurate measurements and collect data that provide new insights into the behavior of nanomaterials. The use of quantum computing could also speed up the manufacturing process of devices with remarkable properties such as superconductivity, high strength or improved signal performance. Quantum computing has the ability to dramatically speed up the development process of various organizations and increase their efficiency and effectiveness. The security and reliability of data and communication is improved by quantum computing techniques such as key generation and entanglement dispersion. Companies use cryptographic algorithms to protect their data. However, with the advent of quantum computing, cryptographic methods that rely on numerical aspects are no longer sufficient to protect data. Quantum computing is an emerging field that is being applied to various problems that previously could not be solved using conventional methods. Quantum computing plays an important role in the field of information processing, where information is precisely analyzed. Various quantum technologies and algorithms are used to secure company data. This paper provides a systematic review of the literature on the principles of quantum computing. The SLR focuses on achieving four aims “identifying a variety of quantum IoT devices, analyzing their importance in different industries, highlighting the challenges of quantum technology, and presenting various techniques used by researchers to overcome different problems”. Quantum cryptography is identified as a key strategy for improving the security of IoT systems and ensuring the security and consistency of information.
... This understanding allows it to effectively differentiate between benign and harmful activities [9]. Recent research indicates that IDS which employ ML algorithms are capable of achieving superior accuracy rates, surpassing those of conventional approaches [10,11]. However, a common drawback of IDS is its inability to detect previously unseen attacks, particularly those classified as zero-day attacks [12,13]. ...
Article
Full-text available
Computer networks face vulnerability to numerous attacks, which pose significant threats to our data security and the freedom of communication. This paper introduces a novel intrusion detection technique that diverges from traditional methods by leveraging Recurrent Neural Networks (RNNs) for both data preprocessing and feature extraction. The proposed process is based on the following steps: (1) training the data using RNNs, (2) extracting features from their hidden layers, and (3) applying various classification algorithms. This methodology offers significant advantages and greatly differs from existing intrusion detection practices. The effectiveness of our method is demonstrated through trials on the Network Security Laboratory (NSL) and Canadian Institute for Cybersecurity (CIC) 2017 datasets, where the application of RNNs for intrusion detection shows substantial practical implications. Specifically, we achieved accuracy scores of 99.6% with Decision Tree, Random Forest, and CatBoost classifiers on the NSL dataset, and 99.8% and 99.9%, respectively, on the CIC 2017 dataset. By reversing the conventional sequence of training data with RNNs and then extracting features before applying classification algorithms, our approach provides a major shift in intrusion detection methodologies. This modification in the pipeline underscores the benefits of utilizing RNNs for feature extraction and data preprocessing, meeting the critical need to safeguard data security and communication freedom against ever-evolving network threats.
... More precisely, it is believed that various SC functions-Logistics 4.0, Purchasing 4.0 [13], Storage 4.0, and Production 4.0-will become 4.0 if they are digitised utilising Industry 4.0 technology. Scientists also refer to smart transportation, smart buying, smart storage, intelligent production, and smart factories as "smart supplier chains" [14]. The gaining of robotics, information, and communication via implementing technologies associated with Industry 4.0 is called "smart" technology. ...
Article
Full-text available
Intelligent network administration and oversight are key components of the 6G future of networks, even though the cloudification of networking with a micro-services-oriented architecture is an established component of 5G. Therefore, a significant role for deep learning (DL), machine learning (ML), and artificial intelligence (AI) can be found in the envisaged 6G model. Upcoming end-to-end automated network operation necessitates the early identification of threats, using resourceful prevention techniques, and the assurance that 6G systems will be self-sufficient. The present piece investigates how AI can be used in 6G data communication and supply chain role 6G networks. In this work, the 6G-based Automotive Supply Chain network is used to evaluate the supply chain using the Deep Learning method. The proposed method integrates an automotive supply chain and deep learning method to improve operational efficiency, improve decision-making and minimise the risks present in the data. Initially, the dataset is collected with the help of a 6G network; next, the dataset is pre-processed. Finally, the dataset is trained by using Deep Q networks. The Guangzhou Automobile Toyota Company dataset is used for evaluation in this work. The proposed work evaluates the enterprise’s and suppliers’ demands based on the product category, and then it also detects the errors found during the transactions between the enterprise and suppliers. This technique makes it possible for businesses and suppliers to communicate clearly and work collaboratively to pursue additional promotion. Managers in enterprises can use theoretical data to support their research while making judgments.
... Then, probabilistic Bayesian networks were used to detect cyberattacks within the network. For 6G-IIoT scenarios, the authors of [10] described a lightweight, privacy-preserving, machine-learning-based maintenance planning approach that can evaluate the likelihood of failure for industrial assets or processes at any given moment. Binary neural networks (CNNs) were created to construct a predictive maintenance model that can be utilised with homomorphic encryption circuits to guarantee participant privacy. ...
Article
Full-text available
Our study investigates a unified framework for indoor scene creation in industrial settings that combines cyber-physical systems (CPS) with 6G technologies. This research aims to improve automation and real-time interaction in intricate industrial environments by exploiting the superior capabilities of 6G and CPS. We provide a case study of a manufacturing facility to demonstrate how our method facilitates space optimisation, boosts operational effectiveness, and strengthens safety protocols; the case study illustrates the potential and real-world advantages of applying these cutting-edge technologies in industrial settings. The study then addresses the challenge of failure prediction in process sectors that use intelligent, autonomous CPS in a 6G setting, in line with the latest developments in Industry 4.0 and the IIoT. Specifically, we developed a full-stack deep learning approach using massive amounts of real-time sensory data collected from wireless sensors in a chemical plant. First, to handle unbalanced time-series data, a recursive architecture is proposed that uses several lookback inputs to make an initial forecast via autoregression. Within this method, a new learning algorithm called Recursive Gradient Descent (RGD) is developed for the proposed architecture to reduce cumulative prediction uncertainty. Afterwards, a multi-class classification model using temporal convolutions across many channels with a decay effect is proposed to detect and localise the root causes of failure. Because of its ability to reduce prediction uncertainties accumulated across numerous prediction stages, the entire network is termed the Cumulative Uncertainty Reduction Network with Bayesian Neural Network (CURN-BNN). Results show that CURN-BNN outperforms state-of-the-art approaches, especially in recall for fault prediction and in fault type categorisation accuracy.
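The recursive, multi-lookback autoregression described above can be illustrated with a simple rollout: each one-step forecast is appended to the input window before the next step is predicted. The AR(3)-style toy model below stands in for the paper's network; RGD itself is not reproduced here.

```python
import numpy as np

def recursive_forecast(history, model, lookback=24, horizon=6):
    """Roll a one-step model forward, feeding predictions back as inputs."""
    window = list(history[-lookback:])
    preds = []
    for _ in range(horizon):
        y_hat = model(np.asarray(window[-lookback:]))  # one-step prediction
        preds.append(y_hat)
        window.append(y_hat)                           # recursion: reuse output
    return np.asarray(preds)

# toy one-step "model": an AR(3)-style weighted sum of the last three values
model = lambda w: 0.5 * w[-1] + 0.3 * w[-2] + 0.2 * w[-3]
print(recursive_forecast(np.sin(np.linspace(0, 10, 200)), model))
```

Feeding predictions back in this way is what lets uncertainty accumulate across prediction stages, which is the accumulation CURN-BNN is designed to suppress.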
... In electronic toll collection (ETC), the toll fee is collected by deducting an amount from a prepaid radio-frequency identification (RFID) passive tag. The tag, installed on the windshield of a vehicle, usually carries basic information such as driver data, vehicle ID, and the available balance in the user account [5,6]. A tag reader installed on the toll gate reads the tag and routes its data to a host computer. ...
Article
Full-text available
Road toll tax contributes significantly to the economic development of any nation. In developing countries, toll tax collection is carried out either manually or electronically; however, both approaches suffer from various challenges, including prolonged waiting times, lack of transparency, high operational costs, and concerns regarding data security and privacy. This research addresses these challenges with a blockchain-based system. The proposed system employs advanced image processing techniques, specifically "You Only Look Once" version 5 (YOLOv5), to accurately capture and store vehicles' registration numbers on a local server situated at each toll plaza. The vehicle identification, along with the driver's credentials, is then transmitted to an application server, where an Ethereum smart contract verifies the information and automatically deducts the toll charges from the driver's account. The results indicate that the proposed system effectively reduces vehicle waiting time and facilitates uninterrupted vehicular movement. Additionally, the system ensures transaction transparency, safeguards the security and privacy of vehicle details, and facilitates non-stop payments, eliminating the need for cash payments or radio-frequency identification scanning at toll booths, while its decentralized architecture enhances security and mitigates potential system failures.
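A minimal sketch of the toll-plaza side of this pipeline might look as follows, assuming a pretrained YOLOv5 model loaded from the ultralytics hub. The `read_plate` OCR helper and the application-server URL are hypothetical placeholders; the Ethereum smart contract lives on the server side and is not shown.

```python
import torch
import requests

# pretrained detector from the public ultralytics/yolov5 hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def process_frame(frame, toll_gate_id):
    """Detect vehicles in one camera frame and forward plate reads."""
    results = model(frame)                       # run detection on one image
    for *box, conf, cls in results.xyxy[0].tolist():
        plate_text = read_plate(frame, box)      # hypothetical OCR step
        # hand off to the application server, which triggers the
        # smart contract that verifies credentials and deducts the toll
        requests.post('https://toll-app.example/api/charge',  # placeholder URL
                      json={'gate': toll_gate_id,
                            'plate': plate_text,
                            'confidence': conf})
```

Keeping detection and OCR on the local server and only transmitting the plate text matches the paper's goal of reducing waiting time while limiting what leaves the plaza.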
Article
Full-text available
Vehicle classification (VC) is a prominent research domain within image processing and machine learning (ML) for identifying vehicle volumes and traffic rule violations. In developed countries, nearly 40% of daily accidents are fatal, while in developing countries the figure rises to 70%. Traditionally, vehicle detection and classification have been performed manually by experts, which is difficult, time-consuming, and error-prone, and incorrect detection and classification can result in hazardous situations. This highlights the need for more reliable techniques to identify and classify vehicles accurately and practically. Numerous automated methods have been proposed, but deep learning and machine learning algorithms applied to complex vehicle-image datasets have struggled to remain accurate across varied climate conditions and have been time-consuming. This paper presents an accurate, robust, real-time system to classify vehicles on real roads. The proposed system uses a random wavelet transform for preprocessing, edge- and region-based segmentation for feature extraction, an embedded method for feature selection, and the XGBoost algorithm for VC. The system, implemented on an FPGA platform, classifies vehicles under complex weather, illumination, color, and occlusion conditions across 10 datasets, including a novel dataset named SRM2KTR containing 75,436 vehicle images. The results show 98.81% accuracy, outperforming the state of the art (98%). Demonstrated with four different classifiers, the system classifies images in 0.16 ns with an average accuracy of 97.79%, exhibiting high accuracy, rapid identification time, and robustness in practical use.
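The final stage of this pipeline (feature vectors into XGBoost) can be sketched as below. The random features stand in for the wavelet, segmentation, and embedded-selection steps, and the hyperparameters are illustrative rather than the paper's.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 32)             # placeholder feature vectors
y = np.random.randint(0, 5, size=1000)   # e.g. 5 vehicle classes (assumed)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
clf.fit(X_tr, y_tr)                      # gradient-boosted trees on features
print('accuracy:', (clf.predict(X_te) == y_te).mean())
```

Boosted trees on compact hand-crafted features are also a natural fit for FPGA deployment, since inference reduces to threshold comparisons rather than large matrix multiplies.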
Article
Full-text available
Autonomous vehicles have undergone rapid development in the past few years. However, achieving full autonomy is not a trivial task, owing to the complex and dynamic driving environment. Autonomous vehicles are therefore equipped with a suite of different sensors to ensure robust, accurate environmental perception, and camera-LiDAR fusion in particular is becoming an emerging research theme. So far, however, there has been no critical review focused on deep-learning-based camera-LiDAR fusion methods. To bridge this gap and motivate future research, this article reviews recent deep-learning-based data fusion approaches that leverage both images and point clouds. It gives a brief overview of deep learning for image and point cloud processing, followed by in-depth reviews of camera-LiDAR fusion methods in depth completion, object detection, semantic segmentation, tracking, and online cross-sensor calibration, organized by their respective fusion levels. Furthermore, we compare these methods on publicly available datasets. Finally, we identify gaps and overlooked challenges between current academic research and real-world applications, and based on these observations we provide our insights and point out promising research directions.
Chapter
We consider how image super-resolution (SR) can contribute to an object detection task in low-resolution images. Intuitively, SR has a positive impact on object detection, and several previous works have demonstrated that this intuition is correct; however, in those works the SR model and the detector are optimized independently. This chapter analyzes a framework for training a deep neural network in which the SR sub-network explicitly incorporates a detection loss in its training objective, via a tradeoff with a traditional SR (reconstruction) loss. This end-to-end training procedure allows SR preprocessing to be trained for any differentiable detector. We present extensive experiments showing that our task-driven SR consistently and significantly improves the accuracy of an object detector on low-resolution images from the COCO and PASCAL VOC datasets for a variety of conditions and scaling factors.
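The tradeoff objective described here, a detection loss combined with a conventional reconstruction loss on the SR output, could be written as the sketch below. `sr_net`, `detector`, and the weighting `lam` are placeholders, not the chapter's exact formulation.

```python
import torch
import torch.nn.functional as F

def task_driven_sr_loss(sr_net, detector, lr_img, hr_img, targets, lam=0.1):
    """End-to-end loss: detection quality plus pixel fidelity on the SR image."""
    sr_img = sr_net(lr_img)                 # upscale the low-resolution input
    rec_loss = F.l1_loss(sr_img, hr_img)    # traditional SR reconstruction loss
    det_loss = detector(sr_img, targets)    # loss from a differentiable detector
    return det_loss + lam * rec_loss        # the tradeoff the chapter describes
```

Because gradients from `det_loss` flow back into `sr_net`, the SR network learns to restore exactly the details the detector relies on, which is the point of training the two jointly rather than independently.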
Article
An intelligent transportation system (ITS) plays an important role in public transport management, security, and related issues, and traffic flow detection is an important part of an ITS. Based on the real-time acquisition of urban road traffic flow information, an ITS provides intelligent guidance for relieving traffic jams and reducing environmental pollution. Traffic flow detection in an ITS usually adopts a cloud computing model in which the network edge transmits all captured video to a cloud computing center. However, the growth of traffic monitoring has brought great challenges to the storage, communication, and processing capacities of traditional cloud-based transportation systems. To address this issue, this article proposes a traffic flow detection scheme based on deep learning at the edge node. First, we propose a vehicle detection algorithm based on the YOLOv3 (You Only Look Once) model trained on a large volume of traffic data, pruning the model to ensure its efficiency on edge equipment. Next, the DeepSORT (Deep Simple Online and Realtime Tracking) algorithm is optimized by retraining the feature extractor for multi-object vehicle tracking. We then propose a real-time vehicle counter that combines the detection and tracking algorithms to measure traffic flow. Finally, the vehicle detection and multiple-object tracking networks are deployed on the Jetson TX2 edge device, and we verify the correctness and efficiency of our framework. Test results indicate that the model can efficiently detect traffic flow with an average processing speed of 37.9 FPS (frames per second) and an average accuracy of 92.0% on the edge device.
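The counting stage that links detection and tracking can be illustrated with a simple virtual-line counter over track centroids. The track-ID dictionary interface and the line position are assumptions; DeepSORT's own data structures differ.

```python
def update_counter(tracks, prev_y, counted, line_y=300):
    """Count vehicles whose centroid crosses a virtual line.

    tracks : {track_id: (cx, cy)} centroids for the current frame.
    """
    flow = 0
    for tid, (cx, cy) in tracks.items():
        # count a track once, on the frame where it first crosses the line
        if tid in prev_y and prev_y[tid] < line_y <= cy and tid not in counted:
            counted.add(tid)
            flow += 1
        prev_y[tid] = cy
    return flow

prev_y, counted = {}, set()
update_counter({1: (100, 290)}, prev_y, counted)         # still above the line
print(update_counter({1: (100, 310)}, prev_y, counted))  # crossed -> prints 1
```

Keying the count on persistent track IDs rather than raw detections is what prevents a slow or stopped vehicle from being counted repeatedly.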