Kai Zeng's research works | Kunming University of Science and Technology and other places


Overall architecture of the RaBFT algorithm

RaBFT: an improved Byzantine fault tolerance consensus algorithm based on raft

June 2024

7 Reads

The Journal of Supercomputing

[...]

To address the limitations of the Raft consensus algorithm, namely its lack of Byzantine fault tolerance, the performance bottleneck at the single leader node, and high leader election delay, an improved Byzantine fault-tolerant consensus algorithm based on Raft, called RaBFT, is proposed. The distribution of log messages is optimized with a secret sharing technique to make it Byzantine fault tolerant, and a committee role is introduced to share the leader's communication load, thereby resolving the single-leader performance bottleneck. A leader election algorithm based on a dynamic committee reduces the time required for leader election. The experimental results show that RaBFT significantly improves throughput and consensus delay in the log replication phase and has a lower leader election delay. RaBFT can thus improve the efficiency and performance of the system, and it is a safe and efficient consensus algorithm.
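The abstract does not say which secret sharing scheme RaBFT uses or how shares map onto log entries. As a minimal sketch of the general idea only, the following assumes Shamir's (t, n) threshold scheme over a prime field; the names `PRIME`, `split_secret`, and `recover_secret` are hypothetical and not taken from the paper.

```python
import secrets

# Assumption: Shamir's threshold scheme; the field size is an illustrative choice.
PRIME = 2**127 - 1  # a Mersenne prime


def split_secret(secret, n_shares, threshold):
    """Split an integer secret into n_shares points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= threshold <= n_shares:
        raise ValueError("threshold must lie in [1, n_shares]")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, n_shares=5, threshold=3)
print(recover_secret(shares[:3]))  # any three shares reconstruct the secret
```

In a scheme like this, no group smaller than the threshold learns anything about the secret, which is the property a Byzantine-tolerant distribution step would presumably rely on.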


Fuzzy preference matroids rough sets for approximate guided representation in transformer

Article

June 2024

Expert Systems with Applications

[...]

Illustration of the self-feature distillation (SFD) process. For the same input images, full-precision weights are used in parallel to extract teacher features. Then a value-swapping strategy is implemented to better adapt the knowledge capacity. Eventually, at the end of each binary convolution block, teacher features with more high-level information are provided for the weak student BNN.

Illustration of the self-soft label distillation (SSLD) process. The predictions output from the (T-1)-th epoch are dynamically filtered and averaged to generate the correct soft-label bank. This bank is then used to extract the desired soft labels for the teacher at the T-th epoch. Soft labels provide more category relationship knowledge to the weak student BNN.

Illustration of the value-swapping strategy. In the convolutional layer (Conv), the sign function is applied to the full-precision weights to produce binary weights. Both types of weights are convolved with the inputs to generate teacher and student features separately, but a significant knowledge capacity gap hinders direct learning. The sign information of the teacher features (grey) extracted using the full-precision weights is retained, and their values are replaced with the absolute values of the BNN student features (blue) to generate new learnable teacher features (green). Row A: Histogram of the new teacher features extracted using the full-precision weights and then applying value-swapping. Row B: Histogram of the teacher features extracted directly using the full-precision weights. Row C: Histogram of the student features extracted using the binary weights.
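The value-swapping step described in this caption can be sketched element-wise as follows. This is an illustration on flat lists of floats, not the authors' implementation; the function name `value_swap` is hypothetical, and treating a teacher value of exactly 0.0 as positive is an assumption the caption does not address.

```python
import math


def value_swap(teacher_features, student_features):
    """Keep each teacher feature's sign and take its magnitude from the student feature.

    Assumption: a teacher value of exactly 0.0 counts as positive
    (this is how math.copysign treats +0.0).
    """
    if len(teacher_features) != len(student_features):
        raise ValueError("feature lists must have the same length")
    return [
        math.copysign(abs(s), t)
        for t, s in zip(teacher_features, student_features)
    ]


teacher = [0.7, -2.5, 1.2]   # full-precision (teacher) features
student = [-1.0, 1.0, -3.0]  # binary-network (student) features
print(value_swap(teacher, student))
```

The new teacher features share the student's magnitude distribution while keeping the teacher's sign pattern, which is how the caption describes the capacity gap being narrowed.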

Self-knowledge distillation enhanced binary neural networks derived from underutilized information

April 2024

10 Reads

Applied Intelligence

Binarization efficiently compresses full-precision convolutional neural networks (CNNs) to achieve accelerated inference but with substantial performance degradations. Self-knowledge distillation (SKD) can significantly improve the performance of a network by inheriting its own advanced knowledge. However, SKD for binary neural networks (BNNs) remains underexplored because the binary characteristics of weak BNNs limit their ability to act as effective teachers, hindering their ability to learn as students. In this study, a novel SKD-BNN framework is proposed by using two pieces of underutilized information. Full-precision weights, which are applied for gradient transfer, concurrently distill the feature knowledge of the teacher with high-level semantics. A value-swapping strategy minimizes the knowledge capacity gap, while the channel-spatial difference distillation loss promotes feature transfer. Moreover, historical output predictions generate a concentrated soft-label bank, providing abundant intra- and inter-category similarity knowledge. Dynamic filtering ensures the correctness of the soft labels during training, and the label-cluster loss enhances the summarization ability of the soft-label bank within the same category. The developed methods excel in extensive experiments, achieving state-of-the-art accuracy of 93.0% on the CIFAR-10 dataset, which is equivalent to that of full-precision CNNs. On the ImageNet dataset, the accuracy improves by 1.6% with the widely adopted IR-Net. It is emphasized that for the first time, the proposed method fully explores the underutilized information contained in BNNs and conducts an effective SKD process, enabling weak BNNs to serve as competent self-teachers and proficient students.
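One way to picture the soft-label bank is sketched below. The abstract does not define the dynamic filtering rule, so this sketch assumes the simplest reading: a previous-epoch prediction is kept only if its arg-max class matches the ground-truth label, and the kept probability vectors are averaged per class. The function name `build_soft_label_bank` is hypothetical.

```python
def build_soft_label_bank(predictions, labels, num_classes):
    """Average the previous epoch's correctly classified probability vectors per class.

    Assumption: "filtering" means dropping samples whose arg-max class
    differs from the ground-truth label.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for probs, label in zip(predictions, labels):
        predicted = max(range(num_classes), key=lambda c: probs[c])
        if predicted != label:
            continue  # filter out incorrect predictions
        counts[label] += 1
        for c in range(num_classes):
            sums[label][c] += probs[c]
    # Classes with no correct prediction get no entry in the bank.
    return {
        k: [v / counts[k] for v in sums[k]]
        for k in range(num_classes)
        if counts[k] > 0
    }


previous_epoch = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
ground_truth = [0, 0, 1, 1]
bank = build_soft_label_bank(previous_epoch, ground_truth, num_classes=2)
print(bank)  # the last sample is misclassified and therefore excluded
```

At the next epoch, each training sample would look up the bank entry for its class and use it as a soft target alongside the hard label.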


Multispectral point cloud superpoint segmentation

Article

January 2024

44 Reads

7 Citations

Science China Technological Sciences

[...]

The sheer volume of airborne point cloud data limits point cloud processing efficiency. Superpoints, which group similar points together, can effectively reduce the demand for computing resources and improve processing efficiency. However, existing superpoint segmentation methods focus only on local geometric structures, resulting in inconsistent spectral features among the points within a superpoint. Such feature inconsistencies degrade the performance of subsequent tasks. Thus, this study proposes a novel Superpoint Segmentation method that jointly utilizes spatial Geometric and Spectral Information for multispectral point cloud superpoint segmentation (GSI-SS). Specifically, a similarity metric that combines spatial geometry and spectral information is proposed to promote the consistency of geometric structures and object attributes within segmented superpoints. Following the formation of the primary superpoints, an inter-superpoint point-exchange mechanism that maximizes feature consistency within the final superpoints is proposed. Experiments are conducted on two real multispectral point cloud datasets, and the proposed method achieves higher recall, precision, and F score, as well as lower global consistency and feature classification errors. The experimental results demonstrate the superiority of the proposed GSI-SS over several state-of-the-art methods.
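The abstract does not give the form of the similarity metric. Purely as an illustration of what combining the two cues can look like, the sketch below assumes a weighted sum of a Euclidean distance in space and a Euclidean distance in spectral space; the function `joint_distance` and the parameter `spectral_weight` are hypothetical and should not be read as the GSI-SS definition.

```python
import math


def joint_distance(p, q, spectral_weight=0.5):
    """Weighted sum of spatial and spectral Euclidean distances between two points.

    Each point is a pair (xyz, spectrum). The weighted-sum form and the
    default weight are assumptions made for illustration only.
    """
    if not 0.0 <= spectral_weight <= 1.0:
        raise ValueError("spectral_weight must lie in [0, 1]")
    geometric = math.dist(p[0], q[0])
    spectral = math.dist(p[1], q[1])
    return (1.0 - spectral_weight) * geometric + spectral_weight * spectral


a = ((0.0, 0.0, 0.0), (0.1, 0.2, 0.3))
b = ((3.0, 4.0, 0.0), (0.1, 0.2, 0.3))
print(joint_distance(a, b))  # identical spectra, so only the spatial term contributes
```

With a metric of this kind, two points that are close in space but spectrally different are pushed apart, which is the inconsistency the abstract says geometry-only methods leave unresolved.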

Figure 1. The relation representation of point cloud data. (a) The dot product or addition relations between two isolated feature vectors in classical transformer. (b) The approximation relations after granulation in the view of rough set theory.

Figure 2. The granulation operation and approximation operation. The relationship function guides the commonality between each feature, and the generated granulation matrix contains all the information about each feature with other features. The mutual approximation between concepts leads to the degree to which information grains are necessarily related to each other.

Figure 3. Architecture of RST. The encoder mainly comprises an Input Embedding module and multiple multi-headed attention modules. The decoder mainly comprises multiple linear layers for classification and segmentation.

Figure 6. Comparison experiments between traditional transformer and RST with different occlusion levels.

Comparison on the ModelNet40 classification dataset. * represents the baseline.

RST: Rough Set Transformer for Point Cloud Learning

November 2023

39 Reads

Sensors

Xinwei Sun

Kai Zeng

Point cloud data generated by LiDAR sensors play a critical role in 3D sensing systems, with applications encompassing object classification, part segmentation, and point cloud recognition. Leveraging the global learning capacity of dot product attention, transformers have recently exhibited outstanding performance in point cloud learning tasks. Nevertheless, existing transformer models inadequately address the challenges posed by uncertainty features in point clouds, which can introduce errors in the dot product attention mechanism. In response to this, our study introduces a novel global guidance approach to tolerate uncertainty and provide more reliable guidance. We redefine the granulation and lower-approximation operators based on neighborhood rough set theory. Furthermore, we introduce a rough set-based attention mechanism tailored for point cloud data and present the rough set transformer (RST) network. Our approach establishes concepts based on granulation generated from clusters of tokens, enabling us to explore relationships between concepts from an approximation perspective, rather than relying on specific dot product or addition functions. Empirically, our work represents the pioneering fusion of rough set theory and transformer networks for point cloud learning. Our experimental results, including point cloud classification and segmentation tasks, demonstrate the superior performance of our method.


Multimodal rough set transformer for sentiment analysis and emotion recognition

Conference Paper

August 2023

10 Reads

[...]

Deep Graph Network for Multispectral Point Cloud Classification with Adaptive Multi-kernel Graph

Conference Paper

July 2023

4 Reads

[...]

Figure 5. Example of Caltech Pedestrian dataset.

Figure 6. Example of BDD 100 K-Person dataset.

Lightweight Pedestrian Detection Based on Feature Multiplexed Residual Network

February 2023

139 Reads

1 Citation

Electronics

[...]

As an important part of intelligent perception for autonomous driving, pedestrian detection places strict demands on parameter count, real-time operation, and model performance. First, a novel multiplexed connection residual block is proposed to construct a lightweight network with an improved ability to extract pedestrian features. Second, a lightweight scalable attention module based on dilated convolution is investigated; it expands the local receptive field of the model while maintaining the most important feature channels. Finally, we verify the proposed model on the Caltech Pedestrian and BDD 100 K datasets. The results show that the proposed method is superior to existing lightweight pedestrian detection methods in terms of model size and detection performance.


UAV images are acquired by visible light camera and infrared camera, respectively. (a) Visible image. (b) Infrared image.

The overall framework of the local adaptive illumination-driven input-level fusion module. Conv k3s1 indicates that the convolutional kernel size is 3 and stride is set to 1.

Interior structure of local illumination perception module. k3s1 indicates that the kernel size is 3 and stride is set to 1. The number on the connecting line represents the number of channels of the output feature map.

(a) Original image. (b) The original image is divided into grid cells, and the sample image is divided into 16 grid cells. (c) The design value of the illumination condition in each area of the image is shown, where the value 1 represents the strongest and the value 0 represents the weakest. (d) The histogram of the pixels in the third block of the second row of the partition, where the RGB values are divided into 32 intervals for statistics.
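The per-cell statistics in this caption can be sketched as below, assuming the 16 cells form a 4 x 4 grid and that the R, G, and B values of a cell are pooled into a single 32-bin histogram. Both choices are assumptions, since the caption gives only the cell count and the number of intervals; the function name `grid_histograms` is hypothetical.

```python
def grid_histograms(image, grid=4, bins=32):
    """Compute one pooled RGB histogram per grid cell.

    image: list of rows, each row a list of (r, g, b) tuples with values 0..255.
    Returns {(cell_row, cell_col): [count per bin]}.
    Assumptions: a grid x grid layout and channels pooled into one histogram.
    """
    height, width = len(image), len(image[0])
    bin_width = 256 // bins
    hists = {(r, c): [0] * bins for r in range(grid) for c in range(grid)}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            cell = (y * grid // height, x * grid // width)
            for value in pixel:
                hists[cell][min(value // bin_width, bins - 1)] += 1
    return hists


# An 8 x 8 test image in which every pixel is (0, 128, 255).
image = [[(0, 128, 255)] * 8 for _ in range(8)]
hists = grid_histograms(image)
# Third block of the second row, as in panel (d): zero-based index (1, 2).
print(hists[(1, 2)])
```

A local illumination estimate for each cell could then be derived from where the mass of its histogram sits, which is the kind of per-region label that panel (c) depicts.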


Local Adaptive Illumination-Driven Input-Level Fusion for Infrared and Visible Object Detection

January 2023

253 Reads

19 Citations

Remote Sensing

[...]

Remote sensing object detection based on the combination of infrared and visible images can effectively adapt to the around-the-clock and changeable illumination conditions. However, most of the existing infrared and visible object detection networks need two backbone networks to extract the features of two modalities, respectively. Compared with the single modality detection network, this greatly increases the amount of calculation, which limits its real-time processing on the vehicle and unmanned aerial vehicle (UAV) platforms. Therefore, this paper proposes a local adaptive illumination-driven input-level fusion module (LAIIFusion). The previous methods for illumination perception only focus on the global illumination, ignoring the local differences. In this regard, we design a new illumination perception submodule, and newly define the value of illumination. With more accurate area selection and label design, the module can more effectively perceive the scene illumination condition. In addition, aiming at the problem of incomplete alignment between infrared and visible images, a submodule is designed for the rapid estimation of slight shifts. The experimental results show that the single modality detection algorithm based on LAIIFusion can ensure a large improvement in accuracy with a small loss of speed. On the DroneVehicle dataset, our module combined with YOLOv5L could achieve the best performance.


Millimeter-wave radar and camera performance in diverse weather conditions, where a indicates the camera effect map at night and during rain; and b indicates the effect map of radar points onto the image

CenterTransFuser model consists of two branches, namely radar and image. The image branch, after feature extraction by the backbone network, generates preliminary 3D information and 2D regions of interest for the radar branch. The radar branch is supplemented by data processing and frustum association with the information provided by the image. The information from both branches is fed into the cross-transformer module for contextual interaction and into the detection head for classification and regression

Spatial information enhancement and the filtering effect. The first column presents the effect of mapping radar points to the image; the second column presents the spatial information enhancement of radar points; the third column presents the effect of mapping radar points to the image using the depth information after baseline spatial information enhancement; the fourth column presents the effect after depth filtering

Cross-transformer model mainly includes two parts, i.e., the encoder and the decoder. The query matrices of the radar branch and the image branch, respectively, guide image information and radar information into multihead cross-attention for cross-modal information interaction and then into multihead joint cross-attention for deep contextual interaction

The cross-modal attention model consisting of two branches, with the query matrix of the radar branch guiding the image for cross-modal interaction

CenterTransFuser: radar point cloud and visual information fusion for 3D object detection

January 2023

236 Reads

4 Citations

EURASIP Journal on Advances in Signal Processing

Yan Li

Kai Zeng

Tao Shen

Sensor fusion is an important component of the perception system in autonomous driving, and the fusion of radar point cloud information and camera visual information can improve the perception capability of autonomous vehicles. However, most existing studies ignore the extraction of local neighborhood information and only consider shallow fusion between the two modalities based on the extracted global information, which cannot achieve a deep fusion through cross-modal contextual information interaction. Meanwhile, in data preprocessing, the noise in radar data is usually filtered only by the depth information derived from image feature prediction; such methods reduce the accuracy with which the radar branch generates regions of interest and cannot effectively filter out irrelevant radar points. This paper proposes the CenterTransFuser model, which makes full use of millimeter-wave radar point cloud information and visual information to enable cross-modal fusion of the two heterogeneous information sources. Specifically, a new interaction called cross-transformer is explored, which cooperatively exploits cross-modal cross-multiple attention and joint cross-multiple attention to mine complementary radar and image information. Meanwhile, an adaptive depth thresholding filtering method is designed to reduce the noise of radar modality-independent information projected onto the image. The CenterTransFuser model is evaluated on the challenging nuScenes dataset, and it achieves excellent performance. In particular, the detection accuracy is significantly improved for pedestrians, motorcycles, and bicycles, showing the superiority and effectiveness of the proposed model.


... To meet the experimental requirements, we manually segmented the above two datasets into 9606 (Harbor of Tobermory) and 9350 (University of Houston) superpoints based on the point cloud segmentation method [38]. The method specifically combines spatial and spectral similarity metrics to perform point cloud segmentation. ...
Reference:
Multi-Kernel Graph Structure Learning for Multispectral Point Cloud Classification

Multispectral point cloud superpoint segmentation

Citing Article
January 2024

Science China Technological Sciences

[...]

... The mentioned GhostC3 module in the paper increases the structural complexity of the network, which could potentially result in less than optimal model inference speed. Sha, Mengzhou [19] proposed to use a lightweight scalable attention module based on dilated convolution to maintain important feature channels and a multiplexed connection residual block to construct a lightweight network for pedestrian detection. Zhao [20] proposed a lightweight detection model based on YOLOv5, which combines the MD-SILBP operator and the five-frame differential method to enhance the contour feature extraction capability and uses Distance-IoU non-maximum suppression to reduce the missed detection rate in detection. ...
Reference:
A Robust Lightweight Network for Pedestrian Detection Based on YOLOv5-x

Lightweight Pedestrian Detection Based on Feature Multiplexed Residual Network

Citing Article
Full-text available
February 2023

Electronics

[...]

... IR imaging provides thermal information for detecting objects in low light or adverse weather conditions, while VIS imaging offers rich textural details for precise target recognition [9]. Fusion techniques integrate these complementary features, improving the accuracy and robustness of scene analysis [10]. By leveraging the benefits of both modalities, fusion enhances the system's ability to detect and track objects across varying environmental conditions, enhancing overall surveillance effectiveness and reliability in critical applications such as security and monitoring [11]. ...
Reference:
An effective reconstructed pyramid crosspoint fusion for multimodal infrared and visible images

Local Adaptive Illumination-Driven Input-Level Fusion for Infrared and Visible Object Detection

Citing Article
Full-text available
January 2023

Remote Sensing

[...]

... Li et al. [161] demonstrate how radar point cloud projection on the image plane combines sparse radar data with visual information to improve 2D and 3D object detection. The CenterTransFuser model uses a fusion approach that processes radar data and RGB images independently before combining them into a cross-transformer module, increasing detection accuracy for pedestrians, motorcycles, and bicycles. ...
Reference:
Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey

CenterTransFuser: radar point cloud and visual information fusion for 3D object detection

Citing Article
Full-text available
January 2023

EURASIP Journal on Advances in Signal Processing

Yan Li

Kai Zeng

Tao Shen

... We demonstrate the effectiveness of proposed additions on MOT17 [41] and MOT20 [14] datasets. It has become a standard practice to apply camera motion compensation (CMC) [1,4,16,17,37,54] and interpolation of fragmented tracks [1,17,67,69] to MOT. By integrating CMC and gradient boosting interpolation from [67], we achieve comparable results with state of the art methods, without using time costly visual features and running at the speed of 65.45 FPS on MOT17 and 32.79 FPS on MOT20, on a desktop with one NVIDIA GeForce RTX 3090 GPU and AMD Ryzen 9 5950X 16-Core CPU. ...
Reference:
BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking

NCT: noise-control multi-object tracking

Citing Article
Full-text available
January 2023

Complex & Intelligent Systems

[...]

... Subsequently, feature maps from both branches are fused and connected via shortcut connections. The key distinction from previous works (Touvron et al. 2021;Wang et al. 2021;Wu et al. 2021;Zeng et al. 2022) is the parallel processing of input information in our structure, in contrast to previous works that integrated convolution in the Transformer or used the Transformer solely for feature fusion. These approaches underutilize the strengths of convolution and Transformer, while our parallel structure leverages the full potential of both. ...
Reference:
VBNet: A Visually-Aware Biomimetic Network for Simulating the Human Eye’s Visual System

NLFFTNet: A non-local feature fusion transformer network for multi-scale object detection

Citing Article
April 2022

Neurocomputing

[...]

... Classification is essential in various fields, from image recognition to social sciences [16][17][18][19]. Their success is due to the increased data availability, software and hardware improvements, and various algorithmic breakthroughs that expedite data training [20][21][22][23][24][25][26][27]. Consequently, with the application of CNN methods in agriculture, farmers can recognize and treat plant diseases earlier, increasing crop yields and reducing the risk of crop loss. ...
Reference:
MiniTomatoNet: a lightweight CNN for tomato leaf disease recognition on heterogeneous FPGA-SoC

FPGA-based accelerator for object detection: a comprehensive survey

Citing Article
Full-text available
March 2022

The Journal of Supercomputing

[...]

... Therefore, task privacy must be guaranteed in the scheduling process. Moreover, the task caches shared among edge servers also need to be protected due to the lack of trust among servers, and the risk of privacy leakage is a major obstacle to data sharing in E3C [10]. This further exacerbates the security challenges. ...
Reference:
Joint multi-server cache sharing and delay-aware task scheduling for edge-cloud collaborative computing in intelligent manufacturing

Trustworthy Blockchain-Empowered Collaborative Edge Computing-as-a-Service Scheduling and Data Sharing in the IIoE

Citing Article
February 2021

IEEE Internet of Things Journal

[...]

Kai Zeng's research while affiliated with Kunming University of Science and Technology and other places


Publications (15)

Citations (8)