Figure 1.
Left: an example showing the interaction between features and attention masks. Right: example images illustrating that different features have different corresponding attention masks in our network. The sky mask diminishes low-level background blue-color features, while the balloon instance mask highlights high-level balloon bottom-part features.

Source publication
Article
Full-text available
In this work, we propose the "Residual Attention Network", a convolutional neural network with an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-awar...
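
For readers who think in code, a minimal PyTorch sketch of one Attention Module follows: a trunk branch computes features, a mask branch produces a soft attention map in (0, 1), and the two combine via the paper's (1 + mask) * trunk rule. The specific layer sizes and the shallow encoder-decoder here are illustrative stand-ins for the paper's deeper hourglass mask branch.

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """One attention module: a trunk branch learns features, a mask branch
    learns a soft attention map over them (values in (0, 1))."""

    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: ordinary residual-style feature processing.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Mask branch: a (much simplified) encoder-decoder that outputs a
        # per-pixel, per-channel attention map squashed to (0, 1) by a sigmoid.
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.trunk(x)
        m = self.mask(x)
        # Attention residual learning: (1 + M) * T keeps trunk features
        # alive even where the mask is near zero.
        return (1.0 + m) * t


x = torch.randn(1, 64, 32, 32)
print(AttentionModule(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```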

Contexts in source publication

Context 1
... from more discriminative feature representation brought by the attention mechanism, our model also exhibits the following appealing properties: (1) Increasing the number of Attention Modules leads to consistent performance improvement, as different types of attention are captured extensively. Fig. 1 shows an example of different types of attention for a hot air balloon image. ...
Context 2
... Attention Module, each trunk branch has its own mask branch to learn attention that is specialized for its features. As shown in Fig. 1, in hot air balloon images, blue color features from the bottom layer have a corresponding sky mask to eliminate the background, while part features from the top layer are refined by the balloon instance mask. Besides, the incremental nature of the stacked network structure can gradually refine attention for complex images. ...
Context 3
... Attention Modules can gradually refine the feature maps. As shown in Fig. 1, features become much clearer as the depth increases. ...
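
The gradual refinement described in these contexts is formalized in the source paper as attention residual learning: the mask modulates rather than replaces the trunk features. A compact statement of the rule, with symbols as defined in the paper:

```latex
% Attention residual learning: i indexes spatial positions, c channels.
% M_{i,c}(x) \in [0,1] is the mask-branch output; F_{i,c}(x) is the trunk output.
H_{i,c}(x) = \bigl(1 + M_{i,c}(x)\bigr) \cdot F_{i,c}(x)
```

Because M stays in [0, 1], the identity term preserves trunk features even where the mask is near zero, which is what allows many modules to be stacked without degrading the signal.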

Similar publications

Article
Full-text available
This paper establishes a recognition model to identify remote sensing images using deep learning, the popular method of recent years. Based on Google Inc.'s Inception-v3 convolutional neural network recognition model, this paper adopts the concept of transfer learning to achieve accurate automatic classification of remote sensing image scen...
Article
Full-text available
Multi-sensor images can provide supplementary information, usually leading to better performance in classification tasks. However, general deep neural network-based multi-sensor classification methods learn each sensor image separately, followed by stacked concatenation for feature fusion. This approach requires a large time cost for network training...
Conference Paper
Full-text available
Skin diseases are very common, and nowadays remedies are easy to obtain. But sometimes properly diagnosing these diseases can be very troublesome due to the hard-to-discriminate nature of the symptoms they exhibit. Deep Neural Networks, since their recent advent, have started outperforming other algorithms in almost every sector. One of the probl...

Citations

... The recent success of neural networks O'shea and Nash [2015], Deng et al. [2009] has greatly stimulated research in the fields of pattern recognition Wang et al. [2017], Stahlberg [2020] and data mining He et al. [2017]. In particular, in many applications Cui et al. [2019], Fout et al. [2017], data are generated from non-Euclidean domains. ...
Preprint
A massive number of applications involve data with underlying relationships embedded in non-Euclidean space. Graph neural networks (GNNs) are utilized to extract features by capturing the dependencies within graphs. Despite groundbreaking performance, we argue that multi-layer perceptrons (MLPs) and fixed activation functions impede feature extraction due to information loss. Inspired by Kolmogorov-Arnold Networks (KANs), we make the first attempt to build GNNs with KANs. We discard MLPs and activation functions and instead use KANs for feature extraction. Experiments demonstrate the effectiveness of GraphKAN, emphasizing the potential of KANs as a powerful tool. Code is available at https://github.com/Ryanfzhang/GraphKan.
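
The abstract's core move, replacing the MLP transform in message passing with a KAN layer, can be sketched as a GNN layer with a pluggable feature transform. `KANLayer` below is a hypothetical stand-in for the module in the linked repository; only the MLP variant is actually constructed here.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """Message passing with a pluggable feature transform.

    GraphKAN's idea (per the abstract) is to swap the usual Linear +
    fixed activation for a KAN layer; `transform` is whatever module
    plays that role.
    """

    def __init__(self, transform: nn.Module):
        super().__init__()
        self.transform = transform

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Aggregate neighbor features (adj is assumed to be a normalized
        # adjacency matrix with self-loops), then transform them.
        return self.transform(adj @ x)


# Conventional GNN layer: linear map plus a fixed ReLU activation.
mlp_layer = GraphLayer(nn.Sequential(nn.Linear(16, 16), nn.ReLU()))
# GraphKAN variant would instead plug in a KAN layer from the authors'
# repository, e.g. GraphLayer(KANLayer(16, 16))  # hypothetical import

x = torch.randn(5, 16)          # 5 nodes, 16 features each
adj = torch.eye(5)              # identity adjacency for a smoke test
print(mlp_layer(x, adj).shape)  # torch.Size([5, 16])
```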
... Inspired by SENet, several works (Yang et al., 2020; Qin et al., 2021; Zhang et al., 2018) improved the squeeze or excitation module in various ways to enhance the modeling capability of the network. Besides, some works (Woo et al., 2018; Fu et al., 2019; Liu et al., 2020; Wang et al., 2017) utilized both channel attention and spatial attention to capture feature dependencies in both domains. Vision Transformer. ...
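
As a concrete illustration of the channel-plus-spatial designs cited above, here is a minimal CBAM-style sketch in PyTorch: channel attention reweights feature maps from globally pooled descriptors, then spatial attention reweights pixels from channel-pooled maps. The reduction ratio and kernel size are common defaults, not values taken from any one of the cited papers.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze global context per channel, then reweight channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max-pooled descriptor
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Pool across channels, then learn a per-pixel attention map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-average map
        mx = x.amax(dim=1, keepdim=True)     # channel-max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

x = torch.randn(1, 32, 28, 28)
y = SpatialAttention()(ChannelAttention(32)(x))  # channel first, as in CBAM
print(y.shape)  # torch.Size([1, 32, 28, 28])
```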
Article
Full-text available
Ultra-high resolution image segmentation has raised increasing interest in recent years due to its realistic applications. In this paper, we innovate on the widely used high-resolution image segmentation pipeline, in which an ultra-high resolution image is partitioned into regular patches for local segmentation and the local results are then merged into a high-resolution semantic mask. In particular, we introduce a novel locality-aware context fusion based segmentation model to process local patches, where the relevance between a local patch and its various contexts is jointly and complementarily utilized to handle semantic regions with large variations. Additionally, we present an alternating local enhancement module that restricts the negative impact of redundant information introduced from the contexts and is thus endowed with the ability to correct the locality-aware features to produce refined results. Furthermore, in comprehensive experiments, we demonstrate that our model outperforms other state-of-the-art methods on public benchmarks and verify the effectiveness of the proposed modules. Our code will be released at: https://github.com/liqiokkk/FCtL.
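
The partition-then-merge pipeline that the paper builds on can be sketched in a few lines. This is only the plain baseline (regular tiling, no locality-aware context fusion), with `model` standing in for any local segmentation network and image sizes assumed divisible by the patch size.

```python
import torch

def segment_ultra_high_res(image: torch.Tensor, model, patch: int = 512):
    """Split an ultra-high resolution image into regular patches, segment
    each locally, and merge the local logits into one full-size mask.

    image: (C, H, W) tensor with H and W divisible by `patch`.
    model: callable mapping (1, C, patch, patch) -> (1, K, patch, patch).
    """
    _, h, w = image.shape
    # Probe once to learn the number of classes K.
    k = model(image[:, :patch, :patch].unsqueeze(0)).shape[1]
    logits = torch.zeros(k, h, w)
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = image[:, top:top + patch, left:left + patch]
            logits[:, top:top + patch, left:left + patch] = \
                model(tile.unsqueeze(0))[0]
    return logits.argmax(dim=0)  # (H, W) semantic mask

# Example with a trivial "model": 3-channel input, 5-class logits via 1x1 conv.
net = torch.nn.Conv2d(3, 5, kernel_size=1)
mask = segment_ultra_high_res(torch.randn(3, 1024, 1024), net, patch=512)
print(mask.shape)  # torch.Size([1024, 1024])
```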
... Several studies in the literature have explored integrating attention modules into deep learning models. For example, Wang et al. [35] developed a residual attention network that stacks several attention modules. Different attention modules have been proposed in the literature, such as the convolutional block attention module (CBAM). ...
Article
Full-text available
Large fingerprint databases can make the automated search process tedious and time-consuming. Fingerprint pattern classification is a significant step toward reducing the identification system's complexity in terms of time and speed. Although several fingerprint algorithms have been developed for classification tasks, further improvements in performance and efficiency are still required. Most fingerprint algorithms use deep learning techniques. However, some deep learning techniques can be resource-intensive and computationally expensive, while others disregard the spatial relationships between the features used in classifying fingerprint patterns. This study proposes using lightweight deep learning models (i.e., MobileNet and EfficientNet-B0) integrated with attention modules to classify fingerprint patterns. The two lightweight models are modified, yielding the MobileNet+ and EfficientNet-B0+ models. Lightweight deep learning models can help achieve optimal performance while reducing computational complexity, and the attention modules focus on distinctive features for classification. Our proposed approach integrates four attention modules for fingerprint pattern classification into the two lightweight deep learning models, MobileNet+ and EfficientNet-B0+. To evaluate our approach, we use two publicly available fingerprint datasets: the NIST Special Database 301 dataset and the LivDet dataset. The evaluation results show that the EfficientNet-B0+ model achieves the highest classification accuracy of 97% with only 854,086 training parameters. In conclusion, we consider the training parameters small enough for the EfficientNet-B0+ model to be deployed on low-resource devices.
... Consequently, existing solutions fail to meet the requirements of detecting and recognizing AHSI defects in real-world natural scenes. 2) All of these methods rely on CNNs [17], [18] and their variants [19], [20] for defect detection in OCS components. Unfortunately, CNNs excel in local cognitive abilities but lack global perception capabilities [21], [22], [23], [24], which can result in low recall and precision in recognizing AHSI defects. ...
... To overcome the challenges discussed above, we attempt to explore new techniques and methods to address these bottlenecks. Recently, the fusion of CNNs and transformers [25], [26], [22], [27] has made rapid progress and has been successfully applied to various downstream tasks such as image classification [17], [18], object detection [28], [29], [19], and semantic segmentation [30], [31]. ...
... Efficient CNN blocks (ECBs) and efficient transformer blocks (ETBs) work in parallel to improve efficiency during feature extraction. Inspired by ResNet's strategy of increasing the number of blocks in Stage 3 [17], we fix L_1, L_2, and L_4 (the block numbers in Stages 1, 2, and 4) and determine the optimal parameters N_x and L_3 through numerous experiments on the Image-22K dataset [45]. The experiments provide evidence that CTBM-DAHD achieves optimal performance when N_x = 4 and L_3 = 4. ...
Article
Full-text available
The sectional insulator with arcing horns is indispensable for electrified railways’ overhead contact system (OCS). When these arcing horns are damaged, broken, or detached, arcing can occur due to unstable contact between the pantograph and catenary, potentially causing burns and damage to OCS components or even complete system failure. Unfortunately, no specialized technique or method for detecting arcing horn defects exists. To address this critical issue, this paper proposes a novel CTBM-DAHD network that utilizes CNN-Transformer bridge mode to recognize arcing horn defects in realistic application scenarios, such as rainy, foggy, sunny, and night-time conditions. Notably, the network can accurately detect obscured and long-range minor arcing horn defects, significantly improving recall and precision of defect recognition. The experimental findings demonstrate that, compared to the state-of-the-art networks, the CTBM-DAHD network achieves outstanding performance while maintaining lower computational costs and a reduced number of weight parameters. It surpasses the best CNN-Transformer bridge fusion network by 3.5% and outperforms the dual-branch vision transformer by 1.5%. Moreover, the CTBM-DAHD network has been successfully deployed on over 500 high-speed trains in China, detecting more than 1,000 arcing horn defects. These results affirm its effectiveness in recognizing arcing horn defects in complex and natural environments.
... In recent years, great effort has been devoted to networks with attention mechanisms. Wang et al. proposed a residual attention network based on an attention module that utilizes an encoder-decoder structure [34]. The network is proven to perform well and to be robust to noisy data through refining the feature maps. ...
Article
Full-text available
Three-dimensional convolutional neural networks (3D-CNNs) and fully connected long short-term memory networks (FC-LSTMs) have been demonstrated to be powerful non-intrusive approaches for fall detection. However, the feature extraction of 3D-CNN-based methods requires a large-scale dataset. Meanwhile, the deployment of FC-LSTM, which expands the input into one dimension, leads to the loss of spatial information. To this end, a novel model combining a lightweight 3D-CNN and convolutional long short-term memory (ConvLSTM) networks is proposed in this paper. In this model, a lightweight 3D convolutional neural network with five layers is presented to avoid over-fitting. To further explore discriminative features, channel- and spatial-wise attention modules are adopted in each layer to improve detection performance. In addition, ConvLSTM is presented to extract the long-term spatial-temporal features of 3D tensors. Finally, we verify our model through extensive experiments with comprehensive comparisons on HMDB5, UCF11, URFD, and MCFD. Experimental results on these public benchmarks demonstrate that our method outperforms current state-of-the-art single-stream networks with 62.55 ± 7.99% on HMDB5, 97.28 ± 0.36% on UCF11, 98.06 ± 0.32% on URFD, and 94.84 ± 4.64% on MCFD.
... Thus, various attention-based approaches have been proposed to address these tasks. In classification tasks, attention-based modules enable models to capture discriminative features for target classes [36]. Meanwhile, in segmentation tasks, attention mechanism modules yield promising results by enhancing attention on challenging pixels such as boundaries, facilitating the understanding of contextual information [37,38]. ...
Article
Full-text available
Currently, deep learning-based methods have achieved success in glaucoma detection. However, most models focus on OCT images captured by a single scan pattern within a given region, running a high risk of omitting valuable features in the remaining regions or scan patterns. Therefore, we propose a multi-region and multi-scan-pattern fusion model to address this issue. Our proposed model exploits comprehensive OCT images from three fundus anatomical regions (macular, middle, and optic nerve head regions) captured by four scan patterns (radial, volume, single-line, and circular scan patterns). Moreover, to enhance the efficacy of integrating features across various scan patterns within a region and across multiple regional features, we employ an attention multi-scan fusion module and an attention multi-region fusion module that automatically assign contributions to distinct scan-pattern and region features, adapting to the characteristics of different samples. To alleviate the absence of available datasets, we have collected a specific dataset (MRMSG-OCT) comprising OCT images captured by four scan patterns from three regions. The experimental results and visualized feature maps both demonstrate that our proposed model achieves superior performance compared with single-scan-pattern models and single-region-based models. Moreover, compared with the average fusion strategy, our proposed fusion modules yield superior performance, particularly reversing the performance degradation observed in some models relying on fixed weights, validating the efficacy of the proposed dynamic region scores adapted to different samples. Furthermore, the derived region contribution scores enhance the interpretability of the model and offer an overview of its decision-making process, assisting ophthalmologists in prioritizing regions with heightened scores and increasing efficiency in clinical practice.
... The RAN (Wang et al., April 2017) is formed by arranging various attention layers one after another. These attention layers are further divided into two branches. ...
Article
Certain unwanted crime events can be prevented, or even eliminated before their execution, through automatic identification of abnormal human behavior. Automatic prediction of abnormal human behavior is, however, a difficult task. Some automated models have been implemented and have provided promising results. Manual intervention was the predominant approach in earlier times, yet it brings numerous errors, consumes more time, and is more costly. Hence, an automated model is suggested for identifying the activities. While scholars focus on machine and deep learning, such classifiers may extract hand-crafted features but fail to yield an appropriate solution for recognizing the activities. Since the data consist of video frames, object detection is highly challenging: ineffective feature vectors and inadequate scale measures of the learning model pave the way for performance degradation. This issue can be resolved by including an attention mechanism in the deep learning model for both monitoring and classification purposes. The recommended Human Abnormal Behavior Recognition and Tracking (HABRT) model performs the following operations: collection of video, categorization of the behavior in the video as normal or abnormal, monitoring, extraction of objects, and classification of the abnormality. The input video frames are initially gathered from publicly available databases. Using these frames, abnormal behavior classification is performed by a Multiscale Dilated assisted Residual Attention Network (MD-RAN). For further enhancement, the hyper-parameters of the MD-RAN are optimally selected by a novel Modified Random Parameter-based Chimp Optimization Algorithm (MRP-ChOA). Once the abnormal frames are obtained, activity tracking is achieved by an Adaptively Modified You Only Look Once V3 network (AM-YOLO V3). This model encompasses multiple layers, and the number of layers utilized is determined optimally using MRP-ChOA. Consequently, the objects are extracted from the abnormal frames with the help of AM-YOLO V3. Finally, the abnormalities are classified using the same MD-RAN. The performance is then analyzed and validated with diverse metrics and compared with other algorithms. On dataset 1, the accuracy attains its maximum, exceeding DTCN, CNN-RNN, and ResAttenConvLSTM by 3.14%, 2.308%, and 13.7%, respectively. These findings reveal that the model has the potential to deliver strong results for abnormal behavior recognition and tracking.
... • Well-suited for difficult and/or time-consuming tasks that would require humans to "hand-craft" predictive features from the data [182][183][184][185][186]; example: predicting gene expression from DNA sequence features [187]. • Training may require specialized hardware to handle parallel computations (i.e., GPUs [188]) ...
Article
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
... In [38], a CNN is utilized to construct a deep Q-network, which evaluates user QoS demand data and energy consumption data to facilitate further network resource optimization. Moreover, recent research has begun to focus on introducing the attention mechanism into CNNs to further enhance model performance when processing large amounts of data [39]; this has been explored in various scenarios including image caption generation [40], [41], machine translation [42], [43], and speech recognition [44], [45]. [47] introduces the attention mechanism to residual convolutional neural networks, yielding the residual attention network, in which attention is achieved by stacking attention modules that generate attention-aware features. [46] achieves the attention mechanism in CNNs by adding a squeeze-and-excitation (SE) block, which explicitly models inter-dependencies between different channels to adaptively learn channel-wise features and proves to deliver a great performance enhancement in learning multi-channel data features. ...
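
For reference, a minimal sketch of the SE block described above, following the squeeze-excite-rescale pattern; the reduction ratio of 16 is the commonly used default rather than a value from the citing paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: squeeze spatial dims to a channel descriptor,
    excite it through a bottleneck MLP, and rescale the input channels by
    the resulting weights."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze + excite
        return x * w                                       # channel-wise rescale

x = torch.randn(2, 64, 14, 14)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```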
Article
Full-text available
Network performance evaluation is crucial in ensuring the effective operation of 5G wireless networks, offering valuable insights into evaluating network status and user experience. However, the complexity of network conditions, characterized by high dynamics and diverse user requirements across various vertical applications, presents a significant challenge for generating accurate and detailed evaluation results using existing algorithms. To provide a feasible solution for this issue, an artificial intelligence-enabled 5G network performance evaluation scheme for private 5G networks is proposed. First, the network performance evaluation at different granularities is modeled with the deployment of network performance evaluation introduced. Furthermore, an intelligent network performance evaluation architecture based on residual networks with the attention mechanism is introduced, which can generate evaluation scores based on key performance indicators of reliability, accessibility, utilization, integrity, mobility and retainability. Additionally, the corresponding training strategies for the intelligent model, catering to different evaluation granularity, are thoroughly designed. Finally, to validate the effectiveness of the proposed scheme, comprehensive experiments are conducted using practical 5G network operation system data. The experimental results demonstrate the scheme’s ability to achieve highly accurate evaluations with fine spatial granularity. These findings establish the feasibility and efficacy of the proposed artificial intelligence-enabled scheme in enhancing 5G network performance evaluation.
... ii) Residual approximation: Deep residual networks (ResNets) are a classic structure for residual approximation [44], built from stacked residual blocks. Furthermore, the residual attention network stacks multiple attention modules whose features change adaptively as the layers become deeper; thanks to residual approximation, it can be stacked to hundreds of layers [45]. ...
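
In its basic form, the residual approximation referenced here is the ResNet identity-skip formulation, where the stacked layers learn only the residual:

```latex
% ResNet residual block: the weight layers learn the residual mapping F;
% the identity shortcut carries x through unchanged.
y = x + \mathcal{F}(x, \{W_i\})
```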
... It mainly contains two schemes: channel-wise attention and spatial-wise attention. The former is equivalent to performing convolution [178], while the latter is responsible for encoding local information in features [45]. Actually, some studies have integrated them into one framework [179], [180]. ...
Article
Full-text available
The dynamic neural network (DNN), in contrast to the static counterpart, offers numerous advantages, such as improved accuracy, efficiency, and interpretability. These benefits stem from the network’s flexible structures and parameters, making it highly attractive and applicable across various domains. As the broad learning system (BLS) continues to evolve, DNNs have expanded beyond deep learning (DL), orienting a more comprehensive range of domains. Therefore, this comprehensive review article focuses on two prominent areas where DNN structures have rapidly developed: 1) DL and 2) broad learning. This article provides an in-depth exploration of the techniques related to dynamic construction and inference. Furthermore, it discusses the applications of DNNs in diverse domains while also addressing open issues and highlighting promising research directions. By offering a comprehensive understanding of DNNs, this article serves as a valuable resource for researchers, guiding them toward future investigations.