Figure 6 - available via license: Creative Commons Attribution 3.0 Unported
Content may be subject to copyright.
The Comparison of Different Step Sizes. Therefore, the general gradient descent algorithm is as follows: initialise the parameters of the objective function, the recursion terminal distance ε, and the step size α. Then calculate the gradient, i.e. the partial derivative ∂J(θ)/∂θ of the loss function of the objective function. 

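The procedure quoted above can be sketched in a few lines of Python (a minimal illustration, assuming a simple one-dimensional quadratic loss as a stand-in for the paper's objective function):

```python
# Minimal gradient descent sketch. The loss J(theta) = (theta - 3)^2 and
# its gradient are illustrative stand-ins, not the source paper's objective.
def gradient_descent(grad, theta, alpha=0.1, eps=1e-8, max_iter=10_000):
    """Repeat theta <- theta - alpha * grad(theta) until the update
    distance falls below the recursion terminal distance eps."""
    for _ in range(max_iter):
        step = alpha * grad(theta)
        theta -= step
        if abs(step) < eps:  # descent distance below the terminal distance
            break
    return theta

# J(theta) = (theta - 3)^2 has gradient 2 * (theta - 3) and minimum at 3.
theta_star = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
```

With step size α = 0.1 the iterate contracts toward the minimiser geometrically; the terminal distance ε plays the role of the stopping criterion described in the excerpt.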
Source publication
Article
Full-text available
With the development of integrated circuits and computer science, people care more about solving practical issues via information technologies. Along with that, a new subject called Artificial Intelligence (AI) has emerged. One popular research interest within AI is recognition algorithms. In this paper, one of the most common algorithms,...

Context in source publication

Context 1
... it cannot detect an optimal point in this convergence space. Figure 6 shows the comparison of the loss function's convergence behaviour for different step sizes. If the descent distance is less than the terminal distance ε, stop the algorithm; otherwise, continue the process. ...
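The effect Figure 6 is describing can be reproduced with a toy experiment (illustrative only; the loss J(θ) = θ², the particular step sizes, and the iteration count are assumptions, not values from the paper):

```python
# Compare loss convergence for three step sizes on J(theta) = theta**2.
def run(alpha, steps=50, theta=1.0):
    losses = []
    for _ in range(steps):
        theta -= alpha * 2 * theta  # gradient of theta**2 is 2 * theta
        losses.append(theta ** 2)
    return losses

small, moderate, large = run(0.01), run(0.1), run(1.1)
# A very small step size converges slowly, a moderate one converges
# quickly, and an overly large one makes the loss diverge.
```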

Similar publications

Article
Full-text available
3D convolution neural networks (CNNs) have shown excellent predictive performance on tasks such as action recognition from videos. Since 3D CNNs have unique characteristics and extremely high compute/memory-overheads, executing them on accelerators designed for 2D CNNs provides sub-optimal performance. To overcome these challenges, researchers have...

Citations

... The output was subjected to the activation function, and the correctness of the generated value was calculated. The error (loss) values between the prediction y_p and the target value y_t were calculated by the loss function (Cui, 2018). The mean square error (MSE) loss function (Muthukumar et al., 2021) was utilized to measure the loss values in this study. ...
... The optimizer accordingly updates the weight and bias values (the learning stage) using the loss function. In this research, the gradient descent (GD) optimization algorithm was employed, as represented in Eq. (7) (Cui, 2018). Other optimization algorithms have been developed based on the GD algorithm, including Adagrad (Duchi et al., 2011), Adam (Kingma and Ba, 2014), and AdaBelief (Zhuang et al., 2020). ...
... Backpropagation adjusts network weights based on output errors by applying the chain rule to compute the gradient of the loss function with respect to each weight, effectively propagating errors backward through the network [9]. Gradient descent, an iterative optimization algorithm, then minimizes the loss function by adjusting these weights based on the computed gradients [10]. Together, these mechanisms enable continuous improvement of the model's predictions by minimizing the error between the predicted and actual outcomes. ...
... Backpropagation calculates the gradients, providing the direction in which the weights need to be modified to minimize the loss. Subsequently, gradient descent utilizes these gradients to update the weights over iterations or epochs until the desired level of accuracy is reached [10]. ...
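The interplay these citations describe, backpropagation computing gradients via the chain rule and gradient descent applying them, can be sketched for a single sigmoid neuron with MSE loss (the toy data, learning rate, and epoch count are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, alpha = 0.0, 0.0, 0.5
data = [(0.0, 0.0), (1.0, 1.0)]  # pairs (input x, target y_t)

for _ in range(2000):  # epochs
    for x, y_t in data:
        y_p = sigmoid(w * x + b)  # forward pass (prediction)
        # backpropagation via the chain rule for L = (y_p - y_t)^2 / 2:
        # dL/dw = (y_p - y_t) * y_p * (1 - y_p) * x, and dL/db drops the x.
        delta = (y_p - y_t) * y_p * (1 - y_p)
        w -= alpha * delta * x  # gradient descent update
        b -= alpha * delta
```

Over the epochs the squared error shrinks as the weight and bias move against their gradients, which is exactly the continuous improvement loop described above.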
Article
Full-text available
Background: This research delves into deep learning technologies within the realm of medical imaging, with a specific focus on the detection of anomalies in medical pathology, emphasizing breast cancer. It underscores the critical importance of segmentation techniques in identifying diseases and addresses the challenge of scarce labelled data in Whole Slide Images. Additionally, the paper provides a review cataloguing 61 deep learning architectures identified during the study. Objectives: The aim of this study is to present and assess a novel quantitative approach utilizing specific deep learning architectures, namely the Feature Pyramid Network and the Linknet model, both of which integrate a ResNet34 encoder to enhance performance. The paper also examines the efficiency of a semi-supervised training regimen using a dual-model architecture, consisting of ‘Teacher’ and ‘Student’ models, in addressing the issue of limited labelled datasets. Methods: Employing a semi-supervised training methodology, this research enables the ‘Student’ model to learn from the ‘Teacher’ model’s outputs. The study methodically evaluates the models’ stability, accuracy, and segmentation capabilities, employing metrics such as the Dice Coefficient and the Jaccard Index for comprehensive assessment. Results: The investigation reveals that the Linknet model performs well, achieving an accuracy of 94% in the detection of breast cancer tissues using a 21-seed parameter for the initialization of model weights. It further excels in generating annotations for the ‘Student’ model, which then achieves 91% accuracy with minimal computational demands. Conversely, the Feature Pyramid Network model demonstrates a slightly lower accuracy of 93% in the ‘Teacher’ model but exhibits improved and more consistent results in the ‘Student’ model, reaching 95% accuracy with a 42-seed parameter.
Conclusions: This study underscores the efficacy and potential of the Feature Pyramid Network and Linknet models in the domain of medical image analysis, particularly in the detection of breast cancer, and suggests their broader applicability to various medical segmentation tasks related to other pathological disorders. Furthermore, the research enhances the understanding of the pivotal role that deep learning technologies play in advancing diagnostic methods within the field of medical imaging.
... 3. Weight sharing: The convolutional layer in CNNs uses a weight-sharing mechanism [36], which means that the same set of weights is used as the convolutional kernel slides across the entire image [37]. This weight sharing can greatly reduce the number of model parameters and improve the training efficiency of the model. ...
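The weight-sharing idea is easy to see in one dimension: a single small kernel (one shared set of weights) is applied at every position of the input, so the parameter count is the kernel size rather than the input size (a minimal sketch; the signal and kernel values are arbitrary):

```python
# 1-D convolution (valid mode): the same kernel weights are reused at
# every input position, which is exactly the weight-sharing mechanism.
def conv1d(signal, kernel):
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

out = conv1d([1, 2, 3, 4, 5], [1, 0, -1])  # 3 shared weights, 3 outputs
```

Here three shared weights cover a length-5 input and would cover an input of any length, producing [-2, -2, -2] for this constant-slope signal.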
Article
Full-text available
Gesture recognition is an important and inevitable technology in modern times; its emergence and improvement greatly improve the convenience of people's lives and also enrich them. It has a wide range of applications in various fields. In daily life, it enables human-computer interaction and the use of smart homes. In medicine, it can help patients recover and assist doctors in carrying out experiments. In entertainment, it allows users to interact with games in an immersive manner. This paper chooses three technologies in which deep learning plays a prominent role in gesture recognition, namely CNNs, LSTM, and transfer learning based on deep learning. Each has its own advantages and disadvantages. Because their underlying principles differ, different techniques play different roles: CNNs can carry out feature extraction, LSTM can handle long time series, and transfer learning can transfer what is learned from one task to another. Different practical technologies are selected according to different application scenarios, with improvements made in real time in practical applications. Gesture recognition based on deep learning has the advantages of good accuracy, robustness, and real-time implementation, but it also bears the disadvantages of large economic and time costs and high hardware requirements. Despite some challenges, researchers continue to optimize and improve the technology, and it is believed that in the future gesture recognition technology will become more mature and valuable.
... The training processes for all receivers show convergence, with the final loss being less than 0.1. Stochastic gradient descent with momentum (SGDM) was used to search for the optimal learnable parameters and biases that minimize the loss function [47]. The initial learning rate of SGDM was set to 1 × 10⁻⁵, and after every 400 samples the learning rate dropped by a factor of 0.5, as summarized in Table 3. ...
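A step-decay schedule of the kind quoted above (initial rate 1 × 10⁻⁵, halved every 400 samples) can be written as a small helper; the function below is an illustrative sketch, not the cited authors' code:

```python
def step_decay_lr(sample_idx, initial_lr=1e-5, drop=0.5, every=400):
    """Return the learning rate after `sample_idx` samples, multiplying
    it by `drop` once per completed block of `every` samples."""
    return initial_lr * (drop ** (sample_idx // every))
```

So samples 0-399 train at 1e-5, samples 400-799 at 5e-6, and so on.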
Article
Full-text available
Global navigation satellite systems (GNSSs) applied to intelligent transport systems in urban areas suffer from multipath and non-line-of-sight (NLOS) effects due to the signal reflections from high-rise buildings, which seriously degrade the accuracy and reliability of vehicles in real-time applications. Accordingly, the integration between GNSS and inertial navigation systems (INSs) could be utilized to improve positioning performance. However, the fixed GNSS solution uncertainty of the conventional integration method cannot determine the fluctuating GNSS reliability in fast-changing urban environments. This weakness becomes solvable using a deep learning model for sensing the ambient environment intelligently, and it can be further mitigated using factor graph optimization (FGO), which is capable of generating robust solutions based on historical data. This paper mainly develops the adaptive GNSS/INS loosely coupled system on FGO, along with the fixed-gain Kalman filter (KF) and adaptive KF (AKF) being taken as comparisons. The adaptation is aided by a convolutional neural network (CNN), and the feasibility is verified using data from different grades of receivers. Compared with the integration using fixed-gain KF, the proposed adaptive FGO (AFGO) maintains the 100% positioning availability and reduces the overall 2D positioning error by up to 70% in the aspects of both root mean square error (RMSE) and standard deviation (STD).
... However, training with deep networks is complex since it can lead to the vanishing gradient issue [Alzubaidi, 2021]. Convolutional neural networks (CNNs) have been applied successfully in image classification and pattern recognition [Cui, 2018, Zhou et al., 2015, Hien et al., 2021, Hieu et al., 2020a]. Deep neural networks, on the other hand, require a large amount of data to avoid overfitting. ...
Article
Full-text available
The natural ecosystem incorporates thousands of plant species, and distinguishing them is normally manual, complicated, and time-consuming. Since the task requires a large amount of expertise, identifying forest plant species relies on the work of a team of botanical experts. The emergence of Machine Learning, especially Deep Learning, has opened up a new approach to plant classification. However, the application of plant classification based on deep learning models remains limited. This paper proposed a model, named PlantKViT, combining the Vision Transformer architecture and the KNN algorithm to identify forest plants. The proposed model provides high efficiency and convenience for adding new plant species. The study experimented with Resnet-152, ConvNeXt networks, and the PlantKViT model to classify forest plants. Training and evaluation were implemented on the DanangForestPlant dataset, containing 10,527 images and 489 species of forest plants. The accuracy of the proposed PlantKViT model reached 93%, significantly improved compared to the ConvNeXt model at 89% and the Resnet-152 model at only 76%. The authors also successfully developed a website and two applications, called 'plant id' and 'Danangplant', on the iOS and Android platforms respectively. The PlantKViT model shows potential in forest plant identification not only on the conducted dataset but also worldwide. Future work should gear toward extending the dataset and enhancing the accuracy and performance of forest plant identification.
... This is because the training of DNNs becomes a nonsmooth convex optimization problem; although any local minimum of such a problem is also a global minimum, the nonsmoothness means traditional optimization algorithms such as stochastic gradient descent (SGD) and its variants, including Nesterov accelerated gradient (NAG) and Adam [2,3], may struggle to find the optimal parameters in such cases. ...
... Choosing an appropriate learning rate is a crucial aspect of any optimization algorithm, as it determines the step size at which the model parameters are updated during training. Fine-tuning the learning rate for deep CNNs can be tedious and may not always result in optimal convergence [3,16,18]. To address this issue, we propose a novel approach for controlling the learning rate during training. ...
Article
Full-text available
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that affects social interaction and communication. Early detection of ASD can significantly improve outcomes for individuals with the disorder, and there has been increasing interest in using machine learning techniques to aid in the diagnosis of ASD. One promising approach is the use of deep learning techniques, particularly convolutional neural networks (CNNs), to classify facial images as indicative of ASD or not. However, choosing a learning rate for optimizing the performance of these deep CNNs can be tedious and may not always result in optimal convergence. In this paper, we propose a novel approach called the control subgradient algorithm (CSA) for tackling ASD diagnosis based on facial images using deep CNNs. CSA is a variation of the subgradient method in which the learning rate is updated by a control step in each iteration of each epoch. We apply CSA to the popular DensNet-121 CNN model and evaluate its performance on a publicly available facial ASD dataset. Our results show that CSA is faster than the baseline method and improves the classification accuracy and loss compared to the baseline. We also demonstrate the effectiveness of using CSA with L1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_1$$\end{document}-regularization to further improve the performance of our deep CNN model.
... Deep neural networks, as an advanced machine learning method, have gradually been applied in object detection, text detection, and image classification (Banan et al., 2020; Est et al., 2015; Jafari et al., 2017; Zhu et al., 2016). Many researchers have begun to use deep neural networks for LCZ classification and have greatly improved the classification accuracy (Liu and Shi, 2020; Qiu et al., 2020; Rosentreter et al., 2020). As a representative algorithm of deep neural networks, CNN combines the back-propagation mechanism and gradient descent optimization (Cui, 2018; Mizutani, 1994). Back-propagation provides opportunities for feedback to enhance reliability, and gradient descent is used in the self-training process. ...
Article
Local Climate Zone (LCZ) is a significant classification system of urban form and function, which can specifically reflect 3-dimensional urban information. However, previous studies of LCZ lack class expansion, comparison of the classification accuracy of different CNNs, and post-processing methods for the results. Therefore, using very-high-resolution (VHR) images (2.2 m resolution) to expand the LCZ classes, we compare three different biclassified convolutional neural networks (CNNs), namely MobileNet-Segnet (MS), MobileNet-Unet (MU) and MobileNet-Pspnet (MP), and select the optimal CNN to classify Fujian Delta images. Then, we combine “Vote-Filter-Overlay” methods to remove misidentified patches and smooth boundaries for the biclassified LCZ maps. The study results show that: (1) The 2.2 m resolution VHR images can expand the LCZ classes from 17 to 20. (2) Different CNNs have diverse sensitivity to each LCZ; the more distinctive the texture characteristics of an LCZ, the higher its identification rate. Among the three CNNs, MP is the best model for LCZ (2,4,8,9,10,B,C,D,G) and MU is the optimal model for LCZ (1,3,5,6,11,A,F,H,I). (3) The “Vote-Filter-Overlay” method can remove misidentified patches and noise and make the LCZ map more in line with actual urban forms and functions. (4) The Fujian Delta urban areas form a continuous urban belt along the southeastern coast, while the villages and forests are distributed in the northwest. Many small patches of vegetation and water, which can serve as potential urban ecological corridors, were found in the urban core area. The Fujian Delta urban areas are dominated by compact LCZ (1–3), and Xiamen has the highest proportion of LCZ (1), especially on Xiamen island. The results of this study will provide a reference for LCZ classification and basic data for urban morphology.
... The most commonly used algorithm is gradient descent or one of its variants, such as stochastic gradient descent or mini-batch gradient descent [CJRR22, Cui18, LL18, SHL21]. On each iteration, the gradient descent method updates the parameters in the negative direction of the gradient of the objective function with respect to the parameters, where the gradient gives the direction of steepest ascent. ...
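The mini-batch variant mentioned in this excerpt averages the gradient over a small batch before each update; here is a minimal sketch for one-parameter linear regression (the data, batch size, and learning rate are illustrative assumptions, not from the cited works):

```python
import random

random.seed(0)
data = [(x, 2.0 * x) for x in range(1, 11)]  # y = 2x, so the true weight is 2

w, alpha, batch_size = 0.0, 0.001, 4
for _ in range(200):  # epochs
    random.shuffle(data)  # stochastic: visit mini-batches in random order
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # mean gradient of the squared error (w*x - y)^2 over the batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= alpha * grad  # step in the negative gradient direction
```

Batch size 1 recovers plain stochastic gradient descent, and a batch spanning the whole dataset recovers full-batch gradient descent.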
Thesis
Full-text available
Currently, artificial neural networks are the most popular approach to machine learning problems such as high-dimensional multivariate regression. Methods using sums of separable functions, which grew out of tensor decompositions, are designed to represent functions in high dimensions and can be applied to high-dimensional multivariate regression. Here we compare the ability of these two methods to approximate function spaces in order to assess their relative expressive power. We show that a general neural network result can be translated into sums of separable functions if the activation function satisfies certain smoothness conditions. Conversely, we show that it is possible to approximate any sums-of-separable-functions result with neural networks, using the approximation of products of functions by deep neural networks. We identify general approximation schemes in both the single-layer and deep-layer settings that apply to both methods for approximating certain function classes. In particular, we show that sums of separable functions give the same error rates as neural networks for function classes such as Barron’s functions and band-limited functions. Inspired by deep neural networks, we also introduce deep-layer sums of separable functions, which show results similar to those of deep neural networks for functions with compositional structure.
... All of the aforementioned techniques have limitations, ranging from poor interpretation and recognition of the object of interest to structural design issues. To solve these issues and to enhance convergence, gradient descent training algorithms were proposed [18,19]. Gradient descent algorithms tend to overcome most of the shortcomings of their predecessors by converging quickly and efficiently to local minima [20]. ...
Article
Full-text available
Tracking moving objects is one of the most promising yet the most challenging research areas pertaining to computer vision, pattern recognition and image processing. The challenges associated with object tracking range from problems pertaining to camera axis orientations to object occlusion. In addition, variations in remote scene environments add to the difficulties related to object tracking. All the mentioned challenges and problems pertaining to object tracking make the procedure computationally complex and time-consuming. In this paper, a stochastic gradient based optimization technique has been used in conjunction with particle filters for object tracking. First, the object that needs to be tracked is detected using the Maximum Average Correlation Height (MACH) filter. The object of interest is detected based on the presence of a correlation peak and average similarity measure. The results of object detection are fed to the tracking routine. The gradient descent technique is employed for object tracking and is used to optimize the particle filters. The gradient descent technique allows particles to converge quickly, allowing less time for the object to be tracked. The results of the proposed algorithm are compared with similar state-of-the-art tracking algorithms on five datasets that include both artificial moving objects and humans to show that the gradient-based tracking algorithm provides better results, both in terms of accuracy and speed.
... The testing data sets are represented as “Set 5” (i.e., 5 images) [19] and “Set 14” (14 images) [21]. This choice of training and testing data was made to allow a fair comparison with the works where they have been utilized [14]. To assess the efficiency of the implemented model, we use different metrics to measure the degree of similarity between the original images and the noisy/blurred input images. ...
... Overall framework of creating low resolution. 3) Training a network: network training can be defined as a procedure that finds the kernels in the convolution layers and the weights that minimize the differences between the output predictions and the ground-truth labels on the training data set. Backpropagation is the approach typically utilized for training NNs, in which the gradient descent optimization algorithm and the loss function play essential roles [14]. Fig. 4 shows the main steps of the training process in the CNN model. For the training strategy, the original colour image is first converted into a grey-scale image by extracting the luminance component in the YCbCr colour space. ...
Article
Image restoration is a branch of image processing that involves a mathematical deterioration and restoration model to restore an original image from a degraded one. This research aims to restore blurred images that have been corrupted by a known or unknown degradation function. Image restoration approaches can be classified into two groups based on knowledge of the degradation features: blind and non-blind techniques. In our research, we adopt a blind algorithm. A deep learning method (SR) has been proposed for single-image super-resolution. This approach can directly learn an end-to-end mapping between low-resolution and high-resolution images. The mapping is expressed by a deep convolutional neural network (CNN). The proposed restoration system must deal with the challenge that the degraded images have an unknown blur kernel, in order to deblur them as an estimation of the original images with a minimum rate of error.