Fig. 1
Ground truth images of the (a) Pavia University, (b) Pavia Center, (c) Botswana, (d) Salinas, and (e) Indian Pines datasets.


Source publication
Conference Paper
Full-text available
Classification of hyperspectral images is one of the main problems in the research field of Remote Sensing. By taking advantage of both spectral and spatial information, it is possible to effectively distinguish different materials and terrains. In the last decade, the intensive use of Convolutional Neural Networks (CNN) for classification and segmentation...

Contexts in source publication

Context 1
... has been acquired using the Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, in northern Italy. This dataset includes a 610×610-pixel image with 103 bands. The spatial resolution is 1.3 meters per pixel. The area presents 9 different kinds of terrain; hence, each pixel is annotated across 9 classes. The ground truth image is shown in Fig. ...
Context 2
... the Pavia Center dataset has been acquired by ROSIS and presents the same number of classes. The hyperspectral image consists of 102 bands; the image size is 1096 × 1096 and the spatial resolution is 1.3 meters. Fig. 1(b) depicts the ground ...
Context 3
... Botswana has been collected by the Hyperion sensor aboard the NASA EO-1 satellite over the Okavango Delta in Botswana. This image presents 145 bands of size 1476 × 256 pixels; the spatial resolution is 30 meters per pixel and the wavelengths cover 400 nm to 2500 nm. It includes 14 distinct land cover types. The ground truth data are shown in Fig. ...
Context 4
... Salinas data have been collected by the AVIRIS sensor with a spatial resolution of 3.7 meters. The image size is 512 × 217 and each pixel is labeled across 16 classes. The ground truth is reported in Fig. 1(d). The original data consisted of 224 bands, but the 20 bands related to water absorption have been discarded. Hence, it includes the 204 remaining ...
Context 5
... AVIRIS sensor over the Indian Pines site in north-western Indiana, and consists of 145 × 145 pixels and 224 spectral reflectance bands. This scene is a subset of a larger one. It consists of 16 classes. However, the number of channels has been reduced to 200 by removing bands covering the water absorption region. We report its ground truth in Fig. ...
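The band removal these contexts describe is a standard preprocessing step. Below is a minimal Python sketch of it for the Indian Pines scene; the file names, the .mat variable keys, and the exact water-absorption band indices are assumptions for illustration, since they vary across distributions of the data.

```python
# Minimal sketch: load an HSI cube and discard water-absorption bands, as the
# contexts above describe for Indian Pines (224 -> 200 bands). File names,
# .mat keys, and band indices are assumptions; they vary across data sources.
import numpy as np
from scipy.io import loadmat

cube = loadmat("Indian_pines.mat")["indian_pines"]        # (145, 145, 224), assumed key
gt = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]    # (145, 145), 0 = unlabeled

# Commonly cited water-absorption ranges, here 0-indexed: 103-107, 149-162, 219-223.
water = set(range(103, 108)) | set(range(149, 163)) | set(range(219, 224))
keep = [b for b in range(cube.shape[2]) if b not in water]
cube = np.asarray(cube[:, :, keep])                       # -> (145, 145, 200)

print(cube.shape, gt.shape)
```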

Similar publications

Article
Full-text available
Convolutional neural networks and graph convolutional neural networks are two classical deep learning models that have been widely used in hyperspectral image classification tasks with remarkable achievements. However, hyperspectral image classification models based on graph convolutional neural networks using only shallow spectral or spatial featu...
Article
Full-text available
Convolutional neural networks (CNNs) have demonstrated impressive performance and have been broadly applied in hyperspectral image (HSI) classification. However, two challenging problems still exist: the first challenge is that redundant information hampers feature learning, which damages the classification performance; the second challenge is...
Article
Full-text available
In recent years, Convolutional Neural Networks (CNNs) have succeeded in Hyperspectral Image Classification and shown excellent performance. However, the implicit spatial information between features, which significantly affects the classification performance of CNNs, is neglected in most existing CNN models. To address this issue, we propose a para...

Citations

... The DS is a quantity ranging between 0 and 1 that measures the overlap between the models' predictions and the reference annotations. The Acc metric is the ratio between the number of correctly classified pixels and the total number of classified pixels (Devaram et al., 2019). The kappa metric quantifies the level of agreement between two sets of categorical data while accounting for the agreement that could arise by chance (Congalton, 1991; Warrens, 2015). ...
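For reference, the three metrics quoted above reduce to simple expressions over the predicted and reference label maps. The following Python sketch implements them; the function and argument names are illustrative, and DS is computed here as the standard per-class Dice overlap, which is an assumption about the exact definition used.

```python
# Sketch of the three metrics quoted above, computed from flat integer label
# arrays. y_true is the reference, y_pred the model output.
import numpy as np

def dice_score(y_true, y_pred, cls):
    """DS in [0, 1]: overlap between prediction and reference for class `cls`."""
    p, t = (y_pred == cls), (y_true == cls)
    return 2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum())

def overall_accuracy(y_true, y_pred):
    """Acc: correctly classified pixels over all classified pixels."""
    return np.mean(y_true == y_pred)

def cohen_kappa(y_true, y_pred):
    """Kappa: observed agreement p_o corrected for chance agreement p_e."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_o = overall_accuracy(y_true, y_pred)
    p_e = sum((y_true == c).mean() * (y_pred == c).mean() for c in classes)
    return (p_o - p_e) / (1.0 - p_e)
```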
Article
Full-text available
The accurate mapping of seafloor substrate types plays a major role in understanding the distribution of benthic marine communities and planning a sustainable exploitation of marine resources. Traditionally, this activity has relied on the efforts of marine geology experts, who accomplish it manually by examining information from acoustic data along with the available ground-truth samples. However, this approach is challenging and time-consuming. Hence, it is important to explore automatic methods to replace this manual process. In this study, we investigated the potential of deep learning (U-Net) for classifying the seabed as either “bedrock” or “non-bedrock” using bathymetry and/or backscatter data, acquired with multibeam echosounders (MBES). Slope and hillshade data, derived from the bathymetry, were also included in the experiment. Several U-Net models, taking as input either one of these datasets or a combination of them, were trained using an expert-delineated map as reference. The analysis revealed that U-Net has the ability to map bedrock and non-bedrock areas reliably. On our test set, the models using either bathymetry or slope data showed the highest performance metrics and the best visual match with the reference map. We also observed that they often identified topographically rough features as bedrock, which were not interpreted as such by the human expert. While such discrepancy would typically be considered an error of the model, the scale of the expert annotations as well as the different methods used by the experts to manually generate maps must be considered when evaluating the quality of the predictions. While encouraging results were obtained here, further research is necessary to explore the potential of deep learning in mapping other seabed types and to evaluate the models’ generalization capabilities on similar datasets from different geographical locations.
... For this reason, Zhao et al. [26] developed a facial age estimation system based on tiny models that consume little memory in real time. Deep neural networks show great performance improvements by using residual connections instead of plain feed-forward networks [27], and the work of Devaram et al. [28] combines dilated and standard convolutional layers to classify hyperspectral images, showing that such a combination stabilizes the network across different datasets. ...
... Previous research, indeed, has proved that ELU enhances performance on unseen data. Devaram et al. [28] employed ELU to stabilize a network classifying various hyperspectral images with different spectral and spatial information for remote sensing, and Devi et al. [33] conducted Natural Language Processing experiments on tasks such as sentiment analysis with ELU and ReLU, showing that ELU produced better performance than ReLU alone across different input types. ...
Article
Full-text available
The development of a Social Intelligence System based on artificial intelligence is one of the cutting-edge technologies in Assistive Robotics. Such systems need to create an empathic interaction with the users; therefore, they are required to include an Emotion Recognition (ER) framework which has to run, in near real-time, together with several other intelligent services. Most low-cost commercial robots, however, although more accessible to users and healthcare facilities, have to balance costs and effectiveness, resulting in under-performing hardware in terms of memory and processing unit. This aspect makes the design of the systems challenging, requiring a trade-off between the accuracy and the complexity of the adopted models. This paper proposes a compact and robust service for Assistive Robotics, called Lightweight EMotion recognitiON (LEMON), which uses image processing, Computer Vision and Deep Learning (DL) algorithms to recognize facial expressions. Specifically, the proposed DL model is based on Residual Convolutional Neural Networks with a combination of Dilated and Standard Convolution Layers. The first remarkable result is the small number of parameters (i.e., 1.6 million) characterizing our model. In addition, Dilated Convolutions expand receptive fields exponentially while preserving resolution, with less computation and memory cost, and capture the displacement of pixels that distinguishes one facial expression from another. Finally, to reduce the dying ReLU problem and improve the stability of the model, we apply an Exponential Linear Unit (ELU) activation function in the initial layers of the model. We have performed training and evaluation (via one- and five-fold cross-validation) of the model with five datasets available in the community and one mixed dataset created by taking samples from all of them. With respect to the other approaches, our model achieves comparable results with a significant reduction in the number of parameters.
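The building blocks this abstract names (residual shortcuts, mixed standard and dilated convolutions, ELU activations) fit together as in the PyTorch sketch below. This is a minimal illustration under assumed channel widths and dilation rate, not the authors' LEMON architecture.

```python
# Minimal PyTorch sketch of the ingredients named above: a residual block that
# mixes a standard and a dilated convolution and uses ELU activations.
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels=64, dilation=2):   # widths/rate are assumptions
        super().__init__()
        self.standard = nn.Conv2d(channels, channels, 3, padding=1)
        # padding=dilation keeps the spatial size while enlarging the
        # receptive field, so no resolution is lost to pooling.
        self.dilated = nn.Conv2d(channels, channels, 3,
                                 padding=dilation, dilation=dilation)
        self.act = nn.ELU()  # mitigates dying-ReLU units in early layers

    def forward(self, x):
        out = self.act(self.standard(x))
        out = self.dilated(out)
        return self.act(out + x)  # residual shortcut stabilizes training

x = torch.randn(1, 64, 48, 48)
print(DilatedResidualBlock()(x).shape)  # torch.Size([1, 64, 48, 48])
```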
... As the layers go deeper, the model features become more precise and reliable [36], [37]. Considering that the pooling operation of CNNs may lose spatial information of hyperspectral images (HSIs), dilated neural networks were introduced for HSIC [38]-[40]; their core idea is to avoid resolution reduction in the pooling layer while enlarging the receptive field through a dilated convolution strategy. Moreover, a multi-scale dilated residual CNN has been proposed to further improve classification performance [41]. ...
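A quick back-of-envelope computation makes the core idea above concrete: at stride 1 with appropriate padding, each dilated 3×3 layer adds (k−1)·d pixels to the receptive field while the feature map keeps its full resolution. The dilation schedule below is an assumption for illustration.

```python
# Back-of-envelope check: stacked dilated 3x3 convs (stride 1) grow the
# receptive field without any pooling or downsampling.
def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):  # assumed schedule
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d  # each layer adds (k-1)*d at stride 1
    return rf

print(receptive_field())  # 31 pixels, with the feature map never downsampled
```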
Preprint
Full-text available
Previous studies have shown the great potential of capsule networks for spatial contextual feature extraction from hyperspectral images (HSIs). However, the sampling locations of the convolutional kernels of capsules are fixed and cannot be adaptively changed according to the inconsistent semantic information of HSIs. Based on this observation, this paper proposes an adaptive spatial pattern capsule network (ASPCNet) architecture by developing an adaptive spatial pattern (ASP) unit that can rotate the sampling locations of convolutional kernels on the basis of an enlarged receptive field. Note that this unit can learn more discriminative representations of HSIs with fewer parameters. Specifically, two cascaded ASP-based convolution operations (ASPConvs) are applied to input images to learn relatively high-level semantic features, transmitting hierarchical structures among capsules more accurately than the use of the most fundamental features. Furthermore, the semantic features are fed into ASP-based conv-capsule operations (ASPCaps) to explore the shapes of objects among the capsules in an adaptive manner, further exploring the potential of capsule networks. Finally, the class labels of image patches centered on test samples can be determined according to the fully connected capsule layer. Experiments on three public datasets demonstrate that ASPCNet can yield competitive performance with higher accuracies than state-of-the-art methods.
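The idea of adaptively shifting kernel sampling locations is closely related to deformable convolution, which is available off the shelf. The sketch below uses torchvision's DeformConv2d as a stand-in to illustrate the concept; it is an assumed analogue, not the paper's ASPConv operation, and the layer sizes are arbitrary.

```python
# Assumed analogue of adaptive kernel sampling: a deformable convolution whose
# per-position offsets are predicted by a small companion conv layer.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveSamplingConv(nn.Module):
    def __init__(self, in_ch=32, out_ch=32, k=3):
        super().__init__()
        # 2 offsets (dy, dx) per kernel tap, predicted at each spatial position
        self.offsets = nn.Conv2d(in_ch, 2 * k * k, k, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=1)

    def forward(self, x):
        return self.deform(x, self.offsets(x))

x = torch.randn(1, 32, 24, 24)
print(AdaptiveSamplingConv()(x).shape)  # torch.Size([1, 32, 24, 24])
```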
... In [27], Zhao et al. employed a CNN and the balanced local discriminative embedding algorithm to extract spatial and spectral features from HSIs separately. In [28], Devaram et al. proposed a dilated convolution based CNN model for HSI classification and applied an oversampling strategy to deal with the class imbalance problem. In [29], a 2D spectrum based CNN framework was introduced for pixel-wise HSI classification, which converts the spectral vector into a 2D spectrum image to exploit the spectral and spatial information. ...
... In this section, we further compared the proposed FDMFN method with three other state-of-the-art deep learning based HSI classification approaches: the dilated convolution based CNN model (Dilated-CNN) [28], the 2D spectrum based CNN model [29], and the artificial neural network with center-loss and adaptive spatial-spectral center classifier (ANNC-ASSCC) [30]. ...
... Detailed information on the Salinas dataset can also be found in [53]. Following Dilated-CNN [28], for each dataset, 60% of the labeled samples per class were randomly selected for training. Next, the proposed FDMFN was compared with the 2D spectrum based CNN model [29] on the IP, KSC, and Salinas datasets. ...
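The per-class sampling protocol mentioned above (60% of the labeled pixels of each class drawn at random for training) can be sketched in a few lines of Python; the function name and the convention that label 0 marks unlabeled pixels are assumptions.

```python
# Sketch of the sampling protocol above: per class, draw 60% of the labeled
# pixels for training and keep the rest for testing.
import numpy as np

def per_class_split(labels, train_frac=0.6, seed=0):
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels[labels > 0]):     # 0 = unlabeled (assumed)
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_frac * idx.size))
        train_idx.append(idx[:cut])
        test_idx.append(idx[cut:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

labels = np.random.randint(0, 17, size=145 * 145)  # stand-in flattened ground truth
train, test = per_class_split(labels)
print(train.size, test.size)
```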
Article
Full-text available
The convolutional neural network (CNN) can automatically extract hierarchical feature representations from raw data and has recently achieved great success in the classification of hyperspectral images (HSIs). However, most CNN based methods used in HSI classification neglect to adequately utilize the strong complementary yet correlated information from each convolutional layer and employ only the features of the last convolutional layer for classification. In this paper, we propose a novel fully dense multiscale fusion network (FDMFN) that takes full advantage of the hierarchical features from all convolutional layers for HSI classification. In the proposed network, shortcut connections are introduced between any two layers in a feed-forward manner, enabling the features learned by each layer to be accessed by all subsequent layers. This fully dense connectivity pattern achieves comprehensive feature reuse and enforces discriminative feature learning. In addition, various spectral-spatial features with multiple scales from all convolutional layers are fused to extract more discriminative features for HSI classification. Experimental results on three widely used hyperspectral scenes demonstrate that the proposed FDMFN can achieve better classification performance in comparison with several state-of-the-art approaches.
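The fully dense connectivity the abstract describes means each layer consumes the concatenation of all earlier feature maps, as in the minimal PyTorch sketch below; the widths and depth are assumptions, not the exact FDMFN configuration.

```python
# Minimal sketch of fully dense connectivity: every layer receives the
# concatenated outputs of all preceding layers, and all features are fused.
import torch
import torch.nn as nn

class FullyDenseBlock(nn.Module):
    def __init__(self, in_ch=16, growth=16, layers=3):  # assumed sizes
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )

    def forward(self, x):
        feats = [x]
        for conv in self.layers:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # all layers' features, fused downstream

x = torch.randn(1, 16, 32, 32)
print(FullyDenseBlock()(x).shape)  # torch.Size([1, 64, 32, 32])
```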
Article
In the past few years, hyperspectral image classification (HSIC) has been one of the most active fields of research in the area of remote sensing. The presence of very complex characteristics and nonlinearity in hyperspectral images (HSIs) makes the classification task very challenging. Recently, capsule networks (CapsNets) have drawn huge attention in HSIC and demonstrated remarkable performance with excellent classification accuracy. However, the availability of very limited training samples makes the HSIC task more challenging for existing CapsNet-based models. Also, the efficient utilization of spectral-spatial features is considered to be very important in improving the classification performance. To address these issues, the authors have proposed a spectral-spatial three-dimensional convolutional capsule (SS-3D-ConvCapsule) network model in this article. In the proposed work, a three-dimensional convolutional capsule layer based upon a three-dimensional dynamic routing algorithm is utilized to exploit spectral-spatial features for the classification task. Furthermore, principal component analysis (PCA) is utilized as a preprocessing technique for dimensionality reduction. Moreover, a very limited number of trainable parameters is used to train the SS-3D-ConvCapsule network model in order to avoid network design complexity and overfitting. Furthermore, experiments are conducted on three well-known HSI datasets, viz. Pavia University, Salinas, and Indian Pines, to investigate the performance of the proposed network along with eight state-of-the-art deep learning models. The experimental results are compared in terms of kappa coefficient, overall accuracy and average accuracy. The comparison reveals that the proposed model clearly outperforms all of the state-of-the-art models in terms of classification accuracy.
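PCA preprocessing of an HSI cube, as mentioned above, amounts to flattening the cube to a (pixels × bands) matrix, reducing the spectral dimension, and reshaping back. A minimal sketch with scikit-learn follows; the number of retained components is an assumption.

```python
# Sketch of PCA-based spectral dimensionality reduction for an HSI cube.
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube, n_components=30):   # component count is an assumption
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float64)  # (pixels, bands)
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

cube = np.random.rand(145, 145, 200)    # stand-in for a real HSI cube
print(pca_reduce(cube).shape)           # (145, 145, 30)
```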
Chapter
Mobile Telepresence Robots represent a class of robotic platforms, characterized by a video conferencing system mounted on a mobile robotic base, which allows a pilot user to move around in the robot’s environment. These commercially available platforms are relatively cheap and straightforward, yet robust enough to operate continuously in a dynamic environment. Their simplicity and robustness make them particularly suitable for application in an elderly care context. Although the technology used on these robotic platforms has evolved considerably in recent years, these tools are meant to have no or minimal autonomy and are, hence, mostly relegated to providing pure telepresence services for video calls between the older users and their carers. This work aims to lay the foundations for increasing the autonomy of mobile telepresence robots, both by supporting teleoperation through shared approaches and by offering services to users in total autonomy. To this purpose, different artificial intelligence technologies such as Reasoning, Knowledge Representation, Automated Planning, Machine Learning, Natural Language Processing, Advanced Perception and Navigation must coexist on limited hardware. An architecture aiming to integrate these technologies is proposed together with backbone services that integrate classical and innovative AI with robotics. Additionally, the problems that arise from the integration of heterogeneous technologies, such as plan adaptation needs, shared navigation challenges and the generation of data-driven models able to run on low-performance hardware, are presented along with possible solutions exemplified in the older users assistance domain. Keywords: Enhanced telepresence; Robotics and perception; Planning and execution; Active ageing
Article
Previous studies have shown the great potential of capsule networks for spatial contextual feature extraction from hyperspectral images (HSIs). However, the sampling locations of the convolutional kernels of capsules are fixed and cannot be adaptively changed according to the inconsistent semantic information of HSIs. Based on this observation, this paper proposes an adaptive spatial pattern capsule network (ASPCNet) architecture by developing an adaptive spatial pattern (ASP) unit that can rotate the sampling locations of convolutional kernels on the basis of an enlarged receptive field. Note that this unit can learn more discriminative representations of HSIs with fewer parameters. Specifically, two cascaded ASP-based convolution operations (ASPConvs) are applied to input images to learn relatively high-level semantic features, transmitting hierarchical structures among capsules more accurately than the use of the most fundamental features. Furthermore, the semantic features are fed into ASP-based conv-capsule operations (ASPCaps) to explore the shapes of objects among the capsules in an adaptive manner, further exploring the potential of capsule networks. Finally, the class labels of image patches centered on test samples can be determined according to the fully connected capsule layer. Experiments on three public datasets demonstrate that ASPCNet can yield competitive performance with higher accuracies than state-of-the-art methods. For the convenience of follow-up research and engineering applications, we packaged the algorithm into an arbitrary plug-in module and released it at https://github.com/Cimy-wang.
Chapter
In 2015, the Gough Map, one of the earliest surviving maps of Britain, was imaged using a hyperspectral imaging system while in the collection of the Bodleian Library, University of Oxford. Hyperspectral image (HSI) classification has been widely used to identify materials in remotely sensed images, and hyperspectral imaging has recently been applied to studies of historical artifacts. The collection of the HSI data of the Gough Map was aimed at pigment mapping of towns and writing with different spatial patterns and spectral (color) features. We developed a spatial-spectral deep learning framework called 3D-SE-ResNet to automatically classify pigments in large HSIs of cultural heritage artifacts with limited reference (labelled) data, and have applied it to the Gough Map. With much less effort and much higher efficiency, this is a breakthrough in object identification and classification in cultural heritage studies that leverages the spectral and spatial information contained in this imagery, providing codicological information to cartographic historians.
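The "SE" in the 3D-SE-ResNet named above refers to squeeze-and-excitation, a standard channel-reweighting mechanism. Below is a minimal 3D variant in PyTorch as an assumed illustration of that ingredient, not the authors' implementation; the channel count and reduction ratio are arbitrary.

```python
# Minimal 3D squeeze-and-excitation block: globally pool each channel
# ("squeeze"), then learn per-channel weights ("excite") to rescale the
# spectral-spatial feature maps.
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    def __init__(self, channels=32, reduction=8):   # assumed sizes
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # one value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c = x.shape[:2]
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w                                 # channel-wise reweighting

x = torch.randn(2, 32, 10, 9, 9)  # (batch, channels, bands, height, width)
print(SEBlock3D()(x).shape)       # torch.Size([2, 32, 10, 9, 9])
```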
Article
Full-text available
This study presents a deep extraction of localized spectral features and multi-scale spatial features convolution (LSMSC) framework for spectral-spatial fusion based classification of hyperspectral images (HSIs). First, adjacent spectral bands are grouped based on their similarity measurements, so that the whole hypercube is partitioned into several sub-cubes, each corresponding to one band group. Then, the proposed localized spectral feature extraction (LSF) strategy is used to extract localized spectral features from each band group using a 1D convolutional neural network (CNN). Meanwhile, the proposed HiASPP strategy is employed to extract multi-scale features from the first several principal components of each sub-cube. Finally, the extracted spectral and spatial features are concatenated for spectral-spatial fusion based classification of the HSI. Experiments conducted on three publicly available datasets have demonstrated that the proposed architecture outperforms several state-of-the-art approaches.
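The band-grouping step described above can be prototyped with a simple greedy rule: extend the current group of adjacent bands while each new band stays highly correlated with the group's first band. Everything in the sketch (the correlation measure, the threshold, the greedy scan) is an assumed simplification of the paper's similarity-based grouping.

```python
# Greedy prototype of similarity-based band grouping: adjacent bands join the
# current group while they remain highly correlated with its first band.
import numpy as np

def group_adjacent_bands(cube, threshold=0.95):   # threshold is an assumption
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)
    groups, start = [], 0
    for b in range(1, bands):
        r = np.corrcoef(flat[:, start], flat[:, b])[0, 1]
        if r < threshold:                 # similarity dropped: close the group
            groups.append(list(range(start, b)))
            start = b
    groups.append(list(range(start, bands)))
    return groups                         # each group would feed one 1D-CNN branch

# On random data bands are uncorrelated, so every band becomes its own group;
# real HSI bands are highly correlated and form larger groups.
print(len(group_adjacent_bands(np.random.rand(32, 32, 20))))
```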