Deep Learning for Satellite Image Classification

Authors:
  • National Egyptian E-Learning University
  • Faculty of Computers and Information Sciences Ain Shams University

Mayar A. Shafaey¹ (corresponding author), Mohammed A.-M. Salem¹ ², H. M. Ebied¹, M. N. Al-Berry¹, and M. F. Tolba¹

¹ Faculty of Computers and Information Sciences, Ain Shams University, Cairo, Egypt
mayar.al.mohamed@fcis.asu.edu.eg, {salem,maryam_nabil}@cis.asu.edu.eg,
hala.m@outlook.com, fahmytolba@gmail.com
² Faculty of Media Engineering and Technology, German University, Cairo, Egypt
Abstract. Nowadays, large amounts of high-resolution remote-sensing images are acquired daily, and satellite image classification is required for many applications such as modern city planning, agriculture, and environmental monitoring. Although many researchers have studied this domain, satisfactory performance has not yet been reached. This article therefore evaluates the available public remote-sensing datasets and the common techniques used for satellite image classification. Existing remote-sensing classification methods are grouped into four main categories according to the features they use: manual feature-based methods, unsupervised feature learning methods, supervised feature learning methods, and object-based methods. In recent years, supervised deep learning methods have become widely popular in remote-sensing applications such as geospatial object detection and land-use scene classification. The experiments in this article are therefore carried out with one of the popular deep learning models, Convolutional Neural Networks (CNNs), specifically the AlexNet architecture, on a well-established benchmark dataset, UC-Merced Land Use. Finally, the results are compared with those of other techniques.
Keywords: Remote sensing · Satellite image · Deep learning · Convolutional Neural Networks (CNNs) · UC-Merced Land Use · Parallel computing
1 Introduction
A satellite image is an image of the whole or part of the Earth taken by artificial satellites. It can be a visible-light image, a water-vapor image, or an infrared image [1]. Different types of satellites produce images with high spatial, spectral, and temporal resolution that cover the whole Earth in less than a day. The large scale of these datasets introduces new challenges in image analysis.
The analysis and classification of remote-sensing images is very important in many practical applications, such as natural hazard monitoring, geospatial object detection, precision agriculture, urban planning, vegetation mapping, and military monitoring [2]. Despite decades of research, the degree of automation in remote-sensing image analysis remains low [3].

©Springer Nature Switzerland AG 2019
A. E. Hassanien et al. (Eds.): AISI 2018, AISC 845, pp. 383–391, 2019.
https://doi.org/10.1007/978-3-319-99010-1_35
The main objective of this paper is to present a literature review of recent deep-learning-based techniques for satellite image classification and of the available training and testing datasets. Moreover, test results are presented on one popular dataset using the AlexNet architecture of Convolutional Neural Networks (CNNs).
In the next section, a list of available datasets and their specifications is presented. A review of recent classification approaches applied to one or more of these datasets is presented in Sect. 3. The experimental work, followed by results and evaluations, is presented in Sect. 4. Finally, conclusions are highlighted in Sect. 5.
2 Review of Public Remote-Sensing Image Datasets
In the past years, several high-resolution remote-sensing image datasets have been introduced by different groups to enable machine-learning research on scene classification and to evaluate different methods in this field. The authors review some publicly available datasets in this section, as given in Table 1. For each dataset, the table shows the number of scene classes, images per class, total images, spatial resolution, and image size.
Most images in these datasets are obtained from Google Earth Engine and cover scene categories such as agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium-density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and so on. The exception is the Brazilian Coffee Scenes dataset [13], which is cropped from SPOT satellite images and contains only two scene classes, making it less suitable for evaluating multi-class scene classification methods. In contrast, the large number of classes and images in the NWPU-RESISC45 dataset [14] has a positive impact on classification results.

Table 1. Comparison between the different remote-sensing datasets proposed

Data set                | Scene classes | Images/class | Total images            | Spatial resolution (m) | Image size
AID [4]                 | 30            | 200–400      | 10000                   | High                   | 600 × 600
PatternNet [5]          | 38            | 800          | 30400                   | Up to 0.8              | 256 × 256
RSI-CB256 [6]           | 35            | Various      | 34000                   | 0.3–3                  | 256 × 256
SAT_4 & SAT_6 [7]       | –             | –            | 500000 + 405000 patches | Low                    | 28 × 28
UC-Merced Land Use [8]  | 21            | 100          | 2100                    | 0.3                    | 256 × 256
WHU-RS19 [9]            | 19            | ~50          | 1005                    | Up to 0.5              | 600 × 600
SIRI-WHU [10]           | 12            | 200          | 2400                    | 2                      | 200 × 200
RSSCN7 [11]             | 7             | 400          | 2800                    | –                      | 400 × 400
RSC11 [12]              | 11            | ~100         | 1232                    | 0.2                    | 512 × 512
Brazilian Coffee [13]   | 2             | 1438         | 2876                    | Low                    | 64 × 64
NWPU-RESISC45 [14]      | 45            | 700          | 31500                   | ~30–0.2                | 256 × 256

384 M. A. Shafaey et al.
Among these, the UC-Merced Land-Use dataset [8] (Fig. 1) is the most popular and has so far been widely used for remote-sensing image scene classification and retrieval. Therefore, the authors chose it for the classification experiment.
3 Remote-Sensing Image Classification Methods
Extensive research has been carried out on satellite images for scene classification over the past and current decades. Based on the large body of publications on this topic, existing scene classification methods can generally be summarized into four main categories according to the features they use: manual feature-based methods, unsupervised classification methods, supervised learning methods, and object-based methods.
3.1 Manual Feature-Based Methods
Handcrafted features are a fundamental basis of image classification. These methods depend on the skill of researchers in designing and extracting important features, such as color, orientation, texture, shape, spatial and spectral information, or their combination. Some of the most common features used for scene classification are [15–17, 40]:
  • Color histograms
  • Texture descriptors
  • GIST: describes the orientations of a scene
  • SIFT: describes sub-regions of a scene
  • HOG: describes the gradients of objects
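As an illustration of the simplest handcrafted descriptor in the list above, the following minimal sketch computes a normalized per-channel color histogram for an RGB image given as nested lists of 8-bit (R, G, B) tuples. The function name `color_histogram`, the input representation, and the default bin count are illustrative assumptions, not part of the original work; in a fuller pipeline, a texture, GIST, SIFT, or HOG descriptor could be computed in the same place.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(image: List[List[Pixel]], bins: int = 4) -> List[float]:
    """Return a concatenated, normalized histogram with `bins` values per channel."""
    if not 1 <= bins <= 256:
        raise ValueError("bins must be in 1..256")
    counts = [[0] * bins for _ in range(3)]  # one histogram per channel
    n_pixels = 0
    for row in image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                # Map the 0..255 intensity to a bin index 0..bins-1.
                counts[channel][value * bins // 256] += 1
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("image is empty")
    # Normalize so each channel's histogram sums to 1 (independent of image size).
    return [c / n_pixels for channel_counts in counts for c in channel_counts]


if __name__ == "__main__":
    # Hypothetical 2x2 image: two vegetation-like pixels and two water-like pixels.
    img = [[(30, 200, 40), (20, 180, 30)],
           [(10, 40, 220), (15, 50, 230)]]
    print(color_histogram(img, bins=4))
```

Because the feature is a fixed-length vector regardless of image size, it can be fed directly to any standard classifier; its weakness, which motivates the learned features discussed next, is that it ignores spatial layout entirely.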
Fig. 1. Representative images of the 21 classes [(a)–(u)] of the UC-Merced Land-Use dataset [34].
Deep Learning for Satellite Image Classification 385
3.2 Unsupervised Classification Methods
The limitations of manual feature-based methods can be addressed by learning features directly from images. This strategy is called unsupervised feature learning. In recent years, unsupervised feature learning from unlabeled input data has become an attractive alternative to handcrafted features [18].
The idea behind this strategy is to first group image pixels into clusters based on their properties. By learning features from images instead of relying on manually designed ones, more discriminative features that are better suited to the classification problem can be obtained [19]. Examples of such unsupervised methods include principal component analysis (PCA) [20], k-means clustering [21], and sparse coding [22].
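To make the clustering idea concrete, the following is a minimal pure-Python sketch of k-means applied directly to pixel color values, grouping pixels by their properties as described above. The deterministic initialization (the first k distinct pixels), the iteration cap, and the function names are assumptions made for brevity; practical pipelines often cluster local patches or descriptors rather than single pixels and use more careful initialization such as k-means++.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Group `points` (e.g. pixel colors) into k clusters; return (labels, centroids)."""
    # Deterministic initialization (an assumption): the first k distinct points.
    centroids: List[Point] = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
            if len(centroids) == k:
                break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical pixels: three vegetation-like and two water-like RGB values.
    pixels = [(30, 200, 40), (20, 180, 30), (10, 40, 220), (15, 50, 230), (25, 190, 35)]
    labels, centroids = kmeans(pixels, k=2)
    print(labels)
    print(centroids)
```

No labels are used at any point: the cluster structure, and hence the resulting feature, is derived from the data alone, which is what distinguishes this family from the supervised methods in Sect. 3.3.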
In real applications, these unsupervised feature learning methods have achieved good performance in land-use classification, especially compared to handcrafted-feature methods. For example, the authors of [23–25] applied unsupervised methods and made significant progress in remote-sensing scene classification.
3.3 Supervised Learning Methods
Since 2006, when Hinton and Salakhutdinov [26] introduced their deep learning method, a wave of research has relied on supervised learning methods, which use labeled data to learn more powerful features. There are many deep learning models, such as deep belief nets (DBN) [27], deep Boltzmann machines (DBM) [28], stacked auto-encoders (SAE) [29], and Convolutional Neural Networks (CNNs) [30]. In this article, the authors mainly review the widely used CNN deep learning method.
The basic concept of a CNN is to train large multi-layer networks that give impressive classification results on large-scale input images. There are different CNN models, such as AlexNet, GoogleNet, ResNet, VGGNet, and CaffeNet [31]. Owing to space limitations, only a short description of the AlexNet architecture is given here. The network consists of 25 layers, including 5 convolutional layers, max-pooling layers, dropout layers, and 3 fully connected layers, as shown in Fig. 2. It was trained on a subset of the ImageNet database, which contains over 15 million annotated images from more than 22,000 categories, and it uses ReLU as its nonlinearity [32].
Fig. 2. ImageNet classification with the AlexNet CNN [32]
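The 227 × 227 input size mentioned in Sect. 4.1 follows from AlexNet's layer geometry. The sketch below traces the spatial size through the convolution and max-pooling layers using the standard output-size formula floor((n + 2p − k) / s) + 1. The kernel sizes, strides, paddings, and channel counts are those commonly reported for AlexNet and are stated here as assumptions rather than taken from this paper; ReLU, normalization, and dropout layers do not change the spatial size and are omitted.

```python
from typing import List, Optional, Tuple

# (name, kernel, stride, padding, out_channels); out_channels is None for pooling.
# Hyper-parameters as commonly reported for AlexNet (assumption).
Layer = Tuple[str, int, int, int, Optional[int]]

ALEXNET_LAYERS: List[Layer] = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def out_size(n: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution or pooling layer on an n x n input."""
    return (n + 2 * pad - kernel) // stride + 1


def trace_shapes(size: int = 227, channels: int = 3) -> List[Tuple[str, int, int, int]]:
    """Return (layer, height, width, channels) after each spatial layer."""
    shapes = []
    for name, kernel, stride, pad, out_ch in ALEXNET_LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_ch is not None:  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes():
        print(shape)
    _, h, w, c = trace_shapes()[-1]
    print("features entering the first fully connected layer:", h * w * c)
```

With these settings, a 227 × 227 input gives a 55 × 55 map after conv1 and a 6 × 6 × 256 map (9216 values) before the fully connected layers; a 224 × 224 input would instead give 54 × 54 after conv1, which is why implementations of this layout expect 227 × 227.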
Table 2 lists some studies that used CNN models in experiments on large-scale image scene classification and reported high accuracy values, which demonstrate the power of the CNN learning model.
3.4 Object-Based Methods
Unlike pixel-based or image-based classification, object-based image classification groups pixels into representative shapes and sizes and assigns each group to a semantic object. This process relies on multi-resolution segmentation, which produces homogeneous objects by grouping pixels and generates objects at different scales in an image simultaneously. These objects are more meaningful because they represent features in the image [38, 41].
The question here is how to select the appropriate image classification technique. The choice is largely a matter of engineering judgment. Suppose you want to classify water in a high-spatial-resolution image that also contains grass. You decide to select all pixels with low NDVI (Normalized Difference Vegetation Index) in that image. NDVI is used to analyze remote-sensing measurements and assess whether the observed target contains live green vegetation. However, this rule could also misclassify other pixels in the image that are not water, e.g., pixels of the sky. For this reason, pixel-based classification, whether unsupervised or supervised, gives a salt-and-pepper appearance.
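The pixel-based NDVI rule in the example above can be sketched as follows. NDVI is computed per pixel as (NIR − Red) / (NIR + Red); the sketch assumes the near-infrared and red bands are given as equally sized 2-D lists of reflectances, and both the threshold of 0 for "low NDVI" and the sample reflectances are illustrative assumptions, not values from this paper. Because each pixel is labeled independently, any non-water pixel whose NDVI also falls below the threshold is labeled water as well, which is the source of the salt-and-pepper effect described above.

```python
from typing import List

Band = List[List[float]]


def ndvi(nir: Band, red: Band) -> Band:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), in [-1, 1]."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else 0.0)  # avoid 0/0
        result.append(row)
    return result


def low_ndvi_mask(nir: Band, red: Band, threshold: float = 0.0) -> List[List[bool]]:
    """Pixel-based rule: mark every pixel with NDVI below `threshold` as 'water'."""
    return [[v < threshold for v in row] for row in ndvi(nir, red)]


if __name__ == "__main__":
    # Hypothetical reflectances: row 0 = grass, row 1 = water and then a bright
    # non-vegetated pixel that is not water but still has negative NDVI.
    nir = [[0.50, 0.45], [0.02, 0.20]]
    red = [[0.08, 0.10], [0.05, 0.25]]
    print(low_ndvi_mask(nir, red))
```

In this toy input both pixels of the second row are flagged, although only the first is water; an object-based method would instead decide at the level of segmented regions.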
As illustrated in this article, spatial resolution is an important factor when selecting image classification techniques. When spatial resolution is low, both traditional pixel-based and object-based image classification techniques perform well. When spatial resolution is high, however, object-based image classification is superior to traditional pixel-based classification [39].
4 Experiments and Results
Taking advantage of the availability of the UC-Merced dataset [8], the AlexNet CNN approach was applied to represent the large-scale image classification process. This section describes the experimental steps, i.e., software and hardware specifications, results, comments, and comparisons.
Table 2. Survey of recent publications that applied CNNs in experiments on large-scale remote-sensing (RS) images (UC-Merced dataset)

Reference | Year | Application                            | Method              | Accuracy
[33]      | 2015 | Multi-spectral land-use classification | Deep CNN            | 93.48%
[34]      | 2015 | Land-use RS classification             | GoogleNet; CaffeNet | 97%; 95.48%
[35]      | 2016 | RS scene classification                | Large patch CNN     | Effective results
[36]      | 2016 | Large-scale image classification       | CNN                 | 92.4%
[37]      | 2018 | Remote-sensing scene classification    | CNN                 | 92.43%
4.1 Experimental Procedure
The experiment was run on two different computers. Machine 1 has an Intel® Core™ i7-2670QM CPU @ 2.20 GHz and 8 GB of RAM. Machine 2 is equipped with an NVIDIA GTX 1050 GPU (4 GB, compute capability 6.1), an Intel® Core™ i7-7700HQ CPU @ 2.20 GHz, and 16 GB of RAM. The elapsed time was 1800 s on Machine 1 and 14 s on Machine 2. Thanks to the Graphics Processing Unit (GPU), the execution time was reduced impressively: parallel computing improved performance more than 100 times over serial computation.
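For reference, the speedup implied by the two elapsed times reported above is:

```latex
S = \frac{T_{\text{Machine 1}}}{T_{\text{Machine 2}}}
  = \frac{1800\,\text{s}}{14\,\text{s}} \approx 128.6
```

so "more than 100 times" is a conservative rounding. Since the two machines also differ in CPU generation and memory, this ratio compares the whole machines rather than isolating the GPU's contribution.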
Hence, the experiment was run on Machine 2 in MATLAB® using the built-in alexnet() function, which provides a network trained on a subset of the ImageNet database (~1.2 million images) that can classify images into 1000 object categories. This function requires the Neural Network Toolbox™ Model for AlexNet Network. The procedure has three basic steps. First, the images are resized from 256 × 256 to 227 × 227, the input size required by the CNN. Second, the training-set percentage is chosen. Third, a multiclass SVM classifier is trained, the test features are extracted using the CNN, and these features are passed to the trained classifier to obtain the predicted labels. Finally, the classification accuracy is computed by summing the main diagonal of the confusion matrix and dividing by the number of diagonal elements.
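The accuracy computation in the final step can be sketched as follows. The text specifies the sum of the main diagonal divided by the number of diagonal elements; that quantity is a percentage only if each row of the confusion matrix is first expressed as per-class rates, so the sketch normalizes rows before averaging (an assumption about the exact procedure). With an equal number of test images per class, as in the UC-Merced splits used here, this mean per-class accuracy equals the overall accuracy, i.e., the trace divided by the total number of test images. The function names and the small label lists are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(true: Sequence[int], pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[i][j] = number of test images of class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        cm[t][p] += 1
    return cm


def mean_diagonal_accuracy(cm: List[List[int]]) -> float:
    """Average of the row-normalized diagonal (mean per-class accuracy), in %."""
    rates = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return 100.0 * sum(rates) / len(rates)


def overall_accuracy(cm: List[List[int]]) -> float:
    """Trace of the confusion matrix divided by the number of test images, in %."""
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total


if __name__ == "__main__":
    # Hypothetical balanced test set: 3 classes with 4 images each.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 0, 2]
    cm = confusion_matrix(y_true, y_pred, 3)
    print(cm)
    print(mean_diagonal_accuracy(cm), overall_accuracy(cm))
```

On an unbalanced test set the two measures diverge, so it matters which one a comparison table reports.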
A number of experiments were carried out to assess the performance of the CNN on the well-known UC-Merced Land Use dataset, which contains 2100 images in 21 distinct classes, with 100 images per class. In the experiments, the training set size ranged from 10% to 90% of the 100 images per class, and the remaining images were used for testing. Figure 3 shows the classification accuracy versus the training set percentage.
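The per-class split described above can be sketched as follows: for each training fraction from 10% to 90%, the same fraction of each class's images goes to training and the rest to testing, which keeps all 21 classes balanced in both sets. The original experiment was run in MATLAB and its exact splitting routine is not specified, so the shuffling, the fixed random seed, and the image names below are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image name, class label)


def per_class_split(images_by_class: Dict[str, List[str]], train_fraction: float,
                    seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split each class separately; return (train, test) lists of (image, label)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Sample] = []
    test: List[Sample] = []
    for label, images in sorted(images_by_class.items()):
        shuffled = images[:]
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train += [(img, label) for img in shuffled[:n_train]]
        test += [(img, label) for img in shuffled[n_train:]]
    return train, test


if __name__ == "__main__":
    # Hypothetical stand-in for UC-Merced: 21 classes x 100 image names.
    dataset = {f"class{c:02d}": [f"class{c:02d}_{i:03d}" for i in range(100)]
               for c in range(21)}
    for pct in range(10, 100, 10):  # 10%, 20%, ..., 90%
        train, test = per_class_split(dataset, pct / 100)
        print(f"{pct}% training: {len(train)} train / {len(test)} test images")
```

At 10% this yields 210 training and 1890 test images; at 90%, 1890 and 210. Splitting per class, rather than over the pooled 2100 images, guarantees that every class is represented in both sets at every ratio.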
[Figure 3 plot: classification accuracy (%) on the y-axis versus the training set fraction on the x-axis]
Fig. 3. The classification accuracy for the UC-Merced Land Use dataset using the AlexNet CNN. The x-axis represents the fraction of each class's images used for training, in the interval [0.1, 0.9]; the remaining images form the test set. The y-axis represents the classification accuracy.
The first trial used 10% of the images for training and gave 81.3% accuracy. The experiment was then repeated eight more times, up to 90% of the images in the training set, which gave around 94% accuracy. Figure 3 illustrates that gradually increasing the number of training images improves the classification result.
4.2 Evaluation and Discussion
Compared with the other CNN models, GoogleNet and CaffeNet, discussed in [34] and also applied to the UC-Merced dataset, the authors observed that the classification accuracy of GoogleNet (~97%) was better than that of CaffeNet (~94%) and AlexNet (~94%). However, AlexNet is faster than GoogleNet. When the two models ran on the same GPU mentioned above, AlexNet executed in only 14 s, whereas GoogleNet took 51 s, approximately 4 times slower.
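The runtime ratio quoted above works out as:

```latex
\frac{T_{\text{GoogleNet}}}{T_{\text{AlexNet}}}
  = \frac{51\,\text{s}}{14\,\text{s}} \approx 3.6
```

which the text rounds to approximately 4, to be weighed against GoogleNet's roughly 3 percentage-point accuracy advantage.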
On the one hand, compared with traditional handcrafted features, which require considerable expertise and skill to design, deep learning features are learned automatically from data by deep neural network architectures. This is the key advantage of deep learning methods.
On the other hand, compared with the aforementioned unsupervised feature learning methods, e.g., sparse coding, deep learning models can learn more powerful features because they are composed of multiple processing layers, which makes them more suitable for large-scale remote-sensing image scene classification. Deep feature learning methods loosely resemble the human brain, in which each level uses information from the previous level to learn deeper and more accurate representations.
The following articles support these observations. In [23], high-resolution satellite scene classification using sparse coding was carried out on the UC-Merced dataset and reached about 91% accuracy. In [24], unsupervised feature learning via spectral clustering of multidimensional patches was applied to the same dataset and achieved 90% correct classification.
5 Conclusions
Automatic target detection or recognition and high-resolution remotely sensed image classification are two hot topics nowadays. Hence, this paper first presented a comprehensive review of common, freely available remote-sensing datasets that enable the community to develop large-scale image scene classification. It then summarized recent methods used for this task. Finally, the CNN deep learning method was applied to the UC-Merced dataset, and the results were evaluated and reported for comparison against the state of the art and as a baseline for future research.
Deep learning methods can offer better feature representations for related remote-sensing tasks, and the prospects are bright for more and more researchers to focus on learning better features for target detection and scene classification using appropriate deep learning methods.
Thanks to parallel computing on GPUs, which reduced the execution time by more than 100× compared with serial computation, our experiment on the 2100-image dataset ran in no more than 14 s.
References
1. NASA: What Is a Satellite? NASA Knows! (Grades 5–8) series (2014)
2. Zhang, L., Xia, G., Wu, T., Lin, L., Tai, X.: Deep learning for remote sensing image understanding. J. Sens. 2016, 12 (2016)
3. Marmanis, D., Wegner, J., Galliani, S., Schindler, K., Datcu, M., Stilla, U.: Semantic segmentation of aerial images with an ensemble of CNNs. ICWG 3(4), 18 (2016)
4. AID Dataset. http://www.lmars.whu.edu.cn/xia/AID-project.html. Accessed 16 Feb 2018
5. PatternNet Dataset. https://sites.google.com/view/zhouwx/dataset?authuser=0. Accessed 16
Feb 2018
6. RSI Dataset. https://github.com/lehaifeng/RSI-CB. Accessed 16 Feb 2018
7. SAT_4 & SAT_6. http://csc.lsu.edu/~saikat/deepsat/. Accessed 16 Feb 2018
8. UC Merced Land Use Dataset. http://weegee.vision.ucmerced.edu/datasets/landuse.html. Accessed 16 Feb 2018
9. WHU-RS19 Dataset. https://www.google.com/url?q=http%3A%2F%2Fwww.xinhua-uid.
com%2Fpeople%2Fyangwen%2FWHU-RS19.html&sa=D&sntz=1&usg=AFQjCNFzrOnVi
W6TWOoFbN1IaIMfyLdJhQ. Accessed 16 Feb 2018
10. SIRI-WHU Dataset. http://www.lmars.whu.edu.cn/prof_web/zhongyanfei/e-code.html.
Accessed 16 Feb 2018
11. RSSCN7 Dataset. https://www.dropbox.com/s/j80iv1a0mvhonsa/RSSCN7.zip?dl=0.
Accessed 16 Feb 2018
12. RSC11 Dataset. https://www.yeastgenome.org/locus/ARP7. Accessed 16 Feb 2018
13. Brazilian Coffee Dataset. http://www.patreo.dcc.ufmg.br/downloads/brazilian-coffee-dataset/.
Accessed 16 Feb 2018
14. NWPU-RESISC45 Dataset. https://www.google.com/url?q=http%3A%2F%2Fwww.
escience.cn%2Fpeople%2FJunweiHan%2FNWPU-RESISC45.html&sa=D&sntz=1&usg=
AFQjCNGs2uMeX7KT2QvEMzcD5uF4-aQChw. Accessed 16 Feb 2018
15. Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE 105(10), 1–17 (2017)
16. Thomas, M., Farid, M., Yakoub, B., Naif, A.: A fast object detector based on high-order gradients and Gaussian process regression for UAV images. Int. J. Remote Sens. 36(10), 2713–2733 (2015)
17. Aptoula, E.: Remote sensing image retrieval with global morphological texture descriptors. IEEE Trans. Geosci. Remote Sens. 52(5), 3023–3034 (2014)
18. Mekhalfi, M., Melgani, F., Bazi, Y., Alajlan, N.: Land-use classification with compressive sensing multifeature fusion. IEEE Geosci. Remote Sens. Lett. 12(10), 2155–2159 (2015)
19. Cheriyadat, A.: Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 52(1), 439–451 (2014)
20. Jolliffe, I.: Principal component analysis. Springer, New York (2002)
21. Zhao, B., Zhong, Y., Zhang, L.: A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery. Remote Sens. 116, 73–85 (2016)
22. Olshausen, B., Field, D.: Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res. 37(23), 3311–3325 (1997)
23. Sheng, G., Yang, W., Xu, T., Sun, H.: High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int. J. Remote Sens. 33(8), 2395–2412 (2012)
24. Hu, F., Xia, G., Wang, Z., Huang, X., Zhang, L., Sun, H.: Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification. IEEE J. Select. Top. Appl. Earth Obs. Remote Sens. 8(5), 2015–2030 (2015)
25. Daoyu, L., Kun, F., Yang, W., Guangluan, X., Xian, S.: MARTA GANs: unsupervised representation learning for remote sensing image classification. National Natural Science Foundation of China (2017)
26. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
27. Hinton, G., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
28. Salakhutdinov, R., Hinton, G.: An efficient learning procedure for deep Boltzmann machines. Neural Comput. 24(8), 1967–2006 (2012)
29. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
30. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. In: Proceedings of the International Conference on Learning Representations, pp. 1–16 (2014)
31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, pp. 1–13 (2015)
32. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Conference on Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
33. Luus, F., Salmon, B., Van Den Bergh, F., Maharaj, B.: Multiview deep learning for land-use classification. IEEE Geosci. Remote Sens. Lett. 12(12), 2448–2452 (2015)
34. Castelluccio, M., Poggi, G., Sansone, C., Verdoliva, L.: Land Use Classification in Remote Sensing Images by Convolutional Neural Networks. Cornell University, Ithaca (2015)
35. Zhong, Y., Fei, F., Zhang, L.: Large patch convolutional neural networks for the scene classification of high spatial resolution imagery. J. Appl. Remote Sens. 10(2), 025006 (2016)
36. Marmanis, D., Datcu, M., Esch, T., Stilla, U.: Deep learning earth observation classification using ImageNet pretrained networks. IEEE Geosci. Remote Sens. Lett. 13(1), 105–109 (2015)
37. Jingbo, C., Chengyi, W., Zhong, M., Jiansheng, C., Dongxu, H., Stephen, A.: Remote sensing scene classification based on convolutional neural networks pre-trained using attention-guided sparse filters. Remote Sens. 10(290), 1–16 (2018)
38. Blaschke, T.: Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 65(1), 2–16 (2010)
39. GIS Geography. http://gisgeography.com/image-classification-techniques-remote-sensing/. Accessed 16 Feb 2018
40. Tahoun, M., Nagaty, K., El-Arief, T., A-Megeed, M.: A robust content-based image retrieval system using multiple features representations. In: Proceedings of IEEE Networking, Sensing and Control, pp. 116–122 (2005)
41. Mohammed, A-M.: Multiresolution Image Segmentation. Ph.D. Thesis, Department of Computer Science, Humboldt-Universitaet zu Berlin, Germany (2008)
Deep Learning for Satellite Image Classication 391
... According to a study conducted by UN, dry land occupied an increased portion of 41.3% in whole land area by 2022 [2]. To mitigate the negative implications of abnormal weather change, quantities of remote-sensing images are collected daily for being analysed [3]. That had effects on the process of monitoring the concrete landform erosion. ...
... That had effects on the process of monitoring the concrete landform erosion. However, it seems that technology and methods that most scientists have invented and raised are not satisfied for the demand of practical use [3]. ...
... For instance, the development of image recognition has provided practical experience and technology support for urban and rural cultural heritage [5]. Hence, supervised deep-learning had been widely used in remote-sensing areas such as geospatial target detection [3]. Cowls claimed that efficiency to train an artificial model to a relatively high level had improved by nearly 44 times compared to the beginning [6]. ...
Article
Full-text available
One significant assessing criteria of climate change is geometric evolution. The rate of evolution reveals the speed that environment worsens. Advanced space mirror monitors that and generates images timely. However, it might be difficult for human to deal with collected numerous image-related data. In previous research, convolutional neural network is regarded to have specific advantage in resolving image recognition tasks. Hence, a new type of convolutional neural network model is applied to identify different kinds of landscape. Virtually, this model is called Efficient Net which based on landscape recognition dataset with 5 classes of landscapes. The study also introduces the fine-tuning to further improve the performance of the model. To evaluate the model, the precision, recall, F1 score, accuracy and loss are adopted as assessing criteria. The results shows that the model predicts the target dataset to a great extent. However, it has been tested that the class of mountain might not be suitable for predicting because of vague criterion. That is helpful in real-condition geographical applications and environmental governance.
... The increasing prominence of deep learning in this domain underscores its capacity for efficient classification, leveraging automatic learning capabilities and outperforming manual as well as unsupervised techniques. The exploration of deep learning's potential in remote sensing points towards a promising future, highlighting its ability to enhance feature representation (Shafaey et al. 2018). ...
Article
Full-text available
Floods are one of the most frequent natural disasters, often resulting in widespread devastation. Identifying floods accurately is crucial for disaster management as it helps to locate areas requiring urgent assistance and streamline post-flood evacuation processes. Recently, deep learning models, such as Convolutional Neural Networks (CNN), have become predominant for image classification tasks, as well as flood classification problems. Deep ensemble techniques,i.e. combining several deep learning architectures, are still quite new in many fields and have not been studied extensively despite showing promising results in flood classification. In this research, we develop an ensemble deep learning framework that utilizes eight state-of-the-art CNN architectures, namely MobileNet V2, ResNet 50, VGG 16, DenseNet 201, Inception V3, EfficientNet B5, NasNet Large, and Xception. The aim is to address the gap of deep ensemble learning in flood classification and provide a more effective approach to identifying potential flooding scenarios from a wide range of visual datasets. We utilize FloodNet and flood area segmentation datasets to train, test, and validate our models. In the testing phase, our ensemble model outperforms several individual benchmark models, achieving a training accuracy of 98.9% and a test accuracy of 97.4%. Our proposed methodology will predict floods and conduct early assessments of affected areas efficiently.
... However, the literature indicates various challenges in land cover classification. These include transitioning from basic algebraic methods to more complex AI-based approaches, such as Machine Learning and Deep Learning [16][17][18], aimed at generating precise data about settlement areas. The extraction of information and classification of images represent considerable challenges, leading researchers to develop innovative systems for classifying input image pixels. ...
Article
Full-text available
This study embarks on an evaluation of the efficacy of six supervised machine learning algorithms in the classification of land cover in Casablanca, Morocco, utilizing Landsat satellite imagery. Employing the Google Earth Engine (GEE) platform for data collection, the research encompasses meticulous pre-processing steps and the application of various supervised algorithms, followed by a comprehensive evaluation of their performance. The city of Casablanca, characterized by rapid urbanization and evolving land-use patterns, presents an exemplary case for scrutinizing the algorithms' ability to accurately classify different land zones. These zones encompass water bodies, urban areas, agricultural lands, barren terrains, and forests. The algorithms under scrutiny include Support Vector Machine (SVM), Random Forest (RF), Classification and Regression Trees (CART), Minimum Distance (MD), Decision Tree (DT), and Gradient Tree Boosting (GTB). The assessment of classification outcomes leverages multiple accuracy indicators, namely overall accuracy (OA), Kappa coefficient, user accuracy (UA), and producer accuracy (PA). Results indicate that the Random Forest algorithm exhibits superior performance, achieving an accuracy of 95.42%, while the Support Vector Machine algorithm lags with a lower accuracy of 83%. This investigation underscores the critical role of advanced machine learning algorithms in land cover classification, a pivotal aspect for urban and regional planning, natural resource management, and risk assessment in rapidly changing environments.
Article
Full-text available
Image segmentation and identification are crucial to modern medical image processing techniques. This research provides a novel and effective method for identifying and segmenting liver tumors from public CT images. Our approach leverages the hybrid ResUNet model, a combination of both the ResNet and UNet models developed by the Monai and PyTorch frameworks. The ResNet deep dense network architecture is implemented on public CT scans using the MSD Task03 Liver dataset. The novelty of our method lies in several key aspects. First, we introduce innovative enhancements to the ResUNet architecture, optimizing its performance, especially for liver tumor segmentation tasks. Additionally, by harassing the capabilities of Monai, we streamline the implementation process, eliminating the need for manual script writing and enabling faster, more efficient model development and optimization. The process of preparing images for analysis by a deep neural network involves several steps: data augmentation, a Hounsfield windowing unit, and image normalization. ResUNet network performance is measured by using the DC metric Dice coefficient. This approach, which utilizes residual connections, has proven to be more reliable than other existing techniques. This approach achieved DC values of 0.98% for detecting liver tumors and 0.87% for segmentation. Both qualitative and quantitative evaluations show promising results regarding model precision and accuracy. The implications of this research are that it could be used to increase the precision and accuracy of liver tumor detection and liver segmentation, reflecting the potential of the proposed method. This could help in the early diagnosis and treatment of liver cancer, which can ultimately improve patient prognosis.
Article
Monitoring the dynamics of land use and land cover (LULC) is imperative under a changing climate and evolving urbanization patterns worldwide. Shifts in land use significantly affect the hydrological response of watersheds across the globe. Several studies have applied machine learning (ML) algorithms to historical LULC maps, together with elevation and slope data, to project future LULC. However, the influence of other driving factors, such as socio-economic and climatological factors, has not been thoroughly explored. The present study adopts a sensitivity analysis approach to assess how three groups of factors affect the accuracy of LULC prediction in the Brahmani and Baitarni (BB) basin of Eastern India. These groups are physical factors (elevation, slope, aspect, etc.), socio-economic factors (population density, distance to built-up areas, and distance to road and rail), and climatic factors (mean precipitation). In addition, because no recent LULC maps of the basin exist, three ML algorithms were used for LULC classification for 2007, 2014, and 2021 on the Google Earth Engine (GEE) cloud computing platform: random forest (RF), classification and regression trees (CART), and support vector machine (SVM). Of the three, RF performed best for classifying built-up areas as well as all other classes. The prediction results revealed that proximity to built-up areas and population growth dominate LULC modeling over physical factors such as elevation and slope. The historical data showed a 351% increase in built-up areas over 2007–2021, with corresponding declines of 12% in forest and 36% in water areas. Future predictions show the built-up class increasing by 11 to 38% over 2028–2070, while forested areas are expected to decline by 4 to 16%. 
The overall findings of the present study suggested that the BB basin, despite being primarily agricultural with a significant forest cover, is undergoing rapid expansion of built-up areas through the encroachment of agricultural and forested lands, which could have far-reaching implications for the region’s ecosystem services and sustainability.
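Change figures such as the 351% growth in built-up area come from comparing per-class areas in two classified maps. The following NumPy sketch shows that arithmetic under stated assumptions. The class codes (0 = water, 1 = built-up, 2 = forest) and the function names are hypothetical. The default pixel area of 0.0009 km² corresponds to a 30 m Landsat pixel.

```python
import numpy as np

def class_areas(label_map, n_classes, pixel_area_km2=0.0009):
    """Area per class in km²; 0.0009 km² is one 30 m x 30 m pixel."""
    counts = np.bincount(np.asarray(label_map).ravel(), minlength=n_classes)
    return counts * pixel_area_km2

def percent_change(before, after):
    """Relative change in percent; classes absent in `before` give inf/nan."""
    return 100.0 * (after - before) / before

# Hypothetical 2x2 maps; class codes 0 = water, 1 = built-up, 2 = forest
before = class_areas(np.array([[0, 1], [2, 2]]), 3, pixel_area_km2=1.0)
after = class_areas(np.array([[1, 1], [1, 2]]), 3, pixel_area_km2=1.0)
change = percent_change(before, after)
```

In this toy case, built-up area triples, which is a 200% increase. A 351% increase means the final area is 4.51 times the initial one.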
Article
Rapid yearly population growth is causing hazards to spread swiftly around the world, with detrimental effects on both human life and the global economy. By enabling accurate early prediction, remote sensing helps safeguard the globe against weather-related threats and natural disasters. Convolutional neural networks (CNNs), a form of deep learning, have recently been used to reliably identify land use in remote sensing images. This work proposes a novel architecture for land use recognition from remote sensing images that fuses bottleneck residual and self-attention features. First, a fast neural style transfer approach is used to generate satellite images with cloud effects. Within the style transfer step, a 5-layer residual-block CNN estimates the loss of the neural-style images. Next, two novel architectures are proposed for classifying land use images: a 3-layer bottleneck CNN and a 3-layer bottleneck self-attention CNN. Both architectures are trained on the original dataset and on the neural-style generated dataset. Features are then extracted from the deep layers and merged with a novel serial fusion approach based on weighted entropy. A novel Chimp Optimization technique then refines the fused features by removing redundant and superfluous data. Finally, the selected features are classified with neural network classifiers. The experiments yielded accuracies of 99.0% and 99.4% on the two datasets. Compared with state-of-the-art (SOTA) methods, the proposed framework showed improved precision and accuracy.
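The abstract does not specify how self-attention is placed inside the bottleneck CNN. For orientation, the following NumPy sketch shows generic single-head scaled dot-product self-attention over a CNN feature map, where each spatial position is flattened into a token. This is the standard formulation, not the authors' layer. The projection matrices `wq`, `wk`, and `wv` are hypothetical learned parameters.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model), e.g. an (H, W, C) feature map reshaped to (H*W, C).
    wq, wk, wv: (d_model, d_k) projection matrices (hypothetical learned weights).
    Returns the attended features and the (n_tokens, n_tokens) attention weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 8)).reshape(16, 8)   # 4x4 map, 8 channels
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out, attn = self_attention(fmap, *w)
```

When `d_k` equals the channel count, the output can be added back to the input as a residual connection. This is one common way of combining attention with residual CNN blocks.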
Article
Semantic-level land-use scene classification is a challenging problem for which deep learning methods, such as convolutional neural networks (CNNs), have shown remarkable capacity. However, the lack of sufficient labeled images has hindered further gains in the land-use scene classification accuracy of CNNs. To address this problem, this paper proposes a CNN pre-training method guided by a human visual attention mechanism. Specifically, a computational visual attention model automatically extracts salient regions from unlabeled images. Sparse filters then learn features from these salient regions, and the learned parameters initialize the convolutional layers of the CNN. Finally, the CNN is fine-tuned on labeled images. Experiments were performed on the UCMerced and AID datasets. Combined with a simple demonstration CNN, the method achieves 2.24% higher accuracy than a plain CNN. Combined with AlexNet, it reaches an overall accuracy of 92.43%. The results indicate that the proposed method can effectively improve CNN performance using easy-to-access unlabeled images. It should therefore enhance land-use scene classification, especially when no large-scale labeled dataset is available.
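The abstract says only that "sparse filters" learn features from salient regions. One widely used formulation with that name is the sparse filtering objective of Ngiam et al. (2011). The following NumPy sketch evaluates that objective, assuming it is the variant meant here. The filter matrix `w` is what an optimizer such as L-BFGS would adjust, and its rows would later initialize the first convolutional layer. The optimization loop itself is omitted.

```python
import numpy as np

def sparse_filtering_objective(w, x, eps=1e-8):
    """Sparse filtering objective (to be minimized).

    w: (n_features, n_inputs) filters; x: (n_inputs, n_examples), e.g.
    flattened patches cropped from salient regions.
    """
    f = w @ x
    f = np.sqrt(f ** 2 + eps)                          # soft absolute value
    f = f / np.linalg.norm(f, axis=1, keepdims=True)   # normalize each feature across examples
    f = f / np.linalg.norm(f, axis=0, keepdims=True)   # normalize each example across features
    return np.abs(f).sum()                             # L1 penalty favors sparse, spread-out features

rng = np.random.default_rng(0)
x = rng.standard_normal((25, 100))    # e.g. 100 flattened 5x5 patches
w = rng.standard_normal((16, 25))     # 16 candidate filters
obj = sparse_filtering_objective(w, x)
```

After the final normalization, each example column has unit L2 norm. Each column therefore contributes between 1 (fully sparse) and the square root of the number of features (fully dense) to the objective.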
Article
This paper describes a deep learning approach to semantic segmentation of very high resolution (aerial) images. Deep neural architectures hold the promise of end-to-end learning from raw images, making heuristic feature design obsolete. Over the last decade this idea has seen a revival, and in recent years deep convolutional neural networks (CNNs) have emerged as the method of choice for a range of image interpretation tasks like visual recognition and object detection. Still, standard CNNs do not lend themselves to per-pixel semantic segmentation, mainly because one of their fundamental principles is to gradually aggregate information over larger and larger image regions, making it hard to disentangle contributions from different pixels. Very recently, two extensions of the CNN framework have made it possible to trace the semantic information back to a precise pixel position. Deconvolutional network layers undo the spatial downsampling, and Fully Convolutional Networks (FCNs) modify the fully connected classification layers of the network so that the location of individual activations remains explicit. We design an FCN that takes intensity and range data as input and, with the help of aggressive deconvolution and recycling of early network layers, converts them into a pixelwise classification at full resolution. We discuss design choices and intricacies of such a network, and demonstrate that an ensemble of several networks achieves excellent results on challenging data such as the ISPRS semantic labeling benchmark, using only the raw data as input.
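The "deconvolutional" layers that undo spatial downsampling are usually implemented as transposed convolutions. The following single-channel NumPy sketch shows the mechanism: each input pixel scatters a scaled copy of the kernel into a larger output, and overlapping contributions are summed. This is a generic illustration and does not reproduce the paper's layer configuration. The helper name `conv_transpose2d` is hypothetical, and padding is omitted.

```python
import numpy as np

def conv_transpose2d(x, k, stride=2):
    """Single-channel 2D transposed convolution without padding.

    Output size is ((H-1)*stride + kh, (W-1)*stride + kw).
    """
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Scatter the kernel, weighted by the input pixel, into the output
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])
up2 = conv_transpose2d(x, np.ones((2, 2)))   # 2x2 -> 4x4, no overlap
up3 = conv_transpose2d(x, np.ones((3, 3)))   # 2x2 -> 5x5, overlaps summed
```

With a 2×2 all-ones kernel and stride 2, the layer reduces to nearest-neighbor upsampling. Learned kernels let the network interpolate class evidence more flexibly. An ensemble of such networks can be combined by averaging their per-pixel class probabilities before taking the argmax.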
Article
With the development of deep learning, supervised learning with convolutional networks has frequently been adopted to classify remotely sensed images. However, supervised learning is often difficult to carry out because only a limited amount of labeled data is available. We therefore propose an unsupervised model, multiple-layer feature-matching generative adversarial networks (MARTA GANs), that learns a representation using only unlabeled data. MARTA GANs consist of a generative model G and a discriminative model D, and D is treated as a feature extractor. To fit the complex properties of remote sensing data, a fusion layer merges the mid-level and global features. G can produce numerous images similar to the training data, so D can learn better representations of remotely sensed images from the training data provided by G. Classification results on two widely used remote sensing image databases show that the proposed method significantly improves classification performance compared with other state-of-the-art methods.
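"Feature matching" commonly refers to training G to match the mean discriminator activations of real and generated batches rather than to maximize D's output directly. The following NumPy sketch shows that loss summed over several discriminator layers. It is an assumption about the general form only. The exact layers, any per-layer weights, and the fusion layer used in MARTA GANs are not given in the abstract and are not reproduced here.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Multi-layer feature-matching loss for the generator.

    real_feats, fake_feats: lists of arrays (one per chosen discriminator
    layer), each shaped (batch, ...). Per layer, the loss is the squared L2
    distance between batch-mean activations; layers are summed unweighted.
    """
    loss = 0.0
    for fr, ff in zip(real_feats, fake_feats):
        mean_real = fr.reshape(len(fr), -1).mean(axis=0)
        mean_fake = ff.reshape(len(ff), -1).mean(axis=0)
        loss += float(np.sum((mean_real - mean_fake) ** 2))
    return loss

# Hypothetical activations from a mid-level and a global layer
real = [np.ones((4, 3)), 2.0 * np.ones((4, 2, 2))]
fake = [np.zeros((4, 3)), np.zeros((4, 2, 2))]
loss = feature_matching_loss(real, fake)
```

Matching statistics at more than one depth asks G to reproduce both local texture (mid-level layers) and overall scene layout (global layers).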
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
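The top-1 and top-5 error rates quoted above count a prediction as correct when the true label is among the k highest-scoring classes. The following NumPy sketch computes that metric. The helper name `top_k_error` is hypothetical, and ties between scores are broken arbitrarily by `argsort`.

```python
import numpy as np

def top_k_error(logits, labels, k=5):
    """Fraction of samples whose true label is not among the k top-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]            # indices of the k largest scores
    hit = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return 1.0 - hit.mean()

logits = np.array([[0.1, 0.5, 0.4],
                   [0.7, 0.2, 0.1]])
labels = np.array([2, 1])
err1 = top_k_error(logits, labels, k=1)   # both argmax predictions are wrong
err2 = top_k_error(logits, labels, k=2)   # both true labels are in the top 2
```

Top-5 error is always less than or equal to top-1 error. This is why the quoted top-5 error rates (17.0%, 15.3%) are much lower than the top-1 rate (37.5%).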
Article
Remote sensing image scene classification plays an important role in a wide range of applications and has therefore received remarkable attention. In recent years, significant efforts have been made to develop datasets and to propose a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature on datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have several limitations: a small number of scene classes and images, a lack of image variation and diversity, and saturated accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of recent progress. We then propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC) created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and the total number of images, (ii) exhibits large variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated on the proposed dataset, and the results are reported as a useful baseline for future research.
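Because NWPU-RESISC45 is balanced (45 classes × 700 images = 31,500), benchmark splits are naturally drawn per class, so every class keeps the same train/test ratio. The following NumPy sketch shows such a stratified split. The `train_ratio` value and the function name are hypothetical illustration choices, not a protocol stated in the abstract.

```python
import numpy as np

N_CLASSES, PER_CLASS = 45, 700          # layout stated for NWPU-RESISC45

def stratified_split(labels, train_ratio, seed=0):
    """Split sample indices so that each class contributes train_ratio of its images."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_ratio * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

labels = np.repeat(np.arange(N_CLASSES), PER_CLASS)   # 31,500 class labels
train, test = stratified_split(labels, train_ratio=0.2)  # hypothetical ratio
```

With a 0.2 ratio, each class contributes 140 training and 560 test images. Using a fixed seed keeps the split reproducible across methods being compared.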
Article
The increasing spatial resolution of remote-sensing sensors helps capture abundant details related to the semantics of surface objects. However, popular object-oriented classification approaches struggle to acquire higher-level semantics from high spatial resolution remote-sensing (HSR-RS) images, a problem often referred to as the "semantic gap." Instead of relying on sophisticated hand-designed operators, convolutional neural networks (CNNs), a typical deep learning method, can automatically discover intrinsic feature descriptors from a large number of input images and so bridge the semantic gap. Because the available HSR-RS scene datasets are far smaller than natural scene datasets, there have been few reports of CNN approaches for HSR-RS image scene classification. We propose a practical CNN architecture for HSR-RS scene classification, named the large patch convolutional neural network (LPCNN). Large patch sampling generates hundreds of possible scene patches for feature learning, and a global average pooling layer replaces the fully connected network as the classifier, which greatly reduces the total number of parameters. The experiments confirm that the proposed LPCNN learns effective local features that form an effective representation of different land-use scenes, and achieves performance comparable to the state of the art on public HSR-RS scene datasets. © 2016 Society of Photo-Optical Instrumentation Engineers (SPIE).
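The parameter savings from replacing the fully connected classifier with global average pooling (GAP) can be checked with simple arithmetic. The following NumPy sketch assumes an AlexNet-like last convolutional output of 256 channels at 6×6 and the 21 classes of UC Merced. These sizes are illustrative assumptions, not the LPCNN configuration. Here GAP is followed by a single linear layer. In another common variant, the last convolution outputs one channel per class, and GAP feeds the softmax directly with no dense weights at all.

```python
import numpy as np

def global_average_pool(fmap):
    """(channels, height, width) feature map -> (channels,) vector."""
    return fmap.mean(axis=(1, 2))

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

C, H, W, N_CLASSES = 256, 6, 6, 21      # illustrative sizes (assumption)

# AlexNet-style head: flatten -> 4096 -> 4096 -> classes
fc_head = (dense_params(C * H * W, 4096) + dense_params(4096, 4096)
           + dense_params(4096, N_CLASSES))
# GAP head: average each channel, then one linear layer to the classes
gap_head = dense_params(C, N_CLASSES)

pooled = global_average_pool(np.ones((C, H, W)))
```

Under these assumptions, the fully connected head holds about 54.6 million parameters, while the GAP head holds 5,397. This is the kind of reduction that makes training feasible on small HSR-RS datasets.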
Article
Land-use classification of very high spatial resolution remote sensing (VHSR) imagery is one of the most challenging tasks in remote sensing image processing. Because land-use scenes are complex, land-use classification is hard to address with land-cover classification techniques. Scene classification is considered a promising way to address this issue. The scene classification methods commonly used for VHSR imagery all derive from the computer vision community, which mainly deals with terrestrial image recognition. Unlike terrestrial images, VHSR images are captured looking down from airborne and spaceborne sensors, which leads to distinctive lighting conditions and spatial configurations of land cover. Given these characteristics, two questions should be answered: (1) Which type or combination of information is suitable for VHSR imagery scene classification? (2) Which scene classification algorithm is best for VHSR imagery? In this paper, an efficient spectral–structural bag-of-features scene classifier (SSBFC) is proposed to combine the spectral and structural information of VHSR imagery. SSBFC uses first- and second-order statistics (the mean and standard deviation values, MeanStd) as the statistical spectral descriptor, and dense scale-invariant feature transform (SIFT) as the structural feature descriptor. The experimental results show that spectral information works better than structural information, and that combining the two works better than either type alone. To account for the spatial configuration of VHSR scenes, SSBFC uses the whole image scene as the scope of the pooling operator, instead of the scope generated by a spatial pyramid (SP) commonly used in terrestrial image classification. 
The experimental results show that using the whole image as the pooling scope performs better than the scope generated by SP. In addition, SSBFC codes and pools the spectral and structural features separately to avoid mutual interference between them. The coding vectors of the spectral and structural features are then concatenated into a final coding vector. Finally, SSBFC classifies the final coding vector with a support vector machine (SVM) using a histogram intersection kernel (HIK). Experiments on three VHSR datasets show that SSBFC outperforms the latest scene classification methods on VHSR image scenes.
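Two pieces of SSBFC are simple to express directly: the MeanStd spectral descriptor and the histogram intersection kernel used by the SVM. The following NumPy sketch shows both. The coding and pooling stages (the bag-of-features step and dense SIFT) are omitted, and the function names are hypothetical. A Gram matrix like this one can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.

```python
import numpy as np

def mean_std_descriptor(image):
    """Per-band mean and standard deviation of an (H, W, bands) image."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def histogram_intersection_kernel(a, b):
    """Gram matrix K[i, j] = sum_k min(a[i, k], b[j, k]) for nonnegative rows."""
    return np.minimum(a[:, None, :], b[None, :, :]).sum(axis=2)

image = np.zeros((2, 2, 2))
image[..., 0] = [[1, 2], [3, 4]]   # band 0 varies
image[..., 1] = 5                  # band 1 is constant
desc = mean_std_descriptor(image)

K = histogram_intersection_kernel(np.array([[1.0, 2.0, 0.0]]),
                                  np.array([[2.0, 1.0, 3.0], [0.0, 0.0, 0.0]]))
```

HIK measures how much two histograms overlap bin by bin. This suits the nonnegative, histogram-like coding vectors produced by bag-of-features pooling.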
Article
Deep learning methods such as convolutional neural networks (CNNs) can deliver highly accurate classification results when provided with large enough datasets and corresponding labels. However, using CNNs with limited labeled data can be problematic, because it leads to extensive overfitting. In this letter, we propose a novel method that takes a CNN pretrained for an entirely different classification problem, the ImageNet challenge, and uses it to extract an initial set of representations. The derived representations are then transferred, along with their class labels, into a supervised CNN classifier, which effectively trains the system. This two-stage framework addresses the limited-data problem within an end-to-end processing scheme. Comparative results on the UC Merced Land Use benchmark show that our method significantly outperforms the best previously reported results, improving overall accuracy from 83.1% to 92.4%. Beyond these statistical improvements, the method introduces a novel feature fusion algorithm that handles the large data dimensionality with a simple and computationally efficient approach.
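The core idea above is to reuse representations from an ImageNet-pretrained CNN and train only a small supervised classifier on top. The simplest instance of that idea is a "linear probe": a softmax classifier fitted to frozen features. The following NumPy sketch shows such a probe. It is a swapped-in simplification, not the letter's two-stage CNN or its feature fusion algorithm. The synthetic 2-D clusters stand in for real CNN features, which in practice might be the 4096-dimensional penultimate activations of AlexNet. The hyperparameters are arbitrary.

```python
import numpy as np

def train_softmax(features, labels, n_classes, lr=0.1, epochs=200, l2=1e-4):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = features.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                       # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (features.T @ grad + l2 * w)
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(features, w, b):
    return np.argmax(features @ w + b, axis=1)

# Synthetic stand-in for frozen CNN features: three well-separated clusters
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.3 * rng.standard_normal((150, 2))
w, b = train_softmax(X, y, n_classes=3)
acc = (predict(X, w, b) == y).mean()
```

Only `w` and `b` are learned here, so the number of trainable parameters is tiny compared with fine-tuning the whole network. This limits overfitting when labeled images are scarce.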