Deep Learning for Satellite Image Classification

January 2019

DOI:10.1007/978-3-319-99010-1_35

In book: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018 (pp.383-391)

Authors:

Mayar Aly Shafaey

National Egyptian E-Learning University

Mohammed Abdel-Megeed Mohammed Salem

The German University in Cairo

Hala Mousher Ebied

Faculty of Computers and Information Sciences Ain Shams University

Maryam Al-berry

Ain Shams University

Mayar A. Shafaey 1(✉), Mohammed A.-M. Salem 1,2, H. M. Ebied, M. N. Al-Berry, and M. F. Tolba

1 Faculty of Computers and Information Sciences, Ain Shams University, Cairo, Egypt
mayar.al.mohamed@fcis.asu.edu.eg, {salem,maryam_nabil}@cis.asu.edu.eg, hala.m@outlook.com, fahmytolba@gmail.com

2 Faculty of Media Engineering and Technology, German University, Cairo, Egypt

Abstract. Large amounts of high-resolution remote-sensing images are now acquired daily, and satellite image classification is required by many applications, such as modern city planning, agriculture, and environmental monitoring. Many researchers have studied this domain, but a sufficient and optimal level of performance has not yet been reached. Hence, this article evaluates the publicly available remote-sensing datasets and the common techniques used for satellite image classification. The existing remote-sensing classification methods are grouped into four main categories according to the features they use: manual feature-based methods, unsupervised feature learning methods, supervised feature learning methods, and object-based methods. In recent years, supervised deep learning methods have become very popular in various remote-sensing applications, such as geospatial object detection and land use scene classification. Thus, the experiments in this article are carried out with one of the popular deep learning models, Convolutional Neural Networks (CNNs), specifically the AlexNet architecture, on a well-established standard dataset, UC-Merced Land Use. Finally, a comparison with other techniques is presented.

Keywords: Remote sensing · Satellite image · Deep learning · Convolutional Neural Networks (CNNs) · UC-Merced Land Use · Parallel computing

1 Introduction

A satellite image is an image of all or part of the Earth taken by an artificial satellite. Such images may be visible-light, water-vapor, or infrared images [1]. Different types of satellites produce images of high spatial, spectral, and temporal resolution that cover the whole Earth in less than a day. The large-scale nature of these datasets introduces new challenges in image analysis.

The analysis and classification of remote-sensing images is very important in many practical applications, such as natural hazard and geospatial object detection, precision agriculture, urban planning, vegetation mapping, and military monitoring [2]. Despite decades of research, the degree of automation in remote-sensing image analysis remains low [3].

© Springer Nature Switzerland AG 2019
A. E. Hassanien et al. (Eds.): AISI 2018, AISC 845, pp. 383–391, 2019.
https://doi.org/10.1007/978-3-319-99010-1_35

The main objective of this paper is to present a literature review of recent deep-learning-based techniques for satellite image classification and of the available training and testing datasets. Moreover, test results are presented on one popular dataset using the AlexNet architecture of Convolutional Neural Networks (CNNs).

Section 2 lists the available datasets and their specifications. Section 3 reviews recent classification approaches applied to one or more of these datasets. The experimental work, followed by results and evaluation, is presented in Sect. 4. Finally, conclusions are given in Sect. 5.

2 Review of Publicly Available Remote-Sensing Image Datasets

In recent years, several high-resolution remote-sensing image datasets have been introduced by different groups to enable machine-learning-based research on scene classification and to evaluate different methods in this field. This section reviews some publicly available datasets, as listed in Table 1. The table gives the number of scene classes, images per class, total images, image size, and spatial resolution of each dataset.

Most images in these datasets are imported from Google Earth Engine and cover scene classes such as agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and so on. The exception is the Brazilian Coffee Scene dataset [13], which is cropped from SPOT satellite images and contains only two scene classes, making it poorly suited to multi-class scene classification methods. In contrast, the large number of classes and images in the NWPU-RESISC45 dataset [14] will have a positive impact on classification results.

Table 1. Comparison of the different remote-sensing datasets

Dataset                  Scene classes  Images/class  Total images     Spatial resolution (m)  Image size (pixels)
AID [4]                  30             200–400       10000            High                    600 × 600
PatternNet [5]           38             800           30400            Up to 0.8               256 × 256
RSI-CB256 [6]            35             Various       34000            0.3–3                   256 × 256
SAT_4 & SAT_6 [7]        –              Patches       500000 + 405000  Low                     28 × 28
UC-Merced Land Use [8]   21             100           2100             0.3                     256 × 256
WHU-RS19 [9]             19             ~50           1005             Up to 0.5               600 × 600
SIRI-WHU [10]            12             200           2400             2                       200 × 200
RSSCN7 [11]              7              400           2800             –                       400 × 400
RSC11 [12]               11             ~100          1232             0.2                     512 × 512
Brazilian Coffee [13]    2              1438          2876             Low                     64 × 64
NWPU-RESISC45 [14]       45             700           31500            ~30–0.2                 256 × 256

The UC-Merced Land Use dataset [8], shown in Fig. 1, is the most popular and has so far been the most widely used for remote-sensing image scene classification and retrieval. The authors therefore chose it for the classification experiment.

3 Remote-Sensing Image Classification Methods

Over the last and current decades, extensive research has been carried out on scene classification of satellite images. Across the vast body of publications on this topic, the existing scene classification methods can generally be grouped into four main categories according to the features they use: manual feature-based methods, unsupervised classification methods, supervised learning methods, and object-based methods.

3.1 Manual Feature-Based Methods

A fundamental approach to image classification is based on handcrafted features. These methods depend on the skill of researchers in designing and extracting important features, such as color, orientation, texture, shape, spatial and spectral information, or combinations of these. Some of the most common and essential features used for scene classification are color histograms; texture descriptors; GIST, which describes the orientations of a scene; SIFT, which describes sub-regions of a scene; and HOG, which describes the gradients of objects [15–17, 40].
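As an illustration of what a handcrafted feature looks like in practice, a color histogram can be computed in a few lines. The following is a minimal Python/NumPy sketch, not the implementation used in the cited works; the function name `color_histogram`, the choice of 8 bins per channel, and the random test image are assumptions made for this example.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Return a normalized per-channel color histogram.

    image: uint8 array of shape (H, W, 3).
    bins:  number of bins per channel (an assumed, illustrative value).
    The descriptor has length 3 * bins and sums to 1.
    """
    image = np.asarray(image)
    parts = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        parts.append(hist)
    features = np.concatenate(parts).astype(np.float64)
    return features / features.sum()

# Hypothetical 256 x 256 RGB image standing in for a dataset sample.
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
descriptor = color_histogram(sample)
print(descriptor.shape)  # (24,)
```

A fixed-length vector of this kind could then be passed to any conventional classifier; the design effort lies in choosing which such descriptors to compute.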

Fig. 1. Representative images [(a)–(u)] of the 21 classes of the UC-Merced Land Use dataset [34].

3.2 Unsupervised Classification Methods

The limitations of manual feature-based methods can be overcome by learning features automatically from images. This strategy is called unsupervised feature learning. In recent years, unsupervised feature learning from unlabeled input data has become an attractive alternative to handcrafted features [18].

The idea behind this strategy is to first group the image pixels into clusters based on their properties. By learning features from images instead of relying on manually designed features, more discriminative features can be obtained that are better suited to the classification problem [19]. Typical unsupervised feature learning methods include principal component analysis (PCA) [20], k-means clustering [21], sparse coding [22], and so on.
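The pixel-grouping idea can be sketched with plain k-means. The Python/NumPy code below is an illustrative sketch only: the helper names (`farthest_point_init`, `assign`, `kmeans`), the deterministic farthest-point initialization, and the synthetic "dark" and "bright" pixels are assumptions made for the example, and the cited works cluster richer patch descriptors rather than raw colors.

```python
import numpy as np

def farthest_point_init(points, k):
    """Deterministic initialization (an assumption of this sketch): start from
    the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    return np.array(centers, dtype=np.float64)

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k, iterations=10):
    """Lloyd's k-means on an (N, D) array. Returns (centers, labels)."""
    points = np.asarray(points, dtype=np.float64)
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

# Synthetic RGB pixels: 100 dark ones followed by 100 bright ones.
rng = np.random.default_rng(1)
dark = rng.normal(30.0, 5.0, size=(100, 3))
bright = rng.normal(200.0, 5.0, size=(100, 3))
pixels = np.vstack([dark, bright])
centers, labels = kmeans(pixels, k=2)
print(np.bincount(labels))  # number of pixels in each cluster
```

No labels are used at any point; the two groups emerge from the pixel values alone, which is what distinguishes this family of methods from the supervised ones in Sect. 3.3.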

In real applications, these unsupervised feature learning methods have achieved good performance for land use classification, especially compared with handcrafted-feature-based methods. For example, the authors of [23–25] applied unsupervised methods and made significant progress in remote-sensing scene classification.

3.3 Supervised Learning Methods

Since 2006, a surge of research has relied on supervised learning methods, which use labeled data to extract more powerful features, especially the deep learning method introduced by Hinton and Salakhutdinov [26]. A number of deep learning models exist, such as deep belief nets (DBN) [27], deep Boltzmann machines (DBM) [28], stacked auto-encoders (SAE) [29], Convolutional Neural Networks (CNNs) [30], and so on. This article mainly reviews the widely used deep learning method, CNNs.

The basic concept of a CNN is to train a large multi-layer network that gives impressive classification results on large-scale input images. CNNs come in different architectures, such as AlexNet, GoogLeNet, ResNet, VGGNet, CaffeNet, etc. [31]. For reasons of space, only a short description of the AlexNet architecture is given here. The network consists of 25 layers, including 5 convolutional layers, max-pooling layers, dropout layers, and 3 fully connected layers, as shown in Fig. 2. It is trained on ImageNet data, which contains over 15 million annotated images from over 22,000 categories, and it uses ReLU as the nonlinearity [32].
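Two pieces of this description lend themselves to a short worked example: the ReLU nonlinearity and the way a convolution or pooling layer shrinks the spatial size of its input. The Python sketch below is illustrative only; the helper names are hypothetical, the kernel sizes and strides of the first AlexNet stage (11 × 11 with stride 4, then 3 × 3 max-pooling with stride 2) are taken from [32], and the 227 × 227 input size is the one quoted in Sect. 4.1.

```python
import numpy as np

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(x, 0)

# First AlexNet stage: 11 x 11 convolution with stride 4, no padding,
# followed by 3 x 3 max-pooling with stride 2.
after_conv1 = conv_output_size(227, kernel_size=11, stride=4)
after_pool1 = conv_output_size(after_conv1, kernel_size=3, stride=2)
print(after_conv1, after_pool1)                   # 55 27
print(relu(np.array([-2.0, 0.0, 3.5])).tolist())  # [0.0, 0.0, 3.5]
```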

Fig. 2. ImageNet classification with the AlexNet CNN [32]

Table 2 lists some authors who used CNN models in their experiments on large-scale image scene classification and reported high accuracy values, which demonstrate the power of the CNN learning model.

3.4 Object-Based Methods

Unlike pixel-based or image-based classification, object-based image classification groups pixels into representative shapes and sizes and assigns each group to a semantic object. This process relies on multi-resolution segmentation, which produces homogeneous objects by grouping pixels and generates objects at different scales in an image simultaneously. These objects are more meaningful because they represent features in the image [38, 41].

The question is then how to select an appropriate image classification technique. The choice relies largely on engineering judgment. Suppose you want to classify water in a high-spatial-resolution image that also contains grass. You decide to select all pixels with a low NDVI (Normalized Difference Vegetation Index) in that image. NDVI is used to analyze remote-sensing measurements and assess whether the observed target contains live green vegetation. But this rule could also misclassify other pixels in the image that are not water, e.g. pixels of the sky. For this reason, pixel-based classification, whether unsupervised or supervised, gives a salt-and-pepper look.
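The thresholding rule in this example can be made concrete. NDVI is defined as (NIR − Red) / (NIR + Red), where NIR and Red are the near-infrared and red reflectances. The Python/NumPy sketch below is illustrative only: the reflectance values, the 0.1 threshold, and the function name `ndvi` are assumptions chosen for the example, not values from the cited sources.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index, computed per pixel.

    eps guards against division by zero where both bands are 0.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

# Hypothetical 2 x 2 reflectance patches (values are made up for illustration).
nir = np.array([[0.50, 0.05],
                [0.45, 0.30]])
red = np.array([[0.08, 0.10],
                [0.10, 0.28]])

index = ndvi(nir, red)
low_ndvi_mask = index < 0.1  # assumed threshold for "low NDVI"
print(np.round(index, 3))
print(low_ndvi_mask)
```

In this toy patch the two left-hand pixels have high NDVI (vegetation), while both right-hand pixels fall below the threshold. Only the top-right one has the negative NDVI typical of water; the bottom-right pixel is merely non-vegetated, yet a per-pixel threshold labels it the same way, which is the kind of misclassification described above.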

As this discussion illustrates, spatial resolution is an important factor when selecting an image classification technique. With low spatial resolution, both traditional pixel-based and object-based image classification techniques perform well. With high spatial resolution, object-based image classification is superior to traditional pixel-based classification [39].

4 Experiments and Results

Taking advantage of the availability of the UC-Merced dataset [8], the AlexNet CNN approach was applied to demonstrate the large-scale image classification process. This section describes the experiment: the software and hardware specifications, results, comments, and comparisons.

Table 2. Survey of recent publications that applied CNNs in experiments on large-scale remote-sensing (RS) images from the UC-Merced dataset

Reference  Year  Application                             Method                  Accuracy
[33]       2015  Multi-spectral land use classification  Deep CNN                93.48%
[34]       2015  Land use RS classification              GoogLeNet and CaffeNet  97% and 95.48%
[35]       2016  RS scene classification                 Large patch CNN         Effective results
[36]       2016  Large scale image classification        CNN                     92.4%
[37]       2018  Remote sensing scene classification     CNN                     92.43%

4.1 Experimental Procedure

The experiment was run on two computers. Machine 1 has an Intel® Core™ i7-2670QM CPU @ 2.20 GHz with 8 GB of RAM. Machine 2 has an Intel® Core™ i7-7700HQ CPU @ 2.20 GHz with 16 GB of RAM and an NVIDIA GTX 1050 GPU (4 GB, compute capability 6.1). The elapsed time was 1800 s on Machine 1 and 14 s on Machine 2. The Graphics Processing Unit (GPU) thus gave a substantial reduction in execution time: parallel computing improved performance by a factor of more than 100 (1800 s / 14 s ≈ 129) over serial computation.

Hence, the experiment was run on Machine 2 in MATLAB using the built-in alexnet() function, which returns a network trained on a subset of the ImageNet database (~1.2 million images) that can classify images into 1000 object categories. This function requires the Neural Network Toolbox™ Model for AlexNet Network. The procedure has three basic steps. First, the images are resized from 256 × 256 to 227 × 227, the input size required by the CNN. Second, the training set percentage is chosen. Third, a multiclass SVM classifier is trained on the training features, features are extracted from the test images using the CNN, and these are passed to the trained classifier to obtain the predicted labels. Finally, the classification accuracy is computed as the sum of the main diagonal of the confusion matrix divided by the number of diagonal elements.
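The final accuracy computation can be sketched as follows. For the sum of the diagonal divided by the number of diagonal elements to be an accuracy, the confusion matrix must hold per-class fractions; the Python/NumPy sketch below therefore assumes each row is normalized by the number of test images of that class, which yields the mean per-class accuracy. That normalization, the function names, and the toy labels are assumptions of this sketch, not details stated in the text.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count matrix: entry (i, j) is the number of class-i samples predicted as j."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

def mean_class_accuracy(matrix):
    """Sum of the diagonal of the row-normalized matrix divided by the number
    of classes. Assumes every class has at least one test sample."""
    row_sums = matrix.sum(axis=1, keepdims=True)
    normalized = matrix / row_sums
    return np.trace(normalized) / matrix.shape[0]

# Toy example with 3 classes and 4 test samples per class (made-up labels).
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predicted_labels = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 0, 1]

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
accuracy = mean_class_accuracy(cm)
print(cm)
print(accuracy)  # 0.75
```

When every class has the same number of test images, as in the per-class splits used here, this value coincides with the overall fraction of correctly classified test images.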

A number of experiments were carried out to assess the performance of the CNN using the well-known UC-Merced Land Use dataset. The dataset contains 2100 images in 21 distinct classes, with 100 different images per class. In the experiments, the size of the training set ranged from 10% to 90% of the 100 images per class, and the remaining images were used for testing. Figure 3 shows the classification accuracy versus the training set percentage.
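A per-class split of this kind can be sketched in Python/NumPy as below. This is an illustrative sketch, not the authors' MATLAB code: the function name `split_per_class`, the fixed random seed, and the use of integer class labels in place of image files are assumptions made for the example.

```python
import numpy as np

def split_per_class(labels, train_fraction, seed=0):
    """Randomly split sample indices so that each class contributes the same
    fraction of its samples to the training set. Returns (train_idx, test_idx)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_train = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# 21 classes x 100 images per class, matching the dataset description above.
labels = np.repeat(np.arange(21), 100)

for fraction in (0.1, 0.5, 0.9):
    train_idx, test_idx = split_per_class(labels, fraction)
    print(fraction, len(train_idx), len(test_idx))
# 0.1 210 1890
# 0.5 1050 1050
# 0.9 1890 210
```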

[Figure: plot of classification accuracy (%) versus training-to-testing ratio, x-axis from 0 to 1]

Fig. 3. Classification accuracy for the UC-Merced Land Use dataset using the AlexNet CNN. The x-axis represents the training-to-testing set ratio over the interval [0.1–0.9]. The y-axis represents the classification accuracy.

The first trial assigned 10% of the images to the training set, which gave an accuracy of 81.3%. The experiment was then repeated eight more times, up to 90% of the images in the training set, which gave an accuracy of around 94%. Figure 3 illustrates that gradually increasing the number of training images has a positive impact on the classification result.

4.2 Evaluation and Discussions

Compared with the other CNN models, GoogLeNet and CaffeNet, discussed in [34] and also applied to the UC-Merced dataset, the authors observed that the classification accuracy achieved by GoogLeNet (~97%) was better than that achieved by CaffeNet (~94%) and AlexNet (~94%). However, AlexNet is faster than GoogLeNet. Both models were run on the same GPU described above: AlexNet executed in only 14 s, whereas GoogLeNet took 51 s, which is approximately 4 times slower (51/14 ≈ 3.6).

On the one hand, in comparison with traditional handcrafted features, which require considerable expertise and design effort, deep learning features are learned from data automatically by deep neural network architectures. This is the key advantage of deep learning methods.

On the other hand, compared with the aforementioned unsupervised feature learning methods, e.g. sparse coding, deep learning models can learn more powerful features because they are composed of multiple processing layers, which makes them better suited to large-scale remote-sensing image scene classification. Deep feature learning methods act like a human brain, in which every level uses the information from the previous level to learn deeply and accurately.

The following articles support these findings. In [23], high-resolution satellite scene classification using sparse coding was carried out on the UC-Merced dataset and reached about 91% accuracy. In [24], unsupervised feature learning via spectral clustering of multidimensional patches was carried out on the same dataset and achieved 90% correct classification.

5 Conclusions

Automatic target detection or recognition and high-resolution remotely sensed image classification are two hot topics nowadays. Hence, this paper first presented a comprehensive review of common, freely available remote-sensing datasets to enable the community to advance the large-scale image scene classification task. It then summarized recent methods used for this task. Finally, the CNN deep learning method was applied to the UC-Merced dataset and evaluated, and the results were reported for comparison against the state of the art and as a baseline for future research.

Deep learning methods can undoubtedly offer better feature representations for the related remote-sensing tasks, and there is a bright prospect of seeing more and more researchers dedicated to learning better features for target detection and scene classification by using appropriate deep learning methods.

Thanks to parallel computing on GPUs, which reduced the execution time by a factor of about 100 compared with serial computation, our experiment ran in no more than 14 s to classify one testing image out of 2100 images.

References

1. NASA: What Is a Satellite? NASA Knows! (Grades 5–8) series (2014)

2. Zhang, L., Xia, G., Wu, T., Lin, L., Tai, X.: Deep learning for remote sensing image understanding. J. Sens. 2016, 1–2 (2016)

3. Marmanis, D., Wegner, J., Galliani, S., Schindler, K., Datcu, M., Stilla, U.: Semantic segmentation of aerial images with an ensemble of CNNs. ICWG 3(4), 1–8 (2016)

4. AID Dataset. http://www.lmars.whu.edu.cn/xia/AID-project.html. Accessed 16 Feb 2018

5. PatternNet Dataset. https://sites.google.com/view/zhouwx/dataset?authuser=0. Accessed 16 Feb 2018

6. RSI Dataset. https://github.com/lehaifeng/RSI-CB. Accessed 16 Feb 2018

7. SAT_4 & SAT_6. http://csc.lsu.edu/~saikat/deepsat/. Accessed 16 Feb 2018

8. UC-Merced Land Use Dataset. http://weegee.vision.ucmerced.edu/datasets/landuse.html. Accessed 16 Feb 2018

9. WHU-RS19 Dataset. https://www.google.com/url?q=http%3A%2F%2Fwww.xinhua-fluid.com%2Fpeople%2Fyangwen%2FWHU-RS19.html&sa=D&sntz=1&usg=AFQjCNFzrOnViW6TWOoFbN1IaIMfyLdJhQ. Accessed 16 Feb 2018

10. SIRI-WHU Dataset. http://www.lmars.whu.edu.cn/prof_web/zhongyanfei/e-code.html. Accessed 16 Feb 2018

11. RSSCN7 Dataset. https://www.dropbox.com/s/j80iv1a0mvhonsa/RSSCN7.zip?dl=0. Accessed 16 Feb 2018

12. RSC11 Dataset. https://www.yeastgenome.org/locus/ARP7. Accessed 16 Feb 2018

13. Brazilian Coffee Dataset. http://www.patreo.dcc.ufmg.br/downloads/brazilian-coffee-dataset/. Accessed 16 Feb 2018

14. NWPU-RESISC45 Dataset. https://www.google.com/url?q=http%3A%2F%2Fwww.escience.cn%2Fpeople%2FJunweiHan%2FNWPU-RESISC45.html&sa=D&sntz=1&usg=AFQjCNGs2uMeX7KT2QvEMzcD5uF4-aQChw. Accessed 16 Feb 2018

15. Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE 105(10), 1–17 (2017)

16. Thomas, M., Farid, M., Yakoub, B., Naif, A.: A fast object detector based on high-order gradients and Gaussian process regression for UAV images. Int. J. Remote Sens. 36(10), 2713–2733 (2015)

17. Aptoula, E.: Remote sensing image retrieval with global morphological texture descriptors. IEEE Trans. Geosci. Remote Sens. 52(5), 3023–3034 (2014)

18. Mekhalfi, M., Melgani, F., Bazi, Y., Alajlan, N.: Land-use classification with compressive sensing multifeature fusion. IEEE Geosci. Remote Sens. 12(10), 2155–2159 (2015)

19. Cheriyadat, A.: Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 52(1), 439–451 (2014)

20. Jolliffe, I.: Principal component analysis. Springer, New York (2002)

21. Zhao, B., Zhong, Y., Zhang, L.: A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery. Remote Sens. 116, 73–85 (2016)

22. Olshausen, B., Field, D.: Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision. Res. 37(23), 3311–3325 (1997)

23. Sheng, G., Yang, W., Xu, T., Sun, H.: High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int. J. Remote Sens. 33(8), 2395–2412 (2012)

24. Hu, F., Xia, G., Wang, Z., Huang, X., Zhang, L., Sun, H.: Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification. IEEE J. Select. Top. Appl. Earth Obs. Remote Sens. 8(5), 2015–2030 (2015)

25. Daoyu, L., Kun, F., Yang, W., Guangluan, X., Xian, S.: MARTA GANs: unsupervised representation learning for remote sensing image classification. National Natural Science Foundation of China (2017)

26. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

27. Hinton, G., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

28. Salakhutdinov, R., Hinton, G.: An efficient learning procedure for deep Boltzmann machines. Neural Comput. 24(8), 1967–2006 (2012)

29. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Mach. Learn. Res. 11, 3371–3408 (2010)

30. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. In: Proceedings of the International Conference on Learning Representations, pp. 1–16 (2014)

31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, pp. 1–13 (2015)

32. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Conference on Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

33. Luus, F., Salmon, B., Van Den Bergh, F., Maharaj, B.: Multiview deep learning for land-use classification. IEEE Geosci. Remote Sens. Lett. 12(12), 2448–2452 (2015)

34. Castelluccio, M., Poggi, G., Sansone, C., Verdoliva, L.: Land use classification in remote sensing images by convolutional neural networks. Cornell University, Ithaca (2015)

35. Zhong, Y., Fei, F., Zhang, L.: Large patch convolutional neural networks for the scene classification of high spatial resolution imagery. Appl. Remote Sens. 10(2), 025006–025006 (2016)

36. Marmanis, D., Datcu, M., Esch, T., Stilla, U.: Deep learning earth observation classification using ImageNet pretrained networks. IEEE Geosci. Remote Sens. Lett. 13(1), 105–109 (2015)

37. Jingbo, C., Chengyi, W., Zhong, M., Jiansheng, C., Dongxu, H., Stephen, A.: Remote sensing scene classification based on convolutional neural networks pre-trained using attention-guided sparse filters. Remote Sens. 10(290), 1–16 (2018)

38. Blaschke, T.: Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 65(1), 2–16 (2010)

39. GIS Geography. http://gisgeography.com/image-classification-techniques-remote-sensing/. Accessed 16 Feb 2018

40. Tahoun, M., Nagaty, K., El-Arief, T., A-Megeed, M.: A robust content-based image retrieval system using multiple features representations. In: Proceedings of IEEE Networking, Sensing and Control, pp. 116–122 (2005)

41. Mohammed, A-M.: Multiresolution Image Segmentation. Ph.D. Thesis, Department of Computer Science, Humboldt-Universitaet zu Berlin, Germany (2008)

Deep Learning for Satellite Image Classiﬁcation 391

Application and analysis of landscape recognition based on efficient net for natural scene

Article

Full-text available

Jan 2024

Sisheng Jin

One significant assessing criteria of climate change is geometric evolution. The rate of evolution reveals the speed that environment worsens. Advanced space mirror monitors that and generates images timely. However, it might be difficult for human to deal with collected numerous image-related data. In previous research, convolutional neural network is regarded to have specific advantage in resolving image recognition tasks. Hence, a new type of convolutional neural network model is applied to identify different kinds of landscape. Virtually, this model is called Efficient Net which based on landscape recognition dataset with 5 classes of landscapes. The study also introduces the fine-tuning to further improve the performance of the model. To evaluate the model, the precision, recall, F1 score, accuracy and loss are adopted as assessing criteria. The results shows that the model predicts the target dataset to a great extent. However, it has been tested that the class of mountain might not be suitable for predicting because of vague criterion. That is helpful in real-condition geographical applications and environmental governance.

Flood and Non-Flood Image Classification using Deep Ensemble Learning

Article

Full-text available

Jun 2024
WATER RESOUR MANAG

Floods are one of the most frequent natural disasters, often resulting in widespread devastation. Identifying floods accurately is crucial for disaster management as it helps to locate areas requiring urgent assistance and streamline post-flood evacuation processes. Recently, deep learning models, such as Convolutional Neural Networks (CNN), have become predominant for image classification tasks, as well as flood classification problems. Deep ensemble techniques,i.e. combining several deep learning architectures, are still quite new in many fields and have not been studied extensively despite showing promising results in flood classification. In this research, we develop an ensemble deep learning framework that utilizes eight state-of-the-art CNN architectures, namely MobileNet V2, ResNet 50, VGG 16, DenseNet 201, Inception V3, EfficientNet B5, NasNet Large, and Xception. The aim is to address the gap of deep ensemble learning in flood classification and provide a more effective approach to identifying potential flooding scenarios from a wide range of visual datasets. We utilize FloodNet and flood area segmentation datasets to train, test, and validate our models. In the testing phase, our ensemble model outperforms several individual benchmark models, achieving a training accuracy of 98.9% and a test accuracy of 97.4%. Our proposed methodology will predict floods and conduct early assessments of affected areas efficiently.

Supervised Machine Learning Algorithms for Land Cover Classification in Casablanca, Morocco

Article

Full-text available

Feb 2024

This study embarks on an evaluation of the efficacy of six supervised machine learning algorithms in the classification of land cover in Casablanca, Morocco, utilizing Landsat satellite imagery. Employing the Google Earth Engine (GEE) platform for data collection, the research encompasses meticulous pre-processing steps and the application of various supervised algorithms, followed by a comprehensive evaluation of their performance. The city of Casablanca, characterized by rapid urbanization and evolving land-use patterns, presents an exemplary case for scrutinizing the algorithms' ability to accurately classify different land zones. These zones encompass water bodies, urban areas, agricultural lands, barren terrains, and forests. The algorithms under scrutiny include Support Vector Machine (SVM), Random Forest (RF), Classification and Regression Trees (CART), Minimum Distance (MD), Decision Tree (DT), and Gradient Tree Boosting (GTB). The assessment of classification outcomes leverages multiple accuracy indicators, namely overall accuracy (OA), Kappa coefficient, user accuracy (UA), and producer accuracy (PA). Results indicate that the Random Forest algorithm exhibits superior performance, achieving an accuracy of 95.42%, while the Support Vector Machine algorithm lags with a lower accuracy of 83%. This investigation underscores the critical role of advanced machine learning algorithms in land cover classification, a pivotal aspect for urban and regional planning, natural resource management, and risk assessment in rapidly changing environments.

Segmentation of Liver Tumors by Monai and PyTorch in CT Images with Deep Learning Techniques

Article

Full-text available

Jun 2024

Image segmentation and identification are crucial to modern medical image processing techniques. This research provides a novel and effective method for identifying and segmenting liver tumors from public CT images. Our approach leverages the hybrid ResUNet model, a combination of both the ResNet and UNet models developed by the Monai and PyTorch frameworks. The ResNet deep dense network architecture is implemented on public CT scans using the MSD Task03 Liver dataset. The novelty of our method lies in several key aspects. First, we introduce innovative enhancements to the ResUNet architecture, optimizing its performance, especially for liver tumor segmentation tasks. Additionally, by harassing the capabilities of Monai, we streamline the implementation process, eliminating the need for manual script writing and enabling faster, more efficient model development and optimization. The process of preparing images for analysis by a deep neural network involves several steps: data augmentation, a Hounsfield windowing unit, and image normalization. ResUNet network performance is measured by using the DC metric Dice coefficient. This approach, which utilizes residual connections, has proven to be more reliable than other existing techniques. This approach achieved DC values of 0.98% for detecting liver tumors and 0.87% for segmentation. Both qualitative and quantitative evaluations show promising results regarding model precision and accuracy. The implications of this research are that it could be used to increase the precision and accuracy of liver tumor detection and liver segmentation, reflecting the potential of the proposed method. This could help in the early diagnosis and treatment of liver cancer, which can ultimately improve patient prognosis.

Monitoring and Forecasting Land Cover Dynamics Using Remote Sensing and Geospatial Technology

Chapter

May 2024

C-SPIN: Classification of Satellite Images for Prediction Network

Conference Paper

Jan 2024

Remote Sensing Image Classification Based on Confidence Score of Ensemble Machine Learning Classifiers

Conference Paper

Oct 2023

Spatial and temporal classification and prediction of LULC in Brahmani and Baitarni basin using integrated cellular automata models

Article

Full-text available

Jan 2024
ENVIRON MONIT ASSESS

Monitoring the dynamics of land use and land cover (LULC) is imperative in the changing climate and evolving urbanization patterns worldwide. The shifts in land use have a significant impact on the hydrological response of watersheds across the globe. Several studies have applied machine learning (ML) algorithms using historical LULC maps along with elevation data and slope for predicting future LULC projections. However, the influence of other driving factors such as socio-economic and climatological factors has not been thoroughly explored. In the present study, a sensitivity analysis approach was adopted to understand the effect of both physical (elevation, slope, aspect, etc.) and socio-economic factors such as population density, distance to built-up, and distance to road and rail, as well as climatic factors (mean precipitation) on the accuracy of LULC prediction in the Brahmani and Baitarni (BB) basin of Eastern India. Additionally, in the absence of the recent LULC maps of the basin, three ML algorithms, i.e., random forest (RF), classification and regression trees (CART), and support vector machine (SVM) were utilized for LULC classification for the years 2007, 2014, and 2021 on the Google Earth Engine (GEE) cloud computing platform. Among the three algorithms, RF performed best for classifying built-up areas along with all the other classes as compared to CART and SVM. The prediction results revealed that the proximity to built-up and population growth dominates in modeling LULC over physical factors such as elevation and slope. The analysis of historical data revealed an increase of 351% in built-up areas over the past years (2007–2021), with a corresponding decline in forest and water areas by 12% and 36% respectively. While the future predictions highlighted an increase in built-up class ranging from 11 to 38% during the years 2028–2070, the forested areas are anticipated to decline by 4 to 16%. 
The overall findings of the present study suggested that the BB basin, despite being primarily agricultural with a significant forest cover, is undergoing rapid expansion of built-up areas through the encroachment of agricultural and forested lands, which could have far-reaching implications for the region’s ecosystem services and sustainability.

Multi-Target Classification Using Deep Learning Models for Automotive Applications

Conference Paper

Nov 2023

A Novel Bottleneck Residual and Self-Attention Fusion-Assisted Architecture for Land Use Recognition in Remote Sensing Images

Article

Full-text available

Jan 2024

The massive yearly population growth is causing hazards to spread swiftly around the world and have a detrimental impact on both human life and the world economy. By ensuring early prediction accuracy, remote sensing enters the scene to safeguard the globe against weather-related threats and natural disasters. Convolutional neural networks, which are a reflection of deep learning, have been used more recently to reliably identify land use in remote sensing images. This work proposes a novel bottleneck residual and self-attention fusion-assisted architecture for land use recognition from remote sensing images. First, we proposed using the fast neural approach to generate cloud-effect satellite images. In neural style, we proposed a 5-layered residual block CNN to estimate the loss of neural-style images. After that, we proposed two novel architectures, named 3-layered bottleneck CNN architecture and 3-layered bottleneck self-attention CNN architecture, for the classification of land use images. Training has been conducted on both proposed and original neural-style generated datasets for both architectures. Subsequently, features are extracted from the deep layers and merged employing an innovative serial approach based on weighted entropy. By removing redundant and superfluous data, a novel Chimp Optimization technique is applied to the fused features in order to further refine them. In conclusion, selected features are classified using the help of neural network classifiers. The experimental procedure yielded respective accuracy rates of 99.0% and 99.4% when applied to both datasets. When evaluated in comparison to state-of-the-art (SOTA) methods, the outcomes generated by the proposed framework demonstrated enhanced precision and accuracy.

Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters

Article

Full-text available

Feb 2018

Semantic-level land-use scene classification is a challenging problem, in which deep learning methods, e.g., convolutional neural networks (CNNs), have shown remarkable capacity. However, a lack of sufficient labeled images has proved a hindrance to increasing the land-use scene classification accuracy of CNNs. Aiming at this problem, this paper proposes a CNN pre-training method under the guidance of a human visual attention mechanism. Specifically, a computational visual attention model is used to automatically extract salient regions in unlabeled images. Then, sparse filters are adopted to learn features from these salient regions, with the learnt parameters used to initialize the convolutional layers of the CNN. Finally, the CNN is further fine-tuned on labeled images. Experiments are performed on the UCMerced and AID datasets, which show that when combined with a demonstrative CNN, our method can achieve 2.24% higher accuracy than a plain CNN and can obtain an overall accuracy of 92.43% when combined with AlexNet. The results indicate that the proposed method can effectively improve CNN performance using easy-to-access unlabeled images and thus will enhance the performance of land-use scene classification especially when a large-scale labeled dataset is unavailable.

SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS

Article

Full-text available

Jun 2016

This paper describes a deep learning approach to semantic segmentation of very high resolution (aerial) images. Deep neural architectures hold the promise of end-to-end learning from raw images, making heuristic feature design obsolete. Over the last decade this idea has seen a revival, and in recent years deep convolutional neural networks (CNNs) have emerged as the method of choice for a range of image interpretation tasks like visual recognition and object detection. Still, standard CNNs do not lend themselves to per-pixel semantic segmentation, mainly because one of their fundamental principles is to gradually aggregate information over larger and larger image regions, making it hard to disentangle contributions from different pixels. Very recently two extensions of the CNN framework have made it possible to trace the semantic information back to a precise pixel position: deconvolutional network layers undo the spatial downsampling, and Fully Convolutional Networks (FCNs) modify the fully connected classification layers of the network in such a way that the location of individual activations remains explicit. We design an FCN which takes as input intensity and range data and, with the help of aggressive deconvolution and recycling of early network layers, converts them into a pixelwise classification at full resolution. We discuss design choices and intricacies of such a network, and demonstrate that an ensemble of several networks achieves excellent results on challenging data such as the ISPRS semantic labeling benchmark, using only the raw data as input.

MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification

Article

Oct 2017

With the development of deep learning, supervised learning has frequently been adopted to classify remotely sensed images using convolutional networks. However, due to the limited amount of labeled data available, supervised learning is often difficult to carry out. Therefore, we proposed an unsupervised model called multiple-layer feature-matching generative adversarial networks (MARTA GANs) to learn a representation using only unlabeled data. MARTA GANs consists of both a generative model G and a discriminative model D. We treat D as a feature extractor. To fit the complex properties of remote sensing data, we use a fusion layer to merge the mid-level and global features. G can produce numerous images that are similar to the training data; therefore, D can learn better representations of remotely sensed images using the training data provided by G. The classification results on two widely used remote sensing image databases show that the proposed method significantly improves the classification performance compared with other state-of-the-art methods.

Imagenet classification with deep convolutional neural networks

Conference Paper

Jan 2012

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
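The top-1 and top-5 error rates quoted in this abstract count a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the toy scores are hypothetical and serve only to show the computation.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k top-scoring classes.

    scores: list of per-sample score lists (one score per class).
    labels: list of true class indices, same length as scores.
    """
    wrong = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(labels)


# Toy three-class example with invented scores.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

Since a larger k can only admit more correct answers, the top-5 error is never higher than the top-1 error on the same predictions.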

Remote Sensing Image Scene Classification: Benchmark and State of the Art

Article

Apr 2017

Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various datasets or present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small scale of scene classes and the image numbers, the lack of image variations and diversity, and the saturation of accuracy. These limitations severely limit the development of new approaches especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research.

Principal Component Analysis

Article

Jan 1986

Ian T. Jolliffe

Large patch convolutional neural networks for the scene classification of high spatial resolution imagery

Article

Apr 2016
J APPL REMOTE SENS

The increase of the spatial resolution of remote-sensing sensors helps to capture the abundant details related to the semantics of surface objects. However, it is difficult for the popular object-oriented classification approaches to acquire higher level semantics from the high spatial resolution remote-sensing (HSR-RS) images, which is often referred to as the "semantic gap." Instead of designing sophisticated operators, convolutional neural networks (CNNs), a typical deep learning method, can automatically discover intrinsic feature descriptors from a large number of input images to bridge the semantic gap. Due to the small data volume of the available HSR-RS scene datasets, which is far away from that of the natural scene datasets, there have been few reports of CNN approaches for HSR-RS image scene classifications. We propose a practical CNN architecture for HSR-RS scene classification, named the large patch convolutional neural network (LPCNN). The large patch sampling is used to generate hundreds of possible scene patches for the feature learning, and a global average pooling layer is used to replace the fully connected network as the classifier, which can greatly reduce the total parameters. The experiments confirm that the proposed LPCNN can learn effective local features to form an effective representation for different land-use scenes, and can achieve a performance that is comparable to the state-of-the-art on public HSR-RS scene datasets. © 2016 Society of Photo-Optical Instrumentation Engineers (SPIE).
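The parameter saving from replacing a fully connected classifier with global average pooling can be illustrated with a few lines of plain Python. This is a sketch under assumptions, not the LPCNN implementation: it assumes the last convolutional layer emits one feature map per class, and the helper names and the example sizes are hypothetical.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W feature map to its mean, giving one value per channel.

    feature_maps: list of channels, each a 2-D list of numbers.
    The operation has no trainable parameters.
    """
    return [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]


def fc_param_count(channels, height, width, num_classes):
    """Weights plus biases of one dense layer applied to the flattened maps."""
    return channels * height * width * num_classes + num_classes


pooled = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])
# Hypothetical sizes: 21 maps of 6 x 6 feeding a 21-way dense classifier.
dense_params = fc_param_count(21, 6, 6, 21)
```

In this toy setting the dense layer would need 15,897 parameters, whereas the pooling step needs none, which is the kind of reduction the abstract refers to.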

A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery

Article

Jun 2016
ISPRS J PHOTOGRAMM

Land-use classification of very high spatial resolution remote sensing (VHSR) imagery is one of the most challenging tasks in the field of remote sensing image processing. However, the land-use classification is hard to be addressed by the land-cover classification techniques, due to the complexity of the land-use scenes. Scene classification is considered to be one of the expected ways to address the land-use classification issue. The commonly used scene classification methods of VHSR imagery are all derived from the computer vision community that mainly deal with terrestrial image recognition. Differing from terrestrial images, VHSR images are taken by looking down with airborne and spaceborne sensors, which leads to the distinct light conditions and spatial configuration of land cover in VHSR imagery. Considering the distinct characteristics, two questions should be answered: (1) Which type or combination of information is suitable for the VHSR imagery scene classification? (2) Which scene classification algorithm is best for VHSR imagery? In this paper, an efficient spectral–structural bag-of-features scene classifier (SSBFC) is proposed to combine the spectral and structural information of VHSR imagery. SSBFC utilizes the first- and second-order statistics (the mean and standard deviation values, MeanStd) as the statistical spectral descriptor for the spectral information of the VHSR imagery, and uses dense scale-invariant feature transform (SIFT) as the structural feature descriptor. From the experimental results, the spectral information works better than the structural information, while the combination of the spectral and structural information is better than any single type of information. Taking the characteristic of the spatial configuration into consideration, SSBFC uses the whole image scene as the scope of the pooling operator, instead of the scope generated by a spatial pyramid (SP) commonly used in terrestrial image classification. 
The experimental results show that the whole image as the scope of the pooling operator performs better than the scope generated by SP. In addition, SSBFC codes and pools the spectral and structural features separately to avoid mutual interruption between the spectral and structural features. The coding vectors of spectral and structural features are then concatenated into a final coding vector. Finally, SSBFC classifies the final coding vector by support vector machine (SVM) with a histogram intersection kernel (HIK). Compared with the latest scene classification methods, the experimental results with three VHSR datasets demonstrate that the proposed SSBFC performs better than the other classification methods for VHSR image scenes.
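Two building blocks named in this abstract, the MeanStd spectral descriptor and the histogram intersection kernel (HIK), have compact definitions. The sketch below is an illustration rather than the SSBFC code: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which estimator is used.

```python
from statistics import mean, pstdev


def mean_std_descriptor(patch):
    """Concatenate the per-band mean and standard deviation of an image patch.

    patch: list of spectral bands, each a flat list of pixel values.
    Population standard deviation is assumed here.
    """
    features = []
    for band in patch:
        features.extend([mean(band), pstdev(band)])
    return features


def histogram_intersection(a, b):
    """HIK value: the sum of element-wise minima of two histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


descriptor = mean_std_descriptor([[1, 2, 3], [4, 4, 4]])
similarity = histogram_intersection([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
```

For histograms normalized to sum to 1, the intersection equals 1 only when the two histograms are identical, so it behaves as a similarity score that an SVM can use as a kernel.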

Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks

Article

Dec 2015

Deep learning methods such as convolutional neural networks (CNNs) can deliver highly accurate classification results when provided with large enough data sets and respective labels. However, using CNNs along with limited labeled data can be problematic, as this leads to extensive overfitting. In this letter, we propose a novel method by considering a pretrained CNN designed for tackling an entirely different classification problem, namely, the ImageNet challenge, and exploit it to extract an initial set of representations. The derived representations are then transferred into a supervised CNN classifier, along with their class labels, effectively training the system. Through this two-stage framework, we successfully deal with the limited-data problem in an end-to-end processing scheme. Comparative results over the UC Merced Land Use benchmark prove that our method significantly outperforms the previously best stated results, improving the overall accuracy from 83.1% up to 92.4%. Apart from statistical improvements, our method introduces a novel feature fusion algorithm that effectively tackles the large data dimensionality by using a simple and computationally efficient approach.
