
Improving Forest Detection Using Machine Learning and Remote Sensing: A Case Study in Southeastern Serbia

July 2023
Applied Sciences


DOI:10.3390/app13148289

License
CC BY 4.0

Authors:

Ivan Potić

Military Geographical Institute - "General Stevan Bošković" Belgrade

Zoran Srdić

Academy of Technical and Art Applied Studies Belgrade

Boris Vakanjac

Futura - Fakultet za primenjenu ekologiju

Sasa Bakrac

Military Geographical Institute - "General Stevan Bošković" Belgrade


Abstract: Vegetation plays an active role in ecosystem dynamics, and monitoring its patterns and changes is vital for effective environmental resource management. This study explores the potential of machine learning techniques and remote sensing data to improve the accuracy of forest detection. The research focuses on the southeastern part of the Republic of Serbia as a case study area, using Sentinel-2 multispectral bands. The study employs publicly accessible satellite data and incorporates different vegetation indices to improve classification accuracy. The main objective is to examine the practicability of expanding the input parameters for forest detection using a machine learning approach. Classification is performed with the support vector machines (SVM) algorithm, using the SVM module in the scikit-learn package. The results demonstrate that including vegetation indices alongside the multispectral bands significantly improves the accuracy of vegetation detection. A comprehensive assessment reveals an overall classification accuracy of up to 99.01% when the selected vegetation indices (MCARI, RENDVI, NDI45, GNDVI, NDII) are combined with the Sentinel-2 bands. This research highlights the potential of machine learning and remote sensing in forest detection and monitoring. The findings underscore the importance of incorporating vegetation indices to enhance classification accuracy using the Python programming language. The study's outcomes provide valuable insights for environmental resource management and decision-making processes, particularly in regions with diverse forest ecosystems.


Citation: Potić, I.; Srdić, Z.; Vakanjac, B.; Bakrač, S.; Đorđević, D.; Banković, R.; Jovanović, J.M. Improving Forest Detection Using Machine Learning and Remote Sensing: A Case Study in Southeastern Serbia. Appl. Sci. 2023, 13, 8289. https://doi.org/10.3390/app13148289

Academic Editors: Romano Lottering, Kabir Peerbhay and Samuel Adelabu

Received: 12 June 2023

Revised: 10 July 2023

Accepted: 10 July 2023

Published: 18 July 2023

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Ivan Potić 1,†, Zoran Srdić 1,†, Boris Vakanjac 1, Saša Bakrač 1,2,*, Dejan Đorđević 1,2, Radoje Banković 1,2 and Jasmina M. Jovanović 3

1 Military Geographical Institute “General Stevan Bošković”, 11000 Belgrade, Serbia; ipotic@gmail.com (I.P.); zoran.m.srdic@gmail.com (Z.S.); borivac@gmail.com (B.V.); dejan.r.djordjevic@vs.rs (D.Đ.); radoje.bankovic@vs.rs (R.B.)

2 Military Academy, University of Defense, 11000 Belgrade, Serbia

3 Faculty of Geography, University of Belgrade, 11000 Belgrade, Serbia; jasmina.jovanovic@gef.bg.ac.rs

* Correspondence: sasa.bakrac@vs.rs; Tel.: +381-113205009

† Co-first authors; these authors contributed equally to this work.

Featured Application: The primary application of this work is in environmental resource management, specifically in the detection and monitoring of vegetation patterns and changes. By employing a machine learning approach, specifically the Support Vector Machines (SVM) algorithm, the study demonstrates that including vegetation indices alongside multispectral bands significantly improves the accuracy of vegetation detection, achieving an overall classification accuracy of up to 99.01%. The study’s findings underscore the potential of machine learning and remote sensing in vegetation detection and monitoring and highlight the importance of incorporating vegetation indices to enhance classification accuracy. These findings have significant implications for decision-making processes in environmental resource management, particularly in regions with diverse forest ecosystems. The potential applications of this work extend beyond the specific geographical context of the study. The methodology and findings could be applied to other regions and ecosystems, providing valuable insights for the preservation and conservation of forest ecosystems globally. Future research could further explore the applicability of these findings in different geographical regions and investigate other vegetation indices to improve the accuracy of forest detection and monitoring processes.


Appl. Sci. 2023,13, 8289. https://doi.org/10.3390/app13148289 https://www.mdpi.com/journal/applsci


Keywords: vegetation detection; remote sensing; Python; machine learning; classification accuracy; Sentinel-2

1. Introduction

Vegetation is an essential component of ecosystems that connects atmospheric, hydrological, and pedological processes [ ]. Environmental preservation and conservation depend heavily on economic stability and on human and political resources. In the past two decades, the cost of evaluation procedures has fallen significantly owing to the public availability of open satellite data. Crucial information on deforestation and degradation can now be obtained by analysing these data with remote sensing techniques [ – ]. Earth observation (EO) data, which include satellite, aerial, or ground-based observations, and geospatial data are crucial for monitoring changes in forest ecosystems, especially for identifying vegetation degradation [ ]. Monitoring land use and land cover change is vital to the ecosystem. Remote sensing offers excellent potential for monitoring landscape change caused by natural cycles and human activity [ ]. One of the crucial applications of remote sensing in environmental resource management and decision-making is the detection and quantitative evaluation of vegetation patterns. This technology is pivotal in assessing the ecosystem and identifying vegetation patterns and structural shifts. Such assessments and identifications are essential when evaluating and monitoring natural resources. Remote sensing in forest detection has been a significant topic of research and development. Remote sensing technologies provide a powerful tool for monitoring and managing forests on a large scale, offering the ability to detect changes in forest cover and health over time. Barmpoutis et al. (2020) provide an overview of optical remote sensing technologies used in early fire warning systems, highlighting the importance of these technologies in mitigating the impacts of natural hazards such as large-scale forest fires [ ]. Similarly, Housman et al. (2018) discuss the Operational Remote Sensing (ORS) program, which leverages Landsat and MODIS data to detect forest disturbances across the United States. The ORS program supplements traditional Insect and Disease Survey (IDS) data with imagery-derived forest disturbance data, demonstrating the potential of remote sensing in forest health monitoring [ ]. Furthermore, Chen et al. (2018) present a novel approach to individual tree-level forest inventory using airborne LiDAR (Light Detection And Ranging) remote sensing. Their research underscores the potential of remote sensing technologies in providing detailed, high-resolution data for forest management [11].

This study’s primary challenge is identifying forest cover. Here, forest detection is reduced to a classification problem in which the input data set is divided into two classes, “forest” and “not forest”. This binary classification is suitable for inventorying forests or creating thematic masks for topographic maps, when it is essential to identify forest cover without distinguishing forest types. Binary classification offers several advantages. It simplifies the problem by focusing on the distinction between two classes, which can lead to more accurate and efficient models. It is beneficial when classes are imbalanced, allowing the model to focus on the minority class. Furthermore, binary classification models are often easier to interpret and understand, making them more practical for decision-making processes [12].
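The reduction to two classes can be illustrated with a minimal sketch. The land-cover class names and the sample labels below are hypothetical and are not taken from the study; the sketch only shows how several thematic classes might be collapsed into "forest" (1) and "not forest" (0), and how the resulting class balance can be inspected.

```python
from collections import Counter

# Hypothetical land-cover classes that count as forest; illustrative only.
FOREST_CLASSES = {"deciduous_forest", "coniferous_forest"}


def to_binary(labels):
    """Map each land-cover label to 1 ("forest") or 0 ("not forest")."""
    return [1 if label in FOREST_CLASSES else 0 for label in labels]


# Hypothetical labels for a handful of training pixels.
labels = [
    "deciduous_forest", "cropland", "meadow", "coniferous_forest",
    "pasture", "built_up", "deciduous_forest", "cropland",
]

binary = to_binary(labels)
counts = Counter(binary)
forest_share = counts[1] / len(binary)

print(binary)
print(f"forest pixels: {counts[1]}, other pixels: {counts[0]}")
print(f"forest share: {forest_share:.3f}")
```

Checking the forest share before training is a simple way to see whether the two classes are imbalanced.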

Forest detection becomes challenging and complex, especially in regions with high biodiversity, i.e., a wide range of forest ecosystems. Categorising data under such specific conditions is a distinctly novel challenge. The research investigated the feasibility of augmenting the initial array of input variables, including Sentinel-2 bands and vegetation indices, for executing the machine learning protocol. Satellite imagery serves as the input data for forest identification and is the most prevalent data source for forest inventory, particularly for categorising extensive regions. Examining the content of satellite imagery presents an additional concern, primarily due to the heterogeneity of materials in the images and the substantial volume of data. It is imperative to employ sophisticated and robust technologies to manage such a complex data set effectively, especially in categorisation tasks. To address the classification problem associated with forest detection using satellite imagery, this research used artificial intelligence and machine learning.

The development of Machine Learning (ML) requires the determination of all essential metrics for the decision-making process. The mechanisms of machine learning generate models to enhance the metrics. To ensure the development of an effective solution for any decision-making process, it is crucial to select and consider carefully the metrics used throughout the conceptual phases, because the metrics are essential in decision-making and their selection can significantly impact the outcome [ ]. ML’s purpose is to anticipate future occurrences or situations unknown to the computer. It belongs to the subfield of artificial intelligence (AI) that synthesises the underlying correlations between data and information via the systematic application of algorithms [ ]. In 1959, Arthur Samuel defined ML as “the field of study that allows computers to learn without being explicitly programmed”. He stated that training computers to learn from experience would someday obviate the need for a significant portion of this comprehensive programming work [ ]. The increasing prevalence of ML can be attributed to its ability to describe underlying connections within massive data arrays, thereby solving challenges in big data analytics, behavioural pattern identification, and information evolution. In addition, ML systems may be taught to classify the changing circumstances of a process to represent changes in operational behaviour. As knowledge evolves under the impact of new ideas and technologies, ML systems may detect disruptions to old models and redesign and retrain themselves to adapt to and coevolve with the new information [ ].

Using vegetation indices and multispectral bands in machine learning models has proven to be a powerful tool in various applications. For instance, researchers have successfully used vegetation indices derived from the light reflectance properties of plants to distinguish soybean from weeds, demonstrating the potential of these indices as decision-support tools for weed identification [ ]. In precision agriculture, an automatic segmentation method combining vegetation indices with a Discriminative Common Vector Approach classification algorithm has outperformed traditional methods, facilitating sustainable production [ ]. Furthermore, a machine learning model using the extreme gradient boosting method was developed to predict vegetation growth throughout the growing season in China, highlighting the potential of these techniques for monitoring vegetation dynamics and crop growth [ ]. Lastly, the selection of suitable Sentinel-2 bands and vegetation indices for crop classification using artificial neural networks has been discussed, underscoring the importance of these parameters in enhancing classification accuracy [ ]. These studies demonstrate the potential of incorporating vegetation indices and multispectral channels in machine learning models for improved vegetation detection.
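As a rough illustration of how vegetation indices can be added to multispectral bands as extra model inputs, the sketch below computes three normalised-difference indices with NumPy and stacks them with the bands into a per-pixel feature matrix. The reflectance values are made up, and the band pairings used for GNDVI, NDI45 and NDII (B08/B03, B05/B04 and B08/B11) are commonly used definitions assumed here; they are not confirmed by this section of the source.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 wherever the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


# Made-up 2 x 2 reflectance rasters, one per Sentinel-2 band.
b03 = np.array([[0.05, 0.08], [0.06, 0.10]])  # green
b04 = np.array([[0.03, 0.10], [0.04, 0.12]])  # red
b05 = np.array([[0.09, 0.12], [0.10, 0.14]])  # red-edge
b08 = np.array([[0.45, 0.20], [0.40, 0.18]])  # near-infrared
b11 = np.array([[0.15, 0.25], [0.16, 0.28]])  # shortwave infrared

# Assumed band pairings (common definitions, not taken from the source).
gndvi = normalized_difference(b08, b03)
ndi45 = normalized_difference(b05, b04)
ndii = normalized_difference(b08, b11)

# Stack bands and indices, then flatten to (n_pixels, n_features).
layers = [b03, b04, b05, b08, b11, gndvi, ndi45, ndii]
features = np.dstack(layers).reshape(-1, len(layers))
print(features.shape)
```

Each row of `features` describes one pixel, so the matrix can be passed directly to a classifier that expects samples in rows and input variables in columns.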

This study aims to leverage the power of remote sensing technologies, specifically focusing on the Support Vector Machines (SVM) algorithm for forest detection and classification. The primary objectives of the research are:

• To explore the potential of remote sensing in detecting and classifying forests, with a particular focus on binary classification;

• To define optimal parameters of the SVM algorithm, specifically the C and gamma parameters, for effective forest classification;

• To evaluate the advantages of binary classification in forest detection and discuss its implications for environmental management and conservation;

• To contribute to the existing body of knowledge by introducing an original approach to forest detection using remote sensing technologies.
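The second objective, tuning C and gamma, is typically approached with a grid search. The following is a minimal sketch using scikit-learn's `GridSearchCV` on synthetic data. The RBF kernel, the feature scaling step, the grid values and the synthetic data set are all assumptions made for illustration; they are not the settings reported by the authors.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-pixel features with binary labels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumed setup: standardise the features, then fit an RBF-kernel SVM.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid; the values are illustrative only.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 4))
```

Here C controls how strongly misclassified training samples are penalised, and gamma controls how far the influence of a single training sample reaches, which is why the two are usually tuned together.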

The study’s findings are expected to provide valuable insights into the application of remote sensing and SVM in forest detection, potentially informing future research and practices in the field.


2. Materials and Methods

The method used the SVM algorithm for satellite imagery classification. In addition to selected Sentinel-2 multispectral bands, vegetation indices were added, individually and as a group, to study their ability to increase classification accuracy (Figure 1).
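The experimental design described above, with indices added individually and as a group, can be sketched as a set of input configurations. The band identifiers follow Table 1 and the index names follow the abstract; the function name, the configuration keys and the choice of these five indices as the full candidate set are assumptions made for illustration.

```python
# Band identifiers as listed in Table 1 (Test Data 1).
BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B10", "B11"]

# Index names from the abstract; treated here as the candidate set.
INDICES = ["MCARI", "RENDVI", "NDI45", "GNDVI", "NDII"]


def feature_sets(bands, indices):
    """Build the input configurations: bands only, bands plus each
    single index, and bands plus all indices together."""
    sets = {"bands_only": list(bands)}
    for index in indices:
        sets[f"bands+{index}"] = list(bands) + [index]
    sets["bands+all_indices"] = list(bands) + list(indices)
    return sets


configurations = feature_sets(BANDS, INDICES)
for name, columns in configurations.items():
    print(f"{name}: {len(columns)} input layers")
```

Training one classifier per configuration and comparing the resulting accuracies is one way to measure what each index contributes.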

Appl.Sci.2023,13,xFORPEERREVIEW4of25



Thestudy’sfindingsareexpectedtoprovidevaluableinsightsintotheapplicationof

remotesensingandSVMinforestdetection,potentiallyinformingfutureresearchand

practicesinthefield.

2.MaterialsandMethods

ThemethodusedtheSVMalgorithmforsatelliteimageryclassification.Inaddition

toselectedSentinel‐2multispectralbands,vegetationindiceswereaddedtostudytheir

abilitytoincreaseclassificationaccuracy(Figure1)individuallyandasagroupofindices.



Figure1.ImprovingForestDetectionUsingMachineLearningandRemoteSensingworkflowchart.

2.1.StudyArea

ThesoutheasternpartoftheRepublicofSerbiawaschosenastheareaofinterest

(AOI)(Figure2).Theareacovered1218km2(42×29km)withthecentralpointat575,100

and4,710,500(34TUTM/WGS84)or21.9147E,42.5429N(WGS84)coordinates.

ThecityofVranjeislocatedinthecentreofthestudyarea.Inasouthwest‐to‐north‐

eastdirection,theregionisintersectedbytheSouthMoravaRiver,whichcreatesavast,

flatregion.Theregion’snorthwestconsistsofhillyterrain,whereasthesoutheastisdom‐

inatedbymountainousterrain.ThelowestpointissituatedinthevalleyoftheSouthMo‐

ravaRiver,onthenorthernboundary331mabovesealevel(a.s.l.).Thehighestpointis

themountainpeakKoćurac(ZladovačkaPlaninamountain)inthesoutheasternpartof

thetestarea,whichis1558ma.s.l.(Figures2and3).Thestudyarea’svegetationconsists

ofseasonalcrops,meadows,pastures,andwoodlands.Inthenorthwestandsoutheast

partsoftheregionareforests.Mostforestcovercomprisesdeciduousspecies(beech,oak,

andothers),whereasconiferouswoods(spruce,fir,andothers)compriseasignificantly

minorportionofthelandarea.Accordingtogeomorphologicalproperties,lowlandplains

areprimarilyusedforagriculture,wherecultivatedarablecropssuchaswheat,maise,

andothersareplanted[22].



Figure2.Locationoftheresearcharea.Createdusing©OpenStreetMapcontributorsopendata[23].

Figure 1. Improving Forest Detection Using Machine Learning and Remote Sensing workflow chart.

2.1. Study Area

The southeastern part of the Republic of Serbia was chosen as the area of interest (AOI) (Figure 3). The area covered 1218 km² (42 × 29 km) with the central point at 575,100 and 4,710,500 (34T UTM/WGS84) or 21.9147 E, 42.5429 N (WGS84) coordinates.
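The stated dimensions can be checked with a short calculation. The sketch below assumes that the AOI is an axis-aligned rectangle centred on the given UTM point, that 42 km is the east-west extent and 29 km the north-south extent, and that the raster uses a 10 m pixel; none of these details is stated explicitly in the text above.

```python
# Values from the text: extent in km and centre point in UTM zone 34T (metres).
WIDTH_KM, HEIGHT_KM = 42, 29
CENTER_E, CENTER_N = 575_100, 4_710_500

# Assumption: 10 m pixel size for the working raster.
PIXEL_M = 10

area_km2 = WIDTH_KM * HEIGHT_KM

# Raster size at the assumed pixel size.
cols = WIDTH_KM * 1000 // PIXEL_M
rows = HEIGHT_KM * 1000 // PIXEL_M
n_pixels = cols * rows

# Assumption: axis-aligned rectangle, 42 km east-west by 29 km north-south.
half_w = WIDTH_KM * 1000 // 2
half_h = HEIGHT_KM * 1000 // 2
bbox = (CENTER_E - half_w, CENTER_N - half_h, CENTER_E + half_w, CENTER_N + half_h)

print(f"area: {area_km2} km^2")
print(f"raster: {cols} x {rows} = {n_pixels} pixels")
print(f"bounding box (min E, min N, max E, max N): {bbox}")
```

The product 42 × 29 reproduces the stated 1218 km², and at the assumed 10 m pixel size the AOI holds roughly 12 million pixels per band.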

The city of Vranje is located in the centre of the study area. The region is intersected in a southwest-to-northeast direction by the South Morava River, which creates a vast, flat area. The region’s northwest consists of hilly terrain, whereas the southeast is dominated by mountainous terrain. The lowest point is situated in the valley of the South Morava River, on the northern boundary, at 331 m above sea level (a.s.l.). The highest point is the mountain peak Koćurac (Zladovačka Planina mountain) in the southeastern part of the test area, which is 1558 m a.s.l. (Figures 2 and 3). The study area’s vegetation consists of seasonal crops, meadows, pastures, and woodlands. Forests are found in the northwest and southeast parts of the region. Most forest cover comprises deciduous species (beech, oak, and others), whereas coniferous woods (spruce, fir, and others) cover a significantly smaller portion of the land area. In line with the geomorphological properties, lowland plains are primarily used for agriculture, where arable crops such as wheat, maize, and others are cultivated [22].

Appl.Sci.2023,13,xFORPEERREVIEW5of25



Figure3.Vranjeanditssurroundings—Areaofinterest.MapcreatedusingESAremotesensing

dataand©OpenStreetMapcontributorsopendata[22–24].

2.2.SatelliteImageryProcessing

Sentinel‐2Data(TestData1)

MaterialsusedinthisstudyprimarilycontainedSentinel‐2multispectralbandscap‐

turedon13September2021,processedbyESA(TestData1),andobtainedusingCoper‐

nicusSci‐Hub[25].TheSentinel‐2productwascharacterisedbygranulesindicativeofa

specificgeographicallocation.Eachgranulecomprisedthirteenuniquespectralbands,

categorisedintothreedistinctgroundresolutionlevels:10m,20m,and60m.The10m

bandswere:visibleBlue(B),Green(G),Red(R),andNearInfraRed(NIR);20mbands

wereVegetationRedEdgebands,NarrowNIR,andtwoShortWaveInfraRed(SWIR)

bands;and60mbandswereCoastalAerosol,Water,VapourandSWIRCirrusbands(Ta‐

ble1)[26].

Table1.ListofSentinel‐2bandsusedforclassificationasTestData1.Thistableiscreatedusingthe

dataprovidedinSentinel‐2MSIUserGuide[26]document.

BandLabelGSDResolution(m)Wavelength(nm)

B02Blue10457–522

B03Green10542–577

B04Red10647–682

B05Red‐edge120697–712

B06Red‐edge220732–747

B07Red‐edge20773–793

B08Near‐infrared(NIR)10784–899

B8ANear‐infrarednarrow(NIRn)20855–875

B10Shortwaveinfrared/Cirrus601360–1390

B11Shortwaveinfrared1(SWIR1)201565–1655

Datapreparationincludedseveralsub‐proceduresthatwereexecutedonSentinel‐2

bands.
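Because the bands in Table 1 come at 10 m, 20 m and 60 m ground sampling distances, stacking them per pixel usually requires a common grid. The sketch below is a hypothetical illustration of one such step, nearest-neighbour upsampling to 10 m in plain Python. The source does not state at this point which sub-procedures or resampling method were used, so the target resolution, the method and the function name are assumptions.

```python
# Ground sampling distance (m) per band, as listed in Table 1.
BAND_GSD = {
    "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20,
    "B07": 20, "B08": 10, "B8A": 20, "B10": 60, "B11": 20,
}

# Assumption: all bands are brought to the finest (10 m) grid.
TARGET_GSD = 10


def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (nearest neighbour)."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


# Integer upsampling factor needed for each band.
factors = {band: gsd // TARGET_GSD for band, gsd in BAND_GSD.items()}
print(factors)

# Made-up 2 x 2 tile of a 20 m band, resampled to the 10 m grid.
tile_20m = [[1, 2], [3, 4]]
tile_10m = upsample_nearest(tile_20m, factors["B05"])
for row in tile_10m:
    print(row)
```

Nearest-neighbour resampling keeps the original reflectance values unchanged, which is one reason it is often chosen before classification; interpolating methods would be an equally plausible alternative.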

Figure 2. Vranje and its surroundings: area of interest. Map created using ESA remote sensing data and © OpenStreetMap contributors open data [22–24].
