Table 4. Comparison of Jaccard scores among submissions and baseline.

Source publication
Article
Full-text available
Ecology has reached the point where data science competitions, in which multiple groups solve the same problem using the same data by different methods, will be productive for advancing quantitative methods for tasks such as species identification from remote sensing images. We ran a competition to help improve three tasks that are central to conve...

Context in source publication

Context 1
... submissions performed well below the optimal score, but well above the baseline prediction. The highest-performing method, as determined by the Jaccard scoring function, achieved a score of 0.3402 (Table 4). In comparison, our baseline system achieved a score of only 0.0863. ...
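The Jaccard score referenced here is the intersection-over-union between a predicted and a reference crown area, ranging from 0 (no overlap) to 1 (perfect overlap). A minimal sketch of the metric, assuming crowns are represented as polygons (illustrative only, not the competition's official scoring code):

```python
# Minimal sketch of a Jaccard (intersection-over-union) score between a
# predicted and a reference tree-crown polygon. Illustrative only, not the
# competition's official scoring implementation; it assumes crowns are
# available as shapely geometries.
from shapely.geometry import Polygon

def jaccard_score(predicted: Polygon, reference: Polygon) -> float:
    """Return area(intersection) / area(union) for two crown polygons."""
    union_area = predicted.union(reference).area
    if union_area == 0:
        return 0.0
    return predicted.intersection(reference).area / union_area

# Example: two partially overlapping square crowns.
pred = Polygon([(0, 0), (0, 4), (4, 4), (4, 0)])
ref = Polygon([(2, 0), (2, 4), (6, 4), (6, 0)])
print(f"Jaccard score: {jaccard_score(pred, ref):.3f}")  # 0.333
```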

Citations

... Scholl et al. (2020) followed with a four species model at the Niwot Ridge NEON site in Colorado (NIWO). Two data science competitions, one focused on the OSBS site (Marconi et al., 2019) and the second combining data from the OSBS, Talladega, Alabama (TALL) and Mountain Lake, Virginia (MLBS) sites in the Southeastern United States, used NEON forest plot data with 33 species, of which only 15 had more than five individuals. The lack of data for less common species was the primary factor in poor model performance. ...
Article
Full-text available
Measuring forest biodiversity using terrestrial surveys is expensive and can only capture common species abundance in large heterogeneous landscapes. In contrast, combining airborne imagery with computer vision can generate individual tree data at the scales of hundreds of thousands of trees. To train computer vision models, ground‐based species labels are combined with airborne reflectance data. Due to the difficulty of finding rare species in a large landscape, many classification models only include the most abundant species, leading to biased predictions at broad scales. For example, if only common species are used to train the model, this assumes that these samples are representative across the entire landscape. Extending classification models to include rare species requires targeted data collection and algorithmic improvements to overcome large data imbalances between dominant and rare taxa. We apply a targeted sampling workflow at the Ordway Swisher Biological Station within the US National Ecological Observatory Network (NEON), where traditional forestry plots had identified six canopy tree species with more than 10 individuals at the site. Combining iterative model development with rare species sampling, we extend a training dataset to include 14 species. Using a multi‐temporal hierarchical model, we demonstrate the ability to include species predicted at <1% frequency in the landscape without losing performance on the dominant species. The final model has over 75% accuracy for 14 species with improved rare species classification compared to 61% accuracy of a baseline deep learning model. After filtering out dead trees, we generate landscape species maps of individual crowns for over 670 000 individual trees. We find distinct patches of forest composed of rarer species at the full‐site scale, highlighting the importance of capturing species diversity in training data. We estimate the relative abundance of 14 species within the landscape and provide three measures of uncertainty to generate a range of counts for each species. For example, we estimate that the dominant species, Pinus palustris, accounts for c. 28% of predicted stems, with models predicting a range of counts between 160 000 and 210 000 individuals. These maps provide the first estimates of canopy tree diversity within a NEON site to include rare species and provide a blueprint for capturing tree diversity using airborne computer vision at broad scales.
... Just as the proposal of the ImageNet dataset [18] has greatly advanced the development of deep learning techniques, many researchers in the field of remote sensing image-based SDM research have started to produce and open-source their own datasets for comparing the performance and accuracy of different models, while advancing the technology in the field. Marconi et al. (2019) held a competition for tree species classification based on remote sensing images, which offered three tracks on canopy segmentation, tree alignment, and species classification, contributing to the development of remote sensing data for ecological and biological methods [19]. Lorieul et al. (2020) organized a competition called GeoLifeCLEF to study the relationship between the environment and the possible occurrence of species; its dataset collected 1.9 million observations with their corresponding remote sensing data and is the largest open-source dataset for studying species distribution [20]. ...
Article
Full-text available
Species distribution models (SDMs) are critical in conservation decision-making and ecological or biogeographical inference. Accurately predicting species distribution can facilitate resource monitoring and management for sustainable regional development. Currently, species distribution models usually use a single source of information as input for the model. To determine a solution to the lack of accuracy of the species distribution model with a single information source, we propose a multimodal species distribution model that can input multiple information sources simultaneously. We used ResNet50 and Transformer network structures as the backbone for multimodal data modeling. The model’s accuracy was tested using the GEOLIFE2020 dataset, and our model’s accuracy is state-of-the-art (SOTA). We found that the prediction accuracy of the multimodal species distribution model with multiple data sources of remote sensing images, environmental variables, and latitude and longitude information as inputs (29.56%) was higher than that of the model with only remote sensing images or environmental variables as inputs (25.72% and 21.68%, respectively). We also found that using a Transformer network structure to fuse data from multiple sources can significantly improve the accuracy of multimodal models. We present a novel multimodal model that fuses multiple sources of information as input for species distribution prediction to advance the research progress of multimodal models in the field of ecology.
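The fusion strategy summarized above can be sketched as follows: an image backbone (e.g., ResNet50) encodes the remote sensing patch, the environmental variables and coordinates are projected into the same embedding space, and a Transformer encoder mixes the resulting tokens before classification. This is a hedged reconstruction for illustration only, not the authors' released code; all dimensions and layer choices are assumptions.

```python
# Illustrative sketch of a multimodal species-distribution classifier that
# fuses an image backbone with tabular inputs via a Transformer encoder.
# NOT the authors' implementation; dimensions and layer names are assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultimodalSDM(nn.Module):
    def __init__(self, n_env_vars: int, n_species: int, d_model: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)          # image encoder
        backbone.fc = nn.Identity()                # keep 2048-d pooled features
        self.image_encoder = backbone
        self.image_proj = nn.Linear(2048, d_model)
        self.env_proj = nn.Linear(n_env_vars, d_model)   # environmental variables
        self.coord_proj = nn.Linear(2, d_model)          # latitude / longitude
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_species)

    def forward(self, image, env_vars, coords):
        tokens = torch.stack([                      # one token per modality
            self.image_proj(self.image_encoder(image)),
            self.env_proj(env_vars),
            self.coord_proj(coords),
        ], dim=1)                                   # shape: (batch, 3, d_model)
        fused = self.fusion(tokens).mean(dim=1)     # pool across modalities
        return self.classifier(fused)

model = MultimodalSDM(n_env_vars=27, n_species=100)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 27), torch.randn(2, 2))
print(logits.shape)  # torch.Size([2, 100])
```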
... High spectral variance at the edges of ITCs is typically caused by contamination of edge pixels with ground cover or neighboring trees [60]. To address these issues, researchers have used a variety of data including hyperspectral [25,61–63], lidar [27,63], and temporal image stacks [64] to segment tree ITCs (also see [29]). Where there are no gaps between ITCs, most segmentation routines cannot separate them using spectral data alone. ...
Article
A common forest restoration goal is to achieve a spatial distribution of trees consistent with historical forest structure, which can be characterized by the distribution of individuals, clumps, and openings (ICO). With the stated goal of restoring historical spatial patterns comes a need for effectiveness monitoring at appropriate spatial scales. Airborne light detection and ranging (LiDAR) can be used to identify individual tree locations and collect data at landscape scales, offering a method of analyzing tree spatial distributions over the scales at which forest restoration is conducted. In this study, we investigated whether tree locations identified by airborne LiDAR data can be used with existing spatial analysis methods to quantify ICO distributions for use in restoration effectiveness monitoring. Results showed fewer large clumps and large openings, and more small clumps and small openings relative to historical spatial patterns, suggesting that the methods investigated in this study can be used to monitor whether restoration efforts are successful at achieving desired tree spatial patterns. Study Implications: Achieving a desired spatial pattern is often a goal of forest restoration. Monitoring for spatial pattern, however, can be complex and time-consuming in the field. LiDAR technology offers the ability to analyze spatial pattern at landscape scales. Preexisting methods for evaluation of the distribution of individuals, clumps, and openings were used in this study along with LiDAR individual tree detection methodology to assess whether a forest restoration project implemented in a Southern Oregon landscape achieved desired spatial patterns.
... In densely forested areas neighboring tree crowns tend to overlap, may be similar in appearance, and are often multistoried, making it difficult to tell where one tree ends and another begins. This is further complicated by the unpredictable growth patterns of trees; in short, tree crowns grow in irregular shapes [10]. Adding to this, optical remote sensing technology, which is commonly used for forest remote sensing, can suffer from limited pixel resolution and noise. ...
Preprint
Full-text available
Automated individual tree crown (ITC) delineation plays an important role in forest remote sensing. Accurate ITC delineation benefits biomass estimation, allometry estimation, and species classification among other forest related tasks, all of which are used to monitor forest health and make important decisions in forest management. In this paper, we introduce Neuro-Symbolic DeepForest, a convolutional neural network (CNN) based ITC delineation algorithm that uses a neuro-symbolic framework to inject domain knowledge (represented as rules written in probabilistic soft logic) into a CNN. We create rules that encode concepts for competition, allometry, constrained growth, mean ITC area, and crown color. Our results show that the delineation model learns from the annotated training data as well as the rules and that under some conditions, the injection of rules improves model performance and affects model bias. We then analyze the effects of each rule on its related aspects of model performance.
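The core idea, combining a data-driven delineation loss with soft, rule-based penalties, can be illustrated schematically. The sketch below adds a single "mean ITC area" prior to a crown-mask loss; it is a deliberate simplification of rule injection, not the Neuro-Symbolic DeepForest / probabilistic soft logic implementation, and all constants are assumed.

```python
# Schematic sketch of combining a data-driven loss with a soft, rule-based
# penalty (here: predicted crown areas should stay near a mean ITC area).
# A simplified illustration of the rule-injection idea, not the authors'
# probabilistic-soft-logic framework; MEAN_CROWN_AREA and RULE_WEIGHT are
# assumed values.
import torch

MEAN_CROWN_AREA = 30.0  # assumed prior crown area (pixels)
RULE_WEIGHT = 0.1       # assumed trade-off between data fit and the rule

def rule_regularized_loss(pred_masks, true_masks):
    """Binary crown-mask loss plus a soft penalty for implausible crown areas."""
    data_loss = torch.nn.functional.binary_cross_entropy(pred_masks, true_masks)
    predicted_areas = pred_masks.sum(dim=(1, 2))          # soft area per crown
    rule_penalty = ((predicted_areas - MEAN_CROWN_AREA) ** 2).mean()
    return data_loss + RULE_WEIGHT * rule_penalty

# Example with two 16x16 soft crown masks.
pred = torch.rand(2, 16, 16)
true = (torch.rand(2, 16, 16) > 0.5).float()
print(rule_regularized_loss(pred, true))
```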
... But, further improvement of the results would be expected if LiDAR data from the leaf-on and leaf-off phases had been considered in combination [10,47]. The fact that the generated MDG and MDA values of this variable are relatively low compared to the data from the multispectral mosaic (MDG = 18, MDA = 16) can possibly also be attributed to the underlying data preparation or crown segmentation, as is also the case in the studies of Marconi et al. [50]. ...
Article
Full-text available
For monitoring protected forest landscapes over time it is essential to follow changes in tree species composition and forest dynamics. Data-driven remote sensing methods provide valuable options if terrestrial approaches for forest inventories and monitoring activities cannot be applied efficiently due to restrictions or the size of the study area. We demonstrate how species can be detected at a single tree level utilizing a Random Forest (RF) model, using the Black Forest National Park as an example of a Central European forest landscape with complex relief. The classes were European silver fir (Abies alba, AA), Norway spruce (Picea abies, PA), Scots pine (Pinus sylvestris, PS), European larch (Larix decidua including Larix kaempferi, LD), Douglas fir (Pseudotsuga menziesii, PM), deciduous broadleaved species (DB) and standing dead trees (snags, WD). Based on a multi-temporal (leaf-on and leaf-off phenophase) and multi-spectral mosaic (R-G-B-NIR) with 10 cm spatial resolution, digital elevation models (DTM, DSM, CHM) with 40 cm spatial resolution and a LiDAR dataset with 25 pulses per m², 126 variables were derived and used to train the RF algorithm with 1130 individual trees. The main objective was to determine a subset of meaningful variables for the RF model classification on four heterogeneous test sites. Using feature selection techniques, mainly passive optical variables from the leaf-off phenophase were considered due to their ability to differentiate between conifers and the two broader classes. An examination of the two phenological phases (using the difference of the respective NDVIs) is important to clearly distinguish deciduous trees from other classes including snags (WD). We also found that the variables of the first derivative of NIR and the tree metrics play a crucial role in discriminating PA and PS. With this unique set of variables some classes can be differentiated more reliably, especially LD and DB but also AA, PA and WD, whereas difficulties exist in identifying PM and PS. Overall, the non-parametric object-based approach has proved to be highly suitable for accurate detection (OA: 89.5%) of the analyzed classes. Finally, the successful classification of the complex 265 km² study area substantiates our findings.
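One of the variables highlighted above, the difference between leaf-on and leaf-off NDVI, is straightforward to compute once per-tree band statistics are extracted. The sketch below pairs it with a scikit-learn Random Forest on synthetic data; it is a hedged illustration, not the authors' workflow, and the column names are placeholders.

```python
# Illustrative sketch: compute a per-tree NDVI difference between leaf-on and
# leaf-off mosaics and use it, with another variable, in a Random Forest
# classifier. Not the authors' workflow; column names and data are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red + 1e-9)

rng = np.random.default_rng(0)
n_trees = 500
trees = pd.DataFrame({
    "nir_leaf_on": rng.uniform(0.3, 0.6, n_trees),
    "red_leaf_on": rng.uniform(0.02, 0.1, n_trees),
    "nir_leaf_off": rng.uniform(0.1, 0.5, n_trees),
    "red_leaf_off": rng.uniform(0.02, 0.2, n_trees),
    "tree_height_m": rng.uniform(5, 40, n_trees),
    "species": rng.choice(["AA", "PA", "PS", "LD", "PM", "DB", "WD"], n_trees),
})
# Difference of the two phenophase NDVIs helps separate deciduous trees and snags.
trees["ndvi_diff"] = (ndvi(trees.nir_leaf_on, trees.red_leaf_on)
                      - ndvi(trees.nir_leaf_off, trees.red_leaf_off))

X = trees[["ndvi_diff", "tree_height_m"]]
y = trees["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"Held-out accuracy on synthetic data: {rf.score(X_test, y_test):.2f}")
```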
... Specific metrics such as tree height, crown height, canopy cover, vegetation area, and the structure of the canopy and understory have been measured by using terrestrial or aerial laser scanning and/or high-resolution orthophotos (Kosmala et al. 2016). In other studies, individual tree crowns have been delineated from imaging spectroscopy and LiDAR (Light Detection and Ranging) data using convolutional neural networks or rule-based region growing (e.g., watershed and local-maxima filtering approaches) (Pouliot et al. 2002, Zhou et al. 2018b, Dalponte et al. 2019, Marconi et al. 2019a, McMahon 2019, Weinstein et al. 2019). ...
Article
Full-text available
A core goal of the National Ecological Observatory Network (NEON) is to measure changes in biodiversity across the 30‐yr horizon of the network. In contrast to NEON’s extensive use of automated instruments to collect environmental data, NEON’s biodiversity surveys are almost entirely conducted using traditional human‐centric field methods. We believe that the combination of instrumentation for remote data collection and machine learning models to process such data represents an important opportunity for NEON to expand the scope, scale, and usability of its biodiversity data collection while potentially reducing long‐term costs. In this manuscript, we first review the current status of instrument‐based biodiversity surveys within the NEON project and previous research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We then survey methods that have been developed at other locations but could potentially be employed at NEON sites in future. Finally, we expand on these ideas in five case studies that we believe suggest particularly fruitful future paths for automated biodiversity measurement at NEON sites: acoustic recorders for sound‐producing taxa, camera traps for medium and large mammals, hydroacoustic and remote imagery for aquatic diversity, expanded remote and ground‐based measurements for plant biodiversity, and laboratory‐based imaging for physical specimens and samples in the NEON biorepository. Through its data science‐literate staff and user community, NEON has a unique role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their ability to help answer key ecological questions that cannot be answered at the more limited spatiotemporal scales of human‐driven surveys.
... Only 280 samples were selected for training and testing datasets as these samples were clearly visible in the CHM and hyperspectral data. Approximately two-thirds of tree locations were selected for training and one-third for validation (Duro, Franklin, and Dubé 2012; Breiman 2001; Lu et al. 2020; Marconi et al. 2019) (Table 3 and Figure 4). The training sets were applied to train the random forest classifier, and test sets to validate the object-based species classification. ...
... Mäyrä et al. (2021) found the tree matching rate ranged from 42.8% to 63.4%, and Lu et al. (2020) segmented only one species (larch) with an accuracy of 88.69%. Marconi et al. (2019) delineated, on average, 34% of individual tree crowns matching the manually delineated trees. We found most of the studies used 'trial-and-error' rule-based techniques (Moffett and Gorelick 2013; Ma et al. 2017) to visually select the final segments for classification purposes (Zhang et al. 2016; Man et al. 2020). ...
... The previous studies suggested that it was difficult to segment complex heterogeneous forests with overlapping branches. Classification accuracies depended on forest types and crown size distributions (Man et al. 2020; Marconi et al. 2019). The novelty of our study compared to previous research was to apply four combinations of datasets with a range of scales (5-45), shape, and compactness (0.05-0.95) parameters (Table 2) and explore the optimal tree crown segmentation in a complex wet eucalypt forest for species classification (Table 4). ...
Article
To sustainably manage forest biodiversity and monitor changes in species patterning, mapping the spatial distribution of tree species is indispensable. Remote sensing can provide powerful tools for mapping species, but this task is complex in areas with high plant diversity and multi-layered canopies. This paper addresses the issue of classifying wet eucalypt forest plants by examining tree crown segmentation and species classification using different combinations of remote sensing datasets against mapped tree locations. This study explores optimal segmentation parameters for tree crown delineation compared to manually digitized tree crowns. The best segmentation accuracy of 88.71% resulted from segmenting a combined Minimum Noise Fraction (MNF) dataset derived from hyperspectral imagery (HSI) and a LiDAR-derived Canopy Height Model (CHM). Object-based classification of tree species was performed using a random forest classifier. The fused dataset of MNF and CHM produced the highest overall accuracy of 78.26% for four vegetation classes, while the fused HSI, indices, and CHM performed best (66.67%) with five vegetation classes. However, both approaches had a high overall performance. The CHM contributed to tree crown segmentation and species classification accuracy, and fused datasets were more robust to spatially discriminate wet eucalypt forest species compared to a single dataset. Eucalyptus obliqua was classified with the highest accuracy of 90.86% for four classes using the fused MNF and CHM dataset, and 86.11% for five classes using the fused HSI, indices, and CHM dataset. An important understorey species – the tree fern (Dicksonia antarctica) – was classified with the highest accuracy of 83.54% for four classes using HSI. Therefore, fusing hyperspectral and LiDAR data could classify both the overstorey and dominant understorey species, and thus play a crucial role in identifying forest biological diversity. This approach will be useful for forest managers and ecologists to plan sustainable management of eucalypt forest biodiversity and produce maps for monitoring species of interest.
... The weight map also allowed an understanding of the uniformly shaped areas of the forest, which could be helpful when creating the boundaries of forest inventory stands. Other authors [27][28][29][30] noted the technological possibilities of applying both machine learning and remote sensing data processing methods to classify vegetation and obtain forest ecological information. After constructing the weight map, we proceeded to place objects by superimposing tree coordinate point features from map A1 (Figure 3) over point features from modeled region B1 (Figure 4). ...
... The weight map also allowed an understanding of the uniformly shaped areas of the forest, which could be helpful when creating the boundaries of forest inventory stands. Other authors [27][28][29][30] noted the technological possibilities of applying both machine learning and remote sensing data processing methods to classify vegetation and obtain forest ecological information. The next step was to add tree species models. ...
Article
Full-text available
This article discusses the process of creating a digital forest model based on remote sensing data, three-dimensional modeling, and forest inventory data. Remote sensing data of the Earth provide a fundamental tool for integrating subsequent objects into a digital forest model, enabling the creation of an accurate digital model of a selected forest quarter by using forest inventory data in educational and experimental forestry, and providing a valuable and extensive database of forest characteristics. The formalization and compilation of technologies for connecting forest inventory databases and remote sensing data with the construction of three-dimensional tree models for a dynamic display of changes in forests provide an additional source of data for obtaining new knowledge. The quality of forest resource management can be improved by obtaining the most accurate details of the current state of forests. Using machine learning and regression analysis methods as part of a digital model, it is possible to visually assess the course of planting growth, changes in species composition, and other morphological characteristics of forests. The goal of digital, interactive forest modeling is to create virtual simulations of the future status of forests using a combination of predictive forest inventory models and machine learning technology. The research findings provide a basic idea and technique for developing local digital forest models based on remote sensing and data integration technologies.
... By participating in an open competition, teams are encouraged to innovate and accelerate their computational methods development (Carpenter, 2011). An earlier iteration of this competition used NEON data from a single forest to convert images into information on individual trees (Marconi et al., 2019), while this 2020 competition used data from three sites to compare how transferable teams' methods were to unseen sites. Classifier transferability to out-of-sample spatial, temporal, and geographic regions is particularly important in cases where data are limited (Wu et al., 2006;Moon et al., 2017). ...
Article
Full-text available
Airborne remote sensing offers unprecedented opportunities to efficiently monitor vegetation, but methods to delineate and classify individual plant species using the collected data are still actively being developed and improved. The Integrating Data science with Trees and Remote Sensing (IDTReeS) plant identification competition openly invited scientists to create and compare individual tree mapping methods. Participants were tasked with training taxon identification algorithms based on two sites, to then transfer their methods to a third unseen site, using field-based plant observations in combination with airborne remote sensing image data products from the National Ecological Observatory Network (NEON). These data were captured by a high resolution digital camera sensitive to red, green, blue (RGB) light, hyperspectral imaging spectrometer spanning the visible to shortwave infrared wavelengths, and lidar systems to capture the spectral and structural properties of vegetation. As participants in the IDTReeS competition, we developed a two-stage deep learning approach to integrate NEON remote sensing data from all three sensors and classify individual plant species and genera. The first stage was a convolutional neural network that generates taxon probabilities from RGB images, and the second stage was a fusion neural network that “learns” how to combine these probabilities with hyperspectral and lidar data. Our two-stage approach leverages the ability of neural networks to flexibly and automatically extract descriptive features from complex image data with high dimensionality. Our method achieved an overall classification accuracy of 0.51 based on the training set, and 0.32 based on the test set which contained data from an unseen site with unknown taxa classes. Although transferability of classification algorithms to unseen sites with unknown species and genus classes proved to be a challenging task, developing methods with openly available NEON data that will be collected in a standardized format for 30 years allows for continual improvements and major gains for members of the computational ecology community. We outline promising directions related to data preparation and processing techniques for further investigation, and provide our code to contribute to open reproducible science efforts.
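A minimal sketch of the two-stage design described above (a CNN turns an RGB crop into taxon probabilities, and a fusion network learns to combine those probabilities with hyperspectral and lidar features) is given below. It is illustrative only, uses placeholder dimensions, and is not the IDTReeS submission code.

```python
# Minimal sketch of a two-stage classifier: stage 1 turns an RGB crop into
# taxon probabilities, stage 2 fuses those probabilities with hyperspectral
# and lidar features. Illustrative only; class counts, band counts, and
# layer sizes are assumed, not taken from the authors' code.
import torch
import torch.nn as nn

N_TAXA = 20        # assumed number of taxon classes
N_HSI = 369        # assumed number of hyperspectral bands
N_LIDAR = 8        # assumed number of lidar-derived structure metrics

class RGBStage(nn.Module):
    """Stage 1: small CNN mapping an RGB crop to taxon probabilities."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, N_TAXA)

    def forward(self, rgb):
        return torch.softmax(self.head(self.features(rgb)), dim=1)

class FusionStage(nn.Module):
    """Stage 2: combine stage-1 probabilities with hyperspectral and lidar data."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(N_TAXA + N_HSI + N_LIDAR, 128), nn.ReLU(),
            nn.Linear(128, N_TAXA),
        )

    def forward(self, rgb_probs, hsi, lidar):
        return self.mlp(torch.cat([rgb_probs, hsi, lidar], dim=1))

stage1, stage2 = RGBStage(), FusionStage()
rgb = torch.randn(4, 3, 64, 64)
logits = stage2(stage1(rgb), torch.randn(4, N_HSI), torch.randn(4, N_LIDAR))
print(logits.shape)  # torch.Size([4, 20])
```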