Torres-Lomas et al. 2024 | https://doi.org/10.34133/plantphenomics.0202 1
RESEARCH ARTICLE
Segment Anything for Comprehensive
Analysis of Grapevine Cluster Architecture
and Berry Properties
Efrain Torres-Lomas1, Jimena Lado-Bega2, Guillermo Garcia-Zamora1,
and Luis Diaz-Garcia1*
1Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA. 2Soil and
Water Department, Universidad de la Republica, Montevideo 11400, Uruguay.
*Address correspondence to: diazgarcia@ucdavis.edu
Grape cluster architecture and compactness are complex traits influencing disease susceptibility, fruit
quality, and yield. Evaluation methods for these traits include visual scoring, manual methodologies, and
computer vision, with the latter being the most scalable approach. Most of the existing computer vision
approaches for processing cluster images often rely on conventional segmentation or machine learning
with extensive training and limited generalization. The Segment Anything Model (SAM), a novel foundation
model trained on a massive image dataset, enables automated object segmentation without additional
training. This study demonstrates out-of-the-box SAM’s high accuracy in identifying individual berries
in 2-dimensional (2D) cluster images. Using this model, we managed to segment approximately 3,500
cluster images, generating over 150,000 berry masks, each linked with spatial coordinates within their
clusters. The correlation between human-identified berries and SAM predictions was very strong (Pearson’s
r2 = 0.96). Although the visible berry count in images typically underestimates the actual cluster berry
count due to visibility issues, we demonstrated that this discrepancy could be adjusted using a linear
regression model (adjusted R2 = 0.87). We emphasized the critical importance of the angle at which the
cluster is imaged, noting its substantial effect on berry counts and architecture. We proposed different
approaches in which berry location information facilitated the calculation of complex features related to
cluster architecture and compactness. Finally, we discussed SAM’s potential integration into currently
available pipelines for image generation and processing in vineyard conditions.
Introduction
Grape cluster architecture and compactness are important fruit traits that influence yield, quality, and susceptibility to pests and diseases [1]. Cluster architecture is directly related to cluster compactness, which describes the ratio between the volume occupied by berries and the total cluster volume [2]. In other words, cluster architecture determines the arrangement of berries in a cluster and the distribution of free space. Cluster architecture is complex, difficult to measure quantitatively, and determined by many factors such as berry number, size, shape, and spatial location, which all relate to the rachis ramification patterns [3]. While certain features of cluster architecture can be discerned by looking at the cluster contour, a more precise analysis requires the identification and spatial localization of the individual berries within the cluster. Cluster architecture and compactness are determined genetically, as many genomic regions have been associated with trait variation [2–6]. However, environmental factors such as temperature, humidity, nutrient availability, and vineyard management, among others, are known to alter cluster architecture and compactness directly or indirectly [1,2,7,8].

Understanding the factors that influence cluster architecture and compactness, and to what extent they do so, has implications for vineyard management, breeding, and genetics research. For example, high cluster compactness has been associated with increased susceptibility to Botrytis bunch rot caused by Botrytis cinerea [9–11]. This, in turn, has implications in terms of vineyard management and cultivar preference, since fungicide applications can better reach berries within the cluster in the case of a more open, looser cluster. Furthermore, there is greater temperature variability between the inner and outer berries in densely compacted clusters, impacting the maturation rate [8]. Additionally, restricted sun exposure to berries has been observed to intensify powdery mildew infections [12], thereby influencing fungicide application scheduling.
Citation: Torres-Lomas E, Lado-Bega J, Garcia-Zamora G, Diaz-Garcia L. Segment Anything for Comprehensive Analysis of Grapevine Cluster Architecture and Berry Properties. Plant Phenomics 2024;6:Article 0202. https://doi.org/10.34133/plantphenomics.0202

Submitted 16 February 2024; Accepted 24 May 2024; Published 27 June 2024

Copyright © 2024 Efrain Torres-Lomas et al. Exclusive licensee Nanjing Agricultural University. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0).

Exploring cluster architecture and compactness has been the focus of several studies utilizing qualitative and quantitative methods. Among qualitative approaches, researchers primarily rely on the OIV descriptors, a set of definitions established by the International Organization of Vine and Wine. For instance, the descriptor OIV 204, which addresses cluster density or compactness, categorizes grape clusters into 5 classifications ranging from very loose to very dense. Similarly, cluster architecture can be described using a combination of OIV 208, bunch shape (cylindrical, conical, and funnel-shaped), and OIV 209, number of wings of the primary bunch (ranging from 1 to 6 or more). Classifying clusters based on OIV descriptors often involves considering multiple characteristics simultaneously, which, while providing a comprehensive assessment, can be challenging to replicate and scale. For example, Richter et al. [2] studied an F1 mapping population derived from crossing GF.GA-47-42 and Villard Blanc. Their study involved manually recording individual cluster and berry traits (e.g., berry number, cluster weight, rachis size and architecture, and shoulder length, among others), which accounted for approximately half of the observed variation compared to using the OIV 204 descriptor alone. This emphasizes the complexity of cluster architecture and how it is influenced by various individual characteristics, including cluster compactness.
Computer vision approaches can also be used to analyze cluster architecture and compactness. In this case, available methods involve 2-dimensional (2D) image analysis and 3D modeling, which all have the capability of producing quantitative traits. In many cases, the utilization of quantitative traits derived from these imaging approaches has proven to be more effective in genetics research and breeding compared to categorical traits [13]. Depending on the algorithm, some studies have focused on berry detection while others have focused only on whole cluster analysis. For example, conventional segmentation on cluster images generated in the lab has been used to assess berry color and cluster architecture [4,14]. Cluster images generated directly in the vineyard have also been used for cluster identification and yield estimation using a variety of methods; however, prediction accuracy has varied because of challenging light conditions or occlusion [15–17]. Identifying and localizing berries within the cluster is crucial for determining cluster architecture and compactness. In this context, several approaches have been tested, including robotic laser scanning systems to reconstruct 3D representations of clusters and generate precise data regarding the 3D location of berries in a cluster [18]. Likewise, x-ray tomography has been employed to scan grapevine inflorescences and model berry growth and infer phylogenetic relationships [19]. Partial 3D models of grape clusters have also been generated using stereo-vision, which, in turn, allows berry counting [20]. Some other methodologies allow the estimation of berry numbers from images taken directly in the field. For example, in the work of Luo et al. [21], the model developed allowed for an accurate prediction of berry counts in Niagara grapes, which are generally larger than most table and wine grapes. Neural networks have also been applied for berry segmentation and counting, and although they produced very accurate estimates, they were only used on very immature clusters with limited berry growth, low compactness, and sufficient contrast between berries [22]. Furthermore, other methods based on convolutional neural networks and semantic segmentation have shown accurate estimations of berry numbers in field images, which might be of great utility for, for example, yield prediction. However, using this information to conduct cluster analysis is difficult, as the identified berries are not assigned to clusters [23].
Many of the image analysis-based methods used to describe cluster architecture and compactness relied on traditional segmentation methods. These methods often depend on labor-intensive, customized functions, manually engineered features, and error-prone thresholding designed for specific scenarios. As an alternative, deep learning models for image analysis, with their ability to capture latent image features, have shown promise across various fields, including medicine, surveillance and security, agriculture, biometrics, environmental sciences, and remote sensing, among others. However, these models are typically designed and trained for specific segmentation tasks, and unfortunately, their performance may substantially deteriorate when applied to new tasks, different image types, or varying external conditions. Large-scale foundation models have revolutionized artificial intelligence due to their remarkable zero-shot and few-shot generalization capabilities across a broad spectrum of downstream tasks [24,25]. Foundation models are neural networks trained on vast datasets using innovative learning methods and prompting objectives that generally do not require conventional supervised training labels, which makes them adaptable to a variety of external conditions [26]. The Segment Anything Model (SAM) is a new foundation model that can be used as a zero-shot segmentation method [27]. SAM can be used out of the box to segment a variety of objects in an image, or can be fine-tuned for a specific task, such as the recently developed MedSAM [28]. SAM was built on the largest segmentation dataset to date, with over 1 billion segmentation masks [27]. To segment an object, SAM requires the user to provide a prompt, which can take the form of a single point, a polygon (similar to a mask), a bounding box, or just text [26].

In this study, we demonstrated the capabilities of SAM to segment grape berries from 2D cluster images without additional model training or fine-tuning. Our research focused on 4 main aspects: (1) measuring the accuracy of SAM in identifying visible berries within a cluster image; (2) predicting hidden berries in a cluster image and assessing the impact of cluster imaging angle; (3) developing new quantitative methods to describe cluster architecture based on berry distributions within the clusters; and (4) assessing the repeatability of cluster architecture and compactness traits in replicated experiments.
Materials and Methods
Plant material
Cluster images obtained from an F1 mapping population (n = 139 genotypes) derived from crossing Cabernet Sauvignon and Riesling were used to test SAM. Both Cabernet Sauvignon and Riesling, major wine grape cultivars around the world, display contrasting cluster architectures. Cabernet Sauvignon clusters are small to medium in size, conical, loose to well-filled, and with medium-long peduncles. Its berries are small, round, and blue-black. Riesling has smaller clusters, which can be cylindrical or globular, and sometimes winged; clusters are compact and with short peduncles. Riesling berries are small and round and have a white-green skin coloration. This F1 progeny segregates for the traits mentioned above, making it an ideal candidate for evaluating the proposed pipeline. This population was planted at the UC Davis Experimental Station in Oakville, Napa County, CA, USA (38°25′45.4″N, 122°24′36.4″W), in 2017. Vines were arranged in a randomized complete block design with 3 blocks and 3 vines per experimental unit. For this study, one vine per experimental unit was sampled (the one in the middle). For each vine, 5 representative clusters were imaged as described below.
Image capture
Five representative clusters per vine were imaged using the setup shown in Fig. S1. The setup included a reference circle to normalize measurements and account for potential variation in the location of the camera relative to the cluster. The camera used was a Canon EOS 70D with a 24-mm prime lens, an aperture of f/5, and an exposure time of 1/500 s. Images were 5,472 × 3,648 pixels (~20 Mpx). All clusters were imaged from at least one angle. In addition, all the clusters from a subset of 99 vines were imaged from 3 additional angles (90°, 180°, and 270°). The latter were used to assess complex architectures that result from the presence of cluster ramifications or wings and that are visible only from specific view angles.

A second image dataset was generated to validate the SAM algorithm. This dataset consisted of cluster images, each one accompanied by an image of all the individual berries detached and individually placed on a white surface (Fig. S2).
Model and processing pipeline
The images described above, without editing their original brightness or contrast, were used as input for SAM. To reduce the number of pixels to be processed, a region of interest (ROI) was manually defined, as indicated in Fig. 1. The pretrained ViT-H (Huge version) image encoder was used for the segmentation phase (checkpoint available at https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth). The mask prediction was executed by applying the Automatic Mask Generator to the input, which was defined as the pixels within the ROI, and a prompt, described as an XY grid of points equally distributed across the ROI. Different grid configurations, including 4 × 4, 6 × 6, 8 × 8, and so on, up to 62 × 62, were explored and tested for efficiency. Both the number of masks and the area increased as the number of points in the grid increased, until reaching a plateau at around 20 to 25 points; after that, the increase was marginal. The number of masks still increased beyond this point (Fig. S3A and C), as more berries, mainly those partially hidden, were found. These berries, discovered at higher point densities, were of smaller sizes, as the increase in total area after reaching about 30 points was negligible (Fig. S3B). The marginal increase in the area or number of objects detected at higher grid densities is also likely due to SAM's reduction of image resolution, making smaller objects undetectable. To test this hypothesis, a zoomed-in image of a cluster with numerous skin features (e.g., spots, color variations, and damage) was processed using a 256 × 256 grid. This approach resulted in the detection of many smaller features (Fig. S4), emphasizing the need to process a smaller set of photos to optimize conditions. After these preliminary tests, a 32 × 32 grid was chosen as it captured most of the grape objects without unnecessary computational overhead. As a preliminary analysis, SAM was executed on graphics processing unit (GPU), Metal Performance Shaders (MPS), and central processing unit (CPU) platforms to compare any potential segmentation differences; however, only computation time was affected. The output produced by SAM comprised bounding boxes in XYWH format, area, predicted intersection over union (IoU), stability scores, and mask segments formatted as COCO Run-Length Encoding (RLE). The implementation of SAM, including ROI identification and automatic mask generation, was written in Python 3.11. The hardware tested was a g3.4xlarge AWS instance (single GPU, 16 GB RAM) and a System76 workstation (32 CPUs, 256 GB RAM). Details on specific dependencies are available in the following GitHub repository: https://github.com/diazgarcialab/SAM-cluster-segmentation.
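For illustration, the equally spaced point-grid prompt described above can be sketched in plain NumPy; this is roughly what SAM's Automatic Mask Generator produces internally through its points_per_side parameter. The function name and ROI arguments below are ours, not from the published pipeline.

```python
import numpy as np

def roi_point_grid(x0, y0, x1, y1, points_per_side=32):
    """Return evenly spaced prompt points across a rectangular ROI,
    as an array of (x, y) coordinates of shape (points_per_side**2, 2)."""
    xs = np.linspace(x0, x1, points_per_side)
    ys = np.linspace(y0, y1, points_per_side)
    gx, gy = np.meshgrid(xs, ys)  # 'xy' indexing: rows sweep y, columns sweep x
    return np.stack([gx.ravel(), gy.ravel()], axis=1)
```

With points_per_side=32, an ROI roughly 2,800 × 5,400 pixels yields separations on the order of the ~88 × ~171 pixels reported above.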
The RLE mask segments were decoded using pycocotools (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py) to derive the x and y coordinates of the mask contours and their position within the cluster. These coordinates were analyzed using the R package Momocs [29] to compute various parameters such as berry area, length, width, aspect ratio, perimeter, and color (represented as median red, green, and blue values). SAM is a segmentation tool rather than a classifier. As such, the segmented masks it produces may include, in addition to berries, other objects such as the clamp used to hold the clusters or the reference circle for size normalization. These objects can be easily identified and distinguished from berries due to their contrasting morphology and size, as described below. More often, some masks may encompass 2 or more berries, which were addressed using the IoU estimates. IoU is a metric used to evaluate the overlap between 2 bounding boxes or masks, commonly employed when assessing the accuracy of image segmentation models. In this study, IoU was calculated by determining the size of the overlapping region between 2 masks detected by SAM. For example, in instances where an overlapping mask covers 2 berries, each with its own mask, the overlapping mask will exhibit a larger size and IoU. Furthermore, filters based on criteria such as area, perimeter-to-area ratio, and aspect ratio were implemented to exclude objects other than berries. To refine the segmentation further, we employed a filtering approach using elliptical Fourier descriptors (EFDs) and principal component analysis (PCA) to eliminate non-berry objects, especially rachis parts. Initially, the x and y coordinates of objects were transformed into an "Out" object using the Momocs software, which facilitated the computation of EFD harmonic coefficients. These coefficients were then analyzed using PCA for visualization purposes, and outliers were identified through 5 rounds of outlier detection. Each round involved recalculating the harmonics and principal components with a cleaner dataset, adopting a threshold of ±2 standard deviations among the first 10 principal components.

Fig. 1. Summary of the pipeline employed for generating and processing SAM masks. Firstly, the region of interest (ROI) housing the cluster is identified. Subsequently, a grid of points separated by 88 × 171 pixels is utilized as input for object identification in SAM. Following this, masks undergo analysis based on various parameters including intersection over union (IoU), area, perimeter, length, width, aspect ratio, and elliptical Fourier descriptors (EFDs) to discern non-berry objects.
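The iterative ±2 SD outlier filtering can be sketched in plain NumPy. This is illustrative only: the paper performs this step in R with Momocs and re-runs the EFD/PCA each round, whereas the sketch below assumes a precomputed matrix of principal-component scores and recomputes only the mean and standard deviation per round, for brevity.

```python
import numpy as np

def iterative_outlier_filter(scores, n_rounds=5, sd_thresh=2.0):
    """Flag objects whose PC scores fall outside +/- sd_thresh standard
    deviations, repeating for n_rounds with the progressively cleaner set.
    `scores` is an (n_objects, n_components) array; returns a boolean keep mask."""
    keep = np.ones(len(scores), dtype=bool)
    for _ in range(n_rounds):
        kept = scores[keep]
        mu, sd = kept.mean(axis=0), kept.std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero on degenerate components
        within = np.all(np.abs((scores - mu) / sd) <= sd_thresh, axis=1)
        keep &= within  # once dropped, an object is never re-admitted
    return keep
```

Rachis fragments, with elongated outlines, tend to land far from the berry cloud in PC space and are discarded in the first rounds.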
Results
Characteristics of the model implementation and implementation time
The implementation of SAM on a population of 387 vines and 1,935 different clusters resulted in 215,090 masks. For 99 of the 387 vines, all the clusters were imaged 4 times, each time at a different angle (0°, 90°, 180°, and 270°), which resulted in 3,431 cluster images. The identified masks included, among other things, individual berries, 2 or more berries, the clamp used to hold the clusters in place, stains/discolorations in the background, the reference circle for size normalization, and rachis segments. This outcome is expected as SAM utilizes an algorithm for unsupervised object segmentation, and not classification, within an area of interest defined by the user. As a result of the filtering, 32,425 masks containing 2 or more berries were removed using IoU. Furthermore, since berries had an expected size and aspect ratio, 23,125 masks with significantly larger areas or aspect ratios, or located far from the cluster (stains in the background), were filtered out. Finally, the rest of the mask contours were analyzed with Momocs [29] using a combination of EFD and PCA, leading to the identification of 5,601 objects other than berries. After this filtering step, the number of true berry masks was 153,939 (61,151 masks discarded). Each cluster had, on average, 44.87 berries (median = 42). Berry number varied between 5 and 130, and variation showed a normal distribution (Fig. S5).
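The IoU-based removal of multi-berry masks can be illustrated with a minimal sketch in plain NumPy. The function names and the threshold are hypothetical, chosen for illustration rather than taken from the published code.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union between two boolean masks of identical shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return np.logical_and(a, b).sum() / union

def covers_multiple(mask, others, iou_thresh=0.25):
    """Flag a mask that overlaps substantially with 2 or more other masks,
    the signature of a single SAM mask spanning several berries."""
    hits = sum(mask_iou(mask, o) > iou_thresh for o in others)
    return hits >= 2
```

A mask flagged by covers_multiple would be discarded, keeping the individual berry masks it overlaps.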
Computation time per photo varied depending on the number of points in the grid used to initialize the object search, as well as the characteristics of the machine. In this study, we used a configuration of 32 points per side (32 × 32), resulting in a grid where points were horizontally separated by ~88 pixels and vertically by ~171 pixels. On average, processing a photo took 55 s using the CPU of the System76 workstation and 14 s with the GPU on the AWS g3.4xlarge instance. Increasing the grid density slightly improved the number of berries detected, although the increase was very marginal (Fig. S3). However, when the grid density was increased to 62 × 62 points (114 pixels of horizontal separation and 59 pixels of vertical separation), computation time increased to 4 min on the CPU and 45 s on the GPU.
2D cluster representations predict berry number and cluster size
Berry counts from clusters imaged at 4 different angles were compared with the number of berries determined manually. The "manual" determination of berries was conducted using 2 methods. The first involved humans counting visible berries in a subset of 100 images, and then comparing these counts with SAM predictions. The second involved processing additional images of 84 clusters from 17 vines where all the berries were detached and placed individually on a surface. The analysis of these images is straightforward since there is no touching among berries, and there exists good contrast between the berry and surface colors (Fig. S2). In addition to being used to determine the true number of berries, these images also allowed the comparison of berry size, assuming that the masks generated from isolated, uncompressed berries imaged from the top approximate well the real size of a berry.

As shown in Fig. 2A, the SAM algorithm does a very good job finding and segmenting all the berries in the cluster, independently of the angle at which it is imaged. The berries identified were either fully visible, represented as circles, or partially visible (Fig. 2B). The correlation between the berry number determined by humans and the SAM prediction was 0.96 (Fig. S6). There was also good agreement between SAM berry number predictions and the number of berries calculated from images with the individual berries (R² = 0.93, 5-fold cross-validation). However, there was a clear underestimation, which varied depending on the imaging angle (Fig. 2C). Overall, the underestimation was approximately 50% of the real number but linear. In symmetric clusters (e.g., cylindrical with no ramifications or wings), images from all 4 angles yielded similar berry counts. Conversely, in clusters with wings, the angles from which the wings were visible increased the berry count prediction. While the berry count was underestimated, a linear regression model of the form y ~ β0 + β1x was sufficient to adjust the prediction considerably well (adjusted R² = 0.8723), as long as the cluster image with the maximum number of berries (from the 4 images taken at different angles) was used in the model.
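The correction step amounts to an ordinary least-squares fit of true counts on the maximum visible count across the 4 angle views. The sketch below illustrates the idea; the data points are invented for demonstration and are not the measurements behind the reported coefficients.

```python
import numpy as np

# Hypothetical example values: maximum visible-berry count across the
# 4 angle views (x) and the true count from detached-berry images (y).
visible_max = np.array([35.0, 48.0, 60.0, 72.0, 90.0])
true_count = np.array([66.0, 95.0, 121.0, 150.0, 189.0])

b1, b0 = np.polyfit(visible_max, true_count, 1)  # slope, intercept
corrected = b0 + b1 * visible_max  # adjusted berry counts per cluster
```

Because the underestimation is roughly proportional, a single slope and intercept suffice to map visible counts onto true counts.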
Berry size (measured as projected berry area) was more challenging to predict (Fig. 2D). Predictions were mostly overestimations and varied significantly depending on the imaging angle. Most berries were between 120 and 150 mm², with just a few having smaller sizes (<100 mm²). Studying clusters with more variation in berry size might be required to better assess the correlation for this trait. Similar to berry counts, a linear model was fitted using all cluster views available for each cluster. Since the relationship appeared to be linear, the fitted values were consistent with the real size estimations (adjusted R² = 0.8457).
Cluster angle matters
Not all the berries in a cluster can be seen from a given angle; therefore, berry counts from 2D images were, as expected, underestimated (Fig. 2C). While cylindrical clusters are more common among cultivars, the presence of ramifications or wings, or other asymmetries, can impact the number of berries visible from a single view. To measure the effect of the imaging angle on the berry counts, 490 clusters from 99 vines were imaged from 4 different angles (0°, 90°, 180°, and 270°), and the berry counts and sizes were compared. In general, the berry count can vary by approximately ±50%, depending on the angle (Fig. 3A). As expected, opposing angles (0° and 180°, 90° and 270°) tend to have more similar results (Fig. 3B). In other words, when a cluster ramification or wing is fully visible from a given angle, it becomes invisible or hard to distinguish when the cluster is rotated 90°, and becomes fully visible again after another 90° rotation. Berry size was less dependent on the viewing angle (Fig. 3C); in general, berry size varied by about ±30%. The extent of the variation in berry count as a function of viewing angle is shown in Fig. 3D.
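The per-cluster angle comparison reduces to a percent change relative to the first (0°) view; a minimal sketch with hypothetical counts:

```python
import numpy as np

def pct_change_vs_first(counts):
    """Percent change in berry count at each angle relative to the first (0 deg) view."""
    c = np.asarray(counts, dtype=float)
    return (c - c[0]) / c[0] * 100.0

# Example (made-up counts at 0, 90, 180, and 270 degrees):
changes = pct_change_vs_first([40, 60, 38, 20])  # 0%, +50%, -5%, -50%
```

Clusters with a prominent wing show the alternating pattern described above, with opposing views closest to each other.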
Cluster architecture
A typical approach for measuring cluster architecture and compactness is based on whole cluster segmentation instead of berry segmentation (e.g., [4]). While this method provides insightful information and is easy to implement, it ignores the spatial distribution of berries within the cluster. Moreover, in the setup used for photographing clusters, it is common to use clamps, hooks, or clips to hang clusters, which can then be challenging to identify during image analysis or post-processing. In those cases, a common strategy is to crop the top of the image to remove such objects. When the peduncle is long, cropping the image does not affect the analysis; however, in clusters with short peduncles or prominent shoulders, cropping the image results in cropped berries as well. In this study, although the clamps (and other objects) were masked by SAM, because of their different colors, sizes, and shapes, they were easy to identify and remove.

[Figure 2 panels; fitted lines: berry counts, y = −13.38 + 2.25x; berry area, y = −12.63 + 0.93x]
Fig. 2. Prediction of berry number using SAM from cluster images. (A) Identification of individual berries from 4 angles on the same cluster. (B) Berry masks from cluster images in panel A, color-coded by angle view. (C) Correlation between real and predicted berry counts from SAM; predicted counts for each angle view in panel A are displayed. Points marked with an X represent corrected counts using the angle view with the maximum berries, adjusted with a linear model. (D) Correlation between real and predicted berry area; color and shape patterns are similar to panel C; corrected points were generated with a linear model of the form y ~ β0 + β1x. The vertical red line indicates a one-to-one relationship between variables.

[Figure 3 panels]
Fig. 3. Impact of imaging angle on cluster analysis. (A) Change in berry number relative to angle 1 (0°, first image); each green line represents a cluster imaged at 4 different angles. (B) Frequency plot of changes in berry number relative to angle 1, similar to panel A. (C) Frequency plot of changes in max berry area relative to angle 1. (D) Examples illustrating the effect of imaging angle on SAM-detected berry counts; each column represents a different cluster, and each row represents a different angle (0°, 90°, 180°, and 270°). The number of detected berries is indicated in each image. The first 4 clusters show little variation, while the last 4 exhibit extreme berry count variation.
To illustrate the capabilities of cluster architecture analysis using berry locations, empirical cumulative distribution functions were developed along the y-axis (from the top of the cluster, or the peduncle, to the bottom, or cluster tip) and the x-axis (from left to right). The distribution functions provided different levels of information. For example, they allowed the estimation of symmetry along both the x- and y-axes. With these symmetry estimators, cylindrical or globular clusters are expected to have a more uniform cumulative distribution. On the other hand, clusters with wings or significant ramifications will show a cumulative distribution along the x-axis skewed opposite to the main ramification.
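A minimal sketch of such a distribution function and a derived symmetry estimator follows; the function names and the specific estimator are illustrative, not the paper's exact definitions.

```python
import numpy as np

def ecdf(coords):
    """Empirical cumulative distribution of berry-center coordinates along one axis."""
    x = np.sort(np.asarray(coords, dtype=float))
    fn = np.arange(1, x.size + 1) / x.size
    return x, fn

def symmetry_score(coords):
    """Roughly 0 for a symmetric berry distribution along the axis;
    the sign indicates the direction of the skew."""
    v = np.asarray(coords, dtype=float)
    v = (v - v.min()) / (v.max() - v.min())  # normalize axis span to [0, 1]
    return float(np.mean(v)) - 0.5
```

A cluster with a wing on one side shifts berry mass toward that side, pulling the score away from 0, while a uniform cylindrical cluster yields a nearly straight diagonal ECDF.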
A cluster with a prominent wing, photographed from dif-
ferent angles, is provided as an example in Fig. 4A. At 0° and
180° views, the wing is not visible, as it is either in front and com-
pletely aligned with the main cluster or in the back. In this case,
the cluster appears more cylindrical and symmetrical along both
axes. e empirical cumulative distribution functions for these
2 views, shown as red and green dots in Fig. 4B and E, were more
uniform and appeared as straight diagonal lines. Conversely, at
90° and 270° views, the wing becomes visible and produces a
very skewed distribution along the x-axis. Since the 90° and 270°
views, and the 0° and 180° views, can be seen as “mirror” images,
the distribution functions in Fig. 4D and E also display this
mirroring feature.
Masks generated by SAM for each berry object were repre-
sented as x, y coordinates, and their corresponding polygons
were drawn, as shown in Figs. 2A and 3D. Combining all the
berry polygons produced a representation of entire clusters. When
a cluster has a cylindrical or globular shape, and no wings are
present, representing its shape is simple. However, when other
cluster features are present, such as wings, shoulders, and conical
forms, among others, the so-called cluster shape descriptor
can vary depending on how detailed these complex features are
represented.
For example, for a cluster with a prominent wing, such as the one
shown in Fig. 4A, should the outline (or contour) defining the
cluster shape include the sinus formed by the 2 wings? If so, how
far inside the sinus? The opposite approach would be to simply
Fig.4.Cumulative distributions of berry locations along the horizontal and vertical axes. (A) Example of berries identified in a cluster imaged from 4 different angles (0°, 90°, 180°, and
270°). (B) Empirical cumulative distributions along the x-axis for the 4 angle views; berry locations along the x-axis are shifted to start at 0. (C) Similar to panel B, but for the y-axis.
(D) Similar to panel B, but berry locations along the x-axis are scaled from 0 to 100 and sampled with n = 100. (E) Similar to panel D, but for berry locations along the y-axis.
Fig.5.What is cluster architecture? Example of concave hull calculation for different clusters (in columns) at different cluster shape definition levels (from top to bottom,
higher to lower definition); concave hulls are calculated on the union of all berry masks in the cluster.
connect the tips of the wing and the main cluster formation,
which would produce a simpler polygon. The same applies to
the presence of shoulders and curvatures along the cluster.
Figure 5 illustrates the same 8 clusters from Fig. 3D, outlined
using concave hulls with varying degrees of detail, from top to
bottom. In the top panels, the cluster outlines preserved detailed
features such as shoulders, indentations, and separations between
wings. Toward the bottom of the figure, most of these
features were lost.
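As the detail level of a concave hull decreases, the outline converges to the convex hull, the zero-detail limit in which all sinuses, wings, and shoulders are bridged over. The study computed concave hulls in R (sf::st_concave_hull); as a language-agnostic sketch of that limiting case, here is a pure-Python convex hull via Andrew's monotone chain (illustrative only, not the authors' implementation):

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull.
    The convex hull is the zero-detail limit of the concave-hull family
    used to outline clusters: every sinus and indentation is bridged."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, dropping the duplicated endpoints
    return lower[:-1] + upper[:-1]
```

An interior point (e.g., a berry centroid inside the cluster body) never appears among the hull vertices, which is exactly why low-detail outlines lose shoulders and wing separations.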
The approaches described above to measure cluster archi-
tecture were applied to all 3,431 cluster representations analyzed
in this study. Cumulative distribution functions for axes x and
y showed varying levels of asymmetry. Along the x-axis (Fig. 6A),
the asymmetry is due to having more berries on either the right
or the left side of the cluster (likely because of the presence of
a wing). For example, the green lines in Fig. 6A represent the
distribution functions of clusters in which more berries exist
on the left side of the cluster (as much as ~75%). Conversely,
the purple lines represent clusters with a larger accumulation
of berries on the right side of the cluster. Finally, gray lines
represent more symmetrical clusters, with an equal number of
berries on the left and right. Regarding the y-axis (Fig. 6B), most
of the asymmetry is toward the base of the cluster, which is
expected, as many clusters exhibit conical forms. Importantly,
the color assignments (i.e., categories) in Fig. 6A and B are sub-
jective and for illustrative purposes only.
Then, cluster shape variation was studied using the polygons
generated with concave hulls. The concave hulls were generated
using a conservative level of cluster feature preservation (using
the R function sf::st_concave_hull() with ratio = 5) but
with enough resolution to capture major asymmetries, wings,
and shoulders. In general, cluster shape exhibited a continuous
gradient of variability with no clear group formation (Fig. 6C
and D). In other words, no groups formed containing only,
for example, winged and non-winged clusters, or symmetric
and non-symmetric clusters. Instead, asymmetries can be small
and barely visible, and increase gradually in size and separation
from the main cluster. To understand which cluster features were
associated with each PC, 100 clusters with extreme PC scores
(the 50 most negative and 50 most positive) were plotted for PCs
1 to 4 (Fig. 6E). PC1, which explained 53.23% of the variation,
was associated with aspect ratio, with more circular/globular
clusters having more negative values and very elongated clusters
having more positive values. PC2, which explained 18.29% of the
variation, was associated with the location of the asymmetries
along the x-axis (either to the left or the right). Finally, PCs
3 and 4, which together accounted for a little less than 18%, explained
other, more complex features (wings and shoulders) that are
more difficult to discern.
Is the level of sensitivity to complex cluster
features meaningful?
The methodologies employed in this study for identifying ber-
ries within a cluster, counting them, studying their spatial dis-
tribution to generate cumulative distribution functions, and
Fig.6.Comprehensive analysis of cluster architecture using cumulative distribution function and PCA of concave hulls. Empirical cumulative distributions for 3,431 clusters
using berry locations along the (A) x- and (B) y-axes; berry locations along both x- and y-axes are scaled from 0 to 100 and sampled with n = 100, similar to Fig. 3D and E. In
both cases, the green lines correspond to distributions with a normalized coordinate 25 larger than 0.3 and a normalized coordinate 75 larger than 0.8; the purple lines have
a normalized coordinate 25 < 0.2 and a normalized coordinate 75 < 0.7; finally, the gray lines have a coordinate 25 between 0.2 and 0.3, coordinate 50 between 0.45 and
0.55, and coordinate 75 between 0.7 and 0.8. Variation in cluster architecture along principal components 1 and 2 (B) and 3 and 4 (C). In panel C, different colors and sizes
correspond to variations in principal components 3 and 4, respectively. Similarly, in panel D, point color and size correspond to variations in principal components 1 and 2,
respectively. (E) One hundred clusters sampled from the extremes of principal components 1 (green), 2 (gray), 3 (dark cyan), and 4 (salmon); the clusters in each color group
are ordered from left to right and by rows according to their corresponding principal component values.
applying PCA to examine cluster shape variation demonstrated
high sensitivity (Fig. 6). However, a critical question is: are these
features primarily driven by genetic variation, or are they sim-
ply a result of environmental and non-genetic factors?
The primary aim of this research was to implement SAM for
berry identification and propose methodologies for leveraging
this information in cluster architecture and compactness analysis.
Therefore, the focus was not on characterizing specific cultivars
or genotypes in the surveyed population but rather on sampling
diverse cluster variations. Nevertheless, as mentioned earlier, the
sampled vines are part of a mapping population between Riesling
and Cabernet Sauvignon, planted in a randomized complete
block design with 3 contiguous vines per genotype per block.
This design allowed the calculation of repeatability, expressed
as the percentage of genetic variance relative to the phenotypic
variance.
First, to assess the consistency of the phenotypes measured
in this study, boxplot graphs per genotype were examined for
18 variables. ese variables included basic descriptors such
as berry count, area, length, and width, all computed from the
berry masks identied by SAM. Additionally, cluster compact-
ness was calculated as the ratio between the sum of all berry
areas and the concave hull area. Using the empirical cumulative
distribution functions, the predicted percentage of berries at
x or y = 25, 50, and 75 was also determined. In terms of cluster
architecture based on concave hulls, PCs 1 and 2 were included.
Finally, cluster length, width, perimeter, and aspect ratio were
computed using the concave hulls.
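The compactness ratio defined above (sum of berry mask areas over concave-hull area) only needs polygon areas, which follow from the shoelace formula. A small illustrative sketch with hypothetical polygons (the paper's masks are SAM-derived coordinate lists of the same form):

```python
def polygon_area(vertices):
    """Polygon area via the shoelace formula; vertices in order, any orientation."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def compactness(berry_polygons, hull_polygon):
    """Cluster compactness: total berry mask area over concave-hull area."""
    berry_area = sum(polygon_area(p) for p in berry_polygons)
    return berry_area / polygon_area(hull_polygon)

# Hypothetical example: two unit-square "berries" inside a 2 x 2 hull -> 0.5
berries = [[(0, 0), (1, 0), (1, 1), (0, 1)],
           [(1, 0), (2, 0), (2, 1), (1, 1)]]
hull = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Note that the hull choice matters: a low-detail hull inflates the denominator and lowers the apparent compactness, which is why the authors fixed a conservative detail level before comparing clusters.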
Overall, variables such as berry count, area, length, and width,
as well as cluster area, length, width, and perimeter, showed good
consistency (Fig. 7A), high correlation (Fig. 7B), and medium-
to-high repeatability (Fig. 7C). While descriptors derived from
cumulative distributions showed a correlation among them-
selves, except for ECDF at x = 25 and y = 25, their variability
was higher, likely influenced by non-genetic sources given their
very low or zero repeatability. Cluster compactness demonstrated
little correlation with other traits but exhibited good consistency
with a repeatability of ~0.6. PC1 from the PCA conducted on
concave hulls, and related to cluster aspect ratio, also showed
good consistency and medium to high repeatability. In summary,
these analyses revealed that many variables computed from the
berry masks identied by SAM, along with others describing
more complex features in the cluster, possess a genetic compo-
nent. Nevertheless, certain variables, particularly those originat-
ing from empirical cumulative distribution functions, seem to
be strongly aected by variations in the environment.
Discussion
Several computational, image-based strategies have been imple-
mented to measure grapevine cluster architecture and compact-
ness. However, only a few have been utilized for identifying
Fig.7.Variability in berry and cluster characteristics. (A) Variation in berry characteristics and cluster architecture grouped by genotype; each genotype was replicated in 3
blocks; for each replicated vine, 5 clusters were sampled and imaged from 1 or 4 angles; genotypes are ordered by mean value, and names are omitted due to space constraints.
(B) Pearson’s correlation between traits. (C) Repeatability is calculated as the proportion of genetic variance relative to the phenotypic variance.
individual berries within clusters [18–21,23]. Most of these
strategies rely on non-generalizable mathematical and analyti-
cal frameworks for analyzing colored images. Humans can easily
discern individual berries in a cluster image, even when taken
in the field or under challenging light conditions. Therefore, it
is reasonable to assume that machine learning algorithms could
achieve similar capabilities. However, until now, these models
have primarily been applied to cluster identification rather than
berry identification. This does not rule out the potential use of
"conventional" deep learning approaches trained with human-
segmented berries, but they would require a substantial amount
of image labeling for training.
With the recent introduction of foundation models, par-
ticularly the SAM [27], objects of interest can be automatically
segmented without the need for additional training or fine-
tuning, at least for natural objects. In some specific cases, such
as in medical imaging, additional fine-tuning allows for more
accurate predictive models capable of analyzing many different
image types [28]. Here, we demonstrate that out-of-the-box
SAM can accurately segment berries in a 2D grape cluster
image with up to a 0.96 correlation (human berry counts vs.
SAM predictions on visible berries in 2D images; Fig. S5).
While one might argue that the segmented masks produced by
SAM in this study needed supervised classification to identify
berry objects exclusively, the implementation of filters (IoU,
size, area, EFD, and PCA) was straightforward. This approach
can be applied to hundreds, thousands, or even millions of
masks without any changes to the programming. A continuation
of this work could be the development of an automatic
classifier based on, for example, YOLO, that uses cropped
images based on bounding boxes generated by SAM.
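The size and IoU filters mentioned above can be illustrated with a minimal sketch on bounding boxes: drop masks outside a plausible berry size range, then drop near-duplicate masks by pairwise IoU. The thresholds below are hypothetical, and the paper's EFD and PCA shape filters are omitted for brevity:

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    x0 = max(a[0], b[0]); y0 = max(a[1], b[1])
    x1 = min(a[2], b[2]); y1 = min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_masks(boxes, min_area=100, max_area=10000, iou_thresh=0.8):
    """Keep boxes within a plausible berry-size range, then greedily
    drop near-duplicates (high IoU with an already-kept box)."""
    sized = [b for b in boxes
             if min_area <= (b[2] - b[0]) * (b[3] - b[1]) <= max_area]
    kept = []
    for b in sized:
        if all(box_iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

Because the rules are purely geometric, the same filter runs unchanged over thousands or millions of masks, which is the scalability point made above.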
Applying SAM to photos of clusters still on the vine is pos-
sible, but it would require further development, particularly in
regard to image preprocessing. This preprocessing step would
first need to identify clusters within vine images, which is fea-
sible with methods already available [16], and second, to remove
the background in cropped images containing clusters. Failing
to do this last step will cause SAM to segment non-berry objects,
such as leaves, trunks, or shoots (Fig. S4). In Fig. 8, we show 2
examples of how object removal could be performed on pre-
cropped images of clusters to further process them with SAM.
The same algorithm for berry detection and non-berry object
removal was used. One of the 2 models presented, BRBG (BRIA
Background Removal), is a background-removal tool available
at https://huggingface.co/briaai/RMBG-1.4. Although BRBG is
simple to use, it is not very customizable. For example, it does
not allow for defining the object of interest. However, it does
perform well at removing the background in images. The second
Fig.8.Example of 2 machine learning tools (RMBG and Depth Anything) for the preprocessing required to remove background before implementing SAM. Raw image taken
from https://fps.ucdavis.edu with permission.
model, Depth Anything [30], is used for monocular depth
estimation, which can be employed to remove backgrounds/
foregrounds based on depth. This serves only as an example of
how a future end-to-end pipeline for vineyard applications
might look. Our study aimed to showcase the capabilities of
zero-shot machine learning models that, despite their generaliza-
tion capabilities, perform well in specific situations, such as
berry segmentation. One of the important takeaways is that
researchers will have to spend less time on model training as
these models become more widely available.
Another consideration when deploying machine learning
tools for real-world applications (e.g., processing images directly
in the vineyard using a mobile device) is processing time. In
our study, the processing time per image was as fast as 14 s on
a GPU-powered machine, which was sufficient for our needs.
However, for large-scale applications and edge computing, other
SAM-like implementations, such as EdgeSAM [31], FastSAM
[32], and EfficientSAM [33], could be adopted.
Although the number of berries visible in a cluster image
underestimates the actual number of berries in a cluster, this
underestimation can be corrected using a linear regression model
(Fig. 2C). Moreover, to compensate for the variability in berry
number caused by ramifications of the rachis visible only
from certain angles, an additional image, for example, taken at
90°, can correct for any berry count underestimation. Notably,
the berry masks generated by SAM can be used for compre-
hensive cluster architecture analysis, which is only possible if
berries are spatially located.
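The visible-to-actual correction described above is an ordinary least-squares fit. A minimal sketch with entirely hypothetical calibration numbers (the paper's model and data differ; visible counts would come from SAM and actual counts from destructive sampling):

```python
def fit_line(x, y):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical calibration set: visible berries (image) vs. actual berries
visible = [40, 55, 60, 80, 95]
actual = [62, 85, 92, 122, 145]
a, b = fit_line(visible, actual)

# Corrected estimate for a new cluster with 70 visible berries
predicted_total = a + b * 70
```

In practice the fitted slope exceeds 1, reflecting that roughly a constant fraction of berries is hidden behind others at any given viewing angle.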
While some of the trait variation in cluster architecture and
compactness, particularly that captured by the analysis of
empirical cumulative distributions, was influenced by envi-
ronmental factors, these traits can still have applications in
determining vineyard management practices. For instance,
cluster thinning and/or tipping could be targeted toward asym-
metric or winged clusters, or those with specic architectures.
In table grape production, certain cluster architectures might
be more appealing to consumers [34]. For these types of appli-
cations to be feasible under field conditions, SAM would have
to be integrated into an existing pipeline that processes field
images obtained with cameras mounted on rovers or tractors.
In [16], for example, images from vines were acquired using a
sensing kit equipped with RGB cameras, and further processed
using YOLO to identify clusters within the image for yield
prediction. In this case, SAM could be incorporated into this
pipeline to compute additional variables regarding berry count,
size, and cluster architecture.
The observation that the cumulative distribution functions
(Fig. 6A and B) explaining cluster architecture showed lower
or zero repeatability is specic to the Riesling by Cabernet
Sauvignon population analyzed in this study. However, this
does not rule out the possibility that other mapping or breed-
ing populations display heritable variation for these traits.
Consequently, these traits could still be valuable for genetics
research or selection purposes in other mapping or breeding
populations.
Acknowledgments
The authors would like to thank Veronica Nunez, Jose Munoz,
Sadikshya Sharma, Yaniv Lupo, Hollywood Banayad, and Dan
Ng for their support during vineyard management, harvest,
and image annotation. The authors would also like to thank
Dario Cantu for providing access to the F1 population used
in this study.
Funding: This project was partially supported by USDA-
NIFA Specialty Crop Research Initiative Award No. 2022-
51181-38240.
Author contributions: E.T.-L. developed the proof of concept
and set up the computational workflow to implement SAM.
E.T.-L. and L.D.-G. conceived and designed the field experiment.
J.L.-B. and G.G.-Z. supported fieldwork and cluster imaging.
E.T.-L. and L.D.-G. wrote the manuscript.
Competing interests: e authors declare that they have no
competing interests.
Data Availability
All the data and code to reproduce the results of this study are
available at https://github.com/diazgarcialab/SAM-cluster-
segmentation.
Supplementary Materials
Figs. S1 to S6
References
1. Tello J, Ibáñez J. What do we know about grapevine bunch
compactness? A state-of-the-art review. Aust J Grape Wine Res.
2018;24(1):6–23.
2. Richter R, Gabriel D, Rist F, Töpfer R, Zyprian E.
Identification of co-located QTLs and genomic regions
affecting grapevine cluster architecture. Theor Appl Genet.
2019;132(4):1159–1177.
3. Correa J, Mamani M, Muñoz-Espinoza C, Laborie D, Muñoz C,
Pinto M, Hinrichsen P. Heritability and identification of QTLs
and underlying candidate genes associated with the architecture
of the grapevine cluster (Vitis vinifera L.). Theor Appl Genet.
2014;127(5):1143–1162.
4. Underhill A, Hirsch C, Clark M. Image-based phenotyping
identifies quantitative trait loci for cluster compactness in
grape. J Am Soc Hortic Sci. 2020;145(6):363–373.
5. Fanizza G, Lamaj F, Costantini L, Chaabane R, Grando MS.
QTL analysis for fruit yield components in table grapes (Vitis
vinifera). Theor Appl Genet. 2005;111(4):658–664.
6. Richter R, Rossmann S, Töpfer R, Theres K, Zyprian E. Genetic
analysis of loose cluster architecture in grapevine. BIO Web
Conf. 2017;9:01016.
7. Li-Mallet A, Rabot A, Geny L. Factors controlling
inflorescence primordia formation of grapevine: Their role in
latent bud fruitfulness? A review. Botany. 2016;94:147–163.
8. Pieri P, Zott K, Gomès E, Hilbert G. Nested effects of
berry half, berry and bunch microclimate on biochemical
composition in grape. OENO One. 2016;50:23.
9. Hed B, Ngugi HK, Travis JW. Relationship between cluster
compactness and bunch rot in Vignoles grapes. Plant Dis.
2009;93:1195–1201.
10. Vail ME, Wolpert JA, Gubler WD, Rademacher MR. Effect
of cluster tightness on Botrytis bunch rot in six Chardonnay
clones. Plant Dis. 1998;82(1):107–109.
11. Vail ME, Marois JJ. Grape cluster architecture and the
susceptibility of berries to Botrytis cinerea. Phytopathology.
1991;81:188–191.
12. Austin CN, Wilcox WF. Effects of sunlight exposure on
grapevine powdery mildew development. Phytopathology.
2012;102(9):857–866.
13. Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV,
Nascimento M, Nascimento ACC, Munoz PR. Using visual
scores for genomic prediction of complex traits in breeding
programs. Theor Appl Genet. 2023;137(1):9.
14. Underhill A, Hirsch CD, Clark MD. Evaluating and mapping
grape color using image-based phenotyping. Plant Phenomics.
2020;2020:8086309.
15. Font D, Tresanchez M, Martínez D, Moreno J, Clotet E,
Palacín J. Vineyard yield estimation based on the analysis of
high resolution images obtained with artificial illumination at
night. Sensors. 2015;15(4):8284–8301.
16. Olenskyj AG, Sams BS, Fei Z, Singh V, Raja PV, Bornhorst GM,
Earles JM. End-to-end deep learning for directly estimating
grape yield from ground-based imagery. Comput Electron
Agric. 2022;198:Article 107081.
17. Nuske S, Wilshusen K, Achar S, Yoder L, Narasimhan S,
Singh S. Automated visual yield estimation in vineyards. J Field
Robot. 2014;31(5):837–860.
18. Schöler F, Steinhage V. Automated 3D reconstruction of grape
cluster architecture from sensor data for efficient phenotyping.
Comput Electron Agric. 2015;114:163–177.
19. Li M, Klein LL, Duncan KE, Jiang N, Chitwood DH, Londo JP,
Miller AJ, Topp CN. Characterizing 3D inflorescence
architecture in grapevine using X-ray imaging and advanced
morphometrics: Implications for understanding cluster
density. J Exp Bot. 2019;70(21):6261–6276.
20. Ivorra E, Sánchez AJ, Camarasa JG, Diago MP, Tardaguila J.
Assessment of grape cluster yield components based on 3D
descriptors using stereo vision. Food Control. 2015;50:273–282.
21. Luo L, Liu W, Lu Q, Wang J, Wen W, Yan D, Tang Y. Grape
berry detection and size measurement based on edge
image processing and geometric morphology. Machines.
2021;9(10):233.
22. Aquino A, Diago MP, Millán B, Tardáguila J. A new
methodology for estimating the grapevine-berry number per
cluster using image analysis. Biosyst Eng. 2017;156:80–95.
23. Zabawa L, Kicherer A, Klingbeil L, Töpfer R, Kuhlmann H,
Roscher R. Counting of grapevine berries in images via
semantic segmentation using convolutional neural networks.
ISPRS J Photogramm Remote Sens. 2020;164:73–83.
24. Zhang Y, Jiao R. Towards segment anything model (SAM) for
medical image segmentation: A survey. arXiv. 2023. http://
arxiv.org/abs/2305.03678
25. Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan
Q, Peng H, et al. A comprehensive survey on pretrained
foundation models: A history from BERT to ChatGPT. arXiv.
2023. http://arxiv.org/abs/2302.09419
26. Mazurowski MA, Dong H, Gu H, Yang J, Konz N, Zhang Y.
Segment anything model for medical image analysis: An
experimental study. Med Image Anal. 2023;89:Article
102918.
27. Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L,
Xiao T, Whitehead S, Berg AC, Lo W-Y, et al. Segment
anything. arXiv. 2023. http://arxiv.org/abs/2304.02643
28. Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in
medical images. Nat Commun. 2024;15:654.
29. Bonhomme V, Picq S, Gaucherel C, Claude J. Momocs:
Outline analysis using R. J Stat Softw. 2014;56(13). doi:10.18637/
jss.v056.i13.
30. Yang L, Kang B, Huang Z, Xu X, Feng J, Zhao H. Depth
anything: Unleashing the power of large-scale unlabeled data.
arXiv. 2024. http://arxiv.org/abs/2401.10891
31. Zhou C, Li X, Loy CC, Dai B. EdgeSAM: Prompt-in-the-loop
distillation for on-device deployment of SAM. arXiv. 2023.
http://arxiv.org/abs/2312.06660
32. Zhao X, Ding W, An Y, Du Y, Yu T, Li M, Tang M, Wang
J. Fast segment anything. arXiv. 2023. http://arxiv.org/
abs/2306.12156
33. Xiong Y, Varadarajan B, Wu L, Xiang X, Xiao F, Zhu C, Dai
X, Wang D, Sun F, Iandola F, et al. EfficientSAM: Leveraged
masked image pretraining for efficient segment anything.
arXiv. 2023. http://arxiv.org/abs/2312.00863
34. Zhou J, Cao L, Chen S, Perl A, Ma H. Consumer-assisted
selection: The preference for new table grape cultivars in China.
Aust J Grape Wine Res. 2015;21(3):351–360.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often tailored to specific modalities or disease types, lack generalizability across the diverse spectrum of medical image segmentation tasks. Here we present MedSAM, a foundation model designed for bridging this gap by enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. We conduct a comprehensive evaluation on 86 internal validation tasks and 60 external validation tasks, demonstrating better accuracy and robustness than modality-wise specialist models. By delivering accurate and efficient segmentation across a wide spectrum of tasks, MedSAM holds significant potential to expedite the evolution of diagnostic tools and the personalization of treatment plans.
Article
Full-text available
Key message An approach for handling visual scores with potential errors and subjectivity in scores was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Abstract Most genomic prediction methods are based on assumptions of normality due to their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and analyzed as a Gaussian variable, thus violating the normality assumption, which could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of visual scores for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1–3 and 1–5) is the best strategy, even considering errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved when using Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions using simulated data are also applicable to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that investing in the evaluation of 600–1000 categorical data points with low error, when it is not feasible to collect continuous phenotypes, is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visual scores traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.
Article
Full-text available
Counting grape berries and measuring their size can provide accurate data for robot picking behavior decision-making, yield estimation, and quality evaluation. When grapes are picked, there is a strong uncertainty in the external environment and the shape of the grapes. Counting grape berries and measuring berry size are challenging tasks. Computer vision has made a huge breakthrough in this field. Although the detection method of grape berries based on 3D point cloud information relies on scanning equipment to estimate the number and yield of grape berries, the detection method is difficult to generalize. Grape berry detection based on 2D images is an effective method to solve this problem. However, it is difficult for traditional algorithms to accurately measure the berry size and other parameters, and there is still the problem of the low robustness of berry counting. In response to the above problems, we propose a grape berry detection method based on edge image processing and geometric morphology. The edge contour search and the corner detection algorithm are introduced to detect the concave point position of the berry edge contour extracted by the Canny algorithm to obtain the best contour segment. To correctly obtain the edge contour information of each berry and reduce the error grouping of contour segments, this paper proposes an algorithm for combining contour segments based on clustering search strategy and rotation direction determination, which realizes the correct reorganization of the segmented contour segments, to achieve an accurate calculation of the number of berries and an accurate measurement of their size. The experimental results prove that our proposed method has an average accuracy of 87.76% for the detection of the concave points of the edge contours of different types of grapes, which can achieve a good edge contour segmentation. 
The average accuracy of the detection of the number of grapes berries in this paper is 91.42%, which is 4.75% higher than that of the Hough transform. The average error between the measured berry size and the actual berry size is 2.30 mm, and the maximum error is 5.62 mm, which is within a reasonable range. The results prove that the method proposed in this paper is robust enough to detect different types of grape berries.
Article
Full-text available
Grape ( Vitis vinifera ) cluster compactness is an important trait due to its effect on disease susceptibility, but visual evaluation of compactness relies on human judgement and an ordinal scale that is not appropriate for all populations. We developed an image analysis pipeline and used it to quantify cluster compactness traits in a segregating hybrid wine grape ( Vitis sp.) population for 2 years. Images were collected from grape clusters immediately after harvest, segmented by color, and analyzed using a custom script. Both automated and conventional phenotyping methods were used, and comparisons were made between each method. A partial least squares (PLS) model was constructed to evaluate the prediction of physical cluster compactness using image-derived measurements. Quantitative trait loci (QTL) on chromosomes 4, 9, 12, 16, and 17 were associated with both image-derived and conventionally phenotyped traits within years, which demonstrated the ability of image-derived traits to identify loci related to cluster morphology and cluster compactness. QTL for 20-berry weight were observed between years on chromosomes 11 and 17. Additionally, the automated method of cluster length measurement was highly accurate, with a deviation of less than 10 mm ( r = 0.95) compared with measurements obtained with a hand caliper. A remaining challenge is the utilization of color-based image segmentation in a population that segregates for fruit color, which leads to difficulty in differentiating the stem from the fruit when the two are similarly colored in non-noir fruit. Overall, this research demonstrates the validity of image-based phenotyping for quantifying cluster compactness and for identifying QTL for the advancement of grape breeding efforts.
Article
Full-text available
Grape berry color is an economically important trait that is controlled by two major genes influencing anthocyanin synthesis in the skin. Color is often described qualitatively using six major categories; however, this is a subjective rating that often fails to describe variation within these six classes. To investigate minor genes influencing berry color, image analysis was used to quantify berry color using different color spaces. An image analysis pipeline was developed and utilized to quantify color in a segregating hybrid wine grape population across two years. Images were collected from grape clusters immediately after harvest and segmented by color to determine the red, green, and blue (RGB); hue, saturation, and intensity (HSI); and lightness, red-green, and blue-yellow values (L∗a∗b∗) of berries. QTL analysis identified known major QTL for color on chromosome 2 along with several previously unreported smaller-effect QTL on chromosomes 1, 5, 6, 7, 10, 15, 18, and 19. This study demonstrated the ability of an image analysis phenotyping system to characterize berry color and to more effectively capture variability within a population and identify genetic regions of interest.
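Deriving per-berry color values in an alternative color space starts from the RGB pixels of a segmented region. A small sketch using Python's standard colorsys module for the RGB-to-HSV conversion (HSV here is a stand-in for the HSI and L∗a∗b∗ conversions used in the study; the pixel values are hypothetical):

```python
import colorsys

def mean_hsv(pixels):
    """Average HSV over a list of (R, G, B) pixels given in the 0-255 range."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)
    return tuple(sum(c[i] for c in hsv) / n for i in range(3))

# Hypothetical berry-region pixels: dark red/purple tones of a noir berry.
berry_pixels = [(90, 20, 60), (80, 15, 55), (100, 25, 70)]
h, s, v = mean_hsv(berry_pixels)
```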
Article
Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts highly varies depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with prompts with less ambiguity such as the segmentation of organs in computed tomography and poorer in various other scenarios such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple-point prompts are provided iteratively, SAM's performance generally improves only slightly while other methods' performance improves to the level that surpasses SAM's point-based performance. We also provide several illustrations for SAM's performance on all tested datasets, iterative segmentation, and SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. 
SAM has the potential to make a significant impact on automated medical image segmentation, but appropriate care needs to be applied when using it. Code for evaluating SAM is publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation.
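The IoU scores reported above compare a predicted mask against ground truth. A minimal sketch of that metric on binary masks represented as nested lists (not the evaluation repository's code, which operates on image arrays):

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two same-shaped binary masks (nested lists)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b   # 1 only where both masks are 1
            union += a or b    # 1 where either mask is 1
    return inter / union if union else 0.0

# Toy predicted and ground-truth masks: 2 overlapping pixels, 4 covered total.
pred = [[1, 1, 0], [0, 1, 0]]
gt   = [[1, 0, 0], [0, 1, 1]]
score = iou(pred, gt)  # 2 / 4
```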
Article
Yield estimation prior to harvest is a powerful tool in vineyard management, as it allows growers to fine-tune management practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the applicability of nondestructive proximal imaging combined with deep learning for yield estimation in vineyards. Continuous image data collection using a vehicle-mounted sensing kit combined with collection of ground truth yield data at harvest using a commercial yield monitor allowed for the generation of a large dataset of 23,581 yield points and 107,933 images. Moreover, this study was conducted in a commercial vineyard which was mechanically managed, representing a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection model was trained on hand-labeled images to localize grape bunches, and detections were either counted or their pixel count was summed to obtain a metric which was correlated to grape yield. Conversely, regression models were trained end-to-end to directly predict grape yield from image data without the need for hand labeling. Results demonstrated that both the transformer model and the object detection model with pixel area processing performed comparably, with a mean absolute percent error of 18% and 18.5%, respectively, on a representative holdout dataset. Saliency mapping was used to demonstrate that the attention of the CNN regression model was localized near the predicted location of grape bunches, as well as on the top of the grapevine canopy. Overall, the study demonstrated the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
Additionally, the end-to-end modeling approach was able to perform comparably to the object detection approach while eliminating the need for hand-labeling.
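The mean absolute percent error (MAPE) used to compare the models above can be sketched as follows; the yield figures are hypothetical stand-ins for the study's yield-monitor data:

```python
def mape(predicted, observed):
    """Mean absolute percent error, in percent."""
    terms = [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]
    return 100 * sum(terms) / len(terms)

# Hypothetical per-block yields (tons): model predictions vs. yield monitor.
predicted = [4.1, 3.6, 5.2, 2.9]
observed  = [4.0, 4.0, 5.0, 3.0]
error_pct = mape(predicted, observed)
```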
Article
The extraction of phenotypic traits is often very time- and labour-intensive. Investigation in viticulture in particular is restricted to on-site analysis due to the perennial nature of grapevine. Traditionally, skilled experts examine small samples and extrapolate the results to a whole plot. Different grapevine varieties and training systems, e.g. vertical shoot positioning (VSP) and semi-minimal pruned hedges (SMPH), thereby pose different challenges. In this paper we present an objective framework based on automatic image analysis which works on two different training systems. The images are collected semi-automatically by a camera system installed in a modified grape harvester. The system produces overlapping images from the sides of the plants. Our framework uses a convolutional neural network to detect single berries in images by performing a semantic segmentation. Each berry is then counted with a connected-component algorithm. We compare our results with Mask-RCNN, a state-of-the-art network for instance segmentation, and with a regression approach for counting. The experiments presented in this paper show that we are able to detect green berries in images despite different training systems. We achieve an accuracy for berry detection of 94.0% in the VSP and 85.6% in the SMPH.
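Counting berries from a semantic-segmentation mask via connected components, as described above, can be sketched with a breadth-first flood fill (a simplified stand-in for the paper's pipeline; the toy mask is hypothetical):

```python
from collections import deque

def count_components(mask):
    """Count 4-connected components of 1s in a binary grid (nested lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new component found
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:                    # flood-fill its pixels
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Toy segmentation mask: two separate "berries".
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
berries = count_components(mask)  # 2
```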