Torres-Lomas et al. 2024 | https://doi.org/10.34133/plantphenomics.0202 1
RESEARCH ARTICLE
Segment Anything for Comprehensive
Analysis of Grapevine Cluster Architecture
and Berry Properties
Efrain Torres-Lomas1, Jimena Lado-Bega2, Guillermo Garcia-Zamora1,
and Luis Diaz-Garcia1*
1Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA. 2Soil and
Water Department, Universidad de la Republica, Montevideo 11400, Uruguay.
*Address correspondence to: diazgarcia@ucdavis.edu
Grape cluster architecture and compactness are complex traits influencing disease susceptibility, fruit
quality, and yield. Evaluation methods for these traits include visual scoring, manual methodologies, and
computer vision, with the latter being the most scalable approach. Most of the existing computer vision
approaches for processing cluster images often rely on conventional segmentation or machine learning
with extensive training and limited generalization. The Segment Anything Model (SAM), a novel foundation
model trained on a massive image dataset, enables automated object segmentation without additional
training. This study demonstrates out-of-the-box SAM’s high accuracy in identifying individual berries
in 2-dimensional (2D) cluster images. Using this model, we managed to segment approximately 3,500
cluster images, generating over 150,000 berry masks, each linked with spatial coordinates within their
clusters. The correlation between human-identified berries and SAM predictions was very strong (Pearson’s
r2 = 0.96). Although the visible berry count in images typically underestimates the actual cluster berry
count due to visibility issues, we demonstrated that this discrepancy could be adjusted using a linear
regression model (adjusted R2 = 0.87). We emphasized the critical importance of the angle at which the
cluster is imaged, noting its substantial effect on berry counts and architecture. We proposed different
approaches in which berry location information facilitated the calculation of complex features related to
cluster architecture and compactness. Finally, we discussed SAM’s potential integration into currently
available pipelines for image generation and processing in vineyard conditions.
Introduction
Grape cluster architecture and compactness are important fruit traits that influence yield, quality, and susceptibility to pests and diseases [1]. Cluster architecture is directly related to cluster compactness, which describes the ratio between the volume occupied by berries and the total cluster volume [2]. In other words, cluster architecture determines the arrangement of berries in a cluster and the distribution of free space. Cluster architecture is complex, difficult to measure quantitatively, and determined by many factors such as berry number, size, shape, and spatial location, which all relate to the rachis ramification patterns [3]. While certain features of cluster architecture can be discerned by looking at the cluster contour, a more precise analysis requires the identification and spatial localization of the individual berries within the cluster. Cluster architecture and compactness are determined genetically, as many genomic regions have been associated with trait variation [2–6]. However, environmental factors such as temperature, humidity, nutrient availability, and vineyard management, among others, are known to alter cluster architecture and compactness directly or indirectly [1,2,7,8].

Understanding the factors that influence cluster architecture and compactness, and to what extent they do so, has implications for vineyard management, breeding, and genetics research. For example, high cluster compactness has been associated with increased susceptibility to Botrytis bunch rot caused by Botrytis cinerea [9–11]. This, in turn, has implications in terms of vineyard management and cultivar preference, since fungicide applications can better reach berries within the cluster in the case of a more open, looser cluster. Furthermore, there is greater temperature variability between the inner and outer berries in densely compacted clusters, impacting the maturation rate [8]. Additionally, restricted sun exposure to berries has been observed to intensify powdery mildew infections [12], thereby influencing fungicide application scheduling.
Citation: Torres-Lomas E, Lado-Bega J, Garcia-Zamora G, Diaz-Garcia L. Segment Anything for Comprehensive Analysis of Grapevine Cluster Architecture and Berry Properties. Plant Phenomics 2024;6:Article 0202. https://doi.org/10.34133/plantphenomics.0202

Submitted 16 February 2024; Accepted 24 May 2024; Published 27 June 2024

Copyright © 2024 Efrain Torres-Lomas et al. Exclusive licensee Nanjing Agricultural University. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0).

Exploring cluster architecture and compactness has been the focus of several studies utilizing qualitative and quantitative methods. Among qualitative approaches, researchers primarily rely on the OIV descriptors, a set of definitions established by the International Organization of Vine and Wine. For instance, the descriptor OIV 204, which addresses cluster density or compactness, categorizes grape clusters into 5 classifications ranging from very loose to very dense. Similarly, cluster architecture can be described using a combination of OIV 208, bunch shape (cylindrical, conical, and funnel-shaped), and OIV 209, number of wings of the primary bunch (ranging from 1 to 6 or more). Classifying clusters based on OIV descriptors often involves considering multiple characteristics simultaneously, which, while providing a comprehensive assessment, can be challenging to replicate and scale. For example, Richter et al. [2] studied an F1 mapping population derived from crossing GF.GA-47-42 and Villard Blanc. Their study involved manually recording individual cluster and berry traits (e.g., berry number, cluster weight, rachis size and architecture, and shoulder length, among others), which accounted for approximately half of the observed variation compared to using the OIV 204 descriptor alone. This emphasizes the complexity of cluster architecture and how it is influenced by various individual characteristics, including cluster compactness.
Computer vision approaches can also be used to analyze cluster architecture and compactness. In this case, available methods involve 2-dimensional (2D) image analysis and 3D modeling, which all have the capability of producing quantitative traits. In many cases, the utilization of quantitative traits derived from these imaging approaches has proven to be more effective in genetics research and breeding compared to categorical traits [13]. Depending on the algorithm, some studies have focused on berry detection while others have focused only on whole cluster analysis. For example, conventional segmentation on cluster images generated in the lab has been used to assess berry color and cluster architecture [4,14]. Cluster images generated directly in the vineyard have also been used for cluster identification and yield estimation using a variety of methods; however, prediction accuracy has varied because of challenging light conditions or occlusion [15–17]. Identifying and localizing berries within the cluster is crucial for determining cluster architecture and compactness. In this context, several approaches have been tested, including robotic laser scanning systems to reconstruct 3D representations of clusters and generate precise data regarding the 3D location of berries in a cluster [18]. Likewise, x-ray tomography has been employed to scan grapevine inflorescences and model berry growth and infer phylogenetic relationships [19]. Partial 3D models of grape clusters have also been generated using stereo-vision, which, in turn, allows berry counting [20]. Some other methodologies allow the estimation of berry numbers from images taken directly in the field. For example, in the work of Luo et al. [21], the model developed allowed for an accurate prediction of berry counts in Niagara grapes, which are generally larger than most table and wine grapes. Neural networks have also been applied for berry segmentation and counting, and although they produced very accurate estimates, they were only used on very immature clusters with limited berry growth, low compactness, and sufficient contrast between berries [22]. Furthermore, other methods based on convolutional neural networks and semantic segmentation have shown accurate estimations of berry numbers in field images, which might be of great utility for, for example, yield prediction. However, using this information to conduct cluster analysis is difficult, as the identified berries are not assigned to clusters [23].
Many of the image analysis-based methods used to describe cluster architecture and compactness relied on traditional segmentation methods. These methods often depend on labor-intensive, customized functions, manually engineered features, and error-prone thresholding designed for specific scenarios. As an alternative, deep learning models for image analysis, with their ability to capture latent image features, have shown promise across various fields, including medicine, surveillance and security, agriculture, biometrics, environmental sciences, and remote sensing, among others. However, these models are typically designed and trained for specific segmentation tasks, and unfortunately, their performance may substantially deteriorate when applied to new tasks, different image types, or varying external conditions. Large-scale foundation models have revolutionized artificial intelligence due to their remarkable zero-shot and few-shot generalization capabilities across a broad spectrum of downstream tasks [24,25]. Foundation models are neural networks trained on vast datasets using innovative learning methods and prompting objectives that generally do not require conventional supervised training labels, which makes them adaptable to a variety of external conditions [26]. The Segment Anything Model (SAM) is a new foundation model that can be used as a zero-shot segmentation method [27]. SAM can be used out of the box to segment a variety of objects in an image, or can be fine-tuned for a specific task, such as the recently developed MedSAM [28]. SAM was built on the largest segmentation dataset to date, with over 1 billion segmentation masks [27]. To segment an object, SAM requires the user to provide a prompt, which can take the form of a single point, a polygon (similar to a mask), a bounding box, or just text [26].

In this study, we demonstrated the capabilities of SAM to segment grape berries from 2D cluster images without additional model training or fine-tuning. Our research focused on 4 main aspects: (1) measuring the accuracy of SAM in identifying visible berries within a cluster image; (2) predicting hidden berries in a cluster image and assessing the impact of cluster imaging angle; (3) developing new quantitative methods to describe cluster architecture based on berry distributions within the clusters; and (4) assessing the repeatability of cluster architecture and compactness traits in replicated experiments.
Materials and Methods
Plant material
Cluster images obtained from an F1 mapping population (n = 139 genotypes) derived from crossing Cabernet Sauvignon and Riesling were used to test SAM. Both Cabernet Sauvignon and Riesling, major wine grape cultivars around the world, display contrasting cluster architectures. Cabernet Sauvignon clusters are small to medium in size, conical, loose to well-filled, and with medium-long peduncles. Its berries are small, round, and blue-black. Riesling has smaller clusters, which can be cylindrical or globular, and sometimes winged; clusters are compact and with short peduncles. Riesling berries are small and round and have a white-green skin coloration. This F1 progeny segregates for the traits mentioned above, making it an ideal candidate for evaluating the proposed pipeline. This population was planted at the UC Davis Experimental Station in Oakville, Napa County, CA, USA (38°25′45.4″N, 122°24′36.4″W), in 2017. Vines were arranged in a randomized complete block design with 3 blocks and 3 vines per experimental unit. For this study, one vine per experimental unit was sampled (the one in the middle). For each vine, 5 representative clusters were imaged as described below.
Image capture
Five representative clusters per vine were imaged using the setup shown in Fig. S1. The setup included a reference circle to normalize measurements and account for potential variation in the location of the camera relative to the cluster. The camera used was a Canon EOS 70D with a 24-mm prime lens, an aperture of f/5, and an exposure time of 1/500 s. Images were 5,472 × 3,648 pixels (~20 Mpx). All clusters were imaged from at least one angle. In addition, all the clusters from a subset of 99 vines were imaged from 3 additional angles (90°, 180°, and 270°). The latter were used to assess complex architectures that result from the presence of cluster ramifications or wings and that are visible only from specific view angles.

A second image dataset was generated to validate the SAM algorithm. This dataset consisted of cluster images, each one accompanied by an image of all the individual berries detached and individually placed on a white surface (Fig. S2).
Model and processing pipeline
The images described above, without editing their original brightness or contrast, were used as input for SAM. To reduce the number of pixels to be processed, a region of interest (ROI) was manually defined, as indicated in Fig. 1. The pretrained ViT-H (Huge version) image encoder was used for the segmentation phase (checkpoint available at https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth). The mask prediction was executed by applying the Automatic Mask Generator to the input, which was defined as the pixels within the ROI, and a prompt, described as an XY grid of points equally distributed across the ROI. Different grid configurations, including 4 × 4, 6 × 6, 8 × 8, and so on, up to 62 × 62, were explored and tested for efficiency. Both the number of masks and the area increased as the number of points in the grid increased, until reaching a plateau at around 20 to 25 points; after that, the increase was marginal. The number of masks still increased beyond this point (Fig. S3A and C), as more berries, mainly those partially hidden, were found. These berries, discovered at higher point densities, were of smaller sizes, as the increase in total area after reaching about 30 points was negligible (Fig. S3B). The marginal increase in the area or number of objects detected at higher grid densities is also likely due to SAM's reduction of image resolution, making smaller objects undetectable. To test this hypothesis, a zoomed-in image of a cluster with numerous skin features (e.g., spots, color variations, and damage) was processed using a 256 × 256 grid. This approach resulted in the detection of many smaller features (Fig. S4), emphasizing the need to process a smaller set of photos to optimize conditions. After these preliminary tests, a 32 × 32 grid was chosen as it captured most of the grape objects without unnecessary computational overhead. As a preliminary analysis, SAM was executed on graphics processing unit (GPU), Metal Performance Shaders (MPS), and central processing unit (CPU) platforms to compare any potential segmentation differences; however, only computation time was affected. The output produced by SAM comprised bounding boxes in XYWH format, area, predicted intersection over union (IoU), stability scores, and mask segments formatted as COCO Run-Length Encoding (RLE). The implementation of SAM, including ROI identification and automatic mask generation, was written in Python 3.11. The hardware tested was a g3.4xlarge AWS instance (single GPU, 16 GB RAM) and a System76 workstation (32 CPUs, 256 GB RAM). Details on specific dependencies are available in the following GitHub repository: https://github.com/diazgarcialab/SAM-cluster-segmentation.
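For illustration, the equally spaced point-grid prompt described above can be sketched in plain NumPy; this is roughly what SAM's Automatic Mask Generator produces internally through its points_per_side parameter. The function name and ROI arguments below are ours, not from the published pipeline.

```python
import numpy as np

def roi_point_grid(x0, y0, x1, y1, points_per_side=32):
    """Return evenly spaced prompt points across a rectangular ROI,
    as an array of (x, y) coordinates of shape (points_per_side**2, 2)."""
    xs = np.linspace(x0, x1, points_per_side)
    ys = np.linspace(y0, y1, points_per_side)
    gx, gy = np.meshgrid(xs, ys)  # 'xy' indexing: rows sweep y, columns sweep x
    return np.stack([gx.ravel(), gy.ravel()], axis=1)
```

With points_per_side=32, an ROI roughly 2,800 × 5,400 pixels yields separations on the order of the ~88 × ~171 pixels reported above.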
The RLE mask segments were decoded using pycocotools (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py) to derive the x and y coordinates of the mask contours and their position within the cluster. These coordinates were analyzed using the R package Momocs [29] to compute various parameters such as berry area, length, width, aspect ratio, perimeter, and color (represented as median red, green, and blue values). SAM is a segmentation tool rather than a classifier. As such, the segmented masks it produces may include, in addition to berries, other objects such as the clamp used to hold the clusters or the reference circle for size normalization. These objects can be easily identified and distinguished from berries due to their contrasting morphology and size, as described below. More often, some masks may encompass 2 or more berries, which were addressed using the IoU estimates. IoU is a metric used to evaluate the overlap between 2 bounding boxes or masks, commonly employed when assessing the accuracy of image segmentation models. In this study, IoU was calculated by determining the size of the overlapping region between 2 masks detected by SAM. For example, in instances where an overlapping mask covers 2 berries, each with its own mask, the overlapping mask will exhibit a larger size and IoU. Furthermore, filters based on criteria such as area, perimeter-to-area ratio, and aspect ratio were implemented to exclude objects other than berries. To refine the segmentation further, we employed a filtering approach using elliptical Fourier descriptors (EFDs) and principal component analysis (PCA) to eliminate non-berry objects, especially rachis parts. Initially, the x and y coordinates of objects were transformed into an "Out" object using the Momocs software, which facilitated the computation of EFD harmonic coefficients. These coefficients were then analyzed using PCA for visualization purposes, and outliers were identified through 5 rounds of outlier detection. Each round involved recalculating the harmonics and principal components with a cleaner dataset, adopting a threshold of ±2 standard deviations among the first 10 principal components.

Fig. 1. Summary of the pipeline employed for generating and processing SAM masks. Firstly, the region of interest (ROI) housing the cluster is identified. Subsequently, a grid of points separated by 88 × 171 pixels is utilized as input for object identification in SAM. Following this, masks undergo analysis based on various parameters including intersection over union (IoU), area, perimeter, length, width, aspect ratio, and elliptical Fourier descriptors (EFDs) to discern non-berry objects.
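The iterative ±2 SD outlier filtering can be sketched in plain NumPy. This is illustrative only: the paper performs this step in R with Momocs and re-runs the EFD/PCA each round, whereas the sketch below assumes a precomputed matrix of principal-component scores and recomputes only the mean and standard deviation per round, for brevity.

```python
import numpy as np

def iterative_outlier_filter(scores, n_rounds=5, sd_thresh=2.0):
    """Flag objects whose PC scores fall outside +/- sd_thresh standard
    deviations, repeating for n_rounds with the progressively cleaner set.
    `scores` is an (n_objects, n_components) array; returns a boolean keep mask."""
    keep = np.ones(len(scores), dtype=bool)
    for _ in range(n_rounds):
        kept = scores[keep]
        mu, sd = kept.mean(axis=0), kept.std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero on degenerate components
        within = np.all(np.abs((scores - mu) / sd) <= sd_thresh, axis=1)
        keep &= within  # once dropped, an object is never re-admitted
    return keep
```

Rachis fragments, with elongated outlines, tend to land far from the berry cloud in PC space and are discarded in the first rounds.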
Results
Characteristics of the model implementation and implementation time
The implementation of SAM on a population of 387 vines and 1,935 different clusters resulted in 215,090 masks. For 99 of the 387 vines, all the clusters were imaged 4 times, each time at a different angle (0°, 90°, 180°, and 270°), which resulted in 3,431 cluster images. The identified masks included, among other things, individual berries, 2 or more berries, the clamp used to hold the clusters in place, stains/discolorations in the background, the reference circle for size normalization, and rachis segments. This outcome is expected as SAM utilizes an algorithm for unsupervised object segmentation, and not classification, within an area of interest defined by the user. As a result of the filtering, 32,425 masks containing 2 or more berries were removed using IoU. Furthermore, since berries had an expected size and aspect ratio, 23,125 masks with significantly larger areas or aspect ratios, or located far from the cluster (stains in the background), were filtered out. Finally, the rest of the mask contours were analyzed with Momocs [29] using a combination of EFD and PCA, leading to the identification of 5,601 objects other than berries. After this filtering step, the number of true berry masks was 153,939 (61,151 masks discarded). Each cluster had, on average, 44.87 berries (median = 42). Berry number varied between 5 and 130, and variation showed a normal distribution (Fig. S5).
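The IoU-based removal of multi-berry masks can be illustrated with a minimal sketch in plain NumPy. The function names and the threshold are hypothetical, chosen for illustration rather than taken from the published code.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union between two boolean masks of identical shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return np.logical_and(a, b).sum() / union

def covers_multiple(mask, others, iou_thresh=0.25):
    """Flag a mask that overlaps substantially with 2 or more other masks,
    the signature of a single SAM mask spanning several berries."""
    hits = sum(mask_iou(mask, o) > iou_thresh for o in others)
    return hits >= 2
```

A mask flagged by covers_multiple would be discarded, keeping the individual berry masks it overlaps.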
Computation time per photo varied depending on the number of points in the grid used to initialize the object search, as well as the characteristics of the machine. In this study, we used a configuration of 32 points per side (32 × 32), resulting in a grid where points were horizontally separated by ~88 pixels and vertically by ~171 pixels. On average, processing a photo took 55 s using the CPU of the System76 workstation and 14 s with the GPU on the AWS g3.4xlarge instance. Increasing the grid density slightly improved the number of berries detected, although the increase was very marginal (Fig. S3). However, when the grid density was increased to 62 × 62 points (114 pixels of horizontal separation and 59 pixels of vertical separation), computation time increased to 4 min on the CPU and 45 s on the GPU.
2D cluster representations predict berry number and cluster size
Berry counts from clusters imaged at 4 different angles were compared with the number of berries determined manually. The "manual" determination of berries was conducted using 2 methods. The first involved humans counting visible berries in a subset of 100 images, and then comparing these counts with SAM predictions. The second involved processing additional images of 84 clusters from 17 vines where all the berries were detached and placed individually on a surface. The analysis of these images is straightforward since there is no touching among berries, and there exists good contrast between the berry and surface colors (Fig. S2). In addition to being used to determine the true number of berries, these images also allowed the comparison of berry size, assuming that the masks generated from isolated, uncompressed berries imaged from the top approximate well the real size of a berry.

As shown in Fig. 2A, the SAM algorithm does a very good job finding and segmenting all the berries in the cluster, independently of the angle at which it is imaged. The berries identified were either fully visible, represented as circles, or partially visible (Fig. 2B). The correlation between the berry number determined by humans and the SAM prediction was 0.96 (Fig. S6). There was also good agreement between SAM berry number predictions and the number of berries calculated from images with the individual berries (R² = 0.93, 5-fold cross-validation). However, there was a clear underestimation, which varied depending on the imaging angle (Fig. 2C). Overall, the underestimation was approximately 50% of the real number but linear. In symmetric clusters (e.g., cylindrical with no ramifications or wings), images from all 4 angles yielded similar berry counts. Conversely, in clusters with wings, the angles from which the wings were visible increased the berry count prediction. While the berry count was underestimated, a linear regression model of the form y ~ β0 + β1x was sufficient to adjust the prediction considerably well (adjusted R² = 0.8723), as long as the cluster image with the maximum number of berries (from the 4 images taken at different angles) was used in the model.
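The correction step amounts to an ordinary least-squares fit of true counts on the maximum visible count across the 4 angle views. The sketch below illustrates the idea; the data points are invented for demonstration and are not the measurements behind the reported coefficients.

```python
import numpy as np

# Hypothetical example values: maximum visible-berry count across the
# 4 angle views (x) and the true count from detached-berry images (y).
visible_max = np.array([35.0, 48.0, 60.0, 72.0, 90.0])
true_count = np.array([66.0, 95.0, 121.0, 150.0, 189.0])

b1, b0 = np.polyfit(visible_max, true_count, 1)  # slope, intercept
corrected = b0 + b1 * visible_max  # adjusted berry counts per cluster
```

Because the underestimation is roughly proportional, a single slope and intercept suffice to map visible counts onto true counts.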
Berry size (measured as projected berry area) was more challenging to predict (Fig. 2D). Predictions were mostly overestimations and varied significantly depending on the imaging angle. Most berries were between 120 and 150 mm², with just a few having smaller sizes (<100 mm²). Studying clusters with more variation in berry size might be required to better assess the correlation for this trait. Similar to berry counts, a linear model was fitted using all cluster views available for each cluster. Since the relationship appeared to be linear, the fitted values were consistent with the real size estimations (adjusted R² = 0.8457).
Cluster angle matters
Not all the berries in a cluster can be seen from a given angle; therefore, berry counts from 2D images were, as expected, underestimated (Fig. 2C). While cylindrical clusters are more common among cultivars, the presence of ramifications or wings, or other asymmetries, can impact the number of berries visible from a single view. To measure the effect of the imaging angle on the berry counts, 490 clusters from 99 vines were imaged from 4 different angles (0°, 90°, 180°, and 270°), and the berry counts and sizes were compared. In general, the berry count can vary by approximately ±50%, depending on the angle (Fig. 3A). As expected, opposing angles (0° and 180°, 90° and 270°) tend to have more similar results (Fig. 3B). In other words, when a cluster ramification or wing is fully visible from a given angle, it becomes invisible or hard to distinguish when the cluster is rotated 90°, and becomes fully visible again after another 90° rotation. Berry size was less dependent on the viewing angle (Fig. 3C); in general, berry size varied by about ±30%. The extent of the variation in berry count as a function of viewing angle is shown in Fig. 3D.
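The per-cluster angle comparison reduces to a percent change relative to the first (0°) view; a minimal sketch with hypothetical counts:

```python
import numpy as np

def pct_change_vs_first(counts):
    """Percent change in berry count at each angle relative to the first (0 deg) view."""
    c = np.asarray(counts, dtype=float)
    return (c - c[0]) / c[0] * 100.0

# Example (made-up counts at 0, 90, 180, and 270 degrees):
changes = pct_change_vs_first([40, 60, 38, 20])  # 0%, +50%, -5%, -50%
```

Clusters with a prominent wing show the alternating pattern described above, with opposing views closest to each other.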
Cluster architecture
A typical approach for measuring cluster architecture and compactness is based on whole cluster segmentation instead of berry segmentation (e.g., [4]). While this method provides insightful information and is easy to implement, it ignores the spatial distribution of berries within the cluster. Moreover, in the setup used for photographing clusters, it is common to use clamps, hooks, or clips to hang clusters, which can then be challenging to identify during image analysis or post-processing. In those cases, a common strategy is to crop the top of the image to remove such objects. When the peduncle is long, cropping the image does not affect the analysis; however, in clusters with short peduncles or prominent shoulders, cropping the image results in cropped berries as well. In this study, although the clamps (and other objects) were masked by SAM, because of their different colors, sizes, and shapes, they were easy to identify and remove.

[Figure 2 panels; fitted lines: berry counts, y = −13.38 + 2.25x; berry area, y = −12.63 + 0.93x]
Fig. 2. Prediction of berry number using SAM from cluster images. (A) Identification of individual berries from 4 angles on the same cluster. (B) Berry masks from cluster images in panel A, color-coded by angle view. (C) Correlation between real and predicted berry counts from SAM; predicted counts for each angle view in panel A are displayed. Points marked with an X represent corrected counts using the angle view with the maximum berries, adjusted with a linear model. (D) Correlation between real and predicted berry area; color and shape patterns are similar to panel C; corrected points were generated with a linear model of the form y ~ β0 + β1x. The vertical red line indicates a one-to-one relationship between variables.

[Figure 3 panels]
Fig. 3. Impact of imaging angle on cluster analysis. (A) Change in berry number relative to angle 1 (0°, first image); each green line represents a cluster imaged at 4 different angles. (B) Frequency plot of changes in berry number relative to angle 1, similar to panel A. (C) Frequency plot of changes in max berry area relative to angle 1. (D) Examples illustrating the effect of imaging angle on SAM-detected berry counts; each column represents a different cluster, and each row represents a different angle (0°, 90°, 180°, and 270°). The number of detected berries is indicated in each image. The first 4 clusters show little variation, while the last 4 exhibit extreme berry count variation.
To illustrate the capabilities of cluster architecture analysis using berry locations, empirical cumulative distribution functions were developed along the y-axis (from the top of the cluster, or the peduncle, to the bottom, or cluster tip) and the x-axis (from left to right). The distribution functions provided different levels of information. For example, they allowed the estimation of symmetry along both the x- and y-axes. With these symmetry estimators, cylindrical or globular clusters are expected to have a more uniform cumulative distribution. On the other hand, clusters with wings or significant ramifications will show a cumulative distribution along the x-axis skewed opposite to the main ramification.
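A minimal sketch of such a distribution function and a derived symmetry estimator follows; the function names and the specific estimator are illustrative, not the paper's exact definitions.

```python
import numpy as np

def ecdf(coords):
    """Empirical cumulative distribution of berry-center coordinates along one axis."""
    x = np.sort(np.asarray(coords, dtype=float))
    fn = np.arange(1, x.size + 1) / x.size
    return x, fn

def symmetry_score(coords):
    """Roughly 0 for a symmetric berry distribution along the axis;
    the sign indicates the direction of the skew."""
    v = np.asarray(coords, dtype=float)
    v = (v - v.min()) / (v.max() - v.min())  # normalize axis span to [0, 1]
    return float(np.mean(v)) - 0.5
```

A cluster with a wing on one side shifts berry mass toward that side, pulling the score away from 0, while a uniform cylindrical cluster yields a nearly straight diagonal ECDF.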
A cluster with a prominent wing, photographed from dif-
ferent angles, is provided as an example in Fig. 4A. At 0° and
180° views, the wing is not visible, as it is either in front and com-
pletely aligned with the main cluster or in the back. In this case,
the cluster appears more cylindrical and symmetrical along both
axes. e empirical cumulative distribution functions for these
2 views, shown as red and green dots in Fig. 4B and E, were more
uniform and appeared as straight diagonal lines. Conversely, at
90° and 270° views, the wing becomes visible and produces a
very skewed distribution along the x-axis. Since the 90° and 270°
views, and the 0° and 180° views, can be seen as “mirror” images,
the distribution functions in Fig. 4D and E also display this
mirroring feature.
Masks generated by SAM for each berry object were repre-
sented as x, y coordinates, and their corresponding polygons
were drawn, as shown in Figs. 2A and 3D. Combining all the
berry polygons produced a representation of entire clusters. When
a cluster has a cylindrical or globular shape, and no wings are
present, representing its shape is simple. However, when other
cluster features are present, such as wings, shoulders, and conical
forms, among others, the so-called cluster shape descriptor
can vary depending on how detailed these complex features are
represented.
For example, for a cluster with a prominent wing, such as the one
shown in Fig. 4A, should the outline (or contour) defining the
cluster shape include the sinus formed by the 2 wings? If so, how
far inside the sinus? The opposite approach would be to simply
Fig.4.Cumulative distributions of berry locations along the horizontal and vertical axes. (A) Example of berries identified in a cluster imaged from 4 different angles (0°, 90°, 180°, and
270°). (B) Empirical cumulative distributions along the x-axis for the 4 angle views; berry locations along the x-axis are shifted to start at 0. (C) Similar to panel B, but for the y-axis.
(D) Similar to panel B, but berry locations along the x-axis are scaled from 0 to 100 and sampled with n = 100. (E) Similar to panel D, but for berry locations along the y-axis.
Fig.5.What is cluster architecture? Example of concave hull calculation for different clusters (in columns) at different cluster shape definition levels (from top to bottom,
higher to lower definition); concave hulls are calculated on the union of all berry masks in the cluster.
connect the tips of the wing and the main cluster formation,
which would produce a simpler polygon. The same applies to
the presence of shoulders and curvatures along the cluster.
Figure 5 illustrates the same 8 clusters from Fig. 3D, outlined
using concave hulls with varying degrees of detail, from top to
bottom. In the top panels, the cluster outlines preserved detailed
features such as shoulders, indentations, and separations between
wings. Toward the bottom of the figure, most of these
features were lost.
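As the detail level of a concave hull decreases, the outline converges to the convex hull, the zero-detail limit in which all sinuses, wings, and shoulders are bridged over. The study computed concave hulls in R (sf::st_concave_hull); as a language-agnostic sketch of that limiting case, here is a pure-Python convex hull via Andrew's monotone chain (illustrative only, not the authors' implementation):

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull.
    The convex hull is the zero-detail limit of the concave-hull family
    used to outline clusters: every sinus and indentation is bridged."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, dropping the duplicated endpoints
    return lower[:-1] + upper[:-1]
```

An interior point (e.g., a berry centroid inside the cluster body) never appears among the hull vertices, which is exactly why low-detail outlines lose shoulders and wing separations.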
The approaches described above to measure cluster archi-
tecture were applied to all 3,431 cluster representations analyzed
in this study. Cumulative distribution functions for axes x and
y showed varying levels of asymmetry. Along the x-axis (Fig. 6A),
the asymmetry is due to having more berries on either the right
or the left side of the cluster (likely because of the presence of
a wing). For example, the green lines in Fig. 6A represent the
distribution functions of clusters in which more berries exist
on the left side of the cluster (as much as ~75%). Conversely,
the purple lines represent clusters with a larger accumulation
of berries on the right side of the cluster. Finally, gray lines
represent more symmetrical clusters, with an equal number of
berries on the left and right. Regarding the y-axis (Fig. 6B), most
of the asymmetry is toward the base of the cluster, which is
expected, as many clusters exhibit conical forms. Importantly,
the color assignments (i.e., categories) in Fig. 6A and B are sub-
jective and for illustrative purposes only.
Then, cluster shape variation was studied using the polygons
generated with concave hulls. The concave hulls were generated
using a conservative level of cluster feature preservation (using
the R function sf::st_concave_hull() with ratio = 5) but
with enough resolution to capture major asymmetries, wings,
and shoulders. In general, cluster shape exhibited a continuous
gradient of variability with no clear group formation (Fig. 6C
and D). In other words, no groups formed containing only,
for example, winged and non-winged clusters, or symmetric
and non-symmetric clusters. Instead, asymmetries can be small
and barely visible, and increase gradually in size and separation
from the main cluster. To understand which cluster features were
associated with each PC, 100 clusters with extreme PC scores
(the 50 most negative and 50 most positive) were plotted for PCs
1 to 4 (Fig. 6E). PC1, which explained 53.23% of the variation,
was associated with aspect ratio, with more circular/globular
clusters having more negative values and very elongated clusters
having more positive values. PC2, which explained 18.29% of the
variation, was associated with the location of the asymmetries
along the x-axis (either to the left or the right). Finally, PCs
3 and 4, which together accounted for a little less than 18%, explained
other, more complex features (wings and shoulders) that are
more difficult to discern.
Is the level of sensitivity to complex cluster
features meaningful?
The methodologies employed in this study for identifying ber-
ries within a cluster, counting them, studying their spatial dis-
tribution to generate cumulative distribution functions, and
Fig.6.Comprehensive analysis of cluster architecture using cumulative distribution function and PCA of concave hulls. Empirical cumulative distributions for 3,431 clusters
using berry locations along the (A) x- and (B) y-axes; berry locations along both x- and y-axes are scaled from 0 to 100 and sampled with n = 100, similar to Fig. 3D and E. In
both cases, the green lines correspond to distributions with a normalized coordinate 25 larger than 0.3 and a normalized coordinate 75 larger than 0.8; the purple lines have
a normalized coordinate 25 < 0.2 and a normalized coordinate 75 < 0.7; finally, the gray lines have a coordinate 25 between 0.2 and 0.3, coordinate 50 between 0.45 and
0.55, and coordinate 75 between 0.7 and 0.8. Variation in cluster architecture along principal components 1 and 2 (B) and 3 and 4 (C). In panel C, different colors and sizes
correspond to variations in principal components 3 and 4, respectively. Similarly, in panel D, point color and size correspond to variations in principal components 1 and 2,
respectively. (E) One hundred clusters sampled from the extremes of principal components 1 (green), 2 (gray), 3 (dark cyan), and 4 (salmon); the clusters in each color group
are ordered from left to right and by rows according to their corresponding principal component values.
applying PCA to examine cluster shape variation demonstrated
high sensitivity (Fig. 6). However, a critical question is: are these
features primarily driven by genetic variation, or are they sim-
ply a result of environmental and non-genetic factors?
The primary aim of this research was to implement SAM for
berry identification and propose methodologies for leveraging
this information in cluster architecture and compactness analysis.
Therefore, the focus was not on characterizing specific cultivars
or genotypes in the surveyed population but rather on sampling
diverse cluster variations. Nevertheless, as mentioned earlier, the
sampled vines are part of a mapping population between Riesling
and Cabernet Sauvignon, planted in a randomized complete
block design with 3 contiguous vines per genotype per block.
This design allowed the calculation of repeatability, expressed
as the percentage of genetic variance relative to the phenotypic
variance.
First, to assess the consistency of the phenotypes measured
in this study, boxplot graphs per genotype were examined for
18 variables. ese variables included basic descriptors such
as berry count, area, length, and width, all computed from the
berry masks identied by SAM. Additionally, cluster compact-
ness was calculated as the ratio between the sum of all berry
areas and the concave hull area. Using the empirical cumulative
distribution functions, the predicted percentage of berries at
x or y = 25, 50, and 75 was also determined. In terms of cluster
architecture based on concave hulls, PCs 1 and 2 were included.
Finally, cluster length, width, perimeter, and aspect ratio were
computed using the concave hulls.
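The compactness ratio defined above (sum of berry mask areas over concave-hull area) only needs polygon areas, which follow from the shoelace formula. A small illustrative sketch with hypothetical polygons (the paper's masks are SAM-derived coordinate lists of the same form):

```python
def polygon_area(vertices):
    """Polygon area via the shoelace formula; vertices in order, any orientation."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def compactness(berry_polygons, hull_polygon):
    """Cluster compactness: total berry mask area over concave-hull area."""
    berry_area = sum(polygon_area(p) for p in berry_polygons)
    return berry_area / polygon_area(hull_polygon)

# Hypothetical example: two unit-square "berries" inside a 2 x 2 hull -> 0.5
berries = [[(0, 0), (1, 0), (1, 1), (0, 1)],
           [(1, 0), (2, 0), (2, 1), (1, 1)]]
hull = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Note that the hull choice matters: a low-detail hull inflates the denominator and lowers the apparent compactness, which is why the authors fixed a conservative detail level before comparing clusters.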
Overall, variables such as berry count, area, length, and width,
as well as cluster area, length, width, and perimeter, showed good
consistency (Fig. 7A), high correlation (Fig. 7B), and medium-
to-high repeatability (Fig. 7C). While descriptors derived from
cumulative distributions showed a correlation among them-
selves, except for ECDF at x = 25 and y = 25, their variability
was higher, likely influenced by non-genetic sources given their
very low or zero repeatability. Cluster compactness demonstrated
little correlation with other traits but exhibited good consistency
with a repeatability of ~0.6. PC1 from the PCA conducted on
concave hulls, and related to cluster aspect ratio, also showed
good consistency and medium to high repeatability. In summary,
these analyses revealed that many variables computed from the
berry masks identied by SAM, along with others describing
more complex features in the cluster, possess a genetic compo-
nent. Nevertheless, certain variables, particularly those originat-
ing from empirical cumulative distribution functions, seem to
be strongly aected by variations in the environment.
Discussion
Several computational, image-based strategies have been imple-
mented to measure grapevine cluster architecture and compact-
ness. However, only a few have been utilized for identifying
Fig.7.Variability in berry and cluster characteristics. (A) Variation in berry characteristics and cluster architecture grouped by genotype; each genotype was replicated in 3
blocks; for each replicated vine, 5 clusters were sampled and imaged from 1 or 4 angles; genotypes are ordered by mean value, and names are omitted due to space constraints.
(B) Pearson’s correlation between traits. (C) Repeatability is calculated as the proportion of genetic variance relative to the phenotypic variance.
individual berries within clusters [18–21,23]. Most of these
strategies rely on non-generalizable mathematical and analyti-
cal frameworks for analyzing colored images. Humans can easily
discern individual berries in a cluster image, even when taken
in the field or under challenging light conditions. Therefore, it
is reasonable to assume that machine learning algorithms could
achieve similar capabilities. However, until now, these models
have primarily been applied to cluster identification rather than
berry identification. This does not rule out the potential use of
"conventional" deep learning approaches trained with human-
segmented berries, but they would require a substantial amount
of image labeling for training.
With the recent introduction of foundation models, par-
ticularly the SAM [27], objects of interest can be automatically
segmented without the need for additional training or fine-
tuning, at least for natural objects. In some specific cases, such
as in medical imaging, additional fine-tuning allows for more
accurate predictive models capable of analyzing many different
image types [28]. Here, we demonstrate that out-of-the-box
SAM can accurately segment berries in a 2D grape cluster
image with up to a 0.96 correlation (human berry counts vs.
SAM predictions on visible berries in 2D images; Fig. S5).
While one might argue that the segmented masks produced by
SAM in this study needed supervised classification to identify
berry objects exclusively, the implementation of filters (IoU,
size, area, EFD, and PCA) was straightforward. This approach
can be applied to hundreds, thousands, or even millions of
masks without any changes to the programming. A continuation
of this work could be the development of an automatic
classifier based on, for example, YOLO, that uses cropped
images based on bounding boxes generated by SAM.
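The size and IoU filters mentioned above can be illustrated with a minimal sketch on bounding boxes: drop masks outside a plausible berry size range, then drop near-duplicate masks by pairwise IoU. The thresholds below are hypothetical, and the paper's EFD and PCA shape filters are omitted for brevity:

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    x0 = max(a[0], b[0]); y0 = max(a[1], b[1])
    x1 = min(a[2], b[2]); y1 = min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_masks(boxes, min_area=100, max_area=10000, iou_thresh=0.8):
    """Keep boxes within a plausible berry-size range, then greedily
    drop near-duplicates (high IoU with an already-kept box)."""
    sized = [b for b in boxes
             if min_area <= (b[2] - b[0]) * (b[3] - b[1]) <= max_area]
    kept = []
    for b in sized:
        if all(box_iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

Because the rules are purely geometric, the same filter runs unchanged over thousands or millions of masks, which is the scalability point made above.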
Applying SAM to photos of clusters still on the vine is pos-
sible, but it would require further development, particularly in
regard to image preprocessing. This preprocessing step would
first need to identify clusters within vine images, which is fea-
sible with methods already available [16], and second, to remove
the background in cropped images containing clusters. Failing
to do this last step will cause SAM to segment non-berry objects,
such as leaves, trunks, or shoots (Fig. S4). In Fig. 8, we show 2
examples of how object removal could be performed on pre-
cropped images of clusters to further process them with SAM.
The same algorithm for berry detection and non-berry object
removal was used. One of the 2 models presented, BRBG (BRIA
Background Removal), is a background-removal tool available
at https://huggingface.co/briaai/RMBG-1.4. Although BRBG is
simple to use, it is not very customizable. For example, it does
not allow for defining the object of interest. However, it does
perform well at removing the background in images. The second
Fig.8.Example of 2 machine learning tools (RMBG and Depth Anything) for the preprocessing required to remove background before implementing SAM. Raw image taken
from https://fps.ucdavis.edu with permission.
model, Depth Anything [30], is used for monocular depth
estimation, which can be employed to remove backgrounds/
foregrounds based on depth. This serves only as an example of
how a future end-to-end pipeline for vineyard applications
might look. Our study aimed to showcase the capabilities of
zero-shot machine learning models that, despite their generaliza-
tion capabilities, perform well in specific situations, such as
berry segmentation. One of the important takeaways is that
researchers will have to spend less time on model training as
these models become more widely available.
Another consideration when deploying machine learning
tools for real-world applications (e.g., processing images directly
in the vineyard using a mobile device) is processing time. In
our study, the processing time per image was as fast as 14 s on
a GPU-powered machine, which was sufficient for our needs.
However, for large-scale applications and edge computing, other
SAM-like implementations, such as EdgeSAM [31], FastSAM
[32], and EfficientSAM [33], could be adopted.
Although the number of berries visible in a cluster image
underestimates the actual number of berries in a cluster, this
underestimation can be corrected using a linear regression model
(Fig. 2C). Moreover, to compensate for the variability in berry
number caused by ramifications of the rachis visible only
from certain angles, an additional image, for example, taken at
90°, can correct for any berry count underestimation. Notably,
the berry masks generated by SAM can be used for compre-
hensive cluster architecture analysis, which is only possible if
berries are spatially located.
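The visible-to-actual correction described above is an ordinary least-squares fit. A minimal sketch with entirely hypothetical calibration numbers (the paper's model and data differ; visible counts would come from SAM and actual counts from destructive sampling):

```python
def fit_line(x, y):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical calibration set: visible berries (image) vs. actual berries
visible = [40, 55, 60, 80, 95]
actual = [62, 85, 92, 122, 145]
a, b = fit_line(visible, actual)

# Corrected estimate for a new cluster with 70 visible berries
predicted_total = a + b * 70
```

In practice the fitted slope exceeds 1, reflecting that roughly a constant fraction of berries is hidden behind others at any given viewing angle.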
While some of the trait variation in cluster architecture and
compactness, particularly that captured by the analysis of
empirical cumulative distributions, was influenced by envi-
ronmental factors, these traits can still have applications in
determining vineyard management practices. For instance,
cluster thinning and/or tipping could be targeted toward asym-
metric or winged clusters, or those with specic architectures.
In table grape production, certain cluster architectures might
be more appealing to consumers [34]. For these types of appli-
cations to be feasible under field conditions, SAM would have
to be integrated into an existing pipeline that processes field
images obtained with cameras mounted on rovers or tractors.
In [16], for example, images from vines were acquired using a
sensing kit equipped with RGB cameras, and further processed
using YOLO to identify clusters within the image for yield
prediction. In this case, SAM could be incorporated into this
pipeline to compute additional variables regarding berry count,
size, and cluster architecture.
The observation that the cumulative distribution functions
(Fig. 6A and B) explaining cluster architecture showed lower
or zero repeatability is specic to the Riesling by Cabernet
Sauvignon population analyzed in this study. However, this
does not rule out the possibility that other mapping or breed-
ing populations display heritable variation for these traits.
Consequently, these traits could still be valuable for genetics
research or selection purposes in other mapping or breeding
populations.
Acknowledgments
The authors would like to thank Veronica Nunez, Jose Munoz,
Sadikshya Sharma, Yaniv Lupo, Hollywood Banayad, and Dan
Ng for their support during vineyard management, harvest,
and image annotation. The authors would also like to thank
Dario Cantu for providing access to the F1 population used
in this study.
Funding: This project was partially supported by USDA-
NIFA Specialty Crop Research Initiative Award No. 2022-
51181-38240.
Author contributions: E.T.-L. developed the proof of concept
and set up the computational workflow to implement SAM.
E.T.-L. and L.D.-G. conceived and designed the field experiment.
J.L.-B. and G.G.-Z. supported fieldwork and cluster imaging.
E.T.-L. and L.D.-G. wrote the manuscript.
Competing interests: e authors declare that they have no
competing interests.
Data Availability
All the data and code to reproduce the results of this study are
available at https://github.com/diazgarcialab/SAM-cluster-
segmentation.
Supplementary Materials
Figs. S1 to S6
References
1. Tello J, Ibáñez J. What do we know about grapevine bunch
compactness? A state-of-the-art review. Aust J Grape Wine Res.
2018;24(1):6–23.
2. Richter R, Gabriel D, Rist F, Töpfer R, Zyprian E.
Identification of co-located QTLs and genomic regions
affecting grapevine cluster architecture. Theor Appl Genet.
2019;132(4):1159–1177.
3. Correa J, Mamani M, Muñoz-Espinoza C, Laborie D, Muñoz C,
Pinto M, Hinrichsen P. Heritability and identification of QTLs
and underlying candidate genes associated with the architecture
of the grapevine cluster (Vitis vinifera L.). Theor Appl Genet.
2014;127(5):1143–1162.
4. Underhill A, Hirsch C, Clark M. Image-based phenotyping
identifies quantitative trait loci for cluster compactness in
grape. J Am Soc Hortic Sci. 2020;145(6):363–373.
5. Fanizza G, Lamaj F, Costantini L, Chaabane R, Grando MS.
QTL analysis for fruit yield components in table grapes (Vitis
vinifera). Theor Appl Genet. 2005;111(4):658–664.
6. Richter R, Rossmann S, Töpfer R, Theres K, Zyprian E. Genetic
analysis of loose cluster architecture in grapevine. BIO Web
Conf. 2017;9:01016.
7. Li-Mallet A, Rabot A, Geny L. Factors controlling
inflorescence primordia formation of grapevine: Their role in
latent bud fruitfulness? A review. Botany. 2016;94:147–163.
8. Pieri P, Zott K, Gomès E, Hilbert G. Nested effects of
berry half, berry and bunch microclimate on biochemical
composition in grape. OENO One. 2016;50:23.
9. Hed B, Ngugi HK, Travis JW. Relationship between cluster
compactness and bunch rot in Vignoles grapes. Plant Dis.
2009;93:1195–1201.
10. Vail ME, Wolpert JA, Gubler WD, Rademacher MR. Effect
of cluster tightness on Botrytis bunch rot in six Chardonnay
clones. Plant Dis. 1998;82(1):107–109.
11. Vail ME, Marois JJ. Grape cluster architecture and the
susceptibility of berries to Botrytis cinerea. Phytopathology.
1991;81:188–191.
12. Austin CN, Wilcox WF. Effects of sunlight exposure on
grapevine powdery mildew development. Phytopathology.
2012;102(9):857–866.
13. Azevedo CF, Ferrão LFV, Benevenuto J, de Resende MDV,
Nascimento M, Nascimento ACC, Munoz PR. Using visual
scores for genomic prediction of complex traits in breeding
programs. Theor Appl Genet. 2023;137(1):9.
14. Underhill A, Hirsch CD, Clark MD. Evaluating and mapping
grape color using image-based phenotyping. Plant Phenomics.
2020;2020:8086309.
15. Font D, Tresanchez M, Martínez D, Moreno J, Clotet E,
Palacín J. Vineyard yield estimation based on the analysis of
high resolution images obtained with artificial illumination at
night. Sensors. 2015;15(4):8284–8301.
16. Olenskyj AG, Sams BS, Fei Z, Singh V, Raja PV, Bornhorst GM,
Earles JM. End-to-end deep learning for directly estimating
grape yield from ground-based imagery. Comput Electron
Agric. 2022;198:Article 107081.
17. Nuske S, Wilshusen K, Achar S, Yoder L, Narasimhan S,
Singh S. Automated visual yield estimation in vineyards. J Field
Robot. 2014;31(5):837–860.
18. Schöler F, Steinhage V. Automated 3D reconstruction of grape
cluster architecture from sensor data for efficient phenotyping.
Comput Electron Agric. 2015;114:163–177.
19. Li M, Klein LL, Duncan KE, Jiang N, Chitwood DH, Londo JP,
Miller AJ, Topp CN. Characterizing 3D inflorescence
architecture in grapevine using X-ray imaging and advanced
morphometrics: Implications for understanding cluster
density. J Exp Bot. 2019;70(21):6261–6276.
20. Ivorra E, Sánchez AJ, Camarasa JG, Diago MP, Tardaguila J.
Assessment of grape cluster yield components based on 3D
descriptors using stereo vision. Food Control. 2015;50:273–282.
21. Luo L, Liu W, Lu Q, Wang J, Wen W, Yan D, Tang Y. Grape
berry detection and size measurement based on edge
image processing and geometric morphology. Machines.
2021;9(10):233.
22. Aquino A, Diago MP, Millán B, Tardáguila J. A new
methodology for estimating the grapevine-berry number per
cluster using image analysis. Biosyst Eng. 2017;156:80–95.
23. Zabawa L, Kicherer A, Klingbeil L, Töpfer R, Kuhlmann H,
Roscher R. Counting of grapevine berries in images via
semantic segmentation using convolutional neural networks.
ISPRS J Photogramm Remote Sens. 2020;164:73–83.
24. Zhang Y, Jiao R. Towards segment anything model (SAM) for
medical image segmentation: A survey. arXiv. 2023. http://
arxiv.org/abs/2305.03678
25. Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan
Q, Peng H, et al. A comprehensive survey on pretrained
foundation models: A history from BERT to ChatGPT. arXiv.
2023. http://arxiv.org/abs/2302.09419
26. Mazurowski MA, Dong H, Gu H, Yang J, Konz N, Zhang Y.
Segment anything model for medical image analysis: An
experimental study. Med Image Anal. 2023;89:Article
102918.
27. Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L,
Xiao T, Whitehead S, Berg AC, Lo W-Y, et al. Segment
anything. arXiv. 2023. http://arxiv.org/abs/2304.02643
28. Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in
medical images. Nat Commun. 2024;15:654.
29. Bonhomme V, Picq S, Gaucherel C, Claude J. Momocs:
Outline analysis using R. J Stat Softw. 2014;56(13). doi:10.18637/
jss.v056.i13.
30. Yang L, Kang B, Huang Z, Xu X, Feng J, Zhao H. Depth
anything: Unleashing the power of large-scale unlabeled data.
arXiv. 2024. http://arxiv.org/abs/2401.10891
31. Zhou C, Li X, Loy CC, Dai B. EdgeSAM: Prompt-in-the-loop
distillation for on-device deployment of SAM. arXiv. 2023.
http://arxiv.org/abs/2312.06660
32. Zhao X, Ding W, An Y, Du Y, Yu T, Li M, Tang M, Wang
J. Fast segment anything. arXiv. 2023. http://arxiv.org/
abs/2306.12156
33. Xiong Y, Varadarajan B, Wu L, Xiang X, Xiao F, Zhu C, Dai
X, Wang D, Sun F, Iandola F, et al. EfficientSAM: Leveraged
masked image pretraining for efficient segment anything.
arXiv. 2023. http://arxiv.org/abs/2312.00863
34. Zhou J, Cao L, Chen S, Perl A, Ma H. Consumer-assisted
selection: The preference for new table grape cultivars in China.
Aust J Grape Wine Res. 2015;21(3):351–360.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often tailored to specific modalities or disease types, lack generalizability across the diverse spectrum of medical image segmentation tasks. Here we present MedSAM, a foundation model designed for bridging this gap by enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. We conduct a comprehensive evaluation on 86 internal validation tasks and 60 external validation tasks, demonstrating better accuracy and robustness than modality-wise specialist models. By delivering accurate and efficient segmentation across a wide spectrum of tasks, MedSAM holds significant potential to expedite the evolution of diagnostic tools and the personalization of treatment plans.
Article
Full-text available
Key message An approach for handling visual scores with potential errors and subjectivity in scores was evaluated in simulated and blueberry recurrent selection breeding schemes to assist breeders in their decision-making. Abstract Most genomic prediction methods are based on assumptions of normality due to their simplicity and ease of implementation. However, in plant and animal breeding, continuous traits are often visually scored as categorical traits and analyzed as a Gaussian variable, thus violating the normality assumption, which could affect the prediction of breeding values and the estimation of genetic parameters. In this study, we examined the main challenges of visual scores for genomic prediction and genetic parameter estimation using mixed models, Bayesian, and machine learning methods. We evaluated these approaches using simulated and real breeding data sets. Our contribution in this study is a five-fold demonstration: (i) collecting data using an intermediate number of categories (1–3 and 1–5) is the best strategy, even considering errors associated with visual scores; (ii) Linear Mixed Models and Bayesian Linear Regression are robust to the normality violation, but marginal gains can be achieved when using Bayesian Ordinal Regression Models (BORM) and Random Forest Classification; (iii) genetic parameters are better estimated using BORM; (iv) our conclusions using simulated data are also applicable to real data in autotetraploid blueberry; and (v) a comparison of continuous and categorical phenotypes found that investing in the evaluation of 600–1000 categorical data points with low error, when it is not feasible to collect continuous phenotypes, is a strategy for improving predictive abilities. Our findings suggest the best approaches for effectively using visual scores traits to explore genetic information in breeding programs and highlight the importance of investing in the training of evaluator teams and in high-quality phenotyping.
Article
Full-text available
Counting grape berries and measuring their size can provide accurate data for robot picking behavior decision-making, yield estimation, and quality evaluation. When grapes are picked, there is a strong uncertainty in the external environment and the shape of the grapes. Counting grape berries and measuring berry size are challenging tasks. Computer vision has made a huge breakthrough in this field. Although the detection method of grape berries based on 3D point cloud information relies on scanning equipment to estimate the number and yield of grape berries, the detection method is difficult to generalize. Grape berry detection based on 2D images is an effective method to solve this problem. However, it is difficult for traditional algorithms to accurately measure the berry size and other parameters, and there is still the problem of the low robustness of berry counting. In response to the above problems, we propose a grape berry detection method based on edge image processing and geometric morphology. The edge contour search and the corner detection algorithm are introduced to detect the concave point position of the berry edge contour extracted by the Canny algorithm to obtain the best contour segment. To correctly obtain the edge contour information of each berry and reduce the error grouping of contour segments, this paper proposes an algorithm for combining contour segments based on clustering search strategy and rotation direction determination, which realizes the correct reorganization of the segmented contour segments, to achieve an accurate calculation of the number of berries and an accurate measurement of their size. The experimental results prove that our proposed method has an average accuracy of 87.76% for the detection of the concave points of the edge contours of different types of grapes, which can achieve a good edge contour segmentation. 
The average accuracy of the detection of the number of grapes berries in this paper is 91.42%, which is 4.75% higher than that of the Hough transform. The average error between the measured berry size and the actual berry size is 2.30 mm, and the maximum error is 5.62 mm, which is within a reasonable range. The results prove that the method proposed in this paper is robust enough to detect different types of grape berries.
Article
Full-text available
Grape ( Vitis vinifera ) cluster compactness is an important trait due to its effect on disease susceptibility, but visual evaluation of compactness relies on human judgement and an ordinal scale that is not appropriate for all populations. We developed an image analysis pipeline and used it to quantify cluster compactness traits in a segregating hybrid wine grape ( Vitis sp.) population for 2 years. Images were collected from grape clusters immediately after harvest, segmented by color, and analyzed using a custom script. Both automated and conventional phenotyping methods were used, and comparisons were made between each method. A partial least squares (PLS) model was constructed to evaluate the prediction of physical cluster compactness using image-derived measurements. Quantitative trait loci (QTL) on chromosomes 4, 9, 12, 16, and 17 were associated with both image-derived and conventionally phenotyped traits within years, which demonstrated the ability of image-derived traits to identify loci related to cluster morphology and cluster compactness. QTL for 20-berry weight were observed between years on chromosomes 11 and 17. Additionally, the automated method of cluster length measurement was highly accurate, with a deviation of less than 10 mm ( r = 0.95) compared with measurements obtained with a hand caliper. A remaining challenge is the utilization of color-based image segmentation in a population that segregates for fruit color, which leads to difficulty in differentiating the stem from the fruit when the two are similarly colored in non-noir fruit. Overall, this research demonstrates the validity of image-based phenotyping for quantifying cluster compactness and for identifying QTL for the advancement of grape breeding efforts.
Article
Full-text available
Grape berry color is an economically important trait that is controlled by two major genes influencing anthocyanin synthesis in the skin. Color is often described qualitatively using six major categories; however, this is a subjective rating that often fails to describe variation within these six classes. To investigate minor genes influencing berry color, image analysis was used to quantify berry color using different color spaces. An image analysis pipeline was developed and utilized to quantify color in a segregating hybrid wine grape population across two years. Images were collected from grape clusters immediately after harvest and segmented by color to determine the red, green, and blue (RGB); hue, saturation, and intensity (HSI); and lightness, red-green, and blue-yellow values (L∗a∗b∗) of berries. QTL analysis identified known major QTL for color on chromosome 2 along with several previously unreported smaller-effect QTL on chromosomes 1, 5, 6, 7, 10, 15, 18, and 19. This study demonstrated the ability of an image analysis phenotyping system to characterize berry color and to more effectively capture variability within a population and identify genetic regions of interest.
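Deriving per-berry color values in an alternative color space starts from the RGB pixels of a segmented region. A small sketch using Python's standard colorsys module for the RGB-to-HSV conversion (HSV here is a stand-in for the HSI and L∗a∗b∗ conversions used in the study; the pixel values are hypothetical):

```python
import colorsys

def mean_hsv(pixels):
    """Average HSV over a list of (R, G, B) pixels given in the 0-255 range."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)
    return tuple(sum(c[i] for c in hsv) / n for i in range(3))

# Hypothetical berry-region pixels: dark red/purple tones of a noir berry.
berry_pixels = [(90, 20, 60), (80, 15, 55), (100, 25, 70)]
h, s, v = mean_hsv(berry_pixels)
```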
Article
Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts highly varies depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with prompts with less ambiguity such as the segmentation of organs in computed tomography and poorer in various other scenarios such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple-point prompts are provided iteratively, SAM's performance generally improves only slightly while other methods' performance improves to the level that surpasses SAM's point-based performance. We also provide several illustrations for SAM's performance on all tested datasets, iterative segmentation, and SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. 
SAM has the potential to make a significant impact on automated medical image segmentation, but appropriate care needs to be applied when using it. Code for evaluating SAM is publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation.
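The IoU scores reported above compare a predicted mask against ground truth. A minimal sketch of that metric on binary masks represented as nested lists (not the evaluation repository's code, which operates on image arrays):

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two same-shaped binary masks (nested lists)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b   # 1 only where both masks are 1
            union += a or b    # 1 where either mask is 1
    return inter / union if union else 0.0

# Toy predicted and ground-truth masks: 2 overlapping pixels, 4 covered total.
pred = [[1, 1, 0], [0, 1, 0]]
gt   = [[1, 0, 0], [0, 1, 1]]
score = iou(pred, gt)  # 2 / 4
```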
Article
Yield estimation prior to harvest is a powerful tool in vineyard management, as it allows growers to fine-tune management practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the applicability of nondestructive proximal imaging combined with deep learning for yield estimation in vineyards. Continuous image data collection using a vehicle-mounted sensing kit combined with collection of ground truth yield data at harvest using a commercial yield monitor allowed for the generation of a large dataset of 23,581 yield points and 107,933 images. Moreover, this study was conducted in a commercial vineyard which was mechanically managed, representing a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection model was trained on hand-labeled images to localize grape bunches, and detections were either counted or their pixel count was summed to obtain a metric which was correlated to grape yield. Conversely, regression models were trained end-to-end to directly predict grape yield from image data without the need for hand labeling. Results demonstrated that both the transformer model and the object detection model with pixel area processing performed comparably, with a mean absolute percent error of 18% and 18.5%, respectively, on a representative holdout dataset. Saliency mapping was used to demonstrate that the attention of the CNN regression model was localized near the predicted location of grape bunches, as well as on the top of the grapevine canopy. Overall, the study demonstrated the applicability of proximal imaging and deep learning for prediction of grapevine yield on a large scale.
Additionally, the end-to-end modeling approach was able to perform comparably to the object detection approach while eliminating the need for hand-labeling.
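The mean absolute percent error (MAPE) used to compare the models above can be sketched as follows; the yield figures are hypothetical stand-ins for the study's yield-monitor data:

```python
def mape(predicted, observed):
    """Mean absolute percent error, in percent."""
    terms = [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]
    return 100 * sum(terms) / len(terms)

# Hypothetical per-block yields (tons): model predictions vs. yield monitor.
predicted = [4.1, 3.6, 5.2, 2.9]
observed  = [4.0, 4.0, 5.0, 3.0]
error_pct = mape(predicted, observed)
```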
Article
The extraction of phenotypic traits is often very time- and labour-intensive. Investigation in viticulture in particular is restricted to on-site analysis due to the perennial nature of grapevine. Traditionally, skilled experts examine small samples and extrapolate the results to a whole plot. Different grapevine varieties and training systems, e.g. vertical shoot positioning (VSP) and semi-minimal pruned hedges (SMPH), thereby pose different challenges. In this paper we present an objective framework based on automatic image analysis which works on two different training systems. The images are collected semi-automatically by a camera system installed in a modified grape harvester. The system produces overlapping images from the sides of the plants. Our framework uses a convolutional neural network to detect single berries in images by performing a semantic segmentation. Each berry is then counted with a connected-component algorithm. We compare our results with Mask-RCNN, a state-of-the-art network for instance segmentation, and with a regression approach for counting. The experiments presented in this paper show that we are able to detect green berries in images despite different training systems. We achieve an accuracy for berry detection of 94.0% in the VSP and 85.6% in the SMPH.
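Counting berries from a semantic-segmentation mask via connected components, as described above, can be sketched with a breadth-first flood fill (a simplified stand-in for the paper's pipeline; the toy mask is hypothetical):

```python
from collections import deque

def count_components(mask):
    """Count 4-connected components of 1s in a binary grid (nested lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1                      # new component found
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:                    # flood-fill its pixels
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Toy segmentation mask: two separate "berries".
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
berries = count_components(mask)  # 2
```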