Received: May 25, 2022. Revised: November 22, 2022. Editorial decision: November 30, 2022. Accepted: November 30, 2022
© The Author(s) 2023. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Integrative Biology, 2023, 1–20
https://doi.org/10.1093/intbio/zyac017
Original Article
Mapping of structural arrangement of cells and
collective calcium transients: an integrated framework
combining live cell imaging using confocal microscopy
and UMAP-assisted HDBSCAN-based approach
Suman Gare1, Soumita Chel1, T.K. Abhinav1, Vaibhav Dhyani1, Soumya Jana2 and Lopamudra Giri1,*
1Department of Chemical Engineering, Indian Institute of Technology, Hyderabad, India
2Department of Electrical Engineering, Indian Institute of Technology, Hyderabad, India
*Corresponding author. E-mail: giril@che.iith.ac.in
Abstract
Live cell calcium (Ca2+) imaging is an important tool for recording cellular activity during in vitro and in vivo preclinical studies. In particular, high-resolution microscopy can provide valuable dynamic information at the single-cell level. One of the major challenges in implementing such imaging schemes is extracting quantitative information in the presence of significant heterogeneity in Ca2+ responses arising from variation in structural arrangement and drug distribution. To fill this gap, we propose time-lapse imaging using spinning disk confocal microscopy and a machine learning-enabled framework for automated grouping of Ca2+ spiking patterns. Time series analysis is performed to correlate the drug-induced cellular responses with the self-assembly pattern present in multicellular systems. The framework is designed to reduce the dimensionality of the large-scale dynamic responses using uniform manifold approximation and projection (UMAP). In particular, we propose hierarchical DBSCAN (HDBSCAN) for clustering in view of its reduced number of hyperparameters. We find that UMAP-assisted HDBSCAN outperforms existing approaches in terms of clustering accuracy in the segregation of Ca2+ spiking patterns. One of the novelties is the application of non-linear dimension reduction to the segregation of Ca2+ transients with statistical similarity. The proposed pipeline for automation also proved to be a reproducible and fast method requiring minimal user input. The algorithm was used to quantify the effect of cellular arrangement and stimulus level on collective Ca2+ responses induced by a GPCR-targeting drug. The analysis revealed a significant increase in the subpopulation exhibiting sustained oscillation at higher packing density. In contrast to the traditional measurement of rise time and decay ratio from Ca2+ transients, the proposed pipeline was used to classify complex patterns of longer duration and to perform cluster-wise model fitting. The two-step process has potential implications for deciphering the biophysical mechanisms underlying Ca2+ oscillations in the context of structural arrangement between cells.
Keywords: calcium imaging, t-SNE, UMAP, HDBSCAN, cell-to-cell connectivity, confocal microscopy, GPCR targeting drug, SSDD
Insight, innovation and integration
Although measurement of Ca2+ transients is crucial in the assessment of cell–drug interactions and in preclinical studies, the analysis is significantly hindered by cell-to-cell variability in spiking patterns. The inherent heterogeneity in complex oscillation patterns poses an emerging data analysis challenge. In this context, the authors develop an integrated approach combining confocal imaging with a two-step machine learning framework comprising non-linear dimension reduction and density-based clustering. We demonstrate that UMAP-assisted HDBSCAN clustering outperforms existing techniques in handling time-series analysis. The findings indicate distinct spiking patterns induced by a GPCR-targeting drug that are specific to the structural arrangement between cells. The insight gained is important for identifying the biophysical mechanisms underlying collective Ca2+ dynamics.
INTRODUCTION
In vitro Ca2+ imaging using fluorescence and confocal microscopy is widely used to study cell–drug interactions as well as to assess drug efficacy and cytotoxicity [1]. Specifically, intracellular Ca2+ is a crucial parameter controlling many cellular functions [2], and Ca2+ imaging is important in performing functional assays during the assessment of drugs [3]. In recent times, measurement of cytosolic Ca2+ has been considered one of the tools for G-protein-coupled receptor (GPCR) targeting drug screening [4,5]. Activated GPCR is known to regulate Ca2+ signaling in a cell [6–8], and change in Ca2+ flux is one of the key markers for the evaluation of cell state. However, the drug-induced Ca2+ oscillations obtained using fluorescence imaging yield a collection of complex spiking patterns [5,9]. Full understanding of such complex oscillations requires quantitative dynamic measurements of signal transmission between cells. Automating the segregation of such time-course data consisting of asynchronous Ca2+ spiking remains challenging due to the variation in inter-spike interval and damping patterns present in the dataset.
Cellular architecture and structural arrangement are known to regulate the functioning of cells, including contraction, migration and differentiation [10]. Analysis of the effect of cell-to-cell connectivity is not only useful for understanding the molecular mechanisms regulating cell function but also important in drug development based on cell-based assays. One of the major challenges in the analysis of Ca2+ spiking patterns is to identify the signature responses that arise due to higher packing density and gap junction-mediated Ca2+ diffusion [11–13]. In particular, when the regions of interest are chosen randomly during imaging of cells treated with a particular drug dose, a region may have either a higher or a lower packing density of cells. Such variability in structural arrangement and cell-to-cell connectivity may play a crucial role in regulating Ca2+ spiking patterns [10,11,14]. Moreover, cell-to-cell variability may arise due to the distribution in the number of receptors and uneven drug distribution in the cells. Hence, the interpretation of cellular activity in a large dataset is rather challenging. To address this issue, we propose Ca2+ imaging using confocal microscopy and a machine learning-based framework for clustering a Ca2+ dataset and identifying the distinguishable patterns corresponding to low and high packing density. Generally, asynchronous Ca2+ spiking patterns not only differ in frequency and amplitude but also show significant variability in the inter-spike interval (ISI) over time [5].
Previously, k-means and functional clustering methods were used for Ca2+ spiking analysis [4,15,16]. However, the major drawback of functional clustering [15,16] is that it focuses only on calculating the distance between spike trains based on spike timing, so information on amplitude is omitted. In addition, as the number of cells increases, there is a significant increase in computation time because the distance between spikes must be calculated for the actual dataset along with the surrogate dataset [16]. Considering the increase in the number of Ca2+ spike trains obtained from various experiments and the large variability in ISI and amplitude, there is an urgent need for an efficient and reliable method for automated analysis.
K-means and fuzzy c-means (FCM) [4,5,17] were previously used to cluster the extracted feature data; however, they failed to separate out the noise. On the other hand, Density-based Spatial Clustering of Applications with Noise (DBSCAN), developed by Ester et al. [18], which relies on a density-based notion of clusters, is useful for separating noise [18–20]. One of the disadvantages of DBSCAN is that it gives flat clustering and does not work well for data with varying densities. In this context, we show that Hierarchical Density-based Spatial Clustering of Applications with Noise (HDBSCAN) [21,22] can be used for clustering after reduction of dimension using uniform manifold approximation and projection (UMAP) or t-distributed stochastic neighbor embedding (t-SNE) [23,24]. HDBSCAN builds on its parent algorithm, DBSCAN, by converting it into a hierarchical clustering algorithm. HDBSCAN can be used to visualize the data by means of a simplified cluster tree and does not require many critical hyperparameters as input other than ‘min_cluster_size’, which is defined as the minimum number of points required to form a cluster.
One of the major objectives of this work is to identify the best-suited dimension reduction algorithm for spiking datasets based on its effectiveness in grouping the Ca2+ responses using a machine learning-enabled pipeline. First, we performed time series data acquisition using spinning disk confocal microscopy and compared the performance of two nonlinear dimension reduction algorithms, t-SNE and UMAP, with principal component analysis (PCA) [25]. Furthermore, we implemented clustering using k-means, agglomerative clustering, DBSCAN and HDBSCAN. Here, we show that UMAP-assisted HDBSCAN clustering outperforms existing approaches for clustering of the Ca2+ spiking dataset.
In order to validate the proposed method, the statistical similarity [26] was assessed by fitting the dataset to various statistical distributions and selecting the fit with the minimum Akaike Information Criterion (AIC) [27]. Additionally, we performed pairwise Kruskal–Wallis testing on the distribution parameters obtained for each cluster. In contrast to conventionally used validation indices [28–31], we used the SSDD (Shapes, different Sizes and Densities, small separation Distances) index as the cluster validity index [32], which is able to find the best-fitting partition for clusters with arbitrary shapes, sizes and densities and small separation distances. In the SSDD index, it is assumed that good clusters are high packing density regions surrounded by low packing density regions and separated from other high packing density regions.
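As an illustration of AIC-based selection, the following Python sketch compares two candidate distributions on a small sample using AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximised likelihood. The candidate set (normal and exponential), the function names and the toy data are assumptions made for this example only; the distributions actually fitted in this work are not listed in this section.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def normal_loglik(data):
    """Maximised log-likelihood of a normal distribution (MLE mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def exponential_loglik(data):
    """Maximised log-likelihood of an exponential distribution (MLE rate = 1 / mean)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)


def best_distribution(data):
    """Return the name of the candidate with the lowest AIC, plus all AIC scores."""
    scores = {
        "normal": aic(normal_loglik(data), 2),            # parameters: mean, variance
        "exponential": aic(exponential_loglik(data), 1),  # parameter: rate
    }
    return min(scores, key=scores.get), scores


# Toy sample (hypothetical values, tightly grouped around 10).
name, scores = best_distribution([9.8, 10.1, 10.0, 9.9, 10.2, 10.0])
```

For this tightly grouped toy sample the normal candidate has the lower AIC; a lower AIC indicates a better trade-off between goodness of fit and the number of parameters.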
One of the most important contributions of this work is the introduction of a strategy in the workflow so that reproducibility is retained when the algorithm is run multiple times to find a suitable number of clusters. One of the major issues in using t-SNE and UMAP is that repeated runs on a single dataset give different 2D embeddings due to the stochastic nature of the algorithms. To address this, we propose a workflow for performing t-SNE or UMAP with 1000 random seed states followed by density-based clustering and choosing an embedding with the highest reproducibility and minimum SSDD index. Also, the similarity between the various embeddings was quantified using the structural similarity ratio [33,34]. The proposed pipeline combining dimension reduction and clustering offers a simple, fast and flexible toolbox for grouping the spiking behavior of cells obtained from regions of disparate structural arrangement and different drug doses.
In this paper, we hypothesize that higher packing density and gap-junction-mediated Ca2+ diffusion may induce an increase in spiking amplitude, duration of oscillation and frequency [10,12,13,35–37]. On the other hand, paracrine signaling may be responsible for Ca2+ diffusion in cells with lower packing density, leading to lower frequency and amplitude [38]. In this work, we show that the proposed framework is needed to demonstrate that the structural arrangement and packing density of cells are crucial factors in controlling the relative distribution of cells having various levels of spiking frequency, ISI and amplitude. Here, we used norepinephrine as the GPCR-targeting drug; it is generally used for increasing blood pressure in emergencies in the intensive care unit [39,40]. The rationale for choosing norepinephrine is that it is known to induce complex oscillations following activation of adrenergic receptors [41]. Since the proposed algorithm was able to detect subtle differences in cell functionality, it can be used to assess the impact of structural arrangement and drug dose distribution. In contrast to the traditional evaluation of rise time and decay ratio from Ca2+ spikes, the method can be used to quantify complex patterns of longer duration. Furthermore, we show a proof of concept that clustering of time-course responses and cluster-wise model fitting are necessary to obtain insight into cell states and the underlying signaling mechanism.
The paper’s contents are arranged as follows: Section 2 briefly describes the methodology of this research, followed by a detailed description of the algorithms and cluster validity indices. Section 3 applies the algorithm to a Ca2+ spiking dataset obtained from HeLa monolayer culture and compares it with some classic existing clustering algorithms. Section 4 describes the significance of the clusters and how the two-step machine learning paradigm along with cluster-based model fitting can be used to identify the role of cellular arrangement in controlling Ca2+ signaling in cells treated with the GPCR-targeting drug norepinephrine. The implications and limitations of the proposed method are discussed in Section 5. An overall workflow for the proposed pipeline is presented in Fig. 1.
MATERIALS AND METHODS
Cell culture
HeLa cells (ATCC, Manassas, VA) were cultured in minimum essential media (Cellgro, Manassas, VA) supplemented with 10% dialyzed fetal bovine serum (Atlanta Biologicals) and 1% penicillin–streptomycin (PS) in 29-mm glass-bottom dishes (In Vitro Scientific, Sunnyvale, CA) at 37 °C in a humidified incubator with 5% CO2. A total of 0.2 × 10^6 cells were seeded, maintained in culture until 70–80% confluency and used for further analysis.
Ca2+ imaging with Fluo-4 dye (spinning disc confocal microscopy)
HeLa cells were incubated for 30 min in Hank’s balanced salt solution (HBSS, Invitrogen, Life Technologies, Grand Island, NY) with 1.25 mM Ca2+, 5.3 nM KCl and 0.44 nM KH2PO4 (Sigma, St. Louis, MO) and 2 μM Fluo-4 dye (Molecular Probes, Life Technologies, Grand Island, NY). The cells were then rinsed three times with HBSS without Fluo-4, with 15 min of incubation time allowed for de-esterification. Ca2+ imaging was carried out using a spinning-disk confocal imaging system (Leica DMI6000B microscope with a Yokogawa CSU-X1 spinning disk unit), and the HeLa cells were maintained at 37 °C and 5% CO2 in the incubator attached to the microscope system. Fluo-4 was excited with an argon laser at 488 nm, and emission was recorded at 510 nm. To obtain the time course of cytosolic Ca2+ oscillation in the HeLa cell population, the cells were imaged using a 63× oil objective in the confocal imaging system with an Andor iXon EM-CCD camera. Basal-level Ca2+ was measured without any drug. The time course of Fluo-4 intensity was recorded (every 600 ms) before and after the addition of the drug. Since we aim to have an assay and analytics for testing multiple drugs, the duration of imaging was maintained at 10 minutes.
GPCR targeting drug loading
Norepinephrine (Sigma, St. Louis, MO) reconstituted in HBSS was used to activate the G-protein subunits at concentrations of 1, 10 and 100 μM. For drug treatment studies, the drug was added after 100 s in a 10 μL volume, and the Fluo-4 intensity was measured for 600 s after adding the drug. The time course of Fluo-4 intensity obtained from time-lapse imaging was used for dose–response analysis. Leica adaptive focus control was used to prevent changes to the plane of imaging (drifting) over time.
Data acquisition from time-lapse Ca2+ imaging
The packing density of cells differs significantly between regions of the same tissue culture dish, and such video-to-video variability cannot be avoided during high content imaging studies. Various regions were chosen randomly, and individual HeLa cells were manually selected as regions of interest (ROIs) based on cell morphology. Furthermore, the dataset was grouped based on a similar number of cells per area and similar arrangement (manual labeling). Specifically, the two groups were labeled as (i) higher packing density and (ii) lower packing density for videos obtained from each drug dose (1, 10 and 100 μM). Figure 2a shows the schematic diagram for the experimental setup and data acquisition using confocal microscopy. The data collection and grouping are presented in Supplementary Fig. S1. Multi-tiff time-lapse files were analyzed using Andor iQ software to obtain the time course of the fluorescence intensity of Fluo-4 for the entire duration of the Ca2+ spiking in single cells. The videos were taken for 10 minutes with a frame rate of 1 image/s, resulting in a file size of several gigabytes (GB). HeLa cells were treated with different concentrations of norepinephrine (1, 10 and 100 μM). Representative images depicting the structural arrangement of cells from low and high packing density regions are shown in Fig. 2b.
Data smoothing and baseline correction
An area with no fluorescence at 488 nm was taken as the average background fluorescence. The background fluorescence was subtracted from the average pixel fluorescence (at 488 nm) of each ROI. To eliminate the effect of photobleaching, we used an iterative average (IA) algorithm [42]. In Supplementary Fig. S2a, the ROI bordered in red represents a single cell. Supplementary Figure S2b shows the corresponding time course of Fluo-4 intensity. The lower curve (red) is the raw plot obtained from the ROI, while the black curve underneath is the baseline retrieved from the intensity profile using the IA algorithm. The dark blue curve in Supplementary Fig. S2b shows the fluorescence intensity profile of the same ROI after baseline correction. Next, the time course of Ca2+ was denoised using an exponential moving average; the ‘tsmovavg’ function in MATLAB was used to obtain the smoothed data (Supplementary Fig. S2c). Since the time points at which the fluorescence was captured (using the 488-nm laser) were not identical for all videos in the dataset, we performed interpolation (‘interp1’, MATLAB version R2020a) to obtain the fluorescence at the same set of time points across the whole dataset. The dataset contains 756 cells and 339 time points.
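The preprocessing chain described above (background subtraction, exponential moving average, resampling onto a common time grid) can be sketched in plain Python as follows. This is an illustrative stand-in for the MATLAB calls, not the authors' code: the function names, the smoothing factor `alpha` and the toy values are assumptions, and the iterative average baseline correction [42] is not reproduced here.

```python
def subtract_background(trace, background):
    """Subtract a constant background fluorescence from every sample."""
    return [value - background for value in trace]


def exponential_moving_average(trace, alpha=0.3):
    """Recursive EMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].

    `alpha` is a hypothetical smoothing factor, not the value used in the paper.
    """
    smoothed = [trace[0]]
    for value in trace[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def interpolate_linear(times, values, new_times):
    """Piecewise-linear interpolation onto `new_times`.

    Both time lists must be sorted in ascending order, and every entry of
    `new_times` must lie within [times[0], times[-1]].
    """
    result = []
    j = 0
    for t in new_times:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


# Toy example: one trace sampled at irregular times, resampled onto a regular grid.
raw = subtract_background([12.0, 15.0, 30.0, 18.0], background=10.0)
smooth = exponential_moving_average(raw, alpha=0.5)
resampled = interpolate_linear([0.0, 0.7, 1.3, 2.0], smooth, [0.0, 1.0, 2.0])
```

Resampling every trace onto the same grid is what allows the traces to be stacked into a single cells-by-time-points matrix for dimension reduction.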
Uniform manifold approximation and projection
UMAP is a recent method for nonlinear dimension reduction that seeks an accurate local structure while incorporating an improved global structure [24]. It is a fuzzy topology-based method, and it has several advantages compared with t-SNE [19,43]. The time-series data matrix (time course of Fluo-4 intensity for all cells present in the dataset) was taken as the high dimensional data $\{x_1, x_2, \ldots, x_N \mid x_i \in \mathbb{R}^M\}$, and we aim to identify the lower dimensional representation $\{y_1, y_2, \ldots, y_N \mid y_i \in \mathbb{R}^k\}$, such that $k = 2$. Like t-SNE, UMAP also constructs an exponential probability distribution in the high dimensional manifold as
$$p_{ij} = \exp\left(-\frac{d(x_i, x_j) - \rho_i}{\sigma_i}\right), \quad (1)$$
where $d(x_i, x_j)$ is the distance between the ith and jth data points and $\rho_i$ is the distance between the ith data point and its first nearest neighbor. One of the important differences of UMAP compared with t-SNE is that the probability distribution is the local metric, which is unique for every pair of points. The probability distribution in the lower dimension is given by
$$q_{ij} = \left(1 + a \lVert y_i - y_j \rVert^{2b}\right)^{-1}, \quad (2)$$
where $a$ and $b$ are constants. The other main difference between UMAP and t-SNE is the loss function used to estimate the lower dimension structure. In UMAP, cross-entropy (CE) is used instead of KL divergence, and CE is defined as
$$\mathrm{CE}(X, Y) = \sum_i \sum_j \left[ p_{ij}(X) \log\frac{p_{ij}(X)}{q_{ij}(Y)} + \left(1 - p_{ij}(X)\right) \log\frac{1 - p_{ij}(X)}{1 - q_{ij}(Y)} \right]. \quad (3)$$
Figure 1. Flow diagram for the proposed pipeline for selection of methods, cluster number, and analysis of live imaging data. Green box denotes the data preprocessing, blue box denotes the dimension reduction methods, pink box denotes the implementation of density-based clustering methods, and brown boxes indicate the validation methods including computation of an SSDD index for selecting optimal cluster, similarity ratio and correlation coefficient. Red box and arrow denote the initialization of 1000 matrices to identify optimal 2D embedding for t-SNE and UMAP. (SSDD = Shapes, Sizes, Densities and small separation Distances)
Figure 2. Data acquisition for norepinephrine-mediated Ca2+ oscillations from regions of varied packing density of HeLa cells using confocal microscopy. (A) Schematic diagram for various conditions including low-dose high-packing density, low-dose low-packing density, medium-dose high-packing density, medium-dose low-packing density, high-dose high-packing density, and high-dose low-packing density. (B) Representative images for detailed structural arrangement from high- and low-packing density with low, medium and high drug doses. (C) Heat map representation of the entire Ca2+ spiking dataset [for a duration of 600 s] obtained from six different conditions.
The CE function significantly improves the ability to preserve
the correlation between distances in the high and low dimensions
for both small and large distances. Implementation and further
details of UMAP can be found in McInnes et al. [24]. All the software
versions and parameters used for UMAP and t-SNE are provided
in Supplementary Table S1.
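To make the roles of the three UMAP quantities concrete, the sketch below evaluates the high-dimensional affinity of equation (1), the low-dimensional affinity of equation (2) and the cross-entropy of equation (3) for a handful of point pairs. It is a toy calculation under stated assumptions, not the UMAP implementation: the function names are hypothetical, the defaults a = b = 1 are chosen only for illustration, and probabilities are clipped by a small `eps` to avoid taking the logarithm of zero.

```python
import math


def high_dim_affinity(d_ij, rho_i, sigma_i):
    """Equation (1): p_ij = exp(-(d(x_i, x_j) - rho_i) / sigma_i)."""
    return math.exp(-(d_ij - rho_i) / sigma_i)


def low_dim_affinity(y_i, y_j, a=1.0, b=1.0):
    """Equation (2): q_ij = (1 + a * ||y_i - y_j||^(2b))^(-1)."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(y_i, y_j))
    return 1.0 / (1.0 + a * sq_dist ** b)


def cross_entropy(p, q, eps=1e-12):
    """Equation (3), summed over matching lists of pairwise affinities."""
    total = 0.0
    for p_ij, q_ij in zip(p, q):
        # Clip to (0, 1) so that both logarithms stay finite.
        p_ij = min(max(p_ij, eps), 1 - eps)
        q_ij = min(max(q_ij, eps), 1 - eps)
        total += p_ij * math.log(p_ij / q_ij)
        total += (1 - p_ij) * math.log((1 - p_ij) / (1 - q_ij))
    return total


# The nearest neighbor of point i (d_ij equal to rho_i) receives affinity 1.
p_nearest = high_dim_affinity(d_ij=1.0, rho_i=1.0, sigma_i=2.0)
# Two embedded points at unit distance with a = b = 1 receive affinity 0.5.
q_unit = low_dim_affinity((0.0, 0.0), (1.0, 0.0))
```

The first term of the cross-entropy penalizes pairs that are close in the high dimension but far apart in the embedding, while the second term penalizes pairs that are far apart in the high dimension but close in the embedding.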
HDBSCAN clustering
In HDBSCAN, ‘min_cluster_size’ is the primary parameter that affects the clustering. The plot of ‘min_cluster_size’ against cluster number, shown in Supplementary Fig. S3, was used to select the cluster number. The major steps of HDBSCAN are briefly summarized here [21,44].
Steps for HDBSCAN algorithm:
Step 1 Compute the core distance with respect to a minimum number of data points in a cluster (‘min_cluster_size’) for all data objects in the dataset.
Step 2 Compute the minimum spanning tree (MST), with the mutual reachability distance between the sample points as edge weight.
Step 3 Transform the MST into a hierarchical structure.
Step 4 Use the input parameter ‘min_cluster_size’ to find the compressed cluster tree.
Step 5 Finally, obtain the density-adaptive clustering result through a stability function.
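Steps 1 and 2 above can be sketched in a few lines of plain Python. The code below computes core distances, the mutual reachability distance max(core(a), core(b), d(a, b)) and a minimum spanning tree with Prim's algorithm on a toy point set. It is a minimal illustration, not the HDBSCAN library: the function names and the neighbor count `k` are assumptions, and Steps 3 to 5 (hierarchy, compressed tree and stability-based selection) are not implemented.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def core_distances(points, k):
    """Step 1: distance from each point to its k-th nearest neighbor (itself excluded)."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Dense matrix of mutual reachability distances max(core_i, core_j, d_ij)."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [0.0 if i == j else max(cores[i], cores[j], euclidean(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


def minimum_spanning_tree(weights):
    """Step 2: Prim's algorithm on a dense weight matrix; returns (i, j, weight) edges."""
    n = len(weights)
    # best[j] = (cheapest known edge weight into j, tree node it connects to)
    best = {j: (weights[0][j], 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda node: best[node][0])
        w, i = best.pop(j)
        edges.append((i, j, w))
        for m in best:
            if weights[j][m] < best[m][0]:
                best[m] = (weights[j][m], j)
    return edges


# Toy data: three nearby points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (10, 10)]
edges = minimum_spanning_tree(mutual_reachability(points, k=1))
```

In this toy example the distant point is attached to the tree by a single long edge; removing the longest edges first is what splits the tree into the hierarchy used in Step 3.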
Multiple run analysis
t-SNE and UMAP are both known to yield distinctly different solutions for different random initial conditions [33]. In this context, we propose to run t-SNE and UMAP 1000 times with different 2D initializations (Fig. 1). To obtain a robust solution and clustering pattern, we obtained the distribution of cluster numbers using the 1000 different 2D initializations. Next, we clustered the data using two clustering algorithms, DBSCAN and HDBSCAN. The optimal cluster number was chosen as the mode of the frequency distribution of cluster numbers from the 1000 runs for DBSCAN and HDBSCAN. Since multiple 2D structures still correspond to the chosen cluster number, we further selected an optimal 2D structure. The selection was based on the minimum SSDD among the n runs that yielded k (mode) clusters, i.e. the cluster number with the highest frequency (Fig. 1).
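The selection rule described in this subsection can be sketched as follows. Each run is represented by a small record holding its random seed, the number of clusters found and its SSDD value; the record layout, the function name and the example numbers are hypothetical, and the embedding, clustering and SSDD computations themselves are assumed to have been done elsewhere.

```python
from collections import Counter


def select_embedding(runs):
    """Pick the modal cluster number, then the run with minimum SSDD among those runs.

    `runs` is a list of dicts with the (hypothetical) keys
    'seed', 'n_clusters' and 'ssdd'.
    """
    counts = Counter(run["n_clusters"] for run in runs)
    k_mode, frequency = counts.most_common(1)[0]
    candidates = [run for run in runs if run["n_clusters"] == k_mode]
    best = min(candidates, key=lambda run: run["ssdd"])
    return k_mode, frequency, best


# Toy example with four runs instead of 1000.
runs = [
    {"seed": 0, "n_clusters": 5, "ssdd": 0.4},
    {"seed": 1, "n_clusters": 6, "ssdd": 0.1},
    {"seed": 2, "n_clusters": 5, "ssdd": 0.2},
    {"seed": 3, "n_clusters": 5, "ssdd": 0.3},
]
k_mode, frequency, best = select_embedding(runs)
```

Here the run with seed 1 has the lowest SSDD overall but is not selected, because its cluster number is not the modal one; the rule first fixes the cluster number and only then compares SSDD values.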
Clustering performance evaluation
Since many cluster validation indices are not suitable for non-spherical clusters, we used SSDD for validation of the clustering results. The SSDD cluster validation index is specifically designed for hard clustering with irregular clustering results, where the clusters have arbitrary shapes, different sizes and densities, and small separation distances. Recently, Liang et al. [32] developed the SSDD cluster validation index based on inner- and inter-cluster validation measures. The SSDD index is calculated as
$$\mathrm{SSDD}(C) = \sum_{c_i \in C} \left[ \alpha \cdot DC(c_i) + \beta \cdot DR(c_i) \right], \quad (4)$$
where $DC(c_i)$ is the density change along the backbone of cluster $c_i$ and $DR(c_i)$ is the inter- and inner-cluster ratio of cluster $c_i$. The workflow for the calculation of SSDD is provided in Supplementary Fig. S4. The details of the SSDD algorithm are presented in the original paper [32]. The specific differences between the proposed pipeline and the state-of-the-art approaches are presented in Supplementary Fig. S5.
Statistical analysis
The differences between the two treatment groups were compared with a Kruskal–Wallis test (MATLAB); P < 0.05 was considered to indicate statistical significance. Data are presented as the mean ± standard deviation (SD).
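For readers unfamiliar with the test, the Kruskal–Wallis statistic is computed from the ranks of the pooled observations as H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), where R_j is the rank sum of group j, n_j its size and N the total number of observations. The sketch below is a plain Python illustration with hypothetical function names; it assigns average ranks to ties but omits the tie correction factor and the p-value computation, so it is not a substitute for the MATLAB routine used in this work.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Toy example: two fully separated groups of three observations each.
h_statistic = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom, which is how the P value is obtained.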
Mathematical model for Ca2+ signaling
To understand the difference in the mechanism underlying Ca2+ oscillations at low and high packing density, we performed parameter estimation using a mathematical model [45] that captures the norepinephrine-mediated Ca2+ oscillations in HeLa cells. This particular model was developed to investigate the mechanism underlying Gi-coupled GPCR-induced Ca2+ oscillation. Overall, the model has 7 variables, (i) fast activated Gβγ at the plasma membrane (βγfastPM), (ii) slow activated Gβγ at the plasma membrane (βγslowPM), (iii) fast Gβγ at the internal membrane (βγfastIM), (iv) slow Gβγ at the internal membrane (βγslowIM), (v) cytosolic Ca2+, (vi) ER Ca2+ and (vii) [IP3], and 21 parameters. A detailed description of the model is presented in supplementary section 1.7. In this model, it was assumed that IP3 is in a quasistationary state with respect to the concentration of active [PLC-β], and the Ca2+ inflow from internal stores/endoplasmic reticulum depends on [PLC-β]. Also, it was assumed that IP3 formation is directly proportional to [PLC-β] as well as active [Gβγ]. Since many of the parameters are known to be constant [45–47], we estimated a few parameters from the single-cell responses from cluster 10 [signature response from low packing density] and from cluster 5 [signature response from high packing density]. Specifically, we estimated the parameters that control the store Ca2+: k2, the rate constant for receptor desensitization (negative feedback parameter); k3, the rate constant for PLC-β activation; k5, the rate constant for Ca2+ influx from extracellular space to cytosol, modeled as a function of PLC-β; and k6, the rate of Ca2+ influx from ER to cytosol. Ten cells were taken from each case, and all the cells correspond to treatment with 10 μM norepinephrine. Estimation of kinetic parameters for each cell was performed using a genetic algorithm (GA) [48]. A detailed description of the parameters is presented in supplementary section 1.7.
RESULTS
We developed a framework to cluster the single-cell Ca2+ responses collected from the live imaging assay (Fig. 1). The cells loaded with Fluo-4 were treated with various drug doses. Cells in a glass-bottom dish were imaged from low and high packing density regions, followed by analysis to track and outline individual cell responses over time (Fig. 2b). UMAP was used to reduce the dimensionality of the extracted time-series information, and the reduced data were then used for HDBSCAN clustering. In the following sections, we give the details of developing the framework and selecting the cluster number. Furthermore, we present a proof-of-principle study in profiling the heterogeneity in Ca2+ responses for various stimulus levels and structural arrangements.
Confocal imaging of Ca2+ spiking in single cells for low and high packing density regions
To observe the Ca2+ spiking behavior in HeLa cells, we performed cytosolic Ca2+ time-lapse imaging using spinning disk confocal microscopy. Initially, we characterized the Ca2+ spiking in HeLa cells for the dose–response experiments carried out at three different doses of norepinephrine (1, 10 and 100 μM) and for two packing densities (Fig. 2b and c). Figure 3a and b shows the time-lapse images corresponding to low and high packing density for the 100 μM dose. Next, we present the intensity mapping of cell populations, which shows the differences in Ca2+ spiking patterns between low and high packing density when treated with norepinephrine (Fig. 3a and b). Supplementary videos 1 and 2 show the time-lapse images of Fluo-4 intensity for low and high packing density when treated with 100 μM norepinephrine.
Cellular heterogeneity based on Ca2+ spiking
pattern for various structural arrangements
between cells: distinct Ca2+ oscillation profiles
from single cells
Next, we examined the spiking pattern and activity in cells chosen from regions with two different packing densities at three doses of norepinephrine (1, 10 and 100 μM). Figure 3c–h shows the time course of Fluo-4 intensity for five cells randomly chosen from each of these six cases. The dose–response study indicates that norepinephrine induces oscillatory responses at a higher dose. Supplementary Figure S6 shows the histogram analysis of three features: number of peaks, maximum amplitude and inter-spike interval (ISI). The result shows that Ca2+ spiking in the cell population is heterogeneous at both lower and higher doses. The histograms also show differences between low and high packing density in the number of peaks, maximum amplitude and ISI. However, it is challenging to automate the quantification of the relative percentage of the various patterns, including high spiking, low spiking, high ISI between spikes, low ISI between spikes, and low- and high-amplitude Ca2+ responses.
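As an illustration of the three features summarized in Supplementary Fig. S6, the following minimal Python sketch counts peaks, measures the maximum amplitude and computes the ISI for a single trace. The function name `spike_features`, the amplitude threshold, the sampling interval `dt` and the toy trace are assumptions made for this example; they are not taken from the analysis code used in this study.

```python
def spike_features(trace, dt=1.0, threshold=0.5):
    """Return (number of peaks, maximum amplitude, inter-spike intervals)."""
    baseline = min(trace)
    # A peak is a sample that exceeds baseline + threshold, is strictly higher
    # than its left neighbour and at least as high as its right neighbour.
    peaks = [
        i for i in range(1, len(trace) - 1)
        if trace[i] - baseline > threshold
        and trace[i] > trace[i - 1]
        and trace[i] >= trace[i + 1]
    ]
    max_amplitude = max(trace) - baseline
    # ISI: time between consecutive peaks, in the units of dt.
    isi = [(b - a) * dt for a, b in zip(peaks, peaks[1:])]
    return len(peaks), max_amplitude, isi


# Hypothetical Fluo-4 trace (arbitrary units), sampled every 2 s.
trace = [0.0, 0.1, 1.0, 0.2, 0.1, 0.9, 0.1, 0.0, 1.2, 0.3]
n_peaks, amplitude, isi = spike_features(trace, dt=2.0)
print(n_peaks, amplitude, isi)
```

Such hand-crafted features are easy to compute per cell, but, as noted above, they do not by themselves give the relative share of each spiking pattern in the population.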
Dimension reduction framework for 2D
visualization of complex Ca2+ oscillation patterns
To examine the overall shape of the multi-dimensional dataset, we implemented three dimension reduction methods: PCA, t-SNE and UMAP (Supplementary Fig. S7). The dataset contains the time-series information obtained from multiple videos recorded at various drug doses. For visualization, each method was used to reduce the data to 2D. Supplementary Figure S7 compares the PCA, t-SNE and UMAP visualizations of the dataset containing single-cell Ca2+ dynamics for low and high packing density regions. Each color in the 2D plot marks the data points from one condition (low-dose high-packing density, low-dose low-packing density, medium-dose high-packing density, medium-dose low-packing density, high-dose high-packing density, and high-dose low-packing density). The result shows that various types of spiking patterns are present across the conditions. In the PCA scatter plot the different spiking patterns overlap, although distinct spiking patterns should be readily distinguishable from one another. In contrast, the t-SNE and UMAP scatter plots show separate clusters, indicating that the non-linear dimensionality reduction methods were able to segregate the various spiking patterns (Supplementary Fig. S7). Points from each condition also remain in a close neighborhood in the 2D visualizations from both t-SNE and UMAP, whereas PCA shows more intermixing, indicating that linear dimension reduction may not be well suited for clustering. This analysis demonstrates that the critical differences between the data can be captured through the t-SNE or UMAP representation (Supplementary Fig. S7). Figure 4a
Figure 3. Fluorescent time-lapse images of Fluo-4 loaded HeLa cells excited at 480 nm and spatial intensity mapping of Fluo-4 showing the distribution of the Ca2+ response in the HeLa cell population in the presence of norepinephrine (a2-adrenergic receptor agonist). Representative time-lapse images were collected from the videos captured using a 63X oil objective in spinning-disk confocal microscopy for 600 s. (a) High packing density fluorescent images at 100 μM and corresponding spatial intensity map; (b) low packing density fluorescent images at 100 μM and corresponding spatial intensity map. The time course of Fluo-4 intensity obtained from single cells treated with norepinephrine at 1, 10 and 100 μM shows cell-to-cell variability in the Ca2+ response. For each case, five representative fluorescence traces from individual cells are shown. (c), (d) and (e) show the time course of Ca2+ for low-packing density cells at the three doses, and (f), (g) and (h) show the time course of Ca2+ for high-packing density cells at the three doses.
8|Integrative Biology, 2023
Figure 4. Dimension reduction and cluster number distribution across 1000 runs of t-SNE and UMAP. Selection of the optimal cluster number was performed using the SSDD index. (a) Dimensionality reduction using UMAP of Ca2+ responses from high packing (left) and low packing (right) density of cells for various doses. (b) and (c) Cluster number distribution for t-SNE and UMAP with DBSCAN. The cluster number is chosen as the value with the highest reproducibility (mode): Mode(t-SNE+DBSCAN) = 12, Mode(UMAP+DBSCAN) = 10. (d) and (e) Cluster number distribution for t-SNE and UMAP with the HDBSCAN clustering algorithm: Mode(t-SNE+HDBSCAN) = 13, Mode(UMAP+HDBSCAN) = 17. (f) Box plot representation of SSDD values for each structure obtained after dimension reduction, for the cluster numbers chosen from b, c, d and e (t-SNE+DBSCAN, k=12; UMAP+DBSCAN, k=10; t-SNE+HDBSCAN, k=13; UMAP+HDBSCAN, k=17, where k is the cluster number).
shows the UMAP representation of Ca2+ spiking patterns for different structural arrangements of cells and various drug dose levels.
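For readers who wish to reproduce the linear baseline in this comparison, a minimal sketch of a 2D PCA projection is given below, assuming NumPy is available. It covers only PCA (computed through a singular value decomposition); the t-SNE and UMAP projections come from their respective libraries and are not reproduced here. The synthetic traces, the function name `pca_2d` and the trace length of 339 samples (chosen to match the 2/339 reduction ratio quoted in the text) are assumptions for illustration.

```python
import numpy as np


def pca_2d(traces):
    """Project the rows of `traces` (cells x time points) onto the first two principal components."""
    X = np.asarray(traces, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre every time point
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # (n_cells, 2) embedding


# Synthetic stand-in for single-cell traces: 10 slow and 10 fast oscillators.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 600.0, 339)
slow = np.sin(2 * np.pi * t / 200.0)
fast = np.sin(2 * np.pi * t / 50.0)
traces = np.vstack(
    [slow + 0.1 * rng.standard_normal(t.size) for _ in range(10)]
    + [fast + 0.1 * rng.standard_normal(t.size) for _ in range(10)]
)
embedding = pca_2d(traces)
print(embedding.shape)
```

Because each principal component is a fixed linear combination of time points, traces that differ mainly in spike timing can remain intermixed in such a projection, which is consistent with the overlap observed for PCA above.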
Multiple run analysis for t-SNE and UMAP and
selection of cluster number for DBSCAN and
HDBSCAN
Since multiple runs of t-SNE and UMAP yielded significant variation in the 2D structures, we used random initial conditions for the 2D projection when searching for the optimal cluster number. We obtained the reduced dimension by running t-SNE and UMAP 1000 times with different initial conditions, followed by clustering using various methods, including k-means, agglomerative, DBSCAN and HDBSCAN.
Figure 4b and c show the distribution of cluster numbers obtained from each run of DBSCAN, with constant Eps and MinPts, on the t-SNE and UMAP embeddings. For the DBSCAN analysis, we recorded the cluster number for each run and selected the optimal cluster number as the mode over the 1000 runs. Furthermore, we calculated the SSDD index for the runs at this mode to identify the 2D structure with the best clustering performance (Fig. 4f). The optimal cluster number for t-SNE and UMAP embedded data with DBSCAN was found to be 12 and 10, respectively.
Figure 5. Comparison of different combinations of dimension reduction and clustering algorithms on the Ca2+ spiking dataset containing data for three drug doses and two packing densities. (A) t-SNE projection with DBSCAN clustering (k=12) and HDBSCAN clustering (k=13). (B) UMAP projection with DBSCAN clustering (k=10) and HDBSCAN clustering (k=17). Each color represents a different cluster in each of the four cases.
In order to choose the cluster number and filter out an appropriate result from the numerous clustering results obtained from HDBSCAN, we used three strategies. We selected the cluster number such that it resides within the stable range of the elbow structure (Supplementary Fig. S3d), has the maximum frequency over the multiple runs of UMAP, and gives the minimum clustering performance index (SSDD). First, we identified the range of possible cluster numbers near the elbow structure [43] by determining the relationship between minimum cluster size and cluster number. Since the major tuning parameter of HDBSCAN is the minimum cluster size, or ‘min_cluster_size’, we considered all integer ‘min_cluster_size’ values in the range [2–100] [34]. The trend of the estimated cluster number against every ‘min_cluster_size’ value is shown in Supplementary Fig. S3, and the trend fitted an exponential decay. For ‘min_cluster_size’ values from 9 to 14, the cluster number lies between 15 and 18. Although the cluster number remained stable for ‘min_cluster_size’ = 15, it was as low as four (a lower cluster number leads to intermixing). All other tuning parameters of HDBSCAN were left at their defaults. The second strategy was to choose a cluster number that is obtained with high reproducibility using UMAP and t-SNE. To this end, we obtained the cluster number for each of 1000 runs of t-SNE and HDBSCAN (Fig. 4d). Similarly, 1000 runs of UMAP and HDBSCAN were performed, and 279 of the 1000 runs yielded 17 clusters (Fig. 4e).
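The first strategy can be sketched as a search over a parameter sweep. The snippet below assumes the sweep has already been computed and stored as a mapping from ‘min_cluster_size’ to the resulting cluster number; it then reports the longest run of consecutive ‘min_cluster_size’ values whose cluster number stays inside a chosen band. The function name `stable_range` and the sweep values are hypothetical: they were chosen only to mimic the decaying trend described above and are not the values of Supplementary Fig. S3.

```python
def stable_range(sweep, low, high):
    """Longest run of consecutive parameter values whose cluster number lies in [low, high]."""
    best, current = [], []
    for size in sorted(sweep):
        in_band = low <= sweep[size] <= high
        if in_band and current and size == current[-1] + 1:
            current.append(size)          # extend the current run
        elif in_band:
            current = [size]              # start a new run
        else:
            current = []                  # run broken
        if len(current) > len(best):
            best = list(current)
    return (best[0], best[-1]) if best else None


# Hypothetical sweep: min_cluster_size -> number of clusters found.
sweep = {5: 40, 6: 33, 7: 27, 8: 22, 9: 18, 10: 17, 11: 17,
         12: 16, 13: 15, 14: 15, 15: 4, 16: 4}
print(stable_range(sweep, low=15, high=18))
```

The band limits are themselves a user choice; in this work they follow from the elbow of the fitted decay curve.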
Furthermore, for these 279 runs, the SSDD index was calculated to identify the 2D structure with the best clustering performance. Figure 4f shows the boxplot representation of the SSDD values for the 17 clusters obtained from HDBSCAN. Specifically, the cluster number was set to 17 in this case, since it lies well within the optimal region of cluster numbers in the elbow structure (Supplementary Fig. S3). One of the major benefits of choosing a cluster number with high reproducibility is reliable performance in reducing the dimension of the original dataset, so that the distribution of the dataset can be observed in 2D space. These results show that the proposed scheme achieves better reproducibility in selecting the embedded structure and cluster number.
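The second and third strategies (most frequent cluster number, then minimum SSDD) can be combined as in the sketch below. It assumes that each run has already been summarized as a record holding its random seed, the number of clusters found and its SSDD value; the record layout, the function name `select_embedding` and the numbers are hypothetical and serve only to show the selection logic.

```python
from collections import Counter


def select_embedding(runs):
    """Keep the runs at the modal cluster number and return the one with the lowest SSDD."""
    mode = Counter(r["n_clusters"] for r in runs).most_common(1)[0][0]
    candidates = [r for r in runs if r["n_clusters"] == mode]
    return min(candidates, key=lambda r: r["ssdd"])


# Hypothetical summaries of six projection + clustering runs.
runs = [
    {"seed": 0, "n_clusters": 17, "ssdd": 0.330},
    {"seed": 1, "n_clusters": 4, "ssdd": 0.200},
    {"seed": 2, "n_clusters": 17, "ssdd": 0.319},
    {"seed": 3, "n_clusters": 16, "ssdd": 0.350},
    {"seed": 4, "n_clusters": 17, "ssdd": 0.340},
    {"seed": 5, "n_clusters": 25, "ssdd": 0.300},
]
best = select_embedding(runs)
print(best)
```

Note that the run with the globally lowest SSDD (seed 1 in this toy example) is not selected, because its cluster number is not the reproducible one.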
UMAP-assisted HDBSCAN modeling of collective
Ca2+ responses improves the clustering efficiency
over existing methods: comparison of efficiency
across various techniques
To evaluate the performance of t-SNE and UMAP, we used an unsupervised learning framework in which the dimension-reduced data were clustered using k-means and agglomerative clustering (Supplementary Fig. S8 and Supplementary Fig. S9a and b) and DBSCAN and HDBSCAN clustering (Fig. 5a and b). Figure 5a and b show the visualization performance of t-SNE-assisted and UMAP-assisted clustering of the Ca2+ spiking dataset. For each case, the dataset was reduced to two dimensions, and a specific color was assigned to each cluster. Since the ground truth for the dataset was not known, the number of clusters was chosen based on validation indices.
Next, we compared the various clustering techniques applied after t-SNE and UMAP dimension reduction, using different internal and external validation techniques, to determine which set of clusters best approximates the underlying subgroups in the dataset. Figure 6a and b and Supplementary Fig. S10 show heatmaps of the Pearson correlation coefficient between pairwise responses in each cluster for the four methods. The results show that t-SNE and UMAP followed by HDBSCAN perform well in separating similar responses into each cluster. The red color in Fig. 6a and b indicates that responses are strongly correlated, and the dark blue color shows that responses are weakly correlated. Although the reduction ratio is 2/339, the correlation analysis indicates that the time series data within each cluster are similar. For the t-SNE and UMAP embeddings clustered with DBSCAN, one large cluster contains both noisy data and single-spiking data (Fig. 6a and b).
Figure 6. Assessment of the proposed framework using correlation analysis and similarity analysis. Heatmap representation of the pairwise Pearson correlation coefficient between two cells present within each cluster: (A) t-SNE projection followed by DBSCAN clustering (k=12) and HDBSCAN clustering (k=13); (B) UMAP projection followed by DBSCAN clustering (k=10) and HDBSCAN clustering (k=17). Similarity ratio analysis between any pair of cells (cell i and cell j): each entry in the heatmap shows the similarity ratio of the cell pair (i, j), with i = 1, 2, ..., n and j = 1, 2, ..., n, obtained from 1000 runs, for (C) UMAP+HDBSCAN and (D) t-SNE+HDBSCAN.
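The within-cluster correlation check described above can be sketched as follows, assuming NumPy is available. For each cluster label the code averages the Pearson correlation coefficient over all pairs of member traces. The function name `mean_within_cluster_correlation` and the four toy traces are assumptions for this example; the heatmaps in Fig. 6a and b show the full pairwise matrices rather than this single summary value.

```python
import numpy as np


def mean_within_cluster_correlation(traces, labels):
    """Mean pairwise Pearson r between member traces, for every cluster with at least two members."""
    X = np.asarray(traces, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for k in np.unique(labels):
        members = X[labels == k]
        if len(members) < 2:
            continue                                   # correlation needs a pair
        r = np.corrcoef(members)                       # rows are treated as variables
        upper = r[np.triu_indices(len(members), k=1)]  # each pair counted once
        result[int(k)] = float(upper.mean())
    return result


# Toy traces over one full period: cluster 0 is homogeneous, cluster 1 is mixed.
t = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
traces = [np.sin(t), 2 * np.sin(t) + 1, np.sin(t), np.cos(t)]
labels = [0, 0, 1, 1]
summary = mean_within_cluster_correlation(traces, labels)
print(summary)
```

A value close to 1 indicates that a cluster groups similar responses, whereas a value close to 0 signals intermixing of different patterns.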
Table 1 shows the performance evaluation of the various clustering algorithms assisted by t-SNE and UMAP using different cluster validation indices. Although UMAP with k-means has the lowest DB index, noise separation was not efficient in this case (Supplementary Fig. S10). Furthermore, k-means leads to intermixing of various types within the same cluster, as shown in Supplementary Fig. S10. This happens because these indices are not appropriate for evaluating irregularly shaped clusters [32]. Hence, we performed the comparison based on SSDD computation, as shown in Supplementary Fig. S4. Notably, UMAP and HDBSCAN performed best in separating different types of Ca2+ spiking responses based on SSDD. The SSDD value for t-SNE and HDBSCAN is very close to that for UMAP and HDBSCAN (0.321 versus 0.319 in Table 1). Hence, t-SNE can also be used to obtain an efficient separation. However, the run time for t-SNE was found to be significantly higher than that for UMAP (Supplementary Fig. S11). Also, when followed by HDBSCAN, UMAP shows a lower run time than t-SNE. Hence, it can be concluded that UMAP and HDBSCAN can be used as the optimal method for the given dataset of Ca2+ responses containing noise.
Table 1. Comparison of various clustering methods using different validation indices.
Case / Method         SSDD   CVD      CNN      WB     SIL    CH        DB     SDBW
t-SNE Agglomerative   0.416  21.338   96.088   0.588  0.597  1408.702  0.630  0.038
t-SNE k-means         0.458  15.437   93.645   0.645  0.663  1302.229  0.636  0.058
t-SNE DBSCAN          0.331  4.015    130.226  3.274  0.064  247.878   5.814  0.126
t-SNE HDBSCAN         0.321  7.621    113.503  8.894  0.050  90.505    1.177  0.084
UMAP Agglomerative    0.430  88.146   20.402   0.329  0.702  2432.220  0.474  0.020
UMAP k-means          0.436  100.892  20.327   0.283  0.769  2819.391  0.470  0.018
UMAP DBSCAN           0.323  1.464    21.532   1.186  0.117  698.894   3.612  0.051
UMAP HDBSCAN          0.319  44.257   25.919   1.358  0.517  578.176   0.963  0.037
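To make the limitation of centroid-based indices concrete, the sketch below computes the Davies–Bouldin (DB) index in its standard form, assuming NumPy is available: for each cluster, the worst ratio of summed within-cluster scatter to centroid separation is taken, and these worst ratios are averaged (lower is better). The function name `davies_bouldin` and the four toy points are assumptions for this example, and the code is not claimed to be the implementation behind Table 1.

```python
import numpy as np


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of max_j (s_i + s_j) / d_ij."""
    X = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # s_i: mean distance of the members of cluster i to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


# Two compact, well separated toy clusters in a 2D embedding.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
db = davies_bouldin(points, labels)
print(db)
```

Because every term depends only on centroids and mean distances to them, the index rewards compact, roughly spherical groups, which is why it can rank k-means highly even when elongated or irregular spiking clusters are intermixed.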
Testing of reproducibility for t-SNE and UMAP
In order to assess whether the same cells are assigned to the same clusters across the 279 runs of UMAP and HDBSCAN (corresponding to cluster number = 17), we recorded whether a pair of cells (i, j) fell in the same cluster in a given run. Then, for each such pair of cell responses, we calculated the fraction of runs in which the two responses were in the same cluster. This ratio was denoted as the similarity ratio [33]. We constructed the similarity ratio matrix using the data from the multiple-run analysis of t-SNE and UMAP with HDBSCAN (Fig. 1). Figure 6c and d show the heatmaps of similarity ratios for HDBSCAN clustering of the 2D projections obtained by UMAP and t-SNE. They show whether similar responses fall into the same cluster across runs. A high similarity ratio indicates that two responses were consistently grouped into the same cluster, and a low ratio indicates that the two responses were grouped into different clusters. Each value in the heatmap is the similarity ratio of the corresponding column response and row response (cell i and cell j). The red color shows a higher similarity ratio; the dark blue color indicates a low similarity ratio. The results show that 17 groups can be identified within which the pairwise similarity ratios are quite high. The similarity ratio results also show that UMAP and HDBSCAN form the more robust method for this dataset. Supplementary Figure S12 shows the heatmap representation of the Ca2+ spiking pattern in each cluster, with cluster labels, for HDBSCAN clustering with t-SNE- and UMAP-assisted dimension reduction.
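The similarity ratio defined above can be computed directly from the label vectors of the individual runs, as in the following sketch (NumPy assumed). Only co-membership matters, so the arbitrary numbering of clusters in each run does not need to be aligned. Treating points labeled -1 as noise that is never counted as co-clustered is an assumption of this example, as are the function name `similarity_ratio` and the toy label matrix.

```python
import numpy as np


def similarity_ratio(label_runs, noise_label=-1):
    """Fraction of runs in which each pair of cells shares a (non-noise) cluster."""
    L = np.asarray(label_runs)                 # shape (n_runs, n_cells)
    n_runs, n_cells = L.shape
    together = np.zeros((n_cells, n_cells))
    for labels in L:
        same = labels[:, None] == labels[None, :]
        clustered = labels != noise_label
        # Assumption: a noise point is not co-clustered with anything.
        same &= clustered[:, None] & clustered[None, :]
        together += same
    return together / n_runs


# Three hypothetical runs over four cells; label values are arbitrary per run.
label_runs = [
    [0, 0, 1, 1],
    [1, 1, 0, -1],
    [0, 1, 1, 1],
]
S = similarity_ratio(label_runs)
print(np.round(S, 2))
```

Sorting the rows and columns of `S` by the final cluster labels gives a block structure of the kind shown in Fig. 6c and d when the clustering is reproducible.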
Automated identification of Ca2+ spiking patterns
for diverse structural arrangement by HDBSCAN
The final step is visualizing the spiking patterns of the datasets corresponding to diverse structural arrangements through the labels of the 17 clusters (noise + 16 clusters) obtained with the HDBSCAN algorithm (Fig. 7a). We found that 17 is the most frequent cluster number among the UMAP and HDBSCAN structures from 1000 runs (see Fig. 4). Figure 7a shows the 16 spiking patterns, one from each cluster. Each cluster represents a distinct type of spiking pattern present in the dataset and can be described as follows: clusters 4, 5, 8, 9, 15 and 17 have a high number of peaks with low, medium and high amplitude, and clusters 7, 13, 14, 11 and 12 have a low number of peaks with low, medium and high amplitude (Supplementary Table S4). Cluster 2 shows a plateau-like spiking pattern; clusters 4 and 7 show a spiking pattern of higher amplitude with distinct ISI. Similarly, clusters 6 and 10 show a spiking pattern of low amplitude with a distinct ISI pattern. Supplementary Table S4 shows that UMAP and HDBSCAN can be used to segregate patterns with various amplitudes and ISIs. The noise (cluster 1) separated by HDBSCAN is shown in Supplementary Fig. S13. Figure 7b shows the mapping back of the spiking patterns from clusters 4, 5 and 7, which correspond to specific structural arrangements in the cases of higher packing density. In contrast, the spiking patterns from clusters 10 and 11 correspond to low packing density. We found significant separation between the various structural arrangements. These observations collectively strengthen the link between cell-to-cell connectivity and duration of oscillation.
Imaging and machine learning framework
indicates the existence of distinct clusters with
statistical similarity between single cell
responses
Furthermore, we compared the statistical similarity between the Ca2+ spiking responses within a cluster for the UMAP and HDBSCAN case. To do this, we first fitted the spiking data with different probability distributions, including the Normal, Gamma, Exponential, Log-Normal, Weibull and Birnbaum–Saunders (BS) distributions. We then used the AIC to select the best-fitting distribution for the data. Supplementary Figure S16 shows the heatmap of AIC values of the various distributions fitted to the spiking data from each cluster. The results show that, in most of the clusters, all the Ca2+ responses follow a single distribution, the lognormal. However, in clusters 1, 16 and 17, the spiking pattern of a few responses follows the normal distribution, whereas, for clusters 1, 2, 8, 15, 16 and 17, the data follow the BS distribution (Supplementary Fig. S17 and Supplementary Table S2). Furthermore, we compared the parameters mean (μ) and standard deviation (σ) of the lognormal distribution for each cluster. Figure 8a and b show the boxplot representation of the mean (μ) and standard deviation (σ) for the 17 clusters obtained from UMAP and HDBSCAN. Next, we performed Kruskal–Wallis analysis for pairwise comparisons of the mean (μ) and standard deviation (σ) between clusters. Figure 8c and d show the heatmaps of the p_ij values obtained from the Kruskal–Wallis analysis, where p_ij denotes the P-value (with α = 0.05) for the comparison of the ith cluster with the jth cluster (i = 1, 2, ..., 17; j = 1, 2, ..., 17). The results show that, for most pairs of clusters, the mean (μ) and standard deviation (σ) are significantly different. However, for a few pairs, including clusters (4, 5), (4, 7), (9, 14), (8, 2) and (7, 15), there is no significant difference in both the mean (μ) and standard deviation (σ) obtained by fitting the lognormal distribution (Fig. 8c and d). For clusters 2, 7 and 15, however, we found that some of the data are best fitted with the BS distribution (Supplementary Fig. S17), and hence they are separated into distinct clusters. Moreover, clusters 4 and 5 differ with respect to ISI, whereas clusters 4 and 7 differ with respect to amplitude. Similarly, both (9, 14) and (7, 15) differ with respect to the pattern in ISI. Note that, although clusters 8 and 2 have similar mean (μ) and standard deviation (σ) from the lognormal distribution, the pattern of the Ca2+ response is distinctly different for cluster 2. Thus, the data are automatically grouped into clusters that differ with respect to statistical similarity, ISI and amplitude. These observations show that each cluster may attain oscillation characteristics specific to certain biophysical parameters associated with it. Such parameters may include the kinetic parameters controlling Ca2+ channel function, gap junction mediated diffusion and the process of receptor desensitization. Additionally, the transition dynamics within a cluster may follow a parameter coming from the same distribution.
Figure 7. Mapping of Ca2+ responses corresponding to specific structural arrangements of cells from UMAP and HDBSCAN. (a) Time course of Fluo-4 intensity obtained from single cells present in each cluster (k = 2, 3, ..., 17). (b) Significance of a few clusters in terms of structural arrangement. Clusters 4, 5 and 7 are mainly present in high packing density regions, whereas clusters 10 and 11 are mainly from lower packing density regions.
Figure 8. Box plot representation of the parameters mean (μ) and standard deviation (σ) obtained by fitting the lognormal distribution to the Ca2+ response from each cell in the 17 clusters obtained by UMAP and HDBSCAN. (A) Comparison of mean (μ) across 17 clusters and (B) comparison of standard deviation (σ) across 17 clusters. Heatmap representation of P-values (p_ij) obtained from pairwise Kruskal–Wallis analysis between the ith and jth clusters (i = 1, 2, 3, ..., 17; j = 1, 2, 3, ..., 17). (C) P-value heatmap for mean (μ). (D) P-value heatmap for standard deviation (σ). White color represents P<0.05 (significant difference between the mean (μ) and standard deviation (σ) of the ith and jth clusters); red color shows P>0.05 (no significant difference between clusters).
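The AIC-based model selection can be illustrated for two of the candidate distributions, normal and lognormal, for which the maximum-likelihood estimates have a closed form. The sketch below uses AIC = 2k − 2 ln L with k = 2 fitted parameters and returns, for the lognormal case, the parameters μ and σ compared in Fig. 8. The function names and the sample values are assumptions for this example; the remaining distributions (Gamma, Exponential, Weibull and BS) would require numerical fitting and are not covered.

```python
import math


def aic_normal(x):
    """AIC of a normal distribution fitted by maximum likelihood."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * 2 - 2 * loglik


def fit_lognormal(x):
    """Return (mu, sigma, AIC) of a lognormal fitted by maximum likelihood; x must be positive."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    # The lognormal density carries an extra 1/x factor, hence the -sum(logs) term.
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1) - sum(logs)
    return mu, math.sqrt(var), 2 * 2 - 2 * loglik


# Hypothetical positive, right-skewed sample standing in for the data of one cell.
sample = [math.exp(v) for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
mu, sigma, aic_ln = fit_lognormal(sample)
aic_n = aic_normal(sample)
print(round(mu, 3), round(sigma, 3), round(aic_ln, 2), round(aic_n, 2))
```

The distribution with the lower AIC is preferred; for this skewed sample that is the lognormal.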
Validation of the machine learning method with
labeled data
Furthermore, we validated the proposed method using a labeled time series dataset (Mallet dataset [49], Supplementary Fig. S18). The proposed method is able to segregate the given dataset with more than 94% accuracy for 5 classes out of 8 (Supplementary Table S5); the corresponding time course representation is shown in Supplementary Fig. S19.
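One common way to score a clustering against known labels is sketched below. How the accuracy in Supplementary Table S5 was computed is not specified here, so the majority-vote mapping from clusters to classes used in this example is an assumption, as are the function name `per_class_accuracy` and the toy labels.

```python
from collections import Counter, defaultdict


def per_class_accuracy(true_labels, cluster_labels):
    """Map each cluster to its majority class, then report the fraction of each class recovered."""
    members = defaultdict(list)
    for t, c in zip(true_labels, cluster_labels):
        members[c].append(t)
    cluster_to_class = {
        c: Counter(ts).most_common(1)[0][0] for c, ts in members.items()
    }
    total = Counter(true_labels)
    correct = Counter(
        t for t, c in zip(true_labels, cluster_labels) if cluster_to_class[c] == t
    )
    return {cls: correct[cls] / total[cls] for cls in total}


# Toy example: six labeled series assigned to three clusters.
true_labels = ["a", "a", "a", "b", "b", "c"]
cluster_labels = [0, 0, 1, 1, 1, 2]
accuracy = per_class_accuracy(true_labels, cluster_labels)
print(accuracy)
```

Reporting the score per class, rather than a single overall value, makes it visible when only some classes are recovered well.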
Two-step machine learning approach indicates a
significant increase in the subpopulation
containing longer duration of oscillation in
higher packing density region
This algorithm was then used to compare the response patterns in cell populations that were exposed to various drug doses, for the various structural arrangements that arise from the natural self-assembly process after cell seeding. Here, we chose to investigate the effect of structural arrangement between cells through low and high packing density regions of the petri dishes. Figure 9a shows that the percentages of clusters 5 and 7 are significantly higher for cell responses from the high-density region. On the other hand, the percentages of clusters 10 and 11 are significantly lower in the high packing density region. Figure 9b shows that clusters 8 and 9 are significantly higher at the highest dose. Figure 9c and d show the heatmap representation of the signature patterns for higher packing density (clusters 5 and 7) and higher dose (clusters 8 and 9). The signatures of low packing density and low dose responses are also shown in Fig. 9c and d. Overall, the analysis revealed a significant increase in the subpopulation containing a longer duration of oscillation at higher packing density.
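The relative percentages compared in Fig. 9a and b amount to counting, for each condition, how many cells fall into each cluster. A minimal sketch is given below; the function name `cluster_percentages` and the toy labels are assumptions for this example.

```python
from collections import Counter


def cluster_percentages(cluster_labels, conditions):
    """Percentage of cells of each condition that fall into each cluster."""
    totals = Counter(conditions)
    pairs = Counter(zip(conditions, cluster_labels))
    return {
        (cond, k): 100.0 * n / totals[cond] for (cond, k), n in pairs.items()
    }


# Hypothetical cluster labels for eight cells from two packing densities.
cluster_labels = [5, 5, 7, 10, 10, 11, 5, 10]
conditions = ["high", "high", "high", "high", "low", "low", "low", "low"]
percentages = cluster_percentages(cluster_labels, conditions)
print(percentages)
```

Normalizing within each condition keeps the comparison fair when the numbers of imaged cells differ between low and high packing density regions.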
Further, we evaluated the trajectory features of cells from clusters 5 and 10 (Fig. 9e and f). The results reveal a clear distinction between the features from low (cluster 10) and high (cluster 5) packing density. We also used a pie chart representation to show the relative distribution of the various spiking patterns for low and high packing density at 100 μM (Supplementary Fig. S20). This shows the relative percentage of the various clusters present at lower and higher packing densities. Supplementary Table S3 shows that cases with lower and higher cell-to-cell connectivity yield distinct relative distributions of clusters. The low packing density, low dose case mainly contains clusters 1 and 16, with lower amplitude and frequency. In contrast, the medium and high doses at higher packing density contain clusters 4 and 7, which include spiking with higher frequency and amplitude. Additionally, the algorithm is able to distinguish cluster 2, which contains cells with a plateau-like pattern present only at high dose and high packing density. This result shows that the proposed pipeline is able to quantify the relative abundance of clusters of complex oscillatory patterns and can be used to identify the biophysical parameters corresponding to various levels of structural arrangement and cell-to-cell connectivity. These observations strengthen the link between stronger communication between cells and sustained oscillation.
Figure 9. Automated detection of Ca2+ spiking features for various structural arrangements (packing densities) and drug doses, and identification of the relative distribution of clusters corresponding to these factors. (A) Bar graph of average relative percentages of four clusters (5, 7, 10 and 11) for low and high packing density. (B) Bar graph of average relative percentages of four clusters (8, 9, 1 and 16) for three doses (1, 10 and 100 μM). (C) Heatmap representing the time course of the Ca2+ spiking pattern from the four clusters (5, 7, 10 and 11) in (A). (D) Heatmap representing the time course of the Ca2+ spiking pattern from the four clusters (8, 9, 1 and 16) in (B). (E) Example of single-cell Ca2+ data with representative features: the basal value (F0), time to reach half maximum (T50U), time to reach maximum (Tm), maximum value (Fm), time to decay to half maximum (T50D) and steady-state final value (Ff). (F) Bar graph of time-series features of Ca2+ signals in the cell population from high packing density (cluster 5) and low packing density (cluster 10).
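The trajectory features named in Fig. 9e (F0, T50U, Tm, Fm, T50D and Ff) can be extracted from a single trace as sketched below. Several details are assumptions of this example rather than definitions taken from this study: the basal and final values are averaged over the first and last three samples, half maximum is taken midway between F0 and Fm, and all times are measured from the start of the recording.

```python
def trajectory_features(trace, dt=1.0, n_baseline=3, n_final=3):
    """Return F0, T50U, Tm, Fm, T50D and Ff for one trace (times in units of dt)."""
    f0 = sum(trace[:n_baseline]) / n_baseline      # basal value
    ff = sum(trace[-n_final:]) / n_final           # steady-state final value
    fm = max(trace)                                # maximum value
    i_max = trace.index(fm)
    half = f0 + 0.5 * (fm - f0)
    # First sample at or above half maximum on the rising side.
    i_up = next(i for i in range(i_max + 1) if trace[i] >= half)
    # First sample at or below half maximum after the peak, if the trace decays that far.
    i_down = next((i for i in range(i_max, len(trace)) if trace[i] <= half), None)
    return {
        "F0": f0,
        "T50U": i_up * dt,
        "Tm": i_max * dt,
        "Fm": fm,
        "T50D": None if i_down is None else i_down * dt,
        "Ff": ff,
    }


# Hypothetical single-transient trace sampled every 2 s.
trace = [1.0, 1.0, 1.0, 2.0, 3.0, 5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0]
features = trajectory_features(trace, dt=2.0)
print(features)
```

For sustained oscillations the trace may never fall back to half maximum within the recording, in which case `T50D` is returned as `None` rather than guessed.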
Sustained Ca2+ oscillation in higher packing
density: Mechanistic insight using UMAP and
HDBSCAN-assisted model fitting
To obtain mechanistic insight into the sustained oscillation at higher packing density, we performed clustering-based model fitting. We computed the difference in biophysical parameter values between the Ca2+ responses from various clusters obtained from UMAP and HDBSCAN. Specifically, we chose to fit the model presented in Giri et al. [45] to the Ca2+ responses from cluster 5 and cluster 10 (Fig. 9a and f). Cluster 5 consists of responses from high packing density and cluster 10 consists of responses from reduced packing density. First, we found that the receptor desensitization parameter k2, which is regulated by MAP kinase expression [50,51], is significantly lower for cells from higher packing density (cluster 5) than for cells from lower density (cluster 10) (Fig. 10a). On the other hand, the rate of [PLC-β] activation (k3) is significantly higher for higher packing density (cluster 5) than for lower packing density (cluster 10) (Fig. 10b). In contrast, k5, which denotes the rate constant for Ca2+ influx from the extracellular space to the cytosol, is not significantly different between responses arising from higher and lower packing density (Fig. 10c). However, k5 has a much higher variance in cells from higher packing density than in cells from lower density (Fig. 10c). Figure 10d shows a box plot comparison of the parameter controlling the Ca2+ stores (k6, rate of Ca2+ influx from ER to cytosol) in clusters 5 and 10. The result shows that the rate constant for Ca2+ influx from ER to cytosol (k6) is significantly lower (P<0.05) for cells at higher packing density, yielding an increase in store Ca2+ (P<0.05) (Supplementary Fig. S23) compared with cells from low packing density. Hence, it can be concluded that there are two cell states defined by distinct underlying mechanisms, as shown in Fig. 11a and b. The results indicate that the cells from high packing density are associated not only with a higher level of store Ca2+ but also with weak receptor desensitization and higher variability in Ca2+ influx from the extracellular space. This may contribute to the oscillations with longer duration.
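The parameter comparisons between clusters 5 and 10 rely on the Mann–Whitney test (Fig. 10). A self-contained sketch of the two-sided test is given below. It uses the normal approximation without tie or continuity correction, which is a simplification assumed for this example; for small samples an exact test from a statistics library is preferable. The function name `mann_whitney_u` and the parameter values are hypothetical and are not fitted values from this study.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U, p)."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    # Mid-ranks: tied values share the average of the ranks they occupy.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(a), len(b)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided tail probability
    return u, p


# Hypothetical fitted values of one rate constant in two clusters.
k_cluster5 = [0.1, 0.2, 0.3, 0.4, 0.5]
k_cluster10 = [1.1, 1.2, 1.3, 1.4, 1.5]
u, p = mann_whitney_u(k_cluster5, k_cluster10)
print(u, round(p, 4))
```

Being rank-based, the test makes no normality assumption about the fitted parameters, which suits distributions with unequal variance such as that reported for k5.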
Figure 12 shows the box plot representation of features (number of peaks, maximum amplitude and duration of Ca2+ oscillations in single cells) extracted from the responses in clusters 5 and 10 from experiments (Fig. 12a, b and c) and simulation (Fig. 12d, e and f). The findings demonstrate that the estimated parameters for the mathematical model capture the specific features of the Ca2+ oscillations in the experimental measurements for clusters 5 and 10. Also, the parameters obtained from single cells through the fitting process show significant differences between high (cluster 5) and low (cluster 10) packing densities (Fig. 10). We further identified the structural arrangement of cells that gives rise to the particular patterns from clusters 5 and 10 (Fig. 12g and h). Together, we demonstrate that cluster-based model fitting can be used for forming new hypotheses for future experiments and deciphering the mechanism underlying the various Ca2+ spiking patterns obtained from distinct structural arrangements of cells.
DISCUSSIONS
In the field of Ca2+ imaging, there are several computational toolboxes available for automating cell segmentation, motion correction and Ca2+ activity identification from time-lapse videos [52–57]. Most of these workflows use Python or MATLAB frameworks for image processing and signal processing. In contrast, there have been fewer investigations into developing tools for clustering the complex oscillations obtained by treatment with drugs. The major challenge is to develop and validate a framework that is scalable and reliable for identifying cells with similar Ca2+ oscillation patterns. Specifically, the heterogeneity is remarkably high due to uneven distribution of drugs and variability in packing density when a large number of petri dishes or 96-well plates are used.
In this paper, we demonstrate that the proposed tool is able to separate the noise from a larger dataset and group the time series patterns with similar features. Here, we apply UMAP-assisted HDBSCAN clustering to the Ca2+ response dataset from three doses, which displays seventeen distinct clusters.
Furthermore, we provide a new perspective on the effect of specific structural arrangements in controlling the collective Ca2+ response. This was possible through automated detection, by UMAP-assisted HDBSCAN clustering, of Ca2+ signatures that are specific to structural features or packing density. Overall, the results indicate that cells can sense and respond to the drug according to the self-assembly pattern in the cell population via collective Ca2+ signaling. The physical explanation of these clusters could be further analyzed by experimental investigations of gap-junctional activity in high packing density regions [11–13,58]. On the other hand, the features of the Ca2+ signature corresponding to low packing density can be attributed to paracrine signaling [38,59]. The 2D visualization of UMAP-assisted HDBSCAN clustering of the dataset forms distinct clusters with respect to amplitude, ISI and frequency (Supplementary Table S4).
Commonly used techniques include functional clustering, which is time-consuming for larger sample sizes. Although k-means and fuzzy clustering have been implemented along with feature extraction [4,5,17], there can be significant intermixing between patterns. Hence, we show a detailed comparison between various clustering techniques after performing dimension reduction using t-SNE and UMAP (Supplementary Figs S7, 5 and S9). The commonly used cluster validity index [28] works for partitionings in which the clusters are spherically distributed, have similar sizes and densities, and are well separated. To evaluate the relative performance of the clustering of Ca2+ spike trains, which is an unsupervised problem, we have used the SSDD index [32]. This index is based on the computation of the MST across the points within a cluster.
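To illustrate the building block of such an index, the sketch below constructs the MST (taken here to mean minimum spanning tree) of the points of one cluster using Prim's algorithm on Euclidean distances. It does not compute the SSDD index itself, whose definition is given in [32]; the function name `mst_edges` and the four toy points are assumptions for this example.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns a list of (i, j, length)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the point outside the tree that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


# Four points of a hypothetical elongated cluster in a 2D embedding.
cluster_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
edges = mst_edges(cluster_points)
print(edges, sum(length for _, _, length in edges))
```

Because the tree follows the points along the cluster, its edge lengths describe local density even for elongated or curved clusters, where distances to a centroid are less informative.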
The proposed pipeline has the potential to be used for selection
of the best-suited method for any Ca2+ imaging dataset. The
results show that, overall, UMAP performs better than t-SNE for the
dataset investigated in this work. HDBSCAN was also found to
improve on the other clustering algorithms. In previous
bioinformatics studies, UMAP was found to be superior to t-SNE and
PCA [43,60]. However, most of the biological datasets in those
studies are static. One of the major novelties of the proposed
pipeline is to show that the algorithm works efficiently on
oscillatory responses with large variability in spiking patterns.
Moreover, UMAP required less computation time than t-SNE on the
given dataset, as also found by others [24,43,60,61].
16 | Integrative Biology, 2023
Figure 10. Mechanistic insight on cell states from various clusters having distinct Ca2+ oscillation patterns and illustration of the differences in
parameters as manifested in Ca2+ spiking patterns. Distinct parameter distributions underlying cluster 10 (low packing density, treated with 10 μM of
norepinephrine) and cluster 5 (high packing density, treated with 10 μM of norepinephrine) indicate two distinct cell states. Box plot comparison of
four parameters (A) k2: rate constant for receptor desensitization (negative feedback parameter), (B) k3: rate constant for PLC-β activation, (C) k5:
rate constant for Ca2+ influx from extracellular space to cytosol (modeled as a function of [PLC-β]) (Kummer et al. [46]) and (D) k6: rate of Ca2+ influx
from ER to cytosol, for cluster 5 and 10, respectively. The kinetic parameters are obtained from the fit results of Ca2+ responses from cluster 5 and 10.
[P<0.05], Mann–Whitney test.
In previous work, clustering was done based on feature-extracted
data [4,5,17,62], but those methods were based on the measurement
of the number of peaks and amplitude. However, the major bottleneck
in these methods lies in the fact that even though the number of
peaks is the same in various spiking trains, their response
patterns can be distinctly different. Those methods fail to separate
similar patterns into groups (Supplementary Figs S21 and S22).
Through the proposed framework of dimension reduction and
clustering, this paper provides a method for extracting the types of
spiking for various drug doses with distinct structural arrangement
of cells, which can be used as a better visualization tool. The
method enables detection of trajectory features with variations in
amplitude, variation in ISI and damping pattern.
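To make the limitation of peak-count features concrete, the sketch below builds two synthetic spike trains that have the same number of peaks but different inter-spike intervals (ISI). A peak-count feature cannot separate them, whereas an ISI-based description can. The helper names (`find_peaks`, `isi`, `spike_train`), the threshold and the synthetic traces are illustrative assumptions, not the feature extraction used in this work.

```python
def find_peaks(trace, threshold):
    """Indices of strict local maxima whose value exceeds `threshold`."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] > threshold and trace[i - 1] < trace[i] > trace[i + 1]
    ]

def isi(peaks):
    """Inter-spike intervals (in frames) between consecutive peaks."""
    return [b - a for a, b in zip(peaks, peaks[1:])]

def spike_train(length, spike_frames, amplitude=1.0):
    """Synthetic trace: zero baseline with a triangular spike at each given frame."""
    trace = [0.0] * length
    for f in spike_frames:
        trace[f] = amplitude
        trace[f - 1] = max(trace[f - 1], amplitude / 2)
        trace[f + 1] = max(trace[f + 1], amplitude / 2)
    return trace

regular = spike_train(60, [10, 25, 40, 55])  # evenly spaced spikes
bursty = spike_train(60, [10, 14, 18, 55])   # early burst, then a long gap

peaks_regular = find_peaks(regular, threshold=0.75)
peaks_bursty = find_peaks(bursty, threshold=0.75)

print(len(peaks_regular), len(peaks_bursty))  # identical peak counts
print(isi(peaks_regular), isi(peaks_bursty))  # different ISI patterns
```

Working on the full trajectory, as the dimension reduction step does, avoids having to decide in advance which of these hand-crafted features distinguishes the patterns.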
From the clustering results, we found that more than 80% of the
Ca2+ responses from cluster 5 come from high packing density,
whereas the responses from cluster 10 come from lower packing
density. In order to obtain mechanistic insight into the cell
states from cluster 5 and cluster 10, we performed fitting of a
network structure underlying the GPCR-mediated signaling network
[45] (Fig. 11a and b). The analysis provides insight into the factors
that cause sustained oscillation at higher packing density.
The analysis predicts that the packing density leads to differential
intrinsic parameters of the cells, including the rate constants of
various Ca2+ channel activities and the parameter regulating receptor
desensitization.
Figure 11. Schematic diagram of signaling pathways for various cell
states (A) from cluster 10 (low packing density) and (B) from cluster 5
(high packing density).
Although we need to perform further experiments to find out
the cause of the difference in these parameters corresponding to
various packing densities, our analysis agrees with the experimental
results from previous work on HeLa cells seeded at different
densities [14]. That work showed that when the cells were densely
packed, the basal activity of the mitogen-activated protein kinase
(MAPK) and the Ca2+ store content were augmented [14]. Since higher
MAPK expression leads to a higher level of ERK and GRK that inhibits
the receptor desensitization process [50,51], the receptor
desensitization level denoted by k2 might be lower for higher
packing density, as found from the proposed analysis. Similarly, the
cluster-wise parameter estimation predicts an increase in Ca2+
store, as found experimentally [14]. On the other hand, the rate
constant for Ca2+ influx from the extracellular medium may be highly
variable in the case of higher density due to variation in
gap-junction-mediated IP3 diffusion from the adjacent cells through
proteins such as connexin [11,63–65]. This leads to a higher level
of stochasticity in k5 in the case of higher packing density
compared with lower packing density, as found from cluster-wise
model fitting. Without clustering, the estimation of these
parameters remains challenging due to significant variability in
oscillation pattern. The proposed method enables efficient
clustering of complex oscillations using an unsupervised framework
followed by parameter estimation specific to a cluster. Such
analysis provides insight into a particular cell state based on its
structural arrangement and neighborhood properties.
We also extracted features of the Ca2+ responses for two signature
clusters from higher and lower packing density and found that
there is a significant difference in number of peaks, maximum
amplitude, oscillation period and store Ca2+. In this study, we
demonstrate that cluster 5, with higher frequency and amplitude,
implies a higher Ca2+ store content compared with cluster
10 (Supplementary Fig. S23). Hence, HDBSCAN modeling of the Ca2+
spiking dataset can assist in the estimation of parameters specific
to each cluster during construction of a biophysical model [66] that
explains the regulation of oscillation [10,37].
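The cluster-wise feature comparison relies on the Mann–Whitney test. The following is a minimal, standard-library sketch of a two-sided Mann–Whitney U test using midranks and the tie-corrected normal approximation. The function name `mann_whitney_u` and the peak counts are hypothetical and are not data from this study. The normal approximation is rough for small samples, so in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would usually be preferred.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of N(0, 1)
    return u1, p

# Hypothetical number of peaks per cell (made-up values for illustration only)
peaks_cluster_5 = [9, 11, 12, 10, 13, 12, 14, 11]
peaks_cluster_10 = [3, 4, 2, 5, 4, 3, 6, 4]
u_stat, p_value = mann_whitney_u(peaks_cluster_5, peaks_cluster_10)
print(f"U = {u_stat}, p = {p_value:.4g}, significant at 0.05: {p_value < 0.05}")
```

A rank-based test is a reasonable choice here because features such as the number of peaks are discrete and need not be normally distributed within a cluster.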
Our proposed framework can also be implemented for the analysis
of other complex time-course transients in biology that exhibit
complex behavior similar to Ca2+ transients, viz. NF-κB, ERK, etc.
[67] (Supplementary Fig. S24). Similar to the Ca2+ dataset, the
result shows that UMAP provides a superior segregation of patterns
compared with t-SNE for the ERK dataset (Supplementary Fig. S24). In
general, the framework enables grouping of similar responses from a
complex dataset consisting of heterogeneous and complex transients.
Such a dataset may be obtained from various drug doses and packing
densities, which is a typical outcome of most experiments for
biological studies in Petri dishes. Additionally, our current
framework presenting cluster-wise parameter selection can be
used for identifying the cell states and differentiating between
the signaling pathways for various cell states corresponding to
distinct structural arrangements.
Such a clustering framework for the analysis of Ca2+, NF-κB
and ERK transients might also be important in deciphering
the intrinsic and extrinsic noise cluster-wise. Specifically,
parameter estimation in individual clusters may provide insight into
extrinsic noise and cell state. Because the information
transmission capacity is dependent on intrinsic as well as extrinsic
noise [67], the technique may further be useful in cluster-wise
characterization of such capacity. Additionally, it can be relevant
in fine-grained estimation of information transmission capacity
in connection with structural (as well as possible spatial) mapping
of cells. It can be further speculated that cluster-wise
characterization of this capacity may yield insight into the
spatiotemporal heterogeneity and hence high-level functionality.
In this work, the different cell states based on packing density
are interpreted as a case of spatial heterogeneity driven by
deterministic mechanisms. A similar observation depicting temporal
heterogeneity of Ca2+ oscillations was reported in a previous
study [68]. One of the major advantages of such deterministic
algorithms underlying mammalian signaling pathways is that a
specific mechanism underlies the cell states that lead to distinct
cell fates and help cells to take a particular decision [68]. It has
been shown that a higher level of stimulus induces a plateau
profile of Ca2+ dynamics along with sequestration of Ca2+ in
sub-cellular parts that leads to cell blebbing and lytic cell death
[9]. In future, a range of parameter sets can be obtained using an
advanced estimation framework [66] for all the 17 clusters
found by HDBSCAN modeling. The deterministic mechanism
for each cluster may shed light on a range of
cell states and responses specific to structural arrangement
and stimulus level. In particular, the specific properties of the
Ca2+ spiking train may control the norepinephrine-mediated
change in cell contractility level [69], which can be further
investigated.
Figure 12. Comparison of oscillation pattern of cytosolic Ca2+ from experiment and simulation using UMAP and HDBSCAN assisted model fitting.
Boxplot shows the comparison of extracted features for various cell states from experiment and simulation: (A) number of peaks in cluster 5 and 10
from experiment, (B) maximum amplitude in cluster 5 and 10 from experiment, (C) oscillation period in cluster 5 and 10 from experiment, (D)
number of peaks in cluster 5 and 10 from simulation, (E) maximum amplitude in cluster 5 and 10 from simulation and (F) oscillation period in cluster
5 and 10 from simulation. Distinct features of Ca2+ oscillations were found for low and high packing density. Structural arrangement of the cells
regulates the features of the Ca2+ oscillation pattern. (H) and (G) Packing density and cell morphology specific to cluster number 5 and 10, respectively.
[n = number of peaks of current cell, n_max = maximum of number of peaks, n_0 = minimum of number of peaks, a = maximum amplitude of the
current cell, a_max = maximum of maximum amplitudes, a_0 = minimum of maximum amplitudes, t = oscillation time period of current cell,
t_max = maximum of oscillation time periods, t_0 = minimum of oscillation time periods]. [P<0.05], Mann–Whitney test.
The proposed computational tool can be further validated with
a large dataset of cytosolic Ca2+ measured from thousands of
neurons in animal models using genetically encoded sensors and
two-photon microscopy [70]. Furthermore, to achieve fully
automated video analysis, an image processing module for the
segmentation of cells at different time frames can be coupled with the
proposed dimension reduction and clustering. Such a framework
assumes importance in automated analysis of fluorescent images
during high-content drug screening, dose selection and toxicity
mapping.
Acknowledgements
We would like to thank Dr Kishalay Mitra for his valuable
suggestions. We also thank Dr Gautam Narasimhan for allowing us to
conduct the confocal imaging experiments. The authors acknowledge
the research facilities provided by the Indian Institute of
Technology Hyderabad, India and the Ministry of Education for the
fellowship support for Suman Gare.
Funding
This work was supported by DBT (BT/PR16582/BID/7/667/2016) and the
Department of Science and Technology (MSC/2020/000592).
Declaration of competing interest
The authors declare no conflict of interest.
Data availability
The authors confirm that the data supporting the findings of this
study are available within the article and/or its supplementary
materials.
References
1. Martinez NJ, Titus SA, Wagner AK et al. High-throughput
fluorescence imaging approaches for drug discovery using in vitro
and in vivo three-dimensional models. Expert Opin Drug Discov
2015;10:1347–61.
2. Berridge MJ, Lipp P, Bootman MD. The versatility and universal-
ity of calcium signalling. Nat Rev Mol Cell Biol 2000;1:11.
3. Seshadri S, Hoeppner DJ, Tajinda K. Calcium imaging in drug
discovery for psychiatric disorders. Front Psych 2020;11:713.
4. Swain S, Gupta RK, Ratnayake K et al. Confocal imaging and
k-means clustering of GABA(B) and mGluR mediated modulation
of Ca(2+) spiking in hippocampal neurons. ACS Chem Neurosci
2018;9:3094–107.
5. Gupta RK, Swain S, Kankanamge D et al. Comparison of calcium
dynamics and specific features for G protein–coupled receptor–
targeting drugs using live cell imaging and automated analysis.
SLAS Discovery 2017;22:848–58.
6. Nash MS, Young KW, Challiss RAJ et al. Intracellular signalling:
receptor-specific messenger oscillations. Nature 2001;413:381.
7. Bao XR, Fraser IDC, Wall EA et al. Variability in G-protein-
coupled signaling studied with microfluidic devices. Biophys J
2010;99:2414–22.
8. Dhyani V, Gare S, Gupta RK et al. GPCR mediated con-
trol of calcium dynamics: a systems perspective. Cell Signal
2020;74:109717.
9. Manohar K, Gare S, Chel S et al. Quantitative confocal
microscopy for grouping of dose-response data: decipher-
ing calcium sequestration and subsequent cell death in
the presence of excess norepinephrine. SLAS Technol 2021;26:
24726303211019390.
10. Sun J, Hoying JB, Deymier PA et al. Cellular architecture regulates
collective calcium signaling and cell contractility. PLoS Comput
Biol 2016;12:e1004955.
11. Lin GC, Rurangirwa JK, Koval M et al. Gap junctional com-
munication modulates agonist-induced calcium oscillations in
transfected HeLa cells. J Cell Sci 2004;117:881–7.
12. Balaji R, Bielmeier C, Harz H et al. Calcium spikes, waves
and oscillations in a large, patterned epithelial tissue. Sci Rep
2017;7:42786.
13. Potter GD, Byrd TA, Mugler A et al. Communication shapes
sensory response in multicellular networks. Proc Natl Acad Sci
2016;113:10334–9.
14. Morita M, Nakane A, Fujii Y et al. High cell density upregu-
lates calcium oscillation by increasing calcium store content
via basal mitogen-activated protein kinase activity. PLoS One
2015;10:e0137610.
15. Feldt Muldoon S, Soltesz I, Cossart R. Spatially clustered neu-
ronal assemblies comprise the microstructure of synchrony in
chronically epileptic networks. Proc Natl Acad Sci U S A 2013;110:
3567–72.
16. Feldt S, Waddell J, Hetrick VL et al. Functional clustering algo-
rithm for the analysis of dynamic network data. Phys Rev E Stat
Nonlin Soft Matter Phys 2009;79:56104.
17. Pantula PD, Miriyala SS, Mitra K. An evolutionary neuro-fuzzy
C-means clustering technique. Eng Appl Artif Intel 2020;89:103435.
18. Ester M, Kriegel HP, Sander J et al. A density-based algorithm
for discovering clusters in large spatial databases with noise.
In: Proceedings of the Second International Conference on Knowledge
Discovery and Data Mining. KDD’96. United States: AAAI Press,
1996, 226–31.
19. Gare S, Chel S, Kuruba M et al. Dimension reduction and clus-
tering of single cell calcium spiking: comparison of t-SNE and
UMAP. In: 2021 National Conference on Communications (NCC).
United States: IEEE, 2021.
20. Sharafoddini A, Dubin JA, Lee J. Identifying subpopulations of
septic patients: a temporal data-driven approach. Comput Biol
Med 2021;130:1–16.
21. Campello RJGB, Moulavi D, Sander J. Density-Based Cluster-
ing Based on Hierarchical Density Estimates. Berlin, Heidelberg,
Springer, 2013.
22. Chel S, Gare S, Giri L. Detection of Specific Templates in Calcium
Spiking in HeLa Cells Using Hierarchical DBSCAN: Clustering
and Visualization of Cell-Drug Interaction at Multiple Doses. In:
2020 42nd Annual International Conference of the IEEE Engineering
in Medicine & Biology Society (EMBC), Institute of Electrical and
Electronics Engineers Inc., United States, 2020, 2425–8.
23. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach
Learn Res 2008;9:2579–605.
24. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold
Approximation and Projection for Dimension Reduction. Institute of Electrical
and Electronics Engineers Inc., United States, Published online
February 9, 2018.
25. Hotelling H. Analysis of a complex of statistical variables into
principal components. J Educ Psychol 1933;24:417–41.
26. Daniels AL, Calderon CP, Randolph TW. Machine learning and
statistical analyses for extracting and characterizing “finger-
prints” of antibody aggregation at container interfaces from flow
microscopy images. Biotechnol Bioeng 2020;117:3322–35.
27. Saxena A, Ravutla S, Upadhyay V et al. Statistical modeling
of cell-to-cell variability in viral infection during passaging in
suspension cell culture: application in Monte-Carlo simulation.
Biotechnol Bioeng 2020;117:1483–501.
28. Al-jabery KK, Obafemi-Ajayi T, Olbricht GR et al. Evaluation of
cluster validation metrics. In: Computational Learning Approaches
to Data Analytics in Biomedical Applications. United States: Elsevier,
2020, 189–208.
29. Zhao Q, Fränti P. WB-index: a sum-of-squares based index for
cluster validity. Data Knowl Eng 2014;92:77–89.
30. Liu Y, Li Z, Xiong H et al. Understanding and enhancement
of internal clustering validation measures. IEEE Trans Cybern
2013;43:982–94.
31. Moulavi D, Jaskowiak PA, Campello RJGB et al. Density-based
clustering validation. In: Proceedings of the 2014 SIAM International
Conference on Data Mining. United States: Society for Industrial
and Applied Mathematics, 2014, 839–47.
32. Liang S, Han D, Yang Y. Cluster validity index for irregular
clustering results. Appl Soft Comput 2020;95:106583.
33. Greengard P, Liu Y, Steinerberger S et al. Factor clustering with
t-SNE. SSRN Electron J 2020.
34. Vermeulen M, Smith K, Eremin K et al. Application of uni-
form manifold approximation and projection (UMAP) in spectral
imaging of artworks. Spectrochim Acta A Mol Biomol Spectrosc
2021;252:119547.
35. Kaji H, Takoh K, Nishizawa M et al. Intracellular Ca2+ imaging
for micropatterned cardiac myocytes. Biotechnol Bioeng 2003;81:
748–51.
36. Pinto MCX, Tonelli FMP, Vieira ALG et al. Studying complex
system: calcium oscillations as attractor of cell differentiation.
Integr Biol 2016;8:130–48.
37. Petersen AP, Cho N, Lyra-Leite DM et al. Regulation of calcium
dynamics and propagation velocity by tissue microstructure in
engineered strands of cardiac tissue. Integr Biol 2020;12:34–46.
38. Wang N, de Bock M, Decrock E et al. Paracrine signaling through
plasma membrane hemichannels. Biochimica et Biophysica Acta
(BBA) - Biomembranes 2013;1828:35–50.
39. Manghani C, Gupta A, Tripathi V et al. Cardioprotective poten-
tial of curcumin against norepinephrine-induced cell death: a
microscopic study. J Microsc 2017;265:232–44.
40. Levy B, Clere-Jehl R, Legras A et al. Epinephrine versus nore-
pinephrine for cardiogenic shock after acute myocardial infarc-
tion. J Am Coll Cardiol 2018;72:173–82.
41. Maletic V, Eramo A, Gwin K et al. The role of norepinephrine and
its α-adrenergic receptors in the pathophysiology and treatment
of major depressive disorder and schizophrenia: a systematic
review. Front Psych 2017;8:1–12.
42. Venkateswarlu K, Suman G, Dhyani V et al. Three-dimensional
imaging and quantification of real-time cytosolic calcium
oscillations in microglial cells cultured on electrospun matrices
using laser scanning confocal microscopy. Biotechnol Bioeng
2020;e117:1–16.
43. Hozumi Y, Wang R, Yin C et al. UMAP-assisted K-means clus-
tering of large-scale SARS-CoV-2 mutation datasets. Comput Biol
Med 2021;131:104264.
44. Wang L, Chen P, Chen L et al. Ship AIS trajectory clustering: an
HDBSCAN-based approach. J Mar Sci Eng 2021;9:1–20.
45. Giri L, Patel AK, Karunarathne WKA et al. A G-protein subunit
translocation embedded network motif underlies GPCR regula-
tion of calcium oscillations. Biophys J 2014;107:242–54.
46. Kummer U, Olsen LF, Dixon CJ et al. Switching from simple
to complex oscillations in calcium signaling. Biophys J 2000;79:
1188–95.
47. Larsen AZ, Olsen LF, Kummer U. On the encoding and decod-
ing of calcium signals in hepatocytes. Biophys Chem 2004;107:
83–99.
48. Upadhyay V, Teja RS, Dhyani V et al. A model screening
framework for the generation of Ca2+ oscillations in hippocampal
neurons using differential evolution. In: 2019 9th International
IEEE/EMBS Conference on Neural Engineering (NER). United States:
IEEE, 2019, 961–4.
49. Dau HA, Keogh E, Kamgar K et al. The UCR time series
classification archive. UCR Archive 2018. URL
https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
50. Fu X, Koller S, Abd Alla J et al. Inhibition of G-protein-coupled
receptor kinase 2 (GRK2) triggers the growth-promoting
mitogen-activated protein kinase (MAPK) pathway. J Biol Chem
2013;288:7738–55.
51. Elorza A, Penela P, Sarnago S et al. MAPK-dependent degradation
of G protein-coupled receptor kinase 2. J Biol Chem 2003;278:
29164–73.
52. Radstake FDW, Raaijmakers EAL, Luttge R et al. CALIMA: the
semi-automated open-source calcium imaging analyzer. Comput
Methods Programs Biomed 2019;179:104991.
53. Pnevmatikakis EA. Analysis pipelines for calcium imaging data.
Curr Opin Neurobiol 2019;55:15–21.
54. Cantu DA, Wang B, Gongwer MW et al. EZcalcium: open-source
toolbox for analysis of calcium imaging data. Front Neural Circuits
2020;14:25.
55. Delestro F, Scheunemann L, Pedrazzani M et al. In vivo large-
scale analysis of drosophila neuronal calcium traces by auto-
mated tracking of single somata. Sci Rep 2020;10:7153.
56. Booij TH, Price LS, Danen EHJ. 3D cell-based assays for drug
screens: challenges in imaging, image analysis, and high-
content analysis. SLAS Discov 2019;24:615–27.
57. Nayak L, De RK. An algorithm for modularization of MAPK
and calcium signaling pathways: comparative analysis among
different species. J Biomed Inform 2007;40:726–49.
58. Choi EJ, Palacios-Prado N, Sáez JC et al. Confirmation of Con-
nexin45 underlying weak gap junctional intercellular coupling
in HeLa cells. Biomolecules 2020;10:1389.
59. Kepseu WD, Woafo P. Intercellular waves propagation in an
array of cells coupled through paracrine signaling: a computer
simulation study. Phys Rev E 2006;73:41912.
60. Becht E, McInnes L, Healy J et al. Dimensionality reduction for
visualizing single-cell data using UMAP. Nat Biotechnol 2019;37:
38–47.
61. Kanapeckaitė A, Burokienė N. Insights into therapeutic targets
and biomarkers using integrated multi-‘omics’ approaches
for dilated and ischemic cardiomyopathies. Integr Biol 2021;13:
121–37.
62. Saxena A, Dhyani V, Suman G et al. Effect of topology and time
window on probability distribution underlying baclofen induced
Ca2+ response in hippocampal neurons. In: Proceedings of the
Annual International Conference of the IEEE Engineering in Medicine
and Biology Society, EMBS, Institute of Electrical and Electronics
Engineers Inc., United States, 2019.
63. Choi EJ, Palacios-Prado N, Sáez JC et al. Confirmation of Con-
nexin45 underlying weak gap junctional intercellular coupling
in HeLa cells. Biomolecules 2020;10:1389.
64. Rimkutė L, Jotautis V, Marandykina A et al. The role of neural
connexins in HeLa cell mobility and intercellular communication
through tunneling tubes. BMC Cell Biol 2016;17:3.
65. Paemeleire K, Martin PEM, Coleman SL et al. Intercellular cal-
cium waves in HeLa cells expressing GFP-labeled Connexin 43,
32, or 26. Mol Biol Cell 2000;11:1815–27.
66. Yao J, Pilko A, Wollman R. Distinct cellular states determine
calcium signaling response. Mol Syst Biol 2016;12:1–12.
67. Selimkhanov J, Taylor B, Yao J et al. Accurate information
transmission through dynamic biochemical signaling networks.
Science 2014;346:1370–3.
68. Sumit M, Jovic A, Neubig RR et al. A two-pulse cellular stimu-
lation test elucidates variability and mechanisms in signaling
pathways. Biophys J 2019;116:962–73.
69. Huang J, Liu Y, Chen J et al. Harmine is an effective therapeutic
small molecule for the treatment of cardiac hypertrophy. Acta
Pharmacol Sin 2022;43:50–63.
70. Stringer C, Pachitariu M. Computational processing of neu-
ral recordings from calcium imaging data. Curr Opin Neurobiol
2019;55:22–31.
Preprint
Full-text available
Mass spectrometry imaging (MSI) is a powerful technology that can be employed to define the spatial distribution and relative abundance of structurally identified and yet-undefined metabolites across a tissue cryosection. While numerous software packages enable pixel-by-pixel imaging of individual metabolite distributions, the research community lacks a discovery tool that provides spatial imaging of all metabolite abundance ratio pairs. Importantly, the recognition of correlating metabolite pairs offers a strategy to discover unanticipated molecules that contribute to or regulate a shared metabolic pathway, uncover hidden metabolic heterogeneity across cells and tissue subregions, and offers a single timepoint indicator of flux through a particular metabolic pathway of interest. Here, we describe the development and implementation of an untargeted R package workflow for pixel-by-pixel imaging of ratios for all metabolites detected in an MSI experiment. Considering untargeted MSI studies of murine brain and embryogenesis, we demonstrate that ratio imaging offers the opportunity to minimize systematic data variation introduced during sample handling or due to instrument drift, markedly enhances spatial image resolution, and can serve to reveal previously unrecognized tissue regions that are metabotype-distinct. Furthermore, ratio imaging facilitates the discovery of novel regional biomarkers, and can provide anatomical information regarding the spatial distribution of metabolite-linked biochemical pathways. The algorithm described herein is generic and can be applied to any MSI dataset containing spatial information for metabolites, peptides or proteins. Importantly, this software package offers a powerful add-on tool that can significantly enhance knowledge obtained from currently employed spatial metabolite profiling technologies.
Article
Full-text available
The Automatic Identification System (AIS) of ships provides massive data for maritime transportation management and related researches. Trajectory clustering has been widely used in recent years as a fundamental method of maritime traffic analysis to provide insightful knowledge for traffic management and operation optimization, etc. This paper proposes a ship AIS trajectory clustering method based on Hausdorff distance and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), which can adaptively cluster ship trajectories with their shape characteristics and has good clustering scalability. On this basis, a re-clustering method is proposed and comprehensive clustering performance metrics are introduced to optimize the clustering results. The AIS data of the estuary waters of the Yangtze River in China has been utilized to conduct a case study and compare the results with three popular clustering methods. Experimental results prove that this method has good clustering results on ship trajectories in complex waters.
Article
Full-text available
Harmine is a β-carboline alkaloid isolated from Banisteria caapi and Peganum harmala L with various pharmacological activities, including antioxidant, anti-inflammatory, antitumor, anti-depressant, and anti-leishmanial capabilities. Nevertheless, the pharmacological effect of harmine on cardiomyocytes and heart muscle has not been reported. Here we found a protective effect of harmine on cardiac hypertrophy in spontaneously hypertensive rats in vivo. Further, harmine could inhibit the phenotypes of norepinephrine-induced hypertrophy in human embryonic stem cell-derived cardiomyocytes in vitro. It reduced the enlarged cell surface area, reversed the increased calcium handling and contractility, and downregulated expression of hypertrophy-related genes in norepinephrine-induced hypertrophy of human cardiomyocytes derived from embryonic stem cells. We further showed that one of the potential underlying mechanism by which harmine alleviates cardiac hypertrophy relied on inhibition of NF-κB phosphorylation and the stimulated inflammatory cytokines in pathological ventricular remodeling. Our data suggest that harmine is a promising therapeutic agent for cardiac hypertrophy independent of blood pressure modulation and could be a promising addition of current medications for cardiac hypertrophy.
Article
Fluorescent calcium (Ca ²⁺ ) imaging is one of the preferred methods to record cellular activity during in vitro preclinical studies, high-content drug screening, and toxicity analysis. Visualization and analysis for dose–response data obtained using high-resolution imaging remain challenging, due to the inherent heterogeneity present in the Ca ²⁺ spiking. To address this challenge, we propose measurement of cytosolic Ca ²⁺ ions using spinning-disk confocal microscopy and machine learning–based analytics that is scalable. First, we implemented uniform manifold and projection (UMAP) for visualizing the multivariate time-series dataset in the two-dimensional (2D) plane using Python. The dataset was obtained through live imaging experiments with norepinephrine-induced Ca ²⁺ oscillation in HeLa cells for a large range of doses. Second, we demonstrate that the proposed framework can be used to depict the grouping of the spiking pattern for lower and higher drug doses. To the best of our knowledge, this is the first attempt at UMAP visualization of the time-series dose response and identification of the Ca ²⁺ signature during lytic death. Such quantitative microscopy can be used as a component of a high-throughput data analysis workflow for toxicity analysis.
Article
At present, heart failure (HF) treatment only targets the symptoms based on the left ventricle dysfunction severity; however, the lack of systemic ‘omics’ studies and available biological data to uncover the heterogeneous underlying mechanisms signifies the need to shift the analytical paradigm towards network-centric and data mining approaches. This study, for the first time, aimed to investigate how bulk and single cell RNA-sequencing as well as the proteomics analysis of the human heart tissue can be integrated to uncover HF-specific networks and potential therapeutic targets or biomarkers. We also aimed to address the issue of dealing with a limited number of samples and to show how appropriate statistical models, enrichment with other datasets as well as machine learning-guided analysis can aid in such cases. Furthermore, we elucidated specific gene expression profiles using transcriptomic and mined data from public databases. This was achieved using the two-step machine learning algorithm to predict the likelihood of the therapeutic target or biomarker tractability based on a novel scoring system, which has also been introduced in this study. The described methodology could be very useful for the target or biomarker selection and evaluation during the pre-clinical therapeutics development stage as well as disease progression monitoring. In addition, the present study sheds new light into the complex aetiology of HF, differentiating between subtle changes in dilated cardiomyopathies (DCs) and ischemic cardiomyopathies (ICs) on the single cell, proteome and whole transcriptome level, demonstrating that HF might be dependent on the involvement of not only the cardiomyocytes but also on other cell populations. Identified tissue remodelling and inflammatory processes can be beneficial when selecting targeted pharmacological management for DCs or ICs, respectively.
Article
Coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a worldwide devastating effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emergent large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). By using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distanced-based datasets, and its superior clustering visualization. The UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
Article
This study assesses the potential of Uniform Manifold Approximation and Projection (UMAP) as an alternative tool to t-distributed stochastic neighbor embedding (t-SNE) for the reduction and visualization of visible spectral images of works of art. We investigate the influence of UMAP parameters—such as, correlation distance, minimum embedding distance, as well as number of embedding neighbors— on the reduction and visualization of spectral images collected from Poèmes Barbares (1896), a major work by the French artist Paul Gauguin in the collection of the Harvard Art Museums. The use of a cosine distance metric and number of neighbors equal to 10 preserves both the local and global structure of the Gauguin dataset in a reduced two-dimensional embedding space thus yielding simple and clear groupings of the pigments used by the artist. The centroids of these groups were identified by locating the densest regions within the UMAP embedding through a 2D histogram peak finding algorithm. These centroids were subsequently fit to the dataset by non-negative least square thus forming maps of pigments distributed across the work of art studied. All findings were correlated to macro XRF imaging analyses carried out on the same painting. The described procedure for reduction and visualization of spectral images of a work of art is quick, easy to implement, and the software is opensource thus promising an improved strategy for interrogating reflectance images from complex works of art.
Article
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for the control, combating, and prevention of COVID-19. Due to the rapid growth of both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emergent large-data challenge. We introduce a dimension-reduced $k$-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. The UMAP-assisted $k$-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
Article
Sepsis is one of the deadliest diseases in North America, and in spite of the vast amount of research on this topic, there is still uncertainty in the outcome of sepsis treatments. This study aimed to investigate the informativeness of temporal electronic health records (EHR) in stratifying septic patients and identifying subpopulations of septic patients with similar trajectories and clinical needs. We performed hierarchical clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) analyses using data from septic patients in the MIMIC III intensive care unit database. The t-Distributed Stochastic Neighbor Embedding (t-SNE) method was used to map patients to a two-dimensional space. We used the silhouette index and cluster-wise stability assessment by resampling to investigate the validity of the clusters. Hierarchical clustering with the Euclidean metric identified twelve clinically recognizable subgroups that demonstrated different characteristics in spite of sharing common conditions. Our results demonstrate that data-driven approaches can help in customizing care platforms for septic patients by identifying similar, clinically relevant groups.
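The silhouette index used above for cluster validation can be sketched in a few lines. This is a minimal illustration under stated assumptions (Euclidean distance, points already embedded in a low-dimensional space, a hypothetical function name and toy data), not the implementation used in the study.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    Points in singleton clusters contribute 0, following the usual convention.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette index needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2D embedding (hypothetical data): two tight, well-separated groups.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_index(embedding, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_index(embedding, [0, 1, 0, 1])   # labels cut across the groups
```

A value close to 1 indicates compact, well-separated clusters, while values near or below 0 suggest that points sit closer to another cluster than to their own.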