
Mapping of structural arrangement of cells and collective calcium transients: an integrated framework combining live cell imaging using confocal microscopy and UMAP-assisted HDBSCAN-based approach

January 2023
Integrative Biology 14(8-12)


DOI:10.1093/intbio/zyac017

Authors:

Gare Suman

Universitätsklinikum Bonn

Soumita Chel

Indian Institute of Technology Hyderabad





Fluorescence time-lapse images of Fluo-4-loaded HeLa cells excited at 480 nm, and spatial intensity maps of Fluo-4 showing the distribution of the Ca2+ response in the HeLa cell population in the presence of norepinephrine (α2-adrenergic receptor agonist). Representative time-lapse images were taken from videos captured with a 63× oil objective on a spinning-disk confocal microscope over 600 s. (a) Fluorescence images at high packing density with 100 μM and the corresponding spatial intensity map; (b) fluorescence images at low packing density with 100 μM and the corresponding spatial intensity map. Time courses of Fluo-4 intensity from single cells treated with norepinephrine at 1, 10 and 100 μM show cell-to-cell variability in the Ca2+ response; for each case, five representative fluorescence traces from individual cells are shown. (c), (d) and (e) show the Ca2+ time course for low-packing-density cells at the three doses, and (f), (g) and (h) show the Ca2+ time course for high-packing-density cells at the three doses.


Dimension reduction and cluster number distribution across 1000 runs of t-SNE and UMAP. The optimal cluster number was selected using the SSDD index. (a) UMAP dimension reduction of Ca2+ responses from cells at high (left) and low (right) packing density for various doses. (b) and (c) Cluster number distributions for t-SNE and UMAP with DBSCAN clustering. The cluster number is chosen as the value with the highest reproducibility (mode): mode for t-SNE + DBSCAN = 12, mode for UMAP + DBSCAN = 10. (d) and (e) Cluster number distributions for t-SNE and UMAP with HDBSCAN clustering (mode for t-SNE + HDBSCAN = 13, mode for UMAP + HDBSCAN = 17). (f) Box plots of SSDD values for each structure obtained after dimension reduction, corresponding to the cluster numbers chosen from (b) to (e) (k = 12 for t-SNE + DBSCAN, k = 10 for UMAP + DBSCAN, k = 13 for t-SNE + HDBSCAN and k = 17 for UMAP + HDBSCAN, where k is the cluster number).


Comparison of different combinations of dimension reduction and clustering algorithms on the Ca2+ spiking dataset containing data for three drug doses and two packing densities. (A) t-SNE projection with DBSCAN clustering (k = 12) and HDBSCAN clustering (k = 13). (B) UMAP projection with DBSCAN clustering (k = 10) and HDBSCAN clustering (k = 17). Each color represents a different cluster in each of the four cases.


Received: May 25, 2022. Revised: November 22, 2022. Editorial decision: November 30, 2022. Accepted: November 30, 2022

Integrative Biology, 2023, 1–20

https://doi.org/10.1093/intbio/zyac017

Original Article

Mapping of structural arrangement of cells and collective calcium transients: an integrated framework combining live cell imaging using confocal microscopy and UMAP-assisted HDBSCAN-based approach

Suman Gare1, Soumita Chel1, T.K. Abhinav1, Vaibhav Dhyani1, Soumya Jana2 and Lopamudra Giri1,*

1Department of Chemical Engineering, Indian Institute of Technology, Hyderabad, India

2Department of Electrical Engineering, Indian Institute of Technology, Hyderabad, India

*Corresponding author. E-mail: giril@che.iith.ac.in

Abstract

Live cell calcium (Ca2+) imaging is an important tool for recording cellular activity in in vitro and in vivo preclinical studies. In particular, high-resolution microscopy can provide valuable dynamic information at the single-cell level. A major challenge in implementing such imaging schemes is extracting quantitative information in the presence of significant heterogeneity in Ca2+ responses arising from variation in structural arrangement and drug distribution. To fill this gap, we propose time-lapse imaging using spinning disk confocal microscopy combined with a machine learning-enabled framework for automated grouping of Ca2+ spiking patterns. Time series analysis is performed to correlate drug-induced cellular responses with the self-assembly patterns present in multicellular systems. The framework reduces the dimensionality of the large-scale dynamic responses using uniform manifold approximation and projection (UMAP). In particular, we propose hierarchical DBSCAN (HDBSCAN) as a suitable clustering method because it requires fewer hyperparameters. We find that UMAP-assisted HDBSCAN outperforms existing approaches in clustering accuracy when segregating Ca2+ spiking patterns. One novelty of this work is the application of non-linear dimension reduction to segregate Ca2+ transients with statistical similarity. The proposed automated pipeline also proved to be reproducible and fast, requiring minimal user input. The algorithm was used to quantify the effect of cellular arrangement and stimulus level on collective Ca2+ responses induced by a GPCR-targeting drug. The analysis revealed a significant increase in the subpopulation showing sustained oscillations at higher packing density. In contrast to traditional measurements of rise time and decay ratio from Ca2+ transients, the proposed pipeline was used to classify complex patterns of longer duration and to perform cluster-wise model fitting. The two-step process has potential implications for deciphering the biophysical mechanisms underlying Ca2+ oscillations in the context of the structural arrangement between cells.

Keywords: calcium imaging, t-SNE, UMAP, HDBSCAN, cell-to-cell connectivity, confocal microscopy, GPCR targeting drug, SSDD

Insight, innovation and integration

Although measurement of Ca2+ transients is crucial in assessing cell–drug interactions and in preclinical studies, the analysis is significantly hindered by cell-to-cell variability in spiking patterns. The inherent heterogeneity of complex oscillation patterns poses an emerging data analysis challenge. In this context, the authors develop an integrated approach combining confocal imaging with a two-step machine learning framework that couples non-linear dimension reduction and density-based clustering. We demonstrate that UMAP-assisted HDBSCAN clustering outperforms existing techniques in time-series analysis. The findings indicate distinct spiking patterns induced by a GPCR-targeting drug that are specific to the structural arrangement between cells. The insight gained is significant for identifying the biophysical mechanisms underlying collective Ca2+ dynamics.

INTRODUCTION

In vitro Ca2+ imaging using fluorescence and confocal microscopy is widely used to study cell–drug interactions and to assess drug efficacy and cytotoxicity [1]. Specifically, intracellular Ca2+ is a crucial parameter controlling many cellular functions [2], and Ca2+ imaging is important for performing functional assays during drug assessment [3]. In recent times, measurement of cytosolic Ca2+ has been considered one of the tools for screening drugs that target G-protein-coupled receptors (GPCRs) [4, 5]. Activated GPCRs are known to regulate Ca2+ signaling in a cell [6–8], and changes in Ca2+ flux are one of the key markers for evaluating cell state. However, the drug-induced Ca2+ oscillations obtained using fluorescence imaging yield a collection of complex spiking patterns [5, 9]. Full understanding of such complex oscillations requires quantitative dynamic measurements of signal transmission between cells. Automating the segregation of such time-course data, which consist of asynchronous Ca2+ spiking, remains challenging because of the variation in inter-spike intervals and damping patterns present in the dataset.

Cellular architecture and structural arrangement are known to regulate cell functions including contraction, migration and differentiation [10]. Analyzing the effect of cell-to-cell connectivity is not only useful for understanding the molecular mechanisms regulating cell function but also important in drug development based on cell-based assays. One of the major challenges in the analysis of Ca2+ spiking patterns is to identify the signature responses that arise from higher packing density and gap junction-mediated Ca2+ diffusion [11–13]. In particular, when regions of interest are chosen randomly during imaging of cells treated with a particular drug dose, a region may come either from a dense area with a high packing density of cells or from an area with a lower packing density. Such variability in structural arrangement and cell-to-cell connectivity may play a crucial role in regulating Ca2+ spiking patterns [10, 11, 14]. Moreover, cell-to-cell variability may arise from the distribution in the number of receptors and from uneven drug distribution across cells. Hence, interpreting cellular activity in a large dataset is challenging. To address this issue, we propose Ca2+ imaging using confocal microscopy combined with a machine learning-based framework for clustering a Ca2+ dataset and identifying the distinguishable patterns corresponding to low and high packing density. Generally, asynchronous Ca2+ spiking patterns differ not only in frequency and amplitude but also show significant variability in the inter-spike interval (ISI) over time [5].

Previously, k-means and functional clustering methods were used for Ca2+ spiking analysis [4, 15, 16]. However, the major drawback of functional clustering [15, 16] is that it computes the distance between spike trains from spike timing alone, so information on amplitude is omitted. As the number of cells increases, computation time grows significantly because the distances between spikes must be calculated for the actual dataset as well as for the surrogate dataset [16]. Given the growing number of Ca2+ spike trains obtained from various experiments and the tremendous variability in ISI and amplitude, there is an urgent need for an efficient and reliable method for automated analysis.

K-means and FCM [4, 5, 17] were previously used to cluster feature-extraction data; however, they failed to handle noise. In contrast, Density-based Spatial Clustering of Applications with Noise (DBSCAN), developed by Ester et al. [18], is valuable for separating noise because it relies on a density-based notion of clusters [18–20]. One disadvantage of DBSCAN is that it produces a flat clustering and does not work well for data with varying densities. In this context, we show that Hierarchical Density-based Spatial Clustering of Applications with Noise (HDBSCAN) [21, 22] can be used for clustering after dimension reduction with uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) [23, 24]. HDBSCAN builds on its parent algorithm, DBSCAN, by converting it into a hierarchical clustering algorithm. HDBSCAN can be used to visualize the data by means of a simplified cluster tree, and it does not require many critical hyperparameters as input other than ‘min_cluster_size’, which is defined as the minimum number of points required to form a cluster.
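The density-based notion of clusters that DBSCAN relies on can be illustrated with a minimal, brute-force sketch in Python. It assumes Euclidean distance and uses our own parameter names (`eps`, the neighborhood radius, and `min_pts`, the minimum neighborhood size for a core point) and toy points; it is not the implementation used in this work, and it scales as O(N²), so a library implementation would be preferable for real datasets.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels


# Two tight groups plus one isolated point (hypothetical toy data).
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (10.0, 0.0)]
labels = dbscan(points, eps=1.0, min_pts=3)  # → [0, 0, 0, 1, 1, 1, -1]
```

Points that are neither core points nor within `eps` of a core point keep the noise label -1, which is how DBSCAN separates noise. The single global `eps` is also why DBSCAN struggles with clusters of different densities; HDBSCAN avoids fixing it by building a hierarchy over all density levels.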

One of the major objectives of this work is to identify the best-suited dimension reduction algorithm for spiking datasets based on its effectiveness in grouping Ca2+ responses within a machine learning-enabled pipeline. First, we acquired time-series data using spinning disk confocal microscopy and compared the performance of two nonlinear dimension reduction algorithms, t-SNE and UMAP, with principal component analysis (PCA) [25]. Furthermore, we implemented clustering using k-means, agglomerative clustering, DBSCAN and HDBSCAN. Here, we show that UMAP-assisted HDBSCAN clustering outperforms existing approaches for clustering the Ca2+ spiking dataset.

To validate the proposed method, statistical similarity [26] was assessed by fitting the dataset to various statistical distributions and selecting the fit with the minimum Akaike Information Criterion (AIC) [27]. Additionally, we performed pairwise Kruskal–Wallis tests on the distribution parameters obtained for each cluster. In contrast to conventionally used validation indices [28–31], we used the SSDD (Shapes, different Sizes & Densities, small separation Distances) index as the cluster validity index [32], because it can find the best-fitting partition for clusters with arbitrary shapes, sizes and densities and small separation distances. The SSDD index assumes that good clusters are high-density regions of the data surrounded by low-density regions and separated from other high-density regions.
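Selecting a distribution by minimum AIC can be sketched as follows. This is a minimal Python example assuming just two candidate families (normal and exponential) with closed-form maximum-likelihood estimates; the candidate set and function names are illustrative assumptions, not the distributions or code used in the study.

```python
import math
from typing import Dict, Sequence


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def normal_loglik(x: Sequence[float]) -> float:
    """Maximized log-likelihood of a normal fit (MLE mean and MLE variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n  # MLE (biased) variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def exponential_loglik(x: Sequence[float]) -> float:
    """Maximized log-likelihood of an exponential fit (MLE rate = 1/mean); requires x > 0."""
    n = len(x)
    rate = n / sum(x)
    return n * math.log(rate) - rate * sum(x)


def aic_by_family(x: Sequence[float]) -> Dict[str, float]:
    """AIC of each candidate family; the smallest value is preferred."""
    return {
        "normal": aic(normal_loglik(x), n_params=2),
        "exponential": aic(exponential_loglik(x), n_params=1),
    }


scores = aic_by_family([1.0, 2.0, 3.0])  # hypothetical sample
best = min(scores, key=scores.get)       # family with the minimum AIC
```

The penalty term 2k means that a family with more parameters must raise the log-likelihood enough to be preferred; for this toy sample the normal fit has the lower AIC.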

One of the most important contributions of this work is a strategy within the workflow that retains reproducibility when the algorithm is run multiple times to find a suitable number of clusters. A major issue with t-SNE and UMAP is that, owing to their stochastic nature, repeated runs on a single dataset produce different 2D embeddings. To address this, we propose a workflow that performs t-SNE or UMAP with 1000 random seed states, followed by density-based clustering, and then chooses the embedding with the most reproducible cluster number and the minimum SSDD index. The similarity between the various embeddings was also quantified using the structural similarity ratio [33, 34]. The proposed pipeline combining dimension reduction and clustering offers a simple, fast and flexible toolbox for grouping the spiking behavior of cells obtained from regions with disparate structural arrangements and different drug doses.

In this paper, we hypothesize that higher packing density and gap junction-mediated Ca2+ diffusion may increase spiking amplitude, oscillation duration and frequency [10, 12, 13, 35–37]. On the other hand, paracrine signaling may be responsible for Ca2+ diffusion in cells with lower packing density, leading to lower frequency and amplitude [38]. In this work, we use the proposed framework to show that the structural arrangement and packing density of cells are crucial factors controlling the relative distribution of cells with various levels of spiking frequency, ISI and amplitude. Here, we used norepinephrine as the GPCR-targeting drug; it is generally used to increase blood pressure in emergencies in the intensive care unit [39, 40]. We chose norepinephrine because it is known to induce complex oscillations following activation of adrenergic receptors [41].

Since the proposed algorithm was able to detect subtle differences in cell functionality, it can be used to assess the impact of structural arrangement and drug dose distribution. In contrast to the traditional evaluation of rise time and decay ratio from Ca2+ spikes, the method can be used to quantify complex patterns of longer duration. Furthermore, we provide a proof of concept that clustering of time-course responses followed by cluster-wise model fitting is necessary to obtain insight into cell states and the underlying signaling mechanism.

The paper is organized as follows. Section 2 briefly describes the methodology of this research, followed by a detailed description of the algorithms and cluster validity indices. Section 3 applies the algorithm to a Ca2+ spiking dataset obtained from HeLa monolayer culture and compares it with some classic existing clustering algorithms. Section 4 then describes the significance of the clusters and how the two-step machine learning paradigm, together with cluster-based model fitting, can be used to identify the role of cellular arrangement in controlling Ca2+ signaling in cells treated with the GPCR-targeting drug norepinephrine. The implications and limitations of the proposed method are summarized in Section 5. An overall workflow for the proposed pipeline is presented in Fig. 1.

MATERIALS AND METHODS

Cell culture

HeLa cells (ATCC, Manassas, VA) were cultured in minimum essential medium (Cellgro, Manassas, VA) supplemented with 10% dialyzed fetal bovine serum (Atlanta Biologicals) and 1% penicillin–streptomycin (PS) in 29-mm glass-bottom dishes (In Vitro Scientific, Sunnyvale, CA) in a humidified incubator at 37°C with 5% CO2. Cells (0.2 × 10⁶) were seeded, maintained in culture until 70–80% confluency and then used for further analysis.

Ca2+ imaging with Fluo-4 dye (spinning disk confocal microscopy)

HeLa cells were incubated for 30 min in Hanks’ balanced salt solution (HBSS, Invitrogen, Life Technologies, Grand Island, NY) containing 1.25 mM Ca2+, 5.3 mM KCl and 0.44 mM KH2PO4 (Sigma, St. Louis, MO) with 2 μM Fluo-4 dye (Molecular Probes, Life Technologies, Grand Island, NY). The cells were then rinsed three times with HBSS without Fluo-4 and incubated for 15 min to allow de-esterification of the dye. Ca2+ imaging was carried out using a spinning-disk confocal imaging system (Leica DMI6000B microscope with a Yokogawa CSU-X1 spinning disk unit), and the HeLa cells were maintained at 37°C and 5% CO2 in the incubator attached to the microscope system. Fluo-4 fluorescence was excited with an argon laser at 488 nm, and emission was recorded at 510 nm. To obtain the time course of cytosolic Ca2+ oscillations in the HeLa cell population, the cells were imaged using a 63× oil objective on the confocal imaging system with an Andor iXon EM-CCD camera. Basal Ca2+ levels were measured without any drug. The time course of Fluo-4 intensity was recorded (every 600 ms) before and after the addition of the drug. Since we aim to provide an assay and analytics for testing multiple drugs, the imaging duration was kept at 10 minutes.

GPCR targeting drug loading

Norepinephrine (Sigma, St. Louis, MO) reconstituted in HBSS was used to activate the G-protein subunits at concentrations of 1, 10 and 100 μM. For drug treatment studies, the drug was added after 100 s in a 10 μL volume, and Fluo-4 intensity was measured for 600 s after drug addition. The time course of Fluo-4 intensity obtained from time-lapse imaging was used for dose–response analysis. Leica adaptive focus control was used to prevent changes in the imaging plane (drift) over time.

Data acquisition from time-lapse Ca2+imaging

The packing density of cells differs significantly between regions of the same tissue culture dish, and such video-to-video variability cannot be avoided during high-content imaging studies. Various regions were chosen randomly, and individual HeLa cells were manually selected as regions of interest (ROIs) based on their morphology. The dataset was then grouped by manual labeling based on a similar number of cells per area and a similar arrangement. Specifically, the videos obtained for each drug dose (1, 10 and 100 μM) were labeled as either (i) higher packing density or (ii) lower packing density. Figure 2a shows the schematic diagram of the experimental setup and data acquisition using confocal microscopy. The data collection and grouping are presented in Supplementary Fig. S1. Multi-TIFF time-lapse files were analyzed using Andor iQ software to obtain the time course of Fluo-4 fluorescence intensity over the entire duration of Ca2+ spiking in single cells. The videos were recorded for 10 minutes at a frame rate of 1 image/s, resulting in file sizes of several gigabytes (GB). HeLa cells were treated with different concentrations of norepinephrine (1, 10 and 100 μM). Representative images depicting the structural arrangement of cells from low and high packing density regions are shown in Fig. 2b.

Data smoothing and baseline correction

An area with no fluorescence at 488 nm was taken as the average background fluorescence, which was subtracted from the average pixel fluorescence (at 488 nm) of each ROI. To eliminate the effect of photobleaching, we used an iterative average (IA) algorithm [42]. In Supplementary Fig. S2a, the ROI bordered in red represents a single cell, and Supplementary Fig. S2b shows the corresponding time course of Fluo-4 intensity. The red curve is the raw trace obtained from the ROI, and the black curve beneath it is the baseline retrieved from the intensity profile using the IA algorithm. The dark blue curve in Supplementary Fig. S2b shows the fluorescence intensity profile of the same ROI after baseline correction. Next, the Ca2+ time course was denoised using an exponential moving average; the ‘tsmovavg’ function in MATLAB was used to obtain the smoothed data (Supplementary Fig. S2c). Since the time points at which fluorescence was captured (using the 488-nm laser) were not identical across the videos in the dataset, we performed interpolation (‘interp1’, MATLAB R2020a) to obtain the fluorescence at the same set of time points across the whole dataset. The dataset contains 756 cells and 339 time points.
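The smoothing and resampling steps can be sketched in plain Python. This is an assumption-laden illustration, not the MATLAB code used here: the exponential moving average uses the common smoothing factor α = 2/(N + 1) for a window N (we have not confirmed the exact weighting used by ‘tsmovavg’), the linear scheme mirrors the default method of ‘interp1’, and the iterative-average baseline correction [42] is not reproduced.

```python
from bisect import bisect_right
from typing import List, Sequence


def exponential_moving_average(x: Sequence[float], period: int) -> List[float]:
    """EMA with smoothing factor alpha = 2 / (period + 1) (assumed convention)."""
    alpha = 2.0 / (period + 1)
    out = [float(x[0])]  # seed the recursion with the first sample
    for v in x[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def interp_linear(t: Sequence[float], y: Sequence[float], t_new: Sequence[float]) -> List[float]:
    """Piecewise-linear interpolation of (t, y) onto t_new; t must be strictly increasing.

    Query points outside [t[0], t[-1]] are clamped to the end values (an assumption;
    MATLAB's interp1 returns NaN there unless extrapolation is requested)."""
    result = []
    for tq in t_new:
        if tq <= t[0]:
            result.append(float(y[0]))
        elif tq >= t[-1]:
            result.append(float(y[-1]))
        else:
            k = bisect_right(t, tq)  # t[k-1] <= tq < t[k]
            w = (tq - t[k - 1]) / (t[k] - t[k - 1])
            result.append((1 - w) * y[k - 1] + w * y[k])
    return result


# Hypothetical trace and sampling grids.
ema = exponential_moving_average([0.0, 3.0, 3.0, 3.0], period=2)
resampled = interp_linear([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 1.5, 3.0])  # → [5.0, 15.0, 20.0]
```

Resampling every trace onto one shared time grid is what allows the cells to be stacked into a single cells × time-points matrix for dimension reduction.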

Figure 1. Flow diagram of the proposed pipeline for selection of methods and cluster number, and for analysis of live imaging data. The green box denotes data preprocessing, the blue box the dimension reduction methods, the pink box the implementation of density-based clustering methods, and the brown boxes the validation methods, including computation of the SSDD index for selecting the optimal cluster, the similarity ratio and the correlation coefficient. The red box and arrow denote the initialization of 1000 matrices to identify the optimal 2D embedding for t-SNE and UMAP. (SSDD = Shapes, Sizes, Densities and small separation Distances.)

Figure 2. Data acquisition for norepinephrine-mediated Ca2+ oscillations from regions of varied packing density of HeLa cells using confocal microscopy. (A) Schematic diagram of the conditions: low-dose high-packing density, low-dose low-packing density, medium-dose high-packing density, medium-dose low-packing density, high-dose high-packing density and high-dose low-packing density. (B) Representative images showing the detailed structural arrangement at high and low packing density for low, medium and high drug doses. (C) Heat map of the entire Ca2+ spiking dataset (600 s duration) obtained from the six conditions.

Uniform manifold approximation and projection

UMAP is a recent method for nonlinear dimension reduction that aims to preserve local structure accurately while better retaining global structure [24]. It is based on fuzzy topology and has several advantages over t-SNE [19, 43]. The time-series data matrix (the time course of Fluo-4 intensity for all cells in the dataset) was taken as the high-dimensional data $\{x_1, x_2, \ldots, x_N \mid x_i \in \mathbb{R}^M\}$, and we aim to identify a lower-dimensional representation $\{y_1, y_2, \ldots, y_N \mid y_i \in \mathbb{R}^k\}$ with $k = 2$. Like t-SNE, UMAP constructs a probability distribution over pairs of points in the high-dimensional space; in UMAP this distribution is exponential:

$$p_{ij} = \exp\left(-\frac{d(x_i, x_j) - \rho_i}{\sigma_i}\right), \qquad (1)$$

where $d(x_i, x_j)$ is the distance between the $i$th and $j$th data points, $\rho_i$ is the distance between the $i$th data point and its first nearest neighbor, and $\sigma_i$ is a per-point normalization factor. One important difference between UMAP and t-SNE is that this distribution is defined with respect to a local metric that is specific to each point. The probability distribution in the lower dimension is given by

$$q_{ij} = \left(1 + a\,\lVert y_i - y_j \rVert^{2b}\right)^{-1}, \qquad (2)$$

where $a$ and $b$ are constants. The other main difference between UMAP and t-SNE is the loss function used to estimate the low-dimensional structure: UMAP uses cross-entropy (CE) instead of the Kullback–Leibler (KL) divergence used by t-SNE, with CE defined as

$$CE(X, Y) = \sum_i \sum_j \left[ p_{ij}(X) \log\frac{p_{ij}(X)}{q_{ij}(Y)} + \left(1 - p_{ij}(X)\right) \log\frac{1 - p_{ij}(X)}{1 - q_{ij}(Y)} \right]. \qquad (3)$$

The CE function significantly improves the ability to preserve the correlation between distances in the high and low dimensions for both small and large distances. Implementation and further details of UMAP can be found in McInnes et al. [24]. All software versions and parameters used for UMAP and t-SNE are provided in Supplementary Table S1.
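Equations (1) to (3) can be evaluated directly for a few points, which makes the roles of ρ_i, σ_i, a and b concrete. The Python sketch below is a simplified illustration, not the umap-learn implementation: σ_i is calibrated by bisection so that the memberships of a point's k nearest neighbors sum to log2(k), following the rule described by McInnes et al. [24]; the difference d - ρ_i is clipped at zero so that the nearest neighbor receives membership 1; and a = b = 1 are placeholders (umap-learn fits a and b from its `min_dist` and `spread` settings). The optimization of the embedding itself is omitted.

```python
import math
from typing import Sequence


def p_high(d_ij: float, rho_i: float, sigma_i: float) -> float:
    """Eq. (1): membership strength of edge i -> j in the high-dimensional space."""
    return math.exp(-max(d_ij - rho_i, 0.0) / sigma_i)


def sigma_for_point(dists: Sequence[float], rho: float, n_iter: int = 64) -> float:
    """Bisect sigma so that sum_j p_high(d_j, rho, sigma) = log2(k) over the k neighbors."""
    target = math.log2(len(dists))
    lo, hi = 1e-8, 1e4
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        s = sum(p_high(d, rho, mid) for d in dists)
        if s > target:
            hi = mid  # memberships too large, so shrink sigma
        else:
            lo = mid
    return 0.5 * (lo + hi)


def q_low(y_i: Sequence[float], y_j: Sequence[float], a: float = 1.0, b: float = 1.0) -> float:
    """Eq. (2): membership strength in the low-dimensional embedding."""
    return 1.0 / (1.0 + a * math.dist(y_i, y_j) ** (2 * b))


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Eq. (3): fuzzy-set cross-entropy summed over edges (q clipped to avoid log(0))."""
    total = 0.0
    for pij, qij in zip(p, q):
        qij = min(max(qij, eps), 1 - eps)
        if pij > 0:
            total += pij * math.log(pij / qij)
        if pij < 1:
            total += (1 - pij) * math.log((1 - pij) / (1 - qij))
    return total


# Hypothetical distances from one cell's trace to its k = 4 nearest neighbors.
dists = [1.0, 2.0, 3.0, 4.0]
rho = dists[0]
sigma = sigma_for_point(dists, rho)
p = [p_high(d, rho, sigma) for d in dists]  # sums to log2(4) = 2
# Hypothetical 2D positions of the same four neighbors relative to y_i = (0, 0).
q = [q_low((0.0, 0.0), (x, 0.0)) for x in (0.5, 1.0, 1.5, 2.0)]
loss = cross_entropy(p, q)
```

Each term of Eq. (3) is a divergence between two Bernoulli memberships, so the loss is zero only when every q_ij matches its p_ij; the second term is what penalizes placing points close together in 2D when they are far apart in the original space.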

HDBSCAN clustering

In HDBSCAN, ‘min_cluster_size’ is the primary parameter that affects the clustering. The dependence of the resulting cluster number on ‘min_cluster_size’ (Supplementary Fig. S3) was used to select the cluster number. The major steps of HDBSCAN are briefly summarized here [21, 44].

Steps of the HDBSCAN algorithm:

Step 1 Compute the core distance of every data object in the dataset with respect to a minimum number of data points in a cluster (‘min_cluster_size’).

Step 2 Compute the minimum spanning tree (MST), using the mutual reachability distance between sample points as the edge weight.

Step 3 Transform the MST into a hierarchical structure.

Step 4 Use the input parameter ‘min_cluster_size’ to obtain the condensed cluster tree.

Step 5 Finally, obtain the density-adaptive clustering result through a stability function.
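Steps 1 and 2 can be sketched in Python: the core distance of a point is its distance to the k-th nearest other point, the mutual reachability distance is max(core(a), core(b), d(a, b)), and Prim's algorithm builds the MST on the complete mutual-reachability graph. This is an illustrative sketch assuming Euclidean distance and brute-force O(N²) computation; conventions differ on whether a point counts as its own neighbor, and in the hdbscan Python package the neighbor count is `min_samples`, which defaults to `min_cluster_size`. Steps 3 to 5 (hierarchy, condensed tree and stability-based selection) are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def core_distances(points: Sequence[Point], k: int) -> List[float]:
    """Step 1: distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(d[k - 1])
    return cores


def mutual_reachability(points: Sequence[Point], cores: Sequence[float], i: int, j: int) -> float:
    """d_mreach(i, j) = max(core(i), core(j), d(i, j))."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_prim(points: Sequence[Point], k: int) -> List[Tuple[int, int, float]]:
    """Step 2: MST (Prim's algorithm) on the complete mutual-reachability graph."""
    n = len(points)
    cores = core_distances(points, k)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, cores, u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


# Three evenly spaced points and one isolated point (hypothetical 1D data).
tree = mst_prim([(0.0,), (1.0,), (2.0,), (10.0,)], k=1)  # → [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 8.0)]
```

The isolated point receives a large core distance, so its MST edge is long; removing edges in order of decreasing weight is what turns this tree into the cluster hierarchy of Step 3.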

Multiple run analysis

t-SNE and UMAP are both known to yield distinctly different solutions for different random initial conditions [33]. In this context, we propose running t-SNE and UMAP 1000 times with different 2D initializations (Fig. 1). To obtain a robust solution and clustering pattern, we computed the distribution of cluster numbers over the 1000 different 2D initializations. Next, we clustered the data using two clustering algorithms, DBSCAN and HDBSCAN. The optimal cluster number was chosen as the mode of the frequency distribution of cluster numbers from the 1000 runs for DBSCAN and HDBSCAN. Since multiple structures still correspond to the chosen cluster number, we then selected an optimal 2D structure: among the n runs that yielded the modal cluster number k, the embedding with the minimum SSDD was chosen (Fig. 1).
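The selection rule (mode of the cluster-number distribution, then minimum SSDD among the runs that reached it) can be sketched in Python. The run records and numbers below are hypothetical placeholders; producing them would require running t-SNE or UMAP with each seed, clustering the embedding and computing SSDD, none of which is shown here.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

# Each run: (random seed, number of clusters found, SSDD of that clustered embedding).
Run = Tuple[int, int, float]


class Selection(NamedTuple):
    k_mode: int     # most frequent cluster number across runs
    frequency: int  # how many runs produced k_mode
    seed: int       # seed of the selected embedding
    ssdd: float     # its SSDD value (lower is better)


def select_embedding(runs: List[Run]) -> Selection:
    """Pick the modal cluster number, then the run with minimum SSDD among those runs."""
    counts = Counter(n_clusters for _, n_clusters, _ in runs)
    k_mode, freq = counts.most_common(1)[0]  # ties: the first-encountered value wins
    candidates = [r for r in runs if r[1] == k_mode]
    seed, _, ssdd = min(candidates, key=lambda r: r[2])
    return Selection(k_mode, freq, seed, ssdd)


# Hypothetical records from five runs (the study used 1000).
example_runs = [(0, 17, 2.1), (1, 16, 1.5), (2, 17, 1.8), (3, 17, 2.4), (4, 18, 1.2)]
selection = select_embedding(example_runs)  # → Selection(k_mode=17, frequency=3, seed=2, ssdd=1.8)
```

Note that the globally lowest SSDD (run 4) is not selected, because the cluster number is fixed by reproducibility first and SSDD only ranks embeddings within that group.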

Clustering performance evaluation

Since many cluster validation indices are not suitable for non-spherical clusters, we used SSDD to validate the clustering results. The SSDD cluster validation index is specifically designed for hard clustering with irregular results, where the clusters have arbitrary shapes, different sizes and densities, and small separation distances. Liang et al. [32] recently developed the SSDD index based on inner- and inter-cluster validation measures. The SSDD index is calculated as

$$\mathrm{SSDD}(C) = \sum_{c_i \in C} \left[\alpha \cdot DC(c_i) + \beta \cdot DR(c_i)\right], \qquad (4)$$

where $DC(c_i)$ is the density change along the backbone of cluster $c_i$, $DR(c_i)$ is the inter- and inner-cluster ratio of cluster $c_i$, and $\alpha$ and $\beta$ are weighting coefficients. The workflow for calculating SSDD is provided in Supplementary Fig. S4, and the details of the SSDD algorithm are presented in its original paper [32]. The specific differences between the proposed pipeline and state-of-the-art approaches are presented in Supplementary Fig. S5.


Statistical analysis

Differences between two treatment groups were assessed with the Kruskal–Wallis test (MATLAB), and P < 0.05 was considered statistically significant. Data are presented as mean ± standard deviation (SD).
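For two groups, the Kruskal–Wallis statistic can be computed from pooled ranks and compared with a chi-square distribution with 1 degree of freedom. The Python sketch below is an illustration only (the study used MATLAB); it applies the standard tie correction, relies on the large-sample chi-square approximation, and is undefined when all pooled values are identical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value for two groups."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[: len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_sum = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    h /= 1 - tie_sum / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # With two groups, H ~ chi-square(1), so P(X > h) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2))
    return h, p


h_stat, p_value = kruskal_wallis_two_groups([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # H = 27/7
```

For completely separated groups of three, H = 27/7 ≈ 3.86 and p ≈ 0.05, which illustrates how little power the test has with very small groups.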

Mathematical model for Ca2+signaling

To understand the difference in the mechanisms underlying Ca2+ oscillations at low and high packing density, we performed parameter estimation using a mathematical model [45] that captures norepinephrine-mediated Ca2+ oscillations in HeLa cells. This model was developed to investigate the mechanism underlying Gi-coupled GPCR-induced Ca2+ oscillation. Overall, the model has 21 parameters and 7 variables: (i) fast activated Gβγ at the plasma membrane (βγfastPM), (ii) slow activated Gβγ at the plasma membrane (βγslowPM), (iii) fast Gβγ at the internal membrane (βγfastIM), (iv) slow Gβγ at the internal membrane (βγslowIM), (v) cytosolic Ca2+, (vi) ER Ca2+, and (vii) [IP3]. A detailed description of the model is presented in Supplementary section 1.7. In this model, IP3 is assumed to be in a quasi-stationary state with respect to the concentration of active [PLC-β], and the Ca2+ inflow from internal stores/endoplasmic reticulum is assumed to depend on [PLC-β]. IP3 formation is also assumed to be directly proportional to [PLC-β] as well as to active [Gβγ]. Since many of the parameters are known to be constant [45–47], we estimated only a few parameters from the single-cell responses in cluster 10 (signature response from low packing density) and cluster 5 (signature response from high packing density). Specifically, we estimated the parameters that control store Ca2+, which include k2, the rate constant for receptor desensitization (negative feedback parameter); k3, the rate constant for PLC-β activation; k5, the rate constant for Ca2+ influx from the extracellular space to the cytosol, modeled as a function of PLC-β; and k6, the rate of Ca2+ influx from the ER to the cytosol. Ten cells were taken from each case, and all cells correspond to treatment with 10 μM norepinephrine. The kinetic parameters for each cell were estimated using a genetic algorithm (GA) [48]. A detailed description of the parameters is presented in Supplementary section 1.7.
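The GA settings are not given in this section. As a rough illustration of GA-based parameter estimation, the sketch below fits a hypothetical two-parameter decay model (`toy_model`) to a synthetic, noise-free trace; for the seven-variable model of [45], `toy_model` would be replaced by a numerical ODE solution. Elitism, tournament selection, blend crossover, and the values of `pop_size`, `generations` and `mut_frac` are assumptions of this sketch, not the settings used in the paper.

```python
import math
import random


def toy_model(params, t):
    """Hypothetical stand-in for the Ca2+ model: y(t) = a * exp(-k * t)."""
    a, k = params
    return [a * math.exp(-k * ti) for ti in t]


def sse(params, t, data):
    """Sum of squared errors between the model output and a measured trace."""
    return sum((m - d) ** 2 for m, d in zip(toy_model(params, t), data))


def genetic_fit(t, data, bounds, pop_size=60, generations=200, mut_frac=0.02, seed=0):
    """Minimal real-coded GA: elitism, tournament selection, blend crossover
    and Gaussian mutation, with every gene clipped to its bounds."""
    rng = random.Random(seed)

    def fitness(p):
        return sse(p, t, data)

    def tournament(pop):
        # best of three randomly drawn individuals
        return min(rng.sample(pop, 3), key=fitness)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 5]  # best 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                w = rng.random()
                g = w * g1 + (1 - w) * g2  # blend crossover
                g += rng.gauss(0.0, mut_frac * (hi - lo))  # Gaussian mutation
                child.append(min(max(g, lo), hi))
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    return best, fitness(best)


# Synthetic, noise-free "trace" generated with a = 2.0, k = 0.3
t = [float(i) for i in range(20)]
data = toy_model((2.0, 0.3), t)
best, err = genetic_fit(t, data, bounds=[(0.0, 5.0), (0.0, 2.0)])
```

A GA is attractive for this kind of fitting because it needs only objective-function evaluations, not gradients of the ODE solution, at the cost of many more model evaluations than a local optimizer.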

RESULTS

We developed a framework to cluster the single-cell Ca2+ responses collected from the live imaging assay (Fig. 1). Cells loaded with Fluo-4 were treated with various drug doses. Cells were imaged in a glass-bottom dish in low and high packing density regions, followed by analysis to track and outline individual cell responses over time (Fig. 2b). UMAP was used to reduce the dimensionality of the extracted time-series information, and the reduced data were then clustered with HDBSCAN. In the following sections, we give the details of developing the framework and selecting the cluster number. We then present a proof-of-principle study profiling the heterogeneity in Ca2+ responses for various stimulus levels and structural arrangements.

Confocal imaging of Ca2+ spiking in single cells for low and high packing density regions

To observe Ca2+ spiking behavior in HeLa cells, we performed cytosolic Ca2+ time-lapse imaging using spinning disk confocal microscopy. We first characterized Ca2+ spiking in HeLa cells in dose–response experiments at three doses of norepinephrine (1, 10 and 100 μM) and two packing densities (Fig. 2b and c). Figure 3a and b show time-lapse images corresponding to high and low packing density, respectively, at the 100 μM dose. The accompanying intensity maps of the cell populations show the differences in Ca2+ spiking patterns between low and high packing density under norepinephrine treatment (Fig. 3a and b). Supplementary Videos 1 and 2 show time-lapse images of Fluo-4 intensity for low and high packing density treated with 100 μM norepinephrine.

Cellular heterogeneity based on Ca2+ spiking pattern for various structural arrangements between cells: distinct Ca2+ oscillation profiles from single cells

Next, we examined the spiking pattern and activity of cells chosen from regions with two different packing densities at three doses of norepinephrine (1, 10 and 100 μM). Figure 3c–h shows the time course of Fluo-4 intensity for five cells randomly chosen from each of these six cases. The dose–response study indicates that norepinephrine induces oscillatory responses at higher doses. Supplementary Figure S6 shows a histogram analysis of three features: number of peaks, maximum amplitude, and inter-spiking interval (ISI). The results show that Ca2+ spiking in the cell population is heterogeneous at both lower and higher doses, and that low and high packing density differ in the number of peaks, maximum amplitude, and ISI. However, it is challenging to automate the quantification of the relative percentage of the various patterns, including high spiking, low spiking, high ISI, low ISI, and low- and high-amplitude Ca2+ responses.
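The three features in Supplementary Figure S6 can be computed per trace roughly as follows. The peak criterion (a local maximum exceeding a threshold above the trace minimum), the use of the trace minimum as baseline, and the helper name `spike_features` are assumptions; the paper's exact peak-detection settings are not stated here.

```python
def spike_features(trace, dt, threshold):
    """Return (number of peaks, maximum amplitude, mean inter-spiking interval).

    Assumptions of this sketch: the baseline is the trace minimum, a peak is a
    local maximum whose height above baseline exceeds `threshold`, and the ISI
    is the time between consecutive detected peaks (NaN if fewer than two peaks).
    """
    baseline = min(trace)
    peaks = [
        i
        for i in range(1, len(trace) - 1)
        # ">=" on the right neighbour counts a flat-topped peak once, at its first sample
        if trace[i] > trace[i - 1] and trace[i] >= trace[i + 1] and trace[i] - baseline > threshold
    ]
    max_amplitude = max(trace) - baseline
    isis = [(j - i) * dt for i, j in zip(peaks, peaks[1:])]
    mean_isi = sum(isis) / len(isis) if isis else float("nan")
    return len(peaks), max_amplitude, mean_isi
```

For example, `spike_features([0, 1, 0, 0, 2, 0, 0, 3, 0], dt=2.0, threshold=0.5)` finds three peaks spaced 6 time units apart; raising `threshold` to 1.5 drops the smallest one. Real Fluo-4 traces would normally be smoothed before peak detection.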

Dimension reduction framework for 2D visualization of complex Ca2+ oscillation patterns

To examine the overall shape of the multidimensional dataset, we implemented three dimension reduction methods: PCA, t-SNE and UMAP (Supplementary Fig. S7). The dataset considered here contains the time-series information obtained from multiple videos acquired at various drug doses. For visualization, each method was used to reduce the data to two dimensions. Supplementary Figure S7 compares the PCA, t-SNE and UMAP visualizations of the dataset containing single-cell Ca2+ dynamics for low and high packing density regions. Each color in the 2D plot marks the data points from one condition (low-dose high-packing density, low-dose low-packing density, medium-dose high-packing density, medium-dose low-packing density, high-dose high-packing density, and high-dose low-packing density). The results show that various types of spiking patterns are present across the conditions. The PCA scatter plot shows overlap between the different spiking patterns, even though the specific patterns should in general be readily distinguishable from one another. In contrast, the t-SNE and UMAP scatter plots show distinct clusters, indicating that the non-linear dimensionality reduction methods were able to segregate the various spiking patterns (Supplementary Fig. S7). Points from each condition also remain in close neighborhoods in the 2D visualizations from both t-SNE and UMAP, whereas PCA shows more intermixing, indicating that linear dimension reduction may be less suitable for clustering this dataset. Together, these analyses demonstrate that the critical differences between the data can be captured through a t-SNE or UMAP representation (Supplementary Fig. S7). Figure 4a shows the UMAP representation of Ca2+ spiking patterns from different structural arrangements of cells and various drug dose levels.

Mapping of structural arrangement of cells and collective calcium transients |7

Figure 3. Fluorescent time-lapse images of Fluo-4-loaded HeLa cells excited at 480 nm and spatial intensity mapping of Fluo-4 showing the distribution of Ca2+ responses in a HeLa cell population in the presence of norepinephrine (α2-adrenergic receptor agonist). Representative time-lapse images were collected from videos captured for 600 s using a 63× oil objective on a spinning-disk confocal microscope. (a) High packing density fluorescent images at 100 μM and the corresponding spatial intensity map; (b) low packing density fluorescent images at 100 μM and the corresponding spatial intensity map. The time course of Fluo-4 intensity from single cells treated with norepinephrine at 1, 10 and 100 μM shows cell-to-cell variability in the Ca2+ response; five representative fluorescent traces from individual cells are shown for each case. (c), (d), (e) Time course of Ca2+ for low packing density cells at the three doses; (f), (g), (h) time course of Ca2+ for high packing density cells at the three doses.

Figure 4. Dimension reduction and cluster number distribution across 1000 runs of t-SNE and UMAP. The optimal cluster number was selected using the SSDD index. (a) UMAP dimensionality reduction of Ca2+ responses from high packing (left) and low packing (right) density cells for various doses. (b) and (c) Cluster number distribution for t-SNE and UMAP with DBSCAN. The cluster number is chosen as the value with the highest reproducibility (mode): mode(t-SNE + DBSCAN) = 12, mode(UMAP + DBSCAN) = 10. (d) and (e) Cluster number distribution for t-SNE and UMAP with HDBSCAN: mode(t-SNE + HDBSCAN) = 13, mode(UMAP + HDBSCAN) = 17. (f) Box plot of SSDD values for each structure obtained after dimension reduction, for the cluster numbers chosen from (b)–(e) (t-SNE + DBSCAN, k = 12; UMAP + DBSCAN, k = 10; t-SNE + HDBSCAN, k = 13; UMAP + HDBSCAN, k = 17, where k is the cluster number).

Multiple run analysis for t-SNE and UMAP and selection of cluster number for DBSCAN and HDBSCAN

Since multiple runs of t-SNE and UMAP yielded significant variation in the 2D structures, we used random initial conditions for the 2D projection to find the optimal cluster number. We obtained the reduced dimensions by running t-SNE and UMAP 1000 times with different initial conditions, followed by clustering with various methods, including k-means, agglomerative clustering, DBSCAN and HDBSCAN.

Figure 5. Comparison of different combinations of dimension reduction and clustering algorithms on the Ca2+ spiking dataset containing data for three drug doses and two packing densities. (A) t-SNE projection with DBSCAN clustering (k = 12) and HDBSCAN clustering (k = 13). (B) UMAP projection with DBSCAN clustering (k = 10) and HDBSCAN clustering (k = 17). Each color represents a different cluster in each of the four cases.

Figure 4b and c shows the distribution of cluster numbers over the runs of DBSCAN with constant 'EPS' and 'MinPts' on the t-SNE and UMAP embeddings. For the DBSCAN analysis, we recorded the cluster number for each run and selected the optimal cluster number as the mode over the 1000 runs. We then calculated the SSDD index for the runs at this mode to identify the 2D structure with the best clustering performance (Fig. 4f). The optimal cluster number for the t-SNE and UMAP embedded data with DBSCAN was found to be 12 and 10, respectively.
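Because the DBSCAN results above depend on the fixed 'EPS' and 'MinPts' values, a minimal brute-force DBSCAN is sketched below to show how those two parameters define core points, border points and noise. This plain-Python version (O(n²) neighborhood queries, Euclidean distance, `min_pts` counting the point itself) is for illustration only and is not the implementation used in the paper; a library implementation would normally be applied to the real embeddings.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: -1 for noise, 0, 1, ... for clusters.

    min_pts counts the point itself, and distances are Euclidean.
    """
    n = len(points)
    labels = [None] * n  # None marks an unvisited point

    def region(i):
        # brute-force neighborhood query, O(n) per call
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbors)
    return labels
```

The cluster number recorded for a run is then the count of distinct non-negative labels, e.g. `len({l for l in labels if l != -1})`.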

To choose the cluster number and filter out an appropriate result from the numerous clustering results obtained from HDBSCAN, we used three strategies: the selected cluster number should lie within the stable range of the elbow structure (Supplementary Fig. S3d), have the maximum frequency across multiple runs of UMAP, and give the minimum clustering performance index (SSDD). First, we identified the range of possible cluster numbers near the elbow structure [43] by determining the relationship between minimum cluster size and cluster number. Since the major tuning parameter of HDBSCAN is the minimum cluster size ('min_cluster_size'), we considered all integer 'min_cluster_size' values in the range [2, 100] [34]. The trend of the estimated cluster number against 'min_cluster_size' is shown in Supplementary Fig. S3 and followed an exponential decay. For 'min_cluster_size' values from 9 to 14, the cluster number lies between 15 and 18. Although the cluster number remained stable for 'min_cluster_size' = 15, it was as low as four (a lower cluster number leads to intermixing). All other tuning parameters of HDBSCAN were left at their defaults. The second strategy was to choose a cluster number obtained with high reproducibility using UMAP and t-SNE. To this end, we obtained the cluster number for each of 1000 runs of t-SNE and HDBSCAN (Fig. 4d). Similarly, 1000 runs of UMAP and HDBSCAN were performed, and 279 of the 1000 runs yielded 17 clusters (Fig. 4e).
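The exponential-decay trend of cluster number versus 'min_cluster_size' can be quantified with a simple log-linear least-squares fit, sketched below. The functional form y = a·exp(-b·x) without an offset is an assumption, since the fitted form is not given here; fitting in log space also weights small cluster numbers more heavily than a nonlinear least-squares fit would. The function name is hypothetical.

```python
import math


def fit_exponential_decay(x, y):
    """Least-squares fit of ln(y) = ln(a) - b * x. Returns (a, b); all y must be > 0."""
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope
```

In use, `x` would be the 'min_cluster_size' values from 2 to 100 and `y` the corresponding cluster numbers; the region where the fitted curve flattens gives a rough, reproducible handle on the elbow.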

For these 279 runs, the SSDD index was then calculated to identify the 2D structure with the best clustering performance. Figure 4f shows the box plot of SSDD values for the 17-cluster HDBSCAN results. The cluster number was set to 17 in this case because it lies well within the optimal region of the elbow structure (Supplementary Fig. S3). A major benefit of choosing a highly reproducible cluster number is reliable performance when reducing the dimension of the original dataset, so that the distribution of the data can be observed in 2D space. These results show that the proposed scheme achieves better reproducibility in selecting the embedded structure and the cluster number.

UMAP-assisted HDBSCAN modeling of collective Ca2+ responses improves the clustering efficiency over existing methods: comparison of efficiency across various techniques

To evaluate the performance of t-SNE and UMAP, we used an unsupervised learning framework in which the dimension-reduced data were clustered using k-means and agglomerative clustering (Supplementary Figs S8 and S9a and b) and DBSCAN and HDBSCAN (Fig. 5a and b). Figure 5a and b shows the visualization performance of UMAP-assisted and t-SNE-assisted clustering of the Ca2+ spiking dataset. In each case, the dataset was reduced to two dimensions and a specific color was assigned to each cluster. Since the ground truth for the dataset was not known, the number of clusters was chosen based on validation indices.

Next, we compared the clustering techniques combined with t-SNE and UMAP dimension reduction using different internal and external validation techniques to determine which set of clusters best approximates the underlying subgroups in the dataset. Figure 6a and b and Supplementary Fig. S10 show heatmaps of the Pearson correlation coefficient between pairwise responses in each cluster for the four methods. The results show that t-SNE and UMAP followed by HDBSCAN perform well in separating similar responses into each cluster. In Fig. 6a and b, red indicates strongly correlated responses and dark blue indicates weakly correlated responses.
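A minimal sketch of how such a correlation heatmap can be assembled: cells are ordered by cluster label so that each cluster forms a contiguous block, and the Pearson coefficient is computed for every pair of traces. The function names are hypothetical, and the traces are assumed to be equal-length, non-constant intensity series.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length traces (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def clustered_correlation_matrix(traces, labels):
    """Order cells by cluster label and return (order, pairwise Pearson matrix).

    Sorting by label places same-cluster cells in contiguous blocks, so clusters
    of mutually correlated responses appear as bright squares on the diagonal.
    """
    order = sorted(range(len(traces)), key=lambda i: labels[i])
    matrix = [[pearson(traces[i], traces[j]) for j in order] for i in order]
    return order, matrix
```

Plotting `matrix` as an image then gives the block structure seen in Fig. 6a and b; off-diagonal blocks with high values would flag clusters that should arguably have been merged.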


Figure 6. Assessment of the proposed framework using correlation analysis and similarity analysis. (A) and (B) Heatmaps of the pairwise Pearson correlation coefficient between two cells within each cluster: (A) t-SNE projection followed by DBSCAN clustering (k = 12) and HDBSCAN clustering (k = 13); (B) UMAP projection followed by DBSCAN clustering (k = 10) and HDBSCAN clustering (k = 17). (C) and (D) Similarity ratio analysis between any pair of cells (cell i and cell j). Each entry in the heatmap shows the similarity ratio of the cell pair consisting of cell i and cell j (i = 1, 2, ..., n; j = 1, 2, ..., n) obtained from 1000 runs: (C) UMAP + HDBSCAN, (D) t-SNE + HDBSCAN.

Although the reduction ratio is 2/339 (each response is compressed into two embedding coordinates), the correlation analysis indicates that the time series within each cluster are similar. For t-SNE and UMAP embeddings with DBSCAN, a large cluster contains the noisy data and the single-spike data (Fig. 6a and b).

Table 1. Comparison of various clustering methods using different validation indices.

Case/Method            SSDD    CVD      CNN      WB     SIL    CH        DB     SDBW
t-SNE Agglomerative    0.416   21.338   96.088   0.588  0.597  1408.702  0.630  0.038
t-SNE k-means          0.458   15.437   93.645   0.645  0.663  1302.229  0.636  0.058
t-SNE DBSCAN           0.331   4.015    130.226  3.274  0.064  247.878   5.814  0.126
t-SNE HDBSCAN          0.321   7.621    113.503  8.894  0.050  90.505    1.177  0.084
UMAP Agglomerative     0.430   88.146   20.402   0.329  0.702  2432.220  0.474  0.020
UMAP k-means           0.436   100.892  20.327   0.283  0.769  2819.391  0.470  0.018
UMAP DBSCAN            0.323   1.464    21.532   1.186  0.117  698.894   3.612  0.051
UMAP HDBSCAN           0.319   44.257   25.919   1.358  0.517  578.176   0.963  0.037

Table 1 shows the performance evaluation of the various clustering algorithms assisted by t-SNE and UMAP using different cluster validation indices. Although UMAP with k-means has the lowest DB index, noise separation was not efficient in this case (Supplementary Fig. S10). Furthermore, k-means leads to intermixing of various response types within the same cluster (Supplementary Fig. S10). This happens because these indices are not appropriate for evaluating irregularly shaped clusters [32]. Hence, we performed the comparison based on SSDD computation (Supplementary Fig. S4). Notably, UMAP with HDBSCAN performed best in separating the different types of Ca2+ spiking responses based on SSDD (lowest value, 0.319). The SSDD value for t-SNE with HDBSCAN (0.321) is very close to that of UMAP with HDBSCAN, so t-SNE can also be used to obtain efficient separation. However, the run time of t-SNE was significantly higher than that of UMAP (Supplementary Fig. S11), and UMAP also gives a lower run time than t-SNE when followed by HDBSCAN. Hence, UMAP with HDBSCAN is the optimal method for this dataset of Ca2+ responses containing noise.
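To see why compactness-based indices can mislead here, consider the silhouette width (SIL in Table 1), sketched below. It compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster, which favors compact, roughly convex groups rather than elongated or density-defined ones. Excluding HDBSCAN noise (label -1) is one possible convention and an assumption of this sketch, since how noise was handled for these indices is not stated here; the function name is hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width over non-noise points (label -1 is ignored).

    Needs at least two clusters; a point alone in its cluster scores 0.
    """
    idx = [i for i, lab in enumerate(labels) if lab != -1]
    clusters = sorted({labels[i] for i in idx})
    scores = []
    for i in idx:
        # distances from point i to every other non-noise point, grouped by cluster
        dists = {c: [] for c in clusters}
        for j in idx:
            if j != i:
                dists[labels[j]].append(math.dist(points[i], points[j]))
        own = dists[labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        # mean distance to the nearest other cluster
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i] and d)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Two tight, well-separated pairs of points score about 0.90, whereas a long curved cluster scores poorly even when it is a single, clearly separated group, which is the failure mode that motivated the use of SSDD.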

Testing of reproducibility for t-SNE and UMAP

To assess whether the same cells are assigned to the same clusters across the 279 runs of UMAP and HDBSCAN (corresponding to cluster number = 17), we recorded whether each pair of cells (i, j) was in the same cluster in a given run. We then calculated, for each pair of cell responses, the fraction of runs in which they were in the same cluster. This ratio is denoted the similarity ratio [33]. We constructed the similarity ratio matrix from the multiple-run analysis for t-SNE and UMAP with HDBSCAN (Fig. 1). Figure 6c and d shows heatmaps of the similarity ratios for HDBSCAN clustering of the 2D projections obtained by UMAP and t-SNE, indicating whether similar responses are assigned to the same cluster across runs. A high similarity ratio indicates that two responses were consistently grouped into the same cluster, and a low ratio indicates that they were grouped into different clusters. Each value in the heatmap is the similarity ratio of the corresponding column and row responses (cell i and cell j); red indicates a high similarity ratio and dark blue a low one. The results reveal 17 groups within which pairwise similarity ratios are high, and show that UMAP with HDBSCAN is the more robust combination for this dataset. Supplementary Figure S12 shows heatmaps of the Ca2+ spiking patterns in each cluster, with cluster labels, for HDBSCAN clustering with t-SNE- and UMAP-assisted dimension reduction.
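The similarity ratio can be computed directly from the label vectors of repeated runs, as in the sketch below. Because only within-run comparisons are made, arbitrary relabeling of clusters between runs does not matter. Treating HDBSCAN noise as an ordinary label (so that two noise points count as grouped together) is an assumption of this sketch, in line with noise being reported as cluster 1; the function name is hypothetical.

```python
def similarity_ratio(label_runs):
    """Fraction of runs in which cells i and j receive the same label.

    label_runs: list of per-run label lists, with cells in the same order in every run.
    Returns an n x n matrix (list of lists) with values in [0, 1].
    """
    n_runs = len(label_runs)
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]
```

For example, with runs `[[0, 0, 1], [1, 1, 0], [0, 1, 1]]`, cells 0 and 1 share a label in two of three runs (ratio 2/3) even though the label values themselves change between runs.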

Automated identification of Ca2+ spiking patterns for diverse structural arrangements by HDBSCAN

The final step is to visualize the spiking patterns of the datasets corresponding to diverse structural arrangements through the labels of the 17 clusters (noise + 16 clusters) obtained by HDBSCAN (Fig. 7a). Seventeen clusters was the most frequent outcome of UMAP and HDBSCAN across the 1000 runs (Fig. 4). Figure 7a shows the 16 spiking patterns, one for each cluster. Each cluster represents a distinct type of spiking pattern in the dataset: clusters 4, 5, 8, 9, 15 and 17 have a high number of peaks with low, medium or high amplitude, and clusters 7, 11, 12, 13 and 14 have a low number of peaks with low, medium or high amplitude (Supplementary Table S4). Cluster 2 shows a plateau-like spiking pattern; clusters 4 and 7 show higher-amplitude spiking with a distinct ISI, and clusters 6 and 10 show low-amplitude spiking with a distinct ISI pattern. Supplementary Table S4 shows that UMAP and HDBSCAN can segregate patterns with various amplitudes and ISIs. The noise (cluster 1) separated by HDBSCAN is shown in Supplementary Fig. S13. Figure 7b maps the spiking patterns from clusters 4, 5 and 7 back to specific structural arrangements in higher packing density regions, whereas the spiking patterns from clusters 10 and 11 correspond to low packing density. We found significant separation between the various structural arrangements. These observations collectively strengthen the link between cell-to-cell connectivity and the duration of oscillation.

Imaging and machine learning framework indicates the existence of distinct clusters with statistical similarity between single-cell responses

Figure 7. Mapping of Ca2+ responses to specific structural arrangements of cells from UMAP and HDBSCAN. (a) Time course of Fluo-4 intensity from single cells in each cluster (k = 2, 3, ..., 17). (b) Significance of selected clusters in terms of structural arrangement: clusters 4, 5 and 7 are mainly present in high packing density regions, whereas clusters 10 and 11 mainly come from lower packing density regions.

Figure 8. Box plots of the parameters mean (μ) and standard deviation (σ) obtained by fitting a log-normal distribution to the Ca2+ response of each cell in the 17 clusters obtained by UMAP and HDBSCAN. (A) Comparison of μ across the 17 clusters and (B) comparison of σ across the 17 clusters. (C) and (D) Heatmaps of P-values (p_ij) from pairwise Kruskal–Wallis analysis between the ith and jth clusters (i, j = 1, 2, ..., 17): (C) P-value heatmap for μ; (D) P-value heatmap for σ. White represents P < 0.05 (significant difference in μ or σ between the ith and jth clusters); red represents P > 0.05 (no significant difference between clusters).

Furthermore, we compared the statistical similarity of Ca2+ spiking within each cluster for the UMAP and HDBSCAN case. To do this, we first fitted the spiking data with different probability distributions, including the normal, gamma, exponential, log-normal, Weibull and Birnbaum–Saunders (B-S) distributions, and used the Akaike information criterion (AIC) to select the best-fitting distribution. Supplementary Figure S16 shows a heatmap of the AIC values of the various distributions fitted to the spiking data from each cluster. In most clusters, all Ca2+ responses follow a single distribution, the log-normal. However, in clusters 1, 16 and 17, the spiking pattern of a few responses follows the normal distribution, whereas in clusters 1, 2, 8, 15, 16 and 17, the data follow the B-S distribution (Supplementary Fig. S17 and Supplementary Table S2). We then compared the parameters mean (μ) and standard deviation (σ) of the log-normal distribution for each cluster. Figure 8a and b shows box plots of μ and σ for the 17 clusters obtained from UMAP and HDBSCAN. Next, we performed pairwise Kruskal–Wallis comparisons of μ and σ between clusters. Figure 8c and d shows heatmaps of the p_ij values from the Kruskal–Wallis analysis, where p_ij denotes the P-value (α = 0.05) for the comparison of the ith cluster with the jth cluster (i, j = 1, 2, ..., 17). For most clusters, μ and σ are significantly different. However, for a few pairs, including clusters (4, 5), (4, 7), (9, 14), (8, 2) and (7, 15), there is no significant difference in either μ or σ of the fitted log-normal distribution (Fig. 8c and d). Nevertheless, for clusters 2, 7 and 15, some of the data are best fitted by the B-S distribution (Supplementary Fig. S17), and hence they are separated into another cluster. Moreover, clusters 4 and 5 differ with respect to ISI, whereas clusters 4 and 7 differ with respect to amplitude. Similarly, both (9, 14) and (7, 15) differ in their ISI pattern. Note that, although clusters 8 and 2 have similar μ and σ of the log-normal distribution, the Ca2+ response pattern of cluster 2 is distinctly different. These results show that the data are automatically grouped into clusters that differ with respect to statistical similarity, ISI and amplitude. These observations suggest that each cluster may attain oscillation characteristics specific to the biophysical parameters associated with it. Such parameters may include the kinetic parameters controlling Ca2+ channel function, gap junction-mediated diffusion and receptor desensitization. Additionally, the transition dynamics within a cluster may be governed by parameters drawn from the same distribution.
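Distribution selection by AIC (AIC = 2k - 2 ln L, with k fitted parameters and maximized likelihood L) can be illustrated for two of the six candidate distributions. The sketch below uses the closed-form maximum-likelihood estimates for the normal and log-normal fits; the gamma, Weibull, exponential and B-S fits would require numerical optimization and are omitted. The sample is synthetic and the function names are hypothetical.

```python
import math
import random


def _normal_loglik(values):
    """Maximized log-likelihood of a normal fit, with MLE mean and (biased) variance."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), mu, math.sqrt(var)


def aic_normal(x):
    """Returns (AIC, mu, sigma) of a normal fit; AIC = 2k - 2 ln L with k = 2."""
    loglik, mu, sigma = _normal_loglik(x)
    return 2 * 2 - 2 * loglik, mu, sigma


def aic_lognormal(x):
    """Returns (AIC, mu, sigma) of a log-normal fit, with mu and sigma on the log scale.

    Requires all x > 0.
    """
    logs = [math.log(v) for v in x]
    loglik_logs, mu, sigma = _normal_loglik(logs)
    # change of variables: the log-normal density is the normal density of ln(x) divided by x
    loglik = loglik_logs - sum(logs)
    return 2 * 2 - 2 * loglik, mu, sigma


# Synthetic right-skewed "intensities" (seeded for reproducibility)
rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 0.8) for _ in range(300)]
aic_n, _, _ = aic_normal(sample)
aic_ln, mu_hat, sigma_hat = aic_lognormal(sample)
```

The distribution with the lowest AIC is selected; because both candidates here have two parameters, the comparison reduces to comparing maximized log-likelihoods.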

Validation of the machine learning method with labeled data

We further validated the proposed method using a labeled time-series dataset (Mallat dataset [49], Supplementary Fig. S18). The proposed method segregated this dataset with more than 94% accuracy for 5 of the 8 classes (Supplementary Table S5); the corresponding time courses are shown in Supplementary Fig. S19.
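The accuracy definition behind Supplementary Table S5 is not spelled out in this section. One common way to score unsupervised clusters against known labels, sketched below as an assumption, is per-class purity: the fraction of each class that falls into that class's most common cluster. The function name is hypothetical.

```python
from collections import Counter


def per_class_accuracy(true_labels, cluster_labels):
    """For each true class, the fraction of its members assigned to that class's
    most common cluster (one possible definition, not necessarily the paper's)."""
    by_class = {}
    for t, c in zip(true_labels, cluster_labels):
        by_class.setdefault(t, []).append(c)
    result = {}
    for t, clusters in by_class.items():
        _, count = Counter(clusters).most_common(1)[0]
        result[t] = count / len(clusters)
    return result
```

Note that this score does not penalize two classes landing in the same cluster; a one-to-one matching between clusters and classes (for example, via the Hungarian algorithm) would be stricter.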

Two-step machine learning approach indicates a significant increase in the subpopulation containing longer duration of oscillation in higher packing density region

This algorithm was then used to compare the response patterns of cell populations exposed to various drug doses across the structural arrangements that arise from the natural self-assembly process after cell seeding. Here, we investigated the effect of structural arrangement between cells through low and high packing density regions of the Petri dishes. Figure 9a shows that the percentages of clusters 5 and 7 remain significantly higher for cell responses from high-density regions, whereas the percentages of clusters 10 and 11 are significantly lower in high packing density regions. Figure 9b shows that clusters 8 and 9 are significantly more frequent at the highest dose. Figure 9c and d shows heatmaps of the signature patterns for higher packing density (clusters 5 and 7) and higher dose (clusters 8 and 9), together with the signatures of low packing density and low dose responses. Overall, the analysis revealed a significant increase in the subpopulation with longer-duration oscillations at higher packing density.

Figure 9. Automated detection of Ca2+ spiking features for various structural arrangements/packing densities and drug doses, and identification of the relative distribution of clusters corresponding to these factors. (A) Bar graph of the average relative percentages of four clusters (5, 7, 10 and 11) for low and high packing density. (B) Bar graph of the average relative percentages of four clusters (8, 9, 1 and 16) for three doses (1, 10 and 100 μM). (C) Heatmap of the time course of Ca2+ spiking patterns from the four clusters (5, 7, 10 and 11) in (A). (D) Heatmap of the time course of Ca2+ spiking patterns from the four clusters (8, 9, 1 and 16) in (B). (E) Example of single-cell Ca2+ data with representative features: basal value (F0), time to reach half maximum (T50U), time to reach maximum (Tm), maximum value (Fm), time to decay to half maximum (T50D), and steady-state final value (Ff). (F) Bar graph of time-series features of Ca2+ signals in the cell population from high packing density (cluster 5) and low packing density (cluster 10).

We further evaluated the trajectory features of cells from clusters 5 and 10 (Fig. 9e and f). The results reveal a clear distinction between the features from low (cluster 10) and high (cluster 5) packing density. We also used pie charts to show the relative distribution of the spiking patterns for low and high packing density at 100 μM (Supplementary Fig. S20), i.e. the relative percentage of the various clusters at lower and higher packing densities. Supplementary Table S3 shows that cases with lower and higher cell-to-cell connectivity yield distinct relative distributions of clusters. Low packing density at low dose mainly contains clusters 1 and 16, with lower amplitude and frequency. In contrast, higher packing density at medium and high doses contains clusters 4 and 7, with higher-frequency and higher-amplitude spiking. Additionally, the algorithm distinguishes cluster 2, which contains cells with a plateau-like pattern present only at high dose and high packing density. These results show that the proposed pipeline can quantify the relative amounts of clusters with complex oscillatory patterns and can be used to identify the biophysical parameters corresponding to various levels of structural arrangement and cell-to-cell connectivity. These observations strengthen the link between stronger communication between cells and sustained oscillation.
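The six trajectory features in Fig. 9e can be extracted from a single trace along the lines of the sketch below. The baseline and final-value window lengths (`n_base`, `n_final`), the half-maximum level F0 + (Fm - F0)/2, and measuring T50U and T50D from the start of the recording are all assumptions of this sketch, not definitions taken from the paper; the function name is hypothetical.

```python
def trajectory_features(trace, dt, n_base=5, n_final=5):
    """Return F0, Fm, Tm, T50U, T50D and Ff for one Ca2+ trace sampled every dt.

    T50D is None if the trace never falls back to the half-maximum level after the peak.
    """
    f0 = sum(trace[:n_base]) / n_base  # basal value: mean of the first n_base samples
    i_max = max(range(len(trace)), key=trace.__getitem__)
    fm = trace[i_max]  # maximum value
    half = f0 + 0.5 * (fm - f0)  # half-maximum level above baseline
    # first sample at or before the peak that reaches half maximum (always exists)
    i_up = next(i for i in range(i_max + 1) if trace[i] >= half)
    # first sample at or after the peak that decays back to half maximum
    i_down = next((i for i in range(i_max, len(trace)) if trace[i] <= half), None)
    ff = sum(trace[-n_final:]) / n_final  # steady-state final value
    return {
        "F0": f0,
        "Fm": fm,
        "Tm": i_max * dt,
        "T50U": i_up * dt,
        "T50D": None if i_down is None else i_down * dt,
        "Ff": ff,
    }
```

For oscillatory traces with many peaks, these features describe only the largest transient; per-peak analysis would require combining this with a peak detector.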

Sustained Ca2+ oscillation in higher packing density: mechanistic insight using UMAP and HDBSCAN-assisted model fitting

In order to obtain mechanistic insight underlying the sustained

oscillation in higher packing density, we performed clustering-

based model fitting. First, we computed the difference in bio-

physical parameter values between the Ca2+responses from var-

ious clusters obtained from UMAP and HDBSCAN. Specifically,

we chose to fit the model presented in Giri et al. [45], for Ca2+

responses from cluster 5 and cluster 10 (Fig. 9a and f). Cluster

5 consists of responses from high packing density and cluster

10 consists of responses from reduced packing density. First,

we found that the receptor desensitization parameter, k2that

is regulated by MAP kinase expression [50,51], is significantly

lower for cells from higher packing density (cluster 5) than that

of lower density (cluster 10) (Fig. 10a). On the other hand, the rate

of [PLC −β]activation (k3) remains significantly higher for higher

packing density (cluster 5) than lower packing density (cluster 10)

(Fig. 10b). In contrast, k5, which denotes the rate constant for Ca2+

influx from extracellular space to cytosol, is not signif icantly dif-

ferent for responses arising from higher and lower packing density

(Fig. 10c). However, the result shows that k5has a much higher

variance in case of cells from higher packing density compared

with that of cells from lower density (Fig. 10c). Figure 10d shows a

box plot comparison of the parameters controlling the Ca2+stores

(k6,rateofCa

2+influx from ER to cytosol) in clusters 5 and 10.

The result clearly shows that the rate constant for Ca2+ influx from the ER to the cytosol (k6) is significantly lower (P < 0.05) for cells at higher packing density, yielding an increase in store Ca2+ (P < 0.05) (Supplementary Fig. S23) compared with cells at low packing density. Hence, we conclude that there are two cell states defined by distinct underlying mechanisms, as shown in Fig. 11a and b. The results indicate that cells at high packing density are associated not only with a higher level of store Ca2+ but also with weak receptor desensitization and higher variability in Ca2+ influx from the extracellular space. This may contribute to oscillations of longer duration.
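Figure 10 compares the cluster-wise parameter distributions with a Mann–Whitney test. As a rough illustration of what that test computes, the pure-Python sketch below implements the two-sided Mann–Whitney U test with the normal approximation (average ranks for ties, tie-corrected variance, 0.5 continuity correction). The k2 values are made-up numbers for illustration only, not fitted parameters from this study, and the function name is our own.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Tied values receive their average rank, and the variance includes the
    standard tie correction plus a 0.5 continuity correction. Returns
    (U for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering each value's position in x + y.
    pooled = sorted((v, idx) for idx, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, p


# Hypothetical fitted k2 values (illustrative numbers, not data from this study).
k2_cluster5 = [0.8, 0.9, 1.0, 1.1, 0.7, 0.95]
k2_cluster10 = [1.6, 1.8, 1.5, 2.0, 1.7, 1.9]
u, p = mann_whitney_u(k2_cluster5, k2_cluster10)
print(f"U = {u}, p = {p:.4f}")
```

In practice, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` computes the same test and can use exact p-values for small samples; the sketch only makes the rank-based logic explicit.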

Figure 12 shows box plots of the features (number of peaks, maximum amplitude and duration of Ca2+ oscillations in single cells) extracted from responses in clusters 5 and 10 from experiments (Fig. 12a, b and c) and simulations (Fig. 12d, e and f). The findings demonstrate that the estimated model parameters capture the specific features of the Ca2+ oscillations measured experimentally for clusters 5 and 10. Also, the specific parameters obtained from single cells through the fitting process differ significantly between high (cluster 5) and low (cluster 10) packing densities (Fig. 10). We further identified the structural arrangements of cells that give rise to the particular patterns in clusters 5 and 10 (Fig. 12g and h). Together, we demonstrate that cluster-based model fitting can be used to form new hypotheses for future experiments and to decipher the mechanisms underlying the various Ca2+ spiking patterns obtained from distinct structural arrangements of cells.
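The three features compared in Fig. 12 can be sketched as follows. We assume a baseline-subtracted trace sampled every `dt` seconds, call a sample a peak if it is a local maximum above a chosen threshold, and take the oscillation period as the mean interval between successive peaks. The Fig. 12 legend defines per-cell values (n, a, t) together with their minima and maxima, which we read, as an assumption, as min-max normalization. These criteria and function names are illustrative and need not match the study's own peak detection.

```python
import math


def extract_features(trace, dt, threshold):
    """Return (number of peaks, maximum amplitude, mean oscillation period).

    A peak is taken to be a local maximum above `threshold`; the period is
    the mean interval between successive peaks (None if fewer than two).
    """
    peaks = [
        i for i in range(1, len(trace) - 1)
        if trace[i] > threshold
        and trace[i] > trace[i - 1]
        and trace[i] >= trace[i + 1]
    ]
    max_amplitude = max(trace)
    if len(peaks) >= 2:
        intervals = [(b - a) * dt for a, b in zip(peaks, peaks[1:])]
        period = sum(intervals) / len(intervals)
    else:
        period = None
    return len(peaks), max_amplitude, period


def min_max_normalize(values):
    """Scale values to [0, 1] as (v - v_min) / (v_max - v_min)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [0.0] * len(values)
    return [(v - v_min) / (v_max - v_min) for v in values]


# Synthetic trace: half-wave rectified sine with a 20-sample period, dt = 0.5 s.
trace = [max(0.0, math.sin(2 * math.pi * k / 20)) for k in range(100)]
n_peaks, amplitude, period = extract_features(trace, dt=0.5, threshold=0.5)
print(n_peaks, round(amplitude, 3), period)
```

Applying `min_max_normalize` across all cells of both clusters would place each feature on a common [0, 1] scale before plotting.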

DISCUSSION

In the field of Ca2+ imaging, several computational toolboxes are available for automated cell segmentation, motion correction and Ca2+ activity identification from time-lapse videos [52–57]. Most of these workflows use Python or MATLAB frameworks for image and signal processing. In contrast, fewer studies have developed tools for clustering complex oscillations obtained by treatment with drugs. The major challenge is to develop and validate a scalable and reliable framework for identifying cells with similar Ca2+ oscillation patterns. Specifically, heterogeneity is remarkably high because of uneven drug distribution and variability in packing density when a large number of Petri dishes or 96-well plates are used.

In this paper, we demonstrate that the proposed tool is able to separate noise from a large dataset and to group time series patterns with similar features. Here, we applied UMAP-assisted HDBSCAN clustering to the Ca2+ response dataset from three doses, which revealed seventeen distinct clusters.
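Density-based clustering labels points in sparse regions as noise instead of forcing them into a cluster, which is how noise can be separated from the dataset. HDBSCAN [21] extends DBSCAN [18] by building a hierarchy over all density thresholds, and we do not reproduce it here; the pure-Python sketch below is plain DBSCAN, shown only to illustrate core points, border points and the -1 noise label on a hypothetical 2D embedding standing in for UMAP coordinates. In practice, one would use the umap-learn package (`umap.UMAP`) and an HDBSCAN implementation such as `hdbscan.HDBSCAN` or `sklearn.cluster.HDBSCAN` (scikit-learn 1.3 and later); the parameter values below are arbitrary.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns one label per point, with -1 marking noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`; clusters grow from core points.
    """
    n = len(points)

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be reclaimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            neighbors = region(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is a core point: keep growing
        cluster_id += 1
    return labels


# Hypothetical 2D embedding: two tight groups of responses and one outlier.
embedding = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, -10)]
print(dbscan(embedding, eps=0.5, min_samples=3))
```

The isolated point receives the noise label rather than distorting either cluster; HDBSCAN keeps this behavior while removing the need to choose a single `eps`.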

Furthermore, we provide a new perspective on the effect of specific structural arrangements in controlling the collective Ca2+ response. This was possible through automated detection, by UMAP-assisted HDBSCAN clustering, of Ca2+ signatures that are specific to structural features or packing density. Overall, the results indicate that cells can sense and respond to the drug according to the self-assembly pattern of the cell population via collective Ca2+ signaling. The physical basis of these clusters could be further analyzed by experiments probing gap-junctional activity in regions of high packing density [11–13,58]. On the other hand, the features of the Ca2+ signature corresponding to low packing density can be attributed to paracrine signaling [38,59]. In the 2D visualization of UMAP-assisted HDBSCAN clustering, the dataset forms distinct clusters with respect to amplitude, ISI and frequency (Supplementary Table S4).

Commonly used techniques include functional clustering, which is time-consuming for large sample sizes. Although k-means and fuzzy clustering have been implemented along with feature extraction [4,5,17], there can be significant intermixing between patterns. Hence, we present a detailed comparison of various clustering techniques after dimension reduction using t-SNE and UMAP (Supplementary Figs S7, S8 and S9). Commonly used cluster validity indices [28] work well for partitions whose clusters are spherically distributed, have similar sizes and densities, and are well separated. To evaluate the relative performance of clustering of Ca2+ spiking trains, which is an unsupervised problem, we used the SSDD index [32]. This index is based on computing the minimum spanning tree (MST) over the points within a cluster.
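The SSDD index [32] is built on minimum spanning trees computed within clusters. We do not reproduce the full SSDD formula here; the sketch below only shows the shared building block, Prim's algorithm on the complete Euclidean graph of one cluster's points, together with the longest MST edge as a simple, hypothetical measure of within-cluster spread. The point coordinates are invented for illustration.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) edges; O(n^2) time, which is adequate
    for the points of a single cluster.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u to the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


cluster_points = [(0, 0), (1, 0), (3, 0)]  # hypothetical members of one cluster
edges = minimum_spanning_tree(cluster_points)
longest_edge = max(w for _, _, w in edges)
print(edges, longest_edge)
```

Because an MST follows the shape of the point cloud rather than assuming a centroid, MST-based quantities remain meaningful for the irregular, non-spherical clusters that the preceding paragraph says centroid-based validity indices handle poorly.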

The proposed pipeline has the potential to be used for selecting the best-suited method for any Ca2+ imaging dataset. The results show that, overall, UMAP performs better than t-SNE on the dataset investigated in this work. We also found that HDBSCAN shows an improvement over the other clustering algorithms. In previous bioinformatics studies, UMAP has been found to be superior to t-SNE and PCA [43,60]. However, most of the biological datasets in these studies are static. One of the major novelties of the proposed pipeline is to show that the algorithm works efficiently on oscillatory responses with large variability in spiking patterns. Moreover, UMAP required less computation time than t-SNE on this dataset, as also reported by others [24,43,60,61].

16 | Integrative Biology, 2023

Figure 10. Mechanistic insight on cell states from various clusters having distinct Ca2+ oscillation patterns, and illustration of the differences in parameters as manifested in Ca2+ spiking patterns. Distinct parameter distributions underlying cluster 10 (low packing density, treated with 10 μM norepinephrine) and cluster 5 (high packing density, treated with 10 μM norepinephrine) indicate two distinct cell states. Box plot comparison of four parameters for clusters 5 and 10: (A) k2, rate constant for receptor desensitization (negative feedback parameter); (B) k3, rate constant for PLC-β activation; (C) k5, rate constant for Ca2+ influx from the extracellular space to the cytosol (modeled as a function of [PLC-β]; Kummer et al. [46]); (D) k6, rate of Ca2+ influx from the ER to the cytosol. The kinetic parameters are obtained from the fits to Ca2+ responses from clusters 5 and 10. *P < 0.05, Mann–Whitney test.

In previous work, clustering was performed on extracted features [4,5,17,62], but those methods were based on measuring the number of peaks and the amplitude. The major bottleneck of these methods is that even when various spiking trains have the same number of peaks, their response patterns can be distinctly different. Those methods fail to separate similar patterns into groups (Supplementary Figs S21 and S22). Through the proposed framework of dimension reduction and clustering, this paper provides a method for extracting the types of spiking for various drug doses and distinct structural arrangements of cells, which can also be used as a better visualization tool. The method enables detection of trajectory features such as variation in amplitude, variation in ISI and damping pattern.
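To make the limitation of peak-count features concrete, the sketch below builds two hypothetical traces, one sustained and one damped, with the same number of peaks. The peak count cannot tell them apart, while a Euclidean distance over the full trace (the kind of whole-trace input a dimension-reduction step can work from) separates them clearly. The traces, the threshold and the helper names are illustrative assumptions.

```python
import math


def count_peaks(trace, threshold):
    """Number of local maxima above `threshold`."""
    return sum(
        1 for i in range(1, len(trace) - 1)
        if trace[i] > threshold
        and trace[i] > trace[i - 1]
        and trace[i] >= trace[i + 1]
    )


def euclidean(a, b):
    """Euclidean distance between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical traces with five cycles each (period of 20 samples).
sustained = [max(0.0, math.sin(2 * math.pi * k / 20)) for k in range(100)]
damped = [s * math.exp(-k / 30) for k, s in enumerate(sustained)]

print(count_peaks(sustained, 0.05), count_peaks(damped, 0.05))
print(round(euclidean(sustained, damped), 2))
```

Both traces report five peaks, so a peak-count feature places them in the same group even though one oscillation is sustained and the other decays.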

From the clustering results, we found that more than 80% of the Ca2+ responses in cluster 5 come from high packing density, whereas the responses in cluster 10 come from lower packing density. To obtain mechanistic insight into the cell states of cluster 5 and cluster 10, we fitted a model of the network structure underlying GPCR-mediated signaling [45] (Fig. 11a and b). The analysis provides insight into the factors that cause sustained oscillation at higher packing density. It predicts that packing density leads to differences in the intrinsic parameters of the cells, including the rate constants of various Ca2+ channel activities and the parameter regulating receptor desensitization.
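The statement that more than 80% of the responses in cluster 5 come from high packing density is a per-cluster composition. A minimal sketch of that bookkeeping, assuming each cell carries an HDBSCAN cluster label (with -1 for noise) and a packing-density annotation, is shown below; the labels and annotations are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_composition(labels, conditions):
    """For each cluster label, the fraction of members from each condition."""
    counts = defaultdict(Counter)
    for label, condition in zip(labels, conditions):
        counts[label][condition] += 1
    return {
        label: {cond: c / sum(cnt.values()) for cond, c in cnt.items()}
        for label, cnt in counts.items()
    }


# Hypothetical cluster labels (-1 = HDBSCAN noise) and packing-density tags.
labels = [5, 5, 5, 5, 5, 10, 10, 10, -1]
density = ["high", "high", "high", "high", "low", "low", "low", "high", "low"]
composition = cluster_composition(labels, density)
print(composition[5])  # fraction of cluster-5 cells per density condition
```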

Mapping of structural arrangement of cells and collective calcium transients | 17

Figure 11. Schematic diagram of signaling pathways for various cell states (A) from cluster 10 (low packing density) and (B) from cluster 5 (high packing density).

Although further experiments are needed to determine the cause of the differences in these parameters across packing densities, our analysis agrees with previous experimental results on HeLa cells seeded at different densities [14]. That study showed that when cells were densely packed, both the basal activity of mitogen-activated protein (MAP) kinase and the Ca2+ store content increased [14]. Since higher MAPK expression leads to higher levels of ERK and GRK that inhibit the receptor desensitization process [50,51], the receptor desensitization level denoted by k2 might be lower at higher packing density, as found in the proposed analysis. Similarly, the cluster-wise parameter estimation predicts an increase in store Ca2+, as found experimentally [14]. On the other hand, the rate constant for Ca2+ influx from the extracellular medium may be highly variable at higher density owing to variation in gap-junction-mediated IP3 diffusion from adjacent cells through proteins such as connexins [11,63–65]. This would lead to greater stochasticity in k5 at higher packing density than at lower packing density, as found from cluster-wise model fitting. Without clustering, estimating these parameters remains challenging because of the significant variability in oscillation patterns. The proposed method enables efficient clustering of complex oscillations using an unsupervised framework, followed by parameter estimation specific to each cluster. Such analysis provides insight into particular cell states based on their structural arrangement and neighborhood properties.

We also extracted features of the Ca2+ responses for the two signature clusters from higher and lower packing density and found significant differences in the number of peaks, maximum amplitude, oscillation period and store Ca2+. In this study, we show that cluster 5, with higher frequency and amplitude, implies a higher Ca2+ store content than cluster 10 (Supplementary Fig. S23). Hence, HDBSCAN-based clustering of a Ca2+ spiking dataset can assist in estimating cluster-specific parameters during the construction of a biophysical model [66] that explains the regulation of oscillation [10,37].
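The cluster-wise parameter estimation described above uses the Giri et al. [45] model, which we do not reproduce here. As a deliberately simplified stand-in, the sketch below fits a hypothetical one-parameter toy model, a peak-amplitude envelope A·exp(-lam·t), to each cell by log-linear least squares and collects the fitted lam per cluster, skipping HDBSCAN noise (label -1) and cells with too few peaks. The toy model, function names and data are illustrative assumptions; the resulting per-cluster distributions could then be compared with a rank test, as in Fig. 10.

```python
import math
from collections import defaultdict


def fit_decay_rate(times, amplitudes):
    """Least-squares fit of log(amplitude) = log(A) - lam * t; returns lam.

    Toy model only: a single exponential envelope of the peak amplitudes.
    """
    logs = [math.log(a) for a in amplitudes]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    return -sxy / sxx


def clusterwise_decay(peak_data, labels):
    """Collect fitted decay rates per cluster.

    peak_data[i] is (peak_times, peak_amplitudes) for cell i; HDBSCAN noise
    (label -1) and cells with fewer than two peaks are skipped.
    """
    rates = defaultdict(list)
    for (times, amps), label in zip(peak_data, labels):
        if label == -1 or len(times) < 2:
            continue
        rates[label].append(fit_decay_rate(times, amps))
    return dict(rates)


# Hypothetical per-cell peak times (s) and amplitudes.
peak_data = [
    ([0, 10, 20], [1.0, 1.0, 1.0]),                        # sustained
    ([0, 10, 20], [1.0, 0.9, 1.1]),                        # sustained, noisy
    ([0, 10, 20], [1.0, math.exp(-1.0), math.exp(-2.0)]),  # damped
    ([0], [1.0]),                                          # too few peaks
]
labels = [5, 5, 10, -1]
print(clusterwise_decay(peak_data, labels))
```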

Our proposed framework can also be applied to other biological time-course transients that exhibit complex behavior similar to Ca2+ transients, such as NF-κB and ERK dynamics [67] (Supplementary Fig. S24). As with the Ca2+ dataset, the results show that UMAP provides better segregation of patterns than t-SNE for the ERK dataset (Supplementary Fig. S24). In general, the framework enables grouping of similar responses from a dataset of heterogeneous and complex transients. Such datasets may be obtained from various drug doses and packing densities, a typical outcome of most biological experiments performed in Petri dishes. Additionally, our framework for cluster-wise parameter estimation can be used to identify cell states and to differentiate between the signaling pathways of cell states corresponding to distinct structural arrangements.

Such a clustering framework for the analysis of Ca2+, NF-κB and ERK transients might also be important for deciphering intrinsic and extrinsic noise cluster-wise. Specifically, parameter estimation in individual clusters may provide insight into extrinsic noise and cell state. Because information transmission capacity depends on intrinsic as well as extrinsic noise [67], the technique may further be useful for cluster-wise characterization of this capacity. It may also be relevant to fine-grained estimation of information transmission capacity in connection with structural (and possibly spatial) mapping of cells. We further speculate that cluster-wise characterization of this capacity may yield insight into spatiotemporal heterogeneity and hence into higher-level functionality.
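Information transmission capacity is the maximum of the mutual information between stimulus and response over input distributions. As a small illustration of the underlying quantity, the plug-in sketch below computes I(S;R) in bits for one fixed, empirical input distribution. It is not a capacity estimate, the plug-in estimator is biased upward for small samples, and using HDBSCAN cluster labels as the discretized response is our hypothetical choice, not a method from this study or from [67].

```python
import math
from collections import Counter


def mutual_information_bits(stimuli, responses):
    """Plug-in estimate of I(S; R) in bits from paired discrete samples."""
    n = len(stimuli)
    joint = Counter(zip(stimuli, responses))
    p_s = Counter(stimuli)
    p_r = Counter(responses)
    mi = 0.0
    for (s, r), count in joint.items():
        p_sr = count / n
        mi += p_sr * math.log2(p_sr / ((p_s[s] / n) * (p_r[r] / n)))
    return mi


# Hypothetical doses (stimulus) and cluster labels (discretized response).
doses = ["low", "low", "high", "high"]
print(mutual_information_bits(doses, [10, 10, 5, 5]))  # 1.0: response tracks dose
print(mutual_information_bits(doses, [10, 5, 10, 5]))  # 0.0: response ignores dose
```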

Figure 12. Comparison of cytosolic Ca2+ oscillation patterns from experiment and simulation using UMAP- and HDBSCAN-assisted model fitting. Box plots compare the extracted features for the two cell states from experiment and simulation: (A) number of peaks in clusters 5 and 10 from experiment; (B) maximum amplitude in clusters 5 and 10 from experiment; (C) oscillation period in clusters 5 and 10 from experiment; (D) number of peaks in clusters 5 and 10 from simulation; (E) maximum amplitude in clusters 5 and 10 from simulation; and (F) oscillation period in clusters 5 and 10 from simulation. Distinct features of Ca2+ oscillations were found for low and high packing density; the structural arrangement of the cells regulates the features of the Ca2+ oscillation pattern. (H) and (G) Packing density and cell morphology specific to clusters 5 and 10, respectively. [n = number of peaks of the current cell, n_max = maximum number of peaks, n_0 = minimum number of peaks, a = maximum amplitude of the current cell, a_max = maximum of the maximum amplitudes, a_0 = minimum of the maximum amplitudes, t = oscillation period of the current cell, t_max = maximum oscillation period, t_0 = minimum oscillation period]. *P < 0.05, Mann–Whitney test.

In this work, the different cell states associated with packing density are interpreted as a case of spatial heterogeneity driven by deterministic mechanisms. A similar observation of temporal heterogeneity in Ca2+ oscillations was reported in previous studies [68]. A major advantage of such deterministic mechanisms in mammalian signaling pathways is that specific mechanisms underlie cell states that lead to distinct cell fates and help cells make particular decisions [68]. It has been shown that a higher level of stimulus induces a plateau profile of Ca2+ dynamics, along with sequestration of Ca2+ in subcellular compartments, leading to cell blebbing and lytic cell death [9]. In future work, the range of parameter sets could be obtained using an advanced estimation framework [66] for all 17 clusters found by HDBSCAN. The deterministic mechanism for each cluster may shed light on a range of cell states and responses specific to structural arrangement and stimulus level. In particular, specific properties of the Ca2+ spiking train may control norepinephrine-mediated changes in cell contractility [69], which can be investigated further.

The proposed computational tool can be further validated with large datasets of cytosolic Ca2+ recorded from thousands of neurons in animal models using genetic sensors and two-photon microscopy [70]. Furthermore, to achieve fully automated video analysis, an image processing module for segmenting cells in different time frames can be coupled with the proposed dimension reduction and clustering. Such a framework would be important for automated analysis of fluorescence images during high-content drug screening, dose selection and toxicity mapping.

Acknowledgements

We would like to thank Dr Kishalay Mitra for his valuable suggestions. We also thank Dr Gautam Narasimhan for allowing us to conduct the confocal imaging experiments. The authors acknowledge the research facilities provided by the Indian Institute of Technology Hyderabad, India, and the Ministry of Education for the fellowship support for Suman Gare.

Funding

This work was supported by DBT (BT/PR16582/BID/7/667/2016) and the Department of Science and Technology (MSC/2020/000592).


Declaration of competing interest

The authors declare no conflict of interest.

Data availability

The authors confirm that the data supporting the findings of this study are available within the article and/or its supplementary materials.

References

1. Martinez NJ, Titus SA, Wagner AK et al. High-throughput fluorescence imaging approaches for drug discovery using in vitro and in vivo three-dimensional models. Expert Opin Drug Discov 2015;10:1347–61.

2. Berridge MJ, Lipp P, Bootman MD. The versatility and universality of calcium signalling. Nat Rev Mol Cell Biol 2000;1:11.

3. Seshadri S, Hoeppner DJ, Tajinda K. Calcium imaging in drug discovery for psychiatric disorders. Front Psych 2020;11:713.

4. Swain S, Gupta RK, Ratnayake K et al. Confocal imaging and k-means clustering of GABA(B) and mGluR mediated modulation of Ca(2+) spiking in hippocampal neurons. ACS Chem Neurosci 2018;9:3094–107.

5. Gupta RK, Swain S, Kankanamge D et al. Comparison of calcium dynamics and specific features for G protein–coupled receptor–targeting drugs using live cell imaging and automated analysis. SLAS Discovery 2017;22:848–58.

6. Nash MS, Young KW, Challiss RAJ et al. Intracellular signalling: receptor-specific messenger oscillations. Nature 2001;413:381.

7. Bao XR, Fraser IDC, Wall EA et al. Variability in G-protein-coupled signaling studied with microfluidic devices. Biophys J 2010;99:2414–22.

8. Dhyani V, Gare S, Gupta RK et al. GPCR mediated control of calcium dynamics: a systems perspective. Cell Signal 2020;74:109717.

9. Manohar K, Gare S, Chel S et al. Quantitative confocal microscopy for grouping of dose-response data: deciphering calcium sequestration and subsequent cell death in the presence of excess norepinephrine. SLAS Technol 2021;26:24726303211019390.

10. Sun J, Hoying JB, Deymier PA et al. Cellular architecture regulates collective calcium signaling and cell contractility. PLoS Comput Biol 2016;12:e1004955.

11. Lin GC, Rurangirwa JK, Koval M et al. Gap junctional communication modulates agonist-induced calcium oscillations in transfected HeLa cells. J Cell Sci 2004;117:881–7.

12. Balaji R, Bielmeier C, Harz H et al. Calcium spikes, waves and oscillations in a large, patterned epithelial tissue. Sci Rep 2017;7:42786.

13. Potter GD, Byrd TA, Mugler A et al. Communication shapes sensory response in multicellular networks. Proc Natl Acad Sci 2016;113:10334–9.

14. Morita M, Nakane A, Fujii Y et al. High cell density upregulates calcium oscillation by increasing calcium store content via basal mitogen-activated protein kinase activity. PLoS One 2015;10:e0137610.

15. Feldt Muldoon S, Soltesz I, Cossart R. Spatially clustered neuronal assemblies comprise the microstructure of synchrony in chronically epileptic networks. Proc Natl Acad Sci U S A 2013;110:3567–72.

16. Feldt S, Waddell J, Hetrick VL et al. Functional clustering algorithm for the analysis of dynamic network data. Phys Rev E Stat Nonlin Soft Matter Phys 2009;79:56104.

17. Pantula PD, Miriyala SS, Mitra K. An evolutionary neuro-fuzzy C-means clustering technique. Eng Appl Artif Intel 2020;89:103435.

18. Ester M, Kriegel HP, Sander J et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. KDD'96. United States: AAAI Press, 1996, 226–31.

19. Gare S, Chel S, Kuruba M et al. Dimension reduction and clustering of single cell calcium spiking: comparison of t-SNE and UMAP. In: 2021 National Conference on Communications (NCC). United States: IEEE, 2021.

20. Sharafoddini A, Dubin JA, Lee J. Identifying subpopulations of septic patients: a temporal data-driven approach. Comput Biol Med 2021;130:1–16.

21. Campello RJGB, Moulavi D, Sander J. Density-Based Clustering Based on Hierarchical Density Estimates. Berlin, Heidelberg: Springer, 2013.

22. Chel S, Gare S, Giri L. Detection of Specific Templates in Calcium Spiking in HeLa Cells Using Hierarchical DBSCAN: Clustering and Visualization of Cell-Drug Interaction at Multiple Doses. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). United States: Institute of Electrical and Electronics Engineers Inc., 2020, 2425–8.

23. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–605.

24. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Institute of Electrical and Electronics Engineers Inc., United States, published online February 9, 2018.

25. Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol 1933;24:417–41.

26. Daniels AL, Calderon CP, Randolph TW. Machine learning and statistical analyses for extracting and characterizing "fingerprints" of antibody aggregation at container interfaces from flow microscopy images. Biotechnol Bioeng 2020;117:3322–35.

27. Saxena A, Ravutla S, Upadhyay V et al. Statistical modeling of cell-to-cell variability in viral infection during passaging in suspension cell culture: application in Monte-Carlo simulation. Biotechnol Bioeng 2020;117:1483–501.

28. Al-jabery KK, Obafemi-Ajayi T, Olbricht GR et al. Evaluation of cluster validation metrics. In: Computational Learning Approaches to Data Analytics in Biomedical Applications. United States: Elsevier, 2020, 189–208.

29. Zhao Q, Fränti P. WB-index: a sum-of-squares based index for cluster validity. Data Knowl Eng 2014;92:77–89.

30. Liu Y, Li Z, Xiong H et al. Understanding and enhancement of internal clustering validation measures. IEEE Trans Cybern 2013;43:982–94.

31. Moulavi D, Jaskowiak PA, Campello RJGB et al. Density-based clustering validation. In: Proceedings of the 2014 SIAM International Conference on Data Mining. United States: Society for Industrial and Applied Mathematics, 2014, 839–47.

32. Liang S, Han D, Yang Y. Cluster validity index for irregular clustering results. Appl Soft Comput 2020;95:106583.

33. Greengard P, Liu Y, Steinerberger S et al. Factor clustering with t-SNE. SSRN Electron J 2020.

34. Vermeulen M, Smith K, Eremin K et al. Application of uniform manifold approximation and projection (UMAP) in spectral imaging of artworks. Spectrochim Acta A Mol Biomol Spectrosc 2021;252:119547.

35. Kaji H, Takoh K, Nishizawa M et al. Intracellular Ca2+ imaging for micropatterned cardiac myocytes. Biotechnol Bioeng 2003;81:748–51.

36. Pinto MCX, Tonelli FMP, Vieira ALG et al. Studying complex system: calcium oscillations as attractor of cell differentiation. Integr Biol 2016;8:130–48.

37. Petersen AP, Cho N, Lyra-Leite DM et al. Regulation of calcium dynamics and propagation velocity by tissue microstructure in engineered strands of cardiac tissue. Integr Biol 2020;12:34–46.

38. Wang N, de Bock M, Decrock E et al. Paracrine signaling through plasma membrane hemichannels. Biochimica et Biophysica Acta (BBA) - Biomembranes 2013;1828:35–50.

39. Manghani C, Gupta A, Tripathi V et al. Cardioprotective potential of curcumin against norepinephrine-induced cell death: a microscopic study. J Microsc 2017;265:232–44.

40. Levy B, Clere-Jehl R, Legras A et al. Epinephrine versus norepinephrine for cardiogenic shock after acute myocardial infarction. J Am Coll Cardiol 2018;72:173–82.

41. Maletic V, Eramo A, Gwin K et al. The role of norepinephrine and its α-adrenergic receptors in the pathophysiology and treatment of major depressive disorder and schizophrenia: a systematic review. Front Psych 2017;8:1–12.

42. Venkateswarlu K, Suman G, Dhyani V et al. Three-dimensional imaging and quantification of real-time cytosolic calcium oscillations in microglial cells cultured on electrospun matrices using laser scanning confocal microscopy. Biotechnol Bioeng 2020;e117:1–16.

43. Hozumi Y, Wang R, Yin C et al. UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets. Comput Biol Med 2021;131:104264.

44. Wang L, Chen P, Chen L et al. Ship AIS trajectory clustering: an HDBSCAN-based approach. J Mar Sci Eng 2021;9:1–20.

45. Giri L, Patel AK, Karunarathne WKA et al. A G-protein subunit translocation embedded network motif underlies GPCR regulation of calcium oscillations. Biophys J 2014;107:242–54.

46. Kummer U, Olsen LF, Dixon CJ et al. Switching from simple to complex oscillations in calcium signaling. Biophys J 2000;79:1188–95.

47. Larsen AZ, Olsen LF, Kummer U. On the encoding and decoding of calcium signals in hepatocytes. Biophys Chem 2004;107:83–99.

48. Upadhyay V, Teja RS, Dhyani V et al. A model screening framework for the generation of Ca2+ oscillations in hippocampal neurons using differential evolution. In: 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER). United States: IEEE, 2019, 961–4.

49. Dau HA, Keogh E, Kamgar K et al. The UCR time series classification archive. UCR Archive 2018. URL https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

50. Fu X, Koller S, Abd Alla J et al. Inhibition of G-protein-coupled receptor kinase 2 (GRK2) triggers the growth-promoting mitogen-activated protein kinase (MAPK) pathway. J Biol Chem 2013;288:7738–55.

51. Elorza A, Penela P, Sarnago S et al. MAPK-dependent degradation of G protein-coupled receptor kinase 2. J Biol Chem 2003;278:29164–73.

52. Radstake FDW, Raaijmakers EAL, Luttge R et al. CALIMA: the semi-automated open-source calcium imaging analyzer. Comput Methods Programs Biomed 2019;179:104991.

53. Pnevmatikakis EA. Analysis pipelines for calcium imaging data. Curr Opin Neurobiol 2019;55:15–21.

54. Cantu DA, Wang B, Gongwer MW et al. EZcalcium: open-source toolbox for analysis of calcium imaging data. Front Neural Circuits 2020;14:25.

55. Delestro F, Scheunemann L, Pedrazzani M et al. In vivo large-scale analysis of Drosophila neuronal calcium traces by automated tracking of single somata. Sci Rep 2020;10:7153.

56. Booij TH, Price LS, Danen EHJ. 3D cell-based assays for drug screens: challenges in imaging, image analysis, and high-content analysis. SLAS Discov 2019;24:615–27.

57. Nayak L, De RK. An algorithm for modularization of MAPK and calcium signaling pathways: comparative analysis among different species. J Biomed Inform 2007;40:726–49.

58. Choi EJ, Palacios-Prado N, Sáez JC et al. Confirmation of Connexin45 underlying weak gap junctional intercellular coupling in HeLa cells. Biomolecules 2020;10:1389.

59. Kepseu WD, Woafo P. Intercellular waves propagation in an array of cells coupled through paracrine signaling: a computer simulation study. Phys Rev E 2006;73:41912.

60. Becht E, McInnes L, Healy J et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 2019;37:38–47.

61. Kanapeckaitė A, Burokienė N. Insights into therapeutic targets and biomarkers using integrated multi-'omics' approaches for dilated and ischemic cardiomyopathies. Integr Biol 2021;13:121–37.

62. Saxena A, Dhyani V, Suman G et al. Effect of topology and time window on probability distribution underlying baclofen induced Ca2+ response in hippocampal neurons. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. United States: Institute of Electrical and Electronics Engineers Inc., 2019.

63. Choi EJ, Palacios-Prado N, Sáez JC et al. Confirmation of Connexin45 underlying weak gap junctional intercellular coupling in HeLa cells. Biomolecules 2020;10:1389.

64. Rimkutė L, Jotautis V, Marandykina A et al. The role of neural connexins in HeLa cell mobility and intercellular communication through tunneling tubes. BMC Cell Biol 2016;17:3.

65. Paemeleire K, Martin PEM, Coleman SL et al. Intercellular calcium waves in HeLa cells expressing GFP-labeled Connexin 43, 32, or 26. Mol Biol Cell 2000;11:1815–27.

66. Yao J, Pilko A, Wollman R. Distinct cellular states determine calcium signaling response. Mol Syst Biol 2016;12:1–12.

67. Selimkhanov J, Taylor B, Yao J et al. Accurate information transmission through dynamic biochemical signaling networks. Science 2014;346:1370–3.

68. Sumit M, Jovic A, Neubig RR et al. A two-pulse cellular stimulation test elucidates variability and mechanisms in signaling pathways. Biophys J 2019;116:962–73.

69. Huang J, Liu Y, Chen J et al. Harmine is an effective therapeutic small molecule for the treatment of cardiac hypertrophy. Acta Pharmacol Sin 2022;43:50–63.

70. Stringer C, Pachitariu M. Computational processing of neural recordings from calcium imaging data. Curr Opin Neurobiol 2019;55:22–31.

Multi-frame sampling and DBSCAN based approach for segmentation of Hela Cells from Time-Lapse Fluorescent Images

Conference Paper

Dec 2023

A Novel Imaging Mass Spectrometry Computational Tool for Biomedical Discovery: Untargeted pixel-by-pixel imaging of Metabolite Ratio Pairs

Preprint

Full-text available

Jan 2024

Mass spectrometry imaging (MSI) is a powerful technology that can be employed to define the spatial distribution and relative abundance of structurally identified and yet-undefined metabolites across a tissue cryosection. While numerous software packages enable pixel-by-pixel imaging of individual metabolite distributions, the research community lacks a discovery tool that provides spatial imaging of all metabolite abundance ratio pairs. Importantly, the recognition of correlating metabolite pairs offers a strategy to discover unanticipated molecules that contribute to or regulate a shared metabolic pathway, uncover hidden metabolic heterogeneity across cells and tissue subregions, and offers a single timepoint indicator of flux through a particular metabolic pathway of interest. Here, we describe the development and implementation of an untargeted R package workflow for pixel-by-pixel imaging of ratios for all metabolites detected in an MSI experiment. Considering untargeted MSI studies of murine brain and embryogenesis, we demonstrate that ratio imaging offers the opportunity to minimize systematic data variation introduced during sample handling or due to instrument drift, markedly enhances spatial image resolution, and can serve to reveal previously unrecognized tissue regions that are metabotype-distinct. Furthermore, ratio imaging facilitates the discovery of novel regional biomarkers, and can provide anatomical information regarding the spatial distribution of metabolite-linked biochemical pathways. The algorithm described herein is generic and can be applied to any MSI dataset containing spatial information for metabolites, peptides or proteins. Importantly, this software package offers a powerful add-on tool that can significantly enhance knowledge obtained from currently employed spatial metabolite profiling technologies.

Ship ais trajectory clustering: An hdbscan-based approach

Article

Full-text available

May 2021

The Automatic Identification System (AIS) of ships provides massive data for maritime transportation management and related researches. Trajectory clustering has been widely used in recent years as a fundamental method of maritime traffic analysis to provide insightful knowledge for traffic management and operation optimization, etc. This paper proposes a ship AIS trajectory clustering method based on Hausdorff distance and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), which can adaptively cluster ship trajectories with their shape characteristics and has good clustering scalability. On this basis, a re-clustering method is proposed and comprehensive clustering performance metrics are introduced to optimize the clustering results. The AIS data of the estuary waters of the Yangtze River in China has been utilized to conduct a case study and compare the results with three popular clustering methods. Experimental results prove that this method has good clustering results on ship trajectories in complex waters.

Harmine is an effective therapeutic small molecule for the treatment of cardiac hypertrophy

Article

Full-text available

Mar 2021

Harmine is a β-carboline alkaloid isolated from Banisteria caapi and Peganum harmala L with various pharmacological activities, including antioxidant, anti-inflammatory, antitumor, anti-depressant, and anti-leishmanial capabilities. Nevertheless, the pharmacological effect of harmine on cardiomyocytes and heart muscle has not been reported. Here we found a protective effect of harmine on cardiac hypertrophy in spontaneously hypertensive rats in vivo. Further, harmine could inhibit the phenotypes of norepinephrine-induced hypertrophy in human embryonic stem cell-derived cardiomyocytes in vitro. It reduced the enlarged cell surface area, reversed the increased calcium handling and contractility, and downregulated expression of hypertrophy-related genes in norepinephrine-induced hypertrophy of human cardiomyocytes derived from embryonic stem cells. We further showed that one of the potential underlying mechanism by which harmine alleviates cardiac hypertrophy relied on inhibition of NF-κB phosphorylation and the stimulated inflammatory cytokines in pathological ventricular remodeling. Our data suggest that harmine is a promising therapeutic agent for cardiac hypertrophy independent of blood pressure modulation and could be a promising addition of current medications for cardiac hypertrophy.

Dimension Reduction and Clustering of Single Cell Calcium Spiking: Comparison of t-SNE and UMAP

Conference Paper

Jul 2021

Quantitative Confocal Microscopy for Grouping of Dose–Response Data: Deciphering Calcium Sequestration and Subsequent Cell Death in the Presence of Excess Norepinephrine

Article

Aug 2021

Fluorescent calcium (Ca²⁺) imaging is one of the preferred methods to record cellular activity during in vitro preclinical studies, high-content drug screening, and toxicity analysis. Visualization and analysis of dose–response data obtained using high-resolution imaging remain challenging due to the inherent heterogeneity of Ca²⁺ spiking. To address this challenge, we propose measurement of cytosolic Ca²⁺ ions using spinning-disk confocal microscopy combined with scalable machine learning–based analytics. First, we implemented uniform manifold approximation and projection (UMAP) in Python to visualize the multivariate time-series dataset in the two-dimensional (2D) plane. The dataset was obtained through live imaging experiments of norepinephrine-induced Ca²⁺ oscillations in HeLa cells over a wide range of doses. Second, we demonstrate that the proposed framework can depict the grouping of spiking patterns for lower and higher drug doses. To the best of our knowledge, this is the first attempt at UMAP visualization of time-series dose responses and identification of the Ca²⁺ signature during lytic death. Such quantitative microscopy can be used as a component of a high-throughput data analysis workflow for toxicity analysis.

Insights into therapeutic targets and biomarkers using integrated multi-‘omics’ approaches for dilated and ischemic cardiomyopathies

Article

May 2021
INTEGR BIOL-UK

At present, heart failure (HF) treatment targets only the symptoms, based on the severity of left ventricular dysfunction; however, the lack of systemic ‘omics’ studies and available biological data to uncover the heterogeneous underlying mechanisms signals the need to shift the analytical paradigm towards network-centric and data-mining approaches. This study, for the first time, aimed to investigate how bulk and single-cell RNA sequencing, as well as proteomic analysis of human heart tissue, can be integrated to uncover HF-specific networks and potential therapeutic targets or biomarkers. We also aimed to address the issue of working with a limited number of samples and to show how appropriate statistical models, enrichment with other datasets, and machine learning-guided analysis can help in such cases. Furthermore, we elucidated specific gene expression profiles using transcriptomic data and data mined from public databases. This was achieved using a two-step machine learning algorithm that predicts the likelihood of therapeutic target or biomarker tractability based on a novel scoring system, which is also introduced in this study. The described methodology could be very useful for target or biomarker selection and evaluation during the pre-clinical therapeutics development stage, as well as for monitoring disease progression. In addition, the present study sheds new light on the complex aetiology of HF, differentiating between subtle changes in dilated cardiomyopathies (DCs) and ischemic cardiomyopathies (ICs) at the single-cell, proteome, and whole-transcriptome levels, and demonstrating that HF might depend not only on cardiomyocytes but also on other cell populations. The identified tissue remodelling and inflammatory processes can be beneficial when selecting targeted pharmacological management for DCs or ICs, respectively.

UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets

Article

Feb 2021
COMPUT BIOL MED

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emerging large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets of SARS-CoV-2 genome isolates.
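The abstract singles out Jaccard distance-based datasets. As a hedged illustration (not the authors' code), if each isolate is represented as a set of mutation labels, the Jaccard distance between two isolates is d_J(A, B) = 1 - |A ∩ B| / |A ∪ B|, i.e. one minus the share of mutations they have in common. The function names and the example isolates are hypothetical.

```python
from typing import AbstractSet, Hashable, List, Sequence


def jaccard_distance(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def jaccard_matrix(isolates: Sequence[AbstractSet[Hashable]]) -> List[List[float]]:
    """Symmetric pairwise Jaccard distance matrix over mutation sets."""
    n = len(isolates)
    return [
        [jaccard_distance(isolates[i], isolates[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical isolates, each a set of nucleotide substitutions.
    iso1 = {"C241T", "C3037T", "C14408T", "A23403G"}
    iso2 = {"C241T", "C3037T", "A23403G", "G28881A"}
    print(round(jaccard_distance(iso1, iso2), 3))  # 0.4 (3 shared of 5 total)
```

A matrix like this could, under the assumption of a precomputed-distance workflow, be embedded with UMAP (umap-learn accepts `metric="precomputed"`) before K-means is run on the low-dimensional coordinates; the embedding dimension and number of clusters are left open here.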

Application of Uniform Manifold Approximation and Projection (UMAP) in Spectral Imaging of Artworks

Article

Feb 2021
SPECTROCHIM ACTA A

This study assesses the potential of Uniform Manifold Approximation and Projection (UMAP) as an alternative to t-distributed stochastic neighbor embedding (t-SNE) for the reduction and visualization of visible spectral images of works of art. We investigate the influence of UMAP parameters, such as correlation distance, minimum embedding distance, and number of embedding neighbors, on the reduction and visualization of spectral images collected from Poèmes Barbares (1896), a major work by the French artist Paul Gauguin in the collection of the Harvard Art Museums. Using a cosine distance metric with the number of neighbors set to 10 preserves both the local and global structure of the Gauguin dataset in a reduced two-dimensional embedding space, yielding simple and clear groupings of the pigments used by the artist. The centroids of these groups were identified by locating the densest regions within the UMAP embedding through a 2D histogram peak-finding algorithm. These centroids were subsequently fit to the dataset by non-negative least squares, forming maps of the pigments distributed across the work of art. All findings were correlated with macro XRF imaging analyses carried out on the same painting. The described procedure for reduction and visualization of spectral images of a work of art is quick and easy to implement, and the software is open source, promising an improved strategy for interrogating reflectance images of complex works of art.
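The mapping step fits each pixel spectrum as a non-negative combination of the group centroid spectra. The sketch here illustrates that idea with a simple cyclic coordinate-descent solver for non-negative least squares, swapped in for illustration: the abstract does not name its solver, and library routines such as SciPy's `scipy.optimize.nnls` use an active-set method instead. Spectra are assumed to be plain lists of reflectance values on a shared wavelength grid, and the example spectra are made-up numbers, not data from the study.

```python
from typing import List, Sequence


def nnls_coordinate_descent(
    endmembers: Sequence[Sequence[float]],
    pixel: Sequence[float],
    n_iter: int = 500,
    tol: float = 1e-12,
) -> List[float]:
    """Minimize ||A x - b||^2 subject to x >= 0, where the columns of A are
    the endmember (centroid) spectra and b is one pixel spectrum."""
    m = len(pixel)
    if any(len(e) != m for e in endmembers):
        raise ValueError("all spectra must share the same wavelength grid")
    x = [0.0] * len(endmembers)
    r = [-v for v in pixel]  # residual A x - b, with x = 0 initially
    sq_norms = [sum(v * v for v in e) for e in endmembers]
    for _ in range(n_iter):
        largest_step = 0.0
        for j, e in enumerate(endmembers):
            if sq_norms[j] == 0.0:
                continue  # an all-zero spectrum cannot explain anything
            grad = sum(e[i] * r[i] for i in range(m))
            # Exact 1-D minimizer along coordinate j, clipped at zero.
            new_xj = max(0.0, x[j] - grad / sq_norms[j])
            step = new_xj - x[j]
            if step != 0.0:
                for i in range(m):
                    r[i] += step * e[i]  # keep the residual in sync with x
                x[j] = new_xj
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break  # no coordinate moved appreciably: converged
    return x


if __name__ == "__main__":
    # Two made-up centroid spectra sampled at three wavelengths.
    pigment_a = [1.0, 0.0, 0.0]
    pigment_b = [0.0, 1.0, 1.0]
    mixed_pixel = [0.7, 0.3, 0.3]  # 0.7 * pigment_a + 0.3 * pigment_b
    weights = nnls_coordinate_descent([pigment_a, pigment_b], mixed_pixel)
    print([round(w, 3) for w in weights])  # [0.7, 0.3]
```

Running the solver once per pixel and reshaping each weight back onto the image grid would give one abundance map per centroid, which is the kind of pigment map the abstract describes.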

UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets

Article

Dec 2020

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for COVID-19 control, combating, and prevention. Due to the rapid growth of both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emerging large-data challenge. We introduce a dimension-reduced k-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. UMAP-assisted k-means clustering enables us to shed light on increasingly large datasets of SARS-CoV-2 genome isolates.

Identifying Subpopulations of Septic Patients: A Temporal Data-Driven Approach

Article

Dec 2020
COMPUT BIOL MED

Sepsis is one of the deadliest diseases in North America, and in spite of the vast amount of research on this topic, there is still uncertainty in the outcome of sepsis treatments. This study aimed to investigate the informativeness of temporal electronic health records (EHR) for stratifying septic patients and identifying subpopulations of septic patients with similar trajectories and clinical needs. We performed hierarchical clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) analyses using data from septic patients in the MIMIC III intensive care unit database. The t-Distributed Stochastic Neighbor Embedding (t-SNE) method was used to map patients to a two-dimensional space. We used the silhouette index and cluster-wise stability assessment by resampling to investigate the validity of the clusters. Hierarchical clustering with the Euclidean metric identified twelve clinically recognizable subgroups that demonstrated different characteristics despite sharing common conditions. Our results demonstrate that data-driven approaches can help customize care platforms for septic patients by identifying similar, clinically relevant groups.
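The silhouette index mentioned in this abstract scores each sample i as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its smallest mean distance to the members of any other cluster; the overall index is the mean of s(i), so values near 1 indicate compact, well-separated clusters. This pure-Python sketch of the standard definition assumes Euclidean distance and follows the common convention (used, for example, by scikit-learn) that samples in singleton clusters score 0; it is an illustration, not the study's code.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient using Euclidean distance."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    clusters: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n_samples - 1 clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(pts, [0, 0, 1, 1]))  # well separated: close to 1
    print(silhouette_score(pts, [0, 1, 0, 1]))  # mismatched labels: negative
```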

Factor Clustering with t-SNE

Article

Jan 2020
