A new approach for the detection of abnormal heart sound signals using TQWT, VMD and neural networks

Abstract

The phonocardiogram (PCG) plays an important role in evaluating many cardiac abnormalities, such as valvular heart disease, congestive heart failure and anatomical defects of the heart. However, effective cardiac auscultation requires trained physicians, whose work is tough, laborious and subjective. The objective of this study is to develop an automatic classification method for anomaly (normal vs. abnormal) detection of PCG recordings without any segmentation of heart sound signals. Hybrid signal processing and artificial intelligence tools, including the tunable Q-factor wavelet transform (TQWT), variational mode decomposition (VMD), phase space reconstruction (PSR) and neural networks, are utilized to extract representative features in order to model, identify and detect abnormal patterns in the dynamics of the PCG system caused by heart disease. First, the heart sound signal is decomposed into a set of frequency subbands with a number of decomposition levels by using the TQWT method. Second, VMD is employed to decompose the subband of the heart sound signal into different intrinsic modes, of which the first four contain the majority of the heart sound signal’s energy and are considered to be the predominant intrinsic modes. They are selected to construct the reference variable for analysis. Third, the phase space of the reference variable is reconstructed, in which the properties associated with the nonlinear PCG system dynamics are preserved. Three-dimensional PSR together with Euclidean distance is utilized to derive features, which demonstrate a significant difference in PCG system dynamics between normal and abnormal heart sound signals. Finally, the PhysioNet/CinC Challenge heart sound database is used for evaluation and the synthetic minority over-sampling technique (SMOTE) is applied to balance the datasets. Using 10-fold cross-validation, experimental results demonstrate that the proposed features with a dynamical neural network based classifier yield classification performance with sensitivity, specificity, overall score and accuracy values of 97.73%, 98.05%, 97.89% and 97.89%, respectively. The results verify the effectiveness of the proposed method, which can serve as a potential candidate for automatic anomaly detection in clinical applications.
Artificial Intelligence Review
https://doi.org/10.1007/s10462-020-09875-w
WeiZeng1 · JianYuan1· ChengzhiYuan2· QinghuiWang1· FenglinLiu1· YingWang1
© Springer Nature B.V. 2020
Keywords Heart sound · Phonocardiogram (PCG) · Tunable Q-factor wavelet transform
(TQWT) · Variational mode decomposition (VMD) · Phase space reconstruction (PSR) ·
System dynamics · Synthetic minority over-sampling technique (SMOTE) · Neural
networks
Extended author information available on the last page of the article
W.Zeng et al.
1 3
1 Introduction
Cardiac auscultation is one of the most popular non-invasive and cost-effective procedures
for the early diagnosis of various cardiac abnormalities, such as valvular heart disease,
congestive heart failure and anatomical defects of the heart (Alam et al. 2010). However,
effective cardiac auscultation requires trained physicians, who are not accessible in remote
regions and low-income countries of the world. In addition, physicians’ work is tough,
tedious and subjective. Therefore, machine learning based automated heart sound
classification systems can have a significant impact on the early diagnosis of cardiac
diseases (Humayun et al. 2020).
Automated classification of heart sound signals (i.e., the phonocardiogram, PCG) has
attracted increasing attention and has been extensively studied in the past few decades.
It can generally be divided into two areas: (1) segmentation of the heart sound signals;
and (2) classification of heart sound recordings as pathologic or physiologic (Humayun
et al. 2020). For the former, several PCG signal segmentation methods have been proposed
based on digital filters (Varghees et al. 2014), the Fourier transform (FT), short-time
Fourier transform (STFT) and time-frequency representation (Boutana et al. 2011), the
Hilbert transform (HT) (Sun et al. 2014), homomorphic filtering (Hassani et al. 2014), the
empirical wavelet transform (EWT) (Varghees and Ramachandran 2017), the wavelet packet
transform (WPT) (Safara et al. 2013), empirical mode decomposition (EMD) (Cheema and
Singh 2019), ensemble EMD (EEMD) (Papadaniil and Hadjileontiadis 2013), variational mode
decomposition (VMD) (Sujadevi et al. 2019), Mel frequency cepstral coefficients (MFCC)
(Nogueira et al. 2019), and higher order statistics (Xie et al. 2019). Springer et al.
(2015) proposed a logistic regression based hidden semi-Markov model (HSMM) for the
segmentation of the first (S1) and second (S2) heart sounds within noisy, real-world PCG
recordings. Varghees and Ramachandran (2017) proposed an EWT based algorithm for PCG
signal decomposition. Messner et al. (2018) proposed an event detection approach with deep
recurrent neural networks (DRNNs) for heart sound segmentation, i.e., the detection of the
state sequence of the S1 and S2 heart sounds. In contrast, Deng and Han (2016) proposed a
new framework for heart sound classification without any segmentation. They extracted
autocorrelation features from the sub-band envelopes by computing the sub-band coefficients
of the heart sound signal with the discrete wavelet transform (DWT). The autocorrelation
features were then used to obtain a unified feature representation with diffusion maps.
For the classification of heart sound recordings as pathologic or physiologic, researchers
have utilized various machine learning algorithms, such as the support vector machine (SVM)
(Li et al. 2019a), neural network (NN) (Beritelli et al. 2018), hidden semi-Markov model
(HSMM) (Noman et al. 2020), k-nearest neighbor (KNN) (Singh and Majumder 2019), decision
tree (Langley and Murray 2017), and convolutional neural network (CNN) (Xiao et al. 2019).
Zhang et al. (2017) proposed a scaled spectrogram and partial least squares regression
(PLSR) based method for the extraction of effective features from PCG signals. These
features were then fed to an SVM for the classification of PCG signals. Whitaker et al.
(2017) combined sparse coding features with time-domain features to classify PCG signals
using an SVM classifier. Hamidi et al. (2018) utilized curve fitting and Mel frequency
cepstral coefficients (MFCC) fused with the fractal dimension to extract features from
heart sound signals. A nearest neighbor classifier with Euclidean distance was then used
for the classification task. Zhang et al. (2019) proposed a method for abnormal heart sound
detection using temporal quasi-periodic features and long short-term memory (LSTM) without
segmentation. Bozkurt et al. (2018) fed MFCC and Mel-spectrogram features into a CNN for
PCG signal classification.
The above-mentioned works have achieved excellent performance by using different signal
processing and machine learning methods. Nonetheless, since abnormal heart sound detection
is based upon PCG signals, the choice of signal processing techniques, feature extraction
and feature selection is critical and challenging in the design of specialized computerized
systems. Due to the discrete-time, oscillatory and nonlinear characteristics of heart sound
signals (Li et al. 2019b), numerous methods combining time-domain, frequency-domain and
nonlinear analysis have been developed to handle the classification problem. For
time-frequency-domain analysis, the tunable Q-factor wavelet transform (TQWT) has recently
become popular in biomedical signal processing as a flexible, discrete wavelet transform
that is particularly applicable to analysing oscillatory signals (Selesnick 2011; Nishad
et al. 2018; Patidar et al. 2017; Hassan et al. 2016). The TQWT can adjust its Q-factor and
has thus emerged as a powerful tool for oscillatory signal analysis. By changing the
Q-factor and redundancy, the oscillatory behavior of the wavelet basis can be matched to
the oscillatory behavior of the signal (Selesnick 2011). A sparse signal representation can
then be obtained, which in turn improves the performance of sparsity-based signal
processing for applications in denoising, classification and signal separation. Patidar and
Pachori (2014) proposed a constrained TQWT based segmentation of cardiac sound signals into
heart beat cycles. The features obtained from heart beat cycles of separately reconstructed
heart sounds and murmurs can better represent the various types of cardiac sound signals
than those obtained from signals containing both. Therefore, heart sounds and murmurs were
separated using the constrained TQWT. Jain and Tiwari (2018) presented a segmentation
method for the PCG signal in which the parameters of the TQWT were tuned to vary the
frequency range of the approximation level such that its kurtosis was maximized.
The intrinsic characteristics of the heart sound signal can also be revealed from a
nonlinear perspective, which provides important information for heart sound features.
Nonlinear parameters, extracted through different types of entropies (Cheema and Singh
2019), multifractal analysis (Gavrovska et al. 2016), and recurrence quantification
analysis (RQA) (Liang et al. 2015), have been employed for the automatic detection of
abnormal heart sound signals. Considering that the heart sound signal is highly random,
nonlinear and nonstationary in nature (Li et al. 2019b), self-adaptive signal processing
methods, such as empirical mode decomposition (EMD) (Huang et al. 1998; Huang and Kunoth
2013) and local mean decomposition (LMD) (Park et al. 2011), have been employed to extract
effective and predominant features from heart sound signals (Cheema and Singh 2019; Salman
et al. 2016; Liu et al. 2010). EMD decomposes a multi-component signal into a number of
individual monocomponents, that is, intrinsic mode functions and a residual signal, while
LMD decomposes any complicated signal into a series of product functions. However, these
methods have some drawbacks: the EMD method suffers from over-envelopes, mode mixing, end
effects and unexplainable negative frequencies caused by the Hilbert transform (Chen et al.
2011), while the LMD method suffers from distorted components, mode mixing and
time-consuming decomposition (Li et al. 2015). Recently, variational mode decomposition
(VMD) was proposed by Dragomiretskiy and Zosso (2014) as an alternative to EMD and LMD for
the separation of composite real-valued time series into their respective modes. VMD has
been extensively used in biomedical signal processing, speech signal processing and seismic
signal processing (Mert 2016; Lal et al. 2018; Xue et al. 2016). It has been reported that
VMD is theoretically better founded than the sequential iterative sifting of EMD. VMD is
based on a clear variational model and the resulting minimization steps perform concurrent
mode extraction in an intuitive way (Wang et al. 2017). Dragomiretskiy and Zosso (2014)
also pointed out that VMD has some advantages over EMD in tone separation and is less
sensitive to noise and sampling. VMD captures the relevant center frequencies, which
ensures good frequency separation, and is efficient for identifying various discontinuities
present in a non-stationary signal (Dragomiretskiy and Zosso 2014; Mert 2016). Sujadevi
et al. (2019) used a group sparsity algorithm to denoise the measured PCG signals by
exploiting the group sparse (GS) property of PCG signals. The denoised GS-PCG signals were
then decomposed into modes with specific spectral characteristics using the VMD algorithm.
The appropriate mode for further processing was selected based on mode center frequencies
and mode energy. This was followed by the extraction of the Hilbert envelope and
thresholding on the selected mode to segment the S1 and S2 heart sounds. Mishra et al.
(2018) employed the VMD technique for the separation of heart sound (HS) and lung sound
(LS) signals, thereby minimizing the HS interference in LS signals. Mishra et al. (2020)
used VMD to generate a set of amplitude and frequency modulated narrow band-limited
components (NBCs). The VMD-based decomposition of PCG signals in terms of NBCs was used to
quantify the nonlinear and non-stationary nature of PCG signals. In the present work we
have developed a novel technique to compute representative features based on the TQWT and
VMD algorithms applied to heart sound signals. We hypothesize that these features reflect
the abnormal alterations in the dynamics of the PCG system and can achieve high sensitivity
and specificity simultaneously as a discriminator of abnormal heart sound signals. The
ultimate goal of the present study is to propose a novel method for the detection of
abnormal PCG signals. It can provide practitioners with a more robust, simple and
computationally efficient computer-aided tool compared with classical cardiac auscultation
schemes based on the physicians’ experience.
The main contributions of this work are highlighted as follows:
• TQWT decomposes the heart sound signal into different frequency bands, from which the
  main subband containing the majority of the heart sound signal’s energy is extracted.
• The VMD method captures most of the signal information while preserving important
  waveform features such as slight asymmetry. It resolves mode mixing and aliasing problems
  with high computational efficiency, and it makes it possible to measure the variability
  of the heart sound signal. The first four intrinsic modes are then extracted as
  predominant modes, which contain the majority of the heart sound signal’s energy.
• The 3D phase space of the predominant intrinsic modes is reconstructed, in which
  properties associated with the PCG system dynamics are preserved.
• A reliable model for the anomaly detection of PCG recordings is proposed based on the
  difference in PCG system dynamics between normal and abnormal heart sound signals.
The rest of this paper is organized as follows. Section 2 introduces the details of the
proposed method, including the PhysioNet/CinC Challenge 2016 heart sound database, TQWT,
VMD, PSR, Euclidean distance (ED), feature extraction and selection, and the learning and
classification mechanisms. Section 3 presents experimental results. Sections 4 and 5 give
some discussions and conclusions, respectively.
A new approach forthedetection ofabnormal heart sound signals…
1 3
2 Method
In this section, we propose a method to discriminate between normal and abnormal heart
sound signals using the information obtained from nonlinear PCG system dynamics for anomaly
detection of PCG recordings. It is divided into a training stage and a classification
stage, which include the following steps. In the first step, TQWT is employed to decompose
the heart sound signal into different frequency bands. In the second step, VMD is applied
to decompose the predominant subband of the heart sound signal into several intrinsic
modes, from which the predominant modes are extracted. In the third step, PSR is applied to
extract the nonlinear dynamics of the PCG system and Euclidean distances are computed.
Finally, feature vectors are fed into the neural networks for the modeling and
identification of PCG system dynamics. The difference in PCG system dynamics between normal
and abnormal heart sound signals is then used for the classification task. The procedure of
the proposed algorithm is illustrated in Fig. 1.
2.1 Heart sound database
In this study we utilize the popular, public PhysioNet/CinC Challenge 2016 heart sound
database (Liu et al. 2016; Goldberger et al. 2003), which is available at the following
website: https://physionet.org/content/challenge-2016. This database consists of six heart
sound datasets (a through f) from different research groups. In these datasets, heart sound
signals were sourced from several contributors around the world, from both healthy subjects
and pathological patients with certain heart diseases. Specifically, the Challenge set
consists of 3153 heart sound recordings from 764 subjects/patients, lasting from 5 s to
just over 120 s, which were resampled to 2000 Hz. Figure 2 shows sample waveforms of a
normal and an abnormal heart sound signal.
Fig. 1 Flowchart of the proposed method for the anomaly detection of PCG recordings using TQWT,
VMD, PSR, ED and neural networks
W.Zeng et al.
1 3
The heart sound recordings were collected from different locations on the body, of which
the four typical locations are the aortic area, pulmonic area, tricuspid area and mitral
area. In the database, heart sound recordings were divided into two types: normal and
abnormal. The normal recordings were from healthy subjects while the abnormal ones were
from patients with a confirmed cardiac diagnosis. The patients suffered from a variety of
illnesses (which we do not provide on a case-by-case basis), but typically they had heart
valve defects or coronary artery disease. All the recordings from the patients were
generally labelled as abnormal. The grouped types are further divided into the training
dataset and testing dataset using the 10-fold cross-validation method. The details of the
datasets are given in Table 1. The number of normal recordings is 2488 while the number of
abnormal recordings is 665. All six datasets are unbalanced, i.e., the number of normal
recordings does not equal that of abnormal recordings.
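As a rough illustration of the 10-fold cross-validation split mentioned above, the following sketch partitions recording indices into ten shuffled folds and uses each fold once as the test set. The function names, the fixed seed and the plain (non-stratified) shuffling are assumptions made for illustration; the text does not specify how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 3153 is the number of recordings in the Challenge set quoted above.
    for fold_no, (train, test) in enumerate(train_test_splits(3153), start=1):
        print(fold_no, len(train), len(test))
```

Each recording appears in exactly one test fold, so every recording is used for both training and testing across the ten rounds.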
A balanced heart sound database is selected, where the abnormal and normal signals have the
same number of recordings, as shown in Table 1 (otherwise, without prior probabilities on
the illness, a prevalence bias would be created). However, selecting a balanced database
reduces the number of raw recordings, especially in Datasets b and e. Therefore, we then
adopt the synthetic minority over-sampling technique (SMOTE) algorithm (Chawla et al. 2002)
to over-sample the minority class so as to balance the database without greatly reducing
the raw recordings of normal and abnormal heart sound signals. SMOTE is a popular
over-sampling technique for handling imbalanced class data, which creates synthetic samples
in the minority group by applying an iterative search and selection approach (Feng et al.
2019; Wang et al. 2019; Rivera and Xanthopoulos 2016). Each observation from the minority
class is iterated through until the needed number is reached.

Fig. 2 The waveforms of heart sound signals: (a) normal heart sound signal; (b) abnormal
heart sound signal

Table 1 Numbers of raw and balanced recordings for each dataset (# represents ‘number of’)

Dataset name        # Raw recordings     # Recordings after      # Recordings after
                                         balanced selection      balancing with SMOTE
                    Abnormal   Normal    Abnormal   Normal       Abnormal   Normal
a                   292        117       117        117          292        234
b                   104        386       104        104          312        386
c                   24         7         7          7            24         21
d                   28         27        27         27           28         27
e                   183        1871      183        183          1830       1871
f                   34         80        34         34           68         80
Total               665        2488      472        472          2554       2619
Ratio of abnormal   0.27                 1                       0.98
to normal
The working principle of the SMOTE algorithm is briefly described as follows. For details,
please refer to Chawla et al. (2002).

Required: Minority data \(D = \{x_i \in X\}\), where \(i = 1, 2, \ldots, T\); number of
minority instances (\(T\)), SMOTE percentage (\(N\)), number of nearest neighbors (\(k\)).
for \(i = 1, 2, \ldots, T\) do
    find the \(k\) nearest minority neighbors of \(x_i\)
    \(\hat{N} = \lfloor N/100 \rfloor\)
    while \(\hat{N} \neq 0\) do
        select one of the \(k\) nearest neighbors, \(\bar{x}\)
        select a random number \(\alpha \in [0, 1]\)
        \(\hat{x} = x_i + \alpha(\bar{x} - x_i)\)
        append \(\hat{x}\) to \(S\)
        \(\hat{N} = \hat{N} - 1\)
Output: Synthetic data \(S\)

The datasets balanced with the SMOTE algorithm are listed in Table 1.
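The SMOTE steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `smote`, the default `k=5`, the Euclidean neighbour search and the fixed random seed are all choices made here for the example.

```python
import math
import random


def smote(minority, n_percent=100, k=5, seed=0):
    """Return synthetic samples for a list of equal-length feature vectors.

    Assumes at least two minority samples, so every sample has a neighbour.
    """
    rng = random.Random(seed)
    per_sample = n_percent // 100  # N_hat = floor(N / 100)
    synthetic = []
    for i, x_i in enumerate(minority):
        # k nearest minority neighbours of x_i (Euclidean distance, x_i excluded).
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(x, x_i))[:k]
        for _ in range(per_sample):
            x_bar = rng.choice(neighbours)  # one of the k nearest neighbours
            alpha = rng.random()            # random number in [0, 1)
            # x_hat = x_i + alpha * (x_bar - x_i): a point on the joining segment.
            synthetic.append([a + alpha * (b - a) for a, b in zip(x_i, x_bar)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(toy_minority, n_percent=200, k=2):
        print(point)
```

With a SMOTE percentage of 200, each minority sample yields two synthetic samples, so the minority class triples once the synthetic data are added to the originals. This is consistent with, for example, the abnormal recordings of Dataset b in Table 1 growing from 104 to 312.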
The heart sound is subjected to the following denoising preprocessing step. Heart sound
signals obtained using diagnostic tools are usually contaminated with noise from various
sources. This noise hinders the early detection of mild heart sounds in the PCG signals, so
filtering to remove such artifacts is essential (Shervegar and Bhat 2018). The filtering
should preserve all diagnostic information required for analysis of the PCG signals while
removing the unwanted components, i.e., the noise. The heart sounds taken from the
PhysioNet database are contaminated with various types of noise and are therefore heavily
filtered to remove as much noise as possible. A 6th-order Chebyshev low-pass filter with a
cut-off frequency of 140 Hz is used for this purpose. The noise lies at high frequencies
while the diagnostic information lies at low frequencies, so the filtering removes the
high-frequency noise.
2.2 Tunable Q‑factor wavelet transform (TQWT)
The wavelet transform is an effective time-frequency tool for the analysis of
non-stationary signals. The tunable Q-factor wavelet transform (TQWT) is a flexible, fully
discrete wavelet transform suitable for the analysis of oscillatory signals (Selesnick
2011). TQWT depends on three adjustable parameters: the Q-factor (\(Q\)), the redundancy
(\(R\)), and the number of decomposition levels (\(J\)). Generally, \(Q\) measures the
oscillatory behavior and waveform shape of the wavelet. \(R\) helps localize the wavelet in
the time domain without affecting its shape. The decomposition level \(J\) controls the
expansion extent and bandpass location of the wavelet. There will be a total of \(J+1\)
subbands. Regarding the choice of TQWT parameters, the wavelet transform should have a low
Q-factor when the signal exhibits little or no oscillatory behavior. On the other hand, the
wavelet transform should have a relatively high Q-factor for the analysis and processing of
oscillatory signals. \(Q\) is often set to a high value because heart sound signals are
highly oscillatory. It is worth noting that unwanted excessive ringing of the wavelets
needs to be prevented when performing TQWT by choosing a value of \(R\) greater than or
equal to 3 (Selesnick 2011). Generally, a value of \(R=3\) is recommended. The TQWT
decomposes heart sound signals into subbands with a number of decomposition levels
determined by the input parameters (\(Q\), \(R\), and \(J\)). TQWT consists of two
iterative band-pass filter banks, i.e., the high-resonance component filter
\(H_{filter}(\omega)\) and the low-resonance component filter \(L_{filter}(\omega)\). The
resonance characteristics of an oscillatory signal can be represented by the quality factor
\(Q\), i.e., the ratio of its center frequency to its bandwidth, \(Q = f_c/B_w\), where
\(f_c\) denotes the center frequency and \(B_w\) represents the bandwidth of the signal.
Let the low-pass and high-pass scaling factors of the two-channel filter bank be denoted by
\(\lambda\) and \(\sigma\), respectively. In order to prevent excessive redundancy and
achieve perfect reconstruction, the scaling factors should satisfy \(0<\lambda<1\),
\(0<\sigma\le 1\) and \(\lambda+\sigma>1\). Mathematically, the low-pass filter
\(L_{filter}(\omega)\) and high-pass filter \(H_{filter}(\omega)\) are expressed as follows
(Selesnick 2011), respectively:

\[
L_{filter}(\omega) =
\begin{cases}
1, & \text{if } |\omega| \le (1-\sigma)\pi \\
\theta\left(\dfrac{\omega + (\sigma-1)\pi}{\lambda+\sigma-1}\right), & \text{if } (1-\sigma)\pi < |\omega| < \lambda\pi \\
0, & \text{if } \lambda\pi \le |\omega| \le \pi
\end{cases}
\tag{1}
\]

and

\[
H_{filter}(\omega) =
\begin{cases}
0, & \text{if } |\omega| \le (1-\sigma)\pi \\
\theta\left(\dfrac{\lambda\pi - \omega}{\lambda+\sigma-1}\right), & \text{if } (1-\sigma)\pi < |\omega| < \lambda\pi \\
1, & \text{if } \lambda\pi \le |\omega| \le \pi
\end{cases}
\tag{2}
\]

where \(\theta(\omega)\) is the frequency response of the Daubechies filter and is defined
by the following expression:

\[
\theta(\omega) = 0.5 \times (1+\cos(\omega)) \times \sqrt{2-\cos(\omega)}, \quad |\omega| \le \pi.
\tag{3}
\]

The Q-factor, \(R\) and the maximum number of decomposition levels \(J_{max}\) can be
expressed in terms of the parameters \(\lambda\) and \(\sigma\) as follows:

\[
Q = \frac{f_c}{B_w} = \frac{2-\sigma}{\sigma}; \quad
R = \frac{\sigma}{1-\lambda}; \quad
J_{max} = \left\lfloor \frac{\log(\sigma L/8)}{\log(1/\lambda)} \right\rfloor,
\tag{4}
\]

where \(L\) is the length of the analysed heart sound signal. Detailed expressions of
\(Q\), \(R\), \(J_{max}\), \(f_c\) and \(B_w\) are provided in Selesnick (2011).
In order to extract efficient heart sound signal bands, 10 levels (\(J=10\), \(J+1=11\)
subbands) of TQWT with \(Q=3\) and \(R=3\) have been empirically selected in this study.
Figures 3 and 4 show the decomposed TQWT coefficients and the energy distribution over the
subbands for normal and abnormal PCG signals. Here, subband 1 corresponds to the high
frequencies and subband 11 corresponds to the low frequencies. It can be seen that heart
sound activity shows significant variations in value over all frequency subbands. However,
low-frequency subbands show large variation in heart sound activity and carry a high amount
of energy compared to high-frequency subbands. It is observed from these figures that the
majority of the heart sound signal’s energy is concentrated in the 11th subband (marked as
\(Sub_{11}\)), especially for the abnormal heart sound signal. In comparison, nearly 2% of
the normal heart sound signal’s energy is distributed in subbands 9 and 10, respectively,
which means the energy is relatively decentralized. Since the majority of the heart sound
signal’s energy is concentrated in the 11th subband, \(Sub_{11}\) is selected for feature
acquisition.

Fig. 3 Examples of subbands of 10 levels TQWT of the normal and abnormal heart sound signals
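To make the parameter relations concrete, the sketch below inverts \(Q = (2-\sigma)/\sigma\) and \(R = \sigma/(1-\lambda)\) to obtain the scaling factors from a chosen \(Q\) and \(R\), evaluates the bound \(J_{max}\), and computes the percentage of energy per subband as plotted in the energy distribution figure. The function names are hypothetical, the floor in \(J_{max}\) follows Selesnick (2011), and the toy subband coefficients are made up for illustration; no TQWT decomposition itself is performed here.

```python
import math


def tqwt_scaling_factors(q, r):
    """Return (lambda, sigma) from Q = (2 - sigma) / sigma and R = sigma / (1 - lambda)."""
    sigma = 2.0 / (q + 1.0)
    lam = 1.0 - sigma / r
    return lam, sigma


def tqwt_max_levels(q, r, length):
    """Upper bound J_max on the number of levels for a signal with `length` samples."""
    lam, sigma = tqwt_scaling_factors(q, r)
    return math.floor(math.log(sigma * length / 8.0) / math.log(1.0 / lam))


def energy_share(subbands):
    """Percentage of the total energy carried by each subband (lists of coefficients)."""
    energies = [sum(abs(c) ** 2 for c in band) for band in subbands]
    total = sum(energies)
    return [100.0 * e / total for e in energies]


if __name__ == "__main__":
    lam, sigma = tqwt_scaling_factors(q=3, r=3)
    print("lambda =", lam, "sigma =", sigma, "lambda + sigma > 1:", lam + sigma > 1)
    print("J_max for a 10000-sample signal:", tqwt_max_levels(3, 3, 10000))
    # Toy coefficients only; real subbands would come from a TQWT implementation.
    print(energy_share([[0.1, -0.1], [0.2, 0.1], [1.0, -2.0, 1.5]]))
```

For \(Q=3\) and \(R=3\) this gives \(\sigma = 0.5\) and \(\lambda = 5/6\), which satisfy the three constraints on the scaling factors, so the selected \(J=10\) can be checked against \(J_{max}\) for a given recording length.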
2.3 Variational mode decomposition (VMD)
VMD aims to decompose a composite input signal \(x(t)\) into \(K\) intrinsic modes
\(\mu_n(t)\), \(n = 1, 2, \ldots, K\), which have specific sparsity properties while
reproducing the input signal. The decomposition process can be written as the following
constrained variational problem:

\[
\min_{\{\mu_n\},\{\omega_n\}} \left\{ \sum_{n=1}^{K} \left\| \frac{\partial}{\partial t}\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_n(t) \right] e^{-j\omega_n t} \right\|_2^2 \right\},
\quad \text{subject to } \sum_{n=1}^{K} \mu_n(t) = x(t),
\tag{5}
\]

where \(K\) is the number of decomposition modes, \(\frac{\partial}{\partial t}[\cdot]\)
denotes the partial derivative of a function with respect to time, \(\delta\) is the Dirac
function, ‘\(*\)’ represents convolution, \(\{\mu_n\} = \{\mu_1, \mu_2, \ldots, \mu_K\}\)
is the set of all modes, \(\{\omega_n\} = \{\omega_1, \omega_2, \ldots, \omega_K\}\) is the
set of center frequencies, \(t\) is the time index, and \(j\) is the complex square root of
\(-1\).
Considering a quadratic penalty term and a Lagrange multiplier \(\eta\), the
above-mentioned constrained variational problem can be transformed into an unconstrained
optimization problem, which is represented as follows:

\[
L(\{\mu_n\},\{\omega_n\},\eta) = \alpha \sum_{n=1}^{K} \left\| \frac{\partial}{\partial t}\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * \mu_n(t) \right] e^{-j\omega_n t} \right\|_2^2
+ \left\| x(t) - \sum_{n=1}^{K} \mu_n(t) \right\|_2^2
+ \left\langle \eta(t),\, x(t) - \sum_{n=1}^{K} \mu_n(t) \right\rangle,
\tag{6}
\]

where \(L\) denotes the augmented Lagrangian, \(\alpha\) is the balancing parameter of the
data-fidelity constraint, and ‘\(\langle \cdot, \cdot \rangle\)’ represents the inner
product.
The alternating direction method of multipliers (ADMM) is used to generate the decomposed
modes and the center frequency associated with the shifting operation of each mode
(Dragomiretskiy and Zosso 2014). The solution of Eq. (6) can be derived by using ADMM, in
which solving for \(\mu_n\) and \(\omega_n\) mainly consists of the following steps:
Step 1 Intrinsic mode update. Wiener filtering is embedded for updating the mode directly
in the Fourier domain with a filter tuned to the current center frequency. The updated mode
is obtained as follows:

\[
\hat{\mu}_n^{\kappa+1}(\omega) = \frac{\hat{x}(\omega) - \sum_{i \neq n} \hat{\mu}_i(\omega) + \dfrac{\hat{\eta}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_n)^2},
\tag{7}
\]

where \(\kappa\) is the iteration number, and \(\hat{x}(\omega)\), \(\hat{\mu}_i(\omega)\)
and \(\hat{\eta}(\omega)\) represent the Fourier transforms of \(x(t)\), \(\mu_i(t)\) and
\(\eta(t)\), respectively.
Step 2 Center frequency update. The center frequency is updated as the center of gravity of
the corresponding mode’s power spectrum, which is represented as follows:

\[
\hat{\omega}_n^{\kappa+1} = \frac{\int_0^{\infty} \omega \left| \hat{\mu}_n(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{\mu}_n(\omega) \right|^2 d\omega}.
\tag{8}
\]

The complete algorithm of VMD can be found in Dragomiretskiy and Zosso (2014). The VMD
method can effectively capture both narrow-band and wide-band modes, unlike the
fixed-bandwidth subbands of wavelet transform based decomposition approaches (Babu et al.
2018). It is also more robust to noisy data: since each mode is updated by Wiener filtering
in the Fourier domain during the optimization process, the updated mode is less affected by
noisy disturbances. Therefore, VMD can be more efficient at capturing the signal’s short
and long variations (Mishra et al. 2018; Sujadevi et al. 2019). Hence we apply the VMD
method to compensate for this disadvantage of TQWT and to serve as a complementary tool for
extracting features from PCG signals more effectively.
Figure 5 shows examples of the VMD of the 11th subband \(Sub_{11}\) of the normal and
abnormal heart sound signals. Each \(Sub_{11}\) is decomposed into 6 intrinsic modes, which
are respectively denoted by \(\mu_1, \mu_2, \ldots, \mu_6\). The lower modes vary slowly in
the time domain while the higher modes exhibit faster variation. The results show that the
dominant components of the PCG signal are the fundamental heart sounds, which may appear in
the first few modes of the signal decomposition.

Fig. 4 Examples of the energy distribution of the subbands of TQWT of the normal and
abnormal heart sound signals: (a) normal; (b) abnormal (horizontal axis: subband, 1 to 11;
vertical axis: subband energy, % of total)
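The two update steps of VMD (Wiener-filter mode update and centre-of-gravity frequency update) can be illustrated on a discretized spectrum. The sketch below is only a bin-by-bin rendering of those two formulas under stated assumptions: the function names are hypothetical, the integrals are replaced by sums over non-negative frequency bins, and `others_hat` is assumed to already hold the sum of all other modes' spectra. It is not a complete VMD implementation, which also updates the multiplier and iterates to convergence (Dragomiretskiy and Zosso 2014).

```python
def update_mode(x_hat, others_hat, eta_hat, omegas, omega_n, alpha):
    """Mode update: (x_hat - sum of other modes + eta_hat / 2) / (1 + 2*alpha*(w - w_n)^2)."""
    return [
        (x - o + e / 2.0) / (1.0 + 2.0 * alpha * (w - omega_n) ** 2)
        for x, o, e, w in zip(x_hat, others_hat, eta_hat, omegas)
    ]


def update_center_frequency(mode_hat, omegas):
    """Centre of gravity of the mode's power spectrum over the bins omega >= 0."""
    power = [abs(m) ** 2 for m in mode_hat]
    return sum(w * p for w, p in zip(omegas, power)) / sum(power)


if __name__ == "__main__":
    omegas = [0.0, 1.0, 2.0, 3.0]              # toy non-negative frequency grid
    x_hat = [0.5 + 0j, 1.0 + 0j, 4.0 + 0j, 1.0 + 0j]
    zeros = [0j] * len(omegas)
    omega_n = 1.0                              # deliberately poor initial guess
    for _ in range(5):
        mode_hat = update_mode(x_hat, zeros, zeros, omegas, omega_n, alpha=1.0)
        omega_n = update_center_frequency(mode_hat, omegas)
        print(omega_n)
```

The denominator acts as a band-pass weighting centred on the current \(\omega_n\), so spectral content far from the centre frequency is attenuated; a larger \(\alpha\) narrows the band that each mode retains.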
2.4 Phase space reconstruction (PSR)
It is sometimes necessary to search for patterns not only in a time series but also in a higher-dimensional transformation of the time series (Sun et al. 2015). Phase space reconstruction is a method used to reconstruct the so-called phase space. The concept of phase space is a useful tool for characterizing any low-dimensional or high-dimensional dynamic system. A dynamic system can be described using a phase space diagram, which essentially provides a coordinate system where the coordinates are all the variables comprising the mathematical formulation of the system. A point in the phase space represents the state of the system at any given time (Sivakumar 2002; Lee et al. 2014). Every intrinsic mode of the subbands of the normal and abnormal heart sound signals can be written as the time series vector $\upsilon=\{\upsilon_1,\upsilon_2,\upsilon_3,\ldots,\upsilon_K\}$, where K is the total number of data points. The phase space can be reconstructed according to (Lee et al. 2014):
$$Y_j=(\upsilon_j,\upsilon_{j+\tau},\upsilon_{j+2\tau},\ldots,\upsilon_{j+(d-1)\tau}), \tag{9}$$
where $j=1,2,\ldots,K-(d-1)\tau$, d is the embedding dimension of the phase space and $\tau$ is a time lag. It is worthwhile to mention that the properties associated with the PCG system dynamics are preserved in the reconstructed phase space.
The behaviour of the signal over time can be visualized using PSR (especially when $d=2$ or 3). In this work, we confine our discussion to the embedding dimension $d=3$ because of its visualization simplicity. In addition, different studies have found this value to best represent the attractor for human biological systems (Venkataraman and Turaga 2016; Som et al. 2016). For $\tau$, one can use either the first zero crossing of the autocorrelation function of each time series or the average $\tau$ value obtained from all the time series in the training dataset using the method proposed in Michael (2005). In this study, we consider the time lag $\tau=5$ to test the classification performance. PSR for $d=3$ is referred to as 3D PSR.
Reconstructed phase spaces have been proven to be topologically equivalent to the original system and are therefore capable of recovering the nonlinear dynamics of the generating system (Takens 1981; Xu et al. 2013). This implies that the full dynamics of the PCG system are accessible in this space, and for this reason, features extracted from it can potentially contain more and/or different information than common feature extraction methods (Chen et al. 2014).
3D PSR is the plot of the three delayed vectors $\upsilon_j$, $\upsilon_{j+1}$ and $\upsilon_{j+2}$, which visualizes the dynamics of the PCG system. The Euclidean distance (ED) of a point $(\upsilon_j,\upsilon_{j+1},\upsilon_{j+2})$ is the distance of the point from the origin in the 3D PSR and can be defined as (Lee et al. 2014)
$$ED_j=\sqrt{\upsilon_j^2+\upsilon_{j+1}^2+\upsilon_{j+2}^2}. \tag{10}$$
ED measures can be used in feature extraction and have been studied and applied in many fields, such as clustering algorithms and induced aggregation operators (Merigó and Casanovas 2011).
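The reconstruction in Eq. (9) and the distance in Eq. (10) can be sketched in a few lines. This is an illustrative sketch only; the function names `delay_embed` and `euclidean_distance` are hypothetical. Note that Eq. (10) is written with unit-lag neighbours, which corresponds to `tau=1` below, while the text also reports $\tau=5$; the lag is therefore left as a parameter rather than fixed.

```python
import numpy as np


def delay_embed(v, d=3, tau=5):
    """Delay embedding of Eq. (9): row j is (v[j], v[j+tau], ..., v[j+(d-1)tau])."""
    v = np.asarray(v, dtype=float)
    n_points = v.size - (d - 1) * tau
    if n_points <= 0:
        raise ValueError("time series too short for this d and tau")
    return np.column_stack([v[i * tau: i * tau + n_points] for i in range(d)])


def euclidean_distance(points):
    """Distance of every reconstructed point from the origin (Eq. 10 when d=3)."""
    return np.sqrt(np.sum(points ** 2, axis=1))


# Toy usage on a ramp signal.
v = np.arange(20.0)
Y = delay_embed(v, d=3, tau=5)      # shape (10, 3), first row (0, 5, 10)
ed = euclidean_distance(Y)
```

The resulting sequence `ed` plays the role of $ED_j$ for one intrinsic mode; one such sequence per selected mode would be stacked to form the feature vector.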
Fig. 5 Examples of VMD of $Sub_{11}$ of the normal and abnormal heart sound signals. (a) Original $Sub_{11}$ of the normal heart sound signal and its VMD. (b) Original $Sub_{11}$ of the abnormal heart sound signal and its VMD. [Figure: the 11th subband and the modes $\mu_1$ to $\mu_6$ plotted against the sample index]
2.5 Feature extraction and selection
In order to obtain more efficient features, this paper proposes the following extraction scheme.
(1) A ten-level TQWT is employed to decompose the heart sound signal into eleven subbands, in which the 11th subband $Sub_{11}$ contains the majority of the heart sound signal's energy and is selected for analysis.
(2) VMD of $Sub_{11}$ of the heart sound signal and derivation of the predominant intrinsic modes. The signals obtained by the VMD method, which are a series of decomposed signals, cannot be used directly for classification because of the high feature dimension. To solve this problem, Pearson's correlation coefficient is calculated to measure the correlation between each of the six intrinsic modes and the original $Sub_{11}$ of the heart sound signal. The intrinsic modes with higher correlation coefficients are more highly correlated with the original signal, which means that the signal energy is mostly concentrated in these intrinsic modes as well. In the present study most of the energy is concentrated in the first four intrinsic modes ($\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$), which contain the most important information from the heart sound signal and are considered to be the predominant intrinsic modes (see Table 2). In addition, an independent t-test analysis of variance (SPSS Inc., IL, USA) is used to compare the difference of the first six intrinsic modes between normal and abnormal heart sound signals in the PhysioNet/CinC Challenge 2016 database. A p value of < 0.05 is considered to indicate statistical significance. It is seen from Table 2 that there exist significant differences in most cases of the first four intrinsic modes between normal and abnormal heart sound signals in the six datasets. Hence, based on Pearson's correlation coefficient and its statistical analysis, $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ of the $Sub_{11}$ of the heart sound signal are selected as the reference variable $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$ and are used for the following feature derivation.

Table 2 The average correlation coefficients and their statistical analysis between each intrinsic mode and the original 11th subband ($Sub_{11}$) of TQWT of all the raw normal and abnormal heart sound signals from the PhysioNet/CinC Challenge 2016 heart sound database
Heart sound type                       $\mu_1$   $\mu_2$   $\mu_3$   $\mu_4$   $\mu_5$   $\mu_6$
Normal of Dataset a                    0.4082    0.5159    0.4388    0.3119    0.1679    0.1523
Abnormal of Dataset a                  0.4342    0.5217    0.4154    0.3045    0.1651    0.1426
Difference between groups (p value)    0.002     0.042     0.001     0.044     0.549     0.158
Normal of Dataset b                    0.4538    0.4502    0.3163    0.2133    0.1343    0.1479
Abnormal of Dataset b                  0.4885    0.4515    0.3193    0.2148    0.1311    0.1412
Difference between groups (p value)    0.013     0.036     0.034     0.042     0.485     0.612
Normal of Dataset c                    0.4582    0.5036    0.4206    0.2967    0.1613    0.1521
Abnormal of Dataset c                  0.4818    0.5023    0.3918    0.2623    0.1624    0.1376
Difference between groups (p value)    0.003     0.048     <0.001    <0.001    0.852     0.109
Normal of Dataset d                    0.4635    0.4925    0.3690    0.2321    0.1456    0.1180
Abnormal of Dataset d                  0.4535    0.5074    0.4221    0.2999    0.1650    0.1709
Difference between groups (p value)    0.037     0.047     <0.001    <0.001    0.068     <0.001
Normal of Dataset e                    0.4161    0.4688    0.4284    0.3306    0.1245    0.1166
Abnormal of Dataset e                  0.3320    0.5326    0.4250    0.2798    0.1728    0.1549
Difference between groups (p value)    <0.001    <0.001    0.474     <0.001    <0.001    <0.001
Normal of Dataset f                    0.4463    0.4999    0.4678    0.3787    0.1299    0.1152
Abnormal of Dataset f                  0.4389    0.4960    0.4528    0.3865    0.1438    0.1447
Difference between groups (p value)    0.019     0.175     <0.001    0.039     0.52      <0.001
Mean value of correlation coefficients 0.4396    0.4952    0.4056    0.2926    0.1503    0.1412
A p value of < 0.05 is considered to indicate statistical significance
(3) Reconstruct the phase space of the reference variable with the selected values of $d$ and $\tau$;
(4) Compute the ED of the 3D PSR of the reference variable and concatenate the results to form the feature vector $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$.
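The correlation-based mode selection in step (2) of this scheme can be sketched as follows. This is a hedged illustration, not the procedure actually run for Table 2: the function name `rank_modes_by_correlation` is hypothetical, the modes are assumed to be available as rows of an array, and the ranking is done per signal, whereas the study averages the coefficients over whole datasets before choosing the first four modes.

```python
import numpy as np


def rank_modes_by_correlation(subband, modes, keep=4):
    """Pearson correlation of each mode with the subband, and the indices of the top modes.

    subband : (T,) signal, e.g. the 11th TQWT subband
    modes   : (K, T) intrinsic modes obtained from that subband
    keep    : number of modes to retain
    """
    subband = np.asarray(subband, dtype=float)
    modes = np.asarray(modes, dtype=float)
    # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
    r = np.array([np.corrcoef(subband, m)[0, 1] for m in modes])
    # Indices of the 'keep' largest coefficients, reported in ascending mode order.
    selected = np.sort(np.argsort(r)[::-1][:keep])
    return r, selected


# Toy usage: six orthogonal sinusoids with decreasing amplitude stand in for the modes.
N = 1000
t = np.arange(N) / N
amps = [3.0, 2.0, 1.0, 0.5, 0.1, 0.05]
toy_modes = np.array([a * np.sin(2 * np.pi * (i + 1) * t) for i, a in enumerate(amps)])
toy_subband = toy_modes.sum(axis=0)
r, selected = rank_modes_by_correlation(toy_subband, toy_modes, keep=4)
```

In the toy case the four largest-amplitude components are retained, mirroring the selection of $\mu_1$ to $\mu_4$ described above.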
For the PhysioNet/CinC Challenge 2016 heart sound database, heart sound signals are analyzed and PCG system dynamics are extracted by using TQWT, VMD and 3D PSR. First, the ten-level TQWT of the normal and abnormal heart sound signals is demonstrated in Fig. 3. The VMD of the 11th subband of TQWT of the heart sound signals is exhibited in Fig. 5. The first four intrinsic modes are utilized to form the reference variable $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$. Samples of the 3D PSR of the reference variable for normal and abnormal PCG signals are exhibited in Figs. 6 and 7. It can be observed that the phase space tracks of the abnormal heart sound signals are in a more chaotic state in comparison to those of the normal heart sound signals. The asymmetric nature of the portraits in the 3D space reflects the erratic time-varying phase space dynamics of the abnormal PCG signals. These figures show that patterns in the higher-dimensional transformations can be more discriminative than those in the time series itself. The disparity of the PCG system dynamics between the normal and abnormal PCG signals is treated as the differentiation criterion in the present study. After 3D PSR, the features $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$ for normal and abnormal heart sound signals are derived through ED computation. It can be observed from Figs. 8 and 9 that the Euclidean distances calculated from the 3D PSR of normal and abnormal heart sound signals differ from each other. This implies that the Euclidean distances can serve as useful features for classifying normal and abnormal PCG signals. They are fed into the neural networks for the following modeling, identification and classification of the PCG system dynamics between the two groups.

Fig. 6 Samples of 3D PSR of $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$ of the normal heart sound signal. (a) 3D PSR of $Sub_{11}^{\mu_1}$ for $\mu_1$. (b) 3D PSR of $Sub_{11}^{\mu_2}$ for $\mu_2$. (c) 3D PSR of $Sub_{11}^{\mu_3}$ for $\mu_3$. (d) 3D PSR of $Sub_{11}^{\mu_4}$ for $\mu_4$. [Figure: axes $\upsilon_j$, $\upsilon_{j+1}$, $\upsilon_{j+2}$]
2.6 Training and modeling mechanism based on selected features
In this section, we present a scheme for modeling and deriving the nonlinear PCG system dynamics from heart sound signals of normal and abnormal subjects based on the extracted features.
Consider a temporal data sequence $\varphi_\zeta=[Y(1),\ldots,Y(k)]^T\in R^n$ generated from the following discrete-time PCG dynamical system:
$$Y(k)=F(Y(k-1),\ldots,Y(k-m);p)+v(Y(k-1),\ldots,Y(k-m);p), \tag{11}$$
where $Y(k)=[y_1(k),\ldots,y_n(k)]^T\in R^n$ is the state of the system, which is measurable and represents the feature $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$, $p=[p_1,\ldots,p_n]^T$ is a constant vector of system parameters (different p will generate different dynamical behaviors), $F(\cdot;p)=[f_1(\cdot;p_1),\ldots,f_n(\cdot;p_n)]^T$ is the smooth but unknown nonlinear PCG system dynamics, and $v(\cdot;p)=[v_1(\cdot;p_1),\ldots,v_n(\cdot;p_n)]^T$ is the modeling uncertainty. Since the modeling uncertainty $v(\cdot;p)$ and the PCG system dynamics $F(\cdot;p)$ cannot be decoupled from each other, we consider the two terms together as an undivided term, and define $\phi(\cdot;p):=F(\cdot;p)+v(\cdot;p)$ as the general PCG system dynamics. The objective of the training or learning stage is to identify or approximate the general PCG system dynamics $\phi(\cdot;p)=[\phi_1(\cdot;p_1),\ldots,\phi_n(\cdot;p_n)]^T$ to a desired accuracy via deterministic learning (Wang and Hill 2006, 2007, 2009).

Fig. 7 Samples of 3D PSR of $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$ of the abnormal heart sound signal. (a) 3D PSR of $Sub_{11}^{\mu_1}$ for $\mu_1$. (b) 3D PSR of $Sub_{11}^{\mu_2}$ for $\mu_2$. (c) 3D PSR of $Sub_{11}^{\mu_3}$ for $\mu_3$. (d) 3D PSR of $Sub_{11}^{\mu_4}$ for $\mu_4$. [Figure: axes $\upsilon_j$, $\upsilon_{j+1}$, $\upsilon_{j+2}$]

In the first step, standard radial basis function (RBF) neural networks are constructed in the following form
$$f_{nn}(Z)=\sum_{i=1}^{N}w_i s_i(Z)=W^TS(Z), \tag{12}$$
where Z is the input vector, $W=[w_1,\ldots,w_N]^T\in R^N$ is the weight vector, N is the node number of the neural networks, and $S(Z)=[s_1(\|Z-\mu_1\|),\ldots,s_N(\|Z-\mu_N\|)]^T$, with $s_i(\|Z-\mu_i\|)=\exp\left[-\frac{(Z-\mu_i)^T(Z-\mu_i)}{\eta_i^2}\right]$ being a Gaussian function, $\mu_i\ (i=1,\ldots,N)$ being distinct points in state space, and $\eta_i$ being the width of the receptive field.
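The Gaussian RBF network of Eq. (12) amounts to a regressor vector followed by a weighted sum, as the following sketch shows. The function names `rbf_regressor` and `rbf_output` and the toy centers, widths and weights are illustrative assumptions; the study does not specify these values in this section.

```python
import numpy as np


def rbf_regressor(Z, centers, widths):
    """S(Z) of Eq. (12): s_i = exp(-(Z - mu_i)^T (Z - mu_i) / eta_i^2)."""
    Z = np.asarray(Z, dtype=float)
    diff = np.asarray(centers, dtype=float) - Z        # (N, n) differences to each center
    return np.exp(-np.sum(diff ** 2, axis=1) / np.asarray(widths, dtype=float) ** 2)


def rbf_output(Z, W, centers, widths):
    """f_nn(Z) = W^T S(Z)."""
    return float(np.dot(W, rbf_regressor(Z, centers, widths)))


# Toy usage: two nodes in a two-dimensional input space.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 1.0])
W = np.array([2.0, 3.0])
S = rbf_regressor([0.0, 0.0], centers, widths)         # [1, exp(-2)]
out = rbf_output([0.0, 0.0], W, centers, widths)       # 2 + 3 * exp(-2)
```

Because each Gaussian decays quickly away from its center, only nodes near the current input contribute noticeably to the output, which is the localization property used later in the convergence argument.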
Fig. 8 Samples of the Euclidean distance of 3D PSR of $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$ of the normal heart sound signal. (a) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_1}$ for $\mu_1$. (b) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_2}$ for $\mu_2$. (c) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_3}$ for $\mu_3$. (d) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_4}$ for $\mu_4$. [Figure: $ED_j$ plotted against the number of the data points]
In the second step, the following dynamical RBF neural networks are employed to model and derive the general PCG system dynamics $\phi(\cdot;p)$:
$$\hat{Y}(k)=A(\hat{Y}(k-1)-Y(k-1))+\hat{W}^T(k)S_k(Z), \tag{13}$$
where $\hat{Y}(k)=[\hat{y}_1(k),\ldots,\hat{y}_n(k)]^T\in R^n$ is the state vector of the dynamical model, $A=\mathrm{diag}\{a_1,\ldots,a_n\}$ is a diagonal matrix with $|a_i|<1$ being design constants, the localized RBF networks $\hat{W}^T(k)S_k=[\hat{W}_1^T(k)S_1,\ldots,\hat{W}_n^T(k)S_n]^T$ are used to approximate the unknown $\phi(\cdot;p)=[\phi_1(\cdot;p),\ldots,\phi_n(\cdot;p)]^T$, $\hat{W}^T(k)=[\hat{W}_1(k),\ldots,\hat{W}_n(k)]$ is the weight estimate of the neural networks, $S_k(Z)=S(Y(k-1),\ldots,Y(k-m))$, and $Z=[Y(k-1),\ldots,Y(k-m)]$ is the input of the neural networks.
From Eqs. (11) and (13), the state estimation error $e_i(k)=\hat{y}_i(k)-y_i(k)$ satisfies:
$$\begin{aligned}e_i(k+1)&=\hat{y}_i(k+1)-y_i(k+1)\\&=a_i(\hat{y}_i(k)-y_i(k))+\tilde{W}_i^T(k+1)S_k(Z)-\epsilon_i\\&=a_ie_i(k)+\tilde{W}_i^T(k+1)S_k(Z)-\epsilon_i,\end{aligned} \tag{14}$$
where $\tilde{W}_i=\hat{W}_i-W_i^{*}$, $W_i^{*}$ is the ideal constant neural network weight, $\phi_i(\cdot;p)=W_i^{*T}S_k+\epsilon_i$, and $\epsilon_i$ is the ideal neural network approximation error. The weight estimate $\hat{W}_i$ is updated by the Lyapunov-based learning law given in Eq. (15).
Fig. 9 Samples of the Euclidean distance of 3D PSR of $[Sub_{11}^{\mu_1}, Sub_{11}^{\mu_2}, Sub_{11}^{\mu_3}, Sub_{11}^{\mu_4}]^T$ of the abnormal heart sound signal. (a) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_1}$ for $\mu_1$. (b) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_2}$ for $\mu_2$. (c) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_3}$ for $\mu_3$. (d) Euclidean distance of 3D PSR of $Sub_{11}^{\mu_4}$ for $\mu_4$. [Figure: $ED_j$ plotted against the number of the data points]
$$\hat{W}_i(k+1)=\hat{W}_i(k)-\frac{\alpha P\left(\hat{y}_i(k)-y_i(k)-a_i(\hat{y}_i(k-1)-y_i(k-1))\right)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}, \tag{15}$$
where $0<|\alpha|<2$, P is any symmetric positive definite matrix, and the weight estimation error of the neural networks $\tilde{W}$ satisfies:
$$\begin{aligned}\tilde{W}_i(k+1)&=\hat{W}_i(k+1)-W_i^{*}\\&=\tilde{W}_i(k)-\frac{\alpha P(\tilde{W}_i(k)S_{k-1}(Z)-\epsilon_i)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\\&=\tilde{W}_i(k)\left[I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right]+\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\end{aligned} \tag{16}$$
Assumption 1 There exists a constant $S_M>0$ such that for all $k\ge 0$, the following bound is satisfied:
$$\|S(Z(k))\|\le S_M \tag{17}$$
The following theorem indicates the learning ability of the above-mentioned identification algorithm for the discrete-time PCG system.
Theorem 1 Consider the adaptive system consisting of the nonlinear PCG system (11), the dynamical RBF network (13) and the neural network weight updating law (15). For almost any recurrent trajectory $\varphi_\zeta$ with initial condition $\hat{W}_i(0)=0$, we have: (1) the state estimation error $e_i(k)$ exponentially converges to a small neighborhood of zero, and the neural network weight estimate $\hat{W}_{\zeta i}$ exponentially converges to a small neighborhood of the ideal weight $W_{\zeta i}^{*}$; (2) a locally accurate approximation of the unknown $\phi_i(\cdot;p_i)$ to the desired error level $\epsilon_i$ is obtained along the trajectory $\varphi_\zeta$ by $\bar{W}_i^TS_k$.
Proof We construct the following form:
$$\begin{bmatrix}z_i(k)\\\tilde{W}_i(k)\end{bmatrix}=\begin{bmatrix}1&-S_{k-1}^T(Z)\\0&1\end{bmatrix}\begin{bmatrix}e_i(k)\\\tilde{W}_i(k)\end{bmatrix}$$
Then, the state estimation error and the neural network weight estimation error are given by Eqs. (18) and (19):
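$$\begin{aligned}z_i(k+1)&=e_i(k+1)-S_k^T(Z)\tilde{W}_i(k+1)\\&=a_ie_i(k)+\tilde{W}_i^T(k+1)S_k(Z)-\epsilon_i-S_k^T(Z)\left\{\tilde{W}_i(k)\left[I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right]+\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right\}\\&=a_ie_i(k)+\left\{\tilde{W}_i(k)\left[I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right]+\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right\}S_k(Z)-\epsilon_i\\&\quad-S_k^T(Z)\left\{\tilde{W}_i(k)\left[I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right]+\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right\}\\&=a_ie_i(k)-\epsilon_i-a_i\tilde{W}S_{k-1}^T(Z)+a_i\tilde{W}^TS_{k-1}^T(Z)\\&=a_iz_i(k)+a_i\tilde{W}S_{k-1}^T(Z)-\epsilon_i\end{aligned} \tag{18}$$
and
$$\tilde{W}_i(k+1)=\tilde{W}_i(k)\left[I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\right]+\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)} \tag{19}$$
Equations (18) and (19) can be transformed into the form of a state equation:
$$\begin{bmatrix}z_i(k+1)\\\tilde{W}_i(k+1)\end{bmatrix}=\begin{bmatrix}a_i&a_iS_{k-1}^T(Z)\\0&I-\frac{\alpha PS_{k-1}^T(Z)S_{k-1}(Z)}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\end{bmatrix}\begin{bmatrix}z_i(k)\\\tilde{W}_i(k)\end{bmatrix}+\begin{bmatrix}-\epsilon_i\\\frac{\alpha PS_{k-1}(Z)\epsilon_i}{1+\lambda_{\max}(P)S_{k-1}^T(Z)S_{k-1}(Z)}\end{bmatrix} \tag{20}$$
By using the local approximation properties of RBF networks, the state estimation error and the weight estimate learning law can be expressed in a unified form as follows:
$$\begin{bmatrix}z_i(k+1)\\\tilde{W}_{\zeta i}(k+1)\end{bmatrix}=\begin{bmatrix}a_i&a_iS_{\zeta(k-1)}^T(Z)\\0&I-\frac{\alpha P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}\end{bmatrix}\begin{bmatrix}z_i(k)\\\tilde{W}_{\zeta i}(k)\end{bmatrix}+\begin{bmatrix}-\epsilon'_{\zeta i}\\\frac{\alpha P_\zeta S_{\zeta(k-1)}(Z)\epsilon'_{\zeta i}}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}\end{bmatrix} \tag{21}$$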
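and
$$\tilde{W}_{\bar{\zeta}i}(k+1)=\tilde{W}_{\bar{\zeta}i}(k)\left[I-\frac{\alpha P_{\bar{\zeta}}S_{\bar{\zeta}(k-1)}^T(Z)S_{\bar{\zeta}(k-1)}(Z)}{1+\lambda_{\max}(P_{\bar{\zeta}})S_{\bar{\zeta}(k-1)}^T(Z)S_{\bar{\zeta}(k-1)}(Z)}\right]+\frac{\alpha P_{\bar{\zeta}}S_{\bar{\zeta}(k-1)}(Z)\epsilon'_{\zeta i}}{1+\lambda_{\max}(P_{\bar{\zeta}})S_{\bar{\zeta}(k-1)}^T(Z)S_{\bar{\zeta}(k-1)}(Z)} \tag{22}$$
Here $(\cdot)_{\zeta i}$ and $(\cdot)_{\bar{\zeta}i}$ stand for terms which are close to the orbit $\varphi_\zeta$ and far away from the orbit $\varphi_\zeta$, respectively. $S_{\zeta k}$ is a subvector of $S_k$, $\hat{W}_{\zeta i}$ is the corresponding weight subvector, and $\epsilon'_{\zeta i}=\epsilon_{\zeta i}-\hat{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(Z)=O(\epsilon_{\zeta i})$ is the approximation error along the trajectory $\varphi_\zeta$.
Now, we first prove the stability of the nominal part of Eq. (21). Based on the properties of RBF networks (Wang and Hill 2006, 2007, 2009), almost any periodic or recurrent trajectory $\varphi_\zeta$ ensures persistence of excitation (PE) of the regressor subvector $S_{\zeta k}$ (Gorinevsky 1995). With Assumption 1, $S_{\zeta k}$ in (21) satisfies the PE condition. Then, there exist constants $\alpha_1>0$, $n>n_\zeta>0$, such that:
$$\alpha_1I\le\sum_{k=j}^{j+n-1}S_{\zeta(k-1)}(Z)S_{\zeta(k-1)}^T(Z),\quad\forall j\ge 1 \tag{23}$$
where $n_\zeta$ is the dimension of $S_{\zeta k}$.
Consider the following Lyapunov function candidate:
$$V_i(k)=\beta z_i^2(k)+\tilde{W}_{\zeta i}^T(k)P_\zeta^{-1}\tilde{W}_{\zeta i}(k) \tag{24}$$
where $\beta>0$. Then, we have:
$$\begin{aligned}\Delta V_i(k)&=V_i(k+1)-V_i(k)\\&=\beta z_i^2(k+1)+\tilde{W}_{\zeta i}^T(k+1)P_\zeta^{-1}\tilde{W}_{\zeta i}(k+1)-\beta z_i^2(k)-\tilde{W}_{\zeta i}^T(k)P_\zeta^{-1}\tilde{W}_{\zeta i}(k)\\&=-\beta(1-a_i^2)z_i^2(k)+2\beta a_i^2z_i(k)S_{\zeta(k-1)}^T(Z)\tilde{W}_{\zeta i}(k)+\beta a_i^2S_{\zeta(k-1)}^T(Z)\tilde{W}_{\zeta i}(k)S_{\zeta(k-1)}^T\tilde{W}_{\zeta i}(k)\\&\quad-S_{\zeta(k-1)}^T(Z)\tilde{W}_{\zeta i}(k)\,\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}\,\tilde{W}_{\zeta i}^T(k)S_{\zeta(k-1)}(Z)\end{aligned} \tag{25}$$
Equation (25) can also be written as:
$$\Delta V_i(k)=-\begin{bmatrix}z_i(k)&S_{\zeta(k-1)}^T(Z)\tilde{W}_{\zeta i}(k)\end{bmatrix}\begin{bmatrix}\beta(1-a_i^2)&-\beta a_i^2\\-\beta a_i^2&\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}-\beta a_i^2\end{bmatrix}\begin{bmatrix}z_i(k)\\\tilde{W}_{\zeta i}^T(k)S_{\zeta(k-1)}(Z)\end{bmatrix} \tag{26}$$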
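Let
$$D(k)=\begin{bmatrix}\beta(1-a_i^2)&-\beta a_i^2\\-\beta a_i^2&\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}-\beta a_i^2\end{bmatrix},$$
when
$$\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}-\beta\left[a_i^2+\frac{a_i^4}{1-a_i^2}\right]>0,$$
that is,
$$\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{1+\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)\left[a_i^2+\frac{a_i^4}{1-a_i^2}\right]S_{k-1}^T(Z)S_{k-1}(Z)}\ge\frac{2\alpha I-\frac{\alpha^2P_\zeta S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}{\lambda_{\max}(P_\zeta)S_{\zeta(k-1)}^T(Z)S_{\zeta(k-1)}(Z)}}{1+\lambda_{\max}(P_\zeta)\left[a_i^2+\frac{a_i^4}{1-a_i^2}\right]S_M^2}>\beta>0.$$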
Then, $D(k)>0$ leads to the result that
$$\begin{cases}\Delta V(k)<0,&\text{for }\{z_i(k),\tilde{W}_{\zeta i}(k)S_{\zeta(k-1)}(Z)\}\\\Delta V(k)\le 0,&\text{for }\{z_i(k),\tilde{W}_{\zeta i}(k)\}\end{cases} \tag{27}$$
Equation (27) means: (1) $z_i(k)$ and $\tilde{W}_{\zeta i}(k)S_{\zeta(k-1)}(Z)$ converge exponentially to zero when $k\to\infty$. Hence, $e_i(k)$ converges exponentially to zero when $k\to\infty$; (2) $\tilde{W}_{\zeta i}$ is uniformly ultimately bounded. Since $\tilde{W}_{\zeta i}(k)S_{\zeta(k-1)}(Z)$ converges exponentially to zero, this implies that $\tilde{W}_{\zeta i}$ converges to a constant vector $\tilde{W}_c$.
It can be deduced from Eq. (27) that $S_{\zeta(k-1)}^T(Z)\tilde{W}_c=0$. Then, we have:
$$S_{\zeta(k-1)}(Z)S_{\zeta(k-1)}^T(Z)\tilde{W}_c=0,\ \ldots,\ S_{\zeta(k-1+n)}(Z)S_{\zeta(k-1+n)}^T(Z)\tilde{W}_c=0$$
Summing the above equations, we have $\tilde{W}_c\sum_{k=j}^{j+n-1}S_{\zeta(k-1+n)}(Z)S_{\zeta(k-1+n)}^T(Z)=0$. For $S_{\zeta k}(Z)$ satisfying Eq. (23), the matrix $\sum_{k=j}^{j+n-1}S_{\zeta(k-1)}(Z)S_{\zeta(k-1)}^T(Z)$ is positive definite; then $\tilde{W}_c=0$, hence $\tilde{W}_{\zeta i}$ converges exponentially to zero.
Thus, the nominal part of Eq. (21) is exponentially stable. Since $\epsilon_{\zeta i}$ is small, both the state estimation error $z_i(k+1)$ and the parameter error $\tilde{W}_{\zeta i}(k+1)$ in Eq. (21) converge exponentially to small neighborhoods of zero, and the size of the neighborhood is determined by the parameter $\epsilon_{\zeta i}$.
The convergence of $\hat{W}_{\zeta i}$ to a small neighborhood of $W_{\zeta i}^{*}$ implies that along the trajectory $\varphi_\zeta$,
$$\begin{aligned}\phi_i(\varphi_\zeta;p_i)&=W_{\zeta i}^{*T}S_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i}\\&=\hat{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)-\tilde{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i}\\&=\hat{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i1}\end{aligned} \tag{28}$$
where $\epsilon_{\zeta i1}=\epsilon_{\zeta i}-\tilde{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)=O(\epsilon_{\zeta i})=O(\epsilon_i)$ is the practical approximation error for using $\hat{W}_{\zeta i}^TS_{\zeta k}$, which is small due to the exponential convergence of $\tilde{W}_{\zeta i}$.
By this convergence result, we can obtain a constant vector of neural weights according to
$$\bar{W}_i=\frac{1}{k_b-k_a+1}\sum_{k=k_a}^{k_b}\hat{W}_i(k) \tag{29}$$
where $\{k_a,\ldots,k_b\}$ represents a time segment after the transient process. Thus, using $\bar{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)$, where $\bar{W}_{\zeta i}$ is the subvector of $\bar{W}_i$, we obtain Eq. (30),
where $\epsilon_{\zeta i2}$ in Eq. (30) is the practical approximation error for using $\bar{W}_{\zeta i}^TS_{\zeta k}$. It is clear that after the transient process, $\epsilon_{\zeta i2}=O(\epsilon_{\zeta i1})=O(\epsilon_i)$.
It can be seen from Eq. (22) that for the neurons with centers far away from the trajectory $\varphi_\zeta$, $S_{\bar{\zeta}k}$ will become very small due to the localization property of RBF networks. In this case, the neural weights $\hat{W}_{\bar{\zeta}i}$ will only be slightly updated. Both $\hat{W}_{\bar{\zeta}i}$ and $\hat{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}$, as well as $\bar{W}_{\bar{\zeta}i}$ and $\bar{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}$, will remain very small. This means that the entire RBF network $\hat{W}_i^TS_k$ can approximate the unknown $\phi_i(\varphi_\zeta;p_i)$ along the trajectory $\varphi_\zeta$ as given in Eq. (31), where $\epsilon_{i1}=\epsilon_{\zeta i1}-\hat{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)=O(\epsilon_{\zeta i1})=O(\epsilon_i)$. Similarly, using Eq. (30), we obtain Eq. (32), where $\epsilon_{i2}=\epsilon_{\zeta i2}-\bar{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)=O(\epsilon_{\zeta i2})=O(\epsilon_i)$. Equations (31) and (32) mean that locally accurate identification of the system dynamics $\phi_i(\cdot;p_i)$ to the desired level $\epsilon_i$ along the trajectory $\varphi_\zeta$ can be achieved by using the RBF network. This completes the proof.
It is seen that the employment of localized RBF networks under periodic or periodic-like (recurrent) inputs yields a guaranteed PE condition. This condition, together with the localization property of RBF networks, leads to the exponential stability of a localized adaptive discrete-time PCG system. In this way, parameter convergence and accurate local approximation of PCG system dynamics can be achieved naturally.
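The identifier of Eq. (13), the learning law of Eq. (15) and the weight averaging of Eq. (29) can be illustrated on a toy recurrent signal. The sketch below is a simplified, scalar ($n=1$) version under several assumptions made here: $P=pI$ so that $\lambda_{\max}(P)=p$, the regressor used in the update is the one that produced the current prediction, the training signal is a sinusoid (which satisfies a second-order recursion, so $m=2$), and the grid of centers, the width and all gains are illustrative values. The function names are hypothetical.

```python
import numpy as np


def gaussian_regressor(Z, centers, width):
    """Regressor S(Z) with Gaussian nodes of a common width."""
    return np.exp(-np.sum((centers - Z) ** 2, axis=1) / width ** 2)


def train_identifier(y, centers, width, a=0.5, alpha=1.0, p=1.0, m=2):
    """Scalar identifier (Eq. 13) trained with the normalized law of Eq. (15), P = p*I."""
    W = np.zeros(len(centers))          # initial condition W_hat(0) = 0, as in Theorem 1
    e_prev = 0.0
    residuals, W_hist = [], []
    for k in range(m, len(y)):
        Z = y[k - m:k][::-1]            # [y(k-1), ..., y(k-m)]
        S = gaussian_regressor(Z, centers, width)
        y_hat = a * e_prev + W @ S      # Eq. (13)
        e = y_hat - y[k]                # state estimation error
        r = e - a * e_prev              # bracketed term of Eq. (15); equals W @ S - y[k]
        W = W - alpha * p * r * S / (1.0 + p * (S @ S))
        e_prev = e
        residuals.append(abs(r))
        W_hist.append(W.copy())
    return W, np.array(residuals), np.array(W_hist)


# Toy usage: a periodic (hence recurrent) trajectory.
k = np.arange(5000)
y = np.sin(2 * np.pi * k / 50)
grid = np.arange(-1.2, 1.2001, 0.3)     # 9 x 9 lattice of centers covering the orbit
centers = np.array([[u, v] for u in grid for v in grid])
W_final, residuals, W_hist = train_identifier(y, centers, width=0.4)
W_bar = W_hist[-500:].mean(axis=0)      # Eq. (29): average over a late time segment
```

After the transient, the one-step residual along the orbit is small compared with the signal itself, and `W_bar` is the constant weight vector that would be stored as the learned model of this trajectory.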
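$$\begin{aligned}\phi_i(\varphi_\zeta;p_i)&=\hat{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i1}\\&=\bar{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i2}\end{aligned} \tag{30}$$
$$\begin{aligned}\phi_i(\varphi_\zeta;p_i)&=\hat{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i1}\\&=\hat{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\hat{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)+\epsilon_{\zeta i1}-\hat{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)\\&=\hat{W}_i^TS_k(\varphi_\zeta)+\epsilon_{i1}\end{aligned} \tag{31}$$
$$\begin{aligned}\phi_i(\varphi_\zeta;p_i)&=\bar{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\epsilon_{\zeta i2}\\&=\bar{W}_{\zeta i}^TS_{\zeta k}(\varphi_\zeta)+\bar{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)+\epsilon_{\zeta i2}-\bar{W}_{\bar{\zeta}i}^TS_{\bar{\zeta}k}(\varphi_\zeta)\\&=\bar{W}_i^TS_k(\varphi_\zeta)+\epsilon_{i2}\end{aligned} \tag{32}$$
2.7 Classification mechanism
In this section, we present a scheme to classify normal and abnormal heart sound signals. Consider a set of training temporal data sequences $\varphi_\zeta^s$, $s=1,\ldots,M$, among which the sth training temporal data sequence $\varphi_\zeta^s$ is generated from the following system:
$$Y^s(k)=F^s(Y^s(k-1),\ldots,Y^s(k-m);p^s)+v^s(Y^s(k-1),\ldots,Y^s(k-m);p^s) \tag{33}$$
where $Y^s(k)=[y_1^s(k),\ldots,y_n^s(k)]^T\in R^n$ is the state of the system, which is measurable, $p^s$ is a constant vector of system parameters, $F^s(\cdot;p^s)=[f_1^s(\cdot;p_1^s),\ldots,f_n^s(\cdot;p_n^s)]^T$ denotes the PCG system dynamics, and $v^s(\cdot;p^s)=[v_1^s(\cdot;p_1^s),\ldots,v_n^s(\cdot;p_n^s)]^T$ denotes the modeling uncertainty.
As mentioned above, the general PCG system dynamics $\phi^s(\cdot;p):=F^s(\cdot;p)+v^s(\cdot;p)$ can be accurately derived and preserved in constant RBF neural networks $\bar{W}_i^TS_k(Y^s(k))$.
Consider $\varphi_\varsigma$ generated from Eq. (11) as a test temporal data sequence. For the sth training temporal data sequence $\varphi_\zeta^s$, a dynamical model is constructed by using the time-invariant representation $\bar{W}^{sT}S_k$, as given in Eq. (34).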
where, in Eq. (34), $\bar{Y}^s(k)=[\bar{y}_1^s(k),\ldots,\bar{y}_n^s(k)]^T\in R^n$ is the state vector of the dynamical model, $B=\mathrm{diag}\{b_1,\ldots,b_n\}$ is a diagonal matrix that is kept the same for all training sequences, $S_k(Z)=S(Y(k-1),\ldots,Y(k-m))$, and $[Y(k-1),\ldots,Y(k-m)]$ is the test temporal data sequence $\varphi_\varsigma$ generated from Eq. (11). Then, corresponding to the test temporal data sequence $\varphi_\varsigma$ and the dynamical model (34), we obtain the recognition error system given in Eq. (35), where $e_i^s(k)=y_i(k)-\bar{y}_i^s(k)$ and $|b_i|<1$.
The error $|e_i^s(k)|$ can effectively measure the similarity between the test sequence $\varphi_\varsigma$ and the training sequence $\varphi_\zeta^s$. We compute the average $L_p$ norm of $e_i^s(k)$; for example, for $p=1$ it is given by Eq. (36).
Hence, we have the following classification method for temporal data sequences:
Consider the recognition error system consisting of Eqs. (11), (34) and (35). Among the M dynamical models, if the error $\|e_i^s(k)\|_{L_1}$ between the sth dynamical model and the test temporal data sequence $\varphi_\varsigma$ is the smallest one, then the test temporal data sequence $\varphi_\varsigma$ is said to be most similar to the training temporal data sequence $\varphi_\zeta^s$.
The fundamental idea of the classification of abnormal heart sound signals is that if a test heart sound signal pattern is similar to the trained heart sound signal pattern $s\ (s\in\{1,\ldots,k\})$, the constant RBF network $\bar{W}^{sT}S_k$ embedded in the matched estimator s will quickly recall the learned knowledge by providing an accurate approximation of the PCG system dynamics. Thus, the corresponding error $\|e_i^s(k)\|_{L_1}$ will become the smallest among all the errors $\|e_i^k(k)\|_{L_1}$. Based on the smallest error principle, the appearing test heart sound signal pattern can be classified.
Classification scheme If there exists some finite time $t^s$, $s\in\{1,\ldots,k\}$, and some $i\in\{1,\ldots,n\}$ such that $\|e_i^s(k)\|_{L_1}<\|e_i^k(k)\|_{L_1}$ for all $t>t^s$, then the appearing PCG system pattern can be classified and an abnormal heart sound signal can be detected.
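The smallest error principle can be sketched with the recognition error recursion of Eq. (35) and the average $L_1$ norm of Eq. (36). This is a scalar illustration under assumptions made here: each stored model is represented by a plain callable standing in for $\bar{W}^{sT}S_k$, the unknown $\phi_i$ is observed through the next sample of the test sequence, and the names `average_l1_error` and `classify`, the labels and the toy models are all hypothetical.

```python
import numpy as np


def average_l1_error(y_test, model, b=0.5, m=1):
    """Run e(k) = b*e(k-1) + (y(k) - model(Z)) and return the mean of |e(k)| (Eqs. 35, 36)."""
    e = 0.0
    total = 0.0
    count = 0
    for k in range(m, len(y_test)):
        Z = y_test[k - m:k][::-1]               # [y(k-1), ..., y(k-m)]
        e = b * e + (y_test[k] - model(Z))      # y(k) plays the role of phi(.; p)
        total += abs(e)
        count += 1
    return total / count


def classify(y_test, models, labels, b=0.5, m=1):
    """Return the label of the stored model with the smallest average L1 error."""
    errors = [average_l1_error(y_test, f, b, m) for f in models]
    return labels[int(np.argmin(errors))], errors


# Toy usage: two stored "models" and a test sequence generated by the first one.
models = [lambda Z: 0.5 * Z[0], lambda Z: -0.5 * Z[0]]
labels = ["normal", "abnormal"]
y_test = 0.5 ** np.arange(30)                   # y(k) = 0.5 * y(k-1)
label, errors = classify(y_test, models, labels)
```

In practice each callable would evaluate a trained constant RBF network on the regressor of the test sequence, and the label of the best-matching training pattern decides between normal and abnormal.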
$$\bar{Y}^s(k)=B(\bar{Y}^s(k-1)-Y(k-1))+\bar{W}^{sT}S_k, \tag{34}$$
$$e_i^s(k)=b_ie_i^s(k-1)+(\phi_i(\cdot;p_i)-\bar{W}_i^{sT}S_k),\quad i=1,\ldots,n,\ k=1,\ldots,M, \tag{35}$$
$$\|e_i^s(k)\|_{L_1}=\frac{1}{k}\sum_{j=1}^{k}|e_i^s(j)|,\quad s=1,\ldots,M \tag{36}$$
3 Experimental results
Experiments are implemented in MATLAB and tested on an Intel Core i7 6700K 3.5 GHz computer with 64 GB RAM. We assign feature vector sequences for all the normal and abnormal heart sound signals in the PhysioNet/CinC Challenge 2016 heart sound database. Features are extracted according to the method described in Sect. 2.5, which means the input of the RBF neural networks is $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$. In order to eliminate differences in scale between features, all feature data are normalized to $[-1, 1]$.
Several experiments are carried out to verify the effectiveness of the proposed method. The classification results are evaluated with 10-fold cross-validation, which reduces the variance of the performance estimate for the classifiers. The data set is divided into ten subsets. Each time, one of the ten subsets is used as the test set and the other nine subsets are put together to form the training set. As such, every fold is used nine times as training data and once as test data. The final result is the average of the 10 runs. For the evaluation, the sensitivity ($Se$), the specificity ($Sp$), the overall score ($Sc$) of the sensitivity and the specificity, and the accuracy ($ACC$) are used and defined as follows (Clifford et al. 2016):
$$Se=\frac{TP}{TP+FN}\times 100(\%), \tag{37}$$
$$Sp=\frac{TN}{TN+FP}\times 100(\%), \tag{38}$$
$$Sc=\frac{Se+Sp}{2}, \tag{39}$$
$$ACC=\frac{TP+TN}{TP+TN+FN+FP}\times 100(\%), \tag{40}$$
where TP is the number of true positives referring to the abnormal heart sound signals, FN is the number of false negatives referring to the misidentified abnormal heart sound signals, TN is the number of true negatives referring to the correctly detected normal heart sound signals, and FP is the number of false positives referring to the misidentified normal heart sound signals. The overall score is also referred to as the mean accuracy ($MACC$) in some of the literature.
The classification results on normal and abnormal heart sound signals (with the two different data balancing methods mentioned before) are given in Tables 3 and 4 with 10-fold cross-validation. We apply three types of features to verify and compare their classification performance: (1) derived from TQWT+PSR/ED; (2) derived from VMD+PSR/ED; and (3) derived from TQWT, VMD, PSR and ED (proposed features). Here, when only applying TQWT+PSR/ED, we use the 11th subband of the 10-level TQWT of the heart sound signal together with PSR/ED as the feature, which is represented as $ED_j^{Sub_{11}}$. When only applying VMD+PSR/ED, we use the first four intrinsic modes of the heart sound signal together with PSR/ED as the features, which are represented as $[ED_j^{\mu_1}, ED_j^{\mu_2}, ED_j^{\mu_3}, ED_j^{\mu_4}]^T$. It is seen from Tables 3 and 4 that the classification performance of the proposed features is superior to that of the other two features. Overall, our classification approach achieves good performance, which indicates that the proposed pattern classification system can effectively detect abnormal heart sound signals by using nonlinear features and neural-network-based classification tools.

Table 3 Classification performance of the proposed features and its comparison with other two features on selected balanced recordings evaluated by 10-fold cross-validation. Total numbers of the abnormal and normal recordings are 472 and 472, respectively
Evaluated features    Predicted groups    Actual groups          Se (%)   Sp (%)   Sc (%)   ACC (%)
                                          Normal    Abnormal
TQWT+PSR/ED           Normal              392       80           85.38    83.05    84.22    84.22
                      Abnormal            69        403
VMD+PSR/ED            Normal              404       68           87.29    85.59    86.44    86.44
                      Abnormal            60        412
Proposed features     Normal              461       11           97.46    97.67    97.57    97.56
                      Abnormal            12        460
Features: TQWT+PSR/ED uses $ED_j^{Sub_{11}}$; VMD+PSR/ED uses $[ED_j^{\mu_1}, ED_j^{\mu_2}, ED_j^{\mu_3}, ED_j^{\mu_4}]^T$; the proposed features are $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$
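The evaluation measures in Eqs. (37) to (40) can be checked with a few lines of code. The sketch below is illustrative; the function name `classification_metrics` is hypothetical, and the counts in the usage example (TP = 460, FN = 12, TN = 461, FP = 11) are the ones that reproduce the sensitivity and specificity reported for the proposed features in Table 3.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, overall score and accuracy, all in percent (Eqs. 37-40)."""
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    sc = (se + sp) / 2.0
    acc = 100.0 * (tp + tn) / (tp + tn + fn + fp)
    return se, sp, sc, acc


# Usage with the counts for the proposed features on the 472/472 balanced recordings.
se, sp, sc, acc = classification_metrics(tp=460, fn=12, tn=461, fp=11)
```

Rounded to two decimals this gives Se = 97.46, Sp = 97.67 and ACC = 97.56; the overall score evaluates to about 97.56 from the unrounded values, and to the tabulated 97.57 when the two rounded percentages are averaged.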
4 Discussion
Experimental results of this study demonstrate that abnormal heart sound signals can be detected automatically by means of nonlinear features and a neural-network-based artificial intelligence tool. The proposed scheme focuses not only on providing evidence to support the claim that pathological patients demonstrate altered PCG system dynamics compared to normal subjects, but also on providing an automatic, objective and computationally convenient method to distinguish between normal and abnormal heart sound signals.
Potes et al. (2016) used two classifiers, an AdaBoost classifier and a CNN. They first extracted 124 time-frequency features from the PCG signal and used them as input to a variant of the AdaBoost classifier. Then they decomposed the PCG cardiac cycles into four frequency bands, which were used as input to the CNN for training. Finally, they classified the normal and abnormal heart sound signals based on an ensemble of classifiers combining the outputs of AdaBoost and the CNN. The reported best performance was a sensitivity of 94.24%, a specificity of 77.81% and an overall score of 86.02%.
Dominguez-Morales et al. (2017) divided the heart sound recordings into windows of a specific time length. Then they sent these segments of the original sound to a Neuromorphic Auditory Sensor, which could decompose the audio into frequency bands and packetize the information. Finally, this information was converted to sonogram images, which were fed to the CNN for classification by using deep learning algorithms. The reported best performance with 10-fold cross-validation was an accuracy of 97.05%, a sensitivity of 95.12%, a specificity of 93.20% and an overall score of 94.16%.
Beritelli et al. (2018) extracted features from PCG signals by using Gram polynomials and the Fourier transform. Afterwards, the features were fed to probabilistic neural networks for classification. The best reported performance with 10-fold cross-validation was an accuracy of 94%, a sensitivity of 93%, a specificity of 91%, and an overall score of 92%.

Table 4 Classification performance of the proposed features and its comparison with other two features on balanced recordings with SMOTE method evaluated by 10-fold cross-validation. Total numbers of the abnormal and normal recordings are 2554 and 2619, respectively

| Evaluation methods | Predicted groups | Actual groups: Normal | Actual groups: Abnormal | Se (%) | Sp (%) | Sc (%) | ACC (%) |
|---|---|---|---|---|---|---|---|
| TQWT+PSR/ED: $ED_j^{Sub_{11}}$ | Normal | 2235 | 384 | 84.30 | 85.34 | 84.82 | 84.83 |
| | Abnormal | 401 | 2153 | | | | |
| VMD+PSR/ED: $[ED_j^{\mu_1}, ED_j^{\mu_2}, ED_j^{\mu_3}, ED_j^{\mu_4}]^T$ | Normal | 2277 | 342 | 86.06 | 86.94 | 86.50 | 86.51 |
| | Abnormal | 356 | 2198 | | | | |
| Proposed features: $[ED_j^{Sub_{11}^{\mu_1}}, ED_j^{Sub_{11}^{\mu_2}}, ED_j^{Sub_{11}^{\mu_3}}, ED_j^{Sub_{11}^{\mu_4}}]^T$ | Normal | 2568 | 51 | 97.73 | 98.05 | 97.89 | 97.89 |
| | Abnormal | 58 | 2496 | | | | |
Bozkurt et al. (2018) extracted features from the heart sound signal by using Mel-Spectrogram, MFCC and subband envelopes. These features were used as input to a CNN classifier, and the best reported performance with 10-fold cross-validation was an accuracy of 81.5%, a sensitivity of 84.5%, a specificity of 78.5%, and an overall score of 81.5%.
Zhang et al. (2019) extracted the spectrogram of the heart sound signal by using the short-time Fourier transform. Following that, they calculated temporal quasi-periodic features with the average magnitude difference function in each frequency band of the heart sound spectrogram. The extracted features were fed to a two-layer LSTM neural network for classification. The best reported performance with 10-fold cross-validation was a sensitivity of 96.15%, a specificity of 93.18%, and an overall score of 94.66%.
Adiban et al. (2019) constructed a fixed-length feature vector from the heart sound signal by using MFCC features. Afterwards, a Principal Component Analysis (PCA) transform and a Variational Autoencoder (VA) were used to reduce the feature dimension. Finally, the reduced feature vector was fed to Gaussian Mixture Models and an SVM for classification. The best reported performance was a sensitivity of 92.28%, a specificity of 94.95%, and an overall score of 93.61%.
Xiao et al. (2019) took 3-s 1-D PCG waveforms as the inputs of a CNN. At first, the initial low-level features were extracted by 64 convolutional filters. Then max pooling layers were used to further reduce the spatial size of the feature maps. After that, the feature maps were fed to stacked clique blocks. The best reported performance with 10-fold cross-validation was an accuracy of 93%, a sensitivity of 86%, a specificity of 95%, and an overall score of 91%.
Das et al. (2019) extracted three kinds of features from the PCG signal, namely MFCC, short-time Fourier transform and Cochleagram features, and then fed them to a supervised artificial neural network for classification. The best reported performance with 10-fold cross-validation was an accuracy of 93.7%, a sensitivity of 84.5%, a specificity of 95.2%, and an overall score of 89.9%.
Different from the methods discussed above, this study proposes a hybrid method to extract nonlinear features using TQWT, VMD, PSR and ED techniques. These features are fed into dynamical estimators, which consist of constant RBF neural networks, to classify normal and abnormal heart sound signals. A comparison of the classification performance with other state-of-the-art methods on the same database is given in Table 5. The proposed method provides sensitivity, specificity, overall score and accuracy values of 97.73%, 98.05%, 97.89%, and 97.89%, respectively, under 10-fold cross-validation. In contrast to other methods, which put feature vectors directly into the classifier, this study employs modeling, identification and classification of PCG system dynamics. This provides another candidate tool for the detection of abnormal heart sound signals.
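The four reported figures can be recomputed from the confusion-matrix counts of the proposed features in Table 4. The following is a minimal sketch under two assumptions that are not stated in this section: each table row is read as an actual class (which is what the row totals of 2619 and 2554 imply), and the overall score Sc is taken as the mean of sensitivity and specificity, which is consistent with every row of Table 5. The function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return (Se, Sp, Sc, ACC) in percent from confusion-matrix counts.

    'Positive' means abnormal. Sc is assumed to be the mean of Se and Sp.
    """
    se = 100.0 * tp / (tp + fn)   # sensitivity: abnormal recordings detected
    sp = 100.0 * tn / (tn + fp)   # specificity: normal recordings recognized
    sc = (se + sp) / 2.0          # overall score (assumed definition)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return se, sp, sc, acc


# Counts for the proposed features in Table 4 (2619 normal, 2554 abnormal).
se, sp, sc, acc = classification_metrics(tn=2568, fp=51, fn=58, tp=2496)
print(f"Se={se:.2f}  Sp={sp:.2f}  Sc={sc:.2f}  ACC={acc:.2f}")
```

With these counts the sketch reproduces the reported 97.73, 98.05, 97.89 and 97.89.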
Table 5 Summary of classification performance on the normal and abnormal heart sound signals with 10-fold cross-validation style obtained from the same PhysioNet/CinC Challenge 2016 heart sound database in the literature

| References | Features | Classifier | Sensitivity (%) | Specificity (%) | Overall score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Potes et al. (2016) | Using time-frequency features | AdaBoost and CNN | 94.24 | 77.81 | 86.02 | Not mentioned |
| Dominguez-Morales et al. (2017) | Using sonogram images converted from frequency bands of PCG | CNN | 95.12 | 93.20 | 94.16 | 97.05 |
| Beritelli et al. (2018) | Using features extracted from Gram polynomials and the Fourier transform | Probabilistic neural networks classifier | 93 | 91 | 92 | 94 |
| Bozkurt et al. (2018) | Using features extracted from Mel-Spectrogram, MFCC, subband envelopes | CNN | 84.5 | 78.5 | 81.5 | 81.5 |
| Zhang et al. (2019) | Using heart sound spectrogram features | LSTM | 96.15 | 93.18 | 94.66 | Not mentioned |
| Adiban et al. (2019) | Using MFCC features | Gaussian Mixture Models and SVM | 92.28 | 94.95 | 93.61 | Not mentioned |
| Xiao et al. (2019) | 3-s 1-D waveform with 64 convolutional filters | CNN | 86 | 95 | 91 | 93 |
| Das et al. (2019) | MFCC, short-time Fourier transform and Cochleagram features | Supervised artificial neural network | 84.5 | 95.2 | 89.9 | 93.7 |
| Proposed work | Extracted through TQWT, VMD, PSR and ED | Dynamical estimators consisting of neural networks | 97.73 | 98.05 | 97.89 | 97.89 |

In TQWT the variation of the Q-factor affects the computed features at different oscillatory levels. Increasing Q improves the system accuracy up to a best-performing value, beyond which any further increase in Q reduces the system performance. Increasing R, while keeping Q unchanged, has the effect of increasing the overlap between adjacent frequency responses. The parameter R does not affect the general shape of the wavelet or of the frequency response spectrum (these are controlled by Q). With a larger R, the number of levels J should be increased in order to cover the same frequency range because of the increased overlap. The value of J has been restricted to 15 in the present study because higher values of J lead to higher-dimensional feature matrices, which in turn increase the computational burden. Several experiments are performed for an optimum selection of the Q-factor and J values. The R value is fixed at 3, because as R increases, the overlap between adjacent frequency responses also increases. For Q and J the minimum value is selected as 1. Hence, Q is varied from 1 to 10 and J is varied from 1 to 15. The features are then computed from the subband with the majority of the heart sound signal's energy and fed into RBF neural networks for the modeling, identification and classification of PCG system dynamics based on deterministic learning theory. Figures 10 and 11 depict the effect of varying the Q-factor and the level J on the classification performance. It can be observed from Fig. 10 that the classification accuracy varies significantly with the Q-factor value, with the highest accuracy obtained for $Q = 3$. Classification accuracy decreases with further increases in the Q-factor value. Therefore, the optimum value of the Q-factor is found to be 3 in the present study. The optimal value of J is determined in the same manner. It can be observed from Fig. 11 that the maximum accuracy is achieved for $J = 10$. The experimental results demonstrate that features based on the time-frequency properties of TQWT are quite effective in representing the behavior of cardiac sound signals, giving higher classification performance. One way to increase the classification performance of our method could be to fine-tune the parameters of the TQWT on a subject-by-subject basis, so as to account for inter-individual differences. To what extent the performance can be improved by modifying the tunable parameters of TQWT (globally or for each individual) is not clear and could be the focus of further investigation.
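The statement that a larger R requires more levels J to cover the same frequency range can be illustrated numerically. The sketch below assumes the standard TQWT relations from Selesnick (2011), namely $\beta = 2/(Q+1)$, $\alpha = 1 - \beta/R$, and an approximate subband center frequency $f_c(j) = \alpha^j \frac{2-\beta}{4\alpha} f_s$. The sampling rate of 2000 Hz, the 100 Hz target and all function names are illustrative assumptions, not values taken from this study.

```python
def tqwt_scaling(q, r):
    """Return (alpha, beta), the low-pass and high-pass scaling factors."""
    beta = 2.0 / (q + 1.0)
    alpha = 1.0 - beta / r
    return alpha, beta


def center_frequency(j, q, r, fs):
    """Approximate center frequency in Hz of subband j (j >= 1)."""
    alpha, beta = tqwt_scaling(q, r)
    return (alpha ** j) * (2.0 - beta) / (4.0 * alpha) * fs


def levels_to_reach(f_target, q, r, fs, j_max=200):
    """Smallest level j whose center frequency is at or below f_target."""
    for j in range(1, j_max + 1):
        if center_frequency(j, q, r, fs) <= f_target:
            return j
    raise ValueError("f_target not reached within j_max levels")


fs = 2000.0  # assumed sampling rate in Hz (illustrative)
for r in (3, 6):
    alpha, beta = tqwt_scaling(q=3, r=r)
    levels = levels_to_reach(100.0, q=3, r=r, fs=fs)
    print(f"R={r}: alpha={alpha:.4f}, beta={beta:.4f}, levels to reach 100 Hz: {levels}")
```

Because $\alpha$ moves closer to 1 as R grows, the center frequencies decay more slowly from one level to the next, so more levels are needed to reach the same low-frequency band.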
Fig. 10 Variation of classification accuracy with Q-factor on balanced recordings with SMOTE method (x-axis: Q-factor, 1 to 10; y-axis: accuracy (%))

PSR can reduce the effects of noise or outliers in the PCG signals. Hence, features extracted in phase space might help improve the classification results. The most visual way to observe the dynamic behavior of a chaotic system is through the phase space, which is the track record of the chaotic system and can reflect the changes of the system state. For convenience of observation, a phase space is often studied to directly judge the nonlinear dynamic behavior of chaotic systems. For example, for periodic motion, the phase diagram trajectory is a simple closed curve. Because heart sound is a quasi-periodic signal, we further use the phase space to analyze the chaotic characteristics of the heart sound. In this work, we have confined our discussion to the embedding dimension $d = 3$ because of its visualization simplicity. In addition, different studies have found this value to best represent the attractor for human biological systems (Venkataraman and Turaga 2016; Som et al. 2016). From a theoretical viewpoint, the time lag $\tau$ has little impact on the classification performance, and in fact there are no limitations or assumptions placed upon it with respect to the underlying time-lag reconstruction theorems for discrete-time signals (Sauer et al. 1991). However, since topological invariance of systems does not equate to identical phase spaces or attractors, from a practical viewpoint the lag must be selected with respect to some relevant criterion (Johnson et al. 2005), such as the first zero crossing of the autocorrelation function for each time series or the average $\tau$ value obtained from all the time series in the training dataset using the method proposed in Michael (2005). The dimension d is held constant and the classification task is implemented with the time lag varying across a range of 1–20. It can be observed from Fig. 12 that the accuracy is highest for a lag of 5, with a decline followed by a second, lower peak at lag 12. However, to what extent the classification performance can be improved by modifying the dimension and time lag is not clear, and the construction of a regulation principle for the PSR parameters will be considered in future research.
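The reconstruction and the lag criterion mentioned above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the embedding follows the usual time-delay construction with $d = 3$, the lag is chosen by the first zero crossing of the autocorrelation function, and the Euclidean distance is assumed here to be the distance of each reconstructed point from the origin. The function names and the toy sine signal are hypothetical.

```python
import math


def delay_embed(x, d=3, tau=5):
    """Time-delay embedding: row n is [x[n], x[n+tau], ..., x[n+(d-1)*tau]]."""
    n_points = len(x) - (d - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for this d and tau")
    return [[x[n + k * tau] for k in range(d)] for n in range(n_points)]


def euclidean_distances(points):
    """Distance of every reconstructed point from the origin (assumed ED feature)."""
    return [math.sqrt(sum(c * c for c in p)) for p in points]


def first_zero_autocorr_lag(x):
    """First lag at which the mean-removed autocorrelation is <= 0."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    for lag in range(1, len(xc)):
        r = sum(xc[n] * xc[n + lag] for n in range(len(xc) - lag))
        if r <= 0.0:
            return lag
    return None  # no zero crossing found


# Toy quasi-periodic signal standing in for one intrinsic mode.
signal = [math.sin(2 * math.pi * n / 40) for n in range(400)]
tau = first_zero_autocorr_lag(signal)
trajectory = delay_embed(signal, d=3, tau=tau)
ed = euclidean_distances(trajectory)
print(tau, len(trajectory), len(ed))
```

For a sinusoid the selected lag lands near a quarter of the period, which spreads the trajectory into an open loop rather than collapsing it onto the diagonal.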
Fig. 11 Variation of classification accuracy with decomposition level J on balanced recordings with SMOTE method (x-axis: decomposition level J, 1 to 15; y-axis: accuracy (%))

The TQWT can be used to extract the dynamical changes in abnormal PCG signals with respect to normal ones. It is a nonlinear method and hence able to capture the subtle variations in the PCG signals, which results in high accuracy. Decomposing signals with VMD is considered insightful because it provides more descriptive details about the original signal. For example, a signal that is decomposed into 4 intrinsic modes is more descriptive than one decomposed into 2 intrinsic modes. VMD is essentially a set of adaptive Wiener filter banks, which transforms signal decomposition into a variational problem and can decompose a signal into an ensemble of band-limited modes concurrently in a non-recursive way. 3D phase spaces of the predominant intrinsic modes are reconstructed, in which properties associated with the PCG system dynamics are preserved. PSR plots the trajectories of the selected $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ intrinsic modes of the 11th subband in a 3D phase space diagram and thereby visualizes the PCG system dynamics. Features derived from TQWT, VMD, 3D PSR and ED may better reflect the abnormal alterations in the dynamics of the PCG system and can achieve high sensitivity and specificity simultaneously as a discriminator of abnormal heart sound signals. Feeding these features into the RBF neural networks for the modeling and identification of PCG system dynamics could greatly improve the modeling accuracy, which benefits the anomaly (normal vs. abnormal) detection of PCG recordings.
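The notion of predominant intrinsic modes can be made concrete by ranking modes by their share of the total signal energy. The sketch below is illustrative only: it assumes the modes have already been produced by some VMD implementation and are available as plain lists of samples, and the helper names and synthetic modes are hypothetical.

```python
import math


def energy_shares(modes):
    """Fraction of the total energy carried by each mode (list of sample lists)."""
    energies = [sum(v * v for v in mode) for mode in modes]
    total = sum(energies)
    if total == 0.0:
        raise ValueError("all modes are zero")
    return [e / total for e in energies]


def predominant_modes(modes, k=4):
    """Indices of the k highest-energy modes, returned in their original order."""
    shares = energy_shares(modes)
    top = sorted(range(len(modes)), key=lambda i: shares[i], reverse=True)[:k]
    return sorted(top)


# Synthetic stand-ins for six modes with decreasing amplitude.
amplitudes = [1.0, 0.8, 0.6, 0.4, 0.1, 0.05]
modes = [
    [a * math.sin(2 * math.pi * (i + 1) * n / 200) for n in range(1000)]
    for i, a in enumerate(amplitudes)
]
shares = energy_shares(modes)
selected = predominant_modes(modes, k=4)
print(selected, round(sum(shares[i] for i in selected), 3))
```

The selected modes would then be passed on to phase space reconstruction, while the low-energy remainder is discarded.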
Fig. 12 Variation of classification accuracy with time lag $\tau$ at dimension 3 on balanced recordings with SMOTE method (x-axis: time lag $\tau$, 1 to 20; y-axis: accuracy (%))

5 Conclusions
In this study, we propose a new approach including TQWT, VMD, PSR and ED for the detection of abnormal heart sound signals, which is computationally simple and easy to implement. The results of this study indicate that the pattern classification of heart sound signals can offer an objective method to assess the disparity of PCG system dynamics between normal subjects and pathological patients with heart diseases. However, some limitations remain to be overcome, such as the limited size of the database and the lack of a regulation principle for the TQWT and PSR parameters. Future work will include a clinical validation of the proposed technique with a larger number of pathological patients with different heart diseases. Assessments of the mathematical relationship between the embedding dimension, time lag, Q-factor, redundancy, decomposition level and the classification accuracy can also be considered in future investigations. In the present study we did not regroup the PhysioNet/CinC Challenge 2016 database in a patient-wise manner, since we did not analyze patients with a variety of illnesses on a case-by-case basis. In future research we will regroup the database in a patient-wise manner and consider the impact of the patients' illnesses (such as heart valve defects and coronary artery disease) on the effectiveness of the stratified classification model. Features introduced in other methods, such as various entropies, the Hurst exponent, fractal dimension and other nonlinear features, can also be explored in the proposed framework to evaluate its classification performance. The proposed automated detection system can assist physicians in cross-checking their diagnosis of heart diseases.
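A patient-wise regrouping of the kind planned above means that all recordings from one patient fall into the same cross-validation fold, so that no patient contributes to both training and testing. The following is a minimal sketch of one way to build such folds. The patient labels, the greedy balancing rule and the function name are assumptions for illustration and are not part of the present study.

```python
from collections import defaultdict


def patient_wise_folds(patient_ids, n_folds=10):
    """Assign recordings to folds so that no patient spans two folds.

    patient_ids[i] is the (hypothetical) patient label of recording i.
    Patients are placed greedily, largest first, into the currently
    smallest fold. Returns a list of folds, each a list of recording indices.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds


# Thirty recordings from seven hypothetical patients, split into three folds.
ids = ["p%d" % (i % 7) for i in range(30)]
folds = patient_wise_folds(ids, n_folds=3)
for k, fold in enumerate(folds):
    print(k, sorted({ids[i] for i in fold}), len(fold))
```

Each fold is then held out in turn for testing, exactly as in ordinary 10-fold cross-validation, but with the split made over patients rather than over individual recordings.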
Acknowledgements This work was supported by the National Natural Science Foundation of China (Grant
No. 61773194), by the Natural Science Foundation of Fujian Province (Grant No. 2018J01542), by the Pro-
gram for New Century Excellent Talents in Fujian Province University and by the Training Program of
Innovation and Entrepreneurship for Undergraduates (Grant No. 201911312009).
Compliance with ethical standards
Conflict of interest There is no conflict of interest.
References
Adiban M, BabaAli B, Shehnepoor S (2019) I-vector based features embedding for heart sound classification. arXiv preprint arXiv:1904.11914
Alam U, Asghar O, Khan SQ, Hayat S, Malik RA (2010) Cardiac auscultation: an essential clinical skill in
decline. Br J Cardiol 17(1):8
Babu KA, Ramkumar B, Manikandan MS (2018) Automatic identification of S1 and S2 heart sounds using
simultaneous PCG and PPG recordings. IEEE Sens J 18(22):9430–9440
Beritelli F, Capizzi G, Sciuto GL, Napoli C, Scaglione F (2018) Automatic heart activity diagnosis based on
Gram polynomials and probabilistic neural networks. Biomed Eng Lett 8(1):77–85
Boutana D, Benidir M, Barkat B (2011) Segmentation and identification of some pathological phonocardio-
gram signals using time-frequency analysis. IET Signal Process 5(6):527–537
Bozkurt B, Germanakis I, Stylianou Y (2018) A study of time-frequency features for CNN-based automatic
heart sound classification for pathology detection. Comput Biol Med 100:132–143
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling
technique. J Artif Intell Res 16:321–357
Cheema A, Singh M (2019) An application of phonocardiography signals for psychological stress detection
using non-linear entropy based features in empirical mode decomposition domain. Appl Soft Comput
77:24–33
Chen B, He Z, Chen X, Cao H, Cai G, Zi Y (2011) A demodulating approach based on local mean decom-
position and its applications in mechanical fault diagnosis. Meas Sci Technol 22(5):055704
Chen M, Fang Y, Zheng X (2014) Phase space reconstruction for improving the classification of single trial
EEG. Biomed Signal Process Control 11:10–16
Clifford GD, Liu C, Moody B, Springer D, Silva I, Li Q, Mark RG (2016) Classification of normal/abnor-
mal heart sound recordings: the PhysioNet/computing in cardiology challenge 2016. In: 2016 Comput-
ing in cardiology conference (CinC), pp 609–612
Das S, Pal S, Mitra M (2019) Supervised model for Cochleagram feature based fundamental heart sound
identification. Biomed Signal Process Control 52:32–40
Deng SW, Han JQ (2016) Towards heart sound classification without segmentation via autocorrelation fea-
ture and diffusion maps. Future Gener Comput Syst 60:13–21
Dominguez-Morales JP, Jimenez-Fernandez AF, Dominguez-Morales MJ, Jimenez-Moreno G (2017) Deep
neural networks for the recognition and classification of heart murmurs using neuromorphic auditory
sensors. IEEE Trans Biomed Circuits Syst 12(1):24–34
Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Process
62(3):531–544
Feng W, Dauphin G, Huang W, Quan Y, Bao W, Wu M, Li Q (2019) Dynamic synthetic minority over-
sampling technique-based rotation forest for the classification of imbalanced hyperspectral data. IEEE
J Sel Top Appl Earth Obs Remote Sens 12(7):2159–2169
Gavrovska A, Zajic G, Bogdanovic V, Reljin I, Reljin B (2016) Paediatric heart sound signal analysis
towards classification using multifractal spectra. Physiol Meas 37(9):1556
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng
CK, Stanley HE (2003) PhysioBank, physioToolkit, and physioNet: components of a new research
resource for complex physiologic signals. Circulation 101(23):e215–e220
Gorinevsky D (1995) On the persistency of excitation in radial basis function network identification of non-
linear systems. IEEE Trans Neural Netw 6(5):1237–1244
Hamidi M, Ghassemian H, Imani M (2018) Classification of heart sound signal using curve fitting and frac-
tal dimension. Biomed Signal Process Control 39:351–359
Hassan AR, Siuly S, Zhang Y (2016) Epileptic seizure detection in EEG signals using tunable-Q factor
wavelet transform and bootstrap aggregating. Comput Methods Programs Biomed 137:247–259
Hassani K, Bajelani K, Navidbakhsh M, Doyle DJ, Taherian F (2014) Heart sound segmentation based on
homomorphic filtering. Perfusion 29(4):351–359
Huang B, Kunoth A (2013) An optimization based empirical mode decomposition scheme. J Comput Appl
Math 240:174–183
Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Liu HH (1998) The empirical mode decomposi-
tion and Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A
Math Phys Eng Sci 454(1971):903–995
Humayun AI, Ghaffarzadegan S, Ansari MI, Feng Z, Hasan T (2020) Towards domain invariant heart sound abnormality detection using learnable filterbanks. IEEE J Biomed Health Inform. https://doi.org/10.1109/JBHI.2020.2970252
Jain PK, Tiwari AK (2018) A robust algorithm for segmentation of phonocardiography signal using tunable
quality wavelet transform. J Med Biol Eng 38(3):396–410
Johnson MT, Povinelli RJ, Lindgren AC, Ye J, Liu X, Indrebo KM (2005) Time-domain isolated phoneme
classification using reconstructed phase spaces. IEEE Trans Speech Audio Process 13(4):458–466
Lal GJ, Gopalakrishnan EA, Govind D (2018) Epoch estimation from emotional speech signals using vari-
ational mode decomposition. Circuits Syst Signal Process 37(8):3245–3274
Langley P, Murray A (2017) Heart sound classification from unsegmented phonocardiograms. Physiol Meas
38(8):1658
Lee SH, Lim JS, Kim JK, Yang J, Lee Y (2014) Classification of normal and epileptic seizure EEG signals
using wavelet transform, phase-space reconstruction, and Euclidean distance. Comput Methods Pro-
grams Biomed 116(1):10–25
Li Y, Xu M, Wei Y, Huang W (2015) Rotating machine fault diagnosis based on intrinsic characteristic-
scale decomposition. Mech Mach Theory 94:9–27
Li J, Ke L, Du Q, Ding X, Chen X, Wang D (2019a) Heart sound signal classification algorithm: a combina-
tion of wavelet scattering transform and twin support vector machine. IEEE Access 7:179339–179348
Li J, Ke L, Du Q (2019b) Classification of heart sounds based on the wavelet fractal and twin support vector
machine. Entropy 21(5):472
Liang QZ, Guo XM, Zhang WY, Dai WD, Zhu XH (2015) Identification of heart sounds with arrhythmia
based on recurrence quantification analysis and Kolmogorov entropy. J Med Biol Eng 35(2):209–217
Liu L, Wang H, Wang Y, Tao T, Wu X (2010) Feature analysis of heart sound based on the improved
Hilbert-Huang transform. In: 3rd IEEE international conference on computer science and information
technology, pp 378–381
Liu C, Springer D, Li Q, Moody B, Juan RA, Chorro FJ, Syed Z (2016) An open access database for the
evaluation of heart sound algorithms. Physiol Meas 37(12):2181
Merigó JM, Casanovas M (2011) Induced aggregation operators in the Euclidean distance and its applica-
tion in financial decision making. Expert Syst Appl 38:7603–7608
Mert A (2016) ECG feature extraction based on the bandwidth properties of variational mode decomposi-
tion. Physiol Meas 37(4):530
Messner E, Zohrer M, Pernkopf F (2018) Heart sound segmentation-an event detection approach using deep
recurrent neural networks. IEEE Trans Biomed Eng 65(9):1964–1974
W.Zeng et al.
1 3
Michael S (2005) Applied nonlinear time series analysis: applications in physics, physiology and finance
(Vol 52). World Scientific, Singapore
Mishra M, Banerjee S, Thomas DC, Dutta S, Mukherjee A (2018) Detection of third heart sound using
variational mode decomposition. IEEE Trans Instrum Meas 67(7):1713–1721
Mishra M, Pratiher S, Menon H, Mukherjee A (2020) Identification of S1 and S2 heart sounds using
spectral and convex hull features. IEEE Sens J 20(8):4311–4320
Nishad A, Pachori RB, Acharya UR (2018) Application of TQWT based filter-bank for sleep apnea screening using ECG signals. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-018-0867-3
Nogueira DM, Ferreira CA, Gomes EF, Jorge AM (2019) Classifying heart sounds using images of
Motifs, MFCC and temporal features. J Med Syst 43(6):168
Noman FM, Salleh SH, Ting CM, Samdin SB, Ombao H, Hussain H (2020) A Markov-switching model
approach to heart sound segmentation and classification. IEEE J Biomed Health Inform 24(3):705–716
Papadaniil CD, Hadjileontiadis LJ (2013) Efficient heart sound segmentation and extraction using ensemble
empirical mode decomposition and kurtosis features. IEEE J Biomed Health Inform 18(4):1138–1152
Park C, Looney D, Van Hulle MM, Mandic DP (2011) The complex local mean decomposition. Neuro-
computing 74(6):867–875
Patidar S, Pachori RB (2014) Classification of cardiac sound signals using constrained tunable-Q wave-
let transform. Expert Syst Appl 41(16):7161–7170
Patidar S, Pachori RB, Upadhyay A, Acharya UR (2017) An integrated alcoholic index using tunable-Q
wavelet transform based features extracted from EEG signals for diagnosis of alcoholism. Appl Soft
Comput 50:71–78
Potes C, Parvaneh S, Rahman A, Conroy B (2016) Ensemble of feature-based and deep learning-based
classifiers for detection of abnormal heart sounds. In: 2016 computing in cardiology conference
(CinC), pp 621–624
Rivera WA, Xanthopoulos P (2016) A priori synthetic over-sampling methods for increasing classifica-
tion sensitivity in imbalanced data sets. Expert Syst Appl 66:124–135
Safara F, Doraisamy S, Azman A, Jantan A, Ramaiah ARA (2013) Multi-level basis selection of wavelet
packet decomposition tree for heart sound classification. Comput Biol Med 43(10):1407–1414
Salman AH, Ahmadi N, Mengko R, Langi AZ, Mengko TL (2016) Empirical mode decomposition
(EMD) based denoising method for heart sound signal and its performance analysis. Int J Electr
Comput Eng 6(5):1–8
Sauer T, Yorke JA, Casdagli M (1991) Embedology. J Stat Phys 65(3–4):579–616
Selesnick I (2011) Wavelet transform with tunable Q-factor. IEEE Trans Signal Process 59(8):3560–3575
Shervegar MV, Bhat GV (2018) Heart sound classification using Gaussian mixture model. Porto Biomed
J 3(1):e4
Singh SA, Majumder S (2019) Classification of unsegmented heart sound recording using KNN classi-
fier. J Mech Med Biol 19(04):1950025
Sivakumar B (2002) A phase-space reconstruction approach to prediction of suspended sediment con-
centration in rivers. J Hydrol 258(1–4):149–162
Som A, Krishnamurthi N, Venkataraman V, Turaga P (2016) Attractor-shape descriptors for balance
impairment assessment in Parkinson’s disease. In: IEEE conference on engineering in medicine and
biology society, pp 3096–3100
Springer DB, Tarassenko L, Clifford GD (2015) Logistic regression-HSMM-based heart sound segmen-
tation. IEEE Trans Biomed Eng 63(4):822–832
Sujadevi VG, Mohan N, Kumar SS, Akshay S, Soman KP (2019) A hybrid method for fundamental heart
sound segmentation using group-sparsity denoising and variational mode decomposition. Biomed
Eng Lett 9(4):413–424
Sun S, Jiang Z, Wang H, Fang Y (2014) Automatic moment segmentation and peak detection analysis of
heart sound pattern via short-time modified Hilbert transform. Comput Methods Programs Biomed
114(3):219–230
Sun Y, Li J, Liu J, Chow C, Sun B, Wang R (2015) Using causal discovery for feature selection in multi-
variate numerical time series. Mach Learn 101(1–3):377–395
Takens F (1981) Detecting strange attractors in turbulence. In: Rand DA, Young L-S (eds) Dynamical
systems and turbulence, Warwick 1980. Springer, Berlin, pp 366–381
Varghees VN, Ramachandran KI (2014) A novel heart sound activity detection framework for automated
heart sound analysis. Biomed Signal Process Control 13:174–188
Varghees VN, Ramachandran KI (2017) Effective heart sound segmentation and murmur classification
using empirical wavelet transform and instantaneous phase for electronic stethoscope. IEEE Sens J
17(12):3861–3872
Venkataraman V, Turaga P (2016) Shape distributions of nonlinear dynamical systems for video-based inference. IEEE Trans Pattern Anal Mach Intell 38(12):2531–2543
Wang C, Hill DJ (2006) Learning from neural control. IEEE Trans Neural Netw 17(1):130–146
Wang C, Hill DJ (2007) Deterministic learning and rapid dynamical pattern recognition. IEEE Trans Neural Netw 18(3):617–630
Wang C, Hill DJ (2009) Deterministic learning theory for identification, recognition and control. CRC Press, Boca Raton
Wang Y, Liu F, Jiang Z, He S, Mo Q (2017) Complex variational mode decomposition for signal processing applications. Mech Syst Signal Process 86:75–85
Wang Q, Zhou X, Wang C, Liu Z, Huang J, Zhou Y, Cheng JZ (2019) WGAN-based synthetic minority over-sampling technique: improving semantic fine-grained classification for lung nodules in CT images. IEEE Access 7:18450–18463
Whitaker BM, Suresha PB, Liu C, Clifford GD, Anderson DV (2017) Combining sparse coding and time-domain features for heart sound classification. Physiol Meas 38(8):1701
Xiao B, Xu Y, Bi X, Zhang J, Ma X (2019) Heart sounds classification using a novel 1-D convolutional neural network with extremely low parameter consumption. Neurocomputing. https://doi.org/10.1016/j.neucom.2018.09.101
Xie Y, Xie K, Xie S (2019) Underdetermined blind source separation for heart sound using higher-order statistics and sparse representation. IEEE Access 7:87606–87616
Xu B, Jacquir S, Laurent G, Bilbault JM, Binczak S (2013) Phase space reconstruction of an experimental model of cardiac field potential in normal and arrhythmic conditions. In: 35th annual international conference of the IEEE engineering in medicine and biology society, pp 3274–3277
Xue YJ, Cao JX, Wang DX, Du HK, Yao Y (2016) Application of the variational-mode decomposition for seismic time-frequency analysis. IEEE J Sel Top Appl Earth Obs Remote Sens 9(8):3821–3831
Zhang WJ, Han JQ, Deng SW (2017) Heart sound classification based on scaled spectrogram and partial least squares regression. Biomed Signal Process Control 32:20–28
Zhang WJ, Han JQ, Deng SW (2019) Abnormal heart sound detection using temporal quasi-periodic features and long short-term memory without segmentation. Biomed Signal Process Control 53:101560

Affiliations
Wei Zeng1 · Jian Yuan1 · Chengzhi Yuan2 · Qinghui Wang1 · Fenglin Liu1 · Ying Wang1
* Wei Zeng
zw0597@126.com
1 School of Physics and Mechanical and Electrical Engineering, Longyan University, Longyan 364012, People's Republic of China
2 Department of Mechanical, Industrial and Systems Engineering, University of Rhode Island, Kingston, RI 02881, USA
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
... However, the most important shortcoming of EWT is that signal spectrum is susceptible to noise and non-stationary factors that may leads to false boundaries estimation, which may request further tedious work on selecting dominant frequency peaks in the spectrum as reported in [20]. Motivated by the prowess of timefrequency representations and envelogram, many studies used features to train their deep learning models for heart sounds segmentation or abnormal heart sound detection [21][22][23][24][25][26][27]. ...
Preprint
Full-text available
The purpose of this paper is to present a straightforward framework for Heart Rate (HR) estimation from a Phonocardiogram (PCG) records and study the impact of murmur severity on HR. The system focuses primarily on data processing procedure, which is based on signal preprocessing using Maximal Overlap Discrete Wavelet Transform (MODWT) to delineate murmurs from heart sounds. We exploit the characteristics of Logistic function to derive an enhanced PCG envelop that serves as prerequisite for HR algorithm detection. In fact, the PCG envelop present a cyclostationarity that can be easily detected throughout a cross-covariance autocorrelation function to calculate the Heart Rate (HR). In addition, the effect of minor and pronounced murmurs is gauged by the Energetic Ratio (ER) that provide a comprehensive idea about the superimposed murmur energy on first and second Heart sounds. The study was conducted on PASCAL datasets with 335 real clinical records. Results shows that subjects with Heart murmurs present an averaged Heart Rate (HR ≈ 77 bpm) is within the normal range in mild and medium murmurs. These findings suggests that the change in heart rate is not associated with the severity of murmur that occurs in structural heart valve disorder. A result that could be valuable to medical professionals operating in the emergency departments.
... Tunable-Q wavelet transform (TQWT) Zeng et al. (2021) The discrete wavelet transform known as TQWT is effective at analysing oscillating signals. ...
Article
Heart sound signal analysis is very important for the early identification and treatment of cardiovascular illness. With rapid advancements in science and technology, artificial intelligence technologies are providing tremendous opportunities to enhance diagnosis and clinical decision‐making. Instruments can now perform clinical diagnoses that previously could only be handled by human experts more conveniently and efficiently. Despite multiple works on automatic heart sound analysis, few summaries and reviews exist. This article attempts to give a thorough overview of various heart sound analysis subtasks and examine the improvements made in each subtask by both machine learning techniques and deep learning algorithms. It aims to highlight the potential of AI to revolutionize cardiovascular healthcare by enabling accurate and automated analysis of heart sounds. The findings of this review are beneficial for researchers, clinicians, and engineers in the development and application of AI‐based solutions for improved heart sound classification and diagnosis.
Article
Abstract Background: Cardiac diseases are highly detrimental illnesses, responsible for approximately 32% of global mortality [1]. Early diagnosis and prompt treatment can reduce deaths caused by cardiac diseases. In paediatric patients, it is challenging for paediatricians to distinguish functional murmurs from pathological murmurs in heart sounds. Objective: The study intends to develop a novel blended ensemble model using hybrid deep learning models and softmax regression to classify adult and paediatric heart sounds into five distinct classes, distinguishing itself as a groundbreaking work in this domain. Furthermore, the research aims to create a comprehensive 5-class paediatric phonocardiogram (PCG) dataset. The dataset includes two critical pathological classes, namely atrial septal defects and ventricular septal defects, along with functional murmurs, pathological and normal heart sounds. Methods: The work proposes a blended ensemble model (HbNet-Heartbeat Network) comprising two hybrid models, CNN-BiLSTM and CNN-LSTM, as base models and Softmax regression as meta-learner. HbNet leverages the strengths of base models and improves the overall PCG classification accuracy. Mel Frequency Cepstral Coefficients (MFCC) capture the crucial audio signal characteristics relevant to the classification. The amalgamation of these two deep learning structures enhances the precision and reliability of PCG classification, leading to improved diagnostic results. Results: The HbNet model exhibited excellent results with an average accuracy of 99.72% and sensitivity of 99.3% on an adult dataset, surpassing all the existing state-of-the-art works. The researchers have validated the reliability of the HbNet model by testing it on a real-time paediatric dataset. The paediatric model's accuracy is 86.5%. HbNet detected functional murmur with 100% precision.
Conclusion: The results indicate that the HbNet model exhibits a high level of efficacy in the early detection of cardiac disorders. Results also imply that HbNet has the potential to serve as a valuable tool for the development of decision-support systems that aid medical practitioners in confirming their diagnoses. This method makes it easier for medical professionals to diagnose and initiate prompt treatment while performing preliminary auscultation and reduces unnecessary echocardiograms. Keywords: Blended ensemble; mel frequency cepstral coefficient; meta-learner; phonocardiogram; softmax regression.
Article
Recent studies have shown the potential of the Data-Efficient Image Transformer (DeiT)-based transfer learning method in speech/image recognition and classification utilizing models pre-trained on image datasets. However, the use of DeiT models, especially those pre-trained on image datasets, has not yet been explored for Valvular Heart Disease (VHD) detection. This paper proposes a transfer learning methodology using the DeiT model pre-trained on image datasets for VHD classification. Additionally, we introduce a hybrid Convolution-DeiT (Conv-DeiT) architecture to further improve classification performance. The Conv-DeiT framework integrates a convolutional block with a Squeeze-and-Excitation (SE) attention mechanism to enhance the channel and spatial information within the input features before processing by the DeiT model. The proposed models were assessed using the Heart Sound Murmur (HSM) database, accessible on GitHub. Experimental results show that the DeiT-based transfer learning approach achieved an overall accuracy of 97.44%. Moreover, our Conv-DeiT method outperformed the DeiT-based transfer learning with an impressive overall accuracy of 99.44%. This study indicates the effectiveness of transfer learning using DeiT models pre-trained on image datasets for heart sound classification. Specifically, our hybrid Conv-DeiT method, which combines the convolutional block and the SE-attention mechanism, demonstrates significant advantages in this context.
Article
Heart sounds play a major role in the diagnosis of cardiac disorders, and early detection is crucial to safeguard patients. Computerized heart sound classification strategies promise more exact results, delivered more quickly. This paper proposes an automatic heart sound classification module using a hybrid optimization-controlled deep learning strategy. The main contribution of this research is the tuning of the Deep Neural Network (DNN) classifier's parameters, which relies on the Hybrid Sneaky optimization algorithm. The developed sneaky optimization algorithm inherits the traits of questing and societal search agents. Input data from the Phonocardiogram (PCG) database undergo feature extraction, which extracts important features such as statistical and Heart Rate Variability (HRV) features; Mel Frequency Cepstral Coefficient (MFCC) features are added to enhance the performance of the model. The performance of the developed Sneaky optimization-based DNN classifier is measured in terms of precision, accuracy, specificity, and sensitivity, which are around 97%, 96.98%, 97%, and 96.9%, respectively.
Preprint
Cardiovascular diseases remain the foremost global cause of mortality, necessitating timely and accurate diagnosis. Auscultation, relying on a physician's expertise and a stethoscope, stands as the primary diagnostic tool for cardiovascular disorders. However, its inherent subjectivity necessitates the development of an efficient clinical support system capable of transforming this subjective process into a computerized and proficient method. In real-world clinical settings, auscultation sounds frequently become entangled with ambient noise, demanding the implementation of an effective denoising technique followed by a robust classification model to ensure accurate categorization. In this research paper, we present an innovative preprocessing technique that harnesses the Variational Mode Decomposition (VMD) method to effectively denoise heart sounds. Subsequently, the denoised sound signals undergo processing through a Gammatone filter bank and Short-Time Fourier Transform (STFT) to generate time-frequency distributions in the form of Gammatonegram images and Spectrogram images. To tackle the challenges associated with imbalanced datasets, we incorporate a data augmentation method during the image processing phase. These images are then subjected to classification using various deep convolutional neural network architectures grounded in transfer learning principles, specifically CNN models, including AlexNet, SqueezeNet, GoogLeNet, and VGG19, to mitigate model overfitting. Our experimental results undergo rigorous validation using the publicly accessible PhysioNet 2016 dataset. Notably, our proposed methodology, particularly when leveraging Gammatonegram images, demonstrates highly promising results. These outcomes underscore the considerable clinical potential of our approach, particularly in the context of detecting imbalanced and noisy heart sound signals, ultimately contributing to the enhancement of cardiovascular disease diagnosis.
Article
This paper proposes a pre‐processing method for heart sound screening and extracts the high‐order spectral features of the phonocardiogram. Moreover, a multi‐convolutional neural network (mCNN) is constructed to classify recordings as normal, aortic stenosis, mitral regurgitation, mitral stenosis, or mitral valve prolapse. First, the heart sound recordings are down‐sampled, denoised by wavelet transform, and normalized. Second, a new heart sound screening algorithm is proposed. The waveform of the heart sound recording is segmented and saved as an image, which is then converted to gray scale to calculate the amplitude of the heart sound. The extremely noisy heart sound segments are screened out based on the amplitude information, and the remaining heart sound segments are spliced as pure heart sound recordings. After the heart sound recordings are segmented with 50% overlap, high‐order spectral features are extracted and stored as image data. Finally, a 34‐layer mCNN is specifically designed to boost the performance of heart sound classification through multi‐layer dimensionality reduction. Experimental results show that the proposed method has superior performance compared with existing methods. For the two‐category dataset, the accuracy with and without PCG screening is 97.99% and 99.42%, respectively. For the five‐category dataset, the average accuracy is 99%.
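The overlapping segmentation step mentioned in this abstract can be sketched as follows. This is only a generic illustration of fixed-length windows with fractional overlap; the function name, the `segment_length` parameter, and the choice to drop a trailing partial segment are assumptions, since the abstract does not give these details.

```python
def segment_with_overlap(signal, segment_length, overlap=0.5):
    """Split a sequence into fixed-length segments with fractional overlap.

    A trailing partial segment is dropped (an assumption of this sketch).
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size: how far the window advances between consecutive segments.
    hop = max(1, int(round(segment_length * (1.0 - overlap))))
    return [
        signal[start:start + segment_length]
        for start in range(0, len(signal) - segment_length + 1, hop)
    ]
```

With `overlap=0.5`, each sample away from the recording edges appears in two segments, which roughly doubles the number of training images relative to non-overlapping windows.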
Article
Classifying heart sound signals can provide valuable clinical information for the diagnosis of cardiovascular diseases. Because heart sound signals are complex and difficult to classify and recognize, this paper proposes a new method for feature extraction and classification of heart sound signals that combines the wavelet scattering transform with a twin support vector machine. The method is as follows. The heart sound signal data set is first divided into two parts, a training set and a testing set. The wavelet scattering transform is then applied to the heart sound signals in both sets. The scattering transform is a new time-frequency analysis method that overcomes the sensitivity of the traditional wavelet transform to time shifts; it offers translation invariance and stability to elastic deformation. This yields the scattering feature matrix of the heart sound signal. Because the scattering feature matrix has a large dimension, this paper uses the multidimensional scaling (MDS) method to reduce it, and compares this method with the classical dimension reduction method, principal component analysis (PCA). Finally, the dimensionality-reduced feature matrix is input into the twin support vector machine (TWSVM) for training. After the classifier has been trained to obtain the optimal parameters, the dimensionality-reduced scattering feature matrix of the testing signal is input into the classifier for testing. Experimental results show that the classification accuracy of the proposed method can reach 98% or more, and the running time is greatly reduced compared with the support vector machine (SVM).
Article
Objective: Cardiac auscultation is the most practiced non-invasive and cost-effective procedure for the early diagnosis of heart diseases. While machine learning based systems can aid in automatically screening patients, the robustness of these systems is affected by numerous factors including the stethoscope/sensor, environment and data collection protocol. This paper studies the adverse effect of domain variability on heart sound abnormality detection and develops strategies to address this problem. Methods: We propose a novel Convolutional Neural Network (CNN) layer, consisting of time-convolutional (tConv) units that emulate Finite Impulse Response (FIR) filters. The filter coefficients can be updated via backpropagation and stacked in the front-end of the network as a learnable filterbank. Results: On publicly available multi-domain datasets, the proposed method surpasses the top-scoring systems found in the literature for heart sound abnormality detection (a binary classification task). We utilized sensitivity, specificity, F-1 score and MAcc (average of sensitivity and specificity) as performance metrics. Our systems achieved relative improvements of up to 11.84% in terms of MAcc, compared to state-of-the-art methods. Conclusion: The results demonstrate the effectiveness of the proposed learnable filterbank CNN architecture in achieving robustness towards sensor/domain variability in PCG signals. Significance: The proposed methods pave the way for deploying automated cardiac screening systems in diversified and underserved communities.
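The MAcc metric is defined in this abstract as the average of sensitivity and specificity. A minimal sketch of that computation for binary labels is given below; the function names and the convention that label `1` marks the abnormal (positive) class are assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def macc(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity (MAcc)."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred, positive)
    return (sensitivity + specificity) / 2.0
```

Because both classes contribute equally regardless of their size, MAcc is less flattering than plain accuracy on imbalanced sets such as PCG screening data.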
Article
In this paper, a denoising method for heart sound signals based on empirical mode decomposition (EMD) is proposed. To evaluate the performance of the proposed method, extensive simulations are performed using synthetic normal and abnormal heart sound data corrupted with white, colored, exponential and alpha-stable noise under different input SNR values. The performance is evaluated in terms of signal-to-noise ratio (SNR), root mean square error (RMSE), and percent root mean square difference (PRD), and compared with wavelet transform (WT) and total variation (TV) denoising methods. The simulation results show that the proposed method outperforms the two other methods in removing three types of noise.
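The three evaluation metrics named in this abstract can be sketched as below. The formulas are the commonly used definitions (error energy relative to the clean reference signal); the abstract does not spell them out, so treating these as the authors' exact definitions is an assumption, as are the function names.

```python
import math


def _error_and_reference_energy(clean, denoised):
    """Sum of squared errors and sum of squared reference samples."""
    if len(clean) != len(denoised) or not clean:
        raise ValueError("signals must be non-empty and of equal length")
    error = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    reference = sum(c ** 2 for c in clean)
    return error, reference


def rmse(clean, denoised):
    """Root mean square error between clean and denoised signals."""
    error, _ = _error_and_reference_energy(clean, denoised)
    return math.sqrt(error / len(clean))


def prd(clean, denoised):
    """Percent root mean square difference relative to the clean signal."""
    error, reference = _error_and_reference_energy(clean, denoised)
    return 100.0 * math.sqrt(error / reference)


def snr_db(clean, denoised):
    """Output signal-to-noise ratio in decibels."""
    error, reference = _error_and_reference_energy(clean, denoised)
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(reference / error)
```

These metrics need the clean reference, which is why the study relies on synthetic heart sounds with added noise rather than clinical recordings.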
Article
Cardiovascular Disease (CVD) is considered one of the principal causes of death in the world. In recent years, researchers in this field have investigated heart sound patterns for disease diagnostics. In this study, an approach is proposed for normal/abnormal heart sound classification on the PhysioNet Challenge 2016 dataset. For the first time, a fixed-length feature vector, called an i-vector, is extracted from each heart sound using Mel Frequency Cepstral Coefficient (MFCC) features. Afterwards, the Principal Component Analysis (PCA) transform and a Variational Autoencoder (VAE) are applied to the i-vector to achieve dimension reduction. Eventually, the reduced-size vector is fed to Gaussian Mixture Models (GMMs) and a Support Vector Machine (SVM) for classification. Experimental results demonstrate that the proposed method achieves a performance improvement of 16% in Modified Accuracy (MAcc) compared with the baseline system on the PhysioNet 2016 dataset.
Article
Underdetermined blind source separation (UBSS) is a hot and challenging problem in signal processing. In the traditional UBSS algorithm, the number of source signals is often assumed to be known, which is very inconvenient in practice. Additionally, it is more difficult to obtain an accurate estimate of the mixing matrix in the underdetermined case. However, this information has a great influence on the source separation results, and errors in it can easily lead to poor separation performance. In this paper, a novel UBSS algorithm is presented to carry out a combined source signal number estimation and source signal separation task. In the proposed algorithm, we first design a gap-based detection method to detect the number of source signals by eigenvalue decomposition. Then, the mixing matrix is estimated using a higher-order cumulant-based method so that the uniqueness of the estimated mixing matrix is guaranteed. Furthermore, an improved l1-norm minimization algorithm is proposed to estimate the source signals. Meanwhile, the preconditioned conjugate gradient technique is employed to accelerate convergence so that the computational load is reduced. Finally, a series of simulation experiments with synthetic heart sound data and image reconstruction results demonstrate that the proposed algorithm achieves better separation performance than the state-of-the-art algorithms.
Article
A new set of morphological characteristics of the Phonocardiogram (PCG) signal is presented for recognition of the first (S1) and second (S2) heart sounds (HSs). Initially, variational mode decomposition of the PCG signal generates a set of amplitude- and frequency-modulated narrow band-limited components (NBCs), and the Hilbert transform of these NBCs yields their complex-plane analytic signal representation (ASR). Instantaneous spectral attributes encompassing amplitude modulation bandwidths and a convex hull area measure from the ASRs are concatenated to form the feature set. Experimental results on both publicly available and experimentally recorded HS signals outperform the existing state-of-the-art. Also, the proposed technique does not require any timing information between S1 and S2 or an electrocardiogram (ECG) signal reference, and is highly robust to noisy real-world PCGs, as shown by noise analysis.
Article
Abnormal heart sound detection is an effective and convenient method for the preliminary diagnosis of heart diseases. In this study, we propose a novel method for abnormal heart sound detection using temporal quasi-periodic features and long short-term memory without segmentation. In the proposed method, the spectrogram of the heart sound signal is extracted using the short-time Fourier transform in the first step. Subsequently, the temporal quasi-periodic features of the heart sound signal are calculated by the average magnitude difference function from the spectrogram in different frequency bands. Moreover, to extract the dependency relation within the temporal quasi-periodic features, the method of long short-term memory is applied. Thus, more discriminative features are obtained. Finally, the performance of the proposed method is evaluated on the public dataset offered by the 2016 PhysioNet/Computing in Cardiology Challenge, and the results indicate that our proposed method is competitive compared with the state-of-the-art abnormal heart sound detection methods.
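The average magnitude difference function (AMDF) used in this abstract to capture quasi-periodicity can be sketched for a single one-dimensional sequence, for example the energy of one spectrogram band over time. The function name and the normalization by the number of overlapping samples are assumptions; the paper's per-band processing and the LSTM stage are not reproduced here.

```python
def amdf(x, max_lag):
    """Average magnitude difference function for lags 0..max_lag.

    Each value is the mean absolute difference between the sequence and
    a copy of itself shifted by `lag` samples. Dips near zero indicate
    lags at which the sequence approximately repeats.
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(abs(x[i] - x[i + lag]) for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]
```

For a heart sound band envelope, the first deep dip after lag zero would be expected near the cardiac cycle length, which is what makes the AMDF curve usable as a segmentation-free feature.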
Article
Segmentation of the fundamental heart sounds, S1 and S2, is important for automated monitoring of cardiac activity, including the diagnosis of heart diseases. This paper proposes a novel hybrid method for S1 and S2 heart sound segmentation using group sparsity denoising and the variational mode decomposition (VMD) technique. In the proposed method, the measured phonocardiogram (PCG) signals are denoised using a group sparsity algorithm that exploits the group sparse (GS) property of PCG signals. The denoised GS-PCG signals are then decomposed into modes with specific spectral characteristics using the VMD algorithm. The appropriate mode for further processing is selected based on mode central frequencies and mode energy. The Hilbert envelope (HEnv) of the selected mode is then extracted and thresholded to segment the S1 and S2 heart sounds. The performance advantage of the proposed method is verified using PCG signals from benchmark databases, namely eGeneralMedical, Littmann, Washington, and Michigan. The proposed hybrid algorithm achieved a sensitivity of 100%, positive predictivity of 98%, accuracy of 98% and detection error rate of 1.5%. The promising results suggest that the proposed approach can be considered for automated heart sound segmentation.
Article
The efficiency of automated heart sound analysis mostly depends on accurate detection of acoustic events. In this study, an acoustic feature based heart sound segmentation algorithm is proposed for automatic identification of the fundamental heart sounds (FHS). Gammatone filter bank energy is introduced to represent the distinctive features of heart sounds. A supervised artificial neural network (ANN) model is used to detect S1-S2 and non S1-S2 segments of the cardiac cycle. Finally, time-based information is utilized to identify S1 and S2 positions. Performance of the system is evaluated using 764 real and noisy heart sound cycles (both normal and abnormal domains) from the 2016 PhysioNet/CinC challenge database with annotations provided for heart sound states. The accuracy achieved using the Cochleagram feature is more than 95% for both first and second heart sound identification. The proposed technique shows that a multilayer perceptron (MLP) neural network using the Cochleagram feature improves the overall S1-S2 identification accuracy compared to the other acoustic features reported earlier.
Article
Rotation forest (RoF) is a powerful ensemble classifier and has attracted substantial attention due to its performance in hyperspectral data classification. Multi-class imbalance learning is one of the biggest challenges in machine learning and remote sensing. The standard technique for constructing a RoF ensemble tends to increase the overall accuracy, but RoF has difficulty recognizing the minority class sufficiently. This paper proposes a novel dynamic SMOTE (synthetic minority oversampling technique)-based RoF algorithm for the multi-class imbalance problem. The main idea of the proposed method is to dynamically balance the class distribution before building each rotation decision tree. A resampling rate is set in each iteration (ranging from 10% in the first iteration to 100% in the last), and this rate defines the number of minority class instances randomly resampled (with replacement) from the original dataset in each iteration. The rest of the minority class instances are generated by the SMOTE method. The reported results on three real hyperspectral datasets show that the proposed method achieves better performance than random forest, RoF, and some popular data sampling methods.
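The SMOTE step referenced in this abstract generates synthetic minority samples by interpolating between a minority instance and one of its nearest minority neighbours. The sketch below shows only that basic interpolation, not the dynamic resampling schedule or the rotation forest; the function name, the default `k=5`, and the optional `rng` argument are assumptions made for illustration.

```python
import math
import random


def smote_samples(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the line segment between a randomly
    chosen minority instance and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

In the dynamic scheme described above, a call like this would supply only the portion of the minority class that is not drawn by resampling with replacement in a given iteration.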