Lung Disease Classification using Deep Convolutional
Neural Network
Zeenat Tariq, Sayed Khushal Shah, Yugyung Lee
School of Computing and Engineering, University of Missouri-Kansas City, USA
zt2gc@mail.umkc.edu, ssqn7@mail.umkc.edu, leeyu@umkc.edu
Abstract—Advanced technologies are essential to improving
medicine. More specifically, extensive investigation in
partnership among researchers, health care providers, and
patients is integral to bringing precise and customized treatment
strategies to the care of various diseases. This paper aims to
assess the degree of accuracy acceptable in the medical field by
applying deep learning to publicly available data. First, we
extracted spectrogram features and labels from the annotated lung
sound samples and used them as input to our 2D Convolutional
Neural Network (CNN) model. Second, we normalized the lung sounds
to remove peak values and noise. The publicly available data
alone were not sufficient for training a deep learning
classifier. Finally, we created a deep learning model called Lung
Disease Classification (LDC), combined with advanced data
normalization and data augmentation techniques, for
high-performance classification in lung disease diagnosis.
The final accuracy obtained after normalization and augmentation
was approximately 97%. The proposed model paves the way for
adequate assessment of the degree of accuracy acceptable in the
medical field and offers better performance than previously
reported approaches.
Index Terms—Data normalization, Data augmentation, Convolutional
neural network, Lung sound classification, Deep learning.
I. INTRODUCTION
Lung sounds are the acoustic signals generated by breathing.
Physicians widely use auscultation to examine lung sounds
associated with different respiratory symptoms. Auscultation has
been the easiest way to diagnose patients with respiratory
diseases such as pneumonia, asthma, and bronchiectasis [1], [2].
However, it is a manual process that takes a lot of time, and its
accuracy varies because of the complexity of the sound patterns
and characteristics. This may involve a high risk of missed data,
leading to underdiagnosis or misdiagnosis [3], [4]. Auscultation
is not always accurate or reliable: one study found that
residents were unable to identify all of the wheezing sounds in a
series of pulmonary disease sounds [5].
Machine learning plays an important role in classifying
different types of sounds through multiple algorithms [6].
Deep learning is a branch of machine learning that has
attracted a lot of attention due to its high performance in
prediction and classification. These learning techniques are
among the fastest-growing approaches in audio classification [7].
Such classifiers can outperform human listeners because they are
less affected by noise and by the limits of human memory.
In this paper, we apply deep learning techniques to improve
classification in the diagnosis of respiratory symptoms. We
propose a model built on a popular deep learning network, the
Convolutional Neural Network (CNN). Specifically, we introduce
advanced preprocessing techniques, namely normalization and
augmentation, for effective lung sound classification. The
classification is based on spectrogram features extracted from
the audio dataset. Traditional classification results vary
because of noise in the audio samples caused by environmental
interference. Existing CNN approaches, which are based purely on
audio feature techniques, have adopted different architectures
and obtained accuracies between 80% and 95% with very high memory
consumption. The dataset used for the experiments is a public
dataset provided for research in [8].
One of the challenges in this research was finding publicly
available data and cleaning out samples that were not recorded
properly and therefore could not be used as input for a class.
Because the audio is recorded directly from the lungs, the
samples may contain noise from the heart or from other sounds in
the body. To improve accuracy, we applied data normalization to
the original data to rescale the amplitude of the audio samples
to a consistent level.
Deep learning relies on large amounts of data. Because the
amount of publicly available data is limited, research progress
in this field has also been limited. To tackle this problem, we
adopted a solution known in the deep learning field as data
augmentation [9]. We applied data augmentation techniques to
enlarge the dataset, which helps the CNN model achieve better
accuracy. Finally, our model was observed to outperform the other
models reported so far. Other researchers could not experiment
with large amounts of data, whereas data augmentation allowed our
model to outperform earlier work.
II. RELATED WORK
Chen et al. [10] proposed a novel solution for lung sound
classification using a publicly available dataset. The dataset
was divided into three categories: wheezes, crackles, and normal.
They proposed a detection method using the optimized S-transform
(OST) and deep residual networks (ResNets). They preprocessed the
audio samples using OST, which rescaled the features for the
ResNets. Bardou et al. [11] compared four machine learning
approaches for lung sound classification using a lung sound
dataset. In their experiments, the CNN outperformed all other
classifiers. However, this result depends on the batch size and
the number of epochs. Although they obtained an accuracy of
approximately 97%, their machine utilization was very high
because they applied almost 1 million or more epochs. Dubey and
Bodade [12] reviewed several feature extraction and
classification techniques for pulmonary obstructive diseases such
as COPD and asthma. The feature extraction techniques covered in
their review were FFT, STFT, spectrograms, and the wavelet
transform. The best accuracy reported for a CNN was approximately
95%. Chen et al. [13] proposed a solution for automatic early
detection of disease using a CNN for heart and lung sounds. They
collected data from volunteer patients, which doctors manually
annotated for the experiments. The dataset was too limited to
support conclusive results.
Salamon and Bello [14] presented a data augmentation technique
for environmental sound classification using a CNN. The audio was
deformed through time stretching, pitch shifting, dynamic range
compression, and background noise. Piczak [15] proposed a CNN
model for the classification of environmental sounds. The 1D CNN
architecture consists of two convolutional layers with rectified
linear units and max pooling, two fully connected hidden layers,
and a softmax output layer. The data were augmented through
random time delays and pitch shifting. Mel spectrograms were
extracted from all audio samples, which were resampled and
normalized.
Fig. 1. Block Diagram for LDC System
III. METHOD
A. Data Normalization
In this paper, we evaluated existing normalization techniques
and selected the three best ones for our experiments.
Root Mean Square Normalization. In Root Mean Square (RMS)
normalization, the signal level is measured as the square root of
the mean of the squared amplitudes, rather than as the arithmetic
mean of the signal. The RMS level is useful for finding the
signal strength based on the amplitude regardless of whether the
signal values are positive or negative. For a given signal
$x = x_1, x_2, \dots, x_n$, the RMS value $x_{\mathrm{rms}}$ is:
\[
x_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}
= \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \dots + x_n^2\right)} \tag{1}
\]
The signal amplitude can be normalized only if we can determine
the scaling factor that performs the linear gain change. It is
possible to scale a signal whose amplitude is higher than 1 or
whose level is below 0 decibels (dB). To apply the linear gain
change, we can rearrange the RMS formula above as shown in
Equation 2, where $R$ is the desired RMS level on a linear scale
and $a$ is the scaling factor.
\[
R = \sqrt{\frac{1}{n}\left[(a x_1)^2 + (a x_2)^2 + \dots + (a x_n)^2\right]},
\qquad
a = \sqrt{\frac{n R^2}{x_1^2 + x_2^2 + \dots + x_n^2}} \tag{2}
\]
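As a minimal sketch of Equations 1 and 2, the following pure-Python code computes the RMS level of a signal and the gain a that brings it to a desired level R. The function names and the target level of 0.1 are illustrative assumptions and are not taken from the LDC system.

```python
import math

def rms(x):
    """Root mean square of a sequence of samples (Equation 1)."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def rms_gain(x, target_rms):
    """Linear gain a that scales x to the desired RMS level R (Equation 2)."""
    energy = sum(v * v for v in x)
    if energy == 0:
        raise ValueError("cannot normalize an all-zero signal")
    return math.sqrt(len(x) * target_rms ** 2 / energy)

def rms_normalize(x, target_rms=0.1):  # 0.1 is an arbitrary illustrative target
    """Return a copy of x rescaled so that its RMS level equals target_rms."""
    a = rms_gain(x, target_rms)
    return [a * v for v in x]

signal = [0.5, -0.5, 0.25, -0.25]
normalized = rms_normalize(signal, target_rms=0.1)
```

Because the gain is applied uniformly to every sample, the shape of the waveform is unchanged; only its overall level moves to the target.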
Peak Normalization. In peak normalization, the peak signal level
is analyzed in decibels relative to full scale (dBFS), and the
volume of the signal is amplified so that the output peaks at
0 dB. This process scales the amplitude of every input audio
signal so that the highest absolute amplitude has a value of 1.
The output signal is calculated as
\[
\mathrm{out} = \frac{1}{\max(|\mathrm{in}|)} \cdot \mathrm{in} \tag{3}
\]
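Equation 3 translates directly into code. The sketch below is a minimal pure-Python version; the function name and the handling of silent input are assumptions made for illustration.

```python
def peak_normalize(samples):
    """Scale samples so the largest absolute amplitude becomes 1 (Equation 3)."""
    peak = max(abs(v) for v in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    return [v / peak for v in samples]

normalized = peak_normalize([0.2, -0.4, 0.1])
```

Unlike RMS normalization, the gain here is set by a single sample, so one loud click in a recording determines the level of the whole clip.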
EBU Standard R128 Normalization. European Broadcasting Union
(EBU) Standard R128 normalization is based on measuring the
average loudness of a program when normalizing audio signals.
B. Data Augmentation
We experimented with different types of data augmentation and
settled on three: time stretching, pitch shifting, and dynamic
range compression [16]. The original data consist of 920 audio
samples. After applying the augmentation techniques, the total
number of audio samples, including the originals, was 11,960
(13 per original recording). The files occupied 26 GB of storage.
For data augmentation, it is important to select the deformations
so that the original labels remain valid for the augmented
samples.
Fig. 2. Spectrogram Feature Extraction
Time Stretching. The speed of the audio sample is increased or
decreased by a given factor [17]. We used four speed factors,
0.5, 0.7, 1.2, and 1.5, in addition to the original audio files.
Pitch Shifting. The pitch of each audio sample is lowered or
raised by one of four values in semitones [18]. The duration of
the audio samples is kept the same as that of the originals,
i.e., 4-10 seconds. The shifts used were -2, -1, 1, and 2
semitones.
Dynamic Range Compression. This technique compresses the dynamic
range of the audio sample using four parameter settings. Three of
them are taken from the Dolby E standard and one from the Icecast
radio live streaming server.
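The three techniques above account for the size of the augmented dataset. The sketch below enumerates one augmented copy per setting and counts the result. The compression settings are not named in the text, so the labels used for them are placeholders, and the function name is illustrative.

```python
# Augmentation settings as listed in the text.
time_stretch_rates = [0.5, 0.7, 1.2, 1.5]
pitch_shift_semitones = [-2, -1, 1, 2]
# The four compression settings are not named in the text; these labels are placeholders.
drc_presets = ["drc_preset_1", "drc_preset_2", "drc_preset_3", "drc_preset_4"]

def augmentation_plan():
    """Return one (technique, setting) pair per augmented copy of a clip."""
    plan = [("time_stretch", r) for r in time_stretch_rates]
    plan += [("pitch_shift", s) for s in pitch_shift_semitones]
    plan += [("dynamic_range_compression", p) for p in drc_presets]
    return plan

original_clips = 920
copies_per_clip = len(augmentation_plan())            # augmented copies per original
total_clips = original_clips * (1 + copies_per_clip)  # originals are kept as well
print(copies_per_clip, total_clips)
```

With four settings per technique, each recording yields 12 augmented copies, and 920 x 13 matches the 11,960 samples reported.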
C. Network Model
Figure 1 shows the block diagram of the LDC system. Data
normalization and data augmentation are applied to the lung sound
data, and spectrogram features are extracted from the regenerated
audio samples. These features are passed to the 2D CNN for
classification. A convolutional neural network has two main
components: a feature extractor and a classifier. The feature
extractor extracts the spectrogram features from the audio signal
and passes them to the classifier, which assigns the signals to
their appropriate categories. The classifier consists of several
convolutional and pooling layers, followed by activation and
fully connected layers, which are used for classification.
The mathematical form of the convolutional layers is given in
Equations 4 and 5.
\[
x^{(l)}_{i,j,k} = \sum_{a}\sum_{b}\sum_{c}
w^{(l-1,f)}_{i,j,k}\, y^{(l-1)}_{i+a,\,j+b,\,k+c} + \mathrm{bias}^{f} \tag{4}
\]
\[
y^{(l)}_{i,j,k} = \sigma\left(x^{(l)}_{i,j,k}\right) \tag{5}
\]
The layer output is represented by $y^{(l)}_{i,j,k}$, where
$i, j, k$ index the 3-dimensional input tensor. The filter
weights are denoted by $w^{(l)}_{i,j,k}$, and
$\sigma(x^{(l)}_{i,j,k})$ is the sigmoid activation function
applied to the linear output. The fully connected layer is
represented by Equations 6 and 7.
\[
x^{(l)}_{i} = \sum_{j} w^{(l-1)}_{i,j}\, y^{(l-1)}_{j} + \mathrm{bias}^{(l-1)}_{j} \tag{6}
\]
\[
y^{(l)}_{i,j,k} = \sigma\left(x^{(l)}_{i,j,k}\right) \tag{7}
\]
Fig. 3. Classification Accuracy: (a) Original, (b) Normalized, (c) Augmented
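Equations 4 and 5 can be illustrated for a single channel and a single filter with a short pure-Python sketch. It assumes that the filter weights are indexed by the offsets a and b, that there is no padding, that the stride is 1, and that σ is the sigmoid. These choices and the function names are illustrative assumptions and do not reproduce the LDC implementation.

```python
import math

def sigmoid(v):
    """Logistic sigmoid, used here as the activation sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def conv2d_single(y_prev, w, bias):
    """Unpadded, stride-1 2D filtering of one channel with one filter (Equation 4)."""
    kh, kw = len(w), len(w[0])
    out_h = len(y_prev) - kh + 1
    out_w = len(y_prev[0]) - kw + 1
    x = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    acc += w[a][b] * y_prev[i + a][j + b]
            x[i][j] = acc
    return x

def activate(x, fn=sigmoid):
    """Apply the activation pointwise (Equation 5)."""
    return [[fn(v) for v in row] for row in x]

y_prev = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
w = [[1.0, 0.0],
     [0.0, -1.0]]
x = conv2d_single(y_prev, w, bias=0.5)
y = activate(x)
```

Passing a different function as `fn` swaps the activation, for example a ReLU as used in the layers described later in this section.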
The 2D CNN architecture is composed of five layers. The first
three are convolutional layers interleaved with max-pooling
layers, and they are followed by two fully connected layers. We
used librosa to extract Mel spectrograms because, for noisy data,
spectrograms are considered the best features for differentiating
between types of sounds. During feature extraction, we used a
window size and hop size of 23 ms. Because the sound clips vary
between 3 and 10 seconds, we limited the extraction to 3 seconds
so that every sound clip is usable. The input from the sound
clips is reshaped, and an input $X \in \mathbb{R}^{128 \times 128}$
is provided to the classifier.
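The relation between the 23 ms hop and the 128-frame input can be checked with a little arithmetic. The sketch below assumes a 44.1 kHz sampling rate and a 1024-sample hop, neither of which is stated in the text; with other sampling rates the numbers change.

```python
SAMPLE_RATE = 44_100   # Hz; assumed, not given in the text
HOP_SAMPLES = 1024     # assumed hop length, close to 23 ms at 44.1 kHz
CLIP_SECONDS = 3       # extraction length given in the text

def hop_duration_ms(hop_samples, sample_rate):
    """Duration of one hop in milliseconds."""
    return 1000.0 * hop_samples / sample_rate

def full_hops(clip_seconds, sample_rate, hop_samples):
    """Number of whole hops that fit in a clip of the given length."""
    return (clip_seconds * sample_rate) // hop_samples

hop_ms = hop_duration_ms(HOP_SAMPLES, SAMPLE_RATE)
n_hops = full_hops(CLIP_SECONDS, SAMPLE_RATE, HOP_SAMPLES)
print(f"{hop_ms:.1f} ms per hop, {n_hops} hops in {CLIP_SECONDS} s")
```

Under these assumptions, a 3-second clip provides slightly more than the 128 frames needed for a 128x128 input.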
The first layer takes the reshaped spectrogram features as input
and applies 24 filters, giving a shape of [24x1x5x5]. The stride
in this layer is [4x2], with ReLU as the activation function. The
second layer has 48 filters of shape [48x24x5x5], with a [4x2]
strided max-pooling layer and ReLU as the activation function.
The third layer also has 48 filters with a [5x5] receptive field,
resulting in shape [48x48x5x5]; its activation is ReLU, without
pooling. Finally, the fourth layer has 64 hidden units, resulting
in shape [2000x64] with ReLU activation, and the output layer has
shape [64x10] with softmax activation. In the top layer, we used
a [5x5] receptive field, which is very small, because of the
localized patterns.
IV. EXPERIMENTAL DESIGN AND RESULTS
The dataset is composed of a total of 5.5 hours of recordings
from 126 patients. The categories include Asthma, Bronchiectasis,
Chronic Obstructive Pulmonary Disease (COPD), Healthy, Upper
Respiratory Tract Infection (URTI), Lower Respiratory Tract
Infection (LRTI), and Pneumonia. Table I shows the categories and
the number of samples in each.
We used librosa [19] for the spectrogram feature extraction.
Figure 2 shows the features extracted from the Lung Sounds
dataset for spectrograms.
TABLE I
ORIGINAL AND AUGMENTED DATA SIZE
ID   Name of Disease   #Audio Files   #Augmented Audio Files
1    Asthma                       1                       13
2    Bronchiectasis              29                      377
3    COPD                       785                    10205
4    Healthy                     35                      455
5    LRTI                         2                       26
6    Pneumonia                   37                      481
7    URTI                        31                      403
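A quick consistency check on Table I: summing the columns and dividing them confirms the totals quoted in Section III-B and shows how the classes are distributed. The dictionary simply transcribes the table; the variable names are illustrative.

```python
# (original files, augmented files) per class, copied from Table I.
table_1 = {
    "Asthma": (1, 13),
    "Bronchiectasis": (29, 377),
    "COPD": (785, 10205),
    "Healthy": (35, 455),
    "LRTI": (2, 26),
    "Pneumonia": (37, 481),
    "URTI": (31, 403),
}

total_original = sum(orig for orig, _ in table_1.values())
total_augmented = sum(aug for _, aug in table_1.values())
# Expansion factor per class: augmented count divided by original count.
ratios = {name: aug / orig for name, (orig, aug) in table_1.items()}
# Share of the original recordings that belong to the COPD class.
copd_share = table_1["COPD"][0] / total_original

print(total_original, total_augmented, round(copd_share, 3))
```

Every class is expanded by the same factor of 13, so augmentation leaves the class proportions unchanged; COPD accounts for roughly 85% of the recordings before and after.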
TABLE II
LDC SYSTEM MODEL RESULT COMPARISON
Model     Technique                               Accuracy
Model 1   Original Data                           83%
Model 2   Peak Value Normalization                86%
Model 3   RMS Normalization                       87%
Model 4   EBU Normalization                       88%
Model 5   Augmentation applied on Original Data   93%
Model 6   Normalized Peak Value Augmentation      92%
Model 7   Normalized RMS Value Augmentation       94%
Model 8   Normalized EBU Value Augmentation       97%
We designed our experiments to evaluate the proposed 2D CNN lung
sound classification on the lung sound dataset. The dataset is
split 70%/30% for training and testing. The batch size was 32,
and the number of epochs was fixed at 100 to limit over- and
under-fitting. The results for each configuration of the LDC
system are shown in Table II. We observed that the highest
accuracy achieved by existing research is 97%, which depends on
heavy GPU usage and memory consumption. Table II shows the LDC
results for Models 1-8. Although the data were not sufficient for
training, we were able to achieve good results.
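The training configuration above implies a fixed number of batches and weight updates. The sketch below works this out for a random 70/30 split. It assumes that the split is applied to the full augmented set of 11,960 samples, which the text does not state, and the seed and function name are hypothetical.

```python
import math
import random

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed chosen only for reproducibility
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

N_SAMPLES = 11_960   # augmented dataset size from Table I (assumed split basis)
BATCH_SIZE = 32
EPOCHS = 100

train_idx, test_idx = split_indices(N_SAMPLES)
batches_per_epoch = math.ceil(len(train_idx) / BATCH_SIZE)
total_updates = batches_per_epoch * EPOCHS
print(len(train_idx), len(test_idx), batches_per_epoch, total_updates)
```

If the split were instead applied to the 920 original recordings before augmentation, the same function would apply with `N_SAMPLES = 920`, and the counts would scale accordingly.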
We first ran the 2D CNN classification network on the original
dataset, which gave an accuracy of approximately 83%. We then
applied the three types of normalization, i.e., Peak, RMS, and
EBU, and obtained accuracies of 86%, 87%, and 88%, respectively.
Data augmentation is considered a trend-setting technique in deep
learning for small datasets. With augmentation applied to the
original data, the 2D CNN reached an accuracy of 93%. We also
applied the three augmentation techniques to the normalized data,
and the highest accuracy achieved was 97%. Even though the data
were limited and contained considerable variation and
environmental interference in the recordings (e.g., heartbeat,
running fan), our technique achieved very good accuracy in
comparison with state-of-the-art feature-based approaches.
Figure 3(a)-(c) shows the accuracy of the 2D CNN on the lung
sound dataset for the original, normalized, and augmented data.
With the data in its original form, the CNN overfitted, and the
highest accuracy reported was between 83% and 86%. The accuracy
reported for the models varies slightly because of the nature of
the data. After normalization, the accuracy improved and ranged
between 85% and 90%. Finally, applying augmentation produced a
visible increase in accuracy, which ranged approximately between
96% and 99%. The result obtained in our experiments outperforms
the method proposed in [11].
V. CONCLUSION
In this paper, we developed the Lung Disease Classification
(LDC) system, combined with advanced data normalization and data
augmentation techniques, for high-performance classification in
lung disease diagnosis. We obtained 97% accuracy, which is better
than the state-of-the-art accuracy. This confirms that the
proposed model could be used for the diagnosis of lung diseases
with lung sounds in health care.
REFERENCES
[1] E. Pacht, J. Turner, M. Gaillun, L. Violi, D. Ralston, H. Mekhjian, and
R. John, “Effectiveness of telemedicine in the outpatient pulmonary
clinic,” Telemedicine Journal, vol. 4, no. 4, pp. 287–292, 1998.
[2] Y. Kahya, EC. Guler, and S. Sahin, “Respiratory disease diagnosis
using lung sounds,” in Proceedings of the 19th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society.
‘Magnificent Milestones and Emerging Opportunities in Medical
Engineering’. IEEE, 1997, vol. 5, pp. 2051–2053.
[3] J. Kaur, K. Chugh, A. Sachdeva, and L. Satyanarayana, “Under diagnosis
of asthma in school children and its related factors,” Indian pediatrics,
vol. 44, no. 6, pp. 425, 2007.
[4] A. Mandke and K. Mandke, “Under diagnosis of copd in primary care
setting in surat, india,” 2015.
[5] S. Mangione and L. Nieman, “Pulmonary auscultatory skills during
training in internal medicine and family practice,” Am J respiratory &
critical care medicine, vol. 159, no. 4, pp. 1119–1124, 1999.
[6] J. Geiger and K. Helwani, “Improving event detection for audio
surveillance using gabor filterbank features,” in Signal Processing
Conference, 23rd European. IEEE, 2015, pp. 714–718.
[7] L. Deng, D. Yu, et al., “Deep learning: methods and applications,”
Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp.
197–387, 2014.
[8] BM Rocha, D Filos, L Mendes, I Vogiatzis, E Perantoni, et al., “A
respiratory sound database for the development of automated
classification,” in Precision Medicine Powered by pHealth and
Connected Health, pp. 33–37. Springer, 2018.
[9] I. Rebai, Y. BenAyed, W. Mahdi, and JP. Lorré, “Improving speech
recognition using data augmentation and acoustic model fusion,”
Procedia Computer Science, vol. 112, pp. 316–322, 2017.
[10] H. Chen, X. Yuan, Z. Pei, M. Li, and J. Li, “Triple-classification
of respiratory sounds using optimized s-transform and deep residual
networks,” IEEE Access, vol. 7, pp. 32845–32852, 2019.
[11] D. Bardou, K. Zhang, and S. Ahmad, “Lung sounds classification using
convolutional neural networks,” Artificial intelligence in medicine, vol.
88, pp. 58–69, 2018.
[12] R. Dubey and R. M. Bodade, “A review of classification techniques
based on neural networks for pulmonary obstructive diseases,” 2019.
[13] Q. Chen, W. Zhang, X. Tian, X. Zhang, S. Chen, and W. Lei,
“Automatic heart and lung sounds classification using convolutional
neural networks,” in 2016 Asia-Pacific Signal and Information Processing
Association Annual Summit and Conference. IEEE, 2016, pp. 1–4.
[14] J. Salamon, C. Jacoby, and JP. Bello, “A dataset and taxonomy for
urban sound research,” in Proceedings of the 22nd ACM international
conference on Multimedia. ACM, 2014, pp. 1041–1044.
[15] K. Piczak, “Environmental sound classification with convolutional
neural networks,” in Machine Learning for Signal Processing, IEEE
25th International Workshop on. IEEE, 2015, pp. 1–6.
[16] LR. Aguiar, Y. Costa, and NC. Silla, “Exploring data augmentation to
improve music genre classification with convnets,” in 2018 International
Joint Conference on Neural Networks. IEEE, 2018, pp. 1–8.
[17] S. Wei, K. Xu, D. Wang, F. Liao, H. Wang, and Q. Kong, “Sample
mixed-based data augmentation for domestic audio tagging,” arXiv
preprint arXiv:1808.03883, 2018.
[18] N. Davis and K. Suresh, “Environmental sound classification using
deep convolutional neural networks and data augmentation,” in Recent
Advances in Intelligent Computational Systems. IEEE, 2018, pp. 41–45.
[19] B. McFee, C. Raffel, D. Liang, D. PW Ellis, M. McVicar, E. Battenberg,
and O. Nieto, “librosa: Audio and music signal analysis in python,” in
Proceedings of the 14th python in science conference, 2015, pp. 18–25.
... This results in an increase in the models' generalization capability, while also expanding the field of application where the trained algorithm can be used. Data constructed by this methodology are employed in [44,51,54,55,57,61,62,67,68]. On the other hand, only a limited number of studies have employed smartphone recordings as a means of data collection, with one notable example being [63], which utilized various smartphones to record lung sounds. ...
... Nevertheless, most studies are focused on respiratory sounds, since they can be correlated with almost all diseases or at least provide useful information for a preliminary diagnosis. The studies [47,[51][52][53][54][55][56][57][58][59][60][61][62]64,65] provide different techniques and methodologies for the identification of respiratory sounds, like crackles and wheezes, with promising results. The accuracy achieved in these studies is between 73% and 98%. Figure 10 provides an overview of the distribution of studies with respect to the respective research topics for each study. ...
... As mentioned above, data engineering consists of feature-based methodologies and transformation-based methodologies. The studies [44,47,52,53,[56][57][58][59]64,65] follow various transformation methodologies, in order to convert the audio signal to a more manageable form, surfacing as much helpful information as possible. It is a common procedure to transform vector-shaped data into an image-like form by performing time-frequency domain analysis. ...
Article
Full-text available
Respiratory diseases represent a significant global burden, necessitating efficient diagnostic methods for timely intervention. Digital biomarkers based on audio, acoustics, and sound from the upper and lower respiratory system, as well as the voice, have emerged as valuable indicators of respiratory functionality. Recent advancements in machine learning (ML) algorithms offer promising avenues for the identification and diagnosis of respiratory diseases through the analysis and processing of such audio-based biomarkers. An ever-increasing number of studies employ ML techniques to extract meaningful information from audio biomarkers. Beyond disease identification, these studies explore diverse aspects such as the recognition of cough sounds amidst environmental noise, the analysis of respiratory sounds to detect respiratory symptoms like wheezes and crackles, as well as the analysis of the voice/speech for the evaluation of human voice abnormalities. To provide a more in-depth analysis, this review examines 75 relevant audio analysis studies across three distinct areas of concern based on respiratory diseases’ symptoms: (a) cough detection, (b) lower respiratory symptoms identification, and (c) diagnostics from the voice and speech. Furthermore, publicly available datasets commonly utilized in this domain are presented. It is observed that research trends are influenced by the pandemic, with a surge in studies on COVID-19 diagnosis, mobile data acquisition, and remote diagnosis systems.
Article
Full-text available
Early detection of lung disease is important for timely intervention and treatment, enhancing patient outcomes and decreasing healthcare cost. Chest X-rays are a widely employed imaging modality to examine the structures within the chest, including the lungs and surrounding tissues. Lung disease detection using chest X-rays is a critical application of medical imaging and artificial intelligence (AI) in healthcare. Recently, lung disease detection using deep learning (DL) becomes a significant research area, which has the potential to improve early detection rate and decrease mortality rate. Therefore, this article introduces a Multi-Feature Fusion Based Deep Transfer Learning with Enhanced Dung Beetle Optimization Algorithm (MFFTL-EDBOA) for lung disease detection and classification. The MFFTL-EDBOA technique aims to recognize the existence of lung diseases on CXR images. At the primary stage, the MFFTL-EDBOA technique uses adaptive filtering (AF) approach to remove the noise level. Besides, a multi-feature fusion-based feature extraction approach is developed based on three DL models namely DenseNet, EfficientNet, and MobileNet. For accurate lung disease detection and classification purposes, the convolutional fuzzy neural network (CFNN) approach is utilized. The hyperparameter tuning of the CFNN model occurs using the EDBOA. To illustrate the enhanced lung disease detection results of the MFFTL-EDBOA technique, a sequence of experiments is carried out on benchmark medical dataset from Kaggle repository. The experimental values highlighted the greater result of the MFFTL-EDBOA system over other recent approaches with maximum accuracy of 98.99%.
Article
Full-text available
This pioneering study aims to revolutionize self-symptom management and telemedicine-based remote monitoring through the development of a real-time wheeze counting algorithm. Leveraging a novel approach that includes the detailed labeling of one breathing cycle into three types: break, normal, and wheeze, this study not only identifies abnormal sounds within each breath but also captures comprehensive data on their location, duration, and relationships within entire respiratory cycles, including atypical patterns. This innovative strategy is based on a combination of a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM) network model, enabling real-time analysis of respiratory sounds. Notably, it stands out for its capacity to handle continuous data, distinguishing it from conventional lung sound classification algorithms. The study utilizes a substantial dataset consisting of 535 respiration cycles from diverse sources, including the Child Sim Lung Sound Simulator, the EMTprep Open-Source Database, Clinical Patient Records, and the ICBHI 2017 Challenge Database. Achieving a classification accuracy of 90%, the exceptional result metrics encompass the identification of each breath cycle and simultaneous detection of the abnormal sound, enabling the real-time wheeze counting of all respirations. This innovative wheeze counter holds the promise of revolutionizing research on predicting lung diseases based on long-term breathing patterns and offers applicability in clinical and non-clinical settings for on-the-go detection and remote intervention of exacerbated respiratory symptoms.
Article
Full-text available
Auscultation is crucial for the diagnosis of respiratory system diseases. However, traditional stethoscopes have inherent limitations, such as inter-listener variability and subjectivity, and they cannot record respiratory sounds for offline/retrospective diagnosis or remote prescriptions in telemedicine. The emergence of digital stethoscopes has overcome these limitations by allowing physicians to store and share respiratory sounds for consultation and education. On this basis, machine learning, particularly deep learning, enables the fully-automatic analysis of lung sounds that may pave the way for intelligent stethoscopes. This review thus aims to provide a comprehensive overview of deep learning algorithms used for lung sound analysis to emphasize the significance of artificial intelligence (AI) in this field. We focus on each component of deep learning-based lung sound analysis systems, including the task categories, public datasets, denoising methods, and, most importantly, existing deep learning methods, i.e., the state-of-the-art approaches to convert lung sounds into two-dimensional (2D) spectrograms and use convolutional neural networks for the end-to-end recognition of respiratory diseases or abnormal lung sounds. Additionally, this review highlights current challenges in this field, including the variety of devices, noise sensitivity, and poor interpretability of deep models. To address the poor reproducibility and variety of deep learning in this field, this review also provides a scalable and flexible open-source framework that aims to standardize the algorithmic workflow and provide a solid basis for replication and future extension: https://github.com/contactless-healthcare/Deep-Learning-for-Lung-Sound-Analysis .
Chapter
Artificial intelligence has been a revolutionary concept for the healthcare sector in recent years. Deep Neural Networks (DNNs), a subdomain of machine learning, are a vital tool for applications such as diagnostic and therapy suggestions. Pulmonary diseases significantly influence the overall well-being of numerous individuals worldwide, greatly hampering their ability to lead a healthy and balanced life. The present study uses an ensemble technique to detect pulmonary diseases. Lung sounds obtained by auscultation are transformed into spectrograms and classified using Convolutional Neural Networks (CNNs) of various architectures. The proposed study reports an accuracy of 97.3%.
Article
Digital respiratory sounds provide valuable information for telemedicine and smart diagnosis as a non-invasive means of pathological detection. Wheeze, the typical continuous abnormal respiratory sound, is clinically correlated with asthma and chronic obstructive lung diseases, while the discontinuous adventitious crackle is clinically correlated with pneumonia, bronchitis, and similar conditions. The detection and classification of both have attracted many studies over the decades. However, due to artifacts in the recordings and constrained feature extraction methods, the reliability and accuracy of the classification of wheeze, crackle and normal sounds need significant improvement. In this paper, we propose a novel method for the identification of wheeze, crackle and normal sounds using the Optimized S-transform (OST) and deep Residual Networks (ResNet). First, the raw respiratory sound is processed by the proposed OST. Then the OST spectrogram is rescaled for the ResNet. After feature learning and classification are performed by the ResNet, the classes of respiratory sounds are recognized. Because the proposed OST highlights the features of wheeze, crackle and respiratory sounds, and deep residual learning generates discriminative features for better recognition, the proposed method provides a reliable approach for telemedicine and E-health diagnosis of respiratory disease. Experimental results show that the proposed combination of OST and ResNet performs well for the multi-classification of respiratory sounds, with accuracy, sensitivity, and specificity of up to 98.79%, 96.27% and 100% respectively. Comparison results for the triple-classification of respiratory sounds indicate that the proposed method outperforms the deep-learning-based ensembling Convolutional Neural Network (CNN) by 3.23% and the empirical mode decomposition-based Artificial Neural Network (ANN) by 4.63%.
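Accuracy, sensitivity, and specificity of the kind reported above follow from confusion counts. The sketch below computes them one-vs-rest for a chosen class. The abstract does not say how its multi-class figures were aggregated, so the one-vs-rest treatment, the function name `one_vs_rest_metrics`, and the toy labels are assumptions for illustration only.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity treating `positive` vs. the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity (recall): share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels, not data from any study.
y_true = ["wheeze", "wheeze", "crackle", "normal", "normal", "crackle"]
y_pred = ["wheeze", "crackle", "crackle", "normal", "normal", "normal"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="wheeze")
print(metrics)
```

In this toy case one of the two wheezes is missed and no non-wheeze sound is flagged as wheeze, which illustrates how specificity can reach 100% while sensitivity stays lower.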
Conference Paper
The automatic analysis of respiratory sounds has been a field of great research interest during the last decades. Automated classification of respiratory sounds has the potential to detect abnormalities in the early stages of a respiratory dysfunction and thus enhance the effectiveness of decision making. However, a large publicly available database on which new algorithms can be implemented, evaluated, and compared is still lacking, although it is vital for further developments in the field. In the context of the International Conference on Biomedical and Health Informatics (ICBHI), the first scientific challenge was organized with the main goal of developing algorithms able to characterize respiratory sound recordings derived from clinical and non-clinical environments. The database was created by two research teams in Portugal and in Greece, and it includes 920 recordings acquired from 126 subjects. A total of 6898 respiration cycles were recorded. The cycles were annotated by respiratory experts as including crackles, wheezes, a combination of them, or no adventitious respiratory sounds. The recordings were collected using heterogeneous equipment and their duration ranged from 10 to 90 s. The chest locations from which the recordings were acquired were also provided. Noise levels in some respiration cycles were high, which simulated real-life conditions and made the classification process more challenging.
Article
Deep learning based systems have greatly improved the performance in speech recognition tasks, and various deep architectures and learning methods have been developed in the last few years. Along with that, Data Augmentation (DA), a common strategy for increasing the quantity of training data, has been shown to help neural networks learn to make invariant predictions. In addition, Ensemble Method (EM) approaches have received considerable attention in the machine learning community as a way to increase the effectiveness of classifiers. Therefore, we propose in this work a new Deep Neural Network (DNN) speech recognition architecture which takes advantage of both DA and EM approaches in order to improve the prediction accuracy of the system. In this paper, we first explore an existing approach based on vocal tract length perturbation and then propose a different DA technique based on feature perturbation to create modified training data sets. Finally, EM techniques are used to integrate the posterior probabilities produced by different DNN acoustic models trained on different data sets. Experimental results demonstrate an increase in the recognition performance of the proposed system.
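One simple way to integrate posterior probabilities from several models is to average them per class. The abstract does not specify which combination rule was used, so the unweighted mean below, the function name `average_posteriors`, and the example probabilities are assumptions for illustration.

```python
def average_posteriors(posteriors):
    """Unweighted mean of per-class posteriors across models.

    `posteriors` is a list with one probability vector per model, all of
    the same length. Averaging is an assumed combination rule here.
    """
    n_models = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]


# Hypothetical posteriors from three models over three classes.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.5, 0.2],
]
combined = average_posteriors(model_outputs)
predicted_class = max(range(len(combined)), key=lambda c: combined[c])
print(combined, predicted_class)
```

Here the first model alone would pick class 0, but the averaged posterior favors class 1, which shows how an ensemble can override a single model's decision.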
Conference Paper
Audio tagging has attracted increasing attention over the last decade and has potential applications in many fields. The objective of audio tagging is to predict the labels of an audio clip. Recently, deep learning methods have been applied to audio tagging and have achieved state-of-the-art performance. However, due to the limited size of audio tagging datasets such as DCASE, the trained models tend to overfit and generalize poorly to new data. Previous data augmentation methods such as pitch shifting, time stretching and adding background noise do not show much improvement in audio tagging. In this paper, we explore sample-mixed data augmentation for the domestic audio tagging task, including mixup, SamplePairing and extrapolation. We use a convolutional recurrent neural network (CRNN) with an attention module and log-scaled mel spectrum input as the baseline system. In our experiments, we achieve a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 task4 dataset with the mixup approach, outperforming the baseline system without data augmentation.
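Mixup, one of the sample-mixing augmentations named above, forms a new training example as a convex combination of two existing examples and of their label vectors, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The sketch below shows that rule on flat feature lists; the function name `mixup`, the `alpha` value, and the toy inputs are assumptions for illustration, not settings from the paper.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with a Beta-drawn weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical feature vectors and one-hot labels for two clips.
rng = random.Random(0)  # seeded only to make the example repeatable
x_a, y_a = [1.0, 2.0, 3.0], [1.0, 0.0]
x_b, y_b = [3.0, 2.0, 1.0], [0.0, 1.0]
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=rng)
print(lam, x_mix, y_mix)
```

The mixed label is soft rather than one-hot, so training on such examples requires a loss that accepts probability targets.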
Article
Lung sounds convey relevant information related to pulmonary disorders, and physicians evaluate patients with pulmonary conditions using the traditional auscultation technique. However, this technique suffers from limitations. For example, if the physician is not well trained, this may lead to a wrong diagnosis. Moreover, lung sounds are non-stationary, complicating the tasks of analysis, recognition, and distinction. This is why developing automatic recognition systems can help to deal with these limitations. In this paper, we compare three machine learning approaches for lung sound classification. The first two approaches are based on the extraction of a set of handcrafted features used to train three different classifiers (support vector machines, k-nearest neighbor, and Gaussian mixture models), while the third approach is based on the design of convolutional neural networks (CNN). In the first approach, we extracted the 12 MFCC coefficients from the audio files and then calculated six MFCC statistics. We also experimented with normalization to zero mean and unit variance to enhance accuracy. In the second approach, the local binary pattern (LBP) features are extracted from the visual representation of the audio files (spectrograms). The features are normalized using whitening. The dataset used in this work consists of seven classes (normal, coarse crackle, fine crackle, monophonic wheeze, polyphonic wheeze, squawk, and stridor). We have also experimentally tested dataset augmentation techniques on the spectrograms to enhance the ultimate accuracy of the CNN. The results show that the CNN outperformed the handcrafted-feature-based classifiers.
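The zero-mean, unit-variance normalization mentioned above can be sketched as a per-column z-score over a feature matrix. The function name `zscore_columns`, the use of the population standard deviation, and the toy feature rows are assumptions for illustration; the abstract does not give these details.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


# Hypothetical feature matrix: one row per recording, one column per feature.
features = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 5.0],
    [3.0, 30.0, 5.0],
]
normalized = zscore_columns(features)
print(normalized)
```

In practice the means and standard deviations would be estimated on the training split only and then reused for validation and test data, to avoid leaking test statistics into training.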
Conference Paper
In this paper, we study the effectiveness of using convolutional neural networks (CNNs) to automatically detect abnormal heart and lung sounds and classify them into different classes. Heart and respiratory diseases have been affecting humankind for a long time. An effective and automatic diagnostic method is highly attractive since it can help discover potential threats at an early stage, even at home without a professional doctor. We collected a data set containing normal and abnormal heart and lung sounds. These sounds were then annotated by professional doctors. CNN-based systems were implemented to automatically classify the heart sounds into one of seven categories (normal, bruit de galop, mitral inadequacy, mitral stenosis, interventricular septal defect (IVSD), aortic incompetence, and aorta stenosis) and the lung sounds into one of three categories (normal, moist rales, and wheezing rale).